LangSmith has launched over 30 evaluation templates, so quality checks for AI agents no longer need to be written from scratch.

ME News reports that on April 17 (UTC+8), according to Beating Monitoring, LangSmith, the observability tool from the AI agent development platform LangChain, released two updates: an evaluator template library and reusable evaluators.
Assessing whether an AI agent is "useful" is currently one of the most time-consuming parts of development.
Agents may call the correct tools but produce incorrect response formats, perform normally in single-turn conversations but crash in multi-turn dialogues, or produce seemingly reasonable answers but retrieve incorrect documents in the intermediate steps.
Developers need to set checkpoints at multiple levels (single steps, complete trajectories, multi-turn conversations, specific tool calls), and each evaluator requires writing a prompt, calibrating it against real data, and tuning it repeatedly.
Starting from scratch often takes several weeks.
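To make the idea of checkpoints at different levels concrete, here is a minimal sketch in plain Python. It is not LangSmith's API: the function names, the result dictionary shape, and the tool names are all assumptions made for illustration. One check looks at a single response's format, the other at the order of tool calls across a whole trajectory.

```python
import json
from typing import Any


def evaluate_response_format(output: str, required_keys: set[str]) -> dict[str, Any]:
    """Step-level check (hypothetical): is the response a JSON object with the required keys?"""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return {"key": "response_format", "score": 0, "comment": "not valid JSON"}
    if not isinstance(parsed, dict):
        return {"key": "response_format", "score": 0, "comment": "not a JSON object"}
    missing = required_keys - set(parsed)
    comment = f"missing keys: {sorted(missing)}" if missing else "ok"
    return {"key": "response_format", "score": 0 if missing else 1, "comment": comment}


def evaluate_trajectory(tool_calls: list[str], expected: list[str]) -> dict[str, Any]:
    """Trajectory-level check (hypothetical): do the expected tools appear in order?

    Extra tool calls in between are allowed; only the relative order matters.
    """
    remaining = iter(tool_calls)
    # `step in remaining` consumes the iterator, so each match must come after the previous one.
    in_order = all(step in remaining for step in expected)
    return {"key": "trajectory_order", "score": 1 if in_order else 0}
```

An agent can pass one check and fail the other, which is why the two levels are scored separately: a run that calls `search` and then `answer` in the right order can still return a response that is missing a required field.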
LangSmith now offers over 30 ready-made templates covering five categories: safety and protection (prompt injection detection, personal information leakage checks, bias and toxicity), response quality (accuracy, usefulness, tone), execution trajectory (whether the agent followed the correct steps), user behavior analysis (language distribution, satisfaction signals), and multimodal (voice and image output review).
The templates include carefully tuned LLM evaluation prompts and rule-based code evaluators. They can be used as-is or customized, and they work for both online monitoring and offline experiments.
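A rule-based code evaluator of the kind described above could look roughly like the following sketch of a personal information leakage check. It is not one of the released templates: the regular expressions, the function name, and the result format are illustrative assumptions, and the patterns are deliberately simple.

```python
import re

# Deliberately simple, illustrative patterns; a production check would need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def pii_leak_evaluator(outputs: str) -> dict:
    """Hypothetical rule-based evaluator: score 1 if no PII pattern matches, else 0."""
    findings = {
        "email": EMAIL_RE.findall(outputs),
        "phone": PHONE_RE.findall(outputs),
    }
    leaked = [kind for kind, hits in findings.items() if hits]
    return {
        "key": "pii_leak",
        "score": 0 if leaked else 1,
        "comment": ", ".join(leaked) or "no PII pattern matched",
    }
```

Because the check is deterministic and needs no model call, the same function could in principle run on live traffic and on an offline dataset.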
Reusable evaluators address an organizational problem: the new Evaluators tab lists every evaluator in the workspace in one place and lets teams attach them to new projects in one click. Prompt updates take effect globally, so there is no need to maintain duplicate copies in each project.
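The difference between copying an evaluator into each project and sharing one definition can be sketched as a small registry. This is a conceptual model only, not LangSmith's implementation or API: the `Workspace` and `Evaluator` classes and all method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Evaluator:
    name: str
    prompt: str


@dataclass
class Workspace:
    """Hypothetical workspace: projects hold evaluator names, not copies of the evaluator."""
    evaluators: dict[str, Evaluator] = field(default_factory=dict)
    attachments: dict[str, list[str]] = field(default_factory=dict)  # project -> evaluator names

    def register(self, evaluator: Evaluator) -> None:
        self.evaluators[evaluator.name] = evaluator

    def attach(self, project: str, name: str) -> None:
        if name not in self.evaluators:
            raise KeyError(f"unknown evaluator: {name}")
        self.attachments.setdefault(project, []).append(name)

    def update_prompt(self, name: str, prompt: str) -> None:
        # A single update; every project that references this name sees it.
        self.evaluators[name].prompt = prompt

    def prompts_for(self, project: str) -> list[str]:
        return [self.evaluators[n].prompt for n in self.attachments.get(project, [])]


ws = Workspace()
ws.register(Evaluator("tone", "Rate the tone of the response."))
ws.attach("support-bot", "tone")
ws.attach("sales-bot", "tone")
ws.update_prompt("tone", "Rate the tone of the response from 1 to 5.")
```

After the update, both `support-bot` and `sales-bot` resolve to the new prompt, because each project stores a reference to the shared definition rather than its own copy.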
The above templates are open-sourced simultaneously with the release of openevals v0.2.0, which adds support for multimodal evaluation.
(Source: BlockBeats)
Comment
GateUser-4bd1cc87
· 2h ago
A multi-level evaluation finally has a unified plan, and the issue of inconsistent standards among team members has been alleviated significantly.
MempoolDrifter
· 2h ago
The user behavior analysis template is quite interesting; I finally can systematically see how agents are being used.
GateUser-5578154d
· 2h ago
The one-click mounting to a new project feature is very friendly for us who run multiple projects in parallel.
MistValleyFront
· 2h ago
Security and protection templates are a must-have; the biggest concern before launching the AI agent is this aspect.
MorningGoldAsWavesCrashAgainst
· 2h ago
The Evaluators tab is designed quite intuitively; it's easy to find.
PermissionedFury
· 2h ago
OpenEvals v0.2.0 is highly praised; community collaboration is much better than working behind closed doors.
GateUser-176c498f
· 2h ago
LangSmith's latest update is really useful. It used to be a headache to write evaluators, but now just applying templates saves a lot of trouble.