LangSmith has launched over 30 evaluation templates, so quality checks for AI agents no longer need to be written from scratch.

ME News reports that on April 17 (UTC+8), according to monitoring by Dongcha Beating, LangSmith, the observability tool from the AI agent development platform LangChain, released two updates: an evaluator template library and reusable evaluators.

Evaluating whether an AI agent is “useful” is currently one of the most time-consuming parts of development. An agent may call the correct tools but return its answer in the wrong format; it may handle single-turn conversations well but fail in multi-turn ones; its final answer may look reasonable even though an intermediate step retrieved the wrong documents. Developers therefore need checkpoints at several levels (single steps, complete trajectories, multi-turn conversations, specific tool calls, and so on), and each evaluator has to go through writing a prompt, calibrating it against real data, and repeated tuning. Building all of this from scratch often takes weeks.
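To make the "multiple levels" point concrete, here is a minimal Python sketch of two separate checks on the same run: one on the tool-call trajectory and one on the format of the final answer. The `AgentRun` record, the tool names, and both functions are hypothetical illustrations and not part of LangSmith or openevals.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """Hypothetical record of one agent run (not a LangSmith type)."""
    tool_calls: list[str] = field(default_factory=list)  # tool names, in call order
    final_answer: str = ""


def trajectory_matches(run: AgentRun, expected_tools: list[str]) -> bool:
    """Trajectory-level check: were the expected tools called in the expected order?"""
    return run.tool_calls == expected_tools


def answer_is_valid_json(run: AgentRun, required_keys: set[str]) -> bool:
    """Output-level check: is the final answer a JSON object with the required keys?"""
    try:
        parsed = json.loads(run.final_answer)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed)


# The agent called the right tools but answered in plain text instead of JSON.
run = AgentRun(
    tool_calls=["search_docs", "summarize"],
    final_answer="The refund window is 30 days.",
)

results = {
    "trajectory": trajectory_matches(run, ["search_docs", "summarize"]),
    "format": answer_is_valid_json(run, {"answer"}),
}
print(results)  # {'trajectory': True, 'format': False}
```

A single pass/fail score would hide this case: the trajectory check passes while the format check fails, which is why the checks are kept separate.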

LangSmith now provides more than 30 ready-made templates in five categories: safety (prompt injection detection, personal information leakage checks, bias and toxicity), answer quality (correctness, usefulness, tone), execution trajectory (whether the agent followed the correct steps), user behavior analysis (language distribution, satisfaction signals), and multimodal (review of voice and image outputs). The templates include pre-tuned LLM judge prompts and rule-based code evaluators. They can be used as-is or customized, and they work for both online monitoring and offline experiments.
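As an illustration of what a rule-based code evaluator can look like, the sketch below flags email addresses and US-style phone numbers in an agent's output. It is not one of the LangSmith templates: the function name, the regular expressions, and the shape of the returned dictionary are all assumptions chosen for this example, and a real check for personal information would need far broader coverage.

```python
import re

# Deliberately simple patterns, assumed for this example only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def pii_leak_evaluator(output_text: str) -> dict:
    """Return score 1 if no pattern matches, 0 if the text appears to leak PII."""
    findings = []
    if EMAIL_RE.search(output_text):
        findings.append("email")
    if US_PHONE_RE.search(output_text):
        findings.append("phone")
    return {
        "key": "pii_leak",  # hypothetical metric name
        "score": 0 if findings else 1,
        "comment": ", ".join(findings) if findings else "no PII pattern matched",
    }


print(pii_leak_evaluator("You can reach Jane at jane@example.com."))
print(pii_leak_evaluator("The refund window is 30 days."))
```

Because the check is deterministic and needs no model call, an evaluator of this kind is cheap enough to run on every response, while LLM judge prompts cover qualities such as tone that rules cannot capture.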

Reusable evaluators address management at the organization level: the new Evaluators tab lists every evaluator in the workspace in one place, lets developers attach one to a new project with a single click, and applies a prompt update everywhere the evaluator is used, so there is no need to maintain a duplicate copy in each project. The templates are also released as open source alongside openevals v0.2.0, which adds support for multimodal evaluation.
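The idea behind reusable evaluators (define once, attach by reference, update in one place) can be sketched in a few lines of Python. This is a conceptual model only: `EvaluatorRegistry` and `Project` are hypothetical classes, not LangSmith objects, and a simple length rule stands in for an evaluator prompt.

```python
from typing import Callable

Evaluator = Callable[[str], bool]


class EvaluatorRegistry:
    """Hypothetical workspace-level store holding one definition per evaluator."""

    def __init__(self) -> None:
        self._evaluators: dict[str, Evaluator] = {}

    def register(self, name: str, evaluator: Evaluator) -> None:
        self._evaluators[name] = evaluator  # re-registering replaces the definition

    def get(self, name: str) -> Evaluator:
        return self._evaluators[name]


class Project:
    """Hypothetical project that stores evaluator names, not copies of them."""

    def __init__(self, name: str, registry: EvaluatorRegistry) -> None:
        self.name = name
        self.registry = registry
        self.attached: list[str] = []

    def attach(self, evaluator_name: str) -> None:
        self.attached.append(evaluator_name)

    def evaluate(self, output: str) -> dict[str, bool]:
        # Look up the current definition at evaluation time.
        return {n: self.registry.get(n)(output) for n in self.attached}


registry = EvaluatorRegistry()
registry.register("max_length", lambda text: len(text) <= 20)

support_bot = Project("support-bot", registry)
sales_bot = Project("sales-bot", registry)
support_bot.attach("max_length")
sales_bot.attach("max_length")

before = support_bot.evaluate("fifteen chars!!")  # 15 characters, within the limit of 20

# One update in the registry changes the result in every project that references it.
registry.register("max_length", lambda text: len(text) <= 10)
after = (support_bot.evaluate("fifteen chars!!"), sales_bot.evaluate("fifteen chars!!"))
print(before, after)
```

Storing a reference rather than a copy is what removes the per-project duplicates: neither project had to be touched for the stricter rule to take effect.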

(Source: BlockBeats)

Comment
DegenWithNotebook
· 11h ago
Evaluator template library plus reusable evaluators: the combination is all about improving development efficiency.
OutsiderOfZhiyuandao
· 12h ago
Beating's monitoring is quite fast, and the LangChain ecosystem is becoming increasingly active.
StargazerInTheWoods
· 12h ago
The reusable evaluator design is a good idea; it avoids reinventing the wheel.
QuietValidator
· 12h ago
Weeks of work from scratch versus ready-made templates: that comparison stings a bit.
AirdropDreamsInAGlassBottle
· 12h ago
Multi-turn conversations breaking down: that is so true to life. Finally, someone is seriously tackling it.
Don’tRushToDoubleItYet.
· 12h ago
Can more than 30 templates save a few weeks? I'll wait and see the actual results.
MirrorBallPeeking
· 12h ago
LangSmith's latest update really hits the pain points; evaluating AI agents has been far too frustrating.