LangSmith has released an evaluator template library and reusable evaluators, simplifying multi-level evaluation of AI agents. The templates cover five categories: safety and protection, response quality, execution trajectory, user behavior analysis, and multimodal evaluation. They include tuned judging prompts and rule-based evaluators, and are suitable for both online monitoring and offline experiments. Reusable evaluators are managed centrally at the organization level: a new Evaluators tab lists them, they can be attached to new projects in one click, and prompt updates take effect globally. The library is open-sourced alongside openevals v0.2.0, which adds multimodal support.

MeNews

2026-05-21 00:50:03


ME News, April 17 (UTC+8): According to Beating Monitoring, LangSmith, the observability tool from the AI agent development platform LangChain, has released two updates: an evaluator template library and reusable evaluators.
Assessing whether an AI agent is "useful" is currently one of the most time-consuming parts of development.
An agent may call the correct tools but return a response in the wrong format, behave normally in single-turn conversations but break down in multi-turn dialogues, or produce a plausible-looking answer after retrieving the wrong documents in an intermediate step.
Developers need to set checkpoints at multiple levels (single steps, complete trajectories, multi-turn conversations, specific tool calls), and each evaluator requires writing a prompt, calibrating it against real data, and tuning it repeatedly.
Starting from scratch often takes several weeks.
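To make the idea of checkpoints at different levels concrete, here is a minimal sketch of two hand-written evaluators, one for a single step (response format) and one for the whole trajectory (tool order). The field names (`steps`, `tool`, `response`, `expected_tools`) and the result dictionaries are assumptions made for illustration; they are not the schema of the LangSmith templates.

```python
import json
from typing import Any


def response_is_json(outputs: dict[str, Any]) -> dict[str, Any]:
    """Single-step check: can the final response be parsed as JSON?"""
    try:
        json.loads(outputs.get("response", ""))
        ok = True
    except (TypeError, ValueError):  # JSONDecodeError is a ValueError
        ok = False
    return {"key": "response_is_json", "score": 1.0 if ok else 0.0}


def trajectory_match(
    outputs: dict[str, Any], reference_outputs: dict[str, Any]
) -> dict[str, Any]:
    """Trajectory check: were the expected tools called in the expected order?"""
    actual = [step["tool"] for step in outputs.get("steps", [])]
    expected = list(reference_outputs.get("expected_tools", []))
    return {
        "key": "trajectory_match",
        "score": 1.0 if actual == expected else 0.0,
        "comment": f"expected {expected}, got {actual}",
    }


# Hypothetical run: the agent used the right tools but answered in free text.
run = {
    "steps": [{"tool": "search_docs"}, {"tool": "summarize"}],
    "response": "Here is the summary you asked for.",
}
reference = {"expected_tools": ["search_docs", "summarize"]}

trajectory_result = trajectory_match(run, reference)
format_result = response_is_json(run)
```

In this example the trajectory check passes while the format check fails, which is the kind of case a single end-to-end score would hide.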
LangSmith now offers over 30 ready-made templates covering five categories: safety and protection (prompt injection detection, personal information leakage checks, bias and toxicity), response quality (accuracy, usefulness, tone), execution trajectory (whether the agent followed the correct steps), user behavior analysis (language distribution, satisfaction signals), and multimodal (voice and image output review).
The templates include tuned LLM judging prompts and rule-based code evaluators. They can be used as-is or customized, and are suitable for both online monitoring and offline experiments.
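As an illustration of what a rule-based code evaluator can look like, the sketch below flags responses that appear to contain an email address or phone number. It is a hand-written example, not one of the LangSmith templates; the function name `pii_leak`, the `response` field, and both regular expressions are assumptions, and the patterns are deliberately crude (the phone pattern will also match other long digit runs, such as dates).

```python
import re
from typing import Any

# Crude illustrative patterns; real PII detection needs far more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def pii_leak(outputs: dict[str, Any]) -> dict[str, Any]:
    """Score 1.0 when no email or phone pattern is found, else 0.0."""
    text = outputs.get("response", "")
    found = []
    if EMAIL_RE.search(text):
        found.append("email")
    if PHONE_RE.search(text):
        found.append("phone")
    return {
        "key": "pii_leak",
        "score": 0.0 if found else 1.0,
        "comment": "found: " + ", ".join(found) if found else "no PII pattern matched",
    }


clean = pii_leak({"response": "The order shipped on Tuesday."})
leaky = pii_leak({"response": "Contact me at jane.doe@example.com"})
```

Because a check like this is deterministic and cheap, the same function could in principle run on every production trace as well as on an offline test set.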
Reusable evaluators address the organizational management problem: the new Evaluators tab lists every evaluator in the workspace in one place, lets each be attached to a new project in one click, and applies prompt updates globally, so there is no need to maintain duplicate copies in each project.
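The benefit of a shared definition over per-project copies can be shown with a small in-memory model. This is only a conceptual sketch: `Workspace`, `SharedEvaluator`, and their methods are hypothetical names invented here and do not correspond to the LangSmith SDK or UI.

```python
from dataclasses import dataclass, field


@dataclass
class SharedEvaluator:
    name: str
    prompt: str


@dataclass
class Workspace:
    # One definition per evaluator, shared by every project that attaches it.
    evaluators: dict[str, SharedEvaluator] = field(default_factory=dict)
    # Projects store evaluator names only, never copies of the prompt.
    attachments: dict[str, list[str]] = field(default_factory=dict)

    def register(self, name: str, prompt: str) -> None:
        self.evaluators[name] = SharedEvaluator(name, prompt)

    def attach(self, project: str, name: str) -> None:
        if name not in self.evaluators:
            raise KeyError(f"unknown evaluator: {name}")
        self.attachments.setdefault(project, []).append(name)

    def update_prompt(self, name: str, prompt: str) -> None:
        self.evaluators[name].prompt = prompt

    def prompts_for(self, project: str) -> list[str]:
        return [self.evaluators[n].prompt for n in self.attachments.get(project, [])]


ws = Workspace()
ws.register("tone", "Rate the tone of the response from 1 to 5.")
ws.attach("support-bot", "tone")
ws.attach("sales-bot", "tone")

# A single update is visible from both projects.
ws.update_prompt("tone", "Rate the tone from 1 to 5 and explain why.")
```

Since projects hold references rather than copies, there is nothing to keep in sync after the update.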
The above templates are open-sourced simultaneously with the release of openevals v0.2.0, which adds support for multimodal evaluation.
(Source: BlockBeats)


GateUser-4bd1cc87

· 2h ago

Multi-level evaluation finally has a unified approach, and the problem of inconsistent standards among team members has eased considerably.


MempoolDrifter

· 2h ago

The user behavior analysis template is quite interesting; I can finally see systematically how agents are being used.


GateUser-5578154d

· 2h ago

The one-click attachment to new projects is very convenient for those of us running multiple projects in parallel.


MistValleyFront

· 2h ago

The safety and protection templates are a must-have; this is the biggest concern before launching an AI agent.


MorningGoldAsWavesCrashAgainst

· 2h ago

The Evaluators tab is designed quite intuitively; it's easy to find.


PermissionedFury

· 2h ago

Kudos for openevals v0.2.0; community collaboration is much better than working behind closed doors.


GateUser-176c498f

· 2h ago

LangSmith's latest update is really useful. It used to be a headache to write evaluators, but now just applying templates saves a lot of trouble.


LangSmith has launched over 30 evaluation templates, so quality checks for AI agents no longer need to be written from scratch.