Raindrop's Workshop uses Codex to help your AI Agent find and fix bugs automatically (free and open source)
AI Agent developer-tool company Raindrop this week released Workshop (v0.1.6), an open-source local debugger that lets developers track every token output and tool call of an Agent in real time, and, through MCP, lets Claude Code automatically read traces, write tests, and fix issues.
(Background: Is Claude writing buggy code or just playing dumb? Adapting Andrej Karpathy's 12 rules cut error rates from 41% to 3%)
(Additional background: Anthropic launches “Claude for Small Business”: targeting SMEs with AI automation workflows)
Your AI Agent just produced a strange result. It chose an unexpected tool and output a vague response. You open the logs and see a bunch of API calls and token counts, but no clues as to which decision went wrong.
On May 14, Raindrop released an open-source tool aimed at preventing exactly this scenario: Workshop, a completely local, free AI Agent debugger. It lets developers track every token output and tool call in real time, then hand the debugging process itself over to Claude Code or Codex.
Why is debugging AI Agents particularly difficult?
Traditional software debugging has breakpoints, complete call stacks, and deterministic execution paths. Debugging AI Agents is different. Their behavior is probabilistic; the same input can lead to completely different paths in different runs. Their decisions are dispersed across multiple layers of LLM calls, making it nearly impossible to infer logic from just the final output.
The core issue is: you’re not looking for “which line of code is wrong,” but rather “under which context combination did the Agent make an unexpected judgment, and at which step did it go wrong.” Such problems cannot be solved with traditional debuggers.
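To make the nondeterminism concrete, here is a minimal, self-contained sketch (not Raindrop's code; the agent and tool names are invented for illustration) showing why the same input can take different paths: the tool choice at each step is sampled from a model rather than computed deterministically. The LLM call is stubbed with a random pick so the example runs without an API key.

```python
# Hypothetical agent: each decision step is probabilistic, so identical inputs
# can produce different execution paths on different runs.
import random

TOOLS = ["search_docs", "run_sql", "call_api"]

def llm_pick_tool(prompt: str) -> str:
    # Stand-in for a temperature > 0 LLM call: the "decision" is sampled.
    return random.choice(TOOLS)

def agent(prompt: str) -> list[str]:
    path = []
    for _ in range(3):                # three decision steps
        tool = llm_pick_tool(prompt)  # each step can branch differently per run
        path.append(tool)
    return path

if __name__ == "__main__":
    # Identical input, potentially different paths on every run:
    print(agent("summarize last week's error logs"))
    print(agent("summarize last week's error logs"))
```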
Existing solutions usually take one of two approaches: hosted observability platforms that ship trace data to an external server, or hand-rolled logging wired into the Agent's own code. The former is unfriendly to developers concerned about data privacy, and the latter is time-consuming, requiring the logging infrastructure to be maintained through every framework update. Both share a common problem: they tell you "what happened," but not "how to fix it."
The Workshop chooses a third path: fully local execution, no data sent to external servers, open-source, free, enabling AI to participate directly in the debugging loop.
How the Workshop works
Once started, Workshop runs a visual interface locally and exposes an MCP (Model Context Protocol) server. MCP is, in plain terms, a standard protocol that lets AI tools call external capabilities; it is the bridge through which Claude Code and other AI coding tools reach external data.
Once the Agent is integrated with a supported SDK, every execution node (each token output, each tool call, each decision branch) streams in real time to localhost:5899, with no polling or manual refresh.
In plain terms, it’s like opening a monitoring window on your computer, watching live as the AI Agent operates.
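As an illustration of what that integration might look like, here is a hedged sketch in Python. The `emit_trace_event` helper, the `/events` endpoint, and the event schema are assumptions invented for this example, not Raindrop's documented SDK; only the localhost:5899 address comes from the article.

```python
# Illustrative only: report each token output and tool call as a structured event
# to a local Workshop-style viewer as it happens.
import json, time, urllib.request

WORKSHOP_URL = "http://localhost:5899/events"  # hypothetical ingestion endpoint

def emit_trace_event(kind: str, payload: dict) -> None:
    event = {"kind": kind, "ts": time.time(), **payload}
    try:
        req = urllib.request.Request(
            WORKSHOP_URL,
            data=json.dumps(event).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=0.5)
    except OSError:
        # Viewer not running: fall back to stdout so the agent keeps working.
        print("trace:", event)

def run_tool(name: str, args: dict) -> str:
    emit_trace_event("tool_call", {"tool": name, "args": args})
    result = f"<result of {name}>"          # real tool execution would go here
    emit_trace_event("tool_result", {"tool": name, "result": result})
    return result

if __name__ == "__main__":
    for token in ["Checking", "the", "error", "logs"]:
        emit_trace_event("token", {"text": token})   # one event per generated token
    run_tool("search_logs", {"query": "timeout"})
```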
The key design of Workshop is to bring Claude Code and similar assistants into the debugging loop. Because Workshop exposes an MCP server, Claude Code can read the trace data directly, write eval tests based on those traces, run them, observe the failing assertions, modify the Agent's code, and rerun, repeating until all tests pass.
Raindrop calls this a "self-healing eval cycle." The entire process is a local closed loop that runs automatically, without the developer intervening at each step.
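For a sense of what one turn of that cycle might produce, here is a pytest-style sketch of the kind of eval that could be generated from a captured trace. The trace schema, `run_agent`, and the result types are hypothetical stand-ins for the Agent under test, not Raindrop's actual format; in practice the trace would be read from Workshop's captured data rather than embedded inline.

```python
# Sketch of a trace-derived eval: assert on the decision the Agent made, not just
# the final text it produced.
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    name: str

@dataclass
class AgentResult:
    answer: str
    tool_calls: list[ToolCall] = field(default_factory=list)

def run_agent(prompt: str) -> AgentResult:
    # Placeholder for the real Agent under test.
    return AgentResult("checking logs", [ToolCall("search_logs")])

# Expected behavior extracted from a known-good trace.
GOOD_TRACE = {"steps": [{"kind": "tool_call", "tool": "search_logs"}]}

def test_timeout_question_uses_log_search():
    expected_tool = GOOD_TRACE["steps"][0]["tool"]
    result = run_agent("why are we seeing timeouts in checkout?")

    assert result.tool_calls, "agent made no tool calls"
    assert result.tool_calls[0].name == expected_tool
```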
Workshop also supports Replay: pulling trace data from the production environment back to the local machine and re-executing it against the real code as a regression test. This is especially useful when a bug occurs in production but cannot be reproduced locally; running tests against real traces saves the time of recreating the scenario by hand.
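A hedged sketch of the replay idea, assuming a JSON trace export that records the original input and the tool calls made in production; the format and the `run_agent` entry point are illustrative assumptions, not Raindrop's actual interface.

```python
# Replay sketch: re-execute a recorded production input with the current local code
# and check that the tool-call sequence has not regressed.
import json

def run_agent(prompt: str) -> list[str]:
    # Placeholder for the real Agent: returns the sequence of tools it called.
    return ["search_logs", "summarize"]

def replay(trace_path: str) -> bool:
    with open(trace_path) as f:
        trace = json.load(f)

    recorded = [s["tool"] for s in trace["steps"] if s["kind"] == "tool_call"]
    replayed = run_agent(trace["input"])   # re-run with the real, current code

    if replayed != recorded:
        print("regression:", recorded, "->", replayed)
        return False
    return True

if __name__ == "__main__":
    # For the sketch, write a sample trace; in practice it comes from production.
    sample = {"input": "why are we seeing timeouts?",
              "steps": [{"kind": "tool_call", "tool": "search_logs"},
                        {"kind": "tool_call", "tool": "summarize"}]}
    with open("sample_trace.json", "w") as f:
        json.dump(sample, f)
    print("replay passed" if replay("sample_trace.json") else "replay failed")
```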