Anthropic Open-Source AI Security Workflow: Seven-Stage Automated Vulnerability Detection, Verification, and Patch Generation

Anthropic has open-sourced a Claude-driven automated security pipeline that covers vulnerability discovery, multi-layer verification, and patch generation. The stages are run by cooperating AI agents, and any security team can set the system up on its own infrastructure.

Anthropic's Claude Opus has recently discovered hundreds of security vulnerabilities across numerous open-source repositories. These flaws "despite years of expert review, remained unnoticed," which highlights the structural limits of manual inspection.

Anthropic has now open-sourced the entire pipeline on GitHub, covering automatic vulnerability discovery, multi-layer verification, and final patch generation. Any security team can deploy it, customize it, and adapt it to its own codebases.

Seven stages, a self-verifying pipeline

The system is called the Defending Code Reference Harness, and it is built around a seven-stage automated pipeline:

Build, Recon, Find, Verify, Dedupe, Report, Patch. Each stage is handled by an independent AI agent, and only minimal information passes between stages, so that bias from one agent's reasoning does not contaminate the analysis that follows.
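The stage ordering and the minimal-handoff idea can be sketched in Python as follows. This is only an illustration, not the harness's actual code: the `Handoff` type, `run_pipeline`, and `make_stub` are hypothetical names, and the real AI agents are replaced by plain functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Stage names as listed in the article; everything else here is hypothetical.
STAGES = ["Build", "Recon", "Find", "Verify", "Dedupe", "Report", "Patch"]


@dataclass(frozen=True)
class Handoff:
    """The only data allowed to cross a stage boundary (assumed shape)."""
    stage: str
    artifact: bytes  # e.g. an image reference or PoC bytes, never reasoning text


StageFn = Callable[[Handoff], Handoff]


def run_pipeline(stage_fns: Dict[str, StageFn], initial: bytes) -> List[Handoff]:
    """Run the stages in order, forwarding only the previous stage's Handoff."""
    history: List[Handoff] = []
    current = Handoff(stage="input", artifact=initial)
    for name in STAGES:
        current = stage_fns[name](current)
        history.append(current)
    return history


def make_stub(name: str) -> StageFn:
    # Stand-in for an independent agent: it sees the artifact and nothing else.
    def stage(prev: Handoff) -> Handoff:
        return Handoff(stage=name, artifact=prev.artifact + b"|" + name.encode())
    return stage


if __name__ == "__main__":
    fns = {name: make_stub(name) for name in STAGES}
    for handoff in run_pipeline(fns, b"target"):
        print(handoff.stage, handoff.artifact)
```

Keeping the handoff type deliberately narrow makes it hard for a later stage to inherit an earlier stage's reasoning by accident.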

The Build stage compiles the target software into an image with AddressSanitizer (ASAN) enabled. ASAN is a memory-error detector; think of it as a "mine detector" for memory vulnerabilities. When the program performs an invalid memory access, such as a buffer overflow or a use-after-free, ASAN halts it immediately and prints a report. This image is shared across all subsequent stages, ensuring every AI agent sees exactly the same code environment.
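As a rough illustration of what an ASAN-enabled build involves, the sketch below assembles a compiler command line in Python. The flags are standard Clang/GCC options, but the article does not describe the harness's actual build scripts, so the function name, the file names, and the choice of `-O1` are assumptions.

```python
import shlex
from typing import List

# Standard Clang/GCC flags: -fsanitize=address enables ASAN; -g and
# -fno-omit-frame-pointer make its crash reports easier to read.
ASAN_FLAGS = ["-fsanitize=address", "-fno-omit-frame-pointer", "-g", "-O1"]


def asan_compile_command(sources: List[str], output: str,
                         compiler: str = "clang") -> List[str]:
    """Return an argv list that compiles `sources` with ASAN instrumentation."""
    return [compiler, *ASAN_FLAGS, *sources, "-o", output]


if __name__ == "__main__":
    # demo.c and demo_asan are placeholder names for illustration.
    cmd = asan_compile_command(["demo.c"], "demo_asan")
    print(shlex.join(cmd))
```

Building once and reusing the resulting image matches the article's point that every later stage should see the same instrumented binary.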

The Find stage is the core engine. N parallel AI agents operate in isolated containers, reading source code and generating malicious inputs. This "malicious input generation" is essentially fuzzing: feeding strange, malformed, out-of-bounds data to see if the program crashes.

Agents submit a finding only after they have reproduced the crash consistently, three times, which filters out false positives. False positives, where normal behavior is mistakenly classified as a vulnerability, are a common criticism of security tools.
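The three-reproduction rule might look something like the sketch below. It assumes that any non-zero exit status counts as a crash and that the PoC is fed on standard input; the real harness likely inspects the sanitizer report instead. All function names are hypothetical.

```python
import subprocess
from typing import Callable, List

REQUIRED_REPRODUCTIONS = 3  # threshold stated in the article


def crashes(argv: List[str], poc: bytes, timeout: float = 10.0) -> bool:
    """Feed `poc` on stdin and report whether the target exited abnormally.

    Assumption: a non-zero or signal exit counts as a crash.
    """
    try:
        result = subprocess.run(argv, input=poc, capture_output=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # hangs are not treated as crashes in this sketch
    return result.returncode != 0


def reproduces_consistently(run_once: Callable[[], bool],
                            attempts: int = REQUIRED_REPRODUCTIONS) -> bool:
    """True only if every one of `attempts` runs crashes."""
    return all(run_once() for _ in range(attempts))
```

A finder would call `reproduces_consistently(lambda: crashes(argv, poc))` and discard the candidate if it returns False, so flaky or one-off crashes never reach the Verify stage.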

Anthropic emphasizes that the system employs multiple verification layers to assign confidence scores and severity levels to each reported vulnerability.

Next is Verify. A new agent runs the proof-of-concept (PoC)—the minimal executable demonstrating the vulnerability—in an isolated container. Only the raw bytes of the PoC are exchanged between containers. This ensures the verifying agent cannot see the previous reasoning process, maintaining independence.
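One way to picture the bytes-only handoff is the sketch below, in which the finder's notes are dropped before anything crosses to the verifier. The `Finding` type and both function names are hypothetical; the article states only that raw PoC bytes are exchanged.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    poc: bytes        # raw input that triggers the crash
    reasoning: str    # the finder's notes; must not reach the verifier


def export_for_verifier(finding: Finding) -> bytes:
    """Strip everything except the PoC bytes before crossing containers."""
    return bytes(finding.poc)


def verify(poc: bytes, run_target: Callable[[bytes], bool]) -> bool:
    """Independent check: the verifier gets bytes and a way to run the target."""
    return run_target(poc)
```

Because `verify` accepts nothing but bytes, the verifying agent has to reach its own conclusion from the target's behavior rather than from the finder's explanation.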

The Report stage generates a comprehensive exploitability analysis for each vulnerability. An independent scoring agent reviews the report to verify that its claims match the cited source code line numbers and the actual execution results. Manual confirmation is required before the Patch stage generates candidate patches.

The entire pipeline runs inside a gVisor sandbox. gVisor is a container sandbox that intercepts system calls in its own user-space kernel, isolating workloads from the host OS kernel. Whatever code the AI agents execute inside the containers, they cannot access the host file system, and network access is limited to the Claude API endpoint, which prevents data leaks.
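The network restriction amounts to an egress allowlist with a single entry. The sketch below expresses that policy as a Python predicate. It assumes `api.anthropic.com` is the Claude API host and that only HTTPS is allowed. In practice the sandbox or network layer would enforce such a rule, not application code, and the article does not say how the harness configures it.

```python
from urllib.parse import urlsplit

# Assumption: api.anthropic.com is the only host agents may reach.
ALLOWED_HOSTS = frozenset({"api.anthropic.com"})


def egress_allowed(url: str) -> bool:
    """Return True only for HTTPS requests to an allowlisted host."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS
```

Matching on the exact hostname, rather than on a prefix, keeps lookalike domains such as `api.anthropic.com.evil.example` out.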

Two paths, one choice

The system offers two usage paths, with significant complexity differences. Anthropic recommends starting with the simpler one.

First: Interactive Skills. Just four commands:

git clone https://github.com/anthropics/defending-code-reference-harness
cd defending-code-reference-harness
claude
/quickstart

Running /quickstart guides you through a complete interactive process on a demo target: threat modeling → static vulnerability scanning → manual classification and deduplication → patch generation. No container setup needed. Ideal for understanding the workflow before automating.

Second: Autonomous Pipeline. This path requires installing the gVisor sandbox and setting ANTHROPIC_API_KEY. You can then run the full seven-stage process on real targets, producing confidence-scored vulnerability reports and candidate patches. The GitHub repo includes a vulnerable sample library called drlibs, and practicing on it is recommended before switching to your own targets.
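Before attempting the autonomous path, a small preflight check along the following lines can confirm that both prerequisites are in place. `runsc` is the name of gVisor's runtime binary; the `preflight` function itself is a hypothetical helper, and the harness may perform its own checks.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def preflight(env: Mapping[str, str] = os.environ,
              which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return a list of human-readable problems; empty means ready to run."""
    problems: List[str] = []
    if which("runsc") is None:
        problems.append("gVisor runtime binary 'runsc' not found on PATH")
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("missing:", problem)
```

Passing `env` and `which` as parameters keeps the check testable without touching the real environment.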

Anthropic suggests a schedule: Day 1, run the full interactive process; Day 2, switch to the automated pipeline on C/C++ targets; Days 3-5, use /customize skills to adapt to other languages or vulnerability types.

A key takeaway from the documentation: "Successful teams resist the urge to design a perfect pipeline before starting. Run it first, then iterate."

The same tool is exposing the wall

The asymmetry in cybersecurity offense and defense has long been structural. Attackers only need to find one entry point; defenders must block every gap.

Targets such as GhostScript, OpenSC, and CGIF are widely deployed open-source projects, yet they contained vulnerabilities that lurked for decades, unnoticed by manual review, until Claude Opus autonomously read commit histories, inferred incomplete patches, traced logic across files, and constructed executable PoCs. This process relies not on rule matching but on reasoning.

Anthropic offers two routes: the open-source Defending Code Reference Harness for teams wanting full control—self-hosted and customizable; and Claude Security, a fully managed commercial version requiring no gVisor setup or infrastructure management.

The open-source version provides transparency and control, while the hosted version offers frictionless onboarding. Behind both is Anthropic’s strategic positioning of defensive security tools as foundational infrastructure.

Vulnerability discovery was once limited to top-tier red teams with substantial resources. Now that the pipeline is open, the same tool is tearing down the asymmetric wall between attackers and defenders from both sides at once.
