Anthropic has called for protecting AI agents based on the Zero Trust principle - ForkLog: cryptocurrencies, AI, singularity, the future

AI-agents ИИ агенты 3# Anthropic Calls for Protecting AI Agents Based on Zero Trust Principles

The Anthropic team published a guide titled "Zero Trust for AI Agents" on their blog about securely deploying autonomous AI agents in corporate environments. The document outlines key risks of agent systems and approaches to business cybersecurity.

AI Accelerated the Attack Cycle

According to Anthropic, advanced models have shortened the gap between vulnerability discovery and exploitation from months to hours. The company suggests considering not only AI-accelerated attacks on infrastructure but also the risks posed by the agents themselves, which can interpret goals, select tools, and perform multi-step actions without constant human oversight.

The guide is based on Zero Trust principles: do not trust by default, verify every action, and assume possible compromise. Anthropic references the NIST SP 800-207 recommendations, published in 2020, and a series of Zero Trust Implementation Guidelines that the NSA began releasing in 2026. The guide is positioned as a practical framework for security teams, architects, and engineers, rather than a universal compliance scheme.

Among the key threats listed are direct and indirect prompt injections, tool infections, identity and privilege abuse, memory and context poisoning, and supply chain attacks.

Direct prompt poisoning is described as inserting malicious instructions via user input, while indirect poisoning occurs through web pages, emails, documents, and other external sources that the agent processes during operation.

The document discusses substituting legitimate tools with malicious ones and dangerous call chains, where individually safe means combined can produce risky outcomes. Anthropic uses concepts like "blast radius" and "least agency": not only minimal access rights but also strict restrictions on agent actions, call frequency, and accessible areas.

Zero Trust for Agent Systems

To defend against these threats, the company proposes a three-level maturity model and a set of basic technical measures. At the initial level, the guide recommends issuing each agent instance a unique cryptographic identity, using short-lived tokens, applying a "deny by default" policy, and role-based access control. For agents working with untrusted inputs like web content and documents, sandboxing is effectively mandatory.

At higher levels, Anthropic suggests implementing:

  • mTLS standard with mutual client-server authentication via digital certificates;
  • hardware-bound identity through HSM or TPM, as well as remote attestation.

Static API keys and shared service account passwords are deemed unsuitable even for the basic level.

A large section is dedicated to observability. Anthropic recommends detailed logging of all agent actions, including tool calls, data access, and external communications, then transmitting events to a SIEM for real-time correlation. Key metrics include dwell time and coverage. For critical systems, the target detection time for anomalies is within one hour. The guide also suggests building a "traceability matrix" to link each agent action to the original request and reconstruct the full decision chain.

The Future of Security Operations Centers — Agents Under Human Control

In terms of incident response, Anthropic formulates the principle: automate the bureaucracy around incidents but not the key decisions. Agents and models should handle collecting and initially filtering artifacts, maintaining parallel investigation tracks, and drafting post-mortem reports. Decisions on containment, disclosure, and client communication should remain with humans. The same approach applies to "security operations" — transitioning from traditional SOAR to agent-based systems.

The document provides quantitative benchmarks. Anthropic cites Microsoft’s Spotlighting research, where the success rate of indirect prompt poisoning attacks decreased from over 50% to less than 2%. The company also reports its own results using "constitutional classifiers," which reportedly block over 95% of jailbreak attempts with minimal false positives.

In the supply chain section, Anthropic recommends using AI-BOM, OpenSSF Scorecard, dependency audits, and access analysis. As evidence, the company cites its own research indicating that 250 malicious documents are sufficient to embed a backdoor in models ranging from 600 million to 13 billion parameters.

Ultimately, Anthropic concludes that point filters and perimeter defenses are insufficient for AI agents. Instead, protection should be built around identity, minimal permissions, pre-defined damage limits, and continuous action verification. According to Anthropic, organizations with the strongest basic security architecture will be best positioned, not necessarily those with the most advanced AI.

Recall that in June, the Anthropic team warned about the risks of recursive AI self-improvement.

View Original
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
  • Reward
  • Comment
  • Repost
  • Share
Comment
Add a comment
Add a comment
No comments