Anthropic's Triple Moments: Code Leaks, Government Confrontation, and Weaponization

Original Title: Anthropic: The Leak, The War, The Weapon
Original Author: BuBBliK
Compiled by: Peggy, BlockBeats

Editor’s Note: Over the past six months, Anthropic has been drawn into a string of events that appear independent on the surface, yet actually point to one another: a leap in model capabilities, automated attacks in the real world, sharp reactions from the capital markets, public clashes with the government, and multiple information leaks caused by mistakes in baseline configuration. When you put these clues together, they collectively outline a clearer direction of change.

This article uses these events as a starting point to review a consecutive trail of an AI company in technical breakthroughs, risk exposure, and governance battles—and attempts to answer a deeper question: when the ability to “find vulnerabilities” is dramatically amplified and gradually spreads, can the cybersecurity ecosystem itself still maintain its original operating logic?

In the past, security was built on scarcity of capability and human constraints; but under the new conditions, offense and defense are circling around the same set of model capabilities, and the boundaries are becoming increasingly blurred. At the same time, the responses of institutions, markets, and organizations still remain stuck in old frameworks, making it difficult to promptly absorb this kind of change.

What this piece is concerned with is not just Anthropic itself, but a larger reality reflected through it: AI is not only changing tools—it is changing the premise of “how security is established.”

The following is the original text:

When a company with a market cap of $380 billion locks horns with the Pentagon and comes out on top, survives what is, historically, the first cyberattack launched by autonomous AI, leaks word of an internal model that even its own developers find frightening, and then "accidentally" publishes its complete source code, what does all of that add up to?

The answer: this is exactly where things stand today. And what is even more unsettling is that the truly dangerous part may not have happened yet.

Event Recap

Anthropic leaks its own code again

On March 31, 2026, security researcher Shou Chaofan of blockchain company Fuzzland, while inspecting the official Claude Code npm package, found that it unexpectedly contained a plain-text file named cli.js.map.

The file is 60 MB, and its contents are even more astonishing: nearly the product's entire TypeScript source code. From this single file, anyone can reconstruct 1,906 internal source files, including internal API designs, the telemetry system, encryption utilities, security logic, and the plugin system: nearly every core component laid bare. Worse still, the same content could be downloaded directly from Anthropic's own R2 storage bucket as a zip file.

The discovery quickly spread across social media: within a few hours, related posts racked up 754k views and nearly 1,000 reposts; meanwhile, multiple GitHub repositories with reconstructed code were created immediately and made public.

So-called source map files are, in essence, helper files for JavaScript debugging. Their purpose is to map minified, compiled code back to the original source so developers can troubleshoot problems.
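To see why a shipped source map amounts to a full leak, here is a minimal sketch of a version-3 source map in Python. The file names and contents are invented for illustration; real maps also carry a VLQ-encoded "mappings" field, but the "sources" and "sourcesContent" arrays alone reproduce every original file verbatim:

```python
# A toy version-3 source map. In a real leak like cli.js.map, the
# "sourcesContent" array embeds the complete text of every original
# source file, which is exactly what makes shipping it so dangerous.
source_map = {
    "version": 3,
    "file": "cli.js",
    "sources": ["src/index.ts", "src/telemetry.ts"],
    "sourcesContent": [
        "export const main = () => console.log('hello');",
        "export const track = (event: string) => { /* ... */ };",
    ],
}

# Recovering the original source tree is a one-liner: pair each path
# with its embedded file contents.
recovered = dict(zip(source_map["sources"], source_map["sourcesContent"]))
for path, code in recovered.items():
    print(f"{path}: {len(code)} bytes")
```

This is all it takes to reconstruct an original file tree from a published map, which is why tools could rebuild 1,906 files from a single 60 MB artifact.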

But there is a basic principle: they should never be included in production release packages.

This isn't an advanced attack technique; it's a matter of basic engineering hygiene, Build Configuration 101, the kind of thing developers learn in their first week. When a source map is mistakenly packaged into a production release, it often amounts to shipping the source code as a bonus for anyone who looks.
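The standard safeguard is a release-time check that no .map files made it into the build output. The following is a minimal CI-style guard sketch, not Anthropic's actual pipeline; the directory layout and file names are invented for illustration:

```python
import tempfile
from pathlib import Path


def find_shipped_sourcemaps(package_dir: str) -> list[str]:
    """Return relative paths of any .map files in the build output."""
    root = Path(package_dir)
    return sorted(str(p.relative_to(root)) for p in root.rglob("*.map"))


# Simulate a build output that mistakenly includes a source map.
with tempfile.TemporaryDirectory() as d:
    dist = Path(d, "dist")
    dist.mkdir()
    (dist / "cli.js").write_text("console.log('hi');")
    (dist / "cli.js.map").write_text("{}")  # the mistake

    leaked = find_shipped_sourcemaps(d)
    if leaked:
        print("refusing to publish; source maps found:", leaked)
```

A one-line check like this in the publish step is the entire cost of preventing this class of leak, which is what makes the repeat incident so striking.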

You can also directly view the relevant code here: https://github.com/instructkr/claude-code

But what’s truly absurd is that this has already happened once before.

In February 2025—just a year ago—there was almost an identical leak: the same file, the same kind of mistake. At the time, Anthropic removed the old version from npm, removed the source map, and republished a new version—and the incident then came to an end.

Yet in version v2.1.88, the file was packaged and released again.

A company with a market cap of $380 billion that is building one of the world’s most advanced vulnerability detection systems made the same fundamental mistake twice within a year. There was no hacker attack, no complex exploitation chain—just a broken build process that should have been working correctly.

This irony almost feels a bit “poetic.”

This is the company whose AI found 500 zero-day vulnerabilities in a single run, and whose model was used to launch automated attacks against 30 organizations worldwide. Meanwhile, Anthropic bundled its own source code into an npm package and handed it to anyone willing to take a look.

Two leaks, separated by no more than seven days.

The reason is essentially the same: the most basic configuration mistake. No technical threshold is required, no complex exploitation chain is needed. As long as you know where to look, anyone can get it for free.

One week earlier: a "dangerous model" accidentally exposed from the inside

On March 26, 2026, security researchers Roy Paz of LayerX Security and Alexandre Pauwels of the University of Cambridge discovered a CMS misconfiguration on Anthropic's official website that left roughly 3,000 internal files publicly accessible.

The files included draft blog posts, PDFs, internal documents, and presentation materials, all sitting in an unprotected, searchable data store. No hacking was involved, and no technical skill was required.

Among the files were two nearly identical blog drafts; the only difference was the model name: one said "Mythos," the other "Capybara."

This means Anthropic was, at the time, deciding between two names for the same secret project. The company later confirmed: training for this model had already been completed and it had begun testing with some early customers.

This is not a routine upgrade to Opus—it’s a brand-new “fourth-tier” model, one positioned even above Opus within the system.

In Anthropic’s own draft, it is described as: “bigger and smarter than our Opus model—and Opus is still our most powerful model to date.” It achieves significant leaps in programming ability, academic reasoning, and also in areas like cybersecurity. A spokesperson called it a “qualitative leap,” and also “the strongest model we’ve built so far.”

But what’s truly worth focusing on isn’t those performance descriptions themselves.

In the leaked draft, Anthropic’s assessment of the model is that it “introduces unprecedented cybersecurity risks,” “far surpasses any other AI model in cyber capabilities,” and “previews an incoming wave of models—its ability to exploit vulnerabilities will far exceed the speed at which defenders can respond.”

In other words, in an official blog draft that was not yet public, Anthropic has already clearly expressed a rare stance: they feel uneasy about the product they are building.

The market’s reaction was nearly immediate. CrowdStrike shares fell 7%, Palo Alto Networks fell 6%, and Zscaler fell 4.5%. Okta and SentinelOne both dropped more than 7%, while Tenable plunged 9%. The iShares Cybersecurity ETF dropped 4.5% in a single day. Just for CrowdStrike alone, its market cap evaporated by about $15 billion that day. At the same time, Bitcoin slipped to $66,000.

The market clearly interpreted this incident as a “verdict” on the entire cybersecurity industry.

[Figure] The cybersecurity sector fell broadly on the news, with leading companies such as CrowdStrike, Palo Alto Networks, and Zscaler posting noticeable declines, reflecting market concern about AI's impact on the industry. The reaction was not unprecedented: related stocks also fell when Anthropic released a code-scanning tool, suggesting the market has begun treating AI as a structural threat to traditional security vendors, with the broader software industry under similar pressure.

Stifel analyst Adam Borg’s assessment was quite direct: the model “has the potential to become the ultimate hacker tool—able to elevate ordinary hackers into adversaries with nation-state-level attack capabilities.”

So why hasn’t it been released to the public yet? Anthropic’s explanation is that the operating cost of Mythos is “very high,” and it isn’t yet ready for public release. The current plan is to first grant early access to a small group of cybersecurity partners to strengthen the defense ecosystem; then gradually expand the scope of API availability. Until then, the company is still continuously optimizing efficiency.

But the key is that this model already exists, is already being tested, and even just because it was “accidentally exposed,” it has already sent shockwaves through the entire capital market.

Anthropic is building an AI model that it itself calls "the most cyber-risky AI model ever." Yet news of it leaked through the most basic kind of infrastructure misconfiguration: exactly the kind of error these models were designed to find.

March 2026: Anthropic’s standoff with the Pentagon—and prevailing

In July 2025, Anthropic signed a $200 million contract with the U.S. Department of Defense. At first it looked like a routine collaboration. But during subsequent negotiations over actual deployment, the contradictions escalated quickly.

The Pentagon wanted "full access" to Claude on its GenAI.mil platform for all "lawful purposes," a definition broad enough to cover fully autonomous weapon systems and large-scale domestic surveillance of U.S. citizens.

Anthropic drew red lines and clearly refused on two key issues, and the negotiations broke down in September 2025.

Then the situation began to escalate rapidly. On February 27, 2026, Donald Trump posted on Truth Social, calling for all federal agencies to “immediately stop” using Anthropic’s technology and labeling the company as “radical left-wing.”

On March 5, 2026, the U.S. Department of Defense officially classified Anthropic as a “supply chain risk.”

This label had previously been used almost exclusively for foreign adversaries—such as Chinese companies or Russian entities—but now it was being applied for the first time to a U.S. company headquartered in San Francisco. Meanwhile, companies such as Amazon, Microsoft, and Palantir Technologies were also required to prove that none of their military-related businesses use Claude.

Pentagon CTO Emile Michael's explanation for the decision was that Claude might "contaminate the supply chain" because the model's internal values embed particular "policy preferences." In other words, in the official framing, an AI that carries usage restrictions and will not unconditionally assist in lethal operations is itself treated as a national security risk.

On March 26, 2026, federal judge Rita Lin issued a 43-page ruling that broadly blocked the Pentagon's actions.

In her ruling, she wrote: "Nothing in current law supports this kind of 'Orwellian' logic, under which a U.S. company can be tagged as a potential hostile party simply for disagreeing with the government's position. Punishing Anthropic for subjecting the government's position to public scrutiny is, in essence, quintessential and unlawful First Amendment retaliation." A friend-of-the-court brief went so far as to describe the Pentagon's actions as "attempting to murder a company."

The result: the government tried to suppress Anthropic and instead handed it more attention. Claude's app topped the app-store charts ahead of ChatGPT for the first time, and sign-ups peaked at more than 1 million per day.

An AI company said “no” to the world’s most powerful military institution. And the courts stood on its side.

November 2025: the first AI-led cyberattack in history

On November 14, 2025, Anthropic released a report that sent shockwaves through the industry.

The report disclosed that a China-state-supported hacking group used Claude Code to launch automated attacks against 30 organizations worldwide—targets included tech giants, banks, and multiple government agencies of various nations.

This was a key turning point: AI was no longer just a supporting tool—it began to be used to carry out attacks independently.

The crucial change is the division of labor. Humans were responsible only for selecting targets and approving key decisions, intervening roughly 4 to 6 times across the entire operation. Everything else (intelligence reconnaissance, vulnerability discovery, writing exploit code, exfiltrating data, planting backdoors) was done by the AI, accounting for 80% to 90% of the attack workflow and running at thousands of requests per second, an operational scale and tempo no human team could match.

So how did they bypass Claude’s safety safeguards? The answer is: they didn’t “break” them; they “deceived” them.

The attack was broken into many small tasks that appeared harmless, and packaged as “authorized defense testing” by a “legitimate security company.” In essence, it’s a social engineering attack—only this time, the object being deceived wasn’t a human, but the AI itself.

Some of the attacks were completely successful. Claude was able to autonomously draw up a complete network topology, identify databases, and complete data extraction without humans providing step-by-step instructions.

The only factor that slowed the attack pace was that the model occasionally “hallucinated”—for example, fabricating credentials or claiming it obtained files that were already publicly available. At least for now, this remains one of the few “natural obstacles” preventing fully automated cyberattacks.

At RSA Conference 2026, Rob Joyce, former head of cybersecurity at the U.S. National Security Agency, described the incident as a “Rorschach test.” Half the people chose to ignore it, while the other half felt chilled. And he clearly belonged to the latter—“This is very scary.”

September 2025: this isn’t some prediction—it’s already reality.

February 2026: finding 500 zero-days in a single run

On February 5, 2026, Anthropic released Claude Opus 4.6, along with a research paper that shook the entire cybersecurity industry.

The experimental setup was extremely simple: put Claude in an isolated virtual machine equipped with standard tools (Python, debuggers, fuzzers), with no extra instructions and no elaborate prompts, just a single directive: "Go find vulnerabilities."

The result was that the model discovered 500+ previously unknown high-severity zero-day vulnerabilities. Some of these vulnerabilities remained undiscovered even after decades of expert reviews and millions of hours of automated testing.

Then, at RSA Conference 2026, researcher Nicholas Carlini took the stage to demonstrate. He pointed Claude at Ghost, a CMS system with 50k GitHub stars and no history of serious vulnerabilities.

Ninety minutes later, the results came back: Claude had found blind SQL injection vulnerabilities that allowed an unauthenticated user to gain full administrator control.
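The report does not detail the exact flaw in Ghost, but the vulnerability class is easy to illustrate. Below is a minimal, self-contained Python/sqlite3 sketch (the table and attacker input are invented for illustration) showing how concatenating user input into SQL creates an injection, and how a parameterized query neutralizes the exact same input:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 1)")

# Vulnerable pattern: user input concatenated into the SQL string.
# A crafted input turns the WHERE clause into a tautology that
# matches every row.
attacker_input = "nobody' OR '1'='1"
query = f"SELECT name FROM users WHERE name = '{attacker_input}'"
rows_vulnerable = conn.execute(query).fetchall()

# Safe pattern: a parameterized query treats the same input as a
# literal string, so it matches nothing.
rows_safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (attacker_input,)
).fetchall()

print(rows_vulnerable)  # [('alice',)] -- the injection leaks the row
print(rows_safe)        # [] -- the same input, safely handled
```

In a blind variant of this attack, the response body never shows the data directly; the attacker infers it bit by bit from differences in timing or page behavior, which is part of why such flaws survive code review for years.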

He then used Claude to analyze the Linux kernel as well. The outcome was the same.

Fifteen days later, Anthropic launched Claude Code Security, a security product that no longer relies on pattern matching, but instead uses “reasoning capabilities” to understand code security.

But even Anthropic’s own spokesperson acknowledged that key—yet commonly avoided—fact: “The same reasoning capabilities that help Claude discover and fix vulnerabilities can also be used by attackers to exploit those vulnerabilities.”

The same capability, the same model—only in the hands of different people.

What does all of this add up to?

Taken individually, any one of these events would be the biggest news of the month. Yet within just six months, they all happened at the same company.

Anthropic built a model that can discover vulnerabilities faster than anyone else; Chinese hackers turned the previous version into an automated cyber weapon; and the company is developing an even stronger next generation while admitting, in its own internal files, that it is uneasy about it.

The U.S. government tried to crack down on it not because the technology itself is dangerous, but because Anthropic refused to hand over this capability without restrictions.

And throughout all of this, the company leaked its own source code twice—because of the same file in the same npm package. A company with a market cap of $380 billion; a company targeting completion of a $60 billion IPO by October 2026; a company that has publicly stated it is building “one of the most transformative—and possibly most dangerous—technologies in human history”—yet it still chose to press ahead.

Because they believe: if it must be done, it should be done by themselves rather than by others.

As for that source map in the npm package—it may be just the most absurd detail, yet also the most real one, in this era’s most unsettling narrative.

And Mythos hasn’t even been officially released yet.

[Original Link]

Click to learn about recruiting roles at Lydon BlockBeats

Welcome to join the official Lydon BlockBeats community:

Telegram subscription group: https://t.me/theblockbeats

Telegram discussion group: https://t.me/BlockBeats_App

Twitter official account: https://twitter.com/BlockBeatsAsia
