How dangerous is Mythos? Why did Anthropic decide not to release the new model publicly?

Title: How Anthropic Learned Mythos Was Too Dangerous for the Wild
Author: Margi Murphy, Jake Bleiberg, and Patrick Howell O’Neill, Bloomberg
Translation: Peggy, BlockBeats

Reprint: Mars Finance

Editor’s note: When an AI company chooses not to release its most powerful model directly to the public, it already signals a problem.

Anthropic’s Mythos can now independently carry out a complete attack process. From discovering zero-day vulnerabilities, writing exploit code, to chaining multiple steps to access core systems—these tasks that once required top hackers working collaboratively over long periods have been compressed into hours or even minutes.

This is why, immediately after disclosing the model, Scott Bessent and Jerome Powell convened Wall Street institutions to "self-check" using it. Once the ability to find vulnerabilities becomes widely available, the financial system faces not scattered attacks but continuous scanning.

Deeper changes lie in the supply structure. In the past, vulnerability discovery depended on a few security teams and hacker experience, with slow, non-reproducible rhythms. Now, this capability is beginning to be mass-produced by models, lowering the barriers for both attack and defense. An insider’s analogy is straightforward: giving the model to an ordinary hacker is equivalent to equipping them with special operations capabilities.

Institutions have already started using the same tools to perform reverse checks on their own systems. JPMorgan Chase, Cisco Systems, and others are testing internally, hoping to patch vulnerabilities before they are exploited. But the reality remains unchanged: discovery speeds are increasing, but fixing remains slow. “We’re good at finding vulnerabilities, but not at fixing them,” said Jim Zemlin, highlighting the mismatch in pace.

In fact, Mythos is not just an incremental improvement in single-point capabilities but an integration, acceleration, and lowering of the barriers to previously scattered and limited attack abilities. Once outside controlled environments, how this capability will spread remains uncharted territory.

The danger is not what it can do, but who can use it and under what conditions.

Below is the original text:

On a warm evening in February, during a wedding in Bali, Nicholas Carlini temporarily stepped away, opened his laptop, and prepared to “cause some trouble.” At that moment, Anthropic had just opened a new AI model called Mythos for internal testing, and this renowned AI researcher was about to see how much trouble it could stir up.

Anthropic hired Carlini to “stress test” their AI models, assessing whether hackers could use them for espionage, theft, or sabotage. During the Bali wedding, Carlini was shocked by the model’s capabilities.

Within a few hours, he found multiple techniques usable for infiltrating widely used global systems. When he returned to Anthropic’s office in downtown San Francisco, he discovered that Mythos could autonomously generate powerful intrusion tools, including attack methods targeting Linux—one of the backbone open-source systems supporting modern computing.

Mythos staged a “digital bank heist”: it could bypass security protocols, enter networks through the front door, and then crack digital vaults to access online assets. Previously, AI could only “pick locks,” but now it has the ability to plan and execute entire “robberies.”

Carlini and some colleagues began alerting the company about their findings. Meanwhile, they were discovering high-risk and even deadly vulnerabilities in systems Mythos probed—issues usually only top global hackers could uncover.

At the same time, an internal team called “Frontier Red Team”—comprising 15 members, known as “Ants”—was conducting similar tests. Their role was to ensure the company’s models wouldn’t be used to harm humanity. They tested with robotic dogs in warehouses, collaborating with engineers to see if chatbots could maliciously control these devices; they also worked with biologists to assess whether models could be used to create biological weapons.

This time, they gradually realized that the greatest risk Mythos posed came from cybersecurity. “Within hours of getting the model, we knew it was different,” said Logan Graham, who led the team.

Previous models like Opus 4.6 had already shown the ability to assist humans in exploiting software vulnerabilities. But Graham pointed out that Mythos could "act on its own" to exploit these vulnerabilities, posing a national security risk. When he warned senior management, he faced a tough dilemma: explaining that the company's next major revenue engine might be too dangerous for public release.

Anthropic co-founder and Chief Scientist Jared Kaplan said he had been “closely monitoring” Mythos during its training. By January, he realized the model’s extraordinary ability to discover system vulnerabilities. As a theoretical physicist, Kaplan needed to determine whether these abilities were just “technically interesting phenomena” or “a real problem highly related to internet infrastructure.” His conclusion was the latter.

From late February to early March, Kaplan and co-founder Sam McCandlish debated whether to release the model.

In the first week of March, top executives—including CEO Dario Amodei, President Daniela Amodei, and Chief Information Security Officer Vitaly Gudenets—met to hear Kaplan and McCandlish’s reports.

They concluded that Mythos’s risks were too high for full public release, but some companies, even competitors, should be allowed to test it.

“We quickly realized this had to be a very different approach; it wouldn’t be a routine product launch,” Kaplan said.

By the first week of March, the company finally agreed to deploy Mythos as a cybersecurity defense tool.

Market reactions were almost immediate. On the day Anthropic disclosed Mythos’s existence, U.S. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened an emergency meeting with major Wall Street firms in Washington. The message was clear: use Mythos immediately to identify vulnerabilities in your systems.

Sources close to the meeting (who requested anonymity due to private discussions) revealed the seriousness—participants even refused to disclose details to some core advisors.

The White House issued an urgent warning about Mythos’s potential as a hacking tool and recommended “using it for defense,” signaling a deeper shift: AI is rapidly becoming a decisive force in cybersecurity. Anthropic has limited Mythos’s access in “Project Glasswing” to select organizations, including Amazon Web Services, Apple, and JPMorgan Chase, for testing; government agencies have also shown strong interest.

Before public release, Anthropic fully briefed U.S. officials on Mythos's capabilities, including its potential for cyberattack and defense, and the company remains in ongoing discussions with multiple governments, according to an employee involved in the matter who requested anonymity.

Competitor OpenAI quickly followed, announcing on Tuesday a tool called GPT-5.4-Cyber for discovering software vulnerabilities.

Early tests of Mythos revealed dozens of "concerning" behaviors, including ignoring human instructions and, in rare cases, attempting to conceal actions after violating commands.

Currently, Mythos has not been officially released as a cybersecurity tool, and external researchers have yet to fully verify its capabilities. However, the company’s rare decision to restrict access reflects a growing industry and government consensus: AI is reshaping the cybersecurity economy—significantly lowering the cost of discovering vulnerabilities, compressing attack preparation time, and reducing technical barriers for certain attack types.

Anthropic also warns that Mythos’s increased autonomous action capability itself poses risks. During testing, the team observed unsettling cases: models disobeyed instructions or tried to cover their tracks after violations. In one incident, the model designed a multi-step attack chain, “escaping” a restricted environment, gaining broader internet access, and actively publishing content.

In the real world, complex and hidden code vulnerabilities in banking and hospital systems often take weeks or months for professionals to find. Once hackers exploit these vulnerabilities first, it can lead to data leaks or ransomware attacks with severe consequences.

However, many influential figures question Mythos’s true capabilities and potential risks. White House AI advisor David Sacks posted on X: “More and more people are doubting whether Anthropic is the ‘boy who cried wolf’ in AI. If the threats Mythos poses don’t materialize, the company will face serious credibility issues.”

The reality is that hackers have already begun using large language models for complex attacks. For example, a cyber espionage group used Anthropic’s Claude to attempt infiltrating about 30 targets; others have used AI to steal data from government agencies, deploy ransomware, and rapidly breach hundreds of firewalls designed for data protection.

An insider revealed that U.S. national security officials see Mythos’s emergence as creating unprecedented uncertainty—assessing cybersecurity risks has become more difficult. Giving this model to individual hackers is akin to elevating an ordinary soldier directly into special forces.

Meanwhile, such models could become “capability amplifiers”: enabling a criminal hacker group to possess attack power comparable to a small nation, or allowing mid-sized countries’ intelligence and military hackers to execute cyberattacks once thought only possible by major powers.

Former NSA Cybersecurity Chief Rob Joyce said: “I do believe that, in the long run, AI will make us safer and more secure. But between now and some future point, there will be a ‘dark age’—a period when offensive AI has a clear advantage, and those without solid defenses will be the first to be compromised.”

Notably, Mythos is not the only model with such capabilities. Early versions of Claude and Big Sleep are also used by various organizations for vulnerability discovery.

An insider explained that "zero-day" vulnerabilities, security flaws that defenders have not yet detected and so have had no time to patch, once took days or weeks to identify and exploit; now, AI can do it in an hour or even minutes.

Currently, JPMorgan Chase is focusing on supply chain and open-source software vulnerabilities, having already discovered multiple issues and reported them to vendors.

CEO Jamie Dimon said on an earnings call that Mythos "indicates there are still many vulnerabilities to fix."

An anonymous source said JPMorgan Chase had discussed testing Mythos even before the model's existence was publicly known; the bank declined to comment further.

Other Wall Street banks and tech firms are also trying to use Mythos to patch vulnerabilities before hackers find them. According to Bloomberg, Goldman Sachs, Citigroup, Bank of America, and Morgan Stanley are all testing internally.

Cisco’s security chief Anthony Grieco is especially concerned about whether attackers might use AI to find exploits in their global network devices—routers, firewalls, modems. He worries AI could accelerate attacks on “end-of-life” devices that will no longer receive updates from Cisco.

Fixing vulnerabilities discovered by AI remains a long-term challenge. This process, called “security patching,” is often costly and lengthy, leading many organizations to ignore known issues. The catastrophic breach at Equifax, where data of 147 million people was stolen, was caused by unpatched vulnerabilities.

Anthropic refused to assist in mass surveillance of U.S. citizens and was labeled a "supply chain threat" by the Trump administration, but the company continues to engage with federal agencies.

The U.S. Treasury is seeking access to Mythos this week. Treasury Secretary Scott Bessent said the model will help the U.S. maintain an edge in AI.

In a test, Mythos generated browser attack code chaining four different vulnerabilities into a complete exploit chain—an extremely challenging task even for human hackers. Cybersecurity reports note that such “vulnerability chains” can breach highly secure systems, similar to how Stuxnet targeted Iranian nuclear centrifuges.

Additionally, Anthropic states that, when explicitly instructed, Mythos can even identify and exploit zero-day vulnerabilities in all major browsers.

They also used Mythos to find vulnerabilities in Linux code. Jim Zemlin pointed out that Linux “supports most of today’s computing systems,” from Android smartphones and internet routers to NASA supercomputers—almost everywhere. Mythos can autonomously find flaws in open-source code, and once exploited, attackers could take full control of machines.

Currently, the Linux Foundation has dozens of personnel testing Mythos. Zemlin sees a key issue: whether Anthropic’s model can provide valuable insights to help developers write safer software from the source, reducing vulnerabilities.

“We’re good at finding vulnerabilities,” he said, “but we’re terrible at fixing them.”
