The triple moments of Anthropic: code leaks, government confrontation, and weaponization

Author: Ben Thompson

Translated by: Deep Tide TechFlow

Deep Tide Briefing: Anthropic’s new model, Fable, was abruptly halted by the U.S. government shortly after its release. On the surface, the halt was attributed to a “security leak,” but in reality it reveals two conflicts: one between AI labs and the government, and one between AI labs and the software industry. With “safety” as its selling point, the company is turning the safety narrative into a commercial moat, while what it truly wants to seize is the user data held by Microsoft and others.

I understand where the cynics are coming from. They assume that Anthropic’s public statements, especially the language it uses when releasing models, are simply marketing tactics meant to spread panic. Two months ago, Anthropic announced Mythos Preview, claiming the model was too dangerous to release publicly, particularly because of its powerful cybersecurity capabilities. Now, two months later, the company has publicly released Fable, a version of Mythos with various safety guardrails added.

Based on my limited experience, Fable really is an outstanding model. Outside of programming performance, it has become very difficult to evaluate models objectively, but subjective impressions still count for something. And my experience interacting with Fable has been extremely good: it makes other models, including GPT 5.5 and Opus 4.8, seem small and stupid. I have only had this feeling twice before, once with GPT-4 and once with Grok 4; both represented a new generation in model scale and complexity. I think Fable comes from a new pretraining run, the first of this new generation.

So I can fully accept that Fable/Mythos is indeed stronger at identifying and exploiting security issues, and that Anthropic’s cautious rollout makes sense. But the problem with publicly releasing a model is that the guardrails can be jailbroken—and clearly, something like that happened shortly after release.

Anthropic again faces the U.S. government

What happened next is a bit unclear. Anthropic wrote in a blog post:

The U.S. government invoked national security authorities and issued an export control order, suspending all foreign citizens’ access to Fable 5 and Mythos 5, whether inside or outside the U.S., including Anthropic’s foreign employees. The actual effect of this order is that we must suddenly disable access to Fable 5 and Mythos 5 for all customers to ensure compliance. Access to all other Anthropic models is unaffected.

Today at 5:21 PM, we received the government’s directive. The letter did not provide any specific details about the national security concerns. We understand the government believes it has identified methods to bypass or “jailbreak” Fable 5. We reviewed demonstrations that used this specific technique to identify a small number of known minor vulnerabilities. These vulnerabilities all appear relatively simple, and we found that other publicly available models can detect them as well, without needing any jailbreak.

Anthropic then argues that non-universal jailbreaks are inevitable and limited in scope, and that there is no evidence of a universal jailbreak. The jailbreaks in question appear to have been reported by Amazon, which is worth noting because Amazon is both an investor in Anthropic and the primary provider of the company’s inference services. As I write this, Anthropic executives are in Washington, D.C., trying to resolve what they insist is a misunderstanding, while White House officials are implying that the company’s leadership is indifferent to legitimate national security concerns.

Given how many facts are disputed, there isn’t much I can add about the current conflict. But I’m not surprised it’s happening. I explained in my article “Anthropic and Alignment” that a conflict between the U.S. government and Anthropic is inevitable. In this regard, those who think Mythos isn’t strong enough to warrant the government taking drastic action are missing the point: if it isn’t strong enough now, the next one will be—or the one after that—especially now that models are becoming increasingly useful for creating successors.

However, that raises another question, one that seems to support the cynics’ view. If Mythos is so dangerous, why release Fable in the first place? And why fight the government when it is doing the very thing you claim to want? In fact, I think Anthropic’s behavior is entirely understandable. What is unique about the company is how it justifies these actions, and it is precisely those justifications that give the cynics ammunition, while also giving Anthropic its “magic.”

Economic inevitability

In the first few years of AI, most of the economic value flowed to compute, for an obvious reason: there was not enough supply to meet demand, so prices skyrocketed. The biggest beneficiaries were Nvidia, TSMC, and memory manufacturers (SK Hynix, Samsung, and Micron). Meanwhile, Anthropic and OpenAI together lost hundreds of billions of dollars building frontier models. Once these models are released, they are distilled and commoditized by open-source models, largely from China.

This reflects a bleak reality for labs: they can never recoup their costs, because their differentiation is short-lived, and free alternatives become “good enough”—which I think is reasonable. In a world where models are interchangeable, models are commodities, and most of the value flows elsewhere. It’s compute now, but as time goes on, when we have enough compute, the most valuable position in the value chain will be the place that has always been the most valuable: owning user touchpoints.

Therefore, frontier labs have an economic imperative to move closer to users. That has always been clear to me. If you own user touchpoints, you get meaningful lock-in. And the best way to own user touchpoints is to be the canvas for everything they need to do. This, in turn, means frontier labs are moving toward conflict with software companies: software companies own user touchpoints, while the long-term interest of frontier labs is not merely to be an input into software, but to directly replace software.

At the same time, software companies are trying to do the opposite. Satya Nadella laid out his vision for how companies should build on models in a post on X:

Every company must build what I call human capital and token capital. Human capital includes the knowledge, judgment, relationships, originality, and pattern recognition of its employees, while token capital is the AI capabilities the company builds and owns. Importantly, as token capital grows, human capital does not become less valuable. It only becomes more valuable! I believe human initiative will be the driver of token capital growth. Humans will set ambitious goals, connect dots across domains, build relationships, and identify the most important patterns. Without human guidance, your compute is idle.

This means the real opportunity isn’t choosing the best model—it’s building learning loops on top of models so that human and token capital grow in a compounding way. You can outsource a task, or even a job, but you can never outsource your learning. The future of companies is to enable this learning to compound between humans and AI. That requires a new architectural approach: every enterprise should be able to build agent systems that improve over time while still retaining control over its intellectual property. Companies should be able to swap out “general” models without losing the “company veteran” expertise embedded in their learning systems. This is the key “test” of control and sovereignty in the future era.

Nadella opens this vision with a warning:

What none of us wants to see is a world where every industry and every company hands its value to a small handful of models that consume everything. If all value is captured by just a few models, political-economic systems won’t tolerate it. Society will not permit an AI future that hollows out entire industries.

Think about what happened in the first phase of globalization: the entire industrial economy was outsourced and hollowed out. On the surface, GDP figures looked fine, but the displacement was real, and the consequences are still being felt. Let’s not bring this dynamic into the AI era. Let’s not let a small number of AI systems capture all economic returns while entire industries find that their knowledge has been commoditized right under their noses.

The problem with this analogy is that globalization really did happen, and industrial economies really were hollowed out. This might be less a warning than a prophecy. No wonder Nadella is sounding the alarm: Microsoft could be one of the victims. And the economic imperative of the model makers is to bring about precisely this outcome.

Data inevitability

These models, even Mythos, aren’t at that stage yet. What they need, in addition to more compute, is more and better data. Model improvement increasingly comes from reinforcement learning. Some of the data for that can be synthesized, but for frontier labs, the strongest lever is real-world usage.

I think this is the main reason why both OpenAI and Anthropic offer heavily subsidized subscription plans. SemiAnalysis recently estimated that a $200 plan can give you access to $8,000 worth of Claude tokens and $14,000 worth of Codex tokens. Of course, they are competing for mindshare among users and developers, but they are also competing for access to the actual usage data they need to improve their models.
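To make the scale of that subsidy concrete, here is a minimal arithmetic sketch using only the figures quoted above. It assumes the $200 plan price and the token values cover the same billing period and that the token values are priced at list rates; the text does not spell out either point, and the function names are purely illustrative.

```python
# Figures as quoted in the text (attributed there to SemiAnalysis).
# Assumption: plan price and token value refer to the same billing period.
PLAN_PRICE_USD = 200
TOKEN_VALUE_USD = {"Claude": 8_000, "Codex": 14_000}


def subsidy_multiple(token_value: float, plan_price: float) -> float:
    """How many dollars of tokens each subscription dollar buys."""
    return token_value / plan_price


def effective_discount(token_value: float, plan_price: float) -> float:
    """Fraction of the token value that the subscriber does not pay for."""
    return 1 - plan_price / token_value


for product, value in TOKEN_VALUE_USD.items():
    multiple = subsidy_multiple(value, PLAN_PRICE_USD)
    discount = effective_discount(value, PLAN_PRICE_USD)
    print(f"{product}: {multiple:.0f}x the plan price, {discount:.1%} effective discount")
```

On these numbers a subscriber gets 40 times the plan price in Claude tokens and 70 times in Codex tokens, which is hard to explain as a pricing strategy unless the usage itself is worth something to the labs.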

Anthropic is going a step further with Fable, announcing that it will retain all usage data for 30 days, even for enterprise plans that were previously promised zero data retention. The company says it will not use this data for training, but it has not put in place any safeguards to ensure it won’t in the future (for example, storing the data with third parties). If this policy change doesn’t lead to large-scale customer churn once Fable resumes, I suspect it’s only a matter of time before the company starts using the data. For its ultimate goal, the data is simply too valuable.

Also pay attention to the virtuous cycle of moving upward toward user touchpoints: the more workflows are completed directly with Claude or Codex, the more data each company can feed back into training. That makes their products stronger and more useful, expands the number of workflows they can serve, and increases their access to data.

Nadella emphasized the importance of this data in his post, but naturally he assumes it should remain independent of the models:

Companies need to turn workflows, domain knowledge, and accumulated judgment into AI systems that improve with every use. Private evaluation should capture whether the model is truly improving outcomes that matter to the business (not just external benchmarks!). Private reinforcement learning environments should make models stronger on real trajectories within organizations. Its knowledge base makes organizational memory queryable, and token usage more efficient.

This cycle becomes the company’s new intellectual property. I think of it as a mountain-climbing machine. Unlike most assets, it compounds. Each improved workflow generates better training signals, accelerating the accumulation of the company’s unique tacit knowledge. Companies that build this early will have an advantage that’s difficult to replicate, regardless of how the capabilities of any single future model evolve.

But what if companies that comply with Anthropic’s data policies can already get better results right now? Or what if existing companies resist, leaving an opening for new entrants, or even the model makers themselves, to beat them in the market? Anthropic is testing the very resolve Nadella is calling for.

Power claims

Surprisingly, the data retention policy surrounding Fable/Mythos is not even the most controversial part of the release. Instead, it is that Anthropic said at launch that if Fable were used for LLM development, it would quietly degrade its own performance. The system card states:

We have also added safeguards related to cutting-edge LLM development. As discussed in section 6.1 of our February 2026 risk report, we are concerned about the risks of accelerating the pace of overall AI development, even though we remain uncertain about the severity of these risks. In particular, our concern—as we wrote at the time—is “accelerating other AI developers to build powerful AI systems with similar risks to our system—without necessarily having appropriate safeguards.”

Given recent models’ ability to accelerate their own development, we have implemented new interventions to limit the effectiveness of Claude for requests to develop cutting-edge LLMs (e.g., building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our terms of service, but enforcing this restriction through safeguards can prevent accelerating the actors most willing to violate those terms.

Unlike our interventions in cybersecurity, biochemistry, and distillation attempts, these safeguards are not visible to users. Fable 5 will not revert to another model. Instead, safeguards will limit effectiveness through methods such as prompt modifications, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of programming work. We estimate they will impact about 0.03% of traffic, concentrated in less than 0.1% of organizations. When these interventions are active, we expect their impact on model behavior to be minimal beyond restricting development of cutting-edge LLMs. Claude will still respond helpfully to user requests. After this model’s release, we will continue to improve the accuracy of our detection methods.

Anthropic has since reversed this policy: Fable will now hand off LLM-related requests to Opus 4.8 and disclose the handoff to users. Still, I think the original policy was very revealing. On one hand, I don’t blame Anthropic for not wanting to help competitors. On the other hand, it should be very clear that Anthropic believes no one other than itself should build cutting-edge LLMs.

What makes this policy even more striking is that it was issued only two months after the company’s dispute with the Department of Defense. The Department wanted to use Claude for any lawful purpose, while Anthropic sought tighter controls over surveillance and autonomous weapons. The degradation measure demonstrates not only Anthropic’s ability to quietly modify its models to enforce its policy preferences, but also its willingness to do so. In other words, Anthropic has itself validated some critics’ biggest fears about its role as a supply chain risk.

However, the broader conclusion drawn from that incident is that Anthropic believes it should have the final say on how its models are used. Given their belief that only they should develop frontier AI, they essentially believe that only they should have ultimate control over AI overall. When you combine this with the company’s statements that AI can conduct all economic activity, you realize Anthropic’s leadership actually wants power over everything and everyone.

Safety narrative

Of course, Anthropic would never put it so bluntly. Instead, the story is about safety:

I expect Anthropic will increasingly expose its models’ capabilities to end users through endpoints tailored to different workflows, even if they begin restricting API access. This replacement of software and restriction of access will be justified in the name of safety, even as Anthropic pursues its economic demand to move closer to end users.

Anthropic’s explanation for its major change to its data retention policy is safety. Specifically, the company claims that retaining all users’ data for 30 days is necessary to prevent jailbreak behaviors that concern the U.S. government. I can certainly imagine a future where safety concerns compel them to train on this data as well, in order to better defend against malicious use.

Anthropic’s entire origin story is rooted in the founders’ belief that OpenAI didn’t take safety seriously enough. The company believes only it can control AI, and because they uniquely care about safety, they have reason to try to control everyone else—including the U.S. government.

The thing about these safety justifications is that I think they work precisely because, for Anthropic, they are not mere pretexts. The company truly believes it is the only one that really believes in superintelligence, and therefore the only one sufficiently concerned about the danger. That belief supplies one justification after another, for decision after decision, policy after policy, and confrontation after confrontation, which to outsiders looks like a strange combination of cynicism and naivete.

The contrast with OpenAI is huge. I think one way to understand how and why OpenAI lost its lead is that, in the years after ChatGPT’s release, the company was at war with itself. What had been a research lab was suddenly saddled with being an accidental consumer tech company. In resolving this conflict, OpenAI lost a large number of people to companies like Anthropic.

On the other hand, Anthropic’s talent, mission, and business are perfectly aligned. The company can sell researchers a vision of creating machine gods—wrapped in the halo of caring about danger and being smart enough to represent humanity’s concerns. And each policy change that follows is conveniently beneficial to its business—an absolutely wonderful coincidence.

I respect this consistency, and I fear it. I respect it because it is clearly very effective. The closest analogy might be Apple: the company always wraps every self-interested action in the guise of doing what is right for users, and often it really is. Anthropic is similar. But I fear it because letting the people most convinced they know best build a smartphone that I can accept or reject is one thing; letting them build superintelligence with the potential to rival or surpass the power of nation-states, or even just large corporations, is much more worrying. The history of smart people who believe they know what humanity needs is an ugly one, precisely because they convince themselves that their good intentions justify actions that are not good.
