Claude Sonnet 5 Launches: Anthropic Says Several Benchmarks Approach Opus, While the API Costs Up to 60% Less

Anthropic has officially launched Claude Sonnet 5. The company's own benchmark scores show several metrics approaching the flagship Opus 4.8. Standard API pricing is $3 per million input tokens and $15 per million output tokens, about 40% cheaper than Opus, or about 60% cheaper at the launch promotional rate.

Sixty percent cheaper with only a slight performance gap sounds like a perfect business story, but is it really that good? Earlier today, Anthropic officially released Claude Sonnet 5 and made it the default model for Free and Pro users. On pricing, the standard API rate is $3 per million input tokens and $15 per million output tokens (a promotional rate of $2/$10 applies before August 31), compared with $5/$25 for the flagship Opus 4.8. That is about 40% cheaper at the standard rate and about 60% cheaper at the promotional rate.
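As a quick arithmetic check, the discount implied by the listed prices can be worked out directly. The sketch below uses only the per-million-token prices quoted in this article; the dictionary keys and the function name are illustrative.

```python
# Prices in USD per million tokens (input, output), as quoted in the article.
PRICES = {
    "sonnet5_standard": (3.00, 15.00),
    "sonnet5_promo": (2.00, 10.00),
    "opus48": (5.00, 25.00),
}

def discount_vs(base, other):
    """Percent saved on (input, output) when paying `other` instead of `base`."""
    return tuple(round((1 - o / b) * 100, 1) for b, o in zip(base, other))

standard = discount_vs(PRICES["opus48"], PRICES["sonnet5_standard"])
promo = discount_vs(PRICES["opus48"], PRICES["sonnet5_promo"])

print("standard:", standard)  # (40.0, 40.0)
print("promo:", promo)        # (60.0, 60.0)
```

On these figures, the 60% saving corresponds to the promotional rate, while the standard rate works out to 40% below Opus 4.8.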

Benchmarks approaching flagship

Anthropic released the following figures. All of them are the company's own measurements and have not yet been independently verified by third parties:

SWE-bench Pro (agentic coding): Sonnet 5 scored 63.2%, up from 58.1% for the previous-generation Sonnet 4.6; the flagship Opus 4.8 scored 69.2%.

Terminal-Bench 2.1 (terminal operations): Sonnet 5 scored 80.4%, against 82.7% for Opus 4.8.

Humanity’s Last Exam (multidisciplinary reasoning): Sonnet 5 with tool use reached 57.4%, nearly matching Opus 4.8's 57.9%.

GDPval-AA v2 (knowledge work): Sonnet 5 scored 1,618, edging past Opus 4.8's 1,615.
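To make "approaching the flagship" concrete, the gaps between the two models can be tabulated from the scores above. This is only a sketch over the self-reported numbers; the first three benchmarks are percentages, while GDPval-AA v2 is a raw score, so the differences are not directly comparable across rows.

```python
# (Sonnet 5, Opus 4.8) scores as reported by Anthropic in the article.
SCORES = {
    "SWE-bench Pro": (63.2, 69.2),
    "Terminal-Bench 2.1": (80.4, 82.7),
    "Humanity's Last Exam (tools)": (57.4, 57.9),
    "GDPval-AA v2": (1618, 1615),
}

# Positive gap means Sonnet 5 is ahead; negative means it trails Opus 4.8.
gaps = {name: round(sonnet - opus, 1) for name, (sonnet, opus) in SCORES.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:+}")
```

The largest gap is on SWE-bench Pro (6.0 percentage points); on the other three benchmarks the models are within a few points of each other.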

Computer-use ability also improved: on the OSWorld-Verified evaluation, Sonnet 5 scored 81.2%, up from 78.5% for the previous generation. This benchmark has the model actually control a desktop in a real OS environment, completing tasks such as taking screenshots, dragging, and moving data between applications, which puts it close to the difficulty of real automated workflows.

Additionally, Sonnet 5 supports a context window of up to 1 million tokens, with a maximum output of 128k tokens. By the common rule of thumb of about 0.75 English words per token, that is roughly 750,000 words, on the order of several full-length novels or a large enterprise's batch of contract files. The model can therefore complete cross-file comparison, summarization, and decision-making in a single conversation without batch processing. This specification is particularly suited to long-running agentic tasks, as the model does not need to 'forget' earlier context mid-task.

A cheaper price list does not guarantee a smaller bill

Sonnet 5 uses an updated tokenizer. In simple terms, a tokenizer is the way text is split into tokens. If the splitting method changes, the same piece of text will produce a different number of tokens, and the bill changes accordingly.

Anthropic explains that the same input may produce 1.0 to 1.35 times as many tokens under the new tokenizer, depending on the content. The company says pricing has been adjusted to be 'roughly cost-neutral,' but high-volume users are advised to benchmark their own workloads, since bills may not fall and could even rise.
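The effect of the tokenizer on a bill can be sketched with simple arithmetic. The example below assumes a hypothetical workload measured in old-tokenizer token counts, and assumes the multiplier applies equally to input and output tokens (the article only states the range for input). The workload sizes and function name are illustrative, not figures from Anthropic.

```python
def bill_usd(input_tokens, output_tokens, price_in, price_out, multiplier=1.0):
    """Bill in USD. Prices are per million tokens; `multiplier` scales the
    token counts to model a tokenizer that emits more tokens for the same text."""
    raw = input_tokens * price_in + output_tokens * price_out
    return raw * multiplier / 1_000_000

# Hypothetical monthly workload, counted with the old tokenizer.
IN_TOKENS, OUT_TOKENS = 100_000_000, 20_000_000

# Sonnet 5 standard rates from the article: $3 input, $15 output.
low = bill_usd(IN_TOKENS, OUT_TOKENS, 3.00, 15.00, multiplier=1.0)
high = bill_usd(IN_TOKENS, OUT_TOKENS, 3.00, 15.00, multiplier=1.35)

print(f"best case:  ${low:.2f}")
print(f"worst case: ${high:.2f}")
print(f"effective input price at 1.35x: ${3.00 * 1.35:.2f} per million old tokens")
```

Under these assumptions the same text could cost up to 35% more than the list price suggests, which is why measuring token counts on a representative sample of one's own traffic matters more than the headline rate.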

In terms of safety, Anthropic reports that Sonnet 5 has lower tendencies toward hallucination and sycophancy than Sonnet 4.6, and is better at rejecting malicious requests. But safety comparisons are relative: Sonnet 5 still has a higher rate of undesirable behavior than the more powerful Opus 4.8, and also higher than the strictly restricted Claude Mythos Preview.

In the Firefox 147 exploit development assessment conducted in collaboration with Mozilla, Sonnet 5 failed to produce a usable exploit (0%), although its partial success rate was 13.2%, higher than Sonnet 4.6's 8.8%. These numbers remain far behind Opus 4.8's 68.8%; even so, Anthropic has enabled cybersecurity protections by default.
