Generation speed jumps up to 6×! Moonshot AI releases Kimi K2.7 Code HighSpeed, with API billed at double the standard rate

The code generation race just shifted up a gear. Today (June 15), Moonshot AI announced on its official social media account a new high-speed mode for its open-source multimodal coding model: “Kimi K2.7 Code HighSpeed”. The mode delivers a speedup of up to 6×, with output reaching up to 260 tokens/s on short-context tasks. Access is currently being opened to developers and enterprise users in batches with limited quotas, and API pricing is set at double that of the standard version.
(Background: Moonshot AI’s latest funding round valuation is pushing past $30 billion! Three rounds in six months, with Meituan as lead investor, and ARR surpassing $200 million in a single month)
(Additional background: Bloomberg: China strictly bans AI unicorns from taking “US funding”! ByteDance and Moonshot AI have both been named)

The global AI developer community is getting a notable efficiency boost. Through its official account @Kimi_Moonshot, large-model developer Moonshot AI today (June 15, Taipei time) released a high-speed version of its open-source multimodal coding model Kimi K2.7 Code, named “Kimi K2.7 Code HighSpeed”.

The company also released a 22-minute side-by-side comparison video that shows the difference between high-speed mode and normal mode across cursor output in the editor, code generation, Excel spreadsheet processing, and complex agent tasks. It emphasized that open intelligence should be real-time and accessible, and said the team will keep optimizing toward a seamless real-time development experience.

🌘 Meet Kimi K2.7 Code HighSpeed!
A high-speed mode of our latest open-source multimodal coding model, Kimi K2.7 Code.

⚡️ Up to 6× faster: Around 180 tok/s on coding tasks with median-length inputs, and up to 260 tok/s on shorter-context tasks.

🔷 Rolling out to Kimi Code Beta… pic.twitter.com/syOOgIdtI4

— Kimi.ai (@Kimi_Moonshot) June 15, 2026

Short-context reaches 260 tok/s! Generation efficiency surges sixfold

According to the specifications Moonshot AI published, Kimi K2.7 Code HighSpeed generates output up to 6 times faster overall. By scenario:

  • Median-length input tasks: high-speed mode sustains about 180 tokens/s on coding tasks.
  • Short-context scenarios: output speed reaches up to 260 tokens/s.

This speedup means that in everyday debugging, real-time autocompletion, or multimodal visual code generation, developers get responses that feel close to instantaneous, which could substantially improve software engineering productivity.
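To put the quoted throughput in concrete terms, here is a minimal Python sketch that converts tokens per second into streaming time. The 900-token completion size and the helper names are illustrative assumptions, not figures from the announcement. The `implied_baseline` helper assumes the full 6× multiplier applies to the given scenario, which the announcement does not confirm.

```python
# Throughput figures quoted in the announcement, in tokens per second.
HIGHSPEED_TOK_PER_S = {
    "median-length input": 180.0,
    "short context": 260.0,
}


def generation_seconds(output_tokens: int, tok_per_s: float) -> float:
    """Seconds needed to stream `output_tokens` at a steady decode rate.

    Ignores time-to-first-token and network latency, which the
    announcement does not quantify.
    """
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return output_tokens / tok_per_s


def implied_baseline(tok_per_s: float, speedup: float = 6.0) -> float:
    """Standard-mode rate IF the full speedup applied here (an assumption)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return tok_per_s / speedup


if __name__ == "__main__":
    completion_tokens = 900  # hypothetical response size
    for label, rate in HIGHSPEED_TOK_PER_S.items():
        fast = generation_seconds(completion_tokens, rate)
        slow = generation_seconds(completion_tokens, implied_baseline(rate))
        print(f"{label}: {fast:.1f} s in high-speed mode, "
              f"{slow:.1f} s at the implied standard rate")
```

Under these assumptions, a 900-token completion at 180 tokens/s streams in about 5 seconds. Because "up to 6×" is an upper bound, the implied standard-mode times should be read as a worst case for the standard mode, not a measured result.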

Billed at double the rate! High-speed API pricing revealed

With the high-speed version’s release, pricing, the detail developers care about most, has also been disclosed. According to the community and the official developer portal, Kimi K2.7 Code HighSpeed’s API is billed at exactly double the standard version’s rates:

For the high-speed version, cache-hit input costs $0.38 per million tokens, cache-miss input costs $1.90 per million tokens, and output costs $8.00 per million tokens. By comparison, the standard open-source Kimi K2.7 Code, which is already available for download, is priced at $0.19, $0.95, and $4.00 per million tokens, respectively. Despite the doubled price, Wall Street quant firms and the developer community have generally reacted positively, viewing a speedup of up to 6× as worth the premium.
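The per-million-token rates above translate into per-request cost as in the following Python sketch. The prices come from the article. The `Pricing` class, the `request_cost` helper, and the example token counts are hypothetical, chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens, as quoted in the article."""
    cache_hit: float   # input tokens served from cache
    cache_miss: float  # input tokens not in cache
    output: float      # generated tokens


STANDARD = Pricing(cache_hit=0.19, cache_miss=0.95, output=4.00)
HIGHSPEED = Pricing(cache_hit=0.38, cache_miss=1.90, output=8.00)


def request_cost(p: Pricing, cached_in: int, uncached_in: int, out: int) -> float:
    """Cost in USD of one request with the given token counts."""
    total_per_million = (
        cached_in * p.cache_hit
        + uncached_in * p.cache_miss
        + out * p.output
    )
    return total_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 20,000 cached input tokens, 5,000 uncached
    # input tokens, and 2,000 output tokens.
    tokens = dict(cached_in=20_000, uncached_in=5_000, out=2_000)
    std = request_cost(STANDARD, **tokens)
    fast = request_cost(HIGHSPEED, **tokens)
    print(f"standard:  ${std:.5f}")
    print(f"highspeed: ${fast:.5f} ({fast / std:.1f}x)")
```

Because every rate is exactly doubled, the high-speed cost is twice the standard cost regardless of how a request splits between cached input, uncached input, and output.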

Computing capacity is limited! Rolling out in batches to enterprises and Beta program members

Moonshot AI acknowledges that the computing capacity required for high-speed inference is very limited, so HighSpeed mode is currently being rolled out in restricted batches. To keep its core service stable, the company has first opened limited access to members of the Kimi Code Beta program, developers with Kimi API accounts, and Kimi Business enterprise users.

The company stressed, however, that this test phase “does not require any additional invitation code.” Any interested developer can apply to join the Beta program for a chance at access, which is granted in batches. As Moonshot AI expands its infrastructure, it plans to widen availability of the high-speed mode until it is open to all public cloud users.
