Qwen3.7-Max Officially Released: 1,158 Tool Calls in 35 Hours Produce 10x Faster Compute Operators on a Domestic Chip

AIMPACT News, May 20 (UTC+8): According to monitoring by Beating, Alibaba's Tongyi Qianwen has officially released Qwen3.7-Max, its next-generation flagship base model for agents.

Official test data shows that, working without any chip architecture documentation or performance profiling data, the new model completed a fully autonomous kernel optimization task that lasted 35 hours and spanned 1,158 tool calls, improving Triton operator performance on the domestic T-Head Zhenwu M890 processor by 10.0x.

During optimization, the model went through five core stages, including the following. First, it used Split-K partitioning to divide the prefix KV-cache along the token dimension so that all 36 SMs were kept busy. Next, it replaced synchronous cudaMalloc calls with pre-allocated PyTorch tensors and read the prefix length from tensor metadata, which eliminated the synchronous cudaMemcpy previously needed to query that length and removed the host-device communication overhead. In the final stage, the model restructured the operator so that a single thread block handles all 4 query tokens at once and shares their loads to amortize memory access cost, completing a key architecture-specific refactoring.
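The Split-K step can be illustrated with a minimal pure-Python sketch. This is not the generated Triton kernel, which the report does not publish; it only shows the general idea, assuming softmax attention for a single query vector. The function names (`attention_reference`, `attention_partial`, `attention_split_k`) and the `num_splits` parameter are hypothetical. Each chunk of cached tokens is processed independently, as separate thread blocks would be, and the partial results are merged with a max-rescaling step so the output matches a single-pass computation.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_reference(query, keys, values):
    """Softmax attention for one query over the whole KV cache in one pass."""
    scores = [dot(query, k) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_partial(query, keys, values):
    """Process one token chunk: (local max, local weight sum, unnormalized output)."""
    scores = [dot(query, k) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return top, sum(weights), acc


def attention_split_k(query, keys, values, num_splits):
    """Split the cache along the token dimension, then merge the partial results."""
    chunk = math.ceil(len(keys) / num_splits)
    # Each chunk is independent, so on a GPU it could run on its own SM.
    partials = [attention_partial(query, keys[i:i + chunk], values[i:i + chunk])
                for i in range(0, len(keys), chunk)]
    global_top = max(top for top, _, _ in partials)
    dim = len(values[0])
    total = 0.0
    out = [0.0] * dim
    for top, weight_sum, acc in partials:
        # Rescale each chunk from its local max to the global max before summing.
        scale = math.exp(top - global_top)
        total += weight_sum * scale
        for d in range(dim):
            out[d] += acc[d] * scale
    return [o / total for o in out]
```

The merge step is what makes the split safe: without rescaling to a common maximum, the per-chunk softmax weights would not be comparable.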

In the operator optimization test, Qwen3.7-Max achieved a geometric mean speedup of 10.0x, well ahead of GLM 5.1 (7.3x) and Kimi K2.6 (5.0x). DeepSeek V4 Pro reached only 3.3x: in the second half of the task it went five consecutive rounds without issuing any tool calls and then ended the task early on its own initiative.
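For readers unfamiliar with the metric, a geometric mean speedup averages per-operator speedup ratios multiplicatively, so one extreme result does not dominate the way it would in an arithmetic mean. The sketch below shows the standard calculation. The function name `geometric_mean_speedup` and the timing values in the usage lines are hypothetical, and the report does not state how many operators or which timings went into its 10.0x figure.

```python
import math


def geometric_mean_speedup(baseline_times, optimized_times):
    """Geometric mean of per-operator speedups (baseline time / optimized time)."""
    if len(baseline_times) != len(optimized_times) or not baseline_times:
        raise ValueError("need two non-empty lists of equal length")
    ratios = [b / o for b, o in zip(baseline_times, optimized_times)]
    # Average in log space, then exponentiate: exp(mean(log(ratio))).
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical timings in milliseconds for two operators (not from the report).
baseline = [8.0, 2.0]
optimized = [1.0, 1.0]
speedup = geometric_mean_speedup(baseline, optimized)  # ratios 8x and 2x
```

With these made-up numbers the arithmetic mean of the two ratios would be 5x, while the geometric mean is 4x.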

To learn general problem-solving strategies across diverse environments, the training of Qwen3.7-Max decoupled tasks, runtime frameworks, and verifiers, and used cross-framework reinforcement learning to avoid shortcut overfitting to specific benchmarks.

On the general agent benchmarks MCP-Mark (60.8 points) and SpreadSheetBench (87.0 points), Qwen3.7-Max showed strong generalization, and its overall performance now approaches that of Claude-4.6-Opus-Max.

(Source: BlockBeats)
