AIMPACT News, May 20 (UTC+8): According to monitoring by Beating, Alibaba's Tongyi Qianwen (Qwen) has officially released Qwen3.7-Max, its next-generation flagship base model for agents.

Official real-world test data shows that, given no chip architecture documentation and no performance profiling data, the new model completed a fully autonomous kernel optimization task lasting 35 hours and spanning 1,158 tool calls, improving the performance of Triton operators on the domestic T-Head Zhenwu M890 processor by 10.0x.

During optimization, the model went through five core stages of evolution, of which the report describes the following. First, it used Split-K partitioning to divide the prefix KV-cache along the token dimension so that all 36 SMs were kept busy. Next, it replaced synchronous cudaMalloc calls with pre-allocated PyTorch tensors and, by reading the prefix length from tensor metadata, eliminated the synchronous cudaMemcpy previously needed to query it, removing that host-device communication overhead. In the final stage, the model restructured the operator to process all 4 query tokens within a single thread block, sharing loads among them to amortize memory access cost. This completed a key architecture-specific refactoring.
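The Split-K step can be illustrated with a small sketch. It is written in plain Python for readability, not as a Triton kernel, and it is not the code the model produced. The function names are hypothetical, and the merge step (rescaling each chunk's partial result by a shared maximum) is an assumption about how partial softmax results are typically combined. Each chunk of the token dimension could be handled by a separate SM, and the partial results are merged afterwards.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend_full(q, keys, values):
    """Reference: softmax(q . k) weighted sum of values over all tokens."""
    scores = [_dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def attend_chunk(q, keys, values):
    """Partial result for one chunk: (local max, sum of exps, unnormalized output)."""
    scores = [_dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), acc

def attend_split_k(q, keys, values, num_splits):
    """Split the token dimension into chunks, then merge the partial results."""
    n = len(keys)
    chunk = (n + num_splits - 1) // num_splits  # ceiling division
    partials = [attend_chunk(q, keys[i:i + chunk], values[i:i + chunk])
                for i in range(0, n, chunk)]
    global_max = max(m for m, _, _ in partials)
    dim = len(values[0])
    total = 0.0
    out = [0.0] * dim
    for m, s, acc in partials:
        scale = math.exp(m - global_max)  # rescale chunk to the shared maximum
        total += s * scale
        for d in range(dim):
            out[d] += acc[d] * scale
    return [o / total for o in out]
```

The split version returns the same values as the single-pass reference up to floating-point rounding. On hardware, the benefit is that the chunks run in parallel instead of one block walking the entire prefix.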

Operator optimization test results show that Qwen3.7-Max achieved a 10.0x geometric mean speedup, significantly outperforming GLM 5.1 (7.3x) and Kimi K2.6 (5.0x). DeepSeek V4 Pro achieved only 3.3x and ended the task early, in the second half of the run, after five consecutive rounds in which it issued no tool calls.
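For readers unfamiliar with the metric, a geometric mean speedup averages per-operator speedups multiplicatively, so one very large gain does not dominate the result. The sketch below shows the arithmetic only. The timings are hypothetical and are not figures from the report, and the function name is illustrative.

```python
import math

def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-operator speedups (baseline time / optimized time)."""
    if len(baseline_ms) != len(optimized_ms) or not baseline_ms:
        raise ValueError("need two equal-length, non-empty timing lists")
    log_sum = sum(math.log(b / o) for b, o in zip(baseline_ms, optimized_ms))
    return math.exp(log_sum / len(baseline_ms))

# Hypothetical timings in milliseconds (per-operator speedups of 2x, 8x, 4x).
baseline = [8.0, 16.0, 4.0]
optimized = [4.0, 2.0, 1.0]
print(geometric_mean_speedup(baseline, optimized))  # about 4.0
```

With these made-up numbers the arithmetic mean of the speedups would be about 4.67, while the geometric mean is about 4.0, which is why the geometric mean is the more conservative summary across operators.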

To learn general problem-solving strategies across diverse environments, Qwen3.7-Max was trained with tasks, runtime frameworks, and verifiers decoupled from one another, and cross-framework reinforcement learning was used to avoid shortcut overfitting to specific benchmarks.
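One way to picture this decoupling is as three independent components that can be combined freely. The sketch below is purely illustrative and is not the Qwen training pipeline. Every name in it (`Task`, `plain_harness`, `wrapped_harness`, `exact_match`, `contains_match`, `rollouts`, `toy_policy`) is hypothetical, and it only scores a fixed policy rather than performing any reinforcement learning.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Task:
    prompt: str
    expected: str

# A harness (runtime framework) decides how the policy sees the task.
def plain_harness(prompt, policy):
    return policy(prompt)

def wrapped_harness(prompt, policy):
    # Same task, different framing of the input.
    return policy(f"[tool-session] {prompt}").strip()

# A verifier scores an answer; it knows nothing about the harness.
def exact_match(task, answer):
    return 1.0 if answer == task.expected else 0.0

def contains_match(task, answer):
    return 1.0 if task.expected in answer else 0.0

def rollouts(tasks, harnesses, verifiers, policy):
    """Score the policy on every (task, harness, verifier) combination."""
    results = []
    for task, harness, verifier in product(tasks, harnesses, verifiers):
        answer = harness(task.prompt, policy)
        results.append((task.prompt, harness.__name__,
                        verifier.__name__, verifier(task, answer)))
    return results

def toy_policy(prompt):
    # Stand-in for a model: answers one question, fails the rest.
    return "4" if "2+2" in prompt else "unknown"

tasks = [Task("2+2", "4"), Task("capital of France", "Paris")]
for row in rollouts(tasks, [plain_harness, wrapped_harness],
                    [exact_match, contains_match], toy_policy):
    print(row)
```

Because a policy is evaluated under every pairing, a shortcut that exploits one particular harness or verifier would not earn reward consistently, which is the intuition behind avoiding benchmark-specific overfitting.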

On the general agent benchmarks MCP-Mark and SpreadSheetBench, Qwen3.7-Max scored 60.8 and 87.0 points respectively, demonstrating strong generalization, with overall performance now approaching that of Claude-4.6-Opus-Max.

(Source: BlockBeats)

Qwen3.7-Max Officially Released: Wrote Code 1,158 Times in 35 Hours, Producing 10x Faster Computing Operators on Domestic Chips
