Qwen3.7-Max Officially Released: 1,158 Tool Calls Over 35 Hours Yield 10x Faster Operators on a Domestic Chip
AIMPACT News, May 20 (UTC+8): According to monitoring by Beating, Alibaba's Tongyi Qianwen has officially released Qwen3.7-Max, its next-generation flagship base model for agents.
According to official real-world test data, the new model ran a fully autonomous kernel optimization task that lasted 35 hours and spanned 1,158 tool calls. Without any chip architecture documentation or performance analysis data, it improved Triton operator performance on the domestic T-Head Zhenwu M890 processor by 10.0x.
During optimization, the model went through five core stages, of which the report highlights the following. First, it applied Split-K partitioning, dividing the prefix KV-cache along the token dimension to make full use of the 36 SM cores. Next, it replaced the synchronous cudaMalloc between host and device with pre-allocated PyTorch tensors and read the prefix length from tensor metadata, which eliminated the synchronous cudaMemcpy previously needed for that query and removed the host-device communication overhead. In the final stage, it restructured the operator so that a single thread block handles all 4 query tokens at once and shares loads across them to amortize memory access cost, completing a key architecture-specific refactoring.
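To make the Split-K step concrete, here is a minimal pure-Python sketch of attention over a KV-cache split along the token dimension. It is not the kernel the model produced: the function names are hypothetical, and the log-sum-exp merge of partial results is an assumption, since the report does not say how the chunks were combined. Each chunk is independent, which is what would let separate SM cores work on it in parallel.

```python
import math

def attend_chunk(q, keys, values):
    """Partial attention over one chunk of cached tokens.

    Returns (max_score, sum_of_exps, unnormalized weighted value sum).
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # shifted for stability
    acc = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for j, vj in enumerate(v):
            acc[j] += w * vj
    return m, sum(weights), acc

def merge_partials(partials):
    """Combine per-chunk partials by rescaling each to a shared maximum."""
    global_max = max(m for m, _, _ in partials)
    total = 0.0
    out = [0.0] * len(partials[0][2])
    for m, s, acc in partials:
        scale = math.exp(m - global_max)
        total += s * scale
        for j, a in enumerate(acc):
            out[j] += a * scale
    return [o / total for o in out]

def split_k_attention(q, keys, values, num_splits):
    """Split the cached tokens into up to num_splits chunks, then merge."""
    n = len(keys)
    chunk = math.ceil(n / num_splits)
    partials = [
        attend_chunk(q, keys[i:i + chunk], values[i:i + chunk])
        for i in range(0, n, chunk)
    ]
    return merge_partials(partials)
```

Calling `split_k_attention` with `num_splits=1` gives ordinary softmax attention, and larger split counts should return the same result up to floating-point error.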
In the operator optimization test, Qwen3.7-Max achieved a geometric mean speedup of 10.0x, well ahead of GLM 5.1 (7.3x) and Kimi K2.6 (5.0x). DeepSeek V4 Pro reached only 3.3x and ended the task early in the second half of the run, after five consecutive rounds without issuing any tool calls.
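For readers unfamiliar with the metric, a geometric mean speedup can be computed as sketched below. The timings are hypothetical and chosen only for illustration; the report does not list which operators or input shapes were averaged.

```python
from statistics import geometric_mean

def geomean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-case speedups (baseline time / optimized time)."""
    speedups = [b / o for b, o in zip(baseline_ms, optimized_ms)]
    return geometric_mean(speedups)

# Hypothetical timings in milliseconds, not taken from the report.
baseline_ms = [8.0, 20.0, 50.0]
optimized_ms = [4.0, 2.0, 1.0]  # per-case speedups: 2x, 10x, 50x

print(round(geomean_speedup(baseline_ms, optimized_ms), 2))
```

With these sample numbers the geometric mean is about 10x, while the arithmetic mean of the same speedups would be about 20.7x. This is why the geometric mean is the usual choice for ratios: a single very large speedup does not dominate the summary.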
To learn general problem-solving strategies across diverse environments, Qwen3.7-Max was trained with tasks, runtime frameworks, and verifiers decoupled from one another, and cross-framework reinforcement learning helped it avoid shortcut overfitting to specific benchmarks.
On the general agent benchmarks MCP-Mark (60.8 points) and SpreadSheetBench (87.0 points), Qwen3.7-Max showed strong generalization, and its overall performance is now close to that of Claude-4.6-Opus-Max.
(Source: BlockBeats)