Google releases the seventh-generation Ironwood TPU Developer Training Guide, detailing system-level performance optimization
ME News update: On April 2 (UTC+8), Google published a developer guide to training on its seventh-generation Ironwood TPU. The guide is intended to help developers exploit Ironwood's system-level performance to train and deploy frontier AI models efficiently. Ironwood is purpose-built AI infrastructure designed for the compute demands of trillion-parameter models. It combines inter-chip interconnect (ICI), optical circuit switches (OCS), the data center network (DCN), and aggregated high-bandwidth memory (HBM) into a single system that scales to 9,216 chips. The guide describes several key optimization strategies for this hardware: using the matrix multiplication units (MXUs), which natively support FP8 training, to raise throughput; adopting Tokamax, a JAX kernel library optimized for TPUs, whose “splash attention” and “Megablox grouped matrix multiplication” kernels handle the irregular tensors found in long-context and mixture-of-experts models; offloading collective communication operations to the fourth-generation SparseCore to hide latency; carefully tuning the allocation of the TPU's fast on-chip SRAM (VMEM) to reduce memory stalls; and choosing the best sharding strategy (such as FSDP, TP, or EP) based on model size, architecture, and sequence length. (Source: InfoQ)
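To make the sharding point concrete, here is a minimal sketch of FSDP-style (fully sharded data parallel) placement in JAX. It is not taken from Google's guide. The mesh axis name `data`, the helper `shard_fsdp`, and the toy array shapes are assumptions chosen for illustration. The sketch uses only the public `jax.sharding` API and runs on whatever devices are available, including a single CPU.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over every available device.
# The axis name "data" is an illustrative choice, not a required name.
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))
n = devices.size


def shard_fsdp(batch, weight):
    """Hypothetical helper: place batch and weight FSDP-style.

    Both the batch rows and the weight rows are split across the
    "data" axis, so each device stores only a slice of the parameters.
    """
    batch = jax.device_put(batch, NamedSharding(mesh, P("data", None)))
    weight = jax.device_put(weight, NamedSharding(mesh, P("data", None)))
    return batch, weight


@jax.jit
def forward(x, w):
    # The compiler inserts whatever collectives are needed to
    # compute the matmul from the sharded inputs.
    return x @ w


# Toy shapes, scaled by device count so every sharded dimension divides evenly.
x = jnp.ones((8 * n, 16 * n))
w = jnp.ones((16 * n, 4))
x, w = shard_fsdp(x, w)
y = forward(x, w)
```

Tensor parallelism (TP) and expert parallelism (EP) would use additional mesh axes and different `PartitionSpec` layouts. Which combination works best depends on model size, architecture, and sequence length, as the guide notes.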