Muon's confidence is well calibrated during training, but it tends to be overconfident on new samples.
CoinWorld News reports that models trained with the Muon optimizer are well calibrated on training data but tend to be overconfident on new samples. A recent paper, “Too Sharp, Too Sure: When Calibration Follows Curvature,” finds that the model's confidence matches its accuracy on the training set, but on the test set its confidence exceeds its actual accuracy. On the CIFAR-10 image classification task, the reported test expected calibration error (ECE) is 0.065 for Muon, 0.061 for AdamW, 0.081 for SGD, and 0.020 for SAM. Muon's training ECE is nearly zero, so its gap between training and test calibration is more pronounced. The proposed Calmo method reduces Muon's test ECE to 0.019 but has not yet been validated on large language models. The DeepSeek V4 technical report indicates that some modules still use AdamW, which underscores the need to monitor how Muon-trained models generalize.
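For readers unfamiliar with the metric, ECE summarizes how far a model's stated confidence is from its actual accuracy. The following is a minimal sketch of the common equal-width binned estimate, not the paper's evaluation code; the function name, the 10-bin default, and the half-open binning convention are assumptions chosen for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     booleans, True where the prediction was right.
    n_bins:      number of equal-width confidence bins (assumed default).
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    conf_sum = [0.0] * n_bins   # total confidence per bin
    hit_sum = [0] * n_bins      # number of correct predictions per bin
    count = [0] * n_bins        # number of samples per bin

    for c, ok in zip(confidences, correct):
        # Bins are [lo, hi); a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1 if ok else 0
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        mean_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        ece += (count[b] / n) * abs(accuracy - mean_conf)
    return ece


# Hypothetical toy data: a model that says 95% but is right half the time.
overconfident = expected_calibration_error(
    [0.95, 0.95, 0.95, 0.95], [True, True, False, False]
)
```

In this sketch a value near zero means confidence tracks accuracy, which is the behavior reported for Muon on training data, while a larger value on held-out data reflects the overconfidence described above.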