Sakana AI partners with NVIDIA: enabling GPUs to skip 80% of ineffective computations in large models, boosting H100 inference speed by 30%
The feed-forward network (FFN) layers of a large model account for most of its parameters and compute. Yet during text generation, over 80% of FFN neurons are "dormant": their activation values are close to zero, so they contribute essentially nothing to the final result. Skipping these neurons can save a large share of that compute.
However, modern GPUs are optimized for dense, regular matrix computation. Picking out the scattered useful values with traditional sparse methods adds overhead from index lookups and repeated memory reads and writes, which eats into the compute that was saved.
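The trade-off described above can be illustrated with a minimal CPU sketch in plain Python. It is not Sakana AI's or NVIDIA's implementation: the function names, the ReLU activation, the bias term, and the toy sizes are all assumptions made for illustration. The sparse version first builds a list of active neuron indices (the "picking out" step that is costly on a GPU) and then visits only those rows in the down-projection. In this sketch the up-projection is still computed densely.

```python
import math
import random


def hidden_activations(x, w_up, b_up):
    """ReLU(w_up @ x + b_up); one row of w_up per hidden neuron."""
    hidden = []
    for row, bias in zip(w_up, b_up):
        pre = sum(xi * wi for xi, wi in zip(x, row)) + bias
        hidden.append(pre if pre > 0.0 else 0.0)
    return hidden


def ffn_dense(x, w_up, b_up, w_down):
    """Dense FFN: every hidden neuron goes through the down-projection."""
    hidden = hidden_activations(x, w_up, b_up)
    out = [0.0] * len(w_down[0])
    for h, row in zip(hidden, w_down):
        for j, w in enumerate(row):
            out[j] += h * w
    return out


def ffn_sparse(x, w_up, b_up, w_down):
    """Same FFN, but the down-projection skips dormant (zero) neurons."""
    hidden = hidden_activations(x, w_up, b_up)
    # Index-gathering step: cheap here, but a source of overhead on GPUs.
    active = [i for i, h in enumerate(hidden) if h > 0.0]
    out = [0.0] * len(w_down[0])
    for i in active:
        for j, w in enumerate(w_down[i]):
            out[j] += hidden[i] * w
    return out, active


# Toy sizes (hypothetical); the negative bias makes most neurons dormant.
random.seed(0)
d_model, d_hidden, d_out = 16, 64, 4
x = [random.gauss(0.0, 1.0) for _ in range(d_model)]
w_up = [[random.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(d_hidden)]
b_up = [-0.85 * math.sqrt(d_model)] * d_hidden
w_down = [[random.gauss(0.0, 1.0) for _ in range(d_out)] for _ in range(d_hidden)]

dense_out = ffn_dense(x, w_up, b_up, w_down)
sparse_out, active = ffn_sparse(x, w_up, b_up, w_down)
print(f"active neurons: {len(active)} of {d_hidden}")
print("outputs match:", all(abs(a - b) < 1e-9 for a, b in zip(dense_out, sparse_out)))
```

Both functions return the same output because a neuron with zero activation adds nothing to the sum. The saving comes from the shorter inner loop, and the cost comes from building and following the `active` index list.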
The TwELL format is designed to break this hardware bottleneck by matching how GPUs parallelize work: instead of gathering non-zero data from across the whole matrix as traditional methods do, it slices the data into small blocks (tiles), the unit that GPUs handle most efficiently.
This way, each GPU core can pack the useful data locally, eliminating time-consuming global memory reads and writes and fitting directly into the acceleration pipeline of modern chips.
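The article does not specify the actual TwELL memory layout, so the following is only a hypothetical sketch of the general idea of tile-local packing, written in plain Python rather than GPU code. The names `pack_tiles`, `unpack_tiles`, and `packed_matmul`, the tuple layout, and the tile width are all assumptions. Each tile of a row is packed on its own: its non-zero values move to the front of a fixed-size slot, alongside their offsets within the tile, so no step needs to look outside the tile it is working on.

```python
def pack_tiles(matrix, tile_width):
    """Pack every row tile independently into (count, values, offsets).

    values and offsets are padded to tile_width so all slots have the
    same size, in the spirit of ELL-style fixed-width sparse storage.
    """
    packed = []
    for row in matrix:
        row_tiles = []
        for start in range(0, len(row), tile_width):
            tile = row[start:start + tile_width]
            offsets = [k for k, v in enumerate(tile) if v != 0.0]
            values = [tile[k] for k in offsets]
            pad = tile_width - len(values)
            row_tiles.append((len(values), values + [0.0] * pad, offsets + [0] * pad))
        packed.append(row_tiles)
    return packed


def unpack_tiles(packed, tile_width, n_cols):
    """Rebuild the dense matrix from the packed tiles (round-trip check)."""
    matrix = []
    for row_tiles in packed:
        row = [0.0] * n_cols
        for t, (count, values, offsets) in enumerate(row_tiles):
            for k in range(count):
                row[t * tile_width + offsets[k]] = values[k]
        matrix.append(row)
    return matrix


def packed_matmul(packed, tile_width, w_down):
    """Multiply packed activations by w_down, touching only non-zeros."""
    d_out = len(w_down[0])
    result = []
    for row_tiles in packed:
        out = [0.0] * d_out
        for t, (count, values, offsets) in enumerate(row_tiles):
            for k in range(count):
                w_row = w_down[t * tile_width + offsets[k]]
                for j in range(d_out):
                    out[j] += values[k] * w_row[j]
        result.append(out)
    return result


# Two tokens, eight hidden neurons, mostly dormant (toy values).
activations = [
    [0.0, 1.5, 0.0, 0.0, 0.0, 0.0, 2.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0],
]
w_down = [[float(i), 1.0] for i in range(8)]

packed = pack_tiles(activations, tile_width=4)
print(packed[0][1])  # second tile of the first row
print(packed_matmul(packed, 4, w_down))
```

On a GPU, the appeal of a layout like this would be that the work for one tile can stay in fast on-chip memory. This sketch shows only the data layout and the arithmetic, not that memory behavior.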
In tests with a 1.5-billion-parameter model, applying only light regularization during training reduced the proportion of neurons that actually need to be computed to less than 2%, while performance on seven downstream tasks remained unchanged.
The data also reveal a pattern: the larger the model, the more of its neurons are dormant (the non-zero ratio in a 2-billion-parameter model is 38% lower than in a 500-million-parameter model).
This suggests that as models continue to scale up, this hardware-oriented optimization will deliver even larger performance gains.