DeepSeek open-sources the DeepSpec inference acceleration codebase and launches DSpark, boosting V4 generation speed by up to 85%
According to monitoring by Dongcha Beating, DeepSeek and Peking University have released the technical report for DSpark, a speculative sampling acceleration framework, and open-sourced the full-stack codebase DeepSpec. DSpark is already deployed in DeepSeek-V4's online service. With output kept lossless, it raises single-user generation speed by 60% to 85% for the Flash version and by 57% to 78% for the Pro version. Under strict latency constraints, DSpark outperforms the original single-token prediction (MTP-1) baseline and significantly increases overall system throughput.
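For readers unfamiliar with why speculative sampling can be lossless, the sketch below shows the textbook accept/reject rule: a drafted token is accepted with probability min(1, p/q), and on rejection a replacement is drawn from the renormalized residual max(p - q, 0). This is the generic rule, not DSpark's implementation; the function name `verify_draft` and the dictionary-based distributions are illustrative assumptions, and the bonus token normally sampled after a fully accepted draft is omitted for brevity.

```python
import random

def verify_draft(draft_tokens, draft_probs, target_probs, rng):
    """Generic speculative-sampling verification (illustrative, not DSpark's code).

    draft_tokens:    token ids proposed by the draft model
    draft_probs[i]:  dict token -> probability under the draft model at position i
    target_probs[i]: dict token -> probability under the target model at position i
    Returns the accepted prefix, plus one corrected token if a rejection occurs.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        q = draft_probs[i].get(tok, 0.0)
        p = target_probs[i].get(tok, 0.0)
        # Accept the drafted token with probability min(1, p / q).
        if q > 0 and rng.random() < min(1.0, p / q):
            out.append(tok)
            continue
        # Rejected: resample from the residual max(p - q, 0), renormalized.
        residual = {
            t: max(pt - draft_probs[i].get(t, 0.0), 0.0)
            for t, pt in target_probs[i].items()
        }
        if sum(residual.values()) == 0.0:
            residual = dict(target_probs[i])  # degenerate case: fall back to target
        tokens, weights = zip(*residual.items())
        out.append(rng.choices(tokens, weights=weights)[0])
        break  # everything after the first rejection is discarded
    return out
```

Because every emitted token is distributed exactly as the target model would have produced it, a better draft model changes only the speed, not the output distribution.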
Until now, multi-token speculative sampling has been hard to deploy in online production. Autoregressive draft models generate too slowly, while parallel draft models predict each position independently, so acceptance rates in the latter half of a long draft are extremely low. If multi-token drafts are verified indiscriminately under high concurrency, the large model spends much of its compute verifying incorrect tokens that are bound to be rejected, and overall system throughput collapses. As a result, production deployments across the industry have mostly relied on single-token prediction (MTP-1).
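The cost of blind verification can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each draft position has a fixed acceptance probability and that verification stops at the first rejection; the function names and the 0.8 figure are hypothetical and do not come from the DSpark report.

```python
def expected_accepted(accept_probs):
    """Expected number of accepted draft tokens when the first rejection
    discards everything after it."""
    expected, survive = 0.0, 1.0
    for a in accept_probs:
        survive *= a          # probability that the prefix up to here survives
        expected += survive
    return expected

def wasted_fraction(accept_probs):
    """Share of verified draft positions that end up rejected or discarded."""
    return 1.0 - expected_accepted(accept_probs) / len(accept_probs)

# Illustrative numbers only: a per-position acceptance probability of 0.8.
one_token = [0.8]          # MTP-1 style single-token draft
eight_tokens = [0.8] * 8   # long draft verified without truncation

print(expected_accepted(one_token), wasted_fraction(one_token))
print(expected_accepted(eight_tokens), wasted_fraction(eight_tokens))
```

Under these assumed numbers the eight-token draft yields about 3.3 accepted tokens on average, so more than half of the verified positions are wasted, against one fifth for the single-token draft. With spare compute that waste is harmless; under high concurrency it competes directly with other requests.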
DSpark removes this throughput bottleneck under high concurrency. It first uses the DFlash parallel backbone network to generate hidden states, then adds an extremely lightweight Markov head. The Markov head serially injects dependencies between adjacent tokens at very low cost, using a table lookup and a single matrix multiplication. The system also integrates a confidence prediction head and a posterior calibration algorithm. To stay compatible with production scheduling at no extra cost and to prevent leakage of future information, the scheduler works asynchronously: it uses predictions from two steps earlier to set the draft truncation length dynamically, so that under heavy load the large model does not verify high-risk incorrect tokens at the tail of a draft.
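The mechanism described above can be sketched roughly as follows. The report's exact architecture is not given here, so several details are assumptions made for illustration: adding the previous token's embedding to the hidden state, greedy token selection, truncating where the cumulative confidence drops below a threshold, and all names (`markov_head_decode`, `truncation_length`, `DelayedTruncator`) are hypothetical.

```python
from collections import deque

def matvec(W, x):
    """Plain matrix-vector product (the single matmul per position)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def markov_head_decode(hidden, first_prev, embed, W):
    """Serially pick draft tokens from hidden states produced in parallel.

    hidden:     one hidden vector per draft position (from the parallel backbone)
    first_prev: id of the last verified token
    embed:      table mapping token id -> vector (the table lookup)
    W:          output projection, shape [vocab][dim]
    """
    prev, tokens = first_prev, []
    for h in hidden:
        # Assumed combination: add the previous token's embedding to the state.
        x = [a + b for a, b in zip(h, embed[prev])]
        logits = matvec(W, x)
        prev = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        tokens.append(prev)
    return tokens

def truncation_length(confidences, threshold):
    """Count leading positions whose cumulative confidence stays >= threshold."""
    cum, n = 1.0, 0
    for c in confidences:
        cum *= c
        if cum < threshold:
            break
        n += 1
    return n

class DelayedTruncator:
    """Sets the draft length at step t from confidences produced at step t-2,
    so the scheduler never waits on the current step's predictions."""

    def __init__(self, default_len, delay=2):
        self.default_len = default_len
        self.pending = deque([None] * delay, maxlen=delay)

    def step(self, new_confidences, threshold):
        stale = self.pending[0]               # confidences from `delay` steps ago
        self.pending.append(new_confidences)  # oldest entry drops out
        if stale is None:
            return self.default_len           # warm-up: nothing to go on yet
        return truncation_length(stale, threshold)
```

In a scheduler built along these lines, the threshold could be raised as load increases so that drafts are cut shorter when verification compute is scarce; how DSpark actually couples the threshold to load is not specified in this article.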
Beyond DSpark, the DeepSpec codebase also natively supports open-source large models such as Qwen3 and Gemma. It provides a complete Python toolchain that covers downloading prompts, rebuilding large-model caches, training draft models, and benchmark evaluation. Developers can use the open-source scripts directly to customize and deploy dedicated acceleration modules locally for different open-source large models.