Perplexity has published its post-training method for search agents, and the resulting Qwen3.5-based model beats GPT-5.4 on accuracy at a lower cost.
The pipeline builds on the open-source models Qwen3.5-122B-A10B and Qwen3.5-397B-A17B and runs in two stages. Supervised fine-tuning (SFT) comes first and establishes instruction following, language consistency, and other behaviors required for deployment. On-policy reinforcement learning (RL) follows and optimizes search accuracy and tool-use efficiency.
The RL stage uses the GRPO algorithm, and its training data has two parts. The first is an in-house synthetic dataset of verifiable multi-hop QA: starting from internal seed queries, questions that require 2 to 4 hops of reasoning are built along entity chains, and multiple independent solvers verify that each answer is unique. The second is rubric-based general dialogue data, which turns deployment requirements such as instruction following and format constraints into atomic conditions that can be checked objectively. This data is used during RL to keep the behaviors established by SFT from degrading.
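The article names GRPO but does not describe Perplexity's exact variant. As a rough illustration, the commonly described form of GRPO scores each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. The sketch below shows only that group-relative normalization step; the function name, the use of the population standard deviation, and the `eps` term are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Illustrative only: subtract the group mean and divide by the group
    standard deviation. `eps` (an assumed constant) avoids division by
    zero when every response in the group earns the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one query: two rewarded, two not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that score above the group average get a positive advantage and are reinforced; those below get a negative one. When every response in a group earns the same reward, all advantages are zero and the group contributes no learning signal.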
The core of the reward design is gated aggregation. The preference score enters the reward only when the base correctness check passes, meaning the QA answer is correct or every rubric criterion is satisfied. This keeps a strong preference signal from masking factual errors. The efficiency penalty is anchored within each group: correct answers in the same group serve as the baseline, and a smooth penalty is applied for excess tool calls and excess generation length.
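The article gives no formulas for this reward, so the following is a minimal sketch of one way gated aggregation and intra-group anchoring could fit together. Everything specific here is an assumption: the `Rollout` fields, the weights, the use of the mean over correct rollouts as the anchor, the exponential penalty shape, and the choice to give incorrect rollouts a reward of zero.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rollout:
    correct: bool       # base check: QA answer right, or all rubric criteria met
    preference: float   # preference score, assumed to lie in [0, 1]
    tool_calls: int
    tokens: int         # generation length


def smooth_excess_penalty(value, anchor, scale):
    """Zero at or below the anchor, rising smoothly toward 1 above it."""
    excess = max(0.0, value - anchor)
    return 1.0 - math.exp(-excess / scale)


def group_rewards(group, pref_weight=0.5, eff_weight=0.2):
    """Score one group of rollouts sampled for the same prompt."""
    correct = [r for r in group if r.correct]
    if not correct:
        return [0.0] * len(group)

    # Intra-group anchor: typical cost of the correct rollouts (assumed: mean).
    anchor_calls = mean(r.tool_calls for r in correct)
    anchor_tokens = mean(r.tokens for r in correct)

    rewards = []
    for r in group:
        if not r.correct:
            # Gate closed: the preference score is ignored entirely.
            rewards.append(0.0)
            continue
        penalty = 0.5 * (
            smooth_excess_penalty(r.tool_calls, anchor_calls, max(anchor_calls, 1))
            + smooth_excess_penalty(r.tokens, anchor_tokens, max(anchor_tokens, 1))
        )
        rewards.append(1.0 + pref_weight * r.preference - eff_weight * penalty)
    return rewards


group = [
    Rollout(correct=True, preference=0.9, tool_calls=2, tokens=400),
    Rollout(correct=True, preference=0.9, tool_calls=8, tokens=1600),
    Rollout(correct=False, preference=1.0, tool_calls=1, tokens=200),
]
print(group_rewards(group))
```

In this example the third rollout has the highest preference score but earns nothing because it fails the correctness gate. The two correct rollouts differ only in cost, so the one that uses more tool calls and tokens than the group anchor receives the lower reward.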
Evaluation shows that the post-trained Qwen3.5-397B-SFT-RL performs best on multiple search benchmarks. On FRAMES with a single tool call, it scores 57.3%, which is 5.7 percentage points above GPT-5.4 and 4.7 percentage points above Sonnet 4.6. Under a moderate budget of 4 tool calls, it reaches 73.9% at a cost of 2.0 cents per query. Under the same conditions, GPT-5.4 scores 67.8% at 8.5 cents and Sonnet 4.6 scores 62.4% at 15.3 cents. Costs are calculated from each vendor's public API pricing and exclude cache optimization. (Source: BlockBeats)
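To put the 4-tool-call figures side by side, the short script below computes each model's accuracy gap and cost ratio relative to Qwen3.5-397B-SFT-RL. The input numbers are the ones reported above; the `compare` helper and its output layout are illustrative.

```python
# FRAMES results at a 4-tool-call budget: (accuracy in %, cost in cents per query)
results = {
    "Qwen3.5-397B-SFT-RL": (73.9, 2.0),
    "GPT-5.4": (67.8, 8.5),
    "Sonnet 4.6": (62.4, 15.3),
}


def compare(results, baseline):
    """Return each model's accuracy gap (points) and cost ratio vs. the baseline."""
    base_acc, base_cost = results[baseline]
    rows = {}
    for name, (acc, cost) in results.items():
        rows[name] = {
            "accuracy_gap": round(base_acc - acc, 1),  # points behind the baseline
            "cost_ratio": round(cost / base_cost, 2),  # multiple of baseline cost
        }
    return rows


for name, row in compare(results, "Qwen3.5-397B-SFT-RL").items():
    print(f"{name}: {row['accuracy_gap']} points behind, {row['cost_ratio']}x the cost")
```

By these figures, GPT-5.4 trails by 6.1 points at 4.25 times the per-query cost, and Sonnet 4.6 trails by 11.5 points at 7.65 times the cost.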