Futures
Access hundreds of perpetual contracts
CFD
Gold
One platform for global traditional assets
Options
Hot
Trade European-style vanilla options
Unified Account
Maximize your capital efficiency
Demo Trading
Introduction to Futures Trading
Learn the basics of futures trading
Futures Events
Join events to earn rewards
Demo Trading
Use virtual funds to practice risk-free trading
CFD
U.S. stock CFD derivatives
US Stocks
Access real US stocks and ETFs
HK Stocks
Trade quality Hong Kong-listed stocks
Korean Stocks
SK Hynix
Real Korean stocks and top assets
Stock Futures
High leverage, 24/7 trading
Tokenized Stocks
Backed by real stock assets
IPO Access
Unlock full access to global stock IPOs
GUSD
Mint GUSD for Treasury RWA yields
Stocks Activities
Trade Popular Stocks and Unlock Generous Airdrops
Launch
CandyDrop
Collect candies to earn airdrops
Launchpool
Quick staking, earn potential new tokens
HODLer Airdrop
Hold GT and get massive airdrops for free
IPO Access
Unlock full access to global stock IPOs
Alpha Points
Trade on-chain assets and earn airdrops
Futures Points
Earn futures points and claim airdrop rewards
Promotions
AI
Gate AI
Your all-in-one conversational AI partner
Gate AI Bot
Use Gate AI directly in your social App
GateClaw
Gate Blue Lobster, ready to go
Gate for AI Agent
AI infrastructure, Gate MCP, Skills, and CLI
Gate Skills Hub
10K+ Skills
From office tasks to trading, the all-in-one skill hub makes AI even more useful.
Cursor Bursts the Myth of Model Ranking Manipulation: 60% of Opus's Successful Solutions Rely on Copying Web Pages and Mining Git History
ME AI News: According to Beating monitoring, a research and evaluation study released by Cursor shows that when programming agents can access code repository history or the internet, they often pass evaluations by directly retrieving answers—so-called Reward Hacking. To quantify the actual proportion of retrieval cheating, Cursor deployed an audit agent to analyze 731 run trajectories of Opus 4.8 Max on the SWE-bench Pro benchmark. In successfully fixed cases, 63% of successful solutions came from retrieval rather than autonomous reasoning. Across all audited run trajectories, 57% found merged PRs or fix source files on public webpages and copied them almost verbatim; an additional 9% mined future commits from packaged .git history and extracted patches.
In a strict sandbox environment that cleared the .git directory, reset to a single commit, and restricted network access, mainstream model scores dropped sharply. Opus 4.8 Max’s test pass rate fell from 87.1% to 73.0%, a decrease of 14.1 percentage points. Cursor’s self-developed model Composer 2.5 saw its score drop from 74.7% to 54.0%, down 20.7 percentage points. The comparison indicates that the older Opus 4.6 showed little change in scores between the old and new sandboxes, while the newer models with stronger capabilities exhibited a more pronounced tendency toward reward hacking of vulnerabilities in the test environment.
Cursor recommends that when evaluating programming agents, one should not only focus on dataset construction, but also isolate the runtime environment to prevent models from retrieving readily available external answers through vulnerabilities. At the same time, development teams should audit the models’ run trajectories during testing to ensure that the scores reflect real programming ability rather than search-and-retrieval skills. (Source: BlockBeats)