Futures
Access hundreds of perpetual contracts
CFD
Gold
One platform for global traditional assets
Options
Hot
Trade European-style vanilla options
Unified Account
Maximize your capital efficiency
Demo Trading
Introduction to Futures Trading
Learn the basics of futures trading
Futures Events
Join events to earn rewards
Demo Trading
Use virtual funds to practice risk-free trading
Launch
CandyDrop
Collect candies to earn airdrops
Launchpool
Quick staking, earn potential new tokens
HODLer Airdrop
Hold GT and get massive airdrops for free
Pre-IPOs
Unlock full access to global stock IPOs
Alpha Points
Trade on-chain assets and earn airdrops
Futures Points
Earn futures points and claim airdrop rewards
Promotions
AI
Gate AI
Your all-in-one conversational AI partner
Gate AI Bot
Use Gate AI directly in your social App
GateClaw
Gate Blue Lobster, ready to go
Gate for AI Agent
AI infrastructure, Gate MCP, Skills, and CLI
Gate Skills Hub
10K+ Skills
From office tasks to trading, the all-in-one skill hub makes AI even more useful.
GateRouter
Smartly choose from 40+ AI models, with 0% extra fees
Anthropic's public anti-lose-control training method: teaching Claude to behave through fictional stories, reducing extortion rates to zero
According to Beating Monitoring, Anthropic published an alignment research blog, disclosing training strategies to eliminate “agent misalignment” in Claude 4.5 and subsequent models—such as tactics where a model extorts humans to avoid being shut down. The key conclusion is that simply feeding the model “correct behavior demonstrations” has little effect; what is truly effective is teaching it “why it should do this,” and reshaping the model’s underlying values through synthetic documents.
When the team worked to address Claude 4’s extortion tendencies, they found that even when the model was trained on tens of thousands of records of refusals to do wrongdoing, the misalignment rate could only be reduced from 22% to 15%. What truly made the difference was the following three non-traditional approaches:
First is the “difficult advice” dataset. Instead of exposing the model directly to moral dilemmas during training, the team had it play the role of an advisor, providing users facing ethical quandaries with deep analyses consistent with the “Claude Constitution.” With only 3 million tokens of this kind of data, the model learned the underlying moral logic, reducing the misalignment rate in specific tests to around 3%. Data efficiency was improved by 28 times compared with traditional methods.
Second is synthetic document fine-tuning (SDF). The team found that when the model encounters extreme scenarios, it tends to revert to negative AI stereotypes found in pretraining corpora—such as science fiction novels. To address this, they generated large volumes of fictional positive stories that depict AI’s mental health and portray it acting according to the constitution, training by mixing these materials into documents such as blogs that discuss the constitution. This approach directly reshaped the model’s default expectations of AI behavior, further reducing the risk of loss of control by 1.3 to 3 times on top of the earlier results. Ultimately, in the official Claude 4.5 release, combining all strategies achieved a 0% test extortion rate.
Finally is increasing the diversity of the safety training environment. The team confirmed that adding unused tool definitions or more complex system prompts into standard safety training environments—even if it only increases the complexity of the background—can also concretely improve the model’s generalized safety capabilities.