Claude Code ships a new /goals command: separating execution from evaluation to keep AI agents from slacking off and lying
Anthropic introduces the /goals command for Claude Code, splitting task execution and completion judgment across two independent models, on the grounds that having the same AI judge its own work is inherently flawed.
(Background: Claude Code announces a 50% increase in weekly token usage! For two months now, Anthropic has been pushing to capture the developer ecosystem.)
(Additional context: Claude Code's automated routines feature is live, supporting scheduled, API, and GitHub event triggers.)
You may have run into this situation: the AI finishes writing the code and reports the task complete, only for you to discover days later that several modules never compiled at all. The problem isn't model capability; the model simply decided it was "finished" when it wasn't.
To improve this, Anthropic has launched the /goals command for Claude Code this week. The logic is straightforward: the model executing the task and the model judging whether the task is complete must be two separate roles. The same model cannot play both, because it would always be the worst judge of its own work.
Why do AI agents "clock out early"?
The work of an AI coding agent is a loop: read files, execute commands, modify code, then determine if the task is complete. The problem lies in this final step.
The context accumulated during execution (steps completed, approaches tried, mistakes made) biases the model's view of its own progress. It tends to equate "I did a lot" with "I am finished." In enterprise environments this is costly: if a code migration or test run stops short of the final state, the gap often takes days to discover.
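To make the failure mode concrete, here is a minimal Python sketch of that loop. This is not Claude Code's actual internals; `executor_step` and its simulated bias are purely illustrative. The point is structural: the same stand-in "model" that performs the work also decides when to stop, and nothing checks its verdict.

```python
import random

def executor_step(context: list[str]) -> tuple[str, bool]:
    """Stand-in for the executor LLM: it acts, then judges its own progress.
    In reality this is a model call; here the bias is simulated."""
    action = f"step {len(context)}: read files / run commands / edit code"
    # The longer the context of "work already done" grows, the more likely
    # the model is to declare itself finished, regardless of the true state.
    claims_done = random.random() < 0.05 * len(context)
    return action, claims_done

def run_agent(task: str, max_steps: int = 50) -> str:
    context = [task]
    for _ in range(max_steps):
        action, done = executor_step(context)
        context.append(action)
        if done:  # no independent check: the executor's own verdict is final
            return f"agent claims completion after {len(context) - 1} steps"
    return "step budget exhausted"

print(run_agent("migrate the auth module"))
```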
Some partial solutions already exist in the industry. OpenAI lets the agent model decide when to stop, while allowing developers to plug in external evaluators. Google's ADK supports independent evaluation via LoopAgent, and LangGraph offers a similar pattern. These approaches share one trait: critic nodes and termination logic must be designed by the developer; the platforms provide no defaults.
One command, two models
The core design of /goals is to formally split “execution” and “evaluation” into two roles. Developers input goal conditions, for example:
```
/goal test/auth All tests in the directory pass, and lint checks are clean
```
Whenever the agent attempts to finish, the evaluation model takes over to verify. The evaluator defaults to Claude Haiku, Anthropic's lighter model. The reason for choosing a small model is simple: the evaluator only needs to deliver a binary verdict (conditions met or not), which doesn't require large-model reasoning.
If the conditions are not met, the agent continues executing; if they are, the evaluation model records the result in the conversation log and clears the goal. The entire process is handled inside Claude Code, with no need for third-party observability platforms or custom logging systems.
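A rough sketch of that control flow follows, under loud assumptions: in the real feature the verdict comes from a Haiku model call, which this sketch replaces with deterministic exit-code checks, and `run_with_goal`, `executor_works_on_task`, and the npm commands are illustrative names that assume an npm project on PATH.

```python
import subprocess

def executor_works_on_task() -> None:
    """Stand-in for one round of executor work (reads files, edits code...)."""
    print("executor: working toward the goal...")

def evaluator_says_done(checks: list[list[str]]) -> bool:
    """Stand-in for the lightweight evaluator. In the real feature this is
    a Haiku model rendering a binary verdict; here, exit-code checks."""
    return all(
        subprocess.run(cmd, capture_output=True).returncode == 0
        for cmd in checks
    )

def run_with_goal(condition: str, checks: list[list[str]],
                  max_rounds: int = 20) -> None:
    conversation_log: list[str] = []
    for round_no in range(1, max_rounds + 1):
        executor_works_on_task()
        # The executor believes it is finished; the evaluator takes over.
        if evaluator_says_done(checks):
            # Verdict: met. Record the result and clear the goal.
            conversation_log.append(f"goal met after round {round_no}: {condition}")
            return
        # Verdict: not met. Control returns to the executor, which keeps going.
    conversation_log.append("round budget exhausted; goal still unmet")

run_with_goal(
    condition="all tests in test/auth pass and lint checks are clean",
    checks=[["npm", "test"], ["npm", "run", "lint"]],
)
```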
Anthropic states that effective goal conditions usually require three elements: a measurable end state (test results, build exit codes, specific file counts); a clear verification method (e.g., "npm test exit code is 0"); and restrictions that must hold throughout the process (e.g., "do not modify other test files").
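Put together, a goal condition covering all three elements might look like the following. This is a hypothetical invocation modeled on the example above, not official syntax; the path and commands are illustrative.

```
/goal src/auth npm test exits with code 0 and the build succeeds; do not modify any test files outside test/auth
```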