DWF Deep Report: AI Outperforms Humans in Yield Optimization in DeFi, but Still Trails by 5 Times in Complex Transactions
Core Highlights
Automation and agent activity currently account for about 19% of all on-chain activity, but true end-to-end autonomy has yet to be achieved.
In narrow, well-defined use cases such as yield optimization, agents have demonstrated performance superior to humans and bots. However, for multi-faceted actions like trading, humans outperform agents.
Among agents, model selection and risk management have the greatest impact on trading performance.
As agents are adopted at scale, they face multiple trust and execution risks, including Sybil attacks, strategy congestion, and privacy trade-offs.
Agent Activity Continues to Grow
Over the past year, agent activity has steadily increased, with both trading volume and number of trades rising. Coinbase's x402 protocol has led much of this development, while players like Visa, Stripe, and Google have launched their own standards. Most of the infrastructure currently being built serves two scenarios: channels between agents, and agent calls triggered by humans.
Stablecoin transactions are widely supported, but current infrastructure still relies on traditional payment gateways as its underlying layer and so remains dependent on centralized counterparties. The fully autonomous end state, in which agents self-finance, self-execute, and continuously optimize as conditions change, has therefore not yet been realized.
Agent activity is not new to DeFi. For years, on-chain protocols have used bots to capture MEV or extract excess profits that cannot be achieved without code. These systems work very well under clearly defined parameters that change infrequently and need little additional oversight.
Markets, however, have become more complex over time. This is where the new generation of agents comes in, and in recent months on-chain activity has become a testing ground for them.
Agents’ Actual Performance
According to reports, agent activity has grown exponentially, with over 17,000 agents launched since 2025. Automation and agent activity together are estimated to account for over 19% of all on-chain activity. This is not surprising, given that over 76% of stablecoin transfers are estimated to be bot-generated. It points to enormous growth potential for agent activity within DeFi.
Agent autonomy spans a broad spectrum, from chatbots that require heavy human supervision to agents that can devise strategies and adapt them to market conditions from a stated goal. Compared to bots, agents have several key advantages, including the ability to respond to and act on new information within milliseconds, and to extend coverage across thousands of markets while maintaining a similar level of rigor.
Most current agents are still at analyst or co-pilot levels, as they are mostly in testing phases.
Yield Optimization: Agents Perform Well
Liquidity provision is a domain where automation is already common, with total value locked (TVL) held by agents exceeding $39 million. This figure mainly measures assets that users deposit directly into agents and excludes capital routed through vaults.
Giza Tech is one of the largest protocols in this space, having launched its first agent application, ARMA, at the end of last year, aimed at enhancing yield capture on major DeFi protocols. It has attracted over $19 million in managed assets and generated over $4 billion in agent trading volume.
The high ratio of trading volume to assets under management indicates that agents frequently rebalance capital, enabling higher yield capture. Once capital is deposited into the contract, execution is automated, providing users with a simple one-click experience that requires minimal supervision.
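The ratio mentioned above can be checked with simple arithmetic. The sketch below uses only the two figures quoted for ARMA; both are stated as "over", so the result is indicative rather than exact, and the function name `turnover_ratio` is introduced here purely for illustration.

```python
# Figures quoted in the article for ARMA (both given as lower bounds).
AUM_USD = 19_000_000          # managed assets
VOLUME_USD = 4_000_000_000    # cumulative agent trading volume


def turnover_ratio(volume: float, aum: float) -> float:
    """Cumulative trading volume divided by assets under management."""
    if aum <= 0:
        raise ValueError("aum must be positive")
    return volume / aum


ratio = turnover_ratio(VOLUME_USD, AUM_USD)
print(f"Volume is roughly {ratio:.0f}x managed assets")
```

A ratio in the low hundreds means each deposited dollar has, on average, been moved a couple of hundred times, which is consistent with frequent rebalancing.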
ARMA's performance is measurable and strong, generating over 9.75% annualized yield on USDC. Even after accounting for rebalancing fees and the agent's 10% performance fee, returns still surpass those of ordinary lending on Aave or Morpho. Nonetheless, scalability remains a key issue: these agents have not yet been tested in real-world conditions at anything close to the scale of major DeFi protocols.
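A rough sketch of the fee arithmetic follows. It assumes the 10% performance fee is taken as a share of gross yield, which the article does not spell out, and it treats rebalancing cost as an optional annualized fraction of capital. The article gives no rebalancing figure, so the default is zero and any non-zero value is hypothetical.

```python
def net_apy(gross_apy: float, performance_fee: float,
            rebalancing_cost: float = 0.0) -> float:
    """Annual yield left for the depositor, as a fraction.

    Assumptions (not confirmed by the article): the performance fee is a
    share of gross yield, and rebalancing cost is an annualized fraction
    of deposited capital.
    """
    if not 0.0 <= performance_fee <= 1.0:
        raise ValueError("performance_fee must be between 0 and 1")
    return gross_apy * (1.0 - performance_fee) - rebalancing_cost


GROSS_APY = 0.0975        # quoted in the article
PERFORMANCE_FEE = 0.10    # quoted in the article

net = net_apy(GROSS_APY, PERFORMANCE_FEE)
print(f"Net of performance fee: {net:.3%}")
```

Under these assumptions the depositor keeps roughly 8.8% before rebalancing costs, which is the number to compare against a plain lending rate.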
Trading: Humans Significantly Outperform
However, for more complex actions like trading, results are much more varied. Current trading models operate based on human-defined inputs and produce outputs according to preset rules. Machine learning extends this by enabling models to update their behavior based on new information without explicit reprogramming, pushing them into a co-pilot role. With fully autonomous agents joining, the trading landscape will undergo significant change.
Several competitions between agents and humans have been held, revealing large differences in model performance. Trade XYZ hosted a human vs. agent trading contest for its listed stocks. Each account started with $10,000, with no leverage or trading frequency limits. The results overwhelmingly favored humans: the top human outperformed the top agent by more than five times.
Meanwhile, Nof1 organized a model-to-model agent trading contest, pitting models like Grok-4, GPT-5, DeepSeek, Kimi, Qwen3, Claude, and Gemini against each other and testing risk configurations ranging from capital preservation to maximum leverage. Several factors emerged that help explain the performance differences:
Position Holding Time: Strongly correlated with performance; models that held each position for 2-3 hours significantly outperformed those that flipped positions frequently.
Expected Value: Measures whether a model's trades are profitable on average. Interestingly, only the top three models had positive expected value, meaning most models lost money per trade on average.
Leverage: Models running lower leverage, around 6-8x, performed better than those running over 10x, as high leverage accelerates losses.
Prompt Strategies: Monk Mode was the best-performing prompt strategy so far, while Situational Awareness performed the worst. Based on the features of each strategy, focusing on risk management and drawing on fewer external sources tends to yield better results.
Base Models: Grok 4.20 outperformed the other models by more than 22% across prompt strategies and was the only model that was profitable on average.
Other factors like long/short bias, trade size, and confidence scores lack sufficient data or have not shown any positive correlation with performance. Overall, results suggest that agents tend to perform better within clearly defined constraints, indicating that human oversight remains crucial for goal configuration.
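Two of the factors above, expected value and leverage, are easy to make concrete. The sketch below is illustrative only: the trade lists and the 5% price move are hypothetical, not contest data, and the leverage function ignores fees, funding, and maintenance margin. Expected value is an average per trade, so it is a different measure from the share of winning trades.

```python
def expected_value(pnls):
    """Average profit or loss per trade."""
    if not pnls:
        raise ValueError("need at least one trade")
    return sum(pnls) / len(pnls)


def win_rate(pnls):
    """Share of trades that closed with a profit."""
    if not pnls:
        raise ValueError("need at least one trade")
    return sum(1 for p in pnls if p > 0) / len(pnls)


def equity_after_move(leverage, price_move):
    """Fraction of margin left on a long position after a price move.

    Simplified: no fees, funding, or maintenance margin; floored at zero.
    """
    return max(0.0, 1.0 + leverage * price_move)


# Hypothetical per-trade PnL in USD.
frequent_winner = [50, 40, 60, 30, 45, 55, -200, -180, 35, -150]
rare_winner = [-20, -25, 300, -15, -30, -20, 250, -25, -20, -15]

for name, pnls in (("frequent_winner", frequent_winner),
                   ("rare_winner", rare_winner)):
    print(f"{name}: win rate {win_rate(pnls):.0%}, "
          f"EV per trade {expected_value(pnls):+.1f}")

# Same hypothetical 5% adverse move at different leverage levels.
for lev in (6, 8, 10, 20):
    print(f"{lev}x leverage: {equity_after_move(lev, -0.05):.0%} of margin left")
```

In this toy data the account that wins most of its trades still has negative expected value, because its losses are larger than its gains. The leverage loop shows the same adverse move consuming a growing share of margin as leverage rises.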
How to Evaluate Agents
Given that agents are still in early stages, there is no comprehensive evaluation framework yet. Historical performance is often used as a benchmark, but it is shaped by underlying factors that give stronger signals of an agent's effectiveness:
Performance under different volatility conditions: Includes disciplined loss control during adverse conditions, indicating agents can recognize off-chain factors affecting profitability.
Transparency and Privacy: Both have trade-offs. Transparent agents that can be actively copied may lack strategic advantage. Private agents face risks of internal extraction by creators, who can easily front-run their users.
Data Sources: The data sources an agent can access are critical for its decision-making. They should be trusted and should not leave the agent dependent on a single source.
Security: Smart contract audits and a proper custody architecture are vital to ensure backup measures are in place during black swan events.
Agents’ Next Steps
To enable large-scale adoption, much work remains on infrastructure. It comes down to key issues around trust in, and execution by, autonomous agents. Operating without safeguards, autonomous agents have already mismanaged funds in some instances.
ERC-8004, launched in January 2026, became the first on-chain registry allowing autonomous agents to discover each other, establish verifiable reputation, and collaborate securely. This is a key unlock for DeFi composability, embedding trust scores directly into smart contracts, enabling permissionless interactions between agents and protocols.
However, this does not guarantee that agents will always behave non-maliciously, as vulnerabilities such as collusion and Sybil attacks remain possible. There is therefore still significant room for improvement in insurance, security, and economic staking for agents.
As agent activity expands in DeFi, strategy congestion becomes a structural risk. Yield farms are the clearest precedent; as strategies proliferate, returns compress. The same dynamic could apply to agent trading: if many agents are trained and optimized on similar data and goals, they will tend to converge on similar positions and exit signals.
The CoinAlg paper published by Cornell University in January 2026 formalized this issue. Transparent agents can be arbitraged because their trades are predictable and can be front-run. Private agents avoid this risk but introduce different risks, such as creators retaining informational advantages over their users and extracting value through opacity.
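The yield-farm precedent for strategy congestion mentioned above can be sketched with a toy model. It assumes a fixed annual reward pool shared pro rata among depositors; the reward amount and deposit levels are hypothetical, and real farms have variable emissions and token prices that this ignores.

```python
def farm_apr(annual_rewards_usd: float, total_deposits_usd: float) -> float:
    """APR when a fixed reward pool is split pro rata across all deposits."""
    if total_deposits_usd <= 0:
        raise ValueError("total_deposits_usd must be positive")
    return annual_rewards_usd / total_deposits_usd


ANNUAL_REWARDS_USD = 1_000_000  # hypothetical fixed reward pool

# As more capital follows the same strategy, each dollar earns less.
for deposits in (5_000_000, 10_000_000, 20_000_000, 40_000_000):
    apr = farm_apr(ANNUAL_REWARDS_USD, deposits)
    print(f"{deposits:>12,} USD deposited -> {apr:.1%} APR")
```

Each doubling of capital chasing the same pool halves the return, which is the compression dynamic that could recur if many agents converge on the same positions.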
Agent activity will only accelerate. The infrastructure laid today will determine how on-chain finance evolves in its next phase. As usage increases, agents will self-iterate and become more attuned to user preferences. The key differentiator will therefore be trustworthy infrastructure, and whoever provides it will capture the largest market share.