DWF In-Depth Report: AI Outperforms Humans in Yield Optimization in DeFi, but Still Trails by 5 Times in Complex Transactions

Core Highlights

Automation and agent activities currently account for about 19% of all on-chain activity, but true end-to-end autonomy has yet to be achieved.

In narrow, well-defined use cases like yield optimization, agents have demonstrated performance superior to humans and bots. But for multi-faceted actions like trading, humans outperform agents.

Among agents, model selection and risk management have the greatest impact on trading performance.

As agents are adopted at scale, multiple trust and execution risks emerge, including Sybil attacks, strategy congestion, and privacy trade-offs.

Agent Activity Continues to Grow

Over the past year, agent activity has steadily increased, with both trading volume and number of trades rising. We see Coinbase’s x402 protocol leading significant developments, with players like Visa, Stripe, and Google also launching their own standards. Most of the infrastructure currently being built aims to serve two scenarios: channels between agents or agent calls triggered by humans.

While stablecoin trading is widely supported, current infrastructure still relies on traditional payment gateways as the underlying layer, meaning it remains dependent on centralized counterparties. Therefore, the fully autonomous endpoint—where agents can self-finance, self-execute, and continuously optimize based on changing conditions—has not yet been realized.

Automated activity is not new to DeFi. For years, on-chain protocols have used bots to capture MEV or extract excess profits that cannot be achieved without code. These systems perform very well under clearly defined parameters that do not change frequently or require additional oversight.

However, markets have become more complex over time. This is where the new generation of agents enters; in recent months, on-chain activity has become an experimental ground for such developments.

Agent Performance in Practice

According to reports, agent activity has grown exponentially, with over 17,000 agents launched since 2025. Automation and agent activity combined is estimated to account for over 19% of all on-chain activity. This is not surprising, given that over 76% of stablecoin transfers are generated by bots, and it indicates enormous growth potential for agent activity within DeFi.

Agent autonomy spans a broad spectrum—from chatbots requiring high levels of human supervision to agents capable of devising strategies that adapt to market conditions based on goal inputs. Compared to bots, agents have several key advantages, including the ability to respond and act on new information within milliseconds and to scale coverage across thousands of markets while maintaining strict standards.

Most current agents are still at analyst or co-pilot levels, as they remain in testing phases.

Yield Optimization: Agents Perform Well

Liquidity provision is a domain where automation is already common, with total TVL held by agents exceeding $39 million. This figure mainly measures assets deposited directly into agents and excludes capital routed through vaults.

Giza Tech is one of the largest protocols in this space, having launched its first agent application, ARMA, at the end of last year, aimed at enhancing yield capture on major DeFi protocols. It has attracted over $19 million in managed assets and generated over $4 billion in agent trading volume.

The high ratio of trading volume to assets under management indicates that agents frequently rebalance capital, enabling higher yield capture. Once capital is deposited into the contract, execution is automated, providing users with a simple one-click experience that requires little supervision.
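The ratio mentioned above can be made concrete with a rough calculation. The sketch below uses only the two figures quoted for ARMA; both are stated as lower bounds ("over"), and the volume is cumulative while managed assets are a point-in-time figure, so the result is an order-of-magnitude indicator rather than a precise turnover rate. The function name `turnover_ratio` is illustrative.

```python
# Figures quoted in the article; both are lower bounds ("over ...").
AUM_USD = 19_000_000              # assets managed by ARMA
AGENT_VOLUME_USD = 4_000_000_000  # agent trading volume generated


def turnover_ratio(volume_usd: float, aum_usd: float) -> float:
    """Return how many times the managed capital has been traded over."""
    if aum_usd <= 0:
        raise ValueError("aum_usd must be positive")
    return volume_usd / aum_usd


ratio = turnover_ratio(AGENT_VOLUME_USD, AUM_USD)
print(f"Volume is roughly {ratio:.0f}x managed assets")
```

On these figures, each deposited dollar has been moved a couple of hundred times, which is consistent with frequent rebalancing.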

ARMA’s performance is measurable and excellent, generating over 9.75% annualized yield on USDC. Even after accounting for rebalancing fees and a 10% performance fee for the agent, returns still surpass those of ordinary lending on Aave or Morpho. Nonetheless, scalability remains a key issue, as these agents have not yet been tested in real-world scenarios at the scale of major DeFi protocols.
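As a rough illustration of the fee arithmetic, the sketch below nets the quoted 10% performance fee out of the quoted 9.75% gross yield. It assumes the fee is charged on yield earned, which the article does not spell out. The article gives no figure for rebalancing fees, so `rebalancing_cost` is a hypothetical parameter, and the 0.25% value used in the second call is purely illustrative.

```python
GROSS_APY = 0.0975      # "over 9.75%" annualized yield on USDC, from the article
PERFORMANCE_FEE = 0.10  # 10% performance fee, from the article


def net_apy(gross_apy: float, performance_fee: float,
            rebalancing_cost: float = 0.0) -> float:
    """Yield left for the depositor.

    Assumes the performance fee is taken as a share of gross yield and that
    rebalancing costs are expressed as an annualized fraction of capital.
    """
    if not 0.0 <= performance_fee <= 1.0:
        raise ValueError("performance_fee must be between 0 and 1")
    return gross_apy * (1.0 - performance_fee) - rebalancing_cost


fee_only = net_apy(GROSS_APY, PERFORMANCE_FEE)
# Hypothetical 0.25% annual rebalancing drag, not a figure from the article.
with_cost = net_apy(GROSS_APY, PERFORMANCE_FEE, rebalancing_cost=0.0025)

print(f"Net of performance fee: {fee_only:.3%}")
print(f"Net of fee and hypothetical rebalancing cost: {with_cost:.3%}")
```

Under these assumptions the depositor keeps a little under nine percentage points of yield before rebalancing costs, which is the number to compare against a plain lending rate.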

Trading: Humans Significantly Ahead

However, for more complex actions like trading, results are much more varied. Current trading models operate based on human-defined inputs and produce outputs according to preset rules. Machine learning extends this by enabling models to update their behavior based on new information without explicit reprogramming, pushing them into a co-pilot role. With fully autonomous agents entering the scene, the trading landscape will undergo significant change.

Several competitions between agents and humans have been held, revealing large performance disparities. Trade XYZ hosted a human vs. agent trading contest for stocks listed on its platform. Each account started with $10k, with no leverage or trading frequency limits. The results overwhelmingly favored humans, with top human traders outperforming top agents by more than five times.

Meanwhile, Nof1 organized a model-to-model agent trading contest, pitting models like Grok-4, GPT-5, Deepseek, Kimi, Qwen3, Claude, and Gemini against each other, testing various risk configurations from capital preservation to maximum leverage. Several factors emerged that help explain performance differences:

Position Holding Time: Strongly correlated with performance. Models that held each position for 2-3 hours outperformed those that flipped positions frequently.

Expected Value: Measures whether a model’s average trade is profitable. Interestingly, only the top three models had positive expected value, indicating most models lose more on losing trades than they gain on winning ones.

Leverage: Models averaging 6-8x leverage performed better than those running above 10x, as high leverage accelerates losses.

Prompt Strategies: Monk Mode has been the best-performing strategy so far, while Situational Awareness performed the worst. Judging by how these strategies are defined, focusing on risk management and relying on fewer external sources tends to yield better results.

Base Models: Grok 4.20 outperformed the other models by over 22% across different prompt strategies and was the only model that was profitable on average.

Other factors like long/short bias, trade size, and confidence scores lack sufficient data or have not shown any positive correlation with performance. Overall, results suggest that agents tend to perform better within clearly defined constraints, highlighting the continued importance of human oversight in goal setting.
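Two of the factors above, expected value and leverage, come down to simple arithmetic. The sketch below shows one common way to compute them. The trade list is hypothetical and is not contest data, the function names are illustrative, and the leverage calculation ignores fees, funding, and maintenance margin.

```python
from statistics import mean


def expected_value(trade_pnls: list[float]) -> float:
    """Average profit or loss per trade."""
    if not trade_pnls:
        raise ValueError("need at least one trade")
    return mean(trade_pnls)


def expectancy(trade_pnls: list[float]) -> float:
    """Same quantity, split into win rate x average win minus loss rate x average loss."""
    wins = [p for p in trade_pnls if p > 0]
    losses = [-p for p in trade_pnls if p < 0]
    n = len(trade_pnls)
    avg_win = mean(wins) if wins else 0.0
    avg_loss = mean(losses) if losses else 0.0
    return (len(wins) / n) * avg_win - (len(losses) / n) * avg_loss


def wipeout_move(leverage: float) -> float:
    """Adverse price move (as a fraction) that erases the full margin.

    Simplification: ignores fees, funding, and maintenance margin.
    """
    if leverage <= 0:
        raise ValueError("leverage must be positive")
    return 1.0 / leverage


# Hypothetical trades: four winners and two losers, yet a negative average.
pnls = [120.0, 80.0, -150.0, 60.0, -200.0, 40.0]
print(f"Expected value per trade: {expected_value(pnls):.2f}")
print(f"Via win/loss breakdown:   {expectancy(pnls):.2f}")

for lev in (6, 8, 10):
    print(f"{lev}x leverage: margin gone after a {wipeout_move(lev):.1%} adverse move")
```

The hypothetical account wins two trades out of three and still loses money on average, because its losing trades are larger than its winning ones. The leverage loop shows why higher leverage leaves less room for error: the tolerable adverse move shrinks in proportion.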

How to Evaluate Agents

Given that agents are still in early stages, there is no comprehensive evaluation framework yet. Historical performance is often used as a benchmark, but it is shaped by underlying factors that give stronger signals of an agent's efficacy:

Performance Under Different Volatility Conditions: This includes disciplined loss control during adverse conditions, which indicates that an agent can recognize off-chain factors affecting profitability.

Transparency and Privacy: Both have trade-offs. Transparent agents that can be actively copied generally lack strategic advantage. Private agents face risks of internal extraction by creators, who can easily front-run their users.

Information Sources: The data sources an agent accesses are crucial for decision-making. It is vital that these sources are trusted and that the agent does not depend on any single one.

Security: Smart contract audits and proper custody architectures are essential so that safeguards are in place during black swan events.
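The point about avoiding dependence on a single information source can be illustrated with a small sketch. It is one possible approach and not a method described in the report: take the median of several independent quotes and refuse to act unless a majority agree. The function name, source names, and thresholds are all hypothetical.

```python
from statistics import median


def aggregate_price(quotes: dict[str, float], min_sources: int = 3,
                    max_deviation: float = 0.02) -> float:
    """Return a price backed by a majority of independent sources.

    Raises ValueError if there are too few sources or if no strict majority
    sits within max_deviation (as a fraction) of the median.
    """
    if len(quotes) < min_sources:
        raise ValueError("not enough independent sources")
    mid = median(quotes.values())
    if mid <= 0:
        raise ValueError("median price must be positive")
    agreeing = [p for p in quotes.values() if abs(p - mid) / mid <= max_deviation]
    if len(agreeing) * 2 <= len(quotes):
        raise ValueError("sources disagree too much")
    return median(agreeing)


# Hypothetical quotes: two sources agree, one is far off and gets ignored.
quotes = {"source_a": 1.0001, "source_b": 0.9999, "source_c": 1.25}
price = aggregate_price(quotes)
print(f"Aggregated price: {price:.4f}")
```

With this design a single faulty or manipulated feed cannot move the result on its own, and an agent holding only one feed fails closed instead of trading on it.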

Next Steps for Agents

To enable large-scale adoption, much work remains on infrastructure. This boils down to key issues around trust and execution. Autonomous agents operate without safeguards, and there have been instances of poor fund management.

ERC-8004, launched in January 2026, became the first on-chain registry allowing autonomous agents to discover each other, establish verifiable reputations, and collaborate securely. This is a key unlock for DeFi composability, embedding trust scores directly into smart contracts, enabling permissionless interactions between agents and protocols.

However, this does not guarantee that agents will always operate in good faith, as collusion, reputation attacks, and Sybil attacks remain possible. Significant room remains for improvement in insurance, security, and economic staking for agents.

As agent activity expands in DeFi, strategy congestion becomes a structural risk. Yield farms are the clearest precedent; as strategies proliferate, returns compress. The same dynamic could apply to agent trading: if many agents train and optimize on similar data and targets, they will tend to converge on similar positions and exit signals.

The CoinAlg paper published by Cornell University in January 2026 formalized this issue. Transparent agents are susceptible to arbitrage because their trades are predictable and can be front-run. Private agents avoid this risk but introduce different vulnerabilities, such as internal knowledge advantages retained by creators and the potential for value extraction through opacity.

Agent activity will only accelerate, and the infrastructure laid today will shape the next phase of on-chain finance. As usage increases, agents will self-iterate and become more attuned to user preferences. The key differentiator will therefore be trust, and the infrastructure that proves trustworthy will capture the largest market share.
