AI Forecasting Expert: I Still Underestimated the Speed of AI. Full "AI R&D Automation" by the End of This Year Is Genuinely Possible
The rapid advancement of artificial intelligence capabilities is catching even the most cautious forecasters off guard.
Renowned AI forecasting researcher Ajeya Cotra recently admitted that the forecast of AI progress through 2026 she published just two months ago was significantly too conservative. The trigger for this self-correction was the performance of Anthropic’s latest model, Claude Opus 4.6, on the authoritative METR benchmark. The model’s software-engineering “time horizon” has reached about 12 hours, putting it well ahead of the trajectory implied by Cotra’s prediction of roughly 24 hours by the end of 2026. In other words, AI’s real progress in software engineering is running nearly ten months ahead of her schedule.
Even more striking, Cotra has raised her probability estimate for “full automation of AI research and development.” She maintains a 10% chance that AI will fully take over research conception and implementation without human intervention by the end of this year, and explicitly states: “This is the first time I can’t find any solid extrapolations to confidently say this won’t happen very soon.” This statement has garnered widespread attention in the AI prediction community.
Cotra previously led AI-safety research funding at Coefficient Giving, one of the world’s largest funders of AI safety work. She is now affiliated with METR, a nonprofit research organization focused on evaluating AI capabilities.
Forecasts fall short: judgments from two months ago are outdated
On January 14, Cotra, extrapolating the 2019–2025 trend in which the time horizon doubled slightly less than twice per year, predicted that the 50%-success time horizon of the most advanced models would reach about 24 hours by the end of 2026, with 40 hours as her 80th-percentile estimate.
However, only about two months after that forecast, Opus 4.6 was measured at a time horizon of approximately 12 hours. On the METR task suite, of 19 software-engineering tasks estimated to take humans more than 8 hours, Opus 4.6 at least partially completed 14 and reliably solved 4. Cotra concedes that her earlier forecast, which implied that even after ten more months of progress AI agents would still be failing about half of 24-hour tasks, is “no longer credible.”
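As a rough sketch of the extrapolation behind these numbers (with illustrative parameters, not METR’s fitted values), the time horizon can be modeled as an exponential in calendar time:

```python
import math

def horizon_at(months_from_now, h0_hours, doubling_months):
    """Exponential time-horizon model: h(t) = h0 * 2**(t / doubling_time)."""
    return h0_hours * 2 ** (months_from_now / doubling_months)

def months_to_reach(target_hours, h0_hours, doubling_months):
    """Invert the model: months until the horizon hits target_hours."""
    return doubling_months * math.log2(target_hours / h0_hours)

# Illustrative: if the horizon doubles every ~7 months and a model already
# measures 12 hours, the 24-hour mark is only one doubling away.
print(months_to_reach(24, 12, 7))  # → 7.0
print(horizon_at(12, 12, 7))       # horizon one year out under the same model
```

Under a model like this, hitting 12 hours ahead of schedule shifts every later milestone earlier by the same number of months, which is why a single measurement can move an entire forecast.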
Cotra also notes that uncertainty in current time-horizon estimates has grown substantially: Opus 4.6’s 95% confidence interval spans 5.3 to 66 hours. This is partly because long tasks are scarce, human completion times are estimated rather than measured, and the benchmarks themselves are nearing saturation.
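A toy bootstrap illustrates why a handful of long tasks produces such a wide interval. Both the task data and the crude estimator below are hypothetical stand-ins for METR’s actual data and logistic fit:

```python
import random

random.seed(0)

# Hypothetical benchmark: (human-hours, model-succeeded) pairs. Illustrative
# only, not METR's data; note how few tasks sit in the long tail.
tasks = [(0.5, True), (1, True), (2, True), (4, True), (8, True),
         (8, False), (16, True), (16, False), (32, False),
         (64, True), (64, False)]

def crude_horizon(sample):
    """Crude 50% horizon: geometric mean of the longest solved and the
    shortest failed task length (a toy stand-in for METR's logistic fit)."""
    solved = [h for h, ok in sample if ok]
    failed = [h for h, ok in sample if not ok]
    if not solved:
        return min(h for h, _ in sample)
    if not failed:
        return max(solved)
    return (max(solved) * min(failed)) ** 0.5

# Bootstrap: resample the task list to see how unstable the estimate is.
estimates = sorted(
    crude_horizon([random.choice(tasks) for _ in tasks]) for _ in range(2000)
)
lo, hi = estimates[50], estimates[-51]  # rough 95% percentile interval
print(f"point={crude_horizon(tasks):.1f}h, 95% CI ~ [{lo:.1f}h, {hi:.1f}h]")
```

Because the estimate hinges on just a couple of long tasks, resampling swings it wildly, mirroring the 5.3–66 hour spread reported for Opus 4.6.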
Capability boundaries: traditional evaluation frameworks are failing
As AI agents approach or surpass tasks requiring dozens of hours, Cotra believes the very concept of a “time horizon” is being challenged.
She notes that task decomposability increases markedly with scale: a one-hour debugging task is nearly impossible to split and run in parallel; a day-long development task can be roughly divided, though with fuzzy boundaries; and a project spanning a month or more is naturally suited to being broken into parallel sub-tasks. Once AI agents can reliably complete tasks of 80 hours or more, a “manager” AI could in principle assign work to “worker” AIs running in parallel, allowing projects of essentially any size to advance continuously.
Cotra’s colleague Tom has proposed using the calendar time required for a large team to complete a task—rather than individual man-hours—as a better measure of “intrinsic difficulty.” Cotra believes that as AI enters this new scale, the “single-person time” metric may begin to grow super-exponentially, making the upper limit of software engineering capabilities by year’s end extremely difficult to estimate.
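The gap between man-hours and calendar time can be sketched with an Amdahl’s-law-style estimate. The function and its parameters below are illustrative assumptions, not figures from Cotra’s or Tom’s analysis:

```python
# Amdahl-style sketch: calendar time for a project split across parallel AI
# agents. The parallel fraction and worker count are assumed, illustrative.
def calendar_time(total_hours, parallel_fraction, workers):
    """The un-splittable portion runs sequentially; the rest is divided
    evenly across the parallel workers."""
    parallel = total_hours * parallel_fraction / workers
    serial = total_hours - total_hours * parallel_fraction
    return serial + parallel

# A 160-hour (roughly one person-month) project, 90% parallelizable,
# spread over 8 worker agents:
print(calendar_time(160, 0.9, 8))  # → 34.0
```

Under these assumed numbers, a month of single-agent work compresses to about 34 hours of calendar time, which is why “single-person time” can grow super-exponentially once decomposition works at all.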
She also admits that such large-scale task decomposition won’t work perfectly in practice—participants’ intuitive grasp of the overall context can’t be fully replaced by Jira tickets or Asana tasks. However, she believes that for a significant class of software projects, this approach “may prove surprisingly effective.”
Key milestone: AI research automation could become a reality this year
Among all predictions, Cotra’s assessment of the probability of “full automation of AI research and development” has attracted the most attention.
She defines the event as AI systems fully undertaking research conception and implementation without human involvement. In her January forecast she assigned it a 10% chance, a figure many peers in the AI forecasting field told her seemed too high. After Opus 4.6’s performance, however, she says 10% “feels reasonable again.”
Cotra remains cautious. She points out that fully automating AI R&D not only requires advanced software engineering capabilities but also breakthroughs in “research judgment” and “creativity,” areas where current AI systems still lag significantly behind human researchers. She believes the likelihood of achieving this within the next three to five years is much higher than within this year.
However, her wording has fundamentally shifted: “This is the first time I can’t find any solid extrapolation to confidently say it won’t happen very soon.”
Risk warning and disclaimer
Markets carry risk; invest with caution. This article does not constitute personal investment advice and does not take into account individual users’ investment goals, financial situation, or needs. Users should consider whether any opinions, views, or conclusions here suit their particular circumstances. Invest at your own risk.