Anthropic analyzed roughly 400k Claude Code sessions covering about 235k users and found that what decides whether AI-assisted coding succeeds is not whether the user can write code, but how deeply they understand the problem domain.

In its latest research report, Anthropic analyzed a sample of about 235k users and found that the real determinant of AI effectiveness is how well the person giving the instructions understands the problem being solved.

How an accountant can become a "specialist" in Claude's eyes

This study by Anthropic covers roughly 400k Claude Code sessions from October 2025 to April 2026.

The report establishes a five-level scale of task-specific expertise, from novice to expert. The key is how this expertise is defined, which differs from common assumptions. Put simply, it is how well you understand the problem you are trying to solve, not how good you are at coding.

The example given is straightforward: a senior engineer writing Rust for the first time is considered a novice for that task; conversely, an accountant who has never used Python, but can precisely tell Claude the accounting rules that must be satisfied and identify logical errors at month-end closing, is an expert in that task.

The numbers show how large the gap is. A novice session triggers about 5 Claude actions per prompt on average and produces around 600 words; an expert session triggers about 12 actions and produces roughly 3,200 words, more than double the actions and more than five times the output of a novice session.

Anthropic's regression analysis shows that each one-level increase in expertise is associated with roughly 9% more Claude actions and about 13% more output. The relationship remains significant after controlling for work type, task value, month, profession, and model version.
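To put the raw averages and the per-level regression figures side by side, here is a minimal Python sketch. It uses only the numbers quoted above. The assumption that the per-level effect compounds multiplicatively across the four steps from novice to expert is ours, not something the report is quoted as stating, and the function names are hypothetical.

```python
# Per-prompt averages quoted in the article.
novice = {"actions": 5, "words": 600}
expert = {"actions": 12, "words": 3200}


def raw_ratio(metric: str) -> float:
    """Expert-to-novice ratio of the raw averages for one metric."""
    return expert[metric] / novice[metric]


def compounded(per_level_gain: float, steps: int = 4) -> float:
    """Total multiplier if a per-level gain compounds over `steps` levels.

    ASSUMPTION: the effect is multiplicative per level; a five-level
    scale has four steps between novice and expert.
    """
    return (1 + per_level_gain) ** steps


print(f"raw actions ratio:      {raw_ratio('actions'):.2f}x")  # 2.40x
print(f"raw words ratio:        {raw_ratio('words'):.2f}x")    # 5.33x
print(f"regression, actions:    {compounded(0.09):.2f}x")      # 1.41x
print(f"regression, output:     {compounded(0.13):.2f}x")      # 1.63x
```

Under this assumption the controlled effect is noticeably smaller than the raw gap, which is what one would expect if part of the raw difference comes from the controlled factors (work type, task value, and so on) rather than from expertise alone.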

After mistakes, who can steer the agent back on track

Success rates sharpen the picture. Anthropic defined two success criteria: "judgment success" (a classifier reads the conversation and decides whether the goal was met) and "verification success" (which requires verifiable hard evidence, such as tests, git commits, or explicit user confirmation).
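The difference between the two criteria can be illustrated with a small Python sketch. This is not Anthropic's classifier or pipeline; the `SessionEvidence` type and its field names are hypothetical, and treating the two checks as independent of each other is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SessionEvidence:
    """Hypothetical summary of one session (field names are illustrative)."""
    classifier_says_goal_met: bool  # verdict after reading the conversation
    tests_passed: bool = False      # hard evidence: tests
    git_commit_made: bool = False   # hard evidence: a git commit
    user_confirmed: bool = False    # hard evidence: explicit user confirmation


def judgment_success(s: SessionEvidence) -> bool:
    """Softer criterion: the classifier's reading of the conversation."""
    return s.classifier_says_goal_met


def verification_success(s: SessionEvidence) -> bool:
    """Stricter criterion: at least one piece of hard evidence exists.

    ASSUMPTION: any single kind of evidence is enough.
    """
    return s.tests_passed or s.git_commit_made or s.user_confirmed


# A session can look successful to the classifier yet leave no hard evidence.
looks_done = SessionEvidence(classifier_says_goal_met=True)
print(judgment_success(looks_done), verification_success(looks_done))  # True False
```

The sketch shows why verification success rates in the article are much lower than "at least partial success" rates: the stricter criterion discards sessions that merely appear to have gone well.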

Overall, the higher the user's expertise, the more likely the session is to succeed. Most of the improvement is concentrated at the lower end of the scale: the gap from novice to intermediate is larger than the gap from intermediate to expert. Anthropic found that the verification success rate in expert-level sessions is more than twice that of novice sessions.

Even more interesting is the "post-error recovery rate." Anthropic tracked sessions that ran into trouble, meaning conversations that contained failure signals. In these sessions, the verification success rate rose from 4% for novices to 15% for experts; the share achieving at least partial success was 60% for novices and 80-81% for intermediate through expert users.

The gap in abandonment rates is also significant. When a session hits difficulties, novices abandon it outright 19% of the time (the session is judged a failure and produces no code), compared with only 5-7% at the other levels. Anthropic's interpretation: domain expertise is valuable because it lets the user steer the agent back on track when it goes astray.

This finding points to a counterintuitive conclusion: "understanding the problem" matters more than "knowing the tools." Understanding the problem lets you spot errors when Claude gives a wrong answer, specify boundary conditions precisely, and immediately correct odd decisions made by the agent.

Managers outperform software engineers; occupational differences nearly disappear

Anthropic’s data challenges another expectation: professional background is not as important as one might think.

The overall success rate is about 30% for software-related professions and around 26% for other professions. Looking only at sessions that actually produce code, the gap widens slightly, to 34% vs. 29%. But if the success criterion is relaxed to "at least partial success," the two groups are nearly equal: 89% vs. 88%.

More notably, each of the top ten professions falls within 7 percentage points of software engineers' verification success rate. Management roles even slightly outperform software engineers. Anthropic speculates that managers' habit of assigning tasks and setting specifications translates well into directing the agent.

Work patterns have also shifted quickly over the seven months. Bug-fixing sessions fell from 33% to 19%, a relative drop of more than 40%; operations work such as deployment, configuration, and pipeline execution rose from 14% to 21%; writing and data analysis roughly doubled, from 10% to 20%.
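The shifts above can be read either as percentage-point changes or as relative changes, and the two give quite different impressions. A minimal Python sketch using only the shares quoted in the article (the category labels and function names are ours):

```python
# Start and end shares of sessions, in percent, as quoted in the article.
shares = {
    "bug fixing": (33, 19),
    "operations": (14, 21),
    "writing and data analysis": (10, 20),
}


def point_change(start: int, end: int) -> int:
    """Change in percentage points."""
    return end - start


def relative_change(start: int, end: int) -> float:
    """Change relative to the starting share, in percent."""
    return (end - start) / start * 100


for name, (start, end) in shares.items():
    print(
        f"{name}: {point_change(start, end):+d} points, "
        f"{relative_change(start, end):+.0f}% relative"
    )
# bug fixing: -14 points, -42% relative
# operations: +7 points, +50% relative
# writing and data analysis: +10 points, +100% relative
```

Bug fixing lost the most in absolute terms, while writing and data analysis grew the most in relative terms.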

In other words, users are applying Claude Code to more "peripheral programming tasks," not just coding itself.

The economic value of tasks has risen in tandem. Anthropic estimates the market value of each session from freelance project rates; the average rose about 27% over the seven months, with building tasks up about 43%, operations tasks about 34%, and fixing tasks about 32%.

At the end of the report, Anthropic offers a memorable framing: the benefits come from "competence, not mastery," that is, from adequate proficiency rather than deep expertise.

A basic-to-intermediate understanding of a domain captures most of the benefit; from intermediate to expert, the success-rate curve flattens markedly.

As AI tools continue to expand, what they amplify is not coding skill but the depth of your understanding of the problem. Those who do not understand what they are trying to solve will only get lost faster, even with more powerful models.
