Vibe coding alone won't make you an expert! Anthropic reveals the truth: expertise matters more than writing code

The report points out that when using artificial intelligence to write programs, domain knowledge and the ability to validate results are more critical than coding skills. Professional judgment and the ability to frame the problem can significantly improve the success rate of tasks.

On June 16, Anthropic released a research report titled "Agentic coding and persistent returns to expertise" (agentic coding refers to "delegated programming," where you give instructions, and AI reads files and executes commands to complete the task).

The report analyzes approximately 235k user interactions with Claude Code, about 400k dialogues from October 2025 to April 2026. It aims to answer a question many are anxious about: can people without formal programming training effectively command AI to perform complex technical work?

The answer is yes, but what is truly noteworthy is the report's conclusion: whether you can code may matter less than whether you understand the task at hand.

"Everyone can code" is only half true

Over the past year, "vibe coding" (coding based on intuition—describing what you want in natural language, and AI generates working code without you needing to understand every line) has become popular among developers. Riding this trend, the most common narrative is: the barrier to coding has been flattened, and everyone is an engineer.

Who benefits most from this narrative? For AI tool vendors, and for bosses who would rather not hire engineers, it is a good story. But the report's data points to a more sober version.

Anthropic evaluates each dialogue and, based on the verbatim transcript, rates the user's expertise on a scale from "novice" to "expert." Note that this expertise level is task-specific and has nothing to do with job titles or intelligence.

The report gives a key example: an accountant who has never used Python, as long as they can clearly tell Claude how to set reconciliation rules and can identify boundary errors missed by AI during month-end closing, is an expert in that task; conversely, a senior engineer asking about Rust for the first time is a novice.

In other words, the "expertise" discussed here isn't about coding ability but about how well you understand the problem itself. This is why reducing the report to "anyone can replace engineers" is a misreading: domain knowledge is itself a form of professional judgment that takes years to accumulate. It hasn't disappeared; it has simply become the bottleneck.

What the division of labor looks like: you set the questions, AI answers

The clearest diagram in the report shows the decision-making division between humans and AI. Anthropic breaks down each decision into "planning" (what to do, which method to use, how to determine completion) and "execution" (which files to modify, what code to write, which language to use). The result: on average, humans handle about 70% of planning decisions, while Claude handles about 80% of execution decisions.

Image source: Anthropic

In plain terms, humans are responsible for setting questions and validation, while AI handles the hands-on work. Moreover, the more experienced the user, the more this division leans toward "hands-off": the report finds that novices’ commands trigger about 5 actions and produce roughly 600 words; experts’ commands trigger about 12 actions and generate around 3,200 words. Skilled users are willing to delegate larger chunks because they know how to describe and how to validate.

Image source: Anthropic

This is the first counterintuitive insight of the report: the stronger the AI, the greater the leverage for experts, not the smaller.

The real gap lies in success rates

Anthropic measures whether a dialogue was successful in two ways. The looser standard is "at least partial success"; the stricter is "verified success" (meaning that the AI judges the task complete and there are also git commits, passing tests, or explicit user confirmation as tangible evidence).
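The two standards can be read as a simple rule. The sketch below is illustrative only: the function and parameter names are hypothetical, not Anthropic's actual schema, and in the report these judgments come from a model reading transcripts rather than from explicit flags.

```python
def at_least_partial_success(judged_complete: bool, judged_partial: bool) -> bool:
    """Looser standard: the task was judged at least partly accomplished."""
    return judged_complete or judged_partial


def verified_success(
    judged_complete: bool,
    has_git_commit: bool = False,
    tests_passed: bool = False,
    user_confirmed: bool = False,
) -> bool:
    """Stricter standard: judged complete AND backed by tangible evidence."""
    # Any one piece of evidence is enough, but the AI's judgment alone is not.
    evidence = has_git_commit or tests_passed or user_confirmed
    return judged_complete and evidence
```

Under this reading, a dialogue the AI considers finished but that leaves no commit, no passing test, and no user confirmation counts toward the looser figure only, which is one reason the two success rates differ so much.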

Under the strictest standard: verified success rate for novices is only 15%, while intermediate to expert levels range from 28% to 33%. Under the looser standard: novices achieve 77%, and intermediate or higher levels reach 91% to 92%.

Image source: Anthropic

But there’s a detail the report emphasizes: most gains are concentrated in moving from "novice to intermediate." The curve flattens when progressing from intermediate to expert. According to the report, as long as someone has a basic grasp of a domain and can get started, they can reap most benefits; deep mastery only adds a small margin.

The gap also shows in who can hold on when stuck. When encountering issues (errors, failed tests, repeated attempts), 19% of novices give up without writing a single line of code; for other users the dropout rate is only 5% to 7%. The report interprets this as follows: the ability to steer AI back on track is itself a form of expertise.

An underestimated finding: the difference in professional background is much smaller than you might think

If coding background were truly critical, software engineers should significantly outperform others. But the data says otherwise.

In dialogues that generate code, verified success rates are about 34% for software-related professions and about 29% for others, a difference of only 5 percentage points, and this gap has neither widened nor narrowed over seven months.

The report analyzed the top ten professions in the data, and the success rate of each is within 7 percentage points of that of software engineers. Even more counterintuitive, managerial roles have verified success rates slightly higher than software engineers.

The report offers two possible explanations: one, managers' skills in "directing, delegating, defining tasks" can transfer to commanding AI; two, measurement bias—since verified success partly depends on users explicitly confirming "yes, that’s right" during dialogue, and managers may be more accustomed to articulating instructions clearly.

Another notable change over these seven months is that the proportion of dialogues spent debugging (troubleshooting, fixing broken code) dropped from 33% to 19%, a decline of more than two fifths; meanwhile, operating software (deployment, configuration, running the software) increased from 14% to 21%, and writing or data analysis roughly doubled from about 10% to 20%.
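A change in share can be read either in percentage points or as a relative change, and the two are easy to confuse. The short calculation below is a minimal sketch that uses only the figures quoted above; the helper name `relative_change` is an assumption made for illustration, and the writing/data-analysis figures are approximate in the source.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Share of dialogues (percent) at the start and end of the seven months,
# taken from the figures quoted in the article.
shares = {
    "debugging": (33, 19),
    "operating software": (14, 21),
    "writing or data analysis": (10, 20),  # approximate in the source
}

for activity, (before, after) in shares.items():
    points = after - before                # change in percentage points
    rel = relative_change(before, after)   # change relative to the start
    print(f"{activity}: {points:+d} points, {rel:+.0%} relative")
```

Read this way, debugging fell by 14 points (about 42% in relative terms), operating software rose by 7 points (50%), and writing or data analysis rose by about 10 points (roughly 100%).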

The report estimates the value of each task by comparing it to the freelance market (noting this is a relative comparison, not an exact dollar amount). The result: the average task value increased by about 27% during this period (the report summary mentions roughly 25%).

What the report doesn't say, but is worth considering

The report openly admits its limitations: it cannot see real-world outcomes—whether the code generated in a dialogue is ultimately used; it also excludes "non-interactive" uses (such as embedding Claude Code into automation workflows), which constitute a significant portion. All classifications are based on the model’s interpretation of verbatim transcripts. So, this is an "early snapshot," not a definitive conclusion.

More importantly for knowledge workers, the report closes with a signal worth watching: if the return on domain knowledge begins to decline someday, it would indicate that models are starting to supply the kind of judgment that users currently have to bring themselves.

The implication of this report is: you don’t need to panic about "not knowing how to code" and rush to take a programming course. A smarter investment is to deepen your expertise in your current field and clarify what "correct" means.

Think through the problem first, then let AI accelerate; validate first, then dare to delegate.

  • This article is reprinted with permission from 《Digital Age》
  • Original title: 《不能只懂寫Code!Anthropic揭Vibe coding真相:比起coding,「本業知識」才是最大槓桿》
  • Original author: 李先泰