V4-Pro Internal Evaluation: Coding pass rate approaches Opus 4.5, and 52% of internal testers endorse it as the default model

According to Beating monitoring, V4 has made a rare public disclosure of its internal dogfooding data.
The team collected about 200 real R&D tasks from more than 50 engineers, covering feature development, bug fixes, refactoring, and diagnostics, across a tech stack that includes PyTorch, CUDA, Rust, and C++. After strict screening, 30 tasks were retained as the evaluation set.

V4-Pro-Max has a pass rate of 67%, well above Sonnet 4.5’s 47% and close to Opus 4.5’s 70%, but below Opus 4.5 Thinking’s 73% and Opus 4.6 Thinking’s 80%.
Haiku 4.5’s pass rate is only 13%.
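
Because the evaluation set holds only 30 tasks, each percentage corresponds to a small whole number of tasks. The sketch below converts the reported pass rates into implied task counts. It assumes every model was scored on the same 30 retained tasks, which the text does not state explicitly; the names `TASKS`, `reported`, and `implied_passes` are illustrative only.

```python
# Assumption: every model was scored on the same 30 retained tasks.
TASKS = 30

# Pass rates (percent) as reported in the text.
reported = {
    "Opus 4.6 Thinking": 80,
    "Opus 4.5 Thinking": 73,
    "Opus 4.5": 70,
    "V4-Pro-Max": 67,
    "Sonnet 4.5": 47,
    "Haiku 4.5": 13,
}


def implied_passes(percent, tasks=TASKS):
    """Return the whole number of passed tasks closest to the reported rate."""
    return round(percent * tasks / 100)


counts = {model: implied_passes(pct) for model, pct in reported.items()}

for model, passed in counts.items():
    # Re-derive the percentage from the count to check it matches the report.
    print(f"{model}: {passed}/{TASKS} tasks ({passed / TASKS:.0%})")
```

Under that assumption, V4-Pro-Max's 67% corresponds to 20 of 30 tasks and Opus 4.5's 70% to 21 of 30, so the gap between them is a single task.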

In an internal survey (N=85), all respondents reported using V4-Pro for agentic coding in their daily work.
52% said V4-Pro can serve as their default primary coding model, 39% leaned toward agreeing, and fewer than 9% disagreed.
The main issues raised were basic mistakes, misreadings of vague prompts, and occasional overthinking.
