V4-Pro Internal Evaluation: Coding pass rate approaches Opus 4.5, and 52% of internal testers endorse it as the default model.

ME News reported on April 24 (UTC+8) that, according to Dongcha Beating monitoring, the V4 team made a rare disclosure of internal dogfooding data.
The team collected approximately 200 real R&D tasks from more than 50 engineers. The tasks span feature development, bug fixing, refactoring, and diagnostics, across a tech stack that includes PyTorch, CUDA, Rust, and C++. After strict screening, 30 tasks were retained as the evaluation set.
V4-Pro-Max achieved a pass rate of 67%, well above Sonnet 4.5 (47%) and close to Opus 4.5 (70%), but below Opus 4.5 Thinking (73%) and Opus 4.6 Thinking (80%). Haiku 4.5 passed only 13%.
In an internal survey (N=85), all respondents reported using V4-Pro for agentic coding in their daily work.
52% agreed that V4-Pro could serve as their default primary coding model, 39% leaned toward agreeing, and fewer than 9% disagreed.
The main issues respondents reported were low-level errors, misinterpretation of ambiguous prompts, and occasional overthinking.
(Source: BlockBeats)