V4-Pro scores 3206 on Codeforces, overtaking GPT-5.4 to top the chart, but still trails Opus in long context and Gemini in knowledge

ME News, April 24 (UTC+8): According to monitoring by Dongcha Beating, the V4 technical report compares DeepSeek-V4-Pro-Max (the highest reasoning mode) with closed-source flagships. The comparison group comprises Opus 4.6 Max, GPT-5.4 xHigh, and Gemini 3.1 Pro High, plus the open-source Kimi K2.6 and GLM-5.1. It excludes the recently released Opus 4.7 and GPT-5.5.

In coding, V4-Pro-Max scored 3206 on Codeforces, ahead of GPT-5.4 (3168) and Gemini 3.1 Pro (3052), setting a new record on that benchmark. Its LiveCodeBench score of 93.5 is also the highest in the comparison. On SWE Verified it scored 80.6, only 0.2 percentage points behind Opus 4.6 (80.8).

In long context, V4-Pro-Max ranks second on both 1M benchmarks. On CorpusQA 1M it scored 62.0, trailing Opus 4.6 (71.7) but leading Gemini 3.1 Pro (53.8). On MRCR 1M it scored 83.5, with Opus 4.6 ahead by 9.4 percentage points at 92.9.

In agent tasks, its MCPAtlas Public score of 73.6 is only slightly below Opus 4.6 (73.8). On Terminal-Bench 2.0 it scored 67.9, below GPT-5.4 (75.1) and Gemini 3.1 Pro (68.5).

In knowledge and reasoning, V4-Pro-Max still shows significant gaps to Gemini: GPQA Diamond 90.1 (Gemini 94.3), SimpleQA-Verified 57.9 (Gemini 75.6), and HLE 37.7 (Gemini 44.4).

As an open-source model, V4-Pro-Max has for the first time matched or even exceeded closed-source flagships on multiple coding and long-context benchmarks, but it still lags behind Gemini 3.1 Pro on knowledge-intensive evaluations. Because the comparison omits the recently released GPT-5.5 and Opus 4.7, the gap between V4 and the latest generation of closed-source models awaits third-party verification. (Source: BlockBeats)
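For readers who want to check the gaps quoted above, the following is a minimal Python sketch that recomputes them from the scores given in this report. The names `scores` and `gaps` are illustrative only and are not taken from the V4 technical report; each row pairs V4-Pro-Max with the rival the article compares it against on that benchmark.

```python
# Scores exactly as quoted in the article. Each entry maps a benchmark to
# (V4-Pro-Max score, rival named in the article, rival score).
# Codeforces is a rating; the other rows are benchmark scores in points.
scores = {
    "Codeforces": (3206, "GPT-5.4", 3168),
    "SWE Verified": (80.6, "Opus 4.6", 80.8),
    "CorpusQA 1M": (62.0, "Opus 4.6", 71.7),
    "MRCR 1M": (83.5, "Opus 4.6", 92.9),
    "MCPAtlas Public": (73.6, "Opus 4.6", 73.8),
    "Terminal-Bench 2.0": (67.9, "GPT-5.4", 75.1),
    "GPQA Diamond": (90.1, "Gemini 3.1 Pro", 94.3),
    "SimpleQA-Verified": (57.9, "Gemini 3.1 Pro", 75.6),
    "HLE": (37.7, "Gemini 3.1 Pro", 44.4),
}


def gaps(table):
    """Return V4-Pro-Max minus rival for each benchmark, rounded to 0.1."""
    return {name: round(v4 - rival, 1) for name, (v4, _, rival) in table.items()}


GAPS = gaps(scores)

# Positive values mean V4-Pro-Max leads; negative values mean it trails.
for name, gap in GAPS.items():
    rival = scores[name][1]
    print(f"{name}: {gap:+.1f} vs {rival}")
```

Under these figures, the only positive gap in the table is Codeforces, and the largest deficit is on SimpleQA-Verified.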