SOOHAK Benchmark Reveals AI Model Flaws: Ability to Identify Unsolvable Math Problems Does Not Exceed 50%

AIMPACT News, May 17 (UTC+8): SOOHAK, a new AI math benchmark created by a group of 64 mathematicians, exposes critical weaknesses in AI models. The benchmark comprises 439 hand-written problems, 99 of which are deliberately designed to be unsolvable. Google Gemini 3 Pro leads on research-level questions with 30% accuracy, but no model identifies the unsolvable problems with more than 50% accuracy. The researchers find that additional compute improves a model’s ability to solve problems but does not improve its ability to recognize when a problem is unsolvable. SOOHAK aims to quantify the gap between AI’s few standout achievements and the broad research skills that these systems still lack.