SOOHAK Benchmark Reveals AI Model Flaws: No Model Identifies Unsolvable Math Problems with More Than 50% Accuracy


AIMPACT News, May 17 (UTC+8) — SOOHAK, a new AI math benchmark created by an alliance of 64 mathematicians, reveals critical flaws in AI models. The test comprises 439 hand-written tasks, 99 of which are deliberately designed to be unsolvable. Google Gemini 3 Pro leads on research-level problems with 30% accuracy, but no model identifies the unsolvable tasks with more than 50% accuracy. The research shows that additional computational resources improve a model's problem-solving ability but do not improve its ability to recognize unsolvable problems. SOOHAK aims to quantify the gap between AI's few standout achievements and the broad research skills these systems still lack.
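To make the headline metric concrete, here is a minimal, hypothetical sketch of how an unsolvability-detection score of this kind could be computed. The task counts mirror the article (439 tasks, 99 unsolvable), but the model predictions are randomly generated stand-ins, not actual SOOHAK data or methodology.

```python
import random

random.seed(0)

TOTAL_TASKS = 439   # total hand-written tasks in the benchmark
UNSOLVABLE = 99     # tasks deliberately designed to be unsolvable

# True labels: 99 unsolvable (True), the rest solvable (False).
labels = [True] * UNSOLVABLE + [False] * (TOTAL_TASKS - UNSOLVABLE)

# A toy "model" that flags each task as unsolvable at random --
# purely illustrative, not any real model's behavior.
predictions = [random.random() < 0.5 for _ in labels]

# Detection rate on the unsolvable subset only, matching the
# article's "identify unsolvable tasks" metric.
flagged = sum(1 for p, y in zip(predictions, labels) if y and p)
detection_rate = flagged / UNSOLVABLE
print(f"unsolvable tasks correctly flagged: {detection_rate:.1%}")
```

A random-guessing baseline like this hovers around 50%, which is why the article's finding that no model exceeds that threshold suggests the models are not meaningfully detecting unsolvability.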
