Forget AGI—Top AI Models Still Struggle With Math
In brief
MATHVISTA, built with more than 6,000 annotated data points from Sahara AI, tests AI models on multimodal math reasoning.
GPT-4V scored 49.9%, the highest result among 12 models tested, but still 10.4 percentage points below human performance.
Researchers say progress toward AGI m
Decrypt·03-18 12:10
