Can AI replace financial analysts? Every model stumbles on Vals AI's new benchmark, with GPT 5.5 barely clearing 50% accuracy

AIMPACT News, May 14 (UTC+8): According to Beating Monitoring, AI evaluation organization Vals AI has released Finance Agent v2, the second generation of its financial agent benchmark.
It is an end-to-end test that simulates the workflow of a junior financial analyst and comprises 927 expert-reviewed questions.
The new test is markedly harder: GPT 5.5 tops the leaderboard with only 51.76% accuracy, closely followed by Claude Opus 4.7 (51.51%) and Claude Sonnet 4.6 (51.03%).
Unlike single-turn Q&A, the test requires models to locate the relevant passages on their own within hundreds of pages of 10-K and 10-Q filings, handle cross-year financial statement adjustments, and carry out multi-step calculations in which every intermediate figure must be precise.
Vals AI revealed that under a strict “must be completely correct” scoring standard, the accuracy of every frontier model drops below 40%; in the two hardest categories, “financial modeling” and “precedent analysis,” the highest score is only 23%.
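To see why a “must be completely correct” standard can pull scores down so sharply, consider the minimal sketch below. It does not reproduce Vals AI's grading pipeline, which is not described here. It assumes, purely for illustration, that each answer is graded against several rubric checks, and it compares a lenient score that gives partial credit with a strict score that counts only fully correct answers. The names GradedAnswer, lenient_accuracy and strict_accuracy, and the sample grades, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GradedAnswer:
    """One graded question: how many rubric checks the answer passed (hypothetical structure)."""
    checks_passed: int
    checks_total: int


def lenient_accuracy(answers):
    """Mean fraction of rubric checks passed, so partial credit counts."""
    return sum(a.checks_passed / a.checks_total for a in answers) / len(answers)


def strict_accuracy(answers):
    """Share of questions where every rubric check passed, with no partial credit."""
    fully_correct = sum(1 for a in answers if a.checks_passed == a.checks_total)
    return fully_correct / len(answers)


# Made-up grades for four questions; not taken from the benchmark.
sample = [
    GradedAnswer(4, 4),
    GradedAnswer(3, 4),
    GradedAnswer(2, 4),
    GradedAnswer(1, 4),
]

print(f"lenient: {lenient_accuracy(sample):.1%}")
print(f"strict:  {strict_accuracy(sample):.1%}")
```

On these made-up grades the lenient score is 62.5% while the strict score is 25.0%: a single slip in a multi-step calculation is enough to turn a mostly right answer into a miss.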
Among the other models, Kimi K2.6 ranks fifth with 44.87%, making it the highest-scoring Chinese model, followed by GLM 5.1 (44.79%) and DeepSeek V4 (44.08%).
Vals AI also gave its “Fastest Speed” label to Claude Opus 4.7 (360 seconds per query) and its “Most Cost-Effective” label to GLM 5.1 ($0.62 per query).
The across-the-board drop in scores (Opus 4.7 scored 64.4% on the previous-generation test) underscores one point: current AI can handle simple retrieval tasks, but in the deep end of finance, where the work is complex, demands high precision and must follow industry conventions, it still cannot replace human analysts.
(Source: BlockBeats)