Can AI replace financial analysts? Frontier models stumble on Vals AI's new benchmark, with GPT 5.5 barely topping 50% accuracy


According to Beating Monitoring, the AI evaluation organization Vals AI has released Finance Agent v2, the second generation of its Finance Agent benchmark.
It is an end-to-end test that simulates the workflow of a junior financial analyst and comprises 927 expert-reviewed questions.
The new test is markedly harder: GPT 5.5 tops the leaderboard with only 51.76% accuracy, closely followed by Claude Opus 4.7 (51.51%) and Claude Sonnet 4.6 (51.03%).

Unlike single-turn Q&A, this test requires models to independently locate the relevant passages within hundreds of pages of 10-K and 10-Q filings, handle cross-year financial statement adjustments, and perform multi-step calculations with precise intermediate figures.
Vals AI said that under a strict “must be completely correct” scoring standard, every frontier model's accuracy drops below 40%; in the two hardest categories, “financial modeling” and “precedent analysis,” the highest scores reach only 23%.
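
To illustrate why a strict standard can pull scores down so sharply, here is a minimal sketch. The article does not describe how Vals AI actually grades answers, so the per-question rubric checks, the `GradedAnswer` type, and both scoring functions are hypothetical assumptions used only to contrast partial credit with an all-or-nothing rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradedAnswer:
    # Hypothetical rubric: one boolean per check the answer had to satisfy
    # (e.g. right filing, right line item, right final number).
    checks: List[bool]


def partial_credit(answers: List[GradedAnswer]) -> float:
    """Average fraction of rubric checks passed per question (assumed lenient rule)."""
    return sum(sum(a.checks) / len(a.checks) for a in answers) / len(answers)


def strict_accuracy(answers: List[GradedAnswer]) -> float:
    """Fraction of questions where every check passed ("must be completely correct")."""
    return sum(all(a.checks) for a in answers) / len(answers)


# Made-up grades for four questions, not real benchmark data.
answers = [
    GradedAnswer([True, True, True, True]),
    GradedAnswer([True, True, True, False]),
    GradedAnswer([True, False, False, False]),
    GradedAnswer([False, False, False, False]),
]

print(partial_credit(answers))   # 0.5
print(strict_accuracy(answers))  # 0.25
```

In this toy data, an answer that gets three of four steps right still counts for 0.75 under partial credit but for nothing under the strict rule, which is the kind of gap the article describes.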

Among other models, Kimi K2.6 ranks fifth with 44.87%, making it the highest-scoring Chinese model, followed by GLM 5.1 (44.79%) and DeepSeek V4 (44.08%).
Vals AI also gave Claude Opus 4.7 its “Fastest Speed” label (360 seconds per query) and GLM 5.1 its “Most Cost-Effective” label ($0.62 per query).

The across-the-board drop in scores (Opus 4.7 scored 64.4% on the previous-generation test) makes one point clear:
Current AI can handle simple retrieval tasks, but in complex, high-precision financial work that demands adherence to industry conventions and exact numerical accuracy, it still cannot replace human analysts.
