General-purpose multimodal agents operating in semantic space sound advanced, but high latency makes them useless.

MeNews
NUS team releases GameWorld benchmark, evaluating multimodal AI agents across 34 browser games
The NUS team released the GameWorld benchmark, which comprises 34 browser games and 170 tasks, each with verifiable metrics for objective evaluation. The benchmark tests two types of agent interfaces: computer-use agents, which issue keyboard and mouse commands directly, and general-purpose multimodal agents, which operate in semantic space. Results across 18 model-interface combinations show that even the best performers fall far short of human players, with open challenges in real-time latency, sensitivity to context memory, and action effectiveness. The paper and code are publicly available on HuggingFace and GitHub.