Semantic action space versus direct keyboard and mouse control: comparing the two agent interfaces and looking at the data.

MeNews
NUS team releases GameWorld benchmark, evaluating multimodal AI agents across 34 browser games
ME News report, April 17 (UTC+8): According to Beating Monitoring, a team from the National University of Singapore (NUS) has released GameWorld, a benchmark designed to standardize the evaluation of multimodal large language models (MLLMs) as general-purpose agents in video games. The study notes that although video games offer an ideal closed-loop testbed for interaction, existing evaluations are often limited by inconsistent control interfaces and manual, heuristic validation. GameWorld comprises 34 diverse browser games and 170 tasks, and each task has a verifiable metric computed from the underlying game state, so results can be scored objectively. The research team tested two types of agent interface: a "computer-use" agent that directly outputs keyboard and mouse commands, and an agent that acts within a semantic action space, with its outputs converted through semantic parsing.
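To make the difference between the two interfaces concrete, here is a minimal Python sketch. It is illustrative only and is not taken from GameWorld: the `KeyEvent` type, the `SEMANTIC_ACTIONS` table, the key names, and the `score` field in the game state are all assumptions made up for this example. It shows one plausible way a named semantic action could be expanded into the same low-level key events a computer-use agent would emit directly, and how a task could be checked against game state rather than by manual inspection.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class KeyEvent:
    """Hypothetical low-level event, as a computer-use agent might emit directly."""
    key: str
    action: str  # "down" or "up"


# Hypothetical semantic action table for one imagined game (not from the benchmark).
SEMANTIC_ACTIONS: Dict[str, List[str]] = {
    "move_left": ["ArrowLeft"],
    "move_right": ["ArrowRight"],
    "jump": ["Space"],
}


def parse_semantic_action(name: str) -> List[KeyEvent]:
    """Deterministically expand a semantic action into key press/release events."""
    if name not in SEMANTIC_ACTIONS:
        raise ValueError(f"unknown semantic action: {name}")
    events: List[KeyEvent] = []
    for key in SEMANTIC_ACTIONS[name]:
        events.append(KeyEvent(key, "down"))
        events.append(KeyEvent(key, "up"))
    return events


def task_succeeded(game_state: Dict[str, int], target_score: int) -> bool:
    """State-based check: read the (assumed) score field instead of judging screenshots."""
    return game_state.get("score", 0) >= target_score


if __name__ == "__main__":
    # Semantic interface: the agent names an action and the parser produces the events.
    print(parse_semantic_action("jump"))
    # Verification: compare the assumed game state against the task target.
    print(task_succeeded({"score": 120}, target_score=100))
```

In this sketch the semantic agent only has to choose among a few named actions, while a computer-use agent would have to produce the `KeyEvent` sequence itself; how GameWorld actually defines its action spaces and metrics is not described in this report.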