Semantic-space operations vs. direct keyboard and mouse control: two interface design approaches, and both are quite interesting. The latter is more practical, but the former is more versatile.

MeNews
NUS team releases GameWorld benchmark, evaluating multimodal AI agents across 34 browser games
The NUS team has released the GameWorld benchmark, which includes 34 browser games and 170 tasks and is equipped with verifiable metrics for objective evaluation. It tests two types of agent interfaces: computer-use agents, which issue direct keyboard and mouse commands, and general multimodal agents, which operate in semantic space. Empirical results on 18 model-interface combinations show that even the best performers fall far short of humans, with challenges remaining in real-time latency, sensitivity to context memory, and action effectiveness. The related paper and code are publicly available on HuggingFace and GitHub.
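
To make the difference between the two interface styles concrete, here is a minimal Python sketch. It is not GameWorld's actual API: the names `KeyPress`, `MouseClick`, `SEMANTIC_ACTIONS`, and `lower_semantic_action` are hypothetical, and the key bindings are invented for illustration. It assumes that a semantic action is a named, game-level move that gets translated into raw input events, whereas a computer-use agent would emit those raw events directly.

```python
from dataclasses import dataclass


# Low-level events, the kind a computer-use agent would emit directly.
@dataclass(frozen=True)
class KeyPress:
    key: str


@dataclass(frozen=True)
class MouseClick:
    x: int
    y: int


# Hypothetical per-game table: each semantic action name expands to a
# fixed sequence of low-level events. These bindings are made up.
SEMANTIC_ACTIONS = {
    "move_left": [KeyPress("ArrowLeft")],
    "jump": [KeyPress("Space")],
    "jump_left": [KeyPress("ArrowLeft"), KeyPress("Space")],
}


def lower_semantic_action(name):
    """Translate a semantic action name into low-level input events."""
    if name not in SEMANTIC_ACTIONS:
        raise ValueError(f"unknown semantic action: {name!r}")
    # Return a copy so callers cannot mutate the shared table.
    return list(SEMANTIC_ACTIONS[name])


if __name__ == "__main__":
    # Semantic-space agent: picks a named action, the table does the rest.
    print(lower_semantic_action("jump_left"))
    # Computer-use agent: chooses the raw events itself.
    print([KeyPress("ArrowLeft"), MouseClick(120, 240)])
```

Under these assumptions, the semantic route keeps the agent's output small and checkable at the cost of a hand-written table per game, while the raw route needs no table but leaves the agent to work out which keys and coordinates matter.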