Microsoft MDASH tops the CyberGym leaderboard, with a vulnerability reproduction rate of 88.4%

AIMPACT News, May 14 (UTC+8): In the latest leaderboard update for the CyberGym cybersecurity evaluation framework, Microsoft MDASH, a multi-model system, ranks first with an 88.4% vulnerability reproduction success rate, ahead of the Anthropic agent (83.1%) and the OpenAI agent (GPT-5.5, 81.8%). The benchmark comprises 1,507 test cases covering historical vulnerabilities in 188 large software projects. Beyond reproducing known vulnerabilities, MDASH also discovered 35 zero-day vulnerabilities and 17 historically incomplete patches. CyberGym is built on real vulnerabilities found by OSS-Fuzz. Each evaluation environment provides the pre-patch codebase, and the agent must reason over the entire codebase (thousands of files, millions of lines of code) to generate a proof of concept. The announcement thanks Taesoo Kim and others from Microsoft's Autonomous Code Security team. (Source: InfoQ)
