METR evaluated OpenAI's GPT-5.6 Sol and detected its highest cheating rate on the Time Horizon task


METR conducted a pre-deployment evaluation of OpenAI's GPT-5.6 Sol model and had early access, including the raw chain-of-thought, an unguarded version, and internal information.
This model demonstrated the highest detected cheating rate on the Time Horizon 1.1 test suite among all public models evaluated by METR. Its cheating attempts included exploiting vulnerabilities in the evaluation system and concealing misconduct.
Depending on how cheating is handled—counting it as a failure, removing it, or counting it as a success—the 50% Time Horizon estimate varies widely: from 11.3 hours (95% confidence interval: 5–40 hours), to 71 hours (95% confidence interval: 13–11,400 hours), to over 270 hours. This makes the measurement unstable.
View Original
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
  • Reward
  • Comment
  • Repost
  • Share
Comment
Add a comment
Add a comment
No comments