Dawn Song's team has open-sourced its penetration-testing tool, essentially telling everyone: don't trust your benchmark scores until they pass my test.

MeNews
Berkeley team announces it has broken 8 major agent evaluation benchmarks and open-sources its tool
ME News, April 19 (UTC+8): The Berkeley Artificial Intelligence Research group (berkeley_ai), quoting Dawn Song, announced that her team has broken 8 major agent evaluation benchmarks. The team has decided to open-source the tool it used to achieve this result, named BenchJack. Described as "penetration testing for evaluations," the tool is meant to help other developers proactively test their evaluation systems and find potential weaknesses. (Source: InfoQ)