OpenAI's strongest model o3 accused of 'cheating': did privileged access to test answers inflate its math scores?

Recently, a contractor for the non-profit organization Epoch AI, posting under the name 'Meemi' on the Less Wrong forum, published an exposé alleging that OpenAI, the developer behind ChatGPT, privately funded the mathematics benchmark FrontierMath and had privileged access to its questions and answers, helping its latest model, o3, achieve high scores.

OpenAI, the developer behind ChatGPT, has recently been accused of faking its model's results, sparking widespread discussion in the tech community. The incident originated from a post on the Less Wrong forum by 'Meemi', a contractor for the non-profit organization Epoch AI. The post pointed out that FrontierMath, a mathematics benchmark used to test AI models, had not only received funding from OpenAI but had also given OpenAI privileged access for its latest o3 model.

Meemi accused OpenAI of obtaining the questions and answers before testing the o3 model

Meemi wrote in the post that many of the mathematicians and contractors who created problems for FrontierMath were unaware of OpenAI's funding:

"The mathematicians creating problems for FrontierMath were not (actively) informed of the funding from OpenAI. Contractors were asked to keep the questions and their answers secure, including not using Overleaf or Colab and not discussing the questions via email, and to sign an NDA (non-disclosure agreement) to ensure the questions remained confidential and to prevent leaks. In addition, contractors were not informed of OpenAI's funding on December 20th. I believe that even some of the named authors of the paper were unaware of OpenAI's funding."

Meemi then added that indirect sources indicated OpenAI had obtained the FrontierMath questions and answers before testing:

"Currently, neither Epoch AI nor OpenAI has publicly stated whether OpenAI was able to obtain these questions, answers, or solutions. I have indirect sources indicating that OpenAI did indeed have these questions and answers and used them for validation. I am not sure whether Epoch AI and OpenAI have an agreement restricting the use of this dataset for training, but there are some indications that no such agreement exists."

What is FrontierMath?

FrontierMath is a new mathematics benchmark jointly launched by Epoch AI and more than 60 mathematicians from around the world, including professors, IMO problem writers, and Fields Medal winners. Its problems cover all major branches of current mathematical research, from computationally intensive problems in number theory and real analysis to abstract problems in algebraic geometry and group theory.

Co-founder of Epoch AI issues an apology

As the controversy spread through the community, Tamay Besiroglu, co-founder of Epoch AI, issued an apology on the 19th, stating:

"We made a mistake by not disclosing OpenAI's involvement in FrontierMath sooner. Our contract barred us from disclosing until the release of the o3 model. In hindsight, we should have pushed for earlier transparency. We acknowledge this and will do better in the future."

Besiroglu also added that although OpenAI has access to FrontierMath, it has a 'verbal agreement' with Epoch AI not to use FrontierMath's problem set to train AI models. In addition, Epoch AI keeps a separate holdout set as an additional safeguard for independently verifying FrontierMath benchmark results.
