Spent several days testing OpenAI Codex


Even with GPT-5.4 xhigh (the highest reasoning level) enabled,
the main model still makes quite a few mistakes.
For example, it once misunderstood the instructions and deleted things it shouldn't have. Another time was even more outrageous: it believed it had successfully written the code when in fact it hadn't.
It made the same mistake three times, and each time it was only caught by Opus during review afterward.
My current conclusion is that Codex works very well as a tool: give it clear, well-scoped coding tasks and it finishes them quickly and well.
But as the main model, expected to understand complex multi-step instructions or to decide whether to act at all? It still falls short.
For now, I will still primarily rely on Opus.