According to monitoring by Dongcha Beating, Codex’s /goal mode lets the Agent loop continuously until a task is completed, but this also amplifies the flaws in vague human prompts. OpenAI engineer Chris Hayduk pointed out, based on internal practical experience, that a vague instruction like ‘optimize code’ leaves the model without a defined endpoint, so it either gives up too early or falls into a loop of blind modifications. To keep the Agent working steadily for days or even longer, he summarized three disciplines:

- Eliminate qualitative terms and replace them with checklists: The model cannot assess what is ‘better,’ but it can understand ‘reduce time by 20% without failing tests.’ For a qualitative task like formatting a paper, he went as far as giving Codex a Markdown checklist of 200 formatting requirements, bluntly converting an abstract task into a quantitative one: ‘completing all checkboxes means completion.’
- Reduce validation time to minutes: The Agent validates its actions through testing. Do not let it run for hours in a large production environment; instead, give it a sample dataset and a lightweight framework so that the feedback loop is as short as possible.
- Create three files as an ‘external brain’: Even with a large context window, the model loses track of earlier work after running for a few days. He recommends creating three Markdown files locally: PLAN.md (macro plan), EXPERIMENTS.md (record of experiments and outcomes), and EXPERIMENT_NOTES.md (real-time thinking drafts), forcing the model to write its trial-and-error process to the hard drive.
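The first and third disciplines can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not Codex's or Hayduk's actual tooling: the helper names (`checklist_status`, `is_complete`, `log_experiment`) and the record layout written to EXPERIMENTS.md are hypothetical, and the checklist is assumed to use the common `- [ ]` / `- [x]` Markdown task-list syntax. Only the file name EXPERIMENTS.md and the rule ‘completing all checkboxes means completion’ come from the article.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

# Assumption: checklist items use Markdown task-list syntax, e.g. "- [ ] item" or "- [x] item".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def checklist_status(markdown_text):
    """Return (done, total, remaining_items) for the checkboxes in a Markdown string."""
    done, remaining = 0, []
    for line in markdown_text.splitlines():
        match = CHECKBOX.match(line)
        if not match:
            continue  # ignore headings, prose, and blank lines
        if match.group(1).lower() == "x":
            done += 1
        else:
            remaining.append(match.group(2).strip())
    return done, done + len(remaining), remaining


def is_complete(markdown_text):
    """The task counts as finished only when every checkbox is ticked."""
    done, total, _ = checklist_status(markdown_text)
    return total > 0 and done == total


def log_experiment(workdir, hypothesis, outcome):
    """Append one record to EXPERIMENTS.md (hypothetical layout), creating the file if needed."""
    path = Path(workdir) / "EXPERIMENTS.md"
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"## {stamp}\n- Hypothesis: {hypothesis}\n- Outcome: {outcome}\n\n")
    return path
```

A loop driving the Agent could call `is_complete` on the checklist after each iteration as its stopping condition and call `log_experiment` after each attempt, so that the record survives even when earlier context is lost.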


Why Does Your Agent Stop Working After a Few Minutes? OpenAI Engineer: It Needs a Scoreboard and External Memory
