Codex Goal Mode User Guide: How to Enable AI to Continuously Pursue a Specific Objective

Original Title: A guide to /goal
Original Author: @dkundel, OpenAI Developer Relations team
Translation: Peggy

Editor's Note: This article comes from Dominik Kundel, a member of OpenAI's Developer Relations team, and summarizes his experience with the Codex goal mode (/goal) feature. It is about more than a prompting technique. It points to a shift in the role of AI programming tools: Codex is no longer just a code assistant that responds to single-turn instructions, but is becoming an agent that can keep executing toward a clearly defined objective.

In /goal mode, what matters most is not writing longer, more detailed requirements into the prompt, but setting clear, verifiable exit criteria for Codex. Examples are "reduce build and deployment time by 30%," "achieve 100% test parity," or "bring LCP below 2.5 seconds." Metrics like these let Codex determine whether the task is complete and keep it from endless trial and error against a vague goal. Users also need to provide enough guidance, tools, and access to realistic environments so that Codex can measure progress and verify results, rather than settling for solutions that only look workable locally or under hypothetical conditions.

The article specifically warns that visual tasks are the most likely to get Codex bogged down in details. Instead of demanding "100% pixel-perfect restoration," it’s better to break visual goals down into feature lists, design system specifications, and measurable indicators. For long-running tasks that last hours or days, progress should be tracked continuously through commits, draft PRs, progress documents, Slack updates, or side chats, so you don’t end up with a pile of changes you can’t trace.

What this article adds is a reframing of /goal as a "long-term task management mechanism." When AI can execute continuously for dozens or even hundreds of hours, developers’ core skills shift as well. The job is not just getting AI to generate code, but defining goals, establishing metrics, configuring execution environments, and finally running reviews and retrospectives. In other words, AI programming is moving from "prompt writing" toward "managing a continuously working engineering executor."

Below is the original text:

We introduced goal mode (/goal) to help you keep Codex moving toward a specific result. Once you set a goal, Codex keeps working until the goal is achieved, whether that takes a few hours or several days. Some users have already had Codex work on the same goal continuously for more than 120 hours.

Goal mode is very powerful. To maximize its effectiveness, there are 7 things to keep in mind when using /goal.

Set Clear, Verifiable Standards

When you activate goal mode, the prompt you input can serve as the initial prompt, but more importantly, it becomes the exit criterion for this goal. Codex will check after each iteration: has this goal been completed?

Therefore, your goal prompt should not be overly long, but should focus on a clear standard: under what conditions is this goal considered achieved?
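Codex’s internals aren’t described here, but a generic loop can illustrate the role of the exit criterion. The loop does a unit of work, then asks a yes/no question about whether the goal is met. The Python sketch below is only an illustration under that assumption. `run_until_goal`, `step`, and `goal_met` are hypothetical names. The iteration cap is a safety guard added for the sketch, not a documented Codex behavior.

```python
from typing import Callable


def run_until_goal(
    step: Callable[[int], None],
    goal_met: Callable[[], bool],
    max_iterations: int = 100,
) -> int:
    """Run `step` repeatedly, checking the exit criterion after every iteration.

    Returns the number of iterations performed. Raises RuntimeError if the
    goal is still unmet after `max_iterations` (a cap assumed for this sketch).
    """
    for i in range(1, max_iterations + 1):
        step(i)         # one unit of work: edit, build, measure, ...
        if goal_met():  # the exit criterion, evaluated after each iteration
            return i
    raise RuntimeError(f"goal not met after {max_iterations} iterations")


# Toy usage: a simulated build time that drops by 5 s per iteration from 100 s.
# The goal is a 30% reduction, i.e. 70 s or less.
state = {"build_s": 100.0}


def fake_step(_: int) -> None:
    state["build_s"] -= 5.0


iterations = run_until_goal(fake_step, lambda: state["build_s"] <= 70.0)
print(iterations)  # 6
```

The design point is that `goal_met` must be answerable without judgment calls. A clear, verifiable goal prompt gives Codex exactly that.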

In most cases, a good goal includes a specific numerical indicator to help the model judge completion. For example:

"Reduce build and deployment time by 30%."

"Migrate this feature from TypeScript to Rust and achieve 100% test parity."

"Optimize the application scaffold so that Largest Contentful Paint (LCP) in production is below 2.5 seconds."

This prompt doesn’t always need to include numbers, but generally, numbers make subsequent steps easier to push forward.
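When a goal does include numbers, it helps if the check can be run mechanically. Below is a minimal sketch of checks for two of the example goals above, using hypothetical function names. The article doesn’t say which percentile the LCP goal refers to. This sketch assumes the 75th percentile of page loads, the convention used in Google’s Core Web Vitals guidance, and computes it with the nearest-rank method.

```python
import math


def reduction_goal_met(baseline_s: float, current_s: float, target: float = 0.30) -> bool:
    """True if current_s is at least `target` (e.g. 30%) below baseline_s."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_s - current_s) / baseline_s >= target


def p75(samples_ms: list[float]) -> float:
    """75th percentile of the samples, using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(0.75 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def lcp_goal_met(samples_ms: list[float], threshold_ms: float = 2500.0) -> bool:
    """True if the 75th-percentile LCP is strictly below the threshold."""
    return p75(samples_ms) < threshold_ms


print(reduction_goal_met(600.0, 400.0))       # True: about 33% faster
print(lcp_goal_met([1800, 2100, 2300, 2600]))  # True: p75 is 2300 ms
```

How you collect the build timings and LCP samples (CI logs, real-user monitoring) depends on your setup and is not shown.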

If you're unsure how to define a goal, or want to brainstorm the project with Codex first, you don’t need to start with goal mode right away.

Codex can set its own goals. You can start a normal conversation, and when you're ready for Codex to begin executing, ask it to set a goal based on the previous discussion.

You can also edit the goal at any time: click the edit button in the Codex app, or run /goal again in the CLI.

Provide as Much Guidance as Possible

Prompts like "Reduce build and deployment time by 30%" sound impressive and might inspire Codex to find creative solutions. But if you already have a rough idea of where the problem lies, such prompts might lead Codex astray.

Therefore, when possible, it’s better to tell Codex where to start troubleshooting, what tools to use, or give other hints to avoid it going down the wrong path.

For example, my colleague @reach_vb did this in an experiment: he told Codex it could use Chrome to access Google Colab, and spelled out which constraints were acceptable, such as allowing it to generate datasets during model training.

Similarly, if you want to shorten build times and already know where most of the time is spent, it’s best to point Codex to that area in your prompt.
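If you don’t yet know where the time goes, a quick breakdown of per-step timings shows which area to point Codex at. Below is a minimal sketch. The step names and numbers are made-up placeholders. How you collect the timings (CI logs, your build tool’s profiling output) depends on your setup.

```python
def time_breakdown(step_seconds: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (step, seconds, share_of_total) rows, slowest step first."""
    total = sum(step_seconds.values())
    if total <= 0:
        raise ValueError("total build time must be positive")
    rows = [(name, secs, secs / total) for name, secs in step_seconds.items()]
    # sorted() is stable, so steps with equal durations keep their input order.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical per-step timings in seconds.
timings = {"install": 45.0, "typecheck": 30.0, "bundle": 180.0, "upload": 45.0}
for name, secs, share in time_breakdown(timings):
    print(f"{name:<10} {secs:6.1f}s {share:6.1%}")
```

In this made-up example, bundling accounts for 60% of the total. That is the kind of hint worth putting directly into the goal prompt.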

Another approach is to have Codex do some preliminary research in plan mode, and create a plan document to record potential solutions. Later, you can reference this plan in your goal.

Make Progress Measurable

If your goal is ambitious, or there are several ways Codex could work toward it step by step, it’s crucial to give Codex tools for measuring progress.

For some tasks, such as optimizing build time or increasing test coverage, this comes naturally, since Codex can usually use existing tools or create suitable ones itself.

For other goals, it’s better to brainstorm with Codex first: which tools would help judge progress? You can also give it hints on how to confirm it’s moving toward the goal. Examples are a visual diff tool for comparing screenshots, or an evaluation set for a debugging agent.

I once asked Codex to replicate components from a video, and it created a tool to compare screenshots and check differences. Later, it iterated on this tool, adding different comparison modes.

Image: A screenshot generated by Codex for visual comparison of two frames.
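The article doesn’t show the comparison tool Codex built. As a rough, stdlib-only illustration of the idea, the sketch below compares two equally sized frames that have already been decoded into rows of (R, G, B) tuples. Decoding the screenshots (for example with Pillow) is assumed and not shown. The `tolerance` parameter and the ASCII mask are hypothetical stand-ins for different comparison modes.

```python
# Assumption: frames are already decoded into rows of (R, G, B) tuples.
Pixel = tuple[int, int, int]
Frame = list[list[Pixel]]


def _check_same_size(a: Frame, b: Frame) -> None:
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("frames must have the same dimensions")


def _differs(pa: Pixel, pb: Pixel, tolerance: int) -> bool:
    # A pixel "differs" if any channel moved by more than `tolerance`.
    return max(abs(ca - cb) for ca, cb in zip(pa, pb)) > tolerance


def diff_ratio(a: Frame, b: Frame, tolerance: int = 0) -> float:
    """Fraction of pixels that differ beyond `tolerance` (0.0 means identical)."""
    _check_same_size(a, b)
    total = differing = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            differing += _differs(pa, pb, tolerance)
    return differing / total if total else 0.0


def diff_mask(a: Frame, b: Frame, tolerance: int = 0) -> list[str]:
    """ASCII mask: '#' where pixels differ beyond `tolerance`, '.' elsewhere."""
    _check_same_size(a, b)
    return [
        "".join("#" if _differs(pa, pb, tolerance) else "." for pa, pb in zip(ra, rb))
        for ra, rb in zip(a, b)
    ]


W, K, N = (255, 255, 255), (0, 0, 0), (250, 250, 250)
reference = [[W, W], [W, W]]
candidate = [[W, K], [W, N]]
print(diff_ratio(reference, candidate))                # 0.5
print(diff_ratio(reference, candidate, tolerance=10))  # 0.25
print(diff_mask(reference, candidate, tolerance=10))   # ['.#', '..']
```

A small tolerance keeps anti-aliasing noise from counting as a difference. The mask shows where the remaining differences are rather than just how many there are.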

Depending on the task, you also need to consider whether additional criteria should be measured or checked. Otherwise, Codex might consider the task done when, from your point of view, it isn’t.

For example, Codex might crop the design reference images and embed them in the page to achieve a "pixel-perfect" UI. Or it might reduce test coverage to reach a 100% test pass rate. Neither is what you actually want.
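One way to close the test-coverage loophole is to make the completion check reject "wins" that shrink the test suite or its coverage. Below is a minimal sketch. It assumes you can extract pass counts and line coverage from your test runner’s report. The `SuiteReport` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SuiteReport:
    passed: int
    total: int            # number of tests that ran
    line_coverage: float  # fraction of lines covered, 0.0 to 1.0


def goal_really_met(before: SuiteReport, after: SuiteReport) -> tuple[bool, list[str]]:
    """Accept '100% passing' only if the suite and its coverage did not shrink."""
    problems: list[str] = []
    if after.total == 0 or after.passed != after.total:
        problems.append(f"{after.passed}/{after.total} tests pass")
    if after.total < before.total:
        problems.append(f"test count dropped from {before.total} to {after.total}")
    if after.line_coverage < before.line_coverage:
        problems.append(
            f"coverage dropped from {before.line_coverage:.0%} to {after.line_coverage:.0%}"
        )
    return (not problems, problems)


before = SuiteReport(passed=90, total=100, line_coverage=0.82)
print(goal_really_met(before, SuiteReport(100, 100, 0.84)))  # (True, [])
print(goal_really_met(before, SuiteReport(80, 80, 0.70)))    # all pass, but rejected
```

Returning the list of problems, not just a boolean, tells Codex why the goal is still unmet.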

Create a Real Environment

If you want Codex to make genuine progress toward your goal, it needs to operate in a sufficiently realistic environment.

In practice, this means that if you want to optimize deployment time or latency, Codex should have access to deployment and testing environments that closely resemble production. That means the same tech stack, the same configuration flags, and similar databases.

For example, we once debugged deployment and build time issues for developers.openai.com. We were already using deployment previews, so Codex could deploy using these preview environments and check logs. But the problem was, our preview deployments and full production environment had some build paths disabled.

As a result, Codex had to manually deploy to an environment closer to production to truly identify issues.
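Gaps like these disabled build paths are easier to spot before Codex starts measuring if you diff the two environments’ settings. Below is a minimal sketch. It assumes each environment’s flags can be exported as a flat key/value map. The flag names are invented for illustration and are not taken from the developers.openai.com setup.

```python
def config_drift(
    preview: dict[str, object], production: dict[str, object]
) -> dict[str, tuple[object, object]]:
    """Settings whose values differ between preview and production.

    A key missing from one side shows up with None for that side, so a
    missing key and an explicit None value are not distinguished here.
    """
    keys = sorted(preview.keys() | production.keys())
    return {
        key: (preview.get(key), production.get(key))
        for key in keys
        if preview.get(key) != production.get(key)
    }


# Hypothetical flags; real names depend entirely on your build setup.
preview_env = {"NODE_VERSION": "20", "BUILD_SEARCH_INDEX": False, "OPTIMIZE_IMAGES": False}
production_env = {"NODE_VERSION": "20", "BUILD_SEARCH_INDEX": True, "OPTIMIZE_IMAGES": True}

for key, (prev, prod) in config_drift(preview_env, production_env).items():
    print(f"{key}: preview={prev!r} production={prod!r}")
```

Any key this reports is a place where measurements taken in the preview environment may not carry over to production.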

Similarly, you can give Codex computer use (the ability for the model to operate real applications) so it can test the actual app. To track down iOS performance issues, @dimillian even used physical devices for the most accurate testing environment.

Be Cautious with Visual Goals

Setting a visual goal, like "100% pixel-perfect restoration of this UI," can be tempting. But depending on the setup, it might cause problems.

If you don’t provide proper guidance and constraints, Codex might get lost in details and overlook the overall goal. For example, if the reference image contains graphical elements, and you expect Codex to generate those—whether SVG icons or images—it might spend a lot of effort "precisely copying" those assets instead of properly breaking down the problem.

Additionally, Codex needs tools to perform visual comparisons correctly. This means more image inputs and higher token consumption, but it doesn’t necessarily give Codex a simple way to identify truly valuable improvements.

Therefore, images are better used as context rather than the sole completion standard. You should find other ways for Codex to judge whether the goal is achieved, such as feature lists, implementation specs, or conformance to design systems.
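Conformance to a design system is one completion check that is cheap to run. As a hypothetical example, the sketch below flags hex colors in a stylesheet that are not part of an assumed palette. A real design-system check would cover more than colors (spacing, typography, components). It would also likely parse CSS properly rather than use a regex, which here ignores 4- and 8-digit hex colors and other color syntaxes.

```python
import re

# Matches #rgb or #rrggbb; other color syntaxes are out of scope for this sketch.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def _normalize(color: str) -> str:
    """Lowercase and expand shorthand: '#ABC' -> '#aabbcc'."""
    c = color.lower()
    if len(c) == 4:
        c = "#" + "".join(ch * 2 for ch in c[1:])
    return c


def off_palette_colors(css: str, palette: set[str]) -> set[str]:
    """Hex colors used in `css` that are not in the design-system palette."""
    used = {_normalize(m) for m in HEX_COLOR.findall(css)}
    allowed = {_normalize(c) for c in palette}
    return used - allowed


css = ".btn { color: #FFFFFF; background: #1a73e8; border-color: #abc; }"
palette = {"#ffffff", "#1A73E8"}
print(off_palette_colors(css, palette))  # {'#aabbcc'}
```

Unlike a pixel comparison, a failing check like this points to a concrete fix: replace the off-palette value with a design token.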

Track Progress

If Codex is working in the background for hours or days, or running on another machine, it’s easy to lose track of where it stands and what work has been done.

Depending on your goal, I find these methods helpful:

· Have Codex commit code at key milestones and push to a draft PR. Especially useful if you’re building a website with preview deployments.

· Have Codex keep a progress deliverable up to date for stakeholders. It could be an HTML file you keep open in a browser, a page deployed via Sites for the team, a rendered progress chart, or just a Markdown document.

· Instruct Codex to proactively share progress updates. You can also include this in your goal: ask Codex to send updates to Slack or other channels when significant progress is made.

· Use other chat windows to ask for status. If you just want a quick update, run /side to start a side chat and ask there. It branches from the current thread with full context, but is short-lived.

Another alternative in the Codex app is to start a new normal chat, have Codex read another goal thread, and answer your questions. If you automate periodic progress checks, this method becomes especially powerful.
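For the progress document and the "significant progress" updates described above, it helps to agree up front on what a checkpoint records and what counts as significant. Below is a minimal sketch with hypothetical names. It renders a Markdown progress table and applies an assumed 5% relative-change rule to decide when an update is worth sending to Slack or elsewhere. Posting the update is left out.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    label: str     # e.g. a commit SHA or milestone name
    metric: float  # the goal's metric at this point, e.g. build time in seconds
    note: str


def render_progress(goal: str, target: float, checkpoints: list[Checkpoint]) -> str:
    """Render a Markdown progress document with one table row per checkpoint."""
    lines = [
        f"# Goal: {goal}",
        "",
        f"Target: {target:g}",
        "",
        "| Checkpoint | Metric | Note |",
        "| --- | --- | --- |",
    ]
    lines += [f"| {cp.label} | {cp.metric:g} | {cp.note} |" for cp in checkpoints]
    return "\n".join(lines) + "\n"


def is_significant(prev: float, curr: float, min_rel_change: float = 0.05) -> bool:
    """True if the metric moved by at least `min_rel_change` relative to `prev`."""
    if prev == 0:
        return curr != 0
    return abs(curr - prev) / abs(prev) >= min_rel_change


# Hypothetical history; "a1b2c3d" stands in for a real commit SHA.
history = [
    Checkpoint("baseline", 600, "initial measurement"),
    Checkpoint("a1b2c3d", 510, "cache dependencies in CI"),
]
print(render_progress("Reduce build time by 30%", 420, history))
print(is_significant(history[0].metric, history[1].metric))  # True: 15% change
```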

Clean Up and Finalize Results

Great, the goal is finally achieved! Can you just hand off the results to your team and call it a day?

Generally, especially for optimization tasks, I find it helpful to have Codex review and audit its own work. You can run /review for a local code review, but it’s also worth having Codex reflect more deeply: what paths did it try? Which worked? Which didn’t? Then clean up the code accordingly.

Since Codex works until the goal is reached, it might try some ineffective or even counterproductive methods, leaving residual changes in the final code.
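Having Codex keep a simple log of what it tried makes this retrospective much easier. Below is a minimal sketch. It assumes each attempt is recorded with the goal metric before and after, plus the files it touched. `Attempt`, `triage`, `files_to_review`, and the sample data are hypothetical. Attempts that didn’t move the metric in the right direction are candidates for reverting. Files touched only by those attempts are a good place to start the cleanup review.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    description: str
    metric_before: float
    metric_after: float
    files: list[str] = field(default_factory=list)


def triage(
    attempts: list[Attempt], lower_is_better: bool = True
) -> tuple[list[Attempt], list[Attempt]]:
    """Split attempts into (kept, revert_candidates) by whether the metric improved."""
    kept: list[Attempt] = []
    revert: list[Attempt] = []
    for a in attempts:
        if lower_is_better:
            improved = a.metric_after < a.metric_before
        else:
            improved = a.metric_after > a.metric_before
        (kept if improved else revert).append(a)
    return kept, revert


def files_to_review(kept: list[Attempt], revert: list[Attempt]) -> set[str]:
    """Files touched only by attempts that did not help."""
    kept_files = {f for a in kept for f in a.files}
    return {f for a in revert for f in a.files} - kept_files


# Hypothetical log for a build-time goal (seconds, lower is better).
log = [
    Attempt("parallelize bundling", 600, 480, ["build.config"]),
    Attempt("swap minifier", 480, 495, ["build.config", "minify.config"]),
    Attempt("cache dependencies", 495, 430, ["ci.yml"]),
]
kept, revert = triage(log)
print([a.description for a in revert])  # ['swap minifier']
print(files_to_review(kept, revert))    # {'minify.config'}
```

Measurements can be noisy, so treat the revert list as a starting point for review rather than an automatic rollback.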

Set a Goal for Your Next Task as Well

The goal feature in Codex is an extremely powerful tool to help you tackle some of the most meaningful engineering challenges. But it only works efficiently when you provide the right environment and instructions.

What have you used /goal for?

