Google engineer explains Loop Engineering: five building blocks plus external memory, the essential AI course for 2026

Loop Engineering is a system that prompts your coding agents for you, automatically. It is built from five building blocks (Automations, Worktrees, Skills, Plugins/Connectors, and Sub-agents) plus external memory. This article is by Google software engineer Addy Osmani.

Table of Contents

  • The five building blocks, plus some notes
  • Automations: The heartbeat of the loop
  • Worktrees: Prevent parallel chaos
  • Skills: No more repeating your project every session
  • Plugins and Connectors: Loop reaching out to your real tools
  • Sub-agents: Separate the doers from the checkers
  • What does a loop look like
  • Things a loop still doesn’t do for you
  • Build the loop, but keep being an engineer

Loop engineering replaces you, the person manually prompting the agent, with a system you design to do that job. The “loop” here can be understood as a recursive goal: you define the purpose, and the AI iterates until it is done. I believe this may well be how we collaborate with coding agents in the future.

A caveat up front: it is still very early, I have doubts myself, and you need to watch your token costs. What I want to do here is break down what this is and what it implies.

Peter Steinberger recently said: “You shouldn’t prompt your coding agent anymore. You should design a loop that prompts your agent.” Boris Cherny, head of Claude Code at Anthropic, also said something similar: “I no longer prompt Claude. I let the loop run and prompt Claude, letting the loop decide what to do.”

About two years ago, getting results from a coding agent came down to writing a good prompt and supplying enough context. You typed something, read the response, then typed the next thing. The agent was a tool, and you held it from start to finish, one turn after another. That phase is nearly over, or so some believe.

Now you build a small system: it finds work, assigns it, checks it, records what is done, and decides what comes next. That system talks to the agent; you no longer do so directly. I previously wrote about its close relative, “agent harness engineering,” which is about building the environment a single agent runs in, a kind of “factory model” for software.

Loop engineering sits one level above the harness: the harness still exists, but now it runs on a schedule, spawns helpers, and feeds itself.

What surprised me is that this is no longer something you have to assemble yourself in the tooling layer. A year ago, writing a loop meant stacking up bash scripts and maintaining them yourself as your own personal setup. Now these components are built directly into the products. Steinberger’s list maps almost perfectly onto the Codex app, and nearly as well onto Claude Code. Once you see that the shapes are the same, you stop arguing over which tool to use and simply design a loop that runs regardless of the tool.

The five building blocks, plus some notes

A loop needs five things, plus somewhere to keep memory. Here is the list, followed by how it maps onto each product.

  1. Automations: Run on a schedule and handle discovery and triage automatically.
  2. Worktrees: Keep parallel agents from stepping on each other.
  3. Skills: Encode project knowledge the agent would otherwise have to guess.
  4. Plugins and Connectors: Connect your agents to the tools you already use.
  5. Sub-agents: One to ideate, another to verify.

And the sixth: memory. Any markdown file, Linear board, or anything else outside a single conversation that records “what was done, what’s next” counts. It sounds dull, and nobody gets excited about it. But it is the trick every long-running agent relies on. As I discussed in the long-running agents article, models forget everything between runs, so memory has to live on disk, not in context. Agents forget; repos don’t.
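The following Python sketch makes the “memory on disk” idea concrete. It shows a progress file that a loop could read at the start of each run and rewrite at the end. The file layout (a `## Done` and a `## Next` section with one bullet per item) and all function names are hypothetical conventions chosen for this illustration. Neither product prescribes this format.

```python
from pathlib import Path

# Hypothetical layout: "## Done" and "## Next" sections, one "- " bullet per item.
SECTIONS = ("Done", "Next")


def parse_state(text: str) -> dict[str, list[str]]:
    """Parse progress-file text into {"Done": [...], "Next": [...]}."""
    state: dict[str, list[str]] = {name: [] for name in SECTIONS}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            heading = line[3:].strip()
            current = heading if heading in state else None  # ignore unknown sections
        elif line.startswith("- ") and current is not None:
            state[current].append(line[2:].strip())
    return state


def render_state(state: dict[str, list[str]]) -> str:
    """Turn the state back into markdown, one section per heading."""
    lines: list[str] = []
    for name in SECTIONS:
        lines.append(f"## {name}")
        lines.extend(f"- {item}" for item in state[name])
        lines.append("")
    return "\n".join(lines)


def mark_done(state: dict[str, list[str]], item: str) -> dict[str, list[str]]:
    """Return a new state with `item` moved from Next to Done."""
    return {
        "Done": state["Done"] + [item],
        "Next": [i for i in state["Next"] if i != item],
    }


def load_state(path: Path) -> dict[str, list[str]]:
    """Read memory from disk; a missing file means a first run with empty state."""
    return parse_state(path.read_text(encoding="utf-8") if path.exists() else "")


def save_state(path: Path, state: dict[str, list[str]]) -> None:
    """Persist memory so the next run, starting with a fresh context, can pick it up."""
    path.write_text(render_state(state), encoding="utf-8")
```

A run would call `load_state` first, work through `state["Next"]`, and call `save_state` before exiting. Because the state lives in the repo, the next run starts from it rather than from an empty context.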

Both products now have all five.

| Building block | Role in the loop | Codex app | Claude Code |
| --- | --- | --- | --- |
| Automations | Scheduled discovery and triage | Automations tab: pick project, prompt, cadence, environment; results go to the Triage inbox; /goal for run-until-done | Scheduled tasks, cron, /loop, /goal, hooks, GitHub Actions |
| Worktrees | Isolate parallel work | Built-in worktree per thread | git worktree, --worktree, isolation: worktree on subagents |
| Skills | Encode project knowledge | Agent Skills (SKILL.md), invoked via $name or implicitly | Agent Skills (SKILL.md) |
| Plugins / Connectors | Connect to tools | Connectors (MCP) plus deployment plugins | MCP servers plus plugins |
| Sub-agents | Ideation and verification | Defined as TOML files in .codex/agents/ | Defined in .claude/agents/; task subagents, agent teams |
| State | Track what is done | Markdown, or Linear via a connector | Markdown (AGENTS.md, progress files), or Linear via MCP |

The names differ here and there, but the core capabilities are the same. I’ll unpack each one, because the devil really is in the details: whether a loop holds up or leaks comes down to them.

Automations: The heartbeat of the loop

Automations are what make the loop an actual loop rather than something you ran once. In the Codex app you create one in the Automations tab, choosing the project, prompt, frequency, and environment (run locally or in a background worktree). Runs that find something land in the Triage inbox. Runs that find nothing are archived automatically, which is a nice touch.

Internally, OpenAI uses it for mundane but necessary work: daily issue triage, CI failure summaries, commit briefings, and hunting for bugs introduced in the past week. Automations can invoke skills, which keeps recurring tasks maintainable. You trigger $skill-name instead of pasting a wall of instructions into a schedule that slowly goes stale.
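The Python sketch below is a rough illustration of what an automation boils down to. It models the fields you choose in the Automations tab and the routing rule described above: runs with findings go to triage, and empty runs are archived. The `Automation` class, its field names, and the `$triage-ci` skill are all hypothetical. This is not Codex’s API; it only models the behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Automation:
    """Hypothetical model of the fields chosen in an Automations tab."""
    project: str
    prompt: str                    # e.g. "$triage-ci" to invoke a skill, not a pasted wall of text
    cadence: timedelta             # how often the automation should run
    environment: str = "worktree"  # "local" or "worktree"

    def is_due(self, last_run: Optional[datetime], now: datetime) -> bool:
        """Due immediately if it has never run, otherwise once a full cadence has passed."""
        return last_run is None or now - last_run >= self.cadence


def route_result(findings: list[str]) -> str:
    """Runs that found something go to the triage inbox; empty runs are archived."""
    return "triage" if findings else "archive"
```

Putting the instructions behind a skill reference in `prompt` is the design choice the paragraph argues for. The schedule stays a one-liner, and the instructions it points to are maintained in one place.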

Claude Code reaches the same goal through scheduling and hooks. You can use /loop to rerun a prompt or command at an interval, schedule cron jobs, run shell commands at specific points in the agent lifecycle via hooks, or push the whole thing into GitHub Actions so it keeps running after you close your laptop. The concept is identical: define a task, give it a rhythm, and let the results surface to you instead of checking by hand.

Another native distinction worth knowing: /loop repeats on a schedule, while /goal keeps running until a condition you specify is actually met. After each round, a smaller model judges whether it is done, so the model writing the code is not the one grading it.

For example, give it a condition like “all tests in test/auth pass and lint is clean,” and it keeps going until that is true. Codex has the same /goal: it runs across multiple rounds until a verifiable stop condition is met, and you can pause, resume, or clear it. Both tools share this pattern, and it is essentially the core of this article.
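The /goal pattern can be sketched as a bounded loop. In it, the function that does the work and the function that decides “done” are deliberately separate. This minimal Python illustration rests on three assumptions:

- `do_round` stands in for the coding agent.
- `goal_met` stands in for the judging model or a verifiable check such as a test run.
- The `max_rounds` cap is an added safeguard against runaway token spend, not a documented parameter of either tool.

```python
from typing import Callable


def run_until_goal(
    do_round: Callable[[int], str],
    goal_met: Callable[[str], bool],
    max_rounds: int = 10,
) -> tuple[bool, int]:
    """Run rounds until a separate checker says the goal holds.

    do_round:   the "doer"; produces a report for the given round number.
    goal_met:   the "checker"; only inspects reports, never writes code.
    max_rounds: hard stop so an unreachable goal cannot run forever.
    Returns (succeeded, rounds_used).
    """
    for round_no in range(1, max_rounds + 1):
        report = do_round(round_no)
        if goal_met(report):  # the stop decision belongs to the checker, not the doer
            return True, round_no
    return False, max_rounds
```

Keeping `goal_met` as its own callable is the point: you can swap in a stricter checker, or a different model, without touching the code that does the work.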

Automations are about making work visible. The rest of the loop is about doing something with that work.

Worktrees: Prevent parallel chaos

Once you run more than one agent at a time, file conflicts are how things break. Two agents editing the same file is like two engineers committing to the same line without talking to each other. git worktree solves this. Each worktree is a separate working directory on its own branch, sharing the same repository history, so one agent’s edits physically cannot clash with another agent’s checkout.

Codex supports worktrees natively, so multiple threads can work in the same repo without colliding. Claude Code offers three options:

- git worktree itself.
- The --worktree flag, to run a session in its own checkout.
- The isolation: worktree setting on subagents, where each helper gets a fresh checkout and cleans up after itself.

I discussed the human side of this in the orchestration tax article. Worktrees eliminate mechanical collisions, but your review bandwidth still caps how many you can run in parallel. Tools only go so far.
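For loops you wire up yourself, the isolation step is plain git. The Python sketch below builds the `git worktree add -b <branch> <path> <base>` and `git worktree remove <path>` commands for one agent task. Those git subcommands and the `-C` option are real. The `agent/<task>` branch names and the sibling `<repo>-wt/` directory are hypothetical conventions for this example.

```python
import subprocess
from pathlib import Path


def worktree_path(repo: Path, task_id: str) -> Path:
    """Hypothetical convention: worktrees live in a sibling "<repo>-wt" directory."""
    return repo.parent / f"{repo.name}-wt" / task_id


def worktree_add_cmd(repo: Path, task_id: str, base: str = "main") -> list[str]:
    """git worktree add -b <branch> <path> <base>: new branch, new directory, shared history."""
    branch = f"agent/{task_id}"  # hypothetical branch naming
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch,
            str(worktree_path(repo, task_id)), base]


def worktree_remove_cmd(repo: Path, task_id: str) -> list[str]:
    """git worktree remove <path>: clean up once the task is merged or abandoned."""
    return ["git", "-C", str(repo), "worktree", "remove", str(worktree_path(repo, task_id))]


def create_worktree(repo: Path, task_id: str, base: str = "main") -> Path:
    """Run the add command; raises CalledProcessError if git refuses (e.g. branch exists)."""
    subprocess.run(worktree_add_cmd(repo, task_id, base), check=True)
    return worktree_path(repo, task_id)
```

Call `create_worktree` before handing a task to an agent, and run the remove command after its branch is merged or discarded. Each agent then gets its own checkout, while all of them share one object store.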

Skills: No more repeating your project every session

Skills spare you from re-explaining your project context every session. Both tools use the same format: a folder containing a SKILL.md with instructions and metadata, plus optional scripts, references, and assets. Codex invokes a skill when you reference it with $ or choose it via /skills. It also invokes one implicitly when your task matches the skill’s description. That is why precise, boring descriptions beat pretty, clever ones. Claude Code works the same way, as I explained in the agent skills article.
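The Python sketch below shows why a precise description matters for implicit matching. It reads the frontmatter of a SKILL.md and picks a skill by word overlap with the task. The real tools let the model decide which skill fits; word overlap is a deliberately crude stand-in. The parser also handles only single-line `key: value` pairs rather than full YAML.

```python
from typing import Optional


def parse_skill_md(text: str) -> dict[str, str]:
    """Read `key: value` frontmatter between the opening and closing '---' lines.

    Real SKILL.md frontmatter is YAML; this simplified parser handles only
    single-line key/value pairs.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the instruction body follows
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def _words(text: str) -> set[str]:
    """Lowercased words longer than three characters, trailing punctuation stripped."""
    return {w.lower().strip(".,;") for w in text.split() if len(w) > 3}


def pick_skill(task: str, skills: list[dict[str, str]]) -> Optional[str]:
    """Crude stand-in for implicit invocation: most description words shared with the task."""
    task_words = _words(task)
    best_name, best_score = None, 0
    for skill in skills:
        score = len(task_words & _words(skill.get("description", "")))
        if score > best_score:
            best_name, best_score = skill.get("name"), score
    return best_name  # None when no description overlaps at all
```

Suppose a made-up `release-notes` skill has the description “Draft release notes from merged pull requests since the last tag.” It matches a task like “write release notes for merged pull requests.” A vague description such as “helps with stuff” gives even a smarter matcher little to go on.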

Skills are also how you stop paying the same intent debt over and over. As I argued in the intent debt article, every session starts cold, and the agent fills the gaps in your intent with confident guesses. Skills externalize that intent: conventions, build steps, “we don’t do that because of what happened last time.” You write it once, and the agent reads it on every run. A loop without skills starts from scratch each time; a loop with skills compounds.

One clarification: a skill is an authoring format, and a plugin is a distribution mechanism. To share a skill across repos or bundle several skills together, you package them as a plugin. Both Codex and Claude Code work this way.

Plugins and Connectors: Loop reaching out to your real tools

A loop that can only see the filesystem is very limited. Connectors, built on MCP, let agents read issue trackers, query databases, call staging APIs, and post Slack messages. Both Codex and Claude Code support MCP, so a connector you write for one usually works with the other. Plugins bundle connectors and skills into a single package. Teammates can then install your entire setup at once instead of rebuilding it from memory.
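Under the hood, a connector call is an MCP message. MCP uses JSON-RPC 2.0, and a tool invocation is a `tools/call` request whose `params` carry the tool `name` and its `arguments`. The Python sketch below only serializes such a request. The `create_issue` tool and its arguments are hypothetical. A real client would first complete the `initialize` handshake and discover tools with `tools/list` over a stdio or HTTP transport.

```python
import json
from itertools import count

_request_ids = count(1)  # request ids must not be reused within a session


def mcp_tool_call(tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical tool on a hypothetical issue-tracker server.
print(mcp_tool_call("create_issue", {"title": "Flaky test in test/auth", "team": "ENG"}))
```

Because the wire format is the same whichever agent sends it, a connector written once can serve both products.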

This is the difference between “the agent tells you ‘here is the fix’” and “the loop opens a PR, links the Linear ticket, and posts in the channel once CI passes.” Connectors let the loop act in your real environment instead of just telling you what it would do if it could.

Sub-agents: Separate the doers from the checkers

The most useful structural pattern in a loop is separating the writer from the verifier. The model that writes the code tends to go easy on itself. A second agent, with a different set of instructions or even a different model, can catch the first one’s self-justifications.

Codex spawns subagents only when you explicitly ask for them, runs them in parallel, and then folds their results into a single final answer. You define agents as TOML files in .codex/agents/, each with a name, description, instructions, and an optional model and reasoning effort. Your security reviewer can be a powerful model with a high reasoning budget; your explorer can be a fast, read-only one. Claude Code keeps similar subagent definitions in .claude/agents/, and agent teams can pass work between agents. A common split is one agent to explore, one to implement, and one to verify against the spec.

I’ve made this argument twice before: once in the code agent orchestra piece and once in the adversarial code review piece. Why does it matter even more in loops? Because loops run when you are not watching, and a verifier agent you can trust is the only thing that lets you move forward with confidence. Subagents do burn tokens, since each one runs its own model and tool calls, so spend them where a second opinion is worth it. Claude Code’s /goal is roughly this idea. A separate model, not the one writing the code, decides whether the loop is done, so the split between doer and checker applies to the stopping condition itself.

What does a loop look like

Putting these together, a single thread becomes a mini control panel. I often imagine it like this:

An automation runs daily on the repo. It calls a triage skill, reads yesterday’s CI failures, open issues, recent commits, and writes findings into a markdown file or Linear board. For each worthwhile discovery, this thread creates an isolated worktree, assigns a sub-agent to draft a fix, then another sub-agent reviews that draft against project skills and existing tests.

Connectors let the loop open PRs and update tickets. Anything the loop cannot handle goes to the triage inbox. The state file is the backbone: it records what was tried, what succeeded, and what is still open, so tomorrow’s run picks up where today’s left off.
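The daily run described above can be condensed into a short Python sketch. Everything here is hypothetical scaffolding:

- `draft_fix` stands in for the implementing sub-agent working in its own worktree branch.
- `review` stands in for the verifying sub-agent.
- `LoopState` stands in for the markdown or Linear state.

The point is the control flow: skip what is already done, require the checker’s approval before anything counts as done, and route everything else to the inbox.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopState:
    """Stand-in for the markdown or Linear state: what succeeded, what is open, what needs a human."""
    done: list[str] = field(default_factory=list)
    still_open: list[str] = field(default_factory=list)
    inbox: list[str] = field(default_factory=list)


def daily_run(
    findings: list[str],
    draft_fix: Callable[[str, str], str],  # doer: (finding, branch) -> patch, "" if it gave up
    review: Callable[[str, str], bool],    # checker: (finding, patch) -> approved?
    state: LoopState,
) -> LoopState:
    for finding in findings:
        if finding in state.done:
            continue  # an earlier run already handled this; memory prevents redoing it
        branch = "agent/" + finding.lower().replace(" ", "-")  # hypothetical: one worktree branch per task
        patch = draft_fix(finding, branch)
        if patch and review(finding, patch):
            state.done.append(finding)  # a connector would open the PR and update the ticket here
            if finding in state.still_open:
                state.still_open.remove(finding)
        else:
            for bucket in (state.still_open, state.inbox):
                if finding not in bucket:
                    bucket.append(finding)  # left for a human to look at
    return state
```

Note that nothing reaches `done` on the doer’s word alone. That is the “separate the doers from the checkers” rule expressed as a single `if`.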

Notice that you designed this only once. None of the steps involves you writing a prompt. That is Steinberger’s quote made concrete. And the same loop can run on Codex or Claude Code, because the building blocks are the same.

Things a loop still doesn’t do for you

Loops change the shape of work, but don’t remove you from the process. And three issues become sharper, not easier, as loops improve.

Verification still rests on you. A loop nobody watches is a loop whose mistakes nobody catches. Separating verifier subagents from the ones that write code is about making “done” mean something. Even then, “done” is a claim, not proof. I keep repeating the point from “code review in the age of AI”: your job is to ship code you have personally verified works.

Your understanding can decay, if you let it. The faster a loop ships code you didn’t write, the wider the gap between what exists in the repo and what you actually understand. That is comprehension debt. A smooth loop accelerates it, unless you read what the loop does.

Comfort is dangerous. When a loop runs itself, you will be tempted to stop having opinions and simply accept whatever it returns. I call that cognitive surrender. Designing loops with judgment is the cure; leaning on them without judgment is the poison. Same action, opposite result.

Build the loop, but keep being an engineer

I see this as a preview of where our work is heading. But remember: if I don’t review code myself, or rely entirely on automatic loops to fix things, product quality drops. I’d probably get stuck in a downward spiral, digging deeper and deeper.

So it is about balance: build your loop, but don’t forget that prompting your agent directly is still perfectly valid.

Loops will give very different results depending on “who” uses them. Two people running the same loop can get completely opposite outcomes. One uses it to accelerate work they deeply understand; another uses it to avoid understanding anything. You only realize the difference when you see the results.

That’s why loop design is harder, not easier, than prompt engineering. Cherny’s point isn’t that work gets simpler, but that the leverage point shifts.

Build your loop, but stay the engineer in it, not someone who presses start and walks away.

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.