Are Claude and Codex getting dumber the more you use them? It's because your context is too bloated.

From controlling context and handling the AI's flattery tendency to defining task termination conditions, this is the clearest practical guide I've seen on working with Claude/Codex.

Author: sysls

Translation: Deep潮 TechFlow

Deep潮 Guide: Developer blogger sysls with 2.6 million followers wrote a long practical article that 827 people shared and 7,000 liked. The core message is simple: your plugins, memory systems, and various harnesses are probably doing more harm than good. This article skips the big principles and is all about actionable insights drawn from real production projects — from how to control context, handle AI flattery, to how to define task termination conditions. It’s the clearest practical guide on Claude/Codex engineering I’ve seen so far.

Full Text:

Introduction

You're a developer. You use Claude and Codex CLI every day, constantly wondering whether you've squeezed everything out of them. Sometimes you watch them do ridiculously stupid things and can't understand why some people seem to build rockets with AI while you can't even stack two stones on top of each other.

You think the problem is your harness, your plugins, your terminal, whatever. You've used beads, opencode, zep; your CLAUDE.md is 26,000 lines long. But no matter how much you tinker, you can't understand why you keep drifting further from paradise while others are playing with angels.

This is the article you’ve been waiting for.

By the way, I have no vested interest. When I say CLAUDE.md, I include AGENT.md; when I say Claude, I include Codex — I use both extensively.

Over the past few months, I’ve noticed something interesting: almost no one truly knows how to maximize the potential of agents.

It feels like a small handful of people can make agents build entire worlds, while the rest are lost in a sea of tools, suffering from choice paralysis — thinking that finding the right package, skill, or harness combo will unlock AGI.

Today, I want to break all that, leave you with a simple, honest message, and start from there. You don’t need the latest agent harness, you don’t need to install a hundred packages, and you definitely don’t need to read a million articles to stay competitive. In fact, your enthusiasm might do more harm than good.

I’m not here for sightseeing — I started using agents when they could barely write code. I’ve tried all packages, all harnesses, all paradigms. I’ve written signal, infrastructure, and data pipelines with agent factories — not “toy projects,” but real production use cases. After all that…

Today, I use a configuration so simple it’s almost trivial, relying only on basic CLI (Claude Code and Codex), plus a fundamental understanding of core agent engineering principles, and I’ve achieved my most breakthrough work ever.

Understanding the World is Moving Fast

First, I want to say that the foundational model companies are in an epoch-making sprint, and it's clear they won't slow down anytime soon. Every improvement in "agent intelligence" changes how you collaborate with agents, because each generation is designed to be more obedient to instructions than the last.

Just a few generations ago, if you wrote in CLAUDE.md “Read READTHISBEFOREDOINGANYTHING.md before doing anything,” it had a 50% chance of telling you “go to hell,” then doing whatever it wanted. Today, it obeys most instructions, even complex nested ones — like “Read A first, then B, if C, then D,” and it’s happy to follow most of the time.

What does this mean? The key principle is recognizing that each new generation of agents forces you to rethink what’s optimal — which is why less is more.

When you use many different libraries and harnesses, you lock yourself into a “solution,” but that problem might not even exist for the next-gen agents. Do you know who the most enthusiastic and highest-usage users of agents are? That’s right — employees at cutting-edge companies, with unlimited token budgets, using the latest models. Do you understand what that implies?

It means that if a real problem exists and there’s a good solution, the cutting-edge companies will be the biggest users of that solution. And what will they do next? They’ll incorporate that solution into their products. Think about it: why would a company allow another product to solve real pain points and create external dependencies? How do I know this is true? Look at skills, memory harnesses, sub-agents… They all start from solutions that solve real problems, proven useful through practical testing.

So, if something is truly groundbreaking and can meaningfully expand agent use cases, it will eventually be integrated into the core products of foundational companies. Trust me, foundational companies are racing ahead. So relax — you don’t need to install anything or rely on external dependencies to do your best work.

I predict the comments will soon say “SysLS, I used XYZ harness, and I rebuilt Google in a day!” — to which I say: congratulations! But you’re not the target audience. You represent an extremely niche group that truly understands agent engineering.

Context is Everything

Honestly. Context is everything. Another problem with using many plugins and external dependencies is “context bloat” — meaning your agent is overwhelmed with too much information.

Imagine the agent's inner monologue: "A quick Python guessing game? Easy. Wait, what's that 'manage memory' note from 26 conversations ago? Ah, right, the user's screen froze 71 conversations ago because we spawned too many subprocesses. 'Always write notes'? Sure... but what does any of that have to do with guessing the word?"

You see? You want to give the agent exactly the information needed to complete the task, no more, no less! The better you control this, the better the agent performs. Once you start introducing memory systems, plugins, or overly complicated skill calls, you're handing the agent bomb-building instructions and a cake recipe at the same time, when all you wanted was a poem about redwoods.

So, I preach again — strip away all dependencies, then…

Do Something Truly Useful

Describe Implementation Details Precisely

Remember, context is everything?

Remember to inject exactly the information needed to complete the task, no more, no less?

The first way to do this is to separate research from implementation. Be extremely precise about what you ask the agent to do.

What are the consequences of being imprecise? "Build an authentication system." Now the agent has to research: what is an authentication system? What options are there? What are their pros and cons? It has to search online for a pile of information that isn't really usable, cluttering the context with possible implementation details. When it's time to implement, it's more likely to get confused or hallucinate unnecessary or irrelevant details.

Conversely, if you say “Implement JWT authentication with bcrypt-12 password hashing, refresh token rotation, 7-day expiry…”, it doesn’t need to research alternatives, just knows what you want, and can fill the context with implementation details.

Of course, you won’t always know the implementation details. Often, you don’t know what’s correct, and sometimes you want to delegate the decision on implementation details to the agent. What to do then? Simple — create a research task to explore options, either decide yourself or let the agent choose, then have another agent with a fresh context implement it.
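As a sketch, the two-phase split might look like this for a hypothetical auth feature (the prompt wording and the research/auth-decision.md file name are illustrative):

```markdown
**Session 1, research only:**
> Compare authentication options for this codebase (JWT vs. sessions,
> hashing choices, token expiry strategies). Write your recommendation,
> with concrete parameters, to research/auth-decision.md. Do not write code.

**Session 2, fresh context, implementation only:**
> Implement JWT authentication exactly as specified in
> research/auth-decision.md: bcrypt cost 12, refresh token rotation,
> 7-day expiry. Do not research alternatives.
```

The research output is a file, not a conversation, so the second session inherits the decision without inheriting the exploration noise.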

Once you start thinking this way, you'll notice where your agent's context gets unnecessarily polluted, and you can set up isolation walls in your workflow, abstracting away unnecessary info and leaving only the specific context that makes it excel at the task. Remember: you have a talented, smart team member who knows everything about spheres, but unless you tell him you want a space for dancing and fun, he'll keep lecturing you on the benefits of spherical objects.

Design Around the Flattery Tendency

No one wants a product that constantly criticizes you, tells you you’re wrong, or ignores your instructions altogether. So, these agents will try hard to agree with you and do what you want.

If you tell it to add a “happy” after every 3 words, it will try its best — most people understand this. Its obedience is what makes it such a useful product. But here’s an interesting feature: it means if you say “Help me find a bug in the codebase,” it will find a bug — even if it has to “manufacture” one. Why? Because it very much wants to obey your command!

Most people quickly complain about hallucinations and fabrications, but don’t realize the problem lies with them. Whatever you ask it to find, it will deliver — even if it means stretching the truth a bit!

What to do? I find "neutral prompts" very effective: don't bias the agent toward a particular outcome. For example, instead of "Help me find a bug in the codebase," say "Scan the entire codebase, follow the logic of each component, and report all findings."

Such neutral prompts sometimes find bugs, sometimes just describe how the code runs objectively. But they won’t bias the agent toward “having a bug.”

Another way to handle flattery is to turn it into an advantage. I know the agent is eager to please me and follow my instructions, so I can bias it this way or that.

For example, I have a bug-finding agent that scores low-impact bugs +1, moderate +5, severe +10. I know this agent will enthusiastically identify all types of bugs (including non-bugs), then report a score like 104. I see this as the super set of all possible bugs.

Then I have an adversarial agent that refutes these bugs, earning the bug’s score for each successful refutation, but penalizes wrong refutations with -2 times the bug’s score. This agent will try to refute as many bugs as possible but remains cautious due to penalties. It will still actively “refute” bugs (including real ones). I see this as the subset of real bugs.

Finally, I have a judge agent that consolidates both inputs and scores them. I tell the judge I have the true answer, and it gets +1 for correct, -1 for wrong. It scores each bug from the bug detector and refuter. The judge’s verdict on what’s true is what I verify. Most of the time, this method is surprisingly high-fidelity, occasionally wrong, but close to error-free.

You might find that just the bug detector alone suffices, but this method works well for me because it leverages each agent’s innate tendency — to please.
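A minimal sketch of the scoring arithmetic behind this three-agent setup. The agent calls themselves are left out; the severity names and the data shapes are my own assumptions, not the author's exact scheme:

```python
# Scoring for the detector/refuter pair described above:
# detector earns +1 / +5 / +10 per reported bug by severity;
# refuter earns the bug's score for refuting a non-bug,
# and pays 2x the bug's score for wrongly refuting a real bug.

SEVERITY_SCORE = {"low": 1, "moderate": 5, "severe": 10}

def detector_score(reported_bugs):
    """Total score the eager bug-finder earns for its report."""
    return sum(SEVERITY_SCORE[b["severity"]] for b in reported_bugs)

def refuter_score(reported_bugs, refuted_ids, ground_truth_real):
    """Score for the adversarial agent's refutations."""
    total = 0
    for bug_id in refuted_ids:
        bug = next(b for b in reported_bugs if b["id"] == bug_id)
        s = SEVERITY_SCORE[bug["severity"]]
        total += -2 * s if bug_id in ground_truth_real else s
    return total

bugs = [
    {"id": 1, "severity": "severe"},    # real
    {"id": 2, "severity": "moderate"},  # hallucinated
    {"id": 3, "severity": "low"},       # hallucinated
]
real = {1}
print(detector_score(bugs))               # 16: the superset is rewarded for volume
print(refuter_score(bugs, [2, 3], real))  # 6: safe refutations of non-bugs
print(refuter_score(bugs, [1], real))     # -20: wrongly refuting a real bug stings
```

The asymmetric penalty is the point: it makes the refuter cautious, so its surviving refutations approximate the set of non-bugs.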

How to Tell What’s Useful and Worth Using?

This question seems tricky, like you need to learn deeply and track AI frontiers constantly. But it’s actually simple… If OpenAI and Claude implement it or acquire companies that do, then it’s probably useful.

Notice that “skills” are everywhere now and are part of Claude and Codex’s official docs? Notice OpenAI acquired OpenClaw? Notice Claude added memory, voice, and remote work features?

What about planning? Remember how many people found that planning before implementation is very useful, and it became a core feature?

Yes, those are useful!

Remember endless stop-hooks? Super useful because agents are very reluctant to do long-running tasks… then Codex 5.2 came out, and that need disappeared overnight?

That’s all you need to know… If something is truly important and useful, Claude and Codex will implement it themselves! So you don’t need to worry too much about “new stuff” or “familiarity with new things,” and you don’t even need to “stay updated.”

Do yourself a favor: occasionally update your chosen CLI tools and see what new features they have. That's enough.

Compaction, Context, and Assumptions

Some people find a huge pitfall when using agents: sometimes they seem the smartest beings on Earth, other times you can’t believe you’re being played.

“Is this thing smart? It’s a damn fool!”

The biggest difference is whether the agent is forced to make assumptions or “fill in the gaps.” Today, they’re still terrible at “connecting dots,” “filling gaps,” or making assumptions. As soon as they do, it’s obvious — performance drops sharply.

One of the most important rules in my CLAUDE.md concerns how to acquire context, and it instructs the agent to re-read that rule every time it re-reads CLAUDE.md (that is, after each compaction). As part of context acquisition, a few simple instructions have an outsized effect: re-read the task plan, and re-read the relevant files before continuing.
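As a sketch, such a context-acquisition rule might read like this (the file names and exact wording are illustrative, not a prescribed format):

```markdown
## Context acquisition (re-read this section after every compaction)
1. Re-read the current task plan in PLAN.md.
2. Re-read every file you are about to modify before continuing.
3. If the plan conflicts with the code you see, stop and report the
   conflict instead of guessing.
```

Each instruction replaces an assumption the agent would otherwise have to make after its context is compacted away.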

Tell the agent how to end the task

Humans have a pretty clear sense of "task completion." The biggest problem with today's agents is that they know how to start a task but not how to end it.

This often leads to frustrating results: the agent ends up just implementing some stubs and stops.

Testing is a great milestone because it’s deterministic — you can set very clear expectations. Unless these X tests pass, your task isn’t done; and you don’t allow modifications to the tests.

Then you just review the tests, and once all pass, you’re good. You can automate this, but the key point — “task ending” is natural for humans, not for agents.

Do you know what has recently become a feasible task endpoint? Screenshot plus verification. You can have the agent implement something until all tests pass, then have it take a screenshot and verify the design or behavior against it.

This allows you to iterate and steer the design without worrying about it stopping after the first try!

A natural extension is to create a "contract" with the agent and embed it in your rules. A {TASK}_CONTRACT.md specifies the tests, screenshots, and other validations that must pass before the session is allowed to end!
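A hypothetical AUTH_CONTRACT.md following this pattern (the checklist format, commands, and items are all assumptions for illustration):

```markdown
# AUTH_CONTRACT.md
You may not end this session until every box is checked.

- [ ] All tests in tests/auth/ pass (`pytest tests/auth/`); tests may not be modified.
- [ ] Screenshot of the login page captured and verified against DESIGN.md.
- [ ] No new linter warnings (`ruff check .`).
```

The checkboxes give both you and the agent a deterministic definition of "done."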

Always-Running Agents

A common question I get is: how can I run an agent 24/7 and ensure it doesn’t go off track?

Here’s a simple method. Create a stop-hook that prevents the agent from ending the session unless all parts of {TASK}_CONTRACT.md are completed.

If you have 100 such well-defined contracts, the stop-hook will prevent termination until all 100 are fulfilled, including all tests and validations!
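One way to sketch such a stop-hook, assuming Claude Code's Stop hook convention (exit code 2 blocks termination and stderr is fed back to the agent) and the checkbox contract format above; both the hook wiring and the file layout are assumptions to check against the current docs:

```python
# Hedged sketch of a Stop hook: refuse to let the session end while
# TASK_CONTRACT.md still has unchecked '- [ ]' items.
import sys
from pathlib import Path

def unfinished_items(contract_text: str):
    """Return checklist lines still marked '- [ ]' (incomplete)."""
    return [ln.strip() for ln in contract_text.splitlines()
            if ln.strip().startswith("- [ ]")]

def main(contract_path: str = "TASK_CONTRACT.md") -> None:
    """Entry point the hook runner would invoke, e.g. `python stop_hook.py`."""
    contract = Path(contract_path)
    pending = unfinished_items(contract.read_text()) if contract.exists() else []
    if pending:
        # Exit code 2 blocks the stop; stderr goes back to the agent
        # as instructions for what still needs doing.
        print("Contract incomplete:\n" + "\n".join(pending), file=sys.stderr)
        sys.exit(2)

# main() is intentionally not called here; the hook configuration runs it.
```

Because the hook only reads the contract file, the agent can't talk its way out: it has to actually check the boxes.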

Pro tip: I find long 24-hour sessions aren't optimal for getting things done, partly because the structure inherently invites context bloat: the contexts of unrelated contracts all pile into one session!

So, I don’t recommend that.

A better automation approach: open a new session for each contract. Whenever you need to do something, create a new contract.

Build an orchestration layer that creates a new contract and session whenever “something needs to be done.”
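A minimal sketch of such an orchestration loop, assuming one contract file per task under a contracts/ directory and the `claude -p` print-mode invocation (both are my assumptions; a `codex` invocation would be the Codex-side equivalent):

```python
# One fresh CLI session per contract, so contexts never bleed into each other.
import subprocess
from pathlib import Path

def build_prompt(contract: Path) -> str:
    """One precise instruction per session: fulfill this contract, nothing else."""
    return (f"Read {contract.name} and fulfill every item in it before ending "
            f"the session. Do not modify the contract or its tests.")

def orchestrate(contracts_dir: str = "contracts") -> None:
    """Run each contract in its own non-interactive session, in order."""
    for contract in sorted(Path(contracts_dir).glob("*_CONTRACT.md")):
        result = subprocess.run(
            ["claude", "-p", build_prompt(contract)],
            capture_output=True, text=True,
        )
        print(contract.name, "finished with exit code", result.returncode)
```

"Create a new contract" then becomes the only interface you need: drop a file in contracts/ and the loop picks it up with a clean context.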

This will totally change your agent experience.

Iterate, Iterate, Iterate

You hire an assistant. Do you expect them to know your schedule from day one? Or how you take your coffee? Or that you have dinner at 6 PM instead of 8? Of course not. They learn your preferences gradually, over time.

Agents are the same. Start with the simplest configuration, forget about complex structures or harnesses, give the basic CLI a chance.

Then gradually add your preferences. How?

Rules

If you don't want the agent to do something, write it down as a rule, then point the agent at it in CLAUDE.md. For example: "Before coding, read coding-rules.md." Rules can be nested and conditional! If you're coding, read coding-rules.md; if testing, read coding-test-rules.md; if tests fail, read coding-test-failing-rules.md. You can create arbitrary logical branches for the agent to follow, and Claude (and Codex) will happily follow them, as long as they're clearly specified in CLAUDE.md.

In fact, this is my first practical advice: treat your CLAUDE.md as a logical, nested directory, indicating where to find context under specific scenarios and results. Keep it as concise as possible, only containing “if-then” logic for “where to look for context under what conditions.”
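As a sketch, a CLAUDE.md written as a pure if-then directory might look like this (all file names are hypothetical):

```markdown
# CLAUDE.md
- Before writing any code: read coding-rules.md.
- Before writing or running tests: read coding-test-rules.md.
- If tests fail: read coding-test-failing-rules.md.
- When starting a task: read the matching {TASK}_CONTRACT.md.
- After every compaction: re-read this file, then the task plan.
```

Nothing here is content; every line is routing. The rules themselves live in the files being pointed to.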

If you see the agent doing something you disapprove of, add it as a rule, tell the agent to read that rule before doing it again, and it will definitely stop doing it.

Skills

Skills are similar to rules but more about operational steps rather than coding preferences. If you want a certain task to be done in a specific way, embed it into a skill.

People often complain they don’t know how the agent will solve a problem, which feels unsettling. To make it deterministic, have the agent research how it would solve it, then write that plan into a skill file. You’ll see how the agent plans to handle the problem beforehand, and you can correct or improve it before it encounters the real issue.

How do you tell the agent about this skill? Exactly! Write in CLAUDE.md: “When encountering scenario X, read SKILL.md.”
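For example, a skill produced this way might look like the following (the file name and the steps are hypothetical, standing in for whatever plan your agent researched and you corrected):

```markdown
# SKILL-deploy.md
When asked to deploy:
1. Run the full test suite; abort if anything fails.
2. Build the release artifact and tag it with the version from CHANGELOG.md.
3. Deploy to staging and smoke-test before promoting to production.
4. Post the release notes. Never skip or reorder a step.
```

The matching CLAUDE.md line would then read: "When asked to deploy, read SKILL-deploy.md."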

Managing Rules and Skills

You’ll want to keep adding rules and skills. That’s how you give it personality and preferences. Almost everything else is redundant.

Once you do this, your agent will feel like magic. It will “do things your way.” And you’ll finally feel like you “got it” in agent engineering.

Then…

You’ll see performance start to decline again.

What’s going on?!

It’s simple. As you add more rules and skills, they start conflicting, or the agent begins to suffer from severe context bloat. If you need the agent to read 14 markdown files before coding, you’ll face the same problem of useless info piling up.

What to do?

Clean up. Give your agent a spa day: consolidate rules and skills, and eliminate contradictions by updating your preferences.

Then it will feel like magic again.

That’s the secret. Keep it simple, use rules and skills, treat CLAUDE.md as a directory, and mind their context and design limitations carefully.

Be Responsible for Results

Today, there’s no perfect agent. You can delegate a lot of design and implementation work to it, but you need to be responsible for the results.

So, be cautious… and enjoy it!

Playing with future toys (while obviously using them for serious work) is a joy!
