Claude keeps making errors when coding? These 12 rules reduce the error rate to 3%

Question

> Original Title: Karpathy's 4 CLAUDE.md Rules Cut Claude Mistakes from 41% to 11%. After 30 Codebases, I Added 8 More  > Original Author: @Mnilax  > Compilation: Peggy, BlockBeats  >     Editor's Note: In January 2026, Andrej Karpathy's complaints about Claude writing code led to a seemingly small but extremely critical file in the AI programming workflow: CLAUDE.md. Forrest Chang then organized these issues into four behavioral rules, aiming to constrain common coding errors by Claude: silent assumptions, overengineering, unintended damage to unrelated code, and lack of clear success criteria.  But a few months later, the use cases for Claude Code were no longer just "letting the model write a piece of code." As multi-step agents, hook chain triggers, skill loading, and multi-repository collaboration became the norm, new failure modes began to emerge: models losing control during long tasks, passing tests without verifying real logic, silently skipping errors after migration, and incorrectly mixing different coding styles.  Over six weeks, the author tested 30 codebases and added 8 new rules on top of Karpathy's original 4, attempting to cover new issues arising from AI coding shifting from single completions to agent-based collaboration.  Below is the original text:  In late January 2026, Andrej Karpathy posted a tweet thread criticizing Claude's coding style. He pointed out three typical problems: making incorrect assumptions without explanation, overcomplicating, and causing unrelated damage to code that shouldn't be changed.  Forrest Chang saw this thread, organized the complaints into four behavioral rules, wrote them into a separate CLAUDE.md file, and published it on GitHub. The project received 5,828 stars on its first day, was bookmarked 60,000 times within two weeks, and now has 120k stars, becoming the fastest-growing single-file code repository of 2026.  ![](https://img-cdn.gateio.im/social/moments-7b0a1eec7c-0a4d907328-8b7abd-e5a980)  Then, over six weeks, I tested it across 30 codebases.  These four rules are indeed effective. Errors that previously occurred with about 40% probability dropped below 3% in tasks where these rules are applicable. But the problem is, this template was originally designed to address errors that appeared when Claude was writing code in January.  By May 2026, the issues faced by the Claude Code ecosystem had changed: conflicts between agents, hook chain triggers, skill loading conflicts, and multi-step workflow interruptions across sessions.  So, I added 8 more rules. Below is the full version of the 12 rules in CLAUDE.md: why each rule is worth adding, and where the original Karpathy template can quietly fail in four specific areas.  If you want to skip the explanations and copy directly, the full file is at the end.  Why This Is Important=======  CLAUDE.md in Claude Code is the most underestimated file in the entire AI programming tech stack. Most developers tend to make three types of mistakes:  First, treating it as a preference trash bin, dumping all their habits into it, which inflates it to over 4,000 tokens and drops rule adherence to 30%.  Second, not using it at all, prompting from scratch each time. This causes five times token waste and lack of consistency across sessions.  Third, copying a template and then never updating it. It might work for two weeks, but as the codebase changes, it will silently become invalid.  Anthropic's official documentation clearly states: CLAUDE.md is essentially suggestive. Claude roughly follows it 80% of the time. Once it exceeds 200 lines, adherence drops significantly because important rules are drowned out by noise.  Karpathy's template solves this problem: one file, 65 lines, 4 rules. This is the minimum baseline.  But the ceiling can be higher. By adding the following 8 rules, it covers not only the coding issues Karpathy complained about in January 2026 but also the agent orchestration problems that emerged only in May 2026—issues that didn't exist when the original template was written.  Original 4 Rules=========  If you haven't seen Forrest Chang's repository, start with this basic version:  Rule 1: Think before coding.  Don't silently make assumptions. State your assumptions and expose trade-offs. Ask questions before guessing. When simpler solutions exist, proactively oppose them.  Rule 2: Prioritize simplicity.  Use the minimal code that solves the problem. Don't add imagined features. Don't design abstractions for one-off code. If a senior engineer would consider it overcomplicated, simplify.  Rule 3: Surgical modifications.  Only change what must be changed. Don't "optimize" neighboring code, comments, or formatting. Don't refactor unchanged parts. Keep consistent with existing style.  Rule 4: Goal-oriented execution.  Define success criteria first, then iterate until verification. Don't tell Claude how to do each step; tell it what the successful result should look like, and let it iterate itself.  These four rules can address about 40% of failure modes I see in unsupervised Claude Code sessions. The remaining ~60% are hidden in the gaps below.  ![](https://img-cdn.gateio.im/social/moments-7e81c51e50-faafd2f7be-8b7abd-e5a980)  Why I Added 8 More Rules and Their Rationale================  Each rule comes from a real moment: Karpathy's original four rules are no longer sufficient. I'll first describe the scenario, then present the corresponding rule.  Rule 5: Don't let the model do non-linguistic work-----------------  > Claude can handle: classification, drafting, summarization, extracting info from unstructured text.  > Don't use Claude for: routing, retries, status code handling, deterministic transformations.  > If a status code already answers the question, let regular code handle it.  Karpathy's rules didn't cover this. As a result, the model started deciding on issues that should be handled deterministically: whether to retry an API call, how to route a message, when to escalate. The decisions varied weekly, leading to unstable, token-costly if-else logic.  For example: there was code calling Claude to "decide whether to retry on 503." It worked well for two weeks, then suddenly became unstable because the model started including the request body as context. Retry strategies became random, as the prompt itself was random.  Rule 6: Set a hard token budget, no exceptions-----------------------  > Single task budget: 4,000 tokens.  > Single session budget: 30,000 tokens.  > If a task approaches the limit, summarize current state and restart.  > Don't push through. Explicitly expose budget overruns instead of silently exceeding.  Without budget constraints, CLAUDE.md is like a blank check. Each cycle risks spiraling into a 50,000-token context dump. The model won't stop itself.  For example: a debugging session lasted 90 minutes. The model repeatedly iterated over the same 8KB error info, gradually forgetting what fixes had been tried. By the end, it proposed 40 solutions I had already rejected. With a token budget, this process should have been terminated at minute 12.  Rule 7: Expose conflicts, don't compromise-----------------  > If two existing patterns in the codebase conflict, don't blend them.  > Choose one, favor the newer or more tested, explain why, and mark the other for cleanup later.  > Trying to satisfy both creates the worst code.  When two parts of the codebase conflict, Claude tries to please both, resulting in incoherent code.  For example: a codebase had two error handling modes—async/await with explicit try/catch, and global error boundaries. Claude wrote new code combining both, causing errors to be swallowed twice. It took 30 minutes to realize why errors were hidden twice.  Rule 8: Read before writing----------  > Before adding new code to a file, review its exports, callers, and any shared utility functions.  > If you don't understand why the existing code is organized that way, ask first.  > "I think this isn't related" is the most dangerous phrase in this codebase.  Karpathy's "surgical modifications" tell Claude not to change neighboring code. But it doesn't tell it to understand that code first. Without that, Claude may write conflicting code 30 lines away.  For example: Claude added a completely identical function next to an existing one, without reading the original. Both do the same thing, but due to import order, the new one overwrote the old, which had been the de facto standard for six months.  Rule 9: Testing isn't optional, but testing isn't the goal----------------------  > Every test must encode "why this behavior matters," not just "what it does."  > Tests like expect(getUserName()).toBe('John') are worthless if the function takes a hardcoded ID.  > If you can't write a test that fails when business logic changes, the function is wrong.  Karpathy's "goal-oriented execution" suggests tests as success criteria. But in practice, Claude treats "passing tests" as the only goal, writing shallow tests that pass but miss key logic.  For example: Claude wrote 12 tests for an authentication function, all passing. But the production auth logic was broken. The tests only verified the function "returned something," not whether it returned the correct thing. It passed because it returned a constant.  Rule 10: Long-running operations need checkpoints-------------------  > In multi-step tasks, after each step, summarize what was done, verified, and remaining.  > Don't proceed from a state you can't clearly restate.  > If you get lost, stop and restate the current state.  Karpathy's template assumes one-shot interaction. But real Claude Code workflows are multi-step: refactoring across 20 files, building features in one session, debugging across multiple commits. Without checkpoints, a mistake can lose all progress.  For example: a six-step refactor failed at step 4. When I discovered it, Claude had already completed steps 5 and 6 based on the wrong state. Fixing this took longer than redoing the entire task. Checkpoints at step 4 would have caught the issue.  Rule 11: Conventions take precedence over novelty-------------  > If the codebase uses snake_case, and you prefer camelCase: use snake_case.  > If the codebase uses class-based components, and you prefer hooks: use class-based components.  > Disagreements are another discussion. Inside the codebase, consistency beats personal preference.  > If you believe a convention is harmful, explicitly raise it. Don't silently create a fork.  In a mature codebase, Claude tends to introduce its own style. Even if "better," adding a second style is worse than sticking to one.  For example: Claude introduced hooks into a React class component codebase. It worked but broke existing tests relying on componentDidMount. It took half a day to remove and rewrite.  Rule 12: Fail explicitly, don't fail silently------------------  > If you're unsure something succeeded, state it clearly.  > If 30 records are silently skipped, don't say "migration complete."  > If you skip any tests, don't say "tests passed."  > If you haven't verified boundary cases, don't say "feature ready."  > Default to exposing uncertainty, not hiding it.  Claude's most costly failures are often those that seem like success: a function "runs" but returns wrong data; a migration "completes" but skips 14% of constraint violations; a test "passes" but the assertion is wrong.  For example: Claude claimed a database migration "completed successfully," but silently skipped 14% of records triggering constraints. The skipped behavior was logged but not explicitly exposed. Eleven days later, when reports showed anomalies, the problem was discovered.  Data Results----  Over six weeks, I tracked 50 representative tasks across 30 codebases, testing three configurations.  ![](https://img-cdn.gateio.im/social/moments-2f2dbbae07-3eec04bb2e-8b7abd-e5a980)  Error rate refers to: tasks needing correction or rewriting to match original intent. Errors include: silent false assumptions, overengineering, unrelated damage, silent failures, violation of conventions, conflict compromises, missed checkpoints.  Adherence rate refers to: the probability that Claude explicitly applies a rule when it is relevant.  The truly interesting result isn't just reducing error rate from 41% to 3%. More importantly, expanding from 4 to 12 rules, adherence rate only dropped slightly from 78% to 76%, but error rate decreased by 8 percentage points. The new rules cover failure modes not addressed by the original four, without competing for the same attention budget.  ![](https://img-cdn.gateio.im/social/moments-ce8a5be697-2b327d9f3b-8b7abd-e5a980)  Where the Karpathy Template Can Quietly Fail=====================  Even without adding new rules, the original four-rule template is insufficient in at least four areas.  First, long-running agent tasks.  Karpathy's rules target the moment Claude writes code. But what happens when Claude runs a multi-step pipeline? The original template lacks budget rules, checkpoint rules, or explicit failure signals. The pipeline drifts over time.  Second, multi-repo consistency.  "Match existing style" defaults to one style. But in a monorepo with 12 services, Claude must choose which style to follow. The original rules don't specify how. It may choose randomly or blend styles, leading to inconsistent code.  Third, test quality.  "Goal-oriented execution" treats "tests passing" as success but doesn't specify tests must be meaningful. As a result, Claude may generate superficial tests that pass but don't verify key logic.  Fourth, differences between production and prototype stages.  The same four rules can prevent overengineering in production but slow down prototyping. Early-stage code often needs 100 lines of exploratory scaffolding. Karpathy's "prioritize simplicity" can trigger over-application early on.  These 8 new rules are not meant to replace Karpathy's original four but to patch their gaps: the original template suited the January 2026 auto-completion scenario; by May 2026, Claude Code had shifted to an agent-driven, multi-step, multi-repo environment, requiring different considerations.  ![](https://img-cdn.gateio.im/social/moments-e0672eeb10-251df2c740-8b7abd-e5a980)  Which Methods Didn't Work========  Before finalizing these 12 rules, I also tried some other approaches.  Adding rules from Reddit/X I saw.  Most were either just rephrasing Karpathy's original four rules or domain-specific, like "Always use Tailwind classes." All were eventually discarded.  More than 12 rules.  I tested up to 18. Beyond 14, adherence dropped from 76% to 52%. The 200-line limit is real. Longer rules tend to cause Claude to pattern-match "here's a rule" rather than read each rule carefully.  Relying on certain tools.  For example: "Always use eslint." If eslint isn't installed, the rule silently fails. I replaced such rules with tool-agnostic ones, like "Follow the style enforced in the codebase."  Including examples instead of rules in CLAUDE.md.  Examples consume more context—about the equivalent of 10 rules—and Claude easily overfits to them. Rules are abstract; examples are concrete. Use rules.  Vague commands like "Be more careful," "Think seriously," "Focus more."  These have about 30% adherence because they can't be verified. I replaced them with more specific imperative rules, like "Explicitly state assumptions."  Telling Claude to act like a "senior engineer."  This is ineffective. Claude already thinks it is a senior engineer. The real issue is whether it acts like one. Imperative rules can narrow this gap; identity prompts don't.  Complete 12-rule version of CLAUDE.md=====================  Here is the full version ready for direct copy-paste.  Currently, cannot display outside Feishu docs.  Save it as CLAUDE.md in your repository root. Below the 12 rules, add project-specific rules—tech stack, test commands, error modes. Keep total under 200 lines; longer than that, adherence drops noticeably.  How to Install====  Two steps:  1. Append Karpathy's 4 basic rules to your CLAUDE.md  curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md    2. Paste rules 5–12 from this article below  Save the file in your repo root. The ">>" ensures you're appending, not overwriting existing project rules.  Mental Model----  CLAUDE.md isn't a wishlist; it's a behavioral contract to block observed failure modes.  Each rule should answer: what mistake does it prevent?  Karpathy's four rules address failures seen in January 2026: silent assumptions, overengineering, unrelated damage, weak success criteria. They are foundational—don't skip them.  The additional eight rules target new failure modes after May 2026: unbounded agent loops, multi-step tasks without checkpoints, superficial tests missing key logic, silent failures masquerading as success. They are incremental patches.  Of course, effectiveness varies. If you don't run multi-step pipelines, Rule 10 isn't critical. If your codebase enforces style via lint, Rule 11 may be redundant. After reading these 12, keep only those that address your actual mistakes, discard the rest.  A tailored 6-rule CLAUDE.md based on real failure modes beats a 12-rule version with half unused.  Conclusion==  Karpathy's January 2026 tweet was essentially a complaint. Forrest Chang turned it into four rules. Over 120,000 developers starred this result, most still using only those four rules today.  Models have advanced, ecosystems have changed. Multi-step agents, hook chains, skill loading, multi-repo collaboration—these didn't exist when Karpathy posted that tweet. The original four rules didn't address these issues. They weren't wrong, just incomplete.  Adding 8 rules, six weeks of testing across 30 codebases, reduced error rate from 41% to 3%.  Save this article tonight, paste these 12 rules into your CLAUDE.md. If it saves you a week of detours, share it.  [Original Link]  Click to learn about Rhythm BlockBeats' open positions  **Join the Rhythm BlockBeats official community:**  Telegram Subscription Group: https://t.me/theblockbeats  Telegram Group: https://t.me/BlockBeats_App  Twitter Official Account: https://twitter.com/BlockBeatsAsia

Claude keeps making errors when coding? These 12 rules reduce the error rate to 3%

Why This Is Important

Original 4 Rules

Why I Added 8 More Rules and Their Rationale

Rule 5: Don’t let the model do non-linguistic work

Rule 6: Set a hard token budget, no exceptions

Rule 7: Expose conflicts, don’t compromise

Rule 8: Read before writing

Rule 9: Testing isn’t optional, but testing isn’t the goal

Rule 10: Long-running operations need checkpoints

Rule 11: Conventions take precedence over novelty

Rule 12: Fail explicitly, don’t fail silently

Data Results

Where the Karpathy Template Can Quietly Fail

Which Methods Didn’t Work

Complete 12-rule version of CLAUDE.md

How to Install

Mental Model

Conclusion

Trending Topics

GateSquareMayTradingShare

DailyPolymarketHotspot

JaneStreetReducesBitcoinETFHoldings

TrumpVisitsChinaMay13

WCTCTradingKingPK

Pinned