Augment Code tests the impact of AGENTS.md on code generation: the best files match a one-tier model upgrade, the worst are worse than having none.


ME News, April 23 (UTC+8), according to Beating monitoring, AI programming tool company Augment Code extracted dozens of AGENTS.md files from its own monorepo and used its internal evaluation suite, AuggieBench, to measure their real-world impact on the output of coding agents. The method took already-merged, high-quality PRs as the benchmark, had the agent redo the same task under two conditions (with and without AGENTS.md), and compared the scores. The gap was much larger than expected. The best-written AGENTS.md improved quality by an amount comparable to upgrading the model from Haiku to Opus, while the worst-written ones scored below having no AGENTS.md at all.
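The paired comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not AuggieBench itself: the function name `paired_deltas`, the 0 to 100 integer score scale, the task names, and the sample scores are all assumptions made up for the example.

```python
from statistics import mean


def paired_deltas(scores_with, scores_without):
    """Per-task score difference; a positive value means AGENTS.md helped."""
    if scores_with.keys() != scores_without.keys():
        raise ValueError("both conditions must cover the same tasks")
    return {task: scores_with[task] - scores_without[task] for task in scores_with}


# Hypothetical scores on an assumed 0-100 scale, for illustration only.
with_doc = {"bugfix-a": 80, "feature-b": 50}
without_doc = {"bugfix-a": 60, "feature-b": 70}

deltas = paired_deltas(with_doc, without_doc)
mean_delta = mean(deltas.values())
```

Keeping the per-task deltas alongside the mean matters here: in this made-up sample the mean is zero even though the file helps one task and hurts the other.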

Moreover, the same document could have opposite effects on different tasks: it raised compliance on a bug-fix task by 25%, but reduced completion of a complex feature in the same module by 30%. Several writing approaches proved effective. Keeping the main file to between 100 and 150 lines and pairing it with a few focused reference documents produced an overall improvement of 10% to 15% in medium-sized modules with around a hundred core files. Writing the process as numbered steps worked best: a 6-step deployment process reduced the share of PRs with missing files from 40% to 10% and increased accuracy by 25%. Using decision tables to help the agent choose the right approach before acting also improved compliance by 25%. Prohibitions must come with an alternative; simply writing “don’t” makes the agent indecisive, and performance deteriorates noticeably once there are more than 15 consecutive warnings.
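Some of these guidelines can be checked mechanically. The following is a minimal Python sketch, not a tool from Augment Code: the function `lint_agents_md` and the keyword lists used to spot prohibitions and alternatives are hypothetical, and only the thresholds (100 to 150 lines, at most 15 consecutive warnings) come from the figures reported above.

```python
import re

# Hypothetical keyword heuristics; real files may phrase rules differently.
PROHIBITION = re.compile(r"\b(don't|do not|never|avoid)\b", re.IGNORECASE)
ALTERNATIVE = re.compile(r"\b(instead|use|prefer)\b", re.IGNORECASE)


def lint_agents_md(text, min_lines=100, max_lines=150, max_warning_run=15):
    """Return a list of findings for an AGENTS.md given as a string."""
    lines = text.splitlines()
    findings = []
    if not (min_lines <= len(lines) <= max_lines):
        findings.append(
            f"length: {len(lines)} lines (target {min_lines}-{max_lines})"
        )
    run = longest = 0
    for number, line in enumerate(lines, start=1):
        if PROHIBITION.search(line):
            run += 1
            longest = max(longest, run)
            if not ALTERNATIVE.search(line):
                findings.append(f"line {number}: prohibition without an alternative")
        elif line.strip():
            # Any other non-blank line ends a run of consecutive warnings.
            run = 0
    if longest > max_warning_run:
        findings.append(
            f"warnings: {longest} consecutive prohibitions (limit {max_warning_run})"
        )
    return findings


sample = (
    "Don't call the legacy client.\n"
    "Never edit generated files; use `make gen` instead.\n"
)
sample_findings = lint_agents_md(sample, min_lines=0)
```

Keyword matching of this kind will produce false positives and misses, so the findings are best read as prompts for a human review rather than as a pass/fail gate.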

The most common failure mode is having too many documents. Once the agent is pulled into a large body of architecture documentation and has loaded hundreds of thousands of tokens, its output actually gets worse. In one module, 226 documents totaled over 2MB, and even the best AGENTS.md made no difference. Additionally, AGENTS.md is the only document location that the agent reads 100% of the time; documents under _docs/ that are not referenced have a discovery rate of less than 10%. (Source: BlockBeats)
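Since unreferenced documents are rarely discovered, one practical check is to list which files under _docs/ are never mentioned in AGENTS.md. The sketch below assumes Markdown files and plain substring matching on relative paths; the helper names `list_docs` and `unreferenced_docs` and the sample paths are hypothetical.

```python
from pathlib import Path


def list_docs(root="_docs"):
    """Collect Markdown files under root as POSIX-style relative paths."""
    return sorted(path.as_posix() for path in Path(root).rglob("*.md"))


def unreferenced_docs(agents_md_text, doc_paths):
    """Return the doc paths that never appear in the AGENTS.md text.

    Substring matching is a simplification: a link written with a
    different relative prefix would be reported as unreferenced.
    """
    return sorted(path for path in doc_paths if path not in agents_md_text)


# In-memory example with made-up paths; on a real repository one would pass
# Path("AGENTS.md").read_text() and list_docs() instead.
agents_text = "See _docs/deploy.md for the deployment steps."
docs = ["_docs/deploy.md", "_docs/architecture/overview.md"]
missing = unreferenced_docs(agents_text, docs)
```

Each path in the result is a candidate either for a link from AGENTS.md or for removal, given that an oversized documentation set was reported to hurt output.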
