Most people write skills the wrong way: 5 takeaways after Anthropic disclosed its internal methodology

Author: AI Product A-Ying

I read a blog post written by the Anthropic team titled "Lessons from building Claude Code: How we use skills." This is probably the most in-depth practical summary I’ve seen so far regarding skills.

Skills are not complicated, but doing them well is harder than it looks.

I remember that when skills first became popular, everyone loved creating writing-style skills. The assumption was that once you embedded your writing style, the model would reliably write in it.

But when I experimented myself, I found that this often doesn't work.

The reason is that a style skill can easily run to several thousand or even tens of thousands of words. Once it is loaded, it takes up a large chunk of the context, and when the context is large, the model's reasoning ability tends to decline.

The result is often that the style is learned, but the content becomes shallow and the analysis weaker.

There's another common scenario.

Many people like to stuff step-by-step operational instructions into a skill: step one, do this; step two, do that; step three, do something else. At run time, the model's execution then becomes unstable.

Later, I gradually understood that many of these repetitive tasks are actually better suited to be distilled into scripts, rather than long instructions.

After reading this article from Anthropic, my biggest impression is that many people are actually using skills, but may not truly understand what a skill is.

Fundamentally, a skill is about Context Engineering. When should you put knowledge into a skill, when should you split it into references, when should you write it as a script, and when should you use gotchas to constrain the model—these are all experiential insights.

Once you understand how skills operate and look back at the excellent ones, you'll find they are never about prompt engineering per se, but about managing context, capturing experience, and reusing capabilities.

If you want to deeply study skills, I especially recommend two articles:

#01 Don’t Waste Words

A skill is essentially tacit knowledge that has accumulated within an organization. So don't repeat in a skill what the model already knows. The truly valuable information is what the model doesn't know at all.

Anthropic often emphasizes that what should be written in a skill are gotchas—common pitfalls.

For example:

  1. This table should not be sorted by created_at

  2. Returning 200 from staging does not mean success

  3. request_id and trace_id are the same thing

These pieces of information usually live only in employees' heads. So always remember what a skill fundamentally is.

Skill = writing down the veteran’s experience.

Through skills, we capture experience that was previously scattered across different people's heads.

#02 Skills Are Actually Context Engineering

This might be one of Anthropic’s most profound insights.

A skill is not a markdown file, but a folder. To those who have used skills, this might sound too obvious to mention.

But over the past couple of days, I’ve been pondering and gradually realized: they are trying to use the folder structure to express the concept of Context Engineering.

Let’s revisit the typical skill structure:

skill/
├── SKILL.md
├── references/ (detailed explanations, API references, boundary conditions)
├── scripts/ (executable scripts)
├── examples/ (sample cases)
└── assets/ (templates, images, fixed materials)

When invoking a skill, the model first reads SKILL.md. If we stuff all information into this file, the context will quickly explode.

Suppose this is a payment troubleshooting skill, containing Stripe error codes, historical fault cases, troubleshooting scripts, and final report templates.

If all these are piled into SKILL.md, every time the skill is invoked, Claude has to re-read it.

Even if the user just wants to confirm what a certain error code means, or check why a payment status hasn’t updated, a lot of irrelevant information will also be included in the context.

Anthropic’s approach is completely different.

SKILL.md acts more like a navigation page. Its role is to tell the model that when encountering a Stripe error, it should look into references for the corresponding explanation.

When referencing historical cases, check examples; when performing actual troubleshooting, run scripts in the scripts folder; and when generating a final report, use templates from assets.

The entire process is one of progressive disclosure: the model loads each piece only when the task calls for it.
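The loading pattern described above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the file names, the file contents, and the load_context() helper are all hypothetical, and character counts stand in very roughly for context size. It builds a throwaway skill folder and compares how much text enters the context when only SKILL.md is read, when one reference is added, and when everything is piled in.

```python
import tempfile
from pathlib import Path
from typing import Sequence

# Hypothetical file names and placeholder contents, invented for this illustration.
SKILL_FILES = {
    "SKILL.md": (
        "Payment troubleshooting\n"
        "- Stripe error code? Read references/stripe_errors.md\n"
        "- Similar past incident? Read examples/incident_notes.md\n"
        "- Need to investigate? Run scripts/check_payment_events.py\n"
        "- Writing the report? Use assets/report_template.md\n"
    ),
    "references/stripe_errors.md": "placeholder error code entry\n" * 200,
    "examples/incident_notes.md": "placeholder incident note\n" * 200,
    "assets/report_template.md": "Summary\nRoot cause\nFix\n",
}


def write_skill(root: Path) -> Path:
    """Create the example skill folder under root and return its path."""
    skill = root / "payment-troubleshooting"
    for rel, text in SKILL_FILES.items():
        path = skill / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
    return skill


def load_context(skill: Path, extra: Sequence[str] = ()) -> str:
    """Return SKILL.md plus only the extra files the current task needs."""
    parts = [(skill / "SKILL.md").read_text(encoding="utf-8")]
    parts += [(skill / rel).read_text(encoding="utf-8") for rel in extra]
    return "\n".join(parts)


def demo() -> dict:
    """Compare context size (in characters) for three loading strategies."""
    with tempfile.TemporaryDirectory() as tmp:
        skill = write_skill(Path(tmp))
        others = [rel for rel in SKILL_FILES if rel != "SKILL.md"]
        return {
            "nav_only": len(load_context(skill)),
            "one_reference": len(load_context(skill, ["references/stripe_errors.md"])),
            "everything": len(load_context(skill, others)),
        }


if __name__ == "__main__":
    print(demo())
```

A user who only asks what one error code means pays for the navigation page plus a single reference, not for the incident history and report template as well.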

I strongly recommend everyone save the following diagram.

#03 Use Scripts as Much as Possible

Don’t waste the model’s limited context and reasoning ability on repetitive work. Delegate these tasks to scripts.

For example, many people write skills like this:

  1. Query registration data

  2. Query payment data

  3. Calculate conversion rate

  4. Analyze abnormal causes

This approach is fine, and the model can handle it. But each time it runs, it has to redo the entire analysis process from scratch.

Querying data, organizing data, handling various edge cases—these are all repetitive tasks.

Since these capabilities have been validated countless times, why reinvent the wheel each time? It’s better to provide specific scripts directly.

Moreover, using scripts makes skill execution more accurate and token-efficient.
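As a rough sketch of what such a script might look like, the conversion-rate step could be fixed once in Python, edge cases included. The function name, the input shape (plain lists of user IDs), and the choice to return None for an empty cohort are all assumptions made for illustration, not something taken from Anthropic's post.

```python
from typing import Iterable, Optional


def conversion_rate(registered_ids: Iterable[str], paid_ids: Iterable[str]) -> Optional[float]:
    """Share of registered users who also appear in the payment data.

    Edge cases handled once here, instead of being re-derived on every run:
    - duplicate IDs are counted once;
    - payers who are not in the registration cohort are ignored;
    - an empty cohort returns None rather than raising ZeroDivisionError.
    """
    registered = set(registered_ids)
    if not registered:
        return None
    converted = registered & set(paid_ids)
    return len(converted) / len(registered)


if __name__ == "__main__":
    # Four registered users, two of whom paid; "z" paid but never registered.
    print(conversion_rate(["a", "b", "c", "d"], ["b", "b", "d", "z"]))
```

With a helper like this in scripts/, the model only has to run the script and can spend its reasoning on step 4, interpreting the anomaly.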

From this perspective, scripts in skills are really a way of capturing organizational capabilities. Each script often encapsulates best practices learned from past pitfalls.

By solidifying these capabilities, Claude can work based on accumulated experience rather than starting from zero every time.

So, I increasingly believe that instructions and scripts in skills address two different levels of problems.

Instructions provide experience and judgment; scripts provide capabilities and execution.

For example, in a payment troubleshooting skill, there might be a line like:

“If Stripe returns 200, don’t assume the payment succeeded; further check the payment_events table.”

This is an instruction—based on experience. The function check_payment_events() is a script—an execution capability.

If there’s only a script, the model knows how to check but may not understand why.

If there are only instructions, the model knows what to check but has to re-implement the check each time. Both are necessary.
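The script half of this pairing might look like the Python sketch below. Only the payment_events table name and the check_payment_events() name come from the example above. The columns, the event names ("created", "succeeded"), and the in-memory SQLite database are assumptions invented for illustration.

```python
import sqlite3


def check_payment_events(conn: sqlite3.Connection, payment_id: str) -> dict:
    """Look up the recorded events for one payment.

    Assumed schema: payment_events(id, payment_id, event_type).
    A payment counts as confirmed only if a 'succeeded' event exists,
    regardless of what the HTTP status code said.
    """
    rows = conn.execute(
        "SELECT event_type FROM payment_events WHERE payment_id = ? ORDER BY id",
        (payment_id,),
    ).fetchall()
    events = [row[0] for row in rows]
    return {
        "payment_id": payment_id,
        "events": events,
        "confirmed": "succeeded" in events,
    }


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with made-up rows for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payment_events ("
        "id INTEGER PRIMARY KEY, payment_id TEXT, event_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO payment_events (payment_id, event_type) VALUES (?, ?)",
        [("pay_1", "created"), ("pay_1", "succeeded"), ("pay_2", "created")],
    )
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(check_payment_events(db, "pay_1"))
    print(check_payment_events(db, "pay_2"))
```

The instruction tells the model why a 200 is not enough. The script gives it a tested way to perform the follow-up check without rewriting the query each time.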

#04 Description Is More Like a Routing Rule

Many people write skill descriptions in a fundamentally wrong way.

They tend to describe the functionality, for example: "PR Management Skill helps users monitor PR status, handle CI issues, and automatically merge."

But the problem is, the model doesn’t find skills based on functions. When Claude Code starts, it scans all skills’ names and descriptions.

Then, based on the user’s current question, it decides which skill to load.

So, the most important information in the description isn’t what the skill does, but under what circumstances it should be loaded.

The description actually functions as a routing rule for the skill.

In the real world, few people say, “Help me call a PR management tool.” More likely, they say, “Help me watch this PR,” or “The CI is down,” etc.

Therefore, a good description should focus on describing the user’s intent, not listing features.

I even think there’s a simple way to check this.

After writing the description, delete the entire skill, leaving only this line. Then ask yourself: after seeing the user’s question, can the model tell when to load this skill?

If not, then it probably needs further refinement.
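This check can be approximated in a few lines of Python. The sketch assumes that SKILL.md opens with a header delimited by `---` lines containing single-line `name` and `description` fields. The pr-watch skill and its description are hypothetical, and the tiny parser is a stand-in for a real YAML parser. It does not simulate how Claude actually chooses a skill. It only prints the one line you would be left with after deleting everything else.

```python
# Hypothetical SKILL.md content, written for this illustration only.
EXAMPLE_SKILL_MD = """---
name: pr-watch
description: Use when the user asks to watch a PR, says CI is failing or down, or wants a PR merged once checks pass.
---
Body text that is not part of the name or description.
Long step-by-step instructions would go here.
"""


def read_frontmatter(skill_md_text: str) -> dict:
    """Parse single-line 'key: value' pairs between the two '---' lines."""
    lines = skill_md_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def routing_line(skill_md_text: str) -> str:
    """Return only what is left once the body of the skill is deleted."""
    meta = read_frontmatter(skill_md_text)
    return f"{meta.get('name', '?')}: {meta.get('description', '')}"


if __name__ == "__main__":
    print(routing_line(EXAMPLE_SKILL_MD))
```

Reading the printed line next to real user phrasings such as "Help me watch this PR" or "The CI is down" makes it easier to judge whether the description states when to load the skill rather than what it contains.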

#05 Skill Management and Distribution

Another point is about skill management.

When one person uses skills, it’s quite simple—write a few skills, maintain and upgrade them personally. But I believe most teams will face the same problem later.

When skills grow from a few to dozens or hundreds, how should they be managed? How to upgrade? How to distribute to team members?

Anthropic's experience in this area is worth borrowing from.

In smaller teams, skills are directly stored with the codebase, in the .claude/skills directory of the project. Everyone shares the same set of skills and working methods.

But as the number of skills increases, a new problem arises.

Claude Code, when starting, scans all skill names and descriptions to decide which skill to invoke for the current task. The more skills, the higher the routing cost.

This is also why Anthropic later started building a marketplace. But more interestingly, they manage the marketplace differently.

Many companies' first reaction to this problem is to establish an approval process: whoever writes a skill submits an application, and after review it enters the official skill library. We've done this internally too, but it is very rigid: management for management's sake.

Anthropic's approach, I found, is much more lightweight.

New skills are first shared within a small group, so colleagues can install and try them on their own.

If more people start using a skill, it indicates that it truly solves a real problem. At this point, the author can submit it to the official marketplace.

So, they don’t first evaluate whether a skill has value; instead, they let it undergo real-world testing. If it’s used by many, it naturally enters the formal system. The skills that remain are mostly those genuinely needed by the team.
