How to Build a Production-Ready Claude Code Skill

1. Introduction
The Claude Code Skill ecosystem is growing rapidly. As of March 2026, the anthropics/skills repository has passed 87,000 stars on GitHub, and more people are building and sharing Skills every week.
So how do you build a Skill from scratch in a systematic way? This article walks through designing, building, testing, and distributing a Skill from scratch. I'll use my experience publishing an e-commerce review Skill (Link) as a working example throughout.
2. What Is a Claude Skill?
A Skill is a set of instructions that teaches Claude how to handle a particular task or workflow. Skills are one of the most powerful ways to customize Claude.
Skills are built on progressive disclosure. Claude loads information in three stages:
- Metadata (name + description): always in Claude's context, about 100 tokens. Claude decides whether to load a Skill based on this alone.
- SKILL.md body: loaded only when the Skill is activated.
- Bundled resources (scripts/, references/, assets/): loaded on demand when needed.
With this layout, you can install many Skills without bloating the context window. If you keep copy-pasting the same long prompt, turn it into a Skill.
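The three stages map directly onto the SKILL.md file itself. Here is a minimal sketch (the skill name, description, and file paths are hypothetical):

```markdown
---
# Stage 1: metadata, always in context (~100 tokens)
name: sales-data-analyzer
description: >
  Analyze sales CSV files. Use when the user mentions sales data.
---
# Stage 2: this body is loaded only when the Skill activates

Follow the workflow below for every analysis request.

Stage 3 lives in bundled files, loaded on demand:
- For unusual column layouts, read references/error-patterns.md
- Run scripts/analyze.py on the uploaded file
```

Only the frontmatter costs context up front; everything below the second `---`, and every bundled file, stays out of the window until Claude actually needs it.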
3. Skills vs MCP vs Subagents
Before building a Skill, let's look at how Skills, MCP, and Subagents differ, so you can be sure a Skill is the right choice.
- Skills teach Claude how to behave: analysis workflows, coding standards, product guidelines.
- MCP servers give Claude new tools: sending a Slack message, querying a database.
- Subagents let Claude delegate independent work to a separate context.
An analogy that helped me: MCP is the kitchen (knives, pots, ingredients); a Skill is the recipe that tells you how to use them. You can combine them: a code review Skill, for example, might define the workflow for PR analysis while retrieving error data via MCP. But in most cases a Skill alone is enough to get started.
4. Planning and Design
I jumped straight into writing SKILL.md the first time and ran into problems. If the description is not designed properly, the Skill will not trigger. Spend time on design before writing instructions or code.
4a. Start with Use Cases
The first thing to do is define two or three concrete use cases. Not an abstract "useful Skill", but a real, repetitive task you actually encounter.
Let me give my own example. I noticed that my colleagues and I were repeating the same monthly and quarterly business reviews. In e-commerce and retail, breaking down KPIs tends to follow a similar pattern.
That was the starting point. Instead of building a generic "data analysis Skill", I defined it this way: "A Skill that takes structured CSV data, decomposes KPIs into a tree, summarizes the findings by priority, and produces a concrete action plan."
Here, it is important to think about how users will express their requests:
- “run my store update using this orders.csv”
- “analyze the last 90 days of sales data, explain why revenue is down”
- “Compare Q3 vs Q4, find the top 3 things I need to fix”
If you write realistic prompts like this first, the Skill's scope becomes clear. The input is CSV. The axis of analysis is KPI decomposition. The output is a review report and an action plan. The user is not a data scientist; they are a business person who wants to know what to do next.
That level of detail shapes everything else: Skill name, description, file formats, output format.
Questions to ask when defining use cases:
- Who will use it?
- In what condition?
- How would they explain their request?
- What is input?
- What is the expected result?
4b. YAML Frontmatter
Once the use cases are clear, write the name and description. These determine whether your Skill actually triggers.
As mentioned earlier, Claude sees only the metadata when deciding which Skill to load. When a user request comes in, Claude chooses which Skills to load based solely on this metadata. If the description is unclear, Claude will never invoke the Skill, no matter how good the instructions in the body are.
To complicate matters, Claude tends to handle simple tasks on its own without consulting Skills; it defaults to not triggering. So your description needs to be specific enough for Claude to recognize that "this is the Skill's job, not mine."
In other words, the description needs to "push" a little. Here is what I mean:
# Bad — too vague. Claude does not know when to trigger.
name: data-helper
description: Helps with data tasks

# Good — specific trigger conditions, slightly "pushy"
name: sales-data-analyzer
description: >
  Analyze sales/revenue CSV and Excel files to find patterns,
  calculate metrics, and create visualizations. Use when user
  mentions sales data, revenue analysis, profit margins, churn,
  ad spend, or asks to find patterns in business metrics.
  Also trigger when user uploads xlsx/csv with financial or
  transactional column headers.
The most important thing is to be explicit about what the Skill does and what it expects: "Analyze sales/revenue CSV and Excel files" leaves no room for confusion. After that, write the trigger keywords. Go back to the use cases from 4a and pull out the words users actually say: sales data, revenue analysis, profit margins, churn. Finally, think about situations where the user wouldn't mention your Skill by name. "Also trigger when user uploads xlsx/csv with financial or transactional column headers" catches that case.
The limits: name up to 64 characters, description up to 1,024 characters (per the Agent Skills API). That is plenty of room, but prioritize information that directly affects triggering.
5. Implementation Patterns
Once the design is set, it's time to implement. First, understand the file structure, then choose the right pattern.
5a. File Structure
The physical structure of a Skill is simple:
my-skill/
├── SKILL.md # Required. YAML frontmatter + Markdown instructions
├── scripts/ # Optional. Python/JS for deterministic processing
│ ├── analyzer.py
│ └── validator.js
├── references/ # Optional. Loaded by Claude as needed
│ ├── advanced-config.md
│ └── error-patterns.md
└── assets/ # Optional. Templates, fonts, icons, etc.
└── report-template.docx
Only SKILL.md is required; that alone makes the Skill work. Try to keep SKILL.md under 500 lines. If it grows too long, move content into the references/ directory and tell Claude in SKILL.md where to look. Claude will not read reference files unless you point it at them.
For Skills that cover multiple platforms or variants, a different approach works well:
cloud-deploy/
├── SKILL.md # Shared workflow + selection logic
└── references/
├── aws.md
├── gcp.md
└── azure.md
Claude only reads the appropriate reference file based on the user's context.
5b. Pattern A: Prompt Only
The simplest pattern: just Markdown instructions in SKILL.md, no scripts.
Suitable for: product guidelines, coding standards, review checklists, message formatting, writing style enforcement.
When to use: if Claude's language ability and judgment are sufficient for the job, use this pattern.
Here is a compact example:
---
name: commit-message-formatter
description: >
Format git commit messages using Conventional Commits.
Use when user mentions commit, git message, or asks to
format/write a commit message.
---
# Commit Message Formatter
Format all commit messages following Conventional Commits 1.0.0.
## Format
<type>(<scope>): <subject>
## Rules
- Imperative mood, lowercase, no period, max 72 chars
- Breaking changes: add `!` after type/scope
## Example
Input: "added user auth with JWT"
Output: `feat(auth): implement JWT-based authentication`
That's it: no scripts, no dependencies. When Claude's judgment alone can do the job, nothing more is needed.
5c. Pattern B: Prompt + Scripts
Markdown instructions plus executable code in the scripts/ directory.
Suitable for: data transformation/validation, PDF/Excel/image processing, template-based document generation, numerical reports.
Supported languages: Python and JavaScript/Node.js. Here is an example structure:
data-analysis-skill/
├── SKILL.md
└── scripts/
├── analyze.py # Main analysis logic
└── validate_schema.js # Input data validation
In SKILL.md, you specify when to call each script:
## Workflow
1. User uploads a CSV or Excel file
2. Run `scripts/validate_schema.js` to check column structure
3. If validation passes, run `scripts/analyze.py` with the file path
4. Present results with visualizations
5. If validation fails, ask user to clarify column mapping
SKILL.md explains “when and why.” Scripts capture the “how.”
5d. Pattern C: Skill + MCP / Subagent
This pattern calls MCP servers or Subagents from within the Skill workflow. It is ideal for workflows involving external systems; think: create an issue → create a branch → fix code → open a PR. More moving parts means more things that can break, so I'd recommend getting comfortable with Patterns A and B first.
5e. Choosing the Right Pattern
If you are not sure which pattern to choose, follow this order:
- Need real-time communication with external APIs? → Yes → Pattern C
- Need deterministic processing such as calculations, validation, or file conversion? → Yes → Pattern B
- Can Claude's language ability and judgment handle it alone? → Yes → Pattern A
When in doubt, start with Pattern A. It's easy to add scripts later and evolve into Pattern B. But simplifying an overly complex Skill is difficult.
6. Testing
Writing SKILL.md is not the end. What makes a Skill great is how much you test it and how many times you iterate.
6a. Writing Test Prompts
"Testing" here does not mean unit testing. It means throwing real data at the Skill and checking whether it behaves correctly.
One rule of thumb for test prompts: write them the way real users actually speak.
# Good test prompt (realistic)
"ok so my boss just sent me this XLSX file (its in my downloads,
called something like 'Q4 sales final FINAL v2.xlsx') and she wants
me to add a column that shows the profit margin as a percentage.
The revenue is in column C and costs are in column D i think"
# Bad test prompt (too clean)
"Please analyze the sales data in the uploaded Excel file
and add a profit margin column"
The problem with clean test prompts is that they don't reflect reality. Real users make typos, use abbreviations, and forget file names. A Skill tested only with clean prompts will break in unexpected ways in production.
6b. Iteration Loop
The basic test loop is simple:
1. Activate the Skill with a test prompt
2. Evaluate whether the output matches the ideal output you defined in 4a
3. Modify SKILL.md if needed
4. Go back to 1
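Step 2 of the loop above can be partially scripted. This is my own hand-rolled sketch, not any official tooling: each test case pairs a realistic prompt with phrases the ideal output must contain, and `run_skill` is a stand-in for however you invoke Claude with the Skill installed (CLI, API, and so on):

```python
def evaluate(test_cases, run_skill):
    """Score each test prompt by checking for required phrases in the output.

    `run_skill` is a caller-supplied function: prompt string in, Claude's
    response string out. Matching is a crude case-insensitive substring
    check; good enough to flag regressions between SKILL.md revisions.
    """
    failures = []
    for case in test_cases:
        output = run_skill(case["prompt"]).lower()
        missing = [p for p in case["must_contain"] if p.lower() not in output]
        if missing:
            failures.append({"prompt": case["prompt"], "missing": missing})
    return {
        "passed": len(test_cases) - len(failures),
        "total": len(test_cases),
        "failures": failures,  # feed these back into SKILL.md edits
    }
```

Each entry in `failures` tells you exactly which prompt and which expectation broke, which is the input you need for step 3 of the loop.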
You can run this loop manually, but Anthropic's skill-creator Skill helps a lot. It semi-automates test case generation, execution, and review, using train/test splits and an HTML viewer for reviewing outputs.
6c. Optimizing the Description
As you test, you may find that the Skill works well once activated but doesn't activate often enough. skill-creator has a built-in optimization loop for this: it splits test cases 60/40 into train/test, measures activation performance, generates candidate descriptions, and picks the best one by test score.
One thing I learned: Claude rarely triggers Skills for short, simple requests. So make sure your test set includes sufficiently complex prompts.
7. Distribution
Once your Skill is ready, you need to get it to users. The best method depends on whether it's just for you, your team, or everyone.
Getting Your Skill to Users
For most people, two methods cover everything:
ZIP upload (claude.ai): zip the Skill folder and upload it via Settings > Customize > Skills. One gotcha: the ZIP must contain the folder itself at the root, not just its contents.
.claude/skills/ directory (Claude Code): add the Skill to your project repo under .claude/skills/. When colleagues pull the repo, everyone gets the same Skill.
Beyond these, more options exist as your distribution needs grow: plugin marketplaces for open-source distribution, the official Anthropic marketplace for wider reach, Vercel's npx skills add for installing into various agents, and the Skills API for programmatic management. I won't go into each in detail here; the docs cover them well.
Before sharing, check three things: the ZIP has a root folder (not just the contents), the frontmatter has both name and description within the character limits, and there are no hard-coded API keys.
And one more thing: bump the version field when you revise, or updates won't propagate. Treat user feedback like "it didn't trigger on this prompt" as new test cases. The iteration loop from Step 6 doesn't stop at launch.
8. Conclusion
A Skill is reusable expertise with structure. You package what you know about a domain into something others can install and use.
The flow: decide whether you need a Skill, MCP, or a Subagent. Design from use cases and write a description that actually triggers. Choose the simplest pattern that works. Test with messy, realistic prompts. Ship it and keep iterating.
The ecosystem is young and there is a lot of room. If you keep doing the same analysis, the same review, the same formatting task over and over, that repetition is your Skill waiting to be built.
If you have any questions or want to share what you have, find me on LinkedIn.



