From Creative Inspiration to a Reproducible AI Workflow with Claude Code Skills

1. Introduction
Many AI workflows share the same problem: the application works once, but making it repeatable is hard.
Prompting ChatGPT or Claude from scratch on each run is fast, but the results are inconsistent and difficult to reproduce. Hard-coding everything in Python improves reliability, but often removes the flexibility that makes LLMs useful for exploration.
Claude Code skills can close this gap. They preserve the flexibility of natural language, while SKILL.md and bundled scripts provide enough structure to keep the workflow consistent.
This approach works best for repetitive tasks with small changes, where natural language instructions are important, and where coding everything adds unnecessary complexity.
In my previous article, I walked through designing, building, and deploying a Claude Code skill from scratch. In this article, I focus on a concrete use case to show where a skill adds real value.
2. Use Case: Real Customer Research
Case study: LLM persona interviews – using an LLM to simulate customer interviews for qualitative research.
Customer research is important, but expensive. Quality research with a professional agency can easily cost tens of thousands of dollars.
That is why many teams are turning to LLMs as a stand-in. You can tell ChatGPT, "You're a 25-year-old woman interested in skincare," and ask for a reaction to a new concept. This approach is fast, free, and always available.
However, when you try this in real projects, several problems arise. They expose the main limitations of ad hoc prompting.
3. What's Wrong with Ad Hoc Prompting
It is easy for an LLM to play a persona and answer questions. The real problems start when you try to make that process repeatable across multiple personas, sessions, or projects.
During persona interviews, these issues surface quickly: responses in a shared conversation start anchoring on previous answers, results drift toward consensus, and the panel is hard to reuse for later concept tests or follow-up questions.
That is why a better prompt alone does not solve the problem. The issue is not the wording. The workflow itself needs structure: stable persona definitions, intentional diversity, and isolated interview contexts.
4. From a Better Prompt to a Reusable Skill
The key step was not writing a better prompt. It was turning a fragile, multi-step workflow into a reusable Claude Code skill.
Instead of manually repeating panel setup, persona generation, and interview commands every time, I can now start the entire workflow with a single command:
```
/persona generate 10 Gen Z skincare shoppers in the US
```
From the user's point of view, this looks simple. But behind that single line, the skill handles panel design, persona generation, validation, and output packaging in a repeatable way.
5. What Runs After the Command
That single command kicks off a workflow, not just a single prompt.
Behind the scenes, the skill does two things: it defines the structure of the panel, and it generates personas in a controlled way. Interviews then run in isolated contexts, so the output can be reused for later evaluation or follow-up.

5a. Treat Personas as Objects
The first change was to treat a persona as a structured data object, not just a line of dialogue setup. This makes the workflow more reliable and easier to analyze.
A naive approach usually looks like this:
```
You are a 22-year-old college student interested in skincare.
What do you think about a concept called "Barrier Repair Cream"?
```
The persona here is vague, and as you ask more questions, the character drifts. Instead, I define each persona as a JSON object.
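A minimal sketch of such a persona object, written to disk as its own file (the field names here are illustrative, not the skill's actual schema):

```python
import json
from pathlib import Path

# Hypothetical persona object -- field names are illustrative,
# not the actual schema the skill uses.
persona = {
    "id": "p01",
    "age": 22,
    "occupation": "college student",
    "segment": "budget-conscious",
    "skin_concerns": ["dryness", "sensitivity"],
    "attitude": "skeptical of marketing claims; reads ingredient lists",
}

# Each persona lives in its own JSON file, so the same panel can be
# reloaded for a later concept test or follow-up interview.
panel_dir = Path("panel")
panel_dir.mkdir(exist_ok=True)
(panel_dir / f"{persona['id']}.json").write_text(json.dumps(persona, indent=2))

# Reloading yields the identical object -- the persona is stable data,
# not a drifting chat transcript.
reloaded = json.loads((panel_dir / "p01.json").read_text())
```

Because the persona is plain data, the interview prompt can be assembled from these fields on every run instead of being restated from memory each session.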

This structure pins down the key attributes, so the persona doesn't drift as the questions pile up. And since each persona is stored in its own JSON file, you can reload the same panel for the next concept test or follow-up.
5b. Design Panel Diversity Up Front, Then Validate
The second change was to define the composition of the customer panel before letting the model generate any persona data.
If you simply ask an LLM to produce 10 personas at once, you can't control the balance of the panel. Ages tend to cluster, and attitudes often end up sounding like minor variations of the same person.
So I designed the Claude Code skill to define the mix of attitudes up front, generate personas within that structure, and validate the result afterward. For the Gen Z skincare panel, that means a deliberate mix of enthusiasts, skincare skeptics, budget-conscious shoppers, trend-chasers, and problem-driven consumers.
Once the segments are set, the skill creates individual personas and checks the distribution after generation.
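That post-generation check can be sketched as a small validation pass (the segment names and quotas below are illustrative, not the skill's actual configuration):

```python
from collections import Counter

# Planned segment mix, defined before any personas are generated.
# Segment names and quotas are illustrative, not the skill's real config.
target_mix = {
    "enthusiast": 3,
    "skeptic": 2,
    "budget-conscious": 2,
    "trend-chaser": 2,
    "problem-driven": 1,
}

def validate_panel(personas, target_mix):
    """Compare generated personas against the planned segment quotas."""
    actual = Counter(p["segment"] for p in personas)
    return {
        seg: {"target": want, "actual": actual.get(seg, 0)}
        for seg, want in target_mix.items()
    }

# A toy panel that happens to match the quotas exactly.
panel = [{"segment": seg} for seg, n in target_mix.items() for _ in range(n)]
report = validate_panel(panel, target_mix)
```

If a segment comes back under quota, the skill can regenerate just the missing personas instead of redoing the whole panel.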
One more design choice matters during the interviews: each persona runs in an isolated context. That prevents later responses from anchoring on earlier ones and keeps the contrast across the panel sharp.
6. Why a Claude Code Skill – Not a Prompt, Not a Python Library
The design choices above were inspired by TinyTroupe, a Python library from Microsoft Research for LLM-powered multiagent persona simulation. One of its core ideas is to treat personas as objects in a multi-agent setting. I borrowed that concept, but found that using it as a Python library added more friction than I wanted in day-to-day work. So I rebuilt the workflow as a Claude Code skill.
A skill fits this workflow better than a raw prompt or a library because it sits in the middle ground between flexibility and structure.


Compared with both alternatives, the advantages of a Claude Code skill come down to three key points.
No additional fees. Python libraries that call LLMs, including TinyTroupe, require a separate OpenAI or Claude API key, and you have to watch the usage costs. When you're still experimenting, that little meter running in the background creates friction. A Claude Code skill works within the subscription you already have, so scaling the panel from 10 to 20 personas doesn't add extra cost.
Parameters are passed as natural language. With a Python library, you must match the function signature, for example: `factory.generate_person(context="A hospital in São Paulo", prompt="Create a Brazilian doctor who loves pets")`. With a Claude Code skill, you simply write:
```
/persona generate 10 Gen Z skincare shoppers in the US
```

That's all it takes.
SKILL.md acts as a guardrail. The rules for persona design, the steps for building panel diversity, and the overall workflow live in the instruction file, so you don't have to re-paste them each time. However a user phrases the request, the skeleton of the workflow is protected by the skill.
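A rough sketch of what such a SKILL.md could look like for this workflow (the `name` and `description` frontmatter fields are part of the skill format; the body and file layout below are illustrative, not the actual file):

```markdown
---
name: persona
description: Generate and interview structured persona panels for qualitative research
---

# Persona panel workflow

1. Parse the request: panel size, target audience, region.
2. Define the segment mix up front, before generating any personas.
3. Generate each persona as a JSON object and save it under panel/.
4. Validate the segment and age distributions with the bundled script.
5. Run every interview in an isolated context, one persona at a time.
```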
Here's how it looks in action. Generating a panel takes one natural-language command:
```
/persona generate 10 Gen Z skincare shoppers in the US
```
Ten distinct personas are generated and saved as structured JSON objects, and the segment and age distributions are validated automatically. Then, running `/persona ask What frustrates you most about choosing skincare products?` interviews each persona in an independent context and returns a panel-wide picture of frustrations and needs. The full demo, including concept testing, is available in the demo folder on GitHub.
7. When a Claude Code Skill Is Worth It – and When It Isn't
There are situations where a skill is not the right tool. Fully deterministic pipelines are better as plain code. Logic that requires auditing or compliance review is a poor fit for natural language instructions. And for a single one-off question, just asking in the chat window is fine.
A Claude Code skill is not limited to natural language instructions: you can bundle Python scripts inside the skill as well. In the persona skill, I use Python to validate the panel distribution and aggregate the results. This lets you combine the parts where you want LLM judgment with the parts that must be deterministic, all within the same skill. That is what sets it apart from a prompt template.
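A minimal sketch of such a deterministic aggregation step (the data shapes and theme labels are hypothetical, not the skill's actual output format):

```python
from collections import Counter

# Hypothetical interview output: one record per persona, with the
# frustration themes extracted from its answer. Shapes are illustrative.
answers = [
    {"segment": "skeptic", "themes": ["too many claims", "price"]},
    {"segment": "budget-conscious", "themes": ["price", "product overload"]},
    {"segment": "enthusiast", "themes": ["ingredient transparency"]},
]

def aggregate_themes(answers):
    """Count how often each theme appears across the whole panel."""
    return Counter(t for a in answers for t in a["themes"]).most_common()

top_themes = aggregate_themes(answers)  # most frequent theme first
```

The LLM does the judgment-heavy part (playing the persona, extracting themes); the counting is plain code, so the panel-level summary is the same every time.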
The rule of thumb is simple: if your workflow needs structure but hard-coding everything would add unnecessary complexity, a skill is usually a good fit.
8. Conclusion
There's a middle ground in repetitive AI work: too unstable for ad hoc prompting, too fluid for a Python library. Claude Code skills fill that gap, keeping the flexibility of natural language while SKILL.md and bundled scripts act as guardrails.
In this article, I used LLM persona interviews as a case study and walked through the key design decisions behind that workflow: modeling personas as objects and designing panel diversity up front. The core concepts were inspired by Microsoft's TinyTroupe research.
The full SKILL.md, Python code, and a detailed demo for claude-persona are on GitHub.
Key takeaways
- Claude Code skills sit between ad hoc prompts and Python libraries. The balance of flexibility and guardrails makes them a good fit for packaging AI workflows that are repetitive but not identical on every run.
- LLM persona interviews become more reliable when you model personas as objects and intentionally design for panel-level diversity.
- If you have an AI workflow that's too fragile for ad hoc prompts but too fluid to justify a library, a Claude Code skill might be the right middle layer.
If you have any questions or want to share what you've built, find me on LinkedIn.
References
- TinyTroupe: github.com/microsoft/TinyTroupe
- TinyTroupe Paper: Salem, P., Sim, R., Olsen, C., Saxena, P., Barcelos, R., & Ding, Y. (2025). TinyTroupe: An LLM-powered Multiagent Persona Toolkit. arXiv:2507.09788
- claude-persona: github.com/takechanman1228/claude-persona
- How to Build a Production-Ready Claude Code Skill (my previous article)



