
Beyond Information: Applying Agent Skills to Data Science

In my last article, I shared how to use MCP to integrate LLMs into your end-to-end data science workflow. I also briefly mentioned another building block: skills.

A skill is a package of reusable instructions and optional supporting files that helps an AI handle repetitive workflows reliably and consistently. At a minimum, a skill requires a SKILL.md file containing metadata (a name and a description) plus detailed instructions on how the skill should work. Skills often also bundle documents, templates, and examples to improve standardization and accuracy.
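As a sketch, a minimal SKILL.md might look like the following — the name, description, steps, and file paths here are hypothetical, not taken from a real skill:

```markdown
---
name: weekly-viz
description: Turns a tabular dataset into a single insight-driven interactive visualization. Use when the user asks for a weekly chart or data story.
---

# Weekly Viz

## Instructions
1. Profile the dataset and list 2–3 candidate insights.
2. Pick one insight and propose a chart type, noting trade-offs.
3. Generate an interactive HTML chart with a headline, annotations, and the data source.

## Resources
- `templates/chart.html` – base layout for the embedded chart
- `examples/` – past visualizations to match in style
```

The YAML frontmatter (name and description) is what the model sees up front; the rest is loaded only when the skill is triggered.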

At this point, you might be wondering why we use skills instead of just writing all of this directly into the Claude Code or Codex context. A key advantage is that skills keep the main context lean: the AI only needs to load lightweight metadata up front, and it reads the remaining instructions and bundled resources only when it decides the skill applies. You can find a great community collection of skills at skills.sh.

Let me make the concept clearer with a simple example.


My Example – Weekly Visualization Skill

Context

I've been making one data visualization every week since 2018 – if you're curious, I wrote about the journey in this article. The process is quite repetitive and usually takes me about an hour each week, so I found it a good candidate for a skill.

Examples of my visualizations in 2025

Workflow without AI

Here is my weekly routine:

  1. Find a dataset that interests me. Websites I often visit for inspiration include Tableau Viz of the Day, Voronoi, The Economics Daily by BLS, r/dataisbeautiful, etc.
  2. Open Tableau, play with the data, discover insights, and create a single view that tells a story intuitively.
  3. Publish it on my personal website.

Workflow with AI

While the dataset search step is still manual, I created two skills to automate steps 2 and 3:

  • A storytelling skill that analyzes datasets, identifies insights, proposes visualization types, and produces interactive visualizations that are accurate, concise, and focused on storytelling.
  • A viz-publish skill that publishes the visualization to my website as embedded HTML – I won't share this one, as it's specific to my website's repo structure.

Below is an example where I triggered the storytelling skill in Codex Desktop. I used the same Apple Health dataset as last time, asking Codex to query the data from my Google BigQuery database and then use the skill to generate the visualization. It surfaced an insight about the seasonality of exercise time versus calories burned, and recommended a chart type along with its trade-offs.

Skill trigger screenshot by author (part 1)
Skill trigger screenshot by author (part 2)

The whole process took less than 10 minutes, and here's what you get — it leads with an insight-driven headline, followed by a clean interactive visualization, annotations, and the data source. I've been testing the skill on my weekly visualizations for the past few weeks, and you can find more example visualizations in the skill's repo.

The storytelling skill in action (screenshot by author)

How I Actually Built the Skill

Now that we've looked at the output, let me show you how I built the skill.

Step 1: Start with a plan

As I shared in my last article, I like to settle on a plan with the AI before implementation. Here, I started by explaining my weekly visualization workflow and my goal of automating it. We discussed the technology stack, the requirements, and what a “good” output should look like. This led to the first version of the skill.

The good part is that you don't need to create the SKILL.md file manually — just ask Claude Code or Codex to create a skill for your use case, and it can draft the first version for you (it will invoke its skill-creator skill to do so).

Skill building (screenshot by author)
Skill building (screenshot by author)

Step 2: Test and iterate

However, that first version only got me about 10% of the way to my ideal visualization workflow – it could generate visualizations, but the chart types were often off, the visual styles were inconsistent, the key message wasn't always highlighted, and so on.

The remaining 90% required iterative development. Here are some strategies that helped.

1. Share my experience

Over the past eight years, I've developed my own visualization habits and preferences, and I wanted the AI to follow these patterns instead of inventing a different style each time. So, I shared screenshots of my past visualizations along with my style guide. The AI was able to summarize common principles from them and update the skill instructions accordingly.

Improve skills with my knowledge (screenshot by author)

2. Research external resources

There are many resources online about good data visualization design. Another useful step was asking the AI to research visualization best practices from well-known sources and to study similar community skills. This added ideas that I hadn't articulated explicitly and made the skill more robust.

Improve skills with external resources (screenshot by author)
Improve skill with similar skills (screenshot by author)

3. Learn from experimentation

Evaluation is important for identifying areas to improve. I tested the skill on 15+ different datasets to see how it behaved and how its output compared with my own visualizations. That process helped me make concrete updates, such as:

  • Standardizing font choices and layout
  • Checking desktop and mobile previews to avoid overlapping labels and annotations
  • Making charts understandable even without tooltips
  • Always citing the data source and linking it in the visualization
Skill improvement from testing, part 1 (screenshot by author)
Skill improvement from testing, part 2 (screenshot by author)
Skill improvement from testing, part 3 (screenshot by author)

You can find the latest version of the storytelling skill here. Please feel free to play with it and let me know what you think 🙂


Takeaways for Data Scientists

Where skills are useful

My weekly visualization project is just one example, but skills can help across many data science tasks. They are especially valuable when a task occurs repeatedly, follows a semi-structured process, depends on domain knowledge, and is difficult to accomplish with a single prompt.

  • For example, suppose you need to investigate a movement in metric X. You probably already know the common drivers of X, so you always start by cutting the data by segments A/B/C and checking upstream funnel metrics D and E. This is exactly the kind of process you can package into a skill, so the AI can follow the same analysis playbook and identify the root cause.
  • Another example: say you plan to run an experiment in region A, and you want to check what other experiments are running in the same region. In the past, you might search keywords in Slack, dig through Google Docs, and open an internal experimentation platform to review the relevant tests. Now, you can summarize these steps into a skill and ask the LLM to do the research and produce a report covering each related experiment's goals, duration, traffic, conditions, and documents.
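To make the metric-investigation example concrete, the segment-cut step of such a playbook could be sketched in plain Python — the segment names, metric, and 10% threshold below are hypothetical, and a real skill would encode your own playbook:

```python
from typing import Dict, List


def segment_means(rows: List[dict], segment_key: str, metric: str) -> Dict[str, float]:
    """Average metric value per segment for one period's rows."""
    by_segment: Dict[str, List[float]] = {}
    for row in rows:
        by_segment.setdefault(row[segment_key], []).append(row[metric])
    return {seg: sum(vals) / len(vals) for seg, vals in by_segment.items()}


def flag_drivers(current: List[dict], baseline: List[dict],
                 segment_key: str = "segment", metric: str = "conversion",
                 threshold: float = 0.10) -> Dict[str, float]:
    """Return segments whose metric moved more than `threshold` (relative) vs. baseline."""
    cur = segment_means(current, segment_key, metric)
    base = segment_means(baseline, segment_key, metric)
    flags: Dict[str, float] = {}
    for seg, value in cur.items():
        if seg in base and base[seg] != 0:
            change = (value - base[seg]) / base[seg]
            if abs(change) > threshold:
                flags[seg] = round(change, 3)
    return flags


# Hypothetical data: segment B's conversion dropped ~20% week over week
baseline = [{"segment": "A", "conversion": 0.30}, {"segment": "B", "conversion": 0.25}]
current = [{"segment": "A", "conversion": 0.31}, {"segment": "B", "conversion": 0.20}]
print(flag_drivers(current, baseline))  # {'B': -0.2}
```

A skill would wrap logic like this in instructions telling the AI which segments and funnel metrics to check first and how to report the findings.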

If your workflow contains multiple independent, reusable components, you should separate them into different skills. In my case, I created two skills – one to generate the visualization, the other to publish it on my blog. That keeps the pieces modular and easier to reuse in other workflows later.

Skills and MCP work well together. I've used the BigQuery MCP alongside the storytelling skill in a single command, and it successfully generated visualizations from my datasets in BigQuery. MCP gives the model smooth access to external tools, while the skill keeps it following the correct procedure for a particular task. The combination is powerful and complementary.


One last note on my weekly visualization project

Now that I can automate 80% of my weekly visualization process, why do I still do it?

When I started this practice in 2018, the goal was to practice Tableau, which was the main BI tool at my employer. The purpose has changed over time – now I use this weekly ritual to explore datasets I might not encounter at work, sharpen my data and storytelling skills, and see the world through the lens of data. So for me, it's not really about the tool, but about the process of discovery. And that's why I plan to keep doing it, even in the age of AI.
