Designing Data and AI Systems Embedded in Manufacturing

In the Author Spotlight series, TDS Editors talk to members of our community about their work in data science and AI, their writing, and their sources of inspiration. Today, we are excited to share our interview with Mike Huls.
Mike is a technology leader working at the intersection of data engineering, AI, and architecture, helping organizations transform complex data environments into reliable, usable systems. With a strong full-stack background, he designs end-to-end solutions that balance technical depth with business value. Alongside client work, he builds and shares practical tools and insights on databases, AI systems, and maintainable architecture.
Do you consider yourself a full-stack developer? How does your experience across the stack (from front-end to database) change the way you view the role of data science?
I do, but not in the sense of manually building all the layers myself. To me, full stack means understanding how architectural decisions in one layer ripple into the behavior, risk, and cost of the others over time. That perspective is important when designing systems that need to survive change.
This perspective also influences how I view the role of data science. Models created in notebooks are only a starting point. The real value comes when those models are embedded in production systems with the right data pipelines, APIs, governance, and user-facing interfaces. Data science is impactful when it is treated as a core part of a larger system, not as a stand-alone activity.
You cover many topics. How do you decide what to focus on next, and how do you know when a new topic should be explored?
I tend to follow recurring friction. When I see multiple teams struggling with the same problems, whether technical or organizational, I take that as a sign that the issue is systemic rather than individual, and should be addressed at the structural or process level.
I also deliberately try new technologies, not to chase novelty, but to understand their trade-offs. A topic becomes worth writing about when it solves a real problem I'm facing or reveals a risk that isn't widely understood. Ultimately, I write about topics that I find interesting and worth exploring, because it's the continued interest that allows me to go deeper.
You have written about LangGraph, MCP, and AI agents. What is the biggest misconception you think people have about AI agents today?
Agents are truly powerful and open up new opportunities. The misconception is that they are easy. It's easy today to put together a cloud infrastructure, connect an agent framework, and produce something that seems to work. That accessibility is important, but it hides a lot of complexity.
When agents move beyond demos, the real challenges emerge. State management, permissions, cost control, monitoring, and failure handling are often overlooked. Without clear boundaries and ownership, agents become unpredictable, expensive, and risky to operate. Agents are not just tools you wire together; they are long-lived software systems and need to be designed and operated accordingly.
In your article on layered architecture, you say that adding features can often feel like “open-heart surgery.” For a startup or a small data team that wants to avoid this, what is your top advice for setting up their architecture?
“The only constant is change,” as the saying goes, so design for change rather than for initial delivery speed. Even a little layered thinking helps: separating domain logic, application flow, and infrastructure concerns.
The goal is not to have the architecture perfect on day one or to classify everything perfectly. It's about creating clear boundaries that allow the system to evolve without constant rewriting. A little upfront discipline pays off significantly as systems grow.
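The separation Mike describes can be sketched in a few lines. This is a minimal, hypothetical example (the `Order`/`OrderRepository` names are mine, not from his article): domain logic is a pure function, the application layer orchestrates the flow, and infrastructure hides behind a small interface so it can be swapped without touching the rest.

```python
# Hypothetical sketch of layered separation: domain logic knows
# nothing about storage or transport.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Order:
    order_id: str
    amount: float


# Domain layer: a pure business rule, trivially testable in isolation.
def apply_discount(order: Order, rate: float) -> Order:
    return Order(order.order_id, order.amount * (1 - rate))


# Infrastructure boundary: the application depends on this interface,
# not on a concrete database or API client.
class OrderRepository(Protocol):
    def save(self, order: Order) -> None: ...


class InMemoryOrderRepository:
    def __init__(self) -> None:
        self.orders: dict[str, Order] = {}

    def save(self, order: Order) -> None:
        self.orders[order.order_id] = order


# Application layer: orchestrates the flow; the repository is injected.
def checkout(order: Order, repo: OrderRepository) -> None:
    repo.save(apply_discount(order, rate=0.10))
```

Replacing `InMemoryOrderRepository` with a real database adapter later requires no change to `checkout` or the domain rule, which is exactly the kind of boundary that avoids the “open-heart surgery” feeling.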
You benchmarked PostgreSQL insertion techniques and noted that “faster is not always better.” In a production ML pipeline, in what situation would you deliberately choose a slower, safer insertion method?
When correctness, traceability, and recoverability matter more than raw performance. For most pipelines, shaving a few seconds off execution time offers little benefit compared to the risk that weak guarantees introduce.
For example, pipelines that serve regulatory reporting, financial decision-making, or long-term training datasets benefit from transactional safety and clear auditability. Silent data corruption is more expensive than accepting a limited performance trade-off, especially when the data becomes a long-term asset that others build upon.
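The core of the “slower but safer” choice is wrapping each batch in a transaction so a bad row rolls back everything instead of leaving a partial load behind. A minimal sketch of that idea, using the stdlib `sqlite3` module as a stand-in for PostgreSQL (the table and data are invented for illustration):

```python
# Sketch: transactional batch loading. One bad row rolls back the
# whole batch, so no partially loaded state survives silently.
import sqlite3

rows = [("2024-01-01", 10), ("2024-01-02", None), ("2024-01-03", 12)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (day TEXT, value INTEGER NOT NULL)")

try:
    # The connection context manager opens a transaction:
    # commit on success, rollback on any exception.
    with conn:
        conn.executemany("INSERT INTO metrics VALUES (?, ?)", rows)
except sqlite3.IntegrityError:
    # The NULL on row 2 violates NOT NULL; rows 1 and 2 are rolled back.
    pass

# Either the full batch lands or none of it does.
count = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
print(count)  # 0
```

This is slower than autocommitting each row or using an unlogged bulk path, but after a failure the table is in a known state and the batch can simply be fixed and retried.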
In your personal AI assistant series, you build a 100% private, self-hosted assistant. Why was avoiding “token costs” and “privacy leaks” more important to you than using a more powerful, cloud-based LLM?
In my daily work I have seen that trust in a system is essential for its adoption. Token costs, opaque data flows, and external dependencies subtly influence how systems are used and perceived.
I also decided not to store my personal or sensitive data with external cloud providers, as there are limited guarantees on how that data is handled over time. By self-hosting the system, I can design an assistant that is predictable, transparent, and aligned with European privacy expectations. Users have full control over what the assistant can access, which lowers the barrier to using it.
Finally, not all use cases require the largest or most expensive model. By decoupling the system from any single provider, users can choose the model that best fits their needs in capability, cost, and risk.
How do you see the daily job of a data scientist changing in 2026?
Despite common perceptions, data engineering and software engineering are highly social professions. I strongly believe that the most important part of the work happens before coding: aligning with stakeholders, understanding the problem domain, and designing solutions that fit existing systems and teams.
This earlier work becomes even more important as agent-assisted development accelerates implementation. Without clear goals, context, and constraints, agents increase confusion rather than productivity.
By 2026, data scientists will spend more time designing systems, defining constraints, validating assumptions, and ensuring correct behavior in production environments.
Looking ahead to the rest of 2026, what are the big topics that will define the year for data professionals, in your opinion? Why?
Generative AI and agent-based systems will continue to grow, but the biggest change is their maturation into first-class production systems rather than experiments.
That change depends on reliable, high-quality, accessible data and robust engineering practices. As a result, holistic thinking and system-level design will be critical for organizations that want to use AI responsibly and at scale.
To learn more about Mike's work and stay up to date with his latest articles, you can follow him on TDS or LinkedIn.



