Stop Feeling Lost: How to Master ML System Design

nimda October 16, 2025

Whether you're a data scientist or an ML engineer, machine learning system design is one of the most important skills you need to know. It is the bridge between building models and shipping solutions that drive real business results.

Your ability to turn ML ideas into production systems that save money, increase revenue, and create measurable value determines your long-term career growth and income.

I've built machine learning systems that have saved companies over $1.5 million a year, and these same skills have helped me land a $100,000 job offer.

In this guide, I'll break down how I think about ML System Design so you can do the same.

General Outline

Below is my outline of how to approach building a machine learning system:

Note: This is the most common type of machine learning system design, as used in an established technology company. There are other, more niche cases, such as infrastructure development and AI/ML research.

Outline diagram by the author.

If you want a PDF copy of this template, you can get access using this link:

Let's break down these steps in more detail.

Business Problem

The purpose of this step is to:

Clarify objectives – What business or user problem are you trying to solve, and how can you translate it into a machine learning problem?
Define metrics – Which metrics are we targeting (accuracy, F1-score, ROC-AUC, precision/recall, RMSE, etc.), and how do they translate into business performance?
Constraints and scope – How much compute is available, do we want real-time prediction or batch inference, and do we even need machine learning?
High-level design – What will the rough architecture look like, from data to inference?
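
To make the link between a model metric and a business outcome concrete, here is a minimal Python sketch. The confusion-matrix counts and the per-error costs are made-up numbers for illustration only, and `classification_metrics` and `business_cost` are hypothetical helper names rather than functions from any library.

```python
def classification_metrics(tp, fp, fn):
    """Return precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def business_cost(fp, fn, cost_fp, cost_fn):
    """Translate model errors into money using assumed per-error costs."""
    return fp * cost_fp + fn * cost_fn


# Illustrative numbers only, e.g. a classifier evaluated on a hold-out set.
tp, fp, fn = 80, 20, 40
precision, recall, f1 = classification_metrics(tp, fp, fn)
# Assumption: a false positive costs 5 units and a missed positive costs 50.
cost = business_cost(fp, fn, cost_fp=5.0, cost_fn=50.0)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} cost={cost:.0f}")
```

Framing the metric this way makes it easier to argue, for example, that recall matters more than precision when a missed positive is far more expensive than a false alarm.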

Data

This step is about finding and acquiring data:

Identify data sources – Databases, APIs, logs, or user-generated data.
Identify the target variable – What is the target variable, and how do we get it?
Quality control – What state is the data in? Are there any legal issues with using it?
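
One quick way to get a feel for the state of a dataset is to measure how much of each column is missing. The sketch below assumes the data arrives as a list of dictionaries; the rows, the column names, and the `missing_rate` helper are all hypothetical.

```python
def missing_rate(rows, columns):
    """Fraction of rows where each column is absent, None or an empty string."""
    total = len(rows)
    rates = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        rates[col] = missing / total if total else 0.0
    return rates


# Made-up sample rows for illustration.
rows = [
    {"user_id": 1, "age": 34, "country": "GB"},
    {"user_id": 2, "age": None, "country": "US"},
    {"user_id": 3, "age": 29, "country": ""},
    {"user_id": 4, "country": "DE"},  # "age" key missing entirely
]
rates = missing_rate(rows, ["user_id", "age", "country"])
for col, rate in rates.items():
    print(f"{col}: {rate:.0%} missing")
```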

Feature Engineering

Create novel features from the data that are specific to the problem:

Feature importance – Understand which features are likely to drive the target variable.
Data cleaning – Handle missing values, outliers, and inconsistent entries.
Feature representation – One-hot encoding, other categorical encodings, embeddings, and data scaling.
Sampling and splitting – Account for imbalanced data and data leakage, and split the data properly into training and test sets.
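
To illustrate the data leakage point, here is a small sketch of one common precaution: split first, then fit any scaling statistics on the training portion only. It assumes the values are already sorted by time, and the helper names (`time_ordered_split`, `fit_scaler`, `transform`) are hypothetical.

```python
from statistics import mean, pstdev


def time_ordered_split(rows, test_fraction=0.25):
    """Split time-sorted rows; the newest rows become the test set."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_scaler(values):
    """Compute mean and standard deviation from the training data only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid dividing by zero for constant features
    return mu, sigma


def transform(values, mu, sigma):
    """Standardize values using previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


# Made-up feature values, oldest first.
values = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 40.0, 60.0]
train, test = time_ordered_split(values, test_fraction=0.25)

# The scaler never sees the test rows, so no test information leaks into training.
mu, sigma = fit_scaler(train)
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)
```

Fitting the scaler on the full dataset would let the large, recent test values influence the training features, which is exactly the kind of subtle leakage worth calling out in a design discussion.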

Model Design & Selection

This is where you demonstrate your theoretical knowledge of machine learning models:

Baseline – Start with a simple model or heuristic and build up gradually.
Training – Cross-validation, hyperparameter tuning, early stopping.
Trade-offs – Consider trade-offs such as training speed, inference speed, latency, and interpretability.
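
As an illustration of starting simple, the sketch below cross-validates the most basic heuristic possible: always predict the most common label. Any real model should beat this number. The labels are made up, the folds are contiguous and unshuffled for brevity (real data would usually be shuffled or stratified), and both helper names are hypothetical.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over n rows."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def majority_baseline_accuracy(labels, k=5):
    """Cross-validated accuracy of always predicting the majority training label."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(labels), k):
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        correct = sum(1 for i in val_idx if labels[i] == majority)
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)


# Made-up binary labels with a 70/30 class balance.
labels = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
baseline = majority_baseline_accuracy(labels, k=5)
print(f"baseline accuracy to beat: {baseline:.2f}")
```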

Serving and Deployment

Understand the best way to serve and deploy the model to production.

Infrastructure – Choose cloud or on-prem, set up CI/CD pipelines, and ensure stability.
Serving – API endpoint, edge model, batch vs. online prediction.
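
The batch vs. online distinction can be sketched in a few lines: the scoring logic is shared, and only the calling pattern differs. The linear `score` function, the weights, and the request format below are stand-ins invented for illustration, not a real serving API.

```python
def score(features, weights, bias=0.0):
    """Hypothetical linear model shared by both serving modes."""
    return bias + sum(weights[name] * value for name, value in features.items())


def predict_online(request, weights):
    """Online serving: one request in, one prediction out; latency matters."""
    return {"id": request["id"], "score": score(request["features"], weights)}


def predict_batch(requests, weights):
    """Batch serving: score many stored rows on a schedule; throughput matters."""
    return [predict_online(r, weights) for r in requests]


# Made-up weights and requests.
weights = {"a": 2.0, "b": -1.0}
single = predict_online({"id": "u1", "features": {"a": 1.0, "b": 3.0}}, weights)
nightly = predict_batch(
    [
        {"id": "u1", "features": {"a": 1.0, "b": 3.0}},
        {"id": "u2", "features": {"a": 0.5, "b": 0.0}},
    ],
    weights,
)
```

In practice the online path would typically sit behind an API endpoint and the batch path behind a scheduled job, but keeping one scoring function for both helps avoid the two paths drifting apart.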

Evaluation and Monitoring

The last part sets up the systems and frameworks to track your model in production.

Metrics – What metrics to track for the model online vs. offline.
Monitoring – Set up dashboards, monitoring notebooks, and Slack alerts.
Experimentation – Set up A/B tests.
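
As one example of a metric a monitoring dashboard might track, the sketch below computes the Population Stability Index (PSI) between a feature's training distribution and its live distribution. PSI is chosen here purely for illustration and is not named in the outline above; the histogram counts are made up, and the thresholds in the comment are a commonly quoted rule of thumb rather than a standard.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two histograms over the same bins."""
    exp_total = sum(expected_counts)
    act_total = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor the proportions so empty bins do not cause log(0) or division by zero.
        e_pct = max(e / exp_total, eps)
        a_pct = max(a / act_total, eps)
        value += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return value


# Made-up bin counts for one feature.
training_hist = [100, 300, 400, 200]
live_hist = [250, 300, 300, 150]
drift = psi(training_hist, live_hist)

# Rule of thumb (assumption): below 0.1 is stable, above 0.25 is a large shift.
print(f"PSI = {drift:.3f}")
```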

What Do You Need to Learn?

Let me tell you a secret: ML system design is not an entry-level interview or skill set.

This is because ML system design is assessed at mid-level and above.

By then, you'll have a strong background in both machine learning and software engineering, and you'll likely have a specialization.

Anyway, if you want a comprehensive, but not exhaustive, list, this is what you need to study.

Machine Learning Concepts

Supervised learning – Classification (logistic regression, support vector machines, decision trees) and regression (linear regression, decision trees, gradient-boosted trees).
Unsupervised learning – Clustering (k-means, DBSCAN), dimensionality reduction, semantic analysis.
Deep learning – Neural networks and their main variants, such as convolutional and recurrent neural networks.
Loss functions and metrics – Accuracy, F1-score, NDCG, precision/recall, RMSE, etc.
Feature selection – How to identify key features, using techniques such as correlation analysis, recursive feature elimination, feature removal, cross-validation, and hyperparameter tuning.
Statistics – Bayesian statistics, hypothesis testing, and A/B testing.
Specializations – Time series, computer vision, operations research, recommendation systems, natural language processing, etc. You only need 1-2.
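
Since hypothesis testing and A/B testing appear in the list above, here is a minimal sketch of a two-proportion z-test, one standard way to compare conversion rates between a control and a variant. The conversion counts are made up, and `two_proportion_z_test` is a hypothetical helper written for this example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for rate B vs. rate A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up experiment: 4.0% conversion for control, 5.2% for the variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

This normal approximation assumes reasonably large samples and independent users; for small samples or repeated peeking at results, a different test design would be needed.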

System Design and Engineering

Cloud – The main one is AWS, and you should know S3, EC2, Lambda functions, and ECS. A lot of the services are just wrappers around storage and compute anyway.
Containerization – Docker and Kubernetes.
System design – Caching, networking, rate limiting, APIs, and storage.
Version control and tooling – CircleCI, Jenkins, Git, MLflow, Datadog, and related tools.
Deployment and orchestration frameworks – Argo, Metaflow, Databricks, Airflow, and Kubeflow.
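
To give a flavor of the caching item above, the sketch below wraps a slow lookup in Python's built-in `functools.lru_cache`, so that repeated requests for the same user skip the expensive call. The `get_user_features` function and its return values are hypothetical stand-ins for a real feature-store or database query.

```python
from functools import lru_cache

# Counter used only to show how often the slow path actually runs.
CALLS = {"count": 0}


@lru_cache(maxsize=1024)
def get_user_features(user_id):
    """Stand-in for a slow feature-store or database lookup (hypothetical)."""
    CALLS["count"] += 1
    return (user_id % 7, user_id % 3)


# Four requests, but only two distinct users.
for uid in [42, 42, 7, 42]:
    get_user_features(uid)

info = get_user_features.cache_info()
print(f"slow lookups: {CALLS['count']}, cache hits: {info.hits}")
```

An in-process cache like this only helps within a single worker; a shared cache and a policy for expiring stale features are separate design decisions.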

Resources

ML System Design Interviews

I plan to release a detailed video on the ML system design interview process later, but for now, I'd like to give you a high-level overview and some tips to help you prepare.

ML system design interviews are typically aimed at mid-level and senior machine learning engineers. In these interviews, you will often be presented with a broad, open-ended problem such as designing a recommender system or a spam filter.

If your role involves a specific specialization, such as computer vision, the interview question will often focus on that domain.

One of the biggest challenges with ML system design interviews is their lack of standardization. Unlike software engineering interviews, which follow a fixed format, ML design interviews vary in structure. There is also a lot to cover: countless concepts, trade-offs, and possible approaches.

That said, most hiring managers evaluate candidates in a few key areas:

Problem translation – Can you take a business problem and frame it as a machine learning problem?
Decision making – Do you recognize the trade-offs and justify your design decisions logically?
Breadth and depth – Do you demonstrate a solid understanding of ML theory, various models, and how to apply them effectively to real-world situations?

How to prepare for interviews

When it comes to preparation, there is one thing above all that I recommend.

Work on past problems.

Here are some resources for finding such problems:

I also recommend reading the blog posts of major tech companies to learn more about how machine learning algorithms are deployed at scale:

Earlier, I discussed how system design interviews test more than just your modeling skills.

But which fundamentals are actually tested?

That's exactly what I covered in one of my previous articles, which will walk you through everything you need to know, along with some great resources.

The Ultimate AI / ML Roadmap for Beginners

One more thing!

I offer 1:1 coaching calls where we can discuss anything you need – whether it's designs, career advice, or just figuring out your next step. I'm here to help you move forward!

1:1 Call preparation by Egor Howell
Career Guidance, Career Advice, Project Help, Resume Review
topic.io