Lessons From a Machine Learning Engineer — Part 1: Data | by David Martin | January, 2025
Practical ideas for a data-driven approach to model development

It is often said that a machine learning model can only succeed if it has good data. While this is true (and fairly obvious), good data is very difficult to define, create, and maintain. Let me share the techniques I've learned over the years building an ever-growing image classification system, and how you can apply them to your own application.
With persistence and diligence, you can avoid the classic “garbage in, garbage out” trap, increase the accuracy of your model, and demonstrate true business value.
In this series of articles, I'll dive into maintaining and deploying a single-label image classification application and what it takes to achieve high performance. I won't go into any coding or direct user interaction, just basic concepts that you can put together to suit your needs with the tools you have.
Here is a brief description of the topics. You'll notice that the model is last on the list as we need to focus on data selection first and foremost:
- Part 1 – Data – Labeling standards, classes and subclasses