Understanding MLOps with the ZenML Project

The AI revolution is upon us, yet amid all the excitement an important question is being ignored by most of us: how do we maintain these complex AI systems? This is where Machine Learning Operations (MLOps) comes into play. In this blog we will understand the importance of MLOps with ZenML, an open-source MLOps framework, by building an end-to-end project.
Learning Objectives
- Understand the critical role of MLOps in simplifying and automating machine learning workflows.
- Check out ZenML, an open source MLOps framework, for managing ML projects through modular coding.
- Learn how to set up an MLOps environment and integrate ZenML with a hands-on project.
- Build and implement an end-to-end pipeline for predicting Customer Lifetime Value (CLTV).
- Gain insight into creating deployment pipelines and a Flask application for serving production-grade ML models.
This article was published as part of the Data Science Blogathon.
What is MLOps?
MLOps empowers machine learning developers to streamline the ML model lifecycle. Building a machine learning system is hard: the lifecycle consists of many complex stages such as data ingestion, data preparation, model training, model tuning, model deployment, model monitoring, explainability, and more. MLOps automates each step of this process with robust pipelines to minimize manual errors. It is a collaborative practice that simplifies your AI infrastructure with less manual effort and greater efficiency. Think of MLOps as DevOps for the AI industry, with some extra spice.
What is ZenML?
ZenML is an open-source MLOps framework that simplifies the development, deployment, and management of machine learning workflows. Built around MLOps principles, it integrates seamlessly with various tools and infrastructure, giving users a modular way to maintain their AI workflows in a single workspace. ZenML provides features such as automatic logging, metadata tracking, model tracking, experiment tracking, an artifact store, and simple Python decorators for your core logic, all without complex configuration.
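To make that concrete, here is a minimal, illustrative sketch of what those decorators look like in practice (the toy step names are mine, not part of the project):
from zenml import pipeline, step

@step
def load_number() -> int:
    """A tiny step that just produces a value; ZenML stores it as an artifact."""
    return 42

@step
def double(value: int) -> int:
    """A second step whose input is wired from the previous step's output."""
    return value * 2

@pipeline
def toy_pipeline():
    # Calling steps inside a pipeline defines the execution graph.
    double(load_number())

if __name__ == "__main__":
    toy_pipeline()  # runs locally and shows up in the ZenML dashboard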
Understanding MLOps with a Hands-on Project
We will now understand how MLOps is implemented with the help of a simple but productive end-to-end data science project. In this project we will build and deploy a machine learning model to predict the customer lifetime value (CLTV) of a customer. CLTV is a key metric that companies use to estimate how much a customer will bring in, or cost, over the long term. Using this metric, a company can decide whether or not to keep spending money on that customer through targeted ads and similar campaigns.
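To give a rough idea of what the target looks like, here is an illustrative sketch of how CLTV-style features can be derived from the Online Retail transactions. The exact feature definitions for this project live in its feature-engineering step, so treat the formula mentioned below as a common simplification, not the project's definition:
import pandas as pd

df = pd.read_excel("data/Online_Retail.xlsx")
df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]

snapshot_date = df["InvoiceDate"].max()
per_customer = df.groupby("CustomerID").agg(
    frequency=("InvoiceNo", "nunique"),      # number of distinct orders
    total_amount=("TotalPrice", "sum"),      # total revenue from the customer
    recency=("InvoiceDate", lambda d: (snapshot_date - d.max()).days),
    lifetime=("InvoiceDate", lambda d: (d.max() - d.min()).days),
)
per_customer["avg_order_value"] = per_customer["total_amount"] / per_customer["frequency"]

# A common simplified view (assumption, not the project's exact formula):
# CLTV ~ avg_order_value * purchase_frequency * expected customer lifespan
print(per_customer.head())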
Let's start the project in the next section.
Initial Configuration
Now let's get straight into the project configuration. First, we need to download the Online Retail dataset from the UCI Machine Learning Repository. ZenML is not supported on Windows, so we need to use Linux (or WSL on Windows) or macOS. Next, download the requirements.txt file. Then let's proceed to the terminal for a few configurations.
# Make sure you have Python 3.10 or above installed
python --version
# Make a new Python environment using any method
python3.10 -m venv myenv
# Activate the environment
source myenv/bin/activate
# Install the requirements from the provided source above
pip install -r requirements.txt
# Install the ZenML server
pip install "zenml[server]==0.66.0"
# Initialize ZenML in the project directory
zenml init
# Launch the ZenML dashboard
zenml up
Now just log in to the ZenML dashboard with the default credentials (no password required).
Congratulations on successfully completing the project configuration.
Exploratory Data Analysis (EDA)
Now it's time to get our hands dirty with the data. We will create a Jupyter notebook to analyze it.
Pro tip: Try doing your own analysis first, without following along.
Or you can simply follow the notebook for this project, where we have created the different data analysis methods used throughout.
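For reference, a few of the typical checks from such a notebook might look like this (illustrative only; the notebook in the repository goes into more depth):
import pandas as pd

df = pd.read_excel("data/Online_Retail.xlsx")

print(df.shape)              # rows and columns
df.info()                    # dtypes and non-null counts
print(df.isnull().sum())     # missing values per column
print(df.describe())         # summary statistics for numeric columns
print(df.duplicated().sum()) # duplicate transactions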
Now, assuming you've done your share of data analysis, let's jump right into the spicy part.
Defining ZenML Steps as Modular Code
To increase the modularity and reusability of our code, ZenML provides the @step decorator, which organizes our code so that it flows through pipelines without hassle, reducing the chance of mistakes.
In our src folder we will write the methods for each step before wiring them into the pipeline. We follow system design patterns in each of these methods by creating an abstract strategy for each concern (data ingestion, data cleansing, feature engineering, and so on).
Sample Data Ingestion Code
Sample code for ingest_data.py:
import logging
import pandas as pd
from abc import ABC, abstractmethod

# Setup logging configuration
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")


# Abstract Base Class for Data Ingestion Strategy
# ------------------------------------------------
# This class defines a common interface for different data ingestion strategies.
# Subclasses must implement the `ingest` method.
class DataIngestionStrategy(ABC):
    @abstractmethod
    def ingest(self, file_path: str) -> pd.DataFrame:
        """
        Abstract method to ingest data from a file into a DataFrame.

        Parameters:
        file_path (str): The path to the data file to ingest.

        Returns:
        pd.DataFrame: A dataframe containing the ingested data.
        """
        pass


# Concrete Strategy for XLSX File Ingestion
# -----------------------------------------
# This strategy handles the ingestion of data from an XLSX file.
class XLSXIngestion(DataIngestionStrategy):
    def __init__(self, sheet_name=0):
        """
        Initializes the XLSXIngestion with an optional sheet name.

        Parameters:
        sheet_name (str or int): The sheet name or index to read, default is the first sheet.
        """
        self.sheet_name = sheet_name

    def ingest(self, file_path: str) -> pd.DataFrame:
        """
        Ingests data from an XLSX file into a DataFrame.

        Parameters:
        file_path (str): The path to the XLSX file.

        Returns:
        pd.DataFrame: A dataframe containing the ingested data.
        """
        try:
            logging.info(f"Attempting to read XLSX file: {file_path}")
            df = pd.read_excel(
                file_path,
                dtype={"InvoiceNo": str, "StockCode": str, "Description": str},
                sheet_name=self.sheet_name,
            )
            logging.info(f"Successfully read XLSX file: {file_path}")
            return df
        except FileNotFoundError:
            logging.error(f"File not found: {file_path}")
        except pd.errors.EmptyDataError:
            logging.error(f"File is empty: {file_path}")
        except Exception as e:
            logging.error(f"An error occurred while reading the XLSX file: {e}")
        return pd.DataFrame()


# Context Class for Data Ingestion
# --------------------------------
# This class uses a DataIngestionStrategy to ingest data from a file.
class DataIngestor:
    def __init__(self, strategy: DataIngestionStrategy):
        """
        Initializes the DataIngestor with a specific data ingestion strategy.

        Parameters:
        strategy (DataIngestionStrategy): The strategy to be used for data ingestion.
        """
        self._strategy = strategy

    def set_strategy(self, strategy: DataIngestionStrategy):
        """
        Sets a new strategy for the DataIngestor.

        Parameters:
        strategy (DataIngestionStrategy): The new strategy to be used for data ingestion.
        """
        logging.info("Switching data ingestion strategy.")
        self._strategy = strategy

    def ingest_data(self, file_path: str) -> pd.DataFrame:
        """
        Executes the data ingestion using the current strategy.

        Parameters:
        file_path (str): The path to the data file to ingest.

        Returns:
        pd.DataFrame: A dataframe containing the ingested data.
        """
        logging.info("Ingesting data using the current strategy.")
        return self._strategy.ingest(file_path)


# Example usage
if __name__ == "__main__":
    # Example file path for XLSX file
    # file_path = "../data/raw/your_data_file.xlsx"

    # XLSX Ingestion Example
    # xlsx_ingestor = DataIngestor(XLSXIngestion(sheet_name=0))
    # df = xlsx_ingestor.ingest_data(file_path)

    # Show the first few rows of the ingested DataFrame if successful
    # if not df.empty:
    #     logging.info("Displaying the first few rows of the ingested data:")
    #     print(df.head())
    pass
We will follow this pattern for the other methods, for example a missing-value handling module along the lines of the sketch below. You can copy the full code from the given GitHub repository.
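Here is a hypothetical sketch of how a handle_missing_values.py module could follow the same strategy pattern (the names and behaviour are illustrative, not copied from the repository):
import logging
from abc import ABC, abstractmethod
import pandas as pd

class MissingValueHandlingStrategy(ABC):
    @abstractmethod
    def handle(self, df: pd.DataFrame) -> pd.DataFrame:
        """Return a DataFrame with missing values handled."""
        pass

class DropMissingValues(MissingValueHandlingStrategy):
    def __init__(self, subset=None):
        self.subset = subset  # optionally restrict the check to specific columns

    def handle(self, df: pd.DataFrame) -> pd.DataFrame:
        logging.info("Dropping rows with missing values.")
        return df.dropna(subset=self.subset)

class MissingValueHandler:
    """Context class that delegates to the chosen strategy."""
    def __init__(self, strategy: MissingValueHandlingStrategy):
        self._strategy = strategy

    def handle_missing_values(self, df: pd.DataFrame) -> pd.DataFrame:
        return self._strategy.handle(df)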
After writing all the methods, it's time to implement the ZenML steps in our steps folder. All the methods we have written so far will now be used inside the corresponding ZenML steps.
Data Ingestion Step Sample
Sample code for data_ingestion_step.py:
import os
import sys
sys.path.append(os.path.dirname(os.path.dirname(__file__)))

import pandas as pd
from src.ingest_data import DataIngestor, XLSXIngestion
from zenml import step


@step
def data_ingestion_step(file_path: str) -> pd.DataFrame:
    """
    Ingests data from an XLSX file into a DataFrame.

    Parameters:
    file_path (str): The path to the XLSX file.

    Returns:
    pd.DataFrame: A dataframe containing the ingested data.
    """
    # Initialize the DataIngestor with an XLSXIngestion strategy
    ingestor = DataIngestor(XLSXIngestion())

    # Ingest data from the specified file
    df = ingestor.ingest_data(file_path)
    return df
We will follow the same pattern as above to create the remaining ZenML steps in our project; you can copy them from the repository. As an illustration, see the data splitting step sketched below.
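A data splitting step could look roughly like this (a sketch assuming scikit-learn's train_test_split; the actual step in the repository may differ):
from typing import Tuple

import pandas as pd
from sklearn.model_selection import train_test_split
from zenml import step

@step
def data_splitting_step(
    df: pd.DataFrame, target_column: str
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.Series, pd.Series]:
    """Splits the cleaned data into train/test features and targets."""
    features = df.drop(columns=[target_column])
    target = df[target_column]
    train_features, test_features, train_target, test_target = train_test_split(
        features, target, test_size=0.2, random_state=42
    )
    return train_features, test_features, train_target, test_target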

Hey! Congratulations on building and learning the most important part of MLOps. It's okay to feel a little overwhelmed if this is your first time. Don't put too much pressure on yourself; everything will make sense once you build your first production-grade ML model.
Pipeline Construction
Time to build our pipelines. No, not the kind that carry water or oil. Pipelines here are a series of steps arranged in a specific order that form our complete machine learning workflow. The @pipeline decorator is used in ZenML to define a pipeline containing the steps we created above. This approach ensures that the output of one step can be used as the input of the next.
Here is our training_pipeline.py:
import os
import sys
sys.path.append(os.path.dirname(os.path.dirname(__file__)))

from steps.data_ingestion_step import data_ingestion_step
from steps.handling_missing_values_step import handling_missing_values_step
from steps.dropping_columns_step import dropping_columns_step
from steps.detecting_outliers_step import detecting_outliers_step
from steps.feature_engineering_step import feature_engineering_step
from steps.data_splitting_step import data_splitting_step
from steps.model_building_step import model_building_step
from steps.model_evaluating_step import model_evaluating_step
from steps.data_resampling_step import data_resampling_step
from zenml import Model, pipeline


@pipeline(model=Model(name="CLTV_Prediction"))
def training_pipeline():
    """
    Defines the complete training pipeline for CLTV Prediction.

    Steps:
    1. Data ingestion
    2. Dropping unnecessary columns
    3. Detecting and handling outliers
    4. Feature engineering
    5. Handling missing values
    6. Splitting data into train and test sets
    7. Resampling the training data
    8. Model training
    9. Model evaluation
    """
    # Step 1: Data ingestion
    raw_data = data_ingestion_step(file_path="data/Online_Retail.xlsx")

    # Step 2: Drop unnecessary columns
    columns_to_drop = ["Country", "Description", "InvoiceNo", "StockCode"]
    refined_data = dropping_columns_step(raw_data, columns_to_drop)

    # Step 3: Detect and handle outliers
    outlier_free_data = detecting_outliers_step(refined_data)

    # Step 4: Feature engineering
    features_data = feature_engineering_step(outlier_free_data)

    # Step 5: Handle missing values
    cleaned_data = handling_missing_values_step(features_data)

    # Step 6: Data splitting
    train_features, test_features, train_target, test_target = data_splitting_step(cleaned_data, "CLTV")

    # Step 7: Data resampling
    train_features_resampled, train_target_resampled = data_resampling_step(train_features, train_target)

    # Step 8: Model training
    trained_model = model_building_step(train_features_resampled, train_target_resampled)

    # Step 9: Model evaluation
    evaluation_metrics = model_evaluating_step(trained_model, test_features, test_target)

    # Return evaluation metrics
    return evaluation_metrics


if __name__ == "__main__":
    # Run the pipeline
    training_pipeline()
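With the pipeline defined, training is a single command from the project root (assuming the file sits in a pipelines/ folder; adjust the path to your own layout):
python pipelines/training_pipeline.py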
Now we can use training_pipeline.py to train our ML model with a single command. You can inspect the pipeline run in your ZenML dashboard:

We can inspect our model data, train multiple models, and compare them in the MLflow dashboard by running the following command in the terminal.
mlflow ui
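Note that MLflow experiment tracking and the deployer step used in the next section require MLflow components in your active ZenML stack. The registration typically looks like the commands below, though the exact syntax can vary between ZenML versions, so check the ZenML documentation for your release:
zenml integration install mlflow -y
zenml experiment-tracker register mlflow_tracker --flavor=mlflow
zenml model-deployer register mlflow_deployer --flavor=mlflow
zenml stack register mlflow_stack -a default -o default -e mlflow_tracker -d mlflow_deployer --set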
Creating a Deployment Pipeline
Next, we will create deployment_pipeline.py.
import os
import sys
sys.path.append(os.path.dirname(os.path.dirname(__file__)))

from zenml import pipeline
from zenml.client import Client
from zenml.integrations.mlflow.steps import mlflow_model_deployer_step
from steps.model_deployer_step import model_fetcher


@pipeline
def deploy_pipeline():
    """Deployment pipeline that fetches the latest model from MLflow and deploys it."""
    model_uri = model_fetcher()
    deploy_model = mlflow_model_deployer_step(
        model_name="CLTV_Prediction",
        model=model_uri,
    )


if __name__ == "__main__":
    # Run the pipeline
    deploy_pipeline()
When we run the deployment pipeline, we will get a view like this in our ZenML dashboard:

Congratulations on deploying your best model with MLflow and ZenML.
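Once the MLflow deployment service is running, you can sanity-check it with a quick request. The snippet below is only a sketch: the prediction URL is the one reported by the deployer in the ZenML dashboard or logs (the port here is a placeholder), and the payload format assumes MLflow 2.x's dataframe_split schema.
import requests

prediction_url = "http://127.0.0.1:8000/invocations"  # placeholder; use the URL shown by the deployer
payload = {
    "dataframe_split": {
        "columns": ["frequency", "total_amount", "avg_order_value",
                    "recency", "customer_age", "lifetime", "purchase_frequency"],
        "data": [[12, 4500.0, 375.0, 30, 200, 365, 0.03]],  # illustrative values only
    }
}
response = requests.post(prediction_url, json=payload)
print(response.json())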
Create a Flask App
Our next step is to build a Flask application that serves our model to end users. For that we have to create app.py and an index.html file inside the templates folder. Follow the code below to create app.py:
from flask import Flask, request, render_template, jsonify
import pickle

"""
This module implements a Flask web application for predicting Customer Lifetime Value (CLTV)
using a pre-trained model.

Routes:
    /: Renders the home page of the CLTV prediction application.
    /predict: Handles POST requests to predict customer lifetime value (CLTV).

Functions:
    home(): Renders the home page of the application.
    predict(): Collects input data from an HTML form, processes it, and uses a pre-trained model
        to predict the CLTV. The prediction result is then rendered back on the webpage.

Attributes:
    app (Flask): The Flask application instance.
    model: The pre-trained model loaded from a pickle file.

Exceptions:
    If there is an error loading the model or during prediction, an error message is printed
    or returned as a JSON response.
"""

app = Flask(__name__)

# Load the pickled model
try:
    with open("models/xgbregressor_cltv_model.pkl", "rb") as file:
        model = pickle.load(file)
except Exception as e:
    print(f"Error loading model: {e}")


@app.route("/")
def home():
    """
    Renders the home page of the CLTV prediction application.

    Returns:
        Response: A Flask response object that renders the "index.html" template.
    """
    return render_template("index.html")


@app.route("/predict", methods=["POST"])  # Handle POST requests to the /predict endpoint
def predict():
    """
    Collects input data from an HTML form, processes it, and uses a pre-trained model
    to predict the CLTV. The prediction result is then rendered back on the webpage.

    Form Data:
        frequency (float): The frequency of purchases.
        total_amount (float): The total amount spent by the customer.
        avg_order_value (float): The average value of an order.
        recency (int): The number of days since the last purchase.
        customer_age (int): The age of the customer.
        lifetime (int): The time difference between the first and last purchase.
        purchase_frequency (float): The frequency of purchases over the customer's lifetime.

    Returns:
        Response: A rendered HTML template with the prediction result if successful.
        Response: A JSON object with an error message and a 500 status code if an exception occurs.
    """
    try:
        # Collect input data from the form
        input_data = [
            float(request.form["frequency"]),
            float(request.form["total_amount"]),
            float(request.form["avg_order_value"]),
            int(request.form["recency"]),
            int(request.form["customer_age"]),
            int(request.form["lifetime"]),
            float(request.form["purchase_frequency"]),
        ]

        # Make a prediction using the loaded model
        predicted_cltv = model.predict([input_data])[0]

        # Render the result back on the webpage
        return render_template("index.html", prediction=predicted_cltv)
    except Exception as e:
        # If any error occurs, return the error message
        return jsonify({"error": str(e)}), 500


if __name__ == "__main__":
    app.run(debug=True)
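To try the app locally, run it with Python and open the address that Flask's development server prints (http://127.0.0.1:5000 by default):
python app.py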
To create the index.html file, you can start with a minimal template like the one below:
<h1>CLTV Prediction</h1>
{% if prediction %}
    <p>Predicted CLTV: {{ prediction }}</p>
{% endif %}
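The template also needs a form whose field names match what app.py reads from request.form. A hypothetical version could look like this (styling and layout are up to you):
<form action="/predict" method="post">
    <input type="number" step="any" name="frequency" placeholder="Frequency" required>
    <input type="number" step="any" name="total_amount" placeholder="Total Amount" required>
    <input type="number" step="any" name="avg_order_value" placeholder="Average Order Value" required>
    <input type="number" name="recency" placeholder="Recency (days)" required>
    <input type="number" name="customer_age" placeholder="Customer Age" required>
    <input type="number" name="lifetime" placeholder="Lifetime (days)" required>
    <input type="number" step="any" name="purchase_frequency" placeholder="Purchase Frequency" required>
    <button type="submit">Predict CLTV</button>
</form>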
Your app.py interface should look like this once it is running:

Now the last step is to push these changes to your GitHub repository and deploy the model online on a cloud server. In this project we will deploy app.py to a free Render server; you can do the same.
Go to Render.com and connect your project's GitHub repository to Render.
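When configuring the web service on Render, you will typically need a build command and a start command. The values below are common defaults for a Flask app (they assume gunicorn is listed in requirements.txt), not settings taken from this project:
# Build command
pip install -r requirements.txt
# Start command
gunicorn app:app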
That's all. You have successfully created your first MLOps project. I hope you enjoyed it!
Conclusion
MLOps has become a key practice for managing the complexity of machine learning workflows, from data ingestion to model deployment. Using ZenML, an open-source MLOps framework, we streamlined the process of building, training, and deploying a production-grade ML model for predicting Customer Lifetime Value (CLTV). Through modular code, robust pipelines, and seamless integration, we've shown how to create a successful end-to-end project. As businesses increasingly rely on AI-driven solutions, frameworks like ZenML empower teams to maintain agility, reproducibility, and efficiency with minimal manual intervention.
Key Takeaways
- MLOps simplifies the ML lifecycle, reduces errors and increases efficiency with automated pipelines.
- ZenML provides modular, reusable code structures for managing machine learning workflows.
- Building an end-to-end pipeline involves defining clear steps, from data ingestion to model deployment.
- Deployment pipelines and Flask applications ensure that ML models are production-ready and accessible.
- Tools like ZenML and MLflow enable seamless tracking, monitoring, and optimization of ML projects.
Frequently Asked Questions
Q1. What is MLOps?
A. MLOps (Machine Learning Operations) streamlines the ML lifecycle by automating processes such as data ingestion, model training, deployment, and monitoring, ensuring efficiency and scalability.
Q2. What is ZenML?
A. ZenML is an open-source MLOps framework that simplifies the development, deployment, and management of machine learning workflows with modular and reusable code.
Q3. Does ZenML work on Windows?
A. ZenML is not directly supported on Windows but can be used with WSL (Windows Subsystem for Linux).
Q4. What are pipelines in ZenML?
A. Pipelines in ZenML define a sequence of steps, ensuring a streamlined workflow and reusability for machine learning projects.
Q5. What is the role of the Flask application in this project?
A. The Flask application acts as a user interface, allowing end users to enter data and obtain predictions from the deployed ML model.
The media shown in this article does not belong to Analytics Vidhya and is used at the discretion of the Author.