Democratizing Marketing Mix Models (MMM) with open source and Gen AI

Marketing Mix Models have been in the industry for several years and have recently seen a revival. As digital tracking signals are withdrawn under increasing data privacy restrictions, marketers are turning to MMMs as a strategic, reliable, privacy-safe measurement and attribution framework.

Unlike user-level tracking tools, MMM uses aggregated time series and disaggregated data to measure how well marketing channels are driving business KPIs. Advances in Bayesian modeling with improved computing power have brought MMM back to the center of marketing analytics.

For years marketers and media agencies have used and relied on Bayesian MMM to understand marketing channel contributions and marketing budget allocations.

The role of GenAI in modern MMM

An increasing number of companies are now using GenAI features as an enhancement to MMM in several ways.

1. Data processing and feature engineering
2. Pipeline automation: generating MMM pipeline code
3. Insight translation: turning model details into simple business language
4. Scenario planning and budget optimization

Although these capabilities are powerful, most of them sit on top of proprietary MMM engines.

The purpose of this article is not to show how Bayesian MMM works, but to demonstrate an open, free system design that advertisers can explore without signing up for the proprietary black-box MMM stacks offered by industry vendors.

The method includes:

1. Google Meridian as an open Bayesian MMM engine
2. An open source Large Language Model (LLM), Mistral 7B, as a cognitive and interactive layer on top of Meridian's Bayesian inference.

Here is an architectural diagram representing the design of the proposed open source system for advertisers.

The architecture diagram was created with GenAI-assisted design tools for rapid prototyping.

This open source workflow has several advantages:

  1. Democratization of Bayesian MMM: eliminating the black box problem of proprietary MMM tools.
  2. Cost Effectiveness: reduces the financial barrier for small/medium enterprises to achieve advanced analytics.
  3. Separation of concerns: preserves the computational robustness required of MMM engines while making them more accessible.
  4. With the GenAI insights layer, audiences do not need to understand the Bayesian math; they can simply query the insights layer to learn about channel contributions, ROI, and potential budget allocation strategies.
  5. Adaptability to new open source tools: the GenAI layer can be replaced by new LLMs as and when they are openly available for improved insights.

An example of using the Google Meridian MMM model with the LLM layer

For the purpose of this demonstration, I used the open source model Mistral 7B, downloaded from the Hugging Face platform and run locally with the llama.cpp engine.

This framework is domain-agnostic: other open source MMM variants such as Meta's Robyn or PyMC, and other LLMs such as GPT- or Llama-family models, can be substituted depending on the scale and level of detail desired.

Important note:

  1. A synthetic marketing dataset was created, with a KPI such as 'Conversions' and marketing channels such as TV, Search, Paid Social, Email, and OOH (Out of Home Media).
  2. Google Meridian produces rich outputs such as ROI, channel coefficients and contributions to the KPI, response curves, etc. Although this output is mathematically sound, it often requires a specialist to interpret. This is where an LLM becomes valuable: it can act as an interpretation layer, translating the math into plain business language.
  3. Google Meridian python code examples were used to apply the Meridian MMM model to the generated marketing data. For more information on how to use the Meridian code, please see this page.
  4. An open source LLM, Mistral 7B, was used because it fits within the free tier of Google Colab GPU resources and is adequate for generating language-based insights without requiring any API access.
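The diminishing returns mentioned above are typically captured with a saturating response curve. Below is a minimal sketch using a Hill-type saturation function; the function and parameter values are illustrative, not Meridian's internals:

```python
import numpy as np

def hill_saturation(spend, half_saturation, slope=1.0):
    """Hill-type saturation: response grows quickly at low spend,
    then flattens as spend exceeds the half-saturation point."""
    spend = np.asarray(spend, dtype=float)
    return spend**slope / (spend**slope + half_saturation**slope)

spend = np.array([0.0, 50.0, 100.0, 200.0, 400.0])
response = hill_saturation(spend, half_saturation=100.0)

# The marginal gain per extra dollar shrinks as spend grows,
# which is exactly the "diminishing returns" pattern.
marginal = np.diff(response) / np.diff(spend)
print(response)  # 0.0, 0.333..., 0.5, 0.667..., 0.8
print(marginal)
```

Channels whose observed response curves flatten like this are candidates for reallocating incremental budget elsewhere.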

For example: below is a snippet of Python code used on the Google Colab platform:

# Install meridian: from PyPI @ latest release 
!pip install --upgrade google-meridian[colab,and-cuda,schema] 

# Import dependencies 
import IPython
from meridian import constants
from meridian.analysis import analyzer
from meridian.analysis import optimizer
from meridian.analysis import summarizer
from meridian.analysis import visualizer
from meridian.analysis.review import reviewer
from meridian.data import data_frame_input_data_builder
from meridian.model import model
from meridian.model import prior_distribution 
from meridian.model import spec 
from schema.serde import meridian_serde 
import numpy as np 
import pandas as pd
import tensorflow_probability as tfp  # used below when defining priors

A synthetic marketing dataset (not shown in this code) was created, and as part of the Meridian workflow requirement, an example input dataset builder was created as shown below:
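For reference, a synthetic dataset shaped for that builder can be sketched as follows. The `geo` and `time` columns follow Meridian's DataFrameInputDataBuilder conventions; all values here are purely illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
channels = ["tv", "paid_search", "paid_social", "email", "ooh"]
geos, n_weeks = ["geo_1", "geo_2"], 104  # two geos, two years of weekly data

rows = []
for geo in geos:
    for week in pd.date_range("2022-01-03", periods=n_weeks, freq="W-MON"):
        row = {
            "geo": geo,
            "time": week.strftime("%Y-%m-%d"),
            "population": 1_000_000,
            "conversions": rng.poisson(500),
            "revenue_per_conversion": 40.0,
            "sentiment_score_control": rng.normal(0, 1),
            "competitor_sales_control": rng.normal(100, 10),
        }
        # One spend column and one impression column per channel.
        for ch in channels:
            row[f"{ch}_spend"] = rng.uniform(1_000, 10_000)
            row[f"{ch}_impression"] = row[f"{ch}_spend"] * rng.uniform(80, 120)
        rows.append(row)

df = pd.DataFrame(rows)
print(df.shape)  # (208, 17)
```

A real dataset would of course encode actual adstock and saturation effects between spend and conversions; this sketch only reproduces the schema the builder expects.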

builder = data_frame_input_data_builder.DataFrameInputDataBuilder( 
   kpi_type='non_revenue', 
   default_kpi_column='conversions', 
   default_revenue_per_kpi_column='revenue_per_conversion', 
   ) 

builder = ( 
   builder.with_kpi(df) 
  .with_revenue_per_kpi(df) 
  .with_population(df) 
  .with_controls( 
  df, control_cols=["sentiment_score_control", "competitor_sales_control"] ) 
  ) 

channels = ["tv","paid_search","paid_social","email","ooh"] 

builder = builder.with_media( 
  df, 
  media_cols=[f"{channel}_impression" for channel in channels], 
  media_spend_cols=[f"{channel}_spend" for channel in channels], 
  media_channels=channels, 
  ) 

data = builder.build() #Build the input data

Configure and run the Meridian MMM model:

# Initialize the Meridian class by passing the loaded data and a customized
# model specification. One advantage of Meridian MMM is the ability to set
# priors for each channel, letting modelers encode historical knowledge of
# media behavior into each channel's prior distribution.

roi_mu = 0.2  # Mu for ROI prior for each media channel.
roi_sigma = 0.9  # Sigma for ROI prior for each media channel.

prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(roi_mu, roi_sigma, name=constants.ROI_M)
)

model_spec = spec.ModelSpec(prior=prior, enable_aks=True)

mmm = model.Meridian(input_data=data, model_spec=model_spec)


mmm.sample_prior(500)
mmm.sample_posterior(
    n_chains=10, n_adapt=2000, n_burnin=500, n_keep=1000, seed=0
)
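As a sanity check on the prior above: a LogNormal with mu = 0.2 and sigma = 0.9 has median exp(0.2) ≈ 1.22, i.e. the prior is centered on a modestly positive return. A quick inspection (computed here with plain NumPy rather than TFP):

```python
import numpy as np

roi_mu, roi_sigma = 0.2, 0.9

# The median of LogNormal(mu, sigma) is exp(mu).
median_roi = np.exp(roi_mu)

# 5th/95th percentiles via the standard-normal quantiles of log(ROI).
z95 = 1.6448536269514722
low = np.exp(roi_mu - z95 * roi_sigma)
high = np.exp(roi_mu + z95 * roi_sigma)

print(f"median ROI: {median_roi:.2f}")  # ~1.22
print(f"90% prior interval: [{low:.2f}, {high:.2f}]")
```

The wide sigma keeps the prior weakly informative, so the posterior ROI for each channel is driven mostly by the data.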

This code snippet fits the Meridian model with the defined priors on the generated input dataset. The next step is to evaluate model performance. Model output metrics such as R-squared, MAPE, p-values, etc. can be examined; for the purpose of this article I include just an example of a visual check:

model_fit = visualizer.ModelFit(mmm)
model_fit.plot_model_fit()
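Beyond the visual check, metrics such as R-squared and MAPE mentioned above can be computed directly from actual vs. predicted KPI values. A minimal NumPy sketch on hypothetical placeholder arrays (not Meridian output):

```python
import numpy as np

# Hypothetical weekly conversions: actual vs. model-predicted.
actual = np.array([520.0, 480.0, 610.0, 550.0, 495.0])
predicted = np.array([500.0, 470.0, 590.0, 560.0, 510.0])

# R-squared: share of variance in the KPI explained by the model.
ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - np.mean(actual)) ** 2)
r_squared = 1 - ss_res / ss_tot

# MAPE: mean absolute percentage error, in percent.
mape = np.mean(np.abs((actual - predicted) / actual)) * 100

print(f"R-squared: {r_squared:.3f}, MAPE: {mape:.2f}%")
```

In practice these would be computed on Meridian's fitted values against the held-out or in-sample KPI series.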

Now that the Meridian MMM model has been fit, we have output parameters for each media channel, such as ROI, response curves, model coefficients, and spend levels. We can combine all of this information into a single JSON object that can be passed directly to the LLM to generate insights:

import json

# Combine everything into one dictionary. The roi, coeffs, priors, and
# response_curves DataFrames are extracted from Meridian's analyzer outputs
# (extraction code not shown).
genai_input = {
    "roi": roi.to_dict(orient='records'),
    "coefficients": coeffs.to_dict(orient='records'),
    "priors": priors.to_dict(orient='records'),
    "response_curves": response_curves.to_dict(orient='records')
}

# Convert to JSON string for the LLM
genai_input_json = json.dumps(genai_input, indent=2)

Next, download the Mistral 7B LLM from the Hugging Face platform and install the llama-cpp-python engine needed to run it locally:

# Download the Mistral 7B llm from Hugging Face
!wget -O /content/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf 


# Install Llama for python engine
!pip install llama-cpp-python --upgrade

Then run the Mistral LLM on the JSON input containing the Meridian MMM output, together with an appropriate instruction prompt:

from llama_cpp import Llama

# Initialize the model
llm = Llama(
    model_path="/content/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_ctx=2048,         # context window size in tokens
    n_gpu_layers=35,    # number of layers to offload to the GPU
    seed=42,
    verbose=False,
)

# Convert JSON to a prompt for recommendations
prompt = f"""
You are a marketing mix modeling expert.

1. Summarize the efficiency of each marketing channel based on the MMM output.
2. Explain the key drivers of ROI for each channel in simple business terms.
3. Identify channels with diminishing returns.
4. If we want to maximize total ROI with the current budget, how should we reallocate spend across channels?

JSON:
{genai_input_json}
"""

# Generate recommendations
response = llm(prompt, max_tokens=300, temperature=0.2)

# Print only the text output
print(response["choices"][0]["text"])

Example output:

1. Based on Marketing Mix Modeling (MMM) output, TV, OOH, and Search Ads have positive ROI (3.0, 2.5, and 1.5 respectively), while Social Media and Email have low ROI (0.9 and 0.7 respectively).

2. The key ROI drivers for each channel in simple business terms are:

– TV: Every dollar spent on TV ads generates $3 in sales.

– Social Media: Every dollar spent on Social Media ads generates $0.9 in sales.

– Search Ads: Every dollar spent on search ads generates $2.5 in sales.

– Email: Every dollar spent on email marketing generates $0.7 in sales.

– OOH: Every dollar spent on Out-of-Home (OOH) advertising generates $1.5 in sales.

3. Channels with diminishing returns are Social Media and Email as their response curves show that sales growth is not proportional to increasing marketing costs.

4. To maximize the overall ROI with the current budget, we should consider reallocating the spend from Social Media and Email to TV and OOH as they have higher ROIs.
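The reallocation logic in the example output can be sketched as a simple greedy rule: pull budget from channels returning less than $1 per $1 spent and push it toward above-breakeven channels. The ROIs below follow the example output's first point and the spend figures are made up; a real reallocation should use Meridian's response curves and optimizer, since ROI is not constant as spend changes:

```python
# Illustrative ROIs (from the example output) and hypothetical spends.
roi = {"tv": 3.0, "ooh": 2.5, "search": 1.5, "social": 0.9, "email": 0.7}
spend = {"tv": 100.0, "ooh": 80.0, "search": 120.0, "social": 60.0, "email": 40.0}

# Free up half the spend on channels with ROI below breakeven (1.0).
freed = sum(0.5 * s for ch, s in spend.items() if roi[ch] < 1.0)
new_spend = {ch: (0.5 * s if roi[ch] < 1.0 else s) for ch, s in spend.items()}

# Redistribute the freed budget to above-breakeven channels, weighted by ROI.
winners = {ch: r for ch, r in roi.items() if r >= 1.0}
for ch, r in winners.items():
    new_spend[ch] += freed * r / sum(winners.values())

# Total budget is conserved; only its allocation changes.
assert abs(sum(new_spend.values()) - sum(spend.values())) < 1e-9
print(new_spend)
```

This is a toy heuristic: once spend shifts, each channel moves along its response curve, which is exactly why Meridian's optimizer module is the right tool for a real budget plan.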

Useful Considerations

  • The quality of the model and its insights still depends on the quality of the input data.
  • Careful prompt design is essential to avoid misleading insights.
  • Automating input data processing and model output reporting and visualization will help this stack work at scale.

Final thoughts

This walkthrough shows how an open-source Bayesian MMM extended with GenAI workflows can translate complex Bayesian results into actionable insights for marketers and leaders.

This approach does not attempt to simplify the mathematics of Marketing Mix Models; rather, it preserves the math while making the results accessible to a wider audience with limited modeling expertise or limited budget for analytics tooling.

As privacy-protected marketing analytics become the norm, open source MMM systems with GenAI extensions offer a sustainable approach: transparent, flexible, and designed to evolve in both the business and the underlying technology.
