
From Transactions to Trends: Predict When a Customer is About to Stop Buying

Math can solve many problems in the real world. When I was in elementary school, though, I didn't see it that way. I never hated math, and I had no trouble learning most of the basic concepts.

Still, I admit that in many classes beyond basic arithmetic, I used to think, “I will never use that for anything in my life.”

But those were other times. There was no internet, no data science, and computers were not a thing. Time flies, life happens, and one day we find ourselves solving important business problems with good old math!

In this post, we will apply linear regression, a popular and well-known technique, to a different problem: predicting customer churn.

Decline vs Churn

Customer churn rarely happens overnight. In most cases, customers gradually reduce the frequency of their purchases before stopping completely. Some call this quiet churn [1].

Churn can be predicted with traditional churn models, which (1) require labeled churn data, (2) are sometimes hard to explain, and (3) detect churn only after it has happened.

On the other hand, this project takes a different approach, answering a simple question:

Is this customer slowing down their purchases?

This question is answered by the following logic.

We use monthly purchase trends and linear regression to measure customer momentum over time. If a customer keeps increasing their spending, the aggregated monthly amounts grow over time, producing an upward trend (or a positive slope in a linear regression, if you prefer). The opposite is also true: shrinking monthly amounts produce a downward trend (a negative slope).
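A quick sanity check of this logic uses two made-up 12-month spending series. These are illustrative numbers, not the article's dataset. `scipy.stats.linregress` returns a positive slope for the growing customer and a negative slope for the shrinking one.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical monthly spending for two customers (12 months, illustrative only)
months = np.arange(1, 13)
growing = np.array([100, 120, 110, 150, 160, 155, 180, 200, 190, 220, 240, 250])
shrinking = np.array([300, 280, 290, 250, 240, 200, 210, 150, 120, 100, 60, 0])

# Fit a straight line to each series; the sign of the slope gives the trend direction
growing_slope = stats.linregress(months, growing).slope
shrinking_slope = stats.linregress(months, shrinking).slope

print(f"growing slope:   {growing_slope:.1f} $/month")
print(f"shrinking slope: {shrinking_slope:.1f} $/month")
```

Here the slope is still in dollars per month. The steps below rescale both axes so that slopes can be compared across customers.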

Let's break the logic down into small steps, and understand what we're going to do with the data:

  1. Aggregate customer sales per month
  2. Create a continuous time index (e.g., 1, 2, 3…n)
  3. Fill the missing months with zero purchases
  4. Fit a linear regression line
  5. Use the slope (converted to degrees) to measure buying behavior
  6. Evaluate: a negative slope indicates decreasing engagement; a positive slope indicates increasing engagement.

Well, let's move on to the implementation.

The code

The first thing is to import some modules into the Python session.

# Imports
import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Then, we will generate some data that mimics customer transactions. You can check the complete code in this GitHub repository. The generated dataset has the columns customer_id, transaction_date, and total_amt, and it looks like the following image.
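The author's generator lives in the linked repository and is not reproduced here. If you want to follow along without it, you can use a hypothetical stand-in such as `make_transactions` below. Its name, parameters, and random distributions are all assumptions. It produces a DataFrame with the same three columns, restricted to a single calendar year so that the month-based grouping that follows works as written. Being random, it will not reproduce the specific customer patterns shown later.

```python
import numpy as np
import pandas as pd

def make_transactions(n_customers=20, n_tx=600, year=2023, seed=42):
    """Hypothetical stand-in generator: random transactions within one calendar year."""
    rng = np.random.default_rng(seed)
    customer_ids = [f"C_{i:03d}" for i in range(1, n_customers + 1)]

    # Random day offsets within the year (0..364), turned into timestamps
    start = pd.Timestamp(f"{year}-01-01")
    day_offsets = rng.integers(0, 365, size=n_tx)
    dates = start + pd.to_timedelta(day_offsets, unit="D")

    return pd.DataFrame({
        "customer_id": rng.choice(customer_ids, size=n_tx),
        "transaction_date": dates,
        "total_amt": rng.uniform(10, 500, size=n_tx).round(2),
    })

df = make_transactions()
print(df.head())
```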

Dataset created for this task. Image by the author.

Now we'll create a new column holding the month of each transaction date, which makes it easier to aggregate the data later.

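# Create new column month
# Assumes transaction_date can be parsed as a date and all data falls within one calendar year
df['transaction_date'] = pd.to_datetime(df['transaction_date'])
df['mth'] = df['transaction_date'].dt.month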

# Group customers by month
df_group = (
    df
    .groupby(['mth','customer_id'])
    ['total_amt']
    .sum()
    .reset_index()
)

Here is the result.

Aggregated data. Image by the author.
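There is one caveat about `.dt.month`: it only returns 1 to 12. If the data spanned more than one calendar year, January 2023 and January 2024 would be summed together. The sketch below builds a continuous month index instead, assuming a datetime `transaction_date` column. It counts months as `year * 12 + month` and offsets them from the earliest month. The small frame is made up for illustration. With this index, the zero-filling step would cover `1..n` months instead of a fixed 12.

```python
import pandas as pd

# Toy transactions that cross a year boundary (illustrative data)
tx = pd.DataFrame({
    "customer_id": ["C_001", "C_001", "C_001", "C_002"],
    "transaction_date": pd.to_datetime(
        ["2023-11-05", "2023-12-20", "2024-01-10", "2024-01-15"]),
    "total_amt": [100.0, 80.0, 60.0, 50.0],
})

# .dt.month alone would map Jan 2023 and Jan 2024 to the same value (1).
# Counting months as year * 12 + month keeps them apart and stays consecutive.
dates = tx["transaction_date"]
month_number = dates.dt.year * 12 + dates.dt.month
tx["period_idx"] = month_number - month_number.min() + 1  # 1, 2, 3, ... from the first month

# Same aggregation as df_group, but on the continuous index
monthly = (
    tx.groupby(["period_idx", "customer_id"])["total_amt"]
    .sum()
    .reset_index()
)
print(monthly)
```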

If we take a quick look at whether any customers skipped a purchase in some month, we'll find a few cases.

That leads us to the next point. If a customer has no purchases in a given month, we have to add that month with a $0 total.
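The article fills these gaps one customer at a time inside a function. As an alternative sketch, not the author's code, you can run the check and the fill for all customers at once. Use `groupby().nunique()` for the check, then build a pivot table and reindex it to months 1 to 12 with `fill_value=0`. The small `df_group` below is made-up data in the same long format as the aggregated table.

```python
import pandas as pd

# Toy aggregated data in the same long format as df_group (illustrative values)
df_group = pd.DataFrame({
    "mth": [1, 2, 4, 1, 3],
    "customer_id": ["C_001", "C_001", "C_001", "C_002", "C_002"],
    "total_amt": [120.0, 90.0, 60.0, 40.0, 55.0],
})

all_months = range(1, 13)  # assumes a single calendar year, as in the article

# 1) Which customers skipped at least one month?
months_with_purchases = df_group.groupby("customer_id")["mth"].nunique()
customers_with_gaps = months_with_purchases[months_with_purchases < len(all_months)]
print(customers_with_gaps)

# 2) Wide table: one row per customer, one column per month; absent months become $0
wide = (
    df_group
    .pivot_table(index="customer_id", columns="mth", values="total_amt",
                 aggfunc="sum", fill_value=0)
    .reindex(columns=all_months, fill_value=0)
)
print(wide)
```

The wide layout keeps every customer on the same 12-month axis. That suits a monthly batch run better than merging per customer.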

Let's create a function that can do that and calculate the slope of a customer's buying curve.

This task looks big, but we will tackle it in small parts. Let's do this.

  1. Filter the data for a specific customer using the Pandas query() method.
  2. Group by month and check whether the customer has at least one purchase every month.
  3. If not, add the missing months with a $0 total. I implemented this by merging an interim data frame, containing all 12 months at $0, with the actual data. After the merge, the missing months show up as NaN in the original data column, and we fill them with $0.
  4. After that, we rescale the axes. Remember that the X-axis is an index from 1 to 12, while the Y-axis is the spending amount, in thousands of dollars. Therefore, to avoid distorting the slope, we put both on the same scale, between 0 and 1. For that, we use a custom function, min_max_standardize.
  5. Next, we can plot the regression using another custom function, plot_regression.
  6. Then we calculate the slope, which is the first value returned by scipy.stats.linregress().
  7. Finally, to express the slope as an angle in degrees, we turn to pure mathematics, using the arctangent to calculate the angle between the X-axis and the linear regression line. In Python, just use the functions np.arctan() and np.degrees() from NumPy.
Arctan concept. Image by the author.
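Step 4 matters because the raw regression slope is expressed in units of spending per month. The same customer would get a slope 1000 times larger in dollars than in thousands of dollars, and the angle would change with it. The sketch below uses made-up spending values and a copy of the article's min-max formula. It shows that once both axes are scaled to [0, 1], the slope is the same for either unit.

```python
import numpy as np
import scipy.stats as stats

def min_max_standardize(vals):
    # Same formula as the article's helper: maps values to the [0, 1] range
    return (vals - np.min(vals)) / (np.max(vals) - np.min(vals))

months = np.arange(1, 13)
# Hypothetical spending for one customer, in dollars
spend_dollars = np.array([900, 1100, 1000, 1300, 1250, 1400,
                          1500, 1450, 1600, 1700, 1650, 1800], dtype=float)
spend_thousands = spend_dollars / 1000  # same data, different unit

# Raw slopes depend on the unit of the y-axis...
raw_dollars = stats.linregress(months, spend_dollars).slope
raw_thousands = stats.linregress(months, spend_thousands).slope

# ...but after min-max scaling both axes, the slope no longer depends on the unit
scaled_dollars = stats.linregress(min_max_standardize(months),
                                  min_max_standardize(spend_dollars)).slope
scaled_thousands = stats.linregress(min_max_standardize(months),
                                    min_max_standardize(spend_thousands)).slope

print(raw_dollars, raw_thousands)        # differ by a factor of 1000
print(scaled_dollars, scaled_thousands)  # equal up to floating-point rounding
```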
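# Min-max scaling: map the values to the [0, 1] range
def min_max_standardize(vals):
    value_range = np.max(vals) - np.min(vals)
    if value_range == 0:
        # Constant series (e.g., the same amount every month): avoid dividing by zero
        return vals * 0.0
    return (vals - np.min(vals)) / value_range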

#------------

# Quick function to plot the regression
def plot_regression(x, y, cust):
  # Fit the regression once and reuse the result
  fit = stats.linregress(x, y)
  angle = np.degrees(np.arctan(fit.slope))
  plt.scatter(x, y, color='gray')
  plt.plot(x,
           fit.slope * np.array(x) + fit.intercept,
           color='red',
           linestyle='--')
  plt.suptitle("Slope of the Linear Regression [Expenses x Time]")
  plt.title(f"Customer {cust} | Slope: {angle:.0f} degrees. Positive = Buying more | Negative = Buying less", size=9, color='gray')
  plt.show()

#-----

def get_trend_degrees(customer, plot=False):

  # Filter the data
  one_customer = df.query('customer_id == @customer')
  one_customer = one_customer.groupby('mth').total_amt.sum().reset_index().rename(columns={'mth':'period_idx'})

  # Count the distinct months in which the customer made a purchase
  cnt = one_customer['period_idx'].nunique()

  # If not, add 0 to the months without transactions
  if cnt < 12:
      # Create a DataFrame with all 12 months
      all_months = pd.DataFrame({'period_idx': range(1, 13), 'total_amt': 0})

      # Merge with the existing one_customer data.
      # A 'left' merge keeps all 12 months from 'all_months'; months with no purchases get NaN in total_amt.
      one_customer = pd.merge(all_months, one_customer, on='period_idx', how='left', suffixes=('_all', ''))

      # Combine the total_amt columns, preferring the actual data over the 0 from all_months
      one_customer['total_amt'] = one_customer['total_amt'].fillna(one_customer['total_amt_all'])

      # Drop the temporary total_amt_all column
      one_customer = one_customer.drop(columns=['total_amt_all'])

      # Sort by period_idx to ensure correct order
      one_customer = one_customer.sort_values(by='period_idx').reset_index(drop=True)

  # Min Max Standardization
  X = min_max_standardize(one_customer['period_idx'])
  y = min_max_standardize(one_customer['total_amt'])

  # Plot
  if plot:
    plot_regression(X,y, customer)

  # Calculate slope (the first value returned by linregress)
  slope = stats.linregress(X, y).slope

  # Calculate angle degrees
  angle = np.arctan(slope)
  angle = np.degrees(angle)

  return angle
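The last step of `get_trend_degrees` is a plain unit conversion. A small helper makes the mapping easy to check. `slope_to_degrees` is a hypothetical name and not part of the article's code. A slope of 1 gives 45 degrees, 0 gives 0, and -1 gives -45. Because the arctangent is bounded, every angle falls strictly between -90 and 90 degrees. On the min-max scaled axes, a slope of 1 means the fitted line climbs the full spending range over the full time range.

```python
import numpy as np

def slope_to_degrees(slope):
    """Angle between the x-axis and a line with the given slope, in degrees."""
    return np.degrees(np.arctan(slope))

# Works on scalars and NumPy arrays alike
slopes = np.array([1.0, 0.0, -1.0, 0.5])
angles = slope_to_degrees(slopes)
print(np.round(angles, 1))
```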

Good. It's time to test this function. Let's look at two customers:

  • C_014: a strong customer who buys more over time.
# Example of strong customer
get_trend_degrees('C_014', plot=True)

The plot shows a clear upward trend. Although there are weak months in between, overall, the monthly amounts tend to increase as time goes on.

A customer trending up. Image by the author.

The angle is 32 degrees, pointing clearly upward, which indicates a strong relationship with this customer.

  • C_003: a customer who buys less over time.
# Example of customer stop buying
get_trend_degrees('C_003', plot=True)
A customer trending down. Image by the author.

Here, spending clearly decreases across the months, so the regression line slopes downward. The angle is -29 degrees, indicating that this customer is moving away from the product and may need some motivation to come back.

Before You Go

Well, that's a wrap. This project shows a simple, intuitive method, based on linear regression, to detect customers whose buying behavior is trending downward.

Instead of relying on complex churn models, we analyze purchasing trends over time to identify when customers are slowly withdrawing.

This simple model can give us a good idea of where the customer is heading: toward a stronger relationship with the brand or away from it.

Indeed, with other data from the business, it is possible to improve this concept: tune a threshold on previous data and use it to quickly identify potential churners every month.
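A monthly screening run could look like the minimal sketch below. It assumes a wide table with one row per customer, one column per month, and missing months already filled with $0. The helper `trend_degrees` and the cut-off `CHURN_THRESHOLD_DEG = -10` are hypothetical. The threshold in particular is a placeholder to be tuned on your own historical data. Because of the min-max scaling, the angle reflects the direction of the trend rather than its size in dollars.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

def trend_degrees(monthly_amounts):
    """Hypothetical helper: regression angle (degrees) on min-max scaled axes."""
    y = np.asarray(monthly_amounts, dtype=float)
    x = np.arange(1, len(y) + 1, dtype=float)
    x = (x - x.min()) / (x.max() - x.min())
    y_range = y.max() - y.min()
    if y_range == 0:  # flat spending: no trend
        return 0.0
    y = (y - y.min()) / y_range
    return float(np.degrees(np.arctan(stats.linregress(x, y).slope)))

# Hypothetical wide table: rows = customers, columns = months 1..6 (gaps already filled)
wide = pd.DataFrame({
    "C_001": [200, 210, 190, 220, 230, 240],
    "C_002": [300, 250, 260, 180, 120, 60],
    "C_003": [100, 100, 100, 100, 100, 100],
}).T
wide.columns = range(1, 7)

CHURN_THRESHOLD_DEG = -10  # hypothetical cut-off; tune it on historical churn outcomes

# One angle per customer, then flag those trending down faster than the threshold
angles = wide.apply(trend_degrees, axis=1)
at_risk = angles[angles < CHURN_THRESHOLD_DEG].index.tolist()
print(angles.round(1))
print("At risk:", at_risk)
```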

Before closing, I would like to give due credit to the original post that inspired me to learn more about this topic. It was written by Matheus da Rocha, and you can find it at this link.

Finally, find out more about me on my website.

GitHub Repository

Here you can find the full code and documentation.

References

[1] Forbes

[2] NumPy arctan

[3] Arctan explanation

[4] NumPy degrees

[5] SciPy linregress
