Machine Learning

An Effective Time Series Toolkit for Anomaly Detection, Using Python

What makes time series interesting is the internal complexity of a seemingly simple data type.

At the end of the day, in a time series, you have an x axis that usually represents time.

However, the evolution of the variable of interest (y axis) over time (x axis) is where the complexity hides. Does this variable show a trend? Does it have data points that clearly deviate from the expected signal? Is it stable, or unexpectedly volatile? Is its average value bigger than we would expect? All of these behaviors can, in some way, be described as anomalies.

This article is a compilation of several techniques for finding anomalies. The goal is that, given a large time series dataset, we can identify which time series are anomalous, and why.

These are the four types of time series anomalies we'll cover:

  1. We will find any trend in our time series (trend anomaly).
  2. We will test how variable the time series is (volatility anomaly).
  3. We will spot anomalous points in the time series (single-point anomaly).
  4. We will look for anomalies within our bank of signals, to understand which signal behaves differently from the rest of the set (dataset-level anomaly).
Image by the author

We will clearly explain each anomaly detection method in this collection, and we will show a Python implementation. All the code I used for this blog post is included in the PieroPaialungaAI/timeseriesanomaly GitHub repository.

0. The dataset

In order to build an anomaly detector, we need a dataset where we know exactly what anomalies we are looking for, so we can tell whether our anomaly detector is working or not. To do that, I created a data.py script. The script contains a DataGenerator object that:

  1. Reads our dataset configuration from the config.json* file.
  2. Generates the anomaly dataset.
  3. Gives you the ability to easily store the data and plot it.

Here is the code snippet:

Image by the author
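Since the snippet in the original post is shown as an image, here is a minimal, self-contained sketch of what a DataGenerator along these lines might look like (the method names, parameters, and injected-anomaly details below are assumptions, not the repository's exact code):

```python
# Hypothetical sketch of a DataGenerator-style object. Method names and
# anomaly parameters are assumptions for illustration only.
import numpy as np

class DataGenerator:
    def __init__(self, num_series=50, num_points=100, seed=0):
        self.t = np.linspace(0, 100, num_points)  # shared time axis, 0 to 100
        self.rng = np.random.default_rng(seed)
        self.num_series = num_series

    def make_dataset(self):
        """Generate noisy series, each with one injected anomaly type."""
        data = []
        for i in range(self.num_series):
            y = self.rng.normal(0, 1, self.t.size)       # baseline noise
            kind = ("trend", "volatility", "level_shift", "point")[i % 4]
            if kind == "trend":
                y += 0.05 * self.t                        # linear drift
            elif kind == "volatility":
                y[40:60] *= 4                             # noisier stretch
            elif kind == "level_shift":
                y[50:] += 5                               # higher mean
            else:
                y[70] += 10                               # single spike
            data.append(y)
        return np.array(data)

dataset = DataGenerator().make_dataset()
print(dataset.shape)  # (50, 100)
```

Each generated series is plain Gaussian noise plus exactly one injected anomaly, which makes it easy to check later whether the detectors recover what we planted.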

So we can see that we have:

  1. A shared time axis, from 0 to 100.
  2. A collection of many time series, which form the dataset.
  3. Each time series presents one or more anomalies.

The anomalies are, as expected:

  1. Trend, where the time series has linear or polynomial behavior.
  2. Volatility, where the time series is more volatile and changeable than usual.
  3. Level shift, where the time series has a higher average than normal.
  4. Point anomaly, where the time series has one surprising point.

Now our goal will be to build a toolbox which can identify each of these anomalies across the entire dataset.

*The config.json file allows you to change all the parameters of our dataset, such as the number of time series, the time axis, and the type of anomaly. It looks like this:
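The exact fields are not reproduced here, so the following is a hypothetical example of what such a config.json could contain (all keys and values are illustrative, not the repository's actual schema):

```json
{
  "num_series": 100,
  "time_axis": {"start": 0, "stop": 100, "num_points": 200},
  "anomaly_types": ["trend", "volatility", "level_shift", "point"],
  "random_seed": 42
}
```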

1. Trend Anomaly Identification

1.1 Theory

When we say “trend anomaly”, we are looking at a structural behavior: the series goes up or down over time, or bends in a constant way. This is important for real data because drift often means sensor degradation, changing user behavior, model/data pipeline problems, or some other underlying phenomenon that should be investigated in your dataset.

We look for two types of trends:

  • Linear regression: we fit the time series with a linear trend.
  • Polynomial regression: we fit the time series with a low-order polynomial.

In practice, we first measure the error of a linear regression model. If it is very large, we fit a polynomial regression. We consider a trend “significant” when the p-value is lower than a set threshold (usually p < 0.05).
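As a sketch of the idea (this is not the repository's code: the function name, the polynomial fallback criterion, and the thresholds are assumptions), the trend test could look like this:

```python
# Sketch of the trend test: fit a line, check the p-value, and fall back
# to a low-order polynomial if the linear fit is not significant.
import numpy as np
from scipy.stats import linregress

def has_trend_anomaly(t, y, p_threshold=0.05, poly_degree=2):
    fit = linregress(t, y)
    if fit.pvalue < p_threshold:
        return True, "linear"
    # Linear fit not significant: try a low-order polynomial and check
    # whether it explains most of the variance (criterion is illustrative).
    coeffs = np.polyfit(t, y, poly_degree)
    resid = y - np.polyval(coeffs, t)
    if resid.var() < 0.5 * y.var():
        return True, "polynomial"
    return False, None

rng = np.random.default_rng(0)
t = np.linspace(0, 100, 200)
flat = rng.normal(0, 1, t.size)       # pure noise
drift = flat + 0.05 * t               # noise plus a linear trend
print(has_trend_anomaly(t, flat))
print(has_trend_anomaly(t, drift))    # (True, 'linear')
```

The drift of 0.05 per time unit spans 5 units over the axis against noise with standard deviation 1, so the linear fit is comfortably significant.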

1.2 The Code

The AnomalyDetector object in anomaly_detector.py will execute the code described above using the following functions:

  • __init__, which loads the data we generated with the DataGenerator.
  • find_trend_anomaly and find_all_trend_anomalies, which find the trend of a single time series and of the entire dataset, respectively.
  • get_series_with_trend_anomalies, which returns the indices of the series with a significant trend.

We can use plot_trend_anomalies to display the time series and see how the detector performs:

Image by the author

Good! So we are able to retrieve the trend-anomalous time series from our dataset without any errors. Let's continue!

2. Volatility Anomaly Identification

2.1 Theory

Now that we have dealt with the global trend, we can focus on volatility. What I mean by volatility, in plain English, is: how “all over the place” is our time series? In more precise terms, how does the variance of a time series compare to the average variance of our dataset?

Here's how we'll explore this:

  1. We will remove the trend from each time series in the dataset.
  2. We will compute a variance statistic.
  3. We will flag the outliers of these statistics.

It's pretty simple, isn't it? Let's dive into the code!
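The three steps above can be sketched as follows (the function name, the linear detrending, and the threshold are assumptions, not the repository's exact code):

```python
# Sketch of the volatility check: detrend, compute a variance per series,
# and flag variances that are Z-score outliers across the dataset.
import numpy as np

def volatility_anomalies(dataset, t, z_threshold=3.0):
    variances = []
    for y in dataset:
        # 1. Detrend: subtract a linear fit.
        slope, intercept = np.polyfit(t, y, 1)
        resid = y - (slope * t + intercept)
        # 2. Variance statistic of the detrended series.
        variances.append(resid.var())
    variances = np.array(variances)
    # 3. Flag outliers of the variance distribution.
    z = (variances - variances.mean()) / variances.std()
    return np.where(z > z_threshold)[0]

rng = np.random.default_rng(1)
t = np.linspace(0, 100, 200)
data = rng.normal(0, 1, (40, t.size))
data[7] *= 6                           # one series is much more volatile
print(volatility_anomalies(data, t))   # [7]
```

Note that the comparison is one-sided here: we only flag series that are *more* volatile than the rest, which matches the anomaly we injected.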

2.2 The Code

Similar to what we did for trends, we have:

  • find_volatility_anomaly, which tests whether a given time series is anomalously volatile or not.
  • find_all_volatility_anomalies and get_series_with_volatility_anomalies, which examine all the series of the dataset and return the anomalous indices, respectively.

This is how we show the results:

Image by the author

3. Single-Point Anomaly Identification

3.1 Theory

Okay, now let's ignore the rest of the dataset and focus on one time series at a time. For our time series of interest, we want to see whether it has one apparently surprising point. There are many ways to do that: we could use Transformers, 1D CNNs, LSTMs, encoder-decoders, etc. For simplicity, let's use a very simple algorithm:

  1. We will adopt a rolling window approach, where a window of fixed size moves from left to right.
  2. For each point, we compute the mean and standard deviation of the surrounding window (excluding the point itself).
  3. We compute how many standard deviations the point is away from its local mean, using the Z-score.

We define the point as anomalous if it exceeds a fixed Z-score threshold. We will use Z-score = 3, which means 3 standard deviations away from the local mean.
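A minimal sketch of this rolling-window Z-score, assuming a window of 10 points on each side (the article does not specify the exact window size, so this is illustrative):

```python
# Sketch of the rolling-window Z-score: for each point, compare it against
# the mean/std of its neighborhood, excluding the point itself.
import numpy as np

def point_anomalies(y, window=10, z_threshold=3.0):
    flagged = []
    for i in range(len(y)):
        lo, hi = max(0, i - window), min(len(y), i + window + 1)
        # Surrounding window, excluding the point itself.
        neighbors = np.concatenate([y[lo:i], y[i + 1:hi]])
        mu, sigma = neighbors.mean(), neighbors.std()
        if sigma > 0 and abs(y[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged

rng = np.random.default_rng(2)
y = rng.normal(0, 1, 100)
y[42] += 10                      # inject a single surprising point
print(point_anomalies(y))        # index 42 is flagged
```

Because the local standard deviation is estimated from a small window, occasional false positives on pure noise are possible; raising the threshold or widening the window trades sensitivity for robustness.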

3.2 The Code

Similar to what we did for trends and volatility, we have:

  • find_point_anomaly, which tests whether a given time series has a single-point anomaly using the rolling-window Z-score method.
  • find_all_point_anomalies and get_series_with_point_anomalies, which examine all the series of the dataset for anomalies and return the indices of series containing at least one anomalous point, respectively.

And this is how it works:

Image by the author

4. Dataset-Level Anomaly Identification

4.1 Theory

This part is deliberately simple. Here we are not looking for odd points in time; we are looking for odd signals in the bank. What we want to answer is:

Is there a time series whose overall magnitude is larger (or smaller) than we expect given the entire dataset?

To do that, we compress each time series into a single “baseline” number (the median), and then compare those baselines across the bank. The comparison is made in terms of the median and the Z-score.
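A minimal sketch of this dataset-level check, using the median as the baseline (the exact statistic and threshold used in the repository may differ):

```python
# Sketch of the dataset-level check: compress each series to its median
# baseline, then flag series whose baseline is a Z-score outlier.
import numpy as np

def dataset_level_anomalies(dataset, z_threshold=3.0):
    baselines = np.median(dataset, axis=1)   # one number per series
    z = (baselines - baselines.mean()) / baselines.std()
    return np.where(np.abs(z) > z_threshold)[0]

rng = np.random.default_rng(3)
data = rng.normal(0, 1, (40, 100))
data[12] += 8                     # one series lives on a different level
print(dataset_level_anomalies(data))  # [12]
```

Using the median (rather than the mean) as the baseline makes the summary robust to the single-point spikes we already handle separately in section 3.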

4.2 The Code

This is how we perform dataset-level anomaly detection:

  1. detect_dataset_level_anomalies(), which detects dataset-level anomalies across the dataset.
  2. get_dataset_level_anomalies(), which finds the indices of the series that present dataset-level anomalies.
  3. plot_dataset_level_anomalies(), which shows a sample of time series that present anomalies.

Here is the code to do this:

5. All together!

Okay, time to wrap it all up. We will use detector.detect_all_anomalies(), which evaluates the entire dataset for trend, volatility, single-point, and dataset-level anomalies. The script to do this is very simple:
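To illustrate the kind of summary table such a pipeline can return, here is a self-contained sketch with trivial stand-in checks (everything below is illustrative; the repository's real detector functions are more complete):

```python
# Sketch of assembling per-series anomaly flags into one summary DataFrame.
# The check functions here are simple stand-ins, not the repository's code.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
t = np.linspace(0, 100, 200)
data = rng.normal(0, 1, (6, t.size))
data[0] += 0.05 * t               # inject a trend in series 0
data[3, 100] += 10                # inject a point anomaly in series 3

def trend_flag(y):
    # Stand-in: strong correlation with time suggests a trend.
    return bool(abs(np.corrcoef(t, y)[0, 1]) > 0.5)

def point_flag(y):
    # Stand-in: any sample very far from the series mean.
    z = np.abs(y - y.mean()) / y.std()
    return bool((z > 5).any())

df = pd.DataFrame({
    "trend_anomaly": [trend_flag(y) for y in data],
    "point_anomaly": [point_flag(y) for y in data],
})
print(df)  # rows 0 and 3 carry a True flag each
```

The real pipeline adds volatility and dataset-level columns the same way: one boolean column per anomaly type, one row per series.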

df will give you the anomalies for each time series. It looks like this:

If we use the following function we can see that it works:

Image by the author

Pretty impressive, right? We did it. 🙂

6. Conclusions

Thank you for spending time with us, it means a lot. ❤️ Here's what we did together:

  • Built a small anomaly detection toolkit for a bank of time series.
  • Found trend anomalies using linear regression, and polynomial regression when a linear fit is insufficient.
  • Found volatility anomalies by first detrending and then comparing the variances across the dataset.
  • Found single-point anomalies with a rolling-window Z-score (easy, fast, and surprisingly effective).
  • Found dataset-level anomalies by compressing each series into a baseline (median) and flagging signals that live on a different magnitude scale.
  • Combined everything into one pipeline that returns a clean summary table we can inspect or filter.

For most real projects, a toolbox like the one we've built here gets you very far, because:

  • It gives you interpretable signals (trend, volatility, level shift, local outliers).
  • It gives you a strong baseline before you move on to heavier models.
  • It scales well when you have many signals, which is where anomaly detection often gets painful.

Remember that the toolkit is intentionally simple and uses very basic math. However, the modular code design allows you to easily add complexity by simply adding functions to anomaly_detector_utils.py and anomaly_detector.py.

7. Before you go!

Thanks again for your time. It means a lot ❤️

My name is Piero Paialunga, and I'm this guy here:

Image by the author

I am from Italy, I have a Ph.D. from the University of Cincinnati, and I work as a Data Scientist at The Trade Desk in New York City. I write about AI, Machine Learning, and the evolving role of data scientists both here on TDS and on LinkedIn. If you liked this article and want to know more about machine learning and follow my tutorials, you can:

A. Follow me on LinkedIn, where I publish all my stories
B. Follow me on GitHub, where you can see all my code
C. For questions, you can email me at piero.paialunga@hotmail
