Teaching Nonlinear Data: A Guide to Scikit-Learn's SplineTransformer

Linear models can be… well, rigid. Have you ever looked at a scatter plot and realized that a straight line won't cut it? We've all been there.
Real-world data is always a challenge. Most of the time, it feels like the exception is the rule. The data you get at work looks nothing like the neat, straight-line datasets we used during our years of training in school.
For example, say you're looking at something like “Power Demand vs. Temperature.” It's not a line; it's a curve. Usually, our first instinct is to reach for polynomial regression. But that is a trap!
If you've ever seen a model's curve go wild near the edges of your graph, you've experienced Runge's phenomenon. High-degree polynomials are like a toddler with a crayon: very flexible and undisciplined.
That's why I'm going to show you an alternative called splines. They are a neat solution: more flexible than a straight line, but more disciplined than a polynomial.
Splines are mathematical functions defined piecewise by polynomials, and they are used to draw a smooth curve through data.
Instead of trying to fit one complex equation across your whole dataset, you divide the data into segments at points called knots. Each segment gets its own simple polynomial, and they all join so smoothly that you can't even see the seams.
The problem with Polynomials
Assume that we have a nonlinear trend, and we fit a polynomial in x² or x³ to it. It looks okay at first glance, but then we look at the edges of the data, and the curve goes wild. This is Runge's phenomenon [2]: high-degree polynomials tend to oscillate near the boundaries. On top of that, a polynomial is a single global equation, so one odd data point can pull the entire curve out of whack.
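To see the edge problem in isolation, here is a minimal sketch. It uses the classic Runge function 1/(1 + 25x²), which is not the dataset used later in this article, and the helper name max_interp_error is made up for illustration. It interpolates the function at evenly spaced points and reports the worst error as the degree grows.

```python
import numpy as np
from numpy.polynomial import Polynomial


def runge(x):
    """Classic test function for Runge's phenomenon."""
    return 1.0 / (1.0 + 25.0 * x**2)


def max_interp_error(degree):
    """Interpolate runge() at degree+1 evenly spaced points on [-1, 1].

    Returns (largest absolute error, x position of that error).
    """
    x_nodes = np.linspace(-1.0, 1.0, degree + 1)
    poly = Polynomial.fit(x_nodes, runge(x_nodes), deg=degree)
    x_dense = np.linspace(-1.0, 1.0, 2001)
    err = np.abs(poly(x_dense) - runge(x_dense))
    worst = np.argmax(err)
    return err[worst], x_dense[worst]


errors = {d: max_interp_error(d) for d in (4, 8, 12)}
for d, (err, where) in errors.items():
    print(f"degree {d:2d}: max error {err:.2f} at x = {where:+.2f}")
```

If you run it, the worst error should grow with the degree and sit close to the ends of the interval, which is the “wild edges” behavior described above.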
Why Splines are the “right” choice
Splines don't try to fit one giant equation to everything. Instead, they divide your data into segments using points called knots. Working with knots has some advantages.
- Local control: What happens in one segment stays in that segment. Because the pieces are local, an odd data point at one end of your graph won't spoil the fit at the other end.
- Smoothness: They are built from B-splines (basis splines), which ensure that the curve stays smooth where the segments meet.
- Stability: Unlike high-degree polynomials, they do not go wild at the boundaries.
All right. Enough talk; now let's implement this solution.
Using it with Scikit-Learn
Scikit-Learn's SplineTransformer is the go-to option in this case. It converts a single numeric feature into multiple basis features, so that a simple linear model can learn complex, nonlinear relationships.
Let's import some modules.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
Next, we create some curved data.
# 1. Create some 'wiggly' synthetic data (e.g., seasonal sales)
rng = np.random.RandomState(42)
X = np.sort(rng.rand(100, 1) * 10, axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, X.shape[0])
# Plot the data
plt.figure(figsize=(12, 5))
plt.scatter(X, y, color='gray', alpha=0.5, label='Data')
plt.legend()
plt.title("Data")
plt.show()

All right. Now we will create a pipeline using SplineTransformer with its default settings, followed by Ridge regression.
# 2. Build a pipeline: Splines + Linear Model
# n_knots=5 (default) creates 4 segments; degree=3 makes it a cubic spline
model = make_pipeline(
    SplineTransformer(n_knots=5, degree=3),
    Ridge(alpha=0.1)
)
Next, we will tune the number of knots in our model. We use GridSearchCV to run multiple versions of the model, testing different knot counts until it finds the one that performs best for our data.
# We tune 'n_knots' to find the best fit
param_grid = {'splinetransformer__n_knots': range(3, 12)}
grid = GridSearchCV(model, param_grid, cv=5)
grid.fit(X, y)
print(f"Best knot count: {grid.best_params_['splinetransformer__n_knots']}")
Best knot count: 8
After that, we retrain our spline model with the best number of knots, predict, and plot the data. But first, let's understand what we're doing here with this quick breakdown of the SplineTransformer class arguments:
- n_knots: The number of knots. The more you have, the more flexible the curve.
- degree: This defines the “smoothness” of the segments. It specifies the degree of the polynomial used between knots (1 is piecewise linear; 2 is quadratic; 3, cubic, is the default).
- knots: This one tells the model where to put the knots. For example, 'uniform' divides the range into equal intervals, while 'quantile' places more knots where the data is dense.
  - Tip: Use 'quantile' if your data is clustered.
- extrapolation: Tells the model what to do when it encounters data outside the range it saw during training.
  - Tip: Use 'periodic' with cyclical data, such as a calendar or clock.
- include_bias: Whether to keep the “bias” component of the basis. B-splines sum to one for each sample, so together they implicitly contain a column of ones. If you use a LinearRegression or Ridge model later in your pipeline, those models have fit_intercept=True by default, so you can set this to False to avoid redundancy.
# 2. Build the optimized Spline
model = make_pipeline(
    SplineTransformer(n_knots=8,
                      degree=3,
                      knots='uniform',
                      extrapolation='constant',
                      include_bias=False),
    Ridge(alpha=0.1)
).fit(X, y)
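# 3. Predict and Visualize
y_plot = model.predict(X)
# Degree-20 polynomial for comparison. The original snippet did not show
# how this curve was fitted; PolynomialFeatures + LinearRegression is an
# assumed setup.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_model = make_pipeline(PolynomialFeatures(degree=20), LinearRegression()).fit(X, y)
y_plot_poly = poly_model.predict(X)
# Plot
plt.figure(figsize=(12, 5))
plt.scatter(X, y, color='gray', alpha=0.5, label='Data')
plt.plot(X, y_plot, color='teal', linewidth=3, label='Spline Model')
plt.plot(X, y_plot_poly, color='purple', linewidth=2, label='Polynomial Fit (Degree 20)')
plt.legend()
plt.title("Splines: Flexible yet Disciplined")
plt.show()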
Here is the result. With splines, we have better control and a smoother model, avoiding the problem at the edges.

Here we compare a polynomial model of degree 20 with the spline model. One could argue that a lower degree would model this data better, and they would be right. I tested degrees up to 13, and the polynomial fits this dataset very well.
However, that is exactly the point of this article. If the model does not fit the data well and we have to keep increasing the polynomial degree, we will sooner or later run into the wild-edges problem.
Real Life Applications
Where can you actually use this in business?
- Time series cycles: Use extrapolation='periodic' with features such as “hour of day” or “month of the year.” It ensures that the model knows that 11:59 PM is close to 12:01 AM. With this argument, we tell the SplineTransformer that the end of our cycle (hour 23) has to wrap around and meet the beginning (hour 0). The spline therefore ensures that the value and the slope at the end of the day match those at the start of the next day.
- Dose-response in medicine: Modeling how a drug affects the patient. Most drugs follow a nonlinear curve where the benefit eventually plateaus (saturation) or, worse, turns into toxicity. Splines are the “gold standard” here because they can map these complex biological relationships without forcing the data into a rigid shape.
- Income vs. experience: Income tends to grow rapidly early on and then plateau; splines capture this “bend” well.
Before You Go
We've covered a lot here, from why polynomials can be a “wild” choice to how periodic splines solve the midnight gap. Here's a quick wrap-up to keep in your back pocket:
- The golden rule: Use splines when a straight line is too simple, but a higher-order polynomial starts oscillating and overfitting.
- Knots are key: Knots are the “joints” of your model. Finding the right number with GridSearchCV is the difference between a smooth curve and a jagged mess.
- The power of periodic: For any cyclical feature (hours, days, months), use extrapolation='periodic'. It ensures that the model understands that the end of the cycle flows seamlessly back into the beginning.
- Feature engineering > complex models: Often, a simple Ridge regression combined with SplineTransformer can match or beat a complex “black box” model while still being very easy to explain to your boss.
If you liked this content, find out more about my work and my contacts on my website.
GitHub Repository
Here is the complete code for this article, along with a few extras.
References
[1] SplineTransformer documentation
[2] Runge's phenomenon
[3] make_pipeline documentation



