A Coding Implementation of a Conditional Bayesian Hyperparameter Optimization Pipeline with Hyperopt, TPE, and Early Stopping

In this tutorial, we build an advanced Bayesian hyperparameter optimization workflow with Hyperopt and the Tree-structured Parzen Estimator (TPE) algorithm. We construct a conditional search space that switches dynamically between different model families, demonstrating how Hyperopt handles nested, tree-structured parameter graphs. We develop a robust objective function that scores each candidate with cross-validation inside a scikit-learn pipeline, giving realistic model evaluation. We also add early stopping based on stagnating loss and fully inspect the Trials object to analyze the optimization trajectory. By the end, we not only find the best model configuration but also understand how Hyperopt internally tracks, analyzes, and steers the search, leaving us with a scalable, reproducible tuning framework that can be extended to deep learning or distributed settings.
!pip -q install -U hyperopt scikit-learn pandas matplotlib
import time
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK, STATUS_FAIL
from hyperopt.pyll.base import scope
from hyperopt.early_stop import no_progress_loss
X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
We install the dependencies and import all required libraries for optimization, modeling, and visualization. We load the breast cancer dataset and set up stratified 5-fold cross-validation so that every fold preserves the class balance. This forms the experimental basis for our Bayesian optimization.
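As a quick sanity check (a minimal sketch, not part of the original tutorial), we can confirm why StratifiedKFold is used here: each test fold keeps the class proportions close to those of the full dataset, unlike a plain KFold on shuffled data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

overall = y.mean()  # fraction of the positive class in the full dataset
for fold, (_, test_idx) in enumerate(cv.split(X, y)):
    ratio = y[test_idx].mean()
    print(f"fold {fold}: positive ratio = {ratio:.3f} (overall {overall:.3f})")
```

Each printed ratio should match the overall positive-class fraction almost exactly, which keeps the per-fold ROC-AUC estimates comparable.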
space = hp.choice("model_family", [
{
"model": "logreg",
"scaler": True,
"C": hp.loguniform("lr_C", np.log(1e-4), np.log(1e2)),
"penalty": hp.choice("lr_penalty", ["l2"]),
"solver": hp.choice("lr_solver", ["lbfgs", "liblinear"]),
"max_iter": scope.int(hp.quniform("lr_max_iter", 200, 2000, 50)),
"class_weight": hp.choice("lr_class_weight", [None, "balanced"]),
},
{
"model": "svm",
"scaler": True,
"kernel": hp.choice("svm_kernel", ["rbf", "poly"]),
"C": hp.loguniform("svm_C", np.log(1e-4), np.log(1e2)),
"gamma": hp.loguniform("svm_gamma", np.log(1e-6), np.log(1e0)),
"degree": scope.int(hp.quniform("svm_degree", 2, 5, 1)),
"class_weight": hp.choice("svm_class_weight", [None, "balanced"]),
}
])
We define a conditional search space using hp.choice, allowing Hyperopt to switch between Logistic Regression and SVM. Each branch carries its own sub-parameters, which demonstrates the tree-structured nature of the search. We also cast discrete parameters with scope.int to avoid floating-point values where integers are required.
def build_pipeline(params: dict) -> Pipeline:
steps = []
if params.get("scaler", True):
steps.append(("scaler", StandardScaler()))
if params["model"] == "logreg":
clf = LogisticRegression(
C=float(params["C"]),
penalty=params["penalty"],
solver=params["solver"],
max_iter=int(params["max_iter"]),
class_weight=params["class_weight"],
n_jobs=None,
)
elif params["model"] == "svm":
kernel = params["kernel"]
clf = SVC(
kernel=kernel,
C=float(params["C"]),
gamma=float(params["gamma"]),
degree=int(params["degree"]) if kernel == "poly" else 3,
class_weight=params["class_weight"],
probability=True,
)
else:
raise ValueError(f"Unknown model type: {params['model']}")
steps.append(("clf", clf))
return Pipeline(steps)
def objective(params: dict):
t0 = time.time()
try:
pipe = build_pipeline(params)
scores = cross_val_score(
pipe,
X, y,
cv=cv,
scoring="roc_auc",
n_jobs=-1,
error_score="raise",
)
mean_auc = float(np.mean(scores))
std_auc = float(np.std(scores))
loss = 1.0 - mean_auc
elapsed = float(time.time() - t0)
return {
"loss": loss,
"status": STATUS_OK,
"attachments": {
"mean_auc": mean_auc,
"std_auc": std_auc,
"elapsed_sec": elapsed,
},
}
except Exception as e:
elapsed = float(time.time() - t0)
return {
"loss": 1.0,
"status": STATUS_FAIL,
"attachments": {
"error": repr(e),
"elapsed_sec": elapsed,
},
}
We define a pipeline builder and an objective function. We evaluate each candidate with cross-validated ROC-AUC and turn the problem into a minimization by defining the loss as 1 - mean_auc. We also attach structured metadata to each trial, allowing rich post-optimization analysis.
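The loss = 1 - mean_auc convention can be checked in isolation (a standalone sketch on synthetic data; the make_classification settings and DummyClassifier baseline are illustrative, not from the tutorial): a better classifier gets a strictly lower loss, so fmin's minimization maximizes AUC.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

def loss_for(estimator):
    # Same transformation as the tutorial's objective: 1 - cross-validated AUC.
    auc = cross_val_score(estimator, X, y, cv=5, scoring="roc_auc").mean()
    return 1.0 - auc

dummy_loss = loss_for(DummyClassifier(strategy="prior"))  # AUC ~0.5 -> loss ~0.5
lr_loss = loss_for(LogisticRegression(max_iter=1000))
print(f"dummy loss: {dummy_loss:.3f}, logreg loss: {lr_loss:.3f}")
```

Returning STATUS_FAIL with a fixed loss of 1.0 for crashed configurations keeps TPE's model well-defined while steering it away from the failing region.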
trials = Trials()
rstate = np.random.default_rng(123)
max_evals = 80
best = fmin(
fn=objective,
space=space,
algo=tpe.suggest,
max_evals=max_evals,
trials=trials,
rstate=rstate,
early_stop_fn=no_progress_loss(20),
)
print("\nRaw `best` (note: includes choice indices):")
print(best)
We launch the TPE search with fmin, specifying the maximum number of evaluations and an early-stopping condition. We seed the random state for reproducibility and record every evaluation in the Trials object. This snippet runs the full Bayesian search procedure.
best_trial = trials.best_trial
best_params = best_trial["result"].get("attachments", {}).copy()
best_used_params = best_trial["misc"]["vals"].copy()
best_used_params = {k: (v[0] if isinstance(v, list) and len(v) else v) for k, v in best_used_params.items()}
MODEL_FAMILY = ["logreg", "svm"]
LR_PENALTY = ["l2"]
LR_SOLVER = ["lbfgs", "liblinear"]
LR_CLASS_WEIGHT = [None, "balanced"]
SVM_KERNEL = ["rbf", "poly"]
SVM_CLASS_WEIGHT = [None, "balanced"]
mf = int(best_used_params.get("model_family", 0))
decoded = {"model": MODEL_FAMILY[mf]}
if decoded["model"] == "logreg":
decoded.update({
"C": float(best_used_params["lr_C"]),
"penalty": LR_PENALTY[int(best_used_params["lr_penalty"])],
"solver": LR_SOLVER[int(best_used_params["lr_solver"])],
"max_iter": int(best_used_params["lr_max_iter"]),
"class_weight": LR_CLASS_WEIGHT[int(best_used_params["lr_class_weight"])],
"scaler": True,
})
else:
decoded.update({
"kernel": SVM_KERNEL[int(best_used_params["svm_kernel"])],
"C": float(best_used_params["svm_C"]),
"gamma": float(best_used_params["svm_gamma"]),
"degree": int(best_used_params["svm_degree"]),
"class_weight": SVM_CLASS_WEIGHT[int(best_used_params["svm_class_weight"])],
"scaler": True,
})
print("\nDecoded best configuration:")
print(decoded)
print("\nBest trial metrics:")
print(best_params)
We decode Hyperopt's internal choice indices into human-readable settings. Since hp.choice returns index values, we manually map them back to the corresponding parameter labels. This produces a cleaner, more interpretable configuration for the final training run.
rows = []
for t in trials.trials:
res = t.get("result", {})
att = res.get("attachments", {}) if isinstance(res, dict) else {}
status = res.get("status", None) if isinstance(res, dict) else None
loss = res.get("loss", None) if isinstance(res, dict) else None
vals = t.get("misc", {}).get("vals", {})
vals = {k: (v[0] if isinstance(v, list) and len(v) else None) for k, v in vals.items()}
rows.append({
"tid": t.get("tid"),
"status": status,
"loss": loss,
"mean_auc": att.get("mean_auc"),
"std_auc": att.get("std_auc"),
"elapsed_sec": att.get("elapsed_sec"),
**{f"p_{k}": v for k, v in vals.items()},
})
df = pd.DataFrame(rows).sort_values("tid").reset_index(drop=True)
print("\nTop 10 trials by loss:")
print(df[df["status"] == STATUS_OK].sort_values("loss").head(10)[
["tid", "loss", "mean_auc", "std_auc", "elapsed_sec", "p_model_family"]
])
ok = df[df["status"] == STATUS_OK].copy()
ok["best_so_far"] = ok["loss"].cummin()
plt.figure()
plt.plot(ok["tid"], ok["loss"], marker="o", linestyle="none")
plt.xlabel("trial id")
plt.ylabel("loss = 1 - mean_auc")
plt.title("Trial losses")
plt.show()
plt.figure()
plt.plot(ok["tid"], ok["best_so_far"])
plt.xlabel("trial id")
plt.ylabel("best-so-far loss")
plt.title("Best-so-far trajectory")
plt.show()
final_pipe = build_pipeline(decoded)
final_pipe.fit(X, y)
print("\nFinal model fitted on full dataset.")
print(final_pipe)
print("\nNOTE: SparkTrials is primarily useful on Spark/Databricks environments.")
print("Hyperopt SparkTrials docs exist, but Colab is typically not the right place for it.")
We convert the Trials object into a structured DataFrame for analysis. We visualize per-trial losses and the best-so-far trajectory to understand the convergence behavior. Finally, we fit the best configuration on the full dataset to obtain the final optimized pipeline.
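The best-so-far trajectory plotted above is nothing more than a running minimum over per-trial losses; a tiny sketch with illustrative numbers makes the cummin step explicit:

```python
import pandas as pd

# Toy per-trial losses; the best-so-far curve is their running minimum.
losses = pd.Series([0.40, 0.35, 0.38, 0.30, 0.33, 0.28])
best_so_far = losses.cummin()
print(best_so_far.tolist())  # [0.40, 0.35, 0.35, 0.30, 0.30, 0.28]
```

Because the curve is monotonically non-increasing, a long flat stretch is exactly the stagnation signal that no_progress_loss reacts to.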
In conclusion, we have built a fully structured Bayesian hyperparameter optimization workflow using Hyperopt's TPE algorithm. We showed how to build conditional search spaces, write robust objective functions, apply early stopping, and analyze trial metadata in depth. Rather than treating hyperparameter tuning as a black box, we exposed and inspected every part of the optimization pipeline. We now have an extensible framework that can be adapted to deep neural networks, reinforcement learning agents, or distributed execution with SparkTrials. By combining structured search spaces with intelligent sampling, we obtain an efficient and interpretable tuning process suitable for both research and production environments.
The post A Coding Implementation of a Conditional Bayesian Hyperparameter Optimization Pipeline with Hyperopt, TPE, and Early Stopping appeared first on MarkTechPost.



