How AutoGluon Enables Modern AutoML Pipelines for Production-Grade Tabular Models with Ensembling and Distillation

nimda January 21, 2026

In this tutorial, we build a production-grade tabular machine learning pipeline using AutoGluon, taking a real-world mixed-type dataset from raw input to deployment-ready artifacts. We train high-quality stacked and bagged ensembles, evaluate performance with robust metrics, perform subgroup and feature-level analyses, and then optimize the model for real-time inference using refit_full and distillation. Throughout the workflow, we focus on practical decisions that balance accuracy, latency, and deployability.

!pip -q install -U "autogluon==1.5.0" "scikit-learn>=1.3" "pandas>=2.0" "numpy>=1.24"


import os, time, json, warnings
warnings.filterwarnings("ignore")


import numpy as np
import pandas as pd


from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, log_loss, accuracy_score, classification_report, confusion_matrix


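from autogluon.tabular import TabularPredictor
from IPython.display import display  # built in under Jupyter/Colab; imported explicitly for clarity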

We set up the environment by installing the required libraries and importing all the dependencies used throughout the pipeline. We suppress warnings to keep the output clean and make sure the numerical, tabular, and evaluation utilities we need are available.

from sklearn.datasets import fetch_openml
df = fetch_openml(data_id=40945, as_frame=True).frame


target = "survived"
df[target] = df[target].astype(int)


drop_cols = [c for c in ["boat", "body", "home.dest"] if c in df.columns]
df = df.drop(columns=drop_cols, errors="ignore")


df = df.replace({None: np.nan})
print("Shape:", df.shape)
print("Target positive rate:", df[target].mean().round(4))
print("Columns:", list(df.columns))


train_df, test_df = train_test_split(
   df,
   test_size=0.2,
   random_state=42,
   stratify=df[target],
)

We load a real-world mixed-type dataset (the OpenML Titanic data) and apply light preprocessing to prepare a clean training signal. We define the target, drop the columns that leak the outcome (boat and body are only known after the disaster) along with the free-text home.dest column, and verify the structure of the dataset. We then create a stratified train/test split so that both partitions keep the same class balance.
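To see what stratification buys us, here is a minimal sketch on a synthetic label column. The 1,000 rows and the 38% positive rate are made-up assumptions, not the Titanic data. It uses the same scikit-learn train_test_split call as above and compares the positive rate of each partition with and without the stratify argument.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1,000 rows with exactly 380 positive labels.
rng = np.random.default_rng(0)
labels = np.array([1] * 380 + [0] * 620)
rng.shuffle(labels)
toy = pd.DataFrame({"x": rng.normal(size=labels.size), "survived": labels})


def positive_rates(frame, target="survived", stratify=True, seed=42):
    """Return (train_rate, test_rate) of the positive class for an 80/20 split."""
    tr, te = train_test_split(
        frame,
        test_size=0.2,
        random_state=seed,
        stratify=frame[target] if stratify else None,
    )
    return tr[target].mean(), te[target].mean()


strat_train, strat_test = positive_rates(toy, stratify=True)
rand_train, rand_test = positive_rates(toy, stratify=False)
print(f"stratified: train={strat_train:.3f} test={strat_test:.3f}")
print(f"random:     train={rand_train:.3f} test={rand_test:.3f}")
```

With stratify, both partitions keep the same positive rate. An unstratified split can drift by a few points, which matters most for small test sets like the roughly 260-row Titanic hold-out.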

def has_gpu():
   try:
       import torch
       return torch.cuda.is_available()
   except Exception:
       return False


presets = "extreme" if has_gpu() else "best_quality"


save_path = "/content/autogluon_titanic_advanced"
os.makedirs(save_path, exist_ok=True)


predictor = TabularPredictor(
   label=target,
   eval_metric="roc_auc",
   path=save_path,
   verbosity=2
)

We detect whether a GPU is available and use that to pick the most appropriate AutoGluon training preset. We then create the directory that will hold the persisted models and instantiate the TabularPredictor with ROC-AUC as the evaluation metric.

start = time.time()
predictor.fit(
   train_data=train_df,
   presets=presets,
   time_limit=7 * 60,
   num_bag_folds=5,
   num_stack_levels=2,
   refit_full=False
)
train_time = time.time() - start
print(f"\nTraining done in {train_time:.1f}s with presets='{presets}'")

We train a high-quality stacked and bagged ensemble (5-fold bagging with two stacking levels) within a seven-minute time budget. AutoGluon's automated model selection trains and combines many model families for us, so we do not need to tune individual models by hand. We also record the wall-clock training time to gauge the compute cost.
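The bagging and stacking configuration above (num_bag_folds=5, num_stack_levels=2) can be illustrated with a simplified scikit-learn sketch. This is not AutoGluon's implementation. The base models, the synthetic make_classification data, and the helper name bagged_oof are illustrative assumptions, and only one stacking level is shown for brevity. The core idea is that each bagged model produces out-of-fold (OOF) predictions for rows it never saw. The next stack level then trains on the original features plus those OOF predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Hypothetical synthetic data; any binary tabular dataset would do.
X, y = make_classification(n_samples=1000, n_features=12, n_informative=6, random_state=0)

# Level-1 model families (assumed choices; AutoGluon trains many more).
base_models = {
    "logreg": lambda: LogisticRegression(max_iter=1000),
    "rf": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
}


def bagged_oof(make_model, X, y, n_folds=5, seed=0):
    """k-fold bagging: each fold's model predicts only the rows it did not train on."""
    oof = np.zeros(len(y))
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, val_idx in skf.split(X, y):
        model = make_model().fit(X[train_idx], y[train_idx])
        oof[val_idx] = model.predict_proba(X[val_idx])[:, 1]
    return oof


# Stack level 1: out-of-fold probabilities from each bagged base model.
level1 = np.column_stack([bagged_oof(make, X, y) for make in base_models.values()])

# Stack level 2: the stacker sees the original features plus the level-1 OOF
# predictions, in the spirit of the skip connections used in AutoGluon stacking.
X_stack = np.hstack([X, level1])
stack_oof = bagged_oof(lambda: LogisticRegression(max_iter=1000), X_stack, y)

for name, col in zip(base_models, level1.T):
    print(f"{name:>7} OOF AUC: {roc_auc_score(y, col):.4f}")
print(f"stacker OOF AUC: {roc_auc_score(y, stack_oof):.4f}")
```

Every OOF prediction comes from a model that did not train on that row. The stacker therefore learns from honest estimates rather than memorized training-set outputs. This is why bagging and stacking are usually configured together.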

lb = predictor.leaderboard(test_df, silent=True)
print("\n=== Leaderboard (top 15) ===")
display(lb.head(15))


proba = predictor.predict_proba(test_df)
pred = predictor.predict(test_df)


y_true = test_df[target].values
if isinstance(proba, pd.DataFrame) and 1 in proba.columns:
   y_proba = proba[1].values
else:
   y_proba = np.asarray(proba).reshape(-1)


print("\n=== Test Metrics ===")
# f-string formatting works whether scikit-learn returns a NumPy or a plain Python float
print(f"ROC-AUC: {roc_auc_score(y_true, y_proba):.5f}")
print(f"LogLoss: {log_loss(y_true, np.clip(y_proba, 1e-6, 1 - 1e-6)):.5f}")
print(f"Accuracy: {accuracy_score(y_true, pred):.5f}")
print("\nClassification report:\n", classification_report(y_true, pred))

We evaluate the trained models on the held-out test set and inspect the leaderboard to compare their performance. We compute both predicted probabilities and hard class labels and derive the key classification metrics from them: ROC-AUC and log loss use the probabilities, while accuracy and the classification report use the labels. This gives a broader view of model quality than accuracy alone.
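To make the two probability-based metrics concrete, the sketch below recomputes them with plain NumPy and checks them against scikit-learn. ROC-AUC is computed as the probability that a random positive is scored above a random negative, with ties counting one half. Log loss is computed with clipping. The eight labels and probabilities are made-up toy values, and the 1e-6 clipping bound mirrors the evaluation code above.

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score


def auc_rank(y_true, scores):
    """ROC-AUC as P(score of random positive > score of random negative), ties = 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Pairwise comparison is O(n_pos * n_neg); fine for a few hundred test rows.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def binary_log_loss(y_true, p, eps=1e-6):
    """Mean negative log-likelihood with probabilities clipped to [eps, 1 - eps]."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


# Hypothetical toy labels and predicted probabilities of the positive class.
y = np.array([0, 0, 1, 1, 0, 1, 1, 0])
p = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.65])

print("AUC:", auc_rank(y, p), "vs sklearn", roc_auc_score(y, p))
print("LogLoss:", binary_log_loss(y, p), "vs sklearn", log_loss(y, np.clip(p, 1e-6, 1 - 1e-6)))
```

The rank view explains why ROC-AUC ignores calibration: only the ordering of scores matters. Log loss, by contrast, penalizes confident probabilities that turn out to be wrong.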

if "pclass" in test_df.columns:
   print("\n=== Slice AUC by pclass ===")
   for grp, part in test_df.groupby("pclass"):
       part_proba = predictor.predict_proba(part)
       part_proba = part_proba[1].values if isinstance(part_proba, pd.DataFrame) and 1 in part_proba.columns else np.asarray(part_proba).reshape(-1)
       auc = roc_auc_score(part[target].values, part_proba)
       print(f"pclass={grp}: AUC={auc:.4f} (n={len(part)})")


fi = predictor.feature_importance(test_df, silent=True)
print("\n=== Feature importance (top 20) ===")
display(fi.head(20))

We analyze model behavior using subgroup performance slices and permutation-based feature importance. The per-pclass AUCs show how performance varies across meaningful segments of the data, and the importance scores show which inputs the model relies on. Together they help us check robustness and interpretability before deployment.
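Permutation importance measures how much a metric degrades when one feature's values are shuffled, which breaks that feature's link to the label. AutoGluon's feature_importance is permutation-based. The sketch below is a simplified from-scratch version and not AutoGluon's implementation. It scores the drop in ROC-AUC on hypothetical synthetic data where one feature ("signal") drives the label and the other ("noise") is irrelevant. The helper name permutation_importance_auc is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical data: "signal" drives the label, "noise" is unrelated to it.
rng = np.random.default_rng(0)
n = 2000
signal = rng.normal(size=n)
noise = rng.normal(size=n)
y = (signal + 0.5 * rng.normal(size=n) > 0).astype(int)
X = np.column_stack([signal, noise])
feature_names = ["signal", "noise"]

X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]
model = LogisticRegression().fit(X_train, y_train)


def permutation_importance_auc(model, X, y, n_repeats=5, seed=0):
    """Mean drop in ROC-AUC when a single column is shuffled, breaking its link to y."""
    perm_rng = np.random.default_rng(seed)
    base = roc_auc_score(y, model.predict_proba(X)[:, 1])
    drops = {}
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = perm_rng.permutation(X_perm[:, j])
            scores.append(roc_auc_score(y, model.predict_proba(X_perm)[:, 1]))
        drops[j] = base - float(np.mean(scores))
    return drops


importances = permutation_importance_auc(model, X_test, y_test)
for j, drop in importances.items():
    print(f"{feature_names[j]:>6}: AUC drop {drop:+.4f}")
```

Computing importance on held-out rows, as the tutorial does with test_df, measures what the model relies on when generalizing. On training rows, the same procedure would also reward features the model merely memorized.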

t0 = time.time()
refit_map = predictor.refit_full()
t_refit = time.time() - t0


print(f"\nrefit_full completed in {t_refit:.1f}s")
print("Refit mapping (sample):", dict(list(refit_map.items())[:5]))


lb_full = predictor.leaderboard(test_df, silent=True)
print("\n=== Leaderboard after refit_full (top 15) ===")
display(lb_full.head(15))


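# model_best / model_names() are the AutoGluon 1.x replacements for the deprecated get_model_best() / get_model_names()
best_model = predictor.model_best
full_candidates = [m for m in predictor.model_names() if m.endswith("_FULL")]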


def bench_infer(model_name, df_in, repeats=3):
   times = []
   for _ in range(repeats):
       t1 = time.time()
       _ = predictor.predict(df_in, model=model_name)
       times.append(time.time() - t1)
   return float(np.median(times))


small_batch = test_df.drop(columns=[target]).head(256)
lat_best = bench_infer(best_model, small_batch)
print(f"\nBest model: {best_model} | median predict() latency on 256 rows: {lat_best:.4f}s")


if full_candidates:
   lb_full_sorted = lb_full.sort_values(by="score_test", ascending=False)
   best_full = lb_full_sorted[lb_full_sorted["model"].str.endswith("_FULL")].iloc[0]["model"]
   lat_full = bench_infer(best_full, small_batch)
   print(f"Best FULL model: {best_full} | median predict() latency on 256 rows: {lat_full:.4f}s")
   print(f"Speedup factor (best / full): {lat_best / max(lat_full, 1e-9):.2f}x")


try:
   t0 = time.time()
   distill_result = predictor.distill(
       train_data=train_df,
       time_limit=4 * 60,
       augment_method="spunge",
   )
   t_distill = time.time() - t0
   print(f"\nDistillation completed in {t_distill:.1f}s")
except Exception as e:
   print("\nDistillation step failed")
   print("Error:", repr(e))


lb2 = predictor.leaderboard(test_df, silent=True)
print("\n=== Leaderboard after distillation attempt (top 20) ===")
display(lb2.head(20))


predictor.save()
reloaded = TabularPredictor.load(save_path)


sample = test_df.drop(columns=[target]).sample(8, random_state=0)
sample_pred = reloaded.predict(sample)
sample_proba = reloaded.predict_proba(sample)


print("\n=== Reloaded predictor sanity-check ===")
print(sample.assign(pred=sample_pred).head())


print("\nProbabilities (head):")
display(sample_proba.head())


artifacts = {
   "path": save_path,
   "presets": presets,
   "best_model": reloaded.model_best,
   "model_names": reloaded.model_names(),
   "leaderboard_top10": lb2.head(10).to_dict(orient="records"),
}
with open(os.path.join(save_path, "run_summary.json"), "w") as f:
   json.dump(artifacts, f, indent=2, default=str)  # default=str stringifies any non-JSON-native values


print("\nSaved summary to:", os.path.join(save_path, "run_summary.json"))
print("Done.")

We prepare the trained ensemble for deployment by using refit_full to collapse each bagged model into a single model retrained on all the training data, and we benchmark how much this reduces prediction latency. We optionally distill the ensemble into faster student models, verify persistence by saving and reloading the predictor, and export a structured run summary for production deployment.
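The idea behind predictor.distill() can be sketched outside AutoGluon. A large teacher model labels the training rows, plus extra augmented rows, with soft probabilities, and a much smaller student is fit to mimic them. In this simplified version, three stand-ins are assumptions: a random forest replaces the stacked ensemble, a single depth-8 regression tree is the student, and Gaussian jitter is the augmentation. AutoGluon's augment_method="spunge" uses its own augmentation scheme, which this sketch does not reproduce.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data standing in for the tutorial's training set.
X, y = make_classification(n_samples=3000, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Teacher: a large ensemble (stand-in for the stacked AutoGluon predictor).
teacher = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)

# Augmentation (assumption): jitter training rows with Gaussian noise so the
# teacher also labels points it never trained on, giving genuinely soft targets.
rng = np.random.default_rng(0)
X_aug = X_train + rng.normal(scale=0.3, size=X_train.shape)
X_student = np.vstack([X_train, X_aug])
soft_targets = teacher.predict_proba(X_student)[:, 1]

# Student: one shallow regression tree fit to the teacher's probabilities.
student = DecisionTreeRegressor(max_depth=8, min_samples_leaf=5, random_state=0)
student.fit(X_student, soft_targets)

teacher_labels = teacher.predict(X_test)
student_labels = (student.predict(X_test) >= 0.5).astype(int)
agreement = float((teacher_labels == student_labels).mean())
print(f"student/teacher agreement on held-out rows: {agreement:.3f}")
```

Measuring agreement with the teacher, and not only accuracy against the labels, shows how faithfully the compact student reproduces the ensemble's decisions. That fidelity is what you trade against the student's lower prediction latency.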

In conclusion, we implemented an end-to-end AutoGluon workflow that turns raw tabular data into production-ready models with minimal manual intervention, while keeping tight control over accuracy, robustness, and inference efficiency. We performed subgroup error analysis and permutation feature-importance testing, optimized large ensembles through refit_full and distillation, and verified deployment readiness with latency benchmarks and artifact packaging. This workflow supports deploying tabular models that are efficient, scalable, interpretable, and well suited to real-world production environments.
