
Weights & Biases: The KDnuggets Crash Course

Image by author

If you are training more than one model, you have probably hit the same headaches: five notebook tabs, re-run training, and by Friday you cannot remember which ROC curve came from which run. Weights & Biases (W&B) tracks everything: metrics, configs, datasets, and models, so that you can respond with evidence, not guesses.

Below is a practical tour: hands-on, light on theory, and geared toward teams that need a clean experiment history without building their own platform. Call it a no-fluff walkthrough.

Why W&B at all?

Notebooks grow into experiments. Experiments multiply. Soon you are asking: which slice of data was used? Why is today's ROC curve higher? Can I reproduce last week's baseline?

W&B gives you one place to:

  • Log metrics, configs, media, and system stats
  • Version datasets and models with artifacts
  • Run hyperparameter sweeps
  • Share dashboards instead of screenshots

You can start small and layer in features as needed.

Setup in 60 seconds

Start by installing the library and logging in with your API key. If you do not have one yet, you can find it here.

pip install wandb
wandb login # paste your API key once


// A small sanity check

import wandb, random, time

wandb.init(project="kdn-crashcourse", name="hello-run", config={"lr": 0.001, "epochs": 5})
for epoch in range(wandb.config.epochs):
    loss = 1.0 / (epoch + 1) + random.random() * 0.05
    wandb.log({"epoch": epoch, "loss": loss})
    time.sleep(0.1)
wandb.finish()

You should now see something like this:

Image by author

Now let's get to the useful parts.

Tracking experiments the right way

// Log hyperparameters and metrics

Treat wandb.config as the single source of truth for your experiment knobs. Log metrics with consistent keys to get clean default charts.

cfg = dict(arch="resnet18", lr=3e-4, batch=64, seed=42)
run = wandb.init(project="kdn-mlops", config=cfg, tags=["baseline"])

# training loop ...
for step, (x, y) in enumerate(loader):
    # ... compute loss, acc
    wandb.log({"train/loss": loss.item(), "train/acc": acc, "step": step})

# log a final summary
run.summary["best_val_auc"] = best_auc

A few tips:

  • Use prefixes like train/loss or val/auc to group charts automatically
  • Add tags like "lr-finder" or "fp16" so you can filter runs later
  • Use run.summary[...] for one-off results you want to see on the run card

// Log images, plots, and custom charts

wandb.log({
    "val/confusion": wandb.plot.confusion_matrix(
        preds=preds, y_true=y_true, class_names=classes)
})

You can also log any matplotlib figure:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(history)
wandb.log({"training/curve": fig})

// Version datasets and models with artifacts

Artifacts answer reproducibility questions like "exactly which files were used?" and "what did we train on?" No more final_final_v3.parquet mysteries.

import wandb

run = wandb.init(project="kdn-mlops")

# Create a dataset artifact (run once per version)
raw = wandb.Artifact("imdb_reviews", type="dataset", description="raw dump v1")
raw.add_dir("data/raw") # or add_file("path")
run.log_artifact(raw)

# Later, consume the latest version
artifact = run.use_artifact("imdb_reviews:latest")
data_dir = artifact.download() # folder path pinned to a hash

Log your model the same way:

import torch
import wandb

run = wandb.init(project="kdn-mlops")

model_path = "models/resnet18.pt"
torch.save(model.state_dict(), model_path)

model_art = wandb.Artifact("sentiment-resnet18", type="model")
model_art.add_file(model_path)
run.log_artifact(model_art)

Now the lineage is explicit: this model came from that data, under this code.
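
If you later want to trace that lineage programmatically, here is a minimal sketch using the public API (the your-entity/ prefix is a placeholder for your own account or team):

import wandb

api = wandb.Api()
model = api.artifact("your-entity/kdn-mlops/sentiment-resnet18:latest")

producer = model.logged_by()            # the run that logged this model
print(producer.config)                  # the config it trained under
for used in producer.used_artifacts():  # the datasets that run consumed
    print(used.name)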

// Tables for evaluation and error analysis

wandb.Table is a lightweight dataframe for results, predictions, and slices.

table = wandb.Table(columns=["id", "text", "pred", "true", "prob"])
for r in batch_results:
    table.add_data(r.id, r.text, r.pred, r.true, r.prob)
wandb.log({"eval/preds": table})

Sort and filter the table in the UI to spot failure patterns (e.g., short reviews, rare classes).
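
If full prediction tables get heavy, one option is to log only the misclassified rows; a small sketch reusing the hypothetical batch_results from above:

errors = wandb.Table(columns=["id", "text", "pred", "true"])
for r in batch_results:
    if r.pred != r.true:  # keep only the failures
        errors.add_data(r.id, r.text, r.pred, r.true)
wandb.log({"eval/errors": errors})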

// Hyperparameter sweeps

Describe the search space in YAML, launch agents, and let W&B coordinate.

# sweep.yaml
program: train.py   # the training script the agent will execute
method: bayes
metric: {name: val/auc, goal: maximize}
parameters:
  lr: {min: 1.0e-5, max: 1.0e-2}
  batch: {values: [32, 64, 128]}
  dropout: {min: 0.0, max: 0.5}

Start the sweep:

wandb sweep sweep.yaml # prints a SWEEP_ID
wandb agent <entity>/<project>/<SWEEP_ID> # run 1+ agents

Your training script should read lr, batch, etc. from wandb.config. The sweep dashboard shows the top runs, parameter importance, and the best config.
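
For concreteness, a minimal sketch of a sweep-compatible train.py (the name must match the program: entry in sweep.yaml; the metric here is faked so the script also runs standalone):

import random
import wandb

def main():
    # Defaults apply on a bare run; the sweep agent overrides them
    run = wandb.init(config={"lr": 1e-3, "batch": 64, "dropout": 0.1})
    cfg = wandb.config

    for epoch in range(5):
        # Stand-in for real training with cfg.lr, cfg.batch, cfg.dropout
        val_auc = 0.5 + random.random() * 0.05 + min(cfg.lr * 10, 0.3)
        wandb.log({"epoch": epoch, "val/auc": val_auc})

if __name__ == "__main__":
    main()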

Framework integrations

Pick the one you already use and keep moving.

// Pytorch Lightning

import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

logger = WandbLogger(project="kdn-mlops")
trainer = pl.Trainer(logger=logger, max_epochs=10)

// Keras

import wandb
from wandb.keras import WandbCallback

wandb.init(project="kdn-mlops", config={"epochs": 10})
model.fit(X, y, epochs=wandb.config.epochs, callbacks=[WandbCallback()])

// Scikit-learn

from sklearn.metrics import roc_auc_score
wandb.init(project="kdn-mlops", config={"C": 1.0})
# ... fit model
wandb.log({"val/auc": roc_auc_score(y_true, y_prob)})

Model registry and staging

Think of the registry as a named shelf for your best models. You push an artifact once, then move aliases like staging or production so downstream code can pull the right version without guessing file names.

run = wandb.init(project="kdn-mlops")
art = run.use_artifact("sentiment-resnet18:latest")

# Link the artifact into the registry under a stable name,
# attaching the "staging" alias in the same call
# (your entity may need to prefix the registry path)
run.link_artifact(art, "model-registry/sentiment-classifier", aliases=["staging"])

Flip the alias when you promote a new build. Consumers always pull sentiment-classifier:production.
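
On the consumer side, a minimal sketch (depending on your setup, the registry path may need your entity as a prefix):

run = wandb.init(project="kdn-mlops", job_type="inference")
model_art = run.use_artifact("model-registry/sentiment-classifier:production")
model_dir = model_art.download()  # load your weights from this folder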

The reproducibility checklist

  • Config: keep every hyperparameter in wandb.config
  • Code: use wandb.init(settings=wandb.Settings(code_dir=".")) to snapshot code, or rely on CI to attach the git SHA
  • Environment: log requirements.txt or a Docker tag as an artifact (a minimal sketch follows this list)
  • Seeds: set them and log them
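
For the environment bullet, a minimal sketch (the artifact name "runtime-env" is just a convention here):

import wandb

run = wandb.init(project="kdn-mlops")
env = wandb.Artifact("runtime-env", type="environment")
env.add_file("requirements.txt")
run.log_artifact(env)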

A seed helper:

def set_seeds(s=42):
    import random, numpy as np, torch
    random.seed(s)
    np.random.seed(s)
    torch.manual_seed(s)
    torch.cuda.manual_seed_all(s)

Collaboration and sharing without screenshots

Add notes and tags so teammates can search runs. Use Reports to stitch charts, tables, and commentary into a link you can drop in Slack or a PR. Stakeholders can follow along without opening a notebook.

CI and automation tips

  • Run wandb agent on training nodes to drive sweeps from CI
  • Log dataset artifacts after your ETL job; training jobs can then depend on that exact version explicitly
  • After validation passes, promote model aliases (staging → production) in a small downstream step (see the sketch after this list)
  • Pass WANDB_API_KEY as a secret and group related runs with WANDB_RUN_GROUP
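
As a sketch of that promotion step via the public API (the your-entity/ prefix is a placeholder for your own path):

import wandb

api = wandb.Api()
art = api.artifact("your-entity/kdn-mlops/sentiment-resnet18:latest")
art.aliases.append("production")
art.save()  # persists the alias change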

Privacy and trust tips

  • Use private projects for confidential team work
  • Use offline mode for air-gapped runs, then wandb sync later:
export WANDB_MODE=offline
  • Don't log raw PII. If you must track identity, hash IDs before logging (see the sketch after this list).
  • Keep large files as artifacts instead of attaching them to wandb.log
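
For the PII bullet, a minimal hashing sketch (assumes an active run; the 16-character truncation is an arbitrary choice):

import hashlib
import wandb

def hash_id(raw: str) -> str:
    # One-way hash so raw identifiers never reach the dashboard
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

wandb.log({"eval/user": hash_id("jane@example.com")})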

Common snags (and quick fixes)

  • "My run logged nothing." The script may have crashed before wandb.finish() was called. Also check that WANDB_DISABLED=true is not set in your environment.
  • Logging feels slow. Log scalars every step, but save heavy payloads such as images or tables for epoch end. You can also pass commit=False to wandb.log() to batch multiple logs together.
  • Seeing duplicate runs in the UI? If you resume from a checkpoint, set id plus resume="allow" in wandb.init() to continue the same run (both fixes are sketched after this list).
  • Chasing mystery data drift? Put every data snapshot into an artifact and pin versions explicitly in your pipelines.
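
Two of those fixes in code form, as a minimal sketch (the id "exp-042" is a made-up stable identifier):

import wandb

# Reuse a stable id so a restarted job continues the same run
run = wandb.init(project="kdn-mlops", id="exp-042", resume="allow")

# Buffer several log calls into one step to cut overhead
wandb.log({"train/loss": 0.31}, commit=False)
wandb.log({"train/acc": 0.91})  # commits both at once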

Pocket cheatsheet

// 1. Start a run

wandb.init(project="proj", config=cfg, tags=["baseline"])

// 2. Log metrics, images, or tables

wandb.log({"train/loss": loss, "img": [wandb.Image(img)]})

// 3. Version a dataset or model

art = wandb.Artifact("name", type="dataset")
art.add_dir("path")
run.log_artifact(art)

// 4. Use an artifact

path = run.use_artifact("name:latest").download()

// 5. Run a sweep

wandb sweep sweep.yaml && wandb agent <entity>/<project>/<SWEEP_ID>

Wrapping up

Start small: initialize a run, log a few metrics, and push your model file as an artifact. Once that feels natural, add sweeps and a short report. You will end up with reproducible experiments, versioned data and models, and dashboards that explain your work without a slide deck.

Josep Ferrer is an analytics engineer from Barcelona. He graduated in physics engineering and currently works in the data science field applied to human mobility. He is a part-time content creator focused on data science and technology. Josep writes on all things AI, covering the application of the ongoing explosion in the field.
