Weights & Biases: The KDnuggets Crash Course


Image by the author
If you train models across more than one notebook, you have probably hit the same headaches: five open tabs, reruns of old training jobs, and by Friday nobody remembers which run produced which ROC curve. Weights & Biases (W&B) tracks your experiments – metrics, configs, datasets, and models – so you can answer with evidence instead of guessing.
Below is a practical, no-fluff walkthrough: heavy on code, light on theory, and aimed at teams that need a clean experiment history without building their own platform.
Why Use W&B at All?
Notebooks grow into experiments. Experiments multiply. Soon you are asking: which slice of data did this run use? Why is today's ROC curve higher? Can I reproduce last week's baseline?
W&B gives you one place to:
- Log metrics, configs, media, and system stats
- Version datasets and models as artifacts
- Run hyperparameter sweeps
- Share dashboards instead of screenshots
You can start small and layer on features as you need them.
Setup in 60 Seconds
Start by installing the library and logging in with your API key. If you don't have a key yet, you can find it in your W&B account settings.
pip install wandb
wandb login # paste your API key once


Image by the author
// Quick sanity check
import wandb, random, time

wandb.init(project="kdn-crashcourse", name="hello-run", config={"lr": 0.001, "epochs": 5})
for epoch in range(wandb.config.epochs):
    loss = 1.0 / (epoch + 1) + random.random() * 0.05
    wandb.log({"epoch": epoch, "loss": loss})
    time.sleep(0.1)
wandb.finish()
You should now see something like this:


Image by the author
Now let's move on to the useful parts.
Tracking the Right Things
// Log hyperparameters and metrics
Treat wandb.config as the single source of truth for your experiment knobs. Log metrics and you get default charts per key.
cfg = dict(arch="resnet18", lr=3e-4, batch=64, seed=42)
run = wandb.init(project="kdn-mlops", config=cfg, tags=["baseline"])
# training loop ...
for step, (x, y) in enumerate(loader):
    # ... compute loss, acc
    wandb.log({"train/loss": loss.item(), "train/acc": acc, "step": step})
# log a final summary
run.summary["best_val_auc"] = best_auc
A few tips:
- Use names like train/loss or val/auc so charts group automatically
- Add tags like "lr-finder" or "fp16" so you can filter runs over time
- Use run.summary[...] for one-off results you want on the run card
// Log images, plots, and custom charts
wandb.log({
    "val/confusion": wandb.plot.confusion_matrix(
        preds=preds, y_true=y_true, class_names=classes)
})
You can also log any matplotlib figure:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(history)
wandb.log({"training/curve": fig})
// Version datasets and models with artifacts
Artifacts answer questions like "Which exact files did this run use?" and "Which model did we actually train?" No more final_final_v3.parquet mysteries.
import wandb
run = wandb.init(project="kdn-mlops")
# Create a dataset artifact (run once per version)
raw = wandb.Artifact("imdb_reviews", type="dataset", description="raw dump v1")
raw.add_dir("data/raw") # or add_file("path")
run.log_artifact(raw)
# Later, consume the latest version
artifact = run.use_artifact("imdb_reviews:latest")
data_dir = artifact.download() # folder path pinned to a hash
Log your model the same way:
import torch
import wandb
run = wandb.init(project="kdn-mlops")
model_path = "models/resnet18.pt"
torch.save(model.state_dict(), model_path)
model_art = wandb.Artifact("sentiment-resnet18", type="model")
model_art.add_file(model_path)
run.log_artifact(model_art)
Now the lineage is explicit: this model came from that data, under this code.
// Tables for evaluation and error analysis
wandb.Table is a lightweight dataframe for results, predictions, and slices.
table = wandb.Table(columns=["id", "text", "pred", "true", "prob"])
for r in batch_results:
    table.add_data(r.id, r.text, r.pred, r.true, r.prob)
wandb.log({"eval/preds": table})
Sort and filter the table in the UI to spot failure patterns (e.g., short reviews, rare classes).
// Hyperparameter sweeps
Describe the search space in YAML, launch agents, and let W&B coordinate the runs.
# sweep.yaml
method: bayes
metric: {name: val/auc, goal: maximize}
parameters:
  lr: {min: 1e-5, max: 1e-2}
  batch: {values: [32, 64, 128]}
  dropout: {min: 0.0, max: 0.5}
Start the sweep:
wandb sweep sweep.yaml # returns a SWEEP_ID
wandb agent <entity>/<project>/<SWEEP_ID> # run 1+ agents
Your training script should read lr, batch, and the rest from wandb.config. The sweep dashboard then shows the top runs, how each parameter relates to the metric, and the best config.
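For concreteness, here is a minimal sketch of a sweep-ready training script; the project name and the dummy metric are illustrative stand-ins for your real train/eval loop.
import wandb

def main():
    # The sweep agent injects the sampled hyperparameters into wandb.config
    run = wandb.init(project="kdn-mlops")
    cfg = wandb.config

    for epoch in range(5):
        # Stand-in for a real train/eval step using cfg.lr, cfg.batch, cfg.dropout
        val_auc = 0.7 + 0.05 * epoch - 0.1 * cfg.dropout
        wandb.log({"val/auc": val_auc})  # key must match metric.name in sweep.yaml

    run.finish()

if __name__ == "__main__":
    main()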
Framework Integrations
Pick the one you already use and keep going.
// PyTorch Lightning
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

logger = WandbLogger(project="kdn-mlops")
trainer = pl.Trainer(logger=logger, max_epochs=10)
// Keras
import wandb
from wandb.keras import WandbCallback
wandb.init(project="kdn-mlops", config={"epochs": 10})
model.fit(X, y, epochs=wandb.config.epochs, callbacks=[WandbCallback()])
// Scikit-learn
from sklearn.metrics import roc_auc_score
wandb.init(project="kdn-mlops", config={"C": 1.0})
# ... fit model
wandb.log({"val/auc": roc_auc_score(y_true, y_prob)})
Model Registry and Staging
Think of the registry as a shelf of named, blessed models. You push the artifact once, then move aliases like staging or production so downstream code can pull the right version without guessing file names.
run = wandb.init(project="kdn-mlops")
art = run.use_artifact("sentiment-resnet18:latest")
# Link the model artifact into a registered collection and tag the new version
# (the target path may also need your entity prefix)
run.link_artifact(art, "model-registry/sentiment-classifier", aliases=["staging"])
Flip the alias when you promote a new build. Consumers always pull sentiment-classifier:production.
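On the consuming side, a minimal sketch; the artifact path is illustrative and may need your entity or registry prefix depending on how the collection is set up.
import wandb

run = wandb.init(project="kdn-mlops", job_type="inference")
model_art = run.use_artifact("sentiment-classifier:production")  # illustrative path
model_dir = model_art.download()  # local folder pinned to the promoted version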
The Reproducibility Checklist
- Config: keep every hyperparameter in wandb.config
- Code and commit: use wandb.init(settings=wandb.Settings(code_dir=".")) to snapshot code, or rely on CI to attach the git SHA
- Environment: log requirements.txt or a Docker tag as an artifact (see the sketch after the seed helper below)
- Seeds: log them and actually set them
Seed helper:
def set_seeds(s=42):
    import random, numpy as np, torch
    random.seed(s)
    np.random.seed(s)
    torch.manual_seed(s)
    torch.cuda.manual_seed_all(s)
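For the environment item above, a minimal sketch that logs requirements.txt as an artifact; the artifact name and job_type are illustrative.
import wandb

run = wandb.init(project="kdn-mlops", job_type="environment")
env = wandb.Artifact("python-env", type="environment", description="pip freeze for this run")
env.add_file("requirements.txt")
run.log_artifact(env)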
Collaboration and Sharing Without Screenshots
Add notes and tags so teammates can search runs. Use Reports to combine charts, tables, and commentary into a single link you can drop into Slack or a PR. Stakeholders can follow along without opening a notebook.
CI and Automation Tips
- Run wandb agent on training nodes to pick up sweeps from CI
- Log dataset artifacts after your ETL job; training jobs can then depend on that exact version
- After tests pass, promote model aliases (staging → production) in a small automation step (see the sketch after this list)
- Pass WANDB_API_KEY as a secret and group related runs with WANDB_RUN_GROUP
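A minimal sketch of that promotion step using the public API; the entity and artifact path are illustrative, so adjust them to your setup.
import wandb

api = wandb.Api()
art = api.artifact("my-entity/kdn-mlops/sentiment-resnet18:staging")  # illustrative path
art.aliases.append("production")
art.save()  # persist the new alias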
Privacy and Security Tips
- Use private projects for confidential team work
- Use offline mode for air-gapped runs, then run wandb sync later:
export WANDB_MODE=offline
- Don't log raw PII. If you need identifiers, hash them before logging (see the sketch after this list).
- Keep large files as artifacts instead of attaching them to wandb.log.
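A minimal sketch of that hashing step; the salt and the logged field are illustrative.
import hashlib
import wandb

def hash_id(raw_id: str, salt: str = "project-salt") -> str:
    # One-way hash so the dashboard never sees the raw identifier
    return hashlib.sha256((salt + raw_id).encode()).hexdigest()[:16]

run = wandb.init(project="kdn-mlops")
wandb.log({"user": hash_id("user-12345")})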
Common Snags (and Quick Fixes)
- "My run logged nothing." The script may have crashed before wandb.finish() was called. Also check that you don't have WANDB_DISABLED=true in your environment.
- Logging feels slow. Log scalars every step, but save heavy payloads such as images or tables for epoch boundaries. You can also pass commit=False to wandb.log() and batch several logs together.
- Seeing duplicate runs in the UI? If you resume from a checkpoint, set id along with resume="allow" in wandb.init() so it continues the same run (see the sketch after this list).
- Chasing mysterious data drift? Put every data snapshot into an artifact and pin the version explicitly in your configs.
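A minimal sketch of that resume pattern; the run id here is illustrative, and in practice you would persist it alongside your checkpoint.
import wandb

RUN_ID = "baseline-2024-01"  # illustrative; store this next to your checkpoint

run = wandb.init(project="kdn-mlops", id=RUN_ID, resume="allow")
# ... reload the checkpoint, then keep logging to the same run
wandb.log({"train/loss": 0.12})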
Pocket Cheatsheet
// 1. Start a run
wandb.init(project="proj", config=cfg, tags=["baseline"])
// 2. Log metrics, images, or tables
wandb.log({"train/loss": loss, "img": [wandb.Image(img)]})
// 3. Version dataset or model
art = wandb.Artifact("name", type="dataset")
art.add_dir("path")
run.log_artifact(art)
// 4. Use an artifact
path = run.use_artifact("name:latest").download()
// 5. Run a sweep
wandb sweep sweep.yaml && wandb agent <entity>/<project>/<SWEEP_ID>
Wrapping Up
Start small: initialize a run, log a few metrics, and push your model file as an artifact. Once that feels natural, add sweeps and a short report. You end up with reproducible experiments, versioned data and models, and dashboards that explain your work without a slide deck.
Josep Ferrer is an analytics engineer from Barcelona. He graduated in physics engineering and is currently working in the data science field applied to human mobility. He is a part-time content creator focused on data science and technology. Josep writes on all things AI, covering the application of the ongoing explosion in the field.



