Machine learning lessons I learned this month

Most jobs have their routines, and machine learning is no different.

Writing code, waiting for results, interpreting them, going back to the code. Some repetition is simply part of the work. But just because the days look similar does not mean there is nothing to learn. Quite the opposite! Two or three years ago, I started a daily habit of documenting the lessons I learned from my ML work. Looking back at this month's notes, three practical lessons stood out:

  1. Write the README for yourself
  2. Request MIG slices instead of full GPUs
  3. Sprinkle movement throughout the day

Write the README – for your future self

Most READMEs are written with other people in mind. They exist for new team members, or to make an open-source repository usable by strangers.

But write it for yourself, too – mostly for yourself. More precisely: for your future self.

When you're in the middle of a project, all the subtleties, commands, and settings are in your head. After a pause, not so much. I ran into this recently when I had to prepare a paper revision. In ML research, reviews often take months. By then, you've moved on to the next project, with new details, new code, new setups. When the reviews finally arrive, you go back to the old project and … spend half a day figuring out how to reproduce your own results.

That's a lot of pressure when the deadline is fast approaching.

Preparing for your own forgetfulness is part of the job. A small README saves a big headache.

What to write down (for yourself)

Keep it actionable. Your future self doesn't want prose; they want the "how":

  • Project quickstart: environment setup, exact Python version, env file or conda/pip commands.
  • Data locations: where raw and processed data live; how to download, cache, and checksum them; any gotchas.
  • Reproducing results: one command per artifact: figures, tables, and tests.
  • Training and evaluation: exact commands for running experiments; how to resume a run; how to set seeds.
  • Hyperparameter search: the command you actually used, with the search ranges; where the results land.
  • Common pitfalls: anything you know you'll forget (required env vars, GPU flags, file naming).
  • Changelog: single-line entries for notable changes.

A small, generic template you can adapt for any project:

# <project name>

## Quickstart
# env
conda create -n proj python=3.10 -y
conda activate proj
pip install -r requirements.txt

## Data
# download & preprocess
python tools/download_data.py --out data/raw
python tools/preprocess.py --in data/raw --out data/processed

## Train
python train.py --cfg cfgs/base.yaml --seed 42

## Evaluate
python eval.py --ckpt runs/exp123/best.ckpt --split test

## Reproduce Figures
python scripts/fig_1.py  # outputs to figs/fig_1.png
python scripts/tab_2.py  # writes tables/tab_2.csv

## Hyperparameter Search
python sweep.py --study local.pkl --n-trials 100

## Notes / Pitfalls
- Requires CUDA 12.1
- Set `WANDB_MODE=offline` if no internet

MIG slices for fast iteration

Training the current generation of large language models requires hundreds (or thousands) of high-end GPUs. But most day-to-day ML work does not require LLM-scale models. Many problems are solved with compact CNNs or small MLPs – and those do not need a full A100/H100 GPU.

Requesting a full GPU for a small model wastes resources and puts you at the back of the queue. I learned this the hard way this month: I was training a 4-layer MLP and spent ages waiting for scheduled jobs. In my scheduler requests, I had asked for a full GPU allocation. Of course, those are most in demand by the jobs that really need them (like training LLMs).

When I switched to a MIG slice, jobs started faster and my iteration speed jumped.

What is MIG, and why use it?

MIG (Multi-Instance GPU) lets you split recent NVIDIA GPUs into several isolated slices. One large GPU becomes up to seven small virtual GPUs, and the total VRAM is divided across these slices. Each slice essentially behaves like a small GPU – and for many workloads, those small slices are more than enough.

There is an additional benefit: few people request slices (because they don't know the option exists), so the scheduler can often fit your job right away. That lets you iterate on your models quickly, reducing the time to good results.
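To make this concrete, here is a sketch of what requesting a MIG slice can look like on a Slurm cluster. The gres string (`gpu:1g.10gb:1`), the time limit, and the script path are placeholders – the exact profile and partition names depend on how your cluster is configured, so check with your admins first.

```shell
#!/bin/bash
# Hypothetical Slurm job script: request one 1g.10gb MIG slice
# instead of a full GPU. Adjust --gres to your cluster's naming.
#SBATCH --job-name=mlp-train
#SBATCH --gres=gpu:1g.10gb:1   # MIG slice; a full GPU would be --gres=gpu:1
#SBATCH --time=02:00:00
#SBATCH --output=logs/%x-%j.out

python train.py --cfg cfgs/base.yaml --seed 42
```

Because only the `--gres` line differs from a full-GPU request, switching back and forth is a one-line change.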

Using it in practice

  • Check availability. Ask your cluster admins or check the scheduler documentation for the MIG profile names (e.g. 1g.10gb, 2g.20gb).
  • Size it right for your workload. Start small. If you hit VRAM OOMs, go up one size. Don't default to a full GPU.
  • Profile memory in advance. Run a few batches and read off the peak VRAM; choose the smallest slice that leaves 10–20% headroom.
  • Template your job scripts. Keep one script for a MIG slice and one for a full GPU; switching is just a flag change.
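The "size it right" and "profile first" steps above can be sketched as a small helper: measure peak VRAM once (e.g. via your framework's memory stats), then pick the smallest slice that still leaves headroom. The profile names and sizes below are the common A100-80GB examples mentioned above, used here as assumptions – your cluster's list may differ.

```python
def pick_mig_slice(peak_vram_gb, profiles, headroom=0.15):
    """Return the name of the smallest MIG profile whose VRAM covers the
    measured peak plus a safety headroom, or None if only a full GPU fits.
    `profiles` maps profile name -> VRAM in GB."""
    required = peak_vram_gb * (1.0 + headroom)
    # Try slices from smallest to largest and take the first that fits.
    for name, vram_gb in sorted(profiles.items(), key=lambda kv: kv[1]):
        if vram_gb >= required:
            return name
    return None  # nothing fits: fall back to a full GPU


# Example A100-80GB profiles (check `nvidia-smi mig -lgip` on your cluster).
A100_80GB = {"1g.10gb": 10, "2g.20gb": 20, "3g.40gb": 40, "7g.80gb": 80}

print(pick_mig_slice(6.5, A100_80GB))   # small MLP -> 1g.10gb
print(pick_mig_slice(30.0, A100_80GB))  # mid-size model -> 3g.40gb
```

If even the largest slice can't cover the peak plus headroom, the helper returns None, which is your cue to request the full GPU after all.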

Countering prolonged sitting

Most computer work is done in front of a, well, computer. We've all noticed what long hours there do to posture: we hunch forward, because all the work happens in front of us*.

This is not a good position to hold for long, but it is common these days. For ML people, the computer is our main tool, and we inevitably spend a lot of time with it.

Fortunately, working at a screen doesn't have to mean sitting in a bad posture (no, the two are not inseparable).

I noticed this (again – it seems I keep sliding back into old habits) this month: hours of reading, coding, and meetings pulled my shoulders forward and rounded my upper back. After a few days of working through dense papers, my shoulders reminded me to change something.

The fix, it turned out, was simple and didn't require a gym or a full workout. It just takes changing positions and short movement snacks.

I put together a small routine, sprinkling pieces of it throughout the day (look up any unfamiliar exercises on YouTube; that small investment helps you do your job better and longer – and healthier):

  • Audio-only meetings: get up and walk around. If you have to stay at a desk, switch to a split stance (one foot forward) to open the hips.
  • Whenever a break takes two minutes (e.g. after a coffee, or waiting at the printer):
    • Band pull-aparts or face pulls (10–15 reps)
    • Wall pec stretch (30–45 s per side)
    • Hip flexor / couch stretch (30–45 s per side)
  • Reading blocks: print the paper or read it on a tablet while standing; alternate sitting with standing blocks.

Additionally, I found that a short morning session made my shoulders feel better and improved focus throughout the day – a welcome bonus, because ML work requires focus:

  • 5 min easy cardio (walking or stair climbing)
  • 5 min mobility (thoracic rotations, shoulder circles, deep squat hold)
  • 5 min light strength (lunges, push-ups against the desk, band rows)

* An objection that often comes up: isn't all work done in front of us? Or, put differently: what work requires us to work with our hands behind us? Fair enough – with few exceptions, almost all activities happen in front of us. After all, that's where our eyes are! But non-computer activities bring different movements into the day: grabbing something from a shelf, pulling something, and so on. That variety counts.
