
Visual Debugging Tools for Machine Learning Workflows

# Introduction

Training a machine learning model and watching the loss drop feels like progress, until validation accuracy plateaus or the loss starts climbing and you have no idea what caused it. At that point, most people add more logging or start tweaking hyperparameters, hoping something will change. What most practitioners are missing at this stage is real visibility into what is happening inside the model during training. Visual debugging tools can provide exactly that insight.

In this article, we cover three topics: what to visualize during training (gradients, loss, and embeddings), the tools that provide those visualizations (TensorBoard and its main alternatives), and ways to intercept a model's computations using hooks and breakpoints.


# Visualizing Gradients, Loss, and Embeddings

// Loss Curves

When training a model, the loss curve is usually the first thing to check. When both training loss and validation loss decrease and stay close together, training is progressing well. If validation loss starts rising while training loss keeps falling, the model is overfitting. If both curves plateau early, the model is not learning, which points to a problem with the data or the learning rate.

Beyond that, gradient flow also matters. A vanishing gradient problem may show up in practice as loss curves that fall smoothly but very slowly, indicating that gradients are very small by the time they reach the early layers.

The accompanying plot simulates an overfitting pattern. Both losses decrease together for the first ten epochs, then validation loss starts increasing while training loss continues to fall.

The red dotted line marks where the divergence begins: in a real run, that is the point to start investigating regularization or early stopping.
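As a rough illustration of where that red line would fall, the sketch below rebuilds the two simulated curves and looks for the first epoch where validation loss rises while training loss still falls. The formulas are the same simulated ones used in the W&B snippet later in this article; the helper names `simulate_losses` and `first_divergence` are hypothetical and not part of any library.

```python
def simulate_losses(epochs=20):
    """Simulated curves: both fall together, then validation loss drifts upward."""
    train = [1 / (1 + 0.3 * e) for e in range(epochs)]
    val = [t + max(0, 0.04 * (e - 10)) for e, t in enumerate(train)]
    return train, val


def first_divergence(train, val):
    """Return the first epoch where validation loss rises while training loss falls."""
    for e in range(1, len(train)):
        if val[e] > val[e - 1] and train[e] < train[e - 1]:
            return e
    return None  # the curves never diverge


train_loss, val_loss = simulate_losses()
divergence_epoch = first_divergence(train_loss, val_loss)
print(f"Validation loss starts rising at epoch {divergence_epoch}")
```

On real, noisy curves a single uptick is not proof of overfitting, so a smoothed curve or a patience window is usually a safer trigger than this one-step comparison.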

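To check gradient flow directly, the following snippet records the mean gradient magnitude at each layer using backward hooks:

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Small network: Linear -> Tanh -> Linear -> Tanh -> Linear
model = nn.Sequential(nn.Linear(16, 16), nn.Tanh(),
                      nn.Linear(16, 16), nn.Tanh(),
                      nn.Linear(16, 1))

# Maps a layer label to the mean |gradient| arriving at that layer's output
grad_magnitudes = {}

def grad_hook(name):
    def hook(module, grad_input, grad_output):
        # grad_output[0] is the gradient of the loss w.r.t. this layer's output
        grad_magnitudes[name] = grad_output[0].abs().mean().item()
    return hook

for i, layer in enumerate(model):
    label = f"Layer {i} ({layer.__class__.__name__})"
    # register_full_backward_hook replaces the deprecated register_backward_hook
    layer.register_full_backward_hook(grad_hook(label))

# One forward and one backward pass on random data
output = model(torch.randn(32, 16))
output.mean().backward()

# Hooks fire from the output layer back toward the input layer
for name, value in grad_magnitudes.items():
    print(f"{name}: {value:.6f}")

plt.bar(grad_magnitudes.keys(), grad_magnitudes.values())
plt.title("Mean Gradient Magnitude per Layer")
plt.ylabel("Mean |gradient|")
plt.xticks(rotation=15)
plt.tight_layout()
plt.show()
```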

Output:

// Raw Gradient Magnitudes

Layer 4 (Linear): 0.031250
Layer 3 (Tanh): 0.004646
Layer 2 (Linear): 0.004241
Layer 1 (Tanh): 0.002126
Layer 0 (Linear): 0.001631

The chart reads from right to left: Layer 4 is the output layer, and Layer 0 is the first layer. The output layer receives a gradient of 0.031, but by the time it reaches Layer 0, that value has dropped to 0.0016, roughly 20 times smaller.

The red bars on the first three layers show that gradients are already in the danger zone before they reach the start of the network. In real training of a deep model, these early layers may update their weights so slowly that they learn almost nothing.

This is a working example of the vanishing gradient problem: the early layers are silently under-trained, which is invisible without this kind of plot.
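To make the "roughly 20 times smaller" figure concrete, the short sketch below divides the output layer's magnitude by each layer's magnitude, using the values printed above. The helper name `attenuation` is hypothetical and is only meant to show the arithmetic.

```python
# Mean |gradient| per layer, copied from the printed output above
grad_magnitudes = {
    "Layer 4 (Linear)": 0.031250,
    "Layer 3 (Tanh)": 0.004646,
    "Layer 2 (Linear)": 0.004241,
    "Layer 1 (Tanh)": 0.002126,
    "Layer 0 (Linear)": 0.001631,
}


def attenuation(magnitudes, reference):
    """How many times smaller each layer's gradient is than the reference layer's."""
    ref = magnitudes[reference]
    return {name: ref / value for name, value in magnitudes.items()}


ratios = attenuation(grad_magnitudes, "Layer 4 (Linear)")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x smaller than the output layer")
```

For Layer 0 the ratio works out to about 19, which is where the rounded figure of 20 comes from.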

// Gradient Visualization

Plotting gradient magnitudes layer by layer during training gives a direct view of whether gradients are reaching the early parts of the network with meaningful values. In deep models, gradients can vanish as they propagate backward through the layers. Histograms of each layer's gradient values, recorded during training, can expose this pattern and help us identify the problem early.

PyTorch's register_backward_hook function lets us obtain gradient tensors from any layer without changing the training loop. We attach the hook to a module, and it fires during every backward pass, sending the gradient tensors to the specified callback. In recent PyTorch releases this function is deprecated in favor of register_full_backward_hook, which is used the same way.

The histogram figure shows the full distribution of gradient values for each layer after a single backward pass. Each subplot represents one layer, ordered from the first layer to the last.

The code for this can be found here.


What we want in a healthy network is histograms with a roughly similar spread across all layers.

If the earlier layers show a very narrow distribution, like a spike centered on zero, that can be a red flag for vanishing gradients.

The gradients still exist, but they are so small that they carry almost no learning signal. This visualization can help us catch the pattern after the first few batches rather than after a full training run.
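The "spike around zero" check can be approximated numerically. The sketch below is a minimal illustration, not the code behind the original figure: it reads weight gradients from `param.grad` after one backward pass (rather than using hooks), bins them with `torch.histc`, and reports the share of values close to zero. The function name `gradient_summary` and the `near_zero` threshold of 1e-3 are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 16), nn.Tanh(),
                      nn.Linear(16, 16), nn.Tanh(),
                      nn.Linear(16, 1))

# One backward pass so that every parameter has a .grad tensor
model(torch.randn(32, 16)).mean().backward()


def gradient_summary(model, bins=20, near_zero=1e-3):
    """Per-layer histogram counts of weight gradients and the share of near-zero values."""
    summary = {}
    for name, param in model.named_parameters():
        if param.grad is None or not name.endswith("weight"):
            continue
        g = param.grad.detach().flatten()
        limit = g.abs().max().item() or 1.0  # avoid a zero-width histogram range
        counts = torch.histc(g, bins=bins, min=-limit, max=limit)
        frac_near_zero = (g.abs() < near_zero).float().mean().item()
        summary[name] = (counts, frac_near_zero)
    return summary


summary = gradient_summary(model)
for name, (counts, frac) in summary.items():
    print(f"{name}: {frac:.0%} of gradient values have |g| < 1e-3")
```

A layer whose near-zero share is much higher than that of the layers after it is a candidate for closer inspection. A sensible threshold depends on the model and the loss scale.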

// Embeddings

When a model maps inputs to a learned representation, visualizing those representations tells us whether the model is separating the data the way we would expect. The most common approach is to take embeddings from a trained (or partially trained) model, reduce their dimensionality using t-SNE or UMAP, and plot them with class labels as colors.

If the classes form tight, well-separated clusters, the model has learned a useful separation. Overlapping classes mean the model has not yet distinguished those concepts. This step is useful for debugging models trained on text or images before adding a final classification layer.
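A minimal sketch of the dimensionality-reduction step is shown below, assuming scikit-learn is installed. The embeddings here are synthetic stand-ins (three hypothetical classes in 32 dimensions), not output from a real model, and the perplexity value is an arbitrary choice that simply has to be smaller than the number of samples.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical stand-in for learned embeddings: 3 classes, 20 points each, 32 dimensions
labels = np.repeat(np.arange(3), 20)
centers = rng.normal(scale=5.0, size=(3, 32))
embeddings = centers[labels] + rng.normal(size=(60, 32))

# Reduce to 2 dimensions; perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
coords = tsne.fit_transform(embeddings)

print(coords.shape)  # (60, 2)
```

The resulting coordinates can then be passed to a scatter plot, for example `plt.scatter(coords[:, 0], coords[:, 1], c=labels)` with matplotlib, so that each class gets its own color.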

# TensorBoard and Its Alternatives

// TensorBoard

TensorBoard is the usual starting point. Originally built for TensorFlow, it works with PyTorch through torch.utils.tensorboard. Data is logged through a SummaryWriter object, and you can view the results in a browser tab. It handles scalars (loss, accuracy), histograms (weight and gradient distributions), images, and an embedding projector for visualizing high-dimensional representations.

The main limitation is that it is local. Sharing your results with a team means setting up shared storage for the log files or relying on a hosted service. TensorBoard.dev used to fill that role with a limited feature set, but it has since been shut down.
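The SummaryWriter workflow described above might look like the following sketch, assuming both `torch` and the `tensorboard` package are installed. The loss values are simulated, the gradient histogram is random stand-in data, and the tag names and temporary log directory are arbitrary choices for illustration.

```python
import tempfile

import torch
from torch.utils.tensorboard import SummaryWriter

# Hypothetical log location; in a real project this is usually something like runs/<name>
log_dir = tempfile.mkdtemp(prefix="tb_demo_")
writer = SummaryWriter(log_dir=log_dir)

for epoch in range(20):
    train_loss = 1 / (1 + 0.3 * epoch)                    # simulated
    val_loss = train_loss + max(0, 0.04 * (epoch - 10))   # simulated
    writer.add_scalar("loss/train", train_loss, epoch)
    writer.add_scalar("loss/val", val_loss, epoch)
    # Stand-in for one layer's gradient values at this epoch
    writer.add_histogram("grad/layer0", torch.randn(256) * 0.01, epoch)

writer.close()
print(f"View with: tensorboard --logdir {log_dir}")
```

Running the printed `tensorboard --logdir` command starts a local server, and the scalars and histograms appear under their respective tabs in the browser.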

// Weights & Biases

Weights & Biases (W&B) is what many machine learning teams use for collaboration or more detailed tracking.

Setup takes two lines: wandb.init() at the start of a run and wandb.log() inside the training loop. Everything is synced to a cloud dashboard automatically, and runs are grouped by project, which makes comparing experiments straightforward.

Check out the code snippet below:

```python
import wandb

# Start a run; the config dict is stored alongside the logged metrics
wandb.init(project="my-model", config={"lr": 0.001, "epochs": 20, "batch_size": 32})

for epoch in range(wandb.config.epochs):
    train_loss = 1 / (1 + 0.3 * epoch)   # simulated
    val_loss   = train_loss + max(0, 0.04 * (epoch - 10))  # simulated
    wandb.log({"epoch": epoch, "train_loss": train_loss, "val_loss": val_loss})

# Mark the run as finished and flush any remaining data
wandb.finish()
```

Once the run finishes, the logged metrics can be viewed in the W&B dashboard, alongside the configuration that produced them. Comparing two runs with different parameters is a matter of selecting them in the UI, with no manual log diffing required.

W&B also supports hyperparameter sweeps with built-in visualization that shows which parameters affected the outcome most.

System metrics such as GPU utilization and memory usage are also logged automatically. For teams running many experiments in parallel, the shared workspace removes the manual overhead of keeping track of what has been tried.

// Sacred

Sacred takes a different approach. It focuses on reproducibility rather than visualization. We annotate the training script with Sacred's experiment decorator, which records the full configuration, any changes made at runtime, and all logged metrics to a database (usually MongoDB). This way, each run and its exact settings become a permanent record.

For the visual side, Sacred pairs with frontends such as Omniboard or Sacredboard. This adds complexity compared with TensorBoard or W&B, but the strength is auditability: any run from the past can be reproduced exactly as it was configured.

// Guild.ai

Guild.ai works from the command line and does not require you to change the training code. We run the training script through Guild using guild run train.py, which records all logs the script produces along with any output files, linking them to that specific run. Metrics and run comparisons are available through Guild's command-line interface (CLI) or its local UI.

This framework is a good choice when working with existing scripts or third-party code that we would rather not modify. It offers fewer features than W&B, but the setup cost is also lower.
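To show why no code changes are needed, here is a hypothetical `train.py` in plain Python with no Guild imports. As far as Guild's documented defaults go, it treats global variable assignments as flags and picks up `key: value` lines in the output as scalars. Treat both behaviors as assumptions to verify against the Guild version in use. The loss formula is simulated.

```python
# train.py: a hypothetical script that Guild could run unchanged

# Global assignments like these are what Guild is assumed to expose as flags,
# e.g. `guild run train.py lr=0.1 epochs=10`
lr = 0.3
epochs = 5


def simulated_loss(epoch, lr):
    """Placeholder for a real training step."""
    return 1 / (1 + lr * epoch)


for epoch in range(epochs):
    loss = simulated_loss(epoch, lr)
    # Lines in `key: value` form are assumed to be captured as output scalars
    print(f"step: {epoch}")
    print(f"loss: {loss:.4f}")
```

Because the script only uses variables and `print`, it behaves the same way when run directly with `python train.py`, which keeps third-party code untouched.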

# Using Breakpoints and Hooks for Machine Learning Computations

// Forward and Backward Hooks

PyTorch's hook system lets us intercept computations at any point in a model's forward or backward pass. The register_forward_hook function attaches a callback to any layer, and it fires every time the layer processes a batch. The callback captures the layer's input and output, which we can then log, check for NaN values, or plot.

The register_backward_hook function does the same for the backward pass, giving us access to the gradient tensors flowing through each layer (recent PyTorch releases recommend register_full_backward_hook instead). Together, these two hooks cover most of what we would want to inspect during training without modifying the model definition or the training loop.

A practical application is NaN detection. A forward hook that checks tensor.isnan().any() on every layer's output catches numerical instability immediately, before it spreads and corrupts the rest of training.

Here is a minimal working example, using a three-layer model with a hook attached to each layer:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

def nan_hook(layer, input, output):
    # Runs after each layer's forward pass and flags any NaN in its output
    if output.isnan().any():
        print(f"[NaN detected] Layer: {layer.__class__.__name__}")
    else:
        print(f"[Clean] Layer: {layer.__class__.__name__}, output shape: {tuple(output.shape)}")

# Attach the same hook to every layer in the model
for layer in model:
    layer.register_forward_hook(nan_hook)

print("--- Normal input ---")
model(torch.randn(2, 8))

print("\n--- Corrupted input ---")
bad_input = torch.randn(2, 8)
bad_input[0, 3] = float('nan')  # inject a single NaN into the batch
model(bad_input)
```

Expected output when run:

--- Normal input ---

[Clean] Layer: Linear, output shape: (2, 16)
[Clean] Layer: ReLU, output shape: (2, 16)
[Clean] Layer: Linear, output shape: (2, 4)

--- Corrupted input ---

[NaN detected] Layer: Linear
[NaN detected] Layer: ReLU
[NaN detected] Layer: Linear

In this example, the hook inspects the output tensor after each layer fires and reports whether it is clean or corrupted.

Running it twice, once with normal input and once with a single injected NaN, shows how instability propagates through the network, layer by layer.
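In a real training loop it is often more useful to stop at the first bad layer than to print a line for every layer. The sketch below is one possible variation, not part of the original example: the exception class `NaNDetected` and the helper `attach_nan_guards` are hypothetical names. It relies on the handle returned by `register_forward_hook`, whose `remove()` method detaches the hook again.

```python
import torch
import torch.nn as nn


class NaNDetected(RuntimeError):
    """Hypothetical exception type used to stop a run at the first bad layer."""


def attach_nan_guards(model):
    """Register a forward hook on every leaf module and return the hook handles."""
    handles = []
    for name, module in model.named_modules():
        if len(list(module.children())) > 0:
            continue  # skip containers such as nn.Sequential

        def guard(mod, inputs, output, name=name):
            if isinstance(output, torch.Tensor) and torch.isnan(output).any():
                raise NaNDetected(f"NaN in output of layer '{name}' ({mod.__class__.__name__})")

        handles.append(module.register_forward_hook(guard))
    return handles


model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
handles = attach_nan_guards(model)

bad_input = torch.randn(2, 8)
bad_input[0, 3] = float("nan")

try:
    model(bad_input)
except NaNDetected as err:
    message = str(err)
    print(message)  # NaN in output of layer '0' (Linear)

# Detach the guards once debugging is done so they add no overhead
for handle in handles:
    handle.remove()
```

Raising at the first offending layer pinpoints where the NaN first appears, and removing the handles afterwards restores the model's normal forward pass.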

// Debugger Breakpoints

Standard Python debuggers work well inside training loops.

Dropping import pdb; pdb.set_trace() at any point pauses execution and brings up an interactive prompt that lets us inspect tensor shapes, verify that data preprocessing has not produced unexpected values, and manually step through the forward pass.

Most machine learning development environments, including VS Code and PyCharm, let us set breakpoints graphically and inspect tensors in a dedicated pane, offering a faster alternative to terminal-based pdb.

That said, breakpoints are most valuable during the first batch or two, when we are confirming that the data, model, and loss function work correctly before starting a full training run.
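One way to limit breakpoints to those first batches is to wrap the call in a small guard. The sketch below is an illustration using only the standard library. The helper names `has_bad_values` and `maybe_break` and the `max_debug_steps` parameter are hypothetical, and plain lists of floats stand in for real tensors.

```python
import math
import pdb


def has_bad_values(values):
    """True if any value is NaN or infinite."""
    return any(math.isnan(v) or math.isinf(v) for v in values)


def maybe_break(step, values, max_debug_steps=2, debugger=pdb.set_trace):
    """Pause only during the first few steps, and only when the batch looks wrong."""
    if step < max_debug_steps and has_bad_values(values):
        debugger()  # opens an interactive pdb prompt by default
        return True
    return False


# Clean batches never trigger the debugger, so this loop runs unattended
batches = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
for step, batch in enumerate(batches):
    maybe_break(step, batch)
```

When the prompt does open, it starts inside `maybe_break`. The pdb command `up` moves to the calling frame, where the batch and model variables live.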

# Conclusion

Training a model without seeing what happens inside means interpreting symptoms rather than actual causes.


When you train a model, whether the loss curve plateaus early, the gradients vanish, or the embeddings fail to separate, none of these problems announces itself clearly without the right instrumentation.

The tools covered in this article work at different levels. Loss curves and gradient histograms provide continuous feedback during training, catching problems such as overfitting or vanishing gradients before they compound and derail your run.

Embedding visualizations reveal whether the model is learning a meaningful separation of the data. TensorBoard, W&B, Sacred, and Guild.ai each handle the logging and tracking side differently, but they all serve the same purpose: making your experiment history searchable and comparable rather than scattered. Finally, hooks and debuggers go one step further and let you pause and inspect the actual tensors flowing through the network at any layer.

That said, these tools cannot fix a broken model on their own. What they do is shorten the distance between something going wrong and understanding why, which is usually most of the work.

Nate Rosidi is a data scientist who also works in product strategy. He is an adjunct professor teaching analytics and the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real interview questions from top companies. Nate writes about the latest trends in the job market, gives interview advice, shares data science projects, and covers everything SQL.
