# PyTorch NaNs Kill Silently – So I Built a 3 ms Hook to Catch Them at the Exact Layer

- NaNs don't surface where they originate – they propagate silently through every downstream layer
- `torch.autograd.set_detect_anomaly` is extremely slow and often points at a symptom rather than the real bug
- A forward-hook-based detector can catch NaNs at the exact layer and batch where they first appear
- Overhead is ~3–4 ms per forward pass, far below anomaly detection (especially on GPU)
- Gradient explosion is the real cause in most cases – catching it early prevents the NaNs entirely
- The detector logs structured events (layer, batch, stats) for post-hoc debugging
- Designed for production: thread-safe, memory-bounded, and scalable
It happened at batch 47,000. A ResNet variant I had been training for six hours on a custom medical imaging dataset. The loss had been falling cleanly – 1.4, 1.1, 0.87, 0.73 – and then, nothing. No error. No crash. Just `nan`.

I added `torch.autograd.set_detect_anomaly(True)` and restarted. Training slowed to a crawl – roughly 7–10× longer per batch on CPU alone – and after three hours I finally had a stack trace pointing, confidently, at a layer that looked perfectly fine. The real culprit turned out to be a learning-rate scheduler interacting badly with a custom normalization layer two layers upstream. `set_detect_anomaly` had pointed me at the symptom, not the source.

That debugging session cost me most of a day. So I built something better.

NaNs don't crash your model — they corrupt it silently. By the time you notice, you are already debugging the wrong layer.
The full code:
## The Problem with set_detect_anomaly

PyTorch ships with `torch.autograd.set_detect_anomaly(True)`, the standard debugging recommendation for NaN issues. It works by retaining the full computation graph and running anomaly checks during the backward pass. That is powerful, but it carries costs serious enough to make it unsuitable for anything beyond a local sanity check.

The core problem is that it forces PyTorch's autograd engine into a synchronous mode in which it stores intermediate results for every single operation. On GPU this means breaking the asynchronous kernel-execution pipeline – every kernel launch must complete before the next one can begin. The result, as reported in the PyTorch documentation and widely observed in practice, is overhead ranging from roughly 10–15× on CPU to 50–100× on GPU for large models [1][2].

There is a second problem: `set_detect_anomaly` points at where the NaN propagated in the backward pass, not where it originated. If a NaN enters your network at layer 3 of a 50-layer model, the backward pass will raise the error somewhere in a later layer's gradient computation, and you are left working backwards from there.
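The forward-side half of this is easy to see in a toy model. A quick sketch (not from the repo) that injects a single NaN at the input:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(*[nn.Linear(8, 8) for _ in range(5)])

x = torch.randn(4, 8)
x[0, 0] = float("nan")      # corrupt one element of one sample

out = net(x)
# The matmul in the very first layer smears the NaN across all of sample 0's
# features; every later layer passes it along without any error or warning.
print(torch.isnan(out[0]).all())   # sample 0 is entirely NaN at the output
print(torch.isnan(out[1:]).any())  # the other samples are untouched
```

One bad value at layer 1 becomes a fully-NaN row by the output, and the backward pass has the same property in reverse – which is why the anomaly detector fires far from the origin.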
My benchmark, using a small CPU MLP (64→256→256→10), measured:

| Method | Mean latency | Overhead vs baseline |
|---|---|---|
| No detection | ~0.60 ms | baseline |
| NaNDetector (forward hooks) | ~3–4 ms | ~5–6× |
| `set_detect_anomaly` | ~7–8 ms | ~12–13× |
On this small model the absolute difference is modest. At scale – a transformer with hundreds of millions of parameters across multiple GPUs – the gap is the difference between a training run that finishes and one that doesn't.

## The Design Approach: Forward Hooks

PyTorch's `register_forward_hook` API lets you attach a callback to any `nn.Module` that fires every time that module completes a forward pass [3]. The callback receives the module itself, its inputs, and its output. That means you can inspect every tensor flowing through every layer in real time – no impact on the computation graph, no forced synchronization, and no storing of intermediate activations.

The key insight is that you only need to test for NaN, not replay the computation. A check with `torch.isnan()` and `torch.isinf()` on the output tensor is a single CUDA kernel invocation and completes in microseconds.
```python
def hook(module, inputs, output):
    if torch.isnan(output).any():
        print(f"NaN detected in {layer_name}")
```
That is the conceptual core. What follows is the production-hardened version.
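Attaching that conceptual hook across a whole model takes only a few more lines. A minimal sketch – `attach_nan_hooks` and the `nan_events` list are illustrative here, not the repo's API:

```python
import torch
import torch.nn as nn

nan_events = []  # (layer_name, output_shape) records to inspect after the run

def attach_nan_hooks(model: nn.Module):
    """Register a NaN-checking forward hook on every leaf module.

    Returns the hook handles so the hooks can be detached later.
    """
    def make_hook(layer_name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and torch.isnan(output).any():
                nan_events.append((layer_name, tuple(output.shape)))
        return hook
    return [m.register_forward_hook(make_hook(name))
            for name, m in model.named_modules()
            if not list(m.children())]  # leaf modules only; skip containers

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
handles = attach_nan_hooks(model)

x = torch.randn(2, 4)
x[0, 0] = float("nan")
model(x)            # the NaN flows through both layers, so both hooks record it
for h in handles:
    h.remove()      # always detach cleanly when done
```

Keeping the returned handles matters: hooks persist until explicitly removed, and leaking them is an easy way to slow down a long run.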
## The Implementation

The full source is available at:

I'll walk through the four key parts.

### Part 1: The NaNEvent dataclass

When a NaN is detected, you need more than a print statement. You need a structured record you can inspect after the fact, log to disk, or feed into an alerting system.
```python
from dataclasses import dataclass, field

@dataclass
class NaNEvent:
    batch_idx: int
    layer_name: str
    module_type: str
    input_has_nan: bool
    output_has_nan: bool
    input_has_inf: bool
    output_has_inf: bool
    output_shape: tuple
    output_stats: dict = field(default_factory=dict)
    is_backward: bool = False
```
The `output_stats` field captures the min, max, and mean of the finite values in the output tensor at detection time. This is surprisingly useful – an output where 3 values are NaN but the rest are finite tells a very different story from one that is entirely NaN.

The `is_backward` flag records whether the event was caught by a forward hook or a backward hook, which matters for root-cause analysis.
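For illustration, the finite-only statistics described above could be computed like this (`finite_stats` is a hypothetical helper sketch, not necessarily the repo's exact code):

```python
import torch

def finite_stats(t: torch.Tensor) -> dict:
    """min/max/mean over only the finite entries of a tensor."""
    finite = t[torch.isfinite(t)]
    if finite.numel() == 0:
        # No finite values at all – rendered as "n/a" in the demo output.
        return {"min": None, "max": None, "mean": None}
    return {"min": finite.min().item(),
            "max": finite.max().item(),
            "mean": finite.mean().item()}

x = torch.tensor([1.0, float("nan"), 3.0, float("inf")])
stats = finite_stats(x)   # {'min': 1.0, 'max': 3.0, 'mean': 2.0}
```

Masking with `torch.isfinite` before reducing is the important part – a plain `t.min()` would itself return NaN and tell you nothing.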
### Part 2: Thread-safe hook registration

The most important production consideration is thread safety. PyTorch's `DataLoader` spins up worker processes, and hooks can end up firing from background threads. If you set `triggered = True` and `self.event = ev` without locking, you get race conditions in multi-worker setups.
```python
self._lock = threading.Lock()

def _make_fwd_hook(self, layer_name: str):
    def hook(module, inputs, output):
        with self._lock:
            if self.triggered and self.stop_on_first:
                return
            current_batch = self._batch_idx
        # ... tensor checks happen outside the lock
        if out_nan or out_inf:
            self._record_event(...)  # lock re-acquired inside
    return hook
```
The tensor checks run outside the lock because `torch.isnan()` is read-only and thread-safe. Only mutations of shared state are locked.

### Part 3: Bounded memory

A subtle problem on long training runs: if you accumulate per-pass overhead timings in an unbounded list, you will eventually exhaust memory on a run that lasts millions of batches. The fix is a simple cap:
```python
_OVERHEAD_CAP = 1000

with self._lock:
    if len(self._overhead_ms) < self._OVERHEAD_CAP:
        self._overhead_ms.append(elapsed)
```
The same idea applies to `all_events` when `stop_on_first=False` – a `max_events` parameter (default 100) prevents unbounded accumulation during a pathological run.
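An alternative with slightly different semantics is `collections.deque(maxlen=...)`: the cap above keeps the *first* 1000 samples, while a bounded deque keeps the *most recent* 1000, which can be preferable for live monitoring. A sketch:

```python
from collections import deque

overhead_ms = deque(maxlen=1000)  # oldest entries are dropped automatically

for elapsed in range(1500):       # simulate 1500 recorded timings
    overhead_ms.append(elapsed)

len(overhead_ms)    # 1000
overhead_ms[0]      # 500 – only the most recent 1000 samples survive
```

Either way, memory stays O(1) in the number of batches, which is the property that matters on a week-long run.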
### Part 4: The gradient-norm guard

The most common real-world path to NaN is not an operation that directly produces `nan` – it is a learning rate that is too high, causing gradient norms to explode, infs to propagate into the weights, and NaN activations to appear on the next forward pass. By the time your forward hook fires, you are already one step too late.

The `check_grad_norms()` method addresses this by walking every parameter after `loss.backward()` and logging a `GradEvent` for any parameter whose gradient norm exceeds the threshold:
```python
def check_grad_norms(self) -> bool:
    if self.grad_norm_warn is None:
        return False
    found = False
    for name, module in self.model.named_modules():
        for pname, param in module.named_parameters(recurse=False):
            if param.grad is None:
                continue
            norm = param.grad.detach().float().norm().item()
            if not math.isfinite(norm) or norm > self.grad_norm_warn:
                found = True  # log a GradEvent here
    return found
```
In the demo below, this method catches the gradient explosion at batch 1 – a full training step before the NaN ever appears in a forward pass.
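The failure sequence is easy to reproduce in isolation. This toy sketch (not one of the repo's demos) drives a linear model with an absurd learning rate and watches the gradient norm blow up while every forward output is still finite:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e6)  # absurdly high, on purpose
x, y = torch.randn(8, 4), torch.randn(8, 1)

norms = []
for step in range(4):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    # The same quantity check_grad_norms() inspects, collapsed to one number.
    norms.append(torch.cat([p.grad.flatten()
                            for p in model.parameters()]).norm().item())
    opt.step()  # each step scales the weights up by orders of magnitude

# norms grows by orders of magnitude per step, well before any forward
# activation reads nan – exactly the window the gradient-norm guard exploits.
```

A threshold like `grad_norm_warn=50.0` fires on the very first oversized step here, steps before anything downstream turns NaN.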

## Usage

### Basic: context manager
```python
from nan_detector import NaNDetector

with NaNDetector(model) as det:
    for batch_idx, (x, y) in enumerate(loader):
        det.set_batch(batch_idx)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        det.check_grad_norms()
        optimizer.step()
        if det.triggered:
            print(det.event)
            break
```
When the detector fires, `det.event` holds the full `NaNEvent` with layer name, module type, batch index, and output stats.

### Production: a drop-in training loop
```python
from nan_detector import train_with_nan_guard

losses, event = train_with_nan_guard(
    model, loader, criterion, optimizer,
    device="cuda",
    grad_norm_warn=50.0,
)
if event:
    print(f"NaN at batch {event.batch_idx}, layer {event.layer_name}")
```
### Advanced: backward hooks + readable layer names

To catch gradient NaNs directly (not just norm warnings), enable `check_backward=True`. Use an `OrderedDict` when building `Sequential` models so that every log line carries a readable name:
```python
from collections import OrderedDict

model = nn.Sequential(OrderedDict([
    ("fc1", nn.Linear(16, 32)),
    ("relu1", nn.ReLU()),
    ("fc2", nn.Linear(32, 1)),
]))

with NaNDetector(model, check_backward=True, grad_norm_warn=10.0) as det:
    ...
```
Without `OrderedDict`, PyTorch names layers by index (`0.weight`, `2.bias`). With it, you get `fc1.weight` and `fc2.bias` – a small thing that saves real time when debugging deep models.
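The difference is easy to verify directly, using nothing beyond the standard `named_parameters()` API:

```python
import torch.nn as nn
from collections import OrderedDict

indexed = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
named = nn.Sequential(OrderedDict([
    ("fc1", nn.Linear(16, 32)),
    ("relu1", nn.ReLU()),
    ("fc2", nn.Linear(32, 1)),
]))

print([n for n, _ in indexed.named_parameters()])
# ['0.weight', '0.bias', '2.weight', '2.bias']
print([n for n, _ in named.named_parameters()])
# ['fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias']
```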
### Skipping layers

Some layer types are expected to produce edge-case outputs under normal conditions – `nn.Dropout` during eval, certain normalization layers on the very first forward pass before their running statistics are populated. Skip them with:
```python
det = NaNDetector(model, skip_types=(nn.Dropout, nn.BatchNorm1d))
```
## Demo Output

Running the three demos produces the following:
```
────────────────────────────────────────────────────────────
Demo 1: Forward NaN detection + loss curve plot
────────────────────────────────────────────────────────────
[NaNDetector] Attached 5 hooks.
============================================================
NaN/Inf detected! [FORWARD PASS]
  Batch     : 12
  Layer     : layer4
  Type      : Linear
  Flags     : NaN in INPUT, NaN in OUTPUT
  Out shape : (8, 1)
  Out stats : min=n/a (all non-finite) max=n/a (all non-finite) mean=n/a (all non-finite)
============================================================
[NaNDetector] Detached. Avg overhead: 0.109 ms/forward-pass

────────────────────────────────────────────────────────────
Demo 2: Backward / grad-norm detection + grad norm plot
────────────────────────────────────────────────────────────
[NaNDetector] Attached 8 hooks (+ backward).
[GradNorm WARNING] batch=1 layer=fc1.weight norm=inf threshold=10.0
[GradNorm WARNING] batch=1 layer=fc1.bias norm=inf threshold=10.0
[GradNorm WARNING] batch=1 layer=fc2.weight norm=inf threshold=10.0
[GradNorm WARNING] batch=1 layer=fc2.bias norm=4.37e+18 threshold=10.0
Caught at batch 1
```

The hook overhead of 0.109 ms per forward pass in Demo 1 is the real number you can quote. The ~3 ms benchmark figure reflects a larger model with five registered hook callbacks firing on every pass – a realistic production scenario.
## Known Limitations

Forward hooks see activations, not every computation. If a NaN originates inside a custom `torch.autograd.Function`'s `backward()` method, or inside a C++/CUDA extension that doesn't surface as `nn.Module` submodules, a forward hook will not catch it. Use `check_backward=True` for gradient-side coverage, and `grad_norm_warn` for early warning.

Overhead scales with model depth. The benchmark used a 5-layer MLP. A transformer with 200 layers will have 200 callbacks firing per forward pass. The overhead is still sub-millisecond per hook, but it accumulates. Mitigate it with `skip_types` to exclude parameter-free layers such as `ReLU`, `Dropout`, and `LayerNorm` if overhead becomes a concern.

CPU benchmark ratios are noisy. The relative overhead of `NaNDetector` versus the no-detection baseline varied between 5× and 6× across runs in my testing, because sub-millisecond CPU microbenchmarks are sensitive to OS scheduling and cache state. The absolute millisecond numbers are stable. The 50–100× GPU figure is taken from PyTorch documentation and community benchmarks [1][2], not from my own GPU measurements.
## What This Doesn't Change

This is a debugging and monitoring tool, not a substitute for good training hygiene. The standard advice still applies: gradient clipping (`torch.nn.utils.clip_grad_norm_`), careful learning-rate scheduling, input normalization, and sensible weight initialization. NaNDetector tells you where and when the problem happened – not why, and fixing the root cause still takes engineering judgment.
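As a reminder of where clipping slots in – after `backward()`, before `step()` – here is a minimal sketch using the standard API:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

opt.zero_grad()
loss = ((model(x) - y) ** 2).mean()
loss.backward()

# Rescales all gradients in place so their combined norm is at most 1.0.
# The return value is the pre-clip norm – worth logging alongside the loss.
pre_clip_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```

Logging the returned pre-clip norm over time gives much the same early-warning signal as `grad_norm_warn`, for free.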
If you are hitting NaNs in mixed-precision training (fp16/bf16), the most common culprits are loss-scale overflow and layer-norm instability, and those are worth investigating directly before reaching for a debugging hook.
## Benchmark Methodology

All benchmarks ran on CPU (Windows 11, PyTorch 2.x) using a 4-layer MLP with input size 64, two hidden layers of 256, and output size 10. Batch size was 64. Each method ran 30 forward passes. The first pass was included – cold-start effects are real and should be counted. Times were measured with `time.perf_counter()` around the forward call only, excluding data loading and loss computation.
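For reference, the measurement loop amounts to something like the following – a sketch of the methodology, not the repo's `benchmark()` verbatim:

```python
import time
import torch
import torch.nn as nn

def mean_forward_ms(model, x, n_batches=30):
    """Mean wall-clock time per forward pass in ms, cold first pass included."""
    times = []
    with torch.no_grad():
        for _ in range(n_batches):
            t0 = time.perf_counter()
            model(x)
            times.append((time.perf_counter() - t0) * 1000.0)
    return sum(times) / len(times)

# The benchmark MLP: 64 -> 256 -> 256 -> 10, batch size 64.
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                      nn.Linear(256, 256), nn.ReLU(),
                      nn.Linear(256, 10))
x = torch.randn(64, 64)
baseline_ms = mean_forward_ms(model, x)
```

Timing only the forward call keeps data loading and loss computation out of the comparison, which is what makes the detector-vs-anomaly numbers comparable.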
The full benchmark function is included in the source and can be run with `benchmark(n_batches=30, batch_size=64)`.
## References

[1] PyTorch documentation. "Autograd Mechanics – Anomaly Detection." pytorch.org.

[2] PyTorch documentation. `torch.autograd.set_detect_anomaly`. pytorch.org.

[3] PyTorch documentation. `torch.nn.Module.register_forward_hook`. pytorch.org.

[4] PyTorch documentation. `torch.nn.Module.register_full_backward_hook`. pytorch.org.

[5] PyTorch documentation. "Gradient Clipping – `clip_grad_norm_`." pytorch.org.

[6] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., … & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703.

[7] Python Software Foundation. `threading` – Thread-based parallelism. Python 3 documentation.

[8] Python Software Foundation. `dataclasses` – Data Classes. Python 3 documentation.

[9] Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95.
## Disclosure

I built this tool and wrote this article myself. No sponsorship, no affiliation with PyTorch or the PyTorch Foundation, and no financial relationship with any company mentioned in this article. The benchmarks ran on my own hardware and are reproducible with the code in the repository linked above.

All code in this article is original. The tool was written from scratch; no open-source NaN-detection library was used as a base. If you use it in your work, attribution is appreciated but not required – the code is MIT-licensed.

The benchmark comparison against `set_detect_anomaly` is based on my measurements on one specific hardware configuration. Results will vary with model architecture, hardware, and PyTorch version. The 50–100× GPU figure comes from official PyTorch documentation [1][2] and is not my own GPU measurement.

Full source code, including all three demos and the benchmark function:



