NVIDIA Releases Nemotron 3.5 ASR: A 600M-Parameter Cache-Aware Streaming Model That Transcribes 40 Locales in Real Time

NVIDIA's Nemotron Speech team has released Nemotron 3.5 ASR, a 600M-parameter streaming Automatic Speech Recognition (ASR) model. A single checkpoint transcribes 40 locales in real time, and punctuation and capitalization are produced natively. The model ships as open weights on Hugging Face under the OpenMDW-1.1 license. The architecture is Cache-Aware FastConformer-RNNT.
What Is Nemotron 3.5 ASR
Nemotron 3.5 ASR extends nvidia/nemotron-speech-streaming-en-0.6b to many languages. It adds prompt-based language-ID conditioning to the base model. That lets a single 600M-parameter checkpoint cover 40 locales. No per-language models or model switching are required.
The model targets two workloads. The first is low-latency streaming of live audio. The second is high-throughput batch transcription. The output is production-ready text with proper casing and punctuation, so no separate punctuation-restoration step is needed.

How the Cache-Aware FastConformer-RNNT Works
The model has two main components. The first is a 24-layer Cache-Aware FastConformer encoder. FastConformer is an efficient variant of the Conformer architecture that uses linearly scalable attention. The second component is an RNNT (Recurrent Neural Network Transducer) decoder. The RNNT emits text frame by frame as audio streams in.
The "cache-aware" design is a practical efficiency measure. Conventional buffered streaming repeatedly reprocesses overlapping audio windows, which repeats the same work and adds latency. This model instead keeps a cache of the encoder's self-attention and convolution activations and reuses those cached states as new audio arrives. Each audio frame is therefore processed exactly once, with no overlap. Both compute latency and end-to-end latency drop, with no accuracy penalty.
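The caching idea can be illustrated on the simplest cached component, a causal convolution. The toy sketch below keeps the last len(kernel) - 1 inputs between chunks, so each new frame is computed once and the streamed output matches an offline pass. It covers only the convolution half of the idea (the real encoder also caches self-attention state), and all names are hypothetical, not NeMo's API. The buffered-streaming counter assumes, purely for illustration, that every 4-frame chunk is re-encoded together with 8 frames of left context.

```python
from collections import deque


def causal_conv(x, kernel):
    """Offline reference: y[t] = sum_k kernel[k] * x[t - k], zeros before frame 0."""
    return [
        sum(kernel[k] * (x[t - k] if t - k >= 0 else 0.0) for k in range(len(kernel)))
        for t in range(len(x))
    ]


class CachedCausalConv:
    """Streaming causal conv that caches its last len(kernel) - 1 inputs between chunks."""

    def __init__(self, kernel):
        self.kernel = list(kernel)
        # A zero-filled cache reproduces the offline left zero-padding.
        self.cache = deque([0.0] * (len(self.kernel) - 1), maxlen=len(self.kernel) - 1)
        self.frames_computed = 0

    def step(self, chunk):
        out = []
        for frame in chunk:
            # Newest input first: x[t], x[t-1], ..., x[t-K+1]
            window = [frame] + list(reversed(self.cache))
            out.append(sum(k * v for k, v in zip(self.kernel, window)))
            self.cache.append(frame)  # the oldest cached frame drops off automatically
            self.frames_computed += 1
        return out


def buffered_frames_computed(n_frames, chunk, left_context):
    """Frames encoded when each chunk is re-encoded with its overlapping left context."""
    total = 0
    for start in range(0, n_frames, chunk):
        end = min(start + chunk, n_frames)
        total += end - max(0, start - left_context)
    return total


x = [float(i % 7) for i in range(20)]
conv = CachedCausalConv([0.5, 0.3, 0.2])
streamed = []
for start in range(0, len(x), 4):  # feed the signal in 4-frame chunks
    streamed += conv.step(x[start:start + 4])

print(conv.frames_computed, buffered_frames_computed(20, 4, 8))  # 20 48
```

In this toy setup the cached version touches each of the 20 frames once, while the buffered scheme encodes 48 frames for the same output; the ratio in a real deployment depends on chunk and context sizes.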
The Latency Knob: att_context_size
A single inference-time setting, the attention context size att_context_size, controls the latency-accuracy trade-off. A small context emits text sooner but sees less future audio. A larger context raises accuracy at the cost of higher latency.
The same checkpoint covers the full range. Settings map to chunk sizes of 80ms, 160ms, 320ms, 560ms, and 1.12s. For example, [56,0] gives the ultra-low-latency 80ms mode, while [56,13] gives 1.12s with the highest accuracy. Teams pick an operating point at inference time, without retraining.
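The two examples above ([56,0] for 80ms and [56,13] for 1.12s) both fit the pattern chunk = (right + 1) × 80ms, consistent with one encoder frame covering 80ms (assumed here: a 10ms feature hop and FastConformer's 8x subsampling). The sketch below applies that inferred mapping; the intermediate settings it prints are derived from the pattern, not quoted from NVIDIA, so confirm them against the model card before use.

```python
FRAME_MS = 80  # assumption: 10 ms feature hop x 8x FastConformer subsampling


def chunk_latency_ms(att_context_size):
    """Chunk size implied by a [left, right] attention context in encoder frames."""
    _left, right = att_context_size
    return (right + 1) * FRAME_MS


def context_for_latency(latency_ms, left=56):
    """Inverse mapping; left=56 copies the article's examples."""
    if latency_ms <= 0 or latency_ms % FRAME_MS:
        raise ValueError(f"{latency_ms} ms is not a positive multiple of {FRAME_MS} ms")
    return [left, latency_ms // FRAME_MS - 1]


for ms in (80, 160, 320, 560, 1120):
    print(ms, context_for_latency(ms))
# 80 [56, 0]
# 160 [56, 1]
# 320 [56, 3]
# 560 [56, 6]
# 1120 [56, 13]
```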
Language Detection and Conditioning
The 40 locales include variants of English, Spanish, German, and French. They also cover Arabic, Japanese, Korean, Mandarin, Hindi, and Thai, along with several other European and Nordic languages.
Language conditioning works in two modes. Setting target_lang to a known locale usually gives the best accuracy. Setting target_lang=auto lets the model detect the language itself; in this mode it emits a language tag after terminal punctuation. A single deployment can therefore transcribe mixed-language traffic, with no separate language-ID component.
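Downstream code in auto mode typically needs to split a transcript into per-language sentences. The sketch below does that with a regular expression, but the tag syntax it matches ("<xx-XX>" right after ., !, ? or their CJK equivalents) is a hypothetical placeholder: the article only says a language tag follows terminal punctuation, so adapt the pattern to the model's actual output.

```python
import re

# HYPOTHETICAL tag syntax: "<xx-XX>" after terminal punctuation is a placeholder,
# not the model's documented output format.
TAG_AFTER_PUNCT = re.compile(r"([.!?。！？])\s*<([A-Za-z]{2,3}(?:-[A-Za-z]{2,4})?)>")


def split_by_language(transcript):
    """Split an auto-mode transcript into (language_tag, sentence) pairs."""
    segments = []
    pos = 0
    for m in TAG_AFTER_PUNCT.finditer(transcript):
        # The sentence runs from the end of the previous tag through the punctuation.
        segments.append((m.group(2), transcript[pos:m.end(1)].strip()))
        pos = m.end()
    tail = transcript[pos:].strip()
    if tail:
        # Trailing text whose sentence (and tag) has not been finalized yet.
        segments.append((None, tail))
    return segments


print(split_by_language("Hello there. <en-US> Hola amigos. <es-ES> Bonjour"))
# [('en-US', 'Hello there.'), ('es-ES', 'Hola amigos.'), (None, 'Bonjour')]
```

Keeping the untagged tail separate lets a streaming client hold back per-language post-processing until a sentence is closed and tagged.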
Comparison
| Product | Company | Access | Streaming | Language coverage | Reported latency | Pricing model |
|---|---|---|---|---|---|---|
| Nemotron 3.5 ASR | NVIDIA | Open weights (OpenMDW-1.1), self-hosted; hosted by DeepInfra | Yes (cache-aware FastConformer-RNNT) | 40 locales | 80ms–1.12s, configurable at inference | Free to self-host; usage-based via host |
| Whisper large-v3 | OpenAI | Open weights (MIT), self-hosted; API | No (offline/batch) | ~99 languages | Not streaming-native | Free to self-host; API ~$0.006/min (batch) |
| Nova-3 | Deepgram | Closed API; on-premise/self-hosted (enterprise) | Yes (streaming + batch) | Multilingual; +10 monolingual languages added Jan 2026 | Low streaming latency (sub-300ms reported) | ~$0.0077/min (Nova-3 Monolingual, PAYG) |
| Universal-3 Pro Streaming | AssemblyAI | Closed API (EU endpoint available) | Yes | 6 languages: English, Spanish, French, German, Italian, Portuguese | Sub-300ms (official); first partial ~750ms | Usage-based (PAYG) |
| Scribe v2 Realtime | ElevenLabs | Closed API | Yes | 90+ languages (99 per ElevenLabs) | ~150ms (p50) | ~$0.28/hour |
| Ursa / streaming | Speechmatics | API + on-premise + edge | Yes (streaming + batch) | 50+ languages with auto-detection | Very low latency (vendor-stated) | Enterprise/usage-based |
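The table mixes per-minute and per-hour prices. The snippet below normalizes the three rows that quote a unit price to dollars per audio hour. It uses only the approximate list prices shown above, which vendors change often, and it ignores GPU and hosting costs for self-hosted options.

```python
# Approximate list prices as quoted in the comparison table (subject to change).
PRICES = {
    "Whisper (OpenAI API, batch)": (0.006, "minute"),
    "Deepgram Nova-3 Monolingual (PAYG)": (0.0077, "minute"),
    "ElevenLabs Scribe v2 Realtime": (0.28, "hour"),
}


def per_hour(price, unit):
    """Convert a per-minute or per-hour price to dollars per audio hour."""
    if unit == "minute":
        return price * 60
    if unit == "hour":
        return price
    raise ValueError(f"unknown unit: {unit}")


for name, (price, unit) in PRICES.items():
    print(f"{name}: ~${per_hour(price, unit):.2f}/hour")
# Whisper (OpenAI API, batch): ~$0.36/hour
# Deepgram Nova-3 Monolingual (PAYG): ~$0.46/hour
# ElevenLabs Scribe v2 Realtime: ~$0.28/hour
```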
Fine-Tuning Results
Because the weights are open, teams can fine-tune for a language, domain, or accent. NVIDIA published a worked example for Greek and Bulgarian, fine-tuning the base checkpoint with the same Cache-Aware FastConformer-RNNT recipe. Each clip carries a target_lang tag for language conditioning. Training data came from public corpora, including Granary, Common Voice, and FLEURS.
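One way to picture the per-clip target_lang tag is as an extra field in a JSON-lines training manifest. The sketch below uses the audio_filepath, duration and text keys of the usual NeMo manifest convention; the target_lang key name, the locale codes (el-GR, bg-BG), the file paths and the transcripts are all hypothetical placeholders, so check NVIDIA's published fine-tuning recipe for the real format.

```python
import json


def manifest_line(audio_filepath, duration, text, target_lang):
    """Build one JSON-lines manifest entry for a fine-tuning clip.

    ASSUMPTION: the "target_lang" key mirrors the article's description;
    the official recipe may use a different field name or locale codes.
    """
    entry = {
        "audio_filepath": audio_filepath,
        "duration": round(duration, 3),  # seconds
        "text": text,
        "target_lang": target_lang,
    }
    # ensure_ascii=False keeps Greek and Cyrillic readable in the manifest file.
    return json.dumps(entry, ensure_ascii=False)


# Hypothetical clips; paths, durations and transcripts are placeholders.
clips = [
    ("clips/el_0001.wav", 3.2, "Καλημέρα σας.", "el-GR"),
    ("clips/bg_0001.wav", 2.75, "Добро утро.", "bg-BG"),
]
manifest = "\n".join(manifest_line(*clip) for clip in clips)
print(manifest)
```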
Results were measured as WER on held-out FLEURS data at the 80ms setting. Greek WER fell from 35 to 24, a 32% relative improvement. Bulgarian fell from 22 to 15, a 31% relative improvement. These WER percentages are for the ultra-low-latency streaming mode. NVIDIA notes that evaluating at the deployment latency, on held-out data, gives the most reliable numbers.
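Relative improvement here means the drop in WER divided by the starting WER. A quick check with the rounded values quoted above gives about 31.4% for Greek and 31.8% for Bulgarian; the reported 32% and 31% were presumably computed from unrounded WERs, so the last digit is best read as approximate.

```python
def relative_reduction(before, after):
    """Relative WER reduction in percent: (before - after) / before * 100."""
    return (before - after) / before * 100


# Rounded WER values as quoted in the article (80ms setting, held-out FLEURS).
for lang, before, after in [("Greek", 35, 24), ("Bulgarian", 22, 15)]:
    print(f"{lang}: {before} -> {after} WER, {relative_reduction(before, after):.1f}% relative")
# Greek: 35 -> 24 WER, 31.4% relative
# Bulgarian: 22 -> 15 WER, 31.8% relative
```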
Strengths and Considerations
Strengths:
- A single 600M-parameter checkpoint covers 40 locales, cutting deployment sprawl.
- Cache-aware streaming processes each frame once, with a reported 17x higher concurrency than buffered streaming on an H100.
- att_context_size tunes latency from 80ms to 1.12s at inference time, without retraining.
- Punctuation, capitalization, and auto language tagging are built in.
- Open weights enabled 31–32% WER reductions on Greek and Bulgarian after fine-tuning.
Considerations:
- The model handles English, but NVIDIA recommends its dedicated English model for English-only workloads.
- The 80ms mode trades some accuracy for the lowest latency.
- Japanese and Korean are scored with CER rather than WER, so comparing error rates across languages needs care.
- Throughput figures were measured on an H100, so results on other GPUs will differ.
- A NIM deployment with gRPC streaming has been announced but not yet released.
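The CER caveat matters because Japanese is written without spaces between words, so whitespace-based WER treats a whole sentence as a single "word" and one wrong character makes the sentence 100% wrong. The sketch below computes both metrics with a plain Levenshtein distance. The example sentence and the choice to strip spaces before computing CER are illustrative assumptions, not NVIDIA's scoring setup; production scorers also apply text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Word error rate over whitespace-separated tokens."""
    words = ref.split()
    if not words:
        raise ValueError("reference is empty")
    return edit_distance(words, hyp.split()) / len(words)


def cer(ref, hyp):
    """Character error rate, ignoring spaces (an illustrative normalization choice)."""
    ref_chars = list(ref.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference is empty")
    return edit_distance(ref_chars, list(hyp.replace(" ", ""))) / len(ref_chars)


ref, hyp = "今日は晴れです", "今日は雨です"  # one word mis-recognized
print(f"WER={wer(ref, hyp):.2f}  CER={cer(ref, hyp):.2f}")  # WER=1.00  CER=0.29
```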
Key Takeaways
- NVIDIA's Nemotron 3.5 ASR is an open-weights (OpenMDW-1.1), 600M-parameter streaming model that transcribes 40 regional locales from a single checkpoint.
- Its Cache-Aware FastConformer-RNNT design processes each audio frame once, with a reported 17x more concurrent streams than buffered approaches on an H100.
- Latency is tunable from 80ms to 1.12s at inference time via att_context_size, without retraining.
- Fine-tuning cut FLEURS WER by 32% for Greek (35→24) and 31% for Bulgarian (22→15), at the 80ms setting.
- It ships as open, self-hostable weights, unlike the closed APIs (Deepgram, AssemblyAI, ElevenLabs), and it is streaming-native, unlike offline Whisper.
Curated for AI developers by Marktechpost.