
NVIDIA Releases Nemotron 3.5 ASR: A 600M-Parameter Cache-Aware Streaming Model That Transcribes 40 Locales in Real Time

The NVIDIA Nemotron Speech team has released Nemotron 3.5 ASR, a 600M-parameter streaming Automatic Speech Recognition (ASR) model. A single checkpoint transcribes 40 locales in real time, with punctuation and capitalization produced natively. The model ships as open weights on Hugging Face under the OpenMDW-1.1 license, and the architecture is Cache-Aware FastConformer-RNNT.

What Is Nemotron 3.5 ASR

Nemotron 3.5 ASR extends nvidia/nemotron-speech-streaming-en-0.6b to multiple languages by adding prompt-based language-ID conditioning to the base model. That lets a single 600M-parameter checkpoint cover 40 locales. No per-language models or model swapping are required.

The model targets two workloads: low-latency streaming of live audio, and high-throughput batch transcription. The output is production-ready text with proper capitalization and punctuation, so no separate punctuation-restoration step is needed.


How Cache-Aware FastConformer-RNNT Works

The model has two main components. The first is a 24-layer Cache-Aware FastConformer encoder; FastConformer is an efficient variant of the Conformer architecture that uses linearly scalable attention. The second is an RNNT (Recurrent Neural Network Transducer) decoder, which emits text frame by frame as audio streams in.
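To make "emits text frame by frame" concrete, here is a minimal greedy RNNT decoding loop. It is a toy sketch, not NeMo's decoder: the `joint` callable, the blank id of 0, the integer "frames", and the `max_symbols_per_frame` cap are illustrative assumptions, and a real transducer conditions on a learned prediction-network state rather than only on the last emitted token.

```python
from typing import Callable, List, Sequence

BLANK = 0  # toy blank-token id (assumption for this sketch)


def rnnt_greedy_decode(
    encoder_frames: Sequence[int],
    joint: Callable[[int, int], List[float]],
    max_symbols_per_frame: int = 5,
) -> List[int]:
    """Greedy transducer decoding: emit tokens until blank, then advance a frame."""
    hyp: List[int] = []
    last_token = BLANK  # the predictor starts from blank
    for frame in encoder_frames:
        # Cap emissions per frame so a misbehaving joint cannot loop forever.
        for _ in range(max_symbols_per_frame):
            scores = joint(frame, last_token)
            token = max(range(len(scores)), key=scores.__getitem__)
            if token == BLANK:
                break  # blank: move on to the next encoder frame
            hyp.append(token)  # non-blank: emit and stay on this frame
            last_token = token
    return hyp


def toy_joint(frame: int, last_token: int) -> List[float]:
    """Hypothetical joint: each 'frame' names the token it wants, emitted once."""
    scores = [0.0] * 6
    scores[frame if frame != last_token else BLANK] = 1.0
    return scores


print(rnnt_greedy_decode([3, 3, 0, 5], toy_joint))  # [3, 5]
```

The key design point is that a blank advances time while a non-blank token stays on the same frame, which is what lets a transducer produce output as soon as each encoder frame arrives.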

The “cache-aware” design is the practical lever. Conventional buffered streaming repeatedly reprocesses overlapping audio windows, which duplicates work and adds latency. This model instead caches the encoder's self-attention and convolution activations and reuses those cached states as new audio arrives. Each audio frame is therefore processed exactly once, with no overlap, and both compute latency and end-to-end latency drop without an accuracy penalty.
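The caching idea can be illustrated on a single causal 1-D convolution. The sketch below is a simplified stand-in, assuming a plain Python kernel rather than the model's actual layers: it keeps the last `k - 1` inputs as a cache, touches each incoming sample once, and produces exactly the same output as running the convolution over the whole signal offline. Self-attention caching follows the same pattern with cached past keys and values; that part is not shown.

```python
from typing import List, Sequence


class StreamingCausalConv1d:
    """Toy causal convolution that carries its left context in a cache."""

    def __init__(self, kernel: Sequence[float]) -> None:
        self.kernel = list(kernel)
        # Zero history at the start of the stream, like left zero-padding.
        self.cache: List[float] = [0.0] * (len(self.kernel) - 1)

    def step(self, chunk: Sequence[float]) -> List[float]:
        k = len(self.kernel)
        x = self.cache + list(chunk)  # cached history + new samples only
        out = [
            sum(self.kernel[j] * x[i + j] for j in range(k))
            for i in range(len(chunk))
        ]
        # Keep the trailing k-1 inputs as history for the next chunk.
        self.cache = x[len(x) - (k - 1):]
        return out


def offline_causal_conv(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Reference: the same causal convolution over the whole signal at once."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(signal)
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(signal))]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [0.5, 0.25, 0.25]
conv = StreamingCausalConv1d(kernel)
streamed: List[float] = []
for start in range(0, len(signal), 3):  # chunks of 3 samples; each sample seen once
    streamed += conv.step(signal[start:start + 3])
print(streamed == offline_causal_conv(signal, kernel))  # True
```

Because the cache supplies exactly the history an overlapping window would have recomputed, the streamed result is identical to the offline one, which is the sense in which caching avoids an accuracy penalty for this kind of layer.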

The Latency Knob: att_context_size

A single inference-time setting controls the latency-accuracy trade-off: the attention context size, att_context_size. A smaller context emits text sooner but sees less future audio; a larger context raises accuracy at the cost of higher latency.

The same checkpoint covers the full range. Settings map to distinct chunk sizes of 80ms, 160ms, 320ms, 560ms, and 1.12s. For example, [56,0] gives an ultra-low-latency 80ms mode, while the [56,13] configuration gives 1.12s with the highest accuracy. Teams choose the operating point at inference time, without retraining.
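The five operating points line up with a simple rule: chunk latency equals (right context + 1) encoder frames of 80ms each. The helper below encodes that inferred rule and picks the largest listed context that fits a latency budget. The 80ms frame duration (10ms features with 8x subsampling), the `SUPPORTED_CONTEXTS` list, and the helper names are assumptions for illustration, and the figures are algorithmic chunk latency only; they exclude compute time.

```python
from typing import List, Tuple

# Assumption: one encoder frame = 80 ms (10 ms feature hop x 8x subsampling).
ENCODER_FRAME_MS = 80

# The five operating points listed for this checkpoint: (left, right) frames.
SUPPORTED_CONTEXTS: List[Tuple[int, int]] = [(56, 0), (56, 1), (56, 3), (56, 6), (56, 13)]


def chunk_latency_ms(att_context_size: Tuple[int, int]) -> int:
    """Inferred rule: the current frame plus `right` look-ahead frames."""
    _left, right = att_context_size  # left context only covers past audio
    return (right + 1) * ENCODER_FRAME_MS


def pick_context(latency_budget_ms: int) -> Tuple[int, int]:
    """Largest listed context whose chunk latency fits the budget."""
    fitting = [c for c in SUPPORTED_CONTEXTS if chunk_latency_ms(c) <= latency_budget_ms]
    if not fitting:
        raise ValueError(f"no listed setting fits {latency_budget_ms} ms")
    return max(fitting, key=chunk_latency_ms)


for ctx in SUPPORTED_CONTEXTS:
    print(list(ctx), chunk_latency_ms(ctx), "ms")
print(pick_context(400))  # (56, 3)
```

Choosing the setting from a latency budget, rather than hard-coding it, mirrors the point that the operating point is picked at inference time on the same checkpoint.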

Language Coverage and Detection

The 40 locales include distinct variants of English, Spanish, German, and French. They also cover Arabic, Japanese, Korean, Mandarin, Hindi, and Thai, along with several other European and Nordic languages.

Language conditioning works in two modes. Setting target_lang to a known locale usually gives the best accuracy. Setting target_lang=auto lets the model detect the language itself; in auto mode, it emits a language tag after terminal punctuation. A single deployment can therefore transcribe mixed-language traffic, with no separate language-ID component.
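A client consuming target_lang=auto output might split the transcript into per-language segments using those tags. The article does not specify the tag syntax, so the `<xx-YY>` pattern below is purely hypothetical; adapt the regular expression to whatever the model actually emits.

```python
import re
from typing import List, Tuple

# HYPOTHETICAL tag syntax: the article only says a language tag follows terminal
# punctuation in auto mode. "<xx-YY>" is an assumed placeholder format.
LANG_TAG = re.compile(r"\s*<([A-Za-z]{2,3}(?:-[A-Za-z]{2})?)>\s*")


def split_by_language(transcript: str) -> List[Tuple[str, str]]:
    """Return (language, text) pairs; trailing untagged text is marked 'pending'."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in LANG_TAG.finditer(transcript):
        text = transcript[pos:match.start()].strip()
        if text:
            segments.append((match.group(1), text))
        pos = match.end()
    tail = transcript[pos:].strip()
    if tail:
        # In a live stream, the final sentence may not have its tag yet.
        segments.append(("pending", tail))
    return segments


print(split_by_language("Good morning. <en-US> Guten Morgen! <de-DE> Bonjour"))
```

Marking the untagged tail as pending reflects that, in streaming, the tag only arrives once the sentence has reached terminal punctuation.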

Comparison

| Product | Company | Access | Streaming | Language coverage | Reported latency | Pricing model |
|---|---|---|---|---|---|---|
| Nemotron 3.5 ASR | NVIDIA | Open weights (OpenMDW-1.1), self-host; hosted on DeepInfra | Yes, cache-aware FastConformer-RNNT | 40 locales | 80ms–1.12s, tunable at inference | Free to self-host; usage-based via host |
| Whisper large-v3 | OpenAI | Open weights (MIT), self-host; API | No, offline/batch | ~99 languages | Not natively streaming | Free to self-host; API ~$0.006/min (batch) |
| Nova-3 | Deepgram | Closed API; on-prem/self-hosted (enterprise) | Yes, streaming + batch | Multilingual; +10 monolingual languages added Jan 2026 | Low-latency streaming (reported sub-300ms) | ~$0.0077/min (Nova-3 Monolingual, PAYG) |
| Universal-3 Pro Streaming | AssemblyAI | Closed API (EU endpoint available) | Yes | 6 languages: English, Spanish, French, German, Italian, Portuguese | Sub-300ms (official); first partial ~750ms | Usage-based (PAYG) |
| Scribe v2 Realtime | ElevenLabs | Closed API | Yes | 90+ languages (99 per ElevenLabs) | ~150ms (p50) | ~$0.28/hour |
| Ursa / streaming | Speechmatics | API + on-premise + edge | Yes, streaming + batch | 50+ languages with auto-detection | Ultra-low latency (qualitative) | Enterprise/usage-based |

Fine-Tuning Results

Because the weights are open, teams can fine-tune for a language, domain, or accent. NVIDIA published a worked example for Greek and Bulgarian, fine-tuning the base checkpoint with the same Cache-Aware FastConformer-RNNT recipe. Each clip carries a target_lang tag for language conditioning. Training data came from public corpora, including Granary, Common Voice, and FLEURS.
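NeMo-style ASR training manifests are JSON Lines files with `audio_filepath`, `duration`, and `text` fields. The sketch below adds a per-clip language field; the key name `target_lang`, the `el-GR` / `bg-BG` locale codes, and the file paths are assumptions, so check NVIDIA's fine-tuning example or the model card for the exact format it expects.

```python
import json
from typing import Dict, Iterable, List

REQUIRED = ("audio_filepath", "duration", "text", "target_lang")


def manifest_lines(clips: Iterable[Dict]) -> List[str]:
    """Serialize clips as NeMo-style JSON Lines, one object per clip."""
    lines = []
    for clip in clips:
        missing = [key for key in REQUIRED if key not in clip]
        if missing:
            raise KeyError(f"clip is missing {missing}")
        entry = {
            "audio_filepath": clip["audio_filepath"],
            "duration": float(clip["duration"]),  # seconds
            "text": clip["text"],
            "target_lang": clip["target_lang"],  # assumed key name for the language tag
        }
        # ensure_ascii=False keeps Greek/Cyrillic text readable in the file.
        lines.append(json.dumps(entry, ensure_ascii=False))
    return lines


clips = [  # hypothetical paths and transcripts
    {"audio_filepath": "data/el/0001.wav", "duration": 3.2, "text": "Καλημέρα σας.", "target_lang": "el-GR"},
    {"audio_filepath": "data/bg/0001.wav", "duration": 2.7, "text": "Добро утро.", "target_lang": "bg-BG"},
]
print("\n".join(manifest_lines(clips)))
```

Validating the language field up front catches untagged clips before training, since the recipe relies on every clip carrying its language.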

Results were measured as WER on held-out FLEURS data at the 80ms setting. Greek WER dropped from 35 to 24, a 32% relative improvement. Bulgarian dropped from 22 to 15, a 31% relative improvement. These are raw WER percentages in the ultra-low-latency streaming mode. NVIDIA notes that evaluating at the deployment latency, on held-out data, gives more reliable numbers.
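The relative figures follow from (baseline − fine-tuned) / baseline. The short sketch below computes that, along with a plain word error rate from word-level edit distance. Using the rounded WERs quoted here, the arithmetic gives about 31.4% for Greek and 31.8% for Bulgarian; the published 32% and 31% presumably come from unrounded WER values.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over tokens (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, with whitespace tokenization."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref, hypothesis.split()) / len(ref)


def relative_reduction(baseline: float, tuned: float) -> float:
    """Relative improvement in percent: (baseline - tuned) / baseline."""
    return 100.0 * (baseline - tuned) / baseline


print(round(wer("the cat sat on the mat", "the cat sit on mat"), 1))  # 33.3
print(round(relative_reduction(35, 24), 1))  # 31.4 (Greek, from rounded WERs)
print(round(relative_reduction(22, 15), 1))  # 31.8 (Bulgarian, from rounded WERs)
```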

Strengths and Considerations

Strengths:

  • A single 600M-parameter checkpoint covers 40 locales, reducing deployment sprawl.
  • Cache-aware streaming processes each frame once, with a reported 17x the concurrency of buffered streaming on an H100.
  • att_context_size tunes latency from 80ms to 1.12s at inference time, without retraining.
  • Punctuation, capitalization, and auto language tagging are built in.
  • Open weights enabled 31–32% WER reductions for Greek and Bulgarian after fine-tuning.

Considerations:

  • The model handles English, but NVIDIA recommends its dedicated English model for English-only use.
  • The 80ms mode trades some accuracy for ultra-low latency.
  • Japanese and Korean are scored with CER, so comparing error rates across languages needs care.
  • Throughput figures are measured on H100, so results on other GPUs will vary.
  • A NIM deployment with gRPC streaming has been announced but not yet released.
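Whitespace-tokenized WER breaks down for text written without spaces between words, which is one reason character error rate (CER) is used instead. The sketch below uses a made-up Japanese sentence pair ("It is sunny today" vs. "It is rainy today"): the same one-word mistake scores 100% WER but about 28.6% CER, so the two metrics cannot be compared directly.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over arbitrary tokens (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h))
        prev = cur
    return prev[-1]


def error_rate(ref_tokens: Sequence[str], hyp_tokens: Sequence[str]) -> float:
    """Token error rate in percent; the token unit decides WER vs CER."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


reference = "今日は晴れです"   # made-up example: "It is sunny today"
hypothesis = "今日は雨です"    # made-up example: "It is rainy today"

# Whitespace WER: the unspaced sentence is a single "word", so any error costs 100%.
word_level = error_rate(reference.split(), hypothesis.split())
# CER: compare character by character instead.
char_level = error_rate(list(reference), list(hypothesis))
print(round(word_level, 1), round(char_level, 1))  # 100.0 28.6
```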

Key Takeaways

  • NVIDIA's Nemotron 3.5 ASR is an open-weights (OpenMDW-1.1), 600M-parameter streaming model that transcribes 40 regional locales from a single checkpoint.
  • Its Cache-Aware FastConformer-RNNT design processes each audio frame once, with a reported 17x the streaming concurrency of buffered approaches on an H100.
  • Latency is tunable from 80ms to 1.12s at inference time via att_context_size, without retraining.
  • Short fine-tunes cut FLEURS WER by 32% for Greek (35→24) and 31% for Bulgarian (22→15), at the 80ms setting.
  • It is self-hostable and natively streaming, unlike closed APIs (Deepgram, AssemblyAI, ElevenLabs) or offline Whisper.

Marktechpost Visual Explainer



NVIDIA · STREAMING SPEECH AI · OPEN WEIGHTS

Nemotron 3.5 ASR

A 600M-parameter cache-aware streaming model that transcribes 40 locales in real time, from a single checkpoint.

600M parameters
40 locales
80ms–1.12s latency
OpenMDW-1.1

01 – WHAT IT IS

One model, 40 locales

  • Extends nvidia/nemotron-speech-streaming-en-0.6b with prompt-based language-ID conditioning.
  • A single 600M-parameter checkpoint covers 40 locales. No model swapping.
  • Punctuation and capitalization are built in. No separate post-processing step.
  • Targets two workloads: low-latency streaming and high-throughput batch.
  • NVIDIA still recommends its English-only model for English-only use.

02 – ARCHITECTURE

Cache-Aware FastConformer-RNNT

  • A 24-layer FastConformer encoder paired with an RNNT decoder.
  • Buffered streaming repeatedly reprocesses overlapping audio windows.
  • This model caches encoder self-attention and convolution states, then reuses them.
  • Each audio frame is processed exactly once, with no overlap.
  • Lower compute and end-to-end latency, without an accuracy penalty.

03 – THE LATENCY KNOB

One setting trades latency against accuracy

| att_context_size | Chunk (latency) | Use case |
|---|---|---|
| [56,0] | 80ms (ultra-low) | Ultra-low-latency voice agents |
| [56,1] | 160ms (low) | Interactive voice agents |
| [56,3] | 320ms (balanced) | Conversational AI, live captions |
| [56,6] | 560ms (medium) | Higher accuracy, reasonable latency |
| [56,13] | 1.12s (high) | Highest accuracy |

Same checkpoint, selected at inference time. No retraining required.

04 – LANGUAGES & DETECTION

Coverage and automatic language ID

  • 40 locales, including distinct variants of English, Spanish, German, and French.
  • Also covers Arabic, Japanese, Korean, Mandarin, Hindi, and Thai.
  • Set target_lang to a known locale for the best accuracy.
  • Set target_lang=auto to let the model detect the language.
  • In auto mode, it emits a language tag after terminal punctuation.
  • A single deployment handles mixed-language traffic, without a separate language-ID component.

05 – EFFICIENCY

Half the size, many more concurrent streams

  • NVIDIA compares it against the multilingual Parakeet RNNT 1.1B, which uses buffered streaming.
  • Nemotron 3.5 ASR is roughly half the size: 0.6B versus 1.1B.
  • The team reports 17x the concurrent streams of buffered approaches on the same H100.
  • Avoiding redundant recomputation lowers the per-stream cost in production.
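A back-of-the-envelope count shows where buffered streaming wastes work. If each step re-encodes a window of `window_s` seconds but only advances by `hop_s` seconds, every frame is encoded roughly `window_s / hop_s` times, while a cache-aware encoder touches each frame once. The 4.0s window, 1.6s hop, and 80ms frame are hypothetical values, and this ratio covers only encoder recomputation; the reported 17x concurrency also depends on model size, batching, and hardware, so the sketch does not attempt to reproduce that figure.

```python
ENCODER_FRAME_MS = 80.0  # assumption: one encoder frame per 80 ms of audio


def buffered_frames_per_second(window_s: float, hop_s: float) -> float:
    """Encoder frames computed per second of audio when each step re-encodes
    a `window_s` window but only advances by `hop_s`."""
    if not 0 < hop_s <= window_s:
        raise ValueError("need 0 < hop_s <= window_s")
    frames_per_window = window_s * 1000.0 / ENCODER_FRAME_MS
    steps_per_second = 1.0 / hop_s
    return frames_per_window * steps_per_second


def cache_aware_frames_per_second() -> float:
    """With cached states, each frame is encoded exactly once."""
    return 1000.0 / ENCODER_FRAME_MS


# Hypothetical buffered setup: 4.0 s window advanced in 1.6 s hops.
buffered = buffered_frames_per_second(window_s=4.0, hop_s=1.6)
cached = cache_aware_frames_per_second()
print(f"{buffered:.2f} vs {cached:.2f} frames per audio second "
      f"({buffered / cached:.2f}x encoder recomputation)")
```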

The 17x figure comes from the release announcement; the model card states the quality claim directly.

06 – FINE-TUNING RESULTS

Short fine-tunes lift weaker languages

| Language | Baseline WER | Fine-tuned WER | Relative reduction |
|---|---|---|---|
| Greek | 35 | 24 | 32% |
| Bulgarian | 22 | 15 | 31% |

Raw WER (%) on held-out FLEURS at the 80ms setting. Data: Granary, Common Voice, FLEURS.

07 – AVAILABILITY & ACCESS

Open weights, plus a hosted route

  • Weights on Hugging Face under the OpenMDW-1.1 license.
  • Runtime is NeMo 26.06 or newer. Input must be single-channel (mono) audio.
  • Hosted on DeepInfra, which adds word boosting for domain vocabulary.
  • NVIDIA says a NIM release is planned for later in the month, with gRPC streaming.
  • Listed GPU support: Ampere, Hopper, Blackwell, Lovelace, Turing, Volta, and Jetson.
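Since the runtime expects single-channel input, multichannel recordings need a downmix first. Here is a minimal standard-library sketch, assuming 16-bit PCM WAV input: it averages the channels and keeps the original sample rate without resampling, so check the model card for the sample rate it expects. The 16 kHz value in the usage lines is only test data.

```python
import io
import sys
import wave
from array import array


def downmix_to_mono(wav_bytes: bytes) -> bytes:
    """Average all channels of a 16-bit PCM WAV into one channel (no resampling)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = src.getnchannels()
        rate = src.getframerate()
        samples = array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Interleaved frames: [L0, R0, L1, R1, ...] -> [(L0+R0)//2, (L1+R1)//2, ...]
    mono = array("h", (
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples), channels)
    ))
    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian for writing
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
    return out.getvalue()


# Usage: build a tiny 2-channel WAV in memory and downmix it.
stereo = io.BytesIO()
with wave.open(stereo, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(array("h", [100, 300, -50, -150]).tobytes())
with wave.open(io.BytesIO(downmix_to_mono(stereo.getvalue())), "rb") as r:
    print(r.getnchannels(), list(array("h", r.readframes(r.getnframes()))))  # 1 [200, -100]
```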

08 – HOW IT COMPARES

Where it sits in the landscape

| Product | Access | Streaming | Languages |
|---|---|---|---|
| Nemotron 3.5 ASR | Open weights | Native | 40 locales |
| Whisper large-v3 | Open weights | No (batch) | ~99 |
| Deepgram Nova-3 | API / on-prem | Native | Multilingual |
| AssemblyAI U-3 Pro | API | Native | 6 |
| ElevenLabs Scribe v2 | API | Native | 90+ |
| Google Chirp / Azure | API | Native | 100+ / 140+ |

Latency and WER are not directly comparable across vendors; this table compares architecture, not rankings.

09 – KEY TAKEAWAYS

The short version

  • An open-weights 600M streaming model that transcribes 40 locales from a single checkpoint.
  • Cache-aware design processes each frame once; reported 17x buffered concurrency on H100.
  • Latency is tunable from 80ms to 1.12s at inference time, without retraining.
  • Short fine-tunes cut FLEURS WER by 32% (Greek) and 31% (Bulgarian).
  • Self-hostable and natively streaming, unlike closed APIs or offline Whisper.

Curated for AI developers by Marktechpost.

