NVIDIA AI Releases Star Elastic: One Checkpoint Containing 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing

Training a family of large language models (LLMs) has always come with painful duplication: every variant in the family, whether 8B, 30B, or 70B, typically needs its own full training run, its own storage, and its own serving stack. For a dev team running inference at scale, this means multiplying compute costs by the number of model sizes they want to support. NVIDIA researchers now propose a different approach called Star Elastic.
Star Elastic is a post-training method that embeds multiple nested submodels, at different parameter budgets, inside a single parent reasoning model using a single training run. Applied to Nemotron Nano v3 (a hybrid Mamba-Transformer-MoE model with 30B total parameters and 3.6B active parameters), Star Elastic produces nested 23B (2.8B active) and 12B (2.0B active) variants trained with approximately 160B tokens. All three variants live in one checkpoint and can be extracted without any additional fine-tuning.
What "Nested" Actually Means Here
If you have not encountered elastic or nested architectures before, the idea is this: instead of training three separate models at 30B, 23B, and 12B, you train one model that contains the smaller ones as subsets of itself. The smaller models reuse the most important weights from the parent, identified through a process called importance estimation.
Star Elastic scores each model component (embedding channels, attention heads, Mamba SSM heads, MoE experts, and FFN channels) by how much it contributes to model accuracy. Components are then ranked and sorted, so smaller-budget models always use the highest-ranked subset of components from the larger model. This property is called nested weight sharing.
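To make the nesting property concrete, here is a minimal sketch, assuming a single list of importance scores and fixed component counts per budget. The function name `nested_subsets`, the scores, and the budget sizes are illustrative assumptions, not values from the paper.

```python
def nested_subsets(importance, budgets):
    """Return, for each budget, the indices of its top-k components.

    Every subset is a prefix of one shared ranking, so a smaller budget
    always selects a subset of what a larger budget selects.
    """
    # Rank component indices from most to least important.
    ranking = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return {name: ranking[:k] for name, k in budgets.items()}


# Hypothetical importance scores for 8 FFN channels (illustrative values only).
scores = [0.9, 0.1, 0.5, 0.7, 0.2, 0.8, 0.3, 0.6]
subsets = nested_subsets(scores, {"large": 8, "medium": 6, "small": 4})

print(subsets["small"])   # [0, 5, 3, 7]
print(set(subsets["small"]) <= set(subsets["medium"]) <= set(subsets["large"]))  # True
```

Because all budgets read from the same ranking, the smaller variants need no weights of their own: they are slices of the parent.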
The method supports nesting along multiple axes: the SSM (State Space Model) dimension, embedding channels, attention heads, Mamba heads and head channels, MoE expert count, and FFN intermediate dimension. For MoE layers specifically, Star Elastic uses Router-Weighted Expert Activation Pruning (REAP), which ranks experts by both their router gate values and their output magnitudes. This is a more principled signal than naive frequency-based pruning, which ignores how much each expert actually contributes to the layer output.
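The contrast between the two expert-ranking signals can be sketched as follows. This is an illustrative approximation, assuming the score for an expert is the mean of gate value times output L2 norm over the tokens routed to it; the exact aggregation used by Star Elastic is not given in this article, and all names and numbers below are hypothetical.

```python
import math


def gate_weighted_scores(gates, outputs):
    """gates[t][e]: router gate for token t, expert e (0.0 if not routed).
    outputs[t][e]: output vector of expert e for token t.
    Returns one score per expert: mean of gate * ||output|| over routed tokens."""
    scores = []
    for e in range(len(gates[0])):
        total, count = 0.0, 0
        for t in range(len(gates)):
            if gates[t][e] > 0.0:  # only tokens actually routed to this expert
                norm = math.sqrt(sum(v * v for v in outputs[t][e]))
                total += gates[t][e] * norm
                count += 1
        scores.append(total / count if count else 0.0)
    return scores


def frequency_scores(gates):
    """Naive baseline: how many tokens were routed to each expert."""
    return [sum(1 for row in gates if row[e] > 0.0) for e in range(len(gates[0]))]


# Toy data: expert 0 is chosen most often but produces tiny outputs.
gates = [[0.6, 0.4, 0.0], [0.7, 0.0, 0.3], [0.5, 0.5, 0.0]]
per_expert_output = [[0.1, 0.0], [3.0, 4.0], [0.3, 0.4]]
outputs = [per_expert_output for _ in gates]

weighted = gate_weighted_scores(gates, outputs)
frequency = frequency_scores(gates)
print(frequency)                                             # [3, 2, 1]
print(max(range(3), key=lambda e: weighted[e]))              # 1
```

In this toy case the frequency baseline would keep expert 0 first, while the gate-weighted score ranks it last because its outputs barely change the layer result.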
A Learnable Router, Not a Fixed Compression Recipe
A key difference from prior compression methods such as Minitron is that Star Elastic uses an end-to-end trainable router to determine the nested submodel architectures. The router takes a target budget (e.g., "give me a 2.8B active parameter model") as a one-hot input and outputs differentiable masks that select which components are active at that budget level. These masks are trained jointly with the model using Gumbel-Softmax, which allows gradients to flow through discrete architecture decisions.
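The forward pass of such a router can be sketched in plain Python. This is a simplified illustration, not the paper's implementation: the logit table, the candidate widths, and the way the relaxed sample is blended into a prefix mask are all assumptions made for the example. No gradients are computed here; in an autodiff framework the same arithmetic would let gradients reach the router logits.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        noisy.append((logit - math.log(-math.log(u))) / tau)
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_prefix_mask(probs, widths, full_width):
    """Channel c receives the total probability of the widths that keep it."""
    return [sum(p for p, w in zip(probs, widths) if c < w) for c in range(full_width)]


# Hypothetical router parameters: one row of logits per budget,
# one column per candidate width (number of channels kept).
BUDGETS = ["30B", "23B", "12B"]
WIDTHS = [8, 6, 4]
ROUTER_LOGITS = [[4.0, 0.0, -4.0], [0.0, 4.0, 0.0], [-4.0, 0.0, 4.0]]


def route(budget, tau, rng):
    one_hot = [1.0 if b == budget else 0.0 for b in BUDGETS]
    # A one-hot input multiplied by the logit table selects that budget's row.
    logits = [sum(h * row[k] for h, row in zip(one_hot, ROUTER_LOGITS))
              for k in range(len(WIDTHS))]
    probs = gumbel_softmax(logits, tau, rng)
    return probs, soft_prefix_mask(probs, WIDTHS, full_width=max(WIDTHS))


probs, mask = route("12B", tau=0.5, rng=random.Random(0))
print([round(m, 3) for m in mask])
```

The mask is non-increasing across channels, so it stays compatible with importance-sorted, nested components; lowering `tau` pushes the sample closer to a hard one-hot choice.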
The loss function combines knowledge distillation (KD), in which the uncompressed parent model acts as the teacher, with a router loss that penalizes deviation from the target resource budget (parameter count, memory, or latency). This means the router learns to make architecture choices that actually improve accuracy under KD, rather than merely minimizing a proxy metric.
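The shape of that combined objective can be sketched as follows. The article does not give the exact formulas, so this example assumes a KL-divergence KD term and a relative-deviation budget penalty joined by a hypothetical `weight`; the function names and numbers are illustrative only.

```python
import math


def kd_loss(teacher_probs, student_probs):
    """KL(teacher || student) over one next-token distribution.
    Assumes every student probability is strictly positive."""
    return sum(p * math.log(p / q)
               for p, q in zip(teacher_probs, student_probs) if p > 0.0)


def budget_penalty(actual_cost, target_cost):
    """Relative deviation from the target budget (both in the same unit)."""
    return abs(actual_cost - target_cost) / target_cost


def total_loss(teacher_probs, student_probs, actual_cost, target_cost, weight=1.0):
    # Assumed combination: distillation term plus weighted budget term.
    return kd_loss(teacher_probs, student_probs) + weight * budget_penalty(actual_cost, target_cost)


teacher = [0.7, 0.2, 0.1]
student = [0.6, 0.3, 0.1]
# Costs in billions of active parameters; the 2.8B target is from the article.
on_budget = total_loss(teacher, student, actual_cost=2.8, target_cost=2.8)
over_budget = total_loss(teacher, student, actual_cost=3.2, target_cost=2.8)
print(on_budget < over_budget)  # True
```

With the same student distribution, the architecture that overshoots the budget pays an extra penalty, which is what steers the router toward masks that hit the requested size.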
Training uses a two-stage curriculum: a short-context stage (sequence length 8,192) with uniform budget sampling, followed by an extended-context stage (sequence length 49,152 tokens) with non-uniform sampling that prioritizes the full 30B model (p(30B)=0.5, p(23B)=0.3, p(12B)=0.2). The extended-context stage is critical for reasoning performance. The research team's ablations on Nano v2, explicitly reproduced as the empirical basis for the same curriculum choice on Nano v3, show gains of up to 19.8% on AIME-2025 for the 6B variant and 4.0 percentage points for the 12B variant from Stage 2 alone, which motivates its use here.
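The per-step budget sampling in the two stages can be illustrated with a short sketch. The sequence lengths and probabilities are the ones quoted above; the `CURRICULUM` structure and `sample_budget` helper are hypothetical names used only for this example.

```python
import random
from collections import Counter

# Stage settings as reported in the article; the dictionary layout is an assumption.
CURRICULUM = [
    {"name": "short_context", "seq_len": 8192,
     "probs": {"30B": 1 / 3, "23B": 1 / 3, "12B": 1 / 3}},
    {"name": "extended_context", "seq_len": 49152,
     "probs": {"30B": 0.5, "23B": 0.3, "12B": 0.2}},
]


def sample_budget(stage, rng):
    """Pick which nested variant is trained on the current step."""
    budgets = list(stage["probs"])
    weights = list(stage["probs"].values())
    return rng.choices(budgets, weights=weights, k=1)[0]


rng = random.Random(0)
stage2_counts = Counter(sample_budget(CURRICULUM[1], rng) for _ in range(10_000))
print(stage2_counts.most_common())
```

Over many steps the extended-context stage spends about half of its updates on the full 30B model, which matches the stated goal of prioritizing the parent.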
Elastic Budget Control: Different Models for Different Reasoning Phases
Existing budget control in reasoning models, including the default behavior of Nemotron Nano v3, works by capping the number of tokens generated during the thinking phase before forcing a final answer. This approach uses the same model throughout. Star Elastic opens up a different strategy: using different nested submodels for the thinking phase versus the answering phase.
The researchers evaluated four configurations. The optimal one, called ℳS → ℳL (small model for thinking, large model for answering), assigns a cheaper model to generate the extended reasoning traces and reserves the full-capacity model for synthesizing the final answer. The 23B → 30B configuration in particular advances the accuracy-latency Pareto frontier, achieving up to 16% higher accuracy and 1.9× lower latency compared with default Nemotron Nano v3 budget control. The intuition: reasoning tokens are high-volume but tolerant of some capacity reduction, while the final answer demands higher precision.
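The ℳS → ℳL control flow can be sketched as a two-call pipeline. Everything here is hypothetical scaffolding: the `generate` callable, the `<think>` and `</think>` markers, and the token budgets are assumptions for illustration, and `fake_generate` is a stub standing in for a real inference backend.

```python
def elastic_answer(prompt, generate, think_variant="23B", answer_variant="30B",
                   think_budget=1024, answer_budget=256):
    """Reason with the smaller nested variant, then answer with the larger one."""
    # Phase 1: the cheaper variant produces the (capped) reasoning trace.
    reasoning = generate(think_variant, prompt + "<think>", think_budget)
    # Phase 2: the full-capacity variant reads the trace and writes the answer.
    context = prompt + "<think>" + reasoning + "</think>"
    answer = generate(answer_variant, context, answer_budget)
    return reasoning, answer


calls = []


def fake_generate(variant, text, max_tokens):
    """Stub backend that records which variant was asked for how many tokens."""
    calls.append((variant, max_tokens))
    return f"[{variant} output]"


reasoning, answer = elastic_answer("What is 17 * 3? ", fake_generate)
print(calls)   # [('23B', 1024), ('30B', 256)]
print(answer)  # [30B output]
```

Most generated tokens fall in the first call, so running that phase on the smaller variant is where the latency saving would come from.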
Quantization Without Breaking the Nested Structure
A naive approach to quantizing an elastic model would be to quantize each variant separately after slicing. That breaks the nested weight-sharing structure and requires a separate quantization pass per size. Instead, Star Elastic applies Quantization-Aware Distillation (QAD) directly to the elastic checkpoint, preserving the nested mask hierarchy throughout.
For FP8 (E4M3 format), post-training quantization (PTQ) is sufficient, recovering 98.69% of BF16 accuracy on the 30B variant. For NVFP4 (NVIDIA's 4-bit floating-point format), PTQ alone causes an average accuracy drop of 4.12%, so a short nested QAD phase (5B tokens at 48K context) brings recovery to 97.79% on the 30B variant. In both cases, zero-shot slicing of the 23B and 12B variants from the single quantized checkpoint is preserved.
The memory implications are significant. Storing separate 12B, 23B, and 30B BF16 checkpoints requires 126.1 GB; the single elastic checkpoint requires 58.9 GB. The 30B NVFP4 elastic checkpoint fits in 18.7 GB, which lets the 12B NVFP4 variant run on an RTX 5080, where every BF16 configuration runs out of memory. On an RTX Pro 6000, the 12B NVFP4 variant reaches 7,426 tokens/s, a 3.4× throughput improvement over the 30B BF16 baseline.
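The storage figures above translate into the following ratios. This is simple arithmetic on the numbers reported in the article; the variable names are illustrative.

```python
separate_bf16_gb = 126.1   # three separate BF16 checkpoints (12B + 23B + 30B)
elastic_bf16_gb = 58.9     # one elastic BF16 checkpoint
elastic_nvfp4_gb = 18.7    # one elastic NVFP4 checkpoint

bf16_ratio = separate_bf16_gb / elastic_bf16_gb
saved_fraction = 1.0 - elastic_bf16_gb / separate_bf16_gb
nvfp4_ratio = elastic_bf16_gb / elastic_nvfp4_gb

print(round(bf16_ratio, 2))            # 2.14
print(round(saved_fraction * 100, 1))  # 53.3
print(round(nvfp4_ratio, 2))           # 3.15
```

In other words, nesting alone roughly halves the storage needed for the three BF16 variants, and NVFP4 shrinks the elastic checkpoint by about a further 3×.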
Depth vs. Width: Why Star Elastic Compresses Width
One design choice is worth calling out explicitly: the research team compared two compression strategies, removing layers entirely (depth compression) versus reducing internal dimensions such as hidden size, expert count, and head count (width compression). At a 15% parameter reduction with 25B tokens of knowledge distillation, width compression recovered 98.1% of baseline performance while depth compression recovered only 95.2%, with noticeable degradation on HumanEval and MMLU-Pro. As a result, Star Elastic prioritizes width-based elasticity for its main results, although depth compression (layer skipping) remains available as an option for extremely latency-constrained scenarios.
Across the evaluation suite (AIME-2025, GPQA, LiveCodeBench v5, MMLU-Pro, IFBench, and Tau Bench), the Elastic-30B variant matches its parent Nemotron Nano v3 30B on most benchmarks, while the Elastic-23B and Elastic-12B variants remain competitive with models of similar size. Elastic-23B notably scores 85.63 on AIME-2025 compared with 80.00 for Qwen3-30B-A3B, despite having fewer active parameters.
On training cost, the research team reports a 360× reduction in tokens compared with pretraining each variant from scratch, and a 7× reduction over prior state-of-the-art compression methods that require sequential distillation runs for each model size. The 12B variant runs at 2.4× the throughput of the 30B parent on an H100 GPU in bfloat16 with the same input/output sequence lengths.
Key Takeaways
- Star Elastic trains nested 30B, 23B, and 12B reasoning models from a single 160B-token post-training run, achieving a 360× token reduction over pretraining each variant from scratch.
- Elastic budget control (23B for thinking, 30B for answering) advances the accuracy-latency Pareto frontier, with up to 16% higher accuracy and 1.9× lower latency.
- A learnable router with Gumbel-Softmax enables end-to-end trainable architecture selection, removing the need for separate compression runs for each model size.
- Nested QAD preserves zero-shot slicing across the quantized FP8 and NVFP4 checkpoints, bringing the 30B checkpoint down to 18.7 GB in NVFP4.
- All three precision variants (BF16, FP8, NVFP4) are publicly available on Hugging Face under nvidia/NVIDIA-Nemotron-Labs-3-Elastic-30B-A3B.



