AlpamayoR1: Large Causal Reasoning Models for Autonomous Driving

Nvidia took the world of autonomous driving by storm with its new AlpamayoR1 architecture, which integrates a large Vision-Language Model as a causally grounded reasoning backbone. This release, accompanied by a new large-scale dataset and a photo-realistic driving simulator, already positions the company as one of the main players in the field in 2026.
In this article, we'll break down the AlpamayoR1 architecture, chain of causation reasoning, and the elaborate training procedure used to train the model.
The Current State of Autonomous Driving
The release of AlpamayoR1 (AR1) is best understood in the context of the current paradigm of End-to-End (E2E) architectures. E2E models aim to map raw sensor inputs (cameras, LiDAR, radar, …) to trajectories in a fully differentiable architecture that optimises a single unified objective.
An emerging trend in E2E is to leverage the extensive world knowledge of large Vision-Language Models (VLMs) to tackle complex driving situations. This generally means using VLMs either as reasoning backbones that inform future trajectories, or as expert teachers that provide a supervisory signal to smaller student models.
The AR1 Architecture
AR1 is a prime example of the reasoning-VLM-as-a-backbone approach. Despite its massive size, the architecture is optimised for real-world deployment and runs with a latency of 99 ms (roughly 10 Hz) on a single Blackwell GPU, which is considered a general target for safety reasons. In this section, we'll break down the architecture and its numerous innovations.
Vision Encoder
AR1 uses both visual and textual inputs in the form of tokenised camera feeds and natural language instructions. For performance, it is crucial that the vision encoder produces as few tokens as possible.
To this end, the authors use a Vision Transformer (ViT) [2] for single-image tokenisation. ViTs partition images into a sequence of tokens that are encoded by a regular transformer. Note that the integration of more efficient algorithms such as Flex [3] for multi-video tokenisation is left for future work.
![Vision Transformer architecture, source: [2]](https://contributor.insightmediagroup.io/wp-content/uploads/2026/02/image-59-1024x572.png)
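
To make the tokenisation step concrete, here is a minimal NumPy sketch of how a ViT-style encoder could cut a frame into patch tokens before the transformer processes them. The 224x224 input, the patch size of 16, the embedding width of 64 and the random projection are illustrative assumptions, not AR1's actual settings.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c).swapaxes(1, 2)
    # One flat vector per patch: (rows * cols, patch * patch * C)
    return patches.reshape(rows * cols, patch_size * patch_size * c)

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))            # assumed camera frame size
tokens = patchify(image, patch_size=16)      # assumed patch size

# Stand-in for the learned linear projection (random here, for illustration only).
projection = rng.normal(size=(tokens.shape[1], 64))
embeddings = tokens @ projection

print(tokens.shape, embeddings.shape)
```

With these assumed sizes, one frame becomes 14 x 14 = 196 tokens, which shows why the patch size directly controls how many tokens the backbone has to process.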
Reasoning Backbone
The AR1 architecture is built around Cosmos-Reason, one of Nvidia's VLMs trained specifically for embodied reasoning in Physical AI use cases. Its usual training set includes 3.7M general Visual Question-Answering (VQA) samples to improve the model's physical common sense, complemented by 24.7K driving samples. The driving samples include video VQA annotated with DeepSeek-R1 reasoning traces to predict the next action.
Cosmos-Reason processes visual and text tokens along with the recent ego-history (past x-y positions and heading angle of the ego-vehicle) and outputs chain of causation reasoning traces that inform future trajectories.
Chain of Causation
A crucial limitation of language models lies in the inherent ambiguity of text labels in visual datasets. This includes vague descriptions that lack a causal structure. Models trained on such data exhibit a low correlation between their reasoning traces and their predicted actions, as well as causal confusion.

For an embodied agent like an autonomous car, strong causal reasoning abilities are essential. To circumvent those problems, the Nvidia team put significant effort into creating a driving dataset with causally consistent annotations.
Specifically, the dataset contains 20-second clips extracted from real-world driving recordings in various environments and countries. Each clip contains 2 seconds of context leading up to a driving decision (e.g. overtaking, yielding, passing an intersection, …) and its consequences. The causal structure of these scenarios is exposed by consistent textual annotations that follow a strict template.

The first 10% of the dataset is annotated by humans, while the remainder is annotated by state-of-the-art VLMs such as GPT-5 to scale the labelling process. Once again, significant effort goes into ensuring the consistency, quality and correctness of these human and AI annotations.

Trajectory Decoder
The last step of the forward pass consists of decoding the reasoning traces into a 64-point trajectory. While trajectories are usually decoded as a sequence of waypoints (x-y coordinates), the Nvidia team found that using unicycle dynamics (i.e. generating a sequence of acceleration and steering-angle values) produced more consistent results. In particular, it eases the learning task by preventing the model from predicting physically impossible trajectories (e.g. point t being too far from point t+1).
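
As a rough illustration of why a dynamics-based parameterisation rules out impossible jumps, the sketch below rolls out a simple unicycle model with forward Euler integration. It is not the paper's decoder: the control pair (acceleration, curvature), the 0.1 s time step and the initial speed are assumptions made for this example.

```python
import math

def rollout_unicycle(x, y, heading, speed, controls, dt=0.1):
    """Integrate unicycle dynamics with forward Euler.

    controls: iterable of (acceleration [m/s^2], curvature [1/m]) pairs.
    Returns one (x, y) waypoint per control.
    """
    waypoints = []
    for accel, curvature in controls:
        # Move along the current heading at the current speed.
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        # Curvature turns the vehicle proportionally to the distance travelled.
        heading += speed * curvature * dt
        # Acceleration changes the speed; the vehicle never reverses here.
        speed = max(0.0, speed + accel * dt)
        waypoints.append((x, y))
    return waypoints

# 64 controls -> 64 waypoints (dt and the control values are assumed).
straight = rollout_unicycle(0.0, 0.0, 0.0, 10.0, [(0.0, 0.0)] * 64)
turning = rollout_unicycle(0.0, 0.0, 0.0, 10.0, [(1.0, 0.05)] * 64)

# Largest gap between two consecutive waypoints of the turning rollout.
points = [(0.0, 0.0)] + turning
max_step = max(math.dist(p, q) for p, q in zip(points, points[1:]))
print(straight[-1], round(max_step, 3))
```

Because each waypoint is reached by integrating speed over one time step, the distance between consecutive points is bounded by speed times dt, so point t+1 can never land arbitrarily far from point t.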
Interestingly, the authors adopt a dual representation of the trajectory: the model auto-regressively generates discrete tokens during training and uses flow matching to generate a continuous trajectory at inference time. The main reasons behind this design are as follows:
- Joint Action-Reasoning Token Space: Using discrete action tokens allows for a tighter coupling between reasoning traces and actions. When the model generates a reasoning trace, the next tokens in the sequence (accelerations and curvatures) are mathematically linked to that explanation, which prevents hallucinations.
- Facilitating RL Optimisation: Restricting the possible action tokens to a discrete set makes RL optimisation significantly easier. Indeed, sampling the correct token from a discrete vocabulary (e.g. ACCEL_NEG_2) is much easier than providing a gradient for a continuous value like -2.145 m/s^2. As we'll see in the next section, this enables RL post-training, which is crucial for improving the model's safety and consistency.
- Stronger Supervisory Signal: Using a cross-entropy loss on discrete tokens acts like a classification task and captures multi-modality (e.g. the distinct probabilities of turning left or right) better than an MSE loss on coordinates.
- Flow Matching for Inference: While discrete tokens are great for learning, they typically result in jerky trajectories. Moreover, generating a sequence of 128 tokens auto-regressively is too slow for real-time inference. To address those limitations, the authors introduce an action expert: a smaller variant of the main architecture that uses the KV cache (which contains visual tokens, historical motion and reasoning traces) to decode a continuous trajectory in a single pass using flow-matching diffusion. This is one of the main reasons why AR1 can run at such low latency.
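
The trade-off in the list above can be illustrated with a toy acceleration tokeniser. This is only a sketch: the bin layout (1 m/s^2-wide bins from -4 to +4) and the helper names are hypothetical, and only the token name ACCEL_NEG_2 and the value -2.145 m/s^2 come from the text.

```python
import numpy as np

# Hypothetical vocabulary: one token per 1 m/s^2 bin, centred on -4 ... +4.
BIN_CENTRES = np.arange(-4.0, 5.0)

def encode(accel: float) -> int:
    """Return the index of the bin centre closest to the acceleration."""
    return int(np.argmin(np.abs(BIN_CENTRES - accel)))

def decode(index: int) -> float:
    """Map a token index back to its bin centre (a lossy inverse of encode)."""
    return float(BIN_CENTRES[index])

def token_name(index: int) -> str:
    """Build a readable token name such as ACCEL_NEG_2 (naming is assumed)."""
    centre = int(BIN_CENTRES[index])
    sign = "NEG" if centre < 0 else "POS"
    return f"ACCEL_{sign}_{abs(centre)}"

index = encode(-2.145)
name = token_name(index)
error = abs(decode(index) - (-2.145))   # information lost by quantisation
print(name, decode(index), round(error, 3))
```

The discrete token is an easy classification target, but decoding it back loses the fractional part of the value. That quantisation error is one way to picture why purely token-based trajectories can look jerky, and why a continuous decoder is attractive at inference time.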

Supervised Fine-Tuning and RL Post-Training

To turn the VLM backbone into a performant driving policy, it undergoes supervised fine-tuning (SFT) on the chain of causation dataset. Specifically, it learns to reproduce the reasoning traces and the associated ground-truth actions by maximising the log-likelihood of the reasoning-action sequence.
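
In symbols, this kind of objective is usually written as a negative log-likelihood. The notation below is an illustrative shorthand rather than the paper's exact formula: $o$ denotes the observations (visual tokens, instructions and ego-history), $c$ the reasoning trace, $a$ the action tokens, $\pi_\theta$ the policy, and $\mathcal{D}_{\mathrm{CoC}}$ the chain of causation dataset.

$$
\mathcal{L}_{\mathrm{SFT}}(\theta) = -\,\mathbb{E}_{(o,\,c,\,a) \sim \mathcal{D}_{\mathrm{CoC}}}\left[\log \pi_\theta(c, a \mid o)\right]
$$

Minimising this loss is equivalent to maximising the log-likelihood of the annotated reasoning and action tokens given the observations.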
However, SFT alone is not enough. VLMs notoriously suffer from discrepancies between their reasoning and their predicted actions. The static nature of open-loop datasets allows the model to mimic reasoning traces, but the lack of environmental feedback prevents it from truly internalising causal reactions.
Fortunately, RL post-training helps alleviate those limitations by providing direct feedback on the model's rollouts. In this paper, the authors use RL for three main purposes:
- Improving reasoning quality: a large reasoning model (e.g. DeepSeek-R1) evaluates AR1's reasoning traces to check that they contain no inconsistencies or hallucinations, and assigns a discrete reward on a scale of 0 to 5 accordingly. While DeepSeek is not expected to generate high-quality reasoning traces for driving itself, evaluating AR1's reasoning is a significantly easier task. This is known as the generation-verification gap.
- Enforcing reasoning-action consistency: the authors extract meta-actions (accelerate, steer, go straight, …) from the CoC dataset using rule-based systems. If those meta-actions match the ones mentioned in the reasoning traces, the model receives an additional reward of 1, otherwise 0.
- Trajectory Quality: a trajectory reward measures the L2 distance between the predicted and expert trajectories, and penalises trajectories that lead to collisions or high-magnitude jerks.
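
The three signals listed above could be combined along the following lines. This is a sketch under loud assumptions: the text does not say how the rewards are weighted or normalised, so the linear combination, every weight, the 0.1 s time step and the function names are hypothetical.

```python
import numpy as np

def trajectory_reward(pred, expert, collided, dt=0.1, w_collision=1.0, w_jerk=0.01):
    """Negative cost: mean L2 error plus collision and jerk penalties (weights assumed)."""
    pred = np.asarray(pred, dtype=float)
    expert = np.asarray(expert, dtype=float)
    l2 = np.linalg.norm(pred - expert, axis=1).mean()
    # Jerk is the third time derivative of position, approximated by third differences.
    jerk = np.linalg.norm(np.diff(pred, n=3, axis=0), axis=1).mean() / dt**3
    return -(l2 + w_collision * float(collided) + w_jerk * jerk)

def total_reward(reasoning_score, consistent, traj_reward,
                 w_reason=1.0, w_consist=1.0, w_traj=1.0):
    """Combine the three signals linearly (the combination rule is an assumption)."""
    if not 0 <= reasoning_score <= 5:
        raise ValueError("reasoning_score must be on the 0 to 5 scale")
    return (w_reason * reasoning_score / 5.0      # judge score rescaled to [0, 1]
            + w_consist * float(consistent)       # 1 if meta-actions match, else 0
            + w_traj * traj_reward)

# Expert drives straight at constant speed: 64 waypoints, 1 m apart.
expert = np.stack([np.arange(64.0), np.zeros(64)], axis=1)
good = expert.copy()                      # perfect imitation
bad = expert + np.array([0.0, 1.0])       # 1 m lateral offset, and it collides

good_total = total_reward(5, True, trajectory_reward(good, expert, collided=False))
bad_total = total_reward(2, False, trajectory_reward(bad, expert, collided=True))
print(good_total, bad_total)
```

Whatever the exact weighting, the point is that each rollout ends up with a single scalar r_i that reflects reasoning quality, reasoning-action consistency and trajectory quality at once.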
During training, AR1 generates multiple parallel rollouts and collects rewards r_i based on the three reward signals above. These rewards are then used to compute the GRPO loss [4]. GRPO computes the advantage of each rollout relative to the group average. This critic-free approach (unlike PPO, it needs no learned value function) stabilises training by rewarding reasoning paths that outperform their counterparts for the same input, rather than relying on an arbitrary absolute score.
The key thing to understand about the GRPO objective is that it aims to maximise the probability of trajectories (the log-probability term) that have a high advantage relative to the others in the group (the softmax term). To avoid losing the vision-language priors of the VLM and the driving knowledge acquired during SFT, the objective is regularised by a KL divergence between the current policy and the reference policy (the one obtained at the end of SFT).
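
The description above can be turned into a small numerical sketch. It follows the wording of the text (a log-probability term weighted by a softmax over group-relative advantages, minus a KL penalty), but the exact form used in the paper may differ, and `beta`, `lambda_kl` and the toy numbers are assumptions.

```python
import numpy as np

def grpo_loss(rewards, log_probs, kl, beta=1.0, lambda_kl=0.1):
    """Value of a GRPO-style loss for one group of rollouts of the same input.

    rewards:   scalar reward r_i of each rollout
    log_probs: log-probability of each rollout under the current policy
    kl:        per-rollout KL estimate between current and reference policy
    """
    rewards = np.asarray(rewards, dtype=float)
    log_probs = np.asarray(log_probs, dtype=float)
    kl = np.asarray(kl, dtype=float)

    # Group-relative advantage: how much better a rollout is than the group mean.
    advantages = rewards - rewards.mean()

    # Softmax over scaled advantages (shifted by the max for numerical stability).
    scaled = beta * advantages
    weights = np.exp(scaled - scaled.max())
    weights /= weights.sum()

    # High-advantage rollouts contribute most; the KL term keeps the policy
    # close to the SFT reference.
    loss = -np.sum(weights * (log_probs - lambda_kl * kl))
    return float(loss), advantages, weights

loss, advantages, weights = grpo_loss(
    rewards=[2.0, 1.0, 0.0],
    log_probs=[-1.0, -1.0, -1.0],
    kl=[0.0, 0.0, 0.0],
)
print(loss, advantages, weights.round(3))
```

In a real training loop the weights would be treated as constants and the gradient would flow through the log-probabilities and the KL term; this NumPy version only evaluates the loss value to show how the pieces fit together.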
Evaluation
The evaluation protocol includes four parts: open-loop trajectory prediction, closed-loop simulation, ablation studies and on-vehicle road tests. While the fact that AR1 was deployed in real-world scenarios is impressive, the open-loop and closed-loop results are somewhat opaque in my opinion. The main reason is that they were obtained on Nvidia datasets (open loop: PhysicalAI-AV dataset, closed loop: AlpaSim) released at the same time as the model. This implies a lack of baselines to put AR1's performance in context.
For instance, the closed-loop results only feature AR1 and a non-reasoning baseline on 75 scenarios. While AR1 outperforms the baseline on all measured metrics, it often does so by a single percentage point on average and with a much larger variance than the baseline.

For this reason, I would advise taking these results with a grain of salt until other frontier architectures are evaluated in AlpaSim.
Conclusion
Despite the lack of contextualised results, AR1 and its accompanying datasets remain an impressive engineering achievement and a good indication of where autonomous driving is headed: end-to-end models that inherit world knowledge from massive VLMs trained on embodied tasks.
However, collecting the causally grounded datasets required for chain of causation demands significant investment and labelling effort, which limits reproducibility until these datasets are made public. In my next article, I'll contrast the AR1 approach with another state-of-the-art model that discards textual labels entirely and instead trains VLMs to act and reason in a latent space.
Thank you for reading this far!
If you found this article useful, please consider sharing it; it genuinely helps support the time and effort that goes into producing this work. As always, feel free to contact me if you have questions, thoughts, or ideas for follow-ups. If you'd like to support my independent research and writing, feel free to buy me a coffee 😉
Until next time! 👋



