Trajectory Releases Multi-LoRA Training Stack for Continuous Learning, Reports 2.81× Experiment-Throughput Gain

Trajectory's concurrent multi-LoRA stack reports a 2.81× experiment-throughput gain over single-tenant RL, with all code in the NovaSky-AI/SkyRL GitHub repository.
Most language models improve in discontinuous jumps. A team collects data, trains, and ships a new version. This takes months and can produce noticeable, sometimes catastrophic, behavior changes for users. Trajectory wants to replace that cycle with continuous learning.
The Trajectory team has published a platform report describing how. It built a concurrent, multi-LoRA training platform for continuous-learning workloads. The work was done with UC Berkeley Sky Lab and Anyscale. All training code is open in the NovaSky-AI/SkyRL repository.
The result is a 2.81× improvement in end-to-end experiment throughput. The comparison is against a single-tenant training framework. Trajectory reports no regression in any training rewards.
What Multi-LoRA Training Actually Is
Continuous learning requires models to improve from live feedback and production interactions. A coding agent can learn engineering patterns as developers correct its work. A support agent can resolve harder tickets as operators step in on difficult cases.
Most training infrastructure still assumes a linear lifecycle. Teams allocate GPUs, initialize a model, run a job, then tear everything down. Continuous learning changes that relationship. When production interactions become training inputs, training becomes part of the live system.
Modern RL training reduces to three core primitives. The Sampler generates trajectories from the current policy model. The Trainer computes gradients and updates the policy weights. Parameter sync broadcasts the updated weights back to the inference workers.
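The three primitives above can be laid out as a small control loop. This is a minimal sketch, not SkyRL's code: the `Sampler`, `Trainer`, and `sync_parameters` names are hypothetical, and a version counter stands in for real model weights so that only the order of operations is visible.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    version: int = 0  # stands in for the real policy tensors


class Sampler:
    """Inference worker: holds a copy of the policy and generates rollouts."""

    def __init__(self):
        self.weights = Weights()

    def rollout(self, prompts):
        # A real sampler would decode with the model; here each trajectory
        # is only tagged with the policy version that produced it.
        return [{"prompt": p, "policy_version": self.weights.version} for p in prompts]


class Trainer:
    """Owns the trainable policy weights."""

    def __init__(self):
        self.weights = Weights()

    def update(self, trajectories):
        # Placeholder for gradient computation plus an optimizer step.
        self.weights = Weights(version=self.weights.version + 1)


def sync_parameters(trainer, samplers):
    """Broadcast the trainer's current weights to every inference worker."""
    for s in samplers:
        s.weights = Weights(version=trainer.weights.version)


def run(num_steps, prompts, num_samplers=2):
    trainer = Trainer()
    samplers = [Sampler() for _ in range(num_samplers)]
    seen_versions = []
    for _ in range(num_steps):
        trajectories = [t for s in samplers for t in s.rollout(prompts)]
        seen_versions.append({t["policy_version"] for t in trajectories})
        trainer.update(trajectories)
        sync_parameters(trainer, samplers)
    return trainer, samplers, seen_versions
```

Because the loop is synchronous, every rollout in a step comes from the same policy version, and the samplers only see new weights after the sync call.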
Trajectory calls its approach Continuous Multi-LoRA Training, or C-LoRA. Each experiment maps to a LoRA adapter on a warm, multi-tenant engine.
The Problems It Targets
The Trajectory team identifies four inefficiencies in traditional stacks:
(1) Cold starts are slow: Every serial job reloads checkpoints, initializes the distributed runtime, and warms up the inference engines. For large models, this step alone can exceed 30 minutes per run.
(2) RL is memory-intensive: Frontier models often exceed 100B parameters. Qwen3.5-397B can require up to eight H200 nodes to fit in memory. LoRA reduces memory usage by an order of magnitude. It freezes the base model and trains only small adapter weights.
(3) Traditional stacks are single-tenant: They run one experiment at a time. Multi-LoRA maps each experiment to one adapter, multiplexing throughput by a factor of N.
(4) Worker utilization is low: Trainers and inference engines sit idle while waiting on each other. Multi-LoRA load-balances across jobs to fill the idle capacity.
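Point (2) rests on how few weights a LoRA adapter actually trains. For one weight matrix of shape d_out × d_in, a rank-r adapter adds two small matrices with r × (d_in + d_out) trainable values in total. The sketch below compares the two counts; the 4096 × 4096 layer and rank 16 are illustrative assumptions, not dimensions taken from the report.

```python
def full_params(d_in, d_out):
    """Trainable values when the whole weight matrix is fine-tuned."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Trainable values for a rank-`rank` LoRA adapter: A is d_in x r, B is r x d_out."""
    return rank * (d_in + d_out)


# Illustrative layer size and rank (assumptions, not from the report).
d_in = d_out = 4096
rank = 16

full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
```

Gradients and optimizer state scale with the trainable count, which is why freezing the base model shrinks the training footprint so sharply.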
Inside the Architecture
Most of the throughput win comes from inference. In vLLM, all adapters are loaded hot in GPU memory. Decode steps can mix tokens from different adapters in the same batch. The key enabler is the SGMV decode kernel. It fuses the per-adapter matrix-vector work into a single GPU launch per decode step.
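What a mixed-adapter decode step computes can be written out in plain Python. This is a reference for the math only, assuming the standard LoRA form y = xW + (xA)B with each token carrying its own adapter id; it is not the SGMV kernel, which performs the per-adapter part in one fused GPU launch instead of a loop. The function and variable names are hypothetical.

```python
def matvec(x, M):
    """Row vector x (length n) times matrix M (n x m), returning a list of length m."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]


def multi_lora_decode_step(xs, adapter_ids, W, adapters):
    """Apply the shared base weight W plus each token's own LoRA delta.

    xs          : one hidden vector per token in the batch
    adapter_ids : the adapter (tenant) each token belongs to
    adapters    : adapter id -> (A, B), with A of shape n x r and B of shape r x m
    """
    out = []
    for x, a in zip(xs, adapter_ids):
        A, B = adapters[a]
        base = matvec(x, W)              # shared across all tenants
        delta = matvec(matvec(x, A), B)  # tenant-specific low-rank update
        out.append([b + d for b, d in zip(base, delta)])
    return out


# Two tokens with identical inputs but different adapters in the same batch.
W = [[1, 0], [0, 1]]
adapters = {
    "tenant0": ([[1], [0]], [[0, 1]]),
    "tenant1": ([[0], [1]], [[2, 0]]),
}
print(multi_lora_decode_step([[1, 2], [1, 2]], ["tenant0", "tenant1"], W, adapters))
```

The base projection is identical for both tokens; only the small rank-r term differs, which is what makes batching across tenants cheap.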
After each optimizer step, the updated LoRA weights are loaded in place into the inference engine. The scheduler does not block, so other tenants keep decoding.
Training works differently. Only one active LoRA adapter trains on the GPU at a time. All the others sit in pinned CPU memory. Each tenant's state lives in an AdapterStore. It holds the LoRA parameters, the FP32 master weights, the optimizer moments, and the gradient buffers.
The engine swaps one tenant's state onto the GPU, runs one forward_backward pass, then swaps it back out. This training path is still single-adapter. The inference concurrency gains do not apply to training.
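The swap-in, train, swap-out cycle can be sketched as follows. Only the name AdapterStore and the four kinds of per-tenant state come from the report; the methods, fields, and the `device` string used here are hypothetical stand-ins, and no real GPU transfer takes place.

```python
from dataclasses import dataclass


@dataclass
class TenantState:
    lora_params: list
    master_weights_fp32: list
    optimizer_moments: list
    grad_buffers: list
    device: str = "cpu_pinned"  # where this tenant's state currently lives


class AdapterStore:
    """Keeps every tenant's training state; at most one tenant is on the GPU."""

    def __init__(self):
        self._tenants = {}
        self.active = None

    def register(self, tenant_id, state):
        self._tenants[tenant_id] = state

    def swap_in(self, tenant_id):
        if self.active is not None:
            raise RuntimeError("another tenant is already active on the GPU")
        state = self._tenants[tenant_id]
        state.device = "gpu"  # a real store would copy tensors host-to-device here
        self.active = tenant_id
        return state

    def swap_out(self):
        self._tenants[self.active].device = "cpu_pinned"
        self.active = None

    def devices(self):
        return {tid: s.device for tid, s in self._tenants.items()}


def train_round(store, tenant_ids, forward_backward):
    """Run one forward_backward pass per tenant, strictly one at a time."""
    for tenant_id in tenant_ids:
        state = store.swap_in(tenant_id)
        try:
            forward_backward(state)
        finally:
            store.swap_out()
```

The `try`/`finally` ensures a failed pass still returns the tenant to host memory, so one bad job cannot block the others.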
The Numbers
Trajectory tested on a single H200 node with Qwen3-4B-Instruct-2507. It used synchronous RL on GSM8K in an agentic setting. The Trajectory team recast GSM8K as a tool-use learning task. The model decides when to call a Calculator tool and a Final Answer tool. The reward is 1.0 only if the Final Answer tool is called with the correct answer.
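The reward rule described above is simple enough to write down directly. The sketch below assumes a hypothetical tool-call format (a list of dicts with `name` and `arguments` keys, and an `answer` argument on the final-answer tool) and scores the first final-answer call; the report's actual environment may represent calls differently.

```python
def gsm8k_tool_reward(tool_calls, gold_answer, tol=1e-6):
    """Return 1.0 only if the final-answer tool is called with the correct answer.

    Calculator calls earn nothing on their own; a trajectory that never calls
    the final-answer tool, or calls it with a wrong or unparsable value, gets 0.0.
    """
    for call in tool_calls:
        if call.get("name") == "final_answer":
            try:
                value = float(call.get("arguments", {}).get("answer"))
            except (TypeError, ValueError):
                return 0.0
            return 1.0 if abs(value - float(gold_answer)) <= tol else 0.0
    return 0.0


trajectory = [
    {"name": "calculator", "arguments": {"expression": "6 * 7"}},
    {"name": "final_answer", "arguments": {"answer": "42"}},
]
print(gsm8k_tool_reward(trajectory, 42))
```

A sparse, all-or-nothing reward like this means the model gets no credit for a correct calculation it never submits.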
The policy starts near 40% accuracy at step 0. With the right learning algorithm, it climbs past 90% by step 9.
The Trajectory team scaled to eight concurrent multi-LoRA runs. Time to Last Experiment reached 5433s at N=8, a 2.81× speedup. Eight concurrent experiments finished before three serial runs would have completed back-to-back. Mean Experiment Time also improved, peaking at N=4 with a 1.88× speedup. Every concurrency level reached a reward_accuracy above 90% by step 9.
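The "eight concurrent versus three serial" claim can be checked from the two reported figures alone. The serial numbers below are implied values derived from 5433s and 2.81×, not measurements quoted in the article.

```python
TIME_TO_LAST_N8_S = 5433  # reported Time to Last Experiment at N=8
SPEEDUP_N8 = 2.81         # reported speedup over the serial baseline
N = 8

# Implied (not reported) serial figures.
implied_serial_total_s = TIME_TO_LAST_N8_S * SPEEDUP_N8
implied_serial_per_run_s = implied_serial_total_s / N
three_serial_runs_s = 3 * implied_serial_per_run_s

print(f"implied serial total:   {implied_serial_total_s:.0f}s")
print(f"implied serial per run: {implied_serial_per_run_s:.0f}s")
print(f"three serial runs:      {three_serial_runs_s:.0f}s vs {TIME_TO_LAST_N8_S}s for eight concurrent")
```

Under these figures, three back-to-back serial runs take slightly longer than all eight concurrent ones, while two serial runs would still finish first.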
The Tradeoffs
Higher throughput comes at the cost of per-step latency. As N grows, Time to First Experiment and step time both get worse. At N=8, the first experiment of a serial run finishes 1.97× faster. Average step time rises from 191s to 500s, which is only 2.62× slower with eight runs sharing the node.
Most of that increase is rollout time. Rollout grows from 162s to 401s, about 77% of the increase. At N=2, doubling the load adds only 15% to rollout time. That is the favorable case for multi-LoRA.
The pattern held on a heavier workload. On τ-bench retail with the NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 MoE model, N=2 finished 10 steps 1.28× faster. Per-tenant step time rose by 1.57×.
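The ratios quoted in this section follow directly from the reported timings. The last line is a rough back-of-envelope estimate, assuming aggregate speedup is about N divided by the per-tenant slowdown and ignoring everything outside step time; it is not a formula from the report.

```python
# Reported GSM8K timings, in seconds, at N=1 and N=8.
step_n1, step_n8 = 191, 500
rollout_n1, rollout_n8 = 162, 401

step_slowdown = step_n8 / step_n1
rollout_share_of_increase = (rollout_n8 - rollout_n1) / (step_n8 - step_n1)

# Reported τ-bench retail per-tenant slowdown at N=2.
tau_n, tau_slowdown = 2, 1.57
rough_aggregate_speedup = tau_n / tau_slowdown  # assumption: speedup ~ N / slowdown

print(f"step slowdown at N=8:        {step_slowdown:.2f}x")
print(f"rollout share of increase:   {rollout_share_of_increase:.0%}")
print(f"rough τ-bench speedup at N=2: {rough_aggregate_speedup:.2f}x")
```

The estimate lands close to the 1.28× reported for τ-bench, which suggests step time accounts for most of the end-to-end difference there.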
Strengths and Weaknesses
Strengths:
- 2.81× end-to-end experiment-throughput gain at 8 concurrent runs
- No accuracy regression; runs track the serial baseline within ±1σ at the final steps
- LoRA cuts memory by an order of magnitude versus full fine-tuning
- Fully open in NovaSky-AI/SkyRL for the community to build on
Weaknesses:
- Per-step latency and Time to First Experiment degrade as N grows
- Training remains serialized across tenants; only inference is multiplexed
- Tested mainly on mid-sized models, not at frontier scale
- Setup requires an 8× H100/H200 node and a Megatron build
Key Takeaways
- Trajectory built a concurrent multi-LoRA RL training stack for continuous learning, available in NovaSky-AI/SkyRL.
- It reports a 2.81× end-to-end experiment-throughput gain over a single-tenant baseline, with no reward regression.
- Each experiment maps to a dedicated LoRA adapter on an always-warm engine, multiplexing throughput by N.
- Most gains come from vLLM multi-LoRA inference via the SGMV decode kernel; training remains single-adapter.
- The tradeoff is per-step latency: at N=8, step time rises from 191s to 500s.
Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.



