Controlling Language and Diffusion Models by Transporting Activations

Large generative models are becoming increasingly capable and are widely deployed to power production applications, but getting these models to produce exactly what is wanted can still be challenging. Fine-grained control over their outputs is important for meeting user expectations and mitigating potential misuse, which in turn supports the reliability and safety of the models. To address these issues, Apple machine learning researchers have developed a new technique that is modality-agnostic and provides fine-grained control over a model's behavior with negligible computational overhead, while having minimal impact on the model's capabilities. Activation Transport (AcT) is a general framework for steering activations, guided by optimal transport theory, that generalizes many previous activation-steering works. The work will be presented as a Spotlight at ICLR 2025, and the code has been released.
To help generative models produce output that aligns with their users' expectations, researchers often rely on reinforcement learning with human feedback (RLHF) or instruction fine-tuning, but these approaches are resource intensive and become increasingly impractical as models grow in complexity. In addition, changing a model's parameters can have unintended consequences, affecting its overall performance on other tasks.
To control the output of these generative models, users often try crafting precise prompts. This is more accessible, but it offers limited control. Even with carefully constructed prompts, a model's output can be unpredictable and lack the nuance a user might need. For example, it is common for models to fail when prompted with instructions not to include something (see Figure 1).
In many applications, such as content moderation, creative writing, or AI-assisted design, fine-grained control over the model's output is crucial. For example, a user may want to adjust the tone of a text without altering its content, change the style of generated images while preserving their context, or ensure that sensitive topics are handled with care without compromising overall coherence. Activation Transport (AcT) delivers this without the computational overhead, complexity, and data volume required by RLHF or fine-tuning, and with more reliable results than prompt engineering.
Activation Steering: A Simple Yet Powerful Solution
The computational and economic costs of fine-tuning large models, together with the need for fine-grained control, have motivated research on targeted interventions on model activations to modify specific behaviors. The main advantage of these "activation steering" methods is that they do not require backpropagation and can typically be merged into the model weights.
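To make the "merged into the model weights" point concrete, here is a minimal NumPy sketch. It assumes the intervention is a per-neuron affine map (`scale`, `shift`) applied directly to the output of a linear layer, with no nonlinearity in between; the function name `fold_affine_into_linear` and all variables are hypothetical and are not taken from the released code.

```python
import numpy as np

def fold_affine_into_linear(W, b, scale, shift):
    """Return (W2, b2) with W2 @ x + b2 == scale * (W @ x + b) + shift.

    Assumption: the steering intervention is a per-neuron affine map applied
    right after a linear layer y = W @ x + b.
    """
    W2 = scale[:, None] * W   # scale each output row of the weight matrix
    b2 = scale * b + shift    # fold the scale and the shift into the bias
    return W2, b2

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))               # toy layer: 3 inputs, 4 neurons
b = rng.normal(size=4)
scale = rng.uniform(0.5, 1.5, size=4)     # hypothetical per-neuron scale
shift = rng.normal(size=4)                # hypothetical per-neuron shift
x = rng.normal(size=3)

W2, b2 = fold_affine_into_linear(W, b, scale, shift)
steered = scale * (W @ x + b) + shift     # intervention applied at inference
merged = W2 @ x + b2                      # same result from the merged layer
max_abs_diff = float(np.max(np.abs(steered - merged)))
```

Under these assumptions the merged layer reproduces the steered output up to floating-point error, so the intervention adds no extra computation at inference time.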
Unlike RLHF or fine-tuning, activation steering does not retrain the model's parameters. Instead, it leverages an understanding of the model's inner workings to dynamically steer its outputs in a desired direction at inference time.
Prior activation-steering methods have used vector-based interventions that take the source activations of expert neurons and shift them toward a learned target (see Figure 2). However, the degree to which activations are shifted is governed by an unbounded parameter (λ), which makes interpretable interventions a challenge. Additionally, these interventions can push activations out of distribution: the modified activations may end up far from what the model learned to expect during training, which can lead to unexpected behavior and reduced performance.
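The following toy sketch illustrates the two issues described above for a generic additive intervention. It is not any specific prior method: the synthetic Gaussian "activations", the difference-of-means direction, and the names `additive_steer` and `mean_gap` are all assumptions made for illustration.

```python
import numpy as np

def additive_steer(acts, direction, lam):
    """Generic vector-based steering: shift every activation by lam * direction."""
    return acts + lam * direction

rng = np.random.default_rng(0)
# Synthetic stand-ins for source and target activations (500 samples, 8 neurons).
source = rng.normal(loc=0.0, scale=1.0, size=(500, 8))
target = rng.normal(loc=2.0, scale=0.5, size=(500, 8))
direction = target.mean(axis=0) - source.mean(axis=0)

def mean_gap(lam):
    """Average distance between steered and target means, in target std units."""
    steered = additive_steer(source, direction, lam)
    gap = np.abs(steered.mean(axis=0) - target.mean(axis=0)) / target.std(axis=0)
    return float(gap.mean())

# lam = 1 lands on the target mean; larger values overshoot without bound.
gaps = {lam: mean_gap(lam) for lam in (0.0, 1.0, 5.0)}

# A pure shift never changes the spread, so the source std is kept even
# though the target distribution is narrower.
spread_unchanged = bool(np.allclose(
    additive_steer(source, direction, 5.0).std(axis=0), source.std(axis=0)
))
```

In this toy setting nothing bounds λ, so a large value moves the activations well past the target, and no value of λ can match the target's spread because the intervention only translates the activations.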
Activation Transport (AcT) and Linear-AcT
Activation Transport (AcT) is a novel intervention framework that overcomes the limitations of previous activation-steering methods by accounting for the distributions of both source and target activations, and by using an interpretable strength parameter that allows fine-grained control.
AcT learns the optimal transport (OT) map between the source and target activation distributions. This ensures that activations transported from the source conform to the target distribution, minimizing compounding effects on the dynamics of the model itself. To estimate the OT map, a small number of sentences (for example, a few hundred) are run through the model, yielding sets of activations for both the source and the target.
Estimating a multi-dimensional and possibly non-linear OT map would require a prohibitive amount of data, and evaluating it at inference time would slow down the LLM's overall text generation. We therefore make two simplifications: (1) we treat activations as independent, which allows us to estimate a 1D map for each neuron, and (2) we restrict ourselves to linear maps, which limits the memory footprint and keeps inference fast. Steering with linear and independent transport maps is what we call Linear-AcT (see Figure 3).
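The two simplifications can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the released implementation: it assumes equally sized source and target sample sets, pairs their sorted values neuron by neuron (the 1D OT coupling between two equal-size samples), and fits a least-squares line through those pairs. The names `fit_linear_transport` and `transport`, and the linear interpolation used for the strength parameter, are hypothetical.

```python
import numpy as np

def fit_linear_transport(source, target):
    """Fit one 1D affine map a -> w * a + b per neuron.

    source, target: arrays of shape (n_samples, n_neurons) with equal n_samples.
    Sorting each column pairs the i-th smallest source value with the i-th
    smallest target value; a least-squares line is then fitted to those pairs.
    Assumes no neuron is constant across the source samples.
    """
    src = np.sort(source, axis=0)
    tgt = np.sort(target, axis=0)
    src_mean = src.mean(axis=0)
    tgt_mean = tgt.mean(axis=0)
    src_centered = src - src_mean
    w = (src_centered * (tgt - tgt_mean)).sum(axis=0) / (src_centered ** 2).sum(axis=0)
    b = tgt_mean - w * src_mean
    return w, b

def transport(acts, w, b, strength=1.0):
    """Blend original and transported activations (0 = unchanged, 1 = full map)."""
    return (1.0 - strength) * acts + strength * (w * acts + b)

rng = np.random.default_rng(0)
# Synthetic stand-ins for activations gathered from a few hundred sentences.
source = rng.normal(loc=0.0, scale=1.0, size=(400, 6))
target = rng.normal(loc=3.0, scale=0.5, size=(400, 6))

w, b = fit_linear_transport(source, target)
moved = transport(source, w, b, strength=1.0)
```

Only two numbers per neuron (`w` and `b`) need to be stored, and applying the map is a single multiply-add, which is the memory and speed argument made above. On this synthetic data the transported activations match the target mean and closely track its spread.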
Linear-AcT works out of the box for both LLMs and text-to-image (T2I) diffusion models, achieving strong results in both modalities. This is the first work to propose a conditioning algorithm that can be applied without modification to both language and image generation.
Controlling LLM Output with Linear-AcT
To demonstrate the effectiveness of Linear-AcT for controlling the output of LLMs, we evaluated it on two important tasks, toxicity mitigation and truthfulness induction, while monitoring how other aspects of performance are affected using perplexity (PPL) and MMLU.
We tested Linear-AcT for toxicity mitigation on Gemma-2-2b and Llama-3-8b (evaluated on the RealToxicityPrompts dataset), observing toxicity reductions of 7.5x and 4.3x respectively.
| Method (Gemma-2-2b) | λ | Toxicity reduction | PPL | MMLU |
|---|---|---|---|---|
| Original | | | 13.98 | 53.1 |
| ActAdd | 0.5 | 1.1x | 14.69 | 53.0 |
| AurA | – | 2.0x | 14.18 | 53.0 |
| ITI-c | 8.0 | 5.6x | 14.90 | 52.6 |
| Linear-AcT | 1.0 | 7.5x | 14.79 | 51.3 |

| Method (Llama-3-8b) | λ | Toxicity reduction | PPL | MMLU |
|---|---|---|---|---|
| Original | | | 9.06 | 65.3 |
| ActAdd | 0.3 | 1.0x | 9.71 | 65.5 |
| AurA | – | 3.1x | 9.52 | 65.5 |
| ITI-c | 3.0 | 3.6x | 9.48 | 64.7 |
| Linear-AcT | 1.0 | 4.3x | 9.56 | 64.5 |
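We also tested truthfulness induction (using the TruthfulQA dataset). Linear-AcT raised MC1 accuracy from 21.05 to 26.00 on Gemma-2-2b and from 25.46 to 33.22 on Llama-3-8b, gains of roughly 5 and 8 percentage points respectively.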
| Method (Gemma-2-2b) | λ | Truthfulness (MC1 accuracy) | MMLU |
|---|---|---|---|
| Original | | 21.05 | 53.10 |
| ActAdd | 3.0 | 23.01 | 52.83 |
| AurA | – | 21.20 | 52.73 |
| ITI-c | 2.0 | 24.53 | 51.39 |
| Linear-AcT | 1.0 | 26.00 | 51.47 |

| Method (Llama-3-8b) | λ | Truthfulness (MC1 accuracy) | MMLU |
|---|---|---|---|
| Original | | 25.46 | 65.35 |
| ActAdd | 0.7 | 26.19 | 65.42 |
| AurA | – | 25.34 | 65.37 |
| ITI-c | 2.0 | 30.11 | 64.71 |
| Linear-AcT | 1.0 | 33.22 | 64.78 |
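
As a quick check of the trade-off in the truthfulness tables, the snippet below computes the change in MC1 accuracy and MMLU between the Original and Linear-AcT rows. It assumes the first table corresponds to Gemma-2-2b and the second to Llama-3-8b, which is inferred from their baseline MMLU values matching the toxicity tables; the dictionary layout is purely illustrative.

```python
# (before, after) pairs copied from the Original and Linear-AcT rows.
rows = {
    "Gemma-2-2b": {"mc1": (21.05, 26.00), "mmlu": (53.10, 51.47)},
    "Llama-3-8b": {"mc1": (25.46, 33.22), "mmlu": (65.35, 64.78)},
}

# Change in percentage points for each model and metric.
deltas = {
    model: {
        metric: round(after - before, 2)
        for metric, (before, after) in metrics.items()
    }
    for model, metrics in rows.items()
}

for model, change in deltas.items():
    print(f"{model}: MC1 {change['mc1']:+.2f} pts, MMLU {change['mmlu']:+.2f} pts")
```

Read this way, the tables show MC1 accuracy rising by several points on both models while MMLU drops by less than two points.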
Controlling Text-to-Image Diffusion Models with Linear-AcT
Text-to-image diffusion models (T2Is) are powerful tools that let users generate striking images from simple text descriptions. However, fine-grained control is challenging. For example, it is not possible to make incremental changes, such as adding more trees or making the style slightly more cartoonish, or to reliably predict how the words "lots of trees" will affect the generated image.
Because AcT can transport activations from one distribution to another regardless of model architecture or modality, it enables fine-grained control over details like these in T2Is (see Figure 4). To do so, AcT requires only a list of prompts describing the source (e.g. no people) and the target (e.g. people) activation distributions.
AcT for Fine-Grained Control of Image Style
AcT also enables control over the artistic style of generated images. To demonstrate this, we added stylistic tags to a set of prompts.
In this setup, the original prompts serve as the source distribution, while the style-modified prompts represent the target distribution. By learning the OT maps between these distributions and applying them with an interpretable strength parameter, AcT can induce just the right amount of a desired style (see Figure 5).
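The role of the strength parameter can be illustrated with a small NumPy sketch. As a stand-in for the learned map, it uses the closed-form affine OT map between two 1D Gaussians (matching mean and standard deviation per neuron), and it assumes the strength linearly interpolates between the original and the fully transported activations. The synthetic data and the names `gaussian_transport_params` and `apply_with_strength` are hypothetical.

```python
import numpy as np

def gaussian_transport_params(source, target):
    """Per-neuron affine map that matches the target mean and std.

    This is the closed-form OT map between two 1D Gaussians, used here as a
    simple stand-in for a learned transport map.
    """
    scale = target.std(axis=0) / source.std(axis=0)
    shift = target.mean(axis=0) - scale * source.mean(axis=0)
    return scale, shift

def apply_with_strength(acts, scale, shift, strength):
    """strength = 0 leaves activations untouched, 1 applies the full map."""
    return (1.0 - strength) * acts + strength * (scale * acts + shift)

rng = np.random.default_rng(0)
# Synthetic stand-ins for activations from original and style-tagged prompts.
plain = rng.normal(loc=0.0, scale=1.0, size=(300, 5))
styled = rng.normal(loc=1.5, scale=0.7, size=(300, 5))

scale, shift = gaussian_transport_params(plain, styled)
strengths = [0.0, 0.25, 0.5, 0.75, 1.0]
means = [apply_with_strength(plain, scale, shift, s).mean(axis=0) for s in strengths]

# Fraction of the way from the source mean to the target mean at each strength.
total = np.linalg.norm(styled.mean(axis=0) - plain.mean(axis=0))
progress = [float(np.linalg.norm(m - plain.mean(axis=0)) / total) for m in means]
```

In this sketch the mean activation moves toward the target in direct proportion to the strength, which is what makes the parameter interpretable: 0 means no change, 1 means the full transport, and values in between apply a partial amount of the style.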
Don't Think About a Pink Elephant
The heading above will probably make readers think of a pink elephant. Interestingly, the same thing happens with T2Is (see Figure 6). This may be because the text encoders of T2Is embed sentences as a bag of words and are not powerful enough to interpret negation. As a result, including a negated instruction in the prompt (for example, an instruction not to include a pink elephant) often leads the T2I model to render that unwanted item in the image.

Linear-AcT is an intervention method that can address this challenge and remove unwanted concepts from generated images. To do so, it needs a set of source prompts that contain the concept to be removed (e.g. "A house and a tree with a pink elephant next to it"), as well as a set of target prompts in which the concept is absent (e.g. "A house"). The key idea is to isolate the concept to be removed as the main difference between the two sets of prompts. Linear-AcT learns the transport map from the source set to the target set, resulting in a controllable intervention that can remove the negated concept (such as "pink elephant") that the generative model would otherwise render, as shown in Figure 7.
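
One simple way to build such prompt sets is sketched below. The helper `build_prompt_sets`, its phrase template, and the example base prompts are hypothetical; the point is only that each source prompt differs from its target counterpart by the concept to be removed.

```python
def build_prompt_sets(base_prompts, concept_phrase):
    """Return (source, target) prompt lists that differ only by the concept.

    Target prompts are the base prompts; source prompts append the unwanted
    concept, so the concept is the main difference between the two sets.
    """
    target = list(base_prompts)
    source = [f"{prompt} with {concept_phrase} next to it" for prompt in base_prompts]
    return source, target

# Hypothetical base prompts for illustration.
base_prompts = [
    "A house and a tree",
    "A street at night",
    "A table in a kitchen",
]
source_prompts, target_prompts = build_prompt_sets(base_prompts, "a pink elephant")
```

Activations collected while generating from `source_prompts` and `target_prompts` would then serve as the two distributions between which the transport map is estimated.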

Conclusion
As the capabilities and production deployment of generative language and text-to-image models continue to grow, providing fine-grained control over their outputs is increasingly important. While common approaches such as RLHF and instruction fine-tuning have been effective at improving the alignment of LLM output with users' expectations, they are resource intensive and become impractical as models grow in complexity. Activation Transport (AcT) provides fine-grained control over the outputs of both LLMs and T2I diffusion models without the limitations of RLHF and fine-tuning. This general framework for steering activations according to optimal transport theory is modality-agnostic and requires negligible computational overhead, and researchers and practitioners can now use and build on AcT with the released code.


