Iminyaka emithathu yesayensi yedatha: Ukusebenzisa nini ukufunda ngomshini wendabuko, ukufunda okujulile, noma i-LLM (kuchazwe ngesibonelo esisodwa)

yendawo yonke (eyenziwe ngomunye wabahlabeleli abaningi abathobezwayo) ithi lokhu:
Ngifisa ukuthi ngingahle ngibuyele emuva
Futhi ushintshe le minyaka
Ngihamba ngezinguqukoISabatha Elimnyama – Izinguquko
Le ngoma inamandla amakhulu futhi ikhuluma ngendlela yokuphila engashintsha ngayo ngqo phambi kwakho ngokushesha.
Leyo ngoma imayelana nenhliziyo ephukile nendaba yothando. Kodwa-ke, futhi kungikhumbuza izinguquko eziningi ukuthi umsebenzi wami, njengososayensi wedatha, eseyenze eminyakeni eyi-10 edlule yomsebenzi wami:
- Lapho ngiqala ukufunda i-physics, okuwukuphela kwento engangicabanga ngayo lapho othile ethi “transformer” kwakungu-Optimus Prime. Ukufunda ngomshini kimi kwaphela Ukuhlehlisa okuqondile, i-SVM, ihlathi elingahleliwe njll … [2016]
- Lapho ngenza iziqu zenkosi yami kwidatha enkulu kanye ne-physics yezinhlelo eziyinkimbinkimbi, ngaqala ukuzwa ngo “Ibandla“Futhi nobuchwepheshe obuhlukahlukene bokufunda obujulile obubonakala buthembisa kakhulu ngaleso sikhathi. Amamodeli okuqala we-GPT aphuma, futhi abukeka ethakazelisa kakhulu, noma ngabe akekho owawulindele ukuba anamandla njengoba namhlanje. [2018-2020]
- Phambili phambili empilweni yami manje njengososayensi wedatha yesikhathi esigcwele. Namuhla, uma ungazi ukuthi yini i-GPT imele futhi awukaze ufunde “Ukunakwa konke okudingayo” Unamathuba ambalwa kakhulu wokudlulisa ingxoxo yokwakhiwa kwesistimu yedatha yedatha. [2021 – today]
Lapho abantu besho ukuthi amathuluzi kanye nempilo yansuku zonke yomuntu osebenza nedatha ahluke kakhulu eminyakeni eyi-10 (noma engu-5) edlule, ngivumelana ngayo yonke indlela. Ini Angivumi Ngomqondo wokuthi amathuluzi asetshenziswa esikhathini esedlule kufanele asuswe ngoba yonke manje yonke manje ibonakala ixazululwa nge-GPT, LLMS, noma i-Agentic AI.
Inhloso yalesi sihloko ukucabanga ngomsebenzi owodwa, okuyi ukuhlukanisa uthando / inzondo / inhloso engathathi hlangothi ye-tweet. Ikakhulu, sizokwenza nge Ukufundwa komshini wendabuko, ukufunda okujulile, na- Amamodeli amakhulu olimi.
Sizokwenza lokhu, sisebenzisa i-Python, futhi sizochaza ukuthi kungani futhi nini ukusebenzisa indlela ngayinye. Ngethemba ukuthi, ngemuva kwalesi sihloko, uzofunda:
- Le khasi amathuluzi Kusetshenziswe ezinsukwini zokuqala kufanele kubhekwe, kufundwe, futhi ngezinye izikhathi kwamukelwa.
- I-latency, ukunemba, nezindleko kufanele ihlolwe lapho ukhetha i-algorithm enhle kakhulu yecala lakho lokusebenzisa
- Ushintsho Emhlabeni wesayensi yedatha kuyadingeka futhi wamukelwa ngaphandle kokwesaba 🙂
Ake siqale!
1. Icala lokusebenzisa
Icala esibhekene nalo yinto etholwe empeleni ekwamukelwe kakhulu ku-Data Science Science / AI Application: Ukuhlaziywa kwemizwa. Lokhu kusho ukuthi, unikezwe umbhalo, sifuna ukukhipha “umuzwa” ngemuva kombhali walowo mbhalo. Lokhu kusiza kakhulu emacaleni lapho ufuna ukuqoqa impendulo ngemuva kokubuyekezwa okunikezwe into, i-movie, into oyincomayo, njll …
Kulesi sifundo sebhulogi, sisebenzisa isibonelo sokuhlanywa semizwa “esidumile” okuhlukanisa umuzwa ngemuva kwe-tweet. Njengoba ngangifuna ukulawula okwengeziwe, ngeke sisebenze nama-organic tweets asongwe kuwebhu (lapho amalebula angaqinisekile). Esikhundleni salokho, sizobe sisebenzisa okuqukethwe okwenziwe ngu Amamodeli amakhulu olimi ukuthi singakwazi ukulawula.
Le ndlela nayo iyasivumela ukuthi sihambisane nobunzima kanye nezinhlobonhlobo zenkinga nokubheka ukuthi amasu ahlukene asabela kanjani.
- Icala elilula: Uthando ama-tweets azwakala njengamakhadi okuposa, ezonzondo azisho lutho, futhi imiyalezo engathathi hlangothi Ikhuluma ngesimo sezulu nekhofi. Uma imodeli ilwela lapha, enye into icishiwe.
- Icala elinzima: usathanda, inzondo, engathathi hlangothi, kepha manje sesijova ukubhuqa, amathoni ahlanganisiwe, namacebo acashile afuna ukunakwa kumongo. Sibe nakho Ngaphansi Idatha, ukuba nedatha encane yokuqeqesha nayo.
- Icala elinzima lezezimali: Sithuthela emizweni emihlanu: uthando, inzondo, intukuthelo, ukunyanyeka, umona, ngakho-ke imodeli kufanele ihambisane nokucebisa, imisho ebekwe ngaphezulu. Ngaphezu kwalokho, sinokufakwa oku-0 ukuqeqesha imininingwane: Ngeke sikwazi ukuqeqeshwa.
Ngiye ngakhiqiza imininingwane futhi ngibeka amafayela ngamunye kufolda ethile yefolda yomphakathi Gitub engikwenzile kulo msebenzi [data].
Umgomo wethu ukwakha uhlelo lokuhlukanisa oluhle oluzokwazi ukuqonda kahle isihawu ngemuva kwama-tweets. Kepha sizokwenza kanjani? Ake sikuthole.
2. Design System
Isithombe esihlala siwusizo kakhulu ukucabangisisa ukuthi yikuphi okulandelayo:
Ukuqonda nqo, khokhisafuthi ukukala izinga Ohlelweni lokufunda lomshini lakha unxantathu. Ungabe usakha ngokuphelele ezimbili ngasikhathi sinye.
Ungaba nemodeli enembile kakhulu ekala kahle ngezigidi zokufakwa, kepha ngeke kusheshe. Ungaba nemodeli esheshayo enesikali esinezigidi zokufakwa, kepha ngeke kube okunembile. Ungaba nemodeli enembile futhi esheshayo, kepha ngeke ilinganise kahle.
Lokhu kucatshangelwa kukhishwe enkingeni ethile, kepha basiza ukuqondisa ukuqondiswa okuyi-ML System ukwakha. Sizobuyela kulokhu.
Futhi, amandla emodeli yethu kufanele alinganiswe ngosayizi wokuqeqeshwa kwethu. Ngokuvamile, sizama ukugwema iphutha lokuqeqeshwa lesethi ukwehla ngezindleko zokwanda kokusetha kokuhlola (okudumile ukweqisa ngokweqile).

Asifuni ukuba sendaweni engaphansi noma ekwenziweni ngokweqile. Ake ngichaze ukuthi kungani.
Ngamagama alula, kwenzeka ngokweqile lapho imodeli yakho ilula kakhulu ukufunda iphethini yangempela kwidatha yakho. Kufana nokuzama ukudweba umugqa oqondile ngokuvunguza. Ngokweqile kuphambene. Imodeli ifunda kahle idatha yokuqeqesha, kufaka phakathi wonke umsindo, ngakho-ke yenza kahle kulokho esikubonile kodwa kahle kwidatha entsha. Indawo emnandi yindawo ephakathi nendawo, lapho imodeli yakho iqonda khona isakhiwo ngaphandle kokukhumbula.
Sizobuyela kulokhu futhi.
3. Icala elilula: Ukufunda komshini wendabuko
Sivula ngesimo sobungani: i-dataset ehlelwe kakhulu yama-tweets ayi-1 000 esawenza futhi abhala. Amakilasi amathathu (angathathi hlangothi, awathathi hlanza) alinganiselwe ngenhloso, ulimi lucacile kakhulu, futhi wonke umugqa uhlala ku-CSV ehlanzekile.
Ake siqale nge-block elula yokungenisa yekhodi.
Ake sibheke ukuthi i-dataset ibukeka kanjani:

Manje, silindele ukuthi lokhu ngeke ilinganise Ezigidini zemigqa (ngoba i-datasset ihlelwe kakhulu ukuba yihluka). Kodwa-ke, singakha indlela esheshayo futhi enembile yaleli cala elincane neliqondile lokusebenzisa. Ake siqale ngemodeli. Amaphuzu amathathu aphambili okufanele acatshangelwe:
- Senza Isitimela / Split Split ngo-20% wedathasethi kusethi yokuhlola.
- Sizosebenzisa a I-TF-IDF indlela yokuthola ukushumeka kwamagama. I-TF-IDF imele imvamisa yemfanelo yemfanelo yemfanelo. Kuyindlela ye-classic eguqula umbhalo ube yizinombolo ngokunikeza igama ngalinye isisindo ngokususelwa ekutheni libaluleke kangakanani kudokhumenti olwaqhathaniswa nedatha yonke.
- Sizohlanganisa le ndlela ngamamodeli amabili e-ML: Ukubuyiselwa Kwe-Logistic na- Imishini yokusekela ye-Vest Supportkusuka ku-skikit-funda. Ukuphindaphinda okune-Logistic kulula futhi kuhumusha, kuvame ukusetshenziswa njengesisekelo esinamandla sokuhlukaniswa kombhalo. Imishini yokusekela ye-vector igxile ekutholeni umngcele omuhle kakhulu phakathi kwamakilasi futhi imvamisa yenza kahle lapho idatha ingenamsindo omkhulu.
Futhi ukusebenza kulungele zombili amamodeli.

Kuleli cala elilula kakhulu, lapho sinedatha engaguquki yemigqa eyi-1 000, indlela yendabuko ithola umsebenzi owenziwe. Akunasidingo sezigidigidi zamamodeli wepharamitha afana ne-GPT.
4. Icala elinzima: Ukufunda okujulile
Idatha yesibili isantulwa, kepha yenzelwe ukucasula ngenhloso. Amalebula asele uthando, inzondo, futhi engathathi hlangothi, kepha ama-tweets ancike ekubhuqisweni, ithoni exubile, nezincomo ezihlekisayo. Phezu kwalokho, ichibi lokuqeqesha lincane ngenkathi ucezu lokuqinisekiswa luhlala lukhulu, ngakho-ke amamodeli asebenza nobufakazi obuncane kanye nokuphikiswa okwengeziwe.
Manje njengoba sinalokhu kucabanga, kudingeka sithathe izibhamu ezinkulu. Kunamamodeli wokufunda ajulile wokuvimbela okulondolozwa ukunemba okuqinile futhi akwazi ukukala kahle kulezi zimo (khumbula unxantathu kanye nephutha lokuphikisana nokubumbana kwesakhiwo!). Ikakhulu, amamodeli wokufunda ajulile wokuvimbela afunde okushiwo amagama avela kumongo wawo esikhundleni sokuwaphatha njengamathokheni awodwa.
Ngalesi sifundo sebhulogi, sizosebenzisa i-Bert, okuyinye yamamodeli adume kakhulu wokushumeka lapho. Ake siqale ngenisa eminye imitapo yolwazi:
… nabanye abasizi.
Ngenxa yale misebenzi, singakwazi ukuhlola ngokushesha imodeli yethu yokushumeka vs indlela ye-TF-IDF.


Njengoba sikwazi ukubona, imodeli ye-TF-IDF isebenza ngokweqile ngamalebula amahle, ngenkathi igcina ukunemba okuphezulu lapho usebenzisa imodeli yokushumeka (i-Bert).
I-5. Icala elinamandla elengeziwe: I-LLM Agent
Kulungile, manje ake senze izinto zilukhuni kakhulu:
- Sinakho kuphela Imigqa eyi-100.
- Siyacabanga Asazi amalebulaokusho ukuthi asikwazi ukuqeqesha noma iyiphi imodeli yokufunda yomshini.
- Sine isihlanu Amalebula: umona, inzondo, uthando, ukwenyanya, intukuthelo.

Njengoba singakwazi ukuqeqesha noma yini, kodwa sisafuna ukwenza ukuhlukaniswa kwethu, kufanele sisebenzise indlela ethile esevele inezigaba ezingaphakathi. Amamodeli amakhulu olimi ziyisibonelo esikhulu kunazo zonke zendlela enjalo.
Qaphela ukuthi uma sisebenzisa ama-LLMS kwamanye amacala amabili, kungaba njengokudubula impukane nge-cannon. Kepha lapha, kwenza umqondo ophelele: Umsebenzi uyinselelo, futhi asinayo indlela yokwenza noma yini ehlakaniphile, ngoba asikwazi ukuqeqesha imodeli yethu (asinalo ukuqeqeshwa kwethu (asinalo isethi yokuqeqeshwa).
Kulokhu, sinokunemba ngezinga elikhulu. Kodwa-ke, i-API ithatha isikhathi, ngakho-ke kufanele silinde umzuzwana noma ezimbili ngaphambi kokuphendula (khumbula unxantathu!).
Ake singenisa eminye imitapo yolwazi:
Futhi lokhu kungukubiza kwe-API Callication:
Futhi siyabona ukuthi i-LLM yenza umsebenzi omangalisayo wokuhlukaniswa:
6. Iziphetho
Eminyakeni eyishumi edlule, iqhaza lososayensi wedatha lishintshe kakhulu njengezobuchwepheshe uqobo. Lokhu kungaholela emcabangweni wokusebenzisa nje amathuluzi anamandla kakhulu laphaya, kepha akuyona indlela engcono kakhulu yamacala amaningi.
Esikhundleni sokufinyelela imodeli enkulu kuqala, sahlola inkinga eyodwa ngelensi elula: ukunemba, i-latency, nezindleko.
Ikakhulu, nakhu esikwenzile, isinyathelo ngesinyathelo:
- Sichaze icala lethu lokusebenzisa njengoba ukuhlukaniswa kwemizwa ye-tweet, okuhlose ukuthola uthando, inzondo, noma inhloso engathathi hlangothi. Sihlele ama-dataset amathathu wobunzima obandayo: okuhlanzekile, ukubhuqa, kanye nokuqeqeshwa okungu-zero.
- Silimaze icala elilula sisebenzisa i-TF-IDF ngokubuyiselwa okunengqondo kanye ne-SVM. Ama-tweets ayecacile futhi eqondile, futhi womabili amamodeli enza cishe ngokuphelele.
- Sithuthele esimweni esinzima, lapho ukubhuqa khona, ithoni exubile, kanye nomongo ocashile wenza umsebenzi ube yinkimbinkimbi ngokwengeziwe. Sisebenzise ukushumeka kwe-bert ukuthwebula izincazelo ezingaphezu kwamagama ngamanye.
- Okokugcina, ngecala elinamandla elingenazo imininingwane yokuqeqesha, sasebenzisa imodeli enkulu yolimi ukuhlukanisa imizwa ngqo ngokufunda okudutshulwe.
Isinyathelo ngasinye sikhombisile ukuthi ithuluzi elifanele lincike kanjani enkingeni. Indabuko ml ishesha futhi iyathembeka lapho imininingwane ihlelwe. Amamodeli wokufunda ajulile asiza lapho incazelo ecasha phakathi kwemigqa. I-LLMS inamandla uma unganamalebula noma udinga i-jwer ejwayelekile.
7. Ngaphambi kokuthi uphume!
Ngiyabonga futhi ngesikhathi sakho. Kusho okuningi ❤
Igama lami nginguPiero Paialanga, futhi ngingumuntu lapha:

Ngivela e-Italy, bamba i-PH.D. ukusuka Inyuvesi yaseCincinnatifuthi usebenze njenge- Usosayensi wedatha edeskini yezohwebo eNew York City. Ngibhala nge I-AI, Ukufunda Machine, kanye nendima evelayo yososayensi wedatha kokubili lapha ku-TDS nase-LinkedIn. Uma uthandile i-athikili futhi ufuna ukwazi kabanzi ngokufunda komshini futhi ulandele izifundo zami, unga:
A. Ngilandele I-LinkedInlapho ngishicilela khona zonke izindaba zami
B. Ngilandele Umuthi othile kikilapho ungabona khona yonke ikhodi yami
C. ngemibuzo, ungangithumela i-imeyili ku- piero.paiali@hotmail



