Improve the Precision of Recommendation Systems with LLMs, Using Python

A well-known saying in American culture is the following:
“You can't have your cake and eat it too.”
I find this saying very poetic, but also very practical and useful. Its message is straightforward: everything you achieve comes through a trade-off, because everything has a cost.
A philosophical discussion is out of scope for this article, but the practical consequences of this idea are highly relevant to data science and to software engineering in general. Let me explain.
In software engineering and data science, there is no such thing as a “perfect design” per se. The same algorithm that works great for one application fails miserably in others.
Consider the compute versus memory trade-off in the following cases:
It makes a lot of sense to precompute the distance between two cities and store it in a dataset, and it makes no sense to compute it on the fly. You expect the dataset to need very little maintenance (cities don't just move around), and it would be silly to recompute the distance between New York and San Francisco every fraction of a second. [Case A]
However, it would be equally silly (and probably impossible) for a chatbot to memorize every question a person could ask and then look up the answer whenever that question comes in. The nature of the problem is very dynamic, and it requires computation “on the fly”. [Case B]
In Case A, we spend memory and get a very fast answer. In Case B, we spend a lot of computation time, but we don't use any “query” memory.
Can you have no computation time and no memory? Not really, because you can't have your cake and eat it too 🙂
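To make Case A concrete, here is a minimal sketch of trading memory for computation with a cache. The coordinates, the function name `city_distance_km`, and the `call_count` counter are illustrative assumptions, not code from the article's repository.

```python
import math
from functools import lru_cache

call_count = 0  # how many times the distance was actually computed

# Approximate (latitude, longitude) pairs, used for illustration only.
CITIES = {
    "New York": (40.7128, -74.0060),
    "San Francisco": (37.7749, -122.4194),
}


@lru_cache(maxsize=None)  # memory is spent here so that compute is not repeated
def city_distance_km(a: str, b: str) -> float:
    """Great-circle (haversine) distance between two known cities, in km."""
    global call_count
    call_count += 1
    lat1, lon1 = CITIES[a]
    lat2, lon2 = CITIES[b]
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


first = city_distance_km("New York", "San Francisco")   # computed once
second = city_distance_km("New York", "San Francisco")  # served from the cache
```

The second call returns the stored value without running the formula again, which is exactly the Case A trade: a little memory in exchange for no repeated computation.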
But let's take a less obvious and more “trendy” example. Let's talk about Large Language Models (LLMs).
LLMs are the most powerful AI models we have, and they are trained on a huge share of the knowledge available in the world. They are also big. In fact, they are so big that we rarely host them in house, and we usually call them through APIs. However, API call = tokens = cost.
Now imagine you want to use an intelligent system to pick the best restaurant for tonight. You could ask ChatGPT something like: “Can you suggest an Italian restaurant that is not too expensive but romantic and in a nice area?”
Now imagine the GPT model had to inspect every restaurant in the whole area and decide whether it is Italian, whether it is affordable, whether it is in a nice area, and whether it is close to you. Best-case scenario: you would burn millions of tokens, and you would already be asleep in bed by the time the computation finished.
At the same time, we don't want to give up entirely on the natural-language understanding and knowledge-retrieval power of LLMs. The key point is that, to use an LLM and get intelligent results, we cannot run the smartest part of the pipeline on everything, all the time (that would be like having your cake and eating it too).
In this article, I will give you a recipe for these smart, LLM-enhanced recommendation systems, using the restaurant recommendation example above as the use case.
The input to this system will be the user's description of the ideal restaurant in a given city, and the output will be a set of recommended restaurants.
Let's get started!
1. System Design
The cake we discussed is also known in engineering as the Accuracy-Scale-Time triangle:
- You can do something accurate and on a large dataset, but it will be slow.
- You can do something accurate and fast, but it won't scale well to a large dataset.
- You can do something fast that scales well, but it won't be very accurate.
Of course, we want our final results to be accurate, so option 3 alone won't cut it. However, we can refine option 3 by putting a more accurate model on top of it. Option 3 gives us a good list of candidates with very little computation time, and we then select a more precise list of recommendations using a Large Language Model.
Concretely, the design looks like this:
- A fast and simple search finds the K nearest restaurants (rule-based, high recall, low precision).
- A slower, much smarter Language Model helps us pick, among the top K, the best ones for the query (AI-based, high precision).
By doing this, we don't waste time and money on the slow LLM, but we still get its intelligence by applying it to a curated list of candidates.
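The two-stage design above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the function names `stage_one` and `stage_two`, and the `keyword_scorer` (a trivial stand-in for the LLM call) are all hypothetical and not the article's actual pipeline.

```python
from typing import Callable, Dict, List

Restaurant = Dict[str, object]  # hypothetical record: name, distance_km, tags


def stage_one(restaurants: List[Restaurant], k: int) -> List[Restaurant]:
    """Cheap, rule-based filter: keep the k nearest restaurants."""
    return sorted(restaurants, key=lambda r: r["distance_km"])[:k]


def stage_two(
    candidates: List[Restaurant],
    query: str,
    scorer: Callable[[str, Restaurant], float],
    top_n: int,
) -> List[Restaurant]:
    """Expensive re-ranking: the scorer only ever sees the shortlist."""
    ranked = sorted(candidates, key=lambda r: scorer(query, r), reverse=True)
    return ranked[:top_n]


def keyword_scorer(query: str, restaurant: Restaurant) -> float:
    """Stand-in for the LLM: counts query words present in the tags."""
    return sum(word in restaurant["tags"] for word in query.lower().split())


# Made-up data for illustration.
restaurants = [
    {"name": "A", "distance_km": 0.5, "tags": {"korean"}},
    {"name": "B", "distance_km": 1.2, "tags": {"vegan", "tacos", "cheap"}},
    {"name": "C", "distance_km": 2.0, "tags": {"mexican", "tacos"}},
    {"name": "D", "distance_km": 9.0, "tags": {"vegan", "tacos", "cheap"}},
]

shortlist = stage_one(restaurants, k=3)
picks = stage_two(shortlist, "cheap vegan tacos", keyword_scorer, top_n=2)
```

Note that restaurant D matches the query perfectly but never reaches the scorer, because Stage 1 filtered it out on distance. That is the price of the funnel: the expensive step is only as good as the shortlist it receives.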
Enough talking. Let's start coding!
2. The Script
2.1 Setup
I did the dirty work for you behind the scenes 🙂
Everything is written in an object-oriented programming (OOP) style, with documentation and a pipeline that takes care of the whole process. The GitHub folder is this one, and to reproduce the rest of the code, you can clone it and use this import block:
2.2 Data Generation
Before we recommend anything, we need something to recommend. In a real system, we would use a restaurant database sitting in an S3 location. In this article, we generate a synthetic one so that everything is reproducible and runs for free.
This is the job of the RestaurantDataGenerator class inside datagenerator.py. It builds a reproducible table of ~10,000 restaurants scattered across eight cities (New York, San Francisco, Chicago, Austin, Seattle, Boston, Miami, and Denver). Each restaurant gets:
– a randomly assembled name
– a city and a latitude/longitude sampled around that city's center (within ~13 km),
– a cuisine style (Italian, Japanese, Mexican, Thai, French, …),
– a dietary profile (omnivore/vegetarian/vegan)
– a rating score
– a number of votes
– a price range (10 / 100 / 1000, the order-of-magnitude average ticket per person).
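A generator along these lines might look like the sketch below. It is not the repository's RestaurantDataGenerator: the function name `generate_restaurants`, the name fragments, the rating and vote ranges, and the reduced set of three cities are assumptions made to keep the example short.

```python
import math
import random

# Three of the eight cities, with approximate centers (assumed values).
CITY_CENTERS = {
    "New York": (40.7128, -74.0060),
    "Boston": (42.3601, -71.0589),
    "Miami": (25.7617, -80.1918),
}
CUISINES = ["Italian", "Japanese", "Mexican", "Thai", "French"]
DIETS = ["omnivore", "vegetarian", "vegan"]
PRICE_RANGES = [10, 100, 1000]
NAME_FIRST = ["Golden", "Royal", "Urban", "Little"]  # hypothetical fragments
NAME_LAST = ["Spoon", "Fork", "Tavern", "House"]
KM_PER_DEGREE = 111.0  # rough length of one degree of latitude


def generate_restaurants(n: int, seed: int = 42, radius_km: float = 13.0) -> list:
    """Return n synthetic restaurant rows; the same seed gives the same table."""
    rng = random.Random(seed)
    rows = []
    for restaurant_id in range(n):
        city = rng.choice(list(CITY_CENTERS))
        lat0, lon0 = CITY_CENTERS[city]
        # Sample a point uniformly inside a disc of radius_km around the center.
        r = radius_km * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        lat = lat0 + (r * math.sin(theta)) / KM_PER_DEGREE
        lon = lon0 + (r * math.cos(theta)) / (KM_PER_DEGREE * math.cos(math.radians(lat0)))
        rows.append({
            "restaurant_id": restaurant_id,
            "name": f"{rng.choice(NAME_FIRST)} {rng.choice(NAME_LAST)}",
            "city": city,
            "lat": lat,
            "lon": lon,
            "cuisine": rng.choice(CUISINES),
            "diet": rng.choice(DIETS),
            "rating": round(rng.uniform(1.0, 5.0), 1),  # assumed 1.0-5.0 scale
            "votes": rng.randint(1, 5000),              # assumed range
            "price_range": rng.choice(PRICE_RANGES),
        })
    return rows


table = generate_restaurants(100)
```

Seeding a private `random.Random` instance, rather than the global generator, is what keeps the table reproducible regardless of other random calls in the program.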
The generator is designed to be run once. Generating the data is as simple as:
That single call writes the table to data/restaurants.csv, which looks like this:
Great. Now that we have our restaurants, let's see how we can recommend them.
2.3 Candidate Generation
This is Stage 1 of the funnel: a cheap, fast, rule-based candidate list. The user tells us which city they are in, and we keep only the geographically closest restaurants. The code filters the table down to the city, computes the great-circle distance from the user to every restaurant, and keeps the N_DISTANCE_CANDIDATES nearest ones (50 by default).
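As a rough sketch of this step, the snippet below filters by city, computes the haversine (great-circle) distance, and keeps the nearest N. The function names, the dictionary-based rows, and the sample coordinates are assumptions; the repository's own implementation may differ.

```python
import heapq
import math

N_DISTANCE_CANDIDATES = 50  # default mentioned in the text
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_candidates(restaurants, city, user_lat, user_lon, n=N_DISTANCE_CANDIDATES):
    """Keep only restaurants in `city`, then return the n closest to the user."""
    in_city = [r for r in restaurants if r["city"] == city]
    return heapq.nsmallest(
        n, in_city, key=lambda r: haversine_km(user_lat, user_lon, r["lat"], r["lon"])
    )


# Made-up rows for illustration.
sample = [
    {"name": "Far", "city": "Boston", "lat": 42.45, "lon": -71.06},
    {"name": "Near", "city": "Boston", "lat": 42.361, "lon": -71.059},
    {"name": "Elsewhere", "city": "Miami", "lat": 25.76, "lon": -80.19},
]
shortlist = nearest_candidates(sample, "Boston", 42.3601, -71.0589, n=2)
```

`heapq.nsmallest` avoids sorting the whole city when only a handful of rows are needed. On a table of this size a full sort would also be fine.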
This stage is deliberately high recall, low precision. This way, we can process the whole table (10k restaurants) without a single API call or any token cost. Sure, we are not doing anything smart or fancy here, but we are effectively throwing away all the data that cannot possibly be a candidate for the user. That alone is a big deal.
For example, let's try a realistic query in the search:
“cheap vegan tacos with a fun vibe” across multiple cities
This outputs:
Notice how this shortlist knows nothing about “vegan”, “cheap”, or “tacos”: it only knows about distance. That is fine, because the goal of this stage is to build a starting pool in the right city, which the LLM will then refine in Stage 2.
Let's get ready for the LLM!
2.4 Candidate Selection
This is Stage 2: the slow, smart, LLM-driven, high-precision end of the funnel. It builds directly on the shortlist of 50 restaurants from 2.3. The LLM never sees the full 10,000-row table; it only ever sees the small, already-relevant slice handed over by the distance filter.
We talk to the model through a thin OpenAI client. The key is read from OPENAI_API_KEY (stored in your local environment). The recommender, defined as RestaurantRecommender, runs on a query and a city via RestaurantRecommender.recommender(query,city):
There are a few things to call out:
- Precision goes up. Stage 1 was high recall, low precision: it returned the 50 nearest restaurants regardless of the request. Stage 2 actually reads the query (cheap vegan tacos with a fun vibe), discards everything that doesn't fit, and returns only the best 5 to 10 with an honest fit_score.
- Structured output with Pydantic. We never parse free-form text. The model is forced to answer in the shape of a Pydantic model (via OpenAI structured outputs), so every response is guaranteed to match the schema.
The output schema carries a restaurant_id and a name (both coming from the candidates), a fit_score between 0 and 100, and a short reason. The response is also wrapped up with a summary. Making the call for our three cities gives, for example:
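The schema described above could be expressed roughly as follows. The field names come from the text; the class names, the field types, the `recommendations` list name, and the `parse_response` helper are assumptions, and the sample JSON is made up rather than real model output.

```python
import json
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Recommendation(BaseModel):
    """One recommended restaurant (class name is hypothetical)."""
    restaurant_id: int                         # type is an assumption
    name: str
    fit_score: int = Field(..., ge=0, le=100)  # must lie between 0 and 100
    reason: str


class RecommendationResponse(BaseModel):
    """Full answer: the picks plus a summary (wrapper name is hypothetical)."""
    recommendations: List[Recommendation]
    summary: str


def parse_response(raw_json: str) -> RecommendationResponse:
    """Validate a JSON string against the schema; raises ValidationError if it does not fit."""
    return RecommendationResponse(**json.loads(raw_json))


# Made-up payload, only to show the expected shape.
sample_json = json.dumps({
    "recommendations": [
        {"restaurant_id": 1, "name": "Example Cantina", "fit_score": 90,
         "reason": "Vegan, Mexican, low price range."}
    ],
    "summary": "One strong match near the user.",
})
parsed = parse_response(sample_json)
```

With structured outputs the model is asked to produce this shape directly, so validation like this acts as a safety net rather than a parser: a score of 150 or a missing reason fails loudly instead of slipping into the results.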
If you look closely, this is much better than the raw distance shortlist from 2.3. There, the nearest restaurants in each city were essentially random (Korean, Lebanese, Mexican-but-vegetarian). Here, the model has re-ranked the same 50 candidates around what we actually asked for: vegan and Mexican places float to the top with high fit_scores, and the model is honest when nothing fits perfectly, marking partial matches down and explaining why in reason. That is the precision the LLM buys us, applied to a shortlist small enough to stay cheap at scale.
3. Results
Let's step back and look at what the two-stage funnel bought us, using the same query across all three cities: “cheap vegan tacos with a fun vibe”.
- Stage 1 gives us a candidate list. The distance shortlist from 2.3 was high recall and low precision by design.
- Stage 2 pinpoints the actual recommendations. Feeding the 50 candidates from Stage 1 to the LLM re-ranks them by what the user asked for.
Here are the final picks the model returned for each city:
- New York: Golden Spoon (vegan, 4.9) and Maison Fork (Mexican, budget) rise to the top with fit scores of 90 and 85.
- Miami: Royal Tavern & Co. (vegan, Mexican, affordable) leads at 85.
- Boston: Urban Spoon and Little House, both budget Mexican spots, take the top two slots at 90 and 85.
Across all cities, the model promoted candidates that matched the vegan, cheap, and Mexican/tacos intent, and it was honest about imperfect fits: places that nail the cuisine but not the diet (or vice versa) were kept as fallbacks with visibly lower fit_scores.
4. Conclusions
Thank you for spending time with me, it means a lot. ❤️ Here's what we did together:
– Built a two-stage recommendation funnel that is both efficient and smart.
– Used a cheap, rule-based distance filter (Stage 1) to narrow 10,000 restaurants down to the 50 nearest.
– Used an LLM re-ranker (Stage 2) to turn those 50 candidates into the best 5 to 10, with an honest score and a reason for each.
In many real-world projects, a funnel like the one we built here tends to be a very popular choice. These kinds of systems are efficient, because the LLM is used sparingly, and smart, because we rely on models that understand context very well.
5. Before you head out!
Thank you again for your time. It means a lot. My name is Piero Paialunga.

I am originally from Italy, hold a Ph.D. from the University of Cincinnati, and work as a Data Scientist at The Trade Desk in New York City. I write about AI, Machine Learning, and the evolving role of data scientists both here on TDS and on LinkedIn. If you liked this article and want to know more about machine learning and follow my work, you can:
A. Follow me on LinkedIn, where I publish all my stories
B. Follow me on GitHub, where you can see all my code
C. For questions, you can send me an email at piero.paialunga@hotmail



