An LLM-Powered Spreadsheet Normalizer

This article is part of a series on automating data cleaning for any tabular dataset.
You can test the feature described in this article on your own dataset using the CleanMyExcel.io service, which is free and requires no registration.
Start with the why

Let's consider this Excel spreadsheet, which contains information on awards given to films. It is sourced from the book Cleaning Data for Effective Data Science and is available here.
This is a typical and common spreadsheet that anyone may own and deal with in their daily tasks. But what is wrong with it?
To answer that question, let us first recall the end goal of using data: to derive insights that help guide our decisions in our personal or business lives. This process requires at least two crucial things:
- Reliable data: clean data without issues, inconsistencies, duplicates, missing values, etc.
- Tidy data: a well-normalised data frame that facilitates processing and manipulation.
The second point is the primary foundation of any analysis, including dealing with data quality.
Returning to our example, imagine we want to perform the following actions:
1. For each film involved in multiple awards, list the award and year it is associated with.
2
3. Check that all actor/actress names are correct and well standardised.
Naturally, this example dataset is small enough to derive those insights by eye, or by hand if we restructure it (about as quickly as coding). But imagine now that the dataset contains the entire history of awards; this would be time-consuming, painful, and error-prone without any automation.
Reading this spreadsheet and understanding its structure directly is difficult for a machine, as it does not follow good practices of data arrangement. That is why tidying data is so important. By ensuring that data is structured in a machine-friendly way, we can simplify parsing, automate quality checks, and enhance business analysis, all without altering the actual content of the dataset.
Example of a reshaping of this data:

Now, anyone can use low/no-code tools or code-based queries (SQL, Python, etc.) to interact easily with this dataset and gain insights.
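To illustrate why a tidy layout makes such questions easy, here is a minimal pandas sketch. The rows and the column names (`film`, `award`, `year`, `person`) are invented for illustration and are not taken from the original spreadsheet.

```python
import pandas as pd

# Hypothetical tidy table: one row per award given to a film (values are made up).
awards = pd.DataFrame(
    {
        "film": ["Film A", "Film A", "Film B", "Film C", "Film C"],
        "award": ["Best Picture", "Best Director", "Best Actor", "Best Picture", "Best Actress"],
        "year": [2019, 2019, 2020, 2021, 2021],
        "person": [None, "Director X", "Actor Y", None, "Actress Z"],
    }
)

# For each film with more than one award, list the award and the year.
award_counts = awards.groupby("film")["award"].transform("count")
multi_award_films = awards.loc[award_counts > 1, ["film", "award", "year"]]
print(multi_award_films)

# List the distinct person names so that spelling variants can be reviewed.
names = sorted(awards["person"].dropna().str.strip().unique())
print(names)
```

With one observation per row, each question becomes a short filter or group-by rather than a manual scan of the sheet.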
The main challenge is how to turn a shiny, human-friendly spreadsheet into a machine-readable tidy version.
What is tidy data? A well-shaped data frame?
The term tidy data was described in a well-known article named Tidy Data by Hadley Wickham, published in the Journal of Statistical Software in 2014. Below are the key quotes needed to better understand the underlying concepts.
Data tidying
“Structuring datasets to facilitate manipulation, visualisation and modelling.”
“Tidy datasets provide a standardised way of linking the structure of a dataset (its physical layout) with its semantics (its meaning).”
Data structure
“Most statistical datasets are rectangular tables composed of rows and columns. The columns are almost always labelled, and the rows are sometimes labelled.”
Data semantics
“A dataset is a collection of values, usually either numbers (if quantitative) or strings (if qualitative). Values are organised in two ways. Every value belongs to both a variable and an observation. A variable contains all values that measure the same underlying attribute (such as height, temperature or duration) across units. An observation contains all values measured on the same unit (for example, a person, a day or a race) across attributes.”
“In a given analysis, there may be multiple levels of observation. For example, in a trial of a new allergy medication, we might have three types of observations:
- Demographic data collected from each person (age, sex, race),
- Medical data collected from each person on each day (number of sneezes, redness of eyes), and
- Meteorological data collected on each day (temperature, pollen count).”
Tidy data
“Tidy data is a standard way of mapping the meaning of a dataset to its structure. A dataset is considered messy or tidy depending on how its rows, columns and tables correspond to observations, variables and types. In tidy data:
- Each variable forms a column.
- Each observation forms a row.
- Each type of observational unit forms a table.”
Common problems with messy datasets
Column headers might be values rather than variable names.
- Messy example: A table where column headers are years (2019, 2020, 2021) instead of a “Year” column.
- Tidy version: A table with a “Year” column and each row representing an observation for a given year.
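A minimal pandas sketch of this reshaping, using `DataFrame.melt`. The `product` and `sales` names and all figures are invented for illustration.

```python
import pandas as pd

# Messy layout: the years are column headers (all values are made up).
messy = pd.DataFrame(
    {"product": ["A", "B"], "2019": [10, 7], "2020": [12, 9], "2021": [15, 11]}
)

# Tidy layout: a single "year" column, one row per (product, year) observation.
tidy = messy.melt(id_vars="product", var_name="year", value_name="sales")
tidy["year"] = tidy["year"].astype(int)
print(tidy)
```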
Multiple variables might be stored in one column.
- Messy example: A column named “Age_Gender” containing values like 28_Female.
- Tidy version: Separate columns for “Age” and “Gender”.
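A short pandas sketch of that split, assuming every value follows the `<age>_<gender>` pattern (real data would need handling for values that do not).

```python
import pandas as pd

# Messy layout: two variables packed into one column (values are made up).
messy = pd.DataFrame({"Age_Gender": ["28_Female", "35_Male", "41_Female"]})

# Split on the first underscore into two columns, then give "Age" a numeric type.
tidy = messy["Age_Gender"].str.split("_", n=1, expand=True)
tidy.columns = ["Age", "Gender"]
tidy["Age"] = tidy["Age"].astype(int)
print(tidy)
```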
Variables might be stored in both rows and columns.
- Messy example: A dataset tracking student test scores where subjects (Math, Science, English) are stored as column headers and also repeated in rows, instead of using a single “Subject” variable.
- Tidy version: A table with columns for “Student ID”, “Subject”, and “Score”, where each row represents one student's score for one subject.
Multiple types of observational units might be stored in the same table.
- Messy example: A sales dataset that contains both customer information and store inventory in the same table.
- Tidy version: Separate tables for “Customers” and “Inventory”.
A single observational unit might be stored in multiple tables.
- Messy example: A patient's medical records are split across multiple tables (Diagnosis table, Medication table) without a common patient ID linking them.
- Tidy version: A single table, or properly linked tables using a unique “Patient ID”.
Now that we have a better understanding of what tidy data is, let's see how to transform a messy dataset into a tidy one.
Thinking about the how
“Tidy datasets are all alike, but every messy dataset is messy in its own way.” Hadley Wickham (cf. Leo Tolstoy)
Although these guidelines sound clear in theory, they remain difficult to generalise in practice to any kind of dataset. In other words, starting from messy data, no simple or deterministic process or algorithm exists to reshape it. This is mainly explained by the singularity of each dataset. Indeed, it is surprisingly hard to precisely define variables and observations in general and then transform data automatically without losing content. That is why, despite massive improvements in data processing over the last decade, data cleaning and formatting are still done “manually” most of the time.
Thus, when complex, hard-coded, rule-based systems fall short (that is, when they cannot handle every context precisely by describing decisions in advance), machine learning models may offer some benefits. They give the system more freedom to adapt to any data by generalising from what it learnt during training. Many large language models (LLMs) have been exposed to numerous data processing examples, which makes them capable of analysing input data and performing tasks such as spreadsheet structure analysis, table schema estimation, and code generation.
Next, let's describe a workflow made of code and LLM-based modules, alongside business logic, to reshape any spreadsheet.

Spreadsheet encoder
This module is designed to serialise into text the main information needed from the spreadsheet data. Only the subset of cells that contributes to the table layout is retained, and non-essential or overly repetitive information is removed. By keeping only what is necessary, this step minimises token usage, reduces costs, and improves model performance. More details will be the topic of an upcoming article.
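The actual encoder is left for a later article, so the following is only a rough sketch of the general idea: keep the non-empty cells and emit them as compact `reference: value` lines. The function names, the grid-of-lists input, and the output format are all assumptions made for illustration, not the service's implementation.

```python
from string import ascii_uppercase


def column_letter(index: int) -> str:
    """Convert a 0-based column index to an Excel-style letter (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = ascii_uppercase[remainder] + letters
    return letters


def encode_sheet(grid: list[list[object]]) -> str:
    """Encode only the non-empty cells as 'A1: value' lines."""
    lines = []
    for row_index, row in enumerate(grid, start=1):
        for col_index, value in enumerate(row):
            if value is None or str(value).strip() == "":
                continue  # skip empty cells to save tokens
            lines.append(f"{column_letter(col_index)}{row_index}: {value}")
    return "\n".join(lines)


# Hypothetical sheet: a title row, a blank row, then a small table.
grid = [
    ["Film awards", None, None],
    [None, None, None],
    ["Film", "Award", "Year"],
    ["Film A", "Best Picture", 2019],
]
encoded = encode_sheet(grid)
print(encoded)
```

A real encoder would likely also need to represent merged cells and collapse long runs of similar values, which this sketch ignores.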
Table structure analysis
Before moving forward, asking an LLM to extract the spreadsheet structure is a crucial step in building the next actions. Here are examples of the questions addressed:
- How many tables are present, and what are their locations (regions) within the spreadsheet?
- What defines the boundaries of each table (e.g., empty rows/columns, specific markers)?
- Which rows/columns serve as headers, and do any tables have multi-level headers?
- Are there metadata sections, aggregated statistics, or notes that need to be filtered out or processed separately?
- Are there any merged cells, and if so, how should they be handled?
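One way to put these questions to a model is to combine them with the encoded sheet in a single prompt. The sketch below only builds the prompt text; the wording, the `build_structure_prompt` name, and the cell-per-line input format are assumptions, and the call to an actual LLM client is deliberately left out.

```python
# The questions come from the list above; the surrounding wording is illustrative.
STRUCTURE_QUESTIONS = [
    "How many tables are present, and what are their locations (regions) within the spreadsheet?",
    "What defines the boundaries of each table (e.g., empty rows/columns, specific markers)?",
    "Which rows/columns serve as headers, and do any tables have multi-level headers?",
    "Are there metadata sections, aggregated statistics, or notes that need to be "
    "filtered out or processed separately?",
    "Are there any merged cells, and if so, how should they be handled?",
]


def build_structure_prompt(encoded_sheet: str) -> str:
    """Assemble a structure-analysis prompt from the encoded spreadsheet text."""
    numbered = "\n".join(
        f"{number}. {question}"
        for number, question in enumerate(STRUCTURE_QUESTIONS, start=1)
    )
    return (
        "You are analysing the layout of a spreadsheet.\n"
        "Cells are given as '<cell reference>: <value>', one per line.\n\n"
        f"Spreadsheet:\n{encoded_sheet}\n\n"
        f"Answer each question:\n{numbered}\n"
    )


prompt = build_structure_prompt("A1: Film\nB1: Award\nA2: Film A\nB2: Best Picture")
print(prompt)
```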
Table schema estimation
Once the analysis of the spreadsheet structure has been completed, it is time to start thinking about the ideal target table schema. This involves letting the LLM proceed iteratively by:
- Identifying all potential columns (multi-row headers, metadata, etc.)
- Comparing columns for domain similarities based on column names and data semantics
- Grouping related columns
The module outputs a final schema with a name and a short description for each retained column.
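As a sketch of what such an output could look like in code, the snippet below parses a schema returned as JSON into a list of column specifications. The JSON shape, the `ColumnSpec` class, and the example response are assumptions chosen for illustration; the article does not specify the format the module uses.

```python
import json
from dataclasses import dataclass


@dataclass
class ColumnSpec:
    name: str
    description: str


def parse_schema(llm_response: str) -> list[ColumnSpec]:
    """Parse a JSON list of {"name", "description"} objects and check it is usable."""
    raw = json.loads(llm_response)
    columns = [
        ColumnSpec(name=item["name"].strip(), description=item["description"].strip())
        for item in raw
    ]
    if not columns:
        raise ValueError("schema has no columns")
    names = [column.name for column in columns]
    if len(set(names)) != len(names):
        raise ValueError("column names must be unique")
    return columns


# Hypothetical model response for a film awards table.
response = """[
  {"name": "film", "description": "Title of the film"},
  {"name": "award", "description": "Name of the award received"},
  {"name": "year", "description": "Year the award was given"}
]"""
schema = parse_schema(response)
print(schema)
```

Rejecting empty or duplicated column names early keeps obviously unusable schemas from reaching the code generation step.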
Code generation to format the spreadsheet
Given the previous structure analysis and the table schema, this last LLM-based module should draft code that transforms the spreadsheet into a proper data frame compliant with the table schema. Moreover, no useful content must be omitted (e.g., aggregated or computed values may still be derived from other variables).
As generating code that works from scratch at the first iteration is challenging, two internal iterative processes are added to revise the code if needed:
- Code checking: Whenever the code cannot be compiled or executed, the error trace is provided to the model so that it can update its code.
- Data frame validation: The metadata of the created data frame, such as column names, first and last rows, and statistics about each column, is checked to validate whether the table conforms to expectations. Otherwise, the code is revised accordingly.
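The two feedback loops can be sketched as a single retry loop. Everything here is an assumption made for illustration: `generate_code` stands in for the LLM call, a list of row dictionaries stands in for the data frame so the sketch stays dependency-free, and validation only checks column names. Running generated code with `exec` is only appropriate inside a sandbox.

```python
import traceback
from typing import Callable, Optional


def validate_rows(rows: object, expected_columns: list[str]) -> Optional[str]:
    """Return a description of the problem, or None if the output looks right."""
    if not isinstance(rows, list) or not rows:
        return "result must be a non-empty list of row dictionaries"
    for row in rows:
        if not isinstance(row, dict) or list(row.keys()) != expected_columns:
            return f"expected columns {expected_columns}, got {row!r}"
    return None


def generate_until_valid(
    generate_code: Callable[[Optional[str]], str],
    cells: list[list[object]],
    expected_columns: list[str],
    max_attempts: int = 3,
) -> list[dict]:
    """Ask for code, run it, and feed errors or validation issues back to the model."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        code = generate_code(feedback)
        namespace: dict = {"cells": cells}
        try:
            exec(code, namespace)  # untrusted code: run in a sandbox in practice
        except Exception:
            feedback = traceback.format_exc()  # code checking loop
            continue
        problem = validate_rows(namespace.get("result"), expected_columns)
        if problem is None:
            return namespace["result"]
        feedback = problem  # validation loop
    raise RuntimeError(f"no valid code after {max_attempts} attempts: {feedback}")


# Fake model for demonstration: fails, then returns wrong columns, then succeeds.
attempts: list = []


def fake_model(feedback: Optional[str]) -> str:
    attempts.append(feedback)
    if len(attempts) == 1:
        return "result = undefined_name"  # raises NameError when executed
    if len(attempts) == 2:
        return "result = [{'title': r[0]} for r in cells[1:]]"  # wrong columns
    return "result = [dict(zip(['film', 'year'], r)) for r in cells[1:]]"


cells = [["Film", "Year"], ["Film A", 2019], ["Film B", 2020]]
rows = generate_until_valid(fake_model, cells, ["film", "year"])
print(rows)
```

Capping the number of attempts keeps the loop from running indefinitely when the model cannot converge on valid code.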
Convert the data frame into an Excel file
Finally, if all the data fits properly into a single table, a worksheet is created from this data frame to respect the tabular format. The final asset returned is an Excel file whose active sheet contains the tidy spreadsheet data.
Et voilà! The sky's the limit for making the most of your newly tidy dataset.
Feel free to test it with your own dataset using the CleanMyExcel.io service, which is free and requires no registration.
Final note on the workflow
Why is a workflow proposed instead of an agent for that purpose?
At the time of writing, we consider that a workflow based on LLMs for precise sub-tasks is more robust, stable, iterable, and maintainable than a more autonomous agent. An agent may offer advantages: more freedom and liberty in the actions it takes to perform tasks. Nonetheless, agents may still be hard to deal with in practice; for example, they may diverge quickly if the objective is not clear enough. I believe this is our case, but that does not mean this model would not be applicable in the future, in the same way that SWE-agent coding is performing, for example.
Next articles in the series
In upcoming articles, we plan to explore related topics, including:
- A detailed description of the spreadsheet encoder mentioned earlier.
- Data validity: ensuring each column meets the expectations.
- Data uniqueness: preventing duplicate entities within the dataset.
- Data completeness: handling missing values effectively.
- Evaluating data reshaping, validity, and other key aspects of data quality.
Stay tuned!
Thank you to Marc Hobballah for reviewing this article and providing feedback.
All images, unless otherwise noted, are by the author.