A Beginner's Guide to Supervised Machine Learning

Machine learning (ML) lets computers learn patterns from data and make decisions on their own. Think of it as teaching machines to "learn from experience": we let the machine infer the rules from examples rather than hard-coding each one. This idea sits at the heart of the AI revolution. In this article, we cover what supervised learning is, its different types, and some common algorithms that fall under the supervised learning umbrella.
What Is Machine Learning?
Fundamentally, machine learning is the process of identifying patterns in data. The main idea is to build models that perform well when applied to new, unseen data. ML can be broadly divided into three areas:
- Supervised learning
- Unsupervised learning
- Reinforcement learning
A simple example: students in a classroom
- In supervised learning, a teacher gives students questions together with their answers (e.g., "2 + 2 = 4") and later quizzes them to check whether they have picked up the pattern.
- In unsupervised learning, students receive a pile of data or articles and group them by topic; they learn without labels by spotting similarities.
Now let's look at supervised machine learning in more technical terms.
What Is Supervised Machine Learning?
In supervised learning, the model learns from labeled data, using the input-output pairs in a dataset. The model learns the mapping between the inputs (also called features or independent variables) and the outputs (also called labels or dependent variables). The goal is to make predictions on unseen data based on this learned relationship. Supervised learning tasks fall into two main categories:
1. Classification
In classification, the output variable is categorical, meaning it falls into one of a fixed set of classes.
Examples:
- Email spam detection
  - Input: the email text
  - Output: spam or not spam
- Handwritten digit recognition (MNIST)
  - Input: an image of a digit
  - Output: a digit from 0 to 9
2. Regression
In regression, the output variable is continuous, meaning it can take any numeric value within a certain range.
Examples:
- House price prediction
  - Input: size, location, number of rooms
  - Output: the house price (in dollars)
- Stock price prediction
  - Input: previous prices, trading volume
  - Output: the next day's closing price
Supervised Learning Workflow
A typical supervised machine learning project follows the workflow below:
- Data collection: Gathering labeled data is the first step. This means collecting both the inputs (features, or independent variables) and the correct outputs (labels).
- Data preprocessing: Before training, the data must be cleaned and prepared, because real-world data is often messy and unstructured. This includes handling missing values, normalizing scales, encoding text as numbers, and putting the data into a suitable format.
- Train-test split: To check how well the model performs on new data, you split the dataset into two parts: one to train the model and one to test it. Typically, data scientists use about 70-80% of the data for training and hold out the rest for testing or validation; 80/20 and 70/30 splits are the most common.
- Model selection: Depending on the type of problem (classification or regression) and the nature of your data, you choose a learning algorithm, such as linear regression for predicting numbers or decision trees for classification tasks.
- Training: The training data is then used to fit the chosen model. In this step the model learns the underlying patterns and the relationships between the input features and the output labels.
- Evaluation: After training, the held-out test data is used to evaluate the model. Depending on whether the task is classification or regression, you assess performance with suitable metrics: accuracy, precision, recall, or F1-score for classification, and error measures such as mean squared error for regression.
- Prediction: Finally, the trained model predicts outputs for new, real-world data whose outcomes are unknown. If it performs well, teams can deploy it in applications such as price forecasting, fraud detection, and recommendation systems.
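The split and evaluation steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the helper names `train_test_split` and `classification_metrics`, the toy data, and the hard-coded predictions are all assumptions made for this example.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical toy dataset: (feature, label) pairs, label is 1 when feature >= 5.
data = [(x, int(x >= 5)) for x in range(10)]
train, test = train_test_split(data, test_ratio=0.2)  # 80/20 split

# Hypothetical true labels and model predictions, used only to exercise the metrics.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(len(train), len(test), metrics)
```

Fixing the shuffle seed keeps the split reproducible, which makes it easier to compare models on the same held-out rows.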
Common Supervised Learning Algorithms
Now let's look at some of the most widely used supervised learning algorithms. We keep things simple here and give an overview of what each algorithm does.
1. Linear Regression
Linear regression finds the best straight-line relationship (y = ax + b) between a continuous target (y) and the input features (x). It determines the optimal coefficients (a, b) by minimizing the sum of squared errors between the predicted and actual values. Because this problem has a closed-form solution, it is efficient for linear trends, such as predicting home prices from location or square footage. When the relationship is roughly linear and interpretability matters, its simplicity shines.
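For a single feature, the closed-form least-squares solution can be written out directly. The sketch below assumes one input variable and uses made-up area and price figures; `fit_line` is a name chosen for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - a * mean_x
    return a, b

# Hypothetical data: floor area vs. price, constructed so price = 3 * area.
areas = [50, 60, 80, 100]
prices = [150, 180, 240, 300]
a, b = fit_line(areas, prices)
predicted = a * 70 + b  # prediction for an unseen area of 70
print(a, b, predicted)
```

With several features the same idea generalizes to solving the normal equations, which is usually left to a numerical library.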
2. Logistic Regression
Despite its name, logistic regression is a classification method: it converts a linear output into a probability in order to handle binary classification. Using the sigmoid function (1 / (1 + e⁻ᶻ)), it outputs values between 0 and 1 that represent the probability of belonging to a class (e.g., "Cancer risk: 87%"). The decision boundary is set by a probability threshold (typically 0.5). Because it outputs probabilities, it suits medical diagnosis, where understanding uncertainty matters as much as making accurate predictions.
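The sigmoid and the thresholding step can be illustrated as follows. The weights and bias here are hypothetical values picked for the example, not fitted parameters, and the function names are assumptions.

```python
import math

def sigmoid(z):
    """Map any real number to (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, bias):
    """Probability of the positive class: sigmoid of the linear score."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 decision at the given threshold."""
    return int(predict_proba(features, weights, bias) >= threshold)

# Hypothetical (not learned) parameters for a two-feature model.
weights, bias = [0.8, -0.4], -1.0
p = predict_proba([3.0, 1.0], weights, bias)  # linear score z is about 1.0
label = predict_label([3.0, 1.0], weights, bias)
print(round(p, 3), label)
```

Raising the threshold above 0.5 trades recall for precision, which can be useful when false positives are costly.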

3. Decision Trees
Decision trees are a simple machine learning tool used for both classification and regression tasks. These easy-to-follow models use feature thresholds (such as "income > $50k?") to split the data hierarchically. Algorithms such as CART choose, at each node, the split that maximizes information gain (the reduction in entropy or variance) in order to separate classes or predict values. The final predictions are produced at the terminal leaves. Although they risk overfitting noisy data, their white-box nature helps banks explain loan denials ("denied because credit score < 600 and debt ratio > 40%").
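To make "information gain" concrete, the sketch below scores candidate thresholds on a single feature using entropy. The loan data and function names are invented for illustration, and real tree libraries may use other impurity measures such as Gini.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature_index, threshold):
    """Entropy reduction from splitting on rows[i][feature_index] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature_index] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature_index] > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted

# Hypothetical loan data: one feature (income in thousands) and a decision.
rows = [[30], [45], [55], [80]]
labels = ["deny", "deny", "approve", "approve"]
gain_50 = information_gain(rows, labels, 0, 50)  # separates the classes perfectly
gain_40 = information_gain(rows, labels, 0, 40)  # leaves one side mixed
print(gain_50, round(gain_40, 4))
```

A tree builder would evaluate every candidate threshold like this, keep the one with the highest gain, and recurse on each side.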

4. Random Forest
An ensemble method that uses random feature samples and random subsets of the data to build many decorrelated decision trees. It aggregates their predictions by majority vote for classification and by averaging for regression. In credit risk modeling, where a single tree may mistake noise for a pattern, it is robust because combining many "weak learners" reduces variance and overfitting.
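The sampling and aggregation pieces of a random forest are small enough to sketch on their own. The example below shows only bootstrap sampling and the two aggregation rules, not tree construction; the per-tree predictions are hypothetical values standing in for real tree outputs.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) items with replacement: one tree's training subset."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Classification: return the most common predicted class."""
    return Counter(predictions).most_common(1)[0][0]

def average(predictions):
    """Regression: return the mean of the per-tree predictions."""
    return sum(predictions) / len(predictions)

rng = random.Random(42)
rows = list(range(10))
sample = bootstrap_sample(rows, rng)  # typically contains repeated rows

# Hypothetical outputs from three trees for one loan applicant.
class_votes = ["default", "repay", "default"]
price_estimates = [200.0, 220.0, 240.0]
print(sample, majority_vote(class_votes), average(price_estimates))
```

Because each tree sees a different sample, their individual errors tend to differ, and the aggregate is usually more stable than any single tree.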

5. Support Vector Machines (SVM)
In a high-dimensional space, SVMs find the hyperplane that separates the classes with the largest possible margin. To handle non-linear boundaries, they implicitly map the data into higher dimensions using the kernel trick (for example, with an RBF kernel). On text or genomic data, where the classification is determined by only a few key features, the focus on "support vectors" (the critical cases near the boundary) makes them efficient.
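As an illustration of the kernel idea, the sketch below computes an RBF kernel and evaluates a kernelized decision function. The support vectors, coefficients, and `gamma` value are hypothetical; a real SVM would obtain them by solving an optimization problem that is not shown here.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """RBF similarity: 1.0 for identical points, decaying toward 0 with distance."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def decision_function(x, support_vectors, coefs, bias, gamma=0.5):
    """Signed score: weighted sum of kernel similarities to the support vectors."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coefs)) + bias

# Hypothetical support vectors; the coefficient sign encodes the class.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
coefs = [-1.0, 1.0]
bias = 0.0

score_near_positive = decision_function([1.8, 1.9], support_vectors, coefs, bias)
score_near_negative = decision_function([0.1, 0.2], support_vectors, coefs, bias)
print(score_near_positive > 0, score_near_negative > 0)
```

Only the support vectors appear in the sum, which is why prediction cost depends on their number rather than on the full training set.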

6. K-Nearest Neighbors (KNN)
A lazy, instance-based algorithm that classifies a point by the majority vote of its k nearest neighbors in feature space. Similarity is measured with a distance metric (Euclidean or Manhattan), and k controls how smooth the decision boundary is. It has no training phase and adapts immediately to new data, which makes it a good fit for recommender systems that base movie suggestions on the preferences of similar users.
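A minimal KNN classifier fits in a few lines. The sketch below uses Euclidean distance and a brute-force search; the points and genre labels are made up for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k closest training points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: euclidean(train_points[i], query))
    nearest = [train_labels[i] for i in order[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical user-preference vectors and the genre each user favors.
points = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
genres = ["drama", "drama", "drama", "action", "action", "action"]

print(knn_predict(points, genres, [2, 2]))  # query close to the drama cluster
print(knn_predict(points, genres, [7, 8]))  # query close to the action cluster
```

All the work happens at prediction time, so large datasets usually call for an index structure rather than this linear scan.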

7. Naive Bayes
This probabilistic classifier makes the bold assumption that the features are conditionally independent given the class, which lets it apply Bayes' theorem directly. Despite this "naivety," it uses simple frequency counts to compute posterior probabilities quickly. Thanks to its O(n) complexity and tolerance of sparse data, real-time spam filters can scan millions of emails with it.
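The frequency-count approach can be sketched as a small multinomial Naive Bayes text classifier. The add-one (Laplace) smoothing, the function names, and the four tiny "emails" are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in zip(documents, labels):
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def predict_naive_bayes(model, words):
    """Return the class with the highest log posterior for the given words."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in words:
            # Add-one smoothing keeps unseen words from zeroing the product.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training emails, already tokenized.
documents = [["win", "money", "now"], ["cheap", "money", "offer"],
             ["meeting", "at", "noon"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(documents, labels)
print(predict_naive_bayes(model, ["money", "offer"]))
print(predict_naive_bayes(model, ["meeting", "notes"]))
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long messages.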

8. Gradient Boosting (XGBoost, LightGBM)
A sequential ensemble in which each new weak learner (a tree) corrects the errors of its predecessors. It fits the residuals, using gradient descent to optimize a loss function (such as squared error). By adding regularization and parallel processing, advanced implementations such as XGBoost have come to dominate Kaggle competitions, achieving high accuracy on tabular data with complex interactions.
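The residual-fitting loop can be illustrated with one-split trees (stumps) on a single feature and squared-error loss. This is a teaching sketch under those assumptions, not how XGBoost or LightGBM are implemented; the data, names, and default settings are invented for the example.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split (by squared error) for predicting residuals.
    Assumes xs contains at least two distinct values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_gradient_boosting(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps

def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

def mean_squared_error(model, xs, ys):
    return sum((y - boosted_predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical one-feature regression data with a step-like shape.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 3.1, 3.0, 5.2, 5.1]
baseline = fit_gradient_boosting(xs, ys, n_rounds=0)   # mean-only model
boosted = fit_gradient_boosting(xs, ys, n_rounds=20)
mse_baseline = mean_squared_error(baseline, xs, ys)
mse_boosted = mean_squared_error(boosted, xs, ys)
print(mse_baseline, mse_boosted)
```

The learning rate shrinks each stump's contribution, so more rounds are needed but each step is less likely to overshoot.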

Real-World Applications
Some applications of supervised learning are:
- Healthcare: Supervised learning has transformed diagnostics. Convolutional neural networks (CNNs) classify tumors in MRI scans with a reported 95% accuracy, while regression models predict patient outcomes or drug efficacy. For example, Google's LYNA detects breast cancer metastases faster than human pathologists, enabling earlier intervention.
- Finance: Banks use classifiers for credit scoring and fraud detection, analyzing transaction patterns to spot anomalies. Regression models use historical market data to predict loan defaults or stock trends. By automating document analysis, JPMorgan's COIN platform reportedly saves 360,000 staff hours per year.
- Retail and marketing: Amazon's recommendation engines use a combination of techniques known as collaborative filtering to suggest products, reportedly increasing sales by 35%. Regression models forecast demand spikes to optimize inventory, while classifiers use purchase history to predict customer churn.
- Autonomous systems: Self-driving cars rely on real-time object detectors such as YOLO ("You Only Look Once") to identify pedestrians and road signs. Regression models estimate collision risk and steering angles, enabling safe navigation in dynamic environments.
Critical Challenges and Mitigations
Challenge 1: Overfitting vs. Underfitting
Overfitting occurs when a model memorizes the noise in its training data and then fails on new data. Solutions include regularization (penalizing complexity), cross-validation, and ensemble methods. Underfitting arises from an overly simple model; fixes include feature engineering or more expressive algorithms. Balancing the two gives the best generalization.
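Cross-validation, one of the remedies mentioned above, rotates which part of the data is held out. The sketch below only generates the index splits; `k_fold_indices` is a name chosen for this example, and it assumes the rows have already been shuffled.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(len(train_idx), val_idx)
```

Training once per fold and averaging the validation scores gives a less noisy estimate of generalization than a single train-test split.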
Challenge 2: Data Quality and Bias
Biased data produces discriminatory models, especially when the bias enters during sampling. Mitigations include synthetic data generation (SMOTE), fairness-aware algorithms, and diverse data sourcing. Rigorous auditing and "model cards" that document a model's limitations improve transparency and accountability.
Challenge 3: The "Curse of Dimensionality"
High-dimensional data (for example, 10k features) requires far more samples to avoid sparsity. Dimensionality reduction techniques such as PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) compress these sparse features while retaining the informative content, allowing analysts to make difficult decisions with greater efficiency and accuracy.
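To show what PCA computes, the sketch below finds the first principal component with power iteration on the covariance matrix and projects a row onto it. It is a simplified illustration: the function names and data are assumptions, and production code would use a linear algebra library.

```python
import math

def top_principal_component(rows, n_iter=100):
    """Return (feature means, unit vector of greatest variance).
    Assumes the data is not constant and that the all-ones start vector
    is not orthogonal to the leading component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(n_iter):
        # Repeatedly applying the covariance matrix pulls v toward
        # the eigenvector with the largest eigenvalue.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(row, means, component):
    """Coordinate of a row along the component: d features become 1 number."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))

# Hypothetical data: the second feature is twice the first, the third is constant.
rows = [[1, 2, 0], [2, 4, 0], [3, 6, 0], [4, 8, 0]]
means, component = top_principal_component(rows)
reduced = [project(r, means, component) for r in rows]
print(component, reduced)
```

Here three features collapse to one coordinate with no loss, because all the variance lies along a single direction.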
Conclusion
Supervised machine learning (SML) bridges the gap between raw data and intelligent action. By learning from labeled examples, it enables systems to make accurate predictions and informed decisions, from spam filtering and fraud detection to market forecasting and healthcare support. In this guide, we covered the basic workflow, the key task types (classification and regression), and the essential algorithms that power real-world applications. SML continues to form the backbone of many technologies we rely on every day, often without noticing.