
4 Pandas Concepts That Quietly Break Your Data Pipelines

When I first started using Pandas, I thought I was doing pretty well.

I could clean datasets, run groupby, merge tables, and build quick analyses in a Jupyter notebook. Most tutorials made it sound straightforward: load the data, transform it, visualize it, and you're done.

And to be fair, my code usually worked.

Until it didn't.

At some point, I started running into strange issues that were hard to explain. Numbers didn't add up the way I expected. A column that looked numeric behaved like text. Sometimes a transformation ran without errors but produced results that were clearly wrong.

The frustrating part was that Pandas rarely complained.
There were no obvious exceptions or crashes. The code ran fine; it simply produced incorrect results.

That's when I realized something important: most Pandas tutorials focus on what you can do, but they rarely explain how Pandas behaves under the hood.

Things like:

  • How Pandas handles data types
  • How index alignment works
  • The difference between a copy and a view
  • How to write defensive data manipulation code

These concepts don't sound exciting when you first learn Pandas. They're not as flashy as groupby tricks or attractive visualizations.
But they are exactly the things that prevent silent bugs in real-world data pipelines.

In this article, I'll walk through four Pandas concepts that most tutorials skip, the same ones that kept causing subtle bugs in my own code.

Once you understand these concepts, your Pandas workflows become much more reliable, especially when your analyses start turning into production data pipelines instead of one-off notebooks.
Let's start with one of the most common sources of problems: data types.

A Small Dataset (and a Subtle Bug)

To make these concepts concrete, let's work with a small e-commerce dataset.

Imagine we're analyzing orders from an online store. Each row represents one order and includes revenue and discount information.

import pandas as pd
orders = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004],
    "customer_id": [1, 2, 2, 3],
    "revenue": ["120", "250", "80", "300"],  # looks numeric
    "discount": [None, 10, None, 20]
})
orders

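Output:

   order_id  customer_id revenue  discount
0      1001            1     120       NaN
1      1002            2     250      10.0
2      1003            2      80       NaN
3      1004            3     300      20.0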

At first glance, everything looks normal. We have revenue values, some discounts, and a few missing entries.

Now let's answer a simple question:

What is the total revenue?

orders["revenue"].sum()

You might expect something like this:

750

Instead, Pandas returns:

'12025080300'

This is a perfect example of what I mentioned earlier: Pandas often fails silently. The code runs successfully, but the output is not what you expect.

The reason is subtle but very important:

The revenue column looks numeric, but Pandas actually stores it as text.

We can confirm this by checking the DataFrame's data types.

orders.dtypes

This small detail introduces one of the most common sources of bugs in Pandas workflows: data types.

Let's fix that next.

1. Data Types: The Hidden Source of Many Pandas Bugs

The problem we just saw comes down to something simple: data types.
Although the revenue column looks numeric, Pandas interpreted it as object (essentially text).
We can confirm that:

orders.dtypes

Output:

order_id int64 
customer_id int64 
revenue object 
discount float64 
dtype: object

Because revenue is stored as text, operations behave differently. When we asked Pandas to sum the column earlier, it concatenated the strings instead of adding the numbers, which is how we ended up with '12025080300'.

This kind of problem shows up surprisingly often when working with real datasets. Data imported from spreadsheets, CSV files, or APIs often stores numbers as text.
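As a minimal sketch of how this can happen on import, the snippet below reads a made-up CSV string (the data and the variable names are assumptions for illustration) in which one revenue value carries a thousands separator. That single value is enough to make the whole column load as text, and the thousands parameter of read_csv is one way to tell the parser what to expect.

```python
import io
import pandas as pd

# Hypothetical CSV export where one revenue value uses a thousands separator.
csv_text = 'order_id,revenue\n1001,"1,200"\n1002,250\n'

# Default parsing: "1,200" cannot be read as a number,
# so the whole column falls back to text.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["revenue"].tolist())   # ['1,200', '250']

# Telling the parser about the separator yields a numeric column.
parsed = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(parsed["revenue"].sum())   # 1450
```

Whether a parser option or a later conversion is the better fix depends on where the data comes from; the point is only that the dtype is decided at load time.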

The safest approach is to define data types explicitly instead of relying on Pandas' guesses.

We can fix the column using astype():

orders["revenue"] = orders["revenue"].astype(int)

Now if we check the types again:

orders.dtypes

We get:

order_id int64 
customer_id int64 
revenue int64 
discount float64 
dtype: object

And the calculation finally behaves as expected:

orders["revenue"].sum()

Output:

750
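astype(int) works here because every value is a clean integer string. For columns that may contain junk, a hedged alternative is pd.to_numeric with errors="coerce". The sketch below uses a hypothetical messy column (not part of the orders dataset) to show the difference.

```python
import pandas as pd

# Hypothetical messy column: one entry is not a number at all.
messy_revenue = pd.Series(["120", "250", "n/a", "300"])

# astype(int) would raise ValueError on "n/a";
# errors="coerce" turns unparseable entries into NaN instead.
clean_revenue = pd.to_numeric(messy_revenue, errors="coerce")

print(clean_revenue.isna().sum())  # how many values failed to parse
print(clean_revenue.sum())         # NaN is skipped by sum() by default
```

Coercing hides the bad values unless you count them, so it makes sense to look at the number of NaN entries right after the conversion.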

A Simple Defensive Habit

Whenever I load a new dataset now, one of the first things I run is:
orders.info()

It gives a quick overview of:

  • column data types
  • missing values
  • memory usage

This simple step often reveals subtle issues before they turn into confusing bugs later.

But data types are only one part of the story.

Another Pandas behavior causes even more confusion, especially when combining datasets or doing calculations.
It's called index alignment.

2. Index Alignment: Pandas Matches Labels, Not Rows

One of the most powerful, and most confusing, behaviors in Pandas is index alignment.

When Pandas performs operations between objects (such as Series or DataFrames), it does not match rows by position.

Instead, it matches them by index label.

At first, this seems like a minor detail. But it can easily produce results that look right at a glance while actually being wrong.

Let's look at a simple example.

revenue = pd.Series([120, 250, 80], index=[0, 1, 2])
discount = pd.Series([10, 20, 5], index=[1, 2, 3])
revenue + discount

The result looks like this:

0      NaN
1    260.0
2    100.0
3      NaN
dtype: float64

At first glance, this may look strange.

Why did Pandas produce four rows instead of three?

The reason is that Pandas aligned the values by their index labels. Internally, the calculation looks like this:

  • At index 0, revenue exists but discount does not → the result is NaN
  • At index 1, both values exist → 250 + 10 = 260
  • At index 2, both values exist → 80 + 20 = 100
  • At index 3, discount exists but revenue does not → the result is NaN

Which produces:

0      NaN
1    260.0
2    100.0
3      NaN
dtype: float64

By default, rows without a matching index label produce missing values.
This behavior is actually one of Pandas' strengths, because it lets datasets with different structures be combined sensibly.
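If a missing label should count as zero rather than as "unknown", the arithmetic methods accept a fill_value. The sketch below reuses the two Series from the example; treating a missing side as 0 is an assumption about what the data means, not something Pandas can decide for you.

```python
import pandas as pd

revenue = pd.Series([120, 250, 80], index=[0, 1, 2])
discount = pd.Series([10, 20, 5], index=[1, 2, 3])

# Same label-based alignment, but a label missing on one side
# is treated as 0 instead of producing NaN.
total = revenue.add(discount, fill_value=0)
print(total)
```

The result still has four rows, one per label seen on either side, but none of them is NaN.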

But it can also introduce subtle bugs.

How This Shows Up in Real Analysis

Let's return to our orders dataset.

Suppose we filter for orders that have a discount:

discounted_orders = orders[orders["discount"].notna()]

Now imagine we try to compute net revenue by subtracting the discount.

orders["revenue"] - discounted_orders["discount"]

You might expect a straightforward subtraction.

Instead, Pandas aligns the rows using their original index labels.

The result will contain missing values because the filtered DataFrame no longer has the same index as the original.

This can easily lead to:

  • unexpected NaN values
  • miscalculated metrics
  • confusing downstream results

And once again, Pandas will not raise an error.
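A small sketch of the pitfall, using the orders data with revenue already converted to integers. The second calculation stays inside one DataFrame and fills missing discounts with 0, which assumes that "no discount recorded" means "no discount".

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004],
    "revenue": [120, 250, 80, 300],
    "discount": [None, 10, None, 20],
})
discounted_orders = orders[orders["discount"].notna()]

# Labels 0 and 2 exist only on the left side, so they become NaN.
misaligned = orders["revenue"] - discounted_orders["discount"]
print(misaligned.isna().sum())   # 2

# Staying inside one DataFrame and stating what a missing discount
# means (assumed here to be 0) keeps every row.
net_revenue = orders["revenue"] - orders["discount"].fillna(0)
print(net_revenue.tolist())      # [120.0, 240.0, 80.0, 280.0]
```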

The Defensive Approach

If you want operations to work strictly row by row, a good practice is to reset the index after filtering.

discounted_orders = orders[orders["discount"].notna()].reset_index(drop=True)

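Now the filtered rows are labeled 0, 1, and so on, so label and position agree again. Keep in mind that this only gives a correct row-by-row result when the other object holds the same rows in the same order.

Another option is to align the objects explicitly before performing operations: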

orders.align(discounted_orders)

Or, in cases where alignment isn't needed, you can work with the raw arrays:

orders["revenue"].values
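As a sketch of explicit alignment, the example below uses Series.align with join="inner" so that only labels present on both sides survive. The two Series are simplified stand-ins for the revenue and discount columns.

```python
import pandas as pd

revenue = pd.Series([120, 250, 80, 300], index=[0, 1, 2, 3])
discount = pd.Series([10.0, 20.0], index=[1, 3])  # only the discounted orders

# align() returns a tuple of two objects that share one index.
# join="inner" keeps only the labels present on both sides.
revenue_aligned, discount_aligned = revenue.align(discount, join="inner")

net = revenue_aligned - discount_aligned
print(net)
```

Because the join is chosen explicitly, dropping the non-discounted rows here is a visible decision rather than a side effect.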

In the end, it all comes down to this.

In Pandas, operations align on index labels, not on row order.

Understanding this behavior helps explain many of the mysterious NaN values that appear during analysis.

But there's another Pandas behavior that has confused almost every data analyst at some point.

You've probably seen it before:
SettingWithCopyWarning

Let's unpack what is actually happening there.


3. The Copy vs. View Problem (and the Famous Warning)

If you've used Pandas for a while, you've probably seen this warning before:

SettingWithCopyWarning

When I first ran into it, I mostly ignored it. The code still ran, and the output looked fine, so it didn't seem like a big deal.

But this warning points to something important about how Pandas works: sometimes you are modifying the original DataFrame, and sometimes you are modifying a temporary copy.

The tricky part is that Pandas doesn't always make this obvious.

Let's look at an example using our orders dataset.

Suppose we want to adjust the revenue for orders where a discount exists.

A natural approach might look like this:

discounted_orders = orders[orders["discount"].notna()]
discounted_orders["revenue"] = discounted_orders["revenue"] - discounted_orders["discount"]

This often triggers the warning:

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

The problem is that discounted_orders may not be an independent DataFrame. It might just be a view of the original orders DataFrame.

So when we modify it, Pandas isn't always sure whether we intend to change the original data or only the filtered subset. This ambiguity is what produces the warning.

Worse, the modification may not behave consistently, depending on how the DataFrame was created. In some cases, the change affects the original DataFrame; in others, it doesn't.

This kind of unpredictable behavior is exactly what creates subtle bugs in real data workflows.

The Safer Way: Use .loc

A more reliable approach is to modify the DataFrame explicitly using .loc.

orders.loc[orders["discount"].notna(), "revenue"] = (
orders["revenue"] - orders["discount"]
)

This syntax tells Pandas explicitly which rows to modify and which column to update. Because the operation is explicit, Pandas can apply the change safely and without ambiguity.
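A self-contained sketch of the same idea, with the boolean mask stored in a variable (has_discount is a name chosen for illustration) and the right-hand side restricted to the same rows, so it is clear that only discounted orders are touched:

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004],
    "revenue": [120, 250, 80, 300],
    "discount": [None, 10, None, 20],
})

has_discount = orders["discount"].notna()

# Select rows and column in a single .loc call; the right-hand side
# is restricted to the same rows so both sides line up.
orders.loc[has_discount, "revenue"] = (
    orders.loc[has_discount, "revenue"] - orders.loc[has_discount, "discount"]
)
print(orders["revenue"].tolist())
```

Orders without a discount keep their original revenue, and the change lands in orders itself rather than in a temporary object.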

Another Good Habit: Use .copy()

Sometimes you really do want to work with a separate DataFrame. In that case, it's best to make an explicit copy.

discounted_orders = orders[orders["discount"].notna()].copy()

Now discounted_orders is a fully independent object, and modifying it won't affect the original dataset.
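The independence is easy to check. In this sketch the copy is modified and the original is printed afterwards:

```python
import pandas as pd

orders = pd.DataFrame({
    "revenue": [120, 250, 80, 300],
    "discount": [None, 10, None, 20],
})

discounted_orders = orders[orders["discount"].notna()].copy()
discounted_orders["revenue"] = (
    discounted_orders["revenue"] - discounted_orders["discount"]
)

print(discounted_orders["revenue"].tolist())  # adjusted values: [240.0, 280.0]
print(orders["revenue"].tolist())             # original is unchanged: [120, 250, 80, 300]
```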

So far we've seen how three behaviors can quietly cause problems:

  • incorrect data types
  • unexpected index alignment
  • ambiguous copy vs. view operations

But there's one more habit that can greatly improve the reliability of your data workflows.

It's something many data analysts rarely think about: defensive data manipulation.

4. Defensive Data Manipulation: Writing Pandas Code That Fails Loudly

One thing I've gradually noticed while working with data is that most problems don't come from code crashing.

They come from code that runs successfully but produces the wrong numbers.

And with Pandas, this happens surprisingly often because the library is designed for flexibility. It rarely stops you from doing something questionable.

That's why many data engineers and experienced analysts rely on something called defensive data manipulation.

Here's the idea.

Instead of assuming your data is correct, you actively validate your assumptions as you work.

This helps catch problems early, before they silently propagate through your analysis or pipeline.

Let's look at a few practical examples.

Validate Your Data Types

Earlier we saw how the revenue column looked numeric but was actually stored as text. One way to keep this from slipping through is to check your assumptions explicitly.

For example:

assert orders["revenue"].dtype == "int64"

If the dtype is wrong, the code raises an error immediately.
This is much better than discovering the problem later, when your metrics don't add up.
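A slightly more flexible variant is sketched below. require_numeric is a hypothetical helper, not a Pandas function; it uses pandas.api.types.is_numeric_dtype so the check passes for any numeric dtype, and it raises a regular exception because assert statements are skipped when Python runs with the -O flag.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def require_numeric(df, columns):
    """Raise TypeError if any listed column is not numeric (hypothetical helper)."""
    bad = [col for col in columns if not is_numeric_dtype(df[col])]
    if bad:
        raise TypeError(f"Expected numeric columns, got non-numeric: {bad}")


orders = pd.DataFrame({"revenue": ["120", "250", "80", "300"]})

try:
    require_numeric(orders, ["revenue"])
except TypeError as exc:
    print(exc)  # the text column is reported by name

orders["revenue"] = orders["revenue"].astype(int)
require_numeric(orders, ["revenue"])  # passes silently once the dtype is fixed
```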

Prevent Dangerous Merges

Another common source of silent errors is merging datasets.

Imagine we add a small customer dataset:

customers = pd.DataFrame({
"customer_id": [1, 2, 3],
"city": ["Lagos", "Abuja", "Ibadan"]
})

A typical merge might look like this:

orders.merge(customers, on="customer_id")

This works fine, but there's a hidden risk.

If the keys aren't unique, the merge can accidentally create duplicate rows, which inflates metrics such as total revenue.

Pandas provides a very useful safeguard for this:

orders.merge(customers, on="customer_id", validate="many_to_one")

Now Pandas will raise an error if the relationship between the datasets isn't what you expect.

This small parameter can prevent some very painful debugging later.
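To see the safeguard in action, the sketch below uses a deliberately broken lookup table (customers_dup is made up for illustration) in which customer 2 appears twice. Without validate the merge quietly inflates revenue; with it, Pandas raises pd.errors.MergeError.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004],
    "customer_id": [1, 2, 2, 3],
    "revenue": [120, 250, 80, 300],
})

# Hypothetical bad lookup table: customer 2 appears twice.
customers_dup = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "city": ["Lagos", "Abuja", "Abuja", "Ibadan"],
})

# Without validation the merge succeeds and quietly adds rows.
inflated = orders.merge(customers_dup, on="customer_id")
print(len(orders), len(inflated))                            # 4 6
print(orders["revenue"].sum(), inflated["revenue"].sum())    # 750 1080

# With validation the same merge fails loudly.
try:
    orders.merge(customers_dup, on="customer_id", validate="many_to_one")
except pd.errors.MergeError as exc:
    print("merge rejected:", exc)
```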

Check for Missing Data Early

Missing values can cause unexpected behavior in calculations.
A quick diagnostic check can surface problems right away:

orders.isna().sum()

This shows how many missing values exist in each column.
When datasets are large, this small check can quickly reveal problems that might otherwise go unnoticed.
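As one illustration of how missing values change a result, the sketch below compares the average discount computed two ways. Which one is right depends on what a missing discount means in your data; filling with 0 is an assumption made here for the comparison.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004],
    "discount": [None, 10, None, 20],
})

missing_per_column = orders.isna().sum()
print(missing_per_column["discount"])   # 2

# mean() skips NaN by default, so it averages only the two discounted orders.
mean_of_present = orders["discount"].mean()

# Treating "missing" as "no discount" (an assumption) averages over all four.
mean_with_zeros = orders["discount"].fillna(0).mean()

print(mean_of_present, mean_with_zeros)  # 15.0 7.5
```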

A Simple Defensive Workflow

Over time, I've started following a small routine whenever I work with a new dataset:

  • Inspect the structure: df.info()
  • Fix data types: astype()
  • Check for missing values: df.isna().sum()
  • Validate merges: validate="one_to_one" or "many_to_one"
  • Use .loc when modifying data

These steps take only a few seconds, but they greatly reduce the chances of introducing silent bugs.
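Put together, the routine might look something like the sketch below. prepare_orders is a hypothetical helper written for this article's toy data, not a general recipe; it uses pd.to_numeric in place of astype() so that unparseable values raise an error.

```python
import pandas as pd


def prepare_orders(raw_orders, customers):
    """Hypothetical helper that applies the checklist to the toy orders data."""
    orders = raw_orders.copy()  # work on an explicit copy, never the input

    # Fix data types; to_numeric raises by default if a value cannot be parsed.
    orders["revenue"] = pd.to_numeric(orders["revenue"])

    # Fail early if a required column has gaps.
    if orders["revenue"].isna().any():
        raise ValueError("revenue contains missing values")

    # Apply discounts in place with .loc, only where a discount exists.
    has_discount = orders["discount"].notna()
    orders.loc[has_discount, "revenue"] = (
        orders.loc[has_discount, "revenue"] - orders.loc[has_discount, "discount"]
    )

    # Each order must match at most one customer row.
    return orders.merge(customers, on="customer_id", validate="many_to_one")


raw_orders = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004],
    "customer_id": [1, 2, 2, 3],
    "revenue": ["120", "250", "80", "300"],
    "discount": [None, 10, None, 20],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "city": ["Lagos", "Abuja", "Ibadan"],
})

prepared = prepare_orders(raw_orders, customers)
print(prepared[["order_id", "revenue", "city"]])
```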

Final Thoughts

When I first started learning Pandas, most tutorials focused on powerful operations like groupby, merge, or pivot_table.

Those tools are important, but I've come to realize that reliable data work depends much more on understanding how Pandas behaves under the hood.

Concepts like:

  • data types
  • index alignment
  • copy vs. view behavior
  • defensive data manipulation

may not feel exciting at first, but they are exactly the things that keep data workflows stable and trustworthy.

The biggest mistakes in data analysis rarely come from code that crashes.

They come from code that runs perfectly well while quietly producing wrong results.

And understanding these Pandas fundamentals is one of the best ways to prevent that.

Thanks for reading! If you found this article useful, feel free to let me know. I really appreciate your feedback.
