Hands-On Time Series of Rare Events, with Python

Have you ever heard this:
We have these very large values in this time series, but those are just “outliers” and they only happen [insert small number here]% of the time
During my years in data science, I have heard that sentence a lot. At conferences, during product reviews, and on calls with clients, this sentence is used to reassure everyone that certain very large (or very small) values may show up, and that they are not a problem (for reasons X, Y, and Z).
In production settings, those very small or very large values (known as extreme values) are handled with guardrails so that the system can “fail gracefully” when an extreme value is measured. This is usually enough when you just need your system to work, and you want to be sure it keeps working at all times, even when the unusual, crazy, annoying values show up.
Nonetheless, when we analyze a time series, we can do more than “fix” the extreme value with guardrails and if/else thresholds: we can monitor extreme values so that we can understand them.
In a time series, extreme values actually represent something in the system of interest. For example, if your time series describes the energy consumption of a city, an unreasonably high level might indicate worrying energy consumption in a specific area, which may require action. If you are dealing with financial data, the highs and lows have an obvious, crucial meaning, and understanding how they behave is extremely important.
In this blog post, we will be working with weather data, where the time series represents temperature (in Kelvin). The data, as we will see, covers multiple cities, and each city yields a time series. If you pick a specific city, you get a time series like the one you see below:
So, in this kind of dataset, it is very important to model the maxima and minima, because they mean it is either as hot as a furnace or extremely cold. Hopefully, at this point, you are asking yourself:
“What do we mean by modeling the maxima and minima?”
When you are dealing with a time series dataset, it is reasonable to expect a Gaussian-like distribution, like the one you see here:

But if you look only at the extreme values, the distribution is far from that. And as you will see in a second, there are several layers of complexity involved in extracting the distribution of extreme values. Three of them are:
- Defining an extreme value: How do you know something is extreme? We will define our extreme value implementation as a first step.
- Defining the distributions that describe these events: There are multiple possible distributions for extreme values. The three distributions covered in this blog post are the Generalized Extreme Value (GEV), Weibull, and Gumbel (a special case of the GEV) distributions.
- Choosing the best distribution: There are multiple metrics we can use to determine the “best fitting” distribution. We will cover the Akaike Information Criterion, the Log-Likelihood, and the Bayesian Information Criterion.
All things we will talk about in this article 🥹
Looks like we have a lot of ground to cover. Let's get started.
0. Data and Script Source
The language we will use is Python. The source code can be found in this pieropailaiai / rareevents folder. The data source can be found in this open-source Kaggle dataset. Mind you, if you clone the GitHub folder, you won't need to download the dataset. The dataset is inside the Rawdata folder within the main GitHub folder (you're welcome 😉).
1. Preliminary Data Exploration
To keep things simple during the exploration phase, and to give us maximum flexibility in the notebook without writing hundreds of lines of code, the data handling lives in its own script. The code that does this [data.py] is the following:
This code does all the dirty work on the data side, so we can run all the following steps in very few lines of code.
The first thing we can do is simply display some of the rows of the dataset. We can do it with this code:
Note that there are 36 columns/cities in the dataset, not just 4. I displayed 4 to keep the table nicely formatted. 🙂
A few things to notice:
- Every column, except “datetime”, is a city and represents a time series, where each value corresponds to the datetime, which is the time axis
- Each value in a city column is the temperature in Kelvin at the date and time in the corresponding datetime column. For example, index = 3 for column = 'Vancouver' tells us that, at datetime 2012-10-01 15:00:00, the temperature was 284.627 K
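The table layout described above can be mimicked on a very small scale. This is only a sketch: the `TemperatureTable` class and its methods are hypothetical names, not the repository's code, and every number in the sample except the Vancouver reading at index 3 is made up.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV. Only the Vancouver value at index 3 is quoted
# in the text; every other number here is made up for illustration.
SAMPLE_CSV = """datetime,Vancouver,New York
2012-10-01 12:00:00,,
2012-10-01 13:00:00,284.630,288.220
2012-10-01 14:00:00,284.629,288.247
2012-10-01 15:00:00,284.627,288.326
"""


class TemperatureTable:
    """Hypothetical helper; the repository's data.py may be organized differently."""

    def __init__(self, source):
        self.df = pd.read_csv(source)

    def cities(self):
        # Every column except "datetime" is one city's time series.
        return [col for col in self.df.columns if col != "datetime"]

    def preview(self, cities, n_rows=5):
        # Show the time axis next to a handful of city columns.
        return self.df[["datetime", *cities]].head(n_rows)


table = TemperatureTable(io.StringIO(SAMPLE_CSV))
print(table.preview(["Vancouver", "New York"]))

# Row lookup: the temperature (in Kelvin) in Vancouver at index 3.
row = table.df.loc[3]
print(row["datetime"], row["Vancouver"])
```

Reading from an in-memory string keeps the sketch runnable without the dataset; pointing `TemperatureTable` at the real CSV path should work the same way as long as the file has a `datetime` column.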
I also built a function that lets you plot a city column. For example, if you want to peek at what happens in New York, you can use this:

Now, the datetime column is just a string column, but it would actually be helpful to have the month, day, and year in separate columns. Also, we have some NaN values that we need to take care of. All these boring preprocessing steps are inside `.clean_and_preprocess()`
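As a rough idea of what such a preprocessing step involves, here is a minimal sketch. The function name `preprocess` is hypothetical, and the NaN strategy (linear interpolation, then dropping what is still missing) is an assumption; the repository's method may handle missing values differently.

```python
import io

import pandas as pd

# Made-up sample with a leading gap and an interior gap.
RAW_CSV = """datetime,Vancouver
2012-10-01 12:00:00,
2012-10-01 13:00:00,284.630
2012-10-01 14:00:00,
2012-10-01 15:00:00,284.627
"""


def preprocess(df):
    """Parse the time axis, add calendar columns, and handle missing values."""
    out = df.copy()
    out["datetime"] = pd.to_datetime(out["datetime"])
    out["year"] = out["datetime"].dt.year
    out["month"] = out["datetime"].dt.month
    out["day"] = out["datetime"].dt.day

    city_cols = [col for col in df.columns if col != "datetime"]
    # Assumption: fill gaps between valid readings by linear interpolation,
    # then drop rows that are still missing (e.g. leading NaNs).
    out[city_cols] = out[city_cols].interpolate()
    return out.dropna(subset=city_cols).reset_index(drop=True)


clean = preprocess(pd.read_csv(io.StringIO(RAW_CSV)))
print(clean)
```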
This is the output:
2. Detecting Extreme Events
Now, a crucial question:
What is an extreme event? And how are we going to detect it?
There are two main ways to define an “extreme event”. For example, if we want to identify the maxima, we can apply:
- The first definition: Peak Over Threshold (POT). Given a threshold, everything above that threshold is a maximum point (an extreme event).
- The second definition: Extreme within a region. Given a window, we define the maximum value of the window as an extreme event.
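The difference between the two definitions is easy to see on a toy example. The sketch below uses made-up temperatures and two hypothetical helper functions; it is an illustration of the idea, not the toolbox code.

```python
def peaks_over_threshold(values, threshold):
    """Every value strictly above the threshold counts as an extreme."""
    return [v for v in values if v > threshold]


def block_maxima(values, block_size):
    """One extreme per window: the largest value in each consecutive block."""
    return [
        max(values[start:start + block_size])
        for start in range(0, len(values), block_size)
    ]


# Made-up temperatures in Kelvin, three windows of four readings each.
temps = [281, 285, 292, 295, 279, 283, 288, 284, 300, 281, 287, 290]

pot = peaks_over_threshold(temps, threshold=290)
bm = block_maxima(temps, block_size=4)
print(pot)  # [292, 295, 300]
print(bm)   # [295, 288, 300]
```

The threshold picks two points from the first window and none from the second, while block maxima always return exactly one point per window, even when that window was unremarkable.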
In this blog post, we will use the second approach. For example, if we use daily windows, we scan through the dataset and extract the highest value for each day. This gives you a lot of points, as our dataset spans more than 5 years. Alternatively, we could use longer windows, which would give fewer points but perhaps richer information.
This is the power of this method: we control both the number of points and their “quality”. For this study, the best window size is the daily one. For other datasets, feel free to adjust the window size to your data: you might want to shrink the window in some cases, or widen it if you collect a very large amount of data (e.g., you have 50+ years of data and a weekly window is appropriate).
This definition of the maximum value is implemented inside the RareEventsToolbox class, in the [rare_events_toolbox.py] script (look at the extract_block_max function).
And we can quickly display the distribution of rare events at different window sizes using the following block of code:
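The trade-off between window size and number of points can be sketched with pandas resampling. The `block_max` helper below is a hypothetical stand-in (the actual signature of `extract_block_max` is not shown here), and the hourly series is synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic hourly temperatures (Kelvin) for 60 days: daily cycle plus noise.
n_hours = 60 * 24
index = pd.date_range("2012-10-01", periods=n_hours, freq=pd.Timedelta(hours=1))
hours = np.arange(n_hours)
values = 285 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1.5, n_hours)
series = pd.Series(values, index=index, name="synthetic_city")


def block_max(series, window):
    """Largest value inside each consecutive window (e.g. "D" or "W")."""
    return series.resample(window).max().dropna()


# Wider windows give fewer, but more extreme, points.
for window in ("D", "W"):
    maxima = block_max(series, window)
    print(window, len(maxima), round(maxima.mean(), 2))
```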

3. Extreme Events Distribution
Before diving into the code, let's take a step back. In general, extreme value distributions do not show the nice Gaussian bell behavior you saw earlier (the Gaussian distribution for San Francisco). From a theoretical perspective, the two distributions to know are the Generalized Extreme Value (GEV) distribution and the Weibull distribution.
GEV (Generalized Extreme Value)
- The GEV is the foundation of Extreme Value Theory and provides a family of distributions tailored to modeling block maxima or minima. A special case is the Gumbel distribution.
- Its flexibility comes from a shape parameter that determines the “tail behavior.” Depending on this parameter, the GEV can mimic different kinds of extremes (e.g., moderate, heavy-tailed).
- The argument behind the GEV distribution is very elegant: just as the Central Limit Theorem (CLT) says, “If you average a bunch of i.i.d. random variables, the distribution of the average tends to a Gaussian”, Extreme Value Theory (EVT) says, “If you take the maximum (or minimum) of a bunch of i.i.d. random variables, the distribution of that maximum tends to a GEV.”
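The EVT statement can be checked numerically in a simple textbook case. For i.i.d. Exp(1) variables, the block maximum minus the log of the block size is known to converge to the standard Gumbel distribution, which is the GEV case with zero shape parameter. The sketch below (synthetic data, arbitrary block sizes) measures how close the simulated maxima are to that limit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 5000 blocks, each holding 200 i.i.d. Exp(1) draws.
n_blocks, block_size = 5000, 200
samples = rng.exponential(scale=1.0, size=(n_blocks, block_size))

# One maximum per block, shifted by log(block_size) as the theory prescribes.
shifted_max = samples.max(axis=1) - np.log(block_size)

# Kolmogorov-Smirnov distance between the maxima and the standard Gumbel CDF.
ks = stats.kstest(shifted_max, stats.gumbel_r.cdf)
print(f"KS distance to standard Gumbel: {ks.statistic:.4f}")
```

A small KS distance means the empirical distribution of the block maxima is already very close to the Gumbel limit, even though the underlying draws are exponential rather than Gumbel.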
Weibull
- The Weibull is one of the most widely used distributions in reliability engineering, meteorology, and environmental modeling.
- It is especially useful for describing data where there is a sense of “bounded” or tapered-off extremes.
- Unlike the GEV distribution, the Weibull formulation is empirical. Waloddi Weibull, a Swedish engineer, first proposed the distribution in 1939 to model the breaking strength of materials.
So we have three possibilities: GEV, Gumbel, and Weibull. Now, which one is the best? The short answer is “it depends,” and another short answer is “it is best to just try them all and see which one performs well”.
So now we have another question:
How do we evaluate the quality of the fit between a distribution function and a set of data?
Three metrics to use are the following:
- Log-Likelihood (LL). It measures how probable the observed data is under the fitted distribution: higher is better.
$$L = \sum_{i=1}^{n} \log f(x_i \mid \theta)$$
where f is the probability density (or mass) function of the distribution with parameters θ and x_i is the i-th data point
- Akaike Information Criterion (AIC). The AIC balances two forces: fit quality (via the log-likelihood = L) and simplicity (it penalizes models with too many parameters; number of parameters = k). Lower is better.
$$\mathrm{AIC} = 2k - 2L$$
- Bayesian Information Criterion (BIC). Same spirit as the AIC, but harsher on complexity (dataset size = n). Lower is better.
$$\mathrm{BIC} = k \ln(n) - 2L$$
The general recommendation is to use either the AIC or the BIC, since they account for both the log-likelihood and the complexity.
The implementation of the three distribution functions, along with the corresponding L, AIC, and BIC values, is the following:
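A minimal sketch of this fit-and-score step, assuming the three families map to `scipy.stats.genextreme`, `scipy.stats.gumbel_r`, and `scipy.stats.weibull_min` (the `gev` and `weibull_min` labels appear in the output further down; the helper name `fit_candidates` and the synthetic sample are assumptions, and the toolbox itself may be organized differently):

```python
import numpy as np
from scipy import stats

# scipy names for the three candidate families.
CANDIDATES = {
    "gev": stats.genextreme,
    "gumbel": stats.gumbel_r,
    "weibull_min": stats.weibull_min,
}


def fit_candidates(x):
    """Fit each family by maximum likelihood and score it with LL, AIC, BIC."""
    results = {}
    n = len(x)
    for name, dist in CANDIDATES.items():
        try:
            params = dist.fit(x)
        except Exception:
            continue  # skip a family whose optimizer fails on this sample
        ll = float(np.sum(dist.logpdf(x, *params)))
        if not np.isfinite(ll):
            continue  # fitted support does not cover the data
        k = len(params)
        results[name] = {
            "param": params,
            "metrics": {
                "log_likelihood": ll,
                "aic": 2 * k - 2 * ll,
                "bic": k * np.log(n) - 2 * ll,
            },
        }
    return results


# Synthetic "daily maxima" drawn from a GEV, standing in for one city.
x = stats.genextreme.rvs(0.3, loc=295.0, scale=9.0, size=1500, random_state=0)
results = fit_candidates(x)
best = min(results, key=lambda name: results[name]["metrics"]["aic"])
for name, res in results.items():
    print(name, {key: round(val, 1) for key, val in res["metrics"].items()})
print("lowest AIC:", best)
```

Note that the Gumbel has two parameters (location, scale) while the other two have three (shape, location, scale), which is exactly the kind of difference the AIC and BIC penalties are meant to account for.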
And then we can display our fitted distribution using the following:

Pretty good, right? While it looks good visually, we can be a bit more quantitative and look at the Q-Q plot, which shows the quantile match between the data and the fitted distribution:
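The numbers behind a Q-Q plot are just two sorted lists: the data quantiles and the quantiles of the fitted distribution at the same probability levels. The sketch below computes them on a synthetic Gumbel sample and summarizes the match with a correlation coefficient; the sample, the plotting positions, and the `qq_correlation` helper are illustrative choices, not the article's code.

```python
import numpy as np
from scipy import stats

# Synthetic block maxima from a Gumbel, standing in for one city's extremes.
x = stats.gumbel_r.rvs(loc=295.0, scale=8.0, size=1500, random_state=1)
empirical = np.sort(x)

# Plotting positions: the probability level attached to each sorted point.
probs = (np.arange(1, len(x) + 1) - 0.5) / len(x)


def qq_correlation(dist):
    """Fit `dist`, then correlate its quantiles with the sorted data."""
    params = dist.fit(x)
    theoretical = dist.ppf(probs, *params)
    return float(np.corrcoef(theoretical, empirical)[0, 1])


r_gumbel = qq_correlation(stats.gumbel_r)
r_normal = qq_correlation(stats.norm)
print(f"Gumbel Q-Q correlation: {r_gumbel:.4f}")
print(f"Normal Q-Q correlation: {r_normal:.4f}")
```

Points that fall on the diagonal of a Q-Q plot correspond to a correlation close to 1; a fitted normal on the same skewed sample bends away from the diagonal in the tails and scores lower.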

This shows that our distribution matches the provided dataset very well. Now, notice how a standard distribution (e.g., a Gaussian) would not have worked nearly as well here, since, as noted earlier, extreme values do not follow the Gaussian bell.
Now the cool thing is that, since we kept everything structured, we can also run this for every city in the dataset in one go, using the following block of code:
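A compact sketch of such a per-city loop, chaining daily block maxima, candidate fits, and AIC-based selection. Everything here is illustrative: the two city columns are synthetic, `best_fit` is a hypothetical helper, and only the dictionary keys mirror the output format shown next, so the names and numbers will not match it.

```python
import numpy as np
import pandas as pd
from scipy import stats

CANDIDATES = {"gev": stats.genextreme, "gumbel": stats.gumbel_r,
              "weibull_min": stats.weibull_min}


def best_fit(x):
    """Return the lowest-AIC candidate for one sample of block maxima."""
    scored = []
    for name, dist in CANDIDATES.items():
        try:
            params = dist.fit(x)
        except Exception:
            continue  # skip a family whose optimizer fails on this sample
        ll = float(np.sum(dist.logpdf(x, *params)))
        if np.isfinite(ll):
            scored.append((2 * len(params) - 2 * ll, name, params, ll))
    aic, name, params, ll = min(scored, key=lambda item: item[0])
    return {"dist_type": name, "param": params,
            "metrics": {"log_likelihood": ll, "aic": aic}}


# Synthetic hourly temperatures for two made-up cities over 300 days.
rng = np.random.default_rng(7)
n_hours = 300 * 24
index = pd.date_range("2013-01-01", periods=n_hours, freq=pd.Timedelta(hours=1))
hours = np.arange(n_hours)
df = pd.DataFrame(
    {
        city: base + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, n_hours)
        for city, base in {"city_a": 285.0, "city_b": 292.0}.items()
    },
    index=index,
)

# Daily block maxima per city, then the best-fitting family for each.
summary = {
    city: best_fit(df[city].resample("D").max().dropna().to_numpy())
    for city in df.columns
}
for city, result in summary.items():
    print(city, result["dist_type"], round(result["metrics"]["aic"], 1))
```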
And the output will look like this:
{'Dallas': {'dist_type': 'gev',
'param': (0.5006578789482107, 296.2415220841758, 9.140132853556741),
'dist': ,
'metrics': {'log_likelihood': -6602.222429209462,
'aic': 13210.444858418923,
'bic': 13227.07308905503}},
'Pittsburgh': {'dist_type': 'gev',
'param': (0.5847547512518895, 287.21064374616327, 11.190557085335278),
'dist': ,
'metrics': {'log_likelihood': -6904.563305593636,
'aic': 13815.126611187272,
'bic': 13831.754841823378}},
'New York': {'dist_type': 'weibull_min',
'param': (6.0505720895039445, 238.93568735311248, 55.21556483095677),
'dist': ,
'metrics': {'log_likelihood': -6870.265288196851,
'aic': 13746.530576393701,
'bic': 13763.10587863208}},
'Kansas City': {'dist_type': 'gev',
'param': (0.5483246490879885, 290.4564464294219, 11.284265203196664),
'dist': ,
'metrics': {'log_likelihood': -6949.785968553707,
'aic': 13905.571937107414,
'bic': 13922.20016774352}}
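As a quick sanity check on these numbers, the Dallas entry can be reproduced by hand from the standard definitions AIC = 2k - 2L and BIC = k ln(n) - 2L. The sketch assumes k = 3 fitted parameters (matching the three values in the `param` tuple); the sample size n is not printed in the output, so it is backed out from the BIC rather than taken from the source.

```python
import math

# Values copied from the Dallas entry in the output above.
log_likelihood = -6602.222429209462
reported_aic = 13210.444858418923
reported_bic = 13227.07308905503

# Assumption: k = 3 fitted parameters, matching the 3-element `param` tuple.
k = 3

# AIC = 2k - 2L should reproduce the reported value.
aic = 2 * k - 2 * log_likelihood
print(aic, reported_aic)

# BIC = k * ln(n) - 2L, so the sample size can be backed out from it.
implied_n = math.exp((reported_bic + 2 * log_likelihood) / k)
print(implied_n)
```

The implied sample size comes out close to 1,900 points, which is in line with daily maxima over a bit more than five years of data.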
4. Summary
Thank you for spending time with me so far, it means a lot ❤
Let's recap what we did. Instead of hand-waving away “outliers,” we treated extremes as first-class signals. Specifically:
- We took a dataset representing the temperature of cities around the world
- We defined our extreme events using block maxima on a fixed window
- We modeled city-level temperature highs with three candidate families (GEV, Gumbel, and Weibull)
- We selected the best fit using the log-likelihood, AIC, and BIC, then verified the fits with Q-Q plots.
The results show that the “best” fit varies by city: for example, Dallas, Pittsburgh, and Kansas City leaned toward the GEV, while New York fit a Weibull.
This kind of approach is very important when extreme values matter a great deal in your system, and we need to reveal how it behaves statistically under rare and extreme conditions.
5. Conclusions
Thank you again for your time. It means a lot ❤
My name is Piero Paialunga.

I am a Ph.D. candidate at the University of Cincinnati Aerospace Engineering Department. I talk about AI and Machine Learning in my blog posts, on LinkedIn, and here on TDS. If you liked the article and want to know more about machine learning and follow my studies, you can:
A. Follow me on LinkedIn, where I publish all my stories
B. Follow me on GitHub, where you can see all my code
C. For questions, you can send me an email at piero.paiali@hotmail



