
A Guide to Survival Analysis in Python: Using Time-to-Event Models to Predict Customer Lifetime

Statistics is used in many fields of knowledge, helping us deal with uncertainty, calculate probabilities, and support decisions along the way.

One of the areas that relies most heavily on statistics is the healthcare industry, which uses tools such as the t-test, A/B testing, and survival analysis. The last of these is the subject of this article.

Survival analysis originated in the medical and biological sciences, where the main event being modeled was the death of a patient or an organism. Hence the name.

However, statisticians realized that this kind of analysis was powerful enough to be applied in many other areas, so it spread to the business domain, even more so after the rise of data science.

Let's learn more about it.

Survival Analysis

Survival Analysis [SA] is a branch of statistics used to predict the amount of time it takes for a given event to occur.[1]

Also known as time-to-event analysis, this kind of study can determine how long it will take for something to happen while accounting for the fact that some events have not yet occurred at the time the data is collected.

Examples are not limited to the medical and biological sciences. They are everywhere.

  • Time until a machine fails
  • Time until a customer cancels a subscription
  • Time until a customer buys again

Now, since we are trying to estimate a number rather than a group or category, we are dealing with a type of regression problem. So why can't we just go with OLS Linear Regression?

Why Use Survival Analysis?

Standard regression models such as OLS or Logistic Regression struggle with survival data because they are designed to handle completed events, not "ongoing" cases.

Imagine you want to predict who will finish a 10-mile race, but the input data comes from an event that is still in progress. The race is 2 hours in, and you want to use the data you have so far to estimate something.

Standard regression algorithms will fail because:

  • OLS: You only have data from those who have already finished the race. Using only their data creates a large bias toward the fast runners.
  • Logistic Regression: It could tell you whether someone finished the race, perhaps, but it treats those who finished in 30 minutes the same way as those who finished in 8 hours.
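To make the OLS bias concrete, here is a minimal simulation sketch. The finishing-time distribution and the 2-hour cutoff are assumptions invented for illustration, not data from the article. It compares the true average finishing time with the average computed only from runners who have already finished.

```python
import random
import statistics

random.seed(42)

# Hypothetical race: true finishing times in hours, drawn from a made-up distribution.
true_times = [random.uniform(1.0, 8.0) for _ in range(10_000)]

# We look at the race 2 hours in: only runners at or under the cutoff have finished.
cutoff = 2.0
finished_so_far = [t for t in true_times if t <= cutoff]

true_mean = statistics.mean(true_times)
naive_mean = statistics.mean(finished_so_far)
still_racing = 1 - len(finished_so_far) / len(true_times)

print(f"True mean finishing time:      {true_mean:.2f} h")
print(f"Mean of finished-only runners: {naive_mean:.2f} h")
print(f"Share of runners still racing: {still_racing:.0%}")
```

Any model trained only on `finished_so_far` can never predict a time above the cutoff, which is the bias toward fast runners described above. Survival models keep the unfinished runners in the data as censored observations instead of dropping them.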

Survival Analysis Basics

Let's go over a few important concepts to understand Survival Analysis.

First, we need to understand the birth and death of a data point.

  • Birth: The moment we start measuring that data point. For example, when a patient is diagnosed with cancer, or the day a person is hired by a company. Note that observations do not need to start at the same time.
  • Death: Occurs when the event of interest happens. For example, the day an employee leaves the company.

Now, the interesting thing about SA is that the study or observation can end before the event happens. For that case we have another important concept: the censored data point.

  • Censoring (non-death): If the study ends or the subject drops out before the event occurs, the data is "censored," meaning we only know that the subject survived at least until that point.

Data can be censored in different ways, however.

  • Right Censoring: The most common. The event occurs, if at all, after the observation period ends or after the subject drops out.
Data point C is right-censored. Image by the author.
  • Left Censoring: The event occurred before the study began.
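In practice, birth, death, and right censoring reduce to two columns: a duration and an event flag. The sketch below shows one way to derive them from raw dates. The records, the `study_end` date, and the `months_between` helper are all hypothetical and only serve the illustration.

```python
from datetime import date

# Hypothetical subscription records: (customer_id, start_date, cancel_date or None).
records = [
    ("A", date(2023, 1, 15), date(2023, 9, 15)),  # churned during the study
    ("B", date(2023, 3, 1), None),                # still subscribed: right-censored
    ("C", date(2023, 6, 10), date(2024, 2, 10)),  # churned during the study
]

study_end = date(2024, 6, 30)  # assumed end of the observation window


def months_between(start, end):
    """Whole months elapsed between two dates."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


rows = []
for customer_id, start, cancel in records:
    # The event is observed only if the cancellation falls inside the window.
    event_observed = int(cancel is not None and cancel <= study_end)
    end = cancel if event_observed else study_end
    rows.append((customer_id, months_between(start, end), event_observed))

for row in rows:
    print(row)
```

Customer B contributes a duration measured up to the end of the study with an event flag of 0. That is the information "survived at least until that point" in tabular form, and it is the shape of input that survival libraries generally expect.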

Good. It is important to note that survival analysis is a way of estimating the probability of an event occurring as a function of time. By treating survival as a function of time, we can answer questions that a single probability value cannot, such as: "In which specific month does the risk of customer churn spike?"

Now that we know the basics, let's learn more about the functions involved in SA.

Survival Function

The survival function S(t) gives the probability that the event has not yet occurred by time t, that is, $S(t) = P(T > t)$, where T is the time at which the event happens.

So, applying it to our employee churn example, we will see the probability that an employee is still at the company after N years.

Survival Function. Image by the author.

Hazard Function

The hazard function shows the risk of the event occurring at a given time, given that it has not occurred before. In a sense it is the opposite of the survival function: it represents the risk of churn (instead of the probability of staying at the company).

This function estimates how likely it is that employees who have not churned so far will do so at this moment.

Hazard Function. Image by the author.
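For readers who want the notation, the two functions are usually written as follows. This is the standard textbook formulation rather than anything specific to this article, and $T$ (the random time until the event) is a symbol introduced here for illustration.

```latex
S(t) = P(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}, \qquad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)
```

The last identity is why the two views carry the same information: a higher hazard at any point in time pulls the survival curve down faster from that point on.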

Choosing Your Survival Analysis Model

As you can see, SA is a subject that can get deep and dense very quickly. But let's try to keep it simple.

There are two main models used when performing survival analysis. One is Kaplan-Meier, which is simple but does not consider the effect of additional variables, and needs a few assumptions to work.

The other is the Cox Proportional Hazard model, which is the industry standard because it can take other variables into the model, is statistically stable, and works well even if some assumptions are violated.

Let's learn more about them.

Kaplan-Meier

  • Works well with right-censored data (remember? When the event occurs after the end of the observation period)
  • An accurate model
  • Non-parametric: it does not follow any distribution
  • Requires some assumptions: dropping out is unrelated to the event; the time of entry does not affect the survival risk; and event times are known precisely.
  • Returns a survival function that looks like a staircase

When to use:

  • Simple survival analysis without additional covariates or predictors.
  • Good for a quick look at the data.
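The staircase shape comes from how the estimate is built: at each time where an event is observed, the running survival probability is multiplied by the fraction of at-risk subjects that did not experience it. The following is a minimal hand-rolled sketch of that product-limit idea on made-up data. The `kaplan_meier` function and the toy lists are hypothetical and are not how any particular library implements it.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] using the product-limit estimator."""
    survival = 1.0
    curve = []
    # The curve only steps down at times where an event was actually observed.
    for t in sorted({d for d, e in zip(durations, events) if e == 1}):
        at_risk = sum(1 for d in durations if d >= t)
        died = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1 - died / at_risk
        curve.append((t, survival))
    return curve


# Toy data: duration in months, event = 1 (churned) or 0 (censored).
durations = [2, 3, 3, 5, 6, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 1, 0]

km_curve = kaplan_meier(durations, events)
for t, s in km_curve:
    print(f"t={t}: S(t)={s:.3f}")
```

Censored subjects never trigger a step, but they still count in `at_risk` until their own duration has passed. That is how the estimator uses partial information instead of discarding it.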

Cox Proportional Hazard

  • The industry standard
  • Accepts additional predictors or covariates
  • Works well even if some assumptions are violated
  • Estimates a hazard function, which tends to be more stable than survival functions

When to use:

  • Estimating on data with several predictor variables (covariates).
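In the usual textbook notation (the symbols $h_0$, $\beta_j$, and $x_j$ are introduced here for illustration and do not come from the original text), the Cox model writes each subject's hazard as a shared baseline hazard scaled by the covariates:

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j)
```

Each $\exp(\beta_j)$ is a hazard ratio: the factor by which the hazard is multiplied when $x_j$ increases by one unit while the other covariates stay fixed. The word "proportional" refers to the assumption that this factor does not change over time.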

Next, let's get our hands on some code.

Code

In this section, we will learn how to model SA using both of the models introduced earlier.

The dataset chosen for this exercise is the Telco Customer Churn, which you can find in the UCI Machine Learning Repository under a Creative Commons license.

Dataset view. Image by the author.

Next, let's import the required packages.

# Data
from ucimlrepo import fetch_ucirepo

# Data Wrangling
import pandas as pd
import numpy as np

# DataViz
import matplotlib.pyplot as plt
import seaborn as sns

# Lifelines Survival Analysis
from lifelines import KaplanMeierFitter
from lifelines import CoxPHFitter

# fetch dataset 
telco_churn = fetch_ucirepo(id=563) 
  
# data (as pandas dataframes) 
X = telco_churn.data.features 
y = telco_churn.data.targets 
  
# Pandas df
df = pd.concat([X, y], axis=1)
df.head(3)

Using Kaplan-Meier

Now, as mentioned, the Kaplan-Meier [KM] model is really simple and straightforward to use, which makes it a good choice for visualization. All we need are two variables: one predictor (the duration) and one label (the event).

Then we can instantiate the KM model and fit it to the data, using Subscription Length (the total months of subscription) as the predictor, and Churn as the observed event.

# Instantiate K-M
kmf = KaplanMeierFitter()

# Fit the model
kmf.fit(df['Subscription  Length'],
        event_observed=df['Churn'],
        label= 'Customer Churn')

Done. Next, we can visualize the survival function.

# Plot survival curve
plt.figure(figsize=(12, 5))
kmf.plot_survival_function()
plt.title('Kaplan-Meier Survival Curve: Telco Customer Lifetime')
plt.xlabel('Time (months)')
plt.ylabel('Probability of Remaining Subscribed')
plt.grid(True)
plt.show()

This is great! We can see that more than 90% of the customers stay with the Telecom company for about 35 months.

The Kaplan-Meier model is good for visualization. Image by the author.

If we want to confirm it, we can easily code that and find that 90% stay with the company until month 34, actually.

# Checking survival rate at 34 months
kmf.survival_function_at_times(34)
Customer Churn
34	0.900613

If we want to know the median time at which people churn, we can use the KM attribute .median_survival_time_. This is the point in time at which the survival probability drops to 50%.

# Median survival time
median_survival = kmf.median_survival_time_
print(f"Median Customer Lifetime: {median_survival} months")
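To show what a median survival time means on a staircase curve, here is a small sketch that reads it off a list of (time, survival) points. The `median_survival_time` helper and both toy curves are hypothetical. Returning infinity when the curve never reaches 50% is a convention assumed here for the illustration.

```python
import math


def median_survival_time(curve):
    """First time at which the survival curve is at or below 0.5; inf if never."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return math.inf


# Hypothetical curve that crosses 50% between months 36 and 40.
crossing_curve = [(0, 1.00), (12, 0.95), (24, 0.80), (36, 0.55), (40, 0.48), (44, 0.30)]

# Hypothetical curve with heavy censoring that never drops to 50%.
flat_curve = [(0, 1.00), (12, 0.97), (24, 0.92)]

median_crossing = median_survival_time(crossing_curve)
median_flat = median_survival_time(flat_curve)
print(median_crossing, median_flat)
```

The second case matters in churn data: when most customers are still subscribed at the end of the observation window, the median may simply not be reached, and an infinite value should be read as "more than half survived the whole window" rather than as an error.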

We can also do other analyses, such as comparing groups. Imagine this Telco company splits its customers into two groups:

  1. Heavy users: Frequency of Use > median
  2. Soft users: Frequency of Use <= median

We can compare the survival functions of these two groups.

# Column Groups
df['Heavy_User'] = np.where(df['Frequency of use'] > df['Frequency of use'].median(), 1, 0)
df.head()

plt.figure(figsize=(12, 5))
plt.title('Kaplan-Meier Survival Curve: Telco Customer Lifetime')
plt.xlabel('Time (months)')
plt.ylabel('Probability of Remaining Subscribed')

# Fit the model for Soft users and plot
kmf.fit(df[df.Heavy_User == 0]['Subscription  Length'], df[df.Heavy_User == 0]['Churn'], label='Soft User')
ax = kmf.plot_survival_function()

# Fit the model for Heavy users and plot
kmf.fit(df[df.Heavy_User == 1]['Subscription  Length'], df[df.Heavy_User == 1]['Churn'], label='Heavy User')
ax = kmf.plot_survival_function(ax=ax)

plt.show()

And there it is. While heavy users stay strong with the company throughout the whole time frame, soft users churn faster after month 30. Their median survival time is 40 months.

Survival comparison between groups. Image by the author.

When comparing groups, we must make sure that the difference is statistically significant. For that, the lifelines package offers the log-rank test. It is a hypothesis test:

  • Ho (null hypothesis): The survival curves of the two populations do not differ.
  • Ha (alternative hypothesis): The survival curves of the two populations differ.

from lifelines.statistics import logrank_test
# 3. Perform the Log-Rank Test
results = logrank_test(df[df.Heavy_User == 0]['Subscription  Length'],
                       df[df.Heavy_User == 1]['Subscription  Length'],
                       event_observed_A= df[df.Heavy_User == 0]['Churn'], 
                       event_observed_B= df[df.Heavy_User == 1]['Churn'])

# 4. Print Results
print(f"P-value: {results.p_value}")
print(f"Test Statistic: {results.test_statistic}")

if results.p_value < 0.05:
    print("Result: Statistically significant difference between groups.")
else:
    print("Result: No significant difference detected.")
P-value: 7.23487469906141e-103
Test Statistic: 463.7794219211866
Result: Statistically significant difference between groups.
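To see what the test statistic measures, the sketch below implements the textbook two-group log-rank calculation by hand on made-up data. At every event time it compares the churn observed in group A with the churn expected if both groups shared one survival curve. The `logrank` function and the toy lists are hypothetical, and a library's internals may differ in details.

```python
import math


def logrank(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(d, e, 0) for d, e in zip(durations_a, events_a)]
    data += [(d, e, 1) for d, e in zip(durations_b, events_b)]
    event_times = sorted({d for d, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for d, _, _ in data if d >= t)               # at risk, both groups
        n_a = sum(1 for d, _, g in data if d >= t and g == 0)  # at risk, group A
        deaths = sum(1 for d, e, _ in data if d == t and e == 1)
        deaths_a = sum(1 for d, e, g in data if d == t and e == 1 and g == 0)
        observed_a += deaths_a
        expected_a += deaths * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of events in group A.
            variance += deaths * (n_a / n) * (1 - n_a / n) * (n - deaths) / (n - 1)

    statistic = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(statistic / 2))  # chi-square tail, 1 degree of freedom
    return statistic, p_value


# Toy data: group A churns early, group B churns late (0 = censored).
a_durations, a_events = [2, 3, 4, 5, 6, 7], [1, 1, 1, 1, 1, 1]
b_durations, b_events = [10, 12, 14, 15, 16, 18], [1, 1, 0, 1, 1, 0]

statistic, p_value = logrank(a_durations, a_events, b_durations, b_events)
print(f"Test statistic: {statistic:.2f}, p-value: {p_value:.4f}")
```

A large statistic means group A churned much more (or much less) often than a shared curve would predict. With only a handful of toy rows the p-value is modest compared with the one printed above, which benefits from thousands of customers.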

Using Cox Proportional Hazard

The first cool thing you can do with the Cox Proportional Hazard [CPH] model is look at how other variables affect the survival of the subject you are observing.

Let's break it down.

  1. We start by choosing some covariates
  2. We filter the dataset
  3. Instantiate the model
  4. Fit the model

# 1. Prepare the data
# Selecting the time, the event, and our chosen covariates
cols_to_use = [
    'Subscription  Length', # Time 
    'Churn',                 # Event (E)
    'Charge  Amount',        # Covariate 1
    'Complains',             # Covariate 2
    'Frequency of use'       # Covariate 3
]

# Dropping any missing values for the model
df_model = df[cols_to_use].dropna()

# 2. Initialize and fit the Cox model
# Use the penalizer to stabilize the math if not converging.
cph = CoxPHFitter(penalizer=0.1)
cph.fit(df_model, 
        duration_col='Subscription  Length', 
        event_col='Churn')

# 3. Display the results
cph.print_summary()

# 4. Visualize the influence of covariates
cph.plot()

This is our beautiful result.

The CPH model. Image by the author.

How do we interpret this?

The vertical dashed line at 0.0 is the neutral point.

  • If a variable's point sits at 0, it has no effect on churn.
  • To the right (> 0): Increases the risk (makes churn happen sooner).
  • To the left (< 0): Decreases the risk (makes the customer stay longer).
  • In the table, the most important column for business stakeholders is the Hazard Ratio, exp(coef). It tells us the multiplier effect on the churn risk.

[TABLE] Complains (5.36): A customer who complains is 5.36 times as likely (436% more likely) to churn at any given time than a customer who does not complain. This is a huge effect.

[GRAPHIC] Complains (High Risk): This is our strongest predictor. Customers with complaints are about 5.4 times as likely to churn at any given time compared with those without.

[TABLE] Frequency of use (0.99): Although the p-value says this is technically significant, an HR of 0.99 is effectively 1. It means the effect on churn is negligible (only a 1% change per unit).

[GRAPHIC] Frequency of use (Neutral): The square sits almost exactly on the 0.0 line. In this particular model, how often a customer uses the service does not significantly change when they churn.

[TABLE] Charge Amount (0.83): For every one-unit increase, the churn risk drops by 17% ($1 - 0.83 = 0.17$). Customers who pay more are more stable.

[GRAPHIC] Charge Amount (Protective Factor): The square is to the left of the zero line. Higher values are associated with a lower risk of churn.
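The plot and the table describe the same numbers on two scales: the plot shows the coefficient (the log of the hazard ratio, neutral at 0), while the table's exp(coef) column shows the hazard ratio itself (neutral at 1). The short sketch below converts between them using the three hazard ratios quoted above. The `describe_hazard_ratio` helper is hypothetical and the coefficients are backed out from those rounded ratios, so they are approximations.

```python
import math


def describe_hazard_ratio(coef):
    """Turn a Cox coefficient (log hazard ratio) into (hazard ratio, % change in risk)."""
    hazard_ratio = math.exp(coef)
    pct_change = (hazard_ratio - 1) * 100
    return hazard_ratio, pct_change


# Hazard ratios as reported in the summary table above.
reported = [("Complains", 5.36), ("Frequency of use", 0.99), ("Charge  Amount", 0.83)]

for name, hr_reported in reported:
    coef = math.log(hr_reported)  # approximate coefficient implied by the rounded ratio
    hr, pct = describe_hazard_ratio(coef)
    print(f"{name}: coef={coef:+.3f}, HR={hr:.2f}, risk change={pct:+.0f}%")
```

A positive coefficient lands to the right of the dashed line and gives a ratio above 1. A negative one lands to the left and gives a ratio below 1.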

We can also look at both the Survival and Hazard functions for this model.

Survival and Hazard functions from the CPH model. Image by the author.

The curve is similar to the one from the KM model. Let's compare the survival probability at the same month 34.

# Extract the baseline survival probability at time 34
survival_at_34 = cph.baseline_survival_.loc[34]
print(f"Baseline Survival Probability at period 34: {survival_at_34.values[0]:.4f}")
Baseline Survival Probability at period 34: 0.9294

It is about 3 percentage points higher, at ~93%.

And to wrap up this article, let's pick two different customers, one without complaints and one with complaints, and compare their survival probabilities at month 34.

# 1. Pick a customer (or predict for a new one)
individual = df_model.iloc[[110,111]]

# 2. Predict their full survival curve
pred_survival = cph.predict_survival_function(individual)

# 3. Get the value at time 34
prob110_at_34 = pred_survival.loc[34].values[0]
prob111_at_34 = pred_survival.loc[34].values[1]

print(f"Customer 110 (no complaints) Probability of 'Surviving' to period 34: {prob110_at_34:.2%}")
print(f"Customer 111 (yes complaints) Probability of 'Surviving' to period 34: {prob111_at_34:.2%}")
Customer 110 (no complaints) Probability of 'Surviving' to period 34: 93.94%
Customer 111 (yes complaints) Probability of 'Surviving' to period 34: 61.68%
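The gap between the two customers follows from how a Cox model turns a baseline curve into an individual one: the baseline survival is raised to the power of that customer's relative hazard. The sketch below illustrates this with a single made-up covariate. The numbers are assumptions, and measuring covariates relative to their training means (so that the baseline describes an "average" customer) is my understanding of how lifelines behaves, not something the article states.

```python
import math


def individual_survival(baseline_survival, coefs, x, x_mean):
    """S(t | x) = S0(t) ** exp(sum(beta_j * (x_j - mean_j)))."""
    log_partial_hazard = sum(b * (xi - mi) for b, xi, mi in zip(coefs, x, x_mean))
    return baseline_survival ** math.exp(log_partial_hazard)


# Hypothetical one-covariate model: a binary "complains" flag.
s0_at_34 = 0.93          # assumed baseline survival at month 34
coefs = [math.log(5.0)]  # assumed coefficient, i.e. a hazard ratio of 5
x_mean = [0.1]           # assumed share of customers who complain

no_complaint = individual_survival(s0_at_34, coefs, [0], x_mean)
complaint = individual_survival(s0_at_34, coefs, [1], x_mean)
print(f"No complaint: {no_complaint:.2%}")
print(f"Complaint:    {complaint:.2%}")
```

A customer with a relative hazard above 1 ends up below the baseline curve and one with a relative hazard below 1 ends up above it. This matches the pattern in the output above, where the customer without complaints sits slightly above the ~93% baseline and the one with complaints sits far below it.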

Big difference, huh? More than 30 percentage points. And finally, we can calculate the time, in months, that each customer is expected to stay subscribed.

# Time Until Churn (Expected life) by customer
pred_churn = cph.predict_expectation(df_model.iloc[[110,111]])

# Get the values in months
prob110_churn = pred_churn.loc[110]
prob111_churn = pred_churn.loc[111]

print(f"Customer 110 (no complaints) expected churn at: {prob110_churn:.0f} months")
print(f"Customer 111 (yes complaints) expected churn at: {prob111_churn:.0f} months")
Customer 110 (no complaints) expected churn at: 41 months
Customer 111 (yes complaints) expected churn at: 31 months

Indeed, complaints make a difference at this Telco company.

Before You Go

Well, survival analysis is more than just a statistical exercise. Companies can use it to understand customer behavior.

The Kaplan-Meier and Cox Proportional Hazard models provide actionable insights into subscriber longevity. We saw how variables such as the customer's charge amount and service complaints directly affect churn, which allows decision-makers to pursue more targeted retention strategies.

Data professionals who understand these models can build a powerful tool for companies to improve their relationship with their user base. Use these tools to stay ahead of the curve. Literally.

If you liked this content, find me on my website.

GitHub Repository

References

[1] Survival Analysis Definition

[2] The Complete Introduction to Survival Analysis in Python

[3] Introduction to Customer Survival Analysis: Understanding Customer Lifetimes

[4] Ultimate Guide to Survival Analysis

[5] What is the difference between Kaplan-Meier (KM) and Cox Proportional Hazards (CPH) ratio?

[6] Lifelines Documentation

[7] Survival Analysis in R For Beginners
