
ROC AUC Explained: A Beginner's Guide to Evaluating Classification Models

In our earlier post on the confusion matrix, we applied a logistic regression algorithm to the Wisconsin Breast Cancer dataset to classify whether a tumor is malignant or benign.

We evaluated that classification model using various metrics such as accuracy, precision, etc.

For binary classification models, there is another way to evaluate the model: ROC AUC.

In this blog, we will discuss why we need another metric and when it should be used.

To understand ROC AUC in detail, we will work with the IBM HR Analytics dataset.

This dataset contains information on 1,470 employees, such as their age, job role, gender, monthly income, job satisfaction, etc.

In total, there are 34 features describing each employee.

We also have a target column, 'Attrition', which is 'Yes' if the employee left the company and 'No' if the employee stayed.

Let's look at the class distribution of the target column.

Image by Author

From the class distribution above, we can see that the dataset is imbalanced.

Now we need to build a model on this data that classifies employees according to whether they will stay with the company or not.

Since this is a binary classification problem (yes/no), let's apply the logistic regression algorithm to this data.

Code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, classification_report

# Load the dataset
df = pd.read_csv("C:/HR-Employee-Attrition.csv")

# Drop non-informative columns
df.drop(['EmployeeNumber', 'Over18', 'EmployeeCount', 'StandardHours'], axis=1, inplace=True)

# Encode the target column
df['Attrition'] = df['Attrition'].map({'Yes': 1, 'No': 0})

# One-hot encode categorical features
df = pd.get_dummies(df, drop_first=True)

# Split features and target
X = df.drop('Attrition', axis=1)
y = df['Attrition']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

# Predict on test data
y_pred = model.predict(X_test_scaled)

# Predict probabilities for the positive class
y_prob = model.predict_proba(X_test_scaled)[:, 1]


# Confusion matrix and classification report
conf_matrix = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)

# Display results
print("Confusion Matrix:n", conf_matrix)
print("nClassification Report:n", report)

Confusion Matrix and Classification Report

Image by Author

From the classification report above, we see that the accuracy is 86%. However, the recall for class '1' (employees who left) is only 34%.

The recall for class '0' (employees who stayed), on the other hand, is much higher.

This happens because the data is imbalanced, so accuracy can be misleading here.
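To see why accuracy can mislead on imbalanced data, here is a minimal sketch with a hypothetical 84/16 class split (made-up labels, not the HR data). A trivial "model" that always predicts that the employee stays still scores a high accuracy while catching none of the employees who left.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels (not the HR data): 84 employees stayed (0), 16 left (1)
y_true = np.array([0] * 84 + [1] * 16)

# A trivial "model" that predicts that every employee stays
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)  # recall for the positive class '1'

print(f"Accuracy: {acc:.2f}")            # Accuracy: 0.84
print(f"Recall for class 1: {rec:.2f}")  # Recall for class 1: 0.00
```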

Does this mean we need to change our algorithm? No.

We need to change the way we evaluate our model, and a better way to evaluate classification models on imbalanced data is ROC AUC.


We now know there is another way to evaluate classification models, namely ROC AUC. But before exploring ROC AUC, let's get a clear picture of what has happened so far.

We applied logistic regression to the IBM HR data, and the model gave us a probability score for each employee, representing the likelihood that the employee will leave the job.

Image by Author

When we generated the confusion matrix and the classification report, they were based on a threshold, which by default is 0.5.

If the predicted probability is greater than 0.5, the employee is classified as having left; if it is less than 0.5, the employee is classified as having stayed.
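The default rule can be written out by hand. A minimal sketch, using a few hypothetical probabilities (not output from the HR model) and a hypothetical helper `apply_threshold`:

```python
import numpy as np

def apply_threshold(probabilities, threshold=0.5):
    """Hypothetical helper: label 1 ('left') if P(left) > threshold, else 0 ('stayed')."""
    return (np.asarray(probabilities) > threshold).astype(int)

# Hypothetical predicted probabilities of leaving
probs = np.array([0.12, 0.47, 0.51, 0.93, 0.30])

labels = apply_threshold(probs)
print(labels)  # [0 0 1 1 0]
```

Lowering `threshold` (for example to 0.4) labels more employees as leavers, which raises recall at the cost of more false positives.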

With this threshold, we got an accuracy of 86%, but a recall of only 34%. Since accuracy was misleading, we decided to evaluate the model using ROC AUC.


ROC AUC

First, we will discuss the Receiver Operating Characteristic (ROC) curve.

We get the ROC curve by plotting the True Positive Rate against the False Positive Rate.

We already know that the classification report is based on a single threshold, whereas the ROC curve is built by computing the True Positive Rate (TPR) and the False Positive Rate (FPR) at all possible thresholds and then plotting them.

Let's take some sample data and see how we generate the ROC curve from it.

Image by Author

Now, for the sample data above, we compute the TPR and FPR at the possible thresholds and plot them.

What are the possible thresholds?

To generate the ROC curve, we do not need to compute the TPR and FPR at every value between 0 and 1.

Instead, we use the predicted probabilities from the dataset as thresholds, plus one value above the highest predicted probability. At that top threshold no prediction is positive, so the curve starts at (0, 0); at the lowest predicted probability every prediction is positive, so the curve ends at (1, 1).

Why not use every number between 0 and 1 as a threshold?

Consider our sample data. We have a predicted probability of 0.6592, and we will use it as a threshold to compute the TPR and FPR.

Between 0.6592 and the next predicted probability, 0.8718, the TPR and FPR stay the same; they change only when the threshold crosses one of the predicted probabilities.

That is why we use the distinct predicted probabilities as thresholds to generate the ROC curve.
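Building the candidate thresholds for the sample data can be sketched as follows: the distinct predicted probabilities in descending order, preceded by one value above the maximum. The helper `candidate_thresholds` is hypothetical, and using 1.0 as the extra value mirrors the threshold list used for the sample data in this post; scikit-learn's `roc_curve` picks its own sentinel value for this.

```python
import numpy as np

# Sample predicted probabilities (positives and negatives together)
scores = np.array([0.9799, 0.6592, 0.6337, 0.9709, 0.8737, 0.8718])

def candidate_thresholds(scores, top=1.0):
    """Hypothetical helper: distinct scores in descending order, preceded by `top`.

    `top` is assumed to be larger than every score, so that at this threshold
    no prediction is positive and the curve starts at (0, 0).
    """
    distinct = np.unique(scores)[::-1]  # np.unique sorts ascending; reverse it
    return np.concatenate(([top], distinct))

thresholds = candidate_thresholds(scores)
print(thresholds.tolist())
# [1.0, 0.9799, 0.9709, 0.8737, 0.8718, 0.6592, 0.6337]
```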

Now, using our sample data, let's generate the ROC curve and see what we can observe.

To generate the ROC curve, we need to compute the TPR and FPR.

\[
\text{True Positive Rate (TPR)} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}}
\]

The True Positive Rate (TPR) is also called recall.

\[
\text{False Positive Rate (FPR)} = \frac{\text{False Positives (FP)}}{\text{False Positives (FP)} + \text{True Negatives (TN)}}
\]
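The two formulas translate directly into code. A minimal sketch, assuming that a sample counts as predicted positive when its probability is greater than or equal to the threshold; the helper name `tpr_fpr` is hypothetical.

```python
import numpy as np

def tpr_fpr(y_true, scores, threshold):
    """Hypothetical helper: (TPR, FPR) when scores >= threshold are predicted positive."""
    y_true = np.asarray(y_true)
    predicted_pos = np.asarray(scores) >= threshold
    actual_pos = y_true == 1
    tp = np.sum(predicted_pos & actual_pos)    # positives correctly flagged
    fn = np.sum(~predicted_pos & actual_pos)   # positives missed
    fp = np.sum(predicted_pos & ~actual_pos)   # negatives wrongly flagged
    tn = np.sum(~predicted_pos & ~actual_pos)  # negatives correctly not flagged
    return float(tp / (tp + fn)), float(fp / (fp + tn))

# Sample data: three positives followed by three negatives
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9799, 0.6592, 0.6337, 0.9709, 0.8737, 0.8718]

tpr, fpr = tpr_fpr(y_true, scores, 0.9799)
print(round(tpr, 2), round(fpr, 2))  # 0.33 0.0
```

Calling `tpr_fpr(y_true, scores, 0.70)` and `tpr_fpr(y_true, scores, 0.85)` returns the same pair, because no predicted probability lies between those two thresholds.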

The thresholds we will use for this sample data to compute the TPR and FPR are {1, 0.9799, 0.9709, 0.8737, 0.8718, 0.6592, 0.6337}. Here a sample counts as predicted positive when its predicted probability is greater than or equal to the threshold.

Let's compute the TPR and FPR at each threshold, starting with 0.9799.

\[
\begin{aligned}
&\textbf{At threshold } 0.9799: \\[4pt]
&\text{True Positives (TP)} = 1, \quad \text{False Negatives (FN)} = 2, \\
&\text{False Positives (FP)} = 0, \quad \text{True Negatives (TN)} = 3 \\[6pt]
&\text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{1}{1 + 2} = \frac{1}{3} \approx 0.33 \\[6pt]
&\text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}} = \frac{0}{0 + 3} = 0 \\[6pt]
&\Rightarrow (\text{FPR},\ \text{TPR}) = (0,\ 0.33)
\end{aligned}
\]

In the same way, we compute the TPR and FPR at every threshold.

Image by Author
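Repeating that calculation for every candidate threshold gives the full table. A self-contained sketch on the same sample data, again assuming the "greater than or equal to the threshold" rule:

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9799, 0.6592, 0.6337, 0.9709, 0.8737, 0.8718])
thresholds = [1.0, 0.9799, 0.9709, 0.8737, 0.8718, 0.6592, 0.6337]

n_pos = np.sum(y_true == 1)
n_neg = np.sum(y_true == 0)

roc_points = []  # (FPR, TPR) pairs, rounded to 2 decimals
for t in thresholds:
    predicted_pos = scores >= t
    tpr = np.sum(predicted_pos & (y_true == 1)) / n_pos
    fpr = np.sum(predicted_pos & (y_true == 0)) / n_neg
    roc_points.append((round(float(fpr), 2), round(float(tpr), 2)))
    print(f"threshold={t:.4f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The resulting points run from (0, 0) up to (0, 0.33), across to (1, 0.33), and then up to (1, 1).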

Now, let's plot the TPR against the FPR to get the ROC curve.

Image by Author

This is how the ROC curve is generated. Since we are looking at a sample of only 6 points, it is hard to see and interpret the curve clearly. The main goal here is to understand how the ROC curve is generated.
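As a cross-check, scikit-learn's `roc_curve` should produce the same points for the sample data. Passing `drop_intermediate=False` keeps every threshold (by default scikit-learn drops points that do not change the shape of the curve). Its first threshold is a sentinel value above the maximum score rather than 1, and the exact sentinel depends on the scikit-learn version, so this sketch compares only the FPR and TPR arrays.

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9799, 0.6592, 0.6337, 0.9709, 0.8737, 0.8718])

# Keep every threshold so the points match the hand-computed table
fpr, tpr, thresholds = roc_curve(y_true, scores, drop_intermediate=False)

print(np.round(fpr, 2))
print(np.round(tpr, 2))
```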

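Next, we need to interpret the ROC curve, and for that we will generate the ROC curve for our dataset using Python.

Code:

# Continues from the previous code (uses y_test and y_prob)
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Compute ROC curve and AUC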
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)

# Print AUC
print(f"AUC: {roc_auc:.2f}")

# Plot ROC curve
plt.figure(figsize=(6,6))
plt.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.2f})", linewidth=2)
plt.plot([0,1], [0,1], 'k--', label="Random guess (AUC = 0.5)")
plt.xlim([0,1])
plt.ylim([0,1.05])
plt.xlabel("False Positive Rate (FPR)")
plt.ylabel("True Positive Rate (TPR)")
plt.title("ROC Curve - Logistic Regression (HR Dataset)")
plt.legend(loc="lower right")
plt.grid(True)
plt.show()

Output:

Image by Author

Let's look at what we can interpret from the ROC curve alone, leaving aside the AUC, which we will discuss later.

In the plot above, the y-axis represents the True Positive Rate, i.e., the fraction of actual positives the model correctly identifies, and the x-axis represents the False Positive Rate, i.e., the fraction of actual negatives the model wrongly flags as positive.

The ROC curve shows how the model behaves at different thresholds. We want the True Positive Rate to be as high as possible while keeping the False Positive Rate low, which means the curve should bow toward the top-left corner.

If the curve lies on or close to the diagonal, the model is essentially guessing at random, and its performance is unsatisfactory.

If the curve lies below the diagonal, the model performs very poorly, worse than random guessing.
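One way to see the "below the diagonal" case: reversing a model's scores turns a curve below the diagonal into one above it, and the AUC becomes 1 minus the original AUC. The sample data from earlier happens to be such a case. A minimal sketch using scikit-learn's `roc_auc_score`:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9799, 0.6592, 0.6337, 0.9709, 0.8737, 0.8718])

auc_original = roc_auc_score(y_true, scores)
auc_flipped = roc_auc_score(y_true, 1 - scores)  # reverse the ranking

print(f"{auc_original:.2f}")  # 0.33
print(f"{auc_flipped:.2f}")   # 0.67
```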

In this way, we can get a sense of how the model performs across different thresholds.

Now let's discuss the Area Under the Curve (AUC).

We have already seen that for our data the AUC is 0.81.

Image by Author

An AUC of 0.81 means that if you randomly pick one employee who left and one who stayed, there is an 81% chance that the model assigns a higher probability of leaving to the employee who left.

Now, let's use the sample data to understand how the AUC is calculated.

Let's go back once more to the ROC curve we generated from our sample data.

Image by Author

The shaded regions in the plot above represent the AUC.

Now let's calculate the AUC.

From point (0.00, 0.33) to (0.33, 0.33), the area under the curve is represented by the orange rectangle.

From point (0.33, 0.33) to (0.67, 0.33), the area under the curve is represented by the green rectangle.

From point (0.67, 0.33) to (1.00, 0.33), the area under the curve is represented by the red rectangle.

Now, to get the AUC, we compute the areas of the rectangles and add them up.

$$
\text{Orange rectangle: } l \times b = 0.33 \times 0.33 = 0.11
$$

$$
\text{Green rectangle: } l \times b = 0.34 \times 0.33 = 0.11
$$

$$
\text{Red rectangle: } l \times b = 0.33 \times 0.33 = 0.11
$$

$$
\text{AUC} = 0.11 + 0.11 + 0.11 = 0.33
$$

This is how we calculate the AUC.
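The same sum can be computed in code by adding up the area of each segment between consecutive ROC points. A minimal sketch using exact fractions (the rounded values 0.33 and 0.67 are really 1/3 and 2/3); the helper `step_area` is hypothetical and only handles step-shaped curves like this one.

```python
from fractions import Fraction

# ROC points (FPR, TPR) for the sample data, as exact fractions
third = Fraction(1, 3)
roc_points = [(0, 0), (0, third), (third, third), (2 * third, third),
              (1, third), (1, 2 * third), (1, 1)]

def step_area(points):
    """Hypothetical helper: sum width x height over consecutive points.

    Every segment of this curve is either vertical (zero width, zero area)
    or horizontal (a rectangle whose height is the shared TPR).
    """
    area = 0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * y1
    return area

auc_value = step_area(roc_points)
print(auc_value, float(auc_value))  # 1/3 0.3333333333333333
```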

In this sample, we could have found the area without splitting it point by point, but real-world ROC curves rarely look like this.

Now let's look at an example ROC curve that resembles real-world cases and calculate its AUC.

Image by Author

Now let's find the AUC of this ROC curve.

Here we have three segments. Two of them are trapezoids and one looks like a triangle. However, we don't use separate formulas for each shape, because rectangles and triangles are special cases of a trapezoid.

We use only the trapezoid area formula.

$$
\text{Area} = \tfrac{1}{2} \times (y_1 + y_2) \times (x_2 - x_1)
$$

Now, using this formula, let's find the AUC.

$$
\begin{aligned}
&\text{Segment 1: } (0.0, 0.0) \rightarrow (0.2, 0.4) \\
&\text{Area} = \tfrac{1}{2} \times (0.0 + 0.4) \times (0.2 - 0.0) = 0.04
\end{aligned}
$$

Here, the trapezoid formula automatically reduces to the triangle area formula, because the height at the start of the segment is 0.

$$
\begin{aligned}
&\text{Segment 2: } (0.2, 0.4) \rightarrow (0.6, 0.8) \\
&\text{Area} = \tfrac{1}{2} \times (0.4 + 0.8) \times (0.6 - 0.2) = 0.24
\end{aligned}
$$

$$
\begin{aligned}
&\text{Segment 3: } (0.6, 0.8) \rightarrow (1.0, 1.0) \\
&\text{Area} = \tfrac{1}{2} \times (0.8 + 1.0) \times (1.0 - 0.6) = 0.36
\end{aligned}
$$

$$
\text{AUC} = 0.04 + 0.24 + 0.36 = 0.64
$$
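The trapezoid sum above is easy to reproduce in code. A minimal sketch with a hand-written helper (the name `trapezoid_auc` is hypothetical), checked against scikit-learn's `sklearn.metrics.auc`, which also applies the trapezoidal rule to a list of (x, y) points:

```python
from sklearn.metrics import auc

# ROC points (FPR, TPR) from the example curve above
fpr = [0.0, 0.2, 0.6, 1.0]
tpr = [0.0, 0.4, 0.8, 1.0]

def trapezoid_auc(x, y):
    """Hypothetical helper: sum 1/2 * (y1 + y2) * (x2 - x1) over consecutive points."""
    total = 0.0
    for x1, x2, y1, y2 in zip(x, x[1:], y, y[1:]):
        total += 0.5 * (y1 + y2) * (x2 - x1)
    return total

manual = trapezoid_auc(fpr, tpr)
library = auc(fpr, tpr)
print(f"manual={manual:.2f}  sklearn={library:.2f}")  # manual=0.64  sklearn=0.64
```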

This is how the AUC is calculated. Now we understand how we obtained the AUC of 0.81 for our HR data.


There is also a second way to calculate the AUC.

Once again, let's go back to our sample data.

Image by Author

Positives (1's): [0.9799, 0.6592, 0.6337]

Negatives (0's): [0.9709, 0.8737, 0.8718]

With 3 positives and 3 negatives, we have 3 × 3 = 9 positive-negative pairs.

Now we compare each positive with each negative to see whether the positive is ranked higher or lower.

$$
\begin{aligned}
0.9799\ (\text{positive}) &> 0.9709\ (\text{negative}) &&\Rightarrow \text{positive ranked higher} \\
0.9799\ (\text{positive}) &> 0.8737\ (\text{negative}) &&\Rightarrow \text{positive ranked higher} \\
0.9799\ (\text{positive}) &> 0.8718\ (\text{negative}) &&\Rightarrow \text{positive ranked higher} \\
0.6592\ (\text{positive}) &< 0.9709\ (\text{negative}) &&\Rightarrow \text{positive ranked lower} \\
0.6592\ (\text{positive}) &< 0.8737\ (\text{negative}) &&\Rightarrow \text{positive ranked lower} \\
0.6592\ (\text{positive}) &< 0.8718\ (\text{negative}) &&\Rightarrow \text{positive ranked lower} \\
0.6337\ (\text{positive}) &< 0.9709\ (\text{negative}) &&\Rightarrow \text{positive ranked lower} \\
0.6337\ (\text{positive}) &< 0.8737\ (\text{negative}) &&\Rightarrow \text{positive ranked lower} \\
0.6337\ (\text{positive}) &< 0.8718\ (\text{negative}) &&\Rightarrow \text{positive ranked lower}
\end{aligned}
$$

$$
\text{Correctly ranked pairs} = 3, \quad \text{Total pairs} = 9
$$

This is called the ranking method of calculating the AUC.

With 3 of the 9 pairs correctly ranked, the AUC is 3/9 = 0.33, the same value we obtained for the sample data with the area method.

From this, we can see that

$$
\text{AUC} = \frac{\text{Number of correctly ranked pairs}}{\text{Total number of pairs}}
$$

We can interpret the AUC as the probability that a randomly chosen positive is ranked higher than a randomly chosen negative.
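The pair-counting method is short to write out. A minimal sketch on the sample data; the helper name `pairwise_auc` is hypothetical, and it counts a tied pair as half a correctly ranked pair, which is the usual convention (the sample itself has no ties). scikit-learn's `roc_auc_score` should agree with it.

```python
from itertools import product
from sklearn.metrics import roc_auc_score

positives = [0.9799, 0.6592, 0.6337]  # scores of the 1's
negatives = [0.9709, 0.8737, 0.8718]  # scores of the 0's

def pairwise_auc(pos_scores, neg_scores):
    """Hypothetical helper: fraction of (positive, negative) pairs ranked correctly."""
    pairs = list(product(pos_scores, neg_scores))
    correct = 0.0
    for p, n in pairs:
        if p > n:
            correct += 1.0  # positive ranked above the negative
        elif p == n:
            correct += 0.5  # tie: counted as half
    return correct / len(pairs)

auc_pairs = pairwise_auc(positives, negatives)
auc_sklearn = roc_auc_score([1] * 3 + [0] * 3, positives + negatives)
print(f"pairs={auc_pairs:.2f}  sklearn={auc_sklearn:.2f}")  # pairs=0.33  sklearn=0.33
```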


Now that we have an idea of how to generate the ROC curve and calculate the AUC, let's discuss why ROC AUC matters.

We turned to ROC AUC when we found that accuracy was misleading. But instead of using ROC AUC, one might ask: why not loop over all threshold values, compute accuracy and other metrics at each one, and then pick the best threshold?

Yes, that is possible. However, when comparing two models, we cannot compare them on the basis of thresholds, since different models may have different optimal thresholds.

ROC AUC gives us a single number that summarizes a model's performance and allows comparison across different models.

Another point is that the best threshold depends on the metric we choose.

The "best" threshold changes depending on whether we care about accuracy, precision, recall, or F1-score. ROC AUC is threshold-independent, which makes it a general measure of model quality.
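A small illustration of that point on the sample data: sweeping the thresholds and picking the best one by accuracy gives a different threshold than picking by F1-score. Both metrics are computed by hand from the confusion-matrix counts in this sketch, with F1 written as 2TP / (2TP + FP + FN), and predictions use the "greater than or equal to the threshold" rule.

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9799, 0.6592, 0.6337, 0.9709, 0.8737, 0.8718])
thresholds = [1.0, 0.9799, 0.9709, 0.8737, 0.8718, 0.6592, 0.6337]

accuracies, f1_scores = [], []
for t in thresholds:
    pred = scores >= t
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0))
    accuracies.append((tp + tn) / len(y_true))
    f1_scores.append(2 * tp / (2 * tp + fp + fn))  # F1 = 2TP / (2TP + FP + FN)

best_by_accuracy = thresholds[int(np.argmax(accuracies))]
best_by_f1 = thresholds[int(np.argmax(f1_scores))]
print("best threshold by accuracy:", best_by_accuracy)  # 0.9799
print("best threshold by F1:", best_by_f1)              # 0.6337
```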

Finally, ROC AUC captures the model's ability to rank positives above negatives, which makes it especially useful for imbalanced datasets.
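Because the AUC depends only on how the scores are ordered, any transformation that preserves that order leaves it unchanged, while a metric computed at a fixed threshold can change a lot. A minimal sketch on the sample data, squaring the scores as an arbitrary example of an order-preserving transform:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9799, 0.6592, 0.6337, 0.9709, 0.8737, 0.8718])
squared = scores ** 2  # squaring keeps the order of scores in [0, 1]

results = {}
for name, s in [("original", scores), ("squared", squared)]:
    auc_value = roc_auc_score(y_true, s)
    acc = accuracy_score(y_true, (s > 0.5).astype(int))  # fixed 0.5 threshold
    results[name] = (auc_value, acc)
    print(f"{name:8s}  AUC={auc_value:.2f}  accuracy@0.5={acc:.2f}")
```

The AUC stays at 0.33 for both versions, while accuracy at the 0.5 threshold drops from 0.50 to 0.17.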


Dataset License

The IBM HR Analytics Attrition dataset used in this article is from Kaggle and is licensed under CC0 (Public Domain), which makes it safe to use in this publication.


I hope you found this article helpful.

Feel free to share your thoughts.

Thanks for reading!
