Machine Learning

Kolmogorov–Smirnov Statistic, Explained: Model Evaluation in Credit Risk Modeling

These days, people take loans more than ever. If you want to build your own house, a home loan is available, and if you own property, you can take a loan against it. There are agricultural loans, education loans, business loans, gold loans, and many others.

In addition, for purchasing items such as televisions, refrigerators, furniture, and mobile phones, we have EMI options.

But does everyone get their loan request approved?

Banks do not give credit to everyone who applies; there is a procedure they follow before approving a loan.

We know that machine learning and data science are now used in every industry, and banks use them too.

When a customer applies for a loan, banks need to know the chances that the customer will pay it back on time.

For this, banks use predictive models, typically based on logistic regression or other machine learning methods.

We already know that through these methods, each applicant is assigned a probability of default.

This is a classification problem, and we need to distinguish defaulters from non-defaulters.

Defaulters: customers who fail to pay back their loan (miss payments or stop paying completely).

Non-defaulters: customers who repay their loan on time.

We have already discussed accuracy and ROC-AUC for evaluating classification models.

In this article, we will discuss the Kolmogorov–Smirnov statistic (KS statistic), which is used to evaluate classification models, mainly in the banking sector.

To understand the KS statistic, we will use the German Credit dataset.

This dataset contains information about 1,000 applicants, described by 20 features such as checking account status, loan duration, credit history, employment, and housing.

The target variable indicates whether the applicant is a non-defaulter (represented by 1) or a defaulter (represented by 2).

You can find information about the dataset and the data itself here.

We now need to build a classification model to separate the applicants. Since this is a binary classification problem, we will use logistic regression on this data.

Code:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load dataset
file_path = "C:/german.data"
data = pd.read_csv(file_path, sep=" ", header=None)

# Rename columns
columns = [f"col_{i}" for i in range(1, 21)] + ["target"]
data.columns = columns

# Features and target
X = pd.get_dummies(data.drop(columns=["target"]), drop_first=True)
y = data["target"]   # keep as 1 and 2

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train logistic regression
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Predicted probabilities
y_pred_proba = model.predict_proba(X_test)

# Results DataFrame
results = pd.DataFrame({
    "Actual": y_test.values,
    "Pred_Prob_Class2": y_pred_proba[:, 1]
})

print(results.head())

We already know that when we use logistic regression, we get a predicted probability of default for each applicant.

Image by the author

Now, to understand how the KS statistic is calculated, let's look at a sample of 10 points from the output.

Image by the author

Here the highest predicted probability is 0.92, meaning the model estimates a 92% chance that this applicant will default.

Now let's walk through the calculation of the KS statistic.

First, we sort the applicants by predicted probability in descending order, so that the applicants with the highest probabilities are at the top.

Image by the author

We already know that '1' represents non-defaulters and '2' represents defaulters.

In the next step, we calculate the cumulative counts of defaulters and non-defaulters at each row.

Image by the author

In the next step, we convert the cumulative counts of defaulters and non-defaulters into cumulative rates.

We divide the cumulative defaulter count by the total number of defaulters, and the cumulative non-defaulter count by the total number of non-defaulters.
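In formula form, for the top i rows after sorting:

\[
\begin{aligned}
\text{cumulative defaulter rate}_i &= \frac{\text{defaulters in the top } i \text{ rows}}{\text{total defaulters}} \\[6pt]
\text{cumulative non-defaulter rate}_i &= \frac{\text{non-defaulters in the top } i \text{ rows}}{\text{total non-defaulters}}
\end{aligned}
\]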

Image by the author

Next, we calculate the difference between the cumulative defaulter rate and the cumulative non-defaulter rate at each row.

Image by the author

The maximum difference between the cumulative defaulter rate and the cumulative non-defaulter rate is 0.83, and this is the KS statistic for the sample.

Here the KS statistic is 0.83, occurring at a predicted probability of 0.29.

This means that, at its best separating point, the model has captured 83 percentage points more of the defaulters than of the non-defaulters.
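If you prefer to see this in code, here is a minimal sketch of the sample calculation. The probabilities and labels are the ones implied by the per-threshold calculations shown below, so treat it as an illustrative reconstruction rather than actual output from the model above.

import pandas as pd

# 10-point sample: probabilities already sorted in descending order
# (1 = non-defaulter, 2 = defaulter; values inferred from the threshold tables below)
sample = pd.DataFrame({
    "Pred_Prob_Class2": [0.92, 0.63, 0.51, 0.39, 0.29, 0.20, 0.13, 0.10, 0.05, 0.01],
    "Actual":           [2, 2, 2, 1, 2, 1, 1, 1, 1, 1],
})

# Cumulative defaulter and non-defaulter rates
sample["is_def"] = (sample["Actual"] == 2).astype(int)
sample["cum_def_rate"] = sample["is_def"].cumsum() / sample["is_def"].sum()
sample["cum_nondef_rate"] = (1 - sample["is_def"]).cumsum() / (1 - sample["is_def"]).sum()

# KS statistic = maximum gap between the two cumulative rates
sample["gap"] = (sample["cum_def_rate"] - sample["cum_nondef_rate"]).abs()
ks = sample["gap"].max()
ks_prob = sample.loc[sample["gap"].idxmax(), "Pred_Prob_Class2"]
print(f"Sample KS = {ks:.2f} at probability {ks_prob:.2f}")   # 0.83 at 0.29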


Here, we can see that:

Cumulative defaulter rate = true positive rate (how many actual defaulters have been captured so far).

Cumulative non-defaulter rate = false positive rate (how many non-defaulters have been wrongly captured as defaulters).

But since we did not fix any threshold here, how do we get true positive and false positive rates?

Let's see how the TPR and FPR are calculated.

First, we treat every predicted probability as a possible threshold and compute the TPR and FPR at each one.
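With TP, FN, FP, and TN counted against each candidate threshold, the two rates are defined as:

\[
TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}
\]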

\[
\begin{aligned}
\textbf{At threshold 0.92:}\quad & TP = 1,\quad FN = 3,\quad FP = 0,\quad TN = 6 \\[4pt]
& TPR = \tfrac{1}{4} = 0.25,\qquad FPR = \tfrac{0}{6} = 0 \\[4pt]
& \Rightarrow (\mathrm{FPR},\ \mathrm{TPR}) = (0,\ 0.25)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.63:}\quad & TP = 2,\quad FN = 2,\quad FP = 0,\quad TN = 6 \\[4pt]
& TPR = \tfrac{2}{4} = 0.50,\qquad FPR = \tfrac{0}{6} = 0 \\[4pt]
& \Rightarrow (\mathrm{FPR},\ \mathrm{TPR}) = (0,\ 0.50)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.51:}\quad & TP = 3,\quad FN = 1,\quad FP = 0,\quad TN = 6 \\[4pt]
& TPR = \tfrac{3}{4} = 0.75,\qquad FPR = \tfrac{0}{6} = 0 \\[4pt]
& \Rightarrow (\mathrm{FPR},\ \mathrm{TPR}) = (0,\ 0.75)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.39:}\quad & TP = 3,\quad FN = 1,\quad FP = 1,\quad TN = 5 \\[4pt]
& TPR = \tfrac{3}{4} = 0.75,\qquad FPR = \tfrac{1}{6} \approx 0.17 \\[4pt]
& \Rightarrow (\mathrm{FPR},\ \mathrm{TPR}) = (0.17,\ 0.75)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.29:}\quad & TP = 4,\quad FN = 0,\quad FP = 1,\quad TN = 5 \\[4pt]
& TPR = \tfrac{4}{4} = 1.00,\qquad FPR = \tfrac{1}{6} \approx 0.17 \\[4pt]
& \Rightarrow (\mathrm{FPR},\ \mathrm{TPR}) = (0.17,\ 1.00)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.20:}\quad & TP = 4,\quad FN = 0,\quad FP = 2,\quad TN = 4 \\[4pt]
& TPR = \tfrac{4}{4} = 1.00,\qquad FPR = \tfrac{2}{6} \approx 0.33 \\[4pt]
& \Rightarrow (\mathrm{FPR},\ \mathrm{TPR}) = (0.33,\ 1.00)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.13:}\quad & TP = 4,\quad FN = 0,\quad FP = 3,\quad TN = 3 \\[4pt]
& TPR = \tfrac{4}{4} = 1.00,\qquad FPR = \tfrac{3}{6} = 0.50 \\[4pt]
& \Rightarrow (\mathrm{FPR},\ \mathrm{TPR}) = (0.50,\ 1.00)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.10:}\quad & TP = 4,\quad FN = 0,\quad FP = 4,\quad TN = 2 \\[4pt]
& TPR = \tfrac{4}{4} = 1.00,\qquad FPR = \tfrac{4}{6} \approx 0.67 \\[4pt]
& \Rightarrow (\mathrm{FPR},\ \mathrm{TPR}) = (0.67,\ 1.00)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.05:}\quad & TP = 4,\quad FN = 0,\quad FP = 5,\quad TN = 1 \\[4pt]
& TPR = \tfrac{4}{4} = 1.00,\qquad FPR = \tfrac{5}{6} \approx 0.83 \\[4pt]
& \Rightarrow (\mathrm{FPR},\ \mathrm{TPR}) = (0.83,\ 1.00)
\end{aligned}
\]

\[
\begin{aligned}
\textbf{At threshold 0.01:}\quad & TP = 4,\quad FN = 0,\quad FP = 6,\quad TN = 0 \\[4pt]
& TPR = \tfrac{4}{4} = 1.00,\qquad FPR = \tfrac{6}{6} = 1.00 \\[4pt]
& \Rightarrow (\mathrm{FPR},\ \mathrm{TPR}) = (1.00,\ 1.00)
\end{aligned}
\]

From the calculations above, we can see that the cumulative defaulter rate corresponds to the true positive rate (TPR), and the cumulative non-defaulter rate corresponds to the false positive rate (FPR).

When we calculate the cumulative defaulter and non-defaulter rates, each row acts as a threshold, and the rates are accumulated up to that row.

Here we can see that KS statistic = max(|TPR − FPR|).
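Written out over all candidate thresholds t:

\[
KS = \max_{t} \left| \, TPR(t) - FPR(t) \, \right| = \max_{t} \left| \, F_{\text{def}}(t) - F_{\text{non-def}}(t) \, \right|
\]

where \(F_{\text{def}}(t)\) and \(F_{\text{non-def}}(t)\) are the proportions of defaulters and non-defaulters with a predicted probability at or above t.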


Now let's calculate the KS statistic for the complete set of predictions, not just the 10-point sample.

Code:

import matplotlib.pyplot as plt

# Create DataFrame with actual test labels and predicted probabilities of class 2 (defaulter)
results = pd.DataFrame({
    "Actual": y_test.values,
    "Pred_Prob_Class2": y_pred_proba[:, 1]
})

# Mark defaulters (2) and non-defaulters (1)
results["is_defaulter"] = (results["Actual"] == 2).astype(int)
results["is_nondefaulter"] = 1 - results["is_defaulter"]

# Sort by predicted probability
results = results.sort_values("Pred_Prob_Class2", ascending=False).reset_index(drop=True)

# Totals
total_defaulters = results["is_defaulter"].sum()
total_nondefaulters = results["is_nondefaulter"].sum()

# Cumulative counts and rates
results["cum_defaulters"] = results["is_defaulter"].cumsum()
results["cum_nondefaulters"] = results["is_nondefaulter"].cumsum()
results["cum_def_rate"] = results["cum_defaulters"] / total_defaulters
results["cum_nondef_rate"] = results["cum_nondefaulters"] / total_nondefaulters

# KS statistic
results["KS"] = (results["cum_def_rate"] - results["cum_nondef_rate"]).abs()
ks_value = results["KS"].max()
ks_index = results["KS"].idxmax()

print(f"KS Statistic = {ks_value:.3f} at probability {results.loc[ks_index, 'Pred_Prob_Class2']:.4f}")

# Plot KS curve
plt.figure(figsize=(8,6))
plt.plot(results.index, results["cum_def_rate"], label="Cumulative Defaulter Rate (TPR)", color="red")
plt.plot(results.index, results["cum_nondef_rate"], label="Cumulative Non-Defaulter Rate (FPR)", color="blue")

# Highlight KS point
plt.vlines(x=ks_index,
           ymin=results.loc[ks_index, "cum_nondef_rate"],
           ymax=results.loc[ks_index, "cum_def_rate"],
           colors="green", linestyles="--", label=f"KS = {ks_value:.3f}")

plt.xlabel("Applicants (sorted by predicted probability)")
plt.ylabel("Cumulative Rate")
plt.title("Kolmogorov–Smirnov (KS) Curve")
plt.legend(loc="lower right")
plt.grid(True)
plt.show()

Plot:

Image by the author

The maximum gap is 0.530, occurring at a predicted probability of 0.2928.
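As an optional cross-check (not part of the original calculation), the same maximum gap can be obtained with SciPy's two-sample KS test by comparing the predicted probabilities of the two groups. A minimal sketch, assuming the results DataFrame from the code above:

from scipy.stats import ks_2samp

# Score distributions of defaulters (2) and non-defaulters (1) on the test set
def_scores = results.loc[results["Actual"] == 2, "Pred_Prob_Class2"]
nondef_scores = results.loc[results["Actual"] == 1, "Pred_Prob_Class2"]

# The two-sample KS statistic is the maximum gap between the two empirical CDFs,
# which should match the manually computed value above (up to ties in the scores)
ks_stat, p_value = ks_2samp(def_scores, nondef_scores)
print(f"Two-sample KS = {ks_stat:.3f} (p-value = {p_value:.4f})")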


Now that we understand how to calculate the KS statistic, let's consider why this metric matters.

Here we built a classification model and evaluated it with the KS statistic, but there are also other metrics such as accuracy, ROC-AUC, and so on.

We already know that accuracy is defined at a single threshold and changes depending on the threshold chosen.

ROC-AUC gives us a single number that summarizes the model's discriminatory power across all thresholds.

So why is the KS statistic used in banks?

The KS statistic offers a single number representing the maximum gap between the cumulative distributions of defaulters and non-defaulters.

Let's go back to our sample data.

We have a KS statistic of 0.83 at a predicted probability of 0.29.

We have already discussed that each row acts as a threshold.

So, what happens at 0.29?

A threshold of 0.29 means that every applicant with a predicted probability greater than or equal to 0.29 is flagged as a defaulter.

At 0.29, the top 5 rows are flagged as defaulters. Among these five, four are actual defaulters and one is a non-defaulter who is wrongly flagged.

Here true positives = 4 and false positives = 1.

The remaining 5 rows are predicted as non-defaulters.

At this point, the model has captured all four defaulters and wrongly flagged one non-defaulter as a defaulter.

Here TPR equals 1 and FPR equals 0.17.

Therefore, KS statistic = 1-0.17 = 0.83.

If we keep lowering the threshold to the remaining probabilities, more non-defaulters get flagged as defaulters.

This reduces the gap between the two groups.

Here we can say that at a threshold of 0.29, the model would reject all of the defaulters along with 17% of the non-defaulters (according to the sample data), and approve the remaining 83% of the non-defaulters.


Do banks decide the lending threshold based on the KS statistic?

While the KS statistic shows the maximum gap between the two groups, banks do not set the threshold based on this value alone.

The KS statistic is used to validate the discriminatory power of the model, while the actual threshold is determined by risk appetite, profitability, and regulatory guidelines.

If KS is less than 20, the model is considered weak.
If KS is between 20 and 50, it is considered acceptable.
If KS is in the range 50-70, it is considered a good model.
(These bands are quoted on the percentage scale, i.e., KS × 100, so the KS of 0.530 computed above corresponds to 53.)
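As an illustrative sketch only, these rules of thumb could be wrapped in a small helper; the band boundaries are the ones quoted above, and real cut-offs vary by institution:

def ks_quality(ks_percent: float) -> str:
    """Rule-of-thumb interpretation of a KS value expressed as a percentage."""
    if ks_percent < 20:
        return "weak"
    elif ks_percent < 50:
        return "acceptable"
    elif ks_percent <= 70:
        return "good"
    return "above the typical range"

print(ks_quality(53.0))   # the KS of 0.530 from above -> "good"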


Dataset acknowledgement

The data used in this blog is the German Credit dataset, which is publicly available from the UCI Machine Learning Repository. It is provided under the Creative Commons Attribution 4.0 license (CC BY 4.0), which means it can be used and shared freely with proper attribution.


I hope this blog post has given you a basic understanding of the Kolmogorov–Smirnov statistic. If you enjoyed reading, consider sharing it with your network, and feel free to share your thoughts.

If you haven't read my blog on ROC-AUC yet, you can check it out here.

Thanks for reading!
