Survival Analysis When Nobody Dies: The Methodology

Survival analysis is the statistical method used to answer the question: “How long will something last?” That “something” can range from a patient's life to the continued operation of a machine part or the length of a user's subscription.

One of the most widely used tools in this area is the Kaplan-Meier estimator.

Born in the world of biology, Kaplan-Meier made its name estimating death rates. But like any true celebrity algorithm, it did not stay in its lane. These days, it shows up in business dashboards, marketing teams, and churn analyses everywhere.

But here is the catch: business is not biology. It is messy, unpredictable, and full of plot twists. That is why a few issues make our lives more difficult when we try to use survival analysis in the business world.

First, we are usually not just interested in whether a customer “survived” (whatever survival means in this context), but rather in how much economic value has survived.

Second, unlike in biology, customers can very well “die” and “resurrect” many times (think of when you unsubscribe from an online service and later resubscribe).

In this article, we will see how to extend the classic Kaplan-Meier approach to better suit our needs: modeling a continuous variable (economic value) instead of a binary one (life / death), and allowing “resurrections”.

A Refresher on the Kaplan-Meier Estimator

Let's pause and rewind for a second. Before we can customize Kaplan-Meier to suit our business needs, we need a quick refresher on how the classic version works.

Suppose you have 3 subjects (say, lab mice) and you give them a drug you need to test. The drug was administered at different moments in time: subject a received it in January, subject b in April, and subject c in May.

Then we measure how long they survive. Subject a died after 6 months, subject c after 4 months, and subject b is still alive at the time of the analysis (November).

Graphically, we can represent the 3 subjects like this:

[Image by Author]

Now, even if we wanted to measure a simple metric such as average survival, we would face a problem. In fact, we do not know how long subject b will survive, since it is still alive today.

This is a classic problem in statistics, and it is called “right censoring”.

Right censoring is stats-speak for “we do not know what happens after a certain point”, and it is a big deal in survival analysis. So big that it led to the development of one of the most iconic estimators in statistics: the Kaplan-Meier estimator, named after the duo who introduced it back in the 1950s.

So, how does Kaplan-Meier handle our problem?

First, we align the clocks. Even though our mice were treated at different times, what matters is the time since treatment. So we reset the x-axis to zero for everyone: day zero is the day they received the drug.
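The clock alignment above can be sketched in a few lines. This is only an illustration: it assumes months are encoded as integers (January = 1) and that the analysis happens in November, and the variable names are made up for this example.

```python
# Calendar month in which each subject was treated (January = 1).
treated_in = {"a": 1, "b": 4, "c": 5}
# Months survived after treatment; None means still alive at analysis time.
died_after = {"a": 6, "b": None, "c": 4}
analysis_month = 11  # November

durations = {}
event_observed = {}
for subject, start in treated_in.items():
    if died_after[subject] is None:
        # Right-censored: we only know the subject survived until the analysis.
        durations[subject] = analysis_month - start
        event_observed[subject] = 0
    else:
        durations[subject] = died_after[subject]
        event_observed[subject] = 1

print(durations)       # {'a': 6, 'b': 7, 'c': 4}
print(event_observed)  # {'a': 1, 'b': 0, 'c': 1}
```

Once every subject is expressed as a duration plus an "event observed" flag, the calendar dates no longer matter.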

[Image by Author]

Now that everyone is on the same timeline, we want to build something useful: an aggregated survival curve. This curve tells us the probability that a typical mouse in our group will survive at least x months after treatment.

Let's follow the logic together.

  • Up to time 3? Everyone is still alive. So survival = 100%. Easy.
  • At time 4, mouse c dies. This means that, out of the 3 mice, only 2 survived past time 4. That gives us a survival rate of 67% at time 4.
  • At time 6, mouse a dies. Of the 2 mice that had made it to time 6, only 1 survived, so the survival rate from time 5 to 6 is 50%. Multiply it by the previous 67%, and we get 33% survival up to time 6.
  • After time 7 we have no subjects left under observation, so the curve has to stop there.
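The same walkthrough can be written compactly. In the standard notation for this estimator (which the rest of this article does not use), let n_i be the number of subjects still at risk just before time t_i and d_i the number of deaths at t_i:

```latex
\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)
\qquad
\hat{S}(4) = 1 - \tfrac{1}{3} = \tfrac{2}{3} \approx 67\%,
\qquad
\hat{S}(6) = \tfrac{2}{3}\left(1 - \tfrac{1}{2}\right) = \tfrac{1}{3} \approx 33\%
```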

Let's plot these results:

[Image by Author]

Since code is often easier to understand than words, let's translate this into Python. We have the following variables:

  • kaplan_meier: an array containing the Kaplan-Meier estimates for each point in time, i.e. the probability of survival up to time t.
  • obs_t: a Boolean array telling us whether each subject is observed (i.e. not censored) at time t.
  • surv_t: a Boolean array telling us whether each subject is alive at time t.
  • surv_t_minus_1: a Boolean array telling us whether each subject is alive at time t-1.

All we have to do is take all the subjects observed at t, compute their survival rate from t-1 to t (survival_rate_t), and multiply it by the survival rate up to time t-1 (kaplan_meier[t-1]) to obtain the survival rate up to time t (kaplan_meier[t]). In other words,

survival_rate_t = surv_t[obs_t].sum() / surv_t_minus_1[obs_t].sum()

kaplan_meier[t] = kaplan_meier[t-1] * survival_rate_t

where, of course, the starting point is kaplan_meier[0] = 1.
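To see the recurrence run end to end, here is a minimal sketch for the three mice. How obs_t and surv_t are built is an assumption of this sketch: a subject that died stays "observed" (we know it is dead), while a censored subject is observed only up to its last follow-up month.

```python
import numpy as np

# Months since treatment and whether the death was observed, for a, b, c.
durations = np.array([6, 7, 4])
died = np.array([True, False, True])

def alive_at(t):
    # A subject that died at month d is alive for t < d;
    # a censored subject is known to be alive through its last month.
    return np.where(died, durations > t, durations >= t)

horizon = durations.max()  # the curve stops when observation ends
kaplan_meier = np.ones(horizon + 1)  # kaplan_meier[0] = 1

for t in range(1, horizon + 1):
    obs_t = died | (durations >= t)  # not censored at time t
    surv_t = alive_at(t)
    surv_t_minus_1 = alive_at(t - 1)
    survival_rate_t = surv_t[obs_t].sum() / surv_t_minus_1[obs_t].sum()
    kaplan_meier[t] = kaplan_meier[t - 1] * survival_rate_t

print(kaplan_meier.round(2))  # [1.   1.   1.   1.   0.67 0.67 0.33 0.33]
```

The loop stops at month 7 because beyond that point nobody who was alive at t-1 is still observed, and the ratio would become 0/0.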

If you do not want to code this from scratch, the Kaplan-Meier algorithm is available in the Python library lifelines, and can be used as follows:

from lifelines import KaplanMeierFitter

KaplanMeierFitter().fit(
    durations=[6,7,4],
    event_observed=[1,0,1],
).survival_function_["KM_estimate"]

If you run this code, you will get the same result we obtained manually with the previous snippet.

So far, we have been hanging out in the land of mice, drugs, and mortality. Not exactly your average quarterly KPI review, right? So, how is this useful in business?

Moving to a Business Setting

So far, we have treated “death” as if it were obvious. In Kaplan-Meier land, someone either lives or dies, and we can easily log the time of death. But now let's stir in some real-world business messiness.

What even is “death” in a business context?

It turns out that this question is not easy to answer, for at least a couple of reasons:

  1. “Death” is not easy to define. Suppose you work at an e-commerce company. You want to know when a user has “died”. Should you consider them dead when they delete their account? That is easy to track … but too rare to be useful. What if they just start buying less? But how much less counts as dead? A week of silence? A month? Two? You see the problem. The definition of “death” is arbitrary, and depending on where you draw the line, your analysis can tell very different stories.
  2. “Death” is not permanent. Kaplan-Meier was conceived for biological applications, where a dead individual does not come back. But in business applications, resurrection is not only possible but frequent. Consider a streaming service that people pay for with a monthly subscription. It is easy to define “death” here: it is when users cancel their subscription. However, it is quite common that, some time after cancelling, they subscribe again.

So how does all this play out in the data?

Let's walk through a toy example. Say we have a user on our e-commerce platform. Over the past 10 months, here is how much they have spent:

[Image by Author]

To squeeze this into the Kaplan-Meier framework, we need to translate the user's spending behavior into a life-or-death decision.

So we make a rule: if a user stops spending for 2 months in a row, we declare them “inactive”.

Graphically, this rule looks as follows:

[Image by Author]

Since the user spent $0 for two months in a row (months 4 and 5), we will consider this user inactive from month 4 onward. And we will do that even though the user started spending again in month 7. This is because, in Kaplan-Meier, resurrections are assumed to be impossible.
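The rule above can be sketched as a small helper. The function name, the `window` parameter, and the spending figures are all made up for illustration (the real numbers live in the chart); months are counted from 0.

```python
def first_inactive_month(spend, window=2):
    """Return the first month of the first run of `window` zero-spend
    months, or None if the user never goes quiet for that long."""
    run = 0
    for month, amount in enumerate(spend):
        run = run + 1 if amount == 0 else 0
        if run == window:
            return month - window + 1
    return None

# Hypothetical spend for months 0..9: silent in months 4-6, back in month 7.
spend = [30, 25, 20, 10, 0, 0, 0, 15, 20, 10]

print(first_inactive_month(spend))  # 4
```

Note that the helper stops at the first qualifying run, so everything the user spends from month 7 onward never influences the result.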

Now let's add two more users to our example. Since we have settled on a rule for turning their value curves into survival curves, we can also compute the Kaplan-Meier curve:

[Image by Author]

By now, you have probably noticed how much nuance (and data) we have thrown away just to make this work. User a came back from the dead, but we ignored that. User c's spending changed markedly over time, but Kaplan-Meier does not care, because all it sees is 1s and 0s. We forced a continuous value (spending) into a binary box (alive / dead), and along the way, we lost a lot of information.

So the question is: can we extend Kaplan-Meier in a way that:

  • keeps the original, continuous data intact,
  • avoids arbitrary binary cutoffs,
  • allows for resurrections?

Yes, we can. In the next section, I will show you how.

Introducing the “Value Kaplan-Meier”

Let's start from the simple Kaplan-Meier formula we have seen before.

# kaplan_meier: array containing the Kaplan-Meier estimates,
#               i.e. the probability of survival up to time t
# obs_t: array, whether a subject has been observed at time t
# surv_t: array, whether a subject was alive at time t
# surv_t_minus_1: array, whether a subject was alive at time t−1

survival_rate_t = surv_t[obs_t].sum() / surv_t_minus_1[obs_t].sum()

kaplan_meier[t] = kaplan_meier[t-1] * survival_rate_t

The first change we need to make is to replace surv_t and surv_t_minus_1, the Boolean arrays that tell us whether a subject is alive (1) or dead (0), with arrays that tell us the (economic) value of each subject at a given time. For this purpose, we can use two arrays named val_t and val_t_minus_1.

But this is not enough. Since we are dealing with a continuous value, every user is on a different scale, so, assuming we want to weigh them equally, we need to rescale them by some individual value. But what value should we use? The most reasonable choice is their initial value at time 0, before they were influenced by whatever treatment we are applying.

So we also need another vector, named val_t_0, that represents the value of each individual at time 0.
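A quick illustration of why the rescaling matters. The two users and their spend figures below are hypothetical, chosen only to sit on very different scales.

```python
import numpy as np

# Hypothetical spend for a big spender and a small spender.
val_t_0 = np.array([1000.0, 10.0])  # value at time 0
val_t = np.array([500.0, 8.0])      # value at time t

# Without rescaling, the big spender dominates the aggregate.
raw_share = val_t.sum() / val_t_0.sum()

# After dividing by the time-0 value, each user counts equally.
rescaled = val_t / val_t_0

print(round(raw_share, 3))  # 0.503
print(rescaled)             # [0.5 0.8]
print(rescaled.mean())      # 0.65
```

The unscaled ratio is almost entirely driven by the first user, whereas the rescaled values treat "kept 50%" and "kept 80%" as two equally weighted observations.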

# value_kaplan_meier: array containing the Value Kaplan-Meier estimates
# obs_t: array, whether a subject has been observed at time t
# val_t_0: array, user value at time 0
# val_t: array, user value at time t
# val_t_minus_1: array, user value at time t−1

value_rate_t = (
    (val_t[obs_t] / val_t_0[obs_t]).sum()
    / (val_t_minus_1[obs_t] / val_t_0[obs_t]).sum()
)

value_kaplan_meier[t] = value_kaplan_meier[t-1] * value_rate_t

What we have built is a direct generalization of Kaplan-Meier. In fact, if you set val_t = surv_t, val_t_minus_1 = surv_t_minus_1, and val_t_0 to an array of 1s, this formula collapses neatly back to our original survival estimator. So yes, it is legit.
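The collapse back to Kaplan-Meier can be checked numerically. The sketch below wraps the formula in a helper; the function name, the matrix layout (subjects by time, NaN where a subject is no longer observed), and the spending figures are assumptions of this example, and it presumes the denominator never reaches zero.

```python
import numpy as np

def value_kaplan_meier(values):
    """values: 2-D array (subjects x time), NaN where not observed.
    Column 0 holds each subject's value at time 0."""
    values = np.asarray(values, dtype=float)
    val_t_0 = values[:, 0]
    curve = np.ones(values.shape[1])
    for t in range(1, values.shape[1]):
        obs_t = ~np.isnan(values[:, t])  # observed at time t
        num = (values[obs_t, t] / val_t_0[obs_t]).sum()
        den = (values[obs_t, t - 1] / val_t_0[obs_t]).sum()
        curve[t] = curve[t - 1] * num / den
    return curve

# Binary input: the alive (1) / dead (0) status of the three mice, months 0..7.
mice = [
    [1, 1, 1, 1, 1, 1, 0, 0],  # a dies at month 6
    [1, 1, 1, 1, 1, 1, 1, 1],  # b still alive at month 7
    [1, 1, 1, 1, 0, 0, 0, 0],  # c dies at month 4
]
print(value_kaplan_meier(mice).round(2))  # [1.   1.   1.   1.   0.67 0.67 0.33 0.33]

# Hypothetical spend with a "resurrection": the first user goes silent, then returns.
spend = [
    [100, 80, 0, 0, 50],
    [20, 20, 10, 10, 10],
]
value_curve = value_kaplan_meier(spend)
```

With binary input the result matches the manual walkthrough for the mice. With the spending input, the curve rises again in the last period, which a classic survival curve cannot do.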

And here is the curve we obtain when we apply it to our three users.

[Image by Author]

Let's call this new version the Value Kaplan-Meier estimator. In fact, it answers the question:

What percentage of value is still surviving, on average, after x time?

We have the theory. But does it work in the wild?

Using Value Kaplan-Meier in Practice

If you take the Value Kaplan-Meier estimator for a spin on real-world data and compare it with the good old Kaplan-Meier curve, you will likely notice something comforting: they often have a similar shape. That is a good sign. It means we have not broken anything fundamental while upgrading from binary to continuous.

But here is where things get interesting: Value Kaplan-Meier usually sits a bit above its traditional cousin. Why? Because in this new world, users are allowed to “resurrect”. Kaplan-Meier, being the more rigid of the two, would have written them off the moment they went quiet.

So how do we use this?

Suppose you are running an experiment. At time zero, you start a new treatment on a group of users. Whatever the treatment is, you can track how much value “survives” in both the treatment and control groups over time.

And this is what your output will probably look like:

[Image by Author]

Conclusion

Kaplan-Meier is a widely used and intuitive method for estimating survival functions, especially when the outcome is a binary event such as death or failure. However, many real-world business scenarios are more complex: resurrections are possible, and outcomes are better represented by continuous values than by a binary state.

In such cases, Value Kaplan-Meier offers a natural extension. By incorporating the economic value of individuals over time, it enables a more nuanced understanding of value retention and decay. This method preserves the simplicity and interpretability of the original Kaplan-Meier estimator while adapting it to better reflect the dynamics of customer behavior.

Value Kaplan-Meier tends to provide a higher estimate of retained value than Kaplan-Meier, because it accounts for resurrections. This makes it particularly useful for evaluating experiments or tracking customer value over time.
