4-Dimensional Data Visualization: Time in Bubble Charts


Bubble charts pack a lot of information into a single visualization, with bubble size adding a third dimension. However, comparing “before” and “after” states is often difficult. To address this, we propose adding a transition between these states, which creates a more intuitive user experience.

Since we could not find a ready-made solution, we built our own. The challenge turned out to be interesting and required refreshing some mathematical concepts.

Without a doubt, the most difficult part is the transition between two circles: the before and after states. To simplify, we focus on solving a single case, which can then be extended in a loop to generate the required number of transitions.

To build such a figure, let's first decompose it into three parts: two circles and the polygon connecting them.

Base element decomposition, image by the author

Building the two circles is easy: we know their centers and radii. The remaining task is to construct a quadrilateral polygon of the following form:

Polygon, image by the author

Constructing this polygon reduces to finding the coordinates of its vertices. This is the most interesting task, and we will solve it next.

From polygon to tangent lines, image by the author

The distance from a point (x1, y1) to the line ax - y + b = 0 (that is, y = ax + b) is given by the formula:

Distance from point to line, image by the author

In our case, the distance (d) is equal to the circle radius (r). Therefore,

Distance equal to the radius, image by the author

After squaring both sides of the equation and multiplying by a ** 2 + 1, we get:

Base math, image by the author

After moving everything to one side and setting the expression equal to zero, we get:

Base math, image by the author
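
Since the formula images are not reproduced here, the steps above can be written out as follows. This is a reconstruction under the assumption that the tangent is written as y = ax + b and the circle has center (x1, y1) and radius r, which is the form the code later in the article uses:

```latex
\begin{aligned}
d &= \frac{\lvert a x_1 - y_1 + b \rvert}{\sqrt{a^2 + 1}} = r \\
(a x_1 + b - y_1)^2 &= r^2 \, (a^2 + 1) \\
(a x_1 + b - y_1)^2 - r^2 \, (a^2 + 1) &= 0
\end{aligned}
```

Squaring removes the absolute value, so the last line holds for tangents on either side of the circle.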

Since we have two circles and need to find a tangent common to both, we get the following system of equations:

System of equations, image by the author
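
Written out for circles with centers (x1, y1) and (x2, y2) and radii r1 and r2, the system presumably reads as follows. This is a reconstruction that matches the two expressions passed to the solver in the code further down, with the slope a and the intercept b as the unknowns:

```latex
\begin{cases}
(a x_1 + b - y_1)^2 - r_1^2 \, (a^2 + 1) = 0 \\
(a x_2 + b - y_2)^2 - r_2^2 \, (a^2 + 1) = 0
\end{cases}
```

Each equation is quadratic in a and b, which is why the system can have up to four real solutions.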

This works well, but the problem is that in reality there are 4 possible tangent lines:

All possible tangent lines, image by the author

And we need to choose just 2 of them: the external ones.

To do this, we need to check each tangent against each circle center and determine whether the line is above or below the point:

Check whether the line is above or below the point, image by the author

We need the two lines that pass either above both circle centers or below both.

Now, let's translate all these steps into code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sympy as sp
from scipy.spatial import ConvexHull
import math
from matplotlib import rcParams
import matplotlib.patches as patches

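def check_position_relative_to_line(a, b, x0, y0):
    # y-coordinate of the line y = a*x + b at the point's x
    y_line = a * x0 + b

    if y0 > y_line:
        return 1  # point is above the line
    elif y0 < y_line:
        return -1  # point is below the line
    return 0  # point lies on the line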

    
def find_tangent_equations(x1, y1, r1, x2, y2, r2):
    a, b = sp.symbols('a b')

    tangent_1 = (a*x1 + b - y1)**2 - r1**2 * (a**2 + 1)  
    tangent_2 = (a*x2 + b - y2)**2 - r2**2 * (a**2 + 1) 

    eqs_1 = [tangent_2, tangent_1]
    solution = sp.solve(eqs_1, (a, b))
    parameters = [(float(e[0]), float(e[1])) for e in solution]

    # filter just external tangents
    parameters_filtered = []
    for tangent in parameters:
        a = tangent[0]
        b = tangent[1]
        if abs(check_position_relative_to_line(a, b, x1, y1) + check_position_relative_to_line(a, b, x2, y2)) == 2:
            parameters_filtered.append(tangent)

    return parameters_filtered
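
To see the filter in action on a case that can be verified by hand, here is a minimal sketch using only the standard library. It assumes two unit circles centered at (0, 0) and (4, 0), whose four tangents are known in advance; `side_of_line` and `is_external` are hypothetical helper names that mirror the logic above rather than functions from the article.

```python
import math


def side_of_line(a, b, x0, y0):
    """Return 1 if (x0, y0) lies above y = a*x + b, -1 if below, 0 if on it."""
    y_line = a * x0 + b
    if y0 > y_line:
        return 1
    if y0 < y_line:
        return -1
    return 0


def is_external(a, b, c1, c2):
    """A tangent is external when both circle centers lie on the same side of it."""
    return abs(side_of_line(a, b, *c1) + side_of_line(a, b, *c2)) == 2


# Two unit circles centered at (0, 0) and (4, 0); their tangents are known by hand.
c1, c2 = (0.0, 0.0), (4.0, 0.0)
k = 1 / math.sqrt(3)
tangents = {
    "upper external": (0.0, 1.0),     # y = 1
    "lower external": (0.0, -1.0),    # y = -1
    "rising internal": (k, -2 * k),   # y = (x - 2) / sqrt(3)
    "falling internal": (-k, 2 * k),  # y = -(x - 2) / sqrt(3)
}
external = [name for name, (a, b) in tangents.items() if is_external(a, b, c1, c2)]
print(external)
```

The internal tangents pass between the circles, so the two centers end up on opposite sides and the signs cancel to zero.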

Now, we just need to find the intersections of the tangents with the circles. These 4 points will be the vertices of the desired polygon.

Circle equation:

Circle equation, image by the author

Substitute the line equation y = ax + b into the circle equation:

Base math, image by the author
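
For reference, the circle equation and the result of the substitution can be written as follows. The symbols x_c, y_c and r for the circle center and radius are our notation here (the code calls them circle_x, circle_y and circle_r), so treat this as a reconstruction rather than the original figure:

```latex
\begin{aligned}
(x - x_c)^2 + (y - y_c)^2 &= r^2 \\
(x - x_c)^2 + (a x + b - y_c)^2 - r^2 &= 0 \\
(a^2 + 1)\, x^2 + 2 \bigl( a (b - y_c) - x_c \bigr)\, x + x_c^2 + (b - y_c)^2 - r^2 &= 0
\end{aligned}
```

The last line is an ordinary quadratic in x, and for a tangent line its two roots coincide.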

The solution of this equation is the x-coordinate of the intersection.

Then, calculate y from the line equation:

Calculating y, image by the author

Here is how it translates into code:

def find_circle_line_intersection(circle_x, circle_y, circle_r, line_a, line_b):
    x, y = sp.symbols('x y')
    circle_eq = (x - circle_x)**2 + (y - circle_y)**2 - circle_r**2
    intersection_eq = circle_eq.subs(y, line_a * x + line_b)

    sol_x_raw = sp.solve(intersection_eq, x)[0]
    try:
        sol_x = float(sol_x_raw)
    except TypeError:
        # rounding can leave a tiny imaginary part in the root; keep the real part
        sol_x = float(sol_x_raw.as_real_imag()[0])
    sol_y = line_a * sol_x + line_b
    return sol_x, sol_y
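
Because a tangent touches the circle at exactly one point, the quadratic above has a double root, and rounding errors can push its discriminant slightly below zero, which is presumably why the function falls back to the real part. As an alternative sketch, not the approach used in this article, the touch point can be computed directly as the point on the line closest to the circle center. `tangent_point` is a hypothetical helper, and it assumes the line really is tangent to the circle.

```python
import math


def tangent_point(cx, cy, a, b):
    """Closest point on y = a*x + b to the center (cx, cy).

    For a tangent line this is the point where the line touches the circle,
    i.e. the double root -B / (2A) of the substituted quadratic.
    """
    x = (cx + a * (cy - b)) / (a ** 2 + 1)
    y = a * x + b
    return x, y


# Unit circle at the origin and the tangent y = (x - 2) / sqrt(3).
k = 1 / math.sqrt(3)
px, py = tangent_point(0.0, 0.0, k, -2 * k)
print(round(px, 6), round(py, 6))

# The touch point lies on the circle, so its distance to the center is the radius.
print(round(math.hypot(px, py), 6))
```

No symbolic solver is involved, so the result is always a pair of plain floats.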

Now we want to generate sample data to demonstrate the whole chart composition.

Imagine we have 4 users on our platform. We know how many purchases they made, the revenue they generated, and their activity on the platform. All these metrics are calculated for 2 periods (let's call them the pre and post periods).

# data generation
df = pd.DataFrame({'user': ['Emily', 'Emily', 'James', 'James', 'Tony', 'Tony', 'Olivia', 'Olivia'],
                   'period': ['pre', 'post', 'pre', 'post', 'pre', 'post', 'pre', 'post'],
                   'num_purchases': [10, 9, 3, 5, 2, 4, 8, 7],
                   'revenue': [70, 60, 80, 90, 20, 15, 80, 76],
                   'activity': [100, 80, 50, 90, 210, 170, 60, 55]})
Data sample, image by the author

Let's assume that “activity” is the area of a bubble. Now, let's convert it into the radius of the bubble. We will also scale the y-axis.

def area_to_radius(area):
    radius = math.sqrt(area / math.pi)
    return radius

x_alias, y_alias, a_alias="num_purchases", 'revenue', 'activity'

# scaling metrics
radius_scaler = 0.1
df['radius'] = df[a_alias].apply(area_to_radius) * radius_scaler
df['y_scaled'] = df[y_alias] / df[x_alias].max()
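
As a quick worked example of the area-to-radius conversion, the sketch below applies it to Emily's two activity values from the sample data, using the same `radius_scaler` as above. It only uses the standard library; the variable names other than `area_to_radius` and `radius_scaler` are ours.

```python
import math


def area_to_radius(area):
    """Radius of a circle whose area equals `area`."""
    return math.sqrt(area / math.pi)


radius_scaler = 0.1  # same scaling constant as in the article's code
activity_pre, activity_post = 100, 80  # Emily's activity in the sample data

r_pre = area_to_radius(activity_pre) * radius_scaler
r_post = area_to_radius(activity_post) * radius_scaler

# Areas stay proportional to the metric: the ratio of the drawn areas
# equals the ratio of the activity values.
area_ratio = (r_post / r_pre) ** 2
print(round(r_pre, 3), round(r_post, 3), round(area_ratio, 3))
```

Mapping the metric to the area rather than to the radius keeps the visual comparison honest: a 20% drop in activity shrinks the drawn area by 20%, while the radius shrinks by only about 11%.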

Now let's build the chart: 2 circles and the polygon.

def draw_polygon(plt, points):
    hull = ConvexHull(points)
    convex_points = [points[i] for i in hull.vertices]

    x, y = zip(*convex_points)
    x += (x[0],)
    y += (y[0],)

    plt.fill(x, y, color="#99d8e1", alpha=1, zorder=1)

# bubble pre
for _, row in df[df.period=='pre'].iterrows():
    x = row[x_alias]
    y = row.y_scaled
    r = row.radius
    circle = patches.Circle((x, y), r, facecolor="#99d8e1", edgecolor="none", linewidth=0, zorder=2)
    plt.gca().add_patch(circle)

# transition area
for user in df.user.unique():
    user_pre = df[(df.user==user) & (df.period=='pre')]
    x1, y1, r1 = user_pre[x_alias].values[0], user_pre.y_scaled.values[0], user_pre.radius.values[0]
    user_post = df[(df.user==user) & (df.period=='post')]
    x2, y2, r2 = user_post[x_alias].values[0], user_post.y_scaled.values[0], user_post.radius.values[0]

    tangent_equations = find_tangent_equations(x1, y1, r1, x2, y2, r2)
    circle_1_line_intersections = [find_circle_line_intersection(x1, y1, r1, eq[0], eq[1]) for eq in tangent_equations]
    circle_2_line_intersections = [find_circle_line_intersection(x2, y2, r2, eq[0], eq[1]) for eq in tangent_equations]

    polygon_points = circle_1_line_intersections + circle_2_line_intersections
    draw_polygon(plt, polygon_points)

# bubble post
for _, row in df[df.period=='post'].iterrows():
    x = row[x_alias]
    y = row.y_scaled
    r = row.radius
    label = row.user
    circle = patches.Circle((x, y), r, facecolor="#2d699f", edgecolor="none", linewidth=0, zorder=2)
    plt.gca().add_patch(circle)

    plt.text(x, y - r - 0.3, label, fontsize=12, ha="center")

The output looks as expected:

Output, image by the author
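
A note on `draw_polygon`: the four tangent points arrive grouped by circle, not in drawing order, and the function relies on `ConvexHull` to sort them. A minimal sketch of the same idea without SciPy is shown below, assuming the four points form a convex quadrilateral; `order_vertices` is a hypothetical helper and the coordinates are illustrative.

```python
import math


def order_vertices(points):
    """Sort 2D points counterclockwise around their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))


# Tangent points in the order the loop produces them: first both points on
# circle 1, then both points on circle 2 (values here are illustrative).
raw = [(0.0, 1.0), (0.0, -1.0), (4.0, 1.0), (4.0, -1.0)]
ordered = order_vertices(raw)
print(ordered)
```

Filling the points in their raw order would draw a self-intersecting bow-tie shape, while the sorted order traces the outline of the quadrilateral.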

Now we want to add some styling:

# plot parameters
plt.subplots(figsize=(10, 10))
rcParams['font.family'] = 'DejaVu Sans'
rcParams['font.size'] = 14
plt.grid(color="gray", linestyle=(0, (10, 10)), linewidth=0.5, alpha=0.6, zorder=1)
plt.axvline(x=0, color="white", linewidth=2)
plt.gca().set_facecolor('white')
plt.gcf().set_facecolor('white')

# spines formatting
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["bottom"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.gca().tick_params(axis="both", which="both", length=0)

# plot labels
plt.xlabel("Number purchases") 
plt.ylabel("Revenue, $")
plt.title("Product users performance", fontsize=18, color="black")

# axis limits
axis_lim = df[x_alias].max() * 1.2
plt.xlim(0, axis_lim)
plt.ylim(0, axis_lim)

Next, we add a pre-post legend in the lower right corner to give the viewer a hint on how to read the chart:

## pre-post legend 
# circle 1
legend_position, r1 = (11, 2.2), 0.3
x1, y1 = legend_position[0], legend_position[1]
circle = patches.Circle((x1, y1), r1, facecolor="#99d8e1", edgecolor="none", linewidth=0, zorder=2)
plt.gca().add_patch(circle)
plt.text(x1, y1 + r1 + 0.15, 'Pre', fontsize=12, ha="center", va="center")
# circle 2
x2, y2 = legend_position[0], legend_position[1] - r1*3
r2 = r1*0.7
circle = patches.Circle((x2, y2), r2, facecolor="#2d699f", edgecolor="none", linewidth=0, zorder=2)
plt.gca().add_patch(circle)
plt.text(x2, y2 - r2 - 0.15, 'Post', fontsize=12, ha="center", va="center")
# tangents
tangent_equations = find_tangent_equations(x1, y1, r1, x2, y2, r2)
circle_1_line_intersections = [find_circle_line_intersection(x1, y1, r1, eq[0], eq[1]) for eq in tangent_equations]
circle_2_line_intersections = [find_circle_line_intersection(x2, y2, r2, eq[0], eq[1]) for eq in tangent_equations]
polygon_points = circle_1_line_intersections + circle_2_line_intersections
draw_polygon(plt, polygon_points)
# small arrow
plt.annotate('', xytext=(x1, y1), xy=(x2, y1 - r1*2), arrowprops=dict(edgecolor="black", arrowstyle="->", lw=1))
Adding styling and legend, image by the author

And finally, the bubble size legend:

# bubble size legend
legend_areas_original = [150, 50]
legend_position = (11, 10.2)
for i in legend_areas_original:
    i_r = area_to_radius(i) * radius_scaler
    circle = plt.Circle((legend_position[0], legend_position[1] + i_r), i_r, color="black", fill=False, linewidth=0.6, facecolor="none")
    plt.gca().add_patch(circle)
    plt.text(legend_position[0], legend_position[1] + 2*i_r, str(i), fontsize=12, ha="center", va="center",
              bbox=dict(facecolor="white", edgecolor="none", boxstyle="round,pad=0.1"))
legend_label_r = area_to_radius(np.max(legend_areas_original)) * radius_scaler
plt.text(legend_position[0], legend_position[1] + 2*legend_label_r + 0.3, 'Activity, hours', fontsize=12, ha="center", va="center")

Our final chart looks like this:

Adding the second legend, image by the author

The visualization looks polished and packs quite a lot of information into a compact form.

Here is the full code for the graph:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sympy as sp
from scipy.spatial import ConvexHull
import math
from matplotlib import rcParams
import matplotlib.patches as patches

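def check_position_relative_to_line(a, b, x0, y0):
    # y-coordinate of the line y = a*x + b at the point's x
    y_line = a * x0 + b

    if y0 > y_line:
        return 1  # point is above the line
    elif y0 < y_line:
        return -1  # point is below the line
    return 0  # point lies on the line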

    
def find_tangent_equations(x1, y1, r1, x2, y2, r2):
    a, b = sp.symbols('a b')

    tangent_1 = (a*x1 + b - y1)**2 - r1**2 * (a**2 + 1)  
    tangent_2 = (a*x2 + b - y2)**2 - r2**2 * (a**2 + 1) 

    eqs_1 = [tangent_2, tangent_1]
    solution = sp.solve(eqs_1, (a, b))
    parameters = [(float(e[0]), float(e[1])) for e in solution]

    # filter just external tangents
    parameters_filtered = []
    for tangent in parameters:
        a = tangent[0]
        b = tangent[1]
        if abs(check_position_relative_to_line(a, b, x1, y1) + check_position_relative_to_line(a, b, x2, y2)) == 2:
            parameters_filtered.append(tangent)

    return parameters_filtered

def find_circle_line_intersection(circle_x, circle_y, circle_r, line_a, line_b):
    x, y = sp.symbols('x y')
    circle_eq = (x - circle_x)**2 + (y - circle_y)**2 - circle_r**2
    intersection_eq = circle_eq.subs(y, line_a * x + line_b)

    sol_x_raw = sp.solve(intersection_eq, x)[0]
    try:
        sol_x = float(sol_x_raw)
    except TypeError:
        # rounding can leave a tiny imaginary part in the root; keep the real part
        sol_x = float(sol_x_raw.as_real_imag()[0])
    sol_y = line_a * sol_x + line_b
    return sol_x, sol_y

def draw_polygon(plt, points):
    hull = ConvexHull(points)
    convex_points = [points[i] for i in hull.vertices]

    x, y = zip(*convex_points)
    x += (x[0],)
    y += (y[0],)

    plt.fill(x, y, color="#99d8e1", alpha=1, zorder=1)

def area_to_radius(area):
    radius = math.sqrt(area / math.pi)
    return radius

# data generation
df = pd.DataFrame({'user': ['Emily', 'Emily', 'James', 'James', 'Tony', 'Tony', 'Olivia', 'Olivia', 'Oliver', 'Oliver', 'Benjamin', 'Benjamin'],
                   'period': ['pre', 'post', 'pre', 'post', 'pre', 'post', 'pre', 'post', 'pre', 'post', 'pre', 'post'],
                   'num_purchases': [10, 9, 3, 5, 2, 4, 8, 7, 6, 7, 4, 6],
                   'revenue': [70, 60, 80, 90, 20, 15, 80, 76, 17, 19, 45, 55],
                   'activity': [100, 80, 50, 90, 210, 170, 60, 55, 30, 20, 200, 120]})

x_alias, y_alias, a_alias="num_purchases", 'revenue', 'activity'

# scaling metrics
radius_scaler = 0.1
df['radius'] = df[a_alias].apply(area_to_radius) * radius_scaler
df['y_scaled'] = df[y_alias] / df[x_alias].max()

# plot parameters
plt.subplots(figsize=(10, 10))
rcParams['font.family'] = 'DejaVu Sans'
rcParams['font.size'] = 14
plt.grid(color="gray", linestyle=(0, (10, 10)), linewidth=0.5, alpha=0.6, zorder=1)
plt.axvline(x=0, color="white", linewidth=2)
plt.gca().set_facecolor('white')
plt.gcf().set_facecolor('white')

# spines formatting
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["bottom"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.gca().tick_params(axis="both", which="both", length=0)

# plot labels
plt.xlabel("Number purchases") 
plt.ylabel("Revenue, $")
plt.title("Product users performance", fontsize=18, color="black")

# axis limits
axis_lim = df[x_alias].max() * 1.2
plt.xlim(0, axis_lim)
plt.ylim(0, axis_lim)

# bubble pre
for _, row in df[df.period=='pre'].iterrows():
    x = row[x_alias]
    y = row.y_scaled
    r = row.radius
    circle = patches.Circle((x, y), r, facecolor="#99d8e1", edgecolor="none", linewidth=0, zorder=2)
    plt.gca().add_patch(circle)

# transition area
for user in df.user.unique():
    user_pre = df[(df.user==user) & (df.period=='pre')]
    x1, y1, r1 = user_pre[x_alias].values[0], user_pre.y_scaled.values[0], user_pre.radius.values[0]
    user_post = df[(df.user==user) & (df.period=='post')]
    x2, y2, r2 = user_post[x_alias].values[0], user_post.y_scaled.values[0], user_post.radius.values[0]

    tangent_equations = find_tangent_equations(x1, y1, r1, x2, y2, r2)
    circle_1_line_intersections = [find_circle_line_intersection(x1, y1, r1, eq[0], eq[1]) for eq in tangent_equations]
    circle_2_line_intersections = [find_circle_line_intersection(x2, y2, r2, eq[0], eq[1]) for eq in tangent_equations]

    polygon_points = circle_1_line_intersections + circle_2_line_intersections
    draw_polygon(plt, polygon_points)

# bubble post
for _, row in df[df.period=='post'].iterrows():
    x = row[x_alias]
    y = row.y_scaled
    r = row.radius
    label = row.user
    circle = patches.Circle((x, y), r, facecolor="#2d699f", edgecolor="none", linewidth=0, zorder=2)
    plt.gca().add_patch(circle)

    plt.text(x, y - r - 0.3, label, fontsize=12, ha="center")

# bubble size legend
legend_areas_original = [150, 50]
legend_position = (11, 10.2)
for i in legend_areas_original:
    i_r = area_to_radius(i) * radius_scaler
    circle = plt.Circle((legend_position[0], legend_position[1] + i_r), i_r, color="black", fill=False, linewidth=0.6, facecolor="none")
    plt.gca().add_patch(circle)
    plt.text(legend_position[0], legend_position[1] + 2*i_r, str(i), fontsize=12, ha="center", va="center",
              bbox=dict(facecolor="white", edgecolor="none", boxstyle="round,pad=0.1"))
legend_label_r = area_to_radius(np.max(legend_areas_original)) * radius_scaler
plt.text(legend_position[0], legend_position[1] + 2*legend_label_r + 0.3, 'Activity, hours', fontsize=12, ha="center", va="center")


## pre-post legend 
# circle 1
legend_position, r1 = (11, 2.2), 0.3
x1, y1 = legend_position[0], legend_position[1]
circle = patches.Circle((x1, y1), r1, facecolor="#99d8e1", edgecolor="none", linewidth=0, zorder=2)
plt.gca().add_patch(circle)
plt.text(x1, y1 + r1 + 0.15, 'Pre', fontsize=12, ha="center", va="center")
# circle 2
x2, y2 = legend_position[0], legend_position[1] - r1*3
r2 = r1*0.7
circle = patches.Circle((x2, y2), r2, facecolor="#2d699f", edgecolor="none", linewidth=0, zorder=2)
plt.gca().add_patch(circle)
plt.text(x2, y2 - r2 - 0.15, 'Post', fontsize=12, ha="center", va="center")
# tangents
tangent_equations = find_tangent_equations(x1, y1, r1, x2, y2, r2)
circle_1_line_intersections = [find_circle_line_intersection(x1, y1, r1, eq[0], eq[1]) for eq in tangent_equations]
circle_2_line_intersections = [find_circle_line_intersection(x2, y2, r2, eq[0], eq[1]) for eq in tangent_equations]
polygon_points = circle_1_line_intersections + circle_2_line_intersections
draw_polygon(plt, polygon_points)
# small arrow
plt.annotate('', xytext=(x1, y1), xy=(x2, y1 - r1*2), arrowprops=dict(edgecolor="black", arrowstyle="->", lw=1))

# y axis formatting
max_y = df[y_alias].max()
nearest_power_of_10 = 10 ** math.ceil(math.log10(max_y))
ticks = [round(nearest_power_of_10/5 * i, 2) for i in range(0, 6)]
yticks_scaled = ticks / df[x_alias].max()
yticklabels = [str(i) for i in ticks]
yticklabels[0] = ''
plt.yticks(yticks_scaled, yticklabels)

plt.savefig("plot_with_white_background.png", bbox_inches="tight", dpi=300)
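
As an alternative to the symbolic solve in `find_tangent_equations`, the four polygon vertices can also be computed in closed form from the angle between the center line and the tangent normal. The sketch below is not the method used in this article; `external_tangent_points` is a hypothetical helper, and it assumes that neither circle lies inside the other.

```python
import math


def external_tangent_points(x1, y1, r1, x2, y2, r2):
    """Return the 4 touch points of the two external tangents of two circles.

    Assumes the distance between centers is greater than |r1 - r2|,
    i.e. neither circle lies inside the other.
    """
    d = math.hypot(x2 - x1, y2 - y1)
    theta = math.atan2(y2 - y1, x2 - x1)  # direction from center 1 to center 2
    phi = math.acos((r1 - r2) / d)        # angle between that direction and the tangent normal
    points = []
    for sign in (1, -1):
        nx = math.cos(theta + sign * phi)  # unit normal of the tangent line
        ny = math.sin(theta + sign * phi)
        points.append((x1 + r1 * nx, y1 + r1 * ny))  # touch point on circle 1
        points.append((x2 + r2 * nx, y2 + r2 * ny))  # touch point on circle 2
    return points


# Equal circles at (0, 0) and (4, 0): the external tangents are y = 1 and y = -1.
pts = external_tangent_points(0, 0, 1, 4, 0, 1)
print([(round(x, 6), round(y, 6)) for x, y in pts])
```

The returned list could be passed to `draw_polygon` directly. Since no slope is involved, this form also covers vertical tangents, which the y = ax + b representation cannot express.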

Adding a time dimension to bubble charts improves their ability to convey changes in the data. With smooth transitions between the “before” and “after” states, users can better understand trends and make comparisons.

Although no ready-made solutions were available, developing a custom approach proved both challenging and rewarding, requiring a detailed understanding of the underlying math and of the plotting techniques. The proposed method can easily be extended to various datasets, which makes it a valuable tool for data visualization, data science, and analytics.
