Machine Learning

Solving the 3Blue1Brown String Probability Problem (Without AI)

a stupid chance problem solving problem in my free time when I could be scrolling through doom? Because I'm trying to stay sharp in this unique moment where we can give more of our critical thinking to productive AI. If you read TDS articles, you and I probably share that goal.

This article will go over solving a fun probability puzzle released by one of my favorite YouTubers (3Blue1Brown) recently. However, if you are not familiar with his channel, you need to check him out. He focuses on visualization and intuitive interpretation that will make you wonder why math is taught in any other way.

Setup problem

The brief I linked below will give you a much better introduction, but I'll look at the brief set here as an appendix.

Imagine we have a box with many wires in it. We randomly pick the end of one string and randomly pick the end of the other string. Then we tie the ends together. One of two things can happen: (1) the ends come from different ropes and we now have one, long rope or (2) the second end comes from the same rope we picked at the beginning and joining them together makes a loop.

When we join two separate wires together, we return the long wire to the box. When we make a loop, we remove it from the box. This random string selection process continues until there are no strings left in the box.

The problematic question is, how many loops do we expect this process to create? Or, in less precise, but more realistic terms – If we repeat this process many times, what is the average number of loops that can be created?

Important observations about the problem

A good understanding of the problem is always the key to finding a good solution. Apart from understanding the mechanics presented in the last section, there is an important observation that we will need to understand.

Observation #1

Each random drawing round has two random parts. First and second random drawing. The first drawing is not very important. The second drawing, because you decide whether we make a loop or a long thread.

Observation #2

Each round results in one fewer string in the box no matter what. When a loop is created, the thread that created the loop is removed. If a long cord is made, the two cords become one, reducing the cord count in the box by 1.

Photo by the author

Observation #3

The draw number is not random variable. Each round removes the string from the box regardless of the result (observation #2). Each cycle has two diagrams, therefore, the number of cycles will be equal to the number of wires. For example, if we have 10 strings, we have 20 random draws in 10 rounds. Note that the last 'cycle' has one string left and always leads to a loop.

Observation #4

This view builds on the other three and is very important. From the point of view of counting loops, each round of the random draw is independent of the previous rounds. This means that we can break down the problem into individual rounds rather than considering the entire sequence of rounds together.

Note that if we were interested in metrics like the expected round of a circle, this observation would not be true. The length of the cords (and therefore the circumference of the loops) there is depends on previous rounds.

Ok, now that this is noted down, let's get into how to solve the problem!

A “brute force” solution.

Almost all problems like this (and real problems) have a brute force solution. The method is similar to digging a swimming pool hole by hand.

For this problem, we can create a probability tree and manually calculate the expected number of loops. It seems to pass here.

Image Produced by Dall-E using the author's handwritten image as input

This is a tough, but good solution for small numbers of wires. In the video, Grant specifically calls 50 strings as the number to solve (that would require 2 trees50 eyebrow!). He did that to get his audience out of the way of brute force into intelligent solutions.

Let's see if we can come up with a smarter way.

Divide and conquer solution

By really thinking about the characteristics of a random process, we realized that each cycle of the random draw is independent of the others (observation #4). Thanks to this structure, we can calculate the expected value of a single draw and see if we can find a way to combine many single draws to solve the problem.

Expected number of loops from one drawing

We've already seen that the first random draw isn't very important (observation #1) – it's really about the second draw.

Let's go through a simple 4-wire problem. We do our first random draw to get the first ending (it doesn't really matter which one we choose). Our second random draw can be any of the ends in the box, outside the ending we chose for the first painting.

Imagine we have 4 wires in the box, which gives us 8 ends. After choosing the first ending, we cannot choose it again, so we have 7 endings to choose from. One of the 7 will lead to a loop, the other limits will not create a loop. The picture below shows the setup more clearly than talking numbers.

Photo by the author

Therefore, the probability of looping is 1/7 and the probability of not looping is 6/7 – this yields 1/7 expected loops (1*1/7 + 0*6/7).

Let's combine this into a formula using the string number as input. If S number of wires – number of ends 2S (two ends per thread). After the first choice, we can choose from 2S-1 end, only one of those ends exits the loop. Therefore, the formula for expected loops is 1/(2S-1).

expected loops given by S strings – author's photo

Combining multiple diagrams to solve the full problem

Now that we have created a formula to calculate the expected number of loops for one cycle, let's work on how to combine multiple cycles. Because of observation #4 (independence of loops counting cycles) and observation #2 (fixed number of cycles) we can simply add up the expected loops in each cycle. Of course, we have to update the number of strings for each cycle – we can do this using the join function.

A formula for expected loops with a box S cables

Now that we have the formula, completing the challenge is as little as 50 connections S which gives us 2.94 loops – job done!

Monte Carlo solution

Because this problem has a closed-form solution, we could stop our discussion with the last section. However, it is useful to talk about how we can solve this problem with Monte Carlo simulation. Although not necessary for simple problems, Monte Carlo can help us when we add complex wrinkles.

Monte Carlo simulation estimates values ​​using repeated random processes. In our problem, we simulate the random drawing process many times, and we take the average of the number of loops calculated in all simulations.

As the number of simulations increases – due to the law of large numbers – the Monte Carlo number converges to the true expected value. Here's a link to the full code – I'll include the loop that creates the actual simulation below.

from monte_carlo_funcs import create_strings, select_ends, tie_ends

# Run the Monte Carlo
list_of_circles = []
num_strings = 50
num_simulations = 10000

if __name__ == "__main__":
  for _ in range(0, num_simulations):
      
      # create the simulated starting box of strings
      strings = create_strings(num_strings)
  
      # start circle counter for this simulation
      circle_counter = 0
  
      # draw from the box until there are no more strings left
      while len(strings) > 0:
          end_1, end_2, strings = select_ends(strings)
          strings, circle_bool = tie_ends(strings, end_1, end_2)
          circle_counter += circle_bool
          
      # add the number of circles that counts number of circles for each round
      list_of_circles.append(circle_counter)
  
  print(np.mean(list_of_circles))

When I ran this (it will change slightly each time) I got 2.95 – very close to the correct 2.94. This highlights something important – the Monte Carlo process is a good way to get an estimate, but flexibility comes at the cost of accuracy.

To complicate the problem

Let's take a moment to highlight where the Monte Carlo method shines by making the problem more difficult. What if we change the problem from calculating the expected number of loops to the expected one average cycle of loops. This problem is very complicated because it depends on the interdependence between each cycle of the random draw.

I could not find a closed solution for the problem (another one may exist). If we can't find solutions to closed problems – which will be a lot of time – Monte Carlo can save the day! We can easily modify the Monte Carlo code to keep track of the length of each string and use that length to calculate the circumference of the loop when it is created. Using Monte Carlo reduces this equation from a difficult mathematical problem to a simple coding problem.

The main takeaway is that when creating a closed form solution is difficult or impossible, Monte Carlo can be a simple way to solve the problem.

The conclusion

The ability to deeply understand a problem and thoughtfully create a solution has always been a distinguishing data science skill – and our recent reliance on generative AI – is becoming an even rarer commodity. I found this puzzle to be a fun exercise to hone those skills.

Although you will never need to calculate the expected number of loops in a wiring box like a data scientist (or at least you are unlikely to), you will often encounter situations where the path to a solution is not immediately obvious. Deeply understanding a problem, breaking it down into smaller parts, and carefully developing a customized solution are skills that transfer directly to real-world data science and analytics work.

Source link

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button