Machine Learning

This Puzzle Shows How Much LLMs Have Improved in Just Over a Year

LLM capabilities have improved greatly over the past few years, but it is often difficult to measure just how good they have become.

That got me thinking about a geometry problem I came across on a YouTube channel last year. This was in June 2024, and I tried to get a leading large language model of the time (GPT-4o) to solve the puzzle. It didn't go well and took a lot of effort to reach a solution, so I wondered how the latest LLMs would fare with the same puzzle.

Puzzle

Here is a quick reminder of what I asked the LLM to solve back then. Consider the following grid of dots/nodes. In both the x and y directions, each dot is exactly one unit from its nearest neighbor. It looks like this,

Now, the question I wanted answered was,

How many distinct squares can be drawn on this diagram?
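In case the diagram does not display, here is a minimal sketch that prints the grid as text. It assumes the dot coordinates match the ones used in the Sonnet 4.5 program later in this article; the names `POINTS` and `render_grid` are purely illustrative.

```python
# Assumed layout: a plus-shaped grid, taken from the coordinates in the
# Sonnet 4.5 program shown later in this article.
POINTS = {
    (x, y)
    for x in range(1, 7)
    for y in range(1, 7)
    if x in (3, 4) or y in (3, 4)
}


def render_grid(points):
    """Return the grid as text, top row first, with 'o' marking a dot."""
    rows = []
    for y in range(6, 0, -1):
        row = " ".join("o" if (x, y) in points else "." for x in range(1, 7))
        rows.append(row)
    return "\n".join(rows)


print(render_grid(POINTS))
# . . o o . .
# . . o o . .
# o o o o o o
# o o o o o o
# . . o o . .
# . . o o . .
```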

It soon became clear that GPT-4o did not know the answer, so I changed tack slightly and asked it this instead.

I would like a Python program that plots out all the squares we can 
draw on the attached diagram, assuming that the corners of any square 
must lie on one of the spots on the diagram. Assume each adjacent spot is 
1 unit apart in both the x and y directions. Also print out a summary of 
the number of squares of the same size and what their side lengths are

Long story short, I eventually got GPT-4o to come up with a correct Python solution. Still, it took about two hours and more than 40 prompts of going back and forth with the model to refine its response until something worked.

NB, have you worked out the answer yet? Even now, I can still hardly believe that there are 21 squares to be found in this graph.
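For readers who want to check that figure themselves, here is a minimal sketch of one way to count the squares. It is not the code GPT-4o or Sonnet produced. It assumes the dot coordinates used in the Sonnet 4.5 program later in this article, and `count_squares` is a hypothetical helper name. The idea is to treat every pair of dots as a possible side, rotate that side by 90 degrees, and check whether the other two corners are also dots.

```python
from itertools import combinations

# Assumed layout: the plus-shaped grid from the Sonnet 4.5 program below.
POINTS = {
    (x, y)
    for x in range(1, 7)
    for y in range(1, 7)
    if x in (3, 4) or y in (3, 4)
}


def count_squares(points):
    """Count distinct squares whose four corners all lie in `points`."""
    found = set()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        dx, dy = x2 - x1, y2 - y1
        # Rotate the side vector by 90 degrees in both directions to get
        # the two candidate squares that have this pair as one side.
        for rx, ry in ((-dy, dx), (dy, -dx)):
            p3 = (x2 + rx, y2 + ry)
            p4 = (x1 + rx, y1 + ry)
            if p3 in points and p4 in points:
                # A frozenset of corners removes duplicates found via other sides.
                found.add(frozenset(((x1, y1), (x2, y2), p3, p4)))
    return len(found)


print(count_squares(POINTS))  # 21
```

Because everything stays in integer arithmetic, this approach needs no floating-point tolerance, and tilted squares are picked up as naturally as upright ones.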

Collaboration with Sonnet 4.5

This brings us to the present day and Anthropic's recent release of what is widely considered the best coding model available today, Claude Sonnet 4.5. It has been 16 months since I last looked at this, so how much have LLMs improved in the interim? Let's see how Sonnet 4.5 copes with the puzzle.

To use Sonnet 4.5, I visited the Anthropic homepage.

You should see a page like this.

Image from Anthropic website

Click the large dark “Try Claude” button near the top of the screen. You will be asked to sign in at this point. If you do not already have an account with Anthropic, please set one up now. It is free to do so.

Once you have logged in, you will be taken to a screen displaying a chat window that lets you interact with Sonnet 4.5 and ask it questions.

Before I asked the model for a Python program to solve the puzzle, I asked it the question directly to see if it already knew the answer.

It came back with this,

Looking at this grid, I need to count all possible squares of different 
sizes that can be formed by connecting the dots.
Let me first identify the grid structure. The dots form:

2 dots in the top row
2 dots in the second row
6 dots in the third row
6 dots in the fourth row
2 dots in the fifth row
2 dots in the sixth row

This appears to be a cross or plus-sign shaped grid pattern.
Now I'll count squares by size:
1×1 squares: These are formed by any 4 dots that make a unit square.

In the horizontal middle section (rows 3-4): 5 squares per row × 1 = 5 squares
In the vertical middle section (columns where the cross intersects): counting 
the vertical segments gives us 5 more squares
Total: 10 small squares

2×2 squares: These need 4×4 arrangements of dots.

In the horizontal section: 4 squares
In the vertical section: 4 squares
Overlap in center: -1 (counted twice)
Total: 7 squares of size 2×2

3×3 squares and larger: The grid pattern is only 6 dots wide at its maximum, but the cross shape limits where larger squares can fit. Checking the geometry:

One 3×3 square can fit in the center where the arms of the cross overlap
Total: 1 square of size 3×3

Total: 10 + 7 + 1 = 18 squares

So, confidently incorrect. Puzzled by the model's reference to a 3×3 square, I asked it to show me that square on the diagram. It answered with this.

Looking at the grid more carefully, I need to reconsider whether a 3×3 square 
actually exists.
For a 3×3 square to exist, I would need a 4×4 arrangement of dots 
(since a 3×3 square requires 4 dots on each side). Let me check the grid 
structure:
The grid appears to be:

Rows 1-2: only 2 dots each (centered)
Rows 3-4: 6 dots each (full width)
Rows 5-6: only 2 dots each (centered)

For a 3×3 square, I would need 4 consecutive rows and 4 consecutive columns all with dots at the intersection points. However, looking at the cross/plus pattern:

The top only has 2 columns of dots
The middle has 6 columns of dots
The bottom only has 2 columns of dots

Since rows 1-2 and rows 5-6 only have 2 dots each (not spanning the full width),
there isn't actually a 4×4 block of dots anywhere on this grid.
I need to correct my answer: There is NO 3×3 square possible on this grid.
The total should be: 10 (1×1 squares) + 7 (2×2 squares) = 17 squares
I apologize for the error in my initial count. The cross-shaped pattern 
prevents any square larger than 2×2 from being formed.

All right, it corrected that part of its answer, but fundamentally it still did not realize that there are 21 squares in total.
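To see where the hand counts go astray, here is a minimal sketch that counts only the upright (axis-aligned) squares, which is the only kind the model considered. It assumes the dot coordinates used in the Sonnet 4.5 program later in this article; `axis_aligned_squares` is a hypothetical helper name.

```python
from collections import Counter

# Assumed layout: the plus-shaped grid from the Sonnet 4.5 program below.
POINTS = {
    (x, y)
    for x in range(1, 7)
    for y in range(1, 7)
    if x in (3, 4) or y in (3, 4)
}


def axis_aligned_squares(points, max_side=5):
    """Count squares with horizontal and vertical sides, keyed by side length."""
    counts = Counter()
    for (x, y) in points:
        # Treat (x, y) as the bottom-left corner so each square is counted once.
        for side in range(1, max_side + 1):
            corners = {(x + side, y), (x, y + side), (x + side, y + side)}
            if corners <= points:
                counts[side] += 1
    return dict(counts)


print(axis_aligned_squares(POINTS))  # {1: 9}
```

Under that assumption there are only nine upright squares, all 1×1, and no 2×2 or 3×3 squares at all. The other twelve of the 21 are tilted, which neither of the model's hand counts took into account.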

The final answer

To keep the comparison between the two models, 16 months apart, fair, I asked Sonnet 4.5 exactly the same question I had put to GPT-4o, which was this.

I would like a Python program that plots out all the squares we can draw on 
the attached diagram, assuming that the corners of any square must lie on one 
of the spots on the diagram. Assume each adjacent spot is 1 unit apart in 
both the x and y directions. Also print out a summary of the number of 
squares of the same size and what their side lengths are

Sonnet 4.5 thought for all of 5 seconds before producing its Python code.

import matplotlib.pyplot as plt
import numpy as np
from itertools import combinations
from collections import defaultdict

# Define the grid points based on the image
points = [
    # Top two rows (2 dots each, centered)
    (3, 6), (4, 6),
    (3, 5), (4, 5),
    # Middle two rows (6 dots each, full width)
    (1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4),
    (1, 3), (2, 3), (3, 3), (4, 3), (5, 3), (6, 3),
    # Bottom two rows (2 dots each, centered)
    (3, 2), (4, 2),
    (3, 1), (4, 1),
]

def is_square(p1, p2, p3, p4):
    """Check if four points form a square and return side length if true."""
    points = [p1, p2, p3, p4]
    
    # Calculate all pairwise distances
    distances = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = np.sqrt((points[i][0] - points[j][0])**2 + 
                          (points[i][1] - points[j][1])**2)
            distances.append(dist)
    
    distances.sort()
    
    # For a square: 4 equal sides and 2 equal diagonals
    # Diagonals should be sqrt(2) times the side length
    if len(distances) == 6:
        sides = distances[:4]
        diagonals = distances[4:]
        
        # Check if all sides are equal
        if np.allclose(sides, [sides[0]] * 4, atol=1e-6):
            # Check if diagonals are equal
            if np.allclose(diagonals, [diagonals[0]] * 2, atol=1e-6):
                # Check if diagonal = side * sqrt(2)
                if np.allclose(diagonals[0], sides[0] * np.sqrt(2), atol=1e-6):
                    return True, sides[0]
    
    return False, 0

def order_square_points(p1, p2, p3, p4):
    """Order points of a square in counterclockwise order starting from bottom-left."""
    points = [p1, p2, p3, p4]
    # Find centroid
    cx = sum(p[0] for p in points) / 4
    cy = sum(p[1] for p in points) / 4
    
    # Sort by angle from centroid
    def angle(p):
        return np.arctan2(p[1] - cy, p[0] - cx)
    
    points.sort(key=angle)
    return points

# Find all squares
squares = []
squares_by_size = defaultdict(list)

for combo in combinations(points, 4):
    is_sq, side_length = is_square(*combo)
    if is_sq:
        ordered = order_square_points(*combo)
        squares.append(ordered)
        squares_by_size[round(side_length, 2)].append(ordered)

# Print summary
print("=" * 50)
print("SUMMARY OF SQUARES FOUND")
print("=" * 50)
total_squares = 0
for side_length in sorted(squares_by_size.keys()):
    count = len(squares_by_size[side_length])
    total_squares += count
    print(f"Side length {side_length}: {count} square(s)")

print(f"\nTotal squares: {total_squares}")
print("=" * 50)

# Create visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 14))
axes = axes.flatten()

# Plot 1: All dots
ax = axes[0]
x_coords = [p[0] for p in points]
y_coords = [p[1] for p in points]
ax.scatter(x_coords, y_coords, c='black', s=100, zorder=3)
ax.set_xlim(0, 7)
ax.set_ylim(0, 7)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
ax.set_title('Original Grid', fontsize=14, fontweight='bold')
ax.set_xlabel('X')
ax.set_ylabel('Y')

# Plot 2: All squares overlaid
ax = axes[1]
ax.scatter(x_coords, y_coords, c='black', s=100, zorder=3)
colors = plt.cm.rainbow(np.linspace(0, 1, len(squares)))
for square, color in zip(squares, colors):
    square_closed = square + [square[0]]
    xs = [p[0] for p in square_closed]
    ys = [p[1] for p in square_closed]
    ax.plot(xs, ys, c=color, linewidth=2, alpha=0.6)
ax.set_xlim(0, 7)
ax.set_ylim(0, 7)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
ax.set_title(f'All {total_squares} Squares Overlaid', fontsize=14, fontweight='bold')
ax.set_xlabel('X')
ax.set_ylabel('Y')

# Plot 3 & 4: Squares by size
plot_idx = 2
for side_length in sorted(squares_by_size.keys()):
    if plot_idx >= len(axes):
        break
    
    ax = axes[plot_idx]
    ax.scatter(x_coords, y_coords, c='lightgray', s=100, zorder=2)
    
    squares_of_size = squares_by_size[side_length]
    colors = plt.cm.viridis(np.linspace(0, 1, len(squares_of_size)))
    
    for square, color in zip(squares_of_size, colors):
        square_closed = square + [square[0]]
        xs = [p[0] for p in square_closed]
        ys = [p[1] for p in square_closed]
        ax.plot(xs, ys, c=color, linewidth=2.5, marker='o', markersize=8)
    
    ax.set_xlim(0, 7)
    ax.set_ylim(0, 7)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)
    ax.set_title(f'Squares with side length {side_length} ({len(squares_of_size)} total)', 
                 fontsize=14, fontweight='bold')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    plot_idx += 1

plt.tight_layout()
plt.show()
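The `is_square` function above compares floating-point distances within a tolerance. Because every dot here has integer coordinates, the same test can be sketched with squared distances, which stay exact integers. This is an illustrative alternative, not part of Sonnet's answer, and `is_square_exact` is a hypothetical name.

```python
from itertools import combinations


def is_square_exact(p1, p2, p3, p4):
    """Return the squared side length if the points form a square, else None."""
    # Six pairwise squared distances: for a square, the four smallest are the
    # sides and the two largest are the diagonals.
    d2 = sorted(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
        for a, b in combinations((p1, p2, p3, p4), 2)
    )
    side2, diag2 = d2[0], d2[5]
    # Four equal sides, two equal diagonals, and diagonal^2 == 2 * side^2.
    if side2 > 0 and d2[:4] == [side2] * 4 and d2[4] == diag2 == 2 * side2:
        return side2
    return None


print(is_square_exact((0, 0), (1, 0), (1, 1), (0, 1)))  # 1
print(is_square_exact((3, 2), (5, 3), (4, 5), (2, 4)))  # 5 (a tilted square)
print(is_square_exact((0, 0), (2, 0), (2, 1), (0, 1)))  # None (a rectangle)
```

Returning the squared side length (1, 2, 5, 8, 13, ...) also avoids having to round values such as 1.41 when grouping squares by size.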

At first sight, I thought there was no way Sonnet could have come up with the correct solution so quickly. GPT-4o had taken hours. However, once I had installed all the libraries the code required and copied and pasted it into a Jupyter Notebook cell, it ran and produced the following output. I have left out the graphs for now, but they were produced too.

==================================================
SUMMARY OF SQUARES FOUND
==================================================
Side length 1.0: 9 square(s)
Side length 1.41: 4 square(s)
Side length 2.24: 2 square(s)
Side length 2.83: 4 square(s)
Side length 3.61: 2 square(s)

Total squares: 21
==================================================

#
# Plus some graphs that I'm not showing here
#

That shocked me. The answer was completely correct.
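The decimal side lengths in the summary are rounded square roots. A side that runs a units across and b units up has length √(a² + b²), so the five values can be recovered as in the sketch below. The (a, b) pairs are worked out by hand here for illustration; they are not printed by Sonnet's program, and `SIDE_VECTORS` is a hypothetical name.

```python
import math

# (a, b) offsets between adjacent corners. These are assumed from a^2 + b^2
# matching the rounded side lengths in the printed summary.
SIDE_VECTORS = [(1, 0), (1, 1), (1, 2), (2, 2), (2, 3)]

for a, b in SIDE_VECTORS:
    length = math.hypot(a, b)
    print(f"({a}, {b}) -> sqrt({a * a + b * b}) = {length:.2f}")
# (1, 0) -> sqrt(1) = 1.00
# (1, 1) -> sqrt(2) = 1.41
# (1, 2) -> sqrt(5) = 2.24
# (2, 2) -> sqrt(8) = 2.83
# (2, 3) -> sqrt(13) = 3.61
```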

The only thing the model got slightly wrong was that it did not plot every set of different-sized squares. It only plotted the nine 1×1 squares and the four √2×√2 ones. I fixed that by asking Sonnet to plot the rest, too.

Can you print the graphs in square side order. Also can you have two graphs  
side by side on each "line"

This is what it produced.

It is beautiful.
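Sonnet's revised plotting code is not reproduced in this article. As a rough idea of what the follow-up request involves, the sketch below lays out one panel per side length, two per row, in ascending order of side length. It is an assumption about one reasonable approach, not Sonnet's output. `panel_grid` and `plot_by_size` are hypothetical names, and `squares_by_size` is assumed to map each side length to a list of squares, each an ordered list of four (x, y) corners, as in the program above.

```python
import math


def panel_grid(n_panels, per_row=2):
    """Return (rows, cols) needed to show n_panels with per_row per line."""
    return math.ceil(n_panels / per_row), per_row


def plot_by_size(squares_by_size, per_row=2):
    """Draw one panel per side length, in ascending order of side length."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    rows, cols = panel_grid(len(squares_by_size), per_row)
    fig, axes = plt.subplots(rows, cols, figsize=(6 * cols, 6 * rows), squeeze=False)
    flat = axes.flatten()
    for ax, side in zip(flat, sorted(squares_by_size)):
        for square in squares_by_size[side]:
            closed = list(square) + [square[0]]  # repeat first corner to close the outline
            ax.plot([p[0] for p in closed], [p[1] for p in closed], linewidth=2)
        ax.set_aspect("equal")
        ax.set_title(f"Side length {side} ({len(squares_by_size[side])} total)")
    for ax in flat[len(squares_by_size):]:
        ax.axis("off")  # hide any unused panel in the last row
    return fig


print(panel_grid(5))  # (3, 2)
```

With five distinct side lengths this gives three rows of two panels, with the sixth panel left blank.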

Summary

To show how much LLMs have advanced, I revisited a challenging geometry puzzle that I first tried to solve with GPT-4o back in June 2024: finding all the squares that can be drawn on a specific grid of dots.

My experience just over a year ago was a struggle; it took me about two hours and more than 40 prompts to guide GPT-4o to a correct Python solution.

Fast forward to today, and I tested Claude Sonnet 4.5. When I first asked the model the question directly, it failed to calculate the correct number of squares. That was not a great start, but the real test was giving it the exact same prompt I had used with GPT-4o.

To my surprise, it produced a correct Python solution in one shot. The code it generated not only found all 21 squares but also categorized them by side length and produced detailed plots to visualize them. Although I needed one quick follow-up prompt to complete the plots, the core problem was solved immediately.

Could it be that my attempt to solve this puzzle last year, and publishing my findings, put the solution out on the web, where Anthropic's model could simply have picked it up? Yes, I suppose that is possible, but then why could the model not correctly answer the very first question I asked it, about the total number of squares?

To me, this test neatly illustrates the amazing leap in LLM capability. What was a two-hour struggle with the leading model of its day 16 months ago is now a one-shot success, in a matter of seconds, with today's leading model.
