
Feature detection, Part 1: Image derivatives, gradients, and the Sobel operator

Computer vision is a fascinating field for analyzing images and videos. While many people tend to think of machine learning models when they hear computer vision, in reality there are many classical algorithms that, in some cases, perform better than AI!

From a computer vision perspective, feature detection consists of identifying distinct regions of interest in an image. These regions can be used to create feature descriptors – numerical vectors that represent local image regions. Feature descriptors from multiple photos of the same scene can then be matched to stitch a panorama or reconstruct the scene.

In this article, we will draw an analogy from calculus to introduce image derivatives and gradients. This will help us understand the logic behind the Sobel operator – a computer vision filter used to detect edges in an image.

Image intensity

Intensity is one of the main properties of an image. Every image pixel has three components: R (red), G (green), and B (blue), each taking values between 0 and 255. The intensity of a pixel is a weighted average of its R, G, and B values.

In fact, there are several standards that define the intensity weights differently. Since we will focus on OpenCV, we will use its formula, given below:

Intensity formula: I = 0.299 · R + 0.587 · G + 0.114 · B
import cv2
import numpy as np

image = cv2.imread('image.png')
B, G, R = cv2.split(image)                   # OpenCV loads images in BGR order
grayscale_image = 0.299 * R + 0.587 * G + 0.114 * B
grayscale_image = np.clip(grayscale_image, 0, 255).astype('uint8')
intensity = grayscale_image.mean()           # mean intensity over all pixels
print(f"Image intensity: {intensity:.2f}")

Grayscale images

Images can be represented using different color channels. If the RGB channels represent the original image, applying the intensity formula above converts it to the grayscale format, which contains only one channel.

Since the sum of the weights in the formula is equal to 1, the grayscale image will contain values between 0 and 255, just like the RGB channels.

Big Ben shown in RGB (left) and grayscale (right)

In OpenCV, the RGB channels can be converted to the grayscale format using the cv2.cvtColor() function, which is simpler than the manual method shown above.

import cv2

image = cv2.imread('image.png')
grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # built-in BGR-to-grayscale conversion
intensity = grayscale_image.mean()
print(f"Image intensity: {intensity:.2f}")

Instead of the standard RGB palette, OpenCV uses the BGR palette. Both are the same except that the R and B components are swapped. For simplicity, in this and the following articles of this series, we will use the terms RGB and BGR interchangeably.

If we calculate the intensity of the image using both methods, we can get slightly different results. That's normal: when using the cv2.cvtColor function, OpenCV rounds each converted pixel to the nearest integer, while the manual formula keeps floating-point values until the final cast. Calculating the mean value afterwards therefore results in a small difference.
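Here is a minimal sketch comparing the two methods (assuming an image file image.png exists):

import cv2
import numpy as np

image = cv2.imread('image.png')

# Manual conversion: float arithmetic, no rounding until a final cast
B, G, R = cv2.split(image)
manual = 0.299 * R + 0.587 * G + 0.114 * B

# Built-in conversion: OpenCV rounds each pixel to the nearest integer
builtin = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# The two mean intensities differ slightly because of per-pixel rounding
print(f"Manual:   {manual.mean():.4f}")
print(f"Built-in: {builtin.mean():.4f}")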

Image derivatives

Image derivatives are used to measure how quickly pixel intensity changes across an image. An image can be thought of as a function of two arguments, I(x, y), where x and y define the position of a pixel and I represents the intensity of that pixel.

We can formally write the derivative along the x-axis as:

∂I/∂x = lim (h → 0) [I(x + h, y) − I(x, y)] / h

But given that images are discrete and exist in a bounded space, their derivatives are in practice approximated by convolution kernels:

  • On the horizontal x-axis: [-1, 0, 1]
  • On the vertical y-axis: [-1, 0, 1]ᵀ

In other words, we can rewrite the above equation in the following discrete form:

∂I/∂x ≈ I(x + 1, y) − I(x − 1, y)
∂I/∂y ≈ I(x, y + 1) − I(x, y − 1)

To better understand the logic behind the kernels, let's refer to the example below.

Example

Suppose we have a 5 × 5 matrix of pixels representing a grayscale image. The values of this matrix indicate the intensities of the pixels.

For derivative calculations, we can use convolution kernels. The idea is simple: for a given pixel and several pixels in its neighborhood, we compute the sum of their element-wise products with a given matrix (or vector) called the kernel.

In our case, we will use the simple kernel vector [-1, 0, 1]. From the example above, let's take the pixel at location (1, 1), whose value is -3.

Since the size of the kernel (yellow) is 1 × 3, we need the left and right neighbors of -3 to match that size; therefore, we take the vector [4, -3, 2]. Then, by computing the sum of the element-wise product of the two vectors, we get the value of -2:

(-1) · 4 + 0 · (-3) + 1 · 2 = -2

The value of -2 represents the derivative for that pixel. If we look carefully, we can see that the derivative at the pixel -3 is just the difference between its right neighbor (2) and its left neighbor (4).

Why use complicated formulas when we can simply take the difference between two values? Indeed, in this example, we could simply calculate the difference in intensity between the elements I(x + 1, y) and I(x − 1, y). But in reality, we may deal with more complex situations where we need to find less obvious features. For that reason, it is convenient to use standard kernels that are already known to detect predefined types of features.
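As a quick check, here is a minimal NumPy sketch that reproduces this computation; the window values [4, -3, 2] come from the example above:

import numpy as np

kernel = np.array([-1, 0, 1])     # horizontal derivative kernel
window = np.array([4, -3, 2])     # the pixel -3 with its left and right neighbors

# Sum of the element-wise product: (-1)*4 + 0*(-3) + 1*2 = -2
derivative = np.dot(kernel, window)
print(derivative)                 # -2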

Depending on the value of the derivative, we can draw the following conclusions:

  • If the absolute value of the derivative is large in a given image region, then the intensity changes significantly there. Otherwise, there are no visible changes in brightness.
  • If the derivative is positive, it means that the image region becomes brighter from left to right; if it is negative, the image region becomes darker from left to right.

By analogy with linear algebra, kernels can be treated as operators that transform local image regions.

Following the same idea, we can calculate the vertical derivative with the transposed kernel [-1, 0, 1]ᵀ. The process remains the same, except that now we move our window (kernel) vertically across the image matrix.

You can notice that after applying the convolution kernels to the original 5 × 5 image, it became 3 × 3. This is normal because we cannot apply the kernels in the same way to the border pixels (the kernel window would extend beyond the image).

To preserve the size of the image, a padding procedure is often used: it replicates the border pixels of the image or fills the added borders with zeros, so that the derivative can be calculated for the edge pixels as well.

By default, libraries like OpenCV automatically pad the borders to ensure that the input and output images have the same size.
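The sketch below illustrates both behaviors on a hypothetical 5 × 5 matrix (the values are arbitrary, chosen only for illustration): scipy's 'valid' mode shrinks the output, while cv2.filter2D pads the borders by reflection (its default) and preserves the size:

import cv2
import numpy as np
from scipy.signal import correlate2d

# Hypothetical 5x5 grayscale matrix
image = np.array([
    [1,  4, -3,  2,  0],
    [2,  5,  1, -2,  3],
    [0, -1,  2,  4,  1],
    [3,  2,  0,  1, -2],
    [1,  0,  3,  2,  4],
], dtype=np.float64)

kernel = np.array([[-1, 0, 1]], dtype=np.float64)   # 1x3 horizontal derivative

# Without padding: the output loses the border columns (5x5 -> 5x3).
# Applying the 3x1 vertical kernel as well would shrink the rows too,
# leaving only the 3x3 interior.
valid = correlate2d(image, kernel, mode='valid')
print(valid.shape)        # (5, 3)

# With padding (cv2.filter2D reflects the borders by default): size is preserved
padded = cv2.filter2D(image, -1, kernel)
print(padded.shape)       # (5, 5)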

Image gradient

The image gradient shows how rapidly the intensity (contrast) changes at a given pixel in both directions (x and y).

Formally, the gradient of an image can be written as the vector of its derivatives with respect to the x- and y-axes:

∇I = (Gₓ, Gᵧ) = (∂I/∂x, ∂I/∂y)

The magnitude of the gradient

The magnitude of the gradient represents the norm of the gradient vector and can be found using the formula below:

|∇I| = √(Gₓ² + Gᵧ²)

Gradient direction

Using the obtained Gₓ and Gᵧ, it is also possible to calculate the angle of the gradient vector:

θ = arctan(Gᵧ / Gₓ)

(In practice, the two-argument form atan2(Gᵧ, Gₓ) is used, so that the correct quadrant is chosen based on the signs of both components.)

Example

Let's see how to calculate the gradients for the example above. For that, we will need the two 3 × 3 matrices obtained after applying the horizontal and vertical convolution kernels.

If we take the top-left pixel, it has the values Gₓ = -2 and Gᵧ = 11. We can easily calculate the magnitude and direction of the gradient:

|∇I| = √((-2)² + 11²) = √125 ≈ 11.18
θ = atan2(11, -2) ≈ 100.3°

Repeating this for every pixel of the 3 × 3 matrices, we obtain the following gradient values:

In practice, it is recommended to normalize kernels before applying them. We did not do so here, for the simplicity of the example.
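For reference, here is a minimal NumPy sketch of the magnitude and direction computation, using the values Gₓ = -2 and Gᵧ = 11 from the example above:

import numpy as np

gx, gy = -2.0, 11.0

# Gradient magnitude: sqrt(gx^2 + gy^2)
magnitude = np.hypot(gx, gy)

# Gradient direction: atan2 handles the signs of both components correctly
angle = np.degrees(np.arctan2(gy, gx))

print(f"magnitude = {magnitude:.2f}")   # ~11.18
print(f"angle     = {angle:.1f} deg")   # ~100.3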

The Sobel operator

Now that we have learned the basics of image derivatives and gradients, it is time to introduce the Sobel operator, which builds upon them and is widely used for edge detection. In contrast to the previous kernels of size 1 × 3 and 3 × 1, the Sobel operator is defined by 3 × 3 kernels (one for each axis):

Sₓ = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]

Sᵧ = [[-1, -2, -1],
      [ 0,  0,  0],
      [ 1,  2,  1]]
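As a sanity check, applying these kernels manually with cv2.filter2D should match OpenCV's built-in cv2.Sobel (a minimal sketch, assuming a grayscale image file image.png):

import cv2
import numpy as np

image = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)

# cv2.filter2D performs correlation, so the kernel is applied as written
manual = cv2.filter2D(image, cv2.CV_64F, sobel_x)
builtin = cv2.Sobel(image, cv2.CV_64F, 1, 0)

# Both approaches should agree (same default border handling)
print(np.allclose(manual, builtin))    # True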

This gives the Sobel operator an advantage: 1D kernels capture changes along a single row or column only, while the Sobel operator also takes the neighboring rows and columns into account, extracting more information from the local region.

Another advantage is that the Sobel operator is more resistant to noise. Let's look at the picture below. If we calculate the derivative around the red pixel in the center, which lies on the border between the dark pixels (2) and the light pixels, we have to deal with a noisy pixel with a value of 10.

If we use a 1D horizontal kernel around the red pixel, the result will be dominated by the noisy value of 10, producing a clear outlier. The Sobel operator is more robust: it takes into account not only the 10 but also the surrounding pixels with a value of 7. In a sense, the Sobel operator smooths the result.
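This smoothing behavior is not accidental: the Sobel kernel factors into a 1D smoothing filter and a 1D derivative filter. A minimal sketch demonstrating this decomposition:

import numpy as np

smoothing = np.array([1, 2, 1])     # weighted average across neighboring rows
derivative = np.array([-1, 0, 1])   # horizontal derivative

# The outer product reconstructs the 3x3 horizontal Sobel kernel
sobel_x = np.outer(smoothing, derivative)
print(sobel_x)
# [[-1  0  1]
#  [-2  0  2]
#  [-1  0  1]]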

When several kernels are compared at the same time, it is recommended to normalize them to ensure that their outputs are on a comparable scale. One of the most common applications of gradient operators in image analysis is feature detection.

In particular, the Sobel and Scharr operators are often used to find edges – places where the intensity of a pixel (and thus its gradient) changes significantly.

OpenCV

To apply the Sobel operator, it is enough to call the cv2.Sobel function in OpenCV. Let's look at its parameters:

derivative_x = cv2.Sobel(image, cv2.CV_64F, 1, 0)
derivative_y = cv2.Sobel(image, cv2.CV_64F, 0, 1)
  • The first parameter is the input NumPy image.
  • The second parameter (cv2.CV_64F) is the depth of the output image. The problem is that, in general, gradient operators can produce output values outside the 0–255 range. That's why we need to specify the data type we want the output image to have.
  • The third and fourth parameters represent the order of the derivative in the x and y directions, respectively. In our case, we only want the first derivative along x or y, so we pass the values (1, 0) and (0, 1). The two outputs can then be combined into the gradient magnitude, as shown below.
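For instance, here is a minimal sketch (assuming a grayscale image file image.png) that combines the two derivatives into the per-pixel gradient magnitude using cv2.magnitude:

import cv2

image = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)

derivative_x = cv2.Sobel(image, cv2.CV_64F, 1, 0)
derivative_y = cv2.Sobel(image, cv2.CV_64F, 0, 1)

# Per-pixel gradient magnitude: sqrt(Gx^2 + Gy^2)
magnitude = cv2.magnitude(derivative_x, derivative_y)
print(magnitude.min(), magnitude.max())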

Let's look at the following example, where we are given a Sudoku input image:

Let's apply the Sobel filter:

import cv2
import matplotlib.pyplot as plt

image = cv2.imread("data/input/sudoku.png")

image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
derivative_x = cv2.Sobel(image, cv2.CV_64F, 1, 0)
derivative_y = cv2.Sobel(image, cv2.CV_64F, 0, 1)

derivative_combined = cv2.addWeighted(derivative_x, 0.5, derivative_y, 0.5, 0)

min_value = min(derivative_x.min(), derivative_y.min(), derivative_combined.min())
max_value = max(derivative_x.max(), derivative_y.max(), derivative_combined.max())

print(f"Value range: ({min_value:.2f}, {max_value:.2f})")

fig, axes = plt.subplots(1, 3, figsize=(16, 6), constrained_layout=True)

axes[0].imshow(derivative_x, cmap='gray', vmin=min_value, vmax=max_value)
axes[0].set_title("Horizontal derivative")
axes[0].axis('off')

image_1 = axes[1].imshow(derivative_y, cmap='gray', vmin=min_value, vmax=max_value)
axes[1].set_title("Vertical derivative")
axes[1].axis('off')

image_2 = axes[2].imshow(derivative_combined, cmap='gray', vmin=min_value, vmax=max_value)
axes[2].set_title("Combined derivative")
axes[2].axis('off')

color_bar = fig.colorbar(image_2, ax=axes.ravel().tolist(), orientation='vertical', fraction=0.025, pad=0.04)

plt.savefig("data/output/sudoku.png")

plt.show()

As a result, we can see that the horizontal and vertical derivatives detect the respective lines perfectly! Additionally, combining them allows us to see both types of edges:

The Scharr operator

Another popular gradient kernel is the Scharr operator:

Sₓ = [[ -3, 0,  3],
      [-10, 0, 10],
      [ -3, 0,  3]]

Sᵧ = [[-3, -10, -3],
      [ 0,   0,  0],
      [ 3,  10,  3]]

Despite its structural similarity to the Sobel operator, the Scharr kernel achieves higher accuracy in gradient estimation. It has several important mathematical properties that we will not cover in this article.

OpenCV

The use of the Scharr filter in OpenCV is very similar to what we saw above with the Sobel filter. The only difference is the method name (the parameters are the same):

derivative_x = cv2.Scharr(image, cv2.CV_64F, 1, 0)
derivative_y = cv2.Scharr(image, cv2.CV_64F, 0, 1)

Here is the result we get with the Scharr filter:

In this case, it is hard to notice a difference in the results of the two operators. However, by looking at the color map, we can see that the range of values produced by the Scharr operator is much greater (-800, +800) than for Sobel (-200, +200). That's normal since the Scharr kernel has larger weights.

This is also a good example of why we need to use the special type cv2.CV_64F. Otherwise, the values would have been clipped to the standard range between 0 and 255, and we would have lost important information about the gradients.

Be careful: applying save methods directly to cv2.CV_64F images can produce incorrect results. To save such images to disk, they first need to be converted to a format that contains only values between 0 and 255.
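One common way to handle this is to rescale the values to the 0–255 range before saving, for example (a minimal sketch, assuming an image file image.png):

import cv2
import numpy as np

# derivative is a cv2.CV_64F image with values outside the 0-255 range
derivative = cv2.Sobel(cv2.imread('image.png', cv2.IMREAD_GRAYSCALE), cv2.CV_64F, 1, 0)

# Rescale the full value range to 0-255 and convert to 8-bit before saving
rescaled = cv2.normalize(derivative, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite('derivative.png', rescaled.astype(np.uint8))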

Conclusion

Using calculus tools in computer vision, we learned about important image properties that allow us to find regions where intensity changes sharply. This information is useful since feature detection is a common task in image analysis, especially when there are constraints on image processing or when machine learning algorithms are not used.

We also looked at OpenCV examples to see how edge detection works with the Sobel and Scharr operators. In the following articles, we will study more advanced feature detection algorithms and explore further OpenCV examples.


All images, unless otherwise noted, are by the author.

