Strengths and Pitfalls of Vector-Based Image Search

Search is a core part of any e-commerce platform. Users expect to see relevant results, and they want them quickly. Because of this, commerce teams are constantly working to improve both performance and perceived search quality to keep users happy and prevent churn.

Please look at the following three images.

They look similar from a visual perspective but the objects in these pictures are completely unrelated. This comparison shows both the strengths and inherent limitations of image search.

All images used in this article are from our internal database which we use for testing and evaluation.

One use case of image search is finding duplicate products. While product titles and descriptions can certainly be used to find duplicates, some listings have slightly (or completely) different titles while using the exact same images. Finding these listings is also very important.

Because e-commerce platforms often offer millions of products, we need effective tools and methods to perform any type of search at scale.

In this article, I will show you how to set up a vector database for image embedding vectors and perform a search in this database. I will also explain in detail both the advantages and limitations of vector-based image search.

Here is a rough outline of the article:

Convert images to vectors: Convert visual data into searchable embeddings.

Create a Milvus collection: Set up a Milvus collection, which is the main logical unit of data organization in the Milvus vector database.

Search image: Search for target images in this collection.

Interpret the results: Go through some examples and interpret the search results.

Let's start by creating our vectors.

Convert images to vectors

The first step is to convert images into vectors, which are numerical representations of the image content. The vector size (its number of dimensions) matters, and the right size depends on the application. Dimensions of 128, 512, and 768 are common options. A larger vector can capture more information and may give more accurate results, but it comes at the cost of more storage and higher search latency.
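To make the storage side of this trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes each vector value is stored as a 32-bit float (4 bytes) and counts only the raw vectors, ignoring index and metadata overhead. The function name is just for illustration.

```python
def vector_storage_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of the vectors alone, assuming float32 values (4 bytes each)."""
    return num_vectors * dim * bytes_per_value


# Compare the three common dimensions for one million images.
for dim in (128, 512, 768):
    size_gb = vector_storage_bytes(1_000_000, dim) / 1e9
    print(f"dim={dim}: {size_gb:.2f} GB per million vectors")
```

Under these assumptions, going from 128 to 768 dimensions multiplies the raw vector storage by six, which is why the dimension is worth choosing deliberately.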

We need an embedding model to convert images into vectors. We can train our own model but there are several ready-to-use models available, both free and paid.

For example, the following code block takes a JPEG image and converts it to a 512-dimensional vector using the open-source clip-ViT-B-32 model.

from PIL import Image
from sentence_transformers import SentenceTransformer

sku_image = Image.open("sample_image.jpeg")
model = SentenceTransformer('clip-ViT-B-32')
image_vector = model.encode(sku_image)

type(image_vector), image_vector.shape
(numpy.ndarray, (512,))

The image_vector variable is a NumPy array of size 512.

Create a Milvus collection

Milvus is a vector database, and a collection in Milvus is a two-dimensional table with fixed columns and rows. Each column represents a field, and each row represents an entity, which in our case is an image.

We'll create a collection with two fields: an id field (such as a product's SKU) and a corresponding vector field for that SKU's image.

There are several ways to create and interact with a collection. I prefer to use Python whenever I can, so I will go with the pymilvus module.

The first step is to create a client object. This is how to connect to your vector database in Milvus.

from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="<your-milvus-uri>")  # replace with your Milvus DB URI

Then, we define our collection schema:

schema = client.create_schema(auto_id=False, enable_dynamic_field=True)


# Add fields to schema
schema.add_field(field_name="sku_id", datatype=DataType.VARCHAR, max_length=512, is_primary=True)
schema.add_field(field_name="image_vector", datatype=DataType.FLOAT_VECTOR, dim=512)

The schema has two fields: sku_id and image_vector.

Then, we can create a collection using the create_collection method:

collection_name = "test_collection"

# index_params is not defined yet at this point; indexes are added in the next step
client.create_collection(
    collection_name=collection_name,
    schema=schema
)

Now we can add an index to our fields. Indexing is essential for a Milvus collection, as it is for any vector database. It speeds up search and reduces query latency, especially when working with large vector data sets.

One way to add an index is to use the create_index method.

index_params = client.prepare_index_params()

# index for the image vector
index_params.add_index(
    field_name="image_vector",
    index_name="image_vector_idx",
    index_type="IVF_FLAT",
    metric_type="COSINE",  # Using COSINE similarity (common for images). Can also be L2 or IP.
)

# index for "sku_id" (primary key)
index_params.add_index(
    field_name="sku_id",
    index_name="sku_id_idx",
    index_type="INVERTED"
)

collection_name = "test_collection"

client.create_index(
    collection_name=collection_name,
    index_params=index_params
)
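The index above is configured with the COSINE metric, and the code comment mentions L2 and IP as alternatives. As a rough illustration of how these three measures differ, here is a plain-Python sketch of their textbook definitions. This is not how Milvus computes them internally, and the function names are only for illustration.

```python
import math


def inner_product(a, b):
    """IP: sum of element-wise products; grows with vector magnitude."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """L2: Euclidean distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """COSINE: inner product of the vectors after scaling each to unit length."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


# Two toy vectors pointing in the same direction but with different lengths.
a = [1.0, 0.0]
b = [2.0, 0.0]
print(cosine_similarity(a, b), l2_distance(a, b), inner_product(a, b))
```

Cosine similarity depends only on direction, so the two toy vectors count as a perfect match, while L2 and IP are both affected by their different lengths.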

In the last step, we load the collection and check its status.

client.load_collection(collection_name=collection_name)

# check load status
res = client.get_load_state(
    collection_name=collection_name
)

print(res)
{'state': <LoadState: Loaded>}

We have created a collection, but it is currently empty. To make sure that our new collection was created successfully, we can list all the collections in the database using the list_collections method.

client.list_collections()

['test_collection']

The next step is to insert entities (i.e. SKUs and image vectors).

Insert entities

Now we can load the data into our collection. To do this, each entity must be formatted as a dictionary whose keys are the same as the collection field names. We can then pass a list of these dictionaries to insert multiple entities at once.
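As a minimal sketch of that format, the snippet below builds such a list by hand and checks that every entity uses exactly the field names from our schema. The SKU values and the 3-dimensional toy vectors are hypothetical; real vectors would have 512 values.

```python
# Field names defined in the collection schema.
SCHEMA_FIELDS = {"sku_id", "image_vector"}

# Hypothetical ids and toy vectors, only to show the shape of the data.
sku_ids = ["SKU_A", "SKU_B"]
vectors = [[0.12, 0.01, -0.07], [0.05, -0.10, 0.03]]

# One dictionary per entity, keyed by the collection field names.
data = [{"sku_id": sku, "image_vector": vec} for sku, vec in zip(sku_ids, vectors)]


def has_expected_fields(entities, fields=SCHEMA_FIELDS):
    """Return True if every entity has exactly the expected field names."""
    return all(set(entity) == fields for entity in entities)


print(has_expected_fields(data))
```

A quick check like this catches a misspelled key before any data is sent to the database.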

I have vector data stored in a Pandas DataFrame as shown below:

We can use the to_dict method to convert this DataFrame into a list of dictionaries, where each dictionary represents an entity in our collection.

df.to_dict(orient="records")

[{'sku_id': 'HBCV00009LIR5S',
  'image_vector': array([ 0.1206549 ,  0.00597879, -0.07224327,  0.02327867, -0.09490156,
          0.02150885,  0.10642719, -0.10139938,  0.03159734,  0.05613545,
         -0.07615539, -0.15523671, -0.10006154,  0.05045145,  0.07733533,
         -0.03749327, -0.02301577,  0.13337888,  0.00096778,  0.05047926,
...

Once we have all the entities as a list of dictionaries, we can use the insert method to load entities into our collection:

# convert dataframe to a list of dictionaries    
data = df.to_dict(orient="records")

# insert data into collection
res = client.insert(
    collection_name=collection_name,
    data=data
)

print(res)

{'insert_count': 10000, 'ids': ['HBCV00009LIR5S', 'HBCV00001U46VH', ...]}

We just loaded 10000 entities, but collections usually hold much more data (e.g. several million entities). We cannot load millions of vectors at once. In such cases, we can divide the data into batches and load them into the collection sequentially. For example, the loop below iterates through the entire DataFrame and inserts 10000 entities per batch.

batch_size = 10000

for i in range(0, len(df), batch_size):

    data = df.iloc[i:i+batch_size,].to_dict(orient="records")

    res = client.insert(
        collection_name=collection_name,
        data=data
    )
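The same batching idea can be written without pandas when the entities are already a plain list of dictionaries. The helper below is a small sketch with a hypothetical name (iter_batches); each yielded batch could be passed to the insert method in place of data.

```python
from typing import Iterator, List, Sequence


def iter_batches(rows: Sequence[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of rows with at most batch_size items each."""
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


# 25 hypothetical entities split into batches of 10.
rows = [{"sku_id": f"SKU_{i}"} for i in range(25)]
batch_sizes = [len(batch) for batch in iter_batches(rows, 10)]
print(batch_sizes)  # [10, 10, 5]
```

The last batch is simply shorter when the total is not a multiple of the batch size, so no entity is dropped.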

Now we have a collection with sku and vector data. The next step is to search for images in this collection.

Search for an image

To search for an image, we first need to convert it to a vector of the same size as the vectors stored in our collection.

Milvus has different search methods such as basic search, range search, and hybrid search. We will do a basic search, which is an Approximate Nearest Neighbor (ANN) search. It selects a subset of the stored vector embeddings based on the query vector, compares the query vector with the vectors in that subset, and returns the most similar results.
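To illustrate the two steps described above (pick a subset, then compare only within it), here is a toy plain-Python sketch in the spirit of an inverted-file index such as IVF_FLAT. It is not Milvus code: the centroids are hand-picked rather than learned from the data, the vectors are 2-dimensional, and the names (build_buckets, ann_search, nprobe) are used here only for illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_buckets(vectors, centroids):
    """Assign each vector id to the bucket of its most similar centroid."""
    buckets = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        best = max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))
        buckets[best].append(vec_id)
    return buckets


def ann_search(query, vectors, centroids, buckets, nprobe=1, limit=3):
    # Step 1: keep only the nprobe buckets whose centroids are closest to the query.
    ranked = sorted(range(len(centroids)),
                    key=lambda i: cosine(query, centroids[i]), reverse=True)
    candidates = [vec_id for i in ranked[:nprobe] for vec_id in buckets[i]]
    # Step 2: compare the query only with the vectors in those buckets.
    scored = [(vec_id, cosine(query, vectors[vec_id])) for vec_id in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


centroids = [[1.0, 0.0], [0.0, 1.0]]  # hand-picked, not learned
vectors = {
    "sku_a": [0.9, 0.1],
    "sku_b": [0.8, 0.3],
    "sku_c": [0.1, 0.9],
    "sku_d": [0.2, 0.7],
}
buckets = build_buckets(vectors, centroids)
results = ann_search([1.0, 0.05], vectors, centroids, buckets, nprobe=1, limit=2)
print([vec_id for vec_id, _ in results])
```

Because only one bucket is probed, sku_c and sku_d are never compared with the query. That is where the speed comes from, and it is also why the search is approximate: a true neighbor sitting in an unprobed bucket would be missed.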

The following code block reads a JPEG image, converts it to a vector using the same model we used when populating the collection, and then searches for this image vector within the collection.

collection_name = "test_collection"

model = SentenceTransformer('clip-ViT-B-32')
search_image = Image.open("image_to_search.jpeg")
search_image_vector = model.encode(search_image)

res = client.search(
    collection_name=collection_name,
    anns_field="image_vector",
    data=[search_image_vector],
    limit=3,
    search_params={"metric_type": "COSINE"}
)

The anns_field parameter specifies the vector field to be used in the search. We then pass the target image vector to the data parameter. The limit parameter tells Milvus how many results to return (i.e. 3 returns the 3 most similar vectors in the collection).

The output of the search method is a list of dictionaries like this:

print(res[0])

[{'my_id': 'HBCV0000BLJIBF', 'distance': 0.9814613848423661, 'entity': {}}
{'my_id': 'HBCV00003Z49OT', 'distance': 0.9504563808441162, 'entity': {}}
{'my_id': 'HBCV0000DCK7ML', 'distance': 0.9104360342025757, 'entity': {}}]
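With the COSINE metric, a larger distance value means a more similar vector. For the duplicate-detection use case, one simple way to use these scores is to keep only the hits above a similarity threshold. The sketch below copies the three hits shown above; the 0.95 threshold and the function name are hypothetical and would need tuning on real data.

```python
# Hits copied from the search output above.
hits = [
    {"my_id": "HBCV0000BLJIBF", "distance": 0.9814613848423661, "entity": {}},
    {"my_id": "HBCV00003Z49OT", "distance": 0.9504563808441162, "entity": {}},
    {"my_id": "HBCV0000DCK7ML", "distance": 0.9104360342025757, "entity": {}},
]


def likely_duplicates(hits, min_similarity=0.95, id_key="my_id"):
    """Return the ids of hits whose cosine similarity reaches the threshold."""
    return [hit[id_key] for hit in hits if hit["distance"] >= min_similarity]


print(likely_duplicates(hits))
```

A threshold like this only captures visual similarity, so the hits it keeps are candidates for review rather than confirmed duplicates.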

Interpret the results

Searching images using vectors is very efficient and accurate. If the exact same (or nearly the same) image exists in the collection, you are almost guaranteed to find it.

In the examples below, the image on the left is the one being searched for, and the remaining images are the top search results.

Identical images

In the following examples, multiple instances of the same image exist in the collection and vector search was able to find them.

Similar and related images

In the examples below, we can clearly see that the returned images are very similar and directly related to the target image.

Similar but unrelated images

In some cases, vector search finds images that are visually similar but completely unrelated in context. Here is an example of such a case:

The searched image is a bath sponge but the results include green toys that look similar.

Here are a few more examples that include similar but completely unrelated images:

Image search in vector databases is very useful in many situations. We can easily find identical images and images with a similar visual appearance. However, as seen in the examples above, visual similarity does not always indicate that the products are the same or related.

Another drawback of image search for e-commerce platforms is that completely different products may share the exact same image. For example, think about smartphones: many different models look almost identical on the outside. Relying entirely on image search without additional context, such as a product title or model name, can easily lead to misleading results.

To overcome these shortcomings of image search, we can use hybrid search, which combines image and text vectors. Hybrid search helps ensure that the results are not only visually consistent but also conceptually accurate.
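As a rough intuition for how combining two signals can push out visually similar but unrelated items, here is a plain-Python sketch of reciprocal rank fusion (RRF), one common way to merge ranked lists. This is not the Milvus hybrid search API. The two rankings and the item ids are made up for illustration, and k=60 is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Merge ranked id lists: each item scores 1 / (k + rank) per list it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results for a bath sponge query.
image_ranking = ["sponge_1", "toy_7", "sponge_2"]    # by image similarity
text_ranking = ["sponge_2", "sponge_1", "brush_3"]   # by title similarity

fused = rrf_fuse([image_ranking, text_ranking])
print(fused)
```

In this toy example the green toy ranks second on image similarity alone, but it drops below both sponges once the text ranking is taken into account.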

In the next article, I will explain step by step how to create a Milvus collection with multiple vector fields and perform a hybrid search. We will also test this method using the same examples from this article to see how hybrid search removes irrelevant search results.

Thanks for reading! Please let me know if you have any feedback.

nimda 3 weeks ago