The Meaning Map: How Embedding Models “Understand” Human Language

If you work with Artificial Intelligence development, or study or plan to work with that technology, you have stumbled upon embedding models many times on your journey.
At its heart, an embedding model is a neural network trained to map items like words or sentences into a continuous vector space, so that semantically or conceptually similar items end up close together and can be compared statistically.
To put it in simple words, think of a library where books are not only sorted by author and title, but by many other dimensions, such as the vibe, the subject, the state of mind, the writing style, etc.
Another good analogy is a map. Imagine a map with two cities you don't know. Let's say you're not that good at geography and you don't know where Tokyo and New York City are. If I tell you that we should have breakfast in NYC and lunch in Tokyo, you might say: “Let's do it”.
However, once I give you the coordinates and you check the cities on the map, you will see that they are very far apart. That's what an embedding model does for text: it assigns coordinates that reveal how close or far apart meanings are!
Building a Map
Even before you asked a question, the embedding model was trained. It read millions of sentences and recognized patterns. For example, it learned that “cat” and “kitten” often appear in the same types of sentences, while “cat” and “refrigerator” rarely do.
With those patterns, the model assigns each word a set of coordinates in statistical space, like an abstract map.
- Similar concepts (such as “cat” and “kitten”) are placed right next to each other on the map.
- Somewhat related concepts (such as “cat” and “dog”) are placed near each other, but not on top of each other.
- Completely unrelated concepts (like “cat” and “quantum physics”) are placed in completely different corners of the map, like NYC and Tokyo.
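The placements above can be sketched with toy 2D coordinates. The numbers below are invented purely for illustration; real embedding models use hundreds of dimensions.

```python
import math

# Invented 2D "map" coordinates, for illustration only.
coords = {
    "cat": (1.0, 1.0),
    "kitten": (1.1, 0.9),
    "dog": (2.0, 1.5),
    "quantum physics": (9.0, 8.0),
}

def distance(a, b):
    """Euclidean distance between two points on the map."""
    return math.dist(coords[a], coords[b])

print(distance("cat", "kitten"))           # tiny: near-synonyms
print(distance("cat", "dog"))              # moderate: related concepts
print(distance("cat", "quantum physics"))  # huge: unrelated, like NYC vs Tokyo
```

The exact values don't matter; what matters is the ordering of the distances, which mirrors how related the concepts are.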
Digital Fingerprints
Good. Now we know how the map was created. What's next?
Now we put this trained embedding model to work. When we give the model a sentence like “The fluffy cat is sleeping”:
- It doesn't look at the letters. Instead, it looks up each word's location on its map, word by word.
- It computes the middle ground (the average) of all those locations. That midpoint becomes the “fingerprint” of the entire sentence.
- It places a pin on the map where your query's fingerprint is located.
- It looks around that pin to see what other fingerprints are nearby.
Any documents that “sit” near your query on this map are considered similar, because they share the same “vibe” or topic, even if they don't share exactly the same names.
It's like searching for a book not by searching for a specific keyword, but by pointing to an area on the map that says “all books about kittens,” and letting the model retrieve everything from that area.
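The averaging step can be sketched like this. The toy 3-dimensional word vectors below are invented for illustration; real models use far more dimensions and often pool with attention weighting rather than a plain mean.

```python
import numpy as np

# Invented toy word vectors (real ones have 384+ dimensions).
word_vectors = {
    "the":      np.array([0.1, 0.0, 0.1]),
    "fluffy":   np.array([0.8, 0.2, 0.1]),
    "cat":      np.array([0.9, 0.1, 0.0]),
    "is":       np.array([0.1, 0.1, 0.1]),
    "sleeping": np.array([0.3, 0.9, 0.2]),
}

# Mean pooling: the sentence "fingerprint" is the average of its word vectors.
sentence = "the fluffy cat is sleeping".split()
fingerprint = np.mean([word_vectors[w] for w in sentence], axis=0)
print(fingerprint)  # one vector representing the whole sentence
```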
Embedding Models, Step by Step
Let's see how an embedding model handles a request, step by step.
- The computer takes the text.
- It breaks it down into tokens, which are small pieces of a phrase that have meaning. Usually, that is a word or part of a word.
- Chunking: The input text is divided into manageable chunks (typically around 512 tokens), so the model is not overwhelmed by too much information at once.
- Embedding: The model converts each chunk into a long list of numbers (a vector) that works as a numerical fingerprint representing the meaning of that chunk.
- Vector Search: When you ask a question, the model turns your question into a “fingerprint” and quickly calculates which stored snippets have the most statistically similar numbers.
- The model returns the most similar vectors, which are associated with text fragments.
- Generation: In Retrieval-Augmented Generation (RAG), those few “winning” chunks are handed to an LLM, which reads them and writes a natural-sounding answer based only on that information.
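The retrieval part of the pipeline above can be sketched with toy vectors. The fingerprints below are invented for illustration; a real pipeline would produce them with an embedding model.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity: how aligned two fingerprints are."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy pre-computed "fingerprints" for three document chunks (invented numbers).
chunks = {
    "refund policy":        np.array([0.9, 0.1, 0.0]),
    "pricing details":      np.array([0.1, 0.9, 0.1]),
    "account cancellation": np.array([0.1, 0.2, 0.9]),
}

# A query vector that, by construction, points toward "account cancellation".
query = np.array([0.2, 0.1, 0.8])

# Vector search: rank chunks by similarity to the query, highest first.
ranked = sorted(chunks, key=lambda name: cosine_sim(query, chunks[name]), reverse=True)
print(ranked[0])  # the chunk whose fingerprint is closest to the query
```

A vector database does essentially this comparison, just with indexing tricks that avoid scanning every stored vector.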
Coding
Good. We talked a lot. Now, let's try to code a bit and make those ideas more practical.
We will start with BERT (Bidirectional Encoder Representations from Transformers). Developed by Google, it is powered by the Transformer architecture and its attention mechanism: a word's vector changes based on the words around it.
# Imports
from transformers import BertTokenizer
# Load pre-trained BERT tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Sample text for tokenization
text = "Embedding models are so cool!"
# Step 1: Tokenize the text
tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
# View
tokens
{'input_ids': tensor([[ 101, 7861, 8270, 4667, 4275, 2024, 2061, 4658, 999, 102]]),
'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]),
'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
Note how each piece of text was converted to an ID. Although the sentence has only five words, some of them may be split into smaller sub-word tokens.
- ID 101 is the [CLS] token. The vector of this token is assumed to capture the overall meaning of the entire sentence or sequence of sentences. It is like a stamp showing the model the meaning of that passage. [2]
- ID 102 is the [SEP] token, which separates different sentences. [2]
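The sub-word splitting can be sketched with a greedy longest-match-first loop over a toy vocabulary. This only roughly mimics WordPiece (the real BERT vocabulary has around 30,000 entries, and the actual pieces for any given word may differ).

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first sub-word split, roughly like WordPiece."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces get the ## prefix
            if piece in vocab:
                tokens.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]  # no piece matched at all
        start = end
    return tokens

# Toy vocabulary: "embedding" is absent as a whole word, so it splits into pieces.
vocab = {"em", "##bed", "##ding", "models", "cool"}
print(wordpiece("embedding", vocab))  # ['em', '##bed', '##ding']
print(wordpiece("models", vocab))     # ['models']
```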
Next, let's apply the embedding model to the data.
Embedding
Here is a simple snippet where we take text and encode it with the general-purpose embedding model all-MiniLM-L6-v2.
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer
# 1. Load embedding model
model = SentenceTransformer('all-MiniLM-L6-v2', device='cpu')
# 2. Initialize Qdrant client
client = QdrantClient(":memory:")
# 3. Create embeddings
docs = ["refund policy", "pricing details", "account cancellation"]
vectors = model.encode(docs).tolist()
# 4. Store Vectors: Create a collection (DB)
client.create_collection(
    collection_name="my_collection",
    vectors_config=models.VectorParams(size=384,
                                       distance=models.Distance.COSINE)
)
# Upload embedded docs (vectors)
client.upload_collection(collection_name="my_collection",
                         vectors=vectors,
                         payload=[{"source": docs[i]} for i in range(len(docs))])
# 5. Search
query_vector = model.encode("How do I cancel my subscription?")
# Result
result = client.query_points(collection_name="my_collection",
                             query=query_vector,
                             limit=2,
                             with_payload=True)
print("\n\n ======= RESULTS =========")
result.points
The results are as expected: the query points to the account cancellation topic!
======= RESULTS =========
[ScoredPoint(id='b9f4aa86-4817-4f85-b26f-0149306f24eb', version=0, score=0.6616353073200185, payload={'source': 'account cancellation'}, vector=None, shard_key=None, order_value=None),
ScoredPoint(id='190eaac1-b890-427b-bb4d-17d46eaffb25', version=0, score=0.2760082702501182, payload={'source': 'refund policy'}, vector=None, shard_key=None, order_value=None)]
What just happened?
- We loaded a pre-trained embedding model.
- We initialized a vector database of our choice: Qdrant [3].
- We embedded the documents and loaded them into a new collection in the vector DB.
- We sent a query.
- The results are the documents whose statistical “fingerprints” are closest to the query embedding.
This is really good.
To finish this article, I wonder if we can try to fine-tune the embedding model. Let's try.
Fine-tuning the embedding model
Fine-tuning an embedding model is different from fine-tuning an LLM. Instead of teaching the model to “talk,” you teach it to rearrange its internal map so that specific concepts in your domain are pushed further apart or pulled closer together.
The most common and efficient way to do this is to use Contrastive Learning with a library like Sentence-Transformers.
First, you teach the model what proximity looks like using three kinds of data points:
- Anchor: The reference item (e.g., “Brand A Cola Soda”).
- Positive: A similar item (e.g., “Brand B Cola Soda”) that the model should pull closer.
- Negative: A different item (e.g., “Brand A Cola Soda Zero Sugar”) that the model should push away.
Next, we choose a Loss Function to tell the model how much it should change if it makes a mistake. You can choose between:
- MultipleNegativesRankingLoss: Good if you only have (Anchor, Positive) pairs. It assumes that every other positive in the batch acts as a negative for the current anchor.
- TripletLoss: Best if you have explicit (Anchor, Positive, Negative) triplets. It forces the Anchor-Positive distance to be smaller than the Anchor-Negative distance by some margin.
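The triplet objective can be sketched numerically: penalize the model whenever the anchor-positive distance is not smaller than the anchor-negative distance by at least the margin. The vectors and margin below are toy values for illustration, not the library's defaults.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin), with Euclidean distance."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([1.0, 0.0])
p = np.array([1.1, 0.1])  # close to the anchor
n = np.array([3.0, 2.0])  # far from the anchor

# The distances already satisfy the margin, so there is nothing to fix: loss is 0.
print(triplet_loss(a, p, n))
# Swap positive and negative: now the loss is positive and the model gets pushed.
print(triplet_loss(a, n, p))
```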
First, here are the results for the out-of-the-box model.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader
from sentence_transformers import util
# 1. Load a pre-trained base model
model = SentenceTransformer('all-MiniLM-L6-v2')
# 2. Define your test cases
query = "Brand A Cola Soda"
choices = [
    "Brand B Cola Soda",            # The 'Positive' (should be closer)
    "Brand A Cola Soda Zero Sugar"  # The 'Negative' (should be further away)
]
# 3. Encode the text into vectors
query_vec = model.encode(query)
choice_vecs = model.encode(choices)
# 4. Compute Cosine Similarity
# util.cos_sim returns a matrix, so we convert to a list for readability
cos_scores = util.cos_sim(query_vec, choice_vecs)[0].tolist()
print(f"\n\n ======= Results for: {query} ===============")
for i, score in enumerate(cos_scores):
    print(f"-> {choices[i]}: {score:.5f}")
======= Results for: Brand A Cola Soda ===============
-> Brand B Cola Soda: 0.86003
-> Brand A Cola Soda Zero Sugar: 0.81907
Now let's fine-tune. We show the model that Cola Sodas should be closer to each other than to the Zero Sugar versions, and see what happens.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader
from sentence_transformers import util
# 1. Load a pre-trained base model
fine_tuned_model = SentenceTransformer('all-MiniLM-L6-v2')
# 2. Define your training data (Anchors, Positives, and Negatives)
train_examples = [
    InputExample(texts=["Brand A Cola Soda", "Cola Soda", "Brand C Cola Zero Sugar"]),
    InputExample(texts=["Brand A Cola Soda", "Cola Soda", "Brand A Cola Zero Sugar"]),
    InputExample(texts=["Brand A Cola Soda", "Cola Soda", "Brand B Cola Zero Sugar"])
]
# 3. Create a DataLoader and choose a Loss Function
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.TripletLoss(model=fine_tuned_model)
# 4. Tune the model
fine_tuned_model.fit(train_objectives=[(train_dataloader, train_loss)],
                     optimizer_params={'lr': 9e-5},
                     epochs=40)
# 5. Define your test cases
query = "Brand A Cola Soda"
choices = [
    "Brand B Cola Soda",       # The 'Positive' (should be closer now)
    "Brand A Cola Zero Sugar"  # The 'Negative' (should be further away now)
]
# 6. Encode the text into vectors
query_vec = fine_tuned_model.encode(query)
choice_vecs = fine_tuned_model.encode(choices)
# 7. Compute Cosine Similarity
# util.cos_sim returns a matrix, so we convert to a list for readability
cos_scores = util.cos_sim(query_vec, choice_vecs)[0].tolist()
print(f"\n\n ======== Results for: {query} ====================")
for i, score in enumerate(cos_scores):
    print(f"-> {choices[i]}: {score:.5f}")
======== Results for: Brand A Cola Soda ====================
-> Brand B Cola Soda: 0.86247
-> Brand A Cola Zero Sugar: 0.75732
Here we did not get the best result. The base model was trained on a very large amount of data, so fine-tuning with just three examples was not enough to make it behave as we expected.
Still, this is a good lesson: we managed to pull the two Cola Soda examples closer together, but doing so also pulled the Zero Sugar versions closer.
Alignment and Similarity
A good way to check how well the model has updated is to look at two metrics:
- Alignment: Say you have a bunch of related items, like 'Brand A Cola Soda' and 'Cola Soda'. Alignment measures how close these related items are to each other in the embedding space.
- A high alignment score means your model is good at placing similar items close together, which is what you typically want for tasks like similar-product search.
- Uniformity: Now think of all your different items, from 'refund policy' to 'quantum computing'. Uniformity measures how well spread out these items are across the embedding space. You want them evenly distributed rather than all gathered in one corner.
- Good uniformity means your model can distinguish different concepts effectively, avoiding mapping everything into a small, dense region.
A good embedding model balances both. It needs to bring similar things closer together (good alignment) while pushing dissimilar things further apart so that the whole space is used well (good uniformity). This lets the model capture meaningful relationships without sacrificing its ability to differentiate between concepts.
Ultimately, the ideal balance often depends on your specific application. For some tasks, such as semantic search, you may prioritize strong alignment, while for others, such as anomaly detection, strong uniformity may be more important.
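Both checks can be sketched with toy vectors: alignment as the average cosine similarity over related pairs, and uniformity approximated here as the average pairwise cosine similarity over a diverse set (lower is better). The vectors are invented for illustration; the literature defines uniformity more formally via a Gaussian kernel over pairwise distances.

```python
import numpy as np
from itertools import combinations

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings, invented for illustration.
related_pairs = [
    (np.array([1.0, 0.1]), np.array([0.9, 0.2])),  # e.g. "Brand A Cola Soda" / "Cola Soda"
]
diverse = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, 0.0])]

# Alignment: average similarity of related pairs (higher is better).
alignment = np.mean([cos(a, b) for a, b in related_pairs])
# Uniformity proxy: average similarity across all diverse pairs (lower is better).
uniformity_proxy = np.mean([cos(a, b) for a, b in combinations(diverse, 2)])

print(f"alignment (higher is better): {alignment:.3f}")
print(f"avg pairwise sim (lower is better): {uniformity_proxy:.3f}")
```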
Here is the code to calculate alignment, defined as the average cosine similarity between anchors and their positives.
from sentence_transformers import SentenceTransformer, util
import numpy as np
import torch
# --- Alignment Metric for Base Model ---
base_alignment_scores = []
# 'train_examples' was defined earlier and contains (anchor, positive, negative) triplets
for example in train_examples:
    # Encode the anchor and positive texts using the base model
    anchor_embedding_base = model.encode(example.texts[0], convert_to_tensor=True)
    positive_embedding_base = model.encode(example.texts[1], convert_to_tensor=True)
    # Calculate cosine similarity between anchor and positive
    score_base = util.cos_sim(anchor_embedding_base, positive_embedding_base).item()
    base_alignment_scores.append(score_base)
average_base_alignment = np.mean(base_alignment_scores)
And here is the code for the uniformity calculation: take a diverse set of embeddings, compute the cosine similarity between every pair of them, and average those pairwise scores.
# --- Uniformity Metric for Base Model ---
# A diverse set of texts (example values; use texts from your own domain)
uniformity_texts = ["refund policy", "pricing details", "account cancellation",
                    "quantum computing", "fluffy cats"]
uniformity_embeddings_base = model.encode(uniformity_texts, convert_to_tensor=True)
# Calculate all pairwise cosine similarities
pairwise_cos_sim_base = util.cos_sim(uniformity_embeddings_base, uniformity_embeddings_base)
# Extract unique pairwise similarities (excluding self-similarity and duplicates)
upper_triangle_indices_base = torch.triu_indices(pairwise_cos_sim_base.shape[0], pairwise_cos_sim_base.shape[1], offset=1)
uniformity_similarity_scores_base = pairwise_cos_sim_base[upper_triangle_indices_base[0], upper_triangle_indices_base[1]].cpu().numpy()
# Calculate the average of these pairwise similarities
average_uniformity_similarity_base = np.mean(uniformity_similarity_scores_base)
And the results. Given the very limited training data used for fine-tuning (only 3 examples), it is not surprising that the fine-tuned model does not show a clear improvement over the baseline model in these specific metrics.
The base model kept related items slightly closer together than the fine-tuned model (higher alignment), and it also kept diverse, unrelated items slightly more spread out (a lower average pairwise similarity, i.e., slightly better uniformity).
* Base Model:
Base Model Alignment Score (Avg Cosine Similarity of Positive Pairs): 0.8451
Base Model Uniformity Score (Avg Pairwise Cos Sim. of Diverse Embeddings): 0.0754
* Fine Tuned Model:
Alignment Score (Average Cosine Similarity of Positive Pairs): 0.8270
Uniformity Score (Average Pairwise Cosine Similarity of Diverse Embeddings): 0.0777
Before You Go
In this article, we learned about embedding models and how they work under the hood, in a practical way.
These models have gained a lot of importance with the AI boom, as they are the main engine behind RAG applications and fast semantic search.
Computers need a way to understand text, and embeddings are the key: they encode text into numerical vectors, making it easy for models to calculate distances and find the best match.
If you liked this content, you can find my contact information and more on my website.
GitHub Code
References
[1] Modern NLP: Tokenization, Embedding, and Text Classification
[2] A Visual Guide to Using BERT for the First Time
[3] Qdrant Docs



