A Coding Guide for Using a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System

nimda May 28, 2026

0 10 6 minutes read

A Coding Guide for Using a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System

In this lesson, we create a complete pgvector playground inside Google Colab and explore how PostgreSQL can serve as a powerful vector database for modern AI applications. We start by installing PostgreSQL, compiling the pgvector extension, linking with Psycopg, and registering vector types for smooth Python compilation. Then, we create embeddings with SentenceTransformers, store them in PostgreSQL, build HNSW indexes, and run semantic search, filtered search, distance metric comparison, half-precision storage, binary approximation, sparse vector search, hybrid retrieval, and vector clustering. With this workflow, we learn how pgvector supports the generation of executable retrieval, recommendation, similarity search, and hybrid search systems using only open source tools.

Copy the Code

import os
import subprocess
import sys
import time
def sh(cmd: str, check: bool = True):
   """Run a shell command, streaming a compact log."""
   print(f"  $ {cmd}")
   return subprocess.run(cmd, shell=True, check=check,
                         stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
print("[0/10] Installing PostgreSQL + building pgvector (≈1–2 min)...")
sh("apt-get -qq update")
sh("apt-get -qq install -y postgresql postgresql-contrib "
  "postgresql-server-dev-all build-essential git")
if not os.path.exists("/tmp/pgvector"):
   sh("git clone --depth 1  /tmp/pgvector")
sh("cd /tmp/pgvector && make && make install")
sh("service postgresql start")
time.sleep(3)
sh("""sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'postgres';" """)
print("[0/10] Installing Python packages...")
sh(f"{sys.executable} -m pip install -q pgvector psycopg[binary] "
  f"sentence-transformers numpy")

Set up a complete environment for PostgreSQL and pgvector. We install the necessary system packages, compile and build pgvector from source, start the PostgreSQL service, and configure the database password. We also include the Python dependencies needed to connect to PostgreSQL and work with vector embedding.

Copy the Code

import numpy as np
import psycopg
from pgvector import HalfVector, SparseVector
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer
print("n[1/10] Connecting and enabling the 'vector' extension...")
conn = psycopg.connect(
   "host=127.0.0.1 port=5432 dbname=postgres user=postgres password=postgres",
   autocommit=True,
)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)
ver = conn.execute("SELECT extversion FROM pg_extension WHERE extname="vector"").fetchone()[0]
print(f"      pgvector version: {ver}")
print("n[2/10] Loading embedding model + encoding corpus...")
model = SentenceTransformer("all-MiniLM-L6-v2")
DIM = model.get_sentence_embedding_dimension()
corpus = [
   ("Octopuses have three hearts and blue blood.",             "animals"),
   ("Transformers revolutionized natural language processing.","technology"),
   ("Quantum computers exploit superposition and entanglement.","technology"),
   ("GPUs accelerate deep learning by parallelizing matrix math.","technology"),
   ("Sourdough bread relies on wild yeast and lactobacilli.",  "food"),
   ("Dark chocolate contains flavonoid antioxidants.",         "food"),
   ("A black hole's gravity is so strong light cannot escape.","space")
]
contents   = [c for c, _ in corpus]
categories = [k for _, k in corpus]
embeddings = model.encode(contents, normalize_embeddings=True)
conn.execute("DROP TABLE IF EXISTS documents")
conn.execute(f"""
   CREATE TABLE documents (
       id        bigserial PRIMARY KEY,
       content   text,
       category  text,
       embedding vector({DIM})
   )
""")
with conn.cursor() as cur:
   cur.executemany(
       "INSERT INTO documents (content, category, embedding) VALUES (%s, %s, %s)",
       list(zip(contents, categories, [np.asarray(e) for e in embeddings])),
   )
print(f"      Inserted {len(corpus)} documents with {DIM}-d embeddings.")

We connect to PostgreSQL, enable the pgvector extension, and register vector support with Psycopg. We load the SentenceTransformers model, define a small text corpus, generate standard embeddings, and create a PostgreSQL table to store the documents. We then input each document with its category and vector representation to perform semantic search later.

Copy the Code

print("n[3/10] Building HNSW index and running semantic search...")
conn.execute(
   "CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops) "
   "WITH (m = 16, ef_construction = 64)"
)
conn.execute("SET hnsw.ef_search = 100")
def semantic_search(query: str, k: int = 4):
   q = np.asarray(model.encode(query, normalize_embeddings=True))
   return conn.execute(
       "SELECT content, category, embedding <=> %s AS distance "
       "FROM documents ORDER BY distance LIMIT %s",
       (q, k),
   ).fetchall()
for content, cat, dist in semantic_search("animals that are unusually quick"):
   print(f"      {dist:.3f}  [{cat:<10}] {content}")
print("n[4/10] Filtered search (only category = 'space')...")
q = np.asarray(model.encode("objects with extreme gravity", normalize_embeddings=True))
rows = conn.execute(
   "SELECT content, embedding <=> %s AS distance "
   "FROM documents WHERE category = %s ORDER BY distance LIMIT 3",
   (q, "space"),
).fetchall()
for content, dist in rows:
   print(f"      {dist:.3f}  {content}")
print("n[5/10] Same query under different distance metrics (top hit each)...")
q = np.asarray(model.encode("brewing a hot caffeinated drink", normalize_embeddings=True))
for op, label in [("<->", "L2"), ("<=>", "cosine"), ("<#>", "neg-inner"), ("<+>", "L1")]:
   content, score = conn.execute(
       f"SELECT content, embedding {op} %s AS s FROM documents ORDER BY s LIMIT 1", (q,)
   ).fetchone()
   print(f"      {label:<10} {score:+.3f}  {content}")

We create an HNSW index on the embedding column to allow fast, highly efficient vector searches. We describe a semantic search function that transforms a query into an embedding and retrieves the most similar documents using cosine similarity. We also perform metadata filtered searches and compare different pgvector range operators such as L2, cosine, negative inner product, and L1.

Copy the Code

print("n[6/10] Half-precision storage with halfvec...")
conn.execute(f"ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding_half halfvec({DIM})")
conn.execute("UPDATE documents SET embedding_half = embedding::halfvec")
conn.execute(
   "CREATE INDEX ON documents USING hnsw (embedding_half halfvec_cosine_ops)"
)
q_half = HalfVector(model.encode("the galaxy we live in", normalize_embeddings=True))
rows = conn.execute(
   "SELECT content, embedding_half <=> %s AS d FROM documents ORDER BY d LIMIT 2",
   (q_half,),
).fetchall()
for content, d in rows:
   print(f"      {d:.3f}  {content}")
print("n[7/10] Binary quantization (Hamming) + exact re-rank...")
conn.execute(
   f"CREATE INDEX ON documents "
   f"USING hnsw ((binary_quantize(embedding)::bit({DIM})) bit_hamming_ops)"
)
q = np.asarray(model.encode("parallel hardware for AI training", normalize_embeddings=True))
rerank_sql = f"""
   SELECT content, candidates.embedding <=> %(q)s AS exact_distance
   FROM (
       SELECT content, embedding
       FROM documents
       ORDER BY binary_quantize(embedding)::bit({DIM})
             <~> binary_quantize(%(q)s)::bit({DIM})
       LIMIT 8
   ) AS candidates
   ORDER BY exact_distance
   LIMIT 3
"""
for content, d in conn.execute(rerank_sql, {"q": q}).fetchall():
   print(f"      {d:.3f}  {content}")
print("n[8/10] Native sparse vectors...")
conn.execute("DROP TABLE IF EXISTS sparse_items")
conn.execute("CREATE TABLE sparse_items (id bigserial PRIMARY KEY, embedding sparsevec(10))")
sparse_data = [
   SparseVector({0: 1.0, 3: 2.0, 7: 1.5}, 10),
   SparseVector({1: 0.5, 3: 1.0, 9: 3.0}, 10),
   SparseVector({0: 0.2, 4: 2.5, 7: 0.8}, 10),
]
with conn.cursor() as cur:
   cur.executemany("INSERT INTO sparse_items (embedding) VALUES (%s)",
                   [(v,) for v in sparse_data])
query_sparse = SparseVector({0: 1.0, 7: 1.0}, 10)
rows = conn.execute(
   "SELECT id, embedding, embedding <#> %s AS neg_ip "
   "FROM sparse_items ORDER BY neg_ip LIMIT 3",
   (query_sparse,),
).fetchall()
for _id, vec, neg_ip in rows:
   print(f"      id={_id}  inner_product={-neg_ip:.2f}  nnz_indices={vec.indices()}")

We explore advanced pgvector storage and retrieval methods beyond standard dense vectors. We convert the embeddings to half-precision vectors to reduce storage, use binary approximation with Hamming search to find the fastest candidate to retrieve, and then rescale the results to full-precision vectors. We also create different vectors and query them using inner product similarity, which is useful for finding keyword weight or SPLADE style.

Copy the Code

print("n[9/10] Hybrid search (vector + full-text) via RRF...")
user_query = "fast animal"
qvec = np.asarray(model.encode(user_query, normalize_embeddings=True))
hybrid_sql = """
WITH semantic AS (
   SELECT id, RANK() OVER (ORDER BY embedding <=> %(qvec)s) AS rank
   FROM documents
   ORDER BY embedding <=> %(qvec)s
   LIMIT 20
),
keyword AS (
   SELECT d.id,
          RANK() OVER (ORDER BY ts_rank_cd(to_tsvector('english', d.content), q) DESC) AS rank
   FROM documents d, plainto_tsquery('english', %(qtext)s) AS q
   WHERE to_tsvector('english', d.content) @@ q
   LIMIT 20
)
SELECT d.content,
      COALESCE(1.0 / (60 + semantic.rank), 0.0)
    + COALESCE(1.0 / (60 + keyword.rank),  0.0) AS rrf_score
FROM documents d
LEFT JOIN semantic ON d.id = semantic.id
LEFT JOIN keyword  ON d.id = keyword.id
WHERE semantic.id IS NOT NULL OR keyword.id IS NOT NULL
ORDER BY rrf_score DESC
LIMIT 4
"""
for content, score in conn.execute(hybrid_sql, {"qvec": qvec, "qtext": user_query}).fetchall():
   print(f"      {score:.5f}  {content}")
print("n[10/10] Aggregating vectors with AVG (category centroid)...")
centroid = conn.execute(
   "SELECT AVG(embedding) FROM documents WHERE category = %s", ("food",)
).fetchone()[0]
typical = conn.execute(
   "SELECT content, embedding <=> %s AS d FROM documents "
   "WHERE category = %s ORDER BY d LIMIT 1",
   (np.asarray(centroid), "food"),
).fetchone()
print(f"      Centroid dim = {len(centroid)}")
print(f"      Most representative 'food' doc: {typical[0]}")
print("n Done. You now have a working pgvector playground inside Colab.")
print("   Try editing `corpus`, the queries, or swap in your own embedding model.")

We combine semantic vector search with PostgreSQL full-text search using Reciprocal Rank Fusion. We get results from both semantic and keyword levels, combine their scores, and produce a robust hybrid search result. Finally, we calculate the average embedding of a category and use it as the centroid to find the most representative document in that category.

In conclusion, we have a pgvector-based retrieval system that works entirely on Google Colab, without external services or API keys. We use PostgreSQL not just as a standard relational database, but as a dynamic vector search engine that supports dense vectors, semi-exact vectors, binary retrieval, sparse vectors, full-text search, and clustering. We also saw that metadata filtering, HNSW indexing, Reciprocal Rank Fusion, and centroid-based analysis make pgvector useful in real-world AI search pipelines.

Check it out Full Codes with Notebook here. Also, feel free to follow us Twitter and don't forget to join our 150k+ ML SubReddit and Subscribe to Our newspaper. Wait! are you on telegram? now you can join us on telegram too.

Need to work with us on developing your GitHub Repo OR Hug Face Page OR Product Release OR Webinar etc.?contact us

The post Coding Guide for Implementing a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System appeared first on MarkTechPost.

Source link

nimda May 28, 2026

0 10 6 minutes read