
Meet turbovec: A Rust Vector Index with Python Bindings, Built on Google's TurboQuant Algorithm

Vector search underpins most retrieval-augmented generation (RAG) pipelines, and at scale it is expensive. Storing 10 million document embeddings in float32 consumes 31 GB of RAM. For teams running local or on-premises deployments, that number creates real problems.

A new open-source library called turbovec addresses this directly. It is a vector index written in Rust with Python bindings, built on TurboQuant, a quantization algorithm from Google Research. With turbovec, the same 10-million-document corpus fits in 4 GB. On ARM hardware, its search speed exceeds FAISS IndexPQFastScan by 12–20%.
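The 31 GB and 4 GB figures can be sanity-checked with simple arithmetic. The article does not state which embedding dimension or bit width they assume; the sketch below assumes 768-dimensional embeddings, 4-bit codes, and one float32 norm per vector (an assumption that reproduces both numbers), and it ignores any index metadata. `raw_index_bytes` is a hypothetical helper.

```python
def raw_index_bytes(n_vectors: int, dim: int, bits_per_coord: int, norm_bytes: int = 0) -> int:
    """Raw storage for n vectors in bytes, ignoring any index metadata."""
    per_vector_bits = dim * bits_per_coord + 8 * norm_bytes
    return n_vectors * per_vector_bits // 8


N_DOCS = 10_000_000
DIM = 768  # assumption: the article does not state the embedding dimension

fp32 = raw_index_bytes(N_DOCS, DIM, 32)
q4 = raw_index_bytes(N_DOCS, DIM, 4, norm_bytes=4)  # assumed: 4-bit codes + one float32 norm

print(f"float32: {fp32 / 1e9:.2f} GB")  # 30.72 GB
print(f"4-bit:   {q4 / 1e9:.2f} GB")    # 3.88 GB
```

The same helper gives the per-vector figures quoted later for d=1536: 6,144 bytes in float32 and 384 bytes of 2-bit codes.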

TurboQuant paper

TurboQuant was introduced by Google Research as a data-agnostic quantizer. It achieves near-optimal distortion rates across all bit-widths and dimensions, and it requires zero training and no passes over the data.

Most production-grade vector quantizers, including FAISS Product Quantization (PQ), require a codebook training step. You must run k-means over a representative sample of your vectors before indexing can begin. If your corpus grows or changes, you may need to retrain and rebuild the index entirely. TurboQuant skips all of that. Instead of data-dependent approximation, it relies on the analytically known statistics of randomly rotated vectors.

How turbovec Quantizes Vectors

The quantization pipeline has four steps:

(1) Each vector is normalized. Its length (norm) is stripped off and stored as a single float, so every vector becomes a unit direction on a high-dimensional hypersphere.

(2) A random rotation is applied. All vectors are multiplied by the same random orthogonal matrix. After rotation, each coordinate follows a Beta distribution, and distinct coordinates are nearly independent. At high dimensions, this converges to the Gaussian N(0, 1/d). This holds for any input data: the rotation makes the distribution of coordinates predictable.

(3) Lloyd-Max scalar quantization is applied. Because the coordinate distribution is known analytically, the optimal bucket boundaries and centroids can be computed in advance from the distribution alone. For 2-bit quantization, that means 4 buckets per coordinate; for 4-bit, 16 buckets. No passes over the data are required.

(4) The quantized coordinates are bit-packed into bytes. A 1536-dimensional vector shrinks from 6,144 bytes in FP32 to 384 bytes at 2 bits, a 16x compression ratio.
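The four steps above can be sketched in pure Python. This is a toy illustration under stated assumptions, not turbovec's code: it builds the random orthogonal matrix by Gram-Schmidt on Gaussian vectors, and it uses the high-dimensional Gaussian approximation N(0, 1/d) for the Lloyd-Max codebook instead of the exact Beta distribution. All function names (`random_orthogonal`, `gaussian_lloyd_max`, `pack`, `encode`) are hypothetical.

```python
import bisect
import math
import random


def random_orthogonal(d: int, rng: random.Random) -> list[list[float]]:
    """Rows of a random orthogonal matrix via Gram-Schmidt on Gaussian vectors."""
    rows: list[list[float]] = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:  # remove components along already-accepted rows
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        n = math.sqrt(sum(a * a for a in v))
        if n > 1e-9:
            rows.append([a / n for a in v])
    return rows


def gaussian_lloyd_max(levels: int, iters: int = 500) -> list[float]:
    """Lloyd-Max centroids for a standard normal, computed from the pdf alone (no data)."""
    pdf = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    c = [-2.0 + 4.0 * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iters):
        # Boundaries are midpoints; each centroid is the conditional mean of its bucket.
        edges = [-math.inf] + [(a + b) / 2 for a, b in zip(c, c[1:])] + [math.inf]
        c = [(pdf(lo) - pdf(hi)) / (cdf(hi) - cdf(lo)) for lo, hi in zip(edges, edges[1:])]
    return c


def pack(codes: list[int], bits: int) -> bytes:
    """Pack small integer codes into bytes (assumes bits divides 8, e.g. 2 or 4)."""
    out = bytearray((len(codes) * bits + 7) // 8)
    for i, code in enumerate(codes):
        out[i * bits // 8] |= code << (i * bits % 8)
    return bytes(out)


def encode(vec: list[float], rotation: list[list[float]], centroids: list[float], bits: int):
    d = len(vec)
    norm = math.sqrt(sum(x * x for x in vec))  # (1) keep the length as one float
    unit = [x / norm for x in vec]
    rotated = [sum(a * b for a, b in zip(row, unit)) for row in rotation]  # (2) rotate
    scale = 1.0 / math.sqrt(d)  # assumption: coordinates ~ N(0, 1/d)
    edges = [(a + b) / 2 * scale for a, b in zip(centroids, centroids[1:])]
    codes = [bisect.bisect(edges, x) for x in rotated]  # (3) nearest Lloyd-Max bucket
    return norm, pack(codes, bits)  # (4) bit-pack


rng = random.Random(0)
d, bits = 64, 2
rotation = random_orthogonal(d, rng)  # shared by every vector and every query
centroids = gaussian_lloyd_max(2 ** bits)  # 4 levels, computed with no training data
vec = [rng.uniform(-1.0, 1.0) for _ in range(d)]
norm, packed = encode(vec, rotation, centroids, bits)
print(len(packed))  # 16 bytes, versus 256 bytes for 64 float32 values
```

The codebook depends only on the bit width (plus the 1/sqrt(d) scale), so it is computed once with no training data; for 4 levels it converges to roughly ±0.453 and ±1.510 standard deviations. A dense pure-Python rotation is only practical for small d such as the toy d=64 used here.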

During search, the query is rotated once into the same domain, and scoring happens directly against the codebook values. The scoring kernel uses SIMD intrinsics (NEON on ARM, AVX-512BW on modern x86, with an AVX2 fallback) together with nibble-split lookup tables for throughput.
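The idea of scoring against codebook values with per-query lookup tables can be illustrated in a few lines. This is a minimal, hypothetical sketch of asymmetric scoring: the query stays in float, each stored vector is only a list of bucket codes plus its norm, and the table is built once per query. It does not reproduce turbovec's SIMD kernel or its nibble-split tables, and `build_lut`, `lut_score`, and `search` are illustrative names.

```python
def build_lut(rotated_query: list[float], codebook: list[float]) -> list[list[float]]:
    """lut[j][c] = q[j] * codebook[c]; built once per query, reused for every vector."""
    return [[q * c for c in codebook] for q in rotated_query]


def lut_score(lut: list[list[float]], codes: list[int], norm: float) -> float:
    """Approximate <query, x>: one table lookup per coordinate, rescaled by x's norm."""
    return norm * sum(row[c] for row, c in zip(lut, codes))


def search(rotated_query, codebook, database, k):
    """database: list of (codes, norm) pairs; returns the top-k (score, position) pairs."""
    lut = build_lut(rotated_query, codebook)
    scored = [(lut_score(lut, codes, norm), pos) for pos, (codes, norm) in enumerate(database)]
    return sorted(scored, reverse=True)[:k]


# Toy 2-bit example with a hypothetical 4-entry codebook
codebook = [-1.5, -0.5, 0.5, 1.5]
query = [0.5, -0.5, 1.0, 0.0]  # assumed to be already rotated like the stored vectors
database = [([3, 0, 2, 1], 1.0), ([0, 3, 1, 2], 2.0)]
print(search(query, codebook, database, k=2))  # [(2.0, 0), (-4.0, 1)]
```

The table has d × 2^bits entries, so scoring each stored vector reduces to lookups and additions, which is the kind of inner loop that SIMD kernels accelerate.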

TurboQuant achieves distortion within a factor of about 2.7 of Shannon's information-theoretic lower bound.

Recall and Speed: The Numbers

All benchmarks use 100K vectors, 1,000 queries, k=64, and report medians of 5 runs.
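To reproduce this style of measurement on your own hardware, a minimal timing harness that reports the median of 5 runs might look like this. `median_runtime` is a hypothetical helper, and the workload you pass in (for example, a batch of 1,000 index searches) is up to you; the injectable `timer` parameter exists only to make the function easy to test.

```python
import statistics
import time
from typing import Callable


def median_runtime(
    fn: Callable[[], object],
    runs: int = 5,
    timer: Callable[[], float] = time.perf_counter,
) -> float:
    """Run fn `runs` times and return the median wall-clock duration in seconds."""
    samples = []
    for _ in range(runs):
        start = timer()
        fn()
        samples.append(timer() - start)
    return statistics.median(samples)


# Stand-in workload; replace with your own search call
elapsed = median_runtime(lambda: sorted(range(100_000), reverse=True))
print(f"median of 5 runs: {elapsed * 1e3:.2f} ms")
```

Reporting the median rather than the mean keeps a single slow run (for example, a cold cache) from skewing the result.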

For recall, turbovec is compared against FAISS IndexPQ (LUT256, nbits=8, float32 LUT). This is a strong baseline: FAISS uses a high-precision LUT during scoring and k-means++ to train its codebook. Despite this, TurboQuant and FAISS are within 0–1 points of each other at R@1 on OpenAI embeddings at d=1536 and d=3072, and both converge to 1.0 recall by k=4–8. GloVe at d=200 is harder. There, TurboQuant trails FAISS by 3–6 points at R@1, closing the gap by k≈16–32.
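The R@1 figures and the phrase "converge to 1.0 recall by k=4–8" suggest the common FAISS-style convention: the fraction of queries whose exact nearest neighbor appears among the top k results. Assuming that definition (the article does not spell it out), the metric is a few lines; `recall_at_k` is a hypothetical helper and the data is a toy example.

```python
def recall_at_k(true_nn: list[int], retrieved: list[list[int]], k: int) -> float:
    """Fraction of queries whose exact nearest neighbour is in the top-k results."""
    if len(true_nn) != len(retrieved):
        raise ValueError("need one result list per query")
    hits = sum(1 for nn, ids in zip(true_nn, retrieved) if nn in ids[:k])
    return hits / len(true_nn)


# Toy data: exact nearest neighbours vs. IDs returned by an approximate index
true_nn = [7, 3, 5, 1]
retrieved = [[7, 2, 9], [4, 3, 8], [0, 6, 2], [1, 5, 7]]
for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(true_nn, retrieved, k):.2f}")
```

Under this definition, recall can only rise as k grows, which is why a quantizer that trails at R@1 can catch up at larger k.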

For speed, ARM results (Apple M3 Max) show turbovec beating FAISS IndexPQFastScan by 12–20% in every configuration. On x86 (Intel Xeon Platinum 8481C / Sapphire Rapids, 8 vCPUs), turbovec wins every 4-bit configuration by 1–6% and runs within ~1% of FAISS for 2-bit single-threaded search. Two configurations fall slightly behind FAISS: 2-bit multi-threaded search at d=1536 and d=3072. There, the inner accumulate loop is too short to amortize its overhead, and FAISS's AVX-512 VBMI path keeps a 2–4% edge.

Python API

Installation is a single command: pip install turbovec. The first index type is TurboQuantIndex, initialized with the vector dimension and the bit width.

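import numpy as np
from turbovec import TurboQuantIndex

vectors = np.random.rand(10_000, 1536).astype(np.float32)  # shape [n, dim]
query = np.random.rand(1, 1536).astype(np.float32)  # assumed shape [n_queries, dim]

index = TurboQuantIndex(dim=1536, bit_width=4)
index.add(vectors)
scores, indices = index.search(query, k=10)
index.write("my_index.tq")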

The second index type, IdMapIndex, supports stable external uint64 IDs that survive deletion. Removal is O(1) per ID, which is useful for document stores where vectors are frequently updated or deleted.
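The article does not describe how IdMapIndex achieves O(1) removal. One standard way to combine stable external IDs with constant-time deletes over dense storage is a two-way ID map plus swap-remove. The class below is a hypothetical sketch of that general idea, not turbovec's implementation.

```python
class IdMapSketch:
    """Hypothetical sketch: stable external IDs over densely stored vector codes."""

    def __init__(self) -> None:
        self.codes: list[object] = []       # dense storage, one entry per live vector
        self.slot_to_id: list[int] = []     # external ID stored in each dense slot
        self.id_to_slot: dict[int, int] = {}  # external ID -> dense slot

    def add(self, ext_id: int, code: object) -> None:
        if ext_id in self.id_to_slot:
            raise KeyError(f"duplicate id {ext_id}")
        self.id_to_slot[ext_id] = len(self.codes)
        self.codes.append(code)
        self.slot_to_id.append(ext_id)

    def remove(self, ext_id: int) -> None:
        """O(1): move the last vector into the freed slot so storage stays dense."""
        slot = self.id_to_slot.pop(ext_id)
        last = len(self.codes) - 1
        if slot != last:
            self.codes[slot] = self.codes[last]
            moved = self.slot_to_id[last]
            self.slot_to_id[slot] = moved
            self.id_to_slot[moved] = slot
        self.codes.pop()
        self.slot_to_id.pop()

    def to_external(self, slots: list[int]) -> list[int]:
        """Translate positional search results back to external IDs."""
        return [self.slot_to_id[s] for s in slots]


index = IdMapSketch()
for ext_id, code in [(1001, "codes-a"), (1002, "codes-b"), (1003, "codes-c")]:
    index.add(ext_id, code)
index.remove(1002)  # 1003's codes move into the freed slot
print(index.to_external([0, 1]))  # [1001, 1003]
```

Swap-remove changes the internal position of one other vector, which is exactly why callers should hold external IDs rather than positional indices.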

turbovec integrates with LangChain (pip install turbovec[langchain]), LlamaIndex (pip install turbovec[llama-index]), and Haystack (pip install turbovec[haystack]). The Rust crate is available via cargo add turbovec.

Marktechpost Visual Explainer

What is turbovec?

turbovec is a vector index written in Rust with Python bindings. It's built on Google Research's TurboQuant algorithm, a data-agnostic quantizer that requires no codebook training. A 10-million-document corpus that takes 31 GB as float32 fits in 4 GB with turbovec.

16x compression at 2 bits

💨 Beats FAISS on ARM by 12–20%

🔒 Fully local, no data egress

📦 MIT license

Installation

Install the Python package from PyPI with one command. For Rust, add the crate with Cargo.

# Python
pip install turbovec

# Rust
cargo add turbovec

Note: To build from source, install maturin, then run maturin build --release inside the turbovec-python/ directory. For Rust, run cargo build --release.

Basic Usage – TurboQuantIndex

TurboQuantIndex is the first index type. Initialize it with the vector dim and a bit_width of 2 or 4. Vectors are indexed immediately by add(); no training step is required.

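import numpy as np
from turbovec import TurboQuantIndex

dim = 1536
rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, dim), dtype=np.float32)
more_vectors = rng.standard_normal((5_000, dim), dtype=np.float32)
query = rng.standard_normal((1, dim), dtype=np.float32)  # assumed shape [n_queries, dim]

index = TurboQuantIndex(dim=dim, bit_width=4)

# Add vectors (numpy float32 array, shape [n, dim])
index.add(vectors)
index.add(more_vectors)  # incremental adds are fine

# Search: returns top-k scores and positional indices
scores, indices = index.search(query, k=10)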

Stable IDs – IdMapIndex

Use IdMapIndex when you need external uint64 IDs that survive deletion. Removal is O(1) per ID, which is useful for document stores where vectors change over time.

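import numpy as np
from turbovec import IdMapIndex

index = IdMapIndex(dim=1536, bit_width=4)

# Three example vectors (float32, shape [3, dim]) and one query
vectors = np.random.rand(3, 1536).astype(np.float32)
query = np.random.rand(1, 1536).astype(np.float32)  # assumed shape [n_queries, dim]

# Map vectors to your own uint64 external IDs
index.add_with_ids(vectors, np.array([1001, 1002, 1003], dtype=np.uint64))

# Search returns your external IDs, not positional indices
scores, ids = index.search(query, k=3)

# O(1) delete by external ID
index.remove(1002)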

Save & Load Index

Both index types support persistent storage: TurboQuantIndex writes to .tq files, and IdMapIndex writes to .tvim files.

from turbovec import TurboQuantIndex, IdMapIndex

# tq_index: a TurboQuantIndex; idmap_index: an IdMapIndex (both built as shown above)

# TurboQuantIndex  ->  .tq
tq_index.write("my_index.tq")
loaded_tq = TurboQuantIndex.load("my_index.tq")

# IdMapIndex  ->  .tvim
idmap_index.write("my_index.tvim")
loaded_idmap = IdMapIndex.load("my_index.tvim")

Framework integration

turbovec ships with optional plugins for LangChain, LlamaIndex, and Haystack. Install the extra that matches your stack.

# LangChain
pip install "turbovec[langchain]"

# LlamaIndex
pip install "turbovec[llama-index]"

# Haystack
pip install "turbovec[haystack]"

Tip: Each integration plugs turbovec in as the framework's vector store for RAG. See the integration docs in the repo for full usage examples for each framework.

Using turbovec in Rust

The Rust API mirrors the Python API: both TurboQuantIndex and IdMapIndex are available. All x86_64 builds target AVX2 as the baseline; AVX-512 is enabled at runtime via feature detection.

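use turbovec::TurboQuantIndex;

// `vectors` and `queries` hold float32 embeddings prepared elsewhere;
// see docs/api.md for the exact input types.
let mut index = TurboQuantIndex::new(1536, 4); // dim = 1536, bit_width = 4
index.add(&vectors);

let results = index.search(&queries, 10); // top-10 results per query

index.write("index.tv").unwrap();
let loaded = TurboQuantIndex::load("index.tv").unwrap();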

📚 Full API: docs/api.md

⭐ github.com/RyanCodrai/turbovec

Key Takeaways

  • No codebook training. turbovec indexes vectors immediately: no k-means, and no rebuilding as the corpus grows.
  • 16x compression. A 1536-dim float32 vector shrinks from 6,144 bytes to 384 bytes with 2-bit quantization.
  • Faster than FAISS on ARM. turbovec outperforms FAISS IndexPQFastScan by 12–20% on ARM for all configurations.
  • Near-optimal distortion. TurboQuant's distortion is within ~2.7x of Shannon's lower bound, close to the theoretical limit.
  • Fully local. No managed service and no data leaving your machine; pair it with any open-source embedding model for an air-gapped RAG stack.
