Meet Turbovec: A Rust Vector Index with Python Bindings, Built on Google's TurboQuant Algorithm

Vector search underpins most retrieval-augmented generation (RAG) pipelines. At scale, it gets expensive. Storing 10 million document embeddings in float32 consumes 31 GB of RAM. For dev teams running local or on-premises deployments, that number creates real constraints.
A new open-source library called turbovec addresses this directly. It is a vector index written in Rust with Python bindings, built on the TurboQuant quantization algorithm from Google Research. The same 10-million-document corpus fits in 4 GB with turbovec. On ARM hardware, search speed exceeds FAISS IndexPQFastScan by 12–20%.
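The memory figures above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the article does not state the embedding dimension or bit width behind the 31 GB and 4 GB figures, so `DIM = 768` and 4 bits per coordinate are assumptions chosen because they are consistent with those numbers, and `index_size_gb` is a hypothetical helper, not part of turbovec.

```python
def index_size_gb(n_vectors: int, dim: int, bits_per_coord: int, norm_bytes: int = 0) -> float:
    """Size in decimal GB of n_vectors stored at bits_per_coord bits per coordinate."""
    bytes_per_vector = dim * bits_per_coord / 8 + norm_bytes
    return n_vectors * bytes_per_vector / 1e9


N = 10_000_000
DIM = 768  # assumption: the article does not state the embedding dimension

fp32_gb = index_size_gb(N, DIM, bits_per_coord=32)
# 4-bit codes plus one float32 norm per vector (assumed layout)
q4_gb = index_size_gb(N, DIM, bits_per_coord=4, norm_bytes=4)

print(f"float32: {fp32_gb:.2f} GB, 4-bit quantized: {q4_gb:.2f} GB")
```

Under these assumptions the raw float32 vectors come to roughly 31 GB and the quantized codes to just under 4 GB, before any index overhead.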
The TurboQuant Paper
TurboQuant comes from Google Research, which proposes it as a data-oblivious quantizer. It achieves near-optimal distortion rates across all bit-widths and dimensions. It requires no training and no passes over the data.
Most production-grade vector quantizers, including FAISS Product Quantization, require a codebook training step. You must run k-means over a representative sample of your vectors before indexing can begin. If your corpus grows or changes, you may need to retrain and rebuild the index entirely. TurboQuant skips all of that. It relies on an analytical property of rotated vectors instead of data-dependent calibration.
How turbovec Quantizes Vectors
The quantization pipeline has four steps:
(1) Each vector is normalized. The length (norm) is stripped and stored as a single float. Every vector becomes a unit direction on a high-dimensional hypersphere.
(2) A random rotation is applied. All vectors are multiplied by the same random orthogonal matrix. After rotation, each coordinate follows a Beta distribution, and distinct coordinates are nearly independent in high dimensions. As the dimension grows, this distribution converges to the Gaussian N(0, 1/d). This holds for any input data: the rotation makes the coordinate distribution predictable.
(3) Lloyd-Max scalar quantization is applied. Because the distribution is known analytically, the optimal bucket boundaries and centroids can be precomputed from the math alone. For 2-bit quantization, that means 4 buckets per coordinate. For 4-bit, that means 16 buckets. No data passes are required.
(4) The quantized coordinates are bit-packed into bytes. A 1536-dimensional vector shrinks from 6,144 bytes in FP32 to 384 bytes in 2-bit. That's a 16x compression ratio.
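The four steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not turbovec's Rust implementation: every function name is hypothetical, the rotation is drawn via QR decomposition of a Gaussian matrix, the Lloyd-Max levels are computed numerically for the limiting Gaussian N(0, 1/d) rather than the exact Beta distribution, and the 2-bit packing order is an arbitrary choice.

```python
import numpy as np


def random_rotation(dim: int, seed: int = 0) -> np.ndarray:
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # sign fix so the rotation is uniformly distributed


def lloyd_max_gaussian(bits: int, sigma: float, iters: int = 200):
    """Lloyd-Max levels for N(0, sigma^2), computed on a dense grid with no data."""
    grid = np.linspace(-6 * sigma, 6 * sigma, 20001)
    pdf = np.exp(-0.5 * (grid / sigma) ** 2)
    levels = 2 ** bits
    centroids = np.linspace(-2 * sigma, 2 * sigma, levels)
    for _ in range(iters):
        bounds = (centroids[:-1] + centroids[1:]) / 2   # midpoints between centroids
        cell = np.searchsorted(bounds, grid)            # bucket index of each grid point
        centroids = np.array(
            [np.average(grid[cell == c], weights=pdf[cell == c]) for c in range(levels)]
        )
    return centroids, (centroids[:-1] + centroids[1:]) / 2


def quantize(vectors: np.ndarray, rotation: np.ndarray, bounds: np.ndarray):
    """Steps 1-3: normalize, rotate, and map each coordinate to a bucket index."""
    norms = np.linalg.norm(vectors, axis=1)
    rotated = (vectors / norms[:, None]) @ rotation.T
    codes = np.searchsorted(bounds, rotated).astype(np.uint8)
    return norms.astype(np.float32), codes


def pack_2bit(codes: np.ndarray) -> np.ndarray:
    """Step 4: pack four 2-bit codes into each byte (dim must be divisible by 4)."""
    c = codes.reshape(codes.shape[0], -1, 4)
    return (c[..., 0] | (c[..., 1] << 2) | (c[..., 2] << 4) | (c[..., 3] << 6)).astype(np.uint8)


dim, bits = 1536, 2
rng = np.random.default_rng(1)
vectors = rng.standard_normal((100, dim)) * 5 + 3  # arbitrary, non-centered toy data

rotation = random_rotation(dim)
centroids, bounds = lloyd_max_gaussian(bits, sigma=1 / np.sqrt(dim))
norms, codes = quantize(vectors, rotation, bounds)
packed = pack_2bit(codes)

print(packed.shape[1], "bytes per vector vs", dim * 4, "in float32")
# prints: 384 bytes per vector vs 6144 in float32
```

Note that `lloyd_max_gaussian` never sees `vectors`: the levels depend only on the dimension and bit width, which is what removes the training step.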
At search time, the query is rotated once into the same domain. Scoring happens directly against the codebook values. The scoring kernel uses SIMD intrinsics (NEON on ARM and AVX-512BW on modern x86, with an AVX2 fallback) together with nibble-split lookup tables for throughput.
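The idea of scoring against codebook values through a lookup table can be illustrated in plain NumPy. This sketch is an assumption-laden toy, not turbovec's kernel: it uses no SIMD and no nibble splitting, the function names are hypothetical, the centroid values are placeholders, and the query is assumed to be already rotated.

```python
import numpy as np


def build_lut(rotated_query: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """lut[j, c] is the contribution of coordinate j when its stored code is c."""
    return rotated_query[:, None] * centroids[None, :]


def score(codes: np.ndarray, norms: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Approximate inner products: sum one table entry per coordinate, rescale by the norm."""
    dim = codes.shape[1]
    per_coord = lut[np.arange(dim), codes]  # shape (n_vectors, dim)
    return norms * per_coord.sum(axis=1)


rng = np.random.default_rng(0)
n, dim = 5, 8
centroids = np.array([-1.5, -0.5, 0.5, 1.5])  # placeholder 2-bit levels, not Lloyd-Max output
codes = rng.integers(0, 4, size=(n, dim))     # stored bucket indices
norms = rng.uniform(0.5, 2.0, size=n)         # stored vector lengths
query = rng.standard_normal(dim)              # assumed already rotated

scores = score(codes, norms, build_lut(query, centroids))
top = np.argsort(-scores)[:3]                 # indices of the 3 best-scoring vectors
print(top, scores[top])
```

The table is built once per query, so each stored vector costs only table lookups and additions; the database vectors are never decompressed to floats.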
TurboQuant achieves distortion within about 2.7x of the Shannon information-theoretic lower bound.
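One way to see where a constant near 2.7 can arise is the textbook comparison between an optimal scalar quantizer and the rate-distortion bound for a Gaussian source. The block below is a sketch under assumptions, not a restatement of the paper's proof: it treats each rotated coordinate as Gaussian with variance \sigma^2, uses b bits per coordinate, and relies on the high-resolution (Panter-Dite) approximation.

```latex
\begin{align*}
D_{\text{Shannon}}(b) &= \sigma^2 \, 2^{-2b}
  && \text{(rate-distortion lower bound, Gaussian source)} \\
D_{\text{scalar}}(b) &\approx \frac{\sqrt{3}\,\pi}{2} \, \sigma^2 \, 2^{-2b}
  && \text{(Panter-Dite approximation, large } b\text{)} \\
\frac{D_{\text{scalar}}(b)}{D_{\text{Shannon}}(b)} &\approx \frac{\sqrt{3}\,\pi}{2} \approx 2.72
\end{align*}
```

At low bit widths the gap is smaller than this asymptotic value; for example, the optimal 1-bit quantizer of a Gaussian has distortion (1 - 2/\pi)\sigma^2 \approx 0.363\,\sigma^2, about 1.45 times the bound of 0.25\,\sigma^2.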
Recall and Speed: The Numbers
All benchmarks use 100K vectors, 1,000 queries, and k=64, and report the median of 5 runs.
For recall, turbovec is compared against FAISS IndexPQ (LUT256, nbits=8, float32 LUT). This is a strong baseline: FAISS uses a higher-precision LUT at scoring time and k-means++ to train the codebook. Even so, TurboQuant and FAISS are within 0–1 point of each other on R@1 for OpenAI embeddings at d=1536 and d=3072. Both converge to 1.0 recall by k=4–8. GloVe at d=200 is harder. In that regime, TurboQuant trails FAISS by 3–6 points at R@1, closing the gap by k≈16–32.
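The recall figures above read most naturally as "1-recall@k": the fraction of queries whose exact nearest neighbor appears among the top k returned results. The article does not spell out the metric, so that interpretation is an assumption, and `recall_1_at_k` below is a hypothetical helper with toy data rather than benchmark code.

```python
import numpy as np


def recall_1_at_k(true_nn: np.ndarray, retrieved: np.ndarray, k: int) -> float:
    """Fraction of queries whose exact nearest neighbor is in the top-k retrieved IDs."""
    hits = (retrieved[:, :k] == true_nn[:, None]).any(axis=1)
    return float(hits.mean())


# Toy data: 4 queries, exact nearest-neighbor ID per query, and ranked approximate results.
true_nn = np.array([3, 7, 1, 4])
retrieved = np.array([
    [3, 9, 2, 5],   # hit at rank 1
    [8, 7, 0, 6],   # hit at rank 2
    [5, 6, 2, 1],   # hit at rank 4
    [0, 2, 9, 8],   # miss
])

for k in (1, 2, 4):
    print(k, recall_1_at_k(true_nn, retrieved, k))
# prints: 1 0.25, then 2 0.5, then 4 0.75
```

Recall can only increase with k, which is why a quantizer that trails at R@1 can close the gap at larger k.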
For speed, the ARM results (Apple M3 Max) show turbovec beating FAISS IndexPQFastScan by 12–20% across all configurations. On x86 (Intel Xeon Platinum 8481C / Sapphire Rapids, 8 vCPUs), turbovec wins every 4-bit configuration by 1–6%. It runs within ~1% of FAISS for 2-bit single-threaded search. Two configurations sit slightly behind FAISS: 2-bit multi-threaded at d=1536 and d=3072. There, the inner accumulate loop is too short for its fixed costs to amortize, and FAISS's AVX-512 VBMI path holds the edge by 2–4%.
Python API
Installation is a single command: pip install turbovec. The primary class is TurboQuantIndex, initialized with the vector dimension and the bit width.
from turbovec import TurboQuantIndex
index = TurboQuantIndex(dim=1536, bit_width=4)
index.add(vectors)
scores, indices = index.search(query, k=10)
index.write("my_index.tq")
A second class, IdMapIndex, supports stable external uint64 IDs that survive deletions. Removal is O(1) per ID. This is useful for document stores where vectors are frequently updated or deleted.
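To illustrate how stable external IDs and O(1) removal can coexist, here is one common technique: a hash map from ID to slot, with deletions handled by swapping the last slot into the hole. The article does not describe how IdMapIndex works internally, so this is a generic sketch and not turbovec's implementation; `IdMapSketch` and its methods are hypothetical names.

```python
class IdMapSketch:
    """Toy ID-mapped store: external IDs map to dense slots, removal swaps with the last slot."""

    def __init__(self):
        self.slot_of = {}  # external id -> slot
        self.id_at = []    # slot -> external id
        self.rows = []     # slot -> stored payload (e.g. a packed code)

    def add(self, ext_id: int, row) -> None:
        if ext_id in self.slot_of:
            raise KeyError(f"duplicate id {ext_id}")
        self.slot_of[ext_id] = len(self.rows)
        self.id_at.append(ext_id)
        self.rows.append(row)

    def remove(self, ext_id: int) -> None:
        slot = self.slot_of.pop(ext_id)
        last = len(self.rows) - 1
        if slot != last:
            # Move the last row into the freed slot so storage stays dense.
            moved_id = self.id_at[last]
            self.rows[slot] = self.rows[last]
            self.id_at[slot] = moved_id
            self.slot_of[moved_id] = slot
        self.rows.pop()
        self.id_at.pop()

    def get(self, ext_id: int):
        return self.rows[self.slot_of[ext_id]]


store = IdMapSketch()
store.add(10, "a")
store.add(20, "b")
store.add(30, "c")
store.remove(10)        # slot 0 is refilled with the row for id 30
print(store.get(30), store.get(20), len(store.rows))
# prints: c b 2
```

Internal slot positions change on removal, but callers only ever see the external IDs, which is what keeps them stable across deletions.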
turbovec integrates with LangChain (pip install turbovec[langchain]), LlamaIndex (pip install turbovec[llama-index]), and Haystack (pip install turbovec[haystack]). The Rust crate is available via cargo add turbovec.
Key Takeaways
- No codebook training. turbovec indexes vectors immediately: no k-means, and no rebuilds as the corpus grows.
- 16x compression. A 1536-dim float32 vector shrinks from 6,144 bytes to 384 bytes with 2-bit quantization.
- Faster than FAISS on ARM. turbovec outperforms FAISS IndexPQFastScan by 12–20% on ARM for all configurations.
- Near-optimal distortion. TurboQuant achieves distortion within ~2.7x of Shannon's lower bound, close to the theoretical limit.
- Fully local. No managed service and no data egress; it pairs with any open-source embedding model for an air-gapped RAG stack.