How a 2021 Quantization Algorithm Quietly Beats Its 2026 Successor

TurboQuant [3], an online vector quantization method, attracted attention at ICLR 2026. To me, it looked very familiar: it closely resembles EDEN, a quantization method first introduced as the 1-bit method DRIVE at NeurIPS 2021 [1] and generalized to arbitrary bit-widths at ICML 2022 [2]. Both papers were written by myself together with Ran Ben-Basat, Yaniv Ben-Itzhak, Gal Mendelson, Michael Mitzenmacher, and Shay Vargaftik.
The TurboQuant paper offers two variants: TurboQuant-mse and TurboQuant-prod. In a new detailed comparison [5], we show that TurboQuant-mse is a degenerate form of EDEN, and that the EDEN variants consistently outperform their TurboQuant counterparts.
How does EDEN quantize a vector?
Suppose you need to compress a d-dimensional vector (a gradient update, an embedding, a KV-cache entry) down to a few bits per coordinate. EDEN proceeds in four steps:
- Random rotation – Multiply the vector by a random orthogonal matrix. After the rotation, the coordinates are identically distributed and, as the dimension grows, approximately Gaussian (illustrated in the snippet after this list).
- Scalar quantization – Map each rotated coordinate to the nearest centroid of a Lloyd–Max codebook trained on the known post-rotation coordinate distribution (2^b centroids, where b is the target number of bits per coordinate).
- Scaling – Multiply the quantized vector by a scale factor s.
- Inverse rotation – Apply the inverse rotation to recover an estimate of the original vector.
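As a quick illustration of step 1, the snippet below (my own sketch, not from the papers) rotates a worst-case, highly non-Gaussian vector and checks that the rotated coordinates behave like Gaussians with variance ‖x‖²/d:

```python
# Illustration: a random rotation turns even a 1-sparse vector (extreme
# dynamic range) into one whose coordinates look i.i.d. Gaussian.
import numpy as np

rng = np.random.default_rng(0)
d = 1024
x = np.zeros(d)
x[0] = 5.0  # 1-sparse: all the "energy" sits in a single coordinate

q, r = np.linalg.qr(rng.standard_normal((d, d)))  # Haar-random rotation
z = (q * np.sign(np.diag(r))) @ x                 # rotated vector

print("target std :", np.linalg.norm(x) / np.sqrt(d))   # ~0.156
print("rotated std:", z.std())                           # close to target
print("excess kurtosis:", np.mean(z**4) / z.std()**4 - 3)  # ~0, Gaussian-like
```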
While previous work (e.g., Suresh et al. (2017) [6]) used rotation mainly to reduce the coordinates' dynamic range (the gap between the largest and smallest coordinates), EDEN [1] was, to the best of our knowledge, the first quantization scheme to exploit a stronger fact about random rotations: the post-rotation coordinates follow a known distribution. This allows pairing a deterministic quantizer with a closed-form scale that, depending on the application, either minimizes the MSE or makes the estimate unbiased. Both scales are obtained analytically, and the formulation yields an asymptotic MSE improvement over prior methods.
Specifically, the two EDEN variants differ only in the choice of the scale s:
- EDEN-biased – sets s to a closed-form value that minimizes the MSE of the reconstruction.
- EDEN-unbiased – chooses s so the compressed estimate is correct in expectation (E[x̂] = x), which is especially important when averaging many quantized vectors (e.g., distributed training, attention).
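For concreteness, here is one plausible instantiation of the two scales in the spirit of DRIVE/EDEN [1][2]; the exact closed forms and their derivations are in the papers. Writing R for the random rotation and Q(·) for the Lloyd–Max quantizer:

$$
s_{\mathrm{biased}} = \frac{\langle Rx,\, Q(Rx)\rangle}{\lVert Q(Rx)\rVert^{2}},
\qquad
s_{\mathrm{unbiased}} = \frac{\lVert x\rVert^{2}}{\langle Rx,\, Q(Rx)\rangle}.
$$

The biased scale is simply the least-squares fit of Q(Rx) to Rx, hence MSE-minimizing given the quantization; the unbiased scale stretches the estimate so that it is correct in expectation under the random rotation.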
Set against EDEN, TurboQuant-mse is identical in every step except one: where EDEN computes the scale analytically, TurboQuant-mse, despite targeting MSE reduction, skips the optimized scaling and pins the scale instead.
The sketch below shows the three variants side by side.
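This is a minimal, illustrative Python sketch, not the reference implementation: it uses a Haar-random orthogonal matrix in place of the structured (e.g., randomized Hadamard) rotations used in practice, fits the Lloyd–Max codebook empirically, and models TurboQuant-mse's pinned scale as s = 1, an assumption based on the characterization in [5].

```python
import numpy as np

def lloyd_max_codebook(bits, n_samples=100_000, iters=50, seed=0):
    """Empirical Lloyd-Max centroids for the standard normal distribution."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal(n_samples)
    # Start from evenly spaced quantiles, then run Lloyd's iterations.
    centroids = np.quantile(samples, (np.arange(2 ** bits) + 0.5) / 2 ** bits)
    for _ in range(iters):
        idx = np.abs(samples[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(centroids.size):
            if np.any(idx == k):
                centroids[k] = samples[idx == k].mean()
    return centroids

def haar_rotation(d, rng):
    """Haar-random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def compress(x, centroids, variant, rng):
    """Rotate, quantize each coordinate, scale, rotate back."""
    d = x.size
    R = haar_rotation(d, rng)
    # Normalize so rotated coordinates are ~ N(0, 1), matching the codebook.
    z = (R @ x) * np.sqrt(d) / np.linalg.norm(x)
    q = centroids[np.abs(z[:, None] - centroids[None, :]).argmin(axis=1)]
    # The three variants differ only in the scale s:
    if variant == "eden-biased":        # least-squares, MSE-minimizing scale
        s = (z @ q) / (q @ q)
    elif variant == "eden-unbiased":    # scale making the estimate unbiased
        s = (z @ z) / (z @ q)
    else:                               # "turboquant-mse": scale pinned to 1
        s = 1.0
    return R.T @ (s * q) * np.linalg.norm(x) / np.sqrt(d)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal(256)
    for bits in (1, 2, 3, 4):
        cb = lloyd_max_codebook(bits)
        for variant in ("eden-biased", "turboquant-mse"):
            err = np.mean([np.sum((compress(x, cb, variant, rng) - x) ** 2)
                           for _ in range(20)]) / np.sum(x ** 2)
            print(f"{bits}-bit {variant}: vNMSE = {err:.4f}")
```

The least-squares scale can only lower the rotated-domain error relative to a pinned scale, so eden-biased never does worse here; the precise margins reported below come from the authors' implementations, not this sketch.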
Why the right scale matters
The value of using the correct scale grows with the bit-width. At 1 bit the gap is small; at higher bit-widths, EDEN-biased reduces the MSE by 2.25% relative to TurboQuant-mse, and these are exactly the bit-widths used for embedding and KV-cache compression.
Across all dimensions from 16 to 4096 and all bit-widths tested, EDEN-biased's vNMSE (vector-normalized MSE, E‖x̂ − x‖² / ‖x‖²) falls below TurboQuant-mse's in every case (Figure 2). As the dimension grows, the optimal scale approaches 1 and the two algorithms converge, but at realistic dimensions (128–1024) the gap persists.

Unbiased compression: saving more than a full bit
The results above concern the biased (MSE-minimizing) variant. Now consider the unbiased setting, needed by applications such as distributed training, quantized attention, and inner-product estimation, because they aggregate many compressed vectors.
EDEN-unbiased uses the same single-pass algorithm as EDEN-biased, with the scale chosen for bias correction instead. TurboQuant's unbiased variant, TurboQuant-prod, takes a different route: it spends b − 1 bits on the biased TurboQuant-mse step and adds a 1-bit QJL (Quantized Johnson–Lindenstrauss) [4] residual correction (QJL is the same as 1-bit EDEN in structure, but with a higher-variance scale).
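To see why unbiasedness is what matters when many compressed vectors are combined, here is a small self-contained sketch. It uses a 1-bit DRIVE/EDEN-style quantizer with an unbiased scale s = ‖x‖² / ⟨Rx, sign(Rx)⟩, my reading of the construction in [1]; because each estimate is unbiased and independent, the error of the average keeps shrinking as more copies are combined.

```python
# Sketch: averaging unbiased 1-bit estimates drives the error toward zero.
import numpy as np

def haar_rotation(d, rng):
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def one_bit_unbiased(x, rng):
    """1-bit DRIVE/EDEN-style estimate with the unbiased scale (assumed form)."""
    R = haar_rotation(x.size, rng)
    z = R @ x
    q = np.sign(z)
    s = (x @ x) / (z @ q)  # scale so the estimate is correct in expectation
    return R.T @ (s * q)

rng = np.random.default_rng(0)
x = rng.standard_normal(128)  # for clarity, every "client" compresses the same x
for n in (1, 10, 100, 1000):
    avg = np.mean([one_bit_unbiased(x, rng) for _ in range(n)], axis=0)
    print(f"n={n:4d}  vNMSE of average = {np.sum((avg - x)**2) / (x @ x):.4f}")
```

A biased estimator's error would instead plateau at its bias floor, which is why the unbiased variant is the right tool in these aggregation settings.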
EDEN-unbiased outperforms TurboQuant-prod in every configuration tested, and by a large margin. The gap follows from three advantages of EDEN's single-pass design:
- EDEN optimizes the scale. TurboQuant-prod inherits TurboQuant-mse's first stage, so it carries the same MSE penalty.
- EDEN's 1-bit quantizer has lower variance than QJL. In the large-dimension limit, EDEN's 1-bit vNMSE converges to π/2 − 1 ≈ 0.57 [1], while QJL's converges to π/2 ≈ 1.57 [4], about 2.75× higher (see the quick check after this list).
- EDEN spends the full budget on one unbiased estimate. TurboQuant-prod splits the budget into b − 1 biased bits plus 1 residual bit, which is worse than spending all b bits on a single unbiased quantizer [5].
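A quick sanity check on the ratio of the two limits quoted above (the π/2 − 1 and π/2 values are my reconstruction of the figures in [1] and [4], consistent with the 2.75× gap):

```python
# Quick check of the 1-bit variance figures above (values as quoted in [1], [4]).
import math

eden_1bit_vnmse = math.pi / 2 - 1  # EDEN's limiting 1-bit vNMSE, ~0.571
qjl_1bit_vnmse = math.pi / 2       # QJL's limiting 1-bit vNMSE,  ~1.571
print(f"QJL / EDEN variance ratio: {qjl_1bit_vnmse / eden_1bit_vnmse:.2f}")  # 2.75
```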
These advantages compound. The result: 1-bit, 2-bit, and 3-bit EDEN-unbiased are each more accurate than 2-bit, 3-bit, and 4-bit TurboQuant-prod, respectively (Figure 3). By switching to EDEN you can drop a full bit per coordinate and still match TurboQuant-prod's accuracy.

On TurboQuant's own benchmarks
The same picture holds on the standard ANN benchmarks TurboQuant evaluates on: Stanford's GloVe pre-trained word vectors (Open Data Commons Public Domain Dedication and License v1.0) and Qdrant's dbpedia-entities-openai3-text-embedding-3-large embeddings (Apache 2.0), using the published TurboQuant evaluation code:
EDEN-biased achieves a lower MSE than TurboQuant-mse, EDEN-unbiased achieves a significantly lower inner-product error than TurboQuant-prod, and nearest-neighbor recall on both datasets favors EDEN (Figure 4).

Takeaway: use EDEN for these quantization tasks
EDEN pairs the known post-rotation coordinate distribution with the optimal analytical scale. TurboQuant-mse keeps EDEN's rotation and codebook but pins the scale, which makes it a strictly weaker special case. TurboQuant-prod adds a 1-bit QJL stage on top of that, whereas EDEN-unbiased achieves the same goal, with better accuracy, simply by choosing a bias-correcting scale.
- For MSE-focused compression (model weight compression, nearest-neighbor search, KV caches): EDEN-biased computes the optimal scale analytically and consistently beats TurboQuant-mse (which is, in effect, EDEN with a fixed scale).
- For unbiased quantization (distributed mean estimation, quantized attention, inner-product estimation): EDEN-unbiased outperforms TurboQuant-prod's bit-splitting strategy, with margins worth more than a full bit per coordinate.
EDEN was originally developed for distributed mean estimation in federated and distributed training. Follow-up work has applied it, for example, to compressing document embeddings for ranking (SDR, ACL 2022 [8]), to unbiased gradient quantization for NVFP4 LLM pre-training (MS-EDEN within Quartet II, 2026 [10]), to data-free vector quantization of LLM weights (HIGGS, 2025 [9]), and to KV-cache compression (AQUA-KV, 2025 [11]).
EDEN implementations are available in PyTorch and TensorFlow and in Intel's OpenFL [7], and its 1-bit variant DRIVE is available in Google's FedJax, TensorFlow Federated, and TensorFlow Model Optimization.
For the complete technical comparison with TurboQuant (all derivations and the detailed evaluation methodology), see our note [5].
For the original derivations, proofs, and additional extensions, see our original papers [1] [2].
References
- S. Vargaftik, R. Ben-Basat, A. Portnoy, G. Mendelson, Y. Ben-Itzhak, M. Mitzenmacher, DRIVE: One-bit Distributed Mean Estimation (2021), NeurIPS 2021.
- S. Vargaftik, R. Ben-Basat, A. Portnoy, G. Mendelson, Y. Ben-Itzhak, M. Mitzenmacher, EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning (2022), ICML 2022.
- A. Zandieh, M. Daliri, A. Hadian, V. Mirrokni, TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate (2026), ICLR 2026.
- A. Zandieh, M. Daliri, I. Han, QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead (2024), arXiv:2406.03482.
- R. Ben-Basat, Y. Ben-Itzhak, G. Mendelson, M. Mitzenmacher, A. Portnoy, S. Vargaftik, A Note on TurboQuant and the DRIVE/EDEN Line of Prior Work (2026), arXiv:2604.18555.
- A. T. Suresh, F. X. Yu, S. Kumar, H. B. McMahan, Distributed Mean Estimation with Limited Communication (2017), ICML 2017.
- VMware Open Source Blog, VMware Research Group's EDEN Becomes Part of OpenFL (November 2022).
- N. Cohen, A. Portnoy, B. Fetahu, A. Ingber, SDR: Efficient Neural Re-ranking using Succinct Document Representation (2022), ACL 2022.
- V. Malinovskii, A. Panferov, I. Ilin, H. Guo, P. Richtárik, D. Alistarh, HIGGS: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem (2025), NAACL 2025.
- A. Panferov, E. Schultheis, S. Tabesh, D. Alistarh, Quartet II: Accurate LLM Pre-Training in NVFP4 with Enhanced Unbiased Gradient Estimation (2026), arXiv:2601.22813.
- A. Shutova, V. Malinovskii, V. Egiazarian, D. Kuznedelev, D. Mazur, N. Surkov, I. Ermakov, D. Alistarh, Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models (2025), ICML 2025.



