Vector Database 101 – Scalar Quantization and Product Quantization
Introduction to Scalar Quantization and Product Quantization
In the previous tutorial, we took a look at two of the most basic indexing algorithms – flat indexing and inverted file (IVF). Flat indexing is the simplest and most naive vector search strategy, but it can surprisingly work quite well when the total dataset size is small (a couple thousand vectors) and/or when you're using a GPU for querying. IVF, on the other hand, is highly extensible and works well in combination with other indexing strategies too.
In this tutorial, we'll build on top of that knowledge by diving deeper into quantization techniques – specifically scalar quantization (also called integer quantization) and product quantization. We'll implement our own scalar and product quantization algorithms in Python.
As mentioned in a previous tutorial on vector similarity search, quantization is a technique for reducing the total size of the database by reducing the overall precision of the vectors. Note that this is very different from dimensionality reduction (PCA, LDA, etc.), which attempts to reduce the length of the vectors:
>>> vector.size
128
>>> quantized_vector.size
128
>>> reduced_vector.size
16
Dimensionality reduction techniques such as PCA use linear algebra to project the input data into a lower-dimensional space. Without getting too deep into the math here, just know that these methods generally aren't used as the primary indexing strategy because they tend to impose constraints on the distribution of the data. PCA, for example, works best on data that can be separated into independent, Gaussian-distributed components.
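To make the distinction concrete, here's a minimal sketch of dimensionality reduction using scikit-learn's PCA (an assumption on my part – any PCA implementation would do); note how it shortens the vectors rather than lowering their precision:

import numpy as np
from sklearn.decomposition import PCA

# a random stand-in for real embedding data
dataset = np.random.normal(size=(1000, 128))

# project each 128-dimensional vector down to 16 dimensions
reduced = PCA(n_components=16).fit_transform(dataset)
print(reduced.shape)  # (1000, 16)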
Quantization, on the other hand, makes no assumptions about the distribution of the data – rather, it looks at each dimension (or group of dimensions) separately and attempts to "bin" each value into one of many discrete buckets. Instead of performing a flat search over all of the original vectors, we can instead perform a flat search over the quantized vectors – this can result in reduced memory consumption as well as significant speedup. Scalar quantization, for example, turns floating point values into low-precision integers:
>>> vector.dtype
dtype('float64')
>>> quantized_vector.dtype
dtype('int8')
>>> reduced_vector.dtype
dtype('float64')
From the above example, we can see that scalar quantization has reduced the total size of our vectors (and our vector database) by a whopping 8x. Nice.
So how exactly does scalar quantization work? Let's first take a look at the indexing process, i.e. turning floating point vectors into integer vectors. For each vector dimension, scalar quantization takes the maximum and minimum value of that particular dimension as seen across the entire database, and uniformly splits that dimension into bins across its entire range.
Let's try writing that in code. We'll first generate a dataset of one thousand 128-dimensional floating point vectors sampled from a multivariate Gaussian distribution. Since this is a toy example, I'll be sampling from a Gaussian; in practice, actual embeddings are rarely Gaussian distributed unless imposed as a constraint when training the model (as in variational autoencoders):
>>> import numpy as np
>>> dataset = np.random.normal(size=(1000, 128))
This dataset serves as our dummy data for this scalar quantization implementation. Now, let's determine the maximum and minimum values of each dimension of the vector and store them in a matrix called ranges:
>>> ranges = np.vstack((np.min(dataset, axis=0), np.max(dataset, axis=0)))
You'll notice that the mins and maxes here are fairly uniform across all dimensions for this toy example, since the input data was sampled from zero-mean, unit-variance Gaussians – don't worry about that for now, as all of the code here translates to real data as well. We now have the minimum and maximum value of each vector dimension across the entire dataset. With this, we can determine the start value and step size for each dimension. The start value is simply the minimum value, and the step size is determined by the number of discrete bins in the integer type we'll be using. In this case, we'll be using 8-bit unsigned integers (uint8_t) for a total of 256 bins:
>>> starts = ranges[0,:]
>>> steps = (ranges[1,:] - ranges[0,:]) / 255
That's all the setup we need. The actual quantization is then done by subtracting each dimension's start value (starts) and dividing the result by the step size (steps):
>>> dataset_quantized = np.uint8((dataset - starts) / steps)
>>> dataset_quantized
array([[136, 58, 156, ..., 153, 182, 30],
[210, 66, 175, ..., 68, 146, 33],
[100, 136, 148, ..., 142, 86, 108],
...,
[133, 146, 146, ..., 137, 209, 144],
[ 63, 131, 96, ..., 174, 174, 105],
[159, 78, 204, ..., 95, 87, 146]], dtype=uint8)
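As a quick sanity check on the 8x figure from earlier, we can compare raw buffer sizes directly (nbytes reports the total memory used by each array):

>>> dataset.nbytes
1024000
>>> dataset_quantized.nbytes
128000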
We can also check the mins and maxes of the quantized dataset:
>>> np.min(dataset_quantized, axis=0)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)
>>> np.max(dataset_quantized, axis=0)
array([255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 254, 254, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 254, 255, 255, 255, 255, 254, 255, 255, 254,
254, 255, 255, 254, 255, 255, 255, 254, 255, 255, 255, 255, 255,
255, 255, 255, 255, 254, 255, 255, 255, 255, 255, 254, 255, 255,
255, 254, 255, 255, 255, 255, 255, 254, 254, 255, 255, 255, 255,
255, 255, 255, 254, 255, 255, 255, 255, 254, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255], dtype=uint8)
Note how we've used the full uint8_t range of [0,255] for each dimension (some of the maximum values are 254 instead of 255 due to floating point imprecision).
Now let's put it all together in a ScalarQuantizer class:
import numpy as np

class ScalarQuantizer:

    def __init__(self):
        self._dataset = None

    def create(self, dataset):
        """Calculates and stores SQ parameters based on the input dataset."""
        self._dtype = dataset.dtype  # original dtype, kept for reference
        # per-dimension start values and step sizes, computed across the dataset
        self._starts = np.min(dataset, axis=0)
        self._steps = (np.max(dataset, axis=0) - self._starts) / 255
        # the internal dataset uses uint8_t quantization
        self._dataset = np.uint8((dataset - self._starts) / self._steps)

    def quantize(self, vector):
        """Quantizes the input vector based on SQ parameters."""
        return np.uint8((vector - self._starts) / self._steps)

    def restore(self, vector):
        """Restores an approximation of the original vector using SQ parameters."""
        return (vector * self._steps) + self._starts

    @property
    def dataset(self):
        # compare against None explicitly; truth-testing a numpy array is ambiguous
        if self._dataset is not None:
            return self._dataset
        raise ValueError("Call ScalarQuantizer.create() first")
>>> dataset = np.random.normal(size=(1000, 128))
>>> quantizer = ScalarQuantizer()
>>> quantizer.create(dataset)
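As a quick roundtrip check (not part of the class itself), we can quantize a vector from the dataset and restore it – each restored component should land within one step of the original (_steps is an internal attribute, used here just for the check):

>>> vector = dataset[0]
>>> restored = quantizer.restore(quantizer.quantize(vector))
>>> bool(np.all(np.abs(vector - restored) <= quantizer._steps))
True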
And that's it for scalar quantization! The conversion function can be changed to a quadratic or exponential function instead of the linear one we used in the example above, but the fundamental idea remains the same – split the entire space of each dimension into discrete bins in order to reduce the overall memory footprint of the vector data.
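As a hypothetical illustration of a non-linear conversion function (my own sketch, not something we'll use later), here's a quadratic variant that allocates finer bins near each dimension's minimum:

import numpy as np

def quantize_quadratic(dataset):
    """Hypothetical non-linear scalar quantizer: taking a square root before
    binning makes bin widths grow quadratically, giving finer resolution
    near each dimension's minimum."""
    starts = np.min(dataset, axis=0)
    spans = np.max(dataset, axis=0) - starts
    t = (dataset - starts) / spans  # normalize each dimension to [0, 1]
    return np.uint8(np.sqrt(t) * 255), (starts, spans)

def restore_quadratic(quantized, params):
    """Inverts the square-root mapping to recover approximate values."""
    starts, spans = params
    return starts + ((quantized / 255.0) ** 2) * spans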
A major disadvantage of scalar quantization is that it doesn't take the distribution of values in each dimension into account. For example, imagine we have a dataset with the following 2-dimensional vectors:
array([[ 9.19, 1.55],
[ 0.12, 1.55],
[ 0.40, 0.78],
[-0.04, 0.31],
[ 0.81, -2.07],
[ 0.29, 0.82],
[ 0.05, 0.96],
[ 0.12, -1.10]])
If we decide to quantize these vectors into 3-bit integers (range [0,7]), 6 of the bins for the 0th dimension will be completely unused! Clearly, there has to be a better way to perform quantization, especially if any of the dimensions have a non-uniform distribution of values.
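We can verify this claim with a couple of lines of code by binning the 0th dimension of the toy vectors above – only two of the eight available bins end up occupied:

>>> data = np.array([9.19, 0.12, 0.40, -0.04, 0.81, 0.29, 0.05, 0.12])
>>> np.unique(np.uint8(7 * (data - data.min()) / (data.max() - data.min())))
array([0, 7], dtype=uint8)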
Enter product quantization (PQ) – a much more powerful and flexible way to quantize vectors. The primary idea behind PQ is to algorithmically split a high-dimensional vector into lower-dimensional subspaces, with each subspace corresponding to several dimensions of the original high-dimensional vector. Each subspace is then quantized using Lloyd's algorithm, a quantizer which is effectively equivalent to k-means clustering. As with scalar quantization, every original vector becomes a vector of integers post-quantization, and each integer corresponds to a particular centroid.
This may sound complex, but it becomes much easier to understand if we break it down from an algorithmic perspective. Let's first go over indexing:
- Given a dataset of N total vectors, we'll first divide each vector into M subvectors (also known as a subspace). These subvectors don't necessarily need to be the same length, but in practice they almost always are.
- We'll then run k-means (or some other clustering algorithm) over all subvectors in the dataset. This will give us a collection of K centroids for each subspace, each of which is assigned its own unique ID.
- With all centroids computed, we'll replace every subvector in the original dataset with the ID of its closest centroid.
Product quantization, visualized.
PQ can both reduce memory usage and significantly speed up nearest neighbor search, at the cost of a bit of accuracy. The trade-off depends on the parameters used – more centroids and subvectors will improve search accuracy, but won't result in as much compression or speedup.
Let's work through a very simple PQ implementation. As with the previous scalar quantization example, we'll be reducing the vectors to 8-bit unsigned integers (uint8_t) using M = 16 and K = 256, i.e. each 128-dimensional vector will be split into 16 subvectors of size 8, with each subvector then being quantized into one of 256 buckets.
>>> (M, K) = (16, 256)
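Some quick back-of-the-envelope arithmetic shows what these parameters buy us – a 64x per-vector reduction, plus a fixed codebook (roughly 256KB here) shared across the entire collection:

>>> 128 * 8  # bytes per original float64 vector
1024
>>> 16 * 1  # bytes per quantized vector (one uint8 ID per subvector)
16
>>> 16 * 256 * 8 * 8  # codebook bytes: M*K centroids of length 8, in float64
262144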
We'll start by generating a toy dataset, as we did for the scalar quantization example:
>>> import numpy as np
>>> dataset = np.random.normal(size=(1000, 128))
With the dataset in hand, let's split off the first set of subvectors:
>>> sublen = dataset.shape[1] // M
>>> subspace = dataset[:,0:sublen]
>>> subspace.shape
(1000, 8)
As with IVF, we'll use scipy's kmeans2 implementation to determine the centroids:
>>> from scipy.cluster.vq import kmeans2
>>> (centroids, assignments) = kmeans2(subspace, K, iter=32)
The scipy k-means implementation returns the centroid indices in int32_t format, so we'll do a quick conversion to uint8_t to wrap things up:
>>> quantized = np.uint8(assignments)
This process gets repeated for each subspace until we've quantized all of the vectors.
As we did for scalar quantization, let's also compile everything here into a class:
import numpy as np
from scipy.cluster.vq import kmeans2

class ProductQuantizer:

    def __init__(self, M=16, K=256):
        self.M = M
        self.K = K
        self._dataset = None

    def create(self, dataset):
        """Fits the PQ model based on the input dataset."""
        self._sublen = dataset.shape[1] // self.M
        # one (K x sublen) codebook per subspace
        self._centroids = np.empty((self.M, self.K, self._sublen), dtype=np.float64)
        self._dataset = np.empty((dataset.shape[0], self.M), dtype=np.uint8)
        for m in range(self.M):
            subspace = dataset[:,m*self._sublen:(m+1)*self._sublen]
            (centroids, assignments) = kmeans2(subspace, self.K, iter=32)
            self._centroids[m,:,:] = centroids
            self._dataset[:,m] = np.uint8(assignments)

    def quantize(self, vector):
        """Quantizes the input vector based on PQ parameters."""
        quantized = np.empty((self.M,), dtype=np.uint8)
        for m in range(self.M):
            # compare each subvector against the centroids of its subspace
            subvector = vector[m*self._sublen:(m+1)*self._sublen]
            distances = np.linalg.norm(subvector - self._centroids[m,:,:], axis=1)
            quantized[m] = np.argmin(distances)
        return quantized

    def restore(self, vector):
        """Restores an approximation of the original vector using PQ parameters."""
        return np.hstack([self._centroids[m,vector[m],:] for m in range(self.M)])

    @property
    def dataset(self):
        # compare against None explicitly; truth-testing a numpy array is ambiguous
        if self._dataset is not None:
            return self._dataset
        raise ValueError("Call ProductQuantizer.create() first")
>>> dataset = np.random.normal(size=(1000, 128))
>>> quantizer = ProductQuantizer()
>>> quantizer.create(dataset)
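As a quick roundtrip check, quantizing a vector yields 16 centroid IDs, and restoring it stitches the corresponding centroids back into an approximate 128-dimensional vector:

>>> quantized = quantizer.quantize(dataset[0])
>>> quantized.shape
(16,)
>>> quantizer.restore(quantized).shape
(128,)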
And that's it for product quantization! To really speed things up during the search process, we can expend a bit of extra memory to compute distance tables for all centroids in each subspace, but I'll leave that as an exercise for you.
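If you'd like a head start on that exercise, here's one possible sketch (my own approach, built on the ProductQuantizer above and its internal _centroids and _sublen attributes): precompute squared distances from the query's subvectors to every centroid, then score each database vector with just M table lookups instead of a full floating point distance computation:

import numpy as np

def pq_search(quantizer, query, k=10):
    """One possible table-based PQ search: build a per-query distance table,
    then score database vectors via lookups instead of float comparisons."""
    M, K, sublen = quantizer.M, quantizer.K, quantizer._sublen
    # table[m][j]: squared distance from the query's m-th subvector to the
    # j-th centroid of subspace m
    table = np.empty((M, K), dtype=np.float64)
    for m in range(M):
        subquery = query[m*sublen:(m+1)*sublen]
        table[m,:] = np.sum((quantizer._centroids[m] - subquery) ** 2, axis=1)
    # approximate squared distance to each vector is the sum of M lookups
    scores = table[np.arange(M), quantizer.dataset].sum(axis=1)
    return np.argsort(scores)[:k]  # indices of the k (approximate) nearest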
In this tutorial, we did a deep dive into scalar quantization and product quantization, building our own simple implementations along the way. Scalar quantization is a decent tool, but product quantization is much more powerful and can be used regardless of the distribution of our vector data. Keep in mind that, while PQ can significantly speed up query times and reduce the memory footprint, it's generally not that great for recall. We'll benchmark PQ against several other indexing strategies in a future tutorial.
In the next tutorial, we'll continue our deep dive into indexing strategies with Hierarchical Navigable Small Worlds (HNSW) – a graph-based indexing algorithm that's arguably the most popular way to index vectors today (although it comes with its own downsides as well). See you in the next tutorial!
All code for this tutorial is freely available on GitHub: https://github.com/fzliu/vector-search.