Unum · USearch 0.22.3 documentation

2023-07-31 09:25:14

Smaller & Faster Single-File
Vector Search Engine



Euclidean • Angular • Jaccard • Hamming • Haversine • User-Defined Metrics

C++11 • Python • JavaScript • Java • Rust • C99 • Objective-C • Swift • GoLang • Wolfram

Linux • MacOS • Windows • Docker • WebAssembly


Comparison with FAISS

FAISS is a widely recognized standard for high-performance vector search engines.
USearch and FAISS both employ the same HNSW algorithm, but they differ significantly in their design principles.
USearch is compact and broadly compatible without sacrificing performance, with a primary focus on user-defined metrics and fewer dependencies.

|                    | FAISS                | USearch                      |
|--------------------|----------------------|------------------------------|
| Implementation     | 84 K SLOC in faiss/  | 3 K SLOC in usearch/         |
| Supported metrics  | 9 fixed metrics      | Any user-defined metrics     |
| Supported ID types | uint32_t, uint64_t   | uint32_t, uint40_t, uint64_t |
| Dependencies       | BLAS, OpenMP         | None                         |
| Bindings           | SWIG                 | Native                       |
| Acceleration       | Learned Quantization | Downcasting                  |

Base functionality is identical to FAISS, and the interface should be familiar if you have ever investigated Approximate Nearest Neighbors search:

$ pip install usearch numpy

import numpy as np
from usearch.index import Index

index = Index(
    ndim=3, # Define the number of dimensions in input vectors
    metric='cos', # Choose 'l2sq', 'haversine' or other metric, default = 'ip'
    dtype='f32', # Quantize to 'f16' or 'f8' if needed, default = 'f32'
    connectivity=16, # Optional: How frequent should the connections in the graph be
    expansion_add=128, # Optional: Control the recall of indexing
    expansion_search=64, # Optional: Control the quality of search
)

vector = np.array([0.2, 0.6, 0.4])
index.add(42, vector)
matches, distances, count = index.search(vector, 10)

assert len(index) == 1
assert count == 1
assert matches[0] == 42
assert distances[0] <= 0.001
assert np.allclose(index[42], vector)
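
The assertion `distances[0] <= 0.001` holds because a vector is at (nearly) zero distance from itself. As a rough illustration, assuming the 'cos' metric means one minus the cosine similarity, the same quantity can be computed by hand. The helper name `cosine_distance` is made up for this sketch and is not part of the USearch API.

```python
import math

def cosine_distance(a, b):
    """One minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

vector = [0.2, 0.6, 0.4]
self_distance = cosine_distance(vector, vector)              # close to zero
orthogonal_distance = cosine_distance([1, 0, 0], [0, 1, 0])  # no shared direction
print(self_distance, orthogonal_distance)
```

Floating-point rounding can leave the self-distance a tiny fraction away from zero, which is why the check above uses a tolerance rather than strict equality.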

User-Defined Functions

While most vector search packages concentrate on just two metrics, "Inner Product distance" and "Euclidean distance", USearch extends this list to include any user-defined metric.
This flexibility allows you to customize your search for a myriad of applications, from computing geo-spatial coordinates with the rare Haversine distance to creating custom metrics for composite embeddings from multiple AI models.
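
To make the Haversine case concrete, here is a minimal pure-Python sketch of the great-circle distance between two (latitude, longitude) pairs. The function name, the degree-based input convention, and the Earth radius constant are assumptions of this example. It shows the kind of function a user-defined metric would compute, not how USearch registers or compiles one.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula: h is the squared half-chord length on the unit sphere.
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

# Two points on the equator, a quarter of the way around the globe from each other.
quarter_turn_km = haversine_km((0.0, 0.0), (0.0, 90.0))
print(quarter_turn_km)
```

Note that the two inputs are only two-dimensional, yet the metric is neither an inner product nor a Euclidean distance, which is the situation the paragraph above describes.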

[Figure: USearch: Vector Search Approaches]

Unlike older approaches to indexing high-dimensional spaces, like KD-Trees and Locality Sensitive Hashing, HNSW doesn't require vectors to be identical in length.
They only have to be comparable.
So you can apply it in obscure applications, like searching for similar sets or fuzzy text matching, using GZip as a distance function.
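
One common way to turn a compressor into a distance function is the normalized compression distance (NCD). The sketch below uses `zlib` from the Python standard library, which implements DEFLATE, the same compression method GZip uses. The helper names and sample strings are invented for illustration, and whether USearch users wire it up exactly this way is an assumption.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Number of bytes after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share structure."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Inputs of different lengths are fine: they only need to be comparable.
text = b"approximate nearest neighbors search over dense vectors " * 8
near = text.replace(b"dense", b"sparse")
far = b"great-circle distance between two points on a sphere " * 8

print(ncd(text, text), ncd(text, near), ncd(text, far))
```

The inputs here are byte strings of different lengths, so there is no fixed dimensionality at all; the only requirement is that any two of them can be compared.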

Memory Efficiency, Downcasting, and Quantization

Training a quantization model and dimension-reduction is a common approach to accelerate vector search.
Those, however, are not always reliable, can significantly affect the statistical properties of your data, and require regular adjustments if your distribution shifts.

[Figure: USearch uint40_t support]

Instead, we have focused on high-precision arithmetic over low-precision downcasted vectors.
The same index, and add and search operations, will automatically down-cast or up-cast between f32_t, f16_t, f64_t, and f8_t representations, even if the hardware doesn't natively support it.
Continuing the topic of memory efficiency, we provide a uint40_t to allow collections with over 4B+ vectors without allocating 8 bytes for every neighbor reference in the proximity graph.
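
Both ideas can be illustrated with the Python standard library alone. The sketch below round-trips a vector through IEEE 754 half precision to show the size and precision trade-off of downcasting, then packs a key into 5 bytes to show what a 40-bit identifier buys. The function names are hypothetical, and the byte layout is an assumption that says nothing about how USearch stores data internally.

```python
import struct

def downcast_to_f16(values):
    """Round-trip floats through half precision; returns (values, bytes used)."""
    packed = struct.pack(f"<{len(values)}e", *values)  # 'e' is IEEE 754 binary16
    return list(struct.unpack(f"<{len(values)}e", packed)), len(packed)

def pack_uint40(key: int) -> bytes:
    """Store an identifier in 5 bytes instead of the 8 a uint64_t needs."""
    if not 0 <= key < 2 ** 40:
        raise ValueError("key does not fit into 40 bits")
    return key.to_bytes(5, "little")

def unpack_uint40(raw: bytes) -> int:
    return int.from_bytes(raw, "little")

vector = [0.2, 0.6, 0.4]
halves, size_f16 = downcast_to_f16(vector)
size_f32 = len(struct.pack("<3f", *vector))
max_error = max(abs(a - b) for a, b in zip(vector, halves))

key = 5_000_000_000  # larger than the uint32_t maximum of 4,294,967,295
raw = pack_uint40(key)
print(size_f32, size_f16, max_error, len(raw), unpack_uint40(raw))
```

Half precision halves the storage per scalar at the cost of a small rounding error, while 5-byte keys address about 1.1 trillion entries and save 3 bytes per reference compared with uint64_t.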

|              | FAISS, f32 | USearch, f32 | USearch, f16 | USearch, f8    |
|--------------|------------|--------------|--------------|----------------|
| Batch Insert | 16 K/s     | 73 K/s       | 100 K/s      | 104 K/s +550%  |
| Batch Search | 82 K/s     | 103 K/s      | 113 K/s      | 134 K/s +63%   |
| Bulk Insert  | 76 K/s     | 105 K/s      | 115 K/s      | 202 K/s +165%  |
| Bulk Search  | 118 K/s    | 174 K/s      | 173 K/s      | 304 K/s +157%  |
| Recall @ 10  | 99%        | 99.2%        | 99.1%        | 99.2%          |

Dataset: 1M vectors sample of the Deep1B dataset.
Hardware: c7g.metal AWS instance with 64 cores and DDR5 memory.
HNSW was configured with identical hyper-parameters:
connectivity M=16,
expansion @ construction efConstruction=128,
and expansion @ search ef=64.
Batch size is 256.
Both libraries were compiled for the target architecture.
Jump to the Performance Tuning section to read about the effects of those hyper-parameters.
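
The percentages in the last column of the benchmark appear to express the throughput gain of USearch with f8 over FAISS with f32, truncated to whole percent. That reading is an inference from the numbers rather than something the source states. A short check reproduces the figures:

```python
# Throughput in thousands of operations per second, copied from the benchmark figures.
faiss_f32 = {"Batch Insert": 16, "Batch Search": 82, "Bulk Insert": 76, "Bulk Search": 118}
usearch_f8 = {"Batch Insert": 104, "Batch Search": 134, "Bulk Insert": 202, "Bulk Search": 304}

def gain_percent(baseline: int, improved: int) -> int:
    """Relative throughput gain in whole percent, truncated (not rounded)."""
    return (improved - baseline) * 100 // baseline

gains = {name: gain_percent(faiss_f32[name], usearch_f8[name]) for name in faiss_f32}
for name, gain in gains.items():
    print(f"{name}: +{gain}%")
```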

Disk-based Indexes

With USearch, you can serve indexes from external memory, enabling you to optimize your server choices for indexing speed and serving costs.
This can result in a 20x cost reduction on AWS and other public clouds.

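# Write the index to a file
index.save("index.usearch")

# Load the file contents into memory, or view the file without loading it
loaded_copy = index.load("index.usearch")
view = Index.restore("index.usearch", view=True)

# An index configured up front can also view an existing file
other_view = Index(ndim=..., metric=CompiledMetric(...))
other_view.view("index.usearch")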
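
The general idea behind viewing a file instead of loading it can be sketched with the standard library's `mmap`. The operating system pages data in on demand, so only the records actually touched occupy RAM. The fixed-width record layout, the `NDIM` constant, and the helper names below are assumptions for this sketch and do not describe the USearch file format.

```python
import mmap
import os
import struct
import tempfile

NDIM = 3                             # hypothetical vector width
RECORD = struct.Struct(f"<{NDIM}f")  # one little-endian f32 vector per record

def write_vectors(path, vectors):
    """Append fixed-width records to a flat binary file."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(RECORD.pack(*v))

def read_vector(view, i):
    """Decode the i-th record straight from the mapped file."""
    return RECORD.unpack_from(view, i * RECORD.size)

vectors = [(0.2, 0.6, 0.4), (0.1, 0.1, 0.9), (0.7, 0.2, 0.1)]
path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_vectors(path, vectors)

# Map the file read-only; nothing is copied until a record is accessed.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
    second = read_vector(view, 1)

print(second)
```

Because the mapping is read-only and demand-paged, a server with little RAM can serve a file much larger than its memory, which is the cost trade-off the section describes.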

Joins

One of the big questions these days is how AI will change the world of databases and data management.
Most databases are still struggling to implement high-quality fuzzy search, and the only kind of joins they know are deterministic.
A join is different from searching for every entry, as it requires a one-to-one mapping, banning collisions among separate search results.

| Exact Search | Fuzzy Search | Semantic Search ? |
|--------------|--------------|-------------------|
| Exact Join   | Fuzzy Join ? | Semantic Join ??  |

Using USearch, one can implement approximate, fuzzy, and semantic joins with sub-quadratic complexity.
This can come in handy in any fuzzy-matching task common to Database Management Software.

men = Index(...)
women = Index(...)
pairs: dict = men.join(women, max_proposals=0, exact=False)
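
To show what "one-to-one mapping, banning collisions" means in practice, here is a brute-force sketch in the style of the Gale-Shapley stable matching procedure. It is quadratic and uses plain dictionaries, so it only illustrates the constraint a join enforces. It is not the USearch implementation, and the function names and sample data are made up.

```python
def l2sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def one_to_one_join(left, right):
    """Match each left key to a distinct right key; inputs map keys to vectors."""
    # Every left entry ranks all right entries from nearest to farthest.
    preferences = {
        lk: sorted(right, key=lambda rk: l2sq(lv, right[rk]))
        for lk, lv in left.items()
    }
    next_choice = {lk: 0 for lk in left}
    holder = {}  # right key -> left key currently matched to it
    free = list(left)
    while free:
        lk = free.pop()
        if next_choice[lk] >= len(preferences[lk]):
            continue  # no candidates left; this entry stays unmatched
        rk = preferences[lk][next_choice[lk]]
        next_choice[lk] += 1
        rival = holder.get(rk)
        if rival is None:
            holder[rk] = lk
        elif l2sq(left[lk], right[rk]) < l2sq(left[rival], right[rk]):
            holder[rk] = lk      # the closer proposer wins the match
            free.append(rival)   # the displaced entry proposes again later
        else:
            free.append(lk)
    return {lk: rk for rk, lk in holder.items()}

left = {1: (0.0, 0.0), 2: (1.0, 1.0), 3: (0.1, 0.0)}
right = {10: (0.0, 0.1), 20: (1.0, 0.9), 30: (5.0, 5.0)}
pairs = one_to_one_join(left, right)
print(pairs)
```

In this data, a plain nearest-neighbor search would send both key 1 and key 3 to key 10. The join resolves that collision by giving key 10 to the closer of the two and moving the other on to its next available candidate.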

Functionality

By now, core functionality is supported across all bindings.
Broader functionality is ported per request.

Bindings: C++ • Python • Java • JavaScript • Rust • GoLang • Swift

Features: add/search/remove • save/load/view • join • user-defined metrics • variable-length vectors • 4B+ capacities

Application Examples

TODO

Integrations

Citations

@software{Vardanian_USearch_2022,
doi = {10.5281/zenodo.7949416},
author = {Vardanian, Ash},
title = {{USearch by Unum Cloud}},
url = {https://github.com/unum-cloud/usearch},
version = {0.13.0},
year = {2022},
month = jun,
}


