AnnLite

AnnLite can serve as the document store for a DocumentArray. It is useful when you want faster Document retrieval based on embeddings, for example with .match() and .find().

Tip

This feature requires annlite. You can install it via pip install "docarray[annlite]".

Usage

One can instantiate a DocumentArray with Annlite storage like so:

from docarray import DocumentArray

da = DocumentArray(storage='annlite', config={'n_dim': 10})

From there, usage is the same as for an ordinary DocumentArray.
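As a rough sketch of what that looks like in practice, the snippet below fills an AnnLite-backed DocumentArray with random embeddings and runs a nearest-neighbour query with .find(). The helper names random_embeddings and build_and_query, the array sizes, and the float32 dtype are assumptions made for illustration and are not part of DocArray. docarray and numpy are imported lazily inside build_and_query so the block can still be loaded where the optional dependency is not installed.

```python
import random


def random_embeddings(count, n_dim, seed=0):
    """Return `count` random vectors of length `n_dim` as plain Python lists."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_dim)] for _ in range(count)]


def build_and_query(n_dim=10, count=100, limit=5):
    """Hypothetical helper: index random embeddings, then search them."""
    # Lazy imports: these need `pip install "docarray[annlite]"`
    import numpy as np
    from docarray import Document, DocumentArray

    da = DocumentArray(storage='annlite', config={'n_dim': n_dim})

    # Every embedding must have exactly n_dim entries
    docs = [
        Document(embedding=np.array(vec, dtype=np.float32))
        for vec in random_embeddings(count, n_dim)
    ]
    da.extend(docs)

    # Query with one vector; the nearest `limit` Documents are returned
    query = np.array(random_embeddings(1, n_dim, seed=1)[0], dtype=np.float32)
    return da.find(query, limit=limit)
```

Calling build_and_query() should return the closest matches for the random query; the exact Documents depend on the random data, so no output is shown here.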

To access a previously persisted DocumentArray, specify its data_path in config.

from docarray import DocumentArray

da = DocumentArray(storage='annlite', config={'data_path': './data', 'n_dim': 10})

da.summary()

Note that n_dim is mandatory: it must be specified whenever AnnLite is used as the backend for a DocumentArray.
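One way to make the n_dim requirement hard to forget is to build the config dict through a small function that refuses to proceed without it. The sketch below is only an illustration: annlite_config and OPTIONAL_KEYS are hypothetical names, not DocArray API, and the accepted keys are simply the ones listed in the Config section of this page.

```python
# Optional keys, taken from the Config section of this page
OPTIONAL_KEYS = {'data_path', 'metric', 'ef_construction', 'ef_search', 'max_connection'}


def annlite_config(n_dim, **options):
    """Build a config dict for DocumentArray(storage='annlite', config=...).

    Hypothetical helper: it only validates its inputs and returns a plain dict.
    """
    # bool is a subclass of int, so reject it explicitly
    if isinstance(n_dim, bool) or not isinstance(n_dim, int) or n_dim <= 0:
        raise ValueError('n_dim must be a positive integer')
    unknown = set(options) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f'unknown config keys: {sorted(unknown)}')
    return {'n_dim': n_dim, **options}


config = annlite_config(10, data_path='./data', metric='cosine')
# config can now be passed as DocumentArray(storage='annlite', config=config)
```

Keeping the check in one place means a missing or misspelled key fails early, before any data folder is created or opened.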

All other functions behave the same as with an in-memory DocumentArray.

Config

The following configuration options can be set:

| Name | Description | Default |
|---|---|---|
| n_dim | Number of dimensions of embeddings to be stored and retrieved | This is always required |
| data_path | The data folder where the data is located | A random temp folder |
| metric | Distance metric to be used during search. Can be 'cosine', 'dot' or 'euclidean' | 'cosine' |
| ef_construction | The size of the dynamic list for the nearest neighbors (used during the construction) | None, defaults to the default value in the AnnLite package* |
| ef_search | The size of the dynamic list for the nearest neighbors (used during the search) | None, defaults to the default value in the AnnLite package* |
| max_connection | The number of bi-directional links created for every new element during construction | None, defaults to the default value in the AnnLite package* |

*You can check the default values in the AnnLite source code.
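To give a feel for what the three metric choices measure, here is a small pure-Python sketch using the textbook definitions. It is an illustration under assumptions, not AnnLite's implementation: the function names are made up for this example, and this page does not say whether AnnLite reports a distance or a similarity, or whether its Euclidean variant is squared.

```python
import math


def dot(a, b):
    """Inner product: larger means more aligned (and longer) vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity: ignores vector length, 0 means same direction."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a, b):
    """Straight-line distance: sensitive to both direction and length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 0.0]
b = [0.0, 2.0]  # orthogonal to a
c = [3.0, 0.0]  # same direction as a, but longer

print(cosine_distance(a, c))     # 0.0, same direction
print(cosine_distance(a, b))     # 1.0, orthogonal
print(euclidean_distance(a, c))  # 2.0, lengths differ by 2
print(dot(a, c))                 # 3.0
```

The pair a and c shows why the choice matters: they are identical under cosine distance but two units apart under Euclidean distance, so 'cosine' tends to suit embeddings whose magnitude carries no meaning.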