Qdrant#
One can use Qdrant as the document store for DocumentArray. It is useful when one wants to have faster Document retrieval on embeddings, i.e. .match()
, .find()
.
Tip
This feature requires qdrant-client
. You can install it via pip install "docarray[qdrant]".
Usage#
Start Qdrant service#
To use Qdrant as the storage backend, you need a running Qdrant server. You can use the Qdrant Docker image to run a
server. Create docker-compose.yml
as follows:
---
version: '3.4'
services:
qdrant:
image: qdrant/qdrant:v0.7.0
ports:
- "6333:6333"
ulimits: # Only required for tests, as there are a lot of collections created
nofile:
soft: 65535
hard: 65535
...
Then
docker-compose up
Create DocumentArray with Qdrant backend#
Assuming service is started using the default configuration (i.e. server address is http://localhost:6333
), one can
instantiate a DocumentArray with Qdrant storage like so:
from docarray import DocumentArray
da = DocumentArray(storage='qdrant', config={'n_dim': 10})
The usage would be the same as the ordinary DocumentArray.
To access a DocumentArray formerly persisted, one can specify the collection_name
, the host
and the port
.
from docarray import DocumentArray
da = DocumentArray(
storage='qdrant',
config={
'collection_name': 'persisted',
'host': 'localhost',
'port': '6333',
'n_dim': 10,
},
)
da.summary()
Note that specifying the n_dim
is mandatory before using Qdrant as a backend for DocumentArray.
Other functions behave the same as in-memory DocumentArray.
Config#
The following configs can be set:
Name | Description | Default |
---|---|---|
n_dim |
Number of dimensions of embeddings to be stored and retrieved | This is always required |
collection_name |
Qdrant collection name client | Random collection name generated |
host |
Hostname of the Qdrant server | 'localhost' |
port |
port of the Qdrant server | 6333 |
distance |
Distance metric to be used during search. Can be 'cosine', 'dot' or 'euclidean' | 'cosine' |
scroll_batch_size |
batch size used when scrolling over the storage | 64 |
ef_construct |
Number of neighbours to consider during the index building. Larger the value - more accurate the search, more time required to build index. | None , defaults to the default value in Qdrant* |
full_scan_threshold |
Minimal amount of points for additional payload-based indexing. | None , defaults to the default value in Qdrant* |
m |
Number of edges per node in the index graph. Larger the value - more accurate the search, more space required. | None , defaults to the default value in Qdrant* |
*You can read more about the HNSW parameters and their default values here
Minimum example#
Create docker-compose.yml
:
---
version: '3.4'
services:
qdrant:
image: qdrant/qdrant:v0.7.0
ports:
- "6333:6333"
ulimits: # Only required for tests, as there are a lot of collections created
nofile:
soft: 65535
hard: 65535
...
pip install -U docarray[qdrant]
docker compose up
import numpy as np
from docarray import DocumentArray
N, D = 100, 128
da = DocumentArray.empty(N, storage='qdrant', config={'n_dim': D}) # init
da.embeddings = np.random.random([N, D])
print(da.find(np.random.random(D), limit=10))
<DocumentArray (length=10) at 4917906896>