Welcome to DocArray!

DocArray is a library for nested, unstructured data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer the data with a Pythonic API.

🌌 Rich data types: super-expressive data structure for representing complicated, mixed, or nested text, image, video, audio, and 3D mesh data.

🐍 Pythonic experience: designed to be as easy as a Python list. If you know how to Python, you know how to DocArray. Intuitive idioms and type annotations simplify the code you write.

🧑‍🔬 Data science powerhouse: greatly accelerates data scientists’ work on embedding, matching, visualizing, and evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU.

🚡 Data in transit: optimized for network communication and ready to wire at any time, with fast and compressed serialization in Protobuf, bytes, base64, JSON, CSV, and DataFrame.

🎡 Scale to big data: handles out-of-memory data via an on-disk document store while keeping exactly the same API experience. Supports classic databases and vector databases to enable faster nearest-neighbour search.

👒 For modern apps: GraphQL support makes your server versatile on request and response; built-in data validation and JSON Schema (OpenAPI) help you build reliable web services.

🛸 Integrate with IDE: pretty-print and visualization in Jupyter notebooks and Google Colab; comprehensive auto-complete and type hints in PyCharm and VS Code.

Read more on why you should use DocArray and how it compares to alternatives.

Install

The latest version is available on PyPI.

Make sure you have Python 3.7+ and NumPy installed on Linux, macOS, or Windows:

pip install docarray

No extra dependencies will be installed.

conda install -c conda-forge docarray

No extra dependencies will be installed.

pip install "docarray[common]"

The following dependencies will be installed to enable the most common features:

Package      Used in
protobuf     advanced serialization
lz4          compression in serialization
requests     push/pull to Jina Cloud
matplotlib   visualizing image sprites
Pillow       image data-related IO
fastapi      used in embedding projector of DocumentArray
uvicorn      used in embedding projector of DocumentArray

pip install "docarray[full]"

In addition to common, the following dependencies will be installed to enable full features:

Package              Used in
scipy                for sparse embedding, tensors
av                   for video processing and IO
trimesh              for 3D mesh processing and IO
weaviate-client      for using Weaviate-based document store
annlite              for using Annlite-based document store
qdrant-client        for using Qdrant-based document store
strawberry-graphql   for GraphQL support

Alternatively, you can start with the basic installation and then install missing dependencies on demand.
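To decide what to install on demand, it can help to see which of the optional packages listed above are already importable. The following is a minimal sketch, not part of DocArray: the helper names `is_importable` and `missing_packages` are hypothetical, and the mapping from package name to import name (for example Pillow to `PIL`, qdrant-client to `qdrant_client`) is an assumption based on how those packages are commonly imported.

```python
from importlib.util import find_spec

# Package name (as used with pip) -> import name.
# ASSUMPTION: this mapping is illustrative and not taken from DocArray itself.
OPTIONAL_MODULES = {
    "protobuf": "google.protobuf",
    "lz4": "lz4",
    "requests": "requests",
    "matplotlib": "matplotlib",
    "Pillow": "PIL",
    "fastapi": "fastapi",
    "uvicorn": "uvicorn",
    "scipy": "scipy",
    "av": "av",
    "trimesh": "trimesh",
    "weaviate-client": "weaviate",
    "annlite": "annlite",
    "qdrant-client": "qdrant_client",
    "strawberry-graphql": "strawberry",
}


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located without fully importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package (e.g. "google") is missing.
        return False


def missing_packages(modules=OPTIONAL_MODULES) -> list:
    """List the package names whose import name cannot be found."""
    return [pkg for pkg, mod in modules.items() if not is_importable(mod)]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Print a command that could be run to install what is missing.
        print("pip install " + " ".join(missing))
    else:
        print("all optional dependencies are importable")
```

Running the script prints either a `pip install` command for the packages that could not be found or a confirmation that nothing is missing.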

pip install "docarray[full,test]"

This will install all requirements for reproducing tests in your local dev environment.

>>> import docarray
>>> docarray.__version__
'0.1.0'
>>> from docarray import Document, DocumentArray

Important

Jina 3.x users do not need to install docarray separately, as it is shipped with Jina. To check your Jina version, type jina -vf in the console.

However, if the printed version is lower than 0.1.0, say 0.0.x, then docarray is not installed correctly. You are probably still using an old docarray shipped with Jina 2.x.
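The version check described here can also be scripted. The sketch below is illustrative only: `parse_version` and `is_current_docarray` are hypothetical helpers, not DocArray functions, and it assumes version strings begin with three dot-separated numbers, as in the `'0.1.0'` example above.

```python
import re

# Threshold taken from the note above: versions below 0.1.0 are too old.
MIN_VERSION = (0, 1, 0)


def parse_version(version: str) -> tuple:
    """Return the leading major.minor.patch numbers as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_current_docarray(version: str) -> bool:
    """Return True if the version is at least MIN_VERSION."""
    return parse_version(version) >= MIN_VERSION


if __name__ == "__main__":
    try:
        import docarray
    except ImportError:
        print("docarray is not installed")
    else:
        if is_current_docarray(docarray.__version__):
            print(docarray.__version__, "is recent enough")
        else:
            print(docarray.__version__, "is too old; it may come from Jina 2.x")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as `'0.10.0'` sorting before `'0.9.0'`.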

Support

Join Us

DocArray is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.

