Welcome to DocArray!#

DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API.

🚪 Door to the cross-/multimodal world: a super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, and 3D mesh data. It is the foundational data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt, etc.

🧑‍🔬 Data science powerhouse: greatly accelerates data scientists’ work on embedding, k-NN matching, querying, visualizing, and evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU.

🚡 Data in transit: optimized for network communication, ready-to-wire at any time with fast and compressed serialization in Protobuf, bytes, base64, JSON, CSV, DataFrame. Perfect for streaming and out-of-memory data.

🔎 One-stop k-NN: unified and consistent API for mainstream vector databases that support nearest neighbor search, including Elasticsearch, Redis, ANNLite, Qdrant, and Weaviate.

👒 For modern apps: GraphQL support makes your server versatile on request and response; built-in data validation and JSON Schema (OpenAPI) help you build reliable web services.

🐍 Pythonic experience: designed to be as easy as a Python list. If you know how to Python, you know how to DocArray. Intuitive idioms and type annotations simplify the code you write.

🛸 Integrate with IDE: pretty-printing and visualization in Jupyter notebooks & Google Colab; comprehensive auto-completion and type hints in PyCharm & VS Code.
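
To make the k-NN matching mentioned in the list above concrete, here is a minimal pure-Python sketch of nearest neighbor search over embeddings using cosine distance. It does not use DocArray's API: the function names (`cosine_distance`, `nearest_neighbors`) and the toy embeddings are hypothetical and only illustrate the operation, which a library or vector database would perform far more efficiently.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Brute-force k-NN: score every entry, sort by distance, keep the top k."""
    scored = [(doc_id, cosine_distance(query, emb)) for doc_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical toy embeddings keyed by a document id.
index = {
    "cat": [1.0, 0.0, 0.0],
    "kitten": [0.9, 0.1, 0.0],
    "car": [0.0, 1.0, 0.0],
}

matches = nearest_neighbors([1.0, 0.05, 0.0], index, k=2)
print([doc_id for doc_id, _ in matches])  # → ['cat', 'kitten']
```

The brute-force scan is linear in the number of stored embeddings, which is fine for a toy example; approximate indexes in vector databases exist precisely to avoid that full scan on large collections.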

Read more on why you should use DocArray and how it compares to alternatives.

[Figure: Jina in the Jina AI neural search ecosystem]

Install#

The latest version is available on PyPI.

Make sure you have Python 3.7+ and numpy installed on Linux/Mac/Windows:

pip install docarray

or, with conda:

conda install -c conda-forge docarray

Either way, no extra dependencies will be installed.

pip install "docarray[common]"

The following dependencies will be installed to enable the most common features:

| Package | Used in |
| --- | --- |
| protobuf | advanced serialization |
| lz4 | compression in serialization |
| requests | push/pull to Jina Cloud |
| matplotlib | visualizing image sprites |
| Pillow | image data-related IO |
| fastapi | used in embedding projector of DocumentArray |
| uvicorn | used in embedding projector of DocumentArray |

pip install "docarray[full]"

In addition to common, the following dependencies will be installed to enable full features:

| Package | Used in |
| --- | --- |
| scipy | for sparse embedding, tensors |
| av | for video processing and IO |
| trimesh | for 3D mesh processing and IO |
| strawberry-graphql | for GraphQL support |

Alternatively, you can do the basic installation first and then install missing dependencies on demand.
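
If you take the on-demand route, a small script can report which optional packages are not yet importable. The sketch below is a hypothetical helper, not part of DocArray: the function names (`is_importable`, `find_missing`) are made up for illustration, and the mapping from the pip package names listed above to import names is an assumption based on each package's usual import name.

```python
import importlib.util
from typing import Dict, List

# pip package name -> module name used in `import` (assumed usual names).
OPTIONAL_DEPENDENCIES: Dict[str, str] = {
    "protobuf": "google.protobuf",
    "lz4": "lz4",
    "requests": "requests",
    "matplotlib": "matplotlib",
    "Pillow": "PIL",
    "fastapi": "fastapi",
    "uvicorn": "uvicorn",
    "scipy": "scipy",
    "av": "av",
    "trimesh": "trimesh",
    "strawberry-graphql": "strawberry",
}


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. `google`) raises ModuleNotFoundError.
        return False


def find_missing(packages: Dict[str, str]) -> List[str]:
    """Return the pip names of all packages whose module cannot be found."""
    return [pip_name for pip_name, module in packages.items() if not is_importable(module)]


missing = find_missing(OPTIONAL_DEPENDENCIES)
if missing:
    # Print a ready-to-copy install command for whatever is absent.
    print("pip install " + " ".join(missing))
else:
    print("All optional dependencies are available.")
```

Checking with `find_spec` rather than importing each module keeps the script fast and avoids side effects from heavy packages.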

pip install "docarray[full,test]"

This will install all requirements for reproducing tests on your local dev environment.

After installation, you can check the version and import the core classes:

>>> import docarray
>>> docarray.__version__
'0.1.0'
>>> from docarray import Document, DocumentArray

Support#

Join Us#

DocArray is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.

