Welcome to DocArray!

DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API.

🚪 Door to multimodal world: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data. The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt etc.

πŸ§‘β€πŸ”¬ Data science powerhouse: greatly accelerate data scientists’ work on embedding, k-NN matching, querying, visualizing, evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU.

🚑 Data in transit: optimized for network communication, ready-to-wire at any time with fast and compressed serialization in Protobuf, bytes, base64, JSON, CSV, DataFrame. Perfect for streaming and out-of-memory data.

🔎 One-stop k-NN: unified and consistent API for nearest neighbor search across mainstream vector databases, including Elasticsearch, Redis, AnnLite, Qdrant, and Weaviate.

👒 For modern apps: GraphQL support makes your server versatile on request and response; built-in data validation and JSON Schema (OpenAPI) help you build reliable web services.

🐍 Pythonic experience: as easy as a Python list. If you can Python, you can DocArray. Intuitive idioms and type annotation simplify the code you write.

🛸 IDE integration: pretty-print and visualization on Jupyter notebook and Google Colab; comprehensive autocomplete and type hints in PyCharm and VS Code.

Read more on why you should use DocArray and how it compares to alternatives.


The latest version is published on PyPI.

Make sure you have Python 3.7+ and numpy installed on Linux/Mac/Windows:

pip install docarray

This basic pip install pulls in no extra dependencies.

conda install -c conda-forge docarray

The conda-forge install likewise pulls in no extra dependencies.
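The prerequisites for the basic install (Python 3.7+ and numpy) can be verified up front. The following is a minimal sketch that uses only the standard library; the helper name `check_prerequisites` is hypothetical and not part of DocArray.

```python
import importlib.util
import sys


def check_prerequisites(min_python=(3, 7)):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # Compare only (major, minor) against the documented minimum version.
    if sys.version_info[:2] < min_python:
        problems.append(
            "Python %d.%d+ is required, found %d.%d"
            % (tuple(min_python) + tuple(sys.version_info[:2]))
        )
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec("numpy") is None:
        problems.append("numpy is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("problem:", problem)
```

If the script prints nothing, both prerequisites are met for the interpreter that ran it.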

pip install "docarray[common]"

The following dependencies will be installed to enable the most common features. They are used in:

advanced serialization
compression in serialization
push/pull to Jina Cloud
visualizing image sprites
image data-related IO
the embedding projector of DocumentArray (two separate dependencies)

pip install "docarray[full]"

In addition to the common dependencies, the following dependencies will be installed to enable full features. They are used for:

sparse embeddings and tensors
video processing and IO
3D mesh processing and IO
GraphQL support

Alternatively, you can start with the basic installation and then install missing dependencies on demand.
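In your own code, one way to work with on-demand dependencies is to import optional modules lazily and fail with an actionable message. This is a generic sketch, not DocArray's internal mechanism; the helper `require` and the wording of the hint are hypothetical.

```python
import importlib


def require(module_name, install_hint):
    """Import an optional module, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the root cause stays visible.
        raise ImportError(
            f"'{module_name}' is needed for this feature; try: {install_hint}"
        ) from exc


# Usage: a standard-library module stands in for an optional dependency here.
json_module = require("json", 'pip install "docarray[common]"')
print(json_module.dumps({"ok": True}))
```

Calling `require` at the point of use, rather than at module import time, keeps the basic installation working until a feature that needs the extra package is actually invoked.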

pip install "docarray[full,test]"

This will install all requirements for reproducing tests on your local dev environment.

>>> import docarray
>>> docarray.__version__
>>> from docarray import Document, DocumentArray
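The version check above can also be scripted without importing the package. A small sketch, assuming Python 3.8+ for `importlib.metadata` (on Python 3.7 the `importlib_metadata` backport would be needed instead); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(distribution="docarray"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print("docarray", version if version else "is not installed")
```

Because this reads package metadata only, it reports the version even when an import would fail for an unrelated reason, such as a missing optional dependency.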
