Welcome to DocArray!

DocArray is a library for nested, unstructured data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer the data with a Pythonic API.

🌌 Rich data types: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data.

🐍 Pythonic experience: designed to be as easy as a Python list. If you know how to Python, you know how to DocArray. Intuitive idioms and type annotation simplify the code you write.

🧑‍🔬 Data science powerhouse: greatly accelerates data scientists’ work on embedding, matching, visualizing, and evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU.

🚡 Data in transit: optimized for network communication, ready-to-wire at any time with fast and compressed serialization in Protobuf, bytes, base64, JSON, CSV, DataFrame. Built-in data validation and JSON Schema (OpenAPI) help you build reliable web services.

Install

The latest version is published on PyPI.

Make sure you have Python 3.7+ and numpy installed on Linux/Mac/Windows:

Basic install from PyPI:

pip install docarray

Basic install from conda-forge:

conda install -c conda-forge docarray

Neither basic install pulls in any extra dependency.

Full install:

pip install "docarray[full]"

The following dependencies will be installed to enable additional features:

Package      Used in
protobuf     advanced serialization
lz4          compression in serialization
requests     push/pull to Jina Cloud
matplotlib   visualizing image sprites
Pillow       image data-related IO
rich         push/pull to Jina Cloud, summary of Document, DocumentArray
av           video data-related IO
trimesh      3D mesh data-related IO
fastapi      used in embedding projector of DocumentArray

Alternatively, you can do the basic installation first and then install missing dependencies on demand.
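One way to see which optional packages are still missing after a basic installation is to probe for them with the standard library. The sketch below is not part of DocArray: the helper name `missing_optional` is hypothetical, and the import names in the mapping (for example `PIL` for Pillow and `google.protobuf` for protobuf) are assumptions based on how those packages are usually imported.

```python
import importlib.util

# Optional packages listed above, mapped to the module name each one is
# usually imported under (assumed; adjust if your environment differs).
OPTIONAL_PACKAGES = {
    "protobuf": "google.protobuf",
    "lz4": "lz4",
    "requests": "requests",
    "matplotlib": "matplotlib",
    "Pillow": "PIL",
    "rich": "rich",
    "av": "av",
    "trimesh": "trimesh",
    "fastapi": "fastapi",
}


def missing_optional(packages=OPTIONAL_PACKAGES):
    """Return the pip names of optional packages that cannot be located."""
    missing = []
    for pip_name, module_name in packages.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. ``google``) is absent
            # or a loaded module carries no spec.
            found = False
        if not found:
            missing.append(pip_name)
    return missing
```

Calling `missing_optional()` returns a list of names that can be passed to `pip install` one by one, so only the features you actually need get their dependencies.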

>>> import docarray
>>> docarray.__version__
'0.1.0'
>>> from docarray import Document, DocumentArray

Important

Jina 3.x [1] users do not need to install docarray separately, as it is shipped with Jina. To check your Jina version, type jina -vf in the console.

However, if the printed docarray version is lower than 0.1.0, say 0.0.x, then docarray is not installed correctly. You are probably still using an old docarray shipped with Jina 2.x.
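The version check described above can be automated. The following is a minimal sketch, not an official DocArray utility: the helpers `parse_version` and `is_outdated` are hypothetical, and they assume version strings of the plain `major.minor.patch` form shown in this section (suffixes such as `rc1` are not handled).

```python
def parse_version(version):
    """Turn a plain 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_outdated(version, minimum="0.1.0"):
    """Return True if ``version`` is lower than ``minimum``."""
    # Tuples compare element by element, so (0, 10, 2) > (0, 1, 0),
    # which a plain string comparison would get wrong.
    return parse_version(version) < parse_version(minimum)
```

Passing `docarray.__version__` to `is_outdated` would then flag a 0.0.x installation left over from Jina 2.x.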

Support

Join Us

DocArray is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.




[1] Jina 3.0rc will be released in Feb. 2022. Stay tuned!