Welcome to DocArray!

DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API.

🚪 Door to multimodal world: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data. The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt etc.

🧑‍🔬 Data science powerhouse: greatly accelerates data scientists' work on embedding, k-NN matching, querying, visualizing, and evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU.

🚡 Data in transit: optimized for network communication, ready-to-wire at any time with fast and compressed serialization in Protobuf, bytes, base64, JSON, CSV, DataFrame. Perfect for streaming and out-of-memory data.

🔎 One-stop k-NN: unified and consistent API for nearest-neighbor search across mainstream vector databases, including Elasticsearch, Redis, AnnLite, Qdrant, and Weaviate.

👒 For modern apps: GraphQL support makes your server versatile on request and response; built-in data validation and JSON Schema (OpenAPI) help you build reliable web services.

🐍 Pythonic experience: as easy as a Python list. If you can Python, you can DocArray. Intuitive idioms and type annotation simplify the code you write.

🛸 IDE integration: pretty-print and visualization on Jupyter notebook and Google Colab; comprehensive autocomplete and type hints in PyCharm and VS Code.

Read more on why you should use DocArray and how it compares to alternatives.

Install

The latest release is published on PyPI.

Make sure you have Python 3.7+ and numpy installed on Linux/Mac/Windows:

pip install docarray

No extra dependencies will be installed.

conda install -c conda-forge docarray

No extra dependencies will be installed.

pip install "docarray[common]"

The following dependencies will be installed to enable the most common features:

| Package | Used in |
|---|---|
| protobuf | advanced serialization |
| lz4 | compression in serialization |
| requests | push/pull to Jina Cloud |
| matplotlib | visualizing image sprites |
| Pillow | image data-related IO |
| fastapi | embedding projector of DocumentArray |
| uvicorn | embedding projector of DocumentArray |

pip install "docarray[full]"

In addition to common, the following dependencies will be installed to enable full features:

| Package | Used in |
|---|---|
| scipy | for sparse embedding, tensors |
| av | for video processing and IO |
| trimesh | for 3D mesh processing and IO |
| strawberry-graphql | for GraphQL support |

Alternatively, you can do the basic installation first and then install missing dependencies on demand.

pip install "docarray[full,test]"

This installs everything needed to run the tests in your local dev environment.
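If you start from the basic installation and add dependencies on demand, it can help to see which of the optional packages listed above are already present. The following is a minimal sketch using only the standard library, not part of DocArray itself. The helper names (`is_importable`, `missing_packages`) are hypothetical, and the import names in `OPTIONAL_MODULES` are assumptions based on how each package is commonly imported; several differ from the name you pass to pip.

```python
from importlib.util import find_spec

# Maps the pip package name to the module name that is imported in code.
# The import names are assumptions, not taken from the DocArray docs.
OPTIONAL_MODULES = {
    "protobuf": "google.protobuf",
    "lz4": "lz4",
    "requests": "requests",
    "matplotlib": "matplotlib",
    "Pillow": "PIL",
    "fastapi": "fastapi",
    "uvicorn": "uvicorn",
    "scipy": "scipy",
    "av": "av",
    "trimesh": "trimesh",
    "strawberry-graphql": "strawberry",
}


def is_importable(module_name):
    """Return True if Python can locate `module_name`."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package
        # (for example `google`) is itself missing.
        return False


def missing_packages(modules=OPTIONAL_MODULES):
    """List the pip package names whose import module cannot be found."""
    return [pkg for pkg, mod in modules.items() if not is_importable(mod)]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Print a ready-to-run command for whatever is absent.
        print("pip install " + " ".join(missing))
    else:
        print("All optional dependencies are available.")
```

Running the script prints either a `pip install` command covering the absent packages or a confirmation that nothing is missing.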

>>> import docarray
>>> docarray.__version__
'0.1.0'
>>> from docarray import Document, DocumentArray
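The same check can be scripted without importing the package, which is handy in CI or setup scripts. This is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is in the standard library; on 3.7 a backport would be needed). The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of `distribution`, or None."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    found = installed_version("docarray")
    if found is None:
        print("docarray is not installed")
    else:
        print(f"docarray {found}")
```

Because the lookup reads package metadata only, it works even when an import would fail because of a missing optional dependency.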
