Welcome to DocArray!#
DocArray is a library for nested, unstructured data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer the data with a Pythonic API.
Rich data types: an expressive data structure for representing complicated, mixed, or nested text, image, video, audio, and 3D mesh data.
Pythonic experience: designed to be as easy as a Python list. If you know how to Python, you know how to DocArray. Intuitive idioms and type annotations simplify the code you write.
Data science powerhouse: greatly accelerates data scientists' work on embedding, matching, visualizing, and evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU.
Data in transit: optimized for network communication, ready-to-wire at any time with fast and compressed serialization in Protobuf, bytes, base64, JSON, CSV, DataFrame.
Scale to big data: handle out-of-memory data via an on-disk document store while keeping exactly the same API experience. Classic databases and vector databases are supported to enable faster nearest-neighbour search.
For modern apps: GraphQL support makes your server versatile on request and response; built-in data validation and JSON Schema (OpenAPI) help you build reliable web services.
Integrate with IDE: pretty-print and visualization in Jupyter Notebook and Google Colab; comprehensive auto-complete and type hints in PyCharm and VS Code.
Read more on why you should use DocArray and how it compares to alternatives.
Install#
Make sure you have Python 3.7+ and numpy installed on Linux/Mac/Windows:
pip install docarray
No extra dependency will be installed.
conda install -c conda-forge docarray
No extra dependency will be installed.
pip install "docarray[common]"
The following dependencies will be installed to enable the most common features:
Package | Used in |
---|---|
protobuf | advanced serialization |
lz4 | compression in serialization |
requests | push/pull to Jina Cloud |
matplotlib | visualizing image sprites |
Pillow | image data-related IO |
fastapi | used in embedding projector of DocumentArray |
uvicorn | used in embedding projector of DocumentArray |
pip install "docarray[full]"
In addition to common, the following dependencies will be installed to enable full features:
Package | Used in |
---|---|
scipy | for sparse embedding, tensors |
av | for video processing and IO |
trimesh | for 3D mesh processing and IO |
weaviate-client | for using Weaviate-based document store |
annlite | for using Annlite-based document store |
qdrant-client | for using Qdrant-based document store |
strawberry-graphql | for GraphQL support |
Alternatively, you can start with the basic installation and then install missing dependencies on demand.
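If you take the on-demand route, it can help to see which optional packages are already present. The following is a minimal sketch using only the Python standard library; it is not part of DocArray. The helper names and the mapping from pip package names to importable module names (for example `Pillow` to `PIL`) are assumptions made for illustration.

```python
import importlib.util

# Assumed mapping: pip package name -> importable module name.
# The package names come from the "common" table; the module names are assumptions.
COMMON_EXTRAS = {
    "protobuf": "google.protobuf",
    "lz4": "lz4",
    "requests": "requests",
    "matplotlib": "matplotlib",
    "Pillow": "PIL",
    "fastapi": "fastapi",
    "uvicorn": "uvicorn",
}


def is_importable(module_name):
    """Return True if the module can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package (e.g. "google") is absent.
        return False


def missing_packages(packages):
    """Return the pip names whose modules cannot be located."""
    return [name for name, module in packages.items() if not is_importable(module)]


missing = missing_packages(COMMON_EXTRAS)
if missing:
    print("Missing, install with: pip install " + " ".join(missing))
else:
    print("All common extras are available.")
```

The same helper can be pointed at the packages from the full table by passing a different dictionary.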
pip install "docarray[full,test]"
This will install all requirements for reproducing tests on your local dev environment.
>>> import docarray
>>> docarray.__version__
'0.1.0'
>>> from docarray import Document, DocumentArray
Important
Jina 3.x users do not need to install docarray separately, as it is shipped with Jina. To check your Jina version, type jina -vf in the console.
However, if the printed docarray version is lower than 0.1.0, say 0.0.x, then docarray is not installed correctly. You are probably still using an old docarray shipped with Jina 2.x.
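The version check described above can also be scripted. This is a rough sketch, not an official DocArray utility: the helper names are hypothetical, it only compares the leading numeric part of each version component, and it relies on `importlib.metadata`, which is in the standard library from Python 3.8 onward.

```python
import re
from importlib import metadata  # standard library in Python 3.8+


def version_tuple(version):
    """Turn '0.1.0' into (0, 1, 0), stopping at the first non-numeric component."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def is_at_least(version, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    have, want = version_tuple(version), version_tuple(minimum)
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("docarray")
if found is None:
    print("docarray is not installed")
elif is_at_least(found, "0.1.0"):
    print("docarray " + found + " looks fine")
else:
    print("docarray " + found + " is older than 0.1.0")
```

For anything beyond simple `x.y.z` versions, a dedicated version parser would be a safer choice than this tuple comparison.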
Support#
Check out the Learning Bootcamp to get started with DocArray.
Use Discussions to talk about your use cases, questions, and support queries.
Join our Slack community and chat with other community members about ideas.
Join our Engineering All Hands meet-up to discuss your use case and learn about Jina's new features.
When? The second Tuesday of every month
Where? Zoom (see our public events calendar/.ical) and live stream on YouTube
Subscribe to the latest video tutorials on our YouTube channel
Join Us#
DocArray is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.