DocArray is an upstream dependency of Jina. Without DocArray, Jina cannot run.
DocArray focuses on the local, monolithic developer experience, while Jina scales DocArray to the cloud. DocArray is also the default transit format in Jina: Executors talk to each other via serialized DocArray. The picture below shows how the two relate.
The next picture summarizes your development journey with DocArray and Jina. With a new project, first move horizontally to the left with DocArray, which usually means improving quality and completing the logic in a local environment. When you are ready, move vertically up with Jina, equipping your application with a service endpoint, scalability, and cloud-native features. Finally, you reach the point where your service is ready for production.
If you are a Jina 3 user, you don’t need to install docarray independently, as it is included in pip install jina. You can run jina -v in the terminal to check whether you are using Jina 3.
When starting a Jina project, you can write imports either as

    from docarray import DocumentArray, Document
    from jina import Flow

or as

    from jina import Flow, DocumentArray, Document
They work exactly the same: either way, you use the same install of docarray on your system. This is because the jina package exposes DocumentArray and Document from the docarray package.
You can update the DocArray package without updating Jina via pip install -U docarray. This usually works unless the Jina release notes say otherwise.
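Because the two packages can be upgraded independently, it can help to confirm which versions are actually installed before comparing them against the Jina release notes. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` is hypothetical and not part of Jina or DocArray.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("jina", "docarray")):
    """Return a mapping of package name to installed version (None if absent)."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this before and after pip install -U docarray shows whether only the docarray version changed.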
Directly invoke a Jina/Hub Executor
As described here, one can simply use an external Jina Flow/Executor as a regular function to process a DocumentArray.
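As a rough illustration of this, the sketch below sends a local DocumentArray to an already running Flow through `DocumentArray.post`. The host address `grpc://localhost:12345` and the helper name `run_remote` are assumptions made for this example; substitute the address of your own Flow or Executor. The call needs both docarray and jina installed, and a server listening at that address.

```python
def run_remote(host="grpc://localhost:12345"):
    """Send local PNG files to a running Flow and return what comes back."""
    # Imported lazily so the helper can be defined without docarray installed.
    from docarray import DocumentArray

    da = DocumentArray.from_files("**/*.png")
    # post() ships the Documents to the remote endpoint and returns the
    # Documents produced by the remote Flow/Executor.
    return da.post(host, show_progress=True)
```

From the caller's point of view the remote service then behaves like an ordinary function: a DocumentArray goes in and a DocumentArray comes out.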
Local code as a service
Consider the example below, where we use DocArray to pre-process an image DocumentArray:
    from docarray import Document, DocumentArray

    da = DocumentArray.from_files('**/*.png')


    def preproc(d: Document):
        return (
            d.load_uri_to_image_tensor()  # load
            .set_image_tensor_normalization()  # normalize color
            .set_image_tensor_channel_axis(-1, 0)  # switch color axis for the PyTorch model later
        )


    da.apply(preproc).plot_image_sprites(channel_axis=0)
The code can be run as-is. It will give you a plot like the following (depending on how many images you have):
When writing it with Jina, the code is slightly refactored into Executor style:
    from docarray import Document, DocumentArray
    from jina import Executor, requests


    class MyExecutor(Executor):
        @staticmethod
        def preproc(d: Document):
            return (
                d.load_uri_to_image_tensor()  # load
                .set_image_tensor_normalization()  # normalize color
                .set_image_tensor_channel_axis(-1, 0)  # switch color axis for the PyTorch model later
            )

        @requests
        def foo(self, docs: DocumentArray, **kwargs):
            docs.apply(self.preproc)
To summarize, you need to make three changes:
1. Import Executor and subclass it;
2. Wrap your functions into class methods;
3. Add the @requests decorator to the logic functions.
Now you can feed data to it via:
    from jina import Flow, DocumentArray

    f = Flow().add(uses=MyExecutor)

    with f:
        r = f.post('/', DocumentArray.from_files('**/*.png'), show_progress=True)
        r.plot_image_sprites(channel_axis=0)
You get the same results as before with some extra output from the console:
    Flow@…[I]:🎉 Flow is ready to use!
        🔗 Protocol:         GRPC
        🏠 Local access:     0.0.0.0:57050
        🔒 Private network:  192.168.0.102:57050
        🌐 Public address:   126.96.36.199:57050
    ⠋ DONE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 0:00:05 100% ETA: 0 seconds 80 steps done in 5 seconds
Three good reasons to use Jina
Okay, so I refactored the code from 10 lines to 24 lines. What’s the deal? Here are three reasons to use Jina:
A client-server architecture
One immediate consequence is that your logic now works as a service. You can host it remotely on a server and start a client to query it:
    # Server: serve MyExecutor (defined above) on port 12345
    from jina import Flow, DocumentArray

    f = Flow(port=12345).add(uses=MyExecutor)

    with f:
        f.block()  # keep the Flow running until interrupted
    # Client: connect to the server on port 12345 and send the images
    from jina import Client, DocumentArray

    c = Client(port=12345)
    c.post('/', DocumentArray.from_files('**/*.png'), show_progress=True)
You can also use the HTTP or GraphQL API to query it. More details can be found in the Jina documentation.
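To make the HTTP option concrete, here is a hedged sketch that builds a plain HTTP request with the Python standard library. It assumes the server was started with `Flow(protocol='http', port=12345)`, and that the gateway accepts a JSON body with `data` and `execEndpoint` fields on a `/post` route, as in the Jina 3 HTTP interface; check the Jina documentation for the version you run. The helper names `build_post_request` and `send` are hypothetical.

```python
import json
import urllib.request


def build_post_request(uris, host="http://localhost:12345", endpoint="/"):
    """Build (but do not send) a POST request carrying one Document per URI."""
    payload = {
        # Each Document only carries a URI; the Executor loads the image itself,
        # so the URI has to be resolvable on the server side.
        "data": [{"uri": u} for u in uris],
        # Executor endpoint to call, matching f.post('/') in the gRPC example.
        "execEndpoint": endpoint,
    }
    return urllib.request.Request(
        url=f"{host}/post",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request to a running HTTP Flow and return the decoded JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a Flow running in HTTP mode, `send(build_post_request(['apple.png']))` would submit a single Document; the file name here is only a placeholder.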
Scale it out
Scaling your server is as easy as adding replicas:
    from jina import Flow

    f = Flow(port=12345).add(uses=MyExecutor, replicas=3)

    with f:
        f.block()
This starts three parallel replicas of the Executor, which can improve the overall throughput. More details can be found here.
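Replicas only help when several requests are in flight, so on the client side it can be worth splitting a large input into smaller requests. The sketch below does this with the `request_size` argument of `Client.post`. The value 8 and the helper name `query_in_small_requests` are assumptions for illustration, and the actual gain depends on your workload.

```python
def query_in_small_requests(port=12345, request_size=8):
    """Send local PNG files to the Flow in batches of `request_size` Documents."""
    # Imported lazily so the helper can be defined without jina installed.
    from jina import Client, DocumentArray

    docs = DocumentArray.from_files("**/*.png")
    client = Client(port=port)
    # Each batch becomes its own request, which gives the server several
    # requests to distribute across the replicas.
    return client.post("/", docs, request_size=request_size, show_progress=True)
```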
If you are starting something new, start with DocArray. If you want to scale it out and make it a publicly available cloud service, use Jina.