Support New Modality

Each type in docarray.typing corresponds to one modality. Supporting a new modality means adding a new type and specifying how it is translated to and from a Document.

Whether you are adding a new type or changing the behavior of an existing one, you can use the field() function.

Create new types

Say you want to define a new type MyImage that accepts an image as a URI but, instead of loading it into the .tensor of the sub-document, loads it into .blob. This differs from the behavior of the built-in Image type.

All you need to do is:

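from typing import TypeVar

from docarray import Document

# A new type marker; values annotated with MyImage are plain strings (URIs).
MyImage = TypeVar('MyImage', bound=str)


def my_setter(value) -> 'Document':
    # Store the URI and load its content into .blob instead of .tensor.
    return Document(uri=value).load_uri_to_blob()


def my_getter(doc: 'Document'):
    # Recover the original value, which the setter kept in .uri.
    return doc.uri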

Now you can use the MyImage type in a dataclass:

from docarray import dataclass, field, Document


@dataclass
class MMDoc:
    banner: MyImage = field(setter=my_setter, getter=my_getter, default='test-1.jpeg')


Document(MMDoc()).summary()

📄 Document: bde1ab74306c2f63188069879e3945ac
└── 💠 Chunks
    └── 📄 Document: cd594a6870a8921d7a9c6b0ec764251d
        ╭─────────────┬────────────────────────────────────────────────────────────────╮
        │ Attribute   │ Value                                                          │
        ├─────────────┼────────────────────────────────────────────────────────────────┤
        │ parent_id   │ bde1ab74306c2f63188069879e3945ac                               │
        │ granularity │ 1                                                              │
        │ blob        │ b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x0… │
        │             │ (length: 56810)                                                │
        │ mime_type   │ image/jpeg                                                     │
        │ uri         │ test-1.jpeg                                                    │
        ╰─────────────┴────────────────────────────────────────────────────────────────╯

Specifically, setter defines how the value is stored in the sub-document. Usually you process the value and fill the result into one of the attributes defined by the Document schema. You may also want to keep the original value so that getter can recover it later. setter is invoked when Document() is called on an instance of the dataclass.

getter defines how the original value is recovered from the sub-document. getter is invoked when the dataclass constructor is called with a Document object.
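
The contract between setter and getter can be sketched without docarray at all. In the snippet below, SketchDocument is a hypothetical stand-in for Document with only uri and blob fields, and the setter merely encodes the string rather than reading a file. It is a minimal illustration of the round trip, assuming the setter keeps the original value in .uri so that the getter can return it unchanged.

```python
from typing import NamedTuple, Optional


class SketchDocument(NamedTuple):
    """Hypothetical stand-in for docarray's Document (assumption: two fields only)."""

    uri: Optional[str] = None
    blob: Optional[bytes] = None


def sketch_setter(value: str) -> SketchDocument:
    # Store a processed form in .blob and keep the original value in .uri.
    # A real setter would load the file; here the string is only encoded.
    return SketchDocument(uri=value, blob=value.encode('utf-8'))


def sketch_getter(doc: SketchDocument) -> str:
    # Recover the original value from the attribute where the setter kept it.
    return doc.uri


original = 'test-1.jpeg'
stored = sketch_setter(original)     # dataclass value -> sub-document
recovered = sketch_getter(stored)    # sub-document -> dataclass value
```

If the setter discarded the original value (for example, by storing only the blob), the getter would have nothing to recover it from, which is why keeping it in an attribute such as .uri is useful.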

Override existing types

To override the getter and setter behavior of an existing type, define a map and pass it as the type_var_map argument of the dataclass() function.

from docarray import dataclass, field, Document
from docarray.typing import Image


def my_setter(value) -> 'Document':
    print('im setting .uri only not loading it!')
    return Document(uri=value)


def my_getter(doc: 'Document'):
    print('im returning .uri!')
    return doc.uri


@dataclass(
    type_var_map={
        Image: lambda x: field(setter=my_setter, getter=my_getter, _source_field=x)
    }
)
class MMDoc:
    banner: Image = field(setter=my_setter, getter=my_getter, default='test-1.jpeg')


m1 = MMDoc()
m2 = MMDoc(Document(m1))

assert m1 == m2

im setting .uri only not loading it!
im returning .uri!
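
One way to picture a type-level override is as a lookup from a type annotation to the setter/getter pair applied to fields of that type, where user-supplied entries take precedence over the defaults. The sketch below is a library-free mental model, not docarray's implementation: DEFAULT_HANDLERS, resolve_handlers, the handler functions, and the Image TypeVar defined here are all hypothetical names, and plain dicts stand in for Document.

```python
from typing import Callable, Dict, Optional, Tuple, TypeVar

# Hypothetical stand-in for docarray.typing.Image so the sketch runs on its own.
Image = TypeVar('Image', bound=str)

Handlers = Tuple[Callable, Callable]  # (setter, getter)


def default_setter(value):
    # Assumed default behavior: keep the URI and pretend to load a tensor.
    return {'uri': value, 'tensor': f'<loaded from {value}>'}


def default_getter(doc):
    return doc['uri']


def uri_only_setter(value):
    # Override: keep the URI only, without loading anything.
    return {'uri': value}


DEFAULT_HANDLERS: Dict[object, Handlers] = {Image: (default_setter, default_getter)}


def resolve_handlers(annotation, overrides: Optional[Dict[object, Handlers]] = None) -> Handlers:
    # Entries in `overrides` replace the defaults for the same annotation.
    merged = {**DEFAULT_HANDLERS, **(overrides or {})}
    return merged[annotation]


setter, getter = resolve_handlers(
    Image, overrides={Image: (uri_only_setter, default_getter)}
)
doc = setter('test-1.jpeg')
```

With the override in place, every field annotated with Image in this sketch would be stored with the URI only, which mirrors the intent of the example above.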