Construct#

Construct an empty array#

from docarray import DocumentArray

da = DocumentArray()
<DocumentArray (length=0) at 4453362704>

Now you can use list-like interfaces such as .append() and .extend(), just as you would to add elements to a Python list.

from docarray import Document

da.append(Document(text='hello world!'))
da.extend([Document(text='hello'), Document(text='world!')])
<DocumentArray (length=3) at 4446140816>

Printing a DocumentArray directly does not show much useful information; use summary() for a fuller overview.

da.summary()
                  Documents Summary                   
                                                      
  Length                 3                            
  Homogenous Documents   True                         
  Common Attributes      ('id', 'mime_type', 'text')  
                                                      
                     Attributes Summary                     
                                                            
  Attribute   Data type   #Unique values   Has empty value  
 ────────────────────────────────────────────────────────── 
  id          ('str',)    3                False            
  mime_type   ('str',)    1                False            
  text        ('str',)    3                False    

Construct with empty Documents#

As with numpy.zeros(), you can quickly build a DocumentArray containing only empty Documents:

from docarray import DocumentArray

da = DocumentArray.empty(10)
<DocumentArray (length=10) at 4453362704>

Construct from list-like objects#

You can construct a DocumentArray from a Sequence, List, Tuple, or Iterator that yields Document objects.

from docarray import DocumentArray, Document

da = DocumentArray([Document(text='hello'), Document(text='world')])
<DocumentArray (length=2) at 4866772176>

from docarray import DocumentArray, Document

da = DocumentArray((Document() for _ in range(10)))
<DocumentArray (length=10) at 4866772176>
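
One thing to keep in mind when passing an Iterator: a generator can be consumed only once. The sketch below uses a plain Python list as a stand-in for DocumentArray (an assumption made so the snippet runs without docarray installed), and make_items is a hypothetical helper. The exhaustion behavior belongs to the generator itself, not to the container.

```python
# Plain-Python sketch: a list stands in for DocumentArray here (assumption).
# The point is that a generator is exhausted after its first full iteration.

def make_items(n):
    """Hypothetical helper that yields n placeholder records."""
    for i in range(n):
        yield {'text': f'item {i}'}

gen = make_items(3)

first = list(gen)   # consumes the generator
second = list(gen)  # nothing is left to yield

print(len(first))   # 3
print(len(second))  # 0
```

If the same Documents are needed twice, materialize them in a list first or recreate the generator.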

Because a DocumentArray is itself a “list-like object that yields Document”, you can also construct a DocumentArray from another DocumentArray:

da = DocumentArray(...)
da1 = DocumentArray(da)

Construct from multiple DocumentArray#

You can use + or += to concatenate DocumentArrays together:

from docarray import DocumentArray

da1 = DocumentArray.empty(3)
da2 = DocumentArray.empty(4)
da3 = DocumentArray.empty(5)
print(da1 + da2 + da3)

da1 += da2
print(da1)
<DocumentArray (length=12) at 5024988176>
<DocumentArray (length=7) at 4525853328>
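
The output above shows that + produces a new, longer array (3 + 4 + 5 = 12) while += grows da1 itself (3 + 4 = 7). The same distinction can be seen with plain Python lists, used here as a stand-in for DocumentArray (an assumption for illustration, not a statement about DocumentArray internals):

```python
# Plain lists stand in for DocumentArray (assumption for illustration).
a = [1, 2, 3]
b = [4, 5, 6, 7]

alias = a          # second name for the same list object
combined = a + b   # builds a new list; a is untouched

print(len(combined), len(a))  # 7 3
print(combined is a)          # False

a += b             # extends the existing list in place

print(len(a))      # 7
print(alias is a)  # True: still the same object, now longer
```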

Construct from a single Document#

from docarray import DocumentArray, Document

d1 = Document(text='hello')
da = DocumentArray(d1)
<DocumentArray (length=1) at 4452802192>

Deep copy on elements#

Note that, as with a Python list, adding a Document object to a DocumentArray only adds a reference to it in memory. The original Document is not copied. If you change the original Document afterwards, the one inside the DocumentArray changes too. Here is an example:

from docarray import DocumentArray, Document

d1 = Document(text='hello')
da = DocumentArray(d1)

print(da[0].text)
d1.text = 'world'
print(da[0].text)
hello
world

This may surprise some users, but the following plain Python code shows that this behavior is natural and consistent with how Python lists work.

d = {'hello': None}
a = [d]

print(a[0]['hello'])
d['hello'] = 'world'
print(a[0]['hello'])
None
world

To make a deep copy, set DocumentArray(..., copy=True). All Documents in this DocumentArray are then completely new objects with the same contents as the originals.

from docarray import DocumentArray, Document

d1 = Document(text='hello')
da = DocumentArray(d1, copy=True)

print(da[0].text)
d1.text = 'world'
print(da[0].text)
hello
hello
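
For comparison, the plain-Python counterpart of copy=True is copy.deepcopy. The sketch below continues the list-of-dict analogy from earlier in this section. It illustrates reference versus deep-copy semantics in general and is not meant to describe how DocumentArray implements copy=True internally.

```python
import copy

original = [{'text': 'hello'}]

shared = list(original)                # new list, same element objects
independent = copy.deepcopy(original)  # new list, new element objects

# Mutate the element through the original list.
original[0]['text'] = 'world'

print(shared[0]['text'])               # world
print(independent[0]['text'])          # hello
print(shared[0] is original[0])        # True
print(independent[0] is original[0])   # False
```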

Construct from local files#

You may recall the common pattern mentioned here. With from_files(), you can easily construct a DocumentArray from all file paths matched by a glob expression.

from docarray import DocumentArray

da_jpg = DocumentArray.from_files('images/*.jpg')
da_png = DocumentArray.from_files('images/*.png')
da_all = DocumentArray.from_files(['images/**/*.png', 'images/**/*.jpg', 'images/**/*.jpeg'])

This scans all filenames that match the expression and constructs Documents with the .uri attribute filled in. The read_mode argument controls whether each file is read as text or as binary.
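
To get a feel for which files a pattern such as 'images/**/*.png' selects, the sketch below runs Python's standard glob module on a temporary directory. It assumes from_files() follows the same glob rules with recursive matching turned on, which the text above does not confirm. The directory layout, the file names, and the match helper are made up for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: images/a.png, images/b.jpg, images/sub/c.png
    os.makedirs(os.path.join(root, 'images', 'sub'))
    for rel in ('images/a.png', 'images/b.jpg', 'images/sub/c.png'):
        open(os.path.join(root, *rel.split('/')), 'w').close()

    def match(pattern):
        """Return matches relative to root, with forward slashes, sorted."""
        full = os.path.join(glob.escape(root), pattern)
        hits = glob.glob(full, recursive=True)
        return sorted(os.path.relpath(h, root).replace(os.sep, '/') for h in hits)

    flat_png = match('images/*.png')     # top level only
    deep_png = match('images/**/*.png')  # top level and any subdirectory

print(flat_png)  # ['images/a.png']
print(deep_png)  # ['images/a.png', 'images/sub/c.png']
```

With recursive matching, ** also matches zero directories, so a file directly under images/ is picked up by both patterns.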

What’s next?#

In the next chapter, we will see how to construct a DocumentArray from binary bytes, JSON, CSV, a dataframe, or a Protobuf message.