Nested Structure

A Document can be nested both horizontally and vertically, via .matches and .chunks respectively. The picture below illustrates the recursive Document structure.

[Figure: recursive Document structure (nested-structure.svg)]

Attribute         Description
doc.chunks        The list of sub-Documents of this Document. Each has granularity + 1 relative to this Document but the same adjacency.
doc.matches       The list of matched Documents of this Document. Each has adjacency + 1 relative to this Document but the same granularity.
doc.granularity   The “depth” of the nested chunks structure.
doc.adjacency     The “width” of the nested matches structure.
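
The bookkeeping described in the table can be illustrated with a small pure-Python stand-in. This is only a sketch, not DocArray's implementation: `ToyDoc`, `add_chunk`, and `add_match` are hypothetical names introduced here to show how granularity and adjacency change with nesting.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a Document; not the DocArray class."""

    granularity: int = 0  # depth in the chunk hierarchy
    adjacency: int = 0    # width in the match hierarchy
    chunks: List["ToyDoc"] = field(default_factory=list)
    matches: List["ToyDoc"] = field(default_factory=list)

    def add_chunk(self) -> "ToyDoc":
        # A chunk sits one level deeper but keeps the parent's adjacency.
        child = ToyDoc(granularity=self.granularity + 1, adjacency=self.adjacency)
        self.chunks.append(child)
        return child

    def add_match(self) -> "ToyDoc":
        # A match sits one step further out but keeps the same granularity.
        neighbour = ToyDoc(granularity=self.granularity, adjacency=self.adjacency + 1)
        self.matches.append(neighbour)
        return neighbour


root = ToyDoc()
chunk = root.add_chunk()
match_of_chunk = chunk.add_match()
print(chunk.granularity, chunk.adjacency)                    # 1 0
print(match_of_chunk.granularity, match_of_chunk.adjacency)  # 1 1
```

A match of a chunk therefore carries both offsets: it is one level deep and one step wide relative to the root.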

You can add chunks (sub-Documents) and matches (neighbour Documents) to a Document:

  • Add in constructor:

    from docarray import Document
    d = Document(chunks=[Document(), Document()], matches=[Document(), Document()])
    
  • Add to existing Document:

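    from docarray import Document

    d = Document()
    d.chunks = [Document(), Document()]  # assign the whole list of sub-Documents
    d.matches = [Document(), Document()]  # assign the whole list of matched Documents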
    
  • Add to existing doc.chunks or doc.matches:

    from docarray import Document

    d = Document()
    d.chunks.append(Document())  # add one sub-Document
    d.matches.append(Document())  # add one matched Document
    

Both doc.chunks and doc.matches return a DocumentArray.

To get a clear picture of a nested Document, use summary(), e.g.:

d.summary()
 <Document ('id', 'chunks', 'matches') at 7f907d786d6c11ec840a1e008a366d49>
    └─ matches
          ├─ <Document ('id', 'adjacency') at 7f907c606d6c11ec840a1e008a366d49>
          └─ <Document ('id', 'adjacency') at 7f907cba6d6c11ec840a1e008a366d49>
    └─ chunks
          ├─ <Document ('id', 'parent_id', 'granularity') at 7f907ab26d6c11ec840a1e008a366d49>
          └─ <Document ('id', 'parent_id', 'granularity') at 7f907c106d6c11ec840a1e008a366d49>
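
To give a rough idea of how such a tree view can be produced, the sketch below walks a nested structure recursively and emits one indented line per Document. It uses plain dicts rather than real Documents, and `render_tree` is a hypothetical helper, not part of the library; the output format is deliberately simpler than what summary() prints.

```python
from typing import Any, Dict, List


def render_tree(doc: Dict[str, Any], indent: int = 0) -> List[str]:
    """Return indented lines describing `doc` and its nested matches/chunks."""
    pad = " " * indent
    lines = [f"{pad}<doc {doc['id']}>"]
    for key in ("matches", "chunks"):
        children = doc.get(key, [])
        if children:
            # Label the branch, then render each child one level deeper.
            lines.append(f"{pad}  {key}")
            for child in children:
                lines.extend(render_tree(child, indent + 4))
    return lines


doc = {
    "id": "root",
    "matches": [{"id": "m1"}],
    "chunks": [{"id": "c1", "chunks": [{"id": "c1-1"}]}],
}
print("\n".join(render_tree(doc)))
```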

What’s next?

When you have multiple Documents with nested structures, traversing specific chunks and matches can be crucial. Fortunately, DocumentArray makes this simple, as shown in Access Documents.
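
As a rough illustration of what such a traversal does, here is a pure-Python sketch over plain dicts. It is not the DocumentArray API described in Access Documents: `walk_path` is a hypothetical helper, and the path letters (`c` for "descend into chunks", `m` for "move to matches") are assumptions made for this example.

```python
from typing import Any, Dict, Iterator, List

ToyDoc = Dict[str, Any]  # e.g. {"id": "a", "chunks": [...], "matches": [...]}

STEP = {"c": "chunks", "m": "matches"}  # path letters are an assumption of this sketch


def walk_path(docs: List[ToyDoc], path: str) -> Iterator[ToyDoc]:
    """Yield every document reached by following `path` from each doc in `docs`."""
    if not path:
        yield from docs
        return
    key = STEP[path[0]]
    for doc in docs:
        # Follow one step, then recurse on the rest of the path.
        yield from walk_path(doc.get(key, []), path[1:])


docs = [
    {
        "id": "d1",
        "chunks": [
            {"id": "d1-c1", "matches": [{"id": "d1-c1-m1"}]},
            {"id": "d1-c2"},
        ],
        "matches": [{"id": "d1-m1"}],
    }
]

print([d["id"] for d in walk_path(docs, "c")])   # ['d1-c1', 'd1-c2']
print([d["id"] for d in walk_path(docs, "m")])   # ['d1-m1']
print([d["id"] for d in walk_path(docs, "cm")])  # ['d1-c1-m1']
```

Each letter in the path moves one step through the nested structure, so `cm` collects the matches of every chunk.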

Note that some methods fill these two attributes, while others require them to be filled in advance. For example, match() fills .matches, whereas evaluate() requires .matches to be filled already.