- class docarray.array.mixins.text.TextToolsMixin
Helper functions for NLP tasks on DA and DAM.
- get_vocabulary(min_freq=1, text_attrs=('text',))
Build a vocabulary from the text of all Documents, returned as a dict that maps each word to an integer index.
- Parameters:
text_attrs (Tuple[str, ...]) – the textual attributes from which the vocabulary is derived.
min_freq (int) – the minimum number of times a word must occur to be included in the vocabulary.
- Returns:
a vocabulary dict in which each key is a word and each value is its index. Indices start at 2, because 0 is reserved for padding and 1 for the unknown token.
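The index convention above (0 for padding, 1 for unknown, real words from 2) is easiest to see in a small standalone sketch. This is not docarray's implementation: the helper names `tokenize`, `build_vocabulary` and `encode` are hypothetical, and the lowercase-and-split-on-whitespace tokenizer is an assumption (docarray's own tokenization may differ). The sketch counts words, keeps those occurring at least `min_freq` times, numbers them from 2, and then uses the reserved indices when turning text into a fixed-length id sequence.

```python
from collections import Counter
from typing import Dict, Iterable, List

PAD_INDEX = 0  # reserved for padding
UNK_INDEX = 1  # reserved for unknown (out-of-vocabulary) words


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer (assumption): lowercase, split on whitespace.
    return text.lower().split()


def build_vocabulary(texts: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    # Count every word across all texts.
    counts: Counter = Counter()
    for text in texts:
        counts.update(tokenize(text))
    # Keep words seen at least min_freq times, in first-seen order,
    # and number them from 2 because 0 and 1 are reserved.
    kept = [word for word, freq in counts.items() if freq >= min_freq]
    return {word: idx for idx, word in enumerate(kept, start=2)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    # Map each word to its index, falling back to UNK_INDEX for unseen words,
    # then truncate to max_len and right-pad with PAD_INDEX.
    ids = [vocab.get(word, UNK_INDEX) for word in tokenize(text)][:max_len]
    return ids + [PAD_INDEX] * (max_len - len(ids))


if __name__ == "__main__":
    texts = ["hello world", "hello there"]
    vocab = build_vocabulary(texts)
    print(vocab)  # {'hello': 2, 'world': 3, 'there': 4}
    print(build_vocabulary(texts, min_freq=2))  # {'hello': 2}
    print(encode("hello stranger", vocab, max_len=4))  # [2, 1, 0, 0]
```

Reserving 0 and 1 keeps every real word's index distinct from the padding and out-of-vocabulary markers, so padded, encoded sequences can be fed to an embedding layer without ambiguity.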