docarray.document.mixins.featurehash module

class docarray.document.mixins.featurehash.FeatureHashMixin

Bases: object

Provide helper functions for feature hashing.

embed_feature_hashing(n_dim=256, sparse=False, fields=('text', 'tags'), max_value=1000000)

Convert an arbitrary set of attributes into a fixed-dimensional matrix using the hashing trick.
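As a rough illustration of the hashing trick itself, the sketch below maps a dictionary of named numeric features to a fixed-length vector. It is not the docarray implementation: the helper name `hash_features`, the use of MD5, and the sign bit are assumptions chosen for this example, and only the standard library is used.

```python
import hashlib


def hash_features(features, n_dim=256):
    """Hypothetical helper: map {name: numeric value} to a list of length n_dim."""
    vec = [0.0] * n_dim
    for name, value in features.items():
        digest = hashlib.md5(name.encode("utf-8")).digest()
        # The first 8 bytes of the digest select the output slot.
        index = int.from_bytes(digest[:8], "big") % n_dim
        # One further bit selects a sign, so colliding features tend to
        # cancel rather than always accumulate (an assumption of this sketch).
        sign = 1.0 if digest[8] & 1 == 0 else -1.0
        vec[index] += sign * value
    return vec


# Token counts for the text "hello hello world", hashed into 8 slots.
tokens = {"hello": 2, "world": 1}
vec = hash_features(tokens, n_dim=8)
```

The output length depends only on `n_dim`, not on how many distinct feature names appear, which is why a small `n_dim` forces more names to share a slot.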

  • n_dim (int) – the dimensionality of each document in the output embedding. A small n_dim makes hash collisions more likely, while a large n_dim produces larger embeddings and therefore larger parameter dimensions in anything that consumes them.

  • sparse (bool) – whether the resulting feature matrix should be a sparse csr_matrix or a dense ndarray. Note that the sparse option requires scipy.

  • fields (Tuple[str, ...]) – which attributes to consider for feature hashing.

Return type: