- class docarray.array.mixins.dataloader.DataLoaderMixin
- classmethod dataloader(path, func, batch_size, protocol='protobuf', compress=None, backend='thread', num_worker=None, pool=None, show_progress=False)
Load array elements, group them into batches, and map each batch with a function in parallel, finally yielding each batch as a DocumentArray.
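The batch-then-map flow described above can be illustrated with the standard library alone. This is a minimal sketch, not docarray's implementation: `iter_batches` and `map_batches` are hypothetical helper names, plain Python sequences stand in for a DocumentArray, and the sketch assumes the value yielded for each batch is whatever `func` returns.

```python
from itertools import islice
from multiprocessing.pool import ThreadPool
from typing import Callable, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items (hypothetical helper)."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def map_batches(
    items: Iterable[T],
    func: Callable[[List[T]], R],
    batch_size: int,
    num_worker: Optional[int] = None,
) -> Iterator[R]:
    """Apply func to each batch in a thread pool, yielding results in input order."""
    # processes=None makes ThreadPool fall back to the number of CPUs.
    with ThreadPool(processes=num_worker) as pool:
        # imap preserves the order of the batches.
        yield from pool.imap(func, iter_batches(items, batch_size))


sizes = list(map_batches(range(10), len, batch_size=4))
print(sizes)  # [4, 4, 2]
```

Ten elements with a batch size of 4 give two full batches and a final, smaller one, which matches the behaviour documented for `batch_size`.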
path (Union[str, Path]) – Path or filename where the data is stored.
func (Callable[[DocumentArray], T]) – a function that takes a DocumentArray as input and outputs anything. You can either modify elements in place (only with the thread backend) or work later on the returned elements.
batch_size (int) – Size of each generated batch (except the last one, which might be smaller).
protocol (str) – serialization protocol to use.
compress (Optional[str]) – compression algorithm to use.
backend (str) – whether to use multi-process or multi-thread as the parallelization backend. In general, if func is IO-bound, thread is probably good enough; if func is CPU-bound, you may use process. In practice, you should try both to find the best value. However, if you wish to modify the elements in place, you should always use the thread backend, regardless of whether func is IO-bound or CPU-bound.
When using the process backend, you should not expect func to modify elements in place. This is because the multiprocessing backend passes the variable via pickle and works in another process, so the passed object and the original object do not share the same memory.
num_worker (Optional[int]) – the number of parallel workers. If not given, the number of CPUs in the system will be used.
pool – use an existing/external pool. If given, backend is ignored and you are responsible for closing the pool.
show_progress (bool) – if set, show a progress bar.
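The difference between the two backends for in-place modification can be reproduced without docarray. The following is a minimal sketch under stated assumptions: plain dicts stand in for Documents, `tag_batch` is a hypothetical `func`, and the process backend is simulated with a pickle round trip rather than by spawning real worker processes.

```python
import pickle
from multiprocessing.pool import ThreadPool


def tag_batch(batch):
    """Mutate every element of the batch in place and return the batch."""
    for item in batch:
        item["seen"] = True
    return batch


batches = [[{"id": 0}, {"id": 1}], [{"id": 2}]]

# Thread backend: workers receive references to the caller's objects,
# so in-place edits are visible once the pool has finished.
with ThreadPool(processes=2) as pool:
    pool.map(tag_batch, batches)
thread_visible = all(item.get("seen") for batch in batches for item in batch)

# Process backend (simulated): arguments are pickled on their way to the worker,
# so func operates on a copy and the originals stay untouched.
originals = [[{"id": 0}, {"id": 1}], [{"id": 2}]]
returned = [tag_batch(pickle.loads(pickle.dumps(batch))) for batch in originals]
process_visible = any("seen" in item for batch in originals for item in batch)

print(thread_visible)   # True
print(process_visible)  # False
print(returned[0][0])   # {'id': 0, 'seen': True}
```

With a process-style backend, the modified data is only available through the values that `func` returns, so return the batch from `func` and use the yielded result instead of relying on side effects.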
- Return type: