docarray.document.mixins.image module

class docarray.document.mixins.image.ImageDataMixin

Bases: object

Provide helper functions for Document to support image data.

set_image_tensor_channel_axis(original_channel_axis, new_channel_axis)

Move the channel axis of the image tensor in place.

Parameters:
  • original_channel_axis (int) – the original axis of the channel

  • new_channel_axis (int) – the new axis of the channel

Return type:

T

Returns:

the Document itself after processing
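
The axis move described above can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: `move_channel_axis` is a hypothetical helper, and it assumes the tensor is a 3-dimensional `numpy.ndarray`.

```python
import numpy as np


def move_channel_axis(tensor, original_channel_axis, new_channel_axis):
    """Hypothetical helper: return a view of `tensor` with the channel axis moved."""
    return np.moveaxis(tensor, original_channel_axis, new_channel_axis)


hwc = np.zeros((32, 48, 3), dtype=np.uint8)  # height, width, channel
chw = move_channel_axis(hwc, -1, 0)          # channel, height, width
print(chw.shape)  # (3, 32, 48)
```

Moving the axis back with `move_channel_axis(chw, 0, -1)` restores the original layout.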

load_pil_image_to_datauri(image)

Convert a Pillow image into a data URI with the header data:image/png.

Parameters:

image (PILImage) – a pillow image

Returns:

the Document itself after processing
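
To show what a data URI with the header data:image/png looks like, here is a standard-library sketch. `png_bytes_to_datauri` is a hypothetical helper that works on already-encoded PNG bytes, and the base64 payload format is an assumption about the URI layout rather than something this page states. With Pillow, the bytes would typically come from `image.save(buffer, format='PNG')` on an `io.BytesIO` buffer.

```python
import base64


def png_bytes_to_datauri(png_bytes):
    """Hypothetical helper: wrap encoded PNG bytes in a base64 data URI."""
    return 'data:image/png;base64,' + base64.b64encode(png_bytes).decode()


# The 8-byte PNG signature stands in for a fully encoded image here.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'
uri = png_bytes_to_datauri(PNG_SIGNATURE)
print(uri.split(',', 1)[0])  # data:image/png;base64
```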

convert_blob_to_image_tensor(width=None, height=None, channel_axis=-1)

Convert an image blob to an ndarray tensor.

Parameters:
  • width (Optional[int]) – the width of the image tensor.

  • height (Optional[int]) – the height of the image tensor.

  • channel_axis (int) – the axis id of the color channel, -1 indicates the color channel info at the last axis

Return type:

T

Returns:

the Document itself after processing

convert_image_tensor_to_uri(channel_axis=-1, image_format='png')

Assuming tensor is a valid image, set uri accordingly.

Parameters:
  • channel_axis (int) – the axis id of the color channel, -1 indicates the color channel info at the last axis

  • image_format (str) – either png or jpeg

Return type:

T

Returns:

the Document itself after processing

convert_image_tensor_to_blob(channel_axis=-1, image_format='png')

Assuming tensor is a valid image, set blob accordingly.

Parameters:
  • channel_axis (int) – the axis id of the color channel, -1 indicates the color channel info at the last axis

  • image_format (str) – either png or jpeg

Return type:

T

Returns:

the Document itself after processing

set_image_tensor_resample(ratio, channel_axis=-1)

Resample the image tensor to a different size in place.

Parameters:
  • ratio (float) – scale ratio of the resampled image tensor.

  • channel_axis (int) – the axis id of the color channel, -1 indicates the color channel info at the last axis

Return type:

T

Returns:

the Document itself after processing

set_image_tensor_shape(shape, channel_axis=-1)

Resample the image tensor to the given shape in place.

If your current image tensor has shape [H,W,C], then the new tensor will have shape [*shape, C].

Parameters:
  • shape (Tuple[int, int]) – the new shape of the image tensor.

  • channel_axis (int) – the axis id of the color channel, -1 indicates the color channel info at the last axis

Return type:

T

Returns:

the Document itself after processing
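
The shape rule above ([H,W,C] becomes [*shape, C]) can be checked with a small NumPy sketch. `resize_nearest` is a hypothetical helper, and nearest-neighbor sampling is an assumption chosen for brevity; this page does not say which interpolation the library uses.

```python
import numpy as np


def resize_nearest(tensor, shape):
    """Hypothetical helper: nearest-neighbor resize of an (H, W, C) tensor."""
    h, w = tensor.shape[:2]
    new_h, new_w = shape
    rows = np.arange(new_h) * h // new_h  # source row for each output row
    cols = np.arange(new_w) * w // new_w  # source column for each output column
    return tensor[rows[:, None], cols]    # result has shape (new_h, new_w, C)


img = np.arange(4 * 6 * 3).reshape(4, 6, 3)
out = resize_nearest(img, (2, 3))
print(out.shape)  # (2, 3, 3)
```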

save_image_tensor_to_file(file, channel_axis=-1, image_format='png')

Save tensor to a file.

Parameters:
  • file (Union[str, BinaryIO]) – File or filename to which the data is saved.

  • channel_axis (int) – the axis id of the color channel, -1 indicates the color channel info at the last axis

  • image_format (str) – either png or jpeg

Return type:

T

Returns:

the Document itself after processing

load_uri_to_image_tensor(width=None, height=None, channel_axis=-1, **kwargs)

Convert the image-like uri into a tensor.

Parameters:
  • width (Optional[int]) – the width of the image tensor.

  • height (Optional[int]) – the height of the image tensor.

  • channel_axis (int) – the axis id of the color channel, -1 indicates the color channel info at the last axis

  • kwargs – keyword arguments to pass to _uri_to_blob, such as timeout

Return type:

T

Returns:

the Document itself after processing

set_image_tensor_inv_normalization(channel_axis=-1, img_mean=(0.485, 0.456, 0.406), img_std=(0.229, 0.224, 0.225))

Invert the normalization of a float32 image tensor, converting it back into a uint8 image tensor in place.

Parameters:
  • channel_axis (int) – the axis id of the color channel, -1 indicates the color channel info at the last axis

  • img_mean (Tuple[float]) – the mean of all images

  • img_std (Tuple[float]) – the standard deviation of all images

Return type:

T

Returns:

the Document itself after processing
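
The arithmetic behind the inverse step can be sketched in NumPy as follows. `inv_normalize` is a hypothetical helper, not the library's code; the rounding and clipping before the uint8 cast are assumptions added here to keep the conversion well defined.

```python
import numpy as np

IMG_MEAN = (0.485, 0.456, 0.406)
IMG_STD = (0.229, 0.224, 0.225)


def inv_normalize(tensor, channel_axis=-1, img_mean=IMG_MEAN, img_std=IMG_STD):
    """Hypothetical helper: map a normalized float32 image back to uint8."""
    mean = np.asarray(img_mean, dtype=np.float32)
    std = np.asarray(img_std, dtype=np.float32)
    chw = np.moveaxis(tensor, channel_axis, 0)  # channel first for broadcasting
    restored = (chw * std[:, None, None] + mean[:, None, None]) * 255
    # Rounding and clipping are assumptions of this sketch.
    restored = np.clip(np.rint(restored), 0, 255).astype(np.uint8)
    return np.moveaxis(restored, 0, channel_axis)  # restore the original layout


normalized = np.zeros((2, 2, 3), dtype=np.float32)  # every pixel equals the mean
pixels = inv_normalize(normalized)
print(pixels[0, 0].tolist())  # [124, 116, 104]
```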

set_image_tensor_normalization(channel_axis=-1, img_mean=(0.485, 0.456, 0.406), img_std=(0.229, 0.224, 0.225))

Normalize a uint8 image tensor into a float32 image tensor in place.

Following the PyTorch standard, the image must have shape (3 x H x W). It is first scaled to the range [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. These two arrays were computed from millions of images. If you want to train from scratch on your own dataset, you can calculate new mean and std values. Otherwise, using the ImageNet pretrained model with its own mean and std is recommended.

Parameters:
  • channel_axis (int) – the axis id of the color channel, -1 indicates the color channel info at the last axis

  • img_mean (Tuple[float]) – the mean of all images

  • img_std (Tuple[float]) – the standard deviation of all images

Return type:

T

Returns:

the Document itself after processing

Warning

Do NOT generalize this function to grayscale or black-and-white images; it does not make sense for non-RGB images. In the PyTorch MNIST examples, the mean and stddev are 1-dimensional (since the inputs are grayscale, with no RGB channels).
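
The two-step normalization described for set_image_tensor_normalization (scale to [0, 1], then subtract the mean and divide by the std per channel) can be sketched in NumPy. `normalize` is a hypothetical helper for illustration, assuming a 3-channel RGB uint8 tensor; it is not the library's implementation.

```python
import numpy as np

IMG_MEAN = (0.485, 0.456, 0.406)
IMG_STD = (0.229, 0.224, 0.225)


def normalize(tensor, channel_axis=-1, img_mean=IMG_MEAN, img_std=IMG_STD):
    """Hypothetical helper: uint8 RGB image -> per-channel normalized float32."""
    scaled = tensor.astype(np.float32) / 255.0  # step 1: scale to [0, 1]
    chw = np.moveaxis(scaled, channel_axis, 0)  # channel first for broadcasting
    mean = np.asarray(img_mean, dtype=np.float32)
    std = np.asarray(img_std, dtype=np.float32)
    normalized = (chw - mean[:, None, None]) / std[:, None, None]  # step 2
    return np.moveaxis(normalized, 0, channel_axis)  # restore the original layout


white = np.full((2, 2, 3), 255, dtype=np.uint8)
result = normalize(white)
print(result.dtype, result.shape)  # float32 (2, 2, 3)
```

For a pure white pixel, each channel comes out as (1 - mean) / std, for example (1 - 0.485) / 0.229 for the red channel.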

convert_image_tensor_to_sliding_windows(window_shape=(64, 64), strides=None, padding=False, channel_axis=-1, as_chunks=False)

Convert tensor into a sliding-window view with the given window shape, in place.

Parameters:
  • window_shape (Tuple[int, int]) – desired output size of each window. If window_shape is a sequence like (h, w), each window will match this size. If window_shape is an int, each window will have that value as both its height and width.

  • strides (Optional[Tuple[int, int]]) – the strides between two neighboring sliding windows. strides is a sequence like (h, w), denoting the strides along the vertical and the horizontal axis. When not given, window_shape is used.

  • padding (bool) – If False, only patches which are fully contained in the input image are included. If True, all patches whose starting point is inside the input are included, and areas outside the input default to zero. The padding argument has no effect on the size of each patch; it only determines how many patches are extracted. Default is False.

  • channel_axis (int) – the axis id of the color channel, -1 indicates the color channel info at the last axis.

  • as_chunks (bool) – If set, each sliding window will be stored as a chunk of the current Document

Return type:

T

Returns:

the Document itself after processing
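
The interaction of window_shape and strides with padding=False can be sketched using NumPy's `sliding_window_view` (available from NumPy 1.20). `sliding_windows` is a hypothetical helper; the stacked output layout (num_windows, h, w, C) is an assumption of this sketch, and padding and as_chunks are not modeled.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_windows(tensor, window_shape=(64, 64), strides=None):
    """Hypothetical helper: stack all fully contained windows of an (H, W, C) tensor."""
    strides = strides or window_shape  # default stride equals the window shape
    # Windows over the H and W axes: shape (H-h+1, W-w+1, C, h, w).
    view = sliding_window_view(tensor, window_shape, axis=(0, 1))
    view = view[::strides[0], ::strides[1]]  # keep every stride-th start position
    view = np.moveaxis(view, 2, -1)          # (rows, cols, h, w, C)
    return view.reshape(-1, *window_shape, tensor.shape[-1])


img = np.arange(8 * 8 * 3).reshape(8, 8, 3)
tiles = sliding_windows(img, window_shape=(4, 4))
overlapping = sliding_windows(img, window_shape=(4, 4), strides=(2, 2))
print(tiles.shape, overlapping.shape)  # (4, 4, 4, 3) (9, 4, 4, 3)
```

With the default stride the windows tile the image without overlap; a smaller stride yields overlapping windows and therefore more of them.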