gluoncv.data.transforms

This file includes various transformations that are critical to vision tasks.

Bounding Box Transforms

crop

Crop bounding boxes according to slice area.

flip

Flip bounding boxes according to image flipping directions.

resize

Resize bounding boxes according to image resize operation.

translate

Translate bounding boxes by offsets.

experimental.bbox.random_crop_with_constraints

Crop an image randomly with bounding box constraints.

Image Transforms

imresize

Resize image with OpenCV.

resize_long

Resizes longer edge to size.

resize_short_within

Resizes shorter edge to size but make sure it’s capped at maximum size.

random_pca_lighting

Apply random pca lighting noise to input image.

random_expand

Randomly expand the original image with borders; this is identical to placing the original image on a larger canvas.

random_flip

Randomly flip image along horizontal and vertical with probabilities.

resize_contain

Resize the image to fit in the given area while keeping aspect ratio.

ten_crop

Crop 10 regions from an array.

Instance Segmentation Mask Transforms

flip

Flip polygons according to image flipping directions.

resize

Resize polygons according to image resize operation.

to_mask

Convert list of polygons to full size binary mask

fill

Fill mask to full image size

Preset Transforms

We include presets for reproducing SOTA performance described in different papers. This is a complementary section and its APIs are subject to change.

Single Shot Multibox Object Detector

load_test

A util function to load all images, transform them to tensor by applying normalizations.

transform_test

A util function to transform all images to tensors as network input by applying normalizations.

SSDDefaultTrainTransform

Default SSD training transform which includes tons of image augmentations.

SSDDefaultValTransform

Default SSD validation transform.

Faster RCNN

load_test

A util function to load all images, transform them to tensor by applying normalizations.

transform_test

A util function to transform all images to tensors as network input by applying normalizations.

FasterRCNNDefaultTrainTransform

Default Faster-RCNN training transform.

FasterRCNNDefaultValTransform

Default Faster-RCNN validation transform.

Mask RCNN

load_test

A util function to load all images, transform them to tensor by applying normalizations.

transform_test

A util function to transform all images to tensors as network input by applying normalizations.

MaskRCNNDefaultTrainTransform

Default Mask RCNN training transform.

MaskRCNNDefaultValTransform

Default Mask RCNN validation transform.

YOLO

load_test

A util function to load all images, transform them to tensor by applying normalizations.

transform_test

A util function to transform all images to tensors as network input by applying normalizations.

YOLO3DefaultTrainTransform

Default YOLO training transform which includes tons of image augmentations.

YOLO3DefaultValTransform

Default YOLO validation transform.

API Reference

Bounding boxes transformation functions.

gluoncv.data.transforms.bbox.affine_transform(pt, t)[source]

Apply affine transform to a bounding box given transform matrix t.

Parameters
Returns

New bounding box with shape (1, 2).

Return type

numpy.ndarray

gluoncv.data.transforms.bbox.crop(bbox, crop_box=None, allow_outside_center=True)[source]

Crop bounding boxes according to slice area.

This method is mainly used with image cropping to ensure bounding boxes fit within the cropped image.

Parameters
  • bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\), we allow additional attributes other than coordinates, which stay intact during bounding box transformations.

  • crop_box (tuple) – Tuple of length 4. \((x_{min}, y_{min}, width, height)\)

  • allow_outside_center (bool) – If False, remove bounding boxes which have centers outside cropping area.

Returns

Cropped bounding boxes with shape (M, 4+) where M <= N.

Return type

numpy.ndarray
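The cropping rule described above can be sketched in plain NumPy. This is an illustrative re-implementation, not the library source: the helper name crop_boxes_sketch is hypothetical, and it assumes boxes are clipped to the crop region, shifted into crop coordinates, and dropped when they end up with zero area.

```python
import numpy as np

def crop_boxes_sketch(bbox, crop_box, allow_outside_center=True):
    """Clip (N, 4+) boxes to crop_box = (x_min, y_min, width, height)."""
    bbox = np.asarray(bbox, dtype=np.float64).copy()
    x, y, w, h = crop_box
    region = np.array([x, y, x + w, y + h], dtype=np.float64)

    if allow_outside_center:
        keep = np.ones(len(bbox), dtype=bool)
    else:
        # Keep only boxes whose center lies inside the crop region.
        centers = (bbox[:, :2] + bbox[:, 2:4]) / 2
        keep = np.logical_and(region[:2] <= centers,
                              centers < region[2:]).all(axis=1)

    # Clip the corners to the region, then move the origin to the crop corner.
    bbox[:, :2] = np.maximum(bbox[:, :2], region[:2])
    bbox[:, 2:4] = np.minimum(bbox[:, 2:4], region[2:])
    bbox[:, :2] -= region[:2]
    bbox[:, 2:4] -= region[:2]

    # Drop boxes that have no area left after clipping.
    keep &= (bbox[:, :2] < bbox[:, 2:4]).all(axis=1)
    return bbox[keep]

boxes = np.array([[10, 10, 50, 50, 1],
                  [60, 60, 90, 90, 2],
                  [40, 40, 100, 100, 4]])
print(crop_boxes_sketch(boxes, (20, 20, 40, 40)))
print(crop_boxes_sketch(boxes, (20, 20, 40, 40), allow_outside_center=False))
```

The fifth column stands in for an extra attribute such as a class id and is carried through untouched.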

gluoncv.data.transforms.bbox.flip(bbox, size, flip_x=False, flip_y=False)[source]

Flip bounding boxes according to image flipping directions.

Parameters
  • bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\), we allow additional attributes other than coordinates, which stay intact during bounding box transformations.

  • size (tuple) – Tuple of length 2: (width, height).

  • flip_x (bool) – Whether to flip horizontally.

  • flip_y (bool) – Whether to flip vertically.

Returns

Flipped bounding boxes with original shape.

Return type

numpy.ndarray
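As a rough illustration of the coordinate arithmetic involved, the NumPy sketch below mirrors box corners around the image width and height. The name flip_boxes_sketch is hypothetical and the code is a sketch under that reading of the description, not the library implementation.

```python
import numpy as np

def flip_boxes_sketch(bbox, size, flip_x=False, flip_y=False):
    """Mirror (N, 4+) boxes inside an image of size (width, height)."""
    width, height = size
    bbox = np.asarray(bbox, dtype=np.float64).copy()
    if flip_x:
        # The old right edge becomes the new left edge, and vice versa.
        x_min = width - bbox[:, 2]
        x_max = width - bbox[:, 0]
        bbox[:, 0], bbox[:, 2] = x_min, x_max
    if flip_y:
        y_min = height - bbox[:, 3]
        y_max = height - bbox[:, 1]
        bbox[:, 1], bbox[:, 3] = y_min, y_max
    return bbox

box = np.array([[10, 10, 30, 40, 7]])
print(flip_boxes_sketch(box, (100, 80), flip_x=True))
print(flip_boxes_sketch(box, (100, 80), flip_x=True, flip_y=True))
```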

gluoncv.data.transforms.bbox.get_affine_transform(center, scale, rot, output_size, shift=array([0.0, 0.0], dtype=float32), inv=0)[source]

Get affine transform matrix given center, scale and rotation.

Parameters
  • center (tuple of float) – Center point.

  • scale (float) – Scaling factor.

  • rot (float) – Rotation degree.

  • output_size (tuple of int) – (width, height) of the output size.

  • shift (float) – Shift factor.

  • inv (bool) – Whether to invert the computation.

Returns

Affine matrix.

Return type

numpy.ndarray

gluoncv.data.transforms.bbox.resize(bbox, in_size, out_size)[source]

Resize bounding boxes according to image resize operation.

Parameters
  • bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\), we allow additional attributes other than coordinates, which stay intact during bounding box transformations.

  • in_size (tuple) – Tuple of length 2: (width, height) for input.

  • out_size (tuple) – Tuple of length 2: (width, height) for output.

Returns

Resized bounding boxes with original shape.

Return type

numpy.ndarray
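The scaling can be illustrated with a short NumPy sketch. It assumes x coordinates scale by out_width / in_width and y coordinates by out_height / in_height; resize_boxes_sketch is a hypothetical name, not the library function.

```python
import numpy as np

def resize_boxes_sketch(bbox, in_size, out_size):
    """Scale (N, 4+) boxes from in_size to out_size, both (width, height)."""
    bbox = np.asarray(bbox, dtype=np.float64).copy()
    x_scale = out_size[0] / in_size[0]
    y_scale = out_size[1] / in_size[1]
    bbox[:, [0, 2]] *= x_scale   # x_min, x_max
    bbox[:, [1, 3]] *= y_scale   # y_min, y_max
    return bbox

box = np.array([[10, 20, 30, 40, 5]])
print(resize_boxes_sketch(box, in_size=(100, 100), out_size=(200, 50)))
```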

gluoncv.data.transforms.bbox.translate(bbox, x_offset=0, y_offset=0)[source]

Translate bounding boxes by offsets.

Parameters
  • bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\), we allow additional attributes other than coordinates, which stay intact during bounding box transformations.

  • x_offset (int or float) – Offset along x axis.

  • y_offset (int or float) – Offset along y axis.

Returns

Translated bounding boxes with original shape.

Return type

numpy.ndarray

Additional image transforms.

class gluoncv.data.transforms.block.RandomCrop(size, pad=None, interpolation=2)[source]

Randomly crop src with size (width, height). Padding is optional. Upsample result if src is smaller than size.

Parameters
  • size (int or tuple of (W, H)) – Size of the final output.

  • pad (int or tuple) –

    If int, the size of the zero-padding. If tuple, the number of values padded to the edges of each axis:

    ((before_1, after_1), … (before_N, after_N)) gives unique pad widths for each axis. ((before, after),) yields the same before and after pad for each axis. (pad,) or int is a shortcut for before = after = pad width for all axes.

  • interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices.

Inputs:
  • data: input tensor with (Hi x Wi x C) shape.

Outputs:
  • out: output tensor with (size[0] x size[1] x C) or (size x size x C) shape.

forward(x)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters

*args (list of NDArray) – Input tensors.

class gluoncv.data.transforms.block.RandomErasing(probability=0.5, s_min=0.02, s_max=0.4, ratio=0.3, mean=(125.31, 122.96, 113.86))[source]

Randomly erase an area of src, covering between s_min and s_max of the image area, with the given probability. ratio controls the ratio between width and height. mean is the value used to fill the erased area.

Parameters
  • probability (float) – Probability of erasing.

  • s_min (float) – Minimum ratio of the erased area to the whole image area.

  • s_max (float) – Maximum ratio of the erased area to the whole image area.

  • ratio (float) – The ratio between width and height.

  • mean (int or tuple of (R, G, B)) – The value used to fill the erased area.

Inputs:
  • data: input tensor with (Hi x Wi x C) shape.

Outputs:
  • out: output tensor with (Hi x Wi x C) shape.

forward(x)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters

*args (list of NDArray) – Input tensors.

Extended image transformations to mxnet.image.

gluoncv.data.transforms.image.imresize(src, w, h, interp=1)[source]

Resize image with OpenCV.

This is a duplicate of mxnet.image.imresize for name space consistency.

Parameters
  • src (mxnet.nd.NDArray) – source image

  • w (int, required) – Width of resized image.

  • h (int, required) – Height of resized image.

  • interp (int, optional, default='1') – Interpolation method (default=cv2.INTER_LINEAR).

  • out (NDArray, optional) – The output NDArray to hold the result.

Returns

out – The output of this function.

Return type

NDArray or list of NDArrays

Examples

>>> import mxnet as mx
>>> from gluoncv import data as gdata
>>> img = mx.random.uniform(0, 255, (300, 300, 3)).astype('uint8')
>>> print(img.shape)
(300, 300, 3)
>>> img = gdata.transforms.image.imresize(img, 200, 200)
>>> print(img.shape)
(200, 200, 3)

gluoncv.data.transforms.image.random_expand(src, max_ratio=4, fill=0, keep_ratio=True)[source]

Randomly expand the original image with borders; this is identical to placing the original image on a larger canvas.

Parameters
  • src (mxnet.nd.NDArray) – The original image with HWC format.

  • max_ratio (int or float) – Maximum ratio of the output image to the input image in both directions (vertical and horizontal).

  • fill (int or float or array-like) – The value(s) for padded borders. If fill is a numerical type, RGB channels will be padded with a single value. Otherwise fill must have the same length as the image channels, which results in padding with per-channel values.

  • keep_ratio (bool) – If True, will keep output image the same aspect ratio as input.

Returns

  • mxnet.nd.NDArray – Augmented image.

  • tuple – Tuple of (offset_x, offset_y, new_width, new_height)
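A NumPy sketch of the "larger canvas" idea follows. It is an assumption-laden illustration rather than the library code: random_expand_sketch and its rng argument are hypothetical, the ratio is drawn uniformly from [1, max_ratio], and the paste position is drawn uniformly from the positions that fit. The returned offsets are what a caller would add to bounding box coordinates to keep them aligned with the expanded image.

```python
import numpy as np

def random_expand_sketch(img, max_ratio=4, fill=0, keep_ratio=True, rng=None):
    """Paste an HWC image at a random position on a larger filled canvas."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = img.shape
    ratio_x = rng.uniform(1, max_ratio)
    ratio_y = ratio_x if keep_ratio else rng.uniform(1, max_ratio)
    out_h, out_w = int(h * ratio_y), int(w * ratio_x)

    # Random top-left corner such that the image still fits on the canvas.
    off_y = int(rng.integers(0, out_h - h + 1))
    off_x = int(rng.integers(0, out_w - w + 1))

    # A scalar fill or a per-channel fill both broadcast over the canvas.
    canvas = np.empty((out_h, out_w, c), dtype=img.dtype)
    canvas[...] = np.asarray(fill, dtype=img.dtype)
    canvas[off_y:off_y + h, off_x:off_x + w] = img
    return canvas, (off_x, off_y, out_w, out_h)

img = np.full((4, 6, 3), 9, dtype=np.uint8)
out, (off_x, off_y, new_w, new_h) = random_expand_sketch(
    img, fill=(1, 2, 3), rng=np.random.default_rng(0))

# Boxes follow the image by adding the offsets to their coordinates.
boxes = np.array([[1.0, 1.0, 3.0, 2.0]])
boxes[:, [0, 2]] += off_x
boxes[:, [1, 3]] += off_y
print(out.shape, (off_x, off_y, new_w, new_h), boxes)
```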

gluoncv.data.transforms.image.random_flip(src, px=0, py=0, copy=False)[source]

Randomly flip image along horizontal and vertical with probabilities.

Parameters
  • src (mxnet.nd.NDArray) – Input image with HWC format.

  • px (float) – Horizontal flip probability [0, 1].

  • py (float) – Vertical flip probability [0, 1].

  • copy (bool) – If True, return a copy of input

Returns

  • mxnet.nd.NDArray – Augmented image.

  • tuple – Tuple of (flip_x, flip_y), records of whether flips are applied.

gluoncv.data.transforms.image.random_pca_lighting(src, alphastd, eigval=None, eigvec=None)[source]

Apply random pca lighting noise to input image.

Parameters
  • src (mxnet.nd.NDArray) – Input image with HWC format.

  • alphastd (float) – Noise level [0, 1) for image with range [0, 255].

  • eigval (list of floats) – Eigen values, defaults to [55.46, 4.794, 1.148].

  • eigvec (nested lists of floats) – Eigen vectors with shape (3, 3), defaults to [[-0.5675, 0.7192, 0.4009], [-0.5808, -0.0045, -0.8140], [-0.5836, -0.6948, 0.4203]].

Returns

Augmented image.

Return type

mxnet.nd.NDArray
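For intuition, PCA lighting noise of the AlexNet style can be sketched in NumPy as below. This is a sketch under assumptions, not the library code: pca_lighting_sketch and its rng argument are hypothetical, and the noise is assumed to be one RGB shift per image, built from Gaussian weights (standard deviation alphastd) applied to the eigenvalue-scaled eigenvectors. The default eigenvalues and eigenvectors are the ones listed above.

```python
import numpy as np

DEFAULT_EIGVAL = np.array([55.46, 4.794, 1.148])
DEFAULT_EIGVEC = np.array([[-0.5675, 0.7192, 0.4009],
                           [-0.5808, -0.0045, -0.8140],
                           [-0.5836, -0.6948, 0.4203]])

def pca_lighting_sketch(img, alphastd, eigval=None, eigvec=None, rng=None):
    """Add one random RGB shift along the principal color components."""
    if alphastd <= 0:
        return img  # nothing to add
    rng = np.random.default_rng() if rng is None else rng
    eigval = DEFAULT_EIGVAL if eigval is None else np.asarray(eigval)
    eigvec = DEFAULT_EIGVEC if eigvec is None else np.asarray(eigvec)

    alpha = rng.normal(0.0, alphastd, size=3)
    shift = eigvec @ (alpha * eigval)        # shape (3,), one value per channel
    return img.astype(np.float64) + shift    # broadcast over H and W

img = np.zeros((2, 2, 3))
out = pca_lighting_sketch(img, alphastd=0.1, rng=np.random.default_rng(0))
print(out[0, 0])  # the same RGB shift is applied to every pixel
```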

gluoncv.data.transforms.image.resize_contain(src, size, fill=0)[source]

Resize the image to fit in the given area while keeping aspect ratio.

If both the height and the width in size are larger than the height and the width of input image, the image is placed on the center with an appropriate padding to match size. Otherwise, the input image is scaled to fit in a canvas whose size is size while preserving aspect ratio.

Parameters
  • src (mxnet.nd.NDArray) – The original image with HWC format.

  • size (tuple) – Tuple of length 2 as (width, height).

  • fill (int or float or array-like) – The value(s) for padded borders. If fill is a numerical type, RGB channels will be padded with a single value. Otherwise fill must have the same length as the image channels, which results in padding with per-channel values.

Returns

  • mxnet.nd.NDArray – Augmented image.

  • tuple – Tuple of (offset_x, offset_y, scaled_x, scaled_y)
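The geometry behind the returned tuple can be sketched with plain arithmetic. The helper below is hypothetical and only computes sizes and offsets (no pixels are resized); it assumes the image is never upscaled and is centered on the canvas.

```python
def contain_geometry_sketch(in_size, out_size):
    """Return (offset_x, offset_y, scaled_x, scaled_y) for fitting an image
    of in_size = (width, height) into a canvas of out_size = (width, height)."""
    w, h = in_size
    out_w, out_h = out_size
    # Shrink only if the image does not fit; never enlarge.
    scale = min(out_w / w, out_h / h, 1)
    scaled_x, scaled_y = int(w * scale), int(h * scale)
    # Center the (possibly shrunk) image on the canvas.
    offset_x = (out_w - scaled_x) // 2
    offset_y = (out_h - scaled_y) // 2
    return offset_x, offset_y, scaled_x, scaled_y

print(contain_geometry_sketch((200, 100), (100, 100)))  # wide image is shrunk
print(contain_geometry_sketch((50, 40), (100, 100)))    # small image is only padded
```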

gluoncv.data.transforms.image.resize_long(src, size, interp=2)[source]

Resizes longer edge to size. Note: resize_long uses OpenCV (not the CV2 Python library). MXNet must have been built with OpenCV for resize_long to work. Resizes the original image by setting the longer edge to size and setting the shorter edge accordingly. This will ensure the new image will fit into the size specified. Resizing function is called from OpenCV.

Parameters
  • src (NDArray) – The original image.

  • size (int) – The length to be set for the longer edge.

  • interp (int, optional, default=2) – Interpolation method used for resizing the image. Possible values: 0: Nearest Neighbors Interpolation. 1: Bilinear interpolation. 2: Area-based (resampling using pixel area relation). It may be a preferred method for image decimation, as it gives moire-free results. But when the image is zoomed, it is similar to the Nearest Neighbors method. (used by default). 3: Bicubic interpolation over 4x4 pixel neighborhood. 4: Lanczos interpolation over 8x8 pixel neighborhood. 9: Cubic for enlarge, area for shrink, bilinear for others 10: Random select from interpolation method mentioned above. Note: When shrinking an image, it will generally look best with AREA-based interpolation, whereas, when enlarging an image, it will generally look best with Bicubic (slow) or Bilinear (faster but still looks OK). More details can be found in the documentation of OpenCV, please refer to http://docs.opencv.org/master/da/d54/group__imgproc__transform.html.

Returns

An ‘NDArray’ containing the resized image.

Return type

NDArray

Example

>>> import mxnet as mx
>>> with open("flower.jpeg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image)
>>> image
<NDArray 2321x3482x3 @cpu(0)>
>>> size = 640
>>> new_image = mx.img.resize_long(image, size)
>>> new_image
<NDArray 386x640x3 @cpu(0)>

gluoncv.data.transforms.image.resize_short_within(src, short, max_size, mult_base=1, interp=2)[source]

Resizes shorter edge to size but make sure it’s capped at maximum size. Note: resize_short_within uses OpenCV (not the CV2 Python library). MXNet must have been built with OpenCV for resize_short_within to work. Resizes the original image by setting the shorter edge to size and setting the longer edge accordingly. Also this function will ensure the new image will not exceed max_size even at the longer side. Resizing function is called from OpenCV.

Parameters
  • src (NDArray) – The original image.

  • short (int) – Resize shorter side to short.

  • max_size (int) – Make sure the longer side of new image is smaller than max_size.

  • mult_base (int, default is 1) – Width and height are rounded to multiples of mult_base.

  • interp (int, optional, default=2) – Interpolation method used for resizing the image. Possible values: 0: Nearest Neighbors Interpolation. 1: Bilinear interpolation. 2: Area-based (resampling using pixel area relation). It may be a preferred method for image decimation, as it gives moire-free results. But when the image is zoomed, it is similar to the Nearest Neighbors method. (used by default). 3: Bicubic interpolation over 4x4 pixel neighborhood. 4: Lanczos interpolation over 8x8 pixel neighborhood. 9: Cubic for enlarge, area for shrink, bilinear for others 10: Random select from interpolation method mentioned above. Note: When shrinking an image, it will generally look best with AREA-based interpolation, whereas, when enlarging an image, it will generally look best with Bicubic (slow) or Bilinear (faster but still looks OK). More details can be found in the documentation of OpenCV, please refer to http://docs.opencv.org/master/da/d54/group__imgproc__transform.html.

Returns

An ‘NDArray’ containing the resized image.

Return type

NDArray
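The sizing rule can be sketched with plain arithmetic. The helper short_within_size_sketch is hypothetical and computes only the target (height, width); it assumes the scale is first chosen from the shorter side, reduced if the rounded longer side would exceed max_size, and that both sides are then rounded to multiples of mult_base. Under these assumptions it reproduces the sizes shown in this function's Example for a 2321x3482 image.

```python
import math

def short_within_size_sketch(h, w, short, max_size, mult_base=1):
    """Return the (new_h, new_w) that a short-side resize with a cap would use."""
    size_min, size_max = min(h, w), max(h, w)
    scale = short / size_min
    # If the longer side would overshoot, scale to the largest allowed multiple.
    if round(scale * size_max / mult_base) * mult_base > max_size:
        scale = math.floor(max_size / mult_base) * mult_base / size_max
    new_w = int(round(w * scale / mult_base) * mult_base)
    new_h = int(round(h * scale / mult_base) * mult_base)
    return new_h, new_w

print(short_within_size_sketch(2321, 3482, short=800, max_size=1000))
print(short_within_size_sketch(2321, 3482, short=800, max_size=1200))
print(short_within_size_sketch(2321, 3482, short=800, max_size=1200, mult_base=32))
```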

Example

>>> import mxnet as mx
>>> from gluoncv.data.transforms.image import resize_short_within
>>> with open("flower.jpeg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image)
>>> image
<NDArray 2321x3482x3 @cpu(0)>
>>> new_image = resize_short_within(image, short=800, max_size=1000)
>>> new_image
<NDArray 667x1000x3 @cpu(0)>
>>> new_image = resize_short_within(image, short=800, max_size=1200)
>>> new_image
<NDArray 800x1200x3 @cpu(0)>
>>> new_image = resize_short_within(image, short=800, max_size=1200, mult_base=32)
>>> new_image
<NDArray 800x1184x3 @cpu(0)>

gluoncv.data.transforms.image.ten_crop(src, size)[source]

Crop 10 regions from an array. This behaves the same as the ten-crop transform in ChainerCV: http://chainercv.readthedocs.io/en/stable/reference/transforms.html#ten-crop

This method crops 10 regions, each with shape size. The regions consist of 1 center crop, 4 corner crops, and the horizontal flips of those five. The crops are returned in this order:

  • center crop

  • top-left crop

  • bottom-left crop

  • top-right crop

  • bottom-right crop

  • center crop (flipped horizontally)

  • top-left crop (flipped horizontally)

  • bottom-left crop (flipped horizontally)

  • top-right crop (flipped horizontally)

  • bottom-right crop (flipped horizontally)

Parameters
  • src (mxnet.nd.NDArray) – Input image.

  • size (tuple) – Tuple of length 2, as (width, height) of the cropped areas.

Returns

The cropped images with shape (10, size[1], size[0], C)

Return type

mxnet.nd.NDArray
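The crop order can be made concrete with a NumPy sketch. ten_crop_sketch is a hypothetical helper operating on an HWC numpy array rather than an NDArray; it follows the order listed above and returns an array of shape (10, size[1], size[0], C).

```python
import numpy as np

def ten_crop_sketch(img, size):
    """Return center + 4 corner crops and their horizontal flips."""
    h, w = img.shape[:2]
    crop_w, crop_h = size
    if h < crop_h or w < crop_w:
        raise ValueError("crop size must not exceed the image size")
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    five = np.stack([
        img[top:top + crop_h, left:left + crop_w],   # center
        img[:crop_h, :crop_w],                       # top-left
        img[h - crop_h:, :crop_w],                   # bottom-left
        img[:crop_h, w - crop_w:],                   # top-right
        img[h - crop_h:, w - crop_w:],               # bottom-right
    ])
    # Axis 2 of the stacked array is the width axis, so reversing it
    # produces the horizontally flipped versions in the same order.
    return np.concatenate([five, five[:, :, ::-1]], axis=0)

img = np.arange(6 * 8 * 3).reshape(6, 8, 3)
crops = ten_crop_sketch(img, (4, 2))
print(crops.shape)
```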

Experimental bounding box transformations.

gluoncv.data.transforms.experimental.bbox.bbox_crop(bbox, crop_box=None, allow_outside_center=True)

Crop bounding boxes according to slice area.

This method is mainly used with image cropping to ensure bounding boxes fit within the cropped image.

Parameters
  • bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\), we allow additional attributes other than coordinates, which stay intact during bounding box transformations.

  • crop_box (tuple) – Tuple of length 4. \((x_{min}, y_{min}, width, height)\)

  • allow_outside_center (bool) – If False, remove bounding boxes which have centers outside cropping area.

Returns

Cropped bounding boxes with shape (M, 4+) where M <= N.

Return type

numpy.ndarray

gluoncv.data.transforms.experimental.bbox.bbox_iou(bbox_a, bbox_b, offset=0)[source]

Calculate Intersection-Over-Union(IOU) of two bounding boxes.

Parameters
  • bbox_a (numpy.ndarray) – An ndarray with shape \((N, 4)\).

  • bbox_b (numpy.ndarray) – An ndarray with shape \((M, 4)\).

  • offset (float or int, default is 0) – The offset controls how the width (or height) is computed: (right - left + offset). Note that the offset must be 0 for normalized bboxes, whose ranges are in [0, 1].

Returns

An ndarray with shape \((N, M)\) containing the IOU between each pair of bounding boxes in bbox_a and bbox_b.

Return type

numpy.ndarray
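A pairwise IOU computation with the offset convention described above might look like the following NumPy sketch. pairwise_iou_sketch is a hypothetical name and the code is an illustration of the technique, not the library source.

```python
import numpy as np

def pairwise_iou_sketch(bbox_a, bbox_b, offset=0):
    """IOU between every box in bbox_a (N, 4) and every box in bbox_b (M, 4)."""
    bbox_a = np.asarray(bbox_a, dtype=np.float64)
    bbox_b = np.asarray(bbox_b, dtype=np.float64)

    # Broadcasting (N, 1, 2) against (M, 2) yields (N, M, 2) corner arrays.
    top_left = np.maximum(bbox_a[:, None, :2], bbox_b[:, :2])
    bottom_right = np.minimum(bbox_a[:, None, 2:4], bbox_b[:, 2:4])

    # Zero out pairs that do not overlap at all.
    overlaps = (top_left < bottom_right).all(axis=2)
    area_i = np.prod(bottom_right - top_left + offset, axis=2) * overlaps
    area_a = np.prod(bbox_a[:, 2:4] - bbox_a[:, :2] + offset, axis=1)
    area_b = np.prod(bbox_b[:, 2:4] - bbox_b[:, :2] + offset, axis=1)
    return area_i / (area_a[:, None] + area_b - area_i)

a = [[0, 0, 10, 10]]
b = [[0, 0, 10, 10], [5, 5, 15, 15], [20, 20, 30, 30]]
print(pairwise_iou_sketch(a, b))
```

With offset=0 the three pairs above are an identical box, a box sharing a 5x5 corner (25 / 175), and a disjoint box.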

gluoncv.data.transforms.experimental.bbox.random_crop_with_constraints(bbox, size, min_scale=0.3, max_scale=1, max_aspect_ratio=2, constraints=None, max_trial=50)[source]

Crop an image randomly with bounding box constraints.

This data augmentation is used in training of Single Shot Multibox Detector [1]. More details can be found in the data augmentation section of the original paper.

[1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.

Parameters
  • bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\), we allow additional attributes other than coordinates, which stay intact during bounding box transformations.

  • size (tuple) – Tuple of length 2 of image shape as (width, height).

  • min_scale (float) – The minimum ratio between a cropped region and the original image. The default value is 0.3.

  • max_scale (float) – The maximum ratio between a cropped region and the original image. The default value is 1.

  • max_aspect_ratio (float) – The maximum aspect ratio of cropped region. The default value is 2.

  • constraints (iterable of tuples) – An iterable of constraints. Each constraint should be in (min_iou, max_iou) format. Setting min_iou or max_iou to None means no constraint on that bound. If this argument is None (the default), ((0.1, None), (0.3, None), (0.5, None), (0.7, None), (0.9, None), (None, 1)) will be used.

  • max_trial (int) – Maximum number of trials for each constraint before giving up, regardless of the outcome.

Returns

  • numpy.ndarray – Cropped bounding boxes with shape (M, 4+) where M <= N.

  • tuple – Tuple of length 4 as (x_offset, y_offset, new_width, new_height).

Experimental image transformations.

gluoncv.data.transforms.experimental.image.np_random_color_distort(image, data_rng=None, eig_val=None, eig_vec=None, var=0.4, alphastd=0.1)[source]

Numpy version of random color jitter.

Parameters
  • image (numpy.ndarray) – original image.

  • data_rng (numpy.random.rng) – Numpy random number generator.

  • eig_val (numpy.ndarray) – Eigen values.

  • eig_vec (numpy.ndarray) – Eigen vectors.

  • var (float) – Variance for the color jitters.

  • alphastd (float) – Jitter for the brightness.

Returns

The jittered image

Return type

numpy.ndarray

gluoncv.data.transforms.experimental.image.random_color_distort(src, brightness_delta=32, contrast_low=0.5, contrast_high=1.5, saturation_low=0.5, saturation_high=1.5, hue_delta=18)[source]

Randomly distort image color space. Note that the input image should be in the original range [0, 255].

Parameters
  • src (mxnet.nd.NDArray) – Input image as HWC format.

  • brightness_delta (int) – Maximum brightness delta. Defaults to 32.

  • contrast_low (float) – Lowest contrast. Defaults to 0.5.

  • contrast_high (float) – Highest contrast. Defaults to 1.5.

  • saturation_low (float) – Lowest saturation. Defaults to 0.5.

  • saturation_high (float) – Highest saturation. Defaults to 1.5.

  • hue_delta (int) – Maximum hue delta. Defaults to 18.

Returns

Distorted image in HWC format.

Return type

mxnet.nd.NDArray

Transforms described in https://arxiv.org/abs/1512.02325.

class gluoncv.data.transforms.presets.ssd.SSDDALIPipeline(num_workers, device_id, batch_size, data_shape, anchors, dataset_reader, seed=-1)[source]

DALI Pipeline with SSD training transform.

Parameters
  • device_id (int) – DALI pipeline arg - Device id.

  • num_workers – DALI pipeline arg - Number of CPU workers.

  • batch_size – Batch size.

  • data_shape (int) – Height and width length. (height==width in SSD)

  • anchors (float list) – Normalized [ltrb] anchors generated from SSD networks. The list length must be N*4, since it is a flat list of N anchors with 4 float elements each.

  • dataset_reader (float) – Partial pipeline object whose __call__ function has to return an (images, bboxes, labels) DALI EdgeReference tuple.

  • seed (int) – Random seed. Default value is -1, which corresponds to no seed.

define_graph()[source]

Define the DALI graph.

class gluoncv.data.transforms.presets.ssd.SSDDefaultTrainTransform(width, height, anchors=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), iou_thresh=0.5, box_norm=(0.1, 0.1, 0.2, 0.2), **kwargs)[source]

Default SSD training transform which includes tons of image augmentations.

Parameters
  • width (int) – Image width.

  • height (int) – Image height.

  • anchors (mxnet.nd.NDArray, optional) –

    Anchors generated from SSD networks, the shape must be (1, N, 4). Since anchors are shared in the entire batch so it is 1 for the first dimension. N is the number of anchors for each image.

    Hint

    If anchors is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase since we push some workload to CPU workers instead of GPUs.

  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].

  • std (array-like of size 3) – Standard deviation to be divided from image. Default is [0.229, 0.224, 0.225].

  • iou_thresh (float) – IOU overlap threshold for maximum matching, default is 0.5.

  • box_norm (array-like of size 4, default is (0.1, 0.1, 0.2, 0.2)) – Std value to be divided from encoded values.

class gluoncv.data.transforms.presets.ssd.SSDDefaultValTransform(width, height, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))[source]

Default SSD validation transform.

Parameters
  • width (int) – Image width.

  • height (int) – Image height.

  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].

  • std (array-like of size 3) – Standard deviation to be divided from image. Default is [0.229, 0.224, 0.225].

gluoncv.data.transforms.presets.ssd.load_test(filenames, short, max_size=1024, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))[source]

A util function to load all images and transform them to tensors by applying normalizations. This function supports a single filename or an iterable of filenames.

Parameters
  • filenames (str or list of str) – Image filename(s) to be loaded.

  • short (int) – Resize image short side to this short and keep aspect ratio.

  • max_size (int, optional) – Maximum longer side length to fit image. This is to limit the input image shape. Aspect ratio is intact because we support arbitrary input size in our SSD implementation.

  • mean (iterable of float) – Mean pixel values.

  • std (iterable of float) – Standard deviations of pixel values.

Returns

A (1, 3, H, W) mxnet NDArray as input to network, and a numpy ndarray as original un-normalized color image for display. If multiple image names are supplied, two lists are returned. You can use zip() to pair them up.

Return type

(mxnet.NDArray, numpy.ndarray) or list of such tuple

gluoncv.data.transforms.presets.ssd.transform_test(imgs, short, max_size=1024, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))[source]

A util function to transform all images to tensors as network input by applying normalizations. This function supports a single NDArray or an iterable of NDArrays.

Parameters
  • imgs (NDArray or iterable of NDArray) – Image(s) to be transformed.

  • short (int) – Resize image short side to this short and keep aspect ratio.

  • max_size (int, optional) – Maximum longer side length to fit image. This is to limit the input image shape. Aspect ratio is intact because we support arbitrary input size in our SSD implementation.

  • mean (iterable of float) – Mean pixel values.

  • std (iterable of float) – Standard deviations of pixel values.

Returns

A (1, 3, H, W) mxnet NDArray as input to network, and a numpy ndarray as original un-normalized color image for display. If multiple images are supplied, two lists are returned. You can use zip() to pair them up.

Return type

(mxnet.NDArray, numpy.ndarray) or list of such tuple
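The normalization step can be illustrated with a NumPy sketch. It makes two assumptions that the text above does not spell out: pixel values are scaled from [0, 255] to [0, 1] before mean and std are applied (which is consistent with the default mean and std values), and the HWC image is rearranged to a (1, 3, H, W) batch. The resize step is omitted, and to_network_input_sketch is a hypothetical helper, not the library function.

```python
import numpy as np

def to_network_input_sketch(img, mean=(0.485, 0.456, 0.406),
                            std=(0.229, 0.224, 0.225)):
    """Turn an HWC uint8 image into a normalized (1, 3, H, W) float array."""
    x = img.astype(np.float32) / 255.0                 # assumed scaling to [0, 1]
    x = (x - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)
    return x.transpose(2, 0, 1)[np.newaxis]            # HWC -> CHW, add batch axis

img = np.full((4, 5, 3), 255, dtype=np.uint8)
x = to_network_input_sketch(img)
print(x.shape, x.dtype)
```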

Transforms for RCNN series.

class gluoncv.data.transforms.presets.rcnn.FasterRCNNDefaultTrainTransform(short=600, max_size=1000, net=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), box_norm=(1.0, 1.0, 1.0, 1.0), num_sample=256, pos_iou_thresh=0.7, neg_iou_thresh=0.3, pos_ratio=0.5, flip_p=0.5, ashape=128, multi_stage=False, **kwargs)[source]

Default Faster-RCNN training transform.

Parameters
  • short (int/tuple, default is 600) – Resize image shorter side to short. Resize the shorter side of the image randomly within the given range, if it is a tuple.

  • max_size (int, default is 1000) – Make sure image longer side is smaller than max_size.

  • net (mxnet.gluon.HybridBlock, optional) –

    The faster-rcnn network.

    Hint

    If net is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase since we push some workload to CPU workers instead of GPUs.

  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].

  • std (array-like of size 3) – Standard deviation to be divided from image. Default is [0.229, 0.224, 0.225].

  • box_norm (array-like of size 4, default is (1., 1., 1., 1.)) – Std value to be divided from encoded values.

  • num_sample (int, default is 256) – Number of samples for RPN targets.

  • pos_iou_thresh (float, default is 0.7) – Anchors with IOU larger than pos_iou_thresh are regarded as positive samples.

  • neg_iou_thresh (float, default is 0.3) – Anchors with IOU smaller than neg_iou_thresh are regarded as negative samples. Anchors with IOU in between pos_iou_thresh and neg_iou_thresh are ignored.

  • pos_ratio (float, default is 0.5) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.

  • flip_p (float, default is 0.5) – Probability to flip horizontally, by default is 0.5 for random horizontal flip. You may set it to 0 to disable random flip or 1 to force flip.

  • ashape (int, default is 128) – Defines shape of pre generated anchors for target generation

  • multi_stage (boolean, default is False) – Whether the network outputs multi-stage features.
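To make the roles of pos_iou_thresh, neg_iou_thresh, num_sample and pos_ratio concrete, here is a simplified NumPy sketch of threshold-based anchor labeling. It covers only the rule stated in the parameter descriptions; the helper names are hypothetical, the handling of values exactly on a threshold is an assumption, and any further matching or random sampling done by the real target generator is left out.

```python
import numpy as np

def label_anchors_sketch(ious, pos_iou_thresh=0.7, neg_iou_thresh=0.3):
    """Label each anchor from its best IOU with any ground-truth box.

    ious has shape (num_anchors, num_gt).
    Returns 1 for positive, 0 for negative, -1 for ignored anchors.
    """
    max_iou = np.asarray(ious).max(axis=1)
    labels = np.full(len(max_iou), -1, dtype=np.int64)   # ignored by default
    labels[max_iou < neg_iou_thresh] = 0
    labels[max_iou >= pos_iou_thresh] = 1
    return labels

def max_positive_samples_sketch(num_sample=256, pos_ratio=0.5):
    """Upper bound on positives kept when sampling RPN targets."""
    return int(num_sample * pos_ratio)

ious = np.array([[0.8, 0.1],
                 [0.5, 0.2],
                 [0.1, 0.05]])
print(label_anchors_sketch(ious))
print(max_positive_samples_sketch())
```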

class gluoncv.data.transforms.presets.rcnn.FasterRCNNDefaultValTransform(short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))[source]

Default Faster-RCNN validation transform.

Parameters
  • short (int, default is 600) – Resize image shorter side to short.

  • max_size (int, default is 1000) – Make sure image longer side is smaller than max_size.

  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].

  • std (array-like of size 3) – Standard deviation to be divided from image. Default is [0.229, 0.224, 0.225].

class gluoncv.data.transforms.presets.rcnn.MaskRCNNDefaultTrainTransform(short=600, max_size=1000, net=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), box_norm=(1.0, 1.0, 1.0, 1.0), num_sample=256, pos_iou_thresh=0.7, neg_iou_thresh=0.3, pos_ratio=0.5, ashape=128, multi_stage=False, **kwargs)[source]

Default Mask RCNN training transform.

Parameters
  • short (int/tuple, default is 600) – Resize image shorter side to short. Resize the shorter side of the image randomly within the given range, if it is a tuple.

  • max_size (int, default is 1000) – Make sure image longer side is smaller than max_size.

  • net (mxnet.gluon.HybridBlock, optional) –

    The Mask R-CNN network.

    Hint

    If net is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase since we push some workload to CPU workers instead of GPUs.

  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].

  • std (array-like of size 3) – Standard deviation to be divided from image. Default is [0.229, 0.224, 0.225].

  • box_norm (array-like of size 4, default is (1., 1., 1., 1.)) – Std value to be divided from encoded values.

  • num_sample (int, default is 256) – Number of samples for RPN targets.

  • pos_iou_thresh (float, default is 0.7) – Anchors with IOU larger than pos_iou_thresh are regarded as positive samples.

  • neg_iou_thresh (float, default is 0.3) – Anchors with IOU smaller than neg_iou_thresh are regarded as negative samples. Anchors with IOU in between pos_iou_thresh and neg_iou_thresh are ignored.

  • pos_ratio (float, default is 0.5) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.

  • ashape (int, default is 128) – Defines shape of pre generated anchors for target generation

  • multi_stage (boolean, default is False) – Whether the network output multi stage features.
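The interaction of num_sample, pos_iou_thresh, neg_iou_thresh and pos_ratio can be sketched in plain Python. This is a minimal illustration, not the library's target generator: the function name sample_rpn_labels, the max_ious input (best IOU per anchor) and the seed argument are assumptions made for the example, and details such as forcing the best-matching anchor of each ground-truth box to be positive are omitted.

```python
import random


def sample_rpn_labels(max_ious, num_sample=256, pos_iou_thresh=0.7,
                      neg_iou_thresh=0.3, pos_ratio=0.5, seed=0):
    """Label each anchor 1 (positive), 0 (negative) or -1 (ignored).

    max_ious[i] is assumed to hold the best IOU between anchor i and any
    ground-truth box. Hypothetical helper, not part of gluoncv.
    """
    rng = random.Random(seed)
    pos = [i for i, iou in enumerate(max_ious) if iou > pos_iou_thresh]
    neg = [i for i, iou in enumerate(max_ious) if iou < neg_iou_thresh]

    # Keep at most pos_ratio * num_sample positives.
    max_pos = int(round(pos_ratio * num_sample))
    if len(pos) > max_pos:
        pos = rng.sample(pos, max_pos)

    # Fill the remaining sample slots with negatives.
    max_neg = num_sample - len(pos)
    if len(neg) > max_neg:
        neg = rng.sample(neg, max_neg)

    labels = [-1] * len(max_ious)  # anchors between the thresholds stay ignored
    for i in pos:
        labels[i] = 1
    for i in neg:
        labels[i] = 0
    return labels


labels = sample_rpn_labels([0.9, 0.8, 0.75, 0.5, 0.1, 0.2, 0.05, 0.0],
                           num_sample=4)
```

With num_sample=4 and pos_ratio=0.5, at most two of the three high-IOU anchors are kept as positives, two of the four low-IOU anchors are kept as negatives, and the anchor with IOU 0.5 is ignored.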

class gluoncv.data.transforms.presets.rcnn.MaskRCNNDefaultValTransform(short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))[source]

Default Mask RCNN validation transform.

Parameters
  • short (int, default is 600) – Resize the shorter side of the image to short.

  • max_size (int, default is 1000) – Ensure the longer side of the image is no larger than max_size.

  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].

  • std (array-like of size 3) – Standard deviation values by which the image tensor is divided. Default is [0.229, 0.224, 0.225].
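How short and max_size combine can be illustrated with a small shape calculation. This is a sketch under the assumption that the image is scaled so its shorter side equals short, and rescaled to fit max_size only when the longer side would otherwise exceed it; the helper name resized_shape is hypothetical and the exact rounding used by the library may differ.

```python
def resized_shape(height, width, short=600, max_size=1000):
    """Return (new_height, new_width), keeping the aspect ratio.

    Hypothetical helper for illustration; not part of gluoncv.
    """
    side_min, side_max = min(height, width), max(height, width)
    scale = short / side_min
    # If the longer side would exceed max_size, fit the longer side instead.
    if round(scale * side_max) > max_size:
        scale = max_size / side_max
    return int(round(height * scale)), int(round(width * scale))


normal = resized_shape(480, 640)       # shorter side reaches 600
elongated = resized_shape(400, 1200)   # longer side is capped at 1000
```

For a 480 x 640 image the shorter side becomes 600 and the longer side 800. For a 400 x 1200 image the cap applies, so the longer side becomes 1000 and the shorter side ends up below short.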

gluoncv.data.transforms.presets.rcnn.load_test(filenames, short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))[source]

A utility function that loads images and transforms them into tensors by applying normalization. This function supports a single filename or a list of filenames.

Parameters
  • filenames (str or list of str) – Image filename(s) to be loaded.

  • short (int, optional, default is 600) – Resize the shorter side of the image to this value while keeping the aspect ratio.

  • max_size (int, optional, default is 1000) – Maximum length of the longer side. This limits the input image shape to avoid processing overly large images.

  • mean (iterable of float) – Mean pixel values.

  • std (iterable of float) – Standard deviations of pixel values.

Returns

A (1, 3, H, W) mxnet NDArray as input to the network, and a numpy ndarray holding the original un-normalized color image for display. If multiple image names are supplied, two lists are returned. You can use zip() to pair them up.

Return type

(mxnet.NDArray, numpy.ndarray) or list of such tuple
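The return convention described above can be mimicked without loading any image. The sketch below uses a hypothetical stand-in, fake_load_test, that returns strings in place of the NDArray and numpy array; it only illustrates unpacking a single result versus pairing the two lists with zip(), and the assumption that a single filename yields one tuple.

```python
def fake_load_test(filenames):
    """Hypothetical stand-in that mimics only the return convention."""
    if isinstance(filenames, str):
        filenames = [filenames]
    tensors = ["tensor:" + name for name in filenames]
    originals = ["image:" + name for name in filenames]
    if len(tensors) == 1:
        return tensors[0], originals[0]
    return tensors, originals


# One filename: a single (tensor, original image) tuple.
x, img = fake_load_test("a.jpg")

# Several filenames: two parallel lists, paired up with zip().
xs, imgs = fake_load_test(["a.jpg", "b.jpg"])
pairs = list(zip(xs, imgs))
```

Each element of pairs then holds the network input and the displayable image that belong to the same file.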

gluoncv.data.transforms.presets.rcnn.transform_test(imgs, short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))[source]

A utility function that transforms images into tensors suitable as network input by applying normalization. This function supports a single NDArray or an iterable of NDArrays.

Parameters
  • imgs (NDArray or iterable of NDArray) – Image(s) to be transformed.

  • short (int, optional, default is 600) – Resize the shorter side of the image to this value while keeping the aspect ratio.

  • max_size (int, optional, default is 1000) – Maximum length of the longer side. This limits the input image shape to avoid processing overly large images.

  • mean (iterable of float) – Mean pixel values.

  • std (iterable of float) – Standard deviations of pixel values.

Returns

A (1, 3, H, W) mxnet NDArray as input to the network, and a numpy ndarray holding the original un-normalized color image for display. If multiple images are supplied, two lists are returned. You can use zip() to pair them up.

Return type

(mxnet.NDArray, numpy.ndarray) or list of such tuple
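The role of mean and std can be shown on a single pixel. This sketch assumes the usual convention that 8-bit pixel values are first scaled to the range [0, 1] and then normalized per channel; the helper name normalize_pixel is hypothetical, and the real transform operates on whole tensors rather than individual pixels.

```python
def normalize_pixel(rgb, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Scale one 8-bit RGB pixel to [0, 1], subtract mean, divide by std.

    Hypothetical helper for illustration; not part of gluoncv.
    """
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


out = normalize_pixel((255, 0, 128))
```

The second return value of the function, the un-normalized image, keeps the original 0 to 255 values so it can be displayed directly.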

Transforms for YOLO series.

class gluoncv.data.transforms.presets.yolo.YOLO3DefaultTrainTransform(width, height, net=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), mixup=False, **kwargs)[source]

Default YOLO training transform which includes tons of image augmentations.

Parameters
  • width (int) – Image width.

  • height (int) – Image height.

  • net (mxnet.gluon.HybridBlock, optional) –

    The YOLO network.

    Hint

    If net is None, the transformation does not generate training targets. Otherwise it generates them, which accelerates training because that workload is pushed to CPU workers instead of GPUs.

  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].

  • std (array-like of size 3) – Standard deviation values by which the image tensor is divided. Default is [0.229, 0.224, 0.225].

  • iou_thresh (float) – IOU overlap threshold for maximum matching, default is 0.5.

  • box_norm (array-like of size 4, default is (0.1, 0.1, 0.2, 0.2)) – Standard deviation values by which the encoded box values are divided.
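What "divided from encoded values" means for box_norm can be illustrated with one common center-offset box encoding. This is only a sketch of the general idea: the helper name encode_box and the (cx, cy, w, h) box layout are assumptions, and the encoding actually used by a given network may differ.

```python
import math


def encode_box(anchor, gt, box_norm=(0.1, 0.1, 0.2, 0.2)):
    """Encode a ground-truth box relative to an anchor, then divide by box_norm.

    Both boxes are assumed to be (center_x, center_y, width, height).
    Hypothetical helper for illustration; not part of gluoncv.
    """
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    raw = (
        (gx - ax) / aw,        # horizontal center offset, relative to anchor width
        (gy - ay) / ah,        # vertical center offset, relative to anchor height
        math.log(gw / aw),     # log width ratio
        math.log(gh / ah),     # log height ratio
    )
    return tuple(value / s for value, s in zip(raw, box_norm))


encoded = encode_box(anchor=(50, 50, 20, 20), gt=(52, 50, 20, 40))
```

Dividing by values smaller than 1 enlarges the regression targets, which is the usual motivation for this kind of normalization.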

class gluoncv.data.transforms.presets.yolo.YOLO3DefaultValTransform(width, height, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))[source]

Default YOLO validation transform.

Parameters
  • width (int) – Image width.

  • height (int) – Image height.

  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].

  • std (array-like of size 3) – Standard deviation to be divided from image. Default is [0.229, 0.224, 0.225].

gluoncv.data.transforms.presets.yolo.load_test(filenames, short=416, max_size=1024, stride=1, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))[source]

A utility function that loads images and transforms them into tensors by applying normalization. This function supports a single filename or a list of filenames.

Parameters
  • filenames (str or list of str) – Image filename(s) to be loaded.

  • short (int, default=416) – Resize the shorter side of the image to this value while keeping the aspect ratio. Note that yolo network

  • max_size (int, optional) – Maximum length of the longer side. This limits the input image shape. The aspect ratio is kept intact because our YOLO implementation supports arbitrary input sizes.

  • stride (int, optional, default is 1) – Stride constraint imposed by the precise alignment of the bounding box prediction module. The width and height of the image must be multiples of stride. Use stride = 1 to relax this constraint.

  • mean (iterable of float) – Mean pixel values.

  • std (iterable of float) – Standard deviations of pixel values.

Returns

A (1, 3, H, W) mxnet NDArray as input to the network, and a numpy ndarray holding the original un-normalized color image for display. If multiple image names are supplied, two lists are returned. You can use zip() to pair them up.

Return type

(mxnet.NDArray, numpy.ndarray) or list of such tuple
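The stride constraint can be pictured as snapping each resized side length to a multiple of stride. The sketch below assumes rounding to the nearest multiple; the helper name snap_to_stride and the lower bound of one stride are illustrative choices, and the library may round differently.

```python
def snap_to_stride(length, stride=1):
    """Round a side length to the nearest multiple of stride (at least one stride).

    Hypothetical helper for illustration; not part of gluoncv.
    """
    return max(stride, int(round(length / stride) * stride))


unconstrained = snap_to_stride(555, stride=1)   # stride = 1 leaves the length unchanged
aligned = snap_to_stride(555, stride=32)        # nearest multiple of 32
already_ok = snap_to_stride(416, stride=32)     # 416 is already a multiple of 32
```

With stride = 1 every integer length is valid, which is why that value relaxes the constraint.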

gluoncv.data.transforms.presets.yolo.transform_test(imgs, short=416, max_size=1024, stride=1, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))[source]

A utility function that transforms images into tensors suitable as network input by applying normalization. This function supports a single NDArray or an iterable of NDArrays.

Parameters
  • imgs (NDArray or iterable of NDArray) – Image(s) to be transformed.

  • short (int, default=416) – Resize the shorter side of the image to this value while keeping the aspect ratio. Note that yolo network

  • max_size (int, optional) – Maximum length of the longer side. This limits the input image shape. The aspect ratio is kept intact because our YOLO implementation supports arbitrary input sizes.

  • stride (int, optional, default is 1) – Stride constraint imposed by the precise alignment of the bounding box prediction module. The width and height of the image must be multiples of stride. Use stride = 1 to relax this constraint.

  • mean (iterable of float) – Mean pixel values.

  • std (iterable of float) – Standard deviations of pixel values.

Returns

A (1, 3, H, W) mxnet NDArray as input to the network, and a numpy ndarray holding the original un-normalized color image for display. If multiple images are supplied, two lists are returned. You can use zip() to pair them up.

Return type

(mxnet.NDArray, numpy.ndarray) or list of such tuple