gluoncv.data.transforms

This file includes various transformations that are critical to vision tasks.

Bounding Box Transforms

crop Crop bounding boxes according to slice area.
flip Flip bounding boxes according to image flipping directions.
resize Resize bounding boxes according to image resize operation.
translate Translate bounding boxes by offsets.
experimental.bbox.random_crop_with_constraints Crop an image randomly with bounding box constraints.

Image Transforms

imresize Resize image with OpenCV.
resize_long Resizes longer edge to size.
resize_short_within Resizes the shorter edge to size while capping the longer edge at a maximum size.
random_pca_lighting Apply random PCA lighting noise to the input image.
random_expand Randomly expand the original image with borders; this is identical to placing the original image on a larger canvas.
random_flip Randomly flip the image horizontally and/or vertically with the given probabilities.
resize_contain Resize the image to fit in the given area while keeping aspect ratio.
ten_crop Crop 10 regions from an array.

Instance Segmentation Mask Transforms

flip Flip polygons according to image flipping directions.
resize Resize polygons according to image resize operation.
to_mask Convert a list of polygons to a full-size binary mask.
fill Fill mask to full image size.

Preset Transforms

We include presets for reproducing the state-of-the-art (SOTA) performance described in different papers. This is a complementary section and its APIs are subject to change.

Single Shot Multibox Object Detector

load_test A utility function to load all images and transform them into tensors by applying normalization.
SSDDefaultTrainTransform Default SSD training transform, which includes many image augmentations.
SSDDefaultValTransform Default SSD validation transform.

Faster RCNN

load_test A utility function to load all images and transform them into tensors by applying normalization.
FasterRCNNDefaultTrainTransform Default Faster-RCNN training transform.
FasterRCNNDefaultValTransform Default Faster-RCNN validation transform.

Mask RCNN

load_test A utility function to load all images and transform them into tensors by applying normalization.
MaskRCNNDefaultTrainTransform Default Mask RCNN training transform.
MaskRCNNDefaultValTransform Default Mask RCNN validation transform.

YOLO

load_test A utility function to load all images and transform them into tensors by applying normalization.
YOLO3DefaultTrainTransform Default YOLO training transform, which includes many image augmentations.
YOLO3DefaultValTransform Default YOLO validation transform.

API Reference

Bounding box transformation functions.

gluoncv.data.transforms.bbox.crop(bbox, crop_box=None, allow_outside_center=True)

Crop bounding boxes according to slice area.

This method is mainly used with image cropping to ensure bounding boxes fit within the cropped image.

Parameters:
  • bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes other than coordinates are allowed and stay intact during bounding box transformations.
  • crop_box (tuple) – Tuple of length 4: \((x_{min}, y_{min}, width, height)\).
  • allow_outside_center (bool) – If False, remove bounding boxes whose centers are outside the cropping area.
Returns:

Cropped bounding boxes with shape (M, 4+) where M <= N.

Return type:

numpy.ndarray
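
The cropping rule above can be checked by hand with a small NumPy sketch. It is an illustration, not the library code: crop_bboxes is a hypothetical name, and it assumes, as documented, that crop_box is (x_min, y_min, width, height). Each box is clipped to the crop window, shifted into crop coordinates, and dropped if it becomes empty.

```python
import numpy as np


def crop_bboxes(bbox, crop_box=None, allow_outside_center=True):
    """Sketch of bounding box cropping; crop_box is (x_min, y_min, width, height)."""
    bbox = np.asarray(bbox, dtype=float).copy()
    if crop_box is None:
        return bbox
    left, top, width, height = crop_box
    window = np.array([left, top, left + width, top + height], dtype=float)

    if allow_outside_center:
        keep = np.ones(len(bbox), dtype=bool)
    else:
        # keep only boxes whose center lies inside the crop window
        centers = (bbox[:, :2] + bbox[:, 2:4]) / 2
        keep = np.logical_and(window[:2] <= centers, centers < window[2:]).all(axis=1)

    # clip corners to the window, then shift them into crop-local coordinates
    bbox[:, :2] = np.maximum(bbox[:, :2], window[:2]) - window[:2]
    bbox[:, 2:4] = np.minimum(bbox[:, 2:4], window[2:]) - window[:2]

    # drop boxes that are empty after clipping; extra attribute columns are untouched
    keep &= (bbox[:, :2] < bbox[:, 2:4]).all(axis=1)
    return bbox[keep]


boxes = [[10, 10, 50, 50, 0],      # overlaps the crop; last column is e.g. a class id
         [200, 200, 240, 240, 1]]  # entirely outside the crop
print(crop_bboxes(boxes, crop_box=(20, 20, 100, 100)).tolist())
# [[0.0, 0.0, 30.0, 30.0, 0.0]]
```

With allow_outside_center=False, a box such as (0, 0, 30, 30) would also be removed for the same crop, because its center (15, 15) lies outside the window.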

gluoncv.data.transforms.bbox.flip(bbox, size, flip_x=False, flip_y=False)

Flip bounding boxes according to image flipping directions.

Parameters:
  • bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes other than coordinates are allowed and stay intact during bounding box transformations.
  • size (tuple) – Tuple of length 2: (width, height).
  • flip_x (bool) – Whether to flip horizontally.
  • flip_y (bool) – Whether to flip vertically.
Returns:

Flipped bounding boxes with original shape.

Return type:

numpy.ndarray
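
A minimal NumPy sketch of the flip, assuming boxes in (x_min, y_min, x_max, y_max) pixel coordinates and size given as (width, height); flip_bboxes is a hypothetical name, not the library function. A horizontal flip mirrors x around the image width, which also swaps the roles of x_min and x_max.

```python
import numpy as np


def flip_bboxes(bbox, size, flip_x=False, flip_y=False):
    """Sketch: mirror (x_min, y_min, x_max, y_max, ...) boxes inside an image of size (width, height)."""
    width, height = size
    bbox = np.asarray(bbox, dtype=float).copy()
    if flip_x:
        # the new x_min is the mirror of the old x_max, and vice versa
        x_min = width - bbox[:, 2]
        x_max = width - bbox[:, 0]
        bbox[:, 0], bbox[:, 2] = x_min, x_max
    if flip_y:
        y_min = height - bbox[:, 3]
        y_max = height - bbox[:, 1]
        bbox[:, 1], bbox[:, 3] = y_min, y_max
    return bbox


boxes = [[10, 20, 40, 60, 7]]  # the last column is an extra attribute and is left untouched
print(flip_bboxes(boxes, size=(100, 80), flip_x=True).tolist())
# [[60.0, 20.0, 90.0, 60.0, 7.0]]
```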

gluoncv.data.transforms.bbox.resize(bbox, in_size, out_size)

Resize bounding boxes according to image resize operation.

Parameters:
  • bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes other than coordinates are allowed and stay intact during bounding box transformations.
  • in_size (tuple) – Tuple of length 2: (width, height) for input.
  • out_size (tuple) – Tuple of length 2: (width, height) for output.
Returns:

Resized bounding boxes with original shape.

Return type:

numpy.ndarray
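
Resizing boxes amounts to scaling x coordinates by the width ratio and y coordinates by the height ratio. A NumPy sketch, where resize_bboxes is a hypothetical name and both sizes are (width, height) as documented:

```python
import numpy as np


def resize_bboxes(bbox, in_size, out_size):
    """Sketch: rescale box coordinates from in_size to out_size, both (width, height)."""
    bbox = np.asarray(bbox, dtype=float).copy()
    x_scale = out_size[0] / in_size[0]
    y_scale = out_size[1] / in_size[1]
    bbox[:, [0, 2]] *= x_scale  # x_min, x_max
    bbox[:, [1, 3]] *= y_scale  # y_min, y_max
    return bbox


# an image resized from 640x480 to 320x240 halves every coordinate
print(resize_bboxes([[100, 50, 300, 250]], in_size=(640, 480), out_size=(320, 240)).tolist())
# [[50.0, 25.0, 150.0, 125.0]]
```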

gluoncv.data.transforms.bbox.translate(bbox, x_offset=0, y_offset=0)

Translate bounding boxes by offsets.

Parameters:
  • bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes other than coordinates are allowed and stay intact during bounding box transformations.
  • x_offset (int or float) – Offset along x axis.
  • y_offset (int or float) – Offset along y axis.
Returns:

Translated bounding boxes with original shape.

Return type:

numpy.ndarray
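
Translation adds the same offset to both corners of every box. The NumPy sketch below is a hypothetical stand-in (translate_bboxes) for reasoning about offsets, such as those produced when an image is placed on a larger canvas.

```python
import numpy as np


def translate_bboxes(bbox, x_offset=0, y_offset=0):
    """Sketch: shift both corners of every box by (x_offset, y_offset)."""
    bbox = np.asarray(bbox, dtype=float).copy()
    offset = np.array([x_offset, y_offset], dtype=float)
    bbox[:, :2] += offset   # (x_min, y_min)
    bbox[:, 2:4] += offset  # (x_max, y_max)
    return bbox
```

For example, translate_bboxes([[10, 10, 20, 20, 3]], x_offset=5, y_offset=-2) gives [[15.0, 8.0, 25.0, 18.0, 3.0]]; the trailing attribute column is untouched.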

Additional image transforms.

class gluoncv.data.transforms.block.RandomCrop(size, pad=None, interpolation=2)

Randomly crop src with size (width, height). Padding is optional. Upsample the result if src is smaller than size.

Parameters:
  • size (int or tuple of (W, H)) – Size of the final output.
  • pad (int or tuple) –

    If int, the size of the zero-padding. If tuple, the number of values padded to the edges of each axis:

    ((before_1, after_1), … (before_N, after_N)) gives unique pad widths for each axis. ((before, after),) yields the same before and after pad for each axis. (pad,) or int is a shortcut for before = after = pad width for all axes.
  • interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices.
Inputs:
  • data: input tensor with (Hi x Wi x C) shape.
Outputs:
  • out: output tensor with ((H+2*pad) x (W+2*pad) x C) shape.
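
The pad-then-crop behaviour can be sketched in NumPy as follows. This is an illustration under assumptions, not the block's implementation: it takes an HWC NumPy array instead of an NDArray, passes a tuple pad straight to numpy.pad, and omits the upsampling path for inputs smaller than size; random_crop_padded is a hypothetical name.

```python
import numpy as np


def random_crop_padded(img, size, pad=None, rng=None):
    """Sketch: zero-pad an HWC array, then take a random crop of size (width, height)."""
    rng = np.random.default_rng() if rng is None else rng
    if isinstance(size, int):
        size = (size, size)
    width, height = size
    if isinstance(pad, int):
        # int pad: pad height and width on both sides, leave the channel axis alone
        img = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    elif pad is not None:
        # tuple pad: interpreted with numpy.pad's pad_width rules
        img = np.pad(img, pad, mode="constant")
    h, w = img.shape[:2]
    if h < height or w < width:
        raise ValueError("this sketch does not upsample inputs smaller than size")
    top = int(rng.integers(0, h - height + 1))
    left = int(rng.integers(0, w - width + 1))
    return img[top:top + height, left:left + width]


img = np.ones((32, 32, 3), dtype=np.uint8)
print(random_crop_padded(img, 32, pad=4).shape)  # (32, 32, 3)
```

This is the familiar CIFAR-style augmentation: a 4-pixel zero border followed by a random 32x32 window.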
forward(x)

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:
  • *args (list of NDArray) – Input tensors.

Extended image transformations to mxnet.image.

gluoncv.data.transforms.image.imresize(src, w, h, interp=1)

Resize image with OpenCV.

This is a duplicate of mxnet.image.imresize for namespace consistency.

Parameters:
  • src (mxnet.nd.NDArray) – Source image.
  • w (int, required) – Width of resized image.
  • h (int, required) – Height of resized image.
  • interp (int, optional, default=1) – Interpolation method (default=cv2.INTER_LINEAR).
  • out (NDArray, optional) – The output NDArray to hold the result.
Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays

Examples

>>> import mxnet as mx
>>> from gluoncv import data as gdata
>>> img = mx.random.uniform(0, 255, (300, 300, 3)).astype('uint8')
>>> print(img.shape)
(300, 300, 3)
>>> img = gdata.transforms.image.imresize(img, 200, 200)
>>> print(img.shape)
(200, 200, 3)

gluoncv.data.transforms.image.resize_long(src, size, interp=2)

Resizes the longer edge to size. Note: resize_long uses OpenCV (not the CV2 Python library). MXNet must have been built with OpenCV for resize_long to work. Resizes the original image by setting the longer edge to size and setting the shorter edge accordingly, so the new image fits within size. The resizing function is called from OpenCV.

Parameters:
  • src (NDArray) – The original image.
  • size (int) – The length to be set for the longer edge.
  • interp (int, optional, default=2) – Interpolation method used for resizing the image. Possible values: 0: Nearest-neighbor interpolation. 1: Bilinear interpolation. 2: Area-based (resampling using pixel area relation); this is the default. It may be a preferred method for image decimation, as it gives moire-free results, but when the image is zoomed it is similar to the nearest-neighbor method. 3: Bicubic interpolation over a 4x4 pixel neighborhood. 4: Lanczos interpolation over an 8x8 pixel neighborhood. 9: Cubic for enlarging, area for shrinking, bilinear for others. 10: Random selection from the interpolation methods mentioned above. Note: when shrinking an image, it will generally look best with area-based interpolation, whereas when enlarging an image, it will generally look best with bicubic (slow) or bilinear (faster but still looks OK) interpolation. More details can be found in the OpenCV documentation: http://docs.opencv.org/master/da/d54/group__imgproc__transform.html.
Returns:

An ‘NDArray’ containing the resized image.

Return type:

NDArray

Example

>>> import mxnet as mx
>>> from gluoncv.data.transforms.image import resize_long
>>> with open("flower.jpeg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image)
>>> image
<NDArray 2321x3482x3 @cpu(0)>
>>> size = 640
>>> new_image = resize_long(image, size)
>>> new_image
<NDArray 426x640x3 @cpu(0)>

gluoncv.data.transforms.image.resize_short_within(src, short, max_size, mult_base=1, interp=2)

Resizes the shorter edge to short while capping the longer edge at max_size. Note: resize_short_within uses OpenCV (not the CV2 Python library). MXNet must have been built with OpenCV for resize_short_within to work. Resizes the original image by setting the shorter edge to short and setting the longer edge accordingly. This function also ensures that the longer side of the new image does not exceed max_size. The resizing function is called from OpenCV.

Parameters:
  • src (NDArray) – The original image.
  • short (int) – Resize shorter side to short.
  • max_size (int) – Make sure the longer side of the new image does not exceed max_size.
  • mult_base (int, default is 1) – Width and height are rounded to multiples of mult_base.
  • interp (int, optional, default=2) – Interpolation method used for resizing the image. Possible values: 0: Nearest-neighbor interpolation. 1: Bilinear interpolation. 2: Area-based (resampling using pixel area relation); this is the default. It may be a preferred method for image decimation, as it gives moire-free results, but when the image is zoomed it is similar to the nearest-neighbor method. 3: Bicubic interpolation over a 4x4 pixel neighborhood. 4: Lanczos interpolation over an 8x8 pixel neighborhood. 9: Cubic for enlarging, area for shrinking, bilinear for others. 10: Random selection from the interpolation methods mentioned above. Note: when shrinking an image, it will generally look best with area-based interpolation, whereas when enlarging an image, it will generally look best with bicubic (slow) or bilinear (faster but still looks OK) interpolation. More details can be found in the OpenCV documentation: http://docs.opencv.org/master/da/d54/group__imgproc__transform.html.
Returns:

An ‘NDArray’ containing the resized image.

Return type:

NDArray

Example

>>> import mxnet as mx
>>> from gluoncv.data.transforms.image import resize_short_within
>>> with open("flower.jpeg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image)
>>> image
<NDArray 2321x3482x3 @cpu(0)>
>>> new_image = resize_short_within(image, short=800, max_size=1000)
>>> new_image
<NDArray 667x1000x3 @cpu(0)>
>>> new_image = resize_short_within(image, short=800, max_size=1200)
>>> new_image
<NDArray 800x1200x3 @cpu(0)>
>>> new_image = resize_short_within(image, short=800, max_size=1200, mult_base=32)
>>> new_image
<NDArray 800x1184x3 @cpu(0)>
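
The output shapes in the example above follow from simple arithmetic. The pure-Python sketch below reproduces them under an assumed rounding rule: the scaled longer side is rounded to a multiple of mult_base, and if that exceeds max_size, the scale is recomputed from the largest multiple of mult_base not above max_size. short_within_size is a hypothetical helper that returns only the output (height, width).

```python
def short_within_size(h, w, short, max_size, mult_base=1):
    """Sketch: (new_h, new_w) after resizing the short side to `short`,
    keeping the long side within `max_size`, sides rounded to multiples of mult_base."""
    size_min, size_max = min(h, w), max(h, w)
    scale = short / size_min
    if round(scale * size_max / mult_base) * mult_base > max_size:
        # the long side would overflow: fit it to the largest allowed multiple instead
        scale = (max_size // mult_base) * mult_base / size_max
    new_h = int(round(h * scale / mult_base) * mult_base)
    new_w = int(round(w * scale / mult_base) * mult_base)
    return new_h, new_w


print(short_within_size(2321, 3482, short=800, max_size=1000))                # (667, 1000)
print(short_within_size(2321, 3482, short=800, max_size=1200, mult_base=32))  # (800, 1184)
```

The mult_base=32 case shows why the result can fall short of max_size: 1200 is not a multiple of 32, so the longer side settles at 37 * 32 = 1184.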

gluoncv.data.transforms.image.random_pca_lighting(src, alphastd, eigval=None, eigvec=None)

Apply random PCA lighting noise to the input image.

Parameters:
  • src (mxnet.nd.NDArray) – Input image with HWC format.
  • alphastd (float) – Noise level [0, 1) for image with range [0, 255].
  • eigval (list of floats) – Eigenvalues, defaults to [55.46, 4.794, 1.148].
  • eigvec (nested lists of floats) – Eigenvectors with shape (3, 3), defaults to [[-0.5675, 0.7192, 0.4009], [-0.5808, -0.0045, -0.8140], [-0.5836, -0.6948, 0.4203]].
Returns:

Augmented image.

Return type:

mxnet.nd.NDArray
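
PCA lighting (the AlexNet-style color augmentation) adds one random RGB offset, built from the eigenvalues and eigenvectors above, to every pixel. A NumPy sketch, assuming an HWC float image and the documented defaults; pca_lighting is a hypothetical name, and NumPy's random generator stands in for whatever randomness source the library uses.

```python
import numpy as np

# Defaults quoted from the parameter list above.
EIGVAL = np.array([55.46, 4.794, 1.148])
EIGVEC = np.array([[-0.5675, 0.7192, 0.4009],
                   [-0.5808, -0.0045, -0.8140],
                   [-0.5836, -0.6948, 0.4203]])


def pca_lighting(img, alphastd, eigval=EIGVAL, eigvec=EIGVEC, rng=None):
    """Sketch: add PCA lighting noise to an HWC image with values in [0, 255]."""
    if alphastd <= 0:
        return img
    rng = np.random.default_rng() if rng is None else rng
    alpha = rng.normal(0.0, alphastd, size=3)  # one random weight per principal component
    rgb_shift = eigvec @ (alpha * eigval)      # per-channel offset, shape (3,)
    return img + rgb_shift                     # the same offset is broadcast over every pixel
```

Because the offset depends only on the three random weights, the noise shifts the global color balance rather than adding per-pixel noise.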

gluoncv.data.transforms.image.random_expand(src, max_ratio=4, fill=0, keep_ratio=True)

Randomly expand the original image with borders; this is identical to placing the original image on a larger canvas.

Parameters:
  • src (mxnet.nd.NDArray) – The original image with HWC format.
  • max_ratio (int or float) – Maximum ratio of the output image in both directions (vertical and horizontal).
  • fill (int or float or array-like) – The value(s) for padded borders. If fill is a numerical type, RGB channels will be padded with a single value. Otherwise, fill must have the same length as the number of image channels, which results in padding with per-channel values.
  • keep_ratio (bool) – If True, will keep output image the same aspect ratio as input.
Returns:

  • mxnet.nd.NDArray – Augmented image.
  • tuple – Tuple of (offset_x, offset_y, new_width, new_height)
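
A NumPy sketch of the expand operation, assuming an HWC array rather than an NDArray; random_expand_sketch is a hypothetical name. The returned offsets are what a caller would use to shift bounding boxes onto the larger canvas, for example with bbox.translate(bbox, x_offset=offset_x, y_offset=offset_y).

```python
import numpy as np


def random_expand_sketch(img, max_ratio=4, fill=0, keep_ratio=True, rng=None):
    """Sketch: paste an HWC image at a random offset on a larger filled canvas.

    Returns the canvas and (offset_x, offset_y, new_width, new_height).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = img.shape
    if max_ratio <= 1:
        return img, (0, 0, w, h)
    ratio_x = rng.uniform(1, max_ratio)
    ratio_y = ratio_x if keep_ratio else rng.uniform(1, max_ratio)
    new_h, new_w = int(h * ratio_y), int(w * ratio_x)
    off_y = int(rng.integers(0, new_h - h + 1))
    off_x = int(rng.integers(0, new_w - w + 1))
    # a scalar fill pads every channel alike; a length-c fill gives per-channel values
    canvas = np.empty((new_h, new_w, c), dtype=img.dtype)
    canvas[...] = np.asarray(fill, dtype=img.dtype)
    canvas[off_y:off_y + h, off_x:off_x + w] = img
    return canvas, (off_x, off_y, new_w, new_h)
```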

gluoncv.data.transforms.image.random_flip(src, px=0, py=0, copy=False)

Randomly flip the image horizontally and/or vertically with the given probabilities.

Parameters:
  • src (mxnet.nd.NDArray) – Input image with HWC format.
  • px (float) – Horizontal flip probability [0, 1].
  • py (float) – Vertical flip probability [0, 1].
  • copy (bool) – If True, return a copy of the input.
Returns:

  • mxnet.nd.NDArray – Augmented image.
  • tuple – Tuple of (flip_x, flip_y), records of whether flips are applied.
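
A NumPy sketch of random flipping on an HWC array; random_flip_sketch is a hypothetical name. Returning the flip flags lets a caller apply the same flips to the boxes with bbox.flip(bbox, (width, height), flip_x=flip_x, flip_y=flip_y).

```python
import numpy as np


def random_flip_sketch(img, px=0.0, py=0.0, copy=False, rng=None):
    """Sketch: flip an HWC image horizontally with probability px and vertically with probability py."""
    rng = np.random.default_rng() if rng is None else rng
    flip_x = bool(rng.random() < px)
    flip_y = bool(rng.random() < py)
    if flip_y:
        img = img[::-1, :]  # vertical flip reverses rows (axis 0)
    if flip_x:
        img = img[:, ::-1]  # horizontal flip reverses columns (axis 1)
    if copy:
        img = img.copy()    # slicing returns views; copy if the caller needs its own buffer
    return img, (flip_x, flip_y)
```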

gluoncv.data.transforms.image.resize_contain(src, size, fill=0)

Resize the image to fit in the given area while keeping aspect ratio.

If both the width and the height in size are larger than those of the input image, the image is placed at the center with appropriate padding to match size. Otherwise, the input image is scaled to fit in a canvas of the given size while preserving its aspect ratio.

Parameters:
  • src (mxnet.nd.NDArray) – The original image with HWC format.
  • size (tuple) – Tuple of length 2 as (width, height).
  • fill (int or float or array-like) – The value(s) for padded borders. If fill is a numerical type, RGB channels will be padded with a single value. Otherwise, fill must have the same length as the number of image channels, which results in padding with per-channel values.
Returns:

  • mxnet.nd.NDArray – Augmented image.
  • tuple – Tuple of (offset_x, offset_y, scaled_x, scaled_y)
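
The placement rule above reduces to a little arithmetic. This pure-Python sketch computes only the geometry reported in the returned tuple, assuming the scale is the smaller of the two fit ratios capped at 1 (the image is never enlarged) and that the image is centered; contain_geometry is a hypothetical helper.

```python
def contain_geometry(src_size, size):
    """Sketch: placement of a (width, height) image inside a (width, height) canvas.

    Returns (offset_x, offset_y, scaled_x, scaled_y).
    """
    w, h = src_size
    out_w, out_h = size
    scale = min(out_w / w, out_h / h, 1.0)  # keep aspect ratio, never upscale
    scaled_x, scaled_y = int(w * scale), int(h * scale)
    offset_x = (out_w - scaled_x) // 2      # center the image on the canvas
    offset_y = (out_h - scaled_y) // 2
    return offset_x, offset_y, scaled_x, scaled_y
```

For instance, a 200x100 image in a 400x400 canvas is only padded: contain_geometry((200, 100), (400, 400)) returns (100, 150, 200, 100). An 800x400 image is halved: contain_geometry((800, 400), (400, 400)) returns (0, 100, 400, 200).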

gluoncv.data.transforms.image.ten_crop(src, size)

Crop 10 regions from an array. This is performed in the same way as: http://chainercv.readthedocs.io/en/stable/reference/transforms.html#ten-crop

This method crops 10 regions. All regions will have shape size. These regions consist of 1 center crop and 4 corner crops, plus horizontal flips of them. The crops are returned in this order:
  • center crop
  • top-left crop
  • bottom-left crop
  • top-right crop
  • bottom-right crop
  • center crop (flipped horizontally)
  • top-left crop (flipped horizontally)
  • bottom-left crop (flipped horizontally)
  • top-right crop (flipped horizontally)
  • bottom-right crop (flipped horizontally)

Parameters:
  • src (mxnet.nd.NDArray) – Input image.
  • size (tuple) – Tuple of length 2, as (width, height) of the cropped areas.
Returns:

The cropped images with shape (10, size[1], size[0], C)

Return type:

mxnet.nd.NDArray
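
A NumPy sketch of the ten-crop layout on an HWC array, producing the crops in the order listed above; ten_crop_sketch is a hypothetical name, and placing the center crop with floor division is an assumption of this sketch.

```python
import numpy as np


def ten_crop_sketch(img, size):
    """Sketch: 10 crops of an HWC array in the documented order; size is (width, height)."""
    h, w = img.shape[:2]
    ow, oh = size
    if h < oh or w < ow:
        raise ValueError("crop size is larger than the image")
    top, left = (h - oh) // 2, (w - ow) // 2
    crops = np.stack([
        img[top:top + oh, left:left + ow],  # center
        img[:oh, :ow],                      # top-left
        img[h - oh:, :ow],                  # bottom-left
        img[:oh, w - ow:],                  # top-right
        img[h - oh:, w - ow:],              # bottom-right
    ])
    # append horizontally flipped copies (axis 2 is width in the (N, H, W, C) stack)
    return np.concatenate([crops, crops[:, :, ::-1]], axis=0)
```

The result has shape (10, size[1], size[0], C), matching the documented return shape.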

Experimental bounding box transformations.

gluoncv.data.transforms.experimental.bbox.bbox_crop(bbox, crop_box=None, allow_outside_center=True)

Crop bounding boxes according to slice area.

This method is mainly used with image cropping to ensure bounding boxes fit within the cropped image.

Parameters:
  • bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes other than coordinates are allowed and stay intact during bounding box transformations.
  • crop_box (tuple) – Tuple of length 4: \((x_{min}, y_{min}, width, height)\).
  • allow_outside_center (bool) – If False, remove bounding boxes whose centers are outside the cropping area.
Returns:

Cropped bounding boxes with shape (M, 4+) where M <= N.

Return type:

numpy.ndarray

gluoncv.data.transforms.experimental.bbox.bbox_iou(bbox_a, bbox_b, offset=0)

Calculate the Intersection-over-Union (IOU) of two sets of bounding boxes.

Parameters:
  • bbox_a (numpy.ndarray) – An ndarray with shape \((N, 4)\).
  • bbox_b (numpy.ndarray) – An ndarray with shape \((M, 4)\).
  • offset (float or int, default is 0) – The offset controls whether the width (or height) is computed as (right - left + offset). Note that the offset must be 0 for normalized bboxes, whose ranges are in [0, 1].
Returns:

An ndarray with shape \((N, M)\) containing the IOU between each pair of bounding boxes in bbox_a and bbox_b.

Return type:

numpy.ndarray
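
The pairwise IOU can be computed with NumPy broadcasting. A sketch under the documented (x_min, y_min, x_max, y_max) convention, where iou_matrix is a hypothetical name:

```python
import numpy as np


def iou_matrix(bbox_a, bbox_b, offset=0):
    """Sketch: pairwise IOU between (N, 4+) and (M, 4+) corner-format boxes, shape (N, M)."""
    bbox_a = np.asarray(bbox_a, dtype=float)
    bbox_b = np.asarray(bbox_b, dtype=float)
    # intersection corners via broadcasting: (N, 1, 2) against (M, 2) -> (N, M, 2)
    tl = np.maximum(bbox_a[:, None, :2], bbox_b[:, :2])
    br = np.minimum(bbox_a[:, None, 2:4], bbox_b[:, 2:4])
    # zero out pairs that do not overlap at all
    area_i = np.prod(br - tl + offset, axis=2) * (tl < br).all(axis=2)
    area_a = np.prod(bbox_a[:, 2:4] - bbox_a[:, :2] + offset, axis=1)
    area_b = np.prod(bbox_b[:, 2:4] - bbox_b[:, :2] + offset, axis=1)
    return area_i / (area_a[:, None] + area_b - area_i)


a = [[0, 0, 10, 10]]
b = [[0, 0, 10, 10], [5, 0, 15, 10], [20, 20, 30, 30]]
print(iou_matrix(a, b).round(3).tolist())  # [[1.0, 0.333, 0.0]]
```

The second pair overlaps on a 5x10 strip: 50 / (100 + 100 - 50) = 1/3.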

gluoncv.data.transforms.experimental.bbox.random_crop_with_constraints(bbox, size, min_scale=0.3, max_scale=1, max_aspect_ratio=2, constraints=None, max_trial=50)

Crop an image randomly with bounding box constraints.

This data augmentation is used in training the Single Shot MultiBox Detector [1]. More details can be found in the data augmentation section of the original paper.

[1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.

Parameters:
  • bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes other than coordinates are allowed and stay intact during bounding box transformations.
  • size (tuple) – Tuple of length 2 of image shape as (width, height).
  • min_scale (float) – The minimum ratio between a cropped region and the original image. The default value is 0.3.
  • max_scale (float) – The maximum ratio between a cropped region and the original image. The default value is 1.
  • max_aspect_ratio (float) – The maximum aspect ratio of cropped region. The default value is 2.
  • constraints (iterable of tuples) – An iterable of constraints. Each constraint should be in (min_iou, max_iou) format. Setting min_iou or max_iou to None means there is no constraint on that bound. If this argument is None (the default), ((0.1, None), (0.3, None), (0.5, None), (0.7, None), (0.9, None), (None, 1)) will be used.
  • max_trial (int) – Maximum number of trials for each constraint before giving up on it.
Returns:

  • numpy.ndarray – Cropped bounding boxes with shape (M, 4+) where M <= N.
  • tuple – Tuple of length 4 as (x_offset, y_offset, new_width, new_height).
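
The candidate-selection procedure can be sketched as follows. This is a simplified illustration under assumptions, not the library code: for each constraint it samples up to max_trial crops (with scale and aspect ratio bounded so the crop stays inside the image), keeps the first crop whose IOU with every box lies in [min_iou, max_iou], and finally picks one candidate at random, with the full image always available. It returns only the crop window; the documented function also returns the cropped boxes, which a step like bbox.crop would produce. All names here are hypothetical.

```python
import numpy as np

SSD_CONSTRAINTS = ((0.1, None), (0.3, None), (0.5, None), (0.7, None), (0.9, None), (None, 1))


def _iou_with_window(bbox, window):
    """IOU of every (x_min, y_min, x_max, y_max) box with one window in the same format."""
    tl = np.maximum(bbox[:, :2], window[:2])
    br = np.minimum(bbox[:, 2:4], window[2:])
    inter = np.prod(br - tl, axis=1) * (tl < br).all(axis=1)
    area_box = np.prod(bbox[:, 2:4] - bbox[:, :2], axis=1)
    area_win = np.prod(window[2:] - window[:2])
    return inter / (area_box + area_win - inter)


def sample_constrained_crop(bbox, size, min_scale=0.3, max_scale=1.0, max_aspect_ratio=2.0,
                            constraints=SSD_CONSTRAINTS, max_trial=50, rng=None):
    """Sketch: one crop window (x_offset, y_offset, width, height) for an image of size (w, h)."""
    rng = np.random.default_rng() if rng is None else rng
    bbox = np.asarray(bbox, dtype=float)
    w, h = size
    candidates = [(0, 0, w, h)]  # the full image is always a fallback candidate
    if len(bbox) == 0:
        return candidates[0]     # nothing to constrain against in this sketch
    for min_iou, max_iou in constraints:
        lo = -np.inf if min_iou is None else min_iou
        hi = np.inf if max_iou is None else max_iou
        for _ in range(max_trial):
            scale = rng.uniform(min_scale, max_scale)
            # bound the aspect ratio so the crop never exceeds the image
            ratio = rng.uniform(max(1 / max_aspect_ratio, scale * scale),
                                min(max_aspect_ratio, 1 / (scale * scale)))
            crop_w = int(w * scale * np.sqrt(ratio))
            crop_h = int(h * scale / np.sqrt(ratio))
            left = int(rng.integers(0, w - crop_w + 1))
            top = int(rng.integers(0, h - crop_h + 1))
            window = np.array([left, top, left + crop_w, top + crop_h], dtype=float)
            iou = _iou_with_window(bbox, window)
            if lo <= iou.min() and iou.max() <= hi:
                candidates.append((left, top, crop_w, crop_h))
                break            # constraint satisfied, move on to the next one
    return candidates[int(rng.integers(len(candidates)))]
```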

Experimental image transformations.

gluoncv.data.transforms.experimental.image.random_color_distort(src, brightness_delta=32, contrast_low=0.5, contrast_high=1.5, saturation_low=0.5, saturation_high=1.5, hue_delta=18)

Randomly distort the image color space. Note that the input image should be in the original range [0, 255].

Parameters:
  • src (mxnet.nd.NDArray) – Input image as HWC format.
  • brightness_delta (int) – Maximum brightness delta. Defaults to 32.
  • contrast_low (float) – Lowest contrast. Defaults to 0.5.
  • contrast_high (float) – Highest contrast. Defaults to 1.5.
  • saturation_low (float) – Lowest saturation. Defaults to 0.5.
  • saturation_high (float) – Highest saturation. Defaults to 1.5.
  • hue_delta (int) – Maximum hue delta. Defaults to 18.
Returns:

Distorted image in HWC format.

Return type:

mxnet.nd.NDArray
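
A partial NumPy sketch of the brightness and contrast steps, for intuition only. The saturation and hue steps are omitted, and the 0.5 application probability and blending toward the mean gray level are assumptions of this sketch, not details confirmed by the documentation; brightness_contrast_jitter is a hypothetical name.

```python
import numpy as np

GRAY_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)  # BT.601 luma weights


def brightness_contrast_jitter(img, brightness_delta=32, contrast_low=0.5,
                               contrast_high=1.5, p=0.5, rng=None):
    """Sketch: random brightness shift and contrast change for an HWC RGB image in [0, 255]."""
    rng = np.random.default_rng() if rng is None else rng
    img = img.astype(np.float32)
    if rng.random() < p:
        # brightness: add a single random offset to every pixel and channel
        img = img + rng.uniform(-brightness_delta, brightness_delta)
    if rng.random() < p:
        # contrast: blend the image with its mean gray level
        alpha = rng.uniform(contrast_low, contrast_high)
        gray_mean = (img * GRAY_WEIGHTS).sum(axis=2).mean()
        img = alpha * img + (1.0 - alpha) * gray_mean
    return img
```

Note that, like the description above, the sketch does not clip the result back to [0, 255].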

Transforms described in https://arxiv.org/abs/1512.02325.

gluoncv.data.transforms.presets.ssd.load_test(filenames, short, max_size=1024, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))

A utility function to load all images and transform them into tensors by applying normalization. This function supports a single filename or a list of filenames.

Parameters:
  • filenames (str or list of str) – Image filename(s) to be loaded.
  • short (int) – Resize image short side to this short and keep aspect ratio.
  • max_size (int, optional) – Maximum longer side length to fit image. This is to limit the input image shape. Aspect ratio is intact because we support arbitrary input size in our SSD implementation.
  • mean (iterable of float) – Mean pixel values.
  • std (iterable of float) – Standard deviations of pixel values.
Returns:

A (1, 3, H, W) mxnet NDArray as input to the network, and a numpy ndarray as the original un-normalized color image for display. If multiple image names are supplied, two lists are returned. You can use zip() to collapse them.

Return type:

(mxnet.NDArray, numpy.ndarray) or list of such tuple
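
The normalization step can be sketched in NumPy. It assumes, consistent with the default mean and std lying in [0, 1], that pixel values are first divided by 255, then normalized per channel and transposed from HWC into a (1, 3, H, W) batch; to_network_input is a hypothetical name, and loading and resizing are omitted.

```python
import numpy as np


def to_network_input(img_hwc_uint8, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Sketch: turn an HWC uint8 RGB image into a normalized (1, 3, H, W) float32 array."""
    x = img_hwc_uint8.astype(np.float32) / 255.0  # scale to [0, 1] first
    x = (x - np.array(mean, dtype=np.float32)) / np.array(std, dtype=np.float32)
    return x.transpose(2, 0, 1)[np.newaxis]        # HWC -> CHW, then add a batch axis
```

The un-normalized image would be kept separately for display, which is why the documented function returns both.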

class gluoncv.data.transforms.presets.ssd.SSDDefaultTrainTransform(width, height, anchors=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), iou_thresh=0.5, box_norm=(0.1, 0.1, 0.2, 0.2), **kwargs)

Default SSD training transform, which includes many image augmentations.

Parameters:
  • width (int) – Image width.
  • height (int) – Image height.
  • anchors (mxnet.nd.NDArray, optional) –

    Anchors generated from SSD networks; the shape must be (1, N, 4). Since anchors are shared across the entire batch, the first dimension is 1. N is the number of anchors for each image.

    Hint

    If anchors is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase since we push some workload to CPU workers instead of GPUs.

  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
  • std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].
  • iou_thresh (float) – IOU overlap threshold for maximum matching, default is 0.5.
  • box_norm (array-like of size 4, default is (0.1, 0.1, 0.2, 0.2)) – Std values by which the encoded box values are divided.

class gluoncv.data.transforms.presets.ssd.SSDDefaultValTransform(width, height, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))

Default SSD validation transform.

Parameters:
  • width (int) – Image width.
  • height (int) – Image height.
  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
  • std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].

Transforms for RCNN series.

gluoncv.data.transforms.presets.rcnn.load_test(filenames, short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))

A utility function to load all images and transform them into tensors by applying normalization. This function supports a single filename or a list of filenames.

Parameters:
  • filenames (str or list of str) – Image filename(s) to be loaded.
  • short (int, optional, default is 600) – Resize image short side to this short and keep aspect ratio.
  • max_size (int, optional, default is 1000) – Maximum longer side length to fit image. This limits the input image shape to avoid processing overly large images.
  • mean (iterable of float) – Mean pixel values.
  • std (iterable of float) – Standard deviations of pixel values.
Returns:

A (1, 3, H, W) mxnet NDArray as input to the network, and a numpy ndarray as the original un-normalized color image for display. If multiple image names are supplied, two lists are returned. You can use zip() to collapse them.

Return type:

(mxnet.NDArray, numpy.ndarray) or list of such tuple

class gluoncv.data.transforms.presets.rcnn.FasterRCNNDefaultTrainTransform(short=600, max_size=1000, net=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), box_norm=(1.0, 1.0, 1.0, 1.0), num_sample=256, pos_iou_thresh=0.7, neg_iou_thresh=0.3, pos_ratio=0.5, **kwargs)

Default Faster-RCNN training transform.

Parameters:
  • short (int, default is 600) – Resize image shorter side to short.
  • max_size (int, default is 1000) – Make sure image longer side is smaller than max_size.
  • net (mxnet.gluon.HybridBlock, optional) –

    The faster-rcnn network.

    Hint

    If net is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase since we push some workload to CPU workers instead of GPUs.

  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
  • std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].
  • box_norm (array-like of size 4, default is (1., 1., 1., 1.)) – Std values by which the encoded box values are divided.
  • num_sample (int, default is 256) – Number of samples for RPN targets.
  • pos_iou_thresh (float, default is 0.7) – Anchors with IOU larger than pos_iou_thresh are regarded as positive samples.
  • neg_iou_thresh (float, default is 0.3) – Anchors with IOU smaller than neg_iou_thresh are regarded as negative samples. Anchors with IOU between neg_iou_thresh and pos_iou_thresh are ignored.
  • pos_ratio (float, default is 0.5) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.
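
How num_sample, pos_iou_thresh, neg_iou_thresh and pos_ratio interact can be sketched as below. This is a simplified illustration of RPN anchor sampling, not the transform's code: it assumes each anchor's best IOU with the ground-truth boxes is already known, and it omits refinements common in RPN target generation, such as forcing the best-matching anchor of each ground-truth box to be positive. sample_rpn_labels is a hypothetical name.

```python
import numpy as np


def sample_rpn_labels(max_iou, num_sample=256, pos_iou_thresh=0.7, neg_iou_thresh=0.3,
                      pos_ratio=0.5, rng=None):
    """Sketch: label anchors 1 (positive), 0 (negative) or -1 (ignored) from their best IOU."""
    rng = np.random.default_rng() if rng is None else rng
    max_iou = np.asarray(max_iou, dtype=float)
    labels = np.full(max_iou.shape, -1, dtype=int)  # everything starts out ignored
    pos = np.flatnonzero(max_iou > pos_iou_thresh)
    neg = np.flatnonzero(max_iou < neg_iou_thresh)
    # at most pos_ratio * num_sample positives; negatives fill the remaining slots
    num_pos = min(len(pos), int(pos_ratio * num_sample))
    num_neg = min(len(neg), num_sample - num_pos)
    labels[rng.choice(pos, size=num_pos, replace=False)] = 1
    labels[rng.choice(neg, size=num_neg, replace=False)] = 0
    return labels
```

With the defaults, an image yields at most 128 positive anchors, and negatives make up the rest of the 256 samples.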

class gluoncv.data.transforms.presets.rcnn.FasterRCNNDefaultValTransform(short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))

Default Faster-RCNN validation transform.

Parameters:
  • short (int, default is 600) – Resize image shorter side to short.
  • max_size (int, default is 1000) – Make sure image longer side is smaller than max_size.
  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
  • std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].

class gluoncv.data.transforms.presets.rcnn.MaskRCNNDefaultTrainTransform(short=600, max_size=1000, net=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), box_norm=(1.0, 1.0, 1.0, 1.0), num_sample=256, pos_iou_thresh=0.7, neg_iou_thresh=0.3, pos_ratio=0.5, **kwargs)

Default Mask RCNN training transform.

Parameters:
  • short (int, default is 600) – Resize image shorter side to short.
  • max_size (int, default is 1000) – Make sure image longer side is smaller than max_size.
  • net (mxnet.gluon.HybridBlock, optional) –

    The Mask R-CNN network.

    Hint

    If net is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase since we push some workload to CPU workers instead of GPUs.

  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
  • std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].
  • box_norm (array-like of size 4, default is (1., 1., 1., 1.)) – Std values by which the encoded box values are divided.
  • num_sample (int, default is 256) – Number of samples for RPN targets.
  • pos_iou_thresh (float, default is 0.7) – Anchors with IOU larger than pos_iou_thresh are regarded as positive samples.
  • neg_iou_thresh (float, default is 0.3) – Anchors with IOU smaller than neg_iou_thresh are regarded as negative samples. Anchors with IOU between neg_iou_thresh and pos_iou_thresh are ignored.
  • pos_ratio (float, default is 0.5) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.

class gluoncv.data.transforms.presets.rcnn.MaskRCNNDefaultValTransform(short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))

Default Mask RCNN validation transform.

Parameters:
  • short (int, default is 600) – Resize image shorter side to short.
  • max_size (int, default is 1000) – Make sure image longer side is smaller than max_size.
  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
  • std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].

Transforms for YOLO series.

gluoncv.data.transforms.presets.yolo.load_test(filenames, short=416, max_size=1024, stride=32, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))

A utility function to load all images and transform them into tensors by applying normalization. This function supports a single filename or a list of filenames.

Parameters:
  • filenames (str or list of str) – Image filename(s) to be loaded.
  • short (int, default=416) – Resize image short side to this short and keep aspect ratio. Note that yolo network
  • max_size (int, optional) – Maximum longer side length to fit image. This is to limit the input image shape. Aspect ratio is intact because we support arbitrary input size in our YOLO implementation.
  • stride (int, optional, default is 32) – The stride constraint due to the precise alignment of the bounding box prediction module. The image's width and height must be multiples of stride. Use stride = 1 to relax this constraint.
  • mean (iterable of float) – Mean pixel values.
  • std (iterable of float) – Standard deviations of pixel values.
Returns:

A (1, 3, H, W) mxnet NDArray as input to the network, and a numpy ndarray as the original un-normalized color image for display. If multiple image names are supplied, two lists are returned. You can use zip() to collapse them.

Return type:

(mxnet.NDArray, numpy.ndarray) or list of such tuple

class gluoncv.data.transforms.presets.yolo.YOLO3DefaultTrainTransform(width, height, net=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), **kwargs)

Default YOLO training transform, which includes many image augmentations.

Parameters:
  • width (int) – Image width.
  • height (int) – Image height.
  • net (mxnet.gluon.HybridBlock, optional) –

    The YOLO network.

    Hint

    If net is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase since we push some workload to CPU workers instead of GPUs.

  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
  • std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].
  • iou_thresh (float) – IOU overlap threshold for maximum matching, default is 0.5.
  • box_norm (array-like of size 4, default is (0.1, 0.1, 0.2, 0.2)) – Std values by which the encoded box values are divided.

class gluoncv.data.transforms.presets.yolo.YOLO3DefaultValTransform(width, height, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))

Default YOLO validation transform.

Parameters:
  • width (int) – Image width.
  • height (int) – Image height.
  • mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
  • std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].