gluoncv.data.transforms
This file includes various transformations that are critical to vision tasks.
Bounding Box Transforms
crop 
Crop bounding boxes according to slice area. 
flip 
Flip bounding boxes according to image flipping directions. 
resize 
Resize bounding boxes according to image resize operation. 
translate 
Translate bounding boxes by offsets. 
experimental.bbox.random_crop_with_constraints 
Crop an image randomly with bounding box constraints. 
Image Transforms
imresize 
Resize image with OpenCV. 
resize_long 
Resizes longer edge to size. 
resize_short_within 
Resizes shorter edge to size while making sure the longer edge is capped at the maximum size. 
random_pca_lighting 
Apply random PCA lighting noise to input image. 
random_expand 
Randomly expand the original image with borders; this is equivalent to placing the original image on a larger canvas. 
random_flip 
Randomly flip image horizontally and vertically with the given probabilities. 
resize_contain 
Resize the image to fit in the given area while keeping aspect ratio. 
ten_crop 
Crop 10 regions from an array. 
Instance Segmentation Mask Transforms
flip 
Flip polygons according to image flipping directions. 
resize 
Resize polygons according to image resize operation. 
to_mask 
Convert list of polygons to full size binary mask. 
fill 
Fill mask to full image size. 
Preset Transforms
We include presets for reproducing state-of-the-art (SOTA) results described in different papers. This is a complementary section, and its APIs are subject to change.
Single Shot Multibox Object Detector
load_test 
A utility function that loads images and transforms them into normalized tensors. 
SSDDefaultTrainTransform 
Default SSD training transform, which includes many image augmentations. 
SSDDefaultValTransform 
Default SSD validation transform. 
Faster RCNN
load_test 
A utility function that loads images and transforms them into normalized tensors. 
FasterRCNNDefaultTrainTransform 
Default FasterRCNN training transform. 
FasterRCNNDefaultValTransform 
Default FasterRCNN validation transform. 
Mask RCNN
load_test 
A utility function that loads images and transforms them into normalized tensors. 
MaskRCNNDefaultTrainTransform 
Default Mask RCNN training transform. 
MaskRCNNDefaultValTransform 
Default Mask RCNN validation transform. 
YOLO
load_test 
A utility function that loads images and transforms them into normalized tensors. 
YOLO3DefaultTrainTransform 
Default YOLO training transform, which includes many image augmentations. 
YOLO3DefaultValTransform 
Default YOLO validation transform. 
API Reference
Bounding boxes transformation functions.

gluoncv.data.transforms.bbox.
crop
(bbox, crop_box=None, allow_outside_center=True) Crop bounding boxes according to slice area.
This method is mainly used with image cropping to ensure bounding boxes fit within the cropped image.
Parameters:  bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes beyond the coordinates are allowed and stay intact during bounding box transformations.
 crop_box (tuple) – Tuple of length 4. \((x_{min}, y_{min}, width, height)\)
 allow_outside_center (bool) – If False, remove bounding boxes which have centers outside cropping area.
Returns: Cropped bounding boxes with shape (M, 4+) where M <= N.
Return type:
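The cropping behaviour described above can be illustrated with a small NumPy-only sketch. The function name `crop_boxes_sketch` is hypothetical and this is not the library's implementation; it assumes the crop window is fully specified as (x_min, y_min, width, height) and that boxes left without positive area are dropped.

```python
import numpy as np

def crop_boxes_sketch(bbox, crop_box, allow_outside_center=True):
    """Clip (N, 4+) boxes to a crop window given as (x_min, y_min, width, height)."""
    bbox = np.array(bbox, dtype=float)  # work on a float copy
    x0, y0, w, h = crop_box
    window = np.array([x0, y0, x0 + w, y0 + h], dtype=float)

    if allow_outside_center:
        keep = np.ones(len(bbox), dtype=bool)
    else:
        # Keep only boxes whose centers fall inside the crop window.
        centers = (bbox[:, :2] + bbox[:, 2:4]) / 2.0
        keep = ((window[:2] <= centers) & (centers < window[2:])).all(axis=1)

    # Clip corners to the window, then shift into crop-local coordinates.
    bbox[:, :2] = np.maximum(bbox[:, :2], window[:2]) - window[:2]
    bbox[:, 2:4] = np.minimum(bbox[:, 2:4], window[2:]) - window[:2]

    # Drop boxes that no longer have positive area.
    keep &= (bbox[:, :2] < bbox[:, 2:4]).all(axis=1)
    return bbox[keep]

boxes = np.array([[10, 10, 50, 50, 1],       # partly inside the crop
                  [200, 200, 250, 250, 2]])  # fully outside the crop
cropped = crop_boxes_sketch(boxes, crop_box=(20, 20, 100, 100))
```

Here the first box is clipped and shifted into crop-local coordinates, the second is removed, and the extra fifth column (for example a class id) is carried through untouched.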

gluoncv.data.transforms.bbox.
flip
(bbox, size, flip_x=False, flip_y=False) Flip bounding boxes according to image flipping directions.
Parameters:  bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes beyond the coordinates are allowed and stay intact during bounding box transformations.
 size (tuple) – Tuple of length 2: (width, height).
 flip_x (bool) – Whether to flip horizontally.
 flip_y (bool) – Whether to flip vertically.
Returns: Flipped bounding boxes with original shape.
Return type:
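As a rough illustration of what flipping does to box coordinates, the following NumPy-only sketch mirrors the x (or y) coordinates around the image width (or height). The helper name `flip_boxes_sketch` is hypothetical; it is a sketch of the idea, not the library's code.

```python
import numpy as np

def flip_boxes_sketch(bbox, size, flip_x=False, flip_y=False):
    """Mirror (N, 4+) boxes inside an image of size (width, height)."""
    bbox = np.array(bbox, dtype=float)  # work on a float copy
    width, height = size
    if flip_x:
        # The old right edge becomes the new left edge, and vice versa.
        x_min = width - bbox[:, 2]
        x_max = width - bbox[:, 0]
        bbox[:, 0], bbox[:, 2] = x_min, x_max
    if flip_y:
        y_min = height - bbox[:, 3]
        y_max = height - bbox[:, 1]
        bbox[:, 1], bbox[:, 3] = y_min, y_max
    return bbox

boxes = np.array([[10, 20, 40, 60, 7]])
flipped = flip_boxes_sketch(boxes, size=(100, 80), flip_x=True)
```

Swapping the min and max columns after mirroring keeps each box in (x_min, y_min, x_max, y_max) order.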

gluoncv.data.transforms.bbox.
resize
(bbox, in_size, out_size) Resize bounding boxes according to image resize operation.
Parameters:  bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes beyond the coordinates are allowed and stay intact during bounding box transformations.
 in_size (tuple) – Tuple of length 2: (width, height) for input.
 out_size (tuple) – Tuple of length 2: (width, height) for output.
Returns: Resized bounding boxes with original shape.
Return type:
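Resizing boxes amounts to scaling x coordinates by the width ratio and y coordinates by the height ratio. A minimal NumPy sketch, with the hypothetical name `resize_boxes_sketch`, might look like this:

```python
import numpy as np

def resize_boxes_sketch(bbox, in_size, out_size):
    """Scale (N, 4+) boxes from an in_size=(width, height) image to out_size."""
    bbox = np.array(bbox, dtype=float)  # work on a float copy
    x_scale = out_size[0] / in_size[0]
    y_scale = out_size[1] / in_size[1]
    bbox[:, [0, 2]] *= x_scale  # x_min, x_max
    bbox[:, [1, 3]] *= y_scale  # y_min, y_max
    return bbox

boxes = np.array([[10, 20, 40, 60, 3]])
resized = resize_boxes_sketch(boxes, in_size=(100, 80), out_size=(200, 40))
```

Only the first four columns are scaled; any additional attributes keep their values.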

gluoncv.data.transforms.bbox.
translate
(bbox, x_offset=0, y_offset=0) Translate bounding boxes by offsets.
Parameters:  bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes beyond the coordinates are allowed and stay intact during bounding box transformations.
 x_offset (int or float) – Offset along x axis.
 y_offset (int or float) – Offset along y axis.
Returns: Translated bounding boxes with original shape.
Return type:
Additional image transforms.

class
gluoncv.data.transforms.block.
RandomCrop
(size, pad=None, interpolation=2) Randomly crop src with size (width, height). Padding is optional. Upsample result if src is smaller than size.
Parameters:  size (int or tuple of (W, H)) – Size of the final output.
 pad (int or tuple) – If int, the size of the zero-padding. If tuple, the number of values padded to the edges of each axis: ((before_1, after_1), … (before_N, after_N)) gives unique pad widths for each axis; ((before, after),) yields the same before and after pad for each axis; (pad,) or int is a shortcut for before = after = pad width for all axes.
 interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices.
 Inputs:
 data: input tensor with (Hi x Wi x C) shape.
 Outputs:
 out: output tensor with ((H+2*pad) x (W+2*pad) x C) shape.
Extended image transformations to mxnet.image.

gluoncv.data.transforms.image.
imresize
(src, w, h, interp=1) Resize image with OpenCV.
This is a duplicate of mxnet.image.imresize for namespace consistency.
Parameters: Returns: out – The output of this function.
Return type: NDArray or list of NDArrays
Examples
>>> import mxnet as mx
>>> from gluoncv import data as gdata
>>> img = mx.random.uniform(0, 255, (300, 300, 3)).astype('uint8')
>>> print(img.shape)
(300, 300, 3)
>>> img = gdata.transforms.image.imresize(img, 200, 200)
>>> print(img.shape)
(200, 200, 3)

gluoncv.data.transforms.image.
resize_long
(src, size, interp=2) Resizes longer edge to size. Note: resize_long uses OpenCV (not the CV2 Python library). MXNet must have been built with OpenCV for resize_long to work. Resizes the original image by setting the longer edge to size and setting the shorter edge accordingly. This will ensure the new image will fit into the size specified. Resizing function is called from OpenCV.
Parameters:  src (NDArray) – The original image.
 size (int) – The length to be set for the longer edge.
 interp (int, optional, default=2) – Interpolation method used for resizing the image. Possible values:
 0: Nearest-neighbor interpolation.
 1: Bilinear interpolation.
 2: Area-based (resampling using pixel area relation). It may be a preferred method for image decimation, as it gives moiré-free results. But when the image is zoomed, it is similar to the nearest-neighbor method. (Used by default.)
 3: Bicubic interpolation over a 4x4 pixel neighborhood.
 4: Lanczos interpolation over an 8x8 pixel neighborhood.
 9: Cubic for enlarging, area for shrinking, bilinear for others.
 10: Randomly select from the interpolation methods mentioned above.
 Note: When shrinking an image, it will generally look best with area-based interpolation, whereas when enlarging an image, it will generally look best with bicubic (slow) or bilinear (faster but still looks OK) interpolation. More details can be found in the OpenCV documentation at http://docs.opencv.org/master/da/d54/group__imgproc__transform.html.
Returns: An ‘NDArray’ containing the resized image.
Return type: NDArray
Example
>>> with open("flower.jpeg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image)
>>> image
<NDArray 2321x3482x3 @cpu(0)>
>>> size = 640
>>> new_image = mx.img.resize_long(image, size)
>>> new_image
<NDArray 386x640x3 @cpu(0)>

gluoncv.data.transforms.image.
resize_short_within
(src, short, max_size, mult_base=1, interp=2) Resizes shorter edge to size while making sure the longer edge is capped at the maximum size. Note: resize_short_within uses OpenCV (not the CV2 Python library). MXNet must have been built with OpenCV for resize_short_within to work. Resizes the original image by setting the shorter edge to size and setting the longer edge accordingly. This function also ensures the new image will not exceed max_size even on the longer side. Resizing function is called from OpenCV.
Parameters:  src (NDArray) – The original image.
 short (int) – Resize shorter side to short.
 max_size (int) – Make sure the longer side of the new image does not exceed max_size.
 mult_base (int, default is 1) – Width and height are rounded to multiples of mult_base.
 interp (int, optional, default=2) – Interpolation method used for resizing the image. Possible values:
 0: Nearest-neighbor interpolation.
 1: Bilinear interpolation.
 2: Area-based (resampling using pixel area relation). It may be a preferred method for image decimation, as it gives moiré-free results. But when the image is zoomed, it is similar to the nearest-neighbor method. (Used by default.)
 3: Bicubic interpolation over a 4x4 pixel neighborhood.
 4: Lanczos interpolation over an 8x8 pixel neighborhood.
 9: Cubic for enlarging, area for shrinking, bilinear for others.
 10: Randomly select from the interpolation methods mentioned above.
 Note: When shrinking an image, it will generally look best with area-based interpolation, whereas when enlarging an image, it will generally look best with bicubic (slow) or bilinear (faster but still looks OK) interpolation. More details can be found in the OpenCV documentation at http://docs.opencv.org/master/da/d54/group__imgproc__transform.html.
Returns: An ‘NDArray’ containing the resized image.
Return type: NDArray
Example
>>> with open("flower.jpeg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image)
>>> image
<NDArray 2321x3482x3 @cpu(0)>
>>> new_image = resize_short_within(image, short=800, max_size=1000)
>>> new_image
<NDArray 667x1000x3 @cpu(0)>
>>> new_image = resize_short_within(image, short=800, max_size=1200)
>>> new_image
<NDArray 800x1200x3 @cpu(0)>
>>> new_image = resize_short_within(image, short=800, max_size=1200, mult_base=32)
>>> new_image
<NDArray 800x1184x3 @cpu(0)>
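The output sizes in the example above can be reproduced with plain arithmetic. The following pure-Python sketch computes only the target (height, width) and performs no image resizing; the function name `short_within_size_sketch` and the exact rounding rule are assumptions chosen to match the three documented results, not a statement of how the library computes them.

```python
def short_within_size_sketch(height, width, short, max_size, mult_base=1):
    """Return (new_height, new_width) for a shorter-edge resize capped at max_size."""
    size_min, size_max = min(height, width), max(height, width)
    scale = short / size_min
    # If the longer edge would exceed max_size, rescale so that it fits instead.
    if round(scale * size_max / mult_base) * mult_base > max_size:
        scale = (max_size // mult_base) * mult_base / size_max
    # Round both edges to the nearest multiple of mult_base.
    new_h = int(round(height * scale / mult_base) * mult_base)
    new_w = int(round(width * scale / mult_base) * mult_base)
    return new_h, new_w

capped = short_within_size_sketch(2321, 3482, short=800, max_size=1000)
uncapped = short_within_size_sketch(2321, 3482, short=800, max_size=1200)
aligned = short_within_size_sketch(2321, 3482, short=800, max_size=1200, mult_base=32)
```

With max_size=1000 the longer edge is the limiting factor, so the shorter edge ends up below the requested 800; with mult_base=32 both edges are additionally snapped to multiples of 32.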

gluoncv.data.transforms.image.
random_pca_lighting
(src, alphastd, eigval=None, eigvec=None) Apply random PCA lighting noise to input image.
Parameters:  src (mxnet.nd.NDArray) – Input image with HWC format.
 alphastd (float) – Noise level [0, 1) for image with range [0, 255].
 eigval (list of floats) – Eigenvalues, defaults to [55.46, 4.794, 1.148].
 eigvec (nested lists of floats) – Eigenvectors with shape (3, 3), defaults to [[-0.5675, 0.7192, 0.4009], [-0.5808, -0.0045, -0.8140], [-0.5836, -0.6948, 0.4203]].
Returns: Augmented image.
Return type: mxnet.nd.NDArray
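To make the role of alphastd, eigval and eigvec concrete, here is a NumPy-only sketch of PCA lighting noise in the style popularized by AlexNet: one random coefficient is drawn per principal component and the resulting RGB offset is added to every pixel. The name `pca_lighting_sketch` and the `rng` argument are hypothetical, and the minus signs in the eigenvector matrix are an assumption based on the commonly used ImageNet statistics.

```python
import numpy as np

# Default statistics; the minus signs in EIGVEC are an assumption (see the text above).
EIGVAL = np.array([55.46, 4.794, 1.148])
EIGVEC = np.array([[-0.5675, 0.7192, 0.4009],
                   [-0.5808, -0.0045, -0.8140],
                   [-0.5836, -0.6948, 0.4203]])

def pca_lighting_sketch(img, alphastd, eigval=EIGVAL, eigvec=EIGVEC, rng=None):
    """Add one random per-channel offset to an HWC image with values in [0, 255]."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = rng.normal(0.0, alphastd, size=3)  # one coefficient per component
    rgb_shift = eigvec @ (alpha * eigval)      # shape (3,), one offset per channel
    return img.astype(np.float32) + rgb_shift.astype(np.float32)

img = np.full((2, 2, 3), 128, dtype=np.uint8)
unchanged = pca_lighting_sketch(img, alphastd=0.0)
shifted = pca_lighting_sketch(img, alphastd=0.1, rng=np.random.default_rng(0))
```

With alphastd=0 the image is returned unchanged (as float32); otherwise every pixel receives the same three-channel offset.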

gluoncv.data.transforms.image.
random_expand
(src, max_ratio=4, fill=0, keep_ratio=True) Randomly expand the original image with borders; this is equivalent to placing the original image on a larger canvas.
Parameters:  src (mxnet.nd.NDArray) – The original image with HWC format.
 max_ratio (int or float) – Maximum ratio of the output image size to the input size in both directions (vertical and horizontal).
 fill (int or float or array-like) – The value(s) for padded borders. If fill is a numerical type, all RGB channels are padded with that single value. Otherwise fill must have the same length as the number of image channels, which results in padding with per-channel values.
 keep_ratio (bool) – If True, the output image keeps the same aspect ratio as the input.
Returns:  mxnet.nd.NDArray – Augmented image.
 tuple – Tuple of (offset_x, offset_y, new_width, new_height)
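The "larger canvas" idea can be sketched in a few lines of NumPy. This is an illustrative approximation with hypothetical names (`random_expand_sketch`, `rng`); it always keeps the aspect ratio and supports only a scalar fill value. The returned offsets are what one would use to shift bounding boxes annotated on the original image.

```python
import numpy as np

def random_expand_sketch(img, max_ratio=4, fill=0, rng=None):
    """Place an HWC image at a random position on a larger, filled canvas."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = img.shape
    ratio = rng.uniform(1, max_ratio)  # one ratio for both axes keeps the aspect ratio
    new_h, new_w = int(h * ratio), int(w * ratio)
    off_y = int(rng.integers(0, new_h - h + 1))
    off_x = int(rng.integers(0, new_w - w + 1))
    canvas = np.full((new_h, new_w, c), fill, dtype=img.dtype)
    canvas[off_y:off_y + h, off_x:off_x + w] = img
    return canvas, (off_x, off_y, new_w, new_h)

img = np.ones((4, 6, 3), dtype=np.uint8)
canvas, (off_x, off_y, new_w, new_h) = random_expand_sketch(img, rng=np.random.default_rng(0))

# Boxes annotated on the original image move by the same offset.
boxes = np.array([[1.0, 1.0, 5.0, 3.0]])
boxes[:, [0, 2]] += off_x
boxes[:, [1, 3]] += off_y
```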

gluoncv.data.transforms.image.
random_flip
(src, px=0, py=0, copy=False) Randomly flip image horizontally and vertically with the given probabilities.
Parameters: Returns:  mxnet.nd.NDArray – Augmented image.
 tuple – Tuple of (flip_x, flip_y), records of whether flips are applied.

gluoncv.data.transforms.image.
resize_contain
(src, size, fill=0) Resize the image to fit in the given area while keeping aspect ratio.
If both the height and the width in size are larger than the height and the width of the input image, the image is placed at the center with appropriate padding to match size. Otherwise, the input image is scaled to fit in a canvas of the given size while preserving aspect ratio.
Parameters:  src (mxnet.nd.NDArray) – The original image with HWC format.
 size (tuple) – Tuple of length 2 as (width, height).
 fill (int or float or array-like) – The value(s) for padded borders. If fill is a numerical type, all RGB channels are padded with that single value. Otherwise fill must have the same length as the number of image channels, which results in padding with per-channel values.
Returns:  mxnet.nd.NDArray – Augmented image.
 tuple – Tuple of (offset_x, offset_y, scaled_x, scaled_y)
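The geometry behind the returned tuple can be worked out without touching any pixels. The pure-Python sketch below uses a hypothetical helper name, `contain_geometry_sketch`, and assumes truncating integer rounding and centered placement; it shows the two cases described above, shrinking to fit and padding only.

```python
def contain_geometry_sketch(height, width, size):
    """Return (offset_x, offset_y, scaled_x, scaled_y) for fitting an image into size=(width, height)."""
    out_w, out_h = size
    scale = min(out_w / width, out_h / height, 1.0)  # never upscale
    scaled_x, scaled_y = int(width * scale), int(height * scale)
    # Center the (possibly shrunk) image on the output canvas.
    offset_x = (out_w - scaled_x) // 2
    offset_y = (out_h - scaled_y) // 2
    return offset_x, offset_y, scaled_x, scaled_y

shrunk = contain_geometry_sketch(height=100, width=200, size=(100, 100))
padded = contain_geometry_sketch(height=50, width=60, size=(100, 100))
```

In the first call the image is halved and padded vertically; in the second it already fits, so it is only centered.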

gluoncv.data.transforms.image.
ten_crop
(src, size) Crop 10 regions from an array. This is performed the same way as: http://chainercv.readthedocs.io/en/stable/reference/transforms.html#tencrop
This method crops 10 regions. All regions will be in shape size. These regions consist of 1 center crop and 4 corner crops and horizontal flips of them. The crops are ordered as follows:
* center crop
* top-left crop
* bottom-left crop
* top-right crop
* bottom-right crop
* center crop (flipped horizontally)
* top-left crop (flipped horizontally)
* bottom-left crop (flipped horizontally)
* top-right crop (flipped horizontally)
* bottom-right crop (flipped horizontally)
Parameters:  src (mxnet.nd.NDArray) – Input image.
 size (tuple) – Tuple of length 2, as (width, height) of the cropped areas.
Returns: The cropped images with shape (10, size[1], size[0], C)
Return type: mxnet.nd.NDArray
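The crop order listed above can be reproduced with NumPy slicing. This is a sketch under stated assumptions rather than the library's code: the name `ten_crop_sketch` is hypothetical, the input is a NumPy HWC array, and the center crop uses floor division for its origin.

```python
import numpy as np

def ten_crop_sketch(img, size):
    """Return 10 crops of an HWC array as (10, height, width, C); size is (width, height)."""
    h, w, _ = img.shape
    ow, oh = size
    if ow > w or oh > h:
        raise ValueError("crop size must not exceed the image size")
    cy, cx = (h - oh) // 2, (w - ow) // 2
    # (y, x) origins: center, top-left, bottom-left, top-right, bottom-right.
    origins = [(cy, cx), (0, 0), (h - oh, 0), (0, w - ow), (h - oh, w - ow)]
    crops = [img[y:y + oh, x:x + ow] for y, x in origins]
    crops += [c[:, ::-1] for c in crops]  # horizontal flips, in the same order
    return np.stack(crops)

img = np.arange(6 * 8 * 3).reshape(6, 8, 3)
crops = ten_crop_sketch(img, size=(4, 2))
```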
Experimental bounding box transformations.

gluoncv.data.transforms.experimental.bbox.
bbox_crop
(bbox, crop_box=None, allow_outside_center=True) Crop bounding boxes according to slice area.
This method is mainly used with image cropping to ensure bounding boxes fit within the cropped image.
Parameters:  bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes beyond the coordinates are allowed and stay intact during bounding box transformations.
 crop_box (tuple) – Tuple of length 4. \((x_{min}, y_{min}, width, height)\)
 allow_outside_center (bool) – If False, remove bounding boxes which have centers outside cropping area.
Returns: Cropped bounding boxes with shape (M, 4+) where M <= N.
Return type:

gluoncv.data.transforms.experimental.bbox.
bbox_iou
(bbox_a, bbox_b, offset=0) Calculate Intersection-over-Union (IoU) of two sets of bounding boxes.
Parameters:  bbox_a (numpy.ndarray) – An ndarray with shape \((N, 4)\).
 bbox_b (numpy.ndarray) – An ndarray with shape \((M, 4)\).
 offset (float or int, default is 0) – The offset controls whether the width (or height) is computed as (right - left + offset). Note that the offset must be 0 for normalized bboxes, whose ranges are in [0, 1].
Returns: An ndarray with shape \((N, M)\) indicating the IoU between each pair of bounding boxes in bbox_a and bbox_b.
Return type:
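A pairwise IoU matrix of this shape can be computed with NumPy broadcasting. The sketch below uses the hypothetical name `bbox_iou_sketch` and treats boxes that merely touch as non-overlapping; both choices are assumptions made for illustration.

```python
import numpy as np

def bbox_iou_sketch(bbox_a, bbox_b, offset=0):
    """Return an (N, M) matrix of IoU values between (N, 4) and (M, 4) boxes."""
    a = np.asarray(bbox_a, dtype=float)
    b = np.asarray(bbox_b, dtype=float)
    # Top-left and bottom-right corners of every pairwise intersection.
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:4], b[None, :, 2:4])
    # Intersection area is zero unless the corners describe a real rectangle.
    inter = np.prod(br - tl + offset, axis=2) * (tl < br).all(axis=2)
    area_a = np.prod(a[:, 2:4] - a[:, :2] + offset, axis=1)
    area_b = np.prod(b[:, 2:4] - b[:, :2] + offset, axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

a = np.array([[0, 0, 2, 2]])
b = np.array([[1, 1, 3, 3], [2, 2, 4, 4], [0, 0, 2, 2]])
iou = bbox_iou_sketch(a, b)
```

For these inputs the first pair overlaps in a unit square (IoU 1/7), the second pair only touches at a corner (IoU 0), and the third pair is identical (IoU 1).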

gluoncv.data.transforms.experimental.bbox.
random_crop_with_constraints
(bbox, size, min_scale=0.3, max_scale=1, max_aspect_ratio=2, constraints=None, max_trial=50) Crop an image randomly with bounding box constraints.
This data augmentation is used in training of Single Shot Multibox Detector [1]. More details can be found in the data augmentation section of the original paper.
[1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters:  bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+) where N is the number of bounding boxes. The second axis represents attributes of the bounding box. Specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes beyond the coordinates are allowed and stay intact during bounding box transformations.
 size (tuple) – Tuple of length 2 of image shape as (width, height).
 min_scale (float) – The minimum ratio between a cropped region and the original image. The default value is 0.3.
 max_scale (float) – The maximum ratio between a cropped region and the original image. The default value is 1.
 max_aspect_ratio (float) – The maximum aspect ratio of cropped region. The default value is 2.
 constraints (iterable of tuples) – An iterable of constraints. Each constraint should be in (min_iou, max_iou) format. Setting min_iou or max_iou to None means no constraint on that bound. If this argument is left as None, ((0.1, None), (0.3, None), (0.5, None), (0.7, None), (0.9, None), (None, 1)) will be used.
 max_trial (int) – Maximum number of trials for each constraint before giving up.
Returns:  numpy.ndarray – Cropped bounding boxes with shape (M, 4+) where M <= N.
 tuple – Tuple of length 4 as (x_offset, y_offset, new_width, new_height).
Experimental image transformations.

gluoncv.data.transforms.experimental.image.
random_color_distort
(src, brightness_delta=32, contrast_low=0.5, contrast_high=1.5, saturation_low=0.5, saturation_high=1.5, hue_delta=18) Randomly distort image color space. Note that the input image should be in the original range [0, 255].
Parameters:  src (mxnet.nd.NDArray) – Input image as HWC format.
 brightness_delta (int) – Maximum brightness delta. Defaults to 32.
 contrast_low (float) – Lowest contrast. Defaults to 0.5.
 contrast_high (float) – Highest contrast. Defaults to 1.5.
 saturation_low (float) – Lowest saturation. Defaults to 0.5.
 saturation_high (float) – Highest saturation. Defaults to 1.5.
 hue_delta (int) – Maximum hue delta. Defaults to 18.
Returns: Distorted image in HWC format.
Return type: mxnet.nd.NDArray
Transforms described in https://arxiv.org/abs/1512.02325.

gluoncv.data.transforms.presets.ssd.
load_test
(filenames, short, max_size=1024, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)) A utility function that loads images and transforms them into normalized tensors. This function supports a single filename or a list of filenames.
Parameters:  filenames (str or list of str) – Image filename(s) to be loaded.
 short (int) – Resize image short side to this short and keep aspect ratio.
 max_size (int, optional) – Maximum longer side length to fit image. This is to limit the input image shape. Aspect ratio is intact because we support arbitrary input size in our SSD implementation.
 mean (iterable of float) – Mean pixel values.
 std (iterable of float) – Standard deviations of pixel values.
Returns: A (1, 3, H, W) mxnet NDArray as input to network, and a numpy ndarray as original unnormalized color image for display. If multiple image names are supplied, return two lists. You can use zip() to collapse them.
Return type: (mxnet.NDArray, numpy.ndarray) or list of such tuple
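The tensor conversion and normalization step can be sketched with NumPy alone. The helper name `to_network_input_sketch` is hypothetical, and the sketch assumes pixel values are first scaled from [0, 255] to [0, 1], which is what the magnitudes of the default mean and std suggest; file loading and resizing are left out.

```python
import numpy as np

def to_network_input_sketch(img, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Convert an HWC uint8 image into a normalized (1, 3, H, W) float array."""
    x = img.astype(np.float32) / 255.0  # scale to [0, 1] (assumed)
    x = (x - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)
    return x.transpose(2, 0, 1)[np.newaxis]  # HWC -> CHW, then add a batch axis

img = np.zeros((4, 5, 3), dtype=np.uint8)
x = to_network_input_sketch(img)
```

The original uint8 image is left untouched, which corresponds to the unnormalized copy kept for display.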

class
gluoncv.data.transforms.presets.ssd.
SSDDefaultTrainTransform
(width, height, anchors=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), iou_thresh=0.5, box_norm=(0.1, 0.1, 0.2, 0.2), **kwargs) Default SSD training transform, which includes many image augmentations.
Parameters:  width (int) – Image width.
 height (int) – Image height.
 anchors (mxnet.nd.NDArray, optional) – Anchors generated from SSD networks; the shape must be (1, N, 4). Since anchors are shared across the entire batch, the first dimension is 1. N is the number of anchors for each image.
 Hint: If anchors is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase, since some workload is pushed to CPU workers instead of GPUs.
 mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
 std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].
 iou_thresh (float) – IOU overlap threshold for maximum matching, default is 0.5.
 box_norm (array-like of size 4, default is (0.1, 0.1, 0.2, 0.2)) – Std value by which encoded values are divided.

class
gluoncv.data.transforms.presets.ssd.
SSDDefaultValTransform
(width, height, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)) Default SSD validation transform.
Parameters:
Transforms for RCNN series.

gluoncv.data.transforms.presets.rcnn.
load_test
(filenames, short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)) A utility function that loads images and transforms them into normalized tensors. This function supports a single filename or a list of filenames.
Parameters:  filenames (str or list of str) – Image filename(s) to be loaded.
 short (int, optional, default is 600) – Resize image short side to this short and keep aspect ratio.
 max_size (int, optional, default is 1000) – Maximum longer side length to fit image. This limits the input image shape to avoid processing overly large images.
 mean (iterable of float) – Mean pixel values.
 std (iterable of float) – Standard deviations of pixel values.
Returns: A (1, 3, H, W) mxnet NDArray as input to network, and a numpy ndarray as original unnormalized color image for display. If multiple image names are supplied, return two lists. You can use zip() to collapse them.
Return type: (mxnet.NDArray, numpy.ndarray) or list of such tuple

class
gluoncv.data.transforms.presets.rcnn.
FasterRCNNDefaultTrainTransform
(short=600, max_size=1000, net=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), box_norm=(1.0, 1.0, 1.0, 1.0), num_sample=256, pos_iou_thresh=0.7, neg_iou_thresh=0.3, pos_ratio=0.5, **kwargs) Default FasterRCNN training transform.
Parameters:  short (int, default is 600) – Resize image shorter side to short.
 max_size (int, default is 1000) – Make sure image longer side is no larger than max_size.
 net (mxnet.gluon.HybridBlock, optional) – The fasterrcnn network.
 Hint: If net is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase, since some workload is pushed to CPU workers instead of GPUs.
 mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
 std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].
 box_norm (array-like of size 4, default is (1., 1., 1., 1.)) – Std value by which encoded values are divided.
 num_sample (int, default is 256) – Number of samples for RPN targets.
 pos_iou_thresh (float, default is 0.7) – Anchors with IOU larger than pos_iou_thresh are regarded as positive samples.
 neg_iou_thresh (float, default is 0.3) – Anchors with IOU smaller than neg_iou_thresh are regarded as negative samples. Anchors with IOU in between pos_iou_thresh and neg_iou_thresh are ignored.
 pos_ratio (float, default is 0.5) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.
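The interplay of pos_iou_thresh, neg_iou_thresh, pos_ratio and num_sample can be illustrated with a simplified NumPy sketch. The function name `label_anchors_sketch` and the 1 / 0 / -1 label encoding are assumptions made here; the sketch only thresholds each anchor's best IoU and leaves out the random subsampling step and any other matching rules a real RPN target generator may apply.

```python
import numpy as np

def label_anchors_sketch(max_iou, pos_iou_thresh=0.7, neg_iou_thresh=0.3):
    """Label anchors from their best IoU with any ground-truth box.

    Returns 1 for positive, 0 for negative, and -1 for ignored anchors
    (this label encoding is an assumption made for the sketch).
    """
    max_iou = np.asarray(max_iou, dtype=float)
    labels = np.full(max_iou.shape, -1, dtype=int)  # ignored by default
    labels[max_iou < neg_iou_thresh] = 0
    labels[max_iou > pos_iou_thresh] = 1
    return labels

num_sample, pos_ratio = 256, 0.5
max_positives = int(pos_ratio * num_sample)  # upper bound on sampled positives

labels = label_anchors_sketch([0.1, 0.5, 0.8, 0.3, 0.7])
```

Anchors whose IoU falls between the two thresholds (inclusive in this sketch) stay at -1 and would not contribute to the loss.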

class
gluoncv.data.transforms.presets.rcnn.
FasterRCNNDefaultValTransform
(short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)) Default FasterRCNN validation transform.
Parameters:  short (int, default is 600) – Resize image shorter side to short.
 max_size (int, default is 1000) – Make sure image longer side is no larger than max_size.
 mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
 std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].

class
gluoncv.data.transforms.presets.rcnn.
MaskRCNNDefaultTrainTransform
(short=600, max_size=1000, net=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), box_norm=(1.0, 1.0, 1.0, 1.0), num_sample=256, pos_iou_thresh=0.7, neg_iou_thresh=0.3, pos_ratio=0.5, **kwargs) Default Mask RCNN training transform.
Parameters:  short (int, default is 600) – Resize image shorter side to short.
 max_size (int, default is 1000) – Make sure image longer side is no larger than max_size.
 net (mxnet.gluon.HybridBlock, optional) – The Mask RCNN network.
 Hint: If net is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase, since some workload is pushed to CPU workers instead of GPUs.
 mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
 std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].
 box_norm (array-like of size 4, default is (1., 1., 1., 1.)) – Std value by which encoded values are divided.
 num_sample (int, default is 256) – Number of samples for RPN targets.
 pos_iou_thresh (float, default is 0.7) – Anchors with IOU larger than pos_iou_thresh are regarded as positive samples.
 neg_iou_thresh (float, default is 0.3) – Anchors with IOU smaller than neg_iou_thresh are regarded as negative samples. Anchors with IOU in between pos_iou_thresh and neg_iou_thresh are ignored.
 pos_ratio (float, default is 0.5) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.

class
gluoncv.data.transforms.presets.rcnn.
MaskRCNNDefaultValTransform
(short=600, max_size=1000, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)) Default Mask RCNN validation transform.
Parameters:  short (int, default is 600) – Resize image shorter side to short.
 max_size (int, default is 1000) – Make sure image longer side is no larger than max_size.
 mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
 std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].
Transforms for YOLO series.

gluoncv.data.transforms.presets.yolo.
load_test
(filenames, short=416, max_size=1024, stride=32, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)) A utility function that loads images and transforms them into normalized tensors. This function supports a single filename or a list of filenames.
Parameters:  filenames (str or list of str) – Image filename(s) to be loaded.
 short (int, default=416) – Resize image short side to this short and keep aspect ratio. Note that yolo network
 max_size (int, optional) – Maximum longer side length to fit image. This is to limit the input image shape. Aspect ratio is intact because we support arbitrary input size in our YOLO implementation.
 stride (int, optional, default is 32) – The stride constraint due to precise alignment of the bounding box prediction module. Image’s width and height must be multiples of stride. Use stride = 1 to relax this constraint.
 mean (iterable of float) – Mean pixel values.
 std (iterable of float) – Standard deviations of pixel values.
Returns: A (1, 3, H, W) mxnet NDArray as input to network, and a numpy ndarray as original unnormalized color image for display. If multiple image names are supplied, return two lists. You can use zip() to collapse them.
Return type: (mxnet.NDArray, numpy.ndarray) or list of such tuple

class
gluoncv.data.transforms.presets.yolo.
YOLO3DefaultTrainTransform
(width, height, net=None, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), **kwargs) Default YOLO training transform, which includes many image augmentations.
Parameters:  width (int) – Image width.
 height (int) – Image height.
 net (mxnet.gluon.HybridBlock, optional) – The yolo network.
 Hint: If net is None, the transformation will not generate training targets. Otherwise it will generate training targets to accelerate the training phase, since some workload is pushed to CPU workers instead of GPUs.
 mean (array-like of size 3) – Mean pixel values to be subtracted from image tensor. Default is [0.485, 0.456, 0.406].
 std (array-like of size 3) – Standard deviation by which the image is divided. Default is [0.229, 0.224, 0.225].
 iou_thresh (float) – IOU overlap threshold for maximum matching, default is 0.5.
 box_norm (array-like of size 4, default is (0.1, 0.1, 0.2, 0.2)) – Std value by which encoded values are divided.