gluoncv.nn
Neural Network Components.
Hint
Not every component listed here is a HybridBlock, which means some of them are not hybridizable. However, we try to make sure that components required during inference are hybridizable, so that the entire network can be exported and run in other languages.
For example, encoders are usually non-hybridizable but are only required during training. In contrast, decoders are mostly `HybridBlock`s.
Bounding Box
Blocks that apply bounding box related functions.
BBoxCornerToCenter 
Convert corner boxes to center boxes. 
BBoxCenterToCorner 
Convert center boxes to corner boxes. 
BBoxSplit 
Split bounding boxes into 4 columns. 
BBoxArea 
Calculate the area of bounding boxes. 
Coders
Encoders are used to encode training targets before we apply loss functions. Decoders are used to restore predicted values by inverting the operations done in encoders. They often come as a pair in order to make the results consistent.
NormalizedBoxCenterEncoder 
Encode bounding boxes training target with normalized center offsets. 
NormalizedBoxCenterDecoder 
Decode bounding boxes training target with normalized center offsets. 
MultiClassEncoder 
Encode classification training target given matching results. 
MultiClassDecoder 
Decode classification results. 
MultiPerClassDecoder 
Decode classification results. 
SigmoidClassEncoder 
Encode class prediction labels for SigmoidCrossEntropy Loss. 
Feature
Feature layers are components that either extract part of a network as a feature extractor or extend it with new layers.
FeatureExtractor 
Feature extractor. 
FeatureExpander 
Feature extractor with additional layers to append. 
Matchers
Matchers are often used in object detection tasks, where the goal is to find the matches between anchor boxes (very popular in object detection) and ground truths.
CompositeMatcher 
A Matcher that combines multiple strategies. 
BipartiteMatcher 
A Matcher implementing bipartite matching strategy. 
MaximumMatcher 
A Matcher implementing maximum matching strategy. 
Predictors
Predictors are common neural network components that are used specifically to predict values. Depending on the purpose, a predictor may be convolutional or fully connected.
ConvPredictor 
Convolutional predictor. 
FCPredictor 
Fully connected predictor. 
Samplers
Samplers are often used after matching layers to determine positive, negative, and ignored samples.
For example, a NaiveSampler simply returns all matched samples as positive, and all unmatched samples as negative.
This behavior can be fragile because the training objective is not balanced. Please see OHEMSampler and QuotaSampler for more advanced sampling strategies.
NaiveSampler 
A naive sampler that takes all existing matching results. 
OHEMSampler 
A sampler implementing Online Hard-negative mining. 
QuotaSampler 
Sampler that handles limited quota for positive and negative samples. 
API Reference
Bounding boxes operators

class gluoncv.nn.bbox.BBoxArea(axis=-1, fmt='corner', **kwargs)
Calculate the area of bounding boxes.
Return type: A BxNx1 NDArray
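As a rough illustration of what an area computation looks like for the two box formats, here is a minimal pure-Python sketch for a single box. The function name `bbox_area` is hypothetical, and clamping negative extents to zero is an assumption rather than documented behavior of `BBoxArea`.

```python
def bbox_area(box, fmt="corner"):
    """Area of one box given in 'corner' or 'center' format (illustrative only)."""
    if fmt == "corner":
        xmin, ymin, xmax, ymax = box
        width, height = xmax - xmin, ymax - ymin
    elif fmt == "center":
        _, _, width, height = box
    else:
        raise ValueError("fmt must be 'corner' or 'center'")
    # Assumption: degenerate boxes contribute zero area instead of a negative one.
    return max(width, 0.0) * max(height, 0.0)


corner_area = bbox_area((0.0, 0.0, 2.0, 3.0), fmt="corner")
center_area = bbox_area((1.0, 1.5, 2.0, 3.0), fmt="center")
```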

class gluoncv.nn.bbox.BBoxBatchIOU(axis=-1, fmt='corner', offset=0, eps=1e-15, **kwargs)
Batch Bounding Box IOU.
Parameters:  axis (int) – On which axis is the length-4 bounding box dimension.
 fmt (str) – BBox encoding format, can be ‘corner’ or ‘center’. ‘corner’: (xmin, ymin, xmax, ymax); ‘center’: (center_x, center_y, width, height).
 offset (float, default is 0) – Offset is used if +1 is desired for computing width and height, otherwise use 0.
 eps (float, default is 1e-15) – Very small number to avoid division by 0.
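To make the roles of `offset` and `eps` concrete, the following is a minimal pure-Python sketch of pairwise IOU for corner-format boxes. The helper names `bbox_iou` and `batch_iou` are hypothetical, and the sketch works on plain tuples rather than NDArrays; it is not the library's implementation.

```python
def bbox_iou(a, b, offset=0.0, eps=1e-15):
    """IOU of two corner-format boxes (xmin, ymin, xmax, ymax)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Width and height of the intersection, clamped at zero when boxes are disjoint.
    inter_w = max(min(ax1, bx1) - max(ax0, bx0) + offset, 0.0)
    inter_h = max(min(ay1, by1) - max(ay0, by0) + offset, 0.0)
    inter = inter_w * inter_h
    area_a = (ax1 - ax0 + offset) * (ay1 - ay0 + offset)
    area_b = (bx1 - bx0 + offset) * (by1 - by0 + offset)
    # eps keeps the division defined when both boxes have zero area.
    return inter / (area_a + area_b - inter + eps)


def batch_iou(boxes_a, boxes_b, offset=0.0, eps=1e-15):
    """Pairwise IOU matrix with shape (len(boxes_a), len(boxes_b))."""
    return [[bbox_iou(a, b, offset, eps) for b in boxes_b] for a in boxes_a]


ious = batch_iou([(0.0, 0.0, 2.0, 2.0)], [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)])
```

Setting `offset=1` corresponds to the pixel-inclusive convention in which a box spanning pixels 0 through 9 has width 10.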

class gluoncv.nn.bbox.BBoxCenterToCorner(axis=-1, split=False)
Convert center boxes to corner boxes. Corner boxes are encoded as (xmin, ymin, xmax, ymax). Center boxes are encoded as (center_x, center_y, width, height).
Return type: A BxNx4 NDArray if split is False, or 4 BxNx1 NDArray if split is True.

class gluoncv.nn.bbox.BBoxClipToImage(**kwargs)
Clip bounding box coordinates to image boundaries. If multiple images are supplied and padded, the accurate image shapes must be given as additional inputs.

class gluoncv.nn.bbox.BBoxCornerToCenter(axis=-1, split=False)
Convert corner boxes to center boxes. Corner boxes are encoded as (xmin, ymin, xmax, ymax). Center boxes are encoded as (center_x, center_y, width, height).
Return type: A BxNx4 NDArray if split is False, or 4 BxNx1 NDArray if split is True.
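The two encodings described above are related by simple arithmetic. The sketch below shows the conversion for a single box using plain tuples; the function names are hypothetical and the blocks themselves operate on batched NDArrays.

```python
def corner_to_center(box):
    """(xmin, ymin, xmax, ymax) -> (center_x, center_y, width, height)."""
    xmin, ymin, xmax, ymax = box
    width = xmax - xmin
    height = ymax - ymin
    return (xmin + width / 2, ymin + height / 2, width, height)


def center_to_corner(box):
    """(center_x, center_y, width, height) -> (xmin, ymin, xmax, ymax)."""
    center_x, center_y, width, height = box
    xmin = center_x - width / 2
    ymin = center_y - height / 2
    return (xmin, ymin, xmin + width, ymin + height)


center_box = corner_to_center((0.0, 0.0, 4.0, 2.0))
corner_box = center_to_corner(center_box)
```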

class gluoncv.nn.bbox.BBoxSplit(axis, squeeze_axis=False, **kwargs)
Split bounding boxes into 4 columns.
Parameters:  axis (int, default is -1) – On which axis to split the bounding box. Default is -1 (the last dimension).
 squeeze_axis (boolean, default is False) – If true, removes the axis with length 1 from the shapes of the output arrays. Note that setting squeeze_axis to true removes an axis with length 1 only along the axis on which it is split. Also, squeeze_axis can be set to true only if input.shape[axis] == num_outputs.
Encoder and Decoder functions. Encoders are used during training, which assign training targets. Decoders are used during testing/validation, which convert predictions back to normal boxes, etc.

class gluoncv.nn.coder.MultiClassDecoder(axis=-1, thresh=0.01)
Decode classification results.
This decoder must work with MultiClassEncoder to reconstruct valid labels. The decoder expects results that come after the logits, e.g. after Softmax.
Parameters:

class gluoncv.nn.coder.MultiClassEncoder(ignore_label=-1)
Encode classification training target given matching results.
This encoder assigns matched bounding boxes the training target groundtruth label + 1, and negative samples the label 0. Ignored samples are assigned ignore_label, whose default is -1.
Parameters: ignore_label (float) – Assigned to unmatched samples, which are neither positive nor negative during training and should be excluded from the loss function. Default is -1.
hybrid_forward(F, samples, matches, refs)
HybridBlock, handles multiple batches correctly.
Parameters:  samples ((B, N), value +1 (positive), -1 (negative), 0 (ignore)) –
 matches ((B, N), value range [0, M)) –
 refs ((B, M), value range [0, num_fg_class), excluding background) –
Returns: targets
Return type: (B, N), value range [0, num_fg_class + 1), including background
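The label assignment described above can be sketched for a single image (no batch dimension) in pure Python. The function name `encode_classes` is hypothetical; the real block works on batched NDArrays or Symbols.

```python
def encode_classes(samples, matches, refs, ignore_label=-1):
    """Assign class targets to N anchors of one image.

    samples: +1 (positive), -1 (negative), 0 (ignore) per anchor.
    matches: index of the matched ground truth per anchor.
    refs: foreground class id per ground truth (background excluded).
    """
    targets = []
    for sample, match in zip(samples, matches):
        if sample > 0:
            # Shift foreground ids by one so that 0 is reserved for background.
            targets.append(refs[match] + 1)
        elif sample < 0:
            targets.append(0)
        else:
            targets.append(ignore_label)
    return targets


targets = encode_classes(samples=[1, -1, 0, 1], matches=[1, 0, 0, 0], refs=[2, 0])
```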


class gluoncv.nn.coder.MultiPerClassDecoder(num_class, axis=-1, thresh=0.01)
Decode classification results.
This decoder must work with MultiClassEncoder to reconstruct valid labels. The decoder expects results that come after the logits, e.g. after Softmax. This version differs from gluoncv.nn.coder.MultiClassDecoder in the following way: for each position (anchor box), each foreground class can have its own result, rather than being forced to be the best one. For example, for a 5-class prediction with background (6 classes in total), say (0.5, 0.1, 0.2, 0.1, 0.05, 0.05) as (bg, apple, orange, peach, grape, melon), MultiClassDecoder produces only one class id and score, that is (orange: 0.2). MultiPerClassDecoder produces 5 results individually: (apple: 0.1, orange: 0.2, peach: 0.1, grape: 0.05, melon: 0.05).
Parameters:
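The difference between the two decoders can be illustrated with the fruit example above. This is a minimal sketch for a single anchor; the function names are hypothetical, and marking results at or below `thresh` with class id -1 and score 0 is an assumption about how suppressed entries are represented.

```python
def decode_best(scores, thresh=0.01):
    """Keep only the best foreground class (index 0 of scores is background)."""
    foreground = scores[1:]
    cls_id = max(range(len(foreground)), key=foreground.__getitem__)
    score = foreground[cls_id]
    return (cls_id, score) if score > thresh else (-1, 0.0)


def decode_per_class(scores, thresh=0.01):
    """Keep one (class id, score) pair for every foreground class."""
    return [(cls_id, score) if score > thresh else (-1, 0.0)
            for cls_id, score in enumerate(scores[1:])]


# (bg, apple, orange, peach, grape, melon)
scores = [0.5, 0.1, 0.2, 0.1, 0.05, 0.05]
best = decode_best(scores)            # orange is foreground class 1
per_class = decode_per_class(scores)  # five separate results
```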

class gluoncv.nn.coder.NormalizedBoxCenterDecoder(stds=(0.1, 0.1, 0.2, 0.2), means=(0.0, 0.0, 0.0, 0.0), convert_anchor=False, clip=None)
Decode bounding boxes training target with normalized center offsets. This decoder must cooperate with a NormalizedBoxCenterEncoder that uses the same stds in order to get properly reconstructed bounding boxes.
Returned bounding boxes are using corner type: x_{min}, y_{min}, x_{max}, y_{max}.
Parameters:  stds (array-like of size 4) – Std value to be divided from encoded values, default is (0.1, 0.1, 0.2, 0.2).
 means (array-like of size 4) – Mean value to be subtracted from encoded values, default is (0., 0., 0., 0.).
 clip (float, default is None) – If given, bounding box target will be clipped to this value.

class gluoncv.nn.coder.NormalizedBoxCenterEncoder(stds=(0.1, 0.1, 0.2, 0.2), means=(0.0, 0.0, 0.0, 0.0))
Encode bounding boxes training target with normalized center offsets.
Input bounding boxes are using corner type: x_{min}, y_{min}, x_{max}, y_{max}.
Parameters:  stds (array-like of size 4) – Std value to be divided from encoded values, default is (0.1, 0.1, 0.2, 0.2).
 means (array-like of size 4) – Mean value to be subtracted from encoded values, default is (0., 0., 0., 0.).

forward(samples, matches, anchors, refs)
Not a HybridBlock due to the use of matches.shape.
Parameters:  samples ((B, N) value +1 (positive), -1 (negative), 0 (ignore)) –
 matches ((B, N) value range [0, M)) –
 anchors ((B, N, 4) encoded in corner) –
 refs ((B, M, 4) encoded in corner) –
Returns:  targets ((B, N, 4) transform anchors to refs picked according to matches)
 masks ((B, N, 4) only positive anchors have targets)
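To show how an encoder and decoder pair with matching stds and means invert each other, here is a minimal pure-Python sketch for one anchor and one ground-truth box. It assumes the common center-offset parameterization (center shift scaled by anchor size, log of the size ratio); the function names are hypothetical and this is not taken from the library source.

```python
import math


def _corner_to_center(box):
    xmin, ymin, xmax, ymax = box
    width, height = xmax - xmin, ymax - ymin
    return xmin + width / 2, ymin + height / 2, width, height


def encode_box(anchor, ref, stds=(0.1, 0.1, 0.2, 0.2), means=(0.0, 0.0, 0.0, 0.0)):
    """Encode a corner-format ground truth relative to a corner-format anchor."""
    a_cx, a_cy, a_w, a_h = _corner_to_center(anchor)
    g_cx, g_cy, g_w, g_h = _corner_to_center(ref)
    raw = ((g_cx - a_cx) / a_w, (g_cy - a_cy) / a_h,
           math.log(g_w / a_w), math.log(g_h / a_h))
    # Subtract the mean, then divide by the std, as the parameter docs describe.
    return tuple((r - m) / s for r, m, s in zip(raw, means, stds))


def decode_box(anchor, target, stds=(0.1, 0.1, 0.2, 0.2), means=(0.0, 0.0, 0.0, 0.0)):
    """Invert encode_box and return a corner-format box."""
    a_cx, a_cy, a_w, a_h = _corner_to_center(anchor)
    dx, dy, dw, dh = (t * s + m for t, s, m in zip(target, stds, means))
    cx, cy = dx * a_w + a_cx, dy * a_h + a_cy
    half_w, half_h = math.exp(dw) * a_w / 2, math.exp(dh) * a_h / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


anchor = (0.0, 0.0, 10.0, 10.0)
ref = (1.0, 2.0, 9.0, 12.0)
target = encode_box(anchor, ref)
restored = decode_box(anchor, target)
```

Decoding with different stds or means than were used for encoding would not reproduce `ref`, which is why the pair must be configured consistently.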

class gluoncv.nn.coder.NormalizedPerClassBoxCenterEncoder(num_class, stds=(0.1, 0.1, 0.2, 0.2), means=(0.0, 0.0, 0.0, 0.0))
Encode bounding boxes training target with normalized center offsets.
Input bounding boxes are using corner type: x_{min}, y_{min}, x_{max}, y_{max}.
Parameters:  stds (array-like of size 4) – Std value to be divided from encoded values, default is (0.1, 0.1, 0.2, 0.2).
 means (array-like of size 4) – Mean value to be subtracted from encoded values, default is (0., 0., 0., 0.).

forward(samples, matches, anchors, labels, refs)
Encode bounding boxes, with one entry per category.
Parameters:  samples ((B, N) value +1 (positive), -1 (negative), 0 (ignore)) –
 matches ((B, N) value range [0, M)) –
 anchors ((B, N, 4) encoded in corner) –
 labels ((B, N) value range [0, self._num_class), excluding background) –
 refs ((B, M, 4) encoded in corner) –
Returns:  targets ((C, B, N, 4) transform anchors to refs picked according to matches)
 masks ((C, B, N, 4) only positive anchors of the correct class have targets)

class gluoncv.nn.coder.SigmoidClassEncoder(**kwargs)
Encode class prediction labels for SigmoidCrossEntropy Loss.

hybrid_forward(F, samples)
Encode class prediction labels for SigmoidCrossEntropy Loss.
Parameters: samples (mxnet.nd.NDArray or mxnet.sym.Symbol) – Sampling results with shape (B, N), 1: pos, 0: ignore, -1: negative.
Returns: (target, mask). target is the output label with shape (B, N), 1: pos, 0: negative, -1: ignore. mask is the mask for label: -1 (ignore) labels have mask 0, otherwise mask is 1.
Return type: (mxnet.nd.NDArray, mxnet.nd.NDArray)
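The remapping between sampling results and sigmoid targets can be sketched as follows for a flat list of samples. The function name is hypothetical and the sketch uses Python lists instead of NDArrays.

```python
def sigmoid_class_encode(samples):
    """Map sampler outputs (1: pos, 0: ignore, -1: neg) to (targets, masks)."""
    targets, masks = [], []
    for sample in samples:
        if sample > 0:
            target, mask = 1.0, 1.0   # positive: label 1, contributes to the loss
        elif sample < 0:
            target, mask = 0.0, 1.0   # negative: label 0, contributes to the loss
        else:
            target, mask = -1.0, 0.0  # ignored: label -1, masked out of the loss
        targets.append(target)
        masks.append(mask)
    return targets, masks


targets, masks = sigmoid_class_encode([1, 0, -1])
```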

Feature extraction blocks. Feature or multi-feature extraction is a key component in object detection. Class predictors and box predictors are usually applied on feature layer(s). A good feature extraction mechanism is critical to performance.

class gluoncv.nn.feature.FeatureExpander(network, outputs, num_filters, use_1x1_transition=True, use_bn=True, reduce_ratio=1.0, min_depth=128, global_pool=False, pretrained=False, ctx=cpu(0), inputs=('data', ), **kwargs)
Feature extractor with additional layers to append. This is very common in vision networks where extra branches are attached to the backbone network.
Parameters:  network (str or HybridBlock or Symbol) – Logic chain: load from gluoncv.model_zoo if network is string. Convert to Symbol if network is HybridBlock.
 outputs (str or list of str) – The name of layers to be extracted as features
 num_filters (list of int) – Number of filters to be appended.
 use_1x1_transition (bool) – Whether to use 1x1 convolution between attached layers. It is effective at reducing network size.
 use_bn (bool) – Whether to use BatchNorm between attached layers.
 reduce_ratio (float) – Channel reduction ratio of the transition layers.
 min_depth (int) – Minimum channel number of transition layers.
 global_pool (bool) – Whether to use global pooling as the last layer.
 pretrained (bool) – Use pretrained parameters as in gluon.model_zoo if True.
 ctx (Context) – The context, e.g. mxnet.cpu(), mxnet.gpu(0).
 inputs (list of str) – Name of input variables to the network.

class gluoncv.nn.feature.FeatureExtractor(network, outputs, inputs=('data', ), pretrained=False, ctx=cpu(0), **kwargs)
Feature extractor.
Parameters:  network (str or HybridBlock or Symbol) – Logic chain: load from gluoncv.model_zoo if network is string. Convert to Symbol if network is HybridBlock
 outputs (str or list of str) – The name of layers to be extracted as features
 inputs (list of str or list of Symbol) – The inputs of network.
 pretrained (bool) – Use pretrained parameters as in gluon.model_zoo
 ctx (Context) – The context, e.g. mxnet.cpu(), mxnet.gpu(0).
Predictor for classification/box prediction.

class gluoncv.nn.predictor.ConvPredictor(num_channel, kernel=(3, 3), pad=(1, 1), stride=(1, 1), activation=None, use_bias=True, **kwargs)
Convolutional predictor. Convolutional predictors are widely used in object detection. They can be used to predict classification scores (1 channel per class) or box coordinates, which usually take 4 channels per box. The output is of shape (N, num_channel, H, W).
Parameters:  num_channel (int) – Number of conv channels.
 kernel (tuple of (int, int), default (3, 3)) – Conv kernel size as (H, W).
 pad (tuple of (int, int), default (1, 1)) – Conv padding size as (H, W).
 stride (tuple of (int, int), default (1, 1)) – Conv stride size as (H, W).
 activation (str, optional) – Optional activation after conv, e.g. ‘relu’.
 use_bias (bool) – Use bias in convolution. It is not necessary if BatchNorm is followed.

class gluoncv.nn.predictor.FCPredictor(num_output, activation=None, use_bias=True, **kwargs)
Fully connected predictor. A fully connected predictor ignores spatial information and outputs fixed-size predictions.
Parameters:
Matchers for target assignment. Matchers are commonly used in object detection for anchor-groundtruth matching. The matching process is a prerequisite to training target assignment. Matching is usually not required during testing.

class gluoncv.nn.matcher.BipartiteMatcher(threshold=1e-12, is_ascend=False, eps=1e-12)
A Matcher implementing bipartite matching strategy.
Parameters:
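One common way to realize bipartite matching is a greedy pass that repeatedly takes the highest remaining score and pairs its row and column, so that each ground truth is used at most once. The sketch below illustrates that idea on a small score matrix; it assumes descending order (`is_ascend=False`) and a greedy strategy, and it is not claimed to reproduce the operator that `BipartiteMatcher` uses internally.

```python
def bipartite_match(scores, threshold=1e-12):
    """Greedy one-to-one matching on an (N, M) score matrix.

    Returns, for every row, the matched column index or -1 if unmatched.
    """
    matches = [-1] * len(scores)
    used_cols = set()
    # Visit all (score, row, col) entries from the highest score downwards.
    candidates = sorted(
        ((score, row, col)
         for row, row_scores in enumerate(scores)
         for col, score in enumerate(row_scores)),
        key=lambda item: item[0],
        reverse=True,
    )
    for score, row, col in candidates:
        if score <= threshold:
            break  # remaining scores are too low to count as matches
        if matches[row] == -1 and col not in used_cols:
            matches[row] = col
            used_cols.add(col)
    return matches


scores = [[0.9, 0.8], [0.85, 0.1], [0.2, 0.3]]
matches = bipartite_match(scores)
```

Row 1 stays unmatched even though its best score is 0.85, because column 0 was already claimed by row 0.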

class gluoncv.nn.matcher.CompositeMatcher(matchers)
A Matcher that combines multiple strategies.
Parameters: matchers (list of Matcher) – Matcher is a Block/HybridBlock used to match two groups of boxes

class gluoncv.nn.matcher.MaximumMatcher(threshold)
A Matcher implementing maximum matching strategy.
Parameters: threshold (float) – Matching threshold.
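A maximum matching strategy can be sketched as: for each anchor (row), take the ground truth with the highest score and accept it only if the score reaches the threshold. The function name is hypothetical, and the choice of `>=` for the comparison is an assumption.

```python
def maximum_match(scores, threshold):
    """Match each row of an (N, M) score matrix to its best column, or -1."""
    matches = []
    for row_scores in scores:
        best_col = max(range(len(row_scores)), key=row_scores.__getitem__)
        # Assumption: a score exactly equal to the threshold still counts as a match.
        matches.append(best_col if row_scores[best_col] >= threshold else -1)
    return matches


matches = maximum_match([[0.9, 0.8], [0.4, 0.1], [0.2, 0.6]], threshold=0.5)
```

Unlike a bipartite strategy, several anchors may be matched to the same ground truth here.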
Samplers for positive/negative/ignore sample selections. This module is used to select samples during training. Depending on the strategy, we would like to choose different numbers of samples as positive, negative or ignore (don’t care). The purpose is to alleviate unbalanced training targets in some circumstances. The output of a sampler is an NDArray of the same shape as the matching results. Note: 1 for positive, -1 for negative, 0 for ignore.

class gluoncv.nn.sampler.NaiveSampler
A naive sampler that takes all existing matching results. There is no ignored sample in this case.
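The naive strategy amounts to a sign mapping over the matching results. A minimal sketch on a flat list, assuming unmatched entries are marked with -1 (the function name is hypothetical):

```python
def naive_sample(matches):
    """Matched entries (index >= 0) become +1, unmatched entries become -1."""
    return [1 if match >= 0 else -1 for match in matches]


samples = naive_sample([0, -1, 2])
```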

class gluoncv.nn.sampler.OHEMSampler(ratio, min_samples=0, thresh=0.5)
A sampler implementing Online Hard-negative mining. As described in paper https://arxiv.org/abs/1604.03540.
Parameters:  ratio (float) – Ratio of negative vs. positive samples. Values >= 1.0 are recommended.
 min_samples (int, default 0) – Minimum samples to be selected regardless of positive samples. For example, if positive samples is 0, we sometimes still want some num_negative samples to be selected.
 thresh (float, default 0.5) – IOU overlap threshold of selected negative samples. IOU must not exceed this threshold such that good matching anchors won’t be selected as negative samples.

class gluoncv.nn.sampler.QuotaSampler(num_sample, pos_thresh, neg_thresh_high, neg_thresh_low=-inf, pos_ratio=0.5, neg_ratio=None, fill_negative=True)
Sampler that handles limited quota for positive and negative samples.
Parameters:  num_sample (int, default is 128) – Number of samples for RCNN targets.
 pos_iou_thresh (float, default is 0.5) – Proposals whose IOU is larger than pos_iou_thresh are regarded as positive samples.
 neg_iou_thresh_high (float, default is 0.5) – Proposals whose IOU is smaller than neg_iou_thresh_high and larger than neg_iou_thresh_low are regarded as negative samples. Proposals with IOU in between pos_iou_thresh and neg_iou_thresh are ignored.
 neg_iou_thresh_low (float, default is 0.0) – See neg_iou_thresh_high.
 pos_ratio (float, default is 0.25) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.
 neg_ratio (float or None) – neg_ratio defines how many negative samples (neg_ratio * num_sample) are to be sampled. If None is provided, it equals 1 - pos_ratio.
 fill_negative (bool) – If True, negative samples will fill the gap caused by insufficient positive samples. For example, suppose num_sample is 100, pos_ratio and neg_ratio are both 0.5, and the available positive and negative samples are 10 and 10000, which are typical values. The output keeps all 10 positive samples; since that is fewer than 50 (100 * 0.5), negative samples fill the remaining 40 slots. If fill_negative == False, the 40 slots are filled with -1 (ignore).
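The quota arithmetic in the fill_negative example can be checked with a small sketch that only counts how many positives and negatives would be kept. The function name `quota_counts` is hypothetical, and rounding the quotas with `round` is an assumption; the sketch does not select actual samples.

```python
def quota_counts(num_sample, num_pos_avail, num_neg_avail,
                 pos_ratio=0.5, neg_ratio=None, fill_negative=True):
    """Return (num_pos, num_neg) kept under the positive/negative quotas."""
    if neg_ratio is None:
        neg_ratio = 1.0 - pos_ratio
    max_pos = int(round(pos_ratio * num_sample))
    max_neg = int(round(neg_ratio * num_sample))
    num_pos = min(max_pos, num_pos_avail)
    if fill_negative:
        # Unused positive slots are handed over to the negative quota.
        max_neg = max(num_sample - num_pos, max_neg)
    num_neg = min(max_neg, num_neg_avail)
    return num_pos, num_neg


filled = quota_counts(100, num_pos_avail=10, num_neg_avail=10000)
not_filled = quota_counts(100, num_pos_avail=10, num_neg_avail=10000, fill_negative=False)
```

With filling enabled, the 10 positives are joined by 50 + 40 = 90 negatives; without it, only the 50 quota negatives are kept and the remaining 40 slots are left unused.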

forward(matches, ious)
Quota Sampler
Parameters:  matches (NDArray or Symbol) – Matching results, positive number for positive matching, -1 for not matched.
 ious (NDArray or Symbol) – IOU overlaps with shape (N, M), batching is supported.
Returns: Sampling results with same shape as matches. 1 for positive, -1 for negative, 0 for ignore.
Return type: NDArray or Symbol

class gluoncv.nn.sampler.QuotaSamplerOp(num_sample, pos_thresh, neg_thresh_high=0.5, neg_thresh_low=-inf, pos_ratio=0.5, neg_ratio=None, fill_negative=True)
Sampler that handles limited quota for positive and negative samples.
This is a custom Operator used inside HybridBlock.
Parameters:  num_sample (int, default is 128) – Number of samples for RCNN targets.
 pos_iou_thresh (float, default is 0.5) – Proposals whose IOU is larger than pos_iou_thresh are regarded as positive samples.
 neg_iou_thresh_high (float, default is 0.5) – Proposals whose IOU is smaller than neg_iou_thresh_high and larger than neg_iou_thresh_low are regarded as negative samples. Proposals with IOU in between pos_iou_thresh and neg_iou_thresh are ignored.
 neg_iou_thresh_low (float, default is 0.0) – See neg_iou_thresh_high.
 pos_ratio (float, default is 0.25) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.
 neg_ratio (float or None) – neg_ratio defines how many negative samples (neg_ratio * num_sample) are to be sampled. If None is provided, it equals 1 - pos_ratio.
 fill_negative (bool) – If True, negative samples will fill the gap caused by insufficient positive samples. For example, suppose num_sample is 100, pos_ratio and neg_ratio are both 0.5, and the available positive and negative samples are 10 and 10000, which are typical values. The output keeps all 10 positive samples; since that is fewer than 50 (100 * 0.5), negative samples fill the remaining 40 slots. If fill_negative == False, the 40 slots are filled with -1 (ignore).

backward(req, out_grad, in_data, out_data, in_grad, aux)
Backward interface. Can override when creating new operators.
Parameters:  req (list of str) – How to assign to in_grad. Can be ‘null’, ‘write’, or ‘add’. You can optionally use self.assign(dst, req, src) to handle this.
 out_grad, in_data, out_data, in_grad, aux – Input and output for backward. See document for corresponding arguments of Operator::Backward

forward(is_train, req, in_data, out_data, aux)
Quota Sampler
Parameters:  in_data (array-like of Symbol) – [matches, ious], see below.
 matches (NDArray or Symbol) – Matching results, positive number for positive matching, -1 for not matched.
 ious (NDArray or Symbol) – IOU overlaps with shape (N, M), batching is supported.
Returns: Sampling results with same shape as matches. 1 for positive, -1 for negative, 0 for ignore.
Return type: NDArray or Symbol

class gluoncv.nn.sampler.QuotaSamplerProp(num_sample, pos_thresh, neg_thresh_high=0.5, neg_thresh_low=0.0, pos_ratio=0.5, neg_ratio=None, fill_negative=True)
Property for QuotaSamplerOp.
Parameters:  num_sample (int, default is 128) – Number of samples for RCNN targets.
 pos_iou_thresh (float, default is 0.5) – Proposals whose IOU is larger than pos_iou_thresh are regarded as positive samples.
 neg_iou_thresh_high (float, default is 0.5) – Proposals whose IOU is smaller than neg_iou_thresh_high and larger than neg_iou_thresh_low are regarded as negative samples. Proposals with IOU in between pos_iou_thresh and neg_iou_thresh are ignored.
 neg_iou_thresh_low (float, default is 0.0) – See neg_iou_thresh_high.
 pos_ratio (float, default is 0.25) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.
 neg_ratio (float or None) – neg_ratio defines how many negative samples (neg_ratio * num_sample) are to be sampled. If None is provided, it equals 1 - pos_ratio.
 fill_negative (bool) – If True, negative samples will fill the gap caused by insufficient positive samples. For example, suppose num_sample is 100, pos_ratio and neg_ratio are both 0.5, and the available positive and negative samples are 10 and 10000, which are typical values. The output keeps all 10 positive samples; since that is fewer than 50 (100 * 0.5), negative samples fill the remaining 40 slots. If fill_negative == False, the 40 slots are filled with -1 (ignore).

create_operator(ctx, in_shapes, in_dtypes)
Create an operator that carries out the real computation given the context, input shapes, and input data types.

infer_shape(in_shape)
infer_shape interface. Can override when creating new operators.
Parameters: in_shape (list) – List of argument shapes in the same order as declared in list_arguments.
Returns:  in_shape (list) – List of argument shapes. Can be modified from in_shape.
 out_shape (list) – List of output shapes calculated from in_shape, in the same order as declared in list_outputs.
 aux_shape (Optional, list) – List of aux shapes calculated from in_shape, in the same order as declared in list_auxiliary_states.

infer_type(in_type)
infer_type interface. Can override when creating new operators.
Parameters: in_type (list of np.dtype) – List of argument types in the same order as declared in list_arguments.
Returns:  in_type (list) – List of argument types. Can be modified from in_type.
 out_type (list) – List of output types calculated from in_type, in the same order as declared in list_outputs.
 aux_type (Optional, list) – List of aux types calculated from in_type, in the same order as declared in list_auxiliary_states.