gluoncv.nn

Neural Network Components.

Hint

Not every component listed here is a HybridBlock, so some of them are not hybridizable. However, we try to make sure that components required during inference are hybridizable, so that the entire network can be exported and run in other languages.

For example, encoders are usually non-hybridizable but are only required during training. In contrast, decoders are mostly `HybridBlock`s.

Bounding Box

Blocks that apply bounding box related functions.

BBoxCornerToCenter Convert corner boxes to center boxes.
BBoxCenterToCorner Convert center boxes to corner boxes.
BBoxSplit Split bounding boxes into 4 columns.
BBoxArea Calculate the area of bounding boxes.

Coders

Encoders are used to encode training targets before we apply loss functions. Decoders are used to restore predicted values by inverting the operations done in encoders. They often come as a pair in order to make the results consistent.

NormalizedBoxCenterEncoder Encode bounding boxes training target with normalized center offsets.
NormalizedBoxCenterDecoder Decode bounding boxes training target with normalized center offsets.
MultiClassEncoder Encode classification training target given matching results.
MultiClassDecoder Decode classification results.
MultiPerClassDecoder Decode classification results.
SigmoidClassEncoder Encode class prediction labels for SigmoidCrossEntropy Loss.

Feature

Feature layers are components that either extract partial networks as feature extractor or extend them with new layers.

FeatureExtractor Feature extractor.
FeatureExpander Feature extractor with additional layers to append.

Matchers

Matchers are often used in object detection tasks to find matches between anchor boxes (very popular in object detection) and ground truths.

CompositeMatcher A Matcher that combines multiple strategies.
BipartiteMatcher A Matcher implementing bipartite matching strategy.
MaximumMatcher A Matcher implementing maximum matching strategy.

Predictors

Predictors are common neural network components specifically used to predict values. Depending on the purpose, a predictor may be convolutional or fully connected.

ConvPredictor Convolutional predictor.
FCPredictor Fully connected predictor.

Samplers

Samplers are often used after matching layers to determine positive/negative/ignored samples.

For example, a NaiveSampler simply returns all matched samples as positive, and all un-matched samples as negative.

This behavior is sometimes problematic because the training objective is not balanced. Please see OHEMSampler and QuotaSampler for more advanced sampling strategies.

NaiveSampler A naive sampler that takes all existing matching results.
OHEMSampler A sampler implementing Online Hard-negative mining.
QuotaSampler Sampler that handles limited quota for positive and negative samples.

API Reference

Bounding boxes operators

class gluoncv.nn.bbox.BBoxArea(axis=-1, fmt='corner', **kwargs)[source]

Calculate the area of bounding boxes.

Parameters:
  • fmt (str, default is corner) – Bounding box format, can be {‘center’, ‘corner’}. ‘center’: {x, y, width, height}; ‘corner’: {xmin, ymin, xmax, ymax}.
  • axis (int, default is -1) – Effective axis of the bounding box. Default is -1 (the last dimension).
Returns:A BxNx1 NDArray.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
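
As a rough illustration of the computation, the sketch below works on plain Python lists rather than NDArrays. The helper name bbox_area and the clamping of negative extents to zero are assumptions made for this example; it is not the gluoncv implementation.

```python
def bbox_area(boxes, fmt="corner"):
    """Return the area of each 4-element box in `boxes` (hypothetical helper)."""
    areas = []
    for box in boxes:
        if fmt == "corner":
            xmin, ymin, xmax, ymax = box
            width, height = xmax - xmin, ymax - ymin
        elif fmt == "center":
            _, _, width, height = box
        else:
            raise ValueError("fmt must be 'corner' or 'center'")
        # Assumption: degenerate boxes (negative extent) contribute zero area.
        areas.append(max(width, 0.0) * max(height, 0.0))
    return areas


corner_areas = bbox_area([[0.0, 0.0, 2.0, 3.0], [1.0, 1.0, 0.0, 5.0]])
center_areas = bbox_area([[1.0, 1.5, 2.0, 3.0]], fmt="center")
```
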
class gluoncv.nn.bbox.BBoxCenterToCorner(axis=-1, split=False)[source]

Convert center boxes to corner boxes. Corner boxes are encoded as (xmin, ymin, xmax, ymax). Center boxes are encoded as (center_x, center_y, width, height).

Parameters:
  • split (bool) – Whether to split boxes into individual elements after processing.
  • axis (int, default is -1) – Effective axis of the bounding box. Default is -1(the last dimension).
Returns:A BxNx4 NDArray if split is False, or 4 BxNx1 NDArrays if split is True.

hybrid_forward(F, x)[source]

Hybrid forward
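
The corner and center encodings are related by simple arithmetic. The sketch below shows both directions on a single box stored as a tuple; the function names corner_to_center and center_to_corner are hypothetical and only mirror what the BBoxCornerToCenter and BBoxCenterToCorner blocks are documented to do.

```python
def corner_to_center(box):
    """(xmin, ymin, xmax, ymax) -> (center_x, center_y, width, height)."""
    xmin, ymin, xmax, ymax = box
    width, height = xmax - xmin, ymax - ymin
    return (xmin + width / 2, ymin + height / 2, width, height)


def center_to_corner(box):
    """(center_x, center_y, width, height) -> (xmin, ymin, xmax, ymax)."""
    cx, cy, width, height = box
    half_w, half_h = width / 2, height / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


corner = (10.0, 20.0, 50.0, 60.0)
center = corner_to_center(corner)
restored = center_to_corner(center)  # round trip back to the corner format
```
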

class gluoncv.nn.bbox.BBoxCornerToCenter(axis=-1, split=False)[source]

Convert corner boxes to center boxes. Corner boxes are encoded as (xmin, ymin, xmax, ymax). Center boxes are encoded as (center_x, center_y, width, height).

Parameters:
  • split (bool) – Whether to split boxes into individual elements after processing.
  • axis (int, default is -1) – Effective axis of the bounding box. Default is -1(the last dimension).
Returns:A BxNx4 NDArray if split is False, or 4 BxNx1 NDArrays if split is True.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
class gluoncv.nn.bbox.BBoxSplit(axis, **kwargs)[source]

Split bounding boxes into 4 columns.

Parameters:axis (int, default is -1) – On which axis to split the bounding box. Default is -1(the last dimension).
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.

Encoder and Decoder functions. Encoders are used during training to assign training targets. Decoders are used during testing/validation to convert predictions back to normal boxes, etc.

class gluoncv.nn.coder.MultiClassDecoder(axis=-1, thresh=0.01)[source]

Decode classification results.

This decoder must work with MultiClassEncoder to reconstruct valid labels. The decoder expects its inputs to be scores computed from the logits, e.g. after Softmax.

Parameters:
  • axis (int) – Axis of class-wise results.
  • thresh (float) – Confidence threshold for the post-softmax scores. Scores less than thresh are marked with 0, corresponding cls_id is marked with invalid class id -1.
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
class gluoncv.nn.coder.MultiClassEncoder(ignore_label=-1)[source]

Encode classification training target given matching results.

This encoder assigns matched bounding boxes a training target of ground-truth label + 1 and assigns negative samples the label 0. Ignored samples are assigned ignore_label, whose default is -1.

Parameters:ignore_label (float) – Assigned to un-matched samples, which are neither positive nor negative during training and should be excluded from the loss function. Default is -1.
hybrid_forward(F, samples, matches, refs)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
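
The label assignment described for MultiClassEncoder can be sketched for a single image with plain lists. This is an illustration under assumptions: samples follows the sampler convention (1 positive, -1 negative, 0 ignore), matches holds the index of the matched ground truth, refs holds the ground-truth class labels, and the function name encode_classes is hypothetical.

```python
def encode_classes(samples, matches, refs, ignore_label=-1):
    """Assign a classification target to every anchor (hypothetical helper)."""
    targets = []
    for sample, match in zip(samples, matches):
        if sample > 0:
            # Positive: ground-truth label shifted by 1, because 0 is background.
            targets.append(refs[match] + 1)
        elif sample < 0:
            # Negative: background.
            targets.append(0)
        else:
            # Ignored: excluded from the loss later on.
            targets.append(ignore_label)
    return targets


samples = [1, -1, 0, 1]
matches = [0, -1, -1, 1]
refs = [2, 5]  # class labels of the two ground-truth boxes
targets = encode_classes(samples, matches, refs)
```
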
class gluoncv.nn.coder.MultiPerClassDecoder(num_class, axis=-1, thresh=0.01)[source]

Decode classification results.

This decoder must work with MultiClassEncoder to reconstruct valid labels. The decoder expects its inputs to be scores computed from the logits, e.g. after Softmax. This version differs from gluoncv.nn.coder.MultiClassDecoder as follows:

For each position (anchor box), each foreground class has its own result, rather than only the best class being kept. For example, for a 5-class prediction with background (6 classes in total), say (0.5, 0.1, 0.2, 0.1, 0.05, 0.05) as (bg, apple, orange, peach, grape, melon), MultiClassDecoder produces only one class id and score, that is (orange-0.2). MultiPerClassDecoder produces 5 results individually: (apple-0.1, orange-0.2, peach-0.1, grape-0.05, melon-0.05).

Parameters:
  • num_class (int) – Number of classes including background.
  • axis (int) – Axis of class-wise results.
  • thresh (float) – Confidence threshold for the post-softmax scores. Scores less than thresh are marked with 0, corresponding cls_id is marked with invalid class id -1.
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
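
The fruit example given for MultiPerClassDecoder can be sketched in plain Python to contrast the two decoders. The function names are hypothetical, and numbering class ids from 0 over the foreground classes is an assumption of this sketch, not a statement about the gluoncv output format.

```python
def decode_best(scores, thresh=0.01):
    """Keep only the best foreground class, as MultiClassDecoder is described to do."""
    foreground = scores[1:]  # index 0 is background
    cls_id = max(range(len(foreground)), key=lambda i: foreground[i])
    score = foreground[cls_id]
    # Scores below thresh become 0 with the invalid class id -1.
    return (-1, 0.0) if score < thresh else (cls_id, score)


def decode_per_class(scores, thresh=0.01):
    """Keep one (cls_id, score) result per foreground class."""
    results = []
    for cls_id, score in enumerate(scores[1:]):
        results.append((-1, 0.0) if score < thresh else (cls_id, score))
    return results


# (bg, apple, orange, peach, grape, melon)
scores = [0.5, 0.1, 0.2, 0.1, 0.05, 0.05]
best = decode_best(scores)                          # orange only
per_class = decode_per_class(scores, thresh=0.08)   # grape and melon fall below 0.08
```
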
class gluoncv.nn.coder.NormalizedBoxCenterDecoder(stds=(0.1, 0.1, 0.2, 0.2), means=(0.0, 0.0, 0.0, 0.0), convert_anchor=False)[source]

Decode bounding boxes training target with normalized center offsets. This decoder must cooperate with NormalizedBoxCenterEncoder of same stds in order to get properly reconstructed bounding boxes.

Returned bounding boxes use the corner format: (xmin, ymin, xmax, ymax).

Parameters:
  • stds (array-like of size 4) – Std value to be divided from encoded values, default is (0.1, 0.1, 0.2, 0.2).
  • means (array-like of size 4) – Mean value to be subtracted from encoded values, default is (0., 0., 0., 0.).
hybrid_forward(F, x, anchors)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
class gluoncv.nn.coder.NormalizedBoxCenterEncoder(stds=(0.1, 0.1, 0.2, 0.2), means=(0.0, 0.0, 0.0, 0.0))[source]

Encode bounding boxes training target with normalized center offsets.

Input bounding boxes use the corner format: (xmin, ymin, xmax, ymax).

Parameters:
  • stds (array-like of size 4) – Std value to be divided from encoded values, default is (0.1, 0.1, 0.2, 0.2).
  • means (array-like of size 4) – Mean value to be subtracted from encoded values, default is (0., 0., 0., 0.).
forward(samples, matches, anchors, refs)[source]

Forward
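
To illustrate why the encoder and decoder must share the same stds and means, the sketch below encodes one ground-truth box against one anchor and decodes it again. It assumes the common center-offset parameterization (offsets scaled by the anchor size, log-ratios for width and height). The function names are hypothetical, and passing the anchor to decode in center format is a choice made for this example.

```python
import math

STDS = (0.1, 0.1, 0.2, 0.2)
MEANS = (0.0, 0.0, 0.0, 0.0)


def to_center(box):
    """(xmin, ymin, xmax, ymax) -> (center_x, center_y, width, height)."""
    xmin, ymin, xmax, ymax = box
    w, h = xmax - xmin, ymax - ymin
    return xmin + w / 2, ymin + h / 2, w, h


def encode(gt_corner, anchor_corner, stds=STDS, means=MEANS):
    """Normalized center offsets of a ground-truth box relative to an anchor."""
    gx, gy, gw, gh = to_center(gt_corner)
    ax, ay, aw, ah = to_center(anchor_corner)
    raw = ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))
    # Subtract the mean, then divide by the std.
    return tuple((r - m) / s for r, m, s in zip(raw, means, stds))


def decode(pred, anchor_center, stds=STDS, means=MEANS):
    """Invert encode(); returns a corner-format box."""
    ax, ay, aw, ah = anchor_center
    t = [p * s + m for p, s, m in zip(pred, stds, means)]
    cx, cy = t[0] * aw + ax, t[1] * ah + ay
    half_w, half_h = math.exp(t[2]) * aw / 2, math.exp(t[3]) * ah / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


anchor = (0.0, 0.0, 10.0, 10.0)
gt = (1.0, 2.0, 9.0, 12.0)
target = encode(gt, anchor)
restored = decode(target, to_center(anchor))  # matches gt up to rounding
```

Decoding with different stds or means than those used for encoding would not reproduce the original box.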

class gluoncv.nn.coder.NormalizedPerClassBoxCenterEncoder(num_class, stds=(0.1, 0.1, 0.2, 0.2), means=(0.0, 0.0, 0.0, 0.0))[source]

Encode bounding boxes training target with normalized center offsets.

Input bounding boxes use the corner format: (xmin, ymin, xmax, ymax).

Parameters:
  • stds (array-like of size 4) – Std value to be divided from encoded values, default is (0.1, 0.1, 0.2, 0.2).
  • means (array-like of size 4) – Mean value to be subtracted from encoded values, default is (0., 0., 0., 0.).
forward(samples, matches, anchors, labels, refs)[source]

Encode BBox One entry per category

class gluoncv.nn.coder.SigmoidClassEncoder(**kwargs)[source]

Encode class prediction labels for SigmoidCrossEntropy Loss.

hybrid_forward(F, samples)[source]

Encode class prediction labels for SigmoidCrossEntropy Loss.

Parameters:samples (mxnet.nd.NDArray or mxnet.sym.Symbol) – Sampling results with shape (B, N), where 1 is positive, 0 is ignore and -1 is negative.
Returns:(target, mask). target is the output label with shape (B, N), where 1 is positive, 0 is negative and -1 is ignore. mask is the mask for the label: labels marked -1 (ignore) have mask 0, all other labels have mask 1.
Return type:(mxnet.nd.NDArray, mxnet.nd.NDArray)
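
The mapping from sampling results to (target, mask) can be sketched on a flat Python list. The function name sigmoid_class_encode is hypothetical; the value conventions are the ones stated above.

```python
def sigmoid_class_encode(samples):
    """Map sampler output (1 pos, 0 ignore, -1 neg) to (target, mask) lists."""
    targets, masks = [], []
    for s in samples:
        if s > 0:        # positive sample
            targets.append(1)
            masks.append(1)
        elif s < 0:      # negative sample
            targets.append(0)
            masks.append(1)
        else:            # ignored sample, masked out of the loss
            targets.append(-1)
            masks.append(0)
    return targets, masks


targets, masks = sigmoid_class_encode([1, 0, -1, 1])
```
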

Feature extraction blocks. Feature or Multi-Feature extraction is a key component in object detection. Class predictor/Box predictor are usually applied on feature layer(s). A good feature extraction mechanism is critical to performance.

class gluoncv.nn.feature.FeatureExpander(network, outputs, num_filters, use_1x1_transition=True, use_bn=True, reduce_ratio=1.0, min_depth=128, global_pool=False, pretrained=False, ctx=cpu(0), inputs=('data', ))[source]

Feature extractor with additional layers to append. This is very common in vision networks where extra branches are attached to the backbone network.

Parameters:
  • network (str or HybridBlock or Symbol) – Logic chain: load from gluon.model_zoo.vision if network is string. Convert to Symbol if network is HybridBlock.
  • outputs (str or list of str) – The name of layers to be extracted as features
  • num_filters (list of int) – Number of filters to be appended.
  • use_1x1_transition (bool) – Whether to use 1x1 convolution between attached layers. It is effective at reducing network size.
  • use_bn (bool) – Whether to use BatchNorm between attached layers.
  • reduce_ratio (float) – Channel reduction ratio of the transition layers.
  • min_depth (int) – Minimum channel number of transition layers.
  • global_pool (bool) – Whether to use global pooling as the last layer.
  • pretrained (bool) – Use pretrained parameters as in gluon.model_zoo if True.
  • ctx (Context) – The context, e.g. mxnet.cpu(), mxnet.gpu(0).
  • inputs (list of str) – Name of input variables to the network.
class gluoncv.nn.feature.FeatureExtractor(network, outputs, inputs=('data', ), pretrained=False, ctx=cpu(0))[source]

Feature extractor.

Parameters:
  • network (str or HybridBlock or Symbol) – Logic chain: load from gluon.model_zoo.vision if network is string. Convert to Symbol if network is HybridBlock
  • outputs (str or list of str) – The name of layers to be extracted as features
  • inputs (list of str or list of Symbol) – The inputs of network.
  • pretrained (bool) – Use pretrained parameters as in gluon.model_zoo
  • ctx (Context) – The context, e.g. mxnet.cpu(), mxnet.gpu(0).

Predictor for classification/box prediction.

class gluoncv.nn.predictor.ConvPredictor(num_channel, kernel=(3, 3), pad=(1, 1), stride=(1, 1), activation=None, use_bias=True, **kwargs)[source]

Convolutional predictor. Convolutional predictors are widely used in object detection. They can be used to predict classification scores (1 channel per class) or box predictions, which usually take 4 channels per box. The output is of shape (N, num_channel, H, W).

Parameters:
  • num_channel (int) – Number of conv channels.
  • kernel (tuple of (int, int), default (3, 3)) – Conv kernel size as (H, W).
  • pad (tuple of (int, int), default (1, 1)) – Conv padding size as (H, W).
  • stride (tuple of (int, int), default (1, 1)) – Conv stride size as (H, W).
  • activation (str, optional) – Optional activation after conv, e.g. ‘relu’.
  • use_bias (bool) – Use bias in convolution. It is not necessary if the convolution is followed by BatchNorm.
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
class gluoncv.nn.predictor.FCPredictor(num_output, activation=None, use_bias=True, **kwargs)[source]

Fully connected predictor. Fully connected predictor is used to ignore spatial information and will output fixed-sized predictions.

Parameters:
  • num_output (int) – Number of fully connected outputs.
  • activation (str, optional) – Optional activation after the fully connected layer, e.g. ‘relu’.
  • use_bias (bool) – Use bias in the fully connected layer. It is not necessary if the layer is followed by BatchNorm.
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.

Matchers for target assignment. Matchers are commonly used in object-detection for anchor-groundtruth matching. The matching process is a prerequisite to training target assignment. Matching is usually not required during testing.

class gluoncv.nn.matcher.BipartiteMatcher(threshold=1e-12, is_ascend=False, eps=1e-12)[source]

A Matcher implementing bipartite matching strategy.

Parameters:
  • threshold (float) – Threshold used to ignore invalid paddings
  • is_ascend (bool) – Whether sort matching order in ascending order. Default is False.
  • eps (float) – Epsilon for floating number comparison
hybrid_forward(F, x)[source]

BipartiteMatching

Parameters:x (NDArray or Symbol) – IOU overlaps with shape (N, M), batching is supported.
class gluoncv.nn.matcher.CompositeMatcher(matchers)[source]

A Matcher that combines multiple strategies.

Parameters:matchers (list of Matcher) – Matcher is a Block/HybridBlock used to match two groups of boxes
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
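
One plausible way to combine several matching results is to let later matchers fill in only the anchors that earlier matchers left unmatched. The sketch below illustrates that idea on plain lists; the fallback rule and the function name compose_matches are assumptions of this example and should be checked against the gluoncv source before relying on them.

```python
def compose_matches(match_lists):
    """Merge matcher outputs; earlier matchers take priority (assumed rule)."""
    result = list(match_lists[0])
    for matches in match_lists[1:]:
        # Keep an existing match; otherwise fall back to the next matcher.
        result = [r if r >= 0 else m for r, m in zip(result, matches)]
    return result


bipartite = [2, -1, -1, 0]   # -1 means not matched
maximum = [2, 1, -1, 1]
combined = compose_matches([bipartite, maximum])
```
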
class gluoncv.nn.matcher.MaximumMatcher(threshold)[source]

A Matcher implementing maximum matching strategy.

Parameters:threshold (float) – Matching threshold.
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
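
A minimal sketch of maximum matching on a single IOU matrix: every anchor (row) is matched to the ground truth (column) with the highest IOU, provided that IOU reaches the threshold. The function name maximum_match and the use of nested lists are assumptions made for illustration.

```python
def maximum_match(ious, threshold):
    """Return, per anchor, the index of the best ground truth or -1."""
    matches = []
    for row in ious:
        best = max(range(len(row)), key=lambda j: row[j])
        matches.append(best if row[best] >= threshold else -1)
    return matches


# 3 anchors (rows) x 2 ground-truth boxes (columns)
ious = [[0.1, 0.7],
        [0.3, 0.2],
        [0.6, 0.55]]
matches = maximum_match(ious, threshold=0.5)
```
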

Samplers for positive/negative/ignore sample selection. This module is used to select samples during training. Depending on the strategy, different numbers of samples are chosen as positive, negative or ignore (don’t care). The purpose is to alleviate unbalanced training targets in some circumstances. The output of a sampler is an NDArray of the same shape as the matching results. Note: 1 for positive, -1 for negative, 0 for ignore.

class gluoncv.nn.sampler.NaiveSampler[source]

A naive sampler that takes all existing matching results. There is no ignored sample in this case.

hybrid_forward(F, x)[source]

Hybrid forward
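
The naive strategy amounts to a one-line mapping from matching results to sampling results. In this sketch, the function name naive_sample is hypothetical and matches is a flat list in which -1 means not matched.

```python
def naive_sample(matches):
    """Matched anchors become positive (1), all others negative (-1)."""
    return [1 if m >= 0 else -1 for m in matches]


samples = naive_sample([1, -1, 0, -1])
```
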

class gluoncv.nn.sampler.OHEMSampler(ratio, min_samples=0, thresh=0.5)[source]

A sampler implementing Online Hard-negative mining. As described in paper https://arxiv.org/abs/1604.03540.

Parameters:
  • ratio (float) – Ratio of negative vs. positive samples. Values >= 1.0 are recommended.
  • min_samples (int, default 0) – Minimum samples to be selected regardless of positive samples. For example, if positive samples is 0, we sometimes still want some num_negative samples to be selected.
  • thresh (float, default 0.5) – IOU overlap threshold of selected negative samples. IOU must not exceed this threshold such that good matching anchors won’t be selected as negative samples.
forward(x, logits, ious)[source]

Forward
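
The idea behind hard-negative mining can be sketched as follows: keep all positives, then pick the negatives with the largest loss, up to ratio times the number of positives (or min_samples, whichever is larger). This is a simplified illustration, not the gluoncv implementation. It assumes a precomputed per-anchor loss and a per-anchor best IOU, and the function name ohem_sample is hypothetical.

```python
def ohem_sample(matches, losses, ious, ratio, min_samples=0, thresh=0.5):
    """Return 1 for positives, -1 for selected hard negatives, 0 otherwise."""
    num_pos = sum(1 for m in matches if m >= 0)
    num_neg = max(int(ratio * num_pos), min_samples)
    # Candidates: unmatched anchors whose IOU stays below thresh, so that
    # well-overlapping anchors are never selected as negatives.
    candidates = [i for i, m in enumerate(matches) if m < 0 and ious[i] < thresh]
    hardest = sorted(candidates, key=lambda i: losses[i], reverse=True)[:num_neg]
    samples = [1 if m >= 0 else 0 for m in matches]
    for i in hardest:
        samples[i] = -1
    return samples


matches = [0, -1, -1, -1, -1]
losses = [0.1, 0.9, 0.2, 0.8, 0.7]
ious = [0.8, 0.1, 0.2, 0.6, 0.3]
samples = ohem_sample(matches, losses, ious, ratio=2)
```

Here one positive with ratio=2 allows two negatives. Anchor 3 has a high loss but is skipped because its IOU of 0.6 exceeds thresh, so anchors 1 and 4 are selected.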

class gluoncv.nn.sampler.QuotaSampler(num_sample, pos_thresh, neg_thresh_high, neg_thresh_low=-inf, pos_ratio=0.5, neg_ratio=None, fill_negative=True)[source]

Sampler that handles limited quota for positive and negative samples.
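
The quota arithmetic behind the fill_negative behaviour documented for this class can be sketched as a small counting function. It only computes how many positives and negatives would be kept; the rounding rule and the function name quota_counts are assumptions of this example.

```python
def quota_counts(num_sample, pos_ratio, neg_ratio, avail_pos, avail_neg,
                 fill_negative=True):
    """Return (num_pos, num_neg) kept under the quota (hypothetical helper)."""
    if neg_ratio is None:
        neg_ratio = 1.0 - pos_ratio
    num_pos = min(avail_pos, int(round(pos_ratio * num_sample)))
    max_neg = int(round(neg_ratio * num_sample))
    if fill_negative:
        # Unused positive slots are handed over to negatives.
        max_neg = max(max_neg, num_sample - num_pos)
    num_neg = min(avail_neg, max_neg)
    return num_pos, num_neg


filled = quota_counts(100, 0.5, 0.5, avail_pos=10, avail_neg=10000)
not_filled = quota_counts(100, 0.5, 0.5, avail_pos=10, avail_neg=10000,
                          fill_negative=False)
```

With 10 available positives out of a quota of 50, filling yields 90 negatives, while disabling it leaves 50 negatives and 40 unused slots.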

Parameters:
  • num_sample (int) – Number of samples for RCNN targets.
  • pos_thresh (float) – Proposals whose IOU is larger than pos_thresh are regarded as positive samples.
  • neg_thresh_high (float) – Proposals whose IOU is smaller than neg_thresh_high and larger than neg_thresh_low are regarded as negative samples. Proposals with IOU between pos_thresh and neg_thresh_high are ignored.
  • neg_thresh_low (float, default is -inf) – See neg_thresh_high.
  • pos_ratio (float, default is 0.5) – Defines how many positive samples (pos_ratio * num_sample) are to be sampled.
  • neg_ratio (float or None) – Defines how many negative samples (neg_ratio * num_sample) are to be sampled. If None is provided, it equals 1 - pos_ratio.
  • fill_negative (bool) – If True, negative samples fill the gap caused by insufficient positive samples. For example, suppose num_sample is 100, pos_ratio and neg_ratio are both 0.5, and 10 positive and 10000 negative samples are available, which are typical values. All 10 positive samples are kept because 10 is smaller than 50 (100 * 0.5), and negative samples fill the remaining 40 slots. If fill_negative is False, the 40 slots are filled with 0 (ignore).
forward(matches, ious)[source]

Quota Sampler

Parameters:
  • matches (NDArray or Symbol) – Matching results, positive number for positive matching, -1 for not matched.
  • ious (NDArray or Symbol) – IOU overlaps with shape (N, M), batching is supported.
Returns:Sampling results with same shape as matches. 1 for positive, -1 for negative, 0 for ignore.
Return type:NDArray or Symbol
class gluoncv.nn.sampler.QuotaSamplerOp(num_sample, pos_thresh, neg_thresh_high=0.5, neg_thresh_low=-inf, pos_ratio=0.5, neg_ratio=None, fill_negative=True)[source]

Sampler that handles limited quota for positive and negative samples.

This is a custom Operator used inside HybridBlock.

Parameters:
  • num_sample (int) – Number of samples for RCNN targets.
  • pos_thresh (float) – Proposals whose IOU is larger than pos_thresh are regarded as positive samples.
  • neg_thresh_high (float, default is 0.5) – Proposals whose IOU is smaller than neg_thresh_high and larger than neg_thresh_low are regarded as negative samples. Proposals with IOU between pos_thresh and neg_thresh_high are ignored.
  • neg_thresh_low (float, default is -inf) – See neg_thresh_high.
  • pos_ratio (float, default is 0.5) – Defines how many positive samples (pos_ratio * num_sample) are to be sampled.
  • neg_ratio (float or None) – Defines how many negative samples (neg_ratio * num_sample) are to be sampled. If None is provided, it equals 1 - pos_ratio.
  • fill_negative (bool) – If True, negative samples fill the gap caused by insufficient positive samples. For example, suppose num_sample is 100, pos_ratio and neg_ratio are both 0.5, and 10 positive and 10000 negative samples are available, which are typical values. All 10 positive samples are kept because 10 is smaller than 50 (100 * 0.5), and negative samples fill the remaining 40 slots. If fill_negative is False, the 40 slots are filled with 0 (ignore).
backward(req, out_grad, in_data, out_data, in_grad, aux)[source]

Backward interface. Can override when creating new operators.

Parameters:
  • req (list of str) – How to assign to in_grad. Can be ‘null’, ‘write’, or ‘add’. You can optionally use self.assign(dst, req, src) to handle this.
  • out_grad, in_data, out_data, in_grad, aux – Input and output for backward. See the documentation for the corresponding arguments of Operator::Backward.
forward(is_train, req, in_data, out_data, aux)[source]

Quota Sampler

Parameters:
  • in_data (array-like of Symbol) – [matches, ious], see below.
  • matches (NDArray or Symbol) – Matching results, positive number for positive matching, -1 for not matched.
  • ious (NDArray or Symbol) – IOU overlaps with shape (N, M), batching is supported.
Returns:Sampling results with same shape as matches. 1 for positive, -1 for negative, 0 for ignore.
Return type:NDArray or Symbol
class gluoncv.nn.sampler.QuotaSamplerProp(num_sample, pos_thresh, neg_thresh_high=0.5, neg_thresh_low=0.0, pos_ratio=0.5, neg_ratio=None, fill_negative=True)[source]

Property for QuotaSamplerOp.

Parameters:
  • num_sample (int) – Number of samples for RCNN targets.
  • pos_thresh (float) – Proposals whose IOU is larger than pos_thresh are regarded as positive samples.
  • neg_thresh_high (float, default is 0.5) – Proposals whose IOU is smaller than neg_thresh_high and larger than neg_thresh_low are regarded as negative samples. Proposals with IOU between pos_thresh and neg_thresh_high are ignored.
  • neg_thresh_low (float, default is 0.0) – See neg_thresh_high.
  • pos_ratio (float, default is 0.5) – Defines how many positive samples (pos_ratio * num_sample) are to be sampled.
  • neg_ratio (float or None) – Defines how many negative samples (neg_ratio * num_sample) are to be sampled. If None is provided, it equals 1 - pos_ratio.
  • fill_negative (bool) – If True, negative samples fill the gap caused by insufficient positive samples. For example, suppose num_sample is 100, pos_ratio and neg_ratio are both 0.5, and 10 positive and 10000 negative samples are available, which are typical values. All 10 positive samples are kept because 10 is smaller than 50 (100 * 0.5), and negative samples fill the remaining 40 slots. If fill_negative is False, the 40 slots are filled with 0 (ignore).
create_operator(ctx, in_shapes, in_dtypes)[source]

Create an operator that carries out the real computation given the context, input shapes, and input data types.

infer_shape(in_shape)[source]

infer_shape interface. Can override when creating new operators.

Parameters:in_shape (list) – List of argument shapes in the same order as declared in list_arguments.
Returns:
  • in_shape (list) – List of argument shapes. Can be modified from in_shape.
  • out_shape (list) – List of output shapes calculated from in_shape, in the same order as declared in list_outputs.
  • aux_shape (Optional, list) – List of aux shapes calculated from in_shape, in the same order as declared in list_auxiliary_states.
infer_type(in_type)[source]

infer_type interface. Can override when creating new operators.

Parameters:in_type (list of np.dtype) – list of argument types in the same order as declared in list_arguments.
Returns:
  • in_type (list) – list of argument types. Can be modified from in_type.
  • out_type (list) – list of output types calculated from in_type, in the same order as declared in list_outputs.
  • aux_type (Optional, list) – list of aux types calculated from in_type, in the same order as declared in list_auxiliary_states.
list_arguments()[source]

list_arguments interface. Can override when creating new operators.

Returns:arguments – List of argument blob names.
Return type:list
list_outputs()[source]

list_outputs interface. Can override when creating new operators.

Returns:outputs – List of output blob names.
Return type:list