gluoncv.model_zoo¶
Gluon Vision Model Zoo
gluoncv.model_zoo.get_model¶
Returns a predefined GluonCV model by name.
Hint
This is the recommended method for getting a predefined model.
It supports loading models directly from the Gluon Model Zoo as well.
get_model 
Returns a predefined model by name 
Image Classification¶
CIFAR¶
ImageNet¶
We apply the dilation strategy to pretrained ResNet models (with output stride of 8). Please see gluoncv.model_zoo.SegBaseModel
for how to use it.
ResNetV1b 
Pretrained ResNetV1b model, which produces stride-8 feature maps at conv5. 
resnet18_v1b 
Constructs a ResNetV1b18 model. 
resnet34_v1b 
Constructs a ResNetV1b34 model. 
resnet50_v1b 
Constructs a ResNetV1b50 model. 
resnet101_v1b 
Constructs a ResNetV1b101 model. 
resnet152_v1b 
Constructs a ResNetV1b152 model. 
Object Detection¶
SSD¶
SSD 
Single-shot Object Detection Network: https://arxiv.org/abs/1512.02325. 
get_ssd 
Get SSD models. 
ssd_300_vgg16_atrous_voc 
SSD architecture with VGG16 atrous 300x300 base network for Pascal VOC. 
ssd_512_vgg16_atrous_voc 
SSD architecture with VGG16 atrous 512x512 base network. 
ssd_512_resnet50_v1_voc 
SSD architecture with ResNet v1 50 layers. 
ssd_512_resnet101_v2_voc 
SSD architecture with ResNet v2 101 layers. 
ssd_512_resnet152_v2_voc 
SSD architecture with ResNet v2 152 layers. 
VGGAtrousExtractor 
VGG atrous multi-layer feature extractor which produces multiple output feature maps. 
get_vgg_atrous_extractor 
Get VGG atrous feature extractor networks. 
vgg16_atrous_300 
Get VGG atrous 16 layer 300 in_size feature extractor networks. 
vgg16_atrous_512 
Get VGG atrous 16 layer 512 in_size feature extractor networks. 
Faster RCNN¶
FasterRCNN 
Faster RCNN network. 
get_faster_rcnn 
Utility function to return faster rcnn networks. 
faster_rcnn_resnet50_v2a_voc 
Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. 
faster_rcnn_resnet50_v2a_coco 
Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. 
Semantic Segmentation¶
FCN¶
FCN 
Fully Convolutional Networks for Semantic Segmentation 
get_fcn 
FCN model from the paper “Fully Convolutional Network for semantic segmentation” 
get_fcn_voc_resnet50 
FCN model with base network ResNet50 pretrained on Pascal VOC dataset from the paper “Fully Convolutional Network for semantic segmentation” 
get_fcn_voc_resnet101 
FCN model with base network ResNet101 pretrained on Pascal VOC dataset from the paper “Fully Convolutional Network for semantic segmentation” 
get_fcn_ade_resnet50 
FCN model with base network ResNet50 pretrained on ADE20K dataset from the paper “Fully Convolutional Network for semantic segmentation” 
PSPNet¶
PSPNet 
Pyramid Scene Parsing Network 
get_psp 
Pyramid Scene Parsing Network :param dataset: The dataset that model pretrained on. 
get_psp_ade_resnet50 
Pyramid Scene Parsing Network :param pretrained: Whether to load the pretrained weights for model. 
API Reference¶
Gluon Vision Model Zoo

class
gluoncv.model_zoo.
BasicBlockV1b
(inplanes, planes, strides=1, dilation=1, downsample=None, previous_dilation=1, norm_layer=None, **kwargs)[source]¶ ResNetV1b BasicBlockV1b

class
gluoncv.model_zoo.
BottleneckV1b
(inplanes, planes, strides=1, dilation=1, downsample=None, previous_dilation=1, norm_layer=None, last_gamma=False, **kwargs)[source]¶ ResNetV1b BottleneckV1b

class
gluoncv.model_zoo.
FCN
(nclass, backbone='resnet50', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, aux=True, ctx=cpu(0), **kwargs)[source]¶ Fully Convolutional Networks for Semantic Segmentation
Parameters:  nclass (int) – Number of categories for the training dataset.
 backbone (string) – Pretrained dilated backbone network type (default:’resnet50’; ‘resnet50’, ‘resnet101’ or ‘resnet152’).
 norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm).
Reference:
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” CVPR, 2015
Examples
>>> model = FCN(nclass=21, backbone='resnet50')
>>> print(model)

class
gluoncv.model_zoo.
FasterRCNN
(features, top_features, scales, ratios, classes, roi_mode, roi_size, stride=16, rpn_channel=1024, rpn_train_pre_nms=12000, rpn_train_post_nms=2000, rpn_test_pre_nms=6000, rpn_test_post_nms=300, num_sample=128, pos_iou_thresh=0.5, neg_iou_thresh_high=0.5, neg_iou_thresh_low=0.0, pos_ratio=0.25, **kwargs)[source]¶ Faster RCNN network.
Parameters:  features (gluon.HybridBlock) – Base feature extractor before feature pooling layer.
 top_features (gluon.HybridBlock) – Tail feature extractor after feature pooling layer.
 train_patterns (str) – Matching pattern for trainable parameters.
 scales (iterable of float) –
The areas of anchor boxes. We use the following form to compute the shapes of anchors:
\[width_{anchor} = size_{base} \times scale \times \sqrt{1 / ratio}\]
\[height_{anchor} = size_{base} \times scale \times \sqrt{ratio}\]
 ratios (iterable of float) – The aspect ratios of anchor boxes. We expect it to be a list or tuple.
 classes (iterable of str) – Names of categories, its length is num_class.
 roi_mode (str) – ROI pooling mode. Currently supports ‘pool’ and ‘align’.
 roi_size (tuple of int, length 2) – (height, width) of the ROI region.
 stride (int, default is 16) – Feature map stride with respect to original image. This is usually the ratio between original image size and feature map size.
 rpn_channel (int, default is 1024) – Channel number used in RPN convolutional layers.
 rpn_train_pre_nms (int, default is 12000) – Filter top proposals before NMS in training of RPN.
 rpn_train_post_nms (int, default is 2000) – Return top proposal results after NMS in training of RPN.
 rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.
 rpn_test_post_nms (int, default is 300) – Return top proposal results after NMS in testing of RPN.
 nms_thresh (float, default is 0.3) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.
 nms_topk (int, default is 400) – Apply NMS to top k detection results, use -1 to disable so that every detection result is used in NMS.
 post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on the COCO dataset, which has a maximum of 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.
 num_sample (int, default is 128) – Number of samples for RCNN targets.
 pos_iou_thresh (float, default is 0.5) – Proposals whose IOU is larger than pos_iou_thresh are regarded as positive samples.
 neg_iou_thresh_high (float, default is 0.5) – Proposals whose IOU is smaller than neg_iou_thresh_high and larger than neg_iou_thresh_low are regarded as negative samples. Proposals with IOU in between pos_iou_thresh and neg_iou_thresh_high are ignored.
 neg_iou_thresh_low (float, default is 0.0) – See neg_iou_thresh_high.
 pos_ratio (float, default is 0.25) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.

hybrid_forward
(F, x, gt_box=None)[source]¶ Forward FasterRCNN network.
The behavior during training and inference is different.
Parameters:  x (mxnet.nd.NDArray or mxnet.symbol) – The network input tensor.
 gt_box (type, only required during training) – The groundtruth bbox tensor with shape (1, N, 4).
Returns: During inference, returns final class id, confidence scores, bounding boxes.
Return type: (ids, scores, bboxes)

target_generator
¶ Returns stored target generator
Returns: The RCNN target generator Return type: mxnet.gluon.HybridBlock

class
gluoncv.model_zoo.
HybridBlock
(prefix=None, params=None)[source]¶ HybridBlock supports forwarding with both Symbol and NDArray.
HybridBlock is similar to Block, with a few differences:
import mxnet as mx
from mxnet.gluon import HybridBlock, nn

class Model(HybridBlock):
    def __init__(self, **kwargs):
        super(Model, self).__init__(**kwargs)
        # use name_scope to give child Blocks appropriate names.
        with self.name_scope():
            self.dense0 = nn.Dense(20)
            self.dense1 = nn.Dense(20)

    def hybrid_forward(self, F, x):
        x = F.relu(self.dense0(x))
        return F.relu(self.dense1(x))

model = Model()
model.initialize(ctx=mx.cpu(0))
model.hybridize()
model(mx.nd.zeros((10, 10), ctx=mx.cpu(0)))
Forward computation in HybridBlock must be static to work with Symbols, i.e. you cannot call NDArray.asnumpy(), NDArray.shape, NDArray.dtype, or NDArray indexing (x[i]) etc. on tensors. Also, you cannot use branching or loop logic that depends on non-constant expressions like random numbers or intermediate results, since they change the graph structure for each iteration.
Before activating with hybridize(), HybridBlock works just like a normal Block. After activation, HybridBlock will create a symbolic graph representing the forward computation and cache it. On subsequent forwards, the cached graph will be used instead of hybrid_forward().
Please see the references for a detailed tutorial.
References
Hybrid – Faster training and easy deployment

cast
(dtype)[source]¶ Cast this Block to use another data type.
Parameters: dtype (str or numpy.dtype) – The new data type.

export
(path, epoch=0)[source]¶ Export HybridBlock to json format that can be loaded by SymbolBlock.imports, mxnet.mod.Module or the C++ interface.
Note
When there is only one input, it will have name data. When there are multiple inputs, they will be named data0, data1, etc.
Parameters:

forward
(x, *args)[source]¶ Defines the forward computation. Arguments can be either
NDArray
orSymbol
.

hybrid_forward
(F, x, *args, **kwargs)[source]¶ Overrides to construct symbolic graph for this Block.
Parameters:  x (Symbol or NDArray) – The first input tensor.
 *args (list of Symbol or list of NDArray) – Additional input tensors.

hybridize
(active=True, **kwargs)[source]¶ Activates or deactivates HybridBlocks recursively. Has no effect on non-hybrid children.
Parameters:  active (bool, default True) – Whether to turn hybrid on or off.
 static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
 static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.


class
gluoncv.model_zoo.
PSPNet
(nclass, backbone='resnet50', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, aux=True, ctx=cpu(0), **kwargs)[source]¶ Pyramid Scene Parsing Network
Parameters:  nclass (int) – Number of categories for the training dataset.
 backbone (string) – Pretrained dilated backbone network type (default:’resnet50’; ‘resnet50’, ‘resnet101’ or ‘resnet152’).
 norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
 aux (bool) – Auxiliary loss.
Reference:
Zhao, Hengshuang, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. “Pyramid scene parsing network.” CVPR, 2017

class
gluoncv.model_zoo.
ResNetV1b
(block, layers, classes=1000, dilated=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, last_gamma=False, **kwargs)[source]¶ Pretrained ResNetV1b model, which produces stride-8 feature maps at conv5.
Parameters:  block (Block) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.
 layers (list of int) – Numbers of layers in each block
 classes (int, default 1000) – Number of classification classes.
 dilated (bool, default False) – Applying dilation strategy to pretrained ResNet yielding a stride-8 model, typically used in Semantic Segmentation.
 norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
Reference:
 He, Kaiming, et al. “Deep residual learning for image recognition.”
Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
 Yu, Fisher, and Vladlen Koltun. “Multi-scale context aggregation by dilated convolutions.”

class
gluoncv.model_zoo.
SE_BasicBlockV1
(channels, stride, downsample=False, in_channels=0, **kwargs)[source]¶ BasicBlock V1 from “Deep Residual Learning for Image Recognition” paper. This is used for SE_ResNet V1 for 18, 34 layers.
Parameters:

class
gluoncv.model_zoo.
SE_BasicBlockV2
(channels, stride, downsample=False, in_channels=0, **kwargs)[source]¶ BasicBlock V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for SE_ResNet V2 for 18, 34 layers.
Parameters:

class
gluoncv.model_zoo.
SE_BottleneckV1
(channels, stride, downsample=False, in_channels=0, **kwargs)[source]¶ Bottleneck V1 from “Deep Residual Learning for Image Recognition” paper. This is used for SE_ResNet V1 for 50, 101, 152 layers.
Parameters:

class
gluoncv.model_zoo.
SE_BottleneckV2
(channels, stride, downsample=False, in_channels=0, **kwargs)[source]¶ Bottleneck V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for SE_ResNet V2 for 50, 101, 152 layers.
Parameters:

class
gluoncv.model_zoo.
SE_ResNetV1
(block, layers, channels, classes=1000, thumbnail=False, **kwargs)[source]¶ SE_ResNet V1 model from “Deep Residual Learning for Image Recognition” paper.
Parameters:  block (HybridBlock) – Class for the residual block. Options are SE_BasicBlockV1, SE_BottleneckV1.
 layers (list of int) – Numbers of layers in each block
 channels (list of int) – Numbers of channels in each block. Length should be one larger than layers list.
 classes (int, default 1000) – Number of classification classes.
 thumbnail (bool, default False) – Enable thumbnail.

class
gluoncv.model_zoo.
SE_ResNetV2
(block, layers, channels, classes=1000, thumbnail=False, **kwargs)[source]¶ SE_ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters:  block (HybridBlock) – Class for the residual block. Options are SE_BasicBlockV1, SE_BottleneckV1.
 layers (list of int) – Numbers of layers in each block
 channels (list of int) – Numbers of channels in each block. Length should be one larger than layers list.
 classes (int, default 1000) – Number of classification classes.
 thumbnail (bool, default False) – Enable thumbnail.

class
gluoncv.model_zoo.
SSD
(network, base_size, features, num_filters, sizes, ratios, steps, classes, use_1x1_transition=True, use_bn=True, reduce_ratio=1.0, min_depth=128, global_pool=False, pretrained=False, stds=(0.1, 0.1, 0.2, 0.2), nms_thresh=0.45, nms_topk=400, post_nms=100, anchor_alloc_size=128, ctx=cpu(0), **kwargs)[source]¶ Single-shot Object Detection Network: https://arxiv.org/abs/1512.02325.
Parameters:  network (string or None) – Name of the base network, if None is used, will instantiate the base network from features directly instead of composing.
 base_size (int) – Base input size, it is specified so SSD can support dynamic input shapes.
 features (list of str or mxnet.gluon.HybridBlock) – Intermediate features to be extracted or a network with multioutput. If network is None, features is expected to be a multioutput network.
 num_filters (list of int) – Number of channels for the appended layers, ignored if network is None.
 sizes (iterable of float) – Sizes of anchor boxes, this should be a list of floats, in incremental order. The length of sizes must be len(layers) + 1. For example, a two-stage SSD model can have sizes = [30, 60, 90], and it converts to [30, 60] and [60, 90] for the two stages, respectively. For more details, please refer to the original paper.
 ratios (iterable of list) – Aspect ratios of anchors in each output layer. Its length must be equal to the number of SSD output layers.
 steps (list of int) – Step size of anchor boxes in each output layer.
 classes (iterable of str) – Names of all categories.
 use_1x1_transition (bool) – Whether to use 1x1 convolution as transition layer between attached layers, it is effective in reducing model capacity.
 use_bn (bool) – Whether to use BatchNorm layer after each attached convolutional layer.
 reduce_ratio (float) – Channel reduce ratio (0, 1) of the transition layer.
 min_depth (int) – Minimum channels for the transition layers.
 global_pool (bool) – Whether to attach a global average pooling layer as the last output layer.
 pretrained (bool) – Description of parameter pretrained.
 stds (tuple of float, default is (0.1, 0.1, 0.2, 0.2)) – Std values to be divided/multiplied to box encoded values.
 nms_thresh (float, default is 0.45) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.
 nms_topk (int, default is 400) – Apply NMS to top k detection results, use -1 to disable so that every detection result is used in NMS.
 post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on the COCO dataset, which has a maximum of 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.
 anchor_alloc_size (tuple of int, default is (128, 128)) – For advanced users. Define anchor_alloc_size to generate large enough anchor maps, which will later be saved in parameters. During inference, we support arbitrary input images by cropping the corresponding area of the anchor map. This allows us to export to symbol so we can run it in C++, Scala, etc.
 ctx (mx.Context) – Network context.

set_nms
(nms_thresh=0.45, nms_topk=400, post_nms=100)[source]¶ Set non-maximum suppression parameters.
Parameters:  nms_thresh (float, default is 0.45) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.
 nms_topk (int, default is 400) – Apply NMS to top k detection results, use -1 to disable so that every detection result is used in NMS.
 post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on the COCO dataset, which has a maximum of 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.
Returns: Return type: None

class
gluoncv.model_zoo.
SegBaseModel
(nclass, aux, backbone='resnet50', height=480, width=480, **kwargs)[source]¶ Base Model for Semantic Segmentation
Parameters:  backbone (string) – Pretrained dilated backbone network type (default:’resnet50’; ‘resnet50’, ‘resnet101’ or ‘resnet152’).
 norm_layer (Block) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

class
gluoncv.model_zoo.
VGGAtrousExtractor
(layers, filters, extras, batch_norm=False, **kwargs)[source]¶ VGG atrous multi-layer feature extractor which produces multiple output feature maps.
Parameters:  layers (list of int) – Number of layer for vgg base network.
 filters (list of int) – Number of convolution filters for each layer.
 extras (list of list) – Extra layers configurations.
 batch_norm (bool) – If True, will use BatchNorm layers.

gluoncv.model_zoo.
cifar_resnet110_v1
(**kwargs)[source]¶ ResNet110 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.
Parameters:

gluoncv.model_zoo.
cifar_resnet110_v2
(**kwargs)[source]¶ ResNet110 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.
Parameters:

gluoncv.model_zoo.
cifar_resnet20_v1
(**kwargs)[source]¶ ResNet20 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.
Parameters:

gluoncv.model_zoo.
cifar_resnet20_v2
(**kwargs)[source]¶ ResNet20 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.
Parameters:

gluoncv.model_zoo.
cifar_resnet56_v1
(**kwargs)[source]¶ ResNet56 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.
Parameters:

gluoncv.model_zoo.
cifar_resnet56_v2
(**kwargs)[source]¶ ResNet56 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.
Parameters:

gluoncv.model_zoo.
cifar_wideresnet16_10
(**kwargs)[source]¶ WideResNet-16-10 model for CIFAR10 from “Wide Residual Networks” paper.
Parameters:

gluoncv.model_zoo.
cifar_wideresnet28_10
(**kwargs)[source]¶ WideResNet-28-10 model for CIFAR10 from “Wide Residual Networks” paper.
Parameters:

gluoncv.model_zoo.
cifar_wideresnet40_8
(**kwargs)[source]¶ WideResNet-40-8 model for CIFAR10 from “Wide Residual Networks” paper.
Parameters:

gluoncv.model_zoo.
cpu
(device_id=0)[source]¶ Returns a CPU context.
This function is a shortcut for Context('cpu', device_id). For most operations, when no context is specified, the default context is cpu().
Examples
>>> with mx.cpu():
...     cpu_array = mx.nd.ones((2, 3))
>>> cpu_array.context
cpu(0)
>>> cpu_array = mx.nd.ones((2, 3), ctx=mx.cpu())
>>> cpu_array.context
cpu(0)
Parameters: device_id (int, optional) – The device id of the device. device_id is not needed for CPU. This is included to make interface compatible with GPU. Returns: context – The corresponding CPU context. Return type: Context

gluoncv.model_zoo.
faster_rcnn_resnet50_v2a_coco
(pretrained=False, pretrained_base=True, **kwargs)[source]¶ Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks”
Parameters:  pretrained (bool, optional, default is False) – Load pretrained weights.
 pretrained_base (bool, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
Examples
>>> model = get_faster_rcnn_resnet50_v2a_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.
faster_rcnn_resnet50_v2a_voc
(pretrained=False, pretrained_base=True, **kwargs)[source]¶ Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks”
Parameters:  pretrained (bool, optional, default is False) – Load pretrained weights.
 pretrained_base (bool, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
Examples
>>> model = get_faster_rcnn_resnet50_v2a_voc(pretrained=True)
>>> print(model)

gluoncv.model_zoo.
get_cifar_resnet
(version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]¶ ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters:  version (int) – Version of ResNet. Options are 1, 2.
 num_layers (int) – Numbers of layers. Needs to be an integer in the form of 6*n+2, e.g. 20, 56, 110, 164.
 pretrained (bool, default False) – Whether to load the pretrained weights for model.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

gluoncv.model_zoo.
get_cifar_wide_resnet
(num_layers, width_factor=1, drop_rate=0.0, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]¶ ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters:  num_layers (int) – Numbers of layers. Needs to be an integer in the form of 6*n+2, e.g. 20, 56, 110, 164.
 width_factor (int) – The width factor to apply to the number of channels from the original resnet.
 drop_rate (float) – The rate of dropout.
 pretrained (bool, default False) – Whether to load the pretrained weights for model.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

gluoncv.model_zoo.
get_faster_rcnn
(name, features, top_features, scales, ratios, classes, roi_mode, roi_size, dataset, stride=16, rpn_channel=1024, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]¶ Utility function to return faster rcnn networks.
Parameters:  name (str) – Model name.
 features (gluon.HybridBlock) – Base feature extractor before feature pooling layer.
 top_features (gluon.HybridBlock) – Tail feature extractor after feature pooling layer.
 scales (iterable of float) –
The areas of anchor boxes. We use the following form to compute the shapes of anchors:
\[width_{anchor} = size_{base} \times scale \times \sqrt{1 / ratio}\]
\[height_{anchor} = size_{base} \times scale \times \sqrt{ratio}\]
 ratios (iterable of float) – The aspect ratios of anchor boxes. We expect it to be a list or tuple.
 classes (iterable of str) – Names of categories, its length is num_class.
 roi_mode (str) – ROI pooling mode. Currently supports ‘pool’ and ‘align’.
 roi_size (tuple of int, length 2) – (height, width) of the ROI region.
 dataset (str) – The name of dataset.
 stride (int, default is 16) – Feature map stride with respect to original image. This is usually the ratio between original image size and feature map size.
 rpn_channel (int, default is 1024) – Channel number used in RPN convolutional layers.
 pretrained (bool, optional, default is False) – Load pretrained weights.
 ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).
 root (str) – Model weights storing path.
Returns: The FasterRCNN network.
Return type:

gluoncv.model_zoo.
get_fcn
(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]¶ FCN model from the paper “Fully Convolutional Network for semantic segmentation”
Parameters:  dataset (str, default pascal_voc) – The dataset that model pretrained on. (pascal_voc, ade20k)
 pretrained (bool, default False) – Whether to load the pretrained weights for model.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
Examples
>>> model = get_fcn(dataset='pascal_voc', backbone='resnet50', pretrained=False)
>>> print(model)

gluoncv.model_zoo.
get_fcn_ade_resnet50
(**kwargs)[source]¶ FCN model with base network ResNet50 pretrained on ADE20K dataset from the paper “Fully Convolutional Network for semantic segmentation”
Parameters:
Examples
>>> model = get_fcn_ade_resnet50(pretrained=True)
>>> print(model)

gluoncv.model_zoo.
get_fcn_voc_resnet101
(**kwargs)[source]¶ FCN model with base network ResNet101 pretrained on Pascal VOC dataset from the paper “Fully Convolutional Network for semantic segmentation”
Parameters:
Examples
>>> model = get_fcn_voc_resnet101(pretrained=True)
>>> print(model)

gluoncv.model_zoo.
get_fcn_voc_resnet50
(**kwargs)[source]¶ FCN model with base network ResNet50 pretrained on Pascal VOC dataset from the paper “Fully Convolutional Network for semantic segmentation”
Parameters:
Examples
>>> model = get_fcn_voc_resnet50(pretrained=True)
>>> print(model)

gluoncv.model_zoo.
get_model
(name, **kwargs)[source]¶ Returns a predefined model by name
Parameters:  name (str) – Name of the model.
 pretrained (bool) – Whether to load the pretrained weights for model.
 classes (int) – Number of classes for the output layer.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
Returns: The model.
Return type:

gluoncv.model_zoo.
get_psp
(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]¶ Pyramid Scene Parsing Network
Parameters:  dataset (str, default pascal_voc) – The dataset that model pretrained on. (pascal_voc, ade20k)
 pretrained (bool, default False) – Whether to load the pretrained weights for model.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
Examples
>>> model = get_psp(dataset='pascal_voc', backbone='resnet50', pretrained=False)
>>> print(model)

gluoncv.model_zoo.
get_psp_ade_resnet50
(**kwargs)[source]¶ Pyramid Scene Parsing Network
Parameters:  pretrained (bool, default False) – Whether to load the pretrained weights for model.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
Examples
>>> model = get_psp_ade_resnet50(pretrained=True)
>>> print(model)

gluoncv.model_zoo.
get_se_resnet
(version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]¶ SE_ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. SE_ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters:  version (int) – Version of ResNet. Options are 1, 2.
 num_layers (int) – Numbers of layers. Options are 18, 34, 50, 101, 152.
 pretrained (bool, default False) – Whether to load the pretrained weights for model.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

gluoncv.model_zoo.
get_ssd
(name, base_size, features, filters, sizes, ratios, steps, classes, dataset, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]¶ Get SSD models.
Parameters:  name (str or None) – Model name, if None is used, you must specify features to be a HybridBlock.
 base_size (int) – Base image size for training, this is fixed once training is assigned. A fixed base size still allows you to have variable input size during test.
 features (iterable of str or HybridBlock) – List of network internal output names, in order to specify which layers are used for predicting bbox values. If name is None, features must be a HybridBlock which generates multiple outputs for prediction.
 filters (iterable of float or None) – List of convolution layer channels which is going to be appended to the base network feature extractor. If name is None, this is ignored.
 sizes (iterable of float) – Sizes of anchor boxes, this should be a list of floats, in incremental order. The length of sizes must be len(layers) + 1. For example, a two-stage SSD model can have sizes = [30, 60, 90], and it converts to [30, 60] and [60, 90] for the two stages, respectively. For more details, please refer to the original paper.
 ratios (iterable of list) – Aspect ratios of anchors in each output layer. Its length must be equal to the number of SSD output layers.
 steps (list of int) – Step size of anchor boxes in each output layer.
 classes (iterable of str) – Names of categories.
 dataset (str) – Name of dataset. This is used to identify model name because models trained on different datasets are going to be very different.
 pretrained (bool, optional, default is False) – Load pretrained weights.
 pretrained_base (bool, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.
 ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).
 root (str) – Model weights storing path.
Returns: A SSD detection network.
Return type:

gluoncv.model_zoo.
get_vgg_atrous_extractor
(num_layers, im_size, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]¶ Get VGG atrous feature extractor networks.
Parameters: Returns: The returned network.
Return type:

gluoncv.model_zoo.resnet101_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]¶ Constructs a ResNetV1b101 model.
Parameters:  pretrained (bool, default False) – Whether to load the pretrained weights for the model.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.
 norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

gluoncv.model_zoo.resnet152_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]¶ Constructs a ResNetV1b152 model.
Parameters:  pretrained (bool, default False) – Whether to load the pretrained weights for the model.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.
 norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

gluoncv.model_zoo.resnet18_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]¶ Constructs a ResNetV1b18 model.
Parameters:  pretrained (bool, default False) – Whether to load the pretrained weights for the model.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.
 norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
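The effect of the dilated flag on output stride can be shown with back-of-the-envelope arithmetic. The stage layout below is an assumption about standard ResNet downsampling, and output_stride is a hypothetical helper, not a gluoncv API:

```python
# A standard ResNet halves resolution five times (stem plus four
# stages with stride-2 entries), giving output stride 32. The dilated
# v1b variant runs the last two stages at stride 1 (using dilation
# instead), yielding the stride-8 feature maps mentioned above.
def output_stride(dilated=False):
    stage_strides = [2, 2, 2, 1, 1] if dilated else [2, 2, 2, 2, 2]
    stride = 1
    for s in stage_strides:
        stride *= s
    return stride

print(output_stride(False), output_stride(True))  # 32 8
```

This stride-8 output is what makes the v1b variants suitable backbones for segmentation heads such as gluoncv.model_zoo.SegBaseModel.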

gluoncv.model_zoo.resnet34_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]¶ Constructs a ResNetV1b34 model.
Parameters:  pretrained (bool, default False) – Whether to load the pretrained weights for the model.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.
 norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

gluoncv.model_zoo.resnet50_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]¶ Constructs a ResNetV1b50 model.
Parameters:  pretrained (bool, default False) – Whether to load the pretrained weights for the model.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
 ctx (Context, default CPU) – The context in which to load the pretrained weights.
 dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.
 norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

gluoncv.model_zoo.resnet50_v2a(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]¶ Constructs a ResNet50v2a model.
Please ignore this if you are looking for a model for other tasks.
Parameters:  pretrained (bool, default False) – Whether to load the pretrained weights for the model.
 root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
 ctx (Context, default mx.cpu(0)) – The context in which to load the pretrained weights.
 norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm).

gluoncv.model_zoo.se_resnet101_v1(**kwargs)[source]¶ SE_ResNet101 V1 model from “Deep Residual Learning for Image Recognition” paper.
Parameters:

gluoncv.model_zoo.se_resnet101_v2(**kwargs)[source]¶ SE_ResNet101 V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters:

gluoncv.model_zoo.se_resnet152_v1(**kwargs)[source]¶ SE_ResNet152 V1 model from “Deep Residual Learning for Image Recognition” paper.
Parameters:

gluoncv.model_zoo.se_resnet152_v2(**kwargs)[source]¶ SE_ResNet152 V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters:

gluoncv.model_zoo.se_resnet18_v1(**kwargs)[source]¶ SE_ResNet18 V1 model from “Deep Residual Learning for Image Recognition” paper.
Parameters:

gluoncv.model_zoo.se_resnet18_v2(**kwargs)[source]¶ SE_ResNet18 V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters:

gluoncv.model_zoo.se_resnet34_v1(**kwargs)[source]¶ SE_ResNet34 V1 model from “Deep Residual Learning for Image Recognition” paper.
Parameters:

gluoncv.model_zoo.se_resnet34_v2(**kwargs)[source]¶ SE_ResNet34 V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters:

gluoncv.model_zoo.se_resnet50_v1(**kwargs)[source]¶ SE_ResNet50 V1 model from “Deep Residual Learning for Image Recognition” paper.
Parameters:

gluoncv.model_zoo.se_resnet50_v2(**kwargs)[source]¶ SE_ResNet50 V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters:
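What distinguishes the se_resnet* variants from plain resnet* is the Squeeze-and-Excitation block: globally pooled per-channel statistics are turned into sigmoid gates that rescale each channel. A toy sketch of just the gating step, which is an assumption for illustration and not gluoncv's implementation:

```python
import math

# Toy illustration of SE channel recalibration: `gates` are the raw
# excitation outputs (one per channel); the sigmoid maps each into
# (0, 1), and each channel activation is scaled by its gate, so a
# channel can be attenuated but never sign-flipped or amplified.
def se_rescale(channel_activations, gates):
    return [a / (1.0 + math.exp(-g))
            for a, g in zip(channel_activations, gates)]
```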

gluoncv.model_zoo.ssd_300_vgg16_atrous_coco(pretrained=False, pretrained_base=True, **kwargs)[source]¶ SSD architecture with VGG16 atrous 300x300 base network for COCO.
Parameters:
Returns: An SSD detection network.
Return type: HybridBlock

gluoncv.model_zoo.ssd_300_vgg16_atrous_voc(pretrained=False, pretrained_base=True, **kwargs)[source]¶ SSD architecture with VGG16 atrous 300x300 base network for Pascal VOC.
Parameters:
Returns: An SSD detection network.
Return type: HybridBlock

gluoncv.model_zoo.ssd_512_mobilenet1_0_coco(pretrained=False, pretrained_base=True, **kwargs)[source]¶ SSD architecture with mobilenet1.0 base networks for COCO.
Parameters:
Returns: An SSD detection network.
Return type: HybridBlock

gluoncv.model_zoo.ssd_512_mobilenet1_0_voc(pretrained=False, pretrained_base=True, **kwargs)[source]¶ SSD architecture with mobilenet1.0 base networks for Pascal VOC.
Parameters:
Returns: An SSD detection network.
Return type: HybridBlock

gluoncv.model_zoo.ssd_512_resnet101_v2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]¶ SSD architecture with ResNet v2 101 layers for Pascal VOC.
Parameters:
Returns: An SSD detection network.
Return type: HybridBlock

gluoncv.model_zoo.ssd_512_resnet152_v2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]¶ SSD architecture with ResNet v2 152 layers for Pascal VOC.
Parameters:
Returns: An SSD detection network.
Return type: HybridBlock

gluoncv.model_zoo.ssd_512_resnet18_v1_voc(pretrained=False, pretrained_base=True, **kwargs)[source]¶ SSD architecture with ResNet v1 18 layers for Pascal VOC.
Parameters:
Returns: An SSD detection network.
Return type: HybridBlock

gluoncv.model_zoo.ssd_512_resnet50_v1_coco(pretrained=False, pretrained_base=True, **kwargs)[source]¶ SSD architecture with ResNet v1 50 layers for COCO.
Parameters:
Returns: An SSD detection network.
Return type: HybridBlock

gluoncv.model_zoo.ssd_512_resnet50_v1_voc(pretrained=False, pretrained_base=True, **kwargs)[source]¶ SSD architecture with ResNet v1 50 layers for Pascal VOC.
Parameters:
Returns: An SSD detection network.
Return type: HybridBlock

gluoncv.model_zoo.ssd_512_vgg16_atrous_coco(pretrained=False, pretrained_base=True, **kwargs)[source]¶ SSD architecture with VGG16 atrous 512x512 base network for COCO.
Parameters:
Returns: An SSD detection network.
Return type: HybridBlock

gluoncv.model_zoo.ssd_512_vgg16_atrous_voc(pretrained=False, pretrained_base=True, **kwargs)[source]¶ SSD architecture with VGG16 atrous 512x512 base network for Pascal VOC.
Parameters:
Returns: An SSD detection network.
Return type: HybridBlock
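The model names listed above follow an architecture, input size, base network, dataset convention (e.g. ssd_512_resnet50_v1_voc), which is the string get_model keys on. A hypothetical parser, for illustration only and not a gluoncv API:

```python
# Hypothetical helper (not part of gluoncv) splitting an SSD model
# name into its conventional fields: architecture, input size,
# base network, and training dataset.
def parse_ssd_name(name):
    parts = name.split('_')
    return {
        'arch': parts[0],
        'input_size': int(parts[1]),
        'base': '_'.join(parts[2:-1]),
        'dataset': parts[-1],
    }

print(parse_ssd_name('ssd_512_resnet50_v1_voc'))
# → {'arch': 'ssd', 'input_size': 512, 'base': 'resnet50_v1', 'dataset': 'voc'}
```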