Table Of Contents

gluoncv.model_zoo

GluonCV Model Zoo

gluoncv.model_zoo.get_model

Returns a pre-defined GluonCV model by name.

Hint

This is the recommended method for getting a pre-defined model.

It supports loading models directly from the Gluon Model Zoo as well.

get_model

Returns a pre-defined model by name

Image Classification

CIFAR

get_cifar_resnet

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper; ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

cifar_resnet20_v1

ResNet-20 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

cifar_resnet56_v1

ResNet-56 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

cifar_resnet110_v1

ResNet-110 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

cifar_resnet20_v2

ResNet-20 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

cifar_resnet56_v2

ResNet-56 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

cifar_resnet110_v2

ResNet-110 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

get_cifar_wide_resnet

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper; ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

cifar_wideresnet16_10

WideResNet-16-10 model for CIFAR10 from “Wide Residual Networks” paper.

cifar_wideresnet28_10

WideResNet-28-10 model for CIFAR10 from “Wide Residual Networks” paper.

cifar_wideresnet40_8

WideResNet-40-8 model for CIFAR10 from “Wide Residual Networks” paper.

ImageNet

We apply a dilation strategy to pre-trained ResNet models (yielding a stride of 8). Please see gluoncv.model_zoo.SegBaseModel for how to use it.

ResNetV1b

Pre-trained ResNetV1b model, which produces stride-8 feature maps at conv5.

resnet18_v1b

Constructs a ResNetV1b-18 model.

resnet34_v1b

Constructs a ResNetV1b-34 model.

resnet50_v1b

Constructs a ResNetV1b-50 model.

resnet101_v1b

Constructs a ResNetV1b-101 model.

resnet152_v1b

Constructs a ResNetV1b-152 model.

Object Detection

SSD

SSD

Single-shot Object Detection Network: https://arxiv.org/abs/1512.02325.

get_ssd

Get SSD models.

ssd_300_vgg16_atrous_voc

SSD architecture with VGG16 atrous 300x300 base network for Pascal VOC.

ssd_300_vgg16_atrous_coco

SSD architecture with VGG16 atrous 300x300 base network for COCO.

ssd_300_vgg16_atrous_custom

SSD architecture with VGG16 atrous 300x300 base network for custom datasets.

ssd_512_vgg16_atrous_voc

SSD architecture with VGG16 atrous 512x512 base network.

ssd_512_vgg16_atrous_coco

SSD architecture with VGG16 atrous layers for COCO.

ssd_512_vgg16_atrous_custom

SSD architecture with VGG16 atrous 512x512 base network for custom datasets.

ssd_512_resnet50_v1_voc

SSD architecture with ResNet v1 50 layers.

ssd_512_resnet50_v1_coco

SSD architecture with ResNet v1 50 layers for COCO.

ssd_512_resnet50_v1_custom

SSD architecture with ResNet50 v1 512 base network for custom dataset.

ssd_512_resnet101_v2_voc

SSD architecture with ResNet v2 101 layers.

ssd_512_resnet152_v2_voc

SSD architecture with ResNet v2 152 layers.

VGGAtrousExtractor

VGG Atrous multi layer feature extractor which produces multiple output feature maps.

get_vgg_atrous_extractor

Get VGG atrous feature extractor networks.

vgg16_atrous_300

Get VGG atrous 16 layer 300 in_size feature extractor networks.

vgg16_atrous_512

Get VGG atrous 16 layer 512 in_size feature extractor networks.

Faster RCNN

FasterRCNN

Faster RCNN network.

get_faster_rcnn

Utility function to return faster rcnn networks.

faster_rcnn_resnet50_v1b_voc

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J.

faster_rcnn_resnet50_v1b_coco

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J.

faster_rcnn_resnet50_v1b_custom

Faster RCNN model with resnet50_v1b base network on custom dataset.

YOLOv3

YOLOV3

YOLO V3 detection network.

get_yolov3

Get YOLOV3 models.

yolo3_darknet53_voc

YOLO3 multi-scale with darknet53 base network on VOC dataset.

yolo3_darknet53_coco

YOLO3 multi-scale with darknet53 base network on COCO dataset.

yolo3_darknet53_custom

YOLO3 multi-scale with darknet53 base network on custom dataset.

Instance Segmentation

Mask RCNN

MaskRCNN

Mask RCNN network.

get_mask_rcnn

Utility function to return mask rcnn networks.

mask_rcnn_resnet50_v1b_coco

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R.

Semantic Segmentation

FCN

FCN

Fully Convolutional Networks for Semantic Segmentation

get_fcn

FCN model from the paper “Fully Convolutional Network for semantic segmentation”

get_fcn_resnet50_voc

FCN model with base network ResNet-50 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Network for semantic segmentation”

get_fcn_resnet101_voc

FCN model with base network ResNet-101 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Network for semantic segmentation”

get_fcn_resnet101_coco

FCN model with base network ResNet-101 pre-trained on COCO dataset from the paper “Fully Convolutional Network for semantic segmentation”

get_fcn_resnet50_ade

FCN model with base network ResNet-50 pre-trained on ADE20K dataset from the paper “Fully Convolutional Network for semantic segmentation”

get_fcn_resnet101_ade

FCN model with base network ResNet-101 pre-trained on ADE20K dataset from the paper “Fully Convolutional Network for semantic segmentation”

PSPNet

PSPNet

Pyramid Scene Parsing Network

get_psp

Pyramid Scene Parsing Network :param dataset: The dataset that the model was pretrained on.

get_psp_resnet101_coco

Pyramid Scene Parsing Network :param pretrained: Boolean value controls whether to load the default pretrained weights for model.

get_psp_resnet101_voc

Pyramid Scene Parsing Network :param pretrained: Boolean value controls whether to load the default pretrained weights for model.

get_psp_resnet50_ade

Pyramid Scene Parsing Network :param pretrained: Boolean value controls whether to load the default pretrained weights for model.

get_psp_resnet101_ade

Pyramid Scene Parsing Network :param pretrained: Boolean value controls whether to load the default pretrained weights for model.

DeepLabV3

DeepLabV3

param nclass

Number of categories for the training dataset.

get_deeplab

DeepLabV3 :param dataset: The dataset that the model was pretrained on.

get_deeplab_resnet101_coco

DeepLabV3 :param pretrained: Boolean value controls whether to load the default pretrained weights for model.

get_deeplab_resnet101_voc

DeepLabV3 :param pretrained: Boolean value controls whether to load the default pretrained weights for model.

get_deeplab_resnet50_ade

DeepLabV3 :param pretrained: Boolean value controls whether to load the default pretrained weights for model.

get_deeplab_resnet101_ade

DeepLabV3 :param pretrained: Boolean value controls whether to load the default pretrained weights for model.

API Reference

Network definitions of GluonCV models

GluonCV Model Zoo

class gluoncv.model_zoo.AlexNet(classes=1000, **kwargs)[source]

AlexNet model from the “One weird trick…” paper.

Parameters

classes (int, default 1000) – Number of classes for the output layer.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BasicBlockV1(channels, stride, downsample=False, in_channels=0, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

BasicBlock V1 from “Deep Residual Learning for Image Recognition” paper. This is used for ResNet V1 for 18, 34 layers.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BasicBlockV1b(planes, strides=1, dilation=1, downsample=None, previous_dilation=1, norm_layer=None, norm_kwargs=None, **kwargs)[source]

ResNetV1b BasicBlockV1b

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BasicBlockV2(channels, stride, downsample=False, in_channels=0, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

BasicBlock V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for ResNet V2 for 18, 34 layers.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BottleneckV1(channels, stride, downsample=False, in_channels=0, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Bottleneck V1 from “Deep Residual Learning for Image Recognition” paper. This is used for ResNet V1 for 50, 101, 152 layers.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BottleneckV1b(planes, strides=1, dilation=1, downsample=None, previous_dilation=1, norm_layer=None, norm_kwargs=None, last_gamma=False, **kwargs)[source]

ResNetV1b BottleneckV1b

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BottleneckV2(channels, stride, downsample=False, in_channels=0, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Bottleneck V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for ResNet V2 for 50, 101, 152 layers.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.CenterNet(base_network, heads, classes, head_conv_channel=0, scale=4.0, topk=100, flip_test=False, nms_thresh=0, nms_topk=400, post_nms=100, **kwargs)[source]

Objects as Points. https://arxiv.org/abs/1904.07850v2

Parameters
  • base_network (mxnet.gluon.nn.HybridBlock) – The base feature extraction network.

  • heads (OrderedDict) –

    OrderedDict with specifications for each head. For example:

    OrderedDict([
        ('heatmap', {'num_output': len(classes), 'bias': -2.19}),
        ('wh', {'num_output': 2}),
        ('reg', {'num_output': 2}),
    ])

  • classes (list of str) – Category names.

  • head_conv_channel (int, default is 0) – If > 0, will use an extra conv layer before each of the real heads.

  • scale (float, default is 4.0) – The downsampling ratio of the entire network.

  • topk (int, default is 100) – Number of outputs.

  • flip_test (bool) – Whether apply flip test in inference (training mode not affected).

  • nms_thresh (float, default is 0.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS. By default nms is disabled.

  • nms_topk (int, default is 400) – Apply NMS to the top k detection results; use -1 to disable so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.

hybrid_forward(F, x)[source]

Hybrid forward of center net

property num_classes

Return number of foreground classes.

Returns

Number of foreground classes

Return type

int

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer : old_integer} or mapping dict or {new_name : old_name} mapping dict, or a list of [name0, name1,…] if class names don’t change. This allows the new predictor to reuse the previously trained weights specified.

Example

>>> net = gluoncv.model_zoo.get_model('center_net_resnet50_v1b_voc', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person is the 14th category in VOC
>>> net.reset_class(classes=['person'], reuse_weights={0:14})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':14})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])
set_nms(nms_thresh=0, nms_topk=400, post_nms=100)[source]

Set non-maximum suppression parameters.

Parameters
  • nms_thresh (float, default is 0.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS. By default NMS is disabled.

  • nms_topk (int, default is 400) – Apply NMS to the top k detection results; use -1 to disable so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.

Returns

Return type

None

class gluoncv.model_zoo.DUC(planes, upscale_factor=2, **kwargs)[source]

Upsampling layer with pixel shuffle

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DarknetV3(layers, channels, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Darknet v3.

Parameters
features

Feature extraction layers.

Type

mxnet.gluon.nn.HybridSequential

output

A classes-way (default 1000) fully-connected output layer.

Type

mxnet.gluon.nn.Dense

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DeepLabV3(nclass, backbone='resnet50', aux=True, ctx=cpu(0), pretrained_base=True, height=None, width=None, base_size=520, crop_size=480, **kwargs)[source]
Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default:’resnet50’; ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

  • aux (bool) – Auxiliary loss.

Reference:

Chen, Liang-Chieh, et al. “Rethinking atrous convolution for semantic image segmentation.” arXiv preprint arXiv:1706.05587 (2017).

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DeepLabV3Plus(nclass, backbone='xception', aux=True, ctx=cpu(0), pretrained_base=True, height=None, width=None, base_size=576, crop_size=512, dilated=True, **kwargs)[source]
Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default:’xception’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

  • aux (bool) – Auxiliary loss.

Reference:

Chen, Liang-Chieh, et al. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.”

evaluate(x)[source]

Evaluate the network with inputs and targets.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DeepLabWV3Plus(nclass, backbone='wideresnet', aux=False, ctx=cpu(0), pretrained_base=True, height=None, width=None, base_size=520, crop_size=480, dilated=True, **kwargs)[source]
Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default:’wideresnet’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

  • aux (bool) – Auxiliary loss.

  • Reference – Chen, Liang-Chieh, et al. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.”, https://arxiv.org/abs/1802.02611, ECCV 2018

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DenseNet(num_init_features, growth_rate, block_config, bn_size=4, dropout=0, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Densenet-BC model from the “Densely Connected Convolutional Networks” paper.

Parameters
  • num_init_features (int) – Number of filters to learn in the first convolution layer.

  • growth_rate (int) – Number of filters to add each layer (k in the paper).

  • block_config (list of int) – List of integers for numbers of layers in each pooling block.

  • bn_size (int, default 4) – Multiplicative factor for number of bottle neck layers. (i.e. bn_size * k features in the bottleneck layer)

  • dropout (float, default 0) – Rate of dropout after each dense layer.

  • classes (int, default 1000) – Number of classification classes.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.FCN(nclass, backbone='resnet50', aux=True, ctx=cpu(0), pretrained_base=True, base_size=520, crop_size=480, **kwargs)[source]

Fully Convolutional Networks for Semantic Segmentation

Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default:’resnet50’; ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm;

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

  • pretrained_base (bool or str) – Refers to if the FCN backbone or the encoder is pretrained or not. If True, model weights of a model that was trained on ImageNet is loaded.

Reference:

Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” CVPR, 2015

Examples

>>> model = FCN(nclass=21, backbone='resnet50')
>>> print(model)
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.FasterRCNN(features, top_features, classes, box_features=None, short=600, max_size=1000, min_stage=4, max_stage=4, train_patterns=None, nms_thresh=0.3, nms_topk=400, post_nms=100, roi_mode='align', roi_size=(14, 14), strides=16, clip=None, rpn_channel=1024, base_size=16, scales=(8, 16, 32), ratios=(0.5, 1, 2), alloc_size=(128, 128), rpn_nms_thresh=0.7, rpn_train_pre_nms=12000, rpn_train_post_nms=2000, rpn_test_pre_nms=6000, rpn_test_post_nms=300, rpn_min_size=16, per_device_batch_size=1, num_sample=128, pos_iou_thresh=0.5, pos_ratio=0.25, max_num_gt=300, additional_output=False, force_nms=False, **kwargs)[source]

Faster RCNN network.

Parameters
  • features (gluon.HybridBlock) – Base feature extractor before feature pooling layer.

  • top_features (gluon.HybridBlock) – Tail feature extractor after feature pooling layer.

  • classes (iterable of str) – Names of categories, its length is num_class.

  • box_features (gluon.HybridBlock, default is None) – feature head for transforming shared ROI output (top_features) for box prediction. If set to None, global average pooling will be used.

  • short (int, default is 600.) – Input image short side size.

  • max_size (int, default is 1000.) – Maximum size of input image long side.

  • min_stage (int, default is 4) – Minimum stage NO. for FPN stages.

  • max_stage (int, default is 4) – Maximum stage NO. for FPN stages.

  • train_patterns (str, default is None.) – Matching pattern for trainable parameters.

  • nms_thresh (float, default is 0.3.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to the top k detection results; use -1 to disable so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.

  • roi_mode (str, default is align) – ROI pooling mode. Currently support ‘pool’ and ‘align’.

  • roi_size (tuple of int, length 2, default is (14, 14)) – (height, width) of the ROI region.

  • strides (int/tuple of ints, default is 16) – Feature map stride with respect to original image. This is usually the ratio between original image size and feature map size. For FPN, use a tuple of ints.

  • clip (float, default is None) – Clip bounding box target to this value.

  • rpn_channel (int, default is 1024) – Channel number used in RPN convolutional layers.

  • base_size (int) – The width(and height) of reference anchor box.

  • scales (iterable of float, default is (8, 16, 32)) –

    The areas of anchor boxes. We use the following form to compute the shapes of anchors:

    \[width_{anchor} = size_{base} \times scale \times \sqrt{1 / ratio}\]
    \[height_{anchor} = size_{base} \times scale \times \sqrt{ratio}\]

  • ratios (iterable of float, default is (0.5, 1, 2)) – The aspect ratios of anchor boxes. We expect it to be a list or tuple.

  • alloc_size (tuple of int) – Allocate size for the anchor boxes as (H, W). Usually we generate enough anchors for large feature map, e.g. 128x128. Later in inference we can have variable input sizes, at which time we can crop corresponding anchors from this large anchor map so we can skip re-generating anchors for each input.

  • rpn_train_pre_nms (int, default is 12000) – Filter top proposals before NMS in training of RPN.

  • rpn_train_post_nms (int, default is 2000) – Return top proposal results after NMS in training of RPN. Will be set to rpn_train_pre_nms if it is larger than rpn_train_pre_nms.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 300) – Return top proposal results after NMS in testing of RPN. Will be set to rpn_test_pre_nms if it is larger than rpn_test_pre_nms.

  • rpn_nms_thresh (float, default is 0.7) – IOU threshold for NMS. It is used to remove overlapping proposals.

  • rpn_num_sample (int, default is 256) – Number of samples for RPN targets.

  • rpn_pos_iou_thresh (float, default is 0.7) – Anchor with IOU larger than pos_iou_thresh is regarded as positive samples.

  • rpn_neg_iou_thresh (float, default is 0.3) – Anchor with IOU smaller than neg_iou_thresh is regarded as negative samples. Anchors with IOU in between pos_iou_thresh and neg_iou_thresh are ignored.

  • rpn_pos_ratio (float, default is 0.5) – pos_ratio defines how many positive samples (pos_ratio * num_sample) is to be sampled.

  • rpn_box_norm (array-like of size 4, default is (1., 1., 1., 1.)) – Std value to be divided from encoded values.

  • rpn_min_size (int, default is 16) – Proposals whose size is smaller than min_size will be discarded.

  • per_device_batch_size (int, default is 1) – Batch size for each device during training.

  • num_sample (int, default is 128) – Number of samples for RCNN targets.

  • pos_iou_thresh (float, default is 0.5) – Proposal whose IOU larger than pos_iou_thresh is regarded as positive samples.

  • pos_ratio (float, default is 0.25) – pos_ratio defines how many positive samples (pos_ratio * num_sample) is to be sampled.

  • max_num_gt (int, default is 300) – Maximum ground-truth number in whole training dataset. This is only an upper bound, not necessarily very precise. However, using a very big number may impact the training speed.

  • additional_output (boolean, default is False) – additional_output is only used for Mask R-CNN to get internal outputs.

  • force_nms (bool, default is False) – Apply NMS to all categories; this avoids overlapping detection results from different categories.

classes

Names of categories, its length is num_class.

Type

iterable of str

num_class

Number of positive categories.

Type

int

short

Input image short side size.

Type

int

max_size

Maximum size of input image long side.

Type

int

train_patterns

Matching pattern for trainable parameters.

Type

str

nms_thresh

Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

Type

float

nms_topk

Apply NMS to the top k detection results; use -1 to disable so that every detection result is used in NMS.

Type

int

force_nms

Apply NMS to all categories; this avoids overlapping detection results from different categories.

Type

bool

post_nms

Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.

Type

int

rpn_target_generator

Generate training targets with cls_target, box_target, and box_mask.

Type

gluon.Block

target_generator

Generate training targets with boxes, samples, matches, gt_label and gt_box.

Type

gluon.Block

hybrid_forward(F, x, gt_box=None, gt_label=None)[source]

Forward Faster-RCNN network.

The behavior during training and inference is different.

Parameters
  • x (mxnet.nd.NDArray or mxnet.symbol) – The network input tensor.

  • gt_box (type, only required during training) – The ground-truth bbox tensor with shape (B, N, 4).

  • gt_label (type, only required during training) – The ground-truth label tensor with shape (B, N, 1).

Returns

During inference, returns final class id, confidence scores, bounding boxes.

Return type

(ids, scores, bboxes)

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer : old_integer} or mapping dict or {new_name : old_name} mapping dict, or a list of [name0, name1,…] if class names don’t change. This allows the new predictor to reuse the previously trained weights specified.

Example

>>> net = gluoncv.model_zoo.get_model('faster_rcnn_resnet50_v1b_coco', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person is the 14th category in VOC
>>> net.reset_class(classes=['person'], reuse_weights={0:14})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':14})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])
property target_generator

Returns stored target generator

Returns

The RCNN target generator

Return type

mxnet.gluon.HybridBlock

class gluoncv.model_zoo.GoogLeNet(classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, dropout_ratio=0.4, aux_logits=False, norm_kwargs=None, partial_bn=False, pretrained_base=True, ctx=None, **kwargs)[source]

GoogLeNet model from the “Going Deeper with Convolutions” paper, with batch normalization as described in the “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” paper.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.HybridBlock(prefix=None, params=None)[source]

HybridBlock supports forwarding with both Symbol and NDArray.

HybridBlock is similar to Block, with a few differences:

import mxnet as mx
from mxnet.gluon import HybridBlock, nn

class Model(HybridBlock):
    def __init__(self, **kwargs):
        super(Model, self).__init__(**kwargs)
        # use name_scope to give child Blocks appropriate names.
        with self.name_scope():
            self.dense0 = nn.Dense(20)
            self.dense1 = nn.Dense(20)

    def hybrid_forward(self, F, x):
        x = F.relu(self.dense0(x))
        return F.relu(self.dense1(x))

model = Model()
model.initialize(ctx=mx.cpu(0))
model.hybridize()
model(mx.nd.zeros((10, 10), ctx=mx.cpu(0)))

Forward computation in HybridBlock must be static to work with Symbols, i.e. you cannot call NDArray.asnumpy(), NDArray.shape, NDArray.dtype, NDArray indexing (x[i]), etc. on tensors. Also, you cannot use branching or loop logic based on non-constant expressions like random numbers or intermediate results, since they change the graph structure on each iteration.

Before activating with hybridize(), HybridBlock works just like normal Block. After activation, HybridBlock will create a symbolic graph representing the forward computation and cache it. On subsequent forwards, the cached graph will be used instead of hybrid_forward().

Please see references for detailed tutorial.

References

Hybrid - Faster training and easy deployment

cast(dtype)[source]

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

export(path, epoch=0, remove_amp_cast=True)[source]

Export HybridBlock to json format that can be loaded by SymbolBlock.imports, mxnet.mod.Module or the C++ interface.

Note

When there is only one input, it will be named data. When there are multiple inputs, they will be named data0, data1, etc.

Parameters
  • path (str) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number.

  • epoch (int) – Epoch number of saved model.

forward(x, *args)[source]

Defines the forward computation. Arguments can be either NDArray or Symbol.

hybrid_forward(F, x, *args, **kwargs)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

hybridize(active=True, **kwargs)[source]

Activates or deactivates HybridBlocks recursively. Has no effect on non-hybrid children.

Parameters
  • active (bool, default True) – Whether to turn hybrid on or off.

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

infer_shape(*args)[source]

Infers shape of Parameters from inputs.

infer_type(*args)[source]

Infers data type of Parameters from inputs.

register_child(block, name=None)[source]

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

class gluoncv.model_zoo.I3D_InceptionV1(nclass=1000, pretrained_base=True, num_segments=1, num_crop=1, dropout_ratio=0.5, init_std=0.01, partial_bn=False, ctx=None, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Inception v1 model from “Going Deeper with Convolutions” paper.

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper. Slight differences between this implementation and the original implementation due to padding.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.I3D_InceptionV3(nclass=1000, pretrained_base=True, num_segments=1, num_crop=1, dropout_ratio=0.5, init_std=0.01, partial_bn=False, ctx=None, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper.

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

This model definition file is written by Brais and modified by Yi.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.I3D_ResNetV1(nclass, depth, num_stages=4, pretrained=False, pretrained_base=True, num_segments=1, num_crop=1, spatial_strides=(1, 2, 2, 2), temporal_strides=(1, 1, 1, 1), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), conv1_kernel_t=5, conv1_stride_t=2, pool1_kernel_t=1, pool1_stride_t=2, inflate_freq=(1, 1, 1, 1), inflate_stride=(1, 1, 1, 1), inflate_style='3x1x1', nonlocal_stages=(-1, ), nonlocal_freq=(0, 1, 1, 0), nonlocal_cfg=None, bn_eval=True, bn_frozen=False, partial_bn=False, frozen_stages=-1, dropout_ratio=0.5, init_std=0.01, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, ctx=None, **kwargs)[source]

ResNet_I3D backbone. Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • num_stages (int) – Resnet stages, normally 4.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • bn_eval (bool) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var).

  • bn_frozen (bool) – Whether to freeze weight and bias of BN layers.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.Inception3(classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, partial_bn=False, **kwargs)[source]

Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.MaskRCNN(features, top_features, classes, mask_channels=256, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, target_roi_scale=1, num_fcn_convs=0, norm_layer=None, norm_kwargs=None, **kwargs)[source]

Mask RCNN network.

Parameters
  • features (gluon.HybridBlock) – Base feature extractor before feature pooling layer.

  • top_features (gluon.HybridBlock) – Tail feature extractor after feature pooling layer.

  • classes (iterable of str) – Names of categories; its length is num_class.

  • mask_channels (int, default is 256) – Number of channels in mask prediction

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN. Upper bounded by min of rpn_test_pre_nms and rpn_test_post_nms.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 1000) – Return top proposal results after NMS in testing of RPN. Will be set to rpn_test_pre_nms if it is larger than rpn_test_pre_nms.

  • target_roi_scale (int, default 1) – Ratio of mask output roi / input roi. For model with FPN, this is typically 2.

  • num_fcn_convs (int, default 0) – Number of convolution blocks before the deconv layer. For an FPN network this is typically 4.

hybrid_forward(F, x, gt_box=None, gt_label=None)[source]

Forward Mask RCNN network.

The behavior during training and inference is different.

Parameters
  • x (mxnet.nd.NDArray or mxnet.symbol) – The network input tensor.

  • gt_box (type, only required during training) – The ground-truth bbox tensor with shape (1, N, 4).

  • gt_label (type, only required during training) – The ground-truth label tensor with shape (B, 1, 4).

Returns

During inference, returns final class id, confidence scores, bounding boxes, segmentation masks.

Return type

(ids, scores, bboxes, masks)

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer : old_integer} mapping dict or a {new_name : old_name} mapping dict, or a list [name0, name1, …] if class names don’t change. This allows the new predictor to reuse the previously trained weights for the specified classes.

Example

>>> net = gluoncv.model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person is the first category in COCO
>>> net.reset_class(classes=['person'], reuse_weights={0:0})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':0})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])
class gluoncv.model_zoo.MobileNet(multiplier=1.0, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.MobileNetV2(multiplier=1.0, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper (https://arxiv.org/abs/1801.04381).

Parameters
  • multiplier (float) – The width multiplier for controlling the model size. The actual number of channels is equal to the original channel size multiplied by this multiplier.
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.MobilePose(base_name, base_attrs=('features', ), num_joints=17, pretrained_base=False, pretrained_ctx=cpu(0), **kwargs)[source]

Pose Estimation for Mobile Device

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.PSPNet(nclass, backbone='resnet50', aux=True, ctx=cpu(0), pretrained_base=True, base_size=520, crop_size=480, **kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default:’resnet50’; ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

  • aux (bool) – Auxiliary loss.

Reference:

Zhao, Hengshuang, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. “Pyramid scene parsing network.” CVPR, 2017

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.RCNNTargetGenerator(num_class, max_pos=128, per_device_batch_size=1, means=(0.0, 0.0, 0.0, 0.0), stds=(0.1, 0.1, 0.2, 0.2))[source]

RCNN target encoder to generate matching target and regression target values.

Parameters
  • num_class (int) – Total number of positive classes.

  • max_pos (int, default is 128) – Upper bound on the number of positive samples.

  • per_device_batch_size (int, default is 1) – Per-device batch size.

  • means (iterable of float, default is (0., 0., 0., 0.)) – Mean values to be subtracted from regression targets.

  • stds (iterable of float, default is (0.1, 0.1, 0.2, 0.2)) – Standard deviations by which regression targets are divided.

hybrid_forward(F, roi, samples, matches, gt_label, gt_box)[source]

Components can handle batch images

Parameters
  • roi ((B, N, 4), input proposals) –

  • samples ((B, N), value +1: positive / -1: negative.) –

  • matches ((B, N), value [0, M), index to gt_label and gt_box.) –

  • gt_label ((B, M), value [0, num_class), excluding background class.) –

  • gt_box ((B, M, 4), input ground truth box corner coordinates.) –

Returns

  • cls_target ((B, N), value [0, num_class + 1), including background.)

  • box_target ((B, N, C, 4), only foreground class has nonzero target.)

  • box_weight ((B, N, C, 4), only foreground class has nonzero weight.)

class gluoncv.model_zoo.RCNNTargetSampler(num_image, num_proposal, num_sample, pos_iou_thresh, pos_ratio, max_num_gt)[source]

A sampler to choose positive/negative samples from RCNN Proposals

Parameters
  • num_image (int) – Number of input images.

  • num_proposal (int) – Number of input proposals.

  • num_sample (int) – Number of samples for RCNN targets.

  • pos_iou_thresh (float) – Proposals whose IOU is larger than pos_iou_thresh are regarded as positive samples; proposals whose IOU is smaller are regarded as negative samples.

  • pos_ratio (float) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.

  • max_num_gt (int) – Maximum ground-truth number in whole training dataset. This is only an upper bound, not necessarily very precise. However, using a very big number may impact the training speed.
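How pos_ratio and num_sample above interact can be sketched in plain Python (a hypothetical helper for illustration, not GluonCV's internal code):

```python
def sample_budget(num_sample, pos_ratio):
    """Split the RCNN sample budget into positive and negative quotas."""
    max_pos = int(round(num_sample * pos_ratio))
    return max_pos, num_sample - max_pos

pos, neg = sample_budget(128, 0.25)  # -> (32, 96)
```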

hybrid_forward(F, rois, scores, gt_boxes)[source]

Handle B=self._num_image by a for loop.

Parameters
  • rois ((B, self._num_proposal, 4) encoded in (x1, y1, x2, y2)) –

  • scores ((B, self._num_proposal, 1), value range [0, 1] with ignore value -1.) –

  • gt_boxes ((B, M, 4) encoded in (x1, y1, x2, y2), invalid box should have area of 0.) –

Returns

  • rois ((B, self._num_sample, 4), randomly drawn from proposals)

  • samples ((B, self._num_sample), value +1: positive / 0: ignore / -1: negative.)

  • matches ((B, self._num_sample), value between [0, M))

class gluoncv.model_zoo.ResNetV1(block, layers, channels, classes=1000, thumbnail=False, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
  • block (HybridBlock) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.

  • layers (list of int) – Numbers of layers in each block

  • channels (list of int) – Numbers of channels in each block. Length should be one larger than layers list.

  • classes (int, default 1000) – Number of classification classes.

  • thumbnail (bool, default False) – Enable thumbnail.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_se (bool, default False) – Whether to use Squeeze-and-Excitation module

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.ResNetV1b(block, layers, classes=1000, dilated=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, last_gamma=False, deep_stem=False, stem_width=32, avg_down=False, final_drop=0.0, use_global_stats=False, name_prefix='', **kwargs)[source]

Pre-trained ResNetV1b Model, which produces the strides of 8 featuremaps at conv5.

Parameters
  • block (Block) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.

  • layers (list of int) – Numbers of layers in each block

  • classes (int, default 1000) – Number of classification classes.

  • dilated (bool, default False) – Applying dilation strategy to pretrained ResNet yielding a stride-8 model, typically used in Semantic Segmentation.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • deep_stem (bool, default False) – Whether to replace the 7x7 conv1 with three 3x3 convolution layers.

  • avg_down (bool, default False) – Whether to use average pooling for projection skip connection between stages/downsample.

  • final_drop (float, default 0.0) – Dropout ratio before the final classification layer.

  • use_global_stats (bool, default False) – Whether forcing BatchNorm to use global statistics instead of minibatch statistics; optionally set to True if finetuning using ImageNet classification pretrained models.

Reference:

  • He, Kaiming, et al. “Deep residual learning for image recognition.”

Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

  • Yu, Fisher, and Vladlen Koltun. “Multi-scale context aggregation by dilated convolutions.”

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.ResNetV2(block, layers, channels, classes=1000, thumbnail=False, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • block (HybridBlock) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.

  • layers (list of int) – Numbers of layers in each block

  • channels (list of int) – Numbers of channels in each block. Length should be one larger than layers list.

  • classes (int, default 1000) – Number of classification classes.

  • thumbnail (bool, default False) – Enable thumbnail.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_se (bool, default False) – Whether to use Squeeze-and-Excitation module

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.ResidualAttentionModel(scale, m, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper. Input size is 224 x 224.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_BasicBlockV1(channels, stride, downsample=False, in_channels=0, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

BasicBlock V1 from “Deep Residual Learning for Image Recognition” paper. This is used for SE_ResNet V1 for 18, 34 layers.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_BasicBlockV2(channels, stride, downsample=False, in_channels=0, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

BasicBlock V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for SE_ResNet V2 for 18, 34 layers.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_BottleneckV1(channels, stride, downsample=False, in_channels=0, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Bottleneck V1 from “Deep Residual Learning for Image Recognition” paper. This is used for SE_ResNet V1 for 50, 101, 152 layers.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_BottleneckV2(channels, stride, downsample=False, in_channels=0, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Bottleneck V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for SE_ResNet V2 for 50, 101, 152 layers.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_ResNetV1(block, layers, channels, classes=1000, thumbnail=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

SE_ResNet V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_ResNetV2(block, layers, channels, classes=1000, thumbnail=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

SE_ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SSD(network, base_size, features, num_filters, sizes, ratios, steps, classes, use_1x1_transition=True, use_bn=True, reduce_ratio=1.0, min_depth=128, global_pool=False, pretrained=False, stds=(0.1, 0.1, 0.2, 0.2), nms_thresh=0.45, nms_topk=400, post_nms=100, anchor_alloc_size=128, ctx=cpu(0), norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Single-shot Object Detection Network: https://arxiv.org/abs/1512.02325.

Parameters
  • network (string or None) – Name of the base network; if None, the base network is instantiated directly from features instead of being composed.

  • base_size (int) – Base input size. It is specified so SSD can support dynamic input shapes.

  • features (list of str or mxnet.gluon.HybridBlock) – Intermediate features to be extracted or a network with multi-output. If network is None, features is expected to be a multi-output network.

  • num_filters (list of int) – Number of channels for the appended layers; ignored if network is None.

  • sizes (iterable of float) – Sizes of anchor boxes; this should be a list of floats in increasing order. The length of sizes must be len(layers) + 1. For example, a two-stage SSD model can have sizes = [30, 60, 90], which converts to [30, 60] and [60, 90] for the two stages, respectively. For more details, please refer to the original paper.

  • ratios (iterable of list) – Aspect ratios of anchors in each output layer. Its length must equal the number of SSD output layers.

  • steps (list of int) – Step size of anchor boxes in each output layer.

  • classes (iterable of str) – Names of all categories.

  • use_1x1_transition (bool) – Whether to use a 1x1 convolution as the transition layer between attached layers; effective in reducing model capacity.

  • use_bn (bool) – Whether to use BatchNorm layer after each attached convolutional layer.

  • reduce_ratio (float) – Channel reduce ratio (0, 1) of the transition layer.

  • min_depth (int) – Minimum channels for the transition layers.

  • global_pool (bool) – Whether to attach a global average pooling layer as the last output layer.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • stds (tuple of float, default is (0.1, 0.1, 0.2, 0.2)) – Std values to be divided/multiplied to box encoded values.

  • nms_thresh (float, default is 0.45.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to the top k detection results; use -1 to disable so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return the top post_nms detection results; the rest are discarded. The number is based on the COCO dataset, which has at most 100 objects per image. You can adjust this number if expecting more objects, or use -1 to return all detections.

  • anchor_alloc_size (tuple of int, default is (128, 128)) – For advanced users. Define anchor_alloc_size to generate anchor maps that are large enough; they are later saved in parameters. During inference, arbitrary input images are supported by cropping the corresponding area of the anchor map. This allows exporting to a symbol so it can run in C++, Scala, etc.

  • ctx (mx.Context) – Network context.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm. This only applies to base networks that accept a norm_layer argument; it is ignored if the base network (e.g. VGG) does not.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
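The sizes argument above is a list of n + 1 floats that yields n per-stage (size, next_size) pairs; a pure-Python sketch of that pairing (for illustration, not GluonCV's internal code):

```python
def size_pairs(sizes):
    """Pair consecutive anchor sizes, one pair per SSD output layer."""
    return list(zip(sizes[:-1], sizes[1:]))

pairs = size_pairs([30, 60, 90])  # -> [(30, 60), (60, 90)]
```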

hybrid_forward(F, x)[source]

Hybrid forward

property num_classes

Return number of foreground classes.

Returns

Number of foreground classes

Return type

int

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer : old_integer} mapping dict or a {new_name : old_name} mapping dict, or a list [name0, name1, …] if class names don’t change. This allows the new predictor to reuse the previously trained weights for the specified classes.

Example

>>> net = gluoncv.model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person is the 14th category in VOC
>>> net.reset_class(classes=['person'], reuse_weights={0:14})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':14})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])
set_nms(nms_thresh=0.45, nms_topk=400, post_nms=100)[source]

Set non-maximum suppression parameters.

Parameters
  • nms_thresh (float, default is 0.45.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to the top k detection results; use -1 to disable so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return the top post_nms detection results; the rest are discarded. The number is based on the COCO dataset, which has at most 100 objects per image. You can adjust this number if expecting more objects, or use -1 to return all detections.

Returns

Return type

None

class gluoncv.model_zoo.SimplePoseResNet(base_name='resnet50_v1b', pretrained_base=False, pretrained_ctx=cpu(0), num_joints=17, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), final_conv_kernel=1, deconv_with_bias=False, **kwargs)[source]
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SlowFast(nclass, block=<class 'gluoncv.model_zoo.action_recognition.slowfast.Bottleneck'>, layers=None, pretrained=False, pretrained_base=False, num_segments=1, num_crop=1, bn_eval=True, bn_frozen=False, partial_bn=False, frozen_stages=-1, dropout_ratio=0.5, init_std=0.01, alpha=8, beta_inv=8, fusion_conv_channel_ratio=2, fusion_kernel_size=5, width_per_group=64, num_groups=1, slow_temporal_stride=16, fast_temporal_stride=2, slow_frames=4, fast_frames=32, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, ctx=None, **kwargs)[source]

SlowFast networks (SlowFast) from “SlowFast Networks for Video Recognition” paper.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SqueezeNet(version, classes=1000, **kwargs)[source]

SqueezeNet model from the “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size” paper. SqueezeNet 1.1 model from the official SqueezeNet repo. SqueezeNet 1.1 has 2.4x less computation and slightly fewer parameters than SqueezeNet 1.0, without sacrificing accuracy.

Parameters
  • version (str) – Version of squeezenet. Options are ‘1.0’, ‘1.1’.

  • classes (int, default 1000) – Number of classification classes.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.VGG(layers, filters, classes=1000, batch_norm=False, **kwargs)[source]

VGG model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • layers (list of int) – Numbers of layers in each feature block.

  • filters (list of int) – Numbers of filters in each feature block. List length should match the layers.

  • classes (int, default 1000) – Number of classification classes.

  • batch_norm (bool, default False) – Use batch normalization.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.VGGAtrousExtractor(layers, filters, extras, batch_norm=False, **kwargs)[source]

VGG Atrous multi layer feature extractor which produces multiple output feature maps.

Parameters
  • layers (list of int) – Number of layer for vgg base network.

  • filters (list of int) – Number of convolution filters for each layer.

  • extras (list of list) – Extra layers configurations.

  • batch_norm (bool) – If True, will use BatchNorm layers.

hybrid_forward(F, x, init_scale)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.Xception65(classes=1000, output_stride=32, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None)[source]

Modified Aligned Xception

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.Xception71(classes=1000, output_stride=32, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None)[source]

Modified Aligned Xception

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.YOLOV3(stages, channels, anchors, strides, classes, alloc_size=(128, 128), nms_thresh=0.45, nms_topk=400, post_nms=100, pos_iou_thresh=1.0, ignore_iou_thresh=0.7, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

YOLO V3 detection network. Reference: https://arxiv.org/pdf/1804.02767.pdf. :param stages: Staged feature extraction blocks.

For example, 3 stages and 3 YOLO output layers are used in the original paper.

Parameters
  • channels (iterable) – Number of conv channels for each appended stage. len(channels) should match len(stages).

  • classes (iterable of str) – Names of foreground categories.

  • anchors (iterable) – The anchor setting. len(anchors) should match len(stages).

  • strides (iterable) – Strides of feature map. len(strides) should match len(stages).

  • alloc_size (tuple of int, default is (128, 128)) – For advanced users. Define alloc_size to generate anchor maps that are large enough, which will later be saved in parameters. During inference, we support arbitrary input images by cropping the corresponding area of the anchor map. This allows us to export the network as a symbol so it can be run in C++, Scala, etc.

  • nms_thresh (float, default is 0.45) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to the top k detection results; use -1 to disable so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.

  • pos_iou_thresh (float, default is 1.0) – IOU threshold for true anchors that match real objects. ‘pos_iou_thresh < 1’ is not implemented.

  • ignore_iou_thresh (float) – Anchors whose IOU falls in the range (ignore_iou_thresh, pos_iou_thresh) are not penalized on objectness score.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
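The pos_iou_thresh / ignore_iou_thresh pair above partitions anchors by their IOU with ground-truth boxes. A minimal pure-Python sketch of that labeling logic (illustrative only, independent of GluonCV's implementation; the helper names are made up):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def objectness_target(anchor_iou, pos_iou_thresh=1.0, ignore_iou_thresh=0.7):
    """Label an anchor: 1 = positive, -1 = ignored (no objectness
    penalty), 0 = negative background."""
    if anchor_iou >= pos_iou_thresh:
        return 1
    if anchor_iou > ignore_iou_thresh:
        return -1
    return 0
```

With pos_iou_thresh=1.0 (the only supported setting), anchors whose IOU exceeds ignore_iou_thresh simply avoid the objectness penalty rather than becoming positives.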

property classes

Return names of (non-background) categories. :returns: Names of (non-background) categories. :rtype: iterable of str

hybrid_forward(F, x, *args)[source]

YOLOV3 network hybrid forward. :param F: F is mxnet.sym if hybridized or mxnet.nd if not. :type F: mxnet.nd or mxnet.sym :param x: Input data. :type x: mxnet.nd.NDArray :param *args: During training, extra inputs are required:

(gt_boxes, obj_t, centers_t, scales_t, weights_t, clas_t) These are generated by YOLOV3PrefetchTargetGenerator in dataloader transform function.

Returns

During inference, return detections in shape (B, N, 6) with format (cid, score, xmin, ymin, xmax, ymax) During training, return losses only: (obj_loss, center_loss, scale_loss, cls_loss).

Return type

(tuple of) mxnet.nd.NDArray
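The (B, N, 6) detection output above can be post-processed with plain slicing. A small sketch using nested lists in place of NDArrays, assuming (as is common in GluonCV detectors) that padded invalid rows carry a class id of -1:

```python
def filter_detections(dets, score_thresh=0.5):
    """dets: one image's detections, rows of
    [cid, score, xmin, ymin, xmax, ymax]; invalid rows use cid = -1.
    Keep rows whose class id is valid and whose score passes the threshold."""
    return [d for d in dets if d[0] >= 0 and d[1] >= score_thresh]
```

In practice the same filtering is done on the returned NDArray with boolean masking.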

property num_class

Number of (non-background) categories. :returns: Number of (non-background) categories. :rtype: int

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors. :param classes: The new categories. [‘apple’, ‘orange’] for example. :type classes: iterable of str :param reuse_weights: A {new_integer : old_integer} mapping dict or a {new_name : old_name} mapping dict,

or a list of [name0, name1,…] if class names don't change. This allows the new predictor to reuse the previously trained weights specified.

Example

>>> net = gluoncv.model_zoo.get_model('yolo3_darknet53_voc', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person is the 14th category in VOC
>>> net.reset_class(classes=['person'], reuse_weights={0:14})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':14})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])
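The three reuse_weights spellings shown above all reduce to a {new_index : old_index} mapping. A rough pure-Python sketch of that normalization (not GluonCV's internal code; the function name is made up):

```python
def normalize_reuse_weights(reuse_weights, new_classes, old_classes):
    """Normalize a reuse_weights spec into a {new_index: old_index} dict.

    Accepts a list of unchanged names, a {new_name: old_name} dict,
    or a {new_index: old_index} dict (mixing names and ints is fine).
    """
    if isinstance(reuse_weights, (list, tuple)):
        # class names unchanged: map each name onto itself
        reuse_weights = {name: name for name in reuse_weights}
    out = {}
    for new_key, old_key in reuse_weights.items():
        new_idx = new_classes.index(new_key) if isinstance(new_key, str) else new_key
        old_idx = old_classes.index(old_key) if isinstance(old_key, str) else old_key
        out[new_idx] = old_idx
    return out
```

Each entry then tells the new class predictor which slice of the old predictor's weights to copy.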
set_nms(nms_thresh=0.45, nms_topk=400, post_nms=100)[source]

Set non-maximum suppression parameters. :param nms_thresh: Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS. :type nms_thresh: float, default is 0.45 :param nms_topk: Apply NMS to the top k detection results; use -1 to disable so that every detection result is used in NMS.

Parameters

post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.

Returns

Return type

None
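A compact pure-Python sketch of how the three knobs interact in greedy NMS (illustrative only, not the symbolic implementation GluonCV uses):

```python
def _iou(a, b):
    """IOU of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_nms(dets, nms_thresh=0.45, nms_topk=400, post_nms=100):
    """dets: list of (score, xmin, ymin, xmax, ymax), any order."""
    dets = sorted(dets, key=lambda d: d[0], reverse=True)
    if nms_topk > 0:
        dets = dets[:nms_topk]          # only the top k enter NMS
    if not 0.0 < nms_thresh < 1.0:      # < 0 or > 1 disables NMS
        keep = dets
    else:
        keep = []
        for d in dets:
            if all(_iou(d[1:], k[1:]) < nms_thresh for k in keep):
                keep.append(d)
    return keep if post_nms < 0 else keep[:post_nms]
```

Lowering post_nms trims the final list; raising nms_thresh above 1 keeps every candidate.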

gluoncv.model_zoo.alexnet(pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

AlexNet model from the “One weird trick…” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

gluoncv.model_zoo.center_net_dla34_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with dla34 base network on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_dla34_dcnv2_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with dla34 base network with deformable v2 conv layers on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_dla34_dcnv2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with dla34 base network with deformable v2 conv layers on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_dla34_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with dla34 base network on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet101_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet101_v1b base network on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet101_v1b_dcnv2_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet101_v1b base network with deformable v2 conv layers on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet101_v1b_dcnv2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet101_v1b base network with deformable v2 conv layers on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet101_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet101_v1b base network on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet18_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet18_v1b base network on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet18_v1b_dcnv2_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet18_v1b base network with deformable v2 conv layers on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet18_v1b_dcnv2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet18_v1b base network with deformable v2 conv layers on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet18_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet18_v1b base network on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet50_v1b base network on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet50_v1b_dcnv2_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet50_v1b base network with deformable v2 conv layers on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet50_v1b_dcnv2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet50_v1b base network with deformable v2 conv layers on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet50_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet50_v1b base network on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

class gluoncv.model_zoo.cifar_ResidualAttentionModel(scale, m, classes=10, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper. Input size is 32 x 32.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

gluoncv.model_zoo.cifar_residualattentionnet452(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.cifar_residualattentionnet56(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.cifar_residualattentionnet92(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.cifar_resnet110_v1(**kwargs)[source]

ResNet-110 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

Parameters
gluoncv.model_zoo.cifar_resnet110_v2(**kwargs)[source]

ResNet-110 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.cifar_resnet20_v1(**kwargs)[source]

ResNet-20 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

Parameters
gluoncv.model_zoo.cifar_resnet20_v2(**kwargs)[source]

ResNet-20 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.cifar_resnet56_v1(**kwargs)[source]

ResNet-56 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

Parameters
gluoncv.model_zoo.cifar_resnet56_v2(**kwargs)[source]

ResNet-56 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.cifar_wideresnet16_10(**kwargs)[source]

WideResNet-16-10 model for CIFAR10 from “Wide Residual Networks” paper.

Parameters
gluoncv.model_zoo.cifar_wideresnet28_10(**kwargs)[source]

WideResNet-28-10 model for CIFAR10 from “Wide Residual Networks” paper.

Parameters
gluoncv.model_zoo.cifar_wideresnet40_8(**kwargs)[source]

WideResNet-40-8 model for CIFAR10 from “Wide Residual Networks” paper.

Parameters
gluoncv.model_zoo.cpu(device_id=0)[source]

Returns a CPU context.

This function is a shortcut for Context('cpu', device_id). For most operations, when no context is specified, the default context is cpu().

Examples

>>> with mx.cpu():
...     cpu_array = mx.nd.ones((2, 3))
>>> cpu_array.context
cpu(0)
>>> cpu_array = mx.nd.ones((2, 3), ctx=mx.cpu())
>>> cpu_array.context
cpu(0)
Parameters

device_id (int, optional) – The device id of the device. device_id is not needed for CPU. This is included to make interface compatible with GPU.

Returns

context – The corresponding CPU context.

Return type

Context

gluoncv.model_zoo.darknet53(**kwargs)[source]

Darknet v3 53-layer network. Reference: https://arxiv.org/pdf/1804.02767.pdf.

Parameters
Returns

Darknet network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.densenet121(**kwargs)[source]

Densenet-BC 121-layer model from the “Densely Connected Convolutional Networks” paper.

Parameters
gluoncv.model_zoo.densenet161(**kwargs)[source]

Densenet-BC 161-layer model from the “Densely Connected Convolutional Networks” paper.

Parameters
gluoncv.model_zoo.densenet169(**kwargs)[source]

Densenet-BC 169-layer model from the “Densely Connected Convolutional Networks” paper.

Parameters
gluoncv.model_zoo.densenet201(**kwargs)[source]

Densenet-BC 201-layer model from the “Densely Connected Convolutional Networks” paper.

Parameters
gluoncv.model_zoo.faster_rcnn_fpn_bn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, num_devices=0, **kwargs)[source]

Faster RCNN model with FPN from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks” “Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S. (2016). Feature Pyramid Networks for Object Detection”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices for sync batch norm layer. if less than 1, use all devices available.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_fpn_bn_resnet50_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.faster_rcnn_fpn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model with FPN from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks” “Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S. (2016). Feature Pyramid Networks for Object Detection”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_fpn_resnet101_v1d_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.faster_rcnn_fpn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model with FPN from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks” “Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S. (2016). Feature Pyramid Networks for Object Detection”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_fpn_resnet50_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.faster_rcnn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks”

Parameters
  • pretrained (bool, optional, default is False) – Load pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_resnet101_v1d_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.faster_rcnn_resnet101_v1d_custom(classes, transfer=None, pretrained_base=True, pretrained=False, **kwargs)[source]

Faster RCNN model with resnet101_v1d base network on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from faster RCNN networks trained on other datasets.

  • pretrained_base (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Returns

Hybrid faster RCNN network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.faster_rcnn_resnet101_v1d_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks”

Parameters
  • pretrained (bool, optional, default is False) – Load pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_resnet101_v1d_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.faster_rcnn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_resnet50_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.faster_rcnn_resnet50_v1b_custom(classes, transfer=None, pretrained_base=True, pretrained=False, **kwargs)[source]

Faster RCNN model with resnet50_v1b base network on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from faster RCNN networks trained on other datasets.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Returns

Hybrid faster RCNN network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.faster_rcnn_resnet50_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_resnet50_v1b_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_center_net(name, dataset, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get a center net instance.

Parameters
  • name (str or None) – Model name, if None is used, you must specify features to be a HybridBlock.

  • dataset (str) – Name of dataset. This is used to identify model name because models trained on different datasets are going to be very different.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.get_cifar_resnet(version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.get_cifar_wide_resnet(num_layers, width_factor=1, drop_rate=0.0, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

WideResNet model for CIFAR10 from “Wide Residual Networks” paper.

Parameters
  • num_layers (int) – Number of layers. Needs to be an integer of the form 6*n + 4, e.g. 16, 28, 40.

  • width_factor (int) – The width factor to apply to the number of channels from the original resnet.

  • drop_rate (float) – The rate of dropout.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
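The valid CIFAR WideResNet depths (16, 28, 40 in the models listed elsewhere in this page) come from three residual stages of n two-convolution blocks plus 4 fixed layers. A small sketch of the depth check, assuming the implementation asserts (num_layers - 4) % 6 == 0:

```python
def wrn_blocks_per_stage(num_layers):
    """Return n, the number of residual blocks per stage, for a
    CIFAR WideResNet of the given depth (depth = 6*n + 4)."""
    assert (num_layers - 4) % 6 == 0, "depth must be of the form 6*n + 4"
    return (num_layers - 4) // 6
```

So WideResNet-28-10, for example, uses 4 blocks in each of its three stages.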

gluoncv.model_zoo.get_darknet(darknet_version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get darknet by version and num_layers info.

Parameters
  • darknet_version (str) – Darknet version, choices are [‘v3’].

  • num_layers (int) – Number of layers.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

Darknet network.

Return type

mxnet.gluon.HybridBlock

Examples

>>> model = get_darknet('v3', 53, pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

DeepLabV3 :param dataset: The dataset the model was pretrained on. (pascal_voc, ade20k) :type dataset: str, default pascal_voc :param pretrained: Boolean value controls whether to load the default pretrained weights for model.

String value represents the hashtag for a certain version of pretrained weights.

Parameters
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab(dataset='pascal_voc', backbone='resnet50', pretrained=False)
>>> print(model)
gluoncv.model_zoo.get_deeplab_plus(dataset='pascal_voc', backbone='xception', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

DeepLabV3Plus :param dataset: The dataset the model was pretrained on. (pascal_voc, ade20k) :type dataset: str, default pascal_voc :param pretrained: Boolean value controls whether to load the default pretrained weights for model.

String value represents the hashtag for a certain version of pretrained weights.

Parameters
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_plus(dataset='pascal_voc', backbone='xception', pretrained=False)
>>> print(model)
gluoncv.model_zoo.get_deeplab_plus_xception_coco(**kwargs)[source]

DeepLabV3Plus :param pretrained: Boolean value controls whether to load the default pretrained weights for model.

String value represents the hashtag for a certain version of pretrained weights.

Parameters
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_plus_xception_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet101_ade(**kwargs)[source]

DeepLabV3 :param pretrained: Boolean value controls whether to load the default pretrained weights for model.

String value represents the hashtag for a certain version of pretrained weights.

Parameters
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet101_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet101_coco(**kwargs)[source]

DeepLabV3 :param pretrained: Boolean value controls whether to load the default pretrained weights for model.

String value represents the hashtag for a certain version of pretrained weights.

Parameters
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet101_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet101_voc(**kwargs)[source]

DeepLabV3

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet101_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet152_coco(**kwargs)[source]

DeepLabV3

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet152_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet152_voc(**kwargs)[source]

DeepLabV3

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet152_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet50_ade(**kwargs)[source]

DeepLabV3

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet50_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_v3b_plus_wideresnet_citys(**kwargs)[source]

DeepLabWV3Plus

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_v3b_plus_wideresnet_citys(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplabv3b_plus(dataset='citys', backbone='wideresnet', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

DeepLabWV3Plus

Parameters
  • dataset (str, default citys) – The dataset that model pretrained on. (pascal_voc, ade20k, citys)

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplabv3b_plus(dataset='citys', backbone='wideresnet', pretrained=False)
>>> print(model)
gluoncv.model_zoo.get_faster_rcnn(name, dataset, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Utility function to return faster rcnn networks.

Parameters
  • name (str) – Model name.

  • dataset (str) – The name of dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

Returns

The Faster-RCNN network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.get_fcn(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), pretrained_base=True, **kwargs)[source]

FCN model from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters
  • dataset (str, default pascal_voc) – The dataset that model pretrained on. (pascal_voc, ade20k)

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • pretrained_base (bool or str, default True) – This will load the pretrained backbone network that was trained on ImageNet.

Examples

>>> model = get_fcn(dataset='pascal_voc', backbone='resnet50', pretrained=False)
>>> print(model)
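The dataset string also becomes part of the registered model name. A hypothetical sketch of that naming scheme (table values are illustrative; the real mapping lives inside GluonCV):

```python
# Hypothetical dataset -> (acronym, num_classes) table; illustrative values only
datasets = {
    'pascal_voc': ('voc', 21),
    'ade20k': ('ade', 150),
    'coco': ('coco', 21),
}

def fcn_model_name(dataset, backbone):
    # Compose a registry-style name, e.g. 'fcn_resnet50_voc'
    acronym, _ = datasets[dataset]
    return 'fcn_%s_%s' % (backbone, acronym)

print(fcn_model_name('pascal_voc', 'resnet50'))  # fcn_resnet50_voc
```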
gluoncv.model_zoo.get_fcn_resnet101_ade(**kwargs)[source]

FCN model with base network ResNet-101 pre-trained on ADE20K dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet101_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_fcn_resnet101_coco(**kwargs)[source]

FCN model with base network ResNet-101 pre-trained on COCO dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet101_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_fcn_resnet101_voc(**kwargs)[source]

FCN model with base network ResNet-101 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet101_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_fcn_resnet50_ade(**kwargs)[source]

FCN model with base network ResNet-50 pre-trained on ADE20K dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet50_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_fcn_resnet50_voc(**kwargs)[source]

FCN model with base network ResNet-50 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet50_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_mask_rcnn(name, dataset, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Utility function to return mask rcnn networks.

Parameters
  • name (str) – Model name.

  • dataset (str) – The name of dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

Returns

The Mask RCNN network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.get_mobilenet(multiplier, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper.

Parameters
  • multiplier (float) – The width multiplier for controlling the model size. Only multipliers that are no less than 0.25 are supported. The actual number of channels is equal to the original channel size multiplied by this multiplier.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
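The effect of multiplier is plain channel scaling; a minimal sketch of the arithmetic (assuming simple truncation; GluonCV's exact rounding may differ):

```python
def scaled_channels(base_channels, multiplier):
    # Width multiplier: each layer's channel count is scaled by `multiplier`
    if multiplier < 0.25:
        raise ValueError('Only multipliers >= 0.25 are supported')
    return [int(c * multiplier) for c in base_channels]

print(scaled_channels([32, 64, 128, 256, 512, 1024], 0.5))
# [16, 32, 64, 128, 256, 512]
```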

gluoncv.model_zoo.get_mobilenet_v2(multiplier, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper.

Parameters
  • multiplier (float) – The width multiplier for controlling the model size. Only multipliers that are no less than 0.25 are supported. The actual number of channels is equal to the original channel size multiplied by this multiplier.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.get_model(name, **kwargs)[source]

Returns a pre-defined model by name

Parameters
  • name (str) – Name of the model.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • classes (int) – Number of classes for the output layer.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Returns

The model.

Return type

HybridBlock

gluoncv.model_zoo.get_model_list()[source]

Get the entire list of model names in model_zoo.

Returns

Entire list of model names in model_zoo.

Return type

list of str
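Together, get_model and get_model_list behave like a name-to-constructor registry. A self-contained sketch of that pattern (a hypothetical two-entry registry, not GluonCV's actual table):

```python
# Hypothetical registry mapping lowercase model names to constructors
_models = {
    'resnet50_v1b': lambda **kwargs: ('ResNetV1b-50', kwargs),
    'fcn_resnet50_voc': lambda **kwargs: ('FCN ResNet-50 VOC', kwargs),
}

def get_model(name, **kwargs):
    # Lookup is case-insensitive; unknown names raise with the available list
    name = name.lower()
    if name not in _models:
        raise ValueError('%s is not supported. Available models: %s'
                         % (name, ', '.join(sorted(_models))))
    return _models[name](**kwargs)

def get_model_list():
    return sorted(_models)

net, opts = get_model('ResNet50_v1b', pretrained=True)
print(net)   # ResNetV1b-50
print(opts)  # {'pretrained': True}
```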

gluoncv.model_zoo.get_nasnet(repeat=6, penultimate_filters=4032, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

NASNet A model from “Learning Transferable Architectures for Scalable Image Recognition” paper

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.get_psp(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), pretrained_base=True, **kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • dataset (str, default pascal_voc) – The dataset that model pretrained on. (pascal_voc, ade20k)

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • pretrained_base (bool or str, default True) – This will load the pretrained backbone network that was trained on ImageNet.

Examples

>>> model = get_psp(dataset='pascal_voc', backbone='resnet50', pretrained=False)
>>> print(model)
gluoncv.model_zoo.get_psp_resnet101_ade(**kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet101_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_psp_resnet101_citys(**kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet101_citys(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_psp_resnet101_coco(**kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet101_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_psp_resnet101_voc(**kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet101_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_psp_resnet50_ade(**kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet50_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_resnet(version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', use_se=False, **kwargs)[source]

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • version (int) – Version of ResNet. Options are 1, 2.

  • num_layers (int) – Numbers of layers. Options are 18, 34, 50, 101, 152.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • use_se (bool, default False) – Whether to use Squeeze-and-Excitation module

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
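The num_layers option selects one of the standard stage layouts from the ResNet papers; the mapping can be sketched as (block names are illustrative labels):

```python
# Depth -> (block type, blocks per stage), following the ResNet papers
resnet_spec = {
    18:  ('basic_block', [2, 2, 2, 2]),
    34:  ('basic_block', [3, 4, 6, 3]),
    50:  ('bottle_neck', [3, 4, 6, 3]),
    101: ('bottle_neck', [3, 4, 23, 3]),
    152: ('bottle_neck', [3, 8, 36, 3]),
}

def resnet_layout(num_layers):
    # Reject depths outside the supported set, mirroring the documented options
    if num_layers not in resnet_spec:
        raise ValueError('Invalid number of layers: %d. Options are %s'
                         % (num_layers, sorted(resnet_spec)))
    return resnet_spec[num_layers]

print(resnet_layout(50))  # ('bottle_neck', [3, 4, 6, 3])
```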

gluoncv.model_zoo.get_se_resnet(version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

SE_ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. SE_ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • version (int) – Version of ResNet. Options are 1, 2.

  • num_layers (int) – Numbers of layers. Options are 18, 34, 50, 101, 152.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.get_ssd(name, base_size, features, filters, sizes, ratios, steps, classes, dataset, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get SSD models.

Parameters
  • name (str or None) – Model name, if None is used, you must specify features to be a HybridBlock.

  • base_size (int) – Base image size for training; this is fixed once training starts. A fixed base size still allows variable input sizes during testing.

  • features (iterable of str or HybridBlock) – List of network internal output names, in order to specify which layers are used for predicting bbox values. If name is None, features must be a HybridBlock which generate multiple outputs for prediction.

  • filters (iterable of float or None) – List of convolution layer channels which is going to be appended to the base network feature extractor. If name is None, this is ignored.

  • sizes (iterable of float) – Sizes of anchor boxes, this should be a list of floats, in incremental order. The length of sizes must be len(layers) + 1. For example, a two-stage SSD model can have sizes = [30, 60, 90], and it converts to [30, 60] and [60, 90] for the two stages, respectively. For more details, please refer to the original paper.

  • ratios (iterable of list) – Aspect ratios of anchors in each output layer. Its length must be equal to the number of SSD output layers.

  • steps (list of int) – Step size of anchor boxes in each output layer.

  • classes (iterable of str) – Names of categories.

  • dataset (str) – Name of dataset. This is used to identify model name because models trained on different datasets are going to be very different.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

A SSD detection network.

Return type

HybridBlock
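The sizes rule above, len(sizes) == len(layers) + 1 with consecutive entries paired per stage, can be sketched as:

```python
def anchor_size_pairs(sizes, num_layers):
    # sizes must have one more entry than there are output layers
    if len(sizes) != num_layers + 1:
        raise ValueError('len(sizes) must be num_layers + 1')
    # Pair consecutive sizes: [30, 60, 90] -> [(30, 60), (60, 90)]
    return list(zip(sizes[:-1], sizes[1:]))

print(anchor_size_pairs([30, 60, 90], 2))  # [(30, 60), (60, 90)]
```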

gluoncv.model_zoo.get_vgg(num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

VGG model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • num_layers (int) – Number of layers for the variant of VGG. Options are 11, 13, 16, 19.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.
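The num_layers option maps onto the standard VGG configurations; based on the original paper, the layout table looks roughly like this:

```python
# Depth -> (conv layers per stage, channels per stage), per the VGG paper
vgg_spec = {
    11: ([1, 1, 2, 2, 2], [64, 128, 256, 512, 512]),
    13: ([2, 2, 2, 2, 2], [64, 128, 256, 512, 512]),
    16: ([2, 2, 3, 3, 3], [64, 128, 256, 512, 512]),
    19: ([2, 2, 4, 4, 4], [64, 128, 256, 512, 512]),
}

# Total conv layers plus 3 fully connected layers gives the advertised depth
assert sum(vgg_spec[16][0]) + 3 == 16
```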

gluoncv.model_zoo.get_vgg_atrous_extractor(num_layers, im_size, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get VGG atrous feature extractor networks.

Parameters
  • num_layers (int) – VGG types, can be 11,13,16,19.

  • im_size (int) – VGG detection input size, can be 300, 512.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mx.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

Returns

The returned network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.get_xcetption(pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Xception model from the “Xception: Deep Learning with Depthwise Separable Convolutions” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

gluoncv.model_zoo.get_xcetption_71(pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Xception-71 model, a deeper variant of the architecture from the “Xception: Deep Learning with Depthwise Separable Convolutions” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

gluoncv.model_zoo.get_yolov3(name, stages, filters, anchors, strides, classes, dataset, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get YOLOV3 models.

Parameters
  • name (str or None) – Model name, if None is used, you must specify stages to be a HybridBlock.

  • stages (iterable of str or HybridBlock) – List of network internal output names, in order to specify which layers are used for predicting bbox values. If name is None, stages must be a HybridBlock which generates multiple outputs for prediction.

  • filters (iterable of float or None) – List of convolution layer channels which is going to be appended to the base network feature extractor. If name is None, this is ignored.

  • anchors (iterable of list) – Anchor box sizes for each output layer. Its length must be equal to the number of YOLO output layers.

  • strides (list of int) – Stride of the feature map in each output layer.

  • classes (iterable of str) – Names of categories.

  • dataset (str) – Name of dataset. This is used to identify model name because models trained on different datasets are going to be very different.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

A YOLOV3 detection network.

Return type

HybridBlock
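anchors and strides are specified per output scale. The darknet-style defaults (anchor widths and heights from the YOLOv3 paper, shown here only to illustrate the expected shapes):

```python
# One flat [w1, h1, w2, h2, ...] anchor list and one stride per output scale
anchors = [[10, 13, 16, 30, 33, 23],        # stride-8 scale, small objects
           [30, 61, 62, 45, 59, 119],       # stride-16 scale
           [116, 90, 156, 198, 373, 326]]   # stride-32 scale, large objects
strides = [8, 16, 32]

# The two lists must line up one-to-one with the detection stages
assert len(anchors) == len(strides)
```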

gluoncv.model_zoo.googlenet(classes=1000, pretrained=False, pretrained_base=True, ctx=cpu(0), dropout_ratio=0.4, aux_logits=False, root='~/.mxnet/models', partial_bn=False, **kwargs)[source]

GoogLeNet model from the “Going Deeper with Convolutions” paper, with batch normalization from the “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.i3d_inceptionv1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, **kwargs)[source]

Inception v1 model from “Going Deeper with Convolutions” paper.

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.i3d_inceptionv3_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, **kwargs)[source]

Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper.

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.i3d_nl10_resnet101_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, **kwargs)[source]

Inflated 3D model (I3D) from the “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper, with non-local blocks from the “Non-local Neural Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.i3d_nl10_resnet50_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, **kwargs)[source]

Inflated 3D model (I3D) from the “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper, with non-local blocks from the “Non-local Neural Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.i3d_nl5_resnet101_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, **kwargs)[source]

Inflated 3D model (I3D) from the “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper, with non-local blocks from the “Non-local Neural Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.i3d_nl5_resnet50_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, **kwargs)[source]

Inflated 3D model (I3D) with non-local blocks, from the “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” and “Non-local Neural Networks” papers.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.i3d_resnet101_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, **kwargs)[source]

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.i3d_resnet50_v1_hmdb51(nclass=51, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, use_kinetics_pretrain=True, **kwargs)[source]

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.i3d_resnet50_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, **kwargs)[source]

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.i3d_resnet50_v1_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, **kwargs)[source]

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.i3d_resnet50_v1_ucf101(nclass=101, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, use_kinetics_pretrain=True, **kwargs)[source]

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.inception_v3(pretrained=False, ctx=cpu(0), root='~/.mxnet/models', partial_bn=False, **kwargs)[source]

Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.mask_rcnn_fpn_bn_mobilenet1_0_coco(pretrained=False, pretrained_base=True, num_devices=0, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices used by the sync batch norm layer. If less than 1, all available devices are used.

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN.

  • rpn_test_pre_nms (int, default is 6000) – Number of top-scoring proposals to keep before NMS in RPN testing.

  • rpn_test_post_nms (int, default is 1000) – Number of top-scoring proposals to return after NMS in RPN testing.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_fpn_bn_mobilenet1_0_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_fpn_bn_resnet18_v1b_coco(pretrained=False, pretrained_base=True, num_devices=0, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices used by the sync batch norm layer. If less than 1, all available devices are used.

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN.

  • rpn_test_pre_nms (int, default is 6000) – Number of top-scoring proposals to keep before NMS in RPN testing.

  • rpn_test_post_nms (int, default is 1000) – Number of top-scoring proposals to return after NMS in RPN testing.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_fpn_bn_resnet18_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_fpn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_fpn_resnet101_v1d_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_fpn_resnet18_v1b_coco(pretrained=False, pretrained_base=True, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN.

  • rpn_test_pre_nms (int, default is 6000) – Number of top-scoring proposals to keep before NMS in RPN testing.

  • rpn_test_post_nms (int, default is 1000) – Number of top-scoring proposals to return after NMS in RPN testing.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_fpn_resnet18_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_fpn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_fpn_resnet50_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_resnet101_v1d_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_resnet18_v1b_coco(pretrained=False, pretrained_base=True, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN.

  • rpn_test_pre_nms (int, default is 6000) – Number of top-scoring proposals to keep before NMS in RPN testing.

  • rpn_test_post_nms (int, default is 1000) – Number of top-scoring proposals to return after NMS in RPN testing.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_resnet18_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_resnet50_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mobilenet0_25(**kwargs)[source]

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 0.25.

Parameters
gluoncv.model_zoo.mobilenet0_5(**kwargs)[source]

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 0.5.

Parameters
gluoncv.model_zoo.mobilenet0_75(**kwargs)[source]

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 0.75.

Parameters
gluoncv.model_zoo.mobilenet1_0(**kwargs)[source]

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 1.0.

Parameters
gluoncv.model_zoo.mobilenet_v2_0_25(**kwargs)[source]

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper.

Parameters
gluoncv.model_zoo.mobilenet_v2_0_5(**kwargs)[source]

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper.

Parameters
gluoncv.model_zoo.mobilenet_v2_0_75(**kwargs)[source]

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper.

Parameters
gluoncv.model_zoo.mobilenet_v2_1_0(**kwargs)[source]

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper.

Parameters
gluoncv.model_zoo.nasnet_4_1056(**kwargs)[source]

NASNet A model from “Learning Transferable Architectures for Scalable Image Recognition” paper.

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.nasnet_5_1538(**kwargs)[source]

NASNet A model from “Learning Transferable Architectures for Scalable Image Recognition” paper.

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.nasnet_6_4032(**kwargs)[source]

NASNet A model from “Learning Transferable Architectures for Scalable Image Recognition” paper.

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.nasnet_7_1920(**kwargs)[source]

NASNet A model from “Learning Transferable Architectures for Scalable Image Recognition” paper.

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.pretrained_model_list()[source]

Get the list of models that have pretrained weights available.

gluoncv.model_zoo.residualattentionnet128(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.residualattentionnet164(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.residualattentionnet200(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.residualattentionnet236(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.residualattentionnet452(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.residualattentionnet56(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.residualattentionnet92(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.resnet101_v1(**kwargs)[source]

ResNet-101 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
gluoncv.model_zoo.resnet101_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether forcing BatchNorm to use global statistics instead of minibatch statistics; optionally set to True if finetuning using ImageNet classification pretrained models.

gluoncv.model_zoo.resnet101_v1b_gn(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-101 GroupNorm model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether forcing BatchNorm to use global statistics instead of minibatch statistics; optionally set to True if finetuning using ImageNet classification pretrained models.

gluoncv.model_zoo.resnet101_v1c(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1c-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet101_v1d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1d-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet101_v1e(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1e-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet101_v1s(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1s-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet101_v2(**kwargs)[source]

ResNet-101 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.resnet152_v1(**kwargs)[source]

ResNet-152 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
gluoncv.model_zoo.resnet152_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-152 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether forcing BatchNorm to use global statistics instead of minibatch statistics; optionally set to True if finetuning using ImageNet classification pretrained models.

gluoncv.model_zoo.resnet152_v1c(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1c-152 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet152_v1d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1d-152 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet152_v1e(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1e-152 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet152_v1s(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1s-152 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet152_v2(**kwargs)[source]

ResNet-152 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.resnet18_v1(**kwargs)[source]

ResNet-18 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
gluoncv.model_zoo.resnet18_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-18 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of minibatch statistics; typically set to True when fine-tuning from ImageNet-pretrained classification models.
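
To make the dilated option concrete, here is a quick plain-Python sketch (assuming a 224×224 input; this arithmetic is illustrative, not gluoncv code):

```python
def feature_map_size(input_size, output_stride):
    """Spatial side length of the network's final feature map."""
    return input_size // output_stride

# The standard ResNetV1b downsamples by 32 overall. With dilated=True the
# strides in the last two stages are replaced by dilation, so the overall
# output stride stays at 8: a larger feature map for dense prediction.
print(feature_map_size(224, 32))  # 7
print(feature_map_size(224, 8))   # 28
```

This is why the stride-8 variants are the ones reused by the segmentation models.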

gluoncv.model_zoo.resnet18_v2(**kwargs)[source]

ResNet-18 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.resnet34_v1(**kwargs)[source]

ResNet-34 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
gluoncv.model_zoo.resnet34_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-34 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of minibatch statistics; typically set to True when fine-tuning from ImageNet-pretrained classification models.

gluoncv.model_zoo.resnet34_v2(**kwargs)[source]

ResNet-34 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.resnet50_v1(**kwargs)[source]

ResNet-50 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
gluoncv.model_zoo.resnet50_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-50 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of minibatch statistics; typically set to True when fine-tuning from ImageNet-pretrained classification models.

gluoncv.model_zoo.resnet50_v1b_gn(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-50 GroupNorm model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of minibatch statistics; typically set to True when fine-tuning from ImageNet-pretrained classification models.
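
To illustrate what the GroupNorm variant computes (a toy sketch, not the gluoncv implementation): channels are split into groups, and each group is normalized by its own mean and variance, independently of the batch:

```python
import math

def group_norm_1d(channels, num_groups, eps=1e-5):
    """Normalize a per-channel value list group by group (toy 1x1 feature map)."""
    assert len(channels) % num_groups == 0
    size = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * size:(g + 1) * size]
        mean = sum(group) / size
        var = sum((v - mean) ** 2 for v in group) / size
        out.extend((v - mean) / math.sqrt(var + eps) for v in group)
    return out

x = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]
y = group_norm_1d(x, num_groups=2)
# each group of 4 channels now has (approximately) zero mean
```

Because no statistics are shared across the batch dimension, GroupNorm behaves the same at batch size 1 as at batch size 64, which is the usual motivation for this variant.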

gluoncv.model_zoo.resnet50_v1c(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1c-50 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v1d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1d-50 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v1e(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1e-50 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v1s(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1s-50 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v2(**kwargs)[source]

ResNet-50 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet101_v1(**kwargs)[source]

SE-ResNet-101 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet101_v2(**kwargs)[source]

SE-ResNet-101 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet152_v1(**kwargs)[source]

SE-ResNet-152 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet152_v2(**kwargs)[source]

SE-ResNet-152 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet18_v1(**kwargs)[source]

SE-ResNet-18 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet18_v2(**kwargs)[source]

SE-ResNet-18 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet34_v1(**kwargs)[source]

SE-ResNet-34 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet34_v2(**kwargs)[source]

SE-ResNet-34 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet50_v1(**kwargs)[source]

SE-ResNet-50 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet50_v2(**kwargs)[source]

SE-ResNet-50 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.simple_pose_resnet101_v1b(**kwargs)[source]

ResNet-101 backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.simple_pose_resnet101_v1d(**kwargs)[source]

ResNet-101-d backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.simple_pose_resnet152_v1b(**kwargs)[source]

ResNet-152 backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.simple_pose_resnet152_v1d(**kwargs)[source]

ResNet-152-d backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.simple_pose_resnet18_v1b(**kwargs)[source]

ResNet-18 backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.simple_pose_resnet50_v1b(**kwargs)[source]

ResNet-50 backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.simple_pose_resnet50_v1d(**kwargs)[source]

ResNet-50-d backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.
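
Across these simple_pose variants the head is the same: a stride-32 ResNet backbone followed by a stack of 2× deconvolution layers. Assuming the common 256×192 input and three deconvolution layers of the Simple Baselines paper (an assumed configuration, not read from gluoncv), the output heatmaps land at 1/4 of the input resolution:

```python
def heatmap_size(in_h, in_w, backbone_stride=32, num_deconv=3):
    """Heatmap resolution after the deconvolution head (each layer upsamples 2x)."""
    h, w = in_h // backbone_stride, in_w // backbone_stride
    for _ in range(num_deconv):
        h, w = h * 2, w * 2
    return h, w

print(heatmap_size(256, 192))  # (64, 48)
```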

gluoncv.model_zoo.slowfast_4x16_resnet50_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

SlowFast networks (SlowFast) from “SlowFast Networks for Video Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.slowfast_8x8_resnet50_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

SlowFast networks (SlowFast) from “SlowFast Networks for Video Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
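
The 4x16 and 8x8 in the names refer to the slow pathway's frame count and temporal stride. A plain-Python sketch of the sampling (assuming a 64-frame clip and the paper's speed ratio α = 8; gluoncv's exact sampling may differ):

```python
def pathway_indices(clip_len, slow_frames, alpha=8):
    """Frame indices sampled by the slow and fast pathways."""
    slow_stride = clip_len // slow_frames       # temporal stride of the slow path
    fast_stride = max(1, slow_stride // alpha)  # fast path samples alpha times denser
    slow = list(range(0, clip_len, slow_stride))
    fast = list(range(0, clip_len, fast_stride))
    return slow, fast

# slowfast_4x16: the slow path takes 4 frames with stride 16 from a 64-frame
# clip, while the fast path takes 32 frames with stride 2.
slow, fast = pathway_indices(64, 4)
print(slow)       # [0, 16, 32, 48]
print(len(fast))  # 32
```

The 8x8 variant analogously uses 8 slow frames at stride 8.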

gluoncv.model_zoo.squeezenet1_0(**kwargs)[source]

SqueezeNet 1.0 model from the “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.squeezenet1_1(**kwargs)[source]

SqueezeNet 1.1 model from the official SqueezeNet repo. SqueezeNet 1.1 has 2.4x less computation and slightly fewer parameters than SqueezeNet 1.0, without sacrificing accuracy.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.
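
For intuition about where the parameter savings come from, here is a back-of-the-envelope weight count for one Fire module, using the fire2 sizes from the SqueezeNet paper (biases ignored; illustrative only):

```python
def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    """Weight count of a Fire module: 1x1 squeeze, then parallel 1x1 + 3x3 expand."""
    return (in_ch * squeeze             # 1x1 squeeze layer
            + squeeze * expand1x1       # 1x1 expand branch
            + 9 * squeeze * expand3x3)  # 3x3 expand branch

# fire2: 96 input channels -> squeeze to 16 -> expand to 64 + 64
print(fire_params(96, 16, 64, 64))  # 11776
# a plain 3x3 conv mapping the same 96 -> 128 channels would need
print(9 * 96 * 128)                 # 110592
```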

gluoncv.model_zoo.ssd_300_mobilenet0_25_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with mobilenet0.25 base network for COCO.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_300_mobilenet0_25_custom(classes, pretrained_base=True, pretrained=False, transfer=None, **kwargs)[source]

SSD architecture with mobilenet0.25 300 base network for custom dataset.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

Example

>>> net = ssd_300_mobilenet0_25_custom(classes=['a', 'b', 'c'], pretrained_base=True)
>>> net = ssd_300_mobilenet0_25_custom(classes=['foo', 'bar'], transfer='voc')
gluoncv.model_zoo.ssd_300_mobilenet0_25_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with mobilenet0.25 base network for Pascal VOC.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_300_vgg16_atrous_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with VGG16 atrous 300x300 base network for COCO.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load the pretrained base network; the extra layers are randomized.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_300_vgg16_atrous_custom(classes, pretrained_base=True, pretrained=False, transfer=None, **kwargs)[source]

SSD architecture with VGG16 atrous 300x300 base network for custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load the pretrained base network; the extra layers are randomized.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from SSD networks trained on other datasets.

Returns

An SSD detection network.

Return type

HybridBlock

Example

>>> net = ssd_300_vgg16_atrous_custom(classes=['a', 'b', 'c'], pretrained_base=True)
>>> net = ssd_300_vgg16_atrous_custom(classes=['foo', 'bar'], transfer='coco')
gluoncv.model_zoo.ssd_300_vgg16_atrous_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with VGG16 atrous 300x300 base network for Pascal VOC.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load the pretrained base network; the extra layers are randomized.

Returns

An SSD detection network.

Return type

HybridBlock
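
As a sanity check on the multi-scale design, the total number of default boxes follows from the feature-map sizes and boxes-per-location of the original SSD300 paper (gluoncv's defaults may differ slightly):

```python
feature_maps = [38, 19, 10, 5, 3, 1]  # spatial sizes of the six detection layers
boxes_per_cell = [4, 6, 6, 6, 4, 4]   # default boxes at each location

total = sum(f * f * b for f, b in zip(feature_maps, boxes_per_cell))
print(total)  # 8732
```

Every one of these 8732 boxes gets a class score and a box offset, which is why SSD needs non-maximum suppression at the end.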

gluoncv.model_zoo.ssd_512_mobilenet1_0_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with mobilenet1.0 base network for COCO.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_mobilenet1_0_custom(classes, pretrained_base=True, pretrained=False, transfer=None, **kwargs)[source]

SSD architecture with mobilenet1.0 512 base network for custom dataset.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

Example

>>> net = ssd_512_mobilenet1_0_custom(classes=['a', 'b', 'c'], pretrained_base=True)
>>> net = ssd_512_mobilenet1_0_custom(classes=['foo', 'bar'], transfer='voc')
gluoncv.model_zoo.ssd_512_mobilenet1_0_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with mobilenet1.0 base network for Pascal VOC.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_resnet101_v2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with ResNet v2 101 layers.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_resnet152_v2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with ResNet v2 152 layers.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_resnet18_v1_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with ResNet v1 18 layers.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_resnet18_v1_custom(classes, pretrained_base=True, pretrained=False, transfer=None, **kwargs)[source]

SSD architecture with ResNet18 v1 512 base network for custom dataset.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

Example

>>> net = ssd_512_resnet18_v1_custom(classes=['a', 'b', 'c'], pretrained_base=True)
>>> net = ssd_512_resnet18_v1_custom(classes=['foo', 'bar'], transfer='voc')
gluoncv.model_zoo.ssd_512_resnet18_v1_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with ResNet v1 18 layers.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_resnet50_v1_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with ResNet v1 50 layers for COCO.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_resnet50_v1_custom(classes, pretrained_base=True, pretrained=False, transfer=None, **kwargs)[source]

SSD architecture with ResNet50 v1 512 base network for custom dataset.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

Example

>>> net = ssd_512_resnet50_v1_custom(classes=['a', 'b', 'c'], pretrained_base=True)
>>> net = ssd_512_resnet50_v1_custom(classes=['foo', 'bar'], transfer='voc')
gluoncv.model_zoo.ssd_512_resnet50_v1_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with ResNet v1 50 layers.

Parameters
Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_vgg16_atrous_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with VGG16 atrous layers for COCO.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load the pretrained base network; the extra layers are randomized.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_vgg16_atrous_custom(classes, pretrained_base=True, pretrained=False, transfer=None, **kwargs)[source]

SSD architecture with VGG16 atrous 512x512 base network for custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load the pretrained base network; the extra layers are randomized.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from SSD networks trained on other datasets.

Returns

An SSD detection network.

Return type

HybridBlock

Example

>>> net = ssd_512_vgg16_atrous_custom(classes=['a', 'b', 'c'], pretrained_base=True)
>>> net = ssd_512_vgg16_atrous_custom(classes=['foo', 'bar'], transfer='coco')
gluoncv.model_zoo.ssd_512_vgg16_atrous_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with VGG16 atrous 512x512 base network.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load the pretrained base network; the extra layers are randomized.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.vgg11(**kwargs)[source]

VGG-11 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.vgg11_bn(**kwargs)[source]

VGG-11 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.vgg13(**kwargs)[source]

VGG-13 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.vgg13_bn(**kwargs)[source]

VGG-13 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.vgg16(**kwargs)[source]

VGG-16 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.vgg16_atrous_300(**kwargs)[source]

Get the VGG-16 atrous feature extractor network for 300 input size.

gluoncv.model_zoo.vgg16_atrous_512(**kwargs)[source]

Get the VGG-16 atrous feature extractor network for 512 input size.

gluoncv.model_zoo.vgg16_bn(**kwargs)[source]

VGG-16 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.vgg19(**kwargs)[source]

VGG-19 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.vgg19_bn(**kwargs)[source]

VGG-19 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.
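
The number in each VGG name counts weight layers: the convolution layers of the five stages plus three fully connected layers. A quick check against the per-stage conv counts from the paper's configurations:

```python
# conv layers per stage (five stages), from the VGG paper's configs A/B/D/E
vgg_convs = {11: [1, 1, 2, 2, 2],
             13: [2, 2, 2, 2, 2],
             16: [2, 2, 3, 3, 3],
             19: [2, 2, 4, 4, 4]}

for depth, convs in vgg_convs.items():
    # depth = conv layers + 3 fully connected layers
    assert sum(convs) + 3 == depth
print("all VGG depths check out")
```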

gluoncv.model_zoo.yolo3_darknet53_coco(pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

YOLO3 multi-scale with darknet53 base network on COCO dataset.

Parameters
  • pretrained_base (bool) – Whether to fetch and load pretrained weights for the base network.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock
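
For a sense of scale, the number of raw box predictions follows from the three detection strides (assuming the standard YOLOv3 setup of a 416×416 input and three anchors per grid cell; gluoncv supports other input sizes too):

```python
input_size = 416
strides = [8, 16, 32]   # the three multi-scale detection layers
anchors_per_cell = 3

total_boxes = sum(anchors_per_cell * (input_size // s) ** 2 for s in strides)
print(total_boxes)  # 10647
```

Each of these predictions carries objectness, class scores, and box offsets, so most are discarded by confidence thresholding and NMS.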

gluoncv.model_zoo.yolo3_darknet53_custom(classes, transfer=None, pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

YOLO3 multi-scale with darknet53 base network on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from YOLO networks trained on other datasets.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.yolo3_darknet53_voc(pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

YOLO3 multi-scale with darknet53 base network on VOC dataset.

Parameters
  • pretrained_base – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.yolo3_mobilenet0_25_coco(pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

YOLO3 multi-scale with mobilenet0.25 base network on COCO dataset.

Parameters
  • pretrained_base – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.yolo3_mobilenet0_25_custom(classes, transfer=None, pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

YOLO3 multi-scale with mobilenet0.25 base network on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer – If not None, will try to reuse pre-trained weights from yolo networks trained on other datasets.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.yolo3_mobilenet0_25_voc(pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

YOLO3 multi-scale with mobilenet0.25 base network on VOC dataset.

Parameters
  • pretrained_base – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.yolo3_mobilenet1_0_coco(pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

YOLO3 multi-scale with mobilenet base network on COCO dataset.

Parameters
  • pretrained_base – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.yolo3_mobilenet1_0_custom(classes, transfer=None, pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

YOLO3 multi-scale with mobilenet base network on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer – If not None, will try to reuse pre-trained weights from yolo networks trained on other datasets.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.yolo3_mobilenet1_0_voc(pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

YOLO3 multi-scale with mobilenet base network on VOC dataset.

Parameters
  • pretrained_base – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock