Table Of Contents

gluoncv.model_zoo

GluonCV Model Zoo

gluoncv.model_zoo.get_model

Returns a pre-defined GluonCV model by name.

Hint

This is the recommended method for getting a pre-defined model.

It also supports loading models directly from the Gluon Model Zoo.

get_model

Returns a pre-defined model by name
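Looking a model up by name can be pictured as a registry that maps strings to factory functions. The sketch below is a hypothetical, pure-Python illustration of that idea and is not GluonCV's actual implementation; the `register` decorator, the `toy_resnet20` factory, and the local `get_model` are all invented here for illustration.

```python
# Hypothetical registry sketch; every name below is illustrative only.
_MODELS = {}

def register(name):
    """Decorator that records a factory function under a lower-case name."""
    def wrap(factory):
        _MODELS[name.lower()] = factory
        return factory
    return wrap

@register("toy_resnet20")
def toy_resnet20(classes=10, **kwargs):
    # A real factory would build a network; this one returns a plain description.
    return {"arch": "resnet", "depth": 20, "classes": classes}

def get_model(name, **kwargs):
    """Look up a factory by name and call it with the given keyword arguments."""
    key = name.lower()
    if key not in _MODELS:
        raise ValueError("Model %s is not supported. Available: %s"
                         % (name, ", ".join(sorted(_MODELS))))
    return _MODELS[key](**kwargs)

model = get_model("Toy_ResNet20", classes=100)
print(model)
```

Keyword arguments are forwarded unchanged to the factory, which is why a single entry point can serve models with very different constructor signatures.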

Image Classification

CIFAR

get_cifar_resnet

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

cifar_resnet20_v1

ResNet-20 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

cifar_resnet56_v1

ResNet-56 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

cifar_resnet110_v1

ResNet-110 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

cifar_resnet20_v2

ResNet-20 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

cifar_resnet56_v2

ResNet-56 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

cifar_resnet110_v2

ResNet-110 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

get_cifar_wide_resnet

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

cifar_wideresnet16_10

WideResNet-16-10 model for CIFAR10 from “Wide Residual Networks” paper.

cifar_wideresnet28_10

WideResNet-28-10 model for CIFAR10 from “Wide Residual Networks” paper.

cifar_wideresnet40_8

WideResNet-40-8 model for CIFAR10 from “Wide Residual Networks” paper.

ImageNet

We apply a dilation strategy to pre-trained ResNet models (with a stride of 8). Please see gluoncv.model_zoo.SegBaseModel for how to use it.

ResNetV1b

Pre-trained ResNetV1b model, which produces feature maps with a stride of 8 at conv5.
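To see where a stride of 8 comes from, the following sketch tracks the cumulative stride of a network when the strides of the last stages are replaced by dilation. It is a rough illustration under stated assumptions: the stage layout (stem convolution, max pooling, four residual stages) and the `output_stride` helper are assumptions made here, not code taken from GluonCV.

```python
def output_stride(stage_strides, dilate_from=None):
    """Return (overall stride, per-stage dilations).

    stage_strides: strides of successive stages, in order.
    dilate_from: index of the first stage whose stride is replaced by
                 dilation (None keeps every stride).
    """
    stride, dilation, dilations = 1, 1, []
    for i, s in enumerate(stage_strides):
        if dilate_from is not None and i >= dilate_from:
            dilation *= s   # keep the resolution, widen the receptive field instead
        else:
            stride *= s
        dilations.append(dilation)
    return stride, dilations

# Assumed ResNet layout: stem conv, max pool, then four residual stages.
resnet_strides = [2, 2, 1, 2, 2, 2]
print(output_stride(resnet_strides))                 # (32, [1, 1, 1, 1, 1, 1])
print(output_stride(resnet_strides, dilate_from=4))  # (8, [1, 1, 1, 1, 2, 4])
```

Dilating the last two stages keeps the feature map four times larger in each spatial dimension, which is what dense prediction tasks such as segmentation rely on.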

resnet18_v1b

Constructs a ResNetV1b-18 model.

resnet34_v1b

Constructs a ResNetV1b-34 model.

resnet50_v1b

Constructs a ResNetV1b-50 model.

resnet101_v1b

Constructs a ResNetV1b-101 model.

resnet152_v1b

Constructs a ResNetV1b-152 model.

Object Detection

SSD

SSD

Single-shot Object Detection Network: https://arxiv.org/abs/1512.02325.

get_ssd

Get SSD models.

ssd_300_vgg16_atrous_voc

SSD architecture with VGG16 atrous 300x300 base network for Pascal VOC.

ssd_300_vgg16_atrous_coco

SSD architecture with VGG16 atrous 300x300 base network for COCO.

ssd_300_vgg16_atrous_custom

SSD architecture with VGG16 atrous 300x300 base network for custom dataset.

ssd_512_vgg16_atrous_voc

SSD architecture with VGG16 atrous 512x512 base network.

ssd_512_vgg16_atrous_coco

SSD architecture with VGG16 atrous layers for COCO.

ssd_512_vgg16_atrous_custom

SSD architecture with VGG16 atrous 512x512 base network for custom dataset.

ssd_512_resnet50_v1_voc

SSD architecture with ResNet v1 50 layers.

ssd_512_resnet50_v1_coco

SSD architecture with ResNet v1 50 layers for COCO.

ssd_512_resnet50_v1_custom

SSD architecture with ResNet50 v1 512 base network for custom dataset.

ssd_512_resnet101_v2_voc

SSD architecture with ResNet v2 101 layers.

ssd_512_resnet152_v2_voc

SSD architecture with ResNet v2 152 layers.

VGGAtrousExtractor

VGG Atrous multi layer feature extractor which produces multiple output feature maps.

get_vgg_atrous_extractor

Get VGG atrous feature extractor networks.

vgg16_atrous_300

Get VGG atrous 16 layer 300 in_size feature extractor networks.

vgg16_atrous_512

Get VGG atrous 16 layer 512 in_size feature extractor networks.

Faster RCNN

FasterRCNN

Faster RCNN network.

get_faster_rcnn

Utility function to return faster rcnn networks.

faster_rcnn_resnet50_v1b_voc

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J.

faster_rcnn_resnet50_v1b_coco

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J.

faster_rcnn_resnet50_v1b_custom

Faster RCNN model with resnet50_v1b base network on custom dataset.

YOLOv3

YOLOV3

YOLO V3 detection network.

get_yolov3

Get YOLOV3 models.

yolo3_darknet53_voc

YOLO3 multi-scale with darknet53 base network on VOC dataset.

yolo3_darknet53_coco

YOLO3 multi-scale with darknet53 base network on COCO dataset.

yolo3_darknet53_custom

YOLO3 multi-scale with darknet53 base network on custom dataset.

Instance Segmentation

Mask RCNN

MaskRCNN

Mask RCNN network.

get_mask_rcnn

Utility function to return mask rcnn networks.

mask_rcnn_resnet50_v1b_coco

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R.

Semantic Segmentation

FCN

FCN

Fully Convolutional Networks for Semantic Segmentation

get_fcn

FCN model from the paper “Fully Convolutional Network for semantic segmentation”

get_fcn_resnet50_voc

FCN model with base network ResNet-50 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Network for semantic segmentation”

get_fcn_resnet101_voc

FCN model with base network ResNet-101 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Network for semantic segmentation”

get_fcn_resnet101_coco

FCN model with base network ResNet-101 pre-trained on COCO dataset from the paper “Fully Convolutional Network for semantic segmentation”

get_fcn_resnet50_ade

FCN model with base network ResNet-50 pre-trained on ADE20K dataset from the paper “Fully Convolutional Network for semantic segmentation”

get_fcn_resnet101_ade

FCN model with base network ResNet-101 pre-trained on ADE20K dataset from the paper “Fully Convolutional Network for semantic segmentation”

PSPNet

PSPNet

Pyramid Scene Parsing Network

get_psp

Pyramid Scene Parsing Network. Parameter dataset: the dataset that the model was pretrained on.

get_psp_resnet101_coco

Pyramid Scene Parsing Network. Parameter pretrained: Boolean value that controls whether to load the default pretrained weights for the model.

get_psp_resnet101_voc

Pyramid Scene Parsing Network. Parameter pretrained: Boolean value that controls whether to load the default pretrained weights for the model.

get_psp_resnet50_ade

Pyramid Scene Parsing Network. Parameter pretrained: Boolean value that controls whether to load the default pretrained weights for the model.

get_psp_resnet101_ade

Pyramid Scene Parsing Network. Parameter pretrained: Boolean value that controls whether to load the default pretrained weights for the model.

DeepLabV3

DeepLabV3

Parameter nclass: number of categories for the training dataset.

get_deeplab

DeepLabV3. Parameter dataset: the dataset that the model was pretrained on.

get_deeplab_resnet101_coco

DeepLabV3. Parameter pretrained: Boolean value that controls whether to load the default pretrained weights for the model.

get_deeplab_resnet101_voc

DeepLabV3. Parameter pretrained: Boolean value that controls whether to load the default pretrained weights for the model.

get_deeplab_resnet50_ade

DeepLabV3. Parameter pretrained: Boolean value that controls whether to load the default pretrained weights for the model.

get_deeplab_resnet101_ade

DeepLabV3. Parameter pretrained: Boolean value that controls whether to load the default pretrained weights for the model.

Action Recognition

TSN

vgg16_ucf101

VGG16 model trained on UCF101 dataset.

vgg16_hmdb51

VGG16 model trained on HMDB51 dataset.

vgg16_kinetics400

VGG16 model trained on Kinetics400 dataset.

vgg16_sthsthv2

VGG16 model trained on Something-Something-V2 dataset.

inceptionv1_ucf101

InceptionV1 model trained on UCF101 dataset.

inceptionv1_hmdb51

InceptionV1 model trained on HMDB51 dataset.

inceptionv1_kinetics400

InceptionV1 model trained on Kinetics400 dataset.

inceptionv1_sthsthv2

InceptionV1 model trained on Something-Something-V2 dataset.

inceptionv3_ucf101

InceptionV3 model trained on UCF101 dataset.

inceptionv3_hmdb51

InceptionV3 model trained on HMDB51 dataset.

inceptionv3_kinetics400

InceptionV3 model trained on Kinetics400 dataset.

inceptionv3_sthsthv2

InceptionV3 model trained on Something-Something-V2 dataset.

resnet18_v1b_sthsthv2

ResNet18 model trained on Something-Something-V2 dataset.

resnet34_v1b_sthsthv2

ResNet34 model trained on Something-Something-V2 dataset.

resnet50_v1b_sthsthv2

ResNet50 model trained on Something-Something-V2 dataset.

resnet101_v1b_sthsthv2

ResNet101 model trained on Something-Something-V2 dataset.

resnet152_v1b_sthsthv2

ResNet152 model trained on Something-Something-V2 dataset.

resnet18_v1b_kinetics400

ResNet18 model trained on Kinetics400 dataset.

resnet34_v1b_kinetics400

ResNet34 model trained on Kinetics400 dataset.

resnet50_v1b_kinetics400

ResNet50 model trained on Kinetics400 dataset.

resnet101_v1b_kinetics400

ResNet101 model trained on Kinetics400 dataset.

resnet152_v1b_kinetics400

ResNet152 model trained on Kinetics400 dataset.

resnet50_v1b_ucf101

ResNet50 model trained on UCF101 dataset.

resnet50_v1b_hmdb51

ResNet50 model trained on HMDB51 dataset.

resnet50_v1b_custom

ResNet50 model customized for any dataset.

C3D

C3D

The Convolutional 3D network (C3D).

c3d_kinetics400

The Convolutional 3D network (C3D) trained on Kinetics400 dataset.

I3D

I3D_InceptionV1

Inception v1 model from “Going Deeper with Convolutions” paper.

i3d_inceptionv1_kinetics400

Inception v1 model trained on Kinetics400 dataset from “Going Deeper with Convolutions” paper.

I3D_InceptionV3

Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper.

i3d_inceptionv3_kinetics400

Inception v3 model trained on Kinetics400 dataset from “Rethinking the Inception Architecture for Computer Vision” paper.

I3D_ResNetV1

ResNet_I3D backbone.

i3d_resnet50_v1_kinetics400

Inflated 3D model (I3D) with ResNet50 backbone trained on Kinetics400 dataset.

i3d_resnet101_v1_kinetics400

Inflated 3D model (I3D) with ResNet101 backbone trained on Kinetics400 dataset.

i3d_nl5_resnet50_v1_kinetics400

Inflated 3D model (I3D) with ResNet50 backbone and 5 non-local blocks trained on Kinetics400 dataset.

i3d_nl10_resnet50_v1_kinetics400

Inflated 3D model (I3D) with ResNet50 backbone and 10 non-local blocks trained on Kinetics400 dataset.

i3d_nl5_resnet101_v1_kinetics400

Inflated 3D model (I3D) with ResNet101 backbone and 5 non-local blocks trained on Kinetics400 dataset.

i3d_nl10_resnet101_v1_kinetics400

Inflated 3D model (I3D) with ResNet101 backbone and 10 non-local blocks trained on Kinetics400 dataset.

i3d_resnet50_v1_sthsthv2

Inflated 3D model (I3D) with ResNet50 backbone trained on Something-Something-V2 dataset.

i3d_resnet50_v1_hmdb51

Inflated 3D model (I3D) with ResNet50 backbone trained on HMDB51 dataset.

i3d_resnet50_v1_ucf101

Inflated 3D model (I3D) with ResNet50 backbone trained on UCF101 dataset.

i3d_resnet50_v1_custom

Inflated 3D model (I3D) with ResNet50 backbone.

P3D

P3D

The Pseudo 3D network (P3D).

p3d_resnet50_kinetics400

The Pseudo 3D network (P3D) with ResNet50 backbone trained on Kinetics400 dataset.

p3d_resnet101_kinetics400

The Pseudo 3D network (P3D) with ResNet101 backbone trained on Kinetics400 dataset.

R2+1D

R2Plus1D

The R2+1D network.

r2plus1d_resnet18_kinetics400

R2Plus1D with ResNet18 backbone trained on Kinetics400 dataset.

r2plus1d_resnet34_kinetics400

R2Plus1D with ResNet34 backbone trained on Kinetics400 dataset.

r2plus1d_resnet50_kinetics400

R2Plus1D with ResNet50 backbone trained on Kinetics400 dataset.

r2plus1d_resnet101_kinetics400

R2Plus1D with ResNet101 backbone trained on Kinetics400 dataset.

r2plus1d_resnet152_kinetics400

R2Plus1D with ResNet152 backbone trained on Kinetics400 dataset.

SlowFast

SlowFast

SlowFast networks (SlowFast) from “SlowFast Networks for Video Recognition” paper.

slowfast_4x16_resnet50_kinetics400

SlowFast 4x16 networks (SlowFast) with ResNet50 backbone trained on Kinetics400 dataset.

slowfast_8x8_resnet50_kinetics400

SlowFast 8x8 networks (SlowFast) with ResNet50 backbone trained on Kinetics400 dataset.

slowfast_4x16_resnet101_kinetics400

SlowFast 4x16 networks (SlowFast) with ResNet101 backbone trained on Kinetics400 dataset.

slowfast_8x8_resnet101_kinetics400

SlowFast 8x8 networks (SlowFast) with ResNet101 backbone trained on Kinetics400 dataset.

slowfast_16x8_resnet101_kinetics400

SlowFast 16x8 networks (SlowFast) with ResNet101 backbone trained on Kinetics400 dataset.

slowfast_16x8_resnet101_50_50_kinetics400

SlowFast 16x8 networks (SlowFast) with ResNet101 backbone trained on Kinetics400 dataset, but the temporal head is initialized with ResNet50 structure (3, 4, 6, 3).

slowfast_4x16_resnet50_custom

SlowFast 4x16 networks (SlowFast) with ResNet50 backbone.

API Reference

Network definitions of GluonCV models

GluonCV Model Zoo

class gluoncv.model_zoo.AlexNet(classes=1000, **kwargs)[source]

AlexNet model from the “One weird trick…” paper.

Parameters

classes (int, default 1000) – Number of classes for the output layer.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BasicBlockV1(channels, stride, downsample=False, in_channels=0, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

BasicBlock V1 from “Deep Residual Learning for Image Recognition” paper. This is used for ResNet V1 for 18, 34 layers.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BasicBlockV1b(planes, strides=1, dilation=1, downsample=None, previous_dilation=1, norm_layer=None, norm_kwargs=None, **kwargs)[source]

ResNetV1b BasicBlockV1b

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BasicBlockV2(channels, stride, downsample=False, in_channels=0, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

BasicBlock V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for ResNet V2 for 18, 34 layers.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BottleneckV1(channels, stride, downsample=False, in_channels=0, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Bottleneck V1 from “Deep Residual Learning for Image Recognition” paper. This is used for ResNet V1 for 50, 101, 152 layers.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BottleneckV1b(planes, strides=1, dilation=1, downsample=None, previous_dilation=1, norm_layer=None, norm_kwargs=None, last_gamma=False, **kwargs)[source]

ResNetV1b BottleneckV1b

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BottleneckV2(channels, stride, downsample=False, in_channels=0, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Bottleneck V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for ResNet V2 for 50, 101, 152 layers.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.C3D(nclass, dropout_ratio=0.5, num_segments=1, num_crop=1, feat_ext=False, init_std=0.001, ctx=None, **kwargs)[source]

The Convolutional 3D network (C3D). Learning Spatiotemporal Features with 3D Convolutional Networks. ICCV, 2015. https://arxiv.org/abs/1412.0767

Parameters
  • nclass (int) – Number of classes in the training dataset.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • dropout_ratio (float) – Dropout value used in the dropout layers after dense layers to avoid overfitting.

  • init_std (float) – Default standard deviation value for initializing dense layers.

  • ctx (str) – Context, default CPU. The context in which to load the pretrained weights.
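The num_segments parameter divides a video evenly into segments. One way to picture this is sketched below: the frame range is cut into equal spans and one frame index is picked per span. Choosing the center frame of each span is an assumption made for this illustration (training pipelines often sample randomly within a span), and `segment_center_indices` is a hypothetical helper, not a GluonCV function.

```python
def segment_center_indices(num_frames, num_segments):
    """Split num_frames into num_segments equal spans; return each span's center frame index."""
    if num_segments < 1 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    span = num_frames / num_segments
    # Assumption: take the middle frame of every span.
    return [int(span * i + span / 2) for i in range(num_segments)]

print(segment_center_indices(90, 3))  # [15, 45, 75]
print(segment_center_indices(90, 1))  # [45]
```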

hybrid_forward(F, x)[source]

Hybrid forward of C3D net

class gluoncv.model_zoo.CenterNet(base_network, heads, classes, head_conv_channel=0, scale=4.0, topk=100, flip_test=False, nms_thresh=0, nms_topk=400, post_nms=100, **kwargs)[source]

Objects as Points. https://arxiv.org/abs/1904.07850v2

Parameters
  • base_network (mxnet.gluon.nn.HybridBlock) – The base feature extraction network.

  • heads (OrderedDict) –

    OrderedDict with specifications for each head. For example: OrderedDict([('heatmap', {'num_output': len(classes), 'bias': -2.19}), ('wh', {'num_output': 2}), ('reg', {'num_output': 2})])

  • classes (list of str) – Category names.

  • head_conv_channel (int, default is 0) – If > 0, will use an extra conv layer before each of the real heads.

  • scale (float, default is 4.0) – The downsampling ratio of the entire network.

  • topk (int, default is 100) – Number of outputs.

  • flip_test (bool) – Whether to apply flip test during inference (training mode is not affected).

  • nms_thresh (float, default is 0.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS. By default NMS is disabled.

  • nms_topk (int, default is 400) – Apply NMS to top k detection results, use -1 to disable so that every Detection result is used in NMS.

  • post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.
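The heads argument from the parameter list above can be built with the standard library alone. The sketch below constructs the example specification and checks what a bias of -2.19 means after a sigmoid. The class names are hypothetical, and the comments on what each head predicts (per-class heatmap, box width and height, center offset) follow the usual reading of the "Objects as Points" paper rather than anything stated in this reference.

```python
import math
from collections import OrderedDict

classes = ["person", "car"]  # hypothetical category names

heads = OrderedDict([
    ("heatmap", {"num_output": len(classes), "bias": -2.19}),  # one heatmap channel per class
    ("wh", {"num_output": 2}),   # box width and height
    ("reg", {"num_output": 2}),  # sub-pixel center offset
])

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# A bias of -2.19 corresponds to an initial sigmoid activation of about 0.1.
initial_score = sigmoid(heads["heatmap"]["bias"])
total_channels = sum(spec["num_output"] for spec in heads.values())
print(round(initial_score, 3), total_channels)  # 0.101 6
```

Starting the heatmap near 0.1 keeps the many background locations from dominating the loss early in training, which is a common initialization practice for focal-loss style detectors.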

hybrid_forward(F, x)[source]

Hybrid forward of center net

property num_classes

Return number of foreground classes.

Returns

Number of foreground classes

Return type

int

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer : old_integer} mapping dict or a {new_name : old_name} mapping dict, or a list of [name0, name1,…] if class names don’t change. This allows the new predictor to reuse the previously trained weights specified.

Example

>>> import gluoncv
>>> net = gluoncv.model_zoo.get_model('center_net_resnet50_v1b_voc', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person is the 14th category in VOC
>>> net.reset_class(classes=['person'], reuse_weights={0:14})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':14})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])
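
The four reuse_weights forms in the example all describe the same thing: which old class index supplies the weights for each new class index. The sketch below normalizes them into a single {new_index: old_index} dict. It is an illustration of the mapping logic only; `resolve_reuse_weights` is a hypothetical helper, and the VOC class order is written out here as an assumption.

```python
def resolve_reuse_weights(new_classes, old_classes, reuse_weights):
    """Normalize reuse_weights into a {new_index: old_index} dict."""
    if isinstance(reuse_weights, (list, tuple)):
        # A list of names means each listed class keeps its own name.
        reuse_weights = {name: name for name in reuse_weights}

    def to_index(item, names):
        # Integers are already indices; strings are looked up by name.
        return item if isinstance(item, int) else names.index(item)

    return {to_index(new, new_classes): to_index(old, old_classes)
            for new, old in reuse_weights.items()}

# Assumed Pascal VOC class order; 'person' sits at index 14.
VOC = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
       "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
       "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
new = ["person"]

for spec in ({"person": "person"}, {0: 14}, {"person": 14}, ["person"]):
    print(resolve_reuse_weights(new, VOC, spec))  # {0: 14} each time
```
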
set_nms(nms_thresh=0, nms_topk=400, post_nms=100)[source]

Set non-maximum suppression parameters.

Parameters
  • nms_thresh (float, default is 0.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS. By default NMS is disabled.

  • nms_topk (int, default is 400) – Apply NMS to top k detection results, use -1 to disable so that every Detection result is used in NMS.

  • post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.
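How the three NMS parameters interact can be sketched with a small greedy, class-agnostic filter in plain Python. This is a simplified illustration under stated assumptions, not the operator the network actually calls: `iou` and `filter_detections` are hypothetical helpers, detections are (score, box) pairs, and a threshold outside the open interval (0, 1) is treated as "NMS disabled", which matches the statement that the default of 0 disables NMS.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_detections(dets, nms_thresh=0, nms_topk=400, post_nms=100):
    """dets: list of (score, (x1, y1, x2, y2)). Returns the kept detections."""
    dets = sorted(dets, key=lambda d: d[0], reverse=True)
    if 0 < nms_thresh < 1:               # outside (0, 1), NMS is skipped
        if nms_topk > 0:
            dets = dets[:nms_topk]       # only the top-k candidates enter NMS
        kept = []
        for score, box in dets:
            if all(iou(box, k[1]) <= nms_thresh for k in kept):
                kept.append((score, box))
        dets = kept
    if post_nms > 0:
        dets = dets[:post_nms]           # -1 returns every remaining detection
    return dets

dets = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)), (0.7, (20, 20, 30, 30))]
print(len(filter_detections(dets)))                  # 3, NMS disabled by default
print(len(filter_detections(dets, nms_thresh=0.5)))  # 2, the overlapping box is dropped
print(len(filter_detections(dets, post_nms=1)))      # 1
```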

Returns

Return type

None

class gluoncv.model_zoo.DUC(planes, upscale_factor=2, **kwargs)[source]

Upsampling layer with pixel shuffle

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.
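
The pixel shuffle used by the DUC upsampling layer trades channels for spatial resolution: a (C*r*r, H, W) tensor becomes (C, H*r, W*r). The sketch below shows the rearrangement on nested Python lists. It follows the common sub-pixel convolution channel ordering, which is an assumption here; the exact ordering inside the DUC block may differ, and `pixel_shuffle` is a hypothetical helper.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r)."""
    channels_in, height, width = len(x), len(x[0]), len(x[0][0])
    if channels_in % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    channels_out = channels_in // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]   # one of the r*r sub-pixel channels
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out

# Four 1x2 channels become one 2x4 channel with upscale factor 2.
x = [[[0, 1]], [[10, 11]], [[20, 21]], [[30, 31]]]
print(pixel_shuffle(x, 2))  # [[[0, 10, 1, 11], [20, 30, 21, 31]]]
```
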

class gluoncv.model_zoo.DarknetV3(layers, channels, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Darknet v3.

Parameters
features

Feature extraction layers.

Type

mxnet.gluon.nn.HybridSequential

output

A classes(1000)-way Fully-Connected Layer.

Type

mxnet.gluon.nn.Dense

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DeepLabV3(nclass, backbone='resnet50', aux=True, ctx=cpu(0), pretrained_base=True, height=None, width=None, base_size=520, crop_size=480, **kwargs)[source]
Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default:’resnet50’; ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

  • aux (bool) – Auxiliary loss.

Reference:

Chen, Liang-Chieh, et al. “Rethinking atrous convolution for semantic image segmentation.” arXiv preprint arXiv:1706.05587 (2017).

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DeepLabV3Plus(nclass, backbone='xception', aux=True, ctx=cpu(0), pretrained_base=True, height=None, width=None, base_size=576, crop_size=512, dilated=True, **kwargs)[source]
Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default:’xception’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

  • aux (bool) – Auxiliary loss.

Reference:

Chen, Liang-Chieh, et al. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.”

evaluate(x)[source]

Evaluates the network with inputs and targets.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DeepLabWV3Plus(nclass, backbone='wideresnet', aux=False, ctx=cpu(0), pretrained_base=True, height=None, width=None, base_size=520, crop_size=480, dilated=True, **kwargs)[source]
Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default:’wideresnet’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

  • aux (bool) – Auxiliary loss.

Reference:

Chen, Liang-Chieh, et al. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.” ECCV 2018. https://arxiv.org/abs/1802.02611

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DenseNet(num_init_features, growth_rate, block_config, bn_size=4, dropout=0, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Densenet-BC model from the “Densely Connected Convolutional Networks” paper.

Parameters
  • num_init_features (int) – Number of filters to learn in the first convolution layer.

  • growth_rate (int) – Number of filters to add each layer (k in the paper).

  • block_config (list of int) – List of integers for numbers of layers in each pooling block.

  • bn_size (int, default 4) – Multiplicative factor for the number of bottleneck layers (i.e. bn_size * k features in the bottleneck layer).

  • dropout (float, default 0) – Rate of dropout after each dense layer.

  • classes (int, default 1000) – Number of classification classes.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
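The way num_init_features, growth_rate, and block_config determine the channel counts can be traced with simple arithmetic. The sketch below assumes that each transition between dense blocks halves the number of channels (the usual DenseNet-BC compression); `densenet_feature_counts` is a hypothetical helper, and the sample configuration is the widely used DenseNet-121 setting.

```python
def densenet_feature_counts(num_init_features, growth_rate, block_config):
    """Return the channel count at the end of each dense block."""
    counts = []
    features = num_init_features
    for i, num_layers in enumerate(block_config):
        features += num_layers * growth_rate   # every layer adds growth_rate channels
        counts.append(features)
        if i != len(block_config) - 1:
            features //= 2                     # assumed transition: halve the channels
    return counts

# DenseNet-121 style configuration.
print(densenet_feature_counts(64, 32, [6, 12, 24, 16]))  # [256, 512, 1024, 1024]

# With bn_size=4, each bottleneck layer uses bn_size * growth_rate channels.
print(4 * 32)  # 128
```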

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DepthwiseRPN(bz=1, is_train=False, ctx=cpu(0), anchor_num=5, out_channels=256)[source]

Gets cls and loc through z_f and x_f.

Parameters
  • bz (int) – Batch size for training; bz = 1 for testing.

  • is_train (bool) – True for training, False for testing.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • anchor_num (int) – Number of anchors.

  • out_channels (int) – Number of hidden feature channels.

hybrid_forward(F, z_f, x_f)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.FCN(nclass, backbone='resnet50', aux=True, ctx=cpu(0), pretrained_base=True, base_size=520, crop_size=480, **kwargs)[source]

Fully Convolutional Networks for Semantic Segmentation

Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default:’resnet50’; ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm).

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

  • pretrained_base (bool or str) – Whether the FCN backbone (the encoder) is pretrained. If True, the weights of a model trained on ImageNet are loaded.

Reference:

Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” CVPR, 2015

Examples

>>> from gluoncv.model_zoo import FCN
>>> model = FCN(nclass=21, backbone='resnet50')
>>> print(model)
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.FasterRCNN(features, top_features, classes, box_features=None, short=600, max_size=1000, min_stage=4, max_stage=4, train_patterns=None, nms_thresh=0.3, nms_topk=400, post_nms=100, roi_mode='align', roi_size=(14, 14), strides=16, clip=None, rpn_channel=1024, base_size=16, scales=(8, 16, 32), ratios=(0.5, 1, 2), alloc_size=(128, 128), rpn_nms_thresh=0.7, rpn_train_pre_nms=12000, rpn_train_post_nms=2000, rpn_test_pre_nms=6000, rpn_test_post_nms=300, rpn_min_size=16, per_device_batch_size=1, num_sample=128, pos_iou_thresh=0.5, pos_ratio=0.25, max_num_gt=300, additional_output=False, force_nms=False, minimal_opset=False, **kwargs)[source]

Faster RCNN network.

Parameters
  • features (gluon.HybridBlock) – Base feature extractor before feature pooling layer.

  • top_features (gluon.HybridBlock) – Tail feature extractor after feature pooling layer.

  • classes (iterable of str) – Names of categories, its length is num_class.

  • box_features (gluon.HybridBlock, default is None) – feature head for transforming shared ROI output (top_features) for box prediction. If set to None, global average pooling will be used.

  • short (int, default is 600.) – Input image short side size.

  • max_size (int, default is 1000.) – Maximum size of input image long side.

  • min_stage (int, default is 4) – Minimum stage number for FPN stages.

  • max_stage (int, default is 4) – Maximum stage number for FPN stages.

  • train_patterns (str, default is None.) – Matching pattern for trainable parameters.

  • nms_thresh (float, default is 0.3.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to top k detection results, use -1 to disable so that every Detection result is used in NMS.

  • post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.

  • roi_mode (str, default is align) – ROI pooling mode. Currently supports ‘pool’ and ‘align’.

  • roi_size (tuple of int, length 2, default is (14, 14)) – (height, width) of the ROI region.

  • strides (int/tuple of ints, default is 16) – Feature map stride with respect to original image. This is usually the ratio between original image size and feature map size. For FPN, use a tuple of ints.

  • clip (float, default is None) – Clip bounding box prediction to prevent exponentiation from overflowing.

  • rpn_channel (int, default is 1024) – Channel number used in RPN convolutional layers.

  • base_size (int) – The width (and height) of the reference anchor box.

  • scales (iterable of float, default is (8, 16, 32)) –

    The areas of anchor boxes. We use the following form to compute the shapes of anchors:

    \[width_{anchor} = size_{base} \times scale \times \sqrt{1 / ratio}\]

    \[height_{anchor} = size_{base} \times scale \times \sqrt{ratio}\]

  • ratios (iterable of float, default is (0.5, 1, 2)) – The aspect ratios of anchor boxes. We expect it to be a list or tuple.

  • alloc_size (tuple of int) – Allocate size for the anchor boxes as (H, W). Usually we generate enough anchors for large feature map, e.g. 128x128. Later in inference we can have variable input sizes, at which time we can crop corresponding anchors from this large anchor map so we can skip re-generating anchors for each input.

  • rpn_train_pre_nms (int, default is 12000) – Filter top proposals before NMS in training of RPN.

  • rpn_train_post_nms (int, default is 2000) – Return top proposal results after NMS in training of RPN. Will be set to rpn_train_pre_nms if it is larger than rpn_train_pre_nms.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 300) – Return top proposal results after NMS in testing of RPN. Will be set to rpn_test_pre_nms if it is larger than rpn_test_pre_nms.

  • rpn_nms_thresh (float, default is 0.7) – IOU threshold for NMS. It is used to remove overlapping proposals.

  • rpn_num_sample (int, default is 256) – Number of samples for RPN targets.

  • rpn_pos_iou_thresh (float, default is 0.7) – Anchors with IOU larger than pos_iou_thresh are regarded as positive samples.

  • rpn_neg_iou_thresh (float, default is 0.3) – Anchors with IOU smaller than neg_iou_thresh are regarded as negative samples. Anchors with IOU between pos_iou_thresh and neg_iou_thresh are ignored.

  • rpn_pos_ratio (float, default is 0.5) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.

  • rpn_box_norm (array-like of size 4, default is (1., 1., 1., 1.)) – Std value to be divided from encoded values.

  • rpn_min_size (int, default is 16) – Proposals whose size is smaller than min_size will be discarded.

  • per_device_batch_size (int, default is 1) – Batch size for each device during training.

  • num_sample (int, default is 128) – Number of samples for RCNN targets.

  • pos_iou_thresh (float, default is 0.5) – Proposals with IOU larger than pos_iou_thresh are regarded as positive samples.

  • pos_ratio (float, default is 0.25) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.

  • max_num_gt (int, default is 300) – Maximum ground-truth number for each example. This is only an upper bound, not necessarily very precise. However, using a very big number may impact the training speed.

  • additional_output (boolean, default is False) – additional_output is only used for Mask R-CNN to get internal outputs.

  • force_nms (bool, default is False) – Apply NMS to all categories. This avoids overlapping detection results from different categories.

  • minimal_opset (bool, default is False) – We sometimes add special operators to accelerate training/inference, however, for exporting to third party compilers we want to utilize most widely used operators. If minimal_opset is True, the network will use a minimal set of operators good for e.g., TVM.
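
The anchor shape formula given for scales and ratios above can be checked with a few lines of plain Python. This is only a sketch: anchor_shapes is a hypothetical helper, not a GluonCV function, and base_size=16 is an assumed value chosen for illustration.

```python
import math


def anchor_shapes(base_size, scales, ratios):
    """Return (width, height) for every (ratio, scale) pair.

    Hypothetical helper that applies the documented formula directly.
    """
    shapes = []
    for ratio in ratios:
        for scale in scales:
            width = base_size * scale * math.sqrt(1.0 / ratio)
            height = base_size * scale * math.sqrt(ratio)
            shapes.append((width, height))
    return shapes


# base_size=16 is an assumption; scales and ratios are the documented defaults.
shapes = anchor_shapes(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1, 2))
```

For a fixed scale the area stays at (base_size * scale) squared while height / width equals the ratio, which is why scales is described as controlling anchor areas.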

classes

Names of categories, its length is num_class.

Type

iterable of str

num_class

Number of positive categories.

Type

int

short

Input image short side size.

Type

int

max_size

Maximum size of input image long side.

Type

int

train_patterns

Matching pattern for trainable parameters.

Type

str

nms_thresh

Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

Type

float

nms_topk

Apply NMS to top k detection results, use -1 to disable so that every detection result is used in NMS.

Type

int

force_nms

Apply NMS to all categories. This avoids overlapping detection results from different categories.

Type

bool

post_nms

Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.

Type

int

rpn_target_generator

Generate training targets with cls_target, box_target, and box_mask.

Type

gluon.Block

target_generator

Generate training targets with boxes, samples, matches, gt_label and gt_box.

Type

gluon.Block
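
How nms_thresh, nms_topk and post_nms interact can be sketched as a greedy, class-agnostic NMS in plain Python. This illustrates the documented semantics only and is not the operator the network actually calls. greedy_nms and box_iou are hypothetical helpers, and the default values in the signature are illustrative assumptions.

```python
def box_iou(a, b):
    """Intersection over union of two corner-format boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets, nms_thresh=0.3, nms_topk=400, post_nms=100):
    """dets is a list of (score, box); returns the detections that survive."""
    dets = sorted(dets, key=lambda d: d[0], reverse=True)
    if nms_topk > 0:                      # -1 keeps every detection for NMS
        dets = dets[:nms_topk]
    if 0 <= nms_thresh <= 1:
        kept = []
        for det in dets:
            # keep a box only if it does not overlap a higher-scoring kept box
            if all(box_iou(det[1], k[1]) <= nms_thresh for k in kept):
                kept.append(det)
    else:                                 # a threshold < 0 or > 1 disables NMS
        kept = dets
    if post_nms > 0:                      # -1 returns all remaining detections
        kept = kept[:post_nms]
    return kept


dets = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)), (0.7, (20, 20, 30, 30))]
kept = greedy_nms(dets)
```

In this toy input the second box overlaps the first with IOU of about 0.68, so it is suppressed at the assumed threshold of 0.3.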

hybrid_forward(F, x, gt_box=None, gt_label=None)[source]

Forward Faster-RCNN network.

The behavior during training and inference is different.

Parameters
  • x (mxnet.nd.NDArray or mxnet.symbol) – The network input tensor.

  • gt_box (type, only required during training) – The ground-truth bbox tensor with shape (B, N, 4).

  • gt_label (type, only required during training) – The ground-truth label tensor with shape (B, N, 1).

Returns

During inference, returns final class id, confidence scores, bounding boxes.

Return type

(ids, scores, bboxes)

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer : old_integer} mapping dict, a {new_name : old_name} mapping dict, or a list of [name0, name1,…] if class names don’t change. This allows the new predictor to reuse the previously trained weights specified.

Example

>>> net = gluoncv.model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person has index 14 in VOC
>>> net.reset_class(classes=['person'], reuse_weights={0:14})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':14})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])
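
The accepted forms of reuse_weights all reduce to a mapping from a new class index to an old class index. The sketch below shows one way that normalization could look. resolve_reuse_weights is a hypothetical helper written for illustration, not the library's internal code, and the short old_classes list is made up.

```python
def resolve_reuse_weights(new_classes, old_classes, reuse_weights):
    """Normalize reuse_weights to a {new_index: old_index} dict (hypothetical)."""
    if isinstance(reuse_weights, (list, tuple)):
        # list form: the class names are unchanged between old and new
        reuse_weights = {name: name for name in reuse_weights}
    resolved = {}
    for new, old in reuse_weights.items():
        new_idx = new_classes.index(new) if isinstance(new, str) else new
        old_idx = old_classes.index(old) if isinstance(old, str) else old
        resolved[new_idx] = old_idx
    return resolved


old_classes = ['bicycle', 'car', 'person']   # made-up previous categories
new_classes = ['person']

by_name = resolve_reuse_weights(new_classes, old_classes, {'person': 'person'})
by_index = resolve_reuse_weights(new_classes, old_classes, {0: 2})
mixed = resolve_reuse_weights(new_classes, old_classes, {'person': 2})
by_list = resolve_reuse_weights(new_classes, old_classes, ['person'])
```

All four calls describe the same request: the new class 0 should reuse the weights of old class 2.
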
property target_generator

Returns stored target generator

Returns

The RCNN target generator

Return type

mxnet.gluon.HybridBlock

class gluoncv.model_zoo.GoogLeNet(classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, dropout_ratio=0.4, aux_logits=False, norm_kwargs=None, partial_bn=False, pretrained_base=True, ctx=None, **kwargs)[source]

GoogLeNet model from the “Going Deeper with Convolutions” and “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” papers.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.HybridBlock(prefix=None, params=None)[source]

HybridBlock supports forwarding with both Symbol and NDArray.

HybridBlock is similar to Block, with a few differences:

import mxnet as mx
from mxnet.gluon import HybridBlock, nn

class Model(HybridBlock):
    def __init__(self, **kwargs):
        super(Model, self).__init__(**kwargs)
        # use name_scope to give child Blocks appropriate names.
        with self.name_scope():
            self.dense0 = nn.Dense(20)
            self.dense1 = nn.Dense(20)

    def hybrid_forward(self, F, x):
        x = F.relu(self.dense0(x))
        return F.relu(self.dense1(x))

model = Model()
model.initialize(ctx=mx.cpu(0))
model.hybridize()
model(mx.nd.zeros((10, 10), ctx=mx.cpu(0)))

Forward computation in HybridBlock must be static to work with Symbols, i.e. you cannot call NDArray.asnumpy(), NDArray.shape, NDArray.dtype, NDArray indexing (x[i]) etc. on tensors. Also, you cannot use branching or loop logic that is based on non-constant expressions like random numbers or intermediate results, since they change the graph structure for each iteration.

Before activating with hybridize(), HybridBlock works just like normal Block. After activation, HybridBlock will create a symbolic graph representing the forward computation and cache it. On subsequent forwards, the cached graph will be used instead of hybrid_forward().
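
The trace-once-then-replay behavior described above can be imitated with a toy model in plain Python. This is an analogy only, not how MXNet is implemented: Recorder, replay and ToyHybridBlock are hypothetical names, and the "graph" here is just a list of arithmetic steps.

```python
class Recorder:
    """Symbolic stand-in that records arithmetic instead of computing it."""

    def __init__(self, ops=()):
        self.ops = tuple(ops)

    def __add__(self, const):
        return Recorder(self.ops + (("add", const),))

    def __mul__(self, const):
        return Recorder(self.ops + (("mul", const),))


def replay(ops, value):
    """Run the recorded steps on a concrete value."""
    for name, const in ops:
        value = value + const if name == "add" else value * const
    return value


class ToyHybridBlock:
    def __init__(self):
        self._active = False
        self._cached_ops = None
        self.forward_calls = 0

    def hybridize(self, active=True):
        self._active = active
        self._cached_ops = None

    def hybrid_forward(self, x):
        self.forward_calls += 1
        return (x + 1) * 2

    def __call__(self, x):
        if not self._active:
            return self.hybrid_forward(x)          # imperative path
        if self._cached_ops is None:
            # trace once with a symbolic input and cache the result
            self._cached_ops = self.hybrid_forward(Recorder()).ops
        return replay(self._cached_ops, x)          # later calls skip hybrid_forward


block = ToyHybridBlock()
eager = block(3)       # runs hybrid_forward directly
block.hybridize()
first = block(3)       # traces, then replays
second = block(5)      # replays the cached steps only
```

Because the traced input carries no concrete value, an expression such as `Recorder() > 0` raises TypeError in this toy model, which mirrors why value-dependent branching is not allowed in a static forward computation.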

Please see references for detailed tutorial.

References

Hybrid - Faster training and easy deployment

cast(dtype)[source]

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

export(path, epoch=0, remove_amp_cast=True)[source]

Export HybridBlock to json format that can be loaded by SymbolBlock.imports, mxnet.mod.Module or the C++ interface.

Note

When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.

Parameters
  • path (str) – Path to save model. Two files, path-symbol.json and path-xxxx.params, will be created, where xxxx is the 4-digit epoch number.

  • epoch (int) – Epoch number of saved model.
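
The file naming rule described for path and epoch can be written out as a small sketch. exported_file_names is a hypothetical helper for illustration, not part of MXNet; it only formats the two names the text describes.

```python
def exported_file_names(path, epoch=0):
    """Return the (symbol file, params file) names described above."""
    return "%s-symbol.json" % path, "%s-%04d.params" % (path, epoch)


symbol_file, params_file = exported_file_names("model", epoch=7)
# symbol_file is "model-symbol.json", params_file is "model-0007.params"
```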

forward(x, *args)[source]

Defines the forward computation. Arguments can be either NDArray or Symbol.

hybrid_forward(F, x, *args, **kwargs)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

hybridize(active=True, **kwargs)[source]

Activates or deactivates HybridBlocks recursively. Has no effect on non-hybrid children.

Parameters
  • active (bool, default True) – Whether to turn hybrid on or off.

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

infer_shape(*args)[source]

Infers shape of Parameters from inputs.

infer_type(*args)[source]

Infers data type of Parameters from inputs.

register_child(block, name=None)[source]

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_op_hook(callback, monitor_all=False)[source]

Install op hook for block recursively.

Parameters
  • callback (function) – Takes a string and a NDArrayHandle.

  • monitor_all (bool, default False) – If true, monitor both input and output, otherwise monitor output only.

class gluoncv.model_zoo.I3D_InceptionV1(nclass=1000, pretrained=False, pretrained_base=True, num_segments=1, num_crop=1, feat_ext=False, dropout_ratio=0.5, init_std=0.01, partial_bn=False, ctx=None, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Inception v1 model from “Going Deeper with Convolutions” paper.

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper. Slight differences between this implementation and the original implementation due to padding.

Parameters
  • nclass (int) – Number of classes in the training dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • dropout_ratio (float, default is 0.5.) – The dropout rate of the dropout layer. Larger values regularize more strongly against overfitting.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • init_std (float, default is 0.01.) – Standard deviation used when initializing the dense layers.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
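
As an illustration of what "evenly divide a video" means for num_segments, the sketch below splits a clip into equal spans and picks the center frame of each. segment_center_indices is a hypothetical helper. The sampling strategy used by the actual data pipeline is not described here and may differ.

```python
def segment_center_indices(num_frames, num_segments):
    """Split [0, num_frames) into equal spans and return each span's center frame."""
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]


indices = segment_center_indices(num_frames=90, num_segments=3)
# indices is [15, 45, 75]
```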

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.I3D_InceptionV3(nclass=1000, pretrained=False, pretrained_base=True, num_segments=1, num_crop=1, feat_ext=False, dropout_ratio=0.5, init_std=0.01, partial_bn=False, ctx=None, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper.

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

This model definition file is written by Brais and modified by Yi.

Parameters
  • nclass (int) – Number of classes in the training dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • dropout_ratio (float, default is 0.5.) – The dropout rate of the dropout layer. Larger values regularize more strongly against overfitting.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • init_std (float, default is 0.01.) – Standard deviation used when initializing the dense layers.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.I3D_ResNetV1(nclass, depth, num_stages=4, pretrained=False, pretrained_base=True, feat_ext=False, num_segments=1, num_crop=1, spatial_strides=(1, 2, 2, 2), temporal_strides=(1, 1, 1, 1), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), conv1_kernel_t=5, conv1_stride_t=2, pool1_kernel_t=1, pool1_stride_t=2, inflate_freq=(1, 1, 1, 1), inflate_stride=(1, 1, 1, 1), inflate_style='3x1x1', nonlocal_stages=(-1, ), nonlocal_freq=(0, 1, 1, 0), nonlocal_cfg=None, bn_eval=True, bn_frozen=False, partial_bn=False, frozen_stages=-1, dropout_ratio=0.5, init_std=0.01, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, ctx=None, **kwargs)[source]

ResNet_I3D backbone. Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • depth (int.) – Depth of ResNet, from {18, 34, 50, 101, 152}.

  • num_stages (int, default is 4.) – Number of stages in a ResNet.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • spatial_strides (tuple of int.) – Strides in the spatial dimension of the first block of each stage.

  • temporal_strides (tuple of int.) – Strides in the temporal dimension of the first block of each stage.

  • dilations (tuple of int.) – Dilation ratio of each stage.

  • out_indices (tuple of int.) – Collect features from the selected stages of ResNet, usually used for feature extraction or auxiliary loss.

  • conv1_kernel_t (int, default is 5.) – The kernel size of first convolutional layer in a ResNet.

  • conv1_stride_t (int, default is 2.) – The stride of first convolutional layer in a ResNet.

  • pool1_kernel_t (int, default is 1.) – The kernel size of first pooling layer in a ResNet.

  • pool1_stride_t (int, default is 2.) – The stride of first pooling layer in a ResNet.

  • inflate_freq (tuple of int.) – Select which 2D convolutional layers to be inflated to 3D convolutional layers in each stage.

  • inflate_stride (tuple of int.) – The stride for inflated layers in each stage.

  • inflate_style (str, default is '3x1x1'.) – How to inflate a 2D kernel, either ‘3x1x1’ or ‘1x3x3’.

  • nonlocal_stages (tuple of int.) – Select which stage we need non-local blocks.

  • nonlocal_freq (tuple of int.) – Select where to insert non-local blocks in each stage.

  • nonlocal_cfg (dict.) – Additional non-local arguments, for example nonlocal_type=’gaussian’.

  • bn_eval (bool.) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var).

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • frozen_stages (int.) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • dropout_ratio (float, default is 0.5.) – The dropout rate of the dropout layer. Larger values regularize more strongly against overfitting.

  • init_std (float, default is 0.01.) – Standard deviation used when initializing the dense layers.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

hybrid_forward(F, x)[source]

Hybrid forward of I3D network

init_weights(ctx)[source]

Initialize the I3D network with its 2D pretrained weights.
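
The I3D paper bootstraps 3D filters from 2D ones by repeating the 2D weights along the time axis and dividing by the number of repeats. The sketch below shows that idea on nested lists. inflate_kernel is a hypothetical helper, and whether init_weights follows exactly this recipe for every layer is an assumption.

```python
def inflate_kernel(kernel_2d, time_dim):
    """Repeat a 2D kernel time_dim times along a new leading axis, scaled by 1 / time_dim."""
    return [[[w / time_dim for w in row] for row in kernel_2d]
            for _ in range(time_dim)]


kernel_2d = [[1.0, 2.0], [3.0, 4.0]]
kernel_3d = inflate_kernel(kernel_2d, time_dim=4)

# Summing the inflated kernel over time recovers the 2D kernel, so a video made
# of identical frames produces the same response as the original 2D filter.
summed = [[sum(kernel_3d[t][i][j] for t in range(4)) for j in range(2)]
          for i in range(2)]
```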

class gluoncv.model_zoo.Inception3(classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, partial_bn=False, **kwargs)[source]

Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.MaskRCNN(features, top_features, classes, mask_channels=256, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, target_roi_scale=1, num_fcn_convs=0, norm_layer=None, norm_kwargs=None, **kwargs)[source]

Mask RCNN network.

Parameters
  • features (gluon.HybridBlock) – Base feature extractor before feature pooling layer.

  • top_features (gluon.HybridBlock) – Tail feature extractor after feature pooling layer.

  • classes (iterable of str) – Names of categories, its length is num_class.

  • mask_channels (int, default is 256) – Number of channels in mask prediction

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN. Upper bounded by min of rpn_test_pre_nms and rpn_test_post_nms.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 1000) – Return top proposal results after NMS in testing of RPN. Will be set to rpn_test_pre_nms if it is larger than rpn_test_pre_nms.

  • target_roi_scale (int, default 1) – Ratio of mask output roi / input roi. For model with FPN, this is typically 2.

  • num_fcn_convs (int, default 0) – Number of convolution blocks before the deconv layer. For FPN networks this is typically 4.

hybrid_forward(F, x, gt_box=None, gt_label=None)[source]

Forward Mask RCNN network.

The behavior during training and inference is different.

Parameters
  • x (mxnet.nd.NDArray or mxnet.symbol) – The network input tensor.

  • gt_box (type, only required during training) – The ground-truth bbox tensor with shape (1, N, 4).

  • gt_label (type, only required during training) – The ground-truth label tensor with shape (B, N, 1).

Returns

During inference, returns final class id, confidence scores, bounding boxes, segmentation masks.

Return type

(ids, scores, bboxes, masks)

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer : old_integer} or mapping dict or {new_name : old_name} mapping dict, or a list of [name0, name1,…] if class names don’t change. This allows the new predictor to reuse the previously trained weights specified.

Example

>>> net = gluoncv.model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person is the first category in COCO
>>> net.reset_class(classes=['person'], reuse_weights={0:0})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':0})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])

class gluoncv.model_zoo.MobileNet(multiplier=1.0, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper.

Parameters
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.MobileNetV2(multiplier=1.0, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper (https://arxiv.org/abs/1801.04381).

Parameters
  • multiplier (float) – The width multiplier for controlling the model size. The actual number of channels is equal to the original channel size multiplied by this multiplier.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.MobilePose(base_name, base_attrs=('features', ), num_joints=17, pretrained_base=False, pretrained_ctx=cpu(0), **kwargs)[source]

Pose Estimation for Mobile Device

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.P3D(nclass, block, layers, shortcut_type='B', block_design=('A', 'B', 'C'), dropout_ratio=0.5, num_segments=1, num_crop=1, feat_ext=False, init_std=0.001, ctx=None, partial_bn=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

The Pseudo 3D network (P3D). Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. ICCV, 2017. https://arxiv.org/abs/1711.10305

Parameters
  • nclass (int) – Number of classes in the training dataset.

  • block (Block, default is Bottleneck.) – Class for the residual block.

  • layers (list of int) – Numbers of layers in each block

  • block_design (tuple of str.) – Different designs for each block, from ‘A’, ‘B’ or ‘C’.

  • dropout_ratio (float, default is 0.5.) – The dropout rate of the dropout layer. Larger values regularize more strongly against overfitting.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • init_std (float, default is 0.001.) – Standard deviation used when initializing the dense layers.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Hybrid forward of P3D net

class gluoncv.model_zoo.PSPNet(nclass, backbone='resnet50', aux=True, ctx=cpu(0), pretrained_base=True, base_size=520, crop_size=480, **kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default: ‘resnet50’; options are ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

  • aux (bool) – Auxiliary loss.

Reference:

Zhao, Hengshuang, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. “Pyramid scene parsing network.” CVPR, 2017

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.R2Plus1D(nclass, block, layers, dropout_ratio=0.5, num_segments=1, num_crop=1, feat_ext=False, init_std=0.001, ctx=None, partial_bn=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

The R2+1D network. A Closer Look at Spatiotemporal Convolutions for Action Recognition. CVPR, 2018. https://arxiv.org/abs/1711.11248

Parameters
  • nclass (int) – Number of classes in the training dataset.

  • block (Block, default is Bottleneck.) – Class for the residual block.

  • layers (list of int) – Numbers of layers in each block

  • dropout_ratio (float, default is 0.5.) – The dropout rate of the dropout layer. Larger values regularize more strongly against overfitting.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • init_std (float, default is 0.001.) – Standard deviation used when initializing the dense layers.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Hybrid forward of R2+1D net

class gluoncv.model_zoo.RCNNTargetGenerator(num_class, max_pos=128, per_device_batch_size=1, means=(0.0, 0.0, 0.0, 0.0), stds=(0.1, 0.1, 0.2, 0.2))[source]

RCNN target encoder to generate matching target and regression target values.

Parameters
  • num_class (int) – Total number of positive classes.

  • max_pos (int, default is 128) – Upper bound on the number of positive samples.

  • per_device_batch_size (int, default is 1) – Per device batch size.

  • means (iterable of float, default is (0., 0., 0., 0.)) – Mean values to be subtracted from regression targets.

  • stds (iterable of float, default is (0.1, 0.1, 0.2, 0.2)) – Standard deviations by which regression targets are divided.

hybrid_forward(F, roi, samples, matches, gt_label, gt_box)[source]

Components can handle batch images

Parameters
  • roi ((B, N, 4), input proposals) –

  • samples ((B, N), value +1: positive / -1: negative.) –

  • matches ((B, N), value [0, M), index to gt_label and gt_box.) –

  • gt_label ((B, M), value [0, num_class), excluding background class.) –

  • gt_box ((B, M, 4), input ground truth box corner coordinates.) –

Returns

  • cls_target ((B, N), value [0, num_class + 1), including background.)

  • box_target ((B, N, C, 4), only foreground class has nonzero target.)

  • box_weight ((B, N, C, 4), only foreground class has nonzero weight.)
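
The role of means and stds can be sketched with the usual Faster R-CNN center-offset encoding of a ground-truth box relative to a proposal. That encoding is an assumption here, since the text above only states that the means are subtracted and the stds divided. encode_box_target is a hypothetical helper.

```python
import math


def encode_box_target(roi, gt_box, means=(0., 0., 0., 0.), stds=(0.1, 0.1, 0.2, 0.2)):
    """Encode gt_box relative to roi (both in corner format) and normalize."""
    roi_w, roi_h = roi[2] - roi[0], roi[3] - roi[1]
    roi_x, roi_y = roi[0] + 0.5 * roi_w, roi[1] + 0.5 * roi_h
    gt_w, gt_h = gt_box[2] - gt_box[0], gt_box[3] - gt_box[1]
    gt_x, gt_y = gt_box[0] + 0.5 * gt_w, gt_box[1] + 0.5 * gt_h
    raw = (
        (gt_x - roi_x) / roi_w,      # horizontal center shift, in roi widths
        (gt_y - roi_y) / roi_h,      # vertical center shift, in roi heights
        math.log(gt_w / roi_w),      # log width ratio
        math.log(gt_h / roi_h),      # log height ratio
    )
    # subtract the means, then divide by the standard deviations
    return tuple((t - m) / s for t, m, s in zip(raw, means, stds))


same = encode_box_target((0, 0, 10, 10), (0, 0, 10, 10))
shifted = encode_box_target((0, 0, 10, 10), (1, 0, 11, 10))
```

With stds smaller than 1, a shift of one tenth of the proposal width becomes a target of about 1, which keeps regression targets at a comparable scale.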

class gluoncv.model_zoo.RCNNTargetSampler(num_image, num_proposal, num_sample, pos_iou_thresh, pos_ratio, max_num_gt)[source]

A sampler to choose positive/negative samples from RCNN Proposals

Parameters
  • num_image (int) – Number of input images.

  • num_proposal (int) – Number of input proposals.

  • num_sample (int) – Number of samples for RCNN targets.

  • pos_iou_thresh (float) – Proposals with IOU larger than pos_iou_thresh are regarded as positive samples. Proposals with IOU smaller than pos_iou_thresh are regarded as negative samples.

  • pos_ratio (float) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.

  • max_num_gt (int) – Maximum ground-truth number for each example. This is only an upper bound, not necessarily very precise. However, using a very big number may impact the training speed.
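
How num_sample, pos_iou_thresh and pos_ratio fit together can be sketched in plain Python. sample_proposals is a hypothetical helper working on a precomputed list of each proposal's best IOU with any ground-truth box. The handling of an IOU exactly equal to the threshold and the use of rounding are assumptions.

```python
import random


def sample_proposals(max_ious, num_sample, pos_iou_thresh, pos_ratio, seed=0):
    """Return {proposal_index: +1 or -1} for at most num_sample proposals."""
    rng = random.Random(seed)
    # boundary handling (>=) is an assumption made for this sketch
    positives = [i for i, v in enumerate(max_ious) if v >= pos_iou_thresh]
    negatives = [i for i, v in enumerate(max_ious) if v < pos_iou_thresh]
    max_pos = int(round(num_sample * pos_ratio))
    chosen_pos = rng.sample(positives, min(max_pos, len(positives)))
    # fill the remaining slots with negatives
    num_neg = min(num_sample - len(chosen_pos), len(negatives))
    chosen_neg = rng.sample(negatives, num_neg)
    samples = {i: 1 for i in chosen_pos}
    samples.update({i: -1 for i in chosen_neg})
    return samples


max_ious = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1, 0.05, 0.0]
samples = sample_proposals(max_ious, num_sample=4, pos_iou_thresh=0.5, pos_ratio=0.25)
```

Here pos_ratio * num_sample is 1, so one positive is drawn from the four candidates and the other three slots go to negatives.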

hybrid_forward(F, rois, scores, gt_boxes)[source]

Handle B=self._num_image by a for loop.

Parameters
  • rois ((B, self._num_proposal, 4) encoded in (x1, y1, x2, y2)) –

  • scores ((B, self._num_proposal, 1), value range [0, 1] with ignore value -1.) –

  • gt_boxes ((B, M, 4) encoded in (x1, y1, x2, y2), invalid box should have area of 0.) –

Returns

  • rois ((B, self._num_sample, 4), randomly drawn from proposals)

  • samples ((B, self._num_sample), value +1: positive / 0: ignore / -1: negative.)

  • matches ((B, self._num_sample), value between [0, M))

class gluoncv.model_zoo.ResNetV1(block, layers, channels, classes=1000, thumbnail=False, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
  • block (HybridBlock) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.

  • layers (list of int) – Numbers of layers in each block

  • channels (list of int) – Numbers of channels in each block. Length should be one larger than layers list.

  • classes (int, default 1000) – Number of classification classes.

  • thumbnail (bool, default False) – Enable thumbnail.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_se (bool, default False) – Whether to use Squeeze-and-Excitation module

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
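
The length rule for layers and channels can be illustrated with the common ResNet-18 specification. check_resnet_spec is a hypothetical helper, and reading channels[0] as the stem width with the remaining entries as stage widths is an assumption about how the extra entry is used.

```python
def check_resnet_spec(layers, channels):
    """Validate the length rule and pair each stage depth with its width."""
    if len(channels) != len(layers) + 1:
        raise ValueError("channels must have exactly one more entry than layers")
    # assumption: channels[0] is the stem width, channels[1:] are stage widths
    return list(zip(layers, channels[1:]))


# ResNet-18 style spec: four stages with two blocks each
stages = check_resnet_spec([2, 2, 2, 2], [64, 64, 128, 256, 512])
# stages is [(2, 64), (2, 128), (2, 256), (2, 512)]
```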

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.ResNetV1b(block, layers, classes=1000, dilated=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, last_gamma=False, deep_stem=False, stem_width=32, avg_down=False, final_drop=0.0, use_global_stats=False, name_prefix='', **kwargs)[source]

Pre-trained ResNetV1b model, which produces stride-8 feature maps at conv5.

Parameters
  • block (Block) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.

  • layers (list of int) – Numbers of layers in each block

  • classes (int, default 1000) – Number of classification classes.

  • dilated (bool, default False) – Applying dilation strategy to pretrained ResNet yielding a stride-8 model, typically used in Semantic Segmentation.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • deep_stem (bool, default False) – Whether to replace the 7x7 conv1 with three 3x3 convolution layers.

  • avg_down (bool, default False) – Whether to use average pooling for projection skip connection between stages/downsample.

  • final_drop (float, default 0.0) – Dropout ratio before the final classification layer.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of minibatch statistics; optionally set to True when finetuning from ImageNet classification pretrained models.

Reference:

  • He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

  • Yu, Fisher, and Vladlen Koltun. “Multi-scale context aggregation by dilated convolutions.”
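The effect of the dilated flag on the overall stride can be illustrated with a small calculation. This is a sketch under stated assumptions, not GluonCV code: it assumes a stride-2 stem convolution followed by a stride-2 max pool, then four stages with strides (1, 2, 2, 2). The function name output_stride is hypothetical.

```python
def output_stride(dilated=False):
    """Return (overall stride, per-stage dilation) for a 4-stage ResNet.

    Assumes a stride-2 stem conv plus a stride-2 max pool, followed by
    stage strides (1, 2, 2, 2). These values are assumptions.
    """
    stride = 2 * 2  # stem convolution and max pool
    dilation = 1
    dilations = []
    for stage_stride in (1, 2, 2, 2):
        if dilated and stride >= 8:
            # Keep the resolution: trade the stride for a larger dilation.
            dilation *= stage_stride
        else:
            stride *= stage_stride
        dilations.append(dilation)
    return stride, dilations


print(output_stride(dilated=False))
print(output_stride(dilated=True))
```

Under these assumptions the dilated variant stops downsampling once the stride reaches 8 and widens the receptive field with dilation instead, which is why it suits dense prediction tasks such as semantic segmentation.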

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.ResNetV2(block, layers, channels, classes=1000, thumbnail=False, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • block (HybridBlock) – Class for the residual block. Options are BasicBlockV2, BottleneckV2.

  • layers (list of int) – Numbers of layers in each block

  • channels (list of int) – Numbers of channels in each block. Length should be one larger than layers list.

  • classes (int, default 1000) – Number of classification classes.

  • thumbnail (bool, default False) – Enable thumbnail.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_se (bool, default False) – Whether to use Squeeze-and-Excitation module

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.ResidualAttentionModel(scale, m, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Residual Attention model from “Residual Attention Network for Image Classification” paper. Input size is 224 x 224.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_BasicBlockV1(channels, stride, downsample=False, in_channels=0, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

BasicBlock V1 from “Deep Residual Learning for Image Recognition” paper. This is used for SE_ResNet V1 for 18, 34 layers.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_BasicBlockV2(channels, stride, downsample=False, in_channels=0, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

BasicBlock V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for SE_ResNet V2 for 18, 34 layers.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_BottleneckV1(channels, stride, downsample=False, in_channels=0, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Bottleneck V1 from “Deep Residual Learning for Image Recognition” paper. This is used for SE_ResNet V1 for 50, 101, 152 layers.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_BottleneckV2(channels, stride, downsample=False, in_channels=0, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Bottleneck V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for SE_ResNet V2 for 50, 101, 152 layers.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_ResNetV1(block, layers, channels, classes=1000, thumbnail=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

SE_ResNet V1 model from “Deep Residual Learning for Image Recognition” paper.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_ResNetV2(block, layers, channels, classes=1000, thumbnail=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

SE_ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SSD(network, base_size, features, num_filters, sizes, ratios, steps, classes, use_1x1_transition=True, use_bn=True, reduce_ratio=1.0, min_depth=128, global_pool=False, pretrained=False, stds=(0.1, 0.1, 0.2, 0.2), nms_thresh=0.45, nms_topk=400, post_nms=100, anchor_alloc_size=128, ctx=cpu(0), norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, root='~/.mxnet/models', minimal_opset=False, **kwargs)[source]

Single-shot Object Detection Network: https://arxiv.org/abs/1512.02325.

Parameters
  • network (string or None) – Name of the base network, if None is used, will instantiate the base network from features directly instead of composing.

  • base_size (int) – Base input size, it is specified so SSD can support dynamic input shapes.

  • features (list of str or mxnet.gluon.HybridBlock) – Intermediate features to be extracted or a network with multi-output. If network is None, features is expected to be a multi-output network.

  • num_filters (list of int) – Number of channels for the appended layers, ignored if network is None.

  • sizes (iterable of float) – Sizes of anchor boxes, this should be a list of floats, in incremental order. The length of sizes must be len(layers) + 1. For example, a two stage SSD model can have sizes = [30, 60, 90], and it converts to [30, 60] and [60, 90] for the two stages, respectively. For more details, please refer to the original paper.

  • ratios (iterable of list) – Aspect ratios of anchors in each output layer. Its length must equal the number of SSD output layers.

  • steps (list of int) – Step size of anchor boxes in each output layer.

  • classes (iterable of str) – Names of all categories.

  • use_1x1_transition (bool) – Whether to use 1x1 convolution as transition layer between attached layers, which effectively reduces model capacity.

  • use_bn (bool) – Whether to use BatchNorm layer after each attached convolutional layer.

  • reduce_ratio (float) – Channel reduce ratio (0, 1) of the transition layer.

  • min_depth (int) – Minimum channels for the transition layers.

  • global_pool (bool) – Whether to attach a global average pooling layer as the last output layer.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • stds (tuple of float, default is (0.1, 0.1, 0.2, 0.2)) – Std values to be divided/multiplied to box encoded values.

  • nms_thresh (float, default is 0.45.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to top k detection results, use -1 to disable so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.

  • anchor_alloc_size (tuple of int, default is (128, 128)) – For advanced users. Define anchor_alloc_size to generate large enough anchor maps, which will later be saved in parameters. During inference, we support arbitrary input images by cropping the corresponding area of the anchor map. This allows us to export to symbol so we can run it in C++, Scala, etc.

  • ctx (mx.Context) – Network context.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm. This only applies to base networks that accept norm_layer; it is ignored if the base network (e.g. VGG) doesn’t accept this argument.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

  • root (str) – The root path for model storage, default is ‘~/.mxnet/models’

  • minimal_opset (bool) – We sometimes add special operators to accelerate training/inference, however, for exporting to third party compilers we want to utilize most widely used operators. If minimal_opset is True, the network will use a minimal set of operators good for e.g., TVM.
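The conversion described for the sizes parameter, where N + 1 values become N per-layer pairs, can be sketched as follows. The helper name pair_sizes is hypothetical and not part of GluonCV, and the strict-increase check is an assumption based on the "incremental order" wording.

```python
def pair_sizes(sizes):
    """Turn N + 1 increasing sizes into N (lower, upper) pairs, one per output layer.

    Hypothetical helper for illustration only; not part of GluonCV.
    """
    if len(sizes) < 2:
        raise ValueError("need at least two sizes")
    if any(a >= b for a, b in zip(sizes, sizes[1:])):
        raise ValueError("sizes must be in increasing order")
    # Consecutive values overlap: each upper bound is the next lower bound.
    return list(zip(sizes[:-1], sizes[1:]))


print(pair_sizes([30, 60, 90]))  # [(30, 60), (60, 90)]
```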

hybrid_forward(F, x)[source]

Hybrid forward

property num_classes

Return number of foreground classes.

Returns

Number of foreground classes

Return type

int

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer : old_integer} mapping dict, a {new_name : old_name} mapping dict, or a list of [name0, name1,…] if class names don’t change. This allows the new predictor to reuse the previously trained weights specified.

Example

>>> net = gluoncv.model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person is the 14th category in VOC
>>> net.reset_class(classes=['person'], reuse_weights={0:14})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':14})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])
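The four reuse_weights forms in the example all describe the same mapping. The sketch below shows one way such inputs could be normalized to a {new_index: old_index} dict. It is an illustration only, not the GluonCV implementation: resolve_reuse is a hypothetical helper, and VOC_CLASSES is the usual 20-class Pascal VOC ordering, assumed here so that 'person' lands at index 14.

```python
# Assumed Pascal VOC class ordering (20 categories).
VOC_CLASSES = [
    'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
    'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
    'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]


def resolve_reuse(new_classes, old_classes, reuse_weights):
    """Normalize reuse_weights into a {new_index: old_index} dict.

    Hypothetical helper for illustration only; not part of GluonCV.
    """
    if isinstance(reuse_weights, (list, tuple)):
        # A list of names means "same name in both class lists".
        reuse_weights = {name: name for name in reuse_weights}
    resolved = {}
    for new, old in reuse_weights.items():
        new_idx = new_classes.index(new) if isinstance(new, str) else new
        old_idx = old_classes.index(old) if isinstance(old, str) else old
        resolved[new_idx] = old_idx
    return resolved


new_classes = ['person']
for spec in ({'person': 'person'}, {0: 14}, {'person': 14}, ['person']):
    print(resolve_reuse(new_classes, VOC_CLASSES, spec))
```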
set_nms(nms_thresh=0.45, nms_topk=400, post_nms=100)[source]

Set non-maximum suppression parameters.

Parameters
  • nms_thresh (float, default is 0.45.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to top k detection results, use -1 to disable so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.

Returns

Return type

None
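How the three NMS parameters interact can be shown with a pure-Python sketch. It is a simplified, single-class illustration written for this page, not the operator GluonCV actually calls. The function names iou and nms and the (score, xmin, ymin, xmax, ymax) row layout are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets, nms_thresh=0.45, nms_topk=400, post_nms=100):
    """Greedy NMS over rows of (score, xmin, ymin, xmax, ymax)."""
    ranked = sorted(dets, key=lambda d: d[0], reverse=True)
    if nms_topk > 0:
        # Only the top-k scoring detections enter suppression.
        ranked = ranked[:nms_topk]
    if 0 <= nms_thresh <= 1:
        kept = []
        for det in ranked:
            # Keep a box only if it does not overlap a kept box too much.
            if all(iou(det[1:], k[1:]) <= nms_thresh for k in kept):
                kept.append(det)
    else:
        kept = ranked  # a threshold outside [0, 1] disables suppression
    return kept if post_nms < 0 else kept[:post_nms]


dets = [(0.9, 0, 0, 10, 10), (0.8, 1, 1, 11, 11), (0.7, 20, 20, 30, 30)]
print(nms(dets))                 # the second box overlaps the first and is dropped
print(nms(dets, nms_thresh=-1))  # suppression disabled, all three remain
```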

class gluoncv.model_zoo.SiamRPN(bz=1, is_train=False, ctx=cpu(0), **kwargs)[source]
hybrid_forward(F, template, search)[source]

Hybrid forward of SiamRPN net only used in training

template(zinput)[source]

template z branch

track(xinput)[source]

track x branch

Parameters

xinput (np.ndarray) – predicted frame

Returns

predicted frame result

Return type

dict

class gluoncv.model_zoo.SimplePoseResNet(base_name='resnet50_v1b', pretrained_base=False, pretrained_ctx=cpu(0), num_joints=17, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), final_conv_kernel=1, deconv_with_bias=False, **kwargs)[source]
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SlowFast(nclass, block=<class 'gluoncv.model_zoo.action_recognition.slowfast.Bottleneck'>, layers=None, num_block_temp_kernel_fast=None, num_block_temp_kernel_slow=None, pretrained=False, pretrained_base=False, feat_ext=False, num_segments=1, num_crop=1, bn_eval=True, bn_frozen=False, partial_bn=False, frozen_stages=-1, dropout_ratio=0.5, init_std=0.01, alpha=8, beta_inv=8, fusion_conv_channel_ratio=2, fusion_kernel_size=5, width_per_group=64, num_groups=1, slow_temporal_stride=16, fast_temporal_stride=2, slow_frames=4, fast_frames=32, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, ctx=None, **kwargs)[source]

SlowFast networks (SlowFast) from “SlowFast Networks for Video Recognition” paper.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • block (a HybridBlock.) – Building block of a ResNet, could be Basic or Bottleneck.

  • layers (a list or tuple, default is None.) – Number of blocks in each stage of a ResNet, e.g., [3, 4, 6, 3] in ResNet50.

  • num_block_temp_kernel_fast (int, default is None.) – If the current block has more than NUM_BLOCK_TEMP_KERNEL blocks, use temporal kernel of 1 for the rest of the blocks.

  • num_block_temp_kernel_slow (int, default is None.) – If the current block has more than NUM_BLOCK_TEMP_KERNEL blocks, use temporal kernel of 1 for the rest of the blocks.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is False.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • bn_eval (bool.) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var).

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • frozen_stages (int.) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • dropout_ratio (float, default is 0.5.) – The dropout rate of a dropout layer. A larger value gives stronger regularization against overfitting.

  • init_std (float, default is 0.01.) – Standard deviation used when initializing the dense layers.

  • alpha (int, default is 8.) – Corresponds to the frame rate reduction ratio between the Slow and Fast pathways.

  • beta_inv (int, default is 8.) – Corresponds to the inverse of the channel reduction ratio between the Slow and Fast pathways.

  • fusion_conv_channel_ratio (int, default is 2.) – Ratio of channel dimensions between the Slow and Fast pathways.

  • fusion_kernel_size (int, default is 5.) – Kernel dimension used for fusing information from Fast pathway to Slow pathway.

  • width_per_group (int, default is 64.) – Width of each group (64 -> ResNet; 4 -> ResNeXt).

  • num_groups (int, default is 1.) – Number of groups for the convolution. Num_groups=1 is for standard ResNet like networks, and num_groups>1 is for ResNeXt like networks.

  • slow_temporal_stride (int, default 16.) – The temporal stride for sparse sampling of video frames in slow branch of a SlowFast network.

  • fast_temporal_stride (int, default 2.) – The temporal stride for sparse sampling of video frames in fast branch of a SlowFast network.

  • slow_frames (int, default 4.) – The number of frames used as input to a slow branch.

  • fast_frames (int, default 32.) – The number of frames used as input to a fast branch.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.
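The default values of slow_frames, fast_frames and the two temporal strides fit together so that both pathways cover the same span of raw frames. The sketch below makes that arithmetic concrete. It only illustrates strided sampling: the helper name pathway_indices is hypothetical, and how GluonCV's data pipeline actually picks frames is not stated here.

```python
def pathway_indices(num_frames, temporal_stride, start=0):
    """Frame indices for one pathway using uniform strided sampling.

    Hypothetical helper for illustration only; not part of GluonCV.
    """
    return [start + i * temporal_stride for i in range(num_frames)]


# Defaults from the SlowFast signature above.
slow = pathway_indices(num_frames=4, temporal_stride=16)
fast = pathway_indices(num_frames=32, temporal_stride=2)

# Ratio of the two strides; matches the default alpha of 8.
alpha = 16 // 2

print(slow)
print(fast[:5], "...", fast[-1])
print(len(fast) // len(slow), alpha)
```

With these defaults, 4 x 16 and 32 x 2 both equal 64, so each pathway spans a 64-frame clip while the fast pathway sees alpha times as many frames.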

FastPath(F, x)[source]

Hybrid forward of the fast branch

SlowPath(F, x, lateral)[source]

Hybrid forward of the slow branch

hybrid_forward(F, x)[source]

Hybrid forward of SlowFast network

class gluoncv.model_zoo.SqueezeNet(version, classes=1000, **kwargs)[source]

SqueezeNet model from the “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size” paper. SqueezeNet 1.1 model from the official SqueezeNet repo. SqueezeNet 1.1 has 2.4x less computation and slightly fewer parameters than SqueezeNet 1.0, without sacrificing accuracy.

Parameters
  • version (str) – Version of squeezenet. Options are ‘1.0’, ‘1.1’.

  • classes (int, default 1000) – Number of classification classes.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.VGG(layers, filters, classes=1000, batch_norm=False, **kwargs)[source]

VGG model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • layers (list of int) – Numbers of layers in each feature block.

  • filters (list of int) – Numbers of filters in each feature block. List length should match the layers.

  • classes (int, default 1000) – Number of classification classes.

  • batch_norm (bool, default False) – Use batch normalization.
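The layers and filters lists are read position by position, one entry per feature block. A minimal sketch of that pairing follows. The helper name conv_filters_per_layer is hypothetical, and the VGG-16-style values are an assumed configuration rather than something this page states.

```python
def conv_filters_per_layer(layers, filters):
    """Expand per-block (layer count, filter count) into one entry per conv layer.

    Hypothetical helper for illustration only; not part of GluonCV.
    """
    if len(layers) != len(filters):
        raise ValueError("filters must have the same length as layers")
    return [f for n, f in zip(layers, filters) for _ in range(n)]


# Assumed VGG-16-style configuration: five feature blocks.
convs = conv_filters_per_layer([2, 2, 3, 3, 3], [64, 128, 256, 512, 512])
print(len(convs))  # number of convolution layers
print(convs)
```

Under this assumed configuration the 13 convolution layers plus the three fully connected layers of the original VGG design give the 16 weight layers in the name.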

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.VGGAtrousExtractor(layers, filters, extras, batch_norm=False, **kwargs)[source]

VGG Atrous multi layer feature extractor which produces multiple output feature maps.

Parameters
  • layers (list of int) – Number of layers for the VGG base network.

  • filters (list of int) – Number of convolution filters for each layer.

  • extras (list of list) – Extra layers configurations.

  • batch_norm (bool) – If True, will use BatchNorm layers.

hybrid_forward(F, x, init_scale)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.Xception65(classes=1000, output_stride=32, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None)[source]

Modified Aligned Xception

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.Xception71(classes=1000, output_stride=32, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None)[source]

Modified Aligned Xception

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.YOLOV3(stages, channels, anchors, strides, classes, alloc_size=(128, 128), nms_thresh=0.45, nms_topk=400, post_nms=100, pos_iou_thresh=1.0, ignore_iou_thresh=0.7, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

YOLO V3 detection network. Reference: https://arxiv.org/pdf/1804.02767.pdf.

Parameters
  • stages – Staged feature extraction blocks. For example, 3 stages and 3 YOLO output layers are used in the original paper.

  • channels (iterable) – Number of conv channels for each appended stage. len(channels) should match len(stages).

  • num_class (int) – Number of foreground objects.

  • anchors (iterable) – The anchor setting. len(anchors) should match len(stages).

  • strides (iterable) – Strides of feature map. len(strides) should match len(stages).

  • alloc_size (tuple of int, default is (128, 128)) – For advanced users. Define alloc_size to generate large enough anchor maps, which will later be saved in parameters. During inference, we support arbitrary input images by cropping the corresponding area of the anchor map. This allows us to export to symbol so we can run it in C++, Scala, etc.

  • nms_thresh (float, default is 0.45.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to top k detection results, use -1 to disable so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.

  • pos_iou_thresh (float, default is 1.0) – IOU threshold for true anchors that match real objects. ‘pos_iou_thresh < 1’ is not implemented.

  • ignore_iou_thresh (float) – Anchors whose IOU falls in range(ignore_iou_thresh, pos_iou_thresh) are not penalized on objectness score.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

property classes

Return names of (non-background) categories.

Returns

Names of (non-background) categories.

Return type

iterable of str

hybrid_forward(F, x, *args)[source]

YOLOV3 network hybrid forward.

Parameters
  • F (mxnet.nd or mxnet.sym) – F is mxnet.sym if hybridized or mxnet.nd if not.

  • x (mxnet.nd.NDArray) – Input data.

  • *args – During training, extra inputs are required: (gt_boxes, obj_t, centers_t, scales_t, weights_t, clas_t). These are generated by YOLOV3PrefetchTargetGenerator in the dataloader transform function.

Returns

During inference, return detections in shape (B, N, 6) with format (cid, score, xmin, ymin, xmax, ymax). During training, return losses only: (obj_loss, center_loss, scale_loss, cls_loss).

Return type

(tuple of) mxnet.nd.NDArray
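The inference output layout (B, N, 6) with rows of (cid, score, xmin, ymin, xmax, ymax) can be unpacked as in the sketch below. It works on plain nested lists rather than NDArrays so that it runs without MXNet. The helper name filter_detections, the score threshold, and the convention that padded rows carry a class id of -1 are all assumptions made for illustration.

```python
def filter_detections(batch, score_thresh=0.5):
    """Keep confident detections from a (B, N, 6) nested list.

    Each row is (cid, score, xmin, ymin, xmax, ymax). Rows with a negative
    class id are treated as padding; this convention is an assumption.
    """
    results = []
    for image_dets in batch:
        kept = [
            {"cid": int(cid), "score": score, "box": (xmin, ymin, xmax, ymax)}
            for cid, score, xmin, ymin, xmax, ymax in image_dets
            if cid >= 0 and score >= score_thresh
        ]
        results.append(kept)
    return results


# One image (B = 1) with three rows (N = 3); the last row is padding.
batch = [[
    [14, 0.9, 10, 20, 110, 220],
    [6, 0.3, 0, 0, 50, 50],
    [-1, -1, -1, -1, -1, -1],
]]
filtered = filter_detections(batch)
print(filtered)
```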

property num_class

Number of (non-background) categories.

Returns

Number of (non-background) categories.

Return type

int

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights – A {new_integer : old_integer} mapping dict, a {new_name : old_name} mapping dict, or a list of [name0, name1,…] if class names don’t change. This allows the new predictor to reuse the previously trained weights specified.

Example

>>> net = gluoncv.model_zoo.get_model('yolo3_darknet53_voc', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person is the 14th category in VOC
>>> net.reset_class(classes=['person'], reuse_weights={0:14})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':14})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])
set_nms(nms_thresh=0.45, nms_topk=400, post_nms=100)[source]

Set non-maximum suppression parameters.

Parameters
  • nms_thresh (float, default is 0.45.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to top k detection results, use -1 to disable so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return top post_nms detection results, the rest is discarded. The number is based on COCO dataset which has maximum 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.

Returns

Return type

None

gluoncv.model_zoo.alexnet(pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

AlexNet model from the “One weird trick…” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

gluoncv.model_zoo.alexnetlegacy(**kwargs)[source]

Alexnetlegacy

gluoncv.model_zoo.c3d_kinetics400(nclass=400, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, **kwargs)[source]

The Convolutional 3D network (C3D) trained on the Kinetics400 dataset, from the “Learning Spatiotemporal Features with 3D Convolutional Networks” paper (ICCV, 2015). https://arxiv.org/abs/1412.0767

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
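The idea behind num_segments, dividing a video evenly into segments, can be sketched as a simple index computation. This is a sketch under stated assumptions: segment_bounds is a hypothetical helper, and the way leftover frames are distributed is a design choice made here, not something this page specifies.

```python
def segment_bounds(num_frames, num_segments=1):
    """Split frame indices [0, num_frames) into nearly equal half-open ranges.

    Hypothetical helper for illustration only; not part of GluonCV.
    """
    if num_segments < 1 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    base, extra = divmod(num_frames, num_segments)
    bounds, start = [], 0
    for i in range(num_segments):
        # The first `extra` segments absorb one leftover frame each.
        length = base + (1 if i < extra else 0)
        bounds.append((start, start + length))
        start += length
    return bounds


print(segment_bounds(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```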

gluoncv.model_zoo.center_net_dla34_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with dla34 base network on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_dla34_dcnv2_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with dla34 base network with deformable v2 conv layers on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_dla34_dcnv2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with dla34 base network with deformable v2 conv layers on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_dla34_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with dla34 base network on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet101_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet101_v1b base network on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet101_v1b_dcnv2_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet101_v1b base network with deformable v2 conv layers on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet101_v1b_dcnv2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet101_v1b base network with deformable v2 conv layers on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet101_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet101_v1b base network on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet18_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet18_v1b base network on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet18_v1b_dcnv2_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet18_v1b base network with deformable v2 conv layers on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet18_v1b_dcnv2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet18_v1b base network with deformable v2 conv layers on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet18_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet18_v1b base network on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet50_v1b base network on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet50_v1b_dcnv2_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet50_v1b base network with deformable v2 conv layers on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet50_v1b_dcnv2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet50_v1b base network with deformable v2 conv layers on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet50_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet50_v1b base network on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock
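
The CenterNet constructors in this section follow a regular naming pattern: center_net_<backbone>[_dcnv2]_<dataset>. The sketch below composes such a name from its parts, for example to pass it to gluoncv.model_zoo.get_model. The helper center_net_model_name and the two tuples are hypothetical and not part of GluonCV; they only cover the backbones and datasets that appear in the function names here.

```python
# Hypothetical helper (not part of GluonCV): composes a CenterNet constructor
# name following the pattern visible in this module's function names.
BACKBONES = ("resnet18_v1b", "resnet50_v1b", "resnet101_v1b")
DATASETS = ("voc", "coco")


def center_net_model_name(backbone, dataset, deformable=False):
    """Return a name such as 'center_net_resnet18_v1b_dcnv2_voc'."""
    if backbone not in BACKBONES:
        raise ValueError("unknown backbone: %r" % (backbone,))
    if dataset not in DATASETS:
        raise ValueError("unknown dataset: %r" % (dataset,))
    parts = ["center_net", backbone]
    if deformable:
        # The 'dcnv2' tag marks the variants with deformable v2 conv layers.
        parts.append("dcnv2")
    parts.append(dataset)
    return "_".join(parts)


print(center_net_model_name("resnet18_v1b", "voc", deformable=True))
# center_net_resnet18_v1b_dcnv2_voc
```

Keeping the name construction in one place avoids typos in long model names when switching between backbones or datasets.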

class gluoncv.model_zoo.cifar_ResidualAttentionModel(scale, m, classes=10, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper. Input size is 32 x 32.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

gluoncv.model_zoo.cifar_residualattentionnet452(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

gluoncv.model_zoo.cifar_residualattentionnet56(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

gluoncv.model_zoo.cifar_residualattentionnet92(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

gluoncv.model_zoo.cifar_resnet110_v1(**kwargs)[source]

ResNet-110 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

gluoncv.model_zoo.cifar_resnet110_v2(**kwargs)[source]

ResNet-110 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

gluoncv.model_zoo.cifar_resnet20_v1(**kwargs)[source]

ResNet-20 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

gluoncv.model_zoo.cifar_resnet20_v2(**kwargs)[source]

ResNet-20 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

gluoncv.model_zoo.cifar_resnet56_v1(**kwargs)[source]

ResNet-56 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

gluoncv.model_zoo.cifar_resnet56_v2(**kwargs)[source]

ResNet-56 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

gluoncv.model_zoo.cifar_wideresnet16_10(**kwargs)[source]

WideResNet-16-10 model for CIFAR10 from “Wide Residual Networks” paper.

gluoncv.model_zoo.cifar_wideresnet28_10(**kwargs)[source]

WideResNet-28-10 model for CIFAR10 from “Wide Residual Networks” paper.

gluoncv.model_zoo.cifar_wideresnet40_8(**kwargs)[source]

WideResNet-40-8 model for CIFAR10 from “Wide Residual Networks” paper.

gluoncv.model_zoo.cpu(device_id=0)[source]

Returns a CPU context.

This function is a shortcut for Context('cpu', device_id). For most operations, when no context is specified, the default context is cpu().

Examples

>>> with mx.cpu():
...     cpu_array = mx.nd.ones((2, 3))
>>> cpu_array.context
cpu(0)
>>> cpu_array = mx.nd.ones((2, 3), ctx=mx.cpu())
>>> cpu_array.context
cpu(0)
Parameters

device_id (int, optional) – The device id of the device. device_id is not needed for CPU. This is included to make interface compatible with GPU.

Returns

context – The corresponding CPU context.

Return type

Context

gluoncv.model_zoo.custom_faster_rcnn_fpn(classes, transfer=None, dataset='custom', pretrained_base=True, base_network_name='resnet18_v1b', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, sym_norm_layer=None, sym_norm_kwargs=None, num_fpn_filters=256, num_box_head_conv=4, num_box_head_conv_filters=256, num_box_head_dense_filters=1024, **kwargs)[source]

Faster RCNN model with resnet base network and FPN on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer (str or None) – Dataset from which to transfer. If not None, will try to reuse pre-trained weights from faster RCNN networks trained on the other dataset specified by this parameter.

  • dataset (str, default 'custom') – Dataset name attached to the network name

  • pretrained_base (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • base_network_name (str, default 'resnet18_v1b') – Base network for faster RCNN. Currently supported: ‘resnet18_v1b’, ‘resnet50_v1b’, and ‘resnet101_v1d’

  • norm_layer (nn.HybridBlock, default nn.BatchNorm) – Gluon normalization layer to use. Default is frozen batch normalization layer.

  • norm_kwargs (dict) – Keyword arguments for gluon normalization layer

  • sym_norm_layer (nn.SymbolBlock, default None) – Symbol normalization layer to use in FPN. This is due to FPN being implemented using SymbolBlock. Default is None, meaning no normalization layer will be used in FPN.

  • sym_norm_kwargs (dict) – Keyword arguments for symbol normalization layer used in FPN.

  • num_fpn_filters (int, default 256) – Number of filters for FPN output layers.

  • num_box_head_conv (int, default 4) – Number of convolution layers to use in box head if batch normalization is not frozen.

  • num_box_head_conv_filters (int, default 256) – Number of filters for convolution layers in box head. Only applicable if batch normalization is not frozen.

  • num_box_head_dense_filters (int, default 1024) – Number of hidden units for the last fully connected layer in box head.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Returns

Hybrid faster RCNN network.

Return type

mxnet.gluon.HybridBlock
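
Since len(classes) determines the number of foreground classes, it can help to sanity-check the class list before building a custom detector. The sketch below is illustrative only: check_foreground_classes and build_custom_faster_rcnn_fpn are hypothetical helpers, the class names are placeholders, and the duplicate check is an assumption of this example rather than a documented GluonCV requirement. The call to custom_faster_rcnn_fpn uses only arguments from the signature above and is imported lazily, so the check itself runs without MXNet installed.

```python
def check_foreground_classes(classes):
    """Return the class names as a list after basic sanity checks."""
    names = list(classes)
    if not names:
        raise ValueError("at least one foreground class is required")
    if any(not isinstance(name, str) for name in names):
        raise TypeError("class names must be strings")
    # Assumption of this sketch: duplicate names are treated as an error.
    if len(set(names)) != len(names):
        raise ValueError("class names must be unique")
    return names


def build_custom_faster_rcnn_fpn(classes, transfer=None):
    """Build the network; requires gluoncv and mxnet to be installed."""
    # Imported lazily so check_foreground_classes works without MXNet.
    from gluoncv import model_zoo

    return model_zoo.custom_faster_rcnn_fpn(
        classes=check_foreground_classes(classes),
        transfer=transfer,
        base_network_name="resnet18_v1b",
    )


classes = check_foreground_classes(["person", "bicycle"])  # placeholder names
print(len(classes))  # number of foreground classes: 2
```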

gluoncv.model_zoo.custom_mask_rcnn_fpn(classes, transfer=None, dataset='custom', pretrained_base=True, base_network_name='resnet18_v1b', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, sym_norm_layer=None, sym_norm_kwargs=None, num_fpn_filters=256, num_box_head_conv=4, num_box_head_conv_filters=256, num_box_head_dense_filters=1024, **kwargs)[source]

Mask RCNN model with resnet base network and FPN on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer (str or None) – Dataset from which to transfer. If not None, will try to reuse pre-trained weights from faster RCNN networks trained on the other dataset specified by this parameter.

  • dataset (str, default 'custom') – Dataset name attached to the network name

  • pretrained_base (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • base_network_name (str, default 'resnet18_v1b') – Base network for mask RCNN. Currently supported: ‘resnet18_v1b’, ‘resnet50_v1b’, and ‘resnet101_v1d’

  • norm_layer (nn.HybridBlock, default nn.BatchNorm) – Gluon normalization layer to use. Default is frozen batch normalization layer.

  • norm_kwargs (dict) – Keyword arguments for gluon normalization layer

  • sym_norm_layer (nn.SymbolBlock, default None) – Symbol normalization layer to use in FPN. This is due to FPN being implemented using SymbolBlock. Default is None, meaning no normalization layer will be used in FPN.

  • sym_norm_kwargs (dict) – Keyword arguments for symbol normalization layer used in FPN.

  • num_fpn_filters (int, default 256) – Number of filters for FPN output layers.

  • num_box_head_conv (int, default 4) – Number of convolution layers to use in box head if batch normalization is not frozen.

  • num_box_head_conv_filters (int, default 256) – Number of filters for convolution layers in box head. Only applicable if batch normalization is not frozen.

  • num_box_head_dense_filters (int, default 1024) – Number of hidden units for the last fully connected layer in box head.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Returns

Hybrid mask RCNN network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.darknet53(**kwargs)[source]

Darknet v3 53 layer network. Reference: https://arxiv.org/pdf/1804.02767.pdf.

Returns

Darknet network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.densenet121(**kwargs)[source]

Densenet-BC 121-layer model from the “Densely Connected Convolutional Networks” paper.

gluoncv.model_zoo.densenet161(**kwargs)[source]

Densenet-BC 161-layer model from the “Densely Connected Convolutional Networks” paper.

gluoncv.model_zoo.densenet169(**kwargs)[source]

Densenet-BC 169-layer model from the “Densely Connected Convolutional Networks” paper.

gluoncv.model_zoo.densenet201(**kwargs)[source]

Densenet-BC 201-layer model from the “Densely Connected Convolutional Networks” paper.

gluoncv.model_zoo.faster_rcnn_fpn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model with FPN from the papers “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks” and “Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S. (2016). Feature Pyramid Networks for Object Detection”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network; the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_fpn_resnet101_v1d_coco(pretrained=True)
>>> print(model)
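
The note that pretrained_base has no effect when pretrained is set can be read as a simple precedence rule. A minimal sketch, assuming that a truthy pretrained (True or a version string) means the whole detector is loaded and the separate base-network weights are skipped. resolve_pretrained_base is a hypothetical helper, not a GluonCV function.

```python
def resolve_pretrained_base(pretrained, pretrained_base=True):
    """Return the value of pretrained_base that would actually take effect."""
    # Assumption: any truthy `pretrained` (True or a non-empty version string)
    # already covers the base network, so no separate base weights are loaded.
    if pretrained:
        return False
    return pretrained_base


print(resolve_pretrained_base(pretrained=False, pretrained_base=True))  # True
print(resolve_pretrained_base(pretrained=True, pretrained_base=True))   # False
```

In practice this means pretrained_base only matters when training the detection layers from scratch on top of a pretrained backbone.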
gluoncv.model_zoo.faster_rcnn_fpn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model with FPN from the papers “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks” and “Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S. (2016). Feature Pyramid Networks for Object Detection”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network; the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_fpn_resnet50_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.faster_rcnn_fpn_syncbn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, num_devices=0, **kwargs)[source]

Faster RCNN model with FPN from the papers “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks” and “Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S. (2016). Feature Pyramid Networks for Object Detection”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network; the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices for sync batch norm layer. If less than 1, use all devices available.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_fpn_syncbn_resnet101_v1d_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.faster_rcnn_fpn_syncbn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, num_devices=0, **kwargs)[source]

Faster RCNN model with FPN from the papers “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks” and “Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S. (2016). Feature Pyramid Networks for Object Detection”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network; the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices for sync batch norm layer. If less than 1, use all devices available.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_fpn_syncbn_resnet50_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.faster_rcnn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks”

Parameters
  • pretrained (bool, optional, default is False) – Load pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_resnet101_v1d_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.faster_rcnn_resnet101_v1d_custom(classes, transfer=None, pretrained_base=True, pretrained=False, **kwargs)[source]

Faster RCNN model with resnet101_v1d base network on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from faster RCNN networks trained on other datasets.

  • pretrained_base (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Returns

Hybrid faster RCNN network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.faster_rcnn_resnet101_v1d_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks”

Parameters
  • pretrained (bool, optional, default is False) – Load pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_resnet101_v1d_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.faster_rcnn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_resnet50_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.faster_rcnn_resnet50_v1b_custom(classes, transfer=None, pretrained_base=True, pretrained=False, **kwargs)[source]

Faster RCNN model with resnet50_v1b base network on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from faster RCNN networks trained on other datasets.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Returns

Hybrid faster RCNN network.

Return type

mxnet.gluon.HybridBlock
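
The transfer argument names a dataset whose pre-trained Faster RCNN weights should be reused. One plausible reading, sketched below as an assumption rather than as the library's confirmed behavior, is that the source network is the pre-defined model whose name ends in that dataset, such as faster_rcnn_resnet50_v1b_coco or faster_rcnn_resnet50_v1b_voc from this module. transfer_source_model is a hypothetical helper.

```python
def transfer_source_model(base_network_name, transfer):
    """Name of the pre-defined model that `transfer` would point at.

    Returns None when no transfer is requested.
    """
    if transfer is None:
        return None
    # Assumption: source models are named faster_rcnn_<backbone>_<dataset>,
    # matching the constructor names documented in this module.
    return "faster_rcnn_{}_{}".format(base_network_name, transfer)


print(transfer_source_model("resnet50_v1b", "coco"))
# faster_rcnn_resnet50_v1b_coco
```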

gluoncv.model_zoo.faster_rcnn_resnet50_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_resnet50_v1b_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_Siam_RPN(base_name, bz=1, is_train=False, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get a SiamRPN network, loading pretrained weights if pretrained is set.

Parameters
  • base_name (str) – Backbone model name

  • bz (int) – Batch size for training; use bz = 1 for testing.

  • is_train (bool) – True for training, False for testing.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

Returns

A SiamRPN Tracking network.

Return type

HybridBlock

gluoncv.model_zoo.get_center_net(name, dataset, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get a center net instance.

Parameters
  • name (str or None) – Model name, if None is used, you must specify features to be a HybridBlock.

  • dataset (str) – Name of dataset. This is used to identify model name because models trained on different datasets are going to be very different.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

Returns

A CenterNet detection network.

Return type

HybridBlock
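
The default root, '~/.mxnet/models', uses the shell-style home-directory shorthand. The sketch below shows how such a value can be turned into an absolute directory with the standard library before inspecting or pre-populating the weight cache. resolve_model_root is a hypothetical helper; whether GluonCV resolves the path in exactly this way is an assumption.

```python
import os


def resolve_model_root(root=os.path.join("~", ".mxnet", "models")):
    """Expand '~' and return an absolute path for the model directory."""
    return os.path.abspath(os.path.expanduser(root))


# Prints the absolute form of the default location on this machine.
print(resolve_model_root())
```

Passing an explicit root is useful when the home directory is read-only or when several projects should share one weight cache.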

gluoncv.model_zoo.get_cifar_resnet(version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

gluoncv.model_zoo.get_cifar_wide_resnet(num_layers, width_factor=1, drop_rate=0.0, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Wide ResNet model from “Wide Residual Networks” paper.

Parameters
  • num_layers (int) – Number of layers. Needs to be an integer in the form of 6*n+4, e.g. 16, 28, 40.

  • width_factor (int) – The width factor to apply to the number of channels from the original resnet.

  • drop_rate (float) – The rate of dropout.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
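
The pre-defined CIFAR depths in this module fit two arithmetic patterns: the ResNet depths 20, 56 and 110 have the form 6*n+2, and the WideResNet depths 16, 28 and 40 have the form 6*n+4. The sketch below checks a depth against such a rule and recovers n. blocks_per_stage is a hypothetical helper, and reading the depth as "a fixed offset plus a multiple of six" is an assumption drawn from those listed depths.

```python
def blocks_per_stage(num_layers, offset):
    """Return n such that num_layers == 6 * n + offset, with n >= 1.

    Use offset=2 for the CIFAR ResNet depths and offset=4 for the
    CIFAR WideResNet depths listed in this module.
    """
    n, remainder = divmod(num_layers - offset, 6)
    if num_layers <= offset or remainder != 0:
        raise ValueError(
            "num_layers=%d is not of the form 6*n+%d" % (num_layers, offset)
        )
    return n


print(blocks_per_stage(20, offset=2))  # 3
print(blocks_per_stage(28, offset=4))  # 4
```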

gluoncv.model_zoo.get_darknet(darknet_version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get darknet by version and num_layers info.

Parameters
  • darknet_version (str) – Darknet version, choices are [‘v3’].

  • num_layers (int) – Number of layers.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

Darknet network.

Return type

mxnet.gluon.HybridBlock

Examples

>>> model = get_darknet('v3', 53, pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

DeepLabV3

Parameters
  • dataset (str, default pascal_voc) – The dataset the model is pretrained on. One of pascal_voc, pascal_aug, ade20k, coco, citys.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab(dataset='pascal_voc', backbone='resnet50', pretrained=False)
>>> print(model)
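
The dataset-specific DeepLabV3 constructors in this module are named get_deeplab_<backbone>_<suffix>, where the suffix is a short form of the dataset name (voc, ade, coco, citys). The sketch below maps a dataset and backbone onto that naming pattern. DATASET_SUFFIX and deeplab_constructor_name are hypothetical, the mapping is inferred from the function names in this module, and not every combination necessarily has a matching constructor.

```python
# Inferred from constructor names such as get_deeplab_resnet101_ade;
# this table is an assumption of the sketch, not a GluonCV constant.
DATASET_SUFFIX = {
    "pascal_voc": "voc",
    "ade20k": "ade",
    "coco": "coco",
    "citys": "citys",
}


def deeplab_constructor_name(dataset, backbone):
    """Return a name such as 'get_deeplab_resnet101_voc'."""
    if dataset not in DATASET_SUFFIX:
        raise ValueError("unsupported dataset: %r" % (dataset,))
    return "get_deeplab_{}_{}".format(backbone, DATASET_SUFFIX[dataset])


print(deeplab_constructor_name("ade20k", "resnet101"))
# get_deeplab_resnet101_ade
```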
gluoncv.model_zoo.get_deeplab_plus(dataset='pascal_voc', backbone='xception', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

DeepLabV3Plus

Parameters
  • dataset (str, default pascal_voc) – The dataset the model is pretrained on. One of pascal_voc, ade20k.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_plus(dataset='pascal_voc', backbone='xception', pretrained=False)
>>> print(model)
gluoncv.model_zoo.get_deeplab_plus_xception_coco(**kwargs)[source]

DeepLabV3Plus

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_plus_xception_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet101_ade(**kwargs)[source]

DeepLabV3

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet101_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet101_citys(**kwargs)[source]

DeepLabV3

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet101_citys(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet101_coco(**kwargs)[source]

DeepLabV3

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet101_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet101_voc(**kwargs)[source]

DeepLabV3

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet101_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet152_coco(**kwargs)[source]

DeepLabV3

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet152_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet152_voc(**kwargs)[source]

DeepLabV3

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet152_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet50_ade(**kwargs)[source]

DeepLabV3

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet50_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_resnet50_citys(**kwargs)[source]

DeepLabV3

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet50_citys(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplab_v3b_plus_wideresnet_citys(**kwargs)[source]

DeepLabWV3Plus

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_v3b_plus_wideresnet_citys(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_deeplabv3b_plus(dataset='citys', backbone='wideresnet', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

DeepLabWV3Plus

Parameters
  • dataset (str, default citys) – The dataset that model pretrained on. (pascal_voc, ade20k, citys)

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplabv3b_plus(dataset='citys', backbone='wideresnet', pretrained=False)
>>> print(model)
gluoncv.model_zoo.get_faster_rcnn(name, dataset, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Utility function to return faster rcnn networks.

Parameters
  • name (str) – Model name.

  • dataset (str) – The name of dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

Returns

The Faster-RCNN network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.get_fcn(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), pretrained_base=True, **kwargs)[source]

FCN model from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters
  • dataset (str, default pascal_voc) – The dataset that model pretrained on. (pascal_voc, ade20k)

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • pretrained_base (bool or str, default True) – This will load pretrained backbone network, that was trained on ImageNet.

Examples

>>> model = get_fcn(dataset='pascal_voc', backbone='resnet50', pretrained=False)
>>> print(model)
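
The per-dataset helpers in this reference appear to follow a get_<model>_<backbone>_<dataset> naming pattern, for example get_fcn_resnet50_voc. A minimal sketch of composing such a name is shown below. The function factory_name and the DATASET_SHORT table are hypothetical and inferred only from the function names listed on this page; not every combination is necessarily defined.

```python
# Hypothetical helper: compose the name of a per-dataset factory function
# from the naming pattern visible in this reference.
# The short dataset names are inferred from the listed function names
# and are an assumption, not part of the gluoncv API.
DATASET_SHORT = {
    "pascal_voc": "voc",
    "ade20k": "ade",
    "citys": "citys",
    "coco": "coco",
}


def factory_name(model, backbone, dataset):
    """Return a name such as 'get_fcn_resnet50_voc'."""
    if dataset not in DATASET_SHORT:
        raise ValueError("unknown dataset: %r" % (dataset,))
    return "get_%s_%s_%s" % (model, backbone, DATASET_SHORT[dataset])


print(factory_name("fcn", "resnet50", "pascal_voc"))
```

Whether a helper with the composed name exists can be checked with getattr(gluoncv.model_zoo, name, None) before calling it.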
gluoncv.model_zoo.get_fcn_resnet101_ade(**kwargs)[source]

FCN model with base network ResNet-101 pre-trained on ADE20K dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet101_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_fcn_resnet101_coco(**kwargs)[source]

FCN model with base network ResNet-101 pre-trained on COCO dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet101_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_fcn_resnet101_voc(**kwargs)[source]

FCN model with base network ResNet-101 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet101_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_fcn_resnet50_ade(**kwargs)[source]

FCN model with base network ResNet-50 pre-trained on ADE20K dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet50_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_fcn_resnet50_voc(**kwargs)[source]

FCN model with base network ResNet-50 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet50_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_hrnet(model_name, stage_interp_type='nearest', purpose='cls', pretrained=False, ctx=cpu(0), root='~/.mxnet/models', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, num_classes=1000, **kwargs)[source]

HRNet model from the “Deep High-Resolution Representation Learning for Visual Recognition” paper.

Parameters
  • model_name (string) – The name of hrnet models: w18_small_v1/w18_small_v2/w30/w32/w40/w44/w48.

  • stage_interp_type (string) – The interpolation type for upsample in each stage, nearest, bilinear and bilinear_like are supported.

  • purpose (string) – The purpose of model, cls and seg are supported.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.get_mask_rcnn(name, dataset, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Utility function to return mask rcnn networks.

Parameters
  • name (str) – Model name.

  • dataset (str) – The name of dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

Returns

The Mask RCNN network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.get_mobilenet(multiplier, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper.

Parameters
  • multiplier (float) – The width multiplier for controlling the model size. Only multipliers that are no less than 0.25 are supported. The actual number of channels is equal to the original channel size multiplied by this multiplier.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
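
To make the effect of multiplier concrete, here is a minimal sketch of how per-layer channel counts scale with it. The helper scale_channels is hypothetical, and truncating with int() is an assumption about the rounding; the sample channel sizes are illustrative only.

```python
def scale_channels(channels, multiplier):
    """Scale a list of channel counts by a width multiplier.

    Hypothetical helper for illustration; not part of gluoncv.
    Truncation with int() is an assumed rounding rule.
    """
    if multiplier < 0.25:
        # The reference states that multipliers below 0.25 are unsupported.
        raise ValueError("multiplier must be no less than 0.25")
    return [int(c * multiplier) for c in channels]


# Illustrative base channel sizes, halved by multiplier=0.5.
print(scale_channels([32, 64, 128, 256], 0.5))
```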

gluoncv.model_zoo.get_mobilenet_v2(multiplier, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper (https://arxiv.org/abs/1801.04381).

Parameters
  • multiplier (float) – The width multiplier for controlling the model size. Only multipliers that are no less than 0.25 are supported. The actual number of channels is equal to the original channel size multiplied by this multiplier.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.get_model(name, **kwargs)[source]

Returns a pre-defined model by name

Parameters
  • name (str) – Name of the model.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • classes (int) – Number of classes for the output layer.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Returns

The model.

Return type

HybridBlock

gluoncv.model_zoo.get_model_list()[source]

Get the entire list of model names in model_zoo.

Returns

Entire list of model names in model_zoo.

Return type

list of str
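
Since get_model_list returns plain strings, the result can be filtered before a name is passed to get_model. The sketch below uses a hypothetical helper, find_models, and a hand-written sample list so that it runs without gluoncv installed; the sample names are borrowed from functions listed in this reference, and whether each one is also a valid get_model name is an assumption.

```python
def find_models(names, *keywords):
    """Return the sorted names that contain every keyword.

    Hypothetical helper for illustration; not part of gluoncv.
    """
    return sorted(n for n in names if all(k in n for k in keywords))


# In practice the list would come from gluoncv.model_zoo.get_model_list().
sample_names = [
    "cifar_resnet20_v1",
    "cifar_resnet56_v2",
    "resnet18_v1b",
    "cifar_wideresnet16_10",
]
print(find_models(sample_names, "cifar", "v1"))
```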

gluoncv.model_zoo.get_nasnet(repeat=6, penultimate_filters=4032, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

NASNet A model from “Learning Transferable Architectures for Scalable Image Recognition” paper

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.get_psp(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), pretrained_base=True, **kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • dataset (str, default pascal_voc) – The dataset that model pretrained on. (pascal_voc, ade20k)

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • pretrained_base (bool or str, default True) – This will load pretrained backbone network, that was trained on ImageNet.

Examples

>>> model = get_psp(dataset='pascal_voc', backbone='resnet50', pretrained=False)
>>> print(model)
gluoncv.model_zoo.get_psp_resnet101_ade(**kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet101_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_psp_resnet101_citys(**kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet101_citys(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_psp_resnet101_coco(**kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet101_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_psp_resnet101_voc(**kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet101_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_psp_resnet50_ade(**kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet50_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_resnet(version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', use_se=False, **kwargs)[source]

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • version (int) – Version of ResNet. Options are 1, 2.

  • num_layers (int) – Numbers of layers. Options are 18, 34, 50, 101, 152.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • use_se (bool, default False) – Whether to use Squeeze-and-Excitation module

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
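
Because version and num_layers only accept the values listed above, it can help to check them before building the network. The following is a minimal sketch; resnet_args is a hypothetical helper, and only the option sets are taken from the parameter list.

```python
# Option sets copied from the get_resnet parameter list above.
RESNET_VERSIONS = (1, 2)
RESNET_NUM_LAYERS = (18, 34, 50, 101, 152)


def resnet_args(version, num_layers, use_se=False):
    """Validate the documented options and return keyword arguments.

    Hypothetical helper for illustration; not part of gluoncv.
    """
    if version not in RESNET_VERSIONS:
        raise ValueError("version must be one of %s" % (RESNET_VERSIONS,))
    if num_layers not in RESNET_NUM_LAYERS:
        raise ValueError("num_layers must be one of %s" % (RESNET_NUM_LAYERS,))
    return {"version": version, "num_layers": num_layers, "use_se": use_se}


print(resnet_args(2, 50))
```

The returned dictionary could then be passed on as get_resnet(**resnet_args(2, 50)).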

gluoncv.model_zoo.get_se_resnet(version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

SE_ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. SE_ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • version (int) – Version of ResNet. Options are 1, 2.

  • num_layers (int) – Numbers of layers. Options are 18, 34, 50, 101, 152.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.get_ssd(name, base_size, features, filters, sizes, ratios, steps, classes, dataset, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get SSD models.

Parameters
  • name (str or None) – Model name, if None is used, you must specify features to be a HybridBlock.

  • base_size (int) – Base image size for training, this is fixed once training is assigned. A fixed base size still allows you to have variable input size during test.

  • features (iterable of str or HybridBlock) – List of network internal output names, in order to specify which layers are used for predicting bbox values. If name is None, features must be a HybridBlock which generate multiple outputs for prediction.

  • filters (iterable of float or None) – List of convolution layer channels which is going to be appended to the base network feature extractor. If name is None, this is ignored.

  • sizes (iterable of float) – Sizes of anchor boxes, this should be a list of floats, in incremental order. The length of sizes must be len(layers) + 1. For example, a two stage SSD model can have sizes = [30, 60, 90], and it converts to [30, 60] and [60, 90] for the two stages, respectively. For more details, please refer to original paper.

  • ratios (iterable of list) – Aspect ratios of anchors in each output layer. Its length must be equal to the number of SSD output layers.

  • steps (list of int) – Step size of anchor boxes in each output layer.

  • classes (iterable of str) – Names of categories.

  • dataset (str) – Name of dataset. This is used to identify model name because models trained on different datasets are going to be very different.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

A SSD detection network.

Return type

HybridBlock
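
The conversion of sizes described above, where a list of len(layers) + 1 values becomes one [min, max] pair per stage, can be sketched as follows. The helper pair_sizes is hypothetical and only illustrates the pairing; it is not taken from the gluoncv source.

```python
def pair_sizes(sizes):
    """Turn N + 1 increasing anchor sizes into N [min, max] pairs.

    Hypothetical helper for illustration; not part of gluoncv.
    """
    if len(sizes) < 2:
        raise ValueError("sizes needs at least two values")
    if any(b <= a for a, b in zip(sizes, sizes[1:])):
        raise ValueError("sizes must be in increasing order")
    # Each stage uses consecutive entries as its lower and upper size.
    return [[a, b] for a, b in zip(sizes, sizes[1:])]


# The two-stage example from the parameter description.
print(pair_sizes([30, 60, 90]))
```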

gluoncv.model_zoo.get_vgg(num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

VGG model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • num_layers (int) – Number of layers for the variant of VGG. Options are 11, 13, 16, 19.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

gluoncv.model_zoo.get_vgg_atrous_extractor(num_layers, im_size, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get VGG atrous feature extractor networks.

Parameters
  • num_layers (int) – VGG types, can be 11,13,16,19.

  • im_size (int) – VGG detection input size, can be 300, 512.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mx.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

Returns

The returned network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.get_xcetption(pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Xception model from

Parameters
gluoncv.model_zoo.get_xcetption_71(pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Xception model from

Parameters
gluoncv.model_zoo.get_yolov3(name, stages, filters, anchors, strides, classes, dataset, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get YOLOV3 models.

Parameters
  • name (str or None) – Model name, if None is used, you must specify features to be a HybridBlock.

  • stages – List of network internal output names, in order to specify which layers are used for predicting bbox values. If name is None, features must be a HybridBlock which generate multiple outputs for prediction.

  • filters (iterable of float or None) – List of convolution layer channels which is going to be appended to the base network feature extractor. If name is None, this is ignored.

  • sizes (iterable of float) – Sizes of anchor boxes, this should be a list of floats, in incremental order. The length of sizes must be len(layers) + 1. For example, a two stage SSD model can have sizes = [30, 60, 90], and it converts to [30, 60] and [60, 90] for the two stages, respectively. For more details, please refer to original paper.

  • ratios (iterable of list) – Aspect ratios of anchors in each output layer. Its length must be equal to the number of SSD output layers.

  • steps (list of int) – Step size of anchor boxes in each output layer.

  • classes (iterable of str) – Names of categories.

  • dataset (str) – Name of dataset. This is used to identify model name because models trained on different datasets are going to be very different.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

A YOLOV3 detection network.

Return type

HybridBlock

gluoncv.model_zoo.googlenet(classes=1000, pretrained=False, pretrained_base=True, ctx=cpu(0), dropout_ratio=0.4, aux_logits=False, root='~/.mxnet/models', partial_bn=False, **kwargs)[source]

GoogleNet model from the “Going Deeper with Convolutions” and “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” papers.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.hrnet_w18_small_v1_c(**kwargs)[source]

hrnet_w18_small_v1 for ImageNet classification

gluoncv.model_zoo.hrnet_w18_small_v1_s(**kwargs)[source]

hrnet_w18_small_v1 for Cityscapes segmentation

gluoncv.model_zoo.hrnet_w18_small_v2_c(**kwargs)[source]

hrnet_w18_small_v2 for ImageNet classification

gluoncv.model_zoo.hrnet_w18_small_v2_s(**kwargs)[source]

hrnet_w18_small_v2 for Cityscapes segmentation

gluoncv.model_zoo.hrnet_w30_c(**kwargs)[source]

hrnet_w30 for ImageNet classification

gluoncv.model_zoo.hrnet_w32_c(**kwargs)[source]

hrnet_w32 for ImageNet classification

gluoncv.model_zoo.hrnet_w40_c(**kwargs)[source]

hrnet_w40 for ImageNet classification

gluoncv.model_zoo.hrnet_w44_c(**kwargs)[source]

hrnet_w44 for ImageNet classification

gluoncv.model_zoo.hrnet_w48_c(**kwargs)[source]

hrnet_w48 for ImageNet classification

gluoncv.model_zoo.hrnet_w48_s(**kwargs)[source]

hrnet_w48 for Cityscapes segmentation

gluoncv.model_zoo.i3d_inceptionv1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inception v1 model trained on Kinetics400 dataset from “Going Deeper with Convolutions” paper.

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • feat_ext (bool) – Whether to extract features before dense classification layer or do a complete forward pass.
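
As an illustration of what dividing a video evenly into num_segments parts can mean, the sketch below computes frame index ranges for each segment. The helper segment_bounds is hypothetical and is not the sampling code used by gluoncv; it only shows one way to split num_frames so that segment lengths differ by at most one frame.

```python
def segment_bounds(num_frames, num_segments):
    """Split frame indices [0, num_frames) into num_segments ranges.

    Hypothetical helper for illustration; not part of gluoncv.
    Each range is a (start, end) pair with an exclusive end index.
    """
    if num_segments < 1 or num_frames < num_segments:
        raise ValueError("need 1 <= num_segments <= num_frames")
    # Integer arithmetic keeps the edges evenly spaced without float error.
    edges = [i * num_frames // num_segments for i in range(num_segments + 1)]
    return list(zip(edges[:-1], edges[1:]))


print(segment_bounds(100, 4))
```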

gluoncv.model_zoo.i3d_inceptionv3_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inception v3 model trained on Kinetics400 dataset from “Rethinking the Inception Architecture for Computer Vision” paper.

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • feat_ext (bool) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_nl10_resnet101_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet101 backbone and 10 non-local blocks trained on Kinetics400 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_nl10_resnet50_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet50 backbone and 10 non-local blocks trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
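To make the interaction of num_segments and num_crop at evaluation time more concrete, here is a small sketch that averages per-view class scores for one video. It assumes that each video yields num_segments * num_crop views and that their scores are simply averaged. Both the helper name average_over_views and that averaging rule are illustrative assumptions, not a description of the library internals.

```python
VALID_NUM_CROP = (1, 3, 10)  # choices listed in the parameter description


def average_over_views(view_scores, num_segments=1, num_crop=1):
    """Average class scores over all views of a single video.

    view_scores: list of per-view score lists, one inner list per view.
    Hypothetical helper for illustration only.
    """
    if num_crop not in VALID_NUM_CROP:
        raise ValueError("num_crop must be 1, 3 or 10")
    views = num_segments * num_crop
    if len(view_scores) != views:
        raise ValueError("expected %d views, got %d" % (views, len(view_scores)))
    nclass = len(view_scores[0])
    # Mean over the view axis, computed separately for every class.
    return [sum(v[c] for v in view_scores) / views for c in range(nclass)]


scores = average_over_views(
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], num_segments=1, num_crop=3
)
print(scores)  # [0.5, 0.5]
```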

gluoncv.model_zoo.i3d_nl5_resnet101_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet101 backbone and 5 non-local blocks trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
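The partial_bn description above (freeze every batch normalization layer except the first) can be pictured with a short sketch. The layer names and the helper bn_layers_to_freeze are hypothetical. Real networks expose their layers differently, so treat this only as an illustration of the selection rule.

```python
def bn_layers_to_freeze(layer_names, partial_bn=False):
    """Return the batch-norm layers that would be frozen during training.

    Hypothetical helper: it assumes batch-norm layers can be recognised by
    the substring "batchnorm" in their name and that layer_names is given
    in network order.
    """
    bn_layers = [name for name in layer_names if "batchnorm" in name.lower()]
    # With partial_bn enabled, the first BN layer stays trainable.
    return bn_layers[1:] if partial_bn else []


layers = ["conv0", "batchnorm0", "conv1", "batchnorm1", "batchnorm2", "dense0"]
print(bn_layers_to_freeze(layers, partial_bn=True))  # ['batchnorm1', 'batchnorm2']
```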

gluoncv.model_zoo.i3d_nl5_resnet50_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet50 backbone and 5 non-local blocks trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_resnet101_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet101 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_resnet50_v1_custom(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, use_kinetics_pretrain=True, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet50 backbone, customizable for a user's own dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • use_kinetics_pretrain (bool.) – Whether to load Kinetics-400 pre-trained model weights.
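As a usage sketch for the custom variant, the snippet below collects keyword arguments for a dataset with a user-chosen number of classes and defers the actual model construction to a function, so nothing is downloaded until it is called. The helper names custom_i3d_kwargs and build_custom_i3d are assumptions for illustration. Only the keyword names come from the signature above.

```python
def custom_i3d_kwargs(nclass, kinetics_init=True):
    """Keyword arguments for i3d_resnet50_v1_custom (names from its signature)."""
    return {
        "nclass": nclass,                        # categories in your own dataset
        "pretrained": False,                     # no dataset-specific checkpoint
        "use_kinetics_pretrain": kinetics_init,  # Kinetics-400 weights, per the docs
        "num_segments": 1,
        "num_crop": 1,
        "feat_ext": False,
    }


def build_custom_i3d(nclass, kinetics_init=True):
    """Construct the network; requires gluoncv and mxnet to be installed."""
    from gluoncv.model_zoo import i3d_resnet50_v1_custom
    return i3d_resnet50_v1_custom(**custom_i3d_kwargs(nclass, kinetics_init))


print(custom_i3d_kwargs(10)["nclass"])  # 10
```

Calling build_custom_i3d(10) would then return a network sized for ten classes. Whether weights are fetched at that point depends on the flags passed.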

gluoncv.model_zoo.i3d_resnet50_v1_hmdb51(nclass=51, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, use_kinetics_pretrain=True, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet50 backbone trained on HMDB51 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_resnet50_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, bn_frozen=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet50 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
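The feat_ext flag switches between returning features and running the complete forward pass. The toy class below sketches that behaviour in plain Python. ToyClassifier, its identity "backbone" and its weight layout are all hypothetical stand-ins and do not mirror the I3D implementation.

```python
class ToyClassifier:
    """Tiny stand-in for a network with a feature extractor and a dense head."""

    def __init__(self, weights, feat_ext=False):
        self.weights = weights    # one row of weights per output class
        self.feat_ext = feat_ext  # True: stop before the dense layer

    def features(self, x):
        # Placeholder backbone: passes the input through unchanged.
        return [float(v) for v in x]

    def __call__(self, x):
        feat = self.features(x)
        if self.feat_ext:
            return feat  # features before the dense classification layer
        # Complete forward pass: dense layer on top of the features.
        return [sum(w * f for w, f in zip(row, feat)) for row in self.weights]


weights = [[1, 0], [0, 2]]
print(ToyClassifier(weights)([3, 4]))                 # [3.0, 8.0]
print(ToyClassifier(weights, feat_ext=True)([3, 4]))  # [3.0, 4.0]
```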

gluoncv.model_zoo.i3d_resnet50_v1_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet50 backbone trained on Something-Something-V2 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_resnet50_v1_ucf101(nclass=101, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, use_kinetics_pretrain=True, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet50 backbone trained on UCF101 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.inception_v3(pretrained=False, ctx=cpu(0), root='~/.mxnet/models', partial_bn=False, **kwargs)[source]

Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
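The norm_layer and norm_kwargs parameters are meant to be used together. A minimal sketch, assuming four devices as in the num_devices=4 example above, might look as follows. The helper names are hypothetical, and the model is only built when build_inception_v3_syncbn is actually called.

```python
def inception_v3_syncbn_kwargs(num_devices=4):
    """Keyword arguments to pair with SyncBatchNorm (hypothetical helper)."""
    return {
        "pretrained": False,
        "norm_kwargs": {"num_devices": num_devices},
    }


def build_inception_v3_syncbn(num_devices=4):
    """Construct Inception v3 with SyncBatchNorm; requires gluoncv and mxnet."""
    from mxnet.gluon.contrib.nn import SyncBatchNorm
    from gluoncv.model_zoo import inception_v3
    return inception_v3(norm_layer=SyncBatchNorm,
                        **inception_v3_syncbn_kwargs(num_devices))


print(inception_v3_syncbn_kwargs()["norm_kwargs"])  # {'num_devices': 4}
```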

gluoncv.model_zoo.inceptionv1_hmdb51(nclass=51, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

InceptionV1 model trained on HMDB51 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.inceptionv1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

InceptionV1 model trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.inceptionv1_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

InceptionV1 model trained on Something-Something-V2 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.inceptionv1_ucf101(nclass=101, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

InceptionV1 model trained on UCF101 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.inceptionv3_hmdb51(nclass=51, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

InceptionV3 model trained on HMDB51 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.inceptionv3_kinetics400(nclass=400, pretrained=False, pretrained_base=True, tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

InceptionV3 model trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.inceptionv3_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

InceptionV3 model trained on Something-Something-V2 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.inceptionv3_ucf101(nclass=101, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

InceptionV3 model trained on UCF101 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.mask_rcnn_fpn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_fpn_resnet101_v1d_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_fpn_resnet18_v1b_coco(pretrained=False, pretrained_base=True, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 1000) – Return top proposal results after NMS in testing of RPN.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
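The roles of rpn_test_pre_nms and rpn_test_post_nms can be illustrated with a plain-Python sketch of proposal filtering: keep the top-scoring boxes, run greedy non-maximum suppression, then cap the survivors. The function names, the corner box format (x1, y1, x2, y2) and the 0.7 IoU threshold are assumptions for illustration. This is not the RPN code used by the library.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_proposals(boxes, scores, pre_nms=6000, post_nms=1000, nms_thresh=0.7):
    """Return indices of proposals kept after top-k filtering and greedy NMS."""
    # Step 1: keep only the pre_nms highest-scoring proposals.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    order = order[:pre_nms]
    keep = []
    for i in order:
        # Step 3: stop once post_nms proposals have survived.
        if len(keep) >= post_nms:
            break
        # Step 2: drop a box that overlaps an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 30, 30), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.1]
print(filter_proposals(boxes, scores, pre_nms=3, post_nms=2))  # [0, 2]
```

In the example, the fourth box is removed by the pre-NMS cut and the second box is suppressed because it overlaps the first with an IoU of 0.9.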

Examples

>>> model = mask_rcnn_fpn_resnet18_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_fpn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_fpn_resnet50_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_fpn_syncbn_mobilenet1_0_coco(pretrained=False, pretrained_base=True, num_devices=0, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices for sync batch norm layer. if less than 1, use all devices available.

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 1000) – Return top proposal results after NMS in testing of RPN.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
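The num_devices rule above (values below 1 mean "use all available devices") can be written as a one-line helper. The name resolve_num_devices and the explicit available argument are hypothetical. How the library itself discovers devices is not shown here.

```python
def resolve_num_devices(num_devices, available):
    """Number of devices a sync batch norm layer would span.

    Hypothetical helper: `available` is supplied by the caller rather than
    queried from the runtime.
    """
    return available if num_devices < 1 else num_devices


print(resolve_num_devices(0, available=8))  # 8
print(resolve_num_devices(4, available=8))  # 4
```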

Examples

>>> model = mask_rcnn_fpn_syncbn_mobilenet1_0_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_fpn_syncbn_resnet18_v1b_coco(pretrained=False, pretrained_base=True, num_devices=0, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices for sync batch norm layer. if less than 1, use all devices available.

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 1000) – Return top proposal results after NMS in testing of RPN.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_fpn_syncbn_resnet18_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_resnet101_v1d_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_resnet18_v1b_coco(pretrained=False, pretrained_base=True, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 1000) – Return top proposal results after NMS in testing of RPN.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_resnet18_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mask_rcnn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_resnet50_v1b_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.mobilenet0_25(**kwargs)[source]

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 0.25.

Parameters
gluoncv.model_zoo.mobilenet0_5(**kwargs)[source]

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 0.5.

Parameters
gluoncv.model_zoo.mobilenet0_75(**kwargs)[source]

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 0.75.

Parameters
gluoncv.model_zoo.mobilenet1_0(**kwargs)[source]

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 1.0.
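The width multiplier in the four MobileNet entries above thins every layer uniformly. As a rough sketch, the snippet below scales the per-stage output channels listed in the MobileNets paper. The constant BASE_CHANNELS and the helper scaled_channels are hypothetical names, and truncating with int() is an assumption about rounding rather than a statement about GluonCV's code.

```python
# Output channels of the MobileNet v1 stages at multiplier 1.0 (from the paper).
BASE_CHANNELS = [32, 64, 128, 128, 256, 256, 512, 512, 512, 512, 512, 512, 1024, 1024]


def scaled_channels(multiplier):
    """Scale every base channel count by the width multiplier."""
    return [int(c * multiplier) for c in BASE_CHANNELS]


for m in (0.25, 0.5, 0.75, 1.0):
    ch = scaled_channels(m)
    # First and last stage widths for this multiplier.
    print(m, ch[0], ch[-1])
```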

Parameters
gluoncv.model_zoo.mobilenet_v2_0_25(**kwargs)[source]

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper (https://arxiv.org/abs/1801.04381), with width multiplier 0.25.

Parameters
gluoncv.model_zoo.mobilenet_v2_0_5(**kwargs)[source]

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper (https://arxiv.org/abs/1801.04381), with width multiplier 0.5.

Parameters
gluoncv.model_zoo.mobilenet_v2_0_75(**kwargs)[source]

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper (https://arxiv.org/abs/1801.04381), with width multiplier 0.75.

Parameters
gluoncv.model_zoo.mobilenet_v2_1_0(**kwargs)[source]

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper (https://arxiv.org/abs/1801.04381), with width multiplier 1.0.

Parameters
gluoncv.model_zoo.nasnet_4_1056(**kwargs)[source]

NASNet A model from “Learning Transferable Architectures for Scalable Image Recognition” paper

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.nasnet_5_1538(**kwargs)[source]

NASNet A model from “Learning Transferable Architectures for Scalable Image Recognition” paper

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.nasnet_6_4032(**kwargs)[source]

NASNet A model from “Learning Transferable Architectures for Scalable Image Recognition” paper

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.nasnet_7_1920(**kwargs)[source]

NASNet A model from “Learning Transferable Architectures for Scalable Image Recognition” paper

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
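The four NASNet constructors above differ only in name, while their parameter lists mention repeat and penultimate_filters. A plausible reading, stated here as an assumption rather than a documented guarantee, is that the two numbers in each name are those two values. The hypothetical helper parse_nasnet_name below illustrates that reading.

```python
def parse_nasnet_name(name):
    """Split 'nasnet_<repeat>_<penultimate_filters>' into its two integers.

    Assumes the naming convention described in the lead-in; this is an
    illustration, not part of the GluonCV API.
    """
    prefix, repeat, filters = name.split("_")
    if prefix != "nasnet":
        raise ValueError("not a NASNet model name: %r" % name)
    return {"repeat": int(repeat), "penultimate_filters": int(filters)}


for model_name in ("nasnet_4_1056", "nasnet_5_1538", "nasnet_6_4032", "nasnet_7_1920"):
    print(model_name, parse_nasnet_name(model_name))
```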

gluoncv.model_zoo.p3d_resnet101_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

The Pseudo 3D network (P3D) with ResNet101 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.p3d_resnet50_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

The Pseudo 3D network (P3D) with ResNet50 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
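To make num_segments concrete: a video of N frames is cut into num_segments equal spans, and one frame (or clip) is taken from each span. The sketch below picks the middle frame of every span. The helper segment_center_indices is a hypothetical name, and center sampling is an assumption; an actual data loader may sample differently, for example at random offsets during training.

```python
def segment_center_indices(num_frames, num_segments):
    """Split num_frames into num_segments equal spans; return each span's middle frame."""
    if num_segments < 1 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    span = num_frames / num_segments
    # Span i covers [span * i, span * (i + 1)); take its midpoint.
    return [int(span * i + span / 2) for i in range(num_segments)]


print(segment_center_indices(30, 3))
print(segment_center_indices(300, 1))
```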

gluoncv.model_zoo.pretrained_model_list()[source]

Get the list of models that have pretrained weights available.
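A typical use of this function is to narrow the returned names down to one family before picking a model. The sketch below assumes the function returns a list of name strings. The helper select_models is a hypothetical name, and the fallback names are stand-ins taken from this page so that the snippet still runs when GluonCV is not installed.

```python
def select_models(names, prefix):
    """Return the sorted model names that start with the given prefix."""
    return sorted(n for n in names if n.startswith(prefix))


try:
    # Requires GluonCV (and its MXNet backend) to be installed.
    from gluoncv import model_zoo
    available = model_zoo.pretrained_model_list()
except ImportError:
    # Stand-in names so the sketch runs without GluonCV.
    available = ["resnet18_v1b", "resnet101_v1d", "mobilenet1_0"]

print(select_models(available, "resnet"))
```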

gluoncv.model_zoo.r2plus1d_resnet101_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

R2Plus1D with ResNet101 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
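The signature above shows root='~/.mxnet/models' while the parameter description says $MXNET_HOME/models. One way to reconcile the two, assumed here rather than taken from this page, is that MXNET_HOME falls back to ~/.mxnet when the environment variable is unset. The hypothetical helper resolve_model_root sketches that lookup.

```python
import os


def resolve_model_root(root=None, env=None):
    """Return the directory where parameter files would be stored.

    Illustrative only: assumes MXNET_HOME defaults to ~/.mxnet.
    """
    env = os.environ if env is None else env
    if root is None:
        home = env.get("MXNET_HOME", os.path.join("~", ".mxnet"))
        root = os.path.join(home, "models")
    # Expand a leading "~" into the user's home directory.
    return os.path.expanduser(root)


print(resolve_model_root())
print(resolve_model_root(env={"MXNET_HOME": "/data/mxnet"}))
```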

gluoncv.model_zoo.r2plus1d_resnet152_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

R2Plus1D with ResNet152 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.r2plus1d_resnet18_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

R2Plus1D with ResNet18 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.r2plus1d_resnet34_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

R2Plus1D with ResNet34 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.r2plus1d_resnet50_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

R2Plus1D with ResNet50 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.residualattentionnet128(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.residualattentionnet164(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.residualattentionnet200(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.residualattentionnet236(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.residualattentionnet452(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.residualattentionnet56(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.residualattentionnet92(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
gluoncv.model_zoo.resnet101_v1(**kwargs)[source]

ResNet-101 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
gluoncv.model_zoo.resnet101_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of minibatch statistics; optionally set to True when finetuning from ImageNet classification pretrained models.
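The effect of dilated on the output stride can be worked out from the per-stage strides of a standard ResNet. In the sketch below, the stage layout (conv1, max-pool, four residual stages) is the usual ResNet design and is assumed here; output_stride is a hypothetical helper, not a GluonCV function.

```python
def output_stride(dilated=False):
    """Overall downsampling factor of a ResNet backbone at its last stage."""
    # conv1 and max-pool each halve the resolution; layer1 keeps it; layer2 halves it.
    strides = [2, 2, 1, 2]
    # Without dilation, layer3 and layer4 each downsample by 2 again.
    # With dilation, they keep stride 1 and use dilated convolutions instead.
    strides += [1, 1] if dilated else [2, 2]
    total = 1
    for s in strides:
        total *= s
    return total


for flag in (False, True):
    stride = output_stride(flag)
    # Spatial size of the final feature map for a 224x224 input.
    print(flag, stride, 224 // stride)
```

The denser stride-8 feature map is what makes the dilated variant useful as a backbone for dense prediction tasks such as segmentation.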

gluoncv.model_zoo.resnet101_v1b_gn(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-101 GroupNorm model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of minibatch statistics; optionally set to True when finetuning from ImageNet classification pretrained models.

gluoncv.model_zoo.resnet101_v1b_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet101 model trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.
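The partial_bn rule above (freeze every batch normalization layer except the first) can be expressed as a simple pass over the layers in order. The sketch below works on plain layer names. The helper bn_trainable_flags and the example names are hypothetical, and identifying BatchNorm layers by the substring "batchnorm" is an assumption made for illustration.

```python
def bn_trainable_flags(layer_names):
    """Map each BatchNorm layer name to True (trainable) or False (frozen)."""
    flags = {}
    seen_first = False
    for name in layer_names:
        if "batchnorm" not in name:
            continue  # only BatchNorm layers are affected
        flags[name] = not seen_first  # the first BN layer stays trainable
        seen_first = True
    return flags


layers = ["conv0", "batchnorm0", "conv1", "batchnorm1", "batchnorm2"]
print(bn_trainable_flags(layers))
```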

gluoncv.model_zoo.resnet101_v1b_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet101 model trained on Something-Something-V2 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet101_v1c(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1c-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.
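The pretrained parameter accepts either a boolean or a string. The sketch below shows one way such a value could be mapped to a parameter file. Everything in it is hypothetical: the DEFAULT_SHA1 table holds a made-up checksum, and the "<name>-<tag>.params" file naming is an assumption, not something this page documents.

```python
# Made-up checksum table for illustration; real hashes are not reproduced here.
DEFAULT_SHA1 = {"resnet101_v1c": "0123456789abcdef0123456789abcdef01234567"}


def params_file_name(name, pretrained):
    """Pick the parameter file for a model, or None when no weights are loaded."""
    if pretrained is False or pretrained is None:
        return None  # model keeps its random initialization
    if pretrained is True:
        tag = DEFAULT_SHA1[name][:8]  # default version of the weights
    else:
        tag = str(pretrained)  # caller-selected version
    return "%s-%s.params" % (name, tag)


print(params_file_name("resnet101_v1c", True))
print(params_file_name("resnet101_v1c", "deadbeef"))
print(params_file_name("resnet101_v1c", False))
```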

gluoncv.model_zoo.resnet101_v1d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1d-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet101_v1e(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1e-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet101_v1s(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1s-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet101_v2(**kwargs)[source]

ResNet-101 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.resnet152_v1(**kwargs)[source]

ResNet-152 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
gluoncv.model_zoo.resnet152_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-152 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of minibatch statistics; optionally set to True when finetuning from ImageNet classification pretrained models.
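The motivation usually given for last_gamma is that a zero gamma in the last BatchNorm of a residual branch makes each block start out as an identity mapping. The sketch below shows this with plain lists. The helper residual_block is a hypothetical name, and the branch values stand in for an already normalized branch output.

```python
def residual_block(x, branch_out, gamma, beta=0.0):
    """Element-wise y = relu(x + gamma * branch_out + beta)."""
    return [max(0.0, xi + gamma * bi + beta) for xi, bi in zip(x, branch_out)]


x = [0.5, 1.0, 2.0]
branch = [3.0, -4.0, 0.25]  # normalized output of the residual branch

# With gamma = 1 the branch changes the result.
print(residual_block(x, branch, gamma=1.0))
# With gamma = 0 the block reduces to relu(x), i.e. the identity for x >= 0.
print(residual_block(x, branch, gamma=0.0))
```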

gluoncv.model_zoo.resnet152_v1b_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet152 model trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet152_v1b_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet152 model trained on Something-Something-V2 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet152_v1c(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1c-152 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet152_v1d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1d-152 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet152_v1e(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1e-152 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet152_v1s(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1s-152 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet152_v2(**kwargs)[source]

ResNet-152 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.resnet18_v1(**kwargs)[source]

ResNet-18 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
gluoncv.model_zoo.resnet18_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-18 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of minibatch statistics; optionally set to True when finetuning from ImageNet classification pretrained models.
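The difference between minibatch and global statistics in use_global_stats can be seen on a one-dimensional batch. The sketch below is a simplified normalization without the learned scale and shift. The helper batch_norm and its arguments are hypothetical names, and the running statistics are made-up values.

```python
def batch_norm(values, running_mean, running_var, use_global_stats=False, eps=1e-5):
    """Normalize a 1-D batch with either minibatch or stored (global) statistics."""
    if use_global_stats:
        # Stored statistics: the result does not depend on the rest of the batch.
        mean, var = running_mean, running_var
    else:
        # Minibatch statistics: computed from the current values only.
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


batch = [2.0, 4.0, 6.0]
local = batch_norm(batch, running_mean=0.0, running_var=1.0)
frozen = batch_norm(batch, running_mean=0.0, running_var=1.0, use_global_stats=True)
print(local)
print(frozen)
```

With small finetuning batches the minibatch estimates become noisy, which is one reason to fall back on the stored statistics.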

gluoncv.model_zoo.resnet18_v1b_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet18 model trained on Kinetics400 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3 or 10.

  • partial_bn (bool, default False) – Whether to freeze all batch normalization layers during training except the first layer.
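
To make num_segments concrete, here is a minimal sketch of dividing a video evenly into segments and picking one frame from each. Taking the middle frame of every segment is an illustrative choice, not necessarily what GluonCV's data pipeline does; segment_bounds and center_frames are hypothetical helper names.

```python
def segment_bounds(num_frames, num_segments=1):
    """Split frame indices [0, num_frames) into num_segments near-equal ranges."""
    if num_segments < 1 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    # Integer arithmetic spreads the boundaries evenly without float rounding.
    edges = [i * num_frames // num_segments for i in range(num_segments + 1)]
    return list(zip(edges[:-1], edges[1:]))


def center_frames(num_frames, num_segments=1):
    """Pick the middle frame index of each segment."""
    return [(start + end) // 2
            for start, end in segment_bounds(num_frames, num_segments)]


print(segment_bounds(30, 3))  # [(0, 10), (10, 20), (20, 30)]
print(center_frames(30, 3))   # [5, 15, 25]
```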

gluoncv.model_zoo.resnet18_v1b_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet18 model trained on Something-Something-V2 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3 or 10.

  • partial_bn (bool, default False) – Whether to freeze all batch normalization layers during training except the first layer.
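
The interaction between pretrained and pretrained_base described above can be summarized as a small decision function. This is a sketch of the documented behavior only; weight_plan is a hypothetical helper, and the dictionary it returns is an illustrative format rather than anything GluonCV produces.

```python
def weight_plan(pretrained=False, pretrained_base=True):
    """Describe which weights would be loaded for the given flags."""
    if pretrained:
        # Full pretrained weights take precedence; pretrained_base is ignored.
        version = pretrained if isinstance(pretrained, str) else "default"
        return {"base": "pretrained", "extra_layers": "pretrained",
                "version": version}
    if pretrained_base:
        # Only the base network is loaded; extra layers start from random values.
        return {"base": "pretrained", "extra_layers": "random", "version": None}
    return {"base": "random", "extra_layers": "random", "version": None}


print(weight_plan())                                       # base only
print(weight_plan(pretrained=True, pretrained_base=False)) # full weights
```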

gluoncv.model_zoo.resnet18_v2(**kwargs)[source]

ResNet-18 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.resnet34_v1(**kwargs)[source]

ResNet-34 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
gluoncv.model_zoo.resnet34_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-34 model.

Parameters
  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of minibatch statistics; optionally set to True when fine-tuning from ImageNet classification pretrained models.
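
A minimal usage sketch for the constructor above, assuming MXNet and GluonCV are installed. The 224x224 input size and the random initialization are choices made for this example; resnet34_v1b_output_shape is a hypothetical wrapper name. With pretrained=False no weights are downloaded.

```python
def resnet34_v1b_output_shape(dilated=False, batch_size=1):
    """Build an untrained ResNetV1b-34 and return the shape of its output."""
    # Imported inside the function so that defining it does not require MXNet.
    import mxnet as mx
    from gluoncv import model_zoo

    # pretrained=False skips the weight download; parameters are random.
    net = model_zoo.resnet34_v1b(pretrained=False, dilated=dilated)
    net.initialize()

    # One batch of random RGB images in NCHW layout (assumed input size).
    x = mx.nd.random.uniform(shape=(batch_size, 3, 224, 224))
    return net(x).shape
```

Calling resnet34_v1b_output_shape() runs one forward pass on the default CPU context; pass dilated=True to build the stride 8 variant instead.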

gluoncv.model_zoo.resnet34_v1b_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet34 model trained on Kinetics400 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3 or 10.

  • partial_bn (bool, default False) – Whether to freeze all batch normalization layers during training except the first layer.
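
The signature shows root='~/.mxnet/models' while the parameter list gives $MXNET_HOME/models. The sketch below shows one way the two could be reconciled, assuming that MXNET_HOME falls back to ~/.mxnet when it is unset; that fallback is an assumption not stated on this page, and resolve_model_root is a hypothetical helper.

```python
import os


def resolve_model_root(root=None, env=None):
    """Return the directory where model parameters would be kept."""
    env = os.environ if env is None else env
    if root is None:
        # Assumed fallback: MXNET_HOME defaults to ~/.mxnet when unset.
        home = env.get("MXNET_HOME", os.path.join("~", ".mxnet"))
        root = os.path.join(home, "models")
    # Expand a leading ~ to the current user's home directory.
    return os.path.expanduser(root)


print(resolve_model_root(env={"MXNET_HOME": "/tmp/mx"}))
print(resolve_model_root("/data/models"))
```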

gluoncv.model_zoo.resnet34_v1b_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet34 model trained on Something-Something-V2 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3 or 10.

  • partial_bn (bool, default False) – Whether to freeze all batch normalization layers during training except the first layer.
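
At evaluation time num_segments and num_crop together determine how many clips represent one video. The sketch below assumes that per-clip class scores are combined by simple averaging; that consensus rule is an assumption for illustration, and clips_per_video and video_scores are hypothetical helper names.

```python
def clips_per_video(num_segments=1, num_crop=1):
    """Number of clips scored per video: one per (segment, crop) pair."""
    if num_crop not in (1, 3, 10):
        raise ValueError("num_crop must be 1, 3 or 10")
    return num_segments * num_crop


def video_scores(clip_scores, num_segments=1, num_crop=1):
    """Average per-clip class scores into one score vector per video."""
    group = clips_per_video(num_segments, num_crop)
    if len(clip_scores) % group:
        raise ValueError("clip count is not a multiple of clips per video")
    videos = []
    for start in range(0, len(clip_scores), group):
        chunk = clip_scores[start:start + group]
        # zip(*chunk) iterates over classes; average each class across clips.
        videos.append([sum(column) / group for column in zip(*chunk)])
    return videos


print(clips_per_video(num_segments=3, num_crop=10))               # 30
print(video_scores([[1.0, 0.0], [3.0, 2.0]], num_segments=2))     # [[2.0, 1.0]]
```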

gluoncv.model_zoo.resnet34_v2(**kwargs)[source]

ResNet-34 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.resnet50_v1(**kwargs)[source]

ResNet-50 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
gluoncv.model_zoo.resnet50_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-50 model.

Parameters
  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of minibatch statistics; optionally set to True when fine-tuning from ImageNet classification pretrained models.
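
To show what use_global_stats switches between, here is a simplified, framework-free sketch of batch normalization over a flat list of activations. It omits the learned scale and shift and is not the MXNet implementation; batch_norm and its arguments are hypothetical names for this example.

```python
def batch_norm(values, running_mean, running_var,
               use_global_stats=False, eps=1e-5):
    """Normalize activations with either global or minibatch statistics."""
    if use_global_stats:
        # Stored (global) statistics, e.g. from ImageNet pretraining.
        mean, var = running_mean, running_var
    else:
        # Statistics computed from the current minibatch.
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


batch = [1.0, 3.0]
print(batch_norm(batch, 0.0, 1.0, use_global_stats=True, eps=0.0))   # [1.0, 3.0]
print(batch_norm(batch, 0.0, 1.0, use_global_stats=False, eps=0.0))  # [-1.0, 1.0]
```

With global statistics the output no longer depends on the other samples in the batch, which is what makes the option attractive when fine-tuning with small batches.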

gluoncv.model_zoo.resnet50_v1b_custom(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), use_kinetics_pretrain=True, **kwargs)[source]

ResNet50 model customized for any dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3 or 10.

  • partial_bn (bool, default False) – Whether to freeze all batch normalization layers during training except the first layer.

  • use_kinetics_pretrain (bool, default True) – Whether to initialize the model with weights pretrained on the Kinetics400 dataset.

gluoncv.model_zoo.resnet50_v1b_gn(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-50 GroupNorm model.

Parameters
  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of minibatch statistics; optionally set to True when fine-tuning from ImageNet classification pretrained models.

gluoncv.model_zoo.resnet50_v1b_hmdb51(nclass=51, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet50 model trained on HMDB51 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3 or 10.

  • partial_bn (bool, default False) – Whether to freeze all batch normalization layers during training except the first layer.
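
The partial_bn rule (freeze every batch normalization layer except the first) can be sketched as a pass over layer names. The layer names and the substring test used to detect batch normalization layers are assumptions for illustration; bn_trainable_flags is a hypothetical helper, not a GluonCV function.

```python
def bn_trainable_flags(layer_names, partial_bn=False):
    """Map each layer name to True if it stays trainable, False if frozen."""
    flags = {}
    seen_first_bn = False
    for name in layer_names:
        is_bn = "batchnorm" in name.lower()
        # Freeze a batch norm layer only if partial_bn is on and it is not
        # the first batch norm layer encountered.
        flags[name] = not (is_bn and partial_bn and seen_first_bn)
        if is_bn:
            seen_first_bn = True
    return flags


layers = ["conv0", "batchnorm0", "conv1", "batchnorm1", "batchnorm2"]
print(bn_trainable_flags(layers, partial_bn=True))
```

In this example only batchnorm1 and batchnorm2 are marked frozen; convolution layers and the first batch normalization layer remain trainable.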

gluoncv.model_zoo.resnet50_v1b_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet50 model trained on Kinetics400 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3 or 10.

  • partial_bn (bool, default False) – Whether to freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet50_v1b_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet50 model trained on Something-Something-V2 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3 or 10.

  • partial_bn (bool, default False) – Whether to freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet50_v1b_ucf101(nclass=101, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet50 model trained on UCF101 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3 or 10.

  • partial_bn (bool, default False) – Whether to freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet50_v1c(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1c-50 model.

Parameters
  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v1d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1d-50 model.

Parameters
  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v1e(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1e-50 model.

Parameters
  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v1s(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1s-50 model.

Parameters
  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v2(**kwargs)[source]

ResNet-50 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet101_v1(**kwargs)[source]

SE-ResNet-101 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet101_v2(**kwargs)[source]

SE-ResNet-101 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet152_v1(**kwargs)[source]

SE-ResNet-152 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet152_v2(**kwargs)[source]

SE-ResNet-152 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet18_v1(**kwargs)[source]

SE-ResNet-18 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet18_v2(**kwargs)[source]

SE-ResNet-18 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet34_v1(**kwargs)[source]

SE-ResNet-34 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
gluoncv.model_zoo.se_resnet34_v2(**kwargs)[source]

SE-ResNet-34 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
  • pretrained (bool or str) – A boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (