gluoncv.model_zoo

GluonCV Model Zoo

gluoncv.model_zoo.get_model

Returns a pre-defined GluonCV model by name.

Hint

This is the recommended method for getting a pre-defined model.

It also supports loading models directly from the Gluon Model Zoo.
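As a rough illustration of what "returns a pre-defined model by name" means, here is a minimal, hypothetical sketch of a case-insensitive name registry. The `_MODELS` dict and the toy constructors are placeholders and not GluonCV's implementation. With the real library, the equivalent call is `gluoncv.model_zoo.get_model('resnet50_v1b', pretrained=True)`.

```python
# Hypothetical sketch of name-based model lookup; NOT GluonCV's implementation.
# The constructors below are placeholders that return plain dicts.


def _toy_cifar_resnet20_v1():
    return {"arch": "cifar_resnet", "depth": 20, "version": 1}


def _toy_mobilenet(multiplier):
    return {"arch": "mobilenet", "multiplier": multiplier}


# Registry mapping lower-case model names to constructor callables.
_MODELS = {
    "cifar_resnet20_v1": _toy_cifar_resnet20_v1,
    "mobilenet0_5": lambda: _toy_mobilenet(0.5),
}


def get_model(name, **kwargs):
    """Build the model registered under `name` (case-insensitive)."""
    key = name.lower()
    if key not in _MODELS:
        choices = "\n\t".join(sorted(_MODELS))
        raise ValueError(f'"{name}" is not among the following model list:\n\t{choices}')
    return _MODELS[key](**kwargs)


print(get_model("CIFAR_ResNet20_v1"))  # {'arch': 'cifar_resnet', 'depth': 20, 'version': 1}
```

Keeping constructors in a single dict keeps lookup O(1), and the error message can list every valid name when a lookup fails.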

get_model

Returns a pre-defined model by name

Image Classification

CIFAR

get_cifar_resnet

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper.

cifar_resnet20_v1

ResNet-20 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

cifar_resnet56_v1

ResNet-56 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

cifar_resnet110_v1

ResNet-110 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

cifar_resnet20_v2

ResNet-20 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

cifar_resnet56_v2

ResNet-56 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

cifar_resnet110_v2

ResNet-110 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.
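The depths 20, 56 and 110 follow the 6n + 2 rule from the CIFAR experiments in the ResNet paper: three stages of n basic blocks (two 3x3 convs each), plus the first conv and the classifier. The sketch below derives the per-stage block count. The channel list `[16, 16, 32, 64]` (stem width, then the three stage widths) is an assumption about how `get_cifar_resnet` lays out its stages.

```python
def cifar_resnet_layout(depth):
    """Blocks per stage and channel widths for a CIFAR ResNet of the given depth.

    Follows the 6n + 2 rule: three stages of n basic blocks (2 conv layers each),
    plus the first conv layer and the final fully connected layer.
    """
    if depth < 8 or (depth - 2) % 6 != 0:
        raise ValueError("depth must have the form 6n + 2 with n >= 1")
    n = (depth - 2) // 6
    layers = [n, n, n]
    # First entry is the stem conv width; the rest are the three stage widths.
    channels = [16, 16, 32, 64]
    return layers, channels


for depth in (20, 56, 110):
    print(depth, cifar_resnet_layout(depth)[0])  # 20 [3, 3, 3] / 56 [9, 9, 9] / 110 [18, 18, 18]
```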

get_cifar_wide_resnet

WideResNet model from the “Wide Residual Networks” paper.

cifar_wideresnet16_10

WideResNet-16-10 model for CIFAR10 from “Wide Residual Networks” paper.

cifar_wideresnet28_10

WideResNet-28-10 model for CIFAR10 from “Wide Residual Networks” paper.

cifar_wideresnet40_8

WideResNet-40-8 model for CIFAR10 from “Wide Residual Networks” paper.
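In the WRN-d-k naming from the “Wide Residual Networks” paper, d is the depth and k the widening factor, with (d - 4) / 6 blocks per stage. The sketch below decodes the three names above. The exact channel list (an unwidened 16-channel stem, then 16k, 32k, 64k) is an assumption about how `get_cifar_wide_resnet` arranges its stages.

```python
def cifar_wideresnet_layout(depth, width_factor):
    """Blocks per stage and channel widths for WRN-depth-width_factor on CIFAR."""
    if depth < 10 or (depth - 4) % 6 != 0:
        raise ValueError("depth must have the form 6n + 4 with n >= 1")
    n = (depth - 4) // 6
    layers = [n, n, n]
    # Stem conv keeps 16 channels; the three stages are widened by width_factor.
    channels = [16, 16 * width_factor, 32 * width_factor, 64 * width_factor]
    return layers, channels


for depth, k in ((16, 10), (28, 10), (40, 8)):
    print(f"WRN-{depth}-{k}", cifar_wideresnet_layout(depth, k))
```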

ImageNet

We apply the dilation strategy to pre-trained ResNet models, yielding an output stride of 8. Please see gluoncv.model_zoo.SegBaseModel for how to use it.
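The stride arithmetic behind "stride of 8" can be sketched as follows, assuming the standard ResNet layout: a stem that downsamples by 4, stage strides of 1, 2, 2, 2 for conv2_x to conv5_x, and a dilated variant that replaces the last two strides with dilation rates 2 and 4. These stage settings are assumptions for illustration, not values read from GluonCV's code.

```python
def resnet_stage_config(dilated):
    """(stride, dilation) for stages conv2_x .. conv5_x of a ResNet."""
    if dilated:
        # conv4_x and conv5_x keep resolution and enlarge the receptive field instead.
        return [(1, 1), (2, 1), (1, 2), (1, 4)]
    return [(1, 1), (2, 1), (2, 1), (2, 1)]


def output_stride(dilated):
    """Overall downsampling factor of the conv5_x feature map."""
    stride = 4  # stem: stride-2 7x7 conv followed by stride-2 max pooling
    for stage_stride, _ in resnet_stage_config(dilated):
        stride *= stage_stride
    return stride


for dilated in (False, True):
    s = output_stride(dilated)
    print(f"dilated={dilated}: stride {s}, 480x480 input -> {480 // s}x{480 // s} conv5 map")
```

A 480x480 input therefore yields a 15x15 conv5 map without dilation and a 60x60 map with it. The finer map is why the dilated backbones are used for segmentation.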

ResNetV1b

Pre-trained ResNetV1b model, which produces feature maps with a stride of 8 at conv5.

resnet18_v1b

Constructs a ResNetV1b-18 model.

resnet34_v1b

Constructs a ResNetV1b-34 model.

resnet50_v1b

Constructs a ResNetV1b-50 model.

resnet101_v1b

Constructs a ResNetV1b-101 model.

resnet152_v1b

Constructs a ResNetV1b-152 model.

ResNeSt

ResNeSt

ResNeSt model.

Parameters
  • block (Block) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.

  • layers (list of int) – Numbers of layers in each block.

  • classes (int, default 1000) – Number of classification classes.

  • dilated (bool, default False) – Whether to apply the dilation strategy to the pretrained ResNet, yielding a stride-8 model, typically used in semantic segmentation.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • deep_stem (bool, default False) – Whether to replace the 7x7 conv1 with three 3x3 convolution layers.

  • avg_down (bool, default False) – Whether to use average pooling for projection skip connection between stages/downsample.

  • final_drop (float, default 0.0) – Dropout ratio before the final classification layer.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of minibatch statistics; optionally set to True when finetuning from ImageNet classification pretrained models.

Reference
  • He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

  • Yu, Fisher, and Vladlen Koltun. “Multi-scale context aggregation by dilated convolutions.”

resnest14

Constructs a ResNeSt-14 model.

resnest26

Constructs a ResNeSt-26 model.

resnest50

Constructs a ResNeSt-50 model.

resnest101

Constructs a ResNeSt-101 model.

resnest200

Constructs a ResNeSt-200 model.

resnest269

Constructs a ResNeSt-269 model.

MobileNet

MobileNet

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper.

MobileNetV2

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper.

Parameters
  • multiplier (float, default 1.0) – The width multiplier for controlling the model size. The actual number of channels is equal to the original channel size multiplied by this multiplier.

  • classes (int, default 1000) – Number of classes for the output layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

get_mobilenet

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper.

get_mobilenet_v2

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper.

mobilenet1_0

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 1.0.

mobilenet0_75

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 0.75.

mobilenet0_5

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 0.5.

mobilenet0_25

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 0.25.
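The suffix in each mobilenetX_Y name is the width multiplier, and every layer's channel count is scaled by it. The sketch below assumes the MobileNet V1 paper's channel layout (a 32-channel stem conv followed by 13 depthwise-separable blocks) and truncation with `int()`. Whether GluonCV rounds exactly this way for every multiplier is an assumption.

```python
# Output widths of the stem conv and the 13 depthwise-separable blocks in MobileNet V1.
BASE_CHANNELS = [32, 64, 128, 128, 256, 256] + [512] * 6 + [1024, 1024]


def scale_channels(channels, multiplier):
    """Apply a width multiplier, truncating each result to an integer channel count."""
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return [int(c * multiplier) for c in channels]


print(scale_channels(BASE_CHANNELS, 0.25))
# [8, 16, 32, 32, 64, 64, 128, 128, 128, 128, 128, 128, 256, 256]
```

Because both the input and output widths of each layer shrink, compute and parameters fall roughly with the square of the multiplier.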

DenseNet

DenseNet

DenseNet-BC model from the “Densely Connected Convolutional Networks” paper.

densenet121

DenseNet-BC 121-layer model from the “Densely Connected Convolutional Networks” paper.

densenet161

DenseNet-BC 161-layer model from the “Densely Connected Convolutional Networks” paper.

densenet169

DenseNet-BC 169-layer model from the “Densely Connected Convolutional Networks” paper.

densenet201

DenseNet-BC 201-layer model from the “Densely Connected Convolutional Networks” paper.
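The layer counts in these names can be checked from the dense-block configurations of the DenseNet paper's ImageNet models. Each dense layer contributes a 1x1 bottleneck conv and a 3x3 conv. Adding the stem conv, the 1x1 transition convs between blocks and the classifier gives the total. The growth rates and block configurations below are taken from the paper and assumed to match GluonCV's models.

```python
# (growth_rate, layers per dense block) from the DenseNet paper's ImageNet configurations.
DENSENET_SPECS = {
    121: (32, (6, 12, 24, 16)),
    161: (48, (6, 12, 36, 24)),
    169: (32, (6, 12, 32, 32)),
    201: (32, (6, 12, 48, 32)),
}


def count_weighted_layers(block_config):
    """Count conv and fully connected layers in a DenseNet-BC model."""
    dense_convs = 2 * sum(block_config)        # each dense layer: 1x1 bottleneck + 3x3 conv
    transition_convs = len(block_config) - 1   # one 1x1 conv between consecutive blocks
    return 1 + dense_convs + transition_convs + 1  # stem conv, ..., classifier


def channels_after_block(in_channels, num_layers, growth_rate):
    """Each dense layer concatenates `growth_rate` new feature maps to its input."""
    return in_channels + num_layers * growth_rate


for depth, (growth, config) in DENSENET_SPECS.items():
    print(depth, count_weighted_layers(config))  # the two numbers match on every line
```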

Object Detection

SSD

SSD

Single-shot Object Detection Network: https://arxiv.org/abs/1512.02325.

get_ssd

Get SSD models.

ssd_300_vgg16_atrous_voc

SSD architecture with VGG16 atrous 300x300 base network for Pascal VOC.

ssd_300_vgg16_atrous_coco

SSD architecture with VGG16 atrous 300x300 base network for COCO.

ssd_300_vgg16_atrous_custom

SSD architecture with VGG16 atrous 300x300 base network for custom dataset.

ssd_512_vgg16_atrous_voc

SSD architecture with VGG16 atrous 512x512 base network for Pascal VOC.

ssd_512_vgg16_atrous_coco

SSD architecture with VGG16 atrous 512x512 base network for COCO.

ssd_512_vgg16_atrous_custom

SSD architecture with VGG16 atrous 512x512 base network for custom dataset.

ssd_512_resnet50_v1_voc

SSD architecture with ResNet v1 50 layers for Pascal VOC.

ssd_512_resnet50_v1_coco

SSD architecture with ResNet v1 50 layers for COCO.

ssd_512_resnet50_v1_custom

SSD architecture with ResNet50 v1 512 base network for custom dataset.

ssd_512_resnet101_v2_voc

SSD architecture with ResNet v2 101 layers for Pascal VOC.

ssd_512_resnet152_v2_voc

SSD architecture with ResNet v2 152 layers for Pascal VOC.

VGGAtrousExtractor

VGG atrous multi-layer feature extractor, which produces multiple output feature maps.

get_vgg_atrous_extractor

Get VGG atrous feature extractor networks.

vgg16_atrous_300

Get the VGG16 atrous feature extractor network for 300x300 input.

vgg16_atrous_512

Get the VGG16 atrous feature extractor network for 512x512 input.

Faster RCNN

FasterRCNN

Faster RCNN network.

get_faster_rcnn

Utility function to return faster rcnn networks.

faster_rcnn_resnet50_v1b_voc

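Faster RCNN model with resnet50_v1b base network on Pascal VOC dataset, from the paper “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks” (Ren, S., He, K., Girshick, R., & Sun, J.).

faster_rcnn_resnet50_v1b_coco

Faster RCNN model with resnet50_v1b base network on COCO dataset, from the paper “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks” (Ren, S., He, K., Girshick, R., & Sun, J.).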

faster_rcnn_resnet50_v1b_custom

Faster RCNN model with resnet50_v1b base network on custom dataset.

YOLOv3

YOLOV3

YOLO V3 detection network. Reference: https://arxiv.org/pdf/1804.02767.pdf.

Parameters
  • stages (mxnet.gluon.HybridBlock) – Staged feature extraction blocks. For example, the original paper uses 3 stages and 3 YOLO output layers.

  • channels (iterable) – Number of conv channels for each appended stage. len(channels) should match len(stages).

  • num_class (int) – Number of foreground object classes.

  • anchors (iterable) – The anchor setting. len(anchors) should match len(stages).

  • strides (iterable) – Strides of the feature maps. len(strides) should match len(stages).

  • alloc_size (tuple of int, default is (128, 128)) – For advanced users. Defines alloc_size to generate large enough anchor maps, which will later be saved in parameters. During inference, arbitrary input images are supported by cropping the corresponding area of the anchor map. This allows the model to be exported to symbol so it can run in C++, Scala, etc.

  • nms_thresh (float, default is 0.45) – Non-maximum suppression threshold. Specify a value < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to the top k detection results; use -1 to disable this limit so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return the top post_nms detection results; the rest are discarded. The default is based on the COCO dataset, which has at most 100 objects per image. Increase it if you expect more objects, or use -1 to return all detections.

  • pos_iou_thresh (float, default is 1.0) – IOU threshold for true anchors that match real objects. ‘pos_iou_thresh < 1’ is not implemented.

  • ignore_iou_thresh (float) – Anchors whose IOU falls in range(ignore_iou_thresh, pos_iou_thresh) are not penalized in the objectness score.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
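To show how nms_thresh, nms_topk and post_nms interact, here is a plain-Python sketch of greedy, class-aware non-maximum suppression over (x0, y0, x1, y1, score, cls) tuples. It is not MXNet's box_nms operator. Dropping candidates beyond nms_topk and suppressing only within the same class are assumptions of this sketch. Thresholds outside the open interval (0, 1) skip suppression, in line with the "< 0 or > 1 disables NMS" rule above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1, ...)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, nms_thresh=0.45, nms_topk=400, post_nms=100):
    """Greedy, class-aware NMS over (x0, y0, x1, y1, score, cls) tuples."""
    dets = sorted(detections, key=lambda d: d[4], reverse=True)
    if nms_topk > 0:
        dets = dets[:nms_topk]  # this sketch drops candidates beyond the top k
    if 0 < nms_thresh < 1:
        kept = []
        for d in dets:
            # Keep d unless a higher-scoring kept box of the same class overlaps it too much.
            if all(k[5] != d[5] or iou(k, d) <= nms_thresh for k in kept):
                kept.append(d)
    else:
        kept = dets  # thresholds outside (0, 1) disable suppression
    return kept if post_nms < 0 else kept[:post_nms]
```

For example, two same-class boxes (0, 0, 10, 10) and (1, 1, 11, 11) have IoU 81 / 119 ≈ 0.68, so the lower-scoring one is suppressed at the default threshold of 0.45.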

get_yolov3

Get YOLOV3 models.

Parameters
  • name (str or None) – Model name. If None is used, you must specify features to be a HybridBlock.

  • stages (iterable of str or HybridBlock) – List of network internal output names, specifying which layers are used for predicting bbox values. If name is None, features must be a HybridBlock which generates multiple outputs for prediction.

  • filters (iterable of float or None) – List of convolution layer channels to append to the base network feature extractor. If name is None, this is ignored.

  • sizes (iterable of float) – Sizes of anchor boxes; this should be a list of floats in increasing order. The length of sizes must be len(layers) + 1. For example, a two-stage SSD model can have sizes = [30, 60, 90], which converts to [30, 60] and [60, 90] for the two stages, respectively. For more details, please refer to the original paper.

  • ratios (iterable of list) – Aspect ratios of anchors in each output layer. Its length must equal the number of SSD output layers.

  • steps (list of int) – Step size of anchor boxes in each output layer.

  • classes (iterable of str) – Names of categories.

  • dataset (str) – Name of dataset. This is used to identify the model name, because models trained on different datasets are going to be very different.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network; the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
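The sizes convention described above can be sketched as follows. Consecutive entries form one (s_k, s_k+1) pair per stage. In SSD style, each stage gets one box of size s_k per aspect ratio r (width s_k·√r, height s_k/√r) plus one extra square box of size √(s_k·s_k+1). The anchor-shape rule follows the SSD paper and is illustrative only, not GluonCV's anchor generator (whose ordering may differ).

```python
import math


def stage_sizes(sizes):
    """Split an increasing size list into consecutive (s_k, s_k+1) pairs, one per stage."""
    if any(b <= a for a, b in zip(sizes, sizes[1:])):
        raise ValueError("sizes must be strictly increasing")
    return list(zip(sizes[:-1], sizes[1:]))


def anchor_shapes(size_pair, ratios):
    """SSD-style anchor (width, height) pairs for one stage."""
    s0, s1 = size_pair
    shapes = [(s0 * math.sqrt(r), s0 / math.sqrt(r)) for r in ratios]
    extra = math.sqrt(s0 * s1)  # extra square box between s_k and s_k+1
    shapes.append((extra, extra))
    return shapes


print(stage_sizes([30, 60, 90]))  # [(30, 60), (60, 90)]
```

Every box generated from s_k with ratio r keeps the area s_k², so the ratios change shape but not scale, and each location gets len(ratios) + 1 anchors.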

yolo3_darknet53_voc

YOLO3 multi-scale with darknet53 base network on VOC dataset.

Parameters
  • pretrained_base (bool or str) – Whether to fetch and load pretrained weights for the base network.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

yolo3_darknet53_coco

YOLO3 multi-scale with darknet53 base network on COCO dataset.

Parameters
  • pretrained_base (boolean) – Whether to fetch and load pretrained weights for the base network.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

yolo3_darknet53_custom

YOLO3 multi-scale with darknet53 base network on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from YOLO networks trained on other datasets.

  • pretrained_base (boolean) – Whether to fetch and load pretrained weights for the base network.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
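The anchors and strides parameters determine how many boxes a YOLOv3 model predicts before NMS. The sketch assumes the configuration from the YOLOv3 paper: strides 8, 16 and 32, with three (width, height) anchors per stage. We believe the darknet53 variants above use these defaults, but treat that as an assumption.

```python
# YOLOv3 paper anchors as (width, height) pixel pairs, grouped by output stage.
ANCHORS = [
    [(10, 13), (16, 30), (33, 23)],       # stride 8
    [(30, 61), (62, 45), (59, 119)],      # stride 16
    [(116, 90), (156, 198), (373, 326)],  # stride 32
]
STRIDES = [8, 16, 32]


def grid_sizes(input_size, strides=STRIDES):
    """Side length of the square prediction grid at each output stage."""
    if any(input_size % s for s in strides):
        raise ValueError("input size should be divisible by every stride")
    return [input_size // s for s in strides]


def num_predictions(input_size, anchors=ANCHORS, strides=STRIDES):
    """Total boxes predicted before NMS: one per anchor per grid cell."""
    return sum(g * g * len(a) for g, a in zip(grid_sizes(input_size, strides), anchors))


print(grid_sizes(416), num_predictions(416))  # [52, 26, 13] 10647
```

Small anchors sit on the fine stride-8 grid and large anchors on the coarse stride-32 grid, so each scale of object is handled by the stage whose resolution suits it.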

Instance Segmentation

Mask RCNN

MaskRCNN

Mask RCNN network.

get_mask_rcnn

Utility function to return mask rcnn networks.

mask_rcnn_resnet50_v1b_coco

Mask RCNN model with resnet50_v1b base network on COCO dataset, from the paper “Mask R-CNN” (He, K., Gkioxari, G., Dollár, P., & Girshick, R.).

Semantic Segmentation

FCN

FCN

Fully Convolutional Networks for Semantic Segmentation

get_fcn

FCN model from the paper “Fully Convolutional Networks for Semantic Segmentation”.

get_fcn_resnet50_voc

FCN model with base network ResNet-50 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”.

get_fcn_resnet101_voc

FCN model with base network ResNet-101 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”.

get_fcn_resnet101_coco

FCN model with base network ResNet-101 pre-trained on COCO dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”.

get_fcn_resnet50_ade

FCN model with base network ResNet-50 pre-trained on ADE20K dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”.

get_fcn_resnet101_ade

FCN model with base network ResNet-101 pre-trained on ADE20K dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”.

PSPNet

PSPNet

Pyramid Scene Parsing Network

get_psp

Pyramid Scene Parsing Network.

Parameters
  • dataset (str, default pascal_voc) – The dataset the model was pretrained on (pascal_voc, ade20k).

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default ‘~/.mxnet/models’) – Location for keeping the model parameters.

  • pretrained_base (bool or str, default True) – Whether to load the backbone network pretrained on ImageNet.

get_psp_resnet101_coco

Pyramid Scene Parsing Network with ResNet-101 base network, for the COCO dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default ‘~/.mxnet/models’) – Location for keeping the model parameters.

get_psp_resnet101_voc

Pyramid Scene Parsing Network with ResNet-101 base network, for the Pascal VOC dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default ‘~/.mxnet/models’) – Location for keeping the model parameters.

get_psp_resnet50_ade

Pyramid Scene Parsing Network with ResNet-50 base network, for the ADE20K dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default ‘~/.mxnet/models’) – Location for keeping the model parameters.

get_psp_resnet101_ade

Pyramid Scene Parsing Network with ResNet-101 base network, for the ADE20K dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default ‘~/.mxnet/models’) – Location for keeping the model parameters.

DeepLabV3

DeepLabV3

Parameters
  • nclass (int) – Number of categories for the training dataset.

get_deeplab

DeepLabV3.

Parameters
  • dataset (str, default pascal_voc) – The dataset the model was pretrained on (pascal_voc, pascal_aug, ade20k, coco, citys).

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default ‘~/.mxnet/models’) – Location for keeping the model parameters.

get_deeplab_resnet101_coco

DeepLabV3 with ResNet-101 base network, for the COCO dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default ‘~/.mxnet/models’) – Location for keeping the model parameters.

get_deeplab_resnet101_voc

DeepLabV3 with ResNet-101 base network, for the Pascal VOC dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default ‘~/.mxnet/models’) – Location for keeping the model parameters.

get_deeplab_resnet50_ade

DeepLabV3 with ResNet-50 base network, for the ADE20K dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default ‘~/.mxnet/models’) – Location for keeping the model parameters.

get_deeplab_resnet101_ade

DeepLabV3 with ResNet-101 base network, for the ADE20K dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default ‘~/.mxnet/models’) – Location for keeping the model parameters.

Action Recognition

TSN

vgg16_ucf101

VGG16 model trained on UCF101 dataset.

vgg16_hmdb51

VGG16 model trained on HMDB51 dataset.

vgg16_kinetics400

VGG16 model trained on Kinetics400 dataset.

vgg16_sthsthv2

VGG16 model trained on Something-Something-V2 dataset.

inceptionv1_ucf101

InceptionV1 model trained on UCF101 dataset.

inceptionv1_hmdb51

InceptionV1 model trained on HMDB51 dataset.

inceptionv1_kinetics400

InceptionV1 model trained on Kinetics400 dataset.

inceptionv1_sthsthv2

InceptionV1 model trained on Something-Something-V2 dataset.

inceptionv3_ucf101

InceptionV3 model trained on UCF101 dataset.

inceptionv3_hmdb51

InceptionV3 model trained on HMDB51 dataset.

inceptionv3_kinetics400

InceptionV3 model trained on Kinetics400 dataset.

inceptionv3_sthsthv2

InceptionV3 model trained on Something-Something-V2 dataset.

resnet18_v1b_sthsthv2

ResNet18 model trained on Something-Something-V2 dataset.

resnet34_v1b_sthsthv2

ResNet34 model trained on Something-Something-V2 dataset.

resnet50_v1b_sthsthv2

ResNet50 model trained on Something-Something-V2 dataset.

resnet101_v1b_sthsthv2

ResNet101 model trained on Something-Something-V2 dataset.

resnet152_v1b_sthsthv2

ResNet152 model trained on Something-Something-V2 dataset.

resnet18_v1b_kinetics400

ResNet18 model trained on Kinetics400 dataset.

resnet34_v1b_kinetics400

ResNet34 model trained on Kinetics400 dataset.

resnet50_v1b_kinetics400

ResNet50 model trained on Kinetics400 dataset.

resnet101_v1b_kinetics400

ResNet101 model trained on Kinetics400 dataset.

resnet152_v1b_kinetics400

ResNet152 model trained on Kinetics400 dataset.

resnet50_v1b_ucf101

ResNet50 model trained on UCF101 dataset.

resnet50_v1b_hmdb51

ResNet50 model trained on HMDB51 dataset.

resnet50_v1b_custom

ResNet50 model customized for any dataset.

C3D

C3D

The Convolutional 3D network (C3D).

c3d_kinetics400

The Convolutional 3D network (C3D) trained on Kinetics400 dataset.

I3D

I3D_InceptionV1

Inception v1 model from “Going Deeper with Convolutions” paper.

i3d_inceptionv1_kinetics400

Inception v1 model trained on Kinetics400 dataset from “Going Deeper with Convolutions” paper.

I3D_InceptionV3

Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper.

i3d_inceptionv3_kinetics400

Inception v3 model trained on Kinetics400 dataset from “Rethinking the Inception Architecture for Computer Vision” paper.

I3D_ResNetV1

ResNet_I3D backbone.

i3d_resnet50_v1_kinetics400

Inflated 3D model (I3D) with ResNet50 backbone trained on Kinetics400 dataset.

i3d_resnet101_v1_kinetics400

Inflated 3D model (I3D) with ResNet101 backbone trained on Kinetics400 dataset.

i3d_nl5_resnet50_v1_kinetics400

Inflated 3D model (I3D) with ResNet50 backbone and 5 non-local blocks trained on Kinetics400 dataset.

i3d_nl10_resnet50_v1_kinetics400

Inflated 3D model (I3D) with ResNet50 backbone and 10 non-local blocks trained on Kinetics400 dataset.

i3d_nl5_resnet101_v1_kinetics400

Inflated 3D model (I3D) with ResNet101 backbone and 5 non-local blocks trained on Kinetics400 dataset.

i3d_nl10_resnet101_v1_kinetics400

Inflated 3D model (I3D) with ResNet101 backbone and 10 non-local blocks trained on Kinetics400 dataset.

i3d_resnet50_v1_sthsthv2

Inflated 3D model (I3D) with ResNet50 backbone trained on Something-Something-V2 dataset.

i3d_resnet50_v1_hmdb51

Inflated 3D model (I3D) with ResNet50 backbone trained on HMDB51 dataset.

i3d_resnet50_v1_ucf101

Inflated 3D model (I3D) with ResNet50 backbone trained on UCF101 dataset.

i3d_resnet50_v1_custom

Inflated 3D model (I3D) with ResNet50 backbone, for a custom dataset.
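"Inflated" refers to how the I3D paper turns 2D ImageNet filters into 3D ones. Each k x k kernel is repeated along a new time axis of length t and divided by t, so a "boring" video made of one image repeated over time produces the same response as the 2D filter on that image. The sketch below uses nested lists to demonstrate that property. Whether each GluonCV I3D model was initialized exactly this way is not stated here and is an assumption.

```python
def inflate_kernel(kernel_2d, time_dim):
    """Inflate a 2D kernel (list of rows) into a 3D kernel of shape (time_dim, H, W).

    The 2D weights are repeated along time and divided by time_dim.
    """
    if time_dim < 1:
        raise ValueError("time_dim must be >= 1")
    return [[[w / time_dim for w in row] for row in kernel_2d] for _ in range(time_dim)]


def conv_response(kernel_3d, clip):
    """Response at one location: sum of elementwise products of kernel and clip patch."""
    return sum(
        k * x
        for k_t, x_t in zip(kernel_3d, clip)
        for k_row, x_row in zip(k_t, x_t)
        for k, x in zip(k_row, x_row)
    )


kernel = [[1.0, 2.0], [3.0, 4.0]]
patch = [[1.0, 0.0], [0.0, 1.0]]
response_2d = conv_response([kernel], [patch])              # 2D filter on one frame
response_3d = conv_response(inflate_kernel(kernel, 3), [patch] * 3)
print(response_2d, response_3d)  # both are 5 (up to float rounding)
```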

P3D

P3D

The Pseudo 3D network (P3D).

p3d_resnet50_kinetics400

The Pseudo 3D network (P3D) with ResNet50 backbone trained on Kinetics400 dataset.

p3d_resnet101_kinetics400

The Pseudo 3D network (P3D) with ResNet101 backbone trained on Kinetics400 dataset.

R2+1D

R2Plus1D

The R2+1D network.

r2plus1d_resnet18_kinetics400

R2Plus1D with ResNet18 backbone trained on Kinetics400 dataset.

r2plus1d_resnet34_kinetics400

R2Plus1D with ResNet34 backbone trained on Kinetics400 dataset.

r2plus1d_resnet50_kinetics400

R2Plus1D with ResNet50 backbone trained on Kinetics400 dataset.

r2plus1d_resnet101_kinetics400

R2Plus1D with ResNet101 backbone trained on Kinetics400 dataset.

r2plus1d_resnet152_kinetics400

R2Plus1D with ResNet152 backbone trained on Kinetics400 dataset.
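The R2+1D network factorizes each t x d x d 3D convolution into a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution. The sketch computes the intermediate width with the rule from the R(2+1)D paper, M = ⌊t·d²·N_in·N_out / (d²·N_in + t·N_out)⌋, which keeps the parameter count close to the full 3D convolution's. Whether GluonCV's R2Plus1D uses exactly this rule is an assumption, and biases are ignored.

```python
def r2plus1d_mid_channels(in_channels, out_channels, t=3, d=3):
    """Intermediate width M of a (2+1)D block replacing a t x d x d 3D convolution."""
    numerator = t * d * d * in_channels * out_channels
    denominator = d * d * in_channels + t * out_channels
    return numerator // denominator


def params_3d(in_channels, out_channels, t=3, d=3):
    """Weights in a full t x d x d 3D convolution (no bias)."""
    return t * d * d * in_channels * out_channels


def params_2plus1d(in_channels, out_channels, t=3, d=3):
    """Weights in the factorized block: spatial conv (in -> M), then temporal conv (M -> out)."""
    m = r2plus1d_mid_channels(in_channels, out_channels, t, d)
    return d * d * in_channels * m + t * m * out_channels


print(r2plus1d_mid_channels(64, 64), params_3d(64, 64), params_2plus1d(64, 64))
# 144 110592 110592
```

For 64 input and output channels the factorized block has exactly as many weights as the 3D convolution, but it adds an extra nonlinearity between the spatial and temporal steps.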

SlowFast

SlowFast

SlowFast networks (SlowFast) from “SlowFast Networks for Video Recognition” paper.

slowfast_4x16_resnet50_kinetics400

SlowFast 4x16 networks (SlowFast) with ResNet50 backbone trained on Kinetics400 dataset.

slowfast_8x8_resnet50_kinetics400

SlowFast 8x8 networks (SlowFast) with ResNet50 backbone trained on Kinetics400 dataset.

slowfast_4x16_resnet101_kinetics400

SlowFast 4x16 networks (SlowFast) with ResNet101 backbone trained on Kinetics400 dataset.

slowfast_8x8_resnet101_kinetics400

SlowFast 8x8 networks (SlowFast) with ResNet101 backbone trained on Kinetics400 dataset.

slowfast_16x8_resnet101_kinetics400

SlowFast 16x8 networks (SlowFast) with ResNet101 backbone trained on Kinetics400 dataset.

slowfast_16x8_resnet101_50_50_kinetics400

SlowFast 16x8 networks (SlowFast) with ResNet101 backbone trained on Kinetics400 dataset, but the temporal head is initialized with ResNet50 structure (3, 4, 6, 3).

slowfast_4x16_resnet50_custom

SlowFast 4x16 networks (SlowFast) with ResNet50 backbone, for a custom dataset.
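The "4x16", "8x8" and "16x8" tags read naturally as the T x τ notation of the SlowFast paper: the slow pathway sees T frames sampled with temporal stride τ, and the fast pathway covers the same span with α times as many frames. The sketch below lists the frame indices each pathway reads. It assumes α = 8; the exact α used by each GluonCV model is an assumption here.

```python
def slowfast_frame_indices(T, tau, alpha=8, start=0):
    """Frame indices read by the slow and fast pathways for one clip.

    The slow pathway samples T frames with temporal stride tau; the fast
    pathway covers the same span with alpha times as many frames.
    """
    if tau % alpha != 0:
        raise ValueError("tau must be divisible by alpha in this sketch")
    span = T * tau
    slow = list(range(start, start + span, tau))
    fast = list(range(start, start + span, tau // alpha))
    return slow, fast


for T, tau in ((4, 16), (8, 8), (16, 8)):
    slow, fast = slowfast_frame_indices(T, tau)
    print(f"{T}x{tau}: span {T * tau} frames, slow {len(slow)}, fast {len(fast)}")
```

Under these assumptions, 4x16 and 8x8 both span 64 frames, and 8x8 gives the slow pathway twice the temporal resolution of 4x16.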

API Reference

Network definitions of GluonCV models

GluonCV Model Zoo

class gluoncv.model_zoo.ABC

Helper class that provides a standard way to create an ABC using inheritance.

class gluoncv.model_zoo.AlexNet(classes=1000, **kwargs)

AlexNet model from the “One weird trick…” paper.

Parameters

classes (int, default 1000) – Number of classes for the output layer.

hybrid_forward(F, x)

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BaseAnchorBasedTracktor
abstract anchors()
abstract clean_up()

Clean up after running one video

abstract detect_and_track(frame, tracking_anchor_indices, tracking_anchor_weights, tracking_classes)

Perform detection and tracking on the new frame

Parameters
  • frame (HxWx3 RGB image) –

  • tracking_anchor_indices (NxM ndarray) –

  • tracking_anchor_weights (NxM ndarray) –

  • tracking_classes (Nx1 ndarray) – Class IDs of the tracked objects.

Returns
  • detection_bounding_boxes – All detection results, in (x0, y0, x1, y1, confidence, cls) format.

  • detection_source – Source anchor box indices for each detection.

  • tracking_boxes – All tracking results, in (x0, y0, x1, y1, confidence) format.

  • extract_info – Extra information from the tracktor (e.g. landmarks), as a dict.

abstract prepare_for_frame(frame)

This method should run anything that needs to happen before the motion prediction. It can prepare the detector or even run the backbone feature extraction. It can also provide data to the motion prediction.

Parameters
  • frame – The frame data, the same as in the detect_and_track method.

Returns

motion_predict_data: optional data provided to the motion prediction, or None if no data is provided.

class gluoncv.model_zoo.BasicBlockV1(channels, stride, downsample=False, in_channels=0, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

BasicBlock V1 from “Deep Residual Learning for Image Recognition” paper. This is used for ResNet V1 for 18, 34 layers.

Parameters
  • channels (int) – Number of output channels.

  • stride (int) – Stride size.

  • downsample (bool, default False) – Whether to downsample the input.

  • in_channels (int, default 0) – Number of input channels. Default is 0, to infer from the graph.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_se (bool, default False) – Whether to use the Squeeze-and-Excitation module.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BasicBlockV1b(planes, strides=1, dilation=1, downsample=None, previous_dilation=1, norm_layer=None, norm_kwargs=None, **kwargs)

ResNetV1b BasicBlockV1b

hybrid_forward(F, x)

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BasicBlockV2(channels, stride, downsample=False, in_channels=0, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

BasicBlock V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for ResNet V2 for 18, 34 layers.

Parameters
  • channels (int) – Number of output channels.

  • stride (int) – Stride size.

  • downsample (bool, default False) – Whether to downsample the input.

  • in_channels (int, default 0) – Number of input channels. Default is 0, to infer from the graph.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_se (bool, default False) – Whether to use the Squeeze-and-Excitation module.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.Block(channels, cardinality, bottleneck_width, stride, downsample=False, last_gamma=False, use_se=False, avg_down=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

Bottleneck Block from “Aggregated Residual Transformations for Deep Neural Networks” paper.

Parameters
  • cardinality (int) – Number of groups.

  • bottleneck_width (int) – Width of bottleneck block.

  • stride (int) – Stride size.

  • downsample (bool, default False) – Whether to downsample the input.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_se (bool, default False) – Whether to use the Squeeze-and-Excitation module.

  • avg_down (bool, default False) – Whether to use average pooling for projection skip connection between stages/downsample.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
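cardinality and bottleneck_width jointly set the width of the grouped 3x3 convolution inside this block. The sketch below works through that arithmetic for the 32x4d template. The scaling rule (D = floor(channels * bottleneck_width / 64), grouped width = cardinality * D, output = 4 * channels) is our assumption based on the ResNeXt paper's templates, and `resnext_block_widths` is a hypothetical helper, not GluonCV code.

```python
import math


def resnext_block_widths(channels, cardinality, bottleneck_width):
    """Return (grouped_conv_width, output_channels) for one ResNeXt block.

    Assumption: each of the `cardinality` groups has D channels, with
    D = floor(channels * bottleneck_width / 64), and the block expands
    to 4 * channels at its output.
    """
    d = int(math.floor(channels * (bottleneck_width / 64)))
    group_width = cardinality * d
    return group_width, channels * 4


# ResNeXt-50 (32x4d): base channels of the four stages.
for channels in (64, 128, 256, 512):
    print(channels, resnext_block_widths(channels, cardinality=32, bottleneck_width=4))
```

For the first stage this gives a 128-wide grouped convolution (32 groups of 4 channels) and 256 output channels, which matches the 32x4d template in the ResNeXt paper.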

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.Bottleneck(channels, cardinality=1, bottleneck_width=64, strides=1, dilation=1, downsample=None, previous_dilation=1, norm_layer=None, norm_kwargs=None, last_gamma=False, dropblock_prob=0, input_size=None, use_splat=False, radix=2, avd=False, avd_first=False, in_channels=None, split_drop_ratio=0, **kwargs)[source]

ResNeSt Bottleneck

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BottleneckV1(channels, stride, downsample=False, in_channels=0, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Bottleneck V1 from “Deep Residual Learning for Image Recognition” paper. This is used for ResNet V1 with 50, 101 and 152 layers.

Parameters
  • channels (int) – Number of output channels.

  • stride (int) – Stride size.

  • downsample (bool, default False) – Whether to downsample the input.

  • in_channels (int, default 0) – Number of input channels. Default is 0, to infer from the graph.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_se (bool, default False) – Whether to use the Squeeze-and-Excitation module.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
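The depth names (18 and 34 for basic blocks; 50, 101 and 152 for bottleneck blocks) follow from how many weighted layers each block type contributes. The small sketch below assumes the standard per-stage block counts from the ResNet paper (they are not read from GluonCV), counts the stem convolution and the final fully-connected layer, and does not count 1x1 projection shortcuts. `resnet_depth` is a hypothetical helper.

```python
# Convolution layers on the main path of each block type.
CONVS_PER_BLOCK = {"basic": 2, "bottleneck": 3}

# Assumed standard configurations: depth -> (block type, blocks per stage).
RESNET_SPECS = {
    18: ("basic", [2, 2, 2, 2]),
    34: ("basic", [3, 4, 6, 3]),
    50: ("bottleneck", [3, 4, 6, 3]),
    101: ("bottleneck", [3, 4, 23, 3]),
    152: ("bottleneck", [3, 8, 36, 3]),
}


def resnet_depth(block_type, blocks_per_stage):
    """Stem conv + convs inside all blocks + final fully-connected layer."""
    return 1 + CONVS_PER_BLOCK[block_type] * sum(blocks_per_stage) + 1


for depth, (block_type, stages) in RESNET_SPECS.items():
    print(depth, block_type, resnet_depth(block_type, stages))
```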

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BottleneckV1b(planes, strides=1, dilation=1, downsample=None, previous_dilation=1, norm_layer=None, norm_kwargs=None, last_gamma=False, **kwargs)[source]

ResNetV1b BottleneckV1b

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.BottleneckV2(channels, stride, downsample=False, in_channels=0, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Bottleneck V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for ResNet V2 with 50, 101 and 152 layers.

Parameters
  • channels (int) – Number of output channels.

  • stride (int) – Stride size.

  • downsample (bool, default False) – Whether to downsample the input.

  • in_channels (int, default 0) – Number of input channels. Default is 0, to infer from the graph.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_se (bool, default False) – Whether to use the Squeeze-and-Excitation module.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.C3D(nclass, dropout_ratio=0.5, num_segments=1, num_crop=1, feat_ext=False, init_std=0.001, ctx=None, **kwargs)[source]

The Convolutional 3D network (C3D) from “Learning Spatiotemporal Features with 3D Convolutional Networks”, ICCV 2015. https://arxiv.org/abs/1412.0767

Parameters
  • nclass (int) – Number of classes in the training dataset.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool, default False) – Whether to extract features before the dense classification layer or do a complete forward pass.

  • dropout_ratio (float) – Dropout value used in the dropout layers after dense layers to avoid overfitting.

  • init_std (float) – Default standard deviation value for initializing dense layers.

  • ctx (str) – Context, default CPU. The context in which to load the pretrained weights.

hybrid_forward(F, x)[source]

Hybrid forward of C3D net

class gluoncv.model_zoo.COCODetection(root='~/.mxnet/datasets/coco', splits=('instances_val2017'), transform=None, min_object_area=0, skip_empty=True, use_crowd=True)[source]

MS COCO detection dataset.

Parameters
  • root (str, default '~/.mxnet/datasets/coco') – Path to folder storing the dataset.

  • splits (list of str, default ['instances_val2017']) – JSON annotation file names. Candidates are: instances_val2017, instances_train2017.

  • transform (callable, default None) –

    A function that takes data and label and transforms them. Refer to ./transforms for examples.

    A transform function for object detection should take label into consideration, because any geometric modification will require label to be modified.

  • min_object_area (float) – Minimum accepted ground-truth area, if an object’s area is smaller than this value, it will be ignored.

  • skip_empty (bool, default is True) – Whether to skip images with no valid object. This should be True in training; otherwise it will cause undefined behavior.

  • use_crowd (bool, default is True) – Whether to use boxes labeled as crowd instances.
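The interaction of min_object_area, use_crowd and skip_empty can be sketched as a simple filter over per-image annotations. This is a hypothetical pure-Python illustration, not GluonCV's parsing code: the dicts only mimic the COCO JSON fields `bbox` ([x, y, w, h]), `area` and `iscrowd`, and `filter_objects` / `build_items` are illustrative names.

```python
def filter_objects(annotations, min_object_area=0, use_crowd=True):
    """Keep objects passing the area and crowd checks; return xyxy boxes."""
    kept = []
    for obj in annotations:
        if obj["area"] < min_object_area:
            continue  # smaller than the minimum accepted ground-truth area
        if obj.get("iscrowd", 0) and not use_crowd:
            continue  # crowd boxes are dropped when use_crowd=False
        x, y, w, h = obj["bbox"]
        kept.append([x, y, x + w, y + h])  # convert to (x1, y1, x2, y2)
    return kept


def build_items(images, min_object_area=0, skip_empty=True, use_crowd=True):
    """Return (image_id, boxes) pairs, optionally skipping empty images."""
    items = []
    for image_id, annotations in images.items():
        boxes = filter_objects(annotations, min_object_area, use_crowd)
        if skip_empty and not boxes:
            continue  # no valid object left in this image
        items.append((image_id, boxes))
    return items


images = {
    1: [{"bbox": [10, 10, 50, 40], "area": 2000, "iscrowd": 0}],
    2: [{"bbox": [0, 0, 4, 4], "area": 16, "iscrowd": 0}],
    3: [{"bbox": [5, 5, 100, 100], "area": 10000, "iscrowd": 1}],
}
print(build_items(images, min_object_area=100, use_crowd=False))
```

With these settings image 2 loses its only (too small) object and image 3 its only (crowd) object, so both are skipped and only image 1 remains.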

property annotation_dir

The subdirectory for annotations. Default is ‘annotations’ (the COCO default). For example, a COCO-format JSON file will be searched for as ‘root/annotation_dir/xxx.json’. You can override this if a custom dataset does not follow the same pattern.

property classes

Category names.

property coco

Return pycocotools object for evaluation purposes.

get_im_aspect_ratio()[source]

Return the aspect ratio of each image in the order of the raw data.

class gluoncv.model_zoo.CenterNet(base_network, heads, classes, head_conv_channel=0, scale=4.0, topk=100, flip_test=False, nms_thresh=0, nms_topk=400, post_nms=100, **kwargs)[source]

Objects as Points. https://arxiv.org/abs/1904.07850v2

Parameters
  • base_network (mxnet.gluon.nn.HybridBlock) – The base feature extraction network.

  • heads (OrderedDict) –

    OrderedDict with specifications for each head. For example: OrderedDict([('heatmap', {'num_output': len(classes), 'bias': -2.19}), ('wh', {'num_output': 2}), ('reg', {'num_output': 2})])

  • classes (list of str) – Category names.

  • head_conv_channel (int, default is 0) – If > 0, will use an extra conv layer before each of the real heads.

  • scale (float, default is 4.0) – The downsampling ratio of the entire network.

  • topk (int, default is 100) – Number of outputs.

  • flip_test (bool) – Whether to apply flip test during inference (training mode is not affected).

  • nms_thresh (float, default is 0.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS. By default NMS is disabled.

  • nms_topk (int, default is 400) – Apply NMS to top k detection results; use -1 to disable so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return the top post_nms detection results; the rest are discarded. The number is based on the COCO dataset, which has a maximum of 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.
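To illustrate how scale relates the output map to input pixels, here is a hypothetical single-peak decoder. It assumes that, at each output cell, the 'wh' head predicts a (width, height) in output-map units and the 'reg' head a sub-cell (dx, dy) offset; `decode_peak` is an illustrative name, not GluonCV's decoder.

```python
def decode_peak(cx, cy, wh, reg, scale=4.0):
    """Convert one heatmap peak at integer cell (cx, cy) to an xyxy box.

    Assumption: wh = (width, height) in output-map units, reg = (dx, dy)
    sub-cell offset, and `scale` is the network's downsampling ratio.
    """
    w, h = wh
    dx, dy = reg
    center_x = cx + dx  # refine the integer peak location
    center_y = cy + dy
    return (
        (center_x - w / 2) * scale,
        (center_y - h / 2) * scale,
        (center_x + w / 2) * scale,
        (center_y + h / 2) * scale,
    )


print(decode_peak(cx=10, cy=20, wh=(8.0, 4.0), reg=(0.5, 0.25)))
# (26.0, 73.0, 58.0, 89.0)
```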

hybrid_forward(F, x)[source]

Hybrid forward of center net

property num_classes

Return number of foreground classes.

Returns

Number of foreground classes

Return type

int

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer: old_integer} mapping dict, a {new_name: old_name} mapping dict, or a list of [name0, name1, …] if class names don’t change. This allows the new predictor to reuse the specified previously trained weights.

Example

>>> import gluoncv
>>> net = gluoncv.model_zoo.get_model('center_net_resnet50_v1b_voc', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person has index 14 in VOC
>>> net.reset_class(classes=['person'], reuse_weights={0:14})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':14})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])

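All forms of reuse_weights in the example above express the same {new_index: old_index} correspondence. The hypothetical helper below normalises each form to integer indices to make that explicit; it is a sketch of the idea, not the reset_class implementation. The VOC class list is the standard PASCAL VOC ordering, included only for illustration.

```python
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def normalize_reuse_weights(new_classes, old_classes, reuse_weights):
    """Map every reuse_weights form to {new_index: old_index}."""
    if isinstance(reuse_weights, (list, tuple)):
        # Class names unchanged: each listed name maps to itself.
        reuse_weights = {name: name for name in reuse_weights}
    mapping = {}
    for new_key, old_key in reuse_weights.items():
        new_idx = new_classes.index(new_key) if isinstance(new_key, str) else new_key
        old_idx = old_classes.index(old_key) if isinstance(old_key, str) else old_key
        mapping[new_idx] = old_idx
    return mapping


for form in ({"person": "person"}, {0: 14}, {"person": 14}, ["person"]):
    print(form, "->", normalize_reuse_weights(["person"], VOC_CLASSES, form))
```

Every form resolves to {0: 14}: the single new class reuses the predictor weights of old class 14.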
set_nms(nms_thresh=0, nms_topk=400, post_nms=100)[source]

Set non-maximum suppression parameters.

Parameters
  • nms_thresh (float, default is 0.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS. By default NMS is disabled.

  • nms_topk (int, default is 400) – Apply NMS to top k detection results; use -1 to disable so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return the top post_nms detection results; the rest are discarded. The number is based on the COCO dataset, which has a maximum of 100 objects per image. You can adjust this number if expecting more objects. You can use -1 to return all detections.

Returns

Return type

None
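The three set_nms parameters act in sequence: detections are ranked by score, at most nms_topk of them enter NMS, boxes overlapping a kept box by more than nms_thresh are suppressed, and at most post_nms survivors are returned. The pure-Python sketch below is class-agnostic and treats thresholds outside (0, 1) as "NMS disabled", following the description above. `iou` and `apply_nms` are hypothetical helpers, not the operator GluonCV uses internally.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def apply_nms(detections, nms_thresh=0.0, nms_topk=400, post_nms=100):
    """detections: list of (score, box). Returns the kept detections."""
    dets = sorted(detections, key=lambda d: d[0], reverse=True)
    if 0 < nms_thresh < 1:  # assumption: any other value disables NMS
        # Only the top nms_topk candidates enter NMS (-1 means all).
        candidates = dets if nms_topk < 0 else dets[:nms_topk]
        kept = []
        for score, box in candidates:
            # Greedy: keep a box only if it does not overlap a kept one too much.
            if all(iou(box, k_box) <= nms_thresh for _, k_box in kept):
                kept.append((score, box))
        dets = kept
    return dets if post_nms < 0 else dets[:post_nms]


dets = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),  # IoU with the first box is 81 / 119
    (0.7, (20, 20, 30, 30)),
]
print(len(apply_nms(dets)))                  # 3, NMS disabled by nms_thresh=0
print(len(apply_nms(dets, nms_thresh=0.5)))  # 2, second box suppressed
```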

class gluoncv.model_zoo.DUC(planes, upscale_factor=2, **kwargs)[source]

Upsampling layer with pixel shuffle
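Pixel shuffle upsamples by moving channels into space: an input with C * r * r channels of size H x W becomes C channels of size (H * r) x (W * r), where r corresponds to upscale_factor. The pure-Python sketch below shows the rearrangement for a single image. The channel-to-position order (c * r * r + i * r + j) is the common PixelShuffle convention and an assumption about the exact ordering DUC uses.

```python
def pixel_shuffle(x, r):
    """x: nested list of shape (C * r * r, H, W). Returns (C, H * r, W * r).

    Output pixel (c, h * r + i, w * r + j) takes input channel
    c * r * r + i * r + j at position (h, w).
    """
    in_channels = len(x)
    height, width = len(x[0]), len(x[0][0])
    if in_channels % (r * r):
        raise ValueError("channel count must be divisible by r * r")
    out_channels = in_channels // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)] for _ in range(out_channels)]
    for c in range(out_channels):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 1x1 channels become one 2x2 map.
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], r=2))  # [[[1, 2], [3, 4]]]
```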

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DarknetV3(layers, channels, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Darknet v3.

Parameters
  • layers (iterable) – Number of residual blocks in each stage.

  • channels (iterable) – Output channels of the first convolution, followed by the output channels of each stage; its length must be len(layers) + 1.

  • classes (int, default is 1000) – Number of classes, which determines the dense layer output channels.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
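The relationship between layers and channels can be checked by counting convolutions. The sketch below assumes a Darknet-53-style layout: one stem convolution, then per stage a stride-2 downsampling convolution followed by residual blocks of two convolutions (1x1 then 3x3). The configuration values are assumptions for illustration, and `count_conv_layers` is a hypothetical helper.

```python
# Assumed Darknet-53 configuration (illustrative values).
LAYERS = [1, 2, 8, 8, 4]
CHANNELS = [32, 64, 128, 256, 512, 1024]


def count_conv_layers(layers, channels):
    """Count convolution layers in the assumed Darknet v3 layout."""
    if len(channels) != len(layers) + 1:
        raise ValueError("channels needs one more entry than layers")
    stem = 1                    # first 3x3 convolution, channels[0] outputs
    downsample = len(layers)    # one stride-2 convolution per stage
    residual = 2 * sum(layers)  # two convolutions per residual block
    return stem + downsample + residual


convs = count_conv_layers(LAYERS, CHANNELS)
print(convs, convs + 1)  # 52 53 (convolutions, plus the final Dense layer)
```

Under these assumptions the 52 convolutions plus the classification layer give the 53 weighted layers the name refers to.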

features

Feature extraction layers.

Type

mxnet.gluon.nn.HybridSequential

output

A classes(1000)-way Fully-Connected Layer.

Type

mxnet.gluon.nn.Dense

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DeepLabV3(nclass, backbone='resnet50', aux=True, ctx=cpu(0), pretrained_base=True, height=None, width=None, base_size=520, crop_size=480, **kwargs)[source]
Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default: ‘resnet50’; ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; use mxnet.gluon.contrib.nn.SyncBatchNorm for synchronized cross-GPU BatchNorm).

  • aux (bool) – Auxiliary loss.

Reference:

Chen, Liang-Chieh, et al. “Rethinking atrous convolution for semantic image segmentation.” arXiv preprint arXiv:1706.05587 (2017).
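Atrous (dilated) convolution enlarges the field of view without adding parameters: a k x k kernel with dilation d spans k + (k - 1)(d - 1) pixels along each axis. The short sketch below evaluates that formula for a 3x3 kernel; the dilation rates are illustrative values chosen by us, not read from DeepLabV3's configuration.

```python
def effective_kernel(k, dilation):
    """Span of a k x k kernel with dilation - 1 zeros inserted between taps."""
    return k + (k - 1) * (dilation - 1)


for d in (1, 2, 4, 12, 24, 36):
    print(d, effective_kernel(3, d))
```

Each 3x3 atrous kernel still has only 9 weights, whatever its span.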

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DeepLabV3Plus(nclass, backbone='xception', aux=True, ctx=cpu(0), pretrained_base=True, height=None, width=None, base_size=576, crop_size=512, dilated=True, **kwargs)[source]
Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default: ‘xception’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; use mxnet.gluon.contrib.nn.SyncBatchNorm for synchronized cross-GPU BatchNorm).

  • aux (bool) – Auxiliary loss.

Reference:

Chen, Liang-Chieh, et al. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.” ECCV, 2018.

evaluate(x)[source]

evaluating network with inputs and targets

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DeepLabWV3Plus(nclass, backbone='wideresnet', aux=False, ctx=cpu(0), pretrained_base=True, height=None, width=None, base_size=520, crop_size=480, dilated=True, **kwargs)[source]
Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default: ‘wideresnet’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; use mxnet.gluon.contrib.nn.SyncBatchNorm for synchronized cross-GPU BatchNorm).

  • aux (bool) – Auxiliary loss.

Reference:

Chen, Liang-Chieh, et al. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.” ECCV, 2018. https://arxiv.org/abs/1802.02611

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DenseNet(num_init_features, growth_rate, block_config, bn_size=4, dropout=0, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Densenet-BC model from the “Densely Connected Convolutional Networks” paper.

Parameters
  • num_init_features (int) – Number of filters to learn in the first convolution layer.

  • growth_rate (int) – Number of filters to add each layer (k in the paper).

  • block_config (list of int) – List of integers for numbers of layers in each pooling block.

  • bn_size (int, default 4) – Multiplicative factor for the number of bottleneck layer features (i.e. bn_size * k features in the bottleneck layer).

  • dropout (float, default 0) – Rate of dropout after each dense layer.

  • classes (int, default 1000) – Number of classification classes.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
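num_init_features, growth_rate and block_config fully determine the channel count through the network. The sketch below traces it for the DenseNet-121 configuration (num_init_features=64, growth_rate=32, block_config=[6, 12, 24, 16]), which we take from the DenseNet paper. It assumes each transition layer halves the channels (DenseNet-BC compression 0.5) and that there is no transition after the last block. `densenet_channels` is a hypothetical helper.

```python
def densenet_channels(num_init_features, growth_rate, block_config):
    """Trace the channel count after each dense block and transition layer."""
    num_features = num_init_features
    trace = []
    for i, num_layers in enumerate(block_config):
        num_features += num_layers * growth_rate  # each layer adds k feature maps
        trace.append(("block", i + 1, num_features))
        if i != len(block_config) - 1:
            num_features //= 2  # assumed transition compression of 0.5
            trace.append(("transition", i + 1, num_features))
    return num_features, trace


final, trace = densenet_channels(64, 32, [6, 12, 24, 16])
for step in trace:
    print(step)
print("final features:", final)  # final features: 1024
```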

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DepthDecoder(num_ch_enc, scales=range(0, 4), num_output_channels=1, use_skips=True)[source]

Decoder of Monodepth2

Parameters
  • num_ch_enc (list) – Number of channels of each encoder output.

  • scales (list) – The scales used in the loss. (Default: range(4))

  • num_output_channels (int) – The number of output channels. (Default: 1)

  • use_skips (bool) – Whether to use skip connections in the network. (Default: True)

hybrid_forward(F, input_features)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DepthwiseRPN(bz=1, is_train=False, ctx=cpu(0), anchor_num=5, out_channels=256)[source]

Compute the classification (cls) and localization (loc) outputs from z_f and x_f.

Parameters
  • bz (int) – Batch size for training; use bz=1 for testing.

  • is_train (bool) – True for training, False for testing.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • anchor_num (int) – Number of anchors.

  • out_channels (int) – Number of hidden feature channels.
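As we understand it, "depthwise" here refers to depthwise cross-correlation in the style of SiamRPN++: each channel of the search-region features x_f is correlated with the matching channel of the template features z_f, and the cls/loc heads are applied to the result. The pure-Python sketch below covers only the correlation step for one example. It is an assumption about the technique, not GluonCV's implementation, and `xcorr2d` / `depthwise_xcorr` are hypothetical names.

```python
def xcorr2d(search, kernel):
    """Valid 2-D cross-correlation of one channel (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(search) - kh + 1, len(search[0]) - kw + 1
    return [
        [
            sum(search[y + i][x + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for x in range(ow)
        ]
        for y in range(oh)
    ]


def depthwise_xcorr(x_f, z_f):
    """Correlate each search channel with the template channel of the same index."""
    if len(x_f) != len(z_f):
        raise ValueError("x_f and z_f need the same number of channels")
    return [xcorr2d(xc, zc) for xc, zc in zip(x_f, z_f)]


x_f = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # one 3x3 search channel
z_f = [[[1, 0], [0, 1]]]                   # one 2x2 template channel
print(depthwise_xcorr(x_f, z_f))           # [[[6, 8], [12, 14]]]
```

Unlike a full cross-correlation, channels are never mixed here, so the output keeps the input channel count.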

hybrid_forward(F, z_f, x_f)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.DoubleHeadRCNN(features, top_features, classes, box_features=None, short=600, max_size=1000, min_stage=4, max_stage=4, train_patterns=None, nms_thresh=0.3, nms_topk=400, post_nms=100, roi_mode='align', roi_size=(14, 14), strides=16, clip=None, rpn_channel=1024, base_size=16, scales=(8, 16, 32), ratios=(0.5, 1, 2), alloc_size=(128, 128), rpn_nms_thresh=0.7, rpn_train_pre_nms=12000, rpn_train_post_nms=2000, rpn_test_pre_nms=6000, rpn_test_post_nms=300, rpn_min_size=16, per_device_batch_size=1, num_sample=128, pos_iou_thresh=0.5, pos_ratio=0.25, max_num_gt=300, additional_output=False, force_nms=False, minimal_opset=False, **kwargs)[source]

Double Head RCNN network.

Parameters
  • features (gluon.HybridBlock) – Base feature extractor before feature pooling layer.

  • top_features (gluon.HybridBlock) – Tail feature extractor after feature pooling layer.

  • classes (iterable of str) – Names of categories, its length is num_class.

  • box_features (gluon.HybridBlock, default is None) – Feature head for transforming the shared ROI output (top_features) for box prediction. If set to None, global average pooling will be used.

  • short (int, default is 600.) – Input image short side size.

  • max_size (int, default is 1000.) – Maximum size of input image long side.

  • min_stage (int, default is 4) – Minimum stage number for FPN stages.

  • max_stage (int, default is 4) – Maximum stage number for FPN stages.

  • train_patterns (str, default is None.) – Matching pattern for trainable parameters.

  • nms_thresh (float, default is 0.3.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to top k detection results, use -1 to disable so that every Detection result is used in NMS.

  • roi_mode (str, default is align) – ROI pooling mode. Currently supports ‘pool’ and ‘align’.

  • roi_size (tuple of int, length 2, default is (14, 14)) – (height, width) of the ROI region.

  • strides (int/tuple of ints, default is 16) – Feature map stride with respect to original image. This is usually the ratio between original image size and feature map size. For FPN, use a tuple of ints.

  • clip (float, default is None) – Clip bounding box prediction to prevent exponentiation from overflowing.

  • rpn_channel (int, default is 1024) – Number of channels used in RPN convolutional layers.

  • base_size (int) – The width (and height) of the reference anchor box.

  • scales (iterable of float, default is (8, 16, 32)) –

    The areas of anchor boxes. We use the following form to compute the shapes of anchors:

    \[width_{anchor} = size_{base} \times scale \times \sqrt{1 / ratio}\]

    \[height_{anchor} = size_{base} \times scale \times \sqrt{ratio}\]

  • ratios (iterable of float, default is (0.5, 1, 2)) – The aspect ratios of anchor boxes. We expect it to be a list or tuple.

  • alloc_size (tuple of int) – Allocate size for the anchor boxes as (H, W). Usually we generate enough anchors for large feature map, e.g. 128x128. Later in inference we can have variable input sizes, at which time we can crop corresponding anchors from this large anchor map so we can skip re-generating anchors for each input.

  • rpn_train_pre_nms (int, default is 12000) – Filter top proposals before NMS in training of RPN.

  • rpn_train_post_nms (int, default is 2000) – Return top proposal results after NMS in training of RPN. Will be set to rpn_train_pre_nms if it is larger than rpn_train_pre_nms.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 300) – Return top proposal results after NMS in testing of RPN. Will be set to rpn_test_pre_nms if it is larger than rpn_test_pre_nms.

  • rpn_nms_thresh (float, default is 0.7) – IOU threshold for NMS. It is used to remove overlapping proposals.

  • rpn_num_sample (int, default is 256) – Number of samples for RPN targets.

  • rpn_pos_iou_thresh (float, default is 0.7) – Anchors with IOU larger than pos_iou_thresh are regarded as positive samples.

  • rpn_neg_iou_thresh (float, default is 0.3) – Anchors with IOU smaller than neg_iou_thresh are regarded as negative samples. Anchors with IOU between pos_iou_thresh and neg_iou_thresh are ignored.

  • rpn_pos_ratio (float, default is 0.5) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.

  • rpn_box_norm (array-like of size 4, default is (1., 1., 1., 1.)) – Std value to be divided from encoded values.

  • rpn_min_size (int, default is 16) – Proposals whose size is smaller than min_size will be discarded.

  • per_device_batch_size (int, default is 1) – Batch size for each device during training.

  • num_sample (int, default is 128) – Number of samples for RCNN targets.

  • pos_iou_thresh (float, default is 0.5) – Proposals whose IOU is larger than pos_iou_thresh are regarded as positive samples.

  • pos_ratio (float, default is 0.25) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.

  • max_num_gt (int, default is 300) – Maximum ground-truth number for each example. This is only an upper bound, not necessarily very precise. However, using a very big number may impact the training speed.

  • additional_output (boolean, default is False) – additional_output is only used for Mask R-CNN to get internal outputs.

  • force_nms (bool, default is False) – Apply NMS to all categories; this avoids overlapping detection results from different categories.

  • minimal_opset (bool, default is False) – We sometimes add special operators to accelerate training/inference, however, for exporting to third party compilers we want to utilize most widely used operators. If minimal_opset is True, the network will use a minimal set of operators good for e.g., TVM.
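The anchor-shape formulas given for scales can be evaluated directly. With the defaults (base_size=16, scales=(8, 16, 32), ratios=(0.5, 1, 2)) they produce nine anchors per position. Each anchor has area (base_size * scale)^2, and its height-to-width ratio equals ratio. The sketch below only computes shapes: the enumeration order and the hypothetical helper `anchor_shapes` are our choices, not GluonCV's anchor generator.

```python
import math


def anchor_shapes(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1, 2)):
    """(width, height) of every anchor, following the formulas for scales."""
    shapes = []
    for ratio in ratios:
        for scale in scales:
            width = base_size * scale * math.sqrt(1 / ratio)
            height = base_size * scale * math.sqrt(ratio)
            shapes.append((width, height))
    return shapes


for width, height in anchor_shapes():
    print(round(width, 1), round(height, 1))
```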

classes

Names of categories, its length is num_class.

Type

iterable of str

num_class

Number of positive categories.

Type

int

short

Input image short side size.

Type

int

max_size

Maximum size of input image long side.

Type

int

train_patterns

Matching pattern for trainable parameters.

Type

str

nms_thresh

Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

Type

float

nms_topk

Apply NMS to top k detection results, use -1 to disable so that every Detection result is used in NMS.

Type

int

force_nms

Apply NMS to all categories; this avoids overlapping detection results from different categories.

Type

bool

rpn_target_generator

Generate training targets with cls_target, box_target, and box_mask.

Type

gluon.Block

target_generator

Generate training targets with boxes, samples, matches, gt_label and gt_box.

Type

gluon.Block

hybrid_forward(F, x, gt_box=None, gt_label=None)[source]

Forward the DoubleHeadRCNN network.

The behavior during training and inference is different.

Parameters
  • x (mxnet.nd.NDArray or mxnet.symbol) – The network input tensor.

  • gt_box (mxnet.nd.NDArray or mxnet.symbol, only required during training) – The ground-truth bbox tensor with shape (B, N, 4).

  • gt_label (mxnet.nd.NDArray or mxnet.symbol, only required during training) – The ground-truth label tensor with shape (B, N, 1).

Returns

During inference, returns final class id, confidence scores, bounding boxes.

Return type

(ids, scores, bboxes)

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer: old_integer} mapping dict, a {new_name: old_name} mapping dict, or a list of [name0, name1, …] if class names don’t change. This allows the new predictor to reuse the specified previously trained weights.

Example

>>> import gluoncv
>>> net = gluoncv.model_zoo.get_model('faster_rcnn_resnet50_v1b_coco', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person has index 0 in COCO
>>> net.reset_class(classes=['person'], reuse_weights={0:0})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':0})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])

property target_generator

Returns stored target generator

Returns

The RCNN target generator

Return type

mxnet.gluon.HybridBlock

class gluoncv.model_zoo.DummyMotionEstimator[source]

initialize(first_frame, first_frame_motion_pred_data)[source]

Initialize the motion estimator by feeding the first frame.

Parameters
  • first_frame – Data of the first frame.

  • first_frame_motion_pred_data – Additional data for motion prediction.

Returns

cache_information

predict_new_locations(prev_frame_cache: numpy.ndarray, prev_bboxes: numpy.ndarray, new_frame: numpy.ndarray, skip: bool = False, **kwargs)[source]

The abstract method for predicting the movement of bounding boxes given two frames.

Parameters
  • prev_frame_cache (numpy.ndarray) – Cached image from motion estimation.

  • prev_bboxes (numpy.ndarray) – Nx4 bounding boxes in (left, top, right, bottom) format.

  • new_frame (numpy.ndarray) – BGR image.

  • new_frame_motion_pred_data – Additional data for motion prediction.

  • tracked_boxes_anchor_indices – Anchor indices used to build prev_bboxes.

  • tracked_boxes_anchor_weights – Voting weights of the anchors used to build prev_bboxes.

  • skip (bool) – Whether to skip motion estimation for this frame.

  • kwargs – Other information.

Returns

new_boxes (Nx4 numpy.ndarray) and cache_information.

class gluoncv.model_zoo.FCN(nclass, backbone='resnet50', aux=True, ctx=cpu(0), pretrained_base=True, base_size=520, crop_size=480, **kwargs)[source]

Fully Convolutional Networks for Semantic Segmentation

Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default: ‘resnet50’; ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; use mxnet.gluon.contrib.nn.SyncBatchNorm for synchronized cross-GPU BatchNorm).

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

  • pretrained_base (bool or str) – Whether the FCN backbone (encoder) is pretrained. If True, the weights of a model trained on ImageNet are loaded.

Reference:

Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” CVPR, 2015

Examples

>>> from gluoncv.model_zoo import FCN
>>> model = FCN(nclass=21, backbone='resnet50')
>>> print(model)

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.FarneBeckFlowMotionEstimator(flow_scale=256)[source]

Flow-based motion estimator that uses the Farneback optical flow algorithm.

compute_flow(prev_frame_cache, prepared_new_frame)[source]

Compute dense optical flow.

Parameters
  • prev_frame_cache

  • prepared_new_frame

Returns

flow_map: an NxMx2 map. Each spatial location contains a 2-element vector specifying the delta in the x and y directions. The unit of delta is pixels in this flow_map’s coordinate space.
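To make the flow_map layout concrete (flow_map[y][x] holds the (dx, dy) displacement of that pixel), here is a hypothetical way a flow-based estimator could move a box: shift it by the mean flow inside it. This is a sketch under that assumption, not how FarneBeckFlowMotionEstimator actually predicts new locations. It ignores flow_scale and assumes the box is already in flow-map coordinates.

```python
def shift_box_by_flow(flow_map, box):
    """Move a (left, top, right, bottom) box by the mean flow inside it.

    flow_map[y][x] == (dx, dy), in the flow map's pixel coordinates.
    Boxes that do not overlap the map are returned unchanged.
    """
    left, top, right, bottom = box
    xs = range(max(0, int(left)), min(len(flow_map[0]), int(right)))
    ys = range(max(0, int(top)), min(len(flow_map), int(bottom)))
    vectors = [flow_map[y][x] for y in ys for x in xs]
    if not vectors:
        return box
    mean_dx = sum(v[0] for v in vectors) / len(vectors)
    mean_dy = sum(v[1] for v in vectors) / len(vectors)
    return (left + mean_dx, top + mean_dy, right + mean_dx, bottom + mean_dy)


# 4x4 flow map in which every pixel moves 2 px right and 1 px down.
flow = [[(2.0, 1.0)] * 4 for _ in range(4)]
print(shift_box_by_flow(flow, (0, 0, 2, 2)))  # (2.0, 1.0, 4.0, 3.0)
```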

class gluoncv.model_zoo.FastSCNN(nclass, aux=True, ctx=cpu(0), pretrained_base=False, height=None, width=None, base_size=2048, crop_size=1024, **kwargs)[source]

Fast-SCNN: Fast Semantic Segmentation Network

Parameters
  • nclass (int) – Number of categories for the training dataset.

  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm).

  • aux (bool) – Auxiliary loss.

Reference:

Poudel, Rudra P K, et al. “Fast-SCNN: Fast Semantic Segmentation Network.” BMVC, 2019. https://bmvc2019.org/wp-content/uploads/papers/0959-paper.pdf

demo(x)[source]

fastscnn demo

evaluate(x)[source]

evaluating network with inputs and targets

hybrid_forward(F, x)[source]

hybrid forward for Fast SCNN

predict(x)[source]

fastscnn predict

class gluoncv.model_zoo.FasterRCNN(features, top_features, classes, box_features=None, short=600, max_size=1000, min_stage=4, max_stage=4, train_patterns=None, nms_thresh=0.3, nms_topk=400, post_nms=100, roi_mode='align', roi_size=(14, 14), strides=16, clip=None, rpn_channel=1024, base_size=16, scales=(8, 16, 32), ratios=(0.5, 1, 2), alloc_size=(128, 128), rpn_nms_thresh=0.7, rpn_train_pre_nms=12000, rpn_train_post_nms=2000, rpn_test_pre_nms=6000, rpn_test_post_nms=300, rpn_min_size=16, per_device_batch_size=1, num_sample=128, pos_iou_thresh=0.5, pos_ratio=0.25, max_num_gt=300, additional_output=False, force_nms=False, minimal_opset=False, **kwargs)[source]

Faster RCNN network.

Parameters
  • features (gluon.HybridBlock) – Base feature extractor before feature pooling layer.

  • top_features (gluon.HybridBlock) – Tail feature extractor after feature pooling layer.

  • classes (iterable of str) – Names of categories, its length is num_class.

  • box_features (gluon.HybridBlock, default is None) – Feature head for transforming the shared ROI output (top_features) for box prediction. If set to None, global average pooling will be used.

  • short (int, default is 600.) – Input image short side size.

  • max_size (int, default is 1000.) – Maximum size of input image long side.

  • min_stage (int, default is 4) – Minimum stage number for FPN stages.

  • max_stage (int, default is 4) – Maximum stage number for FPN stages.

  • train_patterns (str, default is None.) – Matching pattern for trainable parameters.

  • nms_thresh (float, default is 0.3.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to top k detection results, use -1 to disable so that every Detection result is used in NMS.

  • roi_mode (str, default is align) – ROI pooling mode. Currently supports ‘pool’ and ‘align’.

  • roi_size (tuple of int, length 2, default is (14, 14)) – (height, width) of the ROI region.

  • strides (int/tuple of ints, default is 16) – Feature map stride with respect to original image. This is usually the ratio between original image size and feature map size. For FPN, use a tuple of ints.

  • clip (float, default is None) – Clip bounding box predictions to prevent exponentiation from overflowing.

  • rpn_channel (int, default is 1024) – Number of channels used in RPN convolutional layers.

  • base_size (int, default is 16) – The width (and height) of the reference anchor box.

  • scales (iterable of float, default is (8, 16, 32)) –

    The areas of anchor boxes. We use the following form to compute the shapes of anchors:

    \[width_{anchor} = size_{base} \times scale \times \sqrt{1 / ratio}\]

    \[height_{anchor} = size_{base} \times scale \times \sqrt{ratio}\]

  • ratios (iterable of float, default is (0.5, 1, 2)) – The aspect ratios of anchor boxes. We expect it to be a list or tuple.

  • alloc_size (tuple of int) – Allocate size for the anchor boxes as (H, W). Usually we generate enough anchors for large feature map, e.g. 128x128. Later in inference we can have variable input sizes, at which time we can crop corresponding anchors from this large anchor map so we can skip re-generating anchors for each input.

  • rpn_train_pre_nms (int, default is 12000) – Filter top proposals before NMS in training of RPN.

  • rpn_train_post_nms (int, default is 2000) – Return top proposal results after NMS in training of RPN. Will be set to rpn_train_pre_nms if it is larger than rpn_train_pre_nms.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 300) – Return top proposal results after NMS in testing of RPN. Will be set to rpn_test_pre_nms if it is larger than rpn_test_pre_nms.

  • rpn_nms_thresh (float, default is 0.7) – IOU threshold for NMS. It is used to remove overlapping proposals.

  • rpn_num_sample (int, default is 256) – Number of samples for RPN targets.

  • rpn_pos_iou_thresh (float, default is 0.7) – Anchors with IOU larger than rpn_pos_iou_thresh are regarded as positive samples.

  • rpn_neg_iou_thresh (float, default is 0.3) – Anchors with IOU smaller than rpn_neg_iou_thresh are regarded as negative samples. Anchors with IOU between rpn_neg_iou_thresh and rpn_pos_iou_thresh are ignored.

  • rpn_pos_ratio (float, default is 0.5) – rpn_pos_ratio defines how many positive samples (rpn_pos_ratio * rpn_num_sample) are sampled.

  • rpn_box_norm (array-like of size 4, default is (1., 1., 1., 1.)) – Standard deviation values by which the encoded box values are divided.

  • rpn_min_size (int, default is 16) – Proposals whose size is smaller than min_size will be discarded.

  • per_device_batch_size (int, default is 1) – Batch size for each device during training.

  • num_sample (int, default is 128) – Number of samples for RCNN targets.

  • pos_iou_thresh (float, default is 0.5) – Proposals whose IOU is larger than pos_iou_thresh are regarded as positive samples.

  • pos_ratio (float, default is 0.25) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are sampled.

  • max_num_gt (int, default is 300) – Maximum ground-truth number for each example. This is only an upper bound, not necessarily very precise. However, using a very big number may impact the training speed.

  • additional_output (boolean, default is False) – additional_output is only used for Mask R-CNN to get internal outputs.

  • force_nms (bool, default is False) – Apply NMS to all categories; this avoids overlapping detection results from different categories.

  • minimal_opset (bool, default is False) – We sometimes add special operators to accelerate training/inference; however, when exporting to third-party compilers we want to use only the most widely supported operators. If minimal_opset is True, the network uses a minimal set of operators suitable for, e.g., TVM.
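The anchor-shape formula in the scales description can be checked numerically. The following is a minimal sketch in plain Python that assumes only that formula; anchor_shapes is a hypothetical helper, and the order in which GluonCV enumerates scales and ratios internally may differ.

```python
import math


def anchor_shapes(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1, 2)):
    """Return the (width, height) of each anchor for every (scale, ratio) pair.

    Hypothetical helper implementing the documented formula:
        width  = base_size * scale * sqrt(1 / ratio)
        height = base_size * scale * sqrt(ratio)
    """
    shapes = []
    for scale in scales:
        for ratio in ratios:
            width = base_size * scale * math.sqrt(1.0 / ratio)
            height = base_size * scale * math.sqrt(ratio)
            shapes.append((width, height))
    return shapes


# With the defaults: 3 scales x 3 ratios = 9 anchor shapes per feature-map position.
for w, h in anchor_shapes():
    print(f"{w:7.1f} x {h:7.1f}")
```

Since height / width = ratio and width * height = (base_size * scale)^2, the scale alone fixes an anchor's area and the ratio only changes its aspect.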

classes

Names of categories, its length is num_class.

Type

iterable of str

num_class

Number of positive categories.

Type

int

short

Input image short side size.

Type

int

max_size

Maximum size of input image long side.

Type

int

train_patterns

Matching pattern for trainable parameters.

Type

str

nms_thresh

Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

Type

float

nms_topk

Apply NMS to top k detection results; use -1 to disable so that every detection result is used in NMS.

Type

int

force_nms

Apply NMS to all categories; this avoids overlapping detection results from different categories.

Type

bool
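To illustrate how nms_thresh, nms_topk and force_nms interact, here is a minimal greedy NMS sketch in plain Python. It assumes boxes given as (xmin, ymin, xmax, ymax) and simply drops detections beyond the top k; greedy_nms and iou are hypothetical helpers, not the operator GluonCV actually calls.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets, nms_thresh=0.3, nms_topk=400, force_nms=False):
    """dets: list of (class_id, score, box). Returns the kept detections.

    - nms_thresh < 0 or > 1 disables suppression entirely.
    - nms_topk keeps only the k highest-scoring detections (-1 = no limit).
    - force_nms=True suppresses across classes instead of within each class.
    """
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    if nms_thresh < 0 or nms_thresh > 1:
        return dets
    candidates = dets if nms_topk < 0 else dets[:nms_topk]
    kept = []
    for cls, score, box in candidates:
        suppressed = any(
            (force_nms or k_cls == cls) and iou(box, k_box) > nms_thresh
            for k_cls, _, k_box in kept
        )
        if not suppressed:
            kept.append((cls, score, box))
    return kept


dets = [
    (0, 0.9, (0, 0, 10, 10)),
    (0, 0.8, (1, 1, 11, 11)),  # same class, IoU ~0.68 with the first box
    (1, 0.7, (1, 1, 11, 11)),  # same box as above, but a different class
]
print(len(greedy_nms(dets)))                  # per-class NMS: 2
print(len(greedy_nms(dets, force_nms=True)))  # class-agnostic NMS: 1
```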

rpn_target_generator

Generate training targets with cls_target, box_target, and box_mask.

Type

gluon.Block

target_generator

Generate training targets with boxes, samples, matches, gt_label and gt_box.

Type

gluon.Block
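The num_sample / pos_ratio parameters (and their rpn_ counterparts) fix a per-image sampling budget, and the post-NMS proposal counts are capped by the pre-NMS counts. The sketch below assumes the common Faster R-CNN scheme, in which at most int(pos_ratio * num_sample) positives are kept and negatives fill the rest of the budget; sample_counts and effective_post_nms are hypothetical helpers, and GluonCV's target generators may differ in details such as random selection and ignored samples.

```python
def sample_counts(num_pos_available, num_neg_available, num_sample, pos_ratio):
    """Return (num_pos, num_neg) sampled for one image (hypothetical helper).

    At most int(pos_ratio * num_sample) positives are kept; negatives fill the
    remainder of the num_sample budget, limited by how many negatives exist.
    """
    max_pos = int(pos_ratio * num_sample)
    num_pos = min(num_pos_available, max_pos)
    num_neg = min(num_neg_available, num_sample - num_pos)
    return num_pos, num_neg


def effective_post_nms(pre_nms, post_nms):
    """rpn_*_post_nms is reduced to rpn_*_pre_nms when it is larger."""
    return min(post_nms, pre_nms)


# RCNN defaults: num_sample=128, pos_ratio=0.25 -> at most 32 positives.
print(sample_counts(100, 1000, num_sample=128, pos_ratio=0.25))  # (32, 96)
# Few positives available: negatives fill the remaining budget.
print(sample_counts(10, 1000, num_sample=128, pos_ratio=0.25))   # (10, 118)
# RPN defaults: rpn_num_sample=256, rpn_pos_ratio=0.5.
print(sample_counts(500, 5000, num_sample=256, pos_ratio=0.5))   # (128, 128)
print(effective_post_nms(pre_nms=12000, post_nms=2000))          # 2000
```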

hybrid_forward(F, x, gt_box=None, gt_label=None)[source]

Forward Faster-RCNN network.

The behavior during training and inference is different.

Parameters
  • x (mxnet.nd.NDArray or mxnet.symbol) – The network input tensor.

  • gt_box (mxnet.nd.NDArray or mxnet.symbol, only required during training) – The ground-truth bbox tensor with shape (B, N, 4).

  • gt_label (mxnet.nd.NDArray or mxnet.symbol, only required during training) – The ground-truth label tensor with shape (B, N, 1).

Returns

During inference, returns final class id, confidence scores, bounding boxes.

Return type

(ids, scores, bboxes)

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer: old_integer} mapping dict, a {new_name: old_name} mapping dict, or a list of [name0, name1, …] if class names don’t change. This allows the new predictor to reuse the specified previously trained weights.

Example

>>> import gluoncv
>>> net = gluoncv.model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True)
>>> # each reset_class call below is an alternative; reload the model before trying another
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person has index 14 in VOC
>>> net.reset_class(classes=['person'], reuse_weights={0:14})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':14})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])

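The three accepted forms of reuse_weights all reduce to the same thing: a mapping from an index in the new class list to an index in the old one. The sketch below shows that normalization in plain Python; resolve_reuse_weights is a hypothetical helper rather than GluonCV's internal code, and it assumes the standard 20-class Pascal VOC class order.

```python
def resolve_reuse_weights(old_classes, new_classes, reuse_weights):
    """Normalize reuse_weights into a {new_index: old_index} dict (hypothetical helper)."""
    if isinstance(reuse_weights, (list, tuple)):
        # A list of names means the names are unchanged: map each name to itself.
        reuse_weights = {name: name for name in reuse_weights}
    mapping = {}
    for new_key, old_key in reuse_weights.items():
        # String keys/values are looked up by name; integers are used as indices.
        new_idx = new_classes.index(new_key) if isinstance(new_key, str) else new_key
        old_idx = old_classes.index(old_key) if isinstance(old_key, str) else old_key
        mapping[new_idx] = old_idx
    return mapping


voc = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
       'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
       'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

# Each form resolves to {0: 14}.
for form in ({'person': 'person'}, {0: 14}, {'person': 14}, ['person']):
    print(form, '->', resolve_reuse_weights(voc, ['person'], form))
```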
property target_generator

Returns stored target generator

Returns

The RCNN target generator

Return type

mxnet.gluon.HybridBlock

class gluoncv.model_zoo.ForwardBackwardTask(net, optimizer, rpn_cls_loss, rpn_box_loss, rcnn_cls_loss, rcnn_box_loss, rcnn_mask_loss, amp_enabled)[source]

Mask R-CNN training task that can be scheduled concurrently using Parallel.

Parameters
  • net (gluon.HybridBlock) – Mask R-CNN network.

  • optimizer (gluon.Trainer) – Optimizer for the training.

  • rpn_cls_loss (gluon.loss) – RPN box classification loss.

  • rpn_box_loss (gluon.loss) – RPN box regression loss.

  • rcnn_cls_loss (gluon.loss) – R-CNN box head classification loss.

  • rcnn_box_loss (gluon.loss) – R-CNN box head regression loss.

  • rcnn_mask_loss (gluon.loss) – R-CNN mask head segmentation loss.

  • amp_enabled (bool) – Whether to enable Automatic Mixed Precision.

forward_backward(x)[source]

Forward and backward computation.

class gluoncv.model_zoo.GluonSSDMultiClassTracktor(gpu_id=0, detector_thresh=0.5, model_name='', use_pretrained=False, param_path='', data_shape=512)[source]

Initiate a tracktor based on an object detector.

anchors()[source]

clean_up()[source]

Clean up after running one video

prepare_for_frame(frame)[source]

This method should run anything that needs to happen before the motion prediction. It can prepare the detector or even run the backbone feature extraction. It can also provide data to motion prediction.

Parameters

frame – The frame data, the same as in the detect_and_track method.

Returns

motion_predict_data

Return type

optional data provided to motion prediction, if no data is provided, return None

class gluoncv.model_zoo.GoogLeNet(classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, dropout_ratio=0.4, aux_logits=False, norm_kwargs=None, partial_bn=False, pretrained_base=True, ctx=None, **kwargs)[source]

GoogLeNet model from the “Going Deeper with Convolutions” paper, with batch normalization as described in the “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” paper.

Parameters
  • classes (int, default 1000) – Number of classification classes.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.HybridBlock(prefix=None, params=None)[source]

HybridBlock supports forwarding with both Symbol and NDArray.

HybridBlock is similar to Block, with a few differences:

import mxnet as mx
from mxnet.gluon import HybridBlock, nn

class Model(HybridBlock):
    def __init__(self, **kwargs):
        super(Model, self).__init__(**kwargs)
        # use name_scope to give child Blocks appropriate names.
        with self.name_scope():
            self.dense0 = nn.Dense(20)
            self.dense1 = nn.Dense(20)

    def hybrid_forward(self, F, x):
        x = F.relu(self.dense0(x))
        return F.relu(self.dense1(x))

model = Model()
model.initialize(ctx=mx.cpu(0))
model.hybridize()
model(mx.nd.zeros((10, 10), ctx=mx.cpu(0)))

Forward computation in HybridBlock must be static to work with Symbols, i.e. you cannot call NDArray.asnumpy(), NDArray.shape, NDArray.dtype, NDArray indexing (x[i]), etc. on tensors. Also, you cannot use branching or loop logic that is based on non-constant expressions like random numbers or intermediate results, since they change the graph structure for each iteration.

Before activating with hybridize(), HybridBlock works just like normal Block. After activation, HybridBlock will create a symbolic graph representing the forward computation and cache it. On subsequent forwards, the cached graph will be used instead of hybrid_forward().

Please see references for detailed tutorial.

References

Hybrid - Faster training and easy deployment

cast(dtype)[source]

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

export(path, epoch=0, remove_amp_cast=True)[source]

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports, mxnet.mod.Module or the C++ interface.

Note

When there is only one input, it is named data. When there are multiple inputs, they are named data0, data1, etc.

Parameters
  • path (str) – Path to save model. Two files, path-symbol.json and path-xxxx.params, will be created, where xxxx is the 4-digit epoch number.

  • epoch (int) – Epoch number of saved model.
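The naming rules for export (the two output files and the data / data0, data1, … input names) can be summarized in a few lines. expected_export_names is a hypothetical helper that only predicts the names; it does not call MXNet.

```python
def expected_export_names(path, epoch=0, num_inputs=1):
    """Predict the files and input names produced by HybridBlock.export (hypothetical helper)."""
    # Symbol file plus a params file tagged with the zero-padded 4-digit epoch.
    files = (f"{path}-symbol.json", f"{path}-{epoch:04d}.params")
    # A single input is named 'data'; several inputs are 'data0', 'data1', ...
    if num_inputs == 1:
        inputs = ['data']
    else:
        inputs = [f"data{i}" for i in range(num_inputs)]
    return files, inputs


print(expected_export_names('model', epoch=3))
# (('model-symbol.json', 'model-0003.params'), ['data'])
```

In an MXNet session, these are the names you would pass to mx.gluon.SymbolBlock.imports('model-symbol.json', ['data'], 'model-0003.params') after calling net.hybridize(), running one forward pass, and calling net.export('model', epoch=3).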

forward(x, *args)[source]

Defines the forward computation. Arguments can be either NDArray or Symbol.

hybrid_forward(F, x, *args, **kwargs)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

hybridize(active=True, backend=None, backend_opts=None, **kwargs)[source]

Activates or deactivates HybridBlocks recursively. Has no effect on non-hybrid children.

Parameters
  • active (bool, default True) – Whether to turn hybrid on or off.

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

infer_shape(*args)[source]

Infers shape of Parameters from inputs.

infer_type(*args)[source]

Infers data type of Parameters from inputs.

optimize_for(x, *args, backend=None, backend_opts=None, **kwargs)[source]

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. It can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

register_child(block, name=None)[source]

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_op_hook(callback, monitor_all=False)[source]

Install op hook for block recursively.

Parameters
  • callback (function) – Takes a string and an NDArrayHandle.

  • monitor_all (bool, default False) – If true, monitor both input and output, otherwise monitor output only.

class gluoncv.model_zoo.I3D_InceptionV1(nclass=1000, pretrained=False, pretrained_base=True, num_segments=1, num_crop=1, feat_ext=False, dropout_ratio=0.5, init_std=0.01, partial_bn=False, ctx=None, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Inception v1 model from “Going Deeper with Convolutions” paper.

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper. This implementation differs slightly from the original one due to padding.

Parameters
  • nclass (int) – Number of classes in the training dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load a pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • dropout_ratio (float, default is 0.5.) – The dropout rate of a dropout layer. A larger value gives stronger regularization against overfitting.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • init_std (float, default is 0.01.) – Standard deviation used to initialize the dense layers.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
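num_segments and num_crop describe how many clips of one video are fed to the network together. A minimal NumPy sketch of the usual segmental consensus follows, assuming the clips of each video are stacked contiguously along the batch axis and their features are averaged before the classifier; segment_consensus is a hypothetical helper, not part of GluonCV.

```python
import numpy as np


def segment_consensus(features, num_segments=1, num_crop=1):
    """Average per-clip features into one feature vector per video.

    features: array of shape (batch * num_segments * num_crop, feat_dim), with
    all clips of one video stored contiguously (an assumption of this sketch).
    """
    feat_dim = features.shape[-1]
    grouped = features.reshape(-1, num_segments * num_crop, feat_dim)
    return grouped.mean(axis=1)


# 2 videos x 3 segments x 1 crop = 6 clip features of dimension 4.
feats = np.arange(6 * 4, dtype=np.float32).reshape(6, 4)
video_feats = segment_consensus(feats, num_segments=3)
print(video_feats.shape)  # (2, 4)
```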

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.I3D_InceptionV3(nclass=1000, pretrained=False, pretrained_base=True, num_segments=1, num_crop=1, feat_ext=False, dropout_ratio=0.5, init_std=0.01, partial_bn=False, ctx=None, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper.

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

This model definition file is written by Brais and modified by Yi.

Parameters
  • nclass (int) – Number of classes in the training dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load a pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • dropout_ratio (float, default is 0.5.) – The dropout rate of a dropout layer. A larger value gives stronger regularization against overfitting.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • init_std (float, default is 0.01.) – Standard deviation used to initialize the dense layers.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.I3D_ResNetV1(nclass, depth, num_stages=4, pretrained=False, pretrained_base=True, feat_ext=False, num_segments=1, num_crop=1, spatial_strides=(1, 2, 2, 2), temporal_strides=(1, 1, 1, 1), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), conv1_kernel_t=5, conv1_stride_t=2, pool1_kernel_t=1, pool1_stride_t=2, inflate_freq=(1, 1, 1, 1), inflate_stride=(1, 1, 1, 1), inflate_style='3x1x1', nonlocal_stages=(-1, ), nonlocal_freq=(0, 1, 1, 0), nonlocal_cfg=None, bn_eval=True, bn_frozen=False, partial_bn=False, frozen_stages=-1, dropout_ratio=0.5, init_std=0.01, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, ctx=None, **kwargs)[source]

ResNet_I3D backbone. Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • depth (int) – Depth of ResNet, from {18, 34, 50, 101, 152}.

  • num_stages (int, default is 4.) – Number of stages in a ResNet.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load a pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • spatial_strides (tuple of int.) – Strides in the spatial dimension of the first block of each stage.

  • temporal_strides (tuple of int.) – Strides in the temporal dimension of the first block of each stage.

  • dilations (tuple of int.) – Dilation ratio of each stage.

  • out_indices (tuple of int.) – Collect features from the selected stages of ResNet, usually used for feature extraction or an auxiliary loss.

  • conv1_kernel_t (int, default is 5.) – The temporal kernel size of the first convolutional layer in a ResNet.

  • conv1_stride_t (int, default is 2.) – The temporal stride of the first convolutional layer in a ResNet.

  • pool1_kernel_t (int, default is 1.) – The temporal kernel size of the first pooling layer in a ResNet.

  • pool1_stride_t (int, default is 2.) – The temporal stride of the first pooling layer in a ResNet.

  • inflate_freq (tuple of int.) – Select which 2D convolutional layers to be inflated to 3D convolutional layers in each stage.

  • inflate_stride (tuple of int.) – The stride for inflated layers in each stage.

  • inflate_style (str, default is '3x1x1'.) – How to inflate a 2D kernel, either ‘3x1x1’ or ‘1x3x3’.

  • nonlocal_stages (tuple of int.) – Stages in which to insert non-local blocks.

  • nonlocal_freq (tuple of int.) – Select where to insert non-local blocks in each stage.

  • nonlocal_cfg (dict.) – Additional non-local arguments, for example nonlocal_type=’gaussian’.

  • bn_eval (bool.) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var).

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • frozen_stages (int.) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • dropout_ratio (float, default is 0.5.) – The dropout rate of a dropout layer. A larger value gives stronger regularization against overfitting.

  • init_std (float, default is 0.01.) – Standard deviation used to initialize the dense layers.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

hybrid_forward(F, x)[source]

Hybrid forward of I3D network

init_weights(ctx)[source]

Initialize the I3D network with its 2D pretrained weights.

class gluoncv.model_zoo.Inception3(classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, partial_bn=False, **kwargs)[source]

Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper.

Parameters
  • classes (int, default 1000) – Number of classification classes.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.MaskRCNN(features, top_features, classes, mask_channels=256, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, target_roi_scale=1, num_fcn_convs=0, norm_layer=None, norm_kwargs=None, **kwargs)[source]

Mask RCNN network.

Parameters
  • features (gluon.HybridBlock) – Base feature extractor before feature pooling layer.

  • top_features (gluon.HybridBlock) – Tail feature extractor after feature pooling layer.

  • classes (iterable of str) – Names of categories, its length is num_class.

  • mask_channels (int, default is 256) – Number of channels in mask prediction

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN. Upper bounded by min of rpn_test_pre_nms and rpn_test_post_nms.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 1000) – Return top proposal results after NMS in testing of RPN. Will be set to rpn_test_pre_nms if it is larger than rpn_test_pre_nms.

  • target_roi_scale (int, default 1) – Ratio of mask output roi / input roi. For model with FPN, this is typically 2.

  • num_fcn_convs (int, default 0) – Number of convolution blocks before the deconvolution layer. For FPN networks this is typically 4.

hybrid_forward(F, x, gt_box=None, gt_label=None)[source]

Forward Mask RCNN network.

The behavior during training and inference is different.

Parameters
  • x (mxnet.nd.NDArray or mxnet.symbol) – The network input tensor.

  • gt_box (mxnet.nd.NDArray or mxnet.symbol, only required during training) – The ground-truth bbox tensor with shape (B, N, 4).

  • gt_label (mxnet.nd.NDArray or mxnet.symbol, only required during training) – The ground-truth label tensor with shape (B, N, 1).

Returns

During inference, returns final class id, confidence scores, bounding boxes, segmentation masks.

Return type

(ids, scores, bboxes, masks)

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer: old_integer} mapping dict, a {new_name: old_name} mapping dict, or a list of [name0, name1, …] if class names don’t change. This allows the new predictor to reuse the specified previously trained weights.

Example

>>> import gluoncv
>>> net = gluoncv.model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained=True)
>>> # each reset_class call below is an alternative; reload the model before trying another
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person is the first category (index 0) in COCO
>>> net.reset_class(classes=['person'], reuse_weights={0:0})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':0})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])

class gluoncv.model_zoo.MobileNet(multiplier=1.0, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper.

Parameters
  • multiplier (float, default 1.0) – The width multiplier for controlling the model size. Only multipliers that are no less than 0.25 are supported. The actual number of channels is equal to the original channel size multiplied by this multiplier.

  • classes (int, default 1000) – Number of classes for the output layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
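The effect of multiplier can be sketched by scaling the MobileNet v1 layer widths from Table 1 of the MobileNets paper (the stem convolution followed by the 13 pointwise convolutions). This plain-Python sketch assumes the scaled widths are truncated with int(); scaled_channels is a hypothetical helper, and GluonCV's own rounding may differ.

```python
# Stem conv width followed by the pointwise output widths of MobileNet v1.
BASE_CHANNELS = [32, 64] + [128] * 2 + [256] * 2 + [512] * 6 + [1024] * 2


def scaled_channels(multiplier=1.0):
    """Channel widths after applying the width multiplier (hypothetical helper)."""
    if multiplier < 0.25:
        raise ValueError("multipliers below 0.25 are not supported")
    return [int(c * multiplier) for c in BASE_CHANNELS]


print(scaled_channels(0.25))
# [8, 16, 32, 32, 64, 64, 128, 128, 128, 128, 128, 128, 256, 256]
```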

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.MobileNetV2(multiplier=1.0, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper (https://arxiv.org/abs/1801.04381).

Parameters
  • multiplier (float, default 1.0) – The width multiplier for controlling the model size. The actual number of channels is equal to the original channel size multiplied by this multiplier.

  • classes (int, default 1000) – Number of classes for the output layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.MobilePose(base_name, base_attrs=('features'), num_joints=17, pretrained_base=False, pretrained_ctx=cpu(0), **kwargs)[source]

Pose Estimation for Mobile Device

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.MonoDepth2(backbone, pretrained_base, num_input_images=1, scales=range(0, 4), num_output_channels=1, use_skips=True, ctx=cpu(0), **kwargs)[source]

Monodepth2 depth network.

Parameters
  • backbone (string) – Pre-trained dilated backbone network type (‘resnet18’, ‘resnet34’, ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • pretrained_base (bool or str) – Whether the backbone is pretrained. If True, weights of a model trained on ImageNet are loaded.

  • num_input_images (int) – The number of input sequences. 1 for depth encoder, larger than 1 for pose encoder. (Default: 1)

  • scales (list) – The scales used in the loss. (Default: range(4))

  • num_output_channels (int) – The number of output channels. (Default: 1)

  • use_skips (bool) – Whether to use skip connections in the network. (Default: True)

Reference

Clement Godard, Oisin Mac Aodha, Michael Firman, Gabriel Brostow. “Digging Into Self-Supervised Monocular Depth Estimation.” ICCV, 2019

Examples

>>> from gluoncv.model_zoo import MonoDepth2
>>> model = MonoDepth2(backbone='resnet18', pretrained_base=True)
>>> print(model)

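The scales parameter selects the decoder outputs used in the loss. Assuming, as in the original Monodepth2 code, that the disparity at scale s is predicted at 1/2^s of the input resolution, the output sizes for the 192 x 640 input commonly used on KITTI are computed below; disparity_resolutions is a hypothetical helper.

```python
def disparity_resolutions(height=192, width=640, scales=range(4)):
    """(height, width) of the disparity map at each scale (hypothetical helper).

    Assumes scale s is predicted at 1 / 2**s of the input resolution.
    """
    return [(height // 2 ** s, width // 2 ** s) for s in scales]


print(disparity_resolutions())
# [(192, 640), (96, 320), (48, 160), (24, 80)]
```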
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.MonoDepth2PoseNet(backbone, pretrained_base, num_input_images=2, num_input_features=1, num_frames_to_predict_for=2, stride=1, ctx=cpu(0), **kwargs)[source]

Monodepth2 pose network.

Parameters
  • backbone (string) – Pre-trained dilated backbone network type (‘resnet18’, ‘resnet34’, ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • pretrained_base (bool or str) – Whether the backbone is pretrained. If True, weights of a model trained on ImageNet are loaded.

  • num_input_images (int) – The number of input sequences. 1 for depth encoder, larger than 1 for pose encoder. (Default: 2)

  • num_input_features (int) – The number of input feature maps from posenet encoder. (Default: 1)

  • num_frames_to_predict_for (int) – The number of output pose between frames; If None, it equals num_input_features - 1. (Default: 2)

  • stride (int) – The stride of the convolutions in the pose decoder. (Default: 1)

  • Reference – Clement Godard, Oisin Mac Aodha, Michael Firman, Gabriel Brostow. “Digging Into Self-Supervised Monocular Depth Estimation.” ICCV, 2019

Examples

>>> model = MonoDepth2PoseNet(backbone='resnet18', pretrained_base=True)
>>> print(model)

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.P3D(nclass, block, layers, shortcut_type='B', block_design=('A', 'B', 'C'), dropout_ratio=0.5, num_segments=1, num_crop=1, feat_ext=False, init_std=0.001, ctx=None, partial_bn=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

The Pseudo 3D network (P3D). Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. ICCV, 2017. https://arxiv.org/abs/1711.10305

Parameters
  • nclass (int) – Number of classes in the training dataset.

  • block (Block, default is Bottleneck.) – Class for the residual block.

  • layers (list of int) – Numbers of layers in each block

  • block_design (tuple of str.) – Different designs for each block, from ‘A’, ‘B’ or ‘C’.

  • dropout_ratio (float, default is 0.5.) – The dropout rate of a dropout layer. Larger values regularize more strongly against overfitting.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before the dense classification layer or do a complete forward pass.

  • init_std (float, default is 0.001.) – Standard deviation used to initialize the dense layers.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
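The three entries of block_design refer to the residual unit variants of the P3D paper. The sketch below is not GluonCV code: it models the spatial (1x3x3) and temporal (3x1x1) convolutions as hypothetical scalar functions to show how each variant wires them together with the identity shortcut:

```python
from typing import Callable

Conv = Callable[[float], float]


def p3d_a(x: float, s: Conv, t: Conv) -> float:
    """P3D-A: spatial then temporal in series, plus the identity shortcut."""
    return x + t(s(x))


def p3d_b(x: float, s: Conv, t: Conv) -> float:
    """P3D-B: spatial and temporal in parallel, both added to the shortcut."""
    return x + s(x) + t(x)


def p3d_c(x: float, s: Conv, t: Conv) -> float:
    """P3D-C: serial S then T, with an extra direct path from S to the output."""
    return x + s(x) + t(s(x))


def spatial(v: float) -> float:
    return 2 * v  # hypothetical stand-in for the 1x3x3 spatial convolution


def temporal(v: float) -> float:
    return v + 1  # hypothetical stand-in for the 3x1x1 temporal convolution


print(p3d_a(1.0, spatial, temporal), p3d_b(1.0, spatial, temporal), p3d_c(1.0, spatial, temporal))
# 4.0 5.0 6.0
```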

hybrid_forward(F, x)[source]

Hybrid forward of P3D net

class gluoncv.model_zoo.PSPNet(nclass, backbone='resnet50', aux=True, ctx=cpu(0), pretrained_base=True, base_size=520, crop_size=480, **kwargs)[source]

Pyramid Scene Parsing Network

Parameters
  • nclass (int) – Number of categories for the training dataset.

  • backbone (string) – Pre-trained dilated backbone network type (default:’resnet50’; ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • norm_layer (object) – Normalization layer used in the backbone network (default: mxnet.gluon.nn.BatchNorm; use mxnet.gluon.contrib.nn.SyncBatchNorm for synchronized cross-GPU batch normalization).

  • aux (bool) – Whether to use the auxiliary loss.

Reference:

Zhao, Hengshuang, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. “Pyramid scene parsing network.” CVPR, 2017
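PSPNet's pyramid pooling module average-pools the backbone feature map into several coarse grids and concatenates the upsampled results with the original features. A minimal sketch of the pooling step, assuming the paper's bin sizes (1, 2, 3, 6) and the usual adaptive-pooling boundary rule floor(i*L/n) to ceil((i+1)*L/n); bin_edges and adaptive_avg_pool2d are hypothetical helpers:

```python
import math


def bin_edges(length, bins):
    """Start/end indices of each adaptive-pooling bin along one axis."""
    return [(math.floor(i * length / bins), math.ceil((i + 1) * length / bins))
            for i in range(bins)]


def adaptive_avg_pool2d(fmap, bins):
    """Average-pool a 2-D list `fmap` into a bins x bins grid."""
    rows = bin_edges(len(fmap), bins)
    cols = bin_edges(len(fmap[0]), bins)
    pooled = []
    for r0, r1 in rows:
        out_row = []
        for c0, c1 in cols:
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            out_row.append(sum(cells) / len(cells))
        pooled.append(out_row)
    return pooled


# A 4x4 toy feature map holding the values 0..15.
fmap = [[float(4 * r + c) for c in range(4)] for r in range(4)]
for bins in (1, 2):
    print(bins, adaptive_avg_pool2d(fmap, bins))
# 1 [[7.5]]
# 2 [[2.5, 4.5], [10.5, 12.5]]
```

For a 60x60 map (for example a 480x480 crop after a stride-8 backbone), the 6-bin level pools non-overlapping 10x10 windows.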

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.PoseDecoder(num_ch_enc, num_input_features, num_frames_to_predict_for=2, stride=1)[source]

Decoder of Monodepth2 PoseNet

Parameters
  • num_ch_enc (list) – The numbers of channels of the encoder outputs.

  • num_input_features (int) – The number of input sequences. 1 for depth encoder, larger than 1 for pose encoder.

  • num_frames_to_predict_for (int) – The number of output poses between frames. If None, it equals num_input_features - 1. (Default: 2)

  • stride (int) – The stride of the convolutions in the pose decoder. (Default: 1)

hybrid_forward(F, input_features)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.R2Plus1D(nclass, block, layers, dropout_ratio=0.5, num_segments=1, num_crop=1, feat_ext=False, init_std=0.001, ctx=None, partial_bn=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

The R2+1D network. A Closer Look at Spatiotemporal Convolutions for Action Recognition. CVPR, 2018. https://arxiv.org/abs/1711.11248

Parameters
  • nclass (int) – Number of classes in the training dataset.

  • block (Block, default is Bottleneck.) – Class for the residual block.

  • layers (list of int) – Numbers of layers in each block

  • dropout_ratio (float, default is 0.5.) – The dropout rate of a dropout layer. Larger values regularize more strongly against overfitting.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before the dense classification layer or do a complete forward pass.

  • init_std (float, default is 0.001.) – Standard deviation used to initialize the dense layers.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
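R(2+1)D factorizes each t x d x d 3-D convolution into a d x d spatial convolution followed by a t x 1 x 1 temporal one. The paper chooses the number of intermediate channels M so that the parameter count matches the full 3-D convolution. The sketch below implements that formula; the helper names are hypothetical, and whether GluonCV uses exactly this rule is an assumption:

```python
import math


def r2plus1d_mid_channels(in_channels, out_channels, t=3, d=3):
    """Intermediate channels M so a (2+1)D block matches a t x d x d conv's params."""
    return math.floor((t * d * d * in_channels * out_channels)
                      / (d * d * in_channels + t * out_channels))


def param_counts(in_channels, out_channels, t=3, d=3):
    """Weights of the full 3-D conv vs. the factorized spatial + temporal pair."""
    m = r2plus1d_mid_channels(in_channels, out_channels, t, d)
    full = t * d * d * in_channels * out_channels
    factorized = d * d * in_channels * m + t * m * out_channels
    return full, factorized, m


print(param_counts(64, 64))  # (110592, 110592, 144)
```

Because of the floor, the factorized block can have slightly fewer weights than the full convolution when the division is not exact.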

hybrid_forward(F, x)[source]

Hybrid forward of R2+1D net

class gluoncv.model_zoo.RCNNTargetGenerator(num_class, max_pos=128, per_device_batch_size=1, means=(0.0, 0.0, 0.0, 0.0), stds=(0.1, 0.1, 0.2, 0.2))[source]

RCNN target encoder to generate matching target and regression target values.

Parameters
  • num_class (int) – Total number of positive classes.

  • max_pos (int, default is 128) – Upper bound on the number of positive samples.

  • per_device_batch_size (int, default is 1) – Per device batch size

  • means (iterable of float, default is (0., 0., 0., 0.)) – Mean values to be subtracted from regression targets.

  • stds (iterable of float, default is (0.1, 0.1, 0.2, 0.2)) – Standard deviations by which regression targets are divided.

hybrid_forward(F, roi, samples, matches, gt_label, gt_box)[source]

Components can handle batched images.

Parameters
  • roi ((B, N, 4), input proposals) –

  • samples ((B, N), value +1: positive / -1: negative.) –

  • matches ((B, N), value [0, M), index to gt_label and gt_box.) –

  • gt_label ((B, M), value [0, num_class), excluding background class.) –

  • gt_box ((B, M, 4), input ground truth box corner coordinates.) –

Returns

  • cls_target ((B, N), value [0, num_class + 1), including background.)

  • box_target ((B, N, C, 4), only foreground class has nonzero target.)

  • box_weight ((B, N, C, 4), only foreground class has nonzero weight.)
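The means and stds of RCNNTargetGenerator normalize the box regression targets. A pure-Python sketch for a single proposal, assuming the standard Faster R-CNN center/size encoding of corner boxes; encode_box is a hypothetical helper, not the GluonCV encoder, and it does not build the per-class (B, N, C, 4) layout:

```python
import math


def encode_box(roi, gt, means=(0.0, 0.0, 0.0, 0.0), stds=(0.1, 0.1, 0.2, 0.2)):
    """Encode a ground-truth box relative to a proposal; both are (x1, y1, x2, y2)."""
    rw, rh = roi[2] - roi[0], roi[3] - roi[1]
    rx, ry = roi[0] + 0.5 * rw, roi[1] + 0.5 * rh
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gx, gy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    # Center offsets relative to the proposal size, log-ratio of sizes.
    raw = ((gx - rx) / rw, (gy - ry) / rh, math.log(gw / rw), math.log(gh / rh))
    # Subtract the means, then divide by the stds.
    return tuple((t - m) / s for t, m, s in zip(raw, means, stds))


print(encode_box((0, 0, 10, 10), (1, 1, 11, 11)))
```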

class gluoncv.model_zoo.RCNNTargetSampler(num_image, num_proposal, num_sample, pos_iou_thresh, pos_ratio, max_num_gt)[source]

A sampler to choose positive/negative samples from RCNN Proposals

Parameters
  • num_image (int) – Number of input images.

  • num_proposal (int) – Number of input proposals.

  • num_sample (int) – Number of samples for RCNN targets.

  • pos_iou_thresh (float) – Proposals whose IoU is larger than pos_iou_thresh are regarded as positive samples; proposals whose IoU is smaller than pos_iou_thresh are regarded as negative samples.

  • pos_ratio (float) – pos_ratio defines how many positive samples (pos_ratio * num_sample) are to be sampled.

  • max_num_gt (int) – Maximum ground-truth number for each example. This is only an upper bound, not necessarily very precise. However, using a very big number may impact the training speed.

hybrid_forward(F, rois, scores, gt_boxes)[source]

Handle B=self._num_image by a for loop.

Parameters
  • rois ((B, self._num_proposal, 4) encoded in (x1, y1, x2, y2)) –

  • scores ((B, self._num_proposal, 1), value range [0, 1] with ignore value -1.) –

  • gt_boxes ((B, M, 4) encoded in (x1, y1, x2, y2), invalid box should have area of 0.) –

Returns

  • rois ((B, self._num_sample, 4), randomly drawn from proposals)

  • samples ((B, self._num_sample), value +1: positive / 0: ignore / -1: negative.)

  • matches ((B, self._num_sample), value between [0, M))
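The sampling rule of RCNNTargetSampler can be sketched as follows. This is a simplified, hypothetical illustration for one image: it works on the precomputed best IoU of each proposal, draws at most pos_ratio * num_sample positives at random, and assumes negatives fill the remaining slots:

```python
import random


def sample_proposals(max_ious, num_sample, pos_iou_thresh, pos_ratio, seed=0):
    """Return (proposal_index, label) pairs: +1 positive, -1 negative.

    max_ious[i] is the best IoU of proposal i with any ground-truth box.
    """
    rng = random.Random(seed)
    pos = [i for i, iou in enumerate(max_ious) if iou > pos_iou_thresh]
    neg = [i for i, iou in enumerate(max_ious) if iou < pos_iou_thresh]
    # At most pos_ratio * num_sample positives; negatives fill the rest.
    num_pos = min(len(pos), int(pos_ratio * num_sample))
    num_neg = min(len(neg), num_sample - num_pos)
    chosen = [(i, 1) for i in rng.sample(pos, num_pos)]
    chosen += [(i, -1) for i in rng.sample(neg, num_neg)]
    return chosen


ious = [0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.0, 0.05, 0.4]
picked = sample_proposals(ious, num_sample=8, pos_iou_thresh=0.5, pos_ratio=0.25)
print(sorted(label for _, label in picked))  # [-1, -1, -1, -1, -1, -1, 1, 1]
```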

class gluoncv.model_zoo.ResNeSt(block, layers, cardinality=1, bottleneck_width=64, classes=1000, dilated=False, dilation=1, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, last_gamma=False, deep_stem=False, stem_width=32, avg_down=False, final_drop=0.0, use_global_stats=False, name_prefix='', dropblock_prob=0, input_size=224, use_splat=False, radix=2, avd=False, avd_first=False, split_drop_ratio=0)[source]

ResNeSt Model

Parameters
  • block (Block) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.

  • layers (list of int) – Numbers of layers in each block

  • classes (int, default 1000) – Number of classification classes.

  • dilated (bool, default False) – Applying dilation strategy to pretrained ResNet yielding a stride-8 model, typically used in Semantic Segmentation.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • deep_stem (bool, default False) – Whether to replace the 7x7 conv1 with three 3x3 convolution layers.

  • avg_down (bool, default False) – Whether to use average pooling for projection skip connection between stages/downsample.

  • final_drop (float, default 0.0) – Dropout ratio before the final classification layer.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of mini-batch statistics; optionally set to True when fine-tuning from ImageNet-pretrained classification models.

Reference:

  • He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

  • Yu, Fisher, and Vladlen Koltun. “Multi-scale context aggregation by dilated convolutions.”

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.ResNetV1(block, layers, channels, classes=1000, thumbnail=False, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
  • block (HybridBlock) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.

  • layers (list of int) – Numbers of layers in each block

  • channels (list of int) – Numbers of channels in each block. Length should be one larger than layers list.

  • classes (int, default 1000) – Number of classification classes.

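  • thumbnail (bool, default False) – Enable thumbnail mode, which replaces the 7x7 stem convolution and max pooling with a single 3x3 convolution, for small inputs such as CIFAR images.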

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_se (bool, default False) – Whether to use Squeeze-and-Excitation module

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
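The layers and channels lists together define the network. A minimal sketch, assuming the standard configurations from the ResNet paper and that channels holds the stem width followed by one output width per stage; resnet_depth is a hypothetical helper that checks the length rule above and counts the weighted layers:

```python
def resnet_depth(block, layers, channels):
    """Count the weighted layers of a ResNet described by `layers` and `channels`."""
    if len(channels) != len(layers) + 1:
        raise ValueError("channels must have exactly one more entry than layers")
    convs_per_block = {"basic": 2, "bottleneck": 3}[block]
    # Stem convolution + convolutions in all residual blocks + final dense layer.
    return 1 + convs_per_block * sum(layers) + 1


print(resnet_depth("basic", [2, 2, 2, 2], [64, 64, 128, 256, 512]))          # 18
print(resnet_depth("bottleneck", [3, 4, 6, 3], [64, 256, 512, 1024, 2048]))  # 50
```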

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.ResNetV1b(block, layers, classes=1000, dilated=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, last_gamma=False, deep_stem=False, stem_width=32, avg_down=False, final_drop=0.0, use_global_stats=False, name_prefix='', **kwargs)[source]

Pre-trained ResNetV1b model, which produces stride-8 feature maps at conv5 when dilated=True.

Parameters
  • block (Block) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.

  • layers (list of int) – Numbers of layers in each block

  • classes (int, default 1000) – Number of classification classes.

  • dilated (bool, default False) – Applying dilation strategy to pretrained ResNet yielding a stride-8 model, typically used in Semantic Segmentation.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • deep_stem (bool, default False) – Whether to replace the 7x7 conv1 with three 3x3 convolution layers.

  • avg_down (bool, default False) – Whether to use average pooling for projection skip connection between stages/downsample.

  • final_drop (float, default 0.0) – Dropout ratio before the final classification layer.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of mini-batch statistics; optionally set to True when fine-tuning from ImageNet-pretrained classification models.

Reference:

  • He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

  • Yu, Fisher, and Vladlen Koltun. “Multi-scale context aggregation by dilated convolutions.”
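With dilated=True, the last two stages keep their resolution and enlarge the receptive field through dilation instead of striding. The sketch below assumes the common dilated-ResNet layout (stride 1 with dilation 2 in stage 3 and dilation 4 in stage 4) and shows the resulting overall stride; output_stride is a hypothetical helper:

```python
def output_stride(dilated):
    """Overall stride and per-stage (stride, dilation) of a ResNetV1b-style net."""
    stem = 4  # stride-2 conv1 followed by a stride-2 max pool
    if dilated:
        stages = [(1, 1), (2, 1), (1, 2), (1, 4)]
    else:
        stages = [(1, 1), (2, 1), (2, 1), (2, 1)]
    total = stem
    for stride, _dilation in stages:
        total *= stride
    return total, stages


for dilated in (False, True):
    stride, _ = output_stride(dilated)
    # Side length of the conv5 feature map for a 480x480 input.
    print(dilated, stride, 480 // stride)
# False 32 15
# True 8 60
```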

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.ResNetV2(block, layers, channels, classes=1000, thumbnail=False, last_gamma=False, use_se=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • block (HybridBlock) – Class for the residual block. Options are BasicBlockV2, BottleneckV2.

  • layers (list of int) – Numbers of layers in each block

  • channels (list of int) – Numbers of channels in each block. Length should be one larger than layers list.

  • classes (int, default 1000) – Number of classification classes.

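  • thumbnail (bool, default False) – Enable thumbnail mode, which replaces the 7x7 stem convolution and max pooling with a single 3x3 convolution, for small inputs such as CIFAR images.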

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_se (bool, default False) – Whether to use Squeeze-and-Excitation module

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.ResNet_SlowFast(num_classes, depth, pretrained=None, pretrained_base=True, feat_ext=False, num_segments=1, num_crop=1, num_stages=4, spatial_strides=(1, 2, 2, 2), temporal_strides=(1, 1, 1, 1), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), conv1_kernel_t=1, conv1_stride_t=1, pool1_kernel_t=1, pool1_stride_t=1, frozen_stages=-1, inflate_freq=(0, 0, 1, 1), inflate_stride=(1, 1, 1, 1), inflate_style='3x1x1', nonlocal_stages=(-1, ), nonlocal_freq=(0, 0, 0, 0), nonlocal_cfg=None, bn_eval=False, bn_frozen=False, partial_bn=False, dropout_ratio=0.5, init_std=0.01, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, ctx=None, **kwargs)[source]

ResNe(x)t_SlowFast backbone.

Parameters
  • depth (int) – Depth of resnet, from {50, 101, 152}.

  • num_stages (int) – Resnet stages, normally 4.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • frozen_stages (int) – Stages to be frozen (all parameters fixed). -1 means not freezing any parameters.

  • bn_eval (bool) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var).

  • bn_frozen (bool) – Whether to freeze weight and bias of BN layers.

hybrid_forward(F, x)[source]

Hybrid forward of I3D_slow network

init_weights(ctx)[source]

Initialize the weights of the I3D_slow network.

class gluoncv.model_zoo.ResNext(layers, cardinality, bottleneck_width, classes=1000, last_gamma=False, use_se=False, deep_stem=False, avg_down=False, stem_width=64, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

ResNext model from “Aggregated Residual Transformations for Deep Neural Networks” paper.

Parameters
  • layers (list of int) – Numbers of layers in each block

  • cardinality (int) – Number of groups

  • bottleneck_width (int) – Width of bottleneck block

  • classes (int, default 1000) – Number of classification classes.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_se (bool, default False) – Whether to use Squeeze-and-Excitation module

  • deep_stem (bool, default False) – Whether to replace the 7x7 conv1 with three 3x3 convolution layers.

  • stem_width (int, default 64) – Width of the stem intermediate layer.

  • avg_down (bool, default False) – Whether to use average pooling for projection skip connection between stages/downsample.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
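cardinality and bottleneck_width together set the width of the grouped 3x3 convolution in each bottleneck. A sketch, assuming the per-group width is floor(channels * bottleneck_width / 64), with channels starting at 64 and doubling per stage, and a 4x expansion at the block output; for 32x4d this reproduces the ResNeXt-50 widths from the paper. resnext_stage_widths is a hypothetical helper:

```python
import math


def resnext_stage_widths(cardinality, bottleneck_width, base_channels=64, num_stages=4):
    """Return (group_conv_width, block_out_channels) for each stage."""
    widths = []
    channels = base_channels
    for _ in range(num_stages):
        d = int(math.floor(channels * (bottleneck_width / 64)))  # width per group
        widths.append((cardinality * d, channels * 4))
        channels *= 2
    return widths


print(resnext_stage_widths(32, 4))
# [(128, 256), (256, 512), (512, 1024), (1024, 2048)]
```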

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.ResidualAttentionModel(scale, m, classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper. Input size is 224 x 224.

Parameters
  • scale (tuple) – Network scale p, t, r.

  • m (tuple) – Network scale m. The network depth is defined as 36m + 20, and m is normally given as the tuple (m-1, m, m+1), except for m == 1, which is given as (1, 1, 1).

  • classes (int, default 1000) – Number of classification classes.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
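The depth rule for m can be checked in a few lines. The helpers are hypothetical and only restate the parameter description above; m = 1 and m = 2 give the Attention-56 and Attention-92 networks of the paper:

```python
def attention_depth(m):
    """Network depth for scale m, following the 36m + 20 rule."""
    return 36 * m + 20


def attention_m_tuple(m):
    """Per-stage tuple: (m-1, m, m+1), or (1, 1, 1) when m == 1."""
    return (1, 1, 1) if m == 1 else (m - 1, m, m + 1)


for m in (1, 2):
    print(m, attention_depth(m), attention_m_tuple(m))
# 1 56 (1, 1, 1)
# 2 92 (1, 2, 3)
```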

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.ResnetEncoder(backbone, pretrained, num_input_images=1, root='/root/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Encoder of Monodepth2

Parameters
  • backbone (string) – Pre-trained dilated backbone network type (‘resnet18’, ‘resnet34’, ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • pretrained (bool or str) – Whether the backbone is pretrained. If True, weights of a model trained on ImageNet are loaded.

  • num_input_images (int) – The number of input sequences. 1 for depth encoder, larger than 1 for pose encoder. (Default: 1)

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

hybrid_forward(F, input_image)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_BasicBlockV1(channels, stride, downsample=False, in_channels=0, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

BasicBlock V1 from “Deep Residual Learning for Image Recognition” paper. This is used for SE_ResNet V1 for 18, 34 layers.

Parameters
  • channels (int) – Number of output channels.

  • stride (int) – Stride size.

  • downsample (bool, default False) – Whether to downsample the input.

  • in_channels (int, default 0) – Number of input channels. Default is 0, to infer from the graph.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
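The Squeeze-and-Excitation step used by the SE_* blocks rescales each channel by a learned gate. A dependency-free sketch on nested lists, assuming bias-free fully connected layers (the weights w1 and w2 are hypothetical inputs); in an SE-ResNet block the recalibrated residual branch is then added to the shortcut:

```python
import math


def squeeze_excite(fmap, w1, w2):
    """Recalibrate `fmap`, a list of C channels, each a 2-D list of floats.

    w1: (reduced x C) weights; w2: (C x reduced) weights; biases omitted.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every spatial position of channel c by gates[c].
    return [[[v * g for v in row] for row in ch] for ch, g in zip(fmap, gates)]


fmap = [[[1.0, 2.0]], [[3.0, 4.0]]]  # 2 channels, each 1x2
print(squeeze_excite(fmap, w1=[[0.0, 0.0]], w2=[[0.0], [0.0]]))
# [[[0.5, 1.0]], [[1.5, 2.0]]]
```

With all-zero weights every gate is sigmoid(0) = 0.5, which makes the scaling easy to check by hand.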

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_BasicBlockV2(channels, stride, downsample=False, in_channels=0, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

BasicBlock V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for SE_ResNet V2 for 18, 34 layers.

Parameters
  • channels (int) – Number of output channels.

  • stride (int) – Stride size.

  • downsample (bool, default False) – Whether to downsample the input.

  • in_channels (int, default 0) – Number of input channels. Default is 0, to infer from the graph.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_BottleneckV1(channels, stride, downsample=False, in_channels=0, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Bottleneck V1 from “Deep Residual Learning for Image Recognition” paper. This is used for SE_ResNet V1 for 50, 101, 152 layers.

Parameters
  • channels (int) – Number of output channels.

  • stride (int) – Stride size.

  • downsample (bool, default False) – Whether to downsample the input.

  • in_channels (int, default 0) – Number of input channels. Default is 0, to infer from the graph.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_BottleneckV2(channels, stride, downsample=False, in_channels=0, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Bottleneck V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for SE_ResNet V2 for 50, 101, 152 layers.

Parameters
  • channels (int) – Number of output channels.

  • stride (int) – Stride size.

  • downsample (bool, default False) – Whether to downsample the input.

  • in_channels (int, default 0) – Number of input channels. Default is 0, to infer from the graph.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_ResNetV1(block, layers, channels, classes=1000, thumbnail=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

SE_ResNet V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
  • block (HybridBlock) – Class for the residual block. Options are SE_BasicBlockV1, SE_BottleneckV1.

  • layers (list of int) – Numbers of layers in each block

  • channels (list of int) – Numbers of channels in each block. Length should be one larger than layers list.

  • classes (int, default 1000) – Number of classification classes.

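  • thumbnail (bool, default False) – Enable thumbnail mode, which replaces the 7x7 stem convolution and max pooling with a single 3x3 convolution, for small inputs such as CIFAR images.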

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SE_ResNetV2(block, layers, channels, classes=1000, thumbnail=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

SE_ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • block (HybridBlock) – Class for the residual block. Options are SE_BasicBlockV2, SE_BottleneckV2.

  • layers (list of int) – Numbers of layers in each block

  • channels (list of int) – Numbers of channels in each block. Length should be one larger than layers list.

  • classes (int, default 1000) – Number of classification classes.

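  • thumbnail (bool, default False) – Enable thumbnail mode, which replaces the 7x7 stem convolution and max pooling with a single 3x3 convolution, for small inputs such as CIFAR images.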

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SMOTTracker(motion_model='no', anchor_array=None, use_motion=True, tracking_classes=[], match_top_k=10, track_keep_alive_thresh=0.1, new_track_iou_thresh=0.3, track_nms_thresh=0.5, gpu_id=0, anchor_assignment_method='iou', joint_linking=False, tracktor=None)[source]

Implementation of the SMOT tracker. The steps to use the tracker are:

  0. Set anchors from the SSD.

  1. Call tracker.predict(new_frame).

  2. Get the tracking anchor information.

  3. Run the detector with the tracking anchor information.

  4. Run tracker.update(new_detection, track_info).

process_frame_sequence(frame_iterator, tracktor)[source]
Parameters
  • frame_iterator (each step it emits a tuple of (frame_id, frame_data)) –

  • tracktor

Returns

results_iter

Return type

a response iterator with one tuple (frame_id, frame_rst) per frame

class gluoncv.model_zoo.SSD(network, base_size, features, num_filters, sizes, ratios, steps, classes, use_1x1_transition=True, use_bn=True, reduce_ratio=1.0, min_depth=128, global_pool=False, pretrained=False, stds=(0.1, 0.1, 0.2, 0.2), nms_thresh=0.45, nms_topk=400, post_nms=100, anchor_alloc_size=128, ctx=cpu(0), norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, root='~/.mxnet/models', minimal_opset=False, predictors_kernel=(3, 3), predictors_pad=(1, 1), anchor_generator=<class 'gluoncv.model_zoo.ssd.anchor.SSDAnchorGenerator'>, **kwargs)[source]

Single-shot Object Detection Network: https://arxiv.org/abs/1512.02325.

Parameters
  • network (string or None) – Name of the base network. If None, the base network is instantiated directly from features instead of being composed.

  • base_size (int) – Base input size; it is specified so that SSD can support dynamic input shapes.

  • features (list of str or mxnet.gluon.HybridBlock) – Intermediate features to be extracted or a network with multi-output. If network is None, features is expected to be a multi-output network.

  • num_filters (list of int) – Number of channels for the appended layers, ignored if network is None.

  • sizes (iterable of float) – Sizes of anchor boxes, given as a list of floats in increasing order. The length of sizes must be the number of SSD output layers + 1. For example, a two-stage SSD model can have sizes = [30, 60, 90], which is converted to [30, 60] and [60, 90] for the two stages, respectively. For more details, please refer to the original paper.

  • ratios (iterable of list) – Aspect ratios of anchors in each output layer. Its length must equal the number of SSD output layers.

  • steps (list of int) – Step size of anchor boxes in each output layer.

  • classes (iterable of str) – Names of all categories.

  • use_1x1_transition (bool) – Whether to use 1x1 convolution as the transition layer between attached layers, which is effective at reducing model capacity.

  • use_bn (bool) – Whether to use BatchNorm layer after each attached convolutional layer.

  • reduce_ratio (float) – Channel reduce ratio (0, 1) of the transition layer.

  • min_depth (int) – Minimum channels for the transition layers.

  • global_pool (bool) – Whether to attach a global average pooling layer as the last output layer.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • stds (tuple of float, default is (0.1, 0.1, 0.2, 0.2)) – Std values to be divided/multiplied to box encoded values.

  • nms_thresh (float, default is 0.45.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to the top k detection results; use -1 to disable this limit so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return the top post_nms detection results; the rest are discarded. The number is based on the COCO dataset, which has at most 100 objects per image. You can adjust this number if you expect more objects. You can use -1 to return all detections.

  • anchor_alloc_size (tuple of int, default is (128, 128)) – For advanced users. Define anchor_alloc_size to generate large enough anchor maps, which will later be saved in parameters. During inference, we support arbitrary input images by cropping the corresponding area of the anchor map. This allows us to export to symbol so the model can run in C++, Scala, etc.

  • ctx (mx.Context) – Network context.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm. This only applies to base networks that accept norm_layer; it is ignored if the base network (e.g. VGG) does not accept this argument.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

  • root (str) – The root path for model storage, default is ‘~/.mxnet/models’

  • minimal_opset (bool) – We sometimes add special operators to accelerate training/inference; however, when exporting to third-party compilers we want to use the most widely supported operators. If minimal_opset is True, the network will use a minimal set of operators suitable for, e.g., TVM.

  • predictor_kernel (tuple of int, default is (3, 3)) – Dimensions of the predictor kernel.

  • predictor_pad (tuple of int, default is (1, 1)) – Padding of the predictor kernel conv.

  • anchor_generator (default is SSDAnchorGenerator) – Anchor generator to be used. The default is SSDAnchorGenerator, corresponding to the published SSD article. This argument can be used for other custom anchor generators, such as LiteAnchorGenerator.
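The constraints on sizes, ratios and steps listed above can be checked with a small pure-Python sketch. The helper split_anchor_sizes is hypothetical and not part of GluonCV; it only reproduces the documented rule that consecutive entries of sizes form one (min, max) pair per output layer, and that ratios and steps have one entry per output layer.

```python
def split_anchor_sizes(sizes, ratios, steps):
    """Hypothetical helper: pair consecutive anchor sizes, one pair per output layer.

    Mirrors the documented constraints: sizes is increasing and has one more
    entry than there are output layers; ratios and steps have exactly one
    entry per output layer.
    """
    sizes = [float(s) for s in sizes]
    num_layers = len(sizes) - 1
    if num_layers < 1:
        raise ValueError("sizes needs at least two entries")
    if any(a >= b for a, b in zip(sizes[:-1], sizes[1:])):
        raise ValueError("sizes must be in increasing order")
    if len(ratios) != num_layers or len(steps) != num_layers:
        raise ValueError("ratios and steps need one entry per output layer")
    # [30, 60, 90] -> [[30, 60], [60, 90]]
    return [[small, large] for small, large in zip(sizes[:-1], sizes[1:])]


# Two-stage example from the sizes description above.
pairs = split_anchor_sizes([30, 60, 90], ratios=[[1, 2, 0.5]] * 2, steps=[8, 16])
print(pairs)  # [[30.0, 60.0], [60.0, 90.0]]
```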

hybrid_forward(F, x)[source]

Hybrid forward

property num_classes

Return number of foreground classes.

Returns

Number of foreground classes

Return type

int

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer: old_integer} mapping dict, a {new_name: old_name} mapping dict, or a list of [name0, name1,…] if class names don’t change. This allows the new predictor to reuse the previously trained weights specified.
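The accepted reuse_weights forms all describe the same thing: a mapping from a new class index to an old class index. The helper below is hypothetical (not part of GluonCV) and only normalizes those forms; it uses the 20 Pascal VOC class names in their usual order, where 'person' has index 14, so each form used in the example below resolves to {0: 14}.

```python
VOC_CLASSES = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car',
               'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
               'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']


def resolve_reuse_weights(new_classes, old_classes, reuse_weights):
    """Hypothetical helper: normalize reuse_weights to {new_index: old_index}."""
    if reuse_weights is None:
        return {}
    if isinstance(reuse_weights, (list, tuple)):
        # Class names unchanged: every listed name maps to itself.
        reuse_weights = {name: name for name in reuse_weights}
    mapping = {}
    for new_key, old_key in reuse_weights.items():
        # Names are looked up in their class list; integers are used as indices.
        new_idx = list(new_classes).index(new_key) if isinstance(new_key, str) else int(new_key)
        old_idx = list(old_classes).index(old_key) if isinstance(old_key, str) else int(old_key)
        mapping[new_idx] = old_idx
    return mapping


for spec in ({'person': 'person'}, {0: 14}, {'person': 14}, ['person']):
    print(spec, '->', resolve_reuse_weights(['person'], VOC_CLASSES, spec))
```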

Example

>>> net = gluoncv.model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person has index 14 in VOC
>>> net.reset_class(classes=['person'], reuse_weights={0:14})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':14})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])

set_nms(nms_thresh=0.45, nms_topk=400, post_nms=100)[source]

Set non-maximum suppression parameters.

Parameters
  • nms_thresh (float, default is 0.45.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to the top k detection results; use -1 to disable this limit so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return the top post_nms detection results; the rest are discarded. The default is based on the COCO dataset, which has at most 100 objects per image. You can adjust this number if you expect more objects. You can use -1 to return all detections.
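A minimal NumPy sketch of the documented semantics of nms_thresh, nms_topk and post_nms. It is a plain greedy NMS written for illustration, not GluonCV's implementation; the function names are hypothetical, and boxes are assumed to be in (xmin, ymin, xmax, ymax) corner format.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one (xmin, ymin, xmax, ymax) box and an (M, 4) array of boxes."""
    tl = np.maximum(box[:2], boxes[:, :2])
    br = np.minimum(box[2:4], boxes[:, 2:4])
    inter = np.prod(np.clip(br - tl, 0, None), axis=1)
    area = np.prod(box[2:4] - box[:2])
    areas = np.prod(boxes[:, 2:4] - boxes[:, :2], axis=1)
    return inter / (area + areas - inter)


def greedy_nms(boxes, scores, nms_thresh=0.45, nms_topk=400, post_nms=100):
    """Class-agnostic greedy NMS sketch; returns indices of kept boxes, best first."""
    order = np.argsort(-scores)  # highest score first
    if not 0 < nms_thresh < 1:
        return order  # threshold outside (0, 1): NMS disabled
    if nms_topk > 0:
        order = order[:nms_topk]  # only the top-k candidates enter NMS
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        # Drop remaining boxes that overlap the current best box too much.
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= nms_thresh]
    keep = np.array(keep, dtype=int)
    if post_nms > 0:
        keep = keep[:post_nms]  # -1 keeps every surviving detection
    return keep
```

This sketch ignores class ids; a class-aware variant would run the same loop separately for each class id.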

Returns

Return type

None

class gluoncv.model_zoo.SiamRPN(bz=1, is_train=False, ctx=cpu(0), **kwargs)[source]
hybrid_forward(F, template, search)[source]

Hybrid forward of the SiamRPN net, only used in training.

template(zinput)[source]

Template branch (z input).

track(xinput)[source]

Track branch (x input).

Parameters

xinput (np.ndarray) – predicted frame

Returns

predicted frame result

Return type

dict

class gluoncv.model_zoo.SimplePoseResNet(base_name='resnet50_v1b', pretrained_base=False, pretrained_ctx=cpu(0), num_joints=17, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), final_conv_kernel=1, deconv_with_bias=False, **kwargs)[source]
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.SlowFast(nclass, block=<class 'gluoncv.model_zoo.action_recognition.slowfast.Bottleneck'>, layers=None, num_block_temp_kernel_fast=None, num_block_temp_kernel_slow=None, pretrained=False, pretrained_base=False, feat_ext=False, num_segments=1, num_crop=1, bn_eval=True, bn_frozen=False, partial_bn=False, frozen_stages=-1, dropout_ratio=0.5, init_std=0.01, alpha=8, beta_inv=8, fusion_conv_channel_ratio=2, fusion_kernel_size=5, width_per_group=64, num_groups=1, slow_temporal_stride=16, fast_temporal_stride=2, slow_frames=4, fast_frames=32, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, ctx=None, **kwargs)[source]

SlowFast networks (SlowFast) from “SlowFast Networks for Video Recognition” paper.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • block (a HybridBlock.) – Building block of a ResNet, could be Basic or Bottleneck.

  • layers (a list or tuple, default is None.) – Number of blocks in each stage of a ResNet, e.g., [3, 4, 6, 3] in ResNet50.

  • num_block_temp_kernel_fast (int, default is None.) – If the current stage of the Fast pathway has more than num_block_temp_kernel_fast blocks, a temporal kernel of 1 is used for the rest of the blocks.

  • num_block_temp_kernel_slow (int, default is None.) – If the current stage of the Slow pathway has more than num_block_temp_kernel_slow blocks, a temporal kernel of 1 is used for the rest of the blocks.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is False.) – Load pretrained base network; the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • bn_eval (bool.) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var).

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • frozen_stages (int.) – Stages to be frozen (all parameters fixed). -1 means not freezing any parameters.

  • dropout_ratio (float, default is 0.5.) – The dropout rate of the dropout layer. A larger value regularizes more strongly against overfitting.

  • init_std (float, default is 0.01.) – Standard deviation used to initialize the dense layers.

  • alpha (int, default is 8.) – Corresponds to the frame rate reduction ratio between the Slow and Fast pathways.

  • beta_inv (int, default is 8.) – Corresponds to the inverse of the channel reduction ratio between the Slow and Fast pathways.

  • fusion_conv_channel_ratio (int, default is 2.) – Ratio of channel dimensions between the Slow and Fast pathways.

  • fusion_kernel_size (int, default is 5.) – Kernel dimension used for fusing information from Fast pathway to Slow pathway.

  • width_per_group (int, default is 64.) – Width of each group (64 -> ResNet; 4 -> ResNeXt).

  • num_groups (int, default is 1.) – Number of groups for the convolution. Num_groups=1 is for standard ResNet like networks, and num_groups>1 is for ResNeXt like networks.

  • slow_temporal_stride (int, default 16.) – The temporal stride for sparse sampling of video frames in slow branch of a SlowFast network.

  • fast_temporal_stride (int, default 2.) – The temporal stride for sparse sampling of video frames in fast branch of a SlowFast network.

  • slow_frames (int, default 4.) – The number of frames used as input to a slow branch.

  • fast_frames (int, default 32.) – The number of frames used as input to a fast branch.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.
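With the default arguments, the frame and stride settings are mutually consistent: fast_frames / alpha = 32 / 8 = 4 = slow_frames, and fast_temporal_stride * alpha = 2 * 8 = 16 = slow_temporal_stride, so both pathways cover 64 raw frames. The sketch below reproduces that arithmetic. The function and its slow_channels argument are hypothetical, and reading beta_inv as "Fast channels = Slow channels / beta_inv" follows the parameter description above rather than the GluonCV source.

```python
def slowfast_pathways(fast_frames=32, fast_temporal_stride=2, alpha=8,
                      beta_inv=8, slow_channels=64):
    """Derive Slow-pathway sampling and Fast-pathway width from the Fast settings.

    slow_channels is an assumed stem width for the Slow pathway (64 for a
    standard ResNet); it is not a SlowFast constructor argument.
    """
    if fast_frames % alpha or slow_channels % beta_inv:
        raise ValueError("alpha must divide fast_frames and beta_inv must divide slow_channels")
    slow_frames = fast_frames // alpha                   # Slow sees alpha times fewer frames
    slow_temporal_stride = fast_temporal_stride * alpha  # sampled alpha times more sparsely
    return {
        "slow_frames": slow_frames,
        "slow_temporal_stride": slow_temporal_stride,
        "fast_channels": slow_channels // beta_inv,      # Fast is beta_inv times thinner
        # Both pathways cover the same span of raw video frames.
        "span_fast": fast_frames * fast_temporal_stride,
        "span_slow": slow_frames * slow_temporal_stride,
    }


print(slowfast_pathways())
# {'slow_frames': 4, 'slow_temporal_stride': 16, 'fast_channels': 8, 'span_fast': 64, 'span_slow': 64}
```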

FastPath(F, x)[source]

Hybrid forward of the fast branch

SlowPath(F, x, lateral)[source]

Hybrid forward of the slow branch

hybrid_forward(F, x)[source]

Hybrid forward of SlowFast network

class gluoncv.model_zoo.SqueezeNet(version, classes=1000, **kwargs)[source]

SqueezeNet model from the “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size” paper. SqueezeNet 1.1 model from the official SqueezeNet repo. SqueezeNet 1.1 has 2.4x less computation and slightly fewer parameters than SqueezeNet 1.0, without sacrificing accuracy.

Parameters
  • version (str) – Version of squeezenet. Options are ‘1.0’, ‘1.1’.

  • classes (int, default 1000) – Number of classification classes.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.Track(mean, track_id, source, keep_alive_thresh=0.1, max_missing=30, attributes=None, class_id=0, linked_id=None)[source]

This class represents a track/tracklet used in the SMOT tracker. It has the following properties:

  • mean – 4-tuple (x0, y0, x1, y1) representing the current state (location) of the tracked object.

  • track_id – The numerical id of the track.

  • age – The number of timesteps since its first occurrence.

  • time_since_update – The number of timesteps since the last update of its location.

  • state – The state of the track; can be one of the values in TrackState.

  • confidence_score – Tracking confidence at the current timestep.

  • source – A tuple of (anchor_indices, anchor_weights).

  • attributes – np.ndarray of additional attributes of the object.

It also has these configs:

  • keep_alive_thresh – The minimal tracking/detection confidence to keep the track in the Active state.

  • max_missing – The maximal number of timesteps we keep searching for this track while it is missing, before we mark it as deleted.

is_active()[source]

Returns True if this track is confirmed.

is_deleted()[source]

Returns True if this track is dead and should be deleted.

is_mising()[source]

Returns True if this track is tentative (unconfirmed).

mark_missed()[source]

Mark this track as missed (no association at the current time step).

predict(motion_model=None)[source]
Parameters

motion_model – If not None, predict the motion of this track given its history.

update(bbx, source=None, attributes=None)[source]

Update the state of the track. We override the predicted track position. Updating the track keeps or flips its state to Active. If the confidence of the detection is below keep_alive_thresh, we mark this track as missed.

Parameters
  • bbx – New detection location of this object.

  • attributes – Some useful attributes of this object at this frame, e.g. landmarks.
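The keep-alive and max_missing rules described for Track can be sketched as a small state machine. Everything here (State, MiniTrack, and the exact point at which a missing track becomes deleted) is a hypothetical illustration of the documented behavior, not GluonCV's Track or TrackState.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """Hypothetical track states for this sketch (not GluonCV's TrackState)."""
    ACTIVE = "active"
    MISSING = "missing"
    DELETED = "deleted"


@dataclass
class MiniTrack:
    box: tuple                      # (x0, y0, x1, y1)
    keep_alive_thresh: float = 0.1
    max_missing: int = 30
    state: State = State.ACTIVE
    time_since_update: int = 0

    def mark_missed(self):
        """No association this step: count it, delete after max_missing misses."""
        self.time_since_update += 1
        if self.time_since_update > self.max_missing:
            self.state = State.DELETED
        else:
            self.state = State.MISSING

    def update(self, box, confidence):
        """Overwrite the location, or mark missed if confidence is too low."""
        if self.state is State.DELETED:
            return  # deleted tracks are never revived in this sketch
        if confidence < self.keep_alive_thresh:
            self.mark_missed()
            return
        self.box = box
        self.time_since_update = 0
        self.state = State.ACTIVE


track = MiniTrack((0, 0, 10, 10), max_missing=2)
track.update((1, 1, 11, 11), confidence=0.05)  # below keep_alive_thresh -> MISSING
print(track.state, track.time_since_update)
```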

class gluoncv.model_zoo.VGG(layers, filters, classes=1000, batch_norm=False, **kwargs)[source]

VGG model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • layers (list of int) – Numbers of layers in each feature block.

  • filters (list of int) – Numbers of filters in each feature block. List length should match the layers.

  • classes (int, default 1000) – Number of classification classes.

  • batch_norm (bool, default False) – Use batch normalization.
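To see how layers and filters describe a network, the sketch below counts weight layers for the four configurations from the VGG paper (11, 13, 16 and 19 weight layers), assuming the classifier head has the paper's three fully connected layers. The VGG_CONFIGS table and the helper are illustrative, not GluonCV's internal specification.

```python
# Per-block conv counts and filter widths for the four configurations in the
# VGG paper (11, 13, 16 and 19 weight layers).
VGG_CONFIGS = {
    11: ([1, 1, 2, 2, 2], [64, 128, 256, 512, 512]),
    13: ([2, 2, 2, 2, 2], [64, 128, 256, 512, 512]),
    16: ([2, 2, 3, 3, 3], [64, 128, 256, 512, 512]),
    19: ([2, 2, 4, 4, 4], [64, 128, 256, 512, 512]),
}


def vgg_weight_layers(layers, filters):
    """Count weight layers: every conv in the feature blocks plus 3 dense layers."""
    if len(layers) != len(filters):
        raise ValueError("filters must have one entry per feature block")
    return sum(layers) + 3


for depth, (layers, filters) in VGG_CONFIGS.items():
    print(depth, vgg_weight_layers(layers, filters))
```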

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.VGGAtrousExtractor(layers, filters, extras, batch_norm=False, **kwargs)[source]

VGG Atrous multi layer feature extractor which produces multiple output feature maps.

Parameters
  • layers (list of int) – Number of layers for the VGG base network.

  • filters (list of int) – Number of convolution filters for each layer.

  • extras (list of list) – Extra layers configurations.

  • batch_norm (bool) – If True, will use BatchNorm layers.

hybrid_forward(F, x, init_scale)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.Xception65(classes=1000, output_stride=32, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None)[source]

Modified Aligned Xception

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.Xception71(classes=1000, output_stride=32, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None)[source]

Modified Aligned Xception

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class gluoncv.model_zoo.YOLOV3(stages, channels, anchors, strides, classes, alloc_size=(128, 128), nms_thresh=0.45, nms_topk=400, post_nms=100, pos_iou_thresh=1.0, ignore_iou_thresh=0.7, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

YOLO V3 detection network. Reference: https://arxiv.org/pdf/1804.02767.pdf.

Parameters
  • stages – Staged feature extraction blocks. For example, the original paper uses 3 stages and 3 YOLO output layers.

  • channels (iterable) – Number of conv channels for each appended stage. len(channels) should match len(stages).

  • classes (iterable of str) – Names of the foreground categories; len(classes) is the number of foreground classes.

  • anchors (iterable) – The anchor setting. len(anchors) should match len(stages).

  • strides (iterable) – Strides of feature map. len(strides) should match len(stages).

  • alloc_size (tuple of int, default is (128, 128)) – For advanced users. Define alloc_size to generate large enough anchor maps, which will later be saved in parameters. During inference, we support arbitrary input images by cropping the corresponding area of the anchor map. This allows us to export to symbol so we can run it in C++, Scala, etc.

  • nms_thresh (float, default is 0.45.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to the top k detection results; use -1 to disable this limit so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return the top post_nms detection results; the rest are discarded. The default is based on the COCO dataset, which has at most 100 objects per image. You can adjust this number if you expect more objects. You can use -1 to return all detections.

  • pos_iou_thresh (float, default is 1.0) – IOU threshold for true anchors that match real objects. ‘pos_iou_thresh < 1’ is not implemented.

  • ignore_iou_thresh (float) – Anchors that have an IOU in the range (ignore_iou_thresh, pos_iou_thresh) are not penalized on the objectness score.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
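The idea behind alloc_size can be illustrated with a small NumPy sketch, under stated assumptions: a grid of cell offsets is precomputed once at a fixed size, and at inference time the top-left window matching the actual feature map is cropped out. The function names are hypothetical, and the sketch does not reproduce how GluonCV stores its anchor maps in parameters.

```python
import numpy as np


def make_offset_map(alloc_size):
    """Precompute (x, y) grid-cell offsets for an alloc_size = (rows, cols) map."""
    rows, cols = alloc_size
    ys, xs = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    return np.stack([xs, ys], axis=-1)  # shape (rows, cols, 2)


def crop_offsets(offset_map, feat_h, feat_w):
    """Take the top-left feat_h x feat_w window that matches the feature map."""
    if feat_h > offset_map.shape[0] or feat_w > offset_map.shape[1]:
        raise ValueError("feature map is larger than alloc_size; increase alloc_size")
    return offset_map[:feat_h, :feat_w]


offsets = make_offset_map((128, 128))
# A 416 x 608 input at stride 32 yields a 13 x 19 feature map.
print(crop_offsets(offsets, 416 // 32, 608 // 32).shape)  # (13, 19, 2)
```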

property classes

Return names of (non-background) categories.

Returns

Names of (non-background) categories.

Return type

iterable of str

hybrid_forward(F, x, *args)[source]

YOLOV3 network hybrid forward.

Parameters
  • F (mxnet.nd or mxnet.sym) – F is mxnet.sym if hybridized or mxnet.nd if not.

  • x (mxnet.nd.NDArray) – Input data.

  • *args – During training, extra inputs are required: (gt_boxes, obj_t, centers_t, scales_t, weights_t, clas_t). These are generated by YOLOV3PrefetchTargetGenerator in the dataloader transform function.

Returns

During inference, return detections in shape (B, N, 6) with format (cid, score, xmin, ymin, xmax, ymax). During training, return losses only: (obj_loss, center_loss, scale_loss, cls_loss).

Return type

(tuple of) mxnet.nd.NDArray
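The inference output format described above can be unpacked per image with plain NumPy. This is a sketch, not a GluonCV utility: split_detections is hypothetical, it assumes the detections have been converted to a NumPy array (e.g. with asnumpy()), and treating rows with a negative class id as padding is an assumption of the sketch.

```python
import numpy as np


def split_detections(dets, score_thresh=0.5):
    """Split a (B, N, 6) array of (cid, score, xmin, ymin, xmax, ymax) rows per image.

    Rows with a negative class id are treated as padding (an assumption of this
    sketch), and rows scoring below score_thresh are dropped.
    """
    results = []
    for per_image in dets:
        keep = (per_image[:, 0] >= 0) & (per_image[:, 1] >= score_thresh)
        kept = per_image[keep]
        results.append((kept[:, 0].astype(int), kept[:, 1], kept[:, 2:6]))
    return results


dets = np.array([[[14, 0.92, 10, 20, 110, 220],
                  [3, 0.30, 5, 5, 50, 50],
                  [-1, -1, -1, -1, -1, -1]]])
ids, scores, boxes = split_detections(dets)[0]
print(ids, boxes.shape)  # [14] (1, 4)
```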

property num_class

Number of (non-background) categories.

Returns

Number of (non-background) categories.

Return type

int

reset_class(classes, reuse_weights=None)[source]

Reset class categories and class predictors.

Parameters
  • classes (iterable of str) – The new categories. [‘apple’, ‘orange’] for example.

  • reuse_weights (dict) – A {new_integer: old_integer} mapping dict, a {new_name: old_name} mapping dict, or a list of [name0, name1,…] if class names don’t change. This allows the new predictor to reuse the previously trained weights specified.

Example

>>> net = gluoncv.model_zoo.get_model('yolo3_darknet53_voc', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
>>> # or use integer mapping, person has index 14 in VOC
>>> net.reset_class(classes=['person'], reuse_weights={0:14})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':14})
>>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])

set_nms(nms_thresh=0.45, nms_topk=400, post_nms=100)[source]

Set non-maximum suppression parameters.

Parameters
  • nms_thresh (float, default is 0.45.) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.

  • nms_topk (int, default is 400) – Apply NMS to the top k detection results; use -1 to disable this limit so that every detection result is used in NMS.

  • post_nms (int, default is 100) – Only return the top post_nms detection results; the rest are discarded. The default is based on the COCO dataset, which has at most 100 objects per image. You can adjust this number if you expect more objects. You can use -1 to return all detections.

Returns

Return type

None

gluoncv.model_zoo.abstractmethod(funcobj)[source]

A decorator indicating abstract methods.

Requires that the metaclass is ABCMeta or derived from it. A class that has a metaclass derived from ABCMeta cannot be instantiated unless all of its abstract methods are overridden. The abstract methods can be called using any of the normal ‘super’ call mechanisms.

Usage:

    class C(metaclass=ABCMeta):
        @abstractmethod
        def my_abstract_method(self, …):
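A runnable version of the usage pattern above, using the standard-library abc module. The Impl subclass and the x argument are illustrative additions.

```python
from abc import ABCMeta, abstractmethod


class C(metaclass=ABCMeta):
    @abstractmethod
    def my_abstract_method(self, x):
        """Subclasses must override this method."""


class Impl(C):
    def my_abstract_method(self, x):
        return x * 2


try:
    C()  # fails: the abstract method is not overridden
except TypeError as err:
    print("cannot instantiate C:", err)

print(Impl().my_abstract_method(21))  # 42
```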

gluoncv.model_zoo.alexnet(pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

AlexNet model from the “One weird trick…” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

gluoncv.model_zoo.alexnetlegacy(**kwargs)[source]

Alexnetlegacy

gluoncv.model_zoo.bbox_iou(bbox_a, bbox_b, offset=0)[source]

Calculate Intersection-Over-Union(IOU) of two bounding boxes.

Parameters
  • bbox_a (numpy.ndarray) – An ndarray with shape \((N, 4)\).

  • bbox_b (numpy.ndarray) – An ndarray with shape \((M, 4)\).

  • offset (float or int, default is 0) – The offset controls whether the width (or height) is computed as (right - left + offset). Note that the offset must be 0 for normalized bboxes, whose ranges are in [0, 1].

Returns

An ndarray with shape \((N, M)\) indicating the IOU between each pair of bounding boxes in bbox_a and bbox_b.

Return type

numpy.ndarray
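A minimal NumPy sketch of the pairwise IOU computation described above, assuming boxes are given as (xmin, ymin, xmax, ymax) corners. It is written for illustration and is not GluonCV's source; clipping negative widths and heights to zero is one common way to give non-overlapping pairs an IOU of 0.

```python
import numpy as np


def bbox_iou_sketch(bbox_a, bbox_b, offset=0):
    """Pairwise IOU between (N, 4) and (M, 4) corner-format boxes -> (N, M)."""
    if bbox_a.shape[1] < 4 or bbox_b.shape[1] < 4:
        raise IndexError("Bounding boxes axis 1 must have at least length 4")
    # Broadcast to (N, M, 2): intersection top-left and bottom-right corners.
    tl = np.maximum(bbox_a[:, None, :2], bbox_b[None, :, :2])
    br = np.minimum(bbox_a[:, None, 2:4], bbox_b[None, :, 2:4])
    # offset=1 suits pixel-index boxes; it must be 0 for normalized boxes.
    wh = np.clip(br - tl + offset, 0, None)
    area_i = wh[..., 0] * wh[..., 1]
    area_a = np.prod(bbox_a[:, 2:4] - bbox_a[:, :2] + offset, axis=1)
    area_b = np.prod(bbox_b[:, 2:4] - bbox_b[:, :2] + offset, axis=1)
    return area_i / (area_a[:, None] + area_b[None, :] - area_i)


a = np.array([[0, 0, 2, 2]], dtype=float)
b = np.array([[1, 1, 3, 3], [0, 0, 2, 2], [5, 5, 6, 6]], dtype=float)
print(bbox_iou_sketch(a, b))  # 1/7, 1.0 and 0.0 for the three pairs
```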

gluoncv.model_zoo.c3d_kinetics400(nclass=400, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, **kwargs)[source]

The Convolutional 3D network (C3D) trained on the Kinetics400 dataset, from “Learning Spatiotemporal Features with 3D Convolutional Networks”, ICCV 2015. https://arxiv.org/abs/1412.0767

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.center_net_dla34_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with dla34 base network on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_dla34_dcnv2_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with dla34 base network with deformable v2 conv layers on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_dla34_dcnv2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with dla34 base network with deformable v2 conv layers on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_dla34_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with dla34 base network on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_mobilenetv3_large_duc_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with mobilenetv3_large base network on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_mobilenetv3_large_duc_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with mobilenetv3_large base network on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_mobilenetv3_small_duc_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with mobilenetv3_small base network with DUC layers on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_mobilenetv3_small_duc_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with mobilenetv3_small base network with DUC layers on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet101_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet101_v1b base network on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet101_v1b_dcnv2_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet101_v1b base network with deformable v2 conv layers on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet101_v1b_dcnv2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet101_v1b base network with deformable v2 conv layers on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet101_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet101_v1b base network on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet18_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet18_v1b base network on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet18_v1b_dcnv2_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet18_v1b base network with deformable v2 conv layers on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet18_v1b_dcnv2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet18_v1b base network with deformable v2 conv layers on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet18_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet18_v1b base network on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet50_v1b base network on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet50_v1b_dcnv2_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet50_v1b base network with deformable v2 conv layers on coco dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet50_v1b_dcnv2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet50_v1b base network with deformable v2 conv layers on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.center_net_resnet50_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Center net with resnet50_v1b base network on voc dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

A CenterNet detection network.

Return type

HybridBlock

class gluoncv.model_zoo.cifar_ResidualAttentionModel(scale, m, classes=10, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper. Input size is 32 x 32.

Parameters
  • scale (tuple) – Network scale p, t, r.

  • m (tuple) – Network scale m. The network depth is defined as 36m + 20, and m is normally given as the tuple (m-1, m, m+1), except for m == 1, which uses (1, 1, 1).

  • classes (int, default 10) – Number of classification classes.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
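The depth formula above can be checked against the num_layers options listed for the cifar_residualattentionnet functions: m = 1 to 6 give 56, 92, 128, 164, 200 and 236, and m = 12 gives 452. The helpers below are hypothetical and only evaluate the documented formulas.

```python
def attention_depth(m):
    """Network depth from the scale m, using depth = 36m + 20."""
    return 36 * m + 20


def attention_m_tuple(m):
    """Per-stage m values: (m-1, m, m+1), except (1, 1, 1) when m == 1."""
    return (1, 1, 1) if m == 1 else (m - 1, m, m + 1)


for m in (1, 2, 3, 4, 5, 6, 12):
    print(m, attention_depth(m), attention_m_tuple(m))
```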

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.
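The depth rule given for the m parameter above can be checked with a few lines of plain Python. This is a minimal sketch that only encodes the documented formula (depth = 36m + 20, with the (m-1, m, m+1) tuple convention); the helper names are hypothetical, and it does not reproduce how gluoncv actually builds the attention stages.

```python
from typing import Tuple


def attention_depth(m: int) -> int:
    """Depth of a Residual Attention Network for scale m: 36m + 20."""
    return 36 * m + 20


def attention_m_tuple(depth: int) -> Tuple[int, int, int]:
    """Recover the per-stage m tuple from a depth, following the documented rule."""
    if depth < 56 or (depth - 20) % 36 != 0:
        raise ValueError(f"depth {depth} is not of the form 36m + 20 with m >= 1")
    m = (depth - 20) // 36
    # m == 1 is the documented special case; otherwise stages use (m-1, m, m+1).
    return (1, 1, 1) if m == 1 else (m - 1, m, m + 1)


for depth in (56, 92, 128, 164, 200, 236, 452):
    print(depth, attention_m_tuple(depth))
```

Every depth listed under num_layers (56, 92, 128, 164, 200, 236, 452) fits the 36m + 20 form, with m ranging from 1 to 12.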

gluoncv.model_zoo.cifar_residualattentionnet452(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
  • input_size (int) – Input size of net. Options are 32 and 224.

  • num_layers (int) – Number of layers. Options are 56, 92, 128, 164, 200, 236, 452.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.cifar_residualattentionnet56(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
  • input_size (int) – Input size of net. Options are 32 and 224.

  • num_layers (int) – Number of layers. Options are 56, 92, 128, 164, 200, 236, 452.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.cifar_residualattentionnet92(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
  • input_size (int) – Input size of net. Options are 32 and 224.

  • num_layers (int) – Number of layers. Options are 56, 92, 128, 164, 200, 236, 452.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.cifar_resnet110_v1(**kwargs)[source]

ResNet-110 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

Parameters
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.cifar_resnet110_v2(**kwargs)[source]

ResNet-110 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.cifar_resnet20_v1(**kwargs)[source]

ResNet-20 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

Parameters
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.cifar_resnet20_v2(**kwargs)[source]

ResNet-20 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.cifar_resnet56_v1(**kwargs)[source]

ResNet-56 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

Parameters
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.cifar_resnet56_v2(**kwargs)[source]

ResNet-56 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
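The depths in the CIFAR ResNet names follow the layout of the CIFAR-10 experiments in “Deep Residual Learning for Image Recognition”: three stages with 16, 32 and 64 filters, each holding n basic blocks of two 3x3 convolutions, for 6n + 2 weighted layers in total. The sketch below recovers n from a model's depth. The helper is hypothetical, and it assumes the V2 variants use the same basic-block layout as V1.

```python
def cifar_resnet_blocks_per_stage(depth: int) -> int:
    """Basic blocks per stage for a CIFAR ResNet of the given depth (depth = 6n + 2)."""
    if (depth - 2) % 6 != 0:
        raise ValueError(f"depth {depth} is not of the form 6n + 2")
    return (depth - 2) // 6


# Stem conv + 3 stages * n blocks * 2 convs, plus the final fully connected layer.
for depth in (20, 56, 110):
    n = cifar_resnet_blocks_per_stage(depth)
    print(f"ResNet-{depth}: n={n}, {1 + 6 * n} conv layers + 1 FC layer")
```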

gluoncv.model_zoo.cifar_wideresnet16_10(**kwargs)[source]

WideResNet-16-10 model for CIFAR10 from “Wide Residual Networks” paper.

Parameters
  • drop_rate (float) – The rate of dropout.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.cifar_wideresnet28_10(**kwargs)[source]

WideResNet-28-10 model for CIFAR10 from “Wide Residual Networks” paper.

Parameters
  • drop_rate (float) – The rate of dropout.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.cifar_wideresnet40_8(**kwargs)[source]

WideResNet-40-8 model for CIFAR10 from “Wide Residual Networks” paper.

Parameters
  • drop_rate (float) – The rate of dropout.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
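In the WRN-d-k naming used by “Wide Residual Networks”, d is the total depth and k the widening factor. With basic blocks of two 3x3 convolutions, d = 6n + 4, where n is the number of blocks in each of the three groups, and the group widths are 16k, 32k and 64k after a 16-channel stem. The hypothetical helper below derives these numbers from a model name; it describes the paper's configuration, not gluoncv's code.

```python
from typing import Dict, List, Union


def wrn_config(depth: int, k: int) -> Dict[str, Union[int, List[int]]]:
    """Blocks per group and channel widths for WRN-depth-k (depth = 6n + 4)."""
    if (depth - 4) % 6 != 0:
        raise ValueError(f"depth {depth} is not of the form 6n + 4")
    n = (depth - 4) // 6
    # 16-channel stem conv, then three groups widened by the factor k.
    widths = [16, 16 * k, 32 * k, 64 * k]
    return {"blocks_per_group": n, "widths": widths}


for name, (d, k) in {"wideresnet16_10": (16, 10),
                     "wideresnet28_10": (28, 10),
                     "wideresnet40_8": (40, 8)}.items():
    print(name, wrn_config(d, k))
```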

gluoncv.model_zoo.cpu(device_id=0)[source]

Returns a CPU context.

This function is a shortcut for Context('cpu', device_id). For most operations, when no context is specified, the default context is cpu().

Examples

>>> import mxnet as mx
>>> with mx.cpu():
...     cpu_array = mx.nd.ones((2, 3))
>>> cpu_array.context
cpu(0)
>>> cpu_array = mx.nd.ones((2, 3), ctx=mx.cpu())
>>> cpu_array.context
cpu(0)

Parameters

device_id (int, optional) – The device id of the device. device_id is not needed for CPU; it is included to keep the interface compatible with GPU.

Returns

context – The corresponding CPU context.

Return type

Context

gluoncv.model_zoo.custom_faster_rcnn_fpn(classes, transfer=None, dataset='custom', pretrained_base=True, base_network_name='resnet18_v1b', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, sym_norm_layer=None, sym_norm_kwargs=None, num_fpn_filters=256, num_box_head_conv=4, num_box_head_conv_filters=256, num_box_head_dense_filters=1024, **kwargs)[source]

Faster RCNN model with resnet base network and FPN on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer (str or None) – Dataset to transfer from. If not None, will try to reuse pre-trained weights from Faster RCNN networks trained on the dataset specified by this parameter.

  • dataset (str, default 'custom') – Dataset name attached to the network name

  • pretrained_base (bool or str) – Boolean value controls whether to load the default pretrained weights for the base network. String value represents the hashtag for a certain version of pretrained weights.

  • base_network_name (str, default 'resnet18_v1b') – Base network for Faster RCNN. Currently supported: ‘resnet18_v1b’, ‘resnet50_v1b’, and ‘resnet101_v1d’.

  • norm_layer (nn.HybridBlock, default nn.BatchNorm) – Gluon normalization layer to use. Default is frozen batch normalization layer.

  • norm_kwargs (dict) – Keyword arguments for gluon normalization layer

  • sym_norm_layer (nn.SymbolBlock, default None) – Symbol normalization layer to use in FPN. This is due to FPN being implemented using SymbolBlock. Default is None, meaning no normalization layer will be used in FPN.

  • sym_norm_kwargs (dict) – Keyword arguments for symbol normalization layer used in FPN.

  • num_fpn_filters (int, default 256) – Number of filters for FPN output layers.

  • num_box_head_conv (int, default 4) – Number of convolution layers to use in box head if batch normalization is not frozen.

  • num_box_head_conv_filters (int, default 256) – Number of filters for convolution layers in box head. Only applicable if batch normalization is not frozen.

  • num_box_head_dense_filters (int, default 1024) – Number of hidden units for the last fully connected layer in box head.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Returns

Hybrid faster RCNN network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.custom_mask_rcnn_fpn(classes, transfer=None, dataset='custom', pretrained_base=True, base_network_name='resnet18_v1b', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, sym_norm_layer=None, sym_norm_kwargs=None, num_fpn_filters=256, num_box_head_conv=4, num_box_head_conv_filters=256, num_box_head_dense_filters=1024, **kwargs)[source]

Mask RCNN model with resnet base network and FPN on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer (str or None) – Dataset to transfer from. If not None, will try to reuse pre-trained weights from networks trained on the dataset specified by this parameter.

  • dataset (str, default 'custom') – Dataset name attached to the network name

  • pretrained_base (bool or str) – Boolean value controls whether to load the default pretrained weights for the base network. String value represents the hashtag for a certain version of pretrained weights.

  • base_network_name (str, default 'resnet18_v1b') – Base network for Mask RCNN. Currently supported: ‘resnet18_v1b’, ‘resnet50_v1b’, and ‘resnet101_v1d’.

  • norm_layer (nn.HybridBlock, default nn.BatchNorm) – Gluon normalization layer to use. Default is frozen batch normalization layer.

  • norm_kwargs (dict) – Keyword arguments for gluon normalization layer

  • sym_norm_layer (nn.SymbolBlock, default None) – Symbol normalization layer to use in FPN. This is due to FPN being implemented using SymbolBlock. Default is None, meaning no normalization layer will be used in FPN.

  • sym_norm_kwargs (dict) – Keyword arguments for symbol normalization layer used in FPN.

  • num_fpn_filters (int, default 256) – Number of filters for FPN output layers.

  • num_box_head_conv (int, default 4) – Number of convolution layers to use in box head if batch normalization is not frozen.

  • num_box_head_conv_filters (int, default 256) – Number of filters for convolution layers in box head. Only applicable if batch normalization is not frozen.

  • num_box_head_dense_filters (int, default 1024) – Number of hidden units for the last fully connected layer in box head.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Returns

Hybrid Mask RCNN network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.custom_ssd(base_network_name, base_size, filters, sizes, ratios, steps, classes, dataset, pretrained_base, **kwargs)[source]

Custom SSD models.

gluoncv.model_zoo.custom_yolov3(base_network_name, filters, anchors, strides, classes, dataset, pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)[source]

Custom YOLO models.

gluoncv.model_zoo.darknet53(**kwargs)[source]

Darknet v3 53 layer network. Reference: https://arxiv.org/pdf/1804.02767.pdf.

Parameters
  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

Darknet network.

Return type

mxnet.gluon.HybridBlock
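The “53” in Darknet-53 counts weighted layers. Per the YOLOv3 tech report, the network is a 3x3 stem convolution followed by five stages, each a stride-2 3x3 downsampling convolution plus 1, 2, 8, 8 and 4 residual blocks (a 1x1 and a 3x3 convolution each), topped by a fully connected classifier for ImageNet pretraining. The sketch below reproduces that count and the resulting overall stride; the helper names are hypothetical and it does not inspect gluoncv's implementation.

```python
# Residual blocks per stage in Darknet-53, as listed in the YOLOv3 tech report.
STAGE_BLOCKS = [1, 2, 8, 8, 4]


def darknet53_layer_count(stage_blocks=STAGE_BLOCKS) -> int:
    """Count weighted layers: all convolutions plus the final fully connected layer."""
    stem = 1  # initial 3x3 convolution
    # Each stage: one stride-2 3x3 conv, then n residual blocks of (1x1 conv, 3x3 conv).
    convs = stem + sum(1 + 2 * n for n in stage_blocks)
    return convs + 1  # plus the classifier layer


def darknet53_total_stride(stage_blocks=STAGE_BLOCKS) -> int:
    """Each stage halves the resolution once, so the overall stride is 2 ** num_stages."""
    return 2 ** len(stage_blocks)
```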

gluoncv.model_zoo.densenet121(**kwargs)[source]

Densenet-BC 121-layer model from the “Densely Connected Convolutional Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.densenet161(**kwargs)[source]

Densenet-BC 161-layer model from the “Densely Connected Convolutional Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.densenet169(**kwargs)[source]

Densenet-BC 169-layer model from the “Densely Connected Convolutional Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.densenet201(**kwargs)[source]

Densenet-BC 201-layer model from the “Densely Connected Convolutional Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
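The layer counts in the DenseNet-BC names can be reproduced from each variant's block configuration: every dense layer is a 1x1 plus a 3x3 convolution, and the stem convolution, the three transition convolutions and the classifier add five more layers. The block configurations below are the standard ones from the DenseNet paper (161 uses growth rate 48); that gluoncv uses exactly these values is an assumption, and the helpers are hypothetical.

```python
# (num_init_features, growth_rate, layers per dense block) for standard DenseNet-BC variants.
DENSENET_SPEC = {
    121: (64, 32, [6, 12, 24, 16]),
    161: (96, 48, [6, 12, 36, 24]),
    169: (64, 32, [6, 12, 32, 32]),
    201: (64, 32, [6, 12, 48, 32]),
}


def densenet_depth(block_config) -> int:
    """2 convs per dense layer + stem conv + one conv per transition + classifier."""
    return 2 * sum(block_config) + 1 + (len(block_config) - 1) + 1


def block_output_channels(num_init: int, growth: int, block_config) -> list:
    """Channels after each dense block, with transitions halving channels (compression 0.5)."""
    channels, out = num_init, []
    for i, num_layers in enumerate(block_config):
        channels += num_layers * growth  # each dense layer appends `growth` feature maps
        out.append(channels)
        if i != len(block_config) - 1:
            channels //= 2  # transition layer compression
    return out


for depth, (init, growth, blocks) in DENSENET_SPEC.items():
    print(depth, densenet_depth(blocks), block_output_channels(init, growth, blocks)[-1])
```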

gluoncv.model_zoo.doublehead_rcnn_resnet50_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Double Head Faster RCNN model from the paper “Rethinking Classification and Localization for Object Detection” (2019).

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> from gluoncv import model_zoo
>>> model = model_zoo.doublehead_rcnn_resnet50_v1b_voc(pretrained=True)
>>> print(model)

gluoncv.model_zoo.faster_rcnn_fpn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model with FPN from the papers “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks” and “Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S. (2016). Feature Pyramid Networks for Object Detection”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> from gluoncv import model_zoo
>>> model = model_zoo.faster_rcnn_fpn_resnet101_v1d_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.faster_rcnn_fpn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model with FPN from the papers “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks” and “Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S. (2016). Feature Pyramid Networks for Object Detection”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> from gluoncv import model_zoo
>>> model = model_zoo.faster_rcnn_fpn_resnet50_v1b_coco(pretrained=True)
>>> print(model)

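In the FPN models, each region proposal is pooled from one pyramid level chosen by its size. The FPN paper assigns an RoI of width w and height h to level k = floor(k0 + log2(sqrt(w*h) / 224)) with k0 = 4. The sketch below implements that formula; clamping to levels P2 through P5 and the helper name are assumptions for illustration, not a description of gluoncv's internals.

```python
import math


def fpn_roi_level(w: float, h: float, k0: int = 4, canonical: float = 224.0,
                  k_min: int = 2, k_max: int = 5) -> int:
    """Pyramid level P_k for a w x h RoI (FPN paper, Eq. 1), clamped to [k_min, k_max]."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))


# A 224x224 RoI maps to P4; halving the side length moves it one level down.
print(fpn_roi_level(224, 224), fpn_roi_level(112, 112), fpn_roi_level(56, 56))
```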
gluoncv.model_zoo.faster_rcnn_fpn_syncbn_resnest101_coco(pretrained=False, pretrained_base=True, num_devices=0, **kwargs)[source]

Faster R-CNN with a ResNeSt backbone, from the paper “ResNeSt: Split-Attention Networks”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices for the sync batch norm layer. If less than 1, all available devices are used.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> from gluoncv import model_zoo
>>> model = model_zoo.faster_rcnn_fpn_syncbn_resnest101_coco(pretrained=True)
>>> print(model)

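The syncbn variants replace per-device batch normalization with synchronized batch normalization, which computes each channel's mean and variance over the samples on all participating devices instead of on each device separately. The pure-Python sketch below illustrates that aggregation (each device contributes a local sum and sum of squares) together with the documented num_devices rule. It is not MXNet's SyncBatchNorm kernel, and the helper names are hypothetical.

```python
from typing import List, Tuple


def resolve_num_devices(num_devices: int, available: int) -> int:
    """Documented rule: a value below 1 means 'use all available devices'."""
    return available if num_devices < 1 else num_devices


def synced_batch_stats(per_device: List[List[float]]) -> Tuple[float, float]:
    """Mean and biased variance of one channel over the union of all devices' activations."""
    count = sum(len(xs) for xs in per_device)
    total = sum(sum(xs) for xs in per_device)                    # local sums, reduced across devices
    total_sq = sum(sum(x * x for x in xs) for xs in per_device)  # local sums of squares, reduced
    mean = total / count
    return mean, total_sq / count - mean * mean
```

With two devices holding [1.0, 2.0] and [3.0, 4.0], the synchronized statistics equal those of the combined batch [1.0, 2.0, 3.0, 4.0], whereas unsynchronized batch norm would normalize each device with its own mean (1.5 and 3.5).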
gluoncv.model_zoo.faster_rcnn_fpn_syncbn_resnest269_coco(pretrained=False, pretrained_base=True, num_devices=0, **kwargs)[source]

Faster R-CNN with a ResNeSt backbone, from the paper “ResNeSt: Split-Attention Networks”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices for the sync batch norm layer. If less than 1, all available devices are used.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> from gluoncv import model_zoo
>>> model = model_zoo.faster_rcnn_fpn_syncbn_resnest269_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.faster_rcnn_fpn_syncbn_resnest50_coco(pretrained=False, pretrained_base=True, num_devices=0, **kwargs)[source]

Faster R-CNN with a ResNeSt backbone, from the paper “ResNeSt: Split-Attention Networks”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices for the sync batch norm layer. If less than 1, all available devices are used.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> from gluoncv import model_zoo
>>> model = model_zoo.faster_rcnn_fpn_syncbn_resnest50_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.faster_rcnn_fpn_syncbn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, num_devices=0, **kwargs)[source]

Faster RCNN model with FPN from the papers “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks” and “Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S. (2016). Feature Pyramid Networks for Object Detection”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices for the sync batch norm layer. If less than 1, all available devices are used.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> from gluoncv import model_zoo
>>> model = model_zoo.faster_rcnn_fpn_syncbn_resnet101_v1d_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.faster_rcnn_fpn_syncbn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, num_devices=0, **kwargs)[source]

Faster RCNN model with FPN from the papers “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks” and “Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S. (2016). Feature Pyramid Networks for Object Detection”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices for the sync batch norm layer. If less than 1, all available devices are used.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> from gluoncv import model_zoo
>>> model = model_zoo.faster_rcnn_fpn_syncbn_resnet50_v1b_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.faster_rcnn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks”

Parameters
  • pretrained (bool, optional, default is False) – Load pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> from gluoncv import model_zoo
>>> model = model_zoo.faster_rcnn_resnet101_v1d_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.faster_rcnn_resnet101_v1d_custom(classes, transfer=None, pretrained_base=True, pretrained=False, **kwargs)[source]

Faster RCNN model with resnet101_v1d base network on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from faster RCNN networks trained on other datasets.

  • pretrained_base (bool or str) – Boolean value controls whether to load the default pretrained weights for the base network. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Returns

Hybrid faster RCNN network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.faster_rcnn_resnet101_v1d_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks”

Parameters
  • pretrained (bool, optional, default is False) – Load pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> from gluoncv import model_zoo
>>> model = model_zoo.faster_rcnn_resnet101_v1d_voc(pretrained=True)
>>> print(model)

gluoncv.model_zoo.faster_rcnn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks”

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> from gluoncv import model_zoo
>>> model = model_zoo.faster_rcnn_resnet50_v1b_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.faster_rcnn_resnet50_v1b_custom(classes, transfer=None, pretrained_base=True, pretrained=False, **kwargs)[source]

Faster RCNN model with resnet50_v1b base network on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from faster RCNN networks trained on other datasets.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str) – Boolean value controls whether to load the default pretrained weights for the base network. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Returns

Hybrid faster RCNN network.

Return type

mxnet.gluon.HybridBlock
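Several parameters above accept either a bool or a str: True loads the default pretrained weights, False loads none, and a string selects a specific version by its hash tag. The sketch below shows one way such a value could be resolved to a parameter file name. The DEFAULT_TAGS registry, the tag value "abcd1234" and the "{name}-{tag}.params" naming scheme are hypothetical stand-ins, not gluoncv's model store.

```python
from typing import Dict, Optional, Union

# Hypothetical registry of default weight tags (placeholder value, not a real hash).
DEFAULT_TAGS: Dict[str, str] = {"faster_rcnn_resnet50_v1b_voc": "abcd1234"}


def resolve_weights_file(name: str, pretrained: Union[bool, str]) -> Optional[str]:
    """Return the (hypothetical) parameter file to load, or None if no weights are wanted."""
    if isinstance(pretrained, str):
        tag = pretrained            # a string pins a specific version by its hash tag
    elif pretrained:
        tag = DEFAULT_TAGS[name]    # True selects the default version
    else:
        return None                 # False: leave the weights randomly initialized
    return f"{name}-{tag}.params"
```

Checking for str before truthiness matters here: a non-empty string is truthy, so testing `if pretrained:` first would silently replace a pinned tag with the default one.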

gluoncv.model_zoo.faster_rcnn_resnet50_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

Faster RCNN model from the paper “Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load the pretrained base network; the extra layers are randomly initialized. This has no effect if pretrained is True.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = faster_rcnn_resnet50_v1b_voc(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_Siam_RPN(base_name, bz=1, is_train=False, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

Get a SiamRPN network, loading pretrained weights if pretrained is set.

Parameters
  • base_name (str) – Backbone model name

  • bz (int) – Batch size for training; use bz=1 for testing.

  • is_train (bool) – True for training, False for testing.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Path for storing the model weights.

Returns

A SiamRPN Tracking network.

Return type

HybridBlock

gluoncv.model_zoo.get_base_network(name, **kwargs)

Get a CenterNet base network by name.

gluoncv.model_zoo.get_center_net(name, dataset, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

Get a CenterNet instance.

Parameters
  • name (str or None) – Model name. If None, you must specify features as a HybridBlock.

  • dataset (str) – Name of the dataset. It is used to identify the model name, because models trained on different datasets differ significantly.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Path for storing the model weights.

Returns

A CenterNet detection network.

Return type

HybridBlock

gluoncv.model_zoo.get_cifar_resnet(version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • version (int) – Version of ResNet. Options are 1, 2.

  • num_layers (int) – Number of layers. Must be an integer of the form 6*n+2, e.g. 20, 56, 110, 164.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
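The 6*n+2 constraint on num_layers comes from stacking three stages of n residual blocks. A minimal sketch of that arithmetic, assuming basic blocks with two 3x3 convolutions each as in the CIFAR-10 setup of the ResNet paper; `cifar_resnet_blocks_per_stage` is a hypothetical helper, not a GluonCV function.

```python
def cifar_resnet_blocks_per_stage(num_layers: int) -> int:
    """Hypothetical helper: return n for a CIFAR ResNet of depth 6*n + 2.

    Assumes 3 stages of n basic blocks (2 conv layers each = 6*n layers),
    plus the first convolution and the final classifier (+2 layers).
    """
    if num_layers < 8 or (num_layers - 2) % 6 != 0:
        raise ValueError(f"num_layers must be of the form 6*n+2, got {num_layers}")
    return (num_layers - 2) // 6


if __name__ == "__main__":
    for depth in (20, 56, 110, 164):
        print(depth, cifar_resnet_blocks_per_stage(depth))
```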

gluoncv.model_zoo.get_cifar_wide_resnet(num_layers, width_factor=1, drop_rate=0.0, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

WideResNet model for CIFAR10 from the “Wide Residual Networks” paper.

Parameters
  • num_layers (int) – Number of layers. Must be an integer of the form 6*n+4, e.g. 16, 28, 40.

  • width_factor (int) – The width factor to apply to the number of channels from the original resnet.

  • drop_rate (float) – The rate of dropout.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
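A WRN-d-k network has depth d and width factor k. The sketch below shows how num_layers and width_factor translate into blocks per stage and channel widths. It assumes the Wide Residual Networks convention of depth 6*n+4 and stage widths 16k, 32k, 64k after a 16-channel stem; `cifar_wide_resnet_config` is hypothetical and only mirrors that arithmetic.

```python
def cifar_wide_resnet_config(num_layers: int, width_factor: int = 1) -> dict:
    """Hypothetical helper: stage depth and channel widths for WRN-num_layers-width_factor.

    Assumes depth = 6*n + 4 with n blocks per stage, a 16-channel stem, and
    stage widths [16, 32, 64] multiplied by width_factor.
    """
    if num_layers < 10 or (num_layers - 4) % 6 != 0:
        raise ValueError(f"num_layers must be of the form 6*n+4, got {num_layers}")
    n = (num_layers - 4) // 6
    channels = [16] + [c * width_factor for c in (16, 32, 64)]
    return {"blocks_per_stage": n, "channels": channels}


if __name__ == "__main__":
    # Configurations matching cifar_wideresnet16_10, _28_10 and _40_8.
    for depth, k in ((16, 10), (28, 10), (40, 8)):
        print(depth, k, cifar_wide_resnet_config(depth, k))
```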

gluoncv.model_zoo.get_darknet(darknet_version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

Get darknet by version and num_layers info.

Parameters
  • darknet_version (str) – Darknet version, choices are [‘v3’].

  • num_layers (int) – Number of layers.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

Darknet network.

Return type

mxnet.gluon.HybridBlock

Examples

>>> model = get_darknet('v3', 53, pretrained=True)
>>> print(model)
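The 53 in get_darknet('v3', 53) counts weighted layers. As a sanity check, here is a sketch of that count, assuming the Darknet-53 layout from the YOLOv3 paper: a 3x3 stem, five stages each opened by a strided 3x3 convolution and followed by [1, 2, 8, 8, 4] residual blocks of two convolutions, and a final fully connected layer. The constant and helper are illustrative, not GluonCV internals.

```python
# Residual blocks per stage for Darknet-53 (assumed from the YOLOv3 paper).
DARKNET53_STAGE_REPEATS = [1, 2, 8, 8, 4]


def darknet_layer_count(stage_repeats) -> int:
    """Count weighted layers: stem conv, per-stage convs, and the classifier."""
    layers = 1  # 3x3 stem convolution
    for repeats in stage_repeats:
        # One strided 3x3 downsampling conv, then (1x1, 3x3) per residual block.
        layers += 1 + 2 * repeats
    return layers + 1  # final fully connected classifier


if __name__ == "__main__":
    print(darknet_layer_count(DARKNET53_STAGE_REPEATS))
```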

gluoncv.model_zoo.get_deeplab(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

DeepLabV3

Parameters
  • dataset (str, default pascal_voc) – The dataset the model was pretrained on (pascal_voc, pascal_aug, ade20k, coco, citys).

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab(dataset='pascal_voc', backbone='resnet50', pretrained=False)
>>> print(model)
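The per-dataset shortcuts listed below (get_deeplab_resnet101_voc, get_psp_resnet101_citys, and so on) appear to encode the architecture, backbone and dataset in their names. A small sketch that decodes such a name, assuming the suffixes voc, ade, coco and citys map to the dataset keys pascal_voc, ade20k, coco and citys; `split_shortcut` and `SUFFIX_TO_DATASET` are hypothetical and inferred from the names on this page.

```python
from typing import Tuple

# Hypothetical mapping from shortcut-name suffixes to `dataset` keys.
SUFFIX_TO_DATASET = {
    "voc": "pascal_voc",
    "ade": "ade20k",
    "coco": "coco",
    "citys": "citys",
}


def split_shortcut(func_name: str) -> Tuple[str, str, str]:
    """Split 'get_<arch>_<backbone>_<suffix>' into (arch, backbone, dataset)."""
    prefix = "get_"
    if not func_name.startswith(prefix):
        raise ValueError(f"expected a name starting with {prefix!r}: {func_name!r}")
    parts = func_name[len(prefix):].split("_")
    if len(parts) < 3:
        raise ValueError(f"expected <arch>_<backbone>_<dataset>: {func_name!r}")
    # The last two tokens are the backbone and the dataset suffix; everything
    # before them is the architecture, which may itself contain underscores.
    arch = "_".join(parts[:-2])
    backbone, suffix = parts[-2], parts[-1]
    return arch, backbone, SUFFIX_TO_DATASET[suffix]


if __name__ == "__main__":
    print(split_shortcut("get_deeplab_resnet101_voc"))
```

Under this reading, get_deeplab_resnet101_voc(pretrained=True) would correspond to get_deeplab(dataset='pascal_voc', backbone='resnet101', pretrained=True); check the GluonCV source before relying on that equivalence.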

gluoncv.model_zoo.get_deeplab_plus(dataset='pascal_voc', backbone='xception', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

DeepLabV3Plus

Parameters
  • dataset (str, default pascal_voc) – The dataset the model was pretrained on (pascal_voc, ade20k).

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_plus(dataset='pascal_voc', backbone='xception', pretrained=False)
>>> print(model)

gluoncv.model_zoo.get_deeplab_plus_xception_coco(**kwargs)

DeepLabV3Plus model with an Xception backbone for the COCO dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_plus_xception_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplab_resnest101_ade(**kwargs)

DeepLabV3 model with a ResNeSt-101 backbone for the ADE20K dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnest101_ade(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplab_resnest200_ade(**kwargs)

DeepLabV3 model with a ResNeSt-200 backbone for the ADE20K dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnest200_ade(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplab_resnest269_ade(**kwargs)

DeepLabV3 model with a ResNeSt-269 backbone for the ADE20K dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnest269_ade(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplab_resnest50_ade(**kwargs)

DeepLabV3 model with a ResNeSt-50 backbone for the ADE20K dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnest50_ade(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplab_resnet101_ade(**kwargs)

DeepLabV3 model with a ResNet-101 backbone for the ADE20K dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet101_ade(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplab_resnet101_citys(**kwargs)

DeepLabV3 model with a ResNet-101 backbone for the Cityscapes dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet101_citys(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplab_resnet101_coco(**kwargs)

DeepLabV3 model with a ResNet-101 backbone for the COCO dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet101_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplab_resnet101_voc(**kwargs)

DeepLabV3 model with a ResNet-101 backbone for the Pascal VOC dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet101_voc(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplab_resnet152_coco(**kwargs)

DeepLabV3 model with a ResNet-152 backbone for the COCO dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet152_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplab_resnet152_voc(**kwargs)

DeepLabV3 model with a ResNet-152 backbone for the Pascal VOC dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet152_voc(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplab_resnet50_ade(**kwargs)

DeepLabV3 model with a ResNet-50 backbone for the ADE20K dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet50_ade(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplab_resnet50_citys(**kwargs)

DeepLabV3 model with a ResNet-50 backbone for the Cityscapes dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_resnet50_citys(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplab_v3b_plus_wideresnet_citys(**kwargs)

DeepLabWV3Plus model with a WideResNet backbone for the Cityscapes dataset.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplab_v3b_plus_wideresnet_citys(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_deeplabv3b_plus(dataset='citys', backbone='wideresnet', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

DeepLabWV3Plus

Parameters
  • dataset (str, default citys) – The dataset the model was pretrained on (pascal_voc, ade20k, citys).

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_deeplabv3b_plus(dataset='citys', backbone='wideresnet', pretrained=False)
>>> print(model)

gluoncv.model_zoo.get_doublehead_rcnn(name, dataset, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

Utility function to return DoubleHeadRCNN networks.

Parameters
  • name (str) – Model name.

  • dataset (str) – The name of dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Path for storing the model weights.

Returns

The DoubleHeadRCNN network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.get_faster_rcnn(name, dataset, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

Utility function to return faster rcnn networks.

Parameters
  • name (str) – Model name.

  • dataset (str) – The name of dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Path for storing the model weights.

Returns

The Faster-RCNN network.

Return type

mxnet.gluon.HybridBlock
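The name and dataset arguments appear to combine with the detector family into the model-zoo names used on this page, such as faster_rcnn_resnet50_v1b_coco. A minimal sketch of that composition, assuming a plain underscore join; `rcnn_full_name` is a hypothetical helper and GluonCV's internal lookup may differ.

```python
def rcnn_full_name(family: str, name: str, dataset: str) -> str:
    """Hypothetical helper: join detector family, base-network name and dataset.

    Mirrors names on this page, e.g.
    ('faster_rcnn', 'resnet50_v1b', 'coco') -> 'faster_rcnn_resnet50_v1b_coco'.
    """
    if not all((family, name, dataset)):
        raise ValueError("family, name and dataset must all be non-empty")
    return "_".join((family, name, dataset))


if __name__ == "__main__":
    print(rcnn_full_name("faster_rcnn", "resnet50_v1b", "voc"))
```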

gluoncv.model_zoo.get_fastscnn(dataset='citys', ctx=cpu(0), pretrained=False, root='~/.mxnet/models', **kwargs)

Fast-SCNN: Fast Semantic Segmentation Network

Parameters
  • dataset (str, default citys) – The dataset the model was pretrained on.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fastscnn(dataset='citys')
>>> print(model)

gluoncv.model_zoo.get_fastscnn_citys(**kwargs)

Fast-SCNN: Fast Semantic Segmentation Network for the Cityscapes dataset.

Parameters
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

Examples

>>> model = get_fastscnn_citys()
>>> print(model)

gluoncv.model_zoo.get_fcn(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), pretrained_base=True, **kwargs)

FCN model from the paper “Fully Convolutional Networks for Semantic Segmentation”.

Parameters
  • dataset (str, default pascal_voc) – The dataset the model was pretrained on (pascal_voc, ade20k).

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • pretrained_base (bool or str, default True) – Whether to load a backbone network pretrained on ImageNet.

Examples

>>> model = get_fcn(dataset='pascal_voc', backbone='resnet50', pretrained=False)
>>> print(model)

gluoncv.model_zoo.get_fcn_resnet101_ade(**kwargs)

FCN model with base network ResNet-101 pre-trained on ADE20K dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet101_ade(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_fcn_resnet101_coco(**kwargs)

FCN model with base network ResNet-101 pre-trained on COCO dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet101_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_fcn_resnet101_voc(**kwargs)

FCN model with base network ResNet-101 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet101_voc(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_fcn_resnet50_ade(**kwargs)

FCN model with base network ResNet-50 pre-trained on ADE20K dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet50_ade(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_fcn_resnet50_voc(**kwargs)

FCN model with base network ResNet-50 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_resnet50_voc(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_hrnet(model_name, stage_interp_type='nearest', purpose='cls', pretrained=False, ctx=cpu(0), root='~/.mxnet/models', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, num_classes=1000, **kwargs)

HRNet model from the “Deep High-Resolution Representation Learning for Visual Recognition” paper.

Parameters
  • model_name (string) – The name of hrnet models: w18_small_v1/w18_small_v2/w30/w32/w40/w42/w48.

  • stage_interp_type (string) – The interpolation type for upsample in each stage, nearest, bilinear and bilinear_like are supported.

  • purpose (string) – The purpose of model, cls and seg are supported.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
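In the HRNet paper, the number in a name such as W32 is the width C of the highest-resolution branch, and each lower-resolution branch doubles it (C, 2C, 4C, 8C). A sketch that reads that width from a model_name, assuming the same convention applies to the names accepted here (including the w18_small variants); `hrnet_branch_widths` is hypothetical.

```python
import re
from typing import List


def hrnet_branch_widths(model_name: str, num_branches: int = 4) -> List[int]:
    """Hypothetical helper: branch widths C, 2C, 4C, 8C for an HRNet 'wC' name."""
    match = re.fullmatch(r"w(\d+)(?:_small_v[12])?", model_name)
    if match is None:
        raise ValueError(f"unrecognized HRNet model name: {model_name!r}")
    width = int(match.group(1))
    # Each lower-resolution branch doubles the channel width.
    return [width * 2 ** i for i in range(num_branches)]


if __name__ == "__main__":
    for name in ("w18_small_v1", "w32", "w48"):
        print(name, hrnet_branch_widths(name))
```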

gluoncv.model_zoo.get_mask_rcnn(name, dataset, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

Utility function to return mask rcnn networks.

Parameters
  • name (str) – Model name.

  • dataset (str) – The name of dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Path for storing the model weights.

Returns

The Mask RCNN network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.get_mobilenet(multiplier, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper.

Parameters
  • multiplier (float) – The width multiplier for controlling the model size. Only multipliers that are no less than 0.25 are supported. The actual number of channels is equal to the original channel size multiplied by this multiplier.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
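The multiplier scales every layer's channel count. A minimal sketch of that scaling, assuming the MobileNet v1 channel plan from the paper's architecture table (a 32-channel stem plus 13 depthwise-separable blocks) and truncation with int(); `scale_channels` and `BASE_CHANNELS` are illustrative, not GluonCV internals.

```python
from typing import List, Sequence

# MobileNet v1 output channels at multiplier 1.0 (stem + 13 separable blocks),
# assumed from the MobileNets paper.
BASE_CHANNELS = [32, 64, 128, 128, 256, 256, 512, 512, 512, 512, 512, 512, 1024, 1024]


def scale_channels(multiplier: float, base: Sequence[int] = BASE_CHANNELS) -> List[int]:
    """Multiply each channel count by `multiplier`, truncating to an int."""
    if multiplier < 0.25:
        raise ValueError("only multipliers >= 0.25 are supported")
    return [int(c * multiplier) for c in base]


if __name__ == "__main__":
    print(scale_channels(0.5))
```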

gluoncv.model_zoo.get_mobilenet_v2(multiplier, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper (https://arxiv.org/abs/1801.04381).

Parameters
  • multiplier (float) – The width multiplier for controlling the model size. Only multipliers that are no less than 0.25 are supported. The actual number of channels is equal to the original channel size multiplied by this multiplier.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.get_model(name, **kwargs)

Returns a pre-defined model by name

Parameters
  • name (str) – Name of the model.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • classes (int) – Number of classes for the output layer.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Returns

The model.

Return type

HybridBlock
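The pretrained argument accepts either a bool or a string hash tag. A small sketch of that convention, assuming False means no weights, True means the model's default weights, and a string selects a specific version; `resolve_pretrained` and the `default_tag` argument are hypothetical. With GluonCV installed, the documented call is simply gluoncv.model_zoo.get_model('cifar_resnet20_v1', pretrained=True).

```python
from typing import Optional, Union


def resolve_pretrained(pretrained: Union[bool, str], default_tag: str) -> Optional[str]:
    """Hypothetical helper illustrating the bool-or-str `pretrained` convention.

    False -> None (no weights loaded); True -> the default tag;
    a non-empty string -> that specific version tag.
    """
    if isinstance(pretrained, bool):
        return default_tag if pretrained else None
    if isinstance(pretrained, str) and pretrained:
        return pretrained
    raise TypeError("pretrained must be a bool or a non-empty str")


if __name__ == "__main__":
    print(resolve_pretrained(True, "default"))
```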

gluoncv.model_zoo.get_model_list()

Get the entire list of model names in model_zoo.

Returns

Entire list of model names in model_zoo.

Return type

list of str
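Because get_model_list() returns plain strings, narrowing it down is ordinary list filtering. A sketch that keeps names containing every keyword; the sample names are taken from this page, and with GluonCV installed you would pass gluoncv.model_zoo.get_model_list() instead. `filter_models` is a hypothetical helper.

```python
from typing import Iterable, List


def filter_models(names: Iterable[str], *keywords: str) -> List[str]:
    """Return the sorted names that contain every keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return sorted(n for n in names if all(k in n.lower() for k in lowered))


if __name__ == "__main__":
    sample = [
        "cifar_resnet20_v1", "cifar_resnet56_v2", "cifar_wideresnet16_10",
        "resnet50_v1b", "faster_rcnn_resnet50_v1b_voc", "faster_rcnn_resnet50_v1b_coco",
    ]
    print(filter_models(sample, "faster_rcnn", "coco"))
```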

gluoncv.model_zoo.get_monodepth2(backbone='resnet18', pretrained_base=True, scales=range(0, 4), num_output_channels=1, use_skips=True, root='~/.mxnet/models', ctx=cpu(0), pretrained=False, pretrained_model='kitti_stereo_640x192', **kwargs)

MonoDepth2

Parameters
  • backbone (string, default:'resnet18') – Pre-trained dilated backbone network type (‘resnet18’, ‘resnet34’, ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • pretrained_base (bool or str, default: True) – Whether to load a backbone network pretrained on ImageNet.

  • scales (list, default: range(4)) – The scales used in the loss.

  • num_output_channels (int, default: 1) – The number of output channels.

  • use_skips (bool, default: True) – This will use skip architecture in the network.

  • ctx (Context, default: CPU) – The context in which to load the pretrained weights.

  • root (str, default: '~/.mxnet/models') – Location for keeping the model parameters.

  • pretrained (bool or str, default: False) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_model (string, default: kitti_stereo_640x192) – The dataset that model pretrained on.
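The default scales=range(4) yields one disparity output per decoder scale. A sketch of the resulting output sizes for the 640x192 KITTI resolution in pretrained_model, assuming the Monodepth2 convention that scale s is predicted at 1/2**s of the input resolution; `disparity_resolutions` is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def disparity_resolutions(height: int, width: int,
                          scales: Iterable[int] = range(4)) -> Dict[int, Tuple[int, int]]:
    """Map each decoder scale s to the (height, width) of its disparity output."""
    return {s: (height // 2 ** s, width // 2 ** s) for s in scales}


if __name__ == "__main__":
    # 640x192 means width 640, height 192.
    print(disparity_resolutions(192, 640))
```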

gluoncv.model_zoo.get_monodepth2_resnet18_kitti_mono_640x192(**kwargs)

Monodepth2

Parameters

backbone (string) – Pre-trained dilated backbone network type (default: 'resnet18').

gluoncv.model_zoo.get_monodepth2_resnet18_kitti_mono_stereo_640x192(**kwargs)

Monodepth2

Parameters

backbone (string) – Pre-trained dilated backbone network type (default: 'resnet18').

gluoncv.model_zoo.get_monodepth2_resnet18_kitti_stereo_640x192(**kwargs)

Monodepth2

Parameters

backbone (string) – Pre-trained dilated backbone network type (default: 'resnet18').

gluoncv.model_zoo.get_monodepth2_resnet18_posenet_kitti_mono_640x192(**kwargs)

Monodepth2 PoseNet

Parameters

backbone (string) – Pre-trained dilated backbone network type (default: 'resnet18').

gluoncv.model_zoo.get_monodepth2_resnet18_posenet_kitti_mono_stereo_640x192(**kwargs)

Monodepth2 PoseNet

Parameters

backbone (string) – Pre-trained dilated backbone network type (default: 'resnet18').

gluoncv.model_zoo.get_monodepth2posenet(backbone='resnet18', pretrained_base=True, num_input_images=2, num_input_features=1, num_frames_to_predict_for=2, stride=1, root='~/.mxnet/models', ctx=cpu(0), pretrained=False, pretrained_model='kitti_stereo_640x192', **kwargs)

Monodepth2 PoseNet

Parameters
  • backbone (string) – Pre-trained dilated backbone network type (‘resnet18’, ‘resnet34’, ‘resnet50’, ‘resnet101’ or ‘resnet152’).

  • pretrained_base (bool or str) – Whether the backbone is pretrained. If True, the weights of a model trained on ImageNet are loaded.

  • num_input_images (int) – The number of input sequences. 1 for depth encoder, larger than 1 for pose encoder. (Default: 2)

  • num_input_features (int) – The number of input feature maps from posenet encoder. (Default: 1)

  • num_frames_to_predict_for (int) – The number of output poses between frames. If None, it equals num_input_features - 1. (Default: 2)

  • stride (int) – The stride number for Conv in pose decoder. (Default: 1)

  • ctx (Context, default: CPU) – The context in which to load the pretrained weights.

  • root (str, default: '~/.mxnet/models') – Location for keeping the model parameters.

  • pretrained (bool or str, default: False) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_model (string, default: kitti_stereo_640x192) – The dataset that model pretrained on.
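Two of the parameters above have derived values. The sketch below applies the documented None default for num_frames_to_predict_for. It also computes the input channels of a pose encoder that receives num_input_images frames, assuming (as in the Monodepth2 reference implementation) that RGB frames are stacked along the channel axis. Both helpers are hypothetical.

```python
from typing import Optional


def encoder_input_channels(num_input_images: int, channels_per_image: int = 3) -> int:
    """Channels seen by the first conv when frames are stacked along the channel axis."""
    if num_input_images < 1:
        raise ValueError("num_input_images must be >= 1")
    return num_input_images * channels_per_image


def resolve_num_frames_to_predict_for(num_input_features: int,
                                      num_frames_to_predict_for: Optional[int]) -> int:
    """Apply the documented default: None means num_input_features - 1."""
    if num_frames_to_predict_for is None:
        return num_input_features - 1
    return num_frames_to_predict_for


if __name__ == "__main__":
    print(encoder_input_channels(2), resolve_num_frames_to_predict_for(1, 2))
```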

gluoncv.model_zoo.get_nasnet(repeat=6, penultimate_filters=4032, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

NASNet-A model from the “Learning Transferable Architectures for Scalable Image Recognition” paper.

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
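The NASNet paper labels its configurations “NASNet-A (N @ P)”, where N is the number of cell repeats and P is the number of filters in the penultimate layer; these correspond to repeat and penultimate_filters here. A small sketch that builds both labels. It assumes GluonCV registers variants under the pattern nasnet_<repeat>_<penultimate_filters> (for example nasnet_6_4032 for the defaults); `nasnet_names` is hypothetical.

```python
def nasnet_names(repeat: int = 6, penultimate_filters: int = 4032) -> dict:
    """Hypothetical helper: paper-style label and assumed model-zoo name."""
    if repeat < 1 or penultimate_filters < 1:
        raise ValueError("repeat and penultimate_filters must be positive")
    return {
        "paper": f"NASNet-A ({repeat} @ {penultimate_filters})",
        "zoo": f"nasnet_{repeat}_{penultimate_filters}",
    }


if __name__ == "__main__":
    print(nasnet_names())
```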

gluoncv.model_zoo.get_psp(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), pretrained_base=True, **kwargs)

Pyramid Scene Parsing Network.

Parameters
  • dataset (str, default pascal_voc) – The dataset the model was pretrained on (pascal_voc, ade20k).

  • pretrained (bool or str, default False) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • pretrained_base (bool or str, default True) – This will load pretrained backbone network, that was trained on ImageNet.

Examples

>>> model = get_psp(dataset='pascal_voc', backbone='resnet50', pretrained=False)
>>> print(model)
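
PSPNet's central component is a pyramid pooling module that average-pools the backbone feature map into grids of several sizes (1×1, 2×2, 3×3 and 6×6 in the original paper) before they are upsampled and concatenated. The pure-Python sketch below shows only the pooling step on a single-channel map. It illustrates the idea and is not GluonCV's implementation; the bin-boundary rule (floor for the start, ceil for the end) is an assumption borrowed from common adaptive-pooling implementations.

```python
import math

def adaptive_avg_pool2d(fmap, bins):
    """Average-pool a 2-D map (list of rows) into a bins x bins grid.

    Bin starts use floor and bin ends use ceil, so neighbouring bins may
    overlap slightly when the map size is not divisible by `bins`.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, math.ceil((j + 1) * w / bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled

def pyramid_pool(fmap, bin_sizes=(1, 2, 3, 6)):
    """One pooled grid per pyramid level, keyed by bin size."""
    return {b: adaptive_avg_pool2d(fmap, b) for b in bin_sizes}

fmap = [[float(6 * r + c) for c in range(6)] for r in range(6)]  # toy 6x6 map
levels = pyramid_pool(fmap)
print(levels[1])     # [[17.5]]  (global average)
print(levels[2][0])  # [7.0, 10.0]
```
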
gluoncv.model_zoo.get_psp_resnet101_ade(**kwargs)[source]

Pyramid Scene Parsing Network.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet101_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_psp_resnet101_citys(**kwargs)[source]

Pyramid Scene Parsing Network.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet101_citys(pretrained=True)
>>> print(model)

gluoncv.model_zoo.get_psp_resnet101_coco(**kwargs)[source]

Pyramid Scene Parsing Network.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet101_coco(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_psp_resnet101_voc(**kwargs)[source]

Pyramid Scene Parsing Network.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet101_voc(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_psp_resnet50_ade(**kwargs)[source]

Pyramid Scene Parsing Network.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_psp_resnet50_ade(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_resnet(version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', use_se=False, **kwargs)[source]

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • version (int) – Version of ResNet. Options are 1, 2.

  • num_layers (int) – Number of layers. Options are 18, 34, 50, 101, 152.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • use_se (bool, default False) – Whether to use Squeeze-and-Excitation module

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
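
get_resnet is parameterized by version and num_layers. The sketch below validates those two arguments against the documented options and builds a model name of the form resnet{num_layers}_v{version} (e.g. resnet50_v1), with an se_ prefix when use_se is True. That naming pattern is an assumption for illustration, and resnet_model_name is a hypothetical helper, not a GluonCV function.

```python
VALID_VERSIONS = (1, 2)
VALID_DEPTHS = (18, 34, 50, 101, 152)

def resnet_model_name(version, num_layers, use_se=False):
    """Hypothetical helper: validate get_resnet-style arguments and build a name."""
    if version not in VALID_VERSIONS:
        raise ValueError(f"Invalid ResNet version {version}; options are {VALID_VERSIONS}")
    if num_layers not in VALID_DEPTHS:
        raise ValueError(f"Invalid number of layers {num_layers}; options are {VALID_DEPTHS}")
    prefix = "se_" if use_se else ""  # assumed prefix for Squeeze-and-Excitation variants
    return f"{prefix}resnet{num_layers}_v{version}"

print(resnet_model_name(1, 50))                # resnet50_v1
print(resnet_model_name(2, 101, use_se=True))  # se_resnet101_v2
```
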

gluoncv.model_zoo.get_resnext(num_layers, cardinality=32, bottleneck_width=4, use_se=False, deep_stem=False, avg_down=False, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

ResNeXt model from the “Aggregated Residual Transformations for Deep Neural Networks” paper.

Parameters
  • num_layers (int) – Number of layers. Options are 50, 101.

  • cardinality (int) – Number of groups

  • bottleneck_width (int) – Width of bottleneck block

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
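
cardinality and bottleneck_width together set the width of the grouped 3x3 convolution in each bottleneck. In the ResNeXt paper's 32x4d template the four stages use 128, 256, 512 and 1024 channels, i.e. cardinality × bottleneck_width, doubled at every stage. The sketch below computes those widths; the per-stage doubling is taken from the paper's template and is not read from GluonCV's source.

```python
def resnext_group_widths(cardinality=32, bottleneck_width=4, num_stages=4):
    """Channels of the grouped 3x3 conv in each stage (assumes doubling per stage)."""
    widths = []
    for stage in range(num_stages):
        per_group = bottleneck_width * (2 ** stage)  # width of each group in this stage
        widths.append(cardinality * per_group)       # total channels = groups * width
    return widths

print(resnext_group_widths())       # [128, 256, 512, 1024]  (32x4d)
print(resnext_group_widths(64, 4))  # [256, 512, 1024, 2048] (64x4d)
```
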

gluoncv.model_zoo.get_se_resnet(version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

SE_ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. SE_ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • version (int) – Version of ResNet. Options are 1, 2.

  • num_layers (int) – Number of layers. Options are 18, 34, 50, 101, 152.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
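
A Squeeze-and-Excitation block squeezes each channel to its global average, passes the resulting vector through two fully connected layers (ReLU, then sigmoid), and uses the outputs as per-channel gates that rescale the feature map. The pure-Python sketch below takes the two weight matrices as plain lists and omits biases; it shows the computation and is not GluonCV's SE layer.

```python
import math

def squeeze_excite(channels, w_reduce, w_expand):
    """Apply a Squeeze-and-Excitation gate to a feature map.

    channels: list of C channels, each a flat list of spatial values.
    w_reduce: (C // r) x C weight matrix; w_expand: C x (C // r) weight matrix.
    Biases are omitted for brevity.
    """
    # Squeeze: global average pooling per channel.
    z = [sum(ch) / len(ch) for ch in channels]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every value in channel c by its gate.
    return [[g * v for v in ch] for g, ch in zip(gates, channels)]

fmap = [[1.0, 3.0], [2.0, 2.0]]  # C = 2 channels, 2 spatial positions each
# With all-zero weights every gate is sigmoid(0) = 0.5.
print(squeeze_excite(fmap, [[0.0, 0.0]], [[0.0], [0.0]]))  # [[0.5, 1.5], [1.0, 1.0]]
```
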

gluoncv.model_zoo.get_ssd(name, base_size, features, filters, sizes, ratios, steps, classes, dataset, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', anchor_generator=<class 'gluoncv.model_zoo.ssd.anchor.SSDAnchorGenerator'>, **kwargs)[source]

Get SSD models.

Parameters
  • name (str or None) – Model name. If None, features must be a HybridBlock.

  • base_size (int) – Base image size for training. It is fixed once training is set up, but a fixed base size still allows variable input sizes at test time.

  • features (iterable of str or HybridBlock) – List of network internal output names that specify which layers are used for predicting bbox values. If name is None, features must be a HybridBlock that generates multiple outputs for prediction.

  • filters (iterable of float or None) – List of convolution layer channels to append to the base network feature extractor. If name is None, this is ignored.

  • sizes (iterable of float) – Sizes of anchor boxes, given as a list of floats in increasing order. The length of sizes must be one more than the number of output layers. For example, a two-stage SSD model can have sizes = [30, 60, 90], which is converted to [30, 60] and [60, 90] for the two stages, respectively. For more details, please refer to the original paper.

  • ratios (iterable of list) – Aspect ratios of anchors in each output layer. Its length must equal the number of SSD output layers.

  • steps (list of int) – Step size of anchor boxes in each output layer.

  • classes (iterable of str) – Names of categories.

  • dataset (str) – Name of dataset. This is used to identify the model name, because models trained on different datasets can differ substantially.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

A SSD detection network.

Return type

HybridBlock
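
The rule for sizes can be checked in a few lines: consecutive entries are paired into one (min, max) range per output layer, which is why sizes needs one more entry than there are layers. The helper below is a hypothetical illustration of that documented rule. It also checks that ratios and steps have one entry per output layer, as their descriptions state; the step values in the usage line are illustrative.

```python
def pair_anchor_sizes(sizes, ratios, steps):
    """Split SSD anchor sizes into per-layer (min, max) pairs and check lengths."""
    num_layers = len(sizes) - 1
    if num_layers < 1:
        raise ValueError("sizes needs at least two entries")
    if list(sizes) != sorted(sizes):
        raise ValueError("sizes must be in increasing order")
    if len(ratios) != num_layers or len(steps) != num_layers:
        raise ValueError("ratios and steps need one entry per output layer")
    return [(sizes[i], sizes[i + 1]) for i in range(num_layers)]

# The two-stage example from the sizes description; steps are illustrative.
print(pair_anchor_sizes([30, 60, 90], ratios=[[1, 2, 0.5]] * 2, steps=[8, 16]))
# [(30, 60), (60, 90)]
```
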

gluoncv.model_zoo.get_vgg(num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

VGG model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • num_layers (int) – Number of layers for the variant of VGG. Options are 11, 13, 16, 19.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.
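
The root default appears both as '~/.mxnet/models' (in signatures) and as $MXNET_HOME/models (in descriptions); the two presumably agree when MXNET_HOME is unset and MXNet falls back to ~/.mxnet. The sketch below shows one way the root lookup and the bool-or-str pretrained argument could resolve to a parameter file. The environment-variable override, the {name}-{tag}.params layout and the placeholder default tag are all assumptions for illustration, and resolve_param_file is hypothetical.

```python
import os

def resolve_param_file(name, pretrained, root="~/.mxnet/models", default_tag="latest"):
    """Hypothetical helper: map a `pretrained` argument to a .params path.

    pretrained=False   -> None (no weights are loaded)
    pretrained=True    -> the default weights, tagged with `default_tag` (placeholder)
    pretrained="<tag>" -> a specific version identified by that hashtag
    """
    if pretrained is False:
        return None
    # Assumption: if MXNET_HOME is set, it overrides the root directory.
    if "MXNET_HOME" in os.environ:
        root = os.path.join(os.environ["MXNET_HOME"], "models")
    tag = default_tag if pretrained is True else str(pretrained)
    return os.path.join(os.path.expanduser(root), f"{name}-{tag}.params")

print(resolve_param_file("vgg16", False))                     # None
print(resolve_param_file("vgg16", "abcd1234", root="/tmp/m"))  # /tmp/m/vgg16-abcd1234.params if MXNET_HOME is unset
```
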

gluoncv.model_zoo.get_vgg_atrous_extractor(num_layers, im_size, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get VGG atrous feature extractor networks.

Parameters
  • num_layers (int) – VGG variant; one of 11, 13, 16, 19.

  • im_size (int) – VGG detection input size; one of 300, 512.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (mx.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

Returns

The returned network.

Return type

mxnet.gluon.HybridBlock
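
Atrous (dilated) convolution, which gives this extractor its name, enlarges a kernel's receptive field without adding weights: a k-tap kernel with dilation d spans k + (k - 1)(d - 1) input positions per axis. The sketch below computes that extent together with the standard convolution output-size formula. The dilation of 6 and the 19x19 map in the usage lines are illustrative values, not something this page specifies.

```python
def effective_kernel(k, dilation):
    """Input positions spanned along one axis by a k-tap kernel with this dilation."""
    return k + (k - 1) * (dilation - 1)

def conv_output_size(n, k, stride=1, padding=0, dilation=1):
    """Standard output length of a (possibly dilated) convolution along one axis."""
    return (n + 2 * padding - dilation * (k - 1) - 1) // stride + 1

print(effective_kernel(3, 1))  # 3
print(effective_kernel(3, 6))  # 13
# With padding equal to the dilation, a 3x3 dilated conv keeps a 19x19 map at 19x19.
print(conv_output_size(19, 3, padding=6, dilation=6))  # 19
```
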

gluoncv.model_zoo.get_xcetption(pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Xception model from

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.get_xcetption_71(pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Xception model from

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.get_yolov3(name, stages, filters, anchors, strides, classes, dataset, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get YOLOV3 models.

Parameters
  • name (str or None) – Model name. If None, features must be a HybridBlock.

  • stages – List of network internal output names that specify which layers are used for predicting bbox values. If name is None, features must be a HybridBlock that generates multiple outputs for prediction.

  • filters (iterable of float or None) – List of convolution layer channels to append to the base network feature extractor. If name is None, this is ignored.

  • sizes (iterable of float) – Sizes of anchor boxes, given as a list of floats in increasing order. The length of sizes must be one more than the number of output layers. For example, a two-stage SSD model can have sizes = [30, 60, 90], which is converted to [30, 60] and [60, 90] for the two stages, respectively. For more details, please refer to the original paper.

  • ratios (iterable of list) – Aspect ratios of anchors in each output layer. Its length must equal the number of SSD output layers.

  • steps (list of int) – Step size of anchor boxes in each output layer.

  • classes (iterable of str) – Names of categories.

  • dataset (str) – Name of dataset. This is used to identify the model name, because models trained on different datasets can differ substantially.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).

  • root (str) – Model weights storing path.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

A YOLOV3 detection network.

Return type

HybridBlock
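
get_yolov3 takes per-stage anchors and strides. In the YOLOv3 paper, a raw prediction (tx, ty, tw, th) at grid cell (cx, cy) is decoded to a box center at σ(tx) + cx, σ(ty) + cy in grid units and a size of anchor × exp(t). The sketch below implements that decoding for one prediction, multiplying by the stage stride to get pixels. It assumes anchors are given in input-image pixels and is not taken from GluonCV's source; the (116, 90) anchor is just an illustrative value.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_yolo_box(t, cell, anchor, stride):
    """Decode one YOLOv3 prediction into (center_x, center_y, width, height) in pixels.

    t:      raw outputs (tx, ty, tw, th)
    cell:   grid cell indices (cx, cy)
    anchor: anchor (width, height) in pixels (assumption)
    stride: downsampling factor of this output stage
    """
    tx, ty, tw, th = t
    cx, cy = cell
    aw, ah = anchor
    bx = (sigmoid(tx) + cx) * stride  # center offset stays inside the cell
    by = (sigmoid(ty) + cy) * stride
    bw = aw * math.exp(tw)            # size is a log-space scaling of the anchor
    bh = ah * math.exp(th)
    return bx, by, bw, bh

# Zero outputs put the center mid-cell and keep the anchor size unchanged.
print(decode_yolo_box((0, 0, 0, 0), cell=(2, 3), anchor=(116, 90), stride=32))
# (80.0, 112.0, 116.0, 90.0)
```
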

gluoncv.model_zoo.googlenet(classes=1000, pretrained=False, pretrained_base=True, ctx=cpu(0), dropout_ratio=0.4, aux_logits=False, root='~/.mxnet/models', partial_bn=False, **kwargs)[source]

GoogleNet model from the “Going Deeper with Convolutions” paper and the “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.gpu_iou(bbox_a_tensor, bbox_b_tensor)[source]
Parameters
  • bbox_a_tensor

  • bbox_b_tensor
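
gpu_iou is listed without a description. Judging by its name, it computes intersection-over-union between two sets of boxes; the pure-Python sketch below shows that computation for boxes in corner format (xmin, ymin, xmax, ymax), returning a len(a) × len(b) matrix. The box format and output layout are assumptions, since this page does not document them, and pairwise_iou is a hypothetical CPU illustration rather than the GPU implementation.

```python
def pairwise_iou(boxes_a, boxes_b):
    """IoU matrix between two lists of (xmin, ymin, xmax, ymax) boxes."""
    def area(b):
        return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

    result = []
    for a in boxes_a:
        row = []
        for b in boxes_b:
            # Overlap along each axis, clamped at zero for disjoint boxes.
            iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
            ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
            inter = iw * ih
            union = area(a) + area(b) - inter
            row.append(inter / union if union > 0 else 0.0)
        result.append(row)
    return result

print(pairwise_iou([(0, 0, 2, 2)], [(0, 0, 2, 2), (1, 0, 3, 2), (5, 5, 6, 6)]))
# [[1.0, 0.3333333333333333, 0.0]]
```
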

gluoncv.model_zoo.hrnet_w18_c(**kwargs)[source]

hrnet_w18 for ImageNet classification

gluoncv.model_zoo.hrnet_w18_small_v1_c(**kwargs)[source]

hrnet_w18_small_v1 for ImageNet classification

gluoncv.model_zoo.hrnet_w18_small_v1_s(**kwargs)[source]

hrnet_w18_small_v1 for Cityscapes segmentation

gluoncv.model_zoo.hrnet_w18_small_v2_c(**kwargs)[source]

hrnet_w18_small_v2 for ImageNet classification

gluoncv.model_zoo.hrnet_w18_small_v2_s(**kwargs)[source]

hrnet_w18_small_v2 for Cityscapes segmentation

gluoncv.model_zoo.hrnet_w30_c(**kwargs)[source]

hrnet_w30 for ImageNet classification

gluoncv.model_zoo.hrnet_w32_c(**kwargs)[source]

hrnet_w32 for ImageNet classification

gluoncv.model_zoo.hrnet_w40_c(**kwargs)[source]

hrnet_w40 for ImageNet classification

gluoncv.model_zoo.hrnet_w44_c(**kwargs)[source]

hrnet_w44 for ImageNet classification

gluoncv.model_zoo.hrnet_w48_c(**kwargs)[source]

hrnet_w48 for ImageNet classification

gluoncv.model_zoo.hrnet_w48_s(**kwargs)[source]

hrnet_w48 for Cityscapes segmentation

gluoncv.model_zoo.hrnet_w64_c(**kwargs)[source]

hrnet_w64 for ImageNet classification

gluoncv.model_zoo.i3d_inceptionv1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inception v1 model trained on Kinetics400 dataset from “Going Deeper with Convolutions” paper.

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
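
num_segments divides a video into equal-length segments, and num_crop selects 1, 3 or 10 spatial crops at evaluation time, so evaluation presumably feeds num_segments × num_crop inputs per video. The sketch below picks the center frame index of each segment and counts those inputs. Center sampling is a common TSN-style evaluation choice and is assumed here, not confirmed for these models; both helpers are hypothetical.

```python
def segment_centers(num_frames, num_segments=1):
    """Return the center frame index of each of `num_segments` equal segments."""
    if num_segments < 1 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]

def clips_per_video(num_segments=1, num_crop=1):
    """Number of network inputs per video at evaluation time (assumed product)."""
    if num_crop not in (1, 3, 10):
        raise ValueError("num_crop must be 1, 3 or 10")
    return num_segments * num_crop

print(segment_centers(30, 3))   # [5, 15, 25]
print(clips_per_video(3, 10))   # 30
```
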

gluoncv.model_zoo.i3d_inceptionv3_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inception v3 model trained on Kinetics400 dataset from “Rethinking the Inception Architecture for Computer Vision” paper.

Inflated 3D model (I3D) from “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” paper.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_nl10_resnet101_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet101 backbone and 10 non-local blocks trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_nl10_resnet50_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet50 backbone and 10 non-local blocks trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_nl5_resnet101_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet101 backbone and 5 non-local blocks trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_nl5_resnet50_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet50 backbone and 5 non-local blocks trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_resnet101_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet101 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_resnet50_v1_custom(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, use_kinetics_pretrain=True, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet50 backbone, customized for the user's own dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • use_kinetics_pretrain (bool.) – Whether to load Kinetics-400 pre-trained model weights.

gluoncv.model_zoo.i3d_resnet50_v1_hmdb51(nclass=51, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, use_kinetics_pretrain=True, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet50 backbone trained on HMDB51 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_resnet50_v1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, bn_frozen=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet50 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.i3d_resnet50_v1_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, **kwargs)[source]

Inflated 3D model (I3D) with ResNet50 backbone trained on Something-Something-V2 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
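The num_segments parameter splits a video into equal-length segments. A minimal sketch of one common test-time convention, assuming the middle frame of each segment is sampled; the helper `segment_indices` is hypothetical, and GluonCV's video data loaders may sample differently (for example with random offsets during training).

```python
def segment_indices(num_frames, num_segments):
    """Split [0, num_frames) into num_segments equal-length chunks and
    return the index of the middle frame of each chunk."""
    if num_segments < 1 or num_frames < num_segments:
        raise ValueError("need 1 <= num_segments <= num_frames")
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]


print(segment_indices(300, 1))  # [150]
print(segment_indices(300, 3))  # [50, 150, 250]
```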

gluoncv.model_zoo.i3d_resnet50_v1_ucf101(nclass=101, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, use_kinetics_pretrain=True, feat_ext=False, **kwargs)

Inflated 3D model (I3D) with ResNet50 backbone trained on UCF101 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • bn_frozen (bool.) – Whether to freeze weight and bias of BN layers.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
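For num_crop=10, a widely used evaluation scheme takes the four corner crops and the center crop, plus the horizontal flip of each, giving 10 views. The sketch below computes those crop boxes; treating GluonCV's 10-crop this way is an assumption, its 3-crop layout is not covered, and the helper `ten_crop_boxes` is hypothetical.

```python
def ten_crop_boxes(width, height, size):
    """Return (x0, y0, x1, y1, flipped) for the four corner crops, the center
    crop, and the horizontally flipped copy of each (10 views in total)."""
    if size > width or size > height:
        raise ValueError("crop size exceeds image size")
    cx, cy = (width - size) // 2, (height - size) // 2
    origins = [(0, 0), (width - size, 0), (0, height - size),
               (width - size, height - size), (cx, cy)]
    boxes = [(x, y, x + size, y + size) for x, y in origins]
    return [b + (False,) for b in boxes] + [b + (True,) for b in boxes]


views = ten_crop_boxes(340, 256, 224)
print(len(views))  # 10
print(views[4])    # (58, 16, 282, 240, False)
```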

gluoncv.model_zoo.inception_v3(pretrained=False, ctx=cpu(0), root='~/.mxnet/models', partial_bn=False, **kwargs)

Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.inceptionv1_hmdb51(nclass=51, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

InceptionV1 model trained on HMDB51 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.
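The partial_bn rule keeps only the first batch normalization layer trainable and freezes the rest. A minimal sketch of that selection rule on a list of layer names; the names and the helper `bn_trainable_flags` are hypothetical, and how GluonCV actually freezes the layers internally is not shown.

```python
def bn_trainable_flags(layer_names, partial_bn):
    """Map each BatchNorm layer name to whether it keeps training.
    With partial_bn=True only the first BN layer is trainable."""
    bn_layers = [name for name in layer_names if "batchnorm" in name]
    return {name: (not partial_bn) or i == 0 for i, name in enumerate(bn_layers)}


layers = ["conv0", "batchnorm0", "conv1", "batchnorm1", "conv2", "batchnorm2"]
print(bn_trainable_flags(layers, partial_bn=True))
# {'batchnorm0': True, 'batchnorm1': False, 'batchnorm2': False}
```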

gluoncv.model_zoo.inceptionv1_kinetics400(nclass=400, pretrained=False, pretrained_base=True, tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

InceptionV1 model trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.inceptionv1_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

InceptionV1 model trained on Something-Something-V2 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.inceptionv1_ucf101(nclass=101, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

InceptionV1 model trained on UCF101 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.inceptionv3_hmdb51(nclass=51, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

InceptionV3 model trained on HMDB51 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.inceptionv3_kinetics400(nclass=400, pretrained=False, pretrained_base=True, tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

InceptionV3 model trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.inceptionv3_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

InceptionV3 model trained on Something-Something-V2 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.inceptionv3_ucf101(nclass=101, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

InceptionV3 model trained on UCF101 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default True.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.mask_rcnn_fpn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, **kwargs)

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_fpn_resnet101_v1d_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.mask_rcnn_fpn_resnet18_v1b_coco(pretrained=False, pretrained_base=True, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, **kwargs)

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 1000) – Return top proposal results after NMS in testing of RPN.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
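The two RPN test-time limits act at different points: rpn_test_pre_nms caps how many top-scoring proposals enter NMS, and rpn_test_post_nms caps how many survive it. A minimal sketch of that ordering; the helper `filter_proposals` is hypothetical, the NMS step is passed in as a function (a pass-through here for brevity), and it assumes NMS returns survivors in descending score order. GluonCV performs these steps inside its RPN on NDArrays.

```python
def filter_proposals(proposals, pre_nms, post_nms, nms_fn):
    """proposals: list of (score, box). Keep the top `pre_nms` by score,
    apply `nms_fn`, then keep at most `post_nms` survivors."""
    ranked = sorted(proposals, key=lambda p: p[0], reverse=True)[:pre_nms]
    survivors = nms_fn(ranked)
    return survivors[:post_nms]


# Hypothetical proposals with scores 0.00 .. 0.99; NMS is a pass-through here.
proposals = [(s / 100.0, (0, 0, 10, 10)) for s in range(100)]
kept = filter_proposals(proposals, pre_nms=60, post_nms=20, nms_fn=lambda ps: ps)
print(len(kept), kept[0][0])  # 20 0.99
```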

Examples

>>> model = mask_rcnn_fpn_resnet18_v1b_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.mask_rcnn_fpn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_fpn_resnet50_v1b_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.mask_rcnn_fpn_syncbn_mobilenet1_0_coco(pretrained=False, pretrained_base=True, num_devices=0, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, **kwargs)

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices for the sync batch norm layer. If less than 1, all available devices are used.

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 1000) – Return top proposal results after NMS in testing of RPN.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_fpn_syncbn_mobilenet1_0_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.mask_rcnn_fpn_syncbn_resnet18_v1b_coco(pretrained=False, pretrained_base=True, num_devices=0, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, **kwargs)

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • num_devices (int, default is 0) – Number of devices for the sync batch norm layer. If less than 1, all available devices are used.

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 1000) – Return top proposal results after NMS in testing of RPN.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_fpn_syncbn_resnet18_v1b_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.mask_rcnn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, **kwargs)

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_resnet101_v1d_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.mask_rcnn_resnet18_v1b_coco(pretrained=False, pretrained_base=True, rcnn_max_dets=1000, rpn_test_pre_nms=6000, rpn_test_post_nms=1000, **kwargs)

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • rcnn_max_dets (int, default is 1000) – Number of rois to retain in RCNN.

  • rpn_test_pre_nms (int, default is 6000) – Filter top proposals before NMS in testing of RPN.

  • rpn_test_post_nms (int, default is 1000) – Return top proposal results after NMS in testing of RPN.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_resnet18_v1b_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.mask_rcnn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)

Mask RCNN model from the paper “He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN”.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = mask_rcnn_resnet50_v1b_coco(pretrained=True)
>>> print(model)

gluoncv.model_zoo.mobilenet0_25(**kwargs)

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 0.25.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
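The width multiplier α thins every layer uniformly: a layer with c channels gets int(c * α) channels, which the MobileNets paper notes cuts computation roughly by α². A minimal sketch of the scaling rule on an illustrative list of MobileNet widths; `BASE_WIDTHS` and `scaled_widths` are hypothetical names and do not reproduce GluonCV's full layer table.

```python
BASE_WIDTHS = [32, 64, 128, 256, 512, 1024]  # illustrative MobileNet layer widths


def scaled_widths(multiplier, base=BASE_WIDTHS):
    """Apply the width multiplier: each channel count c becomes int(c * multiplier)."""
    return [int(c * multiplier) for c in base]


for alpha in (0.25, 0.5, 0.75, 1.0):
    print(alpha, scaled_widths(alpha))
```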

gluoncv.model_zoo.mobilenet0_5(**kwargs)

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 0.5.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.mobilenet0_75(**kwargs)

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 0.75.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.mobilenet1_0(**kwargs)

MobileNet model from the “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” paper, with width multiplier 1.0.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.mobilenet_v2_0_25(**kwargs)

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper (https://arxiv.org/abs/1801.04381), with width multiplier 0.25.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.mobilenet_v2_0_5(**kwargs)

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper (https://arxiv.org/abs/1801.04381), with width multiplier 0.5.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.mobilenet_v2_0_75(**kwargs)

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper (https://arxiv.org/abs/1801.04381), with width multiplier 0.75.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.mobilenet_v2_1_0(**kwargs)

MobileNetV2 model from the “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation” paper (https://arxiv.org/abs/1801.04381), with width multiplier 1.0.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
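MobileNetV2's building block is the inverted residual: a 1×1 convolution expands the channels (the paper mostly uses an expansion factor of 6), a 3×3 depthwise convolution filters them, and a 1×1 linear projection (no ReLU6 after it) narrows them back, with a skip connection only when the stride is 1 and input and output widths match. The sketch below only traces channel counts through one block; `inverted_residual_shapes` is a hypothetical helper, not GluonCV code.

```python
def inverted_residual_shapes(in_channels, out_channels, expansion=6, stride=1):
    """Trace channel counts through one inverted residual block:
    1x1 expand -> 3x3 depthwise -> 1x1 linear projection."""
    hidden = in_channels * expansion
    stages = [
        ("1x1 expand + ReLU6", in_channels, hidden),
        ("3x3 depthwise + ReLU6", hidden, hidden),
        ("1x1 linear project", hidden, out_channels),  # linear bottleneck: no activation
    ]
    use_residual = stride == 1 and in_channels == out_channels
    return stages, use_residual


stages, residual = inverted_residual_shapes(24, 24)
for name, cin, cout in stages:
    print(f"{name}: {cin} -> {cout}")
print("residual:", residual)
```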

gluoncv.model_zoo.nasnet_4_1056(**kwargs)

NASNet-A model from the “Learning Transferable Architectures for Scalable Image Recognition” paper.

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
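The NASNet paper labels its models “NASNet-A (N @ P)”, with N the number of cell repeats and P the number of filters in the penultimate layer; the function names here appear to follow the same pattern (nasnet_<repeat>_<penultimate_filters>), which is an inference from the names rather than something this page states. A small sketch decoding a name under that assumption; `nasnet_config` is hypothetical.

```python
import re


def nasnet_config(model_name):
    """Decode 'nasnet_<repeat>_<penultimate_filters>' into its two integers
    (assumes the N @ P naming convention from the NASNet paper)."""
    match = re.fullmatch(r"nasnet_(\d+)_(\d+)", model_name)
    if match is None:
        raise ValueError(f"not a NASNet model name: {model_name!r}")
    repeat, penultimate_filters = map(int, match.groups())
    return {"repeat": repeat, "penultimate_filters": penultimate_filters}


for name in ("nasnet_4_1056", "nasnet_5_1538", "nasnet_6_4032", "nasnet_7_1920"):
    print(name, nasnet_config(name))
```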

gluoncv.model_zoo.nasnet_5_1538(**kwargs)

NASNet-A model from the “Learning Transferable Architectures for Scalable Image Recognition” paper.

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.nasnet_6_4032(**kwargs)

NASNet-A model from the “Learning Transferable Architectures for Scalable Image Recognition” paper.

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.nasnet_7_1920(**kwargs)

NASNet-A model from the “Learning Transferable Architectures for Scalable Image Recognition” paper.

Parameters
  • repeat (int) – Number of cell repeats

  • penultimate_filters (int) – Number of filters in the penultimate layer of the network

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

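gluoncv.model_zoo.nms_fallback(boxes, thresh)

Perform non-maximum suppression and return the indices of the kept boxes.

Parameters
  • boxes ([[x, y, xmax, ymax, score]]) – Boxes to filter, one row per box with its score in the last column.

  • thresh – Overlap threshold used for suppression.

Returns
  Indices of the kept boxes.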
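Greedy non-maximum suppression visits boxes from highest to lowest score and keeps a box only if its IoU with every already-kept box is at or below the threshold. The pure-Python sketch below works on the same [x, y, xmax, ymax, score] row layout; it is an illustration of the technique, not the nms_fallback source, and real implementations may differ in details such as a +1 pixel convention for box widths.

```python
def iou(a, b):
    """Intersection over union of two [x, y, xmax, ymax, ...] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, thresh):
    """Return indices of kept boxes, highest score first. A box is dropped
    when its IoU with an already-kept box exceeds `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][4], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep


boxes = [
    [0, 0, 10, 10, 0.9],
    [1, 1, 11, 11, 0.8],    # IoU with box 0 is 81 / 119, about 0.68
    [20, 20, 30, 30, 0.7],  # disjoint from both
]
print(greedy_nms(boxes, thresh=0.5))  # [0, 2]
```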

gluoncv.model_zoo.p3d_resnet101_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

The Pseudo 3D network (P3D) with ResNet101 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
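As an illustration of num_segments, the sketch below evenly divides a video of num_frames frames into num_segments segments and takes the middle frame of each, in the style of TSN evaluation. This is an assumption for illustration only: GluonCV's data loaders may instead use random offsets during training or sample a multi-frame clip per segment, and segment_centers is a hypothetical helper.

```python
def segment_centers(num_frames, num_segments):
    """Pick one frame index from the middle of each of num_segments equal segments."""
    if num_frames < 1 or num_segments < 1:
        raise ValueError("num_frames and num_segments must be positive")
    seg_len = num_frames / num_segments
    # Segment k spans [k * seg_len, (k + 1) * seg_len); int() floors its midpoint.
    return [int(seg_len * k + seg_len / 2) for k in range(num_segments)]


print(segment_centers(300, 3))  # [50, 150, 250]
print(segment_centers(300, 1))  # [150]
```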

gluoncv.model_zoo.p3d_resnet50_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

The Pseudo 3D network (P3D) with ResNet50 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
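During evaluation each video yields num_segments * num_crop views, whose outputs are typically averaged into one prediction per video. The sketch below shows that averaging on plain lists. It assumes the views of one video are stored consecutively, and it does not claim whether GluonCV averages features or class scores internally; average_clip_scores is a hypothetical name.

```python
def average_clip_scores(scores, num_segments, num_crop):
    """Average per-view output vectors into one vector per video.

    `scores` is a flat list of per-view vectors in which each video
    contributes num_segments * num_crop consecutive rows.
    """
    views = num_segments * num_crop
    if len(scores) % views != 0:
        raise ValueError("number of rows must be a multiple of num_segments * num_crop")
    videos = []
    for start in range(0, len(scores), views):
        group = scores[start:start + views]
        # Element-wise mean over all views of one video.
        videos.append([sum(col) / views for col in zip(*group)])
    return videos


# One video, 2 segments x 1 crop, 3 classes.
print(average_clip_scores([[1.0, 3.0, 0.0], [3.0, 1.0, 0.0]], num_segments=2, num_crop=1))
# [[2.0, 2.0, 0.0]]
```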

gluoncv.model_zoo.pretrained_model_list()[source]

Get the list of models that have pretrained weights available.

gluoncv.model_zoo.r2plus1d_resnet101_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

R2Plus1D with ResNet101 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
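The three num_crop choices are commonly implemented as follows: 1 is a single center crop, 3 takes crops at the start, center and end of the frame's longer side, and 10 takes the four corners plus the center, each also horizontally flipped. The sketch below computes crop boxes under that assumed layout; it is not taken from GluonCV's transforms, and crop_boxes is a hypothetical name.

```python
def crop_boxes(width, height, size, num_crop):
    """Return (x0, y0, x1, y1, flipped) for each square crop of side `size`.

    Assumed layouts: 1 = center crop; 3 = start/center/end of the longer
    side; 10 = four corners + center, each also horizontally flipped.
    """
    if size > min(width, height):
        raise ValueError("crop larger than frame")
    cx, cy = (width - size) // 2, (height - size) // 2
    if num_crop == 1:
        offsets = [(cx, cy)]
    elif num_crop == 3:
        if width >= height:  # slide along the width
            offsets = [(0, cy), (cx, cy), (width - size, cy)]
        else:  # slide along the height
            offsets = [(cx, 0), (cx, cy), (cx, height - size)]
    elif num_crop == 10:
        five = [(0, 0), (width - size, 0), (0, height - size),
                (width - size, height - size), (cx, cy)]
        boxes = [(x, y, x + size, y + size, False) for x, y in five]
        # The same five boxes again, marked as horizontally flipped.
        return boxes + [(x0, y0, x1, y1, True) for x0, y0, x1, y1, _ in boxes]
    else:
        raise ValueError("num_crop must be 1, 3 or 10")
    return [(x, y, x + size, y + size, False) for x, y in offsets]


print(crop_boxes(340, 256, 224, 1))        # [(58, 16, 282, 240, False)]
print(len(crop_boxes(340, 256, 224, 10)))  # 10
```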

gluoncv.model_zoo.r2plus1d_resnet152_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

R2Plus1D with ResNet152 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.r2plus1d_resnet18_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

R2Plus1D with ResNet18 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.r2plus1d_resnet34_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

R2Plus1D with ResNet34 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.r2plus1d_resnet50_kinetics400(nclass=400, pretrained=False, pretrained_base=True, root='~/.mxnet/models', num_segments=1, num_crop=1, feat_ext=False, ctx=cpu(0), **kwargs)[source]

R2Plus1D with ResNet50 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.residualattentionnet128(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
  • input_size (int) – Input size of the network. Options are 32 and 224.

  • num_layers (int) – Number of layers. Options are 56, 92, 128, 164, 200, 236, 452.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
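The cited paper combines a trunk branch F(x) with a soft mask branch M(x) in [0, 1] through attention residual learning, H(x) = (1 + M(x)) · F(x), so a mask near zero leaves the trunk features unchanged instead of suppressing them. The element-wise sketch below works on flat lists of features and mask logits; in the actual modules both branches are convolutional and the mask comes from a bottom-up/top-down sub-network. attention_residual is an illustrative name, not a GluonCV function.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function, mapping logits into (0, 1)."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def attention_residual(trunk, mask_logits):
    """Combine trunk features F(x) with soft mask M(x): H = (1 + M) * F."""
    return [(1.0 + sigmoid(m)) * f for f, m in zip(trunk, mask_logits)]


# Mask 0.5 scales the first feature by 1.5; a mask near 0 leaves the second as is.
print(attention_residual([2.0, 2.0], [0.0, -50.0]))  # [3.0, 2.0]
```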

gluoncv.model_zoo.residualattentionnet164(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
  • input_size (int) – Input size of the network. Options are 32 and 224.

  • num_layers (int) – Number of layers. Options are 56, 92, 128, 164, 200, 236, 452.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.residualattentionnet200(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
  • input_size (int) – Input size of the network. Options are 32 and 224.

  • num_layers (int) – Number of layers. Options are 56, 92, 128, 164, 200, 236, 452.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.residualattentionnet236(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
  • input_size (int) – Input size of the network. Options are 32 and 224.

  • num_layers (int) – Number of layers. Options are 56, 92, 128, 164, 200, 236, 452.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.residualattentionnet452(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
  • input_size (int) – Input size of the network. Options are 32 and 224.

  • num_layers (int) – Number of layers. Options are 56, 92, 128, 164, 200, 236, 452.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.residualattentionnet56(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
  • input_size (int) – Input size of the network. Options are 32 and 224.

  • num_layers (int) – Number of layers. Options are 56, 92, 128, 164, 200, 236, 452.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.residualattentionnet92(**kwargs)[source]

AttentionModel model from “Residual Attention Network for Image Classification” paper.

Parameters
  • input_size (int) – Input size of the network. Options are 32 and 224.

  • num_layers (int) – Number of layers. Options are 56, 92, 128, 164, 200, 236, 452.

  • pretrained (bool, default False) – Whether to load the pretrained weights for model.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnest101(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNeSt-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNeSt, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.
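The stride-8 figure for dilated=True can be checked with a little arithmetic. Assuming the usual ResNet layout (a stride-2 stem convolution and a stride-2 max-pool, followed by four stages with strides 1, 2, 2, 2), the dilated variant keeps stride 1 in the last two stages and uses dilation rates 2 and 4 there instead, so the final feature map is 1/8 of the input size rather than 1/32. The sketch uses floor division; padded convolutions round odd sizes up, so it is exact only for input sizes divisible by 32. stage_resolutions is a hypothetical helper.

```python
def stage_resolutions(input_size, dilated):
    """Feature-map side length after each of the four ResNet stages.

    Assumes stem conv (stride 2) + max-pool (stride 2), then stage strides
    1, 2, 2, 2. When dilated, the last two stages keep stride 1 and rely on
    dilation (2 and 4) to enlarge the receptive field instead.
    """
    size = input_size // 2 // 2  # stem conv + max-pool
    strides = [1, 2, 1, 1] if dilated else [1, 2, 2, 2]
    sizes = []
    for stride in strides:
        size //= stride
        sizes.append(size)
    return sizes


print(stage_resolutions(224, dilated=False))  # [56, 28, 14, 7]   -> output stride 32
print(stage_resolutions(224, dilated=True))   # [56, 28, 28, 28]  -> output stride 8
```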

gluoncv.model_zoo.resnest14(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNeSt-14 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNeSt, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnest200(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNeSt-200 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNeSt, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnest26(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNeSt-26 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNeSt, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnest269(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNeSt-269 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNeSt, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnest50(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNeSt-50 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNeSt, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet101_v1(**kwargs)[source]

ResNet-101 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
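The defaults '~/.mxnet/models' and '$MXNET_HOME/models' shown in this reference can describe the same place: MXNet's data directory is taken from the MXNET_HOME environment variable when it is set and is typically ~/.mxnet otherwise. The hypothetical helper below sketches how a parameter file path might be resolved from root and a weight hash (the string form of pretrained). The override rule and the '<name>-<hash>.params' file naming are assumptions about GluonCV's model store, and 'abcd1234' is a made-up hash.

```python
import os


def resolve_params_path(name, short_hash, root=os.path.join("~", ".mxnet", "models"), env=None):
    """Hypothetical helper: where a pretrained parameter file would live.

    Assumes $MXNET_HOME, when set, replaces the default root, and that files
    are named '<model name>-<short hash>.params'.
    """
    env = os.environ if env is None else env
    if "MXNET_HOME" in env:
        root = os.path.join(env["MXNET_HOME"], "models")
    root = os.path.expanduser(root)  # expand '~' to the user's home directory
    return os.path.join(root, f"{name}-{short_hash}.params")


print(resolve_params_path("resnet101_v1", "abcd1234", env={"MXNET_HOME": "/data/mxnet"}))
# /data/mxnet/models/resnet101_v1-abcd1234.params  (on POSIX systems)
```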

gluoncv.model_zoo.resnet101_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of mini-batch statistics; optionally set to True when finetuning from ImageNet classification pretrained models.
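The effect of last_gamma can be seen in a toy residual block y = x + gamma · f(x), where gamma plays the role of the scale of the last BatchNorm in the branch. With gamma initialized to zero the block starts out as an identity mapping (ignoring BatchNorm's beta, which is also zero at initialization, and the ReLU that follows the addition). The sketch is illustrative only; residual_block and double are not GluonCV functions.

```python
def residual_block(x, branch, gamma):
    """y = x + gamma * branch(x), element-wise on a list of floats.

    `gamma` stands in for the scale of the last BatchNorm in the branch.
    """
    return [xi + gamma * bi for xi, bi in zip(x, branch(x))]


def double(v):
    """Toy residual branch f(x) = 2x."""
    return [2.0 * vi for vi in v]


x = [1.0, -3.0, 0.5]
print(residual_block(x, double, gamma=0.0))  # [1.0, -3.0, 0.5]  identity at initialization
print(residual_block(x, double, gamma=1.0))  # [3.0, -9.0, 1.5]
```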

gluoncv.model_zoo.resnet101_v1b_gn(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-101 GroupNorm model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of mini-batch statistics; optionally set to True when finetuning from ImageNet classification pretrained models.
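Unlike BatchNorm, Group Normalization computes its statistics per sample: the channels are split into groups and each group is normalized over its channels and spatial positions, so the result does not depend on batch size. The pure-Python sketch below normalizes a single sample stored as a list of channels. The learnable per-channel scale and shift are omitted, and the number of groups that resnet101_v1b_gn actually uses is not stated here; group_norm is an illustrative function.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """Group Normalization for one sample (scale and shift omitted).

    `x` is a list of channels, each a flat list of spatial values. Channels
    are split into num_groups contiguous groups, and each group is normalized
    with the mean and variance of all its channels and positions.
    """
    if len(x) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = len(x) // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * inv_std for v in channel] for channel in group])
    return out


sample = [[1.0, 3.0], [5.0, 7.0], [0.0, 0.0], [2.0, 2.0]]  # 4 channels, 2 positions each
print(group_norm(sample, num_groups=2))
```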

gluoncv.model_zoo.resnet101_v1b_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet101 model trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.
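Partial BN, as described for partial_bn, keeps only the first BatchNorm layer trainable and freezes the rest. The sketch below only computes that plan from a list of layer names. Identifying BatchNorm layers by the substring 'batchnorm' is an assumption, partial_bn_plan is a hypothetical helper, and actually freezing the parameters (for example by setting a Gluon parameter's grad_req to 'null') is left out.

```python
def partial_bn_plan(layer_names):
    """Map each BatchNorm layer name to True (frozen) or False (trainable)."""
    plan = {}
    seen_first = False
    for name in layer_names:
        if "batchnorm" not in name:
            continue  # only BatchNorm layers are affected by partial BN
        plan[name] = seen_first  # the first BN stays trainable, later ones are frozen
        seen_first = True
    return plan


layers = ["conv0", "batchnorm0", "conv1", "batchnorm1", "conv2", "batchnorm2", "dense0"]
print(partial_bn_plan(layers))
# {'batchnorm0': False, 'batchnorm1': True, 'batchnorm2': True}
```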

gluoncv.model_zoo.resnet101_v1b_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet101 model trained on Something-Something-V2 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet101_v1c(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1c-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet101_v1d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1d-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet101_v1e(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1e-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet101_v1s(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1s-101 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet101_v2(**kwargs)[source]

ResNet-101 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet152_v1(**kwargs)[source]

ResNet-152 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet152_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a ResNetV1b-152 model.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply dilation strategy to ResNetV1b, yielding a stride 8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of mini-batch statistics; optionally set to True when finetuning from ImageNet classification pretrained models.
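The difference use_global_stats makes can be shown for a single channel: with the flag set, the stored running mean and variance are used even during training, whereas otherwise the statistics of the current mini-batch are used. One common motivation is that fixed statistics avoid noisy estimates from small finetuning batches. The sketch omits BatchNorm's learnable scale and shift, and batch_norm here is an illustrative function, not MXNet's operator.

```python
import math


def batch_norm(batch, running_mean, running_var, use_global_stats, eps=1e-5):
    """Normalize one channel's values across a batch (scale/shift omitted).

    With use_global_stats=True the stored running statistics are used;
    otherwise the mean and variance of the current mini-batch are used.
    """
    if use_global_stats:
        mean, var = running_mean, running_var
    else:
        mean = sum(batch) / len(batch)
        var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [(v - mean) / math.sqrt(var + eps) for v in batch]


batch = [2.0, 4.0]
print(batch_norm(batch, running_mean=3.0, running_var=4.0, use_global_stats=True))
print(batch_norm(batch, running_mean=3.0, running_var=4.0, use_global_stats=False))
```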

gluoncv.model_zoo.resnet152_v1b_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

ResNet152 model trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load the pretrained base network; the extra layers are randomly initialized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet152_v1b_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

ResNet-152 model trained on the Something-Something-V2 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load a pretrained base network; the extra layers are randomly initialized. This has no effect if pretrained is True.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3, or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

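For num_crop, a common convention is that 10 crops are the four corners plus the center, each also taken horizontally flipped, and 3 crops are spread along the longer side of the frame. Whether GluonCV's evaluation transforms match this exactly is an assumption. The sketch below only computes crop boxes as (x0, y0, x1, y1); both function names are hypothetical.

```python
def ten_crop_boxes(width: int, height: int, size: int):
    """Return (x0, y0, x1, y1, flipped) for the usual 10-crop scheme (assumed)."""
    if size > width or size > height:
        raise ValueError("crop size must not exceed the image size")
    cx, cy = (width - size) // 2, (height - size) // 2
    origins = [(0, 0), (width - size, 0), (0, height - size),
               (width - size, height - size), (cx, cy)]
    boxes = [(x, y, x + size, y + size) for x, y in origins]
    # Five unflipped crops first, then the same five marked as horizontally flipped.
    return [b + (False,) for b in boxes] + [b + (True,) for b in boxes]


def three_crop_boxes(width: int, height: int, size: int):
    """Return three (x0, y0, x1, y1) crops along the longer side (assumed)."""
    if size > width or size > height:
        raise ValueError("crop size must not exceed the image size")
    if width >= height:
        cy = (height - size) // 2
        return [(x, cy, x + size, cy + size) for x in (0, (width - size) // 2, width - size)]
    cx = (width - size) // 2
    return [(cx, y, cx + size, y + size) for y in (0, (height - size) // 2, height - size)]


print(len(ten_crop_boxes(340, 256, 224)))  # 10
print(three_crop_boxes(340, 256, 224))
```

Each extra crop is another forward pass per segment, so evaluation cost grows roughly linearly with num_crop.
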
gluoncv.model_zoo.resnet152_v1c(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

Constructs a ResNetV1c-152 model.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

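ResNetV1c is understood to differ from ResNetV1b in its stem, replacing the single 7x7 conv1 with three 3x3 convolutions (the deep_stem option described for ResNeSt). The arithmetic below compares stem weight counts for a 3-channel input. It assumes the deep stem runs 3 -> w -> w -> 2w channels with a default stem width w of 32; those widths come from our reading of GluonCV, not from this page.

```python
def conv_weights(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k convolution (no bias, since BatchNorm follows)."""
    return k * k * c_in * c_out


def stem_weights(deep_stem: bool, stem_width: int = 32) -> int:
    """Count stem convolution weights for an RGB input.

    deep_stem=False: one 7x7 conv, 3 -> 64 channels.
    deep_stem=True:  three 3x3 convs, 3 -> w -> w -> 2w (w = stem_width, assumed).
    """
    if not deep_stem:
        return conv_weights(7, 3, 64)
    w = stem_width
    return conv_weights(3, 3, w) + conv_weights(3, w, w) + conv_weights(3, w, 2 * w)


print(stem_weights(False), stem_weights(True), stem_weights(True, stem_width=64))
# 9408 28512 112320
```

The deep stem costs more parameters but stacks three small kernels, each followed by a nonlinearity.
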
gluoncv.model_zoo.resnet152_v1d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

Constructs a ResNetV1d-152 model.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

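ResNetV1d is understood to additionally enable average-pooling downsampling in the projection shortcut (the avg_down option described for ResNeSt); this mapping is an assumption about GluonCV. A 1x1 convolution with stride 2 reads only one of every four input positions, whereas a 2x2 average pool with stride 2 followed by a stride-1 1x1 convolution lets every position contribute. The toy count below shows the difference on a square map.

```python
def covered_positions(size: int, avg_down: bool) -> int:
    """Count input positions of a size x size map that reach the downsampled shortcut."""
    covered = set()
    for oy in range(size // 2):
        for ox in range(size // 2):
            if avg_down:
                # Assumed 2x2 average pool, stride 2: each output averages a 2x2 window.
                window = [(2 * oy + dy, 2 * ox + dx) for dy in (0, 1) for dx in (0, 1)]
            else:
                # 1x1 conv, stride 2: each output reads one input position.
                window = [(2 * oy, 2 * ox)]
            covered.update(window)
    return len(covered)


print(covered_positions(8, avg_down=False), "of", 8 * 8)  # 16 of 64
print(covered_positions(8, avg_down=True), "of", 8 * 8)   # 64 of 64
```
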
gluoncv.model_zoo.resnet152_v1e(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

Constructs a ResNetV1e-152 model.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet152_v1s(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

Constructs a ResNetV1s-152 model.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet152_v2(**kwargs)

ResNet-152 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet18_v1(**kwargs)

ResNet-18 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

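When pretrained is a string, it selects a specific version of the weights by hashtag. As an assumption about GluonCV's model store, not stated on this page, the hashtag is a short prefix of the SHA-1 digest of the parameter file, and files are named like <model>-<hashtag>.params. The sketch below checks a local file against such a tag with only the standard library; verify_hashtag and the demo file name are hypothetical.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_hashtag(path: str, hashtag: str) -> bool:
    """Hypothetical check: does the file's SHA-1 digest start with the given hashtag?"""
    return sha1_of_file(path).startswith(hashtag.lower())


# Demo: a throwaway file stands in for a downloaded .params file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "resnet18_v1-demo.params")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"not real weights")
    tag = sha1_of_file(path)[:8]
    print(tag, verify_hashtag(path, tag))
```
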
gluoncv.model_zoo.resnet18_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

Constructs a ResNetV1b-18 model.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of mini-batch statistics; optionally set to True when fine-tuning from ImageNet-pretrained classification models.

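With dilated=False, a ResNet downsamples its input by a total factor of 32: the stem (conv1 plus max-pooling) contributes 4, and the four residual stages use strides 1, 2, 2, 2. The dilation strategy keeps the last two stages at stride 1 and compensates with dilation, giving the stride-8 model mentioned above. The sketch below computes the conv5 feature-map size. The stage strides and dilation rates 2 and 4 are the usual dilated-ResNet convention and are assumed here.

```python
def stage_schedule(dilated: bool):
    """Return (stride, dilation) for stages conv2_x..conv5_x (assumed convention)."""
    if dilated:
        return [(1, 1), (2, 1), (1, 2), (1, 4)]
    return [(1, 1), (2, 1), (2, 1), (2, 1)]


def output_stride(dilated: bool, stem_stride: int = 4) -> int:
    """Total downsampling: stem stride times the product of stage strides."""
    total = stem_stride
    for stride, _dilation in stage_schedule(dilated):
        total *= stride
    return total


def feature_size(input_size: int, dilated: bool) -> int:
    """Spatial size of the conv5 map, assuming input_size divides evenly."""
    return input_size // output_stride(dilated)


print(output_stride(False), feature_size(224, False))  # 32 7
print(output_stride(True), feature_size(224, True))    # 8 28
```

The 16x larger feature map is why the dilated variants are typically used for dense prediction such as semantic segmentation.
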
gluoncv.model_zoo.resnet18_v1b_custom(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, use_kinetics_pretrain=True, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

ResNet18 model customized for any dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load a pretrained base network; the extra layers are randomly initialized. This has no effect if pretrained is True.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3, or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • use_kinetics_pretrain (bool, default True) – Whether to initialize the model with weights pretrained on the Kinetics400 dataset.

gluoncv.model_zoo.resnet18_v1b_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

ResNet-18 model trained on the Kinetics400 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load a pretrained base network; the extra layers are randomly initialized. This has no effect if pretrained is True.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3, or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

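With partial_bn=True, only the first batch normalization layer keeps updating and every later one is frozen, a trick popularized by Temporal Segment Networks for fine-tuning with small batches. The sketch below only decides which layers to freeze, given an ordered list of layer names. The naming rule is_batchnorm and the layer names are hypothetical; with a real model, freezing would be applied to the Gluon blocks themselves.

```python
from typing import Callable, List, Sequence


def is_batchnorm(name: str) -> bool:
    """Hypothetical naming rule: a layer whose name contains 'batchnorm' is BN."""
    return "batchnorm" in name


def frozen_bn_layers(layer_names: Sequence[str], partial_bn: bool,
                     is_bn: Callable[[str], bool] = is_batchnorm) -> List[str]:
    """Return the BatchNorm layers to freeze: all but the first when partial_bn is set."""
    if not partial_bn:
        return []
    bn_layers = [name for name in layer_names if is_bn(name)]
    return bn_layers[1:]  # the first BatchNorm layer stays trainable


names = ["conv0", "batchnorm0", "conv1", "batchnorm1", "conv2", "batchnorm2", "dense0"]
print(frozen_bn_layers(names, partial_bn=True))   # ['batchnorm1', 'batchnorm2']
print(frozen_bn_layers(names, partial_bn=False))  # []
```
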
gluoncv.model_zoo.resnet18_v1b_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

ResNet-18 model trained on the Something-Something-V2 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load a pretrained base network; the extra layers are randomly initialized. This has no effect if pretrained is True.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3, or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet18_v2(**kwargs)

ResNet-18 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet34_v1(**kwargs)

ResNet-34 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet34_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

Constructs a ResNetV1b-34 model.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of mini-batch statistics; optionally set to True when fine-tuning from ImageNet-pretrained classification models.

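The use_global_stats flag switches BatchNorm from normalizing with the current mini-batch's mean and variance to normalizing with the stored running (global) estimates, which also stops those estimates from changing. The NumPy sketch below shows both modes for a (batch, features) input. The momentum of 0.9 and eps of 1e-5 are common defaults assumed here, not values taken from this page.

```python
import numpy as np


def batch_norm(x, gamma, beta, running_mean, running_var,
               use_global_stats=False, momentum=0.9, eps=1e-5):
    """Normalize x of shape (N, C); return (output, new_running_mean, new_running_var)."""
    if use_global_stats:
        # Normalize with the stored statistics and leave them unchanged.
        mean, var = running_mean, running_var
        new_mean, new_var = running_mean, running_var
    else:
        # Normalize with this mini-batch's statistics and update the running estimates.
        mean, var = x.mean(axis=0), x.var(axis=0)
        new_mean = momentum * running_mean + (1 - momentum) * mean
        new_var = momentum * running_var + (1 - momentum) * var
    out = gamma * (x - mean) / np.sqrt(var + eps) + beta
    return out, new_mean, new_var


x = np.array([[1.0, 2.0], [3.0, 6.0]])
ones, zeros = np.ones(2), np.zeros(2)
batch_out, _, _ = batch_norm(x, ones, zeros, zeros, ones)
global_out, _, _ = batch_norm(x, ones, zeros, zeros, ones, use_global_stats=True)
print(batch_out)
print(global_out)
```

When fine-tuning with small batches, the mini-batch estimates are noisy, which is why relying on the pretrained global statistics can help.
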
gluoncv.model_zoo.resnet34_v1b_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

ResNet-34 model trained on the Kinetics400 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load a pretrained base network; the extra layers are randomly initialized. This has no effect if pretrained is True.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3, or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet34_v1b_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

ResNet-34 model trained on the Something-Something-V2 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load a pretrained base network; the extra layers are randomly initialized. This has no effect if pretrained is True.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3, or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet34_v2(**kwargs)

ResNet-34 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v1(**kwargs)

ResNet-50 V1 model from “Deep Residual Learning for Image Recognition” paper.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v1b(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

Constructs a ResNetV1b-50 model.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of mini-batch statistics; optionally set to True when fine-tuning from ImageNet-pretrained classification models.

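Setting last_gamma=True zero-initializes the scale (gamma) of the last BatchNorm in each bottleneck. With beta also starting at zero, the residual branch contributes nothing at first, so each V1-style block computes relu(x + 0), which equals x when x is a non-negative output of a previous ReLU. The NumPy sketch below uses a single matrix product as a hypothetical stand-in for the bottleneck convolutions.

```python
import numpy as np

rng = np.random.default_rng(0)


def residual_block(x, w, gamma, beta, eps=1e-5):
    """Toy V1-style block: relu(x + BN(x @ w)), BatchNorm over the batch axis.

    x @ w is a hypothetical stand-in for the real bottleneck convolutions.
    """
    h = x @ w
    h_hat = (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)
    return np.maximum(x + gamma * h_hat + beta, 0.0)


x = np.abs(rng.standard_normal((4, 8)))  # non-negative, like a previous ReLU output
w = rng.standard_normal((8, 8))
zero_gamma = residual_block(x, w, gamma=np.zeros(8), beta=np.zeros(8))
one_gamma = residual_block(x, w, gamma=np.ones(8), beta=np.zeros(8))
print(np.allclose(zero_gamma, x), np.allclose(one_gamma, x))  # True False
```

Starting each block as an identity mapping is meant to ease the early phase of training very deep networks.
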
gluoncv.model_zoo.resnet50_v1b_custom(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), use_kinetics_pretrain=True, **kwargs)

ResNet50 model customized for any dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load a pretrained base network; the extra layers are randomly initialized. This has no effect if pretrained is True.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3, or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

  • use_kinetics_pretrain (bool, default True) – Whether to initialize the model with weights pretrained on the Kinetics400 dataset.

gluoncv.model_zoo.resnet50_v1b_gn(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

Constructs a ResNetV1b-50 GroupNorm model.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.

  • last_gamma (bool, default False) – Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.

  • use_global_stats (bool, default False) – Whether to force BatchNorm to use global statistics instead of mini-batch statistics; optionally set to True when fine-tuning from ImageNet-pretrained classification models.

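The GroupNorm variant normalizes each sample over groups of channels rather than over the batch, so its output does not depend on batch size or on the other samples in the batch. The NumPy sketch below implements Group Normalization for NCHW input. num_groups=32 is the default from the Group Normalization paper and is an assumption about this particular model.

```python
import numpy as np


def group_norm(x, gamma, beta, num_groups=32, eps=1e-5):
    """Group Normalization for x of shape (N, C, H, W); gamma and beta have shape (C,)."""
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    # Statistics are per sample and per group, never across the batch.
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    x_hat = ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)
    return x_hat * gamma.reshape(1, c, 1, 1) + beta.reshape(1, c, 1, 1)


x = np.random.default_rng(1).standard_normal((2, 64, 4, 4))
y = group_norm(x, np.ones(64), np.zeros(64))
print(y.shape)  # (2, 64, 4, 4)
```
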
gluoncv.model_zoo.resnet50_v1b_hmdb51(nclass=51, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

ResNet-50 model trained on the HMDB51 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load a pretrained base network; the extra layers are randomly initialized. This has no effect if pretrained is True.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3, or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet50_v1b_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

ResNet-50 model trained on the Kinetics400 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load a pretrained base network; the extra layers are randomly initialized. This has no effect if pretrained is True.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3, or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet50_v1b_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

ResNet-50 model trained on the Something-Something-V2 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load a pretrained base network; the extra layers are randomly initialized. This has no effect if pretrained is True.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3, or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

gluoncv.model_zoo.resnet50_v1b_ucf101(nclass=101, pretrained=False, pretrained_base=True, use_tsn=False, partial_bn=False, num_segments=1, num_crop=1, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

ResNet-50 model trained on the UCF101 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • pretrained_base (bool or str, optional, default True) – Whether to load a pretrained base network; the extra layers are randomly initialized. This has no effect if pretrained is True.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default 1) – Number of crops used during evaluation; choices are 1, 3, or 10.

  • partial_bn (bool, default False) – Freeze all batch normalization layers during training except the first layer.

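The action-recognition constructors above differ mainly in their dataset suffix and the default nclass in their signatures: 400 for Kinetics400, 174 for Something-Something-V2, 51 for HMDB51, and 101 for UCF101. A small lookup built only from those signatures can serve as a sanity check that a classifier head matches its dataset. The helper default_nclass is hypothetical and is not part of GluonCV.

```python
# Default nclass values, copied from the constructor signatures in this section.
DEFAULT_NCLASS = {
    "kinetics400": 400,
    "sthsthv2": 174,
    "hmdb51": 51,
    "ucf101": 101,
}


def default_nclass(model_name: str) -> int:
    """Return the default class count implied by a model name's dataset suffix."""
    suffix = model_name.rsplit("_", 1)[-1]
    try:
        return DEFAULT_NCLASS[suffix]
    except KeyError:
        raise ValueError(f"no known dataset suffix in {model_name!r}") from None


print(default_nclass("resnet50_v1b_hmdb51"))  # 51
```

Names without a dataset suffix, such as the _custom constructors, raise ValueError, because their nclass is meant to be set for the target dataset.
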
gluoncv.model_zoo.resnet50_v1c(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

Constructs a ResNetV1c-50 model.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v1d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

Constructs a ResNetV1d-50 model.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v1e(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

Constructs a ResNetV1e-50 model.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v1s(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

Constructs a ResNetV1s-50 model.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • dilated (bool, default False) – Whether to apply the dilation strategy to ResNetV1b, yielding a stride-8 model.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnet50_v2(**kwargs)

ResNet-50 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters
  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnext101_32x4d(**kwargs)

ResNeXt-101 32x4d model from “Aggregated Residual Transformations for Deep Neural Networks” paper.

Parameters
  • cardinality (int) – Number of groups.

  • bottleneck_width (int) – Width of the bottleneck block.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

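In a name such as 32x4d, 32 is the cardinality (number of groups) and 4 is the bottleneck_width (channels per group in the first stage). The sketch below derives, for each of the four stages, the width of the 3x3 grouped convolution and the block output width. It follows the ResNeXt paper's template, in which widths double at every stage and blocks expand by 4; the integer rounding GluonCV uses is assumed rather than taken from this page.

```python
def resnext_widths(cardinality: int, bottleneck_width: int):
    """Return (grouped_conv_width, block_output_width) for the four ResNeXt stages."""
    widths = []
    for stage in range(4):
        channels = 64 * 2 ** stage                  # 64, 128, 256, 512
        d = channels * bottleneck_width // 64       # channels per group at this stage
        group_width = cardinality * d               # width of the 3x3 grouped conv
        widths.append((group_width, channels * 4))  # bottleneck expansion factor of 4
    return widths


print(resnext_widths(32, 4))  # [(128, 256), (256, 512), (512, 1024), (1024, 2048)]
```

Doubling the cardinality (64x4d) doubles the grouped-conv width at every stage while the block outputs stay the same.
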
gluoncv.model_zoo.resnext101_64x4d(**kwargs)

ResNeXt-101 64x4d model from “Aggregated Residual Transformations for Deep Neural Networks” paper.

Parameters
  • cardinality (int) – Number of groups.

  • bottleneck_width (int) – Width of the bottleneck block.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnext101e_64x4d(**kwargs)

ResNeXt-101e 64x4d model modified from “Aggregated Residual Transformations for Deep Neural Networks” paper.

Parameters
  • cardinality (int) – Number of groups.

  • bottleneck_width (int) – Width of the bottleneck block.

  • pretrained (bool or str) – A Boolean value controls whether to load the default pretrained weights for the model. A string value represents the hashtag of a specific version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.resnext50_32x4d(**kwargs)

ResNext50 32x4d model from “Aggregated Residual Transformations for Deep Neural Networks” paper.

Parameters
  • cardinality (int) – Number of groups

  • bottleneck_width (int) – Width of bottleneck block

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
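The suffix of the ResNeXt model names (32x4d, 64x4d) encodes the cardinality (number of groups) and the bottleneck_width described above. The sketch below shows how the two values translate into layer widths, assuming the stage template of the ResNeXt paper (base channels 64, 128, 256, 512 per stage, bottleneck expansion 4). Both helper names are hypothetical and not part of GluonCV; the exact internals of the GluonCV blocks may differ.

```python
def parse_resnext_name(name):
    """Split a name like 'resnext50_32x4d' into (depth, cardinality, bottleneck_width)."""
    prefix, config = name.rsplit("_", 1)                   # 'resnext50', '32x4d'
    depth = int("".join(ch for ch in prefix if ch.isdigit()))
    cardinality, width = config.split("x")                 # '32', '4d'
    return depth, int(cardinality), int(width.rstrip("d"))


def resnext_stage_widths(cardinality, bottleneck_width, num_stages=4):
    """(grouped 3x3 conv width, block output channels) for each ResNeXt stage."""
    widths = []
    for stage in range(num_stages):
        base_channels = 64 * 2 ** stage                     # 64, 128, 256, 512
        per_group = base_channels * bottleneck_width // 64  # per-group width doubles per stage
        group_width = cardinality * per_group               # channels of the grouped 3x3 conv
        out_channels = base_channels * 4                    # bottleneck expansion factor of 4
        widths.append((group_width, out_channels))
    return widths


print(parse_resnext_name("resnext101_64x4d"))  # (101, 64, 4)
print(resnext_stage_widths(32, 4))             # [(128, 256), (256, 512), (512, 1024), (1024, 2048)]
```

For 32x4d this reproduces the 128-channel grouped convolution (32 groups of 4 channels) in the first stage of the paper's ResNeXt-50 table.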

gluoncv.model_zoo.se_resnet101_v1(**kwargs)

SE-ResNet-101 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.se_resnet101_v2(**kwargs)

SE-ResNet-101 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.se_resnet152_v1(**kwargs)

SE-ResNet-152 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.se_resnet152_v2(**kwargs)

SE-ResNet-152 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.se_resnet18_v1(**kwargs)

SE-ResNet-18 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.se_resnet18_v2(**kwargs)

SE-ResNet-18 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.se_resnet34_v1(**kwargs)

SE-ResNet-34 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.se_resnet34_v2(**kwargs)

SE-ResNet-34 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.se_resnet50_v1(**kwargs)

SE-ResNet-50 V1 model from “Squeeze-and-Excitation Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.se_resnet50_v2(**kwargs)

SE-ResNet-50 V2 model from “Squeeze-and-Excitation Networks” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.
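The SE-ResNet variants add a Squeeze-and-Excitation block to each residual unit. A minimal pure-Python sketch of that block, assuming a feature map stored as a list of channels, two bias-free fully connected layers, and a reduction ratio r chosen by the caller (the paper's default is r = 16). It illustrates the technique only and is not GluonCV's implementation.

```python
import math


def se_block(x, w1, w2):
    """Squeeze-and-Excitation applied to one feature map.

    x  : list of C channels, each a flat list of H*W activations
    w1 : (C // r) x C weight matrix of the reduction FC layer
    w2 : C x (C // r) weight matrix of the expansion FC layer
    Biases are omitted for brevity.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every activation of a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, x)]


# With all-zero weights every gate is sigmoid(0) = 0.5, so the input is halved.
feature_map = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [-1.0, 1.0]]  # C = 4, H*W = 2
print(se_block(feature_map, [[0.0] * 4] * 2, [[0.0] * 2] * 4))   # r = 2 here
```

In a residual network the rescaled output replaces the residual branch before the identity shortcut is added.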

gluoncv.model_zoo.se_resnext101_32x4d(**kwargs)

SE-ResNext101 32x4d model from “Aggregated Residual Transformations for Deep Neural Networks” paper.

Parameters
  • cardinality (int) – Number of groups

  • bottleneck_width (int) – Width of bottleneck block

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.se_resnext101_64x4d(**kwargs)

SE-ResNext101 64x4d model from “Aggregated Residual Transformations for Deep Neural Networks” paper.

Parameters
  • cardinality (int) – Number of groups

  • bottleneck_width (int) – Width of bottleneck block

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.se_resnext101e_64x4d(**kwargs)

SE-ResNext101e 64x4d model modified from “Aggregated Residual Transformations for Deep Neural Networks” paper.

Parameters
  • cardinality (int) – Number of groups

  • bottleneck_width (int) – Width of bottleneck block

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.se_resnext50_32x4d(**kwargs)

SE-ResNext50 32x4d model from “Aggregated Residual Transformations for Deep Neural Networks” paper.

Parameters
  • cardinality (int) – Number of groups

  • bottleneck_width (int) – Width of bottleneck block

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

gluoncv.model_zoo.siamrpn_alexnet_v2_otb15(**kwargs)

AlexNet backbone model for object tracking from the “High Performance Visual Tracking with Siamese Region Proposal Network” paper (http://openaccess.thecvf.com/content_cvpr_2018/papers/Li_High_Performance_Visual_CVPR_2018_paper.pdf).
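Siamese trackers such as SiamRPN locate the target by correlating features of the template (the exemplar from the first frame) with features of the current search region; the strongest response marks the most likely target position. The sketch below shows only that matching step on a single-channel map, assuming plain nested lists in place of feature tensors. SiamRPN's region proposal branches (per-anchor classification and box regression) are not modelled, and the helper names are hypothetical.

```python
def cross_correlate(search, template):
    """Valid 2D cross-correlation of a single-channel template over a search map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            # Dot product between the template and the search window at (i, j).
            row.append(sum(template[a][b] * search[i + a][j + b]
                           for a in range(th) for b in range(tw)))
        response.append(row)
    return response


def peak(response):
    """(row, col) of the highest response, i.e. the most likely target position."""
    cells = ((i, j) for i, row in enumerate(response) for j in range(len(row)))
    return max(cells, key=lambda ij: response[ij[0]][ij[1]])


search_map = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
resp = cross_correlate(search_map, [[1, 0], [0, 1]])
print(resp, peak(resp))  # [[6, 8], [12, 14]] (1, 1)
```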

gluoncv.model_zoo.simple_pose_resnet101_v1b(**kwargs)

ResNet-101 backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.simple_pose_resnet101_v1d(**kwargs)

ResNet-101-d backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.simple_pose_resnet152_v1b(**kwargs)

ResNet-152 backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.simple_pose_resnet152_v1d(**kwargs)

ResNet-152-d backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.simple_pose_resnet18_v1b(**kwargs)

ResNet-18 backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.simple_pose_resnet50_v1b(**kwargs)

ResNet-50 backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.simple_pose_resnet50_v1d(**kwargs)

ResNet-50-d backbone model from “Simple Baselines for Human Pose Estimation and Tracking” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.
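The Simple Baselines pose models predict one heatmap per joint, at a lower resolution than the input (the paper uses 64x48 heatmaps for 256x192 inputs, i.e. a stride of 4). A minimal sketch of the simplest decoding step, taking the arg-max of each heatmap and scaling it back to input coordinates, assuming heatmaps given as nested lists. The helper names are hypothetical, and practical decoders usually add sub-pixel refinement, which is omitted here.

```python
def heatmap_argmax(heatmaps):
    """Return (x, y, score) at the peak of each joint heatmap."""
    keypoints = []
    for hm in heatmaps:
        best = (0, 0, float("-inf"))
        for y, row in enumerate(hm):
            for x, value in enumerate(row):
                if value > best[2]:
                    best = (x, y, value)
        keypoints.append(best)
    return keypoints


def to_image_coords(keypoints, stride=4):
    """Map heatmap coordinates back to input-image coordinates."""
    return [(x * stride, y * stride, score) for x, y, score in keypoints]


joint_maps = [
    [[0.0, 0.1], [0.9, 0.2]],  # joint 0 peaks at x=0, y=1
    [[0.7, 0.3], [0.1, 0.0]],  # joint 1 peaks at x=0, y=0
]
kps = heatmap_argmax(joint_maps)
print(kps)                   # [(0, 1, 0.9), (0, 0, 0.7)]
print(to_image_coords(kps))  # [(0, 4, 0.9), (0, 0, 0.7)]
```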

gluoncv.model_zoo.slowfast_16x8_resnet101_50_50_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

SlowFast 16x8 networks (SlowFast) with ResNet101 backbone trained on Kinetics400 dataset, but the temporal head is initialized with ResNet50 structure (3, 4, 6, 3).

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
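The num_segments parameter evenly divides a video into segments. A minimal sketch of one common test-time convention (as in Temporal Segment Networks): split the video into equal segments and take the centre frame of each. This is an illustration under that assumption; the frame sampling actually performed by GluonCV's video data loaders may differ, and the helper name is hypothetical.

```python
def segment_center_indices(num_frames, num_segments):
    """Centre frame index of each of num_segments equal-length segments."""
    seg_len = num_frames / num_segments
    return [int(seg_len * k + seg_len / 2) for k in range(num_segments)]


print(segment_center_indices(300, 3))  # [50, 150, 250]
print(segment_center_indices(10, 4))   # [1, 3, 6, 8]
```

With the default num_segments=1, a single clip is taken around the middle of the video.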

gluoncv.model_zoo.slowfast_16x8_resnet101_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

SlowFast 16x8 networks (SlowFast) with ResNet101 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.slowfast_4x16_resnet101_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

SlowFast 4x16 networks (SlowFast) with ResNet101 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
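The num_crop parameter accepts 1, 3 or 10 crops during evaluation. The sketch below computes crop positions under commonly used conventions, which are assumptions here: one centre crop; three crops spread along the longer side; and ten crops made of the four corners plus the centre, each with and without a horizontal flip. GluonCV's own evaluation transforms may place crops differently, and the helper name is hypothetical.

```python
def crop_offsets(width, height, size, num_crop):
    """Top-left corners (x, y) and horizontal-flip flags for multi-crop evaluation."""
    cx, cy = (width - size) // 2, (height - size) // 2
    if num_crop == 1:
        return [(cx, cy, False)]
    if num_crop == 3:
        # Three crops along the longer side: start, centre, end.
        if width >= height:
            return [(0, cy, False), (cx, cy, False), (width - size, cy, False)]
        return [(cx, 0, False), (cx, cy, False), (cx, height - size, False)]
    if num_crop == 10:
        corners = [(0, 0), (width - size, 0), (0, height - size),
                   (width - size, height - size), (cx, cy)]
        # Five positions, each used once unflipped and once flipped.
        return [(x, y, flip) for flip in (False, True) for x, y in corners]
    raise ValueError("num_crop must be 1, 3 or 10")


print(crop_offsets(340, 256, 224, 1))  # [(58, 16, False)]
print(crop_offsets(340, 256, 224, 3))  # [(0, 16, False), (58, 16, False), (116, 16, False)]
```

The per-crop predictions are typically averaged to produce the final score.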

gluoncv.model_zoo.slowfast_4x16_resnet50_custom(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, use_kinetics_pretrain=True, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

SlowFast 4x16 networks (SlowFast) with ResNet50 backbone, customized for the user's own dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

  • use_kinetics_pretrain (bool.) – Whether to load Kinetics-400 pre-trained model weights.

gluoncv.model_zoo.slowfast_4x16_resnet50_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

SlowFast 4x16 networks (SlowFast) with ResNet50 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.slowfast_8x8_resnet101_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

SlowFast 8x8 networks (SlowFast) with ResNet101 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.

gluoncv.model_zoo.slowfast_8x8_resnet50_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, partial_bn=False, feat_ext=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)

SlowFast 8x8 networks (SlowFast) with ResNet50 backbone trained on Kinetics400 dataset.

Parameters
  • nclass (int.) – Number of categories in the dataset.

  • pretrained (bool or str.) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True.) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU.) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1.) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1.) – Number of crops used during evaluation, choices are 1, 3 or 10.

  • partial_bn (bool, default False.) – Freeze all batch normalization layers during training except the first layer.

  • feat_ext (bool.) – Whether to extract features before dense classification layer or do a complete forward pass.
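The 4x16, 8x8 and 16x8 prefixes follow the T x τ notation of the SlowFast paper: the Slow pathway sees T frames sampled with temporal stride τ, and the Fast pathway sees α times more frames. A minimal sketch of the resulting sampling, assuming the paper's default speed ratio α = 8; GluonCV's internal configuration for each model is not restated here and may differ. The helper name is hypothetical.

```python
def slowfast_sampling(config, alpha=8):
    """Frame counts and strides for a SlowFast 'TxTau' configuration string."""
    t, tau = (int(v) for v in config.split("x"))
    return {
        "slow_frames": t,           # frames seen by the Slow pathway
        "slow_stride": tau,         # temporal stride of the Slow pathway
        "fast_frames": alpha * t,   # the Fast pathway sees alpha times more frames
        "fast_stride": tau // alpha,
        "clip_span": t * tau,       # raw frames covered by one input clip
    }


name = "slowfast_4x16_resnet50_kinetics400"
print(slowfast_sampling(name.split("_")[1]))
# {'slow_frames': 4, 'slow_stride': 16, 'fast_frames': 32, 'fast_stride': 2, 'clip_span': 64}
```

Under this assumption, 4x16 and 8x8 both cover 64 raw frames, while 8x8 gives the Slow pathway twice as many frames, which is the usual accuracy-for-compute trade-off between the two settings.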

gluoncv.model_zoo.squeezenet1_0(**kwargs)

SqueezeNet 1.0 model from the “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.squeezenet1_1(**kwargs)

SqueezeNet 1.1 model from the official SqueezeNet repo. SqueezeNet 1.1 has 2.4x less computation and slightly fewer parameters than SqueezeNet 1.0, without sacrificing accuracy.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.
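SqueezeNet's small parameter count comes mainly from its Fire modules: a 1x1 "squeeze" convolution reduces the channel count before a mix of 1x1 and 3x3 "expand" convolutions. The sketch below counts parameters (weights plus biases) for the fire2 module of SqueezeNet 1.0 (96 input channels, 16 squeeze filters, 64 + 64 expand filters, as listed in the SqueezeNet paper) and compares it with a plain 3x3 convolution of the same input and output width. It is an arithmetic illustration only; the helper names are hypothetical.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k convolution."""
    return in_ch * out_ch * k * k + out_ch


def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    """Parameters of a Fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expand."""
    return (conv_params(in_ch, squeeze, 1)
            + conv_params(squeeze, expand1x1, 1)
            + conv_params(squeeze, expand3x3, 3))


fire2 = fire_params(96, 16, 64, 64)  # output has 64 + 64 = 128 channels
plain = conv_params(96, 128, 3)      # a plain 3x3 conv from 96 to 128 channels
print(fire2, plain)                  # 11920 110720
```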

gluoncv.model_zoo.ssd_300_mobilenet0_25_coco(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with mobilenet0.25 base network for COCO.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock
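SSD predicts detections relative to default boxes (anchors) laid out on several feature maps. A minimal sketch of the scale and shape rule from the SSD paper: scales are spaced linearly between s_min = 0.2 and s_max = 0.9 across the m feature maps, and each aspect ratio a gives a box of width s * sqrt(a) and height s / sqrt(a), relative to the image size. The GluonCV ssd_300_* models configure their anchor sizes explicitly, so their actual values may differ from this formula; the extra square box the paper adds per location is omitted, and the helper names are hypothetical.

```python
import math


def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Scale of the default boxes on each of num_maps feature maps."""
    return [s_min + (s_max - s_min) * k / (num_maps - 1) for k in range(num_maps)]


def box_shapes(scale, aspect_ratios):
    """(width, height) relative to the image size for each aspect ratio."""
    return [(scale * math.sqrt(ar), scale / math.sqrt(ar)) for ar in aspect_ratios]


scales = default_box_scales(6)  # six prediction feature maps, e.g. as in SSD300
print([round(s, 2) for s in scales])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
print(box_shapes(0.5, [1.0, 4.0]))    # [(0.5, 0.5), (1.0, 0.25)]
```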

gluoncv.model_zoo.ssd_300_mobilenet0_25_custom(classes, pretrained_base=True, pretrained=False, transfer=None, **kwargs)

SSD architecture with mobilenet0.25 base network and 300x300 input for a custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from SSD networks trained on other datasets.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock

Example

>>> net = ssd_300_mobilenet0_25_custom(classes=['a', 'b', 'c'], pretrained_base=True)
>>> net = ssd_300_mobilenet0_25_custom(classes=['foo', 'bar'], transfer='voc')

gluoncv.model_zoo.ssd_300_mobilenet0_25_voc(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with mobilenet0.25 base network for Pascal VOC.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_300_resnet34_v1b_coco(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with a 34-layer ResNet v1b base network for COCO.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm) Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_300_resnet34_v1b_custom(classes, pretrained_base=True, pretrained=False, transfer=None, **kwargs)

SSD architecture with a 34-layer ResNet v1b base network for a custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from SSD networks trained on other datasets.

Returns

An SSD detection network.

Return type

HybridBlock

Example

>>> net = ssd_300_resnet34_v1b_custom(classes=['a', 'b', 'c'], pretrained_base=True)
>>> net = ssd_300_resnet34_v1b_custom(classes=['foo', 'bar'], transfer='coco')

gluoncv.model_zoo.ssd_300_resnet34_v1b_voc(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with ResNet v1b 34 layers.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock
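The *_custom constructors, such as ssd_300_resnet34_v1b_custom, size their prediction heads from len(classes). The sketch below computes the per-feature-map output channels under the usual SSD convention of one extra background score per default box plus four box offsets; ssd_head_channels is a hypothetical helper, and the background-class convention is assumed rather than read from GluonCV's source.

```python
def ssd_head_channels(classes, boxes_per_cell):
    """Hypothetical helper: output channels of one SSD feature map's predictors.

    Assumes one background score per default box, as in the original SSD.
    Returns (class_predictor_channels, box_predictor_channels).
    """
    classes = list(classes)
    if len(set(classes)) != len(classes):
        raise ValueError("class names must be unique")
    num_scores = len(classes) + 1  # foreground classes + background
    class_channels = boxes_per_cell * num_scores
    box_channels = boxes_per_cell * 4  # four offsets per default box
    return class_channels, box_channels


print(ssd_head_channels(["a", "b", "c"], boxes_per_cell=6))  # (24, 24)
print(ssd_head_channels(["foo", "bar"], boxes_per_cell=4))  # (12, 16)
```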

gluoncv.model_zoo.ssd_300_vgg16_atrous_coco(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with VGG16 atrous 300x300 base network for COCO.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

An SSD detection network.

Return type

HybridBlock
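Both SSD input sizes in this section tile several square feature maps with default (anchor) boxes. The sketch below counts them for the layouts published with the original SSD300 and SSD512 models; those feature-map sizes and boxes-per-cell values come from the original SSD configuration and are assumed here, not read from GluonCV, so GluonCV's networks may produce different totals.

```python
def count_default_boxes(feature_sizes, boxes_per_cell):
    """Total SSD default boxes over all square feature maps."""
    if len(feature_sizes) != len(boxes_per_cell):
        raise ValueError("need one boxes_per_cell entry per feature map")
    # Each cell of an s x s map carries k default boxes.
    return sum(s * s * k for s, k in zip(feature_sizes, boxes_per_cell))


# Assumed layouts from the original SSD models, not from GluonCV.
SSD300 = ([38, 19, 10, 5, 3, 1], [4, 6, 6, 6, 4, 4])
SSD512 = ([64, 32, 16, 8, 4, 2, 1], [4, 6, 6, 6, 6, 4, 4])

print(count_default_boxes(*SSD300))  # 8732
print(count_default_boxes(*SSD512))  # 24564
```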

gluoncv.model_zoo.ssd_300_vgg16_atrous_custom(classes, pretrained_base=True, pretrained=False, transfer=None, **kwargs)

SSD architecture with VGG16 atrous 300x300 base network for custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from SSD networks trained on other datasets.

Returns

An SSD detection network.

Return type

HybridBlock

Example

>>> net = ssd_300_vgg16_atrous_custom(classes=['a', 'b', 'c'], pretrained_base=True)
>>> net = ssd_300_vgg16_atrous_custom(classes=['foo', 'bar'], transfer='coco')

gluoncv.model_zoo.ssd_300_vgg16_atrous_voc(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with VGG16 atrous 300x300 base network for Pascal VOC.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_mobilenet1_0_coco(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with mobilenet1.0 base network for COCO.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_mobilenet1_0_custom(classes, pretrained_base=True, pretrained=False, transfer=None, **kwargs)

SSD architecture with mobilenet1.0 512 base network for custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from SSD networks trained on other datasets.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock

Example

>>> net = ssd_512_mobilenet1_0_custom(classes=['a', 'b', 'c'], pretrained_base=True)
>>> net = ssd_512_mobilenet1_0_custom(classes=['foo', 'bar'], transfer='voc')

gluoncv.model_zoo.ssd_512_mobilenet1_0_voc(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with mobilenet1.0 base network for Pascal VOC.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock
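The transfer='voc' examples for the custom constructors (for instance ssd_512_mobilenet1_0_custom) only state that weights from networks trained on another dataset are reused. One plausible reading is that per-class predictor weights can carry over for custom classes whose names also appear in the source dataset. The sketch below computes that overlap against the 20 Pascal VOC categories; reusable_classes is a hypothetical helper, and how GluonCV actually matches classes when transfer is set is an assumption.

```python
# The 20 Pascal VOC object categories, in their conventional order.
VOC_CLASSES = (
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
)


def reusable_classes(custom_classes, source_classes=VOC_CLASSES):
    """Hypothetical helper: map custom classes found in the source dataset to their index."""
    index = {name: i for i, name in enumerate(source_classes)}
    # Classes missing from the source have no pretrained weights to reuse.
    return {name: index[name] for name in custom_classes if name in index}


print(reusable_classes(["person", "dog", "forklift"]))  # {'person': 14, 'dog': 11}
```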

gluoncv.model_zoo.ssd_512_resnet101_v2_voc(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with ResNet v2 101 layers for Pascal VOC.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_resnet152_v2_voc(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with ResNet v2 152 layers for Pascal VOC.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_resnet18_v1_coco(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with ResNet v1 18 layers for COCO.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_resnet18_v1_custom(classes, pretrained_base=True, pretrained=False, transfer=None, **kwargs)

SSD architecture with ResNet18 v1 512 base network for custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from SSD networks trained on other datasets.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock

Example

>>> net = ssd_512_resnet18_v1_custom(classes=['a', 'b', 'c'], pretrained_base=True)
>>> net = ssd_512_resnet18_v1_custom(classes=['foo', 'bar'], transfer='voc')

gluoncv.model_zoo.ssd_512_resnet18_v1_voc(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with ResNet v1 18 layers for Pascal VOC.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_resnet50_v1_coco(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with ResNet v1 50 layers for COCO.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_resnet50_v1_custom(classes, pretrained_base=True, pretrained=False, transfer=None, **kwargs)

SSD architecture with ResNet50 v1 512 base network for custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from SSD networks trained on other datasets.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock

Example

>>> net = ssd_512_resnet50_v1_custom(classes=['a', 'b', 'c'], pretrained_base=True)
>>> net = ssd_512_resnet50_v1_custom(classes=['foo', 'bar'], transfer='voc')

gluoncv.model_zoo.ssd_512_resnet50_v1_voc(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with ResNet v1 50 layers for Pascal VOC.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_vgg16_atrous_coco(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with VGG16 atrous 512x512 base network for COCO.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

An SSD detection network.

Return type

HybridBlock

gluoncv.model_zoo.ssd_512_vgg16_atrous_custom(classes, pretrained_base=True, pretrained=False, transfer=None, **kwargs)

SSD architecture with VGG16 atrous 512x512 base network for custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

  • transfer (str or None) – If not None, will try to reuse pre-trained weights from SSD networks trained on other datasets.

Returns

An SSD detection network.

Return type

HybridBlock

Example

>>> net = ssd_512_vgg16_atrous_custom(classes=['a', 'b', 'c'], pretrained_base=True)
>>> net = ssd_512_vgg16_atrous_custom(classes=['foo', 'bar'], transfer='coco')

gluoncv.model_zoo.ssd_512_vgg16_atrous_voc(pretrained=False, pretrained_base=True, **kwargs)

SSD architecture with VGG16 atrous 512x512 base network for Pascal VOC.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized.

Returns

An SSD detection network.

Return type

HybridBlock
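The SSD constructors in this section share one naming pattern, ssd_<input size>_<base network>_<dataset>, and the same strings are the names accepted by gluoncv.model_zoo.get_model. The sketch below splits such a name into its parts; parse_ssd_name is a hypothetical helper whose pattern is inferred only from the names listed here.

```python
import re

# Pattern inferred from the SSD constructor names listed in this section.
SSD_NAME = re.compile(r"^ssd_(300|512)_(.+)_(coco|voc|custom)$")


def parse_ssd_name(name):
    """Hypothetical helper: split an SSD name into (input size, base network, dataset)."""
    match = SSD_NAME.match(name)
    if match is None:
        raise ValueError(f"not an SSD model-zoo name: {name!r}")
    size, base, dataset = match.groups()
    return int(size), base, dataset


print(parse_ssd_name("ssd_512_resnet50_v1_voc"))  # (512, 'resnet50_v1', 'voc')
print(parse_ssd_name("ssd_300_vgg16_atrous_custom"))  # (300, 'vgg16_atrous', 'custom')
```

For instance, get_model('ssd_512_resnet50_v1_voc', pretrained=True) should build the same network as calling ssd_512_resnet50_v1_voc(pretrained=True) directly.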

gluoncv.model_zoo.timeit(method)

A timing decorator for wrapping functions.
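timeit is documented only as a timing decorator. The sketch below shows the general technique with the standard library; it is a generic illustration rather than GluonCV's source, and the reporting format (a printed line in milliseconds) is an assumption.

```python
import functools
import time


def timeit(method):
    """Generic sketch: report the wall-clock duration of each call."""
    @functools.wraps(method)  # keep the wrapped function's name and docstring
    def timed(*args, **kwargs):
        start = time.perf_counter()
        try:
            return method(*args, **kwargs)
        finally:
            # Runs even if the call raises, so every call is timed.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            print(f"{method.__name__} took {elapsed_ms:.2f} ms")
    return timed


@timeit
def add(a, b):
    return a + b


result = add(2, 3)  # prints a line such as "add took 0.00 ms"; result is 5
```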

gluoncv.model_zoo.vgg11(**kwargs)

VGG-11 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.
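The numbers in VGG-11/13/16/19 count weight layers: 3x3 convolutions grouped into five stages, plus three fully connected layers. The table in the sketch below is the standard configuration from the VGG paper, assumed here rather than read from GluonCV; the _bn variants are commonly built by adding batch normalization after each convolution. vgg_stage_plan is a hypothetical helper.

```python
# Conv layers per stage for each VGG depth (standard VGG configurations).
VGG_LAYERS = {
    11: [1, 1, 2, 2, 2],
    13: [2, 2, 2, 2, 2],
    16: [2, 2, 3, 3, 3],
    19: [2, 2, 4, 4, 4],
}
# Output channels of every convolution in stages 1 to 5.
VGG_FILTERS = [64, 128, 256, 512, 512]


def vgg_stage_plan(depth):
    """Hypothetical helper: [(filters, num_convs), ...] for the five stages.

    Each stage ends with 2x2 max pooling; the three dense layers after the
    last stage bring the weight-layer count up to `depth`.
    """
    if depth not in VGG_LAYERS:
        raise ValueError(f"unsupported VGG depth: {depth}")
    return list(zip(VGG_FILTERS, VGG_LAYERS[depth]))


plan = vgg_stage_plan(16)
print(plan)  # [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]
print(sum(n for _, n in plan) + 3)  # 16
```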

gluoncv.model_zoo.vgg11_bn(**kwargs)

VGG-11 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.vgg13(**kwargs)

VGG-13 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.vgg13_bn(**kwargs)

VGG-13 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.vgg16(**kwargs)

VGG-16 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.
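The root default is written as '$MXNET_HOME/models' here, while the video-model signatures in this section show '~/.mxnet/models'. The two agree if MXNET_HOME falls back to ~/.mxnet when unset, which is the assumption in the sketch below (it reflects the usual Linux/macOS default; other platforms may differ). default_model_root is a hypothetical helper, not an MXNet or GluonCV function.

```python
import os


def default_model_root():
    """Hypothetical helper: resolve '$MXNET_HOME/models'.

    Assumption: when MXNET_HOME is unset, the base directory falls back
    to ~/.mxnet, giving ~/.mxnet/models.
    """
    base = os.environ.get("MXNET_HOME") or os.path.join("~", ".mxnet")
    return os.path.join(os.path.expanduser(base), "models")


print(default_model_root())
```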

gluoncv.model_zoo.vgg16_atrous_300(**kwargs)

Get the 16-layer VGG atrous feature extractor network for 300x300 inputs.

gluoncv.model_zoo.vgg16_atrous_512(**kwargs)

Get the 16-layer VGG atrous feature extractor network for 512x512 inputs.
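Atrous (dilated) convolution spreads a kernel's taps apart, so the receptive field grows without extra parameters or downsampling. In the original SSD paper, VGG16's fc6 becomes a 3x3 convolution with dilation 6; those values are assumed here, not read from vgg16_atrous_300 or vgg16_atrous_512. The sketch computes the effective kernel span and checks that padding equal to the dilation keeps a 19x19 feature map at 19x19.

```python
def effective_kernel(kernel, dilation):
    """Span covered by a dilated (atrous) kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution along one axis."""
    span = effective_kernel(kernel, dilation)
    return (size + 2 * padding - span) // stride + 1


# SSD-style atrous fc6: 3x3 kernel, dilation 6, padding 6 (assumed values).
print(effective_kernel(3, 6))  # 13
print(conv_output_size(19, 3, padding=6, dilation=6))  # 19
```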

gluoncv.model_zoo.vgg16_bn(**kwargs)

VGG-16 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.vgg16_hmdb51(nclass=51, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

VGG16 model trained on the HMDB51 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1) – Number of crops used during evaluation, choices are 1, 3 or 10.
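num_segments follows the Temporal Segment Network (TSN) idea: split a video into equal-length segments and take one snippet from each. The sketch below picks the middle frame of each segment, a common choice at evaluation time (training usually samples a random frame per segment instead). segment_frame_indices is a hypothetical helper, not GluonCV's sampler.

```python
def segment_frame_indices(num_frames, num_segments):
    """Hypothetical helper: middle frame of each of num_segments equal segments."""
    if num_frames < 1 or num_segments < 1:
        raise ValueError("num_frames and num_segments must be positive")
    seg_len = num_frames / num_segments
    # Segment i spans [i * seg_len, (i + 1) * seg_len); take its midpoint,
    # clamped so short videos still yield valid (possibly repeated) indices.
    return [min(int((i + 0.5) * seg_len), num_frames - 1)
            for i in range(num_segments)]


print(segment_frame_indices(90, 3))  # [15, 45, 75]
print(segment_frame_indices(10, 1))  # [5]
```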

gluoncv.model_zoo.vgg16_kinetics400(nclass=400, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

VGG16 model trained on the Kinetics400 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1) – Number of crops used during evaluation, choices are 1, 3 or 10.
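At evaluation time each video yields num_segments * num_crop clips, and a common way to combine their class scores is to average them (the TSN consensus). The sketch below does that in pure Python; video_scores is a hypothetical helper, and it assumes the clips of one video are stored consecutively, which may not match GluonCV's internal layout.

```python
def video_scores(clip_scores, num_segments, num_crop):
    """Hypothetical helper: average per-clip class scores into one vector per video.

    clip_scores holds one list of class scores per clip, with the
    num_segments * num_crop clips of each video stored consecutively.
    """
    per_video = num_segments * num_crop
    if per_video < 1 or len(clip_scores) % per_video:
        raise ValueError("clip count must be a multiple of num_segments * num_crop")
    videos = []
    for start in range(0, len(clip_scores), per_video):
        group = clip_scores[start:start + per_video]
        # Element-wise mean over the video's clips (the consensus step).
        videos.append([sum(column) / per_video for column in zip(*group)])
    return videos


scores = [[0.25, 0.75], [0.75, 0.25], [1.0, 0.0], [0.5, 0.5]]
print(video_scores(scores, num_segments=2, num_crop=1))  # [[0.5, 0.5], [0.75, 0.25]]
```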

gluoncv.model_zoo.vgg16_sthsthv2(nclass=174, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

VGG16 model trained on the Something-Something-V2 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1) – Number of crops used during evaluation, choices are 1, 3 or 10.

gluoncv.model_zoo.vgg16_ucf101(nclass=101, pretrained=False, pretrained_base=True, use_tsn=False, num_segments=1, num_crop=1, ctx=cpu(0), root='~/.mxnet/models', **kwargs)

VGG16 model trained on the UCF101 dataset.

Parameters
  • nclass (int) – Number of categories in the dataset.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • pretrained_base (bool or str, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default $MXNET_HOME/models) – Location for keeping the model parameters.

  • num_segments (int, default is 1) – Number of segments used to evenly divide a video.

  • num_crop (int, default is 1) – Number of crops used during evaluation, choices are 1, 3 or 10.
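num_crop selects how many spatial crops are evaluated per frame. The sketch below returns (x, y, size, flipped) crop boxes for each allowed value: a single center crop, three crops spread along the longer side, and the classic ten-crop scheme (four corners and the center, each also flipped horizontally). The three-crop layout is an assumption, and crop_boxes is a hypothetical helper rather than GluonCV's transform.

```python
def crop_boxes(width, height, size, num_crop=1):
    """Hypothetical helper: (x, y, size, flipped) boxes for 1-, 3- or 10-crop."""
    if size > min(width, height):
        raise ValueError("crop size exceeds the frame")
    cx, cy = (width - size) // 2, (height - size) // 2
    if num_crop == 1:
        # Single center crop.
        return [(cx, cy, size, False)]
    if num_crop == 3:
        # Assumed layout: three crops spread along the longer side.
        if width >= height:
            return [(x, cy, size, False) for x in (0, cx, width - size)]
        return [(cx, y, size, False) for y in (0, cy, height - size)]
    if num_crop == 10:
        # Four corners plus center, each also taken horizontally flipped.
        five = [(0, 0), (width - size, 0), (0, height - size),
                (width - size, height - size), (cx, cy)]
        return [(x, y, size, flip) for flip in (False, True) for x, y in five]
    raise ValueError("num_crop must be 1, 3 or 10")


print(crop_boxes(340, 256, 224, num_crop=1))  # [(58, 16, 224, False)]
print(len(crop_boxes(340, 256, 224, num_crop=10)))  # 10
```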

gluoncv.model_zoo.vgg19(**kwargs)

VGG-19 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.vgg19_bn(**kwargs)

VGG-19 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.

Parameters
  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • ctx (Context, default CPU) – The context in which to load the pretrained weights.

  • root (str, default '$MXNET_HOME/models') – Location for keeping the model parameters.

gluoncv.model_zoo.yolo3_darknet53_coco(pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

YOLO3 multi-scale with darknet53 base network on COCO dataset.

Parameters
  • pretrained_base (boolean) – Whether to fetch and load pretrained weights for the base network.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock
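"Multi-scale" refers to YOLOv3 predicting boxes on three feature maps of different resolution. Assuming the standard YOLOv3 layout from the YOLOv3 paper (strides 32, 16 and 8, three anchors per scale; not confirmed against GluonCV's source), the sketch below counts grid cells and candidate boxes for a square input. Both helpers are hypothetical.

```python
def yolo3_grid_sizes(input_size, strides=(32, 16, 8)):
    """Hypothetical helper: side length of each square output grid, coarsest first."""
    if any(input_size % s for s in strides):
        raise ValueError("input_size must be divisible by every stride")
    return [input_size // s for s in strides]


def yolo3_num_predictions(input_size, anchors_per_scale=3, strides=(32, 16, 8)):
    """Hypothetical helper: one candidate box per anchor per cell, over all scales."""
    return sum(g * g * anchors_per_scale
               for g in yolo3_grid_sizes(input_size, strides))


print(yolo3_grid_sizes(416))  # [13, 26, 52]
print(yolo3_num_predictions(416))  # 10647
```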

gluoncv.model_zoo.yolo3_darknet53_custom(classes, transfer=None, pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

YOLO3 multi-scale with darknet53 base network on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer – If not None, will try to reuse pre-trained weights from yolo networks trained on other datasets.

  • pretrained_base (boolean) – Whether to fetch and load pretrained weights for the base network.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock
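For the yolo3_*_custom constructors, len(classes) fixes the width of each detection head. Under the standard YOLOv3 head (three anchors per scale, each predicting four box terms, one objectness score and one score per class, with no separate background class), the channel count works out as in the sketch below; yolo3_head_channels is hypothetical and the head layout is assumed rather than taken from GluonCV.

```python
def yolo3_head_channels(classes, anchors_per_scale=3):
    """Hypothetical helper: output channels of one YOLOv3 head for custom classes."""
    classes = list(classes)
    if not classes or len(set(classes)) != len(classes):
        raise ValueError("classes must be a non-empty list of unique names")
    # Per anchor: 4 box terms + 1 objectness score + one score per class.
    return anchors_per_scale * (4 + 1 + len(classes))


print(yolo3_head_channels(["foo", "bar"]))  # 21
```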

gluoncv.model_zoo.yolo3_darknet53_voc(pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

YOLO3 multi-scale with darknet53 base network on VOC dataset.

Parameters
  • pretrained_base (boolean) – Whether to fetch and load pretrained weights for the base network.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.yolo3_mobilenet0_25_coco(pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

YOLO3 multi-scale with mobilenet0.25 base network on COCO dataset.

Parameters
  • pretrained_base (boolean) – Whether to fetch and load pretrained weights for the base network.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.yolo3_mobilenet0_25_custom(classes, transfer=None, pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

YOLO3 multi-scale with mobilenet0.25 base network on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer – If not None, will try to reuse pre-trained weights from yolo networks trained on other datasets.

  • pretrained_base (boolean) – Whether to fetch and load pretrained weights for the base network.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.yolo3_mobilenet0_25_voc(pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

YOLO3 multi-scale with mobilenet0.25 base network on VOC dataset.

Parameters
  • pretrained_base (boolean) – Whether to fetch and load pretrained weights for the base network.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.yolo3_mobilenet1_0_coco(pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

YOLO3 multi-scale with mobilenet1.0 base network on COCO dataset.

Parameters
  • pretrained_base (boolean) – Whether to fetch and load pretrained weights for the base network.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.yolo3_mobilenet1_0_custom(classes, transfer=None, pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

YOLO3 multi-scale with mobilenet1.0 base network on custom dataset.

Parameters
  • classes (iterable of str) – Names of custom foreground classes. len(classes) is the number of foreground classes.

  • transfer – If not None, will try to reuse pre-trained weights from yolo networks trained on other datasets.

  • pretrained_base (boolean) – Whether to fetch and load pretrained weights for the base network.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock

gluoncv.model_zoo.yolo3_mobilenet1_0_voc(pretrained_base=True, pretrained=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, norm_kwargs=None, **kwargs)

YOLO3 multi-scale with mobilenet1.0 base network on VOC dataset.

Parameters
  • pretrained_base (boolean) – Whether to fetch and load pretrained weights for the base network.

  • pretrained (bool or str) – Boolean value controls whether to load the default pretrained weights for model. String value represents the hashtag for a certain version of pretrained weights.

  • norm_layer (object) – Normalization layer used (default: mxnet.gluon.nn.BatchNorm). Can be mxnet.gluon.nn.BatchNorm or mxnet.gluon.contrib.nn.SyncBatchNorm.

  • norm_kwargs (dict) – Additional norm_layer arguments, for example num_devices=4 for mxnet.gluon.contrib.nn.SyncBatchNorm.

Returns

Fully hybrid yolo3 network.

Return type

mxnet.gluon.HybridBlock