GluonCV Model Zoo

GluonCV Model Zoo, similar to the upstream Gluon Model Zoo, provides pre-defined and pre-trained models to help bootstrap computer vision applications.

Model Zoo API

from gluoncv import model_zoo
# load a ResNet model trained on CIFAR10
cifar_resnet20 = model_zoo.get_model('cifar_resnet20_v1', pretrained=True)
# load a pre-trained ssd model
ssd0 = model_zoo.get_model('ssd_300_vgg16_atrous_voc', pretrained=True)
# load ssd model with pre-trained feature extractors
ssd1 = model_zoo.get_model('ssd_512_vgg16_atrous_voc', pretrained_base=True)
# load ssd model without initialization
ssd2 = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained_base=False)

We recommend using gluoncv.model_zoo.get_model() to load pre-defined models, because it checks the model name and lists the available choices.
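
The name check can be pictured with a small sketch; the registry dict and the behavior below are stand-ins for GluonCV's internal model list, not its actual implementation:

```python
# Minimal sketch of the name-checking pattern behind get_model().
# The registry below is hypothetical; GluonCV keeps its own internal table.
_models = {
    'cifar_resnet20_v1': lambda **kwargs: 'CIFARResNet20V1',
    'ssd_300_vgg16_atrous_voc': lambda **kwargs: 'SSD300VGG16',
}

def get_model(name, **kwargs):
    """Look up a model constructor by name, listing choices on a typo."""
    name = name.lower()
    if name not in _models:
        raise ValueError('Model %s not found. Available choices:\n\t%s' % (
            name, '\n\t'.join(sorted(_models))))
    return _models[name](**kwargs)
```

A misspelled name fails fast with the full list of valid choices instead of an obscure attribute error later on.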

However, you can still load models by instantiating them directly, for example:

from gluoncv import model_zoo
cifar_resnet20 = model_zoo.cifar_resnet20_v1(pretrained=True)

Hint

Detailed model_zoo APIs are available in the API reference: gluoncv.model_zoo.

Summary of Available Models

GluonCV is still under development, stay tuned for more models!

Image Classification

The following table lists pre-trained models on ImageNet. We will keep adding new models and training scripts to the table.

Besides those listed, more models trained on ImageNet are available in the upstream Gluon Model Zoo.

ImageNet

Hint

Training commands work with this script:

Download train_imagenet.py

The resnet_v1b family is a modified version of resnet_v1: specifically, we move the stride-2 convolution to the 3x3 layer of a downsampling bottleneck block. ResNet18 and ResNet34 have identical v1 and v1b network structures, since they use basic blocks rather than bottleneck blocks. This modification has been mentioned in recent literature, e.g. [8].
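
The difference can be summarized as the (kernel_size, stride) layout of the three convolutions in a downsampling bottleneck block; the sketch below is illustrative only, not GluonCV code:

```python
# Illustrative only: where the stride-2 convolution sits in a downsampling
# bottleneck block, for resnet_v1 versus resnet_v1b. Each tuple is
# (kernel_size, stride) for one of the block's three convolutions.
def bottleneck_convs(version, stride=2):
    if version == 'v1':
        # original ResNet-v1: the stride is applied at the first 1x1 layer
        return [(1, stride), (3, 1), (1, 1)]
    elif version == 'v1b':
        # v1b variant: the stride is moved to the 3x3 layer
        return [(1, 1), (3, stride), (1, 1)]
    raise ValueError('unknown version: %s' % version)
```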

Model Top-1 Top-5 Training Command Training Log
ResNet18_v1 [1] 70.93 89.92 shell script log
ResNet34_v1 [1] 74.37 91.87 shell script log
ResNet50_v1 [1] 76.47 93.13 shell script log
ResNet101_v1 [1] 78.34 94.01 shell script log
ResNet152_v1 [1] 79.00 94.38 shell script log
ResNet18_v1b [1] 70.94 89.83 shell script log
ResNet34_v1b [1] 74.65 92.08 shell script log
ResNet50_v1b [1] 77.07 93.55 shell script log
ResNet101_v1b [1] 78.81 94.39 shell script log
ResNet152_v1b [1] 79.44 94.61 shell script log
ResNet18_v2 [2] 71.00 89.92 shell script log
ResNet34_v2 [2] 74.40 92.08 shell script log
ResNet50_v2 [2] 77.11 93.43 shell script log
ResNet101_v2 [2] 78.53 94.17 shell script log
ResNet152_v2 [2] 79.21 94.31 shell script log
MobileNetV2_1.0 [7] 71.92 90.56 shell script log
MobileNetV2_0.75 [7] 69.61 88.95 shell script log
MobileNetV2_0.5 [7] 64.49 85.47 shell script log
MobileNetV2_0.25 [7] 50.74 74.56 shell script log

CIFAR10

The following table lists pre-trained models trained on CIFAR10.

Hint

Our pre-trained models reproduce results from “Mix-Up” [4] . Please check the reference paper for further information.
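
For reference, mix-up replaces each training pair with a convex combination of two examples and their labels, with the mixing weight drawn from a Beta distribution. A minimal sketch in plain Python (GluonCV's actual training pipeline operates on NDArray batches):

```python
import random

# Minimal sketch of the mix-up augmentation from [4]. Inputs are plain
# lists of floats and labels are one-hot lists; alpha controls the Beta
# distribution the mixing weight is drawn from.
def mixup(x1, y1, x2, y2, alpha=1.0):
    """Return a convex combination of two (input, label) pairs."""
    lam = random.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y
```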

Training commands in the table work with the following scripts:

Model Acc (Vanilla/Mix-Up [4] ) Training Command Training Log
CIFAR_ResNet20_v1 [1] 92.1 / 92.9 Vanilla / Mix-Up Vanilla / Mix-Up
CIFAR_ResNet56_v1 [1] 93.6 / 94.2 Vanilla / Mix-Up Vanilla / Mix-Up
CIFAR_ResNet110_v1 [1] 93.0 / 95.2 Vanilla / Mix-Up Vanilla / Mix-Up
CIFAR_ResNet20_v2 [2] 92.1 / 92.7 Vanilla / Mix-Up Vanilla / Mix-Up
CIFAR_ResNet56_v2 [2] 93.7 / 94.6 Vanilla / Mix-Up Vanilla / Mix-Up
CIFAR_ResNet110_v2 [2] 94.3 / 95.5 Vanilla / Mix-Up Vanilla / Mix-Up
CIFAR_WideResNet16_10 [3] 95.1 / 96.7 Vanilla / Mix-Up Vanilla / Mix-Up
CIFAR_WideResNet28_10 [3] 95.6 / 97.2 Vanilla / Mix-Up Vanilla / Mix-Up
CIFAR_WideResNet40_8 [3] 95.9 / 97.3 Vanilla / Mix-Up Vanilla / Mix-Up
CIFAR_ResNeXt29_16x64d [8] 96.3 / 97.3 Vanilla / Mix-Up Vanilla / Mix-Up

Object Detection

The following table lists pre-trained models for object detection and their performances.

Hint

Model attributes are coded in their names. For instance, ssd_300_vgg16_atrous_voc consists of four parts:

  • ssd indicates the algorithm is “Single Shot Multibox Object Detection” [5].
  • 300 is the training image size, which means training images are resized to 300x300 and all anchor boxes are designed to match this shape.
  • vgg16_atrous is the type of base feature extractor network.
  • voc is the training dataset. You can choose voc or coco, etc.
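
As an illustration, a hypothetical helper (not part of the GluonCV API) can split such a name into the parts described above; it only covers SSD-style names that include an input-size field, so it would not fit, e.g., faster_rcnn_resnet50_v2a_voc:

```python
# Hypothetical helper, for illustration only: split an SSD-style model
# name of the form <algo>_<size>_<base...>_<dataset> into its parts.
def parse_detector_name(name):
    algo, size, *base, dataset = name.split('_')
    return {'algorithm': algo,
            'input_size': int(size),
            'base_network': '_'.join(base),
            'dataset': dataset}
```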

Hint

The training commands work with the following scripts:

Model mAP Training Command Training log
ssd_300_vgg16_atrous_voc 77.6 shell script log
ssd_512_vgg16_atrous_voc 79.2 shell script log
ssd_512_resnet50_v1_voc 80.1 shell script log
ssd_512_mobilenet1.0_voc 75.4 shell script log
faster_rcnn_resnet50_v2a_voc 77.9 shell script log
Model mAP (0.5:0.95) mAP (0.5) mAP (0.75) Training Command Training Log
ssd_300_vgg16_atrous_coco 25.1 42.9 25.8 shell script log
ssd_512_vgg16_atrous_coco 28.9 47.9 30.6 shell script log
ssd_512_resnet50_v1_coco 30.6 50.0 32.2 shell script log

Semantic Segmentation

Table of pre-trained models for semantic segmentation and their performance.

Hint

The model names contain the training information. For instance, fcn_resnet50_voc:

  • fcn indicates the algorithm is “Fully Convolutional Network for Semantic Segmentation” [6].
  • resnet50 is the name of the backbone network.
  • voc is the training dataset.

The test script (Download test.py) can be used to evaluate the models; VOC results are evaluated using the official server. For example, to evaluate fcn_resnet50_ade:

python test.py --dataset ade20k --model-zoo fcn_resnet50_ade --eval
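
The pixAcc and mIoU columns in the table below are standard segmentation metrics. A minimal sketch of how they can be computed from flat per-pixel label and prediction lists (GluonCV's evaluation uses batched NDArray operations; this is for illustration only):

```python
# Sketch of pixel accuracy and mean intersection-over-union for semantic
# segmentation. 'labels' and 'preds' are flat lists of per-pixel class ids.
def seg_metrics(labels, preds, num_classes):
    correct = sum(1 for l, p in zip(labels, preds) if l == p)
    pix_acc = correct / len(labels)
    ious = []
    for c in range(num_classes):
        inter = sum(1 for l, p in zip(labels, preds) if l == c and p == c)
        union = sum(1 for l, p in zip(labels, preds) if l == c or p == c)
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(inter / union)
    return pix_acc, sum(ious) / len(ious)
```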

The training commands work with the script: Download train.py

Name Method pixAcc mIoU Command log
fcn_resnet50_voc FCN [6] N/A 69.4 shell script log
fcn_resnet101_voc FCN [6] N/A 70.9 shell script log
fcn_resnet50_ade FCN [6] 78.6 38.7 shell script log
psp_resnet50_ade PSP [9] 78.4 41.1 shell script log
[1](1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13) He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual learning for image recognition.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016.
[2](1, 2, 3, 4, 5, 6, 7, 8) He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Identity mappings in deep residual networks.” In European Conference on Computer Vision, pp. 630-645. Springer, Cham, 2016.
[3](1, 2, 3) Zagoruyko, Sergey, and Nikos Komodakis. “Wide residual networks.” arXiv preprint arXiv:1605.07146 (2016).
[4](1, 2) Zhang, Hongyi, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. “mixup: Beyond empirical risk minimization.” arXiv preprint arXiv:1710.09412 (2017).
[5]Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
[6](1, 2, 3, 4) Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
[7](1, 2, 3, 4) Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation.” arXiv preprint arXiv:1801.04381 (2018).
[8](1, 2) Xie, Saining, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. “Aggregated residual transformations for deep neural networks.” In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pp. 5987-5995. IEEE, 2017.
[9] Zhao, Hengshuang, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. “Pyramid scene parsing network.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881-2890. 2017.