
Classification

The following graph plots inference throughput against validation accuracy for the ImageNet pre-trained models.

[Figure: inference throughput vs. validation accuracy of ImageNet pre-trained models]

How To Use Pretrained Models

  • The following example requires GluonCV>=0.3 and MXNet>=1.3.0. Please follow our installation guide to install or upgrade GluonCV and MXNet if necessary.
  • Prepare an image of your own or use our sample image. Save it as classification-demo.png in your working directory, or change the filename in the source code if you use a different name.
  • Use a pre-trained model. A model is specified by its name.

Let’s try it out!

import mxnet as mx
import gluoncv

# you can change it to your image filename
filename = 'classification-demo.png'
# you may modify it to switch to another model. The name is case-insensitive
model_name = 'ResNet50_v1d'
# download and load the pre-trained model
net = gluoncv.model_zoo.get_model(model_name, pretrained=True)
# load image
img = mx.image.imread(filename)
# apply default data preprocessing
transformed_img = gluoncv.data.transforms.presets.imagenet.transform_eval(img)
# run forward pass to obtain the predicted score for each class
pred = net(transformed_img)
# map predicted values to probability by softmax
prob = mx.nd.softmax(pred)[0].asnumpy()
# find the 5 class indices with the highest score
ind = mx.nd.topk(pred, k=5)[0].astype('int').asnumpy().tolist()
# print the class name and predicted probability
print('The input picture is classified to be')
for i in range(5):
    print('- [%s], with probability %.3f.'%(net.classes[ind[i]], prob[ind[i]]))

The output from our sample image is expected to be

The input picture is classified to be
- [Welsh springer spaniel], with probability 0.899.
- [Irish setter], with probability 0.005.
- [Brittany spaniel], with probability 0.003.
- [cocker spaniel], with probability 0.002.
- [Blenheim spaniel], with probability 0.002.

Remember, you can try different models by replacing the value of model_name. The tables below list the available model names and their accuracy.
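The softmax and top-k steps in the example above can also be sketched without MXNet. The following is a minimal illustration in plain Python; the scores and class names are hypothetical and are not taken from any model on this page.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtract the maximum score first for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(scores, k):
    """Return the indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Hypothetical scores and class names, for illustration only.
scores = [2.0, 0.5, 3.1, -1.0]
classes = ['cat', 'dog', 'fox', 'owl']

prob = softmax(scores)
for i in top_k(scores, k=2):
    print('- [%s], with probability %.3f.' % (classes[i], prob[i]))
```

Softmax does not change the ordering of the scores, so ranking by raw score and ranking by probability select the same classes.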

ImageNet

Hint

Training commands work with this script:

Download train_imagenet.py

A model can have several sets of trained parameters, each identified by a hashtag. Parameters whose model name is shown in grey can be downloaded by passing the corresponding hashtag.

  • Download default pretrained weights: net = get_model('ResNet50_v1d', pretrained=True)
  • Download weights given a hashtag: net = get_model('ResNet50_v1d', pretrained='117a384e')
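When a model has more than one hashtag, you may want to pick one programmatically. The sketch below is only an illustration: the dictionary copies the two ResNet50_v1d rows from the ResNet table on this page, and best_hashtag is a hypothetical helper, not part of GluonCV.

```python
# Top-1 accuracy per hashtag, copied from the ResNet50_v1d rows on this page.
RESNET50_V1D = {'117a384e': 79.15, '00319ddc': 78.48}

def best_hashtag(variants):
    """Return the hashtag whose parameters have the highest Top-1 accuracy."""
    return max(variants, key=variants.get)

tag = best_hashtag(RESNET50_V1D)
# The printed call mirrors the hashtag example above.
print("net = get_model('ResNet50_v1d', pretrained=%r)" % tag)
```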

ResNet

Hint

  • ResNet_v1b modifies ResNet_v1 by setting stride at the 3x3 layer for a bottleneck block.
  • ResNet_v1c modifies ResNet_v1b by replacing the 7x7 conv layer with three 3x3 conv layers.
  • ResNet_v1d modifies ResNet_v1c by adding a 2x2 avgpool layer with stride 2 that downsamples the feature map on the residual path, which preserves more information.
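To see why average pooling can preserve more information when downsampling, compare it with simply keeping every second position. This is a minimal sketch in plain Python on a toy 4x4 feature map. The comparison with a strided window that reads one position in four is an assumption made for illustration; the hint above does not say what the avgpool layer is compared against.

```python
def avg_pool_2x2(fmap):
    """2x2 average pooling with stride 2 over a 2-D list with even sides."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            (fmap[r][c] + fmap[r][c + 1]
             + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]

def subsample_stride2(fmap):
    """Keep every second value in each direction (assumed baseline)."""
    return [row[::2] for row in fmap[::2]]

# Toy feature map, for illustration only.
fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = avg_pool_2x2(fmap)        # every input value contributes
sampled = subsample_stride2(fmap)  # three of every four values are dropped
```

Both outputs are 2x2, but only the pooled one depends on all sixteen inputs.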

| Model | Top-1 | Top-5 | Hashtag | Training Command | Training Log |
|---|---|---|---|---|---|
| ResNet18_v1 [1] | 70.93 | 89.92 | a0666292 | shell script | log |
| ResNet34_v1 [1] | 74.37 | 91.87 | 48216ba9 | shell script | log |
| ResNet50_v1 [1] | 77.36 | 93.57 | cc729d95 | shell script | log |
| ResNet101_v1 [1] | 78.34 | 94.01 | d988c13d | shell script | log |
| ResNet152_v1 [1] | 79.22 | 94.64 | acfd0970 | shell script | log |
| ResNet18_v1b [1] | 70.94 | 89.83 | 2d9d980c | shell script | log |
| ResNet34_v1b [1] | 74.65 | 92.08 | 8e16b848 | shell script | log |
| ResNet50_v1b [1] | 77.67 | 93.82 | 0ecdba34 | shell script | log |
| ResNet101_v1b [1] | 79.20 | 94.61 | a455932a | shell script | log |
| ResNet152_v1b [1] | 79.69 | 94.74 | a5a61ee1 | shell script | log |
| ResNet50_v1c [1] | 78.03 | 94.09 | 2a4e0708 | shell script | log |
| ResNet101_v1c [1] | 79.60 | 94.75 | 064858f2 | shell script | log |
| ResNet152_v1c [1] | 80.01 | 94.96 | 75babab6 | shell script | log |
| ResNet50_v1d [1] | 79.15 | 94.58 | 117a384e | shell script | log |
| ResNet50_v1d [1] | 78.48 | 94.20 | 00319ddc | shell script | log |
| ResNet101_v1d [1] | 80.51 | 95.12 | 1b2b825f | shell script | log |
| ResNet101_v1d [1] | 79.78 | 94.80 | 8659a9d6 | shell script | log |
| ResNet152_v1d [1] | 80.61 | 95.34 | cddbc86f | shell script | log |
| ResNet152_v1d [1] | 80.26 | 95.00 | cfe0220d | shell script | log |
| ResNet18_v2 [2] | 71.00 | 89.92 | a81db45f | shell script | log |
| ResNet34_v2 [2] | 74.40 | 92.08 | 9d6b80bb | shell script | log |
| ResNet50_v2 [2] | 77.11 | 93.43 | ecdde353 | shell script | log |
| ResNet101_v2 [2] | 78.53 | 94.17 | 18e93e4f | shell script | log |
| ResNet152_v2 [2] | 79.21 | 94.31 | f2695542 | shell script | log |
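
The Top-1 and Top-5 columns presumably follow the usual definition: the percentage of validation images whose true label is among the one or five highest-scoring predictions. The source does not spell this out, so treat the following plain-Python sketch as an illustration of that standard definition; the scores and labels are made up.

```python
def top_k_accuracy(score_rows, labels, k):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for scores, label in zip(score_rows, labels):
        ranked = sorted(range(len(scores)),
                        key=lambda i: scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)

# Hypothetical scores for four samples over three classes.
score_rows = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 1, 2, 2]

top1 = top_k_accuracy(score_rows, labels, k=1)
top2 = top_k_accuracy(score_rows, labels, k=2)
```

Top-k accuracy can only stay equal or grow as k increases, which is why every Top-5 value in the tables is at least as large as the matching Top-1 value.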

MobileNet

| Model | Top-1 | Top-5 | Hashtag | Training Command | Training Log |
|---|---|---|---|---|---|
| MobileNet1.0 [4] | 73.28 | 91.30 | efbb2ca3 | shell script | log |
| MobileNet1.0 [4] | 72.93 | 91.14 | cce75496 | shell script | log |
| MobileNet0.75 [4] | 70.25 | 89.49 | 84c801e2 | shell script | log |
| MobileNet0.5 [4] | 65.20 | 86.34 | 0130d2aa | shell script | log |
| MobileNet0.25 [4] | 52.91 | 76.94 | f0046a3d | shell script | log |
| MobileNetV2_1.0 [5] | 71.92 | 90.56 | 36da4ff1 | shell script | log |
| MobileNetV2_0.75 [5] | 69.61 | 88.95 | e2be7b72 | shell script | log |
| MobileNetV2_0.5 [5] | 64.49 | 85.47 | aabd26cd | shell script | log |
| MobileNetV2_0.25 [5] | 50.74 | 74.56 | ae8f9392 | shell script | log |

VGG

| Model | Top-1 | Top-5 | Hashtag | Training Command | Training Log |
|---|---|---|---|---|---|
| VGG11 [9] | 66.62 | 87.34 | dd221b16 | | |
| VGG13 [9] | 67.74 | 88.11 | 6bc5de58 | | |
| VGG16 [9] | 73.23 | 91.31 | e660d456 | shell script | log |
| VGG19 [9] | 74.11 | 91.35 | ad2f660d | shell script | log |
| VGG11_bn [9] | 68.59 | 88.72 | ee79a809 | | |
| VGG13_bn [9] | 68.84 | 88.82 | 7d97a06c | | |
| VGG16_bn [9] | 73.10 | 91.76 | 7f01cf05 | shell script | log |
| VGG19_bn [9] | 74.33 | 91.85 | f360b758 | shell script | log |

SqueezeNet

| Model | Top-1 | Top-5 | Hashtag | Training Command | Training Log |
|---|---|---|---|---|---|
| SqueezeNet1.0 [10] | 56.11 | 79.09 | 264ba497 | | |
| SqueezeNet1.1 [10] | 54.96 | 78.17 | 33ba0f93 | | |

DenseNet

| Model | Top-1 | Top-5 | Hashtag | Training Command | Training Log |
|---|---|---|---|---|---|
| DenseNet121 [7] | 74.97 | 92.25 | f27dbf2d | | |
| DenseNet161 [7] | 77.70 | 93.80 | b6c8a957 | | |
| DenseNet169 [7] | 76.17 | 93.17 | 2603f878 | | |
| DenseNet201 [7] | 77.32 | 93.62 | 1cdbc116 | | |

Others

Hint

InceptionV3 is evaluated with input size of 299x299.
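
A different input size usually also changes the resize step that precedes the center crop. The sketch below assumes the common ImageNet convention of cropping 87.5% of the shorter side; that ratio and the eval_sizes helper are assumptions for illustration and are not stated on this page.

```python
import math

def eval_sizes(input_size, crop_ratio=0.875):
    """Return (resize_short, crop_size) for center-crop evaluation."""
    # crop_ratio is an assumed convention, not a value stated in this document.
    resize_short = int(math.ceil(input_size / crop_ratio))
    return resize_short, input_size

default_sizes = eval_sizes(224)    # sizes for a 224x224 input
inception_sizes = eval_sizes(299)  # sizes for InceptionV3's 299x299 input
```

If your GluonCV version exposes resize_short and crop_size arguments on transform_eval, values like these could be passed there; check the signature in your installed version before relying on it.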

| Model | Top-1 | Top-5 | Hashtag | Training Command | Training Log |
|---|---|---|---|---|---|
| AlexNet [6] | 54.92 | 78.03 | 44335d1f | | |
| darknet53 [3] | 78.56 | 94.43 | 2189ea49 | shell script | log |
| darknet53 [3] | 78.13 | 93.86 | 95975047 | shell script | log |
| InceptionV3 [8] | 78.77 | 94.39 | a5050dbc | shell script | log |
| InceptionV3 [8] | 78.41 | 94.13 | e132adf2 | shell script | log |
| SENet_154 [14] | 81.26 | 95.51 | b5538ef1 | | |

CIFAR10

The following table lists pre-trained models trained on CIFAR10.

Hint

Our pre-trained models reproduce results from “Mix-Up” [13]. Please check the reference paper for further information.
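
For readers unfamiliar with the technique, mixup as described in [13] trains on convex combinations of pairs of samples and their labels, with the mixing weight drawn from a Beta distribution. The following is a minimal plain-Python sketch of that idea, not the training script used for these models; the alpha default and the toy inputs are assumptions.

```python
import random

def mixup(x_a, y_a, x_b, y_b, alpha=1.0, rng=random):
    """Blend two samples and their one-hot labels with one shared weight."""
    # alpha=1.0 is an assumed default, chosen only for illustration.
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam

# Toy inputs: two three-value "images" with one-hot labels over two classes.
rng = random.Random(0)
x, y, lam = mixup([1.0, 0.0, 2.0], [1.0, 0.0],
                  [3.0, 4.0, 0.0], [0.0, 1.0], rng=rng)
```

Because both labels are one-hot and share the same weight, the mixed label still sums to 1.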

Training commands in the table work with the following scripts:

| Model | Acc (Vanilla / Mix-Up [13]) | Training Command | Training Log |
|---|---|---|---|
| CIFAR_ResNet20_v1 [1] | 92.1 / 92.9 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet56_v1 [1] | 93.6 / 94.2 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet110_v1 [1] | 93.0 / 95.2 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet20_v2 [2] | 92.1 / 92.7 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet56_v2 [2] | 93.7 / 94.6 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet110_v2 [2] | 94.3 / 95.5 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_WideResNet16_10 [11] | 95.1 / 96.7 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_WideResNet28_10 [11] | 95.6 / 97.2 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_WideResNet40_8 [11] | 95.9 / 97.3 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNeXt29_16x64d [12] | 96.3 / 97.3 | Vanilla / Mix-Up | Vanilla / Mix-Up |

[1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual learning for image recognition.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016.
[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Identity mappings in deep residual networks.” In European Conference on Computer Vision, pp. 630-645. Springer, Cham, 2016.
[3] Redmon, Joseph, and Ali Farhadi. “Yolov3: An incremental improvement.” arXiv preprint arXiv:1804.02767 (2018).
[4] Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. “Mobilenets: Efficient convolutional neural networks for mobile vision applications.” arXiv preprint arXiv:1704.04861 (2017).
[5] Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. “Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation.” arXiv preprint arXiv:1801.04381 (2018).
[6] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” In Advances in neural information processing systems, pp. 1097-1105. 2012.
[7] Huang, Gao, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. “Densely Connected Convolutional Networks.” In CVPR, vol. 1, no. 2, p. 3. 2017.
[8] Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. “Rethinking the inception architecture for computer vision.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818-2826. 2016.
[9] Simonyan, Karen, and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” arXiv technical report arXiv:1409.1556 (2014).
[10] Iandola, Forrest N., Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. “Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5MB model size.” arXiv preprint arXiv:1602.07360 (2016).
[11] Zagoruyko, Sergey, and Nikos Komodakis. “Wide residual networks.” arXiv preprint arXiv:1605.07146 (2016).
[12] Xie, Saining, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. “Aggregated residual transformations for deep neural networks.” In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pp. 5987-5995. IEEE, 2017.
[13] Zhang, Hongyi, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. “mixup: Beyond empirical risk minimization.” arXiv preprint arXiv:1710.09412 (2017).
[14] Hu, Jie, Li Shen, and Gang Sun. “Squeeze-and-excitation networks.” arXiv preprint arXiv:1709.01507 (2017).