


The model names contain the training information. For instance, fcn_resnet50_voc:

  • fcn indicates that the algorithm is “Fully Convolutional Network for Semantic Segmentation” [2].
  • resnet50 is the name of the backbone network.
  • voc is the training dataset.
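
The naming convention above can be illustrated with a small helper. This is only a sketch: `parse_model_name` and `ModelName` are hypothetical names introduced here, not part of any library, and the helper assumes the three-part `<algorithm>_<backbone>_<dataset>` pattern used by the semantic segmentation models. Names with more parts, such as the instance segmentation models, are rejected rather than guessed at.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """Hypothetical container for the three parts of a model name."""
    algorithm: str
    backbone: str
    dataset: str


def parse_model_name(name: str) -> ModelName:
    """Split a name such as 'fcn_resnet50_voc' into its three parts."""
    parts = name.split("_")
    if len(parts) != 3:
        # Assumption: only <algorithm>_<backbone>_<dataset> names are handled.
        raise ValueError(f"expected <algorithm>_<backbone>_<dataset>, got {name!r}")
    return ModelName(*parts)


info = parse_model_name("fcn_resnet50_voc")
print(info.algorithm, info.backbone, info.dataset)  # fcn resnet50 voc
```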

Semantic Segmentation

Table of pre-trained models for semantic segmentation and their performance.
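
The tables on this page report pixAcc (pixel accuracy) and mIoU (mean intersection over union), both as percentages. As a rough illustration of how these metrics are commonly defined, the sketch below computes them from a class confusion matrix. The function names and the toy matrix are assumptions made for this example, and the test script may accumulate the statistics differently.

```python
def pixel_accuracy(confusion):
    """Fraction of pixels whose predicted class equals the labelled class."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def mean_iou(confusion):
    """Average over classes of intersection / union."""
    n = len(confusion)
    ious = []
    for c in range(n):
        intersection = confusion[c][c]
        labelled = sum(confusion[c])                        # pixels labelled c
        predicted = sum(confusion[r][c] for r in range(n))  # pixels predicted c
        union = labelled + predicted - intersection
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Toy 2-class confusion matrix: rows are true classes, columns are predictions.
toy = [[3, 1],
       [1, 5]]
print(pixel_accuracy(toy))  # 8 of 10 pixels are correct
print(mean_iou(toy))        # mean of 3/5 and 5/7
```

Multiplying either value by 100 gives a figure on the same scale as the table entries.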


The test script (Download) can be used to evaluate the models (VOC results are evaluated using the official server). For example, to evaluate fcn_resnet50_ade:

python --dataset ade20k --model-zoo fcn_resnet50_ade --eval

The training commands work with the script: Download

ADE20K Dataset

Name                   Method         pixAcc  mIoU  Command       log
fcn_resnet50_ade       FCN [2]        79.0    39.5  shell script  log
fcn_resnet101_ade      FCN [2]        80.6    41.6  shell script  log
psp_resnet50_ade       PSP [3]        80.1    41.5  shell script  log
psp_resnet101_ade      PSP [3]        80.8    43.3  shell script  log
deeplab_resnet50_ade   DeepLabV3 [4]  80.5    42.5  shell script  log
deeplab_resnet101_ade  DeepLabV3 [4]  81.1    44.1  shell script  log

MS-COCO Dataset Pretrain

Name                    Method         pixAcc  mIoU  Command       log
fcn_resnet101_coco      FCN [2]        92.2    66.2  shell script  log
psp_resnet101_coco      PSP [3]        92.4    70.4  shell script  log
deeplab_resnet101_coco  DeepLabV3 [4]  92.5    70.4  shell script  log

Pascal VOC Dataset

Name                   Method         pixAcc  mIoU  Command       log
fcn_resnet101_voc      FCN [2]        N/A     83.6  shell script  log
psp_resnet101_voc      PSP [3]        N/A     85.1  shell script  log
deeplab_resnet101_voc  DeepLabV3 [4]  N/A     86.2  shell script  log
deeplab_resnet152_voc  DeepLabV3 [4]  N/A     86.7  shell script  log

Cityscapes Dataset

Name                 Method   pixAcc  mIoU  Command       log
psp_resnet101_citys  PSP [3]  N/A     77.1  shell script  log

Instance Segmentation

Table of pre-trained models for instance segmentation and their performance.


The training commands work with the following scripts:

For the COCO dataset, the training image set is train2017 and the validation image set is val2017.

Average precision (AP) is reported at three IoU settings: averaged over the 10 thresholds from 0.5 to 0.95, at 0.5, and at 0.75. The three values are given together in the format (AP 0.5:0.95)/(AP 0.5)/(AP 0.75).
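
To make the reporting format concrete, the sketch below lists the 10 IoU thresholds and splits one table cell into its three values. It assumes the usual COCO convention of 0.05 steps between 0.5 and 0.95. `parse_ap` and the dictionary keys are hypothetical names chosen for this example.

```python
# The 10 IoU thresholds behind "0.5:0.95": 0.50, 0.55, ..., 0.95
# (assumes the COCO convention of 0.05 steps).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def parse_ap(cell: str) -> dict:
    """Split '(AP 0.5:0.95)/(AP 0.5)/(AP 0.75)' text into named floats."""
    ap, ap50, ap75 = (float(value) for value in cell.split("/"))
    return {"AP": ap, "AP50": ap50, "AP75": ap75}


# Box AP cell of mask_rcnn_resnet50_v1b_coco from the table on this page.
box = parse_ap("38.3/58.7/41.4")
print(IOU_THRESHOLDS)
print(box["AP"], box["AP50"], box["AP75"])
```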

For the instance segmentation task, AP is evaluated and reported for both box overlap and segmentation (mask) overlap.
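
The difference between the two overlap measures can be sketched as follows. This is a simplified illustration rather than the evaluation code itself: boxes are assumed to be (x_min, y_min, x_max, y_max) tuples in continuous coordinates, masks are assumed to be equal-sized nested lists of 0/1, and `box_iou` and `mask_iou` are hypothetical helper names.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(a, b):
    """IoU of two binary masks given as equal-sized nested lists of 0/1."""
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            inter += 1 if (pixel_a and pixel_b) else 0
            union += 1 if (pixel_a or pixel_b) else 0
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes offset by one unit share a 1x1 region: IoU = 1 / 7.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))

# Two masks that share one pixel out of three covered pixels: IoU = 1 / 3.
print(mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))
```

A prediction counts as a match at a given threshold when the relevant IoU (box or mask) reaches that threshold, which is why the two AP columns can differ for the same model.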


Model                             Box AP          Segm AP         Command       Training Log
mask_rcnn_resnet50_v1b_coco       38.3/58.7/41.4  33.1/54.8/35.0  shell script  log
mask_rcnn_fpn_resnet50_v1b_coco   39.2/61.2/42.2  35.4/57.5/37.3  shell script  log
mask_rcnn_resnet101_v1d_coco      41.3/61.7/44.4  35.2/57.8/36.9  shell script  log
mask_rcnn_fpn_resnet101_v1d_coco  42.3/63.9/46.2  37.7/60.5/40.0  shell script  log

[1] He, Kaiming, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. “Mask R-CNN.” In IEEE International Conference on Computer Vision (ICCV), 2017.
[2] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[3] Zhao, Hengshuang, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. “Pyramid scene parsing network.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[4] Chen, Liang-Chieh, et al. “Rethinking atrous convolution for semantic image segmentation.” arXiv preprint arXiv:1706.05587, 2017.