Segmentation
MXNet
The following graph plots inference throughput against validation mIoU for the COCO pre-trained models. Throughput is measured on a single V100 GPU with a batch size of 16.
Hint
Model names encode the training information. For instance, in fcn_resnet50_voc:
- fcn indicates that the algorithm is "Fully Convolutional Network for Semantic Segmentation" [2].
- resnet50 is the name of the backbone network.
- voc is the training dataset.
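The naming convention described in the hint can be sketched as a small parser. This is only an illustration: `ModelName` and `parse_model_name` are hypothetical helpers, not part of GluonCV, and the sketch assumes the plain three-part pattern `<algorithm>_<backbone>_<dataset>`. Names with more or fewer parts (for example fastscnn_citys or mask_rcnn_fpn_resnet50_v1b_coco) do not fit that pattern and are rejected here.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """Parts of a three-part model name (hypothetical container)."""
    algorithm: str
    backbone: str
    dataset: str


def parse_model_name(name: str) -> ModelName:
    """Split '<algorithm>_<backbone>_<dataset>' into its three parts.

    Assumption: the name has exactly three underscore-separated fields.
    Anything else raises ValueError instead of guessing.
    """
    parts = name.split("_")
    if len(parts) != 3:
        raise ValueError(
            f"expected '<algorithm>_<backbone>_<dataset>', got {name!r}"
        )
    return ModelName(*parts)


if __name__ == "__main__":
    print(parse_model_name("fcn_resnet50_voc"))
```

Rejecting irregular names keeps the sketch honest: the hint only documents the three-part form, so the helper does not try to infer anything beyond it.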
Semantic Segmentation
The tables below list the pre-trained models for semantic segmentation and their performance.
Hint
The test script test.py can be used to evaluate the models (VOC results are evaluated using the official server). For example, to evaluate fcn_resnet50_ade:
python test.py --dataset ade20k --model-zoo fcn_resnet50_ade --eval
The training commands work with the script train.py.
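The pixAcc and mIoU columns in the tables that follow stand for pixel accuracy and mean intersection-over-union, and the values appear to be percentages. As a rough illustration of the usual definitions, here is a minimal pure-Python sketch over flattened label lists. The function names are hypothetical, and the actual evaluation in test.py may differ in detail (for example in how ignored labels are handled or how statistics are accumulated across images).

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the ground truth."""
    correct = sum(p == t for p, t in zip(pred, target))
    return correct / len(target)


def mean_iou(pred, target, num_classes):
    """Mean of per-class intersection-over-union.

    Classes that appear in neither the prediction nor the ground truth
    are skipped. Assumes at least one class is present.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, target))
        union = sum(p == c or t == c for p, t in zip(pred, target))
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy example: six pixels, three classes (made-up labels).
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    print(pixel_accuracy(pred, target))
    print(mean_iou(pred, target, num_classes=3))
```

Both functions return fractions in the range 0 to 1; multiplying by 100 gives values on the same scale as the table entries.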
ADE20K Dataset
| Name | Method | pixAcc | mIoU | Command | log |
|---|---|---|---|---|---|
| fcn_resnet50_ade | FCN [2] | 79 | 39.5 | | |
| fcn_resnet101_ade | FCN [2] | 80.6 | 41.6 | | |
| psp_resnet50_ade | PSP [3] | 80.1 | 41.5 | | |
| psp_resnet101_ade | PSP [3] | 80.8 | 43.3 | | |
| deeplab_resnet50_ade | DeepLabV3 [4] | 80.5 | 42.5 | | |
| deeplab_resnet101_ade | DeepLabV3 [4] | 81.1 | 44.1 | | |
| deeplab_resnest50_ade | | 81.2 | 45.1 | | |
| deeplab_resnest101_ade | | 82.1 | 46.9 | | |
| deeplab_resnest200_ade | | 82.5 | 48.4 | | |
| deeplab_resnest269_ade | | 82.6 | 47.6 | | |
MS-COCO Dataset Pretrain
| Name | Method | pixAcc | mIoU | Command | log |
|---|---|---|---|---|---|
| fcn_resnet101_coco | FCN [2] | 92.2 | 66.2 | | |
| psp_resnet101_coco | PSP [3] | 92.4 | 70.4 | | |
| deeplab_resnet101_coco | DeepLabV3 [4] | 92.5 | 70.4 | | |
Pascal VOC Dataset
| Name | Method | pixAcc | mIoU | Command | log |
|---|---|---|---|---|---|
| fcn_resnet101_voc | FCN [2] | N/A | | | |
| psp_resnet101_voc | PSP [3] | N/A | | | |
| deeplab_resnet101_voc | DeepLabV3 [4] | N/A | | | |
| deeplab_resnet152_voc | DeepLabV3 [4] | N/A | | | |
Cityscapes Dataset
| Name | Method | pixAcc | mIoU | Command | log |
|---|---|---|---|---|---|
| psp_resnet101_citys | PSP [3] | 96.4 | 79.9 | | |
| deeplab_resnet50_citys | DeepLabV3 [4] | 96.3 | 78.7 | | |
| deeplab_resnet101_citys | DeepLabV3 [4] | 96.4 | 79.4 | | |
| danet_resnet50_citys | DANet [7] | 96.3 | 78.5 | | |
| danet_resnet101_citys | DANet [7] | 96.5 | 80.1 | | |
| icnet_resnet50_citys | ICNet [5] | 95.5 | 74.5 | | |
| fastscnn_citys | | 95.1 | 72.3 | | |
| deeplab_v3b_plus_wideresnet_citys | VPLR [6] | N/A | 83.5 | | |
Instance Segmentation
The table below lists the pre-trained models for instance segmentation and their performance.
Hint
The training commands work with the following scripts:
- For Mask R-CNN networks: train_mask_rcnn.py
For the COCO dataset, the training image set is train2017 and the validation image set is val2017.
Average precision (AP) is reported at IoU thresholds 0.5:0.95 (averaged over 10 values), 0.5, and 0.75, in the format (AP 0.5:0.95)/(AP 0.5)/(AP 0.75).
For the instance segmentation task, AP based on box overlap and AP based on segmentation overlap are both evaluated and reported.
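To make the reporting format concrete, the sketch below parses one table cell of the form (AP 0.5:0.95)/(AP 0.5)/(AP 0.75) and lists the ten IoU thresholds that the 0.5:0.95 average runs over. `APTriple` and `parse_ap` are hypothetical names introduced for illustration. The threshold step of 0.05 is an assumption based on the standard COCO convention and is consistent with the "10 values" stated above.

```python
from typing import NamedTuple


class APTriple(NamedTuple):
    """One 'a/b/c' table cell (hypothetical container)."""
    ap: float    # AP averaged over IoU thresholds 0.5:0.95
    ap50: float  # AP at IoU threshold 0.5
    ap75: float  # AP at IoU threshold 0.75


def parse_ap(cell: str) -> APTriple:
    """Parse a cell such as '38.3/58.7/41.4' into three floats."""
    values = [float(v) for v in cell.split("/")]
    if len(values) != 3:
        raise ValueError(f"expected three '/'-separated values, got {cell!r}")
    return APTriple(*values)


# The ten IoU thresholds 0.50, 0.55, ..., 0.95 (assumed step of 0.05).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


if __name__ == "__main__":
    box_ap = parse_ap("38.3/58.7/41.4")   # Box AP of mask_rcnn_resnet50_v1b_coco
    segm_ap = parse_ap("33.1/54.8/35.0")  # Segm AP of the same model
    print(box_ap, segm_ap)
    print(IOU_THRESHOLDS)
```

Keeping the three numbers in named fields makes it harder to compare the averaged AP of one model against the AP at 0.5 of another by accident.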
MS COCO
| Model | Box AP | Segm AP | Command | Training Log |
|---|---|---|---|---|
| mask_rcnn_resnet18_v1b_coco | 31.2/51.1/33.1 | 28.4/48.1/29.8 | | |
| mask_rcnn_fpn_resnet18_v1b_coco | 34.9/56.4/37.4 | 30.4/52.2/31.4 | | |
| mask_rcnn_resnet50_v1b_coco | 38.3/58.7/41.4 | 33.1/54.8/35.0 | | |
| mask_rcnn_fpn_resnet50_v1b_coco | 39.2/61.2/42.2 | 35.4/57.5/37.3 | | |
| mask_rcnn_resnet101_v1d_coco | 41.3/61.7/44.4 | 35.2/57.8/36.9 | | |
| mask_rcnn_fpn_resnet101_v1d_coco | 42.3/63.9/46.2 | 37.7/60.5/40.0 | | |
PyTorch
Models implemented using PyTorch will be added later. Please check out our MXNet implementation instead.
Reference
- [1] He, Kaiming, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. "Mask R-CNN." In IEEE International Conference on Computer Vision (ICCV), 2017.
- [2] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
- [3] Zhao, Hengshuang, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. "Pyramid scene parsing network." CVPR, 2017.
- [4] Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587 (2017).
- [5] Zhao, Hengshuang, et al. "ICNet for Real-Time Semantic Segmentation on High-Resolution Images." ECCV 2018.
- [6] Zhu, Yi, et al. "Improving Semantic Segmentation via Video Propagation and Label Relaxation." CVPR 2019.
- [7] Fu, Jun, et al. "Dual Attention Network for Scene Segmentation." CVPR 2019.
- [8] Poudel, Rudra, et al. "Fast-SCNN: Fast Semantic Segmentation Network." BMVC 2019.
- [9] Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi Zhang, Haibin Lin, Yue Sun, Tong He, Jonas Muller, R. Manmatha, Mu Li, and Alex Smola. "ResNeSt: Split-Attention Network." arXiv preprint (2020).
- [10] Yi Zhu, Zhongyue Zhang, Chongruo Wu, Zhi Zhang, Tong He, Hang Zhang, R. Manmatha, Mu Li, and Alexander Smola. "Improving Semantic Segmentation via Self-Training." arXiv preprint arXiv:2004.14960 (2020).