Detection

The first graph plots inference throughput against validation mAP for the COCO pre-trained models.

[Figure: inference throughput vs. validation mAP of COCO pre-trained models]

We also provide a detailed interactive analysis of all 80 object categories.

[Interactive plot: per-category analysis of the 80 object categories]

The following tables list the pre-trained object detection models and their performance in more detail.

Hint

Model attributes are coded in their names. For instance, ssd_300_vgg16_atrous_voc consists of four parts:

  • ssd indicates that the algorithm is “Single Shot Multibox Object Detection” [1].
  • 300 is the training image size, which means training images are resized to 300x300 and all anchor boxes are designed to match this shape. This may not apply to some models.
  • vgg16_atrous is the type of base feature extractor network.
  • voc is the training dataset. You can choose voc or coco, etc.
  • (320x320), where present after a model name, indicates that the model was evaluated at a resolution of 320x320. Unless otherwise specified, all detection models in GluonCV accept various input shapes for prediction. Some models, e.g., Faster-RCNN and YOLO, are also trained with various input shapes.
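
The naming scheme above can be sketched as a small parser. This is only an illustration: `parse_detection_model_name` is a hypothetical helper, not part of GluonCV, and it assumes names shaped like `<algorithm>_<size>_<backbone>_<dataset>`, which does not hold for every model (Faster-RCNN names, for example, carry no image size).

```python
def parse_detection_model_name(name):
    """Split a name such as 'ssd_300_vgg16_atrous_voc' into its four parts.

    Hypothetical helper: assumes the layout
    <algorithm>_<size>_<backbone>_<dataset>, where the backbone
    itself may contain underscores.
    """
    parts = name.split("_")
    if len(parts) < 4 or not parts[1].isdigit():
        raise ValueError(f"unexpected name layout: {name!r}")
    return {
        "algorithm": parts[0],              # e.g. 'ssd'
        "train_size": int(parts[1]),        # e.g. 300, meaning 300x300
        "backbone": "_".join(parts[2:-1]),  # e.g. 'vgg16_atrous'
        "dataset": parts[-1],               # e.g. 'voc' or 'coco'
    }


info = parse_detection_model_name("ssd_300_vgg16_atrous_voc")
print(info)
```

Names that do not fit the assumed layout raise a `ValueError` rather than being split incorrectly.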

Hint

The training commands work with the following scripts:

Pascal VOC

Hint

For the Pascal VOC dataset, the training image set is the union of 2007trainval and 2012trainval, and the validation image set is 2007test.

The VOC metric, mean Average Precision (mAP) across all classes at an IoU threshold of 0.5, is reported.
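
To make the IoU threshold concrete, here is a minimal sketch of the overlap test behind the metric. The function names are hypothetical, boxes are assumed to be `(xmin, ymin, xmax, ymax)` in continuous coordinates (no "+1 pixel" convention), and the `>=` comparison is an assumption; evaluation toolkits differ on these details.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_ground_truth(pred_box, gt_box, iou_thresh=0.5):
    """Assumed matching rule: a prediction counts if its IoU reaches the threshold."""
    return box_iou(pred_box, gt_box) >= iou_thresh


print(box_iou((0, 0, 10, 10), (0, 0, 10, 5)))
print(matches_ground_truth((0, 0, 10, 10), (5, 5, 15, 15)))
```

A full mAP computation additionally ranks detections by confidence and integrates the precision-recall curve per class, which this sketch leaves out.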

SSD

Check out the SSD demo tutorial here: 1. Predict with pre-trained SSD models

Model                         mAP   Training Command  Training Log
ssd_300_vgg16_atrous_voc [1]  77.6  shell script      log
ssd_512_vgg16_atrous_voc [1]  79.2  shell script      log
ssd_512_resnet50_v1_voc [1]   80.1  shell script      log
ssd_512_mobilenet1.0_voc [1]  75.4  shell script      log

Faster-RCNN

Faster-RCNN models trained on the VOC dataset are evaluated at native resolution, with shorter side >= 600 but longer side <= 1000, without changing the aspect ratio.
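
One common reading of this resizing rule is sketched below: scale the image so its shorter side reaches 600, unless that would push the longer side past 1000, in which case the longer side is capped instead. `target_size` is a hypothetical helper written for illustration, and this interpretation is an assumption rather than a statement of how the evaluation scripts behave.

```python
def target_size(width, height, short=600, max_size=1000):
    """Return (new_width, new_height) with the aspect ratio preserved.

    Assumed rule: shorter side scaled to `short`, unless the longer
    side would then exceed `max_size`, in which case cap the longer side.
    """
    scale = short / min(width, height)
    if max(width, height) * scale > max_size:
        scale = max_size / max(width, height)
    return round(width * scale), round(height * scale)


print(target_size(500, 375))   # typical VOC image, shorter side scaled up to 600
print(target_size(1000, 300))  # very wide image, longer side cap takes effect
```

For very elongated images the cap wins, so the shorter side can end up below 600 under this reading.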

Check out the Faster-RCNN demo tutorial here: 2. Predict with pre-trained Faster RCNN models

Model                             mAP   Training Command  Training Log
faster_rcnn_resnet50_v1b_voc [2]  78.3  shell script      log

YOLO-v3

YOLO-v3 models can be evaluated and used for prediction at different resolutions. The reported mAP varies with the evaluation resolution, but the underlying model is identical.

Check out the YOLO demo tutorial here: 3. Predict with pre-trained YOLO models

Model                              mAP   Training Command  Training Log
yolo3_darknet53_voc [3] (320x320)  79.3  shell script      log
yolo3_darknet53_voc [3] (416x416)  81.5  shell script      log

MS COCO

Hint

For the COCO dataset, the training image set is train2017 and the validation image set is val2017.

The COCO metric, Average Precision (AP), is reported at IoU thresholds 0.5:0.95 (averaged over 10 values, AP 0.5:0.95), 0.5 (AP 0.5), and 0.75 (AP 0.75), in the format (AP 0.5:0.95)/(AP 0.5)/(AP 0.75).
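
As a small illustration of this format, the sketch below lists the ten IoU thresholds that 0.5:0.95 stands for (steps of 0.05) and splits a table cell into its three values. Both function names are hypothetical and exist only for this example.

```python
def coco_iou_thresholds():
    """The ten IoU thresholds 0.50, 0.55, ..., 0.95 averaged in AP 0.5:0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def parse_box_ap(cell):
    """Split a cell such as '25.1/42.9/25.8' into its three AP values."""
    ap, ap50, ap75 = (float(value) for value in cell.split("/"))
    return {"AP 0.5:0.95": ap, "AP 0.5": ap50, "AP 0.75": ap75}


print(coco_iou_thresholds())
print(parse_box_ap("25.1/42.9/25.8"))
```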

For the object detection task, only box-overlap-based AP is evaluated and reported.

SSD

Check out the SSD demo tutorial here: 1. Predict with pre-trained SSD models

Model                          Box AP          Training Command  Training Log
ssd_300_vgg16_atrous_coco [1]  25.1/42.9/25.8  shell script      log
ssd_512_vgg16_atrous_coco [1]  28.9/47.9/30.6  shell script      log
ssd_512_resnet50_v1_coco [1]   30.6/50.0/32.2  shell script      log
ssd_512_mobilenet1.0_coco [1]  21.7/39.2/21.3  shell script      log

Faster-RCNN

Faster-RCNN models trained on the COCO dataset are evaluated at native resolution, with shorter side >= 800 but longer side <= 1300, without changing the aspect ratio.

Check out the Faster-RCNN demo tutorial here: 2. Predict with pre-trained Faster RCNN models

Model                               Box AP          Training Command  Training Log
faster_rcnn_resnet50_v1b_coco [2]   37.0/57.8/39.6  shell script      log
faster_rcnn_resnet101_v1d_coco [2]  40.1/60.9/43.3  shell script      log

YOLO-v3

YOLO-v3 models can be evaluated and used for prediction at different resolutions. The reported AP varies with the evaluation resolution.

Check out the YOLO demo tutorial here: 3. Predict with pre-trained YOLO models

Model                               Box AP          Training Command  Training Log
yolo3_darknet53_coco [3] (320x320)  33.6/54.1/35.8  shell script      log
yolo3_darknet53_coco [3] (416x416)  36.0/57.2/38.7  shell script      log
yolo3_darknet53_coco [3] (608x608)  37.0/58.2/40.1  shell script      log

[1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
[2] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Advances in Neural Information Processing Systems, pp. 91-99, 2015.
[3] Joseph Redmon, Ali Farhadi. YOLOv3: An Incremental Improvement. arXiv preprint arXiv:1804.02767, 2018.