Pose Estimation¶
MXNet¶
The following graph visualizes inference throughput versus validation AP for the COCO pre-trained models. Throughputs are measured on a single V100 GPU with batch size 64.
Note
Pose estimation was released in GluonCV 0.4. Please be sure to update your installation with
pip install gluoncv --upgrade
to try it out.
MS COCO Keypoints¶
Hint
The training commands work with the following scripts:
For Simple Pose [1] networks:
Download train_simple_pose.py
Hint
For COCO dataset, training imageset is train2017 and validation imageset is val2017.
The COCO keypoint metrics report Average Precision (AP) at OKS thresholds 0.5:0.95 (averaged over 10 values, AP 0.5:0.95), 0.5 (AP 0.5), and 0.75 (AP 0.75); the three numbers are given together in the format (AP 0.5:0.95)/(AP 0.5)/(AP 0.75).
COCO keypoint metrics evaluate AP based on Object Keypoint Similarity (OKS). Please read the official documentation for a detailed introduction; the standard OKS definition is reproduced after this hint.
By averaging the predictions from the original input and its horizontally flipped copy, we can obtain higher accuracy. We therefore report results both with and without this flip ensemble; a minimal sketch of the flip ensemble follows the OKS definition below.
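For reference, the Object Keypoint Similarity used by the COCO keypoint evaluation is commonly defined as

\[
\mathrm{OKS} \;=\; \frac{\sum_i \exp\!\bigl(-d_i^2 / (2 s^2 k_i^2)\bigr)\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)},
\]

where \(d_i\) is the Euclidean distance between detected keypoint \(i\) and its ground-truth location, \(v_i\) is the ground-truth visibility flag, \(s\) is the object scale, and \(k_i\) is a per-keypoint constant controlling the falloff. AP is then computed by thresholding OKS, analogous to thresholding box IoU in detection.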
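The flip ensemble itself is simple to sketch. Below is a minimal illustration, assuming a loaded simple-pose network `net` and a preprocessed NCHW input batch `pose_input`; the left/right joint pairing is the standard COCO keypoint order. This is a simplified sketch of the idea only; the actual evaluation code may additionally apply a small sub-pixel alignment shift to the flipped heatmaps.

```python
import mxnet as mx

# COCO keypoint order: channel indices of left/right joint pairs that must be
# swapped after horizontally flipping the image (index 0, the nose, stays put).
FLIP_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10),
              (11, 12), (13, 14), (15, 16)]

def flip_ensemble_heatmap(net, pose_input):
    """Average heatmaps predicted from the original and the horizontally flipped input."""
    heatmap = net(pose_input)                          # (N, 17, H, W)
    flipped = net(mx.nd.flip(pose_input, axis=3))      # predict on mirrored crops
    flipped = mx.nd.flip(flipped, axis=3)              # mirror the heatmaps back
    # Swap left/right joint channels so they match the original joint ordering.
    order = list(range(flipped.shape[1]))
    for a, b in FLIP_PAIRS:
        order[a], order[b] = order[b], order[a]
    flipped = mx.nd.take(flipped, mx.nd.array(order), axis=1)
    return (heatmap + flipped) / 2
```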
Simple Pose with ResNet¶
Check out the demo tutorial here: 1. Predict with pre-trained Simple Pose Estimation models
Most models are trained with input size 256x192, unless otherwise specified. Parameters shown with a grey name can be downloaded by passing the corresponding hashtag.
Download default pretrained weights:
from gluoncv.model_zoo import get_model
net = get_model('simple_pose_resnet152_v1d', pretrained=True)
Download weights given a hashtag:
net = get_model('simple_pose_resnet152_v1d', pretrained='2f544338')
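For a complete picture of how these weights are used, here is a minimal end-to-end inference sketch following the demo tutorial linked above. It pairs the pose network with a person detector and uses the pose transform helpers from gluoncv.data.transforms.pose; the image path 'example.jpg' is a placeholder you should replace with your own file.

```python
from gluoncv import model_zoo, data, utils
from gluoncv.data.transforms.pose import detector_to_simple_pose, heatmap_to_coord

# Person detector + pose network; restricting the detector to the person class speeds up NMS.
detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
pose_net = model_zoo.get_model('simple_pose_resnet18_v1b', pretrained=True)
detector.reset_class(['person'], reuse_weights=['person'])

# 'example.jpg' is a placeholder image path.
x, img = data.transforms.presets.yolo.load_test('example.jpg', short=512)
class_ids, scores, bboxes = detector(x)

# Crop person boxes, run the pose network, and decode heatmaps to keypoint coordinates.
pose_input, upscale_bbox = detector_to_simple_pose(img, class_ids, scores, bboxes)
predicted_heatmap = pose_net(pose_input)
pred_coords, confidence = heatmap_to_coord(predicted_heatmap, upscale_bbox)

# Visualize detected keypoints on the image.
ax = utils.viz.plot_keypoints(img, pred_coords, confidence,
                              class_ids, bboxes, scores,
                              box_thresh=0.5, keypoint_thresh=0.2)
```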
| Model | OKS AP | OKS AP (with flip) | Hashtag | Training Command | Training log |
|---|---|---|---|---|---|
| simple_pose_resnet18_v1b [1] | 66.3/89.2/73.4 | 68.4/90.3/75.7 | f63d42ac | | |
| simple_pose_resnet18_v1b [1] (128x96) | 52.8/83.6/57.9 | 54.5/84.8/60.3 | ccd24037 | | |
| simple_pose_resnet50_v1b [1] | 71.0/91.2/78.6 | 72.2/92.2/79.9 | e2c7b1ad | | |
| simple_pose_resnet50_v1d [1] | 71.6/91.3/78.7 | 73.3/92.4/80.8 | ba2675b6 | | |
| simple_pose_resnet101_v1b [1] | 72.4/92.2/79.8 | 73.7/92.3/81.1 | b7ec0de1 | | |
| simple_pose_resnet101_v1d [1] | 73.0/92.2/80.8 | 74.2/92.4/82.0 | 1f8f48fd | | |
| simple_pose_resnet152_v1b [1] | 72.4/92.1/79.6 | 74.2/92.3/82.1 | ef4e0336 | | |
| simple_pose_resnet152_v1d [1] | 73.4/92.3/80.7 | 74.6/93.4/82.1 | 3ca502ea | | |
| simple_pose_resnet152_v1d [1] (384x288) | 74.8/92.3/82.0 | 76.1/92.4/83.2 | 2f544338 | | |
Mobile Pose Models¶
By replacing the backbone network and using a pixel shuffle layer instead of deconvolution for upsampling, we obtain models that are very fast (a minimal sketch of the pixel-shuffle head is shown below).
These models are suitable for edge-device applications; tutorials on deployment will come soon.
Models are trained with input size 256x192, unless otherwise specified.
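The pixel shuffle idea can be sketched in a few lines of Gluon. This is an illustrative sketch only, not the actual GluonCV mobile_pose head; the channel counts, normalization, and number of stages are assumptions. Each stage uses a 3x3 convolution to expand the channels by 4x and then depth_to_space to trade those channels for a 2x larger feature map, avoiding the parameters and compute of a deconvolution layer.

```python
from mxnet import gluon

class PixelShuffleUp(gluon.HybridBlock):
    """One 2x upsampling stage: 3x3 conv to 4*channels, then depth_to_space (pixel shuffle)."""
    def __init__(self, channels, **kwargs):
        super(PixelShuffleUp, self).__init__(**kwargs)
        with self.name_scope():
            self.conv = gluon.nn.Conv2D(channels * 4, kernel_size=3, padding=1)
            self.bn = gluon.nn.BatchNorm()

    def hybrid_forward(self, F, x):
        x = F.relu(self.bn(self.conv(x)))
        # Rearrange (N, 4*C, H, W) -> (N, C, 2H, 2W): upsampling without deconvolution weights.
        return F.depth_to_space(x, block_size=2)


def pixel_shuffle_head(num_joints=17, channels=64, num_stages=3):
    """Stack 2x stages (8x total upsampling for 3 stages) and predict one heatmap per joint."""
    head = gluon.nn.HybridSequential()
    with head.name_scope():
        for _ in range(num_stages):
            head.add(PixelShuffleUp(channels))
        head.add(gluon.nn.Conv2D(num_joints, kernel_size=1))
    return head
```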
| Model | OKS AP | OKS AP (with flip) | Hashtag | Training Command | Training log |
|---|---|---|---|---|---|
| mobile_pose_resnet18_v1b [1] | 66.2/89.2/74.3 | 67.9/90.3/75.7 | dd6644eb | | |
| mobile_pose_resnet50_v1b [1] | 71.1/91.3/78.7 | 72.4/92.3/79.8 | ec8809df | | |
| mobile_pose_mobilenet1.0 [1] | 64.1/88.1/71.2 | 65.7/89.2/73.4 | b399bac7 | | |
| mobile_pose_mobilenetv2_1.0 [1] | 63.7/88.1/71.0 | 65.0/89.2/72.3 | 4acdc130 | | |
| mobile_pose_mobilenetv3_large [1] | 63.7/88.9/70.8 | 64.5/89.0/72.0 | 1ca004dc | | |
| mobile_pose_mobilenetv3_small [1] | 54.3/83.7/59.4 | 55.6/84.7/61.7 | b1b148a9 | | |
AlphaPose¶
Check out the demo tutorial here: 2. Predict with pre-trained AlphaPose Estimation models
AlphaPose models are evaluated with input size 320x256, unless otherwise specified. Usage mirrors the simple pose section above; a minimal loading and inference sketch follows.
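As with simple pose, the pre-trained AlphaPose weights can be loaded through the model zoo. The sketch below assumes the AlphaPose-specific transform helpers detector_to_alpha_pose and heatmap_to_coord_alpha_pose from gluoncv.data.transforms.pose (the names used in the linked demo tutorial); otherwise the pipeline is the same as the simple-pose example above, and 'example.jpg' is again a placeholder image path.

```python
from gluoncv import model_zoo, data
from gluoncv.data.transforms.pose import detector_to_alpha_pose, heatmap_to_coord_alpha_pose

# Person detector + AlphaPose network (hashtag de56b871 corresponds to the default weights).
detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
pose_net = model_zoo.get_model('alpha_pose_resnet101_v1b_coco', pretrained=True)
detector.reset_class(['person'], reuse_weights=['person'])

# 'example.jpg' is a placeholder image path.
x, img = data.transforms.presets.yolo.load_test('example.jpg', short=512)
class_ids, scores, bboxes = detector(x)

# AlphaPose uses its own crop/decode helpers, but the flow matches simple pose.
pose_input, upscale_bbox = detector_to_alpha_pose(img, class_ids, scores, bboxes)
predicted_heatmap = pose_net(pose_input)
pred_coords, confidence = heatmap_to_coord_alpha_pose(predicted_heatmap, upscale_bbox)
```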
| Model | OKS AP | OKS AP (with flip) | Hashtag | Training Command | Training log |
|---|---|---|---|---|---|
| alpha_pose_resnet101_v1b_coco [2] | 74.2/91.6/80.7 | 76.7/92.6/82.9 | de56b871 | | |
PyTorch¶
Models implemented in PyTorch will be added later. Please check out our MXNet implementation in the meantime.
Reference¶
- [1] Xiao, Bin, Haiping Wu, and Yichen Wei. "Simple Baselines for Human Pose Estimation and Tracking." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
- [2] Fang, Hao-Shu, et al. "RMPE: Regional Multi-Person Pose Estimation." Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2017.