2. Predict with pre-trained AlphaPose Estimation models

This tutorial shows how to run inference with pre-trained Alpha Pose models in only a few lines of code.

First, let’s import the necessary libraries:

from matplotlib import pyplot as plt
from gluoncv import model_zoo, data, utils
from gluoncv.data.transforms.pose import detector_to_alpha_pose, heatmap_to_coord_alpha_pose

Load a pretrained model

Let’s get an Alpha Pose model trained on the MS COCO dataset with input images of size 256x192. We pick the one that uses ResNet-101 V1b as the base model. With pretrained=True, the weights are downloaded automatically from the model zoo if they are not already cached locally. For more pretrained models, please refer to Model Zoo.

Note that an Alpha Pose model takes a top-down approach: it estimates human poses inside the bounding boxes produced by an object detection model, so we load a detector as well.

detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
pose_net = model_zoo.get_model('alpha_pose_resnet101_v1b_coco', pretrained=True)

# Note that we can reset the classes of the detector to only include
# human, so that the NMS process is faster.

detector.reset_class(["person"], reuse_weights=['person'])

Out:

Downloading /root/.mxnet/models/alpha_pose_resnet101_v1b_coco-de56b871.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/alpha_pose_resnet101_v1b_coco-de56b871.zip...

216179KB [00:03, 70046.85KB/s]

Pre-process an image for detector, and make inference

Next we download an image and pre-process it with the preset data transforms. Here we resize the short edge of the image to 512 px, but you can feed an arbitrarily sized image.

This function returns two results. The first is an NDArray with shape (batch_size, RGB_channels, height, width) that can be fed into the model directly. The second contains the image in NumPy format, which is easy to plot. Since we loaded only a single image, the first dimension of x is 1.

im_fname = utils.download('https://github.com/dmlc/web-data/blob/master/' +
                          'gluoncv/pose/soccer.png?raw=true',
                          path='soccer.png')
x, img = data.transforms.presets.yolo.load_test(im_fname, short=512)
print('Shape of pre-processed image:', x.shape)

class_IDs, scores, bounding_boxs = detector(x)

Out:

Shape of pre-processed image: (1, 3, 512, 605)
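The short-edge resize described above can be sketched in plain Python. This is an illustration of the arithmetic only, not the code used by load_test: the helper name short_edge_resize_shape, the max_size cap of 1024, and the 720x1280 example image are all assumptions made for this sketch.

```python
def short_edge_resize_shape(height, width, short=512, max_size=1024):
    """Return (new_height, new_width) after scaling the shorter edge to `short`.

    Capping the longer edge at `max_size` is an assumption of this sketch.
    """
    scale = short / min(height, width)
    # If the longer edge would exceed the cap, scale by the longer edge instead.
    if round(scale * max(height, width)) > max_size:
        scale = max_size / max(height, width)
    return int(round(height * scale)), int(round(width * scale))


# Hypothetical 720x1280 (height x width) input image.
print(short_edge_resize_shape(720, 1280))  # (512, 910)
```

The aspect ratio is preserved, which is why the width in the output above (605) is not a round number.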

Process tensor from detector to keypoint network

Next we process the output from the detector.

An Alpha Pose network expects an input of size 256x192 with the person centered. We crop the bounding-box area for each person, resize it to 256x192, and finally normalize it.

To make sure the bounding box includes the entire person, we usually enlarge the box slightly before cropping.

pose_input, upscale_bbox = detector_to_alpha_pose(img, class_IDs, scores, bounding_boxs)
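To make the box enlargement concrete, here is a minimal sketch that grows a box about its center and clips it to the image. The helper upscale_box and the factor 1.25 are illustrative assumptions; detector_to_alpha_pose may use a different rule internally.

```python
def upscale_box(box, img_width, img_height, scale=1.25):
    """Enlarge an (xmin, ymin, xmax, ymax) box about its center, clipped to the image.

    `scale` is a hypothetical enlargement factor chosen for illustration.
    """
    xmin, ymin, xmax, ymax = box
    cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    half_w = (xmax - xmin) * scale / 2.0
    half_h = (ymax - ymin) * scale / 2.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(img_width - 1), cx + half_w),
        min(float(img_height - 1), cy + half_h),
    )


# A 100x200 box inside a 605x512 (width x height) image.
print(upscale_box((100, 50, 200, 250), img_width=605, img_height=512))
# (87.5, 25.0, 212.5, 275.0)
```

Clipping matters for people near the image border, where the enlarged box would otherwise extend past the pixels that exist.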

Predict with an Alpha Pose network

Now we can make a prediction.

An Alpha Pose network predicts a heatmap for each joint (i.e. keypoint). After inference, we search for the highest value in each heatmap and map its location to coordinates in the original image.

predicted_heatmap = pose_net(pose_input)
pred_coords, confidence = heatmap_to_coord_alpha_pose(predicted_heatmap, upscale_bbox)
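The heatmap-to-coordinate step can be sketched with NumPy for a single person. This is a simplified illustration, not the implementation of heatmap_to_coord_alpha_pose: the function heatmap_to_coords, the use of pixel centers (+0.5), and the toy 4x3 heatmaps are assumptions, and the library function may apply additional refinement.

```python
import numpy as np


def heatmap_to_coords(heatmaps, box):
    """Map per-joint heatmap peaks to image coordinates.

    heatmaps: array of shape (num_joints, H, W) for one person.
    box: (xmin, ymin, xmax, ymax) of the cropped region in image pixels.
    """
    num_joints, hm_h, hm_w = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)
    idx = flat.argmax(axis=1)          # index of the peak in each heatmap
    confidence = flat.max(axis=1)      # peak value serves as the confidence
    ys, xs = np.unravel_index(idx, (hm_h, hm_w))
    xmin, ymin, xmax, ymax = box
    # Rescale heatmap cells (taking cell centers) to the box in image space.
    coords = np.stack(
        [
            xmin + (xs + 0.5) * (xmax - xmin) / hm_w,
            ymin + (ys + 0.5) * (ymax - ymin) / hm_h,
        ],
        axis=1,
    )
    return coords, confidence


# Two toy 4x3 heatmaps with one peak each.
heatmaps = np.zeros((2, 4, 3))
heatmaps[0, 1, 2] = 0.9
heatmaps[1, 3, 0] = 0.6
coords, confidence = heatmap_to_coords(heatmaps, box=(10, 20, 40, 100))
print(coords)      # joint 0 at (35, 50), joint 1 at (15, 90)
print(confidence)  # peak values 0.9 and 0.6
```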

Display the pose estimation results

We can use gluoncv.utils.viz.plot_keypoints() to visualize the results.

ax = utils.viz.plot_keypoints(img, pred_coords, confidence,
                              class_IDs, bounding_boxs, scores,
                              box_thresh=0.5, keypoint_thresh=0.2)
plt.show()
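As a rough illustration of what a keypoint threshold does, the sketch below keeps only the limbs whose two endpoint joints both exceed the threshold. It assumes that this is how keypoint_thresh is applied; the LIMBS list and the drawable_limbs helper are hypothetical and are not part of gluoncv.

```python
# Hypothetical limb definitions as pairs of joint indices (illustrative only).
LIMBS = [(0, 1), (1, 2), (2, 3)]


def drawable_limbs(confidence, limbs=LIMBS, keypoint_thresh=0.2):
    """Return the limbs whose two endpoint joints both exceed the threshold."""
    return [
        (a, b)
        for a, b in limbs
        if confidence[a] > keypoint_thresh and confidence[b] > keypoint_thresh
    ]


# Joint 2 has low confidence, so both limbs touching it are dropped.
print(drawable_limbs([0.9, 0.8, 0.1, 0.7]))  # [(0, 1)]
```

Raising keypoint_thresh hides uncertain joints at the cost of sparser skeletons; box_thresh plays the analogous role for the detected bounding boxes.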

Total running time of the script: ( 0 minutes 7.768 seconds)
