02. Predict with pre-trained Faster RCNN models¶
This article shows how to run inference with a pre-trained Faster RCNN model.
First let’s import some necessary libraries:
from matplotlib import pyplot as plt
import gluoncv
from gluoncv import model_zoo, data, utils
Load a pretrained model¶
Let’s get a Faster RCNN model trained on the Pascal VOC dataset with a ResNet-50 backbone. Specifying pretrained=True automatically downloads the model from the model zoo if necessary. For more pretrained models, please refer to Model Zoo.
The returned model is a HybridBlock gluoncv.model_zoo.FasterRCNN
with a default context of cpu(0).
net = model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True)
Out:
Downloading /root/.mxnet/models/faster_rcnn_resnet50_v1b_voc-447328d8.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/faster_rcnn_resnet50_v1b_voc-447328d8.zip...
121888KB [00:02, 58292.22KB/s]
Pre-process an image¶
Next we download an image and pre-process it with the preset data transforms. By default, the image is resized so that its short edge is 600px, but you can feed in an image of any size.
You can provide a list of image file names, such as [im_fname1, im_fname2, ...], to gluoncv.data.transforms.presets.rcnn.load_test() if you want to load multiple images together.
This function returns two results. The first is an NDArray with shape (batch_size, RGB_channels, height, width), which can be fed into the model directly. The second contains the images in numpy format, which makes them easy to plot. Since we only loaded a single image, the first dimension of x is 1.
Note that orig_img is the resized image (short edge 600px), not the image at its original resolution.
im_fname = utils.download('https://github.com/dmlc/web-data/blob/master/' +
                          'gluoncv/detection/biking.jpg?raw=true',
                          path='biking.jpg')
x, orig_img = data.transforms.presets.rcnn.load_test(im_fname)
Out:
Downloading biking.jpg from https://github.com/dmlc/web-data/blob/master/gluoncv/detection/biking.jpg?raw=true...
0%| | 0/244 [00:00<?, ?KB/s]
100%|##########| 244/244 [00:00<00:00, 56824.55KB/s]
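To make the short-edge resizing described above concrete, here is a rough sketch of the arithmetic in plain Python. It is not GluonCV's implementation: the function names are hypothetical, and the `max_size=1000` cap on the long edge is an assumption about the preset's default rather than something stated in this tutorial.

```python
def short_edge_resize(width, height, short=600, max_size=1000):
    """Return (new_width, new_height, scale) for a short-edge resize.

    The short edge is scaled to `short`. If that would push the long edge
    past `max_size` (an assumed cap), the scale is reduced so that the long
    edge equals `max_size` instead.
    """
    scale = short / min(width, height)
    if round(scale * max(width, height)) > max_size:
        scale = max_size / max(width, height)
    return round(width * scale), round(height * scale), scale


def to_original_coords(box, scale):
    """Map an (xmin, ymin, xmax, ymax) box from the resized image back to the original."""
    return tuple(v / scale for v in box)


print(short_edge_resize(640, 480))    # (800, 600, 1.25)
print(short_edge_resize(2000, 500))   # (1000, 250, 0.5), long edge capped
print(to_original_coords((100, 50, 300, 250), 1.25))  # (80.0, 40.0, 240.0, 200.0)
```

Since the model sees the resized image, predicted boxes should be in resized-image coordinates. Dividing by the same scale, as in `to_original_coords`, is one way to map them back to the original resolution.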
Inference and display¶
The Faster RCNN model returns predicted class IDs, confidence scores, and bounding box coordinates. Their shapes are (batch_size, num_bboxes, 1), (batch_size, num_bboxes, 1), and (batch_size, num_bboxes, 4), respectively.
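The shapes above can be illustrated with a small, library-free sketch that filters one image's detections by confidence. The toy values, the two-class name list, and the helper `filter_detections` are all hypothetical. Treating a negative class ID as an empty (padded) slot is an assumption about how unused entries are marked.

```python
def filter_detections(class_ids, scores, bboxes, class_names, thresh=0.5):
    """Keep the detections for one image whose score is at least `thresh`.

    Inputs are nested lists mirroring the per-image slices:
    class_ids and scores have shape (num_bboxes, 1), bboxes (num_bboxes, 4).
    """
    kept = []
    for (cid,), (score,), box in zip(class_ids, scores, bboxes):
        # Assumption: a negative class ID marks a padded, unused slot.
        if cid < 0 or score < thresh:
            continue
        kept.append((class_names[int(cid)], float(score), tuple(box)))
    return kept


# Toy values standing in for real model output (not actual predictions).
class_names = ["bicycle", "person"]
class_ids = [[1.0], [0.0], [1.0], [-1.0]]
scores = [[0.99], [0.95], [0.30], [-1.0]]
bboxes = [[10, 20, 110, 220], [30, 90, 200, 240], [5, 5, 20, 20], [-1, -1, -1, -1]]

detections = filter_detections(class_ids, scores, bboxes, class_names)
for name, score, box in detections:
    print(name, score, box)
# person 0.99 (10, 20, 110, 220)
# bicycle 0.95 (30, 90, 200, 240)
```

With real output, the NDArrays would first need converting to plain lists, for example via `.asnumpy().tolist()` on each per-image slice.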
We can use gluoncv.utils.viz.plot_bbox() to visualize the results. We slice the results for the first image and feed them into plot_bbox:
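A minimal sketch of that step, assuming `net`, `x`, and `orig_img` are the objects created earlier in this tutorial. The helper name `first_image_detections` is hypothetical. The commented `plot_bbox` call assumes the argument order (image, boxes, scores, labels) and that the model exposes its class names as `net.classes`, so check both against the GluonCV API reference.

```python
def first_image_detections(net, x):
    """Run the detector on a batch and slice out the results for image 0."""
    class_ids, scores, bboxes = net(x)
    return class_ids[0], scores[0], bboxes[0]


# With the objects from this tutorial (requires gluoncv and mxnet):
#   ids, scores, bboxes = first_image_detections(net, x)
#   ax = utils.viz.plot_bbox(orig_img, bboxes, scores, ids,
#                            class_names=net.classes)
#   plt.show()
```

Indexing with `[0]` drops the batch dimension, so each slice has shape (num_bboxes, 1) or (num_bboxes, 4).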
Total running time of the script: ( 0 minutes 5.991 seconds)