Table Of Contents
Table Of Contents

09. Run an object detection model on your webcam

This article will shows how to play with pre-trained object detection models by running them directly on your webcam video stream.


  • This tutorial has only been tested in a MacOS environment

  • Python packages required: cv2, matplotlib

  • You need a webcam :)

  • Python compatible with matplotlib rendering, installed as a framework in MacOS see guide here

Loading the model and webcam

Finished preparation? Let’s get started! First, import the necessary libraries into python.

import time

import cv2
import gluoncv as gcv
import mxnet as mx

In this tutorial we use ssd_512_mobilenet1.0_voc, a snappy network with good accuracy that should be well above 1 frame per second on most laptops. Feel free to try a different model from the Gluon Model Zoo !

# Load the model
net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_voc', pretrained=True)

We create the webcam handler in opencv to be able to acquire the frames:

# Load the webcam handler
cap = cv2.VideoCapture(0)
time.sleep(1) ### letting the camera autofocus

Detection loop

The detection loop consists of four phases:

  • loading the webcam frame

  • pre-processing the image

  • running the image through the network

  • updating the output with the resulting predictions

axes = None
NUM_FRAMES = 200 # you can change this
for i in range(NUM_FRAMES):
    # Load frame from the camera
    ret, frame =

    # Image pre-processing
    frame = mx.nd.array(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).astype('uint8')
    rgb_nd, frame =, short=512, max_size=700)

    # Run frame through network
    class_IDs, scores, bounding_boxes = net(rgb_nd)

    # Display the result
    img = gcv.utils.viz.cv_plot_bbox(frame, bounding_boxes[0], scores[0], class_IDs[0], class_names=net.classes)

We release the webcam before exiting the script



Download the script to run the demo:


Run the script using pythonw on MacOS:

pythonw --num-frames 200


On MacOS, to enable matplotlib rendering you need python installed as a framework, see guide here

If all goes well you should be able to detect objects from the available classes of the VOC dataset. That includes persons, chairs and TV Screens!

Total running time of the script: ( 0 minutes 0.066 seconds)

Gallery generated by Sphinx-Gallery