gluoncv.data

This module provides data loaders and transformers for popular vision datasets.

Hint

Please refer to Prepare Datasets for a description of the datasets listed on this page, and for how to download and extract them.

Hint

For small datasets such as MNIST and CIFAR10, please refer to GluonCV Datasets, which can be used directly without any download step.

ImageNet

gluoncv.data.ImageNet

Load the ImageNet classification dataset.

Pascal VOC

gluoncv.data.VOCDetection

Pascal VOC detection Dataset.

gluoncv.data.VOCSegmentation

Pascal VOC Semantic Segmentation Dataset.

gluoncv.data.VOCAugSegmentation

Pascal VOC Augmented Semantic Segmentation Dataset.

COCO

gluoncv.data.COCODetection

MS COCO detection dataset.

gluoncv.data.COCOInstance

MS COCO instance segmentation dataset.

ADE20K

gluoncv.data.ADE20KSegmentation

ADE20K Semantic Segmentation Dataset.

Kinetics400

gluoncv.data.Kinetics400

Load the Kinetics400 video action recognition dataset.

Customized Dataset

gluoncv.data.LstDetection

Detection dataset loaded from LST file and raw images.

gluoncv.data.RecordFileDetection

Detection dataset loaded from record file.

API Reference

class gluoncv.data.ImageNet(root='~/.mxnet/datasets/imagenet', train=True, transform=None)[source]

Load the ImageNet classification dataset.

Refer to Prepare the ImageNet dataset for the description of this dataset and how to prepare it.

Parameters
  • root (str, default '~/.mxnet/datasets/imagenet') – Path to the folder storing the dataset.

  • train (bool, default True) – Whether to load the training or validation set.

  • transform (function, default None) – A function that takes data and label and transforms them. Refer to ./transforms for examples.

class gluoncv.data.VOCDetection(root='~/.mxnet/datasets/voc', splits=((2007, 'trainval'), (2012, 'trainval')), transform=None, index_map=None, preload_label=True)[source]

Pascal VOC detection Dataset.

Parameters
  • root (str, default '~/.mxnet/datasets/voc') – Path to folder storing the dataset.

  • splits (list of tuples, default ((2007, 'trainval'), (2012, 'trainval'))) – List of (year, name) combinations. For years, candidates can be: 2007, 2012. For names, candidates can be: ‘train’, ‘val’, ‘trainval’, ‘test’.

  • transform (callable, default None) –

    A function that takes data and label and transforms them. Refer to ./transforms for examples.

    A transform function for object detection should take label into consideration, because any geometric modification will require label to be modified.

  • index_map (dict, default None) – By default, the 20 classes are mapped to indices from 0 to 19. This can be customized by providing a str to int dict specifying how to map class names to indices. For advanced users only, when you want to swap the order of class labels.

  • preload_label (bool, default True) – If True, parse and load all labels into memory during initialization. This often speeds up loading but requires more memory. Preloaded labels typically take tens of MB. You only need to disable it when your dataset is extremely large.
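The requirement that a detection transform must also update the label can be illustrated with a horizontal flip. The following is a minimal sketch in plain Python, not GluonCV's own transform code. It assumes each label row is `[xmin, ymin, xmax, ymax, class_id]` in absolute pixel coordinates and that the image is a nested list of pixel rows; `flip_horizontal` is a hypothetical helper name.

```python
def flip_horizontal(image, label):
    """Mirror an image left-to-right and update its boxes to match.

    Assumptions (illustration only): `image` is a list of pixel rows and
    each label row is [xmin, ymin, xmax, ymax, class_id] in pixels.
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    flipped_label = []
    for xmin, ymin, xmax, ymax, class_id in label:
        # A point at x lands at (width - x) after mirroring, so the old
        # right edge becomes the new left edge and vice versa.
        flipped_label.append([width - xmax, ymin, width - xmin, ymax, class_id])
    return flipped_image, flipped_label


image = [[0, 1, 2, 3], [4, 5, 6, 7]]  # 2 rows x 4 columns
label = [[0, 0, 1, 2, 7]]             # one box hugging the left edge
new_image, new_label = flip_horizontal(image, label)
print(new_image)  # [[3, 2, 1, 0], [7, 6, 5, 4]]
print(new_label)  # [[3, 0, 4, 2, 7]]
```

If only the image were flipped, the box would still point at the left edge while the object had moved to the right edge, which is why the transform receives both data and label.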

class gluoncv.data.VOCSegmentation(root='/root/.mxnet/datasets/voc', split='train', mode=None, transform=None, **kwargs)[source]

Pascal VOC Semantic Segmentation Dataset.

Parameters
  • root (string) – Path to VOCdevkit folder. Default is ‘$(HOME)/.mxnet/datasets/voc’

  • split (string) – ‘train’, ‘val’ or ‘test’

  • transform (callable, optional) – A function that transforms the image

Examples

>>> import gluoncv
>>> from mxnet import gluon
>>> from mxnet.gluon.data.vision import transforms
>>> # Transforms for normalization
>>> input_transform = transforms.Compose([
...     transforms.ToTensor(),
...     transforms.Normalize([.485, .456, .406], [.229, .224, .225]),
... ])
>>> # Create the dataset
>>> trainset = gluoncv.data.VOCSegmentation(split='train', transform=input_transform)
>>> # Create the training loader (batch size 4)
>>> train_data = gluon.data.DataLoader(
...     trainset, 4, shuffle=True, last_batch='rollover',
...     num_workers=4)

class gluoncv.data.VOCAugSegmentation(root='/root/.mxnet/datasets/voc', split='train', mode=None, transform=None, **kwargs)[source]

Pascal VOC Augmented Semantic Segmentation Dataset.

Parameters
  • root (string) – Path to VOCdevkit folder. Default is ‘$(HOME)/.mxnet/datasets/voc’

  • split (string) – ‘train’ or ‘val’

  • transform (callable, optional) – A function that transforms the image

Examples

>>> import gluoncv
>>> from mxnet import gluon
>>> from mxnet.gluon.data.vision import transforms
>>> # Transforms for normalization
>>> input_transform = transforms.Compose([
...     transforms.ToTensor(),
...     transforms.Normalize([.485, .456, .406], [.229, .224, .225]),
... ])
>>> # Create the dataset
>>> trainset = gluoncv.data.VOCAugSegmentation(split='train', transform=input_transform)
>>> # Create the training loader (batch size 4)
>>> train_data = gluon.data.DataLoader(
...     trainset, 4, shuffle=True, last_batch='rollover',
...     num_workers=4)

class gluoncv.data.COCODetection(root='~/.mxnet/datasets/coco', splits=('instances_val2017'), transform=None, min_object_area=0, skip_empty=True, use_crowd=True)[source]

MS COCO detection dataset.

Parameters
  • root (str, default '~/.mxnet/datasets/coco') – Path to folder storing the dataset.

  • splits (list of str, default ['instances_val2017']) – Names of the JSON annotation files. Candidates can be: instances_val2017, instances_train2017.

  • transform (callable, default None) –

    A function that takes data and label and transforms them. Refer to ./transforms for examples.

    A transform function for object detection should take label into consideration, because any geometric modification will require label to be modified.

  • min_object_area (float, default is 0) – Minimum accepted ground-truth area. If an object’s area is smaller than this value, it will be ignored.

  • skip_empty (bool, default is True) – Whether to skip images with no valid object. This should be True in training; otherwise it will cause undefined behavior.

  • use_crowd (bool, default is True) – Whether to use boxes labeled as crowd instances.
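How `min_object_area`, `use_crowd` and `skip_empty` could interact is sketched below in plain Python. This is an illustration of the behavior described above, not GluonCV's actual filtering code. The `area` and `iscrowd` keys follow the COCO annotation format; `filter_annotations`, `select_images` and the sample data are hypothetical.

```python
def filter_annotations(annotations, min_object_area=0, use_crowd=True):
    """Keep only annotations that pass the area and crowd checks."""
    kept = []
    for ann in annotations:
        if ann["area"] < min_object_area:
            continue  # object is smaller than the accepted minimum
        if ann.get("iscrowd", 0) and not use_crowd:
            continue  # crowd boxes were explicitly disabled
        kept.append(ann)
    return kept


def select_images(images, min_object_area=0, use_crowd=True, skip_empty=True):
    """Return (image_id, valid_annotations) pairs for the usable images."""
    selected = []
    for image_id, annotations in images.items():
        valid = filter_annotations(annotations, min_object_area, use_crowd)
        if skip_empty and not valid:
            continue  # no valid object left, so the image is dropped
        selected.append((image_id, valid))
    return selected


images = {
    "a.jpg": [{"area": 400.0, "iscrowd": 0}, {"area": 4.0, "iscrowd": 0}],
    "b.jpg": [{"area": 900.0, "iscrowd": 1}],
    "c.jpg": [],
}
result = select_images(images, min_object_area=16, use_crowd=False)
print([image_id for image_id, _ in result])  # ['a.jpg']
```

In this sketch an image can become empty as a side effect of the other two settings: `b.jpg` has an annotation, but it is a crowd box, so with `use_crowd=False` the image is skipped.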

class gluoncv.data.COCOInstance(root='~/.mxnet/datasets/coco', splits=('instances_val2017'), transform=None, min_object_area=1, skip_empty=True)[source]

MS COCO instance segmentation dataset.

Parameters
  • root (str, default '~/.mxnet/datasets/coco') – Path to folder storing the dataset.

  • splits (list of str, default ['instances_val2017']) – Names of the JSON annotation files. Candidates can be: instances_val2017, instances_train2017.

  • transform (callable, default None) –

    A function that takes data and label and transforms them. Refer to ./transforms for examples.

    A transform function for object detection should take label into consideration, because any geometric modification will require label to be modified.

  • min_object_area (float, default is 1) – Minimum accepted ground-truth area. If an object’s area is smaller than this value, it will be ignored.

  • skip_empty (bool, default is True) – Whether to skip images with no valid object. This should be True in training; otherwise it will cause undefined behavior.

class gluoncv.data.ADE20KSegmentation(root='/root/.mxnet/datasets/ade', split='train', mode=None, transform=None, **kwargs)[source]

ADE20K Semantic Segmentation Dataset.

Parameters
  • root (string) – Path to the ADE20K dataset folder. Default is ‘$(HOME)/.mxnet/datasets/ade’

  • split (string) – ‘train’, ‘val’ or ‘test’

  • transform (callable, optional) – A function that transforms the image

Examples

>>> import gluoncv
>>> from mxnet import gluon
>>> from mxnet.gluon.data.vision import transforms
>>> # Transforms for normalization
>>> input_transform = transforms.Compose([
...     transforms.ToTensor(),
...     transforms.Normalize([.485, .456, .406], [.229, .224, .225]),
... ])
>>> # Create the dataset
>>> trainset = gluoncv.data.ADE20KSegmentation(split='train', transform=input_transform)
>>> # Create the training loader (batch size 4)
>>> train_data = gluon.data.DataLoader(
...     trainset, 4, shuffle=True, last_batch='rollover',
...     num_workers=4)

class gluoncv.data.Kinetics400(root='/root/.mxnet/datasets/kinetics400/rawframes_train', setting='/root/.mxnet/datasets/kinetics400/kinetics400_train_list_rawframes.txt', train=True, test_mode=False, name_pattern='img_%05d.jpg', video_ext='mp4', is_color=True, modality='rgb', num_segments=1, num_crop=1, new_length=1, new_step=1, new_width=340, new_height=256, target_width=224, target_height=224, temporal_jitter=False, video_loader=False, use_decord=False, slowfast=False, slow_temporal_stride=16, fast_temporal_stride=2, data_aug='v1', lazy_init=False, transform=None)[source]

Load the Kinetics400 video action recognition dataset.

Refer to Prepare the Kinetics400 dataset for the description of this dataset and how to prepare it.

Parameters
  • root (str, required. Default '~/.mxnet/datasets/kinetics400/rawframes_train'.) – Path to the root folder storing the dataset.

  • setting (str, required.) – A text file describing the dataset, with one line per video sample. Each line contains three items: (1) video path; (2) video length and (3) video label.

  • train (bool, default True.) – Whether to load the training or validation set.

  • test_mode (bool, default False.) – Whether to perform evaluation on the test set. Usually a three-crop or ten-crop evaluation strategy is involved.

  • name_pattern (str, default 'img_%05d.jpg'.) – The naming pattern of the decoded video frames. For example, img_00012.jpg.

  • video_ext (str, default 'mp4'.) – If video_loader is set to True, please specify the video format accordingly.

  • is_color (bool, default True.) – Whether the loaded image is color or grayscale.

  • modality (str, default 'rgb'.) – Input modality. Only rgb video frames are supported for now. Support for rgb difference images and optical flow images will be added later.

  • num_segments (int, default 1.) – Number of segments into which the video is evenly divided. A useful technique to obtain global video-level information. See Limin Wang et al., Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, ECCV 2016.

  • num_crop (int, default 1.) – Number of crops for each image. Common choices are three crops and ten crops during evaluation.

  • new_length (int, default 1.) – The length of the input video clip. Default is a single image, but it can be multiple video frames. For example, new_length=16 means we will extract a video clip of 16 consecutive frames.

  • new_step (int, default 1.) – Temporal sampling rate. For example, new_step=1 means we will extract a video clip of consecutive frames. new_step=2 means we will extract a video clip of every other frame.

  • new_width (int, default 340.) – Scale the width of loaded image to ‘new_width’ for later multiscale cropping and resizing.

  • new_height (int, default 256.) – Scale the height of loaded image to ‘new_height’ for later multiscale cropping and resizing.

  • target_width (int, default 224.) – Scale the width of transformed image to the same ‘target_width’ for batch forwarding.

  • target_height (int, default 224.) – Scale the height of transformed image to the same ‘target_height’ for batch forwarding.

  • temporal_jitter (bool, default False.) – Whether to temporally jitter if new_step > 1.

  • video_loader (bool, default False.) – Whether to use video loader to load data.

  • use_decord (bool, default False.) – Whether to use Decord video loader to load data. Otherwise use mmcv video loader.

  • transform (function, default None.) – A function that takes data and label and transforms them.

  • slowfast (bool, default False.) – If set to True, use the data loader designed for SlowFast networks. See Christoph Feichtenhofer et al., SlowFast Networks for Video Recognition, ICCV 2019.

  • slow_temporal_stride (int, default 16.) – The temporal stride for sparse sampling of video frames in slow branch of a SlowFast network.

  • fast_temporal_stride (int, default 2.) – The temporal stride for sparse sampling of video frames in fast branch of a SlowFast network.

  • data_aug (str, default 'v1'.) – Type of data augmentation to apply. Supports v1, v2, v3 and v4.

  • lazy_init (bool, default False.) – If set to True, build a dataset instance without loading any dataset.
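The way `num_segments`, `new_length` and `new_step` combine can be sketched as an index computation. The plain-Python example below is a simplified, deterministic illustration under stated assumptions: each clip starts at the first frame of its segment, and `clip_frame_indices` is a hypothetical helper. GluonCV's actual sampling (for example random offsets during training) may differ.

```python
def clip_frame_indices(num_frames, num_segments=1, new_length=1, new_step=1):
    """Return one list of frame indices per segment.

    The video is split into `num_segments` equal parts; from each part a
    clip of `new_length` frames is taken, spaced `new_step` frames apart.
    """
    span = (new_length - 1) * new_step + 1  # frames covered by one clip
    segment_length = num_frames // num_segments
    if segment_length < span:
        raise ValueError("video is too short for the requested clips")
    clips = []
    for segment in range(num_segments):
        start = segment * segment_length  # assumption: clip starts the segment
        clips.append([start + i * new_step for i in range(new_length)])
    return clips


# 12 frames, 3 segments, 2 frames per clip, taking every other frame
print(clip_frame_indices(12, num_segments=3, new_length=2, new_step=2))
# [[0, 2], [4, 6], [8, 10]]
```

With the defaults (`num_segments=1`, `new_length=1`, `new_step=1`) the function returns a single one-frame clip, matching the "single image" default described for `new_length`.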

class gluoncv.data.DetectionDataLoader(dataset, batch_size=None, shuffle=False, sampler=None, last_batch=None, batch_sampler=None, batchify_fn=None, num_workers=0)[source]

Data loader for detection dataset.

Deprecated since version 0.2.0: DetectionDataLoader is deprecated, please use mxnet.gluon.data.DataLoader with batchify functions listed in gluoncv.data.batchify directly.

It loads data batches from a dataset and then applies data transformations. It’s a subclass of mxnet.gluon.data.DataLoader, and therefore has very similar APIs.

The main purpose of the DataLoader is to pad the variable-length labels of each image, because images contain different numbers of objects.
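The padding idea can be shown without MXNet. The sketch below is a plain-Python illustration, not the loader's implementation: it assumes each label is a list of equal-width rows (here `[xmin, ymin, xmax, ymax, class_id]`) and that the first image has at least one object. The fill value of -1 mirrors the default batchify function listed for this class; `pad_labels` is a hypothetical helper name.

```python
def pad_labels(labels, pad_value=-1):
    """Pad per-image label lists so every image has the same number of rows."""
    max_objects = max(len(rows) for rows in labels)
    row_width = len(labels[0][0])  # assumption: first image has an object
    padded = []
    for rows in labels:
        missing = max_objects - len(rows)
        filler = [[pad_value] * row_width for _ in range(missing)]
        padded.append([list(row) for row in rows] + filler)
    return padded


batch = pad_labels([
    [[10, 20, 50, 60, 0]],                         # image with one object
    [[5, 5, 30, 40, 1], [15, 25, 35, 45, 2]],      # image with two objects
])
for rows in batch:
    print(rows)
# [[10, 20, 50, 60, 0], [-1, -1, -1, -1, -1]]
# [[5, 5, 30, 40, 1], [15, 25, 35, 45, 2]]
```

After padding, every image contributes the same number of rows, so the labels can be stacked into a single array of shape (batch, max_objects, row_width); downstream code can recognize padded rows by the -1 fill value.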

Parameters
  • dataset (mxnet.gluon.data.Dataset or numpy.ndarray or mxnet.ndarray.NDArray) – The source dataset.

  • batch_size (int) – The size of mini-batch.

  • shuffle (bool, default False) – Whether to randomly shuffle the samples. Typically True for the training dataset and False for validation/test datasets.

  • sampler (mxnet.gluon.data.Sampler, default None) – The sampler to use. We should either specify a sampler or enable shuffle, not both, because random shuffling is a sampling method.

  • last_batch ({'keep', 'discard', 'rollover'}, default is keep) –

    How to handle the last batch if the number of examples in the dataset is not evenly divisible by the batch size. There are three options to deal with the last batch if its size is smaller than the specified batch size.

    • keep: keep it

    • discard: throw it away

    • rollover: carry the remaining examples over and insert them at the beginning of the next epoch's first batch

  • batch_sampler (mxnet.gluon.data.BatchSampler) – A sampler that returns mini-batches. Do not specify batch_size, shuffle, sampler, and last_batch if batch_sampler is specified.

  • batchify_fn (callable) –

    Callback function to allow users to specify how to merge samples into a batch. Defaults to gluoncv.data.dataloader.default_pad_batchify_fn():

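    import numpy as np
    from mxnet import nd

    def default_pad_batchify_fn(data):
        if isinstance(data[0], nd.NDArray):
            # Samples already share a shape: stack them along a new batch axis.
            return nd.stack(*data)
        elif isinstance(data[0], tuple):
            # Each sample is a tuple (e.g. image, label): batchify field by field.
            data = zip(*data)
            return [default_pad_batchify_fn(i) for i in data]
        else:
            # Variable-length labels: pad every sample to the longest one with -1.
            data = np.asarray(data)
            pad = max([l.shape[0] for l in data])
            buf = np.full((len(data), pad, data[0].shape[-1]),
                          -1, dtype=data[0].dtype)
            for i, l in enumerate(data):
                buf[i][:l.shape[0], :] = l
            return nd.array(buf, dtype=data[0].dtype)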
    

  • num_workers (int, default 0) – The number of multiprocessing workers to use for data preprocessing. If num_workers = 0, multiprocessing is disabled. Otherwise, num_workers multiprocessing workers are used to process data.
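The three `last_batch` options can be compared with a small plain-Python sketch. It only mimics the semantics described in the parameter list and is not the MXNet implementation. It assumes that with `rollover` the leftover samples are carried into the start of the next epoch; `epoch_batches` is a hypothetical helper name.

```python
def epoch_batches(indices, batch_size, last_batch="keep", carried=None):
    """Split one epoch into batches; return (batches, leftover_for_next_epoch)."""
    samples = list(carried or []) + list(indices)
    full = len(samples) // batch_size * batch_size
    batches = [samples[i:i + batch_size] for i in range(0, full, batch_size)]
    remainder = samples[full:]
    if not remainder:
        return batches, []
    if last_batch == "keep":
        return batches + [remainder], []   # emit the short batch as-is
    if last_batch == "discard":
        return batches, []                 # drop the short batch
    if last_batch == "rollover":
        return batches, remainder          # save it for the next epoch
    raise ValueError("last_batch must be 'keep', 'discard' or 'rollover'")


print(epoch_batches(range(5), 2, "keep"))      # ([[0, 1], [2, 3], [4]], [])
print(epoch_batches(range(5), 2, "discard"))   # ([[0, 1], [2, 3]], [])
first, leftover = epoch_batches(range(5), 2, "rollover")
print(first, leftover)                         # [[0, 1], [2, 3]] [4]
print(epoch_batches(range(5), 2, "rollover", carried=leftover))
# ([[4, 0], [1, 2], [3, 4]], [])
```

With `rollover`, no sample is lost and every batch has the full size, at the cost of epochs that contain slightly different numbers of batches.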

class gluoncv.data.LstDetection(filename, root='', flag=1, coord_normalized=True)[source]

Detection dataset loaded from LST file and raw images. LST file is a pure text file but with special label format.

Check out 1. Preferred Object Detection Format for GluonCV and MXNet for a tutorial on how to prepare this file.

Parameters
  • filename (str) – Path to the LST file.

  • root (str) – Relative image root folder for filenames in LST file.

  • flag (int, default is 1) – Use 1 for color images, and 0 for gray images.

  • coord_normalized (boolean) – Indicates whether bounding box coordinates have been normalized to (0, 1) in labels. If so, they are rescaled back to absolute coordinates by multiplying by the image width or height.
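The rescaling implied by `coord_normalized=True` is a per-axis multiplication. A minimal plain-Python sketch follows; it assumes boxes are given as `[xmin, ymin, xmax, ymax]`, and `to_absolute` is a hypothetical helper rather than part of the GluonCV API.

```python
def to_absolute(boxes, width, height):
    """Convert boxes normalized to (0, 1) into absolute pixel coordinates."""
    return [
        # x coordinates scale with the image width, y with the height
        [xmin * width, ymin * height, xmax * width, ymax * height]
        for xmin, ymin, xmax, ymax in boxes
    ]


print(to_absolute([[0.25, 0.5, 0.75, 1.0]], width=640, height=480))
# [[160.0, 240.0, 480.0, 480.0]]
```

If the labels already hold pixel coordinates, passing `coord_normalized=False` avoids this multiplication being applied a second time.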

class gluoncv.data.RecordFileDetection(filename, coord_normalized=True)[source]

Detection dataset loaded from record file. The supported record file uses the same format as mxnet.image.ImageDetIter() and mxnet.io.ImageDetRecordIter().

Check out 1. Preferred Object Detection Format for GluonCV and MXNet for a tutorial on how to prepare this file.

Note

We suggest using RecordFileDetection only if you are familiar with record files.

Parameters
  • filename (str) – Path to the record file. Both the *.rec and *.idx files are required in the same directory: raw images and labels are stored in the *.rec file for better IO performance, and the *.idx file provides random access to the binary file.

  • coord_normalized (boolean) – Indicates whether bounding box coordinates have been normalized to (0, 1) in labels. If so, they are rescaled back to absolute coordinates by multiplying by the image width or height.

Examples

>>> from gluoncv.data import RecordFileDetection
>>> record_dataset = RecordFileDetection('train.rec')
>>> img, label = record_dataset[0]
>>> print(img.shape, label.shape)
(512, 512, 3) (1, 5)