gluoncv.data

This module provides data loaders and transformers for popular vision datasets.

Hint

Please refer to Prepare Datasets for descriptions of the datasets listed on this page, and for how to download and extract them.

Hint

For small datasets such as MNIST and CIFAR10, please refer to GluonCV Datasets, which can be used directly without any downloading step.

ImageNet

gluoncv.data.ImageNet Load the ImageNet classification dataset.

Pascal VOC

gluoncv.data.VOCDetection Pascal VOC detection Dataset.
gluoncv.data.VOCSegmentation Pascal VOC Semantic Segmentation Dataset.
gluoncv.data.VOCAugSegmentation Pascal VOC Augmented Semantic Segmentation Dataset.

COCO

gluoncv.data.COCODetection MS COCO detection dataset.
gluoncv.data.COCOInstance MS COCO instance segmentation dataset.

ADE20K

gluoncv.data.ADE20KSegmentation ADE20K Semantic Segmentation Dataset.

Customized Dataset

gluoncv.data.LstDetection Detection dataset loaded from LST file and raw images.
gluoncv.data.RecordFileDetection Detection dataset loaded from record file.

API Reference

class gluoncv.data.ImageNet(root='~/.mxnet/datasets/imagenet', train=True, transform=None)[source]

Load the ImageNet classification dataset.

Refer to Prepare the ImageNet dataset for the description of this dataset and how to prepare it.

Parameters:
  • root (str, default '~/.mxnet/datasets/imagenet') – Path to the folder storing the dataset.
  • train (bool, default True) – Whether to load the training or validation set.
  • transform (function, default None) – A function that takes data and label and transforms them. Refer to ./transforms for examples.
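
Examples

A minimal usage sketch, assuming the ImageNet data has already been prepared under the default root; transform_fn here is a hypothetical helper that only resizes each image:

>>> import gluoncv
>>> from mxnet import image
>>> def transform_fn(data, label):
>>>     # hypothetical transform: resize every image to 224x224, keep the label unchanged
>>>     data = image.imresize(data, 224, 224)
>>>     return data, label
>>> train_dataset = gluoncv.data.ImageNet(train=True, transform=transform_fn)
>>> img, label = train_dataset[0]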
class gluoncv.data.VOCDetection(root='~/.mxnet/datasets/voc', splits=((2007, 'trainval'), (2012, 'trainval')), transform=None, index_map=None, preload_label=True)[source]

Pascal VOC detection Dataset.

Parameters:
  • root (str, default '~/.mxnet/datasets/voc') – Path to the folder storing the dataset.
  • splits (list of tuples, default ((2007, 'trainval'), (2012, 'trainval'))) – List of (year, name) combinations. For years, candidates can be: 2007, 2012. For names, candidates can be: ‘train’, ‘val’, ‘trainval’, ‘test’.
  • transform (callable, default None) –

    A function that takes data and label and transforms them. Refer to ./transforms for examples.

    A transform function for object detection should also take the label into consideration, because any geometric modification will require the label to be modified accordingly.

  • index_map (dict, default None) – By default, the 20 classes are mapped into indices from 0 to 19. We can customize it by providing a str-to-int dict specifying how to map class names to indices. For advanced users only, when you want to swap the order of class labels.
  • preload_label (bool, default True) – If True, parse and load all labels into memory during initialization. This often accelerates loading but requires more memory. Typical preloaded labels take tens of MB. You only need to disable it when your dataset is extremely large.
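
Examples

A minimal sketch, assuming the Pascal VOC data has been prepared under the default root:

>>> import gluoncv
>>> # Load the trainval splits of VOC2007 and VOC2012
>>> train_dataset = gluoncv.data.VOCDetection(
>>>     splits=((2007, 'trainval'), (2012, 'trainval')))
>>> # Each sample is an image together with its bounding boxes and class ids
>>> img, label = train_dataset[0]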
class gluoncv.data.VOCSegmentation(root='~/.mxnet/datasets/voc', split='train', mode=None, transform=None, **kwargs)[source]

Pascal VOC Semantic Segmentation Dataset.

Parameters:
  • root (string) – Path to the VOCdevkit folder. Default is ‘$(HOME)/.mxnet/datasets/voc’
  • split (string) – ‘train’, ‘val’ or ‘test’
  • transform (callable, optional) – A function that transforms the image

Examples

>>> import gluoncv
>>> from mxnet import gluon
>>> from mxnet.gluon.data.vision import transforms
>>> # Transforms for Normalization
>>> input_transform = transforms.Compose([
>>>     transforms.ToTensor(),
>>>     transforms.Normalize([.485, .456, .406], [.229, .224, .225]),
>>> ])
>>> # Create Dataset
>>> trainset = gluoncv.data.VOCSegmentation(split='train', transform=input_transform)
>>> # Create Training Loader
>>> train_data = gluon.data.DataLoader(
>>>     trainset, 4, shuffle=True, last_batch='rollover',
>>>     num_workers=4)
class gluoncv.data.VOCAugSegmentation(root='~/.mxnet/datasets/voc', split='train', mode=None, transform=None, **kwargs)[source]

Pascal VOC Augmented Semantic Segmentation Dataset.

Parameters:
  • root (string) – Path to the VOCdevkit folder. Default is ‘$(HOME)/.mxnet/datasets/voc’
  • split (string) – ‘train’ or ‘val’
  • transform (callable, optional) – A function that transforms the image

Examples

>>> import gluoncv
>>> from mxnet import gluon
>>> from mxnet.gluon.data.vision import transforms
>>> # Transforms for Normalization
>>> input_transform = transforms.Compose([
>>>     transforms.ToTensor(),
>>>     transforms.Normalize([.485, .456, .406], [.229, .224, .225]),
>>> ])
>>> # Create Dataset
>>> trainset = gluoncv.data.VOCAugSegmentation(split='train', transform=input_transform)
>>> # Create Training Loader
>>> train_data = gluon.data.DataLoader(
>>>     trainset, 4, shuffle=True, last_batch='rollover',
>>>     num_workers=4)
class gluoncv.data.COCODetection(root='~/.mxnet/datasets/coco', splits=('instances_val2017', ), transform=None, min_object_area=0, skip_empty=True, use_crowd=True)[source]

MS COCO detection dataset.

Parameters:
  • root (str, default '~/.mxnet/datasets/coco') – Path to the folder storing the dataset.
  • splits (list of str, default ['instances_val2017']) – JSON annotation names. Candidates can be: instances_val2017, instances_train2017.
  • transform (callable, default None) –

    A function that takes data and label and transforms them. Refer to ./transforms for examples.

    A transform function for object detection should also take the label into consideration, because any geometric modification will require the label to be modified accordingly.

  • min_object_area (float) – Minimum accepted ground-truth area; if an object’s area is smaller than this value, it will be ignored.
  • skip_empty (bool, default is True) – Whether to skip images with no valid objects. This should be True in training, otherwise it will cause undefined behavior.
  • use_crowd (bool, default is True) – Whether to use boxes labeled as crowd instances.
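
Examples

A minimal sketch, assuming the MS COCO images and annotations have been prepared under the default root:

>>> import gluoncv
>>> # Use the train2017 annotations for training
>>> train_dataset = gluoncv.data.COCODetection(splits=('instances_train2017',))
>>> # Each sample is an image together with its bounding boxes and class ids
>>> img, label = train_dataset[0]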
class gluoncv.data.COCOInstance(root='~/.mxnet/datasets/coco', splits=('instances_val2017', ), transform=None, min_object_area=1, skip_empty=True)[source]

MS COCO instance segmentation dataset.

Parameters:
  • root (str, default '~/.mxnet/datasets/coco') – Path to the folder storing the dataset.
  • splits (list of str, default ['instances_val2017']) – JSON annotation names. Candidates can be: instances_val2017, instances_train2017.
  • transform (callable, default None) –

    A function that takes data and label and transforms them. Refer to ./transforms for examples.

    A transform function for object detection should also take the label into consideration, because any geometric modification will require the label to be modified accordingly.

  • min_object_area (float, default is 1) – Minimum accepted ground-truth area; if an object’s area is smaller than this value, it will be ignored.
  • skip_empty (bool, default is True) – Whether to skip images with no valid objects. This should be True in training, otherwise it will cause undefined behavior.
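
Examples

A similar sketch for instance segmentation, assuming the same MS COCO layout; the exact return values may vary across GluonCV versions:

>>> import gluoncv
>>> val_dataset = gluoncv.data.COCOInstance(splits=('instances_val2017',))
>>> # Each sample contains the image, its bounding-box labels, and segmentation polygons
>>> # (return layout assumed here; check your installed version)
>>> img, label, segm = val_dataset[0]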
class gluoncv.data.ADE20KSegmentation(root='~/.mxnet/datasets/ade', split='train', mode=None, transform=None, **kwargs)[source]

ADE20K Semantic Segmentation Dataset.

Parameters:
  • root (string) – Path to the ADE20K folder. Default is ‘$(HOME)/.mxnet/datasets/ade’
  • split (string) – ‘train’, ‘val’ or ‘test’
  • transform (callable, optional) – A function that transforms the image

Examples

>>> import gluoncv
>>> from mxnet import gluon
>>> from mxnet.gluon.data.vision import transforms
>>> # Transforms for Normalization
>>> input_transform = transforms.Compose([
>>>     transforms.ToTensor(),
>>>     transforms.Normalize([.485, .456, .406], [.229, .224, .225]),
>>> ])
>>> # Create Dataset
>>> trainset = gluoncv.data.ADE20KSegmentation(split='train', transform=input_transform)
>>> # Create Training Loader
>>> train_data = gluon.data.DataLoader(
>>>     trainset, 4, shuffle=True, last_batch='rollover',
>>>     num_workers=4)
class gluoncv.data.DetectionDataLoader(dataset, batch_size=None, shuffle=False, sampler=None, last_batch=None, batch_sampler=None, batchify_fn=None, num_workers=0)[source]

Data loader for detection dataset.

Deprecated since version 0.2.0: DetectionDataLoader is deprecated, please use mxnet.gluon.data.DataLoader with batchify functions listed in gluoncv.data.batchify directly.

It loads data batches from a dataset and then applies data transformations. It’s a subclass of mxnet.gluon.data.DataLoader, and therefore has very similar APIs.

The main purpose of this DataLoader is to pad the variable-length labels from each image, because different images contain different numbers of objects.

Parameters:
  • dataset (mxnet.gluon.data.Dataset or numpy.ndarray or mxnet.ndarray.NDArray) – The source dataset.
  • batch_size (int) – The size of mini-batch.
  • shuffle (bool, default False) – Whether to randomly shuffle the samples. Typically use True for the training dataset and False for validation/test datasets.
  • sampler (mxnet.gluon.data.Sampler, default None) – The sampler to use. We should either specify a sampler or enable shuffle, not both, because random shuffling is a sampling method.
  • last_batch ({'keep', 'discard', 'rollover'}, default is keep) –

    How to handle the last batch if the number of examples in the dataset is not evenly divisible by the batch size. There are three options to deal with the last batch if its size is smaller than the specified batch size.

    • keep: keep it
    • discard: throw it away
    • rollover: insert the examples to the beginning of the next batch
  • batch_sampler (mxnet.gluon.data.BatchSampler) – A sampler that returns mini-batches. Do not specify batch_size, shuffle, sampler, and last_batch if batch_sampler is specified.
  • batchify_fn (callable) –

    Callback function to allow users to specify how to merge samples into a batch. Defaults to gluoncv.data.dataloader.default_pad_batchify_fn():

    from mxnet import nd
    import numpy as np

    def default_pad_batchify_fn(data):
        # Collate samples into a batch; variable-length labels are padded with -1.
        if isinstance(data[0], nd.NDArray):
            return nd.stack(*data)
        elif isinstance(data[0], tuple):
            # recurse over each field of the sample tuple (e.g. image, label)
            data = zip(*data)
            return [default_pad_batchify_fn(i) for i in data]
        else:
            data = np.asarray(data)
            pad = max([l.shape[0] for l in data])
            buf = np.full((len(data), pad, data[0].shape[-1]),
                          -1, dtype=data[0].dtype)
            for i, l in enumerate(data):
                buf[i][:l.shape[0], :] = l
            return nd.array(buf, dtype=data[0].dtype)
    
  • num_workers (int, default 0) – The number of multiprocessing workers to use for data preprocessing. If num_workers = 0, multiprocessing is disabled. Otherwise num_workers multiprocessing workers are used to process data.
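
Examples

A sketch of the recommended replacement: pad variable-length detection labels with the batchify helpers from gluoncv.data.batchify. This assumes a prepared VOC dataset and uses the SSD preset transform to resize every image to a fixed shape:

>>> import gluoncv
>>> from mxnet import gluon
>>> from gluoncv.data.batchify import Tuple, Stack, Pad
>>> from gluoncv.data.transforms.presets.ssd import SSDDefaultValTransform
>>> train_dataset = gluoncv.data.VOCDetection(splits=((2007, 'trainval'),))
>>> # Stack resized images into a batch and pad labels with -1
>>> batchify_fn = Tuple(Stack(), Pad(pad_val=-1))
>>> train_loader = gluon.data.DataLoader(
>>>     train_dataset.transform(SSDDefaultValTransform(300, 300)),
>>>     batch_size=4, shuffle=True, batchify_fn=batchify_fn,
>>>     last_batch='rollover')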
class gluoncv.data.LstDetection(filename, root='', flag=1, coord_normalized=True)[source]

Detection dataset loaded from an LST file and raw images. An LST file is a plain text file with a special label format.

Check out 1. Preferred Object Detection Format for GluonCV and MXNet for a tutorial on how to prepare this file.

Parameters:
  • filename (str) – Path to the LST file.
  • root (str) – Relative image root folder for the filenames in the LST file.
  • flag (int, default is 1) – Use 1 for color images, and 0 for gray images.
  • coord_normalized (boolean) – Indicates whether bounding box coordinates have been normalized to (0, 1) in the labels. If so, they will be rescaled back to absolute coordinates by multiplying by the image width or height.
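
Examples

A minimal sketch; 'train.lst' and './images' are hypothetical paths to your own LST file and image folder:

>>> from gluoncv.data import LstDetection
>>> # 'train.lst' and './images' are placeholder paths, replace with your own
>>> lst_dataset = LstDetection('train.lst', root='./images')
>>> img, label = lst_dataset[0]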
class gluoncv.data.RecordFileDetection(filename, coord_normalized=True)[source]

Detection dataset loaded from a record file. The supported record file uses the same format as mxnet.image.ImageDetIter() and mxnet.io.ImageDetRecordIter().

Check out 1. Preferred Object Detection Format for GluonCV and MXNet for a tutorial on how to prepare this file.

Note

We suggest you use RecordFileDetection only if you are familiar with record files.

Parameters:
  • filename (str) – Path of the record file. It requires both *.rec and *.idx files in the same directory: raw images and labels are stored in the *.rec file for better IO performance, while the *.idx file provides random access to the binary file.
  • coord_normalized (boolean) – Indicates whether bounding box coordinates have been normalized to (0, 1) in the labels. If so, they will be rescaled back to absolute coordinates by multiplying by the image width or height.

Examples

>>> record_dataset = RecordFileDetection('train.rec')
>>> img, label = record_dataset[0]
>>> print(img.shape, label.shape)
(512, 512, 3) (1, 5)