gluoncv.data
This module provides data loaders and transformers for popular vision datasets.
Hint
Please refer to Prepare Datasets for a description of the datasets listed on this page and for instructions on downloading and extracting them.
Hint
For small datasets such as MNIST and CIFAR10, please refer to GluonCV Datasets, which can be used directly without any download step.
Pascal VOC
VOCDetection: Pascal VOC detection Dataset.
VOCSegmentation: Pascal VOC Semantic Segmentation Dataset.
VOCAugSegmentation: Pascal VOC Augmented Semantic Segmentation Dataset.
Kinetics400
Kinetics400: Load the Kinetics400 video action recognition dataset.
Customized Dataset
LstDetection: Detection dataset loaded from LST file and raw images.
RecordFileDetection: Detection dataset loaded from record file.
API Reference
class gluoncv.data.ImageNet(root='~/.mxnet/datasets/imagenet', train=True, transform=None)
Load the ImageNet classification dataset.
Refer to Prepare the ImageNet dataset for the description of this dataset and how to prepare it.
- Parameters
root (str, default '~/.mxnet/datasets/imagenet') – Path to the folder storing the dataset.
train (bool, default True) – Whether to load the training or validation set.
transform (function, default None) – A function that takes data and label and transforms them. Refer to ./transforms for examples.
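The transform parameter is described as a function that takes data and label and returns the transformed pair. As a rough illustration of that contract only, the sketch below uses a hypothetical in-memory dataset (ToyDataset is not part of GluonCV) that calls transform(data, label) for each sample. Real datasets return image arrays rather than Python lists.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset that accepts a `transform` callable."""

    def __init__(self, samples, transform=None):
        self._samples = samples      # list of (data, label) pairs
        self._transform = transform

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, idx):
        data, label = self._samples[idx]
        if self._transform is not None:
            # The transform receives both items and returns both items.
            return self._transform(data, label)
        return data, label


def scale_to_unit(data, label):
    """Example transform: scale 8-bit pixel values to [0, 1], keep the label."""
    return [v / 255.0 for v in data], label


dataset = ToyDataset([([0, 51, 255], 3), ([102, 204, 0], 7)], transform=scale_to_unit)
first_data, first_label = dataset[0]
```

A transform that only needs the image can simply return the label unchanged, as scale_to_unit does here.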
class gluoncv.data.VOCDetection(root='~/.mxnet/datasets/voc', splits=((2007, 'trainval'), (2012, 'trainval')), transform=None, index_map=None, preload_label=True)
Pascal VOC detection Dataset.
- Parameters
root (str, default '~/.mxnet/datasets/voc') – Path to folder storing the dataset.
splits (list of tuples, default ((2007, 'trainval'), (2012, 'trainval'))) – List of combinations of (year, name). For years, candidates can be: 2007, 2012. For names, candidates can be: ‘train’, ‘val’, ‘trainval’, ‘test’.
transform (callable, default None) –
A function that takes data and label and transforms them. Refer to ./transforms for examples.
A transform function for object detection should take label into consideration, because any geometric modification will require label to be modified.
index_map (dict, default None) – By default, the 20 classes are mapped to indices from 0 to 19. This can be customized by providing a str-to-int dict specifying how to map class names to indices. Intended for advanced users only, for example when you want to swap the order of class labels.
preload_label (bool, default True) – If True, then parse and load all labels into memory during initialization. This often speeds up loading but requires more memory. Typical preloaded labels take tens of MB. You only need to disable it when your dataset is extremely large.
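The note on transform says that any geometric modification of the image requires the label to be modified as well. A minimal sketch of that idea, using a horizontal flip, might look like the following. It assumes a toy image stored as a list of pixel rows and labels laid out as [xmin, ymin, xmax, ymax, class_id] in absolute pixels; both are assumptions made for illustration, and flip_horizontal is a hypothetical helper rather than a GluonCV function.

```python
def flip_horizontal(image_rows, boxes):
    """Mirror an image left-right and adjust its bounding boxes to match.

    image_rows: list of pixel rows (each row is a list), an assumed toy format.
    boxes: list of [xmin, ymin, xmax, ymax, class_id] rows in absolute pixels
           (assumed layout).
    """
    width = len(image_rows[0])
    flipped_image = [list(reversed(row)) for row in image_rows]
    flipped_boxes = []
    for xmin, ymin, xmax, ymax, cls in boxes:
        # After mirroring, the old right edge becomes the new left edge.
        flipped_boxes.append([width - xmax, ymin, width - xmin, ymax, cls])
    return flipped_image, flipped_boxes


image = [[0, 1, 2, 3, 4, 5, 6, 7] for _ in range(4)]   # 8 pixels wide, 4 tall
boxes = [[1, 0, 3, 2, 5]]                              # one object of class 5
new_image, new_boxes = flip_horizontal(image, boxes)
```

Flipping the image without touching the boxes would leave the box over the wrong region, which is why a detection transform needs access to both inputs.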
class gluoncv.data.VOCSegmentation(root='/root/.mxnet/datasets/voc', split='train', mode=None, transform=None, **kwargs)
Pascal VOC Semantic Segmentation Dataset.
- Parameters
root (string) – Path to VOCdevkit folder. Default is ‘$(HOME)/.mxnet/datasets/voc’
split (string) – ‘train’, ‘val’ or ‘test’
transform (callable, optional) – A function that transforms the image
Examples
>>> import gluoncv
>>> from mxnet import gluon
>>> from mxnet.gluon.data.vision import transforms
>>> # Transforms for Normalization
>>> input_transform = transforms.Compose([
...     transforms.ToTensor(),
...     transforms.Normalize([.485, .456, .406], [.229, .224, .225]),
... ])
>>> # Create Dataset
>>> trainset = gluoncv.data.VOCSegmentation(split='train', transform=input_transform)
>>> # Create Training Loader
>>> train_data = gluon.data.DataLoader(
...     trainset, 4, shuffle=True, last_batch='rollover',
...     num_workers=4)
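The normalization in the example above can be worked through by hand for a single pixel. ToTensor scales 8-bit values to floats in [0, 1] (it also reorders the axes, which this sketch ignores), and Normalize then applies (value - mean) / std to each channel. The following plain-Python sketch reproduces only that arithmetic; it is not the MXNet implementation, and the helper names are hypothetical.

```python
MEAN = [0.485, 0.456, 0.406]   # per-channel means used in the example
STD = [0.229, 0.224, 0.225]    # per-channel standard deviations


def to_unit_range(pixel):
    """Scale an 8-bit RGB pixel to floats in [0, 1]."""
    return [v / 255.0 for v in pixel]


def normalize(pixel, mean=MEAN, std=STD):
    """Apply (value - mean) / std independently to each channel."""
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


normalized = normalize(to_unit_range([255, 0, 128]))
```

A fully saturated channel ends up above zero and an empty channel below zero, so the network sees inputs roughly centered around zero.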
class gluoncv.data.VOCAugSegmentation(root='/root/.mxnet/datasets/voc', split='train', mode=None, transform=None, **kwargs)
Pascal VOC Augmented Semantic Segmentation Dataset.
- Parameters
root (string) – Path to VOCdevkit folder. Default is ‘$(HOME)/.mxnet/datasets/voc’
split (string) – ‘train’ or ‘val’
transform (callable, optional) – A function that transforms the image
Examples
>>> import gluoncv
>>> from mxnet import gluon
>>> from mxnet.gluon.data.vision import transforms
>>> # Transforms for Normalization
>>> input_transform = transforms.Compose([
...     transforms.ToTensor(),
...     transforms.Normalize([.485, .456, .406], [.229, .224, .225]),
... ])
>>> # Create Dataset
>>> trainset = gluoncv.data.VOCAugSegmentation(split='train', transform=input_transform)
>>> # Create Training Loader
>>> train_data = gluon.data.DataLoader(
...     trainset, 4, shuffle=True, last_batch='rollover',
...     num_workers=4)
class gluoncv.data.COCODetection(root='~/.mxnet/datasets/coco', splits=('instances_val2017'), transform=None, min_object_area=0, skip_empty=True, use_crowd=True)
MS COCO detection dataset.
- Parameters
root (str, default '~/.mxnet/datasets/coco') – Path to folder storing the dataset.
splits (list of str, default ['instances_val2017']) – Names of the JSON annotation files. Candidates can be: instances_val2017, instances_train2017.
transform (callable, default None) –
A function that takes data and label and transforms them. Refer to ./transforms for examples.
A transform function for object detection should take label into consideration, because any geometric modification will require label to be modified.
min_object_area (float) – Minimum accepted ground-truth area. If an object’s area is smaller than this value, it will be ignored.
skip_empty (bool, default is True) – Whether to skip images with no valid object. This should be True in training, otherwise it will cause undefined behavior.
use_crowd (bool, default is True) – Whether to use boxes labeled as crowd instances.
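How min_object_area, skip_empty and use_crowd interact can be sketched with plain dictionaries. The example below assumes annotations shaped like COCO JSON entries with area and iscrowd fields; filter_objects and build_samples are hypothetical helpers, and the validity checks GluonCV actually performs may be stricter than this.

```python
def filter_objects(annotations, min_object_area=0, use_crowd=True):
    """Keep annotations that pass the area and crowd checks."""
    kept = []
    for ann in annotations:
        if ann["area"] < min_object_area:
            continue                      # too small: ignored
        if not use_crowd and ann.get("iscrowd", 0):
            continue                      # crowd boxes excluded on request
        kept.append(ann)
    return kept


def build_samples(images, min_object_area=0, skip_empty=True, use_crowd=True):
    """Return (image_id, objects) pairs, dropping empty images if skip_empty."""
    samples = []
    for image_id, annotations in images.items():
        objects = filter_objects(annotations, min_object_area, use_crowd)
        if skip_empty and not objects:
            continue
        samples.append((image_id, objects))
    return samples


images = {
    "a.jpg": [{"area": 400.0, "iscrowd": 0}, {"area": 2.0, "iscrowd": 0}],
    "b.jpg": [{"area": 900.0, "iscrowd": 1}],
    "c.jpg": [],
}
samples = build_samples(images, min_object_area=4, use_crowd=False)
```

In this toy input only a.jpg survives: its small object is dropped by the area threshold, b.jpg loses its only (crowd) box, and c.jpg has no objects to begin with.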
class gluoncv.data.COCOInstance(root='~/.mxnet/datasets/coco', splits=('instances_val2017'), transform=None, min_object_area=1, skip_empty=True)
MS COCO instance segmentation dataset.
- Parameters
root (str, default '~/.mxnet/datasets/coco') – Path to folder storing the dataset.
splits (list of str, default ['instances_val2017']) – Names of the JSON annotation files. Candidates can be: instances_val2017, instances_train2017.
transform (callable, default None) –
A function that takes data and label and transforms them. Refer to ./transforms for examples.
A transform function for object detection should take label into consideration, because any geometric modification will require label to be modified.
min_object_area (float, default is 1) – Minimum accepted ground-truth area. If an object’s area is smaller than this value, it will be ignored.
skip_empty (bool, default is True) – Whether to skip images with no valid object. This should be True in training, otherwise it will cause undefined behavior.
class gluoncv.data.ADE20KSegmentation(root='/root/.mxnet/datasets/ade', split='train', mode=None, transform=None, **kwargs)
ADE20K Semantic Segmentation Dataset.
- Parameters
root (string) – Path to the dataset folder. Default is ‘$(HOME)/.mxnet/datasets/ade’
split (string) – ‘train’, ‘val’ or ‘test’
transform (callable, optional) – A function that transforms the image
Examples
>>> import gluoncv
>>> from mxnet import gluon
>>> from mxnet.gluon.data.vision import transforms
>>> # Transforms for Normalization
>>> input_transform = transforms.Compose([
...     transforms.ToTensor(),
...     transforms.Normalize([.485, .456, .406], [.229, .224, .225]),
... ])
>>> # Create Dataset
>>> trainset = gluoncv.data.ADE20KSegmentation(split='train', transform=input_transform)
>>> # Create Training Loader
>>> train_data = gluon.data.DataLoader(
...     trainset, 4, shuffle=True, last_batch='rollover',
...     num_workers=4)
class gluoncv.data.Kinetics400(root='/root/.mxnet/datasets/kinetics400/rawframes_train', setting='/root/.mxnet/datasets/kinetics400/kinetics400_train_list_rawframes.txt', train=True, test_mode=False, name_pattern='img_%05d.jpg', video_ext='mp4', is_color=True, modality='rgb', num_segments=1, num_crop=1, new_length=1, new_step=1, new_width=340, new_height=256, target_width=224, target_height=224, temporal_jitter=False, video_loader=False, use_decord=False, slowfast=False, slow_temporal_stride=16, fast_temporal_stride=2, data_aug='v1', lazy_init=False, transform=None)
Load the Kinetics400 video action recognition dataset.
Refer to Prepare the Kinetics400 dataset for the description of this dataset and how to prepare it.
- Parameters
root (str, required. Default '~/.mxnet/datasets/kinetics400/rawframes_train'.) – Path to the root folder storing the dataset.
setting (str, required.) – A text file describing the dataset, with one line per video sample. There are three items in each line: (1) video path; (2) video length and (3) video label.
train (bool, default True.) – Whether to load the training or validation set.
test_mode (bool, default False.) – Whether to perform evaluation on the test set. Usually a three-crop or ten-crop evaluation strategy is involved.
name_pattern (str, default 'img_%05d.jpg'.) – The naming pattern of the decoded video frames. For example, img_00012.jpg.
video_ext (str, default 'mp4'.) – If video_loader is set to True, please specify the video format accordingly.
is_color (bool, default True.) – Whether the loaded image is color or grayscale.
modality (str, default 'rgb'.) – Input modality. Only rgb video frames are supported for now; support for rgb difference images and optical flow images will be added later.
num_segments (int, default 1.) – Number of segments to evenly divide the video into. A useful technique to obtain global video-level information. Limin Wang, et al., Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, ECCV 2016.
num_crop (int, default 1.) – Number of crops for each image. Common choices are three crops and ten crops during evaluation.
new_length (int, default 1.) – The length of input video clip. Default is a single image, but it can be multiple video frames. For example, new_length=16 means we will extract a video clip of consecutive 16 frames.
new_step (int, default 1.) – Temporal sampling rate. For example, new_step=1 means we will extract a video clip of consecutive frames. new_step=2 means we will extract a video clip of every other frame.
new_width (int, default 340.) – Scale the width of loaded image to ‘new_width’ for later multiscale cropping and resizing.
new_height (int, default 256.) – Scale the height of loaded image to ‘new_height’ for later multiscale cropping and resizing.
target_width (int, default 224.) – Scale the width of transformed image to the same ‘target_width’ for batch forwarding.
target_height (int, default 224.) – Scale the height of transformed image to the same ‘target_height’ for batch forwarding.
temporal_jitter (bool, default False.) – Whether to temporally jitter if new_step > 1.
video_loader (bool, default False.) – Whether to use video loader to load data.
use_decord (bool, default False.) – Whether to use the Decord video loader to load data. Otherwise use the mmcv video loader.
transform (function, default None.) – A function that takes data and label and transforms them.
slowfast (bool, default False.) – If set to True, use data loader designed for SlowFast network. Christoph Feichtenhofer, et al., SlowFast Networks for Video Recognition, ICCV 2019.
slow_temporal_stride (int, default 16.) – The temporal stride for sparse sampling of video frames in slow branch of a SlowFast network.
fast_temporal_stride (int, default 2.) – The temporal stride for sparse sampling of video frames in fast branch of a SlowFast network.
data_aug (str, default 'v1'.) – Type of data augmentation to apply. Supports v1, v2, v3 and v4.
lazy_init (bool, default False.) – If set to True, build a dataset instance without loading any dataset.
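To make the interplay of num_segments, new_length and new_step more concrete, here is a simplified, deterministic sketch that picks frame indices for one clip per segment. sample_frame_indices is a hypothetical helper, and the centre-aligned placement is an assumption made for illustration; GluonCV's own sampling may differ, for example by randomizing the offset during training.

```python
def sample_frame_indices(num_frames, num_segments=1, new_length=1, new_step=1):
    """Return one list of frame indices per segment (deterministic, centred).

    Each clip holds new_length frames spaced new_step apart, so it spans
    (new_length - 1) * new_step + 1 consecutive frames of the source video.
    """
    span = (new_length - 1) * new_step + 1
    if num_frames < span:
        raise ValueError("video is shorter than one clip")
    segment_length = num_frames / num_segments
    clips = []
    for seg in range(num_segments):
        # Centre the clip inside its segment, then clamp to the valid range.
        centre = (seg + 0.5) * segment_length
        start = int(centre - span / 2)
        start = max(0, min(start, num_frames - span))
        clips.append([start + i * new_step for i in range(new_length)])
    return clips


clips = sample_frame_indices(num_frames=100, num_segments=2, new_length=4, new_step=2)
```

With new_step=2 each clip takes every other frame, and with num_segments=2 one clip comes from each half of the video.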
class gluoncv.data.DetectionDataLoader(dataset, batch_size=None, shuffle=False, sampler=None, last_batch=None, batch_sampler=None, batchify_fn=None, num_workers=0)
Data loader for detection dataset.
Deprecated since version 0.2.0: DetectionDataLoader is deprecated, please use mxnet.gluon.data.DataLoader with batchify functions listed in gluoncv.data.batchify directly.
It loads data batches from a dataset and then applies data transformations. It’s a subclass of mxnet.gluon.data.DataLoader, and therefore has very similar APIs.
The main purpose of the DataLoader is to pad the variable-length labels from each image, because images contain different numbers of objects.
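The padding idea can be illustrated without MXNet. The sketch below pads per-image label lists to the length of the longest one, filling with -1 as the default batchify function quoted in the parameter list does. pad_labels is a hypothetical helper, and the [xmin, ymin, xmax, ymax, class_id] row layout is an assumption made for this example.

```python
def pad_labels(labels, pad_value=-1):
    """Pad per-image label lists to a common length so they form one batch.

    labels: one list per image; each entry is a row such as
    [xmin, ymin, xmax, ymax, class_id] (assumed layout). At least one image
    in the batch is assumed to contain an object.
    """
    max_objects = max(len(rows) for rows in labels)
    row_width = max(len(row) for rows in labels for row in rows)
    batch = []
    for rows in labels:
        # Filler rows consist entirely of pad_value so they can be masked later.
        filler = [[pad_value] * row_width for _ in range(max_objects - len(rows))]
        batch.append([list(row) for row in rows] + filler)
    return batch


labels = [
    [[10, 20, 50, 60, 0]],                        # image with one object
    [[5, 5, 30, 40, 2], [15, 25, 80, 90, 7]],     # image with two objects
]
batch = pad_labels(labels)
```

After padding, every image contributes the same number of rows, so the batch can be stored as a single rectangular array; downstream code is expected to skip the rows filled with -1.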
- Parameters
dataset (mxnet.gluon.data.Dataset or numpy.ndarray or mxnet.ndarray.NDArray) – The source dataset.
batch_size (int) – The size of mini-batch.
shuffle (bool, default False) – Whether to randomly shuffle the samples. Often True for the training dataset and False for validation/test datasets.
sampler (mxnet.gluon.data.Sampler, default None) – The sampler to use. We should either specify a sampler or enable shuffle, not both, because random shuffling is a sampling method.
last_batch ({'keep', 'discard', 'rollover'}, default is keep) –
How to handle the last batch if the number of examples in the dataset is not evenly divisible by the batch size. There are three options to deal with the last batch if its size is smaller than the specified batch size.
keep: keep it
discard: throw it away
rollover: insert the examples to the beginning of the next batch
batch_sampler (mxnet.gluon.data.BatchSampler) – A sampler that returns mini-batches. Do not specify batch_size, shuffle, sampler, and last_batch if batch_sampler is specified.
batchify_fn (callable) –
Callback function to allow users to specify how to merge samples into a batch. Defaults to gluoncv.data.dataloader.default_pad_batchify_fn():

    def default_pad_batchify_fn(data):
        if isinstance(data[0], nd.NDArray):
            return nd.stack(*data)
        elif isinstance(data[0], tuple):
            data = zip(*data)
            return [pad_batchify(i) for i in data]
        else:
            data = np.asarray(data)
            pad = max([l.shape[0] for l in data])
            buf = np.full((len(data), pad, data[0].shape[-1]), -1, dtype=data[0].dtype)
            for i, l in enumerate(data):
                buf[i][:l.shape[0], :] = l
            return nd.array(buf, dtype=data[0].dtype)

num_workers (int, default 0) – The number of multiprocessing workers to use for data preprocessing. If num_workers = 0, multiprocessing is disabled. Otherwise, num_workers multiprocessing workers are used to process data.
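The three last_batch options can be sketched on a plain list of sample indices. iter_batches below is a hypothetical helper, not the MXNet sampler, and it assumes that 'rollover' carries the leftover samples into the first batch of the following epoch.

```python
def iter_batches(indices, batch_size, last_batch="keep", carry=None):
    """Split indices into batches and return (batches, leftover).

    carry: leftover indices from the previous epoch (used with 'rollover').
    """
    if last_batch not in ("keep", "discard", "rollover"):
        raise ValueError("last_batch must be 'keep', 'discard' or 'rollover'")
    pool = list(carry or []) + list(indices)
    batches = [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]
    leftover = []
    if batches and len(batches[-1]) < batch_size:
        if last_batch == "discard":
            batches.pop()                 # throw the short batch away
        elif last_batch == "rollover":
            leftover = batches.pop()      # save it for the next epoch
    return batches, leftover


kept, _ = iter_batches(range(10), 4, "keep")
discarded, _ = iter_batches(range(10), 4, "discard")
rolled, leftover = iter_batches(range(10), 4, "rollover")
next_epoch, next_leftover = iter_batches(range(10), 4, "rollover", carry=leftover)
```

With ten samples and a batch size of four, 'keep' yields a final batch of two, 'discard' drops it, and 'rollover' moves those two samples to the front of the next epoch.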
class gluoncv.data.LstDetection(filename, root='', flag=1, coord_normalized=True)
Detection dataset loaded from LST file and raw images. An LST file is a plain text file with a special label format.
Check out 1. Preferred Object Detection Format for GluonCV and MXNet for a tutorial on how to prepare this file.
- Parameters
filename (str) – Path to the LST file.
root (str) – Relative image root folder for filenames in LST file.
flag (int, default is 1) – Use 1 for color images, and 0 for gray images.
coord_normalized (boolean) – Indicate whether bounding box coordinates have been normalized to (0, 1) in labels. If so, we will rescale back to absolute coordinates by multiplying width or height.
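The rescaling that coord_normalized describes amounts to multiplying x coordinates by the image width and y coordinates by the image height. A minimal sketch, assuming rows laid out as [xmin, ymin, xmax, ymax, ...] (the column order is an assumption, and to_absolute is a hypothetical helper):

```python
def to_absolute(boxes, width, height):
    """Convert [xmin, ymin, xmax, ymax, ...] rows from (0, 1) to pixel units."""
    absolute = []
    for row in boxes:
        xmin, ymin, xmax, ymax = row[:4]
        # x values scale with image width, y values with image height.
        scaled = [xmin * width, ymin * height, xmax * width, ymax * height]
        absolute.append(scaled + list(row[4:]))   # keep extra columns as-is
    return absolute


pixel_boxes = to_absolute([[0.25, 0.5, 0.75, 1.0, 3]], width=640, height=480)
```

If the labels are already stored in absolute pixels, setting coord_normalized to False avoids applying this scaling a second time.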
class gluoncv.data.RecordFileDetection(filename, coord_normalized=True)
Detection dataset loaded from record file. The supported record file uses the same format as mxnet.image.ImageDetIter() and mxnet.io.ImageDetRecordIter().
Check out 1. Preferred Object Detection Format for GluonCV and MXNet for a tutorial on how to prepare this file.
Note
We suggest using RecordFileDetection only if you are familiar with record files.
- Parameters
filename (str) – Path of the record file. It requires both the *.rec and *.idx files in the same directory. Raw images and labels are stored in the *.rec file for better IO performance, and the *.idx file provides random access to the binary file.
coord_normalized (boolean) – Indicate whether bounding box coordinates have been normalized to (0, 1) in labels. If so, we will rescale back to absolute coordinates by multiplying width or height.
Examples
>>> from gluoncv.data import RecordFileDetection
>>> record_dataset = RecordFileDetection('train.rec')
>>> img, label = record_dataset[0]
>>> print(img.shape, label.shape)
(512, 512, 3) (1, 5)