gluoncv.data.batchify

Batchify functions can be used to transform a dataset into mini-batches that can be processed efficiently.

In computer vision tasks, images/labels often come with different shapes. GluonCV provides a collection of convenient batchify functions suitable for various situations.

Batch Loaders

Stack Stack the input data samples to construct the batch.
Pad Pad the input ndarrays along the specified padding axis and stack them to get the output.
Append Loosely return a list of the input data samples.
Tuple Wrap multiple batchify functions into a single function that applies each of them to the corresponding input field.

API Reference

Batchify functions. They can be passed to a Gluon data loader to combine individual samples into batches for fast processing.
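
To make the role of a batchify function concrete, the sketch below mimics in plain NumPy how a loader might group samples and hand each group to a batchify function. The names iter_batches and stack_batchify are hypothetical helpers for illustration only; they are not part of GluonCV or MXNet, and a real data loader also handles shuffling, worker processes and NDArray conversion.

```python
import numpy as np

def stack_batchify(samples):
    """Hypothetical stand-in for a Stack-like batchify function."""
    # All samples must share one shape, otherwise np.stack raises ValueError.
    return np.stack([np.asarray(s) for s in samples])

def iter_batches(dataset, batch_size, batchify_fn):
    """Yield batches by passing consecutive groups of samples to batchify_fn."""
    for start in range(0, len(dataset), batch_size):
        yield batchify_fn(dataset[start:start + batch_size])

dataset = [[1, 2, 3, 4], [4, 5, 6, 8], [8, 9, 1, 2]]
batches = list(iter_batches(dataset, batch_size=2, batchify_fn=stack_batchify))
print([b.shape for b in batches])  # [(2, 4), (1, 4)]
```

The only contract in this sketch is that the batchify function receives a list of samples and returns one batch, which is why the classes documented here can be swapped freely.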

class gluoncv.data.batchify.Stack

Stack the input data samples to construct the batch. The N input samples must have the same shape/length and will be stacked to construct a batch.

Examples

>>> from gluoncv.data import batchify
>>> # Stack multiple lists
>>> a = [1, 2, 3, 4]
>>> b = [4, 5, 6, 8]
>>> c = [8, 9, 1, 2]
>>> batchify.Stack()([a, b, c])
[[1. 2. 3. 4.]
 [4. 5. 6. 8.]
 [8. 9. 1. 2.]]
<NDArray 3x4 @cpu(0)>
>>> # Stack multiple numpy.ndarrays
>>> import numpy as np
>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> b = np.array([[5, 6, 7, 8], [1, 2, 3, 4]])
>>> batchify.Stack()([a, b])
[[[1. 2. 3. 4.]
  [5. 6. 7. 8.]]
 [[5. 6. 7. 8.]
  [1. 2. 3. 4.]]]
<NDArray 2x2x4 @cpu(0)>
>>> # Stack multiple NDArrays
>>> import mxnet as mx
>>> a = mx.nd.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> b = mx.nd.array([[5, 6, 7, 8], [1, 2, 3, 4]])
>>> batchify.Stack()([a, b])
[[[1. 2. 3. 4.]
  [5. 6. 7. 8.]]
 [[5. 6. 7. 8.]
  [1. 2. 3. 4.]]]
<NDArray 2x2x4 @cpu(0)>
class gluoncv.data.batchify.Pad(axis=0, pad_val=0, ret_length=False)

Pad the input ndarrays along the specified padding axis and stack them to get the output. The input is N samples. Each sample should contain a single element that is 1) a numpy.ndarray, 2) an mxnet.nd.NDArray, or 3) a list of numbers. Set axis and pad_val to choose the padding axis and the padding value. The arrays are padded to the largest dimension along axis and then stacked to form the final output. If ret_length is turned on, the function also returns the original lengths along axis.

Parameters:
  • axis (int, default 0) – The axis to pad the arrays. The arrays will be padded to the largest dimension at axis. For example, assume the input arrays have shape (10, 8, 5), (6, 8, 5), (3, 8, 5) and axis is 0. Each input will be padded into (10, 8, 5) and then stacked to form the final output.
  • pad_val (float or int, default 0) – The padding value.
  • ret_length (bool, default False) – Whether to return the valid length in the output.
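
The pad-then-stack behaviour described above can be sketched in plain NumPy. This is an illustrative approximation, not GluonCV's implementation: pad_and_stack is a hypothetical helper, it returns numpy arrays rather than NDArrays, and it assumes padding is appended at the end of the chosen axis.

```python
import numpy as np

def pad_and_stack(arrays, axis=0, pad_val=0, ret_length=False):
    """NumPy sketch of pad-then-stack; not GluonCV's implementation."""
    arrays = [np.asarray(a) for a in arrays]
    lengths = [a.shape[axis] for a in arrays]
    max_len = max(lengths)
    padded = []
    for a, n in zip(arrays, lengths):
        # Pad only at the end of the chosen axis; other axes are untouched.
        widths = [(0, 0)] * a.ndim
        widths[axis] = (0, max_len - n)
        padded.append(np.pad(a, widths, mode="constant", constant_values=pad_val))
    batch = np.stack(padded)
    if ret_length:
        # Original sizes along `axis`, one entry per sample.
        return batch, np.array(lengths)
    return batch

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
b = np.array([[5, 8], [1, 2]])
batch = pad_and_stack([a, b], axis=1, pad_val=-1)
print(batch.shape)  # (2, 2, 4)
```

Returning the lengths alongside the padded batch lets downstream code mask out the padded positions.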

Examples

>>> from gluoncv.data import batchify
>>> # Inputs are multiple lists
>>> a = [1, 2, 3, 4]
>>> b = [4, 5, 6]
>>> c = [8, 2]
>>> batchify.Pad()([a, b, c])
[[ 1  2  3  4]
 [ 4  5  6  0]
 [ 8  2  0  0]]
<NDArray 3x4 @cpu(0)>
>>> # Also output the lengths
>>> a = [1, 2, 3, 4]
>>> b = [4, 5, 6]
>>> c = [8, 2]
>>> batchify.Pad(ret_length=True)([a, b, c])
(
 [[1 2 3 4]
  [4 5 6 0]
  [8 2 0 0]]
 <NDArray 3x4 @cpu(0)>,
 [4 3 2]
 <NDArray 3 @cpu(0)>)
>>> # Inputs are multiple ndarrays
>>> import numpy as np
>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> b = np.array([[5, 8], [1, 2]])
>>> batchify.Pad(axis=1, pad_val=-1)([a, b])
[[[ 1  2  3  4]
  [ 5  6  7  8]]
 [[ 5  8 -1 -1]
  [ 1  2 -1 -1]]]
<NDArray 2x2x4 @cpu(0)>
>>> # Inputs are multiple NDArrays
>>> import mxnet as mx
>>> a = mx.nd.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> b = mx.nd.array([[5, 8], [1, 2]])
>>> batchify.Pad(axis=1, pad_val=-1)([a, b])
[[[ 1.  2.  3.  4.]
  [ 5.  6.  7.  8.]]
 [[ 5.  8. -1. -1.]
  [ 1.  2. -1. -1.]]]
<NDArray 2x2x4 @cpu(0)>
class gluoncv.data.batchify.Append(expand=True, batch_axis=0)

Loosely return a list of the input data samples. There is no constraint on the shape of the input samples; however, you will only be able to apply single-batch operations, since the outputs have different shapes.
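
As a rough illustration of a list-returning batchify function, the NumPy sketch below keeps every sample as its own array. The helper append_batchify is hypothetical, and the meaning given to expand (adding a size-1 batch axis at batch_axis) is an assumption inferred from the parameter names, not something this page states.

```python
import numpy as np

def append_batchify(samples, expand=True, batch_axis=0):
    """NumPy sketch of a list-returning batchify function (assumed semantics)."""
    out = []
    for s in samples:
        arr = np.asarray(s, dtype="float32")
        if expand:
            # Assumption: `expand` adds a size-1 batch axis at `batch_axis`.
            arr = np.expand_dims(arr, axis=batch_axis)
        out.append(arr)
    return out

batch = append_batchify([[1, 2, 3, 4], [4, 5, 6], [8, 2]])
print([x.shape for x in batch])  # [(1, 4), (1, 3), (1, 2)]
```

Because the elements have different shapes, downstream code has to loop over the list and process one element at a time.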

Examples

>>> from gluoncv.data import batchify
>>> a = [1, 2, 3, 4]
>>> b = [4, 5, 6]
>>> c = [8, 2]
>>> batchify.Append()([a, b, c])
[
[[1. 2. 3. 4.]]
<NDArray 1x4 @cpu_shared(0)>,
[[4. 5. 6.]]
<NDArray 1x3 @cpu_shared(0)>,
[[8. 2.]]
<NDArray 1x2 @cpu_shared(0)>
]
class gluoncv.data.batchify.Tuple(fn, *args)

Wrap multiple batchify functions into a single function that applies each of them to the corresponding input field. Each data sample should be a list or tuple containing multiple attributes. The i-th batchify function stored in Tuple will be applied to the i-th attribute. For example, if each data sample is (nd_data, label), you can wrap two batchify functions using Tuple(DataBatchify, LabelBatchify) to batchify nd_data and label correspondingly.

Parameters:
  • fn (list or tuple or callable) – The batchify functions to wrap.
  • *args (tuple of callable) – The additional batchify functions to wrap.
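
The field-wise dispatch described above can be sketched in a few lines of Python. Both tuple_batchify and pad_1d are hypothetical helpers written for this illustration; they return numpy arrays and are not how GluonCV implements Tuple.

```python
import numpy as np

def tuple_batchify(*fns):
    """Sketch: apply the i-th function to the i-th field of every sample."""
    def batchify(samples):
        # zip(*samples) regroups [(x0, y0), (x1, y1)] into (x0, x1), (y0, y1).
        fields = list(zip(*samples))
        if len(fields) != len(fns):
            raise ValueError("number of fields must match number of functions")
        return tuple(fn(list(field)) for fn, field in zip(fns, fields))
    return batchify

def pad_1d(seqs, pad_val=0):
    """Hypothetical helper: right-pad 1-D sequences to a common length."""
    width = max(len(s) for s in seqs)
    return np.array([list(s) + [pad_val] * (width - len(s)) for s in seqs])

a = ([1, 2, 3, 4], 0)
b = ([5, 7], 1)
data, labels = tuple_batchify(pad_1d, np.array)([a, b])
print(data.tolist(), labels.tolist())  # [[1, 2, 3, 4], [5, 7, 0, 0]] [0, 1]
```

Keeping one function per field means variable-length data can be padded while fixed-size labels are simply stacked.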

Examples

>>> from gluoncv.data import batchify
>>> a = ([1, 2, 3, 4], 0)
>>> b = ([5, 7], 1)
>>> c = ([1, 2, 3, 4, 5, 6, 7], 0)
>>> batchify.Tuple(batchify.Pad(), batchify.Stack())([a, b])
(
 [[1 2 3 4]
  [5 7 0 0]]
 <NDArray 2x4 @cpu(0)>,
 [0. 1.]
 <NDArray 2 @cpu(0)>)
>>> # Input can also be a list
>>> batchify.Tuple([batchify.Pad(), batchify.Stack()])([a, b])
(
 [[1 2 3 4]
  [5 7 0 0]]
 <NDArray 2x4 @cpu(0)>,
 [0. 1.]
 <NDArray 2 @cpu(0)>)
>>> # Another example
>>> a = ([1, 2, 3, 4], [5, 6], 1)
>>> b = ([1, 2], [3, 4, 5, 6], 0)
>>> c = ([1], [2, 3, 4, 5, 6], 0)
>>> batchify.Tuple(batchify.Pad(), batchify.Pad(), batchify.Stack())([a, b, c])
(
 [[1 2 3 4]
  [1 2 0 0]
  [1 0 0 0]]
 <NDArray 3x4 @cpu(0)>,
 [[5 6 0 0 0]
  [3 4 5 6 0]
  [2 3 4 5 6]]
 <NDArray 3x5 @cpu(0)>,
 [1. 0. 0.]
 <NDArray 3 @cpu(0)>)