
# TF-Vision Data Augmentation and Model Regularization

## Data augmentation methods

TF-Vision provides a rich collection of advanced SoTA data augmentation methods for various vision tasks. Default augmentation methods such as random flipping or cropping will not be discussed here.

### RandAugment and AutoAugment

RandAugment and AutoAugment combine multiple common image augmentations/transformations that adjust the contrast, orientation, color, brightness and sharpness. They randomly select both the augmentations to apply and the augmentation strength. The TF-Vision implementations are designed to work out of the box for a wide range of datasets and come with low overhead. RandAugment and AutoAugment are now commonly adopted for SoTA image classification model training.
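To make the random-selection mechanism concrete, here is a toy numpy sketch. The three-op pool and the `magnitude / 10` strength mapping are illustrative assumptions, not the actual TF-Vision op set or magnitude semantics:

```python
import numpy as np

# Toy transformation ops; each takes a float image in [0, 255] and a
# strength in [0, 1]. These op names are illustrative only.
def shift_brightness(img, strength):
  return np.clip(img + 64.0 * strength, 0.0, 255.0)

def adjust_contrast(img, strength):
  mean = img.mean()
  return np.clip(mean + (img - mean) * (1.0 + strength), 0.0, 255.0)

def invert(img, strength):
  del strength  # invert has no tunable magnitude
  return 255.0 - img

OPS = [shift_brightness, adjust_contrast, invert]

def rand_augment(img, num_layers=2, magnitude=10.0, rng=None):
  """Applies `num_layers` randomly chosen ops at strength `magnitude / 10`."""
  rng = rng or np.random.default_rng()
  strength = magnitude / 10.0
  for _ in range(num_layers):
    op = OPS[rng.integers(len(OPS))]
    img = op(img, strength)
  return img

augmented = rand_augment(np.full((4, 4), 128.0), num_layers=2, magnitude=10)
```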

Supported vision tasks:

*   Image classification
*   Video action classification
*   Object detection

Definitions in TF-Vision:

```python
import dataclasses
from typing import List, Optional

from official.modeling import hyperparams


@dataclasses.dataclass
class RandAugment(hyperparams.Config):
  """Configuration for RandAugment."""
  num_layers: int = 2
  magnitude: float = 10
  cutout_const: float = 40
  translate_const: float = 10
  magnitude_std: float = 0.0
  prob_to_apply: Optional[float] = None
  exclude_ops: List[str] = dataclasses.field(default_factory=list)


@dataclasses.dataclass
class AutoAugment(hyperparams.Config):
  """Configuration for AutoAugment."""
  augmentation_name: str = 'v0'
  cutout_const: float = 100
  translate_const: float = 250
```

YAML template for using the techniques in TF-Vision:

```yaml
task:
  train_data:
    aug_type:
      type: 'randaug'
      randaug:
        magnitude: 10
        magnitude_std: 0.0
        num_layers: 2
```

### Image scale jittering

Image scale jittering randomly selects a scale from a user-specified range. It upsamples the image if the scale is greater than 1.0, or downsamples it if the scale is smaller than 1.0. Random cropping or zero-padding is then performed to tailor the scaled image to the desired size. Scale jittering is one of the most effective augmentation methods for training SoTA object detection models. It has been widely used in recent publications on object detection such as SpineNet, EfficientDet, Detection-RS and Copy-Paste.
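A minimal numpy sketch of the idea, assuming nearest-neighbor resizing and, for simplicity, top-left placement of the cropped/padded result (the real parser differs in these details):

```python
import numpy as np

def scale_jitter(img, out_size, scale_min=0.5, scale_max=2.0, rng=None):
  """Rescales by a random factor, then random-crops or zero-pads to out_size."""
  rng = rng or np.random.default_rng()
  scale = rng.uniform(scale_min, scale_max)
  h, w = img.shape[:2]
  new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
  # Nearest-neighbor resize via index mapping.
  rows = np.arange(new_h) * h // new_h
  cols = np.arange(new_w) * w // new_w
  resized = img[rows][:, cols]
  out_h, out_w = out_size
  out = np.zeros((out_h, out_w) + img.shape[2:], dtype=img.dtype)
  # Random crop when the scaled image is larger, zero-pad when smaller.
  copy_h, copy_w = min(new_h, out_h), min(new_w, out_w)
  y0 = rng.integers(new_h - copy_h + 1)
  x0 = rng.integers(new_w - copy_w + 1)
  out[:copy_h, :copy_w] = resized[y0:y0 + copy_h, x0:x0 + copy_w]
  return out
```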

Supported vision tasks:

*   Object detection
*   Semantic segmentation

Definition in TF-Vision:

```python
@dataclasses.dataclass
class Parser(hyperparams.Config):
  aug_scale_min: float = 1.0
  aug_scale_max: float = 1.0
```

YAML template for using the techniques in TF-Vision:

```yaml
task:
  train_data:
    parser:
      aug_scale_max: 2.0
      aug_scale_min: 0.5
```

### Mixup and Cutmix

Mixup combines two input images through a random convex combination:

img = img_1 * a + img_2 * (1.0 - a),

where a is drawn from a beta distribution. Analogously, the labels are calculated with:

label = label_1 * a + label_2 * (1.0 - a).
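The combination above can be sketched in numpy as follows (the function name and the one-hot float labels are illustrative assumptions):

```python
import numpy as np

def mixup(img1, label1, img2, label2, alpha=0.8, rng=None):
  """Convex combination of two instances with a ~ Beta(alpha, alpha)."""
  rng = rng or np.random.default_rng()
  a = rng.beta(alpha, alpha)
  img = img1 * a + img2 * (1.0 - a)
  label = label1 * a + label2 * (1.0 - a)
  return img, label
```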

Cutmix also combines two training instances. Instead of linearly interpolating, a random rectangular area of img_2 is pasted into img_1. The labels are derived similarly to mixup:

label = label_1 * a + label_2 * (1.0 - a),

where a compensates for the area of the inserted rectangle. The two methods are key factors in training transformer-based image models (ViT, DeiT) with limited labeled data.
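A numpy sketch of cutmix under simplifying assumptions: the rectangle is sampled by drawing two uniform corners, whereas the actual implementation ties the rectangle area to a beta-distributed ratio:

```python
import numpy as np

def cutmix(img1, label1, img2, label2, rng=None):
  """Pastes a random rectangle of img2 into img1; a compensates the area."""
  rng = rng or np.random.default_rng()
  h, w = img1.shape[:2]
  # Sample a random axis-aligned rectangle inside img1.
  y0, y1 = sorted(rng.integers(0, h + 1, size=2))
  x0, x1 = sorted(rng.integers(0, w + 1, size=2))
  img = img1.copy()
  img[y0:y1, x0:x1] = img2[y0:y1, x0:x1]
  # a is one minus the fraction of the image covered by the pasted rectangle.
  a = 1.0 - ((y1 - y0) * (x1 - x0)) / (h * w)
  label = label1 * a + label2 * (1.0 - a)
  return img, label
```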

Supported vision tasks:

*   Image classification

Definitions in TF-Vision:

```python
@dataclasses.dataclass
class MixupAndCutmix(hyperparams.Config):
  """Configuration for MixupAndCutmix."""
  mixup_alpha: float = .8
  cutmix_alpha: float = 1.
  prob: float = 1.0
  switch_prob: float = 0.5
  label_smoothing: float = 0.1
```

YAML template for using the methods in TF-Vision:

```yaml
task:
  train_data:
    mixup_and_cutmix:
      cutmix_alpha: 1.0
      label_smoothing: 0.1
      mixup_alpha: 0.8
      prob: 1.0
      switch_prob: 0.5
```

### Random erasing

Random erasing samples random rectangles of different aspect ratios (by default, one per image) and replaces the pixels inside them with random Gaussian noise. Note that random erasing is applied after normalizing the input images to zero mean and unit variance. When applying random erasing, we exclude the Cutout augmentation from RandAugment; Cutout fills random square regions of the image.
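A numpy sketch mirroring the config fields below (single rectangle per image; the log-uniform aspect-ratio sampling is an assumption, not taken from the TF-Vision source):

```python
import numpy as np

def random_erase(img, min_area=0.02, max_area=1 / 3, min_aspect=0.3,
                 max_aspect=None, trials=10, rng=None):
  """Replaces one random rectangle of a normalized image with Gaussian noise."""
  rng = rng or np.random.default_rng()
  max_aspect = max_aspect or 1.0 / min_aspect
  h, w = img.shape[:2]
  out = img.copy()
  for _ in range(trials):
    # Sample a target area and a log-uniform aspect ratio (height / width).
    area = rng.uniform(min_area, max_area) * h * w
    aspect = np.exp(rng.uniform(np.log(min_aspect), np.log(max_aspect)))
    rh = int(round(np.sqrt(area * aspect)))
    rw = int(round(np.sqrt(area / aspect)))
    if 0 < rh <= h and 0 < rw <= w:
      y0 = rng.integers(0, h - rh + 1)
      x0 = rng.integers(0, w - rw + 1)
      out[y0:y0 + rh, x0:x0 + rw] = rng.standard_normal(
          (rh, rw) + img.shape[2:])
      break  # one rectangle per image by default
  return out
```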

Supported vision tasks:

*   Image classification

Definition in TF-Vision:

```python
@dataclasses.dataclass
class RandomErasing(hyperparams.Config):
  """Configuration for RandomErasing."""
  probability: float = 0.25
  min_area: float = 0.02
  max_area: float = 1 / 3
  min_aspect: float = 0.3
  max_aspect: Optional[float] = None
  min_count: int = 1
  max_count: int = 1
  trials: int = 10
```

YAML template for using the methods in TF-Vision:

```yaml
task:
  train_data:
    random_erasing:
      min_area: 0.02
      max_area: 0.33
```

## Model Regularization methods

TF-Vision provides a rich collection of advanced SoTA model regularization methods for different models and tasks. Default regularization methods such as weight decay and dropout will not be described here.

### Stochastic depth

Stochastic depth complements the idea of residual networks and is conceptually similar to dropout. In each mini-batch, a random subset of layers is bypassed, i.e., replaced with the identity function (the residual shortcut). The probability of bypassing a layer increases linearly with the depth of the network. This procedure reduces training time and improves generalization.
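The linear rate schedule and the drop/rescale behavior can be sketched as follows (a numpy illustration of the idea, not the TF-Vision implementation; `branch_fn` stands in for an arbitrary residual branch):

```python
import numpy as np

def stochastic_depth_rate(init_rate, layer_idx, num_layers):
  """Drop probability grows linearly with depth (layer_idx is 1-based)."""
  return init_rate * layer_idx / num_layers

def residual_block(x, branch_fn, drop_rate, training, rng=None):
  """Skips the residual branch with probability drop_rate at training time."""
  rng = rng or np.random.default_rng()
  if training and rng.uniform() < drop_rate:
    return x  # identity shortcut only
  branch = branch_fn(x)
  if training:
    # Scale the surviving branch so its expectation matches inference.
    branch = branch / (1.0 - drop_rate)
  return x + branch
```

For example, with `init_stochastic_depth_rate: 0.2` and 10 blocks, the fifth block is dropped with probability 0.1 and the last with probability 0.2.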

Supported models:

*   ResNet, ResNet-RS
*   SpineNet, SpineNet-mobile, SpineNet-seg
*   Vision Transformer (ViT)
*   ResNet-RS-3D

YAML template for using the methods in TF-Vision:

```yaml
task:
  model:
    backbone:
      resnet:
        init_stochastic_depth_rate: 0.2
```

### Label smoothing

Label smoothing aims to reduce overconfidence and, consequently, overfitting. Instead of one-hot labels, we assign a high value x (close to 1) to the correct class and evenly distribute 1 - x over the remaining classes.
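A numpy sketch following the description above, spreading 1 - x over the remaining classes only (note that some framework implementations instead spread the smoothing mass over all classes, including the correct one):

```python
import numpy as np

def smooth_labels(one_hot, smoothing=0.1):
  """Correct class keeps 1 - smoothing; the rest is spread over other classes."""
  num_classes = one_hot.shape[-1]
  x = 1.0 - smoothing
  return np.where(one_hot == 1.0, x, (1.0 - x) / (num_classes - 1))
```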

Supported tasks:

*   Image classification
*   Video action classification

YAML template for using the methods in TF-Vision:

```yaml
task:
  losses:
    label_smoothing: 0.1
```