mirror of https://github.com/tensorflow/models.git
371 lines
16 KiB
Markdown
371 lines
16 KiB
Markdown
# Customize Input Pipeline
|
||
|
||
|
||
|
||
|
||
|
||
|
||
## Overview
|
||
|
||
|
||
A task is a class that encapsulates the logic of loading data, building models,
|
||
performing one-step training and validation, etc. It connects all components
|
||
together and is called by the base
|
||
[Trainer](https://github.com/tensorflow/models/blob/master/official/core/base_trainer.py).
|
||
You can create your own task by inheriting from base
|
||
[Task](https://github.com/tensorflow/models/blob/master/official/core/base_task.py),
|
||
or from one of the
|
||
[tasks](https://github.com/tensorflow/models/tree/master/official/vision/tasks)
|
||
we already defined, if most of the operations can be reused. An `ExampleTask`
|
||
inheriting from
|
||
[ImageClassificationTask](https://github.com/tensorflow/models/blob/master/official/vision/tasks/image_classification.py#L31)
|
||
can be found
|
||
[here](https://github.com/tensorflow/models/blob/master/official/vision/examples/starter/example_task.py).
|
||
|
||
|
||
In a task class, the `build_inputs` method is responsible for building the input
|
||
pipeline for training and evaluation. Specifically, it will instantiate a
|
||
Decoder object and a Parser object, which are used to create an `InputReader`
|
||
that will generate a `tf.data.Dataset` object.
|
||
|
||
|
||
Here's an example code snippet that demonstrates how to create a custom
|
||
`build_inputs` method:
|
||
|
||
|
||
```python
|
||
def build_inputs(
|
||
self,
|
||
params: exp_cfg.DataConfig,
|
||
input_context: Optional[tf.distribute.InputContext] = None
|
||
) -> tf.data.Dataset:
|
||
....
|
||
|
||
|
||
decoder = sample_input.Decoder()
|
||
parser = sample_input.Parser(
|
||
output_size=..., num_classes=...)
|
||
reader = input_reader_factory.input_reader_generator(
|
||
params,
|
||
dataset_fn=dataset_fn.pick_dataset_fn(params.file_type),
|
||
decoder_fn=decoder.decode,
|
||
parser_fn=parser.parse_fn(params.is_training))
|
||
....
|
||
|
||
|
||
dataset = reader.read(input_context=input_context)
|
||
return dataset
|
||
```
|
||
|
||
|
||
The class being responsible for building the input pipeline is
|
||
[InputReader](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/core/input_reader.py#L214)
|
||
with interface
|
||
|
||
```python
|
||
class InputReader:
|
||
"""Input reader that returns a tf.data.Dataset instance."""
|
||
|
||
def __init__(
|
||
self,
|
||
params: cfg.DataConfig,
|
||
dataset_fn=tf.data.TFRecordDataset,
|
||
decoder_fn: Optional[Callable[..., Any]] = None,
|
||
combine_fn: Optional[Callable[..., Any]] = None,
|
||
sample_fn: Optional[Callable[..., Any]] = None,
|
||
parser_fn: Optional[Callable[..., Any]] = None,
|
||
filter_fn: Optional[Callable[..., tf.Tensor]] = None,
|
||
transform_and_batch_fn: Optional[
|
||
Callable[
|
||
[tf.data.Dataset, Optional[tf.distribute.InputContext]],
|
||
tf.data.Dataset,
|
||
]
|
||
] = None,
|
||
postprocess_fn: Optional[Callable[..., Any]] = None,
|
||
):
|
||
....
|
||
|
||
def read(self,
|
||
input_context: Optional[tf.distribute.InputContext] = None,
|
||
dataset: Optional[tf.data.Dataset] = None) -> tf.data.Dataset:
|
||
"""Generates a tf.data.Dataset object."""
|
||
if dataset is None:
|
||
dataset = self._read_data_source(self._matched_files, self._dataset_fn,
|
||
input_context)
|
||
dataset = self._decode_and_parse_dataset(dataset, self._global_batch_size,
|
||
input_context)
|
||
dataset = _maybe_map_fn(dataset, self._postprocess_fn)
|
||
if not (self._enable_shared_tf_data_service_between_parallel_trainers and
|
||
self._apply_tf_data_service_before_batching):
|
||
dataset = self._maybe_apply_data_service(dataset, input_context)
|
||
|
||
if self._deterministic is not None:
|
||
options = tf.data.Options()
|
||
options.deterministic = self._deterministic
|
||
dataset = dataset.with_options(options)
|
||
if self._autotune_algorithm:
|
||
options = tf.data.Options()
|
||
options.autotune.autotune_algorithm = (
|
||
tf.data.experimental.AutotuneAlgorithm[self._autotune_algorithm])
|
||
dataset = dataset.with_options(options)
|
||
return dataset.prefetch(self._prefetch_buffer_size)
|
||
```
|
||
|
||
Therefore, customizing the input pipeline is equivalent to having customized
|
||
versions of `dataset_fn`, `decoder_fn`, etc. The execution order is generally
|
||
as:
|
||
|
||
```
|
||
dataset_fn -> decoder_fn -> combine_fn -> parser_fn -> filter_fn ->
|
||
transform_and_batch_fn -> postprocess_fn
|
||
```
|
||
|
||
The `transform_and_batch_fn` is an optional function that merges multiple
|
||
examples into a batch and its default behavior to `dataset.batch` if not
|
||
specified. In this workflow, the functions before `transform_and_batch_fn`, e.g.
|
||
`dataset_fn`, `decoder_fn`, consume tensors without the batch dimension, while
|
||
`postprocess_fn` will consume tensors with the batch dimension.
|
||
|
||
We have essentially covered
|
||
[decoder_fn](https://github.com/tensorflow/models/blob/master/official/vision/docs/read_custom_datasets.md#decoder),
|
||
and `parser_fn` is another very important one that takes the decoded raw tensors
|
||
dict and parses them into a dictionary of tensors that can be consumed by the
|
||
model. It will be executed after decoder_fn.
|
||
|
||
It is also worth noting that optimizing of the input pipeline through
|
||
batching, shuffling and prefetching is also implemented in this class.
|
||
|
||
## Parser
|
||
|
||
A custom data loader can also be useful if you want to take advantage of
|
||
features such as data augmentation.
|
||
|
||
Customizing preprocessing is useful because it allows the user to tailor the
|
||
preprocessing steps to suit the specific requirements of the task. While there
|
||
are standard preprocessing techniques that are commonly used, different
|
||
applications may require different preprocessing steps. Additionally, custom
|
||
preprocessing can also improve the efficiency and accuracy of the model by
|
||
removing unnecessary steps, reducing computational resources or adding steps
|
||
that are important to the specific task being addressed.
|
||
|
||
For example, tasks such as object detection or segmentation may require
|
||
additional preprocessing steps such as resizing, cropping, or data augmentation
|
||
to improve the robustness of the model. Below are some essential steps to
|
||
customize a parser.
|
||
|
||
### Instructions
|
||
|
||
* **Create a Subclass**
|
||
<br>
|
||
|
||
<dd><dl>
|
||
|
||
Like Decoder, create `class Parser(parser.Parser)` in the same file.The
|
||
`Parser` class should be a childclass of the
|
||
[generic parser interface](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/parser.py)
|
||
and must implement all the abstract methods. It should have the implementation
|
||
of abstract methods `_parse_train_data` and `_parse_eval_data`, to generate
|
||
images and labels for model training and evaluation respectively. The below example
|
||
takes only two arguments but one can freely add as many arguments as needed.
|
||
|
||
```python
|
||
class Parser(parser.Parser):
|
||
|
||
def __init__(self, output_size: List[int], num_classes: float):
|
||
|
||
self._output_size = output_size
|
||
self._num_classes = num_classes
|
||
self._dtype = tf.float32
|
||
|
||
....
|
||
```
|
||
|
||
<br>
|
||
|
||
Refer to the data parser and processing [class](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/maskrcnn_input.py) for Mask R-CNN for more complex cases. The class has multiple parameters related to data augmentation, masking, anchor boxes, data type of output image and more.
|
||
|
||
</dd></dl>
|
||
|
||
<br>
|
||
|
||
* **Complete Abstract Methods**<br>
|
||
|
||
<dd><dl>
|
||
|
||
To define your own Parser, the user should override abstract functions
|
||
[_parse_train_data](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/vision/dataloaders/parser.py#L26)
|
||
and
|
||
[_parse_eval_data](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/vision/dataloaders/parser.py#L39)
|
||
of the
|
||
[parser](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/parser.py)
|
||
interface in the subclass, where decoded tensors are parsed with pre-processing
|
||
steps for training and evaluation respectively. The output from the two
|
||
functions can be any structure like a tuple, list or dictionary.
|
||
|
||
```python
|
||
@abc.abstractmethod
|
||
def _parse_train_data(self, decoded_tensors):
|
||
"""Generates images and labels that are usable for model training.
|
||
|
||
Args:
|
||
decoded_tensors: a dict of Tensors produced by the decoder.
|
||
|
||
Returns:
|
||
images: the image tensor.
|
||
labels: a dict of Tensors that contains labels.
|
||
"""
|
||
pass
|
||
|
||
@abc.abstractmethod
|
||
def _parse_eval_data(self, decoded_tensors):
|
||
"""Generates images and labels that are usable for model evaluation.
|
||
|
||
Args:
|
||
decoded_tensors: a dict of Tensors produced by the decoder.
|
||
|
||
Returns:
|
||
images: the image tensor.
|
||
labels: a dict of Tensors that contains labels.
|
||
"""
|
||
pass
|
||
|
||
```
|
||
|
||
The input of `_parse_train_data` and `_parse_eval_data` is a dict of Tensors
|
||
produced by the decoder; the output of these two functions is typically a tuple
|
||
of (processe_image, processed_label). The user may perform any processing steps
|
||
in these two functions as long as the interface is aligned. Note that the
|
||
processing steps in `_parse_train_data` and `_parse_eval_data` are typically
|
||
different since data augmentation is usually only applied to training. For
|
||
Example, refer to the
|
||
[Data parser](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/vision/dataloaders/classification_input.py#L166)
|
||
and processing steps for classification. We can observe that
|
||
|
||
<dd><dl>
|
||
|
||
-For `_parse_train_data`, the following steps are performed</dd></dl>
|
||
|
||
- Image decoding<br>
|
||
- Random cropping<br>
|
||
- Random flipping<br>
|
||
- Color jittering<br>
|
||
- Image resizing<br>
|
||
- Auto-augmentation with autoaug, randaug etc.<br>
|
||
- Image normalization<br>
|
||
|
||
<dd><dl><dd><dl>
|
||
|
||
-For `_parse_eval_data`, the following steps are performed</dd></dl></dd></dl>
|
||
|
||
-
|
||
Image decoding<br>
|
||
- Center cropping<br>
|
||
- Image resizing<br>
|
||
- Image normalization<br>
|
||
|
||
</dd></dl>
|
||
|
||
**Additional Methods**
|
||
|
||
The subclass (say sample_input.py) must include implementations for all of the
|
||
abstract methods defined in the Interface
|
||
[Decoder](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/decoder.py)
|
||
and
|
||
[Parser](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/parser.py)
|
||
, as well as any additional methods that are necessary for the subclass's
|
||
functionality.
|
||
|
||
For Example, In
|
||
[object detection](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/vision/dataloaders/tf_example_decoder.py#L72),
|
||
the decoder will take the serialized example and output a dictionary of tensors
|
||
with multiple fields that process and analyze to detect objects and determine
|
||
their location and orientation in the image. Separate methods for each of the
|
||
above fields can make the code easier to read and maintain, especially when the
|
||
class contains a large number of methods.
|
||
|
||
Refer
|
||
[Data parser](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/retinanet_input.py)
|
||
for Object Detection here.
|
||
|
||
### Example
|
||
|
||
Creating a Parser is an optional step and it varies with the use case. Below are
|
||
some use cases where we have included the Decoder and Parser based on the
|
||
requirements.
|
||
|
||
Use case | Decoder/Parser |
|
||
-------------------------------------------------------------------------------------------------------------------------------------------------------- | ----
|
||
[Classification](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/classification_input.py) | Both Decoder and Parser
|
||
[Segmentation](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/retinanet_input.py) | Only Parser
|
||
|
||
## Input Pipeline
|
||
|
||
Decoder and Parser discussed previously define how to decode and parse per data
|
||
point e.g. an image. However a complete input pipeline would need to handle
|
||
reading data from files in a distributed system, applying random perturbations,
|
||
batching etc. You may find more details about these concepts
|
||
[here](https://www.tensorflow.org/guide/data_performance#optimize_performance).
|
||
|
||
We have established a well tuned input pipeline as defined in the [InputReader](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/core/input_reader.py#L214) class, such that the user won’t need to modify it in most cases. The input pipeline roughly follows<br>
|
||
- Shuffling the files<br>
|
||
- Decoding<br>
|
||
- Parsing<br>
|
||
- Caching<br>
|
||
- If training: repeat and shuffle<br>
|
||
- Batching<br>
|
||
- Prefetching<br>
|
||
|
||
For the rest of this section, we will discuss one particular use case that
|
||
requires the modification of the typical pipeline by maybe creating a subclass
|
||
of the
|
||
[InputReader](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/core/input_reader.py#L214).
|
||
|
||
### Combines multiple datasets
|
||
|
||
Create a custom InputReader by subclassing
|
||
[InputReader](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/core/input_reader.py#L214)
|
||
interface. Custom InputReader class allows the user to combine multiple
|
||
datasets, helps in mixing a labeled and pseudo-labeled dataset etc. The business
|
||
logic is implemented in the `read()` method which finally generates a
|
||
`tf.data.Dataset` object.
|
||
|
||
The exact implementation of an InputReader can vary depending on the specific
|
||
requirements of your task and the type of input data you're working with, data
|
||
format, and preprocessing requirements.
|
||
|
||
Here is an example of how to create a custom InputReader by subclassing
|
||
[InputReader](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/core/input_reader.py#L214)
|
||
interface:
|
||
|
||
```python
|
||
class CustomInputReader(input_reader.InputReader):
|
||
|
||
def __init__(self,
|
||
params: cfg.DataConfig,
|
||
dataset_fn=tf.data.TFRecordDataset,
|
||
pseudo_label_dataset_fn=tf.data.TFRecordDataset,
|
||
....):
|
||
|
||
def read(
|
||
self,
|
||
input_context: Optional[tf.distribute.InputContext] = None
|
||
) -> tf.data.Dataset:
|
||
|
||
|
||
labeled_dataset = ....
|
||
pseudo_labeled_dataset = ....
|
||
dataset_concat = tf.data.Dataset.zip(
|
||
(labeled_dataset, pseudo_labeled_dataset))
|
||
....
|
||
|
||
return dataset_concat.prefetch(tf.data.experimental.AUTOTUNE)
|
||
|
||
```
|
||
|
||
### Example
|
||
|
||
Refer to the
|
||
[InputReader](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/vision/dataloaders/input_reader.py#L124)
|
||
for vision in TFM. The `CombinationDatasetInputReader` class mixes a labeled and
|
||
pseudo-labeled dataset and returns a `tf.data.Dataset` instance.
|