502 lines
20 KiB
Markdown
502 lines
20 KiB
Markdown
# Binary Executable Format
|
|
|
|
<!--* freshness: {
|
|
owner: 'chky'
|
|
owner: 'hyojun'
|
|
owner: 'xldrx'
|
|
reviewed: '2021-06-07'
|
|
} *-->
|
|
|
|
<!-- TOC -->
|
|
|
|
This document describes the goals and rationale for the in-memory representation
|
|
the executor uses to run host kernels, known as the Binary Executable Format
|
|
"BEF".
|
|
|
|
We have the following goals:
|
|
|
|
- We want this to be simple and relatively stable - though it is versioned
|
|
just in case (tm). We don't want this to contain domain specific
|
|
abstractions that will require changing it frequently over time.
|
|
|
|
- This format is designed to be usable as an immutable region in memory,
|
|
allowing it to be built offline and conveniently `mmap`'d in (or stored in
|
|
the `.rodata` segment of an executable, etc).
|
|
|
|
- This is aligned with the abstractions used by the host executor and includes
|
|
key abstractions that it can directly use to make it speedy and memory
|
|
efficient. The design puts more burden on the encoder of a BEF file (e.g.
|
|
requiring multiple passes) to make the decoder fast.
|
|
|
|
- We expect this to be produced online (e.g. when a `TF_Function` is built on
|
|
demand) e.g. in training or interactive use cases.
|
|
|
|
- We need to represent location information in order to reflect runtime errors
|
|
(e.g. shape errors) back to the user source program.
|
|
|
|
- We should follow normal best practices for binary encodings, e.g. having
|
|
tools to round trip them to a textual form (MLIR in our case), support
|
|
arbitrary sized files (not limited to 4GB), be efficient to read, make
|
|
reasonable attempts to be dense, etc.
|
|
|
|
In addition to describing the format, this document describes some of the
|
|
rationale for the design decisions as well as TODO items that need to be
|
|
completed before this work is done. This file format shouldn't be declared
|
|
stable until all the TODOs are resolved.
|
|
|
|
## Fundamental Concepts
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
BYTE ::= `0x00`...`0xFF`
|
|
INTEGER ::= (`0x80`...`0xFF`)* (`0x00`...`0x7F`)
|
|
FIXED32 ::= BYTE BYTE BYTE BYTE
|
|
NULL_TERMINATED_STRING ::= (`0x01`...`0xFF`)* `0x00`
|
|
OFFSET ::= INTEGER
|
|
INDEX ::= INTEGER
|
|
```
|
|
|
|
A BEF file is formed as a byte stream whose top-level structure is a list of
|
|
"sections" that hold various sorts of data. It uses a few fundamental concepts:
|
|
|
|
* *Integers* are encoded in a "Variable Byte Rate" (VBR) encoding, allowing
|
|
small integers (less than 2^7) to be stored in one byte, integers up to 2^14
|
|
to be stored in two bytes, etc. This is done by using the high bit of each
|
|
byte in the stream to indicate "more bytes are coming".
|
|
|
|
* *Fixed Integers* are unsigned integers with fixed bit width. The values are
|
|
stored in little-endian byte order.
|
|
|
|
* *Offsets* are Integers that demarcate a byte offset in the stream from a
|
|
fixed position (typically the start of a section). These are useful when the
|
|
section contains a bunch of variable length things, e.g. when referring to a
|
|
string in the string table. Because variable sized integers are used
|
|
pervasively in BEF files, offsets are very common.
|
|
|
|
* *Indexes* are Integers that demarcate an entry in a table, e.g. a register
|
|
or kernel in a function. These are typically used when indexing into a table
|
|
where the structure is expected to be decoded and held in memory by the
|
|
reader, e.g. as is the case with the kernels and types tables.
|
|
|
|
BEF files can be created by an arbitrary producer, but the standard ways is to
|
|
use the
|
|
[MLIRToBEF](https://github.com/tensorflow/runtime/blob/master/lib/bef_converter/mlir_to_bef/mlir_to_bef.cc)
|
|
translation library (or standalone tool), which turns an MLIR representation of
|
|
a host program into a BEF file.
|
|
|
|
A
|
|
[BEFToMLIR](https://github.com/tensorflow/runtime/blob/master/lib/bef_converter/bef_to_mlir/bef_to_mlir.cc)
|
|
translation library performs the reverse translation from BEF file to MLIR
|
|
representation of a host program. It is useful for writing BEF encoder testcases
|
|
and for dealing with that "random file someone sent you".
|
|
|
|
## Sections
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
SECTION ::= SECTION_HEADER SECTION_DATA
|
|
SECTION_HEADER ::= SECTION_ID INTEGER<LENGTH_AND_ALIGNMENT> SECTION_BODY_ALIGNMENT?
|
|
SECTION_ID ::= BYTE
|
|
LENGTH_AND_ALIGNMENT ::= (SECTION_LENGTH << 1) | (SECTION_ALIGNMENT_FLAG)
|
|
SECTION_BODY_ALIGNMENT ::= BYTE<"Alignment"> BYTE<"AlignmentPadding">*
|
|
```
|
|
|
|
*Sections* are the top level entities in the file (after the header). Each
|
|
section contains a *Section ID*, a length (allowing the section to be skipped
|
|
over entirely by the reader) and an optional alignment for the contents of the
|
|
section.
|
|
|
|
Section IDs and other fundamental constants are defined in
|
|
[`bef_encoding.h`](https://github.com/tensorflow/runtime/blob/master/include/tfrt/bef/bef_encoding.h),
|
|
and utilities for decoding the basic file structures like VBR integers are
|
|
defined in
|
|
[`bef_reader.h`](https://github.com/tensorflow/runtime/blob/master/include/tfrt/bef/bef_reader.h).
|
|
|
|
The LENGTH_AND_ALIGNMENT contains SECTION_LENGTH and SECTION_ALIGNMENT_FLAG. The
|
|
SECTION_LENGTH contains one bit shifted value of the section length and the
|
|
SECTION_ALIGNMENT_FLAG (0 bit) indicates if SECTION_BODY_ALIGNMENT exists or
|
|
not; 0 means the section body starts immediately and 1 means
|
|
SECTION_BODY_ALIGNMENT follows.
|
|
|
|
## Top Level Structure
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
BEF_FILE ::= `0x0B` `0xEF` FORMAT_VERSION_NUMBER SECTION*
|
|
|
|
FORMAT_VERSION_NUMBER ::= `0x00`
|
|
|
|
SECTION_DATA ::= STRINGS_SECTION
|
|
SECTION_DATA ::= ATTRIBUTES_SECTION
|
|
SECTION_DATA ::= KERNELS_SECTION
|
|
SECTION_DATA ::= TYPES_SECTION
|
|
SECTION_DATA ::= FUNCTION_INDEX_SECTION
|
|
SECTION_DATA ::= FUNCTIONS_SECTION
|
|
SECTION_DATA ::= LOCATION_STRINGS_SECTION
|
|
SECTION_DATA ::= LOCATIONS_SECTION
|
|
SECTION_DATA ::= ATTRIBUTE_TYPES_SECTION
|
|
SECTION_DATA ::= ATTRIBUTE_NAMES_SECTION
|
|
SECTION_DATA ::= REGISTER_TYPES_SECTION
|
|
|
|
// Unknown section.
|
|
SECTION_DATA ::= BYTE*
|
|
```
|
|
|
|
The top level structure of the file is a two-byte "magic number" of `0x0BEF`
|
|
followed by one byte sized FORMAT_VERSION_NUMBER and a list of sections.
|
|
|
|
The current FORMAT_VERSION_NUMBER is 0, and will be increased when BEF format is
|
|
changed.
|
|
|
|
The reader skips over unknown sections, which could be useful for future
|
|
evolution of the format, e.g. if we want to store extra metadata in the BEF
|
|
format for some purpose.
|
|
|
|
### Strings Section
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
STRINGS_SECTION ::= NULL_TERMINATED_STRING*
|
|
```
|
|
|
|
The Strings section contains a list of NULL terminated strings used by the
|
|
program (e.g. for type and kernel names). Entries in this section are referenced
|
|
by an Offset from the start of the section.
|
|
|
|
Note: This format doesn't support embedded NULL strings, which is sufficient for
|
|
the existing use cases. We could switch to modified Pascal strings if embeded
|
|
NULL characters become important for something (at a space/complexity cost).
|
|
|
|
### Attributes Section
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
ATTRIBUTES_SECTION ::= BYTE*
|
|
```
|
|
|
|
The Attributes section contains the value of attributes used by kernels in the
|
|
BEF program. They are stored on the natural alignment boundary of the type, and
|
|
the address of the attribute is directly passed into the kernel implementation
|
|
function as an const pointer.
|
|
|
|
BEF files only support a subset of MLIR attributes, currently including:
|
|
|
|
* booleans, stored as 1-byte integers.
|
|
* i1 integers, stored as 1-byte integers.
|
|
* i32 integers, stored as 4-byte little endian integers.
|
|
* i64 integers, stored as 8-byte little endian integers.
|
|
* f32 floats, stored as IEEE single precision floats.
|
|
* f64 floats, stored as IEEE double precision floats.
|
|
* type enums, stored as 1-byte integers. Currently supported type enums are
|
|
i1, i32, i64, f32 and f64.
|
|
* strings, stored as arrays of bytes, not necessarily NULL terminated.
|
|
* dense elements, stored as shape dtype, shape rank, elements count, followed
|
|
by shape elements and elements themselves. Each element can be any of the
|
|
integer and float format above.
|
|
* arrays, all elements of which are of the same type and fixed in width (eg.
|
|
i32, f32, type).
|
|
* aggregates, stored as array of i32 integers, which are offsets to other
|
|
constants in Attributes Section. These nested constants can be of any
|
|
supported attribute type including aggregates. Unlike arrays, an aggregate
|
|
can contain a mix of different attribute types.
|
|
|
|
TODO: Support 8/16-bit integers and floating point constants.
|
|
|
|
#### Rationale
|
|
|
|
The host executor needs to refer to attribute values, and since the BEF file is
|
|
designed to be memory mapped in, we can directly use the encoding in memory.
|
|
This means that kernels will need to bswap attributes for big-endian systems,
|
|
but they generally have bswap'ing loads anyway so this shouldn't impose a
|
|
performance penalty.
|
|
|
|
### Kernels Section
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
KERNELS_SECTION ::= INTEGER<"KernelCount"> OFFSET<"KernelNameOffset">*
|
|
```
|
|
|
|
The Kernels section defines a table of kernel names, directly corresponding to
|
|
the names in the MLIR host executor program. This allows references from the
|
|
Functions section to use dense indexes into the table.
|
|
|
|
The format of this section is an Integer count of the number of kernels in the
|
|
table, followed by 'count' Integer values which are offsets into the string
|
|
table for the kernel name.
|
|
|
|
#### Rationale
|
|
|
|
We want dense indexes from kernels in the Functions section, and the Host
|
|
Executor wants to resolve names to kernel implementations at startup time
|
|
anyway, as such, it makes sense for the host executor to build its own mutable
|
|
array in memory for kernels.
|
|
|
|
### Types Section
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
TYPES_SECTION ::= INTEGER<"TypeCount"> OFFSET<"TypeNameOffset">*
|
|
```
|
|
|
|
The Types section defines a table of type names used by the host program, and is
|
|
laid out exactly the same as the [Kernels section](#kernels-section). The
|
|
[Functions section](#functions-section) uses Indexes into this section to
|
|
specify types of registers.
|
|
|
|
### FunctionIndex Section
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
FUNCTION_INDEX_SECTION ::= INTEGER<"NumFunctions"> FUNCTION_ENTRY*
|
|
FUNCTION_ENTRY ::= BYTE<"FunctionKind"> OFFSET<"Function"> \
|
|
OFFSET<"Name"> INTEGER<"NumArguments"> \
|
|
INDEX<"Type">* INTEGER<"NumResults"> INDEX<"Type">*
|
|
```
|
|
|
|
The FunctionIndex section defines a table of functions in the BEF file, one for
|
|
each function, including the kind of this function (defined in
|
|
[`bef_encoding.h`](https://github.com/tensorflow/runtime/blob/master/include/tfrt/bef/bef_encoding.h)),
|
|
an Offset into the [Functions Section](#functions-section), a name (an Offset
|
|
into the [Strings section](#strings-section), which may be an empty string), and
|
|
a list of argument and result types.
|
|
|
|
#### Rationale
|
|
|
|
This defines a symbol table for the BEF file, allowing clients to look up
|
|
functions by name. While we could intersperse this information into the
|
|
[Functions section](#functions-section) itself, doing so would require the
|
|
reader to make a pass over all of the functions ahead of time. We'd prefer to
|
|
have a quick index that the reader can scan at load time, deferring processing
|
|
of any individual function until it is needed.
|
|
|
|
### Functions Section
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
FUNCTIONS_SECTION ::= FUNCTION*
|
|
```
|
|
|
|
The Functions section is a list of Function records emitted to the byte stream
|
|
and then addressed by an Offset indicated by the
|
|
[FunctionIndex section](#functionindex-section). Functions section is 4-byte
|
|
aligned.
|
|
|
|
#### Rationale
|
|
|
|
Regions in MLIR are a generalization of a unit of computation. The BEF format
|
|
and BEF executor support a subset of MLIR region features that are core to the
|
|
abstractions we need to model - in particular, while MLIR regions can have
|
|
multiple basic blocks in them, BEF Functions only support a single block - this
|
|
guarantees that they are always be a DAG of computation.
|
|
|
|
Functions are used for top-level functions, which BEF files have direct support
|
|
for (allowing lookup of functions by name) as well as in nested positions for
|
|
control flow and other concepts that occur with MLIR regions.
|
|
|
|
### Function Definition
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
FUNCTION ::= OFFSET<"Location"> REGISTER_TABLE KERNEL_TABLE \
|
|
RESULT_REGS BYTE<"AlignmentPadding">* KERNEL+
|
|
|
|
REGISTER_TABLE ::= INTEGER<"NumRegs"> REGISTER_ENTRY*
|
|
REGISTER_ENTRY ::= INTEGER<"NumUses">
|
|
|
|
KERNEL_TABLE ::= INTEGER<"NumKernels"> KERNEL_ENTRY*
|
|
KERNEL_ENTRY ::= OFFSET<"KernelOffset"> INTEGER<"NumOperands"> \
|
|
INTEGER<"StreamId">
|
|
|
|
RESULT_REGS ::= INDEX<"Register">*
|
|
```
|
|
|
|
Each function is defined by a location (an offset into the
|
|
[LocationPositions section](#locationpositions-section)), a register table, a
|
|
kernel table, a list of result registers, a list of kernels with 4-byte
|
|
alignment, and ends with a fixed32 integer of value zero.
|
|
|
|
The Register Table is a count of registers, and an entry for each register -
|
|
indicating the number of kernels in this section that use the register.
|
|
|
|
The Kernel Table for a function is a count of kernels, an offset (from the end
|
|
of the Kernel Table) of the start of the kernel, the number of operands that the
|
|
kernel has, and a stream id that is used to help runtime scheduling decisions,
|
|
e.g. successive kernels with the same stream id can be executed in the same
|
|
thread.
|
|
|
|
The kernel list that is following the Kernel Table contains all the kernels used
|
|
in this function. Note that every function has a pseudo kernel that is the
|
|
single entry point to the rest of the kernels. Specifially, a pseudo kernel
|
|
defines registers for function arguments and a pseudo register that is
|
|
conceptually used by kernels that takes no arguments.
|
|
|
|
The result registers specify the register values to return, and must align with
|
|
the function result types from the
|
|
[FunctionIndex section](#functionindex-section).
|
|
|
|
#### Rationale
|
|
|
|
These two tables are key to allowing the BEF executor to efficiently dispatch
|
|
and destroy kernels in a lock-free way. At startup time, the executor inflates
|
|
these two tables into arrays, resolving the type descriptors for the types, and
|
|
building a table of ready counts for the kernels. Having a table of kernels,
|
|
allow the use of kernel indexes.
|
|
|
|
### Kernel Definition
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
KERNEL ::= KERNEL_HEADER KERNEL_RESULT_TABLE KERNEL_BODY
|
|
|
|
KERNEL_HEADER ::= FIXED32<"KernelCode"> FIXED32<"KernelLocation"> \
|
|
FIXED32<"NumArguments"> FIXED32<"NumAttributes"> \
|
|
FIXED32<"NumFunctions"> FIXED32<"NumResults">
|
|
|
|
KERNEL_RESULT_TABLE::= FIXED32<"NumUsedBys">*
|
|
|
|
KERNEL_BODY ::= FIXED32<KernelArgument>* FIXED32<KernelAttribute>* \
|
|
FIXED32<KernelFunction>* FIXED32<KernelResult>* \
|
|
FIXED32<KernelUsedBy>*
|
|
```
|
|
|
|
Each instance of a kernel includes a kernel header, a result table and a kernel
|
|
body. The kernel header consists of a opcode (an index into the Kernels table,
|
|
defined by the Kernels section), a location (an offset into the
|
|
[LocationPositions section](#locationpositions-section)) the numbers of
|
|
arguments, attributes, functions and results in the kernel body.
|
|
|
|
The result table contains NumResults fixed32 integers, indicating the number of
|
|
users for each corresponding result. The kernel body consists of zero or more
|
|
inputs (indexes into the Register Table for this function), zero or more
|
|
constants (offsets into the [Attributes section](#attributes-section)), zero or
|
|
more functions (indexes into the
|
|
[FunctionIndex section](#functionindex-section)), zero or more results (indexes
|
|
into the Register table for this function), and zero or more 'used by' records
|
|
(which are indexes into the Kernel Table for this function). 'used by' records
|
|
for the same result are grouped together and these groups are in the same order
|
|
of 'result' records. For example, if there are two results, A and B, and A has
|
|
one user (a0) and B has two users (b0, b1), then FIXED32<"NumResults"> will be
|
|
`0x00000002`, followed by two FIXED32<"NumUsedBys">, `0x00000001` `0x00000002`;
|
|
and in the kernel body there will be three "used by" records, a0, b0 and b1,
|
|
consecutively.
|
|
|
|
#### Rationale
|
|
|
|
This record allows us to efficiently form the array of arguments and results
|
|
that get passed to a kernel implementation function. When the kernel completes,
|
|
the UsedBy records allow us to efficiently trigger execution of data dependent
|
|
kernels if they are ready, by decrementing the "NumOperands" field for the
|
|
kernel.
|
|
|
|
### LocationStrings Section
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
LOCATION_STRINGS_SECTION ::= NULL_TERMINATED_STRING*
|
|
```
|
|
|
|
This section contains a list of NULL terminated strings used by locations
|
|
section. A filename field of FileLineCol location and name field of Name
|
|
location are stored in this section.
|
|
|
|
#### Rationale
|
|
|
|
We choose to store these separately from the string table, because these
|
|
locations are only ever decoded in the case of an error. There is no reason to
|
|
dirty data cache lines with them, and we expect no reuse with other general
|
|
strings.
|
|
|
|
### Locations Section
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
LOCATIONS_SECTION ::= LOCATION*
|
|
|
|
LOCATION = <UNKNOWN_LOC | FILELINECOL_LOC | NAME_LOC | CALLSITE_LOC
|
|
| FUSED_LOC>
|
|
|
|
UNKNOWN_LOC ::= `0x00`
|
|
|
|
FILELINECOL_LOC ::= `0x01` OFFSET<"Filename"> INTEGER<"LineNum"> \
|
|
INTEGER<"ColumnNum">
|
|
|
|
NAME_LOC ::= `0x02` OFFSET<"Name"> LOCATION<"Child">
|
|
|
|
CALLSITE_LOC ::= `0x03` LOCATION<"Callee"> LOCATION<"Caller">
|
|
|
|
FUSED_LOC ::= `0x04` INTEGER<"NumLocations"> LOCATION*
|
|
```
|
|
|
|
This section contains a list of "locations", which can have five different
|
|
types: Unknown, FileLineCol, Name, CallSite location, and Fused locations.
|
|
Strings used in FileLineCol and Name locations are added to the LocationStrings
|
|
section.
|
|
|
|
### AttributeTypes Section
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
ATTRIBUTE_TYPES_SECTION ::= INTEGER<"NumAttributes"> ATTRIBUTE_TYPE_ENTRY*
|
|
ATTRIBUTE_TYPE_ENTRY ::= OFFSET<"Attributes"> BYTE<"AttributeType">
|
|
```
|
|
|
|
The AttributeTypes section is an optional section which is not needed for BEF
|
|
execution, but this section is necessary for bef-to-mlir conversion. This
|
|
section describes the type information for each attribute (specified by an
|
|
Offset into [Attributes section](#attributes-section)). An attribute type is an
|
|
one-byte enum described in [Attributes section](#attributes-section) and defined
|
|
in bef_encoding.h.
|
|
|
|
### AttributeNames Section
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
ATTRIBUTE_NAMES_SECTION ::= INTEGER<"NumFunctions"> KERNEL_TABLE*
|
|
KERNEL_TABLE ::= INTEGER<"NumKernels"> OFFSET<"AttributeName">*
|
|
```
|
|
|
|
The AttributeNames sections is an optional section which is not needed for BEF
|
|
execution, but this section is necessary for bef-to-mlir conversion. This
|
|
section describes the attribute names used by each kernel. There is a 1:1
|
|
mapping between Kernel Table in this section and Function Entries in
|
|
[FunctionIndex section](#functionindex-section). And there is a 1:1 mapping
|
|
between kernel entries in this section and kernel entries in Functions section's
|
|
kernel table. Each kernel entry contains any number of AttributeNames.
|
|
AttributeName is an offset to Strings section that specifies the name of the
|
|
attribute used by this kernel.
|
|
|
|
### RegisterTypes Section
|
|
|
|
#### Grammar
|
|
|
|
```none
|
|
REGISTER_TYPES_SECTION ::= INTEGER<"NumFunctions"> REGISTER_TYPE_TABLE
|
|
REGISTER_TYPE_TABLE ::= INTEGER<"NumRegs"> INDEX<"Types">*
|
|
```
|
|
|
|
The RegisterTypes section is an optional section which is not needed for BEF
|
|
execution, but this section is necessary for bef-to-mlir conversion. This
|
|
section describes the type information for registers in each function. The
|
|
RegisterType Table is a count of registers, and an entry for each register
|
|
indicating its index into [Types section](#types-section). There is a 1:1
|
|
mapping between RegisterType Tables and Function Entries in
|
|
[FunctionIndex section](#functionindex-section). And it is a 1:1 mapping between
|
|
types in a RegisterType Table and registers in the RegisterTable of the
|
|
corresponding Function.
|