Update (rewrite) the javaagent structure doc; document bootstrap modules (#6227)

Mateusz Rzeszutek 2022-06-29 05:23:27 +02:00 committed by GitHub
parent bb25a6c47c
commit c978ce22f5
8 changed files with 149 additions and 108 deletions


@ -52,9 +52,9 @@ See [Running the tests](docs/contributing/running-tests.md)
See [Writing instrumentation](docs/contributing/writing-instrumentation.md)
### Understanding the javaagent components
### Understanding the javaagent structure
See [Understanding the javaagent components](docs/contributing/javaagent-jar-components.md)
See [Understanding the javaagent structure](docs/contributing/javaagent-structure.md)
### Understanding the javaagent instrumentation testing components


@ -1,102 +0,0 @@
# Understanding the javaagent components
The javaagent jar can logically be divided into 3 parts:
* Modules that live in the system class loader
* Modules that live in the bootstrap class loader
* Modules that live in the agent class loader
## Modules that live in the system class loader
### `javaagent` module
This module consists of single class
`io.opentelemetry.javaagent.OpenTelemetryAgent` which implements [Java
instrumentation
agent](https://docs.oracle.com/javase/7/docs/api/java/lang/instrument/package-summary.html).
This class is loaded during application startup by application classloader.
Its sole responsibility is to push agent's classes into JVM's bootstrap
classloader and immediately delegate to
`io.opentelemetry.javaagent.bootstrap.AgentInitializer` (now in the bootstrap class loader)
class from there.
## Modules that live in the bootstrap class loader
### `javaagent-bootstrap` module
`io.opentelemetry.javaagent.bootstrap.AgentInitializer` and a few other classes that live in the bootstrap class
loader but are not used directly by auto-instrumentation
### `instrumentation-api` and `javaagent-instrumentation-api` modules
These modules contain support classes for actual instrumentations to be loaded
later and separately. These classes should be available from all possible
classloaders in the running application. For this reason the `javaagent` module puts
all these classes into JVM's bootstrap classloader. For the same reason this
module should be as small as possible and have as few dependencies as
possible. Otherwise, there is a risk of accidentally exposing these classes to
the actual application.
`instrumentation-api` contains classes that are needed for both library and auto-instrumentation,
while `javaagent-instrumentation-api` contains classes that are only needed for auto-instrumentation.
## Modules that live in the agent class loader
### `javaagent-tooling`, `javaagent-extension-api` modules and `instrumentation` submodules
Contains everything necessary to make instrumentation machinery work,
including integration with [ByteBuddy](https://bytebuddy.net/) and actual
library-specific instrumentations. As these classes depend on many classes
from different libraries, it is paramount to hide all these classes from the
host application. This is achieved in the following way:
- When `javaagent` module builds the final agent, it moves all classes from
`instrumentation` submodules, `javaagent-tooling` and `javaagent-extension-api` modules
into a separate folder inside final jar file, called`inst`.
In addition, the extension of all class files is changed from `class` to `classdata`.
This ensures that general classloaders cannot find nor load these classes.
- When `io.opentelemetry.javaagent.bootstrap.AgentInitializer` is invoked, it creates an
instance of `io.opentelemetry.javaagent.bootstrap.AgentClassLoader`, loads an
`io.opentelemetry.javaagent.tooling.AgentInstaller` from that `AgentClassLoader`
and then passes control on to the `AgentInstaller` (now in the
`AgentClassLoader`). The `AgentInstaller` then installs all of the
instrumentations with the help of ByteBuddy. Instead of using agent classloader all agent classes
could be shaded and used from the bootstrap classloader. However, this opens de-serialization
security vulnerability and in addition to that the shaded classes are harder to debug.
The complicated process above ensures that the majority of
auto-instrumentation agent's classes are totally isolated from application
classes, and an instrumented class from arbitrary classloader in JVM can
still access helper classes from bootstrap classloader.
### Agent jar structure
If you now look inside
`javaagent/build/libs/opentelemetry-javaagent-<version>.jar`, you will see the
following "clusters" of classes:
Available in the system class loader:
- `io/opentelemetry/javaagent/bootstrap/AgentBootstrap` - the one class from `javaagent`
module
Available in the bootstrap class loader:
- `io/opentelemetry/javaagent/bootstrap/` - contains the `javaagent-bootstrap` module
- `io/opentelemetry/javaagent/instrumentation/api/` - contains the `javaagent-instrumentation-api` module
- `io/opentelemetry/javaagent/shaded/instrumentation/api/` - contains the `instrumentation-api` module,
shaded during creation of `javaagent` jar file by Shadow Gradle plugin
- `io/opentelemetry/javaagent/shaded/io/` - contains the OpenTelemetry API and its dependency gRPC
Context, both shaded during creation of `javaagent` jar file by Shadow Gradle plugin
- `io/opentelemetry/javaagent/slf4j/` - contains SLF4J and its simple logger implementation, shaded
during creation of `javaagent` jar file by Shadow Gradle plugin
Available in the agent class loader:
- `inst/` - contains `javaagent-tooling` and `javaagent-extension-api` modules and
`instrumentation` submodules, loaded and isolated inside `AgentClassLoader`.
Includes the OpenTelemetry SDK.
![Agent initialization sequence](initialization-sequence.svg)
[Image source](https://docs.google.com/drawings/d/1GHAcJ8AOaf_v2Ip82cQD9dN0mtvSk2C1B11KfwV2U8o)
![Agent classloader state](classloader-state.svg)
[Image source](https://docs.google.com/drawings/d/1x_eiGRodZ715ai6gDMTkyPYU4_wQnEkS4LQKSasEJAk)


@ -0,0 +1,112 @@
# Javaagent structure
The javaagent can be logically divided into several parts, based on the class loader that contains
particular classes (and resources) at runtime:
* The main agent class living in the system class loader.
* Classes that live in the bootstrap class loader.
* Classes that live in the agent class loader.
* Javaagent extensions, and the extension class loader(s).
## System class loader
The only class that is loaded by the system class loader is the
`io.opentelemetry.javaagent.OpenTelemetryAgent` class. This is the main class of the javaagent; it
implements the
[Java instrumentation agent specification](https://docs.oracle.com/javase/8/docs/api/java/lang/instrument/package-summary.html).
This class is loaded during application startup by the system classloader. Its sole
responsibility is to push the agent's classes into JVM's bootstrap classloader and immediately
delegate to the `io.opentelemetry.javaagent.bootstrap.AgentInitializer` class, living in the
bootstrap class loader.
Inside the javaagent jar, this class is located in the `io/opentelemetry/javaagent/` directory.
## Bootstrap class loader
The bootstrap class loader contains several modules:
* **The `javaagent-bootstrap` module**:
it contains classes that continue the initialization work started by `OpenTelemetryAgent`, as well
as some internal javaagent classes and interfaces that must be globally available to the whole
application. This module is internal and its APIs are considered unstable.
* **The `instrumentation-api` and `instrumentation-api-semconv` modules**:
these modules contain the [Instrumenter API](using-instrumenter-api.md) and other related
utilities. Because they are used by almost all instrumentations, they must be globally available
to all classloaders running within the instrumented application. The classes located in these
modules are used by both javaagent and library instrumentations - they all must be usable even
without the javaagent present.
* **The `instrumentation-api-annotation-support` module**:
it contains classes that provide support for annotation-based auto-instrumentation, e.g.
the `@WithSpan` annotation. This module is internal and its APIs are considered unstable.
* **The `instrumentation-appender-api-internal` module**:
it contains classes that constitute the "appender API", used by logging instrumentations. This
module is internal and its APIs are considered unstable.
* **The `io.opentelemetry.javaagent.bootstrap` package from the `javaagent-extension-api` module**:
this package contains several instrumentation utilities that are only usable when an application
is instrumented with the javaagent; for example, the `Java8BytecodeBridge` that should be used
inside advice classes.
* All modules using the `otel.javaagent-bootstrap` Gradle plugin:
these modules contain instrumentation-specific classes that must be globally available in the
bootstrap class loader. For example, classes that are used to coordinate
different `InstrumentationModule`s, like the common utilities for storing Servlet context path, or
the thread local switch used to coordinate different Kafka consumer instrumentations. By
convention, all these modules are named according to this
pattern: `:instrumentation:...:bootstrap`.
* The [OpenTelemetry API](https://github.com/open-telemetry/opentelemetry-java/tree/main/api/all).
Inside the javaagent jar, these classes are all located under the `io/opentelemetry/javaagent/`
directory. Aside from the javaagent-specific `javaagent-bootstrap` and `javaagent-extension-api`
modules, all other modules are relocated and placed under the `io/opentelemetry/javaagent/shaded/`
directory. This is done to avoid conflicts with the application code, which may contain different
versions of some of our APIs (`opentelemetry-api`, `instrumentation-api`).
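
The relocation described above can be pictured as a simple prefix rewrite of class names. The sketch below is only an illustration: the class `RelocationSketch` is hypothetical, and the two rules are assumptions inferred from the directory layout under `io/opentelemetry/javaagent/shaded/`; the real rule set is defined by the build and may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of package relocation; not part of the javaagent.
final class RelocationSketch {
  // Assumed rules: original package prefix -> relocated package prefix.
  private static final Map<String, String> RULES = new LinkedHashMap<>();

  static {
    RULES.put(
        "io.opentelemetry.instrumentation.api.",
        "io.opentelemetry.javaagent.shaded.instrumentation.api.");
    RULES.put(
        "io.opentelemetry.api.",
        "io.opentelemetry.javaagent.shaded.io.opentelemetry.api.");
  }

  private RelocationSketch() {}

  // Returns the relocated class name, or the name unchanged if no rule matches.
  static String relocate(String className) {
    for (Map.Entry<String, String> rule : RULES.entrySet()) {
      if (className.startsWith(rule.getKey())) {
        return rule.getValue() + className.substring(rule.getKey().length());
      }
    }
    return className;
  }
}
```

Under these assumed rules, `relocate("io.opentelemetry.api.trace.Span")` yields a name under `io.opentelemetry.javaagent.shaded`, while classes from `javaagent-bootstrap`, such as `io.opentelemetry.javaagent.bootstrap.AgentInitializer`, keep their original names.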
## Agent class loader
The agent classloader contains almost everything else not mentioned before, including:
* **The `javaagent-tooling` module**:
this module picks up the initialization process started by `OpenTelemetryAgent`
and `javaagent-bootstrap` and actually finishes the work, starting up the OpenTelemetry SDK and
building and installing the `ClassFileTransformer` in the JVM. The javaagent
uses [ByteBuddy](https://bytebuddy.net) to configure and construct the `ClassFileTransformer`.
This module is internal and its APIs are considered unstable.
* **The `muzzle` module**:
it contains classes that are internally used by [muzzle](muzzle.md), our safety net feature. This
module is internal and its APIs are considered unstable.
* **The `io.opentelemetry.javaagent.extension` package from the `javaagent-extension-api` module**:
this package contains common extension points and SPIs that can be used to customize the agent
behavior.
* All modules using the `otel.javaagent-instrumentation` Gradle plugin:
these modules contain actual javaagent instrumentations. Almost all of them implement
`InstrumentationModule`, and some of them include a library instrumentation as an `implementation`
dependency. You can read more about writing instrumentations [here](writing-instrumentation.md).
By convention, all these modules are named according to this
pattern: `:instrumentation:...:javaagent`.
* The [OpenTelemetry SDK](https://github.com/open-telemetry/opentelemetry-java/tree/main/sdk/all),
along with various exporters and SDK extensions.
* [ByteBuddy](https://bytebuddy.net).
Inside the javaagent jar, all classes and resources that are meant to be loaded by
the `AgentClassLoader` are placed inside the `inst/` directory. All Java class files have
the `.classdata` extension (instead of just `.class`) - this ensures that they will not be loaded by
general class loaders included with the application, making the javaagent internals completely
isolated from the application code.
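
To see why the `.classdata` extension hides these classes, compare the resource name a general class loader looks up with the one used for agent classes. This is a minimal sketch; `ClassDataNames` and its methods are hypothetical names, and the real `AgentClassLoader` may resolve resources differently.

```java
// Hypothetical helper contrasting the two lookup conventions; not part of the javaagent.
final class ClassDataNames {
  private static final String AGENT_PREFIX = "inst/";
  private static final String AGENT_SUFFIX = ".classdata";

  private ClassDataNames() {}

  // The resource name a general class loader searches for.
  static String standardResourceName(String className) {
    return className.replace('.', '/') + ".class";
  }

  // The resource name under which the same class is stored inside the agent jar.
  static String agentResourceName(String className) {
    return AGENT_PREFIX + className.replace('.', '/') + AGENT_SUFFIX;
  }
}
```

Because the two names never coincide, a class loader that only knows the standard convention cannot find a class stored under `inst/`.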
If a javaagent instrumentation includes a library instrumentation as an `implementation` dependency,
that dependency is shaded to prevent conflicts with application code (which may or may not include
the same library classes in a different version).
## Extension class loader
The extension class loader(s) are used to load custom extensions, if any are present. Extensions can be
external jars (provided by the `otel.javaagent.extensions` configuration property), or can be
embedded into an OpenTelemetry javaagent distribution (by adding the extension jars into
the `extensions/` directory inside the javaagent jar). Each extension is loaded in isolation, in a
separate class loader - this is intended to reduce the possibility of conflicts between different
extensions. Extension jars can be compiled against unshaded versions of the OpenTelemetry APIs;
the javaagent applies shading dynamically at runtime, when the extension is loaded.
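
The "one class loader per extension" idea can be sketched with plain `URLClassLoader`s. This is an assumption-laden illustration: `ExtensionLoadersSketch` is a hypothetical class, the choice of parent loader is left to the caller, and the dynamic shading step mentioned above is not modelled at all.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of per-extension isolation; not the javaagent's implementation.
final class ExtensionLoadersSketch {
  private ExtensionLoadersSketch() {}

  // Creates a separate class loader for every extension jar, all sharing the same parent.
  static List<ClassLoader> isolatedLoaders(List<String> jarPaths, ClassLoader parent) {
    List<ClassLoader> loaders = new ArrayList<>();
    for (String jarPath : jarPaths) {
      try {
        URL jarUrl = Paths.get(jarPath).toUri().toURL();
        // Each loader sees only its own jar plus whatever the parent exposes.
        loaders.add(new URLClassLoader(new URL[] {jarUrl}, parent));
      } catch (MalformedURLException e) {
        throw new IllegalArgumentException("Invalid extension jar path: " + jarPath, e);
      }
    }
    return loaders;
  }
}
```

Since sibling loaders do not delegate to each other, two extensions that bundle different versions of the same helper library would not see each other's copies.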
## Class loader hierarchy graph
![Agent classloader hierarchy](classloader-hierarchy.svg)
[Image source](https://docs.google.com/drawings/d/1DOftemu_96_0RggzOV3hFXejqeZWTmPBgbkaUhHw--g)


@ -328,7 +328,7 @@ code, see [this section](#writing-java-agent-unit-tests).
### Instrumenting code that is not available as a Maven dependency
If an instrumented server or library jar isn't available in any public Maven repository you can
create a module with stub classes that defines only the methods that you need to write the
create a module with stub classes that define only the methods that you need to write the
instrumentation. Methods in these stub classes can just `throw new UnsupportedOperationException()`;
these classes are only used to compile the advice classes and won't be packaged into the agent.
During runtime, real classes from instrumented server or library will be used.
@ -350,6 +350,38 @@ compileOnly(project(":instrumentation:yarpc-1.0:compile-stub"))
Now you can use your stub classes inside the javaagent instrumentation.
### Coordinating different `InstrumentationModule`s
When you need to share some classes between different `InstrumentationModule`s and communicate
between different instrumentations (which might be injected/loaded into different class loaders),
you can add an instrumentation-specific bootstrap module that contains all the common classes.
That way you can use these shared, globally available utilities to communicate between different
instrumentation modules.
Some examples of this include:
* Application server instrumentations communicating with Servlet API instrumentations.
* Different high-level Kafka consumer instrumentations suppressing the low-level `kafka-clients`
instrumentation.
Create a module named `bootstrap` and add a `build.gradle.kts` file with the following content:
```kotlin
plugins {
id("otel.javaagent-bootstrap")
}
```
In all `javaagent` modules that need to access the new shared module, add a `compileOnly`
dependency:
```kotlin
compileOnly(project(":instrumentation:yarpc-1.0:bootstrap"))
```
All classes from the newly added bootstrap module will be loaded by the bootstrap class loader and
globally available within the JVM. **IMPORTANT: Note that you _cannot_ use any third-party libraries
here, including the instrumented library - you can only use JDK and OpenTelemetry API classes.**
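
As a rough idea of what such a shared class might look like, here is a minimal sketch of a thread-local switch that one instrumentation sets and another one checks. The class and method names are hypothetical, not taken from any existing bootstrap module; note that it uses JDK classes only, in line with the restriction above.

```java
// Hypothetical shared utility for a bootstrap module; uses only JDK classes.
// A real shared class would be declared public so that every class loader can use it.
final class ConsumerSuppressionSwitch {
  private static final ThreadLocal<Boolean> SUPPRESSED =
      ThreadLocal.withInitial(() -> Boolean.FALSE);

  private ConsumerSuppressionSwitch() {}

  // Called by the high-level instrumentation: runs the action with the switch turned on
  // and restores the previous state afterwards, so nested calls behave correctly.
  public static void runSuppressed(Runnable action) {
    boolean previous = SUPPRESSED.get();
    SUPPRESSED.set(Boolean.TRUE);
    try {
      action.run();
    } finally {
      SUPPRESSED.set(previous);
    }
  }

  // Called by the low-level instrumentation to decide whether it should skip its work.
  public static boolean isSuppressed() {
    return SUPPRESSED.get();
  }
}
```

Because the state lives in a class loaded by the bootstrap class loader, both instrumentations observe the same `ThreadLocal`, regardless of which class loader their advice code ends up in.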
## Writing Java agent unit tests
As mentioned before, tests in the `javaagent` module cannot access the javaagent instrumentation


@ -51,7 +51,7 @@ potentially cause linkage errors.
## Classloader separation
See more detail about the classloader separation [here](./contributing/javaagent-jar-components.md).
See more detail about the classloader separation [here](./contributing/javaagent-structure.md).
The Java agent makes sure to include as little code as possible in the user app's classloader, and
all code that is included is either unique to the agent itself or shaded in the agent build. This is