# Overhead tests

* [Process](#process)
* [What do we measure?](#what-do-we-measure)
* [Config](#config)
* [Agents](#agents)
* [Automation](#automation)
* [Setup and Usage](#setup-and-usage)
* [Visualization](#visualization)

This directory contains tools and utilities that help us measure the performance overhead
introduced by the agent and track how that overhead changes over time.

The overhead tests here should be considered a "macro" benchmark. They serve to measure high-level
overhead as perceived by the operator of a "typical" application. Tests are performed on a Java 11
distribution from [Eclipse Temurin](https://projects.eclipse.org/projects/adoptium.temurin).

## Process

There is one dynamic test here called [OverheadTests](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/benchmark-overhead/src/test/java/io/opentelemetry/OverheadTests.java).
The `@TestFactory` method creates a test pass for each of the [defined configurations](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/benchmark-overhead/src/test/java/io/opentelemetry/config/Configs.java).
Before the tests run, a single collector instance is started. Each test pass has one or more
agents configured, and those are tested in series.
For each agent defined in a configuration, the test runner (using [testcontainers](https://www.testcontainers.org/)) will:

1. Create a fresh postgres instance and populate it with initial data.
2. Create a fresh instance of [spring-petclinic-rest](https://github.com/spring-petclinic/spring-petclinic-rest) instrumented with the specified agent.
3. Measure the time until the petclinic app is marked "healthy" and then write it to a file.
4. If configured, perform a warmup phase. During the warmup phase, a small amount of traffic is generated in order to get the application into a steady state (primarily to allow JIT compilation to settle). Currently, we use a 30 second warmup time.
5. Start a JFR recording by running `jcmd` inside the petclinic container.
6. Run the [k6 test script](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/benchmark-overhead/k6/basic.js) with the configured number of iterations through the script and the configured number of concurrent virtual users (VUs).
7. After k6 completes, shut down petclinic.
8. After petclinic is shut down, shut down postgres.

This repeats for every agent configured in each test configuration.
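
To make this flow concrete, here is a minimal, hypothetical sketch of how a JUnit 5
`@TestFactory` combined with testcontainers can expand each configuration into one test
pass per agent. `Configs`, `TestConfig`, and `Agent` stand in for the real classes in this
module, and the container image name, agent path, and environment variable are illustrative
rather than the exact ones the suite uses.

```java
// Hypothetical sketch, not the actual OverheadTests implementation.
import org.junit.jupiter.api.DynamicTest;
import org.junit.jupiter.api.TestFactory;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.utility.DockerImageName;

import java.util.stream.Stream;

class OverheadTestsSketch {

  @TestFactory
  Stream<DynamicTest> runAllTestConfigurations() {
    // One dynamic test per (configuration, agent) pair; agents run in series.
    return Configs.all().stream()
        .flatMap(config -> config.getAgents().stream()
            .map(agent -> DynamicTest.dynamicTest(
                config.getName() + ":" + agent.getName(),
                () -> runTestPass(config, agent))));
  }

  private void runTestPass(TestConfig config, Agent agent) {
    // Step 1: a fresh postgres instance for every agent under test.
    try (PostgreSQLContainer<?> postgres =
            new PostgreSQLContainer<>(DockerImageName.parse("postgres:11"))) {
      postgres.start();
      // Step 2: petclinic instrumented with the specified agent jar.
      try (GenericContainer<?> petclinic =
              new GenericContainer<>(DockerImageName.parse("petclinic-rest:latest")) // illustrative image
                  .withEnv("JAVA_TOOL_OPTIONS", "-javaagent:/agent.jar")) { // illustrative path
        petclinic.start();
        // Steps 3-8 (startup timing, warmup, JFR via jcmd, k6, shutdown) would go here.
      }
    }
  }
}
```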

After all the tests are complete, the results are collected and committed back to the `/results`
subdirectory as CSV and summary text files.

## What do we measure?

For each test pass, we record the following metrics in order to compare agents and determine
relative overhead.

| metric name              | units    | description                                                                  |
|--------------------------|----------|------------------------------------------------------------------------------|
| Startup time             | ms       | How long it takes for the spring app to report "healthy"                     |
| Total allocated mem      | bytes    | Total memory allocated across the life of the application                    |
| Heap (min)               | bytes    | Smallest observed heap size                                                  |
| Heap (max)               | bytes    | Largest observed heap size                                                   |
| Thread switch rate       | # / s    | Max observed thread context switch rate                                      |
| GC time                  | ms       | Total amount of time spent paused for garbage collection                     |
| Request mean             | ms       | Average time to handle a single web request (measured at the caller)         |
| Request p95              | ms       | 95th percentile time to handle a single web request (measured at the caller) |
| Iteration mean           | ms       | Average time to do a single pass through the k6 test script                  |
| Iteration p95            | ms       | 95th percentile time to do a single pass through the k6 test script          |
| Peak threads             | #        | Highest number of running threads in the VM, including agent threads         |
| Network read mean        | bits/s   | Average network read rate                                                    |
| Network write mean       | bits/s   | Average network write rate                                                   |
| Average JVM user CPU     | fraction | Average observed user CPU (range 0.0-1.0)                                    |
| Max JVM user CPU         | fraction | Max observed user CPU (range 0.0-1.0)                                        |
| Average machine tot. CPU | fraction | Average fraction of machine CPU used (range 0.0-1.0)                         |
| Total GC pause nanos     | ns       | JVM time spent paused due to GC                                              |
| Run duration ms          | ms       | Duration of the test run                                                     |
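
Many of these metrics are derived from the JFR recording started in step 5 of the process
above. As a minimal illustration using only the standard `jdk.jfr.consumer` API (JDK 11+),
total GC pause time can be summed from `jdk.GCPhasePause` events; the recording file name
here is an assumption, not necessarily the one the suite writes.

```java
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

import java.io.IOException;
import java.nio.file.Path;

class GcPauseTotal {
  public static void main(String[] args) throws IOException {
    long totalPauseNanos = 0;
    // readAllEvents loads every event in the recording into memory.
    for (RecordedEvent event : RecordingFile.readAllEvents(Path.of("petclinic.jfr"))) {
      // Each jdk.GCPhasePause event represents one stop-the-world GC pause.
      if (event.getEventType().getName().equals("jdk.GCPhasePause")) {
        totalPauseNanos += event.getDuration().toNanos();
      }
    }
    System.out.println("Total GC pause nanos: " + totalPauseNanos);
  }
}
```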

## Config

Each config contains the following:

* name
* description
* list of agents (see below)
* maxRequestRate (optional, used to throttle traffic)
* concurrentConnections (the number of concurrent virtual users [VUs])
* totalIterations (the number of passes to make through the k6 test script)
* warmupSeconds (how long to run the warmup phase before measurements begin)

Currently, we test:

* no agent versus the latest released agent
* no agent versus the latest snapshot
* the latest release versus the latest snapshot

Additional configurations can be created by submitting a PR against the `Configs` class.
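
For illustration only, a new configuration might look roughly like the following; the
builder method names mirror the fields listed above but may not match the exact
`TestConfig` API in this module.

```java
// Hypothetical sketch of a new entry for the Configs class.
TestConfig releaseVsSnapshot = TestConfig.builder()
    .name("release-vs-snapshot")
    .description("compares the latest release against the latest snapshot")
    .agents(Agent.LATEST_RELEASE, Agent.LATEST_SNAPSHOT) // hypothetical Agent constants
    .maxRequestRate(0)          // optional throttle; 0 meaning unthrottled is an assumption
    .concurrentConnections(5)
    .totalIterations(5000)
    .warmupSeconds(30)
    .build();
```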

### Agents

An agent is defined in code as a name, description, optional URL, and optional additional
arguments to be passed to the JVM (not including `-javaagent:`). New agents may be defined
by creating new instances of the `Agent` class. The `AgentResolver` is used to download
the relevant agent jar for an `Agent` definition.
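
As a rough, hypothetical example (the real `Agent` constructor may take its arguments
differently), a new agent definition could look like this:

```java
// Illustrative only: an Agent built from the four fields described above.
Agent customSnapshot = new Agent(
    "snapshot-debug",                                   // name
    "latest snapshot with agent debug logging",         // description
    "https://example.com/opentelemetry-javaagent.jar",  // optional URL to the jar (placeholder)
    List.of("-Dotel.javaagent.debug=true"));            // extra JVM args (no -javaagent: here)
```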

## Automation

The tests are run nightly via GitHub Actions. The results are collected and appended to
a CSV file, which is committed back to the repo in the `/results` subdirectory.

## Setup and Usage

The tests require Docker to be running. Simply run `OverheadTests` in your IDE.
Alternatively, you can run the tests from the command line with Gradle:

```
cd benchmark-overhead
./gradlew test
```

## Visualization

None yet. Help wanted! Our goal is to have the results and a rich UI running in the
`gh-pages` branch, similar to [earlier tools](https://breedx-splk.github.io/iguanodon/web/).
Please help us make this happen.