This upgrades otel to v1.15.0, and the /contrib/ packages to v0.41.0.
Several dependencies are upgraded as dependencies, notably grpc.
This contains a change to grpc, only mapping some grpc.Errors into span
errors if it's Unknown, DeadlineExceeded, Unimplemented, Internal,
Unavailable, or DataLoss, which should be helpful for us as we use grpc
errors semantically in Boulder, especially NotFound.
Add a new shared config stanza which all boulder components can use to
configure their Open Telemetry tracing. This allows components to
specify where their traces should be sent, what their sampling ratio
should be, and whether or not they should respect their parent's
sampling decisions (so that web front-ends can ignore sampling info
coming from outside our infrastructure). It's likely we'll need to
evolve this configuration over time, but this is a good starting point.
Add basic Open Telemetry setup to our existing cmd.StatsAndLogging
helper, so that it gets initialized at the same time as our other
observability helpers. This sets certain default fields on all
traces/spans generated by the service. Currently these include the
service name, the service version, and information about the telemetry
SDK itself. In the future we'll likely augment this with information
about the host and process.
Finally, add instrumentation for the HTTP servers and grpc
clients/servers. This gives us a starting point of being able to monitor
Boulder, but is fairly minimal as this PR is already somewhat unwieldy:
It's really only enough to understand that everything is wired up
properly in the configuration. In subsequent work we'll enhance those
spans with more data, and add more spans for things not automatically
traced here.
Fixes https://github.com/letsencrypt/boulder/issues/6361
---------
Co-authored-by: Aaron Gable <aaron@aarongable.com>
Use the built-in grpc-go client and server interceptor chaining
utilities, instead of the ones provided by go-grpc-middleware.
Simplify our interceptors to call their handlers/invokers directly,
instead of delegating to the metrics interceptor, and add the
metrics interceptor to the chains instead.
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).
Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.
Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
Because the package versions in go.mod match what we use in Godeps.json,
there are no substantive code diffs. However, there are some tiny
differences resulting from how go mod vendors things differently than
godep:
go mod does not preserve executable permissions on shell scripts
Some packages have import lines like:
package ocsp // import "golang.org/x/crypto/ocsp"
godep used to remove the comment from these lines, but go mod vendor does not.
This introduces several indirect dependencies that we didn't have
before. This is because godep used to operate at a package level, but
go mod operates at a module (~= repository) level. So if we used a
given repository, but didn't use all of its packages, we wouldn't
previously care about the transitive dependencies of the packages we
weren't using. However, in the go mod world, once we care about the
repository, we care about all of that repository's transitive
dependencies. AFAICT this doesn't affect vendoring.
Fixes#4116
The go-grpc-prometheus package by default registers its metrics with Prometheus' global registry. In #3167, when we stopped using the global registry, we accidentally lost our gRPC metrics. This change adds them back.
Specifically, it adds two convenience functions, one for clients and one for servers, that makes the necessary metrics object and registers it. We run these in the main function of each server.
I considered adding these as part of StatsAndLogging, but the corresponding ClientMetrics and ServerMetrics objects (defined by go-grpc-prometheus) need to be subsequently made available during construction of the gRPC clients and servers. We could add them as fields on Scope, but this seemed like a little too much tight coupling.
Also, update go-grpc-prometheus to get the necessary methods.
```
$ go test github.com/grpc-ecosystem/go-grpc-prometheus/...
ok github.com/grpc-ecosystem/go-grpc-prometheus 0.069s
? github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto [no test files]
```
Fixes https://github.com/letsencrypt/boulder/issues/3205.
Previously, we would only move aside Godeps.json before running `godep save ./...`. However, in order to get a true picture of what is needed, we must also remove the existing `vendor/` directory.
This change also removes some unnecessary dependencies that have piled up over the years, generally test dependencies. Godep used to vendor such dependencies but no longer does.
Updates the various gRPC/protobuf libs (google.golang.org/grpc/... and github.com/golang/protobuf/proto) and the boulder-tools image so that we can update to the newest github.com/grpc-ecosystem/go-grpc-prometheus. Also regenerates all of the protobuf definition files.
Tests run on updated packages all pass.
Unblocks #2633fixes#2636.
There's an off-the-shelf package that provides most of the stats we care about
for gRPC using interceptors. This change vendors go-grpc-prometheus and its
dependencies, and calls out to the interceptors provided by that package from
our own interceptors.
This will allow us to get metrics like latency histograms by call, status codes
by call, and so on.
Fixes#2390.
This change vendors go-grpc-prometheus and its dependencies. Per contributing guidelines, I've run the tests on these dependencies, and they pass:
go test github.com/davecgh/go-spew/spew github.com/grpc-ecosystem/go-grpc-prometheus github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto github.com/pmezard/go-difflib/difflib github.com/stretchr/testify/assert github.com/stretchr/testify/require github.com/stretchr/testify/suite
ok github.com/davecgh/go-spew/spew 0.022s
ok github.com/grpc-ecosystem/go-grpc-prometheus 0.120s
? github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto [no test files]
ok github.com/pmezard/go-difflib/difflib 0.042s
ok github.com/stretchr/testify/assert 0.021s
ok github.com/stretchr/testify/require 0.017s
ok github.com/stretchr/testify/suite 0.012s