Many services already have --addr and/or --debug-addr flags.
However, it wasn't universal, so this PR adds flags to commands where
they're not currently present.
This makes it easier to use a shared config file but listen on different
ports, for running multiple instances on a single host.
The config options are made optional as well, and removed from
config-next/.
For "ordinary" errors like "file not found" for some part of the config,
we would prefer to log an error and exit without logging about a panic
and printing a stack trace.
To achieve that, we want to call `defer AuditPanic()` once, at the top
of `cmd/boulder`'s main. That's so early that we haven't yet parsed the
config, which means we haven't yet initialized a logger. We compromise:
`AuditPanic` now calls `log.Get()`, which will retrieve the configured
logger if one has been set up, or will create a default one (which logs
to stderr/stdout).
AuditPanic and Fail/FailOnError now cooperate: Fail/FailOnError panic
with a special type, and AuditPanic checks for that type and prints a
simple message before exiting when it's present.
This PR also coincidentally fixes a bug: panicking didn't previously
cause the program to exit with nonzero status, because it recovered the
panic but then did not explicitly exit nonzero.
Fixes#6933
Export new prometheus metrics for the `notBefore` and `notAfter` fields
to track internal certificate validity periods when calling the `Load()`
method for a `*tls.Config`. Each metric is labeled with the `serial`
field.
```
tlsconfig_notafter_seconds{serial="2152072875247971686"} 1.664821961e+09
tlsconfig_notbefore_seconds{serial="2152072875247971686"} 1.664821960e+09
```
Fixes https://github.com/letsencrypt/boulder/issues/6829
Add a new shared config stanza which all boulder components can use to
configure their Open Telemetry tracing. This allows components to
specify where their traces should be sent, what their sampling ratio
should be, and whether or not they should respect their parent's
sampling decisions (so that web front-ends can ignore sampling info
coming from outside our infrastructure). It's likely we'll need to
evolve this configuration over time, but this is a good starting point.
Add basic Open Telemetry setup to our existing cmd.StatsAndLogging
helper, so that it gets initialized at the same time as our other
observability helpers. This sets certain default fields on all
traces/spans generated by the service. Currently these include the
service name, the service version, and information about the telemetry
SDK itself. In the future we'll likely augment this with information
about the host and process.
Finally, add instrumentation for the HTTP servers and grpc
clients/servers. This gives us a starting point of being able to monitor
Boulder, but is fairly minimal as this PR is already somewhat unwieldy:
It's really only enough to understand that everything is wired up
properly in the configuration. In subsequent work we'll enhance those
spans with more data, and add more spans for things not automatically
traced here.
Fixes https://github.com/letsencrypt/boulder/issues/6361
---------
Co-authored-by: Aaron Gable <aaron@aarongable.com>
Enable the errcheck linter. Update the way we express exclusions to use
the new, non-deprecated, non-regex-based format. Fix all places where we
began accidentally violating errcheck while it was disabled.
- Require `letsencrypt/validator` package.
- Add a framework for registering configuration structs and any custom
validators for each Boulder component at `init()` time.
- Add a `validate` subcommand which allows you to pass a `-component`
name and `-config` file path.
- Expose validation via exported utility functions
`cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and
`cmd.ValidateYAMLConfig()`.
- Add unit test which validates all registered component configuration
structs against test configuration files.
Part of #6052
Remove the need for clients to explicitly call bgrpc.NewClientMetrics,
by moving that call inside bgrpc.ClientSetup. In case ClientSetup is
called multiple times, use the recommended method to gracefully recover
from registering duplicate metrics. This makes gRPC client setup much
more similar to gRPC server setup after the previous server refactoring
change landed.
This refactors how rocsp-tool processes its subcommands so we're less
likely to make mistakes (like under- or over- including subcommands in the
help output).
Emit PEM output instead of pretty-printed output. Send the pretty-printed
output straight to stdout instead of via a logger, so the internal newlines don't
get escaped.
Fixes#6310
The metadata values were planned to be used for scanning Redis in
ocsp-updater. Since we won't do that, remove it. Happily, this also
allows us to get rid of shortIssuerId.
Removing the issuer check in rocsp_sa.go uncovered a "boxed nil" problem:
SA was doing a nil check against an interface field that in practice was
never nil (because it was promoted from a concrete type at construction
time). So we would always hit the ROCSP path. But one of the first steps
in that path was looking up an issuer ID. Since `test/config` never
had the issuers set, we would look up the issuer ID, not find it, and
return an error before we attempted to call storeResponse. To fix this,
I made `NewSQLStorageAuthority` take a concrete `*rocsp.WritingClient`
instead of an interface, and check for nil before assigning it to an
internal interface field.
Built on top of #6201.
Create a new type `db.MappedSelector` which exposes a new
`Query` method. This method behaves similar to gorp's
`SelectFoo` methods, in that it uses the desired result type to
look up the correct table to query and uses reflection to map
the table columns to the struct fields. It behaves similarly to
the stdlib's `sql.Query` in that it returns a `Rows` object which
can be iterated over to get one row of results at a time. And it
improves both of those by using generics, rather than `interface{}`,
to provide a nicely-typed calling interface.
Use this new type to simplify the existing streaming query in
`SerialsForIncident`. Similarly use the new type to simplify
rocsp-tool's and ocsp-updater's streams of `CertStatusMetadata`.
This new type will also be used by the crl-updater's upcoming
`GetRevokedCerts` streaming query.
Fixes#6173
ServiceConfig is only needed for components that act as gRPC services.
For rocsp-tool, which is a gRPC client but is long-running enough to
merit a debug port, we should provide DebugAddr in the config.
We have decided that we don't like the if err := call(); err != nil
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: It caused readers to believe that
a later err = call() statement was assigning to an already-declared err
in the local scope, when in fact it was assigning to an
already-declared err in the parent scope of a closure. This caused our
ineffassign and staticcheck linters to be unable to analyze the
lifetime of the err variable, and so they did not complain when we
never checked the actual value of that error.
This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
Boulder components initialize their gorp and gorp-less (non-wrapped) database
clients via two new SA helpers. These helpers handle client construction,
database metric initialization, and (for gorp only) debug logging setup.
Removes transaction isolation parameter `'READ-UNCOMMITTED'` from all database
connections.
Fixes#5715Fixes#5889
We can scan metadata and get the age of responses.
We can scan responses and print them in base64.
Note: this issues a GET for each key, and blocks on the result. For much
faster scanning we will want to introduce parallel GETs in a subsequent
PR.
Also, add a `get` operation to get a single entry.
Fixes#5830
If configured, ocsp-updater will write responses to Redis in parallel
with MariaDB, giving up if Redis is slower and incrementing a stat.
Factors out the ShortIDIssuer concept from rocsp-tool into
rocsp_config.
This splits rocsp-tool/main.go into main.go, client.go, issuers.go,
and inflight.go.
Adds tests for issuers and inflight, plus storeResponse in
client.go. Doesn't yet have a test for loadFromDB in client.go.
Part of #5786
Previously loadFromDB was calling cl.storeResponse, which parses, stores, and
then fetches a response, and logs it to stderr. Since we'll be storing
responses at high volume, we don't want to log them all to stderr. And
we're willing to trust that the CA signed a valid response, so we don't
need to parse it again. And we certainly don't need to fetch it right
after storing it.
Fixes#5782
This scans the database for certificateStatus rows, gets them signed by the CA, and writes them to Redis.
Also, bump the default PoolSize for Redis to 100.
This is a sort of proof of concept of the Redis interaction, which will
evolve into a tool for inspection and manual repair of missing entries,
if we find ourselves needing to do that.
The important bits here are rocsp/rocsp.go and
cmd/rocsp-tool/main.go. Also, the newly-vendored Redis client.