The conditional introduced in
https://github.com/letsencrypt/boulder/pull/8067 contained a bug left
over from an earlier draft of the PR. Remove the zero-length check to
ensure the code matches the documented intent.
Give boulder-observer the ability to detect if the CRL it fetches is the
CRL it expects, by comparing that CRLs issuingDistributionPoint
extension to the prober's configured URL. Only do this if instructed to
(by configuring the CRL prober as "partitioned") because non-partitioned
CRLs do not necessarily contain an IDP.
Fixes https://github.com/letsencrypt/boulder/issues/7527
Many services already have --addr and/or --debug-addr flags.
However, it wasn't universal, so this PR adds flags to commands where
they're not currently present.
This makes it easier to use a shared config file but listen on different
ports, for running multiple instances on a single host.
The config options are made optional as well, and removed from
config-next/.
We had this disabled in one of our probes, but not all. Add a common
dialer that disables the fallback and use it in each applicable prober.
This avoids masking failures of IPv6 connectivity.
Also change to use contexts instead of timeout parameters consistently.
The shared dialer is in the new `obsdialer` package because putting it
in `observer` results in import cycles.
This is potentially useful for diagnosing issues with connection
timeouts, which could have separate causes from HTTP errors. For
instance, a connection timeout is more likely to be caused by network
congestion or high CPU usage on the load balancer.
This is the counterpart of obs_tls_not_after, and is useful for
satisfying requirements like "the target certificate must not be more
than X days old."
Fixes#7119
Delete our forked version of the x509 library, and update all call-sites
to use the version that we upstreamed and got released in go1.21. This
requires making a few changes to calling code:
- replace crl_x509.RevokedCertificate with x509.RevocationListEntry
- replace RevocationList.RevokedCertificates with
RevocationList.RevokedCertificateEntries
- make RevocationListEntry.ReasonCode a non-pointer integer
Our lints cannot yet be updated to use the new types and fields, because
those improvements have not yet been adopted by the zcrypto/x509 package
used by the linting framework.
Fixes https://github.com/letsencrypt/boulder/issues/6741
For "ordinary" errors like "file not found" for some part of the config,
we would prefer to log an error and exit without logging about a panic
and printing a stack trace.
To achieve that, we want to call `defer AuditPanic()` once, at the top
of `cmd/boulder`'s main. That's so early that we haven't yet parsed the
config, which means we haven't yet initialized a logger. We compromise:
`AuditPanic` now calls `log.Get()`, which will retrieve the configured
logger if one has been set up, or will create a default one (which logs
to stderr/stdout).
AuditPanic and Fail/FailOnError now cooperate: Fail/FailOnError panic
with a special type, and AuditPanic checks for that type and prints a
simple message before exiting when it's present.
This PR also coincidentally fixes a bug: panicking didn't previously
cause the program to exit with nonzero status, because it recovered the
panic but then did not explicitly exit nonzero.
Fixes#6933
I think ideally we'd only ever call exportMetrics
with a valid time, but that's a bit bigger of a refactor of this code.
This was the fix we lightly decided on in the discussion of #6635Fixes#6635
Add a new shared config stanza which all boulder components can use to
configure their Open Telemetry tracing. This allows components to
specify where their traces should be sent, what their sampling ratio
should be, and whether or not they should respect their parent's
sampling decisions (so that web front-ends can ignore sampling info
coming from outside our infrastructure). It's likely we'll need to
evolve this configuration over time, but this is a good starting point.
Add basic Open Telemetry setup to our existing cmd.StatsAndLogging
helper, so that it gets initialized at the same time as our other
observability helpers. This sets certain default fields on all
traces/spans generated by the service. Currently these include the
service name, the service version, and information about the telemetry
SDK itself. In the future we'll likely augment this with information
about the host and process.
Finally, add instrumentation for the HTTP servers and grpc
clients/servers. This gives us a starting point of being able to monitor
Boulder, but is fairly minimal as this PR is already somewhat unwieldy:
It's really only enough to understand that everything is wired up
properly in the configuration. In subsequent work we'll enhance those
spans with more data, and add more spans for things not automatically
traced here.
Fixes https://github.com/letsencrypt/boulder/issues/6361
---------
Co-authored-by: Aaron Gable <aaron@aarongable.com>
- Require `letsencrypt/validator` package.
- Add a framework for registering configuration structs and any custom
validators for each Boulder component at `init()` time.
- Add a `validate` subcommand which allows you to pass a `-component`
name and `-config` file path.
- Expose validation via exported utility functions
`cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and
`cmd.ValidateYAMLConfig()`.
- Add unit test which validates all registered component configuration
structs against test configuration files.
Part of #6052
We rely on the ratelimit/ package in CI to validate our ratelimit
configurations. However, because that package relies on cmd/ just for
cmd.ConfigDuration, many additional dependencies get pulled in.
This refactors just that struct to a separate config package. This was
done using Goland's automatic refactoring tooling, which also organized
a few imports while it was touching them, keeping standard library,
internal and external dependencies grouped.
Adds a custom YAML unmarshaller in the `//strictyaml` package based on
`go-yaml/yaml v3` with unique key detection enabled and ensures that
target struct is able to contain all target fields.
Fixes https://github.com/letsencrypt/boulder/issues/3344.
When attempting to add TLS probe monitoring, got the error `TLS is not a
registered Prober type`. This PR adds TLS Prober to `observer.go` to
complete its registration and adds TLS Prober to the observer README.
Co-authored-by: Samantha <hello@entropy.cat>
Add a new kind of prober to boulder-observer which makes a TLS
connection to the target hostname and expects the certificate presented
for the TLS handshake to have certain properties, such as being valid,
expired, or revoked.
Part of #5927
Adding an insecure option to HTTP prober so that it can still check the
status of sites that we expect to be insecure (e.g. expired sites).
Co-authored-by: Aaron Gable <aaron@aarongable.com>
This PR is a follow-up to #6277 and #6290 to add a new prober type to
boulder-observer for monitoring CRLs, making use of the new prober-specific
metrics capability to define the following new metrics:
- `obs_crl_this_update` the Unix timestamp of the CRL's thisUpdate value
- `obs_crl_next_update` the Unix timestamp of the CRL's nextUpdate value
- `obs_crl_revoked_cert_count` the number of certificates listed in the CRL
**Configuration:** Each defined CRL monitor takes a single configuration option,
a URL that specifies the location of the CRL to monitor.
**Metrics:** The three CRL-specific metrics described above are only published
at /metrics if at least one valid monitor is defined in the config.yml. The
metrics have a single label `url` that is set to the URL configured for the
monitor
The gopkg.in/yaml.v2 package has a potential crash when
parsing malicious input. Although we only use the yaml
package to parse trusted configuration, update to v3 anyway.
We have decided that we don't like the if err := call(); err != nil
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: It caused readers to believe that
a later err = call() statement was assigning to an already-declared err
in the local scope, when in fact it was assigning to an
already-declared err in the parent scope of a closure. This caused our
ineffassign and staticcheck linters to be unable to analyze the
lifetime of the err variable, and so they did not complain when we
never checked the actual value of that error.
This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
Add `stylecheck` to our list of lints, since it got separated out from
`staticcheck`. Fix the way we configure both to be clearer and not
rely on regexes.
Additionally fix a number of easy-to-change `staticcheck` and
`stylecheck` violations, allowing us to reduce our number of ignored
checks.
Part of #5681