30 Commits

Author SHA1 Message Date
James Renken ac68828f43
Replace most uses of net.IP with netip.Addr (#8205)
Retain `net.IP` only where we directly work with `x509.Certificate` and
friends.

Fixes #5925
Depends on #8196
2025-05-27 15:05:35 -07:00
Phil Porada 7ea51e5f91
boulder-observer: check certificate status via CRL too (#8186)
Let's Encrypt [recently removed OCSP URLs from
certificates](https://community.letsencrypt.org/t/removing-ocsp-urls-from-certificates/236699)
which unfortunately caused the boulder-observer TLS prober to panic.
This change short circuits the OCSP checking logic if no OCSP URL exists
in the to-be-checked certificate.

Fixes https://github.com/letsencrypt/boulder/issues/8185

---------

Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
2025-05-20 09:24:21 -07:00
Aaron Gable 75a89f7a4a
Simplify and fix CRL observer IDP check (#8069)
The conditional introduced in
https://github.com/letsencrypt/boulder/pull/8067 contained a bug left
over from an earlier draft of the PR. Remove the zero-length check to
ensure the code matches the documented intent.
2025-03-17 14:34:14 -07:00
Aaron Gable d045b387ef
Observer: detect CRL IDP mismatch (#8067)
Give boulder-observer the ability to detect if the CRL it fetches is the
CRL it expects, by comparing that CRL's issuingDistributionPoint
extension to the prober's configured URL. Only do this if instructed to
(by configuring the CRL prober as "partitioned") because non-partitioned
CRLs do not necessarily contain an IDP.

Fixes https://github.com/letsencrypt/boulder/issues/7527
2025-03-14 14:52:29 -07:00
forcedebug b33d28c8bd
Remove repeated words in comments (#7445)
Signed-off-by: forcedebug <forcedebug@outlook.com>
2024-04-23 10:30:33 -04:00
Matthew McPherrin cb5384dcd7
Add --addr and/or --debug-addr flags to all commands (#7175)
Many services already have --addr and/or --debug-addr flags.

However, it wasn't universal, so this PR adds flags to commands where
they're not currently present.

This makes it easier to use a shared config file but listen on different
ports, for running multiple instances on a single host.

The config options are made optional as well, and removed from
config-next/.
2023-12-07 17:41:01 -08:00
Jacob Hoffman-Andrews 015cea3853
observer: disable IPv6-to-IPv4 fallback (#7173)
We had this disabled in one of our probes, but not all. Add a common
dialer that disables the fallback and use it in each applicable prober.
This avoids masking failures of IPv6 connectivity.

Also change to use contexts instead of timeout parameters consistently.

The shared dialer is in the new `obsdialer` package because putting it
in `observer` results in import cycles.
2023-11-28 09:53:20 -08:00
Jacob Hoffman-Andrews c84201c09a
observer: add TCP prober (#7118)
This is potentially useful for diagnosing issues with connection
timeouts, which could have separate causes from HTTP errors. For
instance, a connection timeout is more likely to be caused by network
congestion or high CPU usage on the load balancer.
2023-10-27 09:11:18 -07:00
Jacob Hoffman-Andrews 1cde2e8861
observer: add obs_tls_not_before metric (#7120)
This is the counterpart of obs_tls_not_after, and is useful for
satisfying requirements like "the target certificate must not be more
than X days old."

Fixes #7119
2023-10-26 12:59:12 -07:00
Aaron Gable cb28a001e9
Unfork crl x509 (#7078)
Delete our forked version of the x509 library, and update all call-sites
to use the version that we upstreamed and got released in go1.21. This
requires making a few changes to calling code:
- replace crl_x509.RevokedCertificate with x509.RevocationListEntry
- replace RevocationList.RevokedCertificates with
RevocationList.RevokedCertificateEntries
- make RevocationListEntry.ReasonCode a non-pointer integer

Our lints cannot yet be updated to use the new types and fields, because
those improvements have not yet been adopted by the zcrypto/x509 package
used by the linting framework.

Fixes https://github.com/letsencrypt/boulder/issues/6741
2023-09-15 20:25:13 -07:00
Jacob Hoffman-Andrews a2b2e53045
cmd: fail without panic (#6935)
For "ordinary" errors like "file not found" for some part of the config,
we would prefer to log an error and exit without logging about a panic
and printing a stack trace.

To achieve that, we want to call `defer AuditPanic()` once, at the top
of `cmd/boulder`'s main. That's so early that we haven't yet parsed the
config, which means we haven't yet initialized a logger. We compromise:
`AuditPanic` now calls `log.Get()`, which will retrieve the configured
logger if one has been set up, or will create a default one (which logs
to stderr/stdout).

AuditPanic and Fail/FailOnError now cooperate: Fail/FailOnError panic
with a special type, and AuditPanic checks for that type and prints a
simple message before exiting when it's present.

This PR also coincidentally fixes a bug: panicking didn't previously
cause the program to exit with nonzero status, because it recovered the
panic but then did not explicitly exit nonzero.

Fixes #6933
2023-06-20 12:29:02 -07:00
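The cooperation between Fail/FailOnError and AuditPanic described in #6935 can be sketched with a dedicated panic type. This is a simplified illustration, not Boulder's code: the names `failure`, `fail`, `describe`, and `run` are hypothetical, and `run` returns the exit code instead of calling `os.Exit` so that the example is easy to exercise.

```go
package main

import "fmt"

// failure is the special panic type that marks an "ordinary" error.
type failure struct{ msg string }

// fail panics with the special type instead of a generic value.
func fail(msg string) { panic(failure{msg: msg}) }

// describe turns a recovered value into the line to log: a plain message
// for failure, and a panic report for anything else.
func describe(r any) string {
	if f, ok := r.(failure); ok {
		return f.msg
	}
	return fmt.Sprintf("panic: %v", r)
}

// run executes fn and returns the exit code and message the process
// should use. Any recovered panic yields a nonzero code.
func run(fn func()) (code int, msg string) {
	defer func() {
		if r := recover(); r != nil {
			code, msg = 1, describe(r)
		}
	}()
	fn()
	return 0, ""
}

func main() {
	code, msg := run(func() { fail("config file not found") })
	fmt.Println(code, msg)

	code, msg = run(func() { panic("unexpected state") })
	fmt.Println(code, msg)
}
```

A real command would pass the returned code to `os.Exit` and would also log a stack trace in the non-`failure` case.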
Matthew McPherrin 68e1c6bde7
Don't update the notAfter Gauge with zeros (#6924)
I think ideally we'd only ever call exportMetrics
with a valid time, but that's a bit bigger of a refactor of this code.

This was the fix we lightly decided on in the discussion of #6635.

Fixes #6635
2023-05-31 14:19:28 -04:00
Matthew McPherrin 0060e695b5
Introduce OpenTelemetry Tracing (#6750)
Add a new shared config stanza which all boulder components can use to
configure their OpenTelemetry tracing. This allows components to
specify where their traces should be sent, what their sampling ratio
should be, and whether or not they should respect their parent's
sampling decisions (so that web front-ends can ignore sampling info
coming from outside our infrastructure). It's likely we'll need to
evolve this configuration over time, but this is a good starting point.

Add basic OpenTelemetry setup to our existing cmd.StatsAndLogging
helper, so that it gets initialized at the same time as our other
observability helpers. This sets certain default fields on all
traces/spans generated by the service. Currently these include the
service name, the service version, and information about the telemetry
SDK itself. In the future we'll likely augment this with information
about the host and process.

Finally, add instrumentation for the HTTP servers and grpc
clients/servers. This gives us a starting point of being able to monitor
Boulder, but is fairly minimal as this PR is already somewhat unwieldy:
It's really only enough to understand that everything is wired up
properly in the configuration. In subsequent work we'll enhance those
spans with more data, and add more spans for things not automatically
traced here.

Fixes https://github.com/letsencrypt/boulder/issues/6361

---------

Co-authored-by: Aaron Gable <aaron@aarongable.com>
2023-04-21 10:46:59 -07:00
Samantha b2224eb4bc
config: Add validation tags to all configuration structs (#6674)
- Require `letsencrypt/validator` package.
- Add a framework for registering configuration structs and any custom
validators for each Boulder component at `init()` time.
- Add a `validate` subcommand which allows you to pass a `-component`
name and `-config` file path.
- Expose validation via exported utility functions
`cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and
`cmd.ValidateYAMLConfig()`.
- Add unit test which validates all registered component configuration
structs against test configuration files.

Part of #6052
2023-03-21 14:08:03 -04:00
Matthew McPherrin 391a59921b
Move cmd.ConfigDuration to config.Duration (#6705)
We rely on the ratelimit/ package in CI to validate our ratelimit
configurations. However, because that package relies on cmd/ just for
cmd.ConfigDuration, many additional dependencies get pulled in.

This refactors just that struct to a separate config package. This was
done using Goland's automatic refactoring tooling, which also organized
a few imports while it was touching them, keeping standard library,
internal and external dependencies grouped.
2023-02-28 08:11:49 -08:00
Phil Porada d3845f25c6
Strict YAML parsing (#6652)
Adds a custom YAML unmarshaller in the `//strictyaml` package based on
`go-yaml/yaml v3` with unique key detection enabled, and ensures that the
target struct is able to contain all target fields.

Fixes https://github.com/letsencrypt/boulder/issues/3344.
2023-02-22 14:56:26 -05:00
Phil Porada 365c9af463
Replace deprecated ioutil.ReadAll with io.ReadAll (#6678)
Per [1]: 
> Deprecated: As of Go 1.16, this function (ioutil.ReadAll) simply calls
io.ReadAll.

1. https://pkg.go.dev/io/ioutil#ReadAll
2023-02-21 11:07:55 -08:00
Preston Locke 7d3fc60271
observer: Monitors probe immediately instead of waiting a full duration (#6594)
Do work on startup to limit the likelihood of post-deployment runtime panics.
2023-01-24 16:45:19 -05:00
lenaunderwood22 55e5a24e7d
observer: TLS prober check root optionally (#6569)
Modify the TLS prober to only check the root if one is provided.
2023-01-24 12:16:40 -05:00
lenaunderwood22 5016908905
observer: Fix nil pointer in TLS prober (#6591)
Initialize `Intermediates` field in `VerifyOptions`.
2023-01-20 19:28:11 -05:00
lenaunderwood22 b21f9b7976
Register TLS prober (#6570)
When attempting to add TLS probe monitoring, we got the error `TLS is not a
registered Prober type`. This PR adds TLS Prober to `observer.go` to
complete its registration and adds TLS Prober to the observer README.

Co-authored-by: Samantha <hello@entropy.cat>
2023-01-11 14:01:41 -05:00
lenaunderwood22 f2bb0e42f1
boulder-observer: Add TLS prober (#6480)
Add a new kind of prober to boulder-observer which makes a TLS
connection to the target hostname and expects the certificate presented
for the TLS handshake to have certain properties, such as being valid,
expired, or revoked.

Part of #5927
2022-12-12 13:54:31 -08:00
lenaunderwood22 a90d9bff8d
add insecure option to HTTP Prober (#6514)
Adding an insecure option to HTTP prober so that it can still check the
status of sites that we expect to be insecure (e.g. expired sites).

Co-authored-by: Aaron Gable <aaron@aarongable.com>
2022-12-05 12:23:04 -08:00
Preston Locke 8477ba38e3
boulder-observer: Add a CRL prober type (#6349)
This PR is a follow-up to #6277 and #6290 to add a new prober type to
boulder-observer for monitoring CRLs, making use of the new prober-specific
metrics capability to define the following new metrics:

- `obs_crl_this_update` the Unix timestamp of the CRL's thisUpdate value
- `obs_crl_next_update` the Unix timestamp of the CRL's nextUpdate value
- `obs_crl_revoked_cert_count` the number of certificates listed in the CRL

**Configuration:** Each defined CRL monitor takes a single configuration option,
a URL that specifies the location of the CRL to monitor.

**Metrics:** The three CRL-specific metrics described above are only published
at /metrics if at least one valid monitor is defined in the config.yml. The
metrics have a single label `url` that is set to the URL configured for the
monitor.
2022-09-15 11:44:56 -07:00
Preston Locke 647eb3f2fa
boulder-observer: Add support for prober specific metrics (#6290)
2022-09-02 10:40:03 -07:00
Aaron Gable 1a6f7154d8
Update yaml from v2.4.0 to v3.0.1 (#6146)
The gopkg.in/yaml.v2 package has a potential crash when
parsing malicious input. Although we only use the yaml
package to parse trusted configuration, update to v3 anyway.
2022-06-14 13:53:58 -07:00
Aaron Gable 305ef9cce9
Improve error checking paradigm (#5920)
We have decided that we don't like the if err := call(); err != nil
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: It caused readers to believe that
a later err = call() statement was assigning to an already-declared err
in the local scope, when in fact it was assigning to an
already-declared err in the parent scope of a closure. This caused our
ineffassign and staticcheck linters to be unable to analyze the
lifetime of the err variable, and so they did not complain when we
never checked the actual value of that error.

This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
2022-02-01 14:42:43 -07:00
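To make the two forms discussed in #5920 concrete, the sketch below uses the two-line form and shows the discouraged single-line form in a comment. The function `sum` is a made-up example.

```go
package main

import (
	"fmt"
	"strconv"
)

// sum adds up the integers in inputs using the two-line error check.
//
// The discouraged form would read:
//
//	if n, err := strconv.Atoi(s); err != nil { ... }
//
// which scopes n and err to the if statement alone and makes it harder to
// see which err a later plain assignment refers to.
func sum(inputs []string) (int, error) {
	total := 0
	for _, s := range inputs {
		n, err := strconv.Atoi(s)
		if err != nil {
			return 0, fmt.Errorf("parsing %q: %w", s, err)
		}
		total += n
	}
	return total, nil
}

func main() {
	fmt.Println(sum([]string{"1", "2", "3"}))
	fmt.Println(sum([]string{"1", "x"}))
}
```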
Aaron Gable ab79f96d7b
Fixup staticcheck and stylecheck, and violations thereof (#5897)
Add `stylecheck` to our list of lints, since it got separated out from
`staticcheck`. Fix the way we configure both to be clearer and not
rely on regexes.

Additionally fix a number of easy-to-change `staticcheck` and
`stylecheck` violations, allowing us to reduce our number of ignored
checks.

Part of #5681
2022-01-20 16:22:30 -08:00
Andrew Gabbitas b5aab29407
Make boulder-observer HTTP User-Agent configurable (#5484)
- Make User-Agent configurable in config file
- Fix README example
- Add tests
2021-06-14 11:08:18 -06:00
Samantha 97e393d2e7
boulder-observer (#5315)
Add configuration-driven Prometheus black box metric exporter.
2021-03-29 12:56:54 -07:00