Commit Graph

25 Commits

Author SHA1 Message Date
Matthew McPherrin cb5384dcd7
Add --addr and/or --debug-addr flags to all commands (#7175)
Many services already have --addr and/or --debug-addr flags.

However, it wasn't universal, so this PR adds flags to commands where
they're not currently present.

This makes it easier to use a shared config file but listen on different
ports, for running multiple instances on a single host.

The config options are made optional as well, and removed from
config-next/.
2023-12-07 17:41:01 -08:00
Matthew McPherrin 32adaf1846
Make log-validator take glob patterns to monitor for log files (#7172)
To simplify deployment of the log validator, this allows wildcards
(using go's filepath.Glob) to be included in the file paths.

In order to detect new files, a new background goroutine polls the glob
patterns every minute for matches.

Because the "monitor" function is running in its own goroutine, a lock
is needed to ensure it's not trying to add new tailers while shutdown is
happening.
2023-11-27 12:48:46 -08:00
Matthew McPherrin 70a6b1093a
Refactor log-validator to break up the big main function. (#7170)
The main function in log-validator is overly big, so this refactors it
in preparation for adding support for file globs, which is started in
the (currently draft) #7134

This PR should be functionally a no-op, except for one change:
The 1-per-second ratelimiter is moved to be per-file, so it's
1-per-second-per-file. I think this more closely aligns with what we'd
want, as we could potentially miss that a file had bad lines in it if it
was overwhelmed by another log file at the same time.
2023-11-27 13:07:39 -05:00
Matthew McPherrin f57bd30931
Set "CompleteLines" true in TailFile (#7158)
We've observed with the upgraded tailer, it can observe partial lines.
But we only want complete lines to validate them.
2023-11-14 15:44:31 -08:00
Matthew McPherrin 75439eab4b
Replace hpcloud/tail with nxadm/tail (#7152)
The hpcloud version appears abandoned, with numerous unfixed bugs
including ones that can cause it to miss data.  The nxadm fork is
maintained.

The updated tail also pulls in an updated fsnotify. We had it vendored
at two paths before, so this has a side benefit of simplifying us to
having just one copy.
2023-11-09 16:30:15 -08:00
Phil Porada 4bd90ea82f
Log version string for more tools at startup (#7087)
This is a followup to https://github.com/letsencrypt/boulder/pull/7086
2023-09-19 12:46:55 -04:00
Matthew McPherrin 0060e695b5
Introduce OpenTelemetry Tracing (#6750)
Add a new shared config stanza which all boulder components can use to
configure their Open Telemetry tracing. This allows components to
specify where their traces should be sent, what their sampling ratio
should be, and whether or not they should respect their parent's
sampling decisions (so that web front-ends can ignore sampling info
coming from outside our infrastructure). It's likely we'll need to
evolve this configuration over time, but this is a good starting point.

Add basic Open Telemetry setup to our existing cmd.StatsAndLogging
helper, so that it gets initialized at the same time as our other
observability helpers. This sets certain default fields on all
traces/spans generated by the service. Currently these include the
service name, the service version, and information about the telemetry
SDK itself. In the future we'll likely augment this with information
about the host and process.

Finally, add instrumentation for the HTTP servers and grpc
clients/servers. This gives us a starting point of being able to monitor
Boulder, but is fairly minimal as this PR is already somewhat unwieldy:
It's really only enough to understand that everything is wired up
properly in the configuration. In subsequent work we'll enhance those
spans with more data, and add more spans for things not automatically
traced here.

Fixes https://github.com/letsencrypt/boulder/issues/6361

---------

Co-authored-by: Aaron Gable <aaron@aarongable.com>
2023-04-21 10:46:59 -07:00
Aaron Gable bd1d27b8e8
Fix non-gRPC process cleanup and exit (#6808)
Although #6771 significantly cleaned up how gRPC services stop and clean
up, it didn't make any changes to our HTTP servers or our non-server
(e.g. crl-updater, log-validator) processes. This change finishes the
work.

Add a new helper method cmd.WaitForSignal, which simply blocks until one
of the three signals we care about is received. This easily replaces all
calls to cmd.CatchSignals which passed `nil` as the callback argument,
with the added advantage that it doesn't call os.Exit() and therefore
allows deferred cleanup functions to execute. This new function is
intended to be the last line of main(), allowing the whole process to
exit once it returns.

Reimplement cmd.CatchSignals as a thin wrapper around cmd.WaitForSignal,
but with the added callback functionality. Also remove the os.Exit()
call from CatchSignals, so that the main goroutine is allowed to finish
whatever it's doing, call deferred functions, and exit naturally.

Update all of our non-gRPC binaries to use one of these two functions.
The vast majority use WaitForSignal, as they run their main processing
loop in a background goroutine. A few (particularly those that can run
either in run-once or in daemonized mode) still use CatchSignals, since
their primary processing happens directly on the main goroutine.

The changes to //test/load-generator are the most invasive, simply
because that binary needed to have a context plumbed into it for proper
cancellation, but it already had a custom struct type named "context"
which needed to be renamed to avoid shadowing.

Fixes https://github.com/letsencrypt/boulder/issues/6794
2023-04-14 16:22:56 -04:00
Matthew McPherrin 49851d7afd
Remove Beeline configuration (#6765)
In a previous PR, #6733, this configuration was marked deprecated
pending removal.  Here is that removal.
2023-03-23 16:58:36 -04:00
Samantha b2224eb4bc
config: Add validation tags to all configuration structs (#6674)
- Require `letsencrypt/validator` package.
- Add a framework for registering configuration structs and any custom
validators for each Boulder component at `init()` time.
- Add a `validate` subcommand which allows you to pass a `-component`
name and `-config` file path.
- Expose validation via exported utility functions
`cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and
`cmd.ValidateYAMLConfig()`.
- Add unit test which validates all registered component configuration
structs against test configuration files.

Part of #6052
2023-03-21 14:08:03 -04:00
Matthew McPherrin e1ed1a2ac2
Remove beeline tracing (#6733)
Remove tracing using Beeline from Boulder. The only remnant left behind
is the deprecated configuration, to ensure deployability.

We had previously planned to swap in OpenTelemetry in a single PR, but
that adds significant churn in a single change, so we're doing this as
multiple steps that will each be significantly easier to reason about
and review.

Part of #6361
2023-03-14 15:14:27 -07:00
Jacob Hoffman-Andrews de2574a37a
crl/updater: fix incorrect logging of error (#6401)
Fix instances where an error check was conditioned on something
other than the traditional `err`, such as `myStruct.err`, but then the
error being logged was the `err` from elsewhere in the function.
2022-09-28 09:30:32 -07:00
Aaron Gable 9c197e1f43
Use io and os instead of deprecated ioutil (#6286)
The iotuil package has been deprecated since go1.16; the various
functions it provided now exist in the os and io packages. Replace all
instances of ioutil with either io or os, as appropriate.
2022-08-10 13:30:17 -07:00
Aaron Gable 305ef9cce9
Improve error checking paradigm (#5920)
We have decided that we don't like the if err := call(); err != nil
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: It caused readers to believe that
a later err = call() statement was assigning to an already-declared err
in the local scope, when in fact it was assigning to an
already-declared err in the parent scope of a closure. This caused our
ineffassign and staticcheck linters to be unable to analyze the
lifetime of the err variable, and so they did not complain when we
never checked the actual value of that error.

This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
2022-02-01 14:42:43 -07:00
Aaron Gable ab79f96d7b
Fixup staticcheck and stylecheck, and violations thereof (#5897)
Add `stylecheck` to our list of lints, since it got separated out from
`staticcheck`. Fix the way we configure both to be clearer and not
rely on regexes.

Additionally fix a number of easy-to-change `staticcheck` and
`stylecheck` violations, allowing us to reduce our number of ignored
checks.

Part of #5681
2022-01-20 16:22:30 -08:00
Jacob Hoffman-Andrews 3bf06bb4d8
Export the config structs from our main files (#5875)
This allows our documentation on those structs to show up in our godoc
output.
2022-01-12 12:20:27 -08:00
Samantha 7a7f436212
log-validator: ensure that log lines contain a checksum (#5788)
Most Boulder logging is supposed to go through our logging subsystem, where a
checksum is added. However, very occasionally Boulder emits output on stdout or
stderr. For instance this can happen during panics, or if we load a pkcs11
module that emits messages on stdout or stderr.

When that happens, the logs are collected by systemd and sent into rsyslog with
the same programname as the lines that went through our logging subsystem. This
causes spurious alerts from log-validator because it can't find the checksum in
those log lines.

This change reduces the risk of spurious alerting by providing a separate metric
for "malformed log line" vs "well-formed log line with a checksum mismatch."
We'll still want to alert on "malformed log line", in case a future change to
logging causes all log lines to be malformed. But we can set the threshold for
it much higher.

Fixes #5771
2021-11-09 12:38:09 -08:00
Jacob Hoffman-Andrews 23dd1e21f9
Build all boulder binaries into a single binary (#5693)
The resulting `boulder` binary can be invoked by different names to
trigger the behavior of the relevant subcommand. For instance, symlinking
and invoking as `boulder-ca` acts as the CA. Symlinking and invoking as
`boulder-va` acts as the VA.

This reduces the .deb file size from about 200MB to about 20MB.

This works by creating a registry that maps subcommand names to `main`
functions. Each subcommand registers itself in an `init()` function. The
monolithic `boulder` binary then checks what name it was invoked with
(`os.Args[0]`), looks it up in the registry, and invokes the appropriate
`main`. To avoid conflicts, all of the old `package main` are replaced
with `package notmain`.

To get the list of registered subcommands, run `boulder --list`. This
is used when symlinking all the variants into place, to ensure the set
of symlinked names matches the entries in the registry.

Fixes #5692
2021-10-20 17:05:45 -07:00
Aaron Gable 9abb39d4d6
Honeycomb integration proof-of-concept (#5408)
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).

Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.

Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
2021-05-24 16:13:08 -07:00
Jacob Hoffman-Andrews 3b5915a6f2
Reduce chance of log-validator having runaway output. (#4926) 2020-07-10 11:16:18 -07:00
Garrett Squire 739686ba88
Bug Fixes (#4798)
Patches:

Make sure all log tailing types call Cleanup
Make sure the http.Response body is closed in all cases
Make sure that the challenge token is always deleted
2020-04-30 11:56:43 -07:00
Jacob Hoffman-Andrews 87fb6028c1
Add log validator to integration tests (#4782)
For now this mainly provides an example config and confirms that
log-validator can start up and shut down cleanly, as well as provide a
stat indicating how many log lines it has handled.

This introduces a syslog config to the boulder-tools image that will write
logs to /var/log/program.log. It also tweaks the various .json config
files so they have non-default syslogLevel, to ensure they actually
write something for log-validator to verify.
2020-04-20 13:33:42 -07:00
Jacob Hoffman-Andrews b351fa5979
log-validator: handle spurious shutdown errors. (#4776)
Also add a logs adapter for tail's built-in logging type.
2020-04-15 13:44:12 -07:00
Roland Bracewell Shoemaker 4743889cd3
Count number of corrupt lines + allow non-existent files to be… (#4631)
Fixes #4612.
2020-01-07 13:33:47 -08:00
Roland Bracewell Shoemaker 308960cbdd log-validator: add cmd/daemon for verifying log integrity (#4482)
In f32fdc4 the Boulder logging framework was updated to emit a CRC32-IEEE
checksum in log lines. The `log-validator` command verifies these checksums in
one of two ways:

1. By running as a daemon process, tailing logs and verifying checksums as they
arrive.
2. By running as a one-off command, verifying checksums of every line in a log
file on disk.
2019-10-21 10:12:55 -04:00