boulder

Commit Graph

Author	SHA1	Message	Date
Matthew McPherrin	cb5384dcd7	Add --addr and/or --debug-addr flags to all commands (#7175 ) Many services already have --addr and/or --debug-addr flags. However, it wasn't universal, so this PR adds flags to commands where they're not currently present. This makes it easier to use a shared config file but listen on different ports, for running multiple instances on a single host. The config options are made optional as well, and removed from config-next/.	2023-12-07 17:41:01 -08:00
Matthew McPherrin	32adaf1846	Make log-validator take glob patterns to monitor for log files (#7172 ) To simplify deployment of the log validator, this allows wildcards (using go's filepath.Glob) to be included in the file paths. In order to detect new files, a new background goroutine polls the glob patterns every minute for matches. Because the "monitor" function is running in its own goroutine, a lock is needed to ensure it's not trying to add new tailers while shutdown is happening.	2023-11-27 12:48:46 -08:00
Matthew McPherrin	70a6b1093a	Refactor log-validator to break up the big main function. (#7170 ) The main function in log-validator is overly big, so this refactors it in preparation for adding support for file globs, which is started in the (currently draft) #7134. This PR should be functionally a no-op, except for one change: The 1-per-second ratelimiter is moved to be per-file, so it's 1-per-second-per-file. I think this more closely aligns with what we'd want, as we could potentially miss that a file had bad lines in it if it was overwhelmed by another log file at the same time.	2023-11-27 13:07:39 -05:00
Matthew McPherrin	f57bd30931	Set "CompleteLines" true in TailFile (#7158 ) We've observed that the upgraded tailer can emit partial lines, but we only want to validate complete lines.	2023-11-14 15:44:31 -08:00
Matthew McPherrin	75439eab4b	Replace hpcloud/tail with nxadm/tail (#7152 ) The hpcloud version appears abandoned, with numerous unfixed bugs including ones that can cause it to miss data. The nxadm fork is maintained. The updated tail also pulls in an updated fsnotify. We had it vendored at two paths before, so this has a side benefit of simplifying us to having just one copy.	2023-11-09 16:30:15 -08:00
Phil Porada	4bd90ea82f	Log version string for more tools at startup (#7087 ) This is a followup to https://github.com/letsencrypt/boulder/pull/7086	2023-09-19 12:46:55 -04:00
Matthew McPherrin	0060e695b5	Introduce OpenTelemetry Tracing (#6750 ) Add a new shared config stanza which all boulder components can use to configure their Open Telemetry tracing. This allows components to specify where their traces should be sent, what their sampling ratio should be, and whether or not they should respect their parent's sampling decisions (so that web front-ends can ignore sampling info coming from outside our infrastructure). It's likely we'll need to evolve this configuration over time, but this is a good starting point. Add basic Open Telemetry setup to our existing cmd.StatsAndLogging helper, so that it gets initialized at the same time as our other observability helpers. This sets certain default fields on all traces/spans generated by the service. Currently these include the service name, the service version, and information about the telemetry SDK itself. In the future we'll likely augment this with information about the host and process. Finally, add instrumentation for the HTTP servers and grpc clients/servers. This gives us a starting point of being able to monitor Boulder, but is fairly minimal as this PR is already somewhat unwieldy: It's really only enough to understand that everything is wired up properly in the configuration. In subsequent work we'll enhance those spans with more data, and add more spans for things not automatically traced here. Fixes https://github.com/letsencrypt/boulder/issues/6361 --------- Co-authored-by: Aaron Gable <aaron@aarongable.com>	2023-04-21 10:46:59 -07:00
Aaron Gable	bd1d27b8e8	Fix non-gRPC process cleanup and exit (#6808 ) Although #6771 significantly cleaned up how gRPC services stop and clean up, it didn't make any changes to our HTTP servers or our non-server (e.g. crl-updater, log-validator) processes. This change finishes the work. Add a new helper method cmd.WaitForSignal, which simply blocks until one of the three signals we care about is received. This easily replaces all calls to cmd.CatchSignals which passed `nil` as the callback argument, with the added advantage that it doesn't call os.Exit() and therefore allows deferred cleanup functions to execute. This new function is intended to be the last line of main(), allowing the whole process to exit once it returns. Reimplement cmd.CatchSignals as a thin wrapper around cmd.WaitForSignal, but with the added callback functionality. Also remove the os.Exit() call from CatchSignals, so that the main goroutine is allowed to finish whatever it's doing, call deferred functions, and exit naturally. Update all of our non-gRPC binaries to use one of these two functions. The vast majority use WaitForSignal, as they run their main processing loop in a background goroutine. A few (particularly those that can run either in run-once or in daemonized mode) still use CatchSignals, since their primary processing happens directly on the main goroutine. The changes to //test/load-generator are the most invasive, simply because that binary needed to have a context plumbed into it for proper cancellation, but it already had a custom struct type named "context" which needed to be renamed to avoid shadowing. Fixes https://github.com/letsencrypt/boulder/issues/6794	2023-04-14 16:22:56 -04:00
Matthew McPherrin	49851d7afd	Remove Beeline configuration (#6765 ) In a previous PR, #6733, this configuration was marked deprecated pending removal. Here is that removal.	2023-03-23 16:58:36 -04:00
Samantha	b2224eb4bc	config: Add validation tags to all configuration structs (#6674 ) - Require `letsencrypt/validator` package. - Add a framework for registering configuration structs and any custom validators for each Boulder component at `init()` time. - Add a `validate` subcommand which allows you to pass a `-component` name and `-config` file path. - Expose validation via exported utility functions `cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and `cmd.ValidateYAMLConfig()`. - Add unit test which validates all registered component configuration structs against test configuration files. Part of #6052	2023-03-21 14:08:03 -04:00
Matthew McPherrin	e1ed1a2ac2	Remove beeline tracing (#6733 ) Remove tracing using Beeline from Boulder. The only remnant left behind is the deprecated configuration, to ensure deployability. We had previously planned to swap in OpenTelemetry in a single PR, but that adds significant churn in a single change, so we're doing this as multiple steps that will each be significantly easier to reason about and review. Part of #6361	2023-03-14 15:14:27 -07:00
Jacob Hoffman-Andrews	de2574a37a	crl/updater: fix incorrect logging of error (#6401 ) Fix instances where an error check was conditioned on something other than the traditional `err`, such as `myStruct.err`, but then the error being logged was the `err` from elsewhere in the function.	2022-09-28 09:30:32 -07:00
Aaron Gable	9c197e1f43	Use io and os instead of deprecated ioutil (#6286 ) The ioutil package has been deprecated since go1.16; the various functions it provided now exist in the os and io packages. Replace all instances of ioutil with either io or os, as appropriate.	2022-08-10 13:30:17 -07:00
Aaron Gable	305ef9cce9	Improve error checking paradigm (#5920 ) We have decided that we don't like the if err := call(); err != nil syntax, because it creates confusing scopes, but we have not cleaned up all existing instances of that syntax. However, we have now found a case where that syntax enables a bug: It caused readers to believe that a later err = call() statement was assigning to an already-declared err in the local scope, when in fact it was assigning to an already-declared err in the parent scope of a closure. This caused our ineffassign and staticcheck linters to be unable to analyze the lifetime of the err variable, and so they did not complain when we never checked the actual value of that error. This change standardizes on the two-line error checking syntax everywhere, so that we can more easily ensure that our linters are correctly analyzing all error assignments.	2022-02-01 14:42:43 -07:00
Aaron Gable	ab79f96d7b	Fixup staticcheck and stylecheck, and violations thereof (#5897 ) Add `stylecheck` to our list of lints, since it got separated out from `staticcheck`. Fix the way we configure both to be clearer and not rely on regexes. Additionally fix a number of easy-to-change `staticcheck` and `stylecheck` violations, allowing us to reduce our number of ignored checks. Part of #5681	2022-01-20 16:22:30 -08:00
Jacob Hoffman-Andrews	3bf06bb4d8	Export the config structs from our main files (#5875 ) This allows our documentation on those structs to show up in our godoc output.	2022-01-12 12:20:27 -08:00
Samantha	7a7f436212	log-validator: ensure that log lines contain a checksum (#5788 ) Most Boulder logging is supposed to go through our logging subsystem, where a checksum is added. However, very occasionally Boulder emits output on stdout or stderr. For instance this can happen during panics, or if we load a pkcs11 module that emits messages on stdout or stderr. When that happens, the logs are collected by systemd and sent into rsyslog with the same programname as the lines that went through our logging subsystem. This causes spurious alerts from log-validator because it can't find the checksum in those log lines. This change reduces the risk of spurious alerting by providing a separate metric for "malformed log line" vs "well-formed log line with a checksum mismatch." We'll still want to alert on "malformed log line", in case a future change to logging causes all log lines to be malformed. But we can set the threshold for it much higher. Fixes #5771	2021-11-09 12:38:09 -08:00
Jacob Hoffman-Andrews	23dd1e21f9	Build all boulder binaries into a single binary (#5693 ) The resulting `boulder` binary can be invoked by different names to trigger the behavior of the relevant subcommand. For instance, symlinking and invoking as `boulder-ca` acts as the CA. Symlinking and invoking as `boulder-va` acts as the VA. This reduces the .deb file size from about 200MB to about 20MB. This works by creating a registry that maps subcommand names to `main` functions. Each subcommand registers itself in an `init()` function. The monolithic `boulder` binary then checks what name it was invoked with (`os.Args[0]`), looks it up in the registry, and invokes the appropriate `main`. To avoid conflicts, all of the old `package main` are replaced with `package notmain`. To get the list of registered subcommands, run `boulder --list`. This is used when symlinking all the variants into place, to ensure the set of symlinked names matches the entries in the registry. Fixes #5692	2021-10-20 17:05:45 -07:00
Aaron Gable	9abb39d4d6	Honeycomb integration proof-of-concept (#5408 ) Add Honeycomb tracing to all Boulder components which act as HTTP servers, gRPC servers, or gRPC clients. Add many values which we currently emit to logs to the trace spans. Add a way to configure the Honeycomb integration to our config files, and by default configure all of our tests to "mute" (send nothing). Followup changes will refine the configuration, attempt to reduce the new dependency load, and introduce better sampling. Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218	2021-05-24 16:13:08 -07:00
Jacob Hoffman-Andrews	3b5915a6f2	Reduce chance of log-validator having runaway output. (#4926 )	2020-07-10 11:16:18 -07:00
Garrett Squire	739686ba88	Bug Fixes (#4798 ) Patches: Make sure all log tailing types call Cleanup; make sure the http.Response body is closed in all cases; make sure that the challenge token is always deleted.	2020-04-30 11:56:43 -07:00
Jacob Hoffman-Andrews	87fb6028c1	Add log validator to integration tests (#4782 ) For now this mainly provides an example config and confirms that log-validator can start up and shut down cleanly, as well as provide a stat indicating how many log lines it has handled. This introduces a syslog config to the boulder-tools image that will write logs to /var/log/program.log. It also tweaks the various .json config files so they have non-default syslogLevel, to ensure they actually write something for log-validator to verify.	2020-04-20 13:33:42 -07:00
Jacob Hoffman-Andrews	b351fa5979	log-validator: handle spurious shutdown errors. (#4776 ) Also add a logs adapter for tail's built-in logging type.	2020-04-15 13:44:12 -07:00
Roland Bracewell Shoemaker	4743889cd3	Count number of corrupt lines + allow non-existent files to be… (#4631 ) Fixes #4612.	2020-01-07 13:33:47 -08:00
Roland Bracewell Shoemaker	308960cbdd	log-validator: add cmd/daemon for verifying log integrity (#4482 ) In `f32fdc4` the Boulder logging framework was updated to emit a CRC32-IEEE checksum in log lines. The `log-validator` command verifies these checksums in one of two ways: 1. By running as a daemon process, tailing logs and verifying checksums as they arrive. 2. By running as a one-off command, verifying checksums of every line in a log file on disk.	2019-10-21 10:12:55 -04:00
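
The log-validator commits above describe verifying a CRC32-IEEE checksum on each log line and, later, reporting "malformed log line" separately from "well-formed log line with a checksum mismatch". The following is a minimal sketch of that distinction, not Boulder's implementation. The line layout (`<8 hex digits> <message>`), the hex encoding, and the names `checksum`, `validateLine`, `errMalformed`, and `errMismatch` are all assumptions made for illustration; only `hash/crc32` and the other standard-library calls are real.

```go
package main

import (
	"errors"
	"fmt"
	"hash/crc32"
	"strconv"
	"strings"
)

// Hypothetical error values for the two failure classes.
var (
	errMalformed = errors.New("malformed log line: no checksum field")
	errMismatch  = errors.New("well-formed log line with a checksum mismatch")
)

// checksum returns the CRC32-IEEE checksum of msg as 8 hex digits.
// The hex encoding is an assumption of this sketch.
func checksum(msg string) string {
	return fmt.Sprintf("%08x", crc32.ChecksumIEEE([]byte(msg)))
}

// validateLine checks a line in the assumed layout "<checksum> <message>".
func validateLine(line string) error {
	field, msg, ok := strings.Cut(line, " ")
	if !ok || len(field) != 8 {
		return errMalformed
	}
	want, err := strconv.ParseUint(field, 16, 32)
	if err != nil {
		// The first field is not a hex checksum at all, e.g. panic output.
		return errMalformed
	}
	if uint32(want) != crc32.ChecksumIEEE([]byte(msg)) {
		return errMismatch
	}
	return nil
}

func main() {
	good := checksum("hello world") + " hello world"
	fmt.Println(validateLine(good))
	fmt.Println(validateLine("panic: runtime error"))
	fmt.Println(validateLine("00000000 hello world"))
}
```

Keeping the two errors distinct is what would let each one feed its own metric with its own alert threshold, as the #5788 message describes.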

25 Commits
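
The single-binary change (#5693) describes a registry that maps subcommand names to `main` functions, with dispatch on the name the binary was invoked as (`os.Args[0]`) and a `--list` option. The sketch below illustrates that pattern under simplifying assumptions: the helper names `register`, `lookup`, and `names` are hypothetical, and both subcommands are registered from one `init()` here, whereas the commit message says each subcommand registers itself.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// registry maps a subcommand name to its entry point.
var registry = map[string]func(){}

// register adds a subcommand and panics on a duplicate name.
func register(name string, run func()) {
	if _, dup := registry[name]; dup {
		panic("duplicate subcommand: " + name)
	}
	registry[name] = run
}

func init() {
	// Placeholder entry points standing in for the real main functions.
	register("boulder-ca", func() { fmt.Println("acting as the CA") })
	register("boulder-va", func() { fmt.Println("acting as the VA") })
}

// lookup resolves the invoked name (normally os.Args[0]) to a subcommand.
func lookup(argv0 string) (func(), bool) {
	run, ok := registry[filepath.Base(argv0)]
	return run, ok
}

// names returns the registered subcommand names in sorted order.
func names() []string {
	out := make([]string, 0, len(registry))
	for name := range registry {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	if len(os.Args) > 1 && os.Args[1] == "--list" {
		for _, name := range names() {
			fmt.Println(name)
		}
		return
	}
	run, ok := lookup(os.Args[0])
	if !ok {
		fmt.Fprintf(os.Stderr, "unknown command %q\n", filepath.Base(os.Args[0]))
		os.Exit(1)
	}
	run()
}
```

With a layout like this, symlinking the built binary to `boulder-ca` and invoking it through that name would select the CA entry point, and the `--list` output could drive which symlinks get created.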