boulder

Commit Graph

Author	SHA1	Message	Date
Aaron Gable	b92581d620	Better compile-time type checking for gRPC server implementations (#7504 ) Replaced our embeds of foopb.UnimplementedFooServer with foopb.UnsafeFooServer. Per the grpc-go docs this reduces the "forwards compatibility" of our implementations, but that is only a concern for codebases that are implementing gRPC interfaces maintained by third parties, and which want to be able to update those third-party dependencies without updating their own implementations in lockstep. Because we update our protos and our implementations simultaneously, we can remove this safety net to replace runtime type checking with compile-time type checking. However, that replacement is not enough, because we never pass our implementation objects to a function which asserts that they match a specific interface. So this PR also replaces our reflect-based unittests with idiomatic interface assertions. I do not view this as a perfect solution, as it relies on people implementing new gRPC servers to add this line, but it is no worse than the status quo which relied on people adding the "TestImplementation" test. Fixes https://github.com/letsencrypt/boulder/issues/7497	2024-05-28 09:26:29 -07:00
Aaron Gable	0d8efb9b38	Purger: compute throughput values from number of instances (#7502 ) Give akamai-purger a new "Throughput.TotalInstances" config value, to inform it how many instances of itself are competing for Akamai rate limit quota. Combine the `useOptimizedDefaults` and `validate` functions into a single `optimizeAndValidate` function which sets default values according to the number of active instances, and confirms that the results still fall within the rate limits. Fixes https://github.com/letsencrypt/boulder/issues/7487	2024-05-24 13:30:46 -04:00
Aaron Gable	e05d47a10a	Replace explicit int loops with range-over-int (#7434 ) This adopts modern Go syntax to reduce the chance of off-by-one errors and remove unnecessary loop variable declarations. Fixes https://github.com/letsencrypt/boulder/issues/7227	2024-04-22 10:34:51 -07:00
Shiloh Heurich	2cf734edcd	Fix TestAkamaiPurgerDrainQueueSucceeds data race (#7389 ) Fixes https://github.com/letsencrypt/boulder/issues/7388	2024-03-25 10:52:19 -07:00
Phil Porada	55e512cd37	akamai-purger: Check the correct pointer for manual mode configuration file (#7177 ) When running in manual mode, the `configFile` variable will take the zero value of `""` while `manualConfigFile` will be provided on the CLI by the operator. A startup check incorrectly dereferences `configFile` but correctly determines that it is the zero value `""`; it then outputs the help text and exits, never allowing manual mode to perform work. Fixes https://github.com/letsencrypt/boulder/issues/7176	2023-12-04 10:58:07 -05:00
Jacob Hoffman-Andrews	a2b2e53045	cmd: fail without panic (#6935 ) For "ordinary" errors like "file not found" for some part of the config, we would prefer to log an error and exit without logging about a panic and printing a stack trace. To achieve that, we want to call `defer AuditPanic()` once, at the top of `cmd/boulder`'s main. That's so early that we haven't yet parsed the config, which means we haven't yet initialized a logger. We compromise: `AuditPanic` now calls `log.Get()`, which will retrieve the configured logger if one has been set up, or will create a default one (which logs to stderr/stdout). AuditPanic and Fail/FailOnError now cooperate: Fail/FailOnError panic with a special type, and AuditPanic checks for that type and prints a simple message before exiting when it's present. This PR also coincidentally fixes a bug: panicking didn't previously cause the program to exit with nonzero status, because it recovered the panic but then did not explicitly exit nonzero. Fixes #6933	2023-06-20 12:29:02 -07:00
Samantha	124c4cc6f5	grpc/sa: Implement deep health checks (#6928 ) Add the necessary scaffolding for deep health checking of our various gRPC components. Each component implementation that also implements the grpc.checker interface will be checked periodically, and the health status of the component will be updated accordingly. Add the necessary methods to SA to implement the grpc.checker interface and register these new health checks with Consul. Additionally: - Update entry point script to check for ProxySQL readiness. - Increase the poll rate for gRPC Consul checks from 5s to 2s to help with DNS failures, due to check failures, on startup. - Change log level for Consul from INFO to ERROR to deal with noisy logs full of transport failures due to Consul gRPC checks firing before the SAs are up. Fixes #6878 Part of #6795	2023-06-12 13:58:53 -04:00
Aaron Gable	42bd62e50b	Purger: list failed urls in error message (#6882 ) Fixes https://github.com/letsencrypt/boulder/issues/6853	2023-05-11 10:39:54 -07:00
Phil Porada	17fb1b287f	cmd: Export prometheus metrics for TLS cert notBefore and notAfter fields (#6836 ) Export new prometheus metrics for the `notBefore` and `notAfter` fields to track internal certificate validity periods when calling the `Load()` method for a `*tls.Config`. Each metric is labeled with the `serial` field. ``` tlsconfig_notafter_seconds{serial="2152072875247971686"} 1.664821961e+09 tlsconfig_notbefore_seconds{serial="2152072875247971686"} 1.664821960e+09 ``` Fixes https://github.com/letsencrypt/boulder/issues/6829	2023-04-24 16:28:05 -04:00
Matthew McPherrin	0060e695b5	Introduce OpenTelemetry Tracing (#6750 ) Add a new shared config stanza which all boulder components can use to configure their Open Telemetry tracing. This allows components to specify where their traces should be sent, what their sampling ratio should be, and whether or not they should respect their parent's sampling decisions (so that web front-ends can ignore sampling info coming from outside our infrastructure). It's likely we'll need to evolve this configuration over time, but this is a good starting point. Add basic Open Telemetry setup to our existing cmd.StatsAndLogging helper, so that it gets initialized at the same time as our other observability helpers. This sets certain default fields on all traces/spans generated by the service. Currently these include the service name, the service version, and information about the telemetry SDK itself. In the future we'll likely augment this with information about the host and process. Finally, add instrumentation for the HTTP servers and grpc clients/servers. This gives us a starting point of being able to monitor Boulder, but is fairly minimal as this PR is already somewhat unwieldy: It's really only enough to understand that everything is wired up properly in the configuration. In subsequent work we'll enhance those spans with more data, and add more spans for things not automatically traced here. Fixes https://github.com/letsencrypt/boulder/issues/6361 --------- Co-authored-by: Aaron Gable <aaron@aarongable.com>	2023-04-21 10:46:59 -07:00
Aaron Gable	d6cd589795	Simplify how gRPC services start, stop, and clean up (#6771 ) The CA, RA, and VA have multiple goroutines running alongside the primary gRPC handling goroutine. These ancillary goroutines should be gracefully shut down when the process is about to exit. Historically, we have handled this by putting a call to each of these goroutines' shutdown functions inside cmd.CatchSignals, so that when a SIGINT is received, all of the various cleanup routines happen in sequence. But there's a cleaner way to do it: just use defer! All of these cleanups need to happen after the primary gRPC server has fully shut down, so that we know they stick around at least as long as the service is handling gRPC requests. And when the service receives a SIGINT, cmd.CatchSignals will call the gRPC server's GracefulStop, which will cause the server's .Serve() to finally exit, which will cause start() to exit, which will cause main() to exit, which will cause all deferred functions to be run. In addition, remove filterShutdownErrors as the bug which made it necessary (.Serve() returning an error even when GracefulStop() is called) was fixed back in 2017. This allows us to call the start() function in a much more natural way, simply logging any error it returns instead of calling os.Exit(1) if it returns an error. This allows us to simplify the exit-handling code in these three services' main() functions, and lets us be a bit more idiomatic with our deferred cleanup functions. Part of #6794	2023-04-05 14:55:57 -07:00
Aaron Gable	9262ca6e3f	Add grpc implementation tests to all services (#6782 ) As a follow-up to #6780, add the same style of implementation test to all of our other gRPC services. This was not included in that PR just to keep it small and single-purpose.	2023-03-31 09:52:26 -07:00
Matthew McPherrin	49851d7afd	Remove Beeline configuration (#6765 ) In a previous PR, #6733, this configuration was marked deprecated pending removal. Here is that removal.	2023-03-23 16:58:36 -04:00
Samantha	b2224eb4bc	config: Add validation tags to all configuration structs (#6674 ) - Require `letsencrypt/validator` package. - Add a framework for registering configuration structs and any custom validators for each Boulder component at `init()` time. - Add a `validate` subcommand which allows you to pass a `-component` name and `-config` file path. - Expose validation via exported utility functions `cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and `cmd.ValidateYAMLConfig()`. - Add unit test which validates all registered component configuration structs against test configuration files. Part of #6052	2023-03-21 14:08:03 -04:00
Matthew McPherrin	e1ed1a2ac2	Remove beeline tracing (#6733 ) Remove tracing using Beeline from Boulder. The only remnant left behind is the deprecated configuration, to ensure deployability. We had previously planned to swap in OpenTelemetry in a single PR, but that adds significant churn in a single change, so we're doing this as multiple steps that will each be significantly easier to reason about and review. Part of #6361	2023-03-14 15:14:27 -07:00
Matthew McPherrin	391a59921b	Move cmd.ConfigDuration to config.Duration (#6705 ) We rely on the ratelimit/ package in CI to validate our ratelimit configurations. However, because that package relies on cmd/ just for cmd.ConfigDuration, many additional dependencies get pulled in. This refactors just that struct to a separate config package. This was done using Goland's automatic refactoring tooling, which also organized a few imports while it was touching them, keeping standard library, internal and external dependencies grouped.	2023-02-28 08:11:49 -08:00
Aaron Gable	46c8d66c31	bgrpc.NewServer: support multiple services (#6487 ) Turn bgrpc.NewServer into a builder-pattern, with a config-based initialization, multiple calls to Add to add new gRPC services, and a final call to Build to produce the start() and stop() functions which control server behavior. All calls are chainable to produce compact code in each component's main() function. This improves the process of creating a new gRPC server in three ways: 1) It avoids the need for generics/templating, which was slightly verbose. 2) It allows the set of services to be registered on this server to be known ahead of time. 3) It greatly streamlines adding multiple services to the same server, which we use today in the VA and will be using soon in the SA and CA. While we're here, add a new per-service config stanza to the GRPCServerConfig, so that individual services on the same server can have their own configuration. For now, only provide a "ClientNames" key, which will be used in a follow-up PR. Part of #6454	2022-11-04 13:26:42 -07:00
Samantha	6d519059a3	akamai-purger: Deprecate PurgeInterval config field (#6489 ) Fixes #6003	2022-11-04 12:44:35 -07:00
Aaron Gable	9213bd0993	Streamline gRPC server creation (#6457 ) Collapse most of our boilerplate gRPC creation steps (in particular, creating default metrics, making the server and listener, registering the server, creating and registering the health service, filtering shutdown errors from the output, and gracefully stopping) into a single function in the existing bgrpc package. This allows all but one of our server main functions to drop their calls to NewServer and NewServerMetrics. To enable this, create a new helper type and method in the bgrpc package. Conceptually, this could be just a new function, but it must be attached to a new type so that it can be generic over the type of gRPC server being created. (Unfortunately, the grpc.RegisterFooServer methods do not accept an interface type for their second argument). The only main function which is not updated is the boulder-va, which is a special case because it creates multiple gRPC servers but (unlike the CA) serves them all on the same port with the same server and listener. Part of #6452	2022-10-26 15:45:52 -07:00
Aaron Gable	0340b574d9	Add unparam linter to CI (#6312 ) Enable the "unparam" linter, which checks for unused function parameters, unused function return values, and parameters and return values that always have the same value every time they are used. In addition, fix many instances where the unparam linter complains about our existing codebase. Remove error return values from a number of functions that never return an error, remove or use context and test parameters that were previously unused, and simplify a number of (mostly test-only) functions that always take the same value for their parameter. Most notably, remove the ability to customize the RSA Public Exponent from the ceremony tooling, since it should always be 65537 anyway. Fixes #6104	2022-08-23 12:37:24 -07:00
Samantha	cb5c10335d	akamai-purger: Make the purger queue a stack (#6042 ) Avoid rejecting new purge requests by making the `akamai-purger` queue a stack that pops entries off the bottom (oldest) to make room. Fixes #5941	2022-04-08 12:47:02 -07:00
Jacob Hoffman-Andrews	8903cd9f2d	akamai-purger: show help for `manual` subcommand (#6019 ) Fixes #5967	2022-04-06 11:10:40 -07:00
Samantha	7c22b99d63	akamai-purger: Improve throughput and configuration safety (#6006 ) - Add new configuration key `throughput`, a mapping which contains all throughput related akamai-purger settings. - Deprecate configuration key `purgeInterval` in favor of `purgeBatchInterval` in the new `throughput` configuration mapping. - When no `throughput` or `purgeInterval` is provided, the purger uses optimized default settings which offer 1.9x the throughput of current production settings. - At startup, all throughput related settings are modeled to ensure that we don't exceed the limits imposed on us by Akamai. - Queue is now `[][]string`, instead of `[]string`. - When a given queue entry is purged we know all 3 of its URLs were purged. - At startup we know the size of a theoretical request to purge based on the number of queue entries included - Raises the queue size from ~333-thousand cached OCSP responses to 1.25-million, which is roughly 6 hours of work using the optimized default settings - Raise `purgeInterval` in test config from 1ms, which violates API limits, to 800ms Fixes #5984	2022-03-23 17:23:07 -07:00
Samantha	3b665f8dbf	akamai-purger: Queue and response handling improvements (#5955 ) - Make maximum queue size configurable via a new configuration key: 'MaxQueueSize'. - Default 'MaxQueueSize' to the previous value (1M) when 'MaxQueueSize' isn't specified. - akamaiPurger.purge() will only place the URLs starting at the first entry of the failed batch where a failure was encountered instead of the entire set that was originally passed. - Add a test to ensure that these changes are working as intended. - Make the purge batching easier to understand with some minor changes to variable names - Responses whose HTTP status code is not 201 will no longer be unmarshaled - Logs will explicitly call out if a response indicates that we've exceeded any rate limits imposed by Akamai. Fixes #5917	2022-03-07 12:21:16 -08:00
Samantha	80fe3aed54	akamai-purger: Cleanup (#5949 ) Light cleanup of akamai-purger and the akamai cache-client. This does not make any material changes to logic. - Use `errors.New` and `errors.Is` instead of a custom `ErrFatal` type and `errors.As` - Add whitespace to separate chunks of execution and error checking from one another - Use `logger.Infof` and `logger.Errorf` instead of wrapped calls to `fmt.Sprintf` - Remove capital letters from the beginning of error messages - Additional comments and removal of some that are no longer accurate	2022-02-24 20:57:25 -08:00
Aaron Gable	305ef9cce9	Improve error checking paradigm (#5920 ) We have decided that we don't like the if err := call(); err != nil syntax, because it creates confusing scopes, but we have not cleaned up all existing instances of that syntax. However, we have now found a case where that syntax enables a bug: It caused readers to believe that a later err = call() statement was assigning to an already-declared err in the local scope, when in fact it was assigning to an already-declared err in the parent scope of a closure. This caused our ineffassign and staticcheck linters to be unable to analyze the lifetime of the err variable, and so they did not complain when we never checked the actual value of that error. This change standardizes on the two-line error checking syntax everywhere, so that we can more easily ensure that our linters are correctly analyzing all error assignments.	2022-02-01 14:42:43 -07:00
Jacob Hoffman-Andrews	3bf06bb4d8	Export the config structs from our main files (#5875 ) This allows our documentation on those structs to show up in our godoc output.	2022-01-12 12:20:27 -08:00
Jacob Hoffman-Andrews	960fde9347	Add manual Akamai cache tag purger (#5742 ) Add functionality to purge by cache tags in our Akamai CachePurgeClient. Use that functionality in a new manual mode of akamai-purger, which takes a single tag with the `--tag` flag, or a file containing multiple tags with `--tag-file`. A tag file containing a random set of cache tags can be generated with: printf "%x\n" $(seq 0 255) | shuf -n 5	2021-10-25 18:21:27 -07:00
Jacob Hoffman-Andrews	23dd1e21f9	Build all boulder binaries into a single binary (#5693 ) The resulting `boulder` binary can be invoked by different names to trigger the behavior of the relevant subcommand. For instance, symlinking and invoking as `boulder-ca` acts as the CA. Symlinking and invoking as `boulder-va` acts as the VA. This reduces the .deb file size from about 200MB to about 20MB. This works by creating a registry that maps subcommand names to `main` functions. Each subcommand registers itself in an `init()` function. The monolithic `boulder` binary then checks what name it was invoked with (`os.Args[0]`), looks it up in the registry, and invokes the appropriate `main`. To avoid conflicts, all of the old `package main` are replaced with `package notmain`. To get the list of registered subcommands, run `boulder --list`. This is used when symlinking all the variants into place, to ensure the set of symlinked names matches the entries in the registry. Fixes #5692	2021-10-20 17:05:45 -07:00
Aaron Gable	8be32d3312	Use google.protobuf.Empty instead of core.Empty (#5454 ) Replace `core.Empty` with `google.protobuf.Empty` in all of our gRPC methods which consume or return an empty protobuf. The golang core proto libraries provide an empty message type, so there is no need for us to reinvent the wheel. This change is backwards-compatible and does not require a special deploy. The protobuf message descriptions of `core.Empty` and `google.protobuf.Empty` are identical, so their wire-formats are indistinguishable and therefore interoperable / cross-compatible. Fixes #5443	2021-06-03 14:17:41 -07:00
Aaron Gable	9abb39d4d6	Honeycomb integration proof-of-concept (#5408 ) Add Honeycomb tracing to all Boulder components which act as HTTP servers, gRPC servers, or gRPC clients. Add many values which we currently emit to logs to the trace spans. Add a way to configure the Honeycomb integration to our config files, and by default configure all of our tests to "mute" (send nothing). Followup changes will refine the configuration, attempt to reduce the new dependency load, and introduce better sampling. Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218	2021-05-24 16:13:08 -07:00
Jacob Hoffman-Andrews	7194624191	Update grpc and protobuf to latest. (#5369 ) protoc now generates grpc code in a separate file from protobuf code. Also, grpc servers are now required to embed an "unimplemented" interface from the generated .pb.go file, which provides forward compatibility. Update the generate.go files since the invocation for protoc has changed with the split into .pb.go and _grpc.pb.go. Fixes #5368	2021-04-01 17:18:15 -07:00
Samantha	c9c8a1a507	Export akamai-purger queue length metrics (#5366 ) Collect akamai-purger queue length on each Prometheus scrape Fixes #4716	2021-03-31 13:39:35 -07:00
Aaron Gable	2d14cfb8d1	Add gRPC Health service to all Boulder services (#5093 ) This health service implements the gRPC Health Checking Protocol, as defined in https://github.com/grpc/grpc/blob/master/doc/health-checking.md and as implemented by the gRPC authors in https://pkg.go.dev/google.golang.org/grpc/health@v1.29.0 It simply instantiates a health service, and attaches it to the same gRPC server that is handling requests to the primary (e.g. CA) service. When the main service would be shut down (e.g. because it caught a signal), it also sets the status of the service to NOT_SERVING. This change also imports the health client into our grpc client, ensuring that all of our grpc clients use the health service to inform their load-balancing behavior. This will be used to replace our current usage of polling the debug port to determine whether a given service is up and running. It may also be useful for more comprehensive checks and blackbox probing in the future. Part of #5074	2020-10-06 12:14:02 -07:00
Jacob Hoffman-Andrews	50d404333e	akamai-purger: empty queue on shutdown (#4944 )	2020-07-10 13:04:46 -07:00
Roland Bracewell Shoemaker	046955e99c	Add a standalone akamai purger service (#4040 ) Fixes #4030.	2019-02-05 09:00:31 -08:00
Jacob Hoffman-Andrews	bb09cb8fb0	Remove akamai-purger, not currently used. (#1843 )	2016-05-25 10:18:15 -07:00
Jacob Hoffman-Andrews	e6c17e1717	Switch to new vendor style (#1747 ) * Switch to new vendor style. * Fix metrics generate command. * Fix miekg/dns types_generate. * Use generated copies of files. * Update miekg to latest. Fixes a problem with `go generate`. * Set GO15VENDOREXPERIMENT. * Build in letsencrypt/boulder. * fix travis more. * Exclude vendor instead of godeps. * Replace some ... * Fix unformatted cmd * Fix errcheck for vendorexp * Add GO15VENDOREXPERIMENT to Makefile. * Temp disable errcheck. * Restore master fetch. * Restore errcheck. * Build with 1.6 also. * Match statsd." Skip errcheck unless Go1.6. * Add other ignorepkg. * Fix errcheck. * move errcheck * Remove go1.6 requirement. * Put godep-restore with errcheck. * Remove go1.6 dep. * Revert master fetch revert. * Remove -r flag from godep save. * Set GO15VENDOREXPERIMENT in Dockerfile and remove _workspace. * Fix Godep version.	2016-04-18 12:51:36 -07:00
Jacob Hoffman-Andrews	2fc0f3143e	Improve logging. Consolidate initialization of stats and logging from each main.go into cmd package. Define a new config parameter, `StdoutLevel`, that determines the maximum log level that will be printed to stdout. It can be set to 6 to inhibit debug messages, or 0 to print only emergency messages, or -1 to print no messages at all. Remove the existing config parameter `Tag`. Instead, choose the tag from the basename of the currently running process. Previously all Boulder log messages had the tag "boulder", but now they will be differentiated by process, like "boulder-wfe". Shorten the date format used in stdout logging, and add the current binary's basename. Consolidate setup function in audit-logger_test.go. Note: Most CLI binaries now get their stats and logging from the parameters of Action. However, a few of our binaries don't use our custom AppShell, and instead use codegangsta/cli directly. For those binaries, we export the new StatsAndLogging method from cmd. Fixes https://github.com/letsencrypt/boulder/issues/852	2015-11-11 16:52:42 -08:00
Roland Shoemaker	7675f33317	Add an Akamai CCU client and use it to purge OCSP responses on revocation and update Adds a (currently gated) Akamai CCU API client used to purge GET OCSP responses from the CDN. It also contains a small tool (cmd/akamai-purger) that can be used to purge ARLs from the command line.	2015-10-27 21:45:25 -07:00
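
Commit b92581d620 above mentions replacing reflect-based unit tests with idiomatic interface assertions. The following is a minimal sketch of that Go idiom, not Boulder's actual code: `Purger` and `memPurger` are hypothetical names standing in for a generated gRPC server interface and its implementation.

```go
package main

import "fmt"

// Purger is a hypothetical interface standing in for a generated
// gRPC server interface; it is not taken from Boulder.
type Purger interface {
	Purge(urls []string) error
}

// memPurger is a hypothetical implementation that records what it was
// asked to purge.
type memPurger struct {
	purged []string
}

// Purge appends the given URLs to the in-memory record.
func (p *memPurger) Purge(urls []string) error {
	p.purged = append(p.purged, urls...)
	return nil
}

// Compile-time interface assertion: the build fails if *memPurger ever
// stops satisfying Purger. No value is allocated and nothing runs.
var _ Purger = (*memPurger)(nil)

func main() {
	p := &memPurger{}
	err := p.Purge([]string{"http://example.com/a"})
	if err != nil {
		fmt.Println("purge failed:", err)
		return
	}
	fmt.Println(len(p.purged))
}
```

As the commit message notes, the check only helps if each new implementation includes the assertion line; nothing enforces its presence.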

40 Commits
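
Commits cb5c10335d and 7c22b99d63 above describe a purge queue of `[][]string` entries that discards its oldest entries to make room instead of rejecting new requests. The sketch below illustrates that idea under stated assumptions: the `boundedQueue` type, its `push` and `pop` methods, and the newest-first pop order are hypothetical and are not copied from akamai-purger. It also assumes `max` is at least 1.

```go
package main

import "fmt"

// boundedQueue is a hypothetical fixed-capacity store of purge entries,
// where each entry is a group of URLs. Assumes max >= 1.
type boundedQueue struct {
	max     int
	entries [][]string
}

// push adds an entry, first discarding the oldest entries (the front of
// the slice) until there is room. It returns how many were discarded.
func (q *boundedQueue) push(entry []string) int {
	dropped := 0
	for len(q.entries) >= q.max && len(q.entries) > 0 {
		q.entries = q.entries[1:]
		dropped++
	}
	q.entries = append(q.entries, entry)
	return dropped
}

// pop removes and returns the newest entry (stack order). The boolean
// is false when the queue is empty.
func (q *boundedQueue) pop() ([]string, bool) {
	if len(q.entries) == 0 {
		return nil, false
	}
	last := len(q.entries) - 1
	entry := q.entries[last]
	q.entries = q.entries[:last]
	return entry, true
}

func main() {
	q := &boundedQueue{max: 2}
	q.push([]string{"a"})
	q.push([]string{"b"})
	dropped := q.push([]string{"c"}) // queue is full, so "a" is discarded
	newest, _ := q.pop()
	fmt.Println(dropped, newest, len(q.entries))
}
```

Dropping from the front keeps the most recent requests, which is the trade-off the first commit describes: new work is never rejected, at the cost of losing the oldest pending entries.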