boulder

Commit Graph

Author	SHA1	Message	Date
James Renken	ac68828f43	Replace most uses of net.IP with netip.Addr (#8205) Retain `net.IP` only where we directly work with `x509.Certificate` and friends. Fixes #5925 Depends on #8196	2025-05-27 15:05:35 -07:00
James Renken	f6c748c1c3	WFE/nonce: Remove deprecated NoncePrefixKey field (#7825) Remove the deprecated WFE & nonce config field `NoncePrefixKey`, which has been replaced by `NonceHMACKey`. <del>DO NOT MERGE until:</del> - <del>#7793 (in `release-2024-11-18`) has been deployed, AND:</del> - <del>`NoncePrefixKey` has been removed from all running configs.</del> Fixes #7632	2025-02-06 15:32:49 -08:00
James Renken	0a27cba9f4	WFE/nonce: Add NonceHMACKey field (#7793) Add a new WFE & nonce config field, `NonceHMACKey`, which uses the new `cmd.HMACKeyConfig` type. Deprecate the `NoncePrefixKey` config field. Generalize the error message when validating `HMACKeyConfig` in `config`. Remove the deprecated `UseDerivablePrefix` config field, which is no longer used anywhere. Part of #7632	2024-11-13 10:31:28 -05:00
Samantha Frank	d9046ae495	config: Improve comment for HMACKeyConfig and add TODOs (#7633)	2024-07-25 16:16:58 -04:00
Aaron Gable	5be3650e56	Remove deprecated WFE.RedeemNonceServices (#7493) Fixes https://github.com/letsencrypt/boulder/issues/6610	2024-05-21 13:13:13 -04:00
Jacob Hoffman-Andrews	ce5632b480	Remove `service1` / `service2` names in consul (#7266) These names corresponded to single instances of a service, and were primarily used for (a) specifying which interface to bind a gRPC port on and (b) allowing `health-checker` to check individual instances rather than a service as a whole. For (a), change the `--grpc-addr` flags to bind to "all interfaces." For (b), provide a specific IP address and port for health checking. This required adding a `--hostOverride` flag for `health-checker` because the service certificates contain hostname SANs, not IP address SANs. Clarify the situation with nonce services a little bit. Previously we had one nonce "service" in Consul and got nonces from that (i.e. randomly between the two nonce-service instances). Now we have two nonce services in consul, representing multiple datacenters, and one of them is explicitly configured as the "get" service, while both are configured as the "redeem" service. Part of #7245. Note this change does not yet get rid of the rednet/bluenet distinction, nor does it get rid of all use of 10.88.88.88. That will be a followup change.	2024-01-22 09:34:20 -08:00
Jacob Hoffman-Andrews	a2b2e53045	cmd: fail without panic (#6935) For "ordinary" errors like "file not found" for some part of the config, we would prefer to log an error and exit without logging about a panic and printing a stack trace. To achieve that, we want to call `defer AuditPanic()` once, at the top of `cmd/boulder`'s main. That's so early that we haven't yet parsed the config, which means we haven't yet initialized a logger. We compromise: `AuditPanic` now calls `log.Get()`, which will retrieve the configured logger if one has been set up, or will create a default one (which logs to stderr/stdout). AuditPanic and Fail/FailOnError now cooperate: Fail/FailOnError panic with a special type, and AuditPanic checks for that type and prints a simple message before exiting when it's present. This PR also coincidentally fixes a bug: panicking didn't previously cause the program to exit with nonzero status, because it recovered the panic but then did not explicitly exit nonzero. Fixes #6933	2023-06-20 12:29:02 -07:00
Samantha	124c4cc6f5	grpc/sa: Implement deep health checks (#6928) Add the necessary scaffolding for deep health checking of our various gRPC components. Each component implementation that also implements the grpc.checker interface will be checked periodically, and the health status of the component will be updated accordingly. Add the necessary methods to SA to implement the grpc.checker interface and register these new health checks with Consul. Additionally: - Update entry point script to check for ProxySQL readiness. - Increase the poll rate for gRPC Consul checks from 5s to 2s to help with DNS failures, due to check failures, on startup. - Change log level for Consul from INFO to ERROR to deal with noisy logs full of transport failures due to Consul gRPC checks firing before the SAs are up. Fixes #6878 Part of #6795	2023-06-12 13:58:53 -04:00
Phil Porada	17fb1b287f	cmd: Export prometheus metrics for TLS cert notBefore and notAfter fields (#6836) Export new prometheus metrics for the `notBefore` and `notAfter` fields to track internal certificate validity periods when calling the `Load()` method for a `*tls.Config`. Each metric is labeled with the `serial` field, for example `tlsconfig_notafter_seconds{serial="2152072875247971686"} 1.664821961e+09` and `tlsconfig_notbefore_seconds{serial="2152072875247971686"} 1.664821960e+09`. Fixes https://github.com/letsencrypt/boulder/issues/6829	2023-04-24 16:28:05 -04:00
Matthew McPherrin	0060e695b5	Introduce OpenTelemetry Tracing (#6750) Add a new shared config stanza which all boulder components can use to configure their OpenTelemetry tracing. This allows components to specify where their traces should be sent, what their sampling ratio should be, and whether or not they should respect their parent's sampling decisions (so that web front-ends can ignore sampling info coming from outside our infrastructure). It's likely we'll need to evolve this configuration over time, but this is a good starting point. Add basic OpenTelemetry setup to our existing cmd.StatsAndLogging helper, so that it gets initialized at the same time as our other observability helpers. This sets certain default fields on all traces/spans generated by the service. Currently these include the service name, the service version, and information about the telemetry SDK itself. In the future we'll likely augment this with information about the host and process. Finally, add instrumentation for the HTTP servers and grpc clients/servers. This gives us a starting point of being able to monitor Boulder, but is fairly minimal as this PR is already somewhat unwieldy: It's really only enough to understand that everything is wired up properly in the configuration. In subsequent work we'll enhance those spans with more data, and add more spans for things not automatically traced here. Fixes https://github.com/letsencrypt/boulder/issues/6361 --------- Co-authored-by: Aaron Gable <aaron@aarongable.com>	2023-04-21 10:46:59 -07:00
Aaron Gable	d6cd589795	Simplify how gRPC services start, stop, and clean up (#6771) The CA, RA, and VA have multiple goroutines running alongside the primary gRPC handling goroutine. These ancillary goroutines should be gracefully shut down when the process is about to exit. Historically, we have handled this by putting a call to each of these goroutines' shutdown functions inside cmd.CatchSignals, so that when a SIGINT is received, all of the various cleanup routines happen in sequence. But there's a cleaner way to do it: just use defer! All of these cleanups need to happen after the primary gRPC server has fully shut down, so that we know they stick around at least as long as the service is handling gRPC requests. And when the service receives a SIGINT, cmd.CatchSignals will call the gRPC server's GracefulStop, which will cause the server's .Serve() to finally exit, which will cause start() to exit, which will cause main() to exit, which will cause all deferred functions to be run. In addition, remove filterShutdownErrors as the bug which made it necessary (.Serve() returning an error even when GracefulStop() is called) was fixed back in 2017. This allows us to call the start() function in a much more natural way, simply logging any error it returns instead of calling os.Exit(1) if it returns an error. This allows us to simplify the exit-handling code in these three services' main() functions, and lets us be a bit more idiomatic with our deferred cleanup functions. Part of #6794	2023-04-05 14:55:57 -07:00
Matthew McPherrin	49851d7afd	Remove Beeline configuration (#6765) In a previous PR, #6733, this configuration was marked deprecated pending removal. Here is that removal.	2023-03-23 16:58:36 -04:00
Samantha	b2224eb4bc	config: Add validation tags to all configuration structs (#6674) - Require `letsencrypt/validator` package. - Add a framework for registering configuration structs and any custom validators for each Boulder component at `init()` time. - Add a `validate` subcommand which allows you to pass a `-component` name and `-config` file path. - Expose validation via exported utility functions `cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and `cmd.ValidateYAMLConfig()`. - Add unit test which validates all registered component configuration structs against test configuration files. Part of #6052	2023-03-21 14:08:03 -04:00
Matthew McPherrin	e1ed1a2ac2	Remove beeline tracing (#6733) Remove tracing using Beeline from Boulder. The only remnant left behind is the deprecated configuration, to ensure deployability. We had previously planned to swap in OpenTelemetry in a single PR, but that adds significant churn in a single change, so we're doing this as multiple steps that will each be significantly easier to reason about and review. Part of #6361	2023-03-14 15:14:27 -07:00
Samantha	d73125d8f6	WFE: Add custom balancer implementation which routes nonce redemption RPCs by prefix (#6618) Assign nonce prefixes for each nonce-service by taking the first eight characters of the base64url encoded HMAC-SHA256 hash of the RPC listening address using a provided key. The provided key must be the same across all boulder-wfe and nonce-service instances. - Add a custom `grpc-go` load balancer implementation (`nonce`) which can route nonce redemption RPC messages by matching the prefix to the derived prefix of the nonce-service instance which created it. - Modify the RPC client constructor to allow the operator to override the default load balancer implementation (`round_robin`). - Modify the `srv` RPC resolver to accept a comma separated list of targets to be resolved. - Remove unused nonce-service `-prefix` flag. Fixes #6404	2023-02-03 17:52:18 -05:00
Jacob Hoffman-Andrews	fd74d20934	wfe2: update unittest to use gRPC-style backend (#6533) Originally, WFEs had a built-in nonce service. Then we added a "remote nonce service" via gRPC, but we kept a fallback path for when the remote nonce service was not configured, to use a built-in nonce service. This PR removes that fallback path. Since the fallback path was relied on by the unittests, this also refactors the unittests to use a gRPC-style nonce service (but in-memory for the unittests). Fixes #6530	2022-12-05 11:36:31 -08:00
Aaron Gable	46c8d66c31	bgrpc.NewServer: support multiple services (#6487) Turn bgrpc.NewServer into a builder-pattern, with a config-based initialization, multiple calls to Add to add new gRPC services, and a final call to Build to produce the start() and stop() functions which control server behavior. All calls are chainable to produce compact code in each component's main() function. This improves the process of creating a new gRPC server in three ways: 1) It avoids the need for generics/templating, which was slightly verbose. 2) It allows the set of services to be registered on this server to be known ahead of time. 3) It greatly streamlines adding multiple services to the same server, which we use today in the VA and will be using soon in the SA and CA. While we're here, add a new per-service config stanza to the GRPCServerConfig, so that individual services on the same server can have their own configuration. For now, only provide a "ClientNames" key, which will be used in a follow-up PR. Part of #6454	2022-11-04 13:26:42 -07:00
Aaron Gable	9213bd0993	Streamline gRPC server creation (#6457) Collapse most of our boilerplate gRPC creation steps (in particular, creating default metrics, making the server and listener, registering the server, creating and registering the health service, filtering shutdown errors from the output, and gracefully stopping) into a single function in the existing bgrpc package. This allows all but one of our server main functions to drop their calls to NewServer and NewServerMetrics. To enable this, create a new helper type and method in the bgrpc package. Conceptually, this could be just a new function, but it must be attached to a new type so that it can be generic over the type of gRPC server being created. (Unfortunately, the grpc.RegisterFooServer methods do not accept an interface type for their second argument). The only main function which is not updated is the boulder-va, which is a special case because it creates multiple gRPC servers but (unlike the CA) serves them all on the same port with the same server and listener. Part of #6452	2022-10-26 15:45:52 -07:00
Jacob Hoffman-Andrews	3bf06bb4d8	Export the config structs from our main files (#5875) This allows our documentation on those structs to show up in our godoc output.	2022-01-12 12:20:27 -08:00
Jacob Hoffman-Andrews	23dd1e21f9	Build all boulder binaries into a single binary (#5693) The resulting `boulder` binary can be invoked by different names to trigger the behavior of the relevant subcommand. For instance, symlinking and invoking as `boulder-ca` acts as the CA. Symlinking and invoking as `boulder-va` acts as the VA. This reduces the .deb file size from about 200MB to about 20MB. This works by creating a registry that maps subcommand names to `main` functions. Each subcommand registers itself in an `init()` function. The monolithic `boulder` binary then checks what name it was invoked with (`os.Args[0]`), looks it up in the registry, and invokes the appropriate `main`. To avoid conflicts, all of the old `package main` are replaced with `package notmain`. To get the list of registered subcommands, run `boulder --list`. This is used when symlinking all the variants into place, to ensure the set of symlinked names matches the entries in the registry. Fixes #5692	2021-10-20 17:05:45 -07:00
Aaron Gable	8be32d3312	Use google.protobuf.Empty instead of core.Empty (#5454) Replace `core.Empty` with `google.protobuf.Empty` in all of our gRPC methods which consume or return an empty protobuf. The golang core proto libraries provide an empty message type, so there is no need for us to reinvent the wheel. This change is backwards-compatible and does not require a special deploy. The protobuf message descriptions of `core.Empty` and `google.protobuf.Empty` are identical, so their wire-formats are indistinguishable and therefore interoperable / cross-compatible. Fixes #5443	2021-06-03 14:17:41 -07:00
Aaron Gable	9abb39d4d6	Honeycomb integration proof-of-concept (#5408) Add Honeycomb tracing to all Boulder components which act as HTTP servers, gRPC servers, or gRPC clients. Add many values which we currently emit to logs to the trace spans. Add a way to configure the Honeycomb integration to our config files, and by default configure all of our tests to "mute" (send nothing). Followup changes will refine the configuration, attempt to reduce the new dependency load, and introduce better sampling. Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218	2021-05-24 16:13:08 -07:00
Jacob Hoffman-Andrews	7194624191	Update grpc and protobuf to latest. (#5369) protoc now generates grpc code in a separate file from protobuf code. Also, grpc servers are now required to embed an "unimplemented" interface from the generated .pb.go file, which provides forward compatibility. Update the generate.go files since the invocation for protoc has changed with the split into .pb.go and _grpc.pb.go. Fixes #5368	2021-04-01 17:18:15 -07:00
Aaron Gable	2d14cfb8d1	Add gRPC Health service to all Boulder services (#5093) This health service implements the gRPC Health Checking Protocol, as defined in https://github.com/grpc/grpc/blob/master/doc/health-checking.md and as implemented by the gRPC authors in https://pkg.go.dev/google.golang.org/grpc/health@v1.29.0 It simply instantiates a health service, and attaches it to the same gRPC server that is handling requests to the primary (e.g. CA) service. When the main service would be shut down (e.g. because it caught a signal), it also sets the status of the service to NOT_SERVING. This change also imports the health client into our grpc client, ensuring that all of our grpc clients use the health service to inform their load-balancing behavior. This will be used to replace our current usage of polling the debug port to determine whether a given service is up and running. It may also be useful for more comprehensive checks and blackbox probing in the future. Part of #5074	2020-10-06 12:14:02 -07:00
Roland Bracewell Shoemaker	af41bea99a	Switch to more efficient multi nonce-service design (#4308) Basically a complete re-write/re-design of the forwarding concept introduced in #4297 (sorry for the rapid churn here). Instead of nonce-services blindly forwarding nonces around to each other in an attempt to find out who issued the nonce we add an identifying prefix to each nonce generated by a service. The WFEs then use this prefix to decide which nonce-service to ask to validate the nonce. This requires a slightly more complicated configuration at the WFE/2 end, but overall I think ends up being a way cleaner, more understandable, easier-to-reason-about implementation. When configuring the WFE you need to provide two forms of gRPC config: * one gRPC config for retrieving nonces, this should be a DNS name that resolves to all available nonce-services (or at least the ones you want to retrieve nonces from locally, in a two DC setup you might only configure the nonce-services that are in the same DC as the WFE instance). This allows getting a nonce from any of the configured services and is load-balanced transparently at the gRPC layer. * a map of nonce prefixes to gRPC configs, this maps each individual nonce-service to its prefix and allows the WFE instances to figure out which nonce-service to ask to validate a nonce it has received (in a two DC setup you'd want to configure this with all the nonce-services across both DCs so that you can validate a nonce that was generated by a nonce-service in another DC). This balancing is implemented in the integration tests. Given the current remote nonce code hasn't been deployed anywhere yet this makes a number of hard breaking changes to both the existing nonce-service code, and the forwarding code. Fixes #4303.	2019-06-28 12:58:46 -04:00
Roland Bracewell Shoemaker	66f4a48b1b	nonce-service: switch to proto3 (#4304)	2019-06-27 10:07:17 -04:00
Roland Bracewell Shoemaker	844ae26b65	Allow forwarding of nonce-service Redeem RPCs from one service… (#4297) Fixes #4295.	2019-06-26 13:04:31 -07:00
Roland Bracewell Shoemaker	4ca01b5de3	Implement standalone nonce service (#4228) Fixes #3976.	2019-06-05 10:41:19 -07:00
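
Commit d73125d8f6 describes deriving each nonce-service's prefix as the first eight characters of the base64url-encoded HMAC-SHA256 of its RPC listening address under a shared key. The following is a minimal sketch of that derivation using only the Go standard library. The function name `derivePrefix`, the example key, and the example addresses are hypothetical and chosen for illustration; this is not Boulder's actual code.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// prefixLen is the number of leading characters kept, per the commit message.
const prefixLen = 8

// derivePrefix is a hypothetical helper (not Boulder's API). It computes
// HMAC-SHA256 over the listening address with the shared key, base64url-encodes
// the 32-byte result, and keeps the first prefixLen characters.
func derivePrefix(listenAddr string, key []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(listenAddr)) // hash.Hash writes never return an error
	encoded := base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
	return encoded[:prefixLen]
}

func main() {
	// Assumed values: every WFE and nonce-service would need the same key.
	key := []byte("example-shared-key")
	for _, addr := range []string{"10.0.0.1:9101", "10.0.0.2:9101"} {
		fmt.Printf("%s -> %s\n", addr, derivePrefix(addr, key))
	}
}
```

Because the prefix depends only on the address and the key, a WFE holding the same key could compute the prefix of any nonce-service it knows the address of, which is what lets redemption requests be routed by prefix.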

28 Commits
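
Commit 23dd1e21f9 describes a registry that maps subcommand names to `main` functions, filled in by `init()` functions and consulted using the name the binary was invoked as. The sketch below illustrates that pattern in a single file. The names `register`, `lookup`, and `names`, and the stand-in subcommand bodies, are assumptions made for illustration rather than Boulder's real identifiers. To stay runnable anywhere, it dispatches on simulated invocation names instead of the real `os.Args[0]`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// registry maps a subcommand name to the function that runs it.
var registry = map[string]func(){}

// register adds a subcommand; registering the same name twice is a bug.
func register(name string, run func()) {
	if _, dup := registry[name]; dup {
		panic("duplicate subcommand: " + name)
	}
	registry[name] = run
}

// In the design described by the commit, each subcommand package would
// register itself from its own init(). Here both live in one file.
func init() {
	register("boulder-ca", func() { fmt.Println("acting as the CA") })
	register("boulder-va", func() { fmt.Println("acting as the VA") })
}

// lookup resolves an invocation path (what os.Args[0] would hold) to a
// registered subcommand by its base name.
func lookup(argv0 string) (func(), bool) {
	run, ok := registry[filepath.Base(argv0)]
	return run, ok
}

// names returns the registered subcommand names in sorted order, similar in
// spirit to what a --list flag could print.
func names() []string {
	out := make([]string, 0, len(registry))
	for n := range registry {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println("registered:", names())
	// Simulated invocation names stand in for os.Args[0].
	for _, argv0 := range []string{"/usr/bin/boulder-ca", "boulder-va", "boulder-unknown"} {
		if run, ok := lookup(argv0); ok {
			run()
		} else {
			fmt.Printf("no subcommand registered for %q\n", filepath.Base(argv0))
		}
	}
}
```

Keying the lookup on the base name means a symlink anywhere on disk selects the same subcommand, and the sorted name list gives a stable set to compare against the installed symlinks.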