boulder

Commit Graph

Author	SHA1	Message	Date
Aaron Gable	23608e19c5	Simplify docker-compose network setup (#8214 ) Remove static IPs from services that can be reached by their service name. Remove consulnet and redisnet, and have the services which connected to those network connect directly to bouldernet instead. Instruct docker-compose to only dynamically allocate IPs from the upper half of the bouldernet subset, to avoid clashing with the few static IPs we still specify.	2025-05-30 13:23:27 -07:00
James Renken	648ab05b37	policy: Support IP address identifiers (#8173 ) Add `pa.validIP` to test IP address validity & absence from IANA reservations. Modify `pa.WillingToIssue` and `pa.WellFormedIdentifiers` to support IP address identifiers. Add a map of allowed identifier types to the `pa` config. Part of #8137	2025-05-14 13:49:51 -07:00
Samantha Frank	dfdf554f76	config: Use hex-encoding for HMACKey (#7950 )	2025-01-15 14:28:09 -05:00
James Renken	0a27cba9f4	WFE/nonce: Add NonceHMACKey field (#7793 ) Add a new WFE & nonce config field, `NonceHMACKey`, which uses the new `cmd.HMACKeyConfig` type. Deprecate the `NoncePrefixKey` config field. Generalize the error message when validating `HMACKeyConfig` in `config`. Remove the deprecated `UseDerivablePrefix` config field, which is no longer used anywhere. Part of #7632	2024-11-13 10:31:28 -05:00
Samantha Frank	d9046ae495	config: Improve comment for HMACKeyConfig and add TODOs (#7633 )	2024-07-25 16:16:58 -04:00
Samantha Frank	986c78a2b4	WFE: Reject new orders containing paused identifiers (#7599 ) Part of #7406 Fixes #7475	2024-07-25 13:46:40 -04:00
Phil Porada	30c6e592f7	sfe: Implement self-service frontend for account pausing/unpausing (#7500 ) Adds a new boulder component named `sfe` aka the Self-service FrontEnd which is dedicated to non-ACME related Subscriber functions. This change implements one such function which is a web interface and handlers for account unpausing. When paused, an ACME client receives a log line URL with a JWT parameter from the WFE. For the observant Subscriber, manually clicking the link opens their web browser and displays a page with a pre-filled HTML form. Upon clicking the form button, the SFE sends an HTTP POST back to itself and either validates the JWT and issues an RA gRPC request to unpause the account, or returns an HTML error page. The SFE and WFE should share a 32 byte seed value e.g. the output of `openssl rand -hex 16` which will be used as a go-jose symmetric signer using the HS256 algorithm. The SFE will check various [RFC 7519](https://datatracker.ietf.org/doc/html/rfc7519) claims on the JWT such as the `iss`, `aud`, `nbf`, `exp`, `iat`, and a custom `apiVersion` claim. The SFE should not yet be relied upon or deployed to staging/production environments. It is very much a work in progress, but this change is big enough as-is. Related to https://github.com/letsencrypt/boulder/issues/7406 Part of https://github.com/letsencrypt/boulder/issues/7499	2024-07-10 10:52:33 -04:00
Phil Porada	c1561b070b	Add a new remoteva binary (#7437 ) * Adds a new `remoteva` binary that takes a distinct configuration from the existing `boulder-va` * Removed the `boulder-remoteva` name registration from `boulder-va`. * Existing users of `boulder-remoteva` must either 1. laterally migrate to `boulder-va` which uses that same config, or 2. switch to using `remoteva` with a new config. Part of https://github.com/letsencrypt/boulder/issues/5294	2024-05-06 16:29:29 -04:00
Samantha	1d2dbbdf25	config: Update minimum TLS version from 1.2 to 1.3 (#7457 ) Set the minimum TLS version used for communication with gRPC, Redis, and Unbound to 1.3. Also remove deprecated `SecurityVersion` setting in `clientTransportCredentials` and `serverTransportCredentials`, as grpc-go now uses the settings provided by the `tls.Config`. The http-01 and tls-alpn-01challenges are not affected: - `939ac1be8f/va/http.go (L140-L157)` - `939ac1be8f/va/tlsalpn.go (L213-L217)`	2024-04-30 09:45:39 -07:00
Jacob Hoffman-Andrews	14a8378dd0	test: remove use of 10.88.88.88 in most places (#7270 ) Part of #7245. There are still a few places that use 10.88.88.88 that will be harder to remove. In particular, some of the Python integration tests start up their own HTTP servers that differ from challtestsrv in some important way (like timing out requests). Because challtestsrv already binds to 10.77.77.77:80, those test servers need a different IP address to bind to. We can probably solve that but I'll leave it for another PR.	2024-01-30 11:34:13 -08:00
Matthew McPherrin	cb5384dcd7	Add --addr and/or --debug-addr flags to all commands (#7175 ) Many services already have --addr and/or --debug-addr flags. However, it wasn't universal, so this PR adds flags to commands where they're not currently present. This makes it easier to use a shared config file but listen on different ports, for running multiple instances on a single host. The config options are made optional as well, and removed from config-next/.	2023-12-07 17:41:01 -08:00
Samantha	7068db96fe	redis: Add support for *redis.Ring shard configuration using SRV records (#7042 ) Part of #5545	2023-09-11 15:05:55 -04:00
Jacob Hoffman-Andrews	38fc840184	sa: refactor how metrics and logging are set up (#7031 ) This eliminates the need for a pair of accessors on `db.WrappedMap` that expose the underlying `sql.DB` and `borp.DbMap`. Fixes #6991	2023-08-08 09:51:23 -07:00
Aaron Gable	8d8fd3731b	Remove VA.DNSResolver (#7001 ) I have confirmed that this config field is not set in any deployment environment. Fixes https://github.com/letsencrypt/boulder/issues/6868	2023-07-13 17:56:41 -07:00
Samantha	124c4cc6f5	grpc/sa: Implement deep health checks (#6928 ) Add the necessary scaffolding for deep health checking of our various gRPC components. Each component implementation that also implements the grpc.checker interface will be checked periodically, and the health status of the component will be updated accordingly. Add the necessary methods to SA to implement the grpc.checker interface and register these new health checks with Consul. Additionally: - Update entry point script to check for ProxySQL readiness. - Increase the poll rate for gRPC Consul checks from 5s to 2s to help with DNS failures, due to check failures, on startup. - Change log level for Consul from INFO to ERROR to deal with noisy logs full of transport failures due to Consul gRPC checks firing before the SAs are up. Fixes #6878 Part of #6795	2023-06-12 13:58:53 -04:00
Matthew McPherrin	3aae67b8a9	Opentelemetry: Add option for public endpoints (#6867 ) This PR adds a new configuration block specifically for the otelhttp instrumentation. This block is separate from the existing "opentelemetry" configuration, and is only relevant when using otelhttp instrumentation. It does not share any codepath with the existing configuration, so it is at the top level to indicate which services it applies to. There's a bit of plumbing new configuration through. I've adopted the measured_http package to also set up opentelemetry instead of just metrics, which should hopefully allow any future changes to be smaller (just config & there) and more consistent between the wfe2 and ocsp responder. There's one option here now, which disables setting [otelhttp.WithPublicEndpoint](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#WithPublicEndpoint). This option is designed to do exactly what we want: Don't accept incoming spans as parents of the new span created in the server. Previously we had a setting to disable parent-based sampling to help with this problem, which doesn't really make sense anymore, so let's just remove it and simplify that setup path. The default of "false" is designed to be the safe option. It's set to True in the test/ configs for integration tests that use traces, and I expect we'll likely set it true in production eventually once the LBs are configured to handle tracing themselves. Fixes #6851	2023-05-12 15:34:34 -04:00
Samantha	310546a14e	VA: Support discovery of DNS resolvers via Consul (#6869 ) Deprecate `va.DNSResolver` in favor of backwards compatible `va.DNSProvider`. Fixes #6852	2023-05-12 12:54:31 -04:00
Jacob Hoffman-Andrews	ac4be89b56	grpc: add NoWaitForReady config field (#6850 ) Currently we set WaitForReady(true), which causes gRPC requests to not fail immediately if no backends are available, but instead wait until the timeout in case a backend does become available. The downside is that this behavior masks true connection errors. We'd like to turn it off. Fixes #6834	2023-05-09 16:16:44 -07:00
Samantha	c453ca0571	grpc: Deprecate clientNames field (#6870 ) - SRE removed in IN-8755 Fixes #6698	2023-05-08 14:49:27 -04:00
Samantha	487680629d	cmd: TLSConfig values should be string not *string (#6872 ) Fixes #6737	2023-05-08 13:21:42 -04:00
Phil Porada	17fb1b287f	cmd: Export prometheus metrics for TLS cert notBefore and notAfter fields (#6836 ) Export new prometheus metrics for the `notBefore` and `notAfter` fields to track internal certificate validity periods when calling the `Load()` method for a `*tls.Config`. Each metric is labeled with the `serial` field. ``` tlsconfig_notafter_seconds{serial="2152072875247971686"} 1.664821961e+09 tlsconfig_notbefore_seconds{serial="2152072875247971686"} 1.664821960e+09 ``` Fixes https://github.com/letsencrypt/boulder/issues/6829	2023-04-24 16:28:05 -04:00
Matthew McPherrin	0060e695b5	Introduce OpenTelemetry Tracing (#6750 ) Add a new shared config stanza which all boulder components can use to configure their Open Telemetry tracing. This allows components to specify where their traces should be sent, what their sampling ratio should be, and whether or not they should respect their parent's sampling decisions (so that web front-ends can ignore sampling info coming from outside our infrastructure). It's likely we'll need to evolve this configuration over time, but this is a good starting point. Add basic Open Telemetry setup to our existing cmd.StatsAndLogging helper, so that it gets initialized at the same time as our other observability helpers. This sets certain default fields on all traces/spans generated by the service. Currently these include the service name, the service version, and information about the telemetry SDK itself. In the future we'll likely augment this with information about the host and process. Finally, add instrumentation for the HTTP servers and grpc clients/servers. This gives us a starting point of being able to monitor Boulder, but is fairly minimal as this PR is already somewhat unwieldy: It's really only enough to understand that everything is wired up properly in the configuration. In subsequent work we'll enhance those spans with more data, and add more spans for things not automatically traced here. Fixes https://github.com/letsencrypt/boulder/issues/6361 --------- Co-authored-by: Aaron Gable <aaron@aarongable.com>	2023-04-21 10:46:59 -07:00
Matthew McPherrin	49851d7afd	Remove Beeline configuration (#6765 ) In a previous PR, #6733, this configuration was marked deprecated pending removal. Here is that removal.	2023-03-23 16:58:36 -04:00
Samantha	b2224eb4bc	config: Add validation tags to all configuration structs (#6674 ) - Require `letsencrypt/validator` package. - Add a framework for registering configuration structs and any custom validators for each Boulder component at `init()` time. - Add a `validate` subcommand which allows you to pass a `-component` name and `-config` file path. - Expose validation via exported utility functions `cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and `cmd.ValidateYAMLConfig()`. - Add unit test which validates all registered component configuration structs against test configuration files. Part of #6052	2023-03-21 14:08:03 -04:00
Matthew McPherrin	e1ed1a2ac2	Remove beeline tracing (#6733 ) Remove tracing using Beeline from Boulder. The only remnant left behind is the deprecated configuration, to ensure deployability. We had previously planned to swap in OpenTelemetry in a single PR, but that adds significant churn in a single change, so we're doing this as multiple steps that will each be significantly easier to reason about and review. Part of #6361	2023-03-14 15:14:27 -07:00
Samantha	8227052345	GRPC: Add TODO for Config.GRPCServerConfig.ClientNames (#6718 ) Part of #6698	2023-03-02 17:17:55 -05:00
Matthew McPherrin	391a59921b	Move cmd.ConfigDuration to config.Duration (#6705 ) We rely on the ratelimit/ package in CI to validate our ratelimit configurations. However, because that package relies on cmd/ just for cmd.ConfigDuration, many additional dependencies get pulled in. This refactors just that struct to a separate config package. This was done using Goland's automatic refactoring tooling, which also organized a few imports while it was touching them, keeping standard library, internal and external dependencies grouped.	2023-02-28 08:11:49 -08:00
Jacob Hoffman-Andrews	cd1bbc0d82	Tidy up integration test environment (#6668 ) Remove `example.com` domain name, which was used by the deleted OldTLS tests. Remove GODEBUG=x509sha1=1. Add a longer comment for the Consul DNS fallback in docker-compose.yml. Use the "dnsAuthority" field for all gRPC clients in config-next, instead of implicitly relying on the system DNS. This matches what we do in prod. Make "dnsAuthority" field of GRPCClientConfig mandatory whenever SRVLookup or SRVLookups is used. Make test/config/ocsp-responder.json use ServerAddress instead of SRVLookup, like the rest of test/config.	2023-02-16 09:33:24 -08:00
Samantha	d73125d8f6	WFE: Add custom balancer implementation which routes nonce redemption RPCs by prefix (#6618 ) Assign nonce prefixes for each nonce-service by taking the first eight characters of the the base64url encoded HMAC-SHA256 hash of the RPC listening address using a provided key. The provided key must be same across all boulder-wfe and nonce-service instances. - Add a custom `grpc-go` load balancer implementation (`nonce`) which can route nonce redemption RPC messages by matching the prefix to the derived prefix of the nonce-service instance which created it. - Modify the RPC client constructor to allow the operator to override the default load balancer implementation (`round_robin`). - Modify the `srv` RPC resolver to accept a comma separated list of targets to be resolved. - Remove unused nonce-service `-prefix` flag. Fixes #6404	2023-02-03 17:52:18 -05:00
Phil Porada	9d9a2dddcf	Rework VA PortConfig (#6619 ) Remove the PortConfig field from both the VA's config struct and from the NewValidationAuthorityImpl constructor. This config item is no longer used anywhere, and removing this prevents us from accidentally overriding the "Authorized Ports" (80 and 443) which are required by the Baseline Requirements. Unit tests are still able to override the httpPort and tlsPort fields of the ValidationAuthorityImpl. Fixes #3940	2023-01-30 17:03:33 -08:00
Aaron Gable	d9cb35c60c	Remove unused DBConnect config string (#6615 ) Neither our testing, staging, nor production configs use the DBConfig.DBConnect config value. Remove it. To connect to a database, you have to provide a connection URL. These URLs often contain sensitive information such as DB usernames and passwords, so we don't store them directly in our configs -- instead, we store paths to files which contain these strings, and provision those files via a separate mechanism. We maintained the ability to provide a URL directly in the config for the sake of easy testing, but have not used it for that purpose for some time now.	2023-01-27 13:10:52 -08:00
Samantha	0d6f8569c5	grpc/rocsp: Allow use of TLSv1.2 and TLSv1.3 (#6600 ) When we clamped our MaxVersion to TLS1.2, there wasn't any support for TLS1.3 yet. Allowing higher versions to be negotiated is good. Fixes #6580	2023-01-24 12:53:13 -08:00
Aaron Gable	46c8d66c31	bgrpc.NewServer: support multiple services (#6487 ) Turn bgrpc.NewServer into a builder-pattern, with a config-based initialization, multiple calls to Add to add new gRPC services, and a final call to Build to produce the start() and stop() functions which control server behavior. All calls are chainable to produce compact code in each component's main() function. This improves the process of creating a new gRPC server in three ways: 1) It avoids the need for generics/templating, which was slightly verbose. 2) It allows the set of services to be registered on this server to be known ahead of time. 3) It greatly streamlines adding multiple services to the same server, which we use today in the VA and will be using soon in the SA and CA. While we're here, add a new per-service config stanza to the GRPCServerConfig, so that individual services on the same server can have their own configuration. For now, only provide a "ClientNames" key, which will be used in a follow-up PR. Part of #6454	2022-11-04 13:26:42 -07:00
Samantha	9c12e58c7b	grpc: Allow static host override in client config (#6423 ) - Add a new gRPC client config field which overrides the dNSName checked in the certificate presented by the gRPC server. - Revert all test gRPC credentials to `<service>.boulder` - Revert all ClientNames in gRPC server configs to `<service>.boulder` - Set all gRPC clients in `test/config` to use `serverAddress` + `hostOverride` - Set all gRPC clients in `test/config-next` to use `srvLookup` + `hostOverride` - Rename incorrect SRV record for `ca` with port `9096` to `ca-ocsp` - Rename incorrect SRV record for `ca` with port `9106` to `ca-crl` Resolves #6424	2022-10-03 15:23:55 -07:00
Samantha	ffad58009e	grpc: Backend discovery improvements (#6394 ) - Fork the default `dns` resolver from `go-grpc` to add backend discovery via DNS SRV resource records. - Add new fields for SRV based discovery to `cmd.GRPCClientConfig` - Add new (optional) field `DNSAuthority` for specifying custom DNS server to `cmd.GRPCClientConfig` - Add a utility method to `cmd.GRPCClientConfig` to simplify target URI and host construction. With three schemes and `DNSAuthority` it makes more sense to handle all of this parsing and construction outside of the RPC client constructor. Resolves #6111	2022-09-23 13:11:59 -07:00
Jacob Hoffman-Andrews	db044a8822	log: fix spurious honeycomb warnings; improve stdout logger (#6364 ) Honeycomb was emitting logs directly to stderr like this: ``` WARN: Missing API Key. WARN: Dataset is ignored in favor of service name. Data will be sent to service name: boulder ``` Fix this by providing a fake API key and replacing "dataset" with "serviceName" in configs. Also add missing Honeycomb configs for crl-updater. For stdout-only logger, include checksums and escape newlines.	2022-09-14 11:25:02 -07:00
Jacob Hoffman-Andrews	f98d74c14d	log: emit warnings and errors on stderr (#6325 ) Debug and Info messages still go to stdout. Fix the CAA integration test, which asserted that stderr should be empty when caa-log-checker finds a problem. That used to be the case because we never logged to stderr, but now it is the case. Update the logging docs. Fixes #6324	2022-08-29 15:00:55 -07:00
Aaron Gable	9c197e1f43	Use io and os instead of deprecated ioutil (#6286 ) The iotuil package has been deprecated since go1.16; the various functions it provided now exist in the os and io packages. Replace all instances of ioutil with either io or os, as appropriate.	2022-08-10 13:30:17 -07:00
Samantha	576b6777b5	grpc: Implement a static multiple IP address gRPC resolver (#6270 ) - Implement a static resolver for the gPRC dialer under the scheme `static:///` which allows the dialer to resolve a backend from a static list of IPv4/IPv6 addresses passed via the existing JSON config. - Add config key `serverAddresses` to the `GRPCClientConfig` which, when populated, enables static IP resolution of gRPC server backends. - Set `config-next` to use static gRPC backend resolution for all SA clients. - Generate a new SA certificate which adds `10.77.77.77` and `10.88.88.88` to the SANs. Resolves #6255	2022-08-05 10:20:57 -07:00
Aaron Gable	305ef9cce9	Improve error checking paradigm (#5920 ) We have decided that we don't like the if err := call(); err != nil syntax, because it creates confusing scopes, but we have not cleaned up all existing instances of that syntax. However, we have now found a case where that syntax enables a bug: It caused readers to believe that a later err = call() statement was assigning to an already-declared err in the local scope, when in fact it was assigning to an already-declared err in the parent scope of a closure. This caused our ineffassign and staticcheck linters to be unable to analyze the lifetime of the err variable, and so they did not complain when we never checked the actual value of that error. This change standardizes on the two-line error checking syntax everywhere, so that we can more easily ensure that our linters are correctly analyzing all error assignments.	2022-02-01 14:42:43 -07:00
Samantha	f69b57e0e1	Make DB client initialization uniform and stop setting 'READ-UNCOMMITTED' (#5741 ) Boulder components initialize their gorp and gorp-less (non-wrapped) database clients via two new SA helpers. These helpers handle client construction, database metric initialization, and (for gorp only) debug logging setup. Removes transaction isolation parameter `'READ-UNCOMMITTED'` from all database connections. Fixes #5715 Fixes #5889	2022-01-31 13:34:23 -08:00
Aaron Gable	18389c9024	Remove dead code (#5893 ) Running an older version (v0.0.1-2020.1.4) of `staticcheck` in whole-program mode (`staticcheck --unused.whole-program=true -- ./...`) finds various instances of unused code which don't normally show up as CI issues. I've used this to find and remove a large chunk of the unused code, to pave the way for additional large deletions accompanying the WFE1 removal. Part of #5681	2022-01-19 12:23:06 -08:00
Aaron Gable	a6ad023c6a	Tracing: don't exclude gRPC client spans (#5664 ) We'd prefer not to send these, as they don't contain a significant amount of relevant information, and they nearly double the total number of spans sent. However, not sending them is worse: the resulting server spans created on the other side of the connection all have parent span IDs that "don't exist" (are never sent to the tracing service), breaking all ability of the tracing system to infer causality. The ideal solution here would be to customize the client wrapper code to not generate these client spans at all, but that requires upstream changes that may take some time. In the mean time, let's at least do this.	2021-09-21 13:44:30 -06:00
J.C. Jones	7b31bdb30a	Add read-only dbConns to SQLStorageAuthority and OCSPUpdater (#5555 ) This changeset adds a second DB connect string for the SA for use in read-only queries that are not themselves dependencies for read-write queries. In other words, this is attempting to only catch things like rate-limit `SELECT`s and other coarse-counting, so we can potentially move those read queries off the read-write primary database. It also adds a second DB connect string to the OCSP Updater. This is a little trickier, as the subsequent `UPDATE`s _are_ dependent on the output of the `SELECT`, but in this case it's operating on data batches, and a few seconds' replication latency are several orders of magnitude below the threshold for update frequency, so any certificates that aren't caught on run `n` can be caught on run `n+1`. Since we export DB metrics to Prometheus, this also refactors `InitDBMetrics` to take a DB Address (host:port tuple) and User out of the DB connection DSN and include those as labels in the metrics. Fixes #5550 Fixes #4985	2021-08-02 11:21:34 -07:00
Aaron Gable	1f0f59883c	Honeycomb: emit sample rate 1 instead of 0 (#5448 ) The sampling rate integer is used by the span collector to estimate "how many spans does this span actually represent". This allows accurate volume comparisons: for example, if you sample successful requests at a rate of 1/100 and error requests at a rate of 1/10, the trace query interface will know to scale its query results by those respective values in order to arrive at accurate error rate estimates. Previously, this code was returning a sample rate integer of 0 to indicate that the span was selected for sampling due to an extraordinary circumstance. This was wrong. This change updates the sample rate int to be 1, indicating that every such span which exhibited this feature was sampled, and represents only itself.	2021-06-01 08:42:14 -07:00
Aaron Gable	02b6ea1489	Honeycomb: don't transmit gRPC client spans (#5446 ) Add a test to the Honeycomb SamplerHook which never sends spans which have a "meta.type" of "grpc_client". This field and value are set automatically by the Honeycomb gRPC client interceptor, and can't be set by application code (any fields set by application code have "app." prepended to their name). Never sending these spans reduces our visibility into in-datacenter network latency, but also reduces the number of spans sent by roughly 50%.	2021-05-28 09:36:01 -07:00
Aaron Gable	e3d194f4b0	Honeycomb: Use a deterministic SamplerHook (#5433 ) Switch from using the honeycomb beeline's built-in sampling to a sampler hook which bases its sampling decisions on a hash of the trace ID. This allows us to do "deterministic" sampling, where every span in a given trace will either be sent or not (since the trace ID is the same across all spans in a trace), giving us more complete traces. This preserves the same simple (single integer) configuration of the sample rate. The sample rate can be set differently for different boulder components (e.g. 1 at the WFE, 100 at the RA, and 1000 at the nonce-service), but the sampling rate denominator should only increase towards the leaves of a gRPC request path.	2021-05-27 13:13:54 -07:00
Aaron Gable	773c98875b	Remove PasswordConfig's in-config option (#5434 ) We never really want to be using in-config passwords anyway, so remove this option. Fixes #5426	2021-05-26 10:14:13 -07:00
Aaron Gable	9abb39d4d6	Honeycomb integration proof-of-concept (#5408 ) Add Honeycomb tracing to all Boulder components which act as HTTP servers, gRPC servers, or gRPC clients. Add many values which we currently emit to logs to the trace spans. Add a way to configure the Honeycomb integration to our config files, and by default configure all of our tests to "mute" (send nothing). Followup changes will refine the configuration, attempt to reduce the new dependency load, and introduce better sampling. Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218	2021-05-24 16:13:08 -07:00
Samantha	5a92926b0c	Remove dbconfig migration deployability code (#5348 ) Default boulder code paths to exclusively use the `db` config key Fixes #5338	2021-03-18 16:41:15 -07:00

1 2 3 4

182 Commits