boulder

Commit Graph

Author	SHA1	Message	Date
Jacob Hoffman-Andrews	e0e5a17899	crl: add cache control headers (#8011 ) The crl-storer passes along Cache-Control and Expires from the crl-updater (because the crl-updater knows the UpdatePeriod). The crl-updater calculates the Expires header based on when it expects to update the CRL, plus a margin of error. Fixes #8004	2025-02-13 14:20:29 -08:00
Jacob Hoffman-Andrews	eda496606d	crl-updater: split temporal/explicit sharding by serial (#7990 ) When we turn on explicit sharding, we'll change the CA serial prefix, so we can know that all issuance from the new prefixes uses explicit sharding, and all issuance from the old prefixes uses temporal sharding. This lets us avoid putting a revoked cert in two different CRL shards (the temporal one and the explicit one). To achieve this, the crl-updater gets a list of temporally sharded serial prefixes. When it queries the `certificateStatus` table by date (`GetRevokedCerts`), it will filter out explicitly sharded certificates: those that don't have their prefix on the list. Part of #7094	2025-02-04 11:45:46 -05:00
Jacob Hoffman-Andrews	a8074d2e9d	test: add more testing for CRL revocation (#7957 ) In revocation_test.go, fetch all CRLs, and look for revoked certificates on both CRLs and OCSP. Make s3-test-srv listen on all interfaces, so the CRL URLs in the CA config work. Add IssuerNameIDs to the CRL URLs in ca.json, to match how those CRLs are uploaded to S3. Make TestRevocation parallel. Speedup from ~60s to ~3s. Increase ocsp-responder's allowed parallelism to account for parallel test. Also, add "maxInflightSignings" to config/ since it's in prod. "maxSigningWaiters" is not yet in prod, so don't move that field. Add a mutex around running crl-updater, and decrease the log level so errors stand out more when they happen.	2025-01-23 18:49:55 -08:00
Aaron Gable	6ae6aa8e90	Dynamically generate grpc-creds at integration test startup (#7477 ) The summary here is: - Move test/cert-ceremonies to test/certs - Move .hierarchy (generated by the above) to test/certs/webpki - Remove our mapping of .hierarchy to /hierarchy inside docker - Move test/grpc-creds to test/certs/ipki - Unify the generation of both test/certs/webpki and test/certs/ipki into a single script at test/certs/generate.sh - Make that script the entrypoint of a new docker compose service - Have t.sh and tn.sh invoke that service to ensure keys and certs are created before tests run No production changes are necessary, the config changes here are just for testing purposes. Part of https://github.com/letsencrypt/boulder/issues/7476	2024-05-15 11:31:23 -04:00
Aaron Gable	94d14689bf	Implement unpredictable issuance from similar intermediates (#7418 ) Replace the CA's "useForRSA" and "useForECDSA" config keys with a single "active" boolean. When the CA starts up, all active RSA issuers will be used to issue precerts with RSA pubkeys, and all ECDSA issuers will be used to issue precerts with ECDSA pubkeys (if the ECDSAForAll flag is true; otherwise just those that are on the allow-list). All "inactive" issuers can still issue OCSP responses, CRLs, and (notably) final certificates. Instead of using the "useForRSA" and "useForECDSA" flags, plus implicit config ordering, to determine which issuer to use to handle a given issuance, simply use the issuer's public key algorithm to determine which issuances it should be handling. All implicit ordering considerations are removed, because the "active" certificates now just form a pool that is sampled from randomly. To facilitate this, update some unit and integration tests to be more flexible and try multiple potential issuing intermediates, particularly when constructing OCSP requests. For this change to be safe to deploy with no user-visible behavior changes, the CA configs must contain: - Exactly one RSA-keyed intermediate with "useForRSALeaves" set to true; and - Exactly one ECDSA-keyed intermediate with "useForECDSALeaves" set to true. If the configs contain more than one intermediate meeting one of the bullets above, then randomized issuance will begin immediately. Fixes https://github.com/letsencrypt/boulder/issues/7291 Fixes https://github.com/letsencrypt/boulder/issues/7290	2024-04-18 10:00:38 -07:00
Aaron Gable	327f96d281	Update integration test hierarchy for the modern era (#7411 ) Update the hierarchy which the integration tests auto-generate inside the ./hierarchy folder to include three intermediates of each key type, two to be actively loaded and one to be held in reserve. To facilitate this: - Update the generation script to loop, rather than hard-coding each intermediate we want - Improve the filenames of the generated hierarchy to be more readable - Replace the WFE's AIA endpoint with a thin aia-test-srv so that we don't have to have NameIDs hardcoded in our ca.json configs Having this new hierarchy will make it easier for our integration tests to validate that new features like "unpredictable issuance" are working correctly. Part of https://github.com/letsencrypt/boulder/issues/729	2024-04-08 14:06:00 -07:00
Matthew McPherrin	cb5384dcd7	Add --addr and/or --debug-addr flags to all commands (#7175 ) Many services already have --addr and/or --debug-addr flags. However, it wasn't universal, so this PR adds flags to commands where they're not currently present. This makes it easier to use a shared config file but listen on different ports, for running multiple instances on a single host. The config options are made optional as well, and removed from config-next/.	2023-12-07 17:41:01 -08:00
Aaron Gable	102b447e8d	Smoother scheduling and leasing for crl-updater (#7010 ) Overhaul crl-updater's default (i.e. non-runOnce) behavior to update individual CRL shards continuously, rather than updating all shards in a large batch. To accomplish this, it spins up one goroutine for each shard of each issuer this updater is responsible for. Each goroutine is solely responsible for its assigned shard. It sleeps for a random amount of time (to stagger their starts), then begins a ticker to wake up every updateInterval and re-issue its shard. As part of this change, refactor updater.go into three separate files (batch.go, continuous.go, and updater.go) containing functions dedicated to single-run batch processing, long-running continuous processing, and shared helpers, respectively. IN-9475 tracks the deprecation of the `updateOffset` config key. The other configuration changes in this PR do not require production changes. Fixes https://github.com/letsencrypt/boulder/issues/7023	2023-09-08 09:16:15 -07:00
Aaron Gable	6a450a2272	Improve CRL shard leasing (#7030 ) Simplify the index-picking logic in the SA's leaseOldestCrlShard method. Specifically, more clearly separate it into "missing" and "non-missing" cases, which require entirely different logic: picking a random missing shard, or picking the oldest unleased shard, respectively. Also change the UpdateCRLShard method to "unlease" shards when they're updated. This allows the crl-updater to run as quickly as it likes, while still ensuring that multiple instances do not step on each other's toes. The config change for shardWidth and lookbackPeriod instead of certificateLifetime has been deployed in prod since IN-8445. The config change changing the shardWidth is just so that the tests neither produce a bazillion shards, nor have to do a bazillion SA queries for each chunk within a shard, improving the readability of test logs. Part of https://github.com/letsencrypt/boulder/issues/7023	2023-08-08 17:05:00 -07:00
Aaron Gable	9a4f0ca678	Deprecate LeaseCRLShards feature (#7009 ) This feature flag is enabled in both staging and prod.	2023-08-07 15:17:00 -07:00
Aaron Gable	908421bb98	crl-updater: lease CRL shards to prevent races (#6941 ) Add a new feature flag, LeaseCRLShards, which controls certain aspects of crl-updater's behavior. When this flag is enabled, crl-updater calls the new SA.LeaseCRLShard method before beginning work on a shard. This prevents it from stepping on the toes of another crl-updater instance which may be working on the same shard. This is important to prevent two competing instances from accidentally updating a CRL's Number (which is an integer representation of its thisUpdate timestamp) backwards, which would be a compliance violation. When this flag is enabled, crl-updater also calls the new SA.UpdateCRLShard method after finishing work on a shard. In the future, additional work will be done to make crl-updater use the "give me the oldest available shard" mode of the LeaseCRLShard method. Fixes https://github.com/letsencrypt/boulder/issues/6897	2023-07-19 15:11:16 -07:00
Aaron Gable	fe523f142d	crl-updater: retry failed shards (#6907 ) Add per-shard exponential backoff and retry to crl-updater. Each individual CRL shard will be retried up to MaxAttempts (default 1) times, with exponential backoff starting at 1 second and maxing out at 1 minute between each attempt. This can effectively reduce the parallelism of crl-updater: while a goroutine is sleeping between attempts of a failing shard, it is not doing work on another shard. This is a desirable feature, since it means that crl-updater gently reduces the total load it places on the network and database when shards start to fail. Setting this new config parameter is tracked in IN-9140 Fixes https://github.com/letsencrypt/boulder/issues/6895	2023-05-22 12:59:09 -07:00
Matthew McPherrin	b7d9f8c2e3	In config-next/, opentelemetry -> openTelemetry for consistency (#6888 ) In configs, opentelemetry -> openTelemetry As pointed out in review of #6867, these should match the case of their corresponding Go identifiers for consistency. JSON keys are case-insensitive in Go (part of why we've got a fork in go-jose), so this change should have no functional impact.	2023-05-15 17:07:29 -04:00
Samantha	19c5244088	test: Use consul hostname instead of IP for dnsAuthority (#6883 ) Standardize on hostnames for dnsAuthority to match production. Related to #6869	2023-05-11 14:13:53 -07:00
Jacob Hoffman-Andrews	ac4be89b56	grpc: add NoWaitForReady config field (#6850 ) Currently we set WaitForReady(true), which causes gRPC requests to not fail immediately if no backends are available, but instead wait until the timeout in case a backend does become available. The downside is that this behavior masks true connection errors. We'd like to turn it off. Fixes #6834	2023-05-09 16:16:44 -07:00
Matthew McPherrin	8427245675	OTel Integration test using jaeger (#6842 ) This adds Jaeger's all-in-one dev container (with no persistent storage) to boulder's dev docker-compose. It configures config-next/ to send all traces there. A new integration test creates an account and issues a cert, then verifies the trace contains some set of expected spans. This test found that async finalize broke spans, so I fixed that and a few related spots where we make a new context.	2023-05-05 10:41:29 -04:00
Matthew McPherrin	05c9106eba	lints: Consistently format JSON configuration files (#6755 ) - Consistently format existing test JSON config files - Add a small Python script which loads and dumps JSON files - Add CI JSON lint test to CI --------- Co-authored-by: Aaron Gable <aaron@aarongable.com>	2023-03-20 18:11:19 -04:00
Matthew McPherrin	e1ed1a2ac2	Remove beeline tracing (#6733 ) Remove tracing using Beeline from Boulder. The only remnant left behind is the deprecated configuration, to ensure deployability. We had previously planned to swap in OpenTelemetry in a single PR, but that adds significant churn in a single change, so we're doing this as multiple steps that will each be significantly easier to reason about and review. Part of #6361	2023-03-14 15:14:27 -07:00
Jacob Hoffman-Andrews	cd1bbc0d82	Tidy up integration test environment (#6668 ) Remove `example.com` domain name, which was used by the deleted OldTLS tests. Remove GODEBUG=x509sha1=1. Add a longer comment for the Consul DNS fallback in docker-compose.yml. Use the "dnsAuthority" field for all gRPC clients in config-next, instead of implicitly relying on the system DNS. This matches what we do in prod. Make "dnsAuthority" field of GRPCClientConfig mandatory whenever SRVLookup or SRVLookups is used. Make test/config/ocsp-responder.json use ServerAddress instead of SRVLookup, like the rest of test/config.	2023-02-16 09:33:24 -08:00
Aaron Gable	4466c953de	CA: Expose all gRPC services on single address (#6495 ) Now that we have the ability to easily add multiple gRPC services to the same server, and control access to each service individually, use that capability to expose the CA's CertificateAuthority, OCSPGenerator, and CRLGenerator services all on the same address/port. This will make establishing connections to the CA easier, but no less secure. Part of #6448	2022-11-08 15:28:59 -08:00
Aaron Gable	6efd941e3c	Stabilize CRL shard boundaries (#6445 ) Add two new config keys to the crl-updater: * shardWidth, which controls the width of the chunks that we divide all of time into, with a default value of "16h" (approximately the same as today's shard width derived from 128 shards covering 90 days); and * lookbackPeriod, which controls the amount of already-expired certificates that should be included in our CRLs to ensure that even certificates which are revoked immediately before they expire still show up in at least one CRL, with a default value of "24h" (approximately the same as today's lookback period derived from our run frequency of 6h). Use these two new values to change the way CRL shards are computed. Previously, we would compute the total time we care about based on the configured certificate lifetime (to determine how far forward to look) and the configured update period (to determine how far back to look), and then divide that time evenly by the number of shards. However, this method had two fatal flaws. First, if the certificate lifetime is configured incorrectly, then the CRL updater will fail to query the database for some certs that should be included in the CRLs. Second, if the update period is changed, this would change the lookback period, which in turn would change the shard width, causing all CRL entries to suddenly change which shard they're in. Instead, first compute all chunk locations based only on the shard width and number of shards. Then determine which chunks we need to care about based on the configured lookback period and by querying the database for the farthest-future expiration, to ensure we cover all extant certificates. This may mean that more than one chunk of time will get mapped to a single shard, but that's okay -- each chunk will remain mapped to the same shard for the whole time we care about it. Fixes #6438 Fixes #6440	2022-10-27 15:59:48 -07:00
Samantha	9c12e58c7b	grpc: Allow static host override in client config (#6423 ) - Add a new gRPC client config field which overrides the dNSName checked in the certificate presented by the gRPC server. - Revert all test gRPC credentials to `<service>.boulder` - Revert all ClientNames in gRPC server configs to `<service>.boulder` - Set all gRPC clients in `test/config` to use `serverAddress` + `hostOverride` - Set all gRPC clients in `test/config-next` to use `srvLookup` + `hostOverride` - Rename incorrect SRV record for `ca` with port `9096` to `ca-ocsp` - Rename incorrect SRV record for `ca` with port `9106` to `ca-crl` Resolves #6424	2022-10-03 15:23:55 -07:00
Samantha	90eb90bdbe	test: Replace sd-test-srv with consul (#6389 ) - Add a dedicated Consul container - Replace `sd-test-srv` with Consul - Add documentation for configuring Consul - Re-issue all gRPC credentials for `<service-name>.service.consul` Part of #6111	2022-09-19 16:13:53 -07:00
Jacob Hoffman-Andrews	db044a8822	log: fix spurious honeycomb warnings; improve stdout logger (#6364 ) Honeycomb was emitting logs directly to stderr like this: ``` WARN: Missing API Key. WARN: Dataset is ignored in favor of service name. Data will be sent to service name: boulder ``` Fix this by providing a fake API key and replacing "dataset" with "serviceName" in configs. Also add missing Honeycomb configs for crl-updater. For stdout-only logger, include checksums and escape newlines.	2022-09-14 11:25:02 -07:00
Jacob Hoffman-Andrews	dd1c52573e	log: allow logging to stdout/stderr instead of syslog (#6307 ) Right now, Boulder expects to be able to connect to syslog, and panics if it's not available. We'd like to be able to log to stdout/stderr as a replacement for syslog. - Add a detailed timestamp (down to microseconds, same as we collect in prod via syslog). - Remove the escape codes for colorizing output. - Report the severity level numerically rather than with a letter prefix. Add locking for stdout/stderr and syslog logs. Neither the [syslog] package nor the [os] package document concurrency-safety, and the Go rule is: if it's not documented to be concurrent-safe, it's not. Notably the [log.Logger] package is documented to be concurrent-safe, and a look at its implementation shows it uses a Mutex internally. Remove places that use the singleton `blog.Get()`, and instead pass through a logger from main in all the places that need it. [syslog]: https://pkg.go.dev/log/syslog [os]: https://pkg.go.dev/os [log.Logger]: https://pkg.go.dev/log#Logger	2022-08-29 06:19:22 -07:00
Aaron Gable	6a9bb399f7	Create new crl-storer service (#6264 ) Create a new crl-storer service, which receives CRL shards via gRPC and uploads them to an S3 bucket. It ignores AWS SDK configuration in the usual places, in favor of configuration from our standard JSON service config files. It ensures that the CRLs it receives parse and are signed by the appropriate issuer before uploading them. Integrate crl-updater with the new service. It streams bytes to the crl-storer as it receives them from the CA, without performing any checking at the same time. This new functionality is disabled if the crl-updater does not have a config stanza instructing it how to connect to the crl-storer. Finally, add a new test component, the s3-test-srv. This acts similarly to the existing mail-test-srv: it receives requests, stores information about them, and exposes that information for later querying by the integration test. The integration test uses this to ensure that a newly-revoked certificate does show up in the next generation of CRLs produced. Fixes #6162	2022-08-08 16:22:48 -07:00
Samantha	576b6777b5	grpc: Implement a static multiple IP address gRPC resolver (#6270 ) - Implement a static resolver for the gRPC dialer under the scheme `static:///` which allows the dialer to resolve a backend from a static list of IPv4/IPv6 addresses passed via the existing JSON config. - Add config key `serverAddresses` to the `GRPCClientConfig` which, when populated, enables static IP resolution of gRPC server backends. - Set `config-next` to use static gRPC backend resolution for all SA clients. - Generate a new SA certificate which adds `10.77.77.77` and `10.88.88.88` to the SANs. Resolves #6255	2022-08-05 10:20:57 -07:00
Aaron Gable	694d73d67b	crl-updater: add UpdateOffset config to run on a schedule (#6260 ) Add a new config key `UpdateOffset` to crl-updater, which causes it to run on a regular schedule rather than running immediately upon startup and then every `UpdatePeriod` after that. It is safe for this new config key to be omitted and take the default zero value. Also add a new command line flag `runOnce` to crl-updater which causes it to immediately run a single time and then exit, rather than running continuously as a daemon. This will be useful for integration tests and emergency situations. Part of #6163	2022-07-29 13:30:16 -07:00
Aaron Gable	436061fb35	CRL: Create crl-updater service (#6212 ) Create a new service named crl-updater. It is responsible for maintaining the full set of CRLs we issue: one "full and complete" CRL for each currently-active Issuer, split into a number of "shards" which are essentially CRLs with arbitrary scopes. The crl-updater is modeled after the ocsp-updater: it is a long-running standalone service that wakes up periodically, does a large amount of work in parallel, and then sleeps. The period at which it wakes to do work is configurable. Unlike the ocsp-responder, it does all of its work every time it wakes, so we expect to set the update frequency at 6-24 hours. Maintaining CRL scopes is done statelessly. Every certificate belongs to a specific "bucket", given its notAfter date. This mapping is generally unchanging over the life of the certificate, so revoked certificate entries will not be moving between shards upon every update. The only exception is if we change the number of shards, in which case all of the bucket boundaries will be recomputed. For more details, see the comment on `getShardBoundaries`. It uses the new SA.GetRevokedCerts method to collect all of the revoked certificates whose notAfter timestamps fall within the boundaries of each shard's time-bucket. It uses the new CA.GenerateCRL method to sign the CRLs. In the future, it will send signed CRLs to the crl-storer to be persisted outside our infrastructure. Fixes #6163	2022-07-08 09:34:51 -07:00

29 Commits