boulder


Author	SHA1	Message	Date
Jacob Hoffman-Andrews	0a726370b9	crl/updater: fix lookback period (#8072 ) We were adding the lookback period to `clk.Now()` but should have been subtracting it. Includes a unittest, which I've verified fails against the pre-fix code.	2025-03-18 10:39:29 -07:00
Jacob Hoffman-Andrews	e0e5a17899	crl: add cache control headers (#8011 ) The crl-storer passes along Cache-Control and Expires from the crl-updater (because the crl-updater knows the UpdatePeriod). The crl-updater calculates the Expires header based on when it expects to update the CRL, plus a margin of error. Fixes #8004	2025-02-13 14:20:29 -08:00
Jacob Hoffman-Andrews	eda496606d	crl-updater: split temporal/explicit sharding by serial (#7990 ) When we turn on explicit sharding, we'll change the CA serial prefix, so we can know that all issuance from the new prefixes uses explicit sharding, and all issuance from the old prefixes uses temporal sharding. This lets us avoid putting a revoked cert in two different CRL shards (the temporal one and the explicit one). To achieve this, the crl-updater gets a list of temporally sharded serial prefixes. When it queries the `certificateStatus` table by date (`GetRevokedCerts`), it will filter out explicitly sharded certificates: those that don't have their prefix on the list. Part of #7094	2025-02-04 11:45:46 -05:00
Jacob Hoffman-Andrews	e0221b6bbe	crl-updater: query by explicit shard too (#7973 ) Add querying by explicit shard (SA.GetRevokedCertsByShard) in addition to querying by temporal shard (SA.GetRevokedCerts). Merge results from both kinds of shard. De-duplicate by serial within a shard, because the same certificate could wind up in a temporal shard that matches its explicit shard. When de-duplicating, validate that revocation reasons are the same or (very unlikely) represent a re-revocation based on demonstrating key compromise. This can happen because the two different SA queries occur at slightly different times. Add unit testing that CRL entries make it through the whole pipeline from SA, to CA, to uploader. Rename some types in the unittest to be more accessible. Tweak a comment in SA.UpdateRevokedCertificate to make it clear that status _and_ reason are critical for re-revocation. Note: This GetRevokedCertsByShard code path will always return zero certificates right now, because nothing is writing to the `revokedCertificates` table. Writing to that table is gated on certificates having CRL URLs in them, which is not yet implemented (and will be config-gated). Part of #7094	2025-01-27 10:11:09 -08:00
Aaron Gable	46fc4c25ab	Re-enable wastedassign linter (#7788 ) Fixes https://github.com/letsencrypt/boulder/issues/6202	2024-11-05 07:45:37 -08:00
Aaron Gable	89213f9214	Use generic types for gRPC stream implementations (#7501 ) Update the version of protoc-gen-go-grpc that we use to generate Go gRPC code from our proto files, and update the versions of other gRPC tools and libraries that we use to match. Turn on the new `use_generic_streams` code generation flag to change how protoc-gen-go-grpc generates implementations of our streaming methods, from creating a wholly independent implementation for every stream to using shared generic implementations. Take advantage of this code-sharing to remove our SA "wrapper" methods, now that they have truly the same signature as the SARO methods which they wrap. Also remove all references to the old-style stream names (e.g. foopb.FooService_BarMethodClient) and replace them with the new underlying generic names, for the sake of consistency. Finally, also remove a few custom stream test mocks, replacing them with the generic mocks.ServerStreamClient. Note that this PR does not change the names in //mocks/sa.go, to avoid conflicts with work happening in the pursuit of https://github.com/letsencrypt/boulder/issues/7476. Note also that this PR updates the version of protoc-gen-go-grpc that we use to a specific commit. This is because, although a new release of grpc-go itself has been cut, the codegen binary is a separate Go module with its own releases, and it hasn't had a new release cut yet. Tracking for that is in https://github.com/grpc/grpc-go/issues/7030.	2024-05-24 13:54:25 -07:00
Aaron Gable	4663b9898e	Use custom mocks instead of mocks.StorageAuthority (#7494 ) Replace "mocks.StorageAuthority" with "sapb.StorageAuthorityClient" in our test mocks. This improves them by removing implementations of the methods the tests don't actually need, instead of inheriting lots of extraneous methods from the huge and cumbersome mocks.StorageAuthority. This reduces our usage of mocks.StorageAuthority to only the WFE tests (which create one in the frequently-used setup() function), which will make refactoring those mocks in the pursuit of https://github.com/letsencrypt/boulder/issues/7476 much easier. Part of https://github.com/letsencrypt/boulder/issues/7476	2024-05-21 09:16:17 -07:00
Aaron Gable	1d31a22245	Export crl shard calculation code for future use (#7127 ) Make crl.GetChunkAtTime an exported function, so that it can be used by the RA in a future PR without making that PR bigger and harder to review. Also move it from being a method to an independent function which takes two new arguments to compensate for the loss of its receiver. Also move some tests from batch_test to updater_test, where they should have been in the first place. Part of https://github.com/letsencrypt/boulder/issues/7094	2023-11-02 11:21:01 -07:00
Aaron Gable	102b447e8d	Smoother scheduling and leasing for crl-updater (#7010 ) Overhaul crl-updater's default (i.e. non-runOnce) behavior to update individual CRL shards continuously, rather than updating all shards in a large batch. To accomplish this, it spins up one goroutine for each shard of each issuer this updater is responsible for. Each goroutine is solely responsible for its assigned shard. It sleeps for a random amount of time (to stagger their starts), then begins a ticker to wake up every updateInterval and re-issue its shard. As part of this change, refactor updater.go into three separate files (batch.go, continuous.go, and updater.go) containing functions dedicated to single-run batch processing, long-running continuous processing, and shared helpers, respectively. IN-9475 tracks the deprecation of the `updateOffset` config key. The other configuration changes in this PR do not require production changes. Fixes https://github.com/letsencrypt/boulder/issues/7023	2023-09-08 09:16:15 -07:00
Aaron Gable	9a4f0ca678	Deprecate LeaseCRLShards feature (#7009 ) This feature flag is enabled in both staging and prod.	2023-08-07 15:17:00 -07:00
Aaron Gable	63319d8cd0	crl-updater: duplicate shard 0 as shard NumShards (#7008 ) When crl-updater produces a CRL with shard index 0, have it also produce an identical CRL with shard index NumShards (e.g. 128). This is the first step towards having it only produce shards numbered 1 through NumShards, i.e. transition from using 0-indexing to 1-indexing. We want to do this because various aspects of Golang and gRPC cannot tell the difference between "this struct/message has no shard index set" and "this struct/message has shard index 0 set". Part of https://github.com/letsencrypt/boulder/issues/7007	2023-07-26 15:32:20 -07:00
Aaron Gable	908421bb98	crl-updater: lease CRL shards to prevent races (#6941 ) Add a new feature flag, LeaseCRLShards, which controls certain aspects of crl-updater's behavior. When this flag is enabled, crl-updater calls the new SA.LeaseCRLShard method before beginning work on a shard. This prevents it from stepping on the toes of another crl-updater instance which may be working on the same shard. This is important to prevent two competing instances from accidentally updating a CRL's Number (which is an integer representation of its thisUpdate timestamp) backwards, which would be a compliance violation. When this flag is enabled, crl-updater also calls the new SA.UpdateCRLShard method after finishing work on a shard. In the future, additional work will be done to make crl-updater use the "give me the oldest available shard" mode of the LeaseCRLShard method. Fixes https://github.com/letsencrypt/boulder/issues/6897	2023-07-19 15:11:16 -07:00
Aaron Gable	fe523f142d	crl-updater: retry failed shards (#6907 ) Add per-shard exponential backoff and retry to crl-updater. Each individual CRL shard will be retried up to MaxAttempts (default 1) times, with exponential backoff starting at 1 second and maxing out at 1 minute between each attempt. This can effectively reduce the parallelism of crl-updater: while a goroutine is sleeping between attempts of a failing shard, it is not doing work on another shard. This is a desirable feature, since it means that crl-updater gently reduces the total load it places on the network and database when shards start to fail. Setting this new config parameter is tracked in IN-9140 Fixes https://github.com/letsencrypt/boulder/issues/6895	2023-05-22 12:59:09 -07:00
Aaron Gable	ba34ac6b6e	Use read-only SA clients in wfe, ocsp, and crl (#6484 ) In the WFE, ocsp-responder, and crl-updater, switch from using StorageAuthorityClients to StorageAuthorityReadOnlyClients. This ensures that these services cannot call methods which write to our database. Fixes #6454	2022-12-02 13:48:28 -08:00
Aaron Gable	6efd941e3c	Stabilize CRL shard boundaries (#6445 ) Add two new config keys to the crl-updater: * shardWidth, which controls the width of the chunks that we divide all of time into, with a default value of "16h" (approximately the same as today's shard width derived from 128 shards covering 90 days); and * lookbackPeriod, which controls the amount of already-expired certificates that should be included in our CRLs to ensure that even certificates which are revoked immediately before they expire still show up in at least one CRL, with a default value of "24h" (approximately the same as today's lookback period derived from our run frequency of 6h). Use these two new values to change the way CRL shards are computed. Previously, we would compute the total time we care about based on the configured certificate lifetime (to determine how far forward to look) and the configured update period (to determine how far back to look), and then divide that time evenly by the number of shards. However, this method had two fatal flaws. First, if the certificate lifetime is configured incorrectly, then the CRL updater will fail to query the database for some certs that should be included in the CRLs. Second, if the update period is changed, this would change the lookback period, which in turn would change the shard width, causing all CRL entries to suddenly change which shard they're in. Instead, first compute all chunk locations based only on the shard width and number of shards. Then determine which chunks we need to care about based on the configured lookback period and by querying the database for the farthest-future expiration, to ensure we cover all extant certificates. This may mean that more than one chunk of time will get mapped to a single shard, but that's okay -- each chunk will remain mapped to the same shard for the whole time we care about it. Fixes #6438 Fixes #6440	2022-10-27 15:59:48 -07:00
Aaron Gable	9bd0c7967f	Do CRL shard math with shorter durations (#6425 ) Change the anchor time used to stabilize shard boundaries from the zero time (time.Time{}) to the notBefore date of Let's Encrypt's first self-signed root certificate. This prevents us from attempting to compute durations that are greater than 290 years, the maximum representable duration in Go. Previously, using the zero time was causing our durations to all be equal to the maximum duration, removing the utility of the anchor time and causing our shard boundaries to drift with time.	2022-10-03 11:24:40 -07:00
Aaron Gable	76583552c2	Simplify crl-updater's simultaneous gRPC streams (#6419 ) Previously, we would stream CRL Entries directly from the SA's response stream into the CA's request stream, and similarly directly stream bytes from the CA's response stream into the Storer's request stream. Since we're seeing odd errors and inconsistencies in our gRPC streaming metrics, simplify these to only conduct one stream at a time. This will make our streaming and error semantics much simpler, at the cost of memory usage in the updater.	2022-09-29 14:57:47 -07:00
Aaron Gable	d53c90a3bc	Streamline and test crl-updater errors and audit logs (#6382 ) Modify the way errors are handled in crl-updater: - Rather than having each method in the tick, tickIssuer, tickShard chain concatenate all errors from its children, simply have them summarize the errors. This results in much shorter error messages. - Rather than having each method log its own errors as it returns, have each caller responsible for logging the errors it receives from its children. In addition, add tests for tick, tickIssuer, and tickShard which cover their simple error paths, where one of the gRPC requests to the SA, CA, or CRLStorer encounters an error. These tests let us ensure that errors are being properly propagated upwards through the layers of indirection and goroutines in the three methods under test, and that the appropriate metrics are being incremented and log messages are being printed. Fixes #6375	2022-09-14 16:41:28 -07:00
Aaron Gable	436061fb35	CRL: Create crl-updater service (#6212 ) Create a new service named crl-updater. It is responsible for maintaining the full set of CRLs we issue: one "full and complete" CRL for each currently-active Issuer, split into a number of "shards" which are essentially CRLs with arbitrary scopes. The crl-updater is modeled after the ocsp-updater: it is a long-running standalone service that wakes up periodically, does a large amount of work in parallel, and then sleeps. The period at which it wakes to do work is configurable. Unlike the ocsp-responder, it does all of its work every time it wakes, so we expect to set the update frequency at 6-24 hours. Maintaining CRL scopes is done statelessly. Every certificate belongs to a specific "bucket", given its notAfter date. This mapping is generally unchanging over the life of the certificate, so revoked certificate entries will not be moving between shards upon every update. The only exception is if we change the number of shards, in which case all of the bucket boundaries will be recomputed. For more details, see the comment on `getShardBoundaries`. It uses the new SA.GetRevokedCerts method to collect all of the revoked certificates whose notAfter timestamps fall within the boundaries of each shard's time-bucket. It uses the new CA.GenerateCRL method to sign the CRLs. In the future, it will send signed CRLs to the crl-storer to be persisted outside our infrastructure. Fixes #6163	2022-07-08 09:34:51 -07:00

19 Commits