boulder

Commit Graph

Author	SHA1	Message	Date
Jacob Hoffman-Andrews	eda496606d	crl-updater: split temporal/explicit sharding by serial (#7990 ) When we turn on explicit sharding, we'll change the CA serial prefix, so we can know that all issuance from the new prefixes uses explicit sharding, and all issuance from the old prefixes uses temporal sharding. This lets us avoid putting a revoked cert in two different CRL shards (the temporal one and the explicit one). To achieve this, the crl-updater gets a list of temporally sharded serial prefixes. When it queries the `certificateStatus` table by date (`GetRevokedCerts`), it will filter out explicitly sharded certificates: those that don't have their prefix on the list. Part of #7094	2025-02-04 11:45:46 -05:00
Jacob Hoffman-Andrews	e0221b6bbe	crl-updater: query by explicit shard too (#7973 ) Add querying by explicit shard (SA.GetRevokedCertsByShard) in addition to querying by temporal shard (SA.GetRevokedCerts). Merge results from both kinds of shard. De-duplicate by serial within a shard, because the same certificate could wind up in a temporal shard that matches its explicit shard. When de-duplicating, validate that revocation reasons are the same or (very unlikely) represent a re-revocation based on demonstrating key compromise. This can happen because the two different SA queries occur at slightly different times. Add unit testing that CRL entries make it through the whole pipeline from SA, to CA, to uploader. Rename some types in the unittest to be more accessible. Tweak a comment in SA.UpdateRevokedCertificate to make it clear that status _and_ reason are critical for re-revocation. Note: This GetRevokedCertsByShard code path will always return zero certificates right now, because nothing is writing to the `revokedCertificates` table. Writing to that table is gated on certificates having CRL URLs in them, which is not yet implemented (and will be config-gated). Part of #7094	2025-01-27 10:11:09 -08:00
Aaron Gable	e05d47a10a	Replace explicit int loops with range-over-int (#7434 ) This adopts modern Go syntax to reduce the chance of off-by-one errors and remove unnecessary loop variable declarations. Fixes https://github.com/letsencrypt/boulder/issues/7227	2024-04-22 10:34:51 -07:00
Aaron Gable	ab6e023b6f	Simplify issuance.NameID and how it is used (#7260 ) Rename "IssuerNameID" to just "NameID". Similarly rename the standalone functions which compute it to better describe their function. Add a .NameID() directly to issuance.Issuer, so that callers in other packages don't have to directly access the .Cert member of an Issuer. Finally, rearrange the code in issuance.go to be sensibly grouped as concerning NameIDs, Certificates, or Issuers, rather than all mixed up between the three. Fixes https://github.com/letsencrypt/boulder/issues/5152	2024-01-17 12:55:56 -08:00
Phil Porada	6925fad324	Finish migration from int64 timestamps to timestamppb (#7142 ) This is a cleanup PR finishing the migration from int64 timestamps to protobuf `timestamppb.Timestamps` by removing all usage of the old int64 fields. In the previous PR https://github.com/letsencrypt/boulder/pull/7121 all fields were switched to read from the protobuf timestamppb fields. Adds a new case to `core.IsAnyNilOrZero` to check various properties of a `timestamppb.Timestamp` reducing the visual complexity for receivers. Fixes https://github.com/letsencrypt/boulder/issues/7060	2023-11-27 13:37:31 -08:00
Aaron Gable	1d31a22245	Export crl shard calculation code for future use (#7127 ) Make crl.GetChunkAtTime an exported function, so that it can be used by the RA in a future PR without making that PR bigger and harder to review. Also move it from being a method to an independent function which takes two new arguments to compensate for the loss of its receiver. Also move some tests from batch_test to updater_test, where they should have been in the first place. Part of https://github.com/letsencrypt/boulder/issues/7094	2023-11-02 11:21:01 -07:00
Phil Porada	a5c2772004	Add and populate new protobuf Timestamp fields (#7070 ) * Adds new `google.protobuf.Timestamp` fields to each .proto file where we had been using `int64` fields as a timestamp. * Updates relevant gRPC messages to populate the new `google.protobuf.Timestamp` fields in addition to the old `int64` timestamp fields. * Adds tests for each `<x>ToPB` and `PBto<x>` function to ensure that new fields passed into a gRPC message arrive as intended. * Removes an unused error return from `PBToCert` and `PBToCertStatus` and cleans up each call site. Built on top of https://github.com/letsencrypt/boulder/pull/7069 Part 2 of 4 related to https://github.com/letsencrypt/boulder/issues/7060	2023-10-11 12:12:12 -04:00
Aaron Gable	519c199c9a	Greatly simplify crl-updater's batch mode (#7079 ) Replace crl-updater's overly complex RunOnce and updateIssuer methods with a single, much simpler RunOnce that is modeled off of the recently-redone continuous Run method's model. Instead of breaking things down by issuer then shard, simply kick off everything in parallel. This also improves batch mode's ability to listen for context cancellations at all the appropriate times. At the same time, move getShardMappings into the shared updater.go file because it is used by both the batch and continuous modes of operation, and improve uniformity of usage of the crlId structure in log output. Fixes https://github.com/letsencrypt/boulder/issues/7066	2023-09-19 13:40:18 -07:00
Phil Porada	034316ef6a	Rename int64 timestamp related protobuf fields to <fieldname>NS (#7069 ) Rename all of int64 timestamp fields to `<fieldname>NS` to indicate they are Unix nanosecond timestamps. Part 1 of 4 related to https://github.com/letsencrypt/boulder/issues/7060	2023-09-15 13:49:07 -04:00
Aaron Gable	102b447e8d	Smoother scheduling and leasing for crl-updater (#7010 ) Overhaul crl-updater's default (i.e. non-runOnce) behavior to update individual CRL shards continuously, rather than updating all shards in a large batch. To accomplish this, it spins up one goroutine for each shard of each issuer this updater is responsible for. Each goroutine is solely responsible for its assigned shard. It sleeps for a random amount of time (to stagger their starts), then begins a ticker to wake up every updateInterval and re-issue its shard. As part of this change, refactor updater.go into three separate files (batch.go, continuous.go, and updater.go) containing functions dedicated to single-run batch processing, long-running continuous processing, and shared helpers, respectively. IN-9475 tracks the deprecation of the `updateOffset` config key. The other configuration changes in this PR do not require production changes. Fixes https://github.com/letsencrypt/boulder/issues/7023	2023-09-08 09:16:15 -07:00
Aaron Gable	9a4f0ca678	Deprecate LeaseCRLShards feature (#7009 ) This feature flag is enabled in both staging and prod.	2023-08-07 15:17:00 -07:00
Aaron Gable	63319d8cd0	crl-updater: duplicate shard 0 as shard NumShards (#7008 ) When crl-updater produces a CRL with shard index 0, have it also produce an identical CRL with shard index NumShards (e.g. 128). This is the first step towards having it only produce shards numbered 1 through NumShards, i.e. transition from using 0-indexing to 1-indexing. We want to do this because various aspects of Golang and gRPC cannot tell the difference between "this struct/message has no shard index set" and "this struct/message has shard index 0 set". Part of https://github.com/letsencrypt/boulder/issues/7007	2023-07-26 15:32:20 -07:00
Aaron Gable	908421bb98	crl-updater: lease CRL shards to prevent races (#6941 ) Add a new feature flag, LeaseCRLShards, which controls certain aspects of crl-updater's behavior. When this flag is enabled, crl-updater calls the new SA.LeaseCRLShard method before beginning work on a shard. This prevents it from stepping on the toes of another crl-updater instance which may be working on the same shard. This is important to prevent two competing instances from accidentally updating a CRL's Number (which is an integer representation of its thisUpdate timestamp) backwards, which would be a compliance violation. When this flag is enabled, crl-updater also calls the new SA.UpdateCRLShard method after finishing work on a shard. In the future, additional work will be done to make crl-updater use the "give me the oldest available shard" mode of the LeaseCRLShard method. Fixes https://github.com/letsencrypt/boulder/issues/6897	2023-07-19 15:11:16 -07:00
Aaron Gable	fe523f142d	crl-updater: retry failed shards (#6907 ) Add per-shard exponential backoff and retry to crl-updater. Each individual CRL shard will be retried up to MaxAttempts (default 1) times, with exponential backoff starting at 1 second and maxing out at 1 minute between each attempt. This can effectively reduce the parallelism of crl-updater: while a goroutine is sleeping between attempts of a failing shard, it is not doing work on another shard. This is a desirable feature, since it means that crl-updater gently reduces the total load it places on the network and database when shards start to fail. Setting this new config parameter is tracked in IN-9140 Fixes https://github.com/letsencrypt/boulder/issues/6895	2023-05-22 12:59:09 -07:00
Aaron Gable	98fa0f07b4	Re-enable errcheck linter (#6819 ) Enable the errcheck linter. Update the way we express exclusions to use the new, non-deprecated, non-regex-based format. Fix all places where we began accidentally violating errcheck while it was disabled.	2023-04-14 15:41:12 -04:00
Aaron Gable	ba34ac6b6e	Use read-only SA clients in wfe, ocsp, and crl (#6484 ) In the WFE, ocsp-responder, and crl-updater, switch from using StorageAuthorityClients to StorageAuthorityReadOnlyClients. This ensures that these services cannot call methods which write to our database. Fixes #6454	2022-12-02 13:48:28 -08:00
Aaron Gable	6efd941e3c	Stabilize CRL shard boundaries (#6445 ) Add two new config keys to the crl-updater: * shardWidth, which controls the width of the chunks that we divide all of time into, with a default value of "16h" (approximately the same as today's shard width derived from 128 shards covering 90 days); and * lookbackPeriod, which controls the amount of already-expired certificates that should be included in our CRLs to ensure that even certificates which are revoked immediately before they expire still show up in at least one CRL, with a default value of "24h" (approximately the same as today's lookback period derived from our run frequency of 6h). Use these two new values to change the way CRL shards are computed. Previously, we would compute the total time we care about based on the configured certificate lifetime (to determine how far forward to look) and the configured update period (to determine how far back to look), and then divide that time evenly by the number of shards. However, this method had two fatal flaws. First, if the certificate lifetime is configured incorrectly, then the CRL updater will fail to query the database for some certs that should be included in the CRLs. Second, if the update period is changed, this would change the lookback period, which in turn would change the shard width, causing all CRL entries to suddenly change which shard they're in. Instead, first compute all chunk locations based only on the shard width and number of shards. Then determine which chunks we need to care about based on the configured lookback period and by querying the database for the farthest-future expiration, to ensure we cover all extant certificates. This may mean that more than one chunk of time will get mapped to a single shard, but that's okay -- each chunk will remain mapped to the same shard for the whole time we care about it. Fixes #6438 Fixes #6440	2022-10-27 15:59:48 -07:00
Aaron Gable	9bd0c7967f	Do CRL shard math with shorter durations (#6425 ) Change the anchor time used to stabilize shard boundaries from the zero time (time.Time{}) to the notBefore date of Let's Encrypt's first self-signed root certificate. This prevents us from attempting to compute durations that are greater than 290 years, the maximum representable duration in Go. Previously, using the zero time was causing our durations to all be equal to the maximum duration, removing the utility of the anchor time and causing our shard boundaries to drift with time.	2022-10-03 11:24:40 -07:00
Matthew McPherrin	6874d909f2	Enable go vet printf-auditing our logger (#6421 ) Explicitly inform go vet about the names of our logging methods which should be checked in the same way as fmt.Printf is. Although go vet can often find such functions on its own, it can't find these ones because log.Logger is an interface, not a struct. In addition, fix several format string mistakes caught by go vet.	2022-09-30 16:37:53 -07:00
Aaron Gable	65a60807cc	Fix new crl-updater numEntries log line (#6420 )	2022-09-30 14:07:19 -07:00
Aaron Gable	76583552c2	Simplify crl-updater's simultaneous gRPC streams (#6419 ) Previously, we would stream CRL Entries directly from the SA's response stream into the CA's request stream, and similarly directly stream bytes from the CA's response stream into the Storer's request stream. Since we're seeing odd errors and inconsistencies in our gRPC streaming metrics, simplify these to only conduct one stream at a time. This will make our streaming and error semantics much simpler, at the cost of memory usage in the updater.	2022-09-29 14:57:47 -07:00
Jacob Hoffman-Andrews	de2574a37a	crl/updater: fix incorrect logging of error (#6401 ) Fix instances where an error check was conditioned on something other than the traditional `err`, such as `myStruct.err`, but then the error being logged was the `err` from elsewhere in the function.	2022-09-28 09:30:32 -07:00
Aaron Gable	d53c90a3bc	Streamline and test crl-updater errors and audit logs (#6382 ) Modify the way errors are handled in crl-updater: - Rather than having each method in the tick, tickIssuer, tickShard chain concatenate all errors from its children, simply have them summarize the errors. This results in much shorter error messages. - Rather than having each method log its own errors as it returns, have each caller responsible for logging the errors it receives from its children. In addition, add tests for tick, tickIssuer, and tickShard which cover their simple error paths, where one of the gRPC requests to the SA, CA, or CRLStorer encounters an error. These tests let us ensure that errors are being properly propagated upwards through the layers of indirection and goroutines in the three methods under test, and that the appropriate metrics are being incremented and log messages are being printed. Fixes #6375	2022-09-14 16:41:28 -07:00
Aaron Gable	4eb9a9f06a	crl-updater: log once per shard on success or failure (#6373 ) Ensure that crl-updater logs one line per shard, no matter whether that shard succeeded or failed.	2022-09-12 14:35:34 -07:00
Aaron Gable	7f189f7a3b	Improve how crl-updater formats and surfaces errors (#6369 ) Make every function in the Run -> Tick -> tickIssuer -> tickShard chain return an error. Make that return value a named return (which we usually avoid) so that we can remove the manual setting of the metric result label and have the deferred metric handling function take care of that instead. In addition, let that cleanup function wrap the returned error (if any) with the identity of the shard, issuer, or tick that is returning it, so that we don't have to include that info in every individual error message. Finally, have the functions which spin off many helpers (Tick and tickIssuer) collect all of their helpers' errors and only surface that error at the end, to ensure the process completes even in the presence of transient errors. In crl-updater's main, surface the error returned by Run or Tick, to make debugging easier.	2022-09-12 11:36:42 -07:00
Aaron Gable	9228e60159	Fix goroutine leak by canceling gRPC contexts (#6371 ) In crl-updater's tickShard, which handles all of the gRPC requests to the SA, CA, and crl-storer, ensure that any open gRPC connections get closed when the function returns for any reason by canceling the context object used in those connections. This fixes a goroutine leak where the updater would open simultaneous connections to the SA and CA, get an error from the SA, and then return without closing the connection to the CA. This left the CA stream open forever, leading to goroutine and memory leaks in the updater.	2022-09-09 16:03:23 -07:00
Aaron Gable	efa0cd0a10	Improve crl metrics (#6358 ) Remove the secondsSinceSuccess metric, which was both never being set and rather useless given the bursty nature of CRL update scheduling. Add an "issuer" label to the crl_updater_generated metric, to match the other metrics exported by the updater and the storer.	2022-09-09 14:22:04 -07:00
Samantha	7ed4cd992e	CRL: Improve shard identification in error messages (#6306 ) - Create new package `crl` - Add a common unique CRL identifier `crl.id` with constructor `crl.Id()` - Replace `shardIdx` with `crl.Id` in `storer` and `updater` errors - Add a common type for the `CRLNumber` field `crl.number` with constructor `crl.Number()` - Replace `CRLNumber` construction in CA and CRL package with `crl.Number()` Resolves #6261	2022-08-23 12:35:00 -07:00
Aaron Gable	6a9bb399f7	Create new crl-storer service (#6264 ) Create a new crl-storer service, which receives CRL shards via gRPC and uploads them to an S3 bucket. It ignores AWS SDK configuration in the usual places, in favor of configuration from our standard JSON service config files. It ensures that the CRLs it receives parse and are signed by the appropriate issuer before uploading them. Integrate crl-updater with the new service. It streams bytes to the crl-storer as it receives them from the CA, without performing any checking at the same time. This new functionality is disabled if the crl-updater does not have a config stanza instructing it how to connect to the crl-storer. Finally, add a new test component, the s3-test-srv. This acts similarly to the existing mail-test-srv: it receives requests, stores information about them, and exposes that information for later querying by the integration test. The integration test uses this to ensure that a newly-revoked certificate does show up in the next generation of CRLs produced. Fixes #6162	2022-08-08 16:22:48 -07:00
Aaron Gable	733bcec941	Standardize on 'shardIdx' to identify crl shards (#6263 ) Realized that "ShardID" is a bad name, because a real unique identifier of a shard would include the issuer, crl number, and shard number. Switching to "ShardIdx" makes it clearer that shards within a full and complete CRL are identified by a zero-indexed integer.	2022-08-02 13:21:26 -07:00
Aaron Gable	694d73d67b	crl-updater: add UpdateOffset config to run on a schedule (#6260 ) Add a new config key `UpdateOffset` to crl-updater, which causes it to run on a regular schedule rather than running immediately upon startup and then every `UpdatePeriod` after that. It is safe for this new config key to be omitted and take the default zero value. Also add a new command line flag `runOnce` to crl-updater which causes it to immediately run a single time and then exit, rather than running continuously as a daemon. This will be useful for integration tests and emergency situations. Part of #6163	2022-07-29 13:30:16 -07:00
Aaron Gable	436061fb35	CRL: Create crl-updater service (#6212 ) Create a new service named crl-updater. It is responsible for maintaining the full set of CRLs we issue: one "full and complete" CRL for each currently-active Issuer, split into a number of "shards" which are essentially CRLs with arbitrary scopes. The crl-updater is modeled after the ocsp-updater: it is a long-running standalone service that wakes up periodically, does a large amount of work in parallel, and then sleeps. The period at which it wakes to do work is configurable. Unlike the ocsp-updater, it does all of its work every time it wakes, so we expect to set the update frequency at 6-24 hours. Maintaining CRL scopes is done statelessly. Every certificate belongs to a specific "bucket", given its notAfter date. This mapping is generally unchanging over the life of the certificate, so revoked certificate entries will not be moving between shards upon every update. The only exception is if we change the number of shards, in which case all of the bucket boundaries will be recomputed. For more details, see the comment on `getShardBoundaries`. It uses the new SA.GetRevokedCerts method to collect all of the revoked certificates whose notAfter timestamps fall within the boundaries of each shard's time-bucket. It uses the new CA.GenerateCRL method to sign the CRLs. In the future, it will send signed CRLs to the crl-storer to be persisted outside our infrastructure. Fixes #6163	2022-07-08 09:34:51 -07:00

32 Commits