Commit Graph

14 Commits

Author SHA1 Message Date
Matthew McPherrin 6874d909f2
Enable go vet printf-auditing our logger (#6421)
Explicitly inform go vet about the names of our logging methods
which should be checked in the same way as fmt.Printf is. Although
go vet can often find such functions on its own, it can't find these
ones because log.Logger is an interface, not a struct.

In addition, fix several format string mistakes caught by go vet.
2022-09-30 16:37:53 -07:00
Aaron Gable 65a60807cc
Fix new crl-updater numEntries log line (#6420) 2022-09-30 14:07:19 -07:00
Aaron Gable 76583552c2
Simplify crl-updater's simultaneous gRPC streams (#6419)
Previously, we would stream CRL Entries directly from the SA's response
stream into the CA's request stream, and similarly directly stream bytes
from the CA's response stream into the Storer's request stream.

Since we're seeing odd errors and inconsistencies in our gRPC streaming
metrics, simplify these to only conduct one stream at a time. This will
make our streaming and error semantics much simpler, at the cost of
memory usage in the updater.
2022-09-29 14:57:47 -07:00
Jacob Hoffman-Andrews de2574a37a
crl/updater: fix incorrect logging of error (#6401)
Fix instances where an error check was conditioned on something
other than the traditional `err`, such as `myStruct.err`, but then the
error being logged was the `err` from elsewhere in the function.
2022-09-28 09:30:32 -07:00
Aaron Gable d53c90a3bc
Streamline and test crl-updater errors and audit logs (#6382)
Modify the way errors are handled in crl-updater:
- Rather than having each method in the tick, tickIssuer, tickShard
  chain concatenate all errors from its children, simply have them
  summarize the errors. This results in much shorter error messages.
- Rather than having each method log its own errors as it returns, have
  each caller responsible for logging the errors it receives from its
  children.

In addition, add tests for tick, tickIssuer, and tickShard which cover
their simple errors paths, where one of the gRPC requests to the SA, CA,
or CRLStorer encounters an error. These tests let us ensure that errors
are being properly propagated upwards through the layers of indirection
and goroutines in the three methods under test, and that the appropriate
metrics are being incremented and log messages are being printed.

Fixes #6375
2022-09-14 16:41:28 -07:00
Aaron Gable 4eb9a9f06a
crl-updater: log once per shard on success or failure (#6373)
Ensure that crl-updater logs one line per shard, no matter whether
that shard succeeded or failed.
2022-09-12 14:35:34 -07:00
Aaron Gable 7f189f7a3b
Improve how crl-updater formats and surfaces errors (#6369)
Make every function in the Run -> Tick -> tickIssuer -> tickShard chain
return an error. Make that return value a named return (which we usually
avoid) so that we can remove the manual setting of the metric result
label and have the deferred metric handling function take care of that
instead. In addition, let that cleanup function wrap the returned error
(if any) with the identity of the shard, issuer, or tick that is
returning it, so that we don't have to include that info in every
individual error message. Finally, have the functions which spin off
many helpers (Tick and tickIssuer) collect all of their helpers' errors
and only surface that error at the end, to ensure the process completes
even in the presence of transient errors.

In crl-updater's main, surface the error returned by Run or Tick, to
make debugging easier.
2022-09-12 11:36:42 -07:00
Aaron Gable 9228e60159
Fix goroutine leak by canceling gRPC contexts (#6371)
In crl-updater's tickShard, which handles all of the gRPC requests
to the SA, CA, and crl-storer, ensure that any open gRPC connections
get closed when the function returns for any reason by canceling the
context object used in those connections.

This fixes a goroutine leak where the updater would open simultaneous
connections to the SA and CA, get an error from the SA, and then return
without closing the connection to the CA. This left the CA stream open
forever, leading to goroutine and memory leaks in the updater.
2022-09-09 16:03:23 -07:00
Aaron Gable efa0cd0a10
Improve crl metrics (#6358)
Remove the secondsSinceSuccess metric, which was both never
being set and rather useless given the bursty nature of CRL update
scheduling.

Add an "issuer" label to the crl_updater_generated metric, to match
the other metrics exported by the updater and the storer.
2022-09-09 14:22:04 -07:00
Samantha 7ed4cd992e
CRL: Improve shard identification in error messages (#6306)
- Create new package `crl`
- Add a common unique CRL identifier `crl.id` with constructor `crl.Id()`
- Replace `shardIdx` with `crl.Id` in `storer` and `updater` errors
- Add a common type for the `CRLNumber` field `crl.number` with constructor
  `crl.Number()`
- Replace `CRLNumber` construction in CA and CRL package with `crl.Number()`

Resolves #6261
2022-08-23 12:35:00 -07:00
Aaron Gable 6a9bb399f7
Create new crl-storer service (#6264)
Create a new crl-storer service, which receives CRL shards via gRPC and
uploads them to an S3 bucket. It ignores AWS SDK configuration in the
usual places, in favor of configuration from our standard JSON service
config files. It ensures that the CRLs it receives parse and are signed
by the appropriate issuer before uploading them.

Integrate crl-updater with the new service. It streams bytes to the
crl-storer as it receives them from the CA, without performing any
checking at the same time. This new functionality is disabled if the
crl-updater does not have a config stanza instructing it how to connect
to the crl-storer.

Finally, add a new test component, the s3-test-srv. This acts similarly
to the existing mail-test-srv: it receives requests, stores information
about them, and exposes that information for later querying by the
integration test. The integration test uses this to ensure that a
newly-revoked certificate does show up in the next generation of CRLs
produced.

Fixes #6162
2022-08-08 16:22:48 -07:00
Aaron Gable 733bcec941
Standardize on 'shardIdx' to identify crl shards (#6263)
Realized that "ShardID" is a bad name, because a real unique
identifier of a shard would include the issuer, crl number, and
shard number. Switching to "ShardIdx" makes it clearer that
shards within a full and complete CRL are identified by a
zero-indexed integer.
2022-08-02 13:21:26 -07:00
Aaron Gable 694d73d67b
crl-updater: add UpdateOffset config to run on a schedule (#6260)
Add a new config key `UpdateOffset` to crl-updater, which causes it to
run on a regular schedule rather than running immediately upon startup
and then every `UpdatePeriod` after that. It is safe for this new config
key to be omitted and take the default zero value.

Also add a new command line flag `runOnce` to crl-updater which causes
it to immediately run a single time and then exit, rather than running
continuously as a daemon. This will be useful for integration tests and
emergency situations.

Part of #6163
2022-07-29 13:30:16 -07:00
Aaron Gable 436061fb35
CRL: Create crl-updater service (#6212)
Create a new service named crl-updater. It is responsible for
maintaining the full set of CRLs we issue: one "full and complete" CRL
for each currently-active Issuer, split into a number of "shards" which
are essentially CRLs with arbitrary scopes.

The crl-updater is modeled after the ocsp-updater: it is a long-running
standalone service that wakes up periodically, does a large amount of
work in parallel, and then sleeps. The period at which it wakes to do
work is configurable. Unlike the ocsp-responder, it does all of its work
every time it wakes, so we expect to set the update frequency at 6-24
hours.

Maintaining CRL scopes is done statelessly. Every certificate belongs to
a specific "bucket", given its notAfter date. This mapping is generally
unchanging over the life of the certificate, so revoked certificate
entries will not be moving between shards upon every update. The only
exception is if we change the number of shards, in which case all of the
bucket boundaries will be recomputed. For more details, see the comment
on `getShardBoundaries`.

It uses the new SA.GetRevokedCerts method to collect all of the revoked
certificates whose notAfter timestamps fall within the boundaries of
each shard's time-bucket. It uses the new CA.GenerateCRL method to sign
the CRLs. In the future, it will send signed CRLs to the crl-storer to
be persisted outside our infrastructure.

Fixes #6163
2022-07-08 09:34:51 -07:00