Explicitly inform go vet about the names of our logging methods
which should be checked in the same way as fmt.Printf is. Although
go vet can often find such functions on its own, it can't find these
ones because log.Logger is an interface, not a struct.
In addition, fix several format string mistakes caught by go vet.
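As a sketch of what this enables (stubLogger here is a hypothetical stand-in, not our real log.Logger), go vet's printf analyzer can be told that a method like Errf is printf-like, after which it flags verb/argument mismatches in calls to it:

```go
package main

import "fmt"

// stubLogger stands in for our log.Logger interface; Errf is printf-style.
type stubLogger struct{}

// Errf formats its arguments like fmt.Printf does. Once vet is told that
// Errf is printf-like, it checks calls to it the same way it checks Printf.
func (stubLogger) Errf(format string, a ...interface{}) {
	fmt.Printf("ERR: "+format+"\n", a...)
}

func main() {
	log := stubLogger{}
	err := fmt.Errorf("connection refused")
	// With Errf registered, vet flags this call: %d is applied to an error.
	log.Errf("failed to update %d shards", err)
}
```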
Previously, we would stream CRL entries directly from the SA's response
stream into the CA's request stream, and similarly stream bytes directly
from the CA's response stream into the Storer's request stream.
Since we're seeing odd errors and inconsistencies in our gRPC streaming
metrics, simplify this flow to conduct only one stream at a time. This will
make our streaming and error semantics much simpler, at the cost of higher
memory usage in the updater.
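A minimal sketch of the new approach, using hypothetical interfaces in place of the generated gRPC stream types: drain the SA's stream entirely into memory, then replay the buffered entries into the CA's stream, so only one stream is open at a time.

```go
package updater

import (
	"errors"
	"io"
)

// crlEntry, entryStream, and entrySender are stand-ins for the generated
// gRPC types used between the SA, CA, and updater.
type crlEntry struct {
	Serial    string
	RevokedAt int64
}

type entryStream interface {
	Recv() (*crlEntry, error)
}

type entrySender interface {
	Send(*crlEntry) error
}

// collectEntries drains the SA's response stream into memory, so the SA
// stream is fully closed before we open a stream to the CA.
func collectEntries(saStream entryStream) ([]*crlEntry, error) {
	var entries []*crlEntry
	for {
		entry, err := saStream.Recv()
		if errors.Is(err, io.EOF) {
			return entries, nil
		}
		if err != nil {
			return nil, err
		}
		entries = append(entries, entry)
	}
}

// sendEntries replays the buffered entries into the CA's request stream.
func sendEntries(caStream entrySender, entries []*crlEntry) error {
	for _, entry := range entries {
		if err := caStream.Send(entry); err != nil {
			return err
		}
	}
	return nil
}
```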
Fix instances where an error check was conditioned on something other
than the traditional `err`, such as `myStruct.err`, but the error actually
logged was an unrelated `err` from elsewhere in the function.
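For illustration (the names here are hypothetical), the corrected pattern checks and logs the same error value:

```go
package updater

import "log"

type shardResult struct {
	err error
}

// logFailedShards checks and logs the same error value, res.err. Before the
// fix, some call sites checked res.err but then logged the unrelated err
// passed in from earlier in the enclosing function.
func logFailedShards(results []shardResult, err error) {
	if err != nil {
		log.Printf("setting up shards: %s", err)
	}
	for _, res := range results {
		if res.err != nil {
			log.Printf("updating shard: %s", res.err)
		}
	}
}
```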
Modify the way errors are handled in crl-updater:
- Rather than having each method in the tick, tickIssuer, tickShard
chain concatenate all errors from its children, simply have them
summarize the errors. This results in much shorter error messages.
- Rather than having each method log its own errors as it returns, have
each caller responsible for logging the errors it receives from its
children.
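A sketch of the summarization (names assumed): rather than concatenating every child error into one message, the parent reports only a count, and the detailed logging is done by the caller that received each individual error.

```go
package updater

import "fmt"

// summarizeErrors produces the short error that a parent (tick or tickIssuer)
// bubbles upward after its children's errors have already been logged
// individually.
func summarizeErrors(errs []error, total int, kind string) error {
	if len(errs) == 0 {
		return nil
	}
	return fmt.Errorf("%d of %d %s failed", len(errs), total, kind)
}
```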
In addition, add tests for tick, tickIssuer, and tickShard which cover
their simple error paths, where one of the gRPC requests to the SA, CA,
or CRLStorer encounters an error. These tests ensure that errors are
properly propagated upward through the layers of indirection and
goroutines in the three methods under test, and that the appropriate
metrics are incremented and log messages are printed.
Fixes #6375
Make every function in the Run -> Tick -> tickIssuer -> tickShard chain
return an error. Make that return value a named return (which we usually
avoid) so that we can remove the manual setting of the metric result
label and have the deferred metric handling function take care of that
instead. In addition, let that cleanup function wrap the returned error
(if any) with the identity of the shard, issuer, or tick that is
returning it, so that we don't have to include that info in every
individual error message. Finally, have the functions which spin off
many helpers (Tick and tickIssuer) collect all of their helpers' errors
and surface a combined error only at the end, to ensure the process runs
to completion even in the presence of transient errors.
In crl-updater's main, surface the error returned by Run or Tick, to
make debugging easier.
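A sketch of the pattern (struct fields and metric labels are assumptions): the named return lets one deferred function both set the metric's result label and wrap the error with the identity of the shard that produced it.

```go
package updater

import (
	"fmt"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

type crlUpdater struct {
	tickHistogram *prometheus.HistogramVec
}

// tickShardSketch names its error return so the deferred cleanup can observe
// the final value, label the duration metric, and annotate the error with
// which issuer and shard it came from.
func (cu *crlUpdater) tickShardSketch(issuer string, shardIdx int) (err error) {
	start := time.Now()
	defer func() {
		result := "success"
		if err != nil {
			result = "failed"
			err = fmt.Errorf("tick of issuer %q shard %d: %w", issuer, shardIdx, err)
		}
		cu.tickHistogram.WithLabelValues(issuer, result).Observe(time.Since(start).Seconds())
	}()

	// gRPC requests to the SA, CA, and crl-storer would happen here.
	return nil
}
```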
In crl-updater's tickShard, which handles all of the gRPC requests
to the SA, CA, and crl-storer, ensure that any open gRPC connections
get closed when the function returns for any reason by canceling the
context object used in those connections.
This fixes a goroutine leak where the updater would open simultaneous
connections to the SA and CA, get an error from the SA, and then return
without closing the connection to the CA. This left the CA stream open
forever, leading to goroutine and memory leaks in the updater.
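A sketch of the shape of the fix (openStreams is a hypothetical stand-in for the real gRPC calls): every stream is opened on a derived context, and the deferred cancel tears down whatever is still open on any return path.

```go
package updater

import "context"

// updateShard derives a cancellable context for all of the shard's gRPC
// streams; the deferred cancel closes any stream that is still open, e.g.
// the CA stream when the SA returns an error first.
func updateShard(ctx context.Context, openStreams func(context.Context) error) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	return openStreams(ctx)
}
```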
Remove the secondsSinceSuccess metric, which was both never
being set and rather useless given the bursty nature of CRL update
scheduling.
Add an "issuer" label to the crl_updater_generated metric, to match
the other metrics exported by the updater and the storer.
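For illustration, the metric definition now includes the issuer label (the other label names here are assumptions):

```go
package updater

import "github.com/prometheus/client_golang/prometheus"

// generatedCounter sketches the updated metric: the new "issuer" label lines
// up with the labels already exported by the storer's metrics.
var generatedCounter = prometheus.NewCounterVec(prometheus.CounterOpts{
	Name: "crl_updater_generated",
	Help: "Count of CRL generation attempts, labeled by issuer and result.",
}, []string{"issuer", "result"})
```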
- Create new package `crl`
- Add a common unique CRL identifier `crl.id` with constructor `crl.Id()`
- Replace `shardIdx` with `crl.Id` in `storer` and `updater` errors
- Add a common type for the `CRLNumber` field `crl.number` with constructor
`crl.Number()`
- Replace `CRLNumber` construction in CA and CRL package with `crl.Number()`
Resolves #6261
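A rough sketch of the new package's shape (formatting and fields are assumptions based on the list above, and crl.number is simplified to a plain *big.Int here):

```go
package crl

import (
	"fmt"
	"math/big"
	"time"
)

// id uniquely identifies one CRL shard in updater and storer log and error
// messages: the issuer, the CRL number, and the shard index together.
type id string

// Id builds the common identifier used in place of a bare shardIdx.
func Id(issuerNameID int64, num *big.Int, shardIdx int) id {
	return id(fmt.Sprintf("{issuer: %d, number: %d, shard: %d}", issuerNameID, num, shardIdx))
}

// Number derives the CRLNumber value from the CRL's thisUpdate timestamp, so
// the CA and this package compute it identically.
func Number(thisUpdate time.Time) *big.Int {
	return big.NewInt(thisUpdate.UnixNano())
}
```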
Create a new crl-storer service, which receives CRL shards via gRPC and
uploads them to an S3 bucket. It ignores AWS SDK configuration in the
usual places, in favor of configuration from our standard JSON service
config files. It ensures that the CRLs it receives parse and are signed
by the appropriate issuer before uploading them.
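A sketch of that check using the standard library (the function name is assumed):

```go
package storer

import (
	"crypto/x509"
	"fmt"
)

// checkCRL verifies that the received bytes parse as a CRL and that the CRL
// is signed by the issuer certificate named in the request, before we spend
// an upload on it.
func checkCRL(der []byte, issuer *x509.Certificate) error {
	crl, err := x509.ParseRevocationList(der)
	if err != nil {
		return fmt.Errorf("parsing CRL: %w", err)
	}
	if err := crl.CheckSignatureFrom(issuer); err != nil {
		return fmt.Errorf("validating CRL signature: %w", err)
	}
	return nil
}
```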
Integrate crl-updater with the new service. It streams bytes to the
crl-storer as it receives them from the CA, without performing any
checks on them itself. This new functionality is disabled if the
crl-updater does not have a config stanza instructing it how to connect
to the crl-storer.
Finally, add a new test component, the s3-test-srv. This acts similarly
to the existing mail-test-srv: it receives requests, stores information
about them, and exposes that information for later querying by the
integration test. The integration test uses this to ensure that a
newly-revoked certificate does show up in the next generation of CRLs
produced.
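A sketch of the s3-test-srv idea (the paths, port, and handler behavior here are assumptions): remember every object that gets uploaded and let the integration test read it back.

```go
package main

import (
	"io"
	"log"
	"net/http"
	"sync"
)

// s3TestSrv stores every uploaded object in memory, keyed by URL path, and
// serves it back on GET so the integration test can inspect what the
// crl-storer uploaded.
type s3TestSrv struct {
	mu      sync.Mutex
	objects map[string][]byte
}

func (s *s3TestSrv) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	s.mu.Lock()
	defer s.mu.Unlock()
	switch r.Method {
	case http.MethodPut:
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		s.objects[r.URL.Path] = body
	case http.MethodGet:
		w.Write(s.objects[r.URL.Path])
	default:
		http.Error(w, "unsupported method", http.StatusMethodNotAllowed)
	}
}

func main() {
	srv := &s3TestSrv{objects: map[string][]byte{}}
	log.Fatal(http.ListenAndServe(":4501", srv))
}
```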
Fixes #6162
Realized that "ShardID" is a bad name, because a real unique
identifier of a shard would include the issuer, CRL number, and
shard number. Switching to "ShardIdx" makes it clearer that
shards within a full and complete CRL are identified by a
zero-indexed integer.
Add a new config key `UpdateOffset` to crl-updater, which causes it to
run on a regular schedule rather than running immediately upon startup
and then every `UpdatePeriod` after that. It is safe for this new config
key to be omitted and take the default zero value.
Also add a new command line flag `runOnce` to crl-updater which causes
it to immediately run a single time and then exit, rather than running
continuously as a daemon. This will be useful for integration tests and
emergency situations.
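A simplified sketch of the scheduling math (the real implementation may differ): instead of ticking immediately at startup, sleep until the next multiple of UpdatePeriod plus UpdateOffset.

```go
package updater

import "time"

// sleepUntilNextTick blocks until the next tick time, which is the next
// multiple of updatePeriod (measured from the zero time) plus updateOffset.
func sleepUntilNextTick(updatePeriod, updateOffset time.Duration) {
	now := time.Now()
	next := now.Truncate(updatePeriod).Add(updateOffset)
	if !next.After(now) {
		next = next.Add(updatePeriod)
	}
	time.Sleep(next.Sub(now))
}
```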
Part of #6163
Create a new service named crl-updater. It is responsible for
maintaining the full set of CRLs we issue: one "full and complete" CRL
for each currently-active Issuer, split into a number of "shards" which
are essentially CRLs with arbitrary scopes.
The crl-updater is modeled after the ocsp-updater: it is a long-running
standalone service that wakes up periodically, does a large amount of
work in parallel, and then sleeps. The period at which it wakes to do
work is configurable. Unlike the ocsp-updater, it does all of its work
every time it wakes, so we expect to set the update frequency at 6-24
hours.
Maintaining CRL scopes is done statelessly. Every certificate belongs to
a specific "bucket" based on its notAfter date. This mapping is generally
unchanging over the life of the certificate, so revoked certificate
entries will not move between shards on every update. The only
exception is if we change the number of shards, in which case all of the
bucket boundaries will be recomputed. For more details, see the comment
on `getShardBoundaries`.
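A heavily simplified sketch of the idea (the real boundary math lives in getShardBoundaries and handles the rolling window more carefully):

```go
package updater

import "time"

// shardFor maps a certificate to a shard purely from its notAfter date: the
// window of possible notAfter values is split into numShards equal chunks,
// and the chunk containing notAfter is the certificate's shard index.
func shardFor(notAfter, windowStart time.Time, windowWidth time.Duration, numShards int) int {
	chunk := windowWidth / time.Duration(numShards)
	idx := int(notAfter.Sub(windowStart) / chunk)
	return ((idx % numShards) + numShards) % numShards
}
```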
It uses the new SA.GetRevokedCerts method to collect all of the revoked
certificates whose notAfter timestamps fall within the boundaries of
each shard's time-bucket. It uses the new CA.GenerateCRL method to sign
the CRLs. In the future, it will send signed CRLs to the crl-storer to
be persisted outside our infrastructure.
Fixes #6163