Replace all of Boulder's usage of the Go stdlib "math/rand" package with
the newer "math/rand/v2" package which first became available in go1.22.
This package has an improved API and faster performance across the
board.
See https://go.dev/blog/randv2 and https://go.dev/blog/chacha8rand for
details.
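For most call sites the change is mechanical. A minimal before/after sketch (go1.22+, nothing Boulder-specific):

```go
package main

import (
	"fmt"
	"math/rand/v2" // replaces "math/rand"; requires go1.22+
)

func main() {
	// v2 is automatically seeded; rand.Seed no longer exists.
	n := rand.IntN(100)       // was rand.Intn(100) in math/rand
	m := rand.Int64N(1 << 40) // was rand.Int63n(1 << 40)
	fmt.Println(n, m)
}
```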
Replaced our embeds of foopb.UnimplementedFooServer with
foopb.UnsafeFooServer. Per the grpc-go docs this reduces the "forwards
compatibility" of our implementations, but that is only a concern for
codebases that are implementing gRPC interfaces maintained by third
parties, and which want to be able to update those third-party
dependencies without updating their own implementations in lockstep.
Because we update our protos and our implementations simultaneously, we
can remove this safety net to replace runtime type checking with
compile-time type checking.
However, that replacement is not enough, because we never pass our
implementation objects to a function which asserts that they match a
specific interface. So this PR also replaces our reflect-based unittests
with idiomatic interface assertions. I do not view this as a perfect
solution, as it relies on people implementing new gRPC servers to add
this line, but it is no worse than the status quo which relied on people
adding the "TestImplementation" test.
Fixes https://github.com/letsencrypt/boulder/issues/7497
Update the version of protoc-gen-go-grpc that we use to generate Go gRPC
code from our proto files, and update the versions of other gRPC tools
and libraries that we use to match. Turn on the new
`use_generic_streams` code generation flag to change how
protoc-gen-go-grpc generates implementations of our streaming methods,
from creating a wholly independent implementation for every stream to
using shared generic implementations.
Take advantage of this code-sharing to remove our SA "wrapper" methods,
now that they truly have the same signature as the SARO methods which
they wrap. Also remove all references to the old-style stream names
(e.g. foopb.FooService_BarMethodClient) and replace them with the new
underlying generic names, for the sake of consistency. Finally, also
remove a few custom stream test mocks, replacing them with the generic
mocks.ServerStreamClient.
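To illustrate why the generic interfaces make mock-sharing possible, here is a simplified stand-in that mirrors the shape of grpc-go's generic stream types (not their exact definitions):

```go
package main

import "fmt"

// A simplified stand-in for grpc-go's generic stream interfaces
// (grpc.ServerStreamingServer[T] and friends), for illustration only.
type ServerStreamingServer[T any] interface {
	Send(*T) error
}

// One generic mock now serves every server-streaming method, instead
// of one hand-written mock per generated per-method stream interface.
type streamRecorder[T any] struct{ sent []*T }

func (s *streamRecorder[T]) Send(m *T) error {
	s.sent = append(s.sent, m)
	return nil
}

func main() {
	rec := &streamRecorder[string]{}
	var stream ServerStreamingServer[string] = rec // one mock fits all methods
	msg := "entry"
	_ = stream.Send(&msg)
	fmt.Println(len(rec.sent)) // 1
}
```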
Note that this PR does not change the names in //mocks/sa.go, to avoid
conflicts with work happening in the pursuit of
https://github.com/letsencrypt/boulder/issues/7476. Note also that this
PR updates the version of protoc-gen-go-grpc that we use to a specific
commit. This is because, although a new release of grpc-go itself has
been cut, the codegen binary is a separate Go module with its own
releases, and it hasn't had a new release cut yet. Tracking for that is
in https://github.com/grpc/grpc-go/issues/7030.
Replace "mocks.StorageAuthority" with "sapb.StorageAuthorityClient" in
our test mocks. This improves them by removing implementations of the
methods the tests don't actually need, instead of inheriting lots of
extraneous methods from the huge and cumbersome mocks.StorageAuthority.
This reduces our usage of mocks.StorageAuthority to only the WFE tests
(which create one in the frequently-used setup() function), which will
make refactoring those mocks in the pursuit of
https://github.com/letsencrypt/boulder/issues/7476 much easier.
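The pattern, sketched with a stand-in interface (the real sapb.StorageAuthorityClient has dozens of methods):

```go
package main

import (
	"context"
	"fmt"
)

// Stand-in for the generated sapb.StorageAuthorityClient interface.
type StorageAuthorityClient interface {
	GetRegistration(ctx context.Context, id int64) (string, error)
	// ... many more methods in the real interface ...
}

// Embedding the interface satisfies it wholesale; we override only
// what this test needs. Calling any other method panics loudly,
// which is exactly what we want from a test double.
type mockSA struct {
	StorageAuthorityClient
}

func (mockSA) GetRegistration(ctx context.Context, id int64) (string, error) {
	return fmt.Sprintf("reg-%d", id), nil
}

func main() {
	var sa StorageAuthorityClient = mockSA{}
	got, _ := sa.GetRegistration(context.Background(), 42)
	fmt.Println(got)
}
```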
Part of https://github.com/letsencrypt/boulder/issues/7476
We had disabled our lints on go1.22 because golangci-lint and
staticcheck didn't work with some of its updates. Re-enable them, and
fix the things which the updated linters catch now.
Fixes https://github.com/letsencrypt/boulder/issues/7229
Change crl-storer to require only that one of the IssuingDistributionPoint
URIs remains consistent between consecutive CRLs in the same sequence.
This allows us to add and remove IDP URIs, so we can change our IDP
scheme over time.
To facilitate this, also move all code which builds or parses IDP
extensions into a single place, so that we no longer need multiple
definitions of the same types and near-duplicate code scattered across
packages.
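For illustration, the relaxed consistency check can be as simple as this sketch (assuming the IDP URIs have already been extracted from each CRL; names hypothetical):

```go
package main

import "fmt"

// anyCommonURI reports whether the previous and next CRLs share at
// least one IssuingDistributionPoint URI; a hypothetical sketch of the
// relaxed check, not crl-storer's actual code.
func anyCommonURI(prev, next []string) bool {
	seen := make(map[string]bool, len(prev))
	for _, u := range prev {
		seen[u] = true
	}
	for _, u := range next {
		if seen[u] {
			return true
		}
	}
	return false
}

func main() {
	prev := []string{"http://old.example/crl/1.crl"}
	next := []string{"http://old.example/crl/1.crl", "http://new.example/1.crl"}
	fmt.Println(anyCommonURI(prev, next)) // true: safe to rotate gradually
}
```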
Fixes https://github.com/letsencrypt/boulder/issues/7340
Part of https://github.com/letsencrypt/boulder/issues/7296
Move the CRL issuance logic -- building an x509.RevocationList template,
populating it with correctly-built extensions, linting it, and actually
signing it -- out of the //ca package and into the //issuance package.
This means that the CA's CRL code no longer needs to be able to reach
inside the issuance package to access its issuers and certificates (and
those fields will be able to be made private after the same is done for
OCSP issuance).
Additionally, improve the configuration of CRL issuance, create
additional checks on each CRL's ThisUpdate and NextUpdate fields, and make it
possible for a CRL to contain two IssuingDistributionPoint URIs so that
we can migrate to shorter addresses.
IN-10045 tracks the corresponding production changes.
Fixes https://github.com/letsencrypt/boulder/issues/7159
Part of https://github.com/letsencrypt/boulder/issues/7296
Part of https://github.com/letsencrypt/boulder/issues/7294
Part of https://github.com/letsencrypt/boulder/issues/7094
Part of https://github.com/letsencrypt/boulder/issues/7100
Remove the Profile field from issuance.Issuer, to reflect the fact that
profiles are in fact independent pieces of configuration which can be
shared across (and are configured independently of) multiple issuers.
Move the IssuerURL, OCSPURL, and CRLURL fields from issuance.Profile to
issuance.Issuer, since they reflect fundamental attributes of the
issuer, rather than attributes of a particular profile. This also
reflects the location at which those values are configured, in
issuance.IssuerConfig.
All other changes are fallout from the above: adding a Profile argument
to various methods in the issuance and linting packages, adding a
profile field to the caImpl struct, etc. This change paves the way for
two future changes: moving OCSP and CRL creation into the issuance
package, and supporting multiple simultaneous profiles that the CA can
select between.
Part of https://github.com/letsencrypt/boulder/issues/7159
Part of https://github.com/letsencrypt/boulder/issues/6316
Part of https://github.com/letsencrypt/boulder/issues/6966
Rename "IssuerNameID" to just "NameID". Similarly rename the standalone
functions which compute it to better describe their function. Add a
.NameID() directly to issuance.Issuer, so that callers in other packages
don't have to directly access the .Cert member of an Issuer. Finally,
rearrange the code in issuance.go to be sensibly grouped as concerning
NameIDs, Certificates, or Issuers, rather than all mixed up between the
three.
Fixes https://github.com/letsencrypt/boulder/issues/5152
This is a cleanup PR finishing the migration from int64 timestamps to
protobuf `*timestamppb.Timestamp`s by removing all usage of the old
int64 fields. In the previous PR,
https://github.com/letsencrypt/boulder/pull/7121, all fields were
switched to read from the protobuf timestamppb fields.
Add a new case to `core.IsAnyNilOrZero` that checks the various
properties of a `*timestamppb.Timestamp`, reducing boilerplate for
callers.
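A sketch of the kind of check this adds (not the exact code in core.IsAnyNilOrZero):

```go
package main

import (
	"fmt"
	"time"

	"google.golang.org/protobuf/types/known/timestamppb"
)

// isNilOrZeroTimestamp treats a *timestamppb.Timestamp as unset if the
// pointer is nil, if it is the protobuf zero value (the Unix epoch),
// or if it encodes Go's zero time.
func isNilOrZeroTimestamp(ts *timestamppb.Timestamp) bool {
	return ts == nil ||
		(ts.Seconds == 0 && ts.Nanos == 0) ||
		ts.AsTime().IsZero()
}

func main() {
	fmt.Println(isNilOrZeroTimestamp(nil))                         // true
	fmt.Println(isNilOrZeroTimestamp(&timestamppb.Timestamp{}))    // true
	fmt.Println(isNilOrZeroTimestamp(timestamppb.New(time.Now()))) // false
}
```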
Fixes https://github.com/letsencrypt/boulder/issues/7060
Make crl.GetChunkAtTime an exported function, so that it can be used by
the RA in a future PR without making that PR bigger and harder to
review. Also move it from being a method to an independent function
which takes two new arguments to compensate for the loss of its
receiver.
Also move some tests from batch_test to updater_test, where they should
have been in the first place.
Part of https://github.com/letsencrypt/boulder/issues/7094
* Add new `google.protobuf.Timestamp` fields to each .proto file where
we had been using `int64` fields as a timestamp.
* Update relevant gRPC messages to populate the new
`google.protobuf.Timestamp` fields in addition to the old `int64`
timestamp fields.
* Add tests for each `<x>ToPB` and `PBTo<x>` function to ensure that
new fields passed into a gRPC message arrive as intended.
* Remove an unused error return from `PBToCert` and `PBToCertStatus`
and clean up each call site.
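For example, a conversion helper during this transition populates both representations so readers can switch over independently (message and field names hypothetical):

```go
package main

import (
	"fmt"
	"time"

	"google.golang.org/protobuf/types/known/timestamppb"
)

// Hypothetical message mirroring the migration pattern: the old int64
// nanosecond field kept alongside the new Timestamp field.
type CertStatus struct {
	RevokedDateNS int64                  // old-style timestamp
	RevokedDate   *timestamppb.Timestamp // new-style timestamp
}

func statusToPB(revoked time.Time) *CertStatus {
	// During the migration, writers populate both representations so
	// that readers can be switched over independently.
	return &CertStatus{
		RevokedDateNS: revoked.UnixNano(),
		RevokedDate:   timestamppb.New(revoked),
	}
}

func main() {
	pb := statusToPB(time.Now())
	fmt.Println(pb.RevokedDateNS, pb.RevokedDate.AsTime())
}
```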
Built on top of https://github.com/letsencrypt/boulder/pull/7069
Part 2 of 4 related to
https://github.com/letsencrypt/boulder/issues/7060
Have the crl-storer download the previous CRL from S3, parse it, and
compare its number against the about-to-be-uploaded CRL. This is not an
atomic operation, so it is not a 100% guarantee, but it is still a
useful safety check to prevent accidentally uploading CRL shards whose
CRL Numbers are not strictly increasing.
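The heart of the check might look like this (plumbing and names illustrative; the stdlib calls are real):

```go
package storer

import (
	"crypto/x509"
	"fmt"
)

// checkCRLNumber parses the previously-uploaded CRL and the candidate
// CRL, and refuses the upload unless the CRL Number strictly increases.
func checkCRLNumber(prevDER, nextDER []byte) error {
	prev, err := x509.ParseRevocationList(prevDER)
	if err != nil {
		return fmt.Errorf("parsing previous CRL: %w", err)
	}
	next, err := x509.ParseRevocationList(nextDER)
	if err != nil {
		return fmt.Errorf("parsing new CRL: %w", err)
	}
	if next.Number.Cmp(prev.Number) <= 0 {
		return fmt.Errorf("new CRL number %v must be greater than %v", next.Number, prev.Number)
	}
	return nil
}
```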
Part of https://github.com/letsencrypt/boulder/issues/6456
Replace crl-updater's overly complex RunOnce and updateIssuer methods
with a single, much simpler RunOnce modeled on the recently-redone
continuous Run method. Instead of breaking
things down by issuer then shard, simply kick off everything in
parallel. This also improves batch mode's ability to listen for context
cancellations at all the appropriate times.
At the same time, move getShardMappings into the shared updater.go file
because it is used by both the batch and continuous modes of operation,
and improve uniformity of usage of the crlId structure in log output.
Fixes https://github.com/letsencrypt/boulder/issues/7066
Delete our forked version of the x509 library, and update all call-sites
to use the version that we upstreamed and got released in go1.21. This
requires making a few changes to calling code:
- replace crl_x509.RevokedCertificate with x509.RevocationListEntry
- replace RevocationList.RevokedCertificates with
RevocationList.RevokedCertificateEntries
- make RevocationListEntry.ReasonCode a non-pointer integer
Our lints cannot yet be updated to use the new types and fields, because
those improvements have not yet been adopted by the zcrypto/x509 package
used by the linting framework.
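A sketch of a call site after this migration, using the stdlib types as released in go1.21:

```go
package checker

import (
	"crypto/x509"
	"fmt"
)

// printEntries demonstrates the new stdlib API: revoked certificates
// live in RevokedCertificateEntries, and ReasonCode is a plain int.
func printEntries(der []byte) error {
	crl, err := x509.ParseRevocationList(der)
	if err != nil {
		return err
	}
	for _, entry := range crl.RevokedCertificateEntries {
		fmt.Printf("serial=%v reason=%d revoked=%s\n",
			entry.SerialNumber, entry.ReasonCode, entry.RevocationTime)
	}
	return nil
}
```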
Fixes https://github.com/letsencrypt/boulder/issues/6741
Overhaul crl-updater's default (i.e. non-runOnce) behavior to update
individual CRL shards continuously, rather than updating all shards in a
large batch.
To accomplish this, it spins up one goroutine for each shard of each
issuer this updater is responsible for. Each goroutine is solely
responsible for its assigned shard. It sleeps for a random amount of
time (to stagger their starts), then begins a ticker to wake up every
updateInterval and re-issue its shard.
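A sketch of the per-shard loop (simplified; the real code also plumbs through metrics and error handling):

```go
package updater

import (
	"context"
	"fmt"
	"math/rand/v2"
	"time"
)

// runShard sketches the per-shard goroutine: sleep a random fraction of
// the update interval to stagger starts, then re-issue the shard on
// every tick until the context is cancelled.
func runShard(ctx context.Context, issuer string, shard int, updateInterval time.Duration) {
	stagger := rand.N(updateInterval) // uniform in [0, updateInterval)
	select {
	case <-ctx.Done():
		return
	case <-time.After(stagger):
	}

	ticker := time.NewTicker(updateInterval)
	defer ticker.Stop()
	for {
		fmt.Printf("re-issuing shard %d for issuer %s\n", shard, issuer)
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}
```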
As part of this change, refactor updater.go into three separate files
(batch.go, continuous.go, and updater.go) containing functions dedicated
to single-run batch processing, long-running continuous processing, and
shared helpers, respectively.
IN-9475 tracks the deprecation of the `updateOffset` config key. The
other configuration changes in this PR do not require production
changes.
Fixes https://github.com/letsencrypt/boulder/issues/7023
When crl-updater produces a CRL with shard index 0, have it also produce
an identical CRL with shard index NumShards (e.g. 128). This is the
first step towards having it only produce shards numbered 1 through
NumShards, i.e. transition from using 0-indexing to 1-indexing.
We want to do this because various aspects of Go and gRPC cannot
tell the difference between "this struct/message has no shard index set"
and "this struct/message has shard index 0 set".
Part of https://github.com/letsencrypt/boulder/issues/7007
Add a new feature flag, LeaseCRLShards, which controls certain aspects
of crl-updater's behavior.
When this flag is enabled, crl-updater calls the new SA.LeaseCRLShard
method before beginning work on a shard. This prevents it from stepping
on the toes of another crl-updater instance which may be working on the
same shard. This is important to prevent two competing instances from
accidentally updating a CRL's Number (which is an integer representation
of its thisUpdate timestamp) *backwards*, which would be a compliance
violation.
When this flag is enabled, crl-updater also calls the new
SA.UpdateCRLShard method after finishing work on a shard.
In the future, additional work will be done to make crl-updater use the
"give me the oldest available shard" mode of the LeaseCRLShard method.
Fixes https://github.com/letsencrypt/boulder/issues/6897
Update zlint to v3.5.0, which introduces scaffolding for running lints
over CRLs.
Convert all of our existing CRL checks to structs which match the zlint
interface, and add them to the registry. Then change our linter's
CheckCRL function, and crl-checker's Validate function, to run all lints
in the zlint registry.
Finally, update the ceremony tool to run these lints as well.
This change touches a lot of files, but involves almost no logic
changes. It's all just infrastructure, changing the way our lints and
their tests are shaped, and moving test files into new homes.
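For reference, a CRL lint in the new shape looks roughly like this (my reading of the zlint v3.5 API; details such as the metadata fields may differ):

```go
package lints

import (
	"github.com/zmap/zcrypto/x509"
	"github.com/zmap/zlint/v3/lint"
)

// crlHasNumber is an illustrative lint in the new shape: a struct
// implementing CheckApplies and Execute, registered at init time so
// that it runs whenever the registry is applied to a CRL.
type crlHasNumber struct{}

func init() {
	lint.RegisterRevocationListLint(&lint.RevocationListLint{
		LintMetadata: lint.LintMetadata{
			Name:        "e_crl_has_number_example",
			Description: "CRLs must carry a monotonically increasing CRL number",
			Source:      lint.Community,
		},
		Lint: func() lint.RevocationListLintInterface { return &crlHasNumber{} },
	})
}

func (l *crlHasNumber) CheckApplies(c *x509.RevocationList) bool { return true }

func (l *crlHasNumber) Execute(c *x509.RevocationList) *lint.LintResult {
	if c.Number == nil {
		return &lint.LintResult{Status: lint.Error}
	}
	return &lint.LintResult{Status: lint.Pass}
}
```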
Fixes https://github.com/letsencrypt/boulder/issues/6934
Fixes https://github.com/letsencrypt/boulder/issues/6979
Add per-shard exponential backoff and retry to crl-updater. Each
individual CRL shard will be retried up to MaxAttempts (default 1)
times, with exponential backoff starting at 1 second and maxing out at 1
minute between each attempt.
This can effectively reduce the parallelism of crl-updater: while a
goroutine is sleeping between attempts of a failing shard, it is not
doing work on another shard. This is a desirable feature, since it means
that crl-updater gently reduces the total load it places on the network
and database when shards start to fail.
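A simplified sketch of the retry loop (the real implementation presumably also adds jitter and honors context cancellation):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// updateWithRetry retries a failing shard update with exponential
// backoff: doubling from 1 second, capped at 1 minute, up to
// maxAttempts tries in total.
func updateWithRetry(update func() error, maxAttempts int) error {
	backoff := time.Second
	const maxBackoff = time.Minute
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if attempt > 0 {
			time.Sleep(backoff)
			backoff *= 2
			if backoff > maxBackoff {
				backoff = maxBackoff
			}
		}
		if err = update(); err == nil {
			return nil
		}
	}
	return fmt.Errorf("all %d attempts failed: %w", maxAttempts, err)
}

func main() {
	err := updateWithRetry(func() error { return errors.New("boom") }, 1)
	fmt.Println(err)
}
```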
Setting this new config parameter is tracked in IN-9140
Fixes https://github.com/letsencrypt/boulder/issues/6895
Enable the errcheck linter. Update the way we express exclusions to use
the new, non-deprecated, non-regex-based format. Fix all places where we
began accidentally violating errcheck while it was disabled.
As a follow-up to #6780, add the same style of implementation test to
all of our other gRPC services. This was not included in that PR just to
keep it small and single-purpose.
In the WFE, ocsp-responder, and crl-updater, switch from using
StorageAuthorityClients to StorageAuthorityReadOnlyClients. This ensures
that these services cannot call methods which write to our database.
Fixes #6454
The code in crl/checker only needs x509.Certificate. This allows us to
remove the crl checker's dependency on pkcs11key, and transitively on
CGO.
I've confirmed that `CGO_ENABLED=0 go build ./crl/checker` succeeds on
this branch, while it fails on main.
Add two new config keys to the crl-updater:
* shardWidth, which controls the width of the chunks that we divide all
of time into, with a default value of "16h" (approximately the same as
today's shard width derived from 128 shards covering 90 days); and
* lookbackPeriod, which controls the amount of already-expired
certificates that should be included in our CRLs to ensure that even
certificates which are revoked immediately before they expire still show
up in at least one CRL, with a default value of "24h" (approximately
the same as today's lookback period derived from our run frequency of
6h).
Use these two new values to change the way CRL shards are computed.
Previously, we would compute the total time we care about based on the
configured certificate lifetime (to determine how far forward to look)
and the configured update period (to determine how far back to look),
and then divide that time evenly by the number of shards. However, this
method had two fatal flaws. First, if the certificate lifetime is
configured incorrectly, then the CRL updater will fail to query the
database for some certs that should be included in the CRLs. Second, if
the update period is changed, this would change the lookback period,
which in turn would change the shard width, causing all CRL entries to
suddenly change which shard they're in.
Instead, first compute all chunk locations based only on the shard width
and number of shards. Then determine which chunks we need to care about
based on the configured lookback period and by querying the database for
the farthest-future expiration, to ensure we cover all extant
certificates. This may mean that more than one chunk of time will get
mapped to a single shard, but that's okay -- each chunk will remain
mapped to the same shard for the whole time we care about it.
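The resulting mapping is stable because it depends only on a fixed anchor time, the shard width, and the shard count. A sketch with illustrative values:

```go
package main

import (
	"fmt"
	"time"
)

// shardFor sketches the stable mapping described above: the timeline is
// divided into fixed-width chunks starting at a fixed anchor, and chunk
// i always maps to shard i % numShards, no matter how far forward or
// back we are looking.
func shardFor(t, anchor time.Time, shardWidth time.Duration, numShards int) int {
	chunk := int(t.Sub(anchor) / shardWidth)
	return chunk % numShards
}

func main() {
	// Illustrative values: a fixed anchor, 16h chunks, 128 shards.
	anchor := time.Date(2015, time.June, 4, 11, 4, 38, 0, time.UTC)
	width := 16 * time.Hour
	expiry := time.Now().Add(30 * 24 * time.Hour)
	fmt.Println(shardFor(expiry, anchor, width, 128))
}
```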
Fixes #6438
Fixes #6440
Clean up several spots where we were behaving differently on
go1.18 and go1.19, now that we're using go1.19 everywhere. Also
re-enable the lint and generate tests, and fix the various places where
the two versions disagreed on how comments should be formatted.
Also clean up the OldTLS codepaths, now that both go1.19 and our
own feature flags have forbidden TLS < 1.2 everywhere.
Fixes #6011
Currently, we deliberately avoid returning early when we first check
the error from S3, to ensure that we can update the metrics
appropriately. This requires us to use an unconventional error-checking
structure, and to check the error again when it comes time to return.
Instead, move the metrics above the error check, and then make
the error check a more traditional structure.
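A sketch of the resulting shape (names illustrative; the metric stands in for a labeled Prometheus counter):

```go
package storer

import "fmt"

// recordUploadResult stands in for incrementing a labeled Prometheus
// counter; it runs unconditionally, whether or not the upload failed.
func recordUploadResult(err error) {
	result := "success"
	if err != nil {
		result = "error"
	}
	fmt.Printf("crl_storer_uploads{result=%q} +1\n", result)
}

// storeCRL shows the conventional structure: record the metric first,
// then perform a single traditional error check.
func storeCRL(upload func() error) error {
	err := upload()
	recordUploadResult(err)
	if err != nil {
		return fmt.Errorf("uploading CRL: %w", err)
	}
	return nil
}
```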
Change the anchor time used to stabilize shard boundaries from
the zero time (time.Time{}) to the notBefore date of Let's Encrypt's
first self-signed root certificate.
This prevents us from attempting to compute durations that are greater
than approximately 292 years, the maximum duration representable in Go.
Previously, using the zero time was causing our durations to all be
equal to the maximum duration, removing the utility of the anchor time
and causing our shard boundaries to drift with time.
Explicitly inform go vet about the names of our logging methods
which should be checked in the same way as fmt.Printf is. Although
go vet can often find such functions on its own, it can't find these
ones because log.Logger is an interface, not a struct.
In addition, fix several format string mistakes caught by go vet.
Previously, we would stream CRL entries directly from the SA's response
stream into the CA's request stream, and similarly directly stream bytes
from the CA's response stream into the Storer's request stream.
Since we're seeing odd errors and inconsistencies in our gRPC streaming
metrics, simplify these to only conduct one stream at a time. This will
make our streaming and error semantics much simpler, at the cost of
memory usage in the updater.
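A sketch of the new shape, with byte slices standing in for the proto messages:

```go
package updater

import (
	"errors"
	"io"
)

// Minimal stand-ins for the gRPC stream halves involved.
type recvStream interface{ Recv() ([]byte, error) }
type sendStream interface{ Send([]byte) error }

// relay drains the source stream completely before writing anything to
// the destination, so only one gRPC stream is in flight at a time. This
// trades the updater's memory for much simpler error semantics.
func relay(src recvStream, dst sendStream) error {
	var buffered [][]byte
	for {
		msg, err := src.Recv()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return err
		}
		buffered = append(buffered, msg)
	}
	for _, msg := range buffered {
		if err := dst.Send(msg); err != nil {
			return err
		}
	}
	return nil
}
```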
Fix instances where an error check was conditioned on something
other than the traditional `err`, such as `myStruct.err`, but then the
error being logged was the `err` from elsewhere in the function.
Modify the way errors are handled in crl-updater:
- Rather than having each method in the tick, tickIssuer, tickShard
chain concatenate all errors from its children, simply have them
summarize the errors. This results in much shorter error messages.
- Rather than having each method log its own errors as it returns, have
each caller responsible for logging the errors it receives from its
children.
In addition, add tests for tick, tickIssuer, and tickShard which cover
their simple error paths, where one of the gRPC requests to the SA, CA,
or CRLStorer encounters an error. These tests let us ensure that errors
are being properly propagated upwards through the layers of indirection
and goroutines in the three methods under test, and that the appropriate
metrics are being incremented and log messages are being printed.
Fixes #6375
Make every function in the Run -> Tick -> tickIssuer -> tickShard chain
return an error. Make that return value a named return (which we usually
avoid) so that we can remove the manual setting of the metric result
label and have the deferred metric handling function take care of that
instead. In addition, let that cleanup function wrap the returned error
(if any) with the identity of the shard, issuer, or tick that is
returning it, so that we don't have to include that info in every
individual error message. Finally, have the functions which spin off
many helpers (Tick and tickIssuer) collect all of their helpers' errors
and only surface that error at the end, to ensure the process completes
even in the presence of transient errors.
In crl-updater's main, surface the error returned by Run or Tick, to
make debugging easier.
In crl-updater's tickShard, which handles all of the gRPC requests
to the SA, CA, and crl-storer, ensure that any open gRPC connections
get closed when the function returns for any reason by canceling the
context object used in those connections.
This fixes a goroutine leak where the updater would open simultaneous
connections to the SA and CA, get an error from the SA, and then return
without closing the connection to the CA. This left the CA stream open
forever, leading to goroutine and memory leaks in the updater.
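The fix is the standard cancel-on-return pattern; a sketch:

```go
package updater

import "context"

// tickShard sketches the fix: derive a cancellable context for all the
// gRPC streams opened in this call, and cancel it on every return path,
// so an early error from one stream cannot strand the goroutines
// serving the others.
func tickShard(ctx context.Context) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // tears down any still-open SA/CA/storer streams

	// ... open gRPC streams with ctx and do the shard's work ...
	_ = ctx
	return nil
}
```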
Remove the secondsSinceSuccess metric, which was never being set, and
which was of little use anyway given the bursty nature of CRL update
scheduling.
Add an "issuer" label to the crl_updater_generated metric, to match
the other metrics exported by the updater and the storer.
Change the S3 object paths to not include the CRL Number.
At one point, the plan was to upload all of the CRL shards to
S3 paths containing their CRL Number (which monotonically
increases every generation), and then later move or symlink
them into paths not containing that number. However, we saw
that S3 does not have any atomic move or rename semantics,
so we decided to instead enable object versioning and upload
the shards to the same path every time. Apparently I never fixed
the object key computation to match the updated design.
The CRL Number is still stored on the object as a metadata tag.
- Create new package `crl`
- Add a common unique CRL identifier `crl.id` with constructor `crl.Id()`
- Replace `shardIdx` with `crl.Id` in `storer` and `updater` errors
- Add a common type for the `CRLNumber` field `crl.number` with constructor
`crl.Number()`
- Replace `CRLNumber` construction in CA and CRL package with `crl.Number()`
Resolves #6261
The ioutil package has been deprecated since go1.16; the various
functions it provided now exist in the os and io packages. Replace all
instances of ioutil with either io or os, as appropriate.
Create a new crl-storer service, which receives CRL shards via gRPC and
uploads them to an S3 bucket. It ignores AWS SDK configuration in the
usual places, in favor of configuration from our standard JSON service
config files. It ensures that the CRLs it receives parse and are signed
by the appropriate issuer before uploading them.
Integrate crl-updater with the new service. It streams bytes to the
crl-storer as it receives them from the CA, without performing any
checking at the same time. This new functionality is disabled if the
crl-updater does not have a config stanza instructing it how to connect
to the crl-storer.
Finally, add a new test component, the s3-test-srv. This acts similarly
to the existing mail-test-srv: it receives requests, stores information
about them, and exposes that information for later querying by the
integration test. The integration test uses this to ensure that a
newly-revoked certificate does show up in the next generation of CRLs
produced.
Fixes #6162