Commit Graph

20 Commits

Author SHA1 Message Date
Aaron Gable c0e31f9a4f
Add integration test for when CRL entries are removed (#8084)
We already have an integration test showing that a serial does not show
up on any CRL before its certificate has been revoked, and does show up
afterwards. Extend that test to cover three new times:
- shortly before the certificate expires, when the entry must still
appear;
- shortly after the certificate expires, when the entry must still
appear; and
- significantly after the certificate expires, when the entry may be
removed.

To facilitate this, augment the s3-test-srv with a new reset endpoint,
so that the integration test can query the contents of only the
most-recently-generated set of CRLs.

I have confirmed that the new integration test fails with
https://github.com/letsencrypt/boulder/pull/8072 reverted.

Fixes https://github.com/letsencrypt/boulder/issues/8083
2025-03-31 09:07:41 -07:00
James Renken 3f879ed0b4
Add Identifiers to Authorization & Order structs (#7961)
Add `identifier` fields, which will soon replace the `dnsName` fields,
to:
- `corepb.Authorization`
- `corepb.Order`
- `rapb.NewOrderRequest`
- `sapb.CountFQDNSetsRequest`
- `sapb.CountInvalidAuthorizationsRequest`
- `sapb.FQDNSetExistsRequest`
- `sapb.GetAuthorizationsRequest`
- `sapb.GetOrderForNamesRequest`
- `sapb.GetValidAuthorizationsRequest`
- `sapb.NewOrderRequest`

Populate these `identifier` fields in every function that creates
instances of these structs.

Use these `identifier` fields instead of `dnsName` fields (at least
preferentially) in every function that uses these structs. When crossing
component boundaries, don't assume they'll be present, for
deployability's sake.

Deployability note: Mismatched `cert-checker` and `sa` versions will be
incompatible because of a type change in the arguments to
`sa.SelectAuthzsMatchingIssuance`.

Part of #7311
2025-03-26 10:30:24 -07:00
Aaron Gable 12e660874d
Reduce flakiness in crl-updater integration tests (#8044)
Remove crl-updater from the list of services run by startservers.py, so
that it isn't running at the same time as the crl-updater instances run
by specific integration tests. In return, add a new integration test
which starts crl-updater and waits for it to listen on its debug port,
just like startservers does.

Also make the existing crl-updater integration tests more robust and
more parallelizable by having them always reset the leasedUntil column
before executing the updater, instead of requiring each individual test
to perform that reset.

Fixes https://github.com/letsencrypt/boulder/issues/7590
2025-03-07 09:38:02 -08:00
Aaron Gable 28b49a82d4
SA: Improve concurrency robustness of CRL leasing transactions (#8030)
In a few places within the SA, we use explicit transactions to wrap
read-then-update style operations. Because we set the transaction
isolation level on a per-session basis, these transactions do not in
fact change their isolation level, and therefore generally remain at the
default isolation level of REPEATABLE READ.

Unfortunately, we cannot resolve this simply by converting the SELECT
statements into SELECT...FOR UPDATE statements: although this would fix
the issue by making those queries into locking statements, it also
triggers what appears to be an InnoDB bug when many transactions all
attempt to select-then-insert into a table with both a primary key and a
separate unique key, as the crlShards table has. This causes the
integration tests in GitHub Actions, which run with an empty database
and therefore use the needToInsert codepath instead of the update
codepath, to consistently flake.

Instead, resolve the issue by having the UPDATE statements specify that
the value of the leasedUntil column is still the same as was read by the
initial SELECT. Although two crl-updaters may still attempt these
transactions concurrently, the UPDATE statements will still be fully
sequenced, and the latter one will fail.

Part of https://github.com/letsencrypt/boulder/issues/8031
2025-03-03 15:29:57 -08:00
Aaron Gable 63a0e500ed
Create profiles integration test (#8003)
This wasn't previously possible because eggsampler/acme didn't support
profiles until late last week.
2025-02-11 15:47:41 -08:00
Jacob Hoffman-Andrews eda496606d
crl-updater: split temporal/explicit sharding by serial (#7990)
When we turn on explicit sharding, we'll change the CA serial prefix, so
we can know that all issuance from the new prefixes uses explicit
sharding, and all issuance from the old prefixes uses temporal sharding.
This lets us avoid putting a revoked cert in two different CRL shards
(the temporal one and the explicit one).

To achieve this, the crl-updater gets a list of temporally sharded
serial prefixes. When it queries the `certificateStatus` table by date
(`GetRevokedCerts`), it will filter out explicitly sharded certificates:
those that don't have their prefix on the list.

Part of #7094
2025-02-04 11:45:46 -05:00
Jacob Hoffman-Andrews a8074d2e9d
test: add more testing for CRL revocation (#7957)
In revocation_test.go, fetch all CRLs, and look for revoked certificates
on both CRLs and OCSP.

Make s3-test-srv listen on all interfaces, so the CRL URLs in the CA
config work.

Add IssuerNameIDs to the CRL URLs in ca.json, to match how those CRLs
are uploaded to S3.

Make TestRevocation parallel. Speedup from ~60s to ~3s.

Increase ocsp-responder's allowed parallelism to account for parallel
test. Also, add "maxInflightSignings" to config/ since it's in prod.
"maxSigningWaiters" is not yet in prod, so don't move that field.

Add a mutex around running crl-updater, and decrease the log level so
errors stand out more when they happen.
2025-01-23 18:49:55 -08:00
Aaron Gable 327f96d281
Update integration test hierarchy for the modern era (#7411)
Update the hierarchy which the integration tests auto-generate inside
the ./hierarchy folder to include three intermediates of each key type,
two to be actively loaded and one to be held in reserve. To facilitate
this:
- Update the generation script to loop, rather than hard-coding each
intermediate we want
- Improve the filenames of the generated hierarchy to be more readable
- Replace the WFE's AIA endpoint with a thin aia-test-srv so that we
don't have to have NameIDs hardcoded in our ca.json configs

Having this new hierarchy will make it easier for our integration tests
to validate that new features like "unpredictable issuance" are working
correctly.

Part of https://github.com/letsencrypt/boulder/issues/729
2024-04-08 14:06:00 -07:00
Aaron Gable 19f03da1e4
crl-storer: check number before uploading (#7065)
Have the crl-storer download the previous CRL from S3, parse it, and
compare its number against the about-to-be-uploaded CRL. This is not an
atomic operation, so it is not a 100% guarantee, but it is still a
useful safety check to prevent accidentally uploading CRL shards whose
CRL Numbers are not strictly increasing.

Part of https://github.com/letsencrypt/boulder/issues/6456
2023-09-27 09:12:44 -07:00
Aaron Gable 6a450a2272
Improve CRL shard leasing (#7030)
Simplify the index-picking logic in the SA's leaseOldestCrlShard method.
Specifically, more clearly separate it into "missing" and "non-missing"
cases, which require entirely different logic: picking a random missing
shard, or picking the oldest unleased shard, respectively.

Also change the UpdateCRLShard method to "unlease" shards when they're
updated. This allows the crl-updater to run as quickly as it likes,
while still ensuring that multiple instances do not step on each other's
toes.

The config change for shardWidth and lookbackPeriod instead of
certificateLifetime has been deployed in prod since IN-8445. The config
change changing the shardWidth is just so that the tests neither produce
a bazillion shards, nor have to do a bazillion SA queries for each chunk
within a shard, improving the readability of test logs.

Part of https://github.com/letsencrypt/boulder/issues/7023
2023-08-08 17:05:00 -07:00
Aaron Gable 9a4f0ca678
Deprecate LeaseCRLShards feature (#7009)
This feature flag is enabled in both staging and prod.
2023-08-07 15:17:00 -07:00
Aaron Gable 908421bb98
crl-updater: lease CRL shards to prevent races (#6941)
Add a new feature flag, LeaseCRLShards, which controls certain aspects
of crl-updater's behavior.

When this flag is enabled, crl-updater calls the new SA.LeaseCRLShard
method before beginning work on a shard. This prevents it from stepping
on the toes of another crl-updater instance which may be working on the
same shard. This is important to prevent two competing instances from
accidentally updating a CRL's Number (which is an integer representation
of its thisUpdate timestamp) *backwards*, which would be a compliance
violation.

When this flag is enabled, crl-updater also calls the new
SA.UpdateCRLShard method after finishing work on a shard.

In the future, additional work will be done to make crl-updater use the
"give me the oldest available shard" mode of the LeaseCRLShard method.

Fixes https://github.com/letsencrypt/boulder/issues/6897
2023-07-19 15:11:16 -07:00
Matthew McPherrin b5118dde36
Stop using DIRECTORY env var in integration tests (#6854)
We only ever set it to the same value, and then read it back in
make_client, so just hardcode it there instead.

It's a bit spooky-action-at-a-distance and is process-wide with no
synchronization, which means we can't safely use different values
anyway.
2023-05-03 09:54:48 -04:00
Aaron Gable 6d6f3632da
Change SetCommonName to RequireCommonName (#6749)
Change the SetCommonName flag, introduced in #6706, to
RequireCommonName. Rather than having the flag control both whether or
not a name is hoisted from the SANs into the CN *and* whether or not the
CA is willing to issue certs with no CN, this updated flag now only
controls the latter. By default, the new flag is true, and continues our
current behavior of failing issuance if we cannot set a CN in the cert.
When the flag is set to false, then we are willing to issue certificates
for which the CSR contains no CN and there is no SAN short enough to be
hoisted into the CN field.

When we have rolled out this change, we can move on to the next flag in
this series: HoistCommonName, which will control whether or not a SAN is
hoisted at all, effectively giving the CSRs (and therefore the clients)
full control over whether their certificate contains a SAN.

This change is safe because no environment explicitly sets the
SetCommonName flag to false yet.

Fixes #5112
2023-03-21 11:07:06 -07:00
Matthew McPherrin 1d16ff9b00
Add support for subcommands to "boulder" command (#6426)
Boulder builds a single binary which is symlinked to the different binary names, which are included in its releases.
However, requiring symlinks isn't always convenient.

This change makes the base `boulder` command usable as any of the other binary names.  If the binary is invoked as boulder, runs the second argument as the command name.  It shifts off the `boulder` from os.Args so that all the existing argument parsing can remain unchanged.

This uses the subcommand versions in integration tests, which I think is important to verify this change works, however we can debate whether or not that should be merged, since we're using the symlink method in production, that's what we want to test.

Issue #6362 suggests we want to move to a more fully-featured command-line parsing library that has proper subcommand support. This fixes one fragment of that, by providing subcommands, but is definitely nowhere near as nice as it could be with a more fully fleshed out library.  Thus this change takes a minimal-touch approach to this change, since we know a larger refactoring is coming.
2022-10-06 11:21:47 -07:00
Samantha 90eb90bdbe
test: Replace sd-test-srv with consul (#6389)
- Add a dedicated Consul container
- Replace `sd-test-srv` with Consul
- Add documentation for configuring Consul
- Re-issue all gRPC credentials for `<service-name>.service.consul`

Part of #6111
2022-09-19 16:13:53 -07:00
Aaron Gable 7f189f7a3b
Improve how crl-updater formats and surfaces errors (#6369)
Make every function in the Run -> Tick -> tickIssuer -> tickShard chain
return an error. Make that return value a named return (which we usually
avoid) so that we can remove the manual setting of the metric result
label and have the deferred metric handling function take care of that
instead. In addition, let that cleanup function wrap the returned error
(if any) with the identity of the shard, issuer, or tick that is
returning it, so that we don't have to include that info in every
individual error message. Finally, have the functions which spin off
many helpers (Tick and tickIssuer) collect all of their helpers' errors
and only surface that error at the end, to ensure the process completes
even in the presence of transient errors.

In crl-updater's main, surface the error returned by Run or Tick, to
make debugging easier.
2022-09-12 11:36:42 -07:00
Aaron Gable 78fbda1cd2
Enable CRL test in config integration tests (#6368)
Now that both crl-updater and crl-storer are running in prod,
run this integration test in both test environments as well.

In addition, remove the fake storer grpc client that the updater
used when no storer client was configured, as storer clients
are now configured in all environments.
2022-09-09 16:03:49 -07:00
Aaron Gable 9c197e1f43
Use io and os instead of deprecated ioutil (#6286)
The iotuil package has been deprecated since go1.16; the various
functions it provided now exist in the os and io packages. Replace all
instances of ioutil with either io or os, as appropriate.
2022-08-10 13:30:17 -07:00
Aaron Gable 6a9bb399f7
Create new crl-storer service (#6264)
Create a new crl-storer service, which receives CRL shards via gRPC and
uploads them to an S3 bucket. It ignores AWS SDK configuration in the
usual places, in favor of configuration from our standard JSON service
config files. It ensures that the CRLs it receives parse and are signed
by the appropriate issuer before uploading them.

Integrate crl-updater with the new service. It streams bytes to the
crl-storer as it receives them from the CA, without performing any
checking at the same time. This new functionality is disabled if the
crl-updater does not have a config stanza instructing it how to connect
to the crl-storer.

Finally, add a new test component, the s3-test-srv. This acts similarly
to the existing mail-test-srv: it receives requests, stores information
about them, and exposes that information for later querying by the
integration test. The integration test uses this to ensure that a
newly-revoked certificate does show up in the next generation of CRLs
produced.

Fixes #6162
2022-08-08 16:22:48 -07:00