We already have an integration test showing that a serial does not show
up on any CRL before its certificate has been revoked, and does show up
afterwards. Extend that test to cover three new times:
- shortly before the certificate expires, when the entry must still
appear;
- shortly after the certificate expires, when the entry must still
appear; and
- significantly after the certificate expires, when the entry may be
removed.
To facilitate this, augment the s3-test-srv with a new reset endpoint,
so that the integration test can query the contents of only the
most-recently-generated set of CRLs.
I have confirmed that the new integration test fails with
https://github.com/letsencrypt/boulder/pull/8072 reverted.
Fixes https://github.com/letsencrypt/boulder/issues/8083
Delete several python revocation integration tests whose functionality
is already replicated by the go revocation integration tests. Add
support for revoking via admin-revoker to TestRevocation, and use that
to replace several more python tests.
The go versions of these tests use CRLs, rather than OCSP, to confirm
the revocation status of the certs in question. This is fine because the
purpose of these tests is to ensure that we handle revocation requests
correctly in general, not specifically via OCSP.
Part of https://github.com/letsencrypt/boulder/issues/8059
Change all of the helper methods and functions in verify.go to return an
`error` instead of a `probs.ProblemDetails`. Add a few new types to our
errors package, and support for those types in ProblemDetailsForError,
to maintain the same public-facing error types. Update the tests to
check for specific errors instead of specific problems.
This is a building block towards making the probs.ProblemDetails type
not implement the Error interface, and only be used when rendering
errors to the user (i.e. not within Boulder logic itself).
Part of https://github.com/letsencrypt/boulder/issues/4980
Add `identifier` fields, which will soon replace the `dnsName` fields,
to:
- `corepb.Authorization`
- `corepb.Order`
- `rapb.NewOrderRequest`
- `sapb.CountFQDNSetsRequest`
- `sapb.CountInvalidAuthorizationsRequest`
- `sapb.FQDNSetExistsRequest`
- `sapb.GetAuthorizationsRequest`
- `sapb.GetOrderForNamesRequest`
- `sapb.GetValidAuthorizationsRequest`
- `sapb.NewOrderRequest`
Populate these `identifier` fields in every function that creates
instances of these structs.
Use these `identifier` fields instead of `dnsName` fields (at least
preferentially) in every function that uses these structs. When crossing
component boundaries, don't assume they'll be present, for
deployability's sake.
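
A minimal sketch of the "prefer `identifier`, fall back to `dnsName`"
pattern, assuming simplified stand-in structs (`identifier`,
`authorization`, and `identOf` are hypothetical names, not the generated
protobuf types):

```go
package main

import "fmt"

// identifier is a hypothetical stand-in for the protobuf identifier message.
type identifier struct {
	Type  string
	Value string
}

// authorization is a hypothetical stand-in for a message that carries both
// the legacy dnsName field and the new identifier field.
type authorization struct {
	DnsName    string
	Identifier *identifier
}

// identOf prefers the new field and falls back to the legacy one, so that a
// message produced by an older component is still understood.
func identOf(a authorization) identifier {
	if a.Identifier != nil {
		return *a.Identifier
	}
	return identifier{Type: "dns", Value: a.DnsName}
}

func main() {
	old := authorization{DnsName: "example.com"}
	cur := authorization{
		DnsName:    "example.com",
		Identifier: &identifier{Type: "dns", Value: "example.com"},
	}
	fmt.Println(identOf(old), identOf(cur))
}
```
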
Deployability note: Mismatched `cert-checker` and `sa` versions will be
incompatible because of a type change in the arguments to
`sa.SelectAuthzsMatchingIssuance`.
Part of #7311
- Plumb the "replaces" value from the WFE through to the SA via the RA
- Store validated "replaces" value for new orders in the orders table
- Reflect the stored "replaces" value to subscribers in the order object
- Reorder CertificateProfileName before Replaces/ReplacesSerial in RA
and SA protos for consistency
Fixes #8034
Return "alreadyReplaced" along with HTTP 409 Conflict when a new order
claims to replace a certificate that already has a replacement order.
When we receive an update-account request which is not empty, but
doesn't contain the "contact" field, don't assume that they want to
remove their contacts. Only remove contacts if the "contact" field is
present, but empty.
Add a unit test and an integration test which will catch regressions in
this behavior.
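
One way to distinguish "field absent" from "field present but empty" in
Go is to decode into a pointer. The sketch below assumes a simplified
request shape; `updateRequest` and `applyContacts` are hypothetical
names, not the WFE's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateRequest is a hypothetical shape for an update-account body. The
// pointer stays nil when the "contact" field is absent from the JSON.
type updateRequest struct {
	Contact *[]string `json:"contact"`
	Status  string    `json:"status,omitempty"`
}

// applyContacts returns the contacts the account should have after the update.
func applyContacts(current []string, body []byte) ([]string, error) {
	var req updateRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	if req.Contact == nil {
		// No "contact" field: leave the existing contacts alone.
		return current, nil
	}
	// Field present, possibly empty: replace, which clears contacts when empty.
	return *req.Contact, nil
}

func main() {
	cur := []string{"mailto:admin@example.com"}
	kept, _ := applyContacts(cur, []byte(`{"status":"valid"}`))
	cleared, _ := applyContacts(cur, []byte(`{"contact":[]}`))
	fmt.Println(len(kept), len(cleared))
}
```
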
PR #8018 integrated the email-exporter service with WFE, updating
wfe.NewAccount and wfe.updateAccount to submit valid email contacts to
the Salesforce Pardot API. However, our new_or_updated_contact metric
shows that (account) contact updates currently exceed the highest
Salesforce tier’s daily submission limit by several times.
This change can be reverted if additional filtering logic reduces
updated (+ new) account contacts below the daily submission limit.
Remove crl-updater from the list of services run by startservers.py, so
that it isn't running at the same time as the crl-updater instances run
by specific integration tests. In return, add a new integration test
which starts crl-updater and waits for it to listen on its debug port,
just like startservers does.
Also make the existing crl-updater integration tests more robust and
more parallelizable by having them always reset the leasedUntil column
before executing the updater, instead of requiring each individual test
to perform that reset.
Fixes https://github.com/letsencrypt/boulder/issues/7590
Add a new boulder service, email-exporter, which uses the Pardot API
client added in #8016 and the email.Exporter gRPC service added in
#8017.
Add pardot-test-srv, a test-only service for mocking communication with
Salesforce OAuth and Pardot APIs in non-production environments. Since
Salesforce does not provide Pardot functionality in developer sandboxes,
pardot-test-srv must run in all non-production environments (e.g.,
sre-development and staging).
Integrate the email-exporter service with the WFE and modify
WFE.NewAccount and WFE.UpdateAccount to submit valid email contacts.
Ensure integration tests verify that contacts eventually reach
pardot-test-srv.
Update configuration where necessary to:
- Build pardot-test-srv as a standalone binary.
- Bring up pardot-test-srv and cmd/email-exporter for integration
testing.
- Integrate WFE with cmd/email-exporter when running test/config-next.
Closes #7966
Compute the width of the ARI suggested renewal window as 2% of the
validity period. This means that 90-day certificates have their
suggested window shrink slightly from 48 hours to 43.2 hours, and gives
six-day (160h) certs a suggested window 3.2 hours wide.
Also move the center of that window to the midpoint of the certificate
validity period for certs which are valid for less than 10 days, so that
operators have (proportionally) a little more time to respond to renewal
issues.
Fixes https://github.com/letsencrypt/boulder/issues/7996
In a few places within the SA, we use explicit transactions to wrap
read-then-update style operations. Because we set the transaction
isolation level on a per-session basis, these transactions do not in
fact change their isolation level, and therefore generally remain at the
default isolation level of REPEATABLE READ.
Unfortunately, we cannot resolve this simply by converting the SELECT
statements into SELECT...FOR UPDATE statements: although this would fix
the issue by making those queries into locking statements, it also
triggers what appears to be an InnoDB bug when many transactions all
attempt to select-then-insert into a table with both a primary key and a
separate unique key, as the crlShards table has. This causes the
integration tests in GitHub Actions, which run with an empty database
and therefore use the needToInsert codepath instead of the update
codepath, to consistently flake.
Instead, resolve the issue by having the UPDATE statements specify that
the value of the leasedUntil column is still the same as was read by the
initial SELECT. Although two crl-updaters may still attempt these
transactions concurrently, the UPDATE statements will still be fully
sequenced, and the latter one will fail.
Part of https://github.com/letsencrypt/boulder/issues/8031
Add MaxNames to the set of things that can be configured on a
per-profile basis. Remove all references to the RA's global maxNames,
replacing them with references to the current profile's maxNames. Add
code to the RA's main() to copy a globally-configured MaxNames into each
profile, for deployability.
Also remove any understanding of MaxNames from the WFE, as it is
redundant with the RA and is not configured in staging or prod. Instead,
hardcode the upper limit of 100 into the ratelimit package itself.
Fixes https://github.com/letsencrypt/boulder/issues/7993
Add a new RPC to the CA: `IssueCertificate` covers issuance of both the
precertificate and the final certificate. In between, it calls out to
the RA's new method `GetSCTs`.
The RA calls the new `CA.IssueCertificate` if the `UnsplitIssuance`
feature flag is true.
The RA had a metric that counted certificates by profile name and hash.
Since the RA doesn't receive a profile hash in the new flow, simply
record the total number of issuances.
Fixes https://github.com/letsencrypt/boulder/issues/7983
Update from go1.23.1 to go1.23.6 for our primary CI and release builds.
This brings in a few security fixes that aren't directly relevant to us.
Add go1.24.0 to our matrix of CI and release versions, to prepare for
switching to this next major version in prod.
When we turn on explicit sharding, we'll change the CA serial prefix, so
we can know that all issuance from the new prefixes uses explicit
sharding, and all issuance from the old prefixes uses temporal sharding.
This lets us avoid putting a revoked cert in two different CRL shards
(the temporal one and the explicit one).
To achieve this, the crl-updater gets a list of temporally sharded
serial prefixes. When it queries the `certificateStatus` table by date
(`GetRevokedCerts`), it will filter out explicitly sharded certificates:
those that don't have their prefix on the list.
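
A minimal sketch of the prefix filter, assuming serials are handled as
hex strings; the function names and the example prefixes are
hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// temporallySharded reports whether serial starts with one of the prefixes
// configured as temporally sharded.
func temporallySharded(serial string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(serial, p) {
			return true
		}
	}
	return false
}

// filterTemporal keeps only serials whose prefix is on the list, dropping
// those that are assumed to be explicitly sharded.
func filterTemporal(serials, prefixes []string) []string {
	var out []string
	for _, s := range serials {
		if temporallySharded(s, prefixes) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	prefixes := []string{"7f"} // hypothetical old prefix
	serials := []string{"7f00aa", "6e00bb", "7f11cc"}
	fmt.Println(filterTemporal(serials, prefixes))
}
```
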
Part of #7094
The CRLDP is included only when the profile's
IncludeCRLDistributionPoints field is true.
Introduce a new config field for issuers, CRLShards. If
IncludeCRLDistributionPoints is true and this is zero, issuance will
error.
The CRL shard is assigned at issuance time based on the (random) low
bits of the serial number.
Part of https://github.com/letsencrypt/boulder/issues/7094
In revocation_test.go, fetch all CRLs, and look for revoked certificates
on both CRLs and OCSP.
Make s3-test-srv listen on all interfaces, so the CRL URLs in the CA
config work.
Add IssuerNameIDs to the CRL URLs in ca.json, to match how those CRLs
are uploaded to S3.
Make TestRevocation parallel. Speedup from ~60s to ~3s.
Increase ocsp-responder's allowed parallelism to account for parallel
test. Also, add "maxInflightSignings" to config/ since it's in prod.
"maxSigningWaiters" is not yet in prod, so don't move that field.
Add a mutex around running crl-updater, and decrease the log level so
errors stand out more when they happen.
Remove code using `certificatesPerName` & `newOrdersRL` tables.
Deprecate `DisableLegacyLimitWrites` & `UseKvLimitsForNewOrder` flags.
Remove legacy `ratelimit` package.
Delete these RA test cases:
- `TestAuthzFailedRateLimitingNewOrder` (rl:
`FailedAuthorizationsPerDomainPerAccount`)
- `TestCheckCertificatesPerNameLimit` (rl: `CertificatesPerDomain`)
- `TestCheckExactCertificateLimit` (rl: `CertificatesPerFQDNSet`)
- `TestExactPublicSuffixCertLimit` (rl: `CertificatesPerDomain`)
Rate limits in NewOrder are now enforced by the WFE, starting here:
5a9b4c4b18/wfe2/wfe.go (L781)
We collect a batch of transactions to check limits, check them all at
once, go through and find which one(s) failed, and serve the failure
with the Retry-After that's furthest in the future. All this code
doesn't really need to be tested again; what needs to be tested is that
we're returning the correct failure. That code is
`NewOrderLimitTransactions`, and the `ratelimits` package's tests cover
this.
The public suffix handling behavior is tested by
`TestFQDNsToETLDsPlusOne`:
5a9b4c4b18/ratelimits/utilities_test.go (L9)
Some other RA rate limit tests were deleted earlier, in #7869.
Part of #7671.
The RA's DeactivateAccount method expects the account provided to it by
the WFE to still have status Valid. The new WFE deactivation code was
hardcoding the status to Deactivated. Fix the WFE to pass the account's
current status instead.
Add an integration test to confirm both the breakage and the fix. Also
leave behind some TODOs to simplify this codepath further, and not
require the status to be provided at all.
Part of #5554
This means that most traffic will go to the authz URLs with account.
After this has been deployed for 30 days (the max lifetime of an order),
we can remove support for the old paths.
Part of #7683
Add a new WFE & nonce config field, `NonceHMACKey`, which uses the new
`cmd.HMACKeyConfig` type. Deprecate the `NoncePrefixKey` config field.
Generalize the error message when validating `HMACKeyConfig` in
`config`.
Remove the deprecated `UseDerivablePrefix` config field, which is no
longer used anywhere.
Part of #7632
TestTraces is designed to test whether our OpenTelemetry tracing system
is working: that spans are being output, that they have the appropriate
parents, etc. It should not be testing whether Boulder took a specific
path through its code -- that's the domain of package-specific unit
tests. Simplify TestTraces to the point that it is asserting (nearly)
the bare minimum about the set of operations Boulder performs.
Add a new method, `BatchIncrement`, to issue `IncrBy` (instead of `Set`)
to Redis. This helps prevent the race condition that allows bursts of
near-simultaneous requests to, effectively, spend the same token.
Call this new method when incrementing an existing key. New keys still
need to use `BatchSet` because Redis doesn't have a facility to, within
a single operation, increment _or_ set a default value if none exists.
Add a new feature flag, `IncrementRateLimits`, gating the use of this
new method.
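
The race can be illustrated with an in-memory stand-in for Redis and a
plain integer counter. This is a simplification: `store` and both
functions are hypothetical, and the value actually kept per key may not
be a simple count.

```go
package main

import "fmt"

// store is an in-memory stand-in for a Redis key space.
type store map[string]int64

func (s store) get(k string) int64            { return s[k] }
func (s store) set(k string, v int64)         { s[k] = v }
func (s store) incrBy(k string, n int64) int64 { s[k] += n; return s[k] }

// spendWithSet replays two interleaved requests that both read the key
// before either writes it back.
func spendWithSet(s store, k string) {
	a := s.get(k)
	b := s.get(k)
	s.set(k, a+1)
	s.set(k, b+1) // overwrites the first write: one spend is lost
}

// spendWithIncr replays the same two requests using server-side increments,
// which cannot overwrite each other.
func spendWithIncr(s store, k string) {
	s.incrBy(k, 1)
	s.incrBy(k, 1)
}

func main() {
	a := store{"bucket": 5}
	spendWithSet(a, "bucket")
	b := store{"bucket": 5}
	spendWithIncr(b, "bucket")
	fmt.Println(a["bucket"], b["bucket"])
}
```
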
CPS Compliance Review: This feature flag does not change any behaviour
that is described or constrained by our CP/CPS. The closest relation
would just be API availability in general.
Fixes #7780
Clean up how we handle identifiers throughout the Boulder codebase by
- moving the Identifier protobuf message definition from sa.proto to
core.proto;
- adding support for IP identifiers to the "identifier" package;
- renaming the "identifier" package's exported names to be clearer; and
- ensuring we use the identifier package's helper functions everywhere
we can.
This will make future work to actually respect identifier types (such as
in Authorization and Order protobuf messages) simpler and easier to
review.
Part of https://github.com/letsencrypt/boulder/issues/7311
Have the WFE ask the RA for authorizations, rather than asking the SA
directly. This extra layer of indirection allows us to filter out
challenges which have been disabled, so that clients don't think they
can attempt challenges that we have disabled.
Also shuffle the order of challenges within the authz objects rendered
by the API. We used to have code which does this at authz creation time,
but of course that was completely ineffectual once we stored the
challenges as just a bitmap in the database.
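
A sketch of filtering and shuffling at render time, assuming challenges
are represented by their type strings; `renderChallenges` and the
`enabled` map are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// renderChallenges drops challenge types that are not enabled and returns
// the rest in random order, so clients cannot rely on a fixed ordering.
func renderChallenges(all []string, enabled map[string]bool) []string {
	out := make([]string, 0, len(all))
	for _, c := range all {
		if enabled[c] {
			out = append(out, c)
		}
	}
	rand.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

func main() {
	all := []string{"http-01", "dns-01", "tls-alpn-01"}
	enabled := map[string]bool{"http-01": true, "tls-alpn-01": true}
	fmt.Println(renderChallenges(all, enabled))
}
```
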
Update the WFE unit tests to mock RA.GetAuthorization instead of
SA.GetAuthorization2. This includes making the mock more accurate, so
that (e.g.) valid authorizations contain valid challenges, and the
challenges have their correct types (e.g. "http-01" instead of just
"http"). Also update the OTel tracing test to account for the new RPC.
Part of https://github.com/letsencrypt/boulder/issues/5913
- Add feature flag `UseKvLimitsForNewOrder`
- Add feature flag `UseKvLimitsForNewAccount`
- Flush all Redis shards before running integration or unit tests, to
avoid false positives between local testing runs
Fixes #7664
Blocked by #7676
- Check `CertificatesPerDomain` at newOrder and spend at Finalize time.
- Check `CertificatesPerAccountPerDomain` at newOrder and spend at
Finalize time.
- Check `CertificatesPerFQDNSet` at newOrder and spend at Finalize time.
- Fix a bug in
`FailedAuthorizationsPerDomainPerAccountSpendOnlyTransaction()` which
results in failed authorizations being spent for the exact FQDN, not the
eTLD+1.
- Remove redundant "max names" check at transaction construction time
- Enable key-value rate limits in the RA
- Instruct callers to call *Decision.Result() to check the result of
rate limit transactions
- Preserve the Transaction within the resulting *Decision
- Generate consistently formatted verbose errors using the metadata
found in the *Decision
- Fix broken key-value rate limits integration test in
TestDuplicateFQDNRateLimit
Fixes #7577
Call `RA.UnpauseAccount` for valid unpause form submissions.
Determine and display the appropriate outcome to the Subscriber based on
the count returned by `RA.UnpauseAccount`:
- If the count is zero, display the "Account already unpaused" message.
- If the count equals the max number of identifiers allowed in a single
request, display a page explaining the need to visit the unpause URL
again.
- Otherwise, display the "Successfully unpaused all N identifiers"
message.
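
The three outcomes map onto a simple switch. In this sketch
`unpauseOutcome` and `maxPerRequest` are hypothetical names, and the
returned strings only approximate what the pages say.

```go
package main

import "fmt"

// unpauseOutcome picks the message to show based on how many identifiers
// were unpaused and the per-request maximum.
func unpauseOutcome(count, maxPerRequest int64) string {
	switch {
	case count == 0:
		return "Account already unpaused"
	case count == maxPerRequest:
		// The request hit the cap, so more identifiers may remain paused.
		return "Some identifiers may remain paused; visit the unpause URL again"
	default:
		return fmt.Sprintf("Successfully unpaused all %d identifiers", count)
	}
}

func main() {
	const maxPerRequest = 100 // hypothetical cap
	for _, n := range []int64{0, 3, maxPerRequest} {
		fmt.Println(unpauseOutcome(n, maxPerRequest))
	}
}
```
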
Apply per-request timeout from the SFE configuration.
Part of https://github.com/letsencrypt/boulder/issues/7406
- Rename `NewOrderRequest` field `LimitsExempt` to `IsARIRenewal`
- Introduce a new `NewOrderRequest` field, `IsRenewal`
- Introduce a new (temporary) feature flag, `CheckRenewalExemptionAtWFE`
WFE:
- Perform renewal detection in the WFE when `CheckRenewalExemptionAtWFE`
is set
- Skip (key-value) `NewOrdersPerAccount` and `CertificatesPerDomain`
limit checks when renewal detection indicates that the order is a
renewal.
RA:
- Leave renewal detection in the RA intact
- Skip renewal detection and (legacy) `NewOrdersPerAccount` and
`CertificatesPerDomain` limit checks when `CheckRenewalExemptionAtWFE`
is set and the `NewOrderRequest` indicates that the order is a renewal.
Fixes #7508
Part of #5545
- Shrink the number of public `ratelimits` methods by relocating two
sizeable transaction constructors. Simplify the spend and refund
call-sites in the WFE.
- Spend calls now block instead of being called asynchronously.
This allows us to give a user-meaningful error about malformed names
early on, instead of propagating internal errors from the new rate
limiting system.
This moves the well-formedness logic from `WillingToIssue` into a new
function `WellFormedDomainNames`, which calls `ValidDomain` on each name
and combines the errors into suberrors if there is more than one.
`WillingToIssue` now calls `WellFormedDomainNames` to keep the existing
behavior. Additionally, WFE calls `WellFormedDomainNames` before
checking rate limits.
This creates a slight behavior change: If an order contains both
malformed domain names and well-formed but blocked domain names,
suberrors will only be generated for the malformed domain names. This is
reflected in the changes to `TestWillingToIssue_Wildcard`.
Adds a WFE test case for receiving malformed identifiers in a new-order
request.
Follows up on #3323 and #7218
Fixes #7526
Some small incidental fixes:
- checkWildcardHostList was checking `pa.blocklist` for `nil` before
accessing `pa.wildcardExactBlocklist`. Fix that.
- move table test for WillingToIssue into a new test case for
WellFormedDomainNames
- move two standalone test cases into the big table test
Remove the redis-tls, wfe-tls, and mail-test-srv keys which were
generated by minica and then checked in to the repo. All three are
replaced by the dynamically-generated ipki directory.
Part of https://github.com/letsencrypt/boulder/issues/7476
The summary here is:
- Move test/cert-ceremonies to test/certs
- Move .hierarchy (generated by the above) to test/certs/webpki
- Remove our mapping of .hierarchy to /hierarchy inside docker
- Move test/grpc-creds to test/certs/ipki
- Unify the generation of both test/certs/webpki and test/certs/ipki
into a single script at test/certs/generate.sh
- Make that script the entrypoint of a new docker compose service
- Have t.sh and tn.sh invoke that service to ensure keys and certs are
created before tests run
No production changes are necessary; the config changes here are just
for testing purposes.
Part of https://github.com/letsencrypt/boulder/issues/7476
We first introduced caa-log-checker as a remediation item in the wake of
https://bugzilla.mozilla.org/show_bug.cgi?id=1619047. Since that time,
we have upgraded to go1.22, which completely removes the class of bug
which led to that incident (https://tip.golang.org/doc/go1.22#language).
Throughout its life, caa-log-checker was an operational burden, and was
at best a post-hoc check to detect issues after they had already
occurred. Therefore, we no longer run it in our production environment,
and it can be removed from the Boulder source.
Replace the CA's "useForRSA" and "useForECDSA" config keys with a single
"active" boolean. When the CA starts up, all active RSA issuers will be
used to issue precerts with RSA pubkeys, and all active ECDSA issuers will be
used to issue precerts with ECDSA pubkeys (if the ECDSAForAll flag is
true; otherwise just those that are on the allow-list). All "inactive"
issuers can still issue OCSP responses, CRLs, and (notably) final
certificates.
Instead of using the "useForRSA" and "useForECDSA" flags, plus implicit
config ordering, to determine which issuer to use to handle a given
issuance, simply use the issuer's public key algorithm to determine
which issuances it should be handling. All implicit ordering
considerations are removed, because the "active" certificates now just
form a pool that is sampled from randomly.
To facilitate this, update some unit and integration tests to be more
flexible and try multiple potential issuing intermediates, particularly
when constructing OCSP requests.
For this change to be safe to deploy with no user-visible behavior
changes, the CA configs must contain:
- Exactly one RSA-keyed intermediate with "useForRSALeaves" set to true;
and
- Exactly one ECDSA-keyed intermediate with "useForECDSALeaves" set to
true.
If the configs contain more than one intermediate meeting one of the
bullets above, then randomized issuance will begin immediately.
Fixes https://github.com/letsencrypt/boulder/issues/7291
Fixes https://github.com/letsencrypt/boulder/issues/7290