Methodology:
- Copy test/config-next/* to test/config/.
- Review the diff, reverting things that should stay `next`-only.
- When in doubt, check against prod configs (e.g. for feature flags).
In the process I noticed that config for the TCP prober in `observer`
had been added to test/config but not test/config-next, so I ported it
forward (and my IDE stripped some trailing spaces in both versions).
#8109 updated CI to use 24.04 runners, now update the Docker image to
build 24.04 and CI to use it.
Build fixes:
- Unpin mariadb-client-core, 10.3 is no longer provided in 24.04 apt
repositories
- Use new pip flag --break-system-packages to comply with PEP 668, which
is now enforced in Python 3.12+
Runtime fixes:
- Start rsyslogd directly due to missing symlink (see:
https://github.com/rsyslog/rsyslog/issues/5611)
- Fix SyntaxWarning: invalid escape sequence '\w' error.
- Replace OpenSSL.crypto.load_certificate with
x509.load_pem_x509_certificate due to
d73d0ed417
Plumb the userAgent field, used to set http-01 User-Agent headers, from
va/rva configuration through to where User-Agent headers can be set for
DoH queries. Use integration tests to validate that the User-Agent is
set for http-01 challenges, dns-01 challenges over DoH, and CAA checks
over DoH.
Fixes#7963.
Add a new config field for profiles which causes the profile to omit the
AIA OCSP URI. It can only be omitted if the CRLDP extension is
configured to be included instead. Enable this flag in config-next.
When a certificate is revoked, if it does not have an AIA OCSP URI,
don't bother with an Akamai OCSP purge.
Builds on #8089
Most of the changes in this PR relate to tests. Different from #8089, I
chose to keep testing of OCSP in the config-next world. This is because
we intend to keep operating OCSP even after we have stopped including it
in new certificates. So we should test it in as many environments as
possible.
Adds a WithURLFallback option to ocsp_helper. When
`ocsp_helper.ReqDer()` is called for a certificate with no OCSP URI, it
will query the fallback URL instead. As before, if the certificate has
an OCSP URI ocsp_helper will use that. Use that for all places in the
integration tests that call ocsp_helper.
- Update the chall-test-srv-client to make DNS events and DNS01 methods
more convenient
- Add an integration test that counts DCV and CAA checks for each
validation method
Part of #7963
For explicitly sharded certificates, CRL status is read from the
`revokedCertificates` table. This table gets written at revocation time.
At re-revocation time (for key compromise), it only gets written by the
SA if the caller passes a nonzero ShardIdx to UpdateRevokedCertificate.
The RA was never passing a nonzero ShardIdx to UpdateRevokedCertificate.
- Copy
https://pkg.go.dev/github.com/letsencrypt/pebble/v2/cmd/pebble-challtestsrv
to `test/chall-test-srv`
- Rename pebble-challtestsrv to chall-test-srv, consistent with other
test server naming in Boulder
- Replace Dockerfile go install with Makefile compilation of
`chall-test-srv`
- Run chall-test-srv from `./bin/chall-test-srv`
- Bump `github.com/letsencrypt/challtestsrv` from `v1.2.1` to `v1.3.2`
in go.mod
- Update boulder-ci GitHub workflow to use `go1.24.1_2025-04-02`
Part of #7963
Replace a python integration test which relies on our
"setup_twenty_days_ago" scaffolding with a Go test that uses direct
database statements to avoid any need to do clock manipulation. The
resulting test is much more verbose, but also (in my opinion) much
clearer and significantly faster.
We already have an integration test showing that a serial does not show
up on any CRL before its certificate has been revoked, and does show up
afterwards. Extend that test to cover three new times:
- shortly before the certificate expires, when the entry must still
appear;
- shortly after the certificate expires, when the entry must still
appear; and
- significantly after the certificate expires, when the entry may be
removed.
To facilitate this, augment the s3-test-srv with a new reset endpoint,
so that the integration test can query the contents of only the
most-recently-generated set of CRLs.
I have confirmed that the new integration test fails with
https://github.com/letsencrypt/boulder/pull/8072 reverted.
Fixes https://github.com/letsencrypt/boulder/issues/8083
Delete several python revocation integration tests whose functionality
is already replicated by the go revocation integration tests. Add
support for revoking via admin-revoker to TestRevocation, and use that
to replace several more python tests.
The go versions of these tests use CRLs, rather than OCSP, to confirm
the revocation status of the certs in question. This is fine because the
purpose of these tests is to ensure that we handle revocation requests
correctly in general, not specifically via OCSP.
Part of https://github.com/letsencrypt/boulder/issues/8059
Change all of the helper methods and functions in verify.go to return an
`error` instead of a `probs.ProblemDetails`. Add a few new types to our
errors package, and support for those types in ProblemDetailsForError,
to maintain the same public-facing error types. Update the tests to
check for specific errors instead of specific problems.
This is a building block towards making the probs.ProblemDetails type
not implement the Error interface, and only be used when rendering
errors to the user (i.e. not within Boulder logic itself).
Part of https://github.com/letsencrypt/boulder/issues/4980
Remove the backwards-compatible profile hashing code. It is no longer
necessary, since all deployed profile configs now set
IncludeCRLDistributionPoints to true and set the UnsplitIssuance flag to
true. Catch up the CA and crl-updater configs to match config-next and
what is actively deployed in prod.
Part of https://github.com/letsencrypt/boulder/issues/8039
Part of https://github.com/letsencrypt/boulder/issues/8059
Add `identifier` fields, which will soon replace the `dnsName` fields,
to:
- `corepb.Authorization`
- `corepb.Order`
- `rapb.NewOrderRequest`
- `sapb.CountFQDNSetsRequest`
- `sapb.CountInvalidAuthorizationsRequest`
- `sapb.FQDNSetExistsRequest`
- `sapb.GetAuthorizationsRequest`
- `sapb.GetOrderForNamesRequest`
- `sapb.GetValidAuthorizationsRequest`
- `sapb.NewOrderRequest`
Populate these `identifier` fields in every function that creates
instances of these structs.
Use these `identifier` fields instead of `dnsName` fields (at least
preferentially) in every function that uses these structs. When crossing
component boundaries, don't assume they'll be present, for
deployability's sake.
Deployability note: Mismatched `cert-checker` and `sa` versions will be
incompatible because of a type change in the arguments to
`sa.SelectAuthzsMatchingIssuance`.
Part of #7311
Add a new custom lint which sends CRLs to PKIMetal, and configure it to
run in our integration test environment. Factor out most of the code
used to talk to the PKIMetal API so that it can be shared by the two
custom lints which do so. Add the ability to configure lints to the
CRLProfileConfig, so that zlint knows where to load the necessary custom
config from.
Update the SA to re-query the database for the updated account after
deactivating it, and return this to the RA. Update the RA to pass this
value through to the WFE. Update the WFE to return this value, rather
than locally modifying the pre-deactivation account object, if it gets
one (for deployability).
Also remove the RA's requirement that the request object specify its
current status so that the request can be trimmed down to just an ID.
This proto change is backwards-compatible because the new
DeactivateRegistrationRequest's registrationID field has the same type
(int64) and field number (1) as corepb.Registration's id field.
Part of https://github.com/letsencrypt/boulder/issues/5554
Give cert-checker the ability to load zlint configs, so that it can be
configured to talk to PKIMetal in CI and hopefully in staging/production
in the future.
Also update how cert-checker executes lints, so that it uses a real lint
registry instead of using the global registry and passing around a
dictionary of lints to filter out of the results.
Fixes https://github.com/letsencrypt/boulder/issues/7786
Replace the bpkilint container with a new bpkimetal container. Update
our custom lint which calls out to that API to speak PKIMetal's (very
similar) protocol instead. Update our zlint custom configuration to
configure this updated lint.
Fixes https://github.com/letsencrypt/boulder/issues/8009
- Plumb the "replaces" value from the WFE through to the SA via the RA
- Store validated "replaces" value for new orders in the orders table
- Reflect the stored "replaces" value to subscribers in the order object
- Reorder CertificateProfileName before Replaces/ReplacesSerial in RA
and SA protos for consistency
Fixes#8034
Return "alreadyReplaced" in addition to HTTP 409 Conflict to signal that
an order indicates that it replaces a certificate which already has a
replacement order.
We have upgraded to go1.24.1 in production, and no longer need to test
go1.23.x. Updating the version in our go.mod also allows us to begin
using x509.Certificate.Policies instead of .PolicyIdentifiers.
Add a new "MPICFullResults" feature flag. When this flag is enabled in
the VA, it will wait for all Remote VAs to return their results for both
Domain Control Validation and CAA checking, rather than short-circuiting
as soon as it has seen enough results to know whether corroboration will
or will not be achieved.
We make this change because waiting for these to return honestly doesn't
take that long, because we do validation (although not CAA rechecking)
asynchronously, and because it improves the quality of our MPIC quorum
summary logs (so we don't always say only 3/4 concurred because the
fourth was cancelled).
Fixes https://github.com/letsencrypt/boulder/issues/7809
When we receive an update-account request which is not empty, but
doesn't contain the "contact" field, don't assume that they want to
remove their contacts. Only remove contacts if the "contact" field is
present, but empty.
Add a unit test and an integration test which will catch regressions in
this behavior.
PR #8018 integrated the email-exporter service with WFE, updating
wfe.NewAccount and wfe.updateAccount to submit valid email contacts to
the Salesforce Pardot API. However, our new_or_updated_contact metric
shows that (account) contact updates currently exceed the highest
Salesforce tier’s daily submission limit by several times.
This change can be reverted if additional filtering logic reduces
updated (+ new) account contacts below the daily submission limit.
Remove crl-updater from the list of services run by startservers.py, so
that it isn't running at the same time as the crl-updater instances run
by specific integration tests. In return, add a new integration test
which starts crl-updater and waits for it to listen on its debug port,
just like startservers does.
Also make the existing crl-updater integration tests more robust and
more parallelizable by having them always reset the leasedUntil column
before executing the updater, instead of requiring each individual test
to perform that reset.
Fixes https://github.com/letsencrypt/boulder/issues/7590
Remove some RA tests that were checking for errors specific to the split
issuance flow. Make one of the tests test GetSCTs directly, which makes
for a much nicer test!
Add a new boulder service, email-exporter, which uses the Pardot API
client added in #8016 and the email.Exporter gRPC service added in
#8017.
Add pardot-test-srv, a test-only service for mocking communication with
Salesforce OAuth and Pardot APIs in non-production environments. Since
Salesforce does not provide Pardot functionality in developer sandboxes,
pardot-test-srv must run in all non-production environments (e.g.,
sre-development and staging).
Integrate the email-exporter service with the WFE and modify
WFE.NewAccount and WFE.UpdateAccount to submit valid email contacts.
Ensure integration tests verify that contacts eventually reach
pardot-test-srv.
Update configuration where necessary to:
- Build pardot-test-srv as a standalone binary.
- Bring up pardot-test-srv and cmd/email-exporter for integration
testing.
- Integrate WFE with cmd/email-exporter when running test/config-next.
Closes#7966
Compute the width of the ARI suggested renewal window as 2% of the
validity period. This means that 90-day certificates have their
suggested window shrink slightly from 48 hours to 43.2 hours, and gives
six-day (160h) certs a suggested window 3.2 hours wide.
Also move the center of that window to the midpoint of the certificate
validity period for certs which are valid for less than 10 days, so that
operators have (proportionally) a little more time to respond to renewal
issues.
Fixes https://github.com/letsencrypt/boulder/issues/7996
In a few places within the SA, we use explicit transactions to wrap
read-then-update style operations. Because we set the transaction
isolation level on a per-session basis, these transactions do not in
fact change their isolation level, and therefore generally remain at the
default isolation level of REPEATABLE READ.
Unfortunately, we cannot resolve this simply by converting the SELECT
statements into SELECT...FOR UPDATE statements: although this would fix
the issue by making those queries into locking statements, it also
triggers what appears to be an InnoDB bug when many transactions all
attempt to select-then-insert into a table with both a primary key and a
separate unique key, as the crlShards table has. This causes the
integration tests in GitHub Actions, which run with an empty database
and therefore use the needToInsert codepath instead of the update
codepath, to consistently flake.
Instead, resolve the issue by having the UPDATE statements specify that
the value of the leasedUntil column is still the same as was read by the
initial SELECT. Although two crl-updaters may still attempt these
transactions concurrently, the UPDATE statements will still be fully
sequenced, and the latter one will fail.
Part of https://github.com/letsencrypt/boulder/issues/8031
Replace DCV and CAA checks (PerformValidation and IsCAAValid) in
va/va.go and va/caa.go with their MPIC compliant counterparts (DoDCV and
DoCAA) in va/vampic.go. Deprecate EnforceMultiCAA and EnforceMPIC and
default code paths as though they are both true. Require that RIR and
Perspective be set for primary and remote VAs.
Fixes#7965Fixes#7819
Add MaxNames to the set of things that can be configured on a
per-profile basis. Remove all references to the RA's global maxNames,
replacing them with reference's to the current profile's maxNames. Add
code to the RA's main() to copy a globally-configured MaxNames into each
profile, for deployability.
Also remove any understanding of MaxNames from the WFE, as it is
redundant with the RA and is not configured in staging or prod. Instead,
hardcode the upper limit of 100 into the ratelimit package itself.
Fixes https://github.com/letsencrypt/boulder/issues/7993
Add a new RPC to the CA: `IssueCertificate` covers issuance of both the
precertificate and the final certificate. In between, it calls out to
the RA's new method `GetSCTs`.
The RA calls the new `CA.IssueCertificate` if the `UnsplitIssuance`
feature flag is true.
The RA had a metric that counted certificates by profile name and hash.
Since the RA doesn't receive a profile hash in the new flow, simply
record the total number of issuances.
Fixes https://github.com/letsencrypt/boulder/issues/7983
Deprecate the "InsertAuthzsIndividually" feature flag, which has been
set to true in both Staging and Production. Delete the code guarded
behind that flag being false, namely the ability of the MultiInserter to
return the newly-created IDs from all of the rows it has inserted. This
behavior is being removed because it is not supported in MySQL / Vitess.
Fixes https://github.com/letsencrypt/boulder/issues/7718
---
> [!WARNING]
> ~~Do not merge until IN-10737 is complete~~
Update from go1.23.1 to go1.23.6 for our primary CI and release builds.
This brings in a few security fixes that aren't directly relevant to us.
Add go1.24.0 to our matrix of CI and release versions, to prepare for
switching to this next major version in prod.
These paths receive (literally) zero traffic, and they require the WFE
to duplicate the RA's authorization lifetime configuration. Since that
configuration is now per-profile, the WFE can no longer easily replicate
it, and the resulting staleness calculations will be wrong. Remove the
duplicated configuration, remove the unused endpoints that rely on it,
and remove the staleness-checking code which supported those endpoints.
Leave the non-ACME /get/ endpoint for certificates in place, because
checking staleness for those does not require any additional
configuration, and having a non-ACME serial-based API for certificates
is a good thing.
Fixes https://github.com/letsencrypt/boulder/issues/8007
The crl-storer passes along Cache-Control and Expires from the
crl-updater (because the crl-updater knows the UpdatePeriod).
The crl-updater calculates the Expires header based on when it expects
to update the CRL, plus a margin of error.
Fixes#8004
Increase pkilint timeout from 200ms to 2s. In #8006 I found that errors
were stemming from timeouts talking to the bpkilint container. These
probably showed up in TestRevocation particularly because that
integration test now issues for many certificates in parallel. Pkilint's
slowness, combined with the relatively small number of cores in CI,
probably resulted in some requests taking too long.
Remove the deprecated WFE & nonce config field `NoncePrefixKey`, which
has been replaced by `NonceHMACKey`.
<del>DO NOT MERGE until:</del>
- <del>#7793 (in `release-2024-11-18`) has been deployed, AND:</del>
- <del>`NoncePrefixKey` has been removed from all running configs.</del>
Fixes#7632
When we turn on explicit sharding, we'll change the CA serial prefix, so
we can know that all issuance from the new prefixes uses explicit
sharding, and all issuance from the old prefixes uses temporal sharding.
This lets us avoid putting a revoked cert in two different CRL shards
(the temporal one and the explicit one).
To achieve this, the crl-updater gets a list of temporally sharded
serial prefixes. When it queries the `certificateStatus` table by date
(`GetRevokedCerts`), it will filter out explicitly sharded certificates:
those that don't have their prefix on the list.
Part of #7094
Add three new fields to the ra.ValidationProfile structure, representing
the profile's pending authorization lifetime (used to assign an
expiration when a new authz is created), valid authorization lifetime
(used to assign an expiration when an authz is successfully validated),
and order lifetime (used to assign an expiration when a new order is
created). Remove the prior top-level fields which controlled these
values across all orders.
Add a "defaultProfileName" field to the RA as well, to facilitate
looking up a default set of lifetimes when the order doesn't specify a
profile. If this default name is explicitly configured, always provide
it to the CA when requesting issuance, so we don't have to duplicate the
default between the two services.
Modify the RA's config struct in a corresponding way: add three new
fields to the ValidationProfiles structure, and deprecate the three old
top-level fields. Also upgrade the ra.NewValidationProfile constructor
to handle these new fields, including doing validation on their values.
Fixes https://github.com/letsencrypt/boulder/issues/7605
The CRLDP is included only when the profile's
IncludeCRLDistributionPoints field is true.
Introduce a new config field for issuers, CRLShards. If
IncludeCRLDistributionPoints is true and this is zero, issuance will
error.
The CRL shard is assigned at issuance time based on the (random) low
bits of the serial number.
Part of https://github.com/letsencrypt/boulder/issues/7094