Greatly simplify the two RA unit tests covering failed validations and
account+identifier pausing. Most importantly, directly manipulate the
ratelimit backing store during test setup, to avoid having to "perform"
extra validations.
Fixes https://github.com/letsencrypt/boulder/issues/7812
- Remove undeployed feature flag MultiCAAFullResults
- Perform local CAA checks prior to initiating remote checks, instead of
starting remote checks and proceeding to perform local checks.
- Remove VA.IsCAAValid specific remote validation logic, use
VA.performRemoteOperation instead
- Refactor va.logRemoteResults to be easier to test and omit the RVA
problem
- Drive-by fix: Calculate logEvent.Latency with va.clk.Since() instead
of time.Since() like everything else in VA.performRemoteOperation
- Make performRemoteValidation a more generic function that returns a
new remoteResult interface
- Modify the return value of IsCAAValid and PerformValidation to satisfy
the remoteResult interface
- Include compile time checks and tests that pass an arbitrary operation
- Make the primary VA aware of the expected Perspective and RIR of each
remote VA.
- All Perspectives should be unique, have the primary VA check for
duplicate Perspectives at startup.
- Update test setup functions to ensure that each remote VA client and
corresponding inmem impl have a matching perspective and RIR.
Part of #7819
- Remove Perspective and RIR from ValidationRecords
- Make ValidationResultToPB Perspective and RIR aware
- Update comment for VA.PerformValidation
- Make verificationRequestEvent more like doDCVAuditLog
- Update language used in problems created by performRemoteValidation to
be more like doRemoteDCV.
To prepare for the MPIC requirement of having a minimum of 3
perspectives, I added code to `NewValidationAuthorityImpl` to error if
there aren't enough remote VAs configured _and_ the current VA is the
primary perspective. Then I fixed all the tests, which involved adding
some backends in the unittests, and spinning up `remoteva-c` in the
integration tests.
As a reminder, the `boulder va` command always considers itself the
primary perspective, while `boulder remoteva` gives itself a perspective
based on its config.
I wound up backing out the code in `NewValidationAuthorityImpl` because
right now our remote VAs are actually running the `boulder va` command,
so they would error out in prod, even though our actual primary
perspective does have enough backends. So this wound up as a test-only
change.
Previously this was a configuration field.
Ports `maxAllowedFailures()` from `determineMaxAllowedFailures()` in
#7794.
Test updates:
Remove the `maxRemoteFailures` param from `setup` in all VA tests.
Some tests were depending on setting this param directly to provoke
failures.
For example, `TestMultiVAEarlyReturn` previously relied on "zero allowed
failures". Since the number of allowed failures is now 1 for the number
of remote VAs we were testing (2), the VA wasn't returning early with an
error; it was succeeding! To fix that, make sure there are two failures.
Since two failures from two RVAs wouldn't exercise the right situation,
add a third RVA, so we get two failures from three RVAs.
Similarly, TestMultiCAARechecking had several test cases that omitted
this field, effectively setting it to zero allowed failures. I updated
the "1 RVA failure" test case to expect overall success and added a "2
RVA failures" test case to expect overall failure (we previously
expected overall failure from a single RVA failing).
In TestMultiVA I had to change a test for `len(lines) != 1` to
`len(lines) == 0`, because with more backends we were now logging more
errors, and finding e.g. `len(lines)` to be 2.
This means that most traffic will go to the authz URLs with account.
After this has been deployed for 30 days (the max lifetime of an order),
we can remove support for the old paths.
Part of #7683
Add a new WFE & nonce config field, `NonceHMACKey`, which uses the new
`cmd.HMACKeyConfig` type. Deprecate the `NoncePrefixKey` config field.
Generalize the error message when validating `HMACKeyConfig` in
`config`.
Remove the deprecated `UseDerivablePrefix` config field, which is no
longer used anywhere.
Part of #7632
- Added a new key-value ratelimit
`FailedAuthorizationsForPausingPerDomainPerAccount` which is incremented
each time a client fails a validation.
- As long as capacity exists in the bucket, a successful validation
attempt will reset the bucket back to full capacity.
- Upon exhausting bucket capacity, the RA will send a gRPC to the SA to
pause the `account:identifier`. Further validation attempts will be
rejected by the [WFE](https://github.com/letsencrypt/boulder/pull/7599).
- Added a new feature flag, `AutomaticallyPauseZombieClients`, which
enables automatic pausing of zombie clients in the RA.
- Added a new RA metric `paused_pairs{"paused":[bool],
"repaused":[bool], "grace":[bool]}` to monitor use of this new
functionality.
- Updated `ra_test.go` `initAuthorities` to allow accessing the
`*ratelimits.RedisSource` for checking that the new ratelimit functions
as intended.
Co-authored-by: @pgporada
Fixes https://github.com/letsencrypt/boulder/issues/7738
---------
Co-authored-by: Phil Porada <pporada@letsencrypt.org>
Co-authored-by: Phil Porada <philporada@gmail.com>
Goodkey has two ways to detect a key as weak: it runs a variety of
algorithmic checks (such as Fermat factorization and rocacheck), or the
key can be listed in a "weak key file". Similarly, it has two ways to
detect a key as blocked: it can call a generic function (which we use to
query our database), or the key can be listed in a "blocked key file".
This is two methods too many. Reliance on files of weak or blocked keys
introduces unnecessary complexity to both the implementation and
configuration of the goodkey package. Remove both "key file" options and
delete all code which supported them.
Also remove //test/block-a-key, as it was only used to generate these
test files.
IN-10762 tracked the removal of these files in prod.
Fixes https://github.com/letsencrypt/boulder/issues/7748
TestTraces is designed to test whether our Open Telemetry tracing system
is working: that spans are being output, that they have the appropriate
parents, etc. It should not be testing whether Boulder took a specific
path through its code -- that's the domain of package-specific unit
tests. Simplify TestTraces to the point that it is asserting (nearly)
the bare minimum about the set of operations Boulder performs.
Add a new method, `BatchIncrement`, to issue `IncrBy` (instead of `Set`)
to Redis. This helps prevent the race condition that allows bursts of
near-simultaneous requests to, effectively, spend the same token.
Call this new method when incrementing an existing key. New keys still
need to use `BatchSet` because Redis doesn't have a facility to, within
a single operation, increment _or_ set a default value if none exists.
Add a new feature flag, `IncrementRateLimits`, gating the use of this
new method.
CPS Compliance Review: This feature flag does not change any behaviour
that is described or constrained by our CP/CPS. The closest relation
would just be API availability in general.
Fixes#7780
- Add `Perspective` and `RIR` fields to the remote-va configuration
- Configure RVA ValidationAuthorityImpl instances with the contents of
the JSON configuration
- Configure VA ValidationAuthorityImpl instances with the constant
`va.PrimaryPerspective`
- Log `Perspective` for non-Primary Perspectives, per the MPIC
requirements in section 5.4.1 (2) vii of the BRs. Also log the RIR for
posterity.
- Introduce `ValidationResult` RPC fields `Perspective` and `Rir`, which
are not currently used but will be required for corroboration in #7616
Fixes https://github.com/letsencrypt/boulder/issues/7613
Part of https://github.com/letsencrypt/boulder/issues/7615
Part of https://github.com/letsencrypt/boulder/issues/7616
The CAB/F Debian weak keys (https://github.com/cabforum/Debian-weak-keys)
repository contains a bunch of DER encoded private keys that we should ensure
are blocked. I hacked up the block-a-key tool to output a base64 encoded SPKI
hash from an arbitrary PEM formatted private key file.
This is our only use of MariaDB's "INSERT ... RETURNING" syntax, which
does not exist in MySQL and Vitess. Add a feature flag which removes our
use of this feature, so that we can easily disable it and then re-enable
it if it turns out to be too much of a performance hit.
Also add a benchmark showing that the serial-insertion approach is
slower, but perhaps not debilitatingly so.
Part of https://github.com/letsencrypt/boulder/issues/7718
Add a new SerialPrefixHex field to the CA's config, which takes a
two-character hexadecimal string to use as the serial prefix. This
matches the way that the OCSP Responder's acceptable serial prefixes are
configured, and is easier for human operators to configure than raw
integers.
At the same time, change the type of the CA's internal serial prefix
from `int` to `byte`, using the type system to enforce its 8-bit length.
Fixes#7213
Clean up how we handle identifiers throughout the Boulder codebase by
- moving the Identifier protobuf message definition from sa.proto to
core.proto;
- adding support for IP identifier to the "identifier" package;
- renaming the "identifier" package's exported names to be clearer; and
- ensuring we use the identifier package's helper functions everywhere
we can.
This will make future work to actually respect identifier types (such as
in Authorization and Order protobuf messages) simpler and easier to
review.
Part of https://github.com/letsencrypt/boulder/issues/7311
Begin testing on go1.23. To facilitate this, also update /x/net,
golangci-lint, staticcheck, and pebble-challtestsrv to versions which
support go1.23. As a result of these updates, also fix a handful of new
lint findings, mostly regarding passing non-static (i.e. potentially
user-controlled) format strings into Sprintf-style functions.
Additionally, delete one VA unittest that was duplicating the checks
performed by a different VA unittest, but with a context timeout bug
that caused it to break when go1.23 subtly changed DialContext behavior.
Have the WFE ask the RA for authorizations, rather than asking the SA
directly. This extra layer of indirection allows us to filter out
challenges which have been disabled, so that clients don't think they
can attempt challenges that we have disabled.
Also shuffle the order of challenges within the authz objects rendered
by the API. We used to have code which does this at authz creation time,
but of course that was completely ineffectual once we stored the
challenges as just a bitmap in the database.
Update the WFE unit tests to mock RA.GetAuthorization instead of
SA.GetAuthorization2. This includes making the mock more accurate, so
that (e.g.) valid authorizations contain valid challenges, and the
challenges have their correct types (e.g. "http-01" instead of just
"http"). Also update the OTel tracing test to account for the new RPC.
Part of https://github.com/letsencrypt/boulder/issues/5913
- Add feature flag `UseKvLimitsForNewOrder`
- Add feature flag `UseKvLimitsForNewAccount`
- Flush all Redis shards before running integration or unit tests, this
avoids false positives between local testing runs
Fixes#7664
Blocked by #7676
- Check `CertificatesPerDomain` at newOrder and spend at Finalize time.
- Check `CertificatesPerAccountPerDomain` at newOrder and spend at
Finalize time.
- Check `CertificatesPerFQDNSet` at newOrder and spend at Finalize time.
- Fix a bug
in`FailedAuthorizationsPerDomainPerAccountSpendOnlyTransaction()` which
results in failed authorizations being spent for the exact FQDN, not the
eTLD+1.
- Remove redundant "max names" check at transaction construction time
- Enable key-value rate limits in the RA
- Instruct callers to call *Decision.Result() to check the result of
rate limit transactions
- Preserve the Transaction within the resulting *Decision
- Generate consistently formatted verbose errors using the metadata
found in the *Decision
- Fix broken key-value rate limits integration test in
TestDuplicateFQDNRateLimit
Fixes#7577
Replace all of Boulder's usage of the Go stdlib "math/rand" package with
the newer "math/rand/v2" package which first became available in go1.22.
This package has an improved API and faster performance across the
board.
See https://go.dev/blog/randv2 and https://go.dev/blog/chacha8rand for
details.
Move the two lint-configuration keys, LintConfig and IgnoreLints, from
the top-level CA.Issuance config stanza into each individual
CA.Issuance.CertProfiles stanza. This allows us to have
differently-configured lints for different profiles, to ensure that our
linting regime is as strict as possible.
Without this change, it would be necessary for us to ignore both the
"common name included" and the "no subject key id" lints at the
top-level, when in fact each of those warnings only triggers on one of
our two profiles.
Fixes https://github.com/letsencrypt/boulder/issues/7635
Call `RA.UnpauseAccount` for valid unpause form submissions.
Determine and display the appropriate outcome to the Subscriber based on
the count returned by `RA.UnpauseAccount`:
- If the count is zero, display the "Account already unpaused" message.
- If the count equals the max number of identifiers allowed in a single
request, display a page explaining the need to visit the unpause URL
again.
- Otherwise, display the "Successfully unpaused all N identifiers"
message.
Apply per-request timeout from the SFE configuration.
Part of https://github.com/letsencrypt/boulder/issues/7406
Add three new keys to the CA's ProfileConfig:
- OmitKeyEncipherment causes the keyEncipherment Key Usage to be omitted
from certificates with RSA public keys. We currently include it for
backwards compatibility with TLS 1.1 servers that don't support modern
cipher suites, but this KU is completely useless as of TLS 1.3.
- OmitClientAuth causes the tlsClientAuthentication Extended Key Usage
to be omitted from all certificates. We currently include it to support
any subscribers who may be relying on it, but Root Programs are moving
towards single-purpose hierarchies and its inclusion is being
discouraged.
- OmitSKID causes the Subject Key Identifier extension to be omitted
from all certificates. We currently include this extension because it is
recommended by RFC 5280, but it serves little to no practical purpose
and consumes a large number of bytes, so it is now NOT RECOMMENDED by
the Baseline Requirements.
Make substantive changes to issuer.requestValid and issuer.Prepare to
implement the desired behavior for each of these options. Make a very
slight change to ra.matchesCSR to generally allow for serverAuth-only
EKUs. Improve the unit tests of both the //ca and //issuance packages to
cover the new behavior.
Part of https://github.com/letsencrypt/boulder/issues/7610
One of our goals with profiles is to allow different profiles to have
different validity periods. While the profiles already had the ability
to enforce different maximum backdates and validities, the CA still had
separate global configuration for what the backdate and validity period
should actually be.
Move the computation of the notBefore and notAfter timestamps into the
issuance package, so that it can be based on the profile's configured
backdate and validity durations. Deprecate the global "backdate" and
"expiry" config fields, as they are no longer used. Finally, add more
validation for the profile's backdate and validity.
Part of https://github.com/letsencrypt/boulder/issues/7610
Add a new profile config key named "OmitCommonName" which, if set to
`true`, causes the issuance package to exclude the CN from the resulting
certificate even if the initiating IssuanceRequest specified one.
Deprecate the old "AllowCommonName" config key, so that it no longer has
any effect, rather than causing the issuance package to fully reject
IssuanceRequests containing a CN.
This allows for more graceful variation between profiles, since we know
that excluding the Common Name is always safe.
Part of https://github.com/letsencrypt/boulder/issues/7610
Change the way profiles are configured at the WFE to allow them to be
accompanied by descriptive strings. Augment the construction of the
directory resource's "meta" sub-object to include these profile names
and descriptions.
This config swap is safe, since no Boulder WFE instance is configured
with `CertificateProfileNames` yet.
Fixes https://github.com/letsencrypt/boulder/issues/7602
These profile variables are set to "true" everywhere, and we have no
intention of ever setting them to "false" anywhere. Deprecate them so
that they can be removed in the future, and to reduce the chances of
confusion when new profile variables are introduced in the near future.
Part of https://github.com/letsencrypt/boulder/issues/7610
This change guarantees compliance with CA/BF Ballot SC-073 "Compromised
and Weak Keys", which requires that at least 100 rounds of Fermat
Factorization be attempted:
> Section 6.1.1.3 Subscriber Key Pair Generation
> The CA SHALL reject a certificate request if... The Public Key
corresponds to an industry-demonstrated weak Private Key. For requests
submitted on or after November 15, 2024,... In the case of Close Primes
vulnerability (https://fermatattack.secvuln.info/), the CA SHALL reject
weak keys which can be factored within 100 rounds using Fermat’s
factorization method.
We choose 110 rounds to ensure a margin above and beyond the requirements.
Fixes https://github.com/letsencrypt/boulder/issues/7558
Adds a new boulder component named `sfe` aka the Self-service FrontEnd
which is dedicated to non-ACME related Subscriber functions. This change
implements one such function which is a web interface and handlers for
account unpausing.
When paused, an ACME client receives a log line URL with a JWT parameter
from the WFE. For the observant Subscriber, manually clicking the link
opens their web browser and displays a page with a pre-filled HTML form.
Upon clicking the form button, the SFE sends an HTTP POST back to itself
and either validates the JWT and issues an RA gRPC request to unpause
the account, or returns an HTML error page.
The SFE and WFE should share a 32 byte seed value e.g. the output of
`openssl rand -hex 16` which will be used as a go-jose symmetric signer
using the HS256 algorithm. The SFE will check various [RFC
7519](https://datatracker.ietf.org/doc/html/rfc7519) claims on the JWT
such as the `iss`, `aud`, `nbf`, `exp`, `iat`, and a custom `apiVersion`
claim.
The SFE should not yet be relied upon or deployed to staging/production
environments. It is very much a work in progress, but this change is big
enough as-is.
Related to https://github.com/letsencrypt/boulder/issues/7406
Part of https://github.com/letsencrypt/boulder/issues/7499
I occasionally receive timeouts due to pkilint being unresponsive during
local integration tests. Typically this happens after rebooting my
machine, with no containers previously running due to the reboot, and no
container data in disk/memory cache.
Example timeout
```
16:14:40.485848 3 boulder-ca _PeZ5w0 [AUDIT] Preparing precert failed: issuer=[int rsa b] serial=[7f2ba75acba0b729fc4e1ba5e2f6aacd5921] regID=[1] names=[rand.3ce2c964.xyz] certProfileName=[defaultBoulderCertificateProfile] certProfileHash=[de4c8c8866ed46b1d4af0d79e6b7ecf2d1ea625e26adcbbd3979ececd8fbd05a] err=[tbsCertificate linting failed: failed lint(s): e_pkilint_lint_cabf_serverauth_cert (making POST request to pkilint API: Post "http://10.77.77.9/certificate/cabf-serverauth": context deadline exceeded)]
```
The `not_valid_after` property was deprecated in favor of
`not_valid_after_utc`. Both return a timestamp in UTC time so this seems
like a safe lateral move. See
[here](https://cryptography.io/en/latest/x509/reference/#cryptography.x509.Certificate.not_valid_after)
for more information.
```
/boulder/test/v2_integration.py:1405: CryptographyDeprecationWarning: Properties that return a naïve datetime object have been deprecated. Please switch to not_valid_after_utc.
```
- Rename `NewOrderRequest` field `LimitsExempt` to `IsARIRenewal`
- Introduce a new `NewOrderRequest` field, `IsRenewal`
- Introduce a new (temporary) feature flag, `CheckRenewalExemptionAtWFE`
WFE:
- Perform renewal detection in the WFE when `CheckRenewalExemptionAtWFE`
is set
- Skip (key-value) `NewOrdersPerAccount` and `CertificatesPerDomain`
limit checks when renewal detection indicates the the order is a
renewal.
RA:
- Leave renewal detection in the RA intact
- Skip renewal detection and (legacy) `NewOrdersPerAccount` and
`CertificatesPerDomain` limit checks when `CheckRenewalExemptionAtWFE`
is set and the `NewOrderRequest` indicates that the order is a renewal.
Fixes#7508
Part of #5545
- Shrink the number of public `ratelimits` methods by relocating two
sizeable transaction constructors. Simplify the spend and refund
call-sites in the WFE.
- Spend calls now block instead of being called asynchronously.
`ECDSAForAll` feature is now enabled by default (due to it not being
referenced in any issuance path) and as a result the `ECDSAAllowlist`
has been deleted.
Fixes https://github.com/letsencrypt/boulder/issues/7535
This change removes the ECDSAAllowList entry and enables ECDSAForAll for
the `test/config/ca.json` to match the configuration in
`test/config-next/ca.json`. A future change will remove ECDSAAllowList
and ECDSAForAll permanently.
Part of https://github.com/letsencrypt/boulder/issues/7535
Fix three unit tests which have been flakily failing for the last
several weeks:
//test/load-generator/acme: TestNew/unreachable_directory_URL
Fixed by changing the error checking code to care only about the
underlying "connection refused" message, and not the IP address from
which it was receieved.
//va: TestHTTPDialTimeout
Fixed by correcting the error checking code to look for "network is
unreachable" instead of "Network unreachable"
//va: TestFetchHTTP/Broken_IPv6_only
Fixed by making the expected error message more specific -- it was
previously looking for "Error getting validation data", which is the
message that `detailedError` gives for errors it doesn't recognize. An
underlying library has changed to provide an error type that
`detailedError` now recognizes as a connection error.
This allows us to give a user-meaningful error about malformed names
early on, instead of propagating internal errors from the new rate
limiting system.
This moves the well-formedness logic from `WillingToIssue` into a new
function `WellFormedDomainNames`, which calls `ValidDomain` on each name
and combines the errors into suberrors if there is more than one.
`WillingToIssue` now calls `WellFormedDomainNames` to keep the existing
behavior. Additionally, WFE calls `WellFormedDomainNames` before
checking rate limits.
This creates a slight behavior change: If an order contains both
malformed domain names and wellformed but blocked domain names,
suberrors will only be generated for the malformed domain names. This is
reflected in the changes to `TestWillingToIssue_Wildcard`.
Adds a WFE test case for receiving malformed identifiers in a new-order
request.
Follows up on #3323 and #7218Fixes#7526
Some small incidental fixes:
- checkWildcardHostList was checking `pa.blocklist` for `nil` before
accessing `pa.wildcardExactBlocklist`. Fix that.
- move table test for WillingToIssue into a new test case for
WellFormedDomainNames
- move two standalone test cases into the big table test
This was introduced in config-next in #7441, and has been working well.
We should run it in the mainline tests as well.
No production config change is necessary.
These flags have been true and false, respectively, for years. We do not
expect to change them at any time in the future, and their continued
existence makes certain parts of the VA code significantly more complex.
Remove all references to them, preserving behavior in the "enforce, but
not full results" configuration.
IN-10358 tracks the corresponding config changes
Replaced our embeds of foopb.UnimplementedFooServer with
foopb.UnsafeFooServer. Per the grpc-go docs this reduces the "forwards
compatibility" of our implementations, but that is only a concern for
codebases that are implementing gRPC interfaces maintained by third
parties, and which want to be able to update those third-party
dependencies without updating their own implementations in lockstep.
Because we update our protos and our implementations simultaneously, we
can remove this safety net to replace runtime type checking with
compile-time type checking.
However, that replacement is not enough, because we never pass our
implementation objects to a function which asserts that they match a
specific interface. So this PR also replaces our reflect-based unittests
with idiomatic interface assertions. I do not view this as a perfect
solution, as it relies on people implementing new gRPC servers to add
this line, but it is no worse than the status quo which relied on people
adding the "TestImplementation" test.
Fixes https://github.com/letsencrypt/boulder/issues/7497
Update the version of protoc-gen-go-grpc that we use to generate Go gRPC
code from our proto files, and update the versions of other gRPC tools
and libraries that we use to match. Turn on the new
`use_generic_streams` code generation flag to change how
protoc-gen-go-grpc generates implementations of our streaming methods,
from creating a wholly independent implementation for every stream to
using shared generic implementations.
Take advantage of this code-sharing to remove our SA "wrapper" methods,
now that they have truly the same signature as the SARO methods which
they wrap. Also remove all references to the old-style stream names
(e.g. foopb.FooService_BarMethodClient) and replace them with the new
underlying generic names, for the sake of consistency. Finally, also
remove a few custom stream test mocks, replacing them with the generic
mocks.ServerStreamClient.
Note that this PR does not change the names in //mocks/sa.go, to avoid
conflicts with work happening in the pursuit of
https://github.com/letsencrypt/boulder/issues/7476. Note also that this
PR updates the version of protoc-gen-go-grpc that we use to a specific
commit. This is because, although a new release of grpc-go itself has
been cut, the codegen binary is a separate Go module with its own
releases, and it hasn't had a new release cut yet. Tracking for that is
in https://github.com/grpc/grpc-go/issues/7030.
Give akamai-purger a new "Throughput.TotalInstances" config value, to
inform it how many instances of itself are competing for akamai rate
limit quote. Combine the `useOptimizedDefaults` and `validate` functions
into a single `optimizeAndValidate` function which sets default values
according to the number of active instances, and confirms that the
results still fall within the rate limits.
Fixes https://github.com/letsencrypt/boulder/issues/7487
Remove the redis-tls, wfe-tls, and mail-test-srv keys which were
generated by minica and then checked in to the repo. All three are
replaced by the dynamically-generated ipki directory.
Part of https://github.com/letsencrypt/boulder/issues/7476
The summary here is:
- Move test/cert-ceremonies to test/certs
- Move .hierarchy (generated by the above) to test/certs/webpki
- Remove our mapping of .hierarchy to /hierarchy inside docker
- Move test/grpc-creds to test/certs/ipki
- Unify the generation of both test/certs/webpki and test/certs/ipki
into a single script at test/certs/generate.sh
- Make that script the entrypoint of a new docker compose service
- Have t.sh and tn.sh invoke that service to ensure keys and certs are
created before tests run
No production changes are necessary, the config changes here are just
for testing purposes.
Part of https://github.com/letsencrypt/boulder/issues/7476
* Adds a `VerifyGRPCClientCertIfGiven` boolean to the `remoteva` config
that cause the RVA server to use the less strict
`tls.VerifyClientCertIfGiven` for use with an Amazon Web Services
Application Load Balancer (ALB) between the `boulder-va` and `remoteva`.
See https://github.com/letsencrypt/boulder/issues/7386.
Part of https://github.com/letsencrypt/boulder/issues/5294
---------
Co-authored-by: Samantha <hello@entropy.cat>
* Adds a new `remoteva` binary that takes a distinct configuration from
the existing `boulder-va`
* Removed the `boulder-remoteva` name registration from `boulder-va`.
* Existing users of `boulder-remoteva` must either
1. laterally migrate to `boulder-va` which uses that same config, or
2. switch to using `remoteva` with a new config.
Part of https://github.com/letsencrypt/boulder/issues/5294
Remove the CA's global "crldpBase" config item, and the code which used
it to produce a IDP URI in our CRLs if it was configured.
This config item has been replaced by per-issuer crlURLBase configs
instead, because we have switched our CRL URL format from
"commonURL/issuerID/shard.crl" to "issuerURL/shard.crl" in anticipation
of including these URLs directly in our end-entity certs.
IN-10046 tracked the corresponding change in prod
Add a new "LintConfig" item to the CA's config, which can point to a
zlint configuration toml file. This allows lints to be configured, e.g.
to control the number of rounds of factorization performed by the Fermat
factorization lint.
Leverage this new config to create a new custom zlint which calls out to
a configured pkilint API endpoint. In config-next integration tests,
configure the lint to point at a new pkilint docker container.
This approach has three nice forward-looking features: we now have the
ability to configure any of our lints; it's easy to expand this
mechanism to lint CRLs when the pkilint API has support for that; and
it's easy to enable this new lint if we decide to stand up a pkilint
container in our production environment.
No production configuration changes are necessary at this time.
Fixes https://github.com/letsencrypt/boulder/issues/7430
We first introduced caa-log-checker as a remediation item in the wake of
https://bugzilla.mozilla.org/show_bug.cgi?id=1619047. Since that time,
we have upgraded to go1.22, which completely remoes the class of bug
which led to that incident (https://tip.golang.org/doc/go1.22#language).
Throughout its life, caa-log-checker was an operational burden, and was
at best a post-hoc check to detect issues after they had already
occurred. Therefore, we no longer run it in our production environment,
and it can be removed from the Boulder source.
Replace the CA's "useForRSA" and "useForECDSA" config keys with a single
"active" boolean. When the CA starts up, all active RSA issuers will be
used to issue precerts with RSA pubkeys, and all ECDSA issuers will be
used to issue precerts with ECDSA pubkeys (if the ECDSAForAll flag is
true; otherwise just those that are on the allow-list). All "inactive"
issuers can still issue OCSP responses, CRLs, and (notably) final
certificates.
Instead of using the "useForRSA" and "useForECDSA" flags, plus implicit
config ordering, to determine which issuer to use to handle a given
issuance, simply use the issuer's public key algorithm to determine
which issuances it should be handling. All implicit ordering
considerations are removed, because the "active" certificates now just
form a pool that is sampled from randomly.
To facilitate this, update some unit and integration tests to be more
flexible and try multiple potential issuing intermediates, particularly
when constructing OCSP requests.
For this change to be safe to deploy with no user-visible behavior
changes, the CA configs must contain:
- Exactly one RSA-keyed intermediate with "useForRSALeaves" set to true;
and
- Exactly one ECDSA-keyed intermediate with "useForECDSALeaves" set to
true.
If the configs contain more than one intermediate meeting one of the
bullets above, then randomized issuance will begin immediately.
Fixes https://github.com/letsencrypt/boulder/issues/7291
Fixes https://github.com/letsencrypt/boulder/issues/7290
Update the hierarchy which the integration tests auto-generate inside
the ./hierarchy folder to include three intermediates of each key type,
two to be actively loaded and one to be held in reserve. To facilitate
this:
- Update the generation script to loop, rather than hard-coding each
intermediate we want
- Improve the filenames of the generated hierarchy to be more readable
- Replace the WFE's AIA endpoint with a thin aia-test-srv so that we
don't have to have NameIDs hardcoded in our ca.json configs
Having this new hierarchy will make it easier for our integration tests
to validate that new features like "unpredictable issuance" are working
correctly.
Part of https://github.com/letsencrypt/boulder/issues/729
This change introduces a new config key `certProfiles` which contains a
map of `profiles`. Only one of `profile` or `certProfiles` should be
used, because configuring both will result in the CA erroring and
shutting down. Further, the singular `profile` is now
[deprecated](https://github.com/letsencrypt/boulder/issues/7414).
The CA pre-computes several maps at startup;
* A human-readable name to a `*issuance.Profile` which is referred to as
"name".
* A SHA-256 sum over the entire contents of the given profile to the
`*issuance.Profile`. We'll refer to this as "hash".
Internally, CA methods no longer pass an `*issuance.Profile`, instead
they pass a structure containing maps of certificate profile
identifiers. To determine the default profile used by the CA, a new
config field `defaultCertificateProfileName` has been added to the
Issuance struct. Absence of `defaultCertificateProfileName` will cause
the CA to use the default value of `defaultBoulderCertificateProfile`
such as for the the deprecated `profile`. The key for each given
certificate profile will be used as the "name". Duplicate names or
hashes will cause the CA to error during initialization and shutdown.
When the RA calls `ra.CA.IssuePrecertificate`, it will pass an arbitrary
certificate profile name to the CA triggering the CA to lookup if the
name exists in its internal mapping. The RA maintains no state or
knowledge of configured certificate profiles and relies on the CA to
provide this information. If the name exists in the CA's map, it will
return the hash along with the precertificate bytes in a
`capb.IssuePrecertificateResponse`. The RA will then call
`ra.CA.IssueCertificateForPrecertificate` with that same hash. The CA
will lookup the hash to determine if it exists in its map, and if so
will continue on with certificate issuance.
Precertificate and certificate issuance audit logs will now include the
certificate profile name and hex representation of the hash that they
were issued with.
Fixes https://github.com/letsencrypt/boulder/issues/6966
There are no required config or SQL changes.
We had disabled our lints on go1.22 because golangci-lint and
staticcheck didn't work with some of its updates. Re-enable them, and
fix the things which the updated linters catch now.
Fixes https://github.com/letsencrypt/boulder/issues/7229
Upgrade from the old go-jose v2.6.1 to the newly minted go-jose v4.0.1.
Cleans up old code now that `jose.ParseSigned` can take a list of
supported signature algorithms.
Fixes https://github.com/letsencrypt/boulder/issues/7390
---------
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
We have moved entirely to go1.22 in prod. This also allows us to remove
setting loopvar from our CI tasks, since it is the default behavior as
of go1.22.
Adds `certificateProfileName` to the `orders` database table. The
[maximum
length](https://github.com/letsencrypt/boulder/pull/7325/files#diff-a64a0af7cbf484da8e6d08d3eefdeef9314c5d9888233f0adcecd21b800102acR35)
of a profile name matches the `//issuance` package.
Adds a `MultipleCertificateProfiles` feature flag that, when enabled,
will store the certificate profile name from a `NewOrderRequest`. The
certificate profile name is allowed to be empty and the database will
treat that row as [NULL](https://mariadb.com/kb/en/null-values/). When
the SA retrieves this potentially NULL row, it will be cast as the
golang string zero value `""`.
SRE ticket IN-10145 has been filed to perform the database migration and
enable the new feature flag. The migration must be performed before
enabling the feature flag.
Part of https://github.com/letsencrypt/boulder/issues/7324
- Parse and validate the `profile` field in `newOrder` requests.
- Pass the `profile` field from `newOrder` calls to the resulting
`RA.NewOrder` call.
- When the client requests a specific profile, ensure that the profile
field is populated in the order returned.
Fixes#7332
Part of #7309
This adds a new --write flag which will write out the formatted JSON
files.
By default this command now checks if the files are properly formatted
and prints a list of unformatted files.
This avoids the problem of lints failing if there are uncommited
changes, and decouples this check from git.
By using a proper argument parsing library, we also get a good --help
flag.
- Update the failed authorizations limit to use 'enum:regId:domain' for
transactions while maintaining 'enum:regId' for overrides.
- Modify the failed authorizations transaction builder to generate a
transaction for each order name.
- Rename the `FailedAuthorizationsPerAccount` enum to
`FailedAuthorizationsPerDomainPerAccount` to align with its corrected
implementation. This change is possible because the limit isn't yet
deployed in staging or production.
Blocks #7346
Part of #5545
We run the RVAs in AWS, where we don't have all the same service
discovery infrastructure we do for the primary VAs and the rest of
Boulder. The solution for populating SRV records we have today hasn't
been reliable, so we'd like to experiment with bringing up RVAs paired
1:1 with a local DNS resolver. This brings back some of the previous
static DNS resolver configuration, though it's not a clean revert
because other configuration has changed in the meantime
Remove three deprecated feature flags which have been removed from all
production configs:
- StoreLintingCertificateInsteadOfPrecertificate
- LeaseCRLShards
- AllowUnrecognizedFeatures
Deprecate three flags which are set to true in all production configs:
- CAAAfterValidation
- AllowNoCommonName
- SHA256SubjectKeyIdentifier
IN-9879 tracked the removal of these flags.
Move the CRL issuance logic -- building an x509.RevocationList template,
populating it with correctly-built extensions, linting it, and actually
signing it -- out of the //ca package and into the //issuance package.
This means that the CA's CRL code no longer needs to be able to reach
inside the issuance package to access its issuers and certificates (and
those fields will be able to be made private after the same is done for
OCSP issuance).
Additionally, improve the configuration of CRL issuance, create
additional checks on CRL's ThisUpdate and NextUpdate fields, and make it
possible for a CRL to contain two IssuingDistributionPoint URIs so that
we can migrate to shorter addresses.
IN-10045 tracks the corresponding production changes.
Fixes https://github.com/letsencrypt/boulder/issues/7159
Part of https://github.com/letsencrypt/boulder/issues/7296
Part of https://github.com/letsencrypt/boulder/issues/7294
Part of https://github.com/letsencrypt/boulder/issues/7094
Part of https://github.com/letsencrypt/boulder/issues/7100
The `//ca/ca_test.go` `setup` function will now create issuers that each
have a unique private key from `//test/hierarchy/`, rather than multiple
issuers sharing a private key. This was spotted while reviewing an [OCSP
test](10e894a172/ca/ocsp_test.go (L53-L87)).
Some now unnecessary key material has been deleted from `//test/`.
Fixes https://github.com/letsencrypt/boulder/issues/7304
Create a new administration tool "bin/admin" as a successor to and
replacement of "admin-revoker".
This new tool supports all the same fundamental capabilities as the old
admin-revoker, including:
- Revoking by serial, by batch of serials, by incident table, and by
private key
- Blocking a key to let bad-key-revoker take care of revocation
- Clearing email addresses from all accounts that use them
Improvements over the old admin-revoker include:
- All commands run in "dry-run" mode by default, to prevent accidental
executions
- All revocation mechanisms allow setting the revocation reason,
skipping blocking the key, indicating that the certificate is malformed,
and controlling the number of parallel workers conducting revocation
- All revocation mechanisms do not parse the cert in question, leaving
that to the RA
- Autogenerated usage information for all subcommands
- A much more modular structure to simplify adding more capabilities in
the future
- Significantly simplified tests with smaller mocks
The new tool has analogues of all of admin-revokers unit tests, and all
integration tests have been updated to use the new tool instead. A
future PR will remove admin-revoker, once we're sure SRE has had time to
update all of their playbooks.
Fixes https://github.com/letsencrypt/boulder/issues/7135
Fixes https://github.com/letsencrypt/boulder/issues/7269
Fixes https://github.com/letsencrypt/boulder/issues/7268
Fixes https://github.com/letsencrypt/boulder/issues/6927
Part of https://github.com/letsencrypt/boulder/issues/6840
When passing detailed error information between services as gRPC
metadata, ensure that the suberrors being sent contain only ascii
characters, because gRPC metadata is sent as HTTP headers which only
allow visible ascii characters.
Also add a regression test.
Part of #7245.
There are still a few places that use 10.88.88.88 that will be harder to
remove. In particular, some of the Python integration tests start up
their own HTTP servers that differ from challtestsrv in some important
way (like timing out requests). Because challtestsrv already binds to
10.77.77.77:80, those test servers need a different IP address to bind
to. We can probably solve that but I'll leave it for another PR.
The RA.AdministrativelyRevokeCertificate method has two primary modes of
operation: if a certificate DER blob is provided, it parses and extracts
information from that blob, and revokes the cert; if no DER is provided,
it assumes the cert is malformed, and revokes it (but doesn't do an OCSP
cache purge) based on the serial alone. However, this scheme has
slightly confusing semantics in the RA and requires that the admin
tooling look up the certificates to provide them to the RA.
Instead, add a new "malformed" field to the RA's
AdministrativelyRevokeCertificateRequest, and deprecate the "cert" field
of that same request. When the malformed boolean is false, the RA will
look up and parse the certificate itself. When the malformed field is
true, it will revoke the cert based on serial alone.
Note that the main logic of AdministrativelyRevokeCertificate -- namely
revoking, potentially re-revoking, doing an akamai cache purge, etc --
is not changed by this PR. The only thing that changes here is how the
RA gets access to the to-be-revoked certificate's information.
Part of https://github.com/letsencrypt/boulder/issues/7135
Previously, `va.IsCAAValid` would only check CAA records from the
primary VA during initial domain control validation, completely ignoring
any configured RVAs. The upcoming
[MPIC](https://github.com/ryancdickson/staging/pull/8) ballot will
require that it be done from multiple perspectives. With the currently
deployed [Multi-Perspective
Validation](https://letsencrypt.org/2020/02/19/multi-perspective-validation.html)
in staging and production, this change brings us in line with the
[proposed phase
3](https://github.com/ryancdickson/staging/pull/8/files#r1368708684).
This change reuses the existing
[MaxRemoteValidationFailures](21fc191273/cmd/boulder-va/main.go (L35))
variable for the required non-corroboration quorum.
> Phase 3: June 15, 2025 - December 14, 2025 ("CAs MUST implement MPIC
in blocking mode*"):
>
> MUST implement MPIC? Yes
> Required quorum?: Minimally, 2 remote perspectives must be used. If
using less than 6 remote perspectives, 1 non-corroboration is allowed.
If using 6 or more remote perspectives, 2 non-corroborations are
allowed.
> MUST block issuance if quorum is not met: Yes.
> Geographic diversity requirements?: Perspectives must be 500km from 1)
the primary perspective and 2) all other perspectives used in the
quorum.
>
> * Note: "Blocking Mode" is a nickname. As opposed to "monitoring mode"
(described in the last milestone), CAs MUST NOT issue a certificate if
quorum requirements are not met from this point forward.
Adds new VA feature flags:
* `EnforceMultiCAA` instructs a primary VA to command each of its
configured RVAs to perform a CAA recheck.
* `MultiCAAFullResults` causes the primary VA to block waiting for all
RVA CAA recheck results to arrive.
Renamed `va.logRemoteValidationDifferentials` to
`va.logRemoteDifferentials` because it can handle initial domain control
validations and CAA rechecking with minimal editing.
Part of https://github.com/letsencrypt/boulder/issues/7061
When a client is attempting to open a new Order which is identical to an
already-issued certificate, allow that request to bypass the normal New
Orders rate limit. This will allow renewals to go through even when a
client is exhibiting other bad behavior. This should not open the door
to floods of requests for the same certificate in rapid success, as the
Duplicate Certificates rate limit will still block those.
Fixes https://github.com/letsencrypt/boulder/issues/6792
This is necessary in order for build.sh to download the correct version
of protoc.
This bug was introduced by
https://github.com/letsencrypt/boulder/pull/7205, which inserted another
"FROM" clause between the top of the file (where TARGETPLATFORM was
originally pulled in) and the point where build.sh is executed.
These names corresponded to single instances of a service, and were
primarily used for (a) specifying which interface to bind a gRPC port on
and (b) allowing `health-checker` to check individual instances rather
than a service as a whole.
For (a), change the `--grpc-addr` flags to bind to "all interfaces." For
(b), provide a specific IP address and port for health checking. This
required adding a `--hostOverride` flag for `health-checker` because the
service certificates contain hostname SANs, not IP address SANs.
Clarify the situation with nonce services a little bit. Previously we
had one nonce "service" in Consul and got nonces from that (i.e.
randomly between the two nonce-service instances). Now we have two nonce
services in consul, representing multiple datacenters, and one of them
is explicitly configured as the "get" service, while both are configured
as the "redeem" service.
Part of #7245.
Note this change does not yet get rid of the rednet/bluenet distinction,
nor does it get rid of all use of 10.88.88.88. That will be a followup
change.
Replace the python "codespell" tool with the rust "typos" tool.
To accomplish this, add a new rust-based step to the boulder-tools
docker build process, with some complexity to handle builds on
multiple developer architectures.
Co-authored-by: Viktor Szépe <viktor@szepe.net>
Enable the atomicalign, deepequalerrors, findcall, nilness,
reflectvaluecompare, sortslice, timeformat, and unusedwrite go vet
analyzers, which golangci-lint does not enable by default. Additionally,
enable new go vet analyzers by default as they become available.
The fieldalignment and shadow analyzers remain disabled because they
report so many errors that they should be fixed in a separate PR.
Note that the nilness analyzer appears to have found one very real bug
in tlsalpn.go.
Upgrade to zlint v3.6.0
Two new lints are triggered in various places:
aia_contains_internal_names is ignored in integration test
configurations, and unit tests are updated to have more realistic URLs.
The w_subject_common_name_included lint needs to be ignored where we'd
ignored n_subject_common_name_included before.
Related to https://github.com/letsencrypt/boulder/issues/7261
This partially reverts commit 20b121138c,
which was landed in https://github.com/letsencrypt/boulder/pull/7254.
Specifically, it reverts the addition of "noWaitForReady" to the
health-checker's gRPC config. This appears to stop the flaky `last
resolver error: produced zero addresses` failures we've been seeing in
the CI integration tests.
When we have a problem with our authentication certificates, it's better
to get a clear error early than to wait for health checker to time out.
Also, set noWaitForReady in the config, which prevents detailed errors
from being obscured by "timed out" errors.
Part of #7245.
This just provides a unique port for each instance, and breaks the
service<->port mapping. A subsequent PR will move to listening on the
same IP.
Remove unused `-b` variants of crl-storer and akamai-purger.
The new port scheme is that the first instance of a service is on `93xx`
and the second instance of a service is on `94xx`.
Part of a stacked change with #7243.
While working on https://github.com/letsencrypt/boulder/pull/7238, I dug
into why the consul services config has, for example, `[ca-a, ca-b]` in
addition to `[ca1, ca2]`. Boulder test configs use `ca.service.consul`
which will return both CAs (`[ca-a, ca-b]`). For `[ca1, ca2]` though, a
grpc load balancing [integration
test](a55bf19ea0/test/integration-test.py (L121-L143))
individually targets services such as to verify that each backend is
working correctly.
Change the max value of the CA's `SerialPrefix` config value from 255 (a
byte of all 1s) to 127 (a byte of one 0 followed by seven 1s). This
prevents the serial prefix from ever beginning with a 1.
This is important because serials are interpreted as signed
(twos-complement) integers, and are required to be positive -- a serial
whose first bit is 1 is considered to be negative and therefore in
violation of RFC 5280. The go stdlib fixes this for us by prepending a
zero byte to any serial that begins with a 1 bit, but we'd prefer all
our serials to be the same length.
Corresponding config change was completed in IN-9880.
These feature flags are no longer referenced in any test, staging, or
production configuration. They were removed in:
- StoreRevokerInfo: IN-8546
- ROCSPStage6 and ROCSPStage7: IN-8886
- CAAValidationMethods and CAAAccountURI: IN-9301
Replace the current three-piece setup (enum of feature variables, map of
feature vars to default values, and autogenerated bidirectional maps of
feature variables to and from strings) with a much simpler one-piece
setup: a single struct with one boolean-typed field per feature. This
preserves the overall structure of the package -- a single global
feature set protected by a mutex, and Set, Reset, and Enabled methods --
although the exact function signatures have all changed somewhat.
The executable config format remains the same, so no deployment changes
are necessary. This change does deprecate the AllowUnrecognizedFeatures
feature, as we cannot tell the json config parser to ignore unknown
field names, but that flag is set to False in all of our deployment
environments already.
Fixes https://github.com/letsencrypt/boulder/issues/6802
Fixes https://github.com/letsencrypt/boulder/issues/5229
Previously we made these a single `RUN` step in the Dockerfile to reduce
the size of the final image. Docker pulls all the dependent layers for
an image, which means that even if you delete intermediate build files
in a later `RUN` step, they still contribute to the overall download
size. You can work around that by deleting the intermediate files within
a single `RUN` step.
However, that has downsides: changing one Go dependency meant
downloading Go and all the other dependencies again. By moving these
back into `RUN` steps we get incremental builds, which are nice. And by
adding the builder pattern (`FROM ... AS godeps`), we can avoid having
intermediate files contribute to the overall image size.
This solves a few problems:
- When producing a new revision of boulder-tools, it often requires
multiple iterations to get it right. This provides a straightforward
path to build those iterations without trying to upload them to a Docker
repository each time.
- It's no longer necessary to produce dev container images in addition
to CI container images. Dev images are built on-demand and cached.
- Cross builds are no longer needed unless building the CI images on
non-amd64.
For third-party integration tests that do `docker compose up`, this may
result in longer build times if they are rebuilding from scratch each
time. That can be improved by keeping docker cache around.
The servers are invoked such that they have to look up their service
names in DNS in order to bind a port. This means that when consul is
down, they take a long time to start up- they are timing out the query.
In the meantime there are a number of messages about timed out health
checks. This winds up obscuring the real error, so let's do a quick DNS
check at startup and give a more meaningful error.
minica by default sets restrictive permissions on the directories it
makes. This produced confusing behavior after regenerating keys: the
`bconsul` container failed to start up because it couldn't access its
TLS keys, which led to other errors during startservers.
Many services already have --addr and/or --debug-addr flags.
However, it wasn't universal, so this PR adds flags to commands where
they're not currently present.
This makes it easier to use a shared config file but listen on different
ports, for running multiple instances on a single host.
The config options are made optional as well, and removed from
config-next/.
- Adds a feature flag to gate rollout for SHA256 Subject Key Identifiers
for end-entity certificates.
- The ceremony tool will now use the RFC 7093 section 2 option 1 method
for generating Subject Key Identifiers for future root CA, intermediate
CA, and cross-sign ceremonies.
- - - -
[RFC 7093 section 2 option
1](https://datatracker.ietf.org/doc/html/rfc7093#section-2) provides a
method for generating a truncated SHA256 hash for the Subject Key
Identifier field in accordance with Baseline Requirement [section
7.1.2.11.4 Subject Key
Identifier](90a98dc7c1/docs/BR.md (712114-subject-key-identifier)).
> [RFC5280] specifies two examples for generating key identifiers from
> public keys. Four additional mechanisms are as follows:
>
> 1) The keyIdentifier is composed of the leftmost 160-bits of the
> SHA-256 hash of the value of the BIT STRING subjectPublicKey
> (excluding the tag, length, and number of unused bits).
The related [RFC 5280 section
4.2.1.2](https://datatracker.ietf.org/doc/html/rfc5280#section-4.2.1.2)
states:
> For CA certificates, subject key identifiers SHOULD be derived from
> the public key or a method that generates unique values. Two common
> methods for generating key identifiers from the public key are:
> ...
> Other methods of generating unique numbers are also acceptable.
To simplify deployment of the log validator, this allows wildcards
(using go's filepath.Glob) to be included in the file paths.
In order to detect new files, a new background goroutine polls the glob
patterns every minute for matches.
Because the "monitor" function is running in its own goroutine, a lock
is needed to ensure it's not trying to add new tailers while shutdown is
happening.
The RequireCommonName feature flag was our only "inverted" feature flag,
which defaulted to true and had to be explicitly set to false. This
inversion can lead to confusion, especially to readers who expect all Go
default values to be zero values. We plan to remove the ability for our
feature flag system to support default-true flags, which the existence
of this flag blocked. Since this flag has not been set in any real
configs, inverting it is easy.
Part of https://github.com/letsencrypt/boulder/issues/6802
This value is always set to the empty string in prod, which (correctly)
results in the issued certificates not having a CRLDP at all. It turns
out our integration test environment has been including CRLDPs in all of
our test certs because we set crlURL to a non-empty value! This change
updates our test configs to match reality.
I'll remove the code which supports this config value as part of my
upcoming CA CRLDP changes.