Change most functions in `ratelimits` to use full ACMEIdentifier(s) as
arguments, instead of using their values as strings. This makes the
plumbing from other packages more consistent, and allows us to:
Rename `FQDNsToETLDsPlusOne` to `coveringIdentifiers` and handle IP
identifiers, parsing IPv6 addresses into their covering /64 prefixes for
CertificatesPerDomain[PerAccount] bucket keys.
Port improved IP/CIDR validation logic to NewRegistrationsPerIPAddress &
PerIPv6Range.
Rename `domain` parts of bucket keys to either `identValue` or
`domainOrCIDR`.
Rename other internal functions to clarify that they now handle
identifier values, not just domains.
Add the new reserved IPv6 address range from RFC 9780.
For deployability, don't (yet) rename rate limits themselves; and
because it remains the name of the database table, preserve the term
`fqdnSets`.
Fixes#8223
Part of #7311
All of the identifiers being passed into the bucket construction helpers
have already passed through policy.WellFormedIdentifiers in the WFE. We
can trust that function, and our own ability to construct bucket keys,
to reduce the amount of revalidation we do before sending bucket keys to
redis.
The validateIdForName function is still used to validate override bucket
keys loaded from yaml.
Remove static IPs from services that can be reached by their service
name. Remove consulnet and redisnet, and have the services which
connected to those network connect directly to bouldernet instead.
Instruct docker-compose to only dynamically allocate IPs from the upper
half of the bouldernet subset, to avoid clashing with the few static IPs
we still specify.
Add `pa.validIP` to test IP address validity & absence from IANA
reservations.
Modify `pa.WillingToIssue` and `pa.WellFormedIdentifiers` to support IP
address identifiers.
Add a map of allowed identifier types to the `pa` config.
Part of #8137
Add `identifier` fields, which will soon replace the `dnsName` fields,
to:
- `corepb.Authorization`
- `corepb.Order`
- `rapb.NewOrderRequest`
- `sapb.CountFQDNSetsRequest`
- `sapb.CountInvalidAuthorizationsRequest`
- `sapb.FQDNSetExistsRequest`
- `sapb.GetAuthorizationsRequest`
- `sapb.GetOrderForNamesRequest`
- `sapb.GetValidAuthorizationsRequest`
- `sapb.NewOrderRequest`
Populate these `identifier` fields in every function that creates
instances of these structs.
Use these `identifier` fields instead of `dnsName` fields (at least
preferentially) in every function that uses these structs. When crossing
component boundaries, don't assume they'll be present, for
deployability's sake.
Deployability note: Mismatched `cert-checker` and `sa` versions will be
incompatible because of a type change in the arguments to
`sa.SelectAuthzsMatchingIssuance`.
Part of #7311
Rename `FromDNSNames` to `NewDNSSlice`, since it's exactly `NewDNS`
except for slices.
Rename `AsProto` to use the "To" prefix, since it's the opposite of
"From".
Add a named type `ACMEIdentifiers` so that we can add methods to slices.
We will have a lot of slice handling code coming up, which this will
make more elegant and readable.
Add a comment to explain naming conventions in the `identifier` package.
Part of #7311
Alternative to #8068
Add MaxNames to the set of things that can be configured on a
per-profile basis. Remove all references to the RA's global maxNames,
replacing them with reference's to the current profile's maxNames. Add
code to the RA's main() to copy a globally-configured MaxNames into each
profile, for deployability.
Also remove any understanding of MaxNames from the WFE, as it is
redundant with the RA and is not configured in staging or prod. Instead,
hardcode the upper limit of 100 into the ratelimit package itself.
Fixes https://github.com/letsencrypt/boulder/issues/7993
Update from go1.23.1 to go1.23.6 for our primary CI and release builds.
This brings in a few security fixes that aren't directly relevant to us.
Add go1.24.0 to our matrix of CI and release versions, to prepare for
switching to this next major version in prod.
Add a new `ratelimits.NewTransactionBuilderWithLimits` constructor which
takes pre-populated rate limit data, instead of filenames for reading it
off disk.
Use this new constructor to change rate limits during RA tests, instead
of using extra `testdata` files.
Fix ARI renewals' exception from rate limits: consider `isARIRenewal` as
part of the `isRenewal` arg to `checkNewOrderLimits`.
Remove obsolete RA tests for rate limits that are now only checked in
the WFE.
Update remaining new order rate limit tests from deprecated `ratelimit`s
to new Redis `ratelimits`.
#7782 fixed an issue where concurrent requests to the same existing
bucket ignored all but one rate limit spend. However, concurrent
requests to the same empty bucket can still cause multiple
initializations that skip all but one spend. Use BatchSetNotExisting
(SETNX in Redis) to detect this scenario and then fall back to
BatchIncrement (INCRBY in Redis).
#7782 sets the TTL (Time-To-Live) of incremented buckets to the maximum
possible burst for the applied limit. Because this TTL doesn’t match the
TAT, these buckets can become "stale," holding a TAT in the past.
Incrementing these stale buckets by cost * emissionInterval leaves the
new TAT behind the current time, allowing clients who are sometimes idle
to gain extra burst capacity. Instead, use BatchSet (SET in Redis) to
overwrite the TAT to now + cost * emissionInterval. Though this
introduces a similar race condition as empty buckets, it’s less harmful
than granting extra burst capacity.
Mostly we refer consistently to token bucket, but these two places (one
of which is soon to be removed) still had the "leaky" terminology, which
is potentially confusing.
For batch operations, include the operation and the number of keys in
the error message. This should help diagnose whether we are getting `i/o
timeout` errors disproportionately for larger requests, or for certain
operations.
Also, make the ignored errors part of the overall WFE request logs,
which allows us to get additional context, like whether certain
requesters or domain names are getting disproportionately many errors.
Related to #7846.
The purpose of these RA and WFE unit tests is to test how they deal with
certain rate limit conditions, not to test talking to an actual redis
instance. Streamline the tests by having them talk to an in-memory rate
limits store, rather than a redis-backed one.
The zero value for `limit` is invalid, so returning `nil` in error cases
avoids silently returning invalid limits (and means that if code makes a
mistake and references an invalid limit it will be an obvious clear
stack trace).
This is more consistent, since the methods on `limit` use a pointer
receiver. Also, since `limit` is a fairly large object, this saves some
copying.
Related to #7803 and #7797.
In the FailedAuthorizations limits, there was code that intentionally
ignored errLimitDisabled errors (`errors.Is(err, errLimitDisabled)`).
However, that that resulted in those functions later using a returned
`limit` value that was invalid (i.e. its zero value). That happened to
trigger some later checks in validateTransaction. Specifically this
check failed:
if txn.cost > txn.limit.Burst {
// error
When txt.limit.Burst is zero, this will always fail.
This problem doesn't really show up in prod, where all the limits are
configured. But it showed up in tests, specifically
TestPerformValidation_FailedValidationsTriggerPauseIdentifiersRatelimit,
where the limits are constructed using a simplified config that leaves
most of them disabled.
In this change, I tried to make handling of errLimitDisabled more
consistent, and always return an allow-only transaction as early as
possible instead of falling through the error condition.
Where that wasn't possible, I used a boolean to record whether the
result of `builder.getLimit()` was valid before referencing any of its
fields.
I also added some "shouldn't happen" errors to catch this problem
earlier if it recurs.
I removed some "skip disabled limit" comments because those say "what
the code does" (which the code also says), not "why the code does it".
Fixes the test failures in #7797.
- Added a new key-value ratelimit
`FailedAuthorizationsForPausingPerDomainPerAccount` which is incremented
each time a client fails a validation.
- As long as capacity exists in the bucket, a successful validation
attempt will reset the bucket back to full capacity.
- Upon exhausting bucket capacity, the RA will send a gRPC to the SA to
pause the `account:identifier`. Further validation attempts will be
rejected by the [WFE](https://github.com/letsencrypt/boulder/pull/7599).
- Added a new feature flag, `AutomaticallyPauseZombieClients`, which
enables automatic pausing of zombie clients in the RA.
- Added a new RA metric `paused_pairs{"paused":[bool],
"repaused":[bool], "grace":[bool]}` to monitor use of this new
functionality.
- Updated `ra_test.go` `initAuthorities` to allow accessing the
`*ratelimits.RedisSource` for checking that the new ratelimit functions
as intended.
Co-authored-by: @pgporada
Fixes https://github.com/letsencrypt/boulder/issues/7738
---------
Co-authored-by: Phil Porada <pporada@letsencrypt.org>
Co-authored-by: Phil Porada <philporada@gmail.com>
Add a new method, `BatchIncrement`, to issue `IncrBy` (instead of `Set`)
to Redis. This helps prevent the race condition that allows bursts of
near-simultaneous requests to, effectively, spend the same token.
Call this new method when incrementing an existing key. New keys still
need to use `BatchSet` because Redis doesn't have a facility to, within
a single operation, increment _or_ set a default value if none exists.
Add a new feature flag, `IncrementRateLimits`, gating the use of this
new method.
CPS Compliance Review: This feature flag does not change any behaviour
that is described or constrained by our CP/CPS. The closest relation
would just be API availability in general.
Fixes#7780
Initialize a slice with a capacity of len(nameToString) rather than initializing
the length of this slice.
Signed-off-by: huochexizhan <huochexizhan@outlook.com>
- Check `CertificatesPerDomain` at newOrder and spend at Finalize time.
- Check `CertificatesPerAccountPerDomain` at newOrder and spend at
Finalize time.
- Check `CertificatesPerFQDNSet` at newOrder and spend at Finalize time.
- Fix a bug
in`FailedAuthorizationsPerDomainPerAccountSpendOnlyTransaction()` which
results in failed authorizations being spent for the exact FQDN, not the
eTLD+1.
- Remove redundant "max names" check at transaction construction time
- Enable key-value rate limits in the RA
Fix a bug added in #7653 which sometimes attributed an "Allowed"
`Transaction` to the amalgamated "Denied" `*Decision`. Instead, always
return the most restrictive `*Decision` in the batch.
Remove a debug `fmt.Printf()` call added in #7653
- Instruct callers to call *Decision.Result() to check the result of
rate limit transactions
- Preserve the Transaction within the resulting *Decision
- Generate consistently formatted verbose errors using the metadata
found in the *Decision
- Fix broken key-value rate limits integration test in
TestDuplicateFQDNRateLimit
Fixes#7577
Replace all of Boulder's usage of the Go stdlib "math/rand" package with
the newer "math/rand/v2" package which first became available in go1.22.
This package has an improved API and faster performance across the
board.
See https://go.dev/blog/randv2 and https://go.dev/blog/chacha8rand for
details.
- Rename `NewOrderRequest` field `LimitsExempt` to `IsARIRenewal`
- Introduce a new `NewOrderRequest` field, `IsRenewal`
- Introduce a new (temporary) feature flag, `CheckRenewalExemptionAtWFE`
WFE:
- Perform renewal detection in the WFE when `CheckRenewalExemptionAtWFE`
is set
- Skip (key-value) `NewOrdersPerAccount` and `CertificatesPerDomain`
limit checks when renewal detection indicates the the order is a
renewal.
RA:
- Leave renewal detection in the RA intact
- Skip renewal detection and (legacy) `NewOrdersPerAccount` and
`CertificatesPerDomain` limit checks when `CheckRenewalExemptionAtWFE`
is set and the `NewOrderRequest` indicates that the order is a renewal.
Fixes#7508
Part of #5545
- Shrink the number of public `ratelimits` methods by relocating two
sizeable transaction constructors. Simplify the spend and refund
call-sites in the WFE.
- Spend calls now block instead of being called asynchronously.
This allows us to give a user-meaningful error about malformed names
early on, instead of propagating internal errors from the new rate
limiting system.
This moves the well-formedness logic from `WillingToIssue` into a new
function `WellFormedDomainNames`, which calls `ValidDomain` on each name
and combines the errors into suberrors if there is more than one.
`WillingToIssue` now calls `WellFormedDomainNames` to keep the existing
behavior. Additionally, WFE calls `WellFormedDomainNames` before
checking rate limits.
This creates a slight behavior change: If an order contains both
malformed domain names and wellformed but blocked domain names,
suberrors will only be generated for the malformed domain names. This is
reflected in the changes to `TestWillingToIssue_Wildcard`.
Adds a WFE test case for receiving malformed identifiers in a new-order
request.
Follows up on #3323 and #7218Fixes#7526
Some small incidental fixes:
- checkWildcardHostList was checking `pa.blocklist` for `nil` before
accessing `pa.wildcardExactBlocklist`. Fix that.
- move table test for WillingToIssue into a new test case for
WellFormedDomainNames
- move two standalone test cases into the big table test
Change ratelimits.validateIdForName to call the appropriate validate
function based on the contents of the to-be-validated ID, rather than
falling back and potentially performing multiple validations.
Previously, if an ID like "12345:bad.domain" was passed into this
function, it would fail the first validateRegIdDomain validation due to
having a bad domain name (no TLD), fall back to the simpler
validateRegId function, and then fail that because it contains a colon.
This obscured the true reason for the failure. Changing this code to not
fall back means that the true reason for the id validation failure will
be exposed in the error message.
Remove the redis-tls, wfe-tls, and mail-test-srv keys which were
generated by minica and then checked in to the repo. All three are
replaced by the dynamically-generated ipki directory.
Part of https://github.com/letsencrypt/boulder/issues/7476
We had disabled our lints on go1.22 because golangci-lint and
staticcheck didn't work with some of its updates. Re-enable them, and
fix the things which the updated linters catch now.
Fixes https://github.com/letsencrypt/boulder/issues/7229
- Update the failed authorizations limit to use 'enum:regId:domain' for
transactions while maintaining 'enum:regId' for overrides.
- Modify the failed authorizations transaction builder to generate a
transaction for each order name.
- Rename the `FailedAuthorizationsPerAccount` enum to
`FailedAuthorizationsPerDomainPerAccount` to align with its corrected
implementation. This change is possible because the limit isn't yet
deployed in staging or production.
Blocks #7346
Part of #5545