boulder


Author	SHA1	Message	Date
James Renken	48d5ad3c19	ratelimits: Add IP address identifier support (#8221) Change most functions in `ratelimits` to use full ACMEIdentifier(s) as arguments, instead of using their values as strings. This makes the plumbing from other packages more consistent, and allows us to: Rename `FQDNsToETLDsPlusOne` to `coveringIdentifiers` and handle IP identifiers, parsing IPv6 addresses into their covering /64 prefixes for CertificatesPerDomain[PerAccount] bucket keys. Port improved IP/CIDR validation logic to NewRegistrationsPerIPAddress & PerIPv6Range. Rename `domain` parts of bucket keys to either `identValue` or `domainOrCIDR`. Rename other internal functions to clarify that they now handle identifier values, not just domains. Add the new reserved IPv6 address range from RFC 9780. For deployability, don't (yet) rename the rate limits themselves, and, because it remains the name of the database table, preserve the term `fqdnSets`. Fixes #8223. Part of #7311.	2025-06-12 11:47:32 -07:00
Samantha Frank	6402a2275f	ratelimits: Remove a metric and some labels that we're not finding useful (#7902)	2024-12-20 08:44:08 -05:00
James Renken	62299362bd	ra/ratelimits: Update tests, use new TransactionBuilder constructor, fix ARI rate limit exception (#7869) Add a new `ratelimits.NewTransactionBuilderWithLimits` constructor which takes pre-populated rate limit data, instead of filenames for reading it off disk. Use this new constructor to change rate limits during RA tests, instead of using extra `testdata` files. Fix ARI renewals' exception from rate limits: consider `isARIRenewal` as part of the `isRenewal` arg to `checkNewOrderLimits`. Remove obsolete RA tests for rate limits that are now only checked in the WFE. Update remaining new order rate limit tests from deprecated `ratelimit`s to new Redis `ratelimits`.	2024-12-18 14:23:13 -08:00
Samantha Frank	11d543bd98	ratelimits: Correctly handle stale and concurrently initialized buckets (#7886) #7782 fixed an issue where concurrent requests to the same existing bucket ignored all but one rate limit spend. However, concurrent requests to the same empty bucket can still cause multiple initializations that skip all but one spend. Use BatchSetNotExisting (SETNX in Redis) to detect this scenario and then fall back to BatchIncrement (INCRBY in Redis). #7782 sets the TTL (Time-To-Live) of incremented buckets to the maximum possible burst for the applied limit. Because this TTL doesn’t match the TAT, these buckets can become "stale," holding a TAT in the past. Incrementing these stale buckets by cost * emissionInterval leaves the new TAT behind the current time, allowing clients who are sometimes idle to gain extra burst capacity. Instead, use BatchSet (SET in Redis) to overwrite the TAT to now + cost * emissionInterval. Though this introduces a similar race condition as empty buckets, it’s less harmful than granting extra burst capacity.	2024-12-17 12:42:51 -05:00
Jacob Hoffman-Andrews	40e100c297	doc: replace "leaky" with "token" bucket (#7881) Mostly we refer consistently to token bucket, but these two places (one of which is soon to be removed) still had the "leaky" terminology, which is potentially confusing.	2024-12-10 16:39:30 -08:00
Jacob Hoffman-Andrews	27e65f3e9f	ratelimits: add detail to error messages (#7871) For batch operations, include the operation and the number of keys in the error message. This should help diagnose whether we are getting `i/o timeout` errors disproportionately for larger requests, or for certain operations. Also, make the ignored errors part of the overall WFE request logs, which allows us to get additional context, like whether certain requesters or domain names are getting disproportionately many errors. Related to #7846.	2024-12-05 15:58:26 -08:00
Aaron Gable	d962c61067	RA and WFE tests: use inmem rate limit source (#7859) The purpose of these RA and WFE unit tests is to test how they deal with certain rate limit conditions, not to test talking to an actual Redis instance. Streamline the tests by having them talk to an in-memory rate limits store, rather than a Redis-backed one.	2024-12-03 14:52:16 -08:00
Aaron Gable	bac5602c6d	Always use INCRBY for redis rate limits (#7856) Deprecate the IncrementRateLimits feature flag, and always use the Redis INCRBY instruction to update rate limit TATs. Fixes https://github.com/letsencrypt/boulder/issues/7855	2024-12-02 15:25:33 -08:00
Kruti Sutaria	a79a830f3b	ratelimits: Auto pause zombie clients (#7763) - Added a new key-value ratelimit `FailedAuthorizationsForPausingPerDomainPerAccount` which is incremented each time a client fails a validation. - As long as capacity exists in the bucket, a successful validation attempt will reset the bucket back to full capacity. - Upon exhausting bucket capacity, the RA will send a gRPC request to the SA to pause the `account:identifier`. Further validation attempts will be rejected by the [WFE](https://github.com/letsencrypt/boulder/pull/7599). - Added a new feature flag, `AutomaticallyPauseZombieClients`, which enables automatic pausing of zombie clients in the RA. - Added a new RA metric `paused_pairs{"paused":[bool], "repaused":[bool], "grace":[bool]}` to monitor use of this new functionality. - Updated `ra_test.go` `initAuthorities` to allow accessing the `*ratelimits.RedisSource` for checking that the new ratelimit functions as intended. Co-authored-by: @pgporada Fixes https://github.com/letsencrypt/boulder/issues/7738 Co-authored-by: Phil Porada <pporada@letsencrypt.org> Co-authored-by: Phil Porada <philporada@gmail.com>	2024-11-08 13:51:41 -08:00
James Renken	4adc65fb7d	Rate limits: replace redis SET with INCRBY (#7782) Add a new method, `BatchIncrement`, to issue `IncrBy` (instead of `Set`) to Redis. This helps prevent the race condition that allows bursts of near-simultaneous requests to, effectively, spend the same token. Call this new method when incrementing an existing key. New keys still need to use `BatchSet` because Redis doesn't have a facility to, within a single operation, increment _or_ set a default value if none exists. Add a new feature flag, `IncrementRateLimits`, gating the use of this new method. CPS Compliance Review: This feature flag does not change any behaviour that is described or constrained by our CP/CPS. The closest relation would just be API availability in general. Fixes #7780	2024-11-04 11:20:44 -08:00
Samantha Frank	6e6c8fe480	ratelimits: Update errors to deep link to individual limits documentation (#7767) Update rate limit error messages to deep link to the new website docs added in https://github.com/letsencrypt/website/pull/1756.	2024-10-25 13:55:51 -04:00
Samantha Frank	31d0ff0f98	ratelimits: Replace Decision merging with always returning most restrictive (#7667) Fix a bug added in #7653 which sometimes attributed an "Allowed" `Transaction` to the amalgamated "Denied" `Decision`. Instead, always return the most restrictive `*Decision` in the batch. Remove a debug `fmt.Printf()` call added in #7653.	2024-08-15 17:49:34 -04:00
Samantha Frank	6a3e9d725b	ratelimits: Provide verbose user-facing rate limit errors (#7653) - Instruct callers to call Decision.Result() to check the result of rate limit transactions - Preserve the Transaction within the resulting Decision - Generate consistently formatted verbose errors using the metadata found in the *Decision - Fix broken key-value rate limits integration test in TestDuplicateFQDNRateLimit Fixes #7577	2024-08-12 16:14:15 -04:00
Samantha Frank	a6e0fdc80e	ratelimits: Fix latency calculations (#7627)	2024-07-24 22:07:33 -04:00
Samantha	063db40db2	ratelimits: Fix cardinality explosion in overrideUsageGauge (#7548)	2024-06-14 13:34:40 -04:00
Samantha	eb49d4487e	ratelimits: Implement batched Spends and Refunds (#7143) - Move default and override limits, and associated methods, out of the Limiter to new limitRegistry struct, embedded in a new public TransactionBuilder. - Export Transaction and add corresponding Transaction constructor methods for each limit Name, making Limiter and TransactionBuilder the API for interacting with the ratelimits package. - Implement batched Spends and Refunds on the Limiter; the new methods accept a slice of Transactions. - Add new boolean fields check and spend to Transaction to support more complicated cases that can arise in batches: 1. the InvalidAuthorizations limit is checked at New Order time in a batch with many other limits, but should only be spent when an Authorization is first considered invalid. 2. the CertificatesPerDomain limit is overridden by CertificatesPerDomainPerAccount; when this is the case, spends of the CertificatesPerDomain limit should be "best-effort" but NOT deny the request if capacity is lacking. - Modify the existing Spend/Refund methods to support Transaction.check/spend and 0 cost Transactions. - Make bucketId private and add a constructor for each bucket key format supported by ratelimits. - Move domainsForRateLimiting() from ra.go to ratelimits. This avoids a circular import issue in ra.go. Part of #5545	2023-12-07 11:56:02 -05:00
Samantha	ca6314fa48	ratelimits: API improvements necessary for batches and limit fixes (#7117) The `Limiter` API has been adjusted significantly to improve both safety and ergonomics, and two `Limit` types have been corrected to match the legacy implementations. Safety: Previously, the key used for looking up limit overrides and for fetching individual buckets from the key-value store was constructed within the WFE. This posed a risk: if the key was malformed, the default limit would still be enforced, but individual overrides would fail to function properly. This has been addressed by the introduction of a new `BucketId` type along with a `BucketId` constructor for each `Limit` type. Each constructor is responsible for producing a well-formed bucket key which undergoes the very same validation as any potentially matching override key. Ergonomics: Previously, each of the `Limiter` methods took a `Limit` name, a bucket identifier, and a cost to be spent/refunded. To simplify this, each method now accepts a new `Transaction` type which provides a cost, and wraps a `BucketId` identifying the specific bucket. The two changes above, when taken together, make the implementation of batched rate limit transactions considerably easier, as a batch method can accept a slice of `Transaction`. Limit Corrections: PR #6947 added all of the existing rate limits which could be made compatible with the key-value approach. Two of these were improperly implemented: `CertificatesPerDomain` and `CertificatesPerFQDNSet` were implemented as `CertificatesPerDomainPerAccount` and `CertificatesPerFQDNSetPerAccount`. Since we do not actually associate these limits with a particular ACME account, the `regID` portion of each of their bucket keys has been removed.	2023-11-08 13:29:01 -05:00
Samantha	9aef5839b5	WFE: Add new key-value ratelimits implementation (#7089) Integrate the key-value rate limits from #6947 into the WFE. Rate limits are backed by the Redis source added in #7016, and use the SRV record shard discovery added in #7042. Part of #5545	2023-10-04 14:12:38 -04:00
Samantha	6223acd987	ratelimit: Add an override usage gauge (#7076) Add a gauge similar to the one added to our key-value rate limits implementation in #7044. Part of #7036	2023-09-13 17:34:51 -04:00
Samantha	636d30f4a9	ratelimit: Overhaul metrics for our existing rate limits (#7054) - Use constants for each rate limit name to ensure consistency when labeling metrics - Consistently check `.Enabled()` outside of each limit check RA method - Replace the existing checks counter with a latency histogram Part of #5545	2023-09-11 15:06:16 -04:00
Samantha	077a4e2dc4	ratelimits: Export override utilization metrics (#7044) Fixes #7036	2023-08-23 13:40:23 -04:00
Samantha	48f211c7ba	ratelimits: Add Redis source (#7016) Part of #5545	2023-08-10 11:45:04 -04:00
Samantha	055f620c4b	Initial implementation of key-value rate limits (#6947) This design seeks to reduce read-pressure on our DB by moving rate limit tabulation to a key-value datastore. This PR provides the following: - (README.md) a short guide to the schemas, formats, and concepts introduced in this PR - (source.go) an interface for storing, retrieving, and resetting a subscriber bucket - (name.go) an enumeration of all defined rate limits - (limit.go) a schema for defining default limits and per-subscriber overrides - (limiter.go) a high-level API for interacting with key-value rate limits - (gcra.go) an implementation of the Generic Cell Rate Algorithm, a leaky bucket-style scheduling algorithm, used to calculate the present or future capacity of a subscriber bucket using spend and refund operations Note: the included source implementation is test-only and currently accomplished using a simple in-memory map protected by a mutex; implementations using Redis and potentially other data stores will follow. Part of #5545	2023-07-21 12:57:18 -04:00

23 Commits
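#8221 says IPv6 identifiers are parsed into their covering /64 prefixes for CertificatesPerDomain[PerAccount] bucket keys. A sketch of that mapping using Go's standard `net/netip` package follows. `coveringKey` is a hypothetical name, and returning IPv4 addresses unchanged is an assumption for illustration, not something the commit message states.

```go
package main

import (
	"fmt"
	"net/netip"
)

// coveringKey is a hypothetical helper mapping an IP identifier value to the
// string used in a per-domain bucket key. IPv6 addresses collapse to their
// covering /64; IPv4 addresses are returned as-is (an assumption).
func coveringKey(ip string) (string, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return "", err
	}
	// Treat IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) as plain IPv4.
	addr = addr.Unmap()
	if addr.Is4() {
		return addr.String(), nil
	}
	// Prefix keeps only the top 64 bits of the address and zeroes the rest.
	prefix, err := addr.Prefix(64)
	if err != nil {
		return "", err
	}
	return prefix.String(), nil
}

func main() {
	for _, ip := range []string{"2001:db8:1234:5678:9abc::1", "2001:db8:1234:5678::2", "192.0.2.1"} {
		key, err := coveringKey(ip)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(ip, "->", key)
	}
}
```

Both IPv6 addresses in the example map to `2001:db8:1234:5678::/64`, so under this scheme they would share one bucket, which is the point of keying on the covering prefix rather than the individual address.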