Commit Graph

6916 Commits

Author SHA1 Message Date
Samantha Frank 8bf13a90f4
VA: Make PerformValidation more like DoDCV (#7828)
- Remove Perspective and RIR from ValidationRecords
- Make ValidationResultToPB Perspective and RIR aware
- Update comment for VA.PerformValidation
- Make verificationRequestEvent more like doDCVAuditLog
- Update language used in problems created by performRemoteValidation to
be more like doRemoteDCV.
2024-11-20 14:13:55 -05:00
Samantha Frank c2632585f5
core: Move canceled.Is to core.IsCanceled (#7831)
Small refactor to use errors.Is() rather than the equality operator.
Also moves this utility function into core.
2024-11-20 13:10:02 -05:00
Samantha Frank a8cdaf8989
ratelimit: Remove legacy registrations per IP implementation (#7760)
Part of #7671
2024-11-19 18:39:21 -05:00
Samantha Frank 65de9fb4b8
VA: Fix two IsCAAValid bugs (#7829)
Fix two bugs introduced in #7816:
- Fix localLatency for CAA rechecking always being 0
- Fix logEvent.InternalError no longer being written to the log
2024-11-19 11:14:34 -08:00
Jacob Hoffman-Andrews 577a1e38eb
va: prepare to require minimum of 3 RVAs (#7815)
To prepare for the MPIC requirement of having a minimum of 3
perspectives, I added code to `NewValidationAuthorityImpl` to error if
there aren't enough remote VAs configured _and_ the current VA is the
primary perspective. Then I fixed all the tests, which involved adding
some backends in the unittests, and spinning up `remoteva-c` in the
integration tests.

As a reminder, the `boulder va` command always considers itself the
primary perspective, while `boulder remoteva` gives itself a perspective
based on its config.

I wound up backing out the code in `NewValidationAuthorityImpl` because
right now our remote VAs are actually running the `boulder va` command,
so they would error out in prod, even though our actual primary
perspective does have enough backends. So this wound up as a test-only
change.
2024-11-19 10:23:32 -05:00
Jacob Hoffman-Andrews a46c388f66
va: compute maxRemoteFailures based on MPIC (#7810)
Previously this was a configuration field.

Ports `maxAllowedFailures()` from `determineMaxAllowedFailures()` in
#7794.

Test updates:
 
Remove the `maxRemoteFailures` param from `setup` in all VA tests.

Some tests were depending on setting this param directly to provoke
failures.

For example, `TestMultiVAEarlyReturn` previously relied on "zero allowed
failures". Since the number of allowed failures is now 1 for the number
of remote VAs we were testing (2), the VA wasn't returning early with an
error; it was succeeding! To fix that, make sure there are two failures.
Since two failures from two RVAs wouldn't exercise the right situation,
add a third RVA, so we get two failures from three RVAs.

Similarly, TestMultiCAARechecking had several test cases that omitted
this field, effectively setting it to zero allowed failures. I updated
the "1 RVA failure" test case to expect overall success and added a "2
RVA failures" test case to expect overall failure (we previously
expected overall failure from a single RVA failing).

In TestMultiVA I had to change a test for `len(lines) != 1` to
`len(lines) == 0`, because with more backends we were now logging more
errors, and finding e.g. `len(lines)` to be 2.
2024-11-18 15:36:09 -08:00
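
To make the test changes above easier to follow, here is a rough sketch of deriving the allowed failure count from the number of remote perspectives. The thresholds are an assumption based on the MPIC quorum table in the Baseline Requirements (2 to 5 remote perspectives allow 1 failure, 6 or more allow 2); the commit message itself only confirms that 2 RVAs allow 1 failure. Returning 0 below two perspectives is also an assumption.

```go
package main

import "fmt"

// maxAllowedFailures sketches the MPIC-derived failure budget.
// Assumed thresholds: <2 perspectives -> 0, 2-5 -> 1, 6+ -> 2.
func maxAllowedFailures(perspectiveCount int) int {
	if perspectiveCount < 2 {
		return 0
	}
	if perspectiveCount < 6 {
		return 1
	}
	return 2
}

func main() {
	for _, n := range []int{2, 3, 6} {
		fmt.Printf("%d RVAs: %d allowed failures\n", n, maxAllowedFailures(n))
	}
}
```

Under these assumptions, two failures out of three RVAs exceed the budget of one, which matches the reworked `TestMultiVAEarlyReturn` case described above.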
Jacob Hoffman-Andrews 20fdcbcfe0
ratelimits: always use a pointer for limit (#7804)
The zero value for `limit` is invalid, so returning `nil` in error cases
avoids silently returning invalid limits (and means that if code makes a
mistake and references an invalid limit, it will produce an obvious
stack trace).

This is more consistent, since the methods on `limit` use a pointer
receiver. Also, since `limit` is a fairly large object, this saves some
copying.

Related to #7803 and #7797.
2024-11-15 13:45:07 -08:00
Samantha Frank 3506f09285
RA: Make calls to countCertificateIssued and countFailedValidations synchronous (#7824)
Solves CI flakes in TestCertificatesPerDomain and
TestIdentifiersPausedForAccount that are the result of a race on the
Redis database. This has the downside of making failed validations and
successful finalizations take slightly longer.
2024-11-15 16:33:51 -05:00
Samantha Frank 3baac6f6df
VA: Consolidate multiple metrics into one histogram (#7816)
- Add a new histogram, validationLatency
- Add a VA.observeLatency for observing validation latency
- Refactor to ensure this metric can be observed exclusively within
VA.PerformValidation and VA.IsCAAValid.
- Replace validationTime, localValidationTime, remoteValidationTime,
remoteValidationFailures, caaCheckTime, localCAACheckTime,
remoteCAACheckTime, and remoteCAACheckFailures with validationLatency
2024-11-15 15:51:39 -05:00
Samantha Frank f9fb688427
build: Specify go 1.23.0 in go.mod (#7822)
We're using `.Keys()`, a function in the maps package added in 1.23
(https://pkg.go.dev/maps@master#Keys) but our go.mod states 1.22.0. This
causes some in-IDE linting errors in VSCode.
2024-11-15 12:49:04 -08:00
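
For context, this is roughly what a use of `maps.Keys` looks like. It is a sketch, not code from the repository: the helper `sortedKeys` is hypothetical, and it needs a Go 1.23 or newer toolchain because `maps.Keys` returns an iterator that `slices.Sorted` collects.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
)

// sortedKeys is a hypothetical helper that collects the keys of m in
// sorted order. maps.Keys yields keys in unspecified order, so sorting
// makes the result deterministic.
func sortedKeys(m map[string]int) []string {
	return slices.Sorted(maps.Keys(m))
}

func main() {
	fmt.Println(sortedKeys(map[string]int{"c": 3, "a": 1, "b": 2}))
}
```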
Samantha Frank b2b5645e16
VA: Cleanup performRemoteValidation (#7814)
Bring this code more in line with `VA.remoteDoDCV` in #7794. This should
make these two easier to diff in review.
2024-11-15 12:25:06 -08:00
Samantha Frank 2502113ac3
VA: Remove logging of RIR and Perspective (#7818) 2024-11-15 14:54:03 -05:00
Jacob Hoffman-Andrews 56f0ed6419
wfe: orders link to authz IDs with account (#7790)
This means that most traffic will go to the authz URLs with account.
After this has been deployed for 30 days (the max lifetime of an order),
we can remove support for the old paths.

Part of #7683
2024-11-15 10:34:14 -08:00
Jacob Hoffman-Andrews 0d70b12a75
ra: fix unittest for resetting pause limit (#7813)
TestPerformValidation_FailedThenSuccessfulValidationResetsPauseIdentifiersRatelimit
checks for a bucket being empty after a reset. However, that bucket is
based on an account ID that is shared across multiple test cases.
Instead, use a unique account and domain for this test.

Fixes #7812
2024-11-14 15:04:38 -08:00
Jacob Hoffman-Andrews c39f33e24f
va: fix race in TestMultiVALogging (#7811) 2024-11-14 14:17:42 -08:00
Jacob Hoffman-Andrews 5e385e440a
ra: clean up countFailedValidations (#7797)
Return an error and do logging in the caller. This adds early returns on
a number of error conditions, which can prevent nil pointer dereference
in those cases.

Also update the description for AutomaticallyPauseZombieClients.

Follows up #7763.
2024-11-14 16:16:36 -05:00
Jacob Hoffman-Andrews 26a9910911
ratelimits: improve disabled limit handling (#7803)
In the FailedAuthorizations limits, there was code that intentionally
ignored errLimitDisabled errors (`errors.Is(err, errLimitDisabled)`).
However, that resulted in those functions later using a returned
`limit` value that was invalid (i.e. its zero value). That happened to
trigger some later checks in validateTransaction. Specifically this
check failed:

    if txn.cost > txn.limit.Burst {
        // error
    }

When txn.limit.Burst is zero, this will always fail.

This problem doesn't really show up in prod, where all the limits are
configured. But it showed up in tests, specifically
TestPerformValidation_FailedValidationsTriggerPauseIdentifiersRatelimit,
where the limits are constructed using a simplified config that leaves
most of them disabled.

In this change, I tried to make handling of errLimitDisabled more
consistent, and always return an allow-only transaction as early as
possible instead of falling through the error condition.

Where that wasn't possible, I used a boolean to record whether the
result of `builder.getLimit()` was valid before referencing any of its
fields.

I also added some "shouldn't happen" errors to catch this problem
earlier if it recurs.

I removed some "skip disabled limit" comments because those say "what
the code does" (which the code also says), not "why the code does it".

Fixes the test failures in #7797.
2024-11-13 16:23:50 -05:00
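
The early-return pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the ratelimits package itself: the `limit` and `transaction` types are heavily simplified, and `getLimit`, `buildTransaction`, and the limit names are hypothetical. Only `errLimitDisabled` and the `cost > Burst` check come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// limit and transaction are simplified, hypothetical stand-ins.
type limit struct {
	Burst int64
}

type transaction struct {
	limit     *limit
	cost      int64
	allowOnly bool
}

var errLimitDisabled = errors.New("limit disabled")

// getLimit returns a nil pointer with errLimitDisabled for limits that are
// not configured, so a zero-value limit can never be used by accident.
func getLimit(configured map[string]*limit, name string) (*limit, error) {
	l, ok := configured[name]
	if !ok {
		return nil, errLimitDisabled
	}
	return l, nil
}

// buildTransaction returns an allow-only transaction as soon as it sees
// errLimitDisabled, instead of falling through to the Burst check.
func buildTransaction(configured map[string]*limit, name string, cost int64) (transaction, error) {
	l, err := getLimit(configured, name)
	if err != nil {
		if errors.Is(err, errLimitDisabled) {
			return transaction{allowOnly: true}, nil
		}
		return transaction{}, err
	}
	if cost > l.Burst {
		return transaction{}, fmt.Errorf("cost %d exceeds burst %d", cost, l.Burst)
	}
	return transaction{limit: l, cost: cost}, nil
}

func main() {
	configured := map[string]*limit{"configuredLimit": {Burst: 10}}

	txn, err := buildTransaction(configured, "disabledLimit", 1)
	fmt.Println(txn.allowOnly, err)

	txn, err = buildTransaction(configured, "configuredLimit", 1)
	fmt.Println(txn.allowOnly, err)
}
```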
James Renken 0a27cba9f4
WFE/nonce: Add NonceHMACKey field (#7793)
Add a new WFE & nonce config field, `NonceHMACKey`, which uses the new
`cmd.HMACKeyConfig` type. Deprecate the `NoncePrefixKey` config field.

Generalize the error message when validating `HMACKeyConfig` in
`config`.

Remove the deprecated `UseDerivablePrefix` config field, which is no
longer used anywhere.

Part of #7632
2024-11-13 10:31:28 -05:00
Jacob Hoffman-Andrews 5be3e99a4d
features: remove deprecated features (#7805)
Fixes #7802
2024-11-13 10:22:32 -05:00
Jacob Hoffman-Andrews 1d8cf3e212
ra: remove special case for empty DNSNames (#7795)
This case was added to work around a test case that didn't fill it out;
instead, fill DNSNames for that test case.
2024-11-11 11:07:20 -05:00
Kruti Sutaria a79a830f3b
ratelimits: Auto pause zombie clients (#7763)
- Added a new key-value ratelimit
`FailedAuthorizationsForPausingPerDomainPerAccount` which is incremented
each time a client fails a validation.
- As long as capacity exists in the bucket, a successful validation
attempt will reset the bucket back to full capacity.
- Upon exhausting bucket capacity, the RA will send a gRPC to the SA to
pause the `account:identifier`. Further validation attempts will be
rejected by the [WFE](https://github.com/letsencrypt/boulder/pull/7599).
- Added a new feature flag, `AutomaticallyPauseZombieClients`, which
enables automatic pausing of zombie clients in the RA.
- Added a new RA metric `paused_pairs{"paused":[bool],
"repaused":[bool], "grace":[bool]}` to monitor use of this new
functionality.
- Updated `ra_test.go` `initAuthorities` to allow accessing the
`*ratelimits.RedisSource` for checking that the new ratelimit functions
as intended.

Co-authored-by: @pgporada 

Fixes https://github.com/letsencrypt/boulder/issues/7738

---------

Co-authored-by: Phil Porada <pporada@letsencrypt.org>
Co-authored-by: Phil Porada <philporada@gmail.com>
2024-11-08 13:51:41 -08:00
Jacob Hoffman-Andrews 2058d985cc
Allow account IDs in authz and challenge URLs (#7768)
This adds new handlers under `/acme/authz/` and `/acme/chall/` that
expect to be followed by `{regID}/{authzID}` and
`{regID}/{authzID}/{challengeID}`, respectively. For deployability, the
old handlers continue to work, and the URLs returned for newly created
objects will still point to the paths used by the old handlers
(`/acme/authz-v3/` and `/acme/chall-v3/`).

There are some self-referential URLs in authz and challenge responses,
like the Location header, and the URL of challenges embedded in an
authorization object. This PR updates `prepAuthorizationForDisplay` and
`prepChallengeForDisplay` so those URLs can be generated consistently
with the path that was requested.

For the WFE tests, in most cases I duplicated an entire test and then
updated it to test the `WithAccount` handler. The idea is that once
we're fully switched over to the new format we can delete the tests for
the non-`WithAccount` variants.

Part of #7683
2024-11-06 11:52:10 -08:00
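
As an illustration of the new path shape, the sketch below splits the portion of the URL that follows `/acme/authz/` into `{regID}/{authzID}`. It is not Boulder's handler code: `parseAuthzPath` is a hypothetical helper, and treating both IDs as base-10 integers is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseAuthzPath is a hypothetical helper. It expects the part of the path
// after /acme/authz/, in the form {regID}/{authzID}, and assumes both IDs
// are base-10 integers.
func parseAuthzPath(rest string) (regID int64, authzID int64, err error) {
	parts := strings.Split(rest, "/")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("expected {regID}/{authzID}, got %q", rest)
	}
	regID, err = strconv.ParseInt(parts[0], 10, 64)
	if err != nil {
		return 0, 0, fmt.Errorf("parsing regID: %w", err)
	}
	authzID, err = strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return 0, 0, fmt.Errorf("parsing authzID: %w", err)
	}
	return regID, authzID, nil
}

func main() {
	regID, authzID, err := parseAuthzPath("123/456")
	fmt.Println(regID, authzID, err)
}
```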
Aaron Gable 2603aa45a8
Remove weakKeyFile and blockedKeyFile support (#7783)
Goodkey has two ways to detect a key as weak: it runs a variety of
algorithmic checks (such as Fermat factorization and rocacheck), or the
key can be listed in a "weak key file". Similarly, it has two ways to
detect a key as blocked: it can call a generic function (which we use to
query our database), or the key can be listed in a "blocked key file".

This is two methods too many. Reliance on files of weak or blocked keys
introduces unnecessary complexity to both the implementation and
configuration of the goodkey package. Remove both "key file" options and
delete all code which supported them.

Also remove //test/block-a-key, as it was only used to generate these
test files.

IN-10762 tracked the removal of these files in prod.

Fixes https://github.com/letsencrypt/boulder/issues/7748
2024-11-06 10:48:39 -08:00
James Renken 6a2819a95a
Introduce separate UpdateRegistrationContact & UpdateRegistrationKey methods in RA & SA (#7735)
Clear contact field during DeactivateRegistration

Part of #7716
Part of #5554
2024-11-06 10:07:31 -08:00
Aaron Gable 84b15eb911
Truncate ARI timestamps to 1-second resolution (#7784)
There's no reason for us to be providing nanosecond precision on ARI
timestamps, and apparently it messes up some JSON date-parsing
libraries.

Fixes https://github.com/letsencrypt/boulder/issues/7779
2024-11-05 10:04:27 -08:00
Aaron Gable 46fc4c25ab
Re-enable wastedassign linter (#7788)
Fixes https://github.com/letsencrypt/boulder/issues/6202
2024-11-05 07:45:37 -08:00
Aaron Gable 3b62e81999
Clean up migration to separate remoteva executable (#7787)
Fixes https://github.com/letsencrypt/boulder/issues/7733
2024-11-05 07:44:08 -08:00
Jacob Hoffman-Andrews cb56bf6beb
ca: log cert signing using JSON objects (#7742)
This makes the log events easier to parse, and makes it easier to
consistently use the correct fields from the issuance request.

Also, reduce the number of fields that are logged on error events.
Logging just the serial and the error in most cases should suffice to
cross-reference the error with the item that we attempted to sign.

One downside is that this increases the total log size (10kB above, vs
7kB from a similar production issuance) due in part to more repetition.
For example, both the "signing cert" and "signing cert success" log
lines include the full precert DER.

Note that our long-term plan for more structured logs is to have a
unique event id to join logs on, which can avoid this repetition. But
since we don't currently have convenient ways to do that join, some
duplication (as we currently have in the logs) seems reasonable.
2024-11-04 16:54:07 -08:00
Matthew McPherrin 1fa66781ee
Allow admin command to block key from a CSR file (#7770)
One format in which we receive key compromise reports is a CSR file. For
example, from https://pwnedkeys.com/revokinator

This allows the admin command to block a key from a CSR directly,
instead of needing to validate it manually and get the SPKI or key from
it.

I've added a flag (default true) to check the signature on the CSR, in
case we ever decide we want to block a key from a CSR with a bad
signature for whatever reason.
2024-11-04 15:11:43 -08:00
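
A rough sketch of the idea, using only the standard library: parse the CSR, optionally verify its self-signature, and derive an identifier for the key. The function `spkiHashFromCSR`, its `checkSignature` parameter, and the choice of a SHA-256 hash over the SubjectPublicKeyInfo are assumptions for illustration; they are not the admin command's actual flags or code.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
)

// spkiHashFromCSR is a hypothetical helper. It returns the SHA-256 hash of
// the CSR's SubjectPublicKeyInfo, verifying the CSR's signature first when
// checkSignature is true.
func spkiHashFromCSR(pemBytes []byte, checkSignature bool) ([32]byte, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "CERTIFICATE REQUEST" {
		return [32]byte{}, errors.New("no CERTIFICATE REQUEST PEM block found")
	}
	csr, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return [32]byte{}, fmt.Errorf("parsing CSR: %w", err)
	}
	if checkSignature {
		if err := csr.CheckSignature(); err != nil {
			return [32]byte{}, fmt.Errorf("checking CSR signature: %w", err)
		}
	}
	return sha256.Sum256(csr.RawSubjectPublicKeyInfo), nil
}

// newTestCSR generates a throwaway key and a PEM-encoded CSR for the demo.
func newTestCSR() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.CertificateRequest{Subject: pkix.Name{CommonName: "example.com"}}
	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der}), nil
}

func main() {
	csrPEM, err := newTestCSR()
	if err != nil {
		panic(err)
	}
	hash, err := spkiHashFromCSR(csrPEM, true)
	fmt.Printf("%x %v\n", hash, err)
}
```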
Jacob Hoffman-Andrews 02685602a2
web: add feature flag PropagateCancels (#7778)
This allows client-initiated cancels to propagate through gRPC.

IN-10803 tracks the SRE-side changes to enable this flag.
2024-11-04 14:37:29 -08:00
Aaron Gable 21bc647fa5
Simplify TestTraces to reduce specificity (#7785)
TestTraces is designed to test whether our Open Telemetry tracing system
is working: that spans are being output, that they have the appropriate
parents, etc. It should not be testing whether Boulder took a specific
path through its code -- that's the domain of package-specific unit
tests. Simplify TestTraces to the point that it is asserting (nearly)
the bare minimum about the set of operations Boulder performs.
2024-11-04 12:02:57 -08:00
James Renken 4adc65fb7d
Rate limits: replace redis SET with INCRBY (#7782)
Add a new method, `BatchIncrement`, to issue `IncrBy` (instead of `Set`)
to Redis. This helps prevent the race condition that allows bursts of
near-simultaneous requests to, effectively, spend the same token.

Call this new method when incrementing an existing key. New keys still
need to use `BatchSet` because Redis doesn't have a facility to, within
a single operation, increment _or_ set a default value if none exists.

Add a new feature flag, `IncrementRateLimits`, gating the use of this
new method.

CPS Compliance Review: This feature flag does not change any behaviour
that is described or constrained by our CP/CPS. The closest relation
would just be API availability in general.

Fixes #7780
2024-11-04 11:20:44 -08:00
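
The race described above can be modeled without Redis. In this sketch, `fakeStore` is a hypothetical in-memory stand-in whose `Set` and `IncrBy` methods mimic the semantics of the Redis commands of the same names, and the stored value is a plain counter, which is an assumption for illustration rather than the format the ratelimits package stores.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeStore is a hypothetical in-memory stand-in for Redis.
type fakeStore struct {
	mu   sync.Mutex
	data map[string]int64
}

func newFakeStore() *fakeStore {
	return &fakeStore{data: make(map[string]int64)}
}

func (s *fakeStore) Get(key string) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.data[key]
}

func (s *fakeStore) Set(key string, v int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = v
}

// IncrBy adds delta to the stored value atomically and returns the result.
func (s *fakeStore) IncrBy(key string, delta int64) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] += delta
	return s.data[key]
}

// spendWithSet models two near-simultaneous requests that both read the
// value before either one writes. The second write clobbers the first.
func spendWithSet(s *fakeStore, key string, cost int64) {
	first := s.Get(key)
	second := s.Get(key)
	s.Set(key, first+cost)
	s.Set(key, second+cost)
}

// spendWithIncrBy models the same two requests using atomic increments.
func spendWithIncrBy(s *fakeStore, key string, cost int64) {
	s.IncrBy(key, cost)
	s.IncrBy(key, cost)
}

func main() {
	a, b := newFakeStore(), newFakeStore()
	spendWithSet(a, "bucket", 1)
	spendWithIncrBy(b, "bucket", 1)
	fmt.Println(a.Get("bucket")) // 1: one spend was lost
	fmt.Println(b.Get("bucket")) // 2: both spends were counted
}
```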
Jacob Hoffman-Andrews 2d69d7b9df
wfe: set Retry-After header on 500s (#7781) 2024-11-04 10:34:11 -08:00
Jacob Hoffman-Andrews 3377102aa8
issuance: ignore some ignored lints (#7771)
This improves deployability for the v3.6.2 release of zlint.

Fixes #7756
2024-10-28 13:51:18 -07:00
Samantha Frank b69c005d85
WFE: Use JSON tags to omit the Authorization ID and RegistrationID fields (#7769)
Use the `-` JSON tag to omit `ID` and `RegistrationID` fields instead of
mutating the core.Authorization object.
2024-10-28 14:52:18 -04:00
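
For readers unfamiliar with the `-` tag, here is a minimal sketch of how it keeps fields out of the serialized output without touching the object. The `authorization` struct below is a hypothetical, cut-down stand-in, not `core.Authorization`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// authorization is a hypothetical, cut-down stand-in for core.Authorization.
// Fields tagged with "-" are skipped by encoding/json.
type authorization struct {
	ID             string `json:"-"`
	RegistrationID int64  `json:"-"`
	Status         string `json:"status"`
}

// render serializes the authorization for display. The caller's object is
// left unmodified.
func render(a authorization) (string, error) {
	out, err := json.Marshal(a)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	a := authorization{ID: "abc", RegistrationID: 42, Status: "valid"}
	out, err := render(a)
	fmt.Println(out, err)
	fmt.Println(a.ID, a.RegistrationID) // the fields are still set on the object
}
```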
Jacob Hoffman-Andrews e182d889b2
sa: document the storage of linting certificates (#7772)
The naming of our `precertificates` table (now used to store linting
certificates) is definitely confusing, so add some more comments in
various places explaining. See #6807.
2024-10-28 10:23:39 -07:00
Samantha Frank 6e6c8fe480
ratelimits: Update errors to deep link to individual limits documentation (#7767)
Updates rate limits error messages to deep link to new website docs added in https://github.com/letsencrypt/website/pull/1756.
2024-10-25 13:55:51 -04:00
Samantha Frank 6c85b8d019
wfe/sa/features: Deprecate TrackReplacementCertificatesARI (#7766) 2024-10-24 13:38:33 -04:00
Samantha Frank e5edb7077f
wfe/features: Deprecate UseKvLimitsForNewOrder (#7765)
Default code paths that depended on this flag to be true.

Part of #5545
2024-10-23 18:13:24 -04:00
dependabot[bot] 844334e04a
build(deps): bump the aws group across 1 directory with 4 updates (#7757)
Bumps the aws group with 3 updates in the / directory:
[github.com/aws/aws-sdk-go-v2](https://github.com/aws/aws-sdk-go-v2),
[github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2)
and
[github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2).

Updates `github.com/aws/aws-sdk-go-v2` from 1.31.0 to 1.32.2
Updates `github.com/aws/aws-sdk-go-v2/config` from 1.27.39 to 1.27.43
Updates `github.com/aws/aws-sdk-go-v2/service/s3` from 1.63.3 to 1.65.3
Updates `github.com/aws/smithy-go` from 1.21.0 to 1.22.0

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-21 17:15:48 -04:00
Samantha Frank 71178f4ca4
WFE: Track in-flight for "/" (#7759) 2024-10-18 12:59:26 -04:00
Samantha Frank d0c9aa3808
WFE: Track in-flight HTTP requests by endpoint using a gauge (#7758) 2024-10-18 09:51:02 -04:00
Samantha Frank d17d71cc6e
ratelimits: Rename bucket.go to transaction.go (#7753) 2024-10-16 18:57:48 -04:00
Samantha Frank 6692160ced
test-cli: Pass -v/--verbose flag to Go integration tests (#7754)
Also remove -o/--list-integration-tests; this flag isn't really that
useful.
2024-10-10 15:26:15 -04:00
James Renken b0bcbb12aa
SA: Create list of authzIDs earlier in NewOrderAndAuthzs (#7744)
Creating the list of authzIDs earlier in NewOrderAndAuthzs:
- Saves a `for` loop with duplicated code; we no longer need to range
over two different slices, just one.
- Allows us to create the Order PB later, after more of the data
collection logic, without interrupting it. This makes the order of
operations slightly easier to follow.
2024-10-10 09:55:02 -07:00
Samantha Frank 37b85fbd38
VA/RVA: Add metadata necessary for the MPIC ballot (#7732)
- Add `Perspective` and `RIR` fields to the remote-va configuration
- Configure RVA ValidationAuthorityImpl instances with the contents of
the JSON configuration
- Configure VA ValidationAuthorityImpl instances with the constant
`va.PrimaryPerspective`
- Log `Perspective` for non-Primary Perspectives, per the MPIC
requirements in section 5.4.1 (2) vii of the BRs. Also log the RIR for
posterity.
- Introduce `ValidationResult` RPC fields `Perspective` and `Rir`, which
are not currently used but will be required for corroboration in #7616

Fixes https://github.com/letsencrypt/boulder/issues/7613
Part of https://github.com/letsencrypt/boulder/issues/7615
Part of https://github.com/letsencrypt/boulder/issues/7616
2024-10-10 09:37:55 -04:00
Samantha Frank c5dae06ffc
ratelimits: Add unit test coverage for TransactionBuilder methods (#7752) 2024-10-09 19:30:51 -04:00
James Renken 15c8752534
ceremony: Remove deprecated id-qt-cps support (#7750)
Fixes #7726
2024-10-08 16:09:33 -04:00
huochexizhan a6dc97cb5b
fix: fix slice init length (#7731)
Initialize a slice with a capacity of len(nameToString) rather than initializing
the length of this slice.

Signed-off-by: huochexizhan <huochexizhan@outlook.com>
2024-10-08 11:32:25 -04:00
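
The bug class fixed here is easy to reproduce. In the sketch below, both helper functions and the sample map are hypothetical; only the length-versus-capacity distinction comes from the commit message.

```go
package main

import "fmt"

// valuesWithLength shows the bug: make with a length pre-fills the slice
// with len(m) empty strings, and append then adds the real values after.
func valuesWithLength(m map[int]string) []string {
	out := make([]string, len(m))
	for _, v := range m {
		out = append(out, v)
	}
	return out
}

// valuesWithCapacity shows the fix: length 0, capacity len(m), so append
// fills the preallocated space without leading empty entries.
func valuesWithCapacity(m map[int]string) []string {
	out := make([]string, 0, len(m))
	for _, v := range m {
		out = append(out, v)
	}
	return out
}

func main() {
	m := map[int]string{1: "one", 2: "two"}
	fmt.Println(len(valuesWithLength(m)))   // 4
	fmt.Println(len(valuesWithCapacity(m))) // 2
}
```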
dependabot[bot] 0a543d151b
build(deps): bump the aws group across 1 directory with 4 updates (#7734) 2024-10-07 13:39:28 -07:00