- Remove Perspective and RIR from ValidationRecords
- Make ValidationResultToPB Perspective and RIR aware
- Update comment for VA.PerformValidation
- Make verificationRequestEvent more like doDCVAuditLog
- Update language used in problems created by performRemoteValidation to
be more like doRemoteDCV.
Fix two bugs introduced in #7816:
- Fix localLatency time for CAA rechecking is always 0
- Fix logEvent.InternalError is no longer being written to log
To prepare for the MPIC requirement of having a minimum of 3
perspectives, I added code to `NewValidationAuthorityImpl` to error if
there aren't enough remote VAs configured _and_ the current VA is the
primary perspective. Then I fixed all the tests, which involved adding
some backends in the unittests, and spinning up `remoteva-c` in the
integration tests.
As a reminder, the `boulder va` command always considers itself the
primary perspective, while `boulder remoteva` gives itself a perspective
based on its config.
I wound up backing out the code in `NewValidationAuthorityImpl` because
right now our remote VAs are actually running the `boulder va` command,
so they would error out in prod, even though our actual primary
perspective does have enough backends. So this wound up as a test-only
change.
Previously this was a configuration field.
Ports `maxAllowedFailures()` from `determineMaxAllowedFailures()` in
#7794.
Test updates:
Remove the `maxRemoteFailures` param from `setup` in all VA tests.
Some tests were depending on setting this param directly to provoke
failures.
For example, `TestMultiVAEarlyReturn` previously relied on "zero allowed
failures". Since the number of allowed failures is now 1 for the number
of remote VAs we were testing (2), the VA wasn't returning early with an
error; it was succeeding! To fix that, make sure there are two failures.
Since two failures from two RVAs wouldn't exercise the right situation,
add a third RVA, so we get two failures from three RVAs.
Similarly, TestMultiCAARechecking had several test cases that omitted
this field, effectively setting it to zero allowed failures. I updated
the "1 RVA failure" test case to expect overall success and added a "2
RVA failures" test case to expect overall failure (we previously
expected overall failure from a single RVA failing).
In TestMultiVA I had to change a test for `len(lines) != 1` to
`len(lines) == 0`, because with more backends we were now logging more
errors, and finding e.g. `len(lines)` to be 2.
The zero value for `limit` is invalid, so returning `nil` in error cases
avoids silently returning invalid limits (and means that if code makes a
mistake and references an invalid limit it will be an obvious clear
stack trace).
This is more consistent, since the methods on `limit` use a pointer
receiver. Also, since `limit` is a fairly large object, this saves some
copying.
Related to #7803 and #7797.
Solves CI flakes in TestCertificatesPerDomain and
TestIdentifiersPausedForAccount that are the result of a race on the
Redis database. This has the downside of making failed validations and
successful finalizations take slightly longer.
- Add a new histogram, validationLatency
- Add a VA.observeLatency for observing validation latency
- Refactor to ensure this metric can be observed exclusively within
VA.PerformValidation and VA.IsCAAValid.
- Replace validationTime, localValidationTime, remoteValidationTime,
remoteValidationFailures, caaCheckTime, localCAACheckTime,
remoteCAACheckTime, and remoteCAACheckFailures with validationLatency
We're using `.Keys()` a method of the maps package added in 1.23
(https://pkg.go.dev/maps@master#Keys) but our go.mod states 1.22.0. This
causes some in-IDE linting errors in VSCode.
This means that most traffic will go to the authz URLs with account.
After this has been deployed for 30 days (the max lifetime of an order),
we can remove support for the old paths.
Part of #7683
TestPerformValidation_FailedThenSuccessfulValidationResetsPauseIdentifiersRatelimit
checks for a bucket being empty after a reset. However, that bucket is
based on an account ID that is shared across multiple test cases.
Instead, use a unique account and domain for this test.
Fixes#7812
Return an error and do logging in the caller. This adds early returns on
a number of error conditions, which can prevent nil pointer dereference
in those cases.
Also update the description for AutomaticallyPauseZombieClients.
Follows up #7763.
In the FailedAuthorizations limits, there was code that intentionally
ignored errLimitDisabled errors (`errors.Is(err, errLimitDisabled)`).
However, that that resulted in those functions later using a returned
`limit` value that was invalid (i.e. its zero value). That happened to
trigger some later checks in validateTransaction. Specifically this
check failed:
if txn.cost > txn.limit.Burst {
// error
When txt.limit.Burst is zero, this will always fail.
This problem doesn't really show up in prod, where all the limits are
configured. But it showed up in tests, specifically
TestPerformValidation_FailedValidationsTriggerPauseIdentifiersRatelimit,
where the limits are constructed using a simplified config that leaves
most of them disabled.
In this change, I tried to make handling of errLimitDisabled more
consistent, and always return an allow-only transaction as early as
possible instead of falling through the error condition.
Where that wasn't possible, I used a boolean to record whether the
result of `builder.getLimit()` was valid before referencing any of its
fields.
I also added some "shouldn't happen" errors to catch this problem
earlier if it recurs.
I removed some "skip disabled limit" comments because those say "what
the code does" (which the code also says), not "why the code does it".
Fixes the test failures in #7797.
Add a new WFE & nonce config field, `NonceHMACKey`, which uses the new
`cmd.HMACKeyConfig` type. Deprecate the `NoncePrefixKey` config field.
Generalize the error message when validating `HMACKeyConfig` in
`config`.
Remove the deprecated `UseDerivablePrefix` config field, which is no
longer used anywhere.
Part of #7632
- Added a new key-value ratelimit
`FailedAuthorizationsForPausingPerDomainPerAccount` which is incremented
each time a client fails a validation.
- As long as capacity exists in the bucket, a successful validation
attempt will reset the bucket back to full capacity.
- Upon exhausting bucket capacity, the RA will send a gRPC to the SA to
pause the `account:identifier`. Further validation attempts will be
rejected by the [WFE](https://github.com/letsencrypt/boulder/pull/7599).
- Added a new feature flag, `AutomaticallyPauseZombieClients`, which
enables automatic pausing of zombie clients in the RA.
- Added a new RA metric `paused_pairs{"paused":[bool],
"repaused":[bool], "grace":[bool]}` to monitor use of this new
functionality.
- Updated `ra_test.go` `initAuthorities` to allow accessing the
`*ratelimits.RedisSource` for checking that the new ratelimit functions
as intended.
Co-authored-by: @pgporada
Fixes https://github.com/letsencrypt/boulder/issues/7738
---------
Co-authored-by: Phil Porada <pporada@letsencrypt.org>
Co-authored-by: Phil Porada <philporada@gmail.com>
This adds new handlers under `/acme/authz/` and `/acme/chall/` that
expect to be followed by `{regID}/{authzID}` and
`{regID}/{authzID}/{challengeID}`, respectively. For deployability, the
old handlers continue to work, and the URLs returned for newly created
objects will still point to the paths used by the old handlers
(`/acme/authz-v3/` and `/acme/chall-v3/`).
There are some self-referential URLs in authz and challenge responses,
like the Location header, and the URL of challenges embedded in an
authorization object. This PR updates `prepAuthorizationForDisplay` and
`prepChallengeForDisplay` so those URLs can be generated consistently
with the path that was requested.
For the WFE tests, in most cases I duplicated an entire test and then
updated it to test the `WithAccount` handler. The idea is that once
we're fully switched over to the new format we can delete the tests for
the non-`WithAccount` variants.
Part of #7683
Goodkey has two ways to detect a key as weak: it runs a variety of
algorithmic checks (such as Fermat factorization and rocacheck), or the
key can be listed in a "weak key file". Similarly, it has two ways to
detect a key as blocked: it can call a generic function (which we use to
query our database), or the key can be listed in a "blocked key file".
This is two methods too many. Reliance on files of weak or blocked keys
introduces unnecessary complexity to both the implementation and
configuration of the goodkey package. Remove both "key file" options and
delete all code which supported them.
Also remove //test/block-a-key, as it was only used to generate these
test files.
IN-10762 tracked the removal of these files in prod.
Fixes https://github.com/letsencrypt/boulder/issues/7748
Introduce separate UpdateRegistrationContact & UpdateRegistrationKey
methods in RA & SA
Clear contact field during DeactivateRegistration
Part of #7716
Part of #5554
This makes the log events easier to parse, and makes it easier to
consistently use the correct fields from the issuance request.
Also, reduce the number of fields that are logged on error events.
Logging just the serial and the error in most cases should suffice to
cross-reference the error with the item that we attempted to sign.
One downside is that this increases the total log size (10kB above, vs
7kB from a similar production issuance) due in part to more repetition.
For example, both the "signing cert" and "signing cert success" log
lines include the full precert DER.
Note that our long-term plan for more structured logs is to have a
unique event id to join logs on, which can avoid this repetition. But
since we don't currently have convenient ways to do that join, some
duplication (as we currently have in the logs) seems reasonable.
One format we receive key compromise reports is as a CSR file. For
example, from https://pwnedkeys.com/revokinator
This allows the admin command to block a key from a CSR directly,
instead of needing to validate it manually and get the SPKI or key from
it.
I've added a flag (default true) to check the signature on the CSR, in
case we ever decide we want to block a key from a CSR with a bad
signature for whatever reason.
TestTraces is designed to test whether our Open Telemetry tracing system
is working: that spans are being output, that they have the appropriate
parents, etc. It should not be testing whether Boulder took a specific
path through its code -- that's the domain of package-specific unit
tests. Simplify TestTraces to the point that it is asserting (nearly)
the bare minimum about the set of operations Boulder performs.
Add a new method, `BatchIncrement`, to issue `IncrBy` (instead of `Set`)
to Redis. This helps prevent the race condition that allows bursts of
near-simultaneous requests to, effectively, spend the same token.
Call this new method when incrementing an existing key. New keys still
need to use `BatchSet` because Redis doesn't have a facility to, within
a single operation, increment _or_ set a default value if none exists.
Add a new feature flag, `IncrementRateLimits`, gating the use of this
new method.
CPS Compliance Review: This feature flag does not change any behaviour
that is described or constrained by our CP/CPS. The closest relation
would just be API availability in general.
Fixes#7780
The naming of our `precertificates` table (now used to store linting
certificates) is definitely confusing, so add some more comments in
various places explaining. See #6807.
Bumps the aws group with 3 updates in the / directory:
[github.com/aws/aws-sdk-go-v2](https://github.com/aws/aws-sdk-go-v2),
[github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2)
and
[github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2).
Updates `github.com/aws/aws-sdk-go-v2` from 1.31.0 to 1.32.2
Updates `github.com/aws/aws-sdk-go-v2/config` from 1.27.39 to 1.27.43
Updates `github.com/aws/aws-sdk-go-v2/service/s3` from 1.63.3 to 1.65.3
Updates `github.com/aws/smithy-go` from 1.21.0 to 1.22.0
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Creating the list of authzIDs earlier in NewOrderAndAuthzs:
- Saves a `for` loop with duplicated code; we no longer need to range
over two different slices, just one.
- Allows us to create the Order PB later, after more of the data
collection logic, without interrupting it. This makes the order of
operations slightly easier to follow.
- Add `Perspective` and `RIR` fields to the remote-va configuration
- Configure RVA ValidationAuthorityImpl instances with the contents of
the JSON configuration
- Configure VA ValidationAuthorityImpl instances with the constant
`va.PrimaryPerspective`
- Log `Perspective` for non-Primary Perspectives, per the MPIC
requirements in section 5.4.1 (2) vii of the BRs. Also log the RIR for
posterity.
- Introduce `ValidationResult` RPC fields `Perspective` and `Rir`, which
are not currently used but will be required for corroboration in #7616
Fixes https://github.com/letsencrypt/boulder/issues/7613
Part of https://github.com/letsencrypt/boulder/issues/7615
Part of https://github.com/letsencrypt/boulder/issues/7616
Initialize a slice with a capacity of len(nameToString) rather than initializing
the length of this slice.
Signed-off-by: huochexizhan <huochexizhan@outlook.com>