Remove the .Error() method from probs.ProblemDetails, so that it can no
longer be returned from functions which return an error. Update various
call sites to use the .String() method to get a textual representation
of the problem instead. Simplify ProblemDetailsForError to not
special-case and pass-through ProblemDetails, since they are no longer a
valid input to that function.
This reduces instances of "boxed nil" bugs, and paves the way for all of
the WFE methods to be refactored to simply return errors instead of
writing them directly into the response object.
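For readers unfamiliar with the "boxed nil" bug, here is a minimal sketch of how it arises. The `problem` type is a hypothetical stand-in for a ProblemDetails-like struct that still has an `Error()` method; it is not the real `probs` code.

```go
package main

import "fmt"

// problem is a hypothetical stand-in for a ProblemDetails-like type.
type problem struct{ Detail string }

// Error makes *problem satisfy the error interface, which is what makes
// the bug possible in the first place.
func (p *problem) Error() string { return p.Detail }

// validate returns a typed nil pointer on success.
func validate(ok bool) *problem {
	if ok {
		return nil
	}
	return &problem{Detail: "validation failed"}
}

// boxed converts the *problem straight into an error. A nil *problem
// becomes a non-nil interface value that wraps a nil pointer.
func boxed(ok bool) error {
	return validate(ok)
}

// unboxed checks the pointer before converting, so success is a true nil.
func unboxed(ok bool) error {
	if p := validate(ok); p != nil {
		return p
	}
	return nil
}

func main() {
	fmt.Println(boxed(true) != nil)   // true: the caller sees an "error" on success
	fmt.Println(unboxed(true) != nil) // false
}
```

Once the type has no `Error()` method, the `boxed` function above no longer compiles, so this class of mistake is caught at build time.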
Part of https://github.com/letsencrypt/boulder/issues/4980
Add an `identifier` field to the `va.PerformValidationRequest` proto, which will soon replace its `dnsName` field.
Accept and prefer the `identifier` field in every VA function that uses this struct. Don't (yet) assume it will be present.
Throughout the VA, accept and handle the IP address identifier type. Handling is similar to DNS names, except that `getAddrs` is not called, and with these differences:
- IPs are represented in a different field in the `x509.Certificate` struct.
- IPs must be presented as reverse DNS (`.arpa`) names in SNI for [TLS-ALPN-01 challenge requests](https://datatracker.ietf.org/doc/html/rfc8738#name-tls-with-application-layer-).
- IPv6 addresses are enclosed in square brackets when composing or parsing URLs.
For HTTP-01 challenges, accept redirects to bare IP addresses, which were previously rejected.
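Two of the IP-specific points above (reverse DNS names for SNI, and square brackets around IPv6 addresses) can be sketched with the standard library alone. The helper names `reverseName` and `hostPort` are hypothetical and chosen for illustration; this is not the VA's actual code.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"strings"
)

// reverseName returns the reverse-DNS (.arpa) name for an IP address
// string. Hypothetical helper, written for illustration only.
func reverseName(ip string) (string, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return "", fmt.Errorf("parsing IP address: %w", err)
	}
	addr = addr.Unmap()
	if addr.Is4() {
		// IPv4: reverse the four octets and append in-addr.arpa.
		b := addr.As4()
		return fmt.Sprintf("%d.%d.%d.%d.in-addr.arpa", b[3], b[2], b[1], b[0]), nil
	}
	// IPv6: reverse all 32 nibbles and append ip6.arpa.
	const hexDigits = "0123456789abcdef"
	b := addr.As16()
	var sb strings.Builder
	for i := len(b) - 1; i >= 0; i-- {
		sb.WriteByte(hexDigits[b[i]&0x0f])
		sb.WriteByte('.')
		sb.WriteByte(hexDigits[b[i]>>4])
		sb.WriteByte('.')
	}
	sb.WriteString("ip6.arpa")
	return sb.String(), nil
}

// hostPort builds the host:port part of a URL. net.JoinHostPort adds the
// square brackets that IPv6 literals need.
func hostPort(ip, port string) string {
	return net.JoinHostPort(ip, port)
}

func main() {
	name, err := reverseName("192.0.2.1")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)                           // 1.2.0.192.in-addr.arpa
	fmt.Println(hostPort("2001:db8::1", "443")) // [2001:db8::1]:443
	fmt.Println(hostPort("192.0.2.1", "80"))    // 192.0.2.1:80
}
```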
Fixes #2706
Part of #7311
Update from go1.23.1 to go1.23.6 for our primary CI and release builds.
This brings in a few security fixes that aren't directly relevant to us.
Add go1.24.0 to our matrix of CI and release versions, to prepare for
switching to this next major version in prod.
Previously this was a configuration field.
Ports `maxAllowedFailures()` from `determineMaxAllowedFailures()` in
#7794.
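A rough sketch of what such a function might look like, assuming it follows the MPIC quorum rule (1 allowed non-corroboration with fewer than 6 remote perspectives, 2 with 6 or more). Returning 0 for fewer than 2 perspectives is an assumption made here, not something this message states.

```go
package main

import "fmt"

// maxAllowedFailures returns how many remote perspectives may fail to
// corroborate before validation is considered failed. Sketch only: the
// thresholds mirror the MPIC quorum rule, and the "fewer than 2" branch
// is an assumption.
func maxAllowedFailures(perspectiveCount int) int {
	if perspectiveCount < 2 {
		return 0
	}
	if perspectiveCount < 6 {
		return 1
	}
	return 2
}

func main() {
	for _, n := range []int{2, 3, 6} {
		fmt.Printf("%d remote VAs: %d allowed failures\n", n, maxAllowedFailures(n))
	}
}
```

With 2 remote VAs this yields 1 allowed failure, which is why tests that relied on "zero allowed failures" needed to change.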
Test updates:
Remove the `maxRemoteFailures` param from `setup` in all VA tests.
Some tests were depending on setting this param directly to provoke
failures.
For example, `TestMultiVAEarlyReturn` previously relied on "zero allowed
failures". Since the number of allowed failures is now 1 for the number
of remote VAs we were testing (2), the VA wasn't returning early with an
error; it was succeeding! To fix that, make sure there are two failures.
Since two failures from two RVAs wouldn't exercise the right situation,
add a third RVA, so we get two failures from three RVAs.
Similarly, TestMultiCAARechecking had several test cases that omitted
this field, effectively setting it to zero allowed failures. I updated
the "1 RVA failure" test case to expect overall success and added a "2
RVA failures" test case to expect overall failure (we previously
expected overall failure from a single RVA failing).
In TestMultiVA I had to change a test for `len(lines) != 1` to
`len(lines) == 0`, because with more backends we were now logging more
errors, and finding e.g. `len(lines)` to be 2.
Clean up how we handle identifiers throughout the Boulder codebase by
- moving the Identifier protobuf message definition from sa.proto to
core.proto;
- adding support for IP identifier to the "identifier" package;
- renaming the "identifier" package's exported names to be clearer; and
- ensuring we use the identifier package's helper functions everywhere
we can.
This will make future work to actually respect identifier types (such as
in Authorization and Order protobuf messages) simpler and easier to
review.
Part of https://github.com/letsencrypt/boulder/issues/7311
The core.Challenge.ProvidedKeyAuthorization field is problematic, both
because it is poorly named (which is admittedly easily fixable) and
because it is a field which we never expose to the client yet it is held
on a core type. Deprecate this field, and replace it with a new
vapb.PerformValidationRequest.ExpectedKeyAuthorization field.
Within the VA, this also simplifies the primary logic methods to just
take the expected key authorization, rather than taking a whole (largely
unnecessary) challenge object. This has large but wholly mechanical
knock-on effects on the unit tests.
While we're here, improve the documentation on core.Challenge itself,
and remove Challenge.URI, which was deprecated long ago and is wholly
unused.
Part of https://github.com/letsencrypt/boulder/issues/7514
This allows us to defer creating the user-friendly ProblemDetails to the
highest level (va.PerformValidation), which in turn makes it possible to
log the original error alongside the user-friendly error. It also
reduces the likelihood of "boxed nil" bugs.
Many of the unit tests check for a specific ProblemDetails.Type and
specific Details contents. These test against the output of
`detailedError`, which transforms `error` into `ProblemDetails`. So the
updates to the tests include insertion of `detailedError(err)` in many
places.
Several places that were returning a specific ProblemDetails.Type
instead return the corresponding `berrors` type. This follows a pattern
that `berrors` was designed to enable: use the `berrors` types
internally and transform into `ProblemDetails` at the edge where we are
rendering something to present to the user: WFE, and now VA.
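The "internal errors, problems at the edge" pattern can be sketched as follows. All type and function names here (`internalError`, `problemDetails`, `toProblem`, `lookup`) are hypothetical stand-ins; the real `berrors` and `probs` packages differ in detail.

```go
package main

import (
	"errors"
	"fmt"
)

// errorKind is a hypothetical stand-in for an internal error type enum.
type errorKind int

const (
	kindDNS errorKind = iota
	kindConnection
)

// internalError is what lower layers return. It is an ordinary Go error.
type internalError struct {
	Kind   errorKind
	Detail string
}

func (e *internalError) Error() string { return e.Detail }

// problemDetails is the user-facing shape. It deliberately has no Error
// method, so it cannot be returned where an error is expected.
type problemDetails struct {
	Type   string
	Detail string
}

// toProblem converts an error into a problem document at the edge.
func toProblem(err error) *problemDetails {
	var ie *internalError
	if errors.As(err, &ie) {
		switch ie.Kind {
		case kindDNS:
			return &problemDetails{Type: "urn:ietf:params:acme:error:dns", Detail: ie.Detail}
		case kindConnection:
			return &problemDetails{Type: "urn:ietf:params:acme:error:connection", Detail: ie.Detail}
		}
	}
	// Unknown errors are not shown to the user verbatim.
	return &problemDetails{Type: "urn:ietf:params:acme:error:serverInternal", Detail: "internal error"}
}

// lookup simulates a lower layer that wraps an internal error with context.
func lookup(host string) error {
	return fmt.Errorf("looking up %s: %w", host,
		&internalError{Kind: kindDNS, Detail: "no valid A or AAAA records found"})
}

func main() {
	err := lookup("example.com")
	prob := toProblem(err)
	// The original error is still available for logging next to the problem.
	fmt.Printf("err=%q problem=%s\n", err, prob.Type)
}
```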
Previously, `va.IsCAAValid` would only check CAA records from the
primary VA during initial domain control validation, completely ignoring
any configured RVAs. The upcoming
[MPIC](https://github.com/ryancdickson/staging/pull/8) ballot will
require that it be done from multiple perspectives. With the currently
deployed [Multi-Perspective
Validation](https://letsencrypt.org/2020/02/19/multi-perspective-validation.html)
in staging and production, this change brings us in line with the
[proposed phase
3](https://github.com/ryancdickson/staging/pull/8/files#r1368708684).
This change reuses the existing
[MaxRemoteValidationFailures](21fc191273/cmd/boulder-va/main.go (L35))
variable for the required non-corroboration quorum.
> Phase 3: June 15, 2025 - December 14, 2025 ("CAs MUST implement MPIC
in blocking mode*"):
>
> MUST implement MPIC? Yes
> Required quorum?: Minimally, 2 remote perspectives must be used. If
using less than 6 remote perspectives, 1 non-corroboration is allowed.
If using 6 or more remote perspectives, 2 non-corroborations are
allowed.
> MUST block issuance if quorum is not met: Yes.
> Geographic diversity requirements?: Perspectives must be 500km from 1)
the primary perspective and 2) all other perspectives used in the
quorum.
>
> * Note: "Blocking Mode" is a nickname. As opposed to "monitoring mode"
(described in the last milestone), CAs MUST NOT issue a certificate if
quorum requirements are not met from this point forward.
Adds new VA feature flags:
* `EnforceMultiCAA` instructs a primary VA to command each of its
configured RVAs to perform a CAA recheck.
* `MultiCAAFullResults` causes the primary VA to block waiting for all
RVA CAA recheck results to arrive.
Renamed `va.logRemoteValidationDifferentials` to
`va.logRemoteDifferentials` because it can handle initial domain control
validations and CAA rechecking with minimal editing.
Part of https://github.com/letsencrypt/boulder/issues/7061
Add go1.21rc2 to the matrix of go versions we test against.
Add a new step to our CI workflows (boulder-ci, try-release, and
release) which sets the "GOEXPERIMENT=loopvar" environment variable if
we're running go1.21. This experiment makes it so that loop variables
are scoped only to their single loop iteration, rather than to the whole
loop. This prevents bugs such as our CAA Rechecking incident
(https://bugzilla.mozilla.org/show_bug.cgi?id=1619047). Also add a line
to our docker setup to propagate this environment variable into the
container, where it can affect builds.
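To illustrate the class of bug that per-iteration scoping prevents, here is a small sketch. To keep the output independent of the Go version and of `GOEXPERIMENT`, the shared-variable case is written out explicitly with a variable declared outside the loop, which behaves the way a loop-scoped variable does without the experiment. The function names are made up for this example.

```go
package main

import "fmt"

// sharedVariable stores the address of one variable that lives for the
// whole loop. Every stored pointer ends up seeing the last value.
func sharedVariable(names []string) []*string {
	var out []*string
	var name string
	for i := range names {
		name = names[i]
		out = append(out, &name)
	}
	return out
}

// perIteration declares a fresh variable in each iteration, so every
// stored pointer refers to its own value.
func perIteration(names []string) []*string {
	var out []*string
	for i := range names {
		name := names[i]
		out = append(out, &name)
	}
	return out
}

func main() {
	names := []string{"a.example", "b.example"}
	for _, p := range sharedVariable(names) {
		fmt.Println(*p) // b.example both times
	}
	for _, p := range perIteration(names) {
		fmt.Println(*p) // a.example, then b.example
	}
}
```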
Finally, fix one TLS-ALPN-01 test to have the fake subscriber server
actually willing to negotiate the acme-tls/1 protocol, so that the ACME
server's tls client actually waits to (fail to) get the certificate,
instead of dying immediately. This fix is related to the upgrade to
go1.21, not the loopvar experiment.
Fixes https://github.com/letsencrypt/boulder/issues/6950
Clean up several spots where we were behaving differently on
go1.18 and go1.19, now that we're using go1.19 everywhere. Also
re-enable the lint and generate tests, and fix the various places where
the two versions disagreed on how comments should be formatted.
Also clean up the OldTLS codepaths, now that both go1.19 and our
own feature flags have forbidden TLS < 1.2 everywhere.
Fixes #6011
Enable the "unparam" linter, which checks for unused function
parameters, unused function return values, and parameters and
return values that always have the same value every time they
are used.
In addition, fix many instances where the unparam linter complains
about our existing codebase. Remove error return values from a
number of functions that never return an error, remove or use
context and test parameters that were previously unused, and
simplify a number of (mostly test-only) functions that always take the
same value for their parameter. Most notably, remove the ability to
customize the RSA Public Exponent from the ceremony tooling,
since it should always be 65537 anyway.
Fixes #6104
Run the Boulder unit and integration tests with go1.19.
In addition, make a few small changes to allow both sets of
tests to run side-by-side. Mark a few tests, including our lints
and generate checks, as go1.18-only. Reformat a few doc
comments, particularly lists, to abide by go1.19's stricter gofmt.
Causes #6275
Slightly refactor `validateTLSALPN01` to use a common function
to format the error messages it returns. This reduces code duplication
and makes the important validation logic easier to follow.
Fixes #5922
RFC 8737 says "The client prepares for validation by constructing a self-signed
certificate...". Add a check for whether the challenge certificate is
self-signed by ensuring its issuer and subject are equal, and checking its
signature with its own public key.
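A minimal sketch of such a check using only `crypto/x509`. The function name `checkSelfSigned` and the helpers `newKey` and `makeCert` are hypothetical and exist only to make the example runnable; the real VA code may differ.

```go
package main

import (
	"bytes"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// checkSelfSigned reports an error unless the certificate's issuer equals
// its subject and its signature verifies under its own public key.
func checkSelfSigned(cert *x509.Certificate) error {
	if !bytes.Equal(cert.RawIssuer, cert.RawSubject) {
		return errors.New("issuer does not match subject")
	}
	err := cert.CheckSignature(cert.SignatureAlgorithm, cert.RawTBSCertificate, cert.Signature)
	if err != nil {
		return fmt.Errorf("signature does not verify with the certificate's own key: %w", err)
	}
	return nil
}

// newKey generates a throwaway P-256 key for the demonstration.
func newKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// makeCert issues a certificate for subjectKey, signed by signerKey, with
// the given subject and issuer common names. Demonstration helper only.
func makeCert(cn, issuerCN string, subjectKey, signerKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     []string{cn},
	}
	parent := &x509.Certificate{Subject: pkix.Name{CommonName: issuerCN}}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &subjectKey.PublicKey, signerKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	k1, err := newKey()
	if err != nil {
		panic(err)
	}
	k2, err := newKey()
	if err != nil {
		panic(err)
	}
	selfSigned, err := makeCert("a.example", "a.example", k1, k1)
	if err != nil {
		panic(err)
	}
	otherKey, err := makeCert("a.example", "a.example", k1, k2)
	if err != nil {
		panic(err)
	}
	fmt.Println(checkSelfSigned(selfSigned)) // <nil>
	fmt.Println(checkSelfSigned(otherKey) != nil)
}
```

Comparing names alone is not enough: the second certificate has matching issuer and subject but was signed by a different key, and only the signature check catches it.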
Also slightly refactor the helper methods to return only a single cert, since we
only care about the first one returned. And add a test.
Detect when an ACME client presents a non-compliant X.509 certificate
that contains multiple copies of the SubjectAlternativeName or ACME
Identifier extensions; this should be forbidden.
RFC 8737 states that the certificate presented during the "acme-tls/1"
handshake has "a subjectAltName extension containing the dNSName
being validated and no other entries". We were checking that it contained
no other dNSNames, but we were not rejecting certificates that carried
other kinds of Subject Alternative Names.
Factor all of our SAN checks into a helper function. Have that function
construct the expected bytes of the SAN extension from the one DNS
name we expect to see, and assert that the actual bytes match the
expectation. Add non-DNS-name identifiers to our error output when we
encounter a cert whose SANs don't match. And add tests which check
that we fail the validation when the cert has multiple SANs.
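The byte-comparison approach might be sketched like this. `sanExtension` and `checkSANs` are hypothetical names; the encoding follows the same `asn1.RawValue` technique the standard library uses when it marshals SANs, and the example works on `pkix.Extension` values directly rather than on a parsed certificate.

```go
package main

import (
	"bytes"
	"crypto/x509/pkix"
	"encoding/asn1"
	"errors"
	"fmt"
)

// oidSubjectAltName is the standard id-ce-subjectAltName OID (2.5.29.17).
var oidSubjectAltName = asn1.ObjectIdentifier{2, 5, 29, 17}

// sanExtension encodes a SAN extension holding only the given dNSNames.
func sanExtension(names ...string) (pkix.Extension, error) {
	var general []asn1.RawValue
	for _, n := range names {
		// dNSName is GeneralName choice [2], context-specific and primitive.
		general = append(general, asn1.RawValue{
			Class: asn1.ClassContextSpecific,
			Tag:   2,
			Bytes: []byte(n),
		})
	}
	der, err := asn1.Marshal(general)
	if err != nil {
		return pkix.Extension{}, fmt.Errorf("marshaling SANs: %w", err)
	}
	return pkix.Extension{Id: oidSubjectAltName, Value: der}, nil
}

// checkSANs requires exactly one SAN extension whose bytes equal the
// encoding of a single expected dNSName. Any extra name of any kind, or a
// duplicated extension, fails the comparison.
func checkSANs(expected string, exts ...pkix.Extension) error {
	want, err := sanExtension(expected)
	if err != nil {
		return err
	}
	var found []pkix.Extension
	for _, e := range exts {
		if e.Id.Equal(oidSubjectAltName) {
			found = append(found, e)
		}
	}
	if len(found) != 1 {
		return fmt.Errorf("expected exactly one SAN extension, found %d", len(found))
	}
	if !bytes.Equal(found[0].Value, want.Value) {
		return errors.New("SAN extension does not contain exactly the expected dNSName")
	}
	return nil
}

func main() {
	one, _ := sanExtension("a.example")
	two, _ := sanExtension("a.example", "b.example")
	fmt.Println(checkSANs("a.example", one))        // <nil>
	fmt.Println(checkSANs("a.example", two) != nil) // true
	fmt.Println(checkSANs("a.example", one, one) != nil)
}
```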
Current metrics show that subscribers present certificates using the
obsolete OID to identify their id-pe-acmeIdentifier extension about
an order of magnitude less often than they present the correct OID.
Remove support for the never-standardized OID.
RFC 8737, Section 4, states "ACME servers that implement "acme-tls/1"
MUST only negotiate TLS 1.2 [RFC5246] or higher when connecting to
clients for validation." Enforce that our outgoing connections to validate
TLS-ALPN-01 challenges do not negotiate TLS 1.1.
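In `crypto/tls` terms, this kind of restriction is a `MinVersion` setting on the client config. The sketch below is illustrative only; `validationTLSConfig` is a hypothetical name, and the remaining fields are assumptions about what a TLS-ALPN-01 validation client would typically set.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// acmeTLS1Protocol is the ALPN protocol name defined by RFC 8737.
const acmeTLS1Protocol = "acme-tls/1"

// validationTLSConfig builds a client config for a TLS-ALPN-01 probe.
// Hypothetical helper, not the VA's actual code.
func validationTLSConfig(serverName string) *tls.Config {
	return &tls.Config{
		// Refuse to negotiate anything older than TLS 1.2.
		MinVersion: tls.VersionTLS12,
		NextProtos: []string{acmeTLS1Protocol},
		ServerName: serverName,
		// The challenge certificate is self-signed by design, so chain
		// verification is skipped and the caller must inspect the
		// presented certificate itself.
		InsecureSkipVerify: true,
	}
}

func main() {
	cfg := validationTLSConfig("example.com")
	fmt.Println(cfg.MinVersion == tls.VersionTLS12, cfg.NextProtos)
}
```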
Add go1.17beta1 docker images to the set of things we build,
and integrate go1.17beta1 into the set of environments CI runs.
Fix one test which breaks due to an underlying refactoring in
the `crypto/x509` stdlib package. Fix one other test which breaks
due to new guarantees in the stdlib's TLS ALPN implementation.
Also removes go1.16.5 from CI so we're only running 2 versions.
Fixes #5480
Update all of our tests to use `AssertMetricWithLabelsEquals`
instead of combinations of the older `CountFoo` helpers with
simple asserts. This coalesces all of our prometheus inspection
logic into a single function, allowing the deletion of four separate
helper functions.
- Add 1.16.1 to the GitHub CI test matrix
- Fix tlsalpn tests for go 1.16.1 but maintain compatibility with 1.15.x
- Fix integration tests.
Fix: #5301
Fix: #5316
[Go style says](https://blog.golang.org/package-names):
> Avoid stutter. Since client code uses the package name as a prefix
> when referring to the package contents, the names for those contents
> need not repeat the package name. The HTTP server provided by the
> http package is called Server, not HTTPServer. Client code refers to
> this type as http.Server, so there is no ambiguity.
Rename DNSClient, DNSClientImpl, NewDNSClientImpl,
NewTestDNSClientImpl, DNSError, and MockDNSClient to follow those
guidelines.
Unexport DNSClientImpl and MockTimeoutError (was only used internally).
Make New and NewTest return the Client interface rather than a concrete
`impl` type.
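The resulting shape is roughly the following. This is a single-file sketch with a made-up `LookupTXT` method and an in-memory implementation; in a real package the exported names would be read with the package prefix (for example `dns.Client` and `dns.New`), which is why they do not repeat it.

```go
package main

import (
	"errors"
	"fmt"
)

// Client is the exported interface. Callers depend on this, not on the
// concrete type behind it.
type Client interface {
	LookupTXT(host string) ([]string, error)
}

// impl is unexported, so it cannot be named outside the package.
type impl struct {
	records map[string][]string
}

// LookupTXT returns the TXT records stored for host (illustrative method).
func (c *impl) LookupTXT(host string) ([]string, error) {
	txts, ok := c.records[host]
	if !ok {
		return nil, errors.New("no TXT records found")
	}
	return txts, nil
}

// New returns the interface rather than *impl, which leaves the concrete
// type free to change without affecting callers.
func New(records map[string][]string) Client {
	return &impl{records: records}
}

func main() {
	c := New(map[string][]string{"_acme-challenge.example.com": {"token"}})
	txts, err := c.LookupTXT("_acme-challenge.example.com")
	fmt.Println(txts, err)
}
```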
Updates the type of the ValidationAuthority's PerformValidation
method to be identical to that of the corresponding auto-generated
grpc method, i.e. directly taking and returning proto message
types, rather than exploded arguments.
This allows all logic to be removed from the VA wrappers, which
will allow them to be fully removed after the migration to proto3.
Also updates all tests and VA clients to adopt the new interface.
Depends on #4983 (do not review first four commits)
Part of #4956
A unit test is included to verify that a TLS-ALPN-01 challenge to
a TLS 1.3 only server doesn't succeed when the `GODEBUG` value to
disable TLS 1.3 in `docker-compose.yml` is set. Without this env var
the test fails on the Go 1.13 build because of the new default:
```
=== RUN TestTLSALPN01TLS13
--- FAIL: TestTLSALPN01TLS13 (0.04s)
tlsalpn_test.go:531: expected problem validating TLS-ALPN-01 challenge against a TLS 1.3 only server, got nil
FAIL
FAIL github.com/letsencrypt/boulder/va 0.065s
```
With the env var set the test passes, getting the expected connection
problem reporting a tls error:
```
=== RUN TestTLSALPN01TLS13
2019/09/13 18:59:00 http: TLS handshake error from 127.0.0.1:51240: tls: client offered only unsupported versions: [303 302 301]
--- PASS: TestTLSALPN01TLS13 (0.03s)
PASS
ok github.com/letsencrypt/boulder/va 1.054s
```
Since we plan to eventually enable TLS 1.3 support, and the `GODEBUG`
mechanism exercised by the above test is platform-wide rather than
package-specific, I decided it wasn't worth the time investment to write
a similar HTTP-01 unit test that verifies the TLS 1.3 behaviour on an
HTTP-01 HTTP->HTTPS redirect.
Resolves https://github.com/letsencrypt/boulder/issues/4415
This PR changes the VA to return the `dns` problem type when an HTTP-01
challenge targets a domain that has no IP addresses, or when looking up
its IP addresses fails.
The `va.getAddrs` function is internal to the VA and can return
`berrors.BoulderError`s with a DNS type when there is an error, allowing the
calling code to convert this to a problem when required
using an updated `detailedError` function. This avoids some clunky conversion
the HTTP-01 code was doing that misrepresented DNS level errors as connection
problems with a DNS detail message.
In order to add an integration test for challenge validation that results in
`getAddrs` DNS level errors the Boulder tools image had to be bumped to a tag
that includes the latest `pebble-challtestsrv` that
supports mocking SERVFAILs. It isn't possible to mock this case with internal IP
addresses because our VA test configuration does not filter internal addresses
to support the testing context.
Additionally this branch removes the `UnknownHostProblem` from the `probs`
package:
1. It isn't used anywhere after 532c210
2. It's not a real RFC 8555 problem type. We should/do use the
DNS type for this.
Resolves https://github.com/letsencrypt/boulder/issues/4407
Brings it more in line with the responses from the other two challenges, which
will hopefully make the challenge a lot easier to debug (as in the recent
community thread).
```json
"error": {
"type": "urn:ietf:params:acme:error:unauthorized",
"detail": "Incorrect validation certificate for tls-alpn-01 challenge. Expected acmeValidationV1 extension value 836bf5358f8a32826c61faeff2e0225b00756f935b00ed3002cabb9d536b9f53 for this challenge but got 8539b12e31c306b81a0aedab4128722c6ad71f71f46316a3c71612f47df0e532",
"status": 403
},
```
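A message of this kind could be produced by a comparison along these lines. This is a sketch under assumptions: RFC 8737 defines the extension value as the SHA-256 digest of the key authorization, the function receives that digest already unwrapped from its ASN.1 OCTET STRING, and the names `checkACMEIdentifier` and `digest` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// digest returns the SHA-256 digest of a key authorization string.
func digest(keyAuthorization string) []byte {
	sum := sha256.Sum256([]byte(keyAuthorization))
	return sum[:]
}

// checkACMEIdentifier compares the digest presented in the certificate's
// acmeIdentifier extension with the digest of the expected key
// authorization, and reports both values in hex when they differ.
func checkACMEIdentifier(expectedKeyAuthorization string, presented []byte) error {
	want := digest(expectedKeyAuthorization)
	if subtle.ConstantTimeCompare(want, presented) != 1 {
		return fmt.Errorf(
			"incorrect validation certificate for tls-alpn-01 challenge: "+
				"expected acmeValidationV1 extension value %x for this challenge but got %x",
			want, presented)
	}
	return nil
}

func main() {
	// The key authorization strings are placeholders, not real values.
	fmt.Println(checkACMEIdentifier("token.thumbprint", digest("token.thumbprint"))) // <nil>
	fmt.Println(checkACMEIdentifier("token.thumbprint", digest("other.thumbprint")))
}
```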
This will allow implementing sub-problems without creating a cyclic
dependency between `core` and `problems`.
The `identifier` package is somewhat small/single-purpose and in the
future we may want to move more "ACME" bits beyond the `identifier`
types into a dedicated package outside of `core`.
When I introduced the new HTTP-01 code I did it in `va/http.go` intending to try and make the very large `va.go` file a little bit smaller. This is the continuation of that work.
* f96ad92 - moves remaining HTTP-01 specific code to `va/http.go`.
* 1efb9a1 - moves TLS-ALPN-01 code into `va/tlsalpn.go`.
* 95ea567 - moves DNS-01 code into `va/dns.go`.
* 6ff0395 - moves unit tests from `va/va_test.go` into `va/http_test.go`, `va/tlsalpn_test.go` and `va/dns_test.go`.
In the end `va/va.go` contains code related to metrics, top level RPCs (e.g. `PerformValidation`), and the multi-VA code. This makes the file lengths much more manageable overall.
Note: There is certainly room for cleaning up some of the older unit test cruft from `va/va_test.go`. For now I only moved it as-is into the challenge specific test files.