Fix three unit tests which have been flakily failing for the last
several weeks:
//test/load-generator/acme: TestNew/unreachable_directory_URL
Fixed by changing the error checking code to care only about the
underlying "connection refused" message, and not the IP address from
which it was receieved.
//va: TestHTTPDialTimeout
Fixed by correcting the error checking code to look for "network is
unreachable" instead of "Network unreachable"
//va: TestFetchHTTP/Broken_IPv6_only
Fixed by making the expected error message more specific -- it was
previously looking for "Error getting validation data", which is the
message that `detailedError` gives for errors it doesn't recognize. An
underlying library has changed to provide an error type that
`detailedError` now recognizes as a connection error.
The core.Challenge.ProvidedKeyAuthorization field is problematic, both
because it is poorly named (which is admittedly easily fixable) and
because it is a field which we never expose to the client yet it is held
on a core type. Deprecate this field, and replace it with a new
vapb.PerformValidationRequest.ExpectedKeyAuthorization field.
Within the VA, this also simplifies the primary logic methods to just
take the expected key authorization, rather than taking a whole (largely
unnecessary) challenge object. This has large but wholly mechanical
knock-on effects on the unit tests.
While we're here, improve the documentation on core.Challenge itself,
and remove Challenge.URI, which was deprecated long ago and is wholly
unused.
Part of https://github.com/letsencrypt/boulder/issues/7514
When we present error messages containing IP addresses to end users,
sometimes we're presenting the IP at which we began a chain of
redirects, but not presenting the IP at which the final redirect was
located. This can make it difficult for client operators to identify
exactly which server in their infrastructure is misbehaving.
Change our IP error messages to reference the most-recently-tried
address from the set of all validation records, rather than the address
which started the current (potential) redirect chain.
Fixes https://github.com/letsencrypt/boulder/issues/7347
This allows us to defer creating the user-friendly ProblemDetails to the
highest level (va.PerformValidation), which in turn makes it possible to
log the original error alongside the user-friendly error. It also
reduces the likelihood of "boxed nil" bugs.
Many of the unittests check for a specific ProblemDetails.Type and
specific Details contents. These test against the output of
`detailedError`, which transforms `error` into `ProblemDetails`. So the
updates to the tests include insertion of `detailedError(err)` in many
places.
Several places that were returning a specific ProblemDetails.Type
instead return the corresponding `berrors` type. This follows a pattern
that `berrors` was designed to enable: use the `berrors` types
internally and transform into `ProblemDetails` at the edge where we are
rendering something to present to the user: WFE, and now VA.
Adds the chosen DNS resolver to the VAs `ValidationRecord` object so
that for each challenge type during a validation, boulder can audit log
the resolver(s) chosen to fulfill the request..
Fixes https://github.com/letsencrypt/boulder/issues/7140
Previously, `va.IsCAAValid` would only check CAA records from the
primary VA during initial domain control validation, completely ignoring
any configured RVAs. The upcoming
[MPIC](https://github.com/ryancdickson/staging/pull/8) ballot will
require that it be done from multiple perspectives. With the currently
deployed [Multi-Perspective
Validation](https://letsencrypt.org/2020/02/19/multi-perspective-validation.html)
in staging and production, this change brings us in line with the
[proposed phase
3](https://github.com/ryancdickson/staging/pull/8/files#r1368708684).
This change reuses the existing
[MaxRemoteValidationFailures](21fc191273/cmd/boulder-va/main.go (L35))
variable for the required non-corroboration quorum.
> Phase 3: June 15, 2025 - December 14, 2025 ("CAs MUST implement MPIC
in blocking mode*"):
>
> MUST implement MPIC? Yes
> Required quorum?: Minimally, 2 remote perspectives must be used. If
using less than 6 remote perspectives, 1 non-corroboration is allowed.
If using 6 or more remote perspectives, 2 non-corroborations are
allowed.
> MUST block issuance if quorum is not met: Yes.
> Geographic diversity requirements?: Perspectives must be 500km from 1)
the primary perspective and 2) all other perspectives used in the
quorum.
>
> * Note: "Blocking Mode" is a nickname. As opposed to "monitoring mode"
(described in the last milestone), CAs MUST NOT issue a certificate if
quorum requirements are not met from this point forward.
Adds new VA feature flags:
* `EnforceMultiCAA` instructs a primary VA to command each of its
configured RVAs to perform a CAA recheck.
* `MultiCAAFullResults` causes the primary VA to block waiting for all
RVA CAA recheck results to arrive.
Renamed `va.logRemoteValidationDifferentials` to
`va.logRemoteDifferentials` because it can handle initial domain control
validations and CAA rechecking with minimal editing.
Part of https://github.com/letsencrypt/boulder/issues/7061
This can take two values (typically the return values of a two-value
function) and panic if the error is non-nil, returning the interesting
value. This is particularly useful for cases where we statically know
the call will succeed.
Thanks to @mcpherrinm for the idea!
Removes the `Hostname` and `Port` fields from an http-01
ValidationRecord model prior to storing the record in the database.
Using `"hostname":"example.com","port":"80"` as a snippet of a whole
validation record, we'll save minimum 36 bytes for each new http-01
ValidationRecord that gets stored. When retrieving the record, the
ValidationRecord `RehydrateHostPort` method will repopulate the
`Hostname` and `Port` fields from the `URL` field.
Fixes the main goal of
https://github.com/letsencrypt/boulder/issues/5231.
---------
Co-authored-by: Samantha <hello@entropy.cat>
Occasionally (and just now) I've responded to an issue or thread that
involves this error message:
> The key authorization file from the server did not match this
challenge
"LoqXcYV8q5ONbJQxbmR7SCTNo3tiAXDfowyjxAjEuX0.9jg46WB3rR_AHD-EBXdN7cBkH1WOu0tA3M9fm21mqTI"
!= "\xef\xffAABBCC
and I've found myself looking at Boulder's source code, to check which
way around the values are. I suspect that users are not understanding it
either.
Make minor, non-user-visible changes to how we structure the probs
package. Notably:
- Add new problem types for UnsupportedContact and
UnsupportedIdentifier, which are specified by RFC8555 and which we will
use in the future, but haven't been using historically.
- Sort the problem types and constructor functions to match the
(alphabetical) order given in RFC8555.
- Rename some of the constructor functions to better match their
underlying problem types (e.g. "TLSError" to just "TLS").
- Replace the redundant ProblemDetailsToStatusCode function with simply
always returning a 500 if we haven't properly set the problem's
HTTPStatus.
- Remove the ability to use either the V1 or V2 error namespace prefix;
always use the proper RFC namespace prefix.
Compute "started" and "took" variables inside the retry loop, so that we're
actually measuring how long the request which timed out took, rather than
measuring how long the request which timed out plus all the previous "network
unreachable" attempts took.
For consistency, put the error field at the end of unstructured log
lines to make them more ... structured.
Adds the `issuerID` field to "orphaning certificate" log line in the CA
to match the "orphaning precertificate" log line.
Fixes broken tests as a result of the CA and bdns log line change.
Fixes#5457
This avoids serialization errors passing through gRPC.
Also, add a pass-through path in replaceInvalidUTF8 that saves an
allocation in the trivial case.
Fixes#6490
Clean up several spots where we were behaving differently on
go1.18 and go1.19, now that we're using go1.19 everywhere. Also
re-enable the lint and generate tests, and fix the various places where
the two versions disagreed on how comments should be formatted.
Also clean up the OldTLS codepaths, now that both go1.19 and our
own feature flags have forbidden TLS < 1.2 everywhere.
Fixes#6011
Enable the "unparam" linter, which checks for unused function
parameters, unused function return values, and parameters and
return values that always have the same value every time they
are used.
In addition, fix many instances where the unparam linter complains
about our existing codebase. Remove error return values from a
number of functions that never return an error, remove or use
context and test parameters that were previously unused, and
simplify a number of (mostly test-only) functions that always take the
same value for their parameter. Most notably, remove the ability to
customize the RSA Public Exponent from the ceremony tooling,
since it should always be 65537 anyway.
Fixes#6104
Run the Boulder unit and integration tests with go1.19.
In addition, make a few small changes to allow both sets of
tests to run side-by-side. Mark a few tests, including our lints
and generate checks, as go1.18-only. Reformat a few doc
comments, particularly lists, to abide by go1.19's stricter gofmt.
Causes #6275
The iotuil package has been deprecated since go1.16; the various
functions it provided now exist in the os and io packages. Replace all
instances of ioutil with either io or os, as appropriate.
These new linters are almost all part of golangci-lint's collection
of default linters, that would all be running if we weren't setting
`disable-all: true`. By adding them, we now have parity with the
default configuration, as well as the additional linters we like.
Adds the following linters:
* unconvert
* deadcode
* structcheck
* typecheck
* varcheck
* wastedassign
This adds three features flags: SHA1CSRs, OldTLSOutbound, and
OldTLSInbound. Each controls the behavior of an upcoming deprecation
(except OldTLSInbound, which isn't yet scheduled for a deprecation
but will be soon). Note that these feature flags take advantage of
`features`' default values, so they can default to "true" (that is, each
of these features is enabled by default), and we set them to "false"
in the config JSON to turn them off when the time comes.
The unittest for OldTLSOutbound requires that `example.com` resolves
to 127.0.0.1. This is because there's logic in the VA that checks
that redirected-to hosts end in an IANA TLD. The unittest relies on
redirecting, and we can't use e.g. `localhost` in it because of that
TLD check, so we use example.com.
Fixes#6036 and #6037
Make minor updates to our implementation of the HTTP-01 validation method based
on in-depth review of BRs Section 3.2.2.4.19 and RFC 8555 Section 8.3.
- Move the HTTP response code check above parsing the body.
- Explicitly check for 301, 302, 307, and 308 redirect codes, so that if the go
stdlib updates to allow additional redirects we don't follow suit.
- Trim additional forms of white-space from the key authorization.
These log lines are sometimes useful for debugging, but are a normal
part of operation, not an error: Unbound will allow a response to
timeout if the remote server is too slow.
[Go style says](https://blog.golang.org/package-names):
> Avoid stutter. Since client code uses the package name as a prefix
> when referring to the package contents, the names for those contents
> need not repeat the package name. The HTTP server provided by the
> http package is called Server, not HTTPServer. Client code refers to
> this type as http.Server, so there is no ambiguity.
Rename DNSClient, DNSClientImpl, NewDNSClientImpl,
NewTestDNSClientImpl, DNSError, and MockDNSClient to follow those
guidelines.
Unexport DNSClientImpl and MockTimeoutError (was only used internally).
Make New and NewTest return the Client interface rather than a concrete
`impl` type.
Currently the VA checks to see how many redirects have been followed and
bails out if greater than maxRedirect (10), but it does not check to see
if any redirect url has been followed twice which would mean a broken
infinite redirect loop. Storing the validation records for these is
relatively expensive because we store a record for each hop in the
redirect.
This change checks the previous redirect records to see if the URL has
been used before and error if it has. This will catch a redirect loop
earlier than the maxRedirect value in most cases.
Fixes#5224
Updates the type of the ValidationAuthority's PerformValidation
method to be identical to that of the corresponding auto-generated
grpc method, i.e. directly taking and returning proto message
types, rather than exploded arguments.
This allows all logic to be removed from the VA wrappers, which
will allow them to be fully removed after the migration to proto3.
Also updates all tests and VA clients to adopt the new interface.
Depends on #4983 (do not review first four commits)
Part of #4956
In #3708, we added formatters for the the convenience methods in the
`probs` package.
However, in #4783, @alexzorin pointed out that we were incorrectly
passing an error message through fmt.Sprintf as the format parameter
rather than as a value parameter.
I proposed a fix in #4784, but during code review we concluded that the
underlying problem was the pattern of using format-style functions that
don't have some variant of printf in the name. That makes this wrong:
`probs.DNS(err.Error())`, and this right: `probs.DNS("%s", err)`. Since
that's an easy mistake to make and a hard one to spot during code review,
we're going to stop using this particular pattern and call `fmt.Sprintf`
directly.
This PR reverts #3708 and adds some `fmt.Sprintf` where needed.
This can happen when a misconfiguration redirects a certain path to
itself, doubled. After 10 redirects the error message can get quite
long. Instead we halt things at 2000 bytes, which should be more than
enough.
We've found we need the context offered from logging the error closer to when it
happens in the `bdns` package rather than in the `va`. Adopting the function
requires adapting it slightly. Specifically in the new location we know it won't
be called with any timeout results, with a non-dns error, or with a nil
underlying error.
Having the logging done in `bdns` (and specifically from `exchangeOne`) also
lets us log the wire format of the query and response when we get a `dns.ErrId`
error indicating a query/response ID mismatch. A small unit test is included
that ensures the logging happens as expected.
In case it proves useful for matching against other metrics the DNS ID mismatch
error case also now increments a dedicated prometheus counter vector stat,
`dns_id_mismatch`. The stat is labelled by resolver and query type.
Resolves https://github.com/letsencrypt/boulder/issues/4532
When we get a DNS error that has an internal cause (like connection
refused), we return a generic message like "networking error" to the
user to avoid revealing details that would be confusing. However, when
debugging problems with our own services, it's useful to have the
underlying errors.
This adds a helper method in the VA and calls it from each place we use
DNS errors.
This PR changes the VA to return `dns` problem type for errors when performing
HTTP-01 challenges for domains that have no IP addresses, or errors looking up
the IP addresses.
The `va.getAddrs` function is internal to the VA and can return
`berrors.BoulderError`s with a DNS type when there is an error, allowing the
calling code to convert this to a problem when required
using an updated `detailedError` function. This avoids some clunky conversion
the HTTP-01 code was doing that misrepresented DNS level errors as connection
problems with a DNS detail message.
In order to add an integration test for challenge validation that results in
`getAddrs` DNS level errors the Boulder tools image had to be bumped to a tag
that includes the latest `pebble-challtestsrv` that
supports mocking SERVFAILs. It isn't possible to mock this case with internal IP
addresses because our VA test configuration does not filter internal addresses
to support the testing context.
Additionally this branch removes the `UnknownHostProblem` from the `probs`
package:
1. It isn't used anywhere after 532c210
2. It's not a real RFC 8555 problem type. We should/do use the
DNS type for this.
Resolves https://github.com/letsencrypt/boulder/issues/4407
This will allow implementing sub-problems without creating a cyclic
dependency between `core` and `problems`.
The `identifier` package is somewhat small/single-purpose and in the
future we may want to move more "ACME" bits beyond the `identifier`
types into a dedicated package outside of `core`.
Often folks will mis-configure their webserver to send an HTTP redirect missing
a `/' between the FQDN and the path.
E.g. in Apache using:
Redirect / https://bad-redirect.org
Instead of:
Redirect / https://bad-redirect.org/
Will produce an invalid HTTP-01 redirect target like:
https://bad-redirect.org.well-known/acme-challenge/xxxx
This happens frequently enough we want to return a distinct error message
for this case by detecting the redirect targets ending in ".well-known".
After the "Simple HTTP-01" code landed this case was previously getting an error
message of the form:
> "Invalid hostname in redirect target, must end in IANA registered TLD"
Resolves https://github.com/letsencrypt/boulder/issues/3606
When I introduced the new HTTP-01 code I did it in `va/http.go` intending to try and make the very large `va.go` file a little bit smaller. This is the continuation of that work.
* f96ad92 - moves remaining HTTP-01 specific code to `va/http.go`.
* 1efb9a1 - moves TLS-ALPN-01 code into `va/tlsalpn.go`.
* 95ea567 - moves DNS-01 code into `va/dns.go`.
* 6ff0395 - moves unit tests from `va/va_test.go` into `va/http_test.go`, `va/tlsalpn_test.go` and `va/dns_test.go`.
In the end `va/va.go` contains code related to metrics, top level RPCs (e.g. `PerformValidation`), and the multi-VA code. This makes the file lengths much more manageable overall.
Note: There is certainly room for cleaning up some of the older unit test cruft from `va/va_test.go`. For now I only moved it as-is into the challenge specific test files.
* `EnforceMultiVA` to allow configuring multiple VAs but not changing the primary VA's result based on what the remote VAs return.
* `MultiVAFullResults` to allow collecting all of the remote VA results. When all results are collected a JSON log line with the differential between the primary/remote VAs is logged.
Resolves https://github.com/letsencrypt/boulder/issues/4066
* A redirect without a hostname is obviously bad and should get
a distinct error message as early as possible.
* A redirect to a hostname that doesn't end in an IANA registered TLD is
also obviously bad and should get a distinct error message as early as
possible.
We're only using the simplified HTTP-01 code from `va/http.go` now 🎉 The old unit tests that still seem relevant are left in place in `va/va_test.go` instead of being moved to `va/http_test.go` to signal that they're a bit crufty and could probably use a separate cleanup. For now I'm hesitant to remove test coverage so I updated them in-place without moving them to a new home.
Resolves https://github.com/letsencrypt/boulder/issues/4089
The `singleDialTimeout` field was previously a global `const` in the
`va` package. Making it a field of the VA impl (and the dialer structs)
makes it easier to test that it is working as expected with a smaller
than normal value.
A new `TestPreresolvedDialerTimeout` unit test is added that tests the
fix from https://github.com/letsencrypt/boulder/pull/4046
Without the fix applied:
```
=== RUN TestPreresolvedDialerTimeout
--- FAIL: TestPreresolvedDialerTimeout (0.49s)
http_test.go:86: fetch didn't timeout after 50ms
FAIL
FAIL github.com/letsencrypt/boulder/va 0.512s
```
With the fix applied:
```
=== RUN TestPreresolvedDialerTimeout
--- PASS: TestPreresolvedDialerTimeout (0.05s)
PASS
ok github.com/letsencrypt/boulder/va 1.075s
```
The URL construction approach we were previously using for the refactored VA HTTP-01 validation code was nice but broke SNI for HTTP->HTTPS redirects. In order to preserve this functionality we need to use a custom `DialContext` handler on the HTTP Transport that overrides the target host to use a pre-resolved IP.
Resolves https://github.com/letsencrypt/boulder/issues/3969
When the `SimplifiedVAHTTP01` feature flag is enabled we need to
preserve query parameters when reconstructing a redirect URL for the
resolved IP address.
To add integration testing for this condition the Boulder tools images
are updated to in turn pull in an updated `pebble-challtestsrv` command
that tracks request history.
A new Python wrapper for the `pebble-challtestsrv` HTTP API is added to
centralize interacting with the chall test srv to add mock data and to
get the history of HTTP requests that have been processed.
The VA code activated by the `SimplifiedHTTP01` flag had a regression
that constructed validation URLs for HTTP-01 challenge redirects to
a HTTPS host incorrectly.
The regression is fixed and the unit tests are updated to cover this
case. An integration test is not included in this commit but will be
done as follow-up work since it requires adjusting existing integration
tests for HTTP-01 to use the challtestrv instead of Certbot's ACME
standalone server.
Continued bugs from the custom dialer approach used by the VA for HTTP-01 (most recently https://github.com/letsencrypt/boulder/issues/3889) motivated a rewrite.
Instead of using a custom dialer to be able to control DNS resolution for HTTP validation requests we can construct URLs for the IP addresses we resolve and overload the Host header. This avoids having to do address resolution within the dialer and eliminates the complexity of the dialer `addrInfoChan`. The only thing left for our custom dialer now is to shave some time off of the provided context to help us discern timeouts before/after connect.
The existing IP preference & fallback behaviour is preserved: e.g. if a host has both IPv6 and IPv4 addresses we connect to the first IPv6 address. If there is a network error connecting to that address (e.g. an error during "dial"), we try once more with the first IPv4 address. No other retries are done. Matching existing behaviour no fallback is done for HTTP level failures on an IPv6 address (e.g. mismatched webroots, redirect loops, etc). A new Prometheus counter "http01_fallbacks" is used to keep track of the number of fallbacks performed.
As a result of moving the layer at which the retry happens a fallback like described above will now produce two validation records: one for the initial IPv6 connection, and one for the IPv4 connection. Neither will have the "addressesTried" field populated, just "addressesResolved" and "addressUsed". Previously with the dialer doing the retry we would have created just one validation record with an IPv4 "addressUsed" field and both an IPv6 and IPv4 address in the "addressesTried" field.
Because this is a big diff for a key part of the VA the new code is gated by the `SimplifiedVAHTTP` feature flag.
Resolves#3889