We no longer need a code path to resolve reserved IP addresses during
integration tests.
Move to a public IP for the remaining tests, after #8187 did so for many
of them.
Depends on #8187
Move `IsReservedIP` and its supporting vars from `bdns` to `policy`.
Rewrite `IsReservedIP` to:
* Use `netip` because `netip.Prefix` can be used as a map key, allowing
us to define prefix lists more elegantly. This will enable future work
to import prefix lists from IANA's primary source data.
* Return an error including the reserved network's name.
Refactor `IsReservedIP` tests to be table-based.
Fixes#8040
Plumb the userAgent field, used to set http-01 User-Agent headers, from
va/rva configuration through to where User-Agent headers can be set for
DoH queries. Use integration tests to validate that the User-Agent is
set for http-01 challenges, dns-01 challenges over DoH, and CAA checks
over DoH.
Fixes#7963.
Adds the chosen DNS resolver to the VAs `ValidationRecord` object so
that for each challenge type during a validation, boulder can audit log
the resolver(s) chosen to fulfill the request..
Fixes https://github.com/letsencrypt/boulder/issues/7140
Change the CAA NXDOMAIN carve-out to only apply to registered domains
and their subdomains, not to TLDs.
Our CAA lookup function has a carveout that allows queries which receive
an NXDOMAIN response to be treated as though they received a successful
empty response. This is important due to the confluence of three
circumstances:
1) many clients use the DNS-01 method to validate issuance for names
which generally don't have publicly-visible A/AAAA records;
2) many ACME clients remove their DNS-01 TXT record promptly after
validation has completed; and
3) authorizations may be reused more than 8 hours after they were first
validated and CAA was first checked.
When these circumstances combine, the DNS rightly returns NXDOMAIN when
we re-check CAA at issuance time, because no records exist at all for
that name. We have to treat this as permission to issue, the same as any
other domain which has no CAA records.
However, this should never be the case for TLDs: those should always
have at least one record. If a TLD returns NXDOMAIN, something much
worse has happened -- such as a gTLD being unlisted by ICANN -- and we
should treat it as a failure.
This change adds a check that the name in question contains at least one
dot (".") before triggering the existing carve-out, to make it so that
the carve-out does not apply to TLDs.
Fixes https://github.com/letsencrypt/boulder/issues/7056
A recent refactoring (https://github.com/letsencrypt/boulder/pull/6906)
started treating NXDOMAIN for a CAA lookup as a hard error, when it
should be treated (from Boulder's point of view) as meaning there is an
empty list of resource records.
If the resolver provides EDE (https://www.rfc-editor.org/rfc/rfc8914),
Boulder will automatically expose it in the error message. Note that
most error messages contain the error RCODE (NXDOMAIN, SERVFAIL, etc),
when there is EDE present we omit it in the interest of brevity. In
practice it will almost always be SERVFAIL, and the extended error
information is more informative anyhow.
This will have no effect in production until we configure Unbound to
enable EDE.
Fixes#6875.
---------
Co-authored-by: Matthew McPherrin <mattm@letsencrypt.org>
Remove the port component of the resolver IP:port. Each of our unbounds
serves on multiple ports, and we talk to all of those ports, to increase
the entropy of our UDP query packets and reduce the chances of a
spurious ID mismatch error. But since all of those ports are running on
the same server, they are not worth distinguishing for metrics purposes.
Remove the authenticated_data label from the query time histograms. We
don't use this.
Part of #6142
These new linters are almost all part of golangci-lint's collection
of default linters, that would all be running if we weren't setting
`disable-all: true`. By adding them, we now have parity with the
default configuration, as well as the additional linters we like.
Adds the following linters:
* unconvert
* deadcode
* structcheck
* typecheck
* varcheck
* wastedassign
Running an older version (v0.0.1-2020.1.4) of `staticcheck` in
whole-program mode (`staticcheck --unused.whole-program=true -- ./...`)
finds various instances of unused code which don't normally show up
as CI issues. I've used this to find and remove a large chunk of the
unused code, to pave the way for additional large deletions accompanying
the WFE1 removal.
Part of #5681
When we query DNS for a host, and both the A and AAAA lookups fail or
are empty, combine both errors into a single error rather than only
returning the error from the A lookup.
Fixes#5819Fixes#5319
- Add function `validateServerAddress()` to `bdns/servers.go` which ensures that
DNS server addresses are TCP/ UDP dial-able per: https://golang.org/src/net/dial.go?#L281
- Add unit test for `validateServerAddress()` in `bdns/servers_test.go`
- Update `cmd/boulder-va/main.go` to handle `bdns.NewStaticProvider()`
potentially returning an error.
- Update unit tests in `bdns/dns_test.go`:
- Handle `bdns.NewStaticProvider()` potentially returning an error
- Add an IPv6 address to `TestRotateServerOnErr`
- Ensure DNS server addresses are validated by `validateServerAddress` whenever:
- `dynamicProvider.update() is called`
- `staticProvider` is constructed
- Construct server addresses using `net.JoinHostPost()` when
`dynamicProvider.Addrs()` is called
Fixes#5463
Abstract out the way that the bdns library keeps track of the
resolvers it uses to do DNS lookups. Create one implementation,
the `StaticProvider`, which behaves exactly the same as the old
mechanism (providing whatever names or addresses were given
in the config). Create another implementation, `DynamicProvider`,
which re-resolves the provided name on a regular basis.
The dynamic provider consumes a single name, does a lookup
on that name for any SRV records suggesting that it is running a
DNS service, and then looks up A records to get the address of
all the names returned by the SRV query. It exports its successes
and failures as a prometheus metric.
Finally, update the tests and config-next configs to work with
this new mechanism. Give sd-test-srv the capability to respond
to SRV queries, and put the names it provides into docker's
default DNS resolver.
Fixes#5306
Under normal circumstances, I believe we should never have cause to
return a cancellation-related error to the user. This change should
distinguish that case in the logs so we can look for it. If it turns out
we do sometimes return cancellation-related errors to the user, we
should do further digging and figure out why.
Related #5346
Update all of our tests to use `AssertMetricWithLabelsEquals`
instead of combinations of the older `CountFoo` helpers with
simple asserts. This coalesces all of our prometheus inspection
logic into a single function, allowing the deletion of four separate
helper functions.
[Go style says](https://blog.golang.org/package-names):
> Avoid stutter. Since client code uses the package name as a prefix
> when referring to the package contents, the names for those contents
> need not repeat the package name. The HTTP server provided by the
> http package is called Server, not HTTPServer. Client code refers to
> this type as http.Server, so there is no ambiguity.
Rename DNSClient, DNSClientImpl, NewDNSClientImpl,
NewTestDNSClientImpl, DNSError, and MockDNSClient to follow those
guidelines.
Unexport DNSClientImpl and MockTimeoutError (was only used internally).
Make New and NewTest return the Client interface rather than a concrete
`impl` type.
When the VA encounters CAA records, it logs the contents of those
records. When those records were the result of following a chain of
CNAMEs, the CNAMEs are included as part of the response from our
recursive resolver. However, the current flow for logging the responses
logs only the CAA records, not the CNAMEs.
This change returns the complete dig-style RR response from bdns to the
va where the response of the authoritative CAA RR is string-quoted and
logged.
This dig-style RR response is quite verbose, however it is only ever
returned from bdns.LookupCAA when a CAA response is non-empty. If the CAA
response is empty only an empty string is returned.
Fixes#5082
This change adds two new test assertion helpers, `AssertErrorIs`
and `AssertErrorWraps`. The former is a wrapper around `errors.Is`,
and asserts that the error's wrapping chain contains a specific (i.e.
singleton) error. The latter is a wrapper around `errors.As`, and
asserts that the error's wrapping chain contains any error which is
of the given type; it also has the same unwrapping side effect as
`errors.As`, which can be useful for further assertions about the
contents of the error.
It also makes two small changes to our `berrors` package, namely
making `berrors.ErrorType` itself an error rather than just an int,
and giving `berrors.BoulderError` an `Unwrap()` method which
exposes that inner `ErrorType`. This allows us to use the two new
helpers above to make assertions about berrors, rather than
having to hand-roll equality assertions about their types.
Finally, it takes advantage of the two changes above to greatly
simplify many of the assertions in our tests, removing conditional
checks and replacing them with simple assertions.
errors.As checks for a specific error in a wrapped error chain
(see https://golang.org/pkg/errors/#As) as opposed to asserting
that an error is of a specific type
Part of #5010
In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back.
There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case, dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change.
Fixes#4591.
We've found we need the context offered from logging the error closer to when it
happens in the `bdns` package rather than in the `va`. Adopting the function
requires adapting it slightly. Specifically in the new location we know it won't
be called with any timeout results, with a non-dns error, or with a nil
underlying error.
Having the logging done in `bdns` (and specifically from `exchangeOne`) also
lets us log the wire format of the query and response when we get a `dns.ErrId`
error indicating a query/response ID mismatch. A small unit test is included
that ensures the logging happens as expected.
In case it proves useful for matching against other metrics the DNS ID mismatch
error case also now increments a dedicated prometheus counter vector stat,
`dns_id_mismatch`. The stat is labelled by resolver and query type.
Resolves https://github.com/letsencrypt/boulder/issues/4532
When a retryable error occurs and there are multiple DNS servers configured it is prudent to change servers before retrying the query. This helps ensure that one dead DNS server won't result in queries failing.
Resolves https://github.com/letsencrypt/boulder/issues/3846
Distinguish between deadline exceeded vs canceled. Also, combine those
two cases with "out of retries" into a single stat with a label
determining type.
We used to use TCP because we would request DNSSEC records from Unbound, and
they would always cause truncated records when present. Now that we no longer
request those (#2718), we can use UDP. This is better because the TCP serving
paths in Unbound are likely less thoroughly tested, and not optimized for high
load. In particular this may resolve some availability problems we've seen
recently when trying to upgrade to a more recent Unbound.
Note that this only affects the Boulder->Unbound path. The Unbound->upstream
path is already UDP by default (with TCP fallback for truncated ANSWERs).
Since the legacy CAA spec does the wrong thing with DNAMEs (treating them as
CNAMEs), and it's hard to reconcile this approach with CNAME handling, and
DNAMEs are extremely rare, reject outright any CAA responses containing DNAMEs.
Also, in the process, fix a bug in the previous LegacyCAA implementation.
Because the processing of records in LookupCAA was gated by `if
answer.Header().RRType == dnsType`, non-CAA responses were filtered out. This
wasn't caught by previous testing, because it was unittesting that mocked out
bdns.
This implements the pre-erratum 5065 version of CAA, behind a feature flag.
This involved refactoring DNSClient.LookupCAA to return a list of CNAMEs in addition to the CAA records, and adding an alternate lookuper that does tree-climbing on single-depth aliases.
Add a logging statement that fires when a remote VA fail causes
overall failure. Also change remoteValidationFailures into a
counter that counts the same thing, instead of a histogram. Since
the histogram had the default bucket sizes, it failed to collect
what we needed, and produced more metrics than necessary.
Fixes#639.
This resolves something that has bugged me for two+ years, our DNSResolverImpl is not a DNS resolver, it is a DNS client. This change just makes that obvious.
The LookupIPv6 flag has been enabled in production and isn't required anymore. This PR removes the flag entirely.
The errA and errAAAA error handling in LookupHost is left as-is, meaning that a non-nil errAAAA will not be returned to the caller. This matches the existing behaviour, and the expectations of the TestDNSLookupHost unit tests.
This commit also removes the tests from TestDNSLookupHost that tested the LookupIPv6 == false behaviours since those are no longer implemented.
Resolves#2191
This PR, makes testing the bdns package more reliable. A race condition in TestMain was resulting in the test running before the test dns server had started. This is fixed by actively polling for the DNS server to be ready before starting the test suite.
Furthermore, a 1 millisecond server read/write timeout was proving to time out on occasion. This is fixed increasing to a 1 second read/write timeout to increase test reliability.
FYI: ran package bdns tests 1000 times with 22 failures previously, after this PR ran 1000 times with 0 failures.
fixes#1317