When a retryable error occurs and there are multiple DNS servers configured it is prudent to change servers before retrying the query. This helps ensure that one dead DNS server won't result in queries failing.
Resolves https://github.com/letsencrypt/boulder/issues/3846
Remove various unnecessary uses of fmt.Sprintf - in particular:
- Avoid calls like t.Error(fmt.Sprintf(...)), where t.Errorf can be used directly.
- Use strconv when converting an integer to a string, rather than using
fmt.Sprintf("%d", ...). This is simpler and can also detect type errors at
compile time.
- Instead of using x.Write([]byte(fmt.Sprintf(...))), use fmt.Fprintf(x, ...).
Distinguish between deadline exceeded vs canceled. Also, combine those
two cases with "out of retries" into a single stat with a label
determining type.
Also instead of repeating the same bucket definitions everywhere just use a single top level var in the metrics package in order to discourage copy/pasting.
Fixes#3607.
We used to use TCP because we would request DNSSEC records from Unbound, and
they would always cause truncated records when present. Now that we no longer
request those (#2718), we can use UDP. This is better because the TCP serving
paths in Unbound are likely less thoroughly tested, and not optimized for high
load. In particular this may resolve some availability problems we've seen
recently when trying to upgrade to a more recent Unbound.
Note that this only affects the Boulder->Unbound path. The Unbound->upstream
path is already UDP by default (with TCP fallback for truncated ANSWERs).
Since the legacy CAA spec does the wrong thing with DNAMEs (treating them as
CNAMEs), and it's hard to reconcile this approach with CNAME handling, and
DNAMEs are extremely rare, reject outright any CAA responses containing DNAMEs.
Also, in the process, fix a bug in the previous LegacyCAA implementation.
Because the processing of records in LookupCAA was gated by `if
answer.Header().RRType == dnsType`, non-CAA responses were filtered out. This
wasn't caught by previous testing, because it was unittesting that mocked out
bdns.
This implements the pre-erratum 5065 version of CAA, behind a feature flag.
This involved refactoring DNSClient.LookupCAA to return a list of CNAMEs in addition to the CAA records, and adding an alternate lookuper that does tree-climbing on single-depth aliases.
Fixes#639.
This resolves something that has bugged me for two+ years, our DNSResolverImpl is not a DNS resolver, it is a DNS client. This change just makes that obvious.
I'm interested in seeing both how often DNS responses we see are signed (mainly for CAA, but
also interested in other query types). This change adds a new counter, `Authenticated`, that can
be compared against the `Successes` counter to find the percentage of signed responses we see.
The counter is incremented if the `msg.AuthenticatedData` bit is set by the upstream resolver.
Remove `SetEdns0` call in `bdns.exchangeOne`. Since we talk over TCP to the production
resolver and we don't do any local validation of DNSSEC records adding the EDNS0 OPT
record is pointless and confusing. Testing against a local `unbound` instance shows you
don't need to set the DO bit for DNSSEC requests/validation to be done at the resolver
level.
The LookupIPv6 flag has been enabled in production and isn't required anymore. This PR removes the flag entirely.
The errA and errAAAA error handling in LookupHost is left as-is, meaning that a non-nil errAAAA will not be returned to the caller. This matches the existing behaviour, and the expectations of the TestDNSLookupHost unit tests.
This commit also removes the tests from TestDNSLookupHost that tested the LookupIPv6 == false behaviours since those are no longer implemented.
Resolves#2191
Updates #1699.
Adds a new package, `features`, which exposes methods to set and check if various internal features are enabled. The implementation uses global state to store the features so that services embedded in another service do not each require their own features map in order to check if something is enabled.
Requires a `boulder-tools` image update to include `golang.org/x/tools/cmd/stringer`.
exchangeOne used a deferd method which contained a expression as a argument. Because of how defer works the arguments where evaluated immediately (unlike the method) causing the total latency to always be the same.
* Split out CAA checking service (minus logging etc)
* Add example.yml config + follow general Boulder style
* Update protobuf package to correct version
* Add grpc client to va
* Add TLS authentication in both directions for CAA client/server
* Remove go lint check
* Add bcodes package listing custom codes for Boulder
* Add very basic (pull-only) gRPC metrics to VA + caa-service
Previously we would return a detailed errorString, which ProblemDetailsFromDNSError
would turn into a generic, uninformative "Server failure at resolver".
Now we return a new internal dnsError type, which ProblemDetailsFromDNSError can
turn into a more informative message to be shown to the user.
This provides a means to add retries to DNS look ups, and, with some
future work, end retries early if our request deadline is blown. That
future work is tagged with #1292.
Updates #1258
This moves the RTT metrics calculation inside of the DNSResolver. This
cleans up code in the RA and VA and makes some adding retries to the
DNSResolver less ugly to do.
Note: this will put `Rate` and `RTT` after the name of DNS query
type (`A`, `MX`, etc.). I think that's fine and desirable. We aren't
using this data in alerts or many dashboards, yet, so a flag day is
okay.
Fixes#1124
Moves the DNS code from core to dns and renames the dns package to bdns
to be clearer.
Fixes#1260 and will be good to have while we add retries and such.