Since the switch to DoH, when a query from Boulder to Unbound times out
we get a generic "server failure at resolver" error message. This
updates `bdns.Error` so it returns "query timed out" instead, which is
more informative.
Co-authored-by: Samantha Frank <hello@entropy.cat>
If the resolver provides EDE (https://www.rfc-editor.org/rfc/rfc8914),
Boulder will automatically expose it in the error message. Note that
most error messages contain the error RCODE (NXDOMAIN, SERVFAIL, etc),
when there is EDE present we omit it in the interest of brevity. In
practice it will almost always be SERVFAIL, and the extended error
information is more informative anyhow.
This will have no effect in production until we configure Unbound to
enable EDE.
Fixes#6875.
---------
Co-authored-by: Matthew McPherrin <mattm@letsencrypt.org>
Under normal circumstances, I believe we should never have cause to
return a cancellation-related error to the user. This change should
distinguish that case in the logs so we can look for it. If it turns out
we do sometimes return cancellation-related errors to the user, we
should do further digging and figure out why.
Related #5346
[Go style says](https://blog.golang.org/package-names):
> Avoid stutter. Since client code uses the package name as a prefix
> when referring to the package contents, the names for those contents
> need not repeat the package name. The HTTP server provided by the
> http package is called Server, not HTTPServer. Client code refers to
> this type as http.Server, so there is no ambiguity.
Rename DNSClient, DNSClientImpl, NewDNSClientImpl,
NewTestDNSClientImpl, DNSError, and MockDNSClient to follow those
guidelines.
Unexport DNSClientImpl and MockTimeoutError (was only used internally).
Make New and NewTest return the Client interface rather than a concrete
`impl` type.
When we get a DNS error that has an internal cause (like connection
refused), we return a generic message like "networking error" to the
user to avoid revealing details that would be confusing. However, when
debugging problems with our own services, it's useful to have the
underlying errors.
This adds a helper method in the VA and calls it from each place we use
DNS errors.
This patch removes all usages of the `core.XXXError` and almost all usages of `probs` outside of the WFE and VA and replaces them with a unified internal error type. Since the VA uses `probs.ProblemDetails` quite extensively in challenges, and currently stores them in the DB I've saved this change for another change (it'll also require a migration). Since `ProblemDetails` should only ever be exposed to end-users all of its related logic should be moved into the `WFE` but since it still needs to be exposed to the VA and SA I've left it in place for now.
The new internal `errors` package offers the same convenience functions as `probs` does as well as a new simpler type testing method. A few small changes have also been made to error messages, mainly adding the library and function name to internal server errors for easier debugging (i.e. where a number of functions return the exact same errors and there is no other way to distinguish which method threw the error).
Also adds proper encoding of internal errors transferred over gRPC (the current encoding scheme is kept for `core` and `probs` errors since it'll be ideally be removed after we deploy this and follow-up changes) using `grpc/metadata` instead of the gRPC status codes.
Fixes#2507. Updates #2254 and #2505.
This PR reworks the validateEmail() function from the RA to allow timeouts during DNS validation of MX/A/AAAA records for an email to be non-fatal and match our intention to verify emails best-effort.
Notes:
bdns/problem.go - DNSError.Timeout() was changed to also include context cancellation and timeout as DNS timeouts. This matches what DNSError.Error() was doing to set the error message and supports external callers to Timeout not duplicating the work.
bdns/mocks.go - the LookupMX mock was changed to support always.error and always.timeout in a manner similar to the LookupHost mock. Otherwise the TestValidateEmail unit test for the RA would fail when the MX lookup completed before the Host lookup because the error wouldn't be correct (empty DNS records vs a timeout or network error).
test/config/ra.json, test/config-next/ra.json - the dnsTries and dnsTimeout values were updated such that dnsTries * dnsTimeout was <= the WFE->RA RPC timeout (currently 15s in the test configs). This allows the dns lookups to all timeout without the overall RPC timing out.
Resolves#2260.
Several of the `ProblemType`s had convenience functions to instantiate `ProblemDetails`s using their type and a detail message. Where these existed I did a quick scan of the codebase to convert places where callers were explicitly constructing the `ProblemDetails` to use the convenience function.
For the `ProblemType`s that did not have such a function, I created one and then converted callers to use it.
Solves #1837.
When a CAA request to Unbound times out, fall back to checking CAA via Google Public DNS' HTTPS API, through multiple proxies so as to hit geographically distributed paths. All successful multipath responses must be identical in order to succeed, and at most one can fail.
Fixes#1618
* Split out CAA checking service (minus logging etc)
* Add example.yml config + follow general Boulder style
* Update protobuf package to correct version
* Add grpc client to va
* Add TLS authentication in both directions for CAA client/server
* Remove go lint check
* Add bcodes package listing custom codes for Boulder
* Add very basic (pull-only) gRPC metrics to VA + caa-service
Previously we would return a detailed errorString, which ProblemDetailsFromDNSError
would turn into a generic, uninformative "Server failure at resolver".
Now we return a new internal dnsError type, which ProblemDetailsFromDNSError can
turn into a more informative message to be shown to the user.
Moves the DNS code from core to dns and renames the dns package to bdns
to be clearer.
Fixes#1260 and will be good to have while we add retries and such.