This PR reworks the RA's validateEmail() function so that timeouts during DNS validation of an email's MX/A/AAAA records are non-fatal, matching our intention to verify emails on a best-effort basis.
Notes:
bdns/problem.go - DNSError.Timeout() was changed to also treat context cancellation and context deadline expiry as DNS timeouts. This matches what DNSError.Error() was already doing to set the error message, and lets external callers of Timeout() avoid duplicating that logic.
bdns/mocks.go - the LookupMX mock was changed to support always.error and always.timeout in a manner similar to the LookupHost mock. Otherwise the TestValidateEmail unit test for the RA would fail when the MX lookup completed before the Host lookup because the error wouldn't be correct (empty DNS records vs a timeout or network error).
test/config/ra.json, test/config-next/ra.json - the dnsTries and dnsTimeout values were updated so that dnsTries * dnsTimeout is <= the WFE->RA RPC timeout (currently 15s in the test configs). This allows all of the DNS lookups to time out without the overall RPC timing out.
Resolves #2260.
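The notes above can be sketched roughly as follows. This is a minimal illustration, not the actual bdns or RA code: the `DNSError` fields, the `checkLookup` helper, and the `samples` values are all assumptions made for the example. It shows `Timeout()` counting context cancellation and deadline expiry as timeouts, and a caller treating such timeouts as non-fatal.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
)

// DNSError is a simplified, hypothetical stand-in for the type in
// bdns/problem.go; only the fields this sketch needs are included.
type DNSError struct {
	hostname   string
	underlying error
}

func (d DNSError) Error() string {
	return fmt.Sprintf("DNS problem: %v looking up %s", d.underlying, d.hostname)
}

// Timeout reports whether the lookup failed because of a network timeout,
// an expired context deadline, or a cancelled context.
func (d DNSError) Timeout() bool {
	var netErr net.Error
	if errors.As(d.underlying, &netErr) && netErr.Timeout() {
		return true
	}
	return errors.Is(d.underlying, context.Canceled) ||
		errors.Is(d.underlying, context.DeadlineExceeded)
}

// checkLookup (hypothetical helper) treats DNS timeouts as non-fatal, in the
// spirit of best-effort validation, and returns every other error unchanged.
func checkLookup(err error) error {
	var dnsErr DNSError
	if errors.As(err, &dnsErr) && dnsErr.Timeout() {
		return nil
	}
	return err
}

// samples holds one lookup error per case the sketch distinguishes.
var samples = []error{
	DNSError{"example.com", context.Canceled},
	DNSError{"example.com", context.DeadlineExceeded},
	DNSError{"example.com", errors.New("SERVFAIL")},
}

func main() {
	for _, err := range samples {
		fmt.Printf("lookup error %q -> validation result: %v\n", err, checkLookup(err))
	}
}
```

Keeping the classification in one `Timeout()` method means the caller never has to inspect the wrapped error itself.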
The LookupIPv6 flag has been enabled in production and isn't required anymore. This PR removes the flag entirely.
The errA and errAAAA error handling in LookupHost is left as-is, meaning that a non-nil errAAAA will not be returned to the caller. This matches the existing behaviour, and the expectations of the TestDNSLookupHost unit tests.
This commit also removes the tests from TestDNSLookupHost that tested the LookupIPv6 == false behaviours since those are no longer implemented.
Resolves #2191
Updates #1699.
Adds a new package, `features`, which exposes methods to set and check if various internal features are enabled. The implementation uses global state to store the features so that services embedded in another service do not each require their own features map in order to check if something is enabled.
Requires a `boulder-tools` image update to include `golang.org/x/tools/cmd/stringer`.
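A minimal sketch of the global-state approach described above, assuming hypothetical flag names (`ExampleFeatureA`, `ExampleFeatureB`) and a hand-written name table in place of stringer output. The function names `Set` and `Enabled` are also assumptions of this example rather than a statement of the package's API.

```go
package main

import (
	"fmt"
	"sync"
)

// FeatureFlag identifies one toggleable feature.
type FeatureFlag int

// Hypothetical flags, for illustration only.
const (
	ExampleFeatureA FeatureFlag = iota
	ExampleFeatureB
)

// byName maps config names to flags. A stringer-generated String method
// could supply these names; they are written by hand here.
var byName = map[string]FeatureFlag{
	"ExampleFeatureA": ExampleFeatureA,
	"ExampleFeatureB": ExampleFeatureB,
}

// Package-level state: every service in the process sees the same flags.
var (
	mu      sync.RWMutex
	enabled = map[FeatureFlag]bool{}
)

// Set applies a name -> on/off map, rejecting names that are not defined.
func Set(cfg map[string]bool) error {
	mu.Lock()
	defer mu.Unlock()
	for name, on := range cfg {
		flag, ok := byName[name]
		if !ok {
			return fmt.Errorf("feature %q doesn't exist", name)
		}
		enabled[flag] = on
	}
	return nil
}

// Enabled reports whether a feature is currently switched on.
func Enabled(f FeatureFlag) bool {
	mu.RLock()
	defer mu.RUnlock()
	return enabled[f]
}

func main() {
	if err := Set(map[string]bool{"ExampleFeatureA": true}); err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(Enabled(ExampleFeatureA), Enabled(ExampleFeatureB))
}
```

Because the map lives at package level, an embedded service can call `Enabled` without being handed its own copy of the configuration.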
This PR makes testing the bdns package more reliable. A race condition in TestMain was resulting in the tests running before the test DNS server had started. This is fixed by actively polling for the DNS server to be ready before starting the test suite.
Furthermore, the 1 millisecond server read/write timeout was occasionally being hit. This is fixed by increasing the read/write timeout to 1 second.
FYI: before this PR, 1000 runs of the bdns package tests produced 22 failures; after this PR, 1000 runs produced 0 failures.
Fixes #1317
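The polling fix can be illustrated with a small readiness loop. This is a sketch under assumptions: `waitUntilReady` is a hypothetical helper, and the probe in `main` dials a local TCP listener, whereas the real test server and its probe may differ.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// waitUntilReady calls probe up to attempts times, sleeping interval after
// each failure, and reports whether probe ever succeeded.
func waitUntilReady(probe func() bool, attempts int, interval time.Duration) bool {
	for i := 0; i < attempts; i++ {
		if probe() {
			return true
		}
		time.Sleep(interval)
	}
	return false
}

func main() {
	// Stand-in for the test server: a TCP listener on an ephemeral port.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()

	// The probe succeeds once a connection to the server can be opened.
	probe := func() bool {
		conn, err := net.DialTimeout("tcp", ln.Addr().String(), 100*time.Millisecond)
		if err != nil {
			return false
		}
		conn.Close()
		return true
	}

	// Only start the test suite once the server answers.
	fmt.Println("server ready:", waitUntilReady(probe, 20, 50*time.Millisecond))
}
```

Bounding the number of attempts keeps a server that never starts from hanging the test run indefinitely.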
Several of the `ProblemType`s had convenience functions to instantiate `ProblemDetails`s using their type and a detail message. Where these existed I did a quick scan of the codebase to convert places where callers were explicitly constructing the `ProblemDetails` to use the convenience function.
For the `ProblemType`s that did not have such a function, I created one and then converted callers to use it.
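As a rough illustration of the pattern, the sketch below defines two such convenience functions. The type layout, the URN strings, and the function names are assumptions chosen for the example and should not be read as the package's exact definitions.

```go
package main

import "fmt"

// ProblemType names a category of error reported to the client.
type ProblemType string

// Illustrative problem types; the URN strings are assumptions of this sketch.
const (
	MalformedProblem    ProblemType = "urn:acme:error:malformed"
	UnauthorizedProblem ProblemType = "urn:acme:error:unauthorized"
)

// ProblemDetails pairs a problem type with a human-readable message.
type ProblemDetails struct {
	Type   ProblemType
	Detail string
}

// Malformed builds a ProblemDetails of type MalformedProblem.
func Malformed(detail string) *ProblemDetails {
	return &ProblemDetails{Type: MalformedProblem, Detail: detail}
}

// Unauthorized builds a ProblemDetails of type UnauthorizedProblem.
func Unauthorized(detail string) *ProblemDetails {
	return &ProblemDetails{Type: UnauthorizedProblem, Detail: detail}
}

func main() {
	// Before: callers spelled out the struct by hand.
	explicit := &ProblemDetails{Type: MalformedProblem, Detail: "Unable to read JSON body"}
	// After: one call supplies the type for them.
	viaHelper := Malformed("Unable to read JSON body")
	fmt.Println(*explicit == *viaHelper)
}
```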
Solves #1837.
When a CAA request to Unbound times out, fall back to checking CAA via Google Public DNS' HTTPS API, through multiple proxies so as to hit geographically distributed paths. All successful multipath responses must be identical in order to succeed, and at most one can fail.
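The agreement rule can be sketched as a small function. This is an illustration only: `pathResult` and `agree` are hypothetical names, and responses are compared as plain strings, which may not match how the real code compares them.

```go
package main

import (
	"errors"
	"fmt"
)

// pathResult is the outcome of one lookup through one proxy.
type pathResult struct {
	body   string
	failed bool
}

// agree returns the common response body when at most maxFailures paths
// failed and every successful path returned an identical body.
func agree(results []pathResult, maxFailures int) (string, error) {
	var first string
	successes, failures := 0, 0
	for _, r := range results {
		if r.failed {
			failures++
			continue
		}
		if successes > 0 && r.body != first {
			return "", errors.New("multipath responses differ")
		}
		first = r.body
		successes++
	}
	if failures > maxFailures {
		return "", fmt.Errorf("%d paths failed, at most %d allowed", failures, maxFailures)
	}
	if successes == 0 {
		return "", errors.New("no path returned a response")
	}
	return first, nil
}

func main() {
	results := []pathResult{
		{body: "issue letsencrypt.org"},
		{body: "issue letsencrypt.org"},
		{failed: true},
	}
	body, err := agree(results, 1)
	fmt.Println(body, err)
}
```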
Fixes #1618
exchangeOne used a deferred method call which took an expression as an argument. Because of how defer works, the argument was evaluated immediately at the defer statement (unlike the method itself, which runs on return), causing the reported total latency to always be the same.
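The pitfall is easy to reproduce in isolation. In this sketch `record`, `buggy`, and `fixed` are hypothetical names and `record` stands in for whatever stats call the real code makes.

```go
package main

import (
	"fmt"
	"time"
)

// recorded holds the last latency passed to record.
var recorded time.Duration

// record stands in for a stats call that reports a latency.
func record(d time.Duration) { recorded = d }

// buggy evaluates time.Since(start) when the defer statement executes, so
// the recorded latency is near zero no matter how long the work takes.
func buggy(work time.Duration) {
	start := time.Now()
	defer record(time.Since(start))
	time.Sleep(work)
}

// fixed wraps the call in a closure, so time.Since(start) is evaluated only
// when the function returns.
func fixed(work time.Duration) {
	start := time.Now()
	defer func() { record(time.Since(start)) }()
	time.Sleep(work)
}

func main() {
	buggy(20 * time.Millisecond)
	fmt.Println("buggy:", recorded)
	fixed(20 * time.Millisecond)
	fmt.Println("fixed:", recorded)
}
```

Only the deferred function call itself is delayed; its arguments are computed at the point of the `defer` statement, which is why the closure form is needed here.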
* Split out CAA checking service (minus logging etc)
* Add example.yml config + follow general Boulder style
* Update protobuf package to correct version
* Add grpc client to va
* Add TLS authentication in both directions for CAA client/server
* Remove go lint check
* Add bcodes package listing custom codes for Boulder
* Add very basic (pull-only) gRPC metrics to VA + caa-service
* Fix all errcheck errors
* Add errcheck to test.sh
* Add a new sa.Rollback method to make handling errors in rollbacks easier.
This also causes a behavior change in the VA. If an HTTP connection is
abruptly closed after serving the headers for a non-200 response, the
reported error will be the read failure instead of the non-200.
Also, stop skipping CAA lookups for the root TLDs. The RFC is unclear on
the desired behavior here, but the ICANNTLD function is nonstandard and
the behavior is strictly more conservative than what we had before.
This unblocks the removal of the ICANNTLD function, which allows us to
stop forking upstream.
Closes #1522
The CAA response checking method has been refactored to have an
easier-to-follow, straight-line control flow. Several bugs in it have
been fixed:
- Firstly, parameters for issue and issuewild directives were not
parsed, so any attempt to specify parameters would result in
a string mismatch with the CA CAA identity (e.g. "letsencrypt.org").
Moreover, the syntax as specified permits leading and trailing
whitespace, so a parameter-free record such as
" letsencrypt.org ; " would not be considered a match.
This has been fixed by stripping whitespace and parameters. The RFC
does not specify the criticality of parameters, so unknown
parameters (currently all parameters) are considered noncritical.
I justify this as follows:
If someone decides to nominate a CA in a CAA record, they can,
with trivial research, determine what parameters, if any, that
CA supports, and presumably in trusting them in the first place
is able to adequately trust that the CA will continue to support
those parameters. The risk from other CAs is zero because other CAs
do not process the parameters: the records in which they appear do
not relate to them.
- Previously, all of the flag bits were considered to effectively mean
'critical'. However, the RFC specifies that all bits except for the
actual critical bit (decimal 128) should be ignored. In practice,
many people have misunderstood the RFC to mean that the critical bit
is decimal 1, so both bits are interpreted to mean 'critical', but
this change ignores all of the other bits. This ensures that the
remaining six bits are reasonably usable for future standards action
if any need should arise.
- Previously, existence of an "issue" directive but no "issuewild"
directive was essentially equivalent to an unsatisfiable "issuewild"
directive, meaning that no wildcard identifiers could pass the CAA
check. This is contrary to the RFC, which states that issuewild
should default to what is specified for "issue" if no issuewild
directives are specified. (This is somewhat moot since boulder
doesn't currently support wildcard issuance.)
- Conversely, existence of an "issuewild" directive but no "issue"
directive would cause CAA validation for a non-wildcard identifier
to fail, which was contrary to the RFC. This has been fixed.
- More generally, existence of any unknown non-critical directive, say
"foobar", would cause the CAA checking code to act as though an
unsatisfiable "issue" directive existed, preventing any issuance.
This has been fixed.
Test coverage for corner cases is enhanced and provides regression
testing for these bugs.
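The rules listed above can be condensed into a short sketch. It is an approximation for illustration: the function names are hypothetical, records are reduced to their value strings, the `account=123` parameter in the tests is made up, and handling of unknown directives is left out.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	flagCritical       = 128 // the critical bit as the RFC defines it
	flagCriticalLegacy = 1   // widely misread as the critical bit, so honoured too
)

// isCritical ignores every flag bit except the two treated as 'critical'.
func isCritical(flags uint8) bool {
	return flags&(flagCritical|flagCriticalLegacy) != 0
}

// issuerDomain drops any parameters after the first ';' and trims whitespace.
func issuerDomain(value string) string {
	return strings.TrimSpace(strings.SplitN(value, ";", 2)[0])
}

// relevantSet picks the records that govern a name: issuewild for wildcard
// names when any exist, otherwise issue.
func relevantSet(issue, issuewild []string, wildcard bool) []string {
	if wildcard && len(issuewild) > 0 {
		return issuewild
	}
	return issue
}

// permits reports whether the CA may issue. With no relevant records there
// is no restriction.
func permits(caIdentity string, issue, issuewild []string, wildcard bool) bool {
	set := relevantSet(issue, issuewild, wildcard)
	if len(set) == 0 {
		return true
	}
	for _, value := range set {
		if issuerDomain(value) == caIdentity {
			return true
		}
	}
	return false
}

func main() {
	fmt.Printf("%q\n", issuerDomain(" letsencrypt.org ; "))
	fmt.Println(permits("letsencrypt.org", []string{" letsencrypt.org ; "}, nil, true))
	fmt.Println(isCritical(128), isCritical(1), isCritical(2))
}
```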
statsd statistics have been added for tracking the relative frequency
of occurrence of different CAA features and outcomes. I added these on
a whim suspecting that they may be of interest.
Fixes #1436.
This is more what we expect from a dns server.
dig A nx.google.com @ns2.google.com
; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> A nx.google.com @ns2.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 28643
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;nx.google.com. IN A
;; AUTHORITY SECTION:
google.com. 60 IN SOA ns4.google.com. dns-admin.google.com. 112672771 900 900 1800 60
;; Query time: 13 msec
;; SERVER: 216.239.34.10#53(216.239.34.10)
;; WHEN: Thu Jan 21 14:44:06 CET 2016
;; MSG SIZE rcvd: 81
VS
dig A www.google.com @ns2.google.com
; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> A www.google.com @ns2.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18684
;; flags: qr aa rd; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;www.google.com. IN A
;; ANSWER SECTION:
www.google.com. 300 IN A 64.233.184.99
www.google.com. 300 IN A 64.233.184.105
www.google.com. 300 IN A 64.233.184.106
www.google.com. 300 IN A 64.233.184.104
www.google.com. 300 IN A 64.233.184.147
www.google.com. 300 IN A 64.233.184.103
;; Query time: 13 msec
;; SERVER: 216.239.34.10#53(216.239.34.10)
;; WHEN: Thu Jan 21 14:44:32 CET 2016
;; MSG SIZE rcvd: 128
Previously we would return a detailed errorString, which ProblemDetailsFromDNSError
would turn into a generic, uninformative "Server failure at resolver".
Now we return a new internal dnsError type, which ProblemDetailsFromDNSError can
turn into a more informative message to be shown to the user.
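A sketch of the idea, with loud caveats: the `dnsError` fields, the message format, and the `detailFor` helper are assumptions made for this example and only approximate the role `ProblemDetailsFromDNSError` plays.

```go
package main

import (
	"errors"
	"fmt"
)

// dnsError is a hypothetical shape for the internal error type: enough
// context to say what was asked, about which name, and how it failed.
type dnsError struct {
	recordType string // e.g. "A", "MX"
	hostname   string
	rcode      string // e.g. "SERVFAIL", "NXDOMAIN"
}

func (d dnsError) Error() string {
	return fmt.Sprintf("DNS problem: %s looking up %s for %s", d.rcode, d.recordType, d.hostname)
}

// errOpaque stands in for any error that is not a dnsError.
var errOpaque = errors.New("read udp: connection refused")

// detailFor returns a specific message for a dnsError and falls back to the
// generic message for anything else.
func detailFor(err error) string {
	var de dnsError
	if errors.As(err, &de) {
		return de.Error()
	}
	return "Server failure at resolver"
}

func main() {
	fmt.Println(detailFor(dnsError{"A", "example.com", "NXDOMAIN"}))
	fmt.Println(detailFor(errOpaque))
}
```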
This provides a means to add retries to DNS lookups and, with some
future work, to end retries early if our request deadline is blown.
That future work is tagged with #1292.
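A minimal sketch of a retry loop of this kind. `lookupWithRetries` and `errTimeout` are hypothetical names, the loop retries on any error, and ending retries early on a blown deadline is deliberately left out because the text describes it as future work.

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeout stands in for a transient lookup failure.
var errTimeout = errors.New("query timed out")

// lookupWithRetries calls query until it succeeds or maxTries attempts have
// been made. It returns the answer, the number of attempts, and the last
// error seen.
func lookupWithRetries(maxTries int, query func() (string, error)) (string, int, error) {
	if maxTries < 1 {
		maxTries = 1 // always make at least one attempt
	}
	var lastErr error
	for try := 1; try <= maxTries; try++ {
		answer, err := query()
		if err == nil {
			return answer, try, nil
		}
		lastErr = err
	}
	return "", maxTries, lastErr
}

func main() {
	// A query that fails twice before succeeding.
	calls := 0
	flaky := func() (string, error) {
		calls++
		if calls < 3 {
			return "", errTimeout
		}
		return "192.0.2.1", nil
	}
	answer, tries, err := lookupWithRetries(5, flaky)
	fmt.Println(answer, tries, err)
}
```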
Updates #1258
This moves the RTT metrics calculation inside of the DNSResolver. This
cleans up code in the RA and VA and makes adding retries to the
DNSResolver less ugly to do.
Note: this will put `Rate` and `RTT` after the name of DNS query
type (`A`, `MX`, etc.). I think that's fine and desirable. We aren't
using this data in alerts or many dashboards, yet, so a flag day is
okay.
Fixes #1124
Moves the DNS code from core to dns and renames the dns package to bdns
to be clearer.
Fixes #1260 and will be good to have while we add retries and such.