Commit Graph

33 Commits

Author SHA1 Message Date
Roland Bracewell Shoemaker 97d1788a18 Add resolver to DNS metrics (#3874)
Helpful for debugging stuff in multi-resolver setups.
2018-10-01 11:16:45 -07:00
Daniel McCarney cca4a0c14a BDNS: Rotate the DNS server between query retries. (#3861)
When a retryable error occurs and there are multiple DNS servers configured it is prudent to change servers before retrying the query. This helps ensure that one dead DNS server won't result in queries failing.

Resolves https://github.com/letsencrypt/boulder/issues/3846
2018-09-19 08:06:09 -07:00
Jacob Hoffman-Andrews 49511538d0 Make DNS timeout stat more specific. (#3627)
Distinguish between deadline exceeded vs canceled. Also, combine those
two cases with "out of retries" into a single stat with a label
determining type.
2018-04-09 09:29:07 -04:00
Roland Bracewell Shoemaker bdea281ae0 Remove CAA SERVFAIL exceptions code (#3262)
Fixes #3080.
2017-12-05 14:39:37 -08:00
Jacob Hoffman-Andrews 2fd2f9e230 Remove LegacyCAA implementation. (#3240)
Fixes #3236
2017-11-20 16:09:00 -05:00
Jacob Hoffman-Andrews 90278c80fe Revert "Reject CAA responses containing DNAMEs (#3082)" (#3188)
This reverts commit 08d2018c10.

Feedback from root programs:

https://cabforum.org/pipermail/public/2017-October/012293.html
https://cabforum.org/pipermail/public/2017-October/012297.html
https://cabforum.org/pipermail/public/2017-October/012358.html
https://cabforum.org/pipermail/public/2017-October/012320.html

Resolves #3130.
2017-10-23 11:14:56 -07:00
Jacob Hoffman-Andrews 4e68fb2ff6 Switch to udp for internal DNS. (#3135)
We used to use TCP because we would request DNSSEC records from Unbound, and
they would always cause truncated records when present. Now that we no longer
request those (#2718), we can use UDP. This is better because the TCP serving
paths in Unbound are likely less thoroughly tested, and not optimized for high
load. In particular this may resolve some availability problems we've seen
recently when trying to upgrade to a more recent Unbound.

Note that this only affects the Boulder->Unbound path. The Unbound->upstream
path is already UDP by default (with TCP fallback for truncated ANSWERs).
2017-10-10 10:06:33 -04:00
Jacob Hoffman-Andrews 08d2018c10 Reject CAA responses containing DNAMEs (#3082)
Since the legacy CAA spec does the wrong thing with DNAMEs (treating them as
CNAMEs), and it's hard to reconcile this approach with CNAME handling, and
DNAMEs are extremely rare, reject outright any CAA responses containing DNAMEs.

Also, in the process, fix a bug in the previous LegacyCAA implementation.
Because the processing of records in LookupCAA was gated by `if
answer.Header().RRType == dnsType`, non-CAA responses were filtered out. This
wasn't caught by previous testing, because it was unittesting that mocked out
bdns.
2017-09-13 10:54:48 -07:00
Jacob Hoffman-Andrews 4266853092 Implement legacy form of CAA (#3075)
This implements the pre-erratum 5065 version of CAA, behind a feature flag.

This involved refactoring DNSClient.LookupCAA to return a list of CNAMEs in addition to the CAA records, and adding an alternate lookuper that does tree-climbing on single-depth aliases.
2017-09-13 10:16:12 -04:00
Jacob Hoffman-Andrews 568407e5b8 Remote VA logging and stats (#3063)
Add a logging statement that fires when a remote VA fail causes
overall failure. Also change remoteValidationFailures into a
counter that counts the same thing, instead of a histogram. Since
the histogram had the default bucket sizes, it failed to collect
what we needed, and produced more metrics than necessary.
2017-09-11 12:50:50 -07:00
Roland Bracewell Shoemaker eadbc19c43 Switch DNS metrics from statsd to prometheus (#2994)
Makes the DNS stats code much nicer if I don't say so myself. Should make investigating DNS problems much easier now as well.

Fixes #2956.
2017-08-22 14:33:36 -07:00
Roland Bracewell Shoemaker 05d869b005 Rename DNSResolver -> DNSClient (#2878)
Fixes #639.

This resolves something that has bugged me for two+ years, our DNSResolverImpl is not a DNS resolver, it is a DNS client. This change just makes that obvious.
2017-07-18 08:37:45 -04:00
Daniel McCarney 409f1623e6 Retires `LookupIPv6` VA flag. (#2205)
The LookupIPv6 flag has been enabled in production and isn't required anymore. This PR removes the flag entirely.

The errA and errAAAA error handling in LookupHost is left as-is, meaning that a non-nil errAAAA will not be returned to the caller. This matches the existing behaviour, and the expectations of the TestDNSLookupHost unit tests.

This commit also removes the tests from TestDNSLookupHost that tested the LookupIPv6 == false behaviours since those are no longer implemented.

Resolves #2191
2016-09-26 18:00:01 -07:00
Roland Bracewell Shoemaker c8f1fb3e2f Remove direct usages of go-statsd-client in favor of using metrics.Scope (#2136)
Fixes #2118, fixes #2082.
2016-09-07 19:35:13 -04:00
Jacob Hoffman-Andrews ffd8e92896 Disable validations to 2002::/16 (6to4 anycast) (#2095)
We disable validations to IPs under the 6to4 anycase prefix because
there's too much risk of a malicious actor advertising the prefix and
answering validations for a 6to4 host they do not control.

https://community.letsencrypt.org/t/problems-validating-ipv6-against-host-running-6to4/18312/9
2016-08-01 10:15:32 -04:00
Jacob Hoffman-Andrews 0c0e94dfaf Add enforcement for CAA SERVFAIL (#1971)
https://github.com/letsencrypt/boulder/pull/1971
2016-06-28 11:00:23 -07:00
Ben Irving 51425cab81 Remove race condition from bdns_test.go (#1906)
This PR, makes testing the bdns package more reliable. A race condition in TestMain was resulting in the test running before the test dns server had started. This is fixed by actively polling for the DNS server to be ready before starting the test suite.

Furthermore, a 1 millisecond server read/write timeout was proving to time out on occasion. This is fixed increasing to a 1 second read/write timeout to increase test reliability. 

FYI: ran package bdns tests 1000 times with 22 failures previously, after this PR ran 1000 times with 0 failures.

fixes #1317
2016-06-08 17:33:27 -04:00
Roland Bracewell Shoemaker 54573b36ba Remove all stray copyright headers and appends the initial line to LICENSE.txt (#1853) 2016-05-31 12:32:04 -07:00
Kane York 339405bcb9 Look up A and AAAA in parallel (#1760)
This allows validating IPv6-only hosts.

Fixes #593.
2016-05-09 08:38:23 -07:00
Jacob Hoffman-Andrews e6c17e1717 Switch to new vendor style (#1747)
* Switch to new vendor style.

* Fix metrics generate command.

* Fix miekg/dns types_generate.

* Use generated copies of files.

* Update miekg to latest.

Fixes a problem with `go generate`.

* Set GO15VENDOREXPERIMENT.

* Build in letsencrypt/boulder.

* fix travis more.

* Exclude vendor instead of godeps.

* Replace some ...

* Fix unformatted cmd

* Fix errcheck for vendorexp

* Add GO15VENDOREXPERIMENT to Makefile.

* Temp disable errcheck.

* Restore master fetch.

* Restore errcheck.

* Build with 1.6 also.

* Match statsd.*"

* Skip errcheck unles Go1.6.

* Add other ignorepkg.

* Fix errcheck.

* move errcheck

* Remove go1.6 requirement.

* Put godep-restore with errcheck.

* Remove go1.6 dep.

* Revert master fetch revert.

* Remove -r flag from godep save.

* Set GO15VENDOREXPERIMENT in Dockerfile and remove _worskpace.

* Fix Godep version.
2016-04-18 12:51:36 -07:00
Roland Bracewell Shoemaker 8eaf247ee9 Split CAA checking out to its own service (#1647)
* Split out CAA checking service (minus logging etc)
* Add example.yml config + follow general Boulder style
* Update protobuf package to correct version
* Add grpc client to va
* Add TLS authentication in both directions for CAA client/server
* Remove go lint check
* Add bcodes package listing custom codes for Boulder
* Add very basic (pull-only) gRPC metrics to VA + caa-service
2016-04-12 23:02:41 -07:00
Kane York 25b45a45ec Errcheck errors fixed (#1677)
* Fix all errcheck errors
* Add errcheck to test.sh
* Add a new sa.Rollback method to make handling errors in rollbacks easier.
This also causes a behavior change in the VA. If a HTTP connection is
abruptly closed after serving the headers for a non-200 response, the
reported error will be the read failure instead of the non-200.
2016-04-12 16:54:01 -07:00
Jacob Hoffman-Andrews 4b318de37e Make a couple of fields private on DNS impl
These fields were not used externally and could not be modified concurrently, so
they should not be exposed.
2016-03-11 22:44:16 -08:00
Romain Fliedel 3e06307e8a only add soa authority response for nxdomain or not entry
This is more what we expect from a dns server.

dig A nx.google.com @ns2.google.com

; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> A nx.google.com @ns2.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 28643
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;nx.google.com.                 IN      A

;; AUTHORITY SECTION:
google.com.             60      IN      SOA     ns4.google.com. dns-admin.google.com. 112672771 900 900 1800 60

;; Query time: 13 msec
;; SERVER: 216.239.34.10#53(216.239.34.10)
;; WHEN: Thu Jan 21 14:44:06 CET 2016
;; MSG SIZE  rcvd: 81

VS

dig A www.google.com @ns2.google.com

; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> A www.google.com @ns2.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18684
;; flags: qr aa rd; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;www.google.com.                        IN      A

;; ANSWER SECTION:
www.google.com.         300     IN      A       64.233.184.99
www.google.com.         300     IN      A       64.233.184.105
www.google.com.         300     IN      A       64.233.184.106
www.google.com.         300     IN      A       64.233.184.104
www.google.com.         300     IN      A       64.233.184.147
www.google.com.         300     IN      A       64.233.184.103

;; Query time: 13 msec
;; SERVER: 216.239.34.10#53(216.239.34.10)
;; WHEN: Thu Jan 21 14:44:32 CET 2016
;; MSG SIZE  rcvd: 128
2016-01-21 15:59:02 +01:00
Jacob Hoffman-Andrews d5acf9d2b5 Restore expectedCount. 2016-01-08 16:55:13 -08:00
Jacob Hoffman-Andrews 40167f3da3 Merge remote-tracking branch 'le/master' into dns-errors-fix
Conflicts:
	bdns/dns.go
	bdns/dns_test.go
	mocks/mocks.go
	ra/registration-authority.go
	ra/registration-authority_test.go
2016-01-08 14:07:05 -08:00
Roland Shoemaker a801766f05 Fix silly merge error 2016-01-06 15:30:13 -08:00
Roland Shoemaker b4c761aa5d Merge branch 'master' into dns-meta 2016-01-06 15:09:28 -08:00
Jacob Hoffman-Andrews df4ba7aaa8 Report DNS errors properly.
Previously we would return a detailed errorString, which ProblemDetailsFromDNSError
would turn into a generic, uninformative "Server failure at resolver".

Now we return a new internal dnsError type, which ProblemDetailsFromDNSError can
turn into a more informative message to be shown to the user.
2016-01-04 16:07:02 -08:00
Jeff Hodges 116ce96326 add retries and context deadlines to DNSResolver
This provides a means to add retries to DNS look ups, and, with some
future work, end retries early if our request deadline is blown. That
future work is tagged with #1292.

Updates #1258
2016-01-04 14:59:10 -08:00
Roland Shoemaker d18b8a536d Add DNS ValidationRecord metadata 2016-01-04 12:20:45 -08:00
Jeff Hodges e36895c9c5 bring RTT metrics inside DNSResolver
This moves the RTT metrics calculation inside of the DNSResolver. This
cleans up code in the RA and VA and makes some adding retries to the
DNSResolver less ugly to do.

Note: this will put `Rate` and `RTT` after the name of DNS query
type (`A`, `MX`, etc.). I think that's fine and desirable. We aren't
using this data in alerts or many dashboards, yet, so a flag day is
okay.

Fixes #1124
2015-12-16 17:41:42 -08:00
Jeff Hodges b31165444f move dns code to dns pkg and rename to bdns
Moves the DNS code from core to dns and renames the dns package to bdns
to be clearer.

Fixes #1260 and will be good to have while we add retries and such.
2015-12-14 11:21:43 -08:00