Commit Graph

68 Commits

Author SHA1 Message Date
Roland Bracewell Shoemaker 6f93942a04 Consistently used stdlib context package (#4229) 2019-05-28 14:36:16 -04:00
Roland Bracewell Shoemaker e839042bae dns: Remove Authorities field from ValidationRecord (#4230) 2019-05-28 14:11:32 -04:00
Jacob Hoffman-Andrews 4c420e2bc2 bdns: Remove LookupMX. (#4202)
We used to use this for checking email domains on registration, but not
anymore.
2019-05-06 09:29:44 -04:00
Roland Bracewell Shoemaker 97d1788a18 Add resolver to DNS metrics (#3874)
Helpful for debugging stuff in multi-resolver setups.
2018-10-01 11:16:45 -07:00
Daniel McCarney cca4a0c14a BDNS: Rotate the DNS server between query retries. (#3861)
When a retryable error occurs and there are multiple DNS servers configured it is prudent to change servers before retrying the query. This helps ensure that one dead DNS server won't result in queries failing.

Resolves https://github.com/letsencrypt/boulder/issues/3846
2018-09-19 08:06:09 -07:00
Joel Sing f8a023e49c Remove various unnecessary uses of fmt.Sprintf (#3707)
Remove various unnecessary uses of fmt.Sprintf - in particular:

- Avoid calls like t.Error(fmt.Sprintf(...)), where t.Errorf can be used directly.

- Use strconv when converting an integer to a string, rather than using
  fmt.Sprintf("%d", ...). This is simpler and can also detect type errors at
  compile time.

- Instead of using x.Write([]byte(fmt.Sprintf(...))), use fmt.Fprintf(x, ...).
2018-05-11 11:55:25 -07:00
Jacob Hoffman-Andrews 49511538d0 Make DNS timeout stat more specific. (#3627)
Distinguish between deadline exceeded vs canceled. Also, combine those
two cases with "out of retries" into a single stat with a label
determining type.
2018-04-09 09:29:07 -04:00
Roland Bracewell Shoemaker 8167abd5e3 Use internet facing appropriate histogram buckets for DNS latencies (#3616)
Also instead of repeating the same bucket definitions everywhere just use a single top level var in the metrics package in order to discourage copy/pasting.

Fixes #3607.
2018-04-04 08:01:54 -04:00
Roland Bracewell Shoemaker 5748792a07 Fix IP blacklist typo (#3478) 2018-02-26 09:31:33 -08:00
Jacob Hoffman-Andrews 4f2e5f11e5 Accept large responses from the resolver. (#3467)
This fixes some use cases where a domain being validated has a very
large number of CAA records.
2018-02-21 08:54:48 -05:00
Jacob Hoffman-Andrews 1fe8aa8128 Improve errors for DNS challenge (#3317)
Before this change, we would just log "Correct value not found for DNS challenge"
when we got a TXT record that didn't match what we expected. This was different
from the error when no TXT records were found at all, but viewing the error out of
context doesn't make that clear. This change improves the error to specifically say
that we found a TXT record, but it was the wrong one.

Also in this change: if we found multiple TXT records, we mention the number;
and we trim the length of the echoed TXT record.
2018-01-03 15:37:23 -05:00
Jacob Hoffman-Andrews f16c3af335 Active UDPDNS by default. (#3285)
Active UDPDNS by default
2017-12-15 12:26:45 -08:00
Roland Bracewell Shoemaker bdea281ae0 Remove CAA SERVFAIL exceptions code (#3262)
Fixes #3080.
2017-12-05 14:39:37 -08:00
Jacob Hoffman-Andrews 2fd2f9e230 Remove LegacyCAA implementation. (#3240)
Fixes #3236
2017-11-20 16:09:00 -05:00
Jacob Hoffman-Andrews 90278c80fe Revert "Reject CAA responses containing DNAMEs (#3082)" (#3188)
This reverts commit 08d2018c10.

Feedback from root programs:

https://cabforum.org/pipermail/public/2017-October/012293.html
https://cabforum.org/pipermail/public/2017-October/012297.html
https://cabforum.org/pipermail/public/2017-October/012358.html
https://cabforum.org/pipermail/public/2017-October/012320.html

Resolves #3130.
2017-10-23 11:14:56 -07:00
Jacob Hoffman-Andrews 4e68fb2ff6 Switch to udp for internal DNS. (#3135)
We used to use TCP because we would request DNSSEC records from Unbound, and
they would always cause truncated records when present. Now that we no longer
request those (#2718), we can use UDP. This is better because the TCP serving
paths in Unbound are likely less thoroughly tested, and not optimized for high
load. In particular this may resolve some availability problems we've seen
recently when trying to upgrade to a more recent Unbound.

Note that this only affects the Boulder->Unbound path. The Unbound->upstream
path is already UDP by default (with TCP fallback for truncated ANSWERs).
2017-10-10 10:06:33 -04:00
Jacob Hoffman-Andrews fce975a1e6 Move CAA mocks into caa_test. (#3084)
There were a bunch of test fixtures in bdns/mocks.go that were only used in va/caa_test.go. This moves them to be in the same file so we have less spooky action at a distance.

One side-effect: We can't construct bdns.DNSError with the internal fields we want, because those fields are unexported. So we switch a couple of mock cases to just return a generic error, and the corresponding test cases to expect that error.
2017-09-18 13:10:01 -07:00
Jacob Hoffman-Andrews 08d2018c10 Reject CAA responses containing DNAMEs (#3082)
Since the legacy CAA spec does the wrong thing with DNAMEs (treating them as
CNAMEs), and it's hard to reconcile this approach with CNAME handling, and
DNAMEs are extremely rare, reject outright any CAA responses containing DNAMEs.

Also, in the process, fix a bug in the previous LegacyCAA implementation.
Because the processing of records in LookupCAA was gated by `if
answer.Header().RRType == dnsType`, non-CAA responses were filtered out. This
wasn't caught by previous testing, because it was unittesting that mocked out
bdns.
2017-09-13 10:54:48 -07:00
Jacob Hoffman-Andrews 4266853092 Implement legacy form of CAA (#3075)
This implements the pre-erratum 5065 version of CAA, behind a feature flag.

This involved refactoring DNSClient.LookupCAA to return a list of CNAMEs in addition to the CAA records, and adding an alternate lookuper that does tree-climbing on single-depth aliases.
2017-09-13 10:16:12 -04:00
Roland Bracewell Shoemaker 9d34af6a82 Set AD bit in the header of DNS queries (#3068)
Fixes broken DNSSEC metrics, lack of this bit being set in queries had no security implications.
2017-09-12 09:28:07 -07:00
Jacob Hoffman-Andrews 568407e5b8 Remote VA logging and stats (#3063)
Add a logging statement that fires when a remote VA fail causes
overall failure. Also change remoteValidationFailures into a
counter that counts the same thing, instead of a histogram. Since
the histogram had the default bucket sizes, it failed to collect
what we needed, and produced more metrics than necessary.
2017-09-11 12:50:50 -07:00
Roland Bracewell Shoemaker eadbc19c43 Switch DNS metrics from statsd to prometheus (#2994)
Makes the DNS stats code much nicer if I don't say so myself. Should make investigating DNS problems much easier now as well.

Fixes #2956.
2017-08-22 14:33:36 -07:00
Roland Bracewell Shoemaker 05d869b005 Rename DNSResolver -> DNSClient (#2878)
Fixes #639.

This resolves something that has bugged me for two+ years, our DNSResolverImpl is not a DNS resolver, it is a DNS client. This change just makes that obvious.
2017-07-18 08:37:45 -04:00
Daniel c905cfb8db
Rewords IPv6 -> IPv4 fallback error messages. 2017-05-11 10:35:30 -04:00
Daniel McCarney 47452d6c6c Prefer IPv6 addresses, fall back to IPv4. (#2715)
This PR introduces a new feature flag "IPv6First". 

When the "IPv6First" feature is enabled the VA's HTTP dialer and TLS SNI
(01 and 02) certificate fetch requests will attempt to automatically
retry when the initial connection was to IPv6 and there is an IPv4
address available to retry with.

This resolves https://github.com/letsencrypt/boulder/issues/2623
2017-05-08 13:00:16 -07:00
Roland Bracewell Shoemaker b82c244e65 Add stat for how often DNS responses are signed (#2716)
I'm interested in seeing both how often DNS responses we see are signed (mainly for CAA, but
also interested in other query types). This change adds a new counter, `Authenticated`, that can
be compared against the `Successes` counter to find the percentage of signed responses we see.
The counter is incremented if the `msg.AuthenticatedData` bit is set by the upstream resolver.
2017-05-02 10:57:11 -07:00
Roland Bracewell Shoemaker 2ecb8bf7a5 Remove confusing SetEdns0 call (#2718)
Remove `SetEdns0` call in `bdns.exchangeOne`. Since we talk over TCP to the production
resolver and we don't do any local validation of DNSSEC records adding the EDNS0 OPT
record is pointless and confusing. Testing against a local `unbound` instance shows you
 don't need to set the DO bit for DNSSEC requests/validation to be done at the resolver
level.
2017-05-02 10:55:47 -07:00
Roland Bracewell Shoemaker e2b2511898 Overhaul internal error usage (#2583)
This patch removes all usages of the `core.XXXError` and almost all usages of `probs` outside of the WFE and VA and replaces them with a unified internal error type. Since the VA uses `probs.ProblemDetails` quite extensively in challenges, and currently stores them in the DB I've saved this change for another change (it'll also require a migration). Since `ProblemDetails` should only ever be exposed to end-users all of its related logic should be moved into the `WFE` but since it still needs to be exposed to the VA and SA I've left it in place for now.

The new internal `errors` package offers the same convenience functions as `probs` does as well as a new simpler type testing method. A few small changes have also been made to error messages, mainly adding the library and function name to internal server errors for easier debugging (i.e. where a number of functions return the exact same errors and there is no other way to distinguish which method threw the error).

Also adds proper encoding of internal errors transferred over gRPC (the current encoding scheme is kept for `core` and `probs` errors since it'll be ideally be removed after we deploy this and follow-up changes) using `grpc/metadata` instead of the gRPC status codes.

Fixes #2507. Updates #2254 and #2505.
2017-03-22 23:27:31 -07:00
Daniel McCarney d4902820ca Adds unique VA DNS validation error for empty TXTs. (#2401)
Presently when the VA performs a DNS-01 challenge verification it
returns the same error for the case where the remote nameserver had the
**wrong** TXT value, and when the remote nameserver had an **empty**
response for the TXT query. It would aid debugging if the user was told
which of the two failure cases was responsible for the overall challenge
failure.

This commit adds a unique error message for the empty TXT records case,
and a unit test/mock to exercise the new the error message.

Resolves #2326
2016-12-08 11:27:28 -08:00
Daniel McCarney eb67ad4f88 Allow `validateEmail` to timeout w/o error. (#2288)
This PR reworks the validateEmail() function from the RA to allow timeouts during DNS validation of MX/A/AAAA records for an email to be non-fatal and match our intention to verify emails best-effort.

Notes:

bdns/problem.go - DNSError.Timeout() was changed to also include context cancellation and timeout as DNS timeouts. This matches what DNSError.Error() was doing to set the error message and supports external callers to Timeout not duplicating the work.

bdns/mocks.go - the LookupMX mock was changed to support always.error and always.timeout in a manner similar to the LookupHost mock. Otherwise the TestValidateEmail unit test for the RA would fail when the MX lookup completed before the Host lookup because the error wouldn't be correct (empty DNS records vs a timeout or network error).

test/config/ra.json, test/config-next/ra.json - the dnsTries and dnsTimeout values were updated such that dnsTries * dnsTimeout was <= the WFE->RA RPC timeout (currently 15s in the test configs). This allows the dns lookups to all timeout without the overall RPC timing out.

Resolves #2260.
2016-10-27 11:56:12 -07:00
Daniel McCarney 409f1623e6 Retires `LookupIPv6` VA flag. (#2205)
The LookupIPv6 flag has been enabled in production and isn't required anymore. This PR removes the flag entirely.

The errA and errAAAA error handling in LookupHost is left as-is, meaning that a non-nil errAAAA will not be returned to the caller. This matches the existing behaviour, and the expectations of the TestDNSLookupHost unit tests.

This commit also removes the tests from TestDNSLookupHost that tested the LookupIPv6 == false behaviours since those are no longer implemented.

Resolves #2191
2016-09-26 18:00:01 -07:00
Roland Bracewell Shoemaker 239bf9ae0a Very basic feature flag impl (#1705)
Updates #1699.

Adds a new package, `features`, which exposes methods to set and check if various internal features are enabled. The implementation uses global state to store the features so that services embedded in another service do not each require their own features map in order to check if something is enabled.

Requires a `boulder-tools` image update to include `golang.org/x/tools/cmd/stringer`.
2016-09-20 16:29:01 -07:00
Roland Bracewell Shoemaker c8f1fb3e2f Remove direct usages of go-statsd-client in favor of using metrics.Scope (#2136)
Fixes #2118, fixes #2082.
2016-09-07 19:35:13 -04:00
Jacob Hoffman-Andrews ffd8e92896 Disable validations to 2002::/16 (6to4 anycast) (#2095)
We disable validations to IPs under the 6to4 anycase prefix because
there's too much risk of a malicious actor advertising the prefix and
answering validations for a 6to4 host they do not control.

https://community.letsencrypt.org/t/problems-validating-ipv6-against-host-running-6to4/18312/9
2016-08-01 10:15:32 -04:00
Jacob Hoffman-Andrews 0c0e94dfaf Add enforcement for CAA SERVFAIL (#1971)
https://github.com/letsencrypt/boulder/pull/1971
2016-06-28 11:00:23 -07:00
Ben Irving 51425cab81 Remove race condition from bdns_test.go (#1906)
This PR, makes testing the bdns package more reliable. A race condition in TestMain was resulting in the test running before the test dns server had started. This is fixed by actively polling for the DNS server to be ready before starting the test suite.

Furthermore, a 1 millisecond server read/write timeout was proving to time out on occasion. This is fixed increasing to a 1 second read/write timeout to increase test reliability. 

FYI: ran package bdns tests 1000 times with 22 failures previously, after this PR ran 1000 times with 0 failures.

fixes #1317
2016-06-08 17:33:27 -04:00
Daniel McCarney 77030c3eb1 Make it easier to instantiate ProblemDetails (#1851)
Several of the `ProblemType`s had convenience functions to instantiate `ProblemDetails`s using their type and a detail message. Where these existed I did a quick scan of the codebase to convert places where callers were explicitly constructing the `ProblemDetails` to use the convenience function.

For the `ProblemType`s that did not have such a function, I created one and then converted callers to use it.

Solves #1837.
2016-05-31 14:05:37 -07:00
Roland Bracewell Shoemaker 54573b36ba Remove all stray copyright headers and appends the initial line to LICENSE.txt (#1853) 2016-05-31 12:32:04 -07:00
Kane York 339405bcb9 Look up A and AAAA in parallel (#1760)
This allows validating IPv6-only hosts.

Fixes #593.
2016-05-09 08:38:23 -07:00
Roland Bracewell Shoemaker 35b6e83e81 Implement CAA quorum checking after failure (#1763)
When a CAA request to Unbound times out, fall back to checking CAA via Google Public DNS' HTTPS API, through multiple proxies so as to hit geographically distributed paths. All successful multipath responses must be identical in order to succeed, and at most one can fail.

Fixes #1618
2016-05-05 11:16:58 -07:00
Roland Bracewell Shoemaker c6de21a53a Fix total DNS latency stat (#1751)
exchangeOne used a deferd method which contained a expression as a argument. Because of how defer works the arguments where evaluated immediately (unlike the method) causing the total latency to always be the same.
2016-04-19 10:36:44 -07:00
Jacob Hoffman-Andrews e6c17e1717 Switch to new vendor style (#1747)
* Switch to new vendor style.

* Fix metrics generate command.

* Fix miekg/dns types_generate.

* Use generated copies of files.

* Update miekg to latest.

Fixes a problem with `go generate`.

* Set GO15VENDOREXPERIMENT.

* Build in letsencrypt/boulder.

* fix travis more.

* Exclude vendor instead of godeps.

* Replace some ...

* Fix unformatted cmd

* Fix errcheck for vendorexp

* Add GO15VENDOREXPERIMENT to Makefile.

* Temp disable errcheck.

* Restore master fetch.

* Restore errcheck.

* Build with 1.6 also.

* Match statsd.*"

* Skip errcheck unles Go1.6.

* Add other ignorepkg.

* Fix errcheck.

* move errcheck

* Remove go1.6 requirement.

* Put godep-restore with errcheck.

* Remove go1.6 dep.

* Revert master fetch revert.

* Remove -r flag from godep save.

* Set GO15VENDOREXPERIMENT in Dockerfile and remove _worskpace.

* Fix Godep version.
2016-04-18 12:51:36 -07:00
Roland Bracewell Shoemaker 8eaf247ee9 Split CAA checking out to its own service (#1647)
* Split out CAA checking service (minus logging etc)
* Add example.yml config + follow general Boulder style
* Update protobuf package to correct version
* Add grpc client to va
* Add TLS authentication in both directions for CAA client/server
* Remove go lint check
* Add bcodes package listing custom codes for Boulder
* Add very basic (pull-only) gRPC metrics to VA + caa-service
2016-04-12 23:02:41 -07:00
Kane York 25b45a45ec Errcheck errors fixed (#1677)
* Fix all errcheck errors
* Add errcheck to test.sh
* Add a new sa.Rollback method to make handling errors in rollbacks easier.
This also causes a behavior change in the VA. If a HTTP connection is
abruptly closed after serving the headers for a non-200 response, the
reported error will be the read failure instead of the non-200.
2016-04-12 16:54:01 -07:00
Jacob Hoffman-Andrews 4b318de37e Make a couple of fields private on DNS impl
These fields were not used externally and could not be modified concurrently, so
they should not be exposed.
2016-03-11 22:44:16 -08:00
Kane York 31535f5b89 Perform CAA lookups in parallel.
Also, stop skipping CAA lookups for the root TLDs. The RFC is unclear on
the desired behavior here, but the ICANNTLD function is nonstandard and
the behavior is strictly more conservative than what we had before.

This unblocks the removal of the ICANNTLD function, which allows us to
stop forking upstream.

Closes #1522
2016-03-04 11:07:14 -08:00
Jessica Frazelle 3df2e942be
go fmt fixes
Signed-off-by: Jessica Frazelle <acidburn@docker.com>
2016-02-16 12:19:15 -08:00
Hugo Landau ea9853a35b Remove issuewild support from CAA patch 2016-01-31 02:01:34 +00:00
Hugo Landau 4f27c24cf3 Make CAA checking more compliant with the RFC; CAA refactoring
The CAA response checking method has been refactored to have a
easier to follow straight-line control flow. Several bugs in it have
been fixed:

  - Firstly, parameters for issue and issuewild directives were not
    parsed, so any attempt to specify parameters would result in
    a string mismatch with the CA CAA identity (e.g. "letsencrypt.org").
    Moreover, the syntax as specified permits leading and trailing
    whitespace, so a parameter-free record such as
    "  letsencrypt.org ;  " would not be considered a match.

    This has been fixed by stripping whitespace and parameters. The RFC
    does not specify the criticality of parameters, so unknown
    parameters (currently all parameters) are considered noncritical.
    I justify this as follows:

    If someone decides to nominate a CA in a CAA record, they can,
    with trivial research, determine what parameters, if any, that
    CA supports, and presumably in trusting them in the first place
    is able to adequately trust that the CA will continue to support
    those parameters. The risk from other CAs is zero because other CAs
    do not process the parameters because the records in which they
    appear they do not relate to them.

  - Previously, all of the flag bits were considered to effectively mean
    'critical'. However, the RFC specifies that all bits except for the
    actual critical bit (decimal 128) should be ignored. In practice,
    many people have misunderstood the RFC to mean that the critical bit
    is decimal 1, so both bits are interpreted to mean 'critical', but
    this change ignores all of the other bits. This ensures that the
    remaining six bits are reasonably usable for future standards action
    if any need should arise.

  - Previously, existence of an "issue" directive but no "issuewild"
    directive was essentially equivalent to an unsatisfiable "issuewild"
    directive, meaning that no wildcard identifiers could pass the CAA
    check. This is contrary to the RFC, which states that issuewild
    should default to what is specified for "issue" if no issuewild
    directives are specified. (This is somewhat moot since boulder
    doesn't currently support wildcard issuance.)

  - Conversely, existence of an "issuewild" directive but no "issue"
    directive would cause CAA validation for a non-wildcard identifier
    to fail, which was contrary to the RFC. This has been fixed.

  - More generally, existence of any unknown non-critical directive, say
    "foobar", would cause the CAA checking code to act as though an
    unsatisfiable "issue" directive existed, preventing any issuance.
    This has been fixed.

Test coverage for corner cases is enhanced and provides regression
testing for these bugs.

statsd statistics have been added for tracking the relative frequency
of occurrence of different CAA features and outcomes. I added these on
a whim suspecting that they may be of interest.

Fixes #1436.
2016-01-31 01:51:28 +00:00
Roland Shoemaker d5d4795626 Fix mock CAA response in test 2016-01-27 14:15:16 -08:00