In order to move multi perspective validation forward we need to support policy
in Boulder configuration that can relax multi-va requirements temporarily.
A similar mechanism was used in support of the gradual deprecation of the
TLS-SNI-01 challenge type and with the introduction of CAA enforcement and has
shown to be a helpful tool to have available when introducing changes that are
expected to break sites.
When the VA "multiVAPolicyFile" is specified it is assumed to be a YAML file
containing two lists:
1. disabledNames - a list of domain names that are exempt from multi VA
enforcement.
2. disabledAccounts - a list of account IDs that are exempt from multi VA
enforcement.
When a hostname or account ID is added to the policy we'll begin communication
with the related ACME account contact to establish that this is a temporary
measure and the root problem will need to be addressed before an eventual
cut-off date.
Resolves https://github.com/letsencrypt/boulder/issues/4455
A unit test is included to verify that a TLS-ALPN-01 challenge to
a TLS 1.3 only server doesn't succeed when the `GODEBUG` value to
disable TLS 1.3 in `docker-compose.yml` is set. Without this env var
the test fails on the Go 1.13 build because of the new default:
```
=== RUN TestTLSALPN01TLS13
--- FAIL: TestTLSALPN01TLS13 (0.04s)
tlsalpn_test.go:531: expected problem validating TLS-ALPN-01 challenge against a TLS 1.3 only server, got nil
FAIL
FAIL github.com/letsencrypt/boulder/va 0.065s
```
With the env var set the test passes, getting the expected connection
problem reporting a tls error:
```
=== RUN TestTLSALPN01TLS13
2019/09/13 18:59:00 http: TLS handshake error from 127.0.0.1:51240: tls: client offered only unsupported versions: [303 302 301]
--- PASS: TestTLSALPN01TLS13 (0.03s)
PASS
ok github.com/letsencrypt/boulder/va 1.054s
```
Since we plan to eventually enable TLS 1.3 support and the `GODEBUG`
mechanism tested in the above test is platform-wide vs package
specific I decided it wasn't worth the time investment to write a
similar HTTP-01 unit test that verifies the TLS 1.3 behaviour on a
HTTP-01 HTTP->HTTPS redirect.
Resolves https://github.com/letsencrypt/boulder/issues/4415
This PR changes the VA to return `dns` problem type for errors when performing
HTTP-01 challenges for domains that have no IP addresses, or errors looking up
the IP addresses.
The `va.getAddrs` function is internal to the VA and can return
`berrors.BoulderError`s with a DNS type when there is an error, allowing the
calling code to convert this to a problem when required
using an updated `detailedError` function. This avoids some clunky conversion
the HTTP-01 code was doing that misrepresented DNS level errors as connection
problems with a DNS detail message.
In order to add an integration test for challenge validation that results in
`getAddrs` DNS level errors the Boulder tools image had to be bumped to a tag
that includes the latest `pebble-challtestsrv` that
supports mocking SERVFAILs. It isn't possible to mock this case with internal IP
addresses because our VA test configuration does not filter internal addresses
to support the testing context.
Additionally this branch removes the `UnknownHostProblem` from the `probs`
package:
1. It isn't used anywhere after 532c210
2. It's not a real RFC 8555 problem type. We should/do use the
DNS type for this.
Resolves https://github.com/letsencrypt/boulder/issues/4407
Brings it to be more in line with the responses from the other two challenges and
will hopefully make the challenge a lot easier to debug (like in the recent community
thread).
```json
"error": {
"type": "urn:ietf:params:acme:error:unauthorized",
"detail": "Incorrect validation certificate for tls-alpn-01 challenge. Expected acmeValidationV1 extension value 836bf5358f8a32826c61faeff2e0225b00756f935b00ed3002cabb9d536b9f53 for this challenge but got 8539b12e31c306b81a0aedab4128722c6ad71f71f46316a3c71612f47df0e532",
"status": 403
},
```
This will allow implementing sub-problems without creating a cyclic
dependency between `core` and `problems`.
The `identifier` package is somewhat small/single-purpose and in the
future we may want to move more "ACME" bits beyond the `identifier`
types into a dedicated package outside of `core`.
In some rare conditions the malformed HTTP response error message that
we match in the VA for HTTP-01 connections to HTTP/2 servers will be
returned as a raw `http.badStringError` that doesn't have a transport
connection broken prefix. In these cases the existing
`test_http2_http01_challenge` integration tests fails because the
`h2SettingsFrameErrRegex` doesn't match the returned error.
To accommodate this we make the `h2SettingsFrameErrRegex` optionally
match the transport connection broken prefix.
In practice it seems the only way to add a specific error for when an initial HTTP-01 challenge request is made to an HTTP/2 server mis-configured on `:80` is with a regex on the error string.
The error returned from the stdlib `http.Client` for HTTP to an HTTP/2 server is just an `errors.ErrorString` instance without any context (once you peel it out of the wrapping `url.Error`):
> Err:(*errors.errorString)(0xc420609bf0)}] errStr=[Get http://example.com/.well-known/acme-challenge/xxxxxxx: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x00\x00\x12\x04\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x80\x00\x04\x00\x01\x00\x00\x00\x05\x00\xff\xff\xff\x00\x00\x04\b\x00\x00\x00\x00\x00\u007f\xff\x00\x00\x00\x00\b\a\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01"]
Even directly in the stdlib code at the place in `http/response.go` that generates the error it's using a `&badStringError{}` and just putting the byte string that `textproto` read into it.
To detect this case in `detailedError` I added a pre-compiled regex that will match the net/http malformed HTTP response error for raw bytes matching an arbitrarily sized HTTP/2 SETTINGS frame. Per RFC "A SETTINGS frame MUST be sent by both endpoints at the start of a connection" and so this seems like a fairly reliable indicator of an unexpected HTTP/2 response in an HTTP/1.1 context.
Thanks to @mnordhoff for the detailed notes (and RFC refs) in #3416 It made this a lot easier!
Resolves#3416.
Often folks will mis-configure their webserver to send an HTTP redirect missing
a `/' between the FQDN and the path.
E.g. in Apache using:
Redirect / https://bad-redirect.org
Instead of:
Redirect / https://bad-redirect.org/
Will produce an invalid HTTP-01 redirect target like:
https://bad-redirect.org.well-known/acme-challenge/xxxx
This happens frequently enough we want to return a distinct error message
for this case by detecting the redirect targets ending in ".well-known".
After the "Simple HTTP-01" code landed this case was previously getting an error
message of the form:
> "Invalid hostname in redirect target, must end in IANA registered TLD"
Resolves https://github.com/letsencrypt/boulder/issues/3606
When I introduced the new HTTP-01 code I did it in `va/http.go` intending to try and make the very large `va.go` file a little bit smaller. This is the continuation of that work.
* f96ad92 - moves remaining HTTP-01 specific code to `va/http.go`.
* 1efb9a1 - moves TLS-ALPN-01 code into `va/tlsalpn.go`.
* 95ea567 - moves DNS-01 code into `va/dns.go`.
* 6ff0395 - moves unit tests from `va/va_test.go` into `va/http_test.go`, `va/tlsalpn_test.go` and `va/dns_test.go`.
In the end `va/va.go` contains code related to metrics, top level RPCs (e.g. `PerformValidation`), and the multi-VA code. This makes the file lengths much more manageable overall.
Note: There is certainly room for cleaning up some of the older unit test cruft from `va/va_test.go`. For now I only moved it as-is into the challenge specific test files.
Precursor to #4116. Since some of our dependencies impose a minimum
version on these two packages higher than what we have in Godeps, we'll
have to bump them anyhow. Bumping them independently of the modules
update should keep things a little simpler.
In order to get protobuf tests to pass, I had to update protoc-gen-go in
boulder-tools. Now we download a prebuilt binary instead of using the
Ubuntu package, which is stuck on 3.0.0. This also meant I needed to
re-generate our pb.go files, since the new version generates somewhat
different output.
This happens to change the tag for pbutil, but it's not a substantive change - they just added a tagged version where there was none.
$ go test github.com/miekg/dns/...
ok github.com/miekg/dns 4.675s
ok github.com/miekg/dns/dnsutil 0.003s
ok github.com/golang/protobuf/descriptor (cached)
ok github.com/golang/protobuf/jsonpb (cached)
? github.com/golang/protobuf/jsonpb/jsonpb_test_proto [no test files]
ok github.com/golang/protobuf/proto (cached)
? github.com/golang/protobuf/proto/proto3_proto [no test files]
? github.com/golang/protobuf/proto/test_proto [no test files]
ok github.com/golang/protobuf/protoc-gen-go (cached)
? github.com/golang/protobuf/protoc-gen-go/descriptor [no test files]
ok github.com/golang/protobuf/protoc-gen-go/generator (cached)
ok github.com/golang/protobuf/protoc-gen-go/generator/internal/remap (cached)
? github.com/golang/protobuf/protoc-gen-go/grpc [no test files]
? github.com/golang/protobuf/protoc-gen-go/plugin [no test files]
ok github.com/golang/protobuf/ptypes (cached)
? github.com/golang/protobuf/ptypes/any [no test files]
? github.com/golang/protobuf/ptypes/duration [no test files]
? github.com/golang/protobuf/ptypes/empty [no test files]
? github.com/golang/protobuf/ptypes/struct [no test files]
? github.com/golang/protobuf/ptypes/timestamp [no test files]
? github.com/golang/protobuf/ptypes/wrappers [no test files]
Previously we relied on each instance of `features.Set` to have a
corresponding `defer features.Reset()`. If we forget that, we can wind
up with unexpected behavior where features set in one test case leak
into another test case. This led to the bug in
https://github.com/letsencrypt/boulder/issues/4118 going undetected.
Fix#4120
* `EnforceMultiVA` to allow configuring multiple VAs but not changing the primary VA's result based on what the remote VAs return.
* `MultiVAFullResults` to allow collecting all of the remote VA results. When all results are collected a JSON log line with the differential between the primary/remote VAs is logged.
Resolves https://github.com/letsencrypt/boulder/issues/4066
* A redirect without a hostname is obviously bad and should get
a distinct error message as early as possible.
* A redirect to a hostname that doesn't end in an IANA registered TLD is
also obviously bad and should get a distinct error message as early as
possible.
* Remove the challenge whitelist
* Reduce the signature for ChallengesFor and ChallengeTypeEnabled
* Some unit tests in the VA were changed from testing TLS-SNI to testing the same behavior
in TLS-ALPN, when that behavior wasn't already tested. For instance timeouts during connect
are now tested.
Fixes#4109
Our integration test test_http_challenge_timeout occasionally fails with
boulder-ra [AUDIT] Could not communicate with VA: rpc
error: code = DeadlineExceeded desc = context deadline exceeded
In at least one of these cases, the VA correctly timed-out its HTTP
request and logged a validation error with the correct error message.
I believe that there is a race between the VA returning its validation
error to the RA, and the RA timing out its gRPC call. By shaving some
time off the context we should more reliably get the response back to
the RA.
The order the primary VA calls `PerformValidation` on configured
remote VAs is also changed to be done in a random order.
Resolves#4087
We're only using the simplified HTTP-01 code from `va/http.go` now 🎉 The old unit tests that still seem relevant are left in place in `va/va_test.go` instead of being moved to `va/http_test.go` to signal that they're a bit crufty and could probably use a separate cleanup. For now I'm hesitant to remove test coverage so I updated them in-place without moving them to a new home.
Resolves https://github.com/letsencrypt/boulder/issues/4089
When this test fails, it logs the fact that it got the wrong type of
ProblemDetail, but not what the actual ProblemDetail was. Fixing that
will make it easier to track down intermittent failures.
* in boulder-ra we connected to the publisher and created a publisher gRPC client twice for no apparent reason
* in the SA we ignored errors from `getChallenges` in `GetAuthorizations` which could result in a nil challenge being returned in an authorization
The `singleDialTimeout` field was previously a global `const` in the
`va` package. Making it a field of the VA impl (and the dialer structs)
makes it easier to test that it is working as expected with a smaller
than normal value.
A new `TestPreresolvedDialerTimeout` unit test is added that tests the
fix from https://github.com/letsencrypt/boulder/pull/4046
Without the fix applied:
```
=== RUN TestPreresolvedDialerTimeout
--- FAIL: TestPreresolvedDialerTimeout (0.49s)
http_test.go:86: fetch didn't timeout after 50ms
FAIL
FAIL github.com/letsencrypt/boulder/va 0.512s
```
With the fix applied:
```
=== RUN TestPreresolvedDialerTimeout
--- PASS: TestPreresolvedDialerTimeout (0.05s)
PASS
ok github.com/letsencrypt/boulder/va 1.075s
```
The `DefaultTransport`'s `DialContext` sets a `Timeout` and `KeepAlive`
of 30 seconds. When configured with the `SimplifiedVAHTTP` feature flag
we need to use a shorter `Timeout`.
The URL construction approach we were previously using for the refactored VA HTTP-01 validation code was nice but broke SNI for HTTP->HTTPS redirects. In order to preserve this functionality we need to use a custom `DialContext` handler on the HTTP Transport that overrides the target host to use a pre-resolved IP.
Resolves https://github.com/letsencrypt/boulder/issues/3969
Adds a feature which gates creation of authorizations following the style required for the new schema (and which can be used for gating the reset of our new schema code later down the road).
There was an internal discussion about an issue this creates regarding a predictable ordering of challenges within a challenge due to sequential challenge IDs which will always be static for each challenge type. It was suggested we could add some kind of obfuscation to the challenge ID when presented to the user to prevent this. This hasn't been done in this PR as it would only be focused in the WFE and would be better suited as its own changeset.
Fixes#3981.
When the `SimplifiedVAHTTP01` feature flag is enabled we need to
preserve query parameters when reconstructing a redirect URL for the
resolved IP address.
To add integration testing for this condition the Boulder tools images
are updated to in turn pull in an updated `pebble-challtestsrv` command
that tracks request history.
A new Python wrapper for the `pebble-challtestsrv` HTTP API is added to
centralize interacting with the chall test srv to add mock data and to
get the history of HTTP requests that have been processed.
Marshaling invalid UTF-8 strings to protocol buffers causes an error. This can
happen in VA `PerformValidation` RPC responses if remote servers return invalid
UTF-8 in some ACME challenge contexts. We previously fixed this for HTTP-01 and
DNS-01 but missed a case where TLS-ALPN-01/TLS-SNI-01 challenge response
certificate content was included in error messages without replacing invalid
UTF-8. That's now fixed & unit tests are added.
To aid in diagnosing any future instances the VA is also updated to proactively
attempt to marshal its `PerformValidation` results before handing off to the RPC
wrappers that will do the same. This way if we detect an error in marshaling the
VA can audit log the escaped content for investigation purposes.
Hopefully with these two efforts combined we can avoid any future VA RPC errors
from UTF-8 encoding.
Resolves https://github.com/letsencrypt/boulder/issues/3838
To complete https://github.com/letsencrypt/boulder/issues/3956 the `challtestsrv` is updated such that its existing TLS-ALPN-01 challenge test server will serve HTTP-01 responses with a self-signed certificate when a non-TLS-ALPN-01 request arrives. This lets the TLS-ALPN-01 challenge server double as a HTTPS version of the HTTP challenge server. The `challtestsrv` now also supports adding/remove redirects that will be served to clients when requesting matching paths.
The existing chisel/chisel2 integration tests are updated to use the `challtestsrv` instead of starting their own standalone servers. This centralizes our mock challenge responses and lets us bind the `challtestsrv` to the VA's HTTP port in `startservers.py` without clashing ports later on.
New integration tests are added for HTTP-01 redirect scenarios using the updated `challtestserv`. These test cases cover:
* valid HTTP -> HTTP redirect
* valid HTTP -> HTTPS redirect
* Invalid HTTP -> non-HTTP/HTTPS port redirect
* Invalid HTTP-> non-HTTP/HTTPS protocol scheme redirect
* Invalid HTTP-> bare IP redirect
* Invalid HTTP redirect loop
The new integration tests shook out two fixes that were required for the legacy VA HTTP-01 code (afad22b) and one fix for the challtestsrv mock DNS (59b7d6d).
Resolves https://github.com/letsencrypt/boulder/issues/3956
The VA code activated by the `SimplifiedHTTP01` flag had a regression
that constructed validation URLs for HTTP-01 challenge redirects to
a HTTPS host incorrectly.
The regression is fixed and the unit tests are updated to cover this
case. An integration test is not included in this commit but will be
done as follow-up work since it requires adjusting existing integration
tests for HTTP-01 to use the challtestrv instead of Certbot's ACME
standalone server.
Continued bugs from the custom dialer approach used by the VA for HTTP-01 (most recently https://github.com/letsencrypt/boulder/issues/3889) motivated a rewrite.
Instead of using a custom dialer to be able to control DNS resolution for HTTP validation requests we can construct URLs for the IP addresses we resolve and overload the Host header. This avoids having to do address resolution within the dialer and eliminates the complexity of the dialer `addrInfoChan`. The only thing left for our custom dialer now is to shave some time off of the provided context to help us discern timeouts before/after connect.
The existing IP preference & fallback behaviour is preserved: e.g. if a host has both IPv6 and IPv4 addresses we connect to the first IPv6 address. If there is a network error connecting to that address (e.g. an error during "dial"), we try once more with the first IPv4 address. No other retries are done. Matching existing behaviour no fallback is done for HTTP level failures on an IPv6 address (e.g. mismatched webroots, redirect loops, etc). A new Prometheus counter "http01_fallbacks" is used to keep track of the number of fallbacks performed.
As a result of moving the layer at which the retry happens a fallback like described above will now produce two validation records: one for the initial IPv6 connection, and one for the IPv4 connection. Neither will have the "addressesTried" field populated, just "addressesResolved" and "addressUsed". Previously with the dialer doing the retry we would have created just one validation record with an IPv4 "addressUsed" field and both an IPv6 and IPv4 address in the "addressesTried" field.
Because this is a big diff for a key part of the VA the new code is gated by the `SimplifiedVAHTTP` feature flag.
Resolves#3889
In the VA, we were rendering a Duration to JSON, which gave an integer
number of nanoseconds rather than a float64 of seconds. Also, in both VA
and WFE we were rendering way more precision than we needed. Millisecond
precision is enough, and since we log latency for every WFE response,
the extra bytes are worth saving.
Previously, if a TLS handshake timed out, we would block forever in
`conn.Handshake()`, leaking both a TCP connection and a goroutine.
This sets a deadline on the underlying TCP connection, ensuring that
`conn.Handshake()` eventually times out.
Fixes#3915
Removes the checks for a handful of deployed feature flags in preparation for removing the flags entirely. Also moves all of the currently deprecated flags to a separate section of the flags list so they can be more easily removed once purged from production configs.
Fixes#3880.
For HTTP-01 challenges that return incorrect responses, the
VA tries to put the first little bit of the HTTP response in the problem
detail.
However, VA needs to be able to serialize the problem detail as a
protobuf to send it to the RA, and protobufs require string types to be
UTF-8. Filter out any invalid UTF-8 sequences and replace them with
REPLACEMENT CHARACTER.
This switches from whitespace to semi-colon separated tag/value parameters,
while implementing stricter checks on valid tag and value values (to match
the RFC). Test coverage is added for CAA value parameter parsing, along with
some additional tests for CAA records with multiple parameter values.
Fixes issue #3795.
If the context provided to a remote VA's `PerformValidation` is
cancelled we should not treat the returned context cancelled error as an
unexpected error and should instead ignore it as an expected result.
Previously, if a remote VA returned an error that is not a ProblemDetail, the
primary VA would log a ServerInternalProblem but not the underlying error.
This commit updates performRemoteValidation to always return the full error it
receives from a remote VA.
This commit also adds a unittest that checks that the VA still returns a
ServerInternalProblem to the RA, and that the VA audit logs the underlying
error.
Resolves https://github.com/letsencrypt/boulder/issues/3753
This adds support for the account-uri CAA parameter as specified by
section 3 of https://tools.ietf.org/html/draft-ietf-acme-caa-04, allowing
issuance to be restricted to one or more ACME accounts as specified by CAA
records.
This commit updates the VA's checkCAA function to include the provided
challengeType parameter as part of the audit log line for the CAA
check result. If challengeType == nil then the value "none" is logged.
Unit tests are updated accordingly.
This change will make it easier to distinguish cases where "Valid for
issuance" is false because of a validation-methods restriction.
Resolves#3740
When performing CAA checking respect the validation-methods parameter (if
present) and restrict the allowed authorization methods to those specified.
This allows a domain to restrict authorization methods that can be used with
Let's Encrypt.
This is largely based on PR #3003 (by @lukaslihotzki), which was landed and
then later reverted due to issue #3143. The bug the resulted in the previous
code being reverted has been addressed (likely inadvertently) by 76973d0f.
This implementation also includes integration tests for CAA validation-methods.
Fixes issue #3143.
This is a quick first pass at audit logging the *dns.CAA records in JSON format from within the VA's IsCAAValid function. This will provide more information for post-hoc analysis of CAA decisions.
Per RFC 6844 Section 5.1 the matching of CAA tag identifiers (e.g.
"Issue") is case insensitive. This commit updates the CAA tag processing
to be case insensitive as required by the RFC.
To exercise the fix this commit adds a test case to the `caaMockDNS`
`LookupCAA` implementation for a hostname (`mixedcase.com`) that has
a CAA record with a mixed case `Issue` tag. Prior to the fix from this
branch being included in `va/caa.go` the test fails:
```
--- FAIL: TestCAAChecking/Bad_(Reserved,_Mixed_case_Issue) (0.00s)
caa_test.go:292: checkCAARecords validity mismatch for
mixedcase.com: got true expected false
```
With the fix applied, the test passes.
Remove various unnecessary uses of fmt.Sprintf - in particular:
- Avoid calls like t.Error(fmt.Sprintf(...)), where t.Errorf can be used directly.
- Use strconv when converting an integer to a string, rather than using
fmt.Sprintf("%d", ...). This is simpler and can also detect type errors at
compile time.
- Instead of using x.Write([]byte(fmt.Sprintf(...))), use fmt.Fprintf(x, ...).
Many of the probs.XYZ calls are of the form probs.XYZ(fmt.Sprintf(...)).
Convert these functions to take a format string and optional arguments,
following the same pattern used in the errors package. Convert the
various call sites to remove the now redundant fmt.Sprintf calls.
A very large number of the logger calls are of the form log.Function(fmt.Sprintf(...)).
Rather than sprinkling fmt.Sprintf at every logger call site, provide formatting versions
of the logger functions and call these directly with the format and arguments.
While here remove some unnecessary trailing newlines and calls to String/Error.
This change cleans up how `va.http01Dialer` works with regards to `core.ValidationRecord`s. Instead of using the record as both an input and a output it now uses a set of inputs and outputs information about addresses via a channel. The validation record is then constructed in the parent scope or in the redirect function instead of the dialer itself.
Fixes#2730, fixes#3109, and fixes#3663.
This field was set to singleDialTimeout, but the net/http library treats
it as covering all of dial, write headers, and read headers and body.
Since http01Dialer also uses singleDialTimeout, there's a race between
http01Dialer and net/http to see who will time out first. The result is
that sometimes we give "Timeout after connect" when the error really
should be "Timeout during connect." This issue also inhibits IPv6 to
IPv4 fallback, and tickles a data race that was causing a rare panic in
VA: https://github.com/letsencrypt/boulder/issues/3109.
After this change, the overall HTTP request will get the full deadline
allowed by the RPC context. The dialer will continue to use
singleDialTimeout for each of its two possible dial attempts.
This allows us to have fast-running unittests without modifying the global state in singleDialTimeout,
which can become a const.
Fixes#3628.
Builds on top of #3629, review that first.
In particular, differentiate timeouts during connect (which are usually a firewall problem) from timeouts after connect (which are usually a software problem). In the process, refactor the tests and add testing for specific problem detail messages.
This also switches over the HTTP challenge's dialer to use DialContext, and to shave a little bit of headroom off of the context deadline, so that the dial can report its timeout before the overall context expires, which would lead to an overly generic "deadline exceeded" error, which would then get translated (incorrectly) into a "timeout after connect."
There is an additional error case, Timeout during %s (your server may be slow or overloaded), (where %s can be read or write) which doesn't have any unittests. I believe it may not be possible to trigger this, since read and write timeouts get subsumed by the HTTP or TLS library, but it's worth having as a fallback case. We'll see if it shows up in the logs.
Among the test refactorings, I shortened the timeout on the TLS timeout test to 50ms. Previously this was the long pole making the whole test take 10s. Now it takes ~500 ms overall.
I recommend starting review at https://github.com/letsencrypt/boulder/compare/detailed-va-errors?expand=1#diff-4c51d1d7ca3ec3022d14b42809af0d7eR671 (the changes to detailedError), then reviewing the Dial -> DialContext changes, then the tests.
Also instead of repeating the same bucket definitions everywhere just use a single top level var in the metrics package in order to discourage copy/pasting.
Fixes#3607.
Right now we check safe browsing at new-authz time, which introduces a possible
external dependency when calling new-authz. This is usually fine, since most safe
browsing checks can be satisfied locally, but when requests have to go external,
it can create variance in new-authz timing.
Fixes#3491.
This commit updates the RA to make the notion of submitting
a KeyAuthorization value as part of the ra.UpdateAuthorization call
optional. If set, the value is enforced against expected and an error is
returned if the provided authorization isn't correct. If it isn't set
the RA populates the field with the computed authorization for the VA to
enforce against the value it sees in challenges. This retains the legacy
behaviour of the V1 API. The V2 API will never unmarshal a provided
key authorization.
The ACMEv2/WFEv2 prepChallengeForDisplay function is updated to strip
the ProvidedKeyAuthorization field before sending the challenge object
back to a client. ACMEv1/WFEv1 continue to return the KeyAuthorization
in challenges to avoid breaking clients that are relying on this legacy
behaviour.
For deployability ease this commit retains the name of the
core.Challenge.ProvidedKeyAuthorization field even though it should
be called core.Challenge.ComputedKeyAuthorization now that it isn't
set based on the client's provided key authz. This will be easier as
a follow-up change.
Resolves#3514
This code was never enabled in production. Our original intent was to
ship this as part of the ACMEv2 API. Before that could happen flaws were
identified in TLS-SNI-01|02 that resulted in TLS-SNI-02 being removed
from the ACME protocol. We won't ever be enabling this code and so we
might as well remove it.
Before this change, we would just log "Correct value not found for DNS challenge"
when we got a TXT record that didn't match what we expected. This was different
from the error when no TXT records were found at all, but viewing the error out of
context doesn't make that clear. This change improves the error to specifically say
that we found a TXT record, but it was the wrong one.
Also in this change: if we found multiple TXT records, we mention the number;
and we trim the length of the echoed TXT record.
This commit implements RFC 6844's description of the "CAA issuewild
property" for CAA records.
We check CAA in two places: at the time of validation, and at the time
of issuance when an authorization is more than 8hours old. Both
locations have been updated to properly enforce issuewild when checking
CAA for a domain corresponding to a wildcard name in a certificate
order.
Resolves https://github.com/letsencrypt/boulder/issues/3211
This PR implements issuance for wildcard names in the V2 order flow. By policy, pending authorizations for wildcard names only receive a DNS-01 challenge for the base domain. We do not re-use authorizations for the base domain that do not come from a previous wildcard issuance (e.g. a normal authorization for example.com turned valid by way of a DNS-01 challenge will not be reused for a *.example.com order).
The wildcard prefix is stripped off of the authorization identifier value in two places:
When presenting the authorization to the user - ACME forbids having a wildcard character in an authorization identifier.
When performing validation - We validate the base domain name without the *. prefix.
This PR is largely a rewrite/extension of #3231. Instead of using a pseudo-challenge-type (DNS-01-Wildcard) to indicate an authorization & identifier correspond to the base name of a wildcard order name we instead allow the identifier to take the wildcard order name with the *. prefix.
This PR changes the VA's singleDialTimeout value from 5 * time.Second to 10 * time.Second. This will give slower servers a better chance to respond, especially for the multi-VA case where n requests arrive ~simultaneously.
This PR also bumps the RA->VA timeout by 5s and the WFE->RA timeout by 5s to accommodate the increased dial timeout. I put this in a separate commit in case we'd rather deal with this separately.
Updates the buckets for histograms in the publisher, va, and expiration-mailer which are used to measure the latency of operations that go over the internet and therefore are liable to take a lot longer than the default buckets can measure. Uses a standard set of buckets for all three instead of attempting to tune for each one.
Fixes#3217.
This pulls in google/safebrowsing#74, which introduces a new LookupURLsContext that allows us to pass through timeout information nicely.
Also, update calling code to use LookupURLsContext instead of LookupURLs.
This commit adds CAA `issue` paramter parsing and the `challenge` parameter to permit a single challenge type only. By setting `challenge=dns-01`, the nameserver keeps control over every issued certificate.
Prior to this commit the VA would follow redirects from the initial
HTTP-01 challenge request on port 80 to any other port. In practice the
Let's Encrypt production environment has network egress firewall rules
that drop outbound requests that are not on port 80 or 443. In effect
this meant any challenge request that was redirected from 80 to a port
other than 80/443 was turned into a mysterious connection timeout error.
We have decided to preserve the egress firewall rule and continue to act
conservatively. Only port 80 and 443 should be allowed in redirects.
This commit updates the VA to return a clear error message when
a non-80/443 redirect is made.
To aid in testing/configuration the actual ports enforced are specified
by the va.httpPort and va.httpsPort that are used for the initial
outbound HTTP-01 connection.
The VA TestHTTPRedirectLookup unit test is updated accordingly to test
that a non-80/443 redirect fails with the expected message.
Resolves#3049
For certificates with many domains it can be difficult to associate
a given CAA error with the specific domain that caused it. To make this
easier this commit explicitly prefixes all of the problems that can be
returned from `va.IsCAAValid` with the domain name in question.
A small unit test is included to check a CAA problem's detail message is
suitably prefixed with the affected domain.
There were a bunch of test fixtures in bdns/mocks.go that were only used in va/caa_test.go. This moves them to be in the same file so we have less spooky action at a distance.
One side-effect: We can't construct bdns.DNSError with the internal fields we want, because those fields are unexported. So we switch a couple of mock cases to just return a generic error, and the corresponding test cases to expect that error.
Previously, CAA problems were lumped in under "ConnectionProblem" or
"Unauthorized". This should make things clearer and easier to differentiate.
Fixes#3043
This implements the pre-erratum 5065 version of CAA, behind a feature flag.
This involved refactoring DNSClient.LookupCAA to return a list of CNAMEs in addition to the CAA records, and adding an alternate lookuper that does tree-climbing on single-depth aliases.
Add a logging statement that fires when a remote VA fail causes
overall failure. Also change remoteValidationFailures into a
counter that counts the same thing, instead of a histogram. Since
the histogram had the default bucket sizes, it failed to collect
what we needed, and produced more metrics than necessary.
To support having problem types that use either the classic
"urn:acme:error" namespace or the new "urn:ietf:params:acme:error"
namespace as appropriate we need to prefix the problem type at runtime
right before returning it through the WFE to the user as JSON. This
commit updates the WFE/WFE2 to do this for both problems sent through
sendError as well as problems embedded in challenges. For the latter
we do not modify problems with a type that is already prefixed to
support backwards compatibility.
Resolves#2938
Note: We should cut a follow-up issue to devise a way to share some
common code between the WFE and WFE2. For example, the
prepChallengeForDisplay should probably be hoisted to a common
"web" package
Fixes#2889.
VA now implements two gRPC services: VA and CAA. These both run on the same port, but this allows implementation of the IsCAAValid RPC to skip using the gRPC wrappers, and makes it easier to potentially separate the service into its own package in the future.
RA.NewCertificate now checks the expiration times of authorizations, and will call out to VA to recheck CAA for those authorizations that were not validated recently enough.
va.go is quite a large file. This splits out the CAA-related code and tests into its own file for simplicity. This is a simple move; no code has been changed, and there is no package split.
The VA test had a global:
`var ident = core.AcmeIdentifier{Type: core.IdentifierDNS, Value: "localhost"}`
Evidently this was meant as a convenience to avoid having to retype this common value, but it wound up being mutated independently by different tests. This PR replaces it with a convenience function `dnsi()` that generates a DNS-type identifier with the given hostname. Makes the VA test much more reliable locally.
This commit updates the VA's `IsSafeDomain` RPC to treat errors from the
Google Safe Browsing client as a positive response. Subsequently the VA
will only block authz creation in the case that the GSB API returns
a true negative (e.g. confirms an unsafe domain). If the database is in
an inconsistent state due to an API outage we will allow the authz to be
created.
This commit replaces the Boulder dependency on
gopkg.in/square/go-jose.v1 with gopkg.in/square/go-jose.v2. This is
necessary both to stay in front of bitrot and because the ACME v2 work
will require a feature from go-jose.v2 for JWS validation.
The largest part of this diff is cosmetic changes:
Changing import paths
jose.JsonWebKey -> jose.JSONWebKey
jose.JsonWebSignature -> jose.JSONWebSignature
jose.JoseHeader -> jose.Header
Some more significant changes were caused by updates in the API for
for creating new jose.Signer instances. Previously we constructed
these with jose.NewSigner(algorithm, key). Now these are created with
jose.NewSigner(jose.SigningKey{},jose.SignerOptions{}). At present all
signers specify EmbedJWK: true but this will likely change with
follow-up ACME V2 work.
Another change was the removal of the jose.LoadPrivateKey function
that the wfe tests relied on. The jose v2 API removed these functions,
moving them to a cmd's main package where we can't easily import them.
This function was reimplemented in the WFE's test code & updated to fail
fast rather than return errors.
Per CONTRIBUTING.md I have verified the go-jose.v2 tests at the imported
commit pass:
ok gopkg.in/square/go-jose.v2 14.771s
ok gopkg.in/square/go-jose.v2/cipher 0.025s
? gopkg.in/square/go-jose.v2/jose-util [no test files]
ok gopkg.in/square/go-jose.v2/json 1.230s
ok gopkg.in/square/go-jose.v2/jwt 0.073s
Resolves#2880
Fixes#639.
This resolves something that has bugged me for two+ years, our DNSResolverImpl is not a DNS resolver, it is a DNS client. This change just makes that obvious.
If we hit a `syscall.ECONNRESET` error return a more useful error than `Error getting validation data`, updates the `TestDetailedError` test to cover this case.
Fixes#2851.
The implementation of the dialer used by the HTTP01 challenge, constructed with `resolveAndConstructDialer`, used the same wrapped `net.Dialer` for both the initial IPv6 connection, and any subsequent IPv4 fallback connections. This caused the IPv4 fallback to never succeed for cases where the initial IPv6 connection expended the `validationTimeout`.
This commit updates the http01Dialer (newly renamed from `dialer` since it is in fact specific to HTTP01 challenges) to use a fresh dialer for each connection. To facilitate testing the http01Dialer maintains
a count of how many dialer instances it has constructed. We use this in a unit test to ensure the correct behaviour without a great deal of new mocking/interfaces.
Resolves#2770
This commit improves the rather vague error message that was previously returned if an IPv6 challenge validation failed when IPv6First was enabled and there were no IPv4 addresses left to try as a fallback.
Resolves#2821
Adds basic multi-path validation functionality. A new method `performRemoteValidation` is added to `boulder-va` which is called if it is configured with a list of remote VA gRPC addresses. In this initial implementation the remote VAs are only used to check the validation result of the main VA, if all of the remote validations succeed but the local validation failed, the overall validation will still fail. Remote VAs use the exact same code as the local VA to perform validation. If the local validation succeeds then a configured quorum of the remote VA successes must be met in order to fully complete the validation.
This implementation assumes that metrics are collected from the remote VAs in order to have visibility into their individual validation latencies etc.
Fixes#2621.
Previously, a lot of validations problems would give the message "Failed to
connect to X for ..." This was misleading because the issue was not always a
connection error, and when it was, it was valuable to distinguish between
connection refused and timeout. Also, for HTTP, this message would echo the
first URL in a redirect chain, when we really want the URL that failed.
Renames the misleading "parseHTTPConnError" and removes an inaccurate
check for temporary errors. It also eliminates the "detail" argument, instead
generating all messages inside the function.
Improves the handling of tls.alert errors to actually pass through the error
message, rather than just quietly changing the problem type (which was very
easy to miss).
Gives a specific error message for timeouts.
Preserves the URL from url.Error types and incorporates it into error messages.
Splits the HTTP timeout test into its own test case.
Following up on #2752, we don't need to use global vars for our Prometheus stats. We already have a custom registry plumbed through using Scope objects. In this PR, expose the MustRegister method of that registry through the Scope interface, and move existing global vars to be fields of objects. This should improve testability somewhat.
Note that this has a bit of an unfortunate side effect: two instances of the same stats-using class (e.g. VA) can't use the same Scope object, because their MustRegister calls will conflict. In practice this is fine since we never instantiate duplicates of the the classes that use stats, but it's something we should keep an eye on.
Updates #2733
When the VA validates a domain, it runs both CAA check and validation check in
parallel. If the validation check returns a failed result before the CAA check is done,
the VA returns immediately. However, the CAA check increments a stat. So there's
a flaky condition where, depending on the timing, that stat increment may or may
not get called on the MockScope. Add `.AnyTimes()` to our mock statements to
indicate we don't care whether Inc gets called.
This removes the config and code to output to statsd.
- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.
Fixes#2639, #2733.
This PR introduces a new feature flag "IPv6First".
When the "IPv6First" feature is enabled the VA's HTTP dialer and TLS SNI
(01 and 02) certificate fetch requests will attempt to automatically
retry when the initial connection was to IPv6 and there is an IPv4
address available to retry with.
This resolves https://github.com/letsencrypt/boulder/issues/2623
This change changes the returning values from boolean to error.
It makes `checkConsistency` an internal function and removes the
optional argument in favor of making checks explicit where they are
used.
It also renames those functions to CheckConsistency* to not
give the impression of still returning boolean values.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Rename HTTPMonitor to MeasuredHandler.
Remove inflight stat (we didn't use it).
Add timing stat by method, endpoint, and status code.
The timing stat subsumes the "rate" stat, so remove that.
WFE now wraps in MeasuredHandler, instead of relying on its cmd/main.go.
Remove FBAdapter stats.
MeasuredHandler tracks stats by method, status code, and endpoint.
In VA, add a Prometheus histogram for validation timing.
Updates the various gRPC/protobuf libs (google.golang.org/grpc/... and github.com/golang/protobuf/proto) and the boulder-tools image so that we can update to the newest github.com/grpc-ecosystem/go-grpc-prometheus. Also regenerates all of the protobuf definition files.
Tests run on updated packages all pass.
Unblocks #2633fixes#2636.
This patch removes all usages of the `core.XXXError` and almost all usages of `probs` outside of the WFE and VA and replaces them with a unified internal error type. Since the VA uses `probs.ProblemDetails` quite extensively in challenges, and currently stores them in the DB I've saved this change for another change (it'll also require a migration). Since `ProblemDetails` should only ever be exposed to end-users all of its related logic should be moved into the `WFE` but since it still needs to be exposed to the VA and SA I've left it in place for now.
The new internal `errors` package offers the same convenience functions as `probs` does as well as a new simpler type testing method. A few small changes have also been made to error messages, mainly adding the library and function name to internal server errors for easier debugging (i.e. where a number of functions return the exact same errors and there is no other way to distinguish which method threw the error).
Also adds proper encoding of internal errors transferred over gRPC (the current encoding scheme is kept for `core` and `probs` errors since it'll be ideally be removed after we deploy this and follow-up changes) using `grpc/metadata` instead of the gRPC status codes.
Fixes#2507. Updates #2254 and #2505.
I think these are all the necessary changes to implement TLS-SNI-02 validations, according to the section 7.3 of draft 05:
https://tools.ietf.org/html/draft-ietf-acme-acme-05#section-7.3
I don't have much experience with this code, I'll really appreciate your feedback.
Signed-off-by: David Calavera <david.calavera@gmail.com>
The VA's `getAddr` function prior to this commit had an outdated comment
& a pointer to a TODO for Boulder Issue #593. That issue has been closed
and bdns' `LookupHost` supports AAAA records now. This commit updates
the comment to match the current behaviour and removes the TODO.
Previous to this PR the VA's validateTLSWithZName function would
return an error message containing the SAN names of the leaf certificate
when the validation failed. This commit updates that message to include
the Subject Common Name of the leaf cert in addition to the SANs. The
names are deduplicated to prevent listing a Subj CN twice if its also
a SAN. This will help debug cases where a cert with no SANs is returned
by the server.
In addition, the number of certificates in the chain received from the
server is included in the message. This will hopefully further help
users identify misconfiguration since a TLS SNI 01 challenge response
should have a chain length of 1.
Resolves#2468
If the VA fails to validate a TLS-SNI-01 challenge because it is trying to talk TLS to a HTTP-only server return a special error message that is slightly more informative.
Right now we are using a third-party client for the Google Safe Browsing API, but Google has recently released their own [Golang library](https://github.com/google/safebrowsing) which also supports the newer v4 API. Using this library will let us avoid fixing some lingering race conditions & unpleasantness in our fork of `go-safebrowsing-api`.
This PR adds support for using the Google library & the v4 API in place of our existing fork when the `GoogleSafeBrowsingV4` feature flag is enabled in the VA "features" configuration.
Resolves https://github.com/letsencrypt/boulder/issues/1863
Per `CONTRIBUTING.md` I also ran the unit tests for the new dependency:
```
daniel@XXXXXXXXXX:~/go/src/github.com/google/safebrowsing$ go test ./...
ok github.com/google/safebrowsing 3.274s
? github.com/google/safebrowsing/cmd/sblookup [no test files]
? github.com/google/safebrowsing/cmd/sbserver [no test files]
? github.com/google/safebrowsing/cmd/sbserver/statik [no test files]
? github.com/google/safebrowsing/internal/safebrowsing_proto [no test files]
ok github.com/google/safebrowsing/vendor/github.com/golang/protobuf/jsonpb 0.012s
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/jsonpb/jsonpb_test_proto [no test files]
ok github.com/google/safebrowsing/vendor/github.com/golang/protobuf/proto 0.062s
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/proto/proto3_proto [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/descriptor [no test files]
ok github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/generator 0.017s
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/grpc [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/plugin [no test files]
ok github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes 0.009s
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/any [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/duration [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/empty [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/struct [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/timestamp [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/wrappers [no test files]
? github.com/google/safebrowsing/vendor/github.com/rakyll/statik [no test files]
? github.com/google/safebrowsing/vendor/github.com/rakyll/statik/fs [no test files]
ok github.com/google/safebrowsing/vendor/golang.org/x/net/idna 0.003s
```
Presently when the VA performs a DNS-01 challenge verification it
returns the same error for the case where the remote nameserver had the
**wrong** TXT value, and when the remote nameserver had an **empty**
response for the TXT query. It would aid debugging if the user was told
which of the two failure cases was responsible for the overall challenge
failure.
This commit adds a unique error message for the empty TXT records case,
and a unit test/mock to exercise the new the error message.
Resolves#2326
This commit updates the `go-jose` dependency to [v1.1.0](https://github.com/square/go-jose/releases/tag/v1.1.0) (Commit: aa2e30fdd1fe9dd3394119af66451ae790d50e0d). Since the import path changed from `github.com/square/...` to `gopkg.in/square/go-jose.v1/` this means removing the old dep and adding the new one.
The upstream go-jose library added a `[]*x509.Certificate` member to the `JsonWebKey` struct that prevents us from using a direct equality test against two `JsonWebKey` instances. Instead we now must compare the inner `Key` members.
The `TestRegistrationContactUpdate` function from `ra_test.go` was updated to populate the `Key` members used in testing instead of only using KeyID's to allow the updated comparisons to work as intended.
The `Key` field of the `Registration` object was switched from `jose.JsonWebKey` to `*jose.JsonWebKey ` to make it easier to represent a registration w/o a Key versus using a value with a nil `JsonWebKey.Key`.
I verified the upstream unit tests pass per contributing.md:
```
daniel@XXXXX:~/go/src/gopkg.in/square/go-jose.v1$ git show
commit aa2e30fdd1fe9dd3394119af66451ae790d50e0d
Merge: 139276c e18a743
Author: Cedric Staub <cs@squareup.com>
Date: Thu Sep 22 17:08:11 2016 -0700
Merge branch 'master' into v1
* master:
Better docs explaining embedded JWKs
Reject invalid embedded public keys
Improve multi-recipient/multi-sig handling
daniel@XXXXX:~/go/src/gopkg.in/square/go-jose.v1$ go test ./...
ok gopkg.in/square/go-jose.v1 17.599s
ok gopkg.in/square/go-jose.v1/cipher 0.007s
? gopkg.in/square/go-jose.v1/jose-util [no test files]
ok gopkg.in/square/go-jose.v1/json 1.238s
```
Protobuf files need to be regenerated because (I think) Golang 1.7.3 uses a somewhat different method of ordering fields in a struct when marshaling to bytes.
* Switch back to go 1.5 in Travis.
* Add back GO15VENDOREXPERIMENT.
* Add GO15VENDOREXPERIMENT to Dockerfile
* Revert FAKE_DNS change.
* Revert "Properly close test servers (#2110)"
* Revert "Close VA HTTP test servers (#2111)"
* Change Godep version to 1.5.
* Standardize on issue number
The `letsencrypt/boulder-tools` image was recently updated, pulling in version
0.8.0 of certbot. That version stores the output of `certonly` requests in a
different path. In test.sh, we check out a specific tagged release of certbot in
order to get its integration tests. Prior to this commit, we were using
certbot 0.8.0 with the integration tests from version 0.6.0 of certbot,
which looked for `certonly` output in the wrong place, and failed.
This commit changes test.sh to checkout the 0.8.0 branch, and also removes a
temporary shim we used to make the `certbot` command call out to the
`letsencrypt` command.
Also, since the latest version of `letsencrypt/boulder-tools` includes an updated
`protoc-gen-go`, this change also updates the support packages to match.
While testing with real proxies I noticed the original CDR implementation was actually pretty broken, this refactors a bit and fixes a number of bugs. With this patch fallback to GPDNS over three distributed test proxies worked perfectly.
(Side note: `nginx` is not a viable forward proxy for this use as it doesn't support SSL, and a bunch of other _real_ forward proxy features, I ended up just using `squid3`.)
The main error in the previous implementation was the fallback was implemented in `getCAASet` which is only called in the old code path (the local CAA impl instead of the remote service) which mean't it wasn't actually being tested in the integration test. This also refactors a few repeated blocks into their own functions. Also there was a unicode encoding problem somewhere with the query string but for the life of me I can't figure out why it was broken now.
In https://github.com/letsencrypt/boulder/pull/774 we introduced and account key stored with the challenge. This was a stopgap fix to the now-defunct SimpleHTTP and DNS challenges in the face of https://mailarchive.ietf.org/arch/msg/acme/F71iz6qq1o_QPVhJCV4dqWf-4Yc. However, we no longer offer or implement those challenges, so the extra field is unnecessary. It also take up a huge amount of space in the challenges table, which is our biggest table. SimpleHTTP and DNS challenges were removed in https://github.com/letsencrypt/boulder/pull/1247.
We can provide a follow-up migration to delete the column later, once we have a plan for large migrations without downtime.
Fixes#1909
Clarifies the `UnknownHost` problem details error message created in the VA's `getAddr` when there is no valid IP address for the domain.
Previously this was reported as "No IPv4 addresses found for x" leading to user confusion (ref #1790) when a domain resolved to a private IP.
In this PR, logger is passed to the following callers:
NewWebFrontEndImpl
NewCertificateAuthorityImpl
NewValidationAuthorityImpl
NewAmqpRPCServer
newUpdater
NewRegistrationAuthorityServer
This reduces the usage of a global singleton logger and allows tests to consistently use a mock logger.
Fixes#1642
* remove blog.Get() in wfe
* remove blog.Get() from va
* remove Blog.Get() from ca
* remove blog.Get() from oscp updater, ampq rpc server, registration authority server
* removed some pointless logging code
* remove one added newline
* fix format issue
* fix setup function to return *blog.Mock instead of being passed in
* remove useless blog.NewMock() call
- Run both gRPC and AMQP servers simultaneously
- Take explicit constructor parameters and unexport fields that were previously set by users
- Remove transitional DomainCheck code in RA now that GSB is enabled.
- Remove some leftover UpdateValidation dummy methods.
Several of the `ProblemType`s had convenience functions to instantiate `ProblemDetails`s using their type and a detail message. Where these existed I did a quick scan of the codebase to convert places where callers were explicitly constructing the `ProblemDetails` to use the convenience function.
For the `ProblemType`s that did not have such a function, I created one and then converted callers to use it.
Solves #1837.
* Improve error messages in validation cases.
* Add challenge type to error detail.
This makes the errors easier to search for.
* %v->%s
* Revert boulder-config to master.
When a CAA request to Unbound times out, fall back to checking CAA via Google Public DNS' HTTPS API, through multiple proxies so as to hit geographically distributed paths. All successful multipath responses must be identical in order to succeed, and at most one can fail.
Fixes#1618
* Limit the length of logged HTTP response
Fixes#1777
* Fix crash, add tests
* Fix utf-8 truncation
* move more logic into helper method
* Add unit test for truncateBody
https://github.com/letsencrypt/boulder/pull/1778
* Fix newVARPC sanity check logic error
* Redo style of sanity check in RA to match VA
* Switch to ServerInternal(), readd log.Info
* Add tests to make sure correct metric measured
* comments on exported symbols
* review fixes
* build an array instead of storing last
* Fix go generate command in metrics.
The previous command only worked on OS X. This one works on Linux but not
OS X.
Also add generate phase of test.sh.
* Add mockgen to test setup.
* Fix github-pr-status output.
* Fix envvar style.
* Set xtrace.
* Fix test.sh
* Fix test.sh some more.
* Fix mockgen command.
* Add dependencies for running `go generate`.
* Add protoc-gen-go.
* Fix go get command.
* Fix generate.
* Wait for all.
* Fix generate.
* Update generated pb.
* Fix generate commands for vendored world.
* Update documentation for new vendor style.
* Update grpc package to latest.
* Update caaChecker proto with latest.
* Run go generate only over TESTPATHS
* See if Travis passes under 1.6
* Switch back to 1.5.
* Trim run command.
* Run stringer from correct directory.
* Move generate command.
* Restore and generate
* Fix path.
* list contents of GOPATH.
* Fix stringer by prebuilding.
* Try another import path.
* regenerate bcode_string.
* remove excess package
* pull jsha fork of protoc-gen-go that echoes
* Echo protoc version.
* install from source
* CD back.
* Go back to normal protoc-gen-go
* Fix path
* Move protobuf install into test/setup.sh
* Move before_install to install.
* Set PATH.
* Follow 301 with curl.
* Shuffle test order.
* Swap back test order.
* Restore all tests.
* Restore 1.5.3 to Travis.
* Remove unnecessary wait-or-exit
* Generate metrics mock with latest mockgen.
* Wrap TESTPATHS in curlies
* Remove spurious bracket
Because this event can be caused by end users, we don't want it higher than
Info. Also, it's redundant here: if this winds up being the gating factor, it
will show up in the "Validation result" log entry.
* Split out CAA checking service (minus logging etc)
* Add example.yml config + follow general Boulder style
* Update protobuf package to correct version
* Add grpc client to va
* Add TLS authentication in both directions for CAA client/server
* Remove go lint check
* Add bcodes package listing custom codes for Boulder
* Add very basic (pull-only) gRPC metrics to VA + caa-service
* Fix all errcheck errors
* Add errcheck to test.sh
* Add a new sa.Rollback method to make handling errors in rollbacks easier.
This also causes a behavior change in the VA. If a HTTP connection is
abruptly closed after serving the headers for a non-200 response, the
reported error will be the read failure instead of the non-200.
- Remove error signatures from log methods. This means fewer places where errcheck will show ignored errors.
- Pull in latest cfssl to be compatible with errorless log messages.
- Reduce the number of message priorities we support to just those we actually use.
- AuditNotice -> AuditInfo
- Remove InfoObject (only one use, switched to Info)
- Remove EmergencyExit and related functions in favor of panic
- Remove SyslogWriter / AuditLogger separate types in favor of a single interface, Logger, that has all the logging methods on it.
- Merge mock log into logger. This allows us to unexport the internals but still override them in the mock.
- Shorten names to be compatible with Go style: New, Set, Get, Logger, NewMock, etc.
- Use a shorter log format for stdout logs.
- Remove "... Starting" log messages. We have better information in the "Versions" message logged at startup.
Motivation: The AuditLogger / SyslogWriter distinction was confusing and exposed internals only necessary for tests. Some components accepted one type and some accepted the other. This made it hard to consistently use mock loggers in tests. Also, the unnecessarily fat interface for AuditLogger made it hard to meaningfully mock out.
Also, when a certificate already exists, treat that as info, not error.
Update mock logger to allow matching by log level, and fix WFE and VA tests
correspondingly.
Windows uses \r\n as a new line character. This change should fix
issues some Windows users are seeing when they include a new line
at the end of their challenge tokens.
The RPC call returns only the updated Challenge object instead of the
whole authz, because the VA shouldn't be allowed to update any of the
other fields of the Authorization object.
Also, stop skipping CAA lookups for the root TLDs. The RFC is unclear on
the desired behavior here, but the ICANNTLD function is nonstandard and
the behavior is strictly more conservative than what we had before.
This unblocks the removal of the ICANNTLD function, which allows us to
stop forking upstream.
Closes#1522
The CAA response checking method has been refactored to have a
easier to follow straight-line control flow. Several bugs in it have
been fixed:
- Firstly, parameters for issue and issuewild directives were not
parsed, so any attempt to specify parameters would result in
a string mismatch with the CA CAA identity (e.g. "letsencrypt.org").
Moreover, the syntax as specified permits leading and trailing
whitespace, so a parameter-free record such as
" letsencrypt.org ; " would not be considered a match.
This has been fixed by stripping whitespace and parameters. The RFC
does not specify the criticality of parameters, so unknown
parameters (currently all parameters) are considered noncritical.
I justify this as follows:
If someone decides to nominate a CA in a CAA record, they can,
with trivial research, determine what parameters, if any, that
CA supports, and presumably in trusting them in the first place
is able to adequately trust that the CA will continue to support
those parameters. The risk from other CAs is zero because other CAs
do not process the parameters because the records in which they
appear they do not relate to them.
- Previously, all of the flag bits were considered to effectively mean
'critical'. However, the RFC specifies that all bits except for the
actual critical bit (decimal 128) should be ignored. In practice,
many people have misunderstood the RFC to mean that the critical bit
is decimal 1, so both bits are interpreted to mean 'critical', but
this change ignores all of the other bits. This ensures that the
remaining six bits are reasonably usable for future standards action
if any need should arise.
- Previously, existence of an "issue" directive but no "issuewild"
directive was essentially equivalent to an unsatisfiable "issuewild"
directive, meaning that no wildcard identifiers could pass the CAA
check. This is contrary to the RFC, which states that issuewild
should default to what is specified for "issue" if no issuewild
directives are specified. (This is somewhat moot since boulder
doesn't currently support wildcard issuance.)
- Conversely, existence of an "issuewild" directive but no "issue"
directive would cause CAA validation for a non-wildcard identifier
to fail, which was contrary to the RFC. This has been fixed.
- More generally, existence of any unknown non-critical directive, say
"foobar", would cause the CAA checking code to act as though an
unsatisfiable "issue" directive existed, preventing any issuance.
This has been fixed.
Test coverage for corner cases is enhanced and provides regression
testing for these bugs.
statsd statistics have been added for tracking the relative frequency
of occurrence of different CAA features and outcomes. I added these on
a whim suspecting that they may be of interest.
Fixes#1436.
Previously we would return a detailed errorString, which ProblemDetailsFromDNSError
would turn into a generic, uninformative "Server failure at resolver".
Now we return a new internal dnsError type, which ProblemDetailsFromDNSError can
turn into a more informative message to be shown to the user.
This provides a means to add retries to DNS look ups, and, with some
future work, end retries early if our request deadline is blown. That
future work is tagged with #1292.
Updates #1258