This commit implements RFC 6844's description of the "CAA issuewild
property" for CAA records.
We check CAA in two places: at the time of validation, and at the time
of issuance when an authorization is more than 8hours old. Both
locations have been updated to properly enforce issuewild when checking
CAA for a domain corresponding to a wildcard name in a certificate
order.
Resolves https://github.com/letsencrypt/boulder/issues/3211
This PR implements issuance for wildcard names in the V2 order flow. By policy, pending authorizations for wildcard names only receive a DNS-01 challenge for the base domain. We do not re-use authorizations for the base domain that do not come from a previous wildcard issuance (e.g. a normal authorization for example.com turned valid by way of a DNS-01 challenge will not be reused for a *.example.com order).
The wildcard prefix is stripped off of the authorization identifier value in two places:
When presenting the authorization to the user - ACME forbids having a wildcard character in an authorization identifier.
When performing validation - We validate the base domain name without the *. prefix.
This PR is largely a rewrite/extension of #3231. Instead of using a pseudo-challenge-type (DNS-01-Wildcard) to indicate an authorization & identifier correspond to the base name of a wildcard order name we instead allow the identifier to take the wildcard order name with the *. prefix.
This PR changes the VA's singleDialTimeout value from 5 * time.Second to 10 * time.Second. This will give slower servers a better chance to respond, especially for the multi-VA case where n requests arrive ~simultaneously.
This PR also bumps the RA->VA timeout by 5s and the WFE->RA timeout by 5s to accommodate the increased dial timeout. I put this in a separate commit in case we'd rather deal with this separately.
Updates the buckets for histograms in the publisher, va, and expiration-mailer which are used to measure the latency of operations that go over the internet and therefore are liable to take a lot longer than the default buckets can measure. Uses a standard set of buckets for all three instead of attempting to tune for each one.
Fixes#3217.
This commit adds CAA `issue` paramter parsing and the `challenge` parameter to permit a single challenge type only. By setting `challenge=dns-01`, the nameserver keeps control over every issued certificate.
Prior to this commit the VA would follow redirects from the initial
HTTP-01 challenge request on port 80 to any other port. In practice the
Let's Encrypt production environment has network egress firewall rules
that drop outbound requests that are not on port 80 or 443. In effect
this meant any challenge request that was redirected from 80 to a port
other than 80/443 was turned into a mysterious connection timeout error.
We have decided to preserve the egress firewall rule and continue to act
conservatively. Only port 80 and 443 should be allowed in redirects.
This commit updates the VA to return a clear error message when
a non-80/443 redirect is made.
To aid in testing/configuration the actual ports enforced are specified
by the va.httpPort and va.httpsPort that are used for the initial
outbound HTTP-01 connection.
The VA TestHTTPRedirectLookup unit test is updated accordingly to test
that a non-80/443 redirect fails with the expected message.
Resolves#3049
Add a logging statement that fires when a remote VA fail causes
overall failure. Also change remoteValidationFailures into a
counter that counts the same thing, instead of a histogram. Since
the histogram had the default bucket sizes, it failed to collect
what we needed, and produced more metrics than necessary.
va.go is quite a large file. This splits out the CAA-related code and tests into its own file for simplicity. This is a simple move; no code has been changed, and there is no package split.
Fixes#639.
This resolves something that has bugged me for two+ years, our DNSResolverImpl is not a DNS resolver, it is a DNS client. This change just makes that obvious.
If we hit a `syscall.ECONNRESET` error return a more useful error than `Error getting validation data`, updates the `TestDetailedError` test to cover this case.
Fixes#2851.
The implementation of the dialer used by the HTTP01 challenge, constructed with `resolveAndConstructDialer`, used the same wrapped `net.Dialer` for both the initial IPv6 connection, and any subsequent IPv4 fallback connections. This caused the IPv4 fallback to never succeed for cases where the initial IPv6 connection expended the `validationTimeout`.
This commit updates the http01Dialer (newly renamed from `dialer` since it is in fact specific to HTTP01 challenges) to use a fresh dialer for each connection. To facilitate testing the http01Dialer maintains
a count of how many dialer instances it has constructed. We use this in a unit test to ensure the correct behaviour without a great deal of new mocking/interfaces.
Resolves#2770
This commit improves the rather vague error message that was previously returned if an IPv6 challenge validation failed when IPv6First was enabled and there were no IPv4 addresses left to try as a fallback.
Resolves#2821
Adds basic multi-path validation functionality. A new method `performRemoteValidation` is added to `boulder-va` which is called if it is configured with a list of remote VA gRPC addresses. In this initial implementation the remote VAs are only used to check the validation result of the main VA, if all of the remote validations succeed but the local validation failed, the overall validation will still fail. Remote VAs use the exact same code as the local VA to perform validation. If the local validation succeeds then a configured quorum of the remote VA successes must be met in order to fully complete the validation.
This implementation assumes that metrics are collected from the remote VAs in order to have visibility into their individual validation latencies etc.
Fixes#2621.
Previously, a lot of validations problems would give the message "Failed to
connect to X for ..." This was misleading because the issue was not always a
connection error, and when it was, it was valuable to distinguish between
connection refused and timeout. Also, for HTTP, this message would echo the
first URL in a redirect chain, when we really want the URL that failed.
Renames the misleading "parseHTTPConnError" and removes an inaccurate
check for temporary errors. It also eliminates the "detail" argument, instead
generating all messages inside the function.
Improves the handling of tls.alert errors to actually pass through the error
message, rather than just quietly changing the problem type (which was very
easy to miss).
Gives a specific error message for timeouts.
Preserves the URL from url.Error types and incorporates it into error messages.
Splits the HTTP timeout test into its own test case.
Following up on #2752, we don't need to use global vars for our Prometheus stats. We already have a custom registry plumbed through using Scope objects. In this PR, expose the MustRegister method of that registry through the Scope interface, and move existing global vars to be fields of objects. This should improve testability somewhat.
Note that this has a bit of an unfortunate side effect: two instances of the same stats-using class (e.g. VA) can't use the same Scope object, because their MustRegister calls will conflict. In practice this is fine since we never instantiate duplicates of the the classes that use stats, but it's something we should keep an eye on.
Updates #2733
This PR introduces a new feature flag "IPv6First".
When the "IPv6First" feature is enabled the VA's HTTP dialer and TLS SNI
(01 and 02) certificate fetch requests will attempt to automatically
retry when the initial connection was to IPv6 and there is an IPv4
address available to retry with.
This resolves https://github.com/letsencrypt/boulder/issues/2623
This change changes the returning values from boolean to error.
It makes `checkConsistency` an internal function and removes the
optional argument in favor of making checks explicit where they are
used.
It also renames those functions to CheckConsistency* to not
give the impression of still returning boolean values.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Rename HTTPMonitor to MeasuredHandler.
Remove inflight stat (we didn't use it).
Add timing stat by method, endpoint, and status code.
The timing stat subsumes the "rate" stat, so remove that.
WFE now wraps in MeasuredHandler, instead of relying on its cmd/main.go.
Remove FBAdapter stats.
MeasuredHandler tracks stats by method, status code, and endpoint.
In VA, add a Prometheus histogram for validation timing.
This patch removes all usages of the `core.XXXError` and almost all usages of `probs` outside of the WFE and VA and replaces them with a unified internal error type. Since the VA uses `probs.ProblemDetails` quite extensively in challenges, and currently stores them in the DB I've saved this change for another change (it'll also require a migration). Since `ProblemDetails` should only ever be exposed to end-users all of its related logic should be moved into the `WFE` but since it still needs to be exposed to the VA and SA I've left it in place for now.
The new internal `errors` package offers the same convenience functions as `probs` does as well as a new simpler type testing method. A few small changes have also been made to error messages, mainly adding the library and function name to internal server errors for easier debugging (i.e. where a number of functions return the exact same errors and there is no other way to distinguish which method threw the error).
Also adds proper encoding of internal errors transferred over gRPC (the current encoding scheme is kept for `core` and `probs` errors since it'll be ideally be removed after we deploy this and follow-up changes) using `grpc/metadata` instead of the gRPC status codes.
Fixes#2507. Updates #2254 and #2505.
I think these are all the necessary changes to implement TLS-SNI-02 validations, according to the section 7.3 of draft 05:
https://tools.ietf.org/html/draft-ietf-acme-acme-05#section-7.3
I don't have much experience with this code, I'll really appreciate your feedback.
Signed-off-by: David Calavera <david.calavera@gmail.com>
The VA's `getAddr` function prior to this commit had an outdated comment
& a pointer to a TODO for Boulder Issue #593. That issue has been closed
and bdns' `LookupHost` supports AAAA records now. This commit updates
the comment to match the current behaviour and removes the TODO.
Previous to this PR the VA's validateTLSWithZName function would
return an error message containing the SAN names of the leaf certificate
when the validation failed. This commit updates that message to include
the Subject Common Name of the leaf cert in addition to the SANs. The
names are deduplicated to prevent listing a Subj CN twice if its also
a SAN. This will help debug cases where a cert with no SANs is returned
by the server.
In addition, the number of certificates in the chain received from the
server is included in the message. This will hopefully further help
users identify misconfiguration since a TLS SNI 01 challenge response
should have a chain length of 1.
Resolves#2468
If the VA fails to validate a TLS-SNI-01 challenge because it is trying to talk TLS to a HTTP-only server return a special error message that is slightly more informative.
Presently when the VA performs a DNS-01 challenge verification it
returns the same error for the case where the remote nameserver had the
**wrong** TXT value, and when the remote nameserver had an **empty**
response for the TXT query. It would aid debugging if the user was told
which of the two failure cases was responsible for the overall challenge
failure.
This commit adds a unique error message for the empty TXT records case,
and a unit test/mock to exercise the new the error message.
Resolves#2326
While testing with real proxies I noticed the original CDR implementation was actually pretty broken, this refactors a bit and fixes a number of bugs. With this patch fallback to GPDNS over three distributed test proxies worked perfectly.
(Side note: `nginx` is not a viable forward proxy for this use as it doesn't support SSL, and a bunch of other _real_ forward proxy features, I ended up just using `squid3`.)
The main error in the previous implementation was the fallback was implemented in `getCAASet` which is only called in the old code path (the local CAA impl instead of the remote service) which mean't it wasn't actually being tested in the integration test. This also refactors a few repeated blocks into their own functions. Also there was a unicode encoding problem somewhere with the query string but for the life of me I can't figure out why it was broken now.
In https://github.com/letsencrypt/boulder/pull/774 we introduced and account key stored with the challenge. This was a stopgap fix to the now-defunct SimpleHTTP and DNS challenges in the face of https://mailarchive.ietf.org/arch/msg/acme/F71iz6qq1o_QPVhJCV4dqWf-4Yc. However, we no longer offer or implement those challenges, so the extra field is unnecessary. It also take up a huge amount of space in the challenges table, which is our biggest table. SimpleHTTP and DNS challenges were removed in https://github.com/letsencrypt/boulder/pull/1247.
We can provide a follow-up migration to delete the column later, once we have a plan for large migrations without downtime.
Fixes#1909
Clarifies the `UnknownHost` problem details error message created in the VA's `getAddr` when there is no valid IP address for the domain.
Previously this was reported as "No IPv4 addresses found for x" leading to user confusion (ref #1790) when a domain resolved to a private IP.