Previously, if a remote VA returned an error that is not a ProblemDetail, the
primary VA would log a ServerInternalProblem but not the underlying error.
This commit updates performRemoteValidation to always return the full error it
receives from a remote VA.
This commit also adds a unittest that checks that the VA still returns a
ServerInternalProblem to the RA, and that the VA audit logs the underlying
error.
Resolves https://github.com/letsencrypt/boulder/issues/3753
This adds support for the account-uri CAA parameter as specified by
section 3 of https://tools.ietf.org/html/draft-ietf-acme-caa-04, allowing
issuance to be restricted to one or more ACME accounts as specified by CAA
records.
This commit updates the VA's checkCAA function to include the provided
challengeType parameter as part of the audit log line for the CAA
check result. If challengeType == nil then the value "none" is logged.
Unit tests are updated accordingly.
This change will make it easier to distinguish cases where "Valid for
issuance" is false because of a validation-methods restriction.
Resolves#3740
When performing CAA checking respect the validation-methods parameter (if
present) and restrict the allowed authorization methods to those specified.
This allows a domain to restrict authorization methods that can be used with
Let's Encrypt.
This is largely based on PR #3003 (by @lukaslihotzki), which was landed and
then later reverted due to issue #3143. The bug the resulted in the previous
code being reverted has been addressed (likely inadvertently) by 76973d0f.
This implementation also includes integration tests for CAA validation-methods.
Fixes issue #3143.
This is a quick first pass at audit logging the *dns.CAA records in JSON format from within the VA's IsCAAValid function. This will provide more information for post-hoc analysis of CAA decisions.
Per RFC 6844 Section 5.1 the matching of CAA tag identifiers (e.g.
"Issue") is case insensitive. This commit updates the CAA tag processing
to be case insensitive as required by the RFC.
To exercise the fix this commit adds a test case to the `caaMockDNS`
`LookupCAA` implementation for a hostname (`mixedcase.com`) that has
a CAA record with a mixed case `Issue` tag. Prior to the fix from this
branch being included in `va/caa.go` the test fails:
```
--- FAIL: TestCAAChecking/Bad_(Reserved,_Mixed_case_Issue) (0.00s)
caa_test.go:292: checkCAARecords validity mismatch for
mixedcase.com: got true expected false
```
With the fix applied, the test passes.
Remove various unnecessary uses of fmt.Sprintf - in particular:
- Avoid calls like t.Error(fmt.Sprintf(...)), where t.Errorf can be used directly.
- Use strconv when converting an integer to a string, rather than using
fmt.Sprintf("%d", ...). This is simpler and can also detect type errors at
compile time.
- Instead of using x.Write([]byte(fmt.Sprintf(...))), use fmt.Fprintf(x, ...).
Many of the probs.XYZ calls are of the form probs.XYZ(fmt.Sprintf(...)).
Convert these functions to take a format string and optional arguments,
following the same pattern used in the errors package. Convert the
various call sites to remove the now redundant fmt.Sprintf calls.
A very large number of the logger calls are of the form log.Function(fmt.Sprintf(...)).
Rather than sprinkling fmt.Sprintf at every logger call site, provide formatting versions
of the logger functions and call these directly with the format and arguments.
While here remove some unnecessary trailing newlines and calls to String/Error.
This change cleans up how `va.http01Dialer` works with regards to `core.ValidationRecord`s. Instead of using the record as both an input and a output it now uses a set of inputs and outputs information about addresses via a channel. The validation record is then constructed in the parent scope or in the redirect function instead of the dialer itself.
Fixes#2730, fixes#3109, and fixes#3663.
This field was set to singleDialTimeout, but the net/http library treats
it as covering all of dial, write headers, and read headers and body.
Since http01Dialer also uses singleDialTimeout, there's a race between
http01Dialer and net/http to see who will time out first. The result is
that sometimes we give "Timeout after connect" when the error really
should be "Timeout during connect." This issue also inhibits IPv6 to
IPv4 fallback, and tickles a data race that was causing a rare panic in
VA: https://github.com/letsencrypt/boulder/issues/3109.
After this change, the overall HTTP request will get the full deadline
allowed by the RPC context. The dialer will continue to use
singleDialTimeout for each of its two possible dial attempts.
This allows us to have fast-running unittests without modifying the global state in singleDialTimeout,
which can become a const.
Fixes#3628.
Builds on top of #3629, review that first.
In particular, differentiate timeouts during connect (which are usually a firewall problem) from timeouts after connect (which are usually a software problem). In the process, refactor the tests and add testing for specific problem detail messages.
This also switches over the HTTP challenge's dialer to use DialContext, and to shave a little bit of headroom off of the context deadline, so that the dial can report its timeout before the overall context expires, which would lead to an overly generic "deadline exceeded" error, which would then get translated (incorrectly) into a "timeout after connect."
There is an additional error case, Timeout during %s (your server may be slow or overloaded), (where %s can be read or write) which doesn't have any unittests. I believe it may not be possible to trigger this, since read and write timeouts get subsumed by the HTTP or TLS library, but it's worth having as a fallback case. We'll see if it shows up in the logs.
Among the test refactorings, I shortened the timeout on the TLS timeout test to 50ms. Previously this was the long pole making the whole test take 10s. Now it takes ~500 ms overall.
I recommend starting review at https://github.com/letsencrypt/boulder/compare/detailed-va-errors?expand=1#diff-4c51d1d7ca3ec3022d14b42809af0d7eR671 (the changes to detailedError), then reviewing the Dial -> DialContext changes, then the tests.
Also instead of repeating the same bucket definitions everywhere just use a single top level var in the metrics package in order to discourage copy/pasting.
Fixes#3607.
Right now we check safe browsing at new-authz time, which introduces a possible
external dependency when calling new-authz. This is usually fine, since most safe
browsing checks can be satisfied locally, but when requests have to go external,
it can create variance in new-authz timing.
Fixes#3491.
This commit updates the RA to make the notion of submitting
a KeyAuthorization value as part of the ra.UpdateAuthorization call
optional. If set, the value is enforced against expected and an error is
returned if the provided authorization isn't correct. If it isn't set
the RA populates the field with the computed authorization for the VA to
enforce against the value it sees in challenges. This retains the legacy
behaviour of the V1 API. The V2 API will never unmarshal a provided
key authorization.
The ACMEv2/WFEv2 prepChallengeForDisplay function is updated to strip
the ProvidedKeyAuthorization field before sending the challenge object
back to a client. ACMEv1/WFEv1 continue to return the KeyAuthorization
in challenges to avoid breaking clients that are relying on this legacy
behaviour.
For deployability ease this commit retains the name of the
core.Challenge.ProvidedKeyAuthorization field even though it should
be called core.Challenge.ComputedKeyAuthorization now that it isn't
set based on the client's provided key authz. This will be easier as
a follow-up change.
Resolves#3514
This code was never enabled in production. Our original intent was to
ship this as part of the ACMEv2 API. Before that could happen flaws were
identified in TLS-SNI-01|02 that resulted in TLS-SNI-02 being removed
from the ACME protocol. We won't ever be enabling this code and so we
might as well remove it.
Before this change, we would just log "Correct value not found for DNS challenge"
when we got a TXT record that didn't match what we expected. This was different
from the error when no TXT records were found at all, but viewing the error out of
context doesn't make that clear. This change improves the error to specifically say
that we found a TXT record, but it was the wrong one.
Also in this change: if we found multiple TXT records, we mention the number;
and we trim the length of the echoed TXT record.
This commit implements RFC 6844's description of the "CAA issuewild
property" for CAA records.
We check CAA in two places: at the time of validation, and at the time
of issuance when an authorization is more than 8hours old. Both
locations have been updated to properly enforce issuewild when checking
CAA for a domain corresponding to a wildcard name in a certificate
order.
Resolves https://github.com/letsencrypt/boulder/issues/3211
This PR implements issuance for wildcard names in the V2 order flow. By policy, pending authorizations for wildcard names only receive a DNS-01 challenge for the base domain. We do not re-use authorizations for the base domain that do not come from a previous wildcard issuance (e.g. a normal authorization for example.com turned valid by way of a DNS-01 challenge will not be reused for a *.example.com order).
The wildcard prefix is stripped off of the authorization identifier value in two places:
When presenting the authorization to the user - ACME forbids having a wildcard character in an authorization identifier.
When performing validation - We validate the base domain name without the *. prefix.
This PR is largely a rewrite/extension of #3231. Instead of using a pseudo-challenge-type (DNS-01-Wildcard) to indicate an authorization & identifier correspond to the base name of a wildcard order name we instead allow the identifier to take the wildcard order name with the *. prefix.
This PR changes the VA's singleDialTimeout value from 5 * time.Second to 10 * time.Second. This will give slower servers a better chance to respond, especially for the multi-VA case where n requests arrive ~simultaneously.
This PR also bumps the RA->VA timeout by 5s and the WFE->RA timeout by 5s to accommodate the increased dial timeout. I put this in a separate commit in case we'd rather deal with this separately.
Updates the buckets for histograms in the publisher, va, and expiration-mailer which are used to measure the latency of operations that go over the internet and therefore are liable to take a lot longer than the default buckets can measure. Uses a standard set of buckets for all three instead of attempting to tune for each one.
Fixes#3217.
This pulls in google/safebrowsing#74, which introduces a new LookupURLsContext that allows us to pass through timeout information nicely.
Also, update calling code to use LookupURLsContext instead of LookupURLs.
This commit adds CAA `issue` paramter parsing and the `challenge` parameter to permit a single challenge type only. By setting `challenge=dns-01`, the nameserver keeps control over every issued certificate.
Prior to this commit the VA would follow redirects from the initial
HTTP-01 challenge request on port 80 to any other port. In practice the
Let's Encrypt production environment has network egress firewall rules
that drop outbound requests that are not on port 80 or 443. In effect
this meant any challenge request that was redirected from 80 to a port
other than 80/443 was turned into a mysterious connection timeout error.
We have decided to preserve the egress firewall rule and continue to act
conservatively. Only port 80 and 443 should be allowed in redirects.
This commit updates the VA to return a clear error message when
a non-80/443 redirect is made.
To aid in testing/configuration the actual ports enforced are specified
by the va.httpPort and va.httpsPort that are used for the initial
outbound HTTP-01 connection.
The VA TestHTTPRedirectLookup unit test is updated accordingly to test
that a non-80/443 redirect fails with the expected message.
Resolves#3049
For certificates with many domains it can be difficult to associate
a given CAA error with the specific domain that caused it. To make this
easier this commit explicitly prefixes all of the problems that can be
returned from `va.IsCAAValid` with the domain name in question.
A small unit test is included to check a CAA problem's detail message is
suitably prefixed with the affected domain.
There were a bunch of test fixtures in bdns/mocks.go that were only used in va/caa_test.go. This moves them to be in the same file so we have less spooky action at a distance.
One side-effect: We can't construct bdns.DNSError with the internal fields we want, because those fields are unexported. So we switch a couple of mock cases to just return a generic error, and the corresponding test cases to expect that error.
Previously, CAA problems were lumped in under "ConnectionProblem" or
"Unauthorized". This should make things clearer and easier to differentiate.
Fixes#3043
This implements the pre-erratum 5065 version of CAA, behind a feature flag.
This involved refactoring DNSClient.LookupCAA to return a list of CNAMEs in addition to the CAA records, and adding an alternate lookuper that does tree-climbing on single-depth aliases.
Add a logging statement that fires when a remote VA fail causes
overall failure. Also change remoteValidationFailures into a
counter that counts the same thing, instead of a histogram. Since
the histogram had the default bucket sizes, it failed to collect
what we needed, and produced more metrics than necessary.
To support having problem types that use either the classic
"urn:acme:error" namespace or the new "urn:ietf:params:acme:error"
namespace as appropriate we need to prefix the problem type at runtime
right before returning it through the WFE to the user as JSON. This
commit updates the WFE/WFE2 to do this for both problems sent through
sendError as well as problems embedded in challenges. For the latter
we do not modify problems with a type that is already prefixed to
support backwards compatibility.
Resolves#2938
Note: We should cut a follow-up issue to devise a way to share some
common code between the WFE and WFE2. For example, the
prepChallengeForDisplay should probably be hoisted to a common
"web" package
Fixes#2889.
VA now implements two gRPC services: VA and CAA. These both run on the same port, but this allows implementation of the IsCAAValid RPC to skip using the gRPC wrappers, and makes it easier to potentially separate the service into its own package in the future.
RA.NewCertificate now checks the expiration times of authorizations, and will call out to VA to recheck CAA for those authorizations that were not validated recently enough.
va.go is quite a large file. This splits out the CAA-related code and tests into its own file for simplicity. This is a simple move; no code has been changed, and there is no package split.
The VA test had a global:
`var ident = core.AcmeIdentifier{Type: core.IdentifierDNS, Value: "localhost"}`
Evidently this was meant as a convenience to avoid having to retype this common value, but it wound up being mutated independently by different tests. This PR replaces it with a convenience function `dnsi()` that generates a DNS-type identifier with the given hostname. Makes the VA test much more reliable locally.
This commit updates the VA's `IsSafeDomain` RPC to treat errors from the
Google Safe Browsing client as a positive response. Subsequently the VA
will only block authz creation in the case that the GSB API returns
a true negative (e.g. confirms an unsafe domain). If the database is in
an inconsistent state due to an API outage we will allow the authz to be
created.