Replace the current three-piece setup (enum of feature variables, map of
feature vars to default values, and autogenerated bidirectional maps of
feature variables to and from strings) with a much simpler one-piece
setup: a single struct with one boolean-typed field per feature. This
preserves the overall structure of the package -- a single global
feature set protected by a mutex, and Set, Reset, and Enabled methods --
although the exact function signatures have all changed somewhat.
The executable config format remains the same, so no deployment changes
are necessary. This change does deprecate the AllowUnrecognizedFeatures
feature, as we cannot tell the json config parser to ignore unknown
field names, but that flag is set to False in all of our deployment
environments already.
Fixes https://github.com/letsencrypt/boulder/issues/6802
Fixes https://github.com/letsencrypt/boulder/issues/5229
Change the CAA NXDOMAIN carve-out to only apply to registered domains
and their subdomains, not to TLDs.
Our CAA lookup function has a carveout that allows queries which receive
an NXDOMAIN response to be treated as though they received a successful
empty response. This is important due to the confluence of three
circumstances:
1) many clients use the DNS-01 method to validate issuance for names
which generally don't have publicly-visible A/AAAA records;
2) many ACME clients remove their DNS-01 TXT record promptly after
validation has completed; and
3) authorizations may be reused more than 8 hours after they were first
validated and CAA was first checked.
When these circumstances combine, the DNS rightly returns NXDOMAIN when
we re-check CAA at issuance time, because no records exist at all for
that name. We have to treat this as permission to issue, the same as any
other domain which has no CAA records.
However, this should never be the case for TLDs: those should always
have at least one record. If a TLD returns NXDOMAIN, something much
worse has happened -- such as a gTLD being unlisted by ICANN -- and we
should treat it as a failure.
This change adds a check that the name in question contains at least one
dot (".") before triggering the existing carve-out, to make it so that
the carve-out does not apply to TLDs.
Fixes https://github.com/letsencrypt/boulder/issues/7056
Add a new feature flag "CAAAfterValidation" which, when set to true in
the VA, causes the VA to only begin CAA checks after basic domain
control validation has completed successfully. This will make successful
validations take longer, since the DCV and CAA checks are performed
serially instead of in parallel. However, it will also reduce the number
of CAA checks we perform by up to 80%, since such a high percentage of
validations also fail.
IN-9575 tracks enabling this feature flag in staging and prod
Fixes https://github.com/letsencrypt/boulder/issues/7058
Add go1.21rc2 to the matrix of go versions we test against.
Add a new step to our CI workflows (boulder-ci, try-release, and
release) which sets the "GOEXPERIMENT=loopvar" environment variable if
we're running go1.21. This experiment makes it so that loop variables
are scoped only to their single loop iteration, rather than to the whole
loop. This prevents bugs such as our CAA Rechecking incident
(https://bugzilla.mozilla.org/show_bug.cgi?id=1619047). Also add a line
to our docker setup to propagate this environment variable into the
container, where it can affect builds.
Finally, fix one TLS-ALPN-01 test to have the fake subscriber server
actually willing to negotiate the acme-tls/1 protocol, so that the ACME
server's tls client actually waits to (fail to) get the certificate,
instead of dying immediately. This fix is related to the upgrade to
go1.21, not the loopvar experiment.
Fixes https://github.com/letsencrypt/boulder/issues/6950
This can take two values (typically the return values of a two-value
function) and panic if the error is non-nil, returning the interesting
value. This is particularly useful for cases where we statically know
the call will succeed.
Thanks to @mcpherrinm for the idea!
RFC 8659 (CAA; https://www.rfc-editor.org/rfc/rfc8659) says that "A CA
MUST NOT issue certificates for any FQDN if the Relevant RRset for that
FQDN contains a CAA critical Property for an unknown or unsupported
Property Tag."
Let's Encrypt does technically support the iodef property tag: we
recognize it, but then ignore it and never choose to send notifications
to the given contact address. Historically, we have carried around the
iodef property tags in our internal structures as though we might use
them, but all code referencing them was essentially dead code.
As part of a set of simplifications,
https://github.com/letsencrypt/boulder/pull/6886 made it so that we
completely ignore iodef property tags. However, this had the unintended
side-effect of causing iodef property tags with the Critical bit set to
be counted as "unknown critical" tags, which prevent issuance.
This change causes our property tag parsing code to recognize iodef tags
again, so that critical iodef tags don't prevent issuance.
Removes the `Hostname` and `Port` fields from an http-01
ValidationRecord model prior to storing the record in the database.
Using `"hostname":"example.com","port":"80"` as a snippet of a whole
validation record, we'll save minimum 36 bytes for each new http-01
ValidationRecord that gets stored. When retrieving the record, the
ValidationRecord `RehydrateHostPort` method will repopulate the
`Hostname` and `Port` fields from the `URL` field.
Fixes the main goal of
https://github.com/letsencrypt/boulder/issues/5231.
---------
Co-authored-by: Samantha <hello@entropy.cat>
When processing CAA records, keep track of the FQDN at which that CAA
record was found (which may be different from the FQDN for which we are
attempting issuance, since we crawl CAA records upwards from the
requested name to the TLD). Then surface this name upwards so that it
can be included in our own log lines and in the problem documents which
we return to clients.
Fixes https://github.com/letsencrypt/boulder/issues/3171
Occasionally (and just now) I've responded to an issue or thread that
involves this error message:
> The key authorization file from the server did not match this
challenge
"LoqXcYV8q5ONbJQxbmR7SCTNo3tiAXDfowyjxAjEuX0.9jg46WB3rR_AHD-EBXdN7cBkH1WOu0tA3M9fm21mqTI"
!= "\xef\xffAABBCC
and I've found myself looking at Boulder's source code, to check which
way around the values are. I suspect that users are not understanding it
either.
Make minor, non-user-visible changes to how we structure the probs
package. Notably:
- Add new problem types for UnsupportedContact and
UnsupportedIdentifier, which are specified by RFC8555 and which we will
use in the future, but haven't been using historically.
- Sort the problem types and constructor functions to match the
(alphabetical) order given in RFC8555.
- Rename some of the constructor functions to better match their
underlying problem types (e.g. "TLSError" to just "TLS").
- Replace the redundant ProblemDetailsToStatusCode function with simply
always returning a 500 if we haven't properly set the problem's
HTTPStatus.
- Remove the ability to use either the V1 or V2 error namespace prefix;
always use the proper RFC namespace prefix.
As a follow-up to #6780, add the same style of implementation test to
all of our other gRPC services. This was not included in that PR just to
keep it small and single-purpose.
Remove tracing using Beeline from Boulder. The only remnant left behind
is the deprecated configuration, to ensure deployability.
We had previously planned to swap in OpenTelemetry in a single PR, but
that adds significant churn in a single change, so we're doing this as
multiple steps that will each be significantly easier to reason about
and review.
Part of #6361
Compute "started" and "took" variables inside the retry loop, so that we're
actually measuring how long the request which timed out took, rather than
measuring how long the request which timed out plus all the previous "network
unreachable" attempts took.
For consistency, put the error field at the end of unstructured log
lines to make them more ... structured.
Adds the `issuerID` field to "orphaning certificate" log line in the CA
to match the "orphaning precertificate" log line.
Fixes broken tests as a result of the CA and bdns log line change.
Fixes#5457
Remove the PortConfig field from both the VA's config struct and from
the NewValidationAuthorityImpl constructor. This config item is no
longer used anywhere, and removing this prevents us from accidentally
overriding the "Authorized Ports" (80 and 443) which are required by the
Baseline Requirements.
Unit tests are still able to override the httpPort and tlsPort fields of
the ValidationAuthorityImpl.
Fixes#3940
Followup from #6506.
As noted in that PR: I'm not aware of any way to trigger invalid UTF-8
from the layers below this, so I can't think of a good way to unittest
it.
This avoids serialization errors passing through gRPC.
Also, add a pass-through path in replaceInvalidUTF8 that saves an
allocation in the trivial case.
Fixes#6490
Make minor changes to our implementation of CAA Account and Method
Binding, as a result of reviewing the code in preparation for enabling
it in production. Specifically:
- Ensure that the validation method and account ID are included at the
request level, rather than waiting until we perform the checks which use
those parameters;
- Clean up code which assumed the validation method and account ID might
not be populated;
- Use the core.AcmeChallenge type (rather than plain string) for the
validation method everywhere;
- Update comments to reference the latest version and correct sections
of the CAA RFCs; and
- Remove the CAA feature flags from the config integration tests to
reflect that they are not yet enabled in prod.
I have reviewed this code side-by-side with RFC 8659 (CAA) and RFC 8657
(ACME CAA Account and Method Binding) and believe it to be compliant
with both.
Clean up several spots where we were behaving differently on
go1.18 and go1.19, now that we're using go1.19 everywhere. Also
re-enable the lint and generate tests, and fix the various places where
the two versions disagreed on how comments should be formatted.
Also clean up the OldTLS codepaths, now that both go1.19 and our
own feature flags have forbidden TLS < 1.2 everywhere.
Fixes#6011
- Add a new field, `RetryAfter` to `BoulderError`s
- Add logic to wrap/unwrap the value of the `RetryAfter` field to our gRPC error
interceptor
- Plumb `RetryAfter` for `DuplicateCertificateError` emitted by RA to the WFE
client response header
Part of #6256
- Add a dedicated Consul container
- Replace `sd-test-srv` with Consul
- Add documentation for configuring Consul
- Re-issue all gRPC credentials for `<service-name>.service.consul`
Part of #6111
Enable the "unparam" linter, which checks for unused function
parameters, unused function return values, and parameters and
return values that always have the same value every time they
are used.
In addition, fix many instances where the unparam linter complains
about our existing codebase. Remove error return values from a
number of functions that never return an error, remove or use
context and test parameters that were previously unused, and
simplify a number of (mostly test-only) functions that always take the
same value for their parameter. Most notably, remove the ability to
customize the RSA Public Exponent from the ceremony tooling,
since it should always be 65537 anyway.
Fixes#6104
Run the Boulder unit and integration tests with go1.19.
In addition, make a few small changes to allow both sets of
tests to run side-by-side. Mark a few tests, including our lints
and generate checks, as go1.18-only. Reformat a few doc
comments, particularly lists, to abide by go1.19's stricter gofmt.
Causes #6275
The iotuil package has been deprecated since go1.16; the various
functions it provided now exist in the os and io packages. Replace all
instances of ioutil with either io or os, as appropriate.
Fix a race condition in one of the TestMultiVA test cases.
There are many other possible flakes in these tests, because
they use real HTTP to talk to a fake remote VA on localhost,
but we can at least trivially remove this race condition.
Fixes#6119
Update:
- golangci-lint from v1.42.1 to v1.46.2
- protoc from v3.15.6 to v3.20.1
- protoc-gen-go from v1.26.0 to v1.28.0
- protoc-gen-go-grpc from v1.1.0 to v1.2.0
- fpm from v1.14.0 to v1.14.2
Also remove a reference to go1.17.9 from one last place.
This does result in updating all of our generated .pb.go files, but only
to update the version number embedded in each file's header.
Fixes#6123
If the test unexpectedly failed, it would hit a nil pointer dereference
on line 511 when it tries to access the `.Type` field of nil. Add another
case to handle this.
These new linters are almost all part of golangci-lint's collection
of default linters, that would all be running if we weren't setting
`disable-all: true`. By adding them, we now have parity with the
default configuration, as well as the additional linters we like.
Adds the following linters:
* unconvert
* deadcode
* structcheck
* typecheck
* varcheck
* wastedassign
This adds three features flags: SHA1CSRs, OldTLSOutbound, and
OldTLSInbound. Each controls the behavior of an upcoming deprecation
(except OldTLSInbound, which isn't yet scheduled for a deprecation
but will be soon). Note that these feature flags take advantage of
`features`' default values, so they can default to "true" (that is, each
of these features is enabled by default), and we set them to "false"
in the config JSON to turn them off when the time comes.
The unittest for OldTLSOutbound requires that `example.com` resolves
to 127.0.0.1. This is because there's logic in the VA that checks
that redirected-to hosts end in an IANA TLD. The unittest relies on
redirecting, and we can't use e.g. `localhost` in it because of that
TLD check, so we use example.com.
Fixes#6036 and #6037
Slightly refactor `validateTLSALPN01` to use a common function
to format the error messages it returns. This reduces code duplication
and makes the important validation logic easier to follow.
Fixes#5922
That causes the VA to emit ValidationRecords with the OldTLS bit set if
it observes a redirect to HTTPS that negotiates TLS < 1.2.
I've manually tested but there is not yet an integration test. I need
to make a parallel change in challtestsrv and then incorporate here.
Make minor updates to our implementation of the HTTP-01 validation method based
on in-depth review of BRs Section 3.2.2.4.19 and RFC 8555 Section 8.3.
- Move the HTTP response code check above parsing the body.
- Explicitly check for 301, 302, 307, and 308 redirect codes, so that if the go
stdlib updates to allow additional redirects we don't follow suit.
- Trim additional forms of white-space from the key authorization.
RFC 8737 says "The client prepares for validation by constructing a self-signed
certificate...". Add a check for whether the challenge certificate is
self-signed by ensuring its issuer and subject are equal, and checking its
signature with its own public key.
Also slightly refactor the helper methods to return only a single cert, since we
only care about the first one returned. And add a test.
We have decided that we don't like the if err := call(); err != nil
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: It caused readers to believe that
a later err = call() statement was assigning to an already-declared err
in the local scope, when in fact it was assigning to an
already-declared err in the parent scope of a closure. This caused our
ineffassign and staticcheck linters to be unable to analyze the
lifetime of the err variable, and so they did not complain when we
never checked the actual value of that error.
This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
Detect when a non-compliant ACME client presents a non-compliant x.509
certificate that contains multiple copies of the SubjectAlternativeName or ACME
Identifier extensions; this should be forbidden.
RFC 8737 states that the certificate presented during the "acme-tls/1"
handshake has "a subjectAltName extension containing the dNSName
being validated and no other entries". We were checking that it contained
no other dNSNames, but not requiring that it not have any other kinds of
Subject Alternative Names.
Factor all of our SAN checks into a helper function. Have that function
construct the expected bytes of the SAN extension from the one DNS
name we expect to see, and assert that the actual bytes match the
expectation. Add non-DNS-name identifiers to our error output when we
encounter a cert whose SANs don't match. And add tests which check
that we fail the validation when the cert has multiple SANs.
Current metrics show that subscribers present certificates using the
obsolete OID to identify their id-pe-acmeIdentifier extension about
an order of magnitude less often than they present the correct OID.
Remove support for the never-standardized OID.
RFC 8737, Section 4, states "ACME servers that implement "acme-tls/1"
MUST only negotiate TLS 1.2 [RFC5246] or higher when connecting to
clients for validation." Enforce that our outgoing connections to validate
TLS-ALPN-01 challenges do not negotiate TLS1.1.
Add `stylecheck` to our list of lints, since it got separated out from
`staticcheck`. Fix the way we configure both to be clearer and not
rely on regexes.
Additionally fix a number of easy-to-change `staticcheck` and
`stylecheck` violations, allowing us to reduce our number of ignored
checks.
Part of #5681
When we query DNS for a host, and both the A and AAAA lookups fail or
are empty, combine both errors into a single error rather than only
returning the error from the A lookup.
Fixes#5819Fixes#5319
These log lines are sometimes useful for debugging, but are a normal
part of operation, not an error: Unbound will allow a response to
timeout if the remote server is too slow.
- Add function `validateServerAddress()` to `bdns/servers.go` which ensures that
DNS server addresses are TCP/ UDP dial-able per: https://golang.org/src/net/dial.go?#L281
- Add unit test for `validateServerAddress()` in `bdns/servers_test.go`
- Update `cmd/boulder-va/main.go` to handle `bdns.NewStaticProvider()`
potentially returning an error.
- Update unit tests in `bdns/dns_test.go`:
- Handle `bdns.NewStaticProvider()` potentially returning an error
- Add an IPv6 address to `TestRotateServerOnErr`
- Ensure DNS server addresses are validated by `validateServerAddress` whenever:
- `dynamicProvider.update() is called`
- `staticProvider` is constructed
- Construct server addresses using `net.JoinHostPost()` when
`dynamicProvider.Addrs()` is called
Fixes#5463
Add go1.17beta1 docker images to the set of things we build,
and integrate go1.17beta1 into the set of environments CI runs.
Fix one test which breaks due to an underlying refactoring in
the `crypto/x509` stdlib package. Fix one other test which breaks
due to new guarantees in the stdlib's TLS ALPN implementation.
Also removes go1.16.5 from CI so we're only running 2 versions.
Fixes#5480
Create script which finds every .proto file in the repo and correctly
invokes `protoc` for each. Create a single file with a `//go:generate`
directive to invoke the new script. Delete all of the other generate.go
files, so that our proto generation is unified in one place.
Fixes#5453