The RA should set the expiry of valid authorizations based only on the current time and the configured authorizationLifetime. It should not extend the pending authorization's lifetime by the authorizationLifetime.
Resolves#4617
I didn't gate this with a feature flag. If we think this needs an API announcement and gradual rollout (I don't personally think this change deserves that) then I think we should change the RA config's authorizationLifetimeDays value to 37 days instead of adding a feature flag that we'll have to clean up after the flag date. We can change it back to 30 after the flag date.
In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back.
There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case, dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change.
Fixes#4591.
In the deep dark history of Boulder we ended up jamming contacts into
a VARCHAR db field. We need to make sure that when contacts are
marshaled the resulting bytes will fit into the column or a 500 will
be returned to the user when the SA RPC fails.
One day we should fix this properly and not return a hacky error message
that's hard for users to understand. Unfortunately that will likely
require a migration or a new DB table. In the shorter term this hack
will prevent 500s which is a clear improvement.
Prev. we weren't checking the domain portion of an email contact address
very strictly in the RA. This updates the PA to export a function that
can be used to validate the domain the same way we validate domain
portions of DNS type identifiers for issuance.
This also changes the RA to use the `invalidEmail` error type in more
places.
A new Go integration test is added that checks these errors end-to-end
for both account creation and account update.
We need the RA's `NewOrder` RPC to return a `berrors.Malformed` instance
when there are too many identifiers. A bare error will be turned into
a server internal problem by the WFE2's `web.ProblemDetailsForError`
call while a `berrors.Malformed` will produce the expected malformed
problem.
This commit fixes the err, updates the unit test, and adds an end-to-end
integration test so we don't mess this up again.
LE is popular and aims to popularise certificate issuance. End users who see
error messages cannot be assumed to be as DNS-experienced as previously. The
user-facing error messages in the policy authority file are terse and unobvious
to the point that they are often unlikely to be well understood by those they
are intended to inform, who may be "just trying to get a LE cert for their
domain".
We need to apply some fixes for bugs introduced in #4476 before it can be deployed, as such we need to revert #4495 as there needs to be a full deploy cycle between these two changes.
This reverts commit 3ae1ae1.
😭
This change set makes the authz2 storage format the default format. It removes
most of the functionality related to the previous storage format, except for
the SA fallbacks and old gRPC methods which have been left for a follow-up
change in order to make these changes deployable without introducing
incompatibilities.
Fixes#4454.
This also adds the badCSR error type specified by RFC 8555. It is a natural fit for the errors in VerifyCSR that aren't covered by badPublicKey. The web package function for converting a berror to
a problem is updated for the new badCSR error type.
The callers (RA and CA) are updated to return the berrors from VerifyCSR as is instead of unconditionally wrapping them as a berrors.MalformedError instance. Unit/integration tests are updated accordingly.
Resolves#4418
When the `features.PrecertificateRevocation` feature flag is enabled the WFE2
will allow revoking certificates for a submitted precertificate. The legacy WFE1
behaviour remains unchanged (as before (pre)certificates issued through the V1
API will be revocable with the V2 API).
Previously the WFE2 vetted the certificate from the revocation request by
looking up a final certificate by the serial number in the requested
certificate, and then doing a byte for byte comparison between the stored and
requested certificate.
Rather than adjust this logic to handle looking up and comparing stored
precertificates against requested precertificates (requiring new RPCs and an
additional round-trip) we choose to instead check the signature on the requested
certificate or precertificate and consider it valid for revocation if the
signature validates with one of the WFE2's known issuers. We trust the integrity
of our own signatures.
An integration test that performs a revocation of a precertificate (in this case
one that never had a final certificate issued due to SCT embedded errors) with
all of the available authentication mechanisms is included.
Resolves https://github.com/letsencrypt/boulder/issues/4414
This change adds two tables and two methods in the SA, to store precertificates
and serial numbers.
In the CA, when the feature flag is turned on, we generate a serial number, store it,
sign a precertificate and OCSP, store them, and then return the precertificate. Storing
the serial as an additional step before signing the certificate adds an extra layer of
insurance against duplicate serials, and also serves as a check on database availability.
Since an error storing the serial prevents going on to sign the precertificate, this decreases
the chance of signing something while the database is down.
Right now, neither table has read operations available in the SA.
To make this work, I needed to remove the check for duplicate certificateStatus entry
when inserting a final certificate and its OCSP response. I also needed to remove
an error that can occur when expiration-mailer processes a precertificate that lacks
a final certificate. That error would otherwise have prevented further processing of
expiration warnings.
Fixes#4412
This change builds on #4417, please review that first for ease of review.
In #4179 we added a different method of counting the certificatesPerName
rate limit that can provide the correct behavior for exact public suffix
matches without the need for a separate RPC call. This cleans up the
separate code paths in the SA and RA that are no longer necesary.
In the RA's recheckCAA function we loop through a list of *core.Authorizations, dispatching each to a Go routine that checks CAA for the authz and writes an error to a results channel.
Later, we iterate the same *core.Authorization list and read errors from the channel. If we get a non-nil error, then the current iteration's *core.Authorization is used as the identifier for the suberror created with the non-nil error.
This is a flawed approach and relies on the scheduling of recheck goroutines matching the iteration of the authorizations. When the goroutines write error results to the channel in an order that doesn't match the loop over the authorizations the RA will construct a suberror with the wrong identifier. This manifests as making the TestRecheckCAAFail unit test appear flaky, because it specifically checks the expected identifiers in the returned subproblems.
The fix involves writing both the checked authorization and the error result to the results channel. Later instead of iterating the authorizations we just read the correct number of results from the channel and use the attached authorization from the result when constructing a suberror.
Resolves#4248
Take away lessons:
Write unit tests and always verify expected values!
Always investigate flaky unit tests! Sometimes there's a real bug and not just a subpar test :-)
When Boulder's RA rechecks CAA for a set of authorization identifiers it
should use suberrors to make it easy to identify which of a possible 100
identifiers had a CAA issue at order finalization time.
Updates #4193Resolves#4235
This will allow implementing sub-problems without creating a cyclic
dependency between `core` and `problems`.
The `identifier` package is somewhat small/single-purpose and in the
future we may want to move more "ACME" bits beyond the `identifier`
types into a dedicated package outside of `core`.
The RA now updates a `names_per_cert` Prometheus histogram, sliced by
a "type" label. NewOrder requests will observe the number of identifiers
in the new order with the type label == "requested". Successful order
finalization will observe the number of names in the issued certificate
with the type label == "issued".
This follows up on some refactoring we had done previously but not
completed. This removes various binary-specific config structs from the
common cmd package, and moves them into their appropriate packages. In
the case of CT configs, they had to be moved into their own package to
avoid a dependency loop between RA and ctpolicy.
In general the RA test has an anti-pattern of too much global state. This piece
in particular got a little outdated over time, and there were some incorrect
comments littered around the code about what SupportedChallenges did or did not
contain. This change removes most cases where it was overridden, and moves the
main definition into initAuthorities.
Previously we'd have to look up authorizations by name, then re-fetch
them by ID for return to the WFE, because some SA calls did not include
challenge objects in the authorizations they return. However, now all SA
calls do include challenge objects, so we can delete this code and save
some lookups.
We introduced a test case called "TestUpdateAuthorizationNewRPC" when we
added the "PerformValidation" RPC at the SA. It mostly duplicated the
TestUpdateAuthorization test case that already existed. They evolved
alongside over the years, but we should have cleaned up the "NewRPC"
variant and removed the old one it was copied from a long time ago.
This PR implements new SA methods for handling authz2 style authorizations and updates existing SA methods to count and retrieve them where applicable when the `NewAuthorizationSchema` feature is enabled.
Fixes#4093Fixes#4082
Updates #4078
Updates #4077
Checking the "certificates per name" rate limit is moderately expensive, particularly
for domains that have lots of certificates on their subdomains. By checking the renewal
exemption first, we can save some database queries in a lot of cases.
Part of #4152
Previously only "pending" orders were returned by `sa.GetOrderForNames` for the RA to reuse in `ra.NewOrder`. When we added the "ready" status to match late RFC 8555 developments we forgot to update `GetOrderForNames` to return "ready" orders. Prior to the "ready" status existing a fully validated order would have been "pending" and reused. This branch updates the reuse logic to restore reuse of validated orders.
Resolves https://github.com/letsencrypt/boulder/issues/4117
Early ACME drafts supported a notion of "combinations" of challenges
that had to be completed together. This was removed from subsequent
drafts. Boulder has only ever supported "combinations" that exactly map
to the list of challenges, 1 for 1.
This removes all the plumbing for combinations, and adds a list of
combinations to the authz JSON right before marshaling it in WFE1.
Precursor to #4116. Since some of our dependencies impose a minimum
version on these two packages higher than what we have in Godeps, we'll
have to bump them anyhow. Bumping them independently of the modules
update should keep things a little simpler.
In order to get protobuf tests to pass, I had to update protoc-gen-go in
boulder-tools. Now we download a prebuilt binary instead of using the
Ubuntu package, which is stuck on 3.0.0. This also meant I needed to
re-generate our pb.go files, since the new version generates somewhat
different output.
This happens to change the tag for pbutil, but it's not a substantive change - they just added a tagged version where there was none.
$ go test github.com/miekg/dns/...
ok github.com/miekg/dns 4.675s
ok github.com/miekg/dns/dnsutil 0.003s
ok github.com/golang/protobuf/descriptor (cached)
ok github.com/golang/protobuf/jsonpb (cached)
? github.com/golang/protobuf/jsonpb/jsonpb_test_proto [no test files]
ok github.com/golang/protobuf/proto (cached)
? github.com/golang/protobuf/proto/proto3_proto [no test files]
? github.com/golang/protobuf/proto/test_proto [no test files]
ok github.com/golang/protobuf/protoc-gen-go (cached)
? github.com/golang/protobuf/protoc-gen-go/descriptor [no test files]
ok github.com/golang/protobuf/protoc-gen-go/generator (cached)
ok github.com/golang/protobuf/protoc-gen-go/generator/internal/remap (cached)
? github.com/golang/protobuf/protoc-gen-go/grpc [no test files]
? github.com/golang/protobuf/protoc-gen-go/plugin [no test files]
ok github.com/golang/protobuf/ptypes (cached)
? github.com/golang/protobuf/ptypes/any [no test files]
? github.com/golang/protobuf/ptypes/duration [no test files]
? github.com/golang/protobuf/ptypes/empty [no test files]
? github.com/golang/protobuf/ptypes/struct [no test files]
? github.com/golang/protobuf/ptypes/timestamp [no test files]
? github.com/golang/protobuf/ptypes/wrappers [no test files]
Previously we relied on each instance of `features.Set` to have a
corresponding `defer features.Reset()`. If we forget that, we can wind
up with unexpected behavior where features set in one test case leak
into another test case. This led to the bug in
https://github.com/letsencrypt/boulder/issues/4118 going undetected.
Fix#4120
* Remove the challenge whitelist
* Reduce the signature for ChallengesFor and ChallengeTypeEnabled
* Some unit tests in the VA were changed from testing TLS-SNI to testing the same behavior
in TLS-ALPN, when that behavior wasn't already tested. For instance timeouts during connect
are now tested.
Fixes#4109