Remove all error checking and type transformation from the gRPC wrappers
for the following methods on the SA:
- GetRegistration
- GetRegistrationByKey
- NewRegistration
- UpdateRegistration
- DeactivateRegistration
Update callers of these methods to construct the appropriate protobuf
request messages directly, and to consume the protobuf response messages
directly. In many cases, this requires changing the way that clients
handle the `Jwk` field (from expecting a `JSONWebKey` to expecting a
slice of bytes) and the `Contacts` field (from expecting a possibly-nil
pointer to relying on the value of the `ContactsPresent` boolean field).
Implement two new methods in `sa/model.go` to convert directly between
database models and protobuf messages, rather than round-tripping
through `core` objects in between. Delete the older methods that
converted between database models and `core` objects, as they are no
longer necessary.
Update test mocks to have the correct signatures, and update tests to
not rely on `JSONWebKey` and instead use byte slices.
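For illustration, a minimal sketch of what such a direct model-to-protobuf conversion can look like; the type and field names below are stand-ins, not the actual definitions in `sa/model.go` or `core.proto`:
```
package sa

import "errors"

// regModel is an illustrative stand-in for the registrations database model.
type regModel struct {
	ID      int64
	Key     []byte // raw JWK JSON bytes, stored and returned as-is
	Contact []string
	Status  string
}

// pbRegistration is an illustrative stand-in for corepb.Registration.
type pbRegistration struct {
	Id              int64
	Key             []byte // clients now consume bytes, not a core.JSONWebKey
	Contact         []string
	ContactsPresent bool
	Status          string
}

// registrationModelToPb converts the database model straight to a protobuf
// message, with no intermediate core.Registration object.
func registrationModelToPb(reg *regModel) (*pbRegistration, error) {
	if reg == nil {
		return nil, errors.New("nil registration model")
	}
	return &pbRegistration{
		Id:              reg.ID,
		Key:             reg.Key,
		Contact:         reg.Contact,
		ContactsPresent: len(reg.Contact) > 0,
		Status:          reg.Status,
	}, nil
}
```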
Fixes #5531
- Move `AdministrativelyRevokeCertificate` logic from `grpc` to `ra`
- Test new error conditions in `ra/ra_test.go`
- Update `ra` mocks in `wfe` tests
Fixes #5529
Update the RA to specify the IssuerNameID rather than the IssuerID when
requesting that the CA generate a new OCSP response for a revoked
certificate.
Depends on #5515
Part of #5152
- Move response validation from `RA` client wrapper to `WFE` and `WFE2`
- Move request validation from `RA` server wrapper to `RA`
- Refactor `RA` tests to construct valid `core.Authorization` objects
- Consolidate multiple error declarations to global `errIncompleteGRPCRequest`
Fixes #5439
Replace `core.Empty` with `google.protobuf.Empty` in all of our gRPC
methods which consume or return an empty protobuf. The golang core
proto libraries provide an empty message type, so there is no need
for us to reinvent the wheel.
This change is backwards-compatible and does not require a special
deploy. The protobuf message descriptions of `core.Empty` and
`google.protobuf.Empty` are identical, so their wire-formats are
indistinguishable and therefore interoperable / cross-compatible.
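For example, a handler that consumed `core.Empty` can switch to the well-known type with no wire-format change (a sketch; the exact import path used by the codebase may differ):
```
package example

import (
	"context"

	"google.golang.org/protobuf/types/known/emptypb"
)

type server struct{}

// Before: Ping(ctx context.Context, _ *corepb.Empty) (*corepb.Empty, error)
// After: the well-known empty type, identical on the wire.
func (s *server) Ping(ctx context.Context, _ *emptypb.Empty) (*emptypb.Empty, error) {
	return &emptypb.Empty{}, nil
}
```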
Fixes #5443
Update the signature of the RA's RevokeCertificateWithReg
method to exactly match that of the gRPC method it implements.
Remove all logic from the `RevokeCertificateWithReg` client
and server wrappers. Move the small amount of checking they
were performing directly into the server implementation.
Fixes #5440
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).
Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.
Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
Add a new rate limit, identical in implementation to the current
`CertificatesPerFQDNSet` limit, intended to always have both a lower
window and a lower threshold. This allows us to block runaway clients
quickly, and give their owners the ability to fix and try again quickly
(on the order of hours instead of days).
Configure the integration tests to set this new limit at 2 certs per 2
hours. Also increase the existing limit from 5 to 6 certs in 7 days, to
allow clients to hit the first limit three times before being fully
blocked for the week. Also add a new integration test to verify this
behavior.
Note that the new ratelimit must have a window greater than the
configured certificate backdate (currently 1 hour) in order to be
useful.
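A rough sketch of how the two limits can be checked together, fast window first; the function shape below mirrors the description above and is otherwise an assumption:
```
package example

import (
	"context"
	"fmt"
	"time"
)

type fqdnSetLimit struct {
	Threshold int64
	Window    time.Duration
}

// checkFQDNSetLimits enforces both limits, fast window first. countFQDNSets
// stands in for an SA query counting issuances for this exact name set.
func checkFQDNSetLimits(ctx context.Context, countFQDNSets func(context.Context, time.Duration, []string) (int64, error), names []string) error {
	limits := []fqdnSetLimit{
		{Threshold: 2, Window: 2 * time.Hour},      // new fast limit
		{Threshold: 6, Window: 7 * 24 * time.Hour}, // existing limit
	}
	for _, lim := range limits {
		count, err := countFQDNSets(ctx, lim.Window, names)
		if err != nil {
			return err
		}
		if count >= lim.Threshold {
			return fmt.Errorf("too many certificates already issued for exact set of domains: %v", names)
		}
	}
	return nil
}
```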
Fixes #5210
Update the RA's `revokeCertificate` method to identify the
certificate to be revoked using its serial and issuer ID, rather
than its full DER-encoded bytes. This removes one of the
two remaining places that the certDER codepath is used.
Also update the admin-revoker tests to properly set up an
actual issuer, so that revocation works.
Part of #5079
Delete the PublisherClientWrapper and PublisherServerWrapper. Update
various structs and functions to expect a pubpb.PublisherClient instead
of a core.Publisher; these two interfaces differ only in that the
auto-generated PublisherClient takes a variadic CallOptions parameter.
Update all mock publishers in tests to match the new interface. Finally,
delete the now-unused core.Publisher interface and some already-unused
mock-generating code.
This deletes a single sanity check (for a nil SCT even when there is a
nil error), but that check was redundant with an identical check in the
only extant client code in ctpolicy.go.
Fixes #5323
Delete the CertificateAuthorityClientWrapper, OCSPGeneratorClientWrapper,
and CertificateAuthorityServerWrapper structs, which provided no error
checking above and beyond their wrapped types. Replace them with the
corresponding auto-generated gRPC types in calling code. Update some
mocks to have the necessary variadic grpc.CallOption parameter. Finally,
delete the now-unused core.CertificateAuthority interface.
Fixes #5324
Move the validated timestamp to the RA where the challenge is passed to
the SA for database storage. If a challenge becomes valid or invalid, take
the validated timestamp and store it in the attemptedAt field of the
authz2 table. Upon retrieval of the challenge from the database, add the
attemptedAt value to challenge.Validated which is passed back to the WFE
and presented to the user as part of the challenge, as required by RFC 8555.
Fix: #5198
We intend to delete the v1 API (i.e. `wfe` and its associated codepaths)
in the near future, and as such are not giving it new features or
capabilities. However, before then we intend to allow the v2 API to
provide issuance both from our RSA and from our ECDSA intermediates.
The v1 API cannot gain such capability at the same time.
The CA doesn't know which frontend originated any given issuance
request, so we can't simply gate the single- or double-issuer behavior
based on that. Instead, this change introduces the ability for the
WFE (and the RA, which sits between the WFE and the CA) to request
issuance from a specific intermediate. If the specified intermediate is
not available in the CA, issuance will fail. If no intermediate is
specified (as is the case in requests coming from wfe2), it falls back
to selecting the issuer based on the algorithm of the public key to
be signed.
Fixes #5216
The database stores order expiry values as type DATETIME, which only
supports values with second-level accuracy. (Contrast with type
DATETIME(6), which allows microsecond accuracy.) But the order object
being written to the database does not have its expiry value rewritten
by the insert process. This leads to Boulder returning different
values for `expires` depending on whether the order was created fresh
(nanosecond accuracy) or retrieved from the db (second accuracy).
There appears to be only one codepath which suffers from this
discrepancy. Although Authorization objects also have an `expires`
field, they are never returned to the client immediately upon creation;
they are first exposed to the user as URLs within an Order object, and
so are always retrieved from the database.
Fixes #5166
This error class is only used in one instance, and when returned to
the user it is transformed into a `probs.Malformed` anyway. It does
more harm than good to keep this one-off BoulderError around, as
it introduces confusion about what sorts of errors we expose to the
client.
Fixes #5167
The RA is responsible for contacting Akamai to purge cached OCSP
responses when a certificate is revoked and fresh OCSP responses need to
be served ASAP. In order to do so, it needs to construct the same OCSP
URLs that clients would construct, and that Akamai would cache. In order
to do that, it needs access to the issuing certificate to compute a hash
across its Subject Info and Public Key.
Currently, the RA holds a single issuer certificate in memory, and uses
that cert to compute all OCSP URLs, on the assumption that all certs
we're being asked to revoke were issued by the same issuer.
In order to support issuance from multiple intermediates at the same
time (e.g. RSA and ECDSA), and to support rollover between different
issuers of the same type (we may need to revoke certs issued by two
different issuers for the 90 days in which their end-entity certs
overlap), this commit changes the configuration to provide a list of
issuer certificates instead.
In order to support efficient lookup of issuer certs, this change also
introduces a new concept, the Chain ID. The Chain ID is a truncated hash
across the raw bytes of either the Issuer Info or the Subject Info of a
given cert. As such, it can be used to confirm issuer/subject
relationships between certificates. In the future, this may be a
replacement for our current IssuerID (a truncated hash over the whole
issuer certificate), but for now it is used to map revoked certs to
their issuers inside the RA.
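A sketch of the idea, assuming a truncated SHA-256 over the raw Subject bytes (the actual hash and truncation width may differ):
```
package example

import (
	"crypto/sha256"
	"crypto/x509"
	"encoding/binary"
)

// subjectID computes a truncated hash over a certificate's raw Subject Info.
// The same function applied to cert.RawIssuer confirms issuer/subject links:
// a cert's issuer ID equals its issuer's subject ID.
func subjectID(cert *x509.Certificate) int64 {
	h := sha256.Sum256(cert.RawSubject)
	return int64(binary.BigEndian.Uint64(h[:8]))
}
```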
Part of #5120
This change adds two new test assertion helpers, `AssertErrorIs`
and `AssertErrorWraps`. The former is a wrapper around `errors.Is`,
and asserts that the error's wrapping chain contains a specific (i.e.
singleton) error. The latter is a wrapper around `errors.As`, and
asserts that the error's wrapping chain contains any error which is
of the given type; it also has the same unwrapping side effect as
`errors.As`, which can be useful for further assertions about the
contents of the error.
It also makes two small changes to our `berrors` package, namely
making `berrors.ErrorType` itself an error rather than just an int,
and giving `berrors.BoulderError` an `Unwrap()` method which
exposes that inner `ErrorType`. This allows us to use the two new
helpers above to make assertions about berrors, rather than
having to hand-roll equality assertions about their types.
Finally, it takes advantage of the two changes above to greatly
simplify many of the assertions in our tests, removing conditional
checks and replacing them with simple assertions.
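The two helpers are thin wrappers over the standard library, roughly (a sketch; the real assertions' failure messages differ):
```
package test

import (
	"errors"
	"testing"
)

// AssertErrorIs asserts that err's wrapping chain contains target.
func AssertErrorIs(t *testing.T, err error, target error) {
	t.Helper()
	if !errors.Is(err, target) {
		t.Errorf("expected %v to wrap %v", err, target)
	}
}

// AssertErrorWraps asserts that err's wrapping chain contains an error of
// target's type; like errors.As, it fills target in as a side effect.
func AssertErrorWraps(t *testing.T, err error, target interface{}) {
	t.Helper()
	if !errors.As(err, target) {
		t.Errorf("expected %v to wrap an error of type %T", err, target)
	}
}
```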
The only caller of this function is the RA's `revokeCertificate`
method, which already has the hydrated `x509.Certificate`
version of the cert. There's no need to pass the raw version
and re-parse the DER again, just pass a reference to the
existing cert.
This was already partly done: there is an ID() method in issuance. This
change extends that by:
- Defining a type alias indicating something is an IssuerID.
- Defining issuance.Certificate, which also has an ID() method,
so that components that aren't the CA can load certificates and
use the type system to mark them as issuers (and get their IDs).
- Converting akamai-purger and ca to use the new types.
- Removing idForIssuer from ca.go.
These were used during the transition to authzv2. The SA side of these
RPCs already ignores these booleans. This is just cleaning up the
protobufs and call sites.
One slightly surprising / interesting thing: Since core types like
Order and Registration are still proto2 and have pointer fields,
there are actually some places in this PR where I had to add
a `*` rather than delete an `&`, because I was taking a pointer
field from one of those core types and passing it as a field in
an SA RPC request.
Fixes #5037.
Updates the Registration Authority to use proto3 for its
RPC methods. This turns out to be a fairly minimal change,
as many of the RA's request and response messages are
defined in core.proto, and are therefore still proto2.
Fixes #4955
Any field which can be zero must be allowed to be nil,
so that a proto2 server receiving requests from a proto3
client is willing to process messages with zero-value fields
encoded as missing.
Part of #4955
As part of the migration to proto3, any fields in requests that may be
zero should also be allowed to be nil. That's because proto3 will
represent those fields as absent when they have their zero value.
This is based on a manual review of the wrappers for the SA, plus
a pair of integration test runs. For the integration test runs I took these
steps:
1. Copy sa/proto to sa/proto2
2. Change sa/proto to use proto3 and regenerate.
3. In sa/*.go and cmd/boulder-sa/main.go, update the imports to use the
proto2 version.
4. Split grpc/sa-wrappers.go into sa-server-wrappers.go and sa-wrappers.go
(containing the client code)
5. In sa-server-wrappers.go, change the import to use sa/proto2.
6. In sa-server-wrappers.go, make a local copy of the core.StorageAuthority
interface that uses the sa/proto2 types. This was necessary as
a temporary kludge because of how the server wrapper internally
uses the core.StorageAuthority interface.
7. Fix all the pointer-vs-value build errors in every other package.
8. Run integration tests.
I also performed those steps with proto2 and proto3 swapped, to confirm the
behavior when a proto2 client talks to a proto3 SA.
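The pattern, sketched on a hypothetical proto2-generated request with a pointer field:
```
package example

import "errors"

// hypotheticalRequest mimics a proto2-generated message: scalar fields are
// pointers, and a proto3 client encodes a zero value as an absent field.
type hypotheticalRequest struct {
	Id *int64
}

// regIDFromRequest treats a nil Id as zero rather than rejecting it, so a
// proto2 server accepts requests from proto3 clients.
func regIDFromRequest(req *hypotheticalRequest) (int64, error) {
	if req == nil {
		return 0, errors.New("nil request")
	}
	if req.Id == nil {
		// Before this change we returned an error here, which broke proto3
		// clients legitimately sending Id == 0.
		return 0, nil
	}
	return *req.Id, nil
}
```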
This updates va.proto to use proto3 syntax, and updates
all clients of the autogenerated code to use the new types.
In particular, it removes indirection from built-in types
(proto3 uses ints, rather than pointers to ints, for example).
Depends on #5003. Fixes #4956
ra.PerformValidation's goroutine surfaces errors not by returning them,
but by accumulating them into the `prob` variable and saving them to
the database. This makes it possible for processing to continue even
in error cases when it should (mostly) halt. This change fixes a bug
where we would try to access a member of the result returned from
va.PerformValidation, even if that function call had returned an error.
ACME Challenges are well-known strings ("http-01", "dns-01", and
"tlsalpn-01") identifying which kind of challenge should be used
to verify control of a domain. Because they are well-known and
only certain values are valid, it is better to represent them as
something more akin to an enum than as bare strings. This also
improves our ability to ensure that an AcmeChallenge is not
accidentally used as some other kind of string in a different
context. This change also brings them closer in line with the
existing core.AcmeResource and core.OCSPStatus string enums.
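The resulting type looks roughly like this, following the pattern of the existing string enums (a sketch; exact names may vary):
```
package core

// AcmeChallenge is a string enum of the well-known challenge type names.
type AcmeChallenge string

const (
	ChallengeTypeHTTP01    AcmeChallenge = "http-01"
	ChallengeTypeDNS01     AcmeChallenge = "dns-01"
	ChallengeTypeTLSALPN01 AcmeChallenge = "tls-alpn-01"
)

// IsValid reports whether the challenge is one of the known types.
func (c AcmeChallenge) IsValid() bool {
	switch c {
	case ChallengeTypeHTTP01, ChallengeTypeDNS01, ChallengeTypeTLSALPN01:
		return true
	default:
		return false
	}
}
```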
Fixes #5009
Updates the type of the ValidationAuthority's PerformValidation
method to be identical to that of the corresponding auto-generated
grpc method, i.e. directly taking and returning proto message
types, rather than exploded arguments.
This allows all logic to be removed from the VA wrappers, which
will allow them to be fully removed after the migration to proto3.
Also updates all tests and VA clients to adopt the new interface.
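The interface change, sketched with illustrative stand-ins for the generated `vapb` message types:
```
package example

import "context"

// Illustrative stand-ins for the generated vapb request/response messages.
type performValidationRequest struct {
	Domain    string
	Challenge []byte // serialized challenge
	AuthzID   string
}

type validationResult struct {
	Records  [][]byte // serialized validation records
	Problems []byte   // serialized problem details, if any
}

// Before: PerformValidation took exploded arguments (domain, challenge,
// authorization) and returned []core.ValidationRecord plus an error.
// After: proto message in, proto message out, matching the generated stub.
type validationAuthority interface {
	PerformValidation(ctx context.Context, req *performValidationRequest) (*validationResult, error)
}
```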
Depends on #4983 (do not review first four commits)
Part of #4956
We'd like to issue certs with no CN eventually, but it's not
going to happen any time soon. In the mean time, the existing
code never gets exercised and is rather complex, so this
removes it.
This is the only method on the ca which uses a non-proto
type as its request or response value. Changing this to
use a proto removes the last logic from the wrappers,
allowing them to be removed in a future CL. It also makes
the interface more uniform and easier to reason about.
Issue: #4940
We previously used mixed case names for proto imports
(e.g. both `caPB` and `rapb`), sometimes in the same file.
This change standardizes on the all-lowercase spelling,
which was predominant throughout the codebase.
This updates va.proto to use proto3 syntax, and updates
all clients of the autogenerated code to use the new types. In
particular, it removes indirection from built-in types (proto3
uses ints, rather than pointers to ints, for example).
Fixes #4956
This updates the ca.proto to use proto3 syntax, and updates
all clients of the autogenerated code to use the new types. In
particular, it removes indirection from built-in types (proto3
uses ints, rather than pointers to ints, for example).
It also updates a few instances where tests were being
conducted to see if various object fields were nil to instead
check for those fields' new zero-value.
Fixes #4940
This commit consists of three classes of changes:
1) Changing various command main.go files to always behave as they
would have when features.BlockedKeyTable was true. Also changing
one test in the same manner.
2) Removing the BlockedKeyTable flag from configuration in config-next,
because the flag is already live.
3) Moving the BlockedKeyTable flag to the "deprecated" section of
features.go, and regenerating featureflag_strings.go.
A future change will remove the BlockedKeyTable flag (and other
similarly deprecated flags) from features.go entirely.
Fixes #4873
This reverts commit 6454513ded.
We actually need to wait 90 days to ensure the issuerID field of the
certificateStatus table is non-nil for all extant certificates.
As part of that, add support for issuer IDs in orphan-finder's
and RA's calls to GenerateOCSP.
This factors out the idForIssuer logic from ca/ca.go into a new
issuercerts package.
orphan-finder refactors:
- Add a list of issuers in config.
- Create an orphanFinder struct to hold relevant fields, including the newly added issuers field.
- Factor out a storeDER function to reduce duplication between the parse-der and parse-ca-log cases.
- Use test certificates generated specifically for orphan-finder tests. This was necessary because the issuers of these test certificates have to be configured for the orphan finder.
This required a refactoring: Move validateEmail from the RA to ValidEmail
in the `policy` package. I also moved `ValidDomain` from a method on
PolicyAuthority to a standalone function so that ValidEmail can call it.
notify-mailer will now log invalid addresses and skip them without
attempting to send mail. Since @example.com addresses are invalid,
I updated the notify-mailer test, which used a lot of such addresses.
Also, now when notify-mailer receives an unrecoverable error sending
mail, it logs the email address and what offset within the list it was.
When we originally added this package (4 years ago) x/crypto/ocsp didn't
have its own list of revocation reasons, so we added our own. Now it does
have its own list, so just use that list instead of duplicating code for
no real reason.
Also we build a list of the revocation reasons we support so that we can
tell users when they try to use an unsupported one. Instead of building
this string every time, just build it once during package initialization.
Finally, return the same error message in wfe that we use in wfe2 when a
user requests an unsupported reason.
Adds a daemon which monitors the new blockedKeys table, checks for any unexpired, unrevoked certificates associated with the added SPKI hashes, revokes them, and notifies the user that issued the certificates.
Fixes #4772.
https://tools.ietf.org/html/rfc8555#section-7.3
Clients MUST NOT
provide a "mailto" URL in the "contact" field that contains "hfields"
[RFC6068] or more than one "addr-spec" in the "to" component. If a
server encounters a "mailto" contact URL that does not meet these
criteria, then it SHOULD reject it as invalid.
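A sketch of such a check (the helper name and error texts are assumptions, not Boulder's actual code):
```
package example

import (
	"errors"
	"net/url"
	"strings"
)

// validateMailtoContact rejects mailto contact URLs with hfields or more
// than one addr-spec, per RFC 8555 section 7.3.
func validateMailtoContact(contact string) error {
	u, err := url.Parse(contact)
	if err != nil || u.Scheme != "mailto" {
		return errors.New("contact must be a valid mailto: URL")
	}
	if u.RawQuery != "" {
		return errors.New("contact mailto: URL must not contain hfields")
	}
	if strings.Contains(u.Opaque, ",") {
		return errors.New("contact mailto: URL must contain exactly one addr-spec")
	}
	return nil
}
```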
The RA should set the expiry of valid authorizations based only on the current time and the configured authorizationLifetime. It should not extend the pending authorization's lifetime by the authorizationLifetime.
Resolves #4617
I didn't gate this with a feature flag. If we think this needs an API announcement and gradual rollout (I don't personally think this change deserves that) then I think we should change the RA config's authorizationLifetimeDays value to 37 days instead of adding a feature flag that we'll have to clean up after the flag date. We can change it back to 30 after the flag date.
In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back.
There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case, dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change.
Fixes #4591.
In the deep dark history of Boulder we ended up jamming contacts into
a VARCHAR db field. We need to make sure that when contacts are
marshaled the resulting bytes will fit into the column or a 500 will
be returned to the user when the SA RPC fails.
One day we should fix this properly and not return a hacky error message
that's hard for users to understand. Unfortunately that will likely
require a migration or a new DB table. In the shorter term this hack
will prevent 500s which is a clear improvement.
Previously we weren't checking the domain portion of an email contact address
very strictly in the RA. This updates the PA to export a function that
can be used to validate the domain the same way we validate domain
portions of DNS type identifiers for issuance.
This also changes the RA to use the `invalidEmail` error type in more
places.
A new Go integration test is added that checks these errors end-to-end
for both account creation and account update.
We need the RA's `NewOrder` RPC to return a `berrors.Malformed` instance
when there are too many identifiers. A bare error will be turned into
a server internal problem by the WFE2's `web.ProblemDetailsForError`
call while a `berrors.Malformed` will produce the expected malformed
problem.
This commit fixes the error, updates the unit test, and adds an end-to-end
integration test so we don't mess this up again.
We need to apply some fixes for bugs introduced in #4476 before it can be deployed, as such we need to revert #4495 as there needs to be a full deploy cycle between these two changes.
This reverts commit 3ae1ae1.
😭
This change set makes the authz2 storage format the default format. It removes
most of the functionality related to the previous storage format, except for
the SA fallbacks and old gRPC methods which have been left for a follow-up
change in order to make these changes deployable without introducing
incompatibilities.
Fixes #4454.
This also adds the badCSR error type specified by RFC 8555. It is a natural fit for the errors in VerifyCSR that aren't covered by badPublicKey. The web package function for converting a berror to
a problem is updated for the new badCSR error type.
The callers (RA and CA) are updated to return the berrors from VerifyCSR as is instead of unconditionally wrapping them as a berrors.MalformedError instance. Unit/integration tests are updated accordingly.
Resolves #4418
In #4179 we added a different method of counting the certificatesPerName
rate limit that can provide the correct behavior for exact public suffix
matches without the need for a separate RPC call. This cleans up the
separate code paths in the SA and RA that are no longer necessary.
In the RA's recheckCAA function we loop through a list of *core.Authorizations, dispatching each to a goroutine that checks CAA for the authz and writes an error to a results channel.
Later, we iterate the same *core.Authorization list and read errors from the channel. If we get a non-nil error, then the current iteration's *core.Authorization is used as the identifier for the suberror created with the non-nil error.
This is a flawed approach and relies on the scheduling of recheck goroutines matching the iteration of the authorizations. When the goroutines write error results to the channel in an order that doesn't match the loop over the authorizations the RA will construct a suberror with the wrong identifier. This manifests as making the TestRecheckCAAFail unit test appear flaky, because it specifically checks the expected identifiers in the returned subproblems.
The fix involves writing both the checked authorization and the error result to the results channel. Later instead of iterating the authorizations we just read the correct number of results from the channel and use the attached authorization from the result when constructing a suberror.
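The shape of the fix, sketched with simplified types in place of *core.Authorization and the berrors suberror machinery:
```
package example

import "context"

type authz struct {
	Identifier string
}

type caaResult struct {
	authz *authz
	err   error
}

// recheckCAAs pairs each authorization with its own result on the channel,
// so an error can never be attributed to the wrong identifier.
func recheckCAAs(ctx context.Context, authzs []*authz, check func(context.Context, *authz) error) map[string]error {
	ch := make(chan caaResult, len(authzs))
	for _, a := range authzs {
		go func(a *authz) {
			ch <- caaResult{authz: a, err: check(ctx, a)}
		}(a)
	}
	failures := make(map[string]error)
	// Read exactly len(authzs) results; each carries its own authorization,
	// so goroutine scheduling order no longer matters.
	for range authzs {
		res := <-ch
		if res.err != nil {
			failures[res.authz.Identifier] = res.err
		}
	}
	return failures
}
```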
Resolves #4248
Take away lessons:
Write unit tests and always verify expected values!
Always investigate flaky unit tests! Sometimes there's a real bug and not just a subpar test :-)
When Boulder's RA rechecks CAA for a set of authorization identifiers it
should use suberrors to make it easy to identify which of a possible 100
identifiers had a CAA issue at order finalization time.
Updates #4193. Resolves #4235
This will allow implementing sub-problems without creating a cyclic
dependency between `core` and `problems`.
The `identifier` package is somewhat small/single-purpose and in the
future we may want to move more "ACME" bits beyond the `identifier`
types into a dedicated package outside of `core`.
The RA now updates a `names_per_cert` Prometheus histogram, sliced by
a "type" label. NewOrder requests will observe the number of identifiers
in the new order with the type label == "requested". Successful order
finalization will observe the number of names in the issued certificate
with the type label == "issued".
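Roughly, using the Prometheus client library (the bucket boundaries here are an assumption):
```
package example

import "github.com/prometheus/client_golang/prometheus"

var namesPerCert = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "names_per_cert",
		Help:    "Number of DNS names in requested and issued certificates",
		Buckets: []float64{1, 5, 10, 20, 30, 50, 100},
	},
	[]string{"type"},
)

// In NewOrder:
//   namesPerCert.WithLabelValues("requested").Observe(float64(len(names)))
// On successful finalization:
//   namesPerCert.WithLabelValues("issued").Observe(float64(len(cert.DNSNames)))
```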
Previously we'd have to look up authorizations by name, then re-fetch
them by ID for return to the WFE, because some SA calls did not include
challenge objects in the authorizations they return. However, now all SA
calls do include challenge objects, so we can delete this code and save
some lookups.
This PR implements new SA methods for handling authz2 style authorizations and updates existing SA methods to count and retrieve them where applicable when the `NewAuthorizationSchema` feature is enabled.
Fixes #4093. Fixes #4082
Updates #4078
Updates #4077
Checking the "certificates per name" rate limit is moderately expensive, particularly
for domains that have lots of certificates on their subdomains. By checking the renewal
exemption first, we can save some database queries in a lot of cases.
Part of #4152
Previously only "pending" orders were returned by `sa.GetOrderForNames` for the RA to reuse in `ra.NewOrder`. When we added the "ready" status to match late RFC 8555 developments we forgot to update `GetOrderForNames` to return "ready" orders. Prior to the "ready" status existing a fully validated order would have been "pending" and reused. This branch updates the reuse logic to restore reuse of validated orders.
Resolves https://github.com/letsencrypt/boulder/issues/4117
Early ACME drafts supported a notion of "combinations" of challenges
that had to be completed together. This was removed from subsequent
drafts. Boulder has only ever supported "combinations" that exactly map
to the list of challenges, 1 for 1.
This removes all the plumbing for combinations, and adds a list of
combinations to the authz JSON right before marshaling it in WFE1.
* Remove the challenge whitelist
* Reduce the signature for ChallengesFor and ChallengeTypeEnabled
* Some unit tests in the VA were changed from testing TLS-SNI to testing the same behavior
in TLS-ALPN, when that behavior wasn't already tested. For instance timeouts during connect
are now tested.
Fixes #4109
If an order for a given set of names will fail finalization because of certificate rate limits (certs per domain, certs per fqdn set) there isn't any point in allowing an order for those names to be created. We can stop a lot of requests earlier by enforcing the cert rate limits at new order time as well as finalization time. A new RA `EarlyOrderRateLimit` feature flag controls whether this is done or not.
Resolves #3975
The existing (but undeployed) `AllowRenewalFirstRL` feature flag is used to gate whether the SA `CountCertificatesByNames` and `CountCertificatesByExactNames` RPCs will exclude renewals from the returned counts using the `issuedNames` table's `renewal` field.
The previous implementation of `AllowRenewalFirstRL` is deleted. It wasn't performant in specific corner cases.
There's no new integration test in this branch because the existing `test_renewal_exemption` integration test from the first `AllowRenewalFirstRL` implementation provides the required coverage for `config-next` runs.
Resolves #4060. Resolves #4006
Implements a feature that enables immediate revocation instead of marking a certificate revoked and waiting for the OCSP-Updater to generate the OCSP response. This means that as soon as the request returns from the WFE the revoked OCSP response should be available to the user. This feature requires that the RA be configured to use the standalone Akamai purger service.
Fixes #4031.
Before fixing `ra.NewOrder` the `TestNewOrderAuthzReuseDisabled` test
from this branch fails as expected based on #4026:
```
=== RUN TestNewOrderAuthzReuseDisabled
--- FAIL: TestNewOrderAuthzReuseDisabled (0.24s)
ra_test.go:2477: "reused-valid-authz" == "reused-valid-authz"
FAIL
FAIL github.com/letsencrypt/boulder/ra 0.270s
```
Afterwards it passes:
```
=== RUN TestNewOrderAuthzReuseDisabled
--- PASS: TestNewOrderAuthzReuseDisabled (0.26s)
PASS
ok github.com/letsencrypt/boulder/ra 1.291s
```
Staging and prod both deployed the PerformValidationRPC feature flag. All running WFE/WFE2 instances are using the more accurately named PerformValidation RPC and we can strip out the old UpdateAuthorization bits. The feature flag for PerformValidationRPC remains until we clean up the staging/prod configs.
Resolves #3947 and completes the last of #3930
The existing RA `UpdateAuthorization` RPC needs replacing for
two reasons:
1. The name isn't accurate - `PerformValidation` better captures
the purpose of the RPC.
2. The `core.Challenge` argument is superfluous since Key
Authorizations are not sent in the initiation POST from the client
anymore. The corresponding unmarshal and verification is now
removed. Notably this means broken clients that were POSTing
the wrong thing and failing pre-validation will now likely fail
post-validation.
To remove `UpdateAuthorization` the new `PerformValidation`
RPC is added alongside the old one. WFE and WFE2 are
updated to use the new RPC when the perform validation
feature flag is enabled. We can remove
`UpdateAuthorization` and its associated wrappers once all
WFE instances have been updated.
Resolves https://github.com/letsencrypt/boulder/issues/3930
Removes superfluous usage of `UpdatePendingAuthorization` in the RA to update the key authorization and test if the authorization is pending and instead uses the result of the initial `GetAuthorization` call in the WFE.
Fixes #3923.
At one point, the POST to a challenge object was in theory updating a
resource, and might have a "Type" field. We considered adding a check
that the "Type" field matched the stored "Type" field. In ACMEv2, the POST
body is always simply `{}` so this check is irrelevant. And we're unlikely
to change the behavior in ACMEv1, so let's get rid of the counter and TODO.
Resolves https://github.com/letsencrypt/boulder/issues/3872
**Note to reviewers**: There's an outstanding bug that I've tracked down to the `--load` stage of the integration tests that results in one of the remote VA instances in the `test/config-next` configuration under Go 1.11.1 to fail to cleanly shut down. I'm working on finding the root cause but in the meantime I've disabled `--load` during CI so we can unblock moving forward with getting Go 1.11.1 in dev/CI. Tracking this in https://github.com/letsencrypt/boulder/issues/3889
Removes the checks for a handful of deployed feature flags in preparation for removing the flags entirely. Also moves all of the currently deprecated flags to a separate section of the flags list so they can be more easily removed once purged from production configs.
Fixes #3880.
Required a little bit of rework of the RA issuance flow (to add parsing of the precert to determine the expiration date, and moving final cert parsing before final cert submission) and RA tests, but I think it shouldn't create any issues...
Fixes #3197.
In db01b0b we removed email validation from the RA. This was the only
use of the `bdns` package by the RA and so we can go one step further
and delete the remaining setup, configuration and `bdns` fields.
Performing DNS lookups to check the A/AAAA/MX records of a provided contact e-mail address adds variability to the RA's NewRegistration/UpdateRegistration functions and requires that the RA be able to reach out to the EFN. Since this is simply a convenience to prevent some classes of registration errors we can remove it to improve performance and to tighten up our security posture slightly.
Resolves https://github.com/letsencrypt/boulder/issues/3849
Things removed:
* features.EmbedSCTs (and all the associated RA/CA/ocsp-updater code etc)
* ca.enablePrecertificateFlow (and all the associated RA/CA code)
* sa.AddSCTReceipt and sa.GetSCTReceipt RPCs
* publisher.SubmitToCT and publisher.SubmitToSingleCT RPCs
Fixes #3755.
This adds support for the account-uri CAA parameter as specified by
section 3 of https://tools.ietf.org/html/draft-ietf-acme-caa-04, allowing
issuance to be restricted to one or more ACME accounts as specified by CAA
records.
We see a fair number of ACME accounts/registrations with contact
addresses for the RFC2606 Section 3 "Reserved Example Second Level
Domain Names" (`example.com`, `example.net`, `example.org`). These are
not real contact addresses and are likely the result of the user
copy-pasting example configuration. These users will miss out on
expiration emails and other subscriber communications :-(
This commit updates the RA's `validateEmail` function to reject any
contact addresses for reserved example domain names. The corresponding
unit test is updated accordingly.
Resolves https://github.com/letsencrypt/boulder/issues/3719
When performing CAA checking respect the validation-methods parameter (if
present) and restrict the allowed authorization methods to those specified.
This allows a domain to restrict authorization methods that can be used with
Let's Encrypt.
This is largely based on PR #3003 (by @lukaslihotzki), which was landed and
then later reverted due to issue #3143. The bug the resulted in the previous
code being reverted has been addressed (likely inadvertently) by 76973d0f.
This implementation also includes integration tests for CAA validation-methods.
Fixes issue #3143.
When rechecking CAA, the existing code maps all failures to a CAAError.
This means that any other non-CAA failure (for example, an internal server
error) gets hidden.
Avoid this by reworking recheckCAA to return errors; if we find a
non-CAAError, we return it directly. Revise tests to cover both
situations.
Updates issue #3143.
So I took a quick stab at one possible solution for the authz expiration variance discussed over at community.letsencrypt.org/t/inconsistent-datetime-format-in-some-responses/61452
Golang's nanosecond precision results in newly created pending authz having one expires datetime, but subsequent requests have a different expires datetime due to database storage throwing away fractional second information.
"expires": "2018-04-14T04:43:00.105818284Z",
...
"expires": "2018-04-14T04:43:00Z",
I am not a Go expert so there might be some more widely accepted approach to accomplishing the same thing, please let me know if you would prefer a different solution.
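One way to do this, as a sketch: truncate to the database's precision before the value is first returned.
```
package example

import "time"

// pendingExpiry computes a pending authorization expiry truncated to second
// precision, so the value returned at creation matches the value later read
// back from a DATETIME column (which drops fractional seconds).
func pendingExpiry(now time.Time, lifetime time.Duration) time.Time {
	return now.Add(lifetime).Truncate(time.Second)
}
```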
Using both a sync.WaitGroup and channel is somewhat unnecessary - instead
synchronize directly on the channel. Additionally, strings in Go are
immutable - as such using string concatenation in a loop results in
reallocation. Use a string slice and strings.Join, which not only avoids
this, but also cuts down on the lines of code needed.
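The pattern after the change, sketched generically:
```
package example

import "strings"

// processAll fans work out to goroutines and synchronizes on the results
// channel alone; no WaitGroup is needed since we read exactly len(items)
// results back.
func processAll(items []string, process func(string) string) string {
	results := make(chan string, len(items))
	for _, it := range items {
		go func(s string) { results <- process(s) }(it)
	}
	parts := make([]string, 0, len(items))
	for range items {
		parts = append(parts, <-results)
	}
	// One allocation via strings.Join instead of repeated concatenation.
	return strings.Join(parts, ", ")
}
```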
A very large number of the logger calls are of the form log.Function(fmt.Sprintf(...)).
Rather than sprinkling fmt.Sprintf at every logger call site, provide formatting versions
of the logger functions and call these directly with the format and arguments.
While here remove some unnecessary trailing newlines and calls to String/Error.
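For example, with stdlib log.Printf standing in for the new formatting methods on our logger:
```
package example

import (
	"fmt"
	"log"
)

// Before: every call site wraps its message in fmt.Sprintf.
func before(name, serial string) {
	log.Print(fmt.Sprintf("issued cert for %s (serial %s)", name, serial))
}

// After: the logger exposes a formatting variant directly (stdlib
// log.Printf standing in for the logger's new Infof/Errf-style methods).
func after(name, serial string) {
	log.Printf("issued cert for %s (serial %s)", name, serial)
}
```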
Submits final certificates to any configured CT logs. This doesn't introduce a feature flag as it is config gated; any log we want to submit final certificates to needs to have its log description updated to include the `"submitFinalCerts": true` field.
Fixes #3605.
In #3614 we adjusted the `SA.NewOrder` function to conditionally call `ssa.statusForOrder` on the new order when `features.OrderReadyStatus` was enabled. Unfortunately this call to `ssa.statusForOrder` happened *before* the `req.BeganProcessing` field was initialized with a pointer to a `false` bool. The `ssa.statusForOrder` function (correctly) assumes that `req.BeganProcessing == nil` is illegal and doesn't correspond to a known status. This results in `NewOrder` requests returning a 500 error
of the form:
> Internal error - Error creating new order - Order XXX is in an invalid state. No state known for this order's authorizations
Our integration tests missed this because we didn't have a test case that issued for a set of names with one account, and then issued again for the same set of names with the same account.
This PR fixes the original bug by moving the `BeganProcessing` initialization before the call to `statusForOrder`. This PR also adds an integration test to catch this sort of bug again in the future.
Prior to the SA fix this test failed with the 500 server internal error observed by the Certbot team. With the SA fix in place the test passes again.
Finally, this PR disables the `OrderReadyStatus` feature flag in `test/config-next/sa.json`. Certbot's ACME implementation breaks when this flag is enabled (See https://github.com/certbot/certbot/issues/5856). Since Certbot runs integration tests against Boulder with config-next we should be courteous and leave this flag disabled until we are closer to being able to turn it on for staging/prod.
The `TotalCertificates` rate limit serves to ensure we don't
accidentally exceed our OCSP signing capacity by issuing too many
certificates within a fixed period. In practice this rate limit has been
fragile and the associated queries have been linked to performance
problems.
Since we now have better means of monitoring our OCSP signing capacity
this commit removes the rate limit and associated code.
* SA: Add Order "Ready" status, feature flag.
This commit adds the new "Ready" status to `core/objects.go` and updates
`sa.statusForOrder` to use it conditionally for orders with all valid
authorizations that haven't been finalized yet. This state is used
conditionally based on the `features.OrderReadyStatus` feature flag
since it will likely break some existing clients that expect status
"Processing" for this state. The SA unit test for `statusForOrder` is
updated with a "ready" status test case.
* RA: Enforce order ready status conditionally.
This commit updates the RA to conditionally expect orders that are being
finalized to be in the "ready" status instead of "pending". This is
conditionally enforced based on the `OrderReadyStatus` feature flag.
Along the way the SA was changed to calculate the order status for the
order returned in `sa.NewOrder` dynamically now that it could be
something other than "pending".
* WFE2: Conditionally enforce order ready status for finalization.
Similar to the RA the WFE2 should conditionally enforce that an order's
status is either "ready" or "pending" based on the "OrderReadyStatus"
feature flag.
* Integration: Fix `test_order_finalize_early`.
This commit updates the V2 `test_order_finalize_early` test for the
"ready" status. A nice side-effect of the ready state change is that we
no longer invalidate an order when it is finalized too soon because we
can reject the finalization in the WFE. Subsequently the
`test_order_finalize_early` testcase is also smaller.
* Integration: Test classic behaviour w/o feature flag.
In the previous commit I fixed the integration test for the
`test/config-next` run that has the `OrderReadyStatus` feature flag set
but broke it for the `test/config` run without the feature flag.
This commit updates the `test_order_finalize_early` test to work
correctly based on the feature flag status in both cases.
Previously we updated the RA's issueCertificateInner function to prefix errors returned from the CA with meaningful information about which CA RPC caused the failure. Unfortunately by using fmt.Errorf to do this we're discarding the underlying error type. This can cause unexpected server internal errors downstream if (for e.g.) the CA rejects a CSR with a malformed error (see #3632).
This PR updates the issueCertificateInner error message prefixing to maintain the error type if the underlying error is a berrors.BoulderError. A RA unit test with several mock CAs is added to test the prefixing occurs as expected without loss of error type.
This PR also adds an integration test that ensures we reject a CSR with >100 names with a malformed error. This is not strictly related to this PR but since I wrote it while debugging the root issue I thought I'd include it. To allow this test to pass the pendingAuthorizationsPerAccount in test/rate-limit-policies.yml and associated tests had to be adjusted.
Resolves #3632
This PR updates the RA such that certificateRequestEvent objects created during issuance and written to the audit log as JSON also include a new Authorizations field. This field is a map of the form map[string]certificateRequestAuthz and can be used to map from an identifier name appearing in the associated certificate to a certificateRequestAuthz object. Each of the certificateRequestAuthz objects holds an authorization ID and the type of challenge that made the authorization valid.
Example Audit log output (with the JSON pulled out and pretty-printed):
```
{
  "ID": "0BjPk94KlxExRRIQ3kslRVSJ68KMaTh416chRvq0wyA",
  "Requester": 666,
  "SerialNumber": "ff699d91cab5bc84f1bc97fc71e4e27abc1a",
  "VerifiedFields": ["subject.commonName", "subjectAltName"],
  "CommonName": "rand.44634cbf.xyz",
  "Names": ["rand.44634cbf.xyz"],
  "NotBefore": "2018-03-28T19:50:07Z",
  "NotAfter": "2018-06-26T19:50:07Z",
  "RequestTime": "2018-03-28T20:50:07.234038859Z",
  "ResponseTime": "2018-03-28T20:50:07.278848954Z",
  "Authorizations": {
    "rand.44634cbf.xyz": {
      "ID": "jGt37Rnvfr0nhZn-wLkxrQxc2HBfV4t6TSraRiGnNBM",
      "ChallengeType": "http-01"
    }
  }
}
```
Resolves #3253
Also, instead of repeating the same bucket definitions everywhere, just use a single top-level var in the metrics package in order to discourage copy/pasting.
Fixes #3607.
This commit updates CTPolicy & the RA to return a distinct error when
the RA is unable to fetch the required SCTs for a certificate when
processing an issuance. This error type is plumbed up to the WFE/WFE2
where the `web/probs.go` code converts it into a server internal error
with a suitable user facing error.
Prior to this commit an order's expiry was set based on
ra.orderLifetime, while pending and valid authorization expiry was set
based on ra.pendingAuthorizationLifetime and
ra.authorizationLifetime. Since orders reused existing valid/pending
authorizations this can lead to a case where an order has an expiry
beyond the associated authorization expiries. In this case when an
authorization expired the order becomes inactionable and the extra order
lifetime is not useful.
This commit addresses this problem in two ways:
1. The SA GetAuthorizations function used to find authzs to reuse for
ra.NewOrder is adjusted to only return authorizations at minimum 24 hours
away from expiry.
2. Order expiry is now calculated by the RA in newOrder
as the min of the order's own expiry or the soonest
authorization expiry. This properly reflects the order's true
lifetime based on the authorization lifetime.
The RA/SA unit tests are updated accordingly.
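A sketch of point 2, with simplified types (the real code works with the authorization expiry pointers returned by the SA):
```
package example

import "time"

// orderExpiry returns the order's own expiry clamped to the soonest
// expiry among its reused authorizations.
func orderExpiry(now time.Time, orderLifetime time.Duration, authzExpiries []time.Time) time.Time {
	expiry := now.Add(orderLifetime)
	for _, e := range authzExpiries {
		if e.Before(expiry) {
			expiry = e
		}
	}
	return expiry
}
```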
Resolves #3498
Adds SCT embedding to the certificate issuance flow. When an issuance is requested, a precertificate (the requested certificate but poisoned with the critical CT extension) is issued and submitted to the required CT logs. Once the SCTs for the precertificate have been collected, a new certificate is issued with the poison extension replaced by an SCT list extension containing the retrieved SCTs.
Fixes #2244, fixes #3492, and fixes #3429.
Requesting a certificate with "*.example.com" and "www.example.com" as
separate SANs doesn't make sense, because "www.example.com" is covered
by the wildcard.
#3524
Right now we check safe browsing at new-authz time, which introduces a possible
external dependency when calling new-authz. This is usually fine, since most safe
browsing checks can be satisfied locally, but when requests have to go external,
it can create variance in new-authz timing.
Fixes #3491.
The logEvent setup we had in issueCertificate depended on all error
handling code setting logEvent.Error in addition to returning the error.
Since this is not the common pattern in Go, there were a few places
where logEvent.Error wasn't set, making it hard to find the root cause
of errors in the RA logs (though the errors would get propagated to the
WFE for logging). This change wraps issueCertificate such that all
errors returned get logged.
This commit updates the RA to make the notion of submitting
a KeyAuthorization value as part of the ra.UpdateAuthorization call
optional. If set, the value is checked against the expected key
authorization, and an error is returned if the provided value isn't
correct. If it isn't set,
the RA populates the field with the computed authorization for the VA to
enforce against the value it sees in challenges. This retains the legacy
behaviour of the V1 API. The V2 API will never unmarshal a provided
key authorization.
The ACMEv2/WFEv2 prepChallengeForDisplay function is updated to strip
the ProvidedKeyAuthorization field before sending the challenge object
back to a client. ACMEv1/WFEv1 continue to return the KeyAuthorization
in challenges to avoid breaking clients that are relying on this legacy
behaviour.
For deployability ease this commit retains the name of the
core.Challenge.ProvidedKeyAuthorization field even though it should
be called core.Challenge.ComputedKeyAuthorization now that it isn't
set based on the client's provided key authz. This will be easier as
a follow-up change.
Resolves #3514
This race was uncovered by running the load generator as part of our CI.
Also, update ra_test.go. It was previously testing that the returned authz
and the stored authz should be identical, which is not actually a property
of UpdateAuthorization; in general, they will not be identical.
There's a minor data race in UpdateAuthorization where the authorization can
be changed in the goroutine concurrently with returning it. This fixes it by
copying the authorization as an argument to the goroutine.
Fixes #3506.
Previously we introduced the concept of a "pending orders per account
ID" rate limit. After struggling with making an implementation of this
rate limit perform well we reevaluated the problem and decided a "new
orders per account per time window" rate limit would be a better fit for
ACMEv2 overall.
This commit introduces the new newOrdersPerAccount rate limit. The RA
now checks this before creating new pending orders in ra.NewOrder. It
does so after order reuse takes place ensuring the rate limit is only
applied in cases when a distinct new pending order row would be created.
To accomplish this a migration for a new orders field (created) and an
index over created and registrationID is added. It would be possible to
use the existing expires field for this like we've done in the past, but that
was primarily to avoid running a migration on a large table in prod. Since
we don't have that problem yet for V2 tables we can Do The Right Thing
and add a column.
For deployability the deprecated pendingOrdersPerAccount code & SA
gRPC bits are left around. A follow-up PR will be needed to remove
those (#3502).
Resolves #3410
When we originally deployed the inline CT submission code, we wanted to
be conservative in case it increased our issuance error rate. However,
we've established that the success rate is quite good, so we'll remove
some complexity and make things more realistic by removing the code that
avoids returning errors when CT submission fails.
We occasionally get deadline exceeded errors inside RA's NewCertificate
method, however it's not clear which SA queries cause these: we just see
"deadline exceeded" errors. This prefixes errors from the most relevant
queries with details about the request being made.
Removes usage of the `EnforceChallengeDisable` feature, the feature itself is not removed as it is still configured in staging/production, once that is fixed I'll submit another PR removing the actual flag.
This keeps the behavior that when authorizations are retrieved from the SA they have their challenges populated, because that seems to make the most sense to me? It also retains TLS re-validation.
Fixes #3441.
Prior to this commit a logical error in the RA's `NewOrder` caused
a safety check that prevents authorization reuse with a non-wildcard
authz for a wildcard name to not work. This commit adds a test for the
condition that the safety check is designed for and fixes the logical
error. Prior to fixing the logical error the test fails. With the
corrected safety check the test passes.
Prior to this commit when building up the authorizations for a new-order
request we looked for any unexpired pending/valid authorizations owned
by the account and used them for the order. This allows a client to use
the V1 new-authz endpoint in combination with the V2 new-order endpoint
and we do not want to support this behaviour. All V2 authorizations
should be sourced from other V2 orders. This commit implements a new
parameter for the SA's getAuthorizations function that allows filtering
out legacy V1 authorizations by doing a JOIN on the order to
authorizations join table.
Resolves #3328
This commit resolves the case where an error during finalization occurs.
Prior to this commit if an error (expected or otherwise) occurred after
setting an order to status processing at the start of order
finalization the order would be stuck processing forever.
The SA now has a `SetOrderError` RPC that can be used by the RA to
persist an error onto an order. The order status calculation can use
this error to decide if the order is invalid. The WFE is updated to
write the error to the order JSON when displaying the order information.
Prior to this commit the order protobuf had the error field as
a `[]byte`. It doesn't seem like this is the right decision, we have
a specific protobuf type for ProblemDetails and so this commit switches
the error field to use it. The conversion to/from `[]byte` is done with
the model by the SA.
An integration test is included that prior to this commit left an order
in a stuck processing state. With this commit the integration test
passes as expected.
Resolves https://github.com/letsencrypt/boulder/issues/3403
The SA RPC previously called `GetOrderAuthorizations` only returns
**valid, unexpired** authorizations. This commit updates the name to
emphasize that it only returns valid order authzs.
This PR is a rework of what was originally https://github.com/letsencrypt/boulder/pull/3382, integrating the design feedback proposed by @jsha: https://github.com/letsencrypt/boulder/pull/3382#issuecomment-359912549
This PR removes the stored Order status field and replaces it with a value that is calculated on-the-fly by the SA when fetching an order, based on the order's associated authorizations.
In summary (and order of precedence):
* If any of the order's authorizations are invalid, the order is invalid.
* If any of the order's authorizations are deactivated, the order is deactivated.
* If any of the order's authorizations are pending, the order is pending.
* If all of the order's authorizations are valid, and there is a certificate serial, the order is valid.
* If all of the order's authorizations are valid, and we have begun processing, but there is no certificate serial, the order is processing.
* If all of the order's authorizations are valid, and we haven't begun processing, then the order is pending, awaiting a finalization request.
This avoids having to explicitly update the order status when an associated authorization changes status.
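Sketched in Go, the precedence above works out to something like this (illustrative only, not the actual sa.statusForOrder):
```
package example

// statusForOrder derives an order's status from its authorizations'
// statuses, whether finalization has begun, and any certificate serial.
func statusForOrder(authzStatuses []string, beganProcessing bool, certSerial string) string {
	var invalid, deactivated, pending, valid int
	for _, s := range authzStatuses {
		switch s {
		case "invalid":
			invalid++
		case "deactivated":
			deactivated++
		case "pending":
			pending++
		case "valid":
			valid++
		}
	}
	switch {
	case invalid > 0:
		return "invalid"
	case deactivated > 0:
		return "deactivated"
	case pending > 0:
		return "pending"
	case valid == len(authzStatuses) && certSerial != "":
		return "valid"
	case valid == len(authzStatuses) && beganProcessing:
		return "processing"
	default:
		// All valid, not yet processing: awaiting a finalization request.
		return "pending"
	}
}
```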
The RA's implementation of new-order is updated to only reuse an existing order if the calculated status is pending. This avoids giving back invalid or deactivated orders to clients.
Resolves #3333
Fixes #3368.
Basically just adds a `csr.VerifyCSR` call in `ra.FinalizeOrder` that mirrors what we have in `ra.NewCertificate`, this moves the CN to SAN as expected if included.
Per
https://community.letsencrypt.org/t/2018-01-11-update-regarding-acme-tls-sni-and-shared-hosting-infrastructure/50188/3,
we are planning to treat prior issuance by an account as reason to whitelist
that account for reissuance via TLS-SNI. By extension, reusing validations that
occurred prior to disclosure of the TLS-SNI issue is reasonably safe, so this
change removes the issuance-time check for whether a challenge has been
disabled. This saves us significant complexity and database load in implementing
TLSSNIRevalidation (https://github.com/letsencrypt/boulder/pull/3361), since
ChallengeTypeEnabled returns false, so we'd have to plumb through data about
whether an issuance was based on a revalidation. Instead, we can safely delete
this code.
Note that "EnforceChallengeDisable" is implemented in three places: new-authz,
validation time, and issuance time. We're keeping it in place at new-authz for
now because it's intertwined with the account whitelisting code. We're keeping
it in place at validation time, because there's a small chance that someone
could have created a pending authz for a domain they don't control before the
TLS-SNI issue was announced, and that authz could still be pending, and they
could find out that that domain is hosted on a vulnerable provider, and use the
vulnerability now that they know about it. A tiny chance, but may as well be
careful.
This change adds a feature flag, TLSSNIRevalidation. When it is enabled, Boulder
will create new authorization objects with TLS-SNI challenges if the requesting
account has issued a certificate with the relevant domain name, and was the most
recent account to do so*. This setting overrides the configured list of
challenges in the PolicyAuthority, so even if TLS-SNI is disabled in general, it
will be enabled for revalidation.
Note that this interacts with EnforceChallengeDisable. Because
EnforceChallengeDisable causes additional checked at validation time and at
issuance time, we need to update those two places as well. We'll send a
follow-up PR with that.
*We chose to make this work only for the most recent account to issue, even if
there were overlapping certificates, because it significantly simplifies the
database access patterns and should work for 95+% of cases.
Note that this change will let an account revalidate and reissue for a domain
even if the previous issuance on that account used http-01 or dns-01. This also
simplifies implementation, and fits within the intent of the mitigation plan: If
someone previously issued for a domain using http-01, we have high confidence
that they are actually the owner, and they are not going to "steal" the domain
from themselves using tls-sni-01.
Also note: This change also doesn't work properly with ReusePendingAuthz: true.
Specifically, if you attempted issuance in the last couple days and failed
because there was no tls-sni challenge, you'll still have an http-01 challenge
lying around, and we'll reuse that; then your client will fail due to lack of
tls-sni challenge again.
This change was joint work between @rolandshoemaker and @jsha.
This updates the PA component to allow authorization challenge types that are globally disabled if the account ID owning the authorization is on a configured whitelist for that challenge type.
This commit adds pending order reuse. Subsequent to this commit multiple
add-order requests from the same account ID for the same set of order
names will result in only one order being created. Orders are only
reused while they are not expired. Finalized orders will not be reused
for subsequent new-order requests allowing for duplicate order issuance.
Note that this is a second level of reuse, building on the pending
authorization reuse that's done between separate orders already.
To efficiently find an appropriate order ID given a set of names,
a registration ID, and the current time a new orderFqdnSets table is
added with appropriate indexes and foreign keys.
Resolves #3258
This commit implements RFC 6844's description of the "CAA issuewild
property" for CAA records.
We check CAA in two places: at the time of validation, and at the time
of issuance when an authorization is more than 8 hours old. Both
locations have been updated to properly enforce issuewild when checking
CAA for a domain corresponding to a wildcard name in a certificate
order.
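The selection rule from RFC 6844, in sketch form (not the actual Boulder code):
```
// relevantRecords picks which CAA property set governs issuance for a
// name: for a wildcard name, issuewild records take precedence when
// present; otherwise the issue records apply.
func relevantRecords(wildcard bool, issue, issuewild []string) []string {
	if wildcard && len(issuewild) > 0 {
		return issuewild
	}
	return issue
}
```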
Resolves https://github.com/letsencrypt/boulder/issues/3211
This commit adds a new rate limit to restrict the number of outstanding
pending orders per account. If the threshold for this rate limit is
crossed subsequent new-order requests will return a 429 response.
Note: Since the rate limit object itself defines an `Enabled()` test
based on whether or not it has been configured, there is **not**
a feature flag for this change.
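The pattern is roughly the following (field names assumed):
```
import "time"

// A rate limit is "enabled" simply when it has been configured with a
// nonzero threshold, so no separate feature flag is needed.
type RateLimitPolicy struct {
	Threshold int64
	Window    time.Duration
}

func (r RateLimitPolicy) Enabled() bool {
	return r.Threshold != 0
}
```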
Resolves https://github.com/letsencrypt/boulder/issues/3246
This PR implements issuance for wildcard names in the V2 order flow. By policy, pending authorizations for wildcard names only receive a DNS-01 challenge for the base domain. We do not re-use authorizations for the base domain that do not come from a previous wildcard issuance (e.g. a normal authorization for example.com turned valid by way of a DNS-01 challenge will not be reused for a *.example.com order).
The wildcard prefix is stripped off of the authorization identifier value in two places:
When presenting the authorization to the user - ACME forbids having a wildcard character in an authorization identifier.
When performing validation - We validate the base domain name without the *. prefix.
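Both amount to a simple prefix strip, roughly:
```
import "strings"

// baseDomain returns the name to present or validate for an identifier,
// with any wildcard prefix removed. Sketch only.
func baseDomain(identifier string) string {
	return strings.TrimPrefix(identifier, "*.")
}
```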
This PR is largely a rewrite/extension of #3231. Instead of using a pseudo-challenge-type (DNS-01-Wildcard) to indicate an authorization & identifier correspond to the base name of a wildcard order name we instead allow the identifier to take the wildcard order name with the *. prefix.
Makes a couple of changes:
* Change `SubmitToCT` to make submissions to each log in parallel instead of in serial; this prevents a single slow log from eating up the majority of the deadline and causing submissions to other logs to fail
* Remove the 'submissionTimeout' field on the publisher since it is actually bounded by the gRPC timeout and is misleading
* Add a timeout to the CT client's internal HTTP client so that when log servers hang indefinitely we actually do retries instead of just using the entire submission deadline. Currently set at 2.5 minutes.
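A minimal sketch of the parallel fan-out (the log client type and its submit method are illustrative):
```
import (
	"context"
	"log"
	"sync"
)

type ctLog interface {
	submit(ctx context.Context, der []byte) error
}

// submitAll sends a submission to every log concurrently, so a single
// slow or hung log cannot consume the shared deadline of the others.
func submitAll(ctx context.Context, logs []ctLog, der []byte) {
	var wg sync.WaitGroup
	for _, l := range logs {
		wg.Add(1)
		go func(l ctLog) {
			defer wg.Done()
			if err := l.submit(ctx, der); err != nil {
				log.Printf("CT submission failed: %s", err)
			}
		}(l)
	}
	wg.Wait()
}
```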
Fixes#3218.
This PR implements order finalization for the ACME v2 API.
In broad strokes this means:
* Removing the CSR from order objects & the new-order flow
* Adding identifiers to the order object & new-order
* Providing a finalization URL as part of orders returned by new-order
* Adding support to the WFE's Order endpoint to receive finalization POST requests with a CSR
* Updating the RA to accept finalization requests and to ensure orders are fully validated before issuance can proceed
* Updating the SA to allow finding order authorizations & updating orders.
* Updating the CA to accept an Order ID to log when issuing a certificate corresponding to an order object
Resolves#3123
This commit adds CAA `issue` parameter parsing and the `challenge` parameter to permit a single challenge type only. By setting `challenge=dns-01`, the nameserver keeps control over every issued certificate.
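A hedged sketch of parsing such a value (e.g. `ca.example.net; challenge=dns-01`):
```
import "strings"

// parseIssue splits a CAA issue property value into the issuer domain
// and its parameters. Sketch only; no validation of token syntax.
func parseIssue(value string) (string, map[string]string) {
	parts := strings.Split(value, ";")
	domain := strings.TrimSpace(parts[0])
	params := make(map[string]string)
	for _, p := range parts[1:] {
		if kv := strings.SplitN(p, "=", 2); len(kv) == 2 {
			params[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
		}
	}
	return domain, params
}
```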
These were added as part of #62,
based on the original CPS at
https://letsencrypt.org/documents/ISRG-CPS-May-5-2015.pdf. Request method was an
odd thing to log because for Let's Encrypt it will always be "online", never "in
person." And VerificationMethods is logged separately during the authz
validation process. The newest CPS at
https://letsencrypt.org/documents/isrg-cps-v2.0/ no longer requires these
specific fields, so we're removing them for clarity.
For the new-order endpoint only. This does some refactoring of the order of operations in `ra.NewAuthorization` as well in order to reduce the duplication of code relating to creating pending authorizations, existing tests still seem to work as intended... A close eye should be given to this since we don't have integration tests yet that test it end to end. This also changes the inner type of `grpc.StorageAuthorityServerWrapper` to `core.StorageAuthority` so that we can avoid a circular import that is created by needing to import `grpc.AuthzToPB` and `grpc.PBToAuthz` in `sa/sa.go`.
This is a big change but should considerably improve the performance of the new-order flow.
Fixes#2955.
The RA's `checkCertificatesPerFQDNSetLimit` function was using `>` where
it should have been using `>=` when evaluating a FQDNSet count against
the rate limit threshold. This resulted in an off by one error where we
allowed 1 more duplicate certificate than intended.
This commit fixes the off-by-one error and adds a short unit test. The
unit test failed the
`TestCheckExactCertificateLimit/FQDN_set_issuances_equal_to_limit`
subtest before the fix was applied and passes afterwards.
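In miniature, the fix was:
```
// wouldExceedFQDNSetLimit reports whether another issuance must be
// blocked: with N duplicates already issued and a threshold of N, one
// more would exceed the limit.
func wouldExceedFQDNSetLimit(count, threshold int64) bool {
	return count >= threshold // was: count > threshold (off by one)
}
```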
Previously, CAA problems were lumped in under "ConnectionProblem" or
"Unauthorized". This should make things clearer and easier to differentiate.
Fixes#3043
RA.NewRegistration checked for an error return from
SA.NewRegistration, but failed to properly propagate
the error. It was just setting the err variable and falling
through to the successful return case. This fixes that
issue and adds a unit test.
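The bug, in miniature (names illustrative):
```
// The buggy version assigned err inside the if but fell through to the
// success return below, silently discarding the failure; the fix
// returns the error.
func newRegistration(create func() (int64, error)) (int64, error) {
	id, err := create()
	if err != nil {
		return 0, err // previously: err was set here but never returned
	}
	return id, nil
}
```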
This commit adds a new boulder error type WrongAuthorizationState.
This error type is returned by the SA when UpdateAuthorization is
provided an authz that isn't pending. The RA and WFE are updated
accordingly such that this error percolates back through the API to the
user as a Malformed problem with a sensible description. Previously this
behaviour resulted in a ServerInternal error.
Resolves#3032
* Remove all of the errors under core. Their purpose is now served by the `errors` package, and they were almost entirely unused. The remaining uses were switched to `errors` types.
* Remove errors.NotSupportedError. It was used in only one place (ca.go), and that usage is more appropriately a ServerInternal error.
ConnectionFailure is only used during validation, and so isn't handled by WFE's
problemDetailsFromBoulderError. This led to returning ServerInternal instead of
the intended error code, and hiding the error detail. Unauthorized is probably
a better error type for now anyhow, but long-term we should switch to a specific
CAA error type.
This PR will allow clients to see the detailed list of problem domains when
new-cert returns an error due to CAA rechecking.
This is a followup from https://github.com/letsencrypt/boulder/pull/3017, in
which we identified a data race caused by the use of named returns. This also
reverts the change from that PR, which was only a surface level fix.
Fixes#3019.
Fixes#2889.
VA now implements two gRPC services: VA and CAA. These both run on the same port, but this allows implementation of the IsCAAValid RPC to skip using the gRPC wrappers, and makes it easier to potentially separate the service into its own package in the future.
RA.NewCertificate now checks the expiration times of authorizations, and will call out to VA to recheck CAA for those authorizations that were not validated recently enough.
The existing ReusePendingAuthz implementation had some bugs:
It would recycle deactivated authorizations, which then couldn't be fulfilled. (#2840)
Since it was implemented in the SA, it wouldn't get called until after the RA checks the Pending Authorizations rate limit. Which means it wouldn't fulfill its intended purpose of making accounts less likely to get stuck in a Pending Authorizations limited state. (#2831)
This factors out the reuse functionality, which used to be inside an "if" statement in the SA. Now the SA has an explicit GetPendingAuthorization RPC, which gets called from the RA before calling NewPendingAuthorization. This happens to obsolete #2807, by putting the recycling logic for both valid and pending authorizations in the RA.
This is a step towards the long-term goal of eliminating wrappers and a step
towards the short-term goal of making it easier to refactor ca/ca_test.go to
add testing of precertificate-based issuance.
Fixes#639.
This resolves something that has bugged me for two+ years: our DNSResolverImpl is not a DNS resolver; it is a DNS client. This change just makes that obvious.
When the RA's `validateEmail` function performs DNS lookups for the MX
and A records for the email domain it returns the
`emptyDNSResponseError` message if neither an MX nor an A record is found.
Prior to this commit the message only said "empty DNS response" and
didn't indicate what the DNS lookup was being used for, or which records
were empty. This commit updates the message to hopefully be less
confusing and more immediately actionable by a user.
Prior to this PR the SA's `CountRegistrationsByIP` treated IPv6
differently than IPv4 by counting registrations within a /48 for IPv6 as
opposed to exact matches for IPv4. This PR updates
`CountRegistrationsByIP` to treat IPv4 and IPv6 the
same, always matching exactly. The existing RegistrationsPerIP rate
limit policy will be applied against this exact matching count.
A new `CountRegistrationsByIPRange` function is added to the SA that
performs the historic matching process, e.g. for IPv4 it counts exactly
the same as `CountRegistrationsByIP`, but for IPv6 it counts within
a /48.
A new `RegistrationsPerIPRange` rate limit policy is added to allow
configuring the threshold/window for the fuzzy /48 matching registration
limit. Stats for the "Exceeded" and "Pass" events for this rate limit are
separated into a separate `RegistrationsByIPRange` stats scope under
the `RateLimit` scope to allow us to track it separate from the exact
registrations per IP rate limit.
Resolves https://github.com/letsencrypt/boulder/issues/2738
With the CountCertificatesExact feature flag enabled, if the RA's
checkCertificatesPerNameLimit is called with names only containing
domains exactly matching a public suffix entry, then the legacy
ra.enforceNameCounts function will be called with an empty tldNames
argument. This in turn will cause the RA->SA RPC to fail with an
"incomplete gRPC request message error".
This commit fixes this bug by only calling ra.enforceNameCounts when
len(tldNames) > 0.
Resolves#2758
Both `ra.domainsForRateLimiting` and `ra.suffixesForRateLimiting` were
doing their own "unique"ing with a `map[string]struct{}` when they could
have used `core.UniqueLowerNames`. This commit updates them both to do
so and adjusts the unit tests to expect the new sorted order return.
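For reference, the helper does roughly this:
```
import (
	"sort"
	"strings"
)

// UniqueLowerNames lowercases, de-duplicates, and sorts a list of DNS
// names, approximating core.UniqueLowerNames.
func UniqueLowerNames(names []string) []string {
	set := make(map[string]struct{}, len(names))
	for _, n := range names {
		set[strings.ToLower(n)] = struct{}{}
	}
	unique := make([]string, 0, len(set))
	for n := range set {
		unique = append(unique, n)
	}
	sort.Strings(unique)
	return unique
}
```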
Prior to this PR if a domain was an exact match to a public suffix
list entry the certificates per name rate limit was applied based on the
count of certificates issued for that exact name and all of its
subdomains.
This PR introduces an exception such that exact public suffix
matches correctly have the certificate per name rate limit applied based
on only exact name matches.
In order to accomplish this a new RPC is added to the SA
`CountCertificatesByExactNames`. This operates similar to the existing
`CountCertificatesByNames` but does *not* include subdomains in the
count, only exact matches to the names provided. The usage of this new
RPC is feature flag gated behind the "CountCertificatesExact" feature flag.
The RA unit tests are updated to test the new code paths both with and
without the feature flag enabled.
Resolves#2681
This change changes the return values from boolean to error.
It makes `checkConsistency` an internal function and removes the
optional argument in favor of making checks explicit where they are
used.
It also renames those functions to CheckConsistency* to not
give the impression of still returning boolean values.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Previously RegistrationAuthorityImpl.NewCertificate would call
MatchesCSR() and then verify that the certificate can be successfully
parsed. However, MatchesCSR() itself parses the certificate, so the
latter check was pointless.
Instead, parse the certificate once, fail if it can't be parsed,
then pass the parsed certificate to MatchesCSR().
0e112ae updates ra/ra.go such that when onValidationUpdate returns a non-nil error the AuditErr
message includes the affected authorization ID in addition to the registration ID.
Resolves#2661.
This patch removes all usages of the `core.XXXError` types and almost all usages of `probs` outside of the WFE and VA and replaces them with a unified internal error type. Since the VA uses `probs.ProblemDetails` quite extensively in challenges, and currently stores them in the DB, I've saved that change for a follow-up (it'll also require a migration). Since `ProblemDetails` should only ever be exposed to end-users, all of its related logic should be moved into the `WFE`, but since it still needs to be exposed to the VA and SA I've left it in place for now.
The new internal `errors` package offers the same convenience functions as `probs` does as well as a new simpler type testing method. A few small changes have also been made to error messages, mainly adding the library and function name to internal server errors for easier debugging (i.e. where a number of functions return the exact same errors and there is no other way to distinguish which method threw the error).
Also adds proper encoding of internal errors transferred over gRPC (the current encoding scheme is kept for `core` and `probs` errors since it'll ideally be removed after we deploy this and follow-up changes) using `grpc/metadata` instead of the gRPC status codes.
Fixes#2507. Updates #2254 and #2505.
Fixes#976.
This implements a new rate limit, InvalidAuthorizationsPerAccount. If a given account fails authorization for a given hostname too many times within the window, subsequent new-authz attempts for that account and hostname will fail early with a rateLimited error. This mitigates misconfigured clients that constantly retry authorization even though they always fail (e.g., because the hostname no longer resolves).
For the new rate limit, I added a new SA RPC, CountInvalidAuthorizations. I chose to implement this only in gRPC, not in AMQP-RPC, so checking the rate limit is gated on gRPC. See #2406 for some description of the how and why. I also chose to directly use the gRPC interfaces rather than wrapping them in core.StorageAuthority, as a step towards what we will want to do once we've moved fully to gRPC.
Because authorizations don't have a created time, we need to look at the expires time instead. Invalid authorizations retain the expiration they were given when they were created as pending authorizations, so we use now + pendingAuthorizationLifetime as one side of the window for rate limiting, and look backwards from there. Note that this means you could maliciously bypass this rate limit by stacking up pending authorizations over time, then failing them all at once.
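In sketch form (names assumed):
```
import "time"

// rateLimitWindow computes the window for the
// InvalidAuthorizationsPerAccount check. Invalid authorizations carry
// only an expiration, so the window is anchored at
// now + pendingAuthzLifetime and looks backwards from there.
func rateLimitWindow(now time.Time, pendingAuthzLifetime, window time.Duration) (earliest, latest time.Time) {
	latest = now.Add(pendingAuthzLifetime)
	earliest = latest.Add(-window)
	return earliest, latest
}
```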
Similarly, since this limit is by (account, hostname) rather than just (hostname), you can bypass it by creating multiple accounts. It would be more natural and robust to limit by hostname, like our certificate limits. However, we currently only have two indexes on the authz table: the primary key, and
(`registrationID`,`identifier`,`status`,`expires`)
Since this limit is intended mainly to combat misconfigured clients, I think this is sufficient for now.
Corresponding PR for website: letsencrypt/website#125
We turn arrays into maps with a range command. Previously, we were taking the
address of the iteration variable in that range command, which meant incorrect
results since the iteration variable gets reassigned.
Also change the integration test to catch this error.
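The bug class, in miniature:
```
type item struct{ name string }

// index builds a map of pointers into items. The buggy form,
// `for _, it := range items { m[it.name] = &it }`, takes the address of
// the single iteration variable, so every entry ends up pointing at the
// last element. Indexing into the slice avoids that.
func index(items []item) map[string]*item {
	m := make(map[string]*item, len(items))
	for i := range items {
		m[items[i].name] = &items[i]
	}
	return m
}
```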
Fixes#2496
We have a number of stats already expressed using the statsd interface. During
the switchover period to direct Prometheus collection, we'd like to make those
stats available both ways. This change automatically exports any stats exported
using the statsd interface via Prometheus as well.
This is a little tricky because Prometheus expects all stats to be registered
exactly once. Prometheus does offer a mechanism to gracefully recover from
registering a stat more than once by handling a certain error, but it is not
safe for concurrent access. So I added a concurrency-safe wrapper that creates
Prometheus stats on demand and memoizes them.
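A hedged sketch of that wrapper using the Prometheus client library:
```
import (
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

// promMemo creates Prometheus counters on first use and memoizes them,
// guarding the map with a mutex so concurrent stat emission is safe.
type promMemo struct {
	mu       sync.Mutex
	counters map[string]prometheus.Counter
	reg      prometheus.Registerer
}

func (p *promMemo) counter(name string) prometheus.Counter {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.counters[name]; ok {
		return c
	}
	c := prometheus.NewCounter(prometheus.CounterOpts{Name: name})
	p.reg.MustRegister(c)
	p.counters[name] = c
	return c
}
```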
In the process, made a few small required side changes:
- Clean "/" from method names in the gRPC interceptors. They are allowed in
statsd but not in Prometheus.
- Replace "127.0.0.1" with "boulder" as the name of our testing CT log.
Prometheus stats can't start with a number.
- Remove ":" from the CT-log stat names emitted by Publisher. Prometheus stats
can't include it.
- Remove a stray "RA" in front of some rate limit stats, since it was
duplicative (we were emitting "RA.RA..." before).
Note that this means two stat groups in particular are duplicated:
- Gostats* is duplicated with the default process-level stats exported by the
Prometheus library.
- gRPCClient* are duplicated by the stats generated by the go-grpc-prometheus
package.
When writing dashboards and alerts in the Prometheus world, we should be careful
to avoid these two categories, as they will disappear eventually. As a general
rule, if a stat is available with an all-lowercase name, choose that one, as it
is probably the Prometheus-native version.
In the long run we will want to create most stats using the native Prometheus
stat interface, since it allows us to add labels to metrics, which is very
useful. For instance, currently our DNS stats distinguish types of queries by
appending the type to the stat name. This would be more natural as a label in
Prometheus.
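For example, with the native interface the DNS stat could carry the query type as a label rather than a name suffix (sketch):
```
import "github.com/prometheus/client_golang/prometheus"

var dnsQueries = prometheus.NewCounterVec(
	prometheus.CounterOpts{Name: "dns_queries", Help: "DNS queries by type."},
	[]string{"qtype"},
)

func init() {
	prometheus.MustRegister(dnsQueries)
}

func recordQuery(qtype string) {
	dnsQueries.WithLabelValues(qtype).Inc()
}
```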
This commit resolves#2303 by updating the comment and returned error type produced when the RA calls `SA.UpdatePendingAuthorization` and it fails.
Previously this produced a `MalformedRequestError` that was described as only happening when the client corrupted the challenge data.
Now this is returned as a more descriptive `ServerInternalError` and the underlying error from the SA is logged as a warning for further debugging.
With the current gRPC design the CA talks directly to the Publisher when calling SubmitToCT, which crosses security boundaries (secure internal segment -> internet facing segment). This is dangerous if (however unlikely) the Publisher is compromised and there is a gRPC exploit that allows memory corruption on the caller end of an RPC, which could expose sensitive information or cause arbitrary issuance.
Instead we move the RPC call to the RA which is in a less sensitive network segment. Switching the call site from the CA -> RA is gated on adding the gRPC PublisherService object to the RA config.
Fixes#2202.
This commit updates the `go-jose` dependency to [v1.1.0](https://github.com/square/go-jose/releases/tag/v1.1.0) (Commit: aa2e30fdd1fe9dd3394119af66451ae790d50e0d). Since the import path changed from `github.com/square/...` to `gopkg.in/square/go-jose.v1/` this means removing the old dep and adding the new one.
The upstream go-jose library added a `[]*x509.Certificate` member to the `JsonWebKey` struct that prevents us from using a direct equality test against two `JsonWebKey` instances. Instead we now must compare the inner `Key` members.
The `TestRegistrationContactUpdate` function from `ra_test.go` was updated to populate the `Key` members used in testing instead of only using KeyID's to allow the updated comparisons to work as intended.
The `Key` field of the `Registration` object was switched from `jose.JsonWebKey` to `*jose.JsonWebKey` to make it easier to represent a registration w/o a Key versus using a value with a nil `JsonWebKey.Key`.
I verified the upstream unit tests pass per contributing.md:
```
daniel@XXXXX:~/go/src/gopkg.in/square/go-jose.v1$ git show
commit aa2e30fdd1fe9dd3394119af66451ae790d50e0d
Merge: 139276c e18a743
Author: Cedric Staub <cs@squareup.com>
Date: Thu Sep 22 17:08:11 2016 -0700
Merge branch 'master' into v1
* master:
Better docs explaining embedded JWKs
Reject invalid embedded public keys
Improve multi-recipient/multi-sig handling
daniel@XXXXX:~/go/src/gopkg.in/square/go-jose.v1$ go test ./...
ok gopkg.in/square/go-jose.v1 17.599s
ok gopkg.in/square/go-jose.v1/cipher 0.007s
? gopkg.in/square/go-jose.v1/jose-util [no test files]
ok gopkg.in/square/go-jose.v1/json 1.238s
```
This PR reworks the validateEmail() function from the RA to allow timeouts during DNS validation of MX/A/AAAA records for an email to be non-fatal and match our intention to verify emails best-effort.
Notes:
bdns/problem.go - DNSError.Timeout() was changed to also include context cancellation and timeout as DNS timeouts. This matches what DNSError.Error() was doing to set the error message and supports external callers to Timeout not duplicating the work.
bdns/mocks.go - the LookupMX mock was changed to support always.error and always.timeout in a manner similar to the LookupHost mock. Otherwise the TestValidateEmail unit test for the RA would fail when the MX lookup completed before the Host lookup because the error wouldn't be correct (empty DNS records vs a timeout or network error).
test/config/ra.json, test/config-next/ra.json - the dnsTries and dnsTimeout values were updated such that dnsTries * dnsTimeout was <= the WFE->RA RPC timeout (currently 15s in the test configs). This allows the dns lookups to all timeout without the overall RPC timing out.
Resolves#2260.
The RA performs an RPC to the SA's `GetValidAuthorizations` function when attempting to find existing valid authorizations to reuse. Prior to this commit, if the RPC failed (e.g. due to a timeout) the calling code
logged the failure as a warning but failed to return the error and cease processing. This resulted in a nil panic when we later tried to index `auths`.
This commit inserts the missing `return` to ensure we don't process further, thereby resolving #2274.
A test for this fix is provided with `TestReuseAuthorizationFaultySA`. Without f52f340 applied this test recreates the panic observed in #2274 and produces:
```
go test -p 1 -v -race --test.run TestReuseAuthorizationFaultySA github.com/letsencrypt/boulder/ra
=== RUN TestReuseAuthorizationFaultySA
--- FAIL: TestReuseAuthorizationFaultySA (0.04s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x4be2b8]
```
With f52f340 it passes. Yay!
Fixes#503.
Functionality is gated by the feature flag `AllowKeyRollover`. Since this functionality is only specified in ACME draft-03 and we mostly implement the draft-02 style, this takes some liberties in the implementation, which are described in the updated divergences doc. The `key-change` resource is used to side-step the draft-03 `url` requirement.
Fixes#140.
This patch allows users to specify the following revocation reasons based on my interpretation of the meaning of the codes but could use confirmation from others.
* unspecified (0)
* keyCompromise (1)
* affiliationChanged (3)
* superseded (4)
* cessationOfOperation (5)
Part of #2080.
This change vendors `crypto/x509`, `crypto/x509/pkix`, and `encoding/asn1` from 1d5f6a765d. That commit is a direct child of the Go 1.5.4 release tag, so it contains the same code as the current Go version we are using. In that commit I rewrote imports in those packages so they depend on each other internally rather than calling out to the standard library, which would cause type disagreements.
I changed the imports in each place where we're parsing CSRs, and imported under a different name `oldx509`, both to avoid collisions and make it clear what's going on. Places that only use `x509` to parse certificates are not changed, and will use the current standard library.
This will unblock us from moving to Go 1.6, and subsequently Go 1.7.
Introduces the `authorizationLifetimeDays` and `pendingAuthorizationLifetimeDays` configuration options for `RA`.
If the values are missing from configuration, the code defaults back to the current values (300/7 days).
fixes#2024
This PR adds two optimizations to fix the optimistic lock errors observed in #1986.
First, the WFE now returns early for registration POSTs (before invoking the RA and SA) when the POST body is the trivial update (`{"resource":"reg"}`). This prevents any DB operations from being performed when there is no work to be done.
Second, the RA now tracks whether an update actually changes the base registration's `Contact` slice or `Agreement` string. If the proposed update doesn't change either of these fields then the RA will return early before handing the update to the SA.
Both changes save database operations from being performed needlessly and will help avoid the optimistic lock errors we observed when a problematic client was POSTing the trivial update repeatedly in a short period.
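The RA-side check amounts to a comparison like this (sketch; types simplified):
```
import "reflect"

// updateChanges reports whether a proposed update would actually modify
// the registration's contacts or agreement; if not, the RA can return
// before touching the SA.
func updateChanges(baseContacts, newContacts []string, baseAgreement, newAgreement string) bool {
	return !reflect.DeepEqual(baseContacts, newContacts) ||
		baseAgreement != newAgreement
}
```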
The fix was verified as follows: I checked out master and artificially introduced lock contention into the SA by adding a 2s sleep into `UpdateRegistration` between fetching the `existingRegModel` to get the `LockCol` value and calling `ssa.dbMap.Update`. With the sleep in place & two certbot clients posting matching registration updates the lock contention error is produced as expected. After checking out the `empty-reg-updates` branch, re-adding the sleep to the SA, and performing the same two client reg updates no error is produced.
This PR replaces the `x/net/publicsuffix` package with `weppos/publicsuffix-go`.
The conversations that led to this decision are #1479 and #1374. To summarize the discussion, the main issue with `x/net/publicsuffix` is that the package compiles the list into the Go source code and doesn't provide a way to easily pull updates (e.g. by re-parsing the original PSL) unless the entire package is recompiled.
The PSL update frequency is almost daily, which makes it very hard to recompile the official Golang package to stay up-to-date with all the changes. Moreover, Golang maintainers expressed some concerns about rebuilding and committing changes with a frequency that would keep the package in sync with the original PSL. See https://github.com/letsencrypt/boulder/issues/1374#issuecomment-182429297
`weppos/publicsuffix-go` contains a compiled version of the list that is updated weekly (or more frequently). Moreover, the package can read and parse a PSL from a String or a File, which will effectively decouple the Boulder source code from the list itself. The main benefit is that it will be possible to update the definition by simply downloading the latest list and restarting the application (assuming the list is persisted in memory).
This PR adds an expvar.Int published under the key "lastIssuance" that contains the timestamp of the last successful certificate issuance. This allows easy creation of a script that monitors the RA debug server (port 8002) to ensure that there has been a successful issuance within a set period (e.g. last five minutes).
The underlying expvar.Int code uses the atomic package to ensure safe updates/reads across multiple goroutines.
This resolves#1945 and was selected in place of the more complex circular bucket design. While the timestamp approach doesn't provide the issuance volume as readily it is less complex and meets the immediate need of a reliable external monitoring process hook.
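The mechanism is essentially:
```
import (
	"expvar"
	"time"
)

// lastIssuance appears on the debug server's /debug/vars endpoint.
// expvar.Int updates atomically, so Set is safe across goroutines.
var lastIssuance = expvar.NewInt("lastIssuance")

func noteIssuance() {
	lastIssuance.Set(time.Now().Unix())
}
```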
https://github.com/letsencrypt/boulder/pull/1982
The `regID` parameter in the PA's `WillingToIssue` function was originally used for whitelisting purposes, but is not used any longer. This PR removes it.
This PR adds a check in the registration authority for non-ASCII encoded characters in an email address. This is due to a 'funky email implementation'.
Fixes#1350
In https://github.com/letsencrypt/boulder/pull/774 we introduced an account key stored with the challenge. This was a stopgap fix to the now-defunct SimpleHTTP and DNS challenges in the face of https://mailarchive.ietf.org/arch/msg/acme/F71iz6qq1o_QPVhJCV4dqWf-4Yc. However, we no longer offer or implement those challenges, so the extra field is unnecessary. It also takes up a huge amount of space in the challenges table, which is our biggest table. SimpleHTTP and DNS challenges were removed in https://github.com/letsencrypt/boulder/pull/1247.
We can provide a follow-up migration to delete the column later, once we have a plan for large migrations without downtime.
Fixes#1909
As reported in #1925 the Certificates per Domain rate limit was being
incorrectly enforced on certificate renewals for FQDN sets that have
been previously issued. This is counter to the described rate
limit policies[0] that detail a separate rate limit for certificates
issued for the "exact same set of Fully Qualified Domain Names".
The bug was caused by the result of `domainsForRateLimiting` overwriting
the original `names []string` provided to the RA's
`checkCertificatesPerNameLimit` function. This meant instead of looking
for an existing FQDN set for the full set of domain names being
requested we checked for an FQDN set for just the eTLD+1's of the
domains. (e.g "www.example.com, foo.example.com, bar.example.com" vs
"example.com").
This commit preserves the original `names` values for doing an FQDN set
lookup and uses the `tldNames` from `domainsForRateLimiting` elsewhere.
This fixes#1925.
A test is added to ensure that `checkCertificatesPerNameLimit` does the
correct thing both with and without an existing FQDN set.
[0] https://community.letsencrypt.org/t/rate-limits-for-lets-encrypt/6769
The RA UpdateRegistration function merges a base registration object with an update by calling Registration.MergeUpdate. Prior to this commit MergeUpdate only allowed the updated registration object to overwrite the Contact field of the existing registration if the updated reg. defined at least one AcmeURL. This prevented clients from being able to outright remove the contact associated with an existing registration.
This commit removes the len() check on the input.Contact in MergeUpdate to allow the r.Contact field to be overwritten by a []*core.AcmeURL(nil) Contact field. Subsequently clients can now send an empty contacts list in the update registration POST in order to remove their reg contact.
Fixes#1846
* Allow removing registration contact.
* Adds a test for `MergeUpdate` contact removal.
* Change `Registration.Contact` type to `*[]*core.AcmeURL`.
* End validateContacts early for empty contacts
* Test removing reg. contact more thoroughly.
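The pointer-to-slice type is what makes removal expressible (sketch; the element type is simplified from `*core.AcmeURL`):
```
// mergeContact shows the distinction the new type enables: a nil
// pointer means "field omitted, keep the existing contacts", while a
// non-nil pointer to an empty slice means "explicitly clear them".
func mergeContact(base, update *[]string) *[]string {
	if update == nil {
		return base
	}
	return update
}
```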
Presently clients may request a new AuthZ be created for a domain that they have already proved authorization over. This results in unnecessary bloat in the authorizations table and duplicated effort.
This commit alters the `NewAuthorization` function of the RA such that before going through the work of creating a new AuthZ it checks whether there already exists a valid AuthZ for the domain/regID that expires in more than 24 hours from the current date. If there is, then we short circuit creation and return the existing AuthZ. When this case occurs the `RA.ReusedValidAuthz` counter is incremented to provide visibility.
Since clients requesting a new AuthZ and getting an AuthZ back expect to turn around and post updates to the corresponding challenges we also return early in `UpdateAuthorization` when asked to update an AuthZ that is already valid. When this case occurs the `RA.ReusedValidAuthzChallenge` counter is incremented.
All of the above behaviour is gated by a new RA config flag `reuseValidAuthz`. In the default case (false) the RA does **not** reuse any AuthZ's and instead maintains the historic behaviour; always creating a new AuthZ when requested, regardless of whether there are already valid AuthZ's that could be reused. In the true case (enabled only in `boulder-config-next.json`) the AuthZ reuse described above is enabled.
Resolves#1854
Resolves#1810 by automatically updating the RA ratelimit.RateLimitConfig whenever the backing config file is changed. Much like the Policy Authority uses a reloader instance to support updating the Hostname policy on the fly, this PR changes the Registration Authority to use a reloader for the rate limit policy file.
Access to the ra.rlPolicies member is protected with a RWMutex now that there is a potential for the values to be reloaded while a reader is active.
A test is introduced to ensure that writing a new policy YAML to the policy config file results in new values being set in the RA's rlPolicies instance.
https://github.com/letsencrypt/boulder/pull/1894