When the `features.PrecertificateRevocation` feature flag is enabled, the WFE2
will allow revoking certificates for a submitted precertificate. The legacy WFE1
behaviour remains unchanged (as before, (pre)certificates issued through the V1
API will be revocable with the V2 API).
Previously the WFE2 vetted the certificate from the revocation request by
looking up a final certificate by the serial number in the requested
certificate, and then doing a byte-for-byte comparison between the stored and
requested certificates.
Rather than adjust this logic to handle looking up and comparing stored
precertificates against requested precertificates (requiring new RPCs and an
additional round-trip), we instead check the signature on the requested
certificate or precertificate and consider it valid for revocation if the
signature validates with one of the WFE2's known issuers. We trust the integrity
of our own signatures.
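As a rough illustration of that check, something along these lines works with the standard library (the function and the `issuerCerts` argument are illustrative, not the actual WFE2 code):

```
package sketch

import (
	"crypto/x509"
	"errors"
)

// signedByKnownIssuer reports whether the submitted (pre)certificate's
// signature validates against one of our known issuer certificates.
// Illustrative only; the real WFE2 check handles more cases.
func signedByKnownIssuer(der []byte, issuerCerts []*x509.Certificate) error {
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		return err
	}
	for _, issuer := range issuerCerts {
		// CheckSignatureFrom verifies cert's signature with issuer's public key.
		if err := cert.CheckSignatureFrom(issuer); err == nil {
			return nil
		}
	}
	return errors.New("(pre)certificate was not signed by a known issuer")
}
```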
An integration test that performs a revocation of a precertificate (in this case
one that never had a final certificate issued due to SCT embedding errors) with
all of the available authentication mechanisms is included.
Resolves https://github.com/letsencrypt/boulder/issues/4414
This change adds two tables and two methods in the SA, to store precertificates
and serial numbers.
In the CA, when the feature flag is turned on, we generate a serial number, store it,
sign a precertificate and OCSP, store them, and then return the precertificate. Storing
the serial as an additional step before signing the certificate adds an extra layer of
insurance against duplicate serials, and also serves as a check on database availability.
Since an error storing the serial prevents going on to sign the precertificate, this decreases
the chance of signing something while the database is down.
Right now, neither table has read operations available in the SA.
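Roughly, the flag-on path in the CA looks like the following sketch (the interface and helper names are illustrative; the real CA talks to the SA over gRPC and signs via its issuer):

```
package sketch

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// serialStore is an illustrative stand-in for the two new SA methods.
type serialStore interface {
	AddSerial(ctx context.Context, serial string) error
	AddPrecertificate(ctx context.Context, serial string, der, ocsp []byte) error
}

// issuePrecertificate sketches the order of operations: reserve the serial in
// the DB before signing, so a duplicate serial or an unavailable database is
// caught before anything gets signed.
func issuePrecertificate(ctx context.Context, sa serialStore, sign func(serial string) (der, ocsp []byte, err error)) ([]byte, error) {
	serial := newSerial()
	if err := sa.AddSerial(ctx, serial); err != nil {
		return nil, fmt.Errorf("storing serial before signing: %w", err)
	}
	der, ocsp, err := sign(serial)
	if err != nil {
		return nil, err
	}
	if err := sa.AddPrecertificate(ctx, serial, der, ocsp); err != nil {
		return nil, err
	}
	return der, nil
}

// newSerial generates a random serial; the real CA's serials have more structure.
func newSerial() string {
	b := make([]byte, 16)
	_, _ = rand.Read(b)
	return hex.EncodeToString(b)
}
```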
To make this work, I needed to remove the check for duplicate certificateStatus entry
when inserting a final certificate and its OCSP response. I also needed to remove
an error that can occur when expiration-mailer processes a precertificate that lacks
a final certificate. That error would otherwise have prevented further processing of
expiration warnings.
Fixes #4412
This change builds on #4417, please review that first for ease of review.
Include identifierType in queries so that the regID_identifier_status_expires_idx index is properly utilized. Did a once-over of the other authz2 queries to verify we are properly using their indexes as well; everything else looks like it is working as intended.
In getAllOrderAuthorizationStatuses, we were using a transaction for a series
of SELECTs. Since these SELECTs don't need to be strongly consistent with
each other, that creates needless locking and round trips.
In the current SA code, we need to remember to call Rollback on any error.
If we don't, we'll leave dangling transactions, which are hard to spot but eventually
clog up the database and cause availability problems.
This change attempts to deal with rollbacks more rigorously, by implementing a
withTransaction function that takes a closure as input. withTransaction opens
a transaction, applies a context.Context to it, and then runs the closure. If the
closure returns an error, withTransaction rolls back and returns the error; otherwise
it commits and returns nil.
One of the quirks of this implementation is that it relies on the closure modifying
variables from its parent scope in order to return values. An alternate implementation
could define the closure's return values as (interface{}, error) and have the calling
function do a type assertion. I'm seeking feedback on that; not sure yet which is cleaner.
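For concreteness, a minimal sketch of the shape being described, assuming gorp v2's `WithContext` (the real helper's signature and error handling may differ):

```
package sketch

import (
	"context"

	"gopkg.in/go-gorp/gorp.v2"
)

// withTransaction opens a transaction, attaches the context, runs txFunc, and
// commits if txFunc returned nil, or rolls back if it returned an error.
func withTransaction(ctx context.Context, dbMap *gorp.DbMap, txFunc func(gorp.SqlExecutor) error) error {
	tx, err := dbMap.Begin()
	if err != nil {
		return err
	}
	if err := txFunc(tx.WithContext(ctx)); err != nil {
		// The rollback error is deliberately dropped; the closure's error is
		// the one the caller needs to see.
		_ = tx.Rollback()
		return err
	}
	return tx.Commit()
}
```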
This is a subset of the functions that need this treatment. I've got more coming, but
some of the changes break tests, so I'm looking into why.
Updates #4337
For authzv1, this actually executes a SQL DELETE for the unused challenges
when an authorization is updated upon validation.
For authzv2, this doesn't perform a delete, but changes the authorizations that
are returned so they don't include unused challenges.
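For the authzv1 path described above, the cleanup is conceptually a single statement like the one in this sketch (the table and column names are assumptions, not necessarily the exact schema):

```
package sketch

import (
	"context"

	"gopkg.in/go-gorp/gorp.v2"
)

// deleteUnusedChallenges sketches the authzv1 behaviour: once a challenge has
// been validated, the sibling challenges on that authorization are no longer
// needed and their rows can be deleted.
func deleteUnusedChallenges(ctx context.Context, db gorp.SqlExecutor, authzID string, validChallengeID int64) error {
	_, err := db.WithContext(ctx).Exec(
		"DELETE FROM challenges WHERE authorizationID = ? AND id != ?",
		authzID, validChallengeID,
	)
	return err
}
```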
In order to test the flag for both authz storage models, I set the feature flag in
both config/ and config-next/.
Fixes #4352
In #4331 I introduced this new, more efficient query for
GetOrderForNames, and commented about why we needed an ORDER BY... ASC
to efficiently use the index. However, the actual query did not match
the comment: it used DESC. This fixes the query.
To demonstrate that the index is actually used with the ASC version,
here's the EXPLAIN output after filling up the table with a bunch of
failed orders:
```
MariaDB [boulder_sa_integration]> explain select orderID, registrationID FROM orderFqdnSets
    -> WHERE setHash = UNHEX('B60FE34E4A6735D5A575D81C97F4DFED2102DC179B34252E4AA18F6E2A375C98')
    -> AND expires > NOW() ORDER BY EXPIRES ASC LIMIT 1 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orderFqdnSets
         type: range
possible_keys: setHash_expires_idx
          key: setHash_expires_idx
      key_len: 37
          ref: NULL
         rows: 1500
        Extra: Using index condition
1 row in set (0.000 sec)

MariaDB [boulder_sa_integration]> explain select orderID, registrationID FROM orderFqdnSets
    -> WHERE setHash = UNHEX('B60FE34E4A6735D5A575D81C97F4DFED2102DC179B34252E4AA18F6E2A375C98')
    -> AND expires > NOW() ORDER BY EXPIRES DESC LIMIT 1 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orderFqdnSets
         type: range
possible_keys: setHash_expires_idx
          key: setHash_expires_idx
      key_len: 37
          ref: NULL
         rows: 1500
        Extra: Using where
1 row in set (0.000 sec)
```
In the case where the DB `Select()` returns a non-nil `err` result the
SA's `getAllOrderAuthorizationStatuses` function needs to ensure it
rolls back the transaction it opened or it will be leaked.
This rolls forward #4326 after it was reverted in #4328.
Resolves https://github.com/letsencrypt/boulder/issues/4329
The older query didn't have a `LIMIT 1` so it was returning multiple results,
but gorp's `SelectOne` was okay with multiple results when the selection was
going into an `int64`. When I changed this to a `struct` in #4326, gorp started
producing errors.
For this bug to manifest, an account needs to create an order, then fail
validation, twice in a row for a given domain name, then create an order once
more for the same domain name - that third request will fail because there are
multiple orders in the orderFqdnSets table for that domain.
Note that the bug condition doesn't happen when an account does three successful
issuances in a row, because finalizing an order (that is, issuing a certificate
for it) deletes the row in orderFqdnSets. Failing an authorization does not
delete the row in orderFqdnSets. I believe this was an intentional design
decision because an authorization can participate in many orders, and those
orders can have many other authorizations, so computing the updated state of
all those orders would be expensive (remember, order state is not persisted in
the DB but is calculated dynamically based on the authorizations it contains).
This wasn't detected in integration tests because we don't have any tests that
fail validation for the same domain multiple times. I filed an issue for an
integration test that would have incidentally caught this:
https://github.com/letsencrypt/boulder/issues/4332. There's also a more specific
test case in #4331.
This reverts commit 9fa360769e.
This commit can cause "gorp: multiple rows returned for: ..." under certain situations.
See #4329 for details of followup.
When there are a lot of potential orders to reuse, the query could scan
unnecessary rows, sometimes leading to timeouts. The new query used
when the FasterGetOrderForNames feature flag is enabled uses the
available index more effectively and adds a LIMIT clause.
In #4179 we added a different method of counting the certificatesPerName
rate limit that can provide the correct behavior for exact public suffix
matches without the need for a separate RPC call. This cleans up the
separate code paths in the SA and RA that are no longer necessary.
This is a holdover from when one of the tables (I think orderToAuthz2?) used a string
authorization ID and we needed to convert in order to get what we wanted, but
apparently I never cleaned it up when we switched to integers for both tables.
What is really confusing here is why we ever needed a CONVERT in the first place
if Maria is happy to arbitrarily compare strings to integers itself... what fun.
Add a flag to explicitly disable orders containing authz2 authorizations. After looking at a handful of much more complex solutions, this feels like the best option. With NewAuthorizationSchema disabled and DisableAuthz2Orders enabled, any requests for orders that include authz2 authorizations will return a 404 (where previously they would return a 500).
Fixes #4263.
Also fixes a minor bug where `sa.UpdateRegistration` didn't properly check a
returned error. If an `errors.Duplicate` type error is returned in either `KeyRollover`/
`NewAccount` in wfe2 or `NewRegistration` in wfe during the update/insert step,
the account info/pointer will be returned instead of an internal server error.
Fixes #3000.
This will allow implementing sub-problems without creating a cyclic
dependency between `core` and `problems`.
The `identifier` package is somewhat small/single-purpose and in the
future we may want to move more "ACME" bits beyond the `identifier`
types into a dedicated package outside of `core`.
The new `RevokeAuthorizationsByDomain2` and `DeactivateAuthorization2`
SA RPCs are declared to return `*corepb.Empty, error` but were
implemented to `return nil, nil` for the success case. This causes
a gRPC unmarshal error:
```
E151233 boulder-sa [AUDIT] grpc: server failed to encode response: rpc
error: code = Internal desc = grpc: error while marshaling: proto:
Marshal called with nil
```
Both RPCs should `return &corepb.Empty{}, nil` to avoid this.
Right now we run a `SELECT COUNT` query on issuedNames to calculate this rate limit.
Unfortunately, counting large numbers of rows is very slow. This change introduces a
dedicated table for keeping track of that rate limit.
Instead of being keyed by FQDN, this new `certificatesPerName` table is keyed by
the same field as rate limits are calculated on: the base domain. Each row has a base
domain, a time (chunked by hour), and a count. This means calculating the rate limit
status for each domain reads at most 7 * 24 rows.
This should particularly speed up cases when a single domain has lots of subdomains.
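Conceptually, checking the limit then becomes a small ranged sum rather than a big `SELECT COUNT` over issuedNames. A sketch, with the table and column names taken from the description above (they may not match the real schema exactly):

```
package sketch

import (
	"context"
	"time"

	"gopkg.in/go-gorp/gorp.v2"
)

// countIssuedInWindow sums the hourly counters for one base domain over the
// rate-limit window. With hourly buckets and a 7-day window this reads at
// most 7 * 24 rows instead of counting individual issuedNames rows.
func countIssuedInWindow(ctx context.Context, db *gorp.DbMap, baseDomain string, window time.Duration, now time.Time) (int64, error) {
	return db.WithContext(ctx).SelectInt(
		"SELECT COALESCE(SUM(count), 0) FROM certificatesPerName WHERE eTLDPlusOne = ? AND time > ?",
		baseDomain, now.Add(-window),
	)
}
```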
Fixes #4152
This PR implements new SA methods for handling authz2 style authorizations and updates existing SA methods to count and retrieve them where applicable when the `NewAuthorizationSchema` feature is enabled.
Fixes #4093. Fixes #4082.
Updates #4078
Updates #4077
Previously only "pending" orders were returned by `sa.GetOrderForNames` for the RA to reuse in `ra.NewOrder`. When we added the "ready" status to match late RFC 8555 developments we forgot to update `GetOrderForNames` to return "ready" orders. Prior to the "ready" status existing a fully validated order would have been "pending" and reused. This branch updates the reuse logic to restore reuse of validated orders.
Resolves https://github.com/letsencrypt/boulder/issues/4117
FinalizeAuthorization deleted from pendingAuthorizations and then added
to authz. DeactivateAuthorization did it in opposite order. This tweaks
them so they always do the insert / delete in the same order as each
other, to avoid deadlocks.
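This is the standard deadlock-avoidance rule: if every transaction takes its row locks on the two tables in the same order, two concurrent transactions can no longer each hold a lock the other is waiting on. A sketch of the idea (the SQL is illustrative, not the real statements, and the particular order matters less than both methods agreeing on it):

```
package sketch

import (
	"context"

	"gopkg.in/go-gorp/gorp.v2"
)

// moveAuthz sketches doing the insert into authz and the delete from
// pendingAuthorizations in one fixed order inside a single transaction, so
// that FinalizeAuthorization and DeactivateAuthorization can no longer
// deadlock against each other.
func moveAuthz(ctx context.Context, dbMap *gorp.DbMap, authzID string) error {
	tx, err := dbMap.Begin()
	if err != nil {
		return err
	}
	exec := tx.WithContext(ctx)
	if _, err := exec.Exec(
		"INSERT INTO authz SELECT * FROM pendingAuthorizations WHERE id = ?", authzID); err != nil {
		_ = tx.Rollback()
		return err
	}
	if _, err := exec.Exec(
		"DELETE FROM pendingAuthorizations WHERE id = ?", authzID); err != nil {
		_ = tx.Rollback()
		return err
	}
	return tx.Commit()
}
```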
Go 1.11+ updated the `sql.DBStats` struct with new fields that are of
interest to us. This PR routes these stats to Prometheus by replacing
the existing autoprom stats code with new first-class Prometheus
metrics. Resolves https://github.com/letsencrypt/boulder/issues/4095
The `max_db_connections` stat from the SA is removed because the Go 1.11+
`sql.DBStats.MaxOpenConnections` field will give us a better view of
the same information.
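A sketch of how such stats can be exported as first-class Prometheus metrics, using `GaugeFunc` (the metric names here are illustrative, not the ones actually registered):

```
package sketch

import (
	"database/sql"

	"github.com/prometheus/client_golang/prometheus"
)

// registerDBStats exports a few of the Go 1.11+ sql.DBStats fields as
// Prometheus gauges that are sampled on every scrape.
func registerDBStats(db *sql.DB, reg prometheus.Registerer) {
	reg.MustRegister(prometheus.NewGaugeFunc(
		prometheus.GaugeOpts{Name: "db_max_open_connections", Help: "Maximum number of open connections to the database."},
		func() float64 { return float64(db.Stats().MaxOpenConnections) },
	))
	reg.MustRegister(prometheus.NewGaugeFunc(
		prometheus.GaugeOpts{Name: "db_in_use_connections", Help: "Number of connections currently in use."},
		func() float64 { return float64(db.Stats().InUse) },
	))
	reg.MustRegister(prometheus.NewGaugeFunc(
		prometheus.GaugeOpts{Name: "db_wait_count", Help: "Total number of times a connection had to be waited for."},
		func() float64 { return float64(db.Stats().WaitCount) },
	))
}
```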
The autoprom "reused_authz" stat that was being incremented in
`SA.GetPendingAuthorization` was also removed. It wasn't doing what it
said it was (counting reused authorizations) and was instead counting
the number of times `GetPendingAuthorization` returned an authz.
This changeset implements the logic required for the WFE to retrieve v2 authorizations and their associated challenges while still maintaining the logic to retrieve old authorizations/challenges. Challenge IDs for v2 authorizations are obfuscated using a pretty simple scheme in order to prevent hard coding of indexes. A `V2` field is added to the `core.Authorization` object and populated using the existing field of the same name from the protobuf for convenience. v2 authorizations and challenges use a `v2` prefix in all their URLs in order to easily differentiate between v1 and v2 URLs (e.g. `/acme/authz/v2/asd` and `/acme/challenge/v2/asd/123`); once v1 authorizations cease to exist this prefix can be safely removed. As v2 authorizations use integer IDs, this change switches from string IDs to integer IDs, which mainly only affects tests.
Integration tests are put off for #4079 as they really need #4077 and #4078 to be properly effective.
Fixes #4041.
The existing (but undeployed) `AllowRenewalFirstRL` feature flag is used to gate whether the SA `CountCertificatesByNames` and `CountCertificatesByExactNames` RPCs will exclude renewals from the returned counts using the `issuedNames` table's `renewal` field.
The previous implementation of `AllowRenewalFirstRL` is deleted. It wasn't performant in specific corner cases.
There's no new integration test in this branch because the existing `test_renewal_exemption` integration test from the first `AllowRenewalFirstRL` implementation provides the required coverage for `config-next` runs.
Resolves #4060. Resolves #4006.
The `renewal` field of the `issuedNames` table is indexed
to allow more efficient processing of rate limit decisions
w.r.t. renewals for the certificates per domain rate limit (see
https://github.com/letsencrypt/boulder/pull/3178). Before we can start
using this field in rate limit calculations we need to populate it
appropriately. A new `SetIssuedNamesRenewalBit` feature flag for the SA
controls whether we do so or not.
Resolves https://github.com/letsencrypt/boulder/issues/4008
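A sketch of what setting that bit can look like when writing issuedNames rows (the column names follow the description above but are assumptions here, and whether a particular FQDN set counts as a renewal is assumed to be computed elsewhere):

```
package sketch

import (
	"context"
	"strings"
	"time"

	"gopkg.in/go-gorp/gorp.v2"
)

// addIssuedNames sketches writing issuedNames rows with the renewal column
// populated when the SetIssuedNamesRenewalBit feature is enabled.
func addIssuedNames(ctx context.Context, db gorp.SqlExecutor, names []string, serial string, notBefore time.Time, isRenewal bool) error {
	for _, name := range names {
		_, err := db.WithContext(ctx).Exec(
			"INSERT INTO issuedNames (reversedName, serial, notBefore, renewal) VALUES (?, ?, ?, ?)",
			reverseName(name), serial, notBefore, isRenewal,
		)
		if err != nil {
			return err
		}
	}
	return nil
}

// reverseName mimics the reversed-label form used for issuedNames lookups
// (e.g. "www.example.com" -> "com.example.www").
func reverseName(name string) string {
	labels := strings.Split(name, ".")
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return strings.Join(labels, ".")
}
```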
Implements a feature that enables immediate revocation instead of marking a certificate revoked and waiting for the OCSP-Updater to generate the OCSP response. This means that as soon as the request returns from the WFE the revoked OCSP response should be available to the user. This feature requires that the RA be configured to use the standalone Akamai purger service.
Fixes #4031.
* in boulder-ra we connected to the publisher and created a publisher gRPC client twice for no apparent reason
* in the SA we ignored errors from `getChallenges` in `GetAuthorizations` which could result in a nil challenge being returned in an authorization
Fixes #3965 and fixes #3949.
This change adds a model for the authz2 style authorization storage and implements methods to transform to/from the protobuf representations and SA methods to store and retrieve the new style authorizations.
In all cases where we were using `dbMap`, use `dbMap.WithContext()`.
Because of the way Gorp implements contexts, we can't add a context when
opening a transaction, committing it, or rolling it back, but we can add a context
when doing operations within a transaction, using `tx.WithContext()`.
Some functions in SA expected an argument of `gorp.DbMap` or `gorp.Transaction`.
Since Gorp's `WithContext()` functions return a `gorp.SqlExecutor`, those functions
had to be updated. This PR introduces a few local interfaces that define a
subset of methods required from `gorp.SqlExecutor`, which allows more narrowly
defining inputs to functions.
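For example, a helper that only reads a single row can take a small locally-defined interface instead of all of `gorp.SqlExecutor` (the names below are illustrative, not the actual SA interfaces):

```
package sketch

// oneSelector is the subset of gorp.SqlExecutor a read-only helper needs.
// Both *gorp.DbMap and *gorp.Transaction satisfy it once WithContext has
// been applied by the caller.
type oneSelector interface {
	SelectOne(holder interface{}, query string, args ...interface{}) error
}

type certStatusModel struct {
	Serial string `db:"serial"`
	Status string `db:"status"`
}

// getCertificateStatus only needs SelectOne, so it accepts the narrow
// interface; callers pass dbMap.WithContext(ctx) or tx.WithContext(ctx).
func getCertificateStatus(db oneSelector, serial string) (certStatusModel, error) {
	var status certStatusModel
	err := db.SelectOne(&status, "SELECT serial, status FROM certificateStatus WHERE serial = ?", serial)
	return status, err
}
```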
Removes the checks for a handful of deployed feature flags in preparation for removing the flags entirely. Also moves all of the currently deprecated flags to a separate section of the flags list so they can be more easily removed once purged from production configs.
Fixes #3880.
Sometimes two simultaneous challenge POSTs can result in a situation
where the challenges get updated to show one of them as "valid", then
later get updated to show them all "pending" again even though the
validation succeeded (and was audit logged).
Fixes #3833
Use a boulder error type to indicate duplicate rows instead of a normal untyped error (as gRPC mangles this type of error but understands how to properly handle a boulder error).
If an order is expired, its status is invalid and we don't need to get any of the order's authorizations. It's important to exit early in this case because expired authorizations may be purged from the DB. Fetching the authzs for an expired order may return fewer authz objects than expected, triggering a 500 error response.
Resolves https://github.com/letsencrypt/boulder/issues/3839
Retains the existing logging of orphaned certs until we are confident that this
solution can fully replace it (even then we may want to keep it just for auditing etc).
Fixes #3636.
Things removed:
* features.EmbedSCTs (and all the associated RA/CA/ocsp-updater code etc)
* ca.enablePrecertificateFlow (and all the associated RA/CA code)
* sa.AddSCTReceipt and sa.GetSCTReceipt RPCs
* publisher.SubmitToCT and publisher.SubmitToSingleCT RPCs
Fixes #3755.
In each of GetAuthorization, DeactivateAuthorization, and SetOrderError, there
was an error case that could cause us to not rollback the transaction,
leaving it open. This could cause database performance problems.
A very large number of the logger calls are of the form log.Function(fmt.Sprintf(...)).
Rather than sprinkling fmt.Sprintf at every logger call site, provide formatting versions
of the logger functions and call these directly with the format and arguments.
While here, remove some unnecessary trailing newlines and calls to String/Error.
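The shape of the change is roughly the following (a sketch; the real logger has more levels and audit variants):

```
package sketch

import "fmt"

// Logger is a cut-down stand-in for the real log interface.
type Logger struct{}

func (l *Logger) Info(msg string)    { /* emit msg at INFO */ }
func (l *Logger) Warning(msg string) { /* emit msg at WARNING */ }

// Infof and Warningf wrap the plain methods so call sites can pass a format
// string and arguments directly instead of wrapping every call in fmt.Sprintf.
func (l *Logger) Infof(format string, a ...interface{}) {
	l.Info(fmt.Sprintf(format, a...))
}

func (l *Logger) Warningf(format string, a ...interface{}) {
	l.Warning(fmt.Sprintf(format, a...))
}
```

Call sites then change from `log.Info(fmt.Sprintf("bad response: %s", body))` to `log.Infof("bad response: %s", body)`.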