Add `identifier` fields, which will soon replace the `dnsName` fields,
to:
- `corepb.Authorization`
- `corepb.Order`
- `rapb.NewOrderRequest`
- `sapb.CountFQDNSetsRequest`
- `sapb.CountInvalidAuthorizationsRequest`
- `sapb.FQDNSetExistsRequest`
- `sapb.GetAuthorizationsRequest`
- `sapb.GetOrderForNamesRequest`
- `sapb.GetValidAuthorizationsRequest`
- `sapb.NewOrderRequest`
Populate these `identifier` fields in every function that creates
instances of these structs.
Use these `identifier` fields instead of `dnsName` fields (at least
preferentially) in every function that uses these structs. When crossing
component boundaries, don't assume they'll be present, for
deployability's sake.
Deployability note: Mismatched `cert-checker` and `sa` versions will be
incompatible because of a type change in the arguments to
`sa.SelectAuthzsMatchingIssuance`.
Part of #7311
Update the SA to re-query the database for the updated account after
deactivating it, and return this to the RA. Update the RA to pass this
value through to the WFE. Update the WFE to return this value, rather
than locally modifying the pre-deactivation account object, if it gets
one (for deployability).
Also remove the RA's requirement that the request object specify its
current status so that the request can be trimmed down to just an ID.
This proto change is backwards-compatible because the new
DeactivateRegistrationRequest's registrationID field has the same type
(int64) and field number (1) as corepb.Registration's id field.
Part of https://github.com/letsencrypt/boulder/issues/5554
- Plumb the "replaces" value from the WFE through to the SA via the RA
- Store validated "replaces" value for new orders in the orders table
- Reflect the stored "replaces" value to subscribers in the order object
- Reorder CertificateProfileName before Replaces/ReplacesSerial in RA
and SA protos for consistency
Fixes#8034
In a few places within the SA, we use explicit transactions to wrap
read-then-update style operations. Because we set the transaction
isolation level on a per-session basis, these transactions do not in
fact change their isolation level, and therefore generally remain at the
default isolation level of REPEATABLE READ.
Unfortunately, we cannot resolve this simply by converting the SELECT
statements into SELECT...FOR UPDATE statements: although this would fix
the issue by making those queries into locking statements, it also
triggers what appears to be an InnoDB bug when many transactions all
attempt to select-then-insert into a table with both a primary key and a
separate unique key, as the crlShards table has. This causes the
integration tests in GitHub Actions, which run with an empty database
and therefore use the needToInsert codepath instead of the update
codepath, to consistently flake.
Instead, resolve the issue by having the UPDATE statements specify that
the value of the leasedUntil column is still the same as was read by the
initial SELECT. Although two crl-updaters may still attempt these
transactions concurrently, the UPDATE statements will still be fully
sequenced, and the latter one will fail.
Part of https://github.com/letsencrypt/boulder/issues/8031
Deprecate the "InsertAuthzsIndividually" feature flag, which has been
set to true in both Staging and Production. Delete the code guarded
behind that flag being false, namely the ability of the MultiInserter to
return the newly-created IDs from all of the rows it has inserted. This
behavior is being removed because it is not supported in MySQL / Vitess.
Fixes https://github.com/letsencrypt/boulder/issues/7718
---
> [!WARNING]
> ~~Do not merge until IN-10737 is complete~~
Add "certificateProfileName" to the model used to insert new authz2 rows
and to the list of column names read when retrieving rows from the
authz2 table. Add support for this column to the functions which convert
to and from authz2 model types.
Add support for the profile field to core types so that it can be
returned by the SA.
Fixes https://github.com/letsencrypt/boulder/issues/7955
Add querying by explicit shard (SA.GetRevokedCertsByShard) in addition
to querying by temporal shard (SA.GetRevokedCerts).
Merge results from both kinds of shard. De-duplicate by serial within a
shard, because the same certificate could wind up in a temporal shard
that matches its explicit shard.
When de-duplicating, validate that revocation reasons are the same or
(very unlikely) represent a re-revocation based on demonstrating key
compromise. This can happen because the two different SA queries occur
at slightly different times.
Add unit testing that CRL entries make it through the whole pipeline
from SA, to CA, to uploader.
Rename some types in the unittest to be more accessible.
Tweak a comment in SA.UpdateRevokedCertificate to make it clear that
status _and_ reason are critical for re-revocation.
Note: This GetRevokedCertsByShard code path will always return zero
certificates right now, because nothing is writing to the
`revokedCertificates` table. Writing to that table is gated on
certificates having CRL URLs in them, which is not yet implemented (and
will be config-gated).
Part of #7094
In RA.RevokedCertificate, if the certificate being revoked has a
crlDistributionPoints extension, parse the URL and pass the appropriate
shard to the SA.
This required some changes to the `admin` tool. When a malformed
certificate is revoked, we don't have a parsed copy of the certificate
to extract a CRL URL from. So, specifically when a malformed certificate
is being revoked, allow specifying a CRL shard. Because different
certificates will have different shards, require one-at-a-time
revocation for malformed certificates.
To support that refactoring, move the serial-cleaning functionality
earlier in the `admin` tool's flow.
Also, split out one of the cases handled by the `revokeCertificate`
helper in the RA. For admin malformed revocations, we need to accept a
human-specified ShardIdx, so call the SA directly in that case (and skip
stat increment since admin revocations aren't useful for metrics). This
allows `revokeCertificate` to be a more helpful helper, by extracting
serial, issuer ID, and CRL shard automatically from an
`*x509.Certificate`.
Note: we don't yet issue certificates with the crlDistributionPoints
extension, so this code will not be active until we start doing so.
Part of #7094.
This is the final stage of #5554: removing the old, combined
`UpdateRegistration` flow, which has been replaced by
`UpdateRegistrationContact` and `UpdateRegistrationKey`. Those new
functions have their own tests.
The RA's `UpdateRegistration` function no longer has any callers (as of
#7827's deployment), so it is safely deployable to remove it from the SA
too, and its request from gRPC.
Fixes#5554
---------
Co-authored-by: Jacob Hoffman-Andrews <jsha+github@letsencrypt.org>
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
Remove code using `certificatesPerName` & `newOrdersRL` tables.
Deprecate `DisableLegacyLimitWrites` & `UseKvLimitsForNewOrder` flags.
Remove legacy `ratelimit` package.
Delete these RA test cases:
- `TestAuthzFailedRateLimitingNewOrder` (rl:
`FailedAuthorizationsPerDomainPerAccount`)
- `TestCheckCertificatesPerNameLimit` (rl: `CertificatesPerDomain`)
- `TestCheckExactCertificateLimit` (rl: `CertificatesPerFQDNSet`)
- `TestExactPublicSuffixCertLimit` (rl: `CertificatesPerDomain`)
Rate limits in NewOrder are now enforced by the WFE, starting here:
5a9b4c4b18/wfe2/wfe.go (L781)
We collect a batch of transactions to check limits, check them all at
once, go through and find which one(s) failed, and serve the failure
with the Retry-After that's furthest in the future. All this code
doesn't really need to be tested again; what needs to be tested is that
we're returning the correct failure. That code is
`NewOrderLimitTransactions`, and the `ratelimits` package's tests cover
this.
The public suffix handling behavior is tested by
`TestFQDNsToETLDsPlusOne`:
5a9b4c4b18/ratelimits/utilities_test.go (L9)
Some other RA rate limit tests were deleted earlier, in #7869.
Part of #7671.
Change the type of the orderModelv2 CertificateProfileName field to be a
pointer to a string, reflecting the fact that the underlying database
column is nullable. Add tests to ensure that order rows inserted with
either order model can be read using the other model.
Fixes https://github.com/letsencrypt/boulder/issues/7906
The RA's DeactivateAccount method expects the account provided to it by
the WFE to still have status Valid. The new WFE deactivation code was
hardcoding the status to Deactivated. Fix the WFE to pass the account's
current status instead.
Add an integration test to confirm both the breakage and the fix. Also
leave behind some TODOs to simplify this codepath further, and not
require the status to be provided at all.
Part of #5554
Introduce separate UpdateRegistrationContact & UpdateRegistrationKey
methods in RA & SA
Clear contact field during DeactivateRegistration
Part of #7716
Part of #5554
The naming of our `precertificates` table (now used to store linting
certificates) is definitely confusing, so add some more comments in
various places explaining. See #6807.
Creating the list of authzIDs earlier in NewOrderAndAuthzs:
- Saves a `for` loop with duplicated code; we no longer need to range
over two different slices, just one.
- Allows us to create the Order PB later, after more of the data
collection logic, without interrupting it. This makes the order of
operations slightly easier to follow.
This is our only use of MariaDB's "INSERT ... RETURNING" syntax, which
does not exist in MySQL and Vitess. Add a feature flag which removes our
use of this feature, so that we can easily disable it and then re-enable
it if it turns out to be too much of a performance hit.
Also add a benchmark showing that the serial-insertion approach is
slower, but perhaps not debilitatingly so.
Part of https://github.com/letsencrypt/boulder/issues/7718
Remove the id, identifierValue, status, and challenges fields from
sapb.NewAuthzRequest. These fields were left behind from the previous
corepb.Authorization request type, and are now being ignored by the SA.
Since the RA is no longer constructing full challenge objects to include
in the request, remove pa.ChallengesFor and replace it with the much
simpler pa.ChallengeTypesFor.
Part of https://github.com/letsencrypt/boulder/issues/5913
Find all gRPC fields which represent DNS Names -- sometimes called
"identifier", "hostname", "domain", "identifierValue", or other things
-- and unify their naming. This naming makes it very clear that these
values are strings which may be included in the SAN extension of a
certificate with type dnsName.
As we move towards issuing IP Address certificates, all of these fields
will need to be replaced by fields which carry both an identifier type
and value, not just a single name. This unified naming makes it very
clear which messages and methods need to be updated to support
non-dnsName identifiers.
Part of https://github.com/letsencrypt/boulder/issues/7647
Within the NewOrderAndAuthzsRequest, replace the corepb.Authorization
field with a new sapb.NewAuthzRequest message. This message has all of
the same field types and numbers, and the RA still populates all of
these fields when constructing a request, for backwards compatibility.
But it also has new fields (an Identifier carrying both type and value,
a list of challenge types, and a challenge token) which the RA
preferentially consumes if present.
This causes the content of our NewOrderAndAuthzsRequest to more closely
match the content that will be created at the database layer. Although
this may seem like a step backwards in terms of abstraction, it is also
a step forwards in terms of both efficiency (not having to transmit
multiple nearly-identical challenge objects) and correctness (being
guaranteed that the token is actually identical across all challenges).
After this change is deployed, it will be followed by a change which
removes the old fields from the NewAuthzRequest message, to realize the
efficiency gains.
Part of https://github.com/letsencrypt/boulder/issues/5913
Call `RA.UnpauseAccount` for valid unpause form submissions.
Determine and display the appropriate outcome to the Subscriber based on
the count returned by `RA.UnpauseAccount`:
- If the count is zero, display the "Account already unpaused" message.
- If the count equals the max number of identifiers allowed in a single
request, display a page explaining the need to visit the unpause URL
again.
- Otherwise, display the "Successfully unpaused all N identifiers"
message.
Apply per-request timeout from the SFE configuration.
Part of https://github.com/letsencrypt/boulder/issues/7406
SA method PauseIdentifiers skips identifiers unpaused within the last 2
weeks, providing a grace period for operators to fix configuration
issues resulting in numerous contiguous validation failures.
Part of #7475
SA method UnpauseAccount uses up to 5 `UPDATE` query iterations, each
with a `LIMIT` of 10000, to unpause up to 50000 identifiers and returns
a count of identifiers unpaused.
Part of #7475
This reverts commit 2b5b6239a4.
Following up on #7556, after we made a more systematic change to use
borp's TypeConverter, we no longer need to manually truncate timestamps.
Add the storage implementation for our new (account, hostname) pair
pausing feature.
- Add schema and model for for the new paused table
- Add SA service methods for interacting with the paused table
Part of #7406
Part of #7475
As described in #7075, go-sql-driver/mysql v1.5.0 truncates timestamps
to microseconds, while v1.6.0 and above does not. That means upon
upgrading to v1.6.0, timestamps are written to the database with a
resolution of nanoseconds, and SELECT statements also use a resolution
of nanoseconds. We believe this is the cause of performance problems
we observed when upgrading to v1.6.0 and above.
To fix that, apply rounding in the application code. Rather than
just rounding to microseconds, round to seconds since that is the
resolution we care about. Using seconds rather than microseconds
may also allow some of our indexes to grow more slowly over time.
Note: this omits truncating some timestamps in CRL shard calculations,
since truncating those resulted in test failures that I'll follow up
on separately.
Replaced our embeds of foopb.UnimplementedFooServer with
foopb.UnsafeFooServer. Per the grpc-go docs this reduces the "forwards
compatibility" of our implementations, but that is only a concern for
codebases that are implementing gRPC interfaces maintained by third
parties, and which want to be able to update those third-party
dependencies without updating their own implementations in lockstep.
Because we update our protos and our implementations simultaneously, we
can remove this safety net to replace runtime type checking with
compile-time type checking.
However, that replacement is not enough, because we never pass our
implementation objects to a function which asserts that they match a
specific interface. So this PR also replaces our reflect-based unittests
with idiomatic interface assertions. I do not view this as a perfect
solution, as it relies on people implementing new gRPC servers to add
this line, but it is no worse than the status quo which relied on people
adding the "TestImplementation" test.
Fixes https://github.com/letsencrypt/boulder/issues/7497
Update the version of protoc-gen-go-grpc that we use to generate Go gRPC
code from our proto files, and update the versions of other gRPC tools
and libraries that we use to match. Turn on the new
`use_generic_streams` code generation flag to change how
protoc-gen-go-grpc generates implementations of our streaming methods,
from creating a wholly independent implementation for every stream to
using shared generic implementations.
Take advantage of this code-sharing to remove our SA "wrapper" methods,
now that they have truly the same signature as the SARO methods which
they wrap. Also remove all references to the old-style stream names
(e.g. foopb.FooService_BarMethodClient) and replace them with the new
underlying generic names, for the sake of consistency. Finally, also
remove a few custom stream test mocks, replacing them with the generic
mocks.ServerStreamClient.
Note that this PR does not change the names in //mocks/sa.go, to avoid
conflicts with work happening in the pursuit of
https://github.com/letsencrypt/boulder/issues/7476. Note also that this
PR updates the version of protoc-gen-go-grpc that we use to a specific
commit. This is because, although a new release of grpc-go itself has
been cut, the codegen binary is a separate Go module with its own
releases, and it hasn't had a new release cut yet. Tracking for that is
in https://github.com/grpc/grpc-go/issues/7030.
This removes the only place we query the requestedNames table, which
allows us to get rid of it in a subsequent PR (once this one is merged
and deployed).
Part of https://github.com/letsencrypt/boulder/issues/7432
Adds `certificateProfileName` to the `orders` database table. The
[maximum
length](https://github.com/letsencrypt/boulder/pull/7325/files#diff-a64a0af7cbf484da8e6d08d3eefdeef9314c5d9888233f0adcecd21b800102acR35)
of a profile name matches the `//issuance` package.
Adds a `MultipleCertificateProfiles` feature flag that, when enabled,
will store the certificate profile name from a `NewOrderRequest`. The
certificate profile name is allowed to be empty and the database will
treat that row as [NULL](https://mariadb.com/kb/en/null-values/). When
the SA retrieves this potentially NULL row, it will be cast as the
golang string zero value `""`.
SRE ticket IN-10145 has been filed to perform the database migration and
enable the new feature flag. The migration must be performed before
enabling the feature flag.
Part of https://github.com/letsencrypt/boulder/issues/7324
If a client attempts to validate a challenge twice in rapid succession,
we'll kick off two background validation routines. One of these will
complete first, updating the database with success or failure. The other
will fail when it attempts to update the database and finds that there
are no longer any authorizations with that ID in the "pending" state.
Reduce the level at which we log such events, since we don't
particularly care about them.
Fixes https://github.com/letsencrypt/boulder/issues/3995
This is a cleanup PR finishing the migration from int64 timestamps to
protobuf `*timestamppb.Timestamps` by removing all usage of the old
int64 fields. In the previous PR
https://github.com/letsencrypt/boulder/pull/7121 all fields were
switched to read from the protobuf timestamppb fields.
Adds a new case to `core.IsAnyNilOrZero` to check various properties of
a `*timestamppb.Timestamp` reducing the visual complexity for receivers.
Fixes https://github.com/letsencrypt/boulder/issues/7060
* Adds new `google.protobuf.Timestamp` fields to each .proto file where
we had been using `int64` fields as a timestamp.
* Updates relevant gRPC messages to populate the new
`google.protobuf.Timestamp` fields in addition to the old `int64`
timestamp fields.
* Added tests for each `<x>ToPB` and `PBto<x>` functions to ensure that
new fields passed into a gRPC message arrive as intended.
* Removed an unused error return from `PBToCert` and `PBToCertStatus`
and cleaned up each call site.
Built on-top of https://github.com/letsencrypt/boulder/pull/7069
Part 2 of 4 related to
https://github.com/letsencrypt/boulder/issues/7060
Add a new "revokedCertificates" table to the database schema. This table
is similar to the existing "certificateStatus" table in many ways, but
the idea is that it will only have rows added to it when certificates
are revoked, not when they're issued. Thus, it will grow many orders of
magnitude slower than the certificateStatus table does. Eventually, it
will replace that table entirely.
The one column that revokedCertificates adds is the new "ShardIdx"
column, which is the CRL shard in which the revoked certificate will
appear. This way we can assign certificates to CRL shards at the time
they are revoked, and guarantee that they will never move to a different
shard even if we change the number of shards we produce. This will
eventually allow us to put CRL URLs directly into our certificates,
replacing OCSP URLs.
Add new logic to the SA's RevokeCertificate and UpdateRevokedCertificate
methods to handle this new table. If these methods receive a request
which specifies a CRL shard (our CRL shards are 1-indexed, so shard 0
does not exist), then they will ensure that the new revocation status is
written into both the certificateStatus and revokedCertificates tables.
This logic will not function until the RA is updated to take advantage
of it, so it is not a risk for it to appear in Boulder before the new
table has been created.
Also add new logic to the SA's GetRevokedCertificates method. Similar to
the above, this reads from the new table if the ShardIdx field is
supplied in the request message. This code will not operate until the
crl-updater is updated to include this field. We will not perform this
update for a minimum of 100 days after this code is deployed, to ensure
that all unexpired revoked certificates are present in the
revokedCertificates table.
Part of https://github.com/letsencrypt/boulder/issues/7094
The NextUpdate field should not be required, as it is not necessary for
tracking and preventing duplicate work between multiple crl-updater
instances.
The ThisUpdate conditional needs explicit handling for NULL to ensure
that it updates correctly.
Simplify the index-picking logic in the SA's leaseOldestCrlShard method.
Specifically, more clearly separate it into "missing" and "non-missing"
cases, which require entirely different logic: picking a random missing
shard, or picking the oldest unleased shard, respectively.
Also change the UpdateCRLShard method to "unlease" shards when they're
updated. This allows the crl-updater to run as quickly as it likes,
while still ensuring that multiple instances do not step on each other's
toes.
The config change for shardWidth and lookbackPeriod instead of
certificateLifetime has been deployed in prod since IN-8445. The config
change changing the shardWidth is just so that the tests neither produce
a bazillion shards, nor have to do a bazillion SA queries for each chunk
within a shard, improving the readability of test logs.
Part of https://github.com/letsencrypt/boulder/issues/7023
Add a new feature flag, LeaseCRLShards, which controls certain aspects
of crl-updater's behavior.
When this flag is enabled, crl-updater calls the new SA.LeaseCRLShard
method before beginning work on a shard. This prevents it from stepping
on the toes of another crl-updater instance which may be working on the
same shard. This is important to prevent two competing instances from
accidentally updating a CRL's Number (which is an integer representation
of its thisUpdate timestamp) *backwards*, which would be a compliance
violation.
When this flag is enabled, crl-updater also calls the new
SA.UpdateCRLShard method after finishing work on a shard.
In the future, additional work will be done to make crl-updater use the
"give me the oldest available shard" mode of the LeaseCRLShard method.
Fixes https://github.com/letsencrypt/boulder/issues/6897
This change replaces [gorp] with [borp].
The changes consist of a mass renaming of the import and comments / doc
fixups, plus modifications of many call sites to provide a
context.Context everywhere, since gorp newly requires this (this was one
of the motivating factors for the borp fork).
This also refactors `github.com/letsencrypt/boulder/db.WrappedMap` and
`github.com/letsencrypt/boulder/db.Transaction` to not embed their
underlying gorp/borp objects, but to have them as plain fields. This
ensures that we can only call methods on them that are specifically
implemented in `github.com/letsencrypt/boulder/db`, so we don't miss
wrapping any. This required introducing a `NewWrappedMap` method along
with accessors `SQLDb()` and `BorpDB()` to get at the internal fields
during metrics and logging setup.
Fixes#6944
Add two new methods, LeaseCRLShard and UpdateCRLShard, to the SA gRPC
interface. These methods work in concert both to prevent multiple
instances of crl-updater from stepping on each others toes, and to lay
the groundwork for a less bursty version of crl-updater in the future.
Introduce a new database table, crlShards, which tracks the thisUpdate
and nextUpdate timestamps of each CRL shard for each issuer. It also has
a column "leasedUntil", which is also a timestamp. Grant the SA user
read-write access to this table.
LeaseCRLShard updates the leasedUntil column of the identified shard to
the given time. It returns an error if the identified shard's
leasedUntil timestamp is already in the future. This provides a
mechanism for crl-updater instances to "lick the cookie", so to speak,
marking CRL shards as "taken" so that multiple crl-updater instances
don't attempt to work on the same shard at the same time. Using a
timestamp has the added benefit that leases are guaranteed to expire,
ensuring that we don't accidentally fail to work on a shard forever.
LeaseCRLShard has a second mode of operation, when a range of potential
shards is given in the request, rather than a single shard. In this
mode, it returns the shard (within the given range) whose thisUpdate
timestamp is oldest. (Shards with no thisUpdate timestamp, including
because the requested range includes shard indices the database doesn't
yet know about, count as older than any shard with any thisUpdate
timestamp.) This allows crl-updater instances which don't care which
shard they're working on to do the most urgent work first.
UpdateCRLShard updates the thisUpdate and nextUpdate timestamps of the
identified shard. This closes the loop with the second mode of
LeaseCRLShard above: by updating the thisUpdate timestamp, the method
marks the shard as no longer urgently needing to be worked on.
IN-9220 tracks creating this table in staging and production
Part of #6897