Call `RA.UnpauseAccount` for valid unpause form submissions.
Determine and display the appropriate outcome to the Subscriber based on
the count returned by `RA.UnpauseAccount`:
- If the count is zero, display the "Account already unpaused" message.
- If the count equals the max number of identifiers allowed in a single
request, display a page explaining the need to visit the unpause URL
again.
- Otherwise, display the "Successfully unpaused all N identifiers"
message.
Apply per-request timeout from the SFE configuration.
Part of https://github.com/letsencrypt/boulder/issues/7406
The name "now" was always misleading, because we never set the value to
be the actual current time, we always set it to be some time in the
future to avoid returning authzs which expire in the very near future.
Changing the name to "validUntil" matches the current naming in
GetPendingAuthorizationRequest.
SA method PauseIdentifiers skips identifiers unpaused within the last 2
weeks, providing a grace period for operators to fix configuration
issues resulting in numerous contiguous validation failures.
Part of #7475
SA method UnpauseAccount uses up to 5 `UPDATE` query iterations, each
with a `LIMIT` of 10000, to unpause up to 50000 identifiers and returns
a count of identifiers unpaused.
Part of #7475
We only care about the date of an issuedName, not the exact time, and
this may reduce the size of the index somewhat.
---------
Co-authored-by: Samantha Frank <hello@entropy.cat>
This reverts commit 2b5b6239a4.
Following up on #7556, after we made a more systematic change to use
borp's TypeConverter, we no longer need to manually truncate timestamps.
We believe the MariaDB query planner generates inefficient query plans
when a time index is queried using high precision (nanosecond) times.
This uses the updated borp from[1] to automatically truncate
`time.Time` and `*time.Time` in query parameters.
[1]: https://github.com/letsencrypt/borp/pull/11
Part of #5437
Add the storage implementation for our new (account, hostname) pair
pausing feature.
- Add schema and model for for the new paused table
- Add SA service methods for interacting with the paused table
Part of #7406
Part of #7475
As described in #7075, go-sql-driver/mysql v1.5.0 truncates timestamps
to microseconds, while v1.6.0 and above does not. That means upon
upgrading to v1.6.0, timestamps are written to the database with a
resolution of nanoseconds, and SELECT statements also use a resolution
of nanoseconds. We believe this is the cause of performance problems
we observed when upgrading to v1.6.0 and above.
To fix that, apply rounding in the application code. Rather than
just rounding to microseconds, round to seconds since that is the
resolution we care about. Using seconds rather than microseconds
may also allow some of our indexes to grow more slowly over time.
Note: this omits truncating some timestamps in CRL shard calculations,
since truncating those resulted in test failures that I'll follow up
on separately.
Replaced our embeds of foopb.UnimplementedFooServer with
foopb.UnsafeFooServer. Per the grpc-go docs this reduces the "forwards
compatibility" of our implementations, but that is only a concern for
codebases that are implementing gRPC interfaces maintained by third
parties, and which want to be able to update those third-party
dependencies without updating their own implementations in lockstep.
Because we update our protos and our implementations simultaneously, we
can remove this safety net to replace runtime type checking with
compile-time type checking.
However, that replacement is not enough, because we never pass our
implementation objects to a function which asserts that they match a
specific interface. So this PR also replaces our reflect-based unittests
with idiomatic interface assertions. I do not view this as a perfect
solution, as it relies on people implementing new gRPC servers to add
this line, but it is no worse than the status quo which relied on people
adding the "TestImplementation" test.
Fixes https://github.com/letsencrypt/boulder/issues/7497
Update the version of protoc-gen-go-grpc that we use to generate Go gRPC
code from our proto files, and update the versions of other gRPC tools
and libraries that we use to match. Turn on the new
`use_generic_streams` code generation flag to change how
protoc-gen-go-grpc generates implementations of our streaming methods,
from creating a wholly independent implementation for every stream to
using shared generic implementations.
Take advantage of this code-sharing to remove our SA "wrapper" methods,
now that they have truly the same signature as the SARO methods which
they wrap. Also remove all references to the old-style stream names
(e.g. foopb.FooService_BarMethodClient) and replace them with the new
underlying generic names, for the sake of consistency. Finally, also
remove a few custom stream test mocks, replacing them with the generic
mocks.ServerStreamClient.
Note that this PR does not change the names in //mocks/sa.go, to avoid
conflicts with work happening in the pursuit of
https://github.com/letsencrypt/boulder/issues/7476. Note also that this
PR updates the version of protoc-gen-go-grpc that we use to a specific
commit. This is because, although a new release of grpc-go itself has
been cut, the codegen binary is a separate Go module with its own
releases, and it hasn't had a new release cut yet. Tracking for that is
in https://github.com/grpc/grpc-go/issues/7030.
While we don't want to halt the admin tool in the midst of its parallel
processing, we can keep track of whether it has encountered any errors
and raise one meta-error at the end of its execution. This will prevent
the top-level admin code from claiming that execution succeeded, and
ensure operators notice any previously-logged errors.
As part of this, fix the SA's GetLintPrecertificate wrapper to actually
call the SARO's GetLintPrecertificate, instead of incorrectly calling
the SARO's GetCertificate.
Fixes https://github.com/letsencrypt/boulder/issues/7460
Correctly explode the params slice with Go's "..." notation so that
gorp/go-sql-driver correctly receives each element of the params slice,
rather than receiving the slice as a whole. Also use the SA's clock,
rather than the DB's, to control which certs are selected -- in
deployments this wouldn't make a difference but in test those clocks can
be very different.
Add two unit tests to ensure this query does not regress, and create a
generic fake gRPC server stream for use in several SA tests including
the new ones.
Fixes https://github.com/letsencrypt/boulder/issues/7460
This removes the only place we query the requestedNames table, which
allows us to get rid of it in a subsequent PR (once this one is merged
and deployed).
Part of https://github.com/letsencrypt/boulder/issues/7432
Cert checker has been failing in config-next because the
"CertCheckerRequiresCorrespondence" flag is enabled, but no permissions
were granted on the precertificates table. While this did cause it to
log profusively, this did not cause the tests to fail because
integration-test.py simply executes cert-checker without caring about
its exit code.
We had disabled our lints on go1.22 because golangci-lint and
staticcheck didn't work with some of its updates. Re-enable them, and
fix the things which the updated linters catch now.
Fixes https://github.com/letsencrypt/boulder/issues/7229
Upgrade from the old go-jose v2.6.1 to the newly minted go-jose v4.0.1.
Cleans up old code now that `jose.ParseSigned` can take a list of
supported signature algorithms.
Fixes https://github.com/letsencrypt/boulder/issues/7390
---------
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
Adds `certificateProfileName` to the `orders` database table. The
[maximum
length](https://github.com/letsencrypt/boulder/pull/7325/files#diff-a64a0af7cbf484da8e6d08d3eefdeef9314c5d9888233f0adcecd21b800102acR35)
of a profile name matches the `//issuance` package.
Adds a `MultipleCertificateProfiles` feature flag that, when enabled,
will store the certificate profile name from a `NewOrderRequest`. The
certificate profile name is allowed to be empty and the database will
treat that row as [NULL](https://mariadb.com/kb/en/null-values/). When
the SA retrieves this potentially NULL row, it will be cast as the
golang string zero value `""`.
SRE ticket IN-10145 has been filed to perform the database migration and
enable the new feature flag. The migration must be performed before
enabling the feature flag.
Part of https://github.com/letsencrypt/boulder/issues/7324
Create a new method on the gorm rows object which runs a small closure
for every row retrieved from the database. Use this new method to remove
20 lines of boilerplate from five different SA methods and rocsp-tool.
This adds the profile name to the proto messages necessary to propagate
it from the WFE to the SA, and from the SA to the CA. This change is
safe to land prior to any logic being added, and unblocks
profile-handling logic changes to the WFE, RA, SA, and CA.
Part of https://github.com/letsencrypt/boulder/issues/7309
Add two new methods to the SA, GetSerialsByKey and GetSerialsByAccount,
which use the same query as the admin tool has previously used to get
serials matching a given SPKI hash or a given registration ID. These two
new gRPC methods read the database row-by-row and produce streams of
results to keep SA memory usage low.
Use these methods in the admin tool so it no longer needs a direct
database connection for these actions.
Part of https://github.com/letsencrypt/boulder/issues/7350
Adds the chosen DNS resolver to the VAs `ValidationRecord` object so
that for each challenge type during a validation, boulder can audit log
the resolver(s) chosen to fulfill the request..
Fixes https://github.com/letsencrypt/boulder/issues/7140
Add a new "GetLintPrecertificate" method to the SA's gRPC service. This
acts identically to the existing "GetCertificate", but returns the
linting precertificate created just prior to the actual precertificate
instead. This is useful for revocation, where we need to be able to act
on a serial even if the corresponding (pre)certificate was never issued
or never saved to the database.
Part of https://github.com/letsencrypt/boulder/issues/7135
In MariaDB, `long_query_time`[1] and `max_statement_time`[2] have up to
microsecond granularity (6 digits to the right of the decimal).
Fixes an issue detected by proxysql in staging.
```
MySQL_Session.cpp:6567:handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY_qpo(): [ERROR] Unable to parse query. If correct, report it as a bug: SET long_query_time=3.9200000000000004
```
1. https://mariadb.com/kb/en/server-system-variables/#long_query_time
2. https://mariadb.com/kb/en/server-system-variables/#max_statement_time
---------
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
If a client attempts to validate a challenge twice in rapid succession,
we'll kick off two background validation routines. One of these will
complete first, updating the database with success or failure. The other
will fail when it attempts to update the database and finds that there
are no longer any authorizations with that ID in the "pending" state.
Reduce the level at which we log such events, since we don't
particularly care about them.
Fixes https://github.com/letsencrypt/boulder/issues/3995
This is a cleanup PR finishing the migration from int64 timestamps to
protobuf `*timestamppb.Timestamps` by removing all usage of the old
int64 fields. In the previous PR
https://github.com/letsencrypt/boulder/pull/7121 all fields were
switched to read from the protobuf timestamppb fields.
Adds a new case to `core.IsAnyNilOrZero` to check various properties of
a `*timestamppb.Timestamp` reducing the visual complexity for receivers.
Fixes https://github.com/letsencrypt/boulder/issues/7060
Remove ThrowAwayCert's nameCount argument, since it is always set to 1
by all callers. Remove ThrowAwayCertWithSerial, because it has no
callers. Change the throwaway cert's key from RSA512 to ECDSA P-224 for
a two-orders-of-magnitude speedup in key generation. Use this simplified
form in two new places in the RA that were previously rolling their own
test certs.
Add a new clock argument to the test-only ThrowAwayCert function, and
use that clock to generate reasonable notBefore and notAfter timestamps
in the resulting throwaway test cert. This is necessary to easily test
functions which rely on the expiration timestamp of the certificate,
such as upcoming work about computing CRL shards.
Part of https://github.com/letsencrypt/boulder/issues/7094
* Renames all of int64 as a time.Duration fields to `<fieldname>NS` to
indicate they are Unix nanoseconds.
* Adds new `google.protobuf.Duration` fields to each .proto file where
we previously had been using an int64 field to populate a time.Duration.
* Updates relevant gRPC messages.
Part 1 of 3 for https://github.com/letsencrypt/boulder/issues/7097
* Adds new `google.protobuf.Timestamp` fields to each .proto file where
we had been using `int64` fields as a timestamp.
* Updates relevant gRPC messages to populate the new
`google.protobuf.Timestamp` fields in addition to the old `int64`
timestamp fields.
* Added tests for each `<x>ToPB` and `PBto<x>` functions to ensure that
new fields passed into a gRPC message arrive as intended.
* Removed an unused error return from `PBToCert` and `PBToCertStatus`
and cleaned up each call site.
Built on-top of https://github.com/letsencrypt/boulder/pull/7069
Part 2 of 4 related to
https://github.com/letsencrypt/boulder/issues/7060
Add a new "revokedCertificates" table to the database schema. This table
is similar to the existing "certificateStatus" table in many ways, but
the idea is that it will only have rows added to it when certificates
are revoked, not when they're issued. Thus, it will grow many orders of
magnitude slower than the certificateStatus table does. Eventually, it
will replace that table entirely.
The one column that revokedCertificates adds is the new "ShardIdx"
column, which is the CRL shard in which the revoked certificate will
appear. This way we can assign certificates to CRL shards at the time
they are revoked, and guarantee that they will never move to a different
shard even if we change the number of shards we produce. This will
eventually allow us to put CRL URLs directly into our certificates,
replacing OCSP URLs.
Add new logic to the SA's RevokeCertificate and UpdateRevokedCertificate
methods to handle this new table. If these methods receive a request
which specifies a CRL shard (our CRL shards are 1-indexed, so shard 0
does not exist), then they will ensure that the new revocation status is
written into both the certificateStatus and revokedCertificates tables.
This logic will not function until the RA is updated to take advantage
of it, so it is not a risk for it to appear in Boulder before the new
table has been created.
Also add new logic to the SA's GetRevokedCertificates method. Similar to
the above, this reads from the new table if the ShardIdx field is
supplied in the request message. This code will not operate until the
crl-updater is updated to include this field. We will not perform this
update for a minimum of 100 days after this code is deployed, to ensure
that all unexpired revoked certificates are present in the
revokedCertificates table.
Part of https://github.com/letsencrypt/boulder/issues/7094
The NextUpdate field should not be required, as it is not necessary for
tracking and preventing duplicate work between multiple crl-updater
instances.
The ThisUpdate conditional needs explicit handling for NULL to ensure
that it updates correctly.
Simplify the index-picking logic in the SA's leaseOldestCrlShard method.
Specifically, more clearly separate it into "missing" and "non-missing"
cases, which require entirely different logic: picking a random missing
shard, or picking the oldest unleased shard, respectively.
Also change the UpdateCRLShard method to "unlease" shards when they're
updated. This allows the crl-updater to run as quickly as it likes,
while still ensuring that multiple instances do not step on each other's
toes.
The config change for shardWidth and lookbackPeriod instead of
certificateLifetime has been deployed in prod since IN-8445. The config
change changing the shardWidth is just so that the tests neither produce
a bazillion shards, nor have to do a bazillion SA queries for each chunk
within a shard, improving the readability of test logs.
Part of https://github.com/letsencrypt/boulder/issues/7023
This design seeks to reduce read-pressure on our DB by moving rate limit
tabulation to a key-value datastore. This PR provides the following:
- (README.md) a short guide to the schemas, formats, and concepts
introduced in this PR
- (source.go) an interface for storing, retrieving, and resetting a
subscriber bucket
- (name.go) an enumeration of all defined rate limits
- (limit.go) a schema for defining default limits and per-subscriber
overrides
- (limiter.go) a high-level API for interacting with key-value rate
limits
- (gcra.go) an implementation of the Generic Cell Rate Algorithm, a
leaky bucket-style scheduling algorithm, used to calculate the present
or future capacity of a subscriber bucket using spend and refund
operations
Note: the included source implementation is test-only and currently
accomplished using a simple in-memory map protected by a mutex,
implementations using Redis and potentially other data stores will
follow.
Part of #5545
Add a new feature flag, LeaseCRLShards, which controls certain aspects
of crl-updater's behavior.
When this flag is enabled, crl-updater calls the new SA.LeaseCRLShard
method before beginning work on a shard. This prevents it from stepping
on the toes of another crl-updater instance which may be working on the
same shard. This is important to prevent two competing instances from
accidentally updating a CRL's Number (which is an integer representation
of its thisUpdate timestamp) *backwards*, which would be a compliance
violation.
When this flag is enabled, crl-updater also calls the new
SA.UpdateCRLShard method after finishing work on a shard.
In the future, additional work will be done to make crl-updater use the
"give me the oldest available shard" mode of the LeaseCRLShard method.
Fixes https://github.com/letsencrypt/boulder/issues/6897
This change replaces [gorp] with [borp].
The changes consist of a mass renaming of the import and comments / doc
fixups, plus modifications of many call sites to provide a
context.Context everywhere, since gorp newly requires this (this was one
of the motivating factors for the borp fork).
This also refactors `github.com/letsencrypt/boulder/db.WrappedMap` and
`github.com/letsencrypt/boulder/db.Transaction` to not embed their
underlying gorp/borp objects, but to have them as plain fields. This
ensures that we can only call methods on them that are specifically
implemented in `github.com/letsencrypt/boulder/db`, so we don't miss
wrapping any. This required introducing a `NewWrappedMap` method along
with accessors `SQLDb()` and `BorpDB()` to get at the internal fields
during metrics and logging setup.
Fixes#6944
Add two new methods, LeaseCRLShard and UpdateCRLShard, to the SA gRPC
interface. These methods work in concert both to prevent multiple
instances of crl-updater from stepping on each others toes, and to lay
the groundwork for a less bursty version of crl-updater in the future.
Introduce a new database table, crlShards, which tracks the thisUpdate
and nextUpdate timestamps of each CRL shard for each issuer. It also has
a column "leasedUntil", which is also a timestamp. Grant the SA user
read-write access to this table.
LeaseCRLShard updates the leasedUntil column of the identified shard to
the given time. It returns an error if the identified shard's
leasedUntil timestamp is already in the future. This provides a
mechanism for crl-updater instances to "lick the cookie", so to speak,
marking CRL shards as "taken" so that multiple crl-updater instances
don't attempt to work on the same shard at the same time. Using a
timestamp has the added benefit that leases are guaranteed to expire,
ensuring that we don't accidentally fail to work on a shard forever.
LeaseCRLShard has a second mode of operation, when a range of potential
shards is given in the request, rather than a single shard. In this
mode, it returns the shard (within the given range) whose thisUpdate
timestamp is oldest. (Shards with no thisUpdate timestamp, including
because the requested range includes shard indices the database doesn't
yet know about, count as older than any shard with any thisUpdate
timestamp.) This allows crl-updater instances which don't care which
shard they're working on to do the most urgent work first.
UpdateCRLShard updates the thisUpdate and nextUpdate timestamps of the
identified shard. This closes the loop with the second mode of
LeaseCRLShard above: by updating the thisUpdate timestamp, the method
marks the shard as no longer urgently needing to be worked on.
IN-9220 tracks creating this table in staging and production
Part of #6897
Previously, we had three chained calls initializing a database:
- InitWrappedDb calls NewDbMap
- NewDbMap calls NewDbMapFromConfig
Since all three are exporetd, this left me wondering when to call one vs
the others.
It turns out that NewDbMap is only called from tests, so I renamed it to
DBMapForTest to make that clear.
NewDbMapFromConfig is only called internally to the SA, so I made it
unexported it as newDbMapFromMysqlConfig.
Also, I copied the ParseDSN call into InitWrappedDb, so it doesn't need
to call DBMapForTest. Now InitWrappedDb and DBMapForTest both
independently call newDbMapFromMysqlConfig.
I also noticed that InitDBMetrics was only called internally so I
unexported it.
Add the necessary scaffolding for deep health checking of our various
gRPC components. Each component implementation that also implements the
grpc.checker interface will be checked periodically, and the health
status of the component will be updated accordingly.
Add the necessary methods to SA to implement the grpc.checker interface
and register these new health checks with Consul.
Additionally:
- Update entry point script to check for ProxySQL readiness.
- Increase the poll rate for gRPC Consul checks from 5s to 2s to help
with DNS failures, due to check failures, on startup.
- Change log level for Consul from INFO to ERROR to deal with noisy logs
full of transport failures due to Consul gRPC checks firing before the
SAs are up.
Fixes#6878
Part of #6795
When a user wants their email address deleted from the database but no
longer has access to their account, this allows an administrator to
clear it.
This adds `admin` as an alias for `admin-revoker`, because we'd like the
clear-email sub-command to be a part of that overall tool, but it's not
really revocation related.
Part of #6864
Removes the `Hostname` and `Port` fields from an http-01
ValidationRecord model prior to storing the record in the database.
Using `"hostname":"example.com","port":"80"` as a snippet of a whole
validation record, we'll save minimum 36 bytes for each new http-01
ValidationRecord that gets stored. When retrieving the record, the
ValidationRecord `RehydrateHostPort` method will repopulate the
`Hostname` and `Port` fields from the `URL` field.
Fixes the main goal of
https://github.com/letsencrypt/boulder/issues/5231.
---------
Co-authored-by: Samantha <hello@entropy.cat>
As a follow-up to https://github.com/letsencrypt/boulder/issues/5467, I
did an audit of all places where we call SelectOne to ensure that those
queries can never return more than one result. These four functions were
the only places that weren't already constrained to a single result
through the use of "SELECT COUNT", "LIMIT 1", "WHERE uniqueKey =", or
similar. Limit these functions' queries to always only return a single
result, now that their underlying tables no longer have unique key
constraints.
Additionally, slightly refactor selectRegistration to just take a single
column name rather than a whole WHERE clause.
Fixes https://github.com/letsencrypt/boulder/issues/6521
Make minor, non-user-visible changes to how we structure the probs
package. Notably:
- Add new problem types for UnsupportedContact and
UnsupportedIdentifier, which are specified by RFC8555 and which we will
use in the future, but haven't been using historically.
- Sort the problem types and constructor functions to match the
(alphabetical) order given in RFC8555.
- Rename some of the constructor functions to better match their
underlying problem types (e.g. "TLSError" to just "TLS").
- Replace the redundant ProblemDetailsToStatusCode function with simply
always returning a 500 if we haven't properly set the problem's
HTTPStatus.
- Remove the ability to use either the V1 or V2 error namespace prefix;
always use the proper RFC namespace prefix.
In order to get rid of the orphan queue, we want to make sure that
before we sign a precertificate, we have enough data in the database
that we can fulfill our revocation-checking obligations even if storing
that precertificate in the database fails. That means:
- We should have a row in the certificateStatus table for the serial.
- But we should not serve "good" for that serial until we are positive
the precertificate was issued (BRs 4.9.10).
- We should have a record in the live DB of the proposed certificate's
public key, so the bad-key-revoker can mark it revoked.
- We should have a record in the live DB of the proposed certificate's
names, so it can be revoked if we are required to revoke based on names.
The SA.AddPrecertificate method already achieves these goals for
precertificates by writing to the various metadata tables. This PR
repurposes the SA.AddPrecertificate method to write "proposed
precertificates" instead.
We already create a linting certificate before the precertificate, and
that linting certificate is identical to the precertificate that will be
issued except for the private key used to sign it (and the AKID). So for
instance it contains the right pubkey and SANs, and the Issuer name is
the same as the Issuer name that will be used. So we'll use the linting
certificate as the "proposed precertificate" and store it to the DB,
along with appropriate metadata.
In the new code path, rather than writing "good" for the new
certificateStatus row, we write a new, fake OCSP status string "wait".
This will cause us to return internalServerError to OCSP requests for
that serial (but we won't get such requests because the serial has not
yet been published). After we finish precertificate issuance, we update
the status to "good" with SA.SetCertificateStatusReady.
Part of #6665
Replace the current orderToAuthz2 table schema with one that includes an
auto-increment ID column, so that this table can be partitioned simply
by ID, like all of our other partitioned tables.
Update the SA so that when it selects from a join over this table and
the authz2 table, it explicitly selects the columns from the authz2
table, to avoid the ambiguity introduced by having two columns named
"id" in the result set.
This work is already in-progress in prod, represented by IN-8916 and
IN-8928.
Fixes https://github.com/letsencrypt/boulder/issues/6820
Make a series of small changes to our test database schema, both to make
it simpler to reason about and to bring it closer in alignment to our
production database schema:
- Incorporate the IssuedNamesDropIndex, Incidents, SimplePartitioning,
and NotUnique migrations into the CombinedSchema, as they have been
fully applied in prod;
- Use CHARSET=utf8mb4 everywhere, instead of just utf8;
- Use UNSIGNED for auto-increment ID columns in the tables where prod
does; and
- Re-sort the tables in CombinedSchema which no longer have foreign key
constraints.
Part of https://github.com/letsencrypt/boulder/issues/6820
Adds a new function to the `//sa` to ensure that a MariaDB config passed
in via SA `setDefault` or via DSN perform the following validations:
1. Correct quoting for strings and string enums to prevent future
problems such as PR #6683 from occurring.
2. Each system variable we care to use is scoped as SESSION, rather than
strictly GLOBAL.
3. Detect system variables passed in that are not in a curated list of
variables we care about.
4. Validate that values for booleans, floats, integers, and strings at
least pass basic a regex.
This change is in a bit of a weird place. The ideal place for this
change would be `go-sql-driver/mysql`, but since that driver handles the
general case of MySQL-compatible connections to the database, we're
implementing this validation in Boulder instead. We're confident about
the specific versions of MariaDB running in staging/prod and that the
database vendor won't change underneath us, which is why I decided to
take this approach. However, this change will bind us tighter to MariaDB
than MySQL due to the specific variables we're checking. An up-to-date
list of MariaDB system variables can be found
[here.](https://mariadb.com/kb/en/server-system-variables/)
Fixes https://github.com/letsencrypt/boulder/issues/6687.
Deprecate the ROCSPStage7 feature flag, which caused the RA and CA to
stop generating OCSP responses when issuing new certs and when revoking
certs. (That functionality is now handled just-in-time by the
ocsp-responder.) Delete the old OCSP-generating codepaths from the RA
and CA. Remove the CA's internal reference to an OCSP implementation,
because it no longer needs it.
Additionally, remove the SA's "Issuers" config field, which was never
used.
Fixes#6285
Deprecate the ROCSPStage6 feature flag. Remove all references to the
`ocspResponse` column from the SA, both when reading from and when
writing to the `certificateStatus` table. This makes it safe to fully
remove that column from the database.
IN-8731 enabled this flag in all environments, so it is safe to
deprecate.
Part of #6285
Delete the ocsp-updater service, and the //ocsp/updater library that
supports it. Remove test configs for the service, and remove references
to the service from other test files.
This service has been fully shut down for an extended period now, and is
safe to remove.
Fixes#6499
As a follow-up to #6780, add the same style of implementation test to
all of our other gRPC services. This was not included in that PR just to
keep it small and single-purpose.
This file contained both read-only and read-write methods. Its existence
is not reflected in any other gRPC or struct organization; it was easy
to forget that it exists. Merge its contents into both sa.go and
saro.go, so that the methods follow the same organization scheme as the
rest of the SA.
This makes it less likely that bugs like #6778 will happen again.
When external clients make POST requests to our ARI endpoint, they're
getting 404s even when a GET request with the same exact CertID
succeeds. Logs show that this is because the SA is returning "method
GetSerialMetadata not implemented" when the WFE attempts that gRPC
request. This is due to an oversight: the GetSerialMetadata method is
not implemented on the SQLStorageAuthorityRO object, only on the
SQLStorageAuthority object. The unit tests did not catch this bug
because they supply a mock SA, which does implement the method in
question.
Update the receiver and add a wrapper so that GetSerialMetadata is
implemented on both the read-write and read-only SA implementation
types. Add a new kind of test assertion which helps ensure this won't
happen again. Add a TODO for an integration test covering the ARI POST
codepath to prevent a regression.
Fixes#6778
Add an upstream ProxySQL container to our docker-compose. Configure
ProxySQL to manage database connections for our unit and integration
tests.
Fixes#5873
This was mostly unused. The only caller was orphan-finder, which used it
to determine if a certificate was already in the database. But this is
not particularly important functionality, so I've removed it.
sa: rename AddPrecertificateRequest.IssuerID
to IssuerNameID. This is in preparation for adding a similarly-named
field to AddSerialRequest.
Part of #5152.
Use constants from the go stdlib time package, such as time.DateTime and
time.RFC3339, when parsing and formatting timestamps. Additionally,
simplify or remove some of our uses of parsing timestamps, such as to
set fake clocks in tests.
When sql_mode is set as part of a multi-variable SET command (which
happens in go-sql-driver/mysql 1.6.0+), ProxySQL can mis-parse parts of
the SET command that come after it. For instance, if we run:
SET sql_mode=STRICT_ALL_TABLES,log_queries_not_using_indexes=ON;
Then ProxySQL would mis-parse that and pass along to its upstream:
SET sql_mode=STRICT_ALL_TABLES,log_queries_not_using_indexes;
Adding quotes around sql_mode (a string-valued variables) causes
ProxySQL to parse this correctly.
Add a new time.Duration field, LagFactor, to both the SA's config struct
and the read-only SA's implementation struct. In the GetRegistration,
GetOrder, and GetAuthorization2 methods, if the database select returned
a NoRows error and a lagFactor duration is configured, then sleep for
lagFactor seconds and retry the select.
This allows us to compensate for the replication lag between our primary
write database and our read-only replica databases. Sometimes clients
will fire requests in rapid succession (such as creating a new order,
then immediately querying the authorizations associated with that
order), and the subsequent requests will fail because they are directed
to read replicas which are lagging behind the primary. Adding this
simple sleep-and-retry will let us mitigate many of these failures,
without adding too much complexity.
Fixes#6593
This includes two feature flags: one that controls turning on the extra
database queries, and one that causes cert-checker to fail on missing
validations. If the second flag isn't turned on, it will just emit error
log lines. This will help us find any edge conditions we need to deal
with before making the new code trigger alerts.
Fixes#6562
A `core.Authorization` object has lots of fields (e.g. `status`,
`attempted`, `attemptedAt`) which are not relevant to a
newly-created authorization: a brand new authz can only be in
the "pending" state, cannot have been attempted already or have
been validated.
Fix a nil pointer dereference in `sa.NewOrderAndAuthzs` if a
`req *sapb.NewOrderAndAuthzsRequest` is passed into the
function with an inner nil `req.NewOrder`.
Add new tests.
- TestNewOrderAndAuthzs_MissingInnerOrder
- Checks that
the nil pointer dereference no longer occurs
- TestNewOrderAndAuthzs_NewAuthzExpectedFields
- Checks that the `Attempted`, `AttemptedAt`, `ValidationRecords`,
and `ValidationErrors` fields for a brand new authz in the
`pending` state are correctly defaulted to `nil` in
`sa.NewOrderAndAuthzs`.
Add a new test assertion `AssertBoxedNil` that returns true for the
existence of a "boxed nil" - a nil value wrapped in a non-nil interface
type.
Fixes#6535
---------
Co-authored-by: Samantha <hello@entropy.cat>
Add validation of input parameters as unquoted MariaDB identifiers, and
document the regex that does it.
Accept a narrower interface (Queryer) for `Insert()`.
Take a list of fields rather than a string containing multiple fields,
to make validation simpler. Rename retCol to returningColumn.
Document safety properties and requirements.
Neither our testing, staging, nor production configs use the
DBConfig.DBConnect config value. Remove it.
To connect to a database, you have to provide a connection URL. These
URLs often contain sensitive information such as DB usernames and
passwords, so we don't store them directly in our configs -- instead, we
store paths to files which contain these strings, and provision those
files via a separate mechanism. We maintained the ability to provide a
URL directly in the config for the sake of easy testing, but have not
used it for that purpose for some time now.
We use this pattern in several places: there is a query that needs to
have a variable number of placeholders (question marks) in it, depending
on how many items we are inserting or querying for. For instance, when
issuing a precertificate we add that precertificate's names to the
"issuedNames" table. To make things more efficient, we do that in a
single query, whether there is one name on the certificate or a hundred.
That means interpolating into the query string with series of question
marks that matches the number of names.
We have a helper type MultiInserter that solves this problem for simple
inserts, but it does not solve the problem for selects or more complex
inserts, and we still have a number of places that generate their
sequence of question marks manually.
This change updates addIssuedNames to use MultiInserter. To enable that,
it also narrows the interface required by MultiInserter.Insert, so it's
easier to mock in tests.
This change adds the new function db.QuestionMarks, which generates e.g.
`?,?,?` depending on the input N.
In a few places I had to rename a function parameter named `db` to avoid
shadowing the `db` package.
We use partitioning to be able to clean up old data, and partitioning is
incompatible with unique indexes. We still have a unique index on
`serial`, and these tables are downstream from there. There still may
some duplicates, like when a certificate is treated as orphaned but was
actually successfully added to the DB; when we later go to incorporate
it a duplicate will show up.
This reflects changes already made in prod.
This PR removes a unittest that coincidentally relied on these indexes
to generate an error case it needed: `TestAddPrecertificateStatusFail`.
That test was added in #5918. We can bring that test back with a
significant refactoring to change `*db.WrappedMap` to an interface, but
in the meantime we're prioritizing landing this PR so we have a more
realistic integration test environment.
The SA's GetOrder method issues four separate database queries:
- get the Order object itself
- get the list of Names associated with it
- get the list of Authorization IDs associated with it
- get the contents of those Authorizations themselves
These four queries can hit different database replicas with different
amounts of replication lag, and therefore different views of the
universe. This can result in inconsistent results, such as the Order
existing, but not having any Authorization IDs associated with it.
Conduct these four queries all within a single transaction, to ensure
they all hit a single read-replica with a consistent view of the world.
Fixes#6540
Delete the NewOrder and NewAuthorizations2 methods from the SA's gRPC
interface. These methods have been replaced by the unified
NewOrderAndAuthzs method, which performs both sets of insertions in a
single transaction.
Also update the SA and RA unittests to not rely on these methods for
setting up test data that other functions-under-test rely on. In most
cases, replace calls to NewOrder with calls to NewOrderAndAuthzs. In the
SA tests specifically, replace calls to NewAuthorizations2 with a
streamlined helper function that simply does the single necessary
database insert.
Fixes#6510Fixes#5816
In the WFE, ocsp-responder, and crl-updater, switch from using
StorageAuthorityClients to StorageAuthorityReadOnlyClients. This ensures
that these services cannot call methods which write to our database.
Fixes#6454
The `digest` value in AddCertificate's response message is never used by
any callers. Remove it, replacing the whole response message with
google.protobuf.Empty, to mirror the AddPrecertificate method.
This swap is safe, because message names are not sent on the network,
and empty message fields are omitted from the wire format entirely, so
sending the predefined Empty message is identical to sending an empty
AddCertificateResponse message. Since no client is inspecting the
response to access the digest field, sending an empty response will not
break any clients.
Fixes#6498
We have a new method, omitZero, which is to allow `max_statement_time`
and `long_query_time` to be zeroed out, but copy/paste error left
`long_query_time` out. Fix it.
The SA's NewOrder and NewOrderAndAuthzs methods both write new rows (the
order itself, new authorizations, and new OrderToAuthz relations) to the
database, and then quickly turn around and query the database for a
bunch of authz rows to compute the status of the new order. This is
necessary because many orders are created already referencing existing
authorizations, and the state of those authorizations is not known at
order creation time, but does affect the order's status.
Due to replication lag, it can cause issues if the database writes go to
the primary database, but the follow-up read goes to a read-only
database replica which may be lagging slightly. To prevent this issue,
conduct the reads on the same transaction as the writes.
Fixes#6511
- Replace `-1` in return values with `0`. No callers were depending on
`-1`.
- Replace `count(` with `COUNT(` for the sake of readability.
- Replace `COUNT(1)` with `COUNT(*)` (https://mariadb.com/kb/en/count).
Both
versions provide identical outputs but let's standardize on the docs.
Fixes#6494
Deprecate these feature flags, which are consistently set in both prod
and staging and which we do not expect to change the value of ever
again:
- AllowReRevocation
- AllowV1Registration
- CheckFailedAuthorizationsFirst
- FasterNewOrdersRateLimit
- GetAuthzReadOnly
- GetAuthzUseIndex
- MozRevocationReasons
- RejectDuplicateCSRExtensions
- RestrictRSAKeySizes
- SHA1CSRs
Move each feature flag to the "deprecated" section of features.go.
Remove all references to these feature flags from Boulder application
code, and make the code they were guarding the only path. Deduplicate
tests which were testing both the feature-enabled and feature-disabled
code paths. Remove the flags from all config-next JSON configs (but
leave them in config ones until they're fully deleted, not just
deprecated). Finally, replace a few testdata CSRs used in CA tests,
because they had SHA1WithRSAEncryption signatures that are now rejected.
Fixes#5171Fixes#6476
Part of #5997
Create a new gRPC service named StorageAuthorityReadOnly which only
exposes a read-only subset of the existing StorageAuthority service's
methods.
Implement this by splitting the existing SA in half, and having the
read-write half embed and wrap an instance of the read-only half.
Unfortunately, many of our tests use exported read-write methods as part
of their test setup, so the tests are all being performed against the
read-write struct, but they are exercising the same code as the
read-only implementation exposes.
Expose this new service at the SA on the same port as the existing
service, but with (in config-next) different sets of allowed clients. In
the future, read-only clients will be removed from the read-write
service's set of allowed clients.
Part of #6454