719 Commits

Author SHA1 Message Date
Aaron Gable 164e035915
Reduce logging from inflight validation collisions (#7209)
If a client attempts to validate a challenge twice in rapid succession,
we'll kick off two background validation routines. One of these will
complete first, updating the database with success or failure. The other
will fail when it attempts to update the database and finds that there
are no longer any authorizations with that ID in the "pending" state.
Reduce the level at which we log such events, since we don't
particularly care about them.

Fixes https://github.com/letsencrypt/boulder/issues/3995
2023-12-15 09:58:34 -08:00
Aaron Gable 21b18667b2
Remove static test certs from SA unittests (#7217)
Fixes https://github.com/letsencrypt/boulder/issues/6279
2023-12-15 07:36:59 -08:00
Phil Porada 51e9f39259
Finish migration from int64 durations to durationpb (#7147)
This is a cleanup PR finishing the migration from int64 durations to
protobuf `*durationpb.Duration` by removing all usage of the old int64
fields. In the previous PR
https://github.com/letsencrypt/boulder/pull/7146 all fields were
switched to read from the protobuf durationpb fields.

Fixes https://github.com/letsencrypt/boulder/issues/7097
2023-11-28 12:51:11 -05:00
Phil Porada 6925fad324
Finish migration from int64 timestamps to timestamppb (#7142)
This is a cleanup PR finishing the migration from int64 timestamps to
protobuf `*timestamppb.Timestamp` by removing all usage of the old
int64 fields. In the previous PR
https://github.com/letsencrypt/boulder/pull/7121 all fields were
switched to read from the protobuf timestamppb fields.

Adds a new case to `core.IsAnyNilOrZero` to check various properties of
a `*timestamppb.Timestamp` reducing the visual complexity for receivers.

Fixes https://github.com/letsencrypt/boulder/issues/7060
2023-11-27 13:37:31 -08:00
Phil Porada 279a4d539d
Read from durationpb instead of int64 durations (#7146)
Switch to reading gRPC duration values from the new durationpb protobuf
fields, completely ignoring the old int64 fields.

Part 2 of 3 for https://github.com/letsencrypt/boulder/issues/7097
2023-11-13 12:23:46 -05:00
Aaron Gable f24ec910ef
Further simplifications to test.ThrowAwayCert (#7129)
Remove ThrowAwayCert's nameCount argument, since it is always set to 1
by all callers. Remove ThrowAwayCertWithSerial, because it has no
callers. Change the throwaway cert's key from RSA512 to ECDSA P-224 for
a two-orders-of-magnitude speedup in key generation. Use this simplified
form in two new places in the RA that were previously rolling their own
test certs.
2023-11-02 09:45:56 -07:00
Aaron Gable 3a3e32514c
Give throwaway test certs reasonable validity intervals (#7128)
Add a new clock argument to the test-only ThrowAwayCert function, and
use that clock to generate reasonable notBefore and notAfter timestamps
in the resulting throwaway test cert. This is necessary to easily test
functions which rely on the expiration timestamp of the certificate,
such as upcoming work about computing CRL shards.

Part of https://github.com/letsencrypt/boulder/issues/7094
2023-11-01 15:24:43 -07:00
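
The two ThrowAwayCert changes above (#7128 and #7129) can be illustrated with a minimal sketch. This is not Boulder's actual helper: the function name, signature, one-hour backdating, and 90-day lifetime are assumptions chosen for illustration. It shows a self-signed test certificate whose key is ECDSA P-224 and whose validity window is derived from a caller-supplied instant rather than the wall clock.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// demoNow is an arbitrary fixed instant standing in for a fake clock's Now().
var demoNow = time.Date(2023, time.November, 1, 12, 0, 0, 0, time.UTC)

// throwAwayCert is a hypothetical helper, not Boulder's real function. It
// self-signs a certificate for a single DNS name with a fresh ECDSA P-224
// key, and derives notBefore and notAfter from the supplied instant.
func throwAwayCert(name string, now time.Time) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P224(), rand.Reader)
	if err != nil {
		return nil, err
	}
	// Random positive serial; the 64-bit size is an arbitrary choice.
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 64))
	if err != nil {
		return nil, err
	}
	serial.Add(serial, big.NewInt(1))
	template := &x509.Certificate{
		SerialNumber: serial,
		Subject:      pkix.Name{CommonName: name},
		DNSNames:     []string{name},
		NotBefore:    now.Add(-time.Hour),          // assumed backdating
		NotAfter:     now.Add(90 * 24 * time.Hour), // assumed lifetime
	}
	// Passing the template as its own parent makes the cert self-signed.
	der, err := x509.CreateCertificate(rand.Reader, template, template, key.Public(), key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := throwAwayCert("example.com", demoNow)
	if err != nil {
		panic(err)
	}
	fmt.Println("notBefore:", cert.NotBefore)
	fmt.Println("notAfter: ", cert.NotAfter)
}
```

Because the validity window depends only on the argument, a test can construct a certificate that is already expired, or about to expire, by passing a different instant.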
Phil Porada b8b105453a
Rename protobuf duration fields to <fieldname>NS and populate new duration fields (#7115)
* Renames all int64 fields used as a time.Duration to `<fieldname>NS` to
indicate they are nanoseconds.
* Adds new `google.protobuf.Duration` fields to each .proto file where
we previously had been using an int64 field to populate a time.Duration.
* Updates relevant gRPC messages.

Part 1 of 3 for https://github.com/letsencrypt/boulder/issues/7097
2023-10-26 10:46:03 -04:00
Phil Porada a5c2772004
Add and populate new protobuf Timestamp fields (#7070)
* Adds new `google.protobuf.Timestamp` fields to each .proto file where
we had been using `int64` fields as a timestamp.
* Updates relevant gRPC messages to populate the new
`google.protobuf.Timestamp` fields in addition to the old `int64`
timestamp fields.
* Adds tests for each of the `<x>ToPB` and `PBto<x>` functions to ensure
that new fields passed into a gRPC message arrive as intended.
* Removes an unused error return from `PBToCert` and `PBToCertStatus`
and cleans up each call site.

Built on top of https://github.com/letsencrypt/boulder/pull/7069
Part 2 of 4 related to
https://github.com/letsencrypt/boulder/issues/7060
2023-10-11 12:12:12 -04:00
Aaron Gable bab048d221
SA: Add and use revokedCertificates table (#7095)
Add a new "revokedCertificates" table to the database schema. This table
is similar to the existing "certificateStatus" table in many ways, but
the idea is that it will only have rows added to it when certificates
are revoked, not when they're issued. Thus, it will grow many orders of
magnitude slower than the certificateStatus table does. Eventually, it
will replace that table entirely.

The one column that revokedCertificates adds is the new "ShardIdx"
column, which is the CRL shard in which the revoked certificate will
appear. This way we can assign certificates to CRL shards at the time
they are revoked, and guarantee that they will never move to a different
shard even if we change the number of shards we produce. This will
eventually allow us to put CRL URLs directly into our certificates,
replacing OCSP URLs.

Add new logic to the SA's RevokeCertificate and UpdateRevokedCertificate
methods to handle this new table. If these methods receive a request
which specifies a CRL shard (our CRL shards are 1-indexed, so shard 0
does not exist), then they will ensure that the new revocation status is
written into both the certificateStatus and revokedCertificates tables.
This logic will not function until the RA is updated to take advantage
of it, so it is not a risk for it to appear in Boulder before the new
table has been created.

Also add new logic to the SA's GetRevokedCertificates method. Similar to
the above, this reads from the new table if the ShardIdx field is
supplied in the request message. This code will not operate until the
crl-updater is updated to include this field. We will not perform this
update for a minimum of 100 days after this code is deployed, to ensure
that all unexpired revoked certificates are present in the
revokedCertificates table.

Part of https://github.com/letsencrypt/boulder/issues/7094
2023-10-02 10:21:14 -07:00
Phil Porada 034316ef6a
Rename int64 timestamp related protobuf fields to <fieldname>NS (#7069)
Rename all int64 timestamp fields to `<fieldname>NS` to indicate they
are Unix nanosecond timestamps.

Part 1 of 4 related to
https://github.com/letsencrypt/boulder/issues/7060
2023-09-15 13:49:07 -04:00
Aaron Gable a70fc604a3
Use go1.21's stdlib slices package (#7074)
As of go1.21, there's a new standard library package which provides
basically the same (generic!) functions as the x/exp/slices package.
Now that we're on go1.21, let's use the more stable package.

Fixes https://github.com/letsencrypt/boulder/issues/6951
Fixes https://github.com/letsencrypt/boulder/issues/7032
2023-09-08 13:46:46 -07:00
Aaron Gable 7bed24a401
SA: Fix two bugs in UpdateCRLShard (#7052)
The NextUpdate field should not be required, as it is not necessary for
tracking and preventing duplicate work between multiple crl-updater
instances.

The ThisUpdate conditional needs explicit handling for NULL to ensure
that it updates correctly.
2023-08-31 12:06:33 -04:00
Aaron Gable 6a450a2272
Improve CRL shard leasing (#7030)
Simplify the index-picking logic in the SA's leaseOldestCrlShard method.
Specifically, more clearly separate it into "missing" and "non-missing"
cases, which require entirely different logic: picking a random missing
shard, or picking the oldest unleased shard, respectively.

Also change the UpdateCRLShard method to "unlease" shards when they're
updated. This allows the crl-updater to run as quickly as it likes,
while still ensuring that multiple instances do not step on each other's
toes.

The config change for shardWidth and lookbackPeriod instead of
certificateLifetime has been deployed in prod since IN-8445. The config
change changing the shardWidth is just so that the tests neither produce
a bazillion shards, nor have to do a bazillion SA queries for each chunk
within a shard, improving the readability of test logs.

Part of https://github.com/letsencrypt/boulder/issues/7023
2023-08-08 17:05:00 -07:00
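
The "missing" versus "non-missing" split described in #7030 can be sketched in memory as follows. This is an illustration under stated assumptions, not the SA's leaseOldestCrlShard implementation: the `shard` type, `pickShard` function, and demo data are hypothetical, and a slice stands in for the crlShards table.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// demoNow is an arbitrary fixed instant used by the demo data.
var demoNow = time.Date(2023, time.August, 8, 12, 0, 0, 0, time.UTC)

// shard is a hypothetical in-memory stand-in for one crlShards row.
type shard struct {
	idx         int
	thisUpdate  time.Time
	leasedUntil time.Time
}

// demoShards returns three known shards; shard 2 is the oldest but leased.
func demoShards() []shard {
	return []shard{
		{idx: 1, thisUpdate: demoNow.Add(-3 * time.Hour)},
		{idx: 2, thisUpdate: demoNow.Add(-5 * time.Hour), leasedUntil: demoNow.Add(time.Hour)},
		{idx: 3, thisUpdate: demoNow.Add(-4 * time.Hour)},
	}
}

// pickShard chooses a shard index in [minIdx, maxIdx] to work on next.
func pickShard(known []shard, minIdx, maxIdx int, now time.Time) (int, error) {
	present := make(map[int]bool, len(known))
	for _, s := range known {
		present[s.idx] = true
	}
	var missing []int
	for i := minIdx; i <= maxIdx; i++ {
		if !present[i] {
			missing = append(missing, i)
		}
	}
	// "Missing" case: some indices have no row yet, so pick one at random.
	if len(missing) > 0 {
		return missing[rand.Intn(len(missing))], nil
	}
	// "Non-missing" case: pick the unleased shard with the oldest thisUpdate.
	var oldest *shard
	for i := range known {
		s := &known[i]
		if s.idx < minIdx || s.idx > maxIdx || s.leasedUntil.After(now) {
			continue
		}
		if oldest == nil || s.thisUpdate.Before(oldest.thisUpdate) {
			oldest = s
		}
	}
	if oldest == nil {
		return 0, errors.New("all shards in range are currently leased")
	}
	return oldest.idx, nil
}

func main() {
	idx, err := pickShard(demoShards(), 1, 3, demoNow)
	fmt.Println(idx, err) // shard 3: oldest among the unleased shards
	idx, err = pickShard(demoShards(), 1, 4, demoNow)
	fmt.Println(idx, err) // shard 4: the only missing index
}
```

Picking randomly among missing shards is what lets several fresh updater instances spread out instead of all racing for the same first index.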
Jacob Hoffman-Andrews 38fc840184
sa: refactor how metrics and logging are set up (#7031)
This eliminates the need for a pair of accessors on `db.WrappedMap` that
expose the underlying `*sql.DB` and `*borp.DbMap`.

Fixes #6991
2023-08-08 09:51:23 -07:00
Aaron Gable 9a4f0ca678
Deprecate LeaseCRLShards feature (#7009)
This feature flag is enabled in both staging and prod.
2023-08-07 15:17:00 -07:00
Jacob Hoffman-Andrews 725f190c01
ca: remove orphan queue code (#7025)
The `orphanQueueDir` config field is no longer used anywhere.

Fixes #6551
2023-08-02 16:04:28 -07:00
Samantha 055f620c4b
Initial implementation of key-value rate limits (#6947)
This design seeks to reduce read-pressure on our DB by moving rate limit
tabulation to a key-value datastore. This PR provides the following:

- (README.md) a short guide to the schemas, formats, and concepts
introduced in this PR
- (source.go) an interface for storing, retrieving, and resetting a
subscriber bucket
- (name.go) an enumeration of all defined rate limits
- (limit.go) a schema for defining default limits and per-subscriber
overrides
- (limiter.go) a high-level API for interacting with key-value rate
limits
- (gcra.go) an implementation of the Generic Cell Rate Algorithm, a
leaky bucket-style scheduling algorithm, used to calculate the present
or future capacity of a subscriber bucket using spend and refund
operations

Note: the included source implementation is test-only and currently
accomplished using a simple in-memory map protected by a mutex,
implementations using Redis and potentially other data stores will
follow.

Part of #5545
2023-07-21 12:57:18 -04:00
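
As a rough illustration of the Generic Cell Rate Algorithm mentioned in #6947, here is a minimal sketch. It is a textbook-style GCRA, not the code in gcra.go: the `limit` and `bucket` types, the `spend` and `refund` functions, and the demo values are all assumptions. The bucket stores only a "theoretical arrival time" (TAT), and each spend pushes it forward by one emission interval per unit of cost.

```go
package main

import (
	"fmt"
	"time"
)

// demoNow is an arbitrary fixed instant so the example is deterministic.
var demoNow = time.Date(2023, time.July, 21, 12, 0, 0, 0, time.UTC)

// limit (hypothetical) allows count requests per period, up to burst at once.
type limit struct {
	burst  int64
	count  int64
	period time.Duration
}

// demoLimit permits one request per second with a burst of three.
var demoLimit = limit{burst: 3, count: 1, period: time.Second}

// emissionInterval is the time "charged" for one unit of cost.
func (l limit) emissionInterval() time.Duration {
	return l.period / time.Duration(l.count)
}

// bucket holds the theoretical arrival time, the only state GCRA needs.
type bucket struct {
	tat time.Time
}

// spend tries to consume cost units at time now. It reports whether the
// request is allowed and, if not, how long the caller should wait.
func spend(l limit, b *bucket, now time.Time, cost int64) (bool, time.Duration) {
	tat := b.tat
	if tat.Before(now) {
		tat = now // an idle bucket never accumulates more than the burst
	}
	newTAT := tat.Add(l.emissionInterval() * time.Duration(cost))
	allowAt := newTAT.Add(-l.emissionInterval() * time.Duration(l.burst))
	if now.Before(allowAt) {
		return false, allowAt.Sub(now)
	}
	b.tat = newTAT
	return true, 0
}

// refund returns cost units to the bucket by pulling the TAT back,
// never further back than now.
func refund(l limit, b *bucket, now time.Time, cost int64) {
	tat := b.tat.Add(-l.emissionInterval() * time.Duration(cost))
	if tat.Before(now) {
		tat = now
	}
	b.tat = tat
}

func main() {
	b := &bucket{}
	for i := 1; i <= 4; i++ {
		ok, retryIn := spend(demoLimit, b, demoNow, 1)
		fmt.Printf("request %d: allowed=%v retryIn=%v\n", i, ok, retryIn)
	}
	refund(demoLimit, b, demoNow, 1)
	ok, _ := spend(demoLimit, b, demoNow, 1)
	fmt.Println("after refund: allowed =", ok)
}
```

Storing a single timestamp per subscriber, rather than a counter plus a window, is what makes this style of limiter a natural fit for a key-value store.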
Aaron Gable 908421bb98
crl-updater: lease CRL shards to prevent races (#6941)
Add a new feature flag, LeaseCRLShards, which controls certain aspects
of crl-updater's behavior.

When this flag is enabled, crl-updater calls the new SA.LeaseCRLShard
method before beginning work on a shard. This prevents it from stepping
on the toes of another crl-updater instance which may be working on the
same shard. This is important to prevent two competing instances from
accidentally updating a CRL's Number (which is an integer representation
of its thisUpdate timestamp) *backwards*, which would be a compliance
violation.

When this flag is enabled, crl-updater also calls the new
SA.UpdateCRLShard method after finishing work on a shard.

In the future, additional work will be done to make crl-updater use the
"give me the oldest available shard" mode of the LeaseCRLShard method.

Fixes https://github.com/letsencrypt/boulder/issues/6897
2023-07-19 15:11:16 -07:00
Jacob Hoffman-Andrews 7d66d67054
It's borpin' time! (#6982)
This change replaces [gorp] with [borp].

The changes consist of a mass renaming of the import and comments / doc
fixups, plus modifications of many call sites to provide a
context.Context everywhere, since borp newly requires this (this was one
of the motivating factors for the borp fork).

This also refactors `github.com/letsencrypt/boulder/db.WrappedMap` and
`github.com/letsencrypt/boulder/db.Transaction` to not embed their
underlying gorp/borp objects, but to have them as plain fields. This
ensures that we can only call methods on them that are specifically
implemented in `github.com/letsencrypt/boulder/db`, so we don't miss
wrapping any. This required introducing a `NewWrappedMap` method along
with accessors `SQLDb()` and `BorpDB()` to get at the internal fields
during metrics and logging setup.

Fixes #6944
2023-07-17 14:38:29 -07:00
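
The embed-versus-field distinction in #6982 is a general Go property and can be shown with a small sketch. All types here are hypothetical stand-ins: `rawMap` plays the role of the underlying borp object, and the two wrappers differ only in whether they embed it or hold it in a named field.

```go
package main

import "fmt"

// rawMap is a hypothetical stand-in for the underlying database map.
type rawMap struct{}

func (rawMap) Exec(query string) string { return "exec: " + query }
func (rawMap) Insert(row string) string { return "insert: " + row }

// embeddingMap embeds rawMap, so every rawMap method is promoted onto it,
// including ones the wrapper never instrumented.
type embeddingMap struct {
	rawMap
	execCalls int
}

// Exec shadows the promoted method and counts calls.
func (m *embeddingMap) Exec(query string) string {
	m.execCalls++
	return m.rawMap.Exec(query)
}

// fieldMap holds rawMap in a plain field, so only methods that are
// explicitly written on fieldMap are callable.
type fieldMap struct {
	dbMap     rawMap
	execCalls int
}

// Exec is the only database method fieldMap exposes.
func (m *fieldMap) Exec(query string) string {
	m.execCalls++
	return m.dbMap.Exec(query)
}

// hasInsert reports whether v exposes an Insert method.
func hasInsert(v any) bool {
	_, ok := v.(interface{ Insert(string) string })
	return ok
}

func main() {
	fmt.Println("embedding exposes Insert:", hasInsert(&embeddingMap{}))
	fmt.Println("field exposes Insert:    ", hasInsert(&fieldMap{}))
}
```

With the field-based wrapper, calling an unwrapped method is a compile error rather than a silent bypass of metrics and logging.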
Aaron Gable bd29cc430f
Allow reading incident rows with NULL columns (#6961)
Fixes https://github.com/letsencrypt/boulder/issues/6960
2023-06-30 08:29:16 -07:00
Aaron Gable 3d80d8505e
SA: gRPC methods for leasing CRL shards (#6940)
Add two new methods, LeaseCRLShard and UpdateCRLShard, to the SA gRPC
interface. These methods work in concert both to prevent multiple
instances of crl-updater from stepping on each other's toes, and to lay
the groundwork for a less bursty version of crl-updater in the future.

Introduce a new database table, crlShards, which tracks the thisUpdate
and nextUpdate timestamps of each CRL shard for each issuer. It also has
a column "leasedUntil", which is also a timestamp. Grant the SA user
read-write access to this table.

LeaseCRLShard updates the leasedUntil column of the identified shard to
the given time. It returns an error if the identified shard's
leasedUntil timestamp is already in the future. This provides a
mechanism for crl-updater instances to "lick the cookie", so to speak,
marking CRL shards as "taken" so that multiple crl-updater instances
don't attempt to work on the same shard at the same time. Using a
timestamp has the added benefit that leases are guaranteed to expire,
ensuring that we don't accidentally fail to work on a shard forever.

LeaseCRLShard has a second mode of operation, when a range of potential
shards is given in the request, rather than a single shard. In this
mode, it returns the shard (within the given range) whose thisUpdate
timestamp is oldest. (Shards with no thisUpdate timestamp, including
because the requested range includes shard indices the database doesn't
yet know about, count as older than any shard with any thisUpdate
timestamp.) This allows crl-updater instances which don't care which
shard they're working on to do the most urgent work first.

UpdateCRLShard updates the thisUpdate and nextUpdate timestamps of the
identified shard. This closes the loop with the second mode of
LeaseCRLShard above: by updating the thisUpdate timestamp, the method
marks the shard as no longer urgently needing to be worked on.

IN-9220 tracks creating this table in staging and production
Part of #6897
2023-06-26 15:39:13 -07:00
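
The single-shard leasing mode described in #6940 can be sketched with an in-memory map in place of the crlShards table. The `shardStore` type, its methods, and the 30-minute lease length are assumptions for illustration; the real methods are gRPC calls backed by SQL.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// demoNow is an arbitrary fixed instant; leaseLength is an assumed duration.
var demoNow = time.Date(2023, time.June, 26, 12, 0, 0, 0, time.UTC)

const leaseLength = 30 * time.Minute

// shardRow is a hypothetical stand-in for one row of the crlShards table.
type shardRow struct {
	thisUpdate  time.Time
	nextUpdate  time.Time
	leasedUntil time.Time
}

// shardStore guards the rows with a mutex, standing in for the database.
type shardStore struct {
	mu   sync.Mutex
	rows map[int]*shardRow
}

func newShardStore() *shardStore {
	return &shardStore{rows: make(map[int]*shardRow)}
}

// lease marks shard idx as taken until the given time. It fails if an
// existing lease on that shard has not yet expired.
func (s *shardStore) lease(idx int, now, until time.Time) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	row, ok := s.rows[idx]
	if !ok {
		row = &shardRow{}
		s.rows[idx] = row
	}
	if row.leasedUntil.After(now) {
		return errors.New("shard is already leased")
	}
	row.leasedUntil = until
	return nil
}

// update records the timestamps of a freshly generated CRL shard.
func (s *shardStore) update(idx int, thisUpdate, nextUpdate time.Time) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	row, ok := s.rows[idx]
	if !ok {
		return errors.New("unknown shard")
	}
	row.thisUpdate, row.nextUpdate = thisUpdate, nextUpdate
	return nil
}

func main() {
	s := newShardStore()
	fmt.Println(s.lease(1, demoNow, demoNow.Add(leaseLength))) // first lease succeeds
	fmt.Println(s.lease(1, demoNow, demoNow.Add(leaseLength))) // second is rejected
	later := demoNow.Add(2 * leaseLength)
	fmt.Println(s.lease(1, later, later.Add(leaseLength))) // the lease has expired
}
```

Because the lease is a timestamp rather than a boolean flag, a crashed updater cannot hold a shard forever: its claim lapses once leasedUntil passes.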
Jacob Hoffman-Andrews 824417f6c0
sa: refactor db initialization (#6930)
Previously, we had three chained calls initializing a database:

 - InitWrappedDb calls NewDbMap
 - NewDbMap calls NewDbMapFromConfig

Since all three are exported, this left me wondering when to call one vs
the others.

It turns out that NewDbMap is only called from tests, so I renamed it to
DBMapForTest to make that clear.

NewDbMapFromConfig is only called internally to the SA, so I
unexported it as newDbMapFromMysqlConfig.

Also, I copied the ParseDSN call into InitWrappedDb, so it doesn't need
to call DBMapForTest. Now InitWrappedDb and DBMapForTest both
independently call newDbMapFromMysqlConfig.

I also noticed that InitDBMetrics was only called internally so I
unexported it.
2023-06-13 10:15:40 -07:00
Samantha 124c4cc6f5
grpc/sa: Implement deep health checks (#6928)
Add the necessary scaffolding for deep health checking of our various
gRPC components. Each component implementation that also implements the
grpc.checker interface will be checked periodically, and the health
status of the component will be updated accordingly.

Add the necessary methods to SA to implement the grpc.checker interface
and register these new health checks with Consul.

Additionally:
- Update entry point script to check for ProxySQL readiness.
- Increase the poll rate for gRPC Consul checks from 5s to 2s to help
with DNS failures, due to check failures, on startup.
- Change log level for Consul from INFO to ERROR to deal with noisy logs
full of transport failures due to Consul gRPC checks firing before the
SAs are up.

Fixes #6878
Part of #6795
2023-06-12 13:58:53 -04:00
Jacob Hoffman-Andrews 80e1510819
admin: add clear-email subcommand (#6919)
When a user wants their email address deleted from the database but no
longer has access to their account, this allows an administrator to
clear it.

This adds `admin` as an alias for `admin-revoker`, because we'd like the
clear-email sub-command to be a part of that overall tool, but it's not
really revocation related.

Part of #6864
2023-06-01 14:33:24 -04:00
Samantha e72a8f9cac
docker: Update proxysql container to match production (#6914)
2023-05-31 11:31:10 -04:00
Jacob Hoffman-Andrews b9eeb6ce1c
sa/database: move unmoored comment (#6922)
This comment about STRICT_ALL_TABLES got separated from the code it
documented. Bring them back together.
2023-05-30 09:15:06 -07:00
Phil Porada c75bf7033a
SA: Don't store HTTP-01 hostname and port in database validationrecord (#6863)
Removes the `Hostname` and `Port` fields from an http-01
ValidationRecord model prior to storing the record in the database.
Using `"hostname":"example.com","port":"80"` as a snippet of a whole
validation record, we'll save minimum 36 bytes for each new http-01
ValidationRecord that gets stored. When retrieving the record, the
ValidationRecord `RehydrateHostPort` method will repopulate the
`Hostname` and `Port` fields from the `URL` field.

Fixes the main goal of
https://github.com/letsencrypt/boulder/issues/5231.

---------

Co-authored-by: Samantha <hello@entropy.cat>
2023-05-23 15:36:17 -04:00
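
To make the rehydration step in #6863 concrete, here is a minimal sketch of how a hostname and port can be recovered from a stored URL. It is not Boulder's `RehydrateHostPort` method: the `validationRecord` type, the free function, and the default-port rule are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validationRecord is a hypothetical, trimmed-down record with only the
// fields relevant to this example.
type validationRecord struct {
	URL      string
	Hostname string
	Port     string
}

// rehydrateHostPort fills in Hostname and Port from URL when they are empty.
// If the URL carries no explicit port, the scheme's default is assumed.
func rehydrateHostPort(vr *validationRecord) error {
	if vr.URL == "" {
		return errors.New("cannot rehydrate a record with an empty URL")
	}
	u, err := url.Parse(vr.URL)
	if err != nil {
		return err
	}
	if vr.Hostname == "" {
		if u.Hostname() == "" {
			return errors.New("URL has no hostname")
		}
		vr.Hostname = u.Hostname()
	}
	if vr.Port == "" {
		switch {
		case u.Port() != "":
			vr.Port = u.Port()
		case u.Scheme == "http":
			vr.Port = "80"
		case u.Scheme == "https":
			vr.Port = "443"
		default:
			return fmt.Errorf("no default port for scheme %q", u.Scheme)
		}
	}
	return nil
}

func main() {
	vr := &validationRecord{URL: "http://example.com/.well-known/acme-challenge/token"}
	if err := rehydrateHostPort(vr); err != nil {
		panic(err)
	}
	fmt.Println(vr.Hostname, vr.Port)
}
```

Since both fields are derivable from the URL, dropping them before storage loses no information.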
Aaron Gable 56f8537e68
Ensure SelectOne queries never return more than 1 row (#6900)
As a follow-up to https://github.com/letsencrypt/boulder/issues/5467, I
did an audit of all places where we call SelectOne to ensure that those
queries can never return more than one result. These four functions were
the only places that weren't already constrained to a single result
through the use of "SELECT COUNT", "LIMIT 1", "WHERE uniqueKey =", or
similar. Limit these functions' queries to always only return a single
result, now that their underlying tables no longer have unique key
constraints.

Additionally, slightly refactor selectRegistration to just take a single
column name rather than a whole WHERE clause.

Fixes https://github.com/letsencrypt/boulder/issues/6521
2023-05-17 14:13:21 -07:00
Matthew McPherrin 8c9c55609b
Remove redundant jose import alias (#6887)
This PR should have no functional change; just a cleanup.
2023-05-15 09:45:58 -07:00
Aaron Gable 1fcd951622
Probs: simplifications and cleanup (#6876)
Make minor, non-user-visible changes to how we structure the probs
package. Notably:
- Add new problem types for UnsupportedContact and
UnsupportedIdentifier, which are specified by RFC8555 and which we will
use in the future, but haven't been using historically.
- Sort the problem types and constructor functions to match the
(alphabetical) order given in RFC8555.
- Rename some of the constructor functions to better match their
underlying problem types (e.g. "TLSError" to just "TLS").
- Replace the redundant ProblemDetailsToStatusCode function with simply
always returning a 500 if we haven't properly set the problem's
HTTPStatus.
- Remove the ability to use either the V1 or V2 error namespace prefix;
always use the proper RFC namespace prefix.
2023-05-12 12:10:13 -04:00
Jacob Hoffman-Andrews 1c7e0fd1d8
Store linting certificate instead of precertificate (#6807)
In order to get rid of the orphan queue, we want to make sure that
before we sign a precertificate, we have enough data in the database
that we can fulfill our revocation-checking obligations even if storing
that precertificate in the database fails. That means:

- We should have a row in the certificateStatus table for the serial.
- But we should not serve "good" for that serial until we are positive
the precertificate was issued (BRs 4.9.10).
- We should have a record in the live DB of the proposed certificate's
public key, so the bad-key-revoker can mark it revoked.
- We should have a record in the live DB of the proposed certificate's
names, so it can be revoked if we are required to revoke based on names.

The SA.AddPrecertificate method already achieves these goals for
precertificates by writing to the various metadata tables. This PR
repurposes the SA.AddPrecertificate method to write "proposed
precertificates" instead.

We already create a linting certificate before the precertificate, and
that linting certificate is identical to the precertificate that will be
issued except for the private key used to sign it (and the AKID). So for
instance it contains the right pubkey and SANs, and the Issuer name is
the same as the Issuer name that will be used. So we'll use the linting
certificate as the "proposed precertificate" and store it to the DB,
along with appropriate metadata.

In the new code path, rather than writing "good" for the new
certificateStatus row, we write a new, fake OCSP status string "wait".
This will cause us to return internalServerError to OCSP requests for
that serial (but we won't get such requests because the serial has not
yet been published). After we finish precertificate issuance, we update
the status to "good" with SA.SetCertificateStatusReady.

Part of #6665
2023-04-26 13:54:24 -07:00
Aaron Gable 97aa50977f
Give orderToAuthz2 an auto-increment ID column (#6835)
Replace the current orderToAuthz2 table schema with one that includes an
auto-increment ID column, so that this table can be partitioned simply
by ID, like all of our other partitioned tables.

Update the SA so that when it selects from a join over this table and
the authz2 table, it explicitly selects the columns from the authz2
table, to avoid the ambiguity introduced by having two columns named
"id" in the result set.

This work is already in-progress in prod, represented by IN-8916 and
IN-8928.

Fixes https://github.com/letsencrypt/boulder/issues/6820
2023-04-24 14:59:18 -07:00
Aaron Gable 5480f1060b
Clean up database schema (#6832)
Make a series of small changes to our test database schema, both to make
it simpler to reason about and to bring it closer in alignment to our
production database schema:
- Incorporate the IssuedNamesDropIndex, Incidents, SimplePartitioning,
and NotUnique migrations into the CombinedSchema, as they have been
fully applied in prod;
- Use CHARSET=utf8mb4 everywhere, instead of just utf8;
- Use UNSIGNED for auto-increment ID columns in the tables where prod
does; and
- Re-sort the tables in CombinedSchema which no longer have foreign key
constraints.

Part of https://github.com/letsencrypt/boulder/issues/6820
2023-04-21 10:37:05 -07:00
Phil Porada 939a14544c
SA: Check MariaDB system variables at startup (#6791)
Adds a new function to `//sa` which performs the following validations
on a MariaDB config passed in via SA `setDefault` or via DSN:
1. Correct quoting for strings and string enums to prevent future
problems such as PR #6683 from occurring.
2. Each system variable we care to use is scoped as SESSION, rather than
strictly GLOBAL.
3. Detect system variables passed in that are not in a curated list of
variables we care about.
4. Validate that values for booleans, floats, integers, and strings at
least pass a basic regex.

This change is in a bit of a weird place. The ideal place for this
change would be `go-sql-driver/mysql`, but since that driver handles the
general case of MySQL-compatible connections to the database, we're
implementing this validation in Boulder instead. We're confident about
the specific versions of MariaDB running in staging/prod and that the
database vendor won't change underneath us, which is why I decided to
take this approach. However, this change will bind us tighter to MariaDB
than MySQL due to the specific variables we're checking. An up-to-date
list of MariaDB system variables can be found
[here.](https://mariadb.com/kb/en/server-system-variables/)

Fixes https://github.com/letsencrypt/boulder/issues/6687.
2023-04-18 11:02:33 -04:00
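
A minimal sketch of the kind of startup check described in #6791 follows. It covers only the curated-list, quoting, and basic-regex validations (items 1, 3, and 4), not the SESSION scope check. The variable list, regular expressions, and function name are assumptions for illustration and are not taken from Boulder.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// quotedString matches a single-quoted string or comma-separated enum.
	quotedString = regexp.MustCompile(`^'[0-9A-Za-z_,-]+'$`)
	// number matches an unsigned integer or simple decimal.
	number = regexp.MustCompile(`^[0-9]+(\.[0-9]+)?$`)
)

// allowedVariables is a hypothetical curated list mapping each permitted
// system variable to the pattern its value must match.
var allowedVariables = map[string]*regexp.Regexp{
	"sql_mode":           quotedString,
	"max_statement_time": number,
	"long_query_time":    number,
}

// checkSystemVariable rejects variables outside the curated list and values
// that are malformed or, for strings, not quoted.
func checkSystemVariable(name, value string) error {
	pattern, ok := allowedVariables[name]
	if !ok {
		return fmt.Errorf("%q is not in the curated list of system variables", name)
	}
	if !pattern.MatchString(value) {
		return fmt.Errorf("value %q for %q is malformed or incorrectly quoted", value, name)
	}
	return nil
}

func main() {
	fmt.Println(checkSystemVariable("sql_mode", "'STRICT_ALL_TABLES'"))
	fmt.Println(checkSystemVariable("sql_mode", "STRICT_ALL_TABLES"))
	fmt.Println(checkSystemVariable("made_up_variable", "1"))
}
```

Failing at startup turns a subtle misconfiguration, such as an unquoted sql_mode, into an immediate and obvious error.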
Aaron Gable 1235cbed5e
Re-remove never-used crls table (#6817)
Relands #5303, which was accidentally reverted in #5305.

Fixes https://github.com/letsencrypt/boulder/issues/6816
2023-04-17 16:00:17 -07:00
Aaron Gable 45329c9472
Deprecate ROCSPStage7 flag (#6804)
Deprecate the ROCSPStage7 feature flag, which caused the RA and CA to
stop generating OCSP responses when issuing new certs and when revoking
certs. (That functionality is now handled just-in-time by the
ocsp-responder.) Delete the old OCSP-generating codepaths from the RA
and CA. Remove the CA's internal reference to an OCSP implementation,
because it no longer needs it.

Additionally, remove the SA's "Issuers" config field, which was never
used.

Fixes #6285
2023-04-12 17:03:06 -07:00
Aaron Gable 7e994a1216
Deprecate ROCSPStage6 feature flag (#6770)
Deprecate the ROCSPStage6 feature flag. Remove all references to the
`ocspResponse` column from the SA, both when reading from and when
writing to the `certificateStatus` table. This makes it safe to fully
remove that column from the database.

IN-8731 enabled this flag in all environments, so it is safe to
deprecate.

Part of #6285
2023-04-04 15:41:51 -07:00
Aaron Gable 8c67769be4
Remove ocsp-updater from Boulder (#6769)
Delete the ocsp-updater service, and the //ocsp/updater library that
supports it. Remove test configs for the service, and remove references
to the service from other test files.

This service has been fully shut down for an extended period now, and is
safe to remove.

Fixes #6499
2023-03-31 14:39:04 -07:00
Aaron Gable 9262ca6e3f
Add grpc implementation tests to all services (#6782)
As a follow-up to #6780, add the same style of implementation test to
all of our other gRPC services. This was not included in that PR just to
keep it small and single-purpose.
2023-03-31 09:52:26 -07:00
Aaron Gable 27f0860aed
Remove precertificates.go (#6783)
This file contained both read-only and read-write methods. Its existence
is not reflected in any other gRPC or struct organization; it was easy
to forget that it exists. Merge its contents into both sa.go and
saro.go, so that the methods follow the same organization scheme as the
rest of the SA.

This makes it less likely that bugs like #6778 will happen again.
2023-03-30 17:59:11 -04:00
Aaron Gable 0d0116dd3f
Implement GetSerialMetadata on StorageAuthorityRO (#6780)
When external clients make POST requests to our ARI endpoint, they're
getting 404s even when a GET request with the same exact CertID
succeeds. Logs show that this is because the SA is returning "method
GetSerialMetadata not implemented" when the WFE attempts that gRPC
request. This is due to an oversight: the GetSerialMetadata method is
not implemented on the SQLStorageAuthorityRO object, only on the
SQLStorageAuthority object. The unit tests did not catch this bug
because they supply a mock SA, which does implement the method in
question.

Update the receiver and add a wrapper so that GetSerialMetadata is
implemented on both the read-write and read-only SA implementation
types. Add a new kind of test assertion which helps ensure this won't
happen again. Add a TODO for an integration test covering the ARI POST
codepath to prevent a regression.

Fixes #6778
2023-03-30 12:32:14 -07:00
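
The failure mode in #6780 comes from a common pattern: a type that embeds a default "unimplemented" struct satisfies the interface at compile time even when a method was never overridden. The sketch below reproduces that pattern with hypothetical types and uses a behavioural check. It does not reproduce the test assertion the PR added, whose details are not given in the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnimplemented mimics the error a generated default method returns.
var errUnimplemented = errors.New("method GetSerialMetadata not implemented")

// server is a hypothetical one-method service interface.
type server interface {
	GetSerialMetadata(serial string) (string, error)
}

// unimplementedServer mimics generated code that provides failing defaults.
type unimplementedServer struct{}

func (unimplementedServer) GetSerialMetadata(string) (string, error) {
	return "", errUnimplemented
}

// buggyRO embeds the default and forgets to override the method. It still
// compiles as a server, which is why the bug reached production.
type buggyRO struct{ unimplementedServer }

// fixedRO implements the method on the read-only type.
type fixedRO struct{ unimplementedServer }

func (fixedRO) GetSerialMetadata(serial string) (string, error) {
	return "metadata for " + serial, nil
}

// fixedRW is the read-write type; a thin wrapper delegates to the read-only
// implementation so both types serve the method.
type fixedRW struct {
	unimplementedServer
	ro fixedRO
}

func (rw fixedRW) GetSerialMetadata(serial string) (string, error) {
	return rw.ro.GetSerialMetadata(serial)
}

// implemented reports whether s actually serves the method rather than
// falling through to the embedded default.
func implemented(s server) bool {
	_, err := s.GetSerialMetadata("00")
	return !errors.Is(err, errUnimplemented)
}

func main() {
	fmt.Println("buggyRO:", implemented(buggyRO{}))
	fmt.Println("fixedRO:", implemented(fixedRO{}))
	fmt.Println("fixedRW:", implemented(fixedRW{}))
}
```

A mock that implements every method hides this class of bug, which is why a check against the real implementation types is needed.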
Phil Porada ce2ee69c5f
SARO: Add sa_lag_factor metric to assess usage of the lagFactor codepath (#6774)
Add `sa_lag_retry` prometheus countervec metric with pass/fail
dimensions for `GetOrder`, `GetAuthorization2`, and `GetRegistration`
methods.

The new metrics will appear as follows:
```
sa_lag_retry{method="GetOrder",result="found"} 0
sa_lag_retry{method="GetOrder",result="notfound"} 0
sa_lag_retry{method="GetOrder",result="other"} 0
sa_lag_retry{method="GetAuthorization2",result="found"} 0
sa_lag_retry{method="GetAuthorization2",result="notfound"} 0
sa_lag_retry{method="GetAuthorization2",result="other"} 0
sa_lag_retry{method="GetRegistration",result="found"} 0
sa_lag_retry{method="GetRegistration",result="notfound"} 0
sa_lag_retry{method="GetRegistration",result="other"} 0
```

Fixes https://github.com/letsencrypt/boulder/issues/6773

---------

Co-authored-by: Samantha <hello@entropy.cat>
2023-03-30 13:48:16 -04:00
Samantha 511f5b79f1
test: Add ProxySQL to our Docker development stack (#6754)
Add an upstream ProxySQL container to our docker-compose. Configure
ProxySQL to manage database connections for our unit and integration
tests.

Fixes #5873
2023-03-29 18:41:24 -04:00
Jacob Hoffman-Andrews 85fd3ed8b7
sa: remove GetPrecertificate (#6692)
This was mostly unused. The only caller was orphan-finder, which used it
to determine if a certificate was already in the database. But this is
not particularly important functionality, so I've removed it.
2023-03-01 11:30:51 -08:00
Jacob Hoffman-Andrews d9872dbe41
sa: rename AddPrecertificateRequest.IssuerID (#6689)
sa: rename AddPrecertificateRequest.IssuerID
to IssuerNameID. This is in preparation for adding a similarly-named
field to AddSerialRequest.

Part of #5152.
2023-02-27 17:21:00 -05:00
Aaron Gable 5ce4b5a6d4
Use time format constants (#6694)
Use constants from the go stdlib time package, such as time.DateTime and
time.RFC3339, when parsing and formatting timestamps. Additionally,
simplify or remove some of our uses of parsing timestamps, such as to
set fake clocks in tests.
2023-02-24 11:22:23 -08:00
Jacob Hoffman-Andrews 8fd5861c1f
sa: quote sql_mode (#6683)
When sql_mode is set as part of a multi-variable SET command (which
happens in go-sql-driver/mysql 1.6.0+), ProxySQL can mis-parse parts of
the SET command that come after it. For instance, if we run:

SET sql_mode=STRICT_ALL_TABLES,log_queries_not_using_indexes=ON;

Then ProxySQL would mis-parse that and pass along to its upstream:

SET sql_mode=STRICT_ALL_TABLES,log_queries_not_using_indexes;

Adding quotes around sql_mode (a string-valued variable) causes
ProxySQL to parse this correctly.
2023-02-22 16:30:04 -05:00
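
The quoting fix in #6683 can be illustrated with a small sketch that builds a quoted system-variable value and its URL-escaped form for use as a connection-string parameter. The helper names are hypothetical, and this is not the SA's code. It only shows that `sql_mode='STRICT_ALL_TABLES'` ends up encoded as `%27STRICT_ALL_TABLES%27`.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// quoteSysVar wraps a string-valued system variable in single quotes,
// doubling any embedded single quote.
func quoteSysVar(value string) string {
	return "'" + strings.ReplaceAll(value, "'", "''") + "'"
}

// dsnParam renders name=value as a URL-escaped connection-string parameter.
func dsnParam(name, value string) string {
	return name + "=" + url.QueryEscape(quoteSysVar(value))
}

func main() {
	fmt.Println(quoteSysVar("STRICT_ALL_TABLES"))
	fmt.Println(dsnParam("sql_mode", "STRICT_ALL_TABLES"))
}
```

With the value quoted, a parser that splits a multi-variable SET statement on commas has an unambiguous boundary for the string.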
Aaron Gable f9e4fb6c06
Add replication lag retries to some SA methods (#6649)
Add a new time.Duration field, LagFactor, to both the SA's config struct
and the read-only SA's implementation struct. In the GetRegistration,
GetOrder, and GetAuthorization2 methods, if the database select returned
a NoRows error and a lagFactor duration is configured, then sleep for
the lagFactor duration and retry the select.

This allows us to compensate for the replication lag between our primary
write database and our read-only replica databases. Sometimes clients
will fire requests in rapid succession (such as creating a new order,
then immediately querying the authorizations associated with that
order), and the subsequent requests will fail because they are directed
to read replicas which are lagging behind the primary. Adding this
simple sleep-and-retry will let us mitigate many of these failures,
without adding too much complexity.

Fixes #6593
2023-02-14 17:25:13 -08:00
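
The sleep-and-retry behaviour described in #6649 can be sketched as a small helper. The names `errNoRows` and `getWithLagRetry` are hypothetical, and a closure stands in for the database select. Only the control flow follows the commit message: retry once, and only when the first attempt found no rows and a lag factor is configured.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNoRows stands in for the database layer's "no rows" error.
var errNoRows = errors.New("no rows in result set")

// getWithLagRetry runs query once. If it finds no rows and lagFactor is
// non-zero, it sleeps for lagFactor and runs the query one more time.
func getWithLagRetry(lagFactor time.Duration, query func() (string, error)) (string, error) {
	result, err := query()
	if errors.Is(err, errNoRows) && lagFactor > 0 {
		time.Sleep(lagFactor)
		return query()
	}
	return result, err
}

func main() {
	// Simulate a replica that only sees the row on the second read.
	calls := 0
	laggyReplica := func() (string, error) {
		calls++
		if calls == 1 {
			return "", errNoRows
		}
		return "order-1", nil
	}
	got, err := getWithLagRetry(10*time.Millisecond, laggyReplica)
	fmt.Println(got, err, "after", calls, "calls")
}
```

A single bounded retry keeps worst-case latency predictable: a lookup for a row that truly does not exist costs at most one extra lagFactor.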
Jacob Hoffman-Andrews e57c788086
Add checking of validations to cert-checker (#6617)
This includes two feature flags: one that controls turning on the extra
database queries, and one that causes cert-checker to fail on missing
validations. If the second flag isn't turned on, it will just emit error
log lines. This will help us find any edge conditions we need to deal
with before making the new code trigger alerts.

Fixes #6562
2023-02-03 16:25:41 -05:00