In order to get rid of the orphan queue, we want to make sure that
before we sign a precertificate, we have enough data in the database
that we can fulfill our revocation-checking obligations even if storing
that precertificate in the database fails. That means:
- We should have a row in the certificateStatus table for the serial.
- But we should not serve "good" for that serial until we are positive
the precertificate was issued (BRs 4.9.10).
- We should have a record in the live DB of the proposed certificate's
public key, so the bad-key-revoker can mark it revoked.
- We should have a record in the live DB of the proposed certificate's
names, so it can be revoked if we are required to revoke based on names.
The SA.AddPrecertificate method already achieves these goals for
precertificates by writing to the various metadata tables. This PR
repurposes the SA.AddPrecertificate method to write "proposed
precertificates" instead.
We already create a linting certificate before the precertificate, and
that linting certificate is identical to the precertificate that will be
issued except for the private key used to sign it (and the AKID). So for
instance it contains the right pubkey and SANs, and the Issuer name is
the same as the Issuer name that will be used. So we'll use the linting
certificate as the "proposed precertificate" and store it to the DB,
along with appropriate metadata.
In the new code path, rather than writing "good" for the new
certificateStatus row, we write a new, fake OCSP status string "wait".
This will cause us to return internalServerError to OCSP requests for
that serial (but we won't get such requests because the serial has not
yet been published). After we finish precertificate issuance, we update
the status to "good" with SA.SetCertificateStatusReady.
Part of #6665
Deprecate the ROCSPStage7 feature flag, which caused the RA and CA to
stop generating OCSP responses when issuing new certs and when revoking
certs. (That functionality is now handled just-in-time by the
ocsp-responder.) Delete the old OCSP-generating codepaths from the RA
and CA. Remove the CA's internal reference to an OCSP implementation,
because it no longer needs it.
Additionally, remove the SA's "Issuers" config field, which was never
used.
Fixes#6285
Deprecate the ROCSPStage6 feature flag. Remove all references to the
`ocspResponse` column from the SA, both when reading from and when
writing to the `certificateStatus` table. This makes it safe to fully
remove that column from the database.
IN-8731 enabled this flag in all environments, so it is safe to
deprecate.
Part of #6285
Change the SetCommonName flag, introduced in #6706, to
RequireCommonName. Rather than having the flag control both whether or
not a name is hoisted from the SANs into the CN *and* whether or not the
CA is willing to issue certs with no CN, this updated flag now only
controls the latter. By default, the new flag is true, and continues our
current behavior of failing issuance if we cannot set a CN in the cert.
When the flag is set to false, then we are willing to issue certificates
for which the CSR contains no CN and there is no SAN short enough to be
hoisted into the CN field.
When we have rolled out this change, we can move on to the next flag in
this series: HoistCommonName, which will control whether or not a SAN is
hoisted at all, effectively giving the CSRs (and therefore the clients)
full control over whether their certificate contains a SAN.
This change is safe because no environment explicitly sets the
SetCommonName flag to false yet.
Fixes#5112
Add a new feature flag, `SetCommonName`, which defaults to `true`. In
this default state, no behavior changes.
When set to `false` on the CA, this flag will cause the CA to leave the
Subject commonName field of the certificate blank, as is recommended by
the Baseline Requirements Section 7.1.4.2.2(a).
Also slightly modify the behavior of the RA's `matchesCSR()` function,
to allow for both certificates that have a CN and certificates that
don't. It is not feasible to put this behavior behind the same
SetCommonName flag, because that would require an atomic deploy of both
the RA and the CA.
Obsoletes #5112
Add the "AsyncFinalize" feature flag. When enabled, this causes the RA
to return almost immediately from FinalizeOrder requests, with the
actual hard work of issuing the precertificate, getting SCTs, issuing
the final certificate, and updating the database accordingly all
occuring in a background goroutine while the client polls the GetOrder
endpoint waiting for the result.
This is implemented by factoring out the majority of the finalization
work into a new `issueCertificateOuter` helper function, and simply
using the new flag to determine whether we call that helper in a
goroutine or not. This makes removing the feature flag in the future
trivially easy.
Also add a new prometheus metric named `inflight_finalizes` which can be
used to count the number of simultaneous goroutines which are performing
finalization work. This metric is exported regardless of the state of
the AsyncFinalize flag, so that we can observe any changes to this
metric when the flag is flipped.
Fixes#6575
Remove the `MandatoryPOSTasGET` flag from the WFE2.
Update the ACMEv2 divergence doc to note that neither staging nor
production use MandatoryPOSTasGET.
Fixes#6582.
This includes two feature flags: one that controls turning on the extra
database queries, and one that causes cert-checker to fail on missing
validations. If the second flag isn't turned on, it will just emit error
log lines. This will help us find any edge conditions we need to deal
with before making the new code trigger alerts.
Fixes#6562
Right now the expiration mailer does one big SELECT on
`certificateStatus` to find certificates to work on, then several
thousand SELECTs of individual serial numbers in `certificates`.
Since it's more efficient to get that data as a stream from a single
query, rather than thousands of separate queries, turn that into a JOIN.
NOTE: We used to use a JOIN, and switched to the current approach in
#2440 for performance reasons. I _believe_ part of the issue was that at
the time we were not using READ UNCOMMITTED, so we may have been slowing
down the database by requiring it to keep copies of a lot of rows during
the query. Still, it's possible that I've misunderstood the performance
characteristics here and it will still be a regression to use JOIN. So
I've gated the new behavior behind a feature flag.
The feature flag required extracting a new function, `getCerts`. That in
turn required changing some return types so we are not as closely tied
to `core.Certificate`. Instead we use a new local type named
`certDERWithRegId`, which can be provided either by the new code path or
the old code path.
Deprecate these feature flags, which are consistently set in both prod
and staging and which we do not expect to change the value of ever
again:
- AllowReRevocation
- AllowV1Registration
- CheckFailedAuthorizationsFirst
- FasterNewOrdersRateLimit
- GetAuthzReadOnly
- GetAuthzUseIndex
- MozRevocationReasons
- RejectDuplicateCSRExtensions
- RestrictRSAKeySizes
- SHA1CSRs
Move each feature flag to the "deprecated" section of features.go.
Remove all references to these feature flags from Boulder application
code, and make the code they were guarding the only path. Deduplicate
tests which were testing both the feature-enabled and feature-disabled
code paths. Remove the flags from all config-next JSON configs (but
leave them in config ones until they're fully deleted, not just
deprecated). Finally, replace a few testdata CSRs used in CA tests,
because they had SHA1WithRSAEncryption signatures that are now rejected.
Fixes#5171Fixes#6476
Part of #5997
Rather than simply refusing to write OCSP Response bytes to the
database (which is what ROCSP Stage 6 did), Stage 7 refuses to
even generate those bytes in the first place. We obviously can't
disable OCSP Response generation in the CA, since it still needs to
be usable by the ocsp-responder's live-signing path, so instead we
disable it in all of the non-live-signing codepaths (orphan finder,
issue precertificate, revoke certificate, and re-revoke certificate)
which have previously called GenerateOCSP.
Part of #6285
Clean up several spots where we were behaving differently on
go1.18 and go1.19, now that we're using go1.19 everywhere. Also
re-enable the lint and generate tests, and fix the various places where
the two versions disagreed on how comments should be formatted.
Also clean up the OldTLS codepaths, now that both go1.19 and our
own feature flags have forbidden TLS < 1.2 everywhere.
Fixes#6011
These flags are set in both staging and prod. Deprecate them, make
all code gated behind them the only path, and delete code (multi_source)
which was only accessible when these flags were not set.
Part of #6285
Create a new `ROCSPStage6` feature flag which affects the behavior of
the SA. When enabled, this flag causes the `AddPrecertificate`,
`RevokeCertificate`, and `UpdateRevokedCertificate` methods to ignore
the OCSP response bytes provided by their caller. They will no longer
error out if those bytes are missing, and if the bytes are present they
will still not be written to the database.
This allows us to, in the future, cause the RA and CA to stop generating
those OCSP responses entirely, and stop providing them to the SA,
without causing any errors when we do.
Part of #6079
Add checkedRedisSource, a new OCSP Source which gets
responses from Redis, gets metadata from the database, and
only serves the Redis response if it matches the authoritative
metadata. If there is a mismatch, it requests a new OCSP
response from the CA, stores it in Redis, and serves the new
response.
This behavior is locked behind a new ROCSPStage3 feature flag.
Part of #6079
This enables ocsp-responder to talk to the RA and request freshly signed
OCSP responses.
ocsp/responder/redis_source is moved to ocsp/responder/redis/redis_source.go
and significantly modified. Instead of assuming a response is always available
in Redis, it wraps a live-signing source. When a response is not available,
it attempts a live signing.
If live signing succeeds, the Redis responder returns the result right away
and attempts to write a copy to Redis on a goroutine using a background
context.
To make things more efficient, I eliminate an unneeded ocsp.ParseResponse
from the storage path. And I factored out a FakeResponse helper to make
the unittests more manageable.
Commits should be reviewable one-by-one.
Fixes#6191
We recently landed a fix so the expiration-mailer won't look twice at
the same certificate. This will cause an immediate behavior change when
it is deployed, and that might have surprising effects. Put the fix
behind a feature flag so we can control when it rolls out more
carefully.
By default, Boulder's feature flag code verifies that the list of flags
being set (from a JSON file) maps to actually-existing flags.
However, this gets in the way of a deployment strategy where feature
flags are added to config templates during a staging deploy with "true"
or "false" filled in depending on production or staging status - for
instance, when rolling out a deprecation to staging ahead of production.
If those configs get rolled to prod before the corresponding Boulder
deploy, Boulder will refuse to start up, even though it would be fine to
start up with the unrecognized flag ignored.
The envisioned deployment behavior here is that prod will have
AllowUnrecognizedFeatures: true while staging will have it set to false,
to ensure that misspellings of feature flag names are caught during
staging deploy. As a correlary, this assumes that the list of flags in
configs will be the same between staging and prod, with only their
values changing.
This adds three features flags: SHA1CSRs, OldTLSOutbound, and
OldTLSInbound. Each controls the behavior of an upcoming deprecation
(except OldTLSInbound, which isn't yet scheduled for a deprecation
but will be soon). Note that these feature flags take advantage of
`features`' default values, so they can default to "true" (that is, each
of these features is enabled by default), and we set them to "false"
in the config JSON to turn them off when the time comes.
The unittest for OldTLSOutbound requires that `example.com` resolves
to 127.0.0.1. This is because there's logic in the VA that checks
that redirected-to hosts end in an IANA TLD. The unittest relies on
redirecting, and we can't use e.g. `localhost` in it because of that
TLD check, so we use example.com.
Fixes#6036 and #6037
Simplify the WFE `RevokeCertificate` API method in three ways:
- Remove most of the logic checking if the requester is authorized to
revoke the certificate in question (based on who is making the
request, what authorizations they have, and what reason they're
requesting). That checking is now done by the RA. Instead, simply
verify that the JWS is authenticated.
- Remove the hard-to-read `authorizedToRevoke` callbacks, and make the
`revokeCertBySubscriberKey` (nee `revokeCertByKeyID`) and
`revokeCertByCertKey` (nee `revokeCertByJWK`) helpers much more
straight-line in their execution logic.
- Call the RA's new `RevokeCertByApplicant` and `RevokeCertByKey` gRPC
methods, rather than the deprecated `RevokeCertificateWithReg`.
This change, without any flag flips, should be invisible to the
end-user. It will slightly change some of our log message formats.
However, by now relying on the new RA gRPC revocation methods, this
change allows us to change our revocation policies by enabling the
`AllowDoubleRevocation` and `MozRevocationReasons` feature flags, which
affect the behavior of those new helpers.
Fixes#5936
Add two new gRPC methods to the SA:
- `RevokeCertByKey` will be used when the API request was signed by the
certificate's keypair, rather than a Subscriber keypair. If the
request is for reason `keyCompromise`, it will ensure that the key is
added to the blocked keys table, and will attempt to "re-revoke" a
certificate that was already revoked for some other reason.
- `RevokeCertByApplicant` supports both the path where the original
subscriber or another account which has proven control over all of the
identifier in the certificate requests revocation via the API. It does
not allow the requested reason to be `keyCompromise`, as these
requests do not represent a demonstration of key compromise.
In addition, add a new feature flag `MozRevocationReasons` which
controls the behavior of these new methods. If the flag is not set, they
behave like they have historically (see above). If the flag is set to true,
then the new methods enforce the upcoming Mozilla policies around
revocation reasons, namely:
- Only the original Subscriber can choose the revocation reason; other
clients will get a set reason code based on the method of requesting
revocation. When the original Subscriber requests reason
`keyCompromise`, this request will be honored, but the key will not be
blocked and other certificates with that key will not also be revoked.
- Revocations signed with the certificate key will always get reason
`keyCompromise`, because we do not know who is sending the request and
therefore must assume that the use of the key in this way represents
compromise. Because these requests will always be fore reason
`keyCompromise`, they will always be added to the blocked keys table
and they will always attempt "re-revocation".
- Revocations authorized via control of all names in the cert will
always get reason `cessationOfOperation`, which is to be used when the
original Subscriber does not control all names in the certificate
anymore.
Finally, update the existing `AdministrativelyRevokeCertificate` method
to use the new helper functions shared by the two new methods.
Part of #5936
Empty the bodies of the WFE's and RA's `NewAuthorization` methods. These
were used exclusively by the ACMEv1 flow. Also remove any helper functions
which were used exclusively by this code, and any tests which were testing
exclusively this code and which have equivalent tests for the ACMEv2 flow.
Greatly simply `SA.GetAuthorizations2`, as it no longer has to contend with
there being two different kinds of authorizations in the database. Add a few
TODOs to consider removing a few other SA gRPC methods which no longer
have any callers.
Part of #5681
Add a new feature flag `GetAuthzUseIndex` which causes the SA
to add `USE INDEX (regID_identifer_status_expires_idx)` to its authz2
database queries. This should encourage the query planner to actually
use that index instead of falling back to large table-scans.
Fixes#5822
Add a feature flag which causes the SA to switch between using the
traditional read-write database connector (pointed at the primary db)
or the newer read-only database connector (usually pointed at a
replica) when executing the `GetAuthorizations2` query.
Add a new feature flag to control whether or not the experimental ARI
information is exposed. Add a new entry to the Directory object which
provides the base URL for ARI requests. Add a new handler to the WFE
which parses incoming requests and returns reasonable renewalInfo.
Part of #5674
Add a new method to the SA's gRPC interface which takes both an Order
and a list of new Authorizations to insert into the database, and adds
both (as well as the various ancillary rows) inside a transaction.
To enable this, add a new abstraction layer inside the `db/` package
that facilitates inserting many rows at once, as we do for the `authz2`,
`orderToAuthz2`, and `requestedNames` tables in this operation.
Finally, add a new codepath to the RA (and a feature flag to control it)
which uses this new SA method instead of separately calling the
`NewAuthorization` method multiple times. Enable this feature flag in
the config-next integration tests.
This should reduce the failure rate of the new-order flow by reducing
the number of database operations by coalescing multiple inserts into a
single multi-row insert. It should also reduce the incidence of new
authorizations being created in the database but then never exposed to
the subscriber because of a failure later in the new-order flow, both by
reducing failures overall and by adding those authorizations in a
transaction which will be rolled back if there is a later failure.
Fixes#5577
Make the `NonCFSSLSigner` code path the only code path through the CA.
Remove all code related to the old, CFSSL-based code path. Update tests
to supply (or mock) issuers of the new kind. Remove or simplify a few
tests that were testing for behavior only exhibited by the old code
path, such as incrementing certain metrics. Remove code from `//cmd/`
for initializing the CFSSL library. Finally, mark the `NonCFSSLSigner`
feature flag itself as deprecated.
Delete the portions of the vendored CFSSL code which were only used
by these deleted code paths. This does not remove the CFSSL library
entirely, the rest of the cleanup will follow shortly.
Part of #5115
This flag is now enabled in Let's Encrypt staging/prod.
This change deprecates the flag and prepares it for deletion in a future
change. It can then be removed once no staging/prod configs reference the
flag.
Fixes#5236
features.go contains some feature flags that no longer control any Boulder
behavior. They have been removed from Let's Encrypt staging and prod
configs and can now be removed from Boulder.
This change removes the deprecated feature flags.
Currently, the CA is configured with a set of `internalIssuer`s,
and a mapping of public key algorithms (e.g. `x509.RSA`) to which
internalIssuer to use. In operation today, we use the same issuer
for all kinds of public key algorithms. In the future, we will use
different issuers for different algorithms (in particular, we will
use R3 to issue for RSA keys, and E1 to issue for ECDSA keys). But
we want to roll that out slowly, continuing to use our RSA issuer
to issue for all types of public keys, except for ECDSA keys which
are presented by a specific set of allowed accounts.
This change adds a new config field to the CA, which lets us specify
a small list of registration IDs which are allowed to have issuance
from our ECDSA issuer. If the config list is empty, then all accounts
are allowed. The CA checks to see if the key being issued for is
ECDSA: if it is, it then checks to make sure that the associated
registration ID is in the allowlist. If the account is not allowed,
it then overrides the issuance algorithm to use RSA instead,
mimicking our old behavior. It also adds a new feature flag, which
can be enabled to skip the allowlist entirely (effectively allowing
all registered accounts). This feature flag will be enabled when
we're done with our testing and confident in our ECDSA issuance.
Fixes#5259
The PrecertificateRevocation flag is turned on everywhere, so the
else case is unused code. This change updates the WFE to always
use the PrecertificateRevocation code path, and deprecates the old
feature flag.
The TestRevokeCertificateWithAuthz method was deleted because
it is redundant with TestRevokeCertificateWithAuthorizations.
Fixes#5240
Adds a replacement issuance library that replaces CFSSL. Usage of the
new library is gated by a feature, meaning until we fully deploy the
new signer we need to support both the new one and CFSSL, which makes
a few things a bit complicated.
One Big follow-up change is that once CFSSL is completely gone we'll
be able to stop using CSRs as the internal representation of issuance
requests (i.e. instead of passing a CSR all the way through from the
WFE -> CA and then converting it to the new signer.IssuanceRequest,
we can just construct a signer.IssuanceRequest at the WFE (or RA) and
pass that through the backend instead, making things a lot less opaque).
Fixes#4906.
The StoreKeyHashes feature flag controls whether rows are added to the
keyHashToSerial table. This feature is now enabled everywhere, so the
flag-protected code can be turned on unconditionally and the flag
removed from configs.
Related to #4895
This commit consists of three classes of changes:
1) Changing various command main.go files to always behave as they
would have when features.BlockedKeyTable was true. Also changing
one test in the same manner.
2) Removing the BlockedKeyTable flag from configuration in config-next,
because the flag is already live.
3) Moving the BlockedKeyTable flag to the "deprecated" section of
features.go, and regenerating featureflag_strings.go.
A future change will remove the BlockedKeyTable flag (and other
similarly deprecated flags) from features.go entirely.
Fixes#4873
Currently 99.99% of RSA keys we see in certificates at Let's Encrypt are
either 2048, 3072, or 4096 bits, but we support every 8 bit increment
between 2048 and 4096. Supporting these uncommon key sizes opens us up to
having to block much larger ranges of keys when dealing with something
like the Debian weak keys incident. Instead we should just reduce the
set of key sizes we support down to what people actually use.
Fixes#4835.
Adds a daemon which monitors the new blockedKeys table and checks for any unexpired, unrevoked certificates that are associated with the added SPKI hashes and revokes them, notifying the user that issued the certificates.
Fixes#4772.
Prev. we inserted data for tracking issued names into the `issuedNames` table
during `sa.AddCertificate`. A more robust solution is to do this during
`sa.AddPrecertificate` since this is when we've truly committed to having
issued for the names.
The new SA `WriteIssuedNamesPrecert` feature flag enables writing this table
during `AddPrecertificate`. The legacy behaviour continues with the flag
enabled or disabled but is updated to tolerate duplicate INSERT errors so that
it is possible to deploy this change across multiple SA instances safely.
Along the way I also updated `SA.AddPrecertificate` to perform its two
`INSERT`s in a transaction using the `db.WithTransaction` wrapper.
Resolves https://github.com/letsencrypt/boulder/issues/4565
In the process I tweaked a few variable names in GetAuthorizations2 to
refer to just "authz" instead of "authz2" because it made things
clearer, particularly in the case of authz2IDMap, which is a map of
whether a given ID exists, not a map from authz's to IDs.
Fixes#4564
This avoids needing to send the entire certificate in OCSP generation
RPCs.
Ended up including a few cleanups that made the implementation easier.
Initially I was struggling with how to derive the issuer identification info.
We could just stick the full SPKI hash in certificateStatus, but that takes a
significant amount of space, we could configure unique issuer IDs in the CA
config, but that would require being very careful about keeping the IDs
constant, and never reusing an ID, or we could store issuers in a table in the
database and use that as a lookup table, but that requires figuring out how to
get that info into the table etc. Instead I've just gone with what I found to
be the easiest solution, deriving a stable ID from the cert hash. This means we
don't need to remember to configure anything special and the CA config stays
the same as it is now.
Fixes#4469.
Previously we used a JOIN on the orderToAuthz2 table in order to make sure
we only returned authorizations created using the ACME v2 API. Each time an
order is created a pivot row (order ID + authz ID) is added to the
orderToAuthz2 table. If a large number of orders are created that all contain
the same authorization, due to reuse, then the JOINd query would return a full
authorization row for each entry in the orderToAuthz2 table with the authorization
ID.
Instead we now filter out these authorizations by doing a second query against
the orderToAuthz2 table. Using this query still requires examining a large number
of rows, but because we don't need to construct a temporary table for the JOIN
and fill it with all the full authorization rows we should save resources.
Fixes#4500.
This change set makes the authz2 storage format the default format. It removes
most of the functionality related to the previous storage format, except for
the SA fallbacks and old gRPC methods which have been left for a follow-up
change in order to make these changes deployable without introducing
incompatibilities.
Fixes#4454.
When the `features.PrecertificateRevocation` feature flag is enabled the WFE2
will allow revoking certificates for a submitted precertificate. The legacy WFE1
behaviour remains unchanged (as before (pre)certificates issued through the V1
API will be revocable with the V2 API).
Previously the WFE2 vetted the certificate from the revocation request by
looking up a final certificate by the serial number in the requested
certificate, and then doing a byte for byte comparison between the stored and
requested certificate.
Rather than adjust this logic to handle looking up and comparing stored
precertificates against requested precertificates (requiring new RPCs and an
additional round-trip) we choose to instead check the signature on the requested
certificate or precertificate and consider it valid for revocation if the
signature validates with one of the WFE2's known issuers. We trust the integrity
of our own signatures.
An integration test that performs a revocation of a precertificate (in this case
one that never had a final certificate issued due to SCT embedded errors) with
all of the available authentication mechanisms is included.
Resolves https://github.com/letsencrypt/boulder/issues/4414
This change adds two tables and two methods in the SA, to store precertificates
and serial numbers.
In the CA, when the feature flag is turned on, we generate a serial number, store it,
sign a precertificate and OCSP, store them, and then return the precertificate. Storing
the serial as an additional step before signing the certificate adds an extra layer of
insurance against duplicate serials, and also serves as a check on database availability.
Since an error storing the serial prevents going on to sign the precertificate, this decreases
the chance of signing something while the database is down.
Right now, neither table has read operations available in the SA.
To make this work, I needed to remove the check for duplicate certificateStatus entry
when inserting a final certificate and its OCSP response. I also needed to remove
an error that can occur when expiration-mailer processes a precertificate that lacks
a final certificate. That error would otherwise have prevented further processing of
expiration warnings.
Fixes#4412
This change builds on #4417, please review that first for ease of review.
For authzv1, this actually executes a SQL DELETE for the unused challenges
when an authorization is updated upon validation.
For authzv2, this doesn't perform a delete, but changes the authorizations that
are returned so they don't include unused challenges.
In order to test the flag for both authz storage models, I set the feature flag in
both config/ and config-next/.
Fixes#4352
This rolls forward #4326 after it was reverted in #4328.
Resolves https://github.com/letsencrypt/boulder/issues/4329
The older query didn't have a `LIMIT 1` so it was returning multiple results,
but gorp's `SelectOne` was okay with multiple results when the selection was
going into an `int64`. When I changed this to a `struct` in #4326, gorp started
producing errors.
For this bug to manifest, an account needs to create an order, then fail
validation, twice in a row for a given domain name, then create an order once
more for the same domain name - that third request will fail because there are
multiple orders in the orderFqdnSets table for that domain.
Note that the bug condition doesn't happen when an account does three successful
issuances in a row, because finalizing an order (that is, issuing a certificate
for it) deletes the row in orderFqdnSets. Failing an authorization does not
delete the row in orderFqdnSets. I believe this was an intentional design
decision because an authorization can participate in many orders, and those
orders can have many other authorizations, so computing the updated state of
all those orders would be expensive (remember, order state is not persisted in
the DB but is calculated dynamically based on the authorizations it contains).
This wasn't detected in integration tests because we don't have any tests that
fail validation for the same domain multiple times. I filed an issue for an
integration test that would have incidentally caught this:
https://github.com/letsencrypt/boulder/issues/4332. There's also a more specific
test case in #4331.
This reverts commit 9fa360769e.
This commit can cause "gorp: multiple rows returned for: ..." under certain situations.
See #4329 for details of followup.
When there are a lot of potential orders to reuse, the query could scan
unnecessary rows, sometimes leading to timeouts. The new query used
when the FasterGetOrderForNames feature flag is enabled uses the
available index more effectively and adds a LIMIT clause.
Add flag to explicitly disable orders containing authz2 authorizations. After looking at a handful of much more complex solutions this feels like the best option. With NewAuthorizationSchema disabled and DisableAuthz2Orders enabled any requests for orders that include authz2 authorizations will return a 404 (where previously they would return a 500).
Fixes#4263.
In November 2019 we will be removing support for legacy pre RFC-8555
unauthenticated GET requests for accessing ACME resources. A new
`MandatoryPOSTAsGET` feature flag is added to the WFE2 to allow
enforcing this change. Once this feature flag has been activated in Nov
we can remove both it and the WFE2 code supporting GET requests.
The `RevokeAuthorizationsByDomain` SA RPC is deprecated and `RevokeAuthorizationsByDomain2`
should be used in its place. Which RPC to use is controlled by the `NewAuthorizationSchema` feature
flag. When it is true the `admin-revoker` will use the new RPC.
Resolves https://github.com/letsencrypt/boulder/issues/4178
Right now we run a `SELECT COUNT` query on issuedNames to calculate this rate limit.
Unfortunately, counting large numbers of rows is very slow. This change introduces a
dedicated table for keeping track of that rate limit.
Instead of being keyed by FQDN, this new `certificatesPerName` table is keyed by
the same field as rate limits are calculated on: the base domain. Each row has a base
domain, a time (chunked by hour), and a count. This means calculating the rate limit
status for each domain reads at most 7 * 24 rows.
This should particularly speed up cases when a single domain has lots of subdomains.
Fixes#4152
Checking the "certificates per name" rate limit is moderately expensive, particularly
for domains that have lots of certificates on their subdomains. By checking the renewal
exemption first, we can save some database queries in a lot of cases.
Part of #4152
* `EnforceMultiVA` to allow configuring multiple VAs but not changing the primary VA's result based on what the remote VAs return.
* `MultiVAFullResults` to allow collecting all of the remote VA results. When all results are collected a JSON log line with the differential between the primary/remote VAs is logged.
Resolves https://github.com/letsencrypt/boulder/issues/4066
* Remove the challenge whitelist
* Reduce the signature for ChallengesFor and ChallengeTypeEnabled
* Some unit tests in the VA were changed from testing TLS-SNI to testing the same behavior
in TLS-ALPN, when that behavior wasn't already tested. For instance timeouts during connect
are now tested.
Fixes#4109
We're only using the simplified HTTP-01 code from `va/http.go` now 🎉 The old unit tests that still seem relevant are left in place in `va/va_test.go` instead of being moved to `va/http_test.go` to signal that they're a bit crufty and could probably use a separate cleanup. For now I'm hesitant to remove test coverage so I updated them in-place without moving them to a new home.
Resolves https://github.com/letsencrypt/boulder/issues/4089
If an order for a given set of names will fail finalization because of certificate rate limits (certs per domain, certs per fqdn set) there isn't any point in allowing an order for those names to be created. We can stop a lot of requests earlier by enforcing the cert rate limits at new order time as well as finalization time. A new RA `EarlyOrderRateLimit` feature flag controls whether this is done or not.
Resolves#3975
The `renewal` field of the `issuedNames` table is indexed
to allow more efficient processing of rate limit decisions
w.r.t. renewals for the certificates per domain rate limit (see
https://github.com/letsencrypt/boulder/pull/3178). Before we can start
using this field in rate limit calculations we need to populate it
appropriately. A new `SetIssuedNamesRenewalBit` feature flag for the SA
controls whether we do so or not.
Resolves https://github.com/letsencrypt/boulder/issues/4008
Implements a feature that enables immediate revocation instead of marking a certificate revoked and waiting for the OCSP-Updater to generate the OCSP response. This means that as soon as the request returns from the WFE the revoked OCSP response should be available to the user. This feature requires that the RA be configured to use the standalone Akamai purger service.
Fixes#4031.
Adds a feature which gates creation of authorizations following the style required for the new schema (and which can be used for gating the reset of our new schema code later down the road).
There was an internal discussion about an issue this creates regarding a predictable ordering of challenges within a challenge due to sequential challenge IDs which will always be static for each challenge type. It was suggested we could add some kind of obfuscation to the challenge ID when presented to the user to prevent this. This hasn't been done in this PR as it would only be focused in the WFE and would be better suited as its own changeset.
Fixes#3981.
Staging and prod both deployed the PerformValidationRPC feature flag. All running WFE/WFE2 instances are using the more accurately named PerformValidation RPC and we can strip out the old UpdateAuthorization bits. The feature flag for PerformValidationRPC remains until we clean up the staging/prod configs.
Resolves#3947 and completes the last of #3930
Previously we mistakenly returned status 204 (no content) for all
requests to new-nonce, including HEAD. This status should only be used
for GET requests.
When the `HeadNonceStatusOK` feature flag is enabled we will now return
the correct status for HEAD requests. When the flag is disabled we return
status 204 to preserve backwards compatibility.
The existing RA `UpdateAuthorization` RPC needs replacing for
two reasons:
1. The name isn't accurate - `PerformValidation` better captures
the purpose of the RPC.
2. The `core.Challenge` argument is superfluous since Key
Authorizations are not sent in the initiation POST from the client
anymore. The corresponding unmarshal and verification is now
removed. Notably this means broken clients that were POSTing
the wrong thing and failing pre-validation will now likely fail
post-validation.
To remove `UpdateAuthorization` the new `PerformValidation`
RPC is added alongside the old one. WFE and WFE2 are
updated to use the new RPC when the perform validation
feature flag is enabled. We can remove
`UpdateAuthorization` and its associated wrappers once all
WFE instances have been updated.
Resolves https://github.com/letsencrypt/boulder/issues/3930
Continued bugs from the custom dialer approach used by the VA for HTTP-01 (most recently https://github.com/letsencrypt/boulder/issues/3889) motivated a rewrite.
Instead of using a custom dialer to be able to control DNS resolution for HTTP validation requests we can construct URLs for the IP addresses we resolve and overload the Host header. This avoids having to do address resolution within the dialer and eliminates the complexity of the dialer `addrInfoChan`. The only thing left for our custom dialer now is to shave some time off of the provided context to help us discern timeouts before/after connect.
The existing IP preference & fallback behaviour is preserved: e.g. if a host has both IPv6 and IPv4 addresses we connect to the first IPv6 address. If there is a network error connecting to that address (e.g. an error during "dial"), we try once more with the first IPv4 address. No other retries are done. Matching existing behaviour no fallback is done for HTTP level failures on an IPv6 address (e.g. mismatched webroots, redirect loops, etc). A new Prometheus counter "http01_fallbacks" is used to keep track of the number of fallbacks performed.
As a result of moving the layer at which the retry happens a fallback like described above will now produce two validation records: one for the initial IPv6 connection, and one for the IPv4 connection. Neither will have the "addressesTried" field populated, just "addressesResolved" and "addressUsed". Previously with the dialer doing the retry we would have created just one validation record with an IPv4 "addressUsed" field and both an IPv6 and IPv4 address in the "addressesTried" field.
Because this is a big diff for a key part of the VA the new code is gated by the `SimplifiedVAHTTP` feature flag.
Resolves#3889
Removes the checks for a handful of deployed feature flags in preparation for removing the flags entirely. Also moves all of the currently deprecated flags to a separate section of the flags list so they can be more easily removed once purged from production configs.
Fixes#3880.
Does a simpler probe than compared to using a `blackbox_exporter`, but directly collects the info we think will aid debugging publisher outages.
Updates #3821.
ACME draft-13 changed the inner JWS' body to contain the old key
that signed the outer JWS. Because this is a backwards incompatible
change with the draft-12 ACME v2 key rollover we introduce a new feature
flag `features.ACME13KeyRollover` and conditionally use the old or new
key rollover semantics based on its value.
Things removed:
* features.EmbedSCTs (and all the associated RA/CA/ocsp-updater code etc)
* ca.enablePrecertificateFlow (and all the associated RA/CA code)
* sa.AddSCTReceipt and sa.GetSCTReceipt RPCs
* publisher.SubmitToCT and publisher.SubmitToSingleCT RPCs
Fixes#3755.
This adds support for the account-uri CAA parameter as specified by
section 3 of https://tools.ietf.org/html/draft-ietf-acme-caa-04, allowing
issuance to be restricted to one or more ACME accounts as specified by CAA
records.
When performing CAA checking respect the validation-methods parameter (if
present) and restrict the allowed authorization methods to those specified.
This allows a domain to restrict authorization methods that can be used with
Let's Encrypt.
This is largely based on PR #3003 (by @lukaslihotzki), which was landed and
then later reverted due to issue #3143. The bug the resulted in the previous
code being reverted has been addressed (likely inadvertently) by 76973d0f.
This implementation also includes integration tests for CAA validation-methods.
Fixes issue #3143.
When the WFE calls the RA the RA creates a sub context which is cancelled when
the RPC returns. Because we were spawning the publisher RPC calls in a
goroutine with the context from ra.IssueCertificate as soon as
ra.IssueCertificate returned that context was being canceled which in turn
canceled the publisher RPC calls.
Instead of using the RA RPC context simply use a `context.Background()` so
that the RPC context doesn't break these submissions. Also return to
pre-features.CancelCTSubmissions behavior where precert submissions would
be canceled once we retrieved SCTs from the winning logs instead of relying
on the magic behavior of the RA RPC canceling them itself.