Commit Graph

442 Commits

Author SHA1 Message Date
Aaron Gable 95e5f87f9e
Add feature flag to disable pending authz reuse (#7836)
Pending authz reuse is a nice-to-have feature because it allows us to
create fewer rows in the authz database table when creating new orders.
However, stats show that less than 2% of authorizations that we attach
to new orders are reused pending authzs. And as we move towards using a
more streamlined database schema to store our orders, authorizations,
and validation attempts, disabling pending authz reuse will greatly
simplify our database schema and code.

CPS Compliance Review: our CPS does not speak to whether or not we reuse
pending authorizations for new orders.
IN-10859 tracks enabling this flag in prod

Part of https://github.com/letsencrypt/boulder/issues/7715
2024-12-05 16:14:57 -08:00
Jacob Hoffman-Andrews 5cdfa3e26c
ra: wait for validations on clean shutdown (#7854)
This reduces the number of validations that get left indefinitely in
"pending" state.

Rename `DrainFinalize()` to `Drain()` to indicate that it now covers
more cases than just finalize.
2024-12-02 11:00:17 -08:00
Samantha Frank 27a77142ad
VA: Make performRemoteValidation more generic (#7847)
- Make performRemoteValidation a more generic function that returns a
new remoteResult interface
- Modify the return value of IsCAAValid and PerformValidation to satisfy
the remoteResult interface
- Include compile time checks and tests that pass an arbitrary operation
2024-11-27 15:29:33 -05:00
Aaron Gable ded2e5e610
Remove logging of contact email addresses (#7833)
Fixes https://github.com/letsencrypt/boulder/issues/7801
2024-11-25 13:33:56 -08:00
Aaron Gable 7791262815
Rate limit deactivating pending authzs (#7835)
When a client deactivates a pending authorization, count that towards
their FailedAuthorizationsPerDomainPerAccount and
FailedAuthorizationsForPausingPerDomainPerAccount rate limits. This
should help curb the few clients which constantly create new orders and
authzs, deactivate those pending authzs preventing reuse of them or
their orders, and then rinse and repeat.

Fixes https://github.com/letsencrypt/boulder/issues/7834
2024-11-21 17:17:16 -08:00
Samantha Frank 8bf13a90f4
VA: Make PerformValidation more like DoDCV (#7828)
- Remove Perspective and RIR from ValidationRecords
- Make ValidationResultToPB Perspective and RIR aware
- Update comment for VA.PerformValidation
- Make verificationRequestEvent more like doDCVAuditLog
- Update language used in problems created by performRemoteValidation to
be more like doRemoteDCV.
2024-11-20 14:13:55 -05:00
Samantha Frank a8cdaf8989
ratelimit: Remove legacy registrations per IP implementation (#7760)
Part of #7671
2024-11-19 18:39:21 -05:00
Samantha Frank 3506f09285
RA: Make calls to countCertificateIssued and countFailedValidations synchronous (#7824)
Solves CI flakes in TestCertificatesPerDomain and
TestIdentifiersPausedForAccount that are the result of a race on the
Redis database. This has the downside of making failed validations and
successful finalizations take slightly longer.
2024-11-15 16:33:51 -05:00
Jacob Hoffman-Andrews 5e385e440a
ra: clean up countFailedValidations (#7797)
Return an error and do logging in the caller. This adds early returns on
a number of error conditions, which can prevent nil pointer dereference
in those cases.

Also update the description for AutomaticallyPauseZombieClients.

Follows up #7763.
2024-11-14 16:16:36 -05:00
Jacob Hoffman-Andrews 1d8cf3e212
ra: remove special case for empty DNSNames (#7795)
This case was added to work around a test case that didn't fill it out;
instead, fill DNSNames for that test case.
2024-11-11 11:07:20 -05:00
Kruti Sutaria a79a830f3b
ratelimits: Auto pause zombie clients (#7763)
- Added a new key-value ratelimit
`FailedAuthorizationsForPausingPerDomainPerAccount` which is incremented
each time a client fails a validation.
- As long as capacity exists in the bucket, a successful validation
attempt will reset the bucket back to full capacity.
- Upon exhausting bucket capacity, the RA will send a gRPC to the SA to
pause the `account:identifier`. Further validation attempts will be
rejected by the [WFE](https://github.com/letsencrypt/boulder/pull/7599).
- Added a new feature flag, `AutomaticallyPauseZombieClients`, which
enables automatic pausing of zombie clients in the RA.
- Added a new RA metric `paused_pairs{"paused":[bool],
"repaused":[bool], "grace":[bool]}` to monitor use of this new
functionality.
- Updated `ra_test.go` `initAuthorities` to allow accessing the
`*ratelimits.RedisSource` for checking that the new ratelimit functions
as intended.

Co-authored-by: @pgporada 

Fixes https://github.com/letsencrypt/boulder/issues/7738

---------

Co-authored-by: Phil Porada <pporada@letsencrypt.org>
Co-authored-by: Phil Porada <philporada@gmail.com>
2024-11-08 13:51:41 -08:00
James Renken 6a2819a95a
Introduce separate UpdateRegistrationContact & UpdateRegistrationKey methods in RA & SA (#7735)
Introduce separate UpdateRegistrationContact & UpdateRegistrationKey
methods in RA & SA

Clear contact field during DeactivateRegistration

Part of #7716
Part of #5554
2024-11-06 10:07:31 -08:00
Samantha Frank 6e6c8fe480
ratelimits: Update errors to deep link to individual limits documentation (#7767)
Updates rate limits error messages to deep link to new website docs added in https://github.com/letsencrypt/website/pull/1756.
2024-10-25 13:55:51 -04:00
Samantha Frank e5edb7077f
wfe/features: Deprecate UseKvLimitsForNewOrder (#7765)
Default code paths that depended on this flag to be true.

Part of #5545
2024-10-23 18:13:24 -04:00
Samantha Frank 2e19a362ec
WFE/RA: Default codepaths to CheckRenewalExemptionAtWFE: true (#7745)
Also, remove redundant renewal checks in
`RA.checkNewOrdersPerAccountLimit()` and
`RA.checkCertificatesPerNameLimit()`.

Part of #7511
2024-10-07 15:12:30 -04:00
Samantha Frank d656afce78
ratelimits: Rename DomainsForRateLimiting() to clarify use (#7746)
Rename as suggested by @jsha in #7729.
2024-10-07 14:56:36 -04:00
Aaron Gable dad9e08606
Lay the groundwork for supporting IP identifiers (#7692)
Clean up how we handle identifiers throughout the Boulder codebase by
- moving the Identifier protobuf message definition from sa.proto to
core.proto;
- adding support for IP identifier to the "identifier" package;
- renaming the "identifier" package's exported names to be clearer; and
- ensuring we use the identifier package's helper functions everywhere
we can.

This will make future work to actually respect identifier types (such as
in Authorization and Order protobuf messages) simpler and easier to
review.

Part of https://github.com/letsencrypt/boulder/issues/7311
2024-08-30 11:40:38 -07:00
Aaron Gable d58d09615a
Improve how we disable challenge types (#7677)
When creating an authorization, populate it with all challenges
appropriate for that identifier, regardless of whether those challenge
types are currently "enabled" in the config. This ensures that
authorizations created during a incident for which we can temporarily
disabled a single challenge type can still be validated via that
challenge type after the incident is over.

Also, when finalizing an order, check that the challenge type used to
validation each authorization is not currently disabled. This ensures
that, if we temporarily disable a single challenge due to an incident,
we don't issue any more certificates using authorizations which were
fulfilled using that disabled challenge.

Note that standard rolling deployment of this change is not safe if any
challenges are disabled at the same time, due to the possibility of an
updated RA not filtering a challenge when writing it to the database,
and then a non-updated RA not filtering it when reading from the
database. But if all challenges are enabled then this change is safe for
normal deploy.

Fixes https://github.com/letsencrypt/boulder/issues/5913
2024-08-29 15:38:50 -07:00
Aaron Gable da7865cb10
Add go1.23.0 to CI (#7665)
Begin testing on go1.23. To facilitate this, also update /x/net,
golangci-lint, staticcheck, and pebble-challtestsrv to versions which
support go1.23. As a result of these updates, also fix a handful of new
lint findings, mostly regarding passing non-static (i.e. potentially
user-controlled) format strings into Sprintf-style functions.

Additionally, delete one VA unittest that was duplicating the checks
performed by a different VA unittest, but with a context timeout bug
that caused it to break when go1.23 subtly changed DialContext behavior.
2024-08-23 14:56:53 -07:00
Samantha Frank c9be034c00
ratelimits: Add a feature-flag which makes key-value implementation authoritative (#7666)
- Add feature flag `UseKvLimitsForNewOrder`
- Add feature flag `UseKvLimitsForNewAccount`
- Flush all Redis shards before running integration or unit tests, this
avoids false positives between local testing runs

Fixes #7664
Blocked by #7676
2024-08-22 15:56:30 -04:00
Samantha Frank 4bf6e2f5a9
ratelimits: Skip Spends on CertificatesPerDomain for renewals (#7676)
This bug was introduced in
https://github.com/letsencrypt/boulder/pull/7669.

Also, make calls to ra.countCertificateIssued() non-blocking like
ra.countFailedValidation().

Part of #7664
Blocks #7666
2024-08-21 15:25:22 -04:00
Samantha Frank 14c0b2c3bb
ratelimits: Check at NewOrder and SpendOnly later (#7669)
- Check `CertificatesPerDomain` at newOrder and spend at Finalize time.
- Check `CertificatesPerAccountPerDomain` at newOrder and spend at
Finalize time.
- Check `CertificatesPerFQDNSet` at newOrder and spend at Finalize time.
- Fix a bug
in`FailedAuthorizationsPerDomainPerAccountSpendOnlyTransaction()` which
results in failed authorizations being spent for the exact FQDN, not the
eTLD+1.
- Remove redundant "max names" check at transaction construction time
- Enable key-value rate limits in the RA
2024-08-15 19:08:17 -04:00
Aaron Gable e1790a5a02
Remove deprecated sapb.NewAuthzRequest fields (#7651)
Remove the id, identifierValue, status, and challenges fields from
sapb.NewAuthzRequest. These fields were left behind from the previous
corepb.Authorization request type, and are now being ignored by the SA.

Since the RA is no longer constructing full challenge objects to include
in the request, remove pa.ChallengesFor and replace it with the much
simpler pa.ChallengeTypesFor.

Part of https://github.com/letsencrypt/boulder/issues/5913
2024-08-15 15:35:10 -07:00
Aaron Gable 46859a22d9
Use consistent naming for dnsName gRPC fields (#7654)
Find all gRPC fields which represent DNS Names -- sometimes called
"identifier", "hostname", "domain", "identifierValue", or other things
-- and unify their naming. This naming makes it very clear that these
values are strings which may be included in the SAN extension of a
certificate with type dnsName.

As we move towards issuing IP Address certificates, all of these fields
will need to be replaced by fields which carry both an identifier type
and value, not just a single name. This unified naming makes it very
clear which messages and methods need to be updated to support
non-dnsName identifiers.

Part of https://github.com/letsencrypt/boulder/issues/7647
2024-08-12 14:32:55 -07:00
Aaron Gable fa732df492
Remove challenge.ProvidedKeyAuthorization (#7655)
This field was deprecated in
https://github.com/letsencrypt/boulder/pull/7515, and has been fully
replaced by vapb.PerformValidationRequest.ExpectedKeyAuthorization.

Fixes https://github.com/letsencrypt/boulder/issues/7514
2024-08-12 14:08:06 -07:00
Phil Porada 686920024a
RA: Always output revocation reason field in cert revocation event (#7660) 2024-08-12 16:10:06 -04:00
Aaron Gable 22b1771d23
RA: Add GetAuthorization method to filter disabled challenges (#7652)
Add a new "GetAuthorization" method to the RA. This method is very
similar to the SA's existing "GetAuthorization2" method, except that it
also uses the RA's built-in Policy Authority to filter out any
challenges which are currently disabled.

In a follow-up change, the WFE will be updated to use this method when
retrieving authorizations and challenges for display, so that we can
ensure disabled challenges are not presented to ACME clients.

Part of https://github.com/letsencrypt/boulder/issues/5913
2024-08-12 10:24:58 -07:00
Aaron Gable 8380bb9b92
Use new sapb.Authorizations.Authzs field in RA (#7650)
This is a followup to https://github.com/letsencrypt/boulder/pull/7646,
updating two other RA methods (RevokeCertByApplicant and NewOrder) which
call different SA methods (GetValidAuthorizations2 and
GetAuthorizations2) but receive the same return type
(sapb.Authorizations) from the SA to use that type's new field.
2024-08-09 12:23:47 -07:00
Aaron Gable 28f09341b9
Simplify GetValidOrderAuthorizations2 (#7646)
Simplify SA.GetValidOrderAuthorizations2 so that it no longer conditions
the query on the status, expiry, or registration ID of the authorization
rows. This gives the query much better performance, because it no longer
tries to use an overly-large index, and fall back to large row-scans
when the query planner decides the index is too large.

While we're here, also improve the return type of
GetValidOrderAuthorizations2, so that instead of returning a map of
names to authorizations, it simply returns a list of authzs. This both
reduces the size of the gRPC message (once the old map is fully
removed), and improves its correctness because we cannot count on names
to be unique across multiple identifier types.

Finally, improve the RA code which calls SA.GetValidOrderAuthorizations2
to handle this improved return type, to make fewer assumptions about
identifier types, and to separate static authorization-checking from CAA
rechecking.

Fixes https://github.com/letsencrypt/boulder/issues/7645
2024-08-08 10:40:40 -07:00
Aaron Gable 35b0b55453
Improve how we create new authorizations (#7643)
Within the NewOrderAndAuthzsRequest, replace the corepb.Authorization
field with a new sapb.NewAuthzRequest message. This message has all of
the same field types and numbers, and the RA still populates all of
these fields when constructing a request, for backwards compatibility.
But it also has new fields (an Identifier carrying both type and value,
a list of challenge types, and a challenge token) which the RA
preferentially consumes if present.

This causes the content of our NewOrderAndAuthzsRequest to more closely
match the content that will be created at the database layer. Although
this may seem like a step backwards in terms of abstraction, it is also
a step forwards in terms of both efficiency (not having to transmit
multiple nearly-identical challenge objects) and correctness (being
guaranteed that the token is actually identical across all challenges).

After this change is deployed, it will be followed by a change which
removes the old fields from the NewAuthzRequest message, to realize the
efficiency gains.

Part of https://github.com/letsencrypt/boulder/issues/5913
2024-08-08 10:15:46 -07:00
Aaron Gable c6c7617851
Profiles: allow for omission of KU, EKU, and SKID (#7622)
Add three new keys to the CA's ProfileConfig:
- OmitKeyEncipherment causes the keyEncipherment Key Usage to be omitted
from certificates with RSA public keys. We currently include it for
backwards compatibility with TLS 1.1 servers that don't support modern
cipher suites, but this KU is completely useless as of TLS 1.3.
- OmitClientAuth causes the tlsClientAuthentication Extended Key Usage
to be omitted from all certificates. We currently include it to support
any subscribers who may be relying on it, but Root Programs are moving
towards single-purpose hierarchies and its inclusion is being
discouraged.
- OmitSKID causes the Subject Key Identifier extension to be omitted
from all certificates. We currently include this extension because it is
recommended by RFC 5280, but it serves little to no practical purpose
and consumes a large number of bytes, so it is now NOT RECOMMENDED by
the Baseline Requirements.

Make substantive changes to issuer.requestValid and issuer.Prepare to
implement the desired behavior for each of these options. Make a very
slight change to ra.matchesCSR to generally allow for serverAuth-only
EKUs. Improve the unit tests of both the //ca and //issuance packages to
cover the new behavior.

Part of https://github.com/letsencrypt/boulder/issues/7610
2024-07-31 11:08:11 -07:00
Aaron Gable e54c5bb85e
RA: pass through unpause requests to SA (#7630)
Have the RA's UnpauseAccount gRPC method forward the requested account
ID to the SA's corresponding method, and in turn forward the SA's count
of unpaused identifiers back to the caller in the response.

Changing the response message from emptypb.Empty to a new
rapb.UnpauseAccountResponse is safe, because message names are not
transmitted on the wire, only message field numbers.

While we're here, drastically simplify the wfe_test and sfe_test Mock
RAs, so they don't have to implement methods that aren't actually used
by the tests.

Fixes https://github.com/letsencrypt/boulder/issues/7536
2024-07-25 16:34:02 -04:00
Aaron Gable 98a4bc01ea
Rename 'now' to 'validUntil' in GetAuthz requests (#7631)
The name "now" was always misleading, because we never set the value to
be the actual current time, we always set it to be some time in the
future to avoid returning authzs which expire in the very near future.
Changing the name to "validUntil" matches the current naming in
GetPendingAuthorizationRequest.
2024-07-25 10:52:34 -07:00
Aaron Gable 510996b07a
RA: Pass requested new-order profile through to SA (#7608)
When receiving a NewOrder request from the WFE, pass the specified
profile name (if any) through to the SA for storage. Also, when
retrieving previous orders for potential re-use, don't reuse them unless
they have the same profile name (including the empty/default profile
name).

Fixes https://github.com/letsencrypt/boulder/issues/7607
2024-07-19 14:39:39 -07:00
Aaron Gable 6a634a4bd2
Remove CAA-rechecking dead code (#7606)
This code path was a safety net to ensure that CAA got rechecked if the
authorization was going to expire less than 30d+7h from now, i.e. if the
authorization had originally been checked more than 7h ago. The metrics
show that, as expected, this code path has not been executed in living
memory, because all situations in which it might be hit instead hit the
preceding `if staleCAA` clause.
2024-07-19 09:53:16 -07:00
Samantha Frank 55c274d132
ratelimits: Exempt renewals from NewOrdersPerAccount and CertificatesPerDomain (#7513)
- Rename `NewOrderRequest` field `LimitsExempt` to `IsARIRenewal`
- Introduce a new `NewOrderRequest` field, `IsRenewal`
- Introduce a new (temporary) feature flag, `CheckRenewalExemptionAtWFE`

WFE:
- Perform renewal detection in the WFE when `CheckRenewalExemptionAtWFE`
is set
- Skip (key-value) `NewOrdersPerAccount` and `CertificatesPerDomain`
limit checks when renewal detection indicates the the order is a
renewal.

RA:
- Leave renewal detection in the RA intact
- Skip renewal detection and (legacy) `NewOrdersPerAccount` and
`CertificatesPerDomain` limit checks when `CheckRenewalExemptionAtWFE`
is set and the `NewOrderRequest` indicates that the order is a renewal.

Fixes #7508
Part of #5545
2024-06-27 16:39:31 -04:00
Phil Porada 8c324a5e8a
RA: Add UnpauseAccountRequest protobuf message and service (#7537)
Add the `ra.UnpauseAccount` which takes an `rapb.UnpauseAccountRequest`
input parameter. The method is just a stub to allow downstream SFE
development to continue. There is relevant ongoing work in the SA which
will eventually reside in this stub method.
2024-06-20 11:21:46 -04:00
Aaron Gable 09693f03dc
Deprecate Challenge.ProvidedKeyAuthorization (#7515)
The core.Challenge.ProvidedKeyAuthorization field is problematic, both
because it is poorly named (which is admittedly easily fixable) and
because it is a field which we never expose to the client yet it is held
on a core type. Deprecate this field, and replace it with a new
vapb.PerformValidationRequest.ExpectedKeyAuthorization field.

Within the VA, this also simplifies the primary logic methods to just
take the expected key authorization, rather than taking a whole (largely
unnecessary) challenge object. This has large but wholly mechanical
knock-on effects on the unit tests.

While we're here, improve the documentation on core.Challenge itself,
and remove Challenge.URI, which was deprecated long ago and is wholly
unused.

Part of https://github.com/letsencrypt/boulder/issues/7514
2024-06-04 14:48:36 -07:00
Aaron Gable b92581d620
Better compile-time type checking for gRPC server implementations (#7504)
Replaced our embeds of foopb.UnimplementedFooServer with
foopb.UnsafeFooServer. Per the grpc-go docs this reduces the "forwards
compatibility" of our implementations, but that is only a concern for
codebases that are implementing gRPC interfaces maintained by third
parties, and which want to be able to update those third-party
dependencies without updating their own implementations in lockstep.
Because we update our protos and our implementations simultaneously, we
can remove this safety net to replace runtime type checking with
compile-time type checking.

However, that replacement is not enough, because we never pass our
implementation objects to a function which asserts that they match a
specific interface. So this PR also replaces our reflect-based unittests
with idiomatic interface assertions. I do not view this as a perfect
solution, as it relies on people implementing new gRPC servers to add
this line, but it is no worse than the status quo which relied on people
adding the "TestImplementation" test.

Fixes https://github.com/letsencrypt/boulder/issues/7497
2024-05-28 09:26:29 -07:00
Aaron Gable 4663b9898e
Use custom mocks instead of mocks.StorageAuthority (#7494)
Replace "mocks.StorageAuthority" with "sapb.StorageAuthorityClient" in
our test mocks. The improves them by removing implementations of the
methods the tests don't actually need, instead of inheriting lots of
extraneous methods from the huge and cumbersome mocks.StorageAuthority.

This reduces our usage of mocks.StorageAuthority to only the WFE tests
(which create one in the frequently-used setup() function), which will
make refactoring those mocks in the pursuit of
https://github.com/letsencrypt/boulder/issues/7476 much easier.

Part of https://github.com/letsencrypt/boulder/issues/7476
2024-05-21 09:16:17 -07:00
Phil Porada d2d4f4a156
ra: Increment metric anytime ra.matchesCSR fails (#7492)
Adds a new metric `cert_csr_mismatch` for SRE to create an alert for.
This should never happen, but if it does we should know about it as soon
as possible. The details of the failure will end up in logs due to error
propagation.

Fixes https://github.com/letsencrypt/boulder/issues/6587
2024-05-20 14:30:53 -04:00
Jacob Hoffman-Andrews b7553f1290
docs: add ISSUANCE-CYCLE.md (#7444)
Also update the CA and RA doccomments to link to it and describe the
roles of key functions a little better.

Remove outdated reference to generating OCSP at issuance time.
2024-05-06 13:01:08 -07:00
Phil Porada fc7c522c28
RA: Audit log and track cert profile names and hashes (#7433)
* Adds `CertProfileName` to the CAs `capb.IssuePrecertificateResponse`
so the RA can receive the CAs configured default profile name for audit
logging/metrics. This is useful for when the RA sends an empty string as
the profile name to the CA, but we want to know exactly what the profile
name chosen by the CA was, rather than just relying on comparing hashes
between CA and RA audit logs.
* Adds the profile name and hash to RA audit logs emitted after a
successful issuance.
* Adds new labels to the existing `new_certificates` metric exported by
the RA.
```
# HELP new_certificates A counter of new certificates including the certificate profile name and hexadecimal certificate profile hash
# TYPE new_certificates counter
new_certificates{profileHash="de4c8c8866ed46b1d4af0d79e6b7ecf2d1ea625e26adcbbd3979ececd8fbd05a",profileName="defaultBoulderCertificateProfile"} 2
```

Fixes https://github.com/letsencrypt/boulder/issues/7421
2024-04-23 12:05:25 -04:00
forcedebug b33d28c8bd
Remove repeated words in comments (#7445)
Signed-off-by: forcedebug <forcedebug@outlook.com>
2024-04-23 10:30:33 -04:00
Aaron Gable e05d47a10a
Replace explicit int loops with range-over-int (#7434)
This adopts modern Go syntax to reduce the chance of off-by-one errors
and remove unnecessary loop variable declarations.

Fixes https://github.com/letsencrypt/boulder/issues/7227
2024-04-22 10:34:51 -07:00
Phil Porada 5f616ccdb9
Upgrade go-jose from v2.6.1 to v.4.0.1 (#7345)
Upgrade from the old go-jose v2.6.1 to the newly minted go-jose v4.0.1. 
Cleans up old code now that `jose.ParseSigned` can take a list of
supported signature algorithms.

Fixes https://github.com/letsencrypt/boulder/issues/7390

---------

Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
2024-04-02 17:49:51 -04:00
Aaron Gable 8ac88f557b
RA: Propagate profile name and hash from SA to CA (#7367)
When the order object retrieved from the SA contains a profile name,
propagate that into the request for the CA to issue a precertificate.
Similarly, when the CA's precertificate issuance response contains a
profile hash, propagate that into the request for the CA to issue the
corresponding final certificate.

Fixes https://github.com/letsencrypt/boulder/issues/7366
2024-03-14 14:55:32 -07:00
Samantha 7e5c1ca4bd
RA: Count failed authorizations using key-value rate limits (#7346)
Part of #5545
2024-03-11 17:24:45 -04:00
Samantha 8ede0e9d94
RA/ARI: Add method for tracking certificate replacement (#7293)
- Add new `replaces` field to RA.NewOrder requests
- Pass new `replaces` field to `SA.NewOrderAndAuthzs`
- Add new `limitsExempt` field to RA.NewOrder requests
- Ensure the RA follows this exemption for all NewOrder rate limits
2024-02-12 10:33:46 -05:00
Aaron Gable d1f8fd2921
RA: improve AdministrativelyRevokeCertificate (#7275)
The RA.AdministrativelyRevokeCertificate method has two primary modes of
operation: if a certificate DER blob is provided, it parses and extracts
information from that blob, and revokes the cert; if no DER is provided,
it assumes the cert is malformed, and revokes it (but doesn't do an OCSP
cache purge) based on the serial alone. However, this scheme has
slightly confusing semantics in the RA and requires that the admin
tooling look up the certificates to provide them to the RA.

Instead, add a new "malformed" field to the RA's
AdministrativelyRevokeCertificateRequest, and deprecate the "cert" field
of that same request. When the malformed boolean is false, the RA will
look up and parse the certificate itself. When the malformed field is
true, it will revoke the cert based on serial alone.

Note that the main logic of AdministrativelyRevokeCertificate -- namely
revoking, potentially re-revoking, doing an akamai cache purge, etc --
is not changed by this PR. The only thing that changes here is how the
RA gets access to the to-be-revoked certificate's information.

Part of https://github.com/letsencrypt/boulder/issues/7135
2024-01-29 13:54:44 -08:00
Phil Porada 03152aadc6
RVA: Recheck CAA records (#7221)
Previously, `va.IsCAAValid` would only check CAA records from the
primary VA during initial domain control validation, completely ignoring
any configured RVAs. The upcoming
[MPIC](https://github.com/ryancdickson/staging/pull/8) ballot will
require that it be done from multiple perspectives. With the currently
deployed [Multi-Perspective
Validation](https://letsencrypt.org/2020/02/19/multi-perspective-validation.html)
in staging and production, this change brings us in line with the
[proposed phase
3](https://github.com/ryancdickson/staging/pull/8/files#r1368708684).
This change reuses the existing
[MaxRemoteValidationFailures](21fc191273/cmd/boulder-va/main.go (L35))
variable for the required non-corroboration quorum.
> Phase 3: June 15, 2025 - December 14, 2025 ("CAs MUST implement MPIC
in blocking mode*"):
>
>    MUST implement MPIC? Yes
> Required quorum?: Minimally, 2 remote perspectives must be used. If
using less than 6 remote perspectives, 1 non-corroboration is allowed.
If using 6 or more remote perspectives, 2 non-corroborations are
allowed.
>    MUST block issuance if quorum is not met: Yes.
> Geographic diversity requirements?: Perspectives must be 500km from 1)
the primary perspective and 2) all other perspectives used in the
quorum.
>
> * Note: "Blocking Mode" is a nickname. As opposed to "monitoring mode"
(described in the last milestone), CAs MUST NOT issue a certificate if
quorum requirements are not met from this point forward.

Adds new VA feature flags: 
* `EnforceMultiCAA` instructs a primary VA to command each of its
configured RVAs to perform a CAA recheck.
* `MultiCAAFullResults` causes the primary VA to block waiting for all
RVA CAA recheck results to arrive.


Renamed `va.logRemoteValidationDifferentials` to
`va.logRemoteDifferentials` because it can handle initial domain control
validations and CAA rechecking with minimal editing.

Part of https://github.com/letsencrypt/boulder/issues/7061
2024-01-25 16:23:25 -05:00
Aaron Gable adb9673c37
Exempt renewals from NewOrders rate limit (#7002)
When a client is attempting to open a new Order which is identical to an
already-issued certificate, allow that request to bypass the normal New
Orders rate limit. This will allow renewals to go through even when a
client is exhibiting other bad behavior. This should not open the door
to floods of requests for the same certificate in rapid success, as the
Duplicate Certificates rate limit will still block those.

Fixes https://github.com/letsencrypt/boulder/issues/6792
2024-01-23 14:57:37 -08:00
Aaron Gable ab6e023b6f
Simplify issuance.NameID and how it is used (#7260)
Rename "IssuerNameID" to just "NameID". Similarly rename the standalone
functions which compute it to better describe their function. Add a
.NameID() directly to issuance.Issuer, so that callers in other packages
don't have to directly access the .Cert member of an Issuer. Finally,
rearrange the code in issuance.go to be sensibly grouped as concerning
NameIDs, Certificates, or Issuers, rather than all mixed up between the
three.

Fixes https://github.com/letsencrypt/boulder/issues/5152
2024-01-17 12:55:56 -08:00
Aaron Gable a9a87cd4a8
Remove fallbacks from IssuerNameID to IssuerID (#7259)
The last rows using the old-style IssuerID were written to the database
in late 2021. Those rows have long since aged out -- we no longer serve
certificates or revocation information for them -- so we can remove the
code which handles those old-style IDs. This allows for some nice
simplifications in the CA's ocspImpl and in the Issuance package, which
will be useful for further reorganization of the CA and issuance
packages.

Fixes https://github.com/letsencrypt/boulder/issues/5152
2024-01-12 14:03:49 -08:00
Samantha d281702c17
PA: Improve wildcard exact blocklist implementation (#7218)
Revamp WillingToIssueWildcards to WillingToIssue. Remove the need for
identifier.ACMEIdentifiers in the WillingToIssue(Wildcards) method.
Previously, before invoking this method, a slice of identifiers was
created by looping over each dnsName. However, these identifiers were
solely used in error messages.

Segment the validation process into distinct parts for domain
validation, wildcard validation, and exact blocklist checks. This
approach eliminates the necessity of substituting *. with x. in wildcard
domains.

Introduce a new helper, ValidDomain. It checks that a domain is valid
and that it doesn't contain any invalid wildcard characters.
Functionality from the previous ValidDomain is preserved in
ValidNonWildcardDomain.

Fixes #3323
2023-12-19 14:22:18 -05:00
Aaron Gable 164e035915
Reduce logging from inflight validation collisions (#7209)
If a client attempts to validate a challenge twice in rapid succession,
we'll kick off two background validation routines. One of these will
complete first, updating the database with success or failure. The other
will fail when it attempts to update the database and finds that there
are no longer any authorizations with that ID in the "pending" state.
Reduce the level at which we log such events, since we don't
particularly care about them.

Fixes https://github.com/letsencrypt/boulder/issues/3995
2023-12-15 09:58:34 -08:00
Aaron Gable 5e1bc3b501
Simplify the features package (#7204)
Replace the current three-piece setup (enum of feature variables, map of
feature vars to default values, and autogenerated bidirectional maps of
feature variables to and from strings) with a much simpler one-piece
setup: a single struct with one boolean-typed field per feature. This
preserves the overall structure of the package -- a single global
feature set protected by a mutex, and Set, Reset, and Enabled methods --
although the exact function signatures have all changed somewhat.

The executable config format remains the same, so no deployment changes
are necessary. This change does deprecate the AllowUnrecognizedFeatures
feature, as we cannot tell the json config parser to ignore unknown
field names, but that flag is set to False in all of our deployment
environments already.

Fixes https://github.com/letsencrypt/boulder/issues/6802
Fixes https://github.com/letsencrypt/boulder/issues/5229
2023-12-12 15:51:57 -05:00
Samantha eb49d4487e
ratelimits: Implement batched Spends and Refunds (#7143)
- Move default and override limits, and associated methods, out of the
Limiter to new limitRegistry struct, embedded in a new public
TransactionBuilder.
- Export Transaction and add corresponding Transaction constructor
methods for each limit Name, making Limiter and TransactionBuilder the
API for interacting with the ratelimits package.
- Implement batched Spends and Refunds on the Limiter, the new methods
accept a slice of Transactions.
- Add new boolean fields check and spend to Transaction to support more
complicated cases that can arise in batches:
1. the InvalidAuthorizations limit is checked at New Order time in a
batch with many other limits, but should only be spent when an
Authorization is first considered invalid.
2. the CertificatesPerDomain limit is overridden by
CertficatesPerDomainPerAccount, when this is the case, spends of the
CertificatesPerDomain limit should be "best-effort" but NOT deny the
request if capacity is lacking.
- Modify the existing Spend/Refund methods to support
Transaction.check/spend and 0 cost Transactions.
- Make bucketId private and add a constructor for each bucket key format
supported by ratelimits.
- Move domainsForRateLimiting() from the ra.go to ratelimits. This
avoids a circular import issue in ra.go.

Part of #5545
2023-12-07 11:56:02 -05:00
Phil Porada 51e9f39259
Finish migration from int64 durations to durationpb (#7147)
This is a cleanup PR finishing the migration from int64 durations to
protobuf `*durationpb.Duration` by removing all usage of the old int64
fields. In the previous PR
https://github.com/letsencrypt/boulder/pull/7146 all fields were
switched to read from the protobuf durationpb fields.

Fixes https://github.com/letsencrypt/boulder/issues/7097
2023-11-28 12:51:11 -05:00
Phil Porada 6925fad324
Finish migration from int64 timestamps to timestamppb (#7142)
This is a cleanup PR finishing the migration from int64 timestamps to
protobuf `*timestamppb.Timestamps` by removing all usage of the old
int64 fields. In the previous PR
https://github.com/letsencrypt/boulder/pull/7121 all fields were
switched to read from the protobuf timestamppb fields.

Adds a new case to `core.IsAnyNilOrZero` to check various properties of
a `*timestamppb.Timestamp` reducing the visual complexity for receivers.

Fixes https://github.com/letsencrypt/boulder/issues/7060
2023-11-27 13:37:31 -08:00
Samantha e1312ff319
RA: Fix legacy override utilization metrics (#7124)
- Emit override utilization only when resource counts are under
threshold.
- Override utilization accounts for anticipated issuance.
- Correct the limit metric label for `CertificatesPerName` and
`CertificatesPerFQDNSet/Fast`.

Part of #5545
2023-11-20 14:12:21 -05:00
Jacob Hoffman-Andrews 776287f22a
ra: send UnknownSerial error from GenerateOCSP (#7110)
The allows the ocsp-responder to log unknown serial numbers (vs just
expired ones).

Follow-up of #7108
Updates #7091
2023-11-01 15:07:00 -07:00
Phil Porada 000cd05d54
Remove config live reloader package (#7112)
The CA, RA, and tools importing the PA (policy authority) will no longer
be able to live reload specific config files. Each location is now
responsible for loading the config file.

* Removed the reloader package
* Removed unused `ecdsa_allow_list_status` metric from the CA
* Removed mutex from all ratelimit `limitsImpl` methods

Fixes https://github.com/letsencrypt/boulder/issues/7111

---------

Co-authored-by: Samantha <hello@entropy.cat>
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
2023-10-26 16:06:31 -04:00
Phil Porada b8b105453a
Rename protobuf duration fields to <fieldname>NS and populate new duration fields (#7115)
* Renames all of int64 as a time.Duration fields to `<fieldname>NS` to
indicate they are Unix nanoseconds.
* Adds new `google.protobuf.Duration` fields to each .proto file where
we previously had been using an int64 field to populate a time.Duration.
* Updates relevant gRPC messages.

Part 1 of 3 for https://github.com/letsencrypt/boulder/issues/7097
2023-10-26 10:46:03 -04:00
Phil Porada a5c2772004
Add and populate new protobuf Timestamp fields (#7070)
* Adds new `google.protobuf.Timestamp` fields to each .proto file where
we had been using `int64` fields as a timestamp.
* Updates relevant gRPC messages to populate the new
`google.protobuf.Timestamp` fields in addition to the old `int64`
timestamp fields.
* Added tests for each `<x>ToPB` and `PBto<x>` functions to ensure that
new fields passed into a gRPC message arrive as intended.
* Removed an unused error return from `PBToCert` and `PBToCertStatus`
and cleaned up each call site.

Built on-top of https://github.com/letsencrypt/boulder/pull/7069
Part 2 of 4 related to
https://github.com/letsencrypt/boulder/issues/7060
2023-10-11 12:12:12 -04:00
Jacob Hoffman-Andrews 2fa09d5990
ra: do not emit UnknownSerial yet (#7109)
This is a partial revert of #7103. Because that PR introduces a new
Boulder error code that ocsp-responder needs to be specially aware of,
it has a deployability issue: ocsp-responder needs to be made aware of
the new error code before the RA can start emitting it.

The simple fix is to not emit that error code in the RA until the
ocsp-responder is ready to receive it.
2023-10-04 15:26:41 -07:00
Jacob Hoffman-Andrews c9e9918b80
ocsp-responder(redis): sample logs at configured rate (#7102)
In #6478, we stopped passing through Redis errors to the top-level
Responder object, preferring instead to live-sign. As part of that
change, we logged the Redis errors so they wouldn't disappear. However,
the sample rate for those errors was hard coded to 1-in-1000, instead of
following the LogSampleRate configured in the JSON.

This adds a field to redisSource for logSampleRate, and passes it
through from the JSON config in ocsp-responder/main.go.

Part of #7091
2023-10-02 17:02:24 -07:00
Samantha 162458ff52
RA: Use the same key/id as the override (#7098)
Fix the overrideUsageGauge implementation added in #7076 that resulted
in metric cardinality explosion.
2023-09-25 12:10:59 -04:00
Phil Porada 034316ef6a
Rename int64 timestamp related protobuf fields to <fieldname>NS (#7069)
Rename all of int64 timestamp fields to `<fieldname>NS` to indicate they
are Unix nanosecond timestamps.

Part 1 of 4 related to
https://github.com/letsencrypt/boulder/issues/7060
2023-09-15 13:49:07 -04:00
Samantha 6223acd987
ratelimit: Add an override usage gauge (#7076)
Add a gauge similar to the one added to our key-value rate limits
implementation in #7044.

Part of #7036
2023-09-13 17:34:51 -04:00
Samantha 636d30f4a9
ratelimit: Overhaul metrics for the our existing rate limits (#7054)
- Use constants for each rate limit name to ensure consistency when
labeling metrics
- Consistently check `.Enabled()` outside of each limit check RA method
- Replace the existing checks counter with a latency histogram

Part of #5545
2023-09-11 15:06:16 -04:00
Aaron Gable aae7fb3551
Use go1.21's context.WithoutCancel (#7073)
This new standard library method returns a context with all of the
original metadata (e.g. tracing spans) still attached, but which will
not be canceled by any cancel funcs, deadlines, or timeouts set on the
parent context. We do this manually in a few places to prevent client
cancellations (usually disconnects) from disrupting our work, so this
just makes that code slightly simpler.

Fixes https://github.com/letsencrypt/boulder/issues/5506
2023-09-11 09:01:50 -07:00
Aaron Gable a70fc604a3
Use go1.21's stdlib slices package (#7074)
As of go1.21, there's a new standard library package which provides
basically the same (generic!) methods as the x/exp/slices package has
been. Now that we're on go1.21, let's use the more stable package.

Fixes https://github.com/letsencrypt/boulder/issues/6951
Fixes https://github.com/letsencrypt/boulder/issues/7032
2023-09-08 13:46:46 -07:00
Aaron Gable 62ff373885
Probs: remove divergences from RFC8555 (#6877)
Remove the remaining divergences from RFC8555 regarding what error types
we use in certain situations. Specifically:
- use "invalidContact" instead of "invalidEmail";
- use "unsupportedContact" for contact addresses that use a protocol
other than "mailto:"; and
- use "unsupportedIdentifier" for identifiers that specify a type other
than "dns".
2023-05-15 12:35:12 -07:00
Matthew McPherrin 8427245675
OTel Integration test using jaeger (#6842)
This adds Jaeger's all-in-one dev container (with no persistent storage)
to boulder's dev docker-compose. It configures config-next/ to send all
traces there.

A new integration test creates an account and issues a cert, then
verifies the trace contains some set of expected spans.

This test found that async finalize broke spans, so I fixed that and a
few related spots where we make a new context.
2023-05-05 10:41:29 -04:00
Aaron Gable b0d63e60fc
Fix order_ages histogram to have buckets up to 7 days (#6858)
A copy-paste typo caused the order_ages histogram to stop at 2 days,
rather than 7 days.
2023-05-02 17:39:50 -04:00
Jacob Hoffman-Andrews 1c7e0fd1d8
Store linting certificate instead of precertificate (#6807)
In order to get rid of the orphan queue, we want to make sure that
before we sign a precertificate, we have enough data in the database
that we can fulfill our revocation-checking obligations even if storing
that precertificate in the database fails. That means:

- We should have a row in the certificateStatus table for the serial.
- But we should not serve "good" for that serial until we are positive
the precertificate was issued (BRs 4.9.10).
- We should have a record in the live DB of the proposed certificate's
public key, so the bad-key-revoker can mark it revoked.
- We should have a record in the live DB of the proposed certificate's
names, so it can be revoked if we are required to revoke based on names.

The SA.AddPrecertificate method already achieves these goals for
precertificates by writing to the various metadata tables. This PR
repurposes the SA.AddPrecertificate method to write "proposed
precertificates" instead.

We already create a linting certificate before the precertificate, and
that linting certificate is identical to the precertificate that will be
issued except for the private key used to sign it (and the AKID). So for
instance it contains the right pubkey and SANs, and the Issuer name is
the same as the Issuer name that will be used. So we'll use the linting
certificate as the "proposed precertificate" and store it to the DB,
along with appropriate metadata.

In the new code path, rather than writing "good" for the new
certificateStatus row, we write a new, fake OCSP status string "wait".
This will cause us to return internalServerError to OCSP requests for
that serial (but we won't get such requests because the serial has not
yet been published). After we finish precertificate issuance, we update
the status to "good" with SA.SetCertificateStatusReady.

Part of #6665
2023-04-26 13:54:24 -07:00
Samantha 7048978028
RA: Track order and authz age during NewOrder and FinalizeOrder (#6841)
Overhaul tracking of order and authz age/ reuse to gather data for
#5061.

- Modify `ra.orderAges` histogram to track the method that observed the
order.
- Add observation of the order age at NewOrder time.
- Modify `ra.authzAges` histogram to track the method that observed the
authz as well its type (valid or pending).
- Add observation of authz age at Finalize time.
- Remove `reusedValidAuthzCounter`, which erroneously counted authzs
with status pending as valid.
2023-04-24 16:30:26 -04:00
Phil Porada 3665fe3651
RA: Condense ra.NewOrder rate limit sprawl into ra.checkLimits (#6831)
Moves two rate limits nested around ra.checkLimits into appropriate
positions within ra.checkLimits for readability purposes.

Fixes https://github.com/letsencrypt/boulder/issues/6828
2023-04-21 12:48:33 -04:00
Aaron Gable bd2da9b830
Fix calculation of authz ages at new-order time (#6827)
Previously, we were computing the age of the authorization as "how much
time it has left, minus its total lifetime". In retrospect, this
universally results in negative values, which then all end up in the
zero bucket of the histogram. Fix the arithmetic so we can start getting
accurate data.
2023-04-17 16:05:11 -07:00
Phil Porada 7c1f101423
RA: Simplify ra.RevokeCertByKey and fix duplicate calls to ra.purgeOCSPCache (#6818)
An error was introduced in
https://github.com/letsencrypt/boulder/pull/6758 that when an
already-revoked cert is revoked again (with an upgrade of revocation
reason from "shrug i dunno lol" to "key compromise"),
`ra.RevokeCertByKey` would send two cache purge requests to
`ra.purgeOCSPCache`.

Part 2/2 for fixing https://github.com/letsencrypt/boulder/issues/6806
2023-04-17 11:06:35 -04:00
Aaron Gable 98fa0f07b4
Re-enable errcheck linter (#6819)
Enable the errcheck linter. Update the way we express exclusions to use
the new, non-deprecated, non-regex-based format. Fix all places where we
began accidentally violating errcheck while it was disabled.
2023-04-14 15:41:12 -04:00
Aaron Gable 45329c9472
Deprecate ROCSPStage7 flag (#6804)
Deprecate the ROCSPStage7 feature flag, which caused the RA and CA to
stop generating OCSP responses when issuing new certs and when revoking
certs. (That functionality is now handled just-in-time by the
ocsp-responder.) Delete the old OCSP-generating codepaths from the RA
and CA. Remove the CA's internal reference to an OCSP implementation,
because it no longer needs it.

Additionally, remove the SA's "Issuers" config field, which was never
used.

Fixes #6285
2023-04-12 17:03:06 -07:00
Aaron Gable 373d08bb80
CertsPerName limit: only check renewal exemption once (#6784)
We used to check the CertificatesPerName "renewal exemption" after
checking to see if the rate limit was going to kick in at all. But
checking the rate limit is rather expensive, so #4174 introduced a
feature flag and a new block of code so that we'd check the renewal
exemption first, and short-circuit out of the whole function if it was
met. But when #4771 deprecated the feature flag, it left both blocks of
code in, instead of deleting the old location.

Remove the redundant exemption check.
2023-03-30 16:28:10 -04:00
Phil Porada f5c73a4fcf
RA: Purge OCSP cache on duplicate revocations (#6758)
* Perform Akamai OCSP cache purges on already revoked certificates in
`ra.RevokeCertByKey` and `ra.AdministrativelyRevokeCertificate` for
subsequent client calls to `/acme/revoke-cert`.

* Add `addToBlockedKeys` helper function used by
`ra.AdministrativelyRevokeCertificate` and `ra.RevokeCertByKey` to
de-duplicate code adding subject public key info hash to the
`blockedKeys` table.

* Add a duplicate revocation test to
`ra.TestAdministrativelyRevokeCertificate`

* Fix some grammar in an RA comment.

* Fixes https://github.com/letsencrypt/boulder/issues/4862
2023-03-30 16:24:54 -04:00
Aaron Gable 8b06fee54d
RA: relax CN matchesCSR check (#6757)
The RA's matchesCSR function checks to make sure that various aspects of
the final issued certificate match the CSR provided by the client. One
of these checks is that the CommonName matches the CN specified in the
CSR, or matches a specific one of the SANs if no CN was specified in the
CSR. However, if the logic to select which of the SANs gets promoted
into the CN ever differs between the RA and the CA (for example, in the
midst of an incremental deploy that updates the version of one service
before the other) this causes conflicts and results in Finalize requests
failing with 500s.

Relax the check, so that the RA simply guarantees that the final
certificate's CN matches any of the names in the CSR, not a specific one
of the names. This fixes a deployability bug introduced in #6706.

Part of #5112
2023-03-17 11:12:31 -07:00
Matthew McPherrin e1ed1a2ac2
Remove beeline tracing (#6733)
Remove tracing using Beeline from Boulder. The only remnant left behind
is the deprecated configuration, to ensure deployability.

We had previously planned to swap in OpenTelemetry in a single PR, but
that adds significant churn in a single change, so we're doing this as
multiple steps that will each be significantly easier to reason about
and review.

Part of #6361
2023-03-14 15:14:27 -07:00
Aaron Gable 9af4871e59
Add SetCommonName feature flag (#6706)
Add a new feature flag, `SetCommonName`, which defaults to `true`. In
this default state, no behavior changes.

When set to `false` on the CA, this flag will cause the CA to leave the
Subject commonName field of the certificate blank, as is recommended by
the Baseline Requirements Section 7.1.4.2.2(a).

Also slightly modify the behavior of the RA's `matchesCSR()` function,
to allow for both certificates that have a CN and certificates that
don't. It is not feasible to put this behavior behind the same
SetCommonName flag, because that would require an atomic deploy of both
the RA and the CA.

Obsoletes #5112
2023-03-09 13:31:55 -05:00
Aaron Gable c55ab19191
Register the inflight_finalizes metric (#6724) 2023-03-08 13:53:50 -08:00
Jacob Hoffman-Andrews e052e7445b
admin-revoker: document malformed-revoke (#6714)
In particular, document that it does not purge the Akamai cache.

Also, in the RA, avoid creating a "fake" certificate object containing
only the serial. Instead, use req.Serial directly in most places. This
uncovered some incorrect logic. Fix that logic by gating the operations
that actually need a full *x509.Certificate: revoking by key, and
purging the Akamai cache.

Also, make `req.Serial` mandatory for AdministrativelyRevokeCertificate.

This is a reopen of #6693, which accidentally got merged into a
different feature branch.
2023-03-02 12:02:21 -08:00
Phil Porada fdb9c543b7
Remove ReuseValidAuthz code (#6686)
Removes all code related to the `ReuseValidAuthz` feature flag. The
Boulder default is to now always reuse valid authorizations.

Fixes a panic in `test.AssertErrorIs` when `err` is unexpectedly `nil`
that was found this while reworking the
`TestPerformValidationAlreadyValid` test. The go stdlib `func Is`[1]
does not check for this.

1. https://go.dev/src/errors/wrap.go

Part 2/2, fixes https://github.com/letsencrypt/boulder/issues/2734
2023-02-28 17:57:16 -05:00
Aaron Gable cdf1a6f9f9
Add flag to make order finalization async (#6589)
Add the "AsyncFinalize" feature flag. When enabled, this causes the RA
to return almost immediately from FinalizeOrder requests, with the
actual hard work of issuing the precertificate, getting SCTs, issuing
the final certificate, and updating the database accordingly all
occuring in a background goroutine while the client polls the GetOrder
endpoint waiting for the result.

This is implemented by factoring out the majority of the finalization
work into a new `issueCertificateOuter` helper function, and simply
using the new flag to determine whether we call that helper in a
goroutine or not. This makes removing the feature flag in the future
trivially easy.

Also add a new prometheus metric named `inflight_finalizes` which can be
used to count the number of simultaneous goroutines which are performing
finalization work. This metric is exported regardless of the state of
the AsyncFinalize flag, so that we can observe any changes to this
metric when the flag is flipped.

Fixes #6575
2023-02-24 09:57:54 -08:00
Samantha 595a9511ed
RA: Log CAA reuse/recheck at order finalize time (#6643)
- Log counts of Authzs where CAA was rechecked/reused.
- Move the CAA recheck duration to a single variable in the RA.
- Add new method `InfoObject` to our logger.

Fixes #6560
Part of #6623
2023-02-10 11:23:16 -05:00
Phil Porada 134321040b
Default ReuseValidAuthz to true (#6644)
`ReuseValidAuthz` was introduced
here [1] and enabled in staging and production configs on 2016-07-13. 
There was a brief stint during the TLS-SNI-01 challenge type removal where 
SRE disabled it. However, time has finally come to remove this configuration
option. Issue #6623 will determine the feasibility of shorter authz
lifetimes and potentially the removal of authz reuse.

This change is broken up into two parts to allow SRE to safely remove
the flag from staging and production configs. We'll merge this PR, SRE
will deploy boulder and the config change, then we'll finish removing
`ReuseValidAuthz` configuration from the codebase.

[1] boulder commit 9abc212448

Part 1 of 2 for fixing #2734.
2023-02-09 14:26:06 -05:00
Jacob Hoffman-Andrews 0f642467fe
Add an unlimited setting for pendingAuthorizations (#6628)
This can save some database work counting thousands of rows, when
needed.

Fixes #6604
2023-02-03 12:42:20 -08:00
Phil Porada 9390c0e5f5
Put errors at end of log lines (#6627)
For consistency, put the error field at the end of unstructured log
lines to make them more ... structured.

Adds the `issuerID` field to "orphaning certificate" log line in the CA
to match the "orphaning precertificate" log line.

Fixes broken tests as a result of the CA and bdns log line change.

Fixes #5457
2023-02-03 11:28:38 -05:00
Aaron Gable 1b7eb3d978
RA: Simplify FinalizeOrder flow (#6588)
Simplify the control flow of the FinalizeOrder handler to make it easier
to read and reason about:
- Move all validation to before we set the order to Processing, and put
it all in a single helper funcion.
- Move almost all logEvent/trace handling directly into FinalizeOrder so
it cannot be missed.
- Flatten issueCertificate and issueCertificateInner into a single
helper function, now that they're no longer being called from both
ACMEv1 and v2 entry points.
- Other minor cleanups, such as making SolvedBy not return a pointer and
making matchesCSR private.

This paves the way for making both issueCertificateInner and failOrder
asynchronous, which we plan to do in the near future.

Part of #6575
2023-01-25 17:59:54 -08:00
Phil Porada 26e5b24585
dependencies: Replace square/go-jose.v2 with go-jose/go-jose.v2 (#6598)
Fixes #6573
2023-01-24 12:08:30 -05:00
Aaron Gable b472afd3f8
Remove the CertificateRequest wrapper type (#6579)
The core.CertificateRequest wrapper type only had two fields: an
x509.CertificateRequest, and a slice of bytes representing the unparsed
version of the same request. However, x509.CertificateRequest carries
around its own field .Raw, which serves the same purpose. Also, we
weren't even referencing the .Bytes field anyway. Therefore, get rid of
this redundant wrapper type.

This also makes it clearer just how far the original CSR makes it
through our system. I'd like a future change to remove the CSR from the
request to the CA service, replacing it with a structured object that
only exposes exactly the fields of the CSR we care about, such as names,
public key, and whether or not the must-staple extension should be set.
2023-01-11 13:30:52 -08:00
Aaron Gable 64aa8f4f15
Add histogram to track authz reuse ages (#6554)
Although we have a metric for counting how often we reuse a valid
authorization, no code has been incrementing that metric since 2021.
Re-enable it.

Additionally, that metric doesn't help us know how old authorizations
are when they get reused, which is useful information when deciding
whether or how much to shrink our authorization reuse lifetime. Add a
new histogram metric to track the ages of authorizations when they're
attached to an order.
2022-12-16 14:54:59 -08:00