Previously, cert-checker largely ignored its own config file and always
initialized its logger to log to syslog. This makes cert-checker
initialize its logger in the same way as all other boulder components.
Add a new "-cert-file" input mode to both `admin revoke-cert` and `admin
block-key` which operates on the serial or pubkey found in the
PEM-encoded certificate in the supplied file.
Fixes https://github.com/letsencrypt/boulder/issues/7267
We first introduced caa-log-checker as a remediation item in the wake of
https://bugzilla.mozilla.org/show_bug.cgi?id=1619047. Since that time,
we have upgraded to go1.22, which completely remoes the class of bug
which led to that incident (https://tip.golang.org/doc/go1.22#language).
Throughout its life, caa-log-checker was an operational burden, and was
at best a post-hoc check to detect issues after they had already
occurred. Therefore, we no longer run it in our production environment,
and it can be removed from the Boulder source.
This change introduces a new config key `certProfiles` which contains a
map of `profiles`. Only one of `profile` or `certProfiles` should be
used, because configuring both will result in the CA erroring and
shutting down. Further, the singular `profile` is now
[deprecated](https://github.com/letsencrypt/boulder/issues/7414).
The CA pre-computes several maps at startup;
* A human-readable name to a `*issuance.Profile` which is referred to as
"name".
* A SHA-256 sum over the entire contents of the given profile to the
`*issuance.Profile`. We'll refer to this as "hash".
Internally, CA methods no longer pass an `*issuance.Profile`, instead
they pass a structure containing maps of certificate profile
identifiers. To determine the default profile used by the CA, a new
config field `defaultCertificateProfileName` has been added to the
Issuance struct. Absence of `defaultCertificateProfileName` will cause
the CA to use the default value of `defaultBoulderCertificateProfile`
such as for the the deprecated `profile`. The key for each given
certificate profile will be used as the "name". Duplicate names or
hashes will cause the CA to error during initialization and shutdown.
When the RA calls `ra.CA.IssuePrecertificate`, it will pass an arbitrary
certificate profile name to the CA triggering the CA to lookup if the
name exists in its internal mapping. The RA maintains no state or
knowledge of configured certificate profiles and relies on the CA to
provide this information. If the name exists in the CA's map, it will
return the hash along with the precertificate bytes in a
`capb.IssuePrecertificateResponse`. The RA will then call
`ra.CA.IssueCertificateForPrecertificate` with that same hash. The CA
will lookup the hash to determine if it exists in its map, and if so
will continue on with certificate issuance.
Precertificate and certificate issuance audit logs will now include the
certificate profile name and hex representation of the hash that they
were issued with.
Fixes https://github.com/letsencrypt/boulder/issues/6966
There are no required config or SQL changes.
Give the admin tool a slightly more structured subcommand system:
subcommands are now represented by structs (rather than just methods on
the admin object) which satisfy a particular interface. This interface
separates flag declaration from execution, so that we can have greater
flexibility with regard to command line parsing. This allows the two
top-level flags (-config and -dry-run) to appear anywhere in the command
line (not only before the subcommand name), and allows the -h/-help flag
to print usage information even when other critical flags (like -config)
are missing.
Fixes https://github.com/letsencrypt/boulder/issues/7393
Fixes https://github.com/letsencrypt/boulder/issues/7359
Add a new input method flag to `admin block-key` which processes a file
containing one hexadecimal-encoded SPKI hash on each line. To facilitate
this, restructure the block-key subcommand's execution to more closely
resemble the revoke-cert subcommand, with a parallelism flag and the
ability to run many workers at the same time.
Part of https://github.com/letsencrypt/boulder/issues/7267
- Parse and validate the `profile` field in `newOrder` requests.
- Pass the `profile` field from `newOrder` calls to the resulting
`RA.NewOrder` call.
- When the client requests a specific profile, ensure that the profile
field is populated in the order returned.
Fixes#7332
Part of #7309
Create a new method on the gorm rows object which runs a small closure
for every row retrieved from the database. Use this new method to remove
20 lines of boilerplate from five different SA methods and rocsp-tool.
When a serial is passed in, all extraneous characters that are not
alphanumeric are stripped. The result is checked against
[[core.ValidSerial](9b05c38eb3/core/util.go (L170))]
to ensure that it is a valid hex of 32 or 36 characters and then passed
to the rest of boulder. If the stripped serial is not a valid serial, an
error is thrown and revocation does not proceed.
Use two existing SA methods, KeyBlocked and GetSerialsByKey, to replace
the direct database access previously used by the blockSPKIHash method.
This is less efficient than before -- it now streams the whole set of
affected serials rather than just counting them -- but doing so prevents
us from needing an additional SA method just for counting.
Also update the default mock StorageAuthority and
StorageAuthorityReadOnly provided by the mocks package to return actual
stream objects (which stream zero results) instead of nil, so that tests
can attempt to read from the resulting stream without getting a nil
pointer exception.
Part of https://github.com/letsencrypt/boulder/issues/7350
Add two new methods to the SA, GetSerialsByKey and GetSerialsByAccount,
which use the same query as the admin tool has previously used to get
serials matching a given SPKI hash or a given registration ID. These two
new gRPC methods read the database row-by-row and produce streams of
results to keep SA memory usage low.
Use these methods in the admin tool so it no longer needs a direct
database connection for these actions.
Part of https://github.com/letsencrypt/boulder/issues/7350
Before this PR, this is what example output of using the admin tool
would look like for an example command (in this case, certificate
revocation):
```
$ sudo /vagrant/admin -config /etc/boulder/config/admin.json revoke-cert -serial 2a13699017c283dea4d6ac5ac6d40caa3321
20:22:05.682795 6 admin qK_A7gU Debug server listening on :8000
20:22:05.682908 6 admin ts7-5Ag Versions: admin=(Unspecified Unspecified) Golang=(go1.21.8) BuildHost=(Unspecified)
20:22:05.689743 6 admin 6MX16g0 Found 1 certificates to revoke
20:22:05.691842 6 admin 1b3FsgU dry-run: &proto.AdministrativelyRevokeCertificateRequest{state:impl.MessageState{NoUnkeyedLiterals:pragma.NoUnkeyedLiterals{}, DoNotCompare:pragma.DoNotCompare{}, DoNotCopy:pragma.DoNotCopy{}, atomicMessageInfo:(*impl.MessageInfo)(nil)}, sizeCache:0, unknownFields:[]uint8(nil), Cert:[]uint8(nil), Serial:"2a13699017c283dea4d6ac5ac6d40caa3321", Code:0, AdminName:"root", SkipBlockKey:false, Malformed:false}
20:22:05.691901 6 admin 7tT7rAY Dry run complete. Pass -dry-run=false to mutate the database.
```
after this change the output looks like this:
```
$ sudo /vagrant/admin -config /etc/boulder/config/admin.json revoke-cert -serial 2a13699017c283dea4d6ac5ac6d40caa3321
21:22:13.769728 6 admin qK_A7gU Debug server listening on :8000
21:22:13.770156 6 admin ts7-5Ag Versions: admin=(Unspecified Unspecified) Golang=(go1.21.8) BuildHost=(Unspecified)
21:22:13.779291 6 admin xNuU_gY [AUDIT] admin tool executing a dry-run with the following arguments: revoke-cert -serial 2a13699017c283dea4d6ac5ac6d40caa3321
21:22:13.779534 6 admin 6MX16g0 Found 1 certificates to revoke
21:22:13.784524 6 admin yvHv9AM dry-run: "serial:\"2a13699017c283dea4d6ac5ac6d40caa3321\" adminName:\"root\""
21:22:13.786379 6 admin nKfNswk [AUDIT] admin tool has successfully completed executing a dry-run with the following arguments: revoke-cert -serial 2a13699017c283dea4d6ac5ac6d40caa3321
21:22:13.786951 6 admin 7tT7rAY Dry run complete. Pass -dry-run=false to mutate the database.
```
and with `-dry-run=false`:
```
$ sudo /vagrant/admin -config /etc/boulder/config/admin.json -dry-run=false revoke-cert -serial 2a13699017c283dea4d6ac5ac6d40caa3321
21:23:19.080073 6 admin qK_A7gU Debug server listening on :8000
21:23:19.080510 6 admin ts7-5Ag Versions: admin=(Unspecified Unspecified) Golang=(go1.21.8) BuildHost=(Unspecified)
21:23:19.089588 6 admin iKnckQ0 [AUDIT] admin tool executing with the following arguments: revoke-cert -serial 2a13699017c283dea4d6ac5ac6d40caa3321
21:23:19.089625 6 admin 6MX16g0 Found 1 certificates to revoke
21:23:19.169317 6 admin 9oyv3QY [AUDIT] admin tool has successfully completed executing with the following arguments: revoke-cert -serial 2a13699017c283dea4d6ac5ac6d40caa3321
```
Fixes#7358
Change crl-storer to only require that 1 of the IssuingDistributionPoint
URIs remain consistent between consecutive CRLs in the same sequence.
This allows us to add and remove IDP URIs, so we can change our IDP
scheme over time.
To facilitate this, also move all code which builds or parses IDP
extensions into a single place, so that we don't have to have multiple
definitions of the same types and similar code in many places.
Fixes https://github.com/letsencrypt/boulder/issues/7340
Part of https://github.com/letsencrypt/boulder/issues/7296
Adds post-issuance zlint linting to the `rootCeremony`,
`intermediateCeremony`, and `crossCertCeremony` ceremonies. It calls
zlint directly rather than using the existing
`issueLintCertAndPerformLinting` because the throwaway linting key pair
is unnecessary at this point.
Fixes https://github.com/letsencrypt/boulder/issues/7354
De-duplicate the code that has been replaced by `admin`, and cause all
of its subcommands to print helpful messages indicating the
corresponding `admin` command to run instead.
makes linter happy: not sure why 7 year old typo starts to hit by linter
nowdays though
not sure why github CI can't catch this but running t.sh locally marks
this as typo: (and it is)
We run the RVAs in AWS, where we don't have all the same service
discovery infrastructure we do for the primary VAs and the rest of
Boulder. The solution for populating SRV records we have today hasn't
been reliable, so we'd like to experiment with bringing up RVAs paired
1:1 with a local DNS resolver. This brings back some of the previous
static DNS resolver configuration, though it's not a clean revert
because other configuration has changed in the meantime
Move the CRL issuance logic -- building an x509.RevocationList template,
populating it with correctly-built extensions, linting it, and actually
signing it -- out of the //ca package and into the //issuance package.
This means that the CA's CRL code no longer needs to be able to reach
inside the issuance package to access its issuers and certificates (and
those fields will be able to be made private after the same is done for
OCSP issuance).
Additionally, improve the configuration of CRL issuance, create
additional checks on CRL's ThisUpdate and NextUpdate fields, and make it
possible for a CRL to contain two IssuingDistributionPoint URIs so that
we can migrate to shorter addresses.
IN-10045 tracks the corresponding production changes.
Fixes https://github.com/letsencrypt/boulder/issues/7159
Part of https://github.com/letsencrypt/boulder/issues/7296
Part of https://github.com/letsencrypt/boulder/issues/7294
Part of https://github.com/letsencrypt/boulder/issues/7094
Part of https://github.com/letsencrypt/boulder/issues/7100
Create a new administration tool "bin/admin" as a successor to and
replacement of "admin-revoker".
This new tool supports all the same fundamental capabilities as the old
admin-revoker, including:
- Revoking by serial, by batch of serials, by incident table, and by
private key
- Blocking a key to let bad-key-revoker take care of revocation
- Clearing email addresses from all accounts that use them
Improvements over the old admin-revoker include:
- All commands run in "dry-run" mode by default, to prevent accidental
executions
- All revocation mechanisms allow setting the revocation reason,
skipping blocking the key, indicating that the certificate is malformed,
and controlling the number of parallel workers conducting revocation
- All revocation mechanisms do not parse the cert in question, leaving
that to the RA
- Autogenerated usage information for all subcommands
- A much more modular structure to simplify adding more capabilities in
the future
- Significantly simplified tests with smaller mocks
The new tool has analogues of all of admin-revokers unit tests, and all
integration tests have been updated to use the new tool instead. A
future PR will remove admin-revoker, once we're sure SRE has had time to
update all of their playbooks.
Fixes https://github.com/letsencrypt/boulder/issues/7135
Fixes https://github.com/letsencrypt/boulder/issues/7269
Fixes https://github.com/letsencrypt/boulder/issues/7268
Fixes https://github.com/letsencrypt/boulder/issues/6927
Part of https://github.com/letsencrypt/boulder/issues/6840
Remove the Profile field from issuance.Issuer, to reflect the fact that
profiles are in fact independent pieces of configuration which can be
shared across (and are configured independently of) multiple issuers.
Move the IssuerURL, OCSPUrl, and CRLURL fields from issuance.Profile to
issuance.Issuer, since they reflect fundamental attributes of the
issuer, rather than attributes of a particular profile. This also
reflects the location at which those values are configured, in
issuance.IssuerConfig.
All other changes are fallout from the above: adding a Profile argument
to various methods in the issuance and linting packages, adding a
profile field to the caImpl struct, etc. This change paves the way for
two future changes: moving OCSP and CRL creation into the issuance
package, and supporting multiple simultaneous profiles that the CA can
select between.
Part of https://github.com/letsencrypt/boulder/issues/7159
Part of https://github.com/letsencrypt/boulder/issues/6316
Part of https://github.com/letsencrypt/boulder/issues/6966
Part of #7245.
There are still a few places that use 10.88.88.88 that will be harder to
remove. In particular, some of the Python integration tests start up
their own HTTP servers that differ from challtestsrv in some important
way (like timing out requests). Because challtestsrv already binds to
10.77.77.77:80, those test servers need a different IP address to bind
to. We can probably solve that but I'll leave it for another PR.
Previously, `va.IsCAAValid` would only check CAA records from the
primary VA during initial domain control validation, completely ignoring
any configured RVAs. The upcoming
[MPIC](https://github.com/ryancdickson/staging/pull/8) ballot will
require that it be done from multiple perspectives. With the currently
deployed [Multi-Perspective
Validation](https://letsencrypt.org/2020/02/19/multi-perspective-validation.html)
in staging and production, this change brings us in line with the
[proposed phase
3](https://github.com/ryancdickson/staging/pull/8/files#r1368708684).
This change reuses the existing
[MaxRemoteValidationFailures](21fc191273/cmd/boulder-va/main.go (L35))
variable for the required non-corroboration quorum.
> Phase 3: June 15, 2025 - December 14, 2025 ("CAs MUST implement MPIC
in blocking mode*"):
>
> MUST implement MPIC? Yes
> Required quorum?: Minimally, 2 remote perspectives must be used. If
using less than 6 remote perspectives, 1 non-corroboration is allowed.
If using 6 or more remote perspectives, 2 non-corroborations are
allowed.
> MUST block issuance if quorum is not met: Yes.
> Geographic diversity requirements?: Perspectives must be 500km from 1)
the primary perspective and 2) all other perspectives used in the
quorum.
>
> * Note: "Blocking Mode" is a nickname. As opposed to "monitoring mode"
(described in the last milestone), CAs MUST NOT issue a certificate if
quorum requirements are not met from this point forward.
Adds new VA feature flags:
* `EnforceMultiCAA` instructs a primary VA to command each of its
configured RVAs to perform a CAA recheck.
* `MultiCAAFullResults` causes the primary VA to block waiting for all
RVA CAA recheck results to arrive.
Renamed `va.logRemoteValidationDifferentials` to
`va.logRemoteDifferentials` because it can handle initial domain control
validations and CAA rechecking with minimal editing.
Part of https://github.com/letsencrypt/boulder/issues/7061
These names corresponded to single instances of a service, and were
primarily used for (a) specifying which interface to bind a gRPC port on
and (b) allowing `health-checker` to check individual instances rather
than a service as a whole.
For (a), change the `--grpc-addr` flags to bind to "all interfaces." For
(b), provide a specific IP address and port for health checking. This
required adding a `--hostOverride` flag for `health-checker` because the
service certificates contain hostname SANs, not IP address SANs.
Clarify the situation with nonce services a little bit. Previously we
had one nonce "service" in Consul and got nonces from that (i.e.
randomly between the two nonce-service instances). Now we have two nonce
services in consul, representing multiple datacenters, and one of them
is explicitly configured as the "get" service, while both are configured
as the "redeem" service.
Part of #7245.
Note this change does not yet get rid of the rednet/bluenet distinction,
nor does it get rid of all use of 10.88.88.88. That will be a followup
change.
Rename "IssuerNameID" to just "NameID". Similarly rename the standalone
functions which compute it to better describe their function. Add a
.NameID() directly to issuance.Issuer, so that callers in other packages
don't have to directly access the .Cert member of an Issuer. Finally,
rearrange the code in issuance.go to be sensibly grouped as concerning
NameIDs, Certificates, or Issuers, rather than all mixed up between the
three.
Fixes https://github.com/letsencrypt/boulder/issues/5152
Enable the atomicalign, deepequalerrors, findcall, nilness,
reflectvaluecompare, sortslice, timeformat, and unusedwrite go vet
analyzers, which golangci-lint does not enable by default. Additionally,
enable new go vet analyzers by default as they become available.
The fieldalignment and shadow analyzers remain disabled because they
report so many errors that they should be fixed in a separate PR.
Note that the nilness analyzer appears to have found one very real bug
in tlsalpn.go.
Upgrade to zlint v3.6.0
Two new lints are triggered in various places:
aia_contains_internal_names is ignored in integration test
configurations, and unit tests are updated to have more realistic URLs.
The w_subject_common_name_included lint needs to be ignored where we'd
ignored n_subject_common_name_included before.
Related to https://github.com/letsencrypt/boulder/issues/7261
Revamp WillingToIssueWildcards to WillingToIssue. Remove the need for
identifier.ACMEIdentifiers in the WillingToIssue(Wildcards) method.
Previously, before invoking this method, a slice of identifiers was
created by looping over each dnsName. However, these identifiers were
solely used in error messages.
Segment the validation process into distinct parts for domain
validation, wildcard validation, and exact blocklist checks. This
approach eliminates the necessity of substituting *. with x. in wildcard
domains.
Introduce a new helper, ValidDomain. It checks that a domain is valid
and that it doesn't contain any invalid wildcard characters.
Functionality from the previous ValidDomain is preserved in
ValidNonWildcardDomain.
Fixes#3323
Use policy.ValidEmail to vet email addresses before sending expiration
notifications to them. This same check is performed by notify-mailer,
and it helps reduce the number of invalid addresses we attempt to send
to and the number of email bounces we generate.
Additionally, mark certificates as having had a nag email sent if there
are no valid addresses for us to send to, so that we don't constantly
retry them.
Fixes https://github.com/letsencrypt/boulder/issues/5372
Change the max value of the CA's `SerialPrefix` config value from 255 (a
byte of all 1s) to 127 (a byte of one 0 followed by seven 1s). This
prevents the serial prefix from ever beginning with a 1.
This is important because serials are interpreted as signed
(twos-complement) integers, and are required to be positive -- a serial
whose first bit is 1 is considered to be negative and therefore in
violation of RFC 5280. The go stdlib fixes this for us by prepending a
zero byte to any serial that begins with a 1 bit, but we'd prefer all
our serials to be the same length.
Corresponding config change was completed in IN-9880.
Replace the current three-piece setup (enum of feature variables, map of
feature vars to default values, and autogenerated bidirectional maps of
feature variables to and from strings) with a much simpler one-piece
setup: a single struct with one boolean-typed field per feature. This
preserves the overall structure of the package -- a single global
feature set protected by a mutex, and Set, Reset, and Enabled methods --
although the exact function signatures have all changed somewhat.
The executable config format remains the same, so no deployment changes
are necessary. This change does deprecate the AllowUnrecognizedFeatures
feature, as we cannot tell the json config parser to ignore unknown
field names, but that flag is set to False in all of our deployment
environments already.
Fixes https://github.com/letsencrypt/boulder/issues/6802
Fixes https://github.com/letsencrypt/boulder/issues/5229
Many services already have --addr and/or --debug-addr flags.
However, it wasn't universal, so this PR adds flags to commands where
they're not currently present.
This makes it easier to use a shared config file but listen on different
ports, for running multiple instances on a single host.
The config options are made optional as well, and removed from
config-next/.
- Move default and override limits, and associated methods, out of the
Limiter to new limitRegistry struct, embedded in a new public
TransactionBuilder.
- Export Transaction and add corresponding Transaction constructor
methods for each limit Name, making Limiter and TransactionBuilder the
API for interacting with the ratelimits package.
- Implement batched Spends and Refunds on the Limiter, the new methods
accept a slice of Transactions.
- Add new boolean fields check and spend to Transaction to support more
complicated cases that can arise in batches:
1. the InvalidAuthorizations limit is checked at New Order time in a
batch with many other limits, but should only be spent when an
Authorization is first considered invalid.
2. the CertificatesPerDomain limit is overridden by
CertficatesPerDomainPerAccount, when this is the case, spends of the
CertificatesPerDomain limit should be "best-effort" but NOT deny the
request if capacity is lacking.
- Modify the existing Spend/Refund methods to support
Transaction.check/spend and 0 cost Transactions.
- Make bucketId private and add a constructor for each bucket key format
supported by ratelimits.
- Move domainsForRateLimiting() from the ra.go to ratelimits. This
avoids a circular import issue in ra.go.
Part of #5545
- Adds a feature flag to gate rollout for SHA256 Subject Key Identifiers
for end-entity certificates.
- The ceremony tool will now use the RFC 7093 section 2 option 1 method
for generating Subject Key Identifiers for future root CA, intermediate
CA, and cross-sign ceremonies.
- - - -
[RFC 7093 section 2 option
1](https://datatracker.ietf.org/doc/html/rfc7093#section-2) provides a
method for generating a truncated SHA256 hash for the Subject Key
Identifier field in accordance with Baseline Requirement [section
7.1.2.11.4 Subject Key
Identifier](90a98dc7c1/docs/BR.md (712114-subject-key-identifier)).
> [RFC5280] specifies two examples for generating key identifiers from
> public keys. Four additional mechanisms are as follows:
>
> 1) The keyIdentifier is composed of the leftmost 160-bits of the
> SHA-256 hash of the value of the BIT STRING subjectPublicKey
> (excluding the tag, length, and number of unused bits).
The related [RFC 5280 section
4.2.1.2](https://datatracker.ietf.org/doc/html/rfc5280#section-4.2.1.2)
states:
> For CA certificates, subject key identifiers SHOULD be derived from
> the public key or a method that generates unique values. Two common
> methods for generating key identifiers from the public key are:
> ...
> Other methods of generating unique numbers are also acceptable.
When running in manual mode, the `configFile` variable will take the
zero value of `""` while `manualConfigFile` will be provided on the CLI
by the operator. A startup check incorrectly dereferences `configFile`;
but correctly determines that it is the zero value `""`, outputs the
help text, and exits never allowing manual mode to perform work.
Fixes https://github.com/letsencrypt/boulder/issues/7176
This is a cleanup PR finishing the migration from int64 timestamps to
protobuf `*timestamppb.Timestamps` by removing all usage of the old
int64 fields. In the previous PR
https://github.com/letsencrypt/boulder/pull/7121 all fields were
switched to read from the protobuf timestamppb fields.
Adds a new case to `core.IsAnyNilOrZero` to check various properties of
a `*timestamppb.Timestamp` reducing the visual complexity for receivers.
Fixes https://github.com/letsencrypt/boulder/issues/7060
To simplify deployment of the log validator, this allows wildcards
(using go's filepath.Glob) to be included in the file paths.
In order to detect new files, a new background goroutine polls the glob
patterns every minute for matches.
Because the "monitor" function is running in its own goroutine, a lock
is needed to ensure it's not trying to add new tailers while shutdown is
happening.
The main function in log-validator is overly big, so this refactors it
in preparation for adding support for file globs, which is started in
the (currently draft) #7134
This PR should be functionally a no-op, except for one change:
The 1-per-second ratelimiter is moved to be per-file, so it's
1-per-second-per-file. I think this more closely aligns with what we'd
want, as we could potentially miss that a file had bad lines in it if it
was overwhelmed by another log file at the same time.
Update cert-checker's logic to only ask the Policy Authority if we are
willing to issue for the Subject Alternative Names in a cert, to avoid
asking the PA about the Common Name when that CN is just the empty
string.
Add a new check to cert-checker to ensure that the Common Name is
exactly equal to one of the SANs, to fill a theoretical gap created by
loosening the check above.
Fixes https://github.com/letsencrypt/boulder/issues/7156
The hpcloud version appears abandoned, with numerous unfixed bugs
including ones that can cause it to miss data. The nxadm fork is
maintained.
The updated tail also pulls in an updated fsnotify. We had it vendored
at two paths before, so this has a side benefit of simplifying us to
having just one copy.
The CA, RA, and tools importing the PA (policy authority) will no longer
be able to live reload specific config files. Each location is now
responsible for loading the config file.
* Removed the reloader package
* Removed unused `ecdsa_allow_list_status` metric from the CA
* Removed mutex from all ratelimit `limitsImpl` methods
Fixes https://github.com/letsencrypt/boulder/issues/7111
---------
Co-authored-by: Samantha <hello@entropy.cat>
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
* Adds new `google.protobuf.Timestamp` fields to each .proto file where
we had been using `int64` fields as a timestamp.
* Updates relevant gRPC messages to populate the new
`google.protobuf.Timestamp` fields in addition to the old `int64`
timestamp fields.
* Added tests for each `<x>ToPB` and `PBto<x>` functions to ensure that
new fields passed into a gRPC message arrive as intended.
* Removed an unused error return from `PBToCert` and `PBToCertStatus`
and cleaned up each call site.
Built on-top of https://github.com/letsencrypt/boulder/pull/7069
Part 2 of 4 related to
https://github.com/letsencrypt/boulder/issues/7060
Integrate the key-value rate limits from #6947 into the WFE. Rate limits
are backed by the Redis source added in #7016, and use the SRV record
shard discovery added in #7042.
Part of #5545
The certificates table has no index on expires, so including expires in
the query was causing unnecessarily slow queries.
Expires was part of the query in order to support the "UnexpiredOnly"
feature. This feature isn't really necessary; if we tell cert-checker to
check certain certificates, we want to know about problems even if the
certificates are expired. Deprecate the feature and always include all
certificates in the time range. Remove "expires" from the queries.
This allows simplifying the query structure a bit and inlining query
arguments in each site where they are used.
Update the error message returned when no rows are returned for the
MIN(id) query.
The precursor to UnexpiredOnly was introduced in
https://github.com/letsencrypt/boulder/pull/1644, but I can't find any
reference to a requirement for the flag.
Fixes#7085
* Allows the ceremony tool to add the `onlyContainsCACerts` flag to the
`IssuingDistributionPoint` extension[1] for CRLs.
* Add a lint to detect basic usage of this new flag.
* Add a helper function which doesn't (yet) exist in golang
x/crypto/cryptobyte named `ReadOptionalASN1BooleanWithTag` which
searches for an optional DER-encoded ASN.1 element tagged with a given
tag e.g. onlyContainsUserCerts and reports values back to the caller.
* Each revoked certificate in the CRL config is checked for is `IsCA` to
maintain conformance with RFC 5280 Section 6.3.3 b.2.iii [2].
> (iii) If the onlyContainsCACerts boolean is asserted in the
> IDP CRL extension, verify that the certificate
> includes the basic constraints extension with the cA
> boolean asserted.
Fixes https://github.com/letsencrypt/boulder/issues/7047
1. https://datatracker.ietf.org/doc/html/rfc5280#section-5.2.5
2. https://datatracker.ietf.org/doc/html/rfc5280#section-6.3.3
In #6478, we stopped passing through Redis errors to the top-level
Responder object, preferring instead to live-sign. As part of that
change, we logged the Redis errors so they wouldn't disappear. However,
the sample rate for those errors was hard coded to 1-in-1000, instead of
following the LogSampleRate configured in the JSON.
This adds a field to redisSource for logSampleRate, and passes it
through from the JSON config in ocsp-responder/main.go.
Part of #7091
Replace crl-updater's overly complex RunOnce and updateIssuer methods
with a single, much simpler RunOnce that is modeled off of the
recently-redone continuous Run method's model. Instead of breaking
things down by issuer then shard, simply kick off everything in
parallel. This also improves batch mode's ability to listen for context
cancellations at all the appropriate times.
At the same time, move getShardMappings into the shared updater.go file
because it is used by both the batch and continuous modes of operation,
and improve uniformity of usage of the crlId structure in log output.
Fixes https://github.com/letsencrypt/boulder/issues/7066
Delete our forked version of the x509 library, and update all call-sites
to use the version that we upstreamed and got released in go1.21. This
requires making a few changes to calling code:
- replace crl_x509.RevokedCertificate with x509.RevocationListEntry
- replace RevocationList.RevokedCertificates with
RevocationList.RevokedCertificateEntries
- make RevocationListEntry.ReasonCode a non-pointer integer
Our lints cannot yet be updated to use the new types and fields, because
those improvements have not yet been adopted by the zcrypto/x509 package
used by the linting framework.
Fixes https://github.com/letsencrypt/boulder/issues/6741
From the Go specification:
> "3. If the map is nil, the number of iterations is 0."
https://go.dev/ref/spec#For_range
Therefore, an additional nil check for before the loop is unnecessary.
Overhaul crl-updater's default (i.e. non-runOnce) behavior to update
individual CRL shards continuously, rather than updating all shards in a
large batch.
To accomplish this, it spins up one goroutine for each shard of each
issuer this updater is responsible for. Each goroutine is solely
responsible for its assigned shard. It sleeps for a random amount of
time (to stagger their starts), then begins a ticker to wake up every
updateInterval and re-issue its shard.
As part of this change, refactor updater.go into three separate files
(batch.go, continuous.go, and updater.go) containing functions dedicated
to single-run batch processing, long-running continuous processing, and
shared helpers, respectively.
IN-9475 tracks the deprecation of the `updateOffset` config key. The
other configuration changes in this PR do not require production
changes.
Fixes https://github.com/letsencrypt/boulder/issues/7023
Run staticcheck as a standalone binary rather than as a library via
golangci-lint. From the golangci-lint help out,
> staticcheck (megacheck): It's a set of rules from staticcheck. It's
not the same thing as the staticcheck binary. The author of staticcheck
doesn't support or approve the use of staticcheck as a library inside
golangci-lint.
We decided to disable ST1000 which warns about incorrect or missing
package comments.
For SA4011, I chose to change the semantics[1] of the for loop rather
than ignoring the SA4011 lint for that line.
Fixes https://github.com/letsencrypt/boulder/issues/6988
1. https://go.dev/ref/spec#Continue_statements
Our median CRL generation time is 10 seconds and our 99th percentile CRL
generation time is 30 seconds. Reduce the default update timeout from
one hour to ten minutes, to reduce how long we're locked out of updating
a shard which failed.
In `//cmd/ceremony`:
* Added `CertificateToCrossSignPath` to the `cross-certificate` ceremony
type. This new input field takes an existing certificate that will be
cross-signed and performs checks against the manually configured data in
each ceremony file.
* Added byte-for-byte subject/issuer comparison checks to root,
intermediate, and cross-certificate ceremonies to detect that signing is
happening as expected.
* Added Fermat factorization check from the `//goodkey` package to all
functions that generate new key material.
In `//linter`:
* The Check function now exports linting certificate bytes. The idea is
that a linting certificate's `tbsCertificate` bytes can be compared
against the final certificate's `tbsCertificate` bytes as a verification
that `x509.CreateCertificate` was deterministic and produced identical
DER bytes after each signing operation.
Other notable changes:
* Re-orders the issuers list in each CA config to match staging and
production. There is an ordering issue mentioned by @aarongable two
years ago on IN-5913 that didn't make it's way back to this repository.
> Order here matters – the default chain we serve for each intermediate
should be the first listed chain containing that intermediate.
* Enables `ECDSAForAll` in `config-next` CA configs to match Staging.
* Generates 2x new ECDSA subordinate CAs cross-signed by an RSA root and
adds these chains to the WFE for clients to download.
* Increased the test.sh startup timeout to account for the extra
ceremony run time.
Fixes https://github.com/letsencrypt/boulder/issues/7003
---------
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
This adds a lookup in cert-checker to find the linting precertificate
with the same serial number as a given final certificate, and checks
precertificate correspondence between the two.
Fixes#6959
This design seeks to reduce read-pressure on our DB by moving rate limit
tabulation to a key-value datastore. This PR provides the following:
- (README.md) a short guide to the schemas, formats, and concepts
introduced in this PR
- (source.go) an interface for storing, retrieving, and resetting a
subscriber bucket
- (name.go) an enumeration of all defined rate limits
- (limit.go) a schema for defining default limits and per-subscriber
overrides
- (limiter.go) a high-level API for interacting with key-value rate
limits
- (gcra.go) an implementation of the Generic Cell Rate Algorithm, a
leaky bucket-style scheduling algorithm, used to calculate the present
or future capacity of a subscriber bucket using spend and refund
operations
Note: the included source implementation is test-only and currently
accomplished using a simple in-memory map protected by a mutex,
implementations using Redis and potentially other data stores will
follow.
Part of #5545
Add a new feature flag, LeaseCRLShards, which controls certain aspects
of crl-updater's behavior.
When this flag is enabled, crl-updater calls the new SA.LeaseCRLShard
method before beginning work on a shard. This prevents it from stepping
on the toes of another crl-updater instance which may be working on the
same shard. This is important to prevent two competing instances from
accidentally updating a CRL's Number (which is an integer representation
of its thisUpdate timestamp) *backwards*, which would be a compliance
violation.
When this flag is enabled, crl-updater also calls the new
SA.UpdateCRLShard method after finishing work on a shard.
In the future, additional work will be done to make crl-updater use the
"give me the oldest available shard" mode of the LeaseCRLShard method.
Fixes https://github.com/letsencrypt/boulder/issues/6897
Completely remove the ability to configure Certificate Policy OIDs in
the Boulder CA. Instead, hard-code the Baseline Requirements Domain
Validated Reserved Policy Identifier. Boulder will never perform OV or
EV validation, so this is the only identifier that will be necessary.
In the ceremony tool, introduce additional checks that assert that Root
certificates do not have policies, and Intermediate certificates have
exactly the one Baseline Requirements Domain Validated Reserved Policy
Identifier.
This change replaces [gorp] with [borp].
The changes consist of a mass renaming of the import and comments / doc
fixups, plus modifications of many call sites to provide a
context.Context everywhere, since gorp newly requires this (this was one
of the motivating factors for the borp fork).
This also refactors `github.com/letsencrypt/boulder/db.WrappedMap` and
`github.com/letsencrypt/boulder/db.Transaction` to not embed their
underlying gorp/borp objects, but to have them as plain fields. This
ensures that we can only call methods on them that are specifically
implemented in `github.com/letsencrypt/boulder/db`, so we don't miss
wrapping any. This required introducing a `NewWrappedMap` method along
with accessors `SQLDb()` and `BorpDB()` to get at the internal fields
during metrics and logging setup.
Fixes#6944
The inclusion of Policy Qualifiers inside Policy Information elements of
a Certificate Policies extension is now NOT RECOMMENDED by the Baseline
Requirements. We have already removed these fields from all of our
Boulder configuration, and ceased issuing certificates with Policy
Qualifiers.
Remove all support for configuring and including Policy Qualifiers in
our certificates, both in Boulder's main issuance path and in our
ceremony tool. Switch from using the policyasn1 library to manually
encode these extensions, to using the crypto/x509's
Certificate.PolicyIdentifiers field. Delete the policyasn1 package as it
is no longer necessary.
Fixes https://github.com/letsencrypt/boulder/issues/6880
Update zlint to v3.5.0, which introduces scaffolding for running lints
over CRLs.
Convert all of our existing CRL checks to structs which match the zlint
interface, and add them to the registry. Then change our linter's
CheckCRL function, and crl-checker's Validate function, to run all lints
in the zlint registry.
Finally, update the ceremony tool to run these lints as well.
This change touches a lot of files, but involves almost no logic
changes. It's all just infrastructure, changing the way our lints and
their tests are shaped, and moving test files into new homes.
Fixes https://github.com/letsencrypt/boulder/issues/6934
Fixes https://github.com/letsencrypt/boulder/issues/6979
For "ordinary" errors like "file not found" for some part of the config,
we would prefer to log an error and exit without logging about a panic
and printing a stack trace.
To achieve that, we want to call `defer AuditPanic()` once, at the top
of `cmd/boulder`'s main. That's so early that we haven't yet parsed the
config, which means we haven't yet initialized a logger. We compromise:
`AuditPanic` now calls `log.Get()`, which will retrieve the configured
logger if one has been set up, or will create a default one (which logs
to stderr/stdout).
AuditPanic and Fail/FailOnError now cooperate: Fail/FailOnError panic
with a special type, and AuditPanic checks for that type and prints a
simple message before exiting when it's present.
This PR also coincidentally fixes a bug: panicking didn't previously
cause the program to exit with nonzero status, because it recovered the
panic but then did not explicitly exit nonzero.
Fixes#6933
Previously, we had three chained calls initializing a database:
- InitWrappedDb calls NewDbMap
- NewDbMap calls NewDbMapFromConfig
Since all three are exporetd, this left me wondering when to call one vs
the others.
It turns out that NewDbMap is only called from tests, so I renamed it to
DBMapForTest to make that clear.
NewDbMapFromConfig is only called internally to the SA, so I made it
unexported it as newDbMapFromMysqlConfig.
Also, I copied the ParseDSN call into InitWrappedDb, so it doesn't need
to call DBMapForTest. Now InitWrappedDb and DBMapForTest both
independently call newDbMapFromMysqlConfig.
I also noticed that InitDBMetrics was only called internally so I
unexported it.
Add the necessary scaffolding for deep health checking of our various
gRPC components. Each component implementation that also implements the
grpc.checker interface will be checked periodically, and the health
status of the component will be updated accordingly.
Add the necessary methods to SA to implement the grpc.checker interface
and register these new health checks with Consul.
Additionally:
- Update entry point script to check for ProxySQL readiness.
- Increase the poll rate for gRPC Consul checks from 5s to 2s to help
with DNS failures, due to check failures, on startup.
- Change log level for Consul from INFO to ERROR to deal with noisy logs
full of transport failures due to Consul gRPC checks firing before the
SAs are up.
Fixes#6878
Part of #6795
When a user wants their email address deleted from the database but no
longer has access to their account, this allows an administrator to
clear it.
This adds `admin` as an alias for `admin-revoker`, because we'd like the
clear-email sub-command to be a part of that overall tool, but it's not
really revocation related.
Part of #6864
Move the creation of the FilterSource outside of the conditional block,
so that the underlying source gets wrapped no matter which kind (either
a inMemorySource or a checkedRedisSource) it is.
This has two advantages: first, it means that static ocsp responders are
safer and more accurate, because they're not basing their responses on
both the issuer and the serial, not just the serial; and second, it
makes the current config validation tag which marks the "issuerCerts"
config field as required with `min=1` accurate.
Add per-shard exponential backoff and retry to crl-updater. Each
individual CRL shard will be retried up to MaxAttempts (default 1)
times, with exponential backoff starting at 1 second and maxing out at 1
minute between each attempt.
This can effectively reduce the parallelism of crl-updater: while a
goroutine is sleeping between attempts of a failing shard, it is not
doing work on another shard. This is a desirable feature, since it means
that crl-updater gently reduces the total load it places on the network
and database when shards start to fail.
Setting this new config parameter is tracked in IN-9140
Fixes https://github.com/letsencrypt/boulder/issues/6895
Previously if you passed `-h` or `-help` to a sub-sub-command of
admin-revoker it would error out with a red message and a stack trace
(in addition to printing help).
Now, it will print help and exit 1.
Most boulder components have a command line flag to override what gRPC
and debug port they listen on, which is used in tests to run multiple
instances with the same configuration.
However, CA's flag is named "--ca-addr", and not "--addr". This is
inconsistent with SA, RA, VA, nonce, publisher, and purger.
This flag isn't used in production, where we set it in the config file,
so it shouldn't be a breaking change to rename it.
- Make config validation run by default for all Boulder components with
a registered validator.
- Refactor main to parse `boulder` flags directly instead of declaring
them as subcommands.
- Remove the `validate` subcommand and update relevant docs.
- Fix configuration validation for issuer (file source) OCSP responder.
Fixes#6857Fixes#6763
This PR adds a new configuration block specifically for the otelhttp
instrumentation. This block is separate from the existing
"opentelemetry" configuration, and is only relevant when using otelhttp
instrumentation. It does not share any codepath with the existing
configuration, so it is at the top level to indicate which services it
applies to.
There's a bit of plumbing new configuration through. I've adopted the
measured_http package to also set up opentelemetry instead of just
metrics, which should hopefully allow any future changes to be smaller
(just config & there) and more consistent between the wfe2 and ocsp
responder.
There's one option here now, which disables setting
[otelhttp.WithPublicEndpoint](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#WithPublicEndpoint).
This option is designed to do exactly what we want: Don't accept
incoming spans as parents of the new span created in the server.
Previously we had a setting to disable parent-based sampling to help
with this problem, which doesn't really make sense anymore, so let's
just remove it and simplify that setup path. The default of "false" is
designed to be the safe option. It's set to True in the test/ configs
for integration tests that use traces, and I expect we'll likely set it
true in production eventually once the LBs are configured to handle
tracing themselves.
Fixes#6851
Currently we set WaitForReady(true), which causes gRPC requests to not
fail immediately if no backends are available, but instead wait until
the timeout in case a backend does become available. The downside is
that this behavior masks true connection errors. We'd like to turn it
off.
Fixes#6834
This adds Jaeger's all-in-one dev container (with no persistent storage)
to boulder's dev docker-compose. It configures config-next/ to send all
traces there.
A new integration test creates an account and issues a cert, then
verifies the trace contains some set of expected spans.
This test found that async finalize broke spans, so I fixed that and a
few related spots where we make a new context.
In order to get rid of the orphan queue, we want to make sure that
before we sign a precertificate, we have enough data in the database
that we can fulfill our revocation-checking obligations even if storing
that precertificate in the database fails. That means:
- We should have a row in the certificateStatus table for the serial.
- But we should not serve "good" for that serial until we are positive
the precertificate was issued (BRs 4.9.10).
- We should have a record in the live DB of the proposed certificate's
public key, so the bad-key-revoker can mark it revoked.
- We should have a record in the live DB of the proposed certificate's
names, so it can be revoked if we are required to revoke based on names.
The SA.AddPrecertificate method already achieves these goals for
precertificates by writing to the various metadata tables. This PR
repurposes the SA.AddPrecertificate method to write "proposed
precertificates" instead.
We already create a linting certificate before the precertificate, and
that linting certificate is identical to the precertificate that will be
issued except for the private key used to sign it (and the AKID). So for
instance it contains the right pubkey and SANs, and the Issuer name is
the same as the Issuer name that will be used. So we'll use the linting
certificate as the "proposed precertificate" and store it to the DB,
along with appropriate metadata.
In the new code path, rather than writing "good" for the new
certificateStatus row, we write a new, fake OCSP status string "wait".
This will cause us to return internalServerError to OCSP requests for
that serial (but we won't get such requests because the serial has not
yet been published). After we finish precertificate issuance, we update
the status to "good" with SA.SetCertificateStatusReady.
Part of #6665
Export new prometheus metrics for the `notBefore` and `notAfter` fields
to track internal certificate validity periods when calling the `Load()`
method for a `*tls.Config`. Each metric is labeled with the `serial`
field.
```
tlsconfig_notafter_seconds{serial="2152072875247971686"} 1.664821961e+09
tlsconfig_notbefore_seconds{serial="2152072875247971686"} 1.664821960e+09
```
Fixes https://github.com/letsencrypt/boulder/issues/6829
This moves command() from cmd/ into core.Command() and uses it from the
log and main package, ensuring we have a single implementation of
path.Base(os.Args[0]) instead of scattering that method around. I put it
in core/util.go as similar enough functions like the BuildID and
revision info already live there, though I am not entirely sure it's the
right place.
This came up in @aarongable's review of #6750, where he pointed out I
was manually hardcoding commands instead of using command() or similar.
Add a new shared config stanza which all boulder components can use to
configure their Open Telemetry tracing. This allows components to
specify where their traces should be sent, what their sampling ratio
should be, and whether or not they should respect their parent's
sampling decisions (so that web front-ends can ignore sampling info
coming from outside our infrastructure). It's likely we'll need to
evolve this configuration over time, but this is a good starting point.
Add basic Open Telemetry setup to our existing cmd.StatsAndLogging
helper, so that it gets initialized at the same time as our other
observability helpers. This sets certain default fields on all
traces/spans generated by the service. Currently these include the
service name, the service version, and information about the telemetry
SDK itself. In the future we'll likely augment this with information
about the host and process.
Finally, add instrumentation for the HTTP servers and grpc
clients/servers. This gives us a starting point of being able to monitor
Boulder, but is fairly minimal as this PR is already somewhat unwieldy:
It's really only enough to understand that everything is wired up
properly in the configuration. In subsequent work we'll enhance those
spans with more data, and add more spans for things not automatically
traced here.
Fixes https://github.com/letsencrypt/boulder/issues/6361
---------
Co-authored-by: Aaron Gable <aaron@aarongable.com>
Although #6771 significantly cleaned up how gRPC services stop and clean
up, it didn't make any changes to our HTTP servers or our non-server
(e.g. crl-updater, log-validator) processes. This change finishes the
work.
Add a new helper method cmd.WaitForSignal, which simply blocks until one
of the three signals we care about is received. This easily replaces all
calls to cmd.CatchSignals which passed `nil` as the callback argument,
with the added advantage that it doesn't call os.Exit() and therefore
allows deferred cleanup functions to execute. This new function is
intended to be the last line of main(), allowing the whole process to
exit once it returns.
Reimplement cmd.CatchSignals as a thin wrapper around cmd.WaitForSignal,
but with the added callback functionality. Also remove the os.Exit()
call from CatchSignals, so that the main goroutine is allowed to finish
whatever it's doing, call deferred functions, and exit naturally.
Update all of our non-gRPC binaries to use one of these two functions.
The vast majority use WaitForSignal, as they run their main processing
loop in a background goroutine. A few (particularly those that can run
either in run-once or in daemonized mode) still use CatchSignals, since
their primary processing happens directly on the main goroutine.
The changes to //test/load-generator are the most invasive, simply
because that binary needed to have a context plumbed into it for proper
cancellation, but it already had a custom struct type named "context"
which needed to be renamed to avoid shadowing.
Fixes https://github.com/letsencrypt/boulder/issues/6794
Enable the errcheck linter. Update the way we express exclusions to use
the new, non-deprecated, non-regex-based format. Fix all places where we
began accidentally violating errcheck while it was disabled.
Deprecate the ROCSPStage7 feature flag, which caused the RA and CA to
stop generating OCSP responses when issuing new certs and when revoking
certs. (That functionality is now handled just-in-time by the
ocsp-responder.) Delete the old OCSP-generating codepaths from the RA
and CA. Remove the CA's internal reference to an OCSP implementation,
because it no longer needs it.
Additionally, remove the SA's "Issuers" config field, which was never
used.
Fixes#6285
These config stanzas have been removed in staging and prod. They used to
configure the separate OCSP and CRL gRPC services provided by the CA
process, but the CA now provides those services on the same port as the
main CA gRPC service.
Fixes#6448
The CA, RA, and VA have multiple goroutines running alongside primary
gRPC handling goroutine. These ancillary goroutines should be gracefully
shut down when the process is about to exit. Historically, we have
handled this by putting a call to each of these goroutine's shutdown
function inside cmd.CatchSignals, so that when a SIGINT is received, all
of the various cleanup routines happen in sequence.
But there's a cleaner way to do it: just use defer! All of these
cleanups need to happen after the primary gRPC server has fully shut
down, so that we know they stick around at least as long as the service
is handling gRPC requests. And when the service receives a SIGINT,
cmd.CatchSignals will call the gRPC server's GracefulStop, which will
cause the server's .Serve() to finally exit, which will cause start() to
exit, which will cause main() to exit, which will cause all deferred
functions to be run.
In addition, remove filterShutdownErrors as the bug which made it
necessary (.Serve() returning an error even when GracefulShutdown() is
called) was fixed back in 2017. This allows us to call the start()
function in a much more natural way, simply logging any error it returns
instead of calling os.Exit(1) if it returns an error.
This allows us to simplify the exit-handling code in these three
services' main() functions, and lets us be a bit more idiomatic with our
deferred cleanup functions.
Part of #6794
Deprecate the ROCSPStage6 feature flag. Remove all references to the
`ocspResponse` column from the SA, both when reading from and when
writing to the `certificateStatus` table. This makes it safe to fully
remove that column from the database.
IN-8731 enabled this flag in all environments, so it is safe to
deprecate.
Part of #6285
Delete the ocsp-updater service, and the //ocsp/updater library that
supports it. Remove test configs for the service, and remove references
to the service from other test files.
This service has been fully shut down for an extended period now, and is
safe to remove.
Fixes#6499
As a follow-up to #6780, add the same style of implementation test to
all of our other gRPC services. This was not included in that PR just to
keep it small and single-purpose.
- Require `letsencrypt/validator` package.
- Add a framework for registering configuration structs and any custom
validators for each Boulder component at `init()` time.
- Add a `validate` subcommand which allows you to pass a `-component`
name and `-config` file path.
- Expose validation via exported utility functions
`cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and
`cmd.ValidateYAMLConfig()`.
- Add unit test which validates all registered component configuration
structs against test configuration files.
Part of #6052
When shutting down the CA, we should stop the primary gRPC service
(which waits for any outstanding requests to complete) before shutting
down the other helper goroutines. This prevents panics when an ongoing
gRPC request attempts to write to the OCSPLogQueue, but the queue's
channel has already been closed by the call to ocspi.Stop().
Fixes#6760
Remove tracing using Beeline from Boulder. The only remnant left behind
is the deprecated configuration, to ensure deployability.
We had previously planned to swap in OpenTelemetry in a single PR, but
that adds significant churn in a single change, so we're doing this as
multiple steps that will each be significantly easier to reason about
and review.
Part of #6361
Replace capital letters with lowercase letters in markdown fragments for
compatibility with various markdown renderers. For example, Github
happily accepts fragments as-is, but vscode does not.
Fixes https://github.com/letsencrypt/boulder/issues/6722
In particular, document that it does not purge the Akamai cache.
Also, in the RA, avoid creating a "fake" certificate object containing
only the serial. Instead, use req.Serial directly in most places. This
uncovered some incorrect logic. Fix that logic by gating the operations
that actually need a full *x509.Certificate: revoking by key, and
purging the Akamai cache.
Also, make `req.Serial` mandatory for AdministrativelyRevokeCertificate.
This is a reopen of #6693, which accidentally got merged into a
different feature branch.
Remove the OCSPGenerator and CRLGenerator gRPC servers that run on
separate ports from the CA's main gRPC server, which exposes both those
and the CertificateAuthority service as well. These additional servers
are no longer necessary, now that all three services are exposed on the
single address/port.
Fixes#6448
This was mostly unused. The only caller was orphan-finder, which used it
to determine if a certificate was already in the database. But this is
not particularly important functionality, so I've removed it.
Removes all code related to the `ReuseValidAuthz` feature flag. The
Boulder default is to now always reuse valid authorizations.
Fixes a panic in `test.AssertErrorIs` when `err` is unexpectedly `nil`
that was found this while reworking the
`TestPerformValidationAlreadyValid` test. The go stdlib `func Is`[1]
does not check for this.
1. https://go.dev/src/errors/wrap.go
Part 2/2, fixes https://github.com/letsencrypt/boulder/issues/2734
We rely on the ratelimit/ package in CI to validate our ratelimit
configurations. However, because that package relies on cmd/ just for
cmd.ConfigDuration, many additional dependencies get pulled in.
This refactors just that struct to a separate config package. This was
done using Goland's automatic refactoring tooling, which also organized
a few imports while it was touching them, keeping standard library,
internal and external dependencies grouped.
sa: rename AddPrecertificateRequest.IssuerID
to IssuerNameID. This is in preparation for adding a similarly-named
field to AddSerialRequest.
Part of #5152.
Replace the cmd.ServiceConfig embed with just its components (i.e.
DebugAddr and sometimes TLS) in the WFE, crl-updater, ocsp-updater,
ocsp-responder, and expiration-mailer. These services are not gRPC
services, and therefore do not need the full suite of config keys
introduced by cmd.ServiceConfig.
Blocks #6674
Part of #6052
Use constants from the go stdlib time package, such as time.DateTime and
time.RFC3339, when parsing and formatting timestamps. Additionally,
simplify or remove some of our uses of parsing timestamps, such as to
set fake clocks in tests.
Add the "AsyncFinalize" feature flag. When enabled, this causes the RA
to return almost immediately from FinalizeOrder requests, with the
actual hard work of issuing the precertificate, getting SCTs, issuing
the final certificate, and updating the database accordingly all
occuring in a background goroutine while the client polls the GetOrder
endpoint waiting for the result.
This is implemented by factoring out the majority of the finalization
work into a new `issueCertificateOuter` helper function, and simply
using the new flag to determine whether we call that helper in a
goroutine or not. This makes removing the feature flag in the future
trivially easy.
Also add a new prometheus metric named `inflight_finalizes` which can be
used to count the number of simultaneous goroutines which are performing
finalization work. This metric is exported regardless of the state of
the AsyncFinalize flag, so that we can observe any changes to this
metric when the flag is flipped.
Fixes#6575
Remove the GenerateOCSP and GenerateCRL methods from the
CertificateAuthority gRPC service. These methods are no longer called by
any clients; all clients use their respective OCSPGenerator and
CRLGenerator gRPC services instead.
In addition, remove the CRLGeneratorServer field from the caImpl, as it
no longer needs it to serve as a backing implementation for the
GenerateCRL pass-through method. Unfortunately, we can't remove the
OCSPGeneratorServer field until after ROCSPStage7 is complete, and the
CA is no longer generating an OCSP response during initial certificate
issuance.
Part of #6448