Create a new administration tool "bin/admin" as a successor to and
replacement of "admin-revoker".
This new tool supports all the same fundamental capabilities as the old
admin-revoker, including:
- Revoking by serial, by batch of serials, by incident table, and by
private key
- Blocking a key to let bad-key-revoker take care of revocation
- Clearing email addresses from all accounts that use them
Improvements over the old admin-revoker include:
- All commands run in "dry-run" mode by default, to prevent accidental
executions
- All revocation mechanisms allow setting the revocation reason,
skipping blocking the key, indicating that the certificate is malformed,
and controlling the number of parallel workers conducting revocation
- None of the revocation mechanisms parse the cert in question; that is
left to the RA
- Autogenerated usage information for all subcommands
- A much more modular structure to simplify adding more capabilities in
the future
- Significantly simplified tests with smaller mocks
The new tool has analogues of all of admin-revoker's unit tests, and all
integration tests have been updated to use the new tool instead. A
future PR will remove admin-revoker, once we're sure SRE has had time to
update all of their playbooks.
Fixes https://github.com/letsencrypt/boulder/issues/7135
Fixes https://github.com/letsencrypt/boulder/issues/7269
Fixes https://github.com/letsencrypt/boulder/issues/7268
Fixes https://github.com/letsencrypt/boulder/issues/6927
Part of https://github.com/letsencrypt/boulder/issues/6840
These names corresponded to single instances of a service, and were
primarily used for (a) specifying which interface to bind a gRPC port on
and (b) allowing `health-checker` to check individual instances rather
than a service as a whole.
For (a), change the `--grpc-addr` flags to bind to "all interfaces." For
(b), provide a specific IP address and port for health checking. This
required adding a `--hostOverride` flag for `health-checker` because the
service certificates contain hostname SANs, not IP address SANs.
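As a rough illustration (not health-checker's actual implementation, and
with made-up addresses and file paths), overriding the expected server
name in a gRPC client's TLS config looks something like this:
```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// dialWithHostOverride connects to a specific IP:port but verifies the
// server certificate against the overridden hostname, so certs with only
// hostname SANs still pass verification.
func dialWithHostOverride(addr, hostOverride, caPath string) (*grpc.ClientConn, error) {
	caPEM, err := os.ReadFile(caPath)
	if err != nil {
		return nil, err
	}
	roots := x509.NewCertPool()
	roots.AppendCertsFromPEM(caPEM)

	creds := credentials.NewTLS(&tls.Config{
		RootCAs:    roots,
		ServerName: hostOverride, // e.g. "sa.boulder" while addr is an IP:port
	})
	return grpc.Dial(addr, grpc.WithTransportCredentials(creds))
}

func main() {
	// Example values only.
	conn, err := dialWithHostOverride("10.77.77.77:9095", "sa.boulder", "test/grpc-creds/ca.pem")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}
```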
Clarify the situation with nonce services a little bit. Previously we
had one nonce "service" in Consul and got nonces from that (i.e.
randomly between the two nonce-service instances). Now we have two nonce
services in consul, representing multiple datacenters, and one of them
is explicitly configured as the "get" service, while both are configured
as the "redeem" service.
Part of #7245.
Note this change does not yet get rid of the rednet/bluenet distinction,
nor does it get rid of all use of 10.88.88.88. That will be a followup
change.
Upgrade to zlint v3.6.0
Two new lints are triggered in various places:
aia_contains_internal_names is ignored in integration test
configurations, and unit tests are updated to have more realistic URLs.
The w_subject_common_name_included lint needs to be ignored where we'd
ignored n_subject_common_name_included before.
Related to https://github.com/letsencrypt/boulder/issues/7261
This partially reverts commit 20b121138c,
which was landed in https://github.com/letsencrypt/boulder/pull/7254.
Specifically, it reverts the addition of "noWaitForReady" to the
health-checker's gRPC config. This appears to stop the flaky `last
resolver error: produced zero addresses` failures we've been seeing in
the CI integration tests.
When we have a problem with our authentication certificates, it's better
to get a clear error early than to wait for health checker to time out.
Also, set noWaitForReady in the config, which prevents detailed errors
from being obscured by "timed out" errors.
Part of #7245.
This just provides a unique port for each instance, and breaks the
service<->port mapping. A subsequent PR will move to listening on the
same IP.
Remove unused `-b` variants of crl-storer and akamai-purger.
The new port scheme is that the first instance of a service is on `93xx`
and the second instance of a service is on `94xx`.
Part of a stacked change with #7243.
Change the max value of the CA's `SerialPrefix` config value from 255 (a
byte of all 1s) to 127 (a byte of one 0 followed by seven 1s). This
prevents the serial prefix from ever beginning with a 1.
This is important because serials are interpreted as signed
(two's-complement) integers, and are required to be positive -- a serial
whose first bit is 1 is considered to be negative and therefore in
violation of RFC 5280. The go stdlib fixes this for us by prepending a
zero byte to any serial that begins with a 1 bit, but we'd prefer all
our serials to be the same length.
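As a hedged illustration of the arithmetic (not the CA's actual serial
construction), here is how the prefix byte's top bit affects the DER
encoding of the serial:
```go
package main

import (
	"crypto/rand"
	"encoding/asn1"
	"fmt"
	"math/big"
)

// makeSerial builds an 18-byte serial whose first byte is the configured
// prefix. This mirrors the general idea only; the CA's real serial
// construction may differ in detail.
func makeSerial(prefix byte) *big.Int {
	b := make([]byte, 18)
	b[0] = prefix
	if _, err := rand.Read(b[1:]); err != nil {
		panic(err)
	}
	return new(big.Int).SetBytes(b)
}

func main() {
	// Prefix 0x7f: top bit clear, so the DER INTEGER body stays 18 bytes.
	shortDER, _ := asn1.Marshal(makeSerial(0x7f))
	// Prefix 0xff: top bit set, so the encoder prepends 0x00 to keep the
	// INTEGER positive, growing the body to 19 bytes.
	longDER, _ := asn1.Marshal(makeSerial(0xff))
	fmt.Println(len(shortDER), len(longDER)) // 20 vs 21 (tag + length + body)
}
```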
Corresponding config change was completed in IN-9880.
These feature flags are no longer referenced in any test, staging, or
production configuration. They were removed in:
- StoreRevokerInfo: IN-8546
- ROCSPStage6 and ROCSPStage7: IN-8886
- CAAValidationMethods and CAAAccountURI: IN-9301
This value is always set to the empty string in prod, which (correctly)
results in the issued certificates not having a CRLDP at all. It turns
out our integration test environment has been including CRLDPs in all of
our test certs because we set crlURL to a non-empty value! This change
updates our test configs to match reality.
I'll remove the code which supports this config value as part of my
upcoming CA CRLDP changes.
This is potentially useful for diagnosing issues with connection
timeouts, which could have separate causes from HTTP errors. For
instance, a connection timeout is more likely to be caused by network
congestion or high CPU usage on the load balancer.
In `//cmd/ceremony`:
* Added `CertificateToCrossSignPath` to the `cross-certificate` ceremony
type. This new input field takes an existing certificate that will be
cross-signed and performs checks against the manually configured data in
each ceremony file.
* Added byte-for-byte subject/issuer comparison checks to root,
intermediate, and cross-certificate ceremonies to detect that signing is
happening as expected.
* Added Fermat factorization check from the `//goodkey` package to all
functions that generate new key material.
In `//linter`:
* The Check function now exports linting certificate bytes. The idea is
that a linting certificate's `tbsCertificate` bytes can be compared
against the final certificate's `tbsCertificate` bytes as a verification
that `x509.CreateCertificate` was deterministic and produced identical
DER bytes after each signing operation.
Other notable changes:
* Re-orders the issuers list in each CA config to match staging and
production. There is an ordering issue mentioned by @aarongable two
years ago on IN-5913 that didn't make its way back to this repository.
> Order here matters – the default chain we serve for each intermediate
should be the first listed chain containing that intermediate.
* Enables `ECDSAForAll` in `config-next` CA configs to match Staging.
* Generates 2x new ECDSA subordinate CAs cross-signed by an RSA root and
adds these chains to the WFE for clients to download.
* Increased the test.sh startup timeout to account for the extra
ceremony run time.
Fixes https://github.com/letsencrypt/boulder/issues/7003
---------
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
Simplify the index-picking logic in the SA's leaseOldestCrlShard method.
Specifically, more clearly separate it into "missing" and "non-missing"
cases, which require entirely different logic: picking a random missing
shard, or picking the oldest unleased shard, respectively.
Also change the UpdateCRLShard method to "unlease" shards when they're
updated. This allows the crl-updater to run as quickly as it likes,
while still ensuring that multiple instances do not step on each other's
toes.
The config change to use shardWidth and lookbackPeriod instead of
certificateLifetime has been deployed in prod since IN-8445. The
additional shardWidth change here is just so that the tests neither produce
a bazillion shards, nor have to do a bazillion SA queries for each chunk
within a shard, improving the readability of test logs.
Part of https://github.com/letsencrypt/boulder/issues/7023
The inclusion of Policy Qualifiers inside Policy Information elements of
a Certificate Policies extension is now NOT RECOMMENDED by the Baseline
Requirements. We have already removed these fields from all of our
Boulder configuration, and ceased issuing certificates with Policy
Qualifiers.
Remove all support for configuring and including Policy Qualifiers in
our certificates, both in Boulder's main issuance path and in our
ceremony tool. Switch from using the policyasn1 library to manually
encode these extensions, to using crypto/x509's
Certificate.PolicyIdentifiers field. Delete the policyasn1 package as it
is no longer necessary.
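For illustration, populating the extension via the stdlib field looks
roughly like this (the OID shown is just an example policy, and the
function is a sketch rather than Boulder's actual issuance code):
```go
package issuance

import (
	"crypto/x509"
	"encoding/asn1"
)

// addCertificatePolicies sets the Certificate Policies extension on a
// template using the stdlib field. With no qualifiers to express,
// crypto/x509 encodes each OID as a PolicyInformation with no
// policyQualifiers, so no hand-rolled ASN.1 is needed.
func addCertificatePolicies(tmpl *x509.Certificate) {
	tmpl.PolicyIdentifiers = []asn1.ObjectIdentifier{
		{2, 23, 140, 1, 2, 1}, // CA/Browser Forum "domain-validated" policy, as an example
	}
}
```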
Fixes https://github.com/letsencrypt/boulder/issues/6880
Remove the load test stage of the integration test, which generates
superfluous amounts of log output.
Turn down logging on the CA and VA from info to error-only.
Part of https://github.com/letsencrypt/boulder/issues/6890
When a user wants their email address deleted from the database but no
longer has access to their account, this allows an administrator to
clear it.
This adds `admin` as an alias for `admin-revoker`, because we'd like the
clear-email sub-command to be a part of that overall tool, but it's not
really revocation related.
Part of #6864
Enable SA gRPC health checks in Consul ahead of further changes for
#6878. Calls to the `Check` method of the SA's grpc.health.v1.Health
service must respond `SERVING` before the `sa` service will be
advertised in Consul DNS. Consul will continue to poll this service
every 5 seconds.
- Add `bconsul` docker service to boulder `bluenet` and `rednet`
- Add TLS credentials for `consul.boulder`:
```shell
$ openssl x509 -in consul.boulder/cert.pem -text | grep DNS
DNS:consul.boulder
```
- Update `test/grpc-creds/generate.sh` to add `consul.boulder`
- Update test SA configs to allow `consul.boulder` access to
`grpc.health.v1.Health`
Part of #6878
In order to get rid of the orphan queue, we want to make sure that
before we sign a precertificate, we have enough data in the database
that we can fulfill our revocation-checking obligations even if storing
that precertificate in the database fails. That means:
- We should have a row in the certificateStatus table for the serial.
- But we should not serve "good" for that serial until we are positive
the precertificate was issued (BRs 4.9.10).
- We should have a record in the live DB of the proposed certificate's
public key, so the bad-key-revoker can mark it revoked.
- We should have a record in the live DB of the proposed certificate's
names, so it can be revoked if we are required to revoke based on names.
The SA.AddPrecertificate method already achieves these goals for
precertificates by writing to the various metadata tables. This PR
repurposes the SA.AddPrecertificate method to write "proposed
precertificates" instead.
We already create a linting certificate before the precertificate, and
that linting certificate is identical to the precertificate that will be
issued except for the private key used to sign it (and the AKID). So for
instance it contains the right pubkey and SANs, and the Issuer name is
the same as the Issuer name that will be used. So we'll use the linting
certificate as the "proposed precertificate" and store it to the DB,
along with appropriate metadata.
In the new code path, rather than writing "good" for the new
certificateStatus row, we write a new, fake OCSP status string "wait".
This will cause us to return internalServerError to OCSP requests for
that serial (but we won't get such requests because the serial has not
yet been published). After we finish precertificate issuance, we update
the status to "good" with SA.SetCertificateStatusReady.
Part of #6665
Deprecate the ROCSPStage7 feature flag, which caused the RA and CA to
stop generating OCSP responses when issuing new certs and when revoking
certs. (That functionality is now handled just-in-time by the
ocsp-responder.) Delete the old OCSP-generating codepaths from the RA
and CA. Remove the CA's internal reference to an OCSP implementation,
because it no longer needs it.
Additionally, remove the SA's "Issuers" config field, which was never
used.
Fixes #6285
These config stanzas have been removed in staging and prod. They used to
configure the separate OCSP and CRL gRPC services provided by the CA
process, but the CA now provides those services on the same port as the
main CA gRPC service.
Fixes #6448
Deprecate the ROCSPStage6 feature flag. Remove all references to the
`ocspResponse` column from the SA, both when reading from and when
writing to the `certificateStatus` table. This makes it safe to fully
remove that column from the database.
IN-8731 enabled this flag in all environments, so it is safe to
deprecate.
Part of #6285
Delete the ocsp-updater service, and the //ocsp/updater library that
supports it. Remove test configs for the service, and remove references
to the service from other test files.
This service has been fully shut down for an extended period now, and is
safe to remove.
Fixes #6499
When sending an ARI response, write the Retry-After header before
writing the JSON response body. This is necessary because
http.ResponseWriter implicitly calls WriteHeader whenever Write is
called, flushing all headers to the network and preventing any
additional headers from being written. Unfortunately, the unittests use
httptest.ResponseRecorder, which doesn't seem to enforce this invariant
(it's happy to report headers which were written after the body). Add a
header check to the integration tests, to make up for this deficiency.
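For reference, the required ordering with net/http looks like this
(handler path and retry interval are illustrative, not Boulder's actual
WFE code):
```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

func renewalInfoHandler(w http.ResponseWriter, r *http.Request) {
	resp := map[string]any{"suggestedWindow": map[string]string{}}

	// Headers must be set before the first Write: ResponseWriter.Write
	// implicitly calls WriteHeader(200), which flushes all headers to the
	// network. Anything added afterwards is silently ignored.
	w.Header().Set("Retry-After", "21600") // illustrative 6-hour interval
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)

	if err := json.NewEncoder(w).Encode(resp); err != nil {
		log.Printf("writing ARI response: %v", err)
	}
}

func main() {
	http.HandleFunc("/renewalInfo/", renewalInfoHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```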
- Require `letsencrypt/validator` package.
- Add a framework for registering configuration structs and any custom
validators for each Boulder component at `init()` time.
- Add a `validate` subcommand which allows you to pass a `-component`
name and `-config` file path.
- Expose validation via exported utility functions
`cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and
`cmd.ValidateYAMLConfig()`.
- Add unit test which validates all registered component configuration
structs against test configuration files.
Part of #6052
- Consistently format existing test JSON config files
- Add a small Python script which loads and dumps JSON files
- Add a JSON lint test to CI
---------
Co-authored-by: Aaron Gable <aaron@aarongable.com>
Remove the OCSPGenerator and CRLGenerator gRPC servers that run on
separate ports from the CA's main gRPC server, which exposes both those
and the CertificateAuthority service as well. These additional servers
are no longer necessary, now that all three services are exposed on the
single address/port.
Fixes #6448
Removes all code related to the `ReuseValidAuthz` feature flag. The
Boulder default is to now always reuse valid authorizations.
Fixes a panic in `test.AssertErrorIs` when `err` is unexpectedly `nil`,
which was found while reworking the
`TestPerformValidationAlreadyValid` test. The Go stdlib `func Is`[1]
does not check for this.
1. https://go.dev/src/errors/wrap.go
Part 2/2, fixes https://github.com/letsencrypt/boulder/issues/2734
Remove the GenerateOCSP and GenerateCRL methods from the
CertificateAuthority gRPC service. These methods are no longer called by
any clients; all clients use their respective OCSPGenerator and
CRLGenerator gRPC services instead.
In addition, remove the CRLGeneratorServer field from the caImpl, as it
no longer needs it to serve as a backing implementation for the
GenerateCRL pass-through method. Unfortunately, we can't remove the
OCSPGeneratorServer field until after ROCSPStage7 is complete, and the
CA is no longer generating an OCSP response during initial certificate
issuance.
Part of #6448
Remove `example.com` domain name, which was used by the deleted OldTLS
tests.
Remove GODEBUG=x509sha1=1.
Add a longer comment for the Consul DNS fallback in docker-compose.yml.
Use the "dnsAuthority" field for all gRPC clients in config-next,
instead of implicitly relying on the system DNS. This matches what we do
in prod.
Make "dnsAuthority" field of GRPCClientConfig mandatory whenever
SRVLookup or SRVLookups is used.
Make test/config/ocsp-responder.json use ServerAddress instead of
SRVLookup, like the rest of test/config.
Assign nonce prefixes for each nonce-service by taking the first eight
characters of the base64url-encoded HMAC-SHA256 hash of the RPC
listening address, computed using a provided key. The provided key must
be the same across all boulder-wfe and nonce-service instances.
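A sketch of that derivation (the exact base64 variant and truncation in
Boulder may differ slightly):
```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// noncePrefix derives the prefix for a nonce-service instance from its RPC
// listening address and a key shared by all WFE and nonce-service instances.
func noncePrefix(listenAddr string, key []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(listenAddr))
	return base64.RawURLEncoding.EncodeToString(mac.Sum(nil))[:8]
}

func main() {
	key := []byte("example-shared-key") // illustrative; prod uses a real secret
	fmt.Println(noncePrefix("10.77.77.77:9101", key))
	fmt.Println(noncePrefix("10.77.77.78:9101", key))
}
```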
- Add a custom `grpc-go` load balancer implementation (`nonce`) which
can route nonce redemption RPC messages by matching the prefix to the
derived prefix of the nonce-service instance which created it.
- Modify the RPC client constructor to allow the operator to override
the default load balancer implementation (`round_robin`).
- Modify the `srv` RPC resolver to accept a comma separated list of
targets to be resolved.
- Remove unused nonce-service `-prefix` flag.
Fixes #6404
In the WFE, ocsp-responder, and crl-updater, switch from using
StorageAuthorityClients to StorageAuthorityReadOnlyClients. This ensures
that these services cannot call methods which write to our database.
Fixes #6454
Make minor changes to our implementation of CAA Account and Method
Binding, as a result of reviewing the code in preparation for enabling
it in production. Specifically:
- Ensure that the validation method and account ID are included at the
request level, rather than waiting until we perform the checks which use
those parameters;
- Clean up code which assumed the validation method and account ID might
not be populated;
- Use the core.AcmeChallenge type (rather than plain string) for the
validation method everywhere;
- Update comments to reference the latest version and correct sections
of the CAA RFCs; and
- Remove the CAA feature flags from the config integration tests to
reflect that they are not yet enabled in prod.
I have reviewed this code side-by-side with RFC 8659 (CAA) and RFC 8657
(ACME CAA Account and Method Binding) and believe it to be compliant
with both.
Deprecate these feature flags, which are consistently set in both prod
and staging and which we do not expect to change the value of ever
again:
- AllowReRevocation
- AllowV1Registration
- CheckFailedAuthorizationsFirst
- FasterNewOrdersRateLimit
- GetAuthzReadOnly
- GetAuthzUseIndex
- MozRevocationReasons
- RejectDuplicateCSRExtensions
- RestrictRSAKeySizes
- SHA1CSRs
Move each feature flag to the "deprecated" section of features.go.
Remove all references to these feature flags from Boulder application
code, and make the code they were guarding the only path. Deduplicate
tests which were testing both the feature-enabled and feature-disabled
code paths. Remove the flags from all config-next JSON configs (but
leave them in config ones until they're fully deleted, not just
deprecated). Finally, replace a few testdata CSRs used in CA tests,
because they had SHA1WithRSAEncryption signatures that are now rejected.
Fixes #5171
Fixes #6476
Part of #5997
Now that we have the ability to easily add multiple gRPC services to the
same server, and control access to each service individually, use that
capability to expose the CA's CertificateAuthority, OCSPGenerator, and
CRLGenerator services all on the same address/port. This will make
establishing connections to the CA easier, but no less secure.
Part of #6448
Add the Issuing Distribution Point extension to all of our end-entity
CRLs. The extension contains the Distribution Point, the URL from
which this CRL is meant to be downloaded. Because our CRLs are
sharded, this URL prevents an on-path attacker from substituting a
different shard than the client expected in order to hide a revocation.
The extension also contains the OnlyContainsUserCerts boolean,
because our CRLs only contain end-entity certificates.
The Distribution Point url is constructed from a configurable base URI,
the issuer's NameID, the shard index, and the suffix ".crl". The base
URI must use the "http://" scheme and must not end with a slash.
openssl displays the IDP extension as:
```
X509v3 Issuing Distribution Point: critical
Full Name:
URI:http://c.boulder.test/66283756913588288/0.crl
Only User Certificates
```
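A sketch of the URL construction (names are illustrative; Boulder's
actual helper may differ):
```go
package main

import "fmt"

// crlDistributionPoint builds the shard URL embedded in the IDP extension,
// matching the shape shown in the openssl output above.
func crlDistributionPoint(baseURL string, issuerNameID int64, shardIdx int) string {
	// baseURL must use the http:// scheme and must not end with a slash.
	return fmt.Sprintf("%s/%d/%d.crl", baseURL, issuerNameID, shardIdx)
}

func main() {
	fmt.Println(crlDistributionPoint("http://c.boulder.test", 66283756913588288, 0))
	// http://c.boulder.test/66283756913588288/0.crl
}
```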
Fixes #6410
These flags are set in both staging and prod. Deprecate them, make
all code gated behind them the only path, and delete code (multi_source)
which was only accessible when these flags were not set.
Part of #6285
- Add a new gRPC client config field which overrides the dNSName checked in the
certificate presented by the gRPC server.
- Revert all test gRPC credentials to `<service>.boulder`
- Revert all ClientNames in gRPC server configs to `<service>.boulder`
- Set all gRPC clients in `test/config` to use `serverAddress` + `hostOverride`
- Set all gRPC clients in `test/config-next` to use `srvLookup` + `hostOverride`
- Rename incorrect SRV record for `ca` with port `9096` to `ca-ocsp`
- Rename incorrect SRV record for `ca` with port `9106` to `ca-crl`
Resolves #6424
- Add a dedicated Consul container
- Replace `sd-test-srv` with Consul
- Add documentation for configuring Consul
- Re-issue all gRPC credentials for `<service-name>.service.consul`
Part of #6111
Honeycomb was emitting logs directly to stderr like this:
```
WARN: Missing API Key.
WARN: Dataset is ignored in favor of service name. Data will be sent to service name: boulder
```
Fix this by providing a fake API key and replacing "dataset" with "serviceName" in configs. Also add missing Honeycomb configs for crl-updater.
For stdout-only logger, include checksums and escape newlines.
Make every function in the Run -> Tick -> tickIssuer -> tickShard chain
return an error. Make that return value a named return (which we usually
avoid) so that we can remove the manual setting of the metric result
label and have the deferred metric handling function take care of that
instead. In addition, let that cleanup function wrap the returned error
(if any) with the identity of the shard, issuer, or tick that is
returning it, so that we don't have to include that info in every
individual error message. Finally, have the functions which spin off
many helpers (Tick and tickIssuer) collect all of their helpers' errors
and only surface that error at the end, to ensure the process completes
even in the presence of transient errors.
In crl-updater's main, surface the error returned by Run or Tick, to
make debugging easier.
Now that both crl-updater and crl-storer are running in prod,
run this integration test in both test environments as well.
In addition, remove the fake storer grpc client that the updater
used when no storer client was configured, as storer clients
are now configured in all environments.
Emit PEM output instead of pretty-printed output. Send the pretty-printed
output straight to stdout instead of via a logger, so the internal newlines don't
get escaped.
Fixes #6310
Allow the crl-storer to load whole AWS config files. Although
this requires a deployment to maintain an additional config
file for the crl-storer, and one in a format we usually don't
use, it does give us lots of flexibility in setting up things like
role assumption.
Also remove the S3Region config flag, as it is now redundant
with the contents of the config file, and rename the existing
S3CredsFile config key to AWSCredsFile to better represent
its true contents.
Fixes #6308
Create a new crl-storer service, which receives CRL shards via gRPC and
uploads them to an S3 bucket. It ignores AWS SDK configuration in the
usual places, in favor of configuration from our standard JSON service
config files. It ensures that the CRLs it receives parse and are signed
by the appropriate issuer before uploading them.
Integrate crl-updater with the new service. It streams bytes to the
crl-storer as it receives them from the CA, without performing any
checking at the same time. This new functionality is disabled if the
crl-updater does not have a config stanza instructing it how to connect
to the crl-storer.
Finally, add a new test component, the s3-test-srv. This acts similarly
to the existing mail-test-srv: it receives requests, stores information
about them, and exposes that information for later querying by the
integration test. The integration test uses this to ensure that a
newly-revoked certificate does show up in the next generation of CRLs
produced.
Fixes #6162
Add a new config key `UpdateOffset` to crl-updater, which causes it to
run on a regular schedule rather than running immediately upon startup
and then every `UpdatePeriod` after that. It is safe for this new config
key to be omitted and take the default zero value.
Also add a new command line flag `runOnce` to crl-updater which causes
it to immediately run a single time and then exit, rather than running
continuously as a daemon. This will be useful for integration tests and
emergency situations.
Part of #6163
Add a new filter to mail-test-srv, allowing test processes to query
for messages sent from a specific address, not just ones sent to
a specific address. This fixes a race condition in the revocation
integration tests where the number of messages sent to a cert's
contact address would be higher than expected because expiration
mailer sent a message while the test was running. Also reduce
bad-key-revoker's maximum backoff to 2 seconds to ensure that
it continues to run frequently during the integration tests, despite
usually not having any work to do.
While we're here, also improve the comments on various revocation
integration tests, remove some unnecessary cruft, and split the tests
out to explicitly test functionality with the MozRevocationReasons
flag both enabled and disabled. Also, change ocsp_helper's default
output from os.Stdout to ioutil.Discard to prevent hundreds of lines
of log spam when the integration tests fail during a test that uses
that library.
Fixes #6248
Previously we used "ExpectedFreshness" to control how frequently the
Redis source would request re-signing of stale entries. But that field
also controls whether multi_source is willing to serve a MariaDB
response. It's better to split these into two values.
Create a new service named crl-updater. It is responsible for
maintaining the full set of CRLs we issue: one "full and complete" CRL
for each currently-active Issuer, split into a number of "shards" which
are essentially CRLs with arbitrary scopes.
The crl-updater is modeled after the ocsp-updater: it is a long-running
standalone service that wakes up periodically, does a large amount of
work in parallel, and then sleeps. The period at which it wakes to do
work is configurable. Unlike the ocsp-updater, it does all of its work
every time it wakes, so we expect to set the update frequency at 6-24
hours.
Maintaining CRL scopes is done statelessly. Every certificate belongs to
a specific "bucket", given its notAfter date. This mapping is generally
unchanging over the life of the certificate, so revoked certificate
entries will not be moving between shards upon every update. The only
exception is if we change the number of shards, in which case all of the
bucket boundaries will be recomputed. For more details, see the comment
on `getShardBoundaries`.
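A simplified sketch of the bucketing idea (the real boundary math in
`getShardBoundaries` handles anchoring and edge cases this omits):
```go
package main

import (
	"fmt"
	"time"
)

// shardFor maps a certificate's notAfter into a shard index. Each shard owns
// the fixed-width time windows congruent to its index mod numShards, so an
// entry stays in the same shard for its whole life unless numShards changes.
func shardFor(notAfter time.Time, shardWidth time.Duration, numShards int) int {
	window := notAfter.UnixNano() / shardWidth.Nanoseconds()
	return int(window % int64(numShards))
}

func main() {
	notAfter := time.Date(2023, 4, 1, 12, 0, 0, 0, time.UTC)
	fmt.Println(shardFor(notAfter, 16*time.Hour, 128)) // example width and count
}
```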
It uses the new SA.GetRevokedCerts method to collect all of the revoked
certificates whose notAfter timestamps fall within the boundaries of
each shard's time-bucket. It uses the new CA.GenerateCRL method to sign
the CRLs. In the future, it will send signed CRLs to the crl-storer to
be persisted outside our infrastructure.
Fixes #6163
This copies over settings from config-next that are now deployed in prod.
Also, I updated a comment in sd-test-srv to more accurately describe how SRV records work.
- Add new configuration key `throughput`, a mapping which contains all
throughput related akamai-purger settings.
- Deprecate configuration key `purgeInterval` in favor of `purgeBatchInterval` in
the new `throughput` configuration mapping.
- When no `throughput` or `purgeInterval` is provided, the purger uses optimized
default settings which offer 1.9x the throughput of current production settings.
- At startup, all throughput related settings are modeled to ensure that we
don't exceed the limits imposed on us by Akamai.
- Queue is now `[][]string`, instead of `[]string`.
- When a given queue entry is purged, we know all 3 of its URLs were purged.
- At startup we know the size of a theoretical request to purge based on the
number of queue entries included.
- Raises the queue size from ~333-thousand cached OCSP responses to
1.25-million, which is roughly 6 hours of work using the optimized default
settings
- Raise `purgeInterval` in test config from 1ms, which violates API limits, to 800ms
Fixes #5984
These tests are testing functionality that is no longer in use in
production deployments of Boulder. As we go about removing wfe1
functionality, these tests will break, so let's just remove them
wholesale right now. I have verified that all of the tests removed in
this PR are duplicated against wfe2.
One of the changes in this PR is to cease starting up the wfe1 process
in the integration tests at all. However, that component was serving
requests for the AIA Issuer URL, which gets queried by various OCSP and
revocation tests. In order to keep those tests working, this change also
adds an integration-test-only handler to wfe2, and updates the CA
configuration to point at the new handler.
Part of #5681
This scans the database for certificateStatus rows, gets them signed by the CA, and writes them to Redis.
Also, bump the default PoolSize for Redis to 100.
This is a sort of proof of concept of the Redis interaction, which will
evolve into a tool for inspection and manual repair of missing entries,
if we find ourselves needing to do that.
The important bits here are rocsp/rocsp.go and
cmd/rocsp-tool/main.go. Also, the newly-vendored Redis client.
This allows repeated runs using the same hierarchy, and avoids spurious
errors from ocsp-updater saying "This CA doesn't have an issuer cert
with ID XXX"
Fixes #5721
The expiration mailer processes certificates in batches of size
`certLimit` (default 100). In production, it runs in daemon mode, so it
will go on to the next batch when the current one is done. However, in
local integration tests we rely on it getting all its work done in a
single run. This works when you're running from a clean slate, but if
you've run integration tests a bunch of times, there will be a bunch of
certificates from previous runs that clog up the queue, and it won't
send mail for the specific certificate the integration test is looking
for.
Solution: Set `certLimit` very high in the config.
Also, update the default times for sending mail to match what we have in
prod.
Instead of using the default `json.Unmarshal`, explicitly
construct and use a `json.Decoder` so that we can set the
`DisallowUnknownFields` flag on the decoder. This causes
any unrecognized config keys to result in errors at boulder
startup time.
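The pattern in question, with an illustrative config struct and file path
(not a real Boulder component config):
```go
package main

import (
	"encoding/json"
	"log"
	"os"
)

// Illustrative config struct; real Boulder configs are per-component.
type config struct {
	DebugAddr string `json:"debugAddr"`
}

func loadConfig(path string) (*config, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var c config
	dec := json.NewDecoder(f)
	// Reject any key in the JSON that has no corresponding struct field,
	// turning typos and stale keys into startup errors instead of silence.
	dec.DisallowUnknownFields()
	if err := dec.Decode(&c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	if _, err := loadConfig("test/config/example.json"); err != nil {
		log.Fatalf("reading config: %s", err)
	}
}
```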
Fixes #5643
Throw away the result of parsing various command-line flags in
cert-checker. Leave the flags themselves in place to avoid breaking
any scripts which pass them, but only respect the values provided by
the config file.
Part of #5489
Delete the expired-authz-purger2 binary, as well as the
various config files, tests, and test helpers that exist to
support it.
This utility is no longer necessary, as it has not been
running for quite some time, and we have developed
alternative means of keeping the growth of the authz
table under control.
Fixes #5568
Delete the boulder-janitor binary, and the various configs
and tests which exist to support it.
This tool has not been actively running in quite some time.
The tables which it covers are either supported by our
more recent partitioning methods, or are rate-limit tables
that we hope to move out of mysql entirely. The cost of
maintaining the janitor is not offset by the benefits it brings
us (or the lack thereof).
Fixes #5569
In the CA, compute the notAfter timestamp such that the cert is actually
valid for the intended duration, not for one second longer. In the
Issuance library, compute the validity period by including the full
length of the final second indicated by the notAfter date when
determining if the certificate request matches our profile. Update tests
and config files to match.
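The arithmetic, sketched (Boulder's CA and issuance code are the source
of truth; this just shows the inclusive-endpoint adjustment):
```go
package main

import (
	"fmt"
	"time"
)

// notAfterFor computes an inclusive notAfter so the certificate is valid for
// exactly `validity`, not one second longer, since RFC 5280 treats both the
// notBefore and notAfter seconds as valid.
func notAfterFor(notBefore time.Time, validity time.Duration) time.Time {
	return notBefore.Add(validity - time.Second)
}

func main() {
	notBefore := time.Date(2021, 7, 1, 0, 0, 0, 0, time.UTC)
	fmt.Println(notAfterFor(notBefore, 90*24*time.Hour)) // 2021-09-28 23:59:59 +0000 UTC
}
```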
Fixes #5473
In prod, the CA is now configured to issue certificates with
notAfter timestamps 7775999 seconds (90 days minus one second) after
their notBefore timestamp, and to enforce that same difference when
validating
issuance requests.
Update our test configs to match.
- Remove field `ECDSAAllowedAccounts` from CA
- Remove `ECDSAAllowedAccounts` from CA tests
- Replace `ECDSAAllowedAccounts` with `ECDSAAllowListFilename` in
`test/config/ca-a.json` and `test/config/ca-b.json`
- Add YAML allow list file at `test/config/ecdsaAllowList.yml`
Fixes #5394
Add tool to audit subscriber registrations for e-mail addresses that
`notify-mailer` is currently configured to skip.
- Add `cmd/contact-auditor` with README
- Add test coverage for `cmd/contact-auditor`
- Add config file at `test/config/contact-auditor`
Part of #5372
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).
Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.
Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
Solve a nil pointer dereference of `ecdsaAllowList` in `boulder-ca` by
calling `reloader.New()` in constructor `ca.NewECDSAAllowListFromFile`
instead.
- Add missing entry `ECDSAAllowListFilename` to
`test/config-next/ca-a.json` and `test/config-next/ca-b.json`
- Add missing file ecdsaAllowList.yml to `test/config-next`
- Add missing entry `ECDSAAllowedAccounts` to `test/config/ca-a.json`
and `test/config/ca-b.json`
- Move creation of the reloader to `NewECDSAAllowListFromFile`
Fixes #5414
Only process OCSP generation requests which are identified
by the certificate's serial number and the ID (not NameID,
unfortunately) of its issuer. Delete the code path which handled
OCSP generation for requests identified by the full DER of
the certificate in question.
Update existing tests to use serial+id to request OCSP, and
move test cases from the old `TestGenerateOCSPWithIssuerID`
into the default test method.
Part of #5079
- Edit integration test to start expired-authz-purger2 with config/
config-next
- Move config from `cmd/expired-authz-purger2/config.json` to
`test/config/expired-authz-purger2.json`
- Add a copy of `test/config/expired-authz-purger2.json` to
`test/config-next/`
Fixes #5351
The old `config.Common.CT.IntermediateBundleFilename` format is no
longer used in any production configs, and can be removed safely.
Part of #5162
Part of #5242
Fixes #5269
In #5235 we replaced MaxDBConns with MaxOpenConns.
One week ago MaxDBConns was removed from all dev, staging, and
production configurations. This change completes the removal of
MaxDBConns from all components and test/config.
Fixes #5249
This change simplifies and hardens the wfe2's support for having
multiple issuers, and multiple chains for each issuer, configured
and loaded in memory.
The only config-visible change is replacing the old two separate config
values (`certificateChains` and `alternateCertificateChains`) with a
single value (`chains`). This new value does not require the user to
know and hand-code the AIA URLs at which the certificates are available;
instead the chains are simply presented as lists of files. If this new
config value is present, the old config values will be ignored; if it
is not, the old config values will be respected.
Behind the scenes, the chain loading code has been completely changed.
Instead of loading PEM bytes directly from the file, and then asserting
various things (line endings, no trailing bits, etc) about those bytes,
we now parse a certificate from the file, and in-memory recreate the
PEM from that certificate. This approach allows the file loading to be
much more forgiving, while also being stricter: we now check that each
certificate in the chain is correctly signed by the next cert, and that
the last cert in the chain is a self-signed root.
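A sketch of those per-link checks using the stdlib (not the WFE's actual
loader):
```go
package wfe

import (
	"crypto/x509"
	"fmt"
)

// validateChain checks that each certificate in the chain is signed by the
// next one, and that the final certificate is a self-signed root.
func validateChain(chain []*x509.Certificate) error {
	if len(chain) == 0 {
		return fmt.Errorf("empty chain")
	}
	for i := 0; i < len(chain)-1; i++ {
		if err := chain[i].CheckSignatureFrom(chain[i+1]); err != nil {
			return fmt.Errorf("cert %d not signed by cert %d: %w", i, i+1, err)
		}
	}
	root := chain[len(chain)-1]
	if err := root.CheckSignatureFrom(root); err != nil {
		return fmt.Errorf("last cert in chain is not a self-signed root: %w", err)
	}
	return nil
}
```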
Within the WFE itself, most of the internal structure has been retained.
However, both the internal `issuerCertificates` (used for checking
that certs we are asked to revoke were in fact issued by us) and the
`certificateChains` (used to append chains to end-entity certs when
served to clients) have been updated to be maps keyed by IssuerNameID.
This allows revocation checking to not have to iterate through the
whole list of issuers, and also makes it easy to double-check that
the signatures on end-entity certs are valid before serving them. Actual
checking of the validity will come in a follow-up change, due to the
invasive nature of the necessary test changes.
Fixes #5164
Previously, configuration of the boulder-janitor was split into
two places: the actual json config file (which controlled which
jobs would be enabled, and what their rate limits should be), and
the janitor code itself (which controlled which tables and columns
those jobs should query). This resulted in significant duplicated
code, as most of the jobs were identical except for their table
and column names.
This change abstracts away the query which jobs use to find work.
Instead of having each job type parse its own config and produce
its own work query (in Go code), now each job supplies just a few
key values (the table name and two column names) in its JSON config,
and the Go code assembles the appropriate query from there. We are
able to delete all of the files defining individual job types, and
replace them with a single slightly smarter job constructor.
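A rough sketch of the idea (field and column names are illustrative, not
the janitor's actual schema):
```go
package main

import "fmt"

// jobConfig carries the few per-job values that now live in JSON config.
type jobConfig struct {
	Table      string `json:"table"`
	ExpiresCol string `json:"expiresColumn"`
	IDCol      string `json:"idColumn"`
}

// workQuery assembles the SELECT that finds rows old enough to delete.
// The cutoff and batch size are bound as parameters at execution time.
func workQuery(c jobConfig) string {
	return fmt.Sprintf(
		"SELECT %s FROM %s WHERE %s <= ? ORDER BY %s LIMIT ?",
		c.IDCol, c.Table, c.ExpiresCol, c.IDCol,
	)
}

func main() {
	fmt.Println(workQuery(jobConfig{Table: "certificates", ExpiresCol: "expires", IDCol: "id"}))
}
```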
This enables further refactorings, namely:
* Moving all of the logic code into its own module;
* Ensuring that the exported interface of that module is safe (i.e.
that a client cannot create and run jobs without them being valid,
because the only exposed methods ensure validity);
* Collapsing validity checks into a single location;
* Various renamings.