Commit Graph

522 Commits

Author SHA1 Message Date
Phil Porada 3366be50f1
Use RFC 7093 truncated SHA256 hash for Subject Key Identifier (#7179)
- Adds a feature flag to gate rollout for SHA256 Subject Key Identifiers
for end-entity certificates.
- The ceremony tool will now use the RFC 7093 section 2 option 1 method
for generating Subject Key Identifiers for future root CA, intermediate
CA, and cross-sign ceremonies.

- - - -

[RFC 7093 section 2 option
1](https://datatracker.ietf.org/doc/html/rfc7093#section-2) provides a
method for generating a truncated SHA256 hash for the Subject Key
Identifier field in accordance with Baseline Requirement [section
7.1.2.11.4 Subject Key
Identifier](90a98dc7c1/docs/BR.md (712114-subject-key-identifier)).

> [RFC5280] specifies two examples for generating key identifiers from
>    public keys.  Four additional mechanisms are as follows:
> 
>    1) The keyIdentifier is composed of the leftmost 160-bits of the
>       SHA-256 hash of the value of the BIT STRING subjectPublicKey
>       (excluding the tag, length, and number of unused bits).

The related [RFC 5280 section
4.2.1.2](https://datatracker.ietf.org/doc/html/rfc5280#section-4.2.1.2)
states:
>   For CA certificates, subject key identifiers SHOULD be derived from
>   the public key or a method that generates unique values.  Two common
>   methods for generating key identifiers from the public key are:
>   ...
>   Other methods of generating unique numbers are also acceptable.
2023-12-06 13:44:17 -05:00
Matthew McPherrin 32adaf1846
Make log-validator take glob patterns to monitor for log files (#7172)
To simplify deployment of the log validator, this allows wildcards
(using go's filepath.Glob) to be included in the file paths.

In order to detect new files, a new background goroutine polls the glob
patterns every minute for matches.

Because the "monitor" function is running in its own goroutine, a lock
is needed to ensure it's not trying to add new tailers while shutdown is
happening.
2023-11-27 12:48:46 -08:00
Aaron Gable 16081d8e30
Invert RequireCommonName into AllowNoCommonName (#7139)
The RequireCommonName feature flag was our only "inverted" feature flag,
which defaulted to true and had to be explicitly set to false. This
inversion can lead to confusion, especially to readers who expect all Go
default values to be zero values. We plan to remove the ability for our
feature flag system to support default-true flags, which the existence
of this flag blocked. Since this flag has not been set in any real
configs, inverting it is easy.

Part of https://github.com/letsencrypt/boulder/issues/6802
2023-11-06 10:58:30 -08:00
Aaron Gable 81cb970d30
Remove crlURL from test CA issuer configs (#7132)
This value is always set to the empty string in prod, which (correctly)
results in the issued certificates not having a CRLDP at all. It turns
out our integration test environment has been including CRLDPs in all of
our test certs because we set crlURL to a non-empty value! This change
updates our test configs to match reality.

I'll remove the code which supports this config value as part of my
upcoming CA CRLDP changes.
2023-11-02 11:20:50 -07:00
Samantha 9aef5839b5
WFE: Add new key-value ratelimits implementation (#7089)
Integrate the key-value rate limits from #6947 into the WFE. Rate limits
are backed by the Redis source added in #7016, and use the SRV record
shard discovery added in #7042.

Part of #5545
2023-10-04 14:12:38 -04:00
Aaron Gable 3b880e1ccf
Add CAAAfterValidation feature flag (#7082)
Add a new feature flag "CAAAfterValidation" which, when set to true in
the VA, causes the VA to only begin CAA checks after basic domain
control validation has completed successfully. This will make successful
validations take longer, since the DCV and CAA checks are performed
serially instead of in parallel. However, it will also reduce the number
of CAA checks we perform by up to 80%, since such a high percentage of
validations also fail.

IN-9575 tracks enabling this feature flag in staging and prod
Fixes https://github.com/letsencrypt/boulder/issues/7058
2023-09-18 13:30:31 -07:00
Aaron Gable 102b447e8d
Smoother scheduling and leasing for crl-updater (#7010)
Overhaul crl-updater's default (i.e. non-runOnce) behavior to update
individual CRL shards continuously, rather than updating all shards in a
large batch.

To accomplish this, it spins up one goroutine for each shard of each
issuer this updater is responsible for. Each goroutine is solely
responsible for its assigned shard. It sleeps for a random amount of
time (to stagger their starts), then begins a ticker to wake up every
updateInterval and re-issue its shard.

As part of this change, refactor updater.go into three separate files
(batch.go, continuous.go, and updater.go) containing functions dedicated
to single-run batch processing, long-running continuous processing, and
shared helpers, respectively.

IN-9475 tracks the deprecation of the `updateOffset` config key. The
other configuration changes in this PR do not require production
changes.

Fixes https://github.com/letsencrypt/boulder/issues/7023
2023-09-08 09:16:15 -07:00
Phil Porada 72e01b337a
ceremony: Distinguish between intermediate and cross-sign ceremonies (#7005)
In `//cmd/ceremony`:
* Added `CertificateToCrossSignPath` to the `cross-certificate` ceremony
type. This new input field takes an existing certificate that will be
cross-signed and performs checks against the manually configured data in
each ceremony file.
* Added byte-for-byte subject/issuer comparison checks to root,
intermediate, and cross-certificate ceremonies to detect that signing is
happening as expected.
* Added Fermat factorization check from the `//goodkey` package to all
functions that generate new key material.

In `//linter`: 
* The Check function now exports linting certificate bytes. The idea is
that a linting certificate's `tbsCertificate` bytes can be compared
against the final certificate's `tbsCertificate` bytes as a verification
that `x509.CreateCertificate` was deterministic and produced identical
DER bytes after each signing operation.

Other notable changes:
* Re-orders the issuers list in each CA config to match staging and
production. There is an ordering issue mentioned by @aarongable two
years ago on IN-5913 that didn't make it's way back to this repository.
> Order here matters – the default chain we serve for each intermediate
should be the first listed chain containing that intermediate.
* Enables `ECDSAForAll` in `config-next` CA configs to match Staging.
* Generates 2x new ECDSA subordinate CAs cross-signed by an RSA root and
adds these chains to the WFE for clients to download.
* Increased the test.sh startup timeout to account for the extra
ceremony run time.


Fixes https://github.com/letsencrypt/boulder/issues/7003

---------

Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
2023-08-23 14:01:19 -04:00
Aaron Gable 6a450a2272
Improve CRL shard leasing (#7030)
Simplify the index-picking logic in the SA's leaseOldestCrlShard method.
Specifically, more clearly separate it into "missing" and "non-missing"
cases, which require entirely different logic: picking a random missing
shard, or picking the oldest unleased shard, respectively.

Also change the UpdateCRLShard method to "unlease" shards when they're
updated. This allows the crl-updater to run as quickly as it likes,
while still ensuring that multiple instances do not step on each other's
toes.

The config change for shardWidth and lookbackPeriod instead of
certificateLifetime has been deployed in prod since IN-8445. The config
change changing the shardWidth is just so that the tests neither produce
a bazillion shards, nor have to do a bazillion SA queries for each chunk
within a shard, improving the readability of test logs.

Part of https://github.com/letsencrypt/boulder/issues/7023
2023-08-08 17:05:00 -07:00
Aaron Gable 9a4f0ca678
Deprecate LeaseCRLShards feature (#7009)
This feature flag is enabled in both staging and prod.
2023-08-07 15:17:00 -07:00
Jacob Hoffman-Andrews 725f190c01
ca: remove orphan queue code (#7025)
The `orphanQueueDir` config field is no longer used anywhere.

Fixes #6551
2023-08-02 16:04:28 -07:00
Jacob Hoffman-Andrews 8d7b87c9ca
cert-checker: check for precertificate correspondence (#7015)
This adds a lookup in cert-checker to find the linting precertificate
with the same serial number as a given final certificate, and checks
precertificate correspondence between the two.

Fixes #6959
2023-07-28 12:45:47 -04:00
Aaron Gable 908421bb98
crl-updater: lease CRL shards to prevent races (#6941)
Add a new feature flag, LeaseCRLShards, which controls certain aspects
of crl-updater's behavior.

When this flag is enabled, crl-updater calls the new SA.LeaseCRLShard
method before beginning work on a shard. This prevents it from stepping
on the toes of another crl-updater instance which may be working on the
same shard. This is important to prevent two competing instances from
accidentally updating a CRL's Number (which is an integer representation
of its thisUpdate timestamp) *backwards*, which would be a compliance
violation.

When this flag is enabled, crl-updater also calls the new
SA.UpdateCRLShard method after finishing work on a shard.

In the future, additional work will be done to make crl-updater use the
"give me the oldest available shard" mode of the LeaseCRLShard method.

Fixes https://github.com/letsencrypt/boulder/issues/6897
2023-07-19 15:11:16 -07:00
Aaron Gable e09c5faf5e
Deprecate CAA AccountURI and ValidationMethods feature flags (#7000)
These flags are set to true in all environments.
2023-07-14 14:54:39 -04:00
Jacob Hoffman-Andrews cd24b9db20
ca: deprecate StoreLintingCertificateInsteadOfPrecertificate (#6970)
And turn off the orphan queue in config-next.
2023-07-05 10:44:08 -07:00
Samantha 124c4cc6f5
grpc/sa: Implement deep health checks (#6928)
Add the necessary scaffolding for deep health checking of our various
gRPC components. Each component implementation that also implements the
grpc.checker interface will be checked periodically, and the health
status of the component will be updated accordingly.

Add the necessary methods to SA to implement the grpc.checker interface
and register these new health checks with Consul.

Additionally:
- Update entry point script to check for ProxySQL readiness.
- Increase the poll rate for gRPC Consul checks from 5s to 2s to help
with DNS failures, due to check failures, on startup.
- Change log level for Consul from INFO to ERROR to deal with noisy logs
full of transport failures due to Consul gRPC checks firing before the
SAs are up.

Fixes #6878
Part of #6795
2023-06-12 13:58:53 -04:00
Jacob Hoffman-Andrews 2041e8723b
integration: shorten log output (#6894)
Remove the load test stage of the integration test, which generates
superfluous amounts of log.

Turn down logging on the CA and VA from info to error-only.

Part of https://github.com/letsencrypt/boulder/issues/6890
2023-06-05 13:11:19 -04:00
Jacob Hoffman-Andrews 80e1510819
admin: add clear-email subcommand (#6919)
When a user wants their email address deleted from the database but no
longer has access to their account, this allows an administrator to
clear it.

This adds `admin` as an alias for `admin-revoker`, because we'd like the
clear-email sub-command to be a part of that overall tool, but it's not
really revocation related.

Part of #6864
2023-06-01 14:33:24 -04:00
Samantha f09a94bd74
consul: Configure gRPC health check for SA (#6908)
Enable SA gRPC health checks in Consul ahead of further changes for
#6878. Calls to the `Check` method of the SA's grpc.health.v1.Health
service must respond `SERVING` before the `sa` service will be
advertised in Consul DNS. Consul will continue to poll this service
every 5 seconds.

- Add `bconsul` docker service to boulder `bluenet` and `rednet`
- Add TLS credentials for `consul.boulder`:
  ```shell
  $ openssl x509 -in consul.boulder/cert.pem -text | grep DNS
                DNS:consul.boulder
  ```
- Update `test/grpc-creds/generate.sh` to add `consul.boulder`
- Update test SA configs to allow `consul.boulder` to access to
`grpc.health.v1.Health`

Part of #6878
2023-05-23 13:16:49 -04:00
Aaron Gable fe523f142d
crl-updater: retry failed shards (#6907)
Add per-shard exponential backoff and retry to crl-updater. Each
individual CRL shard will be retried up to MaxAttempts (default 1)
times, with exponential backoff starting at 1 second and maxing out at 1
minute between each attempt.

This can effectively reduce the parallelism of crl-updater: while a
goroutine is sleeping between attempts of a failing shard, it is not
doing work on another shard. This is a desirable feature, since it means
that crl-updater gently reduces the total load it places on the network
and database when shards start to fail.

Setting this new config parameter is tracked in IN-9140
Fixes https://github.com/letsencrypt/boulder/issues/6895
2023-05-22 12:59:09 -07:00
Matthew McPherrin b7d9f8c2e3
In config-next/, opentelemetry -> openTelemetry for consistency (#6888)
In configs, opentelemetry -> openTelemetry

As pointed out in review of #6867, these should match the case of their
corresponding Go identifiers for consistency.

JSON keys are case-insensitive in Go (part of why we've got a fork in
go-jose),
so this change should have no functional impact.
2023-05-15 17:07:29 -04:00
Matthew McPherrin 3aae67b8a9
Opentelemetry: Add option for public endpoints (#6867)
This PR adds a new configuration block specifically for the otelhttp
instrumentation. This block is separate from the existing
"opentelemetry" configuration, and is only relevant when using otelhttp
instrumentation. It does not share any codepath with the existing
configuration, so it is at the top level to indicate which services it
applies to.

There's a bit of plumbing new configuration through. I've adopted the
measured_http package to also set up opentelemetry instead of just
metrics, which should hopefully allow any future changes to be smaller
(just config & there) and more consistent between the wfe2 and ocsp
responder.

There's one option here now, which disables setting
[otelhttp.WithPublicEndpoint](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#WithPublicEndpoint).
This option is designed to do exactly what we want: Don't accept
incoming spans as parents of the new span created in the server.
Previously we had a setting to disable parent-based sampling to help
with this problem, which doesn't really make sense anymore, so let's
just remove it and simplify that setup path. The default of "false" is
designed to be the safe option. It's set to True in the test/ configs
for integration tests that use traces, and I expect we'll likely set it
true in production eventually once the LBs are configured to handle
tracing themselves.

Fixes #6851
2023-05-12 15:34:34 -04:00
Samantha 310546a14e
VA: Support discovery of DNS resolvers via Consul (#6869)
Deprecate `va.DNSResolver` in favor of backwards compatible
`va.DNSProvider`.

Fixes #6852
2023-05-12 12:54:31 -04:00
Samantha 19c5244088
test: Use consul hostname instead of IP for dnsAuthority (#6883)
Standardize on hostnames for dnsAuthority to match production. 

Related to #6869
2023-05-11 14:13:53 -07:00
Jacob Hoffman-Andrews f295626e4c
ca: remove simulated ISRG OID from config (#6879)
We intend to issue in the future with only the CA/Browser Forum Domain
Validated OID.
2023-05-10 12:39:12 -04:00
Jacob Hoffman-Andrews ac4be89b56
grpc: add NoWaitForReady config field (#6850)
Currently we set WaitForReady(true), which causes gRPC requests to not
fail immediately if no backends are available, but instead wait until
the timeout in case a backend does become available. The downside is
that this behavior masks true connection errors. We'd like to turn it
off.

Fixes #6834
2023-05-09 16:16:44 -07:00
Matthew McPherrin 8427245675
OTel Integration test using jaeger (#6842)
This adds Jaeger's all-in-one dev container (with no persistent storage)
to boulder's dev docker-compose. It configures config-next/ to send all
traces there.

A new integration test creates an account and issues a cert, then
verifies the trace contains some set of expected spans.

This test found that async finalize broke spans, so I fixed that and a
few related spots where we make a new context.
2023-05-05 10:41:29 -04:00
Jacob Hoffman-Andrews 1c7e0fd1d8
Store linting certificate instead of precertificate (#6807)
In order to get rid of the orphan queue, we want to make sure that
before we sign a precertificate, we have enough data in the database
that we can fulfill our revocation-checking obligations even if storing
that precertificate in the database fails. That means:

- We should have a row in the certificateStatus table for the serial.
- But we should not serve "good" for that serial until we are positive
the precertificate was issued (BRs 4.9.10).
- We should have a record in the live DB of the proposed certificate's
public key, so the bad-key-revoker can mark it revoked.
- We should have a record in the live DB of the proposed certificate's
names, so it can be revoked if we are required to revoke based on names.

The SA.AddPrecertificate method already achieves these goals for
precertificates by writing to the various metadata tables. This PR
repurposes the SA.AddPrecertificate method to write "proposed
precertificates" instead.

We already create a linting certificate before the precertificate, and
that linting certificate is identical to the precertificate that will be
issued except for the private key used to sign it (and the AKID). So for
instance it contains the right pubkey and SANs, and the Issuer name is
the same as the Issuer name that will be used. So we'll use the linting
certificate as the "proposed precertificate" and store it to the DB,
along with appropriate metadata.

In the new code path, rather than writing "good" for the new
certificateStatus row, we write a new, fake OCSP status string "wait".
This will cause us to return internalServerError to OCSP requests for
that serial (but we won't get such requests because the serial has not
yet been published). After we finish precertificate issuance, we update
the status to "good" with SA.SetCertificateStatusReady.

Part of #6665
2023-04-26 13:54:24 -07:00
Aaron Gable 45329c9472
Deprecate ROCSPStage7 flag (#6804)
Deprecate the ROCSPStage7 feature flag, which caused the RA and CA to
stop generating OCSP responses when issuing new certs and when revoking
certs. (That functionality is now handled just-in-time by the
ocsp-responder.) Delete the old OCSP-generating codepaths from the RA
and CA. Remove the CA's internal reference to an OCSP implementation,
because it no longer needs it.

Additionally, remove the SA's "Issuers" config field, which was never
used.

Fixes #6285
2023-04-12 17:03:06 -07:00
Aaron Gable 7e994a1216
Deprecate ROCSPStage6 feature flag (#6770)
Deprecate the ROCSPStage6 feature flag. Remove all references to the
`ocspResponse` column from the SA, both when reading from and when
writing to the `certificateStatus` table. This makes it safe to fully
remove that column from the database.

IN-8731 enabled this flag in all environments, so it is safe to
deprecate.

Part of #6285
2023-04-04 15:41:51 -07:00
Aaron Gable 8c67769be4
Remove ocsp-updater from Boulder (#6769)
Delete the ocsp-updater service, and the //ocsp/updater library that
supports it. Remove test configs for the service, and remove references
to the service from other test files.

This service has been fully shut down for an extended period now, and is
safe to remove.

Fixes #6499
2023-03-31 14:39:04 -07:00
Samantha b2224eb4bc
config: Add validation tags to all configuration structs (#6674)
- Require `letsencrypt/validator` package.
- Add a framework for registering configuration structs and any custom
validators for each Boulder component at `init()` time.
- Add a `validate` subcommand which allows you to pass a `-component`
name and `-config` file path.
- Expose validation via exported utility functions
`cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and
`cmd.ValidateYAMLConfig()`.
- Add unit test which validates all registered component configuration
structs against test configuration files.

Part of #6052
2023-03-21 14:08:03 -04:00
Aaron Gable 6d6f3632da
Change SetCommonName to RequireCommonName (#6749)
Change the SetCommonName flag, introduced in #6706, to
RequireCommonName. Rather than having the flag control both whether or
not a name is hoisted from the SANs into the CN *and* whether or not the
CA is willing to issue certs with no CN, this updated flag now only
controls the latter. By default, the new flag is true, and continues our
current behavior of failing issuance if we cannot set a CN in the cert.
When the flag is set to false, then we are willing to issue certificates
for which the CSR contains no CN and there is no SAN short enough to be
hoisted into the CN field.

When we have rolled out this change, we can move on to the next flag in
this series: HoistCommonName, which will control whether or not a SAN is
hoisted at all, effectively giving the CSRs (and therefore the clients)
full control over whether their certificate contains a SAN.

This change is safe because no environment explicitly sets the
SetCommonName flag to false yet.

Fixes #5112
2023-03-21 11:07:06 -07:00
Matthew McPherrin 05c9106eba
lints: Consistently format JSON configuration files (#6755)
- Consistently format existing test JSON config files
- Add a small Python script which loads and dumps JSON files
- Add CI JSON lint test to CI

---------

Co-authored-by: Aaron Gable <aaron@aarongable.com>
2023-03-20 18:11:19 -04:00
Matthew McPherrin e1ed1a2ac2
Remove beeline tracing (#6733)
Remove tracing using Beeline from Boulder. The only remnant left behind
is the deprecated configuration, to ensure deployability.

We had previously planned to swap in OpenTelemetry in a single PR, but
that adds significant churn in a single change, so we're doing this as
multiple steps that will each be significantly easier to reason about
and review.

Part of #6361
2023-03-14 15:14:27 -07:00
Aaron Gable 9af4871e59
Add SetCommonName feature flag (#6706)
Add a new feature flag, `SetCommonName`, which defaults to `true`. In
this default state, no behavior changes.

When set to `false` on the CA, this flag will cause the CA to leave the
Subject commonName field of the certificate blank, as is recommended by
the Baseline Requirements Section 7.1.4.2.2(a).

Also slightly modify the behavior of the RA's `matchesCSR()` function,
to allow for both certificates that have a CN and certificates that
don't. It is not feasible to put this behavior behind the same
SetCommonName flag, because that would require an atomic deploy of both
the RA and the CA.

Obsoletes #5112
2023-03-09 13:31:55 -05:00
Aaron Gable 29bf521121
CA: Remove secondary gRPC servers (#6496)
Remove the OCSPGenerator and CRLGenerator gRPC servers that run on
separate ports from the CA's main gRPC server, which exposes both those
and the CertificateAuthority service as well. These additional servers
are no longer necessary, now that all three services are exposed on the
single address/port.

Fixes #6448
2023-03-01 11:45:28 -08:00
Samantha 98ef3bb2b4
VA/config: Remove unused va.CAA service in config (#6697)
GRPC config from `va.VA` is used for both `va.VA` and `va.CAA`.
2023-02-27 13:44:47 -05:00
Samantha a0fe7dc93e
SA: Remove Redis config (#6695)
This field doesn't appear to be in use.

Part of #6052
2023-02-27 09:29:38 -08:00
Aaron Gable cdf1a6f9f9
Add flag to make order finalization async (#6589)
Add the "AsyncFinalize" feature flag. When enabled, this causes the RA
to return almost immediately from FinalizeOrder requests, with the
actual hard work of issuing the precertificate, getting SCTs, issuing
the final certificate, and updating the database accordingly all
occuring in a background goroutine while the client polls the GetOrder
endpoint waiting for the result.

This is implemented by factoring out the majority of the finalization
work into a new `issueCertificateOuter` helper function, and simply
using the new flag to determine whether we call that helper in a
goroutine or not. This makes removing the feature flag in the future
trivially easy.

Also add a new prometheus metric named `inflight_finalizes` which can be
used to count the number of simultaneous goroutines which are performing
finalization work. This metric is exported regardless of the state of
the AsyncFinalize flag, so that we can observe any changes to this
metric when the flag is flipped.

Fixes #6575
2023-02-24 09:57:54 -08:00
Jacob Hoffman-Andrews 79250756bf
expiration-mailer: limit number of mails sent to same address per day (#6675)
This adds a config field, "mailsPerAddressPerDay." Addresses that get
that many mails won't receive any more until the next day (UTC).

Fixes #6508.
2023-02-22 15:24:31 -08:00
Phil Porada 6c84a69043
Remove MandatoryPOSTasGET flag (#6672)
Remove the `MandatoryPOSTasGET` flag from the WFE2.
Update the ACMEv2 divergence doc to note that neither staging nor
production use MandatoryPOSTasGET.

Fixes #6582.
2023-02-17 13:04:31 -05:00
Jacob Hoffman-Andrews cd1bbc0d82
Tidy up integration test environment (#6668)
Remove `example.com` domain name, which was used by the deleted OldTLS
tests.

Remove GODEBUG=x509sha1=1.

Add a longer comment for the Consul DNS fallback in docker-compose.yml.

Use the "dnsAuthority" field for all gRPC clients in config-next,
instead of implicitly relying on the system DNS. This matches what we do
in prod.

Make "dnsAuthority" field of GRPCClientConfig mandatory whenever
SRVLookup or SRVLookups is used.

Make test/config/ocsp-responder.json use ServerAddress instead of
SRVLookup, like the rest of test/config.
2023-02-16 09:33:24 -08:00
Aaron Gable f9e4fb6c06
Add replication lag retries to some SA methods (#6649)
Add a new time.Duration field, LagFactor, to both the SA's config struct
and the read-only SA's implementation struct. In the GetRegistration,
GetOrder, and GetAuthorization2 methods, if the database select returned
a NoRows error and a lagFactor duration is configured, then sleep for
lagFactor seconds and retry the select.

This allows us to compensate for the replication lag between our primary
write database and our read-only replica databases. Sometimes clients
will fire requests in rapid succession (such as creating a new order,
then immediately querying the authorizations associated with that
order), and the subsequent requests will fail because they are directed
to read replicas which are lagging behind the primary. Adding this
simple sleep-and-retry will let us mitigate many of these failures,
without adding too much complexity.

Fixes #6593
2023-02-14 17:25:13 -08:00
Samantha 5c49231ea6
ROCSP: Remove support for Redis Cluster (#6645)
Fixes #6517
2023-02-09 17:14:37 -05:00
Phil Porada 134321040b
Default ReuseValidAuthz to true (#6644)
`ReuseValidAuthz` was introduced
here [1] and enabled in staging and production configs on 2016-07-13. 
There was a brief stint during the TLS-SNI-01 challenge type removal where 
SRE disabled it. However, time has finally come to remove this configuration
option. Issue #6623 will determine the feasibility of shorter authz
lifetimes and potentially the removal of authz reuse.

This change is broken up into two parts to allow SRE to safely remove
the flag from staging and production configs. We'll merge this PR, SRE
will deploy boulder and the config change, then we'll finish removing
`ReuseValidAuthz` configuration from the codebase.

[1] boulder commit 9abc212448

Part 1 of 2 for fixing #2734.
2023-02-09 14:26:06 -05:00
Samantha d73125d8f6
WFE: Add custom balancer implementation which routes nonce redemption RPCs by prefix (#6618)
Assign nonce prefixes for each nonce-service by taking the first eight
characters of the the base64url encoded HMAC-SHA256 hash of the RPC
listening address using a provided key. The provided key must be same
across all boulder-wfe and nonce-service instances.
- Add a custom `grpc-go` load balancer implementation (`nonce`) which
can route nonce redemption RPC messages by matching the prefix to the
derived prefix of the nonce-service instance which created it.
- Modify the RPC client constructor to allow the operator to override
the default load balancer implementation (`round_robin`).
- Modify the `srv` RPC resolver to accept a comma separated list of
targets to be resolved.
- Remove unused nonce-service `-prefix` flag.

Fixes #6404
2023-02-03 17:52:18 -05:00
Jacob Hoffman-Andrews e57c788086
Add checking of validations to cert-checker (#6617)
This includes two feature flags: one that controls turning on the extra
database queries, and one that causes cert-checker to fail on missing
validations. If the second flag isn't turned on, it will just emit error
log lines. This will help us find any edge conditions we need to deal
with before making the new code trigger alerts.

Fixes #6562
2023-02-03 16:25:41 -05:00
Jacob Hoffman-Andrews 9d3f7d8f84
Add timeout config to WFE (#6621) 2023-01-30 10:07:41 -08:00
Aaron Gable a7dc34f127
ocsp-responder: make db config optional (#6601)
In #6293, we gave the ocsp-responder the ability to use a gRPC
connection to the SA to get status information for certificates, rather
than using a database connection directly. However, that change
neglected to make the database connection configuration optional: an
ocsp-responder with an SA gRPC client configured would never use its
database connection, but if it wasn't configured it would refuse to
start. Fix this oversight by making the DBConfig stanza optional.
2023-01-26 15:21:39 -08:00
Phil Porada 3866e4f60d
VA: Use default PortConfig during testing (#6609)
Part of #3940
2023-01-25 16:16:08 -05:00
Phil Porada aae4175186
Remove deprecated feature flags (#6566)
Remove deprecated feature flags.

Fixes #6559
2023-01-23 20:56:15 -05:00
Matthew McPherrin 1f6a873fcc
Remove MandatoryPOSTAsGET from config-next (#6585)
In preparation for removing this flag completely in #6582 , remove it
from config-next. This matches boulder's configuration in all LE
environments.
2023-01-12 17:42:28 -08:00
Samantha 6c6da76400
ROCSP: Replace Redis Cluster with a consistently sharded all-primary nodes (#6516) 2022-12-19 15:06:47 -05:00
Jacob Hoffman-Andrews fe2cf7d136
ocsp: add load shedding for live signer (#6523)
In live.go we use a semaphore to limit how many inflight signing
requests we can have, so a flood of OCSP traffic doesn't flood our CA
instances. If traffic exceeds our capacity to sign responses for long
enough, we want to eventually start fast-rejecting inbound requests that
are unlikely to get serviced before their deadline is reached. To do
that, add a MaxSigningWaiters config field to the OCSP responder.

Note that the files in //semaphore are forked from x/sync/semaphore,
with modifications to add the MaxWaiters field and functionality.

Fixes #6392
2022-12-12 15:48:44 -08:00
Aaron Gable ba34ac6b6e
Use read-only SA clients in wfe, ocsp, and crl (#6484)
In the WFE, ocsp-responder, and crl-updater, switch from using
StorageAuthorityClients to StorageAuthorityReadOnlyClients. This ensures
that these services cannot call methods which write to our database.

Fixes #6454
2022-12-02 13:48:28 -08:00
Jacob Hoffman-Andrews 75338135e4
expiration-mailer: use a JOIN to find work more efficiently (#6439)
Right now the expiration mailer does one big SELECT on
`certificateStatus` to find certificates to work on, then several
thousand SELECTs of individual serial numbers in `certificates`.

Since it's more efficient to get that data as a stream from a single
query, rather than thousands of separate queries, turn that into a JOIN.

NOTE: We used to use a JOIN, and switched to the current approach in
#2440 for performance reasons. I _believe_ part of the issue was that at
the time we were not using READ UNCOMMITTED, so we may have been slowing
down the database by requiring it to keep copies of a lot of rows during
the query. Still, it's possible that I've misunderstood the performance
characteristics here and it will still be a regression to use JOIN. So
I've gated the new behavior behind a feature flag.

The feature flag required extracting a new function, `getCerts`. That in
turn required changing some return types so we are not as closely tied
to `core.Certificate`. Instead we use a new local type named
`certDERWithRegId`, which can be provided either by the new code path or
the old code path.
2022-11-14 17:34:58 -08:00
Aaron Gable 4f473edfa8
Deprecate 10 feature flags (#6502)
Deprecate these feature flags, which are consistently set in both prod
and staging and which we do not expect to change the value of ever
again:
- AllowReRevocation
- AllowV1Registration
- CheckFailedAuthorizationsFirst
- FasterNewOrdersRateLimit
- GetAuthzReadOnly
- GetAuthzUseIndex
- MozRevocationReasons
- RejectDuplicateCSRExtensions
- RestrictRSAKeySizes
- SHA1CSRs

Move each feature flag to the "deprecated" section of features.go.
Remove all references to these feature flags from Boulder application
code, and make the code they were guarding the only path. Deduplicate
tests which were testing both the feature-enabled and feature-disabled
code paths. Remove the flags from all config-next JSON configs (but
leave them in config ones until they're fully deleted, not just
deprecated). Finally, replace a few testdata CSRs used in CA tests,
because they had SHA1WithRSAEncryption signatures that are now rejected.

Fixes #5171 
Fixes #6476
Part of #5997
2022-11-14 09:24:50 -08:00
Aaron Gable 9e67423110
Create new StorageAuthorityReadOnly gRPC service (#6483)
Create a new gRPC service named StorageAuthorityReadOnly which only
exposes a read-only subset of the existing StorageAuthority service's
methods.

Implement this by splitting the existing SA in half, and having the
read-write half embed and wrap an instance of the read-only half.
Unfortunately, many of our tests use exported read-write methods as part
of their test setup, so the tests are all being performed against the
read-write struct, but they are exercising the same code as the
read-only implementation exposes.

Expose this new service at the SA on the same port as the existing
service, but with (in config-next) different sets of allowed clients. In
the future, read-only clients will be removed from the read-write
service's set of allowed clients.

Part of #6454
2022-11-09 11:09:12 -08:00
Aaron Gable 4466c953de
CA: Expose all gRPC services on single address (#6495)
Now that we have the ability to easily add multiple gRPC services to the
same server, and control access to each service individually, use that
capability to expose the CA's CertificateAuthority, OCSPGenerator, and
CRLGenerator services all on the same address/port. This will make
establishing connections to the CA easier, but no less secure.

Part of #6448
2022-11-08 15:28:59 -08:00
Aaron Gable 46c8d66c31
bgrpc.NewServer: support multiple services (#6487)
Turn bgrpc.NewServer into a builder-pattern, with a config-based
initialization, multiple calls to Add to add new gRPC services, and a
final call to Build to produce the start() and stop() functions which
control server behavior. All calls are chainable to produce compact code
in each component's main() function.

This improves the process of creating a new gRPC server in three ways:
1) It avoids the need for generics/templating, which was slightly
verbose.
2) It allows the set of services to be registered on this server to be
known ahead of time.
3) It greatly streamlines adding multiple services to the same server,
which we use today in the VA and will be using soon in the SA and CA.

While we're here, add a new per-service config stanza to the
GRPCServerConfig, so that individual services on the same server can
have their own configuration. For now, only provide a "ClientNames" key,
which will be used in a follow-up PR.

Part of #6454
2022-11-04 13:26:42 -07:00
Aaron Gable 6efd941e3c
Stabilize CRL shard boundaries (#6445)
Add two new config keys to the crl-updater:
* shardWidth, which controls the width of the chunks that we divide all
of time into, with a default value of "16h" (approximately the same as
today's shard width derived from 128 shards covering 90 days); and
* lookbackPeriod, which controls the amount of already-expired
certificates that should be included in our CRLs to ensure that even
certificates which are revoked immediately before they expire still show
up in aborts least one CRL, with a default value of "24h" (approximately
the same as today's lookback period derived from our run frequency of
6h).

Use these two new values to change the way CRL shards are computed.

Previously, we would compute the total time we care about based on the
configured certificate lifetime (to determine how far forward to look)
and the configured update period (to determine how far back to look),
and then divide that time evenly by the number of shards. However, this
method had two fatal flaws. First, if the certificate lifetime is
configured incorrectly, then the CRL updater will fail to query the
database for some certs that should be included in the CRLs. Second, if
the update period is changed, this would change the lookback period,
which in turn would change the shard width, causing all CRL entries to
suddenly change which shard they're in.

Instead, first compute all chunk locations based only on the shard width
and number of shards. Then determine which chunks we need to care about
based on the configured lookback period and by querying the database for
the farthest-future expiration, to ensure we cover all extant
certificates. This may mean that more than one chunk of time will get
mapped to a single shard, but that's okay -- each chunk will remain
mapped to the same shard for the whole time we care about it.

Fixes #6438
Fixes #6440
2022-10-27 15:59:48 -07:00
Aaron Gable ab4b1eb3e1
Add ROCSPStage7 flag to disable OCSP calls (#6461)
Rather than simply refusing to write OCSP Response bytes to the
database (which is what ROCSP Stage 6 did), Stage 7 refuses to
even generate those bytes in the first place. We obviously can't
disable OCSP Response generation in the CA, since it still needs to
be usable by the ocsp-responder's live-signing path, so instead we
disable it in all of the non-live-signing codepaths (orphan finder,
issue precertificate, revoke certificate, and re-revoke certificate)
which have previously called GenerateOCSP.

Part of #6285
2022-10-21 17:24:19 -07:00
Aaron Gable 02432fcd51
RA: Use OCSPGenerator gRPC service (#6453)
When the RA is generating OCSP (as part of new issuance, revocation,
or when its own GenerateOCSP method is called by the ocsp-responder)
have it use the CA's dedicated OCSPGenerator service, rather than
calling the method exposed by the CA's catch-all CertificateAuthority
service. To facilitate this, add a new GRPCClientConfig stanza to the
RA.

This change will allow us to remove the GenerateOCSP and GenerateCRL
methods from the catch-all CertificateAuthority service, allowing us to
independently control which kinds of objects the CA is willing to sign
by turning off individual service interfaces. The RA's new config stanza
will need to be populated in prod before further changes are possible.

Fixes #6451
2022-10-21 15:37:01 -07:00
Aaron Gable 30d8f19895
Deprecate ROCSP Stage 1, 2, and 3 flags (#6460)
These flags are set in both staging and prod. Deprecate them, make
all code gated behind them the only path, and delete code (multi_source)
which was only accessible when these flags were not set.

Part of #6285
2022-10-21 14:58:34 -07:00
Aaron Gable 272625b4a4
Add CRLDPBase config key to boulder-ca (#6442)
Add a new configuration key to the CA which allows us to
specify the "base URL" for our CRLs. This will be necessary
before including an Issuing Distribution Point extension in our
CRLs, or a CRL Distribution Point in our certificates.

Part of #6410
2022-10-11 08:55:25 -07:00
Samantha 9c12e58c7b
grpc: Allow static host override in client config (#6423)
- Add a new gRPC client config field which overrides the dNSName checked in the
  certificate presented by the gRPC server.
- Revert all test gRPC credentials to `<service>.boulder`
- Revert all ClientNames in gRPC server configs to `<service>.boulder`
- Set all gRPC clients in `test/config` to use `serverAddress` + `hostOverride`
- Set all gRPC clients in `test/config-next` to use `srvLookup` + `hostOverride`
- Rename incorrect SRV record for `ca` with port `9096` to `ca-ocsp` 
- Rename incorrect SRV record for `ca` with port `9106` to `ca-crl` 

Resolves #6424
2022-10-03 15:23:55 -07:00
Jacob Hoffman-Andrews 46e41ca8bd
expiration-mailer: allow limiting UPDATE statement (#6400)
This avoids the statements getting so big they can't run.

Also, drive-by add some comments to the expiration-mailer config.
2022-09-26 12:07:31 -07:00
Samantha ffad58009e
grpc: Backend discovery improvements (#6394)
- Fork the default `dns` resolver from `go-grpc` to add backend discovery via
  DNS SRV resource records.
- Add new fields for SRV based discovery to `cmd.GRPCClientConfig`
- Add new (optional) field `DNSAuthority` for specifying custom DNS server to
  `cmd.GRPCClientConfig`
- Add a utility method to `cmd.GRPCClientConfig` to simplify target URI and host
  construction. With three schemes and `DNSAuthority` it makes more sense to
  handle all of this parsing and construction outside of the RPC client
  constructor.

Resolves #6111
2022-09-23 13:11:59 -07:00
Samantha 90eb90bdbe
test: Replace sd-test-srv with consul (#6389)
- Add a dedicated Consul container
- Replace `sd-test-srv` with Consul
- Add documentation for configuring Consul
- Re-issue all gRPC credentials for `<service-name>.service.consul`

Part of #6111
2022-09-19 16:13:53 -07:00
Jacob Hoffman-Andrews db044a8822
log: fix spurious honeycomb warnings; improve stdout logger (#6364)
Honeycomb was emitting logs directly to stderr like this:

```
WARN: Missing API Key.
WARN: Dataset is ignored in favor of service name. Data will be sent to service name: boulder
```

Fix this by providing a fake API key and replacing "dataset" with "serviceName" in configs. Also add missing Honeycomb configs for crl-updater.

For stdout-only logger, include checksums and escape newlines.
2022-09-14 11:25:02 -07:00
Jacob Hoffman-Andrews 797f3c7217
responder: return InternalError for expired responses (#6377)
This was masking a bug, because the integration test for OCSP responses
for expired certificates was looking for the "unauthorized" OCSP
response status. Which we were returning, even though our HTTP-level
response code was 533.
2022-09-14 11:24:46 -07:00
Jacob Hoffman-Andrews 3a72f6b0a9
Reject SHA-1 CSRs in config-next (#6374) 2022-09-12 16:34:07 -07:00
Samantha 78ea1d2c9d
SA: Use separate schema for incidents tables (#6350)
- Move incidents tables from `boulder_sa` to `incidents_sa` (added in #6344)
- Grant read perms for all tables in `incidents_sa`
- Modify unit tests to account for new schema and grants
- Add database cleaning func for `boulder_sa`
- Adjust cleanup funcs to omit `sql-migrate` tables instead of `goose`

Resolves #6328
2022-09-09 15:17:14 -07:00
Jacob Hoffman-Andrews dd1c52573e
log: allow logging to stdout/stderr instead of syslog (#6307)
Right now, Boulder expects to be able to connect to syslog, and panics
if it's not available. We'd like to be able to log to stdout/stderr as a
replacement for syslog.

- Add a detailed timestamp (down to microseconds, same as we collect in
prod via syslog).
- Remove the escape codes for colorizing output.
- Report the severity level numerically rather than with a letter prefix.

Add locking for stdout/stderr and syslog logs. Neither the [syslog] package
nor the [os] package document concurrency-safety, and the Go rule is: if
it's not documented to be concurrent-safe, it's not. Notably the [log.Logger]
package is documented to be concurrent-safe, and a look at its implementation
shows it uses a Mutex internally.

Remove places that use the singleton `blog.Get()`, and instead pass through
a logger from main in all the places that need it.

[syslog]: https://pkg.go.dev/log/syslog
[os]: https://pkg.go.dev/os
[log.Logger]: https://pkg.go.dev/log#Logger
2022-08-29 06:19:22 -07:00
Aaron Gable c1be8cfc52
crl-storer: load whole AWS config files (#6309)
Allow the crl-storer to load whole AWS config files. Although
this requires a deployment to maintain an additional config
files for the crl-storer, and one in a format we usually don't
use, it does give us lots of flexibility in setting up things like
role assumption.

Also remove the S3Region config flag, as it is now redundant
with the contents of the config file, and rename the existing
S3CredsFile config key to AWSCredsFile to better represent
its true contents.

Fixes #6308
2022-08-23 11:04:12 -07:00
Aaron Gable b001af71e8
Add new services to log-validator test config (#6303)
Fixes #6289
2022-08-17 16:46:11 -07:00
Aaron Gable 09195e6804
ocsp-responder: get minimal status info from SA (#6293)
Add a new `GetRevocationStatus` gRPC method to the SA which retrieves
only the subset of the certificate status metadata relevant to
revocation, namely whether the certificate has been revoked, when it was
revoked, and the revocation reason. Notably, this method is our first
use of the `goog.protobuf.Timestamp` type in a message, which is more
ergonomic and less prone to errors than using unix nanoseconds.

Use this new method in ocsp-responder's checked_redis_source, to avoid
having to send many other pieces of metadata and the full ocsp response
bytes over the network. It provides all the information necessary to
determine if the response from Redis is up-to-date.

Within the checked_redis_source, use this new method in two different
ways: if only a database connection is configured (as is the case today)
then get this information directly from the db; if a gRPC connection to
the SA is available then prefer that instead. This may make requests
slower, but will allow us to remove database access from the hosts which
run the ocsp-responder today, simplifying our network.

The new behavior consists of two pieces, each locked behind a config
gate:
- Performing the smaller database query is only enabled if the
  ocsp-responder has the `ROCSPStage3` feature flag enabled.
- Talking to the SA rather than the database directly is only enabled if
  the ocsp-responder has an `saService` gRPC stanza in its config.

Fixes #6274
2022-08-16 16:37:24 -07:00
Aaron Gable 3a12177eab
ROCSP Stage 6: Never write OCSP responses to DB (#6284)
Create a new `ROCSPStage6` feature flag which affects the behavior of
the SA. When enabled, this flag causes the `AddPrecertificate`,
`RevokeCertificate`, and `UpdateRevokedCertificate` methods to ignore
the OCSP response bytes provided by their caller. They will no longer
error out if those bytes are missing, and if the bytes are present they
will still not be written to the database.

This allows us to, in the future, cause the RA and CA to stop generating
those OCSP responses entirely, and stop providing them to the SA,
without causing any errors when we do.

Part of #6079
2022-08-10 15:31:26 -07:00
Aaron Gable 93d3e0b9e5
Enable early ROCSP stages in integration tests (#6280)
For some reason ROCSPStage3 was enabled without also enabling
ROCSP Stages 1 and 2. Fix the oversight so we're actually running
all of the first three ROCSP stages in config-next integration tests.
2022-08-10 12:40:18 -07:00
Aaron Gable 6a9bb399f7
Create new crl-storer service (#6264)
Create a new crl-storer service, which receives CRL shards via gRPC and
uploads them to an S3 bucket. It ignores AWS SDK configuration in the
usual places, in favor of configuration from our standard JSON service
config files. It ensures that the CRLs it receives parse and are signed
by the appropriate issuer before uploading them.

Integrate crl-updater with the new service. It streams bytes to the
crl-storer as it receives them from the CA, without performing any
checking at the same time. This new functionality is disabled if the
crl-updater does not have a config stanza instructing it how to connect
to the crl-storer.

Finally, add a new test component, the s3-test-srv. This acts similarly
to the existing mail-test-srv: it receives requests, stores information
about them, and exposes that information for later querying by the
integration test. The integration test uses this to ensure that a
newly-revoked certificate does show up in the next generation of CRLs
produced.

Fixes #6162
2022-08-08 16:22:48 -07:00
Samantha 576b6777b5
grpc: Implement a static multiple IP address gRPC resolver (#6270)
- Implement a static resolver for the gPRC dialer under the scheme `static:///`
  which allows the dialer to resolve a backend from a static list of IPv4/IPv6
  addresses passed via the existing JSON config.
- Add config key `serverAddresses` to the `GRPCClientConfig` which, when
  populated, enables static IP resolution of gRPC server backends.
- Set `config-next` to use static gRPC backend resolution for all SA clients.
- Generate a new SA certificate which adds `10.77.77.77` and `10.88.88.88` to
  the SANs.

Resolves #6255
2022-08-05 10:20:57 -07:00
Jacob Hoffman-Andrews b6c4d9bc21
ocsp/responder: add checked Redis source (#6272)
Add checkedRedisSource, a new OCSP Source which gets
responses from Redis, gets metadata from the database, and
only serves the Redis response if it matches the authoritative
metadata. If there is a mismatch, it requests a new OCSP
response from the CA, stores it in Redis, and serves the new
response.

This behavior is locked behind a new ROCSPStage3 feature flag.

Part of #6079
2022-08-04 16:22:14 -07:00
Aaron Gable 694d73d67b
crl-updater: add UpdateOffset config to run on a schedule (#6260)
Add a new config key `UpdateOffset` to crl-updater, which causes it to
run on a regular schedule rather than running immediately upon startup
and then every `UpdatePeriod` after that. It is safe for this new config
key to be omitted and take the default zero value.

Also add a new command line flag `runOnce` to crl-updater which causes
it to immediately run a single time and then exit, rather than running
continuously as a daemon. This will be useful for integration tests and
emergency situations.

Part of #6163
2022-07-29 13:30:16 -07:00
Aaron Gable 9ae16edf51
Fix race condition in revocation integration tests (#6253)
Add a new filter to mail-test-srv, allowing test processes to query
for messages sent from a specific address, not just ones sent to
a specific address. This fixes a race condition in the revocation
integration tests where the number of messages sent to a cert's
contact address would be higher than expected because expiration
mailer sent a message while the test was running. Also reduce
bad-key-revoker's maximum backoff to 2 seconds to ensure that
it continues to run frequently during the integration tests, despite
usually not having any work to do.

While we're here, also improve the comments on various revocation
integration tests, remove some unnecessary cruft, and split the tests
out to explicitly test functionality with the MozRevocationReasons
flag both enabled and disabled. Also, change ocsp_helper's default
output from os.Stdout to ioutil.Discard to prevent hundreds of lines
of log spam when the integration tests fail during a test that uses
that library.

Fixes #6248
2022-07-29 09:23:50 -07:00
Jacob Hoffman-Andrews 243bcd7e8c
rocsp: plumb through more config options (#6244)
This allows configuring Boulder to talk to read-only replicas, and
decide on a routing policy (random or by latency).
2022-07-22 12:17:17 -07:00
Jacob Hoffman-Andrews 3b09571e70
ocsp-responder: add LiveSigningPeriod (#6237)
Previously we used "ExpectedFreshness" to control how frequently the
Redis source would request re-signing of stale entries. But that field
also controls whether multi_source is willing to serve a MariaDB
response. It's better to split these into two values.
2022-07-20 15:36:38 -07:00
Jacob Hoffman-Andrews 29724cb0b7
ocsp/responder: update Redis source to use live signing (#6207)
This enables ocsp-responder to talk to the RA and request freshly signed
OCSP responses.

ocsp/responder/redis_source is moved to ocsp/responder/redis/redis_source.go
and significantly modified. Instead of assuming a response is always available
in Redis, it wraps a live-signing source. When a response is not available,
it attempts a live signing.

If live signing succeeds, the Redis responder returns the result right away
and attempts to write a copy to Redis on a goroutine using a background
context.

To make things more efficient, I eliminate an unneeded ocsp.ParseResponse
from the storage path. And I factored out a FakeResponse helper to make
the unittests more manageable.

Commits should be reviewable one-by-one.

Fixes #6191
2022-07-18 10:47:14 -07:00
Aaron Gable 436061fb35
CRL: Create crl-updater service (#6212)
Create a new service named crl-updater. It is responsible for
maintaining the full set of CRLs we issue: one "full and complete" CRL
for each currently-active Issuer, split into a number of "shards" which
are essentially CRLs with arbitrary scopes.

The crl-updater is modeled after the ocsp-updater: it is a long-running
standalone service that wakes up periodically, does a large amount of
work in parallel, and then sleeps. The period at which it wakes to do
work is configurable. Unlike the ocsp-responder, it does all of its work
every time it wakes, so we expect to set the update frequency at 6-24
hours.

Maintaining CRL scopes is done statelessly. Every certificate belongs to
a specific "bucket", given its notAfter date. This mapping is generally
unchanging over the life of the certificate, so revoked certificate
entries will not be moving between shards upon every update. The only
exception is if we change the number of shards, in which case all of the
bucket boundaries will be recomputed. For more details, see the comment
on `getShardBoundaries`.

It uses the new SA.GetRevokedCerts method to collect all of the revoked
certificates whose notAfter timestamps fall within the boundaries of
each shard's time-bucket. It uses the new CA.GenerateCRL method to sign
the CRLs. In the future, it will send signed CRLs to the crl-storer to
be persisted outside our infrastructure.

Fixes #6163
2022-07-08 09:34:51 -07:00
Jacob Hoffman-Andrews 223bda0cec
ocsp-updater: remove Redis support (#6201) 2022-06-30 11:42:53 -07:00
Aaron Gable e13918b50e
CA: Add GenerateCRL gRPC method (#6187)
Add a new CA gRPC method named `GenerateCRL`. In the
style of the existing `GenerateOCSP` method, this new endpoint
is implemented as a separate service, for which the CA binary
spins up an additional gRPC service.

This method uses gRPC streaming for both its input and output.
For input, the stream must contain exactly one metadata message
identifying the crl number, issuer, and timestamp, and then any
number of messages identifying a single certificate which should
be included in the CRL. For output, it simply streams chunks of
bytes.

Fixes #6161
2022-06-29 11:03:12 -07:00
Aaron Gable 3000339dee
Reject CSRs with duplicate extensions (#6153)
This behavior will be on by default in go1.19, so let's turn
it on ourselves now to ensure there won't be any breakage
when we upgrade in August.
2022-06-17 13:13:30 -07:00
Jacob Hoffman-Andrews fda4124471
expiration-mailer: truncate serials and dns names (#6148)
This avoids sending excessively large emails and excessively large log
lines.

Fixes #6085
2022-06-14 15:48:00 -07:00
Aaron Gable f7ab64f05b
Remove last references to CFSSL (#6155)
Just a docs and config cleanup.
2022-06-14 14:22:34 -07:00
Aaron Gable 11544756bb
Support new Google CT Policy (#6082)
Add a new code path to the ctpolicy package which enforces Chrome's new
CT Policy, which requires that SCTs come from logs run by two different
operators, rather than one Google and one non-Google log. To achieve
this, invert the "race" logic: rather than assuming we always have two
groups, and racing the logs within each group against each other, we now
race the various groups against each other, and pick just one arbitrary
log from each group to attempt submission to.

Ensure that the new code path does the right thing by adding a new zlint
which checks that the two SCTs embedded in a certificate come from logs
run by different operators. To support this lint, which needs to have a
canonical mapping from logs to their operators, import the Chrome CT Log
List JSON Schema and autogenerate Go structs from it so that we can
parse a real CT Log List. Also add flags to all services which run these
lints (the CA and cert-checker) to let them load a CT Log List from disk
and provide it to the lint.

Finally, since we now have the ability to load a CT Log List file
anyway, use this capability to simplify configuration of the RA. Rather
than listing all of the details for each log we're willing to submit to,
simply list the names (technically, Descriptions) of each log, and look
up the rest of the details from the log list file.

To support this change, SRE will need to deploy log list files (the real
Chrome log list for prod, and a custom log list for staging) and then
update the configuration of the RA, CA, and cert-checker. Once that
transition is complete, the deletion TODOs left behind by this change
will be able to be completed, removing the old RA configuration and old
ctpolicy race logic.

Part of #5938
2022-05-25 15:14:57 -07:00
Jacob Hoffman-Andrews 76f987a1df
Reland "Allow expiration mailer to work in parallel" (#6133)
This reverts commit 7ef6913e71.

We turned on the `ExpirationMailerDontLookTwice` feature flag in prod, and it's
working fine but not clearing the backlog. Since
https://github.com/letsencrypt/boulder/pull/6100 fixed the issue that caused us
to (nearly) stop sending mail when we deployed #6057, this should be safe to
roll forward.

The revert of the revert applied cleanly, except for expiration-mailer/main.go
and `main_test.go`, particularly around the contents `processCerts` (where
`sendToOneRegID` was extracted from) and `sendToOneRegID` itself. So those areas
are good targets for extra attention.
2022-05-23 16:16:43 -07:00
Jacob Hoffman-Andrews be893678bd
expiration-mailer: feature-gate bug fix (#6122)
We recently landed a fix so the expiration-mailer won't look twice at
the same certificate. This will cause an immediate behavior change when
it is deployed, and that might have surprising effects. Put the fix
behind a feature flag so we can control when it rolls out more
carefully.
2022-05-16 14:17:23 -07:00
Jacob Hoffman-Andrews a4ba9b1adb
rocsp/config: fix PoolSize comment (#6110)
The go-redis docs say default is 10 * NumCPU, but the actual code says 5.

Extra context:

2465baaab5/options.go (L143-L145)

2465baaab5/cluster.go (L96-L98)

For Options, the default (documented) is 10 * NumCPUs. For ClusterOptions, the
default (undocumented) is 5 * NumCPUs. We use ClusterOptions. Also worth noting:
for ClusterOptions, the limit is per node.
2022-05-12 16:29:26 -07:00
Jacob Hoffman-Andrews 25e4b7e7fa
expiration-mailer: Deprecate NagCheckInterval (#6103)
This was introduced when expiration-mailer was run by cron, and was a
way for expiration-mailer to know something about its expected run
interval so it could send notifications "on time" rather than "just
after" the configured email time.

Now that expiration-mailer runs as a daemon we can simply pull this
value from `Frequency`, which is set to the same value in prod.
2022-05-12 16:28:42 -07:00
Aaron Gable 7ef6913e71
Revert "Allow expiration mailer to work in parallel" (#6080)
When deployed, the newly-parallel expiration-mailer encountered
unexpected difficulties and dropped to apparently sending nearly zero
emails despite not throwing any real errors. Reverting the parallelism
change until we understand and can fix the root cause.

This reverts two commits:
- Allow expiration mailer to work in parallel (#6057)
- Fix data race in expiration-mailer test mocks (#6072) 

It also modifies the revert to leave the new `ParallelSends` config key
in place (albeit completely ignored), so that the binary containing this
revert can be safely deployed regardless of config status.

Part of #5682
2022-05-03 13:18:40 -07:00
Jacob Hoffman-Andrews 9629c88d66
Allow expiration mailer to work in parallel (#6057)
Previously, each accounts email would be sent in serial,
along with several reads from the database (to check for
certificate renewal) and several writes to the database (to update
`certificateStatus.lastExpirationNagSent`). This adds a config field
for the expiration mailer that sets the parallelism it will use.

That means making and using multiple SMTP connections as well. Previously,
`bmail.Mailer` was not safe for concurrent use. It also had a piece of
API awkwardness: after you created a Mailer, you had to call Connect on
it to change its state.

Instead of treating that as a state change on Mailer, I split out a
separate component: `bmail.Conn`. Now, when you call `Mailer.Connect()`,
you get a Conn. You can send mail on that Conn and Close it when you're
done. A single Mailer instance can produce multiple Conns, so Mailer is
now concurrency-safe (while Conn is not).

This involved a moderate amount of renaming and code movement, and
GitHub's move detector is not keeping up 100%, so an eye towards "is
this moved code?" may help. Also adding `?w=1` to the diff URL to ignore
whitespace diffs.
2022-04-21 18:04:55 -07:00
Jacob Hoffman-Andrews 4467cf27db
Update config from config-next (#6051)
This copies over settings from config-next that are now deployed in prod.

Also, I updated a comment in sd-test-srv to more accurately describe how SRV records work.
2022-04-19 12:10:26 -07:00
Aaron Gable dab8a71b0e
Use new RA methods from WFE revocation path (#5983)
Simplify the WFE `RevokeCertificate` API method in three ways:
- Remove most of the logic checking if the requester is authorized to
  revoke the certificate in question (based on who is making the
  request, what authorizations they have, and what reason they're
  requesting). That checking is now done by the RA. Instead, simply
  verify that the JWS is authenticated.
- Remove the hard-to-read `authorizedToRevoke` callbacks, and make the
  `revokeCertBySubscriberKey` (nee `revokeCertByKeyID`) and
  `revokeCertByCertKey` (nee `revokeCertByJWK`) helpers much more
  straight-line in their execution logic.
- Call the RA's new `RevokeCertByApplicant` and `RevokeCertByKey` gRPC
  methods, rather than the deprecated `RevokeCertificateWithReg`.

This change, without any flag flips, should be invisible to the
end-user. It will slightly change some of our log message formats.
However, by now relying on the new RA gRPC revocation methods, this
change allows us to change our revocation policies by enabling the
`AllowDoubleRevocation` and `MozRevocationReasons` feature flags, which
affect the behavior of those new helpers.

Fixes #5936
2022-03-28 14:14:11 -07:00
Samantha 7c22b99d63
akamai-purger: Improve throughput and configuration safety (#6006)
- Add new configuration key `throughput`, a mapping which contains all
  throughput related akamai-purger settings.
- Deprecate configuration key `purgeInterval` in favor of `purgeBatchInterval` in
  the new `throughput` configuration mapping.
- When no `throughput` or `purgeInterval` is provided, the purger uses optimized
  default settings which offer 1.9x the throughput of current production settings.
- At startup, all throughput related settings are modeled to ensure that we
  don't exceed the limits imposed on us by Akamai.
- Queue is now `[][]string`, instead of `[]string`.
  - When a given queue entry is purged we know all 3 of it's URLs were purged.
  - At startup we know the size of a theoretical request to purge based on the
    number of queue entries included
- Raises the queue size from ~333-thousand cached OCSP responses to
  1.25-million, which is roughly 6 hours of work using the optimized default
  settings
- Raise `purgeInterval` in test config from 1ms, which violates API limits, to 800ms

Fixes #5984
2022-03-23 17:23:07 -07:00
Andrew Gabbitas 79048cffba
Support writing initial OCSP response to redis (#5958)
Adds a rocsp redis client to the sa if cluster information is provided in the
sa config. If a redis cluster is configured, all new certificate OCSP
responses added with sa.AddPrecertificate will attempt to be written to
the redis cluster, but will not block or fail on errors.

Fixes: #5871
2022-03-21 20:33:12 -06:00
Aaron Gable 07d56e3772
Add new, simpler revocation methods to RA (#5969)
Add two new gRPC methods to the SA:
- `RevokeCertByKey` will be used when the API request was signed by the
  certificate's keypair, rather than a Subscriber keypair. If the
  request is for reason `keyCompromise`, it will ensure that the key is
  added to the blocked keys table, and will attempt to "re-revoke" a
  certificate that was already revoked for some other reason.
- `RevokeCertByApplicant` supports both the path where the original
  subscriber or another account which has proven control over all of the
  identifier in the certificate requests revocation via the API. It does
  not allow the requested reason to be `keyCompromise`, as these
  requests do not represent a demonstration of key compromise.

In addition, add a new feature flag `MozRevocationReasons` which
controls the behavior of these new methods. If the flag is not set, they
behave like they have historically (see above). If the flag is set to true,
then the new methods enforce the upcoming Mozilla policies around
revocation reasons, namely:
- Only the original Subscriber can choose the revocation reason; other
  clients will get a set reason code based on the method of requesting
  revocation. When the original Subscriber requests reason
  `keyCompromise`, this request will be honored, but the key will not be
  blocked and other certificates with that key will not also be revoked.
- Revocations signed with the certificate key will always get reason
  `keyCompromise`, because we do not know who is sending the request and
  therefore must assume that the use of the key in this way represents
  compromise. Because these requests will always be fore reason
  `keyCompromise`, they will always be added to the blocked keys table
  and they will always attempt "re-revocation".
- Revocations authorized via control of all names in the cert will
  always get reason `cessationOfOperation`, which is to be used when the
  original Subscriber does not control all names in the certificate
  anymore.

Finally, update the existing `AdministrativelyRevokeCertificate` method
to use the new helper functions shared by the two new methods.

Part of #5936
2022-03-14 08:58:17 -07:00
Andrew Gabbitas d006588f46
Orphan finder: Fix redundant syslog config value (#5971)
Replace redundant stdoutlevel with a sysloglevel value in test configs.
2022-02-24 14:24:03 -08:00
Aaron Gable 114d10a6cb
Integrate goodkey checks into cert-checker (#5870) 2022-01-11 09:42:12 -08:00
Jacob Hoffman-Andrews 1c573d592b
Add account cache to WFE (#5855)
Followup from #5839.

I chose groupcache/lru as our LRU cache implementation because it's part
of the golang org, written by one of the Go authors, and very simple
and easy to read.

This adds an `AccountGetter` interface that is implemented by both the
AccountCache and the SA. If the WFE config includes an AccountCache field,
it will wrap the SA in an AccountCache with the configured max size and
expiration time.

We set an expiration time on account cache entries because we want a
bounded amount of time that they may be stale by. This will be used in
conjunction with a delay on account-updating pathways to ensure we don't
allow authentication with a deactivated account or changed key.

The account cache stores corepb.Registration objects because protobufs
have an established way to do a deep copy. Deep copies are important so
the cache can maintain its own internal state and ensure nothing external
is modifying it.

As part of this process I changed construction of the WFE. Previously,
"SA" and "RA" were public fields that were mutated after construction. Now
they are parameters to the constructor, along with the new "accountGetter"
parameter.

The cache includes stats for requests categorized by hits and misses.
2021-12-15 11:10:23 -08:00
Aaron Gable 89000bd61c
Add close-primes detection via Fermat's factorization (#5853)
Add a new check to GoodKey which attempts to factor the public modulus
of the presented key using Fermat's factorization method. This method
will succeed if and only if the prime factors are very close to each
other -- i.e. almost certainly were not selected independently from a
random uniform distribution, but were instead calculated via some other
less secure method.

To support this new feature, add a new config flag to the RA, CA, and
WFE, which all use the GoodKey checks. As part of adding this new config
value, refactor the GoodKey config items into their own config struct
which can be re-used across all services.

If the new `FermatRounds` config value has not been set, it will default
to zero, causing no factorization to be attempted.

Fixes #5850
Part of #5851
2021-12-14 09:19:33 -08:00
Aaron Gable 5c02deabfb
Remove wfe1 integration tests (#5840)
These tests are testing functionality that is no longer in use in
production deployments of Boulder. As we go about removing wfe1
functionality, these tests will break, so let's just remove them
wholesale right now. I have verified that all of the tests removed in
this PR are duplicated against wfe2.

One of the changes in this PR is to cease starting up the wfe1 process
in the integration tests at all. However, that component was serving
requests for the AIA Issuer URL, which gets queried by various OCSP and
revocation tests. In order to keep those tests working, this change also
adds an integration-test-only handler to wfe2, and updates the CA
configuration to point at the new handler.

Part of #5681
2021-12-10 12:40:22 -08:00
Jacob Hoffman-Andrews 3d7206a183
ocsp-updater: add support for writing to Redis (#5825)
If configured, ocsp-updater will write responses to Redis in parallel
with MariaDB, giving up if Redis is slower and incrementing a stat.

Factors out the ShortIDIssuer concept from rocsp-tool into
rocsp_config.
2021-12-06 14:46:46 -08:00
Andrew Gabbitas cbd24db64b
Add ocsp-responder redis lookup support (#5800)
This is the first step in moving OCSP responses from mysql to redis.

Adds support for parallel lookups to mysql and redis. The mysql source
remains the source of truth. If the secondaryLookup [redis] succeeds,
compare against the primaryLookup [mysql] and return if they concur
that the status is the same and the redis source is at least as fresh
as mysql.

There are checks on the database response for `certStatus.IsExpired`,
`certStatus.OCSPLastUpdated.IsZero()` and
`!src.filter.responseMatchesIssuer`.

The expired check isn't necessary for redis because the response will
be set with a ttl and drop out of redis when it reaches the ttl, and
delivering a response for an expired certificate until that happens
isn't a problem. 

The `certStatus.OCSPLastUpdated.IsZero()` check is a MySQL check that
isn't needed in redis.

The `responseMatchesIssuer` check is important and will need to be
checked in some form before MySQL is no longer the source of truth.
There is another project to check issuer for responses and isn't scoped
for this change.
2021-12-06 10:47:05 -07:00
Aaron Gable c7643992a0
Enable USE INDEX hints when querying authz2 table (#5823)
Add a new feature flag `GetAuthzUseIndex` which causes the SA
to add `USE INDEX (regID_identifer_status_expires_idx)` to its authz2
database queries. This should encourage the query planner to actually
use that index instead of falling back to large table-scans.

Fixes #5822
2021-12-01 14:48:09 -08:00
Aaron Gable 8eb7272adf
SA: Use read-only connector for GetAuthorizations2 (#5815)
Add a feature flag which causes the SA to switch between using the
traditional read-write database connector (pointed at the primary db)
or the newer read-only database connector (usually pointed at a
replica) when executing the `GetAuthorizations2` query.
2021-11-24 16:57:42 -08:00
Aaron Gable 1a1cd24237
Add tests for the experimental `renewalInfo` endpoint (#5750)
Add a unit test and an integration test that both exercise the new
experimental ACME Renewal Info endpoint. These tests do not
yet validate the contents of the response, just that the appropriate
HTTP response code is returned, but they will be developed as the
code under test evolves.

Fixes #5674
2021-10-27 15:00:56 -07:00
Jacob Hoffman-Andrews ba0ea090b2
integration: save hierarchy across runs (#5729)
This allows repeated runs using the same hiearchy, and avoids spurious
errors from ocsp-updater saying "This CA doesn't have an issuer cert
with ID XXX"

Fixes #5721
2021-10-20 17:06:33 -07:00
Jacob Hoffman-Andrews dc742fc320
Fix expiration-mailer integration test locally. (#5719)
The expiration mailer processes certificates in batches of size
`certLimit` (default 100). In production, it runs in daemon mode, so it
will go on to the next batch when the current one is done. However, in
local integration tests we rely on it getting all its work done in a
single run. This works when you're running from a clean slate, but if
you've run integration tests a bunch of times, there will be a bunch of
certificates from previous runs that clog up the queue, and it won't
send mail for the specific certificate the integration test is looking
for.

Solution: Set `certLimit` very high in the config.

Also, update the default times for sending mail to match what we have in
prod.
2021-10-18 19:51:34 -07:00
Aaron Gable e0c3e2c1df
Reject unrecognized config keys (#5649)
Instead of using the default `json.Unmarshal`, explicitly
construct and use a `json.Decoder` so that we can set the
`DisallowUnknownFields` flag on the decoder. This causes
any unrecognized config keys to result in errors at boulder
startup time.

Fixes #5643
2021-09-24 10:13:44 -07:00
Aaron Gable 4ef9fb1b4f
Add new SA.NewOrderAndAuthzs gRPC method (#5602)
Add a new method to the SA's gRPC interface which takes both an Order
and a list of new Authorizations to insert into the database, and adds
both (as well as the various ancillary rows) inside a transaction.

To enable this, add a new abstraction layer inside the `db/` package
that facilitates inserting many rows at once, as we do for the `authz2`,
`orderToAuthz2`, and `requestedNames` tables in this operation. 

Finally, add a new codepath to the RA (and a feature flag to control it)
which uses this new SA method instead of separately calling the
`NewAuthorization` method multiple times. Enable this feature flag in
the config-next integration tests.

This should reduce the failure rate of the new-order flow by reducing
the number of database operations by coalescing multiple inserts into a
single multi-row insert. It should also reduce the incidence of new
authorizations being created in the database but then never exposed to
the subscriber because of a failure later in the new-order flow, both by
reducing failures overall and by adding those authorizations in a
transaction which will be rolled back if there is a later failure.

Fixes #5577
2021-09-03 13:48:04 -07:00
J.C. Jones 0f16ff6d17
ocsp-updater: Split work by a configurable serial suffix shard (#5628)
- Enable`ocsp-updater` to query for serials matching a configurable suffix to
  allow for multiple `ocsp-updater` instances at once
- Add field `SerialSuffixShards` to `OCSPUpdaterConfig`
- Add field `serialSuffixShards` to `test/config-next/ocsp-updater.json`
- Add codepath to default to the previous query when `serialSuffixShards` is
  missing from the JSON config

Part of #5629
Fixes #5625
2021-09-02 15:52:18 -07:00
Andrew Gabbitas 17f300387b
BadKeyRevoker: backoff on errors or no work (#5580)
- Add exponential backoff
- Add key `backoffIntervalMax` to JSON config with a default of `60s`

Fixes #5559
2021-08-19 13:31:47 -07:00
Samantha 819d57ebdb
cert-checker: Use cmd.ConfigDuration instead of int to express acceptable validity periods (#5582)
Add `acceptableValidityDurations` to cert-checker's config, which
uses `cmd.ConfigDuration` instead of `int` to express the acceptable
validity periods. Deprecate the older int-based `acceptableValidityPeriods`.
This makes it easier to reason about the values in the configs, and
brings this config into line with other configs (such as the CA).

Fixes #5542
2021-08-17 08:52:22 -07:00
Aaron Gable 1c6842cf69
Delete expired-authz-purger2 (#5570)
Delete the expired-authz-purger2 binary, as well as the
various config files, tests, and test helpers that exist to
support it.

This utility is no longer necessary, as it has not been
running for quite some time, and we have developed
alternative means of keeping the growth of the authz
table under control.

Fixes #5568
2021-08-11 14:39:57 -07:00
Aaron Gable ac3e5e70c4
Delete boulder-janitor (#5571)
Delete the boulder-janitor binary, and the various configs
and tests which exist to support it.

This tool has not been actively running in quite some time.
The tables which is covers are either supported by our
more recent partitioning methods, or are rate-limit tables
that we hope to move out of mysql entirely. The cost of
maintaining the janitor is not offset by the benefits it brings
us (or the lack thereof).

Fixes #5569
2021-08-11 11:10:24 -07:00
J.C. Jones 7b31bdb30a
Add read-only dbConns to SQLStorageAuthority and OCSPUpdater (#5555)
This changeset adds a second DB connect string for the SA for use in 
read-only queries that are not themselves dependencies for read-write 
queries. In other words, this is attempting to only catch things like 
rate-limit `SELECT`s and other coarse-counting, so we can potentially 
move those read queries off the read-write primary database.

It also adds a second DB connect string to the OCSP Updater. This is a 
little trickier, as the subsequent `UPDATE`s _are_ dependent on the 
output of the `SELECT`, but in this case it's operating on data batches,
and a few seconds' replication latency are several orders of magnitude 
below the threshold for update frequency, so any certificates that 
aren't caught on run `n` can be caught on run `n+1`.

Since we export DB metrics to Prometheus, this also refactors 
`InitDBMetrics` to take a DB Address (host:port tuple) and User out of 
the DB connection DSN and include those as labels in the metrics.

Fixes #5550
Fixes #4985
2021-08-02 11:21:34 -07:00
Aaron Gable 3a0996b147
Update cert-checker config to include all params (#5509)
These parameters are currently accepted by cert-checker
either via the command line or via the config. Add them to
the config-next config as we move towards deprecating
their CLI equivalents.

Part of #5489
2021-07-09 10:17:15 -07:00
Aaron Gable 20f1bf1d0d
Compute validity periods inclusive of notAfter second (#5494)
In the CA, compute the notAfter timestamp such that the cert is actually
valid for the intended duration, not for one second longer. In the
Issuance library, compute the validity period by including the full
length of the final second indicated by the notAfter date when
determining if the certificate request matches our profile. Update tests
and config files to match.

Fixes #5473
2021-06-24 13:17:29 -07:00
Aaron Gable b087e6a2bf
cert-checker: take validity period from config (#5490)
Add a new `acceptableValidityPeriods` field to cert-checker's config.
This field is a list of integers representing validity periods measured
in seconds (so 7776000 is 90 days). This field is multi-valued to enable
transitions between different validity periods (e.g. 90 days + 1 second
to 90 days, or 90 days to 30 days). If the field is not provided,
cert-checker defaults to 90 days.

Also update the way that cert-checker computes the validity period of
the certificates it is checking to include the full width of the final
second represented by the notAfter timestamp.

Finally, update the tests to support this new behavior.

Fixes #5472
2021-06-17 17:29:47 -07:00
Aaron Gable 6e1357efa3
Update boulder test validity period to match prod (#5493)
In prod, the CA is now configured to issue certificates with
notAfter timestamps 7775999 seconds after their notBefore
timestamp, and to enforce that same difference when validating
issuance requests.

Update our test configs to match.
2021-06-16 18:08:57 -07:00
Samantha be1c24165e
test: Fix uppercase ECDSAAllowListFilename in test JSON configs (#5487) 2021-06-16 14:24:30 -07:00
Andrew Gabbitas b5aab29407
Make boulder-observer HTTP User-Agent configurable (#5484)
- Make User-Agent configurable in config file
- Fix README example
- Add tests
2021-06-14 11:08:18 -06:00
Samantha 6955df0f56
contact-auditor: Add tool to audit registration contacts (#5425)
Add tool to audit subscriber registrations for e-mail addresses that
`notify-mailer` is currently configured to skip.

- Add `cmd/contact-auditor` with README
- Add test coverage for `cmd/contact-auditor`
- Add config file at `test/config/contact-auditor`

Part of #5372
2021-06-07 14:21:54 -07:00
Aaron Gable 9abb39d4d6
Honeycomb integration proof-of-concept (#5408)
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).

Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.

Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
2021-05-24 16:13:08 -07:00
Samantha 1f19eee55b
CA: Fix startup bug caused by ECDSA allow list reloader (#5412)
Solve a nil pointer dereference of `ecdsaAllowList` in `boulder-ca` by
calling `reloader.New()` in constructor `ca.NewECDSAAllowListFromFile`
instead.

- Add missing entry `ECDSAAllowListFilename` to
  `test/config-next/ca-a.json` and `test/config-next/ca-b.json`
- Add missing file ecdsaAllowList.yml to `test/config-next`
- Add missing entry `ECDSAAllowedAccounts` to `test/config/ca-a.json`
  and `test/config/ca-b.json`
- Move creation of the reloader to `NewECDSAAllowListFromFile`

Fixes #5414
2021-05-17 14:41:15 -07:00
Aaron Gable a19ebfa0e9
VA: Query SRV to preload/cache DNS resolver addrs (#5360)
Abstract out the way that the bdns library keeps track of the
resolvers it uses to do DNS lookups. Create one implementation,
the `StaticProvider`, which behaves exactly the same as the old
mechanism (providing whatever names or addresses were given
in the config). Create another implementation, `DynamicProvider`,
which re-resolves the provided name on a regular basis.

The dynamic provider consumes a single name, does a lookup
on that name for any SRV records suggesting that it is running a
DNS service, and then looks up A records to get the address of
all the names returned by the SRV query. It exports its successes
and failures as a prometheus metric.

Finally, update the tests and config-next configs to work with
this new mechanism. Give sd-test-srv the capability to respond
to SRV queries, and put the names it provides into docker's
default DNS resolver.

Fixes #5306
2021-04-20 10:11:53 -07:00
Aaron Gable 6e6be607fa
Deprecate StoreIssuerInfo flag (#5386)
This flag is no longer referenced by any code, and can
be safely deprecated.

Part of #5079
2021-04-13 17:18:01 -07:00
Aaron Gable 0196d7e876
orphan-finder: use serial + issuerID for OCSP (#5383)
Update orphan-finder's `generateOCSP` function to make its request to
the CA using the certificate's serial number and issuer ID, rather than
the full DER bytes. To facilitate this, add an `IssuerCerts` item to the
orphan-finder config, and add an `issuers` map to its struct, mimicking
fields of the same name and purpose on the RA. Leave the old code path
in the `generateOCSP` method for now, to be fully removed after the new
config has been deployed.

Also update the unittests to use real on-disk certificates instead of
inline strings, and similarly correct the integration test to use a
certificate with the correct Issuer field.

Part of #5079
Fixes #5149
2021-04-05 09:13:21 -07:00
Samantha 97e393d2e7
boulder-observer (#5315)
Add configuration driven Prometheus black box metric exporter
2021-03-29 12:56:54 -07:00
Aaron Gable 547dbfc93a
Remove Common.DNSResolver from VA config (#5355)
This field is not used by any production configs, so we can safely
remove it.

Also, add config fields for DNSTimeout and DNSAllowLoopbackAddress
outside of the Common sub-struct, to allow for its removal later.

Part of #5242
2021-03-19 10:02:04 -07:00
Samantha 35340ff67a
Move expired-authz-purger2 config to test directory (#5352)
- Edit integration test to start expired-authz-purger2 with config/
  config-next
- Move config from `cmd/expired-authz-purger2/config.json` to
  `test/config/expired-authz-purger2.json`
- Add a copy of `test/config/expired-authz-purger2.json` to
  `test/config-next/`

Fixes #5351
2021-03-18 17:56:25 -07:00
Aaron Gable 91473b384b
Remove common config from publisher (#5353)
The old `config.Common.CT.IntermediateBundleFilename` format is no
longer used in any production configs, and can be removed safely.

Part of #5162
Part of #5242
Fixes #5269
2021-03-18 16:59:06 -07:00
Jacob Hoffman-Andrews b4e483d38b
Add gRPC MaxConnectionAge config. (#5311)
This allows servers to tell clients to go away after some period of time, which triggers the clients to re-resolve DNS.

Per grpc/grpc#12295, this is the preferred way to do this.

Related: #5307.
2021-03-01 18:37:47 -08:00
Samantha e2e7dad034
Move cmd.DBConfig fields to their own named sub-struct (#5286)
Named field `DB`, in a each component configuration struct, acts as the
receiver for the value of `db` when component JSON files are
unmarshalled.

When `cmd.DBConfig` fields are received at the root of component
configuration struct instead of `DB` copy them to the `DB` field of the
component configuration struct.

Move existing `cmd.DBConfig` values from the root of each component's
JSON configuration in `test/config-next` to `db`

Part of #5275
2021-02-16 10:48:58 -08:00
Samantha 2efabf57b6
Adding support for multiple issuers to publisher (#5272)
Publisher currently loads a PEM formatted certificate bundle from file
using LoadCertBundle a utility function in the core package.
LoadCertBundle parses the PEM file to a slice of x509.Certificates and
returns them to boulder-publisher (without checking validity). Using
these x509 Certificates, boulder-publisher to construct an ASN1Cert
bundle. This bundle is passed to each new publisher instance. When
publisher receives a request it unconditionally appends this bundle to
each end-entity precertificate for submission to CT logs.

This change augments this process to add support for multiple issuers
using the IssuerNameID concept in the Issuance package. Config field
Common.CT.CertificateBundleFilename has been replaced with the Chains
field. LoadChain, a utility function added in PR #5271, loads and
validates the chain (which nets us some added deploy-time safety) before
returning it to boulder-publisher. Using these x509 Certificates,
boulder-publisher constructs a mapping of IssuerNameID to ASN1Cert
bundle and passes this to each new publisher instance. When publisher
receives a request it determines the IssuerNameID of the precertificate
to select and append the correct ASN1Cert bundle for a given Issuer.

A followup issue #5269 has been created to address removal of the Common
field from the publisher configuration and code has been commented with
TODOs where code will need to be removed or refactored.

Fixes #1669
2021-02-08 12:23:44 -08:00
Andrew Gabbitas 0fdfbe1211
Deprecate StripDefaultSchemePort flag (#5265)
This flag is now enabled in Let's Encrypt staging/prod.

This change deprecates the flag and prepares it for deletion in a future
change. It can then be removed once no staging/prod configs reference the
flag.

Fixes #5236
2021-02-08 11:30:52 -08:00
Aaron Gable 379826d4b5
WFE2: Improve support for multiple issuers & chains (#5247)
This change simplifies and hardens the wfe2's support for having
multiple issuers, and multiple chains for each issuer, configured
and loaded in memory.

The only config-visible change is replacing the old two separate config
values (`certificateChains` and `alternateCertificateChains`) with a
single value (`chains`). This new value does not require the user to
know and hand-code the AIA URLs at which the certificates are available;
instead the chains are simply presented as lists of files. If this new
config value is present, the old config values will be ignored; if it
is not, the old config values will be respected.

Behind the scenes, the chain loading code has been completely changed.
Instead of loading PEM bytes directly from the file, and then asserting
various things (line endings, no trailing bits, etc) about those bytes,
we now parse a certificate from the file, and in-memory recreate the
PEM from that certificate. This approach allows the file loading to be
much more forgiving, while also being stricter: we now check that each
certificate in the chain is correctly signed by the next cert, and that
the last cert in the chain is a self-signed root.

Within the WFE itself, most of the internal structure has been retained.
However, both the internal `issuerCertificates` (used for checking
that certs we are asked to revoke were in fact issued by us) and the
`certificateChains` (used to append chains to end-entity certs when
served to clients) have been updated to be maps keyed by IssuerNameID.
This allows revocation checking to not have to iterate through the
whole list of issuers, and also makes it easy to double-check that
the signatures on end-entity certs are valid before serving them. Actual
checking of the validity will come in a follow-up change, due to the
invasive nature of the necessary test changes.

Fixes #5164
2021-01-27 15:07:58 -08:00
Samantha e0510056cc
Enhancements to SQL driver tuning via JSON config (#5235)
Historically the only database/sql driver setting exposed via JSON
config was maxDBConns. This change adds support for maxIdleConns,
connMaxLifetime, connMaxIdleTime, and renames maxDBConns to
maxOpenConns. The addition of these settings will give our SRE team a
convenient method for tuning the reuse/closure of database connections.

A new struct, DBSettings, has been added to SA. The struct, and each of
it's fields has been commented.

All new fields have been plumbed through to the relevant Boulder
components and exported as Prometheus metrics. Tests have been
added/modified to ensure that the fields are being set. There should be
no loss in coverage

Deployability concerns for the migration from maxDBConns to maxOpenConns
have been addressed with the temporary addition of the helper method
cmd.DBConfig.GetMaxOpenConns(). This method can be removed once
test/config is defaulted to using maxOpenConns. Relevant sections of the
code have TODOs added that link back to an newly opened issue.

Fixes #5199
2021-01-25 15:34:55 -08:00
Jacob Hoffman-Andrews 8b9145838d
Add logging of OCSP generation events (#5223)
This adds a new component to the CA, ocspLogQueue, which batches up
OCSP generation events for audit logging. It will log accumulated
events when it reaches a certain line length, or when a maximum amount
of times has passed.
2021-01-12 15:31:49 -08:00
Aaron Gable beee17c510
Janitor: refactor to be controlled by config (#5195)
Previously, configuration of the boulder-janitor was split into
two places: the actual json config file (which controlled which
jobs would be enabled, and what their rate limits should be), and
the janitor code itself (which controlled which tables and columns
those jobs should query). This resulted in significant duplicated
code, as most of the jobs were identical except for their table
and column names.

This change abstracts away the query which jobs use to find work.
Instead of having each job type parse its own config and produce
its own work query (in Go code), now each job supplies just a few
key values (the table name and two column names) in its JSON config,
and the Go code assembles the appropriate query from there. We are
able to delete all of the files defining individual job types, and
replace them with a single slightly smarter job constructor.

This enables further refactorings, namely:
* Moving all of the logic code into its own module;
* Ensuring that the exported interface of that module is safe (i.e.
  that a client cannot create and run jobs without them being valid,
  because the only exposed methods ensure validity);
* Collapsing validity checks into a single location;
* Various renamings.
2020-12-17 09:53:22 -08:00