boulder

Commit Graph

Author	SHA1	Message	Date
Phil Porada	3366be50f1	Use RFC 7093 truncated SHA256 hash for Subject Key Identifier (#7179 ) - Adds a feature flag to gate rollout for SHA256 Subject Key Identifiers for end-entity certificates. - The ceremony tool will now use the RFC 7093 section 2 option 1 method for generating Subject Key Identifiers for future root CA, intermediate CA, and cross-sign ceremonies. - - - - [RFC 7093 section 2 option 1](https://datatracker.ietf.org/doc/html/rfc7093#section-2) provides a method for generating a truncated SHA256 hash for the Subject Key Identifier field in accordance with Baseline Requirement [section 7.1.2.11.4 Subject Key Identifier](`90a98dc7c1/docs/BR.md (712114-subject-key-identifier)`). > [RFC5280] specifies two examples for generating key identifiers from > public keys. Four additional mechanisms are as follows: > > 1) The keyIdentifier is composed of the leftmost 160-bits of the > SHA-256 hash of the value of the BIT STRING subjectPublicKey > (excluding the tag, length, and number of unused bits). The related [RFC 5280 section 4.2.1.2](https://datatracker.ietf.org/doc/html/rfc5280#section-4.2.1.2) states: > For CA certificates, subject key identifiers SHOULD be derived from > the public key or a method that generates unique values. Two common > methods for generating key identifiers from the public key are: > ... > Other methods of generating unique numbers are also acceptable.	2023-12-06 13:44:17 -05:00
Matthew McPherrin	32adaf1846	Make log-validator take glob patterns to monitor for log files (#7172 ) To simplify deployment of the log validator, this allows wildcards (using go's filepath.Glob) to be included in the file paths. In order to detect new files, a new background goroutine polls the glob patterns every minute for matches. Because the "monitor" function is running in its own goroutine, a lock is needed to ensure it's not trying to add new tailers while shutdown is happening.	2023-11-27 12:48:46 -08:00
Aaron Gable	16081d8e30	Invert RequireCommonName into AllowNoCommonName (#7139 ) The RequireCommonName feature flag was our only "inverted" feature flag, which defaulted to true and had to be explicitly set to false. This inversion can lead to confusion, especially to readers who expect all Go default values to be zero values. We plan to remove the ability for our feature flag system to support default-true flags, which the existence of this flag blocked. Since this flag has not been set in any real configs, inverting it is easy. Part of https://github.com/letsencrypt/boulder/issues/6802	2023-11-06 10:58:30 -08:00
Aaron Gable	81cb970d30	Remove crlURL from test CA issuer configs (#7132 ) This value is always set to the empty string in prod, which (correctly) results in the issued certificates not having a CRLDP at all. It turns out our integration test environment has been including CRLDPs in all of our test certs because we set crlURL to a non-empty value! This change updates our test configs to match reality. I'll remove the code which supports this config value as part of my upcoming CA CRLDP changes.	2023-11-02 11:20:50 -07:00
Samantha	9aef5839b5	WFE: Add new key-value ratelimits implementation (#7089 ) Integrate the key-value rate limits from #6947 into the WFE. Rate limits are backed by the Redis source added in #7016, and use the SRV record shard discovery added in #7042. Part of #5545	2023-10-04 14:12:38 -04:00
Aaron Gable	3b880e1ccf	Add CAAAfterValidation feature flag (#7082 ) Add a new feature flag "CAAAfterValidation" which, when set to true in the VA, causes the VA to only begin CAA checks after basic domain control validation has completed successfully. This will make successful validations take longer, since the DCV and CAA checks are performed serially instead of in parallel. However, it will also reduce the number of CAA checks we perform by up to 80%, since such a high percentage of validations also fail. IN-9575 tracks enabling this feature flag in staging and prod Fixes https://github.com/letsencrypt/boulder/issues/7058	2023-09-18 13:30:31 -07:00
Aaron Gable	102b447e8d	Smoother scheduling and leasing for crl-updater (#7010 ) Overhaul crl-updater's default (i.e. non-runOnce) behavior to update individual CRL shards continuously, rather than updating all shards in a large batch. To accomplish this, it spins up one goroutine for each shard of each issuer this updater is responsible for. Each goroutine is solely responsible for its assigned shard. It sleeps for a random amount of time (to stagger their starts), then begins a ticker to wake up every updateInterval and re-issue its shard. As part of this change, refactor updater.go into three separate files (batch.go, continuous.go, and updater.go) containing functions dedicated to single-run batch processing, long-running continuous processing, and shared helpers, respectively. IN-9475 tracks the deprecation of the `updateOffset` config key. The other configuration changes in this PR do not require production changes. Fixes https://github.com/letsencrypt/boulder/issues/7023	2023-09-08 09:16:15 -07:00
Phil Porada	72e01b337a	ceremony: Distinguish between intermediate and cross-sign ceremonies (#7005 ) In `//cmd/ceremony`: * Added `CertificateToCrossSignPath` to the `cross-certificate` ceremony type. This new input field takes an existing certificate that will be cross-signed and performs checks against the manually configured data in each ceremony file. * Added byte-for-byte subject/issuer comparison checks to root, intermediate, and cross-certificate ceremonies to detect that signing is happening as expected. * Added Fermat factorization check from the `//goodkey` package to all functions that generate new key material. In `//linter`: * The Check function now exports linting certificate bytes. The idea is that a linting certificate's `tbsCertificate` bytes can be compared against the final certificate's `tbsCertificate` bytes as a verification that `x509.CreateCertificate` was deterministic and produced identical DER bytes after each signing operation. Other notable changes: * Re-orders the issuers list in each CA config to match staging and production. There is an ordering issue mentioned by @aarongable two years ago on IN-5913 that didn't make its way back to this repository. > Order here matters – the default chain we serve for each intermediate should be the first listed chain containing that intermediate. * Enables `ECDSAForAll` in `config-next` CA configs to match Staging. * Generates 2x new ECDSA subordinate CAs cross-signed by an RSA root and adds these chains to the WFE for clients to download. * Increased the test.sh startup timeout to account for the extra ceremony run time. Fixes https://github.com/letsencrypt/boulder/issues/7003 --------- Co-authored-by: Aaron Gable <aaron@letsencrypt.org>	2023-08-23 14:01:19 -04:00
Aaron Gable	6a450a2272	Improve CRL shard leasing (#7030 ) Simplify the index-picking logic in the SA's leaseOldestCrlShard method. Specifically, more clearly separate it into "missing" and "non-missing" cases, which require entirely different logic: picking a random missing shard, or picking the oldest unleased shard, respectively. Also change the UpdateCRLShard method to "unlease" shards when they're updated. This allows the crl-updater to run as quickly as it likes, while still ensuring that multiple instances do not step on each other's toes. The config change for shardWidth and lookbackPeriod instead of certificateLifetime has been deployed in prod since IN-8445. The config change changing the shardWidth is just so that the tests neither produce a bazillion shards, nor have to do a bazillion SA queries for each chunk within a shard, improving the readability of test logs. Part of https://github.com/letsencrypt/boulder/issues/7023	2023-08-08 17:05:00 -07:00
Aaron Gable	9a4f0ca678	Deprecate LeaseCRLShards feature (#7009 ) This feature flag is enabled in both staging and prod.	2023-08-07 15:17:00 -07:00
Jacob Hoffman-Andrews	725f190c01	ca: remove orphan queue code (#7025 ) The `orphanQueueDir` config field is no longer used anywhere. Fixes #6551	2023-08-02 16:04:28 -07:00
Jacob Hoffman-Andrews	8d7b87c9ca	cert-checker: check for precertificate correspondence (#7015 ) This adds a lookup in cert-checker to find the linting precertificate with the same serial number as a given final certificate, and checks precertificate correspondence between the two. Fixes #6959	2023-07-28 12:45:47 -04:00
Aaron Gable	908421bb98	crl-updater: lease CRL shards to prevent races (#6941 ) Add a new feature flag, LeaseCRLShards, which controls certain aspects of crl-updater's behavior. When this flag is enabled, crl-updater calls the new SA.LeaseCRLShard method before beginning work on a shard. This prevents it from stepping on the toes of another crl-updater instance which may be working on the same shard. This is important to prevent two competing instances from accidentally updating a CRL's Number (which is an integer representation of its thisUpdate timestamp) backwards, which would be a compliance violation. When this flag is enabled, crl-updater also calls the new SA.UpdateCRLShard method after finishing work on a shard. In the future, additional work will be done to make crl-updater use the "give me the oldest available shard" mode of the LeaseCRLShard method. Fixes https://github.com/letsencrypt/boulder/issues/6897	2023-07-19 15:11:16 -07:00
Aaron Gable	e09c5faf5e	Deprecate CAA AccountURI and ValidationMethods feature flags (#7000 ) These flags are set to true in all environments.	2023-07-14 14:54:39 -04:00
Jacob Hoffman-Andrews	cd24b9db20	ca: deprecate StoreLintingCertificateInsteadOfPrecertificate (#6970 ) And turn off the orphan queue in config-next.	2023-07-05 10:44:08 -07:00
Samantha	124c4cc6f5	grpc/sa: Implement deep health checks (#6928 ) Add the necessary scaffolding for deep health checking of our various gRPC components. Each component implementation that also implements the grpc.checker interface will be checked periodically, and the health status of the component will be updated accordingly. Add the necessary methods to SA to implement the grpc.checker interface and register these new health checks with Consul. Additionally: - Update entry point script to check for ProxySQL readiness. - Increase the poll rate for gRPC Consul checks from 5s to 2s to help with DNS failures, due to check failures, on startup. - Change log level for Consul from INFO to ERROR to deal with noisy logs full of transport failures due to Consul gRPC checks firing before the SAs are up. Fixes #6878 Part of #6795	2023-06-12 13:58:53 -04:00
Jacob Hoffman-Andrews	2041e8723b	integration: shorten log output (#6894 ) Remove the load test stage of the integration test, which generates superfluous amounts of log. Turn down logging on the CA and VA from info to error-only. Part of https://github.com/letsencrypt/boulder/issues/6890	2023-06-05 13:11:19 -04:00
Jacob Hoffman-Andrews	80e1510819	admin: add clear-email subcommand (#6919 ) When a user wants their email address deleted from the database but no longer has access to their account, this allows an administrator to clear it. This adds `admin` as an alias for `admin-revoker`, because we'd like the clear-email sub-command to be a part of that overall tool, but it's not really revocation related. Part of #6864	2023-06-01 14:33:24 -04:00
Samantha	f09a94bd74	consul: Configure gRPC health check for SA (#6908 ) Enable SA gRPC health checks in Consul ahead of further changes for #6878. Calls to the `Check` method of the SA's grpc.health.v1.Health service must respond `SERVING` before the `sa` service will be advertised in Consul DNS. Consul will continue to poll this service every 5 seconds. - Add `bconsul` docker service to boulder `bluenet` and `rednet` - Add TLS credentials for `consul.boulder`: ```shell $ openssl x509 -in consul.boulder/cert.pem -text | grep DNS DNS:consul.boulder ``` - Update `test/grpc-creds/generate.sh` to add `consul.boulder` - Update test SA configs to allow `consul.boulder` to access `grpc.health.v1.Health` Part of #6878	2023-05-23 13:16:49 -04:00
Aaron Gable	fe523f142d	crl-updater: retry failed shards (#6907 ) Add per-shard exponential backoff and retry to crl-updater. Each individual CRL shard will be retried up to MaxAttempts (default 1) times, with exponential backoff starting at 1 second and maxing out at 1 minute between each attempt. This can effectively reduce the parallelism of crl-updater: while a goroutine is sleeping between attempts of a failing shard, it is not doing work on another shard. This is a desirable feature, since it means that crl-updater gently reduces the total load it places on the network and database when shards start to fail. Setting this new config parameter is tracked in IN-9140 Fixes https://github.com/letsencrypt/boulder/issues/6895	2023-05-22 12:59:09 -07:00
Matthew McPherrin	b7d9f8c2e3	In config-next/, opentelemetry -> openTelemetry for consistency (#6888 ) In configs, opentelemetry -> openTelemetry As pointed out in review of #6867, these should match the case of their corresponding Go identifiers for consistency. JSON keys are case-insensitive in Go (part of why we've got a fork in go-jose), so this change should have no functional impact.	2023-05-15 17:07:29 -04:00
Matthew McPherrin	3aae67b8a9	Opentelemetry: Add option for public endpoints (#6867 ) This PR adds a new configuration block specifically for the otelhttp instrumentation. This block is separate from the existing "opentelemetry" configuration, and is only relevant when using otelhttp instrumentation. It does not share any codepath with the existing configuration, so it is at the top level to indicate which services it applies to. There's a bit of plumbing new configuration through. I've adopted the measured_http package to also set up opentelemetry instead of just metrics, which should hopefully allow any future changes to be smaller (just config & there) and more consistent between the wfe2 and ocsp responder. There's one option here now, which disables setting [otelhttp.WithPublicEndpoint](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#WithPublicEndpoint). This option is designed to do exactly what we want: Don't accept incoming spans as parents of the new span created in the server. Previously we had a setting to disable parent-based sampling to help with this problem, which doesn't really make sense anymore, so let's just remove it and simplify that setup path. The default of "false" is designed to be the safe option. It's set to True in the test/ configs for integration tests that use traces, and I expect we'll likely set it true in production eventually once the LBs are configured to handle tracing themselves. Fixes #6851	2023-05-12 15:34:34 -04:00
Samantha	310546a14e	VA: Support discovery of DNS resolvers via Consul (#6869 ) Deprecate `va.DNSResolver` in favor of backwards compatible `va.DNSProvider`. Fixes #6852	2023-05-12 12:54:31 -04:00
Samantha	19c5244088	test: Use consul hostname instead of IP for dnsAuthority (#6883 ) Standardize on hostnames for dnsAuthority to match production. Related to #6869	2023-05-11 14:13:53 -07:00
Jacob Hoffman-Andrews	f295626e4c	ca: remove simulated ISRG OID from config (#6879 ) We intend to issue in the future with only the CA/Browser Forum Domain Validated OID.	2023-05-10 12:39:12 -04:00
Jacob Hoffman-Andrews	ac4be89b56	grpc: add NoWaitForReady config field (#6850 ) Currently we set WaitForReady(true), which causes gRPC requests to not fail immediately if no backends are available, but instead wait until the timeout in case a backend does become available. The downside is that this behavior masks true connection errors. We'd like to turn it off. Fixes #6834	2023-05-09 16:16:44 -07:00
Matthew McPherrin	8427245675	OTel Integration test using jaeger (#6842 ) This adds Jaeger's all-in-one dev container (with no persistent storage) to boulder's dev docker-compose. It configures config-next/ to send all traces there. A new integration test creates an account and issues a cert, then verifies the trace contains some set of expected spans. This test found that async finalize broke spans, so I fixed that and a few related spots where we make a new context.	2023-05-05 10:41:29 -04:00
Jacob Hoffman-Andrews	1c7e0fd1d8	Store linting certificate instead of precertificate (#6807 ) In order to get rid of the orphan queue, we want to make sure that before we sign a precertificate, we have enough data in the database that we can fulfill our revocation-checking obligations even if storing that precertificate in the database fails. That means: - We should have a row in the certificateStatus table for the serial. - But we should not serve "good" for that serial until we are positive the precertificate was issued (BRs 4.9.10). - We should have a record in the live DB of the proposed certificate's public key, so the bad-key-revoker can mark it revoked. - We should have a record in the live DB of the proposed certificate's names, so it can be revoked if we are required to revoke based on names. The SA.AddPrecertificate method already achieves these goals for precertificates by writing to the various metadata tables. This PR repurposes the SA.AddPrecertificate method to write "proposed precertificates" instead. We already create a linting certificate before the precertificate, and that linting certificate is identical to the precertificate that will be issued except for the private key used to sign it (and the AKID). So for instance it contains the right pubkey and SANs, and the Issuer name is the same as the Issuer name that will be used. So we'll use the linting certificate as the "proposed precertificate" and store it to the DB, along with appropriate metadata. In the new code path, rather than writing "good" for the new certificateStatus row, we write a new, fake OCSP status string "wait". This will cause us to return internalServerError to OCSP requests for that serial (but we won't get such requests because the serial has not yet been published). After we finish precertificate issuance, we update the status to "good" with SA.SetCertificateStatusReady. Part of #6665	2023-04-26 13:54:24 -07:00
Aaron Gable	45329c9472	Deprecate ROCSPStage7 flag (#6804 ) Deprecate the ROCSPStage7 feature flag, which caused the RA and CA to stop generating OCSP responses when issuing new certs and when revoking certs. (That functionality is now handled just-in-time by the ocsp-responder.) Delete the old OCSP-generating codepaths from the RA and CA. Remove the CA's internal reference to an OCSP implementation, because it no longer needs it. Additionally, remove the SA's "Issuers" config field, which was never used. Fixes #6285	2023-04-12 17:03:06 -07:00
Aaron Gable	7e994a1216	Deprecate ROCSPStage6 feature flag (#6770 ) Deprecate the ROCSPStage6 feature flag. Remove all references to the `ocspResponse` column from the SA, both when reading from and when writing to the `certificateStatus` table. This makes it safe to fully remove that column from the database. IN-8731 enabled this flag in all environments, so it is safe to deprecate. Part of #6285	2023-04-04 15:41:51 -07:00
Aaron Gable	8c67769be4	Remove ocsp-updater from Boulder (#6769 ) Delete the ocsp-updater service, and the //ocsp/updater library that supports it. Remove test configs for the service, and remove references to the service from other test files. This service has been fully shut down for an extended period now, and is safe to remove. Fixes #6499	2023-03-31 14:39:04 -07:00
Samantha	b2224eb4bc	config: Add validation tags to all configuration structs (#6674 ) - Require `letsencrypt/validator` package. - Add a framework for registering configuration structs and any custom validators for each Boulder component at `init()` time. - Add a `validate` subcommand which allows you to pass a `-component` name and `-config` file path. - Expose validation via exported utility functions `cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and `cmd.ValidateYAMLConfig()`. - Add unit test which validates all registered component configuration structs against test configuration files. Part of #6052	2023-03-21 14:08:03 -04:00
Aaron Gable	6d6f3632da	Change SetCommonName to RequireCommonName (#6749 ) Change the SetCommonName flag, introduced in #6706, to RequireCommonName. Rather than having the flag control both whether or not a name is hoisted from the SANs into the CN and whether or not the CA is willing to issue certs with no CN, this updated flag now only controls the latter. By default, the new flag is true, and continues our current behavior of failing issuance if we cannot set a CN in the cert. When the flag is set to false, then we are willing to issue certificates for which the CSR contains no CN and there is no SAN short enough to be hoisted into the CN field. When we have rolled out this change, we can move on to the next flag in this series: HoistCommonName, which will control whether or not a SAN is hoisted at all, effectively giving the CSRs (and therefore the clients) full control over whether their certificate contains a SAN. This change is safe because no environment explicitly sets the SetCommonName flag to false yet. Fixes #5112	2023-03-21 11:07:06 -07:00
Matthew McPherrin	05c9106eba	lints: Consistently format JSON configuration files (#6755 ) - Consistently format existing test JSON config files - Add a small Python script which loads and dumps JSON files - Add CI JSON lint test to CI --------- Co-authored-by: Aaron Gable <aaron@aarongable.com>	2023-03-20 18:11:19 -04:00
Matthew McPherrin	e1ed1a2ac2	Remove beeline tracing (#6733 ) Remove tracing using Beeline from Boulder. The only remnant left behind is the deprecated configuration, to ensure deployability. We had previously planned to swap in OpenTelemetry in a single PR, but that adds significant churn in a single change, so we're doing this as multiple steps that will each be significantly easier to reason about and review. Part of #6361	2023-03-14 15:14:27 -07:00
Aaron Gable	9af4871e59	Add SetCommonName feature flag (#6706 ) Add a new feature flag, `SetCommonName`, which defaults to `true`. In this default state, no behavior changes. When set to `false` on the CA, this flag will cause the CA to leave the Subject commonName field of the certificate blank, as is recommended by the Baseline Requirements Section 7.1.4.2.2(a). Also slightly modify the behavior of the RA's `matchesCSR()` function, to allow for both certificates that have a CN and certificates that don't. It is not feasible to put this behavior behind the same SetCommonName flag, because that would require an atomic deploy of both the RA and the CA. Obsoletes #5112	2023-03-09 13:31:55 -05:00
Aaron Gable	29bf521121	CA: Remove secondary gRPC servers (#6496 ) Remove the OCSPGenerator and CRLGenerator gRPC servers that run on separate ports from the CA's main gRPC server, which exposes both those and the CertificateAuthority service as well. These additional servers are no longer necessary, now that all three services are exposed on the single address/port. Fixes #6448	2023-03-01 11:45:28 -08:00
Samantha	98ef3bb2b4	VA/config: Remove unused va.CAA service in config (#6697 ) GRPC config from `va.VA` is used for both `va.VA` and `va.CAA`.	2023-02-27 13:44:47 -05:00
Samantha	a0fe7dc93e	SA: Remove Redis config (#6695 ) This field doesn't appear to be in use. Part of #6052	2023-02-27 09:29:38 -08:00
Aaron Gable	cdf1a6f9f9	Add flag to make order finalization async (#6589 ) Add the "AsyncFinalize" feature flag. When enabled, this causes the RA to return almost immediately from FinalizeOrder requests, with the actual hard work of issuing the precertificate, getting SCTs, issuing the final certificate, and updating the database accordingly all occurring in a background goroutine while the client polls the GetOrder endpoint waiting for the result. This is implemented by factoring out the majority of the finalization work into a new `issueCertificateOuter` helper function, and simply using the new flag to determine whether we call that helper in a goroutine or not. This makes removing the feature flag in the future trivially easy. Also add a new prometheus metric named `inflight_finalizes` which can be used to count the number of simultaneous goroutines which are performing finalization work. This metric is exported regardless of the state of the AsyncFinalize flag, so that we can observe any changes to this metric when the flag is flipped. Fixes #6575	2023-02-24 09:57:54 -08:00
Jacob Hoffman-Andrews	79250756bf	expiration-mailer: limit number of mails sent to same address per day (#6675 ) This adds a config field, "mailsPerAddressPerDay." Addresses that get that many mails won't receive any more until the next day (UTC). Fixes #6508.	2023-02-22 15:24:31 -08:00
Phil Porada	6c84a69043	Remove MandatoryPOSTasGET flag (#6672 ) Remove the `MandatoryPOSTasGET` flag from the WFE2. Update the ACMEv2 divergence doc to note that neither staging nor production use MandatoryPOSTasGET. Fixes #6582.	2023-02-17 13:04:31 -05:00
Jacob Hoffman-Andrews	cd1bbc0d82	Tidy up integration test environment (#6668 ) Remove `example.com` domain name, which was used by the deleted OldTLS tests. Remove GODEBUG=x509sha1=1. Add a longer comment for the Consul DNS fallback in docker-compose.yml. Use the "dnsAuthority" field for all gRPC clients in config-next, instead of implicitly relying on the system DNS. This matches what we do in prod. Make "dnsAuthority" field of GRPCClientConfig mandatory whenever SRVLookup or SRVLookups is used. Make test/config/ocsp-responder.json use ServerAddress instead of SRVLookup, like the rest of test/config.	2023-02-16 09:33:24 -08:00
Aaron Gable	f9e4fb6c06	Add replication lag retries to some SA methods (#6649 ) Add a new time.Duration field, LagFactor, to both the SA's config struct and the read-only SA's implementation struct. In the GetRegistration, GetOrder, and GetAuthorization2 methods, if the database select returned a NoRows error and a lagFactor duration is configured, then sleep for lagFactor seconds and retry the select. This allows us to compensate for the replication lag between our primary write database and our read-only replica databases. Sometimes clients will fire requests in rapid succession (such as creating a new order, then immediately querying the authorizations associated with that order), and the subsequent requests will fail because they are directed to read replicas which are lagging behind the primary. Adding this simple sleep-and-retry will let us mitigate many of these failures, without adding too much complexity. Fixes #6593	2023-02-14 17:25:13 -08:00
Samantha	5c49231ea6	ROCSP: Remove support for Redis Cluster (#6645 ) Fixes #6517	2023-02-09 17:14:37 -05:00
Phil Porada	134321040b	Default ReuseValidAuthz to true (#6644 ) `ReuseValidAuthz` was introduced here [1] and enabled in staging and production configs on 2016-07-13. There was a brief stint during the TLS-SNI-01 challenge type removal where SRE disabled it. However, time has finally come to remove this configuration option. Issue #6623 will determine the feasibility of shorter authz lifetimes and potentially the removal of authz reuse. This change is broken up into two parts to allow SRE to safely remove the flag from staging and production configs. We'll merge this PR, SRE will deploy boulder and the config change, then we'll finish removing `ReuseValidAuthz` configuration from the codebase. [1] boulder commit `9abc212448` Part 1 of 2 for fixing #2734.	2023-02-09 14:26:06 -05:00
Samantha	d73125d8f6	WFE: Add custom balancer implementation which routes nonce redemption RPCs by prefix (#6618 ) Assign nonce prefixes for each nonce-service by taking the first eight characters of the base64url encoded HMAC-SHA256 hash of the RPC listening address using a provided key. The provided key must be same across all boulder-wfe and nonce-service instances. - Add a custom `grpc-go` load balancer implementation (`nonce`) which can route nonce redemption RPC messages by matching the prefix to the derived prefix of the nonce-service instance which created it. - Modify the RPC client constructor to allow the operator to override the default load balancer implementation (`round_robin`). - Modify the `srv` RPC resolver to accept a comma separated list of targets to be resolved. - Remove unused nonce-service `-prefix` flag. Fixes #6404	2023-02-03 17:52:18 -05:00
Jacob Hoffman-Andrews	e57c788086	Add checking of validations to cert-checker (#6617 ) This includes two feature flags: one that controls turning on the extra database queries, and one that causes cert-checker to fail on missing validations. If the second flag isn't turned on, it will just emit error log lines. This will help us find any edge conditions we need to deal with before making the new code trigger alerts. Fixes #6562	2023-02-03 16:25:41 -05:00
Jacob Hoffman-Andrews	9d3f7d8f84	Add timeout config to WFE (#6621 )	2023-01-30 10:07:41 -08:00
Aaron Gable	a7dc34f127	ocsp-responder: make db config optional (#6601 ) In #6293, we gave the ocsp-responder the ability to use a gRPC connection to the SA to get status information for certificates, rather than using a database connection directly. However, that change neglected to make the database connection configuration optional: an ocsp-responder with an SA gRPC client configured would never use its database connection, but if it wasn't configured it would refuse to start. Fix this oversight by making the DBConfig stanza optional.	2023-01-26 15:21:39 -08:00
Phil Porada	3866e4f60d	VA: Use default PortConfig during testing (#6609 ) Part of #3940	2023-01-25 16:16:08 -05:00
Phil Porada	aae4175186	Remove deprecated feature flags (#6566 ) Remove deprecated feature flags. Fixes #6559	2023-01-23 20:56:15 -05:00
Matthew McPherrin	1f6a873fcc	Remove MandatoryPOSTAsGET from config-next (#6585 ) In preparation for removing this flag completely in #6582 , remove it from config-next. This matches boulder's configuration in all LE environments.	2023-01-12 17:42:28 -08:00
Samantha	6c6da76400	ROCSP: Replace Redis Cluster with a consistently sharded all-primary nodes (#6516 )	2022-12-19 15:06:47 -05:00
Jacob Hoffman-Andrews	fe2cf7d136	ocsp: add load shedding for live signer (#6523 ) In live.go we use a semaphore to limit how many inflight signing requests we can have, so a flood of OCSP traffic doesn't flood our CA instances. If traffic exceeds our capacity to sign responses for long enough, we want to eventually start fast-rejecting inbound requests that are unlikely to get serviced before their deadline is reached. To do that, add a MaxSigningWaiters config field to the OCSP responder. Note that the files in //semaphore are forked from x/sync/semaphore, with modifications to add the MaxWaiters field and functionality. Fixes #6392	2022-12-12 15:48:44 -08:00
Aaron Gable	ba34ac6b6e	Use read-only SA clients in wfe, ocsp, and crl (#6484 ) In the WFE, ocsp-responder, and crl-updater, switch from using StorageAuthorityClients to StorageAuthorityReadOnlyClients. This ensures that these services cannot call methods which write to our database. Fixes #6454	2022-12-02 13:48:28 -08:00
Jacob Hoffman-Andrews	75338135e4	expiration-mailer: use a JOIN to find work more efficiently (#6439 ) Right now the expiration mailer does one big SELECT on `certificateStatus` to find certificates to work on, then several thousand SELECTs of individual serial numbers in `certificates`. Since it's more efficient to get that data as a stream from a single query, rather than thousands of separate queries, turn that into a JOIN. NOTE: We used to use a JOIN, and switched to the current approach in #2440 for performance reasons. I _believe_ part of the issue was that at the time we were not using READ UNCOMMITTED, so we may have been slowing down the database by requiring it to keep copies of a lot of rows during the query. Still, it's possible that I've misunderstood the performance characteristics here and it will still be a regression to use JOIN. So I've gated the new behavior behind a feature flag. The feature flag required extracting a new function, `getCerts`. That in turn required changing some return types so we are not as closely tied to `core.Certificate`. Instead we use a new local type named `certDERWithRegId`, which can be provided either by the new code path or the old code path.	2022-11-14 17:34:58 -08:00
Aaron Gable	4f473edfa8	Deprecate 10 feature flags (#6502 ) Deprecate these feature flags, which are consistently set in both prod and staging and which we do not expect to change the value of ever again: - AllowReRevocation - AllowV1Registration - CheckFailedAuthorizationsFirst - FasterNewOrdersRateLimit - GetAuthzReadOnly - GetAuthzUseIndex - MozRevocationReasons - RejectDuplicateCSRExtensions - RestrictRSAKeySizes - SHA1CSRs Move each feature flag to the "deprecated" section of features.go. Remove all references to these feature flags from Boulder application code, and make the code they were guarding the only path. Deduplicate tests which were testing both the feature-enabled and feature-disabled code paths. Remove the flags from all config-next JSON configs (but leave them in config ones until they're fully deleted, not just deprecated). Finally, replace a few testdata CSRs used in CA tests, because they had SHA1WithRSAEncryption signatures that are now rejected. Fixes #5171 Fixes #6476 Part of #5997	2022-11-14 09:24:50 -08:00
Aaron Gable	9e67423110	Create new StorageAuthorityReadOnly gRPC service (#6483 ) Create a new gRPC service named StorageAuthorityReadOnly which only exposes a read-only subset of the existing StorageAuthority service's methods. Implement this by splitting the existing SA in half, and having the read-write half embed and wrap an instance of the read-only half. Unfortunately, many of our tests use exported read-write methods as part of their test setup, so the tests are all being performed against the read-write struct, but they are exercising the same code as the read-only implementation exposes. Expose this new service at the SA on the same port as the existing service, but with (in config-next) different sets of allowed clients. In the future, read-only clients will be removed from the read-write service's set of allowed clients. Part of #6454	2022-11-09 11:09:12 -08:00
Aaron Gable	4466c953de	CA: Expose all gRPC services on single address (#6495 ) Now that we have the ability to easily add multiple gRPC services to the same server, and control access to each service individually, use that capability to expose the CA's CertificateAuthority, OCSPGenerator, and CRLGenerator services all on the same address/port. This will make establishing connections to the CA easier, but no less secure. Part of #6448	2022-11-08 15:28:59 -08:00
Aaron Gable	46c8d66c31	bgrpc.NewServer: support multiple services (#6487 ) Turn bgrpc.NewServer into a builder-pattern, with a config-based initialization, multiple calls to Add to add new gRPC services, and a final call to Build to produce the start() and stop() functions which control server behavior. All calls are chainable to produce compact code in each component's main() function. This improves the process of creating a new gRPC server in three ways: 1) It avoids the need for generics/templating, which was slightly verbose. 2) It allows the set of services to be registered on this server to be known ahead of time. 3) It greatly streamlines adding multiple services to the same server, which we use today in the VA and will be using soon in the SA and CA. While we're here, add a new per-service config stanza to the GRPCServerConfig, so that individual services on the same server can have their own configuration. For now, only provide a "ClientNames" key, which will be used in a follow-up PR. Part of #6454	2022-11-04 13:26:42 -07:00
Aaron Gable	6efd941e3c	Stabilize CRL shard boundaries (#6445 ) Add two new config keys to the crl-updater: * shardWidth, which controls the width of the chunks that we divide all of time into, with a default value of "16h" (approximately the same as today's shard width derived from 128 shards covering 90 days); and * lookbackPeriod, which controls the amount of already-expired certificates that should be included in our CRLs to ensure that even certificates which are revoked immediately before they expire still show up in at least one CRL, with a default value of "24h" (approximately the same as today's lookback period derived from our run frequency of 6h). Use these two new values to change the way CRL shards are computed. Previously, we would compute the total time we care about based on the configured certificate lifetime (to determine how far forward to look) and the configured update period (to determine how far back to look), and then divide that time evenly by the number of shards. However, this method had two fatal flaws. First, if the certificate lifetime is configured incorrectly, then the CRL updater will fail to query the database for some certs that should be included in the CRLs. Second, if the update period is changed, this would change the lookback period, which in turn would change the shard width, causing all CRL entries to suddenly change which shard they're in. Instead, first compute all chunk locations based only on the shard width and number of shards. Then determine which chunks we need to care about based on the configured lookback period and by querying the database for the farthest-future expiration, to ensure we cover all extant certificates. This may mean that more than one chunk of time will get mapped to a single shard, but that's okay -- each chunk will remain mapped to the same shard for the whole time we care about it. Fixes #6438 Fixes #6440	2022-10-27 15:59:48 -07:00
Aaron Gable	ab4b1eb3e1	Add ROCSPStage7 flag to disable OCSP calls (#6461 ) Rather than simply refusing to write OCSP Response bytes to the database (which is what ROCSP Stage 6 did), Stage 7 refuses to even generate those bytes in the first place. We obviously can't disable OCSP Response generation in the CA, since it still needs to be usable by the ocsp-responder's live-signing path, so instead we disable it in all of the non-live-signing codepaths (orphan finder, issue precertificate, revoke certificate, and re-revoke certificate) which have previously called GenerateOCSP. Part of #6285	2022-10-21 17:24:19 -07:00
Aaron Gable	02432fcd51	RA: Use OCSPGenerator gRPC service (#6453 ) When the RA is generating OCSP (as part of new issuance, revocation, or when its own GenerateOCSP method is called by the ocsp-responder) have it use the CA's dedicated OCSPGenerator service, rather than calling the method exposed by the CA's catch-all CertificateAuthority service. To facilitate this, add a new GRPCClientConfig stanza to the RA. This change will allow us to remove the GenerateOCSP and GenerateCRL methods from the catch-all CertificateAuthority service, allowing us to independently control which kinds of objects the CA is willing to sign by turning off individual service interfaces. The RA's new config stanza will need to be populated in prod before further changes are possible. Fixes #6451	2022-10-21 15:37:01 -07:00
Aaron Gable	30d8f19895	Deprecate ROCSP Stage 1, 2, and 3 flags (#6460 ) These flags are set in both staging and prod. Deprecate them, make all code gated behind them the only path, and delete code (multi_source) which was only accessible when these flags were not set. Part of #6285	2022-10-21 14:58:34 -07:00
Aaron Gable	272625b4a4	Add CRLDPBase config key to boulder-ca (#6442 ) Add a new configuration key to the CA which allows us to specify the "base URL" for our CRLs. This will be necessary before including an Issuing Distribution Point extension in our CRLs, or a CRL Distribution Point in our certificates. Part of #6410	2022-10-11 08:55:25 -07:00
Samantha	9c12e58c7b	grpc: Allow static host override in client config (#6423 ) - Add a new gRPC client config field which overrides the dNSName checked in the certificate presented by the gRPC server. - Revert all test gRPC credentials to `<service>.boulder` - Revert all ClientNames in gRPC server configs to `<service>.boulder` - Set all gRPC clients in `test/config` to use `serverAddress` + `hostOverride` - Set all gRPC clients in `test/config-next` to use `srvLookup` + `hostOverride` - Rename incorrect SRV record for `ca` with port `9096` to `ca-ocsp` - Rename incorrect SRV record for `ca` with port `9106` to `ca-crl` Resolves #6424	2022-10-03 15:23:55 -07:00
Jacob Hoffman-Andrews	46e41ca8bd	expiration-mailer: allow limiting UPDATE statement (#6400 ) This avoids the statements getting so big they can't run. Also, drive-by add some comments to the expiration-mailer config.	2022-09-26 12:07:31 -07:00
Samantha	ffad58009e	grpc: Backend discovery improvements (#6394 ) - Fork the default `dns` resolver from `go-grpc` to add backend discovery via DNS SRV resource records. - Add new fields for SRV based discovery to `cmd.GRPCClientConfig` - Add new (optional) field `DNSAuthority` for specifying custom DNS server to `cmd.GRPCClientConfig` - Add a utility method to `cmd.GRPCClientConfig` to simplify target URI and host construction. With three schemes and `DNSAuthority` it makes more sense to handle all of this parsing and construction outside of the RPC client constructor. Resolves #6111	2022-09-23 13:11:59 -07:00
Samantha	90eb90bdbe	test: Replace sd-test-srv with consul (#6389 ) - Add a dedicated Consul container - Replace `sd-test-srv` with Consul - Add documentation for configuring Consul - Re-issue all gRPC credentials for `<service-name>.service.consul` Part of #6111	2022-09-19 16:13:53 -07:00
Jacob Hoffman-Andrews	db044a8822	log: fix spurious honeycomb warnings; improve stdout logger (#6364 ) Honeycomb was emitting logs directly to stderr like this: ``` WARN: Missing API Key. WARN: Dataset is ignored in favor of service name. Data will be sent to service name: boulder ``` Fix this by providing a fake API key and replacing "dataset" with "serviceName" in configs. Also add missing Honeycomb configs for crl-updater. For stdout-only logger, include checksums and escape newlines.	2022-09-14 11:25:02 -07:00
Jacob Hoffman-Andrews	797f3c7217	responder: return InternalError for expired responses (#6377 ) This was masking a bug, because the integration test for OCSP responses for expired certificates was looking for the "unauthorized" OCSP response status. Which we were returning, even though our HTTP-level response code was 533.	2022-09-14 11:24:46 -07:00
Jacob Hoffman-Andrews	3a72f6b0a9	Reject SHA-1 CSRs in config-next (#6374 )	2022-09-12 16:34:07 -07:00
Samantha	78ea1d2c9d	SA: Use separate schema for incidents tables (#6350 ) - Move incidents tables from `boulder_sa` to `incidents_sa` (added in #6344) - Grant read perms for all tables in `incidents_sa` - Modify unit tests to account for new schema and grants - Add database cleaning func for `boulder_sa` - Adjust cleanup funcs to omit `sql-migrate` tables instead of `goose` Resolves #6328	2022-09-09 15:17:14 -07:00
Jacob Hoffman-Andrews	dd1c52573e	log: allow logging to stdout/stderr instead of syslog (#6307 ) Right now, Boulder expects to be able to connect to syslog, and panics if it's not available. We'd like to be able to log to stdout/stderr as a replacement for syslog. - Add a detailed timestamp (down to microseconds, same as we collect in prod via syslog). - Remove the escape codes for colorizing output. - Report the severity level numerically rather than with a letter prefix. Add locking for stdout/stderr and syslog logs. Neither the [syslog] package nor the [os] package document concurrency-safety, and the Go rule is: if it's not documented to be concurrent-safe, it's not. Notably the [log.Logger] package is documented to be concurrent-safe, and a look at its implementation shows it uses a Mutex internally. Remove places that use the singleton `blog.Get()`, and instead pass through a logger from main in all the places that need it. [syslog]: https://pkg.go.dev/log/syslog [os]: https://pkg.go.dev/os [log.Logger]: https://pkg.go.dev/log#Logger	2022-08-29 06:19:22 -07:00
Aaron Gable	c1be8cfc52	crl-storer: load whole AWS config files (#6309 ) Allow the crl-storer to load whole AWS config files. Although this requires a deployment to maintain an additional config file for the crl-storer, and one in a format we usually don't use, it does give us lots of flexibility in setting up things like role assumption. Also remove the S3Region config flag, as it is now redundant with the contents of the config file, and rename the existing S3CredsFile config key to AWSCredsFile to better represent its true contents. Fixes #6308	2022-08-23 11:04:12 -07:00
Aaron Gable	b001af71e8	Add new services to log-validator test config (#6303 ) Fixes #6289	2022-08-17 16:46:11 -07:00
Aaron Gable	09195e6804	ocsp-responder: get minimal status info from SA (#6293 ) Add a new `GetRevocationStatus` gRPC method to the SA which retrieves only the subset of the certificate status metadata relevant to revocation, namely whether the certificate has been revoked, when it was revoked, and the revocation reason. Notably, this method is our first use of the `google.protobuf.Timestamp` type in a message, which is more ergonomic and less prone to errors than using unix nanoseconds. Use this new method in ocsp-responder's checked_redis_source, to avoid having to send many other pieces of metadata and the full ocsp response bytes over the network. It provides all the information necessary to determine if the response from Redis is up-to-date. Within the checked_redis_source, use this new method in two different ways: if only a database connection is configured (as is the case today) then get this information directly from the db; if a gRPC connection to the SA is available then prefer that instead. This may make requests slower, but will allow us to remove database access from the hosts which run the ocsp-responder today, simplifying our network. The new behavior consists of two pieces, each locked behind a config gate: - Performing the smaller database query is only enabled if the ocsp-responder has the `ROCSPStage3` feature flag enabled. - Talking to the SA rather than the database directly is only enabled if the ocsp-responder has an `saService` gRPC stanza in its config. Fixes #6274	2022-08-16 16:37:24 -07:00
Aaron Gable	3a12177eab	ROCSP Stage 6: Never write OCSP responses to DB (#6284 ) Create a new `ROCSPStage6` feature flag which affects the behavior of the SA. When enabled, this flag causes the `AddPrecertificate`, `RevokeCertificate`, and `UpdateRevokedCertificate` methods to ignore the OCSP response bytes provided by their caller. They will no longer error out if those bytes are missing, and if the bytes are present they will still not be written to the database. This allows us to, in the future, cause the RA and CA to stop generating those OCSP responses entirely, and stop providing them to the SA, without causing any errors when we do. Part of #6079	2022-08-10 15:31:26 -07:00
Aaron Gable	93d3e0b9e5	Enable early ROCSP stages in integration tests (#6280 ) For some reason ROCSPStage3 was enabled without also enabling ROCSP Stages 1 and 2. Fix the oversight so we're actually running all of the first three ROCSP stages in config-next integration tests.	2022-08-10 12:40:18 -07:00
Aaron Gable	6a9bb399f7	Create new crl-storer service (#6264 ) Create a new crl-storer service, which receives CRL shards via gRPC and uploads them to an S3 bucket. It ignores AWS SDK configuration in the usual places, in favor of configuration from our standard JSON service config files. It ensures that the CRLs it receives parse and are signed by the appropriate issuer before uploading them. Integrate crl-updater with the new service. It streams bytes to the crl-storer as it receives them from the CA, without performing any checking at the same time. This new functionality is disabled if the crl-updater does not have a config stanza instructing it how to connect to the crl-storer. Finally, add a new test component, the s3-test-srv. This acts similarly to the existing mail-test-srv: it receives requests, stores information about them, and exposes that information for later querying by the integration test. The integration test uses this to ensure that a newly-revoked certificate does show up in the next generation of CRLs produced. Fixes #6162	2022-08-08 16:22:48 -07:00
Samantha	576b6777b5	grpc: Implement a static multiple IP address gRPC resolver (#6270 ) - Implement a static resolver for the gRPC dialer under the scheme `static:///` which allows the dialer to resolve a backend from a static list of IPv4/IPv6 addresses passed via the existing JSON config. - Add config key `serverAddresses` to the `GRPCClientConfig` which, when populated, enables static IP resolution of gRPC server backends. - Set `config-next` to use static gRPC backend resolution for all SA clients. - Generate a new SA certificate which adds `10.77.77.77` and `10.88.88.88` to the SANs. Resolves #6255	2022-08-05 10:20:57 -07:00
Jacob Hoffman-Andrews	b6c4d9bc21	ocsp/responder: add checked Redis source (#6272 ) Add checkedRedisSource, a new OCSP Source which gets responses from Redis, gets metadata from the database, and only serves the Redis response if it matches the authoritative metadata. If there is a mismatch, it requests a new OCSP response from the CA, stores it in Redis, and serves the new response. This behavior is locked behind a new ROCSPStage3 feature flag. Part of #6079	2022-08-04 16:22:14 -07:00
Aaron Gable	694d73d67b	crl-updater: add UpdateOffset config to run on a schedule (#6260 ) Add a new config key `UpdateOffset` to crl-updater, which causes it to run on a regular schedule rather than running immediately upon startup and then every `UpdatePeriod` after that. It is safe for this new config key to be omitted and take the default zero value. Also add a new command line flag `runOnce` to crl-updater which causes it to immediately run a single time and then exit, rather than running continuously as a daemon. This will be useful for integration tests and emergency situations. Part of #6163	2022-07-29 13:30:16 -07:00
Aaron Gable	9ae16edf51	Fix race condition in revocation integration tests (#6253 ) Add a new filter to mail-test-srv, allowing test processes to query for messages sent from a specific address, not just ones sent to a specific address. This fixes a race condition in the revocation integration tests where the number of messages sent to a cert's contact address would be higher than expected because expiration mailer sent a message while the test was running. Also reduce bad-key-revoker's maximum backoff to 2 seconds to ensure that it continues to run frequently during the integration tests, despite usually not having any work to do. While we're here, also improve the comments on various revocation integration tests, remove some unnecessary cruft, and split the tests out to explicitly test functionality with the MozRevocationReasons flag both enabled and disabled. Also, change ocsp_helper's default output from os.Stdout to ioutil.Discard to prevent hundreds of lines of log spam when the integration tests fail during a test that uses that library. Fixes #6248	2022-07-29 09:23:50 -07:00
Jacob Hoffman-Andrews	243bcd7e8c	rocsp: plumb through more config options (#6244 ) This allows configuring Boulder to talk to read-only replicas, and decide on a routing policy (random or by latency).	2022-07-22 12:17:17 -07:00
Jacob Hoffman-Andrews	3b09571e70	ocsp-responder: add LiveSigningPeriod (#6237 ) Previously we used "ExpectedFreshness" to control how frequently the Redis source would request re-signing of stale entries. But that field also controls whether multi_source is willing to serve a MariaDB response. It's better to split these into two values.	2022-07-20 15:36:38 -07:00
Jacob Hoffman-Andrews	29724cb0b7	ocsp/responder: update Redis source to use live signing (#6207 ) This enables ocsp-responder to talk to the RA and request freshly signed OCSP responses. ocsp/responder/redis_source is moved to ocsp/responder/redis/redis_source.go and significantly modified. Instead of assuming a response is always available in Redis, it wraps a live-signing source. When a response is not available, it attempts a live signing. If live signing succeeds, the Redis responder returns the result right away and attempts to write a copy to Redis on a goroutine using a background context. To make things more efficient, I eliminate an unneeded ocsp.ParseResponse from the storage path. And I factored out a FakeResponse helper to make the unittests more manageable. Commits should be reviewable one-by-one. Fixes #6191	2022-07-18 10:47:14 -07:00
Aaron Gable	436061fb35	CRL: Create crl-updater service (#6212 ) Create a new service named crl-updater. It is responsible for maintaining the full set of CRLs we issue: one "full and complete" CRL for each currently-active Issuer, split into a number of "shards" which are essentially CRLs with arbitrary scopes. The crl-updater is modeled after the ocsp-updater: it is a long-running standalone service that wakes up periodically, does a large amount of work in parallel, and then sleeps. The period at which it wakes to do work is configurable. Unlike the ocsp-updater, it does all of its work every time it wakes, so we expect to set the update frequency at 6-24 hours. Maintaining CRL scopes is done statelessly. Every certificate belongs to a specific "bucket", given its notAfter date. This mapping is generally unchanging over the life of the certificate, so revoked certificate entries will not be moving between shards upon every update. The only exception is if we change the number of shards, in which case all of the bucket boundaries will be recomputed. For more details, see the comment on `getShardBoundaries`. It uses the new SA.GetRevokedCerts method to collect all of the revoked certificates whose notAfter timestamps fall within the boundaries of each shard's time-bucket. It uses the new CA.GenerateCRL method to sign the CRLs. In the future, it will send signed CRLs to the crl-storer to be persisted outside our infrastructure. Fixes #6163	2022-07-08 09:34:51 -07:00
Jacob Hoffman-Andrews	223bda0cec	ocsp-updater: remove Redis support (#6201 )	2022-06-30 11:42:53 -07:00
Aaron Gable	e13918b50e	CA: Add GenerateCRL gRPC method (#6187 ) Add a new CA gRPC method named `GenerateCRL`. In the style of the existing `GenerateOCSP` method, this new endpoint is implemented as a separate service, for which the CA binary spins up an additional gRPC service. This method uses gRPC streaming for both its input and output. For input, the stream must contain exactly one metadata message identifying the crl number, issuer, and timestamp, and then any number of messages identifying a single certificate which should be included in the CRL. For output, it simply streams chunks of bytes. Fixes #6161	2022-06-29 11:03:12 -07:00
Aaron Gable	3000339dee	Reject CSRs with duplicate extensions (#6153 ) This behavior will be on by default in go1.19, so let's turn it on ourselves now to ensure there won't be any breakage when we upgrade in August.	2022-06-17 13:13:30 -07:00
Jacob Hoffman-Andrews	fda4124471	expiration-mailer: truncate serials and dns names (#6148 ) This avoids sending excessively large emails and excessively large log lines. Fixes #6085	2022-06-14 15:48:00 -07:00
Aaron Gable	f7ab64f05b	Remove last references to CFSSL (#6155 ) Just a docs and config cleanup.	2022-06-14 14:22:34 -07:00
Aaron Gable	11544756bb	Support new Google CT Policy (#6082 ) Add a new code path to the ctpolicy package which enforces Chrome's new CT Policy, which requires that SCTs come from logs run by two different operators, rather than one Google and one non-Google log. To achieve this, invert the "race" logic: rather than assuming we always have two groups, and racing the logs within each group against each other, we now race the various groups against each other, and pick just one arbitrary log from each group to attempt submission to. Ensure that the new code path does the right thing by adding a new zlint which checks that the two SCTs embedded in a certificate come from logs run by different operators. To support this lint, which needs to have a canonical mapping from logs to their operators, import the Chrome CT Log List JSON Schema and autogenerate Go structs from it so that we can parse a real CT Log List. Also add flags to all services which run these lints (the CA and cert-checker) to let them load a CT Log List from disk and provide it to the lint. Finally, since we now have the ability to load a CT Log List file anyway, use this capability to simplify configuration of the RA. Rather than listing all of the details for each log we're willing to submit to, simply list the names (technically, Descriptions) of each log, and look up the rest of the details from the log list file. To support this change, SRE will need to deploy log list files (the real Chrome log list for prod, and a custom log list for staging) and then update the configuration of the RA, CA, and cert-checker. Once that transition is complete, the deletion TODOs left behind by this change will be able to be completed, removing the old RA configuration and old ctpolicy race logic. Part of #5938	2022-05-25 15:14:57 -07:00
Jacob Hoffman-Andrews	76f987a1df	Reland "Allow expiration mailer to work in parallel" (#6133 ) This reverts commit `7ef6913e71`. We turned on the `ExpirationMailerDontLookTwice` feature flag in prod, and it's working fine but not clearing the backlog. Since https://github.com/letsencrypt/boulder/pull/6100 fixed the issue that caused us to (nearly) stop sending mail when we deployed #6057, this should be safe to roll forward. The revert of the revert applied cleanly, except for expiration-mailer/main.go and `main_test.go`, particularly around the contents of `processCerts` (where `sendToOneRegID` was extracted from) and `sendToOneRegID` itself. So those areas are good targets for extra attention.	2022-05-23 16:16:43 -07:00
Jacob Hoffman-Andrews	be893678bd	expiration-mailer: feature-gate bug fix (#6122 ) We recently landed a fix so the expiration-mailer won't look twice at the same certificate. This will cause an immediate behavior change when it is deployed, and that might have surprising effects. Put the fix behind a feature flag so we can control when it rolls out more carefully.	2022-05-16 14:17:23 -07:00
Jacob Hoffman-Andrews	a4ba9b1adb	rocsp/config: fix PoolSize comment (#6110 ) The go-redis docs say default is 10 * NumCPU, but the actual code says 5. Extra context: `2465baaab5/options.go (L143-L145)` `2465baaab5/cluster.go (L96-L98)` For Options, the default (documented) is 10 * NumCPUs. For ClusterOptions, the default (undocumented) is 5 * NumCPUs. We use ClusterOptions. Also worth noting: for ClusterOptions, the limit is per node.	2022-05-12 16:29:26 -07:00
Jacob Hoffman-Andrews	25e4b7e7fa	expiration-mailer: Deprecate NagCheckInterval (#6103 ) This was introduced when expiration-mailer was run by cron, and was a way for expiration-mailer to know something about its expected run interval so it could send notifications "on time" rather than "just after" the configured email time. Now that expiration-mailer runs as a daemon we can simply pull this value from `Frequency`, which is set to the same value in prod.	2022-05-12 16:28:42 -07:00
Aaron Gable	7ef6913e71	Revert "Allow expiration mailer to work in parallel" (#6080 ) When deployed, the newly-parallel expiration-mailer encountered unexpected difficulties and dropped to apparently sending nearly zero emails despite not throwing any real errors. Reverting the parallelism change until we understand and can fix the root cause. This reverts two commits: - Allow expiration mailer to work in parallel (#6057) - Fix data race in expiration-mailer test mocks (#6072) It also modifies the revert to leave the new `ParallelSends` config key in place (albeit completely ignored), so that the binary containing this revert can be safely deployed regardless of config status. Part of #5682	2022-05-03 13:18:40 -07:00
Jacob Hoffman-Andrews	9629c88d66	Allow expiration mailer to work in parallel (#6057 ) Previously, each account's email would be sent in serial, along with several reads from the database (to check for certificate renewal) and several writes to the database (to update `certificateStatus.lastExpirationNagSent`). This adds a config field for the expiration mailer that sets the parallelism it will use. That means making and using multiple SMTP connections as well. Previously, `bmail.Mailer` was not safe for concurrent use. It also had a piece of API awkwardness: after you created a Mailer, you had to call Connect on it to change its state. Instead of treating that as a state change on Mailer, I split out a separate component: `bmail.Conn`. Now, when you call `Mailer.Connect()`, you get a Conn. You can send mail on that Conn and Close it when you're done. A single Mailer instance can produce multiple Conns, so Mailer is now concurrency-safe (while Conn is not). This involved a moderate amount of renaming and code movement, and GitHub's move detector is not keeping up 100%, so an eye towards "is this moved code?" may help. Also adding `?w=1` to the diff URL to ignore whitespace diffs.	2022-04-21 18:04:55 -07:00
Jacob Hoffman-Andrews	4467cf27db	Update config from config-next (#6051 ) This copies over settings from config-next that are now deployed in prod. Also, I updated a comment in sd-test-srv to more accurately describe how SRV records work.	2022-04-19 12:10:26 -07:00
Aaron Gable	dab8a71b0e	Use new RA methods from WFE revocation path (#5983 ) Simplify the WFE `RevokeCertificate` API method in three ways: - Remove most of the logic checking if the requester is authorized to revoke the certificate in question (based on who is making the request, what authorizations they have, and what reason they're requesting). That checking is now done by the RA. Instead, simply verify that the JWS is authenticated. - Remove the hard-to-read `authorizedToRevoke` callbacks, and make the `revokeCertBySubscriberKey` (nee `revokeCertByKeyID`) and `revokeCertByCertKey` (nee `revokeCertByJWK`) helpers much more straight-line in their execution logic. - Call the RA's new `RevokeCertByApplicant` and `RevokeCertByKey` gRPC methods, rather than the deprecated `RevokeCertificateWithReg`. This change, without any flag flips, should be invisible to the end-user. It will slightly change some of our log message formats. However, by now relying on the new RA gRPC revocation methods, this change allows us to change our revocation policies by enabling the `AllowDoubleRevocation` and `MozRevocationReasons` feature flags, which affect the behavior of those new helpers. Fixes #5936	2022-03-28 14:14:11 -07:00
Samantha	7c22b99d63	akamai-purger: Improve throughput and configuration safety (#6006 ) - Add new configuration key `throughput`, a mapping which contains all throughput related akamai-purger settings. - Deprecate configuration key `purgeInterval` in favor of `purgeBatchInterval` in the new `throughput` configuration mapping. - When no `throughput` or `purgeInterval` is provided, the purger uses optimized default settings which offer 1.9x the throughput of current production settings. - At startup, all throughput related settings are modeled to ensure that we don't exceed the limits imposed on us by Akamai. - Queue is now `[][]string`, instead of `[]string`. - When a given queue entry is purged we know all 3 of its URLs were purged. - At startup we know the size of a theoretical request to purge based on the number of queue entries included - Raises the queue size from ~333-thousand cached OCSP responses to 1.25-million, which is roughly 6 hours of work using the optimized default settings - Raise `purgeInterval` in test config from 1ms, which violates API limits, to 800ms Fixes #5984	2022-03-23 17:23:07 -07:00
Andrew Gabbitas	79048cffba	Support writing initial OCSP response to redis (#5958 ) Adds a rocsp redis client to the sa if cluster information is provided in the sa config. If a redis cluster is configured, all new certificate OCSP responses added with sa.AddPrecertificate will attempt to be written to the redis cluster, but will not block or fail on errors. Fixes: #5871	2022-03-21 20:33:12 -06:00
Aaron Gable	07d56e3772	Add new, simpler revocation methods to RA (#5969 ) Add two new gRPC methods to the RA: - `RevokeCertByKey` will be used when the API request was signed by the certificate's keypair, rather than a Subscriber keypair. If the request is for reason `keyCompromise`, it will ensure that the key is added to the blocked keys table, and will attempt to "re-revoke" a certificate that was already revoked for some other reason. - `RevokeCertByApplicant` supports both the path where the original subscriber or another account which has proven control over all of the identifiers in the certificate requests revocation via the API. It does not allow the requested reason to be `keyCompromise`, as these requests do not represent a demonstration of key compromise. In addition, add a new feature flag `MozRevocationReasons` which controls the behavior of these new methods. If the flag is not set, they behave like they have historically (see above). If the flag is set to true, then the new methods enforce the upcoming Mozilla policies around revocation reasons, namely: - Only the original Subscriber can choose the revocation reason; other clients will get a set reason code based on the method of requesting revocation. When the original Subscriber requests reason `keyCompromise`, this request will be honored, but the key will not be blocked and other certificates with that key will not also be revoked. - Revocations signed with the certificate key will always get reason `keyCompromise`, because we do not know who is sending the request and therefore must assume that the use of the key in this way represents compromise. Because these requests will always be for reason `keyCompromise`, they will always be added to the blocked keys table and they will always attempt "re-revocation".
- Revocations authorized via control of all names in the cert will always get reason `cessationOfOperation`, which is to be used when the original Subscriber does not control all names in the certificate anymore. Finally, update the existing `AdministrativelyRevokeCertificate` method to use the new helper functions shared by the two new methods. Part of #5936	2022-03-14 08:58:17 -07:00
Andrew Gabbitas	d006588f46	Orphan finder: Fix redundant syslog config value (#5971 ) Replace redundant stdoutlevel with a sysloglevel value in test configs.	2022-02-24 14:24:03 -08:00
Aaron Gable	114d10a6cb	Integrate goodkey checks into cert-checker (#5870 )	2022-01-11 09:42:12 -08:00
Jacob Hoffman-Andrews	1c573d592b	Add account cache to WFE (#5855 ) Followup from #5839. I chose groupcache/lru as our LRU cache implementation because it's part of the golang org, written by one of the Go authors, and very simple and easy to read. This adds an `AccountGetter` interface that is implemented by both the AccountCache and the SA. If the WFE config includes an AccountCache field, it will wrap the SA in an AccountCache with the configured max size and expiration time. We set an expiration time on account cache entries because we want a bounded amount of time that they may be stale by. This will be used in conjunction with a delay on account-updating pathways to ensure we don't allow authentication with a deactivated account or changed key. The account cache stores corepb.Registration objects because protobufs have an established way to do a deep copy. Deep copies are important so the cache can maintain its own internal state and ensure nothing external is modifying it. As part of this process I changed construction of the WFE. Previously, "SA" and "RA" were public fields that were mutated after construction. Now they are parameters to the constructor, along with the new "accountGetter" parameter. The cache includes stats for requests categorized by hits and misses.	2021-12-15 11:10:23 -08:00
Aaron Gable	89000bd61c	Add close-primes detection via Fermat's factorization (#5853 ) Add a new check to GoodKey which attempts to factor the public modulus of the presented key using Fermat's factorization method. This method will succeed if and only if the prime factors are very close to each other -- i.e. almost certainly were not selected independently from a random uniform distribution, but were instead calculated via some other less secure method. To support this new feature, add a new config flag to the RA, CA, and WFE, which all use the GoodKey checks. As part of adding this new config value, refactor the GoodKey config items into their own config struct which can be re-used across all services. If the new `FermatRounds` config value has not been set, it will default to zero, causing no factorization to be attempted. Fixes #5850 Part of #5851	2021-12-14 09:19:33 -08:00
Aaron Gable	5c02deabfb	Remove wfe1 integration tests (#5840 ) These tests are testing functionality that is no longer in use in production deployments of Boulder. As we go about removing wfe1 functionality, these tests will break, so let's just remove them wholesale right now. I have verified that all of the tests removed in this PR are duplicated against wfe2. One of the changes in this PR is to cease starting up the wfe1 process in the integration tests at all. However, that component was serving requests for the AIA Issuer URL, which gets queried by various OCSP and revocation tests. In order to keep those tests working, this change also adds an integration-test-only handler to wfe2, and updates the CA configuration to point at the new handler. Part of #5681	2021-12-10 12:40:22 -08:00
Jacob Hoffman-Andrews	3d7206a183	ocsp-updater: add support for writing to Redis (#5825 ) If configured, ocsp-updater will write responses to Redis in parallel with MariaDB, giving up if Redis is slower and incrementing a stat. Factors out the ShortIDIssuer concept from rocsp-tool into rocsp_config.	2021-12-06 14:46:46 -08:00
Andrew Gabbitas	cbd24db64b	Add ocsp-responder redis lookup support (#5800 ) This is the first step in moving OCSP responses from mysql to redis. Adds support for parallel lookups to mysql and redis. The mysql source remains the source of truth. If the secondaryLookup [redis] succeeds, compare against the primaryLookup [mysql] and return if they concur that the status is the same and the redis source is at least as fresh as mysql. There are checks on the database response for `certStatus.IsExpired`, `certStatus.OCSPLastUpdated.IsZero()` and `!src.filter.responseMatchesIssuer`. The expired check isn't necessary for redis because the response will be set with a ttl and drop out of redis when it reaches the ttl, and delivering a response for an expired certificate until that happens isn't a problem. The `certStatus.OCSPLastUpdated.IsZero()` check is a MySQL check that isn't needed in redis. The `responseMatchesIssuer` check is important and will need to be checked in some form before MySQL is no longer the source of truth. There is another project to check issuer for responses and isn't scoped for this change.	2021-12-06 10:47:05 -07:00
Aaron Gable	c7643992a0	Enable USE INDEX hints when querying authz2 table (#5823 ) Add a new feature flag `GetAuthzUseIndex` which causes the SA to add `USE INDEX (regID_identifer_status_expires_idx)` to its authz2 database queries. This should encourage the query planner to actually use that index instead of falling back to large table-scans. Fixes #5822	2021-12-01 14:48:09 -08:00
Aaron Gable	8eb7272adf	SA: Use read-only connector for GetAuthorizations2 (#5815 ) Add a feature flag which causes the SA to switch between using the traditional read-write database connector (pointed at the primary db) or the newer read-only database connector (usually pointed at a replica) when executing the `GetAuthorizations2` query.	2021-11-24 16:57:42 -08:00
Aaron Gable	1a1cd24237	Add tests for the experimental `renewalInfo` endpoint (#5750 ) Add a unit test and an integration test that both exercise the new experimental ACME Renewal Info endpoint. These tests do not yet validate the contents of the response, just that the appropriate HTTP response code is returned, but they will be developed as the code under test evolves. Fixes #5674	2021-10-27 15:00:56 -07:00
Jacob Hoffman-Andrews	ba0ea090b2	integration: save hierarchy across runs (#5729 ) This allows repeated runs using the same hierarchy, and avoids spurious errors from ocsp-updater saying "This CA doesn't have an issuer cert with ID XXX" Fixes #5721	2021-10-20 17:06:33 -07:00
Jacob Hoffman-Andrews	dc742fc320	Fix expiration-mailer integration test locally. (#5719 ) The expiration mailer processes certificates in batches of size `certLimit` (default 100). In production, it runs in daemon mode, so it will go on to the next batch when the current one is done. However, in local integration tests we rely on it getting all its work done in a single run. This works when you're running from a clean slate, but if you've run integration tests a bunch of times, there will be a bunch of certificates from previous runs that clog up the queue, and it won't send mail for the specific certificate the integration test is looking for. Solution: Set `certLimit` very high in the config. Also, update the default times for sending mail to match what we have in prod.	2021-10-18 19:51:34 -07:00
Aaron Gable	e0c3e2c1df	Reject unrecognized config keys (#5649 ) Instead of using the default `json.Unmarshal`, explicitly construct and use a `json.Decoder` so that we can set the `DisallowUnknownFields` flag on the decoder. This causes any unrecognized config keys to result in errors at boulder startup time. Fixes #5643	2021-09-24 10:13:44 -07:00
Aaron Gable	4ef9fb1b4f	Add new SA.NewOrderAndAuthzs gRPC method (#5602 ) Add a new method to the SA's gRPC interface which takes both an Order and a list of new Authorizations to insert into the database, and adds both (as well as the various ancillary rows) inside a transaction. To enable this, add a new abstraction layer inside the `db/` package that facilitates inserting many rows at once, as we do for the `authz2`, `orderToAuthz2`, and `requestedNames` tables in this operation. Finally, add a new codepath to the RA (and a feature flag to control it) which uses this new SA method instead of separately calling the `NewAuthorization` method multiple times. Enable this feature flag in the config-next integration tests. This should reduce the failure rate of the new-order flow by reducing the number of database operations by coalescing multiple inserts into a single multi-row insert. It should also reduce the incidence of new authorizations being created in the database but then never exposed to the subscriber because of a failure later in the new-order flow, both by reducing failures overall and by adding those authorizations in a transaction which will be rolled back if there is a later failure. Fixes #5577	2021-09-03 13:48:04 -07:00
J.C. Jones	0f16ff6d17	ocsp-updater: Split work by a configurable serial suffix shard (#5628 ) - Enable `ocsp-updater` to query for serials matching a configurable suffix to allow for multiple `ocsp-updater` instances at once - Add field `SerialSuffixShards` to `OCSPUpdaterConfig` - Add field `serialSuffixShards` to `test/config-next/ocsp-updater.json` - Add codepath to default to the previous query when `serialSuffixShards` is missing from the JSON config Part of #5629 Fixes #5625	2021-09-02 15:52:18 -07:00
Andrew Gabbitas	17f300387b	BadKeyRevoker: backoff on errors or no work (#5580 ) - Add exponential backoff - Add key `backoffIntervalMax` to JSON config with a default of `60s` Fixes #5559	2021-08-19 13:31:47 -07:00
Samantha	819d57ebdb	cert-checker: Use cmd.ConfigDuration instead of int to express acceptable validity periods (#5582 ) Add `acceptableValidityDurations` to cert-checker's config, which uses `cmd.ConfigDuration` instead of `int` to express the acceptable validity periods. Deprecate the older int-based `acceptableValidityPeriods`. This makes it easier to reason about the values in the configs, and brings this config into line with other configs (such as the CA). Fixes #5542	2021-08-17 08:52:22 -07:00
Aaron Gable	1c6842cf69	Delete expired-authz-purger2 (#5570 ) Delete the expired-authz-purger2 binary, as well as the various config files, tests, and test helpers that exist to support it. This utility is no longer necessary, as it has not been running for quite some time, and we have developed alternative means of keeping the growth of the authz table under control. Fixes #5568	2021-08-11 14:39:57 -07:00
Aaron Gable	ac3e5e70c4	Delete boulder-janitor (#5571 ) Delete the boulder-janitor binary, and the various configs and tests which exist to support it. This tool has not been actively running in quite some time. The tables which it covers are either supported by our more recent partitioning methods, or are rate-limit tables that we hope to move out of mysql entirely. The cost of maintaining the janitor is not offset by the benefits it brings us (or the lack thereof). Fixes #5569	2021-08-11 11:10:24 -07:00
J.C. Jones	7b31bdb30a	Add read-only dbConns to SQLStorageAuthority and OCSPUpdater (#5555 ) This changeset adds a second DB connect string for the SA for use in read-only queries that are not themselves dependencies for read-write queries. In other words, this is attempting to only catch things like rate-limit `SELECT`s and other coarse-counting, so we can potentially move those read queries off the read-write primary database. It also adds a second DB connect string to the OCSP Updater. This is a little trickier, as the subsequent `UPDATE`s _are_ dependent on the output of the `SELECT`, but in this case it's operating on data batches, and a few seconds' replication latency are several orders of magnitude below the threshold for update frequency, so any certificates that aren't caught on run `n` can be caught on run `n+1`. Since we export DB metrics to Prometheus, this also refactors `InitDBMetrics` to take a DB Address (host:port tuple) and User out of the DB connection DSN and include those as labels in the metrics. Fixes #5550 Fixes #4985	2021-08-02 11:21:34 -07:00
Aaron Gable	3a0996b147	Update cert-checker config to include all params (#5509 ) These parameters are currently accepted by cert-checker either via the command line or via the config. Add them to the config-next config as we move towards deprecating their CLI equivalents. Part of #5489	2021-07-09 10:17:15 -07:00
Aaron Gable	20f1bf1d0d	Compute validity periods inclusive of notAfter second (#5494 ) In the CA, compute the notAfter timestamp such that the cert is actually valid for the intended duration, not for one second longer. In the Issuance library, compute the validity period by including the full length of the final second indicated by the notAfter date when determining if the certificate request matches our profile. Update tests and config files to match. Fixes #5473	2021-06-24 13:17:29 -07:00
Aaron Gable	b087e6a2bf	cert-checker: take validity period from config (#5490 ) Add a new `acceptableValidityPeriods` field to cert-checker's config. This field is a list of integers representing validity periods measured in seconds (so 7776000 is 90 days). This field is multi-valued to enable transitions between different validity periods (e.g. 90 days + 1 second to 90 days, or 90 days to 30 days). If the field is not provided, cert-checker defaults to 90 days. Also update the way that cert-checker computes the validity period of the certificates it is checking to include the full width of the final second represented by the notAfter timestamp. Finally, update the tests to support this new behavior. Fixes #5472	2021-06-17 17:29:47 -07:00
Aaron Gable	6e1357efa3	Update boulder test validity period to match prod (#5493 ) In prod, the CA is now configured to issue certificates with notAfter timestamps 7775999 seconds after their notBefore timestamp, and to enforce that same difference when validating issuance requests. Update our test configs to match.	2021-06-16 18:08:57 -07:00
Samantha	be1c24165e	test: Fix uppercase ECDSAAllowListFilename in test JSON configs (#5487 )	2021-06-16 14:24:30 -07:00
Andrew Gabbitas	b5aab29407	Make boulder-observer HTTP User-Agent configurable (#5484 ) - Make User-Agent configurable in config file - Fix README example - Add tests	2021-06-14 11:08:18 -06:00
Samantha	6955df0f56	contact-auditor: Add tool to audit registration contacts (#5425 ) Add tool to audit subscriber registrations for e-mail addresses that `notify-mailer` is currently configured to skip. - Add `cmd/contact-auditor` with README - Add test coverage for `cmd/contact-auditor` - Add config file at `test/config/contact-auditor` Part of #5372	2021-06-07 14:21:54 -07:00
Aaron Gable	9abb39d4d6	Honeycomb integration proof-of-concept (#5408 ) Add Honeycomb tracing to all Boulder components which act as HTTP servers, gRPC servers, or gRPC clients. Add many values which we currently emit to logs to the trace spans. Add a way to configure the Honeycomb integration to our config files, and by default configure all of our tests to "mute" (send nothing). Followup changes will refine the configuration, attempt to reduce the new dependency load, and introduce better sampling. Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218	2021-05-24 16:13:08 -07:00
Samantha	1f19eee55b	CA: Fix startup bug caused by ECDSA allow list reloader (#5412 ) Solve a nil pointer dereference of `ecdsaAllowList` in `boulder-ca` by calling `reloader.New()` in constructor `ca.NewECDSAAllowListFromFile` instead. - Add missing entry `ECDSAAllowListFilename` to `test/config-next/ca-a.json` and `test/config-next/ca-b.json` - Add missing file ecdsaAllowList.yml to `test/config-next` - Add missing entry `ECDSAAllowedAccounts` to `test/config/ca-a.json` and `test/config/ca-b.json` - Move creation of the reloader to `NewECDSAAllowListFromFile` Fixes #5414	2021-05-17 14:41:15 -07:00
Aaron Gable	a19ebfa0e9	VA: Query SRV to preload/cache DNS resolver addrs (#5360 ) Abstract out the way that the bdns library keeps track of the resolvers it uses to do DNS lookups. Create one implementation, the `StaticProvider`, which behaves exactly the same as the old mechanism (providing whatever names or addresses were given in the config). Create another implementation, `DynamicProvider`, which re-resolves the provided name on a regular basis. The dynamic provider consumes a single name, does a lookup on that name for any SRV records suggesting that it is running a DNS service, and then looks up A records to get the address of all the names returned by the SRV query. It exports its successes and failures as a prometheus metric. Finally, update the tests and config-next configs to work with this new mechanism. Give sd-test-srv the capability to respond to SRV queries, and put the names it provides into docker's default DNS resolver. Fixes #5306	2021-04-20 10:11:53 -07:00
Aaron Gable	6e6be607fa	Deprecate StoreIssuerInfo flag (#5386 ) This flag is no longer referenced by any code, and can be safely deprecated. Part of #5079	2021-04-13 17:18:01 -07:00
Aaron Gable	0196d7e876	orphan-finder: use serial + issuerID for OCSP (#5383 ) Update orphan-finder's `generateOCSP` function to make its request to the CA using the certificate's serial number and issuer ID, rather than the full DER bytes. To facilitate this, add an `IssuerCerts` item to the orphan-finder config, and add an `issuers` map to its struct, mimicking fields of the same name and purpose on the RA. Leave the old code path in the `generateOCSP` method for now, to be fully removed after the new config has been deployed. Also update the unittests to use real on-disk certificates instead of inline strings, and similarly correct the integration test to use a certificate with the correct Issuer field. Part of #5079 Fixes #5149	2021-04-05 09:13:21 -07:00
Samantha	97e393d2e7	boulder-observer (#5315 ) Add configuration driven Prometheus black box metric exporter	2021-03-29 12:56:54 -07:00
Aaron Gable	547dbfc93a	Remove Common.DNSResolver from VA config (#5355 ) This field is not used by any production configs, so we can safely remove it. Also, add config fields for DNSTimeout and DNSAllowLoopbackAddress outside of the Common sub-struct, to allow for its removal later. Part of #5242	2021-03-19 10:02:04 -07:00
Samantha	35340ff67a	Move expired-authz-purger2 config to test directory (#5352 ) - Edit integration test to start expired-authz-purger2 with config/ config-next - Move config from `cmd/expired-authz-purger2/config.json` to `test/config/expired-authz-purger2.json` - Add a copy of `test/config/expired-authz-purger2.json` to `test/config-next/` Fixes #5351	2021-03-18 17:56:25 -07:00
Aaron Gable	91473b384b	Remove common config from publisher (#5353 ) The old `config.Common.CT.IntermediateBundleFilename` format is no longer used in any production configs, and can be removed safely. Part of #5162 Part of #5242 Fixes #5269	2021-03-18 16:59:06 -07:00
Jacob Hoffman-Andrews	b4e483d38b	Add gRPC MaxConnectionAge config. (#5311 ) This allows servers to tell clients to go away after some period of time, which triggers the clients to re-resolve DNS. Per grpc/grpc#12295, this is the preferred way to do this. Related: #5307.	2021-03-01 18:37:47 -08:00
Samantha	e2e7dad034	Move cmd.DBConfig fields to their own named sub-struct (#5286 ) Named field `DB`, in each component configuration struct, acts as the receiver for the value of `db` when component JSON files are unmarshalled. When `cmd.DBConfig` fields are received at the root of a component configuration struct instead of `DB`, copy them to the `DB` field of the component configuration struct. Move existing `cmd.DBConfig` values from the root of each component's JSON configuration in `test/config-next` to `db` Part of #5275	2021-02-16 10:48:58 -08:00
Samantha	2efabf57b6	Adding support for multiple issuers to publisher (#5272 ) Publisher currently loads a PEM formatted certificate bundle from file using LoadCertBundle, a utility function in the core package. LoadCertBundle parses the PEM file to a slice of x509.Certificates and returns them to boulder-publisher (without checking validity). Using these x509 Certificates, boulder-publisher constructs an ASN1Cert bundle. This bundle is passed to each new publisher instance. When publisher receives a request it unconditionally appends this bundle to each end-entity precertificate for submission to CT logs. This change augments this process to add support for multiple issuers using the IssuerNameID concept in the Issuance package. Config field Common.CT.CertificateBundleFilename has been replaced with the Chains field. LoadChain, a utility function added in PR #5271, loads and validates the chain (which nets us some added deploy-time safety) before returning it to boulder-publisher. Using these x509 Certificates, boulder-publisher constructs a mapping of IssuerNameID to ASN1Cert bundle and passes this to each new publisher instance. When publisher receives a request it determines the IssuerNameID of the precertificate to select and append the correct ASN1Cert bundle for a given Issuer. A followup issue #5269 has been created to address removal of the Common field from the publisher configuration and code has been commented with TODOs where code will need to be removed or refactored. Fixes #1669	2021-02-08 12:23:44 -08:00
Andrew Gabbitas	0fdfbe1211	Deprecate StripDefaultSchemePort flag (#5265 ) This flag is now enabled in Let's Encrypt staging/prod. This change deprecates the flag and prepares it for deletion in a future change. It can then be removed once no staging/prod configs reference the flag. Fixes #5236	2021-02-08 11:30:52 -08:00
Aaron Gable	379826d4b5	WFE2: Improve support for multiple issuers & chains (#5247 ) This change simplifies and hardens the wfe2's support for having multiple issuers, and multiple chains for each issuer, configured and loaded in memory. The only config-visible change is replacing the old two separate config values (`certificateChains` and `alternateCertificateChains`) with a single value (`chains`). This new value does not require the user to know and hand-code the AIA URLs at which the certificates are available; instead the chains are simply presented as lists of files. If this new config value is present, the old config values will be ignored; if it is not, the old config values will be respected. Behind the scenes, the chain loading code has been completely changed. Instead of loading PEM bytes directly from the file, and then asserting various things (line endings, no trailing bits, etc) about those bytes, we now parse a certificate from the file, and in-memory recreate the PEM from that certificate. This approach allows the file loading to be much more forgiving, while also being stricter: we now check that each certificate in the chain is correctly signed by the next cert, and that the last cert in the chain is a self-signed root. Within the WFE itself, most of the internal structure has been retained. However, both the internal `issuerCertificates` (used for checking that certs we are asked to revoke were in fact issued by us) and the `certificateChains` (used to append chains to end-entity certs when served to clients) have been updated to be maps keyed by IssuerNameID. This allows revocation checking to not have to iterate through the whole list of issuers, and also makes it easy to double-check that the signatures on end-entity certs are valid before serving them. Actual checking of the validity will come in a follow-up change, due to the invasive nature of the necessary test changes. Fixes #5164	2021-01-27 15:07:58 -08:00
Samantha	e0510056cc	Enhancements to SQL driver tuning via JSON config (#5235 ) Historically the only database/sql driver setting exposed via JSON config was maxDBConns. This change adds support for maxIdleConns, connMaxLifetime, connMaxIdleTime, and renames maxDBConns to maxOpenConns. The addition of these settings will give our SRE team a convenient method for tuning the reuse/closure of database connections. A new struct, DBSettings, has been added to SA. The struct, and each of its fields, has been commented. All new fields have been plumbed through to the relevant Boulder components and exported as Prometheus metrics. Tests have been added/modified to ensure that the fields are being set. There should be no loss in coverage. Deployability concerns for the migration from maxDBConns to maxOpenConns have been addressed with the temporary addition of the helper method cmd.DBConfig.GetMaxOpenConns(). This method can be removed once test/config is defaulted to using maxOpenConns. Relevant sections of the code have TODOs added that link back to a newly opened issue. Fixes #5199	2021-01-25 15:34:55 -08:00
Jacob Hoffman-Andrews	8b9145838d	Add logging of OCSP generation events (#5223 ) This adds a new component to the CA, ocspLogQueue, which batches up OCSP generation events for audit logging. It will log accumulated events when it reaches a certain line length, or when a maximum amount of time has passed.	2021-01-12 15:31:49 -08:00
Aaron Gable	beee17c510	Janitor: refactor to be controlled by config (#5195 ) Previously, configuration of the boulder-janitor was split into two places: the actual json config file (which controlled which jobs would be enabled, and what their rate limits should be), and the janitor code itself (which controlled which tables and columns those jobs should query). This resulted in significant duplicated code, as most of the jobs were identical except for their table and column names. This change abstracts away the query which jobs use to find work. Instead of having each job type parse its own config and produce its own work query (in Go code), now each job supplies just a few key values (the table name and two column names) in its JSON config, and the Go code assembles the appropriate query from there. We are able to delete all of the files defining individual job types, and replace them with a single slightly smarter job constructor. This enables further refactorings, namely: * Moving all of the logic code into its own module; * Ensuring that the exported interface of that module is safe (i.e. that a client cannot create and run jobs without them being valid, because the only exposed methods ensure validity); * Collapsing validity checks into a single location; * Various renamings.	2020-12-17 09:53:22 -08:00

522 Commits