boulder

Commit Graph

Author	SHA1	Message	Date
Phil Porada	c7dc3a8d72	Test against go1.20.6 (#6987 ) This version includes a fix that seems relevant to us: > The HTTP/1 client did not fully validate the contents of the Host header. A maliciously crafted Host header could inject additional headers or entire requests. The HTTP/1 client now refuses to send requests containing an invalid Request.Host or Request.URL.Host value. > > Thanks to Bartek Nowotarski for reporting this issue. > > Includes security fixes for CVE-2023-29406 and Go issue https://go.dev/issue/60374	2023-07-11 12:50:42 -07:00
Phil Porada	947e199016	Add govulncheck to CI (#6963 ) Fixes https://github.com/letsencrypt/boulder/issues/6354 Runs [govulncheck](https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck) in a one-shot container so that PR creation, updates to a PR, and merges to main can contact the govuln API and check for known vulnerabilities. Lastly, upgrades the version of golangci-lint to the [latest available (v1.53.3)](https://github.com/golangci/golangci-lint/releases). --------- Co-authored-by: Aaron Gable <aaron@letsencrypt.org>	2023-07-11 09:51:20 -04:00
Jacob Hoffman-Andrews	cd24b9db20	ca: deprecate StoreLintingCertificateInsteadOfPrecertificate (#6970 ) And turn off the orphan queue in config-next.	2023-07-05 10:44:08 -07:00
Aaron Gable	cc596bd4eb	Begin testing on go1.21rc2 with loopvar experiment (#6952 ) Add go1.21rc2 to the matrix of go versions we test against. Add a new step to our CI workflows (boulder-ci, try-release, and release) which sets the "GOEXPERIMENT=loopvar" environment variable if we're running go1.21. This experiment makes it so that loop variables are scoped only to their single loop iteration, rather than to the whole loop. This prevents bugs such as our CAA Rechecking incident (https://bugzilla.mozilla.org/show_bug.cgi?id=1619047). Also add a line to our docker setup to propagate this environment variable into the container, where it can affect builds. Finally, fix one TLS-ALPN-01 test to have the fake subscriber server actually willing to negotiate the acme-tls/1 protocol, so that the ACME server's tls client actually waits to (fail to) get the certificate, instead of dying immediately. This fix is related to the upgrade to go1.21, not the loopvar experiment. Fixes https://github.com/letsencrypt/boulder/issues/6950	2023-06-26 16:35:29 -07:00
Aaron Gable	8224fad20b	Update to go1.20.5 (#6946 ) We are already running go1.20.5 in production.	2023-06-20 14:55:37 -07:00
Jacob Hoffman-Andrews	a2b2e53045	cmd: fail without panic (#6935 ) For "ordinary" errors like "file not found" for some part of the config, we would prefer to log an error and exit without logging about a panic and printing a stack trace. To achieve that, we want to call `defer AuditPanic()` once, at the top of `cmd/boulder`'s main. That's so early that we haven't yet parsed the config, which means we haven't yet initialized a logger. We compromise: `AuditPanic` now calls `log.Get()`, which will retrieve the configured logger if one has been set up, or will create a default one (which logs to stderr/stdout). AuditPanic and Fail/FailOnError now cooperate: Fail/FailOnError panic with a special type, and AuditPanic checks for that type and prints a simple message before exiting when it's present. This PR also coincidentally fixes a bug: panicking didn't previously cause the program to exit with nonzero status, because it recovered the panic but then did not explicitly exit nonzero. Fixes #6933	2023-06-20 12:29:02 -07:00
Samantha	124c4cc6f5	grpc/sa: Implement deep health checks (#6928 ) Add the necessary scaffolding for deep health checking of our various gRPC components. Each component implementation that also implements the grpc.checker interface will be checked periodically, and the health status of the component will be updated accordingly. Add the necessary methods to SA to implement the grpc.checker interface and register these new health checks with Consul. Additionally: - Update entry point script to check for ProxySQL readiness. - Increase the poll rate for gRPC Consul checks from 5s to 2s to help with DNS failures, due to check failures, on startup. - Change log level for Consul from INFO to ERROR to deal with noisy logs full of transport failures due to Consul gRPC checks firing before the SAs are up. Fixes #6878 Part of #6795	2023-06-12 13:58:53 -04:00
Jacob Hoffman-Andrews	2041e8723b	integration: shorten log output (#6894 ) Remove the load test stage of the integration test, which generates superfluous amounts of log. Turn down logging on the CA and VA from info to error-only. Part of https://github.com/letsencrypt/boulder/issues/6890	2023-06-05 13:11:19 -04:00
Samantha	dc269a63d5	docker: Update consul container to match production (#6913 ) - Update consul container from `1.13.1` to `1.14.2` to match production. - Specify `grpc_tls`, now required instead of defaulted to `8503` when `enable_agent_tls_for_checks` is specified. Part of #6911	2023-06-02 14:35:07 -04:00
Jacob Hoffman-Andrews	80e1510819	admin: add clear-email subcommand (#6919 ) When a user wants their email address deleted from the database but no longer has access to their account, this allows an administrator to clear it. This adds `admin` as an alias for `admin-revoker`, because we'd like the clear-email sub-command to be a part of that overall tool, but it's not really revocation related. Part of #6864	2023-06-01 14:33:24 -04:00
Jacob Hoffman-Andrews	521eb55d1e	test: better message for different empty slices (#6920 ) Given two empty slices, one that is equal to nil and one that is not, AssertDeepEquals used to produce this confusing output: [[]] !(deep)= [[]] After this change, it produces: [[]string(nil)] !(deep)= [[]string{}]	2023-05-26 09:41:23 -07:00
Aaron Gable	4305f64a28	Replace integration test root ocsp with crls (#6905 ) We no longer issue OCSP responses for our intermediate certificates, instead producing CRLs which cover those intermediates. Remove the OCSP response from our integration test ceremony, remove the configuration for the static ocsp-responder which serves that response, and remove the integration test which spins up and checks that responder. Replace all of the above with new CRLs generated as part of the integration test ceremony.	2023-05-24 14:22:43 -07:00
Samantha	f09a94bd74	consul: Configure gRPC health check for SA (#6908 ) Enable SA gRPC health checks in Consul ahead of further changes for #6878. Calls to the `Check` method of the SA's grpc.health.v1.Health service must respond `SERVING` before the `sa` service will be advertised in Consul DNS. Consul will continue to poll this service every 5 seconds. - Add `bconsul` docker service to boulder `bluenet` and `rednet` - Add TLS credentials for `consul.boulder`: ```shell $ openssl x509 -in consul.boulder/cert.pem -text | grep DNS DNS:consul.boulder ``` - Update `test/grpc-creds/generate.sh` to add `consul.boulder` - Update test SA configs to allow `consul.boulder` to access `grpc.health.v1.Health` Part of #6878	2023-05-23 13:16:49 -04:00
Aaron Gable	26adec08cc	Remove go1.20.3 from CI (#6898 ) We are no longer using go1.20.3 in prod.	2023-05-22 14:47:33 -07:00
Aaron Gable	fe523f142d	crl-updater: retry failed shards (#6907 ) Add per-shard exponential backoff and retry to crl-updater. Each individual CRL shard will be retried up to MaxAttempts (default 1) times, with exponential backoff starting at 1 second and maxing out at 1 minute between each attempt. This can effectively reduce the parallelism of crl-updater: while a goroutine is sleeping between attempts of a failing shard, it is not doing work on another shard. This is a desirable feature, since it means that crl-updater gently reduces the total load it places on the network and database when shards start to fail. Setting this new config parameter is tracked in IN-9140 Fixes https://github.com/letsencrypt/boulder/issues/6895	2023-05-22 12:59:09 -07:00
Aaron Gable	3990a08328	Add relevant domain to CAA errors and logs (#6886 ) When processing CAA records, keep track of the FQDN at which that CAA record was found (which may be different from the FQDN for which we are attempting issuance, since we crawl CAA records upwards from the requested name to the TLD). Then surface this name upwards so that it can be included in our own log lines and in the problem documents which we return to clients. Fixes https://github.com/letsencrypt/boulder/issues/3171	2023-05-22 15:08:56 -04:00
Matthew McPherrin	b7d9f8c2e3	In config-next/, opentelemetry -> openTelemetry for consistency (#6888 ) In configs, opentelemetry -> openTelemetry As pointed out in review of #6867, these should match the case of their corresponding Go identifiers for consistency. JSON keys are case-insensitive in Go (part of why we've got a fork in go-jose), so this change should have no functional impact.	2023-05-15 17:07:29 -04:00
Aaron Gable	62ff373885	Probs: remove divergences from RFC8555 (#6877 ) Remove the remaining divergences from RFC8555 regarding what error types we use in certain situations. Specifically: - use "invalidContact" instead of "invalidEmail"; - use "unsupportedContact" for contact addresses that use a protocol other than "mailto:"; and - use "unsupportedIdentifier" for identifiers that specify a type other than "dns".	2023-05-15 12:35:12 -07:00
Matthew McPherrin	c21b44bdc2	Rename CA's "--ca-addr" flag to "--addr" (#6889 ) Most boulder components have a command line flag to override what gRPC and debug port they listen on, which is used in tests to run multiple instances with the same configuration. However, CA's flag is named "--ca-addr", and not "--addr". This is inconsistent with SA, RA, VA, nonce, publisher, and purger. This flag isn't used in production, where we set it in the config file, so it shouldn't be a breaking change to rename it.	2023-05-15 11:17:07 -07:00
Matthew McPherrin	3aae67b8a9	Opentelemetry: Add option for public endpoints (#6867 ) This PR adds a new configuration block specifically for the otelhttp instrumentation. This block is separate from the existing "opentelemetry" configuration, and is only relevant when using otelhttp instrumentation. It does not share any codepath with the existing configuration, so it is at the top level to indicate which services it applies to. There's a bit of plumbing new configuration through. I've adopted the measured_http package to also set up opentelemetry instead of just metrics, which should hopefully allow any future changes to be smaller (just config & there) and more consistent between the wfe2 and ocsp responder. There's one option here now, which disables setting [otelhttp.WithPublicEndpoint](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#WithPublicEndpoint). This option is designed to do exactly what we want: Don't accept incoming spans as parents of the new span created in the server. Previously we had a setting to disable parent-based sampling to help with this problem, which doesn't really make sense anymore, so let's just remove it and simplify that setup path. The default of "false" is designed to be the safe option. It's set to True in the test/ configs for integration tests that use traces, and I expect we'll likely set it true in production eventually once the LBs are configured to handle tracing themselves. Fixes #6851	2023-05-12 15:34:34 -04:00
Samantha	310546a14e	VA: Support discovery of DNS resolvers via Consul (#6869 ) Deprecate `va.DNSResolver` in favor of backwards compatible `va.DNSProvider`. Fixes #6852	2023-05-12 12:54:31 -04:00
Samantha	19c5244088	test: Use consul hostname instead of IP for dnsAuthority (#6883 ) Standardize on hostnames for dnsAuthority to match production. Related to #6869	2023-05-11 14:13:53 -07:00
Jacob Hoffman-Andrews	f295626e4c	ca: remove simulated ISRG OID from config (#6879 ) We intend to issue in the future with only the CA/Browser Forum Domain Validated OID.	2023-05-10 12:39:12 -04:00
Jacob Hoffman-Andrews	ac4be89b56	grpc: add NoWaitForReady config field (#6850 ) Currently we set WaitForReady(true), which causes gRPC requests to not fail immediately if no backends are available, but instead wait until the timeout in case a backend does become available. The downside is that this behavior masks true connection errors. We'd like to turn it off. Fixes #6834	2023-05-09 16:16:44 -07:00
Samantha	c453ca0571	grpc: Deprecate clientNames field (#6870 ) - SRE removed in IN-8755 Fixes #6698	2023-05-08 14:49:27 -04:00
Samantha	487680629d	cmd: TLSConfig values should be string not *string (#6872 ) Fixes #6737	2023-05-08 13:21:42 -04:00
Samantha	c9173cc024	boulder-va: Remove deprecated Common fields stanza (#6871 ) - SRE removed in IN-8752. Fixes #6716	2023-05-08 11:47:17 -04:00
Matthew McPherrin	8427245675	OTel Integration test using jaeger (#6842 ) This adds Jaeger's all-in-one dev container (with no persistent storage) to boulder's dev docker-compose. It configures config-next/ to send all traces there. A new integration test creates an account and issues a cert, then verifies the trace contains some set of expected spans. This test found that async finalize broke spans, so I fixed that and a few related spots where we make a new context.	2023-05-05 10:41:29 -04:00
Phil Porada	f8f45f90a9	Test and build release on go1.20.4 (#6862 ) [Go 1.20.4](https://groups.google.com/g/golang-announce/c/MEb0UyuSMsU) contains security updates for the html/template package, which we use in `//cmd/bad-key-revoker`.	2023-05-04 10:55:02 -04:00
Aaron Gable	02fa680b08	Update path to ARI endpoint (#6859 ) Update the document number to the latest version, and remove the /get/ prefix since it now supports both the GET and POST portions of the spec. Also update one piece of tooling to properly get the ARI URL from the directory, rather than hard-coding it.	2023-05-03 15:20:51 -07:00
Matthew McPherrin	b5118dde36	Stop using DIRECTORY env var in integration tests (#6854 ) We only ever set it to the same value, and then read it back in make_client, so just hardcode it there instead. It's a bit spooky-action-at-a-distance and is process-wide with no synchronization, which means we can't safely use different values anyway.	2023-05-03 09:54:48 -04:00
Jacob Hoffman-Andrews	a9fc1cb882	Improve cert_storage_failed_test (#6849 ) Replace inline connect string with a new one in test/vars (that points to boulder_sa_integration). Remove comments about interpolateParams=false being required; it is not. Add clauses to getPrecertByName to ensure it follows its documented constraints (return the latest one). Follow-up on #6807. Fixes #6848.	2023-05-02 15:43:07 -07:00
Jacob Hoffman-Andrews	1c7e0fd1d8	Store linting certificate instead of precertificate (#6807 ) In order to get rid of the orphan queue, we want to make sure that before we sign a precertificate, we have enough data in the database that we can fulfill our revocation-checking obligations even if storing that precertificate in the database fails. That means: - We should have a row in the certificateStatus table for the serial. - But we should not serve "good" for that serial until we are positive the precertificate was issued (BRs 4.9.10). - We should have a record in the live DB of the proposed certificate's public key, so the bad-key-revoker can mark it revoked. - We should have a record in the live DB of the proposed certificate's names, so it can be revoked if we are required to revoke based on names. The SA.AddPrecertificate method already achieves these goals for precertificates by writing to the various metadata tables. This PR repurposes the SA.AddPrecertificate method to write "proposed precertificates" instead. We already create a linting certificate before the precertificate, and that linting certificate is identical to the precertificate that will be issued except for the private key used to sign it (and the AKID). So for instance it contains the right pubkey and SANs, and the Issuer name is the same as the Issuer name that will be used. So we'll use the linting certificate as the "proposed precertificate" and store it to the DB, along with appropriate metadata. In the new code path, rather than writing "good" for the new certificateStatus row, we write a new, fake OCSP status string "wait". This will cause us to return internalServerError to OCSP requests for that serial (but we won't get such requests because the serial has not yet been published). After we finish precertificate issuance, we update the status to "good" with SA.SetCertificateStatusReady. Part of #6665	2023-04-26 13:54:24 -07:00
Aaron Gable	25cae29f70	Resolve TestAkamaiPurgerDrainQueueFails race (#6844 ) Fixes https://github.com/letsencrypt/boulder/issues/6837	2023-04-26 11:30:09 -04:00
Phil Porada	17fb1b287f	cmd: Export prometheus metrics for TLS cert notBefore and notAfter fields (#6836 ) Export new prometheus metrics for the `notBefore` and `notAfter` fields to track internal certificate validity periods when calling the `Load()` method for a `*tls.Config`. Each metric is labeled with the `serial` field. ``` tlsconfig_notafter_seconds{serial="2152072875247971686"} 1.664821961e+09 tlsconfig_notbefore_seconds{serial="2152072875247971686"} 1.664821960e+09 ``` Fixes https://github.com/letsencrypt/boulder/issues/6829	2023-04-24 16:28:05 -04:00
Aaron Gable	5480f1060b	Clean up database schema (#6832 ) Make a series of small changes to our test database schema, both to make it simpler to reason about and to bring it closer in alignment to our production database schema: - Incorporate the IssuedNamesDropIndex, Incidents, SimplePartitioning, and NotUnique migrations into the CombinedSchema, as they have been fully applied in prod; - Use CHARSET=utf8mb4 everywhere, instead of just utf8; - Use UNSIGNED for auto-increment ID columns in the tables where prod does; and - Re-sort the tables in CombinedSchema which no longer have foreign key constraints. Part of https://github.com/letsencrypt/boulder/issues/6820	2023-04-21 10:37:05 -07:00
Aaron Gable	3ddca2d1b8	Update eggsampler/acme and use it for ARI tests (#6811 ) Update github.com/eggsampler/acme from v3.3.0 to v3.4.0. Changelog: https://github.com/eggsampler/acme/compare/v3.3.0...v3.4.0 Update the ARI integration test to use the eggsampler/acme client's new ARI capabilities for making both GET and POST requests. This simplifies and streamlines the test significantly, and lets us test the POST path. Fixes #6781	2023-04-19 14:14:43 -07:00
Phil Porada	0ac848173e	Appease errcheck (#6821 ) Check errors during shutdown for several components to appease errcheck. Related to [1] and [2]. 1) https://github.com/letsencrypt/boulder/pull/6808 2) https://github.com/letsencrypt/boulder/pull/6819	2023-04-14 22:32:24 -04:00
Aaron Gable	bd1d27b8e8	Fix non-gRPC process cleanup and exit (#6808 ) Although #6771 significantly cleaned up how gRPC services stop and clean up, it didn't make any changes to our HTTP servers or our non-server (e.g. crl-updater, log-validator) processes. This change finishes the work. Add a new helper method cmd.WaitForSignal, which simply blocks until one of the three signals we care about is received. This easily replaces all calls to cmd.CatchSignals which passed `nil` as the callback argument, with the added advantage that it doesn't call os.Exit() and therefore allows deferred cleanup functions to execute. This new function is intended to be the last line of main(), allowing the whole process to exit once it returns. Reimplement cmd.CatchSignals as a thin wrapper around cmd.WaitForSignal, but with the added callback functionality. Also remove the os.Exit() call from CatchSignals, so that the main goroutine is allowed to finish whatever it's doing, call deferred functions, and exit naturally. Update all of our non-gRPC binaries to use one of these two functions. The vast majority use WaitForSignal, as they run their main processing loop in a background goroutine. A few (particularly those that can run either in run-once or in daemonized mode) still use CatchSignals, since their primary processing happens directly on the main goroutine. The changes to //test/load-generator are the most invasive, simply because that binary needed to have a context plumbed into it for proper cancellation, but it already had a custom struct type named "context" which needed to be renamed to avoid shadowing. Fixes https://github.com/letsencrypt/boulder/issues/6794	2023-04-14 16:22:56 -04:00
Aaron Gable	98fa0f07b4	Re-enable errcheck linter (#6819 ) Enable the errcheck linter. Update the way we express exclusions to use the new, non-deprecated, non-regex-based format. Fix all places where we began accidentally violating errcheck while it was disabled.	2023-04-14 15:41:12 -04:00
Phil Porada	56a11f0896	Fix CI failures related to akamai-test-srv (#6815 ) Fixes a CI problem introduced by https://github.com/letsencrypt/boulder/pull/6758 where we could send two purge requests which caused sporadic CI failures due to an infinite loop. Fixes https://github.com/letsencrypt/boulder/issues/6806	2023-04-13 09:56:30 -07:00
Aaron Gable	45329c9472	Deprecate ROCSPStage7 flag (#6804 ) Deprecate the ROCSPStage7 feature flag, which caused the RA and CA to stop generating OCSP responses when issuing new certs and when revoking certs. (That functionality is now handled just-in-time by the ocsp-responder.) Delete the old OCSP-generating codepaths from the RA and CA. Remove the CA's internal reference to an OCSP implementation, because it no longer needs it. Additionally, remove the SA's "Issuers" config field, which was never used. Fixes #6285	2023-04-12 17:03:06 -07:00
Aaron Gable	d6192e7c56	Stop testing go1.20.2 (#6809 ) Staging and Prod have fully upgraded to go1.20.3, per IN-8865.	2023-04-10 11:00:25 -07:00
Aaron Gable	e55a276efe	CA: Remove deprecated config stanzas (#6595 ) These config stanzas have been removed in staging and prod. They used to configure the separate OCSP and CRL gRPC services provided by the CA process, but the CA now provides those services on the same port as the main CA gRPC service. Fixes #6448	2023-04-07 09:37:34 -07:00
Aaron Gable	94f93361a0	Promote the first SAN from the CSR (#6796 ) Rather than promoting the alphabetically-first SAN to be the CN, promote the SAN which came first in the CSR. This is a reversion to previous behavior that was changed as a side-effect of: - https://github.com/letsencrypt/boulder/pull/6706; - https://github.com/letsencrypt/boulder/pull/6749; and - https://github.com/letsencrypt/boulder/pull/6757 Fixes https://github.com/letsencrypt/boulder/issues/6801	2023-04-06 14:30:19 -07:00
Aaron Gable	7e994a1216	Deprecate ROCSPStage6 feature flag (#6770 ) Deprecate the ROCSPStage6 feature flag. Remove all references to the `ocspResponse` column from the SA, both when reading from and when writing to the `certificateStatus` table. This makes it safe to fully remove that column from the database. IN-8731 enabled this flag in all environments, so it is safe to deprecate. Part of #6285	2023-04-04 15:41:51 -07:00
Phil Porada	8824e347fd	Golang 1.20.3 security release upgrade (#6793 ) Release notes: https://groups.google.com/g/golang-announce/c/Xdv6JL9ENs8 This update includes fixes for excessive memory usage when parsing headers in the net/http package.	2023-04-04 15:33:34 -07:00
Aaron Gable	8c67769be4	Remove ocsp-updater from Boulder (#6769 ) Delete the ocsp-updater service, and the //ocsp/updater library that supports it. Remove test configs for the service, and remove references to the service from other test files. This service has been fully shut down for an extended period now, and is safe to remove. Fixes #6499	2023-03-31 14:39:04 -07:00
Aaron Gable	22fd579cf2	ARI: write Retry-After header before body (#6787 ) When sending an ARI response, write the Retry-After header before writing the JSON response body. This is necessary because http.ResponseWriter implicitly calls WriteHeader whenever Write is called, flushing all headers to the network and preventing any additional headers from being written. Unfortunately, the unittests use httptest.ResponseRecorder, which doesn't seem to enforce this invariant (it's happy to report headers which were written after the body). Add a header check to the integration tests, to make up for this deficiency.	2023-03-31 10:48:45 -07:00
Aaron Gable	9262ca6e3f	Add grpc implementation tests to all services (#6782 ) As a follow-up to #6780, add the same style of implementation test to all of our other gRPC services. This was not included in that PR just to keep it small and single-purpose.	2023-03-31 09:52:26 -07:00
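
Commit 521eb55d1e (test: better message for different empty slices) depends on the difference between a nil slice and an empty, non-nil slice. The sketch below illustrates why the old message was confusing. It assumes the improved output comes from formatting with `%#v` instead of `%v`, which matches the strings quoted in the commit message. The helper `describe` is hypothetical and is not Boulder's AssertDeepEquals.

```go
package main

import (
	"fmt"
	"reflect"
)

// describe is a hypothetical helper (not part of Boulder). It formats a
// value with %#v, the Go-syntax representation, so that a nil slice and
// an empty slice print differently.
func describe(v interface{}) string {
	return fmt.Sprintf("%#v", v)
}

func main() {
	var nilSlice []string    // nil slice
	emptySlice := []string{} // empty but non-nil slice

	// reflect.DeepEqual treats the two as different values.
	fmt.Println(reflect.DeepEqual(nilSlice, emptySlice)) // false

	// %v renders both as "[]", which hides the difference.
	fmt.Printf("%v %v\n", nilSlice, emptySlice) // [] []

	// %#v makes the difference visible.
	fmt.Println(describe(nilSlice))   // []string(nil)
	fmt.Println(describe(emptySlice)) // []string{}
}
```

With `%v` a failing comparison shows two identical-looking operands. With `%#v` the reader can see which side is nil.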
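
Commit b7d9f8c2e3 (opentelemetry -> openTelemetry) states that the rename has no functional impact because JSON keys are matched case-insensitively in Go. The sketch below shows that behavior with the standard library's `encoding/json`. The `config` struct and its `endpoint` field are hypothetical stand-ins, not Boulder's actual config types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config is a hypothetical stand-in for a service config. Only the
// key name "openTelemetry" is taken from the commit message.
type config struct {
	OpenTelemetry struct {
		Endpoint string `json:"endpoint"`
	} `json:"openTelemetry"`
}

// endpointFrom unmarshals raw JSON and returns the nested endpoint.
func endpointFrom(raw string) (string, error) {
	var c config
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return "", err
	}
	return c.OpenTelemetry.Endpoint, nil
}

func main() {
	// encoding/json prefers an exact key match but also accepts a
	// case-insensitive one, so both spellings populate the same field.
	for _, raw := range []string{
		`{"opentelemetry": {"endpoint": "otel:4317"}}`,
		`{"openTelemetry": {"endpoint": "otel:4317"}}`,
	} {
		ep, err := endpointFrom(raw)
		fmt.Println(ep, err)
	}
}
```

Both inputs yield the same endpoint, so the old and new config spellings decode identically under this decoder.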
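
Commit 22fd579cf2 (ARI: write Retry-After header before body) relies on the fact that the first call to `Write` on an `http.ResponseWriter` implicitly sends the response headers. The sketch below demonstrates the ordering problem over a real HTTP connection using `httptest.NewServer`. The handlers, the helper `retryAfter`, and the header value `21600` are illustrative assumptions, not Boulder's WFE code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// headerAfterBody sets Retry-After only after writing the body. The
// implicit WriteHeader triggered by Write has already fixed the headers,
// so the client never sees this one.
func headerAfterBody(w http.ResponseWriter, r *http.Request) {
	_, _ = w.Write([]byte(`{}`))
	w.Header().Set("Retry-After", "21600") // assumed value; set too late
}

// headerBeforeBody sets Retry-After first, then writes the body.
func headerBeforeBody(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Retry-After", "21600") // assumed value
	_, _ = w.Write([]byte(`{}`))
}

// retryAfter serves one request with h on a local test server and
// returns the Retry-After header as seen by a real HTTP client.
func retryAfter(h http.HandlerFunc) (string, error) {
	srv := httptest.NewServer(h)
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Header.Get("Retry-After"), nil
}

func main() {
	late, err := retryAfter(headerAfterBody)
	fmt.Printf("set after body:  %q (err: %v)\n", late, err)

	early, err := retryAfter(headerBeforeBody)
	fmt.Printf("set before body: %q (err: %v)\n", early, err)
}
```

Checking the header through a real client, as the commit does in its integration tests, catches the mistake that reading the recorder's live header map would miss.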

1509 Commits