boulder

Commit Graph

Author	SHA1	Message	Date
Aaron Gable	b090ffbd2e	Use zlint to check our CRLs (#6972 ) Update zlint to v3.5.0, which introduces scaffolding for running lints over CRLs. Convert all of our existing CRL checks to structs which match the zlint interface, and add them to the registry. Then change our linter's CheckCRL function, and crl-checker's Validate function, to run all lints in the zlint registry. Finally, update the ceremony tool to run these lints as well. This change touches a lot of files, but involves almost no logic changes. It's all just infrastructure, changing the way our lints and their tests are shaped, and moving test files into new homes. Fixes https://github.com/letsencrypt/boulder/issues/6934 Fixes https://github.com/letsencrypt/boulder/issues/6979	2023-07-11 15:39:05 -07:00
Aaron Gable	3c1476d79b	Remove last math/rand.Seed() call (#6948 ) The use of math/rand.Seed() is deprecated as of go1.20, as the package now seeds itself: https://tip.golang.org/doc/go1.20#minor_library_changes	2023-06-20 14:56:03 -07:00
Jacob Hoffman-Andrews	a2b2e53045	cmd: fail without panic (#6935 ) For "ordinary" errors like "file not found" for some part of the config, we would prefer to log an error and exit without logging about a panic and printing a stack trace. To achieve that, we want to call `defer AuditPanic()` once, at the top of `cmd/boulder`'s main. That's so early that we haven't yet parsed the config, which means we haven't yet initialized a logger. We compromise: `AuditPanic` now calls `log.Get()`, which will retrieve the configured logger if one has been set up, or will create a default one (which logs to stderr/stdout). AuditPanic and Fail/FailOnError now cooperate: Fail/FailOnError panic with a special type, and AuditPanic checks for that type and prints a simple message before exiting when it's present. This PR also coincidentally fixes a bug: panicking didn't previously cause the program to exit with nonzero status, because it recovered the panic but then did not explicitly exit nonzero. Fixes #6933	2023-06-20 12:29:02 -07:00
Jacob Hoffman-Andrews	824417f6c0	sa: refactor db initialization (#6930 ) Previously, we had three chained calls initializing a database: - InitWrappedDb calls NewDbMap - NewDbMap calls NewDbMapFromConfig Since all three are exported, this left me wondering when to call one vs the others. It turns out that NewDbMap is only called from tests, so I renamed it to DBMapForTest to make that clear. NewDbMapFromConfig is only called internally to the SA, so I unexported it as newDbMapFromMysqlConfig. Also, I copied the ParseDSN call into InitWrappedDb, so it doesn't need to call DBMapForTest. Now InitWrappedDb and DBMapForTest both independently call newDbMapFromMysqlConfig. I also noticed that InitDBMetrics was only called internally so I unexported it.	2023-06-13 10:15:40 -07:00
Samantha	124c4cc6f5	grpc/sa: Implement deep health checks (#6928 ) Add the necessary scaffolding for deep health checking of our various gRPC components. Each component implementation that also implements the grpc.checker interface will be checked periodically, and the health status of the component will be updated accordingly. Add the necessary methods to SA to implement the grpc.checker interface and register these new health checks with Consul. Additionally: - Update entry point script to check for ProxySQL readiness. - Increase the poll rate for gRPC Consul checks from 5s to 2s to help with DNS failures, due to check failures, on startup. - Change log level for Consul from INFO to ERROR to deal with noisy logs full of transport failures due to Consul gRPC checks firing before the SAs are up. Fixes #6878 Part of #6795	2023-06-12 13:58:53 -04:00
Jacob Hoffman-Andrews	80e1510819	admin: add clear-email subcommand (#6919 ) When a user wants their email address deleted from the database but no longer has access to their account, this allows an administrator to clear it. This adds `admin` as an alias for `admin-revoker`, because we'd like the clear-email sub-command to be a part of that overall tool, but it's not really revocation related. Part of #6864	2023-06-01 14:33:24 -04:00
Aaron Gable	6ea74d5be9	OCSP: Use FilterSource for static responders (#6901 ) Move the creation of the FilterSource outside of the conditional block, so that the underlying source gets wrapped no matter which kind (either an inMemorySource or a checkedRedisSource) it is. This has two advantages: first, it means that static ocsp responders are safer and more accurate, because they're now basing their responses on both the issuer and the serial, not just the serial; and second, it makes the current config validation tag which marks the "issuerCerts" config field as required with `min=1` accurate.	2023-05-24 14:23:27 -07:00
Aaron Gable	fe523f142d	crl-updater: retry failed shards (#6907 ) Add per-shard exponential backoff and retry to crl-updater. Each individual CRL shard will be retried up to MaxAttempts (default 1) times, with exponential backoff starting at 1 second and maxing out at 1 minute between each attempt. This can effectively reduce the parallelism of crl-updater: while a goroutine is sleeping between attempts of a failing shard, it is not doing work on another shard. This is a desirable feature, since it means that crl-updater gently reduces the total load it places on the network and database when shards start to fail. Setting this new config parameter is tracked in IN-9140 Fixes https://github.com/letsencrypt/boulder/issues/6895	2023-05-22 12:59:09 -07:00
Jacob Hoffman-Andrews	2b1ac9e915	admin-revoker: fix help output (#6891 ) Previously if you passed `-h` or `-help` to a sub-sub-command of admin-revoker it would error out with a red message and a stack trace (in addition to printing help). Now, it will print help and exit 1.	2023-05-15 13:54:13 -07:00
Aaron Gable	46183df5dc	Add link to list of root programs to ceremony docs (#6884 ) Fixes https://github.com/letsencrypt/boulder/issues/6730	2023-05-15 12:34:34 -07:00
Matthew McPherrin	c21b44bdc2	Rename CA's "--ca-addr" flag to "--addr" (#6889 ) Most boulder components have a command line flag to override what gRPC and debug port they listen on, which is used in tests to run multiple instances with the same configuration. However, CA's flag is named "--ca-addr", and not "--addr". This is inconsistent with SA, RA, VA, nonce, publisher, and purger. This flag isn't used in production, where we set it in the config file, so it shouldn't be a breaking change to rename it.	2023-05-15 11:17:07 -07:00
Samantha	9e8101ff3a	main: Validate config files by default (#6885 ) - Make config validation run by default for all Boulder components with a registered validator. - Refactor main to parse `boulder` flags directly instead of declaring them as subcommands. - Remove the `validate` subcommand and update relevant docs. - Fix configuration validation for issuer (file source) OCSP responder. Fixes #6857 Fixes #6763	2023-05-15 14:16:04 -04:00
Matthew McPherrin	3aae67b8a9	Opentelemetry: Add option for public endpoints (#6867 ) This PR adds a new configuration block specifically for the otelhttp instrumentation. This block is separate from the existing "opentelemetry" configuration, and is only relevant when using otelhttp instrumentation. It does not share any codepath with the existing configuration, so it is at the top level to indicate which services it applies to. There's a bit of plumbing new configuration through. I've adopted the measured_http package to also set up opentelemetry instead of just metrics, which should hopefully allow any future changes to be smaller (just config & there) and more consistent between the wfe2 and ocsp responder. There's one option here now, which disables setting [otelhttp.WithPublicEndpoint](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#WithPublicEndpoint). This option is designed to do exactly what we want: Don't accept incoming spans as parents of the new span created in the server. Previously we had a setting to disable parent-based sampling to help with this problem, which doesn't really make sense anymore, so let's just remove it and simplify that setup path. The default of "false" is designed to be the safe option. It's set to True in the test/ configs for integration tests that use traces, and I expect we'll likely set it true in production eventually once the LBs are configured to handle tracing themselves. Fixes #6851	2023-05-12 15:34:34 -04:00
Samantha	310546a14e	VA: Support discovery of DNS resolvers via Consul (#6869 ) Deprecate `va.DNSResolver` in favor of backwards compatible `va.DNSProvider`. Fixes #6852	2023-05-12 12:54:31 -04:00
Aaron Gable	42bd62e50b	Purger: list failed urls in error message (#6882 ) Fixes https://github.com/letsencrypt/boulder/issues/6853	2023-05-11 10:39:54 -07:00
Jacob Hoffman-Andrews	ac4be89b56	grpc: add NoWaitForReady config field (#6850 ) Currently we set WaitForReady(true), which causes gRPC requests to not fail immediately if no backends are available, but instead wait until the timeout in case a backend does become available. The downside is that this behavior masks true connection errors. We'd like to turn it off. Fixes #6834	2023-05-09 16:16:44 -07:00
Samantha	9dce86fda0	boulder-wfe: Remove deprecated chains fields (#6874 ) Fields CertificateChains and AlternativeCertificateChains were removed by SRE in IN-5913. Fixes #6873 Related to #5164	2023-05-08 15:55:20 -04:00
Samantha	c453ca0571	grpc: Deprecate clientNames field (#6870 ) - SRE removed in IN-8755 Fixes #6698	2023-05-08 14:49:27 -04:00
Samantha	487680629d	cmd: TLSConfig values should be string not *string (#6872 ) Fixes #6737	2023-05-08 13:21:42 -04:00
Samantha	c9173cc024	boulder-va: Remove deprecated Common fields stanza (#6871 ) - SRE removed in IN-8752. Fixes #6716	2023-05-08 11:47:17 -04:00
Matthew McPherrin	8427245675	OTel Integration test using jaeger (#6842 ) This adds Jaeger's all-in-one dev container (with no persistent storage) to boulder's dev docker-compose. It configures config-next/ to send all traces there. A new integration test creates an account and issues a cert, then verifies the trace contains some set of expected spans. This test found that async finalize broke spans, so I fixed that and a few related spots where we make a new context.	2023-05-05 10:41:29 -04:00
Jacob Hoffman-Andrews	1c7e0fd1d8	Store linting certificate instead of precertificate (#6807 ) In order to get rid of the orphan queue, we want to make sure that before we sign a precertificate, we have enough data in the database that we can fulfill our revocation-checking obligations even if storing that precertificate in the database fails. That means: - We should have a row in the certificateStatus table for the serial. - But we should not serve "good" for that serial until we are positive the precertificate was issued (BRs 4.9.10). - We should have a record in the live DB of the proposed certificate's public key, so the bad-key-revoker can mark it revoked. - We should have a record in the live DB of the proposed certificate's names, so it can be revoked if we are required to revoke based on names. The SA.AddPrecertificate method already achieves these goals for precertificates by writing to the various metadata tables. This PR repurposes the SA.AddPrecertificate method to write "proposed precertificates" instead. We already create a linting certificate before the precertificate, and that linting certificate is identical to the precertificate that will be issued except for the private key used to sign it (and the AKID). So for instance it contains the right pubkey and SANs, and the Issuer name is the same as the Issuer name that will be used. So we'll use the linting certificate as the "proposed precertificate" and store it to the DB, along with appropriate metadata. In the new code path, rather than writing "good" for the new certificateStatus row, we write a new, fake OCSP status string "wait". This will cause us to return internalServerError to OCSP requests for that serial (but we won't get such requests because the serial has not yet been published). After we finish precertificate issuance, we update the status to "good" with SA.SetCertificateStatusReady. Part of #6665	2023-04-26 13:54:24 -07:00
Phil Porada	17fb1b287f	cmd: Export prometheus metrics for TLS cert notBefore and notAfter fields (#6836 ) Export new prometheus metrics for the `notBefore` and `notAfter` fields to track internal certificate validity periods when calling the `Load()` method for a `*tls.Config`. Each metric is labeled with the `serial` field. ``` tlsconfig_notafter_seconds{serial="2152072875247971686"} 1.664821961e+09 tlsconfig_notbefore_seconds{serial="2152072875247971686"} 1.664821960e+09 ``` Fixes https://github.com/letsencrypt/boulder/issues/6829	2023-04-24 16:28:05 -04:00
Matthew McPherrin	3502e4a971	Use a common Command() function (#6833 ) This moves command() from cmd/ into core.Command() and uses it from the log and main package, ensuring we have a single implementation of path.Base(os.Args[0]) instead of scattering that method around. I put it in core/util.go as similar enough functions like the BuildID and revision info already live there, though I am not entirely sure it's the right place. This came up in @aarongable's review of #6750, where he pointed out I was manually hardcoding commands instead of using command() or similar.	2023-04-21 16:54:32 -04:00
Aaron Gable	bcee974183	OCSP: remove ExpectedFreshness config key (#6838 ) This field was removed from all staging and prod configs in IN-8855. Fixes https://github.com/letsencrypt/boulder/issues/6775	2023-04-21 15:58:52 -04:00
Matthew McPherrin	0060e695b5	Introduce OpenTelemetry Tracing (#6750 ) Add a new shared config stanza which all boulder components can use to configure their Open Telemetry tracing. This allows components to specify where their traces should be sent, what their sampling ratio should be, and whether or not they should respect their parent's sampling decisions (so that web front-ends can ignore sampling info coming from outside our infrastructure). It's likely we'll need to evolve this configuration over time, but this is a good starting point. Add basic Open Telemetry setup to our existing cmd.StatsAndLogging helper, so that it gets initialized at the same time as our other observability helpers. This sets certain default fields on all traces/spans generated by the service. Currently these include the service name, the service version, and information about the telemetry SDK itself. In the future we'll likely augment this with information about the host and process. Finally, add instrumentation for the HTTP servers and grpc clients/servers. This gives us a starting point of being able to monitor Boulder, but is fairly minimal as this PR is already somewhat unwieldy: It's really only enough to understand that everything is wired up properly in the configuration. In subsequent work we'll enhance those spans with more data, and add more spans for things not automatically traced here. Fixes https://github.com/letsencrypt/boulder/issues/6361 --------- Co-authored-by: Aaron Gable <aaron@aarongable.com>	2023-04-21 10:46:59 -07:00
Phil Porada	0ac848173e	Appease errcheck (#6821 ) Check errors during shutdown for several components to appease errcheck. Related to [1] and [2]. 1) https://github.com/letsencrypt/boulder/pull/6808 2) https://github.com/letsencrypt/boulder/pull/6819	2023-04-14 22:32:24 -04:00
Aaron Gable	bd1d27b8e8	Fix non-gRPC process cleanup and exit (#6808 ) Although #6771 significantly cleaned up how gRPC services stop and clean up, it didn't make any changes to our HTTP servers or our non-server (e.g. crl-updater, log-validator) processes. This change finishes the work. Add a new helper method cmd.WaitForSignal, which simply blocks until one of the three signals we care about is received. This easily replaces all calls to cmd.CatchSignals which passed `nil` as the callback argument, with the added advantage that it doesn't call os.Exit() and therefore allows deferred cleanup functions to execute. This new function is intended to be the last line of main(), allowing the whole process to exit once it returns. Reimplement cmd.CatchSignals as a thin wrapper around cmd.WaitForSignal, but with the added callback functionality. Also remove the os.Exit() call from CatchSignals, so that the main goroutine is allowed to finish whatever it's doing, call deferred functions, and exit naturally. Update all of our non-gRPC binaries to use one of these two functions. The vast majority use WaitForSignal, as they run their main processing loop in a background goroutine. A few (particularly those that can run either in run-once or in daemonized mode) still use CatchSignals, since their primary processing happens directly on the main goroutine. The changes to //test/load-generator are the most invasive, simply because that binary needed to have a context plumbed into it for proper cancellation, but it already had a custom struct type named "context" which needed to be renamed to avoid shadowing. Fixes https://github.com/letsencrypt/boulder/issues/6794	2023-04-14 16:22:56 -04:00
Aaron Gable	98fa0f07b4	Re-enable errcheck linter (#6819 ) Enable the errcheck linter. Update the way we express exclusions to use the new, non-deprecated, non-regex-based format. Fix all places where we began accidentally violating errcheck while it was disabled.	2023-04-14 15:41:12 -04:00
Aaron Gable	45329c9472	Deprecate ROCSPStage7 flag (#6804 ) Deprecate the ROCSPStage7 feature flag, which caused the RA and CA to stop generating OCSP responses when issuing new certs and when revoking certs. (That functionality is now handled just-in-time by the ocsp-responder.) Delete the old OCSP-generating codepaths from the RA and CA. Remove the CA's internal reference to an OCSP implementation, because it no longer needs it. Additionally, remove the SA's "Issuers" config field, which was never used. Fixes #6285	2023-04-12 17:03:06 -07:00
Aaron Gable	e55a276efe	CA: Remove deprecated config stanzas (#6595 ) These config stanzas have been removed in staging and prod. They used to configure the separate OCSP and CRL gRPC services provided by the CA process, but the CA now provides those services on the same port as the main CA gRPC service. Fixes #6448	2023-04-07 09:37:34 -07:00
Aaron Gable	d6cd589795	Simplify how gRPC services start, stop, and clean up (#6771 ) The CA, RA, and VA have multiple goroutines running alongside the primary gRPC handling goroutine. These ancillary goroutines should be gracefully shut down when the process is about to exit. Historically, we have handled this by putting a call to each of these goroutines' shutdown functions inside cmd.CatchSignals, so that when a SIGINT is received, all of the various cleanup routines happen in sequence. But there's a cleaner way to do it: just use defer! All of these cleanups need to happen after the primary gRPC server has fully shut down, so that we know they stick around at least as long as the service is handling gRPC requests. And when the service receives a SIGINT, cmd.CatchSignals will call the gRPC server's GracefulStop, which will cause the server's .Serve() to finally exit, which will cause start() to exit, which will cause main() to exit, which will cause all deferred functions to be run. In addition, remove filterShutdownErrors as the bug which made it necessary (.Serve() returning an error even when GracefulStop() is called) was fixed back in 2017. This allows us to call the start() function in a much more natural way, simply logging any error it returns instead of calling os.Exit(1) if it returns an error. This allows us to simplify the exit-handling code in these three services' main() functions, and lets us be a bit more idiomatic with our deferred cleanup functions. Part of #6794	2023-04-05 14:55:57 -07:00
Aaron Gable	7e994a1216	Deprecate ROCSPStage6 feature flag (#6770 ) Deprecate the ROCSPStage6 feature flag. Remove all references to the `ocspResponse` column from the SA, both when reading from and when writing to the `certificateStatus` table. This makes it safe to fully remove that column from the database. IN-8731 enabled this flag in all environments, so it is safe to deprecate. Part of #6285	2023-04-04 15:41:51 -07:00
Aaron Gable	8c67769be4	Remove ocsp-updater from Boulder (#6769 ) Delete the ocsp-updater service, and the //ocsp/updater library that supports it. Remove test configs for the service, and remove references to the service from other test files. This service has been fully shut down for an extended period now, and is safe to remove. Fixes #6499	2023-03-31 14:39:04 -07:00
Phil Porada	a178943d01	bad-key-revoker: Reduce probability of hash collision during testing (#6790 ) Create a jwkHash of 32 bytes rather than 2 bytes to reduce the probability of a hash collision in the `TestInvoke` function. Fixes https://github.com/letsencrypt/boulder/issues/6789	2023-03-31 13:27:12 -07:00
Aaron Gable	9262ca6e3f	Add grpc implementation tests to all services (#6782 ) As a follow-up to #6780, add the same style of implementation test to all of our other gRPC services. This was not included in that PR just to keep it small and single-purpose.	2023-03-31 09:52:26 -07:00
Matthew McPherrin	6d37299ec3	Fix up validation a bit (#6786 ) - Missing newline - Include validate in the list of subcommands - fixes #6785	2023-03-30 16:11:28 -04:00
Phil Porada	ce2ee69c5f	SARO: Add sa_lag_factor metric to assess usage of the lagFactor codepath (#6774 ) Add `sa_lag_retry` prometheus countervec metric with pass/fail dimensions for `GetOrder`, `GetAuthorization2`, and `GetRegistration` methods. The new metrics will appear as follows: ``` sa_lag_retry{method="GetOrder",result="found"} 0 sa_lag_retry{method="GetOrder",result="notfound"} 0 sa_lag_retry{method="GetOrder",result="other"} 0 sa_lag_retry{method="GetAuthorization2",result="found"} 0 sa_lag_retry{method="GetAuthorization2",result="notfound"} 0 sa_lag_retry{method="GetAuthorization2",result="other"} 0 sa_lag_retry{method="GetRegistration",result="found"} 0 sa_lag_retry{method="GetRegistration",result="notfound"} 0 sa_lag_retry{method="GetRegistration",result="other"} 0 ``` Fixes https://github.com/letsencrypt/boulder/issues/6773 --------- Co-authored-by: Samantha <hello@entropy.cat>	2023-03-30 13:48:16 -04:00
Matthew McPherrin	49851d7afd	Remove Beeline configuration (#6765 ) In a previous PR, #6733, this configuration was marked deprecated pending removal. Here is that removal.	2023-03-23 16:58:36 -04:00
Samantha	b2224eb4bc	config: Add validation tags to all configuration structs (#6674 ) - Require `letsencrypt/validator` package. - Add a framework for registering configuration structs and any custom validators for each Boulder component at `init()` time. - Add a `validate` subcommand which allows you to pass a `-component` name and `-config` file path. - Expose validation via exported utility functions `cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and `cmd.ValidateYAMLConfig()`. - Add unit test which validates all registered component configuration structs against test configuration files. Part of #6052	2023-03-21 14:08:03 -04:00
Aaron Gable	f33071b7c0	CA: Stop gRPC service before helper goroutines (#6761 ) When shutting down the CA, we should stop the primary gRPC service (which waits for any outstanding requests to complete) before shutting down the other helper goroutines. This prevents panics when an ongoing gRPC request attempts to write to the OCSPLogQueue, but the queue's channel has already been closed by the call to ocspi.Stop(). Fixes #6760	2023-03-20 16:10:10 -07:00
Matthew McPherrin	e1ed1a2ac2	Remove beeline tracing (#6733 ) Remove tracing using Beeline from Boulder. The only remnant left behind is the deprecated configuration, to ensure deployability. We had previously planned to swap in OpenTelemetry in a single PR, but that adds significant churn in a single change, so we're doing this as multiple steps that will each be significantly easier to reason about and review. Part of #6361	2023-03-14 15:14:27 -07:00
Phil Porada	f0b3d319ff	Markdown anchor fragments should be lowercase (#6727 ) Replace capital letters with lowercase letters in markdown fragments for compatibility with various markdown renderers. For example, Github happily accepts fragments as-is, but vscode does not. Fixes https://github.com/letsencrypt/boulder/issues/6722	2023-03-07 16:37:29 -08:00
Samantha	8227052345	GRPC: Add TODO for Config.GRPCServerConfig.ClientNames (#6718 ) Part of #6698	2023-03-02 17:17:55 -05:00
Samantha	aae2a48a65	VA: Deprecate Config.Common (#6717 ) Marks both fields in Common as deprecated. Part of #6716	2023-03-02 16:26:36 -05:00
Jacob Hoffman-Andrews	e052e7445b	admin-revoker: document malformed-revoke (#6714 ) In particular, document that it does not purge the Akamai cache. Also, in the RA, avoid creating a "fake" certificate object containing only the serial. Instead, use req.Serial directly in most places. This uncovered some incorrect logic. Fix that logic by gating the operations that actually need a full *x509.Certificate: revoking by key, and purging the Akamai cache. Also, make `req.Serial` mandatory for AdministrativelyRevokeCertificate. This is a reopen of #6693, which accidentally got merged into a different feature branch.	2023-03-02 12:02:21 -08:00
Samantha	0ae891f5e4	boulder-ra: Remove Config.IssuerCertPath (#6713 ) Fixes #5162 Part of #6052 Blocks #6674	2023-03-01 15:45:56 -05:00
Samantha	dcf4a4bd51	ocsp-responder: Remove Config.MaxAge (#6711 ) Fixes #6710 Part of #6052 Blocks #6674	2023-03-01 15:45:41 -05:00
Samantha	8440a47d0b	expiration-mailer: Remove Config.NagCheckInterval (#6712 ) Fixes #6097 Part of #6052 Blocks #6674	2023-03-01 15:45:18 -05:00
Aaron Gable	29bf521121	CA: Remove secondary gRPC servers (#6496 ) Remove the OCSPGenerator and CRLGenerator gRPC servers that run on separate ports from the CA's main gRPC server, which exposes both those and the CertificateAuthority service as well. These additional servers are no longer necessary, now that all three services are exposed on the single address/port. Fixes #6448	2023-03-01 11:45:28 -08:00


1511 Commits