Add a new code path to the ctpolicy package that enforces Chrome's new
CT Policy, which requires that SCTs come from logs run by two different
operators, rather than from one Google and one non-Google log. To achieve
this, invert the "race" logic: rather than assuming we always have two
groups, and racing the logs within each group against each other, we now
race the various groups against each other, and pick just one arbitrary
log from each group to attempt submission to.
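A minimal sketch of the inverted race, with hypothetical types (`logClient`, `operatorGroup`) standing in for the real ctpolicy structures:

```go
import (
	"context"
	"errors"
	"math/rand"
)

// Hypothetical types; the real ctpolicy structures differ.
type logClient interface {
	Submit(ctx context.Context, precert []byte) ([]byte, error) // returns an SCT
}

type operatorGroup struct {
	name string
	logs []logClient
}

// twoOperatorSCTs races the operator groups against each other, submitting to
// one arbitrary log per group, and returns once two groups have produced SCTs.
func twoOperatorSCTs(ctx context.Context, groups []operatorGroup, precert []byte) ([][]byte, error) {
	type result struct {
		sct []byte
		err error
	}
	results := make(chan result, len(groups))
	for _, group := range groups {
		go func(g operatorGroup) {
			log := g.logs[rand.Intn(len(g.logs))] // one arbitrary log per group
			sct, err := log.Submit(ctx, precert)
			results <- result{sct, err}
		}(group)
	}
	var scts [][]byte
	for range groups {
		if r := <-results; r.err == nil {
			scts = append(scts, r.sct)
			if len(scts) == 2 {
				return scts, nil
			}
		}
	}
	return nil, errors.New("failed to get SCTs from two distinct operator groups")
}
```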
Ensure that the new code path does the right thing by adding a new zlint
which checks that the two SCTs embedded in a certificate come from logs
run by different operators. To support this lint, which needs to have a
canonical mapping from logs to their operators, import the Chrome CT Log
List JSON Schema and autogenerate Go structs from it so that we can
parse a real CT Log List. Also add flags to all services which run these
lints (the CA and cert-checker) to let them load a CT Log List from disk
and provide it to the lint.
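The core check could be as simple as this sketch, where `operatorOf` is a hypothetical lookup into the parsed CT Log List:

```go
// distinctOperators reports whether the embedded SCTs come from logs run by
// at least two different operators. operatorOf resolves a log ID to its
// operator name via the parsed CT Log List (hypothetical helper).
func distinctOperators(logIDs [][32]byte, operatorOf func([32]byte) (string, bool)) bool {
	operators := make(map[string]bool)
	for _, id := range logIDs {
		op, ok := operatorOf(id)
		if !ok {
			return false // a log we don't recognize fails the lint
		}
		operators[op] = true
	}
	return len(operators) >= 2
}
```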
Finally, since we now have the ability to load a CT Log List file
anyway, use this capability to simplify configuration of the RA. Rather
than listing all of the details for each log we're willing to submit to,
simply list the names (technically, Descriptions) of each log, and look
up the rest of the details from the log list file.
To support this change, SRE will need to deploy log list files (the real
Chrome log list for prod, and a custom log list for staging) and then
update the configuration of the RA, CA, and cert-checker. Once that
transition is complete, the deletion TODOs left behind by this change
will be able to be completed, removing the old RA configuration and old
ctpolicy race logic.
Part of #5938
This reverts commit 7ef6913e71.
We turned on the `ExpirationMailerDontLookTwice` feature flag in prod, and it's
working fine but not clearing the backlog. Since
https://github.com/letsencrypt/boulder/pull/6100 fixed the issue that caused us
to (nearly) stop sending mail when we deployed #6057, this should be safe to
roll forward.
The revert of the revert applied cleanly, except for `expiration-mailer/main.go`
and `main_test.go`, particularly around the contents of `processCerts` (from
which `sendToOneRegID` was extracted) and `sendToOneRegID` itself. Those areas
are good targets for extra attention.
Update:
- golangci-lint from v1.42.1 to v1.46.2
- protoc from v3.15.6 to v3.20.1
- protoc-gen-go from v1.26.0 to v1.28.0
- protoc-gen-go-grpc from v1.1.0 to v1.2.0
- fpm from v1.14.0 to v1.14.2
Also remove a reference to go1.17.9 from one last place.
This does result in updating all of our generated .pb.go files, but only
to update the version number embedded in each file's header.
Fixes #6123
We recently landed a fix so the expiration-mailer won't look twice at
the same certificate. This will cause an immediate behavior change when
it is deployed, and that might have surprising effects. Put the fix
behind a feature flag so we can control when it rolls out more
carefully.
The go-redis docs say the default is 10 * NumCPU, but the actual code says 5.
Extra context:
- 2465baaab5/options.go (L143-L145)
- 2465baaab5/cluster.go (L96-L98)
For Options, the default (documented) is 10 * NumCPUs. For ClusterOptions, the
default (undocumented) is 5 * NumCPUs. We use ClusterOptions. Also worth noting:
for ClusterOptions, the limit is per node.
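For illustration, here's how one might set the pool size explicitly with go-redis `ClusterOptions` rather than relying on the undocumented default (addresses are placeholders):

```go
import (
	"runtime"

	"github.com/go-redis/redis/v8"
)

func newClusterClient() *redis.ClusterClient {
	return redis.NewClusterClient(&redis.ClusterOptions{
		Addrs: []string{"10.0.0.1:6379", "10.0.0.2:6379"},
		// Explicit, and per node. The undocumented ClusterOptions default is
		// 5 * NumCPU, not the 10 * NumCPU documented for Options.
		PoolSize: 10 * runtime.NumCPU(),
	})
}
```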
This was introduced when expiration-mailer was run by cron, and was a
way for expiration-mailer to know something about its expected run
interval so it could send notifications "on time" rather than "just
after" the configured email time.
Now that expiration-mailer runs as a daemon, we can simply pull this
value from `Frequency`, which is set to the same value in prod.
Instead write `[]`, a better representation of an empty contact set,
and avoid having literal JSON `null`s in our database.
As part of doing so, add some extra code to //sa/model.go that
bypasses the need for //sa/type-converter.go to do any magic
JSON-to-string-slice conversions for us.
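A minimal sketch of the idea (not the actual //sa/model.go code):

```go
import "encoding/json"

// contactsToJSON serializes a contact slice for storage, normalizing a nil
// slice to "[]" so we never write a literal JSON null to the database.
func contactsToJSON(contacts []string) (string, error) {
	if contacts == nil {
		contacts = []string{}
	}
	b, err := json.Marshal(contacts)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```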
Fixes #6074
When deployed, the newly-parallel expiration-mailer encountered
unexpected difficulties and dropped to apparently sending nearly zero
emails despite not throwing any real errors. Reverting the parallelism
change until we understand and can fix the root cause.
This reverts two commits:
- Allow expiration mailer to work in parallel (#6057)
- Fix data race in expiration-mailer test mocks (#6072)
It also modifies the revert to leave the new `ParallelSends` config key
in place (albeit completely ignored), so that the binary containing this
revert can be safely deployed regardless of config status.
Part of #5682
Previously, each account's email would be sent serially, along with
several reads from the database (to check for certificate renewal) and
several writes to the database (to update
`certificateStatus.lastExpirationNagSent`). This adds a config field
for the expiration mailer that sets the parallelism it will use.
That means making and using multiple SMTP connections as well. Previously,
`bmail.Mailer` was not safe for concurrent use. It also had a piece of
API awkwardness: after you created a Mailer, you had to call Connect on
it to change its state.
Instead of treating that as a state change on Mailer, I split out a
separate component: `bmail.Conn`. Now, when you call `Mailer.Connect()`,
you get a Conn. You can send mail on that Conn and Close it when you're
done. A single Mailer instance can produce multiple Conns, so Mailer is
now concurrency-safe (while Conn is not).
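The resulting shape, roughly (the method sets here are assumptions, not the exact bmail API):

```go
// Mailer is safe for concurrent use; each Connect returns an independent Conn.
type Mailer interface {
	Connect() (Conn, error)
}

// Conn is a single SMTP connection; it is not safe for concurrent use, and
// should be Closed when the caller is done sending.
type Conn interface {
	SendMail(to []string, subject, msg string) error
	Close() error
}
```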
This involved a moderate amount of renaming and code movement, and
GitHub's move detector is not keeping up 100%, so an eye towards "is
this moved code?" may help. Adding `?w=1` to the diff URL also helps by
hiding whitespace-only diffs.
This copies over settings from config-next that are now deployed in prod.
Also, I updated a comment in sd-test-srv to more accurately describe how SRV records work.
go1.17.9 (released 2022-04-12) includes security fixes to the crypto/elliptic and encoding/pem packages, as well as bug fixes to the linker and runtime. See the [Go 1.17.9 milestone](https://github.com/golang/go/issues?q=milestone%3AGo1.17.9+label%3ACherryPickApproved) on our issue tracker for details.
go1.18.1 (released 2022-04-12) includes security fixes to the crypto/elliptic, crypto/x509, and encoding/pem packages, as well as bug fixes to the compiler, linker, runtime, the go command, vet, and the bytes, crypto/x509, and go/types packages. See the [Go 1.18.1 milestone](https://github.com/golang/go/issues?q=milestone%3AGo1.18.1+label%3ACherryPickApproved) on our issue tracker for details.
First commit adding support for tooling to aid in the tracking and remediation
of incidents.
- Add new SA method `IncidentsForSerial`
- Add database models for `incident`s and `incidentCert`s
- Add protobuf type for `incident`
- Add database migrations for `incidents`, `incident_foo`, and `incident_bar`
- Give db user `sa` permissions to `incidents`, `incident_foo`, and
`incident_bar`
Part of #5947
Simplify the WFE `RevokeCertificate` API method in three ways:
- Remove most of the logic checking if the requester is authorized to
revoke the certificate in question (based on who is making the
request, what authorizations they have, and what reason they're
requesting). That checking is now done by the RA. Instead, simply
verify that the JWS is authenticated.
- Remove the hard-to-read `authorizedToRevoke` callbacks, and make the
`revokeCertBySubscriberKey` (nee `revokeCertByKeyID`) and
`revokeCertByCertKey` (nee `revokeCertByJWK`) helpers much more
straight-line in their execution logic.
- Call the RA's new `RevokeCertByApplicant` and `RevokeCertByKey` gRPC
methods, rather than the deprecated `RevokeCertificateWithReg`.
This change, without any flag flips, should be invisible to the
end-user. It will slightly change some of our log message formats.
However, by now relying on the new RA gRPC revocation methods, this
change allows us to change our revocation policies by enabling the
`AllowDoubleRevocation` and `MozRevocationReasons` feature flags, which
affect the behavior of those new helpers.
Fixes #5936
- Add new configuration key `throughput`, a mapping which contains all
throughput-related akamai-purger settings.
- Deprecate configuration key `purgeInterval` in favor of `purgeBatchInterval` in
the new `throughput` configuration mapping.
- When no `throughput` or `purgeInterval` is provided, the purger uses optimized
default settings which offer 1.9x the throughput of current production settings.
- At startup, all throughput-related settings are modeled to ensure that we
don't exceed the limits imposed on us by Akamai.
- Queue is now `[][]string`, instead of `[]string` (see the sketch below).
- When a given queue entry is purged, we know all 3 of its URLs were purged.
- At startup, we can compute the size of a theoretical purge request based on
the number of queue entries included.
- Raises the queue size from ~333-thousand cached OCSP responses to
1.25-million, which is roughly 6 hours of work using the optimized default
settings
- Raise `purgeInterval` in test config from 1ms, which violates API limits, to 800ms
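A hedged sketch of the new queue shape (hypothetical helper code, not the purger's actual implementation):

```go
// Each queue entry holds the 3 URLs cached for a single OCSP response, so
// purging one entry purges all of that response's URLs together.
type purgeQueue struct {
	entries [][]string // previously a flat []string of URLs
}

func (q *purgeQueue) enqueue(urls []string) {
	q.entries = append(q.entries, urls)
}

// takeBatch pops up to n entries and flattens them into a single purge
// request, whose size is therefore knowable from the entry count alone.
func (q *purgeQueue) takeBatch(n int) []string {
	if n > len(q.entries) {
		n = len(q.entries)
	}
	batch := q.entries[:n]
	q.entries = q.entries[n:]
	var urls []string
	for _, entry := range batch {
		urls = append(urls, entry...)
	}
	return urls
}
```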
Fixes #5984
Adds a ROCSP Redis client to the SA if cluster information is provided in the
SA config. If a Redis cluster is configured, the SA will attempt to write all
new certificate OCSP responses added via `sa.AddPrecertificate` to the Redis
cluster, but it will not block or fail on errors.
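A hedged sketch of the best-effort write (hypothetical interface; not the actual rocsp client):

```go
import (
	"context"
	"log"
	"time"
)

// ocspWriter is a stand-in for the Redis-backed OCSP response writer.
type ocspWriter interface {
	StoreResponse(ctx context.Context, serial string, der []byte) error
}

// writeOCSPBestEffort writes asynchronously and only logs failures, so
// AddPrecertificate never blocks or fails on Redis errors.
func writeOCSPBestEffort(w ocspWriter, serial string, der []byte) {
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), time.Second)
		defer cancel()
		if err := w.StoreResponse(ctx, serial, der); err != nil {
			log.Printf("rocsp write for %s failed: %v", serial, err)
		}
	}()
}
```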
Fixes: #5871
- Remove GOPATH-style path structure, which isn't needed with Go
modules.
- Remove check for the existence of a docker buildx builder instance, since it
was unreliable.
Add two new gRPC methods to the RA:
- `RevokeCertByKey` will be used when the API request was signed by the
certificate's keypair, rather than a Subscriber keypair. If the
request is for reason `keyCompromise`, it will ensure that the key is
added to the blocked keys table, and will attempt to "re-revoke" a
certificate that was already revoked for some other reason.
- `RevokeCertByApplicant` supports both the path where the original
subscriber requests revocation via the API and the path where another
account that has proven control over all of the identifiers in the
certificate does so. It does not allow the requested reason to be
`keyCompromise`, as these requests do not represent a demonstration of
key compromise.
In addition, add a new feature flag `MozRevocationReasons` which
controls the behavior of these new methods. If the flag is not set, they
behave like they have historically (see above). If the flag is set to true,
then the new methods enforce the upcoming Mozilla policies around
revocation reasons, namely:
- Only the original Subscriber can choose the revocation reason; other
clients will get a set reason code based on the method of requesting
revocation. When the original Subscriber requests reason
`keyCompromise`, this request will be honored, but the key will not be
blocked and other certificates with that key will not also be revoked.
- Revocations signed with the certificate key will always get reason
`keyCompromise`, because we do not know who is sending the request and
therefore must assume that the use of the key in this way represents
compromise. Because these requests will always be for reason
`keyCompromise`, they will always be added to the blocked keys table
and they will always attempt "re-revocation".
- Revocations authorized via control of all names in the cert will
always get reason `cessationOfOperation`, which is to be used when the
original Subscriber does not control all names in the certificate
anymore.
Finally, update the existing `AdministrativelyRevokeCertificate` method
to use the new helper functions shared by the two new methods.
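A condensed sketch of the reason-override behavior the flag enables (hypothetical helper; the real RA logic is spread across the new methods):

```go
// RFC 5280 reason codes used below.
const (
	reasonKeyCompromise        = 1
	reasonCessationOfOperation = 5
)

// effectiveReason maps a revocation request to the reason code actually used
// when MozRevocationReasons is enabled.
func effectiveReason(requested int64, signedByCertKey, isSubscriber bool) int64 {
	switch {
	case signedByCertKey:
		// We can't know who holds the key, so assume compromise.
		return reasonKeyCompromise
	case isSubscriber:
		// Only the original Subscriber keeps their chosen reason.
		return requested
	default:
		// Authorized via control of all names in the cert.
		return reasonCessationOfOperation
	}
}
```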
Part of #5936
This requires using GODEBUG to enable a couple of things turned off by go1.18 (TLS 1.0/1.1, SHA-1 CSRs).
Also add help for a failure mode of cross builds.
Reverts letsencrypt/boulder#5963
Turns out the tests are still flaky -- using the `grpc.WaitForReady(true)`
connection option results in sometimes seeing 9 entries added to the
purger queue, and sometimes 10 entries. Reverting because flakiness
on main should not be tolerated.
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.36.1 to 1.44.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.36.1...v1.44.0)
Also update akamai-purger integration test to avoid experimental API.
The `conn.GetState()` API is marked experimental and may change behavior
at any time. It appears to have changed between v1.36.1 and v1.44.0,
and so the akamai-purger integration tests which rely on it break.
Rather than writing our own loop which polls `conn.GetState()`, just
use the stable `WaitForReady(true)` connection option, and apply it to
all connections by setting it as a default option in the dial options.
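The relevant dial option (a hedged usage example; `addr` is a placeholder):

```go
import "google.golang.org/grpc"

func dial(addr string) (*grpc.ClientConn, error) {
	// WaitForReady(true) makes RPCs block until the connection is ready
	// instead of failing fast, replacing hand-rolled GetState() polling.
	return grpc.Dial(addr, grpc.WithDefaultCallOptions(grpc.WaitForReady(true)))
}
```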
Light cleanup of akamai-purger and the akamai cache-client. This does not make
any material changes to logic.
- Use `errors.New` and `errors.Is` instead of a custom `ErrFatal` type and
`errors.As`
- Add whitespace to separate chunks of execution and error checking from one
another
- Use `logger.Infof` and `logger.Errorf` instead of wrapped calls to
`fmt.Sprintf`
- Remove capital letters from the beginning of error messages
- Add additional comments, and remove some that are no longer accurate
When inside a closure, it is important to not accidentally assign
to variables declared outside the scope of the closure. Doing so
causes static analysis tools (such as `errcheck`) to be unable to
evaluate the lifetime of the variable, and unable to determine if
it is appropriately read from before being assigned to again.
Fix two instances where we assign to a variable declared in the
closure's enclosing scope, rather than declaring a new variable
with the same name.
We have decided that we don't like the `if err := call(); err != nil`
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: it caused readers to believe that
a later `err = call()` statement was assigning to an already-declared
`err` in the local scope, when in fact it was assigning to an
already-declared `err` in the parent scope of a closure. This caused our
`ineffassign` and `staticcheck` linters to be unable to analyze the
lifetime of the `err` variable, and so they did not complain when we
never checked the actual value of that error.
This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
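An illustration of the pitfall, with hypothetical `step1`/`step2` helpers:

```go
func step1() error { return nil }
func step2() error { return nil }

func parent() error {
	var err error
	fn := func() {
		if err := step1(); err != nil { // declares a new, closure-local err
			return
		}
		// Looks like it reuses the err above, but actually assigns to
		// parent's err; ineffassign and staticcheck lose track of it.
		err = step2()
	}
	fn()
	return err
}

// The standardized two-line form keeps declaration and check together:
func parentFixed() error {
	err := step1()
	if err != nil {
		return err
	}
	err = step2()
	if err != nil {
		return err
	}
	return nil
}
```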
Incidents of key compromise where proof is supplied in the form of a private key
have historically been labor-intensive for SRE. This PR seeks to automate the
process of embedded public key validation, querying for issuance, and revocation
and blocking by SPKI hash.
For an example of private keys embedding a mismatched public key, see:
https://blog.hboeck.de/archives/888-How-I-tricked-Symantec-with-a-Fake-Private-Key.html.
Adds two new sub-commands (`private-key-block` and `private-key-revoke`) and one
new flag (`-dry-run`) to admin-revoker. Both new sub-commands validate the
provided private key and provide the operator with an issuance count. Any
blocking and revocation actions are gated by the new `-dry-run` flag, which is
`true` by default.
`private-key-block`: if `-dry-run=false`, immediately blocks issuance for the
provided key. The operator is informed that bad-key-revoker will eventually
revoke any certificates using the provided key.
`private-key-revoke`: if `-dry-run=false`, revokes all certificates using the
provided key and then blocks future issuance. This avoids a race with the
bad-key-revoker. This command will execute successfully even if issuance for the
provided key is already blocked.
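A hedged sketch of the embedded-public-key check (not the actual 'privatekey' package code): sign a random nonce with the private key and verify it with the embedded public key, proving the two halves actually correspond.

```go
import (
	"crypto"
	"crypto/ecdsa"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

// verifyEmbedded checks that a private key's embedded public key really is
// the counterpart of its private half, via a sign/verify round trip.
func verifyEmbedded(signer crypto.Signer) error {
	nonce := make([]byte, 32)
	if _, err := rand.Read(nonce); err != nil {
		return err
	}
	digest := sha256.Sum256(nonce)
	sig, err := signer.Sign(rand.Reader, digest[:], crypto.SHA256)
	if err != nil {
		return err
	}
	switch pub := signer.Public().(type) {
	case *rsa.PublicKey:
		return rsa.VerifyPKCS1v15(pub, crypto.SHA256, digest[:], sig)
	case *ecdsa.PublicKey:
		if !ecdsa.VerifyASN1(pub, digest[:], sig) {
			return fmt.Errorf("embedded public key does not match private key")
		}
		return nil
	default:
		return fmt.Errorf("unsupported key type %T", pub)
	}
}
```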
- Add support for blocking issuance by private key to admin-revoker
- Add support for revoking certificates by private key to admin-revoker
- Create new package called 'privatekey'
- Move private key loading logic from 'issuance' to 'privatekey'
- Add embedded public key verification to 'privatekey'
- Add new field `skipBlockKey` to `AdministrativelyRevokeCertificate` protobuf
- Add check in RA to ensure that only KeyCompromise revocations use
`skipBlockKey`
Fixes #5785
Add `stylecheck` to our list of lints, since it got separated out from
`staticcheck`. Fix the way we configure both to be clearer and not
rely on regexes.
Additionally fix a number of easy-to-change `staticcheck` and
`stylecheck` violations, allowing us to reduce our number of ignored
checks.
Part of #5681
These gRPC methods were only used by the ACMEv1 code paths.
Now that boulder-wfe has been fully removed, we can be confident
that no clients ever call these methods, and can remove them from
the gRPC service interface.
Part of #5816
Running an older version (v0.0.1-2020.1.4) of `staticcheck` in
whole-program mode (`staticcheck --unused.whole-program=true -- ./...`)
finds various instances of unused code which don't normally show up
as CI issues. I've used this to find and remove a large chunk of the
unused code, to pave the way for additional large deletions accompanying
the WFE1 removal.
Part of #5681
When we query DNS for a host, and both the A and AAAA lookups fail or
are empty, combine both errors into a single error rather than only
returning the error from the A lookup.
Fixes #5819
Fixes #5319
Overhaul the revocation integration tests to comprehensively test
every combination of:
- revoking a cert vs a precert
- revoking via the cert key, the subscriber key, or a separate account
that has validation for all of the names in the cert
- revoking for reason Unspecified vs for reason KeyCompromise
Also update a number of the python tests to verify that they cannot
revoke for reason keyCompromise, but can and do revoke with other
reasons.
When looping over multiple Go versions, this script currently exits in error
because we attempt to create a cross-compiling node even though it already
exists. Keeping the existing node also allows subsequent builds to make use of
the Docker cache, reducing the build time by ~400 seconds.
- Only create the cross-compiling node if it doesn't exist
- No longer remove the cross-compiling node on exit
Followup from #5839.
I chose groupcache/lru as our LRU cache implementation because it's part
of the golang org, written by one of the Go authors, and very simple
and easy to read.
This adds an `AccountGetter` interface that is implemented by both the
AccountCache and the SA. If the WFE config includes an AccountCache field,
it will wrap the SA in an AccountCache with the configured max size and
expiration time.
We set an expiration time on account cache entries because we want a
bounded amount of time that they may be stale by. This will be used in
conjunction with a delay on account-updating pathways to ensure we don't
allow authentication with a deactivated account or changed key.
The account cache stores corepb.Registration objects because protobufs
have an established way to do a deep copy. Deep copies are important so
the cache can maintain its own internal state and ensure nothing external
is modifying it.
As part of this process I changed construction of the WFE. Previously,
"SA" and "RA" were public fields that were mutated after construction. Now
they are parameters to the constructor, along with the new "accountGetter"
parameter.
The cache includes stats for requests categorized by hits and misses.
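Roughly, the new seam looks like this (interface name from the text; the method signature is an assumption):

```go
import (
	"context"

	corepb "github.com/letsencrypt/boulder/core/proto"
)

// AccountGetter is implemented by both the SA client and the AccountCache,
// so the WFE can use either interchangeably.
type AccountGetter interface {
	GetRegistration(ctx context.Context, regID int64) (*corepb.Registration, error)
}
```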
Add a new check to GoodKey which attempts to factor the public modulus
of the presented key using Fermat's factorization method. This method
will succeed if and only if the prime factors are very close to each
other -- i.e. almost certainly were not selected independently from a
random uniform distribution, but were instead calculated via some other
less secure method.
To support this new feature, add a new config flag to the RA, CA, and
WFE, which all use the GoodKey checks. As part of adding this new config
value, refactor the GoodKey config items into their own config struct
which can be re-used across all services.
If the new `FermatRounds` config value has not been set, it will default
to zero, causing no factorization to be attempted.
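For reference, a minimal sketch of Fermat's method (illustrative only; the real check lives in the GoodKey package and has its own bounds):

```go
import "math/big"

// fermatFactor tries to factor n = p*q, succeeding only when p and q are
// close together. It gives up after the given number of rounds.
func fermatFactor(n *big.Int, rounds int) (p, q *big.Int, ok bool) {
	one := big.NewInt(1)
	a := new(big.Int).Sqrt(n) // floor(sqrt(n))
	if new(big.Int).Mul(a, a).Cmp(n) < 0 {
		a.Add(a, one) // start at ceil(sqrt(n))
	}
	b2 := new(big.Int)
	b := new(big.Int)
	for i := 0; i < rounds; i++ {
		// If b² = a² - n is a perfect square, then n = (a-b)(a+b).
		b2.Mul(a, a).Sub(b2, n)
		b.Sqrt(b2)
		if new(big.Int).Mul(b, b).Cmp(b2) == 0 {
			return new(big.Int).Sub(a, b), new(big.Int).Add(a, b), true
		}
		a.Add(a, one)
	}
	return nil, nil, false
}
```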
Fixes #5850
Part of #5851
Build a new docker container for the new Go 1.17.5 security release,
which includes a fix for the `net/http` package. Update our CI to run
tests on both our current and the new go versions.
Currently, if `docker buildx` fails, the cross-compilation node (created before
the build starts) will never be deleted. This change ensures that the
cross-compilation node is always deleted before `tag_and_upload.sh` exits.