boulder

Commit Graph

Author	SHA1	Message	Date
Aaron Gable	6ae6aa8e90	Dynamically generate grpc-creds at integration test startup (#7477 ) The summary here is: - Move test/cert-ceremonies to test/certs - Move .hierarchy (generated by the above) to test/certs/webpki - Remove our mapping of .hierarchy to /hierarchy inside docker - Move test/grpc-creds to test/certs/ipki - Unify the generation of both test/certs/webpki and test/certs/ipki into a single script at test/certs/generate.sh - Make that script the entrypoint of a new docker compose service - Have t.sh and tn.sh invoke that service to ensure keys and certs are created before tests run No production changes are necessary, the config changes here are just for testing purposes. Part of https://github.com/letsencrypt/boulder/issues/7476	2024-05-15 11:31:23 -04:00
Aaron Gable	94d14689bf	Implement unpredictable issuance from similar intermediates (#7418 ) Replace the CA's "useForRSA" and "useForECDSA" config keys with a single "active" boolean. When the CA starts up, all active RSA issuers will be used to issue precerts with RSA pubkeys, and all ECDSA issuers will be used to issue precerts with ECDSA pubkeys (if the ECDSAForAll flag is true; otherwise just those that are on the allow-list). All "inactive" issuers can still issue OCSP responses, CRLs, and (notably) final certificates. Instead of using the "useForRSA" and "useForECDSA" flags, plus implicit config ordering, to determine which issuer to use to handle a given issuance, simply use the issuer's public key algorithm to determine which issuances it should be handling. All implicit ordering considerations are removed, because the "active" certificates now just form a pool that is sampled from randomly. To facilitate this, update some unit and integration tests to be more flexible and try multiple potential issuing intermediates, particularly when constructing OCSP requests. For this change to be safe to deploy with no user-visible behavior changes, the CA configs must contain: - Exactly one RSA-keyed intermediate with "useForRSALeaves" set to true; and - Exactly one ECDSA-keyed intermediate with "useForECDSALeaves" set to true. If the configs contain more than one intermediate meeting one of the bullets above, then randomized issuance will begin immediately. Fixes https://github.com/letsencrypt/boulder/issues/7291 Fixes https://github.com/letsencrypt/boulder/issues/7290	2024-04-18 10:00:38 -07:00
Aaron Gable	327f96d281	Update integration test hierarchy for the modern era (#7411 ) Update the hierarchy which the integration tests auto-generate inside the ./hierarchy folder to include three intermediates of each key type, two to be actively loaded and one to be held in reserve. To facilitate this: - Update the generation script to loop, rather than hard-coding each intermediate we want - Improve the filenames of the generated hierarchy to be more readable - Replace the WFE's AIA endpoint with a thin aia-test-srv so that we don't have to have NameIDs hardcoded in our ca.json configs Having this new hierarchy will make it easier for our integration tests to validate that new features like "unpredictable issuance" are working correctly. Part of https://github.com/letsencrypt/boulder/issues/729	2024-04-08 14:06:00 -07:00
Aaron Gable	10e894a172	Create new admin tool (#7276 ) Create a new administration tool "bin/admin" as a successor to and replacement of "admin-revoker". This new tool supports all the same fundamental capabilities as the old admin-revoker, including: - Revoking by serial, by batch of serials, by incident table, and by private key - Blocking a key to let bad-key-revoker take care of revocation - Clearing email addresses from all accounts that use them Improvements over the old admin-revoker include: - All commands run in "dry-run" mode by default, to prevent accidental executions - All revocation mechanisms allow setting the revocation reason, skipping blocking the key, indicating that the certificate is malformed, and controlling the number of parallel workers conducting revocation - All revocation mechanisms do not parse the cert in question, leaving that to the RA - Autogenerated usage information for all subcommands - A much more modular structure to simplify adding more capabilities in the future - Significantly simplified tests with smaller mocks The new tool has analogues of all of admin-revokers unit tests, and all integration tests have been updated to use the new tool instead. A future PR will remove admin-revoker, once we're sure SRE has had time to update all of their playbooks. Fixes https://github.com/letsencrypt/boulder/issues/7135 Fixes https://github.com/letsencrypt/boulder/issues/7269 Fixes https://github.com/letsencrypt/boulder/issues/7268 Fixes https://github.com/letsencrypt/boulder/issues/6927 Part of https://github.com/letsencrypt/boulder/issues/6840	2024-02-07 09:35:18 -08:00
Jacob Hoffman-Andrews	ce5632b480	Remove `service1` / `service2` names in consul (#7266 ) These names corresponded to single instances of a service, and were primarily used for (a) specifying which interface to bind a gRPC port on and (b) allowing `health-checker` to check individual instances rather than a service as a whole. For (a), change the `--grpc-addr` flags to bind to "all interfaces." For (b), provide a specific IP address and port for health checking. This required adding a `--hostOverride` flag for `health-checker` because the service certificates contain hostname SANs, not IP address SANs. Clarify the situation with nonce services a little bit. Previously we had one nonce "service" in Consul and got nonces from that (i.e. randomly between the two nonce-service instances). Now we have two nonce services in consul, representing multiple datacenters, and one of them is explicitly configured as the "get" service, while both are configured as the "redeem" service. Part of #7245. Note this change does not yet get rid of the rednet/bluenet distinction, nor does it get rid of all use of 10.88.88.88. That will be a followup change.	2024-01-22 09:34:20 -08:00
Phil Porada	56a11f0896	Fix CI failures related to akamai-test-srv (#6815 ) Fixes a CI problem introduced by https://github.com/letsencrypt/boulder/pull/6758 where we could send two purge requests which caused sporadic CI failures due to an infinite loop. Fixes https://github.com/letsencrypt/boulder/issues/6806	2023-04-13 09:56:30 -07:00
Samantha	6d519059a3	akamai-purger: Deprecate PurgeInterval config field (#6489 ) Fixes #6003	2022-11-04 12:44:35 -07:00
Aaron Gable	dab8a71b0e	Use new RA methods from WFE revocation path (#5983 ) Simplify the WFE `RevokeCertificate` API method in three ways: - Remove most of the logic checking if the requester is authorized to revoke the certificate in question (based on who is making the request, what authorizations they have, and what reason they're requesting). That checking is now done by the RA. Instead, simply verify that the JWS is authenticated. - Remove the hard-to-read `authorizedToRevoke` callbacks, and make the `revokeCertBySubscriberKey` (nee `revokeCertByKeyID`) and `revokeCertByCertKey` (nee `revokeCertByJWK`) helpers much more straight-line in their execution logic. - Call the RA's new `RevokeCertByApplicant` and `RevokeCertByKey` gRPC methods, rather than the deprecated `RevokeCertificateWithReg`. This change, without any flag flips, should be invisible to the end-user. It will slightly change some of our log message formats. However, by now relying on the new RA gRPC revocation methods, this change allows us to change our revocation policies by enabling the `AllowDoubleRevocation` and `MozRevocationReasons` feature flags, which affect the behavior of those new helpers. Fixes #5936	2022-03-28 14:14:11 -07:00
Samantha	7c22b99d63	akamai-purger: Improve throughput and configuration safety (#6006 ) - Add new configuration key `throughput`, a mapping which contains all throughput related akamai-purger settings. - Deprecate configuration key `purgeInterval` in favor of `purgeBatchInterval` in the new `throughput` configuration mapping. - When no `throughput` or `purgeInterval` is provided, the purger uses optimized default settings which offer 1.9x the throughput of current production settings. - At startup, all throughput related settings are modeled to ensure that we don't exceed the limits imposed on us by Akamai. - Queue is now `[][]string`, instead of `[]string`. - When a given queue entry is purged we know all 3 of its URLs were purged. - At startup we know the size of a theoretical request to purge based on the number of queue entries included - Raises the queue size from ~333-thousand cached OCSP responses to 1.25-million, which is roughly 6 hours of work using the optimized default settings - Raise `purgeInterval` in test config from 1ms, which violates API limits, to 800ms Fixes #5984	2022-03-23 17:23:07 -07:00
Jacob Hoffman-Andrews	ba0ea090b2	integration: save hierarchy across runs (#5729 ) This allows repeated runs using the same hierarchy, and avoids spurious errors from ocsp-updater saying "This CA doesn't have an issuer cert with ID XXX" Fixes #5721	2021-10-20 17:06:33 -07:00
Aaron Gable	3666322817	Add health-checker tool and use it from startservers.py (#5095 ) This adds a new tool, `health-checker`, which is a client of the new Health Checker Service that has been integrated into all of our boulder components. This tool takes an address, a timeout, and a config file. It then attempts to connect to a gRPC Health Service at the given address, retrying until it hits its timeout, using credentials specified by the config file. This is then wrapped by a new function `waithealth` in our Python helpers, which serves much the same function as `waitport`, but specifically for services which surface a gRPC Health Service This in turn requires slight modifications to `startservers`, namely specifying the address and port on which each service starts its gRPC listener. Finally, this change also introduces new credentials for this health-checker, and adds those credentials as a valid client to all services' json configs. A similar change would have to be made to our production configs if we were to establish a long-lived health checker/prober in prod. Fixes #5074	2020-10-06 15:01:35 -07:00
Aaron Gable	dea2f6ef92	Refactor and cleanup python integration tests (#4945 )	2020-07-13 14:31:15 -07:00
Aaron Gable	e906b9e272	Add test for re-signed OCSP revocation reasons (#4937 )	2020-07-10 11:13:33 -07:00
Roland Bracewell Shoemaker	7673f02803	Use cmd/ceremony in integration tests (#4832 ) This ended up taking a lot more work than I expected. In order to make the implementation more robust a bunch of stuff we previously relied on has been ripped out in order to reduce unnecessary complexity (I think I insisted on a bunch of this in the first place, so glad I can kill it now). In particular this change: * Removes bhsm and pkcs11-proxy: softhsm and pkcs11-proxy don't play well together, and any softhsm manipulation would need to happen on bhsm, then require a restart of pkcs11-proxy to pull in the on-disk changes. This makes manipulating softhsm from the boulder container extremely difficult, and because of the need to initialize new on each run (described below) we need direct access to the softhsm2 tools since pkcs11-tool cannot do slot initialization operations over the wire. I originally argued for bhsm as a way to mimic a network attached HSM, mainly so that we could do network level fault testing. In reality we've never actually done this, and the extra complexity is not really realistic for a handful of reasons. It seems better to just rip it out and operate directly on a local softhsm instance (the other option would be to use pkcs11-proxy locally, but this still would require manually restarting the proxy whenever softhsm2-util was used, and wouldn't really offer any realistic benefit). * Initializes the softhsm slots on each integration test run, rather than when creating the docker image (this is necessary to prevent churn in test/cert-ceremonies/generate.go, which would need to be updated to reflect the new slot IDs each time a new boulder-tools image was created since slot IDs are randomly generated) * Installs softhsm from source so that we can use a more up to date version (2.5.0 vs. 2.2.0 which is in the debian repo) * Generates the root and intermediate private keys in softhsm and writes out the root and intermediate public keys to /tmp for use in integration tests (the existing test-{ca,root} certs are kept in test/ because they are used in a whole bunch of unit tests. At some point these should probably be renamed/moved to be more representative of what they are used for, but that is left for a follow-up in order to keep the churn in this PR as related to the ceremony work as possible) Another follow-up item here is that we should really be zeroing out the database at the start of each integration test run, since certain things like certificates and ocsp responses will be signed by a key/issuer that is no longer in use/doesn't match the current key/issuer. Fixes #4832.	2020-06-03 15:20:23 -07:00
Jacob Hoffman-Andrews	5af7541c85	Improve output when Go integration tests fail. (#4734 ) Right now we show output like: Traceback (most recent call last): File "test/integration-test.py", line 60, in run_go_tests subprocess.check_call(cmdLine, shell=False, stderr=subprocess.STDOUT) File "/usr/lib/python3.5/subprocess.py", line 271, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['go', 'test', '-tags', 'integration', '-count=1', '-race', './test/integration']' returned non-zero exit status 1 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "test/integration-test.py", line 414, in main() File "test/integration-test.py", line 293, in main run_go_tests(args.test_case_filter) File "test/integration-test.py", line 62, in run_go_tests raise(Exception("%s. Output:\n%s" % (e, e.output))) Exception: Command '['go', 'test', '-tags', 'integration', '-count=1', '-race', './test/integration']' returned non-zero exit status 1. Output: None This change removes the try / raise clauses that were causing this double exception logging. The original purpose of these clauses was to make sure we logged output on failure. To continue to fulfill that purpose, I switched the run function to use check_call instead of check_output. check_output captures the stdout; check_call emits it to the caller's stdout as normal, so we still see the output. I also changed the two cases that actually wanted to process output so they use check_output directly.	2020-04-06 17:42:40 -07:00
Jacob Hoffman-Andrews	1146eecac3	integration: use python3 (#4582 ) Python 2 is over in 1 month 4 days: https://pythonclock.org/ This rolls forward most of the changes in #4313. The original change was rolled back in #4323 because it broke `docker-compose up`. This change fixes those original issues by (a) making sure `requests` is installed and (b) sourcing a virtualenv containing the `requests` module before running start.py. Other notable changes in this: - Certbot has changed the developer instructions to install specific packages rather than rely on `letsencrypt-auto --os-packages-only`, so we follow suit. - Python3 now has a `bytes` type that is used in some places that used to provide `str`, and all `str` are now Unicode. That means going from `bytes` to `str` and back requires explicit `.decode()` and `.encode()`. - Moved from urllib2 to requests in many places.	2019-11-28 09:54:58 -05:00
Jacob Hoffman-Andrews	0c9ca050ab	Tidy up default_config_dir in integration test (#4509 ) We now expect that the config dir is always set, so we make that explicit in the integration test and error if that's not true. This change also renames the variable to just "config_dir", and removes the parameter to startservers.start, which is currently never set to anything other than its default value. This also explicitly sets the environment variable in .travis.yml.	2019-10-25 09:51:48 -07:00
Roland Bracewell Shoemaker	db01830508	Return OCSP unauthorized status if the certificate is expired (#4380 ) The ocsp-updater ocspStaleMaxAge config var has to be bumped up to ~7 months so that when it is run after the six-months-ago run it will actually update the ocsp responses generated during that period and mark the certificate status row as expired. Fixes #4338.	2019-08-01 14:13:27 -07:00
Roland Bracewell Shoemaker	59ef95230d	integration: Fix typo in test/helpers.py (#4369 )	2019-07-26 14:02:52 -04:00
Jacob Hoffman-Andrews	3af49a16be	Revert "integration: move to Python3 (#4313 )" (#4323 ) This reverts commit `796a7aa2f4`. People's tests have been breaking on `docker-compose up` with the following output: ``` ImportError: No module named requests ``` Fixes #4322	2019-07-03 11:35:45 -07:00
Jacob Hoffman-Andrews	796a7aa2f4	integration: move to Python3 (#4313 ) * integration: move to Python3 - Add parentheses to all print and raise calls. - Python3 distinguishes bytes from strings. Add encode() and decode() calls as needed to provide the correct type. - Use requests library consistently (urllib3 is not in Python3). - Remove shebang from Python files without a main, and update shebang for integration-test.py.	2019-07-02 09:28:49 -04:00
Roland Bracewell Shoemaker	24f150f8fc	Re-apply #4279 with requests fix (#4286 ) Move from using `requests` to `urllib2` in `helpers.py`. Verified this works with `docker-compose up`. In the future we really should be installing our own python dependencies in the boulder-tools image rather than relying on getting them by using the certbot virtualenv.	2019-06-24 11:58:37 -07:00
Adrien Ferrand	8e31d58113	Revert "tests: Switch to instant OCSP verification in int. tests (#4279 )" (#4285 ) This reverts commit `f4b9235acb`. Fixes #4284	2019-06-23 12:06:16 -07:00
Roland Bracewell Shoemaker	f4b9235acb	tests: Switch to instant OCSP verification in int. tests (#4279 ) * Switch to instant OCSP verification in integration tests * Move waitport to helpers and use it to determine if ocsp-responder is alive in test_single_ocsp	2019-06-21 09:53:01 -04:00
Roland Bracewell Shoemaker	0a16b5f57d	Reduce akamai purger interval in integration tests (#4277 ) and reduce the verify_akamai_purge deadline/sleep to match the much smaller interval.	2019-06-20 16:31:44 -04:00
Roland Bracewell Shoemaker	acc44498d1	RA: Make RevokeAtRA feature standard behavior (#4268 ) Now that it is live in production and is working as intended we can remove the old ocsp-updater functionality entirely. Fixes #4048.	2019-06-20 14:32:53 -04:00
Jacob Hoffman-Andrews	18a3c78d6f	Refactor test_caa and twenty-days-ago setup (#4261 ) As part of #4241, I need to introduce some twenty-days-ago setup. So I refactored the only current instance (test_caa) to use a style where setup functions can be registered right next to the test cases they affect. The @register_twenty_days_ago is Python for "call register_twenty_days_ago with the thing on the next line as an argument." I also cleaned up a bunch of related stuff: * Removed the ACCOUNT_URI environment variable and associated function params. This was introduced in #3736 to pass a URI to challtestsrv before we refactored for more dynamic updates. It's not used any more. * Removed a try / except from startChallSrv that needlessly hid errors. * Move setting of DNS fixtures for caa_test into the test case itself.	2019-06-18 14:58:06 -07:00
Roland Bracewell Shoemaker	098a761c02	ocsp-updater: Remove integrated akamai purger (#4258 ) This is now an external service. Also bumps up the deadline in the integration test helper which checks for purging because using the remote service from the ocsp-updater takes a little longer. Once we remove ocsp-updater revocation support that can probably be cranked back down to a more reasonable timeframe.	2019-06-12 09:36:53 -04:00
Jacob Hoffman-Andrews	8f578f3a93	Improve integration tests (#4143 ) - Move fakeclock, get_future_output, and random_domain to helpers.py. - Remove tempdir handling from integration-test.py since it's already done in helpers.py - Consolidate handling of config dir into helpers.py, and add CONFIG_NEXT boolean. - Move RevokeAtRA config gating into verify_revocation to reduce redundancy. - Skip load-balancing test when filter is enabled. - Ungate test_sct_embedding - Rework test_ct_submissions, which was out of date. In particular, have a couple of logs where submitFinalCert: false, and make ct-test-srv store submission counts by hostnames for better test case isolation.	2019-04-04 10:59:38 -07:00
Roland Bracewell Shoemaker	3e54cea295	Implement direct revocation at RA (#4043 ) Implements a feature that enables immediate revocation instead of marking a certificate revoked and waiting for the OCSP-Updater to generate the OCSP response. This means that as soon as the request returns from the WFE the revoked OCSP response should be available to the user. This feature requires that the RA be configured to use the standalone Akamai purger service. Fixes #4031.	2019-02-14 14:47:42 -05:00
Roland Bracewell Shoemaker	046955e99c	Add a standalone akamai purger service (#4040 ) Fixes #4030.	2019-02-05 09:00:31 -08:00
Roland Bracewell Shoemaker	cdef80ce67	Remove Akamai CCU v2 support (#3994 ) Fixes #3991.	2019-01-08 12:28:11 -08:00
Roland Bracewell Shoemaker	6a47decc33	Deflake akamai purger integration testing (#3961 ) The problem here was that we were doing revocation tests in the v2 integration file that didn't block on getting the revoked OCSP status. This meant that if the OCSP responder was running slow it could execute a revoked cert tick between resetting the akamai test server in the next test and sending another purge request which would mean we saw two purge requests when we expected to see one. The fix was to add the blocking and purge checking/resetting to the v2 tests. Doing this without duplicating a bunch of code required factoring a number of functions out into a third helpers file (I think more code could be abstracted out to this file but just wanted to start with what was needed for this change.)	2018-11-30 14:17:23 -08:00

33 Commits
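
Commit 18a3c78d6f above describes registering setup functions with a decorator placed next to the test cases they affect. The sketch below illustrates that general pattern only. The registry list, the body of `caa_setup`, and the `run_twenty_days_ago_setup` runner are hypothetical names made up for illustration, and boulder's actual helpers may be structured differently.

```python
# Hypothetical registry; the real helper in boulder's test suite may differ.
twenty_days_ago_functions = []


def register_twenty_days_ago(f):
    """Record f so a runner can call it during the twenty-days-ago phase."""
    twenty_days_ago_functions.append(f)
    # Return f unchanged so the decorated name still refers to the function.
    return f


@register_twenty_days_ago
def caa_setup():
    # Placeholder body standing in for whatever setup a test needs.
    return "caa setup ran"

# The decorator form above is equivalent to writing:
#   caa_setup = register_twenty_days_ago(caa_setup)


def run_twenty_days_ago_setup():
    """Call every registered setup function in registration order."""
    return [f() for f in twenty_days_ago_functions]
```

Because the decorator returns the function unchanged, each setup function can still be called directly, and the runner only needs to walk the list.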