Commit Graph

33 Commits

Author SHA1 Message Date
Aaron Gable 6ae6aa8e90
Dynamically generate grpc-creds at integration test startup (#7477)
The summary here is:
- Move test/cert-ceremonies to test/certs
- Move .hierarchy (generated by the above) to test/certs/webpki
- Remove our mapping of .hierarchy to /hierarchy inside docker
- Move test/grpc-creds to test/certs/ipki
- Unify the generation of both test/certs/webpki and test/certs/ipki
into a single script at test/certs/generate.sh
- Make that script the entrypoint of a new docker compose service
- Have t.sh and tn.sh invoke that service to ensure keys and certs are
created before tests run

No production changes are necessary, the config changes here are just
for testing purposes.

Part of https://github.com/letsencrypt/boulder/issues/7476
2024-05-15 11:31:23 -04:00
Aaron Gable 94d14689bf
Implement unpredictable issuance from similar intermediates (#7418)
Replace the CA's "useForRSA" and "useForECDSA" config keys with a single
"active" boolean. When the CA starts up, all active RSA issuers will be
used to issue precerts with RSA pubkeys, and all ECDSA issuers will be
used to issue precerts with ECDSA pubkeys (if the ECDSAForAll flag is
true; otherwise just those that are on the allow-list). All "inactive"
issuers can still issue OCSP responses, CRLs, and (notably) final
certificates.

Instead of using the "useForRSA" and "useForECDSA" flags, plus implicit
config ordering, to determine which issuer to use to handle a given
issuance, simply use the issuer's public key algorithm to determine
which issuances it should be handling. All implicit ordering
considerations are removed, because the "active" certificates now just
form a pool that is sampled from randomly.

To facilitate this, update some unit and integration tests to be more
flexible and try multiple potential issuing intermediates, particularly
when constructing OCSP requests.

For this change to be safe to deploy with no user-visible behavior
changes, the CA configs must contain:
- Exactly one RSA-keyed intermediate with "useForRSALeaves" set to true;
and
- Exactly one ECDSA-keyed intermediate with "useForECDSALeaves" set to
true.

If the configs contain more than one intermediate meeting one of the
bullets above, then randomized issuance will begin immediately.

Fixes https://github.com/letsencrypt/boulder/issues/7291
Fixes https://github.com/letsencrypt/boulder/issues/7290
2024-04-18 10:00:38 -07:00
Aaron Gable 327f96d281
Update integration test hierarchy for the modern era (#7411)
Update the hierarchy which the integration tests auto-generate inside
the ./hierarchy folder to include three intermediates of each key type,
two to be actively loaded and one to be held in reserve. To facilitate
this:
- Update the generation script to loop, rather than hard-coding each
intermediate we want
- Improve the filenames of the generated hierarchy to be more readable
- Replace the WFE's AIA endpoint with a thin aia-test-srv so that we
don't have to have NameIDs hardcoded in our ca.json configs

Having this new hierarchy will make it easier for our integration tests
to validate that new features like "unpredictable issuance" are working
correctly.

Part of https://github.com/letsencrypt/boulder/issues/729
2024-04-08 14:06:00 -07:00
Aaron Gable 10e894a172
Create new admin tool (#7276)
Create a new administration tool "bin/admin" as a successor to and
replacement of "admin-revoker".

This new tool supports all the same fundamental capabilities as the old
admin-revoker, including:
- Revoking by serial, by batch of serials, by incident table, and by
private key
- Blocking a key to let bad-key-revoker take care of revocation
- Clearing email addresses from all accounts that use them

Improvements over the old admin-revoker include:
- All commands run in "dry-run" mode by default, to prevent accidental
executions
- All revocation mechanisms allow setting the revocation reason,
skipping blocking the key, indicating that the certificate is malformed,
and controlling the number of parallel workers conducting revocation
- All revocation mechanisms do not parse the cert in question, leaving
that to the RA
- Autogenerated usage information for all subcommands
- A much more modular structure to simplify adding more capabilities in
the future
- Significantly simplified tests with smaller mocks

The new tool has analogues of all of admin-revokers unit tests, and all
integration tests have been updated to use the new tool instead. A
future PR will remove admin-revoker, once we're sure SRE has had time to
update all of their playbooks.

Fixes https://github.com/letsencrypt/boulder/issues/7135
Fixes https://github.com/letsencrypt/boulder/issues/7269
Fixes https://github.com/letsencrypt/boulder/issues/7268
Fixes https://github.com/letsencrypt/boulder/issues/6927
Part of https://github.com/letsencrypt/boulder/issues/6840
2024-02-07 09:35:18 -08:00
Jacob Hoffman-Andrews ce5632b480
Remove `service1` / `service2` names in consul (#7266)
These names corresponded to single instances of a service, and were
primarily used for (a) specifying which interface to bind a gRPC port on
and (b) allowing `health-checker` to check individual instances rather
than a service as a whole.

For (a), change the `--grpc-addr` flags to bind to "all interfaces." For
(b), provide a specific IP address and port for health checking. This
required adding a `--hostOverride` flag for `health-checker` because the
service certificates contain hostname SANs, not IP address SANs.

Clarify the situation with nonce services a little bit. Previously we
had one nonce "service" in Consul and got nonces from that (i.e.
randomly between the two nonce-service instances). Now we have two nonce
services in consul, representing multiple datacenters, and one of them
is explicitly configured as the "get" service, while both are configured
as the "redeem" service.

Part of #7245.

Note this change does not yet get rid of the rednet/bluenet distinction,
nor does it get rid of all use of 10.88.88.88. That will be a followup
change.
2024-01-22 09:34:20 -08:00
Phil Porada 56a11f0896
Fix CI failures related to akamai-test-srv (#6815)
Fixes a CI problem introduced by
https://github.com/letsencrypt/boulder/pull/6758 where we could send two
purge requests which caused sporadic CI failures due to an infinite
loop.

Fixes https://github.com/letsencrypt/boulder/issues/6806
2023-04-13 09:56:30 -07:00
Samantha 6d519059a3
akamai-purger: Deprecate PurgeInterval config field (#6489)
Fixes #6003
2022-11-04 12:44:35 -07:00
Aaron Gable dab8a71b0e
Use new RA methods from WFE revocation path (#5983)
Simplify the WFE `RevokeCertificate` API method in three ways:
- Remove most of the logic checking if the requester is authorized to
  revoke the certificate in question (based on who is making the
  request, what authorizations they have, and what reason they're
  requesting). That checking is now done by the RA. Instead, simply
  verify that the JWS is authenticated.
- Remove the hard-to-read `authorizedToRevoke` callbacks, and make the
  `revokeCertBySubscriberKey` (nee `revokeCertByKeyID`) and
  `revokeCertByCertKey` (nee `revokeCertByJWK`) helpers much more
  straight-line in their execution logic.
- Call the RA's new `RevokeCertByApplicant` and `RevokeCertByKey` gRPC
  methods, rather than the deprecated `RevokeCertificateWithReg`.

This change, without any flag flips, should be invisible to the
end-user. It will slightly change some of our log message formats.
However, by now relying on the new RA gRPC revocation methods, this
change allows us to change our revocation policies by enabling the
`AllowDoubleRevocation` and `MozRevocationReasons` feature flags, which
affect the behavior of those new helpers.

Fixes #5936
2022-03-28 14:14:11 -07:00
Samantha 7c22b99d63
akamai-purger: Improve throughput and configuration safety (#6006)
- Add new configuration key `throughput`, a mapping which contains all
  throughput related akamai-purger settings.
- Deprecate configuration key `purgeInterval` in favor of `purgeBatchInterval` in
  the new `throughput` configuration mapping.
- When no `throughput` or `purgeInterval` is provided, the purger uses optimized
  default settings which offer 1.9x the throughput of current production settings.
- At startup, all throughput related settings are modeled to ensure that we
  don't exceed the limits imposed on us by Akamai.
- Queue is now `[][]string`, instead of `[]string`.
  - When a given queue entry is purged we know all 3 of it's URLs were purged.
  - At startup we know the size of a theoretical request to purge based on the
    number of queue entries included
- Raises the queue size from ~333-thousand cached OCSP responses to
  1.25-million, which is roughly 6 hours of work using the optimized default
  settings
- Raise `purgeInterval` in test config from 1ms, which violates API limits, to 800ms

Fixes #5984
2022-03-23 17:23:07 -07:00
Jacob Hoffman-Andrews ba0ea090b2
integration: save hierarchy across runs (#5729)
This allows repeated runs using the same hiearchy, and avoids spurious
errors from ocsp-updater saying "This CA doesn't have an issuer cert
with ID XXX"

Fixes #5721
2021-10-20 17:06:33 -07:00
Aaron Gable 3666322817
Add health-checker tool and use it from startservers.py (#5095)
This adds a new tool, `health-checker`, which is a client of the new
Health Checker Service that has been integrated into all of our
boulder components. This tool takes an address, a timeout, and a
config file. It then attempts to connect to a gRPC Health Service at
the given address, retrying until it hits its timeout, using credentials
specified by the config file.

This is then wrapped by a new function `waithealth` in our Python
helpers, which serves much the same function as `waitport`, but
specifically for services which surface a gRPC Health Service
This in turn requires slight modifications to `startservers`, namely
specifying the address and port on which each service starts its
gRPC listener.

Finally, this change also introduces new credentials for this
health-checker, and adds those credentials as a valid client to
all services' json configs. A similar change would have to be made
to our production configs if we were to establish a long-lived 
health checker/prober in prod.

Fixes #5074
2020-10-06 15:01:35 -07:00
Aaron Gable dea2f6ef92
Refactor and cleanup python integration tests (#4945) 2020-07-13 14:31:15 -07:00
Aaron Gable e906b9e272
Add test for re-signed OCSP revocation reasons (#4937) 2020-07-10 11:13:33 -07:00
Roland Bracewell Shoemaker 7673f02803
Use cmd/ceremony in integration tests (#4832)
This ended up taking a lot more work than I expected. In order to make the implementation more robust a bunch of stuff we previously relied on has been ripped out in order to reduce unnecessary complexity (I think I insisted on a bunch of this in the first place, so glad I can kill it now).

In particular this change:

* Removes bhsm and pkcs11-proxy: softhsm and pkcs11-proxy don't play well together, and any softhsm manipulation would need to happen on bhsm, then require a restart of pkcs11-proxy to pull in the on-disk changes. This makes manipulating softhsm from the boulder container extremely difficult, and because of the need to initialize new on each run (described below) we need direct access to the softhsm2 tools since pkcs11-tool cannot do slot initialization operations over the wire. I originally argued for bhsm as a way to mimic a network attached HSM, mainly so that we could do network level fault testing. In reality we've never actually done this, and the extra complexity is not really realistic for a handful of reasons. It seems better to just rip it out and operate directly on a local softhsm instance (the other option would be to use pkcs11-proxy locally, but this still would require manually restarting the proxy whenever softhsm2-util was used, and wouldn't really offer any realistic benefit).
* Initializes the softhsm slots on each integration test run, rather than when creating the docker image (this is necessary to prevent churn in test/cert-ceremonies/generate.go, which would need to be updated to reflect the new slot IDs each time a new boulder-tools image was created since slot IDs are randomly generated)
* Installs softhsm from source so that we can use a more up to date version (2.5.0 vs. 2.2.0 which is in the debian repo)
* Generates the root and intermediate private keys in softhsm and writes out the root and intermediate public keys to /tmp for use in integration tests (the existing test-{ca,root} certs are kept in test/ because they are used in a whole bunch of unit tests. At some point these should probably be renamed/moved to be more representative of what they are used for, but that is left for a follow-up in order to keep the churn in this PR as related to the ceremony work as possible)
Another follow-up item here is that we should really be zeroing out the database at the start of each integration test run, since certain things like certificates and ocsp responses will be signed by a key/issuer that is no longer is use/doesn't match the current key/issuer.

Fixes #4832.
2020-06-03 15:20:23 -07:00
Jacob Hoffman-Andrews 5af7541c85
Improve output when Go integration tests fail. (#4734)
Right now we show output like:

Traceback (most recent call last):
File "test/integration-test.py", line 60, in run_go_tests
subprocess.check_call(cmdLine, shell=False, stderr=subprocess.STDOUT)
File "/usr/lib/python3.5/subprocess.py", line 271, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['go', 'test', '-tags', 'integration', '-count=1', '-race', './test/integration']' returned non-zero exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "test/integration-test.py", line 414, in
main()
File "test/integration-test.py", line 293, in main
run_go_tests(args.test_case_filter)
File "test/integration-test.py", line 62, in run_go_tests
raise(Exception("%s. Output:\n%s" % (e, e.output)))
Exception: Command '['go', 'test', '-tags', 'integration', '-count=1', '-race', './test/integration']' returned non-zero exit status 1. Output:
None

This change removes the try / raise clauses that were causing this
double exception logging. The original purpose of these clauses was to
make sure we logged output on failure. To continue to fulfill that
purpose, I switched the run function to use check_call instead of
check_output. check_output captures the stdout; check_call emits it to
the caller's stdout as normal, so we still see the output.

I also changed the two cases that actually wanted to process output so
they use check_output directly.
2020-04-06 17:42:40 -07:00
Jacob Hoffman-Andrews 1146eecac3 integration: use python3 (#4582)
Python 2 is over in 1 month 4 days: https://pythonclock.org/

This rolls forward most of the changes in #4313.

The original change was rolled back in #4323 because it
broke `docker-compose up`. This change fixes those original issues by
(a) making sure `requests` is installed and (b) sourcing a virtualenv
containing the `requests` module before running start.py.

Other notable changes in this:
 - Certbot has changed the developer instructions to install specific packages
rather than rely on `letsencrypt-auto --os-packages-only`, so we follow suit.
 - Python3 now has a `bytes` type that is used in some places that used to
provide `str`, and all `str` are now Unicode. That means going from `bytes` to
`str` and back requires explicit `.decode()` and `.encode()`.
 - Moved from urllib2 to requests in many places.
2019-11-28 09:54:58 -05:00
Jacob Hoffman-Andrews 0c9ca050ab Tidy up default_config_dir in integration test (#4509)
We now expect that the config dir is always set, so we make that
explicit in the integration test and error if that's not true.

This change also renames the variable to just "config_dir", and removes
the parameter to startservers.start, which is currently never set to
anything other than its default value.

This also explicitly sets the environment variable in .travis.yml.
2019-10-25 09:51:48 -07:00
Roland Bracewell Shoemaker db01830508
Return OCSP unauthorized status if the certificate is expired (#4380)
The ocsp-updater ocspStaleMaxAge config var has to be bumped up to ~7 months so that when it is run after the six-months-ago run it will actually update the ocsp responses generated during that period and mark the certificate status row as expired.

Fixes #4338.
2019-08-01 14:13:27 -07:00
Roland Bracewell Shoemaker 59ef95230d integration: Fix typo in test/helpers.py (#4369) 2019-07-26 14:02:52 -04:00
Jacob Hoffman-Andrews 3af49a16be
Revert "integration: move to Python3 (#4313)" (#4323)
This reverts commit 796a7aa2f4.

People's tests have been breaking on `docker-compose up` with the following output:

```
ImportError: No module named requests
```

Fixes #4322
2019-07-03 11:35:45 -07:00
Jacob Hoffman-Andrews 796a7aa2f4 integration: move to Python3 (#4313)
* integration: move to Python3

- Add parentheses to all print and raise calls.
- Python3 distinguishes bytes from strings. Add encode() and
  decode() calls as needed to provide the correct type.
- Use requests library consistently (urllib3 is not in Python3).
- Remove shebang from Python files without a main, and update
  shebang for integration-test.py.
2019-07-02 09:28:49 -04:00
Roland Bracewell Shoemaker 24f150f8fc Re-apply #4279 with requests fix (#4286)
Move from using `requests` to `urllib2` in `helpers.py`. Verified
this works with `docker-compose up`. In the future we really should
be installing our own python dependencies in the boulder-tools image
rather than relying on getting them by using the certbot virtualenv.
2019-06-24 11:58:37 -07:00
Adrien Ferrand 8e31d58113 Revert "tests: Switch to instant OCSP verification in int. tests (#4279)" (#4285)
This reverts commit f4b9235acb.

Fixes #4284
2019-06-23 12:06:16 -07:00
Roland Bracewell Shoemaker f4b9235acb tests: Switch to instant OCSP verification in int. tests (#4279)
* Switch to instant OCSP verification in integration tests
* Move waitport to helpers and use it to determine if ocsp-responder is
  alive in test_single_ocsp
2019-06-21 09:53:01 -04:00
Roland Bracewell Shoemaker 0a16b5f57d Reduce akamai purger interval in integration tests (#4277)
and reduce the verify_akamai_purge deadline/sleep to match the much smaller interval.
2019-06-20 16:31:44 -04:00
Roland Bracewell Shoemaker acc44498d1 RA: Make RevokeAtRA feature standard behavior (#4268)
Now that it is live in production and is working as intended we can remove
the old ocsp-updater functionality entirely.

Fixes #4048.
2019-06-20 14:32:53 -04:00
Jacob Hoffman-Andrews 18a3c78d6f Refactor test_caa and twenty-days-ago setup (#4261)
As part of #4241, I need to introduce some twenty-days-ago setup. So I refactored the
only current instance (test_caa) to use a style where setup functions can be registered right
next to the test cases they affect. The @register_twenty_days_ago is Python for
"call register_twenty_days_ago with the thing on the next line as an argument."

I also cleaned up a bunch of related stuff:
* Removed the ACCOUNT_URI environment variable and associated function params.
This was introduced in in #3736 to pass a URI to challtestsrv before we refactored for
more dynamic updates. It's not used any more.
* Removed a try / except from startChallSrv that needlessly hid errors.
* Move setting of DNS fixtures for caa_test into the test case itself.
2019-06-18 14:58:06 -07:00
Roland Bracewell Shoemaker 098a761c02 ocsp-updater: Remove integrated akamai purger (#4258)
This is now an external service.

Also bumps up the deadline in the integration test helper which checks for
purging because using the remote service from the ocsp-updater takes a little
longer. Once we remove ocsp-updater revocation support that can probably be
cranked back down to a more reasonable timeframe.
2019-06-12 09:36:53 -04:00
Jacob Hoffman-Andrews 8f578f3a93
Improve integration tests (#4143)
- Move fakeclock, get_future_output, and random_domain to helpers.py.
- Remove tempdir handling from integration-test.py since it's already
  done in helpers.py
- Consolidate handling of config dir into helpers.py, and add
  CONFIG_NEXT boolean.
- Move RevokeAtRA config gating into verify_revocation to reduce
  redundancy.
- Skip load-balancing test when filter is enabled.
- Ungate test_sct_embedding
- Rework test_ct_submissions, which was out of date. In particular, have a couple of
  logs where submitFinalCert: false, and make ct-test-srv store submission counts
  by hostnames for better test case isolation.
2019-04-04 10:59:38 -07:00
Roland Bracewell Shoemaker 3e54cea295 Implement direct revocation at RA (#4043)
Implements a feature that enables immediate revocation instead of marking a certificate revoked and waiting for the OCSP-Updater to generate the OCSP response. This means that as soon as the request returns from the WFE the revoked OCSP response should be available to the user. This feature requires that the RA be configured to use the standalone Akamai purger service.

Fixes #4031.
2019-02-14 14:47:42 -05:00
Roland Bracewell Shoemaker 046955e99c Add a standalone akamai purger service (#4040)
Fixes #4030.
2019-02-05 09:00:31 -08:00
Roland Bracewell Shoemaker cdef80ce67
Remove Akamai CCU v2 support (#3994)
Fixes #3991.
2019-01-08 12:28:11 -08:00
Roland Bracewell Shoemaker 6a47decc33 Deflake akamai purger integration testing (#3961)
The problem here was that we were doing revocation tests in the
v2 integration file that didn't block on getting the revoked OCSP
status. This meant that if the OCSP responder was running slow it
could execute a revoked cert tick between reseting the akamai test
server in the next test and sending another purge request which would
mean we saw two purge requests when we expected to see one.

The fix was to add the blocking and purge checking/reseting to the
v2 tests. Doing this without duplicating a bunch of code required
factoring a number of functions out into a third helpers file (I
think more code could be abstracted out to this file but just wanted
to start with what was needed for this change.)
2018-11-30 14:17:23 -08:00