Enables integration tests for authz2 and fixes a few bugs that were flagged up during the process. Disables expired-authorization-purger integration tests if config-next is being used as expired-authz-purger expects to purge some stuff but doesn't know about authz2 authorizations, a new test will be added with #4188.
Fixes#4079.
Without this change running a single integration tests with
`INT_SKIP_SETUP` like so:
```
docker-compose run --use-aliases -e INT_FILTER="test_http_multiva_threshold_pass" -e INT_SKIP_SETUP=true -e RUN="integration" boulder ./test.sh;
```
Produces an error like:
```
+ python2 test/integration-test.py --chisel --load --filter test_http_multiva_threshold_pass --skip-setup
Traceback (most recent call last):
File "test/integration-test.py", line 309, in <module>
main()
File "test/integration-test.py", line 217, in main
caa_account_uri = caa_client.account.uri if caa_client is not None else None
UnboundLocalError: local variable 'caa_client' referenced before assignment
```
This makes it a little clearer which bits are test setup helpers, and which
bits are actual test cases. It may also make it a little easier to see which cases
from the v1 tests also need a v2 test case.
Fixes#4126
## CI: restore load-generator run.
This restores running the `load-generator` during CI to make sure it doesn't bitrot. It was previously removed while we debugged the VA getting jammed up and not cleanly shutting down.
Since the global `pebble-challtestsrv` and the `load-generator`'s internal chall test srv will conflict this requires moving the `load-generator` run to the end of integration tests and updating `startservers.py` to allow the load gen integration test code to stop the `pebble-challtestsrv` before starting the `load-generator`.
The `load-generator` and associated config are updated to allow specifying bind addresses for the DNS interface of the internal challtestsrv. Multiple addresses are supported so that the `load-generator`'s chall test srv can listen on port DNS ports Boulder is configured to use. The `load-generator` config now accepts a `fakeDNS` parameter that can be used to specify the default IPv4 address returned by the `load-generator`'s DNS server for A queries.
## load-generator: support different challenges/strategies.
Updates the load-generator to support HTTP-01, DNS-01, and TLS-ALPN-01 challenge response servers. A new challenge selection configuration parameter (`ChallengeStrategy`) can be set to `"http-01"`, `"dns-01"`, or `"tls-alpn-01"` to solve only challenges of that type. Using `"random"` will let the load-generator choose a challenge type randomly.
Resolves https://github.com/letsencrypt/boulder/issues/3900
- Move fakeclock, get_future_output, and random_domain to helpers.py.
- Remove tempdir handling from integration-test.py since it's already
done in helpers.py
- Consolidate handling of config dir into helpers.py, and add
CONFIG_NEXT boolean.
- Move RevokeAtRA config gating into verify_revocation to reduce
redundancy.
- Skip load-balancing test when filter is enabled.
- Ungate test_sct_embedding
- Rework test_ct_submissions, which was out of date. In particular, have a couple of
logs where submitFinalCert: false, and make ct-test-srv store submission counts
by hostnames for better test case isolation.
* `EnforceMultiVA` to allow configuring multiple VAs but not changing the primary VA's result based on what the remote VAs return.
* `MultiVAFullResults` to allow collecting all of the remote VA results. When all results are collected a JSON log line with the differential between the primary/remote VAs is logged.
Resolves https://github.com/letsencrypt/boulder/issues/4066
We don't intend to load test the legacy WFE implementation in the future
and if we need to we can always revive this code from git. Removing it
will make refactoring the ACME v2 code to be closer to RFC 8555 easier.
Previously the v2_integration tests were imported to the global
namespace in integration-test.py. As a result, some were shadowed and
didn't run, or called methods that were in the main namespace rather
than their own.
This PR imports and runs them under their own namespace. It also fixes
some tests that were broken. Notably:
- Fixes chisel2.expect_problem.
- Fixes incorrect namespacing on some expect_problem calls.
- Remove unused ValidationError from v2_integration.
- Replace client.key with client.net.key.
I tried dropping the RA->VA timeout to make the
`test_http_challenge_timeout` integration test faster. It seems to flake
in CI so I'm restoring the original 20s timeout. This makes
`test_http_challenge_timeout` slower but c'est la vie.
Implements a feature that enables immediate revocation instead of marking a certificate revoked and waiting for the OCSP-Updater to generate the OCSP response. This means that as soon as the request returns from the WFE the revoked OCSP response should be available to the user. This feature requires that the RA be configured to use the standalone Akamai purger service.
Fixes#4031.
The URL construction approach we were previously using for the refactored VA HTTP-01 validation code was nice but broke SNI for HTTP->HTTPS redirects. In order to preserve this functionality we need to use a custom `DialContext` handler on the HTTP Transport that overrides the target host to use a pre-resolved IP.
Resolves https://github.com/letsencrypt/boulder/issues/3969
Staging and prod both deployed the PerformValidationRPC feature flag. All running WFE/WFE2 instances are using the more accurately named PerformValidation RPC and we can strip out the old UpdateAuthorization bits. The feature flag for PerformValidationRPC remains until we clean up the staging/prod configs.
Resolves#3947 and completes the last of #3930
When the `SimplifiedVAHTTP01` feature flag is enabled we need to
preserve query parameters when reconstructing a redirect URL for the
resolved IP address.
To add integration testing for this condition the Boulder tools images
are updated to in turn pull in an updated `pebble-challtestsrv` command
that tracks request history.
A new Python wrapper for the `pebble-challtestsrv` HTTP API is added to
centralize interacting with the chall test srv to add mock data and to
get the history of HTTP requests that have been processed.
`pebble-challtestsrv` added a `-defaultIPv4` arg we can use to simplify
the integration tests and fix FAKE_DNS usage outside of integration
tests.
A new boulder-tools image with an updated `pebble-challtestsrv` is used
and `test/startservers.py` is changed to populate `-defaultIPv4` via the
`FAKE_DNS` env var.
Now that Pebble has a `pebble-challtestsrv` we can remove the `challtestrv`
package and associated command from Boulder. I switched CI to use
`pebble-challtestsrv`. Notably this means that we have to add our expected mock
data using the HTTP management interface. The Boulder-tools images are
regenerated to include the `pebble-challtestsrv` command.
Using this approach also allows separating the TLS-ALPN-01 and HTTPS HTTP-01
challenges by binding each challenge type in the `pebble-challtestsrv` to
different interfaces both using the same VA
HTTPS port. Mock DNS directs the VA to the correct interface.
The load-generator command that was previously using the `challtestsrv` package
from Boulder is updated to use a vendored copy of the new
`github.org/letsencrypt/challtestsrv` package.
Vendored dependencies change in two ways:
1) Gomock is updated to the latest release (matching what the Bouldertools image
provides)
2) A couple of new subpackages in `golang.org/x/net/` are added by way of
transitive dependency through the challtestsrv package.
Unit tests are confirmed to pass for `gomock`:
```
~/go/src/github.com/golang/mock/gomock$ git log --pretty=format:'%h' -n 1
51421b9
~/go/src/github.com/golang/mock/gomock$ go test ./...
ok github.com/golang/mock/gomock 0.002s
? github.com/golang/mock/gomock/internal/mock_matcher [no test files]
```
For `/x/net` all tests pass except two `/x/net/icmp` `TestDiag.go` test cases
that we have agreed are OK to ignore.
Resolves https://github.com/letsencrypt/boulder/issues/3962 and
https://github.com/letsencrypt/boulder/issues/3951
To complete https://github.com/letsencrypt/boulder/issues/3956 the `challtestsrv` is updated such that its existing TLS-ALPN-01 challenge test server will serve HTTP-01 responses with a self-signed certificate when a non-TLS-ALPN-01 request arrives. This lets the TLS-ALPN-01 challenge server double as a HTTPS version of the HTTP challenge server. The `challtestsrv` now also supports adding/remove redirects that will be served to clients when requesting matching paths.
The existing chisel/chisel2 integration tests are updated to use the `challtestsrv` instead of starting their own standalone servers. This centralizes our mock challenge responses and lets us bind the `challtestsrv` to the VA's HTTP port in `startservers.py` without clashing ports later on.
New integration tests are added for HTTP-01 redirect scenarios using the updated `challtestserv`. These test cases cover:
* valid HTTP -> HTTP redirect
* valid HTTP -> HTTPS redirect
* Invalid HTTP -> non-HTTP/HTTPS port redirect
* Invalid HTTP-> non-HTTP/HTTPS protocol scheme redirect
* Invalid HTTP-> bare IP redirect
* Invalid HTTP redirect loop
The new integration tests shook out two fixes that were required for the legacy VA HTTP-01 code (afad22b) and one fix for the challtestsrv mock DNS (59b7d6d).
Resolves https://github.com/letsencrypt/boulder/issues/3956
The problem here was that we were doing revocation tests in the
v2 integration file that didn't block on getting the revoked OCSP
status. This meant that if the OCSP responder was running slow it
could execute a revoked cert tick between reseting the akamai test
server in the next test and sending another purge request which would
mean we saw two purge requests when we expected to see one.
The fix was to add the blocking and purge checking/reseting to the
v2 tests. Doing this without duplicating a bunch of code required
factoring a number of functions out into a third helpers file (I
think more code could be abstracted out to this file but just wanted
to start with what was needed for this change.)
The existing RA `UpdateAuthorization` RPC needs replacing for
two reasons:
1. The name isn't accurate - `PerformValidation` better captures
the purpose of the RPC.
2. The `core.Challenge` argument is superfluous since Key
Authorizations are not sent in the initiation POST from the client
anymore. The corresponding unmarshal and verification is now
removed. Notably this means broken clients that were POSTing
the wrong thing and failing pre-validation will now likely fail
post-validation.
To remove `UpdateAuthorization` the new `PerformValidation`
RPC is added alongside the old one. WFE and WFE2 are
updated to use the new RPC when the perform validation
feature flag is enabled. We can remove
`UpdateAuthorization` and its associated wrappers once all
WFE instances have been updated.
Resolves https://github.com/letsencrypt/boulder/issues/3930
Removes superfluous usage of `UpdatePendingAuthorization` in the RA to update the key authorization and test if the authorization is pending and instead uses the result of the initial `GetAuthorization` call in the WFE.
Fixes#3923.
This adds support for the account-uri CAA parameter as specified by
section 3 of https://tools.ietf.org/html/draft-ietf-acme-caa-04, allowing
issuance to be restricted to one or more ACME accounts as specified by CAA
records.
We see a fair number of ACME accounts/registrations with contact
addresses for the RFC2606 Section 3 "Reserved Example Second Level
Domain Names" (`example.com`, `example.net`, `example.org`). These are
not real contact addresses and are likely the result of the user
copy-pasting example configuration. These users will miss out on
expiration emails and other subscriber communications :-(
This commit updates the RA's `validateEmail` function to reject any
contact addresses for reserved example domain names. The corresponding
unit test is updated accordingly.
Resolves https://github.com/letsencrypt/boulder/issues/3719
When performing CAA checking respect the validation-methods parameter (if
present) and restrict the allowed authorization methods to those specified.
This allows a domain to restrict authorization methods that can be used with
Let's Encrypt.
This is largely based on PR #3003 (by @lukaslihotzki), which was landed and
then later reverted due to issue #3143. The bug the resulted in the previous
code being reverted has been addressed (likely inadvertently) by 76973d0f.
This implementation also includes integration tests for CAA validation-methods.
Fixes issue #3143.
We'd like to start using the DNS load balancer in the latest version of gRPC. That means putting all IPs for a service under a single hostname (or using a SRV record, but we're not taking that path). This change adds an sd-test-srv to act as our service discovery DNS service. It returns both Boulder IP addresses for any A lookup ending in ".boulder". This change also sets up the Docker DNS for our boulder container to defer to sd-test-srv when it doesn't know an answer.
sd-test-srv doesn't know how to resolve public Internet names like `github.com`. Resolving public names is required for the `godep-restore` test phase, so this change breaks out a copy of the boulder container that is used only for `godep-restore`.
This change implements a shim of a DNS resolver for gRPC, so that we can switch to DNS-based load balancing with the currently vendored gRPC, then when we upgrade to the latest gRPC we won't need a simultaneous config update.
Also, this change introduces a check at the end of the integration test that each backend received at least one RPC, ensuring that we are not sending all load to a single backend.
The existing CAA tests only test the CAA checks on the validation path and
not the CAA rechecking in the case where an existing authorization is present
(but older than the 8 hour window).
This extends the CAA integration tests to also cover the CAA rechecking
code path, by reusing older authorizations and rejecting issuance via CAA.
The test_expired_authz_404() test is currently broken in two ways - firstly,
there is no way for it to distinguish between a 404 from an expired authz
and a 404 from a non-existant authz. Secondly, the test_expired_authz_purger()
test runs and wipes out all of the existing authorizations, including the one
that was set up from setup_seventy_days_ago(), before the expired test runs.
Avoid this by running the expired authorization purger test from later in main().
Also, add a minimal canary that will detect if all authorizations have been purged
(although this still does not guarantee that we got a 404 due to expiration).
load generator: send correct ACMEv2 Content-Type on POST.
This PR updates the Boulder load-generator to send the correct ACMEv2 Content-Type header when POSTing the ACME server. This is required for ACMEv2 and without it all POST requests to the WFE2 running a test/config-next configuration result in malformed 400 errors. While only required by ACMEv2 this commit sends it for ACMEv1 requests as well. No harm no foul.
integration tests: allow running just the load generator.
Prior to this PR an omission in an if statement in integration-test.py meant that you couldn't invoke test/integration-test.py with just the --load argument to only run the load generator. This commit updates the if to allow this use case.
Previously we updated the RA's issueCertificateInner function to prefix errors returned from the CA with meaningful information about which CA RPC caused the failure. Unfortunately by using fmt.Errorf to do this we're discarding the underlying error type. This can cause unexpected server internal errors downstream if (for e.g.) the CA rejects a CSR with a malformed error (see #3632).
This PR updates the issueCertificateInner error message prefixing to maintain the error type if the underlying error is a berrors.BoulderError. A RA unit test with several mock CAs is added to test the prefixing occurs as expected without loss of error type.
This PR also adds an integration test that ensures we reject a CSR with >100 names with a malformed error. This is not strictly related to this PR but since I wrote it while debugging the root issue I thought I'd include it. To allow this test to pass the pendingAuthorizationsPerAccount in test/rate-limit-policies.yml and associated tests had to be adjusted.
Resolves#3632
This allows these tools to easily be run in command line mode from
the host machine against a Boulder running inside docker-compose up
without modifying the FAKE_DNS field in docker-compose.yml. This
allows for easier testing of various conditions.
In publisher and in the integration test, check that SCTs are in a
reasonable range. Also, update CreateTestingSignedSCT (used by
ct-test-srv) to produce SCTs correctly with a timetamp in Unix epoch
milliseconds.
- Remove acme-v2 test phase.
- Rename integration-test-v2.py to v2_integration, so it can be imported.
- Import all symbols from v2_integration before running test_*.
- In chisel2:
- Rename DIRECTORY so it doesn't collide.
- Incidental logging and error fixes.
- Merge v1 and v2 load testing into a single function.
- Run cert-checker just once, after all other test cases.
- In v2_integration:
- Remove unnecessary imports.
- Import chisel2 methods in the chisel2 namespace so they don't
collide with chisel methods.
- Remove main and shutdown code.
Previously, each time we defined a new test case in integration-test.py, we had to explicitly call it.
This made it easy to leave out cases without realizing it. After this change, we will automatically
find all functions named "test_" and call them. As a result, I found that we weren't calling
`test_revoked_by_account`, and it was failing. So I fixed it as part of this PR.
Fixes#3518
Adds SCT embedding to the certificate issuance flow. When a issuance is requested a precertificate (the requested certificate but poisoned with the critical CT extension) is issued and submitted to the required CT logs. Once the SCTs for the precertificate have been collected a new certificate is issued with the poison extension replace with a SCT list extension containing the retrieved SCTs.
Fixes#2244, fixes#3492 and fixes#3429.
This commit adds short 15s runs of the load generator against the V1 and
V2 APIs during the three integration test runs (v1 config, v1
config-next, and v2). 15s was selected because 30s caused too much
output and the build log to be truncated.
Presently the latency output is *not* being checked for errors. This was
too flaky in practice.
A fix for a race condition in the load-generator code itself related to
HTTP status code tracking is included in this commit.
The pending authz rate limit also needed to be adjusted to keep the
load-generator from failing requests after hitting 429s.
The test for the certificates_per_name rate limit uses an exact domain
name that has an override in the rate limit config file to have a limit
of 0. This works correctly most of the time. However, if that mechanism
fails once (due to some bug), future runs of the integation test will
continue to fail, because there will now be an issued certificate for
"lim.it" in the DB, and subsequent attempts will be considered renewals.
This change adds a random subdomain to the test, so that it's not
eligible for the renewal exemption.