When we query DNS for a host, and both the A and AAAA lookups fail or
are empty, combine both errors into a single error rather than only
returning the error from the A lookup.
Fixes#5819Fixes#5319
Overhaul the revocation integration tests to comprehensively test
every combination of:
- revoking a cert vs a precert
- revoking via the cert key, the subscriber key, or a separate account
that has validation for all of the names in the cert
- revoking for reason Unspecified vs for reason KeyCompromise
Also update a number of the python tests to verify that they cannot
revoke for reason keyCompromise, but can and do revoke with other
reasons.
This allows repeated runs using the same hiearchy, and avoids spurious
errors from ocsp-updater saying "This CA doesn't have an issuer cert
with ID XXX"
Fixes#5721
Add a new rate limit, identical in implementation to the current
`CertificatesPerFQDNSet` limit, intended to always have both a lower
window and a lower threshold. This allows us to block runaway clients
quickly, and give their owners the ability to fix and try again quickly
(on the order of hours instead of days).
Configure the integration tests to set this new limit at 2 certs per 2
hours. Also increase the existing limit from 5 to 6 certs in 7 days, to
allow clients to hit the first limit three times before being fully
blocked for the week. Also add a new integration test to verify this
behavior.
Note that the new ratelimit must have a window greater than the
configured certificate backdate (currently 1 hour) in order to be
useful.
Fixes#5210
We'd like to issue certs with no CN eventually, but it's not
going to happen any time soon. In the mean time, the existing
code never gets exercised and is rather complex, so this
removes it.
The `KeyPolicy.GoodKey` method is used to validate both public keys
used to sign JWK messages, and public keys contained inside CSR
messages.
According to RFC8555 section 6.7, validation failure in the former
case should result in `badPublicKey`, while validation failure in
the latter case should result in `badCSR`. In either case, a failure
due to reasons other than the key itself should result in
`serverInternal`.
However, the GoodKey method returns a variety of different errors
which are not all applicable depending on the context in which it is
called. In addition, the `csr.VerifyCSR` method passes these errors
through verbatim, resulting in ACME clients receiving confusing and
incorrect error message types.
This change causes the GoodKey method to always return either a
generic error or a KeyError. Calling methods should treat a `KeyError`
as either a `badPublicKey` or a `badCSR` depending on their context,
and may treat a generic error however they choose (though likely as a
serverInternal error).
Fixes#4930
This ended up taking a lot more work than I expected. In order to make the implementation more robust a bunch of stuff we previously relied on has been ripped out in order to reduce unnecessary complexity (I think I insisted on a bunch of this in the first place, so glad I can kill it now).
In particular this change:
* Removes bhsm and pkcs11-proxy: softhsm and pkcs11-proxy don't play well together, and any softhsm manipulation would need to happen on bhsm, then require a restart of pkcs11-proxy to pull in the on-disk changes. This makes manipulating softhsm from the boulder container extremely difficult, and because of the need to initialize new on each run (described below) we need direct access to the softhsm2 tools since pkcs11-tool cannot do slot initialization operations over the wire. I originally argued for bhsm as a way to mimic a network attached HSM, mainly so that we could do network level fault testing. In reality we've never actually done this, and the extra complexity is not really realistic for a handful of reasons. It seems better to just rip it out and operate directly on a local softhsm instance (the other option would be to use pkcs11-proxy locally, but this still would require manually restarting the proxy whenever softhsm2-util was used, and wouldn't really offer any realistic benefit).
* Initializes the softhsm slots on each integration test run, rather than when creating the docker image (this is necessary to prevent churn in test/cert-ceremonies/generate.go, which would need to be updated to reflect the new slot IDs each time a new boulder-tools image was created since slot IDs are randomly generated)
* Installs softhsm from source so that we can use a more up to date version (2.5.0 vs. 2.2.0 which is in the debian repo)
* Generates the root and intermediate private keys in softhsm and writes out the root and intermediate public keys to /tmp for use in integration tests (the existing test-{ca,root} certs are kept in test/ because they are used in a whole bunch of unit tests. At some point these should probably be renamed/moved to be more representative of what they are used for, but that is left for a follow-up in order to keep the churn in this PR as related to the ceremony work as possible)
Another follow-up item here is that we should really be zeroing out the database at the start of each integration test run, since certain things like certificates and ocsp responses will be signed by a key/issuer that is no longer is use/doesn't match the current key/issuer.
Fixes#4832.
As of this change, each test case in v1_integration.py has an equivalent
in v2_integration.py. This mostly involved copying the test cases and
tweaking them to use chisel2.py. I had to add support for updating email
addresses in chisel2.py (copied from chisel.py) in order to support one
of the test cases.
The VA was not yet configured to recognize account paths that start
with the ACMEv2 path, so I added that configuration.
The most useful way to see what's changed in porting the test cases
is to check out this branch and then do a diff between v1_integration.py
and v2_integration.py.
In Python3, the output of subprocess.check_output is of type bytes.
That means calling print() on the output will print \n instead of an
actual newline. This PR adds decoding to the output of mysql in the slow
query test, bringing it into line with other check_output calls.
This also removes a redundant "def run" that is shadowed by the
definition in helpers.py (and was also missing a decode() call).
Python 2 is over in 1 month 4 days: https://pythonclock.org/
This rolls forward most of the changes in #4313.
The original change was rolled back in #4323 because it
broke `docker-compose up`. This change fixes those original issues by
(a) making sure `requests` is installed and (b) sourcing a virtualenv
containing the `requests` module before running start.py.
Other notable changes in this:
- Certbot has changed the developer instructions to install specific packages
rather than rely on `letsencrypt-auto --os-packages-only`, so we follow suit.
- Python3 now has a `bytes` type that is used in some places that used to
provide `str`, and all `str` are now Unicode. That means going from `bytes` to
`str` and back requires explicit `.decode()` and `.encode()`.
- Moved from urllib2 to requests in many places.
This brings OCSP signing into alignment with the other components of the
CA in that they use ca.clk, which can be mocked out in unittests.
This tweaks test_ocsp_exp_unauth to be compatible with the change.
Fixes#4441.
LE is popular and aims to popularise certificate issuance. End users who see
error messages cannot be assumed to be as DNS-experienced as previously. The
user-facing error messages in the policy authority file are terse and unobvious
to the point that they are often unlikely to be well understood by those they
are intended to inform, who may be "just trying to get a LE cert for their
domain".
This also adds the badCSR error type specified by RFC 8555. It is a natural fit for the errors in VerifyCSR that aren't covered by badPublicKey. The web package function for converting a berror to
a problem is updated for the new badCSR error type.
The callers (RA and CA) are updated to return the berrors from VerifyCSR as is instead of unconditionally wrapping them as a berrors.MalformedError instance. Unit/integration tests are updated accordingly.
Resolves#4418
In order to move multi perspective validation forward we need to support policy
in Boulder configuration that can relax multi-va requirements temporarily.
A similar mechanism was used in support of the gradual deprecation of the
TLS-SNI-01 challenge type and with the introduction of CAA enforcement and has
shown to be a helpful tool to have available when introducing changes that are
expected to break sites.
When the VA "multiVAPolicyFile" is specified it is assumed to be a YAML file
containing two lists:
1. disabledNames - a list of domain names that are exempt from multi VA
enforcement.
2. disabledAccounts - a list of account IDs that are exempt from multi VA
enforcement.
When a hostname or account ID is added to the policy we'll begin communication
with the related ACME account contact to establish that this is a temporary
measure and the root problem will need to be addressed before an eventual
cut-off date.
Resolves https://github.com/letsencrypt/boulder/issues/4455
We occasionally have reason to block public keys from being used in CSRs
or for JWKs. This work adds support for loading a YAML blocked keys list
to the WFE, the RA and the CA (all the components already using the
`goodekey` package).
The list is loaded in-memory and is intended to be used sparingly and
not for more complicated mass blocking scenarios. This augments the
existing debian weak key checking which is specific to RSA keys and
operates on a truncated hash of the key modulus. In comparison the
admin. blocked keys are identified by the Base64 encoding of a SHA256
hash over the DER encoding of the public key expressed as a PKIX subject
public key. For ECDSA keys in particular we believe a more thorough
solution would have to consider inverted curve points but to start we're
calling this approach "Good Enough".
A utility program (`block-a-key`) is provided that can read a PEM
formatted x509 certificate or a JSON formatted JWK and emit lines to be
added to the blocked keys YAML to block the related public key.
A test blocked keys YAML file is included
(`test/example-blocked-keys.yml`), initially populated with a few of the
keys from the `test/` directory. We may want to do a more through pass
through Boulder's source code and add a block entry for every test
private key.
Resolves https://github.com/letsencrypt/boulder/issues/4404
This also removes some awkward dancing we did in integration_test.py to
run setup_twenty_days_ago under the opposite config of whatever we were
about to run tests under.
Reverts most of #4288 and #4290.
This PR changes the VA to return `dns` problem type for errors when performing
HTTP-01 challenges for domains that have no IP addresses, or errors looking up
the IP addresses.
The `va.getAddrs` function is internal to the VA and can return
`berrors.BoulderError`s with a DNS type when there is an error, allowing the
calling code to convert this to a problem when required
using an updated `detailedError` function. This avoids some clunky conversion
the HTTP-01 code was doing that misrepresented DNS level errors as connection
problems with a DNS detail message.
In order to add an integration test for challenge validation that results in
`getAddrs` DNS level errors the Boulder tools image had to be bumped to a tag
that includes the latest `pebble-challtestsrv` that
supports mocking SERVFAILs. It isn't possible to mock this case with internal IP
addresses because our VA test configuration does not filter internal addresses
to support the testing context.
Additionally this branch removes the `UnknownHostProblem` from the `probs`
package:
1. It isn't used anywhere after 532c210
2. It's not a real RFC 8555 problem type. We should/do use the
DNS type for this.
Resolves https://github.com/letsencrypt/boulder/issues/4407
To make this work, I changed the twenty_days_ago setup to use
`config-next` when the main test phase is running `config`. That, in
turn, made the recheck_caa test fail, so I added a tweak to that.
I also moved the authzv2 migrations into `db`. Without that change,
the integration test would fail during the twenty_days_ago setup because
Boulder would attempt to create authzv2 objects but the table wouldn't
exist yet.
The ocsp-updater ocspStaleMaxAge config var has to be bumped up to ~7 months so that when it is run after the six-months-ago run it will actually update the ocsp responses generated during that period and mark the certificate status row as expired.
Fixes#4338.
For authzv1, this actually executes a SQL DELETE for the unused challenges
when an authorization is updated upon validation.
For authzv2, this doesn't perform a delete, but changes the authorizations that
are returned so they don't include unused challenges.
In order to test the flag for both authz storage models, I set the feature flag in
both config/ and config-next/.
Fixes#4352
I introduced test_fail_thrice as a specific regression test for #4329,
but I realized that a more general test of the failed validation limit
would have better coverage and also serve as a regression test at the
same time.
Fixes#4332.
This rolls forward #4326 after it was reverted in #4328.
Resolves https://github.com/letsencrypt/boulder/issues/4329
The older query didn't have a `LIMIT 1` so it was returning multiple results,
but gorp's `SelectOne` was okay with multiple results when the selection was
going into an `int64`. When I changed this to a `struct` in #4326, gorp started
producing errors.
For this bug to manifest, an account needs to create an order, then fail
validation, twice in a row for a given domain name, then create an order once
more for the same domain name - that third request will fail because there are
multiple orders in the orderFqdnSets table for that domain.
Note that the bug condition doesn't happen when an account does three successful
issuances in a row, because finalizing an order (that is, issuing a certificate
for it) deletes the row in orderFqdnSets. Failing an authorization does not
delete the row in orderFqdnSets. I believe this was an intentional design
decision because an authorization can participate in many orders, and those
orders can have many other authorizations, so computing the updated state of
all those orders would be expensive (remember, order state is not persisted in
the DB but is calculated dynamically based on the authorizations it contains).
This wasn't detected in integration tests because we don't have any tests that
fail validation for the same domain multiple times. I filed an issue for an
integration test that would have incidentally caught this:
https://github.com/letsencrypt/boulder/issues/4332. There's also a more specific
test case in #4331.
This reverts commit 796a7aa2f4.
People's tests have been breaking on `docker-compose up` with the following output:
```
ImportError: No module named requests
```
Fixes#4322
* integration: move to Python3
- Add parentheses to all print and raise calls.
- Python3 distinguishes bytes from strings. Add encode() and
decode() calls as needed to provide the correct type.
- Use requests library consistently (urllib3 is not in Python3).
- Remove shebang from Python files without a main, and update
shebang for integration-test.py.
Before #4275 if a CSR only contained SANs longer than the max CN limit
it would set the CN to one anyway and would cause the 'CN too long'
check to get triggered. After #4275 if all of the SANs were too long
the CN wouldn't get set and we didn't have a check for `forceCNFromSAN
&& cn == ""` which would allow empty CNs despite `forceCNFromSAN`
being set. This adds that check and a test for the corner case.
When NewAuthorizationSchema is enabled, we still want v1 authzs to be reusable in
new orders. This tests that that code is implemented correctly.
Updates #4241
Move from using `requests` to `urllib2` in `helpers.py`. Verified
this works with `docker-compose up`. In the future we really should
be installing our own python dependencies in the boulder-tools image
rather than relying on getting them by using the certbot virtualenv.
* Switch to instant OCSP verification in integration tests
* Move waitport to helpers and use it to determine if ocsp-responder is
alive in test_single_ocsp
In November 2019 we will be removing support for legacy pre RFC-8555
unauthenticated GET requests for accessing ACME resources. A new
`MandatoryPOSTAsGET` feature flag is added to the WFE2 to allow
enforcing this change. Once this feature flag has been activated in Nov
we can remove both it and the WFE2 code supporting GET requests.
In practice it seems the only way to add a specific error for when an initial HTTP-01 challenge request is made to an HTTP/2 server mis-configured on `:80` is with a regex on the error string.
The error returned from the stdlib `http.Client` for HTTP to an HTTP/2 server is just an `errors.ErrorString` instance without any context (once you peel it out of the wrapping `url.Error`):
> Err:(*errors.errorString)(0xc420609bf0)}] errStr=[Get http://example.com/.well-known/acme-challenge/xxxxxxx: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x00\x00\x12\x04\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x80\x00\x04\x00\x01\x00\x00\x00\x05\x00\xff\xff\xff\x00\x00\x04\b\x00\x00\x00\x00\x00\u007f\xff\x00\x00\x00\x00\b\a\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01"]
Even directly in the stdlib code at the place in `http/response.go` that generates the error it's using a `&badStringError{}` and just putting the byte string that `textproto` read into it.
To detect this case in `detailedError` I added a pre-compiled regex that will match the net/http malformed HTTP response error for raw bytes matching an arbitrarily sized HTTP/2 SETTINGS frame. Per RFC "A SETTINGS frame MUST be sent by both endpoints at the start of a connection" and so this seems like a fairly reliable indicator of an unexpected HTTP/2 response in an HTTP/1.1 context.
Thanks to @mnordhoff for the detailed notes (and RFC refs) in #3416 It made this a lot easier!
Resolves#3416.
Often folks will mis-configure their webserver to send an HTTP redirect missing
a `/' between the FQDN and the path.
E.g. in Apache using:
Redirect / https://bad-redirect.org
Instead of:
Redirect / https://bad-redirect.org/
Will produce an invalid HTTP-01 redirect target like:
https://bad-redirect.org.well-known/acme-challenge/xxxx
This happens frequently enough we want to return a distinct error message
for this case by detecting the redirect targets ending in ".well-known".
After the "Simple HTTP-01" code landed this case was previously getting an error
message of the form:
> "Invalid hostname in redirect target, must end in IANA registered TLD"
Resolves https://github.com/letsencrypt/boulder/issues/3606
Rather than having the `BouncerHTTPRequestHandler` bounce/redirect
requests based on how many have arrived it is more reliable to follow
the way we did this in unit tests and decide based on the request
UA. By setting a distinct UA per VA it's possible to more precisely
define how many requests from each VA should be redirected to the
real key authorization.
Using the count was flaky because occasionally the remote VA
requests would arrive before the primary VA expending the VIP
request quota and bouncing the primary VA validation request. The
`test_http_multiva_threshold_pass` unit test expectation was that
since 2 of the 3 requests would pass that the test would pass but
since the primary VA is treated differently (it must always see a
valid result for the remote VA results to be considered) the test
would fail when the requests were scheduled this way.
Something particular to CI seems to result in the primary VA
being scheduled after the remote VA goroutines more often
than on faster machines/local testing. I had very little
luck reproducing locally but in 10 CI builds of master
with the count-based `BouncerHTTPRequestHandler` tests I saw
`test_http_multiva_threshold_pass` fail 2 builds. After switching to
this branch's UA-based `BouncerHTTPRequestHandler` 10/10 builds passed.
Resolves https://github.com/letsencrypt/boulder/issues/4147
- Move fakeclock, get_future_output, and random_domain to helpers.py.
- Remove tempdir handling from integration-test.py since it's already
done in helpers.py
- Consolidate handling of config dir into helpers.py, and add
CONFIG_NEXT boolean.
- Move RevokeAtRA config gating into verify_revocation to reduce
redundancy.
- Skip load-balancing test when filter is enabled.
- Ungate test_sct_embedding
- Rework test_ct_submissions, which was out of date. In particular, have a couple of
logs where submitFinalCert: false, and make ct-test-srv store submission counts
by hostnames for better test case isolation.
* `EnforceMultiVA` to allow configuring multiple VAs but not changing the primary VA's result based on what the remote VAs return.
* `MultiVAFullResults` to allow collecting all of the remote VA results. When all results are collected a JSON log line with the differential between the primary/remote VAs is logged.
Resolves https://github.com/letsencrypt/boulder/issues/4066