This change adds support in ct-test-srv for rejecting precertificates by
hostname, in order to artificially generate a condition where a
precertificate is issued but no final certificate can be issued. Right
now the final check in the test is temporarily disabled until the
feature is fixed.
Also, as our first Go-based integration test, this pulls in the
eggsampler/acme Go client, and adds some support in integration-test.py.
This also refactors ct-test-srv slightly to use a ServeMux, and fixes
a couple of cases of not returning immediately on error.
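As a rough sketch of the shape of this change (handler names, the endpoint path, and the port are illustrative, not necessarily ct-test-srv's actual ones):

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"sync"
)

type ctSrv struct {
	mu          sync.Mutex
	rejectHosts map[string]bool
}

// addRejectHost records a hostname whose precertificates should be refused.
func (s *ctSrv) addRejectHost(w http.ResponseWriter, r *http.Request) {
	var body struct{ Host string }
	if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return // return immediately on error
	}
	s.mu.Lock()
	s.rejectHosts[body.Host] = true
	s.mu.Unlock()
}

func (s *ctSrv) addPreChain(w http.ResponseWriter, r *http.Request) {
	// Parse the submitted precertificate chain, then refuse the
	// submission (HTTP 400) if any of its hostnames is in s.rejectHosts.
}

func main() {
	srv := &ctSrv{rejectHosts: map[string]bool{}}
	mux := http.NewServeMux()
	mux.HandleFunc("/add-reject-host", srv.addRejectHost)
	mux.HandleFunc("/ct/v1/add-pre-chain", srv.addPreChain)
	log.Fatal(http.ListenAndServe(":4500", mux))
}
```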
This also removes some awkward dancing we did in integration_test.py to
run setup_twenty_days_ago under the opposite config of whatever we were
about to run tests under.
Reverts most of #4288 and #4290.
This PR changes the VA to return the `dns` problem type for errors when performing
HTTP-01 challenges for domains that have no IP addresses, or when there are errors
looking up the IP addresses.
The `va.getAddrs` function is internal to the VA and can return
`berrors.BoulderError`s with a DNS type when there is an error, allowing the
calling code to convert this to a problem when required
using an updated `detailedError` function. This avoids some clunky conversion
the HTTP-01 code was doing, which misrepresented DNS-level errors as connection
problems with a DNS detail message.
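A minimal sketch of that conversion, assuming simplified plumbing (the real `detailedError` handles more error classes than shown here):

```go
package va

import (
	"errors"

	berrors "github.com/letsencrypt/boulder/errors"
	"github.com/letsencrypt/boulder/probs"
)

// detailedError (sketched) turns a DNS-typed BoulderError from getAddrs
// into a "dns" problem instead of misreporting it as a connection issue.
func detailedError(err error) *probs.ProblemDetails {
	var berr *berrors.BoulderError
	if errors.As(err, &berr) && berr.Type == berrors.DNS {
		return probs.DNS(berr.Error())
	}
	return probs.ConnectionFailure(err.Error())
}
```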
In order to add an integration test for challenge validation that results in
`getAddrs` DNS-level errors, the Boulder tools image had to be bumped to a tag
that includes the latest `pebble-challtestsrv`, which
supports mocking SERVFAILs. It isn't possible to mock this case with internal IP
addresses, because our VA test configuration does not filter internal addresses
(in order to support the testing context).
Additionally this branch removes the `UnknownHostProblem` from the `probs`
package:
1. It isn't used anywhere after 532c210.
2. It's not a real RFC 8555 problem type. We should/do use the
DNS type for this.
Resolves https://github.com/letsencrypt/boulder/issues/4407
To make this work, I changed the twenty_days_ago setup to use
`config-next` when the main test phase is running `config`. That, in
turn, made the recheck_caa test fail, so I added a tweak to that.
I also moved the authzv2 migrations into `db`. Without that change,
the integration test would fail during the twenty_days_ago setup because
Boulder would attempt to create authzv2 objects but the table wouldn't
exist yet.
The ocsp-updater `ocspStaleMaxAge` config var has to be bumped up to ~7 months so that, when it runs after the six-months-ago run, it will actually update the OCSP responses generated during that period and mark the certificate status rows as expired.
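A toy illustration of the window arithmetic (variable names are illustrative, not the updater's actual code):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// ocspStaleMaxAge bounds how far back ocsp-updater will look for
	// responses to refresh. Six-month-old responses only fall inside
	// the window once the bound is stretched to ~7 months.
	ocspStaleMaxAge := 7 * 30 * 24 * time.Hour
	oldestEligible := time.Now().Add(-ocspStaleMaxAge)
	generatedAt := time.Now().Add(-6 * 30 * 24 * time.Hour) // six-months-ago run
	fmt.Println(generatedAt.After(oldestEligible))          // true: will be refreshed
}
```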
Fixes #4338.
* deps: update github.com/zmap/zlint to latest.
Update the `github.com/zmap/zlint` dependency to b126a9b. This captures
a small fix to the `ct_sct_policy_count_unsatisfied` lint that ensures
it isn't run for precertificates.
* config: remove ct_sct_policy_count_unsatisfied from ignored_lints.
With the latest `zlint` the `ct_sct_policy_count_unsatisfied` lint won't
flag precertificates as having an info-level lint result for missing
SCTs. With that fix in place we no longer have to ignore this lint in
the config-next CA configs that enable preissuance linting.
The `test/config-next` CA configs are both updated to use `zlint` to lint TBS
pre-certificates with a throw-away key and treat any lint findings >=
`lints.Pass` as an error, blocking the CA from signing the TBS pre-cert with its
private key.
The CA `issuePrecertificateInner` function is updated to specifically catch
linting-related errors from CFSSL and marshal the linting findings to the audit
log. A small unit test for this change is included.
The CA `IssueCertificateForPrecertificate` function remains unchanged: the CFSSL
interface that defines `SignFromPrecert` doesn't facilitate linting. We still
lint final certificates post-issuance with `cert-checker` and accept the
possibility there may be some compliance issues that could occur between the
precertificate passing linting and the final certificate being signed.
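A rough sketch of the linting step, assuming the zlint API of this era (`zlint.LintCertificate` over a zcrypto `x509.Certificate`); `lintTBS` and its parameters are illustrative rather than the actual CFSSL integration:

```go
package ca

import (
	"fmt"

	"github.com/zmap/zcrypto/x509"
	"github.com/zmap/zlint"
	"github.com/zmap/zlint/lints"
)

// lintTBS parses a throw-away signed copy of the TBS precertificate,
// runs zlint over it, and treats any result at or above the configured
// threshold that isn't in the ignore list as an error.
func lintTBS(der []byte, errLevel lints.LintStatus, ignored map[string]bool) error {
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		return err
	}
	for name, res := range zlint.LintCertificate(cert).Results {
		if ignored[name] || res.Status < errLevel {
			continue
		}
		return fmt.Errorf("pre-issuance lint %q failed with status %d", name, res.Status)
	}
	return nil
}
```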
Resolves https://github.com/letsencrypt/boulder/issues/4255
For authzv1, this actually executes a SQL DELETE for the unused challenges
when an authorization is updated upon validation.
For authzv2, this doesn't perform a delete, but changes the authorizations that
are returned so they don't include unused challenges.
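A minimal sketch of the authzv2 "filter on read" behavior, using made-up types rather than Boulder's actual models:

```go
package main

import "fmt"

type Challenge struct {
	Type      string
	Attempted bool
}

type Authorization struct {
	ID         string
	Challenges []Challenge
}

// stripUnused keeps only challenges that were actually attempted,
// mirroring the authzv2 approach (authzv1 instead DELETEs the rows).
func stripUnused(authz Authorization) Authorization {
	kept := authz.Challenges[:0:0]
	for _, chal := range authz.Challenges {
		if chal.Attempted {
			kept = append(kept, chal)
		}
	}
	authz.Challenges = kept
	return authz
}

func main() {
	a := Authorization{ID: "1", Challenges: []Challenge{
		{Type: "http-01", Attempted: true},
		{Type: "dns-01"},
	}}
	fmt.Println(stripUnused(a).Challenges) // [{http-01 true}]
}
```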
In order to test the flag for both authz storage models, I set the feature flag in
both config/ and config-next/.
Fixes #4352
A new `boulder-janitor` command is added. It provides a long-running
daemon that cleans up rows associated with expired certificate
resources. At present this means rows from the following tables:
* certificates
* certificateStatus
* certificatesPerName
Adding cleanup of tables associated with Order resources is the next step.
Three Prometheus stats are exported:
* `janitor_deletions` - a CounterVec counting the deletions boulder-janitor
has performed, labeled by table.
* `janitor_workbatch` - a GaugeVec tracking the number of items of work
boulder-janitor has queued for deletion, labeled by table.
* `janitor_errors` - a CounterVec counting the errors boulder-janitor has
experienced, labeled by table and error type.
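A sketch of how these metrics might be defined with the standard Prometheus Go client (label names are inferred from the descriptions above, not taken from the janitor's code):

```go
package janitor

import "github.com/prometheus/client_golang/prometheus"

var (
	// Deletions performed, partitioned by table.
	deletions = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "janitor_deletions",
		Help: "Deletions performed by boulder-janitor, by table.",
	}, []string{"table"})
	// Current batch of work queued for deletion, partitioned by table.
	workBatch = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "janitor_workbatch",
		Help: "Items of work queued for deletion, by table.",
	}, []string{"table"})
	// Errors encountered, partitioned by table and error type.
	errorStats = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "janitor_errors",
		Help: "Errors experienced, by table and error type.",
	}, []string{"table", "type"})
)

func init() {
	prometheus.MustRegister(deletions, workBatch, errorStats)
}
```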
The `SCTReceipts` database table and associated model linkages are
legacy cruft from before Boulder implemented SCT embedding. We can
safely remove all of this stuff.
I noticed this stale bit of text in `test/sa_db_users.sql` while working on something unrelated.
The CA component does not have its own DB; it relies on the SA and its DB, the same as
other components do. Consulting a historian tells me that this was once true in a lost age :-)
The explicit license header comment isn't required. We don't follow this practice in any of
the other files and assume the base repo LICENSE file is sufficient.
Similarly, we don't need to drop each user and recreate them; we use
`CREATE USER IF NOT EXISTS` from MariaDB 10.1+. We also don't need the related
`drop_users.sql` script.
I introduced test_fail_thrice as a specific regression test for #4329,
but I realized that a more general test of the failed validation limit
would have better coverage and also serve as a regression test at the
same time.
Fixes #4332.
This rolls forward #4326 after it was reverted in #4328.
Resolves https://github.com/letsencrypt/boulder/issues/4329
The older query didn't have a `LIMIT 1` so it was returning multiple results,
but gorp's `SelectOne` was okay with multiple results when the selection was
going into an `int64`. When I changed this to a `struct` in #4326, gorp started
producing errors.
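An illustrative reconstruction of the two query shapes (table, column, and helper names are made up from the description above, and the gorp v2 import path is assumed):

```go
package sa

import "gopkg.in/go-gorp/gorp.v2"

// Selecting into a primitive: gorp scans a single value even when the
// query matches several rows (the pre-#4326 behavior).
func orderIDPrimitive(db gorp.SqlExecutor, setHash []byte) (int64, error) {
	var id int64
	err := db.SelectOne(&id,
		`SELECT orderID FROM orderFqdnSets WHERE setHash = ?`, setHash)
	return id, err
}

// Selecting into a struct: gorp now notices the extra rows and fails
// with "gorp: multiple rows returned for: ..." (the post-#4326 behavior).
func orderIDStruct(db gorp.SqlExecutor, setHash []byte) (int64, error) {
	var row struct {
		OrderID int64
	}
	err := db.SelectOne(&row,
		`SELECT orderID FROM orderFqdnSets WHERE setHash = ?`, setHash)
	return row.OrderID, err
}
```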
For this bug to manifest, an account needs to create an order, then fail
validation, twice in a row for a given domain name, then create an order once
more for the same domain name - that third request will fail because there are
multiple orders in the orderFqdnSets table for that domain.
Note that the bug condition doesn't happen when an account does three successful
issuances in a row, because finalizing an order (that is, issuing a certificate
for it) deletes the row in orderFqdnSets. Failing an authorization does not
delete the row in orderFqdnSets. I believe this was an intentional design
decision because an authorization can participate in many orders, and those
orders can have many other authorizations, so computing the updated state of
all those orders would be expensive (remember, order state is not persisted in
the DB but is calculated dynamically based on the authorizations it contains).
This wasn't detected in integration tests because we don't have any tests that
fail validation for the same domain multiple times. I filed an issue for an
integration test that would have incidentally caught this:
https://github.com/letsencrypt/boulder/issues/4332. There's also a more specific
test case in #4331.
This reverts commit 9fa360769e.
That commit can cause "gorp: multiple rows returned for: ..." errors under certain situations.
See #4329 for details of the follow-up.
When there are a lot of potential orders to reuse, the query could scan
unnecessary rows, sometimes leading to timeouts. The new query used
when the FasterGetOrderForNames feature flag is enabled uses the
available index more effectively and adds a LIMIT clause.
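A sketch of the shape of the new query (not Boulder's exact SQL; the ORDER BY column and the gorp v2 import path are assumptions):

```go
package sa

import "gopkg.in/go-gorp/gorp.v2"

// getOrderForNames (sketched): ordering on an indexed column and
// stopping at the first match keeps the scan from walking every
// candidate order for a heavily-reused name set.
func getOrderForNames(db gorp.SqlExecutor, setHash []byte) (int64, error) {
	var orderID int64
	err := db.SelectOne(&orderID,
		`SELECT orderID FROM orderFqdnSets
		 WHERE setHash = ?
		 ORDER BY expires ASC
		 LIMIT 1`,
		setHash)
	return orderID, err
}
```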
This reverts commit 796a7aa2f4.
People's tests have been breaking on `docker-compose up` with the following output:
```
ImportError: No module named requests
```
Fixes #4322
* integration: move to Python3
- Add parentheses to all print and raise calls.
- Python3 distinguishes bytes from strings. Add encode() and
decode() calls as needed to provide the correct type.
- Use requests library consistently (urllib2 does not exist in Python 3).
- Remove shebang from Python files without a main, and update
shebang for integration-test.py.
Basically a complete re-write/re-design of the forwarding concept introduced in
#4297 (sorry for the rapid churn here). Instead of nonce-services blindly
forwarding nonces around to each other in an attempt to find out who issued the
nonce, we add an identifying prefix to each nonce generated by a service. The
WFEs then use this prefix to decide which nonce-service to ask to validate the
nonce.
This requires a slightly more complicated configuration at the WFE/2 end, but
overall I think it ends up being a way cleaner, more understandable,
easier-to-reason-about implementation. When configuring the WFE you need to
provide two forms of gRPC config (a routing sketch follows below):
* one gRPC config for retrieving nonces; this should be a DNS name that
resolves to all available nonce-services (or at least the ones you want to
retrieve nonces from locally; in a two-DC setup you might only configure the
nonce-services that are in the same DC as the WFE instance). This allows
getting a nonce from any of the configured services and is load-balanced
transparently at the gRPC layer.
* a map of nonce prefixes to gRPC configs; this maps each individual
nonce-service to its prefix and allows the WFE instances to figure out which
nonce-service to ask to validate a nonce they have received (in a two-DC setup
you'd want to configure this with all the nonce-services across both DCs so
that you can validate a nonce that was generated by a nonce-service in another
DC).
This balancing is implemented in the integration tests.
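Here is the promised routing sketch, with illustrative types and prefix length (the real WFE holds gRPC nonce-service clients rather than a local interface):

```go
package wfe2

import "fmt"

// nonceValidator stands in for a gRPC client to a single nonce-service.
type nonceValidator interface {
	Valid(nonce string) bool
}

const prefixLen = 4 // illustrative; each nonce-service prepends its prefix

// redeemNonce routes a nonce to the service that minted it, keyed by the
// nonce's prefix, instead of blindly forwarding between services.
func redeemNonce(nonce string, byPrefix map[string]nonceValidator) (bool, error) {
	if len(nonce) < prefixLen {
		return false, fmt.Errorf("nonce too short to contain a prefix")
	}
	svc, ok := byPrefix[nonce[:prefixLen]]
	if !ok {
		return false, fmt.Errorf("no nonce-service configured for prefix %q", nonce[:prefixLen])
	}
	return svc.Valid(nonce), nil
}
```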
Given that the current remote nonce code hasn't been deployed anywhere yet,
this makes a number of hard breaking changes to both the existing
nonce-service code and the forwarding code.
Fixes#4303.
Before #4275, if a CSR contained only SANs longer than the max CN limit,
we would set the CN to one of them anyway, which triggered the 'CN too long'
check. After #4275, if all of the SANs were too long
the CN wouldn't get set, and we had no check for `forceCNFromSAN
&& cn == ""`, which would allow empty CNs despite `forceCNFromSAN`
being set. This adds that check and a test for the corner case.
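A minimal sketch of the added check (function and constant names are illustrative, not Boulder's CSR code):

```go
package csr

import "errors"

// maxCNLength is the X.509 ub-common-name upper bound.
const maxCNLength = 64

// checkCN rejects the corner case described above: if the CN must be
// populated from a SAN but every SAN was too long to use, refuse the
// CSR rather than issuing with an empty CN.
func checkCN(cn string, forceCNFromSAN bool) error {
	if forceCNFromSAN && cn == "" {
		return errors.New("CSR has no SANs short enough to become the CN")
	}
	if len(cn) > maxCNLength {
		return errors.New("CN too long")
	}
	return nil
}
```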
The new cases separately test:
- Rechecking CAA during authz reuse.
- Successful issuance for a positive CAA record.
- Rejected issuance for a negative CAA record.
- The various CAA extensions from https://tools.ietf.org/html/draft-ietf-acme-caa-06
Importantly, this also switches `recheck.good-caa-reserved.com` to use a
dynamically generated random name. This should fix the problem where
running integration tests locally several times resulted in hitting an
exact match rate limit error, requiring a clear of the fqdnSets table.
This also moves the creation of the client for test_recheck_caa into its
own early-setup function, so there is less test-case-specific setup in
integration-test.py.
When NewAuthorizationSchema is enabled, we still want v1 authzs to be reusable in
new orders. This tests that that code is implemented correctly.
Updates #4241
These two setup phases were only used by `test_expired_authz_404`,
which is adequately covered by unit tests. Since each setup and teardown
is rather time-consuming, this speeds up and simplifies the integration
tests.
Before: 5m10
After: 4m46
Move from using `requests` to `urllib2` in `helpers.py`. Verified
this works with `docker-compose up`. In the future we really should
be installing our own python dependencies in the boulder-tools image
rather than relying on getting them by using the certbot virtualenv.
* Switch to instant OCSP verification in integration tests
* Move waitport to helpers and use it to determine if ocsp-responder is
alive in test_single_ocsp
This updates the `cert-checker` utility configuration with a new allow list of
ignored lints so we can exclude known false positives/accepted info results by
name instead of by result level. To start, only the `n_subject_common_name_included`
lint is excluded in `test/config-next/cert-checker.json`. Once this lands we can,
as a follow-up, treat info/warning lint results as errors without breaking
deployability guarantees.
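A sketch of the config shape, with assumed JSON key names (the actual keys in `cert-checker.json` may differ):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type certCheckerConfig struct {
	CertChecker struct {
		IgnoredLints []string `json:"ignoredLints"` // assumed key name
	} `json:"certChecker"`
}

func main() {
	raw := `{"certChecker": {"ignoredLints": ["n_subject_common_name_included"]}}`
	var cfg certCheckerConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}
	// Build a by-name skip set to consult before reporting a lint finding.
	ignored := make(map[string]bool)
	for _, name := range cfg.CertChecker.IgnoredLints {
		ignored[name] = true
	}
	fmt.Println(ignored)
}
```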
Resolves https://github.com/letsencrypt/boulder/issues/4271
As part of #4241, I need to introduce some twenty-days-ago setup. So I refactored the
only current instance (test_caa) to use a style where setup functions can be registered right
next to the test cases they affect. The `@register_twenty_days_ago` decorator is Python for
"call `register_twenty_days_ago` with the thing on the next line as an argument."
I also cleaned up a bunch of related stuff:
* Removed the ACCOUNT_URI environment variable and associated function params.
This was introduced in #3736 to pass a URI to challtestsrv before we refactored for
more dynamic updates. It's not used any more.
* Removed a try / except from startChallSrv that needlessly hid errors.
* Move setting of DNS fixtures for caa_test into the test case itself.
This is now an external service.
Also bumps up the deadline in the integration test helper that checks for
purging, because using the remote service from the ocsp-updater takes a little
longer. Once we remove ocsp-updater revocation support, that can probably be
cranked back down to a more reasonable timeframe.
In November 2019 we will be removing support for legacy, pre-RFC 8555
unauthenticated GET requests for accessing ACME resources. A new
`MandatoryPOSTAsGET` feature flag is added to the WFE2 to allow
enforcing this change. Once this feature flag has been activated in
November we can remove both it and the WFE2 code supporting GET requests.
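A minimal sketch of how such a flag check might look, assuming Boulder's `features` package; the actual WFE2 wiring differs:

```go
package wfe2

import (
	"net/http"

	"github.com/letsencrypt/boulder/features"
	"github.com/letsencrypt/boulder/probs"
)

// rejectLegacyGET (sketched): once MandatoryPOSTAsGET is enabled,
// unauthenticated GETs to ACME resources are refused. POST-as-GET
// requests arrive as POSTs, so they are unaffected by this check.
func rejectLegacyGET(r *http.Request) *probs.ProblemDetails {
	if features.Enabled(features.MandatoryPOSTAsGET) && r.Method == http.MethodGet {
		return probs.MethodNotAllowed()
	}
	return nil
}
```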