When the `features.PrecertificateRevocation` feature flag is enabled, the WFE2
will allow revoking certificates by submitting the corresponding precertificate.
The legacy WFE1 behaviour remains unchanged (as before, (pre)certificates issued
through the V1 API will be revocable with the V2 API).
Previously the WFE2 vetted the certificate from the revocation request by
looking up a final certificate by the serial number in the requested
certificate, and then doing a byte-for-byte comparison between the stored and
requested certificates.
Rather than adjust this logic to handle looking up and comparing stored
precertificates against requested precertificates (which would require new RPCs
and an additional round trip), we instead check the signature on the requested
certificate or precertificate and consider it valid for revocation if the
signature validates with one of the WFE2's known issuers. We trust the integrity
of our own signatures.
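The check itself needs nothing beyond the standard library. A minimal sketch, assuming the WFE2 holds its known issuer certificates in a slice; the function and variable names here are illustrative, not Boulder's actual identifiers:

```go
package wfe2

import (
	"crypto/x509"
	"errors"
)

// validForRevocation returns nil if the submitted certificate or
// precertificate carries a valid signature from one of the known issuers.
func validForRevocation(submitted *x509.Certificate, issuers []*x509.Certificate) error {
	for _, issuer := range issuers {
		// CheckSignatureFrom verifies submitted's signature using the
		// issuer's public key.
		if err := submitted.CheckSignatureFrom(issuer); err == nil {
			return nil
		}
	}
	return errors.New("certificate was not signed by a known issuer")
}
```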
An integration test that performs a revocation of a precertificate (in this case
one that never had a final certificate issued due to SCT embedding errors) with
all of the available authentication mechanisms is included.
Resolves https://github.com/letsencrypt/boulder/issues/4414
* Use `check_call` instead of `check_output`; we don't care about
capturing the output and instead want it to go to stdout so test
failures can be debugged.
* Don't use `shell=True`, it isn't needed here.
* Pipe through the test case filter so that it can be used with
`--test.run` to limit the Go integration tests run.
This change adds two tables and two methods in the SA, to store precertificates
and serial numbers.
In the CA, when the feature flag is turned on, we generate a serial number, store it,
sign a precertificate and OCSP response, store them, and then return the precertificate. Storing
the serial as an additional step before signing the certificate adds an extra layer of
insurance against duplicate serials, and also serves as a check on database availability.
Since an error storing the serial prevents going on to sign the precertificate, this decreases
the chance of signing something while the database is down.
Right now, neither table has read operations available in the SA.
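A rough sketch of that order of operations, with a hypothetical storage interface standing in for the new SA methods and the real CA types:

```go
package ca

import (
	"context"
	"crypto/rand"
	"encoding/hex"
)

// serialStore is a stand-in for the two new SA methods described above.
type serialStore interface {
	AddSerial(ctx context.Context, serial string) error
	AddPrecertificate(ctx context.Context, der, ocsp []byte) error
}

func issuePrecertificate(ctx context.Context, sa serialStore, sign func(serial string) (der, ocsp []byte, err error)) ([]byte, error) {
	// 1. Generate and store the serial before anything is signed. A failure
	//    here halts issuance, guarding against duplicate serials and acting
	//    as a check that the database is reachable.
	serialBytes := make([]byte, 16)
	if _, err := rand.Read(serialBytes); err != nil {
		return nil, err
	}
	serial := hex.EncodeToString(serialBytes)
	if err := sa.AddSerial(ctx, serial); err != nil {
		return nil, err
	}

	// 2. Sign the precertificate and its OCSP response.
	der, ocsp, err := sign(serial)
	if err != nil {
		return nil, err
	}

	// 3. Store both, then return the precertificate.
	if err := sa.AddPrecertificate(ctx, der, ocsp); err != nil {
		return nil, err
	}
	return der, nil
}
```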
To make this work, I needed to remove the check for duplicate certificateStatus entry
when inserting a final certificate and its OCSP response. I also needed to remove
an error that can occur when expiration-mailer processes a precertificate that lacks
a final certificate. That error would otherwise have prevented further processing of
expiration warnings.
Fixes #4412
This change builds on #4417, please review that first for ease of review.
We occasionally have reason to block public keys from being used in CSRs
or for JWKs. This work adds support for loading a YAML blocked keys list
to the WFE, the RA and the CA (all the components already using the
`goodkey` package).
The list is loaded in memory and is intended to be used sparingly, not
for more complicated mass-blocking scenarios. This augments the
existing Debian weak key checking, which is specific to RSA keys and
operates on a truncated hash of the key modulus. In comparison, the
admin-blocked keys are identified by the Base64 encoding of a SHA256
hash over the DER encoding of the public key expressed as a PKIX subject
public key. For ECDSA keys in particular we believe a more thorough
solution would have to consider inverted curve points but to start we're
calling this approach "Good Enough".
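A small sketch of computing that identifier with the standard library; this mirrors the description above, not necessarily the exact helpers in the `goodkey` package or `block-a-key`:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"encoding/base64"
	"fmt"
)

// blockedKeyDigest returns the Base64 encoding of a SHA256 hash over the
// DER-encoded PKIX SubjectPublicKeyInfo for the given public key.
func blockedKeyDigest(pub interface{}) (string, error) {
	der, err := x509.MarshalPKIXPublicKey(pub)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(der)
	return base64.StdEncoding.EncodeToString(sum[:]), nil
}

func main() {
	key, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	digest, err := blockedKeyDigest(key.Public())
	if err != nil {
		panic(err)
	}
	fmt.Println(digest) // value to add to the blocked keys YAML
}
```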
A utility program (`block-a-key`) is provided that can read a PEM
formatted x509 certificate or a JSON formatted JWK and emit lines to be
added to the blocked keys YAML to block the related public key.
A test blocked keys YAML file is included
(`test/example-blocked-keys.yml`), initially populated with a few of the
keys from the `test/` directory. We may want to do a more thorough pass
through Boulder's source code and add a block entry for every test
private key.
Resolves https://github.com/letsencrypt/boulder/issues/4404
This change adds support in ct-test-srv for rejecting precertificates by
hostname, in order to artificially generate a condition where a
precertificate is issued but no final certificate can be issued. Right
now the final check in the test is temporarily disabled until the
feature is fixed.
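Sketched below is the general shape of that hostname-based rejection in a test CT server handler; the reject list and handler wiring are illustrative rather than ct-test-srv's actual code (RFC 6962 add-chain bodies carry a chain of base64 DER certificates):

```go
package main

import (
	"crypto/x509"
	"encoding/base64"
	"encoding/json"
	"net/http"
)

// addChainHandler rejects submissions whose leaf contains a rejected
// hostname, letting a test force the "precertificate issued, final
// certificate blocked" condition.
func addChainHandler(rejectHosts map[string]bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req struct {
			Chain []string `json:"chain"` // base64 DER certificates, leaf first
		}
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil || len(req.Chain) == 0 {
			http.Error(w, "malformed add-chain body", http.StatusBadRequest)
			return
		}
		der, err := base64.StdEncoding.DecodeString(req.Chain[0])
		if err != nil {
			http.Error(w, "bad base64", http.StatusBadRequest)
			return
		}
		cert, err := x509.ParseCertificate(der)
		if err != nil {
			http.Error(w, "bad certificate", http.StatusBadRequest)
			return
		}
		for _, name := range cert.DNSNames {
			if rejectHosts[name] {
				http.Error(w, "rejected by hostname", http.StatusBadRequest)
				return
			}
		}
		// ...otherwise build and return a normal SCT response.
	}
}
```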
Also, as our first Go-based integration test, this pulls in the
eggsampler/acme Go client, and adds some support in integration-test.py.
This also refactors ct-test-srv slightly to use a ServeMux, and fixes
a couple of cases of not returning immediately on error.
This also removes some awkward dancing we did in integration_test.py to
run setup_twenty_days_ago under the opposite config of whatever we were
about to run tests under.
Reverts most of #4288 and #4290.
This PR changes the VA to return the `dns` problem type for errors that occur
when performing HTTP-01 challenges for domains that have no IP addresses, or
when looking up the IP addresses fails.
The `va.getAddrs` function is internal to the VA and can return
`berrors.BoulderError`s with a DNS type when there is an error, allowing the
calling code to convert this to a problem when required
using an updated `detailedError` function. This avoids some clunky conversion
the HTTP-01 code was doing that misrepresented DNS level errors as connection
problems with a DNS detail message.
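A simplified sketch of that conversion, using stand-in types rather than Boulder's real `berrors`/`probs` packages:

```go
package main

import (
	"errors"
	"fmt"
)

type errorType int

const (
	connectionError errorType = iota
	dnsError
)

// boulderError is a stand-in for berrors.BoulderError: a typed internal error.
type boulderError struct {
	typ    errorType
	detail string
}

func (e *boulderError) Error() string { return e.detail }

type problem struct {
	Type   string
	Detail string
}

// detailedError maps a typed internal error to the right ACME problem type,
// so DNS lookup failures surface as "dns" problems rather than being
// misreported as connection problems.
func detailedError(err error) problem {
	var berr *boulderError
	if errors.As(err, &berr) && berr.typ == dnsError {
		return problem{Type: "dns", Detail: berr.detail}
	}
	return problem{Type: "connection", Detail: err.Error()}
}

func main() {
	err := &boulderError{typ: dnsError, detail: "no valid IP addresses found for example.com"}
	fmt.Printf("%+v\n", detailedError(err))
}
```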
In order to add an integration test for challenge validation that results in
`getAddrs` DNS-level errors, the Boulder tools image had to be bumped to a tag
that includes the latest `pebble-challtestsrv` that
supports mocking SERVFAILs. It isn't possible to mock this case with internal IP
addresses because our VA test configuration does not filter internal addresses
to support the testing context.
Additionally this branch removes the `UnknownHostProblem` from the `probs`
package:
1. It isn't used anywhere after 532c210
2. It's not a real RFC 8555 problem type. We should/do use the
DNS type for this.
Resolves https://github.com/letsencrypt/boulder/issues/4407
Include identifierType in queries so that the regID_identifier_status_expires_idx index is properly utilized. I did a once-over of the other authz2 queries to verify we are properly using their indexes as well, and everything else looks like it is working as intended.
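For illustration only, a query shaped like the authz2 lookups described here, with identifierType included so the composite index can be satisfied from its leading columns; the exact column order of regID_identifier_status_expires_idx is an assumption based on its name, and this is not the literal SA query:

```go
const getPendingAuthz2 = `
	SELECT id, identifierValue, status, expires
	FROM authz2
	WHERE registrationID = :regID AND
	      identifierType = :dnsType AND
	      identifierValue IN (:identifierValues) AND
	      status = :pending AND
	      expires > :now`
```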
The ID field on each of these three tables is an auto-incrementing
primary key, so the additional `ORDER` clause in the SQL queries to
find work from these tables is unnecessary.
This better matches what's logged when there is an error deleting
a resource. Without this context, errors from getWork aren't
identifiable without cross-referencing the Prometheus stats.
* deps: update github.com/zmap/zlint to latest.
This captures a new lint (`e_subject_printable_string_badalpha`) that
addresses a historic Let's Encrypt incident related to the allowed
PrintableString character set. It also pulls in minor housekeeping
related to consistently prefixing lint names with their respective lint
result level.
* review: fix expected lint name in TestIgnoredLint.
The upstream `zlint` project added a missing `w_` prefix on the
`ct_sct_policy_count_unsatisfied` lint that needed to be reflected in
expected test output.
To make this work, I changed the twenty_days_ago setup to use
`config-next` when the main test phase is running `config`. That, in
turn, made the recheck_caa test fail, so I added a tweak to that.
I also moved the authzv2 migrations into `db`. Without that change,
the integration test would fail during the twenty_days_ago setup because
Boulder would attempt to create authzv2 objects but the table wouldn't
exist yet.
To make log analysis easier, we choose to elevate the pseudo ACME HTTP
method "POST-as-GET" to the `web.RequestEvent.Method` after processing
a valid POST-as-GET request, replacing the "POST" method value that will
have been set by the outermost handler.
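A minimal sketch of the idea, assuming a request-event struct with a Method field like `web.RequestEvent`; the helper name is hypothetical:

```go
package wfe2

// requestEvent is a stand-in for web.RequestEvent; only the Method field
// matters for this sketch.
type requestEvent struct {
	Method string
}

// markPOSTAsGET rewrites the method recorded by the outermost handler once a
// request has been verified as a POST-as-GET (a JWS with a zero-length
// payload), so log analysis can tell the two apart.
func markPOSTAsGET(logEvent *requestEvent, payload []byte) {
	if len(payload) == 0 {
		logEvent.Method = "POST-as-GET"
	}
}
```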
These errors show up in the Publisher at shutdown during integration
test runs, because the Publisher is trying to write responses from RPCs
that were slow due to the ct-test-srv's LatencySchedule. This
specifically happens only for the optional submission of "final"
certificates.
In getAllOrderAuthorizationStatuses, we were using a transaction for a series
of SELECTs. Since these SELECTs don't need to be strongly consistent with
each other, that creates needless locking and round trips.
The ocsp-updater ocspStaleMaxAge config var has to be bumped up to ~7 months so that when it runs after the six-months-ago run it will actually update the OCSP responses generated during that period and mark the certificate status rows as expired.
Fixes #4338.
* deps: update github.com/zmap/zlint to latest.
Update the `github.com/zmap/zlint` dependency to b126a9b. This captures
a small fix to the `ct_sct_policy_count_unsatisfied` lint that ensures
it isn't run for precertificates.
* config: remove ct_sct_policy_count_unsatisfied from ignored_lints.
With the latest `zlint` the `ct_sct_policy_count_unsatisfied` lint won't
flag precertificates as having an info-level lint result for missing
SCTs. With that fix in place we no longer have to ignore this lint in
the config-next CA configs that enable preissuance linting.
In the current SA code, we need to remember to call Rollback on any error.
If we don't, we'll leave dangling transactions, which are hard to spot but eventually
clog up the database and cause availability problems.
This change attempts to deal with rollbacks more rigorously, by implementing a
withTransaction function that takes a closure as input. withTransaction opens
a transaction, applies a context.Context to it, and then runs the closure. If the
closure returns an error, withTransaction rolls back and returns the error; otherwise
it commits and returns nil.
One of the quirks of this implementation is that it relies on the closure modifying
variables from its parent scope in order to return values. An alternate implementation
could have the closure return an `interface{}` value alongside its error and have the
calling function do a type assertion. I'm seeking feedback on that; not sure yet which is cleaner.
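A minimal sketch of the pattern as described, written against database/sql rather than Boulder's gorp wrappers:

```go
package sa

import (
	"context"
	"database/sql"
)

// withTransaction begins a transaction, runs f, and guarantees the
// transaction is either committed (f returned nil) or rolled back (f
// returned an error), so no dangling transactions are left behind.
func withTransaction(ctx context.Context, db *sql.DB, f func(tx *sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	if err := f(tx); err != nil {
		// The closure's error takes precedence over any rollback error.
		_ = tx.Rollback()
		return err
	}
	return tx.Commit()
}

// The "closure modifies parent-scope variables" quirk looks like this in use:
//
//	var count int
//	err := withTransaction(ctx, db, func(tx *sql.Tx) error {
//		return tx.QueryRowContext(ctx, "SELECT COUNT(*) FROM certificates").Scan(&count)
//	})
```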
This is a subset of the functions that need this treatment. I've got more coming, but
some of the changes break tests so I'm checking into why.
Updates #4337
The `test/config-next` CA configs are both updated to use `zlint` to lint TBS
pre-certificates with a throw-away key and treat any lint findings >=
`lints.Pass` as an error, blocking the CA from signing the TBS pre-cert with its
private key.
The CA `issuePrecertificateInner` function is updated to specifically catch
linting related errors from CFSSL to marshal the linting findings to the audit
log. A small unit test for this change is included.
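For reference, a hedged sketch of collecting lint findings on a parsed certificate, assuming the pre-v2 zlint API in use at this point (`zlint.LintCertificate` over zcrypto's x509 type, with `lints` result levels); the exact error threshold in the CA is configured through CFSSL and may differ from this simplification:

```go
package ca

import (
	"github.com/zmap/zcrypto/x509"
	"github.com/zmap/zlint"
	"github.com/zmap/zlint/lints"
)

// lintFindings returns the names of lints whose result is more severe than
// lints.Pass; the CA treats such findings on the throw-away-key TBS
// certificate as blocking before signing with the real issuer key.
func lintFindings(cert *x509.Certificate) []string {
	var findings []string
	results := zlint.LintCertificate(cert)
	for name, res := range results.Results {
		switch res.Status {
		case lints.Notice, lints.Warn, lints.Error, lints.Fatal:
			findings = append(findings, name)
		}
	}
	return findings
}
```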
The CA `IssueCertificateForPrecertificate` function remains unchanged: the CFSSL
interface that defines `SignFromPrecert` doesn't facilitate linting. We still
lint final certificates post-issuance with `cert-checker` and accept the
possibility there may be some compliance issues that could occur between the
precertificate passing linting and the final certificate being signed.
Resolves https://github.com/letsencrypt/boulder/issues/4255
This will unblock pre-issuance linting support by updating the
`github.com/cloudflare/cfssl` dependency to the `1.3.4` tag which
notably includes the zlint integration developed in
cloudflare/cfssl#1015
Notably this brings in:
* A mild perf. boost from an updated transitive zcrypto dep and a reworked util func.
* A new KeyUsage lint for ECDSA keys.
* Updated gTLD data.
* A required `LintStatus` deserialization fix that will unblock a CFSSL update.
The `TestIgnoredLint` unit test is updated to no longer expect a warning from the
`w_serial_number_low_entropy` lint. This lint was removed in the upstream project.
XHR requests from web-based ACME clients provide the User-Agent
of the browser that initiated the request, but the hostname of the site
that originated the request is sent in the Origin header. This will let
us better analyze web-based ACME traffic.
Fixes #4370
For authzv1, this actually executes a SQL DELETE for the unused challenges
when an authorization is updated upon validation.
For authzv2, this doesn't perform a delete, but changes the authorizations that
are returned so they don't include unused challenges.
In order to test the flag for both authz storage models, I set the feature flag in
both config/ and config-next/.
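A simplified sketch of the authzv2 side of this, with stand-in types rather than Boulder's core types:

```go
package wfe2

type challenge struct {
	Type   string
	Status string
}

type authorization struct {
	Status     string
	Challenges []challenge
}

// stripUnusedChallenges removes challenges that were never attempted from a
// finalized authorization before it is returned to the client.
func stripUnusedChallenges(authz *authorization) {
	if authz.Status == "pending" {
		return // all challenges are still offered while pending
	}
	kept := authz.Challenges[:0]
	for _, chall := range authz.Challenges {
		if chall.Status != "pending" {
			kept = append(kept, chall)
		}
	}
	authz.Challenges = kept
}
```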
Fixes #4352
The gRPC INFO log lines clutter up integration test output, and we've never
had a use for them in production (they are mostly about details of
connection status).
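One way to silence the INFO level is via the grpclog package; this is a sketch of the approach, not necessarily Boulder's exact change:

```go
package cmd

import (
	"io/ioutil"
	"os"

	"google.golang.org/grpc/grpclog"
)

func init() {
	// Discard INFO output; keep WARNING and ERROR on stderr.
	grpclog.SetLoggerV2(grpclog.NewLoggerV2(ioutil.Discard, os.Stderr, os.Stderr))
}
```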
A new `boulder-janitor` command is added that provides a long-running
daemon that cleans up rows associated with expired certificate
resources. At present this is rows from the following tables:
* certificates
* certificateStatus
* certificatesPerName
Adding cleanup of tables associated with Order resources is the next step.
Three prometheus stats are exported:
* janitor_deletions - CounterVec for the number of deletions by table the
boulder-janitor has performed.
* janitor_workbatch - GaugeVec for the number of items of work by table
the boulder-janitor queued for deletion.
* janitor_errors - CounterVec for the number of errors by table and error
type the boulder-janitor has experienced.
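Declared with the Prometheus Go client, those three metrics might look like the sketch below; the label names are assumptions based on the descriptions, not necessarily the janitor's exact definitions:

```go
package janitor

import "github.com/prometheus/client_golang/prometheus"

var (
	deletions = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "janitor_deletions",
		Help: "Number of deletions performed, by table",
	}, []string{"table"})

	workBatch = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "janitor_workbatch",
		Help: "Number of items queued for deletion, by table",
	}, []string{"table"})

	errorCount = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "janitor_errors",
		Help: "Number of errors encountered, by table and error type",
	}, []string{"table", "type"})
)

func init() {
	prometheus.MustRegister(deletions, workBatch, errorCount)
}
```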