The `boulder-janitor` is extended to cleanup rows from the `orders` table that
have expired beyond the configured grace period, and the associated referencing
rows in `requestedNames`, `orderFqdnSets`, and `orderToAuthz2`.
To make implementing the transaction work for the deletions easier/consistent
I lifted the SA's `WithTransaction` code and assoc. functions to a new shared
`db` package. This also let me drop the one-off `janitorDb` interface from the
existing code.
There is an associated change to the `GRANT` statements for the `janitor` DB
user to allow it to find/delete the rows related to orders.
Resolves https://github.com/letsencrypt/boulder/issues/4527
This comment was there mainly to indicate that you should get protocol
changes made in ACME before implementing them in Boulder. Since the
protocol is done, this is no longer an issue. In practice we don't often
see people proposing Boulder changes that are incompatible with the
spec, so I don't think we need this line anymore.
Fixes#4541
This brings OCSP signing into alignment with the other components of the
CA in that they use ca.clk, which can be mocked out in unittests.
This tweaks test_ocsp_exp_unauth to be compatible with the change.
Fixes#4441.
Since 6f71c0c switched the Go integration tests to run in parallel the
`TestPrecertificateOCSP` test has been flaky. To fix the flake the test
needs to be changed to be resilient to precertificates other than the
one it is expecting being returned by the ct-test-srv since other tests
are also concurrently using it.
Newer Go versions seem to give a different psuedoversion for this
dependency at the same commit than when we initially switched to Go
modules for Boulder. Fixing the psuedoversion now so it won't trip up
future updates unexpectedly.
Previously, we referred to "DER encoded PKIX public keys", but PKIX (RFC 5280)
doesn't define a standalone "public key" type. Instead, it defines
SubjectPublicKeyInfo, containing an algorithm and a BIT STRING. As a
result, SPKI and SPKI hash are more commonly used terms, and we're more
likely to get reports based on those. We should mirror that terminology
in our documentation.
When we get a DNS error that has an internal cause (like connection
refused), we return a generic message like "networking error" to the
user to avoid revealing details that would be confusing. However, when
debugging problems with our own services, it's useful to have the
underlying errors.
This adds a helper method in the VA and calls it from each place we use
DNS errors.
LE is popular and aims to popularise certificate issuance. End users who see
error messages cannot be assumed to be as DNS-experienced as previously. The
user-facing error messages in the policy authority file are terse and unobvious
to the point that they are often unlikely to be well understood by those they
are intended to inform, who may be "just trying to get a LE cert for their
domain".
Previously we used a JOIN on the orderToAuthz2 table in order to make sure
we only returned authorizations created using the ACME v2 API. Each time an
order is created a pivot row (order ID + authz ID) is added to the
orderToAuthz2 table. If a large number of orders are created that all contain
the same authorization, due to reuse, then the JOINd query would return a full
authorization row for each entry in the orderToAuthz2 table with the authorization
ID.
Instead we now filter out these authorizations by doing a second query against
the orderToAuthz2 table. Using this query still requires examining a large number
of rows, but because we don't need to construct a temporary table for the JOIN
and fill it with all the full authorization rows we should save resources.
Fixes#4500.
Errors that were being returned in the checkAlgorithm methods of both wfe and
wfe2 didn't really match up to what was actually being checked. This change
attempts to bring the errors in line with what is actually being tested.
Fixes#4452.
This branch also updates the WFE2 parseJWS function to match the error string fixed in the upstream project for the case where a JWS EC public key fails to unmarshal due to an incorrect length.
Resolves#4300
We now expect that the config dir is always set, so we make that
explicit in the integration test and error if that's not true.
This change also renames the variable to just "config_dir", and removes
the parameter to startservers.start, which is currently never set to
anything other than its default value.
This also explicitly sets the environment variable in .travis.yml.
This creates the correct type of backend service for the OCSP generator.
It also adds an invocation of orphan-finder during the integration
tests.
This also adds a minor safety check to SA that I hit while writing the
test. Without this safety check, passing a certificate with no DNSNames
to AddCertificate would result in an obscure MariaDB syntax error
without enough context to track it down. In normal circumstances this
shouldn't be hit, but it will be good to have a solid error message if
we hit it in tests sometime.
Also, this tweaks the .travis.yml so it explicitly sets BOULDER_CONFIG_DIR
to test/config in the default case. Because the docker-compose run
command uses -e BOULDER_CONFIG_DIR="${BOULDER_CONFIG_DIR}",
we were setting a blank BOULDER_CONFIG_DIR in default case.
Since the Python startservers script sets a default if BOULDER_CONFIG_DIR
is not set, we haven't noticed this before. But since this test case relies
on the actual environment variable, it became an issue.
Fixes#4499
Addresses two issues introduced in #4476:
* Keep setting the V2 field in modelToAuthzPB so RPCs returned from new components to old don't cause panics
* Don't return expired orders from the SA, so that users requesting old orders that contain old style authorizations don't cause breakage in the RA
Only removing :443 when the http.Request.TLS is not nil breaks when
Boulder's WFE/WFE2 are running HTTP behind a separate ingress proxy that
terminates HTTPS on its behalf.
We need to apply some fixes for bugs introduced in #4476 before it can be deployed, as such we need to revert #4495 as there needs to be a full deploy cycle between these two changes.
This reverts commit 3ae1ae1.
😭
OCSPGeneratorService matches the semantics better, and is what
ocsp-updater uses. It also matches what's in the config-next.
This wasn't caught by integration tests because we don't currently
run orphan-finder in the integration tests. We don't have a good way
to induce failures in the SA on demand.
This change set makes the authz2 storage format the default format. It removes
most of the functionality related to the previous storage format, except for
the SA fallbacks and old gRPC methods which have been left for a follow-up
change in order to make these changes deployable without introducing
incompatibilities.
Fixes#4454.
SCT signature verification already happens within the CT client, so we are
currently doing it twice for no reason. This change removes the redundant check
we perform.
In f32fdc4 the Boulder logging framework was updated to emit a CRC32-IEEE
checksum in log lines. The `log-validator` command verifies these checksums in
one of two ways:
1. By running as a daemon process, tailing logs and verifying checksums as they
arrive.
2. By running as a one-off command, verifying checksums of every line in a log
file on disk.
If we can't write to stdout we prefer to panic immediately rather than
potentially lose logs we capture from redirecting stdout as a syslog backup.
A unit test is included to verify the panic behaviour. Prior to the `log` diff
in this branch the test failed because the non-nil `err` result from
`fmt.Printf` was being away:
```
=== RUN TestStdoutFailure
=== PAUSE TestStdoutFailure
=== CONT TestStdoutFailure
FAIL github.com/letsencrypt/boulder/log 0.011s
FAIL
```
After the `log` package diff in this branch is applied the test passes.
I additionally tested this end-to-end by redirecting stdout to a full
filesystem volume mounted into the Boulder docker image. It provoked the
expected panic when a component tried to write to stdout and the filesystem was
full.
The idea expressed in this comment isn't representative of the
Boulder cmds. E.g. There's no top level "App Shell" in use and the
`NewAppShell`, `Action` and `Run` functions ref'd do not exist.
We have a nice `sa/precertificates.go` file that holds `AddPrecertificate`
(and other precert functions). Let's put `GetPrecertificate` there
too instead of in the more generic `sa/sa.go` file.
Adds a CRC32-IEEE checksum to our log lines. At most this adds 8 bytes per line, and at least adds 2 bytes. Given this a relatively minor change I haven't bothered flagging it, although if we have anything in place that assumes the current structure of log lines we may want to add a flag in order to prevent immediate breakage before things can be altered.
Fixes#4474.
A gauge wasn't the appropriate stat type choice for this usage.
Switching the stat to be a counter instead of a gauge means we can't
detect when the janitor is finished its work in the integration test by
watching for this stat to drop to zero for all the table labels we're
concerned with. Instead the test is updated to watch for the counter
value to stabilize for a period longer than the workbatch sleep.
Spamming runs of the `TestPrecertificateRevocation` integration test from
1cd9733c24 found two ways it would flake on rare
occasion:
1. A [data race in the
`ct-test-srv`](https://gist.github.com/cpu/761c176cb72e0eaa52656d3322423202)
would kill the test log process and the integration test would be unable to
reach the mock API. This causes the test failure flagged in #4460. The root
cause is addressed by refactoring the `ct-test-srv`'s
`addChainOrPre` function to use a separate function for checking/extending the
rejected list with the correct locking in place.
2. Occasionally the integration test wasn't able to find a matching precert in
the very first configured ct-test-srv. This produces a test failure like:
```
--- FAIL: TestPrecertificateRevocation (4.95s)
--- FAIL: TestPrecertificateRevocation/revocation_by_certificate_key (1.27s)
revocation_test.go:110: finding rejected precertificate: no matching ct-test-srv rejection found
FAIL
FAIL github.com/letsencrypt/boulder/test/integration 4.961s
FAIL
```
I believe this is addressed by changing the integration test logic to check all of
the configured `ct-test-srv` instances for a matching precert instead of just
the first.
Resolves https://github.com/letsencrypt/boulder/issues/4460
Since OCSP signing happens before precertificate signing, it is simpler
and safer to consider the issuance a failure if OCSP signing fails than to
continue with signing the precertificate and try to sign OCSP later.
Fixes#4429
In the process, rename generateOCSPAndStoreCertificate to just
storeCertificate, because that function doesn't generate OCSP anymore;
instead the OCSP is generated (and stored) at precertificate issuance
time.
This also adds the badCSR error type specified by RFC 8555. It is a natural fit for the errors in VerifyCSR that aren't covered by badPublicKey. The web package function for converting a berror to
a problem is updated for the new badCSR error type.
The callers (RA and CA) are updated to return the berrors from VerifyCSR as is instead of unconditionally wrapping them as a berrors.MalformedError instance. Unit/integration tests are updated accordingly.
Resolves#4418