Incorporates square/go-jose#282.
```
$ go test gopkg.in/square/go-jose.v2
go: finding gopkg.in/square/go-jose.v2 v2.4.1
ok gopkg.in/square/go-jose.v2 46.790s
```
* deps: update publicsuffix-go to 342bab7
This updates `github.com/weppos/publicsuffix-go` to 342bab7, the tip of
master at the time of writing.
Unit tests are confirmed to pass:
```
~/go/src/github.com/weppos/publicsuffix-go$ git log --pretty=format:'%h' -n 1
342bab7
~/go/src/github.com/weppos/publicsuffix-go$ go test ./...
? github.com/weppos/publicsuffix-go/cmd/load [no test files]
ok github.com/weppos/publicsuffix-go/net/publicsuffix 0.023s
ok github.com/weppos/publicsuffix-go/publicsuffix 0.015s
? github.com/weppos/publicsuffix-go/publicsuffix/generator [no test files]
```
* deps: update zmap/zlint to 71201e7
This updates `github.com/zmap/zlint` to 71201e7, the tip of master at
the time of writing.
Unit tests are confirmed to pass:
```
~/go/src/github.com/zmap/zlint$ git log --pretty=format:'%h' -n 1
71201e7
~/go/src/github.com/zmap/zlint$ go test ./...
ok github.com/zmap/zlint 0.205s
? github.com/zmap/zlint/cmd/zlint [no test files]
? github.com/zmap/zlint/cmd/zlint-gtld-update [no test files]
ok github.com/zmap/zlint/lints 0.214s
ok github.com/zmap/zlint/util 0.014s
```
This is a breaking API change: pkcs11key now takes as input a public key rather than
a private key label. In order to find the private key, it first finds the public key's CKA_ID
in the token, then looks for a private key with the same CKA_ID. From ftp://ftp.rsasecurity.com/pub/pkcs/pkcs-11/v2-30/pkcs-11v2-30b-d6.pdf:
> The CKA_ID field is intended to distinguish among multiple keys. In
> the case of public and private keys, this field assists in handling
> multiple keys held by the same subject; the key identifier for a
> public key and its corresponding private key should be the same.
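In case it's useful context, here's a minimal sketch of the two-step lookup, assuming the `github.com/miekg/pkcs11` bindings and an already-opened session (names here are illustrative, not pkcs11key's actual internals):
```
import (
	"errors"

	"github.com/miekg/pkcs11"
)

// findPrivateKeyByPublic reads the public key's CKA_ID, then searches the
// token for a private key object carrying the same CKA_ID.
func findPrivateKeyByPublic(ctx *pkcs11.Ctx, session pkcs11.SessionHandle, pub pkcs11.ObjectHandle) (pkcs11.ObjectHandle, error) {
	// Step 1: read the CKA_ID off the public key object.
	attrs, err := ctx.GetAttributeValue(session, pub, []*pkcs11.Attribute{
		pkcs11.NewAttribute(pkcs11.CKA_ID, nil),
	})
	if err != nil {
		return 0, err
	}
	// Step 2: find a private key object with the same CKA_ID.
	template := []*pkcs11.Attribute{
		pkcs11.NewAttribute(pkcs11.CKA_CLASS, pkcs11.CKO_PRIVATE_KEY),
		pkcs11.NewAttribute(pkcs11.CKA_ID, attrs[0].Value),
	}
	if err := ctx.FindObjectsInit(session, template); err != nil {
		return 0, err
	}
	defer ctx.FindObjectsFinal(session)
	handles, _, err := ctx.FindObjects(session, 1)
	if err != nil {
		return 0, err
	}
	if len(handles) == 0 {
		return 0, errors.New("no private key with matching CKA_ID")
	}
	return handles[0], nil
}
```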
This does require that both the public key and private key are present and have
appropriate CKA_IDs set. I've verified this is the case in prod. In our integration
testing environment it was not the case, so I've tweaked entrypoint.sh to load
public keys into SoftHSM and set their CKA_ID.
The initial part of this change was written by @cpu. I've reviewed and approved
those commits.
* cmd: update prometheus.NewProcessCollector args.
There's a new struct `prometheus.ProcessCollectorOpts` that is expected
to be used as the sole argument to `prometheus.NewProcessCollector`. We
don't need to specify `os.Getpid` as the `PidFn` of the struct because
the default is to assume `os.Getpid`. Similarly, we don't need to set the
namespace to `""` explicitly; it is the default.
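As a rough illustration, the updated call reduces to passing the zero-value opts struct:
```
// The zero value of ProcessCollectorOpts uses os.Getpid as the PidFn and
// an empty namespace, so no fields need to be set.
prometheus.MustRegister(prometheus.NewProcessCollector(prometheus.ProcessCollectorOpts{}))
```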
* SA: reimplement db metrics as custom collector.
The modern Prometheus golang API supports translating between legacy metric
sources on the fly with a custom collector. We can use this approach to
collect metrics from a `gorp.DbMap` via the `sql.DB` type's `Stats` method
and the returned `sql.DBStats` struct.
This is a cleaner solution overall (we can drop the DB metrics updating
goroutine) and it avoids the need to use the now-removed `Set` method of the
`prometheus.Counter` type.
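A minimal sketch of the custom-collector pattern, with an illustrative metric (not the SA's actual metric names):
```
import (
	"database/sql"

	"github.com/prometheus/client_golang/prometheus"
)

// dbStatsCollector reads sql.DBStats at scrape time, so no background
// goroutine is needed to keep the metrics fresh.
type dbStatsCollector struct {
	db        *sql.DB
	openConns *prometheus.Desc
}

func newDBStatsCollector(db *sql.DB) dbStatsCollector {
	return dbStatsCollector{
		db: db,
		openConns: prometheus.NewDesc(
			"db_open_connections", "Open connections to the database.", nil, nil),
	}
}

func (c dbStatsCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.openConns
}

func (c dbStatsCollector) Collect(ch chan<- prometheus.Metric) {
	stats := c.db.Stats()
	ch <- prometheus.MustNewConstMetric(
		c.openConns, prometheus.GaugeValue, float64(stats.OpenConnections))
}
```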
* test: Update CountHistogramSamples.
The `With` function of the `prometheus.HistogramVec` types we tend to use as
the argument to `test.CountHistogramSamples` changed to return
a `prometheus.Observer`. Since we only use this function in test contexts,
and only with values that can be type-asserted back to
a `prometheus.Histogram`, we take that approach to fix the problem without
updating call-sites.
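A minimal sketch of the test-side type assertion, assuming a `histVec` of type `*prometheus.HistogramVec` and a testing context:
```
obs := histVec.With(prometheus.Labels{"code": "200"}) // now a prometheus.Observer
hist, ok := obs.(prometheus.Histogram)                // concrete type still implements Histogram
if !ok {
	t.Fatal("expected the Observer to be a prometheus.Histogram")
}
_ = hist // hand off to the sample-counting helper as before
```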
New types and related infrastructure are added to the `db` package to allow
wrapping gorp DbMaps and Transactions.
The wrapped versions return a special `db.ErrDatabaseOp` error type when errors
occur. The new error type includes additional information such as the operation
that failed and the related table.
Where possible we determine the table based on the types of the gorp function
arguments. Where that isn't possible (e.g. with raw SQL queries) we try to use
a simple regexp approach to find the table name. This isn't great for general
SQL but works well enough for Boulder's existing SQL queries.
To get additional confidence that my regexps work for all of Boulder's
queries, I temporarily changed the `db` package's `tableFromQuery` function to
panic if the table couldn't be determined. I re-ran the full unit and
integration test suites with this configuration and saw no panics.
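A minimal sketch of the regexp approach (the pattern is illustrative, not the `db` package's exact one):
```
import "regexp"

// tableRE captures the identifier following FROM/INTO/UPDATE, which covers
// Boulder's simple SELECT/INSERT/UPDATE/DELETE statements.
var tableRE = regexp.MustCompile("(?i)\\b(?:FROM|INTO|UPDATE)\\s+`?(\\w+)`?")

func tableFromQuery(query string) string {
	if m := tableRE.FindStringSubmatch(query); m != nil {
		return m[1]
	}
	return "" // caller falls back to an unknown-table label
}
```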
Resolves https://github.com/letsencrypt/boulder/issues/4559
The AddCertificate processing related to updating the fqdnSets and certificatesPerNames tables can be done in a separate transaction from the inserts to issuedNames and certificates.
This has the advantage of letting the overall AddCertificate request succeed when the primary transaction succeeds but the rate limit update transaction fails. We are OK with slightly incorrect rate limit results if it means more AddCertificate requests succeed and there are fewer orphaned final certificates.
To maintain visibility we audit log when the rate limit transaction fails and also increment a new failedAddCertRLTransactions prometheus counter.
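A minimal sketch of the resulting shape (helper names are illustrative):
```
// First transaction: the writes we must not lose.
if err := addCertAndIssuedNames(ctx, dbMap, cert); err != nil {
	return err
}
// Second, best-effort transaction: rate limit bookkeeping. Failure is
// audit-logged and counted but does not fail the overall request.
if err := updateRateLimitTables(ctx, dbMap, cert); err != nil {
	logger.AuditErrf("failed AddCertificate rate limit transaction: %s", err)
	failedAddCertRLTransactions.Inc()
}
return nil
```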
Resolves #4566
In Python3, the output of subprocess.check_output is of type bytes.
That means calling print() on the output will print \n instead of an
actual newline. This PR adds decoding to the output of mysql in the slow
query test, bringing it into line with other check_output calls.
This also removes a redundant "def run" that is shadowed by the
definition in helpers.py (and was also missing a decode() call).
* SA: add unit test for auto_increment schemas.
`TestAutoIncrementSchema` uses a root user connection to the
`information_schema` MariaDB database to try to find table columns from
the Boulder schemas that are both `auto_increment` and not `int64`.
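In rough form the check might look like this (the schema pattern and assertion helpers follow Boulder's `test` package conventions, but the details are illustrative):
```
var columns []struct {
	TableName  string `db:"table_name"`
	ColumnName string `db:"column_name"`
	DataType   string `db:"data_type"`
}
_, err := dbMap.Select(&columns, `
	SELECT table_name, column_name, data_type
	FROM information_schema.columns
	WHERE table_schema LIKE 'boulder_sa%'
	  AND extra LIKE '%auto_increment%'
	  AND data_type != 'bigint'`)
test.AssertNotError(t, err, "querying information_schema")
// Any row here is an auto_increment column too small to hold an int64.
test.AssertEquals(t, len(columns), 0)
```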
* SA: rename _db-next RemoveOCSPResponses.sql migration.
Based on the order in which we apply migrations, the
`RemoveOCSPResponses.sql` migration with its old prefix
(`20181101105733`) was never being applied. That in turn caused the new
`TestAutoIncrementSchema` unit test to fail because the old
`ocspResponses` table has an `id` field that is `auto_increment` but sized
`int(11)`.
Renaming the migration with a newer prefix solves the problem. The
`ocspResponses` table ends up dropped when `config-next` is used.
Afterwards the `TestAutoIncrementSchema` unit test passes again.
Python 2 is over in 1 month 4 days: https://pythonclock.org/
This rolls forward most of the changes in #4313.
The original change was rolled back in #4323 because it
broke `docker-compose up`. This change fixes those original issues by
(a) making sure `requests` is installed and (b) sourcing a virtualenv
containing the `requests` module before running start.py.
Other notable changes in this:
- Certbot has changed the developer instructions to install specific packages
rather than rely on `letsencrypt-auto --os-packages-only`, so we follow suit.
- Python3 now has a `bytes` type that is used in some places that used to
provide `str`, and all `str` are now Unicode. That means going from `bytes` to
`str` and back requires explicit `.decode()` and `.encode()`.
- Moved from urllib2 to requests in many places.
We intentionally use a SLEEP in a SQL query to trigger timeout behavior.
This caused integration test failures locally (where unit tests are run
in the same session as integration tests).
Previously we inserted data for tracking issued names into the `issuedNames` table
during `sa.AddCertificate`. A more robust solution is to do this during
`sa.AddPrecertificate` since this is when we've truly committed to having
issued for the names.
The new SA `WriteIssuedNamesPrecert` feature flag enables writing this table
during `AddPrecertificate`. The legacy behaviour continues with the flag
enabled or disabled but is updated to tolerate duplicate INSERT errors so that
it is possible to deploy this change across multiple SA instances safely.
Along the way I also updated `SA.AddPrecertificate` to perform its two
`INSERT`s in a transaction using the `db.WithTransaction` wrapper.
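A minimal sketch of the duplicate-tolerant insert (the error matching shown is illustrative):
```
err := tx.Insert(&issuedName)
if err != nil && strings.Contains(err.Error(), "Duplicate entry") {
	// Another SA instance already wrote this row during its own
	// AddPrecertificate; treat the duplicate as success.
	err = nil
}
if err != nil {
	return err
}
```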
Resolves https://github.com/letsencrypt/boulder/issues/4565
In the process I tweaked a few variable names in GetAuthorizations2 to
refer to just "authz" instead of "authz2" because it made things
clearer, particularly in the case of authz2IDMap, which is a map recording
whether a given ID exists, not a map from authzs to IDs.
Fixes #4564
In the deep dark history of Boulder we ended up jamming contacts into
a VARCHAR db field. We need to make sure that when contacts are
marshaled the resulting bytes will fit into the column or a 500 will
be returned to the user when the SA RPC fails.
One day we should fix this properly and not return a hacky error message
that's hard for users to understand. Unfortunately that will likely
require a migration or a new DB table. In the shorter term this hack
will prevent 500s, which is a clear improvement.
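A minimal sketch of the guard (the column width constant is an assumption for illustration):
```
const maxContactBytes = 191 // assumed width of the contact VARCHAR column

contactBytes, err := json.Marshal(reg.Contact)
if err != nil {
	return err
}
if len(contactBytes) > maxContactBytes {
	// Reject up front rather than letting the SA RPC fail with a 500.
	return berrors.MalformedError(
		"too many/too long contact(s); use fewer or shorter email addresses")
}
```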
Previously we weren't checking the domain portion of an email contact address
very strictly in the RA. This updates the PA to export a function that
can be used to validate the domain the same way we validate domain
portions of DNS type identifiers for issuance.
This also changes the RA to use the `invalidEmail` error type in more
places.
A new Go integration test is added that checks these errors end-to-end
for both account creation and account update.
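A minimal sketch of the validation path (the exported helper's name is an assumption, not necessarily the PA's actual export):
```
// Validate the domain portion of an email the same way we validate the
// domain of a DNS-type identifier.
parts := strings.Split(email, "@")
domain := strings.ToLower(parts[len(parts)-1])
if err := policy.ValidDomain(domain); err != nil { // helper name assumed
	return berrors.InvalidEmailError(
		"contact email %q has invalid domain: %s", email, err)
}
```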
We need the RA's `NewOrder` RPC to return a `berrors.Malformed` instance
when there are too many identifiers. A bare error will be turned into
a server internal problem by the WFE2's `web.ProblemDetailsForError`
call while a `berrors.Malformed` will produce the expected malformed
problem.
This commit fixes the error, updates the unit test, and adds an end-to-end
integration test so we don't mess this up again.
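A minimal sketch of the fix (the limit field name is illustrative):
```
if len(newOrder.Names) > ra.maxNames {
	// A bare error here surfaces as a server internal problem; a typed
	// berrors.Malformed maps to the expected malformed problem instead.
	return nil, berrors.MalformedError(
		"Order cannot contain more than %d DNS names", ra.maxNames)
}
```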
This updates the `github.com/eggsampler/acme` dependency used in our Go-based
integration tests to v3. Notably this fixes a data race we encountered in CI.
With the data race fixed this branch can also revert
54a798b7f6 and resolve
https://github.com/letsencrypt/boulder/issues/4542
I ran a `go mod tidy` to clean up the old `v2` copy of the dep and it also
removed a few stale cfssl/mysql items from the `go.mod`.
Upstream library's tests are confirmed to pass:
```
~/go/src/github.com/eggsampler/acme$ git log --pretty=format:'%h' -n 1
b581dc6
~/go/src/github.com/eggsampler/acme$ make pebble
mkdir -p /home/daniel/go/src/github.com/letsencrypt/pebble
git clone --depth 1 https://github.com/letsencrypt/pebble.git /home/daniel/go/src/github.com/letsencrypt/pebble \
|| (cd /home/daniel/go/src/github.com/letsencrypt/pebble; git checkout -f master && git reset --hard HEAD && git pull -q)
fatal: destination path '/home/daniel/go/src/github.com/letsencrypt/pebble' already exists and is not an empty directory.
Already on 'master'
Your branch is up-to-date with 'le/master'.
HEAD is now at 6c2d514 wfe: compare Identifier.Type with acme.IndentifierIP (#287)
docker-compose -f /home/daniel/go/src/github.com/letsencrypt/pebble/docker-compose.yml up -d
Creating network "pebble_acmenet" with driver "bridge"
Creating pebble_challtestsrv_1 ... done
Creating pebble_pebble_1 ... done
while ! wget --delete-after -q --no-check-certificate "https://localhost:14000/dir" ; do sleep 1 ; done
go clean -testcache
go test -race -coverprofile=coverage_18.txt -covermode=atomic github.com/eggsampler/acme/v3
ok github.com/eggsampler/acme/v3 24.292s coverage: 83.0% of statements
docker-compose -f /home/daniel/go/src/github.com/letsencrypt/pebble/docker-compose.yml down
Stopping pebble_pebble_1 ... done
Stopping pebble_challtestsrv_1 ... done
Removing pebble_pebble_1 ... done
Removing pebble_challtestsrv_1 ... done
Removing network pebble_acmenet
```
When a nag group hits capacity we set the nagsAtCapacity gauge to 1.
This gauge also needs to be reset to 0 when the nag group is no longer
at capacity.
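A minimal sketch of the symmetric update (metric and label names are illustrative):
```
if len(queue) >= capacity {
	nagsAtCapacity.With(prometheus.Labels{"nag_group": group}).Set(1)
} else {
	// Reset the gauge once the nag group drains below capacity.
	nagsAtCapacity.With(prometheus.Labels{"nag_group": group}).Set(0)
}
```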
Updates `github.com/weppos/publicsuffix-go` to 3dd5f42, and
`github.com/zmap/zlint` to eea5fe8. Both hashes are the tip of master at
the time of writing.
Unit tests are confirmed to pass:
```
~/go/src/github.com/weppos/publicsuffix-go$ git log --pretty=format:'%h' -n 1
3dd5f42
~/go/src/github.com/weppos/publicsuffix-go$ go test ./...
? github.com/weppos/publicsuffix-go/cmd/load [no test files]
ok github.com/weppos/publicsuffix-go/net/publicsuffix 0.008s
ok github.com/weppos/publicsuffix-go/publicsuffix 0.005s
? github.com/weppos/publicsuffix-go/publicsuffix/generator [no test files]
~/go/src/github.com/zmap/zlint$ git log --pretty=format:'%h' -n 1
eea5fe8
~/go/src/github.com/zmap/zlint$ go test ./...
ok github.com/zmap/zlint 0.240s
? github.com/zmap/zlint/cmd/zlint [no test files]
? github.com/zmap/zlint/cmd/zlint-gtld-update [no test files]
ok github.com/zmap/zlint/lints 0.156s
ok github.com/zmap/zlint/util 0.020s
```
I couldn't get this to work cleanly with
`--log-queries-not-using-indexes` because a couple of queries show up
during integration test runs, seemingly because the tables involved are
small enough that the optimizer finds it faster to skip the index.
Some possible followups:
- Allowlist those queries, or
- Preload the DB with a certain number of certificates before the start
of testing.
This is a small clean-up I spotted while migrating the `WithTransaction` wrapper
out of the `sa` package into `db` during #4544.
The `admin-revoker` utility was using bare transactions with the `db.Rollback`
(prev `sa.Rollback`) helper function instead of the newly exported
`db.WithTransaction` wrapper. The latter is safer so we should use it here too.
After this change all of the external consumers of the `Rollback` function have
been switched to using `WithTransaction` so we can unexport `Rollback`.
The `orphans` Prometheus `CounterVec` is used to count orphans that
couldn't be confirmed saved by the SA and were queued by the CA.
The `adopted_orphans` `CounterVec` is used to count orphans pulled from
the queue by the CA and successfully integrated through to the SA.
Both counter stats are labelled by "type", e.g. "precert" or "cert".
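A minimal sketch of the two labelled counters (exact names and help text may differ):
```
orphanCount := prometheus.NewCounterVec(prometheus.CounterOpts{
	Name: "orphans",
	Help: "Orphans the CA queued because the SA could not confirm the save.",
}, []string{"type"})
adoptedOrphanCount := prometheus.NewCounterVec(prometheus.CounterOpts{
	Name: "adopted_orphans",
	Help: "Orphans pulled from the queue and integrated through to the SA.",
}, []string{"type"})

// At the call sites:
orphanCount.WithLabelValues("precert").Inc()
adoptedOrphanCount.WithLabelValues("cert").Inc()
```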
Based on the volume of data Boulder supports we use `BIGINT(20)` for
database ID fields throughout all of our tables except for two that were
missed: `fqdnSets` and `issuedNames`. Prior to this migration both were
using `INT(11)`, allowing only values up to 2,147,483,647. After the
migration is applied the `BIGINT(20)` type allows values up to 2^63-1.
This avoids needing to send the entire certificate in OCSP generation
RPCs.
Ended up including a few cleanups that made the implementation easier.
Initially I was struggling with how to derive the issuer identification info.
We could stick the full SPKI hash in certificateStatus, but that takes
a significant amount of space; we could configure unique issuer IDs in the CA
config, but that would require being very careful about keeping the IDs
constant and never reusing an ID; or we could store issuers in a database
table and use that as a lookup table, but that requires figuring out how to
get the info into the table, etc. Instead I've gone with what I found to be
the easiest solution: deriving a stable ID from the cert hash. This means we
don't need to remember to configure anything special and the CA config stays
the same as it is now.
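A minimal sketch of a hash-derived ID (the truncation details are illustrative):
```
import (
	"crypto/sha256"
	"crypto/x509"
	"encoding/binary"
)

// issuerID derives a stable int64 from the issuer certificate's DER bytes:
// the same certificate always yields the same ID, so nothing needs to be
// configured and no ID registry needs to be maintained.
func issuerID(issuer *x509.Certificate) int64 {
	h := sha256.Sum256(issuer.Raw)
	return int64(binary.BigEndian.Uint64(h[:8]) >> 1) // shift keeps it positive
}
```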
Fixes #4469.
We've found we need the context offered from logging the error closer to when it
happens in the `bdns` package rather than in the `va`. Adopting the function
requires adapting it slightly. Specifically in the new location we know it won't
be called with any timeout results, with a non-dns error, or with a nil
underlying error.
Having the logging done in `bdns` (and specifically from `exchangeOne`) also
lets us log the wire format of the query and response when we get a `dns.ErrId`
error indicating a query/response ID mismatch. A small unit test is included
that ensures the logging happens as expected.
In case it proves useful for matching against other metrics the DNS ID mismatch
error case also now increments a dedicated prometheus counter vector stat,
`dns_id_mismatch`. The stat is labelled by resolver and query type.
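A minimal sketch of the counter bump inside `exchangeOne` (variable names are illustrative):
```
if err == dns.ErrId { // github.com/miekg/dns sentinel for an ID mismatch
	idMismatchCounter.With(prometheus.Labels{
		"resolver":   chosenServer,
		"query_type": dns.TypeToString[qtype],
	}).Inc()
	// This is also where we log the hex of the query/response wire format.
}
```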
Resolves https://github.com/letsencrypt/boulder/issues/4532
The most recent tagged release of mysql is v1.4.1, from a year ago. It
also happens to pull in an unwanted dependency (appengine) that the
latest commit does not.
Tests pass:
```
$ go test -count=1 github.com/go-sql-driver/mysql
ok github.com/go-sql-driver/mysql 0.068s
```
Fixes #4530
The `boulder-janitor` is extended to cleanup rows from the `orders` table that
have expired beyond the configured grace period, and the associated referencing
rows in `requestedNames`, `orderFqdnSets`, and `orderToAuthz2`.
To make implementing the transaction work for the deletions easier/consistent
I lifted the SA's `WithTransaction` code and assoc. functions to a new shared
`db` package. This also let me drop the one-off `janitorDb` interface from the
existing code.
There is an associated change to the `GRANT` statements for the `janitor` DB
user to allow it to find/delete the rows related to orders.
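A minimal sketch of a grace-period delete (names and batching are illustrative):
```
cutoff := clk.Now().Add(-gracePeriod)
// Delete a bounded batch of expired orders; referencing rows in
// requestedNames, orderFqdnSets and orderToAuthz2 are cleaned up the same
// way, keyed on the deleted order IDs.
result, err := dbMap.Exec(
	`DELETE FROM orders WHERE expires <= ? LIMIT ?`, cutoff, batchSize)
if err != nil {
	return err
}
deleted, err := result.RowsAffected()
if err != nil {
	return err
}
log.Infof("deleted %d expired orders", deleted)
```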
Resolves https://github.com/letsencrypt/boulder/issues/4527
This comment was there mainly to indicate that you should get protocol
changes made in ACME before implementing them in Boulder. Since the
protocol is done, this is no longer an issue. In practice we don't often
see people proposing Boulder changes that are incompatible with the
spec, so I don't think we need this line anymore.
Fixes #4541
This brings OCSP signing into alignment with the other components of the
CA in that they use ca.clk, which can be mocked out in unit tests.
This tweaks test_ocsp_exp_unauth to be compatible with the change.
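A minimal sketch of what the clock seam enables in unit tests (the field wiring is illustrative):
```
import (
	"time"

	"github.com/jmhodges/clock"
)

fc := clock.NewFake()
fc.Set(time.Date(2019, 12, 5, 0, 0, 0, 0, time.UTC))
ca.clk = fc // OCSP thisUpdate/nextUpdate now derive from the fake clock
```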
Fixes #4441.
Since 6f71c0c switched the Go integration tests to run in parallel, the
`TestPrecertificateOCSP` test has been flaky. To fix the flake, the test is
changed to tolerate ct-test-srv returning precertificates other than the one
it expects, since other tests are concurrently using the same server.
Newer Go versions seem to give a different pseudoversion for this
dependency at the same commit than when we initially switched to Go
modules for Boulder. Fixing the pseudoversion now so it won't trip up
future updates unexpectedly.
Previously, we referred to "DER encoded PKIX public keys", but PKIX (RFC 5280)
doesn't define a standalone "public key" type. Instead, it defines
SubjectPublicKeyInfo, containing an algorithm and a BIT STRING. As a
result, SPKI and SPKI hash are more commonly used terms, and we're more
likely to get reports based on those. We should mirror that terminology
in our documentation.
When we get a DNS error that has an internal cause (like connection
refused), we return a generic message like "networking error" to the
user to avoid revealing details that would be confusing. However, when
debugging problems with our own services, it's useful to have the
underlying errors.
This adds a helper method in the VA and calls it from each place we use
DNS errors.
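A minimal sketch of the helper (names are illustrative, not the VA's actual method):
```
// logDNSError records the underlying cause for operators; callers keep
// returning only the generic user-facing message.
func (va *ValidationAuthorityImpl) logDNSError(ident string, err error) {
	va.log.Infof("DNS error looking up %s: %s", ident, err)
}
```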
LE is popular and aims to popularise certificate issuance, so end users who
see error messages cannot be assumed to be as DNS-experienced as in the past.
The user-facing error messages in the policy authority file are terse and
unobvious to the point that they are unlikely to be well understood by those
they are intended to inform, who may be "just trying to get a LE cert for
their domain".
Previously we used a JOIN on the orderToAuthz2 table in order to make sure
we only returned authorizations created using the ACME v2 API. Each time an
order is created a pivot row (order ID + authz ID) is added to the
orderToAuthz2 table. If a large number of orders are created that all contain
the same authorization, due to reuse, then the JOINed query would return a full
authorization row for each entry in the orderToAuthz2 table with that
authorization ID.
Instead we now filter out these authorizations by doing a second query against
the orderToAuthz2 table. Using this query still requires examining a large number
of rows, but because we don't need to construct a temporary table for the JOIN
and fill it with all the full authorization rows, we should save resources.
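A minimal sketch of the two-query shape (placeholder expansion shown inline):
```
// Expand one placeholder per authorization ID for the IN clause.
params := make([]interface{}, len(authzIDs))
for i, id := range authzIDs {
	params[i] = id
}
marks := strings.TrimSuffix(strings.Repeat("?,", len(params)), ",")

// Second query: ask the pivot table which of these authorizations were
// created via the v2 API, without JOINing the full authorization rows.
var v2IDs []int64
_, err := dbMap.Select(&v2IDs, fmt.Sprintf(
	`SELECT authzID FROM orderToAuthz2 WHERE authzID IN (%s)`, marks), params...)
```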
Fixes #4500.