Commit Graph

4872 Commits

Author SHA1 Message Date
Jacob Hoffman-Andrews 211985eae7 integration: Fail tests on unindexed queries. (#4581)
Also incidentally remove the database flag from the mysql invocation
since it was unnecessary.

Fixes #4406
2019-11-27 13:19:36 -05:00
Jacob Hoffman-Andrews 70a9bf32d8 integration: allow-list a known-slow SQL query. (#4580)
We intentionally use a SLEEP in a SQL query to trigger timeout behavior.
This caused integration tests failures locally (where unittests are run
in the same session as integration tests).
2019-11-26 14:46:40 -05:00
Daniel McCarney 608c381444
SA: conditionally track issued names in AddPrecertificate. (#4573)
Prev. we inserted data for tracking issued names into the `issuedNames` table
during `sa.AddCertificate`. A more robust solution is to do this during
`sa.AddPrecertificate` since this is when we've truly committed to having
issued for the names.

The new SA `WriteIssuedNamesPrecert` feature flag enables writing this table
during `AddPrecertificate`. The legacy behaviour continues with the flag
enabled or disabled but is updated to tolerate duplicate INSERT errors so that
it is possible to deploy this change across multiple SA instances safely.

Along the way I also updated `SA.AddPrecertificate` to perform its two
`INSERT`s in a transaction using the `db.WithTransaction` wrapper.

Resolves https://github.com/letsencrypt/boulder/issues/4565
2019-11-26 13:43:32 -05:00
Jacob Hoffman-Andrews 42d70dd478 SA: Deprecate GetAuthorizationsPerf flag. (#4576)
In the process I tweaked a few variable names in GetAuthorizations2 to
refer to just "authz" instead of "authz2" because it made things
clearer, particularly in the case of authz2IDMap, which is a map of
whether a given ID exists, not a map from authz's to IDs.

Fixes #4564
2019-11-25 09:54:54 -05:00
Daniel McCarney 6ed62cf746
RA: reject Contacts that marshal too long for DB. (#4575)
In the deep dark history of Boulder we ended up jamming contacts into
a VARCHAR db field. We need to make sure that when contacts are
marshaled the resulting bytes will fit into the column or a 500 will
be returned to the user when the SA RPC fails.

One day we should fix this properly and not return a hacky error message
that's hard for users to understand. Unfortunately that will likely
require a migration or a new DB table. In the shorter term this hack
will prevent 500s which is a clear improvement.
2019-11-22 15:13:53 -05:00
Daniel McCarney fde145ab96
RA: implement stricter email validation. (#4574)
Prev. we weren't checking the domain portion of an email contact address
very strictly in the RA. This updates the PA to export a function that
can be used to validate the domain the same way we validate domain
portions of DNS type identifiers for issuance.

This also changes the RA to use the `invalidEmail` error type in more
places.

A new Go integration test is added that checks these errors end-to-end
for both account creation and account update.
2019-11-22 13:39:31 -05:00
Daniel McCarney a86ed0f753
RA: fix error returned through WFE2 for too big NewOrders. (#4572)
We need the RA's `NewOrder` RPC to return a `berrors.Malformed` instance
when there are too many identifiers. A bare error will be turned into
a server internal problem by the WFE2's `web.ProblemDetailsForError`
call while a `berrors.Malformed` will produce the expected malformed
problem.

This commit fixes the err, updates the unit test, and adds an end-to-end
integration test so we don't mess this up again.
2019-11-21 13:54:49 -05:00
Daniel McCarney 4e9ab5f04e
deps: update to eggsampler/acme/v3, run tidy, re-enable parallel tests (#4568)
This updates the `github.com/eggsampler/acme` dependency used in our Go-based
integration tests to v3. Notably this fixes a data race we encountered in CI.
With the data race fixed this branch can also revert
54a798b7f6 and resolve
https://github.com/letsencrypt/boulder/issues/4542

I ran a `go mod tidy` to cleanup the old `v2` copy of the dep and it also
removed a few stale cfssl/mysql items from the `go.mod`.

Upstream library's tests are confirmed to pass:
```
~/go/src/github.com/eggsampler/acme$ git log --pretty=format:'%h' -n 1
b581dc6

~/go/src/github.com/eggsampler/acme$ make pebble
mkdir -p /home/daniel/go/src/github.com/letsencrypt/pebble
git clone --depth 1 https://github.com/letsencrypt/pebble.git /home/daniel/go/src/github.com/letsencrypt/pebble \
	|| (cd /home/daniel/go/src/github.com/letsencrypt/pebble; git checkout -f master && git reset --hard HEAD && git pull -q)
fatal: destination path '/home/daniel/go/src/github.com/letsencrypt/pebble' already exists and is not an empty directory.
Already on 'master'
Your branch is up-to-date with 'le/master'.
HEAD is now at 6c2d514 wfe: compare Identifier.Type with acme.IndentifierIP (#287)
docker-compose -f /home/daniel/go/src/github.com/letsencrypt/pebble/docker-compose.yml up -d
Creating network "pebble_acmenet" with driver "bridge"
Creating pebble_challtestsrv_1 ... done
Creating pebble_pebble_1       ... done
while ! wget --delete-after -q --no-check-certificate "https://localhost:14000/dir" ; do sleep 1 ; done
go clean -testcache
go test -race -coverprofile=coverage_18.txt -covermode=atomic github.com/eggsampler/acme/v3
ok  	github.com/eggsampler/acme/v3	24.292s	coverage: 83.0% of statements
docker-compose -f /home/daniel/go/src/github.com/letsencrypt/pebble/docker-compose.yml down
Stopping pebble_pebble_1       ... done
Stopping pebble_challtestsrv_1 ... done
Removing pebble_pebble_1       ... done
Removing pebble_challtestsrv_1 ... done
Removing network pebble_acmenet
```
2019-11-21 09:23:12 -05:00
Daniel McCarney fabfba2e16 expiration-mailer: fix nagsAtCapacity to reset. (#4569)
When a nag group hits capacity we set the nagsAtCapacity gauge to 1.
This gauge also needs to be reset to 0 when the nag group is no longer
at capacity.
2019-11-20 19:32:58 -08:00
Daniel McCarney 4c2613a8e5 deps: update weppos/publicsuffix-go and zmap/zlint. (#4563)
Updates `github.com/weppos/publicsuffix-go` to 3dd5f42, and
`github.com/zmap/zlint` to eea5fe8. Both hashes are the tip of master at
the time of writing.

Unit tests are confirmed to pass:
```
~/go/src/github.com/weppos/publicsuffix-go$ git log --pretty=format:'%h' -n 1
3dd5f42

~/go/src/github.com/weppos/publicsuffix-go$ go test ./...
?   	github.com/weppos/publicsuffix-go/cmd/load	[no test files]
ok  	github.com/weppos/publicsuffix-go/net/publicsuffix	0.008s
ok  	github.com/weppos/publicsuffix-go/publicsuffix	0.005s
?   	github.com/weppos/publicsuffix-go/publicsuffix/generator	[no test files]

~/go/src/github.com/zmap/zlint$ git log --pretty=format:'%h' -n 1
eea5fe8

~/go/src/github.com/zmap/zlint$ go test ./...
ok  	github.com/zmap/zlint	0.240s
?   	github.com/zmap/zlint/cmd/zlint	[no test files]
?   	github.com/zmap/zlint/cmd/zlint-gtld-update	[no test files]
ok  	github.com/zmap/zlint/lints	0.156s
ok  	github.com/zmap/zlint/util	0.020s
```
2019-11-20 19:31:22 -08:00
Jacob Hoffman-Andrews 36e504f21f integration: log slow queries and check for them (#4554)
I couldn't get this to work cleanly with
`--log-queries-not-using-indexes` because a couple of queries show up
during integration test runs, seemingly because the tables involved are
small enough that the optimizer finds it faster to skip the index.

Some possible followups:
 - Allow list those queries, or
 - Preload the DB with a certain number of certificates before the start
   of testing.
2019-11-19 13:44:24 -05:00
Daniel McCarney 953885aec1
admin-revoker: use WithTransaction (#4560)
This is a small clean-up I spotted while migrating the `WithTransaction` wrapper
out of the `sa` package into `db` during #4544.

The `admin-revoker` util. was using bare transactions with the `db.Rollback`
(prev `sa.Rollback`) helper function instead of the newly exported
`db.WithTransaction` wrapper. The latter is safer so we should use it here too.

After this change all of the external consumers of the `Rollback` function have
been switched to using `WithTransaction` so we can unexport `Rollback`.
2019-11-18 15:49:25 -05:00
Daniel McCarney bfa6bcfecd
CA: add orphans and adopted_orphans prom. counters (#4558)
The `orphans` Prometheus `CounterVec` is used to count orphans that
couldn't be confirmed saved by the SA and were queued by the CA.

The `adopted_orphans` `CounterVec` is used to count orphans pulled from
the queue by the CA and successfully integrated through to the SA.

Both counter stats are labelled by "type", e.g. "precert" or "cert".
2019-11-18 15:28:22 -05:00
Daniel McCarney 47b875f01f
SA: migrate `fqdnSets.id`, `issuedNames.id` to BIGINT. (#4557)
Based on the volume of data Boulder supports we use `BIGINT(20)` for
database ID fields throughout all of our tables except for two that were
missed: `fqdnSets` and `issuedNames`. Prior to this migration both were
using `INT(11)`, allowing only values up to 2,147,483,647. After the
migration is applied the `BIGINT(20)` type allows values up to 2^63-1.
2019-11-18 13:35:20 -05:00
Roland Bracewell Shoemaker b557d870c7 CA/SA: Store issuer info in certificateStatus, use for OCSP generation (#4546)
This avoids needing to send the entire certificate in OCSP generation
RPCs.

Ended up including a few cleanups that made the implementation easier.

Initially I was struggling with how to derive the issuer identification info.
We could just stick the full SPKI hash in certificateStatus, but that takes a
significant amount of space, we could configure unique issuer IDs in the CA
config, but that would require being very careful about keeping the IDs
constant, and never reusing an ID, or we could store issuers in a table in the
database and use that as a lookup table, but that requires figuring out how to
get that info into the table etc. Instead I've just gone with what I found to
be the easiest solution, deriving a stable ID from the cert hash. This means we
don't need to remember to configure anything special and the CA config stays
the same as it is now.

Fixes #4469.
2019-11-18 09:15:29 -05:00
Daniel McCarney 6ed4ce23a8
bdns: move logDNSError to exchangeOne, log ErrId specially. (#4553)
We've found we need the context offered from logging the error closer to when it
happens in the `bdns` package rather than in the `va`. Adopting the function
requires adapting it slightly. Specifically in the new location we know it won't
be called with any timeout results, with a non-dns error, or with a nil
underlying error.

Having the logging done in `bdns` (and specifically from `exchangeOne`) also
lets us log the wire format of the query and response when we get a `dns.ErrId`
error indicating a query/response ID mismatch. A small unit test is included
that ensures the logging happens as expected.

In case it proves useful for matching against other metrics the DNS ID mismatch
error case also now increments a dedicated prometheus counter vector stat,
`dns_id_mismatch`. The stat is labelled by resolver and query type.

Resolves https://github.com/letsencrypt/boulder/issues/4532
2019-11-15 16:03:45 -05:00
Jacob Hoffman-Andrews 0804e97d88 Update mysql to b4242ba (latest). (#4552)
The most recent tagged release of mysql is v1.4.1, from a year ago. It
also happens to pull in an unwanted dependency (appengine) that the
latest commit does not.

Tests pass:

$ go test -count=1 github.com/go-sql-driver/mysql
ok github.com/go-sql-driver/mysql 0.068s

Fixes #4530
2019-11-15 12:29:14 -08:00
Roland Bracewell Shoemaker f24fd0dfc8 Cleanup leftovers from PrecertificateOCSP deprecation (#4551)
Cleans up a few things that were left out of #4465.
2019-11-14 15:23:48 -08:00
Roland Bracewell Shoemaker b8ee84da7b
Switch GenerateOCSP to directly use protos instead of wrapper (#4549) 2019-11-14 11:10:33 -08:00
Jacob Hoffman-Andrews ef18f4c1a1 SA: use stdlib context instead of golang version. (#4547)
Match the context `import` in `sa/precertificates.go` to the rest of Boulder.
2019-11-14 12:50:35 -05:00
Daniel McCarney df059e093b
janitor: add cleanup of Orders and assoc. rows. (#4544)
The `boulder-janitor` is extended to cleanup rows from the `orders` table that
have expired beyond the configured grace period, and the associated referencing
rows in `requestedNames`, `orderFqdnSets`, and `orderToAuthz2`.

To make implementing the transaction work for the deletions easier/consistent
I lifted the SA's `WithTransaction` code and assoc. functions to a new shared
`db` package. This also let me drop the one-off `janitorDb` interface from the
existing code.

There is an associated change to the `GRANT` statements for the `janitor` DB
user to allow it to find/delete the rows related to orders.

Resolves https://github.com/letsencrypt/boulder/issues/4527
2019-11-13 13:47:55 -05:00
Jacob Hoffman-Andrews 88236799d5 docs: remove ACMEv1 comment from CONTRIBUTING.md. (#4545)
This comment was there mainly to indicate that you should get protocol
changes made in ACME before implementing them in Boulder. Since the
protocol is done, this is no longer an issue. In practice we don't often
see people proposing Boulder changes that are incompatible with the
spec, so I don't think we need this line anymore.

Fixes #4541
2019-11-12 09:02:48 -05:00
Jacob Hoffman-Andrews 2ab1729a18 CA: use mockable clock in CA's OCSP signer. (#4539)
This brings OCSP signing into alignment with the other components of the
CA in that they use ca.clk, which can be mocked out in unittests.

This tweaks test_ocsp_exp_unauth to be compatible with the change.

Fixes #4441.
2019-11-11 16:17:46 -05:00
Roland Bracewell Shoemaker a44f346f88 SA: remove old authorization tables from the schema (#4538) 2019-11-11 16:17:16 -05:00
Roland Bracewell Shoemaker 54a798b7f6 tests: temporarily disable race detector for golang int. tests (#4543) 2019-11-11 14:47:54 -05:00
Jacob Hoffman-Andrews e392b28e76 PA: remove 230-byte limit on DNS names. (#4537)
This was due to an obsolete storage constraint.

Fixes #4516
2019-11-11 10:06:28 -05:00
Daniel McCarney df6b507aa9
test: fix TestPrecertificateOCSP flake. (#4536)
Since 6f71c0c switched the Go integration tests to run in parallel the
`TestPrecertificateOCSP` test has been flaky. To fix the flake the test
needs to be changed to be resilient to precertificates other than the
one it is expecting being returned by the ct-test-srv since other tests
are also concurrently using it.
2019-11-08 16:29:30 -05:00
Roland Bracewell Shoemaker 6f71c0c453 tests: run golang integration tests in parallel w/ race detector (#4533) 2019-11-08 15:10:21 -05:00
Roland Bracewell Shoemaker 17d84ce18b Add GetAuthorizationsPerf feature to test/config-next/sa.json (#4534)
Seem to have forgotten to do this in #4512.
2019-11-08 11:29:54 -08:00
Daniel McCarney 32ad79a0df deps: rerun the gopkg.in/go-gorp/gorp.v2 go get. (#4531)
Newer Go versions seem to give a different psuedoversion for this
dependency at the same commit than when we initially switched to Go
modules for Boulder. Fixing the psuedoversion now so it won't trip up
future updates unexpectedly.
2019-11-07 10:21:28 -08:00
Daniel McCarney 6da5e18a1e deps: update CFSSL to v1.4.0 (#4529)
This keeps us on a tagged release and includes only small bugfixes/doc
updates.
2019-11-06 14:23:22 -08:00
Jacob Hoffman-Andrews 5e608ccbb8 tests: use relative paths in Go integration tests. (#4526)
This makes it simpler to specify paths to testdata and config files.

Fixes #4508
2019-11-05 10:22:20 -05:00
Roland Bracewell Shoemaker e402156c1c Revert "Revert "Remove remaining old format authorization code from SA/… (#4502)" (#4524)
This reverts commit dc2ce4ca84.
2019-11-04 09:45:19 -05:00
Jacob Hoffman-Andrews 49043a4156 Clarify public key blocklist documentation. (#4523)
Previously, we referred to "DER encoded PKIX public keys", but PKIX (RFC 5280)
doesn't define a standalone "public key" type. Instead, it defines
SubjectPublicKeyInfo, containing an algorithm and a BIT STRING. As a
result, SPKI and SPKI hash are more commonly used terms, and we're more
likely to get reports based on those. We should mirror that terminology
in our documentation.
2019-11-04 09:10:36 -05:00
Jacob Hoffman-Andrews 7f6caddc5b VA: log internal DNS errors. (#4520)
When we get a DNS error that has an internal cause (like connection
refused), we return a generic message like "networking error" to the
user to avoid revealing details that would be confusing. However, when
debugging problems with our own services, it's useful to have the
underlying errors.

This adds a helper method in the VA and calls it from each place we use
DNS errors.
2019-11-04 09:09:24 -05:00
stilez 2461c9fc5b PA: make user-facing error messages more intuitive (#4513)
LE is popular and aims to popularise certificate issuance. End users who see
error messages cannot be assumed to be as DNS-experienced as previously. The
user-facing error messages in the policy authority file are terse and unobvious
to the point that they are often unlikely to be well understood by those they
are intended to inform, who may be "just trying to get a LE cert for their
domain".
2019-11-01 10:25:55 -04:00
Roland Bracewell Shoemaker e49b6d7c61 SA: remove JOIN from GetAuthorizations2 and filter in code (#4512)
Previously we used a JOIN on the orderToAuthz2 table in order to make sure
we only returned authorizations created using the ACME v2 API. Each time an
order is created a pivot row (order ID + authz ID) is added to the
orderToAuthz2 table. If a large number of orders are created that all contain
the same authorization, due to reuse, then the JOINd query would return a full
authorization row for each entry in the orderToAuthz2 table with the authorization
ID.

Instead we now filter out these authorizations by doing a second query against
the orderToAuthz2 table. Using this query still requires examining a large number
of rows, but because we don't need to construct a temporary table for the JOIN
and fill it with all the full authorization rows we should save resources.

Fixes #4500.
2019-10-31 13:25:32 -04:00
Daniel McCarney 7b60b57c33
va: log account ID in multi VA differential JSON. (#4521)
This will reduce the amount of analysis time required to identify
large integrators that aren't compatible with multi VA.
2019-10-31 13:12:28 -04:00
Roland Bracewell Shoemaker e5eb8f8736 wfe/wfe2: make JWS signature alg error msgs match reality (#4519)
Errors that were being returned in the checkAlgorithm methods of both wfe and
wfe2 didn't really match up to what was actually being checked. This change
attempts to bring the errors in line with what is actually being tested.

Fixes #4452.
2019-10-31 09:55:11 -04:00
Daniel McCarney e448e81dc4 deps: update square/go-jose to v2.4.0 (#4518)
This branch also updates the WFE2 parseJWS function to match the error string fixed in the upstream project for the case where a JWS EC public key fails to unmarshal due to an incorrect length.

Resolves #4300
2019-10-30 10:59:41 -07:00
alexzorin a36a90519b Fix start.py/docker-compose (#4510)
fakeclock became a mandatory parameter in #4509
2019-10-28 11:00:43 -07:00
Jacob Hoffman-Andrews 0c9ca050ab Tidy up default_config_dir in integration test (#4509)
We now expect that the config dir is always set, so we make that
explicit in the integration test and error if that's not true.

This change also renames the variable to just "config_dir", and removes
the parameter to startservers.start, which is currently never set to
anything other than its default value.

This also explicitly sets the environment variable in .travis.yml.
2019-10-25 09:51:48 -07:00
Jacob Hoffman-Andrews d4168626ad Fix orphan-finder (#4507)
This creates the correct type of backend service for the OCSP generator.
It also adds an invocation of orphan-finder during the integration
tests.

This also adds a minor safety check to SA that I hit while writing the
test. Without this safety check, passing a certificate with no DNSNames
to AddCertificate would result in an obscure MariaDB syntax error
without enough context to track it down. In normal circumstances this
shouldn't be hit, but it will be good to have a solid error message if
we hit it in tests sometime.

Also, this tweaks the .travis.yml so it explicitly sets BOULDER_CONFIG_DIR
to test/config in the default case. Because the docker-compose run
command uses -e BOULDER_CONFIG_DIR="${BOULDER_CONFIG_DIR}",
we were setting a blank BOULDER_CONFIG_DIR in default case.
Since the Python startservers script sets a default if BOULDER_CONFIG_DIR
is not set, we haven't noticed this before. But since this test case relies
on the actual environment variable, it became an issue.

Fixes #4499
2019-10-25 09:51:14 -07:00
Jacob Hoffman-Andrews 329e4154cd
Deprecate EarlyOrderRateLimit and FasterGetOrder (#4497)
These feature flags are already turned on in production.
2019-10-24 10:47:29 -07:00
Roland Bracewell Shoemaker 83aafd1884
Address #4476 issues (#4504)
Addresses two issues introduced in #4476:
* Keep setting the V2 field in modelToAuthzPB so RPCs returned from new components to old don't cause panics
* Don't return expired orders from the SA, so that users requesting old orders that contain old style authorizations don't cause breakage in the RA
2019-10-23 13:08:32 -07:00
Daniel McCarney 3175b4f9eb
web: strip :443/:80 unconditionally w/ features.StripDefaultSchemePort (#4505)
Only removing :443 when the http.Request.TLS is not nil breaks when
Boulder's WFE/WFE2 are running HTTP behind a separate ingress proxy that
terminates HTTPS on its behalf.
2019-10-23 15:17:13 -04:00
Roland Bracewell Shoemaker dc2ce4ca84
Revert "Remove remaining old format authorization code from SA/… (#4502)
We need to apply some fixes for bugs introduced in #4476 before it can be deployed, as such we need to revert #4495 as there needs to be a full deploy cycle between these two changes.

This reverts commit 3ae1ae1.

😭
2019-10-23 10:45:29 -07:00
Roland Bracewell Shoemaker 3ae1ae1493 Remove remaining old format authorization code from SA/protos (#4495) 2019-10-23 09:08:38 -04:00
Jacob Hoffman-Andrews 672bdcfdcb orphan-finder: Rename CAService in config. (#4496)
OCSPGeneratorService matches the semantics better, and is what
ocsp-updater uses. It also matches what's in the config-next.

This wasn't caught by integration tests because we don't currently
run orphan-finder in the integration tests. We don't have a good way
to induce failures in the SA on demand.
2019-10-22 09:25:11 -07:00
Roland Bracewell Shoemaker 46e0468220 Make authz2 the default storage format (#4476)
This change set makes the authz2 storage format the default format. It removes
most of the functionality related to the previous storage format, except for
the SA fallbacks and old gRPC methods which have been left for a follow-up
change in order to make these changes deployable without introducing
incompatibilities.

Fixes #4454.
2019-10-21 15:29:15 -04:00