Commit Graph

3835 Commits

Author SHA1 Message Date
Roland Shoemaker 7189bbc79c Review fixes pt. 1 2017-02-22 13:56:08 -08:00
Daniel McCarney e23b4b6896 Adds script for tagging boulder-tools images. (#2577)
This commit adds a small script `tag_and_upload.sh` that:

1) Builds the boulder-tools image with the correct tag
2) Prompts you to log in to dockerhub
3) Pushes the boulder-tools image

This means I won't have to remember how to do this next time we need to
bump our Go version :-)
2017-02-21 12:00:44 -05:00
David Calavera 0dc2513d2d
Generate GRPC objects with Go 1.8.
Signed-off-by: David Calavera <david.calavera@gmail.com>
2017-02-21 12:11:17 +01:00
David Calavera 0d1cc66cab
Update to Go 1.8.0.
Signed-off-by: David Calavera <david.calavera@gmail.com>
2017-02-21 10:57:57 +01:00
Sophie Herold db3a6d6507 Draft-05 divergences and sync with release (#2573)
Updates acme-divergences.md for draft-05 changes.
2017-02-20 12:30:00 -05:00
Roland Shoemaker 415aa9598a Revert accidental inclusion of unrelated code 2017-02-16 13:20:07 -08:00
Daniel McCarney 81542c426d Removes explicit `boulder-tools` docker pull cmd. (#2569)
Remove explicit `boulder-tools` docker pull cmd.

Per @jsha's comment in #2567 it should be possible to remove the explicit docker pull letsencrypt/boulder-tools since the docker-compose pull that precedes it will take care of it.
2017-02-16 15:57:54 -05:00
Daniel McCarney fcf361c327 Remove CertStatusOptimizationsMigrated Feature Flag & Assoc. Cruft (#2561)
The NotAfter and IsExpired fields on the certificateStatus table
have been migrated in staging & production. Similarly the
CertStatusOptimizationsMigrated feature flag has been turned on after
a successful backfill operation. We have confirmed the optimization is
working as expected and can now clean out the duplicated v1 and v2
models, and the feature flag branching. The notafter-backfill command
is no longer useful and so this commit also cleans it out of the repo.

Note: Some unit tests were sidestepping the SA and inserting
certificateStatus rows explicitly. These tests had to be updated to
set the NotAfter field in order for the queries used by the
ocsp-updater and the expiration-mailer to perform the way the tests
originally expected.

Resolves #2530
2017-02-16 11:35:00 -08:00
Daniel McCarney 0fe8ae01f1 Adds explicit tag to Travis boulder-tools pull. (#2566)
Previously we had a 'latest' docker image tag for the `boulder-tools`
image that was *not* the most recent. This was confusing so we deleted
it this morning to close #2030.

Unfortunately Travis was defaulting to pulling the "latest" tag since
one wasn't specified (as we do in `Dockerfile` and
`docker-compose.yml`), resulting in build breakage:

```
docker pull letsencrypt/boulder-tools
Using default tag: latest
Pulling repository docker.io/letsencrypt/boulder-tools
Tag latest not found in repository docker.io/letsencrypt/boulder-tools
The command "docker pull letsencrypt/boulder-tools" failed and exited
with 1
```

This commit specifies the same tag as in `Dockerfile` and
`docker-compose.yml` for Travis. We will need to update this tag when we
update the other places for a new boulder-tools image.
2017-02-15 17:19:36 -05:00
Jacob Hoffman-Andrews 1a92b5df28 Link to instructions to reset Docker. (#2563)
* Link to instructions to reset Docker.

* Use correct terms.
2017-02-15 10:41:56 -05:00
Roland Shoemaker 811740924a Revert docker-compose change 2017-02-14 21:00:16 -08:00
Roland Shoemaker fb4cdebfdc Revert pkcs11 config changes 2017-02-14 12:57:27 -08:00
Jacob Hoffman-Andrews 15bb5a8027 Properly close httptest Servers. (#2560)
Rolling forward #2110 now that we are on a modern Go.
2017-02-14 15:08:08 -05:00
Roland Bracewell Shoemaker 0c04fe2f5e Move error wrapping/unwrapping into the interceptors (#2556)
Instead of using `unwrapError/wrapError` in each of the wrapper functions do it in the server/client interceptors instead. This means we now consistently do error unwrapping/wrapping.

Fixes #2509.
2017-02-13 12:56:23 -05:00
Jacob Hoffman-Andrews 154ee0af3b Add DNS challenge to integration test. (#2548)
Part of #2521.
2017-02-13 09:17:13 -08:00
Jacob Hoffman-Andrews 1b994083ba Use latest Certbot in boulder-tools. (#2554)
This allows us to iterate more easily against the current acme module.

Also, remove nodejs from boulder-tools, clean up a few packages that weren't
previously cleaned up, and install a specific version of protoc-gen-go to match
our vendored grpc.
2017-02-09 16:10:01 -08:00
Simone Carletti affa0e92cd Upgrade the PSL (and publicsuffix-go to v0.3.2) (#2553)
In the last weeks we made some large changes to the list of .RU and .SU domains in the PSL, due to some very old policy changes at the registry (2009) and more recent follow up.

Given the amount of pressure about these changes from certain users, most certainly because of LE limits, I figured you'll soon have people asking you to merge the changes. I've packaged a new release of publicsuffix-go, and updated the dependency in this PR.

$ git show master

commit c5490f26d8f43b84857ac54e23387b8ed9b100dd
Author: Simone Carletti <weppos@weppos.net>
Date:   Tue Feb 7 23:26:14 2017 +0100

    Release 0.3.2
➜  publicsuffix-go git:(master) go test ./...
?   	github.com/weppos/publicsuffix-go/cmd/load	[no test files]
ok  	github.com/weppos/publicsuffix-go/net/publicsuffix	0.023s
ok  	github.com/weppos/publicsuffix-go/publicsuffix	0.039s

Please note this release also includes the .ONION as per publicsuffix/list#374
2017-02-07 14:59:48 -08:00
Patrick Figel 6ba8aadfd7 Use X.509 AIA Issuer URL in rel="up" link header (#2545)
In order to provide the correct issuer certificate for older certificates after an issuer certificate rollover or when using multiple issuer certificates (e.g. RSA and ECDSA), use the AIA CA Issuer URL embedded in the certificate for the rel="up" link served by WFE. This behaviour is gated behind the UseAIAIssuerURL feature, which defaults to false.

To prevent MitM vulnerabilities in cases where the AIA URL is HTTP-only, it is upgraded to HTTPS.

This also adds a test for the issuer URL returned by the /acme/cert endpoint. wfe/test/178.{crt,key} were regenerated to add the AIA extension required to pass the test.

/acme/cert was changed to return an absolute URL to the issuer endpoint (making it consistent with /acme/new-cert).

Fixes #1663
Based on #1780
2017-02-07 11:19:22 -08:00
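As a rough illustration of the HTTPS upgrade described in 6ba8aadfd7, the sketch below rewrites the scheme of an AIA URL with `net/url`. The helper name `upgradeToHTTPS` and the example URL are assumptions made for this note and are not taken from the WFE code.

```go
package main

import (
	"fmt"
	"net/url"
)

// upgradeToHTTPS is a hypothetical helper: it rewrites an http:// URL to
// https:// and leaves any other URL unchanged.
func upgradeToHTTPS(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme == "http" {
		u.Scheme = "https"
	}
	return u.String(), nil
}

func main() {
	// The URL is an invented example, not a real issuer endpoint.
	out, err := upgradeToHTTPS("http://ca.example/acme/issuer-cert")
	fmt.Println(out, err)
}
```

Only the scheme changes; host, path and query are kept as parsed.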
Roland Bracewell Shoemaker 18de73f0d8 Pass nil errors through boulder/grpc wrapError/unwrapError (#2544)
Instead of trying to wrap or unwrap them which causes panics.

Also, expand the test_ct_submission integration test to include resubmissions.
2017-02-06 18:19:39 -08:00
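The nil pass-through from 18de73f0d8 can be sketched as below. The `wrappedError` type and the `wrap`/`unwrap` functions are hypothetical stand-ins, not the boulder/grpc `wrapError`/`unwrapError` implementations; the point is only that a nil error is returned untouched before anything dereferences it.

```go
package main

import (
	"errors"
	"fmt"
)

// wrappedError is a hypothetical transport-level error type.
type wrappedError struct{ msg string }

func (e *wrappedError) Error() string { return e.msg }

// wrap returns nil for a nil input instead of calling err.Error() on it,
// which would panic.
func wrap(err error) error {
	if err == nil {
		return nil
	}
	return &wrappedError{msg: err.Error()}
}

// unwrap reverses wrap and also passes nil through unchanged.
func unwrap(err error) error {
	if err == nil {
		return nil
	}
	if w, ok := err.(*wrappedError); ok {
		return errors.New(w.msg)
	}
	return err
}

func main() {
	fmt.Println(wrap(nil) == nil, unwrap(wrap(errors.New("boom"))))
}
```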
Jacob Hoffman-Andrews c00e4cb545 Remove test.js. (#2549)
It's been replaced with chisel.py, which uses the Python acme module.

Add instructions on installing dependencies for integration test.
2017-02-06 15:23:58 -08:00
Jacob Hoffman-Andrews 8521bd1d44 Merge pull request #2542 from letsencrypt/cpu-prom-auto-fix
This PR does two things:

1) It reverts the revert of #2474, restoring the Prometheus stats bridge work

2) It strips invalid metric name characters in `promAdjust`. The `promAdjust` function in `metrics/auto.go` previously allowed characters that were not valid in prometheus metric names (e.g. '>'). This PR updates `promAdjust` to remove invalid characters. The `TestPromAdjust` function is updated with testcases that include invalid characters.

(Note for reviewers: the diffs in cc896c9996 are the only "new" work in this PR to review)

Resolves https://github.com/letsencrypt/boulder/issues/2540

(Credit to @rolandshoemaker for spotting the problem while I chased ghosts!!!)
2017-02-02 15:01:24 -08:00
Daniel cc896c9996
Strips invalid characters in `promAdjust`.
The `promAdjust` function in `auto.go` previously allowed characters
that were not valid in prometheus metric names (e.g. '>'). This commit
updates `promAdjust` to remove invalid characters. The `TestPromAdjust`
function is updated with testcases that include invalid characters.
2017-02-01 16:07:33 -05:00
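A minimal sketch of the kind of filtering cc896c9996 describes, assuming the goal is to keep only characters that Prometheus accepts in metric names (`[a-zA-Z_:][a-zA-Z0-9_:]*`). The function `sanitizeMetricName` is hypothetical and is not Boulder's `promAdjust`; how the real function treats separators or leading digits is not shown in this log.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeMetricName is a hypothetical helper. It drops every character
// outside [a-zA-Z0-9_:], the set Prometheus allows in metric names.
func sanitizeMetricName(name string) string {
	var b strings.Builder
	for _, r := range name {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z',
			r >= '0' && r <= '9', r == '_', r == ':':
			b.WriteRune(r)
		}
	}
	out := b.String()
	// A metric name may not start with a digit, so prefix an underscore.
	if out != "" && out[0] >= '0' && out[0] <= '9' {
		out = "_" + out
	}
	return out
}

func main() {
	fmt.Println(sanitizeMetricName("RPC>Latency"))
}
```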
Daniel e88db3cd5e
Revert "Revert "Copy all statsd stats to Prometheus. (#2474)" (#2541)"
This reverts commit 9d9e4941a5 and
restores the statsd prometheus code.
2017-02-01 15:48:18 -05:00
Daniel McCarney 9d9e4941a5 Revert "Copy all statsd stats to Prometheus. (#2474)" (#2541)
This reverts commit 58ccd7a71a.

We are seeing multiple boulder components restart when they encounter the stat registration race condition described in https://github.com/letsencrypt/boulder/issues/2540
2017-02-01 12:50:27 -05:00
Jacob Hoffman-Andrews d012a87049 Remove specialized exit codes. (#2537)
Simply rely on exceptions from check_output.

Also, factor out common params for check_output into a `run` helper function.
Makes sure we always capture stderr into stdout.
2017-01-31 22:30:14 -08:00
Alex Jordan 1896461f32 Make the tests README less ugly (#2539)
All the big bold headers were pretty jarring.
2017-01-31 16:30:01 -05:00
Roland Bracewell Shoemaker a0da51ee7b Fix Akamai OCSP purging code (#2531)
Purge both query encoded and un-encoded OCSP GET URLs and fix a typo in the POST body key.
Since there is no public documentation of how the latter Akamai feature works, we were unable to
confirm that the previous spelling of `body-mdy` was a typo of `body-md5`, but active testing against
the staging environment and the Akamai API has confirmed that it was.

Fixes #2528.
2017-01-30 10:43:18 -08:00
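To make the "query encoded and un-encoded" distinction in a0da51ee7b concrete, here is a sketch that builds both forms of an OCSP GET URL from a request. It assumes the GET path is the responder base URL followed by the standard base64 encoding of the DER request; `purgeURLs`, the base URL and the request bytes are placeholders, and the actual purger code may construct its URLs differently.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/url"
)

// purgeURLs is a hypothetical helper. It returns the GET URL with the raw
// base64 request appended, plus the same URL with the base64 text
// percent-encoded, since standard base64 can contain '+', '/' and '='.
func purgeURLs(base string, ocspRequest []byte) []string {
	encoded := base64.StdEncoding.EncodeToString(ocspRequest)
	return []string{
		base + encoded,
		base + url.QueryEscape(encoded),
	}
}

func main() {
	// The bytes below are placeholders, not a valid OCSP request.
	for _, u := range purgeURLs("http://ocsp.example/", []byte{0xfb, 0xff}) {
		fmt.Println(u)
	}
}
```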
Daniel McCarney 00d11f126b Parse feature flags in all cmd's (#2534)
If you are the first person to add a feature to a Boulder command it's very
easy to forget to update the command's config structure to accommodate a
`map[string]bool` entry and to pass it to `features.Set` in `main()`. See
https://github.com/letsencrypt/boulder/issues/2533 for one example. I've
fallen into this trap myself a few times so I'm going to try and save myself
some future grief by fixing it across the board once and for all!

This PR adds a `Features` config entry and a corresponding `features.Set` to:
* ocsp-updater (resolves #2533)
* admin-revoker
* boulder-publisher
* contact-exporter
* expiration-mailer
* expired-authz-purger
* notify-mailer
* ocsp-responder
* orphan-finder

These components were skipped because they already had features supported:
* boulder-ca
* boulder-ra
* boulder-sa
* boulder-va
* boulder-wfe
* cert-checker

I deliberately skipped adding Feature support to:
* single-ocsp (Its only configuration comes from the pkcs11key library and
  doesn't support features)
* rabbitmq-setup (No configuration/features and we'll likely soon be rming this
  since the gRPC migration)
* notafter-backfill (This is a one-off that will be deleted soon)
2017-01-27 16:29:46 -05:00
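The pattern from 00d11f126b, a `map[string]bool` config entry handed to a setter in `main()`, might look roughly like this. Everything here is a stand-in: `config`, `flags`, `setFeatures` and `enabled` are hypothetical, and the sketch does not reproduce the signature or behaviour of Boulder's `features.Set`. The flag name `UseAIAIssuerURL` is borrowed from 6ba8aadfd7 purely as an example.

```go
package main

import "fmt"

// config is a hypothetical command config with a Features entry.
type config struct {
	Features map[string]bool
}

// flags is a hypothetical registry of known feature names and their current
// values; it is not Boulder's features package.
var flags = map[string]bool{"UseAIAIssuerURL": false}

// setFeatures applies the configured values and rejects unknown names
// before changing anything.
func setFeatures(requested map[string]bool) error {
	for name := range requested {
		if _, ok := flags[name]; !ok {
			return fmt.Errorf("unknown feature %q", name)
		}
	}
	for name, value := range requested {
		flags[name] = value
	}
	return nil
}

// enabled reports the current value of a feature.
func enabled(name string) bool { return flags[name] }

func main() {
	c := config{Features: map[string]bool{"UseAIAIssuerURL": true}}
	if err := setFeatures(c.Features); err != nil {
		fmt.Println("bad config:", err)
		return
	}
	fmt.Println(enabled("UseAIAIssuerURL"))
}
```

Forgetting either half (the config field or the call in `main()`) leaves the flag silently at its default, which is the trap the commit message describes.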
Daniel McCarney 3fa950ac58 Improve VA TLS-SNI-01 challenge failure error. (#2527)
Previous to this PR the VA's validateTLSWithZName function would
return an error message containing the SAN names of the leaf certificate
when the validation failed. This commit updates that message to include
the Subject Common Name of the leaf cert in addition to the SANs. The
names are deduplicated to prevent listing a Subj CN twice if it's also
a SAN. This will help debug cases where a cert with no SANs is returned
by the server.

In addition, the number of certificates in the chain received from the
server is included in the message. This will hopefully further help
users identify misconfiguration since a TLS SNI 01 challenge response
should have a chain length of 1.

Resolves #2468
2017-01-27 10:05:42 -08:00
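The deduplication described in 3fa950ac58 can be sketched as follows. `uniqueNames` is a hypothetical helper written for this note, not the code in `validateTLSWithZName`; it lists the Subject Common Name first, then any SANs not already seen.

```go
package main

import "fmt"

// uniqueNames is a hypothetical helper: it returns the common name followed
// by the SANs, skipping empty strings and anything already listed.
func uniqueNames(commonName string, sans []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, n := range append([]string{commonName}, sans...) {
		if n == "" || seen[n] {
			continue
		}
		seen[n] = true
		out = append(out, n)
	}
	return out
}

func main() {
	fmt.Println(uniqueNames("example.com", []string{"example.com", "www.example.com"}))
}
```

A certificate with no SANs still yields its CN, which is the debugging case the commit calls out.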
Roland Shoemaker 7b49bbfc40 Better info pt. 1 2017-01-26 11:01:18 -08:00
Jacob Hoffman-Andrews 01e78fbd1b Restore error check for config-next. (#2525)
This check was previously commented out because it would fail under gRPC, but
now that the underlying bug is fixed we can uncomment it.
2017-01-25 15:49:15 -05:00
Daniel McCarney 63e8d25394 Fixes Mandrill UNSUB merge tag (#2524)
Per the documentation we should be using *|UNSUB:url|* not |UNSUB:url|. I also confirmed that the template that predates the Boulder data/ version used *|UNSUB:https://mandrillapp.com/unsub|*.

I also moved the "WARNING" onto the preceding line. The unsubscribe URL is quite long and gnarly once it's been filled in and the warning could be missed in a MUA that doesn't wrap lines well.

This resolves #2523
2017-01-25 10:53:17 -08:00
Jacob Hoffman-Andrews 056defba86 Refactor integration test. (#2515)
Add a new tiny client called chisel, in place of test.js. This reduces the
number of language runtimes Boulder depends on for its tests. Also, since chisel
uses the acme Python library, we get more testing of that library, which
underlies Certbot. This also gives us more flexibility to hook different parts
of the issuance flows in our tests.

Reorganize integration-test.py itself. There was no clear separation of
specific test cases. Some test cases were added as part of run_node_test; some
were wrapped around it. We are now much closer to one function per test case.
Eventually we may be able to adopt Python's test infrastructure for these test
cases.

Remove some unused imports; consolidate on urllib2 instead of urllib.

For getting serial number and expiration date, replace shelling out to OpenSSL
with using pyOpenSSL, since we already have an in-memory parsed certificate.

Replace ISSUANCE_FAILED, REVOCATION_FAILED, MAILER_FAILED with simple die, since
we don't use these. Later, I'd like to remove the other specific exit codes. We
don't make very good use of them, and it would be more effective to just use
stack traces or, even better, reporting of which test cases failed.

Make test_single_ocsp_sign responsible for its own subprocess lifecycle.

Skip running startservers if WFE is already running, to make it easier to
iterate against a running Boulder (saves a few seconds of Boulder startup).
2017-01-25 10:50:04 -08:00
Roland Bracewell Shoemaker 7853532972 Encode challenge errors and validation records when handling protobufs (#2520)
Previously we had `Error` and `ValidationRecords` fields in the `Challenge` protobuf but they were never populated, which meant that when using gRPC these fields wouldn't be sent to the SA from the RA on a `FinalizeAuthorization` call. This change populates those fields and updates the PB marshaling tests to verify the correct behavior.

Fixes #2514.
2017-01-25 09:39:35 -05:00
Jacob Hoffman-Andrews ad3738bbf5 Robustify expired_authz_purger test. 2017-01-24 18:02:35 -08:00
Jacob Hoffman-Andrews ecd8d558f3 Review feedback. 2017-01-24 17:45:19 -08:00
Roland Shoemaker e841bddc38 Remove OCSP code and add some easy wfe overrides 2017-01-24 15:24:45 -08:00
Jacob Hoffman-Andrews 94bd21c082 Merge branch 'master' of github.com:letsencrypt/boulder into chisel2 2017-01-23 13:30:11 -08:00
Jacob Hoffman-Andrews 6c93b41f20 Add a limit on failed authorizations (#2513)
Fixes #976.

This implements a new rate limit, InvalidAuthorizationsPerAccount. If a given account fails authorization for a given hostname too many times within the window, subsequent new-authz attempts for that account and hostname will fail early with a rateLimited error. This mitigates the misconfigured clients that constantly retry authorization even though they always fail (e.g., because the hostname no longer resolves).

For the new rate limit, I added a new SA RPC, CountInvalidAuthorizations. I chose to implement this only in gRPC, not in AMQP-RPC, so checking the rate limit is gated on gRPC. See #2406 for some description of the how and why. I also chose to directly use the gRPC interfaces rather than wrapping them in core.StorageAuthority, as a step towards what we will want to do once we've moved fully to gRPC.

Because authorizations don't have a created time, we need to look at the expires time instead. Invalid authorizations retain the expiration they were given when they were created as pending authorizations, so we use now + pendingAuthorizationLifetime as one side of the window for rate limiting, and look backwards from there. Note that this means you could maliciously bypass this rate limit by stacking up pending authorizations over time, then failing them all at once.

Similarly, since this limit is by (account, hostname) rather than just (hostname), you can bypass it by creating multiple accounts. It would be more natural and robust to limit by hostname, like our certificate limits. However, we currently only have two indexes on the authz table: the primary key, and

(`registrationID`,`identifier`,`status`,`expires`)

Since this limit is intended mainly to combat misconfigured clients, I think this is sufficient for now.

Corresponding PR for website: letsencrypt/website#125
2017-01-23 11:22:51 -08:00
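The window arithmetic in 6c93b41f20 is easier to follow with numbers. The sketch below assumes an invalid authorization keeps `expires = created + pendingAuthorizationLifetime`, as the commit message states, and counts it when that expiry falls between `now + lifetime - window` and `now + lifetime`. The function names and the durations in `main` are hypothetical; the real query lives in the SA and is not reproduced here.

```go
package main

import (
	"fmt"
	"time"
)

// expiryWindow returns the range of `expires` values to count: the late edge
// is now + pendingLifetime, and the early edge is one window before that.
func expiryWindow(now time.Time, pendingLifetime, window time.Duration) (earliest, latest time.Time) {
	latest = now.Add(pendingLifetime)
	earliest = latest.Add(-window)
	return earliest, latest
}

// inWindow reports whether an expiry falls inside the range, inclusive.
func inWindow(expires, earliest, latest time.Time) bool {
	return !expires.Before(earliest) && !expires.After(latest)
}

// failedRecently models an authorization created `age` ago and checks
// whether its retained expiry lands inside the counting window.
func failedRecently(age, pendingLifetime, window time.Duration) bool {
	now := time.Now()
	expires := now.Add(-age).Add(pendingLifetime)
	earliest, latest := expiryWindow(now, pendingLifetime, window)
	return inWindow(expires, earliest, latest)
}

func main() {
	lifetime := 7 * 24 * time.Hour // hypothetical pending lifetime
	window := time.Hour            // hypothetical rate limit window
	fmt.Println(failedRecently(10*time.Minute, lifetime, window))
	fmt.Println(failedRecently(3*time.Hour, lifetime, window))
}
```

Because only the expiry is inspected, an authorization created long ago but failed just now falls outside the window, which is the stacking bypass the commit message mentions.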
Daniel McCarney 15e73edc5a Google Safe Browsing V4 Improvements (#2504)
This PR has three primary contributions:

1. The existing code for using the V4 safe browsing API introduced in #2446 had some bugs that are fixed in this PR.
2. A gsb-test-srv is added to provide a mock Google Safebrowsing V4 server for integration testing purposes.
3. A short integration test is added to test end-to-end GSB lookup for an "unsafe" domain.

For 1) most notably Boulder was assuming the new V4 library accepted a directory for its database persistence when it instead expects an existing file to be provided. Additionally the VA wasn't properly instantiating feature flags preventing the V4 api from being used by the VA.

For 2) the test server is designed to have a fixed set of "bad" domains (Currently just honest.achmeds.discount.hosting.com). When asked for a database update by a client it will package the list of bad domains up & send them to the client. When the client is asked to do a URL lookup it will check the local database for a matching prefix, and if found, perform a lookup against the test server. The test server will process the lookup and increment a count for how many times the bad domain was asked about.

For 3) the Boulder startservers.py was updated to start the gsb-test-srv and the VA is configured to talk to it using the V4 API. The integration test consists of attempting issuance for a domain pre-configured in the gsb-test-srv as a bad domain. If the issuance succeeds we know the GSB lookup code is faulty. If the issuance fails, we check that the gsb-test-srv received the correct number of lookups for the "bad" domain and fail if the count does not match.

Notes for reviewers:

* The gsb-test-srv has to be started before anything will use it. Right now the v4 library handles database update request failures poorly and will not retry for 30min. See google/safebrowsing#44 for more information.
* There's not an easy way to test for "good" domain lookups, only hits against the list. The design of the V4 API is such that a list of prefixes is delivered to the client in the db update phase and if the domain in question matches no prefixes then the lookup is deemed unnecessary and not performed. I experimented with sending 256 1 byte prefixes to try and trick the client to always do a lookup, but the min prefix size is 4 bytes and enumerating all possible prefixes seemed gross.
* The test server has a /add endpoint that could be used by integration tests to add new domains to the block list, but it isn't being used presently. The trouble is that the client only updates its database every 30 minutes at present, and so adding a new domain will only take effect after the client updates the database.

Resolves #2448
2017-01-23 11:07:20 -08:00
Jacob Hoffman-Andrews 7705b18a70 Refactor integration test.
Add a new tiny client called chisel, in place of test.js. This reduces the
number of language runtimes Boulder depends on for its tests. Also, since chisel
uses the acme Python library, we get more testing of that library, which
underlies Certbot. This also gives us more flexibility to hook different parts
of the issuance flows in our tests.

Reorganize integration-test.py itself. There was no clear separation of
specific test cases. Some test cases were added as part of run_node_test; some
were wrapped around it. We are now much closer to one function per test case.
Eventually we may be able to adopt Python's test infrastructure for these test
cases.

Remove some unused imports; consolidate on urllib2 instead of urllib.

For getting serial number and expiration date, replace shelling out to OpenSSL
with using pyOpenSSL, since we already have an in-memory parsed certificate.

Replace ISSUANCE_FAILED, REVOCATION_FAILED, MAILER_FAILED with simple die, since
we don't use these. Later, I'd like to remove the other specific exit codes. We
don't make very good use of them, and it would be more effective to just use
stack traces or, even better, reporting of which test cases failed.

Make single_ocsp_sign responsible for its own subprocess lifecycle.

Skip running startservers if WFE is already running, to make it easier to
iterate against a running Boulder (saves a few seconds of Boulder startup).
2017-01-22 20:51:27 -08:00
Roland Bracewell Shoemaker 170e37c675 Add a special error message if we are trying to talk TLS to a HTTP-only server (#2511)
If the VA fails to validate a TLS-SNI-01 challenge because it is trying to talk TLS to a HTTP-only server return a special error message that is slightly more informative.
2017-01-20 11:36:39 -05:00
Roland Bracewell Shoemaker b2a4a1692b Add counter for signatures (#2510)
Add a super basic counter for certificate and OCSP signatures so we have a slightly less noisy idea of our current HSM signing performance and where it is going.

Fixes #2438.
2017-01-20 11:33:09 -05:00
Jacob Hoffman-Andrews 16ab736c07 Temporarily switch to SIGKILL for startservers shutdown. (#2512)
Unfortunately our clean shutdown code paths are too noisy, and often obscure
real errors. We can turn this back to SIGTERM once that's fixed.
2017-01-19 16:45:43 -08:00
Roland Bracewell Shoemaker 7d7adabe44 Allow probs.ProblemDetails to be passed across gRPC layer (#2506)
Currently services will pass both `core.XXXError` and `probs.XXX` type errors across the gRPC layer. In the future (#2505) we intend to stop passing `probs.XXX` type errors across this layer but for now we need to support them until that change is landed. This patch takes the easiest path to allow this by encoding the `probs.ProblemDetails` to JSON and storing it in the gRPC error body so that it can be passed around.

Fixes #2497.
2017-01-19 14:59:44 -08:00
Roland Bracewell Shoemaker cb64fee358 Purge POST'd OCSP responses as well as GETs (#2449)
Akamai expects a special URL to be used to purge responses from POST'd requests, this change adds those URLs to the existing GET URLs when purging OCSP responses.

Note this assumes all POST requests encode their OCSP request in the same manner that Golang does, which is most likely not the case in some situations. In order to mimic all the possible formats we would need to write a bunch of custom ASN.1 definitions that conform with specific field use/order that browsers etc use and generate a URL for each of those as well.

Fixes #996.
2017-01-18 09:30:15 -08:00
Jacob Hoffman-Andrews 714ec98a0c Update OCSP load testing doc. (#2486)
Prefer up over start to allow prometheus container to find boulder.
Use ocspMinTimeToExpiry: 0h trick instead of updating DB manually.

Offer command to fill DB.

Offer Prometheus link to throughput graph.
2017-01-17 16:32:31 -08:00
Jacob Hoffman-Andrews 9dacdd5443 Fix SA wrappers for maps. (#2498)
We turn arrays into maps with a range command. Previously, we were taking the
address of the iteration variable in that range command, which meant incorrect
results since the iteration variable gets reassigned.

Also change the integration test to catch this error.

Fixes #2496
2017-01-17 14:07:07 -08:00
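The bug fixed in 9dacdd5443 is a classic one: in Go versions before 1.22 the `range` iteration variable is a single variable reused on every pass, so storing its address makes every map value point at the same slot. A minimal sketch of a safe pattern follows; the `authz` type and `toMap` function are hypothetical stand-ins for the SA wrapper code.

```go
package main

import "fmt"

// authz is a hypothetical record type used only for this illustration.
type authz struct {
	Name   string
	Status string
}

// toMap indexes into the slice so that every map value points at a distinct
// element, rather than at a loop variable that older Go versions reuse.
func toMap(items []authz) map[string]*authz {
	out := make(map[string]*authz, len(items))
	for i := range items {
		out[items[i].Name] = &items[i]
	}
	return out
}

func main() {
	m := toMap([]authz{{"a.example", "valid"}, {"b.example", "pending"}})
	fmt.Println(m["a.example"].Status, m["b.example"].Status)
}
```

Writing `for _, item := range items { out[item.Name] = &item }` instead would, on those older Go versions, leave both keys pointing at the last element.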
Roland Bracewell Shoemaker 80600fbadb Only send cache purge request if generated OCSP has 'revoked' status (#2487)
* Only send cache purge request if generated OCSP has 'revoked' status

* Only call sendPurge from generateRevokedResponse
2017-01-17 17:02:41 -05:00
Josh Soref 281de6d90e Fix auth line in email test (#2499)
Previously the comment was wrong (left off the leading `\0`), but the base64-encoded `AUTH` line was right.

This change also modified the password from `paswd` to `passwd`.
2017-01-17 09:46:58 -08:00
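For context on 281de6d90e: a SASL PLAIN payload is `authzid NUL authcid NUL passwd`, so with an empty authorization identity the string starts with a NUL byte before the username. A sketch of building such a line is below; `plainAuthLine` and the credentials are hypothetical and are not the values used in the email test.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// plainAuthLine builds an SMTP "AUTH PLAIN" line with an empty authorization
// identity, which is why the payload begins with a NUL byte.
func plainAuthLine(user, password string) string {
	payload := "\x00" + user + "\x00" + password
	return "AUTH PLAIN " + base64.StdEncoding.EncodeToString([]byte(payload))
}

func main() {
	// Placeholder credentials for illustration only.
	fmt.Println(plainAuthLine("user@example.com", "passwd"))
}
```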