This commit adds a small script `tag_and_upload.sh` that:
1) Builds the boulder-tools image with the correct tag
2) Prompts you to log in to dockerhub
3) Pushes the boulder-tools image
This means I won't have to remember how to do this next time we need to
bump our Go version :-)
Remove explicit `boulder-tools` docker pull cmd.
Per @jsha's comment in #2567 it should be possible to remove the explicit docker pull letsencrypt/boulder-tools since the docker-compose pull that precedes it will take care of it.
The NotAfter and IsExpired fields on the certificateStatus table
have been migrated in staging & production. Similarly the
CertStatusOptimizationsMigrated feature flag has been turned on after
a successful backfill operation. We have confirmed the optimization is
working as expected and can now clean out the duplicated v1 and v2
models, and the feature flag branching. The notafter-backfill command
is no longer useful and so this commit also cleans it out of the repo.
Note: Some unit tests were sidestepping the SA and inserting
certificateStatus rows explicitly. These tests had to be updated to
set the NotAfter field in order for the queries used by the
ocsp-updater and the expiration-mailer to perform the way the tests
originally expected.
Resolves#2530
Previously we had a 'latest' docker image tag for the `boulder-tools`
image that was *not* the most recent. This was confusing so we deleted
it this morning to close#2030.
Unfortunately Travis was defaulting to pulling the "latest" tag since
one wasn't specified (e.g. the way we do in `Dockerfile` and
`docker-compose.yml`). Resulting in build breakage:
```
docker pull letsencrypt/boulder-tools
Using default tag: latest
Pulling repository docker.io/letsencrypt/boulder-tools
Tag latest not found in repository docker.io/letsencrypt/boulder-tools
The command "docker pull letsencrypt/boulder-tools" failed and exited
with 1
```
This commit specifies the same tag as in `Dockerfile` and
`docker-compose.yml` for travis. We will need to update this tag when we
update the other places for a new boolder-tools image.
Instead of using `unwrapError/wrapError` in each of the wrapper functions do it in the server/client interceptors instead. This means we now consistently do error unwrapping/wrapping.
Fixes#2509.
This allows us to iterate more easily against the current acme module.
Also, remove nodejs from boulder-tools, clean up a few packages that weren't
previously cleaned up, and install a specific version of protoc-gen-go to match
our vendored grpc.
In the last weeks we made some large changes to the list of .RU and .SU domains in the PSL, due to some very old policy changes at the registry (2009) and more recent follow up.
Given the amount of pressure about these changes from certain users, most certainly because LE limits, I figured out you'll soon have people asking you to merge the changes. I've packaged a new release of publicsuffix-go, and updated the dependency in this PR.
$ git show master
commit c5490f26d8f43b84857ac54e23387b8ed9b100dd
Author: Simone Carletti <weppos@weppos.net>
Date: Tue Feb 7 23:26:14 2017 +0100
Release 0.3.2
➜ publicsuffix-go git:(master) go test ./...
? github.com/weppos/publicsuffix-go/cmd/load [no test files]
ok github.com/weppos/publicsuffix-go/net/publicsuffix 0.023s
ok github.com/weppos/publicsuffix-go/publicsuffix 0.039s
Please note this release also includes the .ONION as per publicsuffix/list#374
In order to provide the correct issuer certificate for older certificates after an issuer certificate rollover or when using multiple issuer certificates (e.g. RSA and ECDSA), use the AIA CA Issuer URL embedded in the certificate for the rel="up" link served by WFE. This behaviour is gated behind the UseAIAIssuerURL feature, which defaults to false.
To prevent MitM vulnerabilities in cases where the AIA URL is HTTP-only, it is upgraded to HTTPS.
This also adds a test for the issuer URL returned by the /acme/cert endpoint. wfe/test/178.{crt,key} were regenerated to add the AIA extension required to pass the test.
/acme/cert was changed to return an absolute URL to the issuer endpoint (making it consistent with /acme/new-cert).
Fixes#1663
Based on #1780
This PR does two things:
1) It reverts the revert of #2474, restoring the Prometheus stats bridge work
2) It strips invalid metric name characters in `promAdjust`. The `promAdjust` function in `metrics/auto.go` previously allowed characters that were not valid in prometheus metric names (e.g. '>'). This PR updates `promAdjust` to remove invalid characters. The `TestPromAdjust` function is updated with testcases that include invalid characters.
(Note for reviewers: the diffs in cc896c9996 are the only "new" work in this PR to review)
Resolves https://github.com/letsencrypt/boulder/issues/2540
(Credit to @rolandshoemaker for spotting the problem while I chased ghosts!!!)
The `promAdjust` function in `auto.go` previously allowed characters
that were not valid in prometheus metric names (e.g. '>'). This commit
updates `promAdjust` to remove invalid characters. The `TestPromAdjust`
function is updated with testcases that include invalid characters.
Simply rely on exceptions from check_output.
Also, factor out common params for check_output into a `run` helper function.
Makes sure we always capture stderr into stdout.
Purge both query encoded and un-encoded OCSP GET URLs and fix a typo in the POST body key.
Since there is no public documentation of how the latter Akamai feature works we were unable to
confirm that the previously spelling of `body-mdy` was a typo of `body-md5` but active testing against
the staging environment and the Akamai API has confirmed that it was.
Fixes#2528.
If you are the first person to add a feature to a Boulder command its very
easy to forget to update the command's config structure to accommodate a
`map[string]bool` entry and to pass it to `features.Set` in `main()`. See
https://github.com/letsencrypt/boulder/issues/2533 for one example. I've
fallen into this trap myself a few times so I'm going to try and save myself
some future grief by fixing it across the board once and for all!
This PR adds a `Features` config entry and a corresponding `features.Set` to:
* ocsp-updater (resolves#2533)
* admin-revoker
* boulder-publisher
* contact-exporter
* expiration-mailer
* expired-authz-purger
* notify-mailer
* ocsp-responder
* orphan-finder
These components were skipped because they already had features supported:
* boulder-ca
* boulder-ra
* boulder-sa
* boulder-va
* boulder-wfe
* cert-checker
I deliberately skipped adding Feature support to:
* single-ocsp (Its only configuration comes from the pkcs11key library and
doesn't support features)
* rabbitmq-setup (No configuration/features and we'll likely soon be rming this
since the gRPC migration)
* notafter-backfill (This is a one-off that will be deleted soon)
Previous to this PR the VA's validateTLSWithZName function would
return an error message containing the SAN names of the leaf certificate
when the validation failed. This commit updates that message to include
the Subject Common Name of the leaf cert in addition to the SANs. The
names are deduplicated to prevent listing a Subj CN twice if its also
a SAN. This will help debug cases where a cert with no SANs is returned
by the server.
In addition, the number of certificates in the chain received from the
server is included in the message. This will hopefully further help
users identify misconfiguration since a TLS SNI 01 challenge response
should have a chain length of 1.
Resolves#2468
Per the documentation we should be using *|UNSUB:url|* not |UNSUB:url|. I also confirmed that the template that predates the Boulder data/ version used *|UNSUB:https://mandrillapp.com/unsub|*.
I also moved the "WARNING" onto the preceding line. The unsubscribe URL is quite long and gnarly once its been filed in and the warning could be missed in a MUA that doesn't wrap lines well.
This resolves#2523
Add a new tiny client called chisel, in place of test.js. This reduces the
number of language runtimes Boulder depends on for its tests. Also, since chisel
uses the acme Python library, we get more testing of that library, which
underlies Certbot. This also gives us more flexibility to hook different parts
of the issuance flows in our tests.
Reorganize integration-test.py itself. There was not clear separation of
specific test cases. Some test cases were added as part of run_node_test; some
were wrapped around it. There is now much closer to one function per test case.
Eventually we may be able to adopt Python's test infrastructure for these test
cases.
Remove some unused imports; consolidate on urllib2 instead of urllib.
For getting serial number and expiration date, replace shelling out to OpenSSL
with using pyOpenSSL, since we already have an in-memory parsed certificate.
Replace ISSUANCE_FAILED, REVOCATION_FAILED, MAILER_FAILED with simple die, since
we don't use these. Later, I'd like to remove the other specific exit codes. We
don't make very good use of them, and it would be more effective to just use
stack traces or, even better, reporting of which test cases failed.
Make test_single_ocsp_sign responsible for its own subprocess lifecycle.
Skip running startservers if WFE is already running, to make it easier to
iterate against a running Boulder (saves a few seconds of Boulder startup).
Previously we had `Error` and `ValidationRecords` fields in the `Challenge` protobuf but they were never populated which mean't that when using gRPC these fields wouldn't be sent to the SA from the RA on a `FinalizeAuthorization` call. This change populates those fields and updates the PB marshaling tests to verify the correct behavior.
Fixes#2514.
Fixes#976.
This implements a new rate limit, InvalidAuthorizationsPerAccount. If a given account fails authorization for a given hostname too many times within the window, subsequent new-authz attempts for that account and hostname will fail early with a rateLimited error. This mitigates the misconfigured clients that constantly retry authorization even though they always fail (e.g., because the hostname no longer resolves).
For the new rate limit, I added a new SA RPC, CountInvalidAuthorizations. I chose to implement this only in gRPC, not in AMQP-RPC, so checking the rate limit is gated on gRPC. See #2406 for some description of the how and why. I also chose to directly use the gRPC interfaces rather than wrapping them in core.StorageAuthority, as a step towards what we will want to do once we've moved fully to gRPC.
Because authorizations don't have a created time, we need to look at the expires time instead. Invalid authorizations retain the expiration they were given when they were created as pending authorizations, so we use now + pendingAuthorizationLifetime as one side of the window for rate limiting, and look backwards from there. Note that this means you could maliciously bypass this rate limit by stacking up pending authorizations over time, then failing them all at once.
Similarly, since this limit is by (account, hostname) rather than just (hostname), you can bypass it by creating multiple accounts. It would be more natural and robust to limit by hostname, like our certificate limits. However, we currently only have two indexes on the authz table: the primary key, and
(`registrationID`,`identifier`,`status`,`expires`)
Since this limit is intended mainly to combat misconfigured clients, I think this is sufficient for now.
Corresponding PR for website: letsencrypt/website#125
This PR has three primary contributions:
1. The existing code for using the V4 safe browsing API introduced in #2446 had some bugs that are fixed in this PR.
2. A gsb-test-srv is added to provide a mock Google Safebrowsing V4 server for integration testing purposes.
3. A short integration test is added to test end-to-end GSB lookup for an "unsafe" domain.
For 1) most notably Boulder was assuming the new V4 library accepted a directory for its database persistence when it instead expects an existing file to be provided. Additionally the VA wasn't properly instantiating feature flags preventing the V4 api from being used by the VA.
For 2) the test server is designed to have a fixed set of "bad" domains (Currently just honest.achmeds.discount.hosting.com). When asked for a database update by a client it will package the list of bad domains up & send them to the client. When the client is asked to do a URL lookup it will check the local database for a matching prefix, and if found, perform a lookup against the test server. The test server will process the lookup and increment a count for how many times the bad domain was asked about.
For 3) the Boulder startservers.py was updated to start the gsb-test-srv and the VA is configured to talk to it using the V4 API. The integration test consists of attempting issuance for a domain pre-configured in the gsb-test-srv as a bad domain. If the issuance succeeds we know the GSB lookup code is faulty. If the issuance fails, we check that the gsb-test-srv received the correct number of lookups for the "bad" domain and fail if the expected isn't reality.
Notes for reviewers:
* The gsb-test-srv has to be started before anything will use it. Right now the v4 library handles database update request failures poorly and will not retry for 30min. See google/safebrowsing#44 for more information.
* There's not an easy way to test for "good" domain lookups, only hits against the list. The design of the V4 API is such that a list of prefixes is delivered to the client in the db update phase and if the domain in question matches no prefixes then the lookup is deemed unneccesary and not performed. I experimented with sending 256 1 byte prefixes to try and trick the client to always do a lookup, but the min prefix size is 4 bytes and enumerating all possible prefixes seemed gross.
* The test server has a /add endpoint that could be used by integration tests to add new domains to the block list, but it isn't being used presently. The trouble is that the client only updates its database every 30 minutes at present, and so adding a new domain will only take affect after the client updates the database.
Resolves#2448
Add a new tiny client called chisel, in place of test.js. This reduces the
number of language runtimes Boulder depends on for its tests. Also, since chisel
uses the acme Python library, we get more testing of that library, which
underlies Certbot. This also gives us more flexibility to hook different parts
of the issuance flows in our tests.
Reorganize integration-test.py itself. There was not clear separation of
specific test cases. Some test cases were added as part of run_node_test; some
were wrapped around it. There is now much closer to one function per test case.
Eventually we may be able to adopt Python's test infrastructure for these test
cases.
Remove some unused imports; consolidate on urllib2 instead of urllib.
For getting serial number and expiration date, replace shelling out to OpenSSL
with using pyOpenSSL, since we already have an in-memory parsed certificate.
Replace ISSUANCE_FAILED, REVOCATION_FAILED, MAILER_FAILED with simple die, since
we don't use these. Later, I'd like to remove the other specific exit codes. We
don't make very good use of them, and it would be more effective to just use
stack traces or, even better, reporting of which test cases failed.
Make single_ocsp_sign responsible for its own subprocess lifecycle.
Skip running startservers if WFE is already running, to make it easier to
iterate against a running Boulder (saves a few seconds of Boulder startup).
If the VA fails to validate a TLS-SNI-01 challenge because it is trying to talk TLS to a HTTP-only server return a special error message that is slightly more informative.
Add a super basic counter for certificate and OCSP signatures so we have a slightly less noisy idea of our current HSM signing performance and where it is going.
Fixes#2438.
Currently services will pass both `core.XXXError` and `probs.XXX` type errors across the gRPC layer. In the future (#2505) we intend to stop passing `probs.XXX` type errors across this layer but for now we need to support them until that change is landed. This patch takes the easiest path to allow this by encoding the `probs.ProblemDetails` to JSON and storing it in the gRPC error body so that it can be passed around.
Fixes#2497.
Akamai expects a special URL to be used to purge responses from POST'd requests, this change adds those URLs to the existing GET URLs when purging OCSP responses.
Note this assumes all POST requests encode their OCSP request in the same manner that Golang does, which is most likely not the case in some situations. In order to mimic all the possible formats we would need to write a bunch of custom ASN.1 definitions that conform with specific field use/order that browsers etc use and generate a URL for each of those as well.
Fixes#996.
Prefer up over start to allow prometheus container to find boulder.
Use ocspMinTimeToExpiry: 0h trick instead of updating DB manually.
Offer command to fill DB.
Offer Prometheus link to throughput graph.
We turn arrays into maps with a range command. Previously, we were taking the
address of the iteration variable in that range command, which meant incorrect
results since the iteration variable gets reassigned.
Also change the integration test to catch this error.
Fixes#2496
Previously the comment was wrong (left off the leading `\0`), but the base64-encoded `AUTH` line was right.
This change also modified the password from `paswd` to `passwd`.