Commit Graph

3614 Commits

Author SHA1 Message Date
Jacob Hoffman-Andrews d6ba7fcba9 Add some timing histogram stats (#2482)
Previously our gRPC client code called the wrong function, enabling server-side instead of client-side histograms.

Also, add a timing stat for the generate / store combination in OCSP Updater.
2017-01-10 11:02:41 -08:00
Jacob Hoffman-Andrews 5aa78a578c Skip parsing of certificates in OCSP Updater. (#2480)
The result is thrown away, so this adds a slight performance overhead for no
benefit. In theory it would catch malformed certificates in the DB, but that is
the job of cert-checker.
2017-01-10 10:36:34 -05:00
Jacob Hoffman-Andrews 58ccd7a71a Copy all statsd stats to Prometheus. (#2474)
We have a number of stats already expressed using the statsd interface. During
the switchover period to direct Prometheus collection, we'd like to make those
stats available both ways. This change automatically exports any stats exported
using the statsd interface via Prometheus as well.

This is a little tricky because Prometheus expects all stats to be registered
exactly once. Prometheus does offer a mechanism to gracefully recover from
registering a stat more than once by handling a certain error, but it is not
safe for concurrent access. So I added a concurrency-safe wrapper that creates
Prometheus stats on demand and memoizes them.
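
A minimal sketch of the create-on-demand-and-memoize idea, using only the standard library. The `counter` type, the `memoizedCounters` wrapper, and the stat name are hypothetical stand-ins for illustration; they are not the Prometheus client API or Boulder's actual wrapper.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical stand-in for a real metric type.
type counter struct {
	mu    sync.Mutex
	value int
}

// Inc adds one to the counter.
func (c *counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.value++
}

// Value returns the current count.
func (c *counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.value
}

// memoizedCounters constructs each named counter at most once, even when
// many goroutines ask for the same name at the same time.
type memoizedCounters struct {
	mu       sync.Mutex
	counters map[string]*counter
	created  int // how many counters were actually constructed
}

func newMemoizedCounters() *memoizedCounters {
	return &memoizedCounters{counters: make(map[string]*counter)}
}

// get returns the existing counter for name, or creates and remembers one.
func (m *memoizedCounters) get(name string) *counter {
	m.mu.Lock()
	defer m.mu.Unlock()
	if c, ok := m.counters[name]; ok {
		return c
	}
	c := &counter{}
	m.counters[name] = c
	m.created++
	return c
}

func main() {
	reg := newMemoizedCounters()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			reg.get("RateLimit.Exceeded").Inc() // hypothetical stat name
		}()
	}
	wg.Wait()
	fmt.Println(reg.created, reg.get("RateLimit.Exceeded").Value())
}
```

Holding one mutex across both the lookup and the creation is what guarantees a single registration per name; a real wrapper would perform the registration where this sketch constructs the `counter`.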

In the process, made a few small required side changes:
 - Clean "/" from method names in the gRPC interceptors. They are allowed in
   statsd but not in Prometheus.
 - Replace "127.0.0.1" with "boulder" as the name of our testing CT log.
   Prometheus stats can't start with a number.
 - Remove ":" from the CT-log stat names emitted by Publisher. Prometheus stats
   can't include it.
 - Remove a stray "RA" in front of some rate limit stats, since it was
   duplicative (we were emitting "RA.RA..." before).

Note that this means two stat groups in particular are duplicated:
 - Gostats* is duplicated with the default process-level stats exported by the
   Prometheus library.
 - gRPCClient* are duplicated by the stats generated by the go-grpc-prometheus
   package.

When writing dashboards and alerts in the Prometheus world, we should be careful
to avoid these two categories, as they will disappear eventually. As a general
rule, if a stat is available with an all-lowercase name, choose that one, as it
is probably the Prometheus-native version.

In the long run we will want to create most stats using the native Prometheus
stat interface, since it allows us to add labels to metrics, which is very
useful. For instance, currently our DNS stats distinguish types of queries by
appending the type to the stat name. This would be more natural as a label in
Prometheus.
2017-01-10 10:30:15 -05:00
Jacob Hoffman-Andrews 510e279208 Simplify gRPC TLS configs. (#2470)
Previously, a given binary would have three TLS config fields (CA cert, cert,
key) for its gRPC server, plus each of its configured gRPC clients. In typical
use, we expect all three of those to be the same across both servers and clients
within a given binary.

This change reuses the TLSConfig type already defined for use with AMQP, adds a
Load() convenience function that turns it into a *tls.Config, and configures it
for use with all of the binaries. This should make configuration easier and more
robust, since it more closely matches usage.
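
As a rough illustration of what such a `Load()` helper could look like: the field names below and the choice to require client certificates are assumptions made for this sketch, not the actual Boulder definitions.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// TLSConfig groups the three files a binary needs. The field names are
// assumptions for this sketch.
type TLSConfig struct {
	CertFile   string
	KeyFile    string
	CACertFile string
}

// Load reads the configured files and builds a *tls.Config that can be
// shared by a binary's gRPC server and its gRPC clients.
func (c TLSConfig) Load() (*tls.Config, error) {
	caPEM, err := os.ReadFile(c.CACertFile)
	if err != nil {
		return nil, fmt.Errorf("reading CA cert: %w", err)
	}
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no certificates found in CA cert file")
	}
	cert, err := tls.LoadX509KeyPair(c.CertFile, c.KeyFile)
	if err != nil {
		return nil, fmt.Errorf("loading key pair: %w", err)
	}
	return &tls.Config{
		RootCAs:      roots, // used when this binary acts as a client
		ClientCAs:    roots, // used when this binary acts as a server
		ClientAuth:   tls.RequireAndVerifyClientCert,
		Certificates: []tls.Certificate{cert},
	}, nil
}

func main() {
	// Placeholder paths; Load reports an error if they do not exist.
	cfg := TLSConfig{CertFile: "cert.pem", KeyFile: "key.pem", CACertFile: "ca.pem"}
	if _, err := cfg.Load(); err != nil {
		fmt.Println("could not load TLS config:", err)
		return
	}
	fmt.Println("TLS config loaded")
}
```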

This change preserves temporary backwards-compatibility for the
ocsp-updater->publisher RPCs, since those are the only instances of gRPC
currently enabled in production.
2017-01-06 14:19:18 -08:00
Jacob Hoffman-Andrews 7e187773b2 Expose more debug ports. (#2471)
Some Boulder services offered debug ports but did not expose them in
docker-compose.yml.
2017-01-06 16:48:25 -05:00
Jacob Hoffman-Andrews 9b8dacab03 Split out separate RPC services for issuing and for signing OCSP (#2452)
This allows finer-grained control of which components can request issuance. The OCSP Updater should not be able to request issuance.

Also, update test/grpc-creds/generate.sh to reissue the certs properly.

Resolves #2417
2017-01-05 15:08:39 -08:00
Daniel McCarney 74c5e68491 Fixes the `config/publisher.json` clientNames list. (#2466)
In https://github.com/letsencrypt/boulder/pull/2453 we created
individual client certificates for each gRPC client. The "clientNames"
list in the `config-next/publisher.json` was updated for the new
component-specific SANs but we neglected to update
`config/publisher.json`. This caused the `ocsp-updater` (which uses gRPC
in the base `config/` to talk to the `publisher`) to fail to connect.

This commit updates `config/publisher.json` to have the same clientNames
as `config-next/publisher.json` and resolves #2465
2017-01-03 10:10:01 -08:00
Jacob Hoffman-Andrews eadce69146 Improve Docker instructions. (#2464)
Previously the instructions assumed you had Go setup on your host, which
somewhat defeats the point of running Boulder inside Docker, since it requires
more initial setup. These instructions make first-time users less likely to hit
the oci runtime error described later in the README.
2017-01-03 09:43:44 -08:00
Jacob Hoffman-Andrews 089a270453 Add instructions on load testing OCSP generation. (#2459) 2017-01-02 11:36:03 -08:00
Simone Carletti b5bac90efd Update the publicsuffix dep to v0.3.1 (#2462)
We recently made changes to the IANA suffixes, and you may want to pull them into the latest Boulder version.

```
➜  publicsuffix-go git:(master) git show -s

commit 3ea542729b4d7056a9d1356c9baf27bcad2bda7f
Author: Simone Carletti <weppos@weppos.net>
Date:   Mon Jan 2 18:28:57 2017 +0100

    Release 0.3.1
```

```
➜  publicsuffix-go git:(master) go test ./...
?   	github.com/weppos/publicsuffix-go/cmd/gen	[no test files]
?   	github.com/weppos/publicsuffix-go/cmd/load	[no test files]
ok  	github.com/weppos/publicsuffix-go/net/publicsuffix	0.045s
ok  	github.com/weppos/publicsuffix-go/publicsuffix	0.091s
```

v0.3.1 is tagged and signed with my PGP key.
https://github.com/weppos/publicsuffix-go/releases/tag/v0.3.1
2017-01-02 11:05:19 -08:00
IntiGabriel 18341d0e3f Add instructions to fetch boulder into $GOPATH (#2460)
When cloning this repository it is not clear that you need to get boulder first. Starting directly with `docker-compose up` fails.
2016-12-30 01:23:18 -08:00
Jacob Hoffman-Andrews 0c665b2053 Split up gRPC certificates by service. (#2453)
Previously, all gRPC services used the same client and server certificates. Now,
each service has its own certificate, which it uses for both client and server
authentication, more closely simulating production.

This also adds aliases for each of the relevant hostnames in /etc/hosts. There
may be some issues if Docker decides to rewrite /etc/hosts while Boulder is
running, but this seems to work for now.
2016-12-29 14:53:59 -08:00
Jacob Hoffman-Andrews 3abb9d1780 Make client certificate errors more verbose. (#2451)
Echo the expected list of names and the received list of names.

Also, change the unittest to use its own testdata directory rather than
borrowing.
2016-12-29 14:52:12 -08:00
Daniel McCarney d26a54b3e9 Adds 'kid' divergence to docs (#2458)
Resolves #2455
2016-12-29 14:51:47 -08:00
Daniel McCarney 74e281c1ce Switch to Google's v4 safebrowsing library. (#2446)
Right now we are using a third-party client for the Google Safe Browsing API, but Google has recently released their own [Golang library](https://github.com/google/safebrowsing) which also supports the newer v4 API. Using this library will let us avoid fixing some lingering race conditions & unpleasantness in our fork of `go-safebrowsing-api`.

This PR adds support for using the Google library & the v4 API in place of our existing fork when the `GoogleSafeBrowsingV4` feature flag is enabled in the VA "features" configuration.

Resolves https://github.com/letsencrypt/boulder/issues/1863

Per `CONTRIBUTING.md` I also ran the unit tests for the new dependency:
```
daniel@XXXXXXXXXX:~/go/src/github.com/google/safebrowsing$ go test ./...
ok  	github.com/google/safebrowsing	3.274s
?   	github.com/google/safebrowsing/cmd/sblookup	[no test files]
?   	github.com/google/safebrowsing/cmd/sbserver	[no test files]
?   	github.com/google/safebrowsing/cmd/sbserver/statik	[no test files]
?   	github.com/google/safebrowsing/internal/safebrowsing_proto	[no test files]
ok  	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/jsonpb	0.012s
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/jsonpb/jsonpb_test_proto	[no test files]
ok  	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/proto	0.062s
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/proto/proto3_proto	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/descriptor	[no test files]
ok  	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/generator	0.017s
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/grpc	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/plugin	[no test files]
ok  	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes	0.009s
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/any	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/duration	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/empty	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/struct	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/timestamp	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/wrappers	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/rakyll/statik	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/rakyll/statik/fs	[no test files]
ok  	github.com/google/safebrowsing/vendor/golang.org/x/net/idna	0.003s
```
2016-12-27 11:18:11 -05:00
Daniel McCarney 5acce8ba38 Removes `ProblemDetailsForError` from `verifyPOST`. (#2444)
Prior to this commit, when there was an err from
`wfe.SA.GetRegistrationByKey`, and that error wasn't an instance of
`core.NoSuchRegistrationError`, `verifyPOST` converted the error into
a problem by sending it through `core.ProblemDetailsForError(err, "")`.

In this case, this isn't an appropriate strategy. The only possible
errors that can be sent through this function will not match any of the
`case` statements in `core.ProblemDetailsForError` and will be returned
by the `default` case:

```
default:
  // Internal server error messages may include sensitive data, so we do
  // not include it.
  return probs.ServerInternal(msg)
```

Since `verifyPOST` calls this function with `msg = ""`,
`ProblemDetailsForError` will return an empty `ServerInternalProblem`. When the
caller of `verifyPOST` gives the returned server internal problem to `sendError`
it will produce: `"Internal error -  - %s<nil>"` because the problem's detail
is "" and the error code given to `sendError` is nil.

Having examined the code paths in `verifyPOST`, the call to
`core.ProblemDetailsForError` can never match anything but the default case,
which produces a blank message. The proper fix here is therefore to not use
`ProblemDetailsForError` at all and instead directly instantiate a
`ServerInternalProblem` with a suitable message.
2016-12-21 13:00:48 -08:00
Roland Bracewell Shoemaker 5ca43d2985 Fix akamai cache purger bugs (#2443)
Fixes two bugs in the Akamai cache purging library and one in the `ocsp-updater` and adds some tests to the Akamai library.

* The first was that the backoff logic was broken: the backoff was calculated but discarded, because the code assumed the sleep happened inside `core.RetryBackoff`, when that function only returns the amount of time to back off.
* The second was that the internal HTTP client only logged errors if they were fatal. This was superfluous, as the caller also logs fatal errors, and it masked the actual issue during retries.
* The last, in `ocsp-updater`, was that `path.Join` was used to create a URL. That is not an intended use of the function, as it attempts to clean paths, so the scheme prefix `http://` was 'cleaned' to `http:/`. Since Akamai had no idea what the malformed URLs referred to, it returned 403 Forbidden, which we would consider a temporary error and retry until failure.
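
The `path.Join` pitfall is easy to reproduce with the standard library. The sketch below uses a made-up host name and a simple path element; `joinURL` is a hypothetical helper showing one way to avoid the problem by joining only the path component of a parsed URL.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// joinWithPath reproduces the bug: path.Join cleans its result, which
// collapses the "//" after the URL scheme into a single "/".
func joinWithPath(base, elem string) string {
	return path.Join(base, elem)
}

// joinURL parses base first and joins elem onto the path component only,
// leaving the scheme and host untouched.
func joinURL(base, elem string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = path.Join(u.Path, elem)
	return u.String(), nil
}

func main() {
	fmt.Println(joinWithPath("http://ocsp.example.com/", "abc")) // http:/ocsp.example.com/abc

	fixed, err := joinURL("http://ocsp.example.com/", "abc")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(fixed) // http://ocsp.example.com/abc
}
```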
2016-12-21 09:05:49 -05:00
Jacob Hoffman-Andrews d710dc9a2f Updates divergences after more feedback (#2441)
As pointed out by @webczat (Thanks again!), my last update to `acme-divergences.md`
https://github.com/letsencrypt/boulder/pull/2402 was still a little bit off-the-mark accuracy wise.

This PR resolves the problems (fingers crossed! 🎌) that remained with the
documentation of the Section 6.3 "URL field" divergence and the Section 5.4.1
"existing registration" divergence.

Resolves https://github.com/letsencrypt/boulder/issues/2414
2016-12-19 15:00:49 -08:00
Daniel McCarney e74e7ad14b Include domain name in email subj (#2435)
This PR modifies the `expiration-mailer` utility to change the subject used in the reminder emails to include a domain name from the expiring certificate.

Previously, unless otherwise specified using the `Mailer.Subject` configuration parameter, all reminder emails were sent with the subject `Certificate expiration notice`. Neither the `test/config/` nor the `test/config-next` expiration mailer configuration overrides the subject, so both were using the default.

With this PR, if no `Mailer.Subject` configuration parameter is provided then reminder emails are sent with the subject `Certificate expiration notice for domain "foo.bar.com"` in the case of only one domain in the expiring certificate, and `Certificate expiration for domain "foo.bar.com" (and $(n-1) more)` for the case where there are n > 1 domains (e.g. "(and 1 more)", "(and 2 more)" ...). I explicitly left support for the `Mailer.Subject` override to allow legacy configurations to function.
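
The subject selection described above might look roughly like the following. `subjectFor` is a hypothetical helper; using the first domain in the list, using the "notice" wording in both cases, and falling back to the old default when there are no domains are all assumptions of this sketch.

```go
package main

import "fmt"

// subjectFor picks the reminder email subject. A non-empty override
// (standing in for Mailer.Subject) always wins.
func subjectFor(override string, domains []string) string {
	if override != "" {
		return override
	}
	switch len(domains) {
	case 0:
		// Assumption: fall back to the old default subject.
		return "Certificate expiration notice"
	case 1:
		return fmt.Sprintf("Certificate expiration notice for domain %q", domains[0])
	default:
		return fmt.Sprintf("Certificate expiration notice for domain %q (and %d more)",
			domains[0], len(domains)-1)
	}
}

func main() {
	fmt.Println(subjectFor("", []string{"foo.bar.com"}))
	fmt.Println(subjectFor("", []string{"foo.bar.com", "a.example", "b.example"}))
	fmt.Println(subjectFor("Custom subject", []string{"foo.bar.com"}))
}
```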

I didn't explicitly add a new unit test for this behaviour because the existing unit tests were exercising both the "configuration override" portion of the subject behaviour, and matching the new expected subject. It would be entirely duplicated code to write a separate test for the subject template.

Resolves #2411
2016-12-19 17:12:37 -05:00
Daniel McCarney 1083db5a15 Optimize expiration-mailer queries (#2440)
This PR splits up the expiration-mailer's `findExpiringCertificates` query into two parts:
1. One query to find `certificateStatus` serial numbers that match the search criteria
2. Sequential queries to find each `certificate` row for the results from 1.

This removes the `JOIN` on two large tables from the original `findExpiringCertificates` query and lets us shift load away from the database. https://github.com/letsencrypt/boulder/issues/2432 wasn't sufficient to reduce the load of this query.

Resolves https://github.com/letsencrypt/boulder/issues/2425
2016-12-19 14:29:37 -05:00
Daniel 2cf2b97358
Updates divergences after more feedback 📣 2016-12-19 11:45:43 -05:00
Jacob Hoffman-Andrews 0cc0ab9d9b Add admin-revoker integration tests for serial-revoke and auth-revoke (#2421)
Skips adding tests for reg-revoke as it would require significant changes to how test.js
functions that would additionally require re-working a number of the other integration
tests.

Updates #2340.
2016-12-15 16:54:32 -08:00
Daniel McCarney 5c3482d2dd `certificateStatus` table optimizations (Part Four) (#2432)
Similar to #2431 the expiration-mailer's `findExpiringCertificates()` query can be optimized slightly by using `certificateStatus.NotAfter` in place of `certificate.expires` in the `WHERE` clause of its query when the `CertStatusOptimizationsMigrated` feature is enabled.

Resolves https://github.com/letsencrypt/boulder/issues/2425
2016-12-15 12:53:54 -08:00
Daniel McCarney 32890656b8 `certificateStatus` table optimizations (Part Three) (#2431)
Following on to https://github.com/letsencrypt/boulder/pull/2177 and https://github.com/letsencrypt/boulder/issues/2227 this PR adds code to the `ocsp-updater` that takes advantage of the migrations & backfill from the previous optimization PRs.

This has the primary effect of removing the `JOIN` on the `certificates` table in the `findStaleOCSPResponses` query. We expect this to be a big win in terms of query performance.

The `ocsp-updater` is also updated to opportunistically fill in the newly added `isExpired` field of the `CertificateStatus` table as it encounters rows that aren't marked as expired but correspond to an expired certificate.

Resolves https://github.com/letsencrypt/boulder/issues/2238 and #2239
2016-12-15 12:53:43 -08:00
Jacob Hoffman-Andrews 263db24571 Disable fail-fast for gRPC. (#2397) (#2434)
This is a roll-forward of 5b865f1, with the QueueDeclare and QueueBind changes
in AMQP-RPC removed, and the startup order changes in test/startservers.py
removed. The AMQP-RPC changes caused RabbitMQ permission problems in production,
and the startup order changes depended on the AMQP-RPC changes but were not
required now that we have a unittest also.

This allows us to restart backends with relatively little interruption in
service, provided the backends come up promptly.

Fixes #2389 and #2408
2016-12-15 12:52:34 -08:00
Roland Shoemaker e850b27588 Fix typo 2016-12-15 12:47:36 -08:00
Roland Shoemaker 38c46fdd2e Review fixes pt. 2 2016-12-15 11:58:19 -08:00
Roland Shoemaker 07068b4d1e Review fixes pt. 1 2016-12-15 11:47:37 -08:00
Jacob Hoffman-Andrews 5407a45b02 Revert "Disable fail-fast for gRPC. (#2397)" (#2427)
This reverts commit 5b865f1d63.

The QueueDeclare and QueueBind calls in that change caused AMQP permission
denied errors.
2016-12-13 13:20:08 -08:00
Daniel McCarney fab236022a Merge branch 'staging' into master 2016-12-13 09:35:46 -05:00
Jacob Hoffman-Andrews 68f1be8523 Cherry-pick #2422 into staging (#2423)
This is a cherry-pick of #2422 into the staging branch for a hotfix release.

Previously all OCSP signing and storage would be serial, which meant it was hard
to exercise the full capacity of our HSM. In this change, we run a limited
number of update and store requests in parallel.

This change also changes stats generation in generateOCSPResponses so we can
tell the difference between stats produced by new OCSP requests vs existing ones,
and adds a new stat that records how long the SQL query in findStaleOCSPResponses
takes.

Resolved conflicts on cherry-pick:
cmd/ocsp-updater/main.go
2016-12-12 17:55:33 -08:00
Jacob Hoffman-Andrews 26cf552ff9 Sign OCSP in parallel for better performance. (#2422)
Previously all OCSP signing and storage would be serial, which meant it was hard
to exercise the full capacity of our HSM. In this change, we run a limited
number of update and store requests in parallel.
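
One common way to run a limited number of tasks in parallel in Go is a buffered channel used as a semaphore. The sketch below illustrates that general pattern under that assumption; `processAll` and `runDemo` are hypothetical names, and the work function is a stand-in for generating and storing an OCSP response.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll calls work for every item, with at most limit calls running
// at the same time.
func processAll(items []int, limit int, work func(int)) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for _, item := range items {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit workers are already running
		go func(it int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when done
			work(it)
		}(item)
	}
	wg.Wait()
}

// runDemo processes n dummy items and reports how many were processed and
// the highest number of workers observed running at once.
func runDemo(n, limit int) (processed, peak int) {
	var mu sync.Mutex
	inFlight := 0
	processAll(make([]int, n), limit, func(int) {
		mu.Lock()
		inFlight++
		if inFlight > peak {
			peak = inFlight
		}
		mu.Unlock()

		// A real worker would sign and store a response here.

		mu.Lock()
		inFlight--
		processed++
		mu.Unlock()
	})
	return processed, peak
}

func main() {
	processed, peak := runDemo(50, 4)
	fmt.Println("processed:", processed, "peak within limit:", peak <= 4)
}
```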

This change also changes stats generation in generateOCSPResponses so we can
tell the difference between stats produced by new OCSP requests vs existing ones,
and adds a new stat that records how long the SQL query in findStaleOCSPResponses
takes.
2016-12-12 17:22:44 -08:00
Roland Shoemaker 26e2d8a5ca Add admin-revoker integration tests for serial-revoke and auth-revoke 2016-12-12 15:43:35 -08:00
Daniel McCarney 276530f68d Merge pull request #2420 from letsencrypt/cpu-staging-hotfix
OCSP Updater "stale max age" parameter
2016-12-12 16:20:15 -05:00
Daniel McCarney 11100f5873
OCSP Updater "stale max age" parameter. (#2419)
This PR adds a new `OCSPStaleMaxAge` configuration parameter to the `ocsp-updater`. The default value when not provided is 30 days, and this is explicitly added to both `config/ocsp-updater.json` and `config-next/ocsp-updater.json`. 

The OCSP updater uses this new parameter in `findStaleOCSPResponses` as a lower bound on the `ocspLastUpdated` field of the certificateStatus table. This is intended to speed up the processing of this query until we can land the proper fixes that require more intensive migrations & backfilling. 

The `TestGenerateOCSPResponses` and `TestFindStaleOCSPResponses` unit tests had to be updated to explicitly set the `ocspLastUpdated` field of the certificate status rows that the tests add, because otherwise they are left at a default value of `0` and are excluded by the new `OCSPStaleMaxAge` functionality.
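
The effect of the lower bound can be illustrated in memory. In this sketch the `status` type, the `refreshAfter` upper bound, and the sample rows are assumptions; only the idea of excluding rows whose `ocspLastUpdated` is older than the stale max age comes from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// status is a simplified stand-in for a certificateStatus row.
type status struct {
	serial          string
	ocspLastUpdated time.Time
}

// findStale returns the serials last updated more than refreshAfter ago
// but no more than staleMaxAge ago.
func findStale(rows []status, now time.Time, refreshAfter, staleMaxAge time.Duration) []string {
	upper := now.Add(-refreshAfter)
	lower := now.Add(-staleMaxAge) // the new lower bound
	var out []string
	for _, r := range rows {
		if r.ocspLastUpdated.After(lower) && r.ocspLastUpdated.Before(upper) {
			out = append(out, r.serial)
		}
	}
	return out
}

// demo runs findStale over four sample rows with a fixed "now".
func demo() []string {
	now := time.Date(2016, 12, 12, 0, 0, 0, 0, time.UTC)
	day := 24 * time.Hour
	rows := []status{
		{"fresh", now.Add(-1 * day)},
		{"stale", now.Add(-5 * day)},
		{"ancient", now.Add(-60 * day)},
		{"unset", time.Time{}}, // never updated: excluded by the lower bound
	}
	return findStale(rows, now, 3*day, 30*day)
}

func main() {
	fmt.Println(demo()) // [stale]
}
```

The "unset" row shows why the unit tests had to set `ocspLastUpdated` explicitly: a row left at its default value falls below the lower bound and is never returned.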
2016-12-12 16:06:51 -05:00
Daniel McCarney 6ec93157f7 OCSP Updater "stale max age" parameter. (#2419)
This PR adds a new `OCSPStaleMaxAge` configuration parameter to the `ocsp-updater`. The default value when not provided is 30 days, and this is explicitly added to both `config/ocsp-updater.json` and `config-next/ocsp-updater.json`. 

The OCSP updater uses this new parameter in `findStaleOCSPResponses` as a lower bound on the `ocspLastUpdated` field of the certificateStatus table. This is intended to speed up the processing of this query until we can land the proper fixes that require more intensive migrations & backfilling. 

The `TestGenerateOCSPResponses` and `TestFindStaleOCSPResponses` unit tests had to be updated to explicitly set the `ocspLastUpdated` field of the certificate status rows that the tests add, because otherwise they are left at a default value of `0` and are excluded by the new `OCSPStaleMaxAge` functionality.
2016-12-12 15:57:59 -05:00
Jacob Hoffman-Andrews e25138b21c Update orphan finder. (#2409)
The log format changed slightly: We log hex instead of base64.
2016-12-09 12:06:19 -08:00
Jacob Hoffman-Andrews 5b865f1d63 Disable fail-fast for gRPC. (#2397)
This allows us to restart backends with relatively little interruption in
service, provided the backends come up promptly.

Fixes #2389 and #2408
2016-12-09 12:03:45 -08:00
Jacob Hoffman-Andrews 3cff6babb3 Remove /go/bin from PATH in Docker images (#2412)
We don't write any built binaries there during the test process (Boulder
binaries go in boulder/bin), so including it in that path has no effect,
at least on Travis.

For local builds, because GOPATH gets mounted as a volume at /go/,
including /go/bin in PATH means that a docker run command locally can
wind up running a local copy of some key binary that is different than
the one that gets run in Travis. For instance, this happens when the
host's version of protoc-gen-go differs from the version in the image,
producing diffs when Travis runs the "generate" phase of test.sh.
2016-12-09 09:11:03 -08:00
Jacob Hoffman-Andrews d9b53cd103 Set gRPC logs to go through syslog. (#2403)
StatsAndLogging is called early enough in each program that it precedes any gRPC
setup code that might need SetLogger to have been called already.

Fixes #2383
2016-12-08 15:25:31 -08:00
Daniel McCarney d4902820ca Adds unique VA DNS validation error for empty TXTs. (#2401)
Presently when the VA performs a DNS-01 challenge verification it
returns the same error for the case where the remote nameserver had the
**wrong** TXT value, and when the remote nameserver had an **empty**
response for the TXT query. It would aid debugging if the user was told
which of the two failure cases was responsible for the overall challenge
failure.

This commit adds a unique error message for the empty TXT records case,
and a unit test/mock to exercise the new error message.
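
A simplified sketch of distinguishing the two failure cases. The `checkTXT` function and both error messages are hypothetical and do not reproduce the VA's actual wording or comparison logic.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values for the two distinct failure cases.
var (
	errNoTXTRecords = errors.New("no TXT records found for DNS challenge")
	errWrongTXT     = errors.New("correct value not found in TXT records for DNS challenge")
)

// checkTXT reports whether expected appears among the TXT records,
// returning a different error for an empty answer than for a wrong one.
func checkTXT(records []string, expected string) error {
	if len(records) == 0 {
		return errNoTXTRecords
	}
	for _, r := range records {
		if r == expected {
			return nil
		}
	}
	return errWrongTXT
}

func main() {
	fmt.Println(checkTXT(nil, "token"))
	fmt.Println(checkTXT([]string{"other"}, "token"))
	fmt.Println(checkTXT([]string{"other", "token"}, "token"))
}
```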

Resolves #2326
2016-12-08 11:27:28 -08:00
Daniel McCarney b0f59b8a96 Adds warning about unsub link (#2404)
We received some community forum feedback that the certificate expiration email's unsubscribe link didn't provide enough context to know what would happen if you clicked it. Combined with #1396 I agree that we should make it clear that there won't be a subsequent confirmation of intent if you click unsubscribe and you can't resubscribe yourself.

Note: there is an outstanding operations ticket to switch to using the Boulder data/ templates in staging/production. Until that happens this PR won't take effect post-merge/deployment.
2016-12-08 11:26:05 -08:00
Jacob Hoffman-Andrews b22cae8cdd Don't echo command from run_and_expect_silence. (#2405)
Some commands, like our errcheck command, are very long. When we echo these both
before and after running them, it can obscure what is often a single-line
failure message. Removing the echo after failure makes it easier to spot the
real failure message.
2016-12-08 11:22:21 -08:00
Daniel McCarney abb54bdf81 Adds divergences for URL & existing reg status code. (#2402)
Issue #2365 reported two places where we had divergences from ACME-04 in Boulder's implementation that were not reflected in the divergences doc. This PR documents:

1. That Boulder checks the `resource` field from the protected JWS header instead of the `url` field as described in Section 5.4.1
2. That Boulder uses a response with HTTP status code 409 (Conflict) when returning a Location header for an existing reg while Section 6.3 describes using HTTP status code 200 for this purpose.

This resolves #2365.
2016-12-08 10:20:44 -08:00
Jacob Hoffman-Andrews a8998bf0b9 Split grpc/wrappers.go into several files (#2392)
There is now one file per service, containing both the client-side and
server-side wrappers for that service. This is a straight move of the code, with
the copyright, header comments, package statement, and imports copied into each
new file, and goimports run on the result.

Two custom errors were moved into bcodes.go.

Fixes #2388.
2016-12-06 15:45:31 -08:00
Jacob Hoffman-Andrews 1c1449b284 Improvements to tests and test configs. (#2396)
- Remove spinner from test.js. It made Travis logs hard to read.
- Listen on all interfaces for debugAddr. This makes it possible to check
  Prometheus metrics for instances running in a Docker container.
- Standardize DNS timeouts on 1s and 3 retries across all configs. This ensures
  DNS completes within the relevant RPC timeouts.
- Remove RA service queue from VA, since VA no longer uses the callback to RA on
  completing a challenge.
2016-12-05 14:35:27 -08:00
Jacob Hoffman-Andrews b8a237ffb3 Use grpc-go-prometheus for RPC stats. (#2391)
There's an off-the-shelf package that provides most of the stats we care about
for gRPC using interceptors. This change vendors go-grpc-prometheus and its
dependencies, and calls out to the interceptors provided by that package from
our own interceptors.

This will allow us to get metrics like latency histograms by call, status codes
by call, and so on.

Fixes #2390.

This change vendors go-grpc-prometheus and its dependencies. Per contributing guidelines, I've run the tests on these dependencies, and they pass:

go test github.com/davecgh/go-spew/spew github.com/grpc-ecosystem/go-grpc-prometheus github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto github.com/pmezard/go-difflib/difflib github.com/stretchr/testify/assert github.com/stretchr/testify/require github.com/stretchr/testify/suite 
ok      github.com/davecgh/go-spew/spew 0.022s
ok      github.com/grpc-ecosystem/go-grpc-prometheus    0.120s
?       github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto [no test files]
ok      github.com/pmezard/go-difflib/difflib   0.042s
ok      github.com/stretchr/testify/assert      0.021s
ok      github.com/stretchr/testify/require     0.017s
ok      github.com/stretchr/testify/suite       0.012s
2016-12-05 14:31:22 -08:00
Daniel McCarney a2b8faea1e Only resubmit missing SCTs. (#2342)
This PR introduces the ability for the ocsp-updater to only resubmit certificates to logs that we are missing SCTs from. Prior to this commit when a certificate was missing one or more SCTs we would submit it to every log, causing unnecessary overhead for us and the log operator.

To accomplish this a new RPC endpoint is added to the Publisher service "SubmitToSingleCT". Unlike the existing "SubmitToCT" this RPC endpoint accepts a log URI and public key in addition to the certificate DER bytes. The certificate is submitted directly to that log, and a cache of constructed resources is maintained so that subsequent submissions to the same log can reuse the stat name, verifier, and submission client.

Resolves #1679
2016-12-05 13:54:02 -08:00
Daniel McCarney 0f2af77660 Merge pull request #2394 from letsencrypt/master
Merge master to staging.
2016-12-05 11:42:11 -05:00
Roland Bracewell Shoemaker 43bcc0b167 Empty gRPC whitelist fix (#2376)
`grpc/creds:serverTransportCredentials.validateClient` is meant to ignore the check if the `acceptedSANs` map it is constructed with is `nil`. This never happens as the map is constructed using `make(map[string]struct{})` meaning it can never be `nil`.

Instead start with a `nil` map and only populate it if we have `ClientNames` to whitelist.
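
The underlying Go behaviour is that a map created with `make` is empty but never `nil`. The sketch below shows the difference; the function names and the SAN value are illustrative, not the actual `grpc/creds` code.

```go
package main

import "fmt"

// buildBuggy always allocates, so the result is never nil, even when
// there are no names to whitelist.
func buildBuggy(names []string) map[string]struct{} {
	m := make(map[string]struct{})
	for _, n := range names {
		m[n] = struct{}{}
	}
	return m
}

// buildFixed starts with a nil map and only allocates when there are
// names to whitelist.
func buildFixed(names []string) map[string]struct{} {
	var m map[string]struct{}
	if len(names) > 0 {
		m = make(map[string]struct{}, len(names))
		for _, n := range names {
			m[n] = struct{}{}
		}
	}
	return m
}

// validateClient skips the check entirely when no whitelist is configured.
func validateClient(accepted map[string]struct{}, san string) bool {
	if accepted == nil {
		return true
	}
	_, ok := accepted[san]
	return ok
}

func main() {
	fmt.Println(validateClient(buildBuggy(nil), "ra.boulder")) // false: empty map is not nil
	fmt.Println(validateClient(buildFixed(nil), "ra.boulder")) // true: check is skipped
}
```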

Fixes #2375.
2016-12-05 08:26:19 -08:00