Commit Graph

13 Commits

Author SHA1 Message Date
Roland Bracewell Shoemaker e2b2511898 Overhaul internal error usage (#2583)
This patch removes all usages of the `core.XXXError` and almost all usages of `probs` outside of the WFE and VA and replaces them with a unified internal error type. Since the VA uses `probs.ProblemDetails` quite extensively in challenges, and currently stores them in the DB I've saved this change for another change (it'll also require a migration). Since `ProblemDetails` should only ever be exposed to end-users all of its related logic should be moved into the `WFE` but since it still needs to be exposed to the VA and SA I've left it in place for now.

The new internal `errors` package offers the same convenience functions as `probs` does as well as a new simpler type testing method. A few small changes have also been made to error messages, mainly adding the library and function name to internal server errors for easier debugging (i.e. where a number of functions return the exact same errors and there is no other way to distinguish which method threw the error).

Also adds proper encoding of internal errors transferred over gRPC (the current encoding scheme is kept for `core` and `probs` errors since it'll be ideally be removed after we deploy this and follow-up changes) using `grpc/metadata` instead of the gRPC status codes.

Fixes #2507. Updates #2254 and #2505.
2017-03-22 23:27:31 -07:00
Roland Bracewell Shoemaker 0c04fe2f5e Move error wrapping/unwrapping into the interceptors (#2556)
Instead of using `unwrapError/wrapError` in each of the wrapper functions do it in the server/client interceptors instead. This means we now consistently do error unwrapping/wrapping.

Fixes #2509.
2017-02-13 12:56:23 -05:00
Daniel e88db3cd5e
Revert "Revert "Copy all statsd stats to Prometheus. (#2474)" (#2541)"
This reverts commit 9d9e4941a5 and
restores the statsd prometheus code.
2017-02-01 15:48:18 -05:00
Daniel McCarney 9d9e4941a5 Revert "Copy all statsd stats to Prometheus. (#2474)" (#2541)
This reverts commit 58ccd7a71a.

We are seeing multiple boulder components restart when they encounter the stat registration race condition described in https://github.com/letsencrypt/boulder/issues/2540
2017-02-01 12:50:27 -05:00
Jacob Hoffman-Andrews 58ccd7a71a Copy all statsd stats to Prometheus. (#2474)
We have a number of stats already expressed using the statsd interface. During
the switchover period to direct Prometheus collection, we'd like to make those
stats available both ways. This change automatically exports any stats exported
using the statsd interface via Prometheus as well.

This is a little tricky because Prometheus expects all stats to by registered
exactly once. Prometheus does offer a mechanism to gracefully recover from
registering a stat more than once by handling a certain error, but it is not
safe for concurrent access. So I added a concurrency-safe wrapper that creates
Prometheus stats on demand and memoizes them.

In the process, made a few small required side changes:
 - Clean "/" from method names in the gRPC interceptors. They are allowed in
   statsd but not in Prometheus.
 - Replace "127.0.0.1" with "boulder" as the name of our testing CT log.
   Prometheus stats can't start with a number.
 - Remove ":" from the CT-log stat names emitted by Publisher. Prometheus stats
   can't include it.
 - Remove a stray "RA" in front of some rate limit stats, since it was
   duplicative (we were emitting "RA.RA..." before).

Note that this means two stat groups in particular are duplicated:
 - Gostats* is duplicated with the default process-level stats exported by the
   Prometheus library.
 - gRPCClient* are duplicated by the stats generated by the go-grpc-prometheus
   package.

When writing dashboards and alerts in the Prometheus world, we should be careful
to avoid these two categories, as they will disappear eventually. As a general
rule, if a stat is available with an all-lowercase name, choose that one, as it
is probably the Prometheus-native version.

In the long run we will want to create most stats using the native Prometheus
stat interface, since it allows us to use add labels to metrics, which is very
useful. For instance, currently our DNS stats distinguish types of queries by
appending the type to the stat name. This would be more natural as a label in
Prometheus.
2017-01-10 10:30:15 -05:00
Jacob Hoffman-Andrews 263db24571 Disable fail-fast for gRPC. (#2397) (#2434)
This is a roll-forward of 5b865f1, with the QueueDeclare and QueueBind changes
in AMQP-RPC removed, and the startup order changes in test/startservers.py
removed. The AMQP-RPC changes caused RabbitMQ permission problems in production,
and the startup order changes depended on the AMQP-RPC changes but were not
required now that we have a unittest also.

This allows us to restart backends with relatively little interruption in
service, provided the backends come up promptly.

Fixes #2389 and #2408
2016-12-15 12:52:34 -08:00
Jacob Hoffman-Andrews 5407a45b02 Revert "Disable fail-fast for gRPC. (#2397)" (#2427)
This reverts commit 5b865f1d63.

The QueueDeclare and QueueBind calls in that change caused AMQP permission
denied errors.
2016-12-13 13:20:08 -08:00
Jacob Hoffman-Andrews 5b865f1d63 Disable fail-fast for gRPC. (#2397)
This allows us to restart backends with relatively little interruption in
service, provided the backends come up promptly.

Fixes #2389 and #2408
2016-12-09 12:03:45 -08:00
Jacob Hoffman-Andrews b8a237ffb3 Use grpc-go-prometheus for RPC stats. (#2391)
There's an off-the-shelf package that provides most of the stats we care about
for gRPC using interceptors. This change vendors go-grpc-prometheus and its
dependencies, and calls out to the interceptors provided by that package from
our own interceptors.

This will allow us to get metrics like latency histograms by call, status codes
by call, and so on.

Fixes #2390.

This change vendors go-grpc-prometheus and its dependencies. Per contributing guidelines, I've run the tests on these dependencies, and they pass:

go test github.com/davecgh/go-spew/spew github.com/grpc-ecosystem/go-grpc-prometheus github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto github.com/pmezard/go-difflib/difflib github.com/stretchr/testify/assert github.com/stretchr/testify/require github.com/stretchr/testify/suite 
ok      github.com/davecgh/go-spew/spew 0.022s
ok      github.com/grpc-ecosystem/go-grpc-prometheus    0.120s
?       github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto [no test files]
ok      github.com/pmezard/go-difflib/difflib   0.042s
ok      github.com/stretchr/testify/assert      0.021s
ok      github.com/stretchr/testify/require     0.017s
ok      github.com/stretchr/testify/suite       0.012s
2016-12-05 14:31:22 -08:00
Jacob Hoffman-Andrews 27a1446010 Move timeouts into client interceptor. (#2387)
Previously we had custom code in each gRPC wrapper to implement timeouts. Moving
the timeout code into the client interceptor allows us to simplify things and
reduce code duplication.
2016-12-05 10:42:26 -05:00
Roland Bracewell Shoemaker 09483007bd Cleanup gRPC metric formatting (#2218)
Based on experience with the new gRPC staging deployment. gRPC generates `FullMethod` names such as `-ServiceName-MethodName` which can be confusing. For client calls to a service we actually want something formatted like `ServiceName-MethodName` and for server requests we want just `MethodName`.

This PR adds a method to clean up the `FullMethod` names returned by gRPC and formats them the way we expect.
2016-10-14 10:26:13 -07:00
Roland Bracewell Shoemaker e187c92715 Add gRPC client side metrics (#2151)
Fixes #1880.

Updates google.golang.org/grpc and github.com/jmhodges/clock, both test suites pass. A few of the gRPC interfaces changed so this also fixes those breakages.
2016-09-09 15:17:36 -04:00
Roland Bracewell Shoemaker 7b29dba75d Add gRPC server-side interceptor (#1933)
Adds a server side unary RPC interceptor which includes basic stats. We could also use this to add a server request ID to the context.Context to identify the call through the system, but really I'd rather do that on the client side before the RPC is sent which requires the client interceptor implementation upstream. Also updates google.golang.org/grpc.

Updates #1880.
2016-06-20 11:27:32 -04:00