4368 Commits

Author SHA1 Message Date
Daniel McCarney 783784b680 SA: Enable OrderReadyStatus feature flag in config-next. (#3738)
We landed this feature flag disabled pending Certbot's acme library
supporting this status value. That work has landed and so we can enable
this feature in `config-next` ahead of a staging/prod rollout.
2018-05-29 10:32:58 -07:00
Jacob Hoffman-Andrews d42a0ab277
Godeps: Fix golang_protobuf_extensions comment. (#3737)
We're still on the same commit, which Godep previously commented as
"v1.0.0-2-gc12348c" (in other words, commit `c12348c`, which is slightly
ahead of the v1.0.0 tag). The upstream repo recently tagged a v1.0.1
release, at the same commit we were using. This caused Godep in our
tests to use a simplified comment referencing only the tag, which caused
a spurious diff and failed the test.

This commit updates the comment in Godeps.json.
2018-05-29 09:20:12 -07:00
Daniel c69314c45e Godeps: Fix golang_protobuf_extensions comment.
The "v1.0.0-2-gc12348c" tag referenced in the Godeps.json comment for
the "github.com/matttproud/golang_protobuf_extensions/pbutil" import
doesn't seem to exist in the upstream repo anymore.

The "v1.0.1" comment being flagged as a diff in Godeps restore during CI
_does_ exist and it points to the same commit
(c12348ce28de40eed0136aa2b644d0ee0650e56c) we are using.

This commit fixes the comment to match upstream & expected.

(ugh Godeps....)
2018-05-29 11:27:49 -04:00
Jacob Hoffman-Andrews b8e42cfbdf Update to latest boulder-tools. (#3734)
* Update to latest boulder-tools.

* Add Fprint* to errcheck ignore.
2018-05-29 08:58:44 -04:00
Kyle Spiers dd0e0249e5 core/util: ValidSerial should return false if the serial is not 32 or 36 (#3712)
The current check always fails because a length can't simultaneously be both less than 32 and greater than 36.
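
To illustrate the logic error described above, here is a minimal sketch. It is not the Boulder code: `lengthRejectedBuggy` and `validSerial` are hypothetical names, and the hex-decoding step is an assumption about what a serial check would also want to verify.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// lengthRejectedBuggy has the shape of condition the commit message describes.
// No integer is both below 32 and above 36, so it never rejects anything.
func lengthRejectedBuggy(serial string) bool {
	return len(serial) < 32 && len(serial) > 36
}

// validSerial is a hypothetical corrected check: the serial must be exactly
// 32 or 36 characters long and (an assumption of this sketch) valid hex.
func validSerial(serial string) bool {
	if len(serial) != 32 && len(serial) != 36 {
		return false
	}
	_, err := hex.DecodeString(serial)
	return err == nil
}

func main() {
	short := "abcd"
	fmt.Println(lengthRejectedBuggy(short)) // false: the bad length slips through
	fmt.Println(validSerial(short))         // false: the corrected check rejects it
	fmt.Println(validSerial("0123456789abcdef0123456789abcdef"))
}
```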
2018-05-24 15:31:06 -04:00
Jacob Hoffman-Andrews b3f5c0f6e5
Speed up goodkey test. (#3733)
This is one of our slowest unit tests, clocking in at 23 seconds in a recent run.
This was largely due to generating keys. Note that performance is significantly
worse under the race detector.
2018-05-23 16:11:46 -07:00
Joel Sing 2540d59296 Implement CAA validation-methods checking. (#3716)
When performing CAA checking respect the validation-methods parameter (if
present) and restrict the allowed authorization methods to those specified.
This allows a domain to restrict authorization methods that can be used with
Let's Encrypt.
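
As a rough sketch of the restriction described above (not the Boulder implementation): the function name, the parameter map, and the comma-separated value format are all assumptions made for illustration. Only the parameter name `validation-methods` comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// methodAllowed reports whether a challenge type may be used given the
// parameters attached to a CAA record. The map layout and the
// comma-separated value format are assumptions of this sketch.
func methodAllowed(params map[string]string, method string) bool {
	allowed, present := params["validation-methods"]
	if !present {
		// Parameter absent: no restriction, every method stays allowed.
		return true
	}
	for _, m := range strings.Split(allowed, ",") {
		if strings.TrimSpace(m) == method {
			return true
		}
	}
	return false
}

func main() {
	params := map[string]string{"validation-methods": "dns-01"}
	fmt.Println(methodAllowed(params, "dns-01"))  // true
	fmt.Println(methodAllowed(params, "http-01")) // false
	fmt.Println(methodAllowed(nil, "http-01"))    // true
}
```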

This is largely based on PR #3003 (by @lukaslihotzki), which was landed and
then later reverted due to issue #3143. The bug that resulted in the previous
code being reverted has been addressed (likely inadvertently) by 76973d0f.

This implementation also includes integration tests for CAA validation-methods.

Fixes issue #3143.
2018-05-23 14:32:31 -07:00
Jacob Hoffman-Andrews 5ad14170fb Ignore canceled IsSafeDomain calls. (#3730)
Fixes #3681.
2018-05-23 12:50:30 -07:00
Jacob Hoffman-Andrews dbcb16543e Start using multiple-IP hostnames for load balancing (#3687)
We'd like to start using the DNS load balancer in the latest version of gRPC. That means putting all IPs for a service under a single hostname (or using a SRV record, but we're not taking that path). This change adds an sd-test-srv to act as our service discovery DNS service. It returns both Boulder IP addresses for any A lookup ending in ".boulder". This change also sets up the Docker DNS for our boulder container to defer to sd-test-srv when it doesn't know an answer.

sd-test-srv doesn't know how to resolve public Internet names like `github.com`. Resolving public names is required for the `godep-restore` test phase, so this change breaks out a copy of the boulder container that is used only for `godep-restore`.

This change implements a shim of a DNS resolver for gRPC, so that we can switch to DNS-based load balancing with the currently vendored gRPC, then when we upgrade to the latest gRPC we won't need a simultaneous config update.

Also, this change introduces a check at the end of the integration test that each backend received at least one RPC, ensuring that we are not sending all load to a single backend.
2018-05-23 09:47:14 -04:00
Jacob Hoffman-Andrews ef0324727d Remove exclude for Go 1.10.2 coverage build. (#3728) 2018-05-21 13:07:28 -04:00
Tom Delmas e78a7bdb10 Doc "Boulder divergences from ACME": ACME v2 is in production (#3725)
Update ACME divergences to reflect that ACME v2 is in production and has 3 divergences from the current RFC.
2018-05-21 09:29:08 -04:00
Daniel McCarney 3df93d2230 VA: Log observed CAA records. (#3726)
This is a quick first pass at audit logging the *dns.CAA records in JSON format from within the VA's IsCAAValid function. This will provide more information for post-hoc analysis of CAA decisions.
2018-05-18 15:28:45 -07:00
Daniel McCarney 084819b011
VA: CAA tag identifiers are case insensitive. (#3722)
Per RFC 6844 Section 5.1 the matching of CAA tag identifiers (e.g.
"Issue") is case insensitive. This commit updates the CAA tag processing
to be case insensitive as required by the RFC.

To exercise the fix this commit adds a test case to the `caaMockDNS`
`LookupCAA` implementation for a hostname (`mixedcase.com`) that has
a CAA record with a mixed case `Issue` tag. Prior to the fix from this
branch being included in `va/caa.go` the test fails:
```
    --- FAIL: TestCAAChecking/Bad_(Reserved,_Mixed_case_Issue) (0.00s)
          caa_test.go:292: checkCAARecords validity mismatch for
          mixedcase.com: got true expected false
```

With the fix applied, the test passes.
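
A minimal sketch of case-insensitive tag matching, assuming the tag is normalized before it is compared. `classifyTag` is a hypothetical helper and not a function from `va/caa.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyTag maps a CAA property tag to its canonical lower-case name,
// ignoring the case used in the DNS record. Unrecognized tags are reported
// as "unknown". This is an illustrative helper, not Boulder's code.
func classifyTag(tag string) string {
	switch strings.ToLower(tag) {
	case "issue":
		return "issue"
	case "issuewild":
		return "issuewild"
	case "iodef":
		return "iodef"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(classifyTag("Issue"))     // issue
	fmt.Println(classifyTag("ISSUEWILD")) // issuewild
	fmt.Println(classifyTag("tbs"))       // unknown
}
```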
2018-05-18 09:34:57 -04:00
Jacob Hoffman-Andrews f7dd91534d Backport config-next/wfe2.json changes to config/ (#3721) 2018-05-17 08:24:34 -04:00
Daniel McCarney 4aacecc318 Godeps: Update `weppos/publicsuffix-go` to 67ec7c1. (#3717)
This commit updates the `github.com/weppos/publicsuffix-go` dependency to
67ec7c1, the tip of master at the time of writing.

Unit tests are verified to pass:

```
$ go test ./...
?   	github.com/weppos/publicsuffix-go/cmd/load	[no test files]
ok  	github.com/weppos/publicsuffix-go/net/publicsuffix	(cached)
ok  	github.com/weppos/publicsuffix-go/publicsuffix	(cached)
```
2018-05-16 10:45:39 -07:00
Jacob Hoffman-Andrews 0709b1caa9
Godeps: Try to appease CI by mimicking comment diff. (#3718)
Prior to this PR the builds of master are failing in Travis with an error during the `godep-restore` phase of our CI:

```
[   godep-restore] Starting
godep restore
rm -rf Godeps/ vendor/
godep save ./...
diff /dev/fd/63 /dev/fd/62
254c254
<                       "Comment": "v1.3-28-g3955978",
---
>                       "Comment": "v1.3.0-28-g3955978",
>                       git diff --exit-code -- ./vendor/
>
```

The root cause here is the upstream `github.com/go-sql-driver/mysql` project adding a
new tag for the 3955978caca48c1658a4bb7a9c6a0f084e326af3 commit. The new tag
is named to match semver requirements and Godeps in CI is preferring that tag.
Updating the file to point to this new tag is a sensible fix from our side.

This commit updates the "Comment" field to match what CI expects.
2018-05-16 09:59:57 -07:00
Daniel e65286659e Godeps: Try to appease CI by mimicking comment diff.
Prior to this commit the builds of master are failing in Travis with an
error during the `godep-restore` phase of our CI:

```
[   godep-restore] Starting
godep restore
rm -rf Godeps/ vendor/
godep save ./...
diff /dev/fd/63 /dev/fd/62
254c254
<                       "Comment": "v1.3-28-g3955978",
---
>                       "Comment": "v1.3.0-28-g3955978",
>                       git diff --exit-code -- ./vendor/
>
```

This seems to be a mysterious difference in the "Comment" field of the
`github.com/go-sql-driver/mysql` dependency. This dep hasn't changed
versions so given the general level of frustration involved with
debugging Godep it seems like the easiest path forward is to mimic the
diff.

This commit updates the "Comment" field to match what CI expects.
2018-05-16 11:42:07 -04:00
Roland Bracewell Shoemaker 30394c4b4c Accept empty pin and generate a key ID (#3713)
Two fixes that I found while doing work on the gen-cert tool and setting up the HSM again
* Accept an empty PIN argument; this allows purely using the PED for login if not using challenge mode
* Generate 4 byte key ID for public/private key pairs during key gen, the HSM doesn't generate this field itself and `letsencrypt/pkcs11key` relies on this attribute to function
2018-05-16 08:33:34 -04:00
Joel Sing 5bc7fe639d Distinguish between recheckCAA failures. (#3710)
When rechecking CAA, the existing code maps all failures to a CAAError.
This means that any other non-CAA failure (for example, an internal server
error) gets hidden.

Avoid this by reworking recheckCAA to return errors and if we find a
non-CAAError, we return that directly. Revise tests to cover both
situations.
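
The idea can be sketched as follows. This is a simplified illustration, not the actual `recheckCAA` code: `caaError`, `errInternal`, and `combine` are hypothetical names, and the real function works with Boulder's own error types.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// caaError is a stand-in for a CAA-specific failure.
type caaError struct{ detail string }

func (e *caaError) Error() string { return "CAA: " + e.detail }

// errInternal is a stand-in for any failure unrelated to CAA.
var errInternal = errors.New("internal server error")

// combine inspects per-name recheck results. A non-CAA error is returned
// directly so it is not hidden; CAA failures are merged into one caaError.
func combine(results []error) error {
	var caaDetails []string
	for _, err := range results {
		if err == nil {
			continue
		}
		var ce *caaError
		if !errors.As(err, &ce) {
			return err
		}
		caaDetails = append(caaDetails, ce.detail)
	}
	if len(caaDetails) > 0 {
		return &caaError{detail: strings.Join(caaDetails, "; ")}
	}
	return nil
}

func main() {
	forbidden := &caaError{detail: "a.example forbids issuance"}
	fmt.Println(combine([]error{nil, forbidden}))
	fmt.Println(combine([]error{forbidden, errInternal}))
}
```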

Updates issue #3143.
2018-05-15 17:57:35 -07:00
Joel Sing 1da6af39a1 Add an integration test for CAA rechecking. (#3709)
The existing CAA tests only test the CAA checks on the validation path and
not the CAA rechecking in the case where an existing authorization is present
(but older than the 8 hour window).

This extends the CAA integration tests to also cover the CAA rechecking
code path, by reusing older authorizations and rejecting issuance via CAA.
2018-05-15 09:55:28 -07:00
Daniel McCarney 700219a4a0
Dev/CI: Remove Go 1.10.1, default to 1.10.2 (#3711)
Production/staging have been updated to a release built with Go 1.10.2.
This allows us to remove the Go 1.10.1 builds from the travis matrix and
default to Go 1.10.2.
2018-05-11 16:25:25 -04:00
Daniel McCarney 5597a77ba2
WFE2: Allow legacy Key ID prefix for ACME v2 JWS. (#3705)
While we intended to allow legacy ACME v1 accounts created through the WFE to work with the ACME v2 implementation and the WFE2, we neglected to consider that a legacy account would have a Key ID URL that doesn't match the prefix expected for a V2 account. This caused `wfe2/verify.go`'s `lookupJWK` to reject all POST requests authenticated by a legacy account unless the ACME client took the extra manual step of "fixing" the URL.

This PR adds a configuration parameter to the WFE2 for an allowed legacy key ID prefix. The WFE2 verification logic is updated to allow both the expected key ID prefix and the configured legacy key ID prefix. This will allow us to specify the correct legacy URL in configuration for both staging/prod to allow unmodified V1 ACME accounts to be used with ACME v2.
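
A minimal sketch of the two-prefix check, assuming a plain string-prefix comparison. `acceptableKeyID` and the example URLs are hypothetical and are not taken from `wfe2/verify.go` or from any real configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// acceptableKeyID reports whether a JWS Key ID URL starts with either the
// expected prefix or, when one is configured, the legacy prefix.
func acceptableKeyID(keyID, prefix, legacyPrefix string) bool {
	if strings.HasPrefix(keyID, prefix) {
		return true
	}
	// An empty legacy prefix means the feature is not configured.
	return legacyPrefix != "" && strings.HasPrefix(keyID, legacyPrefix)
}

func main() {
	// Placeholder URLs, chosen only for this example.
	const prefix = "https://acme.example/acme/acct/"
	const legacy = "https://acme.example/acme/reg/"

	fmt.Println(acceptableKeyID("https://acme.example/acme/acct/7", prefix, legacy)) // true
	fmt.Println(acceptableKeyID("https://acme.example/acme/reg/42", prefix, legacy)) // true
	fmt.Println(acceptableKeyID("https://other.example/acct/42", prefix, legacy))    // false
}
```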

Resolves https://github.com/letsencrypt/boulder/issues/3674
2018-05-11 15:57:56 -04:00
metaclassing f6c49adc30 Make datetime consistent for authz expiration (#3706)
So I took a quick stab at one possible solution for the authz expiration variance discussed over at community.letsencrypt.org/t/inconsistent-datetime-format-in-some-responses/61452

Golang's nanosecond precision results in newly created pending authz having one expires datetime, but subsequent requests have a different expires datetime due to database storage throwing away fractional second information.

  "expires": "2018-04-14T04:43:00.105818284Z",
...
  "expires": "2018-04-14T04:43:00Z",

I am not a Go expert so there might be some more widely accepted approach to accomplishing the same thing; please let me know if you would prefer a different solution.
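
One way to get a consistent value is to drop the fractional seconds before the timestamp is first returned. The sketch below uses `time.Time.Truncate` for that; whether the commit takes this exact approach is an assumption, and `normalizeExpires` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// normalizeExpires parses an RFC 3339 timestamp and returns it with any
// fractional seconds removed, matching what survives a database round trip
// that stores whole seconds only.
func normalizeExpires(expires string) (string, error) {
	t, err := time.Parse(time.RFC3339Nano, expires)
	if err != nil {
		return "", err
	}
	return t.Truncate(time.Second).Format(time.RFC3339Nano), nil
}

func main() {
	out, err := normalizeExpires("2018-04-14T04:43:00.105818284Z")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(out) // 2018-04-14T04:43:00Z
}
```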
2018-05-11 11:57:52 -07:00
Joel Sing f03c2517c6 Simplify the recheckCAA function. (#3701)
Using both a sync.WaitGroup and channel is somewhat unnecessary - instead
synchronize directly on the channel. Additionally, strings in Go are
immutable - as such using string concatenation in a loop results in
reallocation. Use a string slice and strings.Join, which not only avoids
this, but also cuts down on the lines of code needed.
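
Both points can be sketched together. This is an illustration under assumed names (`failedNames`, `errBlocked`), not the `recheckCAA` function itself: workers report on a buffered channel, the caller receives exactly one result per worker, and the failures are joined once at the end.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// errBlocked is a placeholder failure used by the example check.
var errBlocked = errors.New("blocked")

// failedNames runs check for every name concurrently and returns the failing
// names as one comma-separated string. Receiving len(names) results from the
// channel is the only synchronization; no sync.WaitGroup is needed.
func failedNames(names []string, check func(string) error) string {
	results := make(chan string, len(names))
	for _, name := range names {
		go func(name string) {
			if err := check(name); err != nil {
				results <- name
				return
			}
			results <- "" // an empty string signals success
		}(name)
	}

	var failed []string
	for range names {
		if name := <-results; name != "" {
			failed = append(failed, name)
		}
	}
	sort.Strings(failed) // goroutine completion order is not deterministic
	return strings.Join(failed, ", ")
}

func main() {
	check := func(name string) error {
		if strings.HasPrefix(name, "bad.") {
			return errBlocked
		}
		return nil
	}
	fmt.Println(failedNames([]string{"ok.example", "bad.example"}, check))
}
```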
2018-05-11 11:56:27 -07:00
Joel Sing f8a023e49c Remove various unnecessary uses of fmt.Sprintf (#3707)
Remove various unnecessary uses of fmt.Sprintf - in particular:

- Avoid calls like t.Error(fmt.Sprintf(...)), where t.Errorf can be used directly.

- Use strconv when converting an integer to a string, rather than using
  fmt.Sprintf("%d", ...). This is simpler and can also detect type errors at
  compile time.

- Instead of using x.Write([]byte(fmt.Sprintf(...))), use fmt.Fprintf(x, ...).
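
Two of the replacements listed above, sketched with hypothetical helper names (`portString`, `summary`). The `t.Errorf` case is omitted here because it needs a test function to run in.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// portString converts an integer with strconv.Itoa rather than
// fmt.Sprintf("%d", port); passing a non-int here fails at compile time.
func portString(port int) string {
	return strconv.Itoa(port)
}

// summary writes formatted output straight into a writer with fmt.Fprintf
// rather than buf.Write([]byte(fmt.Sprintf(...))).
func summary(account string, orders int) string {
	var buf bytes.Buffer
	fmt.Fprintf(&buf, "%s has %d orders", account, orders)
	return buf.String()
}

func main() {
	fmt.Println(portString(4001))
	fmt.Println(summary("acct-1", 3))
}
```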
2018-05-11 11:55:25 -07:00
Joel Sing 9990d14654 Convert the probs functions to be formatters. (#3708)
Many of the probs.XYZ calls are of the form probs.XYZ(fmt.Sprintf(...)).
Convert these functions to take a format string and optional arguments,
following the same pattern used in the errors package. Convert the
various call sites to remove the now redundant fmt.Sprintf calls.
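
The pattern can be sketched like this. `ProblemDetails` and `Malformed` here are simplified stand-ins written for the example; the field names and status value are assumptions, not the definitions in the `probs` package.

```go
package main

import "fmt"

// ProblemDetails is a simplified stand-in for a problem document.
type ProblemDetails struct {
	Type       string
	Detail     string
	HTTPStatus int
}

// Malformed takes a format string and optional arguments, so callers no
// longer wrap their message in fmt.Sprintf before passing it in.
func Malformed(detail string, args ...interface{}) *ProblemDetails {
	return &ProblemDetails{
		Type:       "malformed",
		Detail:     fmt.Sprintf(detail, args...),
		HTTPStatus: 400,
	}
}

func main() {
	// Before: Malformed(fmt.Sprintf("unknown key ID %q", id))
	// After:  the format string and its arguments are passed directly.
	prob := Malformed("unknown key ID %q", "abc")
	fmt.Println(prob.Detail)
}
```

One consequence of this design is that a caller passing untrusted text as the first argument should use `Malformed("%s", text)`, since any `%` in the text would otherwise be read as a formatting verb.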
2018-05-11 11:51:16 -07:00
Joel Sing 087074c73b Fix issue with expired authz test. (#3704)
The test_expired_authz_404() test is currently broken in two ways - firstly,
there is no way for it to distinguish between a 404 from an expired authz
and a 404 from a nonexistent authz. Secondly, the test_expired_authz_purger()
test runs and wipes out all of the existing authorizations, including the one
that was set up from setup_seventy_days_ago(), before the expired test runs.

Avoid this by running the expired authorization purger test from later in main().
Also, add a minimal canary that will detect if all authorizations have been purged
(although this still does not guarantee that we got a 404 due to expiration).
2018-05-11 10:56:32 -07:00
Joel Sing 8ebdfc60b6 Provide formatting logger functions. (#3699)
A very large number of the logger calls are of the form log.Function(fmt.Sprintf(...)).
Rather than sprinkling fmt.Sprintf at every logger call site, provide formatting versions
of the logger functions and call these directly with the format and arguments.

While here remove some unnecessary trailing newlines and calls to String/Error.
2018-05-10 11:06:29 -07:00
Jacob Hoffman-Andrews 099d1c8858 Reduce load testing request rate in Travis. (#3703)
This may reduce the amount of logs we output, getting us below the level
that gets our jobs killed.
2018-05-10 08:32:53 -04:00
Daniel McCarney 76a3f4a18f RA/CA: Use `doNotForceCN: false` for `test/config`. (#3698)
In staging/prod we use `doNotForceCN: false` for both the RA & CA
config. Switching this to `true` is blocked on CABF work that will
likely take considerable time.

In the short-term we should use `doNotForceCN: false` in `test/config`
and only use `doNotForceCN: true` in `test/config-next`.
2018-05-09 12:54:16 -07:00
Daniel McCarney c254159235 challsrv: Common ACME challenge response server library/command. (#3689)
Prior to this commit we had two implementations of ACME challenge
servers for use in tests:
1) test/dns-test-srv - a small fake DNS server used for adding/removing
   DNS-01 TXT records and returning fake A/AAAA data.
2) test/load-generator/challenge-servers.go - a small library for
   providing an HTTP-01 challenge server.

This commit consolidates both into a dedicated `test/challsrv` package.
The `load-generator` code is updated to use this library package to
implement its HTTP-01 challenge server. This leaves the `load-generator`
as a nice stand alone tool that doesn't need coordination between itself
and a separate `challsrv` binary.

To keep the `dns-test-srv` use-case of a nice standalone binary that can
be run from `test/startservers.py` the `test/challsrv` package has
a `test/challsrv/cmd/challsrv` package that provides the `challsrv`
command. This is a stand-alone binary that can offer both an HTTP-01 and
a DNS-01 challenge server along with a management HTTP interface that
can be used by external programs to add/remove HTTP-01 and DNS-01
challenges.

The Boulder integration tests are updated to use `challsrv` instead of
`dns-test-srv`. Presently only the DNS-01 challenge server of `challsrv`
is used by the integration tests.

TODO: The DNS-01 challenge server is doing a fair number of non-DNS-01
challenge things (Fake host data, etc). This should be cleaned up and
made configurable.

Updates #3652
2018-05-09 12:49:13 -07:00
Jacob Hoffman-Andrews 75e4c65c21
Republish most ports (#3700)
Merging #3693 broke Certbot's integration tests. I'm working with them on a PR that will make the tests work against both a published and unpublished Boulder container. But the experience made me realize a lot of clients are running Boulder for integration tests, and we shouldn't break them without warning. This republishes the necessary ports, while leaving the unnecessary (debugging) ports unpublished.
2018-05-09 11:34:03 -07:00
Roland Bracewell Shoemaker b2a2a24dc3 Stop using validation record as an input/output (#3694)
This change cleans up how `va.http01Dialer` works with regard to `core.ValidationRecord`s. Instead of using the record as both an input and an output, it now takes a set of inputs and outputs information about addresses via a channel. The validation record is then constructed in the parent scope or in the redirect function instead of in the dialer itself.

Fixes #2730, fixes #3109, and fixes #3663.
2018-05-09 11:55:14 -04:00
Roland Bracewell Shoemaker e3eb3019b2 Update golang.org/x/net (#3695)
Updates `golang.org/x/net` to master (d11bb6cd).

```
$ go test ./...
ok  	golang.org/x/net/bpf	(cached)
ok  	golang.org/x/net/context	(cached)
ok  	golang.org/x/net/context/ctxhttp	(cached)
?   	golang.org/x/net/dict	[no test files]
ok  	golang.org/x/net/dns/dnsmessage	(cached)
ok  	golang.org/x/net/html	(cached)
ok  	golang.org/x/net/html/atom	(cached)
ok  	golang.org/x/net/html/charset	(cached)
ok  	golang.org/x/net/http/httpguts	(cached)
ok  	golang.org/x/net/http/httpproxy	(cached)
ok  	golang.org/x/net/http2	(cached)
?   	golang.org/x/net/http2/h2i	[no test files]
ok  	golang.org/x/net/http2/hpack	(cached)
ok  	golang.org/x/net/icmp	0.199s
ok  	golang.org/x/net/idna	(cached)
?   	golang.org/x/net/internal/iana	[no test files]
?   	golang.org/x/net/internal/nettest	[no test files]
ok  	golang.org/x/net/internal/socket	(cached)
ok  	golang.org/x/net/internal/socks	(cached)
ok  	golang.org/x/net/internal/sockstest	(cached)
ok  	golang.org/x/net/internal/timeseries	(cached)
ok  	golang.org/x/net/ipv4	(cached)
ok  	golang.org/x/net/ipv6	(cached)
ok  	golang.org/x/net/nettest	(cached)
ok  	golang.org/x/net/netutil	(cached)
ok  	golang.org/x/net/proxy	(cached)
ok  	golang.org/x/net/publicsuffix	(cached)
ok  	golang.org/x/net/trace	(cached)
ok  	golang.org/x/net/webdav	(cached)
ok  	golang.org/x/net/webdav/internal/xml	(cached)
ok  	golang.org/x/net/websocket	(cached)
ok  	golang.org/x/net/xsrftoken	(cached)
```

Fixes #3692.
2018-05-08 10:38:32 -07:00
Roland Bracewell Shoemaker bd755a31e5 Log error body for non-500 CT submission failures (#3688)
This lets us better debug strange behavior from logs.
2018-05-08 13:20:58 -04:00
Jacob Hoffman-Andrews c5438dc2dc Unpublish ports in docker-compose.yml. (#3693)
Originally we published these ports mainly to make it easier to access Boulder
from the host when running things in a container locally. However, with the
recent changes to the Docker network, Boulder is now at a fixed, predictable IP
address on the Docker network that can be reached from the host, so we no longer
need these published ports.

This is a nice change because people running a Boulder test environment may not
want to open up ports on their public interfaces.
2018-05-07 20:53:23 -07:00
Jacob Hoffman-Andrews a4421ae75b Run gRPC backends on multiple IPs instead of multiple ports (#3679)
We're currently stuck on gRPC v1.1 because of a breaking change to certificate validation in gRPC 1.8. Our gRPC balancer uses a static list of multiple hostnames, and expects to validate against those hostnames. However gRPC expects that a service is one hostname, with multiple IP addresses, and validates all those IP addresses against the same hostname. See grpc/grpc-go#2012.

If we follow gRPC's assumptions, we can rip out our custom Balancer and custom TransportCredentials, and will probably have a lower-friction time in general.

This PR is the first step in doing so. In order to satisfy the "multiple IPs, one port" property of gRPC backends in our Docker container infrastructure, we switch to Docker's user-defined networking. This allows us to give the Boulder container multiple IP addresses on different local networks, and gives it different DNS aliases in each network.

In startservers.py, each shard of a service listens on a different DNS alias for that service, and therefore a different IP address. The listening port for each shard of a service is now identical.

This change also updates the gRPC service certificates. Now, each certificate that is used in a gRPC service (as opposed to something that is "only" a client) has three names. For instance, sa1.boulder, sa2.boulder, and sa.boulder (the generic service name). For now, we are validating against the specific hostnames. When we update our gRPC dependency, we will begin validating against the generic service name.

Incidentally, the DNS aliases feature of Docker allows us to get rid of some hackery in entrypoint.sh that inserted entries into /etc/hosts.

Note: Boulder now has a dependency on the DNS aliases feature in Docker. By default, docker-compose run creates a temporary container and doesn't assign any aliases to it. We now need to specify docker-compose run --use-aliases to get the correct behavior. Without --use-aliases, Boulder won't be able to resolve the hostnames it wants to bind to.
2018-05-07 10:38:31 -07:00
Roland Bracewell Shoemaker 9920058ef0 Cleanup .travis.yml (#3691)
Our `.travis.yml` does some weird things that are a bit non-standard. This change brings it into line with the general Travis conventions - https://docs.travis-ci.com/user/customizing-the-build
2018-05-04 19:32:30 -04:00
Daniel McCarney 230db4ebd1 Dev/CI: Add Go 1.10.2 to build matrix. (#3690)
This PR adds Go 1.10.2 to the build matrix along with Go 1.10.1.

After staging/prod have been updated to Go 1.10.2 we can remove Go 1.10.1.

Resolves #3680
2018-05-04 12:35:11 -07:00
Roland Bracewell Shoemaker 9821aeb46f Split internal and public errors out in web.RequestEvent (#3682)
Splits the old `Errors` slice into a public `Error` string and an `InternalErrors` slice. Also removes a number of places that called `logEvent.AddError` and then immediately called `wfe.sendError`, either with the same internal error (which logged the error twice) or with no error (which is slightly redundant, since `wfe.sendError` calls `logEvent.AddError` internally).

Fixes #3664.
2018-05-03 09:13:33 -04:00
Daniel McCarney b7f356150a SA: Cleanup, forbid nil issuer arg to AddCertificate (#3675)
In #3651 we introduced a new parameter to sa.AddCertificate to allow specifying the Issued date. If nil, we defaulted to the current time to maintain deployability guidelines.

Now that this has been deployed everywhere this PR updates SA.AddCertificate and the gRPC wrappers such that a nil issued argument is rejected with an error.

Unit tests that were previously using nil for the issued time are updated to explicitly set the issued time to the fake clock's now().

Resolves #3657
2018-05-02 10:29:21 -07:00
Roland Bracewell Shoemaker a5ac5fa078 Deprecate IPv6First feature flag (#3684) 2018-05-02 10:22:25 -07:00
Roland Bracewell Shoemaker c3669f9068 Split endpoint and path in WFE+WFE2 web.RequestEvent (#3683) 2018-05-02 10:20:21 -07:00
Roland Bracewell Shoemaker d01f74402b Fix ec gen-key test (#3685)
Test sign function didn't properly pad R and S in the EC signature as per the PKCS#11 spec.
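
A minimal sketch of the padding involved, assuming the raw signature format in which R and S are each encoded as fixed-width big-endian values and concatenated. `leftPad` and `encodeRS` are hypothetical helpers, not the functions from the gen-key test.

```go
package main

import (
	"fmt"
	"math/big"
)

// leftPad returns b extended with leading zero bytes to exactly size bytes.
// Input that is already size bytes or longer is returned unchanged.
func leftPad(b []byte, size int) []byte {
	if len(b) >= size {
		return b
	}
	padded := make([]byte, size)
	copy(padded[size-len(b):], b)
	return padded
}

// encodeRS concatenates R and S after padding each to size bytes. Without
// the padding, a value with leading zero bytes would produce a short and
// therefore invalid signature, because big.Int.Bytes drops leading zeros.
func encodeRS(r, s []byte, size int) []byte {
	sig := make([]byte, 0, 2*size)
	sig = append(sig, leftPad(r, size)...)
	sig = append(sig, leftPad(s, size)...)
	return sig
}

func main() {
	r := big.NewInt(0x0102)                   // encodes to only 2 bytes
	s := new(big.Int).Lsh(big.NewInt(1), 255) // encodes to a full 32 bytes
	sig := encodeRS(r.Bytes(), s.Bytes(), 32) // 32 bytes per value for a 256-bit curve
	fmt.Println(len(sig))                     // 64
}
```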

Fixes #3671.
2018-05-01 18:07:01 -07:00
Daniel McCarney 041cd26738
SA: Remove unused `CountCertificateRange` RPC. (#3676)
Now that #3638 has been deployed to all of the RA instances there are no
more RPC clients using the SA's `CountCertificatesRange` RPC.

This commit deletes the implementation, the RPC definition & wrappers,
and all the test code/mocks.
2018-05-01 15:39:45 -04:00
Daniel McCarney 054f181458 load-generator: send correct ACMEv2 Content-Type on POST (#3667)
load generator: send correct ACMEv2 Content-Type on POST.

This PR updates the Boulder load-generator to send the correct ACMEv2 Content-Type header when POSTing to the ACME server. This is required for ACMEv2 and without it all POST requests to the WFE2 running a test/config-next configuration result in malformed 400 errors. While only required by ACMEv2, this commit sends it for ACMEv1 requests as well. No harm no foul.

integration tests: allow running just the load generator.
Prior to this PR an omission in an if statement in integration-test.py meant that you couldn't invoke test/integration-test.py with just the --load argument to only run the load generator. This commit updates the if to allow this use case.
2018-05-01 12:22:43 -07:00
Jacob Hoffman-Andrews 0fac9e6586 Remove redundant wfe-tls/README.txt. (#3678) 2018-05-01 09:24:27 -04:00
Daniel McCarney 5f8cae847b gitignore: ignore .gocache folder (#3677) 2018-04-30 10:17:02 -07:00
Daniel McCarney 4f9ee00510 gRPC: publish in-flight RPC gauge in client interceptor. (#3672)
This PR updates the Boulder gRPC clientInterceptor to update a Prometheus gauge stat for each in-flight RPC it dispatches, sliced by service and method.

A unit test is included that uses a custom ChillerServer that lets the test block up a bunch of RPCs, check the in-flight gauge value is increased, unblock the RPCs, and recheck that the in-flight gauge is reduced. To check the gauge value for a specific set of labels a new test-tools.go function GaugeValueWithLabels is added.

Updates #3635
2018-04-27 15:53:54 -07:00
Daniel McCarney 0e07eacb01
gRPC: Rename histogram rpc_lag -> grpc_lag (#3673) 2018-04-26 16:19:13 -04:00