Commit Graph

46 Commits

Author SHA1 Message Date
Samantha Frank 71178f4ca4
WFE: Track in-flight for "/" (#7759) 2024-10-18 12:59:26 -04:00
Samantha Frank d0c9aa3808
WFE: Track in-flight HTTP requests by endpoint using a gauge (#7758) 2024-10-18 09:51:02 -04:00
Matthew McPherrin 3aae67b8a9
Opentelemetry: Add option for public endpoints (#6867)
This PR adds a new configuration block specifically for the otelhttp
instrumentation. This block is separate from the existing
"opentelemetry" configuration, and is only relevant when using otelhttp
instrumentation. It does not share any codepath with the existing
configuration, so it is at the top level to indicate which services it
applies to.

There's a bit of plumbing new configuration through. I've adopted the
measured_http package to also set up opentelemetry instead of just
metrics, which should hopefully allow any future changes to be smaller
(just config & there) and more consistent between the wfe2 and ocsp
responder.

There's one option here now, which disables setting
[otelhttp.WithPublicEndpoint](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#WithPublicEndpoint).
This option is designed to do exactly what we want: Don't accept
incoming spans as parents of the new span created in the server.
Previously we had a setting to disable parent-based sampling to help
with this problem, which doesn't really make sense anymore, so let's
just remove it and simplify that setup path. The default of "false" is
designed to be the safe option. It's set to True in the test/ configs
for integration tests that use traces, and I expect we'll likely set it
true in production eventually once the LBs are configured to handle
tracing themselves.

Fixes #6851
2023-05-12 15:34:34 -04:00
Andrew Gabbitas 55a519fc33
Decrease InternetFacingBuckets histogram (#5748) 2021-10-25 17:39:28 -06:00
Roland Bracewell Shoemaker 5b2f11e07e Switch away from old style statsd metrics wrappers (#4606)
In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back.

There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case, dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change.

Fixes #4591.
2019-12-18 11:08:25 -05:00
Daniel McCarney 893e8459d6
Use pebble-challtestrv cmd, letsencrypt/challtestsrv package. (#3980)
Now that Pebble has a `pebble-challtestsrv` we can remove the `challtestrv`
package and associated command from Boulder. I switched CI to use
`pebble-challtestsrv`. Notably this means that we have to add our expected mock
data using the HTTP management interface. The Boulder-tools images are
regenerated to include the `pebble-challtestsrv` command.

Using this approach also allows separating the TLS-ALPN-01 and HTTPS HTTP-01
challenges by binding each challenge type in the `pebble-challtestsrv` to
different interfaces both using the same VA
HTTPS port. Mock DNS directs the VA to the correct interface.

The load-generator command that was previously using the `challtestsrv` package
from Boulder is updated to use a vendored copy of the new
`github.org/letsencrypt/challtestsrv` package.

Vendored dependencies change in two ways:
1) Gomock is updated to the latest release (matching what the Bouldertools image
   provides)
2) A couple of new subpackages in `golang.org/x/net/` are added by way of
   transitive dependency through the challtestsrv package.

Unit tests are confirmed to pass for `gomock`:
```
~/go/src/github.com/golang/mock/gomock$ git log --pretty=format:'%h' -n 1
51421b9
~/go/src/github.com/golang/mock/gomock$ go test ./...
ok    github.com/golang/mock/gomock 0.002s
?     github.com/golang/mock/gomock/internal/mock_matcher [no test files]
```
For `/x/net` all tests pass except two `/x/net/icmp` `TestDiag.go` test cases
that we have agreed are OK to ignore.

Resolves https://github.com/letsencrypt/boulder/issues/3962 and
https://github.com/letsencrypt/boulder/issues/3951
2018-12-12 14:32:56 -05:00
Roland Bracewell Shoemaker 912fa6ffff Properly set status code when WriteHeader isn't explicitly called (#3828)
If a handler doesn't explicitly call `WriteHeader` before `Write` then the status code is set to `http.StatusOK` but `measured_http.MeasuredHandler` doesn't handle this which results in reporting `0` as the response code.
2018-08-24 11:37:32 -07:00
Joel Sing f8a023e49c Remove various unnecessary uses of fmt.Sprintf (#3707)
Remove various unnecessary uses of fmt.Sprintf - in particular:

- Avoid calls like t.Error(fmt.Sprintf(...)), where t.Errorf can be used directly.

- Use strconv when converting an integer to a string, rather than using
  fmt.Sprintf("%d", ...). This is simpler and can also detect type errors at
  compile time.

- Instead of using x.Write([]byte(fmt.Sprintf(...))), use fmt.Fprintf(x, ...).
2018-05-11 11:55:25 -07:00
Roland Bracewell Shoemaker 8167abd5e3 Use internet facing appropriate histogram buckets for DNS latencies (#3616)
Also instead of repeating the same bucket definitions everywhere just use a single top level var in the metrics package in order to discourage copy/pasting.

Fixes #3607.
2018-04-04 08:01:54 -04:00
Jacob Hoffman-Andrews 6cd777bd8d Fix up stats after #3167 (#3185)
There were two bugs in #3167:

All process-level stats got prefixed with "boulder", which broke dashboards.
All request_time stats got dropped, because measured_http was using the prometheus DefaultRegisterer.
To fix, this PR plumbs through a scope object to measured_http, and uses an empty prefix when calling NewProcessCollector().
2017-10-18 11:14:59 -07:00
Roland Bracewell Shoemaker e91349217e Switch to using go 1.9 (#3047)
* Switch to using go 1.9

* Regenerate with 1.9

* Manually fix import path...

* Upgrade mockgen and regenerate

* Update github.com/golang/mock
2017-09-06 16:30:13 -04:00
Jacob Hoffman-Andrews 9c7482fa94 Remove error return from Scope interface. (#2857)
This was inherited from the statsd interface but never used. This allows us to
remove one of our errcheck exceptions.
2017-07-11 10:54:06 -07:00
Jacob Hoffman-Andrews 5b9d737380 Fix statsd->prometheus bridge. (#2828)
In #2752, I accidentally introduced a change that would use a NewRegistry for
each NewPromScope, ignoring the Registry that was passed as an argument. Because
this registry was not attached to any HTTP server, the results would not get
exported. This fixes that, so the Registry passed into NewPromScope is
respected.

In the process, I noticed that stats were getting prefixed by a spurious "_". I
fixed that by turning prefix into a slice of strings, and combining them with "_"
only if it the slice is non-empty.

Fixes #2824.
2017-06-26 14:07:30 -04:00
Jacob Hoffman-Andrews 0bfb542514 Use fields, not globals, for stats (#2790)
Following up on #2752, we don't need to use global vars for our Prometheus stats. We already have a custom registry plumbed through using Scope objects. In this PR, expose the MustRegister method of that registry through the Scope interface, and move existing global vars to be fields of objects. This should improve testability somewhat.

Note that this has a bit of an unfortunate side effect: two instances of the same stats-using class (e.g. VA) can't use the same Scope object, because their MustRegister calls will conflict. In practice this is fine since we never instantiate duplicates of the the classes that use stats, but it's something we should keep an eye on.

Updates #2733
2017-06-06 12:09:31 -07:00
Jacob Hoffman-Andrews 4e30af4ec8 Collapse switch cases. 2017-05-31 14:17:55 -07:00
Jacob Hoffman-Andrews 81e58b66f7 Use whitelisted methods for HTTP stats.
Fixes #2776
2017-05-30 15:50:12 -07:00
Jacob Hoffman-Andrews b17b5c72a6 Remove statsd from Boulder (#2752)
This removes the config and code to output to statsd.

- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.

Fixes #2639, #2733.
2017-05-15 10:19:54 -04:00
Roland Bracewell Shoemaker bd045b9325 Fix OCSP-Responder double slash collapsing (#2748)
Uses a special mux for the OCSP Responder so that we stop collapsing double slashes in GET requests which cause a small number of requests to be considered malformed.
2017-05-10 09:51:10 -04:00
Jacob Hoffman-Andrews d59188c676 Use pattern to determine endpoint metrics. (#2689)
This ensures we don't create infinite metrics based on users hitting
non-existent endpoints.
2017-04-20 13:14:47 -04:00
Jacob Hoffman-Andrews 4b665e35a6 Use Prometheus stats for VA, WFE, and OCSP Responder (#2628)
Rename HTTPMonitor to MeasuredHandler.
Remove inflight stat (we didn't use it).
Add timing stat by method, endpoint, and status code.
The timing stat subsumes the "rate" stat, so remove that.
WFE now wraps in MeasuredHandler, instead of relying on its cmd/main.go.
Remove FBAdapter stats.
MeasuredHandler tracks stats by method, status code, and endpoint.

In VA, add a Prometheus histogram for validation timing.
2017-04-03 17:03:04 -07:00
Daniel cc896c9996
Strips invalid characters in `promAdjust`.
The `promAdjust` function in `auto.go` previously allowed characters
that were not valid in prometheus metric names (e.g. '>'). This commit
updates `promAdjust` to remove invalid characters. The `TestPromAdjust`
function is updated with testcases that include invalid characters.
2017-02-01 16:07:33 -05:00
Daniel e88db3cd5e
Revert "Revert "Copy all statsd stats to Prometheus. (#2474)" (#2541)"
This reverts commit 9d9e4941a5 and
restores the statsd prometheus code.
2017-02-01 15:48:18 -05:00
Daniel McCarney 9d9e4941a5 Revert "Copy all statsd stats to Prometheus. (#2474)" (#2541)
This reverts commit 58ccd7a71a.

We are seeing multiple boulder components restart when they encounter the stat registration race condition described in https://github.com/letsencrypt/boulder/issues/2540
2017-02-01 12:50:27 -05:00
Jacob Hoffman-Andrews 58ccd7a71a Copy all statsd stats to Prometheus. (#2474)
We have a number of stats already expressed using the statsd interface. During
the switchover period to direct Prometheus collection, we'd like to make those
stats available both ways. This change automatically exports any stats exported
using the statsd interface via Prometheus as well.

This is a little tricky because Prometheus expects all stats to by registered
exactly once. Prometheus does offer a mechanism to gracefully recover from
registering a stat more than once by handling a certain error, but it is not
safe for concurrent access. So I added a concurrency-safe wrapper that creates
Prometheus stats on demand and memoizes them.

In the process, made a few small required side changes:
 - Clean "/" from method names in the gRPC interceptors. They are allowed in
   statsd but not in Prometheus.
 - Replace "127.0.0.1" with "boulder" as the name of our testing CT log.
   Prometheus stats can't start with a number.
 - Remove ":" from the CT-log stat names emitted by Publisher. Prometheus stats
   can't include it.
 - Remove a stray "RA" in front of some rate limit stats, since it was
   duplicative (we were emitting "RA.RA..." before).

Note that this means two stat groups in particular are duplicated:
 - Gostats* is duplicated with the default process-level stats exported by the
   Prometheus library.
 - gRPCClient* are duplicated by the stats generated by the go-grpc-prometheus
   package.

When writing dashboards and alerts in the Prometheus world, we should be careful
to avoid these two categories, as they will disappear eventually. As a general
rule, if a stat is available with an all-lowercase name, choose that one, as it
is probably the Prometheus-native version.

In the long run we will want to create most stats using the native Prometheus
stat interface, since it allows us to use add labels to metrics, which is very
useful. For instance, currently our DNS stats distinguish types of queries by
appending the type to the stat name. This would be more natural as a label in
Prometheus.
2017-01-10 10:30:15 -05:00
Jacob Hoffman-Andrews 68f8b686af Remove pid from stats. (#2182) 2016-09-16 10:56:45 -07:00
Roland Bracewell Shoemaker c8f1fb3e2f Remove direct usages of go-statsd-client in favor of using metrics.Scope (#2136)
Fixes #2118, fixes #2082.
2016-09-07 19:35:13 -04:00
Jacob Hoffman-Andrews 0543691d9e Add stats to Publisher (#2083)
Fixes #1576.

Adds a new package mock_metrics, with code generated by gomock, in order to test the change.
Modifies publisher.New to take a metrics.Scope and an SA, and unexport SA.
Moves core of submission loop into a separate function, singleLogSubmit, which can return an error rather than using the continue keyword. This reduces repetition of AuditErr lines, and makes it easier to put error statting in one place.
2016-08-17 16:25:33 -07:00
Roland Bracewell Shoemaker 7b29dba75d Add gRPC server-side interceptor (#1933)
Adds a server side unary RPC interceptor which includes basic stats. We could also use this to add a server request ID to the context.Context to identify the call through the system, but really I'd rather do that on the client side before the RPC is sent which requires the client interceptor implementation upstream. Also updates google.golang.org/grpc.

Updates #1880.
2016-06-20 11:27:32 -04:00
Roland Bracewell Shoemaker 54573b36ba Remove all stray copyright headers and appends the initial line to LICENSE.txt (#1853) 2016-05-31 12:32:04 -07:00
Jacob Hoffman-Andrews 6d5348f975 Run go generate in Travis (#1762)
* Fix go generate command in metrics.

The previous command only worked on OS X. This one works on Linux but not
OS X.

Also add generate phase of test.sh.

* Add mockgen to test setup.

* Fix github-pr-status output.

* Fix envvar style.

* Set xtrace.

* Fix test.sh

* Fix test.sh some more.

* Fix mockgen command.

* Add dependencies for running `go generate`.

* Add protoc-gen-go.

* Fix go get command.

* Fix generate.

* Wait for all.

* Fix generate.

* Update generated pb.

* Fix generate commands for vendored world.

* Update documentation for new vendor style.

* Update grpc package to latest.

* Update caaChecker proto with latest.

* Run go generate only over TESTPATHS

* See if Travis passes under 1.6

* Switch back to 1.5.

* Trim run command.

* Run stringer from correct directory.

* Move generate command.

* Restore and generate

* Fix path.

* list contents of GOPATH.

* Fix stringer by prebuilding.

* Try another import path.

* regenerate bcode_string.

* remove excess package

* pull jsha fork of protoc-gen-go that echoes

* Echo protoc version.

* install from source

* CD back.

* Go back to normal protoc-gen-go

* Fix path

* Move protobuf install into test/setup.sh

* Move before_install to install.

* Set PATH.

* Follow 301 with curl.

* Shuffle test order.

* Swap back test order.

* Restore all tests.

* Restore 1.5.3 to Travis.

* Remove unnecessary wait-or-exit

* Generate metrics mock with latest mockgen.

* Wrap TESTPATHS in curlies

* Remove spurious bracket
2016-04-21 15:23:06 -07:00
Kane York b7cf618f5d context.Context as the first parameter of all RPC calls (#1741)
Change core/interfaces to put context.Context as the first parameter of all RPC calls in preparation for gRPC.
2016-04-19 11:34:36 -07:00
Jacob Hoffman-Andrews e6c17e1717 Switch to new vendor style (#1747)
* Switch to new vendor style.

* Fix metrics generate command.

* Fix miekg/dns types_generate.

* Use generated copies of files.

* Update miekg to latest.

Fixes a problem with `go generate`.

* Set GO15VENDOREXPERIMENT.

* Build in letsencrypt/boulder.

* fix travis more.

* Exclude vendor instead of godeps.

* Replace some ...

* Fix unformatted cmd

* Fix errcheck for vendorexp

* Add GO15VENDOREXPERIMENT to Makefile.

* Temp disable errcheck.

* Restore master fetch.

* Restore errcheck.

* Build with 1.6 also.

* Match statsd.*"

* Skip errcheck unles Go1.6.

* Add other ignorepkg.

* Fix errcheck.

* move errcheck

* Remove go1.6 requirement.

* Put godep-restore with errcheck.

* Remove go1.6 dep.

* Revert master fetch revert.

* Remove -r flag from godep save.

* Set GO15VENDOREXPERIMENT in Dockerfile and remove _worskpace.

* Fix Godep version.
2016-04-18 12:51:36 -07:00
Roland Bracewell Shoemaker 800b5b0cbf Switch to using a wrapped statter that provides PID
* Switch to using a wrapped statter that provides PID

* Fix tests and change some types to interfaces

* Add hostname to suffix + update comment
2016-04-01 15:43:35 -07:00
Jeff Hodges 57b6dd5bb5 make HTTPMonitor a http.Handler 2016-02-01 22:01:21 -08:00
Jeff Hodges 07c0a547fa add metrics.Scope
This allows us to easily scope stats to a namespace without having to
get the whole long strings right everywhere.
2015-12-15 20:13:03 -08:00
Jacob Hoffman-Andrews c68e30ff6f Remove timing stat that can be user controlled.
Fixes https://github.com/letsencrypt/boulder/issues/1062.

I've skimmed all other stat names and they appear to be safe.

We can collect this same info from Nginx request timings in the logs.
2015-11-02 13:32:45 -08:00
Roland Shoemaker 9b0586dfdc Add and use clock 2015-10-01 15:49:50 -07:00
Roland Shoemaker 081b81d170 Add a facebookgo/stats client that sends StatsD metrics for facebookgo/httpdown 2015-09-26 21:38:05 -07:00
Roland Shoemaker e00fd3253f Remove unused RPCMonitor struct 2015-09-16 18:06:13 -07:00
Roland Shoemaker 00905ac07a Move RPCMonitor log to the RPCClient and do the collect natively 2015-09-10 12:48:35 -07:00
Roland Shoemaker a3c9f60bec Review fixes 2015-08-30 22:15:13 -07:00
Roland Shoemaker 163d03725f Add RPC monitor tests 2015-08-24 13:39:53 -07:00
Roland Shoemaker 01787da891 VA test fixes 2015-08-24 12:49:35 -07:00
Roland Shoemaker 4e8ee38935 Watch for timeouts 2015-08-19 15:07:32 -07:00
Roland Shoemaker 370cd07bc9 Move rpc delivery timing stuff to new metrics lib 2015-08-15 22:25:52 -07:00
Roland Shoemaker 2677c4e314 Moved http stuff to metrics library 2015-08-15 22:13:25 -07:00