boulder

Commit Graph

Author	SHA1	Message	Date
Samantha Frank	71178f4ca4	WFE: Track in-flight for "/" (#7759)	2024-10-18 12:59:26 -04:00
Samantha Frank	d0c9aa3808	WFE: Track in-flight HTTP requests by endpoint using a gauge (#7758)	2024-10-18 09:51:02 -04:00
Matthew McPherrin	3aae67b8a9	Opentelemetry: Add option for public endpoints (#6867) This PR adds a new configuration block specifically for the otelhttp instrumentation. This block is separate from the existing "opentelemetry" configuration, and is only relevant when using otelhttp instrumentation. It does not share any codepath with the existing configuration, so it is at the top level to indicate which services it applies to. There's a bit of plumbing new configuration through. I've adopted the measured_http package to also set up opentelemetry instead of just metrics, which should hopefully allow any future changes to be smaller (just config & there) and more consistent between the wfe2 and ocsp responder. There's one option here now, which disables setting [otelhttp.WithPublicEndpoint](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#WithPublicEndpoint). This option is designed to do exactly what we want: Don't accept incoming spans as parents of the new span created in the server. Previously we had a setting to disable parent-based sampling to help with this problem, which doesn't really make sense anymore, so let's just remove it and simplify that setup path. The default of "false" is designed to be the safe option. It's set to True in the test/ configs for integration tests that use traces, and I expect we'll likely set it true in production eventually once the LBs are configured to handle tracing themselves. Fixes #6851	2023-05-12 15:34:34 -04:00
Andrew Gabbitas	55a519fc33	Decrease InternetFacingBuckets histogram (#5748)	2021-10-25 17:39:28 -06:00
Roland Bracewell Shoemaker	5b2f11e07e	Switch away from old style statsd metrics wrappers (#4606) In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back. There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case, dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change. Fixes #4591.	2019-12-18 11:08:25 -05:00
Daniel McCarney	893e8459d6	Use pebble-challtestrv cmd, letsencrypt/challtestsrv package. (#3980) Now that Pebble has a `pebble-challtestsrv` we can remove the `challtestrv` package and associated command from Boulder. I switched CI to use `pebble-challtestsrv`. Notably this means that we have to add our expected mock data using the HTTP management interface. The Boulder-tools images are regenerated to include the `pebble-challtestsrv` command. Using this approach also allows separating the TLS-ALPN-01 and HTTPS HTTP-01 challenges by binding each challenge type in the `pebble-challtestsrv` to different interfaces both using the same VA HTTPS port. Mock DNS directs the VA to the correct interface. The load-generator command that was previously using the `challtestsrv` package from Boulder is updated to use a vendored copy of the new `github.org/letsencrypt/challtestsrv` package. Vendored dependencies change in two ways: 1) Gomock is updated to the latest release (matching what the Bouldertools image provides) 2) A couple of new subpackages in `golang.org/x/net/` are added by way of transitive dependency through the challtestsrv package. Unit tests are confirmed to pass for `gomock`: ``` ~/go/src/github.com/golang/mock/gomock$ git log --pretty=format:'%h' -n 1 51421b9 ~/go/src/github.com/golang/mock/gomock$ go test ./... ok github.com/golang/mock/gomock 0.002s ? github.com/golang/mock/gomock/internal/mock_matcher [no test files] ``` For `/x/net` all tests pass except two `/x/net/icmp` `TestDiag.go` test cases that we have agreed are OK to ignore. Resolves https://github.com/letsencrypt/boulder/issues/3962 and https://github.com/letsencrypt/boulder/issues/3951	2018-12-12 14:32:56 -05:00
Roland Bracewell Shoemaker	912fa6ffff	Properly set status code when WriteHeader isn't explicitly called (#3828) If a handler doesn't explicitly call `WriteHeader` before `Write` then the status code is set to `http.StatusOK` but `measured_http.MeasuredHandler` doesn't handle this which results in reporting `0` as the response code.	2018-08-24 11:37:32 -07:00
Joel Sing	f8a023e49c	Remove various unnecessary uses of fmt.Sprintf (#3707) Remove various unnecessary uses of fmt.Sprintf - in particular: - Avoid calls like t.Error(fmt.Sprintf(...)), where t.Errorf can be used directly. - Use strconv when converting an integer to a string, rather than using fmt.Sprintf("%d", ...). This is simpler and can also detect type errors at compile time. - Instead of using x.Write([]byte(fmt.Sprintf(...))), use fmt.Fprintf(x, ...).	2018-05-11 11:55:25 -07:00
Roland Bracewell Shoemaker	8167abd5e3	Use internet facing appropriate histogram buckets for DNS latencies (#3616) Also instead of repeating the same bucket definitions everywhere just use a single top level var in the metrics package in order to discourage copy/pasting. Fixes #3607.	2018-04-04 08:01:54 -04:00
Jacob Hoffman-Andrews	6cd777bd8d	Fix up stats after #3167 (#3185) There were two bugs in #3167: All process-level stats got prefixed with "boulder", which broke dashboards. All request_time stats got dropped, because measured_http was using the prometheus DefaultRegisterer. To fix, this PR plumbs through a scope object to measured_http, and uses an empty prefix when calling NewProcessCollector().	2017-10-18 11:14:59 -07:00
Roland Bracewell Shoemaker	e91349217e	Switch to using go 1.9 (#3047) * Switch to using go 1.9 * Regenerate with 1.9 * Manually fix import path... * Upgrade mockgen and regenerate * Update github.com/golang/mock	2017-09-06 16:30:13 -04:00
Jacob Hoffman-Andrews	9c7482fa94	Remove error return from Scope interface. (#2857) This was inherited from the statsd interface but never used. This allows us to remove one of our errcheck exceptions.	2017-07-11 10:54:06 -07:00
Jacob Hoffman-Andrews	5b9d737380	Fix statsd->prometheus bridge. (#2828) In #2752, I accidentally introduced a change that would use a NewRegistry for each NewPromScope, ignoring the Registry that was passed as an argument. Because this registry was not attached to any HTTP server, the results would not get exported. This fixes that, so the Registry passed into NewPromScope is respected. In the process, I noticed that stats were getting prefixed by a spurious "_". I fixed that by turning prefix into a slice of strings, and combining them with "_" only if the slice is non-empty. Fixes #2824.	2017-06-26 14:07:30 -04:00
Jacob Hoffman-Andrews	0bfb542514	Use fields, not globals, for stats (#2790) Following up on #2752, we don't need to use global vars for our Prometheus stats. We already have a custom registry plumbed through using Scope objects. In this PR, expose the MustRegister method of that registry through the Scope interface, and move existing global vars to be fields of objects. This should improve testability somewhat. Note that this has a bit of an unfortunate side effect: two instances of the same stats-using class (e.g. VA) can't use the same Scope object, because their MustRegister calls will conflict. In practice this is fine since we never instantiate duplicates of the classes that use stats, but it's something we should keep an eye on. Updates #2733	2017-06-06 12:09:31 -07:00
Jacob Hoffman-Andrews	4e30af4ec8	Collapse switch cases.	2017-05-31 14:17:55 -07:00
Jacob Hoffman-Andrews	81e58b66f7	Use whitelisted methods for HTTP stats. Fixes #2776	2017-05-30 15:50:12 -07:00
Jacob Hoffman-Andrews	b17b5c72a6	Remove statsd from Boulder (#2752) This removes the config and code to output to statsd. - Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`. - Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be. - Delete vendored statsd client. - Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere. - Remove a few unused methods on `metrics.Scope`, and update its generated mock. - Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given. - Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly. Fixes #2639, #2733.	2017-05-15 10:19:54 -04:00
Roland Bracewell Shoemaker	bd045b9325	Fix OCSP-Responder double slash collapsing (#2748) Uses a special mux for the OCSP Responder so that we stop collapsing double slashes in GET requests which cause a small number of requests to be considered malformed.	2017-05-10 09:51:10 -04:00
Jacob Hoffman-Andrews	d59188c676	Use pattern to determine endpoint metrics. (#2689) This ensures we don't create infinite metrics based on users hitting non-existent endpoints.	2017-04-20 13:14:47 -04:00
Jacob Hoffman-Andrews	4b665e35a6	Use Prometheus stats for VA, WFE, and OCSP Responder (#2628) Rename HTTPMonitor to MeasuredHandler. Remove inflight stat (we didn't use it). Add timing stat by method, endpoint, and status code. The timing stat subsumes the "rate" stat, so remove that. WFE now wraps in MeasuredHandler, instead of relying on its cmd/main.go. Remove FBAdapter stats. MeasuredHandler tracks stats by method, status code, and endpoint. In VA, add a Prometheus histogram for validation timing.	2017-04-03 17:03:04 -07:00
Daniel	cc896c9996	Strips invalid characters in `promAdjust`. The `promAdjust` function in `auto.go` previously allowed characters that were not valid in prometheus metric names (e.g. '>'). This commit updates `promAdjust` to remove invalid characters. The `TestPromAdjust` function is updated with testcases that include invalid characters.	2017-02-01 16:07:33 -05:00
Daniel	e88db3cd5e	Revert "Revert "Copy all statsd stats to Prometheus. (#2474)" (#2541)" This reverts commit `9d9e4941a5` and restores the statsd prometheus code.	2017-02-01 15:48:18 -05:00
Daniel McCarney	9d9e4941a5	Revert "Copy all statsd stats to Prometheus. (#2474)" (#2541) This reverts commit `58ccd7a71a`. We are seeing multiple boulder components restart when they encounter the stat registration race condition described in https://github.com/letsencrypt/boulder/issues/2540	2017-02-01 12:50:27 -05:00
Jacob Hoffman-Andrews	58ccd7a71a	Copy all statsd stats to Prometheus. (#2474) We have a number of stats already expressed using the statsd interface. During the switchover period to direct Prometheus collection, we'd like to make those stats available both ways. This change automatically exports any stats exported using the statsd interface via Prometheus as well. This is a little tricky because Prometheus expects all stats to be registered exactly once. Prometheus does offer a mechanism to gracefully recover from registering a stat more than once by handling a certain error, but it is not safe for concurrent access. So I added a concurrency-safe wrapper that creates Prometheus stats on demand and memoizes them. In the process, made a few small required side changes: - Clean "/" from method names in the gRPC interceptors. They are allowed in statsd but not in Prometheus. - Replace "127.0.0.1" with "boulder" as the name of our testing CT log. Prometheus stats can't start with a number. - Remove ":" from the CT-log stat names emitted by Publisher. Prometheus stats can't include it. - Remove a stray "RA" in front of some rate limit stats, since it was duplicative (we were emitting "RA.RA..." before). Note that this means two stat groups in particular are duplicated: - Gostats* is duplicated with the default process-level stats exported by the Prometheus library. - gRPCClient* are duplicated by the stats generated by the go-grpc-prometheus package. When writing dashboards and alerts in the Prometheus world, we should be careful to avoid these two categories, as they will disappear eventually. As a general rule, if a stat is available with an all-lowercase name, choose that one, as it is probably the Prometheus-native version. In the long run we will want to create most stats using the native Prometheus stat interface, since it allows us to add labels to metrics, which is very useful. For instance, currently our DNS stats distinguish types of queries by appending the type to the stat name. This would be more natural as a label in Prometheus.	2017-01-10 10:30:15 -05:00
Jacob Hoffman-Andrews	68f8b686af	Remove pid from stats. (#2182)	2016-09-16 10:56:45 -07:00
Roland Bracewell Shoemaker	c8f1fb3e2f	Remove direct usages of go-statsd-client in favor of using metrics.Scope (#2136) Fixes #2118, fixes #2082.	2016-09-07 19:35:13 -04:00
Jacob Hoffman-Andrews	0543691d9e	Add stats to Publisher (#2083) Fixes #1576. Adds a new package mock_metrics, with code generated by gomock, in order to test the change. Modifies publisher.New to take a metrics.Scope and an SA, and unexport SA. Moves core of submission loop into a separate function, singleLogSubmit, which can return an error rather than using the continue keyword. This reduces repetition of AuditErr lines, and makes it easier to put error statting in one place.	2016-08-17 16:25:33 -07:00
Roland Bracewell Shoemaker	7b29dba75d	Add gRPC server-side interceptor (#1933) Adds a server side unary RPC interceptor which includes basic stats. We could also use this to add a server request ID to the context.Context to identify the call through the system, but really I'd rather do that on the client side before the RPC is sent which requires the client interceptor implementation upstream. Also updates google.golang.org/grpc. Updates #1880.	2016-06-20 11:27:32 -04:00
Roland Bracewell Shoemaker	54573b36ba	Remove all stray copyright headers and appends the initial line to LICENSE.txt (#1853)	2016-05-31 12:32:04 -07:00
Jacob Hoffman-Andrews	6d5348f975	Run go generate in Travis (#1762) * Fix go generate command in metrics. The previous command only worked on OS X. This one works on Linux but not OS X. Also add generate phase of test.sh. * Add mockgen to test setup. * Fix github-pr-status output. * Fix envvar style. * Set xtrace. * Fix test.sh * Fix test.sh some more. * Fix mockgen command. * Add dependencies for running `go generate`. * Add protoc-gen-go. * Fix go get command. * Fix generate. * Wait for all. * Fix generate. * Update generated pb. * Fix generate commands for vendored world. * Update documentation for new vendor style. * Update grpc package to latest. * Update caaChecker proto with latest. * Run go generate only over TESTPATHS * See if Travis passes under 1.6 * Switch back to 1.5. * Trim run command. * Run stringer from correct directory. * Move generate command. * Restore and generate * Fix path. * list contents of GOPATH. * Fix stringer by prebuilding. * Try another import path. * regenerate bcode_string. * remove excess package * pull jsha fork of protoc-gen-go that echoes * Echo protoc version. * install from source * CD back. * Go back to normal protoc-gen-go * Fix path * Move protobuf install into test/setup.sh * Move before_install to install. * Set PATH. * Follow 301 with curl. * Shuffle test order. * Swap back test order. * Restore all tests. * Restore 1.5.3 to Travis. * Remove unnecessary wait-or-exit * Generate metrics mock with latest mockgen. * Wrap TESTPATHS in curlies * Remove spurious bracket	2016-04-21 15:23:06 -07:00
Kane York	b7cf618f5d	context.Context as the first parameter of all RPC calls (#1741) Change core/interfaces to put context.Context as the first parameter of all RPC calls in preparation for gRPC.	2016-04-19 11:34:36 -07:00
Jacob Hoffman-Andrews	e6c17e1717	Switch to new vendor style (#1747) * Switch to new vendor style. * Fix metrics generate command. * Fix miekg/dns types_generate. * Use generated copies of files. * Update miekg to latest. Fixes a problem with `go generate`. * Set GO15VENDOREXPERIMENT. * Build in letsencrypt/boulder. * fix travis more. * Exclude vendor instead of godeps. * Replace some ... * Fix unformatted cmd * Fix errcheck for vendorexp * Add GO15VENDOREXPERIMENT to Makefile. * Temp disable errcheck. * Restore master fetch. * Restore errcheck. * Build with 1.6 also. * Match statsd." Skip errcheck unless Go1.6. * Add other ignorepkg. * Fix errcheck. * move errcheck * Remove go1.6 requirement. * Put godep-restore with errcheck. * Remove go1.6 dep. * Revert master fetch revert. * Remove -r flag from godep save. * Set GO15VENDOREXPERIMENT in Dockerfile and remove _workspace. * Fix Godep version.	2016-04-18 12:51:36 -07:00
Roland Bracewell Shoemaker	800b5b0cbf	Switch to using a wrapped statter that provides PID * Switch to using a wrapped statter that provides PID * Fix tests and change some types to interfaces * Add hostname to suffix + update comment	2016-04-01 15:43:35 -07:00
Jeff Hodges	57b6dd5bb5	make HTTPMonitor a http.Handler	2016-02-01 22:01:21 -08:00
Jeff Hodges	07c0a547fa	add metrics.Scope This allows us to easily scope stats to a namespace without having to get the whole long strings right everywhere.	2015-12-15 20:13:03 -08:00
Jacob Hoffman-Andrews	c68e30ff6f	Remove timing stat that can be user controlled. Fixes https://github.com/letsencrypt/boulder/issues/1062. I've skimmed all other stat names and they appear to be safe. We can collect this same info from Nginx request timings in the logs.	2015-11-02 13:32:45 -08:00
Roland Shoemaker	9b0586dfdc	Add and use clock	2015-10-01 15:49:50 -07:00
Roland Shoemaker	081b81d170	Add a facebookgo/stats client that sends StatsD metrics for facebookgo/httpdown	2015-09-26 21:38:05 -07:00
Roland Shoemaker	e00fd3253f	Remove unused RPCMonitor struct	2015-09-16 18:06:13 -07:00
Roland Shoemaker	00905ac07a	Move RPCMonitor log to the RPCClient and do the collect natively	2015-09-10 12:48:35 -07:00
Roland Shoemaker	a3c9f60bec	Review fixes	2015-08-30 22:15:13 -07:00
Roland Shoemaker	163d03725f	Add RPC monitor tests	2015-08-24 13:39:53 -07:00
Roland Shoemaker	01787da891	VA test fixes	2015-08-24 12:49:35 -07:00
Roland Shoemaker	4e8ee38935	Watch for timeouts	2015-08-19 15:07:32 -07:00
Roland Shoemaker	370cd07bc9	Move rpc delivery timing stuff to new metrics lib	2015-08-15 22:25:52 -07:00
Roland Shoemaker	2677c4e314	Moved http stuff to metrics library	2015-08-15 22:13:25 -07:00

46 Commits