boulder

Author	SHA1	Message	Date
Daniel McCarney	de5fbbdb67	Implement CAA issueWild enforcement for wildcard names (#3266 ) This commit implements RFC 6844's description of the "CAA issuewild property" for CAA records. We check CAA in two places: at the time of validation, and at the time of issuance when an authorization is more than 8 hours old. Both locations have been updated to properly enforce issuewild when checking CAA for a domain corresponding to a wildcard name in a certificate order. Resolves https://github.com/letsencrypt/boulder/issues/3211	2017-12-13 12:09:33 -05:00
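The issuewild rule from RFC 6844 (section 5.3) can be sketched as follows. This is a minimal illustration, not Boulder's VA code. The `caaRecord` type and both functions are hypothetical. Property parameters after `;` are discarded, and DNS lookup, tree climbing and the critical flag are out of scope.

```go
package main

import "strings"

// caaRecord is a hypothetical, simplified CAA property: Tag is "issue",
// "issuewild", "iodef", ...; Value is the raw property value.
type caaRecord struct {
	Tag   string
	Value string
}

// relevantIssueRecords picks the records that govern issuance. Per RFC 6844
// section 5.3, issuewild is ignored for non-wildcard names. For a wildcard
// name, any issuewild record causes all issue records to be ignored.
func relevantIssueRecords(records []caaRecord, wildcard bool) []caaRecord {
	var issue, issueWild []caaRecord
	for _, r := range records {
		switch strings.ToLower(r.Tag) {
		case "issue":
			issue = append(issue, r)
		case "issuewild":
			issueWild = append(issueWild, r)
		}
	}
	if wildcard && len(issueWild) > 0 {
		return issueWild
	}
	return issue
}

// caaPermits reports whether issuerDomain may issue. An empty relevant set
// places no restriction on issuance.
func caaPermits(records []caaRecord, issuerDomain string, wildcard bool) bool {
	relevant := relevantIssueRecords(records, wildcard)
	if len(relevant) == 0 {
		return true
	}
	for _, r := range relevant {
		// Keep only the issuer domain; ";" alone yields "" and matches nobody.
		value := strings.TrimSpace(strings.SplitN(r.Value, ";", 2)[0])
		if value != "" && strings.EqualFold(value, issuerDomain) {
			return true
		}
	}
	return false
}
```

With `issue "letsencrypt.org"` plus `issuewild ";"`, this sketch permits `example.com` but refuses `*.example.com`.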
Daniel McCarney	09628bcfa2	WFE2 'new-nonce' endpoint (#3270 ) This commit adds the "new-nonce" endpoint to the WFE2. A small unit test is included. The existing /directory unit tests are updated for the new endpoint.	2017-12-13 08:29:34 -08:00
Daniel McCarney	a099e40b9c	Add 'new-order' endpoint to WFE2 directory. (#3269 ) When we implemented the new-order issuance flow for the WFE2 we forgot to include the endpoint in the /directory object. This commit adds it and updates associated tests.	2017-12-12 13:43:25 -08:00
Jacob Hoffman-Andrews	90f7998b15	Speed up expired authz purger (#3267 ) Now, rather than LIMIT / OFFSET, this uses the highest id from the last batch in each new batch's query. This makes efficient use of the index, and means the database does not have to scan over a large number of non-expired rows before starting to find any expired rows. This also changes the structure of the purge function to continually push ids for deletion onto a channel, to be processed by goroutines consuming that channel. Also, remove the --yes flag and prompting.	2017-12-11 12:05:43 -05:00
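The batching scheme in the message above (resume from the highest id of the previous batch instead of LIMIT/OFFSET, and feed ids to goroutines through a channel) can be sketched in memory like this. It is an illustration, not the purger's code. `fetchExpiredAfter` stands in for the database query, and the table and column names in its comment are assumptions.

```go
package main

import (
	"sort"
	"sync"
)

// fetchExpiredAfter is an in-memory stand-in for a keyset-paginated query
// such as (table and column names are assumptions):
//
//	SELECT id FROM authz WHERE id > ? AND expires < ? ORDER BY id LIMIT ?
//
// ids must be sorted ascending; expired marks ids past their expiry.
func fetchExpiredAfter(ids []int64, expired map[int64]bool, afterID int64, limit int) []int64 {
	// Seek directly to the first id greater than afterID, as an index would.
	start := sort.Search(len(ids), func(i int) bool { return ids[i] > afterID })
	var batch []int64
	for _, id := range ids[start:] {
		if len(batch) == limit {
			break
		}
		if expired[id] {
			batch = append(batch, id)
		}
	}
	return batch
}

// purge pages through expired ids, pushing each one onto a channel that
// `workers` goroutines drain by calling deleteFn. It returns the number of
// ids for which deleteFn succeeded.
func purge(ids []int64, expired map[int64]bool, batchSize, workers int, deleteFn func(int64) error) int {
	if workers < 1 {
		workers = 1
	}
	work := make(chan int64)
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		deleted int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range work {
				if deleteFn(id) == nil {
					mu.Lock()
					deleted++
					mu.Unlock()
				}
			}
		}()
	}
	var lastID int64 // ids are assumed to be positive
	for {
		batch := fetchExpiredAfter(ids, expired, lastID, batchSize)
		if len(batch) == 0 {
			break
		}
		for _, id := range batch {
			work <- id
		}
		// The next query resumes after the highest id already seen.
		lastID = batch[len(batch)-1]
	}
	close(work)
	wg.Wait()
	return deleted
}
```

Unlike OFFSET, the `id > ?` condition lets a real database seek straight to the next batch through the primary-key index. It does not have to re-read the rows it already skipped.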
Jacob Hoffman-Andrews	68d5cc3331	Restore gRPC metrics (#3265 ) The go-grpc-prometheus package by default registers its metrics with Prometheus' global registry. In #3167, when we stopped using the global registry, we accidentally lost our gRPC metrics. This change adds them back. Specifically, it adds two convenience functions, one for clients and one for servers, that each create the necessary metrics object and register it. We run these in the main function of each server. I considered adding these as part of StatsAndLogging, but the corresponding ClientMetrics and ServerMetrics objects (defined by go-grpc-prometheus) need to be subsequently made available during construction of the gRPC clients and servers. We could add them as fields on Scope, but this seemed like too tight a coupling. Also, update go-grpc-prometheus to get the necessary methods. ``` $ go test github.com/grpc-ecosystem/go-grpc-prometheus/... ok github.com/grpc-ecosystem/go-grpc-prometheus 0.069s ? github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto [no test files] ```	2017-12-07 15:44:55 -08:00
Daniel McCarney	cda7b25c23	Do not `Update` pendingAuthz unnecessarily for chal update. (#3263 ) If two requests simultaneously update a challenge for the same authorization, there is a chance that `UpdatePendingAuthorization` will encounter a Gorp optimistic lock error: it reads one LockCol value from a `Select` on the `pendingAuthorizations` table, only for that value to have changed by the time an `Update` on the same row is performed. On closer examination this `Update` is unnecessary! Only `RA.UpdateAuthorization` calls `SA.UpdatePendingAuthorization`, and it does so only to record updated challenge information by way of `UpdatePendingAuthorization`'s call to `updateChallenges`. Since no data in the `pendingAuthorizations` row is being changed, we don't need to do this `Update` at all, which avoids a potential race condition and saves some database load. This commit removes the `Update` entirely. Several SA unit tests had to be updated because they were (ab)using `UpdatePendingAuthorization` to mutate pendingAuthz rows.	2017-12-06 12:20:37 -08:00
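The race described above comes from version-column ("optimistic") locking. The in-memory sketch below shows why the second of two concurrent read-then-`Update` sequences fails. The types are hypothetical and are neither Gorp nor Boulder code; only the version-column idea is taken from the message.

```go
package main

import (
	"errors"
	"sync"
)

// errOptimisticLock models the error an ORM with a version column returns
// when a row changed between the read and the write.
var errOptimisticLock = errors.New("optimistic lock: row changed since it was read")

// pendingAuthzRow is a hypothetical row carrying a version column, LockCol.
type pendingAuthzRow struct {
	Status  string
	LockCol int64
}

// table is an in-memory stand-in for the pendingAuthorizations table.
type table struct {
	mu   sync.Mutex
	rows map[string]pendingAuthzRow
}

// get models the Select that reads a row together with its LockCol.
func (t *table) get(id string) (pendingAuthzRow, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	r, ok := t.rows[id]
	return r, ok
}

// update models
//
//	UPDATE ... SET ..., LockCol = LockCol + 1 WHERE id = ? AND LockCol = ?
//
// and fails if another writer bumped LockCol after r was read.
func (t *table) update(id string, r pendingAuthzRow) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	cur, ok := t.rows[id]
	if !ok || cur.LockCol != r.LockCol {
		return errOptimisticLock
	}
	r.LockCol++
	t.rows[id] = r
	return nil
}
```

If neither request issues the `Update` at all, as in the change above, the version check never runs and the conflict cannot occur.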
Roland Bracewell Shoemaker	bdea281ae0	Remove CAA SERVFAIL exceptions code (#3262 ) Fixes #3080.	2017-12-05 14:39:37 -08:00
Daniel McCarney	0684d5fc73	Add pending orders rate limit to new-order. (#3257 ) This commit adds a new rate limit to restrict the number of outstanding pending orders per account. If the threshold for this rate limit is crossed, subsequent new-order requests will return a 429 response. Note: Since the rate limit object itself defines an `Enabled()` test based on whether or not it has been configured, there is not a feature flag for this change. Resolves https://github.com/letsencrypt/boulder/issues/3246	2017-12-04 16:36:48 -05:00
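A configuration-gated limit like the one described above might look roughly like this. It is a sketch under stated assumptions, not Boulder's rate-limit code. `rateLimitPolicy`, `checkPendingOrders` and `statusFor` are hypothetical. The sketch also assumes that "crossing" the threshold means already having `Threshold` pending orders.

```go
package main

import (
	"errors"
	"net/http"
)

// rateLimitPolicy is a hypothetical configured limit; a zero Threshold means
// the limit was never configured.
type rateLimitPolicy struct {
	Threshold int
}

// Enabled reports whether the limit is configured. This is what lets the
// check ship without a separate feature flag.
func (p rateLimitPolicy) Enabled() bool { return p.Threshold > 0 }

var errTooManyPendingOrders = errors.New("too many pending orders for this account")

// checkPendingOrders rejects a new order once the account already has
// Threshold pending orders (an assumed reading of "crossed").
func checkPendingOrders(p rateLimitPolicy, pending int) error {
	if !p.Enabled() {
		return nil
	}
	if pending >= p.Threshold {
		return errTooManyPendingOrders
	}
	return nil
}

// statusFor maps the check's result to the HTTP status of new-order.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusCreated // a new order is created
	case errors.Is(err, errTooManyPendingOrders):
		return http.StatusTooManyRequests // 429
	default:
		return http.StatusInternalServerError
	}
}
```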
Daniel McCarney	1c99f91733	Policy based issuance for wildcard identifiers (Round two) (#3252 ) This PR implements issuance for wildcard names in the V2 order flow. By policy, pending authorizations for wildcard names only receive a DNS-01 challenge for the base domain. We do not re-use authorizations for the base domain that do not come from a previous wildcard issuance (e.g. a normal authorization for example.com turned valid by way of a DNS-01 challenge will not be reused for a `*.example.com` order). The wildcard prefix is stripped off of the authorization identifier value in two places: * When presenting the authorization to the user: ACME forbids having a wildcard character in an authorization identifier. * When performing validation: we validate the base domain name without the `*.` prefix. This PR is largely a rewrite/extension of #3231. Instead of using a pseudo-challenge-type (DNS-01-Wildcard) to indicate that an authorization and identifier correspond to the base name of a wildcard order name, we instead allow the identifier to take the wildcard order name with the `*.` prefix.	2017-12-04 12:18:10 -08:00
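The two policy points above, stripping the `*.` prefix and offering only DNS-01 for wildcard names, can be sketched as two small helpers. Both helpers are hypothetical, and the set of enabled challenge types is passed in rather than assumed.

```go
package main

import "strings"

// splitWildcard strips a leading "*." and reports whether it was present.
// The stripped base name is what a client sees in the authorization and
// what validation runs against.
func splitWildcard(name string) (base string, wildcard bool) {
	if strings.HasPrefix(name, "*.") {
		return strings.TrimPrefix(name, "*."), true
	}
	return name, false
}

// challengeTypesFor returns the challenge types to offer for an order name.
// By the policy above, wildcard names get only dns-01. Other names get
// whatever types are enabled.
func challengeTypesFor(name string, enabled []string) []string {
	if _, wildcard := splitWildcard(name); !wildcard {
		return enabled
	}
	for _, t := range enabled {
		if t == "dns-01" {
			return []string{"dns-01"}
		}
	}
	return nil // dns-01 disabled: the wildcard name cannot be validated
}
```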
Daniel McCarney	55dd1020c0	Increase VA SingleDialTimeout to 10s. (#3260 ) This PR changes the VA's singleDialTimeout value from 5 * time.Second to 10 * time.Second. This will give slower servers a better chance to respond, especially for the multi-VA case where n requests arrive ~simultaneously. This PR also bumps the RA->VA timeout by 5s and the WFE->RA timeout by 5s to accommodate the increased dial timeout. I put this in a separate commit in case we'd rather deal with this separately.	2017-12-04 09:53:26 -08:00
Roland Bracewell Shoemaker	9da1bea433	Update histogram buckets for latencies that measure things over the internet (#3254 ) Updates the buckets for histograms in the publisher, va, and expiration-mailer which are used to measure the latency of operations that go over the internet and therefore are liable to take a lot longer than the default buckets can measure. Uses a standard set of buckets for all three instead of attempting to tune for each one. Fixes #3217.	2017-11-29 15:13:14 -08:00
Daniel McCarney	171da33513	Don't 500 on missing pendingAuthz (#3248 ) As described in #3201 a concurrent challenge POST would result in 500 errors if the pending authz row was deleted by a promotion to the authz table underneath another request. This PR adjusts the SA & RA so that if a pending authz is promoted to a final authz between updates a not found error will be returned instead of a server internal error. Resolves https://github.com/letsencrypt/boulder/issues/3201	2017-11-22 15:48:42 -05:00
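The fix above amounts to classifying "the pending row vanished" as not-found rather than as an internal error. The sketch below shows that classification with `database/sql`'s `sql.ErrNoRows`. It makes several assumptions: `errNotFound`, `pendingAuthzStore` and `updateChallenge` are hypothetical, and the 404 mapping is illustrative rather than taken from the WFE.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
	"net/http"
)

// errNotFound is a hypothetical sentinel for "the object does not exist".
var errNotFound = errors.New("not found")

// pendingAuthzStore is an in-memory stand-in for the pendingAuthorizations table.
type pendingAuthzStore map[string]string

// get behaves like a SELECT that finds no row once a concurrent request has
// promoted the pending authorization to the final authz table.
func (s pendingAuthzStore) get(id string) (string, error) {
	status, ok := s[id]
	if !ok {
		return "", sql.ErrNoRows
	}
	return status, nil
}

// updateChallenge turns a missing row into errNotFound instead of passing a
// raw database error up the stack.
func updateChallenge(s pendingAuthzStore, id string) error {
	if _, err := s.get(id); err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			return fmt.Errorf("pending authorization %q: %w", id, errNotFound)
		}
		return err
	}
	return nil
}

// httpStatusFor maps errors to response codes.
func httpStatusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errNotFound):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}
```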
Jacob Hoffman-Andrews	2fd2f9e230	Remove LegacyCAA implementation. (#3240 ) Fixes #3236	2017-11-20 16:09:00 -05:00
Andriy	72330bbedd	Fix `TestNormalizeCSR` test condition (#3245 ) Previous to this commit `TestNormalizeCSR` was comparing the expected DNSNames in the CSR to themselves, not the names in the CSR.	2017-11-17 08:30:09 -05:00
Jacob Hoffman-Andrews	25e5c3ec3c	Add test for SA's parallelismPerRPC. (#3241 ) Fixes #3138.	2017-11-16 09:15:15 -05:00
Jacob Hoffman-Andrews	0d8190f799	Update Boulderdash (#3232 ) * Move Boulder CPU panel over * Add basic expiry mailer stats * Add HSM signatures panel * Add page splits panel * Add CT submissions panel, with old and new metrics for now	2017-11-13 13:22:45 -05:00
Roland Bracewell Shoemaker	d5db80ab12	Various publisher CT fixes (#3219 ) Makes a few changes: * Change `SubmitToCT` to make submissions to each log in parallel instead of serially; this prevents a single slow log from eating up the majority of the deadline and causing submissions to other logs to fail. * Remove the `submissionTimeout` field on the publisher, since submission is actually bounded by the gRPC timeout and the field is misleading. * Add a timeout, currently set at 2.5 minutes, to the CT client's internal HTTP client so that when log servers hang indefinitely we actually retry instead of using up the entire submission deadline. Fixes #3218.	2017-11-09 10:05:26 -05:00
Jacob Hoffman-Andrews	ef0bf7e9d0	Remove TooManyCertificatesError. (#3228 ) When counting certificates for rate limiting, we attempted to impose a limit on the query results to ensure we did not receive so many results that they caused slowness on the database or SA side. However, that check has never actually been executed correctly. The check was fixed in #3126, but rolling out that fix broke issuance for subscribers with rate limit overrides that have allowed them to exceed the limit. Because this limit has not been needed in practice over the years, remove it rather than refining it. The size of the results is loosely governed by our rate limits (and overrides), and if result sizes from this query become a performance issue in the future, we can address it then. For now, opt for simplification. Fixes #3214.	2017-11-09 09:11:45 -05:00
Robert Kästel	60ca8febb3	Pin version of Prometheus to 1.8.2 (#3230 ) Pin the version of Prometheus to `1.8.2` because the Prometheus maintainers have just released v2 and switched the `:latest` tag to it. Using Prometheus v2 produces an error: `prometheus: error: unknown short flag '-c'`	2017-11-09 09:03:12 -05:00
Jacob Hoffman-Andrews	975456bb08	Switch nagsAtCapacity to Gauge. (#3224 ) Fixes #3186	2017-11-08 15:35:25 -08:00
Jacob Hoffman-Andrews	9dc32b010f	Add indexes on certificateStatus. (#3225 ) In #1864 we discussed possible optimizations to how expiration-mailer and ocsp-updater query the certificateStatus table. In #2177 we added the notAfter and isExpired fields for more efficient querying. However, we forgot to add indexes on these fields. This change adds new indexes and drops the old indexes, and should result in much more efficient querying in those two components. Also, remove a comment that goose couldn't understand. Running EXPLAINs to show the difference:

For expiration-mailer, before:

MariaDB [boulder_sa_integration]> EXPLAIN SELECT cs.serial FROM certificateStatus AS cs WHERE cs.notAfter > DATE_ADD(NOW(), INTERVAL 21 DAY) AND cs.notAfter < DATE_ADD(NOW(), INTERVAL 10 DAY) AND cs.status != "revoked" AND COALESCE(TIMESTAMPDIFF(SECOND, cs.lastExpirationNagSent, cs.notAfter) > 10 * 86400, 1) ORDER BY cs.notAfter ASC LIMIT 100000;
+------+-------------+-------+------+------------------------------+------+---------+------+------+-----------------------------+
| id   | select_type | table | type | possible_keys                | key  | key_len | ref  | rows | Extra                       |
+------+-------------+-------+------+------------------------------+------+---------+------+------+-----------------------------+
|    1 | SIMPLE      | cs    | ALL  | status_certificateStatus_idx | NULL | NULL    | NULL |  486 | Using where; Using filesort |
+------+-------------+-------+------+------------------------------+------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)

For expiration-mailer, after:

MariaDB [boulder_sa_integration]> EXPLAIN SELECT cs.serial FROM certificateStatus AS cs WHERE cs.notAfter < DATE_ADD(NOW(), INTERVAL 21 DAY) AND cs.notAfter < DATE_ADD(NOW(), INTERVAL 10 DAY) AND cs.status != "revoked" AND COALESCE(TIMESTAMPDIFF(SECOND, cs.lastExpirationNagSent, cs.notAfter) > 10 * 86400, 1) ORDER BY cs.notAfter ASC LIMIT 100000;
+------+-------------+-------+-------+---------------+--------------+---------+------+------+------------------------------------+
| id   | select_type | table | type  | possible_keys | key          | key_len | ref  | rows | Extra                              |
+------+-------------+-------+-------+---------------+--------------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | cs    | range | notAfter_idx  | notAfter_idx | 6       | NULL |    1 | Using index condition; Using where |
+------+-------------+-------+-------+---------------+--------------+---------+------+------+------------------------------------+

For ocsp-updater, before:

MariaDB [boulder_sa_integration]> EXPLAIN SELECT cs.serial, cs.status, cs.revokedDate, cs.notAfter FROM certificateStatus AS cs WHERE cs.ocspLastUpdated > DATE_SUB(NOW(), INTERVAL 10 DAY) AND cs.ocspLastUpdated < DATE_SUB(NOW(), INTERVAL 3 DAY) AND NOT cs.isExpired ORDER BY cs.ocspLastUpdated ASC LIMIT 100000;
+------+-------------+-------+-------+---------------------------------------+---------------------------------------+---------+------+------+------------------------------------+
| id   | select_type | table | type  | possible_keys                         | key                                   | key_len | ref  | rows | Extra                              |
+------+-------------+-------+-------+---------------------------------------+---------------------------------------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | cs    | range | ocspLastUpdated_certificateStatus_idx | ocspLastUpdated_certificateStatus_idx | 5       | NULL |    1 | Using index condition; Using where |
+------+-------------+-------+-------+---------------------------------------+---------------------------------------+---------+------+------+------------------------------------+
1 row in set (0.00 sec)

For ocsp-updater, after:

MariaDB [boulder_sa_integration]> EXPLAIN SELECT cs.serial, cs.status, cs.revokedDate, cs.notAfter FROM certificateStatus AS cs WHERE cs.ocspLastUpdated > DATE_SUB(NOW(), INTERVAL 10 DAY) AND cs.ocspLastUpdated < DATE_SUB(NOW(), INTERVAL 3 DAY) AND NOT cs.isExpired ORDER BY cs.ocspLastUpdated ASC LIMIT 100000;
+------+-------------+-------+-------+-------------------------------+-------------------------------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys                 | key                           | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+-------------------------------+-------------------------------+---------+------+------+-----------------------+
|    1 | SIMPLE      | cs    | range | isExpired_ocspLastUpdated_idx | isExpired_ocspLastUpdated_idx | 7       | NULL |    1 | Using index condition |
+------+-------------+-------+-------+-------------------------------+-------------------------------+---------+------+------+-----------------------+
1 row in set (0.00 sec)	2017-11-08 13:25:30 -08:00
Jacob Hoffman-Andrews	6178688231	Remove background subprocess for DB migrations. (#3226 ) We started running our DB migrations in the background to speed up CI. However, the semantics of subprocesses and `wait` mean that if a migration fails, the overall `create_db.sh` doesn't fail. That means, for instance, tests continue to run, and it's hard to find the resulting error. This change runs the migrations in serial again so that we can catch such errors more easily.	2017-11-08 09:25:33 -05:00
Jacob Hoffman-Andrews	5928a06d4d	Add a missing "2" to commit id. (#3223 )	2017-11-07 17:00:05 -05:00
Jacob Hoffman-Andrews	6af3f4e315	Update to latest certificate-transparency-go. (#3207 ) This pulls in multilog support (logs sharded by date). As a result, it also pulls in new dependencies gogo/protobuf (for UnmarshalText) and golang/protobuf/ptypes (for Timestamp). Replaces #3202, adding a smaller set of dependencies. See also #3205. Tests run: ``` $ go test github.com/gogo/protobuf/proto github.com/golang/protobuf/ptypes/... github.com/google/certificate-transparency-go/... ok github.com/gogo/protobuf/proto 0.063s ok github.com/golang/protobuf/ptypes 0.009s ? github.com/golang/protobuf/ptypes/any [no test files] ? github.com/golang/protobuf/ptypes/duration [no test files] ? github.com/golang/protobuf/ptypes/empty [no test files] ? github.com/golang/protobuf/ptypes/struct [no test files] ? github.com/golang/protobuf/ptypes/timestamp [no test files] ? github.com/golang/protobuf/ptypes/wrappers [no test files] ok github.com/google/certificate-transparency-go 1.005s ok github.com/google/certificate-transparency-go/asn1 0.021s ok github.com/google/certificate-transparency-go/client 22.034s ? github.com/google/certificate-transparency-go/client/ctclient [no test files] ok github.com/google/certificate-transparency-go/fixchain 0.145s ? github.com/google/certificate-transparency-go/fixchain/main [no test files] ok github.com/google/certificate-transparency-go/fixchain/ratelimiter 27.745s ok github.com/google/certificate-transparency-go/gossip 0.772s ? github.com/google/certificate-transparency-go/gossip/main [no test files] ok github.com/google/certificate-transparency-go/jsonclient 25.523s ok github.com/google/certificate-transparency-go/merkletree 0.004s ? github.com/google/certificate-transparency-go/preload [no test files] ? github.com/google/certificate-transparency-go/preload/dumpscts/main [no test files] ? github.com/google/certificate-transparency-go/preload/main [no test files] ok github.com/google/certificate-transparency-go/scanner 0.010s ? 
github.com/google/certificate-transparency-go/scanner/main [no test files] ok github.com/google/certificate-transparency-go/tls 0.026s ok github.com/google/certificate-transparency-go/x509 0.417s ? github.com/google/certificate-transparency-go/x509/pkix [no test files] ? github.com/google/certificate-transparency-go/x509util [no test files] ```	2017-11-07 07:59:46 -05:00
Jacob Hoffman-Andrews	4296dd985a	Use TLS in mailer integration tests (#3213 ) * Remove non-TLS support from mailer entirely * Add a config option for trusted roots in expiration-mailer. If unset, it defaults to the system roots, so this does not need to be set in production. * Use TLS in mail-test-srv, along with an internal root and localhost certificates signed by that root.	2017-11-06 14:57:14 -08:00
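The "trusted roots, defaulting to system roots" option above maps naturally onto `crypto/tls`, where a nil `RootCAs` means the host's system roots. This is a sketch only. The config option's real name is not given here, and `loadTrustedRoots` and `smtpTLSConfig` are hypothetical helpers.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"os"
)

// loadTrustedRoots reads a PEM bundle of root certificates. An empty path
// returns a nil pool, which crypto/tls treats as "use the system roots".
func loadTrustedRoots(path string) (*x509.CertPool, error) {
	if path == "" {
		return nil, nil
	}
	pemBytes, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, errors.New("no certificates found in " + path)
	}
	return pool, nil
}

// smtpTLSConfig builds the client TLS configuration for the mail server.
func smtpTLSConfig(serverName string, roots *x509.CertPool) *tls.Config {
	return &tls.Config{
		ServerName: serverName,
		RootCAs:    roots, // nil => the host's system roots
	}
}
```

In an integration environment, the path would point at the internal test root that signs the localhost certificate of mail-test-srv. In production it would be left empty.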
Jacob Hoffman-Andrews	8ed063a901	Revert "Logic error. Always-zero-value-variable used. (#3126 )" (#3215 ) This reverts commit `887d75f1e0`.	2017-11-06 09:36:24 -08:00
Jacob Hoffman-Andrews	5f0cbddd9d	Check for unnecessary godeps (#3206 ) Fixes https://github.com/letsencrypt/boulder/issues/3205. Previously, we would only move aside Godeps.json before running `godep save ./...`. However, in order to get a true picture of what is needed, we must also remove the existing `vendor/` directory. This change also removes some unnecessary dependencies that have piled up over the years, generally test dependencies. Godep used to vendor such dependencies but no longer does.	2017-11-03 14:30:07 -04:00
Roland Bracewell Shoemaker	f31d2867b2	Switch publisher to prom stats (#3212 ) Magical StatsD style->prom style stats are hard to actually use. Fixes #2906.	2017-11-03 08:48:18 -04:00
Jacob Hoffman-Andrews	d882a7a2d1	Remove export of feature flags. (#3210 ) In #3167 I removed the code that would use this, but forgot to remove the exporting code. This follows up on that. We don't currently use this for monitoring, and it's easier to get the current flags from a config file.	2017-11-02 07:07:02 -07:00
Jacob Hoffman-Andrews	3d9b3d4d20	Restore expvar handler. (#3209 ) In #3167 I removed expvar, thinking it was unused, but it turns out the RA exports the last issuance time, and core/util.go has a function to export BuildID, both of which are used in monitoring. This wasn't caught at compile time because the global expvar package was happy to register the exports even though there was no handler to serve them.	2017-11-02 07:05:54 -07:00
Jacob Hoffman-Andrews	8103ee0b27	Update godep instructions. (#3208 ) These are a little simpler and should be more reliable.	2017-11-02 09:24:11 -04:00
Jacob Hoffman-Andrews	5df083a57e	Add ROCA weak key checking (#3189 ) Thanks to @titanous for the library!	2017-11-02 08:42:59 -04:00
Daniel McCarney	2f263f8ed5	ACME v2 Finalize order support (#3169 ) This PR implements order finalization for the ACME v2 API. In broad strokes this means: * Removing the CSR from order objects & the new-order flow * Adding identifiers to the order object & new-order * Providing a finalization URL as part of orders returned by new-order * Adding support to the WFE's Order endpoint to receive finalization POST requests with a CSR * Updating the RA to accept finalization requests and to ensure orders are fully validated before issuance can proceed * Updating the SA to allow finding order authorizations & updating orders. * Updating the CA to accept an Order ID to log when issuing a certificate corresponding to an order object Resolves #3123	2017-11-01 12:39:44 -07:00
Ben Zarzycki	887d75f1e0	Logic error. Always-zero-value-variable used. (#3126 ) The intent here was pretty clear, but an oversight prevented the error condition from being checked.	2017-10-30 16:10:41 -07:00
Roland Bracewell Shoemaker	29c95f0aed	Add a PKCS#11 key generation tool (#3163 ) Tested against master SoftHSMv2 and relevant hardware. Fixes #3125.	2017-10-30 16:09:28 -07:00
Jacob Hoffman-Andrews	0882b86e6c	Add metrics to sendNags errors in expiration-mailer (#3198 ) Fixes #3176	2017-10-30 12:38:44 -07:00
Lucas Amorim	7daecf7b23	fix metric name	2017-10-30 11:26:03 -07:00
Lucas Amorim	a7a2eaf035	Add metrics to sendNags errors in expiration-mailer	2017-10-29 21:34:41 -07:00
Roland Bracewell Shoemaker	e2de327f4d	Remove unused old script (#3196 ) This appears to be from the RabbitMQ era and isn't referenced from anywhere else in the codebase.	2017-10-27 15:15:40 -04:00
Jacob Hoffman-Andrews	bf9ce64aca	Update GSB library (#3192 ) This pulls in google/safebrowsing#74, which introduces a new LookupURLsContext that allows us to pass through timeout information nicely. Also, update calling code to use LookupURLsContext instead of LookupURLs.	2017-10-24 08:33:03 -04:00
Jacob Hoffman-Andrews	c06dcfaf02	Limit number of authzs purged at once. (#3177 ) Previously the expired-authz-purger would try to load the ids for all relevant authzs into memory before doing any work. On a very large table, this would mean running out of memory. This setting allows limiting how much work will be done in one chunk. Also add periodic logging of deletion count. Fixes #3147.	2017-10-23 11:20:07 -07:00
Jacob Hoffman-Andrews	90278c80fe	Revert "Reject CAA responses containing DNAMEs (#3082 )" (#3188 ) This reverts commit `08d2018c10`. Feedback from root programs: https://cabforum.org/pipermail/public/2017-October/012293.html https://cabforum.org/pipermail/public/2017-October/012297.html https://cabforum.org/pipermail/public/2017-October/012358.html https://cabforum.org/pipermail/public/2017-October/012320.html Resolves #3130.	2017-10-23 11:14:56 -07:00
Roland Bracewell Shoemaker	e2cc6fbe68	Add test/chisel2.py for ACME v2 testing (#3179 ) Pulled out of https://github.com/certbot/certbot/compare/acme-v2 by @jsha, Boulder is the correct place for it to live.	2017-10-19 10:45:51 -07:00
Jacob Hoffman-Andrews	da31fc8b70	Add a renewal bit to issuedNames. (#3178 ) This is only the migration, so far. Rather than doing the feature-switch dance, we can wait for this migration to be applied, and then commit the code to start setting it, with a feature switch to start checking it, which can be turned on once we've been setting the bit in production for a week. Having this as an indexed bit on issuedNames allows us to cheaply exclude renewals from our rate limit queries, so we can avoid the ordering dependency for renewals vs new issuances on the same domain. Fixes #3161	2017-10-19 09:29:43 -04:00
Jacob Hoffman-Andrews	6cd777bd8d	Fix up stats after #3167 (#3185 ) There were two bugs in #3167: * All process-level stats got prefixed with "boulder", which broke dashboards. * All request_time stats got dropped, because measured_http was using the Prometheus DefaultRegisterer. To fix these, this PR plumbs a scope object through to measured_http, and uses an empty prefix when calling NewProcessCollector().	2017-10-18 11:14:59 -07:00
Roland Bracewell Shoemaker	06d348cab8	Remove references to RabbitMQ (#3184 )	2017-10-17 21:42:50 -04:00
Jacob Hoffman-Andrews	071fc0120f	Remove facebookgo/httpdown. (#3168 ) Its purpose is now served by net/http's Shutdown().	2017-10-17 08:55:43 -04:00
Jacob Hoffman-Andrews	600640294d	Increase default MaxIdleConns. (#3164 ) Go's default is 2: https://golang.org/src/database/sql/sql.go#L686. Graphs show we are opening 100-200 fresh connections per second on the SA. Changing this default should reduce that a lot, which should reduce load on both the SA and MariaDB. This should also improve latency, since every new TCP connection adds a little bit of latency.	2017-10-16 15:48:17 -07:00
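The knobs involved are `SetMaxIdleConns` and `SetMaxOpenConns` on `*sql.DB`. The sketch registers a do-nothing driver (hypothetical, never asked for a connection) only so it runs without MariaDB. The values used in the test are illustrative, not Boulder's configured limits.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
)

// noopDriver is a placeholder so the sketch runs without a real database.
// Its Open is never reached because no queries are issued.
type noopDriver struct{}

func (noopDriver) Open(string) (driver.Conn, error) {
	return nil, errors.New("noopDriver: no real connections")
}

func init() { sql.Register("noop", noopDriver{}) }

// openPool opens a handle and sizes its connection pool. database/sql keeps
// at most 2 idle connections by default, so connections released beyond
// that are closed and have to be reopened for later queries.
func openPool(driverName, dsn string, maxOpen, maxIdle int) (*sql.DB, error) {
	db, err := sql.Open(driverName, dsn)
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(maxOpen)
	// If maxOpen > 0 and maxIdle exceeds it, database/sql lowers maxIdle to maxOpen.
	db.SetMaxIdleConns(maxIdle)
	return db, nil
}
```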
Jacob Hoffman-Andrews	613ce0620f	Update minimum required Go version in README. (#3174 )	2017-10-14 14:16:48 -04:00
Jacob Hoffman-Andrews	f366e45756	Remove global state from metrics gathering (#3167 ) Previously, we used prometheus.DefaultRegisterer to register our stats, which uses global state to export its HTTP stats. We also used net/http/pprof's behavior of registering to the default global HTTP ServeMux, via DebugServer, which starts an HTTP server that uses that global ServeMux. In this change, I merge DebugServer's functions into StatsAndLogging. StatsAndLogging now takes an address parameter and fires off an HTTP server in a goroutine. That HTTP server is newly defined, and doesn't use DefaultServeMux. Registered on it are the Prometheus stats handler and handlers for the various pprof traces. In the process I split StatsAndLogging internally into two functions: makeStats and MakeLogger. I didn't port across the expvar variable exporting, which serves a similar function to Prometheus stats but which we never use. One nice immediate effect of this change: Since StatsAndLogging now requires an address, I noticed a bunch of commands that called StatsAndLogging, and passed around the resulting Scope, but never made use of it because they didn't run a DebugServer. Under the old StatsD world, these commands could still have exported their stats by pushing, but since we moved to Prometheus their stats stopped being collected. We haven't used any of these stats, so instead of adding debug ports to all short-lived commands, or setting up a push gateway, I simply removed them and switched those commands to initialize only a Logger, no stats.	2017-10-13 11:58:01 -07:00

4092 Commits (All Branches)