Remove various unnecessary uses of fmt.Sprintf - in particular:
- Avoid calls like t.Error(fmt.Sprintf(...)), where t.Errorf can be used directly.
- Use strconv when converting an integer to a string, rather than using
fmt.Sprintf("%d", ...). This is simpler and can also detect type errors at
compile time.
- Instead of using x.Write([]byte(fmt.Sprintf(...))), use fmt.Fprintf(x, ...).
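For illustration, the three patterns side by side (the variable names are made up; each "instead of" comment shows the call being replaced):

```go
package example

import (
	"fmt"
	"io"
	"strconv"
	"testing"
)

func examples(t *testing.T, w io.Writer, count int, status string) {
	// Instead of t.Error(fmt.Sprintf("got %d", count)), format directly:
	t.Errorf("got %d", count)

	// Instead of fmt.Sprintf("%d", count); strconv.Itoa only accepts an int,
	// so passing the wrong type is a compile-time error rather than bad output:
	_ = strconv.Itoa(count)

	// Instead of w.Write([]byte(fmt.Sprintf("status: %s\n", status))):
	fmt.Fprintf(w, "status: %s\n", status)
}
```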
We may see RPCs that are dispatched by a client but do not arrive at the server for some time afterwards. To have insight into potential request latency at this layer we want to publish the time delta between when a client sent an RPC and when the server received it.
This PR updates the gRPC client interceptor to add the current time to the gRPC request metadata context when it dispatches an RPC. The server side interceptor is updated to pull the client request time out of the gRPC request metadata. Using this timestamp it can calculate the latency and publish it as an observation on a Prometheus histogram.
Accomplishing the above required wiring a clock through to each of the client interceptors. This caused a small diff across each of the gRPC-aware boulder commands.
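A minimal sketch of the two interceptors (the metadata key, histogram, and helper names are illustrative, not Boulder's actual identifiers):

```go
package example

import (
	"context"
	"strconv"
	"time"

	"github.com/jmhodges/clock"
	"github.com/prometheus/client_golang/prometheus"
	"google.golang.org/grpc"
	"google.golang.org/grpc/metadata"
)

const clientRequestTimeKey = "client-request-time" // illustrative metadata key

// Client side: stamp the outgoing context with the dispatch time.
func clientInterceptor(clk clock.Clock) grpc.UnaryClientInterceptor {
	return func(ctx context.Context, method string, req, reply interface{},
		cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
		nanos := strconv.FormatInt(clk.Now().UnixNano(), 10)
		ctx = metadata.AppendToOutgoingContext(ctx, clientRequestTimeKey, nanos)
		return invoker(ctx, method, req, reply, cc, opts...)
	}
}

// Server side: read the timestamp back out and observe the delta.
func serverInterceptor(clk clock.Clock, latency prometheus.Histogram) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (interface{}, error) {
		if md, ok := metadata.FromIncomingContext(ctx); ok {
			if vals := md.Get(clientRequestTimeKey); len(vals) > 0 {
				if nanos, err := strconv.ParseInt(vals[0], 10, 64); err == nil {
					latency.Observe(clk.Now().Sub(time.Unix(0, nanos)).Seconds())
				}
			}
		}
		return handler(ctx, req)
	}
}
```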
A small unit test is included in this PR that checks that a latency stat is published to the histogram after an RPC to a test ChillerServer is made. It's difficult to do more in-depth testing because using fake clocks makes the latency 0, and using real clocks would require finding a way to queue/delay requests inside gRPC machinery that isn't exposed to Boulder.
Updates https://github.com/letsencrypt/boulder/issues/3635 - Still TODO: Explicitly logging latency in the VA, tracking outstanding RPCs as a gauge.
* Randomize order of CT logs when submitting precerts so we maximize the chances we actually exercise all of the logs in a group and not just the first in the list.
* Add metrics for winning logs
The `TotalCertificates` rate limit serves to ensure we don't
accidentally exceed our OCSP signing capacity by issuing too many
certificates within a fixed period. In practice this rate limit has been
fragile and the associated queries have been linked to performance
problems.
Since we now have better means of monitoring our OCSP signing capacity
this commit removes the rate limit and associated code.
This commit updates the `boulder-ra` and `boulder-ca` commands to refuse
to start if their configured `MaxNames` is 0 (the default value). This
should always be set to a positive number.
This commit also updates `csr/csr.go` to always apply the max names
check since it will never be 0 after the change above.
Also refactor `FailOnError` to pull out a separate `Fail` function.
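Roughly, the resulting pattern looks like this (simplified: the error reporting here goes to stderr rather than through Boulder's logger, and the config field access differs per command):

```go
package example

import (
	"fmt"
	"os"
)

// Fail reports a fatal problem and exits nonzero; FailOnError delegates to it
// when an error is present.
func Fail(msg string) {
	fmt.Fprintln(os.Stderr, msg)
	os.Exit(1)
}

func FailOnError(err error, msg string) {
	if err != nil {
		Fail(fmt.Sprintf("%s: %s", msg, err))
	}
}

// In a command's main(), refuse to start on the zero value:
func checkMaxNames(maxNames int) {
	if maxNames == 0 {
		Fail("Config error: MaxNames must be set to a positive number")
	}
}
```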
Related to https://github.com/letsencrypt/boulder/issues/3632
Our various main.go functions gated some key code on whether the TLS
and/or GRPC config fields were present. Now that those fields are fully
deployed in production, we can simplify the code and require them.
Also, rename tls to tlsConfig everywhere to avoid confusion with the tls
package.
Avoid assigning to the same err from two different goroutines in
boulder-ca (fix a race).
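The race is the familiar pattern of two goroutines assigning to one shared err; a simplified before/after (the server variables are hypothetical):

```go
package example

import (
	"log"
	"net"
	"net/http"

	"google.golang.org/grpc"
)

// Racy: both goroutines write the same err variable, so any read of err
// races with the other goroutine's write.
func serveRacy(gs *grpc.Server, l net.Listener, hs *http.Server) {
	var err error
	go func() { err = gs.Serve(l) }()
	go func() { err = hs.ListenAndServe() }()
	_ = err
}

// Fixed: each result is sent over a channel instead of a shared variable.
func serveFixed(gs *grpc.Server, l net.Listener, hs *http.Server) {
	errs := make(chan error, 2)
	go func() { errs <- gs.Serve(l) }()
	go func() { errs <- hs.ListenAndServe() }()
	if err := <-errs; err != nil {
		log.Fatal(err)
	}
}
```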
Add a set of logs which will be submitted to but not relied on for their SCTs. This allows us to test submissions to a particular log, or to submit to a log which is not yet approved by a browser/root program.
Also add a feature which stops cancellation of the remaining submissions when racing to get an SCT from a group of logs.
Additionally add an informational log that always times out in config-next.
Fixes #3464 and fixes #3465.
This updates the PA component to allow authorization challenge types that are globally disabled if the account ID owning the authorization is on a configured whitelist for that challenge type.
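A rough sketch of the check this implies in the policy authority (the field and map names are hypothetical):

```go
package example

// policyAuthority holds which challenge types are globally enabled and, per
// challenge type, which account IDs may use it even when it is disabled.
type policyAuthority struct {
	enabledChallenges   map[string]bool           // e.g. "http-01" -> true
	challengeWhitelists map[string]map[int64]bool // challenge type -> allowed account IDs
}

// challengeAllowedFor reports whether a challenge type may be offered on an
// authorization owned by the given account.
func (pa *policyAuthority) challengeAllowedFor(challengeType string, accountID int64) bool {
	if pa.enabledChallenges[challengeType] {
		return true
	}
	return pa.challengeWhitelists[challengeType][accountID]
}
```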
The go-grpc-prometheus package by default registers its metrics with Prometheus' global registry. In #3167, when we stopped using the global registry, we accidentally lost our gRPC metrics. This change adds them back.
Specifically, it adds two convenience functions, one for clients and one for servers, that make the necessary metrics objects and register them. We run these in the main function of each server.
I considered adding these as part of StatsAndLogging, but the corresponding ClientMetrics and ServerMetrics objects (defined by go-grpc-prometheus) need to be subsequently made available during construction of the gRPC clients and servers. We could add them as fields on Scope, but that seemed like too much tight coupling.
Also, update go-grpc-prometheus to get the necessary methods.
```
$ go test github.com/grpc-ecosystem/go-grpc-prometheus/...
ok github.com/grpc-ecosystem/go-grpc-prometheus 0.069s
? github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto [no test files]
```
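For illustration, the two convenience functions can look roughly like this (function names are illustrative):

```go
package example

import (
	grpc_prometheus "github.com/grpc-ecosystem/go-grpc-prometheus"
	"github.com/prometheus/client_golang/prometheus"
)

// newClientMetrics builds the go-grpc-prometheus client metrics and registers
// them on the given (non-global) registry.
func newClientMetrics(stats prometheus.Registerer) *grpc_prometheus.ClientMetrics {
	metrics := grpc_prometheus.NewClientMetrics()
	stats.MustRegister(metrics)
	return metrics
}

// newServerMetrics does the same for the server-side metrics.
func newServerMetrics(stats prometheus.Registerer) *grpc_prometheus.ServerMetrics {
	metrics := grpc_prometheus.NewServerMetrics()
	stats.MustRegister(metrics)
	return metrics
}
```

The returned objects are then handed to the gRPC client and server constructors so their interceptors can record per-RPC metrics.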
Previously, we used prometheus.DefaultRegisterer to register our stats, which uses global state to export its HTTP stats. We also used net/http/pprof's behavior of registering to the default global HTTP ServeMux, via DebugServer, which starts an HTTP server that uses that global ServeMux.
In this change, I merge DebugServer's functions into StatsAndLogging. StatsAndLogging now takes an address parameter and fires off an HTTP server in a goroutine. That HTTP server is newly defined, and doesn't use DefaultServeMux. On it is registered the Prometheus stats handler, and handlers for the various pprof traces. In the process I split StatsAndLogging internally into two functions: makeStats and MakeLogger. I didn't port across the expvar variable exporting, which serves a similar function to Prometheus stats but which we never use.
One nice immediate effect of this change: since StatsAndLogging now requires an address, I noticed a bunch of commands that called StatsAndLogging and passed around the resulting Scope, but never made use of it because they didn't run a DebugServer. Under the old StatsD world, these commands could still have exported their stats by pushing, but since we moved to Prometheus their stats stopped being collected. We haven't used any of these stats, so instead of adding debug ports to all short-lived commands or setting up a push gateway, I simply removed them and switched those commands to initialize only a Logger, no stats.
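A minimal sketch of the debug-server piece, assuming a dedicated ServeMux and the standard pprof handler paths (the function name is illustrative):

```go
package example

import (
	"net/http"
	"net/http/pprof"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// startDebugServer serves Prometheus metrics and the pprof handlers on addr
// without touching the global http.DefaultServeMux.
func startDebugServer(addr string, registry *prometheus.Registry) {
	mux := http.NewServeMux()
	mux.Handle("/metrics", promhttp.HandlerFor(registry, promhttp.HandlerOpts{}))
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)

	go func() {
		_ = http.ListenAndServe(addr, mux)
	}()
}
```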
Fixes #3020.
In order to write integration tests for some features, especially related to rate limiting, rechecking of CAA, and expiration of authzs, orders, and certs, we need to be able to fake the passage of time in integration tests.
To do so, this change switches out all clock.Default() instances for cmd.Clock(), which can be set manually with the FAKECLOCK environment variable. integration-test.py now starts up all servers once before the main body of tests, with FAKECLOCK set to a date 70 days ago, and does some initial setup for a new integration test case. That test case tries to fetch a 70-day-old authz URL, and expects it to 404.
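A sketch of what a cmd.Clock() helper like this looks like, using github.com/jmhodges/clock (the timestamp format and error handling are assumptions):

```go
package example

import (
	"fmt"
	"os"
	"time"

	"github.com/jmhodges/clock"
)

// Clock returns the real clock unless FAKECLOCK is set, in which case it
// returns a fake clock pinned to that time, letting integration tests
// simulate starting the servers e.g. 70 days in the past.
func Clock() (clock.Clock, error) {
	tstamp := os.Getenv("FAKECLOCK")
	if tstamp == "" {
		return clock.Default(), nil
	}
	t, err := time.Parse(time.UnixDate, tstamp)
	if err != nil {
		return nil, fmt.Errorf("parsing FAKECLOCK: %s", err)
	}
	fc := clock.NewFake()
	fc.Set(t)
	return fc, nil
}
```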
In order to make this work, I also had to change a number of our test binaries to shut down cleanly in response to SIGTERM. Without that change, stopping the servers between the setup phase and the main tests caused startservers.check() to fail, because some processes exited with nonzero status.
Note: This is an initial stab at things, to prove out the technique. Long-term, I think we will want to use an idiom where test cases are classes that have a number of optional setup phases that may be run at e.g. 70 days prior and 5 days prior. This could help us avoid a proliferation of global state as we add more time-dependent test cases.
Previously, we would produce an error and a nonzero status code on shutdown, because gRPC's GracefulStop would cause s.Serve() to return an error. Now we filter that specific error and treat it as success. This also allows us to kill processes with SIGTERM instead of SIGKILL in integration tests.
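The filtering amounts to something like the following sketch (exactly which error Serve reports on a deliberate stop has varied across grpc-go versions, so the comparison shown is illustrative):

```go
package example

import (
	"net"

	"google.golang.org/grpc"
)

// filterShutdownError treats the error a deliberately stopped gRPC server
// reports as a clean shutdown and passes every other error through.
func filterShutdownError(err error) error {
	if err == grpc.ErrServerStopped {
		return nil
	}
	return err
}

func serve(s *grpc.Server, l net.Listener) error {
	return filterShutdownError(s.Serve(l))
}
```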
Fixes #2410.
Fixes #2889.
VA now implements two gRPC services: VA and CAA. These both run on the same port, but this allows implementation of the IsCAAValid RPC to skip using the gRPC wrappers, and makes it easier to potentially separate the service into its own package in the future.
RA.NewCertificate now checks the expiration times of authorizations, and will call out to VA to recheck CAA for those authorizations that were not validated recently enough.
Fixes #639.
This resolves something that has bugged me for two-plus years: our DNSResolverImpl is not a DNS resolver, it is a DNS client. This change just makes that obvious.
This used to be used for AMQP queue names. Now that AMQP is gone, these consts
were only used when printing a version string at startup. This changes
VersionString to just use the name of the current program, and removes
`const clientName = ` from many of our main.go's.
Adds a basic truncated modulus hash check for RSA keys that can be used to check keys against the Debian `{openssl,openssh,openvpn}-blacklist` lists of weak keys generated during the [Debian weak key incident](https://wiki.debian.org/SSLkeys).
Testing is gated on adding a new configuration key to the WFE, RA, and CA configs which contains the path to a directory which should contain the weak key lists.
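A rough sketch of the truncated-hash idea (the exact hash input format and truncation are assumptions about how the Debian blacklist entries are derived, not necessarily Boulder's implementation):

```go
package example

import (
	"crypto/rsa"
	"crypto/sha1"
	"fmt"
)

// The Debian blacklist files store only a suffix of a SHA-1 hash per weak
// key, so a small truncated hash is enough to index the loaded lists.
type truncatedHash [10]byte

type WeakRSAKeys struct {
	suffixes map[truncatedHash]struct{}
}

// Known reports whether the key's truncated modulus hash appears on any of
// the loaded weak-key lists.
func (w *WeakRSAKeys) Known(key *rsa.PublicKey) bool {
	_, present := w.suffixes[hashKey(key)]
	return present
}

func hashKey(key *rsa.PublicKey) truncatedHash {
	// Assumption: hash the upper-case hex modulus in the form the Debian
	// tools used ("Modulus=<hex>\n") and keep the last 10 bytes of the SHA-1.
	sum := sha1.Sum([]byte(fmt.Sprintf("Modulus=%X\n", key.N.Bytes())))
	var t truncatedHash
	copy(t[:], sum[len(sum)-10:])
	return t
}
```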
Fixes #157.
This removes the config and code to output to statsd.
- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.
Fixes #2639, #2733.
Previously, a given binary would have three TLS config fields (CA cert, cert,
key) for its gRPC server, plus each of its configured gRPC clients. In typical
use, we expect all three of those to be the same across both servers and clients
within a given binary.
This change reuses the TLSConfig type already defined for use with AMQP, adds a
Load() convenience function that turns it into a *tls.Config, and configures it
for use with all of the binaries. This should make configuration easier and more
robust, since it more closely matches usage.
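A sketch of the shared type and its Load() helper (field names and error handling here are approximate):

```go
package example

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io/ioutil"
)

// TLSConfig holds the three file paths a binary shares between its gRPC
// server and all of its gRPC clients.
type TLSConfig struct {
	CertFile   string
	KeyFile    string
	CACertFile string
}

// Load turns the file paths into a *tls.Config that presents our certificate
// and verifies peers against the internal CA.
func (t TLSConfig) Load() (*tls.Config, error) {
	caPEM, err := ioutil.ReadFile(t.CACertFile)
	if err != nil {
		return nil, fmt.Errorf("reading CA cert: %s", err)
	}
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("parsing CA cert from %q", t.CACertFile)
	}
	cert, err := tls.LoadX509KeyPair(t.CertFile, t.KeyFile)
	if err != nil {
		return nil, fmt.Errorf("loading key pair: %s", err)
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      roots, // used when this binary acts as a client
		ClientCAs:    roots, // used when it acts as a server
		ClientAuth:   tls.RequireAndVerifyClientCert,
	}, nil
}
```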
This change preserves temporary backwards-compatibility for the
ocsp-updater->publisher RPCs, since those are the only instances of gRPC
currently enabled in production.
This allows finer-grained control of which components can request issuance. The OCSP Updater should not be able to request issuance.
Also, update test/grpc-creds/generate.sh to reissue the certs properly.
Resolves #2417.
Previously we had custom code in each gRPC wrapper to implement timeouts. Moving
the timeout code into the client interceptor allows us to simplify things and
reduce code duplication.
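The interceptor version amounts to a few lines; a minimal sketch:

```go
package example

import (
	"context"
	"time"

	"google.golang.org/grpc"
)

// timeoutInterceptor applies a single per-RPC deadline, so the individual
// wrapper methods no longer need their own timeout handling.
func timeoutInterceptor(timeout time.Duration) grpc.UnaryClientInterceptor {
	return func(ctx context.Context, method string, req, reply interface{},
		cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
		ctx, cancel := context.WithTimeout(ctx, timeout)
		defer cancel()
		return invoker(ctx, method, req, reply, cc, opts...)
	}
}
```

The interceptor is installed once per client connection (e.g. via grpc.WithUnaryInterceptor when dialing) rather than being repeated in every wrapper.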
Adds a gRPC server to the SA, and SA gRPC clients to the WFE, RA, CA, Publisher, OCSP updater, orphan finder, admin revoker, and expiration mailer.
Also adds a CA gRPC client to the OCSP Updater which was missed in #2193.
Fixes #2347.
Implements a less RPC-focused signal catch/shutdown method. Certain things that could probably also use this (e.g. `ocsp-updater`) haven't been given it, as they would require rather substantial changes to allow for a graceful shutdown approach.
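The general shape of the signal handling, as a minimal sketch (the helper name is illustrative):

```go
package example

import (
	"os"
	"os/signal"
	"syscall"
)

// catchSignals blocks until SIGTERM or SIGINT arrives, then runs the provided
// shutdown function.
func catchSignals(shutdown func()) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
	<-sigs
	shutdown()
}
```

A gRPC-based command can run `go catchSignals(grpcServer.GracefulStop)` before blocking in Serve; a command like `ocsp-updater` would need to pass a shutdown function that winds down its own loops, which is the "substantial changes" part.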
Fixes #2298.
With the current gRPC design the CA talks directly to the Publisher when calling SubmitToCT, which crosses security boundaries (secure internal segment -> internet-facing segment). This is dangerous if (however unlikely) the Publisher is compromised and a gRPC exploit allows memory corruption on the caller end of an RPC, which could expose sensitive information or cause arbitrary issuance.
Instead we move the RPC call to the RA, which is in a less sensitive network segment. Switching the call site from the CA to the RA is gated on adding the gRPC PublisherService object to the RA config.
Fixes #2202.
Add feature-flagged support for issuing for IDNs; fixes #597.
This patch expects that clients have performed valid IDNA2008 encoding on any label that includes unicode characters. Invalid encodings (including incompatible IDNA2003 encodings) will be rejected. No script-mixing or script-exclusion checks are performed: we assume that if a name is resolvable it conforms to the registrar's policies on these matters, and that if it uses non-standard scripts in subdomains etc. it is up to browsers to choose how to display those names.
Required a full update of the golang.org/x/net tree to pull in golang.org/x/net/idna; all test suites pass.
Updates #1699.
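For illustration, one way to check that a punycode label is a valid encoding with golang.org/x/net/idna is to require that it decodes and that re-encoding round-trips; this is a simplified sketch, not the actual validation code, and Boulder's policy checks go further:

```go
package example

import (
	"fmt"
	"strings"

	"golang.org/x/net/idna"
)

// checkIDNLabel rejects "xn--" labels that do not decode to unicode and
// re-encode back to exactly the same ASCII form.
func checkIDNLabel(label string) error {
	if !strings.HasPrefix(label, "xn--") {
		return nil // plain ASCII label, nothing to check here
	}
	ulabel, err := idna.ToUnicode(label)
	if err != nil {
		return fmt.Errorf("invalid IDN label %q: %s", label, err)
	}
	alabel, err := idna.ToASCII(ulabel)
	if err != nil || alabel != label {
		return fmt.Errorf("IDN label %q does not round-trip", label)
	}
	return nil
}
```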
Adds a new package, `features`, which exposes methods to set and check if various internal features are enabled. The implementation uses global state to store the features so that services embedded in another service do not each require their own features map in order to check if something is enabled.
Requires a `boulder-tools` image update to include `golang.org/x/tools/cmd/stringer`.
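A sketch of the shape of such a package, using a mutex-guarded global map (simplified: the real package uses typed flags with a generated stringer, per the `boulder-tools` note above):

```go
package features

import "sync"

// FeatureFlag identifies a feature; in the real package the stringer tool
// generates its String() method.
type FeatureFlag int

var (
	mu    sync.RWMutex
	flags = map[FeatureFlag]bool{}
)

// Set records which features are enabled, typically from configuration at
// startup.
func Set(enabled map[FeatureFlag]bool) {
	mu.Lock()
	defer mu.Unlock()
	for f, v := range enabled {
		flags[f] = v
	}
}

// Enabled reports whether a feature is on. Because the state is global, a
// service embedded inside another service sees the same answer without
// carrying its own feature map.
func Enabled(f FeatureFlag) bool {
	mu.RLock()
	defer mu.RUnlock()
	return flags[f]
}
```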
Fixes #1880.
Updates google.golang.org/grpc and github.com/jmhodges/clock; both test suites pass. A few of the gRPC interfaces changed, so this also fixes those breakages.
Introduces the `authorizationLifetimeDays` and `pendingAuthorizationLifetimeDays` configuration options for `RA`.
If the values are missing from the configuration, the code defaults back to the current values (300 and 7 days, respectively).
Fixes #2024.
This PR removes the use of all anonymous struct fields that I introduced as part of my work on splitting up boulder-config (#1962).
The root of the bug was in loading the JSON configuration file into the config struct. The config structs contained several embedded (anonymous) fields, and an embedded (anonymous) field in a struct flattens the corresponding JSON structure. This caused json.Unmarshal to look for a nested field such as AllowedSigningAlgos at the root level of the JSON object rather than at the nested level, and hence not find it.
See https://play.golang.org/p/6uVCsEu3Df for a working example.
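A minimal reproduction of the difference (the field names are borrowed from the description above):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type KeyPolicy struct {
	AllowedSigningAlgos []string
}

// Embedded: json.Unmarshal expects AllowedSigningAlgos at the top level of
// the JSON object, because the inner fields are promoted (flattened).
type ConfigEmbedded struct {
	KeyPolicy
}

// Named: json.Unmarshal expects the field nested under "KeyPolicy".
type ConfigNamed struct {
	KeyPolicy KeyPolicy
}

func main() {
	nested := []byte(`{"KeyPolicy": {"AllowedSigningAlgos": ["RSA"]}}`)

	var e ConfigEmbedded
	_ = json.Unmarshal(nested, &e)
	fmt.Println(e.AllowedSigningAlgos) // [] - the nested field is never found

	var n ConfigNamed
	_ = json.Unmarshal(nested, &n)
	fmt.Println(n.KeyPolicy.AllowedSigningAlgos) // [RSA]
}
```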
This fixes the reported bug: #2018
In #1971 we added the CAASERVFAILExceptions config field and argument to NewDNSResolverImpl. This argument only needs to be passed to the VA, where we do CAA validations. However, I accidentally added code to the RA as well to use this new config field. This change backs that out.
Presently clients may request a new AuthZ be created for a domain that they have already proved authorization over. This results in unnecessary bloat in the authorizations table and duplicated effort.
This commit alters the `NewAuthorization` function of the RA such that before going through the work of creating a new AuthZ it checks whether there already exists a valid AuthZ for the domain/regID that expires in more than 24 hours from the current date. If there is, then we short circuit creation and return the existing AuthZ. When this case occurs the `RA.ReusedValidAuthz` counter is incremented to provide visibility.
Since clients requesting a new AuthZ and getting an AuthZ back expect to turn around and post updates to the corresponding challenges we also return early in `UpdateAuthorization` when asked to update an AuthZ that is already valid. When this case occurs the `RA.ReusedValidAuthzChallenge` counter is incremented.
All of the above behaviour is gated by a new RA config flag, `reuseValidAuthz`. In the default case (false) the RA does **not** reuse any AuthZs and instead maintains the historic behaviour: always creating a new AuthZ when requested, regardless of whether there are already valid AuthZs that could be reused. In the true case (enabled only in `boulder-config-next.json`) the AuthZ reuse described above is enabled.
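A sketch of the reuse check as seen from `NewAuthorization` (the storage-authority method and types are stand-ins, not Boulder's actual interfaces):

```go
package example

import (
	"context"
	"time"
)

// Authorization stands in for Boulder's core authorization type.
type Authorization struct {
	Status  string
	Expires time.Time
}

// authzGetter stands in for the slice of the SA the RA needs here.
type authzGetter interface {
	GetValidAuthorization(ctx context.Context, regID int64, domain string) (*Authorization, error)
}

// reusableAuthz returns an existing valid authorization for regID/domain if
// it expires more than 24 hours from now. A nil result means the RA should
// fall through to creating a new authorization, as before.
func reusableAuthz(ctx context.Context, sa authzGetter, now time.Time,
	regID int64, domain string, reuseValidAuthz bool) *Authorization {
	if !reuseValidAuthz {
		return nil
	}
	existing, err := sa.GetValidAuthorization(ctx, regID, domain)
	if err != nil || existing == nil {
		return nil // nothing to reuse
	}
	if existing.Expires.After(now.Add(24 * time.Hour)) {
		// The real code also increments the RA.ReusedValidAuthz counter here.
		return existing
	}
	return nil
}
```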
Resolves #1854.