
145 Commits

Author SHA1 Message Date
Joel Sing f8a023e49c Remove various unnecessary uses of fmt.Sprintf (#3707)
Remove various unnecessary uses of fmt.Sprintf - in particular:

- Avoid calls like t.Error(fmt.Sprintf(...)), where t.Errorf can be used directly.

- Use strconv when converting an integer to a string, rather than using
  fmt.Sprintf("%d", ...). This is simpler and can also detect type errors at
  compile time.

- Instead of using x.Write([]byte(fmt.Sprintf(...))), use fmt.Fprintf(x, ...).
2018-05-11 11:55:25 -07:00
Daniel McCarney aa810a3142 gRPC: publish RPC latency stat in server interceptor. (#3665)
We may see RPCs that are dispatched by a client but do not arrive at the server for some time afterwards. To have insight into potential request latency at this layer we want to publish the time delta between when a client sent an RPC and when the server received it.

This PR updates the gRPC client interceptor to add the current time to the gRPC request metadata context when it dispatches an RPC. The server side interceptor is updated to pull the client request time out of the gRPC request metadata. Using this timestamp it can calculate the latency and publish it as an observation on a Prometheus histogram.

Accomplishing the above required wiring a clock through to each of the client interceptors. This caused a small diff across each of the gRPC-aware Boulder commands.

A small unit test is included in this PR that checks that a latency stat is published to the histogram after an RPC to a test ChillerServer is made. It's difficult to do more in-depth testing because using fake clocks makes the latency 0, and using real clocks requires finding a way to queue or delay requests inside gRPC internals that are not exposed to Boulder.

Updates https://github.com/letsencrypt/boulder/issues/3635 - Still TODO: Explicitly logging latency in the VA, tracking outstanding RPCs as a gauge.
2018-04-25 15:37:22 -07:00
Jacob Hoffman-Andrews 4dcbf5c883 Run multiples of services in integration tests (#3662)
Fixes #3653.
2018-04-24 16:00:40 -07:00
Roland Bracewell Shoemaker 24cd01d033 Revert to setting full addresses instead of just ports 2018-04-23 12:39:28 -07:00
Roland Bracewell Shoemaker 5c4eaf841f Review fixes 2018-04-20 16:03:55 -07:00
Roland Bracewell Shoemaker ccb02419c5 Revert client changes + addr debug override 2018-04-20 12:46:33 -07:00
Roland Bracewell Shoemaker d424d0580b Allow cli override of gRPC listen and service addresses 2018-04-20 12:35:12 -07:00
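A minimal sketch of overriding a configured gRPC listen address from the command line, using a full `host:port` address as the later commit settled on. The `-addr` flag name and the `GRPCAddress` field are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

// serverConfig stands in for a component's JSON config (hypothetical field).
type serverConfig struct {
	GRPCAddress string
}

// applyOverrides parses args and, if -addr was given, replaces the
// configured listen address with the full host:port from the flag.
func applyOverrides(cfg *serverConfig, args []string) error {
	fs := flag.NewFlagSet("server", flag.ContinueOnError)
	addr := fs.String("addr", "", "gRPC listen address override (host:port)")
	if err := fs.Parse(args); err != nil {
		return err
	}
	if *addr != "" {
		cfg.GRPCAddress = *addr
	}
	return nil
}

func main() {
	cfg := serverConfig{GRPCAddress: "0.0.0.0:9094"}
	if err := applyOverrides(&cfg, []string{"-addr", "127.0.0.1:19094"}); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(cfg.GRPCAddress) // 127.0.0.1:19094
}
```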
Roland Bracewell Shoemaker 0e6713e573 Randomize order of CT logs when submitting precerts (#3660)
* Randomize order of CT logs when submitting precerts so we maximize the chances we actually exercise all of the logs in a group and not just the first in the list.

* Add metrics for winning logs
2018-04-20 15:00:10 -04:00
Daniel McCarney 74d5decc67 Remove `TotalCertificates` rate limit. (#3638)
The `TotalCertificates` rate limit serves to ensure we don't
accidentally exceed our OCSP signing capacity by issuing too many
certificates within a fixed period. In practice this rate limit has been
fragile and the associated queries have been linked to performance
problems.

Since we now have better means of monitoring our OCSP signing capacity
this commit removes the rate limit and associated code.
2018-04-12 13:25:47 -07:00
Daniel McCarney 299e53b237 RA,CA: Refuse to start with MaxNames == 0. (#3634)
This commit updates the `boulder-ra` and `boulder-ca` commands to refuse
to start if their configured `MaxNames` is 0 (the default value). This
should always be set to a positive number.

This commit also updates `csr/csr.go` to always apply the max names
check since it will never be 0 after the change above.

Also refactor `FailOnError` to pull out a separate `Fail` function.

Related to https://github.com/letsencrypt/boulder/issues/3632
2018-04-10 10:53:23 -07:00
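The startup check and the `Fail`/`FailOnError` split might look roughly like this. The function names `Fail` and `FailOnError` come from the commit message, but these bodies are a sketch (they print to stderr rather than using Boulder's logger), and `checkMaxNames` is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Fail reports msg and exits with a nonzero status.
func Fail(msg string) {
	fmt.Fprintln(os.Stderr, msg)
	os.Exit(1)
}

// FailOnError delegates to Fail when err is non-nil.
func FailOnError(err error, msg string) {
	if err != nil {
		Fail(msg + ": " + err.Error())
	}
}

// checkMaxNames rejects the zero value, which is what an unset MaxNames
// field decodes to.
func checkMaxNames(maxNames int) error {
	if maxNames <= 0 {
		return errors.New("MaxNames must be a positive number")
	}
	return nil
}

func main() {
	FailOnError(checkMaxNames(100), "invalid config")
	fmt.Println("config ok")
}
```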
Roland Bracewell Shoemaker cc5ec34539 Allow configuration of multiple DNS resolvers (#3612)
* Allow configuration of multiple DNS resolvers
* Use multiple DNS resolvers in integration tests

Fixes #3611.
2018-04-05 11:51:22 -04:00
Jacob Hoffman-Andrews 9da5a7e1fc Cleanup: TLS and GRPC configs are mandatory. (#3476)
Our various main.go functions gated some key code on whether the TLS
and/or GRPC config fields were present. Now that those fields are fully
deployed in production, we can simplify the code and require them.

Also, rename tls to tlsConfig everywhere to avoid confusion with the tls
package.

Avoid assigning to the same err from two different goroutines in
boulder-ca (fix a race).
2018-02-26 10:16:50 -05:00
Roland Bracewell Shoemaker 0b53063a72 ctpolicy: Add informational logs and don't cancel remaining submissions (#3472)
Add a set of logs which will be submitted to but not relied on for their SCTs.
This allows us to test submissions to a particular log or submit to a log which
is not yet approved by a browser/root program.

Also add a feature which stops cancellation of remaining submissions when racing
to get an SCT from a group of logs.

Additionally add an informational log that always times out in config-next.

Fixes #3464 and fixes #3465.
2018-02-23 21:51:50 -05:00
Roland Bracewell Shoemaker 9e23edf850 Use ctpolicy package in RA (#3422)
Also collect metrics on success/failure rates. Built on top of #3414.

Fixes #3413.
2018-02-08 13:33:42 -08:00
Maciej Dębski 44984cd84a Implement regID whitelist for allowed challenge types. (#3352)
This updates the PA component to allow authorization challenge types that are globally disabled if the account ID owning the authorization is on a configured whitelist for that challenge type.
2018-01-10 13:44:53 -05:00
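The whitelist rule described above reduces to a two-step lookup: a challenge type is allowed if it is globally enabled, or if the owning account's registration ID is whitelisted for it. The `challengePolicy` type and the example IDs are hypothetical.

```go
package main

import "fmt"

// challengePolicy sketches the PA check for allowed challenge types.
type challengePolicy struct {
	enabled   map[string]bool           // globally enabled challenge types
	whitelist map[string]map[int64]bool // challenge type -> allowed regIDs
}

func (p challengePolicy) allowed(challType string, regID int64) bool {
	if p.enabled[challType] {
		return true
	}
	// Indexing a missing inner map yields a nil map, which reads as false.
	return p.whitelist[challType][regID]
}

func main() {
	p := challengePolicy{
		enabled:   map[string]bool{"http-01": true, "tls-sni-01": false},
		whitelist: map[string]map[int64]bool{"tls-sni-01": {1234: true}},
	}
	fmt.Println(p.allowed("http-01", 1), p.allowed("tls-sni-01", 1), p.allowed("tls-sni-01", 1234)) // true false true
}
```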
Jacob Hoffman-Andrews 68d5cc3331 Restore gRPC metrics (#3265)
The go-grpc-prometheus package by default registers its metrics with Prometheus' global registry. In #3167, when we stopped using the global registry, we accidentally lost our gRPC metrics. This change adds them back.

Specifically, it adds two convenience functions, one for clients and one for servers, that make the necessary metrics object and register it. We run these in the main function of each server.

I considered adding these as part of StatsAndLogging, but the corresponding ClientMetrics and ServerMetrics objects (defined by go-grpc-prometheus) need to be subsequently made available during construction of the gRPC clients and servers. We could add them as fields on Scope, but that seemed like slightly too tight a coupling.

Also, update go-grpc-prometheus to get the necessary methods.

```
$ go test github.com/grpc-ecosystem/go-grpc-prometheus/...
ok      github.com/grpc-ecosystem/go-grpc-prometheus    0.069s
?       github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto [no test files]
```
2017-12-07 15:44:55 -08:00
Roland Bracewell Shoemaker bdea281ae0 Remove CAA SERVFAIL exceptions code (#3262)
Fixes #3080.
2017-12-05 14:39:37 -08:00
Jacob Hoffman-Andrews f366e45756 Remove global state from metrics gathering (#3167)
Previously, we used prometheus.DefaultRegisterer to register our stats, which uses global state to export its HTTP stats. We also used net/http/pprof's behavior of registering to the default global HTTP ServeMux, via DebugServer, which starts an HTTP server that uses that global ServeMux.

In this change, I merge DebugServer's functions into StatsAndLogging. StatsAndLogging now takes an address parameter and fires off an HTTP server in a goroutine. That HTTP server is newly defined, and doesn't use DefaultServeMux. On it is registered the Prometheus stats handler, and handlers for the various pprof traces. In the process I split StatsAndLogging internally into two functions: makeStats and MakeLogger. I didn't port across the expvar variable exporting, which serves a similar function to Prometheus stats but which we never use.

One nice immediate effect of this change: since StatsAndLogging now requires an address, I noticed a bunch of commands that called StatsAndLogging and passed around the resulting Scope, but never made use of it because they didn't run a DebugServer. Under the old StatsD world, these commands still could have exported their stats by pushing, but since we moved to Prometheus their stats stopped being collected. We haven't used any of these stats, so instead of adding debug ports to all short-lived commands, or setting up a push gateway, I simply removed them and switched those commands to initialize only a Logger, no stats.
2017-10-13 11:58:01 -07:00
Jacob Hoffman-Andrews 0a72f768a7 Remove ProfileCmd. (#3166)
These stats are now all collected by Prometheus.
2017-10-13 10:02:04 -04:00
Jacob Hoffman-Andrews 4128e0d95a Add time-dependent integration testing (#3060)
Fixes #3020.

In order to write integration tests for some features, especially related to rate limiting, rechecking of CAA, and expiration of authzs, orders, and certs, we need to be able to fake the passage of time in integration tests.

To do so, this change switches out all clock.Default() instances for cmd.Clock(), which can be set manually with the FAKECLOCK environment variable. integration-test.py now starts up all servers once before the main body of tests, with FAKECLOCK set to a date 70 days ago, and does some initial setup for a new integration test case. That test case tries to fetch a 70-day-old authz URL, and expects it to 404.

In order to make this work, I also had to change a number of our test binaries to shut down cleanly in response to SIGTERM. Without that change, stopping the servers between the setup phase and the main tests caused startservers.check() to fail, because some processes exited with nonzero status.

Note: This is an initial stab at things, to prove out the technique. Long-term, I think we will want to use an idiom where test cases are classes that have a number of optional setup phases that may be run at e.g. 70 days prior and 5 days prior. This could help us avoid a proliferation of global state as we add more time-dependent test cases.
2017-09-13 12:34:14 -07:00
Jacob Hoffman-Andrews 20ec1e3e4e Filter spurious shutdown errors. (#3052)
Previously, we would produce an error and a nonzero status code on shutdown,
because gRPC's GracefulStop would cause s.Serve() to return an error. Now we
filter that specific error and treat it as success. This also allows us to kill
processes with SIGTERM instead of SIGKILL in integration tests.

Fixes #2410.
2017-09-07 13:45:32 -07:00
Jacob Hoffman-Andrews b0c7bc1bee Recheck CAA for authorizations older than 8 hours (#3014)
Fixes #2889.

VA now implements two gRPC services: VA and CAA. These both run on the same port, but this allows implementation of the IsCAAValid RPC to skip using the gRPC wrappers, and makes it easier to potentially separate the service into its own package in the future.

RA.NewCertificate now checks the expiration times of authorizations, and will call out to VA to recheck CAA for those authorizations that were not validated recently enough.
2017-08-28 16:40:57 -07:00
Roland Bracewell Shoemaker 90ba766af9 Add NewOrder RPCs + methods to SA and RA (#2907)
Fixes #2875, #2900 and #2901.
2017-08-11 14:24:25 -04:00
Roland Bracewell Shoemaker 05d869b005 Rename DNSResolver -> DNSClient (#2878)
Fixes #639.

This resolves something that has bugged me for two+ years: our DNSResolverImpl is not a DNS resolver, it is a DNS client. This change just makes that obvious.
2017-07-18 08:37:45 -04:00
Jacob Hoffman-Andrews 63a25bf913 Remove clientName everywhere. (#2862)
This used to be used for AMQP queue names. Now that AMQP is gone, these consts
were only used when printing a version string at startup. This changes
VersionString to just use the name of the current program, and removes
`const clientName = ` from many of our main.go's.
2017-07-12 10:28:54 -07:00
Roland Bracewell Shoemaker 8ce2f8b432 Basic RSA known weak key checking (#2765)
Adds a basic truncated modulus hash check for RSA keys that can be used to check keys against the Debian `{openssl,openssh,openvpn}-blacklist` lists of weak keys generated during the [Debian weak key incident](https://wiki.debian.org/SSLkeys).

Checking is gated on adding a new configuration key to the WFE, RA, and CA configs, which contains the path to a directory that should contain the weak key lists.

Fixes #157.
2017-05-25 09:33:58 -07:00
Jacob Hoffman-Andrews b17b5c72a6 Remove statsd from Boulder (#2752)
This removes the config and code to output to statsd.

- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.

Fixes #2639, #2733.
2017-05-15 10:19:54 -04:00
Jacob Hoffman-Andrews 6719dc17a6 Remove AMQP config and code (#2634)
We now use gRPC everywhere.
2017-04-03 10:39:39 -04:00
Jacob Hoffman-Andrews 510e279208 Simplify gRPC TLS configs. (#2470)
Previously, a given binary would have three TLS config fields (CA cert, cert,
key) for its gRPC server, plus each of its configured gRPC clients. In typical
use, we expect all three of those to be the same across both servers and clients
within a given binary.

This change reuses the TLSConfig type already defined for use with AMQP, adds a
Load() convenience function that turns it into a *tls.Config, and configures it
for use with all of the binaries. This should make configuration easier and more
robust, since it more closely matches usage.

This change preserves temporary backwards-compatibility for the
ocsp-updater->publisher RPCs, since those are the only instances of gRPC
currently enabled in production.
2017-01-06 14:19:18 -08:00
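A `Load()` that turns three configured paths into one `*tls.Config` usable by both gRPC servers and clients might look like this. The `TLSConfig` and `Load` names come from the commit message; the field names and the mutual-TLS settings are assumptions of this sketch.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// TLSConfig holds the three paths each binary configures.
type TLSConfig struct {
	CACertFile string
	CertFile   string
	KeyFile    string
}

// Load builds a config that presents cert/key and trusts only the CA,
// both when acting as a server and when acting as a client.
func (c TLSConfig) Load() (*tls.Config, error) {
	if c.CACertFile == "" || c.CertFile == "" || c.KeyFile == "" {
		return nil, errors.New("TLS config: CACertFile, CertFile and KeyFile are all required")
	}
	cert, err := tls.LoadX509KeyPair(c.CertFile, c.KeyFile)
	if err != nil {
		return nil, fmt.Errorf("loading key pair: %v", err)
	}
	caPEM, err := os.ReadFile(c.CACertFile)
	if err != nil {
		return nil, fmt.Errorf("reading CA cert: %v", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid certificates found in CA file")
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      pool,                           // verifies servers when dialing
		ClientCAs:    pool,                           // verifies clients when serving
		ClientAuth:   tls.RequireAndVerifyClientCert, // mutual TLS
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	_, err := TLSConfig{CACertFile: "ca.pem", CertFile: "cert.pem", KeyFile: "key.pem"}.Load()
	fmt.Println(err)
}
```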
Jacob Hoffman-Andrews 9b8dacab03 Split out separate RPC services for issuing and for signing OCSP (#2452)
This allows finer-grained control of which components can request issuance. The OCSP Updater should not be able to request issuance.

Also, update test/grpc-creds/generate.sh to reissue the certs properly.

Resolves #2417
2017-01-05 15:08:39 -08:00
Jacob Hoffman-Andrews 27a1446010 Move timeouts into client interceptor. (#2387)
Previously we had custom code in each gRPC wrapper to implement timeouts. Moving
the timeout code into the client interceptor allows us to simplify things and
reduce code duplication.
2016-12-05 10:42:26 -05:00
Roland Bracewell Shoemaker 03fdd65bfe Add gRPC server to SA (#2374)
Adds a gRPC server to the SA and SA gRPC Clients to the WFE, RA, CA, Publisher, OCSP updater, orphan finder, admin revoker, and expiration mailer.

Also adds a CA gRPC client to the OCSP Updater which was missed in #2193.

Fixes #2347.
2016-12-02 17:24:46 -08:00
Roland Bracewell Shoemaker a87379bc6e Add gRPC server to RA (#2350)
Fixes #2348.
2016-11-29 15:34:35 -08:00
Roland Bracewell Shoemaker 595204b23f Implement improved signal catching in services that already use it (#2333)
Implements a less RPC-focused signal catch/shutdown method. Certain things that probably could also use this (e.g. `ocsp-updater`) haven't been given it, as they would require rather substantial changes to allow for a graceful shutdown approach.

Fixes #2298.
2016-11-18 21:05:04 -05:00
Roland Bracewell Shoemaker c5f99453a9 Switch CT submission RPC from CA -> RA (#2304)
With the current gRPC design, the CA talks directly to the Publisher when calling SubmitToCT. This crosses a security boundary (secure internal segment -> internet-facing segment), which is dangerous if (however unlikely) the Publisher is compromised and there is a gRPC exploit that allows memory corruption on the caller end of an RPC, since that could expose sensitive information or cause arbitrary issuance.

Instead we move the RPC call to the RA which is in a less sensitive network segment. Switching the call site from the CA -> RA is gated on adding the gRPC PublisherService object to the RA config.

Fixes #2202.
2016-11-08 11:39:02 -08:00
Jacob Hoffman-Andrews 32c03f942b Don't start DebugServer until server's ready. (#2271)
This makes availability of DebugServer a better proxy for readiness of the
component.
2016-10-21 16:57:14 -04:00
Jacob Hoffman-Andrews 1958dc9065 Update total issued count asynchronously. (#2246)
Previously the lock on total issued count would exacerbate problems when the
count query was slow, which it often is.

Fixes #1809.
2016-10-20 14:17:34 -07:00
Roland Bracewell Shoemaker 5fabc90a16 Add IDN support (#2215)
Add feature flagged support for issuing for IDNs, fixes #597.

This patch expects that clients have performed valid IDNA2008 encoding on any label that includes Unicode characters. Invalid encodings (including incompatible IDNA2003 encodings) will be rejected. No script-mixing or script-exclusion checks are performed: we assume that if a name is resolvable, it conforms to the registrar's policies on these matters, and that if it uses non-standard scripts in subdomains, browsers should be the ones choosing how to display it.

Required a full update of the golang.org/x/net tree to pull in golang.org/x/net/idna, all test suites pass.
2016-10-06 13:05:37 -04:00
Roland Bracewell Shoemaker 7f0b7472e2 Add gRPC support to CA (#2193)
Fixes #2171.
2016-09-21 14:13:43 -07:00
Roland Bracewell Shoemaker 239bf9ae0a Very basic feature flag impl (#1705)
Updates #1699.

Adds a new package, `features`, which exposes methods to set and check if various internal features are enabled. The implementation uses global state to store the features so that services embedded in another service do not each require their own features map in order to check if something is enabled.

Requires a `boulder-tools` image update to include `golang.org/x/tools/cmd/stringer`.
2016-09-20 16:29:01 -07:00
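A features package with global state might be sketched as below. The flag names are hypothetical, and `String` is hand-written here where the real package uses `stringer`. Validating every name before applying any is a choice of this sketch so a typo doesn't leave flags half-set.

```go
package main

import (
	"fmt"
	"sync"
)

// Feature identifies an internal feature flag.
type Feature int

const (
	unusedFeature  Feature = iota // zero value is deliberately not a feature
	ExampleFeature                // hypothetical
	OtherFeature                  // hypothetical
)

var featureNames = map[Feature]string{ExampleFeature: "ExampleFeature", OtherFeature: "OtherFeature"}

func (f Feature) String() string { return featureNames[f] }

// Global state, so services embedded in one binary share one view.
var (
	mu       sync.RWMutex
	features = map[Feature]bool{ExampleFeature: false, OtherFeature: false}
)

// Set enables or disables features by name, e.g. from a config file.
func Set(settings map[string]bool) error {
	mu.Lock()
	defer mu.Unlock()
	byName := make(map[string]Feature, len(featureNames))
	for f, n := range featureNames {
		byName[n] = f
	}
	resolved := make(map[Feature]bool, len(settings))
	for name, value := range settings {
		f, ok := byName[name]
		if !ok {
			return fmt.Errorf("unknown feature %q", name)
		}
		resolved[f] = value
	}
	for f, v := range resolved {
		features[f] = v
	}
	return nil
}

// Enabled reports whether f is currently switched on.
func Enabled(f Feature) bool {
	mu.RLock()
	defer mu.RUnlock()
	return features[f]
}

func main() {
	if err := Set(map[string]bool{"ExampleFeature": true}); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(Enabled(ExampleFeature), Enabled(OtherFeature)) // true false
}
```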
Roland Bracewell Shoemaker e187c92715 Add gRPC client side metrics (#2151)
Fixes #1880.

Updates google.golang.org/grpc and github.com/jmhodges/clock, both test suites pass. A few of the gRPC interfaces changed so this also fixes those breakages.
2016-09-09 15:17:36 -04:00
Roland Bracewell Shoemaker c8f1fb3e2f Remove direct usages of go-statsd-client in favor of using metrics.Scope (#2136)
Fixes #2118, fixes #2082.
2016-09-07 19:35:13 -04:00
Ben Irving b587d4e663 Simplify KeyPolicy code (#2092)
This PR removes the allowedSigningAlgos configuration struct and hard-codes a key policy.

Fixes #1844
2016-07-30 16:15:19 -07:00
Patrick Figel 8cd74bf766 Make (pending)AuthorizationLifetime configurable (#2028)
Introduces the `authorizationLifetimeDays` and `pendingAuthorizationLifetimeDays` configuration options for `RA`.

If the values are missing from configuration, the code defaults back to the current values (300/7 days).

fixes #2024
2016-07-12 15:18:22 -04:00
Ben Irving 298774e1db Remove embedded (anonymous) fields from configs (#2019)
This PR removes all uses of anonymous struct fields, which I introduced as part of my work on splitting up boulder-config (#1962).

The root of the bug was related to the loading of the json configuration file into the config struct. The config structs contained several embedded (anonymous) fields. An embedded (anonymous) field in a struct actually results in the flattening of the json structure. This caused json.Unmarshal to look not at the nested level, but at the root level of the json object and hence not find the nested field (i.e. AllowedSigningAlgos).

See https://play.golang.org/p/6uVCsEu3Df for a working example.

This fixes the reported bug: #2018
2016-07-07 10:16:41 -07:00
Ben Irving c4f7fb580d Split up boulder-config.json (RA) (#1974)
Part of #1962
2016-06-29 13:43:55 -07:00
Jacob Hoffman-Andrews 55657fad0d RA doesn't need CAASERVFAILExceptions. (#1992)
In #1971 we added the CAASERVFAILExceptions config field and argument to NewDNSResolverImpl. This argument only needs to be passed to the VA, where we do CAA validations. However, I accidentally added code to the RA as well to use this new config field. This change backs that out.
2016-06-29 11:23:58 -07:00
Jacob Hoffman-Andrews 0c0e94dfaf Add enforcement for CAA SERVFAIL (#1971)
https://github.com/letsencrypt/boulder/pull/1971
2016-06-28 11:00:23 -07:00
Daniel McCarney 9abc212448 Reuse valid authz for subsequent new authz requests (#1921)
Presently clients may request a new AuthZ be created for a domain that they have already proved authorization over. This results in unnecessary bloat in the authorizations table and duplicated effort.

This commit alters the `NewAuthorization` function of the RA such that before going through the work of creating a new AuthZ it checks whether there already exists a valid AuthZ for the domain/regID that expires in more than 24 hours from the current date. If there is, then we short circuit creation and return the existing AuthZ. When this case occurs the `RA.ReusedValidAuthz` counter is incremented to provide visibility.

Since clients requesting a new AuthZ and getting an AuthZ back expect to turn around and post updates to the corresponding challenges we also return early in `UpdateAuthorization` when asked to update an AuthZ that is already valid. When this case occurs the `RA.ReusedValidAuthzChallenge` counter is incremented.

All of the above behaviour is gated by a new RA config flag `reuseValidAuthz`. In the default case (false) the RA does **not** reuse any AuthZs and instead maintains the historic behaviour: always creating a new AuthZ when requested, regardless of whether there are already valid AuthZs that could be reused. In the true case (enabled only in `boulder-config-next.json`) the AuthZ reuse described above is enabled.

Resolves #1854
2016-06-10 16:44:16 -04:00
Ben Irving 438580f206 Remove last of UseNewVARPC (#1914)
`UseNewVARPC` is no longer necessary and is safe to be removed. We default to using the newer VA RPC code.
2016-06-09 10:12:46 -04:00