Commit Graph

34 Commits

Author SHA1 Message Date
Roland Bracewell Shoemaker 842739bccd Remove deprecated features that have been purged from prod and staging configs (#4001) 2019-01-15 16:16:35 -08:00
Daniel McCarney d9d2f4e9b0
VA: Simplified HTTP-01 w/ IP address URLs (#3939)
Continued bugs from the custom dialer approach used by the VA for HTTP-01 (most recently https://github.com/letsencrypt/boulder/issues/3889) motivated a rewrite.

Instead of using a custom dialer to be able to control DNS resolution for HTTP validation requests we can construct URLs for the IP addresses we resolve and overload the Host header. This avoids having to do address resolution within the dialer and eliminates the complexity of the dialer `addrInfoChan`. The only thing left for our custom dialer now is to shave some time off of the provided context to help us discern timeouts before/after connect.

The existing IP preference & fallback behaviour is preserved: e.g. if a host has both IPv6 and IPv4 addresses we connect to the first IPv6 address. If there is a network error connecting to that address (e.g. an error during "dial"), we try once more with the first IPv4 address. No other retries are done. Matching existing behaviour no fallback is done for HTTP level failures on an IPv6 address (e.g. mismatched webroots, redirect loops, etc). A new Prometheus counter "http01_fallbacks" is used to keep track of the number of fallbacks performed.

As a result of moving the layer at which the retry happens a fallback like described above will now produce two validation records: one for the initial IPv6 connection, and one for the IPv4 connection. Neither will have the "addressesTried" field populated, just "addressesResolved" and "addressUsed". Previously with the dialer doing the retry we would have created just one validation record with an IPv4 "addressUsed" field and both an IPv6 and IPv4 address in the "addressesTried" field.

Because this is a big diff for a key part of the VA the new code is gated by the `SimplifiedVAHTTP` feature flag.

Resolves #3889
2018-11-19 14:15:39 -05:00
Roland Bracewell Shoemaker 876c727b6f Update gRPC (#3817)
Fixes #3474.
2018-08-20 10:55:42 -04:00
Joel Sing 9c2859c87b Add support for CAA account-uri validation. (#3736)
This adds support for the account-uri CAA parameter as specified by
section 3 of https://tools.ietf.org/html/draft-ietf-acme-caa-04, allowing
issuance to be restricted to one or more ACME accounts as specified by CAA
records.
2018-06-08 12:08:03 -07:00
Joel Sing 2540d59296 Implement CAA validation-methods checking. (#3716)
When performing CAA checking respect the validation-methods parameter (if
present) and restrict the allowed authorization methods to those specified.
This allows a domain to restrict authorization methods that can be used with
Let's Encrypt.

This is largely based on PR #3003 (by @lukaslihotzki), which was landed and
then later reverted due to issue #3143. The bug the resulted in the previous
code being reverted has been addressed (likely inadvertently) by 76973d0f.

This implementation also includes integration tests for CAA validation-methods.

Fixes issue #3143.
2018-05-23 14:32:31 -07:00
Jacob Hoffman-Andrews a4421ae75b Run gRPC backends on multiple IPs instead of multiple ports (#3679)
We're currently stuck on gRPC v1.1 because of a breaking change to certificate validation in gRPC 1.8. Our gRPC balancer uses a static list of multiple hostnames, and expects to validate against those hostnames. However gRPC expects that a service is one hostname, with multiple IP addresses, and validates all those IP addresses against the same hostname. See grpc/grpc-go#2012.

If we follow gRPC's assumptions, we can rip out our custom Balancer and custom TransportCredentials, and will probably have a lower-friction time in general.

This PR is the first step in doing so. In order to satisfy the "multiple IPs, one port" property of gRPC backends in our Docker container infrastructure, we switch to Docker's user-defined networking. This allows us to give the Boulder container multiple IP addresses on different local networks, and gives it different DNS aliases in each network.

In startservers.py, each shard of a service listens on a different DNS alias for that service, and therefore a different IP address. The listening port for each shard of a service is now identical.

This change also updates the gRPC service certificates. Now, each certificate that is used in a gRPC service (as opposed to something that is "only" a client) has three names. For instance, sa1.boulder, sa2.boulder, and sa.boulder (the generic service name). For now, we are validating against the specific hostnames. When we update our gRPC dependency, we will begin validating against the generic service name.

Incidentally, the DNS aliases feature of Docker allows us to get rid of some hackery in entrypoint.sh that inserted entries into /etc/hosts.

Note: Boulder now has a dependency on the DNS aliases feature in Docker. By default, docker-compose run creates a temporary container and doesn't assign any aliases to it. We now need to specify docker-compose run --use-aliases to get the correct behavior. Without --use-aliases, Boulder won't be able to resolve the hostnames it wants to bind to.
2018-05-07 10:38:31 -07:00
Jacob Hoffman-Andrews 2a1cd4981a Allow configuring gRPC's MaxConcurrentStreams (#3642)
During periods of peak load, some RPCs are significantly delayed (on the order of seconds) by client-side blocking. HTTP/2 clients have to obey a "max concurrent streams" setting sent by the server. In Go's HTTP/2 implementation, this value [defaults to 250](https://github.com/golang/net/blob/master/http2/server.go#L56), so the gRPC default is also 250. So whenever there are more than 250 requests in progress at a time, additional requests will be delayed until there is a slot available.

During this peak load, we aren't hitting limits on CPU or memory, so we should increase the max concurrent streams limit to take better advantage of our available resources. This PR adds a config field to do that.

Fixes #3641.
2018-04-12 17:17:17 -04:00
Jacob Hoffman-Andrews a4f9de9e35 Improve nesting of RPC deadlines (#3619)
gRPC passes deadline information through the RPC boundary, but client and server have the same deadline. Ideally we'd like the server to have a slightly tighter deadline than the client, so if one of the server's onward RPCs or other network calls times out, the server can pass back more detailed information to the client, rather than the client timing out the server and losing the opportunity to log more detailed information about which component caused the timeout.

In this change, I subtract 100ms from the deadline on the server side of our interceptors, using our existing serverInterceptor. I also check that there is at least 100ms remaining in which to do useful work, so the server doesn't begin a potentially expensive task only to abort it.

Fixes #3608.
2018-04-06 15:40:18 +01:00
Roland Bracewell Shoemaker cc5ec34539 Allow configuration of multiple DNS resolvers (#3612)
* Allow configuration of multiple DNS resolvers
* Use multiple DNS resolvers in integration tests

Fixes #3611.
2018-04-05 11:51:22 -04:00
Jacob Hoffman-Andrews 11434650b7 Check safe browsing at validation time (#3539)
Right now we check safe browsing at new-authz time, which introduces a possible
external dependency when calling new-authz. This is usually fine, since most safe
browsing checks can be satisfied locally, but when requests have to go external,
it can create variance in new-authz timing.

Fixes #3491.
2018-03-09 11:15:05 +00:00
Jacob Hoffman-Andrews c556a1a20d
Reduce spurious errors in integration test (#3436)
Boulder is fairly noisy about gRPC connection errors. This is a mixed
blessing: Our gRPC configuration will try to reconnect until it hits
an RPC deadline, and most likely eventually succeed. In that case,
we don't consider those to really be errors. However, in cases where
a connection is repeatedly failing, we'd like to see errors in the
logs about connection failure, rather than "deadline exceeded." So
we want to keep logging of gRPC errors.

However, right now we get a lot of these errors logged during
integration tests. They make the output hard to read, and may disguise
more serious errors. So we'd like to avoid causing such errors in
normal integration test operation.

This change reorders the startup of Boulder components by their gRPC
dependencies, so everything's backend is likely to be up and running
before it starts. It also reverses that order for clean shutdowns,
and waits for each process to exit before signalling the next one.

With these changes, I still got connection errors. Taking listenbuddy
out of the gRPC path fixed them. I believe the issue is that
listenbuddy is not a truly transparent proxy. In particular, it
accepts an inbound TCP connection before opening an outbound TCP
connection. If opening that outbound connection results in "connection
refused," it closes the inbound connection. That means gRPC sees a
"connection closed" (or "connection reset"?) rather than "connection
refused". I'm guessing it handles those cases differently, explaining
the different error results.

We've been using listenbuddy to trigger disconnects while Boulder is
running, to ensure that gRPC's reconnect code works. I think we can
probably rely on gRPC's reconnect to work. The initial problem that
led us to start testing this was a configuration problem; now that
we have the configuration we want, we should be fine and don't need
to keep testing reconnects on every integration test run.
2018-02-12 18:17:50 -08:00
Jacob Hoffman-Andrews f16c3af335 Active UDPDNS by default. (#3285)
Active UDPDNS by default
2017-12-15 12:26:45 -08:00
Roland Bracewell Shoemaker bdea281ae0 Remove CAA SERVFAIL exceptions code (#3262)
Fixes #3080.
2017-12-05 14:39:37 -08:00
Jacob Hoffman-Andrews 2fd2f9e230 Remove LegacyCAA implementation. (#3240)
Fixes #3236
2017-11-20 16:09:00 -05:00
Jacob Hoffman-Andrews 4e68fb2ff6 Switch to udp for internal DNS. (#3135)
We used to use TCP because we would request DNSSEC records from Unbound, and
they would always cause truncated records when present. Now that we no longer
request those (#2718), we can use UDP. This is better because the TCP serving
paths in Unbound are likely less thoroughly tested, and not optimized for high
load. In particular this may resolve some availability problems we've seen
recently when trying to upgrade to a more recent Unbound.

Note that this only affects the Boulder->Unbound path. The Unbound->upstream
path is already UDP by default (with TCP fallback for truncated ANSWERs).
2017-10-10 10:06:33 -04:00
Daniel McCarney 1794c56eb8 Revert "Add CAA parameter to restrict challenge type (#3003)" (#3145)
This reverts commit 23e2c4a836.
2017-10-04 12:00:44 -07:00
lukaslihotzki 23e2c4a836 Add CAA parameter to restrict challenge type (#3003)
This commit adds CAA `issue` paramter parsing and the `challenge` parameter to permit a single challenge type only. By setting `challenge=dns-01`, the nameserver keeps control over every issued certificate.
2017-10-02 11:59:47 -07:00
Jacob Hoffman-Andrews 4266853092 Implement legacy form of CAA (#3075)
This implements the pre-erratum 5065 version of CAA, behind a feature flag.

This involved refactoring DNSClient.LookupCAA to return a list of CNAMEs in addition to the CAA records, and adding an alternate lookuper that does tree-climbing on single-depth aliases.
2017-09-13 10:16:12 -04:00
Daniel McCarney 57252c3b07 Remove letsencrypt/go-safe-browsing-api dependency. (#2905)
We have migrated from our fork of `go-safe-browsing-api` to Google's
safebrowsing v4 library. This commit removes the legacy safe browsing
code.
2017-07-26 13:57:57 -07:00
Roland Bracewell Shoemaker 088b872287 Implement multi VA validation (#2802)
Adds basic multi-path validation functionality. A new method `performRemoteValidation` is added to `boulder-va` which is called if it is configured with a list of remote VA gRPC addresses. In this initial implementation the remote VAs are only used to check the validation result of the main VA, if all of the remote validations succeed but the local validation failed, the overall validation will still fail. Remote VAs use the exact same code as the local VA to perform validation. If the local validation succeeds then a configured quorum of the remote VA successes must be met in order to fully complete the validation.

This implementation assumes that metrics are collected from the remote VAs in order to have visibility into their individual validation latencies etc.

Fixes #2621.
2017-06-29 14:11:01 -07:00
Jacob Hoffman-Andrews b17b5c72a6 Remove statsd from Boulder (#2752)
This removes the config and code to output to statsd.

- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.

Fixes #2639, #2733.
2017-05-15 10:19:54 -04:00
Daniel McCarney 47452d6c6c Prefer IPv6 addresses, fall back to IPv4. (#2715)
This PR introduces a new feature flag "IPv6First". 

When the "IPv6First" feature is enabled the VA's HTTP dialer and TLS SNI
(01 and 02) certificate fetch requests will attempt to automatically
retry when the initial connection was to IPv6 and there is an IPv4
address available to retry with.

This resolves https://github.com/letsencrypt/boulder/issues/2623
2017-05-08 13:00:16 -07:00
Jacob Hoffman-Andrews 6719dc17a6 Remove AMQP config and code (#2634)
We now use gRPC everywhere.
2017-04-03 10:39:39 -04:00
Daniel McCarney 15e73edc5a Google Safe Browsing V4 Improvements (#2504)
This PR has three primary contributions:

1. The existing code for using the V4 safe browsing API introduced in #2446 had some bugs that are fixed in this PR.
2. A gsb-test-srv is added to provide a mock Google Safebrowsing V4 server for integration testing purposes.
3. A short integration test is added to test end-to-end GSB lookup for an "unsafe" domain.

For 1) most notably Boulder was assuming the new V4 library accepted a directory for its database persistence when it instead expects an existing file to be provided. Additionally the VA wasn't properly instantiating feature flags preventing the V4 api from being used by the VA.

For 2) the test server is designed to have a fixed set of "bad" domains (Currently just honest.achmeds.discount.hosting.com). When asked for a database update by a client it will package the list of bad domains up & send them to the client. When the client is asked to do a URL lookup it will check the local database for a matching prefix, and if found, perform a lookup against the test server. The test server will process the lookup and increment a count for how many times the bad domain was asked about.

For 3) the Boulder startservers.py was updated to start the gsb-test-srv and the VA is configured to talk to it using the V4 API. The integration test consists of attempting issuance for a domain pre-configured in the gsb-test-srv as a bad domain. If the issuance succeeds we know the GSB lookup code is faulty. If the issuance fails, we check that the gsb-test-srv received the correct number of lookups for the "bad" domain and fail if the expected isn't reality.

Notes for reviewers:

* The gsb-test-srv has to be started before anything will use it. Right now the v4 library handles database update request failures poorly and will not retry for 30min. See google/safebrowsing#44 for more information.
* There's not an easy way to test for "good" domain lookups, only hits against the list. The design of the V4 API is such that a list of prefixes is delivered to the client in the db update phase and if the domain in question matches no prefixes then the lookup is deemed unneccesary and not performed. I experimented with sending 256 1 byte prefixes to try and trick the client to always do a lookup, but the min prefix size is 4 bytes and enumerating all possible prefixes seemed gross.
* The test server has a /add endpoint that could be used by integration tests to add new domains to the block list, but it isn't being used presently. The trouble is that the client only updates its database every 30 minutes at present, and so adding a new domain will only take affect after the client updates the database.

Resolves #2448
2017-01-23 11:07:20 -08:00
Jacob Hoffman-Andrews 510e279208 Simplify gRPC TLS configs. (#2470)
Previously, a given binary would have three TLS config fields (CA cert, cert,
key) for its gRPC server, plus each of its configured gRPC clients. In typical
use, we expect all three of those to be the same across both servers and clients
within a given binary.

This change reuses the TLSConfig type already defined for use with AMQP, adds a
Load() convenience function that turns it into a *tls.Config, and configures it
for use with all of the binaries. This should make configuration easier and more
robust, since it more closely matches usage.

This change preserves temporary backwards-compatibility for the
ocsp-updater->publisher RPCs, since those are the only instances of gRPC
currently enabled in production.
2017-01-06 14:19:18 -08:00
Jacob Hoffman-Andrews 089a270453 Add instructions on load testing OCSP generation. (#2459) 2017-01-02 11:36:03 -08:00
Jacob Hoffman-Andrews 0c665b2053 Split up gRPC certificates by service. (#2453)
Previously, all gRPC services used the same client and server certificates. Now,
each service has its own certificate, which it uses for both client and server
authentication, more closely simulating production.

This also adds aliases for each of the relevant hostnames in /etc/hosts. There
may be some issues if Docker decides to rewrite /etc/hosts while Boulder is
running, but this seems to work for now.
2016-12-29 14:53:59 -08:00
Daniel McCarney 74e281c1ce Switch to Google's v4 safebrowsing library. (#2446)
Right now we are using a third-party client for the Google Safe Browsing API, but Google has recently released their own [Golang library](https://github.com/google/safebrowsing) which also supports the newer v4 API. Using this library will let us avoid fixing some lingering race conditions & unpleasantness in our fork of `go-safebrowsing-api`.

This PR adds support for using the Google library & the v4 API in place of our existing fork when the `GoogleSafeBrowsingV4` feature flag is enabled in the VA "features" configuration.

Resolves https://github.com/letsencrypt/boulder/issues/1863

Per `CONTRIBUTING.md` I also ran the unit tests for the new dependency:
```
daniel@XXXXXXXXXX:~/go/src/github.com/google/safebrowsing$ go test ./...
ok  	github.com/google/safebrowsing	3.274s
?   	github.com/google/safebrowsing/cmd/sblookup	[no test files]
?   	github.com/google/safebrowsing/cmd/sbserver	[no test files]
?   	github.com/google/safebrowsing/cmd/sbserver/statik	[no test files]
?   	github.com/google/safebrowsing/internal/safebrowsing_proto	[no test files]
ok  	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/jsonpb	0.012s
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/jsonpb/jsonpb_test_proto	[no test files]
ok  	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/proto	0.062s
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/proto/proto3_proto	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/descriptor	[no test files]
ok  	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/generator	0.017s
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/grpc	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/plugin	[no test files]
ok  	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes	0.009s
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/any	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/duration	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/empty	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/struct	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/timestamp	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/wrappers	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/rakyll/statik	[no test files]
?   	github.com/google/safebrowsing/vendor/github.com/rakyll/statik/fs	[no test files]
ok  	github.com/google/safebrowsing/vendor/golang.org/x/net/idna	0.003s
```
2016-12-27 11:18:11 -05:00
Jacob Hoffman-Andrews 1c1449b284 Improvements to tests and test configs. (#2396)
- Remove spinner from test.js. It made Travis logs hard to read.
- Listen on all interfaces for debugAddr. This makes it possible to check
  Prometheus metrics for instances running in a Docker container.
- Standardize DNS timeouts on 1s and 3 retries across all configs. This ensures
  DNS completes within the relevant RPC timeouts.
- Remove RA service queue from VA, since VA no longer uses the callback to RA on
  completing a challenge.
2016-12-05 14:35:27 -08:00
Roland Bracewell Shoemaker e2155388a1 Remove caa-checker from the tree (#2351)
The VA can internally check CAA and this additional code was deemed unneeded complexity that could be hoisted outside of Boulder. Fixes #2346.
2016-11-23 08:42:33 -05:00
Daniel McCarney 6c983e8c9e Implements client whitelisting for gRPC. (#2307)
As described in #2282, our gRPC code uses mutual TLS to authenticate both clients and servers. However, currently our gRPC servers will accept any client certificate signed by the internal CA we use to authenticate connections. Instead, we would like each server to have a list of which clients it will accept. This will improve security by preventing the compromise of one client private key being used to access endpoints unrelated to its intended scope/purpose.

This PR implements support for gRPC servers to specify a list of accepted client names. A `serverTransportCredentials` implementing `ServerHandshake` uses a `verifyClient` function to enforce that the connecting peer presents a client certificate with a SAN entry that matches an entry on the list of accepted client names

The `NewServer` function from `grpc/server.go` is updated to instantiate the `serverTransportCredentials` used by `grpc.NewServer`, specifying an accepted names list populated from the `cmd.GRPCServerConfig.ClientNames` config field.

The pre-existing client and server certificates in `test/grpc-creds/` are replaced by versions that contain SAN entries as well as subject common names. A DNS and an IP SAN entry are added to allow testing both methods of specifying allowed SANs. The `generate.sh` script is converted to use @jsha's `minica` tool (OpenSSL CLI is blech!).

An example client whitelist is added to each of the existing gRPC endpoints in config-next/ to allow the SAN of the test RPC client certificate.

Resolves #2282
2016-11-08 13:57:34 -05:00
Daniel McCarney 409f1623e6 Retires `LookupIPv6` VA flag. (#2205)
The LookupIPv6 flag has been enabled in production and isn't required anymore. This PR removes the flag entirely.

The errA and errAAAA error handling in LookupHost is left as-is, meaning that a non-nil errAAAA will not be returned to the caller. This matches the existing behaviour, and the expectations of the TestDNSLookupHost unit tests.

This commit also removes the tests from TestDNSLookupHost that tested the LookupIPv6 == false behaviours since those are no longer implemented.

Resolves #2191
2016-09-26 18:00:01 -07:00
Roland Bracewell Shoemaker 239bf9ae0a Very basic feature flag impl (#1705)
Updates #1699.

Adds a new package, `features`, which exposes methods to set and check if various internal features are enabled. The implementation uses global state to store the features so that services embedded in another service do not each require their own features map in order to check if something is enabled.

Requires a `boulder-tools` image update to include `golang.org/x/tools/cmd/stringer`.
2016-09-20 16:29:01 -07:00
Ben Irving bea8e57536 Split up boulder-config.json (VA) (#1979) 2016-07-01 13:06:50 -04:00