boulder

Commit Graph

Author	SHA1	Message	Date
Aaron Gable	8556d8a801	Update VA RPCs to proto3 (#5005 ) This updates va.proto to use proto3 syntax, and updates all clients of the autogenerated code to use the new types. In particular, it removes indirection from built-in types (proto3 uses ints, rather than pointers to ints, for example). Depends on #5003 Fixes #4956	2020-08-17 15:20:51 -07:00
Aaron Gable	e2c8f6743a	Introduce new core.AcmeChallenge type (#5012 ) ACME Challenges are well-known strings ("http-01", "dns-01", and "tlsalpn-01") identifying which kind of challenge should be used to verify control of a domain. Because they are well-known and only certain values are valid, it is better to represent them as something more akin to an enum than as bare strings. This also improves our ability to ensure that an AcmeChallenge is not accidentally used as some other kind of string in a different context. This change also brings them closer in line with the existing core.AcmeResource and core.OCSPStatus string enums. Fixes #5009	2020-08-11 15:02:16 -07:00
Aaron Gable	8920b698ea	Report canceled remote validations as problems (#5011 ) Previously, canceled remote validations were simply noted and then dropped on the floor. This should be safe, as they're theoretically only canceled when the parent span (i.e. the local PerformValidation RPC) ends. But for the sake of defense-in-depth, it seems better to correctly mark canceled remote validations as having Problems, so that their results cannot be accidentally used anywhere. This results in a test behavior change: if EnforceMultiVA is on, and some RPCs are canceled, this now results in validation failure. This should not have any production impact, because remote validations should only be canceled when the parent RPC early-exits, but that only happens when EnforceMultiVA is not enabled. These tests now test a case where the other remote validations were canceled for some other reason, which should result in validation failure.	2020-08-11 09:29:49 -07:00
Aaron Gable	0f5d2064a8	Remove logic from VA PerformValidation wrapper (#5003 ) Updates the type of the ValidationAuthority's PerformValidation method to be identical to that of the corresponding auto-generated grpc method, i.e. directly taking and returning proto message types, rather than exploded arguments. This allows all logic to be removed from the VA wrappers, which will allow them to be fully removed after the migration to proto3. Also updates all tests and VA clients to adopt the new interface. Depends on #4983 (do not review first four commits) Part of #4956	2020-08-06 10:45:35 -07:00
Roland Bracewell Shoemaker	75b034637b	Update travis go versions (remove 1.14.1, add 1.15rc1) (#5002 ) Fixes #4919.	2020-08-04 12:13:09 -07:00
orangepizza	dee757c057	Remove multiva exception list code (#4933 ) Fixes #4931	2020-07-08 10:57:17 -07:00
Roland Bracewell Shoemaker	325bba3a6f	va: measure local validation latency separately (#4865 )	2020-06-12 12:44:25 -07:00
Jacob Hoffman-Andrews	4a2029b293	Use explicit fmt.Sprintf for ProblemDetails (#4787 ) In #3708, we added formatters for the convenience methods in the `probs` package. However, in #4783, @alexzorin pointed out that we were incorrectly passing an error message through fmt.Sprintf as the format parameter rather than as a value parameter. I proposed a fix in #4784, but during code review we concluded that the underlying problem was the pattern of using format-style functions that don't have some variant of printf in the name. That makes this wrong: `probs.DNS(err.Error())`, and this right: `probs.DNS("%s", err)`. Since that's an easy mistake to make and a hard one to spot during code review, we're going to stop using this particular pattern and call `fmt.Sprintf` directly. This PR reverts #3708 and adds some `fmt.Sprintf` where needed.	2020-04-21 14:36:11 -07:00
Jacob Hoffman-Andrews	27e785f3f2	VA: Add "During secondary validation:" error prefix. (#4677 ) This should make it easier to distinguish errors that are triggered by remote failures rather than local ones.	2020-02-14 14:00:08 -05:00
Daniel McCarney	f1894f8d1d	tidy: typo fixes flagged by codespell (#4634 )	2020-01-07 14:01:26 -05:00
Roland Bracewell Shoemaker	5b2f11e07e	Switch away from old style statsd metrics wrappers (#4606 ) In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back. There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case; dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change. Fixes #4591.	2019-12-18 11:08:25 -05:00
Daniel McCarney	7b60b57c33	va: log account ID in multi VA differential JSON. (#4521 ) This will reduce the amount of analysis time required to identify large integrators that aren't compatible with multi VA.	2019-10-31 13:12:28 -04:00
Daniel McCarney	ddfc620c44	va: exempt multi-va enforcement by domain/acct ID. (#4458 ) In order to move multi perspective validation forward we need to support policy in Boulder configuration that can relax multi-va requirements temporarily. A similar mechanism was used in support of the gradual deprecation of the TLS-SNI-01 challenge type and with the introduction of CAA enforcement and has proven to be a helpful tool to have available when introducing changes that are expected to break sites. When the VA "multiVAPolicyFile" is specified it is assumed to be a YAML file containing two lists: 1. disabledNames - a list of domain names that are exempt from multi VA enforcement. 2. disabledAccounts - a list of account IDs that are exempt from multi VA enforcement. When a hostname or account ID is added to the policy we'll begin communication with the related ACME account contact to establish that this is a temporary measure and the root problem will need to be addressed before an eventual cut-off date. Resolves https://github.com/letsencrypt/boulder/issues/4455	2019-10-07 16:43:11 -04:00
Daniel McCarney	d67d76388c	va: include hostname in remote VA differentials. (#4411 ) Also rename the `RemoteVA.Addresses` field. The address is always a singular value.	2019-08-30 13:32:44 -04:00
Daniel McCarney	fe23dabd69	va: add challenge type to remote VA differentials. (#4410 ) This will make data analysis of the differentials easier. Along the way I also added a unit test for `logRemoteValidationDifferentials`.	2019-08-29 14:41:14 -07:00
Daniel McCarney	4a6e34fc4e	va: clean up DNS error handling for HTTP-01 challenges. (#4409 ) This PR changes the VA to return `dns` problem type for errors when performing HTTP-01 challenges for domains that have no IP addresses, or errors looking up the IP addresses. The `va.getAddrs` function is internal to the VA and can return `berrors.BoulderError`s with a DNS type when there is an error, allowing the calling code to convert this to a problem when required using an updated `detailedError` function. This avoids some clunky conversion the HTTP-01 code was doing that misrepresented DNS level errors as connection problems with a DNS detail message. In order to add an integration test for challenge validation that results in `getAddrs` DNS level errors the Boulder tools image had to be bumped to a tag that includes the latest `pebble-challtestsrv` that supports mocking SERVFAILs. It isn't possible to mock this case with internal IP addresses because our VA test configuration does not filter internal addresses to support the testing context. Additionally this branch removes the `UnknownHostProblem` from the `probs` package: 1. It isn't used anywhere after `532c210` 2. It's not a real RFC 8555 problem type. We should/do use the DNS type for this. Resolves https://github.com/letsencrypt/boulder/issues/4407	2019-08-28 15:47:35 -04:00
Roland Bracewell Shoemaker	6f93942a04	Consistently used stdlib context package (#4229 )	2019-05-28 14:36:16 -04:00
Daniel McCarney	ea9871de1e	core: split identifier types into separate package. (#4225 ) This will allow implementing sub-problems without creating a cyclic dependency between `core` and `problems`. The `identifier` package is somewhat small/single-purpose and in the future we may want to move more "ACME" bits beyond the `identifier` types into a dedicated package outside of `core`.	2019-05-23 13:24:41 -07:00
Daniel McCarney	1d9de1cae0	va: fix flaky test_http2_http01_challenge int. test. (#4222 ) In some rare conditions the malformed HTTP response error message that we match in the VA for HTTP-01 connections to HTTP/2 servers will be returned as a raw `http.badStringError` that doesn't have a transport connection broken prefix. In these cases the existing `test_http2_http01_challenge` integration test fails because the `h2SettingsFrameErrRegex` doesn't match the returned error. To accommodate this we make the `h2SettingsFrameErrRegex` optionally match the transport connection broken prefix.	2019-05-23 12:42:58 -04:00
Daniel McCarney	4229a29142	va: fix validationTime metric w/ multi-va full results no enforce. (#4217 )	2019-05-15 12:59:46 -04:00
Daniel McCarney	e050820fcc	va: add specific error for HTTP-01 to HTTP/2 server. (#4172 ) In practice it seems the only way to add a specific error for when an initial HTTP-01 challenge request is made to an HTTP/2 server mis-configured on `:80` is with a regex on the error string. The error returned from the stdlib `http.Client` for HTTP to an HTTP/2 server is just an `errors.ErrorString` instance without any context (once you peel it out of the wrapping `url.Error`): > Err:(*errors.errorString)(0xc420609bf0)}] errStr=[Get http://example.com/.well-known/acme-challenge/xxxxxxx: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x00\x00\x12\x04\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x80\x00\x04\x00\x01\x00\x00\x00\x05\x00\xff\xff\xff\x00\x00\x04\b\x00\x00\x00\x00\x00\u007f\xff\x00\x00\x00\x00\b\a\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01"] Even directly in the stdlib code at the place in `http/response.go` that generates the error it's using a `&badStringError{}` and just putting the byte string that `textproto` read into it. To detect this case in `detailedError` I added a pre-compiled regex that will match the net/http malformed HTTP response error for raw bytes matching an arbitrarily sized HTTP/2 SETTINGS frame. Per RFC "A SETTINGS frame MUST be sent by both endpoints at the start of a connection" and so this seems like a fairly reliable indicator of an unexpected HTTP/2 response in an HTTP/1.1 context. Thanks to @mnordhoff for the detailed notes (and RFC refs) in #3416. It made this a lot easier! Resolves #3416.	2019-04-23 14:56:37 -04:00
Daniel McCarney	cc0d15841f	va: split up va.go by challenge type. (#4170 ) When I introduced the new HTTP-01 code I did it in `va/http.go` intending to try and make the very large `va.go` file a little bit smaller. This is the continuation of that work. * `f96ad92` - moves remaining HTTP-01 specific code to `va/http.go`. * `1efb9a1` - moves TLS-ALPN-01 code into `va/tlsalpn.go`. * `95ea567` - moves DNS-01 code into `va/dns.go`. * `6ff0395` - moves unit tests from `va/va_test.go` into `va/http_test.go`, `va/tlsalpn_test.go` and `va/dns_test.go`. In the end `va/va.go` contains code related to metrics, top level RPCs (e.g. `PerformValidation`), and the multi-VA code. This makes the file lengths much more manageable overall. Note: There is certainly room for cleaning up some of the older unit test cruft from `va/va_test.go`. For now I only moved it as-is into the challenge specific test files.	2019-04-19 11:34:58 -07:00
Daniel McCarney	063a98f02a	VA: additional feature flag control for multiVA. (#4122 ) * `EnforceMultiVA` to allow configuring multiple VAs but not changing the primary VA's result based on what the remote VAs return. * `MultiVAFullResults` to allow collecting all of the remote VA results. When all results are collected a JSON log line with the differential between the primary/remote VAs is logged. Resolves https://github.com/letsencrypt/boulder/issues/4066	2019-03-25 12:23:53 -04:00
Jacob Hoffman-Andrews	677b9b88ad	Remove GSB support. (#4115 ) This is no longer enabled in prod; cleaning up the code. https://community.letsencrypt.org/t/let-s-encrypt-no-longer-checking-google-safe-browsing/82168	2019-03-15 10:24:44 -07:00
Jacob Hoffman-Andrews	d1e6d0f190	Remove TLS-SNI-01 (#4114 ) * Remove the challenge whitelist * Reduce the signature for ChallengesFor and ChallengeTypeEnabled * Some unit tests in the VA were changed from testing TLS-SNI to testing the same behavior in TLS-ALPN, when that behavior wasn't already tested. For instance timeouts during connect are now tested. Fixes #4109	2019-03-15 09:05:24 -04:00
Jacob Hoffman-Andrews	72b361d7a7	Shave 200ms off context for HTTP validations. (#4101 ) Our integration test test_http_challenge_timeout occasionally fails with: boulder-ra [AUDIT] Could not communicate with VA: rpc error: code = DeadlineExceeded desc = context deadline exceeded. In at least one of these cases, the VA correctly timed-out its HTTP request and logged a validation error with the correct error message. I believe that there is a race between the VA returning its validation error to the RA, and the RA timing out its gRPC call. By shaving some time off the context we should more reliably get the response back to the RA. The primary VA now also calls `PerformValidation` on the configured remote VAs in a random order. Resolves #4087	2019-03-11 13:46:56 -04:00
Daniel McCarney	9f5c1b9e25	VA: Remove legacy HTTP-01 validation code. (#4102 ) We're only using the simplified HTTP-01 code from `va/http.go` now 🎉 The old unit tests that still seem relevant are left in place in `va/va_test.go` instead of being moved to `va/http_test.go` to signal that they're a bit crufty and could probably use a separate cleanup. For now I'm hesitant to remove test coverage so I updated them in-place without moving them to a new home. Resolves https://github.com/letsencrypt/boulder/issues/4089	2019-03-08 11:57:39 -08:00
Jacob Hoffman-Andrews	57f97eb550	Remove ValidationRecords from VA logEvent. (#4054 ) These are included in the challenge object, which also gets logged, so including them twice was approximately doubling the size of the VA logs.	2019-02-13 10:57:16 -08:00
Daniel McCarney	2b12c6acc8	VA: Promote singleDialTimeout, add preresolvedDialer timeout test. (#4049 ) The `singleDialTimeout` field was previously a global `const` in the `va` package. Making it a field of the VA impl (and the dialer structs) makes it easier to test that it is working as expected with a smaller than normal value. A new `TestPreresolvedDialerTimeout` unit test is added that tests the fix from https://github.com/letsencrypt/boulder/pull/4046 Without the fix applied: ``` === RUN TestPreresolvedDialerTimeout --- FAIL: TestPreresolvedDialerTimeout (0.49s) http_test.go:86: fetch didn't timeout after 50ms FAIL FAIL github.com/letsencrypt/boulder/va 0.512s ``` With the fix applied: ``` === RUN TestPreresolvedDialerTimeout --- PASS: TestPreresolvedDialerTimeout (0.05s) PASS ok github.com/letsencrypt/boulder/va 1.075s ```	2019-02-12 13:59:51 -08:00
Daniel McCarney	5fbed0c49e	VA: Replace invalid UTF-8 in cert contents, proactively marshal. (#3973 ) Marshaling invalid UTF-8 strings to protocol buffers causes an error. This can happen in VA `PerformValidation` RPC responses if remote servers return invalid UTF-8 in some ACME challenge contexts. We previously fixed this for HTTP-01 and DNS-01 but missed a case where TLS-ALPN-01/TLS-SNI-01 challenge response certificate content was included in error messages without replacing invalid UTF-8. That's now fixed & unit tests are added. To aid in diagnosing any future instances the VA is also updated to proactively attempt to marshal its `PerformValidation` results before handing off to the RPC wrappers that will do the same. This way if we detect an error in marshaling the VA can audit log the escaped content for investigation purposes. Hopefully with these two efforts combined we can avoid any future VA RPC errors from UTF-8 encoding. Resolves https://github.com/letsencrypt/boulder/issues/3838	2018-12-07 14:58:12 -05:00
Daniel McCarney	bd4c254942	Use Challtestsrv for HTTP-01 integration tests, add redirect tests (#3960 ) To complete https://github.com/letsencrypt/boulder/issues/3956 the `challtestsrv` is updated such that its existing TLS-ALPN-01 challenge test server will serve HTTP-01 responses with a self-signed certificate when a non-TLS-ALPN-01 request arrives. This lets the TLS-ALPN-01 challenge server double as a HTTPS version of the HTTP challenge server. The `challtestsrv` now also supports adding/removing redirects that will be served to clients when requesting matching paths. The existing chisel/chisel2 integration tests are updated to use the `challtestsrv` instead of starting their own standalone servers. This centralizes our mock challenge responses and lets us bind the `challtestsrv` to the VA's HTTP port in `startservers.py` without clashing ports later on. New integration tests are added for HTTP-01 redirect scenarios using the updated `challtestsrv`. These test cases cover: * valid HTTP -> HTTP redirect * valid HTTP -> HTTPS redirect * Invalid HTTP -> non-HTTP/HTTPS port redirect * Invalid HTTP -> non-HTTP/HTTPS protocol scheme redirect * Invalid HTTP -> bare IP redirect * Invalid HTTP redirect loop The new integration tests shook out two fixes that were required for the legacy VA HTTP-01 code (`afad22b`) and one fix for the challtestsrv mock DNS (`59b7d6d`). Resolves https://github.com/letsencrypt/boulder/issues/3956	2018-11-30 17:20:10 -05:00
Daniel McCarney	d9d2f4e9b0	VA: Simplified HTTP-01 w/ IP address URLs (#3939 ) Continued bugs from the custom dialer approach used by the VA for HTTP-01 (most recently https://github.com/letsencrypt/boulder/issues/3889) motivated a rewrite. Instead of using a custom dialer to be able to control DNS resolution for HTTP validation requests we can construct URLs for the IP addresses we resolve and overload the Host header. This avoids having to do address resolution within the dialer and eliminates the complexity of the dialer `addrInfoChan`. The only thing left for our custom dialer now is to shave some time off of the provided context to help us discern timeouts before/after connect. The existing IP preference & fallback behaviour is preserved: e.g. if a host has both IPv6 and IPv4 addresses we connect to the first IPv6 address. If there is a network error connecting to that address (e.g. an error during "dial"), we try once more with the first IPv4 address. No other retries are done. Matching existing behaviour no fallback is done for HTTP level failures on an IPv6 address (e.g. mismatched webroots, redirect loops, etc). A new Prometheus counter "http01_fallbacks" is used to keep track of the number of fallbacks performed. As a result of moving the layer at which the retry happens a fallback like described above will now produce two validation records: one for the initial IPv6 connection, and one for the IPv4 connection. Neither will have the "addressesTried" field populated, just "addressesResolved" and "addressUsed". Previously with the dialer doing the retry we would have created just one validation record with an IPv4 "addressUsed" field and both an IPv6 and IPv4 address in the "addressesTried" field. Because this is a big diff for a key part of the VA the new code is gated by the `SimplifiedVAHTTP` feature flag. Resolves #3889	2018-11-19 14:15:39 -05:00
Jacob Hoffman-Andrews	b1be4ccaed	Fix latency logging. (#3937 ) In the VA, we were rendering a Duration to JSON, which gave an integer number of nanoseconds rather than a float64 of seconds. Also, in both VA and WFE we were rendering way more precision than we needed. Millisecond precision is enough, and since we log latency for every WFE response, the extra bytes are worth saving.	2018-11-14 10:52:48 -05:00
Jacob Hoffman-Andrews	714457badc	Add a deadline to TLS handshake. (#3921 ) Previously, if a TLS handshake timed out, we would block forever in `conn.Handshake()`, leaking both a TCP connection and a goroutine. This sets a deadline on the underlying TCP connection, ensuring that `conn.Handshake()` eventually times out. Fixes #3915	2018-11-05 15:44:08 -05:00
Roland Bracewell Shoemaker	3c2888a49e	Add a counter for the tls alpn OID that is used (#3914 ) Fixes #3913.	2018-10-31 13:12:11 -04:00
Roland Bracewell Shoemaker	a9a0846ee9	Remove checks for deployed features (#3881 ) Removes the checks for a handful of deployed feature flags in preparation for removing the flags entirely. Also moves all of the currently deprecated flags to a separate section of the flags list so they can be more easily removed once purged from production configs. Fixes #3880.	2018-10-17 20:29:18 -07:00
Roland Bracewell Shoemaker	15ccea65f7	Record latency of validation instead of request/response time (#3879 ) Fixes #3862.	2018-10-05 10:59:53 -04:00
Jacob Hoffman-Andrews	69f4f666b6	Add timeout values to VA RoundTripper. (#3869 ) Fixes #3868.	2018-09-24 16:11:23 -04:00
Daniel McCarney	43b61f5c25	VA: Fix q -> %q format specifier (#3870 )	2018-09-24 09:59:22 -07:00
Jacob Hoffman-Andrews	b25b431266	Filter invalid UTF-8 from error responses. (#3845 ) For HTTP-01 challenges that return incorrect responses, the VA tries to put the first little bit of the HTTP response in the problem detail. However, VA needs to be able to serialize the problem detail as a protobuf to send it to the RA, and protobufs require string types to be UTF-8. Filter out any invalid UTF-8 sequences and replace them with REPLACEMENT CHARACTER.	2018-09-17 14:35:46 -04:00
Daniel McCarney	94bcebd658	VA: Ignore cancelled errs from remote VAs. (#3827 ) If the context provided to a remote VA's `PerformValidation` is cancelled we should not treat the returned context cancelled error as an unexpected error and should instead ignore it as an expected result.	2018-08-27 12:20:54 -04:00
Roland Bracewell Shoemaker	1ef93c3809	Support both obsolete and new TLS-ALPN OID (#3819 )	2018-08-20 10:51:33 -04:00
Daniel McCarney	2dadd5e09a	VA: Log exceptional non-problem remote VA errors. (#3760 ) Previously, if a remote VA returned an error that is not a ProblemDetail, the primary VA would log a ServerInternalProblem but not the underlying error. This commit updates performRemoteValidation to always return the full error it receives from a remote VA. This commit also adds a unit test that checks that the VA still returns a ServerInternalProblem to the RA, and that the VA audit logs the underlying error. Resolves https://github.com/letsencrypt/boulder/issues/3753	2018-06-15 10:53:16 -07:00
Roland Bracewell Shoemaker	813aa788e9	Assume acmeValidation-v1 is wrapped OCTET STRING (#3752 ) As defined by the spec.	2018-06-11 14:44:13 -07:00
Joel Sing	9c2859c87b	Add support for CAA account-uri validation. (#3736 ) This adds support for the account-uri CAA parameter as specified by section 3 of https://tools.ietf.org/html/draft-ietf-acme-caa-04, allowing issuance to be restricted to one or more ACME accounts as specified by CAA records.	2018-06-08 12:08:03 -07:00
Maciej Dębski	bb9ddb124e	Implement TLS-ALPN-01 and integration test for it (#3654 ) This implements newly proposed TLS-ALPN-01 validation method, as described in https://tools.ietf.org/html/draft-ietf-acme-tls-alpn-01 This challenge type is disabled except in the config-next tree.	2018-06-06 13:04:09 -04:00
Joel Sing	2540d59296	Implement CAA validation-methods checking. (#3716 ) When performing CAA checking respect the validation-methods parameter (if present) and restrict the allowed authorization methods to those specified. This allows a domain to restrict authorization methods that can be used with Let's Encrypt. This is largely based on PR #3003 (by @lukaslihotzki), which was landed and then later reverted due to issue #3143. The bug that resulted in the previous code being reverted has been addressed (likely inadvertently) by `76973d0f`. This implementation also includes integration tests for CAA validation-methods. Fixes issue #3143.	2018-05-23 14:32:31 -07:00
Joel Sing	9990d14654	Convert the probs functions to be formatters. (#3708 ) Many of the probs.XYZ calls are of the form probs.XYZ(fmt.Sprintf(...)). Convert these functions to take a format string and optional arguments, following the same pattern used in the errors package. Convert the various call sites to remove the now redundant fmt.Sprintf calls.	2018-05-11 11:51:16 -07:00
Joel Sing	8ebdfc60b6	Provide formatting logger functions. (#3699 ) A very large number of the logger calls are of the form log.Function(fmt.Sprintf(...)). Rather than sprinkling fmt.Sprintf at every logger call site, provide formatting versions of the logger functions and call these directly with the format and arguments. While here remove some unnecessary trailing newlines and calls to String/Error.	2018-05-10 11:06:29 -07:00
Roland Bracewell Shoemaker	b2a2a24dc3	Stop using validation record as an input/output (#3694 ) This change cleans up how `va.http01Dialer` works with regards to `core.ValidationRecord`s. Instead of using the record as both an input and an output it now uses a set of inputs and outputs information about addresses via a channel. The validation record is then constructed in the parent scope or in the redirect function instead of the dialer itself. Fixes #2730, fixes #3109, and fixes #3663.	2018-05-09 11:55:14 -04:00


104 Commits