Commit Graph

581 Commits

Author SHA1 Message Date
Aaron Gable 4835709232
Remove support for obsolete id-pe-acmeIdentifier OID (#5906)
Current metrics show that subscribers present certificates using the
obsolete OID to identify their id-pe-acmeIdentifier extension about
an order of magnitude less often than they present the correct OID.
Remove support for the never-standardized OID.
2022-01-25 10:10:03 -08:00
Aaron Gable 8c28e49ab6
Enforce TLS1.2 when validating TLS-ALPN-01 (#5905)
RFC 8737, Section 4, states "ACME servers that implement "acme-tls/1"
MUST only negotiate TLS 1.2 [RFC5246] or higher when connecting to
clients for validation." Enforce that our outgoing connections to validate
TLS-ALPN-01 challenges do not negotiate TLS1.1.
2022-01-25 09:57:34 -08:00
Aaron Gable ab79f96d7b
Fixup staticcheck and stylecheck, and violations thereof (#5897)
Add `stylecheck` to our list of lints, since it got separated out from
`staticcheck`. Fix the way we configure both to be clearer and not
rely on regexes.

Additionally fix a number of easy-to-change `staticcheck` and
`stylecheck` violations, allowing us to reduce our number of ignored
checks.

Part of #5681
2022-01-20 16:22:30 -08:00
Aaron Gable 2f2bac4bf2
Improve readability of A and AAAA lookup errors (#5843)
When we query DNS for a host, and both the A and AAAA lookups fail or
are empty, combine both errors into a single error rather than only
returning the error from the A lookup.

Fixes #5819
Fixes #5319
2022-01-03 10:39:25 -08:00
Jacob Hoffman-Andrews 4205400a98
Lower logDNSError to info level. (#5701)
These log lines are sometimes useful for debugging, but are a normal
part of operation, not an error: Unbound will allow a response to
timeout if the remote server is too slow.
2021-10-12 10:44:54 -06:00
Samantha 6eee230d69
BDNS: Ensure DNS server addresses are dialable (#5520)
- Add function `validateServerAddress()` to `bdns/servers.go` which ensures that
  DNS server addresses are TCP/ UDP dial-able per: https://golang.org/src/net/dial.go?#L281
- Add unit test for `validateServerAddress()` in `bdns/servers_test.go`
- Update `cmd/boulder-va/main.go` to handle `bdns.NewStaticProvider()`
  potentially returning an error.
- Update unit tests in `bdns/dns_test.go`:
  - Handle `bdns.NewStaticProvider()` potentially returning an error
  - Add an IPv6 address to `TestRotateServerOnErr`
- Ensure DNS server addresses are validated by `validateServerAddress` whenever:
  - `dynamicProvider.update() is called`
  - `staticProvider` is constructed
- Construct server addresses using `net.JoinHostPost()` when
  `dynamicProvider.Addrs()` is called

Fixes #5463
2021-07-20 10:11:11 -07:00
Aaron Gable 4c581436a3
Add go1.17beta1 to CI (#5483)
Add go1.17beta1 docker images to the set of things we build,
and integrate go1.17beta1 into the set of environments CI runs.
Fix one test which breaks due to an underlying refactoring in
the `crypto/x509` stdlib package. Fix one other test which breaks
due to new guarantees in the stdlib's TLS ALPN implementation.

Also removes go1.16.5 from CI so we're only running 2 versions.

Fixes #5480
2021-07-13 10:00:04 -07:00
Aaron Gable 64c9ec350d
Unify protobuf generation (#5458)
Create script which finds every .proto file in the repo and correctly
invokes `protoc` for each. Create a single file with a `//go:generate`
directive to invoke the new script. Delete all of the other generate.go
files, so that our proto generation is unified in one place.

Fixes #5453
2021-06-07 08:49:15 -07:00
Aaron Gable 9abb39d4d6
Honeycomb integration proof-of-concept (#5408)
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).

Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.

Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
2021-05-24 16:13:08 -07:00
Aaron Gable a19ebfa0e9
VA: Query SRV to preload/cache DNS resolver addrs (#5360)
Abstract out the way that the bdns library keeps track of the
resolvers it uses to do DNS lookups. Create one implementation,
the `StaticProvider`, which behaves exactly the same as the old
mechanism (providing whatever names or addresses were given
in the config). Create another implementation, `DynamicProvider`,
which re-resolves the provided name on a regular basis.

The dynamic provider consumes a single name, does a lookup
on that name for any SRV records suggesting that it is running a
DNS service, and then looks up A records to get the address of
all the names returned by the SRV query. It exports its successes
and failures as a prometheus metric.

Finally, update the tests and config-next configs to work with
this new mechanism. Give sd-test-srv the capability to respond
to SRV queries, and put the names it provides into docker's
default DNS resolver.

Fixes #5306
2021-04-20 10:11:53 -07:00
Samantha 6cd59b75f2
VA: Don't follow 303 redirects (#5384)
- VA should reject redirects with an HTTP status code of 303
- Add 303 redirect test

Fixes #5358
2021-04-05 11:29:01 -07:00
Jacob Hoffman-Andrews 7194624191
Update grpc and protobuf to latest. (#5369)
protoc now generates grpc code in a separate file from protobuf code.
Also, grpc servers are now required to embed an "unimplemented"
interface from the generated .pb.go file, which provides forward
compatibility.

Update the generate.go files since the invocation for protoc has changed
with the split into .pb.org and _grpc.pb.go.

Fixes #5368
2021-04-01 17:18:15 -07:00
Aaron Gable ef1d3c4cde
Standardize on `AssertMetricWithLabelsEquals` (#5371)
Update all of our tests to use `AssertMetricWithLabelsEquals`
instead of combinations of the older `CountFoo` helpers with
simple asserts. This coalesces all of our prometheus inspection
logic into a single function, allowing the deletion of four separate
helper functions.
2021-04-01 15:20:43 -07:00
Andrew Gabbitas 3d9d5e2306
Cleanup go1.15.7 (#5374)
Remove code that is no longer needed after migrating to go1.16.x.
Remove testing with go1.15.7 in the test matrix.

Fixes #5321
2021-04-01 10:50:18 -07:00
Andrew Gabbitas 81eed0cd07
Replace invalid UTF-8 in error message (#5341)
Add processing to http body when it is passed as an error to be properly
marshalled for grpc.

Fixes #5317
2021-03-16 14:10:16 -06:00
Aaron Gable 95b77dbd25
Remove va gRPC wrapper (#5328)
Delete the ValidationAuthorityGRPCServer and ...GRPCClient structs,
and update references to instead reference the underlying vapb.VAClient
type directly. Also delete the core.ValidationAuthority interface.

Does not require updating interfaces elsewhere, as the client
wrapper already included the variadic grpc.CallOption parameter.

Fixes #5325
2021-03-11 15:38:50 -08:00
Andrew Gabbitas ceffe18dfc
Add testing for golang 1.16 (#5313)
- Add 1.16.1 to the GitHub CI test matrix
- Fix tlsalpn tests for go 1.16.1 but maintain compatibility with 1.15.x
- Fix integration tests.

Fix: #5301
Fix: #5316
2021-03-11 11:47:41 -08:00
Andrew Gabbitas f5362fba24
Add Validated time field to challenges (#5288)
Move the validated timestamp to the RA where the challenge is passed to
the SA for database storage. If a challenge becomes valid or invalid, take
the validated timestamp and store it in the attemptedAt field of the
authz2 table. Upon retrieval of the challenge from the database, add the
attemptedAt value to challenge.Validated which is passed back to the WFE
and presented to the user as part of the challenge as required in ACME
RFC8555.

Fix: #5198
2021-03-10 14:39:59 -08:00
Jacob Hoffman-Andrews 2a8f0fe6ac
Rename several items in bdns (#5260)
[Go style says](https://blog.golang.org/package-names):

> Avoid stutter. Since client code uses the package name as a prefix
> when referring to the package contents, the names for those contents
> need not repeat the package name. The HTTP server provided by the
> http package is called Server, not HTTPServer. Client code refers to
> this type as http.Server, so there is no ambiguity.

Rename DNSClient, DNSClientImpl, NewDNSClientImpl,
NewTestDNSClientImpl, DNSError, and MockDNSClient to follow those
guidelines.

Unexport DNSClientImpl and MockTimeoutError (was only used internally).

Make New and NewTest return the Client interface rather than a concrete
`impl` type.
2021-01-29 17:20:35 -08:00
Jacob Hoffman-Andrews 2a6cb72518
Speed up VA test. (#5261)
We had a test that relied on sleeping to hit a timeout. This doesn't
remove the sleep, but it does tighten the duration significantly. Brings
unit test time for the VA from 11 seconds to 1.7 seconds on my machine.
2021-01-29 17:07:58 -08:00
Andrew Gabbitas aa20bcaded
Add validated timestamp to challenges (#5253)
We do not present a validated timestamp in challenges where status = valid
as required by RFC8555.

This change is the first step to presenting challenge timestamps to the
client. It adds a timestamp to each place where we change a challenge to
valid. This only displays in the logs and will not display to the
subscriber because it is not yet stored somewhere retrievable. The next
step will be to store it in the database and then finally present it to
the client.

Part of #5198
2021-01-29 08:07:32 -08:00
Andrew Gabbitas a0d12af73c
Detect redirect loops in VA (#5234)
Currently the VA checks to see how many redirects have been followed and
bails out if greater than maxRedirect (10), but it does not check to see
if any redirect url has been followed twice which would mean a broken
infinite redirect loop. Storing the validation records for these is
relatively expensive because we store a record for each hop in the
redirect.

This change checks the previous redirect records to see if the URL has
been used before and error if it has. This will catch a redirect loop
earlier than the maxRedirect value in most cases.

Fixes #5224
2021-01-19 16:38:03 -08:00
Samantha 802d4fed9d
Return full CAA RR response from bdns to va (#5181)
When the VA encounters CAA records, it logs the contents of those
records. When those records were the result of following a chain of
CNAMEs, the CNAMEs are included as part of the response from our
recursive resolver. However, the current flow for logging the responses
logs only the CAA records, not the CNAMEs.

This change returns the complete dig-style RR response from bdns to the
va where the response of the authoritative CAA RR is string-quoted and
logged.

This dig-style RR response is quite verbose, however it is only ever
returned from bdns.LookupCAA when a CAA response is non-empty. If the CAA
response is empty only an empty string is returned.

Fixes #5082
2020-12-10 18:17:04 -08:00
Aaron Gable 294d1c31d7
Use error wrapping for berrors and tests (#5169)
This change adds two new test assertion helpers, `AssertErrorIs`
and `AssertErrorWraps`. The former is a wrapper around `errors.Is`,
and asserts that the error's wrapping chain contains a specific (i.e.
singleton) error. The latter is a wrapper around `errors.As`, and
asserts that the error's wrapping chain contains any error which is
of the given type; it also has the same unwrapping side effect as
`errors.As`, which can be useful for further assertions about the
contents of the error.

It also makes two small changes to our `berrors` package, namely
making `berrors.ErrorType` itself an error rather than just an int,
and giving `berrors.BoulderError` an `Unwrap()` method which
exposes that inner `ErrorType`. This allows us to use the two new
helpers above to make assertions about berrors, rather than
having to hand-roll equality assertions about their types.

Finally, it takes advantage of the two changes above to greatly
simplify many of the assertions in our tests, removing conditional
checks and replacing them with simple assertions.
2020-11-06 13:17:11 -08:00
Samantha 387e94407c
va: replacing error assertions with errors.As (#5136)
errors.As checks for a specific error in a wrapped error chain
(see https://golang.org/pkg/errors/#As) as opposed to asserting
that an error is of a specific type.

Part of #5010
2020-10-30 15:51:29 -07:00
Jacob Hoffman-Andrews bf7c80792d
core: move to proto3 (#5063)
Builds on #5062
Part of #5050
2020-08-31 17:58:32 -07:00
Aaron Gable 8556d8a801
Update VA RPCs to proto3 (#5005)
This updates va.proto to use proto3 syntax, and updates
all clients of the autogenerated code to use the new types.
In particular, it removes indirection from built-in types
(proto3 uses ints, rather than pointers to ints, for example).

Depends on #5003
Fixes #4956
2020-08-17 15:20:51 -07:00
Aaron Gable e2c8f6743a
Introduce new core.AcmeChallenge type (#5012)
ACME Challenges are well-known strings ("http-01", "dns-01", and
"tlsalpn-01") identifying which kind of challenge should be used
to verify control of a domain. Because they are well-known and
only certain values are valid, it is better to represent them as
something more akin to an enum than as bare strings. This also
improves our ability to ensure that an AcmeChallenge is not
accidentally used as some other kind of string in a different
context. This change also brings them closer in line with the
existing core.AcmeResource and core.OCSPStatus string enums.

Fixes #5009
2020-08-11 15:02:16 -07:00
Aaron Gable 8920b698ea
Report canceled remote validations as problems (#5011)
Previously, canceled remote validations were simply noted and then
dropped on the floor. This should be safe, as they're theoretically
only canceled when the parent span (i.e. the local PerformValidation
RPC) ends. But for the sake of defense-in-depth, it seems better to
correctly mark canceled remote validations as having Problems, so
that their results cannot be accidentally used anywhere.

This results in a test behavior change: if EnforceMultiVA is on, and
some RPCs are canceled, this now results in validation failure. This
should not have any production impact, because remote validations
should only be canceled when the parent RPC early-exits, but that only
happens when EnforceMultiVA is not enabled. These tests now test a
case where the other remote validations were canceled for some other
reason, which should result in validation failure.
2020-08-11 09:29:49 -07:00
Aaron Gable 0f5d2064a8
Remove logic from VA PerformValidation wrapper (#5003)
Updates the type of the ValidationAuthority's PerformValidation
method to be identical to that of the corresponding auto-generated
grpc method, i.e. directly taking and returning proto message
types, rather than exploded arguments.

This allows all logic to be removed from the VA wrappers, which
will allow them to be fully removed after the migration to proto3.

Also updates all tests and VA clients to adopt the new interface.

Depends on #4983 (do not review first four commits)
Part of #4956
2020-08-06 10:45:35 -07:00
Aaron Gable 634d57ce86
Use 2-space indents in all proto files (#5006)
Our proto files had a variety of indentation styles: 2 spaces,
4 spaces, 8 spaces, and tabs; sometimes mixed within the same
file. The proto3 style guide[1] says to use 2-space indents,
so this change standardizes on that.

[1] https://developers.google.com/protocol-buffers/docs/style
2020-08-05 10:38:19 -07:00
Roland Bracewell Shoemaker 75b034637b
Update travis go versions (remove 1.14.1, add 1.15rc1) (#5002)
Fixes #4919.
2020-08-04 12:13:09 -07:00
Aaron Gable 7e626b63a6
Temporarily revert CA and VA proto3 migrations (#4962) 2020-07-16 14:29:42 -07:00
Aaron Gable 281575433b
Switch VA RPCs to proto3 (#4960)
This updates va.proto to use proto3 syntax, and updates
all clients of the autogenerated code to use the new types. In
particular, it removes indirection from built-in types (proto3
uses ints, rather than pointers to ints, for example).

Fixes #4956
2020-07-16 09:16:23 -07:00
orangepizza dee757c057
Remove multiva exception list code (#4933)
Fixes #4931
2020-07-08 10:57:17 -07:00
Roland Bracewell Shoemaker 325bba3a6f
va: measure local validation latency separately (#4865) 2020-06-12 12:44:25 -07:00
Jacob Hoffman-Andrews b1347fb3b3
Upgrade to latest protoc and protoc-gen-go (#4794)
There are some changes to the code generated in the latest version, so
this modifies every .pb.go file.

Also, the way protoc-gen-go decides where to put files has changed, so
each generate.go gets the --go_opt=paths=source_relative flag to
tell protoc to continue placing output next to the input.

Remove staticcheck from build.sh; we get it via golangci-lint now.

Pass --no-document to gem install fpm; this is recommended in the fpm docs.
2020-04-23 18:54:44 -07:00
Jacob Hoffman-Andrews 4a2029b293
Use explicit fmt.Sprintf for ProblemDetails (#4787)
In #3708, we added formatters for the the convenience methods in the
`probs` package.

However, in #4783, @alexzorin pointed out that we were incorrectly
passing an error message through fmt.Sprintf as the format parameter
rather than as a value parameter.

I proposed a fix in #4784, but during code review we concluded that the
underlying problem was the pattern of using format-style functions that
don't have some variant of printf in the name. That makes this wrong:
`probs.DNS(err.Error())`, and this right: `probs.DNS("%s", err)`. Since
that's an easy mistake to make and a hard one to spot during code review,
we're going to stop using this particular pattern and call `fmt.Sprintf`
directly.

This PR reverts #3708 and adds some `fmt.Sprintf` where needed.
2020-04-21 14:36:11 -07:00
Jacob Hoffman-Andrews 2d7337dcd0
Remove newlines from log messages. (#4777)
Since Boulder's log system adds checksums to lines, but log-validator
processes entries on a per-line basis, including newlines in log
messages can cause a validation failure.
2020-04-16 16:49:08 -07:00
Jacob Hoffman-Andrews bc528cf8cd
Error when redirect target is too long. (#4775)
This can happen when a misconfiguration redirects a certain path to
itself, doubled. After 10 redirects the error message can get quite
long. Instead we halt things at 2000 bytes, which should be more than
enough.
2020-04-15 13:44:26 -07:00
Jacob Hoffman-Andrews 72deb5b798
gofmt code with -s (simplify) flag (#4763)
Found by golangci-lint's `gofmt` linter.
2020-04-08 17:25:35 -07:00
Jacob Hoffman-Andrews 75024c3ec1
Replace clock.Default() with clock.New() (#4761)
clock.Default is deprecated:
https://godoc.org/github.com/jmhodges/clock#Default
2020-04-08 17:23:43 -07:00
Jacob Hoffman-Andrews cdb0bddbd8
Prefix error names with "Err" (#4755)
Staticcheck cleanup: https://staticcheck.io/docs/checks#ST1012
2020-04-08 17:19:35 -07:00
Jacob Hoffman-Andrews 27e785f3f2
VA: Add "During secondary validation:" error prefix. (#4677)
This should make it easier to distinguish errors that are triggered by
remote failures rather than local ones.
2020-02-14 14:00:08 -05:00
Daniel McCarney f1894f8d1d
tidy: typo fixes flagged by codespell (#4634) 2020-01-07 14:01:26 -05:00
Roland Bracewell Shoemaker 5b2f11e07e Switch away from old style statsd metrics wrappers (#4606)
In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back.

There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case, dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change.

Fixes #4591.
2019-12-18 11:08:25 -05:00
Daniel McCarney 6ed4ce23a8
bdns: move logDNSError to exchangeOne, log ErrId specially. (#4553)
We've found we need the context offered from logging the error closer to when it
happens in the `bdns` package rather than in the `va`. Adopting the function
requires adapting it slightly. Specifically in the new location we know it won't
be called with any timeout results, with a non-dns error, or with a nil
underlying error.

Having the logging done in `bdns` (and specifically from `exchangeOne`) also
lets us log the wire format of the query and response when we get a `dns.ErrId`
error indicating a query/response ID mismatch. A small unit test is included
that ensures the logging happens as expected.

In case it proves useful for matching against other metrics the DNS ID mismatch
error case also now increments a dedicated prometheus counter vector stat,
`dns_id_mismatch`. The stat is labelled by resolver and query type.

Resolves https://github.com/letsencrypt/boulder/issues/4532
2019-11-15 16:03:45 -05:00
Jacob Hoffman-Andrews 7f6caddc5b VA: log internal DNS errors. (#4520)
When we get a DNS error that has an internal cause (like connection
refused), we return a generic message like "networking error" to the
user to avoid revealing details that would be confusing. However, when
debugging problems with our own services, it's useful to have the
underlying errors.

This adds a helper method in the VA and calls it from each place we use
DNS errors.
2019-11-04 09:09:24 -05:00
Daniel McCarney 7b60b57c33
va: log account ID in multi VA differential JSON. (#4521)
This will reduce the amount of analysis time required to identify
large integrators that aren't compatible with multi VA.
2019-10-31 13:12:28 -04:00
Daniel McCarney 2926074a29
CI/Dev: enable TLS 1.3 (#4489)
Also update the VA's TLS-ALPN-01 TLS 1.3 unit test to not expect
a failure.
2019-10-17 14:01:38 -04:00
Daniel McCarney ddfc620c44
va: exempt multi-va enforcement by domain/acct ID. (#4458)
In order to move multi perspective validation forward we need to support policy
in Boulder configuration that can relax multi-va requirements temporarily.

A similar mechanism was used in support of the gradual deprecation of the
TLS-SNI-01 challenge type and with the introduction of CAA enforcement and has
shown to be a helpful tool to have available when introducing changes that are
expected to break sites.

When the VA "multiVAPolicyFile" is specified it is assumed to be a YAML file
containing two lists:

1. disabledNames - a list of domain names that are exempt from multi VA
   enforcement.
2. disabledAccounts - a list of account IDs that are exempt from multi VA
   enforcement.

When a hostname or account ID is added to the policy we'll begin communication
with the related ACME account contact to establish that this is a temporary
measure and the root problem will need to be addressed before an eventual
cut-off date.

Resolves https://github.com/letsencrypt/boulder/issues/4455
2019-10-07 16:43:11 -04:00
Daniel McCarney 93902965e5 Add Go 1.13 support, temporarily disable TLS 1.3 default. (#4435)
A unit test is included to verify that a TLS-ALPN-01 challenge to
a TLS 1.3 only server doesn't succeed when the `GODEBUG` value to
disable TLS 1.3 in `docker-compose.yml` is set. Without this env var
the test fails on the Go 1.13 build because of the new default:

```
=== RUN   TestTLSALPN01TLS13
--- FAIL: TestTLSALPN01TLS13 (0.04s)
    tlsalpn_test.go:531: expected problem validating TLS-ALPN-01 challenge against a TLS 1.3 only server, got nil
    FAIL
    FAIL        github.com/letsencrypt/boulder/va       0.065s
```

With the env var set the test passes, getting the expected connection
problem reporting a tls error:

```
=== RUN   TestTLSALPN01TLS13
2019/09/13 18:59:00 http: TLS handshake error from 127.0.0.1:51240: tls: client offered only unsupported versions: [303 302 301]
--- PASS: TestTLSALPN01TLS13 (0.03s)
PASS
ok      github.com/letsencrypt/boulder/va       1.054s
```

Since we plan to eventually enable TLS 1.3 support and the `GODEBUG`
mechanism tested in the above test is platform-wide vs package
specific I decided it wasn't worth the time investment to write a
similar HTTP-01 unit test that verifies the TLS 1.3 behaviour on a
HTTP-01 HTTP->HTTPS redirect.

Resolves https://github.com/letsencrypt/boulder/issues/4415
2019-09-17 11:00:58 -07:00
Daniel McCarney d67d76388c
va: include hostname in remote VA differentials. (#4411)
Also rename the `RemoteVA.Addresses` field. The address is always
a singular value.
2019-08-30 13:32:44 -04:00
Daniel McCarney fe23dabd69 va: add challenge type to remote VA differentials. (#4410)
This will make data analysis of the differentials easier. Along the way
I also added a unit test for `logRemoteValidationDifferentials`.
2019-08-29 14:41:14 -07:00
Daniel McCarney 4a6e34fc4e
va: clean up DNS error handling for HTTP-01 challenges. (#4409)
This PR changes the VA to return `dns` problem type for errors when performing
HTTP-01 challenges for domains that have no IP addresses, or errors looking up
the IP addresses.

The `va.getAddrs` function is internal to the VA and can return
`berrors.BoulderError`s with a DNS type when there is an error, allowing the
calling code to convert this to a problem when required
using an updated `detailedError` function. This avoids some clunky conversion
the HTTP-01 code was doing that misrepresented DNS level errors as connection
problems with a DNS detail message.

In order to add an integration test for challenge validation that results in
`getAddrs` DNS level errors the Boulder tools image had to be bumped to a tag
that includes the latest `pebble-challtestsrv` that
supports mocking SERVFAILs. It isn't possible to mock this case with internal IP
addresses because our VA test configuration does not filter internal addresses
to support the testing context.

Additionally this branch removes the `UnknownHostProblem` from the `probs`
package:

1. It isn't used anywhere after 532c210
2. It's not a real RFC 8555 problem type. We should/do use the
   DNS type for this.

Resolves https://github.com/letsencrypt/boulder/issues/4407
2019-08-28 15:47:35 -04:00
alexzorin df2909a7ca va: Send extValue in TLSALPN unauthorized response (#4330)
Brings it to be more in line with the responses from the other two challenges and
will hopefully make the challenge a lot easier to debug (like in the recent community 
thread).

```json
"error": {
  "type": "urn:ietf:params:acme:error:unauthorized",
  "detail": "Incorrect validation certificate for tls-alpn-01 challenge. Expected acmeValidationV1 extension value 836bf5358f8a32826c61faeff2e0225b00756f935b00ed3002cabb9d536b9f53 for this challenge but got 8539b12e31c306b81a0aedab4128722c6ad71f71f46316a3c71612f47df0e532",
  "status": 403
},
```
2019-07-11 09:08:14 -07:00
Jacob Hoffman-Andrews c8dbbf005d Handle unprintable characters in HTTP responses. (#4312)
Fixes #4244.
2019-07-02 13:42:55 -04:00
Roland Bracewell Shoemaker 6f93942a04 Consistently used stdlib context package (#4229) 2019-05-28 14:36:16 -04:00
Roland Bracewell Shoemaker e839042bae dns: Remove Authorities field from ValidationRecord (#4230) 2019-05-28 14:11:32 -04:00
Daniel McCarney ea9871de1e core: split identifier types into separate package. (#4225)
This will allow implementing sub-problems without creating a cyclic
dependency between `core` and `problems`.

The `identifier` package is somewhat small/single-purpose and in the
future we may want to move more "ACME" bits beyond the `identifier`
types into a dedicated package outside of `core`.
2019-05-23 13:24:41 -07:00
Daniel McCarney 1d9de1cae0
va: fix flaky test_http2_http01_challenge int. test. (#4222)
In some rare conditions the malformed HTTP response error message that
we match in the VA for HTTP-01 connections to HTTP/2 servers will be
returned as a raw `http.badStringError` that doesn't have a transport
connection broken prefix. In these cases the existing
`test_http2_http01_challenge` integration tests fails because the
`h2SettingsFrameErrRegex` doesn't match the returned error.

To accommodate this we make the `h2SettingsFrameErrRegex` optionally
match the transport connection broken prefix.
2019-05-23 12:42:58 -04:00
alexzorin 105fe3b8e1 va: case-insensitivity of suffixes in http redirs (#4218)
An URI host is supposed to be case-insensitive.
2019-05-16 10:52:05 -04:00
Daniel McCarney 4229a29142
va: fix validationTime metric w/ multi-va full results no enforce. (#4217) 2019-05-15 12:59:46 -04:00
Jacob Hoffman-Andrews 4c420e2bc2 bdns: Remove LookupMX. (#4202)
We used to use this for checking email domains on registration, but not
anymore.
2019-05-06 09:29:44 -04:00
Daniel McCarney e050820fcc
va: add specific error for HTTP-01 to HTTP/2 server. (#4172)
In practice it seems the only way to add a specific error for when an initial HTTP-01 challenge request is made to an HTTP/2 server mis-configured on `:80` is with a regex on the error string.

The error returned from the stdlib `http.Client` for HTTP to an HTTP/2 server is just an `errors.ErrorString` instance without any context (once you peel it out of the wrapping `url.Error`):
>  Err:(*errors.errorString)(0xc420609bf0)}] errStr=[Get http://example.com/.well-known/acme-challenge/xxxxxxx: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x00\x00\x12\x04\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x80\x00\x04\x00\x01\x00\x00\x00\x05\x00\xff\xff\xff\x00\x00\x04\b\x00\x00\x00\x00\x00\u007f\xff\x00\x00\x00\x00\b\a\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01"]

Even directly in the stdlib code at the place in `http/response.go` that generates the error it's using a `&badStringError{}` and just putting the byte string that `textproto` read into it.

To detect this case in `detailedError` I added a pre-compiled regex that will match the net/http malformed HTTP response error for raw bytes matching an arbitrarily sized HTTP/2 SETTINGS frame. Per RFC "A SETTINGS frame MUST be sent by both endpoints at the start of a connection" and so this seems like a fairly reliable indicator of an unexpected HTTP/2 response in an HTTP/1.1 context.

Thanks to @mnordhoff for the detailed notes (and RFC refs) in #3416 It made this a lot easier!

Resolves #3416.
2019-04-23 14:56:37 -04:00
Daniel McCarney 2f3c703a72 va: add specific error for broken HTTP-01 redirects. (#4171)
Often folks will mis-configure their webserver to send an HTTP redirect missing
a `/' between the FQDN and the path.

E.g. in Apache using:

Redirect / https://bad-redirect.org

Instead of:

Redirect / https://bad-redirect.org/

Will produce an invalid HTTP-01 redirect target like:

https://bad-redirect.org.well-known/acme-challenge/xxxx

This happens frequently enough we want to return a distinct error message
for this case by detecting the redirect targets ending in ".well-known".

After the "Simple HTTP-01" code landed this case was previously getting an error
message of the form:
> "Invalid hostname in redirect target, must end in IANA registered TLD"

Resolves https://github.com/letsencrypt/boulder/issues/3606
2019-04-23 10:50:47 -07:00
Daniel McCarney cc0d15841f va: split up va.go by challenge type. (#4170)
When I introduced the new HTTP-01 code I did it in `va/http.go` intending to try and make the very large `va.go` file a little bit smaller. This is the continuation of that work.

* f96ad92 - moves remaining HTTP-01 specific code to `va/http.go`.
* 1efb9a1 - moves TLS-ALPN-01 code into `va/tlsalpn.go`.
* 95ea567 - moves DNS-01 code into `va/dns.go`.
* 6ff0395 - moves unit tests from `va/va_test.go` into `va/http_test.go`, `va/tlsalpn_test.go` and `va/dns_test.go`.

In the end `va/va.go` contains code related to metrics, top level RPCs (e.g. `PerformValidation`), and the multi-VA code. This makes the file lengths much more manageable overall.

Note: There is certainly room for cleaning up some of the older unit test cruft from `va/va_test.go`. For now I only moved it as-is into the challenge specific test files.
2019-04-19 11:34:58 -07:00
Jacob Hoffman-Andrews d2d5f0a328 Update miekg/dns and golang/protobuf. (#4150)
Precursor to #4116. Since some of our dependencies impose a minimum
version on these two packages higher than what we have in Godeps, we'll
have to bump them anyhow. Bumping them independently of the modules
update should keep things a little simpler.

In order to get protobuf tests to pass, I had to update protoc-gen-go in
boulder-tools. Now we download a prebuilt binary instead of using the
Ubuntu package, which is stuck on 3.0.0. This also meant I needed to
re-generate our pb.go files, since the new version generates somewhat
different output.

This happens to change the tag for pbutil, but it's not a substantive change - they just added a tagged version where there was none.

$ go test github.com/miekg/dns/...
ok      github.com/miekg/dns    4.675s
ok      github.com/miekg/dns/dnsutil    0.003s

ok      github.com/golang/protobuf/descriptor   (cached)
ok      github.com/golang/protobuf/jsonpb       (cached)
?       github.com/golang/protobuf/jsonpb/jsonpb_test_proto     [no test files]
ok      github.com/golang/protobuf/proto        (cached)
?       github.com/golang/protobuf/proto/proto3_proto   [no test files]
?       github.com/golang/protobuf/proto/test_proto     [no test files]
ok      github.com/golang/protobuf/protoc-gen-go        (cached)
?       github.com/golang/protobuf/protoc-gen-go/descriptor     [no test files]
ok      github.com/golang/protobuf/protoc-gen-go/generator      (cached)
ok      github.com/golang/protobuf/protoc-gen-go/generator/internal/remap       (cached)
?       github.com/golang/protobuf/protoc-gen-go/grpc   [no test files]
?       github.com/golang/protobuf/protoc-gen-go/plugin [no test files]
ok      github.com/golang/protobuf/ptypes       (cached)
?       github.com/golang/protobuf/ptypes/any   [no test files]
?       github.com/golang/protobuf/ptypes/duration      [no test files]
?       github.com/golang/protobuf/ptypes/empty [no test files]
?       github.com/golang/protobuf/ptypes/struct        [no test files]
?       github.com/golang/protobuf/ptypes/timestamp     [no test files]
?       github.com/golang/protobuf/ptypes/wrappers      [no test files]
2019-04-09 09:27:28 -07:00
Jacob Hoffman-Andrews ff3129247d Put features.Reset in unitest setup functions. (#4129)
Previously we relied on each instance of `features.Set` to have a
corresponding `defer features.Reset()`. If we forget that, we can wind
up with unexpected behavior where features set in one test case leak
into another test case. This led to the bug in
https://github.com/letsencrypt/boulder/issues/4118 going undetected.

Fix #4120
2019-04-02 10:14:38 -07:00
Daniel McCarney 063a98f02a
VA: additional feature flag control for multiVA. (#4122)
* `EnforceMultiVA` to allow configuring multiple VAs but not changing the primary VA's result based on what the remote VAs return.
* `MultiVAFullResults` to allow collecting all of the remote VA results. When all results are collected a JSON log line with the differential between the primary/remote VAs is logged.

Resolves https://github.com/letsencrypt/boulder/issues/4066
2019-03-25 12:23:53 -04:00
Daniel McCarney 56c18f6d96
VA: reject HTTP-01 redirects to bad hostnames earlier. (#4123)
* A redirect without a hostname is obviously bad and should get
a distinct error message as early as possible.

* A redirect to a hostname that doesn't end in an IANA registered TLD is
also obviously bad and should get a distinct error message as early as
possible.
2019-03-19 08:31:38 -04:00
Jacob Hoffman-Andrews 7891cec305 Fix includes. (#4119)
Two recent PRs had a bad merge and resulted in this unused include left behind.
2019-03-15 11:05:57 -07:00
Jacob Hoffman-Andrews 677b9b88ad Remove GSB support. (#4115)
This is no longer enabled in prod; cleaning up the code.

https://community.letsencrypt.org/t/let-s-encrypt-no-longer-checking-google-safe-browsing/82168
2019-03-15 10:24:44 -07:00
Jacob Hoffman-Andrews d1e6d0f190 Remove TLS-SNI-01 (#4114)
* Remove the challenge whitelist
* Reduce the signature for ChallengesFor and ChallengeTypeEnabled
* Some unit tests in the VA were changed from testing TLS-SNI to testing the same behavior
  in TLS-ALPN, when that behavior wasn't already tested. For instance timeouts during connect 
  are now tested.

Fixes #4109
2019-03-15 09:05:24 -04:00
Jacob Hoffman-Andrews 72b361d7a7 Shave 200ms off context for HTTP validations. (#4101)
Our integration test test_http_challenge_timeout occasionally fails with

boulder-ra [AUDIT] Could not communicate with VA: rpc
  error: code = DeadlineExceeded desc = context deadline exceeded

In at least one of these cases, the VA correctly timed-out its HTTP
request and logged a validation error with the correct error message.
I believe that there is a race between the VA returning its validation
error to the RA, and the RA timing out its gRPC call. By shaving some
time off the context we should more reliably get the response back to
the RA.

The order the primary VA calls `PerformValidation` on configured 
remote VAs is also changed to be done in a random order.

Resolves #4087
2019-03-11 13:46:56 -04:00
Daniel McCarney 9f5c1b9e25 VA: Remove legacy HTTP-01 validation code. (#4102)
We're only using the simplified HTTP-01 code from `va/http.go` now 🎉 The old unit tests that still seem relevant are left in place in `va/va_test.go` instead of being moved to `va/http_test.go` to signal that they're a bit crufty and could probably use a separate cleanup. For now I'm hesitant to remove test coverage so I updated them in-place without moving them to a new home.

Resolves https://github.com/letsencrypt/boulder/issues/4089
2019-03-08 11:57:39 -08:00
Romuald Brunet cc4ce59d7d VA: fix typo in checkCAARecords comment (#4103) 2019-03-08 13:03:39 -05:00
Jacob Hoffman-Andrews f816ca5e0d Improve error in TestPerformRemoveValidationFail (#4100)
When this test fails, it logs the fact that it got the wrong type of
ProblemDetail, but not what the actual ProblemDetail was. Fixing that
will make it easier to track down intermittent failures.
2019-03-07 13:53:27 -08:00
Roland Bracewell Shoemaker 232a5f828f Fix ineffectual assignments (#4052)
* in boulder-ra we connected to the publisher and created a publisher gRPC client twice for no apparent reason
* in the SA we ignored errors from `getChallenges` in `GetAuthorizations` which could result in a nil challenge being returned in an authorization
2019-02-13 15:39:58 -05:00
Jacob Hoffman-Andrews 57f97eb550 Remove ValidationRecords from VA logEvent. (#4054)
These are included in the challenge object, which also gets logged, so
including them twice was approximately doubling the size of the VA logs.
2019-02-13 10:57:16 -08:00
Daniel McCarney 2b12c6acc8 VA: Promote singleDialTimeout, add preresolvedDialer timeout test. (#4049)
The `singleDialTimeout` field was previously a global `const` in the
`va` package. Making it a field of the VA impl (and the dialer structs)
makes it easier to test that it is working as expected with a smaller
than normal value.

A new `TestPreresolvedDialerTimeout` unit test is added that tests the
fix from https://github.com/letsencrypt/boulder/pull/4046

Without the fix applied:
```
=== RUN   TestPreresolvedDialerTimeout
--- FAIL: TestPreresolvedDialerTimeout (0.49s)
    http_test.go:86: fetch didn't timeout after 50ms
    FAIL
    FAIL  github.com/letsencrypt/boulder/va 0.512s
```

With the fix applied:
```
=== RUN   TestPreresolvedDialerTimeout
--- PASS: TestPreresolvedDialerTimeout (0.05s)
PASS
ok    github.com/letsencrypt/boulder/va 1.075s
```
2019-02-12 13:59:51 -08:00
Daniel McCarney 1c0be52e53 VA: Add integration test for HTTP timeouts. (#4050)
Also update `TestHTTPTimeout` to test with the `SimplifiedVAHTTP`
feature flag enabled.
2019-02-12 13:42:01 -08:00
Daniel McCarney c37355b40b VA: Use correct Timeout for SimplifiedVAHTTP reqs. (#4046)
The `DefaultTransport`'s `DialContext` sets a `Timeout` and `KeepAlive`
of 30 seconds. When configured with the `SimplifiedVAHTTP` feature flag
we need to use a shorter `Timeout`.
2019-02-11 11:03:03 -08:00
Daniel McCarney 98663717d8
VA: Rework SimplifiedVAHTTP for pre-resolved dials. (#4016)
The URL construction approach we were previously using for the refactored VA HTTP-01 validation code was nice but broke SNI for HTTP->HTTPS redirects. In order to preserve this functionality we need to use a custom `DialContext` handler on the HTTP Transport that overrides the target host to use a pre-resolved IP.

Resolves https://github.com/letsencrypt/boulder/issues/3969
2019-01-21 15:08:40 -05:00
Roland Bracewell Shoemaker 93ac7fbe9e
Modify authorization creation to allow for new style storage schema (#3998)
Adds a feature which gates creation of authorizations following the style required for the new schema (and which can be used for gating the reset of our new schema code later down the road).

There was an internal discussion about an issue this creates regarding a predictable ordering of challenges within a challenge due to sequential challenge IDs which will always be static for each challenge type. It was suggested we could add some kind of obfuscation to the challenge ID when presented to the user to prevent this. This hasn't been done in this PR as it would only be focused in the WFE and would be better suited as its own changeset.

Fixes #3981.
2019-01-17 17:09:38 -08:00
Daniel McCarney 11433e1ea0
VA: Fix SimplifiedVAHTTP01 redirect query param handling. (#3988)
When the `SimplifiedVAHTTP01` feature flag is enabled we need to
preserve query parameters when reconstructing a redirect URL for the
resolved IP address.

To add integration testing for this condition the Boulder tools images
are updated to in turn pull in an updated `pebble-challtestsrv` command
that tracks request history.

A new Python wrapper for the `pebble-challtestsrv` HTTP API is added to
centralize interacting with the chall test srv to add mock data and to
get the history of HTTP requests that have been processed.
2019-01-04 14:20:44 -05:00
Daniel McCarney 5fbed0c49e
VA: Replace invalid UTF-8 in cert contents, proactively marshal. (#3973)
Marshaling invalid UTF-8 strings to protocol buffers causes an error. This can
happen in VA `PerformValidation` RPC responses if remote servers return invalid
UTF-8 in some ACME challenge contexts. We previously fixed this for HTTP-01 and
DNS-01 but missed a case where TLS-ALPN-01/TLS-SNI-01 challenge response
certificate content was included in error messages without replacing invalid
UTF-8. That's now fixed & unit tests are added.

To aid in diagnosing any future instances the VA is also updated to proactively
attempt to marshal its `PerformValidation` results before handing off to the RPC
wrappers that will do the same. This way if we detect an error in marshaling the
VA can audit log the escaped content for investigation purposes.

Hopefully with these two efforts combined we can avoid any future VA RPC errors
from UTF-8 encoding.

Resolves https://github.com/letsencrypt/boulder/issues/3838
2018-12-07 14:58:12 -05:00
Jacob Hoffman-Andrews 1ad8c70c36 Lower log level for following redirects. (#3972) 2018-12-06 16:47:40 -08:00
Daniel McCarney bd4c254942
Use Challtestsrv for HTTP-01 integration tests, add redirect tests (#3960)
To complete https://github.com/letsencrypt/boulder/issues/3956 the `challtestsrv` is updated such that its existing TLS-ALPN-01 challenge test server will serve HTTP-01 responses with a self-signed certificate when a non-TLS-ALPN-01 request arrives. This lets the TLS-ALPN-01 challenge server double as a HTTPS version of the HTTP challenge server. The `challtestsrv` now also supports adding/remove redirects that will be served to clients when requesting matching paths.

The existing chisel/chisel2 integration tests are updated to use the `challtestsrv` instead of starting their own standalone servers. This centralizes our mock challenge responses and lets us bind the `challtestsrv` to the VA's HTTP port in `startservers.py` without clashing ports later on.

New integration tests are added for HTTP-01 redirect scenarios using the updated `challtestserv`. These test cases cover:
* valid HTTP -> HTTP redirect
* valid HTTP -> HTTPS redirect
* Invalid HTTP -> non-HTTP/HTTPS port redirect
* Invalid HTTP-> non-HTTP/HTTPS protocol scheme redirect
* Invalid HTTP-> bare IP redirect
* Invalid HTTP redirect loop

The new integration tests shook out two fixes that were required for the legacy VA HTTP-01 code (afad22b) and one fix for the challtestsrv mock DNS (59b7d6d).

Resolves https://github.com/letsencrypt/boulder/issues/3956
2018-11-30 17:20:10 -05:00
Daniel McCarney 8a610e5828
VA: Fix SimplifiedHTTP01 w/ HTTPS redirects. (#3959)
The VA code activated by the `SimplifiedHTTP01` flag had a regression
that constructed validation URLs for HTTP-01 challenge redirects to
a HTTPS host incorrectly.

The regression is fixed and the unit tests are updated to cover this
case. An integration test is not included in this commit but will be
done as follow-up work since it requires adjusting existing integration
tests for HTTP-01 to use the challtestrv instead of Certbot's ACME
standalone server.
2018-11-30 08:12:24 -05:00
Daniel McCarney d9d2f4e9b0
VA: Simplified HTTP-01 w/ IP address URLs (#3939)
Continued bugs from the custom dialer approach used by the VA for HTTP-01 (most recently https://github.com/letsencrypt/boulder/issues/3889) motivated a rewrite.

Instead of using a custom dialer to be able to control DNS resolution for HTTP validation requests we can construct URLs for the IP addresses we resolve and overload the Host header. This avoids having to do address resolution within the dialer and eliminates the complexity of the dialer `addrInfoChan`. The only thing left for our custom dialer now is to shave some time off of the provided context to help us discern timeouts before/after connect.

The existing IP preference & fallback behaviour is preserved: e.g. if a host has both IPv6 and IPv4 addresses we connect to the first IPv6 address. If there is a network error connecting to that address (e.g. an error during "dial"), we try once more with the first IPv4 address. No other retries are done. Matching existing behaviour no fallback is done for HTTP level failures on an IPv6 address (e.g. mismatched webroots, redirect loops, etc). A new Prometheus counter "http01_fallbacks" is used to keep track of the number of fallbacks performed.

As a result of moving the layer at which the retry happens a fallback like described above will now produce two validation records: one for the initial IPv6 connection, and one for the IPv4 connection. Neither will have the "addressesTried" field populated, just "addressesResolved" and "addressUsed". Previously with the dialer doing the retry we would have created just one validation record with an IPv4 "addressUsed" field and both an IPv6 and IPv4 address in the "addressesTried" field.

Because this is a big diff for a key part of the VA the new code is gated by the `SimplifiedVAHTTP` feature flag.

Resolves #3889
2018-11-19 14:15:39 -05:00
Jacob Hoffman-Andrews b1be4ccaed Fix latency logging. (#3937)
In the VA, we were rendering a Duration to JSON, which gave an integer
number of nanoseconds rather than a float64 of seconds. Also, in both VA
and WFE we were rendering way more precision than we needed. Millisecond
precision is enough, and since we log latency for every WFE response,
the extra bytes are worth saving.
2018-11-14 10:52:48 -05:00
Jacob Hoffman-Andrews 714457badc Add a deadline to TLS handshake. (#3921)
Previously, if a TLS handshake timed out, we would block forever in
`conn.Handshake()`, leaking both a TCP connection and a goroutine.
This sets a deadline on the underlying TCP connection, ensuring that
`conn.Handshake()` eventually times out.

Fixes #3915
2018-11-05 15:44:08 -05:00
Roland Bracewell Shoemaker 3c2888a49e Add a counter for the tls alpn OID that is used (#3914)
Fixes #3913.
2018-10-31 13:12:11 -04:00
Roland Bracewell Shoemaker a9a0846ee9
Remove checks for deployed features (#3881)
Removes the checks for a handful of deployed feature flags in preparation for removing the flags entirely. Also moves all of the currently deprecated flags to a separate section of the flags list so they can be more easily removed once purged from production configs.

Fixes #3880.
2018-10-17 20:29:18 -07:00
Roland Bracewell Shoemaker 15ccea65f7
Record latency of validation instead of request/response time (#3879)
Fixes #3862.
2018-10-05 10:59:53 -04:00
Jacob Hoffman-Andrews 69f4f666b6 Add timeout values to VA RoundTripper. (#3869)
Fixes #3868.
2018-09-24 16:11:23 -04:00
Daniel McCarney 43b61f5c25 VA: Fix q -> %q format specifier (#3870) 2018-09-24 09:59:22 -07:00
Jacob Hoffman-Andrews b25b431266 Filter invalid UTF-8 from error responses. (#3845)
For HTTP-01 challenges that return incorrect responses, the
VA tries to put the first little bit of the HTTP response in the problem
detail.

However, VA needs to be able to serialize the problem detail as a
protobuf to send it to the RA, and protobufs require string types to be
UTF-8. Filter out any invalid UTF-8 sequences and replace them with
REPLACEMENT CHARACTER.
2018-09-17 14:35:46 -04:00
Joel Sing a64928bc3d Rework CAA value parameter parsing to match RFC 6644bis draft. (#3805)
This switches from whitespace to semi-colon separated tag/value parameters,
while implementing stricter checks on valid tag and value values (to match
the RFC). Test coverage is added for CAA value parameter parsing, along with
some additional tests for CAA records with multiple parameter values.

Fixes issue #3795.
2018-09-05 17:09:10 -07:00
Daniel McCarney 94bcebd658
VA: Ignore cancelled errs from remote VAs. (#3827)
If the context provided to a remote VA's `PerformValidation` is
cancelled we should not treat the returned context cancelled error as an
unexpected error and should instead ignore it as an expected result.
2018-08-27 12:20:54 -04:00
Roland Bracewell Shoemaker 1ef93c3809 Support both obsolete and new TLS-ALPN OID (#3819) 2018-08-20 10:51:33 -04:00
Roland Bracewell Shoemaker 1e6699d03e Remove hyphens from ACME-CAA parameters (#3772)
The hyphens were incompatible with RFC 6844 (but not RFC 6844bis), and
broke some CAA-processing software in practice. Hugo revised the ACME-CAA
draft (https://datatracker.ietf.org/doc/html/draft-ietf-acme-caa-05) to remove
the hyphens.
2018-06-21 13:49:48 -07:00
Daniel McCarney 2dadd5e09a VA: Log exceptional non-problem remote VA errors. (#3760)
Previously, if a remote VA returned an error that is not a ProblemDetail, the
primary VA would log a ServerInternalProblem but not the underlying error.
This commit updates performRemoteValidation to always return the full error it
receives from a remote VA.

This commit also adds a unittest that checks that the VA still returns a
ServerInternalProblem to the RA, and that the VA audit logs the underlying
error.

Resolves https://github.com/letsencrypt/boulder/issues/3753
2018-06-15 10:53:16 -07:00
Roland Bracewell Shoemaker 813aa788e9 Assume acmeValidation-v1 is wrapped OCTET STRING (#3752)
As defined by the spec.
2018-06-11 14:44:13 -07:00
Joel Sing 9c2859c87b Add support for CAA account-uri validation. (#3736)
This adds support for the account-uri CAA parameter as specified by
section 3 of https://tools.ietf.org/html/draft-ietf-acme-caa-04, allowing
issuance to be restricted to one or more ACME accounts as specified by CAA
records.
2018-06-08 12:08:03 -07:00
Maciej Dębski bb9ddb124e Implement TLS-ALPN-01 and integration test for it (#3654)
This implements newly proposed TLS-ALPN-01 validation method, as described in
https://tools.ietf.org/html/draft-ietf-acme-tls-alpn-01 This challenge type is disabled 
except in the config-next tree.
2018-06-06 13:04:09 -04:00
Daniel McCarney b29fe6559d VA: Log challenge type in `checkCAA`. (#3742)
This commit updates the VA's checkCAA function to include the provided
challengeType parameter as part of the audit log line for the CAA
check result. If challengeType == nil then the value "none" is logged.
Unit tests are updated accordingly.

This change will make it easier to distinguish cases where "Valid for
issuance" is false because of a validation-methods restriction.

Resolves #3740
2018-06-04 09:19:10 -07:00
Joel Sing 2540d59296 Implement CAA validation-methods checking. (#3716)
When performing CAA checking respect the validation-methods parameter (if
present) and restrict the allowed authorization methods to those specified.
This allows a domain to restrict authorization methods that can be used with
Let's Encrypt.

This is largely based on PR #3003 (by @lukaslihotzki), which was landed and
then later reverted due to issue #3143. The bug the resulted in the previous
code being reverted has been addressed (likely inadvertently) by 76973d0f.

This implementation also includes integration tests for CAA validation-methods.

Fixes issue #3143.
2018-05-23 14:32:31 -07:00
Jacob Hoffman-Andrews 5ad14170fb Ignore canceled IsSafeDomain calls. (#3730)
Fixes #3681.
2018-05-23 12:50:30 -07:00
Daniel McCarney 3df93d2230 VA: Log observed CAA records. (#3726)
This is a quick first pass at audit logging the *dns.CAA records in JSON format from within the VA's IsCAAValid function. This will provide more information for post-hoc analysis of CAA decisions.
2018-05-18 15:28:45 -07:00
Daniel McCarney 084819b011
VA: CAA tag identifiers are case insensitive. (#3722)
Per RFC 6844 Section 5.1 the matching of CAA tag identifiers (e.g.
"Issue") is case insensitive. This commit updates the CAA tag processing
to be case insensitive as required by the RFC.

To exercise the fix this commit adds a test case to the `caaMockDNS`
`LookupCAA` implementation for a hostname (`mixedcase.com`) that has
a CAA record with a mixed case `Issue` tag. Prior to the fix from this
branch being included in `va/caa.go` the test fails:
```
    --- FAIL: TestCAAChecking/Bad_(Reserved,_Mixed_case_Issue) (0.00s)
          caa_test.go:292: checkCAARecords validity mismatch for
          mixedcase.com: got true expected false
```

With the fix applied, the test passes.
2018-05-18 09:34:57 -04:00
Joel Sing f8a023e49c Remove various unnecessary uses of fmt.Sprintf (#3707)
Remove various unnecessary uses of fmt.Sprintf - in particular:

- Avoid calls like t.Error(fmt.Sprintf(...)), where t.Errorf can be used directly.

- Use strconv when converting an integer to a string, rather than using
  fmt.Sprintf("%d", ...). This is simpler and can also detect type errors at
  compile time.

- Instead of using x.Write([]byte(fmt.Sprintf(...))), use fmt.Fprintf(x, ...).
2018-05-11 11:55:25 -07:00
Joel Sing 9990d14654 Convert the probs functions to be formatters. (#3708)
Many of the probs.XYZ calls are of the form probs.XYZ(fmt.Sprintf(...)).
Convert these functions to take a format string and optional arguments,
following the same pattern used in the errors package. Convert the
various call sites to remove the now redundant fmt.Sprintf calls.
2018-05-11 11:51:16 -07:00
Joel Sing 8ebdfc60b6 Provide formatting logger functions. (#3699)
A very large number of the logger calls are of the form log.Function(fmt.Sprintf(...)).
Rather than sprinkling fmt.Sprintf at every logger call site, provide formatting versions
of the logger functions and call these directly with the format and arguments.

While here remove some unnecessary trailing newlines and calls to String/Error.
2018-05-10 11:06:29 -07:00
Roland Bracewell Shoemaker b2a2a24dc3 Stop using validation record as an input/output (#3694)
This change cleans up how `va.http01Dialer` works with regards to `core.ValidationRecord`s. Instead of using the record as both an input and a output it now uses a set of inputs and outputs information about addresses via a channel. The validation record is then constructed in the parent scope or in the redirect function instead of the dialer itself.

Fixes #2730, fixes #3109, and fixes #3663.
2018-05-09 11:55:14 -04:00
Roland Bracewell Shoemaker a5ac5fa078 Deprecate IPv6First feature flag (#3684) 2018-05-02 10:22:25 -07:00
Jacob Hoffman-Andrews d0a510664b Remove Timeout field from VA's http.Client. (#3661)
This field was set to singleDialTimeout, but the net/http library treats
it as covering all of dial, write headers, and read headers and body.
Since http01Dialer also uses singleDialTimeout, there's a race between
http01Dialer and net/http to see who will time out first. The result is
that sometimes we give "Timeout after connect" when the error really
should be "Timeout during connect." This issue also inhibits IPv6 to
IPv4 fallback, and tickles a data race that was causing a rare panic in
VA: https://github.com/letsencrypt/boulder/issues/3109.

After this change, the overall HTTP request will get the full deadline
allowed by the RPC context. The dialer will continue to use
singleDialTimeout for each of its two possible dial attempts.
2018-04-23 09:24:23 -04:00
Jacob Hoffman-Andrews a1b98d9163
Use a context when dialing TLS for TLS-SNI (#3648)
This allows us to have fast-running unittests without modifying the global state in singleDialTimeout,
which can become a const.

Fixes #3628.

Builds on top of #3629, review that first.
2018-04-16 15:06:56 -07:00
Jacob Hoffman-Andrews 339ea954bd Give more detailed validation errors in VA (#3629)
In particular, differentiate timeouts during connect (which are usually a firewall problem) from timeouts after connect (which are usually a software problem). In the process, refactor the tests and add testing for specific problem detail messages.

This also switches over the HTTP challenge's dialer to use DialContext, and to shave a little bit of headroom off of the context deadline, so that the dial can report its timeout before the overall context expires, which would lead to an overly generic "deadline exceeded" error, which would then get translated (incorrectly) into a "timeout after connect."

There is an additional error case, Timeout during %s (your server may be slow or overloaded), (where %s can be read or write) which doesn't have any unittests. I believe it may not be possible to trigger this, since read and write timeouts get subsumed by the HTTP or TLS library, but it's worth having as a fallback case. We'll see if it shows up in the logs.

Among the test refactorings, I shortened the timeout on the TLS timeout test to 50ms. Previously this was the long pole making the whole test take 10s. Now it takes ~500 ms overall.

I recommend starting review at https://github.com/letsencrypt/boulder/compare/detailed-va-errors?expand=1#diff-4c51d1d7ca3ec3022d14b42809af0d7eR671 (the changes to detailedError), then reviewing the Dial -> DialContext changes, then the tests.
2018-04-16 12:01:08 -07:00
Jacob Hoffman-Andrews 699c7e4c44 Add a DNS problem type. (#3625)
As specified in ACME. Also, include problem type in the stats.

Fixes #3613.
2018-04-09 12:21:02 -04:00
Roland Bracewell Shoemaker 8167abd5e3 Use internet facing appropriate histogram buckets for DNS latencies (#3616)
Also instead of repeating the same bucket definitions everywhere just use a single top level var in the metrics package in order to discourage copy/pasting.

Fixes #3607.
2018-04-04 08:01:54 -04:00
Jacob Hoffman-Andrews 11434650b7 Check safe browsing at validation time (#3539)
Right now we check safe browsing at new-authz time, which introduces a possible
external dependency when calling new-authz. This is usually fine, since most safe
browsing checks can be satisfied locally, but when requests have to go external,
it can create variance in new-authz timing.

Fixes #3491.
2018-03-09 11:15:05 +00:00
Daniel McCarney 49d55d9ab5 Make POSTing KeyAuthorization optional, V2 don't echo it. (#3526)
This commit updates the RA to make the notion of submitting
a KeyAuthorization value as part of the ra.UpdateAuthorization call
optional. If set, the value is enforced against expected and an error is
returned if the provided authorization isn't correct. If it isn't set
the RA populates the field with the computed authorization for the VA to
enforce against the value it sees in challenges. This retains the legacy
behaviour of the V1 API. The V2 API will never unmarshal a provided
key authorization.

The ACMEv2/WFEv2 prepChallengeForDisplay function is updated to strip
the ProvidedKeyAuthorization field before sending the challenge object
back to a client. ACMEv1/WFEv1 continue to return the KeyAuthorization
in challenges to avoid breaking clients that are relying on this legacy
behaviour.

For deployability ease this commit retains the name of the
core.Challenge.ProvidedKeyAuthorization field even though it should
be called core.Challenge.ComputedKeyAuthorization now that it isn't
set based on the client's provided key authz. This will be easier as
a follow-up change.

Resolves #3514
2018-03-06 20:33:01 +00:00
Daniel McCarney 28cc969814 Remove TLS-SNI-02 implementation. (#3516)
This code was never enabled in production. Our original intent was to
ship this as part of the ACMEv2 API. Before that could happen flaws were
identified in TLS-SNI-01|02 that resulted in TLS-SNI-02 being removed
from the ACME protocol. We won't ever be enabling this code and so we
might as well remove it.
2018-03-02 10:56:13 -08:00
Jacob Hoffman-Andrews 1fe8aa8128 Improve errors for DNS challenge (#3317)
Before this change, we would just log "Correct value not found for DNS challenge"
when we got a TXT record that didn't match what we expected. This was different
from the error when no TXT records were found at all, but viewing the error out of
context doesn't make that clear. This change improves the error to specifically say
that we found a TXT record, but it was the wrong one.

Also in this change: if we found multiple TXT records, we mention the number;
and we trim the length of the echoed TXT record.
2018-01-03 15:37:23 -05:00
Daniel McCarney de5fbbdb67
Implement CAA issueWild enforcement for wildcard names (#3266)
This commit implements RFC 6844's description of the "CAA issuewild
property" for CAA records.

We check CAA in two places: at the time of validation, and at the time
of issuance when an authorization is more than 8hours old. Both
locations have been updated to properly enforce issuewild when checking
CAA for a domain corresponding to a wildcard name in a certificate
order.

Resolves https://github.com/letsencrypt/boulder/issues/3211
2017-12-13 12:09:33 -05:00
Daniel McCarney 1c99f91733 Policy based issuance for wildcard identifiers (Round two) (#3252)
This PR implements issuance for wildcard names in the V2 order flow. By policy, pending authorizations for wildcard names only receive a DNS-01 challenge for the base domain. We do not re-use authorizations for the base domain that do not come from a previous wildcard issuance (e.g. a normal authorization for example.com turned valid by way of a DNS-01 challenge will not be reused for a *.example.com order).

The wildcard prefix is stripped off of the authorization identifier value in two places:

When presenting the authorization to the user - ACME forbids having a wildcard character in an authorization identifier.
When performing validation - We validate the base domain name without the *. prefix.
This PR is largely a rewrite/extension of #3231. Instead of using a pseudo-challenge-type (DNS-01-Wildcard) to indicate an authorization & identifier correspond to the base name of a wildcard order name we instead allow the identifier to take the wildcard order name with the *. prefix.
2017-12-04 12:18:10 -08:00
Daniel McCarney 55dd1020c0 Increase VA SingleDialTimeout to 10s. (#3260)
This PR changes the VA's singleDialTimeout value from 5 * time.Second to 10 * time.Second. This will give slower servers a better chance to respond, especially for the multi-VA case where n requests arrive ~simultaneously.

This PR also bumps the RA->VA timeout by 5s and the WFE->RA timeout by 5s to accommodate the increased dial timeout. I put this in a separate commit in case we'd rather deal with this separately.
2017-12-04 09:53:26 -08:00
Roland Bracewell Shoemaker 9da1bea433 Update histogram buckets for latencies that measure things over the internet (#3254)
Updates the buckets for histograms in the publisher, va, and expiration-mailer which are used to measure the latency of operations that go over the internet and therefore are liable to take a lot longer than the default buckets can measure. Uses a standard set of buckets for all three instead of attempting to tune for each one.

Fixes #3217.
2017-11-29 15:13:14 -08:00
Jacob Hoffman-Andrews 2fd2f9e230 Remove LegacyCAA implementation. (#3240)
Fixes #3236
2017-11-20 16:09:00 -05:00
Jacob Hoffman-Andrews bf9ce64aca Update GSB library (#3192)
This pulls in google/safebrowsing#74, which introduces a new LookupURLsContext that allows us to pass through timeout information nicely.

Also, update calling code to use LookupURLsContext instead of LookupURLs.
2017-10-24 08:33:03 -04:00
Jacob Hoffman-Andrews 51991cd264 Fix logging of hostname in VA. (#3149)
The pbToAuthzMeta method in rpc/pb-marshalling.go only propagates ID and
registrationID, not hostname. So log the "domain" parameter instead.
2017-10-06 11:10:02 -07:00
Daniel McCarney 1794c56eb8 Revert "Add CAA parameter to restrict challenge type (#3003)" (#3145)
This reverts commit 23e2c4a836.
2017-10-04 12:00:44 -07:00
lukaslihotzki 23e2c4a836 Add CAA parameter to restrict challenge type (#3003)
This commit adds CAA `issue` paramter parsing and the `challenge` parameter to permit a single challenge type only. By setting `challenge=dns-01`, the nameserver keeps control over every issued certificate.
2017-10-02 11:59:47 -07:00
Jacob Hoffman-Andrews d2883f12c1 Remove TimingDuration call from VA (#3122)
Also switch over tests.

Fixes #3100
2017-09-28 14:25:22 -07:00
Daniel McCarney 966e02313f Forbid HTTP redirects to non-80/443 ports. (#3115)
Prior to this commit the VA would follow redirects from the initial
HTTP-01 challenge request on port 80 to any other port. In practice the
Let's Encrypt production environment has network egress firewall rules
that drop outbound requests that are not on port 80 or 443. In effect
this meant any challenge request that was redirected from 80 to a port
other than 80/443 was turned into a mysterious connection timeout error.

We have decided to preserve the egress firewall rule and continue to act
conservatively. Only port 80 and 443 should be allowed in redirects.

This commit updates the VA to return a clear error message when
a non-80/443 redirect is made.

To aid in testing/configuration the actual ports enforced are specified
by the va.httpPort and va.httpsPort that are used for the initial
outbound HTTP-01 connection.

The VA TestHTTPRedirectLookup unit test is updated accordingly to test
that a non-80/443 redirect fails with the expected message.

Resolves #3049
2017-09-25 10:19:10 -07:00
Daniel McCarney 3408b62720 Include the domain name in problems from IsCAAValid. (#3116)
For certificates with many domains it can be difficult to associate
a given CAA error with the specific domain that caused it. To make this
easier this commit explicitly prefixes all of the problems that can be
returned from `va.IsCAAValid` with the domain name in question.

A small unit test is included to check a CAA problem's detail message is
suitably prefixed with the affected domain.
2017-09-25 13:11:50 -04:00
Jacob Hoffman-Andrews fce975a1e6 Move CAA mocks into caa_test. (#3084)
There were a bunch of test fixtures in bdns/mocks.go that were only used in va/caa_test.go. This moves them to be in the same file so we have less spooky action at a distance.

One side-effect: We can't construct bdns.DNSError with the internal fields we want, because those fields are unexported. So we switch a couple of mock cases to just return a generic error, and the corresponding test cases to expect that error.
2017-09-18 13:10:01 -07:00
Roland Bracewell Shoemaker 0ab1e2ff46 Raise treeClimbingLookupCAA limit (#3098) 2017-09-15 18:30:29 -07:00
Roland Bracewell Shoemaker 8a2ad13a87 Don't tree climb for trees we've already climbed (#3096)
Prevents repeated lookups in traditional CNAME or tree based CNAME loops
2017-09-15 19:29:35 -04:00
Roland Bracewell Shoemaker d1d6cab8ce Fix CAA test (#3092) 2017-09-14 16:02:54 -07:00
Jacob Hoffman-Andrews 9ab2ff4e03 Add CAA-specific error. (#3051)
Previously, CAA problems were lumped in under "ConnectionProblem" or
"Unauthorized". This should make things clearer and easier to differentiate.

Fixes #3043
2017-09-14 14:11:41 -07:00
Jacob Hoffman-Andrews 4266853092 Implement legacy form of CAA (#3075)
This implements the pre-erratum 5065 version of CAA, behind a feature flag.

This involved refactoring DNSClient.LookupCAA to return a list of CNAMEs in addition to the CAA records, and adding an alternate lookuper that does tree-climbing on single-depth aliases.
2017-09-13 10:16:12 -04:00
Jacob Hoffman-Andrews 568407e5b8 Remote VA logging and stats (#3063)
Add a logging statement that fires when a remote VA fail causes
overall failure. Also change remoteValidationFailures into a
counter that counts the same thing, instead of a histogram. Since
the histogram had the default bucket sizes, it failed to collect
what we needed, and produced more metrics than necessary.
2017-09-11 12:50:50 -07:00
Roland Bracewell Shoemaker e91349217e Switch to using go 1.9 (#3047)
* Switch to using go 1.9

* Regenerate with 1.9

* Manually fix import path...

* Upgrade mockgen and regenerate

* Update github.com/golang/mock
2017-09-06 16:30:13 -04:00
Daniel McCarney baf32878c0 Prefix problem type with namespace at runtime. (#3039)
To support having problem types that use either the classic
"urn:acme:error" namespace or the new "urn:ietf:params:acme:error"
namespace as appropriate we need to prefix the problem type at runtime
right before returning it through the WFE to the user as JSON. This
commit updates the WFE/WFE2 to do this for both problems sent through
sendError as well as problems embedded in challenges. For the latter
we do not modify problems with a type that is already prefixed to
support backwards compatibility.

Resolves #2938

Note: We should cut a follow-up issue to devise a way to share some
common code between the WFE and WFE2. For example, the
prepChallengeForDisplay should probably be hoisted to a common
"web" package
2017-09-06 12:55:10 -07:00
Jacob Hoffman-Andrews b0c7bc1bee Recheck CAA for authorizations older than 8 hours (#3014)
Fixes #2889.

VA now implements two gRPC services: VA and CAA. These both run on the same port, but this allows implementation of the IsCAAValid RPC to skip using the gRPC wrappers, and makes it easier to potentially separate the service into its own package in the future.

RA.NewCertificate now checks the expiration times of authorizations, and will call out to VA to recheck CAA for those authorizations that were not validated recently enough.
2017-08-28 16:40:57 -07:00
Jacob Hoffman-Andrews 0d69b24fcc Move VA's CAA code into separate file (#3010)
va.go is quite a large file. This splits out the CAA-related code and tests into its own file for simplicity. This is a simple move; no code has been changed, and there is no package split.
2017-08-28 11:24:03 -07:00
Jacob Hoffman-Andrews 9026f6cbf8 Remove global state from VA test (#3009)
The VA test had a global:

`var ident = core.AcmeIdentifier{Type: core.IdentifierDNS, Value: "localhost"}`

Evidently this was meant as a convenience to avoid having to retype this common value, but it wound up being mutated independently by different tests. This PR replaces it with a convenience function `dnsi()` that generates a DNS-type identifier with the given hostname. Makes the VA test much more reliable locally.
2017-08-25 16:55:38 -07:00