Commit Graph

135 Commits

Author SHA1 Message Date
Roland Bracewell Shoemaker af41bea99a Switch to more efficient multi nonce-service design (#4308)
Basically a complete re-write/re-design of the forwarding concept introduced in
#4297 (sorry for the rapid churn here). Instead of nonce-services blindly
forwarding nonces around to each other in an attempt to find out who issued the
nonce we add an identifying prefix to each nonce generated by a service. The
WFEs then use this prefix to decide which nonce-service to ask to validate the
nonce.

This requires a slightly more complicated configuration at the WFE/2 end, but
overall I think ends up being a way cleaner, more understandable, easy to
reason about implementation. When configuring the WFE you need to provide two
forms of gRPC config:

* one gRPC config for retrieving nonces, this should be a DNS name that
resolves to all available nonce-services (or at least the ones you want to
retrieve nonces from locally, in a two DC setup you might only configure the
nonce-services that are in the same DC as the WFE instance). This allows
getting a nonce from any of the configured services and is load-balanced
transparently at the gRPC layer. 
* a map of nonce prefixes to gRPC configs, this maps each individual
nonce-service to it's prefix and allows the WFE instances to figure out which
nonce-service to ask to validate a nonce it has received (in a two DC setup
you'd want to configure this with all the nonce-services across both DCs so
that you can validate a nonce that was generated by a nonce-service in another
DC).

This balancing is implemented in the integration tests.

Given the current remote nonce code hasn't been deployed anywhere yet this
makes a number of hard breaking changes to both the existing nonce-service
code, and the forwarding code.

Fixes #4303.
2019-06-28 12:58:46 -04:00
Roland Bracewell Shoemaker 844ae26b65
Allow forwarding of nonce-service Redeem RPCs from one service… (#4297)
Fixes #4295.
2019-06-26 13:04:31 -07:00
Jacob Hoffman-Andrews df19fd9e58
Integration test for v1 authz reuse when v2 flag is enabled (#4288)
When NewAuthorizationSchema is enabled, we still want v1 authzs to be reusable in
new orders. This tests that that code is implemented correctly.

Updates #4241
2019-06-25 10:50:58 -07:00
Roland Bracewell Shoemaker 24f150f8fc Re-apply #4279 with requests fix (#4286)
Move from using `requests` to `urllib2` in `helpers.py`. Verified
this works with `docker-compose up`. In the future we really should
be installing our own python dependencies in the boulder-tools image
rather than relying on getting them by using the certbot virtualenv.
2019-06-24 11:58:37 -07:00
Adrien Ferrand 8e31d58113 Revert "tests: Switch to instant OCSP verification in int. tests (#4279)" (#4285)
This reverts commit f4b9235acb.

Fixes #4284
2019-06-23 12:06:16 -07:00
Roland Bracewell Shoemaker f4b9235acb tests: Switch to instant OCSP verification in int. tests (#4279)
* Switch to instant OCSP verification in integration tests
* Move waitport to helpers and use it to determine if ocsp-responder is
  alive in test_single_ocsp
2019-06-21 09:53:01 -04:00
Jacob Hoffman-Andrews 18a3c78d6f Refactor test_caa and twenty-days-ago setup (#4261)
As part of #4241, I need to introduce some twenty-days-ago setup. So I refactored the
only current instance (test_caa) to use a style where setup functions can be registered right
next to the test cases they affect. The @register_twenty_days_ago is Python for
"call register_twenty_days_ago with the thing on the next line as an argument."

I also cleaned up a bunch of related stuff:
* Removed the ACCOUNT_URI environment variable and associated function params.
This was introduced in in #3736 to pass a URI to challtestsrv before we refactored for
more dynamic updates. It's not used any more.
* Removed a try / except from startChallSrv that needlessly hid errors.
* Move setting of DNS fixtures for caa_test into the test case itself.
2019-06-18 14:58:06 -07:00
Roland Bracewell Shoemaker 4ca01b5de3
Implement standalone nonce service (#4228)
Fixes #3976.
2019-06-05 10:41:19 -07:00
Daniel McCarney b99b35009e load-generator: support all challenge types, run in CI. (#4140)
## CI: restore load-generator run.

This restores running the `load-generator` during CI to make sure it doesn't bitrot. It was previously removed while we debugged the VA getting jammed up and not cleanly shutting down.

Since the global `pebble-challtestsrv` and the `load-generator`'s internal chall test srv will conflict this requires moving the `load-generator` run to the end of integration tests and updating `startservers.py` to allow the load gen integration test code to stop the `pebble-challtestsrv` before starting the `load-generator`.

The `load-generator` and associated config are updated to allow specifying bind addresses for the DNS interface of the internal challtestsrv. Multiple addresses are supported so that the `load-generator`'s chall test srv can listen on port DNS ports Boulder is configured to use. The `load-generator` config now accepts a `fakeDNS` parameter that can be used to specify the default IPv4 address returned by the `load-generator`'s DNS server for A queries.

## load-generator: support different challenges/strategies.

Updates the load-generator to support HTTP-01, DNS-01, and TLS-ALPN-01 challenge response servers. A new challenge selection configuration parameter (`ChallengeStrategy`) can be set to `"http-01"`, `"dns-01"`, or `"tls-alpn-01"` to solve only challenges of that type. Using `"random"` will let the load-generator choose a challenge type randomly.

Resolves https://github.com/letsencrypt/boulder/issues/3900
2019-04-04 11:44:14 -07:00
Jacob Hoffman-Andrews cb86f9e850 Copy boulder-va to boulder-remoteva. (#4128)
This will make it easier to distinguish in logs.
2019-03-20 16:07:51 -04:00
Jacob Hoffman-Andrews 677b9b88ad Remove GSB support. (#4115)
This is no longer enabled in prod; cleaning up the code.

https://community.letsencrypt.org/t/let-s-encrypt-no-longer-checking-google-safe-browsing/82168
2019-03-15 10:24:44 -07:00
Daniel McCarney 1c0be52e53 VA: Add integration test for HTTP timeouts. (#4050)
Also update `TestHTTPTimeout` to test with the `SimplifiedVAHTTP`
feature flag enabled.
2019-02-12 13:42:01 -08:00
Roland Bracewell Shoemaker 046955e99c Add a standalone akamai purger service (#4040)
Fixes #4030.
2019-02-05 09:00:31 -08:00
Daniel McCarney f72c371bdc
Set pebble-challtestsrv IP from FAKE_DNS at startup. (#3984)
`pebble-challtestsrv` added a `-defaultIPv4` arg we can use to simplify
the integration tests and fix FAKE_DNS usage outside of integration
tests.

A new boulder-tools image with an updated `pebble-challtestsrv` is used
and `test/startservers.py` is changed to populate `-defaultIPv4` via the
`FAKE_DNS` env var.
2018-12-13 13:49:12 -05:00
Daniel McCarney 893e8459d6
Use pebble-challtestrv cmd, letsencrypt/challtestsrv package. (#3980)
Now that Pebble has a `pebble-challtestsrv` we can remove the `challtestrv`
package and associated command from Boulder. I switched CI to use
`pebble-challtestsrv`. Notably this means that we have to add our expected mock
data using the HTTP management interface. The Boulder-tools images are
regenerated to include the `pebble-challtestsrv` command.

Using this approach also allows separating the TLS-ALPN-01 and HTTPS HTTP-01
challenges by binding each challenge type in the `pebble-challtestsrv` to
different interfaces both using the same VA
HTTPS port. Mock DNS directs the VA to the correct interface.

The load-generator command that was previously using the `challtestsrv` package
from Boulder is updated to use a vendored copy of the new
`github.org/letsencrypt/challtestsrv` package.

Vendored dependencies change in two ways:
1) Gomock is updated to the latest release (matching what the Bouldertools image
   provides)
2) A couple of new subpackages in `golang.org/x/net/` are added by way of
   transitive dependency through the challtestsrv package.

Unit tests are confirmed to pass for `gomock`:
```
~/go/src/github.com/golang/mock/gomock$ git log --pretty=format:'%h' -n 1
51421b9
~/go/src/github.com/golang/mock/gomock$ go test ./...
ok    github.com/golang/mock/gomock 0.002s
?     github.com/golang/mock/gomock/internal/mock_matcher [no test files]
```
For `/x/net` all tests pass except two `/x/net/icmp` `TestDiag.go` test cases
that we have agreed are OK to ignore.

Resolves https://github.com/letsencrypt/boulder/issues/3962 and
https://github.com/letsencrypt/boulder/issues/3951
2018-12-12 14:32:56 -05:00
Daniel McCarney bd4c254942
Use Challtestsrv for HTTP-01 integration tests, add redirect tests (#3960)
To complete https://github.com/letsencrypt/boulder/issues/3956 the `challtestsrv` is updated such that its existing TLS-ALPN-01 challenge test server will serve HTTP-01 responses with a self-signed certificate when a non-TLS-ALPN-01 request arrives. This lets the TLS-ALPN-01 challenge server double as a HTTPS version of the HTTP challenge server. The `challtestsrv` now also supports adding/remove redirects that will be served to clients when requesting matching paths.

The existing chisel/chisel2 integration tests are updated to use the `challtestsrv` instead of starting their own standalone servers. This centralizes our mock challenge responses and lets us bind the `challtestsrv` to the VA's HTTP port in `startservers.py` without clashing ports later on.

New integration tests are added for HTTP-01 redirect scenarios using the updated `challtestserv`. These test cases cover:
* valid HTTP -> HTTP redirect
* valid HTTP -> HTTPS redirect
* Invalid HTTP -> non-HTTP/HTTPS port redirect
* Invalid HTTP-> non-HTTP/HTTPS protocol scheme redirect
* Invalid HTTP-> bare IP redirect
* Invalid HTTP redirect loop

The new integration tests shook out two fixes that were required for the legacy VA HTTP-01 code (afad22b) and one fix for the challtestsrv mock DNS (59b7d6d).

Resolves https://github.com/letsencrypt/boulder/issues/3956
2018-11-30 17:20:10 -05:00
Roland Bracewell Shoemaker ba7a8e8e5d Add fake Akamai purge server for integration testing (#3946)
Fixes #3916.
2018-11-27 09:49:05 -05:00
Roland Bracewell Shoemaker 9b94d4fdfe Add a orphan queue to the CA (#3832)
Retains the existing logging of orphaned certs until we are confident that this
solution can fully replace it (even then we may want to keep it just for auditing etc).

Fixes #3636.
2018-09-05 11:12:07 -07:00
Roland Bracewell Shoemaker 9ea4a54ca2 Use challtestsrv for solving TLS-ALPN-01 in integration tests (#3789)
Also in the process fix some errors I made in the original challtestsrv TLS-ALPN-01 implementation.

Fixes #3780.
2018-07-03 10:41:20 -04:00
Joel Sing 9c2859c87b Add support for CAA account-uri validation. (#3736)
This adds support for the account-uri CAA parameter as specified by
section 3 of https://tools.ietf.org/html/draft-ietf-acme-caa-04, allowing
issuance to be restricted to one or more ACME accounts as specified by CAA
records.
2018-06-08 12:08:03 -07:00
Jacob Hoffman-Andrews dbcb16543e Start using multiple-IP hostnames for load balancing (#3687)
We'd like to start using the DNS load balancer in the latest version of gRPC. That means putting all IPs for a service under a single hostname (or using a SRV record, but we're not taking that path). This change adds an sd-test-srv to act as our service discovery DNS service. It returns both Boulder IP addresses for any A lookup ending in ".boulder". This change also sets up the Docker DNS for our boulder container to defer to sd-test-srv when it doesn't know an answer.

sd-test-srv doesn't know how to resolve public Internet names like `github.com`. Resolving public names is required for the `godep-restore` test phase, so this change breaks out a copy of the boulder container that is used only for `godep-restore`.

This change implements a shim of a DNS resolver for gRPC, so that we can switch to DNS-based load balancing with the currently vendored gRPC, then when we upgrade to the latest gRPC we won't need a simultaneous config update.

Also, this change introduces a check at the end of the integration test that each backend received at least one RPC, ensuring that we are not sending all load to a single backend.
2018-05-23 09:47:14 -04:00
Daniel McCarney c254159235 challsrv: Common ACME challenge response server library/command. (#3689)
Prior to this commit we had two implementations of ACME challenge
servers for use in tests:
1) test/dns-test-srv - a small fake DNS server used for adding/removing
   DNS-01 TXT records and returning fake A/AAAA data.
2) test/load-generator/challenge-servers.go - a small library for
   providing an HTTP-01 challenge server.

This commit consolidates both into a dedicated `test/challsrv` package.
The `load-generator` code is updated to use this library package to
implement its HTTP-01 challenge server. This leaves the `load-generator`
as a nice stand alone tool that doesn't need coordination between itself
and a separate `challsrv` binary.

To keep the `dns-test-srv` use-case of a nice standalone binary that can
be run from `test/startservers.py` the `test/challsrv` package has
a `test/challsrv/cmd/challsrv` package that provides the `challsrv`
command. This is a stand-alone binary that can offer both an HTTP-01 and
a DNS-01 challenge server along with a management HTTP interface that
can be used by external programs to add/remove HTTP-01 and DNS-01
challenges.

The Boulder integration tests are updated to use `challsrv` instead of
`dns-test-srv`. Presently only the DNS-01 challenge server of `challsrv`
is used by the integration tests.

TODO: The DNS-01 challenge server is doing a fair number of non-DNS-01
challenge things (Fake host data, etc). This should be cleaned up and
made configurable.

Updates #3652
2018-05-09 12:49:13 -07:00
Jacob Hoffman-Andrews a4421ae75b Run gRPC backends on multiple IPs instead of multiple ports (#3679)
We're currently stuck on gRPC v1.1 because of a breaking change to certificate validation in gRPC 1.8. Our gRPC balancer uses a static list of multiple hostnames, and expects to validate against those hostnames. However gRPC expects that a service is one hostname, with multiple IP addresses, and validates all those IP addresses against the same hostname. See grpc/grpc-go#2012.

If we follow gRPC's assumptions, we can rip out our custom Balancer and custom TransportCredentials, and will probably have a lower-friction time in general.

This PR is the first step in doing so. In order to satisfy the "multiple IPs, one port" property of gRPC backends in our Docker container infrastructure, we switch to Docker's user-defined networking. This allows us to give the Boulder container multiple IP addresses on different local networks, and gives it different DNS aliases in each network.

In startservers.py, each shard of a service listens on a different DNS alias for that service, and therefore a different IP address. The listening port for each shard of a service is now identical.

This change also updates the gRPC service certificates. Now, each certificate that is used in a gRPC service (as opposed to something that is "only" a client) has three names. For instance, sa1.boulder, sa2.boulder, and sa.boulder (the generic service name). For now, we are validating against the specific hostnames. When we update our gRPC dependency, we will begin validating against the generic service name.

Incidentally, the DNS aliases feature of Docker allows us to get rid of some hackery in entrypoint.sh that inserted entries into /etc/hosts.

Note: Boulder now has a dependency on the DNS aliases feature in Docker. By default, docker-compose run creates a temporary container and doesn't assign any aliases to it. We now need to specify docker-compose run --use-aliases to get the correct behavior. Without --use-aliases, Boulder won't be able to resolve the hostnames it wants to bind to.
2018-05-07 10:38:31 -07:00
Roland Bracewell Shoemaker 24cd01d033 Revert to setting full addresses instead of just ports 2018-04-23 12:39:28 -07:00
Roland Bracewell Shoemaker 5c4eaf841f Review fixes 2018-04-20 16:03:55 -07:00
Roland Bracewell Shoemaker 0a86573a73 Update integration tests 2018-04-20 13:18:40 -07:00
Jacob Hoffman-Andrews 007a75f2e5 Remove multi-va errors in integration tests. (#3448)
We recently reordered the programs in startservers.py to reflect
dependencies, so we wouldn't generate gRPC errors when a process comes
up before its backend. However, we missed the remote VAs. This change
moves them to the beginning.

Also, don't print "all processes started" after each process starts.
2018-02-15 09:56:04 -05:00
Jacob Hoffman-Andrews c556a1a20d
Reduce spurious errors in integration test (#3436)
Boulder is fairly noisy about gRPC connection errors. This is a mixed
blessing: Our gRPC configuration will try to reconnect until it hits
an RPC deadline, and most likely eventually succeed. In that case,
we don't consider those to really be errors. However, in cases where
a connection is repeatedly failing, we'd like to see errors in the
logs about connection failure, rather than "deadline exceeded." So
we want to keep logging of gRPC errors.

However, right now we get a lot of these errors logged during
integration tests. They make the output hard to read, and may disguise
more serious errors. So we'd like to avoid causing such errors in
normal integration test operation.

This change reorders the startup of Boulder components by their gRPC
dependencies, so everything's backend is likely to be up and running
before it starts. It also reverses that order for clean shutdowns,
and waits for each process to exit before signalling the next one.

With these changes, I still got connection errors. Taking listenbuddy
out of the gRPC path fixed them. I believe the issue is that
listenbuddy is not a truly transparent proxy. In particular, it
accepts an inbound TCP connection before opening an outbound TCP
connection. If opening that outbound connection results in "connection
refused," it closes the inbound connection. That means gRPC sees a
"connection closed" (or "connection reset"?) rather than "connection
refused". I'm guessing it handles those cases differently, explaining
the different error results.

We've been using listenbuddy to trigger disconnects while Boulder is
running, to ensure that gRPC's reconnect code works. I think we can
probably rely on gRPC's reconnect to work. The initial problem that
led us to start testing this was a configuration problem; now that
we have the configuration we want, we should be fine and don't need
to keep testing reconnects on every integration test run.
2018-02-12 18:17:50 -08:00
Jacob Hoffman-Andrews 2dc3b56fa9
Add variable latency to ct-test-srv (#3435)
For the upcoming SCT embedding changes, it will be useful to have a CT test server that blocks for nontrivial amounts of time before responding. This change introduces a config file for `ct-test-srv` that can be used to set up multiple "personalities" on various ports. Each personality can have a "latencySchedule" that determines how long it will sleep before servicing responding to a submission.

This change also introduces two new "personalities" on :4510 and :4511, plus configures CTLogGroups in the RA. Having four CT log personalities allows us to simulate two nontrivial log groups.

Note: This triggers Publisher to emit audit errors on timed-out submissions. We may want to make Publisher not treat those as errors, and instead only log an error if a whole log group fails.
2018-02-09 13:48:19 -08:00
Jacob Hoffman-Andrews 4296dd985a Use TLS in mailer integration tests (#3213)
* Remove non-TLS support from mailer entirely
* Add a config option for trusted roots in expiration-mailer. If unset, it defaults to the system roots, so this does not need to be set in production.
* Use TLS in mail-test-srv, along with an internal root and localhost certificates signed by that root.
2017-11-06 14:57:14 -08:00
Jacob Hoffman-Andrews 4128e0d95a Add time-dependent integration testing (#3060)
Fixes #3020.

In order to write integration tests for some features, especially related to rate limiting, rechecking of CAA, and expiration of authzs, orders, and certs, we need to be able to fake the passage of time in integration tests.

To do so, this change switches out all clock.Default() instances for cmd.Clock(), which can be set manually with the FAKECLOCK environment variable. integration-test.py now starts up all servers once before the main body of tests, with FAKECLOCK set to a date 70 days ago, and does some initial setup for a new integration test case. That test case tries to fetch a 70-day-old authz URL, and expects it to 404.

In order to make this work, I also had to change a number of our test binaries to shut down cleanly in response to SIGTERM. Without that change, stopping the servers between the setup phase and the main tests caused startservers.check() to fail, because some processes exited with nonzero status.

Note: This is an initial stab at things, to prove out the technique. Long-term, I think we will want to use an idiom where test cases are classes that have a number of optional setup phases that may be run at e.g. 70 days prior and 5 days prior. This could help us avoid a proliferation of global state as we add more time-dependent test cases.
2017-09-13 12:34:14 -07:00
Jacob Hoffman-Andrews 20ec1e3e4e Filter spurious shutdown errors. (#3052)
Previously, we would produce an error an a nonzero status code on shutdown,
because gRPC's GracefulStop would cause s.Serve() to return an error. Now we
filter that specific error and treat it as success. This also allows us to kill
process with SIGTERM instead of SIGKILL in integration tests.

Fixes #2410.
2017-09-07 13:45:32 -07:00
Daniel McCarney bd3e2747ba Duplicate WFE to WFE2. (#2839)
This PR is the initial duplication of the WFE to create a WFE2
package. The rationale is briefly explained in `wfe2/README.md`.

Per #2822 this PR only lays the groundwork for further customization
and deduplication. Presently both the WFE and WFE2 are identical except
for the following configuration differences:

* The WFE offers HTTP and HTTPS on 4000 and 4430 respectively, the WFE2
  offers HTTP on 4001 and 4431.
* The WFE has a debug port on 8000, the WFE2 uses the next free "8000
  range port" and puts its debug service on 8013

Resolves https://github.com/letsencrypt/boulder/issues/2822
2017-07-05 13:32:45 -07:00
Roland Bracewell Shoemaker 088b872287 Implement multi VA validation (#2802)
Adds basic multi-path validation functionality. A new method `performRemoteValidation` is added to `boulder-va` which is called if it is configured with a list of remote VA gRPC addresses. In this initial implementation the remote VAs are only used to check the validation result of the main VA, if all of the remote validations succeed but the local validation failed, the overall validation will still fail. Remote VAs use the exact same code as the local VA to perform validation. If the local validation succeeds then a configured quorum of the remote VA successes must be met in order to fully complete the validation.

This implementation assumes that metrics are collected from the remote VAs in order to have visibility into their individual validation latencies etc.

Fixes #2621.
2017-06-29 14:11:01 -07:00
Roland Bracewell Shoemaker a46d30945c Purge remaining AMQP code (#2648)
Deletes github.com/streadway/amqp and the various RabbitMQ setup tools etc. Changes how listenbuddy is used to proxy all of the gRPC client -> server connections so we test reconnection logic.

+49 -8,221 😁

Fixes #2640 and #2562.
2017-04-04 15:02:22 -07:00
Patrick Figel 6ba8aadfd7 Use X.509 AIA Issuer URL in rel="up" link header (#2545)
In order to provide the correct issuer certificate for older certificates after an issuer certificate rollover or when using multiple issuer certificates (e.g. RSA and ECDSA), use the AIA CA Issuer URL embedded in the certificate for the rel="up" link served by WFE. This behaviour is gated behind the UseAIAIssuerURL feature, which defaults to false.

To prevent MitM vulnerabilities in cases where the AIA URL is HTTP-only, it is upgraded to HTTPS.

This also adds a test for the issuer URL returned by the /acme/cert endpoint. wfe/test/178.{crt,key} were regenerated to add the AIA extension required to pass the test.

/acme/cert was changed to return an absolute URL to the issuer endpoint (making it consistent with /acme/new-cert).

Fixes #1663
Based on #1780
2017-02-07 11:19:22 -08:00
Daniel McCarney 15e73edc5a Google Safe Browsing V4 Improvements (#2504)
This PR has three primary contributions:

1. The existing code for using the V4 safe browsing API introduced in #2446 had some bugs that are fixed in this PR.
2. A gsb-test-srv is added to provide a mock Google Safebrowsing V4 server for integration testing purposes.
3. A short integration test is added to test end-to-end GSB lookup for an "unsafe" domain.

For 1) most notably Boulder was assuming the new V4 library accepted a directory for its database persistence when it instead expects an existing file to be provided. Additionally the VA wasn't properly instantiating feature flags preventing the V4 api from being used by the VA.

For 2) the test server is designed to have a fixed set of "bad" domains (Currently just honest.achmeds.discount.hosting.com). When asked for a database update by a client it will package the list of bad domains up & send them to the client. When the client is asked to do a URL lookup it will check the local database for a matching prefix, and if found, perform a lookup against the test server. The test server will process the lookup and increment a count for how many times the bad domain was asked about.

For 3) the Boulder startservers.py was updated to start the gsb-test-srv and the VA is configured to talk to it using the V4 API. The integration test consists of attempting issuance for a domain pre-configured in the gsb-test-srv as a bad domain. If the issuance succeeds we know the GSB lookup code is faulty. If the issuance fails, we check that the gsb-test-srv received the correct number of lookups for the "bad" domain and fail if the expected isn't reality.

Notes for reviewers:

* The gsb-test-srv has to be started before anything will use it. Right now the v4 library handles database update request failures poorly and will not retry for 30min. See google/safebrowsing#44 for more information.
* There's not an easy way to test for "good" domain lookups, only hits against the list. The design of the V4 API is such that a list of prefixes is delivered to the client in the db update phase and if the domain in question matches no prefixes then the lookup is deemed unneccesary and not performed. I experimented with sending 256 1 byte prefixes to try and trick the client to always do a lookup, but the min prefix size is 4 bytes and enumerating all possible prefixes seemed gross.
* The test server has a /add endpoint that could be used by integration tests to add new domains to the block list, but it isn't being used presently. The trouble is that the client only updates its database every 30 minutes at present, and so adding a new domain will only take affect after the client updates the database.

Resolves #2448
2017-01-23 11:07:20 -08:00
Jacob Hoffman-Andrews 16ab736c07 Temporarily switch to SIGKILL for startservers shutdown. (#2512)
Unfortunately our clean shutdown code paths are too noisy, and often obscure
real errors. We can turn this back to SIGTERM once that's fixed.
2017-01-19 16:45:43 -08:00
Jacob Hoffman-Andrews 0a367962d6 Make restarting boulder in docker nicer. (#2492)
* Make restarting boulder in docker nicer.

Handle SIGTERM in startservers.py.
Forcibly remove rsyslog pid to avoid error.

* Add explanatory comment.

* Send SIGTERM instead of kill.

* Further improvements.

- Handle SIGINT too.
- Use unbuffered mode for Python so the print statements (like "all servers
  running") get printed right away rather than at shutdown
- Squelch an unnecessary OSError about interrupting the wait() call.
2017-01-13 11:55:28 -05:00
Jacob Hoffman-Andrews 5407a45b02 Revert "Disable fail-fast for gRPC. (#2397)" (#2427)
This reverts commit 5b865f1d63.

The QueueDeclare and QueueBind calls in that change caused AMQP permission
denied errors.
2016-12-13 13:20:08 -08:00
Jacob Hoffman-Andrews 5b865f1d63 Disable fail-fast for gRPC. (#2397)
This allows us to restart backends with relatively little interruption in
service, provided the backends come up promptly.

Fixes #2389 and #2408
2016-12-09 12:03:45 -08:00
Roland Bracewell Shoemaker e2155388a1 Remove caa-checker from the tree (#2351)
The VA can internally check CAA and this additional code was deemed unneeded complexity that could be hoisted outside of Boulder. Fixes #2346.
2016-11-23 08:42:33 -05:00
Jacob Hoffman-Andrews 557e7b1b5e Quick fix for flaky integration tests. (#2325)
Until recently, no RPCs would be performed during integration tests until all
servers were up and running. In https://github.com/letsencrypt/boulder/pull/2246
we added code so that RA now calls SA.CountCertificatesRange immediately on
startup. This should be fine - the message should be queued by AMQP and get
handled when SA starts up, typically is less than 3 seconds. However, this
caused a flaky test failure in CI. It's possible in some of these cases that SA
is actually taking 3 seconds to start, or it's possible that there is a bug that
depends on start order.

This change moves the SA to start earlier, which seems to reduce flakiness but
is not a real fix to the problem.
2016-11-10 15:08:25 -08:00
Jacob Hoffman-Andrews 87fee12d6c Improve single-ocsp command (#2181)
Output base64-encoded DER, as expected by ocsp-responder.
Use flags instead of template for Status, ThisUpdate, NextUpdate.
Provide better help.
Remove old test (wasn't run automatically).
Add it to integration test, and use its output for integration test of issuer ocsp-responder.

Add another slot to boulder-tools HSM image, to store root key.
2016-09-15 15:28:54 -07:00
Daniel McCarney a584f8de46 Allow `mailer` to reconnect to server. (#2101)
The `MailerImpl` gains a few new fields (`retryBase`, &  `retryMax`). These are used with `core.RetryBackoff` in `reconnect()` to implement exponential backoff in a reconnect attempt loop. Both `expiration-mailer` and `notify-mailer` are modified to add CLI args for these 2 flags and to wire them into the `MailerImpl` via its `New()` constructor.

In `MailerImpl`'s `SendMail()` function it now detects when `sendOne` returns an `io.EOF` error indicating that the server closed the connection unexpectedly. When this case occurs `reconnect()` is invoked. If the reconnect succeeds then we invoke `sendOne` again to try and complete the message sending operation that was interrupted by the disconnect.

For integration testing purposes I modified the `mail-test-srv` to support a `-closeChance` parameter between 0 and 100. This controls what % of `MAIL` commands will result in the server immediately closing the client connection before further processing. This allows us to simulate a flaky mailserver. `test/startservers.py` is modified to start the `mail-test-srv` with a 35% close chance to thoroughly test the reconnection logic during the existing `expiration-mailer` integration tests. I took this as a chance to do some slight clean-up of the `mail-test-srv` code (mostly removing global state).

For unit testing purposes I modified the mailer `TestConnect` test to abstract out a server that can operate similar to `mail-test-serv` (e.g. can close connections artificially).

This is testing a server that **closes** a connection, and not a server that **goes away/goes down**. E.g. the `core.RetryBackoff` sleeps themselves are not being tested. The client is disconnected and attempts a reconnection which always succeeds on the first try. To test a "gone away" server would require a more substantial rewrite of the unit tests and the `mail-test-srv`/integration tests. I think this matches the experience we have with MailChimp/Mandril closing long lived connections.
2016-08-15 14:14:49 -07:00
Ben Irving 159aeca64e Split up boulder-config.json (Single OCSP) + Cleanup (#2069)
This PR removes the use of the global configuration variable BOULDER_CONFIG. It also removes the global configuration struct cmd.Config. Furthermore, it removes the dependency codegangsta/cli and the last bit of code that was using it cmd/single-ocsp/main.go.

This is the final (hopefully) pull request in the work to remove the reliance on a global configuration structure. Included below is a history of all other pull requests relevant in accomplishing this:

 WFE (#1973)
 RA (#1974)
 SA (#1975)
 CA (#1978)
 VA (#1979)
 Publisher (#2008)
 OCSP Updater (#2013)
 OCSP Responder (#2017)
 Admin Revoker (#2053)
 Expiration Mailer (#2036)
 Cert Checker (#2058)
 Orphan Finder (#2059)
 Single OCSP (this PR)

Closes #1962
2016-07-22 12:39:29 -07:00
Roland Bracewell Shoemaker a0a9623cb6 Switch to using SoftHSM in Docker for testing (#1920)
Instead of reading the CA key from a file on disk into memory and using that for signing in `boulder-ca` this patch adds a new Docker container that runs SoftHSM and pkcs11-proxy in order to hold the key and perform signing operations. The pkcs11-proxy module is used by `boulder-ca` to talk to the SoftHSM container.

This exercises (almost) the full pkcs11 path through boulder and will allow testing various HSM related failures in the future as well as simplifying tuning signing performance for benchmarking.

Fixes #703.
2016-07-11 11:20:51 -07:00
Ben Irving 6007df8f3c Split up boulder-config.json (WFE) (#1973)
Moves the wfe to it's own config file.

Each config will now belong in `test/config` and `test/config-next` analogous to `boulder-config` and `boulder-config-next`.
2016-06-28 10:40:16 -07:00
Jacob Hoffman-Andrews 6b4c3bf63a Pass through BOULDER_CONFIG in .travis.yml (#1954)
Moving to Docker meant that we weren't passing through the BOULDER_CONFIG
variable properly, which meant we weren't testing the boulder-config-next.json
configuration. That allowed https://github.com/letsencrypt/boulder/issues/1948
to pass unnoticed.

Credit to @benileo for asking the question of why that issue wasn't caught in
testing, which led to this fix. Thanks for the attention to detail!
2016-06-22 11:25:01 -07:00
Jacob Hoffman-Andrews e804c18a06 Fix environment passing in startservers.py. (#1907)
In https://github.com/letsencrypt/boulder/pull/1885 I tried to simplify setting the GORACE environment variable, but that actually had the effect of inhibiting pass-through of all environment variables. This reverts that part of the change.
2016-06-08 16:22:21 -04:00
Jacob Hoffman-Andrews 163d9547f4 Remove the agreement flag from test.js. (#1885)
Since we only use this for testing, not a live client, it's unnecessary
complexity.
2016-06-06 13:19:57 -07:00
Jacob Hoffman-Andrews 6f082f397b Improve error logging in test.js (#1829)
Also fix a typo in startservers.py and quote variables in Makefile (provides more meaningful errors when they are unset).
2016-05-19 15:54:53 -07:00
Roland Bracewell Shoemaker 8eaf247ee9 Split CAA checking out to its own service (#1647)
* Split out CAA checking service (minus logging etc)
* Add example.yml config + follow general Boulder style
* Update protobuf package to correct version
* Add grpc client to va
* Add TLS authentication in both directions for CAA client/server
* Remove go lint check
* Add bcodes package listing custom codes for Boulder
* Add very basic (pull-only) gRPC metrics to VA + caa-service
2016-04-12 23:02:41 -07:00
Jacob Hoffman-Andrews d98eb634d1 Docker improvements.
Use bridged networking.

Add some files to .dockerignore to shrink the build state sent to Docker
daemon.

Use specific hostnames to contact services, rather than localhost.

Add instructions for adding those hostnames to /etc/hosts in non-Docker config.

Use DSN-style connect strings for DBs.

Remove localhost / 127.0.0.1 rewrite hack from create_db.sh.

Add hosts section with new hostnames.

Remove bin from .dockerignore.

SQL grants go to %

Short-circuit DB creation if already existing.

Make `go install` a part of Docker image build so that Docker run is much
faster.

Bind to 0.0.0.0 for OCSP responders so they can be reached from host, and
publish / expose their ports.

Remove ToSServerThread and test.js' fetch of ToS.

Increase the registrationsPerIP rate limit threshold. When issuing from a Docker
host, the 127.0.0.1 override doesn't apply, so the limit is quickly hit.

Update docker-compose for bridged networking. Note: docker-compose doesn't currently work, but should be close.

https://github.com/letsencrypt/boulder/pull/1639
2016-04-04 16:05:08 -07:00
Kane York 98567efdfc Add integration tests for expiry mailer
This creates a new server, 'mail-test-srv', which is a simplistic SMTP
server that accepts mail and can report the received mail over HTTP.

An integration test is added that uses the new server to test the expiry
mailer.

The FAKECLOCK environment variable is used to force the expiry mailer to
think that the just-issued certificate is about to expire.

Additionally, the expiry mailer is modified to cleanly shut down its
SMTP connections.
2016-03-25 10:02:02 -07:00
Kane York 9e4066e0c7 Remove std_json build tag
After a review of the logs, it seems that no clients are using
capitalized or duplicate keys in the JWS bodies. Remove the std_json
build tag.
2016-03-22 14:00:33 -07:00
Roland Shoemaker 4d8c7a323f Set std_json build flag in order to preserve case insensitive JSON key parsing 2016-03-15 14:25:03 -07:00
Kane York a6317d1717 Introduce cmd.Clock() for use in integration tests
If the FAKECLOCK environment variable is set, and the build was in a
test environment, cmd.Clock will return a FakeClock with the time set to
the content of the environment variable.

The choice of the UnixDate format was because `date -d` is a common
choice for shell scripts.
2016-03-07 14:52:34 -08:00
Jacob Hoffman-Andrews f67648d22f Disable activity-monitor.
We no longer run this in prod, so we shouldn't run it in test / dev.
2016-01-05 14:50:25 -08:00
Jacob Hoffman-Andrews 9e4b0c1e5b Move RabbitMQ initialization into its own binary.
Previously our executables would all try to declare the boulder exchange on
startup, which may have been leading to some race conditions in Travis. Also,
the Activity Monitor would try to bind a queue to the exchange at startup.
In prod both of these tasks are taken care of administratively, so including
them in the app code was adding unnecessary complexity. It also may have been
part of an issue causing Activity Monitor to fail to start up recently.

Also, turn the Activity Monitor into an RPC service, which gets it reconnects
for free, and add it to startservers.py.
2015-11-29 16:55:03 -08:00
Jeff Hodges baaa8a6209 check for wfe port liveness in integration tests
We're getting many more spurious "Can't connect to WFE" errors in
TravisCI. So, we add the WFE's main port to the port liveness check in
amqp-integration-test.py.

Fixes #1099
2015-11-05 15:32:05 -08:00
Jacob Hoffman-Andrews 194e421931 Add reconnects in AMQP. 2015-10-27 19:54:54 -07:00
Jacob Hoffman-Andrews 17918010dc Allow override of all build flags in Makefile.
I think even the ldflags that did not change between subsequent invocations of
./start.py, e.g. BUILD_HOST_VAR, were different between ./start.py and
`go test ./...`, which would cause test runs to be unnecessarily slow.

Open question: To keep local developer builds fast, maybe we should enable race
detection only in Travis? Otherwise, `go test ./...` runs with one set of
ldflags, and then `ampq-integration-test.py` runs with a different set, which I
think makes both of them slower.
2015-10-03 12:45:06 -07:00
Roland Shoemaker 3be29f8288 Remove cmd/ prefix 2015-10-01 17:30:49 -07:00
Roland Shoemaker 9dc7b2d682 Merge master 2015-10-01 17:23:48 -07:00
Roland Shoemaker 2d0dee4ce1 Daemonize the OCSP updater tool so we are constantly updating OCSP responses.
also moves the first OCSP responses generation from the CA to the OCSP updater. This patch lays the
ground work for moving CT submission and adding CT backfill to the OCSP updater.
2015-10-01 16:36:51 -07:00
Jacob Hoffman-Andrews b3aca1ff2b Speed up tests.
Make `make` aware of output files so it doesn't always have to rebuild. Also
make it use `go install`, which is faster than building files individually.

Now that make is faster, use it in startservers.py to consolidate building
logic. This also has the handy side-effect that ./start.py exposes useful build
information through /build, whereas before only the .rpm packaged version did.

Additionally, this allows us to remove `make` from the Travis matrix, since we
are running `make` as part of the integration test. This means each PR only
triggers two Travis builds instead of one, which means we will get results from
Travis faster.

Also, change the Travis matrix logic to be a list of actions to run, rather than
a list of actions to skip. That fixes
https://github.com/letsencrypt/boulder/issues/817.

Enumerate specific sections of test.sh to run, rather than sections to skip.

Note: ./start.py now installs into ./bin/ instead of $GOPATH/bin.

Only set up GitHub secret file (for PR status reporting) when available, and
decrypt it into /tmp rather than $HOME, to avoid accidentally caching it once
Travis' caching features are available.

Clone letsencrypt repo into $HOME instead of $TMP, to make it possible to cache
eventually.

Remove unused `mysql` dependency in Travis.

Override default Travis install command to prevent it from adding
Godeps/_workspace to GOPATH. When that happens, it hides failures that should
arise from importing non-vendorized paths.
2015-10-01 16:28:17 -07:00
Jacob Hoffman-Andrews 540c792474 Add an OCSP responder that serves from a file.
This is useful for intermediate and root OCSP, which are generated manually one
a year.
2015-09-23 16:34:13 -07:00
Jacob Hoffman-Andrews 5666b5a59a Add dummy CT log server for integration testing. 2015-09-22 17:10:38 -07:00
Roland Shoemaker ff6eca7a29 Submit all issued certificates to configured CT logs
Adds a new service, Publisher, which exists to submit issued certificates to various Certificate Transparency logs. Once submitted the Publisher will also parse and store the returned SCT (Signed Certificate Timestamp) receipts that are used to prove inclusion in a specific log in the SA database. A SA migration adds the new SCT receipt table.

The Publisher only exposes one method, SubmitToCT, which is called in a goroutine by ca.IssueCertificate as to not block any other issuance operations. This method will iterate through all of the configured logs attempting to submit the certificate, and any required intermediate certificates, to them. If a submission to a log fails it will be retried the pre-configured number of times and will either use a back-off set in a Retry-After header or a pre-configured back-off between submission attempts.

This changeset is the first of a number of changes ending with serving SCT receipts in OCSP responses and purposefully leaves out the following pieces for follow-up PRs.

* A fake CT server for integration testing
* A external tool to search the database for certificates lacking a full set of SCT receipts
* A method to construct X.509 v3 extensions containing receipts for the OCSP responder
* Returned SCT signature verification (beyond just checking that the signature is of the correct type so we aren't just serving arbitrary binary blobs to clients)

Resolves #95.
2015-09-17 18:11:05 -07:00
Jacob Hoffman-Andrews fc70f00fb3 Restore `exec` command to startservers.py.
Fixes https://github.com/letsencrypt/boulder/issues/671
2015-08-27 12:56:36 -07:00
Jacob Hoffman-Andrews 02c22c40aa Fix error output in startservers.
Previously startservers would crash with an error about concatenating NoneType and string, if there was a build erro.r
2015-08-26 10:12:29 -07:00
Jeff Hodges 469253a9e3 fix some dregs in startservers.py
Changes this to use just communicate(), not the subprocess.PIPE stuff (which
apparently can do Weird Things)

Also rename the install variable to cmd in the install function
2015-08-25 21:42:48 -07:00
Jeff Hodges 3a4fef4463 install boulder cmds in one cmd in startserver.py
This eases the CPU and thread requirements of our tests (by forking
less, not doing everything at once). It should also speed up the tests
by avoiding certain repetitive work.

Updates https://github.com/letsencrypt/letsencrypt/issues/712
2015-08-25 16:02:08 -07:00
Jacob Hoffman-Andrews f6c21120b0 Add OCSP testing to integration test. 2015-08-20 09:37:24 -07:00
Jacob Hoffman-Andrews bcfb935472 Fail startservers.py when compile fails. 2015-08-07 17:55:43 -07:00
Jacob Hoffman-Andrews 9b20f0afaf Startservers.py: remove tempdir, add sys.exit 2015-07-29 11:15:01 -07:00
Jacob Hoffman-Andrews 237f759ac9 Use go install for even more speed. 2015-07-28 18:29:39 -07:00
Jacob Hoffman-Andrews d69f97e954 Fix exception handling. 2015-07-28 18:11:52 -07:00
Jacob Hoffman-Andrews a4c4b473f1 Speed up start.py and integration test.
Run builds in parallell as well as starting servers in parallel.
Wait for the servers to come up, so tests don't start running too early.
Enable race detection only for the integration test, not for start.py.
Previously I'd suggested it should always be on, but after running with it for a
while I'm convinced it's too slow for start.py (but still very valuable for
integration tests!).
2015-07-28 18:07:22 -07:00
Tom Clegg 2914ba6af5 Fix "main process kept alive forever by ToSServerThread." 2015-07-25 18:17:02 -04:00
Tom Clegg e6ca449d34 Bring up a stub ToS server in test scripts. 2015-07-25 16:21:40 -04:00
Tom Clegg e871b30cbf Shut down everything if any server exits before ^C/timer. Fixup log messages. 2015-07-25 15:59:38 -04:00
Tom Clegg 43c738cc93 Set GORACE env var only in "go build", not everywhere. 2015-07-25 14:51:22 -04:00
Tom Clegg de5cce8c03 De-duplicate start.py and test/amqp-integration-test.py 2015-07-25 04:04:20 -04:00