boulder

Commit Graph

Author	SHA1	Message	Date
Roland Bracewell Shoemaker	af41bea99a	Switch to more efficient multi nonce-service design (#4308 ) Basically a complete re-write/re-design of the forwarding concept introduced in #4297 (sorry for the rapid churn here). Instead of nonce-services blindly forwarding nonces around to each other in an attempt to find out who issued the nonce we add an identifying prefix to each nonce generated by a service. The WFEs then use this prefix to decide which nonce-service to ask to validate the nonce. This requires a slightly more complicated configuration at the WFE/2 end, but overall I think ends up being a way cleaner, more understandable, easy to reason about implementation. When configuring the WFE you need to provide two forms of gRPC config: * one gRPC config for retrieving nonces, this should be a DNS name that resolves to all available nonce-services (or at least the ones you want to retrieve nonces from locally, in a two DC setup you might only configure the nonce-services that are in the same DC as the WFE instance). This allows getting a nonce from any of the configured services and is load-balanced transparently at the gRPC layer. * a map of nonce prefixes to gRPC configs, this maps each individual nonce-service to its prefix and allows the WFE instances to figure out which nonce-service to ask to validate a nonce it has received (in a two DC setup you'd want to configure this with all the nonce-services across both DCs so that you can validate a nonce that was generated by a nonce-service in another DC). This balancing is implemented in the integration tests. Given the current remote nonce code hasn't been deployed anywhere yet this makes a number of hard breaking changes to both the existing nonce-service code, and the forwarding code. Fixes #4303.	2019-06-28 12:58:46 -04:00
Roland Bracewell Shoemaker	844ae26b65	Allow forwarding of nonce-service Redeem RPCs from one service… (#4297 ) Fixes #4295.	2019-06-26 13:04:31 -07:00
Jacob Hoffman-Andrews	df19fd9e58	Integration test for v1 authz reuse when v2 flag is enabled (#4288 ) When NewAuthorizationSchema is enabled, we still want v1 authzs to be reusable in new orders. This tests that that code is implemented correctly. Updates #4241	2019-06-25 10:50:58 -07:00
Roland Bracewell Shoemaker	24f150f8fc	Re-apply #4279 with requests fix (#4286 ) Move from using `requests` to `urllib2` in `helpers.py`. Verified this works with `docker-compose up`. In the future we really should be installing our own python dependencies in the boulder-tools image rather than relying on getting them by using the certbot virtualenv.	2019-06-24 11:58:37 -07:00
Adrien Ferrand	8e31d58113	Revert "tests: Switch to instant OCSP verification in int. tests (#4279 )" (#4285 ) This reverts commit `f4b9235acb`. Fixes #4284	2019-06-23 12:06:16 -07:00
Roland Bracewell Shoemaker	f4b9235acb	tests: Switch to instant OCSP verification in int. tests (#4279 ) * Switch to instant OCSP verification in integration tests * Move waitport to helpers and use it to determine if ocsp-responder is alive in test_single_ocsp	2019-06-21 09:53:01 -04:00
Jacob Hoffman-Andrews	18a3c78d6f	Refactor test_caa and twenty-days-ago setup (#4261 ) As part of #4241, I need to introduce some twenty-days-ago setup. So I refactored the only current instance (test_caa) to use a style where setup functions can be registered right next to the test cases they affect. The @register_twenty_days_ago is Python for "call register_twenty_days_ago with the thing on the next line as an argument." I also cleaned up a bunch of related stuff: * Removed the ACCOUNT_URI environment variable and associated function params. This was introduced in #3736 to pass a URI to challtestsrv before we refactored for more dynamic updates. It's not used any more. * Removed a try / except from startChallSrv that needlessly hid errors. * Move setting of DNS fixtures for caa_test into the test case itself.	2019-06-18 14:58:06 -07:00
Roland Bracewell Shoemaker	4ca01b5de3	Implement standalone nonce service (#4228 ) Fixes #3976.	2019-06-05 10:41:19 -07:00
Daniel McCarney	b99b35009e	load-generator: support all challenge types, run in CI. (#4140 ) ## CI: restore load-generator run. This restores running the `load-generator` during CI to make sure it doesn't bitrot. It was previously removed while we debugged the VA getting jammed up and not cleanly shutting down. Since the global `pebble-challtestsrv` and the `load-generator`'s internal chall test srv will conflict this requires moving the `load-generator` run to the end of integration tests and updating `startservers.py` to allow the load gen integration test code to stop the `pebble-challtestsrv` before starting the `load-generator`. The `load-generator` and associated config are updated to allow specifying bind addresses for the DNS interface of the internal challtestsrv. Multiple addresses are supported so that the `load-generator`'s chall test srv can listen on the DNS ports Boulder is configured to use. The `load-generator` config now accepts a `fakeDNS` parameter that can be used to specify the default IPv4 address returned by the `load-generator`'s DNS server for A queries. ## load-generator: support different challenges/strategies. Updates the load-generator to support HTTP-01, DNS-01, and TLS-ALPN-01 challenge response servers. A new challenge selection configuration parameter (`ChallengeStrategy`) can be set to `"http-01"`, `"dns-01"`, or `"tls-alpn-01"` to solve only challenges of that type. Using `"random"` will let the load-generator choose a challenge type randomly. Resolves https://github.com/letsencrypt/boulder/issues/3900	2019-04-04 11:44:14 -07:00
Jacob Hoffman-Andrews	cb86f9e850	Copy boulder-va to boulder-remoteva. (#4128 ) This will make it easier to distinguish in logs.	2019-03-20 16:07:51 -04:00
Jacob Hoffman-Andrews	677b9b88ad	Remove GSB support. (#4115 ) This is no longer enabled in prod; cleaning up the code. https://community.letsencrypt.org/t/let-s-encrypt-no-longer-checking-google-safe-browsing/82168	2019-03-15 10:24:44 -07:00
Daniel McCarney	1c0be52e53	VA: Add integration test for HTTP timeouts. (#4050 ) Also update `TestHTTPTimeout` to test with the `SimplifiedVAHTTP` feature flag enabled.	2019-02-12 13:42:01 -08:00
Roland Bracewell Shoemaker	046955e99c	Add a standalone akamai purger service (#4040 ) Fixes #4030.	2019-02-05 09:00:31 -08:00
Daniel McCarney	f72c371bdc	Set pebble-challtestsrv IP from FAKE_DNS at startup. (#3984 ) `pebble-challtestsrv` added a `-defaultIPv4` arg we can use to simplify the integration tests and fix FAKE_DNS usage outside of integration tests. A new boulder-tools image with an updated `pebble-challtestsrv` is used and `test/startservers.py` is changed to populate `-defaultIPv4` via the `FAKE_DNS` env var.	2018-12-13 13:49:12 -05:00
Daniel McCarney	893e8459d6	Use pebble-challtestsrv cmd, letsencrypt/challtestsrv package. (#3980 ) Now that Pebble has a `pebble-challtestsrv` we can remove the `challtestsrv` package and associated command from Boulder. I switched CI to use `pebble-challtestsrv`. Notably this means that we have to add our expected mock data using the HTTP management interface. The Boulder-tools images are regenerated to include the `pebble-challtestsrv` command. Using this approach also allows separating the TLS-ALPN-01 and HTTPS HTTP-01 challenges by binding each challenge type in the `pebble-challtestsrv` to different interfaces both using the same VA HTTPS port. Mock DNS directs the VA to the correct interface. The load-generator command that was previously using the `challtestsrv` package from Boulder is updated to use a vendored copy of the new `github.com/letsencrypt/challtestsrv` package. Vendored dependencies change in two ways: 1) Gomock is updated to the latest release (matching what the Boulder-tools image provides) 2) A couple of new subpackages in `golang.org/x/net/` are added by way of transitive dependency through the challtestsrv package. Unit tests are confirmed to pass for `gomock`: ``` ~/go/src/github.com/golang/mock/gomock$ git log --pretty=format:'%h' -n 1 51421b9 ~/go/src/github.com/golang/mock/gomock$ go test ./... ok github.com/golang/mock/gomock 0.002s ? github.com/golang/mock/gomock/internal/mock_matcher [no test files] ``` For `/x/net` all tests pass except two `/x/net/icmp` `TestDiag.go` test cases that we have agreed are OK to ignore. Resolves https://github.com/letsencrypt/boulder/issues/3962 and https://github.com/letsencrypt/boulder/issues/3951	2018-12-12 14:32:56 -05:00
Daniel McCarney	bd4c254942	Use Challtestsrv for HTTP-01 integration tests, add redirect tests (#3960 ) To complete https://github.com/letsencrypt/boulder/issues/3956 the `challtestsrv` is updated such that its existing TLS-ALPN-01 challenge test server will serve HTTP-01 responses with a self-signed certificate when a non-TLS-ALPN-01 request arrives. This lets the TLS-ALPN-01 challenge server double as a HTTPS version of the HTTP challenge server. The `challtestsrv` now also supports adding/removing redirects that will be served to clients when requesting matching paths. The existing chisel/chisel2 integration tests are updated to use the `challtestsrv` instead of starting their own standalone servers. This centralizes our mock challenge responses and lets us bind the `challtestsrv` to the VA's HTTP port in `startservers.py` without clashing ports later on. New integration tests are added for HTTP-01 redirect scenarios using the updated `challtestsrv`. These test cases cover: * valid HTTP -> HTTP redirect * valid HTTP -> HTTPS redirect * Invalid HTTP -> non-HTTP/HTTPS port redirect * Invalid HTTP -> non-HTTP/HTTPS protocol scheme redirect * Invalid HTTP -> bare IP redirect * Invalid HTTP redirect loop The new integration tests shook out two fixes that were required for the legacy VA HTTP-01 code (`afad22b`) and one fix for the challtestsrv mock DNS (`59b7d6d`). Resolves https://github.com/letsencrypt/boulder/issues/3956	2018-11-30 17:20:10 -05:00
Roland Bracewell Shoemaker	ba7a8e8e5d	Add fake Akamai purge server for integration testing (#3946 ) Fixes #3916.	2018-11-27 09:49:05 -05:00
Roland Bracewell Shoemaker	9b94d4fdfe	Add an orphan queue to the CA (#3832 ) Retains the existing logging of orphaned certs until we are confident that this solution can fully replace it (even then we may want to keep it just for auditing etc). Fixes #3636.	2018-09-05 11:12:07 -07:00
Roland Bracewell Shoemaker	9ea4a54ca2	Use challtestsrv for solving TLS-ALPN-01 in integration tests (#3789 ) Also in the process fix some errors I made in the original challtestsrv TLS-ALPN-01 implementation. Fixes #3780.	2018-07-03 10:41:20 -04:00
Joel Sing	9c2859c87b	Add support for CAA account-uri validation. (#3736 ) This adds support for the account-uri CAA parameter as specified by section 3 of https://tools.ietf.org/html/draft-ietf-acme-caa-04, allowing issuance to be restricted to one or more ACME accounts as specified by CAA records.	2018-06-08 12:08:03 -07:00
Jacob Hoffman-Andrews	dbcb16543e	Start using multiple-IP hostnames for load balancing (#3687 ) We'd like to start using the DNS load balancer in the latest version of gRPC. That means putting all IPs for a service under a single hostname (or using a SRV record, but we're not taking that path). This change adds an sd-test-srv to act as our service discovery DNS service. It returns both Boulder IP addresses for any A lookup ending in ".boulder". This change also sets up the Docker DNS for our boulder container to defer to sd-test-srv when it doesn't know an answer. sd-test-srv doesn't know how to resolve public Internet names like `github.com`. Resolving public names is required for the `godep-restore` test phase, so this change breaks out a copy of the boulder container that is used only for `godep-restore`. This change implements a shim of a DNS resolver for gRPC, so that we can switch to DNS-based load balancing with the currently vendored gRPC, then when we upgrade to the latest gRPC we won't need a simultaneous config update. Also, this change introduces a check at the end of the integration test that each backend received at least one RPC, ensuring that we are not sending all load to a single backend.	2018-05-23 09:47:14 -04:00
Daniel McCarney	c254159235	challsrv: Common ACME challenge response server library/command. (#3689 ) Prior to this commit we had two implementations of ACME challenge servers for use in tests: 1) test/dns-test-srv - a small fake DNS server used for adding/removing DNS-01 TXT records and returning fake A/AAAA data. 2) test/load-generator/challenge-servers.go - a small library for providing an HTTP-01 challenge server. This commit consolidates both into a dedicated `test/challsrv` package. The `load-generator` code is updated to use this library package to implement its HTTP-01 challenge server. This leaves the `load-generator` as a nice stand alone tool that doesn't need coordination between itself and a separate `challsrv` binary. To keep the `dns-test-srv` use-case of a nice standalone binary that can be run from `test/startservers.py` the `test/challsrv` package has a `test/challsrv/cmd/challsrv` package that provides the `challsrv` command. This is a stand-alone binary that can offer both an HTTP-01 and a DNS-01 challenge server along with a management HTTP interface that can be used by external programs to add/remove HTTP-01 and DNS-01 challenges. The Boulder integration tests are updated to use `challsrv` instead of `dns-test-srv`. Presently only the DNS-01 challenge server of `challsrv` is used by the integration tests. TODO: The DNS-01 challenge server is doing a fair number of non-DNS-01 challenge things (Fake host data, etc). This should be cleaned up and made configurable. Updates #3652	2018-05-09 12:49:13 -07:00
Jacob Hoffman-Andrews	a4421ae75b	Run gRPC backends on multiple IPs instead of multiple ports (#3679 ) We're currently stuck on gRPC v1.1 because of a breaking change to certificate validation in gRPC 1.8. Our gRPC balancer uses a static list of multiple hostnames, and expects to validate against those hostnames. However gRPC expects that a service is one hostname, with multiple IP addresses, and validates all those IP addresses against the same hostname. See grpc/grpc-go#2012. If we follow gRPC's assumptions, we can rip out our custom Balancer and custom TransportCredentials, and will probably have a lower-friction time in general. This PR is the first step in doing so. In order to satisfy the "multiple IPs, one port" property of gRPC backends in our Docker container infrastructure, we switch to Docker's user-defined networking. This allows us to give the Boulder container multiple IP addresses on different local networks, and gives it different DNS aliases in each network. In startservers.py, each shard of a service listens on a different DNS alias for that service, and therefore a different IP address. The listening port for each shard of a service is now identical. This change also updates the gRPC service certificates. Now, each certificate that is used in a gRPC service (as opposed to something that is "only" a client) has three names. For instance, sa1.boulder, sa2.boulder, and sa.boulder (the generic service name). For now, we are validating against the specific hostnames. When we update our gRPC dependency, we will begin validating against the generic service name. Incidentally, the DNS aliases feature of Docker allows us to get rid of some hackery in entrypoint.sh that inserted entries into /etc/hosts. Note: Boulder now has a dependency on the DNS aliases feature in Docker. By default, docker-compose run creates a temporary container and doesn't assign any aliases to it. We now need to specify docker-compose run --use-aliases to get the correct behavior. 
Without --use-aliases, Boulder won't be able to resolve the hostnames it wants to bind to.	2018-05-07 10:38:31 -07:00
Roland Bracewell Shoemaker	24cd01d033	Revert to setting full addresses instead of just ports	2018-04-23 12:39:28 -07:00
Roland Bracewell Shoemaker	5c4eaf841f	Review fixes	2018-04-20 16:03:55 -07:00
Roland Bracewell Shoemaker	0a86573a73	Update integration tests	2018-04-20 13:18:40 -07:00
Jacob Hoffman-Andrews	007a75f2e5	Remove multi-va errors in integration tests. (#3448 ) We recently reordered the programs in startservers.py to reflect dependencies, so we wouldn't generate gRPC errors when a process comes up before its backend. However, we missed the remote VAs. This change moves them to the beginning. Also, don't print "all processes started" after each process starts.	2018-02-15 09:56:04 -05:00
Jacob Hoffman-Andrews	c556a1a20d	Reduce spurious errors in integration test (#3436 ) Boulder is fairly noisy about gRPC connection errors. This is a mixed blessing: Our gRPC configuration will try to reconnect until it hits an RPC deadline, and most likely eventually succeed. In that case, we don't consider those to really be errors. However, in cases where a connection is repeatedly failing, we'd like to see errors in the logs about connection failure, rather than "deadline exceeded." So we want to keep logging of gRPC errors. However, right now we get a lot of these errors logged during integration tests. They make the output hard to read, and may disguise more serious errors. So we'd like to avoid causing such errors in normal integration test operation. This change reorders the startup of Boulder components by their gRPC dependencies, so everything's backend is likely to be up and running before it starts. It also reverses that order for clean shutdowns, and waits for each process to exit before signalling the next one. With these changes, I still got connection errors. Taking listenbuddy out of the gRPC path fixed them. I believe the issue is that listenbuddy is not a truly transparent proxy. In particular, it accepts an inbound TCP connection before opening an outbound TCP connection. If opening that outbound connection results in "connection refused," it closes the inbound connection. That means gRPC sees a "connection closed" (or "connection reset"?) rather than "connection refused". I'm guessing it handles those cases differently, explaining the different error results. We've been using listenbuddy to trigger disconnects while Boulder is running, to ensure that gRPC's reconnect code works. I think we can probably rely on gRPC's reconnect to work. 
The initial problem that led us to start testing this was a configuration problem; now that we have the configuration we want, we should be fine and don't need to keep testing reconnects on every integration test run.	2018-02-12 18:17:50 -08:00
Jacob Hoffman-Andrews	2dc3b56fa9	Add variable latency to ct-test-srv (#3435 ) For the upcoming SCT embedding changes, it will be useful to have a CT test server that blocks for nontrivial amounts of time before responding. This change introduces a config file for `ct-test-srv` that can be used to set up multiple "personalities" on various ports. Each personality can have a "latencySchedule" that determines how long it will sleep before responding to a submission. This change also introduces two new "personalities" on :4510 and :4511, plus configures CTLogGroups in the RA. Having four CT log personalities allows us to simulate two nontrivial log groups. Note: This triggers Publisher to emit audit errors on timed-out submissions. We may want to make Publisher not treat those as errors, and instead only log an error if a whole log group fails.	2018-02-09 13:48:19 -08:00
Jacob Hoffman-Andrews	4296dd985a	Use TLS in mailer integration tests (#3213 ) * Remove non-TLS support from mailer entirely * Add a config option for trusted roots in expiration-mailer. If unset, it defaults to the system roots, so this does not need to be set in production. * Use TLS in mail-test-srv, along with an internal root and localhost certificates signed by that root.	2017-11-06 14:57:14 -08:00
Jacob Hoffman-Andrews	4128e0d95a	Add time-dependent integration testing (#3060 ) Fixes #3020. In order to write integration tests for some features, especially related to rate limiting, rechecking of CAA, and expiration of authzs, orders, and certs, we need to be able to fake the passage of time in integration tests. To do so, this change switches out all clock.Default() instances for cmd.Clock(), which can be set manually with the FAKECLOCK environment variable. integration-test.py now starts up all servers once before the main body of tests, with FAKECLOCK set to a date 70 days ago, and does some initial setup for a new integration test case. That test case tries to fetch a 70-day-old authz URL, and expects it to 404. In order to make this work, I also had to change a number of our test binaries to shut down cleanly in response to SIGTERM. Without that change, stopping the servers between the setup phase and the main tests caused startservers.check() to fail, because some processes exited with nonzero status. Note: This is an initial stab at things, to prove out the technique. Long-term, I think we will want to use an idiom where test cases are classes that have a number of optional setup phases that may be run at e.g. 70 days prior and 5 days prior. This could help us avoid a proliferation of global state as we add more time-dependent test cases.	2017-09-13 12:34:14 -07:00
Jacob Hoffman-Andrews	20ec1e3e4e	Filter spurious shutdown errors. (#3052 ) Previously, we would produce an error and a nonzero status code on shutdown, because gRPC's GracefulStop would cause s.Serve() to return an error. Now we filter that specific error and treat it as success. This also allows us to kill processes with SIGTERM instead of SIGKILL in integration tests. Fixes #2410.	2017-09-07 13:45:32 -07:00
Daniel McCarney	bd3e2747ba	Duplicate WFE to WFE2. (#2839 ) This PR is the initial duplication of the WFE to create a WFE2 package. The rationale is briefly explained in `wfe2/README.md`. Per #2822 this PR only lays the groundwork for further customization and deduplication. Presently both the WFE and WFE2 are identical except for the following configuration differences: * The WFE offers HTTP and HTTPS on 4000 and 4430 respectively, the WFE2 offers HTTP and HTTPS on 4001 and 4431. * The WFE has a debug port on 8000, the WFE2 uses the next free "8000 range port" and puts its debug service on 8013 Resolves https://github.com/letsencrypt/boulder/issues/2822	2017-07-05 13:32:45 -07:00
Roland Bracewell Shoemaker	088b872287	Implement multi VA validation (#2802 ) Adds basic multi-path validation functionality. A new method `performRemoteValidation` is added to `boulder-va` which is called if it is configured with a list of remote VA gRPC addresses. In this initial implementation the remote VAs are only used to check the validation result of the main VA, if all of the remote validations succeed but the local validation failed, the overall validation will still fail. Remote VAs use the exact same code as the local VA to perform validation. If the local validation succeeds then a configured quorum of the remote VA successes must be met in order to fully complete the validation. This implementation assumes that metrics are collected from the remote VAs in order to have visibility into their individual validation latencies etc. Fixes #2621.	2017-06-29 14:11:01 -07:00
Roland Bracewell Shoemaker	a46d30945c	Purge remaining AMQP code (#2648 ) Deletes github.com/streadway/amqp and the various RabbitMQ setup tools etc. Changes how listenbuddy is used to proxy all of the gRPC client -> server connections so we test reconnection logic. +49 -8,221 😁 Fixes #2640 and #2562.	2017-04-04 15:02:22 -07:00
Patrick Figel	6ba8aadfd7	Use X.509 AIA Issuer URL in rel="up" link header (#2545 ) In order to provide the correct issuer certificate for older certificates after an issuer certificate rollover or when using multiple issuer certificates (e.g. RSA and ECDSA), use the AIA CA Issuer URL embedded in the certificate for the rel="up" link served by WFE. This behaviour is gated behind the UseAIAIssuerURL feature, which defaults to false. To prevent MitM vulnerabilities in cases where the AIA URL is HTTP-only, it is upgraded to HTTPS. This also adds a test for the issuer URL returned by the /acme/cert endpoint. wfe/test/178.{crt,key} were regenerated to add the AIA extension required to pass the test. /acme/cert was changed to return an absolute URL to the issuer endpoint (making it consistent with /acme/new-cert). Fixes #1663 Based on #1780	2017-02-07 11:19:22 -08:00
Daniel McCarney	15e73edc5a	Google Safe Browsing V4 Improvements (#2504 ) This PR has three primary contributions: 1. The existing code for using the V4 safe browsing API introduced in #2446 had some bugs that are fixed in this PR. 2. A gsb-test-srv is added to provide a mock Google Safebrowsing V4 server for integration testing purposes. 3. A short integration test is added to test end-to-end GSB lookup for an "unsafe" domain. For 1) most notably Boulder was assuming the new V4 library accepted a directory for its database persistence when it instead expects an existing file to be provided. Additionally the VA wasn't properly instantiating feature flags preventing the V4 api from being used by the VA. For 2) the test server is designed to have a fixed set of "bad" domains (Currently just honest.achmeds.discount.hosting.com). When asked for a database update by a client it will package the list of bad domains up & send them to the client. When the client is asked to do a URL lookup it will check the local database for a matching prefix, and if found, perform a lookup against the test server. The test server will process the lookup and increment a count for how many times the bad domain was asked about. For 3) the Boulder startservers.py was updated to start the gsb-test-srv and the VA is configured to talk to it using the V4 API. The integration test consists of attempting issuance for a domain pre-configured in the gsb-test-srv as a bad domain. If the issuance succeeds we know the GSB lookup code is faulty. If the issuance fails, we check that the gsb-test-srv received the correct number of lookups for the "bad" domain and fail if the expected isn't reality. Notes for reviewers: * The gsb-test-srv has to be started before anything will use it. Right now the v4 library handles database update request failures poorly and will not retry for 30min. See google/safebrowsing#44 for more information. 
* There's not an easy way to test for "good" domain lookups, only hits against the list. The design of the V4 API is such that a list of prefixes is delivered to the client in the db update phase and if the domain in question matches no prefixes then the lookup is deemed unnecessary and not performed. I experimented with sending 256 1 byte prefixes to try and trick the client to always do a lookup, but the min prefix size is 4 bytes and enumerating all possible prefixes seemed gross. * The test server has a /add endpoint that could be used by integration tests to add new domains to the block list, but it isn't being used presently. The trouble is that the client only updates its database every 30 minutes at present, and so adding a new domain will only take effect after the client updates the database. Resolves #2448	2017-01-23 11:07:20 -08:00
Jacob Hoffman-Andrews	16ab736c07	Temporarily switch to SIGKILL for startservers shutdown. (#2512 ) Unfortunately our clean shutdown code paths are too noisy, and often obscure real errors. We can turn this back to SIGTERM once that's fixed.	2017-01-19 16:45:43 -08:00
Jacob Hoffman-Andrews	0a367962d6	Make restarting boulder in docker nicer. (#2492 ) * Make restarting boulder in docker nicer. Handle SIGTERM in startservers.py. Forcibly remove rsyslog pid to avoid error. * Add explanatory comment. * Send SIGTERM instead of kill. * Further improvements. - Handle SIGINT too. - Use unbuffered mode for Python so the print statements (like "all servers running") get printed right away rather than at shutdown - Squelch an unnecessary OSError about interrupting the wait() call.	2017-01-13 11:55:28 -05:00
Jacob Hoffman-Andrews	5407a45b02	Revert "Disable fail-fast for gRPC. (#2397 )" (#2427 ) This reverts commit `5b865f1d63`. The QueueDeclare and QueueBind calls in that change caused AMQP permission denied errors.	2016-12-13 13:20:08 -08:00
Jacob Hoffman-Andrews	5b865f1d63	Disable fail-fast for gRPC. (#2397 ) This allows us to restart backends with relatively little interruption in service, provided the backends come up promptly. Fixes #2389 and #2408	2016-12-09 12:03:45 -08:00
Roland Bracewell Shoemaker	e2155388a1	Remove caa-checker from the tree (#2351 ) The VA can internally check CAA and this additional code was deemed unneeded complexity that could be hoisted outside of Boulder. Fixes #2346.	2016-11-23 08:42:33 -05:00
Jacob Hoffman-Andrews	557e7b1b5e	Quick fix for flaky integration tests. (#2325 ) Until recently, no RPCs would be performed during integration tests until all servers were up and running. In https://github.com/letsencrypt/boulder/pull/2246 we added code so that RA now calls SA.CountCertificatesRange immediately on startup. This should be fine - the message should be queued by AMQP and get handled when SA starts up, typically in less than 3 seconds. However, this caused a flaky test failure in CI. It's possible in some of these cases that SA is actually taking 3 seconds to start, or it's possible that there is a bug that depends on start order. This change moves the SA to start earlier, which seems to reduce flakiness but is not a real fix to the problem.	2016-11-10 15:08:25 -08:00
Jacob Hoffman-Andrews	87fee12d6c	Improve single-ocsp command (#2181 ) Output base64-encoded DER, as expected by ocsp-responder. Use flags instead of template for Status, ThisUpdate, NextUpdate. Provide better help. Remove old test (wasn't run automatically). Add it to integration test, and use its output for integration test of issuer ocsp-responder. Add another slot to boulder-tools HSM image, to store root key.	2016-09-15 15:28:54 -07:00
Daniel McCarney	a584f8de46	Allow `mailer` to reconnect to server. (#2101 ) The `MailerImpl` gains a few new fields (`retryBase`, & `retryMax`). These are used with `core.RetryBackoff` in `reconnect()` to implement exponential backoff in a reconnect attempt loop. Both `expiration-mailer` and `notify-mailer` are modified to add CLI args for these 2 flags and to wire them into the `MailerImpl` via its `New()` constructor. In `MailerImpl`'s `SendMail()` function it now detects when `sendOne` returns an `io.EOF` error indicating that the server closed the connection unexpectedly. When this case occurs `reconnect()` is invoked. If the reconnect succeeds then we invoke `sendOne` again to try and complete the message sending operation that was interrupted by the disconnect. For integration testing purposes I modified the `mail-test-srv` to support a `-closeChance` parameter between 0 and 100. This controls what % of `MAIL` commands will result in the server immediately closing the client connection before further processing. This allows us to simulate a flaky mailserver. `test/startservers.py` is modified to start the `mail-test-srv` with a 35% close chance to thoroughly test the reconnection logic during the existing `expiration-mailer` integration tests. I took this as a chance to do some slight clean-up of the `mail-test-srv` code (mostly removing global state). For unit testing purposes I modified the mailer `TestConnect` test to abstract out a server that can operate similar to `mail-test-srv` (e.g. can close connections artificially). This is testing a server that closes a connection, and not a server that goes away/goes down. E.g. the `core.RetryBackoff` sleeps themselves are not being tested. The client is disconnected and attempts a reconnection which always succeeds on the first try. To test a "gone away" server would require a more substantial rewrite of the unit tests and the `mail-test-srv`/integration tests. 
I think this matches the experience we have with MailChimp/Mandrill closing long-lived connections.	2016-08-15 14:14:49 -07:00
Ben Irving	159aeca64e	Split up boulder-config.json (Single OCSP) + Cleanup (#2069 ) This PR removes the use of the global configuration variable BOULDER_CONFIG. It also removes the global configuration struct cmd.Config. Furthermore, it removes the dependency codegangsta/cli and the last bit of code that was using it cmd/single-ocsp/main.go. This is the final (hopefully) pull request in the work to remove the reliance on a global configuration structure. Included below is a history of all other pull requests relevant in accomplishing this: WFE (#1973) RA (#1974) SA (#1975) CA (#1978) VA (#1979) Publisher (#2008) OCSP Updater (#2013) OCSP Responder (#2017) Admin Revoker (#2053) Expiration Mailer (#2036) Cert Checker (#2058) Orphan Finder (#2059) Single OCSP (this PR) Closes #1962	2016-07-22 12:39:29 -07:00
Roland Bracewell Shoemaker	a0a9623cb6	Switch to using SoftHSM in Docker for testing (#1920 ) Instead of reading the CA key from a file on disk into memory and using that for signing in `boulder-ca` this patch adds a new Docker container that runs SoftHSM and pkcs11-proxy in order to hold the key and perform signing operations. The pkcs11-proxy module is used by `boulder-ca` to talk to the SoftHSM container. This exercises (almost) the full pkcs11 path through boulder and will allow testing various HSM related failures in the future as well as simplifying tuning signing performance for benchmarking. Fixes #703.	2016-07-11 11:20:51 -07:00
Ben Irving	6007df8f3c	Split up boulder-config.json (WFE) (#1973 ) Moves the wfe to its own config file. Each config will now belong in `test/config` and `test/config-next` analogous to `boulder-config` and `boulder-config-next`.	2016-06-28 10:40:16 -07:00
Jacob Hoffman-Andrews	6b4c3bf63a	Pass through BOULDER_CONFIG in .travis.yml (#1954 ) Moving to Docker meant that we weren't passing through the BOULDER_CONFIG variable properly, which meant we weren't testing the boulder-config-next.json configuration. That allowed https://github.com/letsencrypt/boulder/issues/1948 to pass unnoticed. Credit to @benileo for asking the question of why that issue wasn't caught in testing, which led to this fix. Thanks for the attention to detail!	2016-06-22 11:25:01 -07:00
Jacob Hoffman-Andrews	e804c18a06	Fix environment passing in startservers.py. (#1907 ) In https://github.com/letsencrypt/boulder/pull/1885 I tried to simplify setting the GORACE environment variable, but that actually had the effect of inhibiting pass-through of all environment variables. This reverts that part of the change.	2016-06-08 16:22:21 -04:00
Jacob Hoffman-Andrews	163d9547f4	Remove the agreement flag from test.js. (#1885 ) Since we only use this for testing, not a live client, it's unnecessary complexity.	2016-06-06 13:19:57 -07:00
Jacob Hoffman-Andrews	6f082f397b	Improve error logging in test.js (#1829 ) Also fix a typo in startservers.py and quote variables in Makefile (provides more meaningful errors when they are unset).	2016-05-19 15:54:53 -07:00
Roland Bracewell Shoemaker	8eaf247ee9	Split CAA checking out to its own service (#1647 ) * Split out CAA checking service (minus logging etc) * Add example.yml config + follow general Boulder style * Update protobuf package to correct version * Add grpc client to va * Add TLS authentication in both directions for CAA client/server * Remove go lint check * Add bcodes package listing custom codes for Boulder * Add very basic (pull-only) gRPC metrics to VA + caa-service	2016-04-12 23:02:41 -07:00
Jacob Hoffman-Andrews	d98eb634d1	Docker improvements. Use bridged networking. Add some files to .dockerignore to shrink the build state sent to Docker daemon. Use specific hostnames to contact services, rather than localhost. Add instructions for adding those hostnames to /etc/hosts in non-Docker config. Use DSN-style connect strings for DBs. Remove localhost / 127.0.0.1 rewrite hack from create_db.sh. Add hosts section with new hostnames. Remove bin from .dockerignore. SQL grants go to %. Short-circuit DB creation if already existing. Make `go install` a part of Docker image build so that Docker run is much faster. Bind to 0.0.0.0 for OCSP responders so they can be reached from host, and publish / expose their ports. Remove ToSServerThread and test.js' fetch of ToS. Increase the registrationsPerIP rate limit threshold. When issuing from a Docker host, the 127.0.0.1 override doesn't apply, so the limit is quickly hit. Update docker-compose for bridged networking. Note: docker-compose doesn't currently work, but should be close. https://github.com/letsencrypt/boulder/pull/1639	2016-04-04 16:05:08 -07:00
Kane York	98567efdfc	Add integration tests for expiry mailer This creates a new server, 'mail-test-srv', which is a simplistic SMTP server that accepts mail and can report the received mail over HTTP. An integration test is added that uses the new server to test the expiry mailer. The FAKECLOCK environment variable is used to force the expiry mailer to think that the just-issued certificate is about to expire. Additionally, the expiry mailer is modified to cleanly shut down its SMTP connections.	2016-03-25 10:02:02 -07:00
Kane York	9e4066e0c7	Remove std_json build tag After a review of the logs, it seems that no clients are using capitalized or duplicate keys in the JWS bodies. Remove the std_json build tag.	2016-03-22 14:00:33 -07:00
Roland Shoemaker	4d8c7a323f	Set std_json build flag in order to preserve case insensitive JSON key parsing	2016-03-15 14:25:03 -07:00
Kane York	a6317d1717	Introduce cmd.Clock() for use in integration tests If the FAKECLOCK environment variable is set, and the build was in a test environment, cmd.Clock will return a FakeClock with the time set to the content of the environment variable. The choice of the UnixDate format was because `date -d` is a common choice for shell scripts.	2016-03-07 14:52:34 -08:00
Jacob Hoffman-Andrews	f67648d22f	Disable activity-monitor. We no longer run this in prod, so we shouldn't run it in test / dev.	2016-01-05 14:50:25 -08:00
Jacob Hoffman-Andrews	9e4b0c1e5b	Move RabbitMQ initialization into its own binary. Previously our executables would all try to declare the boulder exchange on startup, which may have been leading to some race conditions in Travis. Also, the Activity Monitor would try to bind a queue to the exchange at startup. In prod both of these tasks are taken care of administratively, so including them in the app code was adding unnecessary complexity. It also may have been part of an issue causing Activity Monitor to fail to start up recently. Also, turn the Activity Monitor into an RPC service, which gets it reconnects for free, and add it to startservers.py.	2015-11-29 16:55:03 -08:00
Jeff Hodges	baaa8a6209	check for wfe port liveness in integration tests We're getting many more spurious "Can't connect to WFE" errors in TravisCI. So, we add the WFE's main port to the port liveness check in amqp-integration-test.py. Fixes #1099	2015-11-05 15:32:05 -08:00
Jacob Hoffman-Andrews	194e421931	Add reconnects in AMQP.	2015-10-27 19:54:54 -07:00
Jacob Hoffman-Andrews	17918010dc	Allow override of all build flags in Makefile. I think even the ldflags that did not change between subsequent invocations of ./start.py, e.g. BUILD_HOST_VAR, were different between ./start.py and `go test ./...`, which would cause test runs to be unnecessarily slow. Open question: To keep local developer builds fast, maybe we should enable race detection only in Travis? Otherwise, `go test ./...` runs with one set of ldflags, and then `amqp-integration-test.py` runs with a different set, which I think makes both of them slower.	2015-10-03 12:45:06 -07:00
Roland Shoemaker	3be29f8288	Remove cmd/ prefix	2015-10-01 17:30:49 -07:00
Roland Shoemaker	9dc7b2d682	Merge master	2015-10-01 17:23:48 -07:00
Roland Shoemaker	2d0dee4ce1	Daemonize the OCSP updater tool so we are constantly updating OCSP responses. Also moves the first OCSP responses generation from the CA to the OCSP updater. This patch lays the groundwork for moving CT submission and adding CT backfill to the OCSP updater.	2015-10-01 16:36:51 -07:00
Jacob Hoffman-Andrews	b3aca1ff2b	Speed up tests. Make `make` aware of output files so it doesn't always have to rebuild. Also make it use `go install`, which is faster than building files individually. Now that make is faster, use it in startservers.py to consolidate building logic. This also has the handy side-effect that ./start.py exposes useful build information through /build, whereas before only the .rpm packaged version did. Additionally, this allows us to remove `make` from the Travis matrix, since we are running `make` as part of the integration test. This means each PR only triggers two Travis builds instead of one, which means we will get results from Travis faster. Also, change the Travis matrix logic to be a list of actions to run, rather than a list of actions to skip. That fixes https://github.com/letsencrypt/boulder/issues/817. Enumerate specific sections of test.sh to run, rather than sections to skip. Note: ./start.py now installs into ./bin/ instead of $GOPATH/bin. Only set up GitHub secret file (for PR status reporting) when available, and decrypt it into /tmp rather than $HOME, to avoid accidentally caching it once Travis' caching features are available. Clone letsencrypt repo into $HOME instead of $TMP, to make it possible to cache eventually. Remove unused `mysql` dependency in Travis. Override default Travis install command to prevent it from adding Godeps/_workspace to GOPATH. When that happens, it hides failures that should arise from importing non-vendorized paths.	2015-10-01 16:28:17 -07:00
Jacob Hoffman-Andrews	540c792474	Add an OCSP responder that serves from a file. This is useful for intermediate and root OCSP, which are generated manually once a year.	2015-09-23 16:34:13 -07:00
Jacob Hoffman-Andrews	5666b5a59a	Add dummy CT log server for integration testing.	2015-09-22 17:10:38 -07:00
Roland Shoemaker	ff6eca7a29	Submit all issued certificates to configured CT logs Adds a new service, Publisher, which exists to submit issued certificates to various Certificate Transparency logs. Once submitted the Publisher will also parse and store the returned SCT (Signed Certificate Timestamp) receipts that are used to prove inclusion in a specific log in the SA database. A SA migration adds the new SCT receipt table. The Publisher only exposes one method, SubmitToCT, which is called in a goroutine by ca.IssueCertificate so as not to block any other issuance operations. This method will iterate through all of the configured logs attempting to submit the certificate, and any required intermediate certificates, to them. If a submission to a log fails it will be retried the pre-configured number of times and will either use a back-off set in a Retry-After header or a pre-configured back-off between submission attempts. This changeset is the first of a number of changes ending with serving SCT receipts in OCSP responses and purposefully leaves out the following pieces for follow-up PRs. * A fake CT server for integration testing * An external tool to search the database for certificates lacking a full set of SCT receipts * A method to construct X.509 v3 extensions containing receipts for the OCSP responder * Returned SCT signature verification (beyond just checking that the signature is of the correct type so we aren't just serving arbitrary binary blobs to clients) Resolves #95.	2015-09-17 18:11:05 -07:00
Jacob Hoffman-Andrews	fc70f00fb3	Restore `exec` command to startservers.py. Fixes https://github.com/letsencrypt/boulder/issues/671	2015-08-27 12:56:36 -07:00
Jacob Hoffman-Andrews	02c22c40aa	Fix error output in startservers. Previously startservers would crash with an error about concatenating NoneType and string, if there was a build error.	2015-08-26 10:12:29 -07:00
Jeff Hodges	469253a9e3	fix some dregs in startservers.py Changes this to use just communicate(), not the subprocess.PIPE stuff (which apparently can do Weird Things) Also rename the install variable to cmd in the install function	2015-08-25 21:42:48 -07:00
Jeff Hodges	3a4fef4463	install boulder cmds in one cmd in startserver.py This eases the CPU and thread requirements of our tests (by forking less, not doing everything at once). It should also speed up the tests by avoiding certain repetitive work. Updates https://github.com/letsencrypt/letsencrypt/issues/712	2015-08-25 16:02:08 -07:00
Jacob Hoffman-Andrews	f6c21120b0	Add OCSP testing to integration test.	2015-08-20 09:37:24 -07:00
Jacob Hoffman-Andrews	bcfb935472	Fail startservers.py when compile fails.	2015-08-07 17:55:43 -07:00
Jacob Hoffman-Andrews	9b20f0afaf	Startservers.py: remove tempdir, add sys.exit	2015-07-29 11:15:01 -07:00
Jacob Hoffman-Andrews	237f759ac9	Use go install for even more speed.	2015-07-28 18:29:39 -07:00
Jacob Hoffman-Andrews	d69f97e954	Fix exception handling.	2015-07-28 18:11:52 -07:00
Jacob Hoffman-Andrews	a4c4b473f1	Speed up start.py and integration test. Run builds in parallel as well as starting servers in parallel. Wait for the servers to come up, so tests don't start running too early. Enable race detection only for the integration test, not for start.py. Previously I'd suggested it should always be on, but after running with it for a while I'm convinced it's too slow for start.py (but still very valuable for integration tests!).	2015-07-28 18:07:22 -07:00
Tom Clegg	2914ba6af5	Fix "main process kept alive forever by ToSServerThread."	2015-07-25 18:17:02 -04:00
Tom Clegg	e6ca449d34	Bring up a stub ToS server in test scripts.	2015-07-25 16:21:40 -04:00
Tom Clegg	e871b30cbf	Shut down everything if any server exits before ^C/timer. Fixup log messages.	2015-07-25 15:59:38 -04:00
Tom Clegg	43c738cc93	Set GORACE env var only in "go build", not everywhere.	2015-07-25 14:51:22 -04:00
Tom Clegg	de5cce8c03	De-duplicate start.py and test/amqp-integration-test.py	2015-07-25 04:04:20 -04:00

135 Commits