This copies over a number of features flags and other settings from
test/config-next that have been applied in prod.
Also, remove the config-next gate on various tests.
Add passthrough for certain environment variables to
docker-compose.yml, making it easier to set them:
`RUN=unit docker-compose run --use-aliases boulder ./test.sh`
Use 4001 instead of 4443 to monitor boulder-wfe2's health. This avoids
a spurious error log about a failed TLS handshake.
Remove unused code around running Certbot in integration tests.
Once we de-dupe startservers.install calls we may want to move
setupHierarchy there, but for now we need to call it from
integration-test.py and start.py.
Fixes #4842
Adds a daemon which monitors the new blockedKeys table, checks for any unexpired, unrevoked certificates associated with the added SPKI hashes, revokes them, and notifies the user that issued the certificates.
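The core loop amounts to something like the following sketch; `store`, `revoker`, and `mailer` are hypothetical helpers standing in for the daemon's real database queries and gRPC clients, not Boulder code.
```python
def run_once(store, revoker, mailer):
    """One pass of the hypothetical check-and-revoke loop."""
    # For each newly added blockedKeys row, find unexpired, unrevoked
    # certificates whose SPKI hash matches the blocked hash.
    for spki_hash in store.new_blocked_spki_hashes():
        for cert in store.unexpired_unrevoked_certs_with_spki(spki_hash):
            revoker.revoke(cert.serial)
            mailer.notify(cert.contact, cert.serial)
```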
Fixes #4772.
For now this mainly provides an example config and confirms that
log-validator can start up and shut down cleanly, as well as provide a
stat indicating how many log lines it has handled.
This introduces a syslog config to the boulder-tools image that will write
logs to /var/log/program.log. It also tweaks the various .json config
files so they have non-default syslogLevel, to ensure they actually
write something for log-validator to verify.
Instead of installing Certbot from the repo, install the python-acme
library (the only piece we need) from the apt repository. This also
allows us to skip installing build dependencies for Certbot.
Uninstall cmake after building.
Clean the various Go caches.
Move codespell and acme into requirements.txt. Don't use virtualenv anymore.
This reduces image size from 1.4 GB to 1.0 GB.
Incidentally, move the Go install to its own phase in the Dockerfile.
This will give it its own image layer, making rebuilds faster.
Python 2 is over in 1 month 4 days: https://pythonclock.org/
This rolls forward most of the changes in #4313.
The original change was rolled back in #4323 because it
broke `docker-compose up`. This change fixes those original issues by
(a) making sure `requests` is installed and (b) sourcing a virtualenv
containing the `requests` module before running start.py.
Other notable changes in this:
- Certbot has changed the developer instructions to install specific packages
rather than rely on `letsencrypt-auto --os-packages-only`, so we follow suit.
- Python 3 distinguishes `bytes` from `str` (all `str` are now Unicode), so some
values that used to be `str` now arrive as `bytes`. Going from `bytes` to `str`
and back requires explicit `.decode()` and `.encode()` (see the sketch after this list).
- Moved from urllib2 to requests in many places.
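As an illustration of the kinds of changes involved (example values and URL only, not code from the repo):
```python
import json

import requests

# bytes <-> str conversions are now explicit.
payload = json.dumps({"status": "valid"}).encode()  # str -> bytes
text = payload.decode()                             # bytes -> str

# urllib2 calls become requests calls.
response = requests.get("http://localhost:4001/directory")
response.raise_for_status()
directory = response.json()
```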
We now expect that the config dir is always set, so we make that
explicit in the integration test and error if that's not true.
This change also renames the variable to just "config_dir", and removes
the parameter to startservers.start, which is currently never set to
anything other than its default value.
This also explicitly sets the environment variable in .travis.yml.
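A minimal sketch of that check, assuming the environment variable is named BOULDER_CONFIG_DIR:
```python
import os
import sys

def config_dir():
    # Assumes the integration test selects its config via BOULDER_CONFIG_DIR.
    d = os.environ.get("BOULDER_CONFIG_DIR", "")
    if d == "":
        sys.exit("BOULDER_CONFIG_DIR must be set (e.g. test/config or test/config-next)")
    return d
```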
This reverts commit 796a7aa2f4.
People's tests have been breaking on `docker-compose up` with the following output:
```
ImportError: No module named requests
```
Fixes #4322
* integration: move to Python3
- Add parentheses to all print and raise calls.
- Python3 distinguishes bytes from strings. Add encode() and
decode() calls as needed to provide the correct type.
- Use the requests library consistently (urllib2 is not in Python 3).
- Remove shebang from Python files without a main, and update
shebang for integration-test.py.
Basically a complete re-write/re-design of the forwarding concept introduced in
#4297 (sorry for the rapid churn here). Instead of nonce-services blindly
forwarding nonces around to each other in an attempt to find out who issued the
nonce we add an identifying prefix to each nonce generated by a service. The
WFEs then use this prefix to decide which nonce-service to ask to validate the
nonce.
This requires a slightly more complicated configuration at the WFE/2 end, but
overall I think it ends up being a much cleaner, more understandable, easier to
reason about implementation. When configuring the WFE you need to provide two
forms of gRPC config:
* one gRPC config for retrieving nonces; this should be a DNS name that
resolves to all available nonce-services (or at least the ones you want to
retrieve nonces from locally; in a two-DC setup you might only configure the
nonce-services that are in the same DC as the WFE instance). This allows
getting a nonce from any of the configured services and is load-balanced
transparently at the gRPC layer.
* a map of nonce prefixes to gRPC configs; this maps each individual
nonce-service to its prefix and allows the WFE instances to figure out which
nonce-service to ask to validate a nonce it has received, as sketched below
(in a two-DC setup you'd want to configure this with all the nonce-services
across both DCs so that you can validate a nonce that was generated by a
nonce-service in another DC).
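Roughly, the prefix-based routing looks like this; the prefix length, prefix values, and addresses are illustrative assumptions, not the real WFE configuration:
```python
NONCE_PREFIX_LEN = 4  # assumed prefix length, for illustration only

# Each nonce-service's identifying prefix mapped to the gRPC address used to
# reach it for validation (example names and ports).
PREFIX_TO_BACKEND = {
    "zinc": "nonce1.boulder:9101",
    "iron": "nonce2.boulder:9101",
}

def backend_for_nonce(nonce):
    """Decide which nonce-service to ask to validate this nonce, based on the
    identifying prefix it was generated with; unknown prefixes are rejected."""
    return PREFIX_TO_BACKEND.get(nonce[:NONCE_PREFIX_LEN])
```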
This balancing is implemented in the integration tests.
Given the current remote nonce code hasn't been deployed anywhere yet this
makes a number of hard breaking changes to both the existing nonce-service
code, and the forwarding code.
Fixes #4303.
When NewAuthorizationSchema is enabled, we still want v1 authzs to be reusable in
new orders. This tests that that code is implemented correctly.
Updates #4241
Move from using `requests` to `urllib2` in `helpers.py`. Verified
this works with `docker-compose up`. In the future we really should
be installing our own python dependencies in the boulder-tools image
rather than relying on getting them by using the certbot virtualenv.
* Switch to instant OCSP verification in integration tests
* Move waitport to helpers and use it to determine if ocsp-responder is
alive in test_single_ocsp
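For reference, a waitport-style helper might look roughly like this (the signature and timeout are assumptions, not the exact helpers.py code):
```python
import socket
import time

def waitport(port, prog, timeout=15):
    """Poll until something is listening on localhost:port, so tests can tell
    whether e.g. ocsp-responder is alive yet."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection(("localhost", port), timeout=1):
                return True
        except OSError:
            time.sleep(0.1)
    raise Exception("timed out waiting for %s to listen on port %d" % (prog, port))
```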
As part of #4241, I need to introduce some twenty-days-ago setup. So I refactored the
only current instance (test_caa) to use a style where setup functions can be registered right
next to the test cases they affect. The @register_twenty_days_ago is Python for
"call register_twenty_days_ago with the thing on the next line as an argument."
I also cleaned up a bunch of related stuff:
* Removed the ACCOUNT_URI environment variable and associated function params.
This was introduced in #3736 to pass a URI to challtestsrv before we refactored for
more dynamic updates. It's not used any more.
* Removed a try / except from startChallSrv that needlessly hid errors.
* Moved setting of DNS fixtures for caa_test into the test case itself.
## CI: restore load-generator run.
This restores running the `load-generator` during CI to make sure it doesn't bitrot. It was previously removed while we debugged the VA getting jammed up and not cleanly shutting down.
Since the global `pebble-challtestsrv` and the `load-generator`'s internal chall test srv will conflict this requires moving the `load-generator` run to the end of integration tests and updating `startservers.py` to allow the load gen integration test code to stop the `pebble-challtestsrv` before starting the `load-generator`.
The `load-generator` and associated config are updated to allow specifying bind addresses for the DNS interface of the internal challtestsrv. Multiple addresses are supported so that the `load-generator`'s chall test srv can listen on the DNS ports Boulder is configured to use. The `load-generator` config now accepts a `fakeDNS` parameter that can be used to specify the default IPv4 address returned by the `load-generator`'s DNS server for A queries.
## load-generator: support different challenges/strategies.
Updates the load-generator to support HTTP-01, DNS-01, and TLS-ALPN-01 challenge response servers. A new challenge selection configuration parameter (`ChallengeStrategy`) can be set to `"http-01"`, `"dns-01"`, or `"tls-alpn-01"` to solve only challenges of that type. Using `"random"` will let the load-generator choose a challenge type randomly.
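A sketch of the selection logic implied by `ChallengeStrategy` (illustrative only; the load-generator itself is Go):
```python
import random

SUPPORTED = ("http-01", "dns-01", "tls-alpn-01")

def pick_challenge(strategy, offered):
    """Choose which challenge type to solve for an authorization, where
    `offered` is the set of challenge types the ACME server offered."""
    if strategy == "random":
        return random.choice([c for c in offered if c in SUPPORTED])
    if strategy not in SUPPORTED:
        raise ValueError("unknown ChallengeStrategy: %r" % (strategy,))
    if strategy not in offered:
        raise ValueError("server did not offer a %s challenge" % strategy)
    return strategy
```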
Resolves https://github.com/letsencrypt/boulder/issues/3900
`pebble-challtestsrv` added a `-defaultIPv4` arg we can use to simplify
the integration tests and fix FAKE_DNS usage outside of integration
tests.
A new boulder-tools image with an updated `pebble-challtestsrv` is used
and `test/startservers.py` is changed to populate `-defaultIPv4` via the
`FAKE_DNS` env var.
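In startservers.py terms the change amounts to something like this (the fallback address here is just an example):
```python
import os
import subprocess

def start_challtestsrv():
    # Pass FAKE_DNS through to pebble-challtestsrv's -defaultIPv4 flag so the
    # same fake address is used inside and outside the integration tests.
    default_ipv4 = os.environ.get("FAKE_DNS", "10.77.77.77")  # example fallback
    return subprocess.Popen(["pebble-challtestsrv", "-defaultIPv4", default_ipv4])
```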
Now that Pebble has a `pebble-challtestsrv` we can remove the `challtestsrv`
package and associated command from Boulder. I switched CI to use
`pebble-challtestsrv`. Notably this means that we have to add our expected mock
data using the HTTP management interface. The Boulder-tools images are
regenerated to include the `pebble-challtestsrv` command.
Using this approach also allows separating the TLS-ALPN-01 and HTTPS HTTP-01
challenges by binding each challenge type in the `pebble-challtestsrv` to
a different interface, with both using the same VA
HTTPS port. Mock DNS directs the VA to the correct interface.
The load-generator command that was previously using the `challtestsrv` package
from Boulder is updated to use a vendored copy of the new
`github.com/letsencrypt/challtestsrv` package.
Vendored dependencies change in two ways:
1) Gomock is updated to the latest release (matching what the boulder-tools image
provides)
2) A couple of new subpackages in `golang.org/x/net/` are added by way of
transitive dependency through the challtestsrv package.
Unit tests are confirmed to pass for `gomock`:
```
~/go/src/github.com/golang/mock/gomock$ git log --pretty=format:'%h' -n 1
51421b9
~/go/src/github.com/golang/mock/gomock$ go test ./...
ok github.com/golang/mock/gomock 0.002s
? github.com/golang/mock/gomock/internal/mock_matcher [no test files]
```
For `/x/net` all tests pass except two `/x/net/icmp` `TestDiag.go` test cases
that we have agreed are OK to ignore.
Resolves https://github.com/letsencrypt/boulder/issues/3962 and
https://github.com/letsencrypt/boulder/issues/3951
To complete https://github.com/letsencrypt/boulder/issues/3956 the `challtestsrv` is updated such that its existing TLS-ALPN-01 challenge test server will serve HTTP-01 responses with a self-signed certificate when a non-TLS-ALPN-01 request arrives. This lets the TLS-ALPN-01 challenge server double as an HTTPS version of the HTTP challenge server. The `challtestsrv` now also supports adding/removing redirects that will be served to clients when requesting matching paths.
The existing chisel/chisel2 integration tests are updated to use the `challtestsrv` instead of starting their own standalone servers. This centralizes our mock challenge responses and lets us bind the `challtestsrv` to the VA's HTTP port in `startservers.py` without clashing ports later on.
New integration tests are added for HTTP-01 redirect scenarios using the updated `challtestsrv` (one is sketched after this list). These test cases cover:
* valid HTTP -> HTTP redirect
* valid HTTP -> HTTPS redirect
* invalid HTTP -> non-HTTP/HTTPS port redirect
* invalid HTTP -> non-HTTP/HTTPS protocol scheme redirect
* invalid HTTP -> bare IP redirect
* invalid HTTP redirect loop
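For example, the redirect cases can drive the challenge server through its HTTP management interface, roughly like this (the management address, endpoint, and JSON field names are assumptions; check the challtestsrv documentation):
```python
import requests

CHALLSRV_MGMT = "http://localhost:8055"  # assumed management address

def add_redirect(path, target_url):
    # Assumed endpoint and field names for the management interface.
    r = requests.post(CHALLSRV_MGMT + "/add-redirect",
                      json={"path": path, "targetURL": target_url})
    r.raise_for_status()

# e.g. the "valid HTTP -> HTTPS redirect" case for an HTTP-01 token path:
add_redirect("/.well-known/acme-challenge/example-token",
             "https://example.com/.well-known/acme-challenge/example-token")
```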
The new integration tests shook out two fixes that were required for the legacy VA HTTP-01 code (afad22b) and one fix for the challtestsrv mock DNS (59b7d6d).
Resolves https://github.com/letsencrypt/boulder/issues/3956
Retains the existing logging of orphaned certs until we are confident that this
solution can fully replace it (even then we may want to keep it just for auditing, etc.).
Fixes #3636.
This adds support for the account-uri CAA parameter as specified by
section 3 of https://tools.ietf.org/html/draft-ietf-acme-caa-04, allowing
issuance to be restricted to one or more ACME accounts as specified by CAA
records.
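For illustration, the parameter appears in CAA issue/issuewild values such as `ca.example.net; accounturi=https://example.com/acme/acct/123`; a rough parser for that syntax:
```python
def caa_account_uris(issue_value):
    """Extract any accounturi parameters from a CAA issue/issuewild value,
    e.g. 'ca.example.net; accounturi=https://example.com/acme/acct/123'."""
    fields = [f.strip() for f in issue_value.split(";")]
    uris = []
    for field in fields[1:]:  # fields[0] is the issuer domain
        if "=" in field:
            key, value = field.split("=", 1)
            if key.strip().lower() == "accounturi":
                uris.append(value.strip())
    return uris
```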
We'd like to start using the DNS load balancer in the latest version of gRPC. That means putting all IPs for a service under a single hostname (or using a SRV record, but we're not taking that path). This change adds an sd-test-srv to act as our service discovery DNS service. It returns both Boulder IP addresses for any A lookup ending in ".boulder". This change also sets up the Docker DNS for our boulder container to defer to sd-test-srv when it doesn't know an answer.
sd-test-srv doesn't know how to resolve public Internet names like `github.com`. Resolving public names is required for the `godep-restore` test phase, so this change breaks out a copy of the boulder container that is used only for `godep-restore`.
This change implements a shim of a DNS resolver for gRPC, so that we can switch to DNS-based load balancing with the currently vendored gRPC, then when we upgrade to the latest gRPC we won't need a simultaneous config update.
Also, this change introduces a check at the end of the integration test that each backend received at least one RPC, ensuring that we are not sending all load to a single backend.
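The end-of-test check boils down to something like this sketch (the backend names, debug ports, and metric name are assumptions, not the real test code):
```python
import requests

# Hypothetical backend debug addresses; the real test derives these from config.
BACKENDS = {"sa1.boulder": 8003, "sa2.boulder": 8103}

def check_all_backends_received_rpcs():
    for host, debug_port in BACKENDS.items():
        metrics = requests.get("http://%s:%d/metrics" % (host, debug_port)).text
        handled = 0.0
        for line in metrics.splitlines():
            # Prometheus text format: "<name>{labels} <value>"
            if line.startswith("grpc_server_handled_total"):
                handled += float(line.rsplit(" ", 1)[1])
        if handled == 0:
            raise Exception("%s appears to have received no RPCs" % host)
```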
Prior to this commit we had two implementations of ACME challenge
servers for use in tests:
1) test/dns-test-srv - a small fake DNS server used for adding/removing
DNS-01 TXT records and returning fake A/AAAA data.
2) test/load-generator/challenge-servers.go - a small library for
providing an HTTP-01 challenge server.
This commit consolidates both into a dedicated `test/challsrv` package.
The `load-generator` code is updated to use this library package to
implement its HTTP-01 challenge server. This leaves the `load-generator`
as a nice standalone tool that doesn't need coordination between itself
and a separate `challsrv` binary.
To keep the `dns-test-srv` use-case of a nice standalone binary that can
be run from `test/startservers.py` the `test/challsrv` package has
a `test/challsrv/cmd/challsrv` package that provides the `challsrv`
command. This is a stand-alone binary that can offer both an HTTP-01 and
a DNS-01 challenge server along with a management HTTP interface that
can be used by external programs to add/remove HTTP-01 and DNS-01
challenges.
The Boulder integration tests are updated to use `challsrv` instead of
`dns-test-srv`. Presently only the DNS-01 challenge server of `challsrv`
is used by the integration tests.
TODO: The DNS-01 challenge server is doing a fair number of non-DNS-01
challenge things (fake host data, etc.). This should be cleaned up and
made configurable.
Updates #3652
We're currently stuck on gRPC v1.1 because of a breaking change to certificate validation in gRPC 1.8. Our gRPC balancer uses a static list of multiple hostnames, and expects to validate against those hostnames. However gRPC expects that a service is one hostname, with multiple IP addresses, and validates all those IP addresses against the same hostname. See grpc/grpc-go#2012.
If we follow gRPC's assumptions, we can rip out our custom Balancer and custom TransportCredentials, and will probably have a lower-friction time in general.
This PR is the first step in doing so. In order to satisfy the "multiple IPs, one port" property of gRPC backends in our Docker container infrastructure, we switch to Docker's user-defined networking. This allows us to give the Boulder container multiple IP addresses on different local networks, and gives it different DNS aliases in each network.
In startservers.py, each shard of a service listens on a different DNS alias for that service, and therefore a different IP address. The listening port for each shard of a service is now identical.
This change also updates the gRPC service certificates. Now, each certificate that is used in a gRPC service (as opposed to something that is "only" a client) has three names. For instance, sa1.boulder, sa2.boulder, and sa.boulder (the generic service name). For now, we are validating against the specific hostnames. When we update our gRPC dependency, we will begin validating against the generic service name.
Incidentally, the DNS aliases feature of Docker allows us to get rid of some hackery in entrypoint.sh that inserted entries into /etc/hosts.
Note: Boulder now has a dependency on the DNS aliases feature in Docker. By default, docker-compose run creates a temporary container and doesn't assign any aliases to it. We now need to specify docker-compose run --use-aliases to get the correct behavior. Without --use-aliases, Boulder won't be able to resolve the hostnames it wants to bind to.
We recently reordered the programs in startservers.py to reflect
dependencies, so we wouldn't generate gRPC errors when a process comes
up before its backend. However, we missed the remote VAs. This change
moves them to the beginning.
Also, don't print "all processes started" after each process starts.
Boulder is fairly noisy about gRPC connection errors. This is a mixed
blessing: Our gRPC configuration will try to reconnect until it hits
an RPC deadline, and most likely eventually succeed. In that case,
we don't consider those to really be errors. However, in cases where
a connection is repeatedly failing, we'd like to see errors in the
logs about connection failure, rather than "deadline exceeded." So
we want to keep logging of gRPC errors.
However, right now we get a lot of these errors logged during
integration tests. They make the output hard to read, and may disguise
more serious errors. So we'd like to avoid causing such errors in
normal integration test operation.
This change reorders the startup of Boulder components by their gRPC
dependencies, so everything's backend is likely to be up and running
before it starts. It also reverses that order for clean shutdowns,
and waits for each process to exit before signalling the next one.
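In startservers.py terms the idea is roughly the following (a simplified sketch, not the actual script):
```python
import subprocess

def start_in_order(commands):
    """Start each backend before the services that depend on it."""
    procs = []
    for cmd in commands:
        procs.append(subprocess.Popen(cmd))
    return procs

def stop_in_reverse(procs):
    """Shut down in the reverse of startup order, waiting for each process to
    exit before signalling the next, so frontends drain before their backends."""
    for p in reversed(procs):
        p.terminate()  # SIGTERM, relying on each daemon's clean-shutdown handling
        p.wait()
```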
With these changes, I still got connection errors. Taking listenbuddy
out of the gRPC path fixed them. I believe the issue is that
listenbuddy is not a truly transparent proxy. In particular, it
accepts an inbound TCP connection before opening an outbound TCP
connection. If opening that outbound connection results in "connection
refused," it closes the inbound connection. That means gRPC sees a
"connection closed" (or "connection reset"?) rather than "connection
refused". I'm guessing it handles those cases differently, explaining
the different error results.
We've been using listenbuddy to trigger disconnects while Boulder is
running, to ensure that gRPC's reconnect code works. I think we can
probably rely on gRPC's reconnect to work. The initial problem that
led us to start testing this was a configuration problem; now that
we have the configuration we want, we should be fine and don't need
to keep testing reconnects on every integration test run.
For the upcoming SCT embedding changes, it will be useful to have a CT test server that blocks for nontrivial amounts of time before responding. This change introduces a config file for `ct-test-srv` that can be used to set up multiple "personalities" on various ports. Each personality can have a "latencySchedule" that determines how long it will sleep before responding to a submission.
This change also introduces two new "personalities" on :4510 and :4511, plus configures CTLogGroups in the RA. Having four CT log personalities allows us to simulate two nontrivial log groups.
Note: This triggers Publisher to emit audit errors on timed-out submissions. We may want to make Publisher not treat those as errors, and instead only log an error if a whole log group fails.
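The latencySchedule behaviour can be pictured as wrapping the submission handler, roughly as below (illustrative only; ct-test-srv itself is Go and configured via JSON):
```python
import itertools
import time

def with_latency_schedule(latency_schedule, handle_submission):
    """Delay each response by the next value (in seconds) from a repeating
    schedule before handing off to the real submission handler."""
    delays = itertools.cycle(latency_schedule)
    def handler(request):
        time.sleep(next(delays))
        return handle_submission(request)
    return handler
```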
* Remove non-TLS support from mailer entirely
* Add a config option for trusted roots in expiration-mailer. If unset, it defaults to the system roots, so this does not need to be set in production.
* Use TLS in mail-test-srv, along with an internal root and localhost certificates signed by that root.
Fixes #3020.
In order to write integration tests for some features, especially related to rate limiting, rechecking of CAA, and expiration of authzs, orders, and certs, we need to be able to fake the passage of time in integration tests.
To do so, this change switches out all clock.Default() instances for cmd.Clock(), which can be set manually with the FAKECLOCK environment variable. integration-test.py now starts up all servers once before the main body of tests, with FAKECLOCK set to a date 70 days ago, and does some initial setup for a new integration test case. That test case tries to fetch a 70-day-old authz URL, and expects it to 404.
In order to make this work, I also had to change a number of our test binaries to shut down cleanly in response to SIGTERM. Without that change, stopping the servers between the setup phase and the main tests caused startservers.check() to fail, because some processes exited with nonzero status.
Note: This is an initial stab at things, to prove out the technique. Long-term, I think we will want to use an idiom where test cases are classes that have a number of optional setup phases that may be run at e.g. 70 days prior and 5 days prior. This could help us avoid a proliferation of global state as we add more time-dependent test cases.
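The setup phase boils down to something like this (the FAKECLOCK timestamp format shown is an assumption):
```python
import datetime
import os

def fakeclock(when):
    # Assumed format for whatever cmd.Clock() parses out of FAKECLOCK.
    return when.strftime("%a %b %d %H:%M:%S UTC %Y")

# Run the early-setup phase with the clock 70 days in the past, then clear
# FAKECLOCK and restart the servers for the main body of tests.
seventy_days_ago = datetime.datetime.utcnow() - datetime.timedelta(days=70)
os.environ["FAKECLOCK"] = fakeclock(seventy_days_ago)
```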
Previously, we would produce an error and a nonzero status code on shutdown,
because gRPC's GracefulStop would cause s.Serve() to return an error. Now we
filter that specific error and treat it as success. This also allows us to kill
processes with SIGTERM instead of SIGKILL in integration tests.
Fixes #2410.
This PR is the initial duplication of the WFE to create a WFE2
package. The rationale is briefly explained in `wfe2/README.md`.
Per #2822 this PR only lays the groundwork for further customization
and deduplication. Presently both the WFE and WFE2 are identical except
for the following configuration differences:
* The WFE offers HTTP and HTTPS on 4000 and 4430 respectively; the WFE2
offers HTTP and HTTPS on 4001 and 4431.
* The WFE has a debug port on 8000, the WFE2 uses the next free "8000
range port" and puts its debug service on 8013
Resolves https://github.com/letsencrypt/boulder/issues/2822
Adds basic multi-path validation functionality. A new method `performRemoteValidation` is added to `boulder-va`, which is called if the VA is configured with a list of remote VA gRPC addresses. In this initial implementation the remote VAs are only used to check the validation result of the main VA: if all of the remote validations succeed but the local validation failed, the overall validation will still fail. Remote VAs use the exact same code as the local VA to perform validation. If the local validation succeeds, then a configured quorum of remote VA successes must be met in order to fully complete the validation.
This implementation assumes that metrics are collected from the remote VAs in order to have visibility into their individual validation latencies etc.
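The combination rule described above is, in sketch form:
```python
def overall_validation(local_ok, remote_results, required_successes):
    """Combine the local VA result with the remote VA results. Remote successes
    never rescue a failed local validation; a successful local validation still
    needs a configured quorum of remote successes."""
    if not local_ok:
        return False
    successes = sum(1 for ok in remote_results if ok)
    return successes >= required_successes
```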
Fixes #2621.
Deletes github.com/streadway/amqp and the various RabbitMQ setup tools etc. Changes how listenbuddy is used to proxy all of the gRPC client -> server connections so we test reconnection logic.
+49 -8,221 😁 Fixes #2640 and #2562.
In order to provide the correct issuer certificate for older certificates after an issuer certificate rollover or when using multiple issuer certificates (e.g. RSA and ECDSA), use the AIA CA Issuer URL embedded in the certificate for the rel="up" link served by WFE. This behaviour is gated behind the UseAIAIssuerURL feature, which defaults to false.
To prevent MitM vulnerabilities in cases where the AIA URL is HTTP-only, it is upgraded to HTTPS.
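A sketch of that link selection (Python only for illustration; the real code lives in the WFE):
```python
from urllib.parse import urlsplit, urlunsplit

def issuer_link(aia_ca_issuer_url):
    """Use the certificate's AIA CA Issuers URL for the rel="up" link,
    upgrading plain HTTP to HTTPS so clients are never pointed at a
    MitM-able issuer URL."""
    parts = urlsplit(aia_ca_issuer_url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```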
This also adds a test for the issuer URL returned by the /acme/cert endpoint. wfe/test/178.{crt,key} were regenerated to add the AIA extension required to pass the test.
/acme/cert was changed to return an absolute URL to the issuer endpoint (making it consistent with /acme/new-cert).
Fixes #1663
Based on #1780
This PR has three primary contributions:
1. The existing code for using the V4 safe browsing API introduced in #2446 had some bugs that are fixed in this PR.
2. A gsb-test-srv is added to provide a mock Google Safebrowsing V4 server for integration testing purposes.
3. A short integration test is added to test end-to-end GSB lookup for an "unsafe" domain.
For 1) most notably Boulder was assuming the new V4 library accepted a directory for its database persistence when it instead expects an existing file to be provided. Additionally, the VA wasn't properly instantiating feature flags, preventing the V4 API from being used by the VA.
For 2) the test server is designed to have a fixed set of "bad" domains (currently just honest.achmeds.discount.hosting.com). When asked for a database update by a client it will package the list of bad domains up & send them to the client. When the client is asked to do a URL lookup it will check the local database for a matching prefix, and if found, perform a lookup against the test server. The test server will process the lookup and increment a count of how many times the bad domain was asked about.
For 3) the Boulder startservers.py was updated to start the gsb-test-srv and the VA is configured to talk to it using the V4 API. The integration test consists of attempting issuance for a domain pre-configured in the gsb-test-srv as a bad domain. If the issuance succeeds we know the GSB lookup code is faulty. If the issuance fails, we check that the gsb-test-srv received the correct number of lookups for the "bad" domain and fail if the expected count doesn't match reality.
Notes for reviewers:
* The gsb-test-srv has to be started before anything will use it. Right now the v4 library handles database update request failures poorly and will not retry for 30min. See google/safebrowsing#44 for more information.
* There's not an easy way to test for "good" domain lookups, only hits against the list. The design of the V4 API is such that a list of prefixes is delivered to the client in the db update phase and if the domain in question matches no prefixes then the lookup is deemed unnecessary and not performed. I experimented with sending 256 one-byte prefixes to try and trick the client to always do a lookup, but the min prefix size is 4 bytes and enumerating all possible prefixes seemed gross.
* The test server has a /add endpoint that could be used by integration tests to add new domains to the block list, but it isn't being used presently. The trouble is that the client only updates its database every 30 minutes at present, so adding a new domain will only take effect after the client updates the database.
Resolves #2448