None of the development team uses this approach to running a dev env. It
no longer works without modifying `test/startservers.py` and the `test/`
configurations. Given that it has been broken for a month or more and has only
provoked one user issue, I think we can be fairly confident that few others
are using this method of setting up a Boulder development environment,
and we should prioritize our time/docs accordingly.
We're currently stuck on gRPC v1.1 because of a breaking change to certificate validation in gRPC 1.8. Our gRPC balancer uses a static list of multiple hostnames, and expects to validate against those hostnames. However, gRPC expects that a service is one hostname with multiple IP addresses, and validates all of those IP addresses against the same hostname. See grpc/grpc-go#2012.
If we follow gRPC's assumptions, we can rip out our custom Balancer and custom TransportCredentials, and will probably have a lower-friction time in general.
This PR is the first step in doing so. In order to satisfy the "multiple IPs, one port" property of gRPC backends in our Docker container infrastructure, we switch to Docker's user-defined networking. This allows us to give the Boulder container multiple IP addresses on different local networks, and gives it different DNS aliases in each network.
In startservers.py, each shard of a service listens on a different DNS alias for that service, and therefore a different IP address. The listening port for each shard of a service is now identical.
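Roughly, the compose wiring looks like this (network names, aliases, and compose version are illustrative, not necessarily exactly what this PR uses):

```
# Hypothetical sketch: one container joined to two user-defined networks,
# with a different DNS alias for each shard plus the generic service name.
version: "3"
services:
  boulder:
    networks:
      bluenet:
        aliases:
          - sa1.boulder
          - sa.boulder
      rednet:
        aliases:
          - sa2.boulder
          - sa.boulder

networks:
  bluenet:
    driver: bridge
  rednet:
    driver: bridge
```

Within each network, Docker's embedded DNS resolves those aliases to the container's IP address on that network, so sa1.boulder and sa2.boulder resolve to two different addresses.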
This change also updates the gRPC service certificates. Now, each certificate that is used in a gRPC service (as opposed to something that is "only" a client) has three names: for instance, sa1.boulder, sa2.boulder, and sa.boulder (the generic service name). For now, we are validating against the specific hostnames. When we update our gRPC dependency, we will begin validating against the generic service name.
Incidentally, the DNS aliases feature of Docker allows us to get rid of some hackery in entrypoint.sh that inserted entries into /etc/hosts.
Note: Boulder now has a dependency on the DNS aliases feature in Docker. By default, `docker-compose run` creates a temporary container and doesn't assign any aliases to it. We now need to specify `docker-compose run --use-aliases` to get the correct behavior. Without `--use-aliases`, Boulder won't be able to resolve the hostnames it wants to bind to.
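For example, a one-off command in the dev environment now looks like this (the trailing command is just illustrative):

```
docker-compose run --use-aliases boulder ./test.sh
```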
This PR updates the `test/boulder-tools/tag_and_upload.sh` script to template a `Dockerfile` for building multiple copies of `boulder-tools`: one per supported Go version. Unfortunately this is required because only Docker 17+ supports an env var in a Dockerfile `FROM`. It's best if we can stay on package-manager-installed versions of Docker, which precludes 17+ 😞.
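A minimal sketch of the templating approach (placeholder, file, and tag names are hypothetical, not necessarily what the script uses):

```
# Hypothetical sketch: build one boulder-tools image per Go version,
# since pre-17 Docker can't read an env var in a Dockerfile FROM.
for GO_VERSION in 1.10 1.10.1; do
  sed "s|{{GO_VERSION}}|${GO_VERSION}|g" Dockerfile.template > Dockerfile
  docker build -t "letsencrypt/boulder-tools-go${GO_VERSION}:$(date +%Y-%m-%d)" .
done
```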
The `docker-compose.yml` is updated to version "3" to allow specifying a `GO_VERSION` env var in the respective Boulder `image` directives. This requires `docker-compose` version 1.10.0+, which in turn requires Docker engine version 1.13.0+. The README is updated to reflect these new requirements. This Docker engine version is commonly available in package managers (e.g. Ubuntu 16.04). A sufficient `docker-compose` version is not, but it is a simple one-binary Go application that is easy to update outside of package managers.
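The relevant `image` directive looks roughly like this (the tag is illustrative):

```
# Hypothetical sketch of the version "3" file with a GO_VERSION env var.
version: "3"
services:
  boulder:
    image: letsencrypt/boulder-tools-go${GO_VERSION}:2018-04-05
```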
The `.travis.yml` config file is updated to set the `GO_VERSION` in the build matrix, allowing build tasks for different Go versions. Since the `docker-compose.yml` now requires `docker-compose` 1.10.0+, the
`.travis.yml` also gains a new `before_install` step for setting up a modern `docker-compose` version.
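A sketch of that step, assuming a pinned release binary (the version below is illustrative):

```
before_install:
  # Hypothetical sketch: fetch a modern docker-compose outside the
  # package manager; the pinned version is illustrative.
  - curl -L "https://github.com/docker/compose/releases/download/1.21.0/docker-compose-$(uname -s)-$(uname -m)" -o docker-compose
  - chmod +x docker-compose
  - sudo mv docker-compose /usr/local/bin/
```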
Lastly, tools and images are updated to support both Go 1.10 (our current Go version) and Go 1.10.1 (the new point release). By default Go 1.10 is used; we can switch this once staging/prod are updated.
_*TODO*: One thing I haven't implemented yet is a `sed` expression in `tag_and_upload.sh` that updates both `image` lines in `docker-compose.yml` with an up-to-date tag. Putting this up for review while I work on that last creature comfort._
Resolves https://github.com/letsencrypt/boulder/issues/3551
Replaces https://github.com/letsencrypt/boulder/pull/3620 (GH got stuck from a yaml error)
From... ahem... some frustrating debugging I determined that the Boulder
docker environment fails in strange & mysterious ways if you do not have
sufficient RAM. This commit adds this fact to the README to save future
souls my torment.
This commit also adds a cd to the initial git clone instructions to
ensure the user is in the correct directory to run docker-compose up
from.
After talking to @jsha, this updates Boulder's docker-compose.yml to version 2. I'm currently working on moving some Certbot tests from EC2 to Docker and this allows me to take advantage of networking features like embedded DNS which is used by default in newer versions of Docker Compose.
This shouldn't change any behavior of the file. One notable thing is that I had to add `network_mode: bridge` to the `bhsm` service. I don't believe this is a change in behavior, though, since `bhsm` was already included in the `links` section for `boulder`.
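The stanza ends up looking roughly like this (other keys elided):

```
# Hypothetical sketch of the version 2 file; other keys elided.
version: "2"
services:
  bhsm:
    network_mode: bridge
```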
Fixes #976.
This implements a new rate limit, InvalidAuthorizationsPerAccount. If a given account fails authorization for a given hostname too many times within the window, subsequent new-authz attempts for that account and hostname will fail early with a rateLimited error. This mitigates the problem of misconfigured clients that constantly retry authorization even though they always fail (e.g., because the hostname no longer resolves).
For the new rate limit, I added a new SA RPC, CountInvalidAuthorizations. I chose to implement this only in gRPC, not in AMQP-RPC, so checking the rate limit is gated on gRPC. See #2406 for some description of the how and why. I also chose to directly use the gRPC interfaces rather than wrapping them in core.StorageAuthority, as a step towards what we will want to do once we've moved fully to gRPC.
Because authorizations don't have a created time, we need to look at the expires time instead. Invalid authorizations retain the expiration they were given when they were created as pending authorizations, so we use now + pendingAuthorizationLifetime as one side of the window for rate limiting, and look backwards from there. Note that this means you could maliciously bypass this rate limit by stacking up pending authorizations over time, then failing them all at once.
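A minimal sketch of that window arithmetic (function and parameter names are illustrative, not Boulder's actual API):

```
// Hypothetical sketch of the InvalidAuthorizationsPerAccount window;
// names are illustrative, not Boulder's actual API.
package main

import (
	"fmt"
	"time"
)

// invalidAuthzWindow returns the window over which to count invalid
// authorizations. Since authzs have no created time, we anchor on the
// latest possible expiry (now + pendingAuthorizationLifetime) and look
// backwards one rate-limit window from there.
func invalidAuthzWindow(now time.Time, pendingAuthorizationLifetime, limitWindow time.Duration) (earliest, latest time.Time) {
	latest = now.Add(pendingAuthorizationLifetime)
	earliest = latest.Add(-limitWindow)
	return earliest, latest
}

func main() {
	earliest, latest := invalidAuthzWindow(time.Now(), 7*24*time.Hour, 24*time.Hour)
	fmt.Printf("count invalid authzs with expires in [%s, %s]\n", earliest, latest)
}
```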
Similarly, since this limit is by (account, hostname) rather than just (hostname), you can bypass it by creating multiple accounts. It would be more natural and robust to limit by hostname, like our certificate limits. However, we currently only have two indexes on the authz table: the primary key, and
(`registrationID`, `identifier`, `status`, `expires`).
Since this limit is intended mainly to combat misconfigured clients, I think this is sufficient for now.
Corresponding PR for website: letsencrypt/website#125
Pulls in logging improvements in OCSP Responder and the CT client, plus a handful of API changes. Also, the CT client verifies responses by default now.
This change includes some Boulder diffs to accommodate the API changes.
By default the Boulder dev env's VA config test/config/va.json has
a portConfig that connects to port 5002 for HTTP-01 challenges and
port 5001 for TLS-SNI-01 challenges.
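The relevant snippet looks roughly like this (field names approximate):

```
"portConfig": {
    "httpPort": 5002,
    "tlsPort": 5001
}
```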
This can be confusing to someone who follows the FAKE_DNS guidelines
for using a client from the host machine but expects the VA to reach
out on port 80 or 443 like production/staging's VA.
This commit updates the README to include a note on the non-standard
ports and how to change them.
Previously the instructions assumed you had Go set up on your host, which
somewhat defeats the point of running Boulder inside Docker, since it requires
more initial setup. These instructions make first-time users less likely to hit
the oci runtime error described later in the README.
We have seen a couple of issues from folks who run into trouble using `docker-compose up` with invalid `$GOPATH`s configured (e.g. see issues #2150, #2141, #2112).
This PR adds a sentence to the README indicating that if you see a Docker "oci runtime error" or a failure to create the container, it may be caused by your `$GOPATH`, and to check that first.
As a follow-up to #2098 this PR changes the sentence describing "an alias for letsencrypt" to reference an "alias to certbot" instead. This fixes what was an outdated reference to the client formerly known as Let's Encrypt, now Certbot.
The slow start guide's "working with a client" section of the README still referenced a `letsencrypt` path instead of the correct `certbot` path when describing sourcing the integration test alias script. This PR updates the path to use certbot.
Docker's now our main dev env, and running outside of Docker requires a change
to not use SoftHSM. Update the README to reflect that.
Also include instructions in the README on how to run SoftHSM on the host, update make-softhsm to match the current setup, and describe the mounting of GOPATH for instant updates to code.
That change broke the certbot tests because it switched to a MariaDB
10.1-specific syntax. certbot/certbot#3058 changes the certbot tests to use
Boulder's docker-compose.yml, so they will get MariaDB 10.1 automatically.
* MariaDB 10.1
* MariaDB 10.1 in Docker
* Run docker stuff.
* Improve test.js error.
* Lower log level
* Revert dockerfile to master
* Export debug ports, set FAKE_DNS, and remove container_name.
* Remove typo.
* Make integration-test.py wait for debug ports.
* Use 10.1 and export more Boulder ports.
* Test updates for Docker
Listen on 0.0.0.0 for utility servers.
Make integration-test.py just wait for ports rather than calling startservers.
Run docker-compose in test.sh.
Remove bypass when database exists.
Separate mailer test into its own function in integration test.
Print better errors in test.js.
* Always bring up mysql container.
* Wait for MySQL to come up.
* Put it in travis-before-install.
* Use 127
* Remove manual docker-up.
* Add ifconfig
* Switch to docker-compose run
* It works!
* Remove some spurious env vars.
* Add bash
* try running it
* Add all deps.
* Pass through env.
* Install everything in the Dockerfile.
* Fix install of ruby
* More improvements
* Revert integration test to run directly
Also remove .git from dockerignore and add some packages.
* Revert integration-test.py to master.
* Stop ignoring test/js
* Start from boulder-tools.
* Add boulder-tools.
* Tweak travis.yml
* Separate out docker-compose pull as install.
* Build in install phase; don't bother with go install in Dockerfile
* Add virtualenv
* Actually build rabbitmq-setup
* Remove FAKE_DNS
* Trivial change
* Pull boulder-tools as a separate step so it gets its own timing info.
* Install certbot and protobuf from repos.
* Use certbot from debian backports.
* Fix clone
* Remove CERTBOT_PATH
* Updates
* Go back to letsencrypt for build.sh
* Remove certbot volume.
* go back to preinstalled letsencrypt
* Restore ENV
* Remove BASH_ENV
* Adapt reloader test so it passes when run as root.
* Fixups for review.
* Revert test.js
* Revert startservers.py
* Revert Makefile.
* Fix go generate command in metrics.
The previous command only worked on OS X. This one works on Linux but not
OS X.
Also add generate phase of test.sh.
* Add mockgen to test setup.
* Fix github-pr-status output.
* Fix envvar style.
* Set xtrace.
* Fix test.sh
* Fix test.sh some more.
* Fix mockgen command.
* Add dependencies for running `go generate`.
* Add protoc-gen-go.
* Fix go get command.
* Fix generate.
* Wait for all.
* Fix generate.
* Update generated pb.
* Fix generate commands for vendored world.
* Update documentation for new vendor style.
* Update grpc package to latest.
* Update caaChecker proto with latest.
* Run go generate only over TESTPATHS
* See if Travis passes under 1.6
* Switch back to 1.5.
* Trim run command.
* Run stringer from correct directory.
* Move generate command.
* Restore and generate
* Fix path.
* list contents of GOPATH.
* Fix stringer by prebuilding.
* Try another import path.
* regenerate bcode_string.
* remove excess package
* pull jsha fork of protoc-gen-go that echoes
* Echo protoc version.
* install from source
* CD back.
* Go back to normal protoc-gen-go
* Fix path
* Move protobuf install into test/setup.sh
* Move before_install to install.
* Set PATH.
* Follow 301 with curl.
* Shuffle test order.
* Swap back test order.
* Restore all tests.
* Restore 1.5.3 to Travis.
* Remove unnecessary wait-or-exit
* Generate metrics mock with latest mockgen.
* Wrap TESTPATHS in curlies
* Remove spurious bracket
Use bridged networking.
Add some files to .dockerignore to shrink the build state sent to Docker
daemon.
Use specific hostnames to contact services, rather than localhost.
Add instructions for adding those hostnames to /etc/hosts in non-Docker config.
Use DSN-style connect strings for DBs (example sketched after this list).
Remove localhost / 127.0.0.1 rewrite hack from create_db.sh.
Add hosts section with new hostnames.
Remove bin from .dockerignore.
SQL grants go to %
Short-circuit DB creation if already existing.
Make `go install` a part of Docker image build so that Docker run is much
faster.
Bind to 0.0.0.0 for OCSP responders so they can be reached from host, and
publish / expose their ports.
Remove ToSServerThread and test.js' fetch of ToS.
Increase the registrationsPerIP rate limit threshold. When issuing from a Docker
host, the 127.0.0.1 override doesn't apply, so the limit is quickly hit.
Update docker-compose for bridged networking. Note: docker-compose doesn't currently work, but should be close.
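For reference, a DSN-style connect string here looks something like this (user and database names are illustrative):

```
mysql+tcp://sa@boulder-mysql:3306/boulder_sa_integration
```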
https://github.com/letsencrypt/boulder/pull/1639
The previously mentioned command fails with
```
$ go get -u https://github.com/tools/godep
package https:/github.com/tools/godep: "https://" not allowed in import path
```
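Dropping the scheme fixes it, since `go get` takes a bare import path:

```
$ go get -u github.com/tools/godep
```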
This accomplishes two things:
- setup.sh should now be usable by the client integration test.
- setup.sh can be used by new project members to simplify first setup.
Update the README to indicate the new file, and to correct some out-of-date
information.
Previously our executables would all try to declare the boulder exchange on
startup, which may have been leading to some race conditions in Travis. Also,
the Activity Monitor would try to bind a queue to the exchange at startup.
In prod both of these tasks are taken care of administratively, so including
them in the app code was adding unnecessary complexity. It also may have been
part of an issue causing Activity Monitor to fail to start up recently.
Also, turn the Activity Monitor into an RPC service, which gets it reconnection handling
for free, and add it to startservers.py.