We'd like to start using the DNS load balancer in the latest version of gRPC. That means putting all IPs for a service under a single hostname (or using an SRV record, but we're not taking that path). This change adds an sd-test-srv to act as our service discovery DNS service. It returns both Boulder IP addresses for any A lookup ending in ".boulder". This change also sets up the Docker DNS for our boulder container to defer to sd-test-srv when it doesn't know an answer.
sd-test-srv doesn't know how to resolve public Internet names like `github.com`. Resolving public names is required for the `godep-restore` test phase, so this change breaks out a copy of the boulder container that is used only for `godep-restore`.
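For illustration, the core of such a responder could look roughly like this (a minimal sketch using the miekg/dns package, not the actual sd-test-srv code; the two addresses are placeholders for the Boulder container's IPs):

```go
package main

import (
	"log"
	"net"
	"strings"

	"github.com/miekg/dns"
)

// Placeholder addresses standing in for the Boulder container's two IPs.
var boulderIPs = []net.IP{
	net.ParseIP("10.77.77.77"),
	net.ParseIP("10.88.88.88"),
}

// handle answers any A query for a name ending in ".boulder" with both IPs,
// and returns an empty answer section for everything else.
func handle(w dns.ResponseWriter, req *dns.Msg) {
	m := new(dns.Msg)
	m.SetReply(req)
	for _, q := range req.Question {
		if q.Qtype == dns.TypeA && strings.HasSuffix(q.Name, ".boulder.") {
			for _, ip := range boulderIPs {
				m.Answer = append(m.Answer, &dns.A{
					Hdr: dns.RR_Header{Name: q.Name, Rrtype: dns.TypeA, Class: dns.ClassINET},
					A:   ip.To4(),
				})
			}
		}
	}
	_ = w.WriteMsg(m)
}

func main() {
	dns.HandleFunc(".", handle)
	log.Fatal((&dns.Server{Addr: ":53", Net: "udp"}).ListenAndServe())
}
```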
This change implements a shim DNS resolver for gRPC, so that we can switch to DNS-based load balancing with the currently vendored gRPC; then, when we upgrade to the latest gRPC, we won't need a simultaneous config update.
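Such a shim might look roughly like the following sketch, written against the old `grpc/naming` interfaces in the currently vendored gRPC (names and error handling here are illustrative, not the actual code):

```go
package bgrpc

import (
	"errors"
	"net"

	"google.golang.org/grpc/naming"
)

// dnsResolver resolves a service hostname once and hands every A record to
// the gRPC balancer as a separate backend on a fixed port.
type dnsResolver struct {
	port string
}

func (r *dnsResolver) Resolve(target string) (naming.Watcher, error) {
	ips, err := net.LookupHost(target)
	if err != nil {
		return nil, err
	}
	var updates []*naming.Update
	for _, ip := range ips {
		updates = append(updates, &naming.Update{
			Op:   naming.Add,
			Addr: net.JoinHostPort(ip, r.port),
		})
	}
	return &staticWatcher{updates: updates, closed: make(chan struct{})}, nil
}

// staticWatcher reports the initial address set once, then blocks until closed.
type staticWatcher struct {
	updates []*naming.Update
	closed  chan struct{}
}

func (w *staticWatcher) Next() ([]*naming.Update, error) {
	if w.updates != nil {
		u := w.updates
		w.updates = nil
		return u, nil
	}
	<-w.closed
	return nil, errors.New("watcher closed")
}

func (w *staticWatcher) Close() { close(w.closed) }
```

A resolver like this would be plugged in via `grpc.RoundRobin(...)` and `grpc.WithBalancer(...)` at dial time.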
Also, this change introduces a check at the end of the integration test that each backend received at least one RPC, ensuring that we are not sending all load to a single backend.
Production/staging have been updated to a release built with Go 1.10.2.
This allows us to remove the Go 1.10.1 builds from the Travis matrix and
default to Go 1.10.2.
We're currently stuck on gRPC v1.1 because of a breaking change to certificate validation in gRPC 1.8. Our gRPC balancer uses a static list of multiple hostnames, and expects to validate against those hostnames. However, gRPC expects that a service is one hostname, with multiple IP addresses, and validates all those IP addresses against the same hostname. See grpc/grpc-go#2012.
If we follow gRPC's assumptions, we can rip out our custom Balancer and custom TransportCredentials, and will probably have a lower-friction time in general.
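Under that model, a client dial would look roughly like this (a hedged sketch; the service name, port, and credentials are placeholders, not our current code):

```go
package bgrpc

import (
	"crypto/tls"
	"crypto/x509"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// dialService dials a service the way upstream gRPC expects: one hostname
// that resolves to every backend IP, with every connection validated against
// that same hostname.
func dialService(clientCert tls.Certificate, roots *x509.CertPool) (*grpc.ClientConn, error) {
	creds := credentials.NewTLS(&tls.Config{
		ServerName:   "sa.boulder", // the single, generic service name (placeholder)
		RootCAs:      roots,
		Certificates: []tls.Certificate{clientCert},
	})
	return grpc.Dial("sa.boulder:9095", grpc.WithTransportCredentials(creds))
}
```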
This PR is the first step in doing so. In order to satisfy the "multiple IPs, one port" property of gRPC backends in our Docker container infrastructure, we switch to Docker's user-defined networking. This allows us to give the Boulder container multiple IP addresses on different local networks, and gives it different DNS aliases in each network.
In startservers.py, each shard of a service listens on a different DNS alias for that service, and therefore a different IP address. The listening port for each shard of a service is now identical.
This change also updates the gRPC service certificates. Now, each certificate that is used in a gRPC service (as opposed to something that is "only" a client) has three names: for instance, sa1.boulder, sa2.boulder, and sa.boulder (the generic service name). For now, we are validating against the specific hostnames. When we update our gRPC dependency, we will begin validating against the generic service name.
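As a concrete check, a certificate generated this way has to verify against all three names (a small sketch, assuming a parsed `*x509.Certificate` for the SA service; the helper name is hypothetical):

```go
package bgrpc

import (
	"crypto/x509"
	"fmt"
)

// checkServiceNames confirms a gRPC service certificate covers both the
// per-instance names and the generic service name.
func checkServiceNames(cert *x509.Certificate) error {
	for _, name := range []string{"sa1.boulder", "sa2.boulder", "sa.boulder"} {
		if err := cert.VerifyHostname(name); err != nil {
			return fmt.Errorf("certificate does not cover %q: %s", name, err)
		}
	}
	return nil
}
```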
Incidentally, the DNS aliases feature of Docker allows us to get rid of some hackery in entrypoint.sh that inserted entries into /etc/hosts.
Note: Boulder now has a dependency on the DNS aliases feature in Docker. By default, `docker-compose run` creates a temporary container and doesn't assign any aliases to it. We now need to specify `docker-compose run --use-aliases` to get the correct behavior. Without `--use-aliases`, Boulder won't be able to resolve the hostnames it wants to bind to.
This PR adds Go 1.10.2 to the build matrix along with Go 1.10.1.
After staging/prod have been updated to Go 1.10.2 we can remove Go 1.10.1.
Resolves #3680
We have updated staging/prod Boulder builds to use Go 1.10.1. This means
we no longer need to support Go 1.10.0 in dev docker images, CI, and our
image building tools.
This PR updates the `test/boulder-tools/tag_and_upload.sh` script to template a `Dockerfile` for building multiple copies of `boulder-tools`: one per supported Go version. Unfortunately this is required because only Docker 17+ supports an env var in a Dockerfile `FROM`. It's best if we can stay on package-manager-installed versions of Docker, which precludes 17+ 😞.
The `docker-compose.yml` is updated to version "3" to allow specifying a `GO_VERSION` env var in the respective Boulder `image` directives. This requires `docker-compose` version 1.10.0+, which in turn requires Docker engine version 1.13.0+. The README is updated to reflect these new requirements. This Docker engine version is commonly available in package managers (e.g. Ubuntu 16.04). A sufficient `docker-compose` version is not, but it is a simple single-binary Go application that is easy to update outside of package managers.
The `.travis.yml` config file is updated to set the `GO_VERSION` in the build matrix, allowing build tasks for different Go versions. Since the `docker-compose.yml` now requires `docker-compose` 1.10.0+, the
`.travis.yml` also gains a new `before_install` step for setting up a modern `docker-compose` version.
Lastly, tools and images are updated to support both Go 1.10 (our current Go version) and Go 1.10.1 (the new point release). By default Go 1.10 is used; we can switch this once staging/prod are updated.
_*TODO*: One thing I haven't implemented yet is a `sed` expression in `tag_and_upload.sh` that updates both `image` lines in `docker-compose.yml` with an up-to-date tag. Putting this up for review while I work on that last creature comfort._
Resolves https://github.com/letsencrypt/boulder/issues/3551
Replaces https://github.com/letsencrypt/boulder/pull/3620 (GH got stuck from a yaml error)
- Remove acme-v2 test phase.
- Rename integration-test-v2.py to v2_integration, so it can be imported.
- Import all symbols from v2_integration before running test_*.
- In chisel2:
  - Rename DIRECTORY so it doesn't collide.
  - Incidental logging and error fixes.
- Merge v1 and v2 load testing into a single function.
- Run cert-checker just once, after all other test cases.
- In v2_integration:
  - Remove unnecessary imports.
  - Import chisel2 methods in the chisel2 namespace so they don't
    collide with chisel methods.
  - Remove main and shutdown code.
We have a Dockerfile that builds off of boulder-tools, but does
basically nothing. Removing it saves a build step and avoids some
hard-to-diagnose edge cases.
This change triggered an issue with the "prom" container and
"docker-compose up", which complained that prom can't be linked to a
not-running container. Since we rarely use prom in dev/test, I'm removing
it for now. The nice, futuristic way to get containers to talk to each
other is Docker networking, so we can bring prom back if we switch to
Docker networking in the future.
Travis only allows us 5 simultaneous build jobs, so going from 6 to 5 jobs per
build should reduce the wall time required to get a CI result on any given
branch.
- Add OCSP graphs
- Graph overall request rate
- Separate out WFE vs OCSP graphs
- Fix challenge graph (add a / to endpoint)
- Some incidental changes to "step"
- Add a lint script to check for common dashboard mistakes
The unit test runs in CI have been taking ~20 minutes. The root cause is
using `-race` on every individual `go test` invocation. We can't switch
to one big `go test` with `-race` in place of the individual invocations
if we want test coverage to be reported. The workaround is to do one big
`go test` with `-race` first, and then many individual `go test` runs to
collect coverage *without* `-race`. This is still faster overall than the
current state of affairs.
Resolves https://github.com/letsencrypt/boulder/issues/2695
Remove the explicit `boulder-tools` `docker pull` command.
Per @jsha's comment in #2567 it should be possible to remove the explicit `docker pull letsencrypt/boulder-tools`, since the `docker-compose pull` that precedes it will take care of it.
Previously we had a 'latest' docker image tag for the `boulder-tools`
image that was *not* the most recent. This was confusing, so we deleted
it this morning to close #2030.
Unfortunately Travis was defaulting to pulling the "latest" tag since
one wasn't specified (unlike the way we do in `Dockerfile` and
`docker-compose.yml`), resulting in build breakage:
```
docker pull letsencrypt/boulder-tools
Using default tag: latest
Pulling repository docker.io/letsencrypt/boulder-tools
Tag latest not found in repository docker.io/letsencrypt/boulder-tools
The command "docker pull letsencrypt/boulder-tools" failed and exited
with 1
```
This commit specifies the same tag as in `Dockerfile` and
`docker-compose.yml` for Travis. We will need to update this tag when we
update the other places for a new boulder-tools image.
Protobuf files need to be regenerated because (I think) Golang 1.7.3 uses a somewhat different method of ordering fields in a struct when marshaling to bytes.
* Switch back to go 1.5 in Travis.
* Add back GO15VENDOREXPERIMENT.
* Add GO15VENDOREXPERIMENT to Dockerfile
* Revert FAKE_DNS change.
* Revert "Properly close test servers (#2110)"
* Revert "Close VA HTTP test servers (#2111)"
* Change Godep version to 1.5.
* Standardize on issue number
This PR modifies the `test.sh` script to allow an `rpm` value in the `RUN` parameter passed to the script via the environment. When present, `make rpm` is invoked and a good status is required for the build to pass.
The `Makefile` was modified to add a `-f` to the `fpm` invocation used by the `rpm` build task to allow the output rpm to be overwritten if present. Otherwise multiple runs of identical builds (e.g. on a local dev machine) would collide on the .rpm already being present.
Finally `.travis.yml` is updated to include `rpm` in the `RUN` used during CI such that an RPM is built by default for CI runs. I left the default `RUN` in `test.sh` unmodified, so an RPM will not be built for local runs (e.g. `docker-compose run boulder ./test.sh`).
This fixes #2085
This PR removes the use of the global configuration variable `BOULDER_CONFIG`. It also removes the global configuration struct `cmd.Config`. Furthermore, it removes the codegangsta/cli dependency and the last bit of code that was using it, `cmd/single-ocsp/main.go`.
This is the final (hopefully) pull request in the work to remove the reliance on a global configuration structure. Included below is a history of all other pull requests relevant in accomplishing this:
WFE (#1973)
RA (#1974)
SA (#1975)
CA (#1978)
VA (#1979)
Publisher (#2008)
OCSP Updater (#2013)
OCSP Responder (#2017)
Admin Revoker (#2053)
Expiration Mailer (#2036)
Cert Checker (#2058)
Orphan Finder (#2059)
Single OCSP (this PR)
Closes #1962
Moves the wfe to its own config file.
Each config will now belong in `test/config` and `test/config-next`, analogous to `boulder-config` and `boulder-config-next`.
Moving to Docker meant that we weren't passing through the BOULDER_CONFIG
variable properly, which meant we weren't testing the boulder-config-next.json
configuration. That allowed https://github.com/letsencrypt/boulder/issues/1948
to pass unnoticed.
Credit to @benileo for asking the question of why that issue wasn't caught in
testing, which led to this fix. Thanks for the attention to detail!
They are only run inside an "if Travis" block, and we know those tools are
installed in the Docker image we use on Travis. This restores coverage reporting
to our builds.
https://github.com/letsencrypt/boulder/pull/1850
That change broke the certbot tests because it switched to a MariaDB
10.1-specific syntax. certbot/certbot#3058 changes the certbot tests to use
Boulder's docker-compose.yml, so they will get MariaDB 10.1 automatically.
* MariaDB 10.1
* MariaDB 10.1 in Docker
* Run docker stuff.
* Improve test.js error.
* Lower log level
* Revert dockerfile to master
* Export debug ports, set FAKE_DNS, and remove container_name.
* Remove typo.
* Make integration-test.py wait for debug ports.
* Use 10.1 and export more Boulder ports.
* Test updates for Docker
Listen on 0.0.0.0 for utility servers.
Make integration-test.py just wait for ports rather than calling startservers.
Run docker-compose in test.sh.
Remove bypass when database exists.
Separate mailer test into its own function in integration test.
Print better errors in test.js.
* Always bring up mysql container.
* Wait for MySQL to come up.
* Put it in travis-before-install.
* Use 127
* Remove manual docker-up.
* Add ifconfig
* Switch to docker-compose run
* It works!
* Remove some spurious env vars.
* Add bash
* try running it
* Add all deps.
* Pass through env.
* Install everything in the Dockerfile.
* Fix install of ruby
* More improvements
* Revert integration test to run directly
Also remove .git from dockerignore and add some packages.
* Revert integration-test.py to master.
* Stop ignoring test/js
* Start from boulder-tools.
* Add boulder-tools.
* Tweak travis.yml
* Separate out docker-compose pull as install.
* Build in install phase; don't bother with go install in Dockerfile
* Add virtualenv
* Actually build rabbitmq-setup
* Remove FAKE_DNS
* Trivial change
* Pull boulder-tools as a separate step so it gets its own timing info.
* Install certbot and protobuf from repos.
* Use cerbot from debian backports.
* Fix clone
* Remove CERTBOT_PATH
* Updates
* Go back to letsencrypt for build.sh
* Remove certbot volume.
* go back to preinstalled letsencrypt
* Restore ENV
* Remove BASH_ENV
* Adapt reloader test so it passes when run as root.
* Fixups for review.
* Revert test.js
* Revert startservers.py
* Revert Makefile.
* Fix go generate command in metrics.
The previous command only worked on OS X. This one works on Linux but not
OS X.
Also add generate phase of test.sh.
* Add mockgen to test setup.
* Fix github-pr-status output.
* Fix envvar style.
* Set xtrace.
* Fix test.sh
* Fix test.sh some more.
* Fix mockgen command.
* Add dependencies for running `go generate`.
* Add protoc-gen-go.
* Fix go get command.
* Fix generate.
* Wait for all.
* Fix generate.
* Update generated pb.
* Fix generate commands for vendored world.
* Update documentation for new vendor style.
* Update grpc package to latest.
* Update caaChecker proto with latest.
* Run go generate only over TESTPATHS
* See if Travis passes under 1.6
* Switch back to 1.5.
* Trim run command.
* Run stringer from correct directory.
* Move generate command.
* Restore and generate
* Fix path.
* list contents of GOPATH.
* Fix stringer by prebuilding.
* Try another import path.
* regenerate bcode_string.
* remove excess package
* pull jsha fork of protoc-gen-go that echoes
* Echo protoc version.
* install from source
* CD back.
* Go back to normal protoc-gen-go
* Fix path
* Move protobuf install into test/setup.sh
* Move before_install to install.
* Set PATH.
* Follow 301 with curl.
* Shuffle test order.
* Swap back test order.
* Restore all tests.
* Restore 1.5.3 to Travis.
* Remove unnecessary wait-or-exit
* Generate metrics mock with latest mockgen.
* Wrap TESTPATHS in curlies
* Remove spurious bracket
* Split out CAA checking service (minus logging etc)
* Add example.yml config + follow general Boulder style
* Update protobuf package to correct version
* Add grpc client to va
* Add TLS authentication in both directions for CAA client/server
* Remove go lint check
* Add bcodes package listing custom codes for Boulder
* Add very basic (pull-only) gRPC metrics to VA + caa-service
* Fix all errcheck errors
* Add errcheck to test.sh
* Add a new sa.Rollback method to make handling errors in rollbacks easier (sketched below).
This also causes a behavior change in the VA. If an HTTP connection is
abruptly closed after serving the headers for a non-200 response, the
reported error will be the read failure instead of the non-200.
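The `sa.Rollback` helper mentioned above might look roughly like this (a sketch only; the real signature and error wrapping may differ):

```go
package sa

import "fmt"

// transaction is whatever tx type is in use (e.g. a gorp transaction).
type transaction interface {
	Rollback() error
}

// Rollback rolls back tx after err and makes sure a failed rollback doesn't
// silently swallow the original error.
func Rollback(tx transaction, err error) error {
	if rbErr := tx.Rollback(); rbErr != nil {
		return fmt.Errorf("%s (also failed to roll back: %s)", err, rbErr)
	}
	return err
}
```

Call sites can then do something like `return Rollback(tx, err)` instead of handling the rollback error separately everywhere.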
Use bridged networking.
Add some files to .dockerignore to shrink the build context sent to the Docker
daemon.
Use specific hostnames to contact services, rather than localhost.
Add instructions for adding those hostnames to /etc/hosts in non-Docker config.
Use DSN-style connect strings for DBs (see the example after this list).
Remove localhost / 127.0.0.1 rewrite hack from create_db.sh.
Add hosts section with new hostnames.
Remove bin from .dockerignore.
SQL grants go to %
Short-circuit DB creation if already existing.
Make `go install` a part of Docker image build so that Docker run is much
faster.
Bind to 0.0.0.0 for OCSP responders so they can be reached from host, and
publish / expose their ports.
Remove ToSServerThread and test.js' fetch of ToS.
Increase the registrationsPerIP rate limit threshold. When issuing from a Docker
host, the 127.0.0.1 override doesn't apply, so the limit is quickly hit.
Update docker-compose for bridged networking. Note: docker-compose doesn't currently work, but should be close.
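For reference, a DSN-style connect string in the go-sql-driver/mysql format looks like the sketch below (user, host, and database names are placeholders, not necessarily the values used in our configs):

```go
package main

import (
	"database/sql"

	_ "github.com/go-sql-driver/mysql" // registers the "mysql" driver
)

func openDB() (*sql.DB, error) {
	// user@tcp(host:port)/dbname — the DSN form, as opposed to a URL-style
	// connect string like mysql://user@host:port/dbname.
	return sql.Open("mysql", "sa@tcp(boulder-mysql:3306)/boulder_sa_integration")
}
```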
https://github.com/letsencrypt/boulder/pull/1639
We get intermittent failures on Travis where MariaDB times out writes. Travis
support recommended we start catting the logs after a failure to see if there's
anything useful.