### Connect before sending mail, not at startup
Per #2250 when we connect to the remote SMTP server at start-up time by calling `mailer.Connect()` but do not actually call `mailer.SendMail()` until after we have done some potentially expensive/time-consuming work we are liable to have our connection closed due to timeout.
This PR moves the `Connect()` call in `expiration-mailer` and `notify-mailer` to be closer to where the actual messages are sent via `SendMail()` and resolves#2250
### Handle SMTP 421 errors gracefully
Issue #2249 describes a case where we see this SMTP error code from the remote server when our connection has been idle for too long. This would manifest when connecting to the remote server at startup, running a very long database query, and then sending mail. This commit allows the mailer to treat SMTP 421 errors as an event that should produce a reconnect attempt and resolves#2249.
A unit test is added to the mailer tests to test that reconnection works when the server sends a SMTP 421 error. Prior to b64e51f and support for SMTP 421 reconnection this test failed in a manner matching issue #2249:
```
go test -p 1 -race --test.run TestReconnectSMTP421
github.com/letsencrypt/boulder/mail
Wrote goodbye msg: 421 1.2.3 green.eggs.and.spam Error: timeout exceeded
Cutting off client early
--- FAIL: TestReconnectSMTP421 (0.00s)
mailer_test.go:257: Expected SendMail() to not fail. Got err: 421
1.2.3 green.eggs.and.spam Error: timeout exceeded
FAIL
FAIL github.com/letsencrypt/boulder/mail 0.023s
```
With b64e51f the test passes and the client gracefully reconnects.
The existing reconnect testing logic in the `mail-test-srv` integration tests is changed such that half of the forced disconnects are a normal clean connection close and half are a SMTP 421. This allows the existing integration test for server disconnects to be reused to test the 421 reconnect logic.
The `MailerImpl` gains a few new fields (`retryBase`, & `retryMax`). These are used with `core.RetryBackoff` in `reconnect()` to implement exponential backoff in a reconnect attempt loop. Both `expiration-mailer` and `notify-mailer` are modified to add CLI args for these 2 flags and to wire them into the `MailerImpl` via its `New()` constructor.
In `MailerImpl`'s `SendMail()` function it now detects when `sendOne` returns an `io.EOF` error indicating that the server closed the connection unexpectedly. When this case occurs `reconnect()` is invoked. If the reconnect succeeds then we invoke `sendOne` again to try and complete the message sending operation that was interrupted by the disconnect.
For integration testing purposes I modified the `mail-test-srv` to support a `-closeChance` parameter between 0 and 100. This controls what % of `MAIL` commands will result in the server immediately closing the client connection before further processing. This allows us to simulate a flaky mailserver. `test/startservers.py` is modified to start the `mail-test-srv` with a 35% close chance to thoroughly test the reconnection logic during the existing `expiration-mailer` integration tests. I took this as a chance to do some slight clean-up of the `mail-test-srv` code (mostly removing global state).
For unit testing purposes I modified the mailer `TestConnect` test to abstract out a server that can operate similar to `mail-test-serv` (e.g. can close connections artificially).
This is testing a server that **closes** a connection, and not a server that **goes away/goes down**. E.g. the `core.RetryBackoff` sleeps themselves are not being tested. The client is disconnected and attempts a reconnection which always succeeds on the first try. To test a "gone away" server would require a more substantial rewrite of the unit tests and the `mail-test-srv`/integration tests. I think this matches the experience we have with MailChimp/Mandril closing long lived connections.
That change broke the certbot tests because it switched to a MariaDB
10.1-specific syntax. certbot/certbot#3058 changes the certbot tests to use
Boulder's docker-compose.yml, so they will get MariaDB 10.1 automatically.
* MariaDB 10.1
* MariaDB 10.1 in Docker
* Run docker stuff.
* Improve test.js error.
* Lower log level
* Revert dockerfile to master
* Export debug ports, set FAKE_DNS, and remove container_name.
* Remove typo.
* Make integration-test.py wait for debug ports.
* Use 10.1 and export more Boulder ports.
* Test updates for Docker
Listen on 0.0.0.0 for utility servers.
Make integration-test.py just wait for ports rather than calling startservers.
Run docker-compose in test.sh.
Remove bypass when database exists.
Separate mailer test into its own function in integration test.
Print better errors in test.js.
* Always bring up mysql container.
* Wait for MySQL to come up.
* Put it in travis-before-install.
* Use 127
* Remove manual docker-up.
* Add ifconfig
* Switch to docker-compose run
* It works!
* Remove some spurious env vars.
* Add bash
* try running it
* Add all deps.
* Pass through env.
* Install everything in the Dockerfile.
* Fix install of ruby
* More improvements
* Revert integration test to run directly
Also remove .git from dockerignore and add some packages.
* Revert integration-test.py to master.
* Stop ignoring test/js
* Start from boulder-tools.
* Add boulder-tools.
* Tweak travis.yml
* Separate out docker-compose pull as install.
* Build in install phase; don't bother with go install in Dockerfile
* Add virtualenv
* Actually build rabbitmq-setup
* Remove FAKE_DNS
* Trivial change
* Pull boulder-tools as a separate step so it gets its own timing info.
* Install certbot and protobuf from repos.
* Use cerbot from debian backports.
* Fix clone
* Remove CERTBOT_PATH
* Updates
* Go back to letsencrypt for build.sh
* Remove certbot volume.
* go back to preinstalled letsencrypt
* Restore ENV
* Remove BASH_ENV
* Adapt reloader test so it psses when run as root.
* Fixups for review.
* Revert test.js
* Revert startservers.py
* Revert Makefile.
- Remove error signatures from log methods. This means fewer places where errcheck will show ignored errors.
- Pull in latest cfssl to be compatible with errorless log messages.
- Reduce the number of message priorities we support to just those we actually use.
- AuditNotice -> AuditInfo
- Remove InfoObject (only one use, switched to Info)
- Remove EmergencyExit and related functions in favor of panic
- Remove SyslogWriter / AuditLogger separate types in favor of a single interface, Logger, that has all the logging methods on it.
- Merge mock log into logger. This allows us to unexport the internals but still override them in the mock.
- Shorten names to be compatible with Go style: New, Set, Get, Logger, NewMock, etc.
- Use a shorter log format for stdout logs.
- Remove "... Starting" log messages. We have better information in the "Versions" message logged at startup.
Motivation: The AuditLogger / SyslogWriter distinction was confusing and exposed internals only necessary for tests. Some components accepted one type and some accepted the other. This made it hard to consistently use mock loggers in tests. Also, the unnecessarily fat interface for AuditLogger made it hard to meaningfully mock out.
This creates a new server, 'mail-test-srv', which is a simplistic SMTP
server that accepts mail and can report the received mail over HTTP.
An integration test is added that uses the new server to test the expiry
mailer.
The FAKECLOCK environment variable is used to force the expiry mailer to
think that the just-issued certificate is about to expire.
Additionally, the expiry mailer is modified to cleanly shut down its
SMTP connections.