Commit Graph

19 Commits

Author SHA1 Message Date
Aaron Gable 212a66ab49
Update go versions in CI and release (#7971)
Update from go1.23.1 to go1.23.6 for our primary CI and release builds.
This brings in a few security fixes that aren't directly relevant to us.

Add go1.24.0 to our matrix of CI and release versions, to prepare for
switching to this next major version in prod.
2025-02-19 14:37:01 -08:00
Aaron Gable bd1d27b8e8
Fix non-gRPC process cleanup and exit (#6808)
Although #6771 significantly cleaned up how gRPC services stop and clean
up, it didn't make any changes to our HTTP servers or our non-server
(e.g. crl-updater, log-validator) processes. This change finishes the
work.

Add a new helper method cmd.WaitForSignal, which simply blocks until one
of the three signals we care about is received. This easily replaces all
calls to cmd.CatchSignals which passed `nil` as the callback argument,
with the added advantage that it doesn't call os.Exit() and therefore
allows deferred cleanup functions to execute. This new function is
intended to be the last line of main(), allowing the whole process to
exit once it returns.

Reimplement cmd.CatchSignals as a thin wrapper around cmd.WaitForSignal,
but with the added callback functionality. Also remove the os.Exit()
call from CatchSignals, so that the main goroutine is allowed to finish
whatever it's doing, call deferred functions, and exit naturally.

Update all of our non-gRPC binaries to use one of these two functions.
The vast majority use WaitForSignal, as they run their main processing
loop in a background goroutine. A few (particularly those that can run
either in run-once or in daemonized mode) still use CatchSignals, since
their primary processing happens directly on the main goroutine.

The changes to //test/load-generator are the most invasive, simply
because that binary needed to have a context plumbed into it for proper
cancellation, but it already had a custom struct type named "context"
which needed to be renamed to avoid shadowing.

Fixes https://github.com/letsencrypt/boulder/issues/6794
2023-04-14 16:22:56 -04:00
Aaron Gable 18216a7ea8
Run CI tests on go1.20 (#6550)
Add go1.20 as a new version to run tests on, and to build release
artifacts from. Fix one test which was failing because it was
accidentally relying on consistent (i.e. unseeded) non-cryptographic
random number generation, which go1.20 now automatically seeds at import
time.

Update the version of golangci-lint used in our docker containers to the
new version that has go1.20 support. Remove a number of nolint comments
that were required due to an old version of the gosec linter.
2023-02-03 11:57:07 -08:00
Jacob Hoffman-Andrews dd1c52573e
log: allow logging to stdout/stderr instead of syslog (#6307)
Right now, Boulder expects to be able to connect to syslog, and panics
if it's not available. We'd like to be able to log to stdout/stderr as a
replacement for syslog.

- Add a detailed timestamp (down to microseconds, same as we collect in
prod via syslog).
- Remove the escape codes for colorizing output.
- Report the severity level numerically rather than with a letter prefix.

Add locking for stdout/stderr and syslog logs. Neither the [syslog] package
nor the [os] package document concurrency-safety, and the Go rule is: if
it's not documented to be concurrent-safe, it's not. Notably the [log.Logger]
package is documented to be concurrent-safe, and a look at its implementation
shows it uses a Mutex internally.

Remove places that use the singleton `blog.Get()`, and instead pass through
a logger from main in all the places that need it.

[syslog]: https://pkg.go.dev/log/syslog
[os]: https://pkg.go.dev/os
[log.Logger]: https://pkg.go.dev/log#Logger
2022-08-29 06:19:22 -07:00
Aaron Gable 305ef9cce9
Improve error checking paradigm (#5920)
We have decided that we don't like the if err := call(); err != nil
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: It caused readers to believe that
a later err = call() statement was assigning to an already-declared err
in the local scope, when in fact it was assigning to an
already-declared err in the parent scope of a closure. This caused our
ineffassign and staticcheck linters to be unable to analyze the
lifetime of the err variable, and so they did not complain when we
never checked the actual value of that error.

This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
2022-02-01 14:42:43 -07:00
Jacob Hoffman-Andrews ca26126ca9
Replace master with main. (#4917)
Also, update an example username in mailer tests.
2020-06-30 16:39:39 -07:00
Jacob Hoffman-Andrews bef02e782a
Fix nits found by staticcheck (#4726)
Part of #4700
2020-03-30 10:20:20 -07:00
Jacob Hoffman-Andrews 3a1a08a10b
Remove unused code. (#4722)
Found by staticcheck.
2020-03-27 11:55:42 -07:00
Joel Sing 8ebdfc60b6 Provide formatting logger functions. (#3699)
A very large number of the logger calls are of the form log.Function(fmt.Sprintf(...)).
Rather than sprinkling fmt.Sprintf at every logger call site, provide formatting versions
of the logger functions and call these directly with the format and arguments.

While here remove some unnecessary trailing newlines and calls to String/Error.
2018-05-10 11:06:29 -07:00
Jacob Hoffman-Andrews 6b8b6a37c0 Update chisel2 and boulder-tools (#3495)
This change updates boulder-tools to use Go 1.10, and references a
newly-pushed image built using that new config.

Since boulder-tools pulls in the latest Certbot master at the time of
build, this also pulls in the latest changes to Certbot's acme module,
which now supports ACME v2. This means we no longer have to check out
the special acme-v2-integration branch in our integration tests.

This also updates chisel2.py to reflect some of the API changes that
landed in the acme module as it was merged to master.

Since we don't need additional checkouts to get the ACMEv2-compatible
version of the acme module, we can include it in the default RUN set for
local tests.
2018-02-28 15:21:40 -08:00
Jacob Hoffman-Andrews 4296dd985a Use TLS in mailer integration tests (#3213)
* Remove non-TLS support from mailer entirely
* Add a config option for trusted roots in expiration-mailer. If unset, it defaults to the system roots, so this does not need to be set in production.
* Use TLS in mail-test-srv, along with an internal root and localhost certificates signed by that root.
2017-11-06 14:57:14 -08:00
Jacob Hoffman-Andrews 4128e0d95a Add time-dependent integration testing (#3060)
Fixes #3020.

In order to write integration tests for some features, especially related to rate limiting, rechecking of CAA, and expiration of authzs, orders, and certs, we need to be able to fake the passage of time in integration tests.

To do so, this change switches out all clock.Default() instances for cmd.Clock(), which can be set manually with the FAKECLOCK environment variable. integration-test.py now starts up all servers once before the main body of tests, with FAKECLOCK set to a date 70 days ago, and does some initial setup for a new integration test case. That test case tries to fetch a 70-day-old authz URL, and expects it to 404.

In order to make this work, I also had to change a number of our test binaries to shut down cleanly in response to SIGTERM. Without that change, stopping the servers between the setup phase and the main tests caused startservers.check() to fail, because some processes exited with nonzero status.

Note: This is an initial stab at things, to prove out the technique. Long-term, I think we will want to use an idiom where test cases are classes that have a number of optional setup phases that may be run at e.g. 70 days prior and 5 days prior. This could help us avoid a proliferation of global state as we add more time-dependent test cases.
2017-09-13 12:34:14 -07:00
Daniel McCarney 8efc6342bb Mailer reliability improvements (#2262)
### Connect before sending mail, not at startup

Per #2250 when we connect to the remote SMTP server at start-up time by calling `mailer.Connect()` but do not actually call `mailer.SendMail()` until after we have done some potentially expensive/time-consuming work we are liable to have our connection closed due to timeout.

This PR moves the `Connect()` call in `expiration-mailer` and `notify-mailer` to be closer to where the actual messages are sent via `SendMail()` and resolves #2250 

### Handle SMTP 421 errors gracefully

Issue #2249 describes a case where we see this SMTP error code from the remote server when our connection has been idle for too long. This would manifest when connecting to the remote server at startup, running a very long database query, and then sending mail. This commit allows the mailer to treat SMTP 421 errors as an event that should produce a reconnect attempt and resolves #2249.

A unit test is added to the mailer tests to test that reconnection works when the server sends a SMTP 421 error. Prior to b64e51f and support for SMTP 421 reconnection this test failed in a manner matching issue #2249:

```
go test -p 1 -race --test.run TestReconnectSMTP421
github.com/letsencrypt/boulder/mail
Wrote goodbye msg: 421 1.2.3 green.eggs.and.spam Error: timeout exceeded
Cutting off client early
--- FAIL: TestReconnectSMTP421 (0.00s)
  mailer_test.go:257: Expected SendMail() to not fail. Got err: 421
  1.2.3 green.eggs.and.spam Error: timeout exceeded
  FAIL
  FAIL  github.com/letsencrypt/boulder/mail     0.023s
```

With b64e51f the test passes and the client gracefully reconnects.

The existing reconnect testing logic in the `mail-test-srv` integration tests is changed such that half of the forced disconnects are a normal clean connection close and half are a SMTP 421. This allows the existing integration test for server disconnects to be reused to test the 421 reconnect logic.
2016-10-20 14:10:47 -04:00
Daniel McCarney a584f8de46 Allow `mailer` to reconnect to server. (#2101)
The `MailerImpl` gains a few new fields (`retryBase`, &  `retryMax`). These are used with `core.RetryBackoff` in `reconnect()` to implement exponential backoff in a reconnect attempt loop. Both `expiration-mailer` and `notify-mailer` are modified to add CLI args for these 2 flags and to wire them into the `MailerImpl` via its `New()` constructor.

In `MailerImpl`'s `SendMail()` function it now detects when `sendOne` returns an `io.EOF` error indicating that the server closed the connection unexpectedly. When this case occurs `reconnect()` is invoked. If the reconnect succeeds then we invoke `sendOne` again to try and complete the message sending operation that was interrupted by the disconnect.

For integration testing purposes I modified the `mail-test-srv` to support a `-closeChance` parameter between 0 and 100. This controls what % of `MAIL` commands will result in the server immediately closing the client connection before further processing. This allows us to simulate a flaky mailserver. `test/startservers.py` is modified to start the `mail-test-srv` with a 35% close chance to thoroughly test the reconnection logic during the existing `expiration-mailer` integration tests. I took this as a chance to do some slight clean-up of the `mail-test-srv` code (mostly removing global state).

For unit testing purposes I modified the mailer `TestConnect` test to abstract out a server that can operate similar to `mail-test-serv` (e.g. can close connections artificially).

This is testing a server that **closes** a connection, and not a server that **goes away/goes down**. E.g. the `core.RetryBackoff` sleeps themselves are not being tested. The client is disconnected and attempts a reconnection which always succeeds on the first try. To test a "gone away" server would require a more substantial rewrite of the unit tests and the `mail-test-srv`/integration tests. I think this matches the experience we have with MailChimp/Mandril closing long lived connections.
2016-08-15 14:14:49 -07:00
Jacob Hoffman-Andrews 71e4af43f7 Roll forward "Run Travis tests in Docker (#1830)" (#1838)
That change broke the certbot tests because it switched to a MariaDB
10.1-specific syntax. certbot/certbot#3058 changes the certbot tests to use
Boulder's docker-compose.yml, so they will get MariaDB 10.1 automatically.
2016-05-24 15:11:22 -07:00
Jacob Hoffman-Andrews b954dcc010 Revert "Run Travis tests in Docker (#1830)" (#1834)
This reverts commit 92d94f2 and commit 0b4623f to unbreak the Certbot build.
2016-05-20 15:57:10 -07:00
Jacob Hoffman-Andrews 92d94f2558 Run Travis tests in Docker (#1830)
* MariaDB 10.1

* MariaDB 10.1 in Docker

* Run docker stuff.

* Improve test.js error.

* Lower log level

* Revert dockerfile to master

* Export debug ports, set FAKE_DNS, and remove container_name.

* Remove typo.

* Make integration-test.py wait for debug ports.

* Use 10.1 and export more Boulder ports.

* Test updates for Docker

Listen on 0.0.0.0 for utility servers.
Make integration-test.py just wait for ports rather than calling startservers.
Run docker-compose in test.sh.
Remove bypass when database exists.
Separate mailer test into its own function in integration test.
Print better errors in test.js.

* Always bring up mysql container.

* Wait for MySQL to come up.

* Put it in travis-before-install.

* Use 127

* Remove manual docker-up.

* Add ifconfig

* Switch to docker-compose run

* It works!

* Remove some spurious env vars.

* Add bash

* try running it

* Add all deps.

* Pass through env.

* Install everything in the Dockerfile.

* Fix install of ruby

* More improvements

* Revert integration test to run directly
Also remove .git from dockerignore and add some packages.

* Revert integration-test.py to master.

* Stop ignoring test/js

* Start from boulder-tools.

* Add boulder-tools.

* Tweak travis.yml

* Separate out docker-compose pull as install.

* Build in install phase; don't bother with go install in Dockerfile

* Add virtualenv

* Actually build rabbitmq-setup

* Remove FAKE_DNS

* Trivial change

* Pull boulder-tools as a separate step so it gets its own timing info.

* Install certbot and protobuf from repos.

* Use cerbot from debian backports.

* Fix clone

* Remove CERTBOT_PATH

* Updates

* Go back to letsencrypt for build.sh

* Remove certbot volume.

* go back to preinstalled letsencrypt

* Restore ENV

* Remove BASH_ENV

* Adapt reloader test so it psses when run as root.

* Fixups for review.

* Revert test.js

* Revert startservers.py

* Revert Makefile.
2016-05-19 16:29:45 -07:00
Jacob Hoffman-Andrews ecc04e8e61 Refactor log package (#1717)
- Remove error signatures from log methods. This means fewer places where errcheck will show ignored errors.
- Pull in latest cfssl to be compatible with errorless log messages.
- Reduce the number of message priorities we support to just those we actually use.
- AuditNotice -> AuditInfo
- Remove InfoObject (only one use, switched to Info)
- Remove EmergencyExit and related functions in favor of panic
- Remove SyslogWriter / AuditLogger separate types in favor of a single interface, Logger, that has all the logging methods on it.
- Merge mock log into logger. This allows us to unexport the internals but still override them in the mock.
- Shorten names to be compatible with Go style: New, Set, Get, Logger, NewMock, etc.
- Use a shorter log format for stdout logs.
- Remove "... Starting" log messages. We have better information in the "Versions" message logged at startup.

Motivation: The AuditLogger / SyslogWriter distinction was confusing and exposed internals only necessary for tests. Some components accepted one type and some accepted the other. This made it hard to consistently use mock loggers in tests. Also, the unnecessarily fat interface for AuditLogger made it hard to meaningfully mock out.
2016-04-08 16:12:20 -07:00
Kane York 98567efdfc Add integration tests for expiry mailer
This creates a new server, 'mail-test-srv', which is a simplistic SMTP
server that accepts mail and can report the received mail over HTTP.

An integration test is added that uses the new server to test the expiry
mailer.

The FAKECLOCK environment variable is used to force the expiry mailer to
think that the just-issued certificate is about to expire.

Additionally, the expiry mailer is modified to cleanly shut down its
SMTP connections.
2016-03-25 10:02:02 -07:00