Commit Graph

56 Commits

Author SHA1 Message Date
forcedebug b33d28c8bd
Remove repeated words in comments (#7445)
Signed-off-by: forcedebug <forcedebug@outlook.com>
2024-04-23 10:30:33 -04:00
Phil Porada 4bd90ea82f
Log version string for more tools at startup (#7087)
This is a followup to https://github.com/letsencrypt/boulder/pull/7086
2023-09-19 12:46:55 -04:00
Jacob Hoffman-Andrews 7d66d67054
It's borpin' time! (#6982)
This change replaces [gorp] with [borp].

The changes consist of a mass renaming of the import and comments / doc
fixups, plus modifications of many call sites to provide a
context.Context everywhere, since gorp newly requires this (this was one
of the motivating factors for the borp fork).

This also refactors `github.com/letsencrypt/boulder/db.WrappedMap` and
`github.com/letsencrypt/boulder/db.Transaction` to not embed their
underlying gorp/borp objects, but to have them as plain fields. This
ensures that we can only call methods on them that are specifically
implemented in `github.com/letsencrypt/boulder/db`, so we don't miss
wrapping any. This required introducing a `NewWrappedMap` method along
with accessors `SQLDb()` and `BorpDB()` to get at the internal fields
during metrics and logging setup.

Fixes #6944
2023-07-17 14:38:29 -07:00
Jacob Hoffman-Andrews a2b2e53045
cmd: fail without panic (#6935)
For "ordinary" errors like "file not found" for some part of the config,
we would prefer to log an error and exit without logging about a panic
and printing a stack trace.

To achieve that, we want to call `defer AuditPanic()` once, at the top
of `cmd/boulder`'s main. That's so early that we haven't yet parsed the
config, which means we haven't yet initialized a logger. We compromise:
`AuditPanic` now calls `log.Get()`, which will retrieve the configured
logger if one has been set up, or will create a default one (which logs
to stderr/stdout).

AuditPanic and Fail/FailOnError now cooperate: Fail/FailOnError panic
with a special type, and AuditPanic checks for that type and prints a
simple message before exiting when it's present.

This PR also coincidentally fixes a bug: panicking didn't previously
cause the program to exit with nonzero status, because it recovered the
panic but then did not explicitly exit nonzero.

Fixes #6933
2023-06-20 12:29:02 -07:00
Samantha b2224eb4bc
config: Add validation tags to all configuration structs (#6674)
- Require `letsencrypt/validator` package.
- Add a framework for registering configuration structs and any custom
validators for each Boulder component at `init()` time.
- Add a `validate` subcommand which allows you to pass a `-component`
name and `-config` file path.
- Expose validation via exported utility functions
`cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and
`cmd.ValidateYAMLConfig()`.
- Add unit test which validates all registered component configuration
structs against test configuration files.

Part of #6052
2023-03-21 14:08:03 -04:00
Aaron Gable 5ce4b5a6d4
Use time format constants (#6694)
Use constants from the go stdlib time package, such as time.DateTime and
time.RFC3339, when parsing and formatting timestamps. Additionally,
simplify or remove some of our uses of parsing timestamps, such as to
set fake clocks in tests.
2023-02-24 11:22:23 -08:00
Aaron Gable 9c197e1f43
Use io and os instead of deprecated ioutil (#6286)
The iotuil package has been deprecated since go1.16; the various
functions it provided now exist in the os and io packages. Replace all
instances of ioutil with either io or os, as appropriate.
2022-08-10 13:30:17 -07:00
Aaron Gable 07e96e326f
Fix unproblematic race in parallel notify-mailer (#6273)
The newly-parallel notify-mailer has a potential race where multiple
goroutines all try to set an option on the email template at the same
time. This is not an issue, as they're all setting the same option, but
fix it anyway by moving that option-setting to before the parallelism
begins.
2022-08-03 15:37:54 -07:00
Aaron Gable f5525ccd15
Parallelize notify-mailer (#6268)
Use the same pattern as was recently implemented in
expiration-mailer to parallelize notify-mailer. This should
significantly increase throughput when sending emails
to all subscribers.
2022-08-02 16:18:01 -07:00
Aaron Gable d2efdf5929
notify-mailer: remove extraneous empty template (#6242) 2022-07-20 17:13:49 -07:00
Jacob Hoffman-Andrews 76f987a1df
Reland "Allow expiration mailer to work in parallel" (#6133)
This reverts commit 7ef6913e71.

We turned on the `ExpirationMailerDontLookTwice` feature flag in prod, and it's
working fine but not clearing the backlog. Since
https://github.com/letsencrypt/boulder/pull/6100 fixed the issue that caused us
to (nearly) stop sending mail when we deployed #6057, this should be safe to
roll forward.

The revert of the revert applied cleanly, except for expiration-mailer/main.go
and `main_test.go`, particularly around the contents `processCerts` (where
`sendToOneRegID` was extracted from) and `sendToOneRegID` itself. So those areas
are good targets for extra attention.
2022-05-23 16:16:43 -07:00
Aaron Gable 7ef6913e71
Revert "Allow expiration mailer to work in parallel" (#6080)
When deployed, the newly-parallel expiration-mailer encountered
unexpected difficulties and dropped to apparently sending nearly zero
emails despite not throwing any real errors. Reverting the parallelism
change until we understand and can fix the root cause.

This reverts two commits:
- Allow expiration mailer to work in parallel (#6057)
- Fix data race in expiration-mailer test mocks (#6072) 

It also modifies the revert to leave the new `ParallelSends` config key
in place (albeit completely ignored), so that the binary containing this
revert can be safely deployed regardless of config status.

Part of #5682
2022-05-03 13:18:40 -07:00
Jacob Hoffman-Andrews 9629c88d66
Allow expiration mailer to work in parallel (#6057)
Previously, each accounts email would be sent in serial,
along with several reads from the database (to check for
certificate renewal) and several writes to the database (to update
`certificateStatus.lastExpirationNagSent`). This adds a config field
for the expiration mailer that sets the parallelism it will use.

That means making and using multiple SMTP connections as well. Previously,
`bmail.Mailer` was not safe for concurrent use. It also had a piece of
API awkwardness: after you created a Mailer, you had to call Connect on
it to change its state.

Instead of treating that as a state change on Mailer, I split out a
separate component: `bmail.Conn`. Now, when you call `Mailer.Connect()`,
you get a Conn. You can send mail on that Conn and Close it when you're
done. A single Mailer instance can produce multiple Conns, so Mailer is
now concurrency-safe (while Conn is not).

This involved a moderate amount of renaming and code movement, and
GitHub's move detector is not keeping up 100%, so an eye towards "is
this moved code?" may help. Also adding `?w=1` to the diff URL to ignore
whitespace diffs.
2022-04-21 18:04:55 -07:00
Samantha ecd6d0118c
notify-mailer: Improve error checking during template execution (#5932)
- Break message body construction out into a testable method.
- Ensure that in the event of a missing key, an informative error is returned instead
  of allowing the message to be populated with the zero value of the key.
- Add message body construction tests for success, empty map, and missing key. 
- Comment the `recipient` struct and it's `Data` field to make it clear that SRE
  must be informed of any modifications.

Fixes #5921
2022-02-08 09:53:51 -08:00
Aaron Gable 305ef9cce9
Improve error checking paradigm (#5920)
We have decided that we don't like the if err := call(); err != nil
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: It caused readers to believe that
a later err = call() statement was assigning to an already-declared err
in the local scope, when in fact it was assigning to an
already-declared err in the parent scope of a closure. This caused our
ineffassign and staticcheck linters to be unable to analyze the
lifetime of the err variable, and so they did not complain when we
never checked the actual value of that error.

This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
2022-02-01 14:42:43 -07:00
Samantha f69b57e0e1
Make DB client initialization uniform and stop setting 'READ-UNCOMMITTED' (#5741)
Boulder components initialize their gorp and gorp-less (non-wrapped) database
clients via two new SA helpers. These helpers handle client construction,
database metric initialization, and (for gorp only) debug logging setup.

Removes transaction isolation parameter `'READ-UNCOMMITTED'` from all database
connections.

Fixes #5715 
Fixes #5889
2022-01-31 13:34:23 -08:00
Jacob Hoffman-Andrews 3bf06bb4d8
Export the config structs from our main files (#5875)
This allows our documentation on those structs to show up in our godoc
output.
2022-01-12 12:20:27 -08:00
Jacob Hoffman-Andrews 23dd1e21f9
Build all boulder binaries into a single binary (#5693)
The resulting `boulder` binary can be invoked by different names to
trigger the behavior of the relevant subcommand. For instance, symlinking
and invoking as `boulder-ca` acts as the CA. Symlinking and invoking as
`boulder-va` acts as the VA.

This reduces the .deb file size from about 200MB to about 20MB.

This works by creating a registry that maps subcommand names to `main`
functions. Each subcommand registers itself in an `init()` function. The
monolithic `boulder` binary then checks what name it was invoked with
(`os.Args[0]`), looks it up in the registry, and invokes the appropriate
`main`. To avoid conflicts, all of the old `package main` are replaced
with `package notmain`.

To get the list of registered subcommands, run `boulder --list`. This
is used when symlinking all the variants into place, to ensure the set
of symlinked names matches the entries in the registry.

Fixes #5692
2021-10-20 17:05:45 -07:00
Samantha d08668f8ad
notify-mailer: Improve recipient list parser (#5495)
- Parse recipient list file as a TSV when flag `-tsv` is provided
- Log recipient list records that contain empty columns
- Log and skip recipient list records that contain the same `id` as previously
  read records
- Remove unnecessary check for mismatched header and record column length, this
  is already handled by the `encoding/csv` package
- Remove unnecessary check for empty line, this is already handled by the
  `encoding/csv` package

Part of  #5420
2021-06-28 16:45:22 -07:00
Samantha 45d3421193
notify-mailer: Database connection and query improvements (#5496)
- Use database settings in JSON config when creating connection
- Set database transaction isolation level `READ UNCOMMITTED` by modifying the
  DSN
- Add additional empty case of `[]` when querying contact table

Part of #5420
2021-06-21 18:49:06 -07:00
Samantha 401d862354
mail: Rename RecoverableSMTPError to BadAddressSMTPError (#5479)
Rename `RecoverableSMTPError` to `BadAddressSMTPError`. The former
implies that an operation resulting in this error can be retried.
2021-06-15 11:04:56 -07:00
Samantha 205223abbc
notify-mailer: Improve terminology consistency and general cleanup (#5485)
### Improve consistency
- Make registration `id` an `int64`
- Use `address`, `recipient`, and `record` terminology
- Use `errors.New()` in place of `fmt.Errorf()`
- Use `strings.Builder` in place of `bytes.Buffer`
- Use `errors.Is()` when checking for sentinel errors
- Remove unused (duplicate) `cmd.PasswordFile` in `config`
- Remove unused `cmd.Features` in `config`

### Improve readability
- Use godoc standard comments
- Replace multiple calls to `len(someVariable)` with `totalSomeVariable`

Part of #5420
2021-06-15 10:09:19 -07:00
Samantha 5a92926b0c
Remove dbconfig migration deployability code (#5348)
Default boulder code paths to exclusively use the `db` config key

Fixes #5338
2021-03-18 16:41:15 -07:00
Samantha e2e7dad034
Move cmd.DBConfig fields to their own named sub-struct (#5286)
Named field `DB`, in a each component configuration struct, acts as the
receiver for the value of `db` when component JSON files are
unmarshalled.

When `cmd.DBConfig` fields are received at the root of component
configuration struct instead of `DB` copy them to the `DB` field of the
component configuration struct.

Move existing `cmd.DBConfig` values from the root of each component's
JSON configuration in `test/config-next` to `db`

Part of #5275
2021-02-16 10:48:58 -08:00
Samantha e0510056cc
Enhancements to SQL driver tuning via JSON config (#5235)
Historically the only database/sql driver setting exposed via JSON
config was maxDBConns. This change adds support for maxIdleConns,
connMaxLifetime, connMaxIdleTime, and renames maxDBConns to
maxOpenConns. The addition of these settings will give our SRE team a
convenient method for tuning the reuse/closure of database connections.

A new struct, DBSettings, has been added to SA. The struct, and each of
it's fields has been commented.

All new fields have been plumbed through to the relevant Boulder
components and exported as Prometheus metrics. Tests have been
added/modified to ensure that the fields are being set. There should be
no loss in coverage

Deployability concerns for the migration from maxDBConns to maxOpenConns
have been addressed with the temporary addition of the helper method
cmd.DBConfig.GetMaxOpenConns(). This method can be removed once
test/config is defaulted to using maxOpenConns. Relevant sections of the
code have TODOs added that link back to an newly opened issue.

Fixes #5199
2021-01-25 15:34:55 -08:00
Samantha 20bfc65c32
mailer: replacing error assertions with errors.As (#5123)
errors.As checks for a specific error in a wrapped error chain
(see https://golang.org/pkg/errors/#As) as opposed to asserting
that an error is of a specific type.

Part of #5010
2020-10-13 17:34:17 -07:00
Jacob Hoffman-Andrews 3bf6aa4aac
notify-mailer: improve log output (#5094)
One of the log lines describes the most frequent address corresponding
to a number of accounts, but it actually corresponds to a number of
lines in the input CSV.

Also, now that we escape newlines in log output, the dryRunMailer's
output looks messed up. Split the message body into lines and emit one
log message per line.
2020-09-17 09:56:24 -07:00
Jacob Hoffman-Andrews 6f4966cc0f
Check email address validity in notify-mailer. (#4841)
This required a refactoring: Move validateEmail from the RA to ValidEmail
in the `policy` package. I also moved `ValidDomain` from a method on
PolicyAuthority to a standalone function so that ValidEmail can call it.

notify-mailer will now log invalid addresses and skip them without
attempting to send mail. Since @example.com addresses are invalid,
I updated the notify-mailer test, which used a lot of such addresses.

Also, now when notify-mailer receives an unrecoverable error sending
mail, it logs the email address and what offset within the list it was.
2020-06-04 18:28:02 -07:00
Jacob Hoffman-Andrews bef02e782a
Fix nits found by staticcheck (#4726)
Part of #4700
2020-03-30 10:20:20 -07:00
Jacob Hoffman-Andrews 3a1a08a10b
Remove unused code. (#4722)
Found by staticcheck.
2020-03-27 11:55:42 -07:00
Daniel McCarney f1894f8d1d
tidy: typo fixes flagged by codespell (#4634) 2020-01-07 14:01:26 -05:00
Roland Bracewell Shoemaker 5b2f11e07e Switch away from old style statsd metrics wrappers (#4606)
In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back.

There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case, dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change.

Fixes #4591.
2019-12-18 11:08:25 -05:00
Daniel McCarney 1c9ece3f44
SA: use wrapped database maps/transactions. (#4585)
New types and related infrastructure are added to the `db` package to allow
wrapping gorp DbMaps and Transactions.

The wrapped versions return a special `db.ErrDatabaseOp` error type when errors
occur. The new error type includes additional information such as the operation
that failed and the related table.

Where possible we determine the table based on the types of the gorp function
arguments. Where that isn't possible (e.g. with raw SQL queries) we try to use
a simple regexp approach to find the table name. This isn't great for general
SQL but works well enough for Boulder's existing SQL queries.

To get additional confidence my regexps work for all of Boulder's queries
I temporarily changed the `db` package's `tableFromQuery` function to panic if
the table couldn't be determined. I re-ran the full unit and integration test
suites with this configuration and saw no panics.

Resolves https://github.com/letsencrypt/boulder/issues/4559
2019-12-04 13:03:09 -05:00
Roland Bracewell Shoemaker 064001203b Continue work on more SMTP errors (#4039)
Instead of just on 401. Pulled the various error codes from a handful of SMTP docs I
could find, they could probably use a second once over by others though.
2019-01-28 22:23:25 -08:00
Jacob Hoffman-Andrews 4b9fd1f97e notify-mailer: Support CSV and parameters (#4024)
Fixes #4018 

This rearranges notify-mailer so we can give it CSV input and interpolate fields from that CSV.
It removes the old-style JSON input so we don't have to support two different input styles.

When multiple accounts have the same email address, their recipient data is consolidated under
that address so they only receive a single email. The CSV data can be interpolated using
the `range` operator in Golang templates.

Because we're now operating on the resolved email addresses instead of purely on accounts,
this PR also changes the checkpointing mode. Instead of a numeric start and end, it takes
a pair of strings, and only sends to email addresses between those two strings.
2019-01-22 16:07:17 -08:00
Daniel McCarney 1a68cc2225 notify-mailer: warn for bad rcpt, don't exit. (#4022)
Resolves https://github.com/letsencrypt/boulder/issues/4019

I can't find RFC verse and chapter for "401 4.1.3" errors, but [IANA's registry of SMTP enhanced status codes](https://www.iana.org/assignments/smtp-enhanced-status-codes/smtp-enhanced-status-codes.xhtml) does show an entry matching `x.1.3`:
```
X.1.3 | Bad destination mailbox address syntax | 501 | The destination address was syntactically invalid. This can apply to any field in the address. This code is only useful for permanent failures. | [RFC3463] (Standards Track) | G. Vaudreuil | IESG
```
However that entry from IANA says the "associated basic code" is 501, not 401.

Since we wrote this tool to talk to exactly one SMTP server in the world and it definitely is returning "401 4.1.3" in some cases I think its reasonable to handle as I've done in this PR. Alternative suggestions welcome.
2019-01-18 14:14:30 -08:00
Daniel McCarney ed01d6bc14 notify-mailer: skip invalid contact emails (#4021)
Resolves #4020
2019-01-18 11:47:21 -08:00
Jacob Hoffman-Andrews 281e2546f3
De-duplicate email addresses in notify-mailer. (#4015)
Resolves #4003
2019-01-17 11:34:04 -08:00
Daniel 7f626a1c79 Clearer fix 2019-01-16 09:30:13 -05:00
Daniel 7d8f55d64c notify-mailer: fix off-by-one in printStatus args 2019-01-15 16:25:37 -05:00
Joel Sing 8ebdfc60b6 Provide formatting logger functions. (#3699)
A very large number of the logger calls are of the form log.Function(fmt.Sprintf(...)).
Rather than sprinkling fmt.Sprintf at every logger call site, provide formatting versions
of the logger functions and call these directly with the format and arguments.

While here remove some unnecessary trailing newlines and calls to String/Error.
2018-05-10 11:06:29 -07:00
Jacob Hoffman-Andrews 4296dd985a Use TLS in mailer integration tests (#3213)
* Remove non-TLS support from mailer entirely
* Add a config option for trusted roots in expiration-mailer. If unset, it defaults to the system roots, so this does not need to be set in production.
* Use TLS in mail-test-srv, along with an internal root and localhost certificates signed by that root.
2017-11-06 14:57:14 -08:00
Jacob Hoffman-Andrews f366e45756 Remove global state from metrics gathering (#3167)
Previously, we used prometheus.DefaultRegisterer to register our stats, which uses global state to export its HTTP stats. We also used net/http/pprof's behavior of registering to the default global HTTP ServeMux, via DebugServer, which starts an HTTP server that uses that global ServeMux.

In this change, I merge DebugServer's functions into StatsAndLogging. StatsAndLogging now takes an address parameter and fires off an HTTP server in a goroutine. That HTTP server is newly defined, and doesn't use DefaultServeMux. On it is registered the Prometheus stats handler, and handlers for the various pprof traces. In the process I split StatsAndLogging internally into two functions: makeStats and MakeLogger. I didn't port across the expvar variable exporting, which serves a similar function to Prometheus stats but which we never use.

One nice immediate effect of this change: Since StatsAndLogging now requires and address, I noticed a bunch of commands that called StatsAndLogging, and passed around the resulting Scope, but never made use of it because they didn't run a DebugServer. Under the old StatsD world, these command still could have exported their stats by pushing, but since we moved to Prometheus their stats stopped being collected. We haven't used any of these stats, so instead of adding debug ports to all short-lived commands, or setting up a push gateway, I simply removed them and switched those commands to initialize only a Logger, no stats.
2017-10-13 11:58:01 -07:00
Jacob Hoffman-Andrews b17b5c72a6 Remove statsd from Boulder (#2752)
This removes the config and code to output to statsd.

- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.

Fixes #2639, #2733.
2017-05-15 10:19:54 -04:00
Daniel McCarney 00d11f126b Parse feature flags in all cmd's (#2534)
If you are the first person to add a feature to a Boulder command its very
easy to forget to update the command's config structure to accommodate a
`map[string]bool` entry and to pass it to `features.Set` in `main()`. See
https://github.com/letsencrypt/boulder/issues/2533 for one example. I've
fallen into this trap myself a few times so I'm going to try and save myself
some future grief by fixing it across the board once and for all!

This PR adds a `Features` config entry and a corresponding `features.Set` to:
* ocsp-updater (resolves #2533)
* admin-revoker
* boulder-publisher
* contact-exporter
* expiration-mailer
* expired-authz-purger
* notify-mailer
* ocsp-responder
* orphan-finder

These components were skipped because they already had features supported:
* boulder-ca
* boulder-ra
* boulder-sa
* boulder-va
* boulder-wfe
* cert-checker

I deliberately skipped adding Feature support to:
* single-ocsp (Its only configuration comes from the pkcs11key library and
  doesn't support features)
* rabbitmq-setup (No configuration/features and we'll likely soon be rming this
  since the gRPC migration)
* notafter-backfill (This is a one-off that will be deleted soon)
2017-01-27 16:29:46 -05:00
Daniel bcc389d109
Fixes gofmt -s diffs 2016-11-30 13:30:03 -05:00
Daniel McCarney 8efc6342bb Mailer reliability improvements (#2262)
### Connect before sending mail, not at startup

Per #2250 when we connect to the remote SMTP server at start-up time by calling `mailer.Connect()` but do not actually call `mailer.SendMail()` until after we have done some potentially expensive/time-consuming work we are liable to have our connection closed due to timeout.

This PR moves the `Connect()` call in `expiration-mailer` and `notify-mailer` to be closer to where the actual messages are sent via `SendMail()` and resolves #2250 

### Handle SMTP 421 errors gracefully

Issue #2249 describes a case where we see this SMTP error code from the remote server when our connection has been idle for too long. This would manifest when connecting to the remote server at startup, running a very long database query, and then sending mail. This commit allows the mailer to treat SMTP 421 errors as an event that should produce a reconnect attempt and resolves #2249.

A unit test is added to the mailer tests to test that reconnection works when the server sends a SMTP 421 error. Prior to b64e51f and support for SMTP 421 reconnection this test failed in a manner matching issue #2249:

```
go test -p 1 -race --test.run TestReconnectSMTP421
github.com/letsencrypt/boulder/mail
Wrote goodbye msg: 421 1.2.3 green.eggs.and.spam Error: timeout exceeded
Cutting off client early
--- FAIL: TestReconnectSMTP421 (0.00s)
  mailer_test.go:257: Expected SendMail() to not fail. Got err: 421
  1.2.3 green.eggs.and.spam Error: timeout exceeded
  FAIL
  FAIL  github.com/letsencrypt/boulder/mail     0.023s
```

With b64e51f the test passes and the client gracefully reconnects.

The existing reconnect testing logic in the `mail-test-srv` integration tests is changed such that half of the forced disconnects are a normal clean connection close and half are a SMTP 421. This allows the existing integration test for server disconnects to be reused to test the 421 reconnect logic.
2016-10-20 14:10:47 -04:00
Daniel McCarney b824f31a4c `notify-mailer` graceful handling of `sql.ErrNoRows`. (#2185)
* Fixes `mockEmailResolver` to return `sql.ErrNoRows`.

This commit reproduces the error observed in #2183 where a registration
ID is provided that doesn't match a row with a valid contact.

First a bug is fixed in the ID range check done by the
`mockEmailResolver` - it was using a `||` where it should have been
using a `&&` and also had a `> 0` where it needed `>= 0`, oops! slipped
past review!

Second the `mockEmailResolver` is modified to return `sql.ErrNoRows`
when the index is out of bounds for the mock data.

Lastly a ID of `999` is added to the `TestResolveEmails` function to
elicit the "mailer.send returned error: sql: no rows in result set"
error.

* Handles `sql.ErrNoRows` in `emailsForReg`.

This commit fixes #2183 (and the failing unit test introduced in the
prior commit) by handling `sql.ErrNoRows` in `emailsForReg` gracefully.

* Clarfies mockEmailResolver comment
2016-09-19 12:14:48 -07:00
Roland Bracewell Shoemaker c8f1fb3e2f Remove direct usages of go-statsd-client in favor of using metrics.Scope (#2136)
Fixes #2118, fixes #2082.
2016-09-07 19:35:13 -04:00
Daniel McCarney a584f8de46 Allow `mailer` to reconnect to server. (#2101)
The `MailerImpl` gains a few new fields (`retryBase`, &  `retryMax`). These are used with `core.RetryBackoff` in `reconnect()` to implement exponential backoff in a reconnect attempt loop. Both `expiration-mailer` and `notify-mailer` are modified to add CLI args for these 2 flags and to wire them into the `MailerImpl` via its `New()` constructor.

In `MailerImpl`'s `SendMail()` function it now detects when `sendOne` returns an `io.EOF` error indicating that the server closed the connection unexpectedly. When this case occurs `reconnect()` is invoked. If the reconnect succeeds then we invoke `sendOne` again to try and complete the message sending operation that was interrupted by the disconnect.

For integration testing purposes I modified the `mail-test-srv` to support a `-closeChance` parameter between 0 and 100. This controls what % of `MAIL` commands will result in the server immediately closing the client connection before further processing. This allows us to simulate a flaky mailserver. `test/startservers.py` is modified to start the `mail-test-srv` with a 35% close chance to thoroughly test the reconnection logic during the existing `expiration-mailer` integration tests. I took this as a chance to do some slight clean-up of the `mail-test-srv` code (mostly removing global state).

For unit testing purposes I modified the mailer `TestConnect` test to abstract out a server that can operate similar to `mail-test-serv` (e.g. can close connections artificially).

This is testing a server that **closes** a connection, and not a server that **goes away/goes down**. E.g. the `core.RetryBackoff` sleeps themselves are not being tested. The client is disconnected and attempts a reconnection which always succeeds on the first try. To test a "gone away" server would require a more substantial rewrite of the unit tests and the `mail-test-srv`/integration tests. I think this matches the experience we have with MailChimp/Mandril closing long lived connections.
2016-08-15 14:14:49 -07:00