Commit Graph

5728 Commits

Author SHA1 Message Date
dependabot[bot] fd57e39414
Bump actions/checkout from 2 to 3 (#6107)
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v2...v3)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-24 14:36:27 -07:00
Jacob Hoffman-Andrews 76f987a1df
Reland "Allow expiration mailer to work in parallel" (#6133)
This reverts commit 7ef6913e71.

We turned on the `ExpirationMailerDontLookTwice` feature flag in prod, and it's
working fine but not clearing the backlog. Since
https://github.com/letsencrypt/boulder/pull/6100 fixed the issue that caused us
to (nearly) stop sending mail when we deployed #6057, this should be safe to
roll forward.

The revert of the revert applied cleanly, except for expiration-mailer/main.go
and `main_test.go`, particularly around the contents `processCerts` (where
`sendToOneRegID` was extracted from) and `sendToOneRegID` itself. So those areas
are good targets for extra attention.
2022-05-23 16:16:43 -07:00
Jacob Hoffman-Andrews 7dcbf69536
Add sendDelay metric (#6130)
Fixes #6125
2022-05-20 19:44:13 -07:00
Aaron Gable 9b4ca235dd
Update boulder-tools dependencies (#6129)
Update:
- golangci-lint from v1.42.1 to v1.46.2
- protoc from v3.15.6 to v3.20.1
- protoc-gen-go from v1.26.0 to v1.28.0
- protoc-gen-go-grpc from v1.1.0 to v1.2.0
- fpm from v1.14.0 to v1.14.2

Also remove a reference to go1.17.9 from one last place.

This does result in updating all of our generated .pb.go files, but only
to update the version number embedded in each file's header.

Fixes #6123
2022-05-20 14:24:01 -07:00
Aaron Gable 3a94adecb1
Fix nil pointer dereference in TestMultiVA (#6132)
If the test unexpectedly failed, it would hit a nil pointer dereference
on line 511 when it tries to access the `.Type` field of nil. Add another
case to handle this.
2022-05-19 17:16:06 -07:00
Aaron Gable 50f6a6b84f
Speed up ca unit tests by 10x (#6128)
Profiling showed that the unit tests were spending almost all of
their time in key generation, because the tests were generating
new fake keys for the linter with every call to `setup()`. Instead,
create the linters just once at startup time.

This decreases the runtime of `go test ./ca/...` from ~9.7s to less
than one second.
2022-05-18 13:16:04 -07:00
Aaron Gable f958d479f9
Stop testing on go1.17 (#6126)
We are using exclusively go1.18 in our deployment environments.
2022-05-18 08:40:29 -07:00
Aaron Gable 238c309088
Reduce redundant audit logging of CSR (#6113)
Stop logging the full CSR bytes at the WFE; we log these bytes in the
CA immediately before attempting to sign a precertificate anyway.

Also unify the way we log precertificate and final certificate signing
events in the CA: an Info containing the request prior to signing, and
then either an Info or an Error containing the result afterwards.

Fixes #6088
2022-05-16 14:33:33 -07:00
Jacob Hoffman-Andrews be893678bd
expiration-mailer: feature-gate bug fix (#6122)
We recently landed a fix so the expiration-mailer won't look twice at
the same certificate. This will cause an immediate behavior change when
it is deployed, and that might have surprising effects. Put the fix
behind a feature flag so we can control when it rolls out more
carefully.
2022-05-16 14:17:23 -07:00
Jacob Hoffman-Andrews 5c3f62d4a5
expiration-mailer: avoid re-examining certificates (#6100)
findExpiringCertificates had a previously unstated invariant: It is
supposed to only examine each certificate once per nag window. Otherwise,
the set of certificates it has to examine may get clogged up with
certificates it has previously looked at, and it will fall behind. It
does this by updating lastExpirationNagSent after examining each
certificate. For certificates that have already been renewed, we update
this field even though we didn't actually send any mail.

We accidentally broke this invariant in #6057. When that change rolled
out to staging, suddenly our rate of sent mail plummeted to near-zero,
and our nags_at_capacity metric went to 1 for all expiration windows,
and stayed there until we deployed a revert.

The problem was here:
https://github.com/letsencrypt/boulder/pull/6057/files#diff-ae49e23ff8be05aae6145106f04c76fc1f0b7b336c0cdcbe8183f572dedf47c5R261-R263.
We check if `reg.Contacts` is empty, and bail. Prior to #6057,
that check happened _after_ checking for renewal and updating
lastExpirationNagSent. In the code shuffle of #6057, that check got
moved to _before_ checking for renewal. So expiration-mailer's input
became clogged with certificates that had already been renewed, but were
issued by accounts with no contact.

There's a second problem that has existed practically forever: we hit the
`reg.Contacts` check for no-contact-info certificates that have _not_
been renewed, and those also clog up expiration mailer. This problem is
probably causing some amount of slowness today in both prod and staging.

I wrote unittests to check the first and second problem, and verified
that they fail as appropriate. The second one fails with current `main`;
the first one fails if I simulate #6057 by moving the `reg.Contacts`
check higher in the file.

The fix is simple: delete the `reg.Contacts` check entirely. It's valid
to call `sendNags` with an empty contacts list, and will return a nil
error. That allows findExpiringCertificates to proceed to updating
lastExpirationNagSent for those certificates, so they won't be examined
again on the next iteration. I also removed another spurious length check
in `sendNags`, since there's an equivalent length check a few lines lower.

Related to #5682
2022-05-16 11:42:43 -07:00
Jacob Hoffman-Andrews 81f724ba53
ocsp/responder: fix goroutine leak in multiSource (#6116)
In multiSource, if the primary source returned before the secondary
source, we would leak the goroutine waiting on results from the secondary
source. This was because that goroutine was writing to a channel that
no longer had any readers.

Fix that by making the channel in getResponse buffered, so getResponse
can always write to its channel, regardless of whether there are readers.

I've confirmed that the unittest fails when run without the fix.
2022-05-16 11:41:58 -07:00
Aaron Gable 8e156f4dd9
Quiet log output for successful /directory requests (#6096)
Add functionality to the ubiquitous RequestEvent (aka logEvent) to
allow handlers to suppress the final log line that is printed when
a non-500 response is being sent.

Use this functionality to suppress logging GET requests for the /directory
endpoint. We can expand this in the future to quiet other logs that
are not helpful for metrics or analysis.

Fixes #6094
2022-05-13 08:35:34 -07:00
Jacob Hoffman-Andrews a4ba9b1adb
rocsp/config: fix PoolSize comment (#6110)
The go-redis docs say default is 10 * NumCPU, but the actual code says 5.

Extra context:

2465baaab5/options.go (L143-L145)

2465baaab5/cluster.go (L96-L98)

For Options, the default (documented) is 10 * NumCPUs. For ClusterOptions, the
default (undocumented) is 5 * NumCPUs. We use ClusterOptions. Also worth noting:
for ClusterOptions, the limit is per node.
2022-05-12 16:29:26 -07:00
Jacob Hoffman-Andrews 25e4b7e7fa
expiration-mailer: Deprecate NagCheckInterval (#6103)
This was introduced when expiration-mailer was run by cron, and was a
way for expiration-mailer to know something about its expected run
interval so it could send notifications "on time" rather than "just
after" the configured email time.

Now that expiration-mailer runs as a daemon we can simply pull this
value from `Frequency`, which is set to the same value in prod.
2022-05-12 16:28:42 -07:00
Jacob Hoffman-Andrews ed1063d5c6
rocsp: add "connection_pool" to metric names (#6112)
Most of our existing Redis metrics are related to the connection pool,
but some sound like they could be request-oriented (like redis_hits and
redis_misses). Fix that with better naming. Also, join three related
metrics (hits, misses, and timeouts) under a single metric name with
labels so they more easily by grouped and counted.

Move the stat creation into rocsp/metrics.go so the names are in the
same file as the collector.

Add a unittest for rocsp metrics.
2022-05-12 16:01:07 -07:00
Jacob Hoffman-Andrews eeaa81cf78
expiration-mailer: bail early on context cancel (#6102)
We have various long-running loops in expiration-mailer. When a context
is canceled, rather than running through the loops to their end (and
potentially spamming logs with "context canceled" errors), we should
return early.

Fixes #6069
2022-05-12 14:01:24 -07:00
dependabot[bot] 6f5b3e7f2e
Bump golangci-lint actions/setup-go from v2 to v3 (#6108)
Bumps [actions/setup-go](https://github.com/actions/setup-go) from 2 to 3.
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](https://github.com/actions/setup-go/compare/v2...v3)

We don't use any special configuration for setup-go, so this is fully backwards
compatible for us.
2022-05-12 10:04:48 -07:00
Aaron Gable 5e11772e49
Simplify SA gRPC stream test scaffolding (#6068)
When testing the SA from within the SA package itself, we can directly
call its methods without having to go through the gRPC client interface.
Rather than needing to mock out gRPC client and server streams that
talk to each other over a channel, we can simply mock the server stream
and read from it directly.

The inMemSA adapter work will still be necessary when (for example)
testing an RA which uses a real in-memory SA to serve as its test
backend, rather than mocking out the SA. In that case, we need to wrap
the `storageAuthorityImpl` in something which causes it to match the
`sapb.StorageAuthorityClient` interface expected by the RA. That's what
the `//test/inmem/` package is for, and we'll move this wrapper code there
when we need it.

By testing these methods directly, rather than through an inmem wrapper,
we remove a potential cyclic import that was preventing us from moving
the inmem stream adapter into the `//test/inmem` package itself.
2022-05-12 10:00:00 -07:00
Naveen 94ce90845e
CI: Add githubactions to the dependabot config (#6078) 2022-05-11 17:34:35 -07:00
Jacob Hoffman-Andrews 0044c1b089
Make rocsp config robust to empty "addrs" config (#6086)
Before this change, an empty "addrs" config would result in a panic.
After this change, it will result in a clean error message.
2022-05-11 14:41:13 -07:00
Aaron Gable 8cb01a0c34
Enable additional linters (#6106)
These new linters are almost all part of golangci-lint's collection
of default linters, that would all be running if we weren't setting
`disable-all: true`. By adding them, we now have parity with the
default configuration, as well as the additional linters we like.

Adds the following linters:
* unconvert
* deadcode
* structcheck
* typecheck
* varcheck
* wastedassign
2022-05-11 13:58:58 -07:00
Jacob Hoffman-Andrews 8ba71cd624
expiration-mailer: Don't bail early on error (#6101)
If there is an error in SelectCertificate, we can continue doing work
with the remaining certificates.
2022-05-10 16:40:37 -07:00
Jacob Hoffman-Andrews f5769c0967
Fix comment on AssertMetricWithLabelsEquals (#6099)
Also tag it as a helper.
2022-05-10 15:52:19 -07:00
nathannaveen d40edc1933
Set permissions for GitHub actions (#6044)
Explicitly restrict the permissions used by the boulder-ci workflow jobs.
2022-05-10 13:24:30 -07:00
Jacob Hoffman-Andrews 5451e79643
Clean up expiration mailer test (#6093)
There was a lot of copy-paste code, and in particular one of
the most important pieces of setup information (the value of the
`lastExpirationNagSent` field) we often hidden off to the right of the
screen. This extracts out common logic into helper functions and replaces
manual INSERTs with gorp inserts.
2022-05-10 11:44:36 -07:00
Aaron Gable f29f63a317
Don't write "null" to DB for missing contacts (#6090)
Instead write `[]`, a better representation of an empty contact set,
and avoid having literal JSON `null`s in our database.

As part of doing so, add some extra code to //sa/model.go that
bypasses the need for //sa/type-converter.go to do any magic
JSON-to-string-slice conversions for us.

Fixes #6074
2022-05-10 09:25:41 -07:00
Aaron Gable f6978f396f
Improve github release artifacts (#6092)
Generate .deb packages for all currently configured Go versions
(usually the current and upcoming versions that we use in prod), rather
than just the one default version. Also ensure that the uploaded
artifacts have 8-character short hashes in their names.

Unfortunately this does require updating Go versions in one additional
place (the release.yml file), since we are no longer parsing it out of the
docker-compose.yml. This is unavoidable without hacks that I consider
to be even uglier than the repetition.

Fixes #6075
Fixes #6084
2022-05-09 16:41:26 -07:00
Jacob Hoffman-Andrews 507c0d1ab3
expiration-mailer: add stats (#6091) 2022-05-09 16:41:10 -07:00
Jacob Hoffman-Andrews e0605aadab
issuance: improve error message when load chain fails (#6087) 2022-05-05 15:41:07 -07:00
Aaron Gable b3c56a5d05
Reorganize docs and bring in release docs (#6077)
Bring in the release docs from the boulder-release-process repo,
so that they're adjacent to all of our other docs. This allows us to
delete that repo. Also update references to that repo to instead point
to the new doc here.

Also make minor organization updates to other docs to keep the root
of this repository clean.
2022-05-04 12:21:37 -07:00
Aaron Gable 2b23d3c1fc
Register OCSP responder metrics (#6083)
These metrics were never registered, so although they are
being incremented, they are not being exported or collected.
2022-05-04 11:22:34 -07:00
Aaron Gable 7b6b914697
Use go1.18.1 by default (#6081)
This also updates the version built by the build and release action.
2022-05-03 13:19:02 -07:00
Aaron Gable 7ef6913e71
Revert "Allow expiration mailer to work in parallel" (#6080)
When deployed, the newly-parallel expiration-mailer encountered
unexpected difficulties and dropped to apparently sending nearly zero
emails despite not throwing any real errors. Reverting the parallelism
change until we understand and can fix the root cause.

This reverts two commits:
- Allow expiration mailer to work in parallel (#6057)
- Fix data race in expiration-mailer test mocks (#6072) 

It also modifies the revert to leave the new `ParallelSends` config key
in place (albeit completely ignored), so that the binary containing this
revert can be safely deployed regardless of config status.

Part of #5682
2022-05-03 13:18:40 -07:00
Daniel Jeffery a2ff222fda
cert-checker: use config log level and handle nil mariadb response (#6066)
- Fix cert-checker to use the syslog and stdout logging facilities it
reads from the config file instead of having them hard-coded to zero.
- Fix cert-checker to handle a nil response from mariadb if no records
are found.
- Fix comment in log.go to correctly describe when the initialize function
and therefore default values would be used.

Fixes #6067
2022-05-02 13:29:53 -07:00
Aaron Gable 8ec10c4848
Fix data race in expiration-mailer test mocks (#6072)
Although each goroutine gets its own `mocks.mockMailerConn`, each one
of those is racing with the others whenever they try to update the list
of sent messages in their parent `mocks.Mailer`. This leads to data races
in the unit tests (but, thankfully, not in the production code).

Introduce a mutex around the slice of sent messages to prevent the race.

Fixes #6070
2022-05-02 13:16:27 -07:00
Aaron Gable 802acc510f
Use Redis 6.2.7 because Redis 7.0.0 breaks go-redis (#6073)
Redis recently released version 7.0.0, which has several breaking
changes. The go-redis library that we rely on does not yet support
communicating with a Redis 7.0.0 cluster.

Pin ourselves to the latest non-7.0.0 version, 6.2.7, until such time
as go-redis releases a version with support for 7.0.0.

Fixes #6071
2022-05-02 11:42:02 -07:00
dependabot[bot] 0243b54e5b
Bump github.com/eggsampler/acme/v3 from 3.2.1 to 3.3.0 (#6060)
Bumps github.com/eggsampler/acme/v3 from 3.2.1 to 3.3.0.
- Release notes: https://github.com/eggsampler/acme/releases
- Diff: https://github.com/eggsampler/acme/compare/v3.2.1...v3.3.0

Also updates github.com/miekg/dns from v1.1.45 to v1.1.48.
This does not affect any files we depend on.
2022-04-25 15:09:25 -07:00
Jacob Hoffman-Andrews e4c1cf2eb1
rocsp-tool: remove ServiceConfig from config (#6061)
ServiceConfig is only needed for components that act as gRPC services.
For rocsp-tool, which is a gRPC client but is long-running enough to
merit a debug port, we should provide DebugAddr in the config.
2022-04-25 14:57:18 -07:00
Jacob Hoffman-Andrews 29249b4aad
Add feature flag AllowUnrecognizedFeatures (#6056)
By default, Boulder's feature flag code verifies that the list of flags
being set (from a JSON file) maps to actually-existing flags.

However, this gets in the way of a deployment strategy where feature
flags are added to config templates during a staging deploy with "true"
or "false" filled in depending on production or staging status - for
instance, when rolling out a deprecation to staging ahead of production.
If those configs get rolled to prod before the corresponding Boulder
deploy, Boulder will refuse to start up, even though it would be fine to
start up with the unrecognized flag ignored.

The envisioned deployment behavior here is that prod will have
AllowUnrecognizedFeatures: true while staging will have it set to false,
to ensure that misspellings of feature flag names are caught during
staging deploy. As a correlary, this assumes that the list of flags in
configs will be the same between staging and prod, with only their
values changing.
2022-04-21 18:05:18 -07:00
Jacob Hoffman-Andrews 9629c88d66
Allow expiration mailer to work in parallel (#6057)
Previously, each accounts email would be sent in serial,
along with several reads from the database (to check for
certificate renewal) and several writes to the database (to update
`certificateStatus.lastExpirationNagSent`). This adds a config field
for the expiration mailer that sets the parallelism it will use.

That means making and using multiple SMTP connections as well. Previously,
`bmail.Mailer` was not safe for concurrent use. It also had a piece of
API awkwardness: after you created a Mailer, you had to call Connect on
it to change its state.

Instead of treating that as a state change on Mailer, I split out a
separate component: `bmail.Conn`. Now, when you call `Mailer.Connect()`,
you get a Conn. You can send mail on that Conn and Close it when you're
done. A single Mailer instance can produce multiple Conns, so Mailer is
now concurrency-safe (while Conn is not).

This involved a moderate amount of renaming and code movement, and
GitHub's move detector is not keeping up 100%, so an eye towards "is
this moved code?" may help. Also adding `?w=1` to the diff URL to ignore
whitespace diffs.
2022-04-21 18:04:55 -07:00
Jacob Hoffman-Andrews fe6fab8821
Remove fqdnsets_old workaround (#6054)
Fixes #5670
2022-04-21 16:39:35 -07:00
dependabot[bot] f1c7b038b1
Bump github.com/miekg/dns from 1.1.45 to 1.1.48 (#6059)
Bumps [github.com/miekg/dns](https://github.com/miekg/dns) from 1.1.45 to 1.1.48.
- [Release notes](https://github.com/miekg/dns/releases)
- [Changelog](https://github.com/miekg/dns/blob/master/Makefile.release)
- [Commits](https://github.com/miekg/dns/compare/v1.1.45...v1.1.48)

---
updated-dependencies:
- dependency-name: github.com/miekg/dns
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-21 16:37:12 -07:00
dependabot[bot] 4afb2f191f
Bump gopkg.in/square/go-jose.v2 from 2.4.1 to 2.6.0 (#6046)
Bumps [gopkg.in/square/go-jose.v2](https://github.com/square/go-jose) from 2.4.1 to 2.6.0.
- [Release notes](https://github.com/square/go-jose/releases)
- [Commits](https://github.com/square/go-jose/compare/v2.4.1...v2.6.0)

---
updated-dependencies:
- dependency-name: gopkg.in/square/go-jose.v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-21 16:07:36 -07:00
Jacob Hoffman-Andrews 23fc3c907b
Split up build and release job (#6055)
This allows us to narrow permissions by only granting write privileges
to the upload portion of the job (which doesn't run any code from our
repo). It also allows us to verify that the release build works on every
commit, while only generating releases on actual release tags.
2022-04-19 21:42:23 -07:00
Jacob Hoffman-Andrews 4467cf27db
Update config from config-next (#6051)
This copies over settings from config-next that are now deployed in prod.

Also, I updated a comment in sd-test-srv to more accurately describe how SRV records work.
2022-04-19 12:10:26 -07:00
Jacob Hoffman-Andrews cf9df961ba
Add feature flags for upcoming deprecations (#6043)
This adds three features flags: SHA1CSRs, OldTLSOutbound, and
OldTLSInbound. Each controls the behavior of an upcoming deprecation
(except OldTLSInbound, which isn't yet scheduled for a deprecation
but will be soon). Note that these feature flags take advantage of
`features`' default values, so they can default to "true" (that is, each
of these features is enabled by default), and we set them to "false"
in the config JSON to turn them off when the time comes.

The unittest for OldTLSOutbound requires that `example.com` resolves
to 127.0.0.1. This is because there's logic in the VA that checks
that redirected-to hosts end in an IANA TLD. The unittest relies on
redirecting, and we can't use e.g. `localhost` in it because of that
TLD check, so we use example.com.

Fixes #6036 and #6037
2022-04-15 12:14:00 -07:00
Jacob Hoffman-Andrews fad74f772a
Document the errors package (#6050)
Also document that errors.ServerInternal is deprecated in favor of
returning plain errors.
2022-04-14 13:41:24 -07:00
Samantha e0de2f6610
SA: Add support for querying which serials are impacted by a given incident (#6034)
- Add protobuf types `SerialsForIncidentRequest` and `IncidentSerial`
- Rename `incidentCertModel` to `incidentSerialModel`
- Add new SA method `SerialsForIncident`
- Add streaming GRPC adapter to allow for unit testing `SerialsForIncident`
  
Fixes #5947
2022-04-14 12:47:36 -07:00
Jacob Hoffman-Andrews 42c6eacd0f
Remove old challModel code (#6048)
This is no longer needed since the move to authz2.
2022-04-13 16:26:17 -07:00
Jacob Hoffman-Andrews ca29b4b380
Install a specific version of fpm (#6049)
This prevents fpm from changing out from under us unexpectedly.
2022-04-13 16:26:09 -07:00