Commit Graph

5734 Commits

Author SHA1 Message Date
dependabot[bot] 99dcb9a5b3
Bump docker/login-action from 1.8.0 to 2.0.0 (#6118)
Bumps [docker/login-action](https://github.com/docker/login-action) from 1.8.0 to 2.0.0.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](https://github.com/docker/login-action/compare/v1.8.0...v2.0.0)

---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-31 15:35:52 -07:00
Aaron Gable 11544756bb
Support new Google CT Policy (#6082)
Add a new code path to the ctpolicy package which enforces Chrome's new
CT Policy, which requires that SCTs come from logs run by two different
operators, rather than one Google and one non-Google log. To achieve
this, invert the "race" logic: rather than assuming we always have two
groups, and racing the logs within each group against each other, we now
race the various groups against each other, and pick just one arbitrary
log from each group to attempt submission to.

Ensure that the new code path does the right thing by adding a new zlint
which checks that the two SCTs embedded in a certificate come from logs
run by different operators. To support this lint, which needs to have a
canonical mapping from logs to their operators, import the Chrome CT Log
List JSON Schema and autogenerate Go structs from it so that we can
parse a real CT Log List. Also add flags to all services which run these
lints (the CA and cert-checker) to let them load a CT Log List from disk
and provide it to the lint.

Finally, since we now have the ability to load a CT Log List file
anyway, use this capability to simplify configuration of the RA. Rather
than listing all of the details for each log we're willing to submit to,
simply list the names (technically, Descriptions) of each log, and look
up the rest of the details from the log list file.

To support this change, SRE will need to deploy log list files (the real
Chrome log list for prod, and a custom log list for staging) and then
update the configuration of the RA, CA, and cert-checker. Once that
transition is complete, the deletion TODOs left behind by this change
will be able to be completed, removing the old RA configuration and old
ctpolicy race logic.

Part of #5938
2022-05-25 15:14:57 -07:00
Jacob Hoffman-Andrews 3e01ffe8bf
Revert "Add GitHub Action for golangci-lint (#5973)" (#6138)
This reverts commit 5fe5859c38.

Per #5973:

> we will eventually want to go back to doing this in boulder-tools, so it's easy
> to run the lints locally. But this is useful so we can unblock testing on go 1.18beta2.
2022-05-25 14:35:39 -07:00
Aaron Gable 8227c8fcb2
Fix flake in TestMultiVA (#6141)
Fix a race condition in one of the TestMultiVA test cases.

There are many other possible flakes in these tests, because
they use real HTTP to talk to a fake remote VA on localhost,
but we can at least trivially remove this race condition.

Fixes #6119
2022-05-25 14:25:41 -07:00
Aaron Gable cf9e250735
Update Public Suffix List (#6134)
Fixes #6030
2022-05-25 11:53:37 -07:00
Aaron Gable 8066983fa9
Fix unnecessary conversion lint (#6136) 2022-05-24 18:15:07 -07:00
dependabot[bot] fd57e39414
Bump actions/checkout from 2 to 3 (#6107)
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v2...v3)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-24 14:36:27 -07:00
Jacob Hoffman-Andrews 76f987a1df
Reland "Allow expiration mailer to work in parallel" (#6133)
This reverts commit 7ef6913e71.

We turned on the `ExpirationMailerDontLookTwice` feature flag in prod, and it's
working fine but not clearing the backlog. Since
https://github.com/letsencrypt/boulder/pull/6100 fixed the issue that caused us
to (nearly) stop sending mail when we deployed #6057, this should be safe to
roll forward.

The revert of the revert applied cleanly, except for expiration-mailer/main.go
and `main_test.go`, particularly around the contents `processCerts` (where
`sendToOneRegID` was extracted from) and `sendToOneRegID` itself. So those areas
are good targets for extra attention.
2022-05-23 16:16:43 -07:00
Jacob Hoffman-Andrews 7dcbf69536
Add sendDelay metric (#6130)
Fixes #6125
2022-05-20 19:44:13 -07:00
Aaron Gable 9b4ca235dd
Update boulder-tools dependencies (#6129)
Update:
- golangci-lint from v1.42.1 to v1.46.2
- protoc from v3.15.6 to v3.20.1
- protoc-gen-go from v1.26.0 to v1.28.0
- protoc-gen-go-grpc from v1.1.0 to v1.2.0
- fpm from v1.14.0 to v1.14.2

Also remove a reference to go1.17.9 from one last place.

This does result in updating all of our generated .pb.go files, but only
to update the version number embedded in each file's header.

Fixes #6123
2022-05-20 14:24:01 -07:00
Aaron Gable 3a94adecb1
Fix nil pointer dereference in TestMultiVA (#6132)
If the test unexpectedly failed, it would hit a nil pointer dereference
on line 511 when it tries to access the `.Type` field of nil. Add another
case to handle this.
2022-05-19 17:16:06 -07:00
Aaron Gable 50f6a6b84f
Speed up ca unit tests by 10x (#6128)
Profiling showed that the unit tests were spending almost all of
their time in key generation, because the tests were generating
new fake keys for the linter with every call to `setup()`. Instead,
create the linters just once at startup time.

This decreases the runtime of `go test ./ca/...` from ~9.7s to less
than one second.
2022-05-18 13:16:04 -07:00
Aaron Gable f958d479f9
Stop testing on go1.17 (#6126)
We are using exclusively go1.18 in our deployment environments.
2022-05-18 08:40:29 -07:00
Aaron Gable 238c309088
Reduce redundant audit logging of CSR (#6113)
Stop logging the full CSR bytes at the WFE; we log these bytes in the
CA immediately before attempting to sign a precertificate anyway.

Also unify the way we log precertificate and final certificate signing
events in the CA: an Info containing the request prior to signing, and
then either an Info or an Error containing the result afterwards.

Fixes #6088
2022-05-16 14:33:33 -07:00
Jacob Hoffman-Andrews be893678bd
expiration-mailer: feature-gate bug fix (#6122)
We recently landed a fix so the expiration-mailer won't look twice at
the same certificate. This will cause an immediate behavior change when
it is deployed, and that might have surprising effects. Put the fix
behind a feature flag so we can control when it rolls out more
carefully.
2022-05-16 14:17:23 -07:00
Jacob Hoffman-Andrews 5c3f62d4a5
expiration-mailer: avoid re-examining certificates (#6100)
findExpiringCertificates had a previously unstated invariant: It is
supposed to only examine each certificate once per nag window. Otherwise,
the set of certificates it has to examine may get clogged up with
certificates it has previously looked at, and it will fall behind. It
does this by updating lastExpirationNagSent after examining each
certificate. For certificates that have already been renewed, we update
this field even though we didn't actually send any mail.

We accidentally broke this invariant in #6057. When that change rolled
out to staging, suddenly our rate of sent mail plummeted to near-zero,
and our nags_at_capacity metric went to 1 for all expiration windows,
and stayed there until we deployed a revert.

The problem was here:
https://github.com/letsencrypt/boulder/pull/6057/files#diff-ae49e23ff8be05aae6145106f04c76fc1f0b7b336c0cdcbe8183f572dedf47c5R261-R263.
We check if `reg.Contacts` is empty, and bail. Prior to #6057,
that check happened _after_ checking for renewal and updating
lastExpirationNagSent. In the code shuffle of #6057, that check got
moved to _before_ checking for renewal. So expiration-mailer's input
became clogged with certificates that had already been renewed, but were
issued by accounts with no contact.

There's a second problem that has existed practically forever: we hit the
`reg.Contacts` check for no-contact-info certificates that have _not_
been renewed, and those also clog up expiration mailer. This problem is
probably causing some amount of slowness today in both prod and staging.

I wrote unittests to check the first and second problem, and verified
that they fail as appropriate. The second one fails with current `main`;
the first one fails if I simulate #6057 by moving the `reg.Contacts`
check higher in the file.

The fix is simple: delete the `reg.Contacts` check entirely. It's valid
to call `sendNags` with an empty contacts list, and will return a nil
error. That allows findExpiringCertificates to proceed to updating
lastExpirationNagSent for those certificates, so they won't be examined
again on the next iteration. I also removed another spurious length check
in `sendNags`, since there's an equivalent length check a few lines lower.

Related to #5682
2022-05-16 11:42:43 -07:00
Jacob Hoffman-Andrews 81f724ba53
ocsp/responder: fix goroutine leak in multiSource (#6116)
In multiSource, if the primary source returned before the secondary
source, we would leak the goroutine waiting on results from the secondary
source. This was because that goroutine was writing to a channel that
no longer had any readers.

Fix that by making the channel in getResponse buffered, so getResponse
can always write to its channel, regardless of whether there are readers.

I've confirmed that the unittest fails when run without the fix.
2022-05-16 11:41:58 -07:00
Aaron Gable 8e156f4dd9
Quiet log output for successful /directory requests (#6096)
Add functionality to the ubiquitous RequestEvent (aka logEvent) to
allow handlers to suppress the final log line that is printed when
a non-500 response is being sent.

Use this functionality to suppress logging GET requests for the /directory
endpoint. We can expand this in the future to quiet other logs that
are not helpful for metrics or analysis.

Fixes #6094
2022-05-13 08:35:34 -07:00
Jacob Hoffman-Andrews a4ba9b1adb
rocsp/config: fix PoolSize comment (#6110)
The go-redis docs say default is 10 * NumCPU, but the actual code says 5.

Extra context:

2465baaab5/options.go (L143-L145)

2465baaab5/cluster.go (L96-L98)

For Options, the default (documented) is 10 * NumCPUs. For ClusterOptions, the
default (undocumented) is 5 * NumCPUs. We use ClusterOptions. Also worth noting:
for ClusterOptions, the limit is per node.
2022-05-12 16:29:26 -07:00
Jacob Hoffman-Andrews 25e4b7e7fa
expiration-mailer: Deprecate NagCheckInterval (#6103)
This was introduced when expiration-mailer was run by cron, and was a
way for expiration-mailer to know something about its expected run
interval so it could send notifications "on time" rather than "just
after" the configured email time.

Now that expiration-mailer runs as a daemon we can simply pull this
value from `Frequency`, which is set to the same value in prod.
2022-05-12 16:28:42 -07:00
Jacob Hoffman-Andrews ed1063d5c6
rocsp: add "connection_pool" to metric names (#6112)
Most of our existing Redis metrics are related to the connection pool,
but some sound like they could be request-oriented (like redis_hits and
redis_misses). Fix that with better naming. Also, join three related
metrics (hits, misses, and timeouts) under a single metric name with
labels so they more easily by grouped and counted.

Move the stat creation into rocsp/metrics.go so the names are in the
same file as the collector.

Add a unittest for rocsp metrics.
2022-05-12 16:01:07 -07:00
Jacob Hoffman-Andrews eeaa81cf78
expiration-mailer: bail early on context cancel (#6102)
We have various long-running loops in expiration-mailer. When a context
is canceled, rather than running through the loops to their end (and
potentially spamming logs with "context canceled" errors), we should
return early.

Fixes #6069
2022-05-12 14:01:24 -07:00
dependabot[bot] 6f5b3e7f2e
Bump golangci-lint actions/setup-go from v2 to v3 (#6108)
Bumps [actions/setup-go](https://github.com/actions/setup-go) from 2 to 3.
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](https://github.com/actions/setup-go/compare/v2...v3)

We don't use any special configuration for setup-go, so this is fully backwards
compatible for us.
2022-05-12 10:04:48 -07:00
Aaron Gable 5e11772e49
Simplify SA gRPC stream test scaffolding (#6068)
When testing the SA from within the SA package itself, we can directly
call its methods without having to go through the gRPC client interface.
Rather than needing to mock out gRPC client and server streams that
talk to each other over a channel, we can simply mock the server stream
and read from it directly.

The inMemSA adapter work will still be necessary when (for example)
testing an RA which uses a real in-memory SA to serve as its test
backend, rather than mocking out the SA. In that case, we need to wrap
the `storageAuthorityImpl` in something which causes it to match the
`sapb.StorageAuthorityClient` interface expected by the RA. That's what
the `//test/inmem/` package is for, and we'll move this wrapper code there
when we need it.

By testing these methods directly, rather than through an inmem wrapper,
we remove a potential cyclic import that was preventing us from moving
the inmem stream adapter into the `//test/inmem` package itself.
2022-05-12 10:00:00 -07:00
Naveen 94ce90845e
CI: Add githubactions to the dependabot config (#6078) 2022-05-11 17:34:35 -07:00
Jacob Hoffman-Andrews 0044c1b089
Make rocsp config robust to empty "addrs" config (#6086)
Before this change, an empty "addrs" config would result in a panic.
After this change, it will result in a clean error message.
2022-05-11 14:41:13 -07:00
Aaron Gable 8cb01a0c34
Enable additional linters (#6106)
These new linters are almost all part of golangci-lint's collection
of default linters, that would all be running if we weren't setting
`disable-all: true`. By adding them, we now have parity with the
default configuration, as well as the additional linters we like.

Adds the following linters:
* unconvert
* deadcode
* structcheck
* typecheck
* varcheck
* wastedassign
2022-05-11 13:58:58 -07:00
Jacob Hoffman-Andrews 8ba71cd624
expiration-mailer: Don't bail early on error (#6101)
If there is an error in SelectCertificate, we can continue doing work
with the remaining certificates.
2022-05-10 16:40:37 -07:00
Jacob Hoffman-Andrews f5769c0967
Fix comment on AssertMetricWithLabelsEquals (#6099)
Also tag it as a helper.
2022-05-10 15:52:19 -07:00
nathannaveen d40edc1933
Set permissions for GitHub actions (#6044)
Explicitly restrict the permissions used by the boulder-ci workflow jobs.
2022-05-10 13:24:30 -07:00
Jacob Hoffman-Andrews 5451e79643
Clean up expiration mailer test (#6093)
There was a lot of copy-paste code, and in particular one of
the most important pieces of setup information (the value of the
`lastExpirationNagSent` field) we often hidden off to the right of the
screen. This extracts out common logic into helper functions and replaces
manual INSERTs with gorp inserts.
2022-05-10 11:44:36 -07:00
Aaron Gable f29f63a317
Don't write "null" to DB for missing contacts (#6090)
Instead write `[]`, a better representation of an empty contact set,
and avoid having literal JSON `null`s in our database.

As part of doing so, add some extra code to //sa/model.go that
bypasses the need for //sa/type-converter.go to do any magic
JSON-to-string-slice conversions for us.

Fixes #6074
2022-05-10 09:25:41 -07:00
Aaron Gable f6978f396f
Improve github release artifacts (#6092)
Generate .deb packages for all currently configured Go versions
(usually the current and upcoming versions that we use in prod), rather
than just the one default version. Also ensure that the uploaded
artifacts have 8-character short hashes in their names.

Unfortunately this does require updating Go versions in one additional
place (the release.yml file), since we are no longer parsing it out of the
docker-compose.yml. This is unavoidable without hacks that I consider
to be even uglier than the repetition.

Fixes #6075
Fixes #6084
2022-05-09 16:41:26 -07:00
Jacob Hoffman-Andrews 507c0d1ab3
expiration-mailer: add stats (#6091) 2022-05-09 16:41:10 -07:00
Jacob Hoffman-Andrews e0605aadab
issuance: improve error message when load chain fails (#6087) 2022-05-05 15:41:07 -07:00
Aaron Gable b3c56a5d05
Reorganize docs and bring in release docs (#6077)
Bring in the release docs from the boulder-release-process repo,
so that they're adjacent to all of our other docs. This allows us to
delete that repo. Also update references to that repo to instead point
to the new doc here.

Also make minor organization updates to other docs to keep the root
of this repository clean.
2022-05-04 12:21:37 -07:00
Aaron Gable 2b23d3c1fc
Register OCSP responder metrics (#6083)
These metrics were never registered, so although they are
being incremented, they are not being exported or collected.
2022-05-04 11:22:34 -07:00
Aaron Gable 7b6b914697
Use go1.18.1 by default (#6081)
This also updates the version built by the build and release action.
2022-05-03 13:19:02 -07:00
Aaron Gable 7ef6913e71
Revert "Allow expiration mailer to work in parallel" (#6080)
When deployed, the newly-parallel expiration-mailer encountered
unexpected difficulties and dropped to apparently sending nearly zero
emails despite not throwing any real errors. Reverting the parallelism
change until we understand and can fix the root cause.

This reverts two commits:
- Allow expiration mailer to work in parallel (#6057)
- Fix data race in expiration-mailer test mocks (#6072) 

It also modifies the revert to leave the new `ParallelSends` config key
in place (albeit completely ignored), so that the binary containing this
revert can be safely deployed regardless of config status.

Part of #5682
2022-05-03 13:18:40 -07:00
Daniel Jeffery a2ff222fda
cert-checker: use config log level and handle nil mariadb response (#6066)
- Fix cert-checker to use the syslog and stdout logging facilities it
reads from the config file instead of having them hard-coded to zero.
- Fix cert-checker to handle a nil response from mariadb if no records
are found.
- Fix comment in log.go to correctly describe when the initialize function
and therefore default values would be used.

Fixes #6067
2022-05-02 13:29:53 -07:00
Aaron Gable 8ec10c4848
Fix data race in expiration-mailer test mocks (#6072)
Although each goroutine gets its own `mocks.mockMailerConn`, each one
of those is racing with the others whenever they try to update the list
of sent messages in their parent `mocks.Mailer`. This leads to data races
in the unit tests (but, thankfully, not in the production code).

Introduce a mutex around the slice of sent messages to prevent the race.

Fixes #6070
2022-05-02 13:16:27 -07:00
Aaron Gable 802acc510f
Use Redis 6.2.7 because Redis 7.0.0 breaks go-redis (#6073)
Redis recently released version 7.0.0, which has several breaking
changes. The go-redis library that we rely on does not yet support
communicating with a Redis 7.0.0 cluster.

Pin ourselves to the latest non-7.0.0 version, 6.2.7, until such time
as go-redis releases a version with support for 7.0.0.

Fixes #6071
2022-05-02 11:42:02 -07:00
dependabot[bot] 0243b54e5b
Bump github.com/eggsampler/acme/v3 from 3.2.1 to 3.3.0 (#6060)
Bumps github.com/eggsampler/acme/v3 from 3.2.1 to 3.3.0.
- Release notes: https://github.com/eggsampler/acme/releases
- Diff: https://github.com/eggsampler/acme/compare/v3.2.1...v3.3.0

Also updates github.com/miekg/dns from v1.1.45 to v1.1.48.
This does not affect any files we depend on.
2022-04-25 15:09:25 -07:00
Jacob Hoffman-Andrews e4c1cf2eb1
rocsp-tool: remove ServiceConfig from config (#6061)
ServiceConfig is only needed for components that act as gRPC services.
For rocsp-tool, which is a gRPC client but is long-running enough to
merit a debug port, we should provide DebugAddr in the config.
2022-04-25 14:57:18 -07:00
Jacob Hoffman-Andrews 29249b4aad
Add feature flag AllowUnrecognizedFeatures (#6056)
By default, Boulder's feature flag code verifies that the list of flags
being set (from a JSON file) maps to actually-existing flags.

However, this gets in the way of a deployment strategy where feature
flags are added to config templates during a staging deploy with "true"
or "false" filled in depending on production or staging status - for
instance, when rolling out a deprecation to staging ahead of production.
If those configs get rolled to prod before the corresponding Boulder
deploy, Boulder will refuse to start up, even though it would be fine to
start up with the unrecognized flag ignored.

The envisioned deployment behavior here is that prod will have
AllowUnrecognizedFeatures: true while staging will have it set to false,
to ensure that misspellings of feature flag names are caught during
staging deploy. As a correlary, this assumes that the list of flags in
configs will be the same between staging and prod, with only their
values changing.
2022-04-21 18:05:18 -07:00
Jacob Hoffman-Andrews 9629c88d66
Allow expiration mailer to work in parallel (#6057)
Previously, each accounts email would be sent in serial,
along with several reads from the database (to check for
certificate renewal) and several writes to the database (to update
`certificateStatus.lastExpirationNagSent`). This adds a config field
for the expiration mailer that sets the parallelism it will use.

That means making and using multiple SMTP connections as well. Previously,
`bmail.Mailer` was not safe for concurrent use. It also had a piece of
API awkwardness: after you created a Mailer, you had to call Connect on
it to change its state.

Instead of treating that as a state change on Mailer, I split out a
separate component: `bmail.Conn`. Now, when you call `Mailer.Connect()`,
you get a Conn. You can send mail on that Conn and Close it when you're
done. A single Mailer instance can produce multiple Conns, so Mailer is
now concurrency-safe (while Conn is not).

This involved a moderate amount of renaming and code movement, and
GitHub's move detector is not keeping up 100%, so an eye towards "is
this moved code?" may help. Also adding `?w=1` to the diff URL to ignore
whitespace diffs.
2022-04-21 18:04:55 -07:00
Jacob Hoffman-Andrews fe6fab8821
Remove fqdnsets_old workaround (#6054)
Fixes #5670
2022-04-21 16:39:35 -07:00
dependabot[bot] f1c7b038b1
Bump github.com/miekg/dns from 1.1.45 to 1.1.48 (#6059)
Bumps [github.com/miekg/dns](https://github.com/miekg/dns) from 1.1.45 to 1.1.48.
- [Release notes](https://github.com/miekg/dns/releases)
- [Changelog](https://github.com/miekg/dns/blob/master/Makefile.release)
- [Commits](https://github.com/miekg/dns/compare/v1.1.45...v1.1.48)

---
updated-dependencies:
- dependency-name: github.com/miekg/dns
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-21 16:37:12 -07:00
dependabot[bot] 4afb2f191f
Bump gopkg.in/square/go-jose.v2 from 2.4.1 to 2.6.0 (#6046)
Bumps [gopkg.in/square/go-jose.v2](https://github.com/square/go-jose) from 2.4.1 to 2.6.0.
- [Release notes](https://github.com/square/go-jose/releases)
- [Commits](https://github.com/square/go-jose/compare/v2.4.1...v2.6.0)

---
updated-dependencies:
- dependency-name: gopkg.in/square/go-jose.v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-21 16:07:36 -07:00
Jacob Hoffman-Andrews 23fc3c907b
Split up build and release job (#6055)
This allows us to narrow permissions by only granting write privileges
to the upload portion of the job (which doesn't run any code from our
repo). It also allows us to verify that the release build works on every
commit, while only generating releases on actual release tags.
2022-04-19 21:42:23 -07:00