Add a new code path to the ctpolicy package that enforces Chrome's new
CT Policy, which requires that SCTs come from logs run by two different
operators, rather than from one Google and one non-Google log. To achieve
this, invert the "race" logic: rather than assuming we always have two
groups, and racing the logs within each group against each other, we now
race the various groups against each other, and pick just one arbitrary
log from each group to attempt submission to.
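A minimal sketch of the inverted race, with hypothetical types (`logClient`, `operatorGroup`) standing in for the real ctpolicy structures:

```go
import (
	"context"
	"errors"
	"math/rand"
)

// Hypothetical types; the real ctpolicy structures differ.
type logClient interface {
	Submit(ctx context.Context, precert []byte) ([]byte, error) // returns an SCT
}

type operatorGroup struct {
	name string
	logs []logClient
}

// twoOperatorSCTs races the operator groups against each other, submitting to
// one arbitrary log per group, and returns once two groups have produced SCTs.
func twoOperatorSCTs(ctx context.Context, groups []operatorGroup, precert []byte) ([][]byte, error) {
	type result struct {
		sct []byte
		err error
	}
	results := make(chan result, len(groups))
	for _, group := range groups {
		go func(g operatorGroup) {
			log := g.logs[rand.Intn(len(g.logs))] // one arbitrary log per group
			sct, err := log.Submit(ctx, precert)
			results <- result{sct, err}
		}(group)
	}
	var scts [][]byte
	for range groups {
		if r := <-results; r.err == nil {
			scts = append(scts, r.sct)
			if len(scts) == 2 {
				return scts, nil
			}
		}
	}
	return nil, errors.New("failed to get SCTs from two distinct operator groups")
}
```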
Ensure that the new code path does the right thing by adding a new zlint
which checks that the two SCTs embedded in a certificate come from logs
run by different operators. To support this lint, which needs to have a
canonical mapping from logs to their operators, import the Chrome CT Log
List JSON Schema and autogenerate Go structs from it so that we can
parse a real CT Log List. Also add flags to all services which run these
lints (the CA and cert-checker) to let them load a CT Log List from disk
and provide it to the lint.
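The core check could be as simple as this sketch, where `operatorOf` is a hypothetical lookup into the parsed CT Log List:

```go
// distinctOperators reports whether the embedded SCTs come from logs run by
// at least two different operators. operatorOf resolves a log ID to its
// operator name via the parsed CT Log List (hypothetical helper).
func distinctOperators(logIDs [][32]byte, operatorOf func([32]byte) (string, bool)) bool {
	operators := make(map[string]bool)
	for _, id := range logIDs {
		op, ok := operatorOf(id)
		if !ok {
			return false // a log we don't recognize fails the lint
		}
		operators[op] = true
	}
	return len(operators) >= 2
}
```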
Finally, since we now have the ability to load a CT Log List file
anyway, use this capability to simplify configuration of the RA. Rather
than listing all of the details for each log we're willing to submit to,
simply list the names (technically, Descriptions) of each log, and look
up the rest of the details from the log list file.
To support this change, SRE will need to deploy log list files (the real
Chrome log list for prod, and a custom log list for staging) and then
update the configuration of the RA, CA, and cert-checker. Once that
transition is complete, the deletion TODOs left behind by this change
will be able to be completed, removing the old RA configuration and old
ctpolicy race logic.
Part of #5938
This reverts commit 7ef6913e71.
We turned on the `ExpirationMailerDontLookTwice` feature flag in prod, and it's
working fine but not clearing the backlog. Since
https://github.com/letsencrypt/boulder/pull/6100 fixed the issue that caused us
to (nearly) stop sending mail when we deployed #6057, this should be safe to
roll forward.
The revert of the revert applied cleanly, except for `expiration-mailer/main.go`
and `main_test.go`, particularly around the contents of `processCerts` (from
which `sendToOneRegID` was extracted) and `sendToOneRegID` itself. Those areas
are good targets for extra attention.
Update:
- golangci-lint from v1.42.1 to v1.46.2
- protoc from v3.15.6 to v3.20.1
- protoc-gen-go from v1.26.0 to v1.28.0
- protoc-gen-go-grpc from v1.1.0 to v1.2.0
- fpm from v1.14.0 to v1.14.2
Also remove a reference to go1.17.9 from one last place.
This does result in updating all of our generated .pb.go files, but only
to update the version number embedded in each file's header.
Fixes #6123
We recently landed a fix so the expiration-mailer won't look twice at
the same certificate. This will cause an immediate behavior change when
it is deployed, and that might have surprising effects. Put the fix
behind a feature flag so we can control when it rolls out more
carefully.
The go-redis docs say the default is 10 * NumCPU, but the actual code says 5.
Extra context:
- 2465baaab5/options.go (L143-L145)
- 2465baaab5/cluster.go (L96-L98)
For Options, the default (documented) is 10 * NumCPUs. For ClusterOptions, the
default (undocumented) is 5 * NumCPUs. We use ClusterOptions. Also worth noting:
for ClusterOptions, the limit is per node.
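For illustration, here's how one might set the pool size explicitly with go-redis `ClusterOptions` rather than relying on the undocumented default (addresses are placeholders):

```go
import (
	"runtime"

	"github.com/go-redis/redis/v8"
)

func newClusterClient() *redis.ClusterClient {
	return redis.NewClusterClient(&redis.ClusterOptions{
		Addrs: []string{"10.0.0.1:6379", "10.0.0.2:6379"},
		// Explicit, and per node. The undocumented ClusterOptions default is
		// 5 * NumCPU, not the 10 * NumCPU documented for Options.
		PoolSize: 10 * runtime.NumCPU(),
	})
}
```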
This was introduced when expiration-mailer was run by cron, and was a
way for expiration-mailer to know something about its expected run
interval so it could send notifications "on time" rather than "just
after" the configured email time.
Now that expiration-mailer runs as a daemon, we can simply pull this
value from `Frequency`, which is set to the same value in prod.
Instead write `[]`, a better representation of an empty contact set,
and avoid having literal JSON `null`s in our database.
As part of doing so, add some extra code to //sa/model.go that
bypasses the need for //sa/type-converter.go to do any magic
JSON-to-string-slice conversions for us.
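A minimal sketch of the idea (not the actual //sa/model.go code):

```go
import "encoding/json"

// contactsToJSON serializes a contact slice for storage, normalizing a nil
// slice to "[]" so we never write a literal JSON null to the database.
func contactsToJSON(contacts []string) (string, error) {
	if contacts == nil {
		contacts = []string{}
	}
	b, err := json.Marshal(contacts)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```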
Fixes #6074
When deployed, the newly-parallel expiration-mailer encountered
unexpected difficulties and dropped to apparently sending nearly zero
emails despite not throwing any real errors. Reverting the parallelism
change until we understand and can fix the root cause.
This reverts two commits:
- Allow expiration mailer to work in parallel (#6057)
- Fix data race in expiration-mailer test mocks (#6072)
It also modifies the revert to leave the new `ParallelSends` config key
in place (albeit completely ignored), so that the binary containing this
revert can be safely deployed regardless of config status.
Part of #5682
Previously, each account's email would be sent serially, along with
several reads from the database (to check for certificate renewal) and
several writes to the database (to update
`certificateStatus.lastExpirationNagSent`). This adds a config field
for the expiration mailer that sets the parallelism it will use.
That means making and using multiple SMTP connections as well. Previously,
`bmail.Mailer` was not safe for concurrent use. It also had a piece of
API awkwardness: after you created a Mailer, you had to call Connect on
it to change its state.
Instead of treating that as a state change on Mailer, I split out a
separate component: `bmail.Conn`. Now, when you call `Mailer.Connect()`,
you get a Conn. You can send mail on that Conn and Close it when you're
done. A single Mailer instance can produce multiple Conns, so Mailer is
now concurrency-safe (while Conn is not).
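The resulting shape, roughly (the method sets here are assumptions, not the exact bmail API):

```go
// Mailer is safe for concurrent use; each Connect returns an independent Conn.
type Mailer interface {
	Connect() (Conn, error)
}

// Conn is a single SMTP connection; it is not safe for concurrent use, and
// should be Closed when the caller is done sending.
type Conn interface {
	SendMail(to []string, subject, msg string) error
	Close() error
}
```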
This involved a moderate amount of renaming and code movement, and
GitHub's move detector is not keeping up 100%, so an eye towards "is
this moved code?" may help. Adding `?w=1` to the diff URL also helps by
hiding whitespace-only diffs.
This copies over settings from config-next that are now deployed in prod.
Also, I updated a comment in sd-test-srv to more accurately describe how SRV records work.
go1.17.9 (released 2022-04-12) includes security fixes to the crypto/elliptic and encoding/pem packages, as well as bug fixes to the linker and runtime. See the [Go 1.17.9 milestone](https://github.com/golang/go/issues?q=milestone%3AGo1.17.9+label%3ACherryPickApproved) on our issue tracker for details.
go1.18.1 (released 2022-04-12) includes security fixes to the crypto/elliptic, crypto/x509, and encoding/pem packages, as well as bug fixes to the compiler, linker, runtime, the go command, vet, and the bytes, crypto/x509, and go/types packages. See the [Go 1.18.1 milestone](https://github.com/golang/go/issues?q=milestone%3AGo1.18.1+label%3ACherryPickApproved) on our issue tracker for details.
First commit adding support for tooling to aid in the tracking and remediation
of incidents.
- Add new SA method `IncidentsForSerial`
- Add database models for `incident`s and `incidentCert`s
- Add protobuf type for `incident`
- Add database migrations for `incidents`, `incident_foo`, and `incident_bar`
- Give db user `sa` permissions to `incidents`, `incident_foo`, and
`incident_bar`
Part of #5947
Simplify the WFE `RevokeCertificate` API method in three ways:
- Remove most of the logic checking if the requester is authorized to
revoke the certificate in question (based on who is making the
request, what authorizations they have, and what reason they're
requesting). That checking is now done by the RA. Instead, simply
verify that the JWS is authenticated.
- Remove the hard-to-read `authorizedToRevoke` callbacks, and make the
`revokeCertBySubscriberKey` (nee `revokeCertByKeyID`) and
`revokeCertByCertKey` (nee `revokeCertByJWK`) helpers much more
straight-line in their execution logic.
- Call the RA's new `RevokeCertByApplicant` and `RevokeCertByKey` gRPC
methods, rather than the deprecated `RevokeCertificateWithReg`.
This change, without any flag flips, should be invisible to the
end-user. It will slightly change some of our log message formats.
However, by now relying on the new RA gRPC revocation methods, this
change allows us to change our revocation policies by enabling the
`AllowDoubleRevocation` and `MozRevocationReasons` feature flags, which
affect the behavior of those new helpers.
Fixes #5936
- Add new configuration key `throughput`, a mapping which contains all
throughput-related akamai-purger settings.
- Deprecate configuration key `purgeInterval` in favor of `purgeBatchInterval` in
the new `throughput` configuration mapping.
- When no `throughput` or `purgeInterval` is provided, the purger uses optimized
default settings which offer 1.9x the throughput of current production settings.
- At startup, all throughput-related settings are modeled to ensure that we
don't exceed the limits imposed on us by Akamai.
- Queue is now `[][]string`, instead of `[]string` (see the sketch below).
- When a given queue entry is purged, we know all 3 of its URLs were purged.
- At startup, we can compute the size of a theoretical purge request based on
the number of queue entries included.
- Raises the queue size from ~333-thousand cached OCSP responses to
1.25-million, which is roughly 6 hours of work using the optimized default
settings
- Raise `purgeInterval` in test config from 1ms, which violates API limits, to 800ms
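A hedged sketch of the new queue shape (hypothetical helper code, not the purger's actual implementation):

```go
// Each queue entry holds the 3 URLs cached for a single OCSP response, so
// purging one entry purges all of that response's URLs together.
type purgeQueue struct {
	entries [][]string // previously a flat []string of URLs
}

func (q *purgeQueue) enqueue(urls []string) {
	q.entries = append(q.entries, urls)
}

// takeBatch pops up to n entries and flattens them into a single purge
// request, whose size is therefore knowable from the entry count alone.
func (q *purgeQueue) takeBatch(n int) []string {
	if n > len(q.entries) {
		n = len(q.entries)
	}
	batch := q.entries[:n]
	q.entries = q.entries[n:]
	var urls []string
	for _, entry := range batch {
		urls = append(urls, entry...)
	}
	return urls
}
```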
Fixes #5984
Adds a ROCSP Redis client to the SA if cluster information is provided in the
SA config. If a Redis cluster is configured, the SA will attempt to write all
new certificate OCSP responses added via `sa.AddPrecertificate` to the Redis
cluster, but it will not block or fail on errors.
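A hedged sketch of the best-effort write (hypothetical interface; not the actual rocsp client):

```go
import (
	"context"
	"log"
	"time"
)

// ocspWriter is a stand-in for the Redis-backed OCSP response writer.
type ocspWriter interface {
	StoreResponse(ctx context.Context, serial string, der []byte) error
}

// writeOCSPBestEffort writes asynchronously and only logs failures, so
// AddPrecertificate never blocks or fails on Redis errors.
func writeOCSPBestEffort(w ocspWriter, serial string, der []byte) {
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), time.Second)
		defer cancel()
		if err := w.StoreResponse(ctx, serial, der); err != nil {
			log.Printf("rocsp write for %s failed: %v", serial, err)
		}
	}()
}
```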
Fixes: #5871
- Remove GOPATH-style path structure, which isn't needed with Go
modules.
- Remove check for the existence of a docker buildx builder instance, since it
was unreliable.
Add two new gRPC methods to the RA:
- `RevokeCertByKey` will be used when the API request was signed by the
certificate's keypair, rather than a Subscriber keypair. If the
request is for reason `keyCompromise`, it will ensure that the key is
added to the blocked keys table, and will attempt to "re-revoke" a
certificate that was already revoked for some other reason.
- `RevokeCertByApplicant` supports both the path where the original
subscriber requests revocation via the API and the path where another
account that has proven control over all of the identifiers in the
certificate does so. It does not allow the requested reason to be
`keyCompromise`, as these requests do not represent a demonstration of
key compromise.
In addition, add a new feature flag `MozRevocationReasons` which
controls the behavior of these new methods. If the flag is not set, they
behave like they have historically (see above). If the flag is set to true,
then the new methods enforce the upcoming Mozilla policies around
revocation reasons, namely:
- Only the original Subscriber can choose the revocation reason; other
clients will get a set reason code based on the method of requesting
revocation. When the original Subscriber requests reason
`keyCompromise`, this request will be honored, but the key will not be
blocked and other certificates with that key will not also be revoked.
- Revocations signed with the certificate key will always get reason
`keyCompromise`, because we do not know who is sending the request and
therefore must assume that the use of the key in this way represents
compromise. Because these requests will always be for reason
`keyCompromise`, they will always be added to the blocked keys table
and they will always attempt "re-revocation".
- Revocations authorized via control of all names in the cert will
always get reason `cessationOfOperation`, which is to be used when the
original Subscriber does not control all names in the certificate
anymore.
Finally, update the existing `AdministrativelyRevokeCertificate` method
to use the new helper functions shared by the two new methods.
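A condensed sketch of the reason-override behavior the flag enables (hypothetical helper; the real RA logic is spread across the new methods):

```go
// RFC 5280 reason codes used below.
const (
	reasonKeyCompromise        = 1
	reasonCessationOfOperation = 5
)

// effectiveReason maps a revocation request to the reason code actually used
// when MozRevocationReasons is enabled.
func effectiveReason(requested int64, signedByCertKey, isSubscriber bool) int64 {
	switch {
	case signedByCertKey:
		// We can't know who holds the key, so assume compromise.
		return reasonKeyCompromise
	case isSubscriber:
		// Only the original Subscriber keeps their chosen reason.
		return requested
	default:
		// Authorized via control of all names in the cert.
		return reasonCessationOfOperation
	}
}
```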
Part of #5936
This requires using GODEBUG to enable a couple of things turned off by go1.18 (TLS 1.0/1.1, SHA-1 CSRs).
Also add help for a failure mode of cross builds.
Reverts letsencrypt/boulder#5963
Turns out the tests are still flaky -- using the `grpc.WaitForReady(true)`
connection option results in sometimes seeing 9 entries added to the
purger queue, and sometimes 10 entries. Reverting because flakiness
on main should not be tolerated.
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.36.1 to 1.44.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.36.1...v1.44.0)
Also update akamai-purger integration test to avoid experimental API.
The `conn.GetState()` API is marked experimental and may change behavior
at any time. It appears to have changed between v1.36.1 and v1.44.0,
and so the akamai-purger integration tests which rely on it break.
Rather than writing our own loop which polls `conn.GetState()`, just
use the stable `WaitForReady(true)` connection option, and apply it to
all connections by setting it as a default option in the dial options.
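The relevant dial option (a hedged usage example; `addr` is a placeholder):

```go
import "google.golang.org/grpc"

func dial(addr string) (*grpc.ClientConn, error) {
	// WaitForReady(true) makes RPCs block until the connection is ready
	// instead of failing fast, replacing hand-rolled GetState() polling.
	return grpc.Dial(addr, grpc.WithDefaultCallOptions(grpc.WaitForReady(true)))
}
```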
Light cleanup of akamai-purger and the akamai cache-client. This does not make
any material changes to logic.
- Use `errors.New` and `errors.Is` instead of a custom `ErrFatal` type and
`errors.As`
- Add whitespace to separate chunks of execution and error checking from one
another
- Use `logger.Infof` and `logger.Errorf` instead of wrapped calls to
`fmt.Sprintf`
- Remove capital letters from the beginning of error messages
- Add additional comments, and remove some that are no longer accurate
When inside a closure, it is important to not accidentally assign
to variables declared outside the scope of the closure. Doing so
causes static analysis tools (such as `errcheck`) to be unable to
evaluate the lifetime of the variable, and unable to determine if
it is appropriately read from before being assigned to again.
Fix two instances where we assign to a variable declared in the
closure's enclosing scope, rather than declaring a new variable
with the same name.
We have decided that we don't like the `if err := call(); err != nil`
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: it caused readers to believe that
a later `err = call()` statement was assigning to an already-declared
`err` in the local scope, when in fact it was assigning to an
already-declared `err` in the parent scope of a closure. This caused our
`ineffassign` and `staticcheck` linters to be unable to analyze the
lifetime of the `err` variable, and so they did not complain when we
never checked the actual value of that error.
This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
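An illustration of the pitfall, with hypothetical `step1`/`step2` helpers:

```go
func step1() error { return nil }
func step2() error { return nil }

func parent() error {
	var err error
	fn := func() {
		if err := step1(); err != nil { // declares a new, closure-local err
			return
		}
		// Looks like it reuses the err above, but actually assigns to
		// parent's err; ineffassign and staticcheck lose track of it.
		err = step2()
	}
	fn()
	return err
}

// The standardized two-line form keeps declaration and check together:
func parentFixed() error {
	err := step1()
	if err != nil {
		return err
	}
	err = step2()
	if err != nil {
		return err
	}
	return nil
}
```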
Incidents of key compromise where proof is supplied in the form of a private key
have historically been labor-intensive for SRE. This PR seeks to automate the
process of embedded public key validation, querying for issuance, and revocation
and blocking by SPKI hash.
For an example of private keys embedding a mismatched public key, see:
https://blog.hboeck.de/archives/888-How-I-tricked-Symantec-with-a-Fake-Private-Key.html.
Adds two new sub-commands (`private-key-block` and `private-key-revoke`) and one
new flag (`-dry-run`) to admin-revoker. Both new sub-commands validate the
provided private key and provide the operator with an issuance count. Any
blocking and revocation actions are gated by the new `-dry-run` flag, which is
`true` by default.
`private-key-block`: if `-dry-run=false`, immediately blocks issuance for the
provided key. The operator is informed that bad-key-revoker will eventually
revoke any certificates using the provided key.
`private-key-revoke`: if `-dry-run=false`, revokes all certificates using the
provided key and then blocks future issuance. This avoids a race with the
bad-key-revoker. This command will execute successfully even if issuance for the
provided key is already blocked.
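A hedged sketch of the embedded-public-key check (not the actual 'privatekey' package code): sign a random nonce with the private key and verify it with the embedded public key, proving the two halves actually correspond.

```go
import (
	"crypto"
	"crypto/ecdsa"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

// verifyEmbedded checks that a private key's embedded public key really is
// the counterpart of its private half, via a sign/verify round trip.
func verifyEmbedded(signer crypto.Signer) error {
	nonce := make([]byte, 32)
	if _, err := rand.Read(nonce); err != nil {
		return err
	}
	digest := sha256.Sum256(nonce)
	sig, err := signer.Sign(rand.Reader, digest[:], crypto.SHA256)
	if err != nil {
		return err
	}
	switch pub := signer.Public().(type) {
	case *rsa.PublicKey:
		return rsa.VerifyPKCS1v15(pub, crypto.SHA256, digest[:], sig)
	case *ecdsa.PublicKey:
		if !ecdsa.VerifyASN1(pub, digest[:], sig) {
			return fmt.Errorf("embedded public key does not match private key")
		}
		return nil
	default:
		return fmt.Errorf("unsupported key type %T", pub)
	}
}
```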
- Add support for blocking issuance by private key to admin-revoker
- Add support for revoking certificates by private key to admin-revoker
- Create new package called 'privatekey'
- Move private key loading logic from 'issuance' to 'privatekey'
- Add embedded public key verification to 'privatekey'
- Add new field `skipBlockKey` to `AdministrativelyRevokeCertificate` protobuf
- Add check in RA to ensure that only KeyCompromise revocations use
`skipBlockKey`
Fixes #5785
Add `stylecheck` to our list of lints, since it got separated out from
`staticcheck`. Fix the way we configure both to be clearer and not
rely on regexes.
Additionally fix a number of easy-to-change `staticcheck` and
`stylecheck` violations, allowing us to reduce our number of ignored
checks.
Part of #5681
These gRPC methods were only used by the ACMEv1 code paths.
Now that boulder-wfe has been fully removed, we can be confident
that no clients ever call these methods, and can remove them from
the gRPC service interface.
Part of #5816
Running an older version (v0.0.1-2020.1.4) of `staticcheck` in
whole-program mode (`staticcheck --unused.whole-program=true -- ./...`)
finds various instances of unused code which don't normally show up
as CI issues. I've used this to find and remove a large chunk of the
unused code, to pave the way for additional large deletions accompanying
the WFE1 removal.
Part of #5681
When we query DNS for a host, and both the A and AAAA lookups fail or
are empty, combine both errors into a single error rather than only
returning the error from the A lookup.
Fixes #5819
Fixes #5319
Overhaul the revocation integration tests to comprehensively test
every combination of:
- revoking a cert vs a precert
- revoking via the cert key, the subscriber key, or a separate account
that has validation for all of the names in the cert
- revoking for reason Unspecified vs for reason KeyCompromise
Also update a number of the python tests to verify that they cannot
revoke for reason keyCompromise, but can and do revoke with other
reasons.
When looping over multiple Go versions, this script currently exits in error
because we attempt to create a cross-compiling node even though it already
exists. Keeping the existing node also allows subsequent builds to make use of
the Docker cache, reducing the build time by ~400 seconds.
- Only create the cross-compiling node if it doesn't exist
- No longer remove the cross-compiling node on exit
Followup from #5839.
I chose groupcache/lru as our LRU cache implementation because it's part
of the golang org, written by one of the Go authors, and very simple
and easy to read.
This adds an `AccountGetter` interface that is implemented by both the
AccountCache and the SA. If the WFE config includes an AccountCache field,
it will wrap the SA in an AccountCache with the configured max size and
expiration time.
We set an expiration time on account cache entries because we want a
bounded amount of time that they may be stale by. This will be used in
conjunction with a delay on account-updating pathways to ensure we don't
allow authentication with a deactivated account or changed key.
The account cache stores corepb.Registration objects because protobufs
have an established way to do a deep copy. Deep copies are important so
the cache can maintain its own internal state and ensure nothing external
is modifying it.
As part of this process I changed construction of the WFE. Previously,
"SA" and "RA" were public fields that were mutated after construction. Now
they are parameters to the constructor, along with the new "accountGetter"
parameter.
The cache includes stats for requests categorized by hits and misses.
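Roughly, the new seam looks like this (interface name from the text; the method signature is an assumption):

```go
import (
	"context"

	corepb "github.com/letsencrypt/boulder/core/proto"
)

// AccountGetter is implemented by both the SA client and the AccountCache,
// so the WFE can use either interchangeably.
type AccountGetter interface {
	GetRegistration(ctx context.Context, regID int64) (*corepb.Registration, error)
}
```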
Add a new check to GoodKey which attempts to factor the public modulus
of the presented key using Fermat's factorization method. This method
will succeed if and only if the prime factors are very close to each
other -- i.e. almost certainly were not selected independently from a
random uniform distribution, but were instead calculated via some other
less secure method.
To support this new feature, add a new config flag to the RA, CA, and
WFE, which all use the GoodKey checks. As part of adding this new config
value, refactor the GoodKey config items into their own config struct
which can be re-used across all services.
If the new `FermatRounds` config value has not been set, it will default
to zero, causing no factorization to be attempted.
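For reference, a minimal sketch of Fermat's method (illustrative only; the real check lives in the GoodKey package and has its own bounds):

```go
import "math/big"

// fermatFactor tries to factor n = p*q, succeeding only when p and q are
// close together. It gives up after the given number of rounds.
func fermatFactor(n *big.Int, rounds int) (p, q *big.Int, ok bool) {
	one := big.NewInt(1)
	a := new(big.Int).Sqrt(n) // floor(sqrt(n))
	if new(big.Int).Mul(a, a).Cmp(n) < 0 {
		a.Add(a, one) // start at ceil(sqrt(n))
	}
	b2 := new(big.Int)
	b := new(big.Int)
	for i := 0; i < rounds; i++ {
		// If b² = a² - n is a perfect square, then n = (a-b)(a+b).
		b2.Mul(a, a).Sub(b2, n)
		b.Sqrt(b2)
		if new(big.Int).Mul(b, b).Cmp(b2) == 0 {
			return new(big.Int).Sub(a, b), new(big.Int).Add(a, b), true
		}
		a.Add(a, one)
	}
	return nil, nil, false
}
```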
Fixes #5850
Part of #5851
Build a new docker container for the new Go 1.17.5 security release,
which includes a fix for the `net/http` package. Update our CI to run
tests on both our current and the new go versions.
Currently, if `docker buildx` fails, the cross-compilation node (created before
the build starts) will never be deleted. This change ensures that the
cross-compilation node is always deleted before `tag_and_upload.sh` exits.