Allow gRPC SRV resolver to succeed even when some names are not resolved
successfully. Cross-DC services (e.g. nonce) will fail to resolve when
the link between DCs is severed or one DC is taken offline, this should
not result in hard gRPC service failures.
Fixes#6974
This adds a lookup in cert-checker to find the linting precertificate
with the same serial number as a given final certificate, and checks
precertificate correspondence between the two.
Fixes#6959
Fix an issue related to the custom gRPC Picker implementation introduced
in #6618. When a nonce contained a prefix not associated with a known
backend, the Picker would continuously rebuild, re-resolve DNS, and
eventually throw a 500 "Server Error" at RPC timeout. The Picker now
promptly returns a 400 "Bad Nonce" error as expected, in response the
requesting client should retry their request with a fresh nonce.
Additionally:
- WFE unit tests use derived nonces when `"BOULDER_CONFIG_DIR" ==
"test/config-next"`.
- `Balancer.Build()` in "noncebalancer" forces a rebuild until non-zero
backends are available. This matches the
[balancer/roundrobin](d524b40946/balancer/roundrobin/roundrobin.go (L49-L53))
implementation.
- Nonces with no matching backend increment "jose_errors" with label
`"type": "JWSInvalidNonce"` and "nonce_no_backend_found".
- Nonces of incorrect length are now rejected at the WFE and increment
"jose_errors" with label `"type": "JWSMalformedNonce"` instead of
`"type": "JWSInvalidNonce"`.
- Nonces not encoded as base64url are now rejected at the WFE and
increment "jose_errors" with label `"type": "JWSMalformedNonce"` instead
of `"type": "JWSInvalidNonce"`.
Fixes#6969
Part of #6974
When crl-updater produces a CRL with shard index 0, have it also produce
an identical CRL with shard index NumShards (e.g. 128). This is the
first step towards having it only produce shards numbered 1 through
NumShards, i.e. transition from using 0-indexing to 1-indexing.
We want to do this because various aspects of Golang and gRPC cannot
tell the difference between "this struct/message has no shard index set"
and "this struct/message has shard index 0 set".
Part of https://github.com/letsencrypt/boulder/issues/7007
This design seeks to reduce read-pressure on our DB by moving rate limit
tabulation to a key-value datastore. This PR provides the following:
- (README.md) a short guide to the schemas, formats, and concepts
introduced in this PR
- (source.go) an interface for storing, retrieving, and resetting a
subscriber bucket
- (name.go) an enumeration of all defined rate limits
- (limit.go) a schema for defining default limits and per-subscriber
overrides
- (limiter.go) a high-level API for interacting with key-value rate
limits
- (gcra.go) an implementation of the Generic Cell Rate Algorithm, a
leaky bucket-style scheduling algorithm, used to calculate the present
or future capacity of a subscriber bucket using spend and refund
operations
Note: the included source implementation is test-only and currently
accomplished using a simple in-memory map protected by a mutex,
implementations using Redis and potentially other data stores will
follow.
Part of #5545
Add a new feature flag, LeaseCRLShards, which controls certain aspects
of crl-updater's behavior.
When this flag is enabled, crl-updater calls the new SA.LeaseCRLShard
method before beginning work on a shard. This prevents it from stepping
on the toes of another crl-updater instance which may be working on the
same shard. This is important to prevent two competing instances from
accidentally updating a CRL's Number (which is an integer representation
of its thisUpdate timestamp) *backwards*, which would be a compliance
violation.
When this flag is enabled, crl-updater also calls the new
SA.UpdateCRLShard method after finishing work on a shard.
In the future, additional work will be done to make crl-updater use the
"give me the oldest available shard" mode of the LeaseCRLShard method.
Fixes https://github.com/letsencrypt/boulder/issues/6897
Completely remove the ability to configure Certificate Policy OIDs in
the Boulder CA. Instead, hard-code the Baseline Requirements Domain
Validated Reserved Policy Identifier. Boulder will never perform OV or
EV validation, so this is the only identifier that will be necessary.
In the ceremony tool, introduce additional checks that assert that Root
certificates do not have policies, and Intermediate certificates have
exactly the one Baseline Requirements Domain Validated Reserved Policy
Identifier.
This change replaces [gorp] with [borp].
The changes consist of a mass renaming of the import and comments / doc
fixups, plus modifications of many call sites to provide a
context.Context everywhere, since gorp newly requires this (this was one
of the motivating factors for the borp fork).
This also refactors `github.com/letsencrypt/boulder/db.WrappedMap` and
`github.com/letsencrypt/boulder/db.Transaction` to not embed their
underlying gorp/borp objects, but to have them as plain fields. This
ensures that we can only call methods on them that are specifically
implemented in `github.com/letsencrypt/boulder/db`, so we don't miss
wrapping any. This required introducing a `NewWrappedMap` method along
with accessors `SQLDb()` and `BorpDB()` to get at the internal fields
during metrics and logging setup.
Fixes#6944
Our linting system uses a throwaway key to sign an untrusted version of
the to-be-signed cert, then runs the lints over that. But this means
that, when linting a self-signed cert, the signature no longer matches
the embedded public key. This in turn causes a bunch of zlint's checks
to think they're linting a Subordinate CA cert, rather than a Root CA
cert.
Change our linting system to make the lint cert appear self-signed when
the input cert is intended to be self-signed.
The inclusion of Policy Qualifiers inside Policy Information elements of
a Certificate Policies extension is now NOT RECOMMENDED by the Baseline
Requirements. We have already removed these fields from all of our
Boulder configuration, and ceased issuing certificates with Policy
Qualifiers.
Remove all support for configuring and including Policy Qualifiers in
our certificates, both in Boulder's main issuance path and in our
ceremony tool. Switch from using the policyasn1 library to manually
encode these extensions, to using the crypto/x509's
Certificate.PolicyIdentifiers field. Delete the policyasn1 package as it
is no longer necessary.
Fixes https://github.com/letsencrypt/boulder/issues/6880
Rather than marshalling and comparing the bytes of each key, simply use
the .Equal() method provided by all go stdlib types that implement the
crypto.PublicKey interface.
This is a follow-up fix to
https://github.com/letsencrypt/boulder/pull/6987 which mistakenly did
not update the docker compose default BOULDER_TOOLS_TAG variable from
go1.20.5 to go1.20.6. This would only manifest on developer machines
manually running unit/integration tests rather than CI which explicitly
tests against a matrix of BOULDER_TOOLS_TAG versions.
```
Error response from daemon: manifest for letsencrypt/boulder-tools:go1.20.5_2023-07-11 not found: manifest unknown: manifest unknown
```
Update zlint to v3.5.0, which introduces scaffolding for running lints
over CRLs.
Convert all of our existing CRL checks to structs which match the zlint
interface, and add them to the registry. Then change our linter's
CheckCRL function, and crl-checker's Validate function, to run all lints
in the zlint registry.
Finally, update the ceremony tool to run these lints as well.
This change touches a lot of files, but involves almost no logic
changes. It's all just infrastructure, changing the way our lints and
their tests are shaped, and moving test files into new homes.
Fixes https://github.com/letsencrypt/boulder/issues/6934
Fixes https://github.com/letsencrypt/boulder/issues/6979
This version includes a fix that seems relevant to us:
> The HTTP/1 client did not fully validate the contents of the Host
header. A maliciously crafted Host header could inject additional
headers or entire requests. The HTTP/1 client now refuses to send
requests containing an invalid Request.Host or Request.URL.Host value.
>
> Thanks to Bartek Nowotarski for reporting this issue.
>
> Includes security fixes for CVE-2023-29406 and Go issue
https://go.dev/issue/60374
This was introduced early in Boulder development when we had the concept
of a "short serial" (monotonically increasing) which would be prepended
to random bytes to form the full serial. We wanted to specially report
the case that there were duplicates of a given short serial since it
meant a problem with our monotonicity.
We've long since abandoned that idea, and also this code can't be
exercised because sa.SelectCertificate does a LIMIT 1 anyhow.
The upstream zlint lints are organized not by what kind of certificate
they apply to, but what source they are from. This change rearranges
(and slightly renames) our custom lints to match the same structure.
This will make it easier for us to temporarily add lints (e.g. for our
CRLs) which we intend to upstream to zlint later.
Part of https://github.com/letsencrypt/boulder/issues/6934
Add go1.21rc2 to the matrix of go versions we test against.
Add a new step to our CI workflows (boulder-ci, try-release, and
release) which sets the "GOEXPERIMENT=loopvar" environment variable if
we're running go1.21. This experiment makes it so that loop variables
are scoped only to their single loop iteration, rather than to the whole
loop. This prevents bugs such as our CAA Rechecking incident
(https://bugzilla.mozilla.org/show_bug.cgi?id=1619047). Also add a line
to our docker setup to propagate this environment variable into the
container, where it can affect builds.
Finally, fix one TLS-ALPN-01 test to have the fake subscriber server
actually willing to negotiate the acme-tls/1 protocol, so that the ACME
server's tls client actually waits to (fail to) get the certificate,
instead of dying immediately. This fix is related to the upgrade to
go1.21, not the loopvar experiment.
Fixes https://github.com/letsencrypt/boulder/issues/6950
Add two new methods, LeaseCRLShard and UpdateCRLShard, to the SA gRPC
interface. These methods work in concert both to prevent multiple
instances of crl-updater from stepping on each others toes, and to lay
the groundwork for a less bursty version of crl-updater in the future.
Introduce a new database table, crlShards, which tracks the thisUpdate
and nextUpdate timestamps of each CRL shard for each issuer. It also has
a column "leasedUntil", which is also a timestamp. Grant the SA user
read-write access to this table.
LeaseCRLShard updates the leasedUntil column of the identified shard to
the given time. It returns an error if the identified shard's
leasedUntil timestamp is already in the future. This provides a
mechanism for crl-updater instances to "lick the cookie", so to speak,
marking CRL shards as "taken" so that multiple crl-updater instances
don't attempt to work on the same shard at the same time. Using a
timestamp has the added benefit that leases are guaranteed to expire,
ensuring that we don't accidentally fail to work on a shard forever.
LeaseCRLShard has a second mode of operation, when a range of potential
shards is given in the request, rather than a single shard. In this
mode, it returns the shard (within the given range) whose thisUpdate
timestamp is oldest. (Shards with no thisUpdate timestamp, including
because the requested range includes shard indices the database doesn't
yet know about, count as older than any shard with any thisUpdate
timestamp.) This allows crl-updater instances which don't care which
shard they're working on to do the most urgent work first.
UpdateCRLShard updates the thisUpdate and nextUpdate timestamps of the
identified shard. This closes the loop with the second mode of
LeaseCRLShard above: by updating the thisUpdate timestamp, the method
marks the shard as no longer urgently needing to be worked on.
IN-9220 tracks creating this table in staging and production
Part of #6897
This introduces a small new package, `precert`, with one function
`Correspond` that checks a precertificate against a final certificate to
see if they correspond in the relationship described in RFC 6962.
This also modifies the `issuance` package so that RequestFromPrecert
generates an IssuanceRequest that keeps a reference to the
precertificate's bytes. The allows `issuance.Prepare` to do a
correspondence check when preparing to sign the final certificate. Note
in particular that the correspondence check is done against the
_linting_ version of the final certificate. This allows us to catch
correspondence problems before the real, trusted signature is actually
made.
Fixes#6945
This can take two values (typically the return values of a two-value
function) and panic if the error is non-nil, returning the interesting
value. This is particularly useful for cases where we statically know
the call will succeed.
Thanks to @mcpherrinm for the idea!
Over on the community forum, there's been requests for the new .vn
domains. weppos/publicsuffix-go hasn't had a release tagged in a little
while, so this is the result of:
go get github.com/weppos/publicsuffix-go@latest
go mod tidy
go mod vendor
The contacts field of an account can be very verbose, and is irrelevant
to the vast majority -- e.g. creating orders, validating challenges, and
downloading certificates -- of requests made by an account. To reduce
the length of our WFE log lines, remove the Contacts field from all
logs. When we actually need it, we can get it from the database.
Also remove the RequestEvent.TLS field, which is unused.
In WFE, we do a goodkey check when validating a self-authenticated POST
(i.e. when creating an account). For a while, that was a purely local
check, looking at a list of bad keys or bad moduluses, or checking for
factorability. At some point we also added a backend check, querying the
SA to see if a key was blocked. However, we did not update this one code
path to distinguish "bad key" from "timeout querying SA." That meant
that sometimes we would give a badPublicKey Problem Document when we
should have given an internalServerError.
Related:
https://github.com/letsencrypt/boulder/issues/6795#issuecomment-1574217398
For "ordinary" errors like "file not found" for some part of the config,
we would prefer to log an error and exit without logging about a panic
and printing a stack trace.
To achieve that, we want to call `defer AuditPanic()` once, at the top
of `cmd/boulder`'s main. That's so early that we haven't yet parsed the
config, which means we haven't yet initialized a logger. We compromise:
`AuditPanic` now calls `log.Get()`, which will retrieve the configured
logger if one has been set up, or will create a default one (which logs
to stderr/stdout).
AuditPanic and Fail/FailOnError now cooperate: Fail/FailOnError panic
with a special type, and AuditPanic checks for that type and prints a
simple message before exiting when it's present.
This PR also coincidentally fixes a bug: panicking didn't previously
cause the program to exit with nonzero status, because it recovered the
panic but then did not explicitly exit nonzero.
Fixes#6933