solve https://github.com/letsencrypt/boulder/issues/8088
RFC8555 6.2 requires badSignatureAlgorithm on unacceptable JWS signing
algorithm, but current boulder return malform:failed to parse jws error
instead
Its because this only checks about JWS protected header's signature
algorithm, current checkAlgorithm is while too late to catch parse time
error but not redundant, as it checks against a key and signed message
---------
Co-authored-by: Samantha Frank <hello@entropy.cat>
Change all of the helper methods and functions in verify.go to return an
`error` instead of a `probs.ProblemDetails`. Add a few new types to our
errors package, and support for those types in ProblemDetailsForError,
to maintain the same public-facing error types. Update the tests to
check for specific errors instead of specific problems.
This is a building block towards making the probs.ProblemDetails type
not implement the Error interface, and only be used when rendering
errors to the user (i.e. not within Boulder logic itself).
Part of https://github.com/letsencrypt/boulder/issues/4980
Upgrade from the old go-jose v2.6.1 to the newly minted go-jose v4.0.1.
Cleans up old code now that `jose.ParseSigned` can take a list of
supported signature algorithms.
Fixes https://github.com/letsencrypt/boulder/issues/7390
---------
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
Fix an issue related to the custom gRPC Picker implementation introduced
in #6618. When a nonce contained a prefix not associated with a known
backend, the Picker would continuously rebuild, re-resolve DNS, and
eventually throw a 500 "Server Error" at RPC timeout. The Picker now
promptly returns a 400 "Bad Nonce" error as expected, in response the
requesting client should retry their request with a fresh nonce.
Additionally:
- WFE unit tests use derived nonces when `"BOULDER_CONFIG_DIR" ==
"test/config-next"`.
- `Balancer.Build()` in "noncebalancer" forces a rebuild until non-zero
backends are available. This matches the
[balancer/roundrobin](d524b40946/balancer/roundrobin/roundrobin.go (L49-L53))
implementation.
- Nonces with no matching backend increment "jose_errors" with label
`"type": "JWSInvalidNonce"` and "nonce_no_backend_found".
- Nonces of incorrect length are now rejected at the WFE and increment
"jose_errors" with label `"type": "JWSMalformedNonce"` instead of
`"type": "JWSInvalidNonce"`.
- Nonces not encoded as base64url are now rejected at the WFE and
increment "jose_errors" with label `"type": "JWSMalformedNonce"` instead
of `"type": "JWSInvalidNonce"`.
Fixes#6969
Part of #6974
The contacts field of an account can be very verbose, and is irrelevant
to the vast majority -- e.g. creating orders, validating challenges, and
downloading certificates -- of requests made by an account. To reduce
the length of our WFE log lines, remove the Contacts field from all
logs. When we actually need it, we can get it from the database.
Also remove the RequestEvent.TLS field, which is unused.
In WFE, we do a goodkey check when validating a self-authenticated POST
(i.e. when creating an account). For a while, that was a purely local
check, looking at a list of bad keys or bad moduluses, or checking for
factorability. At some point we also added a backend check, querying the
SA to see if a key was blocked. However, we did not update this one code
path to distinguish "bad key" from "timeout querying SA." That meant
that sometimes we would give a badPublicKey Problem Document when we
should have given an internalServerError.
Related:
https://github.com/letsencrypt/boulder/issues/6795#issuecomment-1574217398
Define a `bJSONWebSignature` struct which embeds a
`*jose.JSONWebSignature`. The only method that can produce a
`bJSONWebSignature` is `wfe.parseJWS` so that we can ensure
safety/sanity checks are performed on the incoming data. Restricts
several methods and functions to take a `jose.Header` as an input
parameter, rather than a full JWS.
Fixes https://github.com/letsencrypt/boulder/issues/5676.
Remove tracing using Beeline from Boulder. The only remnant left behind
is the deprecated configuration, to ensure deployability.
We had previously planned to swap in OpenTelemetry in a single PR, but
that adds significant churn in a single change, so we're doing this as
multiple steps that will each be significantly easier to reason about
and review.
Part of #6361
Also remove CSRDNSNames, CSRIPAddresses and CSREmailAddresses.
And add a new log field "DNSNames", for use in new-order, finalize, and
revoke requests.
Add a "RevocationReason" field in the "Extra" section for revoke
requests.
Assign nonce prefixes for each nonce-service by taking the first eight
characters of the the base64url encoded HMAC-SHA256 hash of the RPC
listening address using a provided key. The provided key must be same
across all boulder-wfe and nonce-service instances.
- Add a custom `grpc-go` load balancer implementation (`nonce`) which
can route nonce redemption RPC messages by matching the prefix to the
derived prefix of the nonce-service instance which created it.
- Modify the RPC client constructor to allow the operator to override
the default load balancer implementation (`round_robin`).
- Modify the `srv` RPC resolver to accept a comma separated list of
targets to be resolved.
- Remove unused nonce-service `-prefix` flag.
Fixes#6404
Originally, WFEs had a built-in nonce service. Then we added a "remote
nonce service" via gRPC, but we kept a fallback path for when the remote
nonce service was not configured, to use a built-in nonce service. This
PR removes that fallback path.
Since the fallback path was relied on by the unittests, this also
refactors the unittests to use a gRPC-style nonce service (but in-memory
for the unittests).
Fixes#6530
When a client cancels their HTTP request, that propagates into
gRPC-level cancellation. But we don't want those canceled RPCs to show
up as 500s, we want them to show up as 408s. We do that by producing a
special ProblemDetails at the gRPC level. However, for that trick to
work, we need to make sure errors from RPC methods get passed through
web.ProblemDetailsForError. There were some places where we didn't do
this and instead created a from-scratch ProblemDetails. This resulted in
spurious 500s.
My methodology was to look at every method call on each of the WFE's
fields that represents a gRPC backend: `ra`, `sa`, `accountGetter`,
`nonceService`, `remoteNonceServe`. If the error handling for that call
did not use web.ProblemDetailsForError, I changed it to use that.
Fixes#6524
In draft because still needs tests.
Enable the "unparam" linter, which checks for unused function
parameters, unused function return values, and parameters and
return values that always have the same value every time they
are used.
In addition, fix many instances where the unparam linter complains
about our existing codebase. Remove error return values from a
number of functions that never return an error, remove or use
context and test parameters that were previously unused, and
simplify a number of (mostly test-only) functions that always take the
same value for their parameter. Most notably, remove the ability to
customize the RSA Public Exponent from the ceremony tooling,
since it should always be 65537 anyway.
Fixes#6104
The iotuil package has been deprecated since go1.16; the various
functions it provided now exist in the os and io packages. Replace all
instances of ioutil with either io or os, as appropriate.
This special casing was originally added to catch a bug where
the underlying library said that the EC *private* key was the
wrong length. We were also interested in exposing this specific
error to clients because EC key-length strictness was added to
go-jose and we wanted to ensure that upgrading to the new
version wouldn't break existing clients.
We are now several years past that upgrade, and hit this error
case very infrequently. There is no need to continue special-
casing it.
We have decided that we don't like the if err := call(); err != nil
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: It caused readers to believe that
a later err = call() statement was assigning to an already-declared err
in the local scope, when in fact it was assigning to an
already-declared err in the parent scope of a closure. This caused our
ineffassign and staticcheck linters to be unable to analyze the
lifetime of the err variable, and so they did not complain when we
never checked the actual value of that error.
This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
Followup from #5839.
I chose groupcache/lru as our LRU cache implementation because it's part
of the golang org, written by one of the Go authors, and very simple
and easy to read.
This adds an `AccountGetter` interface that is implemented by both the
AccountCache and the SA. If the WFE config includes an AccountCache field,
it will wrap the SA in an AccountCache with the configured max size and
expiration time.
We set an expiration time on account cache entries because we want a
bounded amount of time that they may be stale by. This will be used in
conjunction with a delay on account-updating pathways to ensure we don't
allow authentication with a deactivated account or changed key.
The account cache stores corepb.Registration objects because protobufs
have an established way to do a deep copy. Deep copies are important so
the cache can maintain its own internal state and ensure nothing external
is modifying it.
As part of this process I changed construction of the WFE. Previously,
"SA" and "RA" were public fields that were mutated after construction. Now
they are parameters to the constructor, along with the new "accountGetter"
parameter.
The cache includes stats for requests categorized by hits and misses.
Remove the `GoodKey` check from `validSelfAuthenticatedJWS`, so that
the verification performed by `RevokeCertificate`'s `revokeCertByJWK`
path doesn't bail out early if the key being used to sign the revocation
request has already been blocklisted (most likely because someone else
has observed the same compromise and already submitted a similar
revocation request).
In turn, add the `GoodKey` check to `validSelfAuthenticatedPOST`, the
only other function that relies on `validSelfAuthenticatedJWS` as a
helper. This ensures that other self-authenticated code paths (namely
`NewAccount`) continue to enforce our key policies.
Fixes#5662
Remove all error checking and type transformation from the gRPC wrappers
for the following methods on the SA:
- GetRegistration
- GetRegistrationByKey
- NewRegistration
- UpdateRegistration
- DeactivateRegistration
Update callers of these methods to construct the appropriate protobuf
request messages directly, and to consume the protobuf response messages
directly. In many cases, this requires changing the way that clients
handle the `Jwk` field (from expecting a `JSONWebKey` to expecting a
slice of bytes) and the `Contacts` field (from expecting a possibly-nil
pointer to relying on the value of the `ContactsPresent` boolean field).
Implement two new methods in `sa/model.go` to convert directly between
database models and protobuf messages, rather than round-tripping
through `core` objects in between. Delete the older methods that
converted between database models and `core` objects, as they are no
longer necessary.
Update test mocks to have the correct signatures, and update tests to
not rely on `JSONWebKey` and instead use byte slices.
Fixes#5531
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).
Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.
Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
This change adds two new test assertion helpers, `AssertErrorIs`
and `AssertErrorWraps`. The former is a wrapper around `errors.Is`,
and asserts that the error's wrapping chain contains a specific (i.e.
singleton) error. The latter is a wrapper around `errors.As`, and
asserts that the error's wrapping chain contains any error which is
of the given type; it also has the same unwrapping side effect as
`errors.As`, which can be useful for further assertions about the
contents of the error.
It also makes two small changes to our `berrors` package, namely
making `berrors.ErrorType` itself an error rather than just an int,
and giving `berrors.BoulderError` an `Unwrap()` method which
exposes that inner `ErrorType`. This allows us to use the two new
helpers above to make assertions about berrors, rather than
having to hand-roll equality assertions about their types.
Finally, it takes advantage of the two changes above to greatly
simplify many of the assertions in our tests, removing conditional
checks and replacing them with simple assertions.
Normally we do this with a reverse proxy in front of Boulder, but it's
nice to have an additional layer of protection in case someone deploys
Boulder without a reverse proxy.
Fixes#4730.
In #3708, we added formatters for the the convenience methods in the
`probs` package.
However, in #4783, @alexzorin pointed out that we were incorrectly
passing an error message through fmt.Sprintf as the format parameter
rather than as a value parameter.
I proposed a fix in #4784, but during code review we concluded that the
underlying problem was the pattern of using format-style functions that
don't have some variant of printf in the name. That makes this wrong:
`probs.DNS(err.Error())`, and this right: `probs.DNS("%s", err)`. Since
that's an easy mistake to make and a hard one to spot during code review,
we're going to stop using this particular pattern and call `fmt.Sprintf`
directly.
This PR reverts #3708 and adds some `fmt.Sprintf` where needed.
Errors that were being returned in the checkAlgorithm methods of both wfe and
wfe2 didn't really match up to what was actually being checked. This change
attempts to bring the errors in line with what is actually being tested.
Fixes#4452.
This branch also updates the WFE2 parseJWS function to match the error string fixed in the upstream project for the case where a JWS EC public key fails to unmarshal due to an incorrect length.
Resolves#4300
To make log analysis easier we choose to elevate the pseudo ACME HTTP
method "POST-as-GET" to the `web.RequestEvent.Method` after processing
a valid POST-as-GET request, replacing the "POST" method value that will
have been set by the outermost handler.
Basically a complete re-write/re-design of the forwarding concept introduced in
#4297 (sorry for the rapid churn here). Instead of nonce-services blindly
forwarding nonces around to each other in an attempt to find out who issued the
nonce we add an identifying prefix to each nonce generated by a service. The
WFEs then use this prefix to decide which nonce-service to ask to validate the
nonce.
This requires a slightly more complicated configuration at the WFE/2 end, but
overall I think ends up being a way cleaner, more understandable, easy to
reason about implementation. When configuring the WFE you need to provide two
forms of gRPC config:
* one gRPC config for retrieving nonces, this should be a DNS name that
resolves to all available nonce-services (or at least the ones you want to
retrieve nonces from locally, in a two DC setup you might only configure the
nonce-services that are in the same DC as the WFE instance). This allows
getting a nonce from any of the configured services and is load-balanced
transparently at the gRPC layer.
* a map of nonce prefixes to gRPC configs, this maps each individual
nonce-service to it's prefix and allows the WFE instances to figure out which
nonce-service to ask to validate a nonce it has received (in a two DC setup
you'd want to configure this with all the nonce-services across both DCs so
that you can validate a nonce that was generated by a nonce-service in another
DC).
This balancing is implemented in the integration tests.
Given the current remote nonce code hasn't been deployed anywhere yet this
makes a number of hard breaking changes to both the existing nonce-service
code, and the forwarding code.
Fixes#4303.
Also excises the existing bad padding metrics code, adds a special error for when we encounter badly padded keys, and adds a test for the new special error.
Fixes#4070 and fixes#3964.
This allows POST-as-GET requests to Orders, Authorizations, Challenges, Certificates and Accounts. Legacy GET support remains for Orders, Authorizations, Challenges and Certificates. Legacy "POST {}" support for Accounts remains.
Resolves https://github.com/letsencrypt/boulder/issues/3871
Removes the checks for a handful of deployed feature flags in preparation for removing the flags entirely. Also moves all of the currently deprecated flags to a separate section of the flags list so they can be more easily removed once purged from production configs.
Fixes#3880.
ACME draft-13 changed the inner JWS' body to contain the old key
that signed the outer JWS. Because this is a backwards incompatible
change with the draft-12 ACME v2 key rollover we introduce a new feature
flag `features.ACME13KeyRollover` and conditionally use the old or new
key rollover semantics based on its value.
This commit updates the `wfe2/verify.go` implementation of
`acctIDFromURL` to include the invalid `kid` value being rejected as
part of the rejection error message. This will hopefully help
users/client developers understand the problem faster. Unit tests are
updated accordingly.
While we intended to allow legacy ACME v1 accounts created through the WFE to work with the ACME v2 implementation and the WFE2 we neglected to consider that a legacy account would have a Key ID URL that doesn't match the expected for a V2 account. This caused `wfe2/verify.go`'s `lookupJWK` to reject all POST requests authenticated by a legacy account unless the ACME client took the extra manual step of "fixing" the URL.
This PR adds a configuration parameter to the WFE2 for an allowed legacy key ID prefix. The WFE2 verification logic is updated to allow both the expected key ID prefix and the configured legacy key ID prefix. This will allow us to specify the correct legacy URL in configuration for both staging/prod to allow unmodified V1 ACME accounts to be used with ACME v2.
Resolves https://github.com/letsencrypt/boulder/issues/3674
Many of the probs.XYZ calls are of the form probs.XYZ(fmt.Sprintf(...)).
Convert these functions to take a format string and optional arguments,
following the same pattern used in the errors package. Convert the
various call sites to remove the now redundant fmt.Sprintf calls.
For a long time now the WFE has generated URLs based on the incoming
request rather than a hardcoded BaseURL. BaseURL is no longer set in the
prod configs.
This also allows factoring out relativeEndpoint into the web package.
This commit adds a new WFE2 feature flag "EnforceV2ContentType". When
enabled, the WFE2's validPostRequest function will enforce that the
request carries a Content-Type header equal to
application/jose+json. This is required by ACME draft-10 per section
6.2 "Request Authentication".
This is behind a feature flag because it is likely to break
some number of existing ACMEv2 clients that may not be sending the
correct Content-Type.
We are defaulting to not setting the new feature flag in test/config-next
because it currently break's Certbot's acme module's revocation support
and we rely on this in our V2 integration tests.
Resolves#3529