Replace the current three-piece setup (enum of feature variables, map of
feature vars to default values, and autogenerated bidirectional maps of
feature variables to and from strings) with a much simpler one-piece
setup: a single struct with one boolean-typed field per feature. This
preserves the overall structure of the package -- a single global
feature set protected by a mutex, and Set, Reset, and Enabled methods --
although the exact function signatures have all changed somewhat.
The executable config format remains the same, so no deployment changes
are necessary. This change does deprecate the AllowUnrecognizedFeatures
feature, as we cannot tell the json config parser to ignore unknown
field names, but that flag is set to False in all of our deployment
environments already.
Fixes https://github.com/letsencrypt/boulder/issues/6802
Fixes https://github.com/letsencrypt/boulder/issues/5229
- Move default and override limits, and associated methods, out of the
Limiter to new limitRegistry struct, embedded in a new public
TransactionBuilder.
- Export Transaction and add corresponding Transaction constructor
methods for each limit Name, making Limiter and TransactionBuilder the
API for interacting with the ratelimits package.
- Implement batched Spends and Refunds on the Limiter, the new methods
accept a slice of Transactions.
- Add new boolean fields check and spend to Transaction to support more
complicated cases that can arise in batches:
1. the InvalidAuthorizations limit is checked at New Order time in a
batch with many other limits, but should only be spent when an
Authorization is first considered invalid.
2. the CertificatesPerDomain limit is overridden by
CertficatesPerDomainPerAccount, when this is the case, spends of the
CertificatesPerDomain limit should be "best-effort" but NOT deny the
request if capacity is lacking.
- Modify the existing Spend/Refund methods to support
Transaction.check/spend and 0 cost Transactions.
- Make bucketId private and add a constructor for each bucket key format
supported by ratelimits.
- Move domainsForRateLimiting() from the ra.go to ratelimits. This
avoids a circular import issue in ra.go.
Part of #5545
This is a cleanup PR finishing the migration from int64 timestamps to
protobuf `*timestamppb.Timestamps` by removing all usage of the old
int64 fields. In the previous PR
https://github.com/letsencrypt/boulder/pull/7121 all fields were
switched to read from the protobuf timestamppb fields.
Adds a new case to `core.IsAnyNilOrZero` to check various properties of
a `*timestamppb.Timestamp` reducing the visual complexity for receivers.
Fixes https://github.com/letsencrypt/boulder/issues/7060
The `Limiter` API has been adjusted significantly to both improve both
safety and ergonomics and two `Limit` types have been corrected to match
the legacy implementations.
**Safety**
Previously, the key used for looking up limit overrides and for fetching
individual buckets from the key-value store was constructed within the
WFE. This posed a risk: if the key was malformed, the default limit
would still be enforced, but individual overrides would fail to function
properly. This has been addressed by the introduction of a new
`BucketId` type along with a `BucketId` constructor for each `Limit`
type. Each constructor is responsible for producing a well-formed bucket
key which undergoes the very same validation as any potentially matching
override key.
**Ergonomics**
Previously, each of the `Limiter` methods took a `Limit` name, a bucket
identifier, and a cost to be spent/ refunded. To simplify this, each
method now accepts a new `Transaction` type which provides a cost, and
wraps a `BucketId` identifying the specific bucket.
The two changes above, when taken together, make the implementation of
batched rate limit transactions considerably easier, as a batch method
can accept a slice of `Transaction`.
**Limit Corrections**
PR #6947 added all of the existing rate limits which could be made
compatible with the key-value approach. Two of these were improperly
implemented;
- `CertificatesPerDomain` and `CertificatesPerFQDNSet`, were implemented
as
- `CertificatesPerDomainPerAccount` and
`CertificatesPerFQDNSetPerAccount`.
Since we do not actually associate these limits with a particular ACME
account, the `regID` portion of each of their bucket keys has been
removed.
The RequireCommonName feature flag was our only "inverted" feature flag,
which defaulted to true and had to be explicitly set to false. This
inversion can lead to confusion, especially to readers who expect all Go
default values to be zero values. We plan to remove the ability for our
feature flag system to support default-true flags, which the existence
of this flag blocked. Since this flag has not been set in any real
configs, inverting it is easy.
Part of https://github.com/letsencrypt/boulder/issues/6802
* Adds new `google.protobuf.Timestamp` fields to each .proto file where
we had been using `int64` fields as a timestamp.
* Updates relevant gRPC messages to populate the new
`google.protobuf.Timestamp` fields in addition to the old `int64`
timestamp fields.
* Added tests for each `<x>ToPB` and `PBto<x>` functions to ensure that
new fields passed into a gRPC message arrive as intended.
* Removed an unused error return from `PBToCert` and `PBToCertStatus`
and cleaned up each call site.
Built on-top of https://github.com/letsencrypt/boulder/pull/7069
Part 2 of 4 related to
https://github.com/letsencrypt/boulder/issues/7060
Integrate the key-value rate limits from #6947 into the WFE. Rate limits
are backed by the Redis source added in #7016, and use the SRV record
shard discovery added in #7042.
Part of #5545
Fix an issue related to the custom gRPC Picker implementation introduced
in #6618. When a nonce contained a prefix not associated with a known
backend, the Picker would continuously rebuild, re-resolve DNS, and
eventually throw a 500 "Server Error" at RPC timeout. The Picker now
promptly returns a 400 "Bad Nonce" error as expected, in response the
requesting client should retry their request with a fresh nonce.
Additionally:
- WFE unit tests use derived nonces when `"BOULDER_CONFIG_DIR" ==
"test/config-next"`.
- `Balancer.Build()` in "noncebalancer" forces a rebuild until non-zero
backends are available. This matches the
[balancer/roundrobin](d524b40946/balancer/roundrobin/roundrobin.go (L49-L53))
implementation.
- Nonces with no matching backend increment "jose_errors" with label
`"type": "JWSInvalidNonce"` and "nonce_no_backend_found".
- Nonces of incorrect length are now rejected at the WFE and increment
"jose_errors" with label `"type": "JWSMalformedNonce"` instead of
`"type": "JWSInvalidNonce"`.
- Nonces not encoded as base64url are now rejected at the WFE and
increment "jose_errors" with label `"type": "JWSMalformedNonce"` instead
of `"type": "JWSInvalidNonce"`.
Fixes#6969
Part of #6974
This was introduced early in Boulder development when we had the concept
of a "short serial" (monotonically increasing) which would be prepended
to random bytes to form the full serial. We wanted to specially report
the case that there were duplicates of a given short serial since it
meant a problem with our monotonicity.
We've long since abandoned that idea, and also this code can't be
exercised because sa.SelectCertificate does a LIMIT 1 anyhow.
This can take two values (typically the return values of a two-value
function) and panic if the error is non-nil, returning the interesting
value. This is particularly useful for cases where we statically know
the call will succeed.
Thanks to @mcpherrinm for the idea!
The contacts field of an account can be very verbose, and is irrelevant
to the vast majority -- e.g. creating orders, validating challenges, and
downloading certificates -- of requests made by an account. To reduce
the length of our WFE log lines, remove the Contacts field from all
logs. When we actually need it, we can get it from the database.
Also remove the RequestEvent.TLS field, which is unused.
In WFE, we do a goodkey check when validating a self-authenticated POST
(i.e. when creating an account). For a while, that was a purely local
check, looking at a list of bad keys or bad moduluses, or checking for
factorability. At some point we also added a backend check, querying the
SA to see if a key was blocked. However, we did not update this one code
path to distinguish "bad key" from "timeout querying SA." That meant
that sometimes we would give a badPublicKey Problem Document when we
should have given an internalServerError.
Related:
https://github.com/letsencrypt/boulder/issues/6795#issuecomment-1574217398
Define a `bJSONWebSignature` struct which embeds a
`*jose.JSONWebSignature`. The only method that can produce a
`bJSONWebSignature` is `wfe.parseJWS` so that we can ensure
safety/sanity checks are performed on the incoming data. Restricts
several methods and functions to take a `jose.Header` as an input
parameter, rather than a full JWS.
Fixes https://github.com/letsencrypt/boulder/issues/5676.
Remove the remaining divergences from RFC8555 regarding what error types
we use in certain situations. Specifically:
- use "invalidContact" instead of "invalidEmail";
- use "unsupportedContact" for contact addresses that use a protocol
other than "mailto:"; and
- use "unsupportedIdentifier" for identifiers that specify a type other
than "dns".
This PR adds a new configuration block specifically for the otelhttp
instrumentation. This block is separate from the existing
"opentelemetry" configuration, and is only relevant when using otelhttp
instrumentation. It does not share any codepath with the existing
configuration, so it is at the top level to indicate which services it
applies to.
There's a bit of plumbing new configuration through. I've adopted the
measured_http package to also set up opentelemetry instead of just
metrics, which should hopefully allow any future changes to be smaller
(just config & there) and more consistent between the wfe2 and ocsp
responder.
There's one option here now, which disables setting
[otelhttp.WithPublicEndpoint](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#WithPublicEndpoint).
This option is designed to do exactly what we want: Don't accept
incoming spans as parents of the new span created in the server.
Previously we had a setting to disable parent-based sampling to help
with this problem, which doesn't really make sense anymore, so let's
just remove it and simplify that setup path. The default of "false" is
designed to be the safe option. It's set to True in the test/ configs
for integration tests that use traces, and I expect we'll likely set it
true in production eventually once the LBs are configured to handle
tracing themselves.
Fixes#6851
Make minor, non-user-visible changes to how we structure the probs
package. Notably:
- Add new problem types for UnsupportedContact and
UnsupportedIdentifier, which are specified by RFC8555 and which we will
use in the future, but haven't been using historically.
- Sort the problem types and constructor functions to match the
(alphabetical) order given in RFC8555.
- Rename some of the constructor functions to better match their
underlying problem types (e.g. "TLSError" to just "TLS").
- Replace the redundant ProblemDetailsToStatusCode function with simply
always returning a 500 if we haven't properly set the problem's
HTTPStatus.
- Remove the ability to use either the V1 or V2 error namespace prefix;
always use the proper RFC namespace prefix.
Update the document number to the latest version, and remove the /get/
prefix since it now supports both the GET and POST portions of the spec.
Also update one piece of tooling to properly get the ARI URL from the
directory, rather than hard-coding it.
Add a new shared config stanza which all boulder components can use to
configure their Open Telemetry tracing. This allows components to
specify where their traces should be sent, what their sampling ratio
should be, and whether or not they should respect their parent's
sampling decisions (so that web front-ends can ignore sampling info
coming from outside our infrastructure). It's likely we'll need to
evolve this configuration over time, but this is a good starting point.
Add basic Open Telemetry setup to our existing cmd.StatsAndLogging
helper, so that it gets initialized at the same time as our other
observability helpers. This sets certain default fields on all
traces/spans generated by the service. Currently these include the
service name, the service version, and information about the telemetry
SDK itself. In the future we'll likely augment this with information
about the host and process.
Finally, add instrumentation for the HTTP servers and grpc
clients/servers. This gives us a starting point of being able to monitor
Boulder, but is fairly minimal as this PR is already somewhat unwieldy:
It's really only enough to understand that everything is wired up
properly in the configuration. In subsequent work we'll enhance those
spans with more data, and add more spans for things not automatically
traced here.
Fixes https://github.com/letsencrypt/boulder/issues/6361
---------
Co-authored-by: Aaron Gable <aaron@aarongable.com>
When sending an ARI response, write the Retry-After header before
writing the JSON response body. This is necessary because
http.ResponseWriter implicitly calls WriteHeader whenever Write is
called, flushing all headers to the network and preventing any
additional headers from being written. Unfortunately, the unittests use
httptest.ResponseRecorder, which doesn't seem to enforce this invariant
(it's happy to report headers which were written after the body). Add a
header check to the integration tests, to make up for this deficiency.
Change the SetCommonName flag, introduced in #6706, to
RequireCommonName. Rather than having the flag control both whether or
not a name is hoisted from the SANs into the CN *and* whether or not the
CA is willing to issue certs with no CN, this updated flag now only
controls the latter. By default, the new flag is true, and continues our
current behavior of failing issuance if we cannot set a CN in the cert.
When the flag is set to false, then we are willing to issue certificates
for which the CSR contains no CN and there is no SAN short enough to be
hoisted into the CN field.
When we have rolled out this change, we can move on to the next flag in
this series: HoistCommonName, which will control whether or not a SAN is
hoisted at all, effectively giving the CSRs (and therefore the clients)
full control over whether their certificate contains a SAN.
This change is safe because no environment explicitly sets the
SetCommonName flag to false yet.
Fixes#5112
Remove tracing using Beeline from Boulder. The only remnant left behind
is the deprecated configuration, to ensure deployability.
We had previously planned to swap in OpenTelemetry in a single PR, but
that adds significant churn in a single change, so we're doing this as
multiple steps that will each be significantly easier to reason about
and review.
Part of #6361
For simple 404s, there's no need to log an InternalError in addition to
the user-facing error, so pass `nil` to sendError as the internalError
parameter. This cleans up Certificate, GetOrder, and FinalizeOrder; all
the places I could find that checked for `NotFound` and also logged an
unnecessary InternalError.
I also removed a redundant an unnecessary error wrapping, and a
reference to "short serial" which is not a concept we have anymore.
This was mostly unused. The only caller was orphan-finder, which used it
to determine if a certificate was already in the database. But this is
not particularly important functionality, so I've removed it.
[RFC 8555 section
7.4](https://www.rfc-editor.org/rfc/rfc8555.html#section-7.4) states
regarding Orders in the "processing" state:
> "processing": The certificate is being issued. Send a POST-as-GET
> request after the time given in the Retry-After header field of
> the response, if any.
Add a Retry-After header when serving Order objects that are in the
"processing" state. This may help control clients which implement Order
polling but without any built-in backoff. The retry interval is
hard-coded to be 3s, slightly above our current 99th percentile Finalize
latency.
Remove the `MandatoryPOSTasGET` flag from the WFE2.
Update the ACMEv2 divergence doc to note that neither staging nor
production use MandatoryPOSTasGET.
Fixes#6582.
Also remove CSRDNSNames, CSRIPAddresses and CSREmailAddresses.
And add a new log field "DNSNames", for use in new-order, finalize, and
revoke requests.
Add a "RevocationReason" field in the "Extra" section for revoke
requests.
Give ARI improved error messages when no request path is specified and
when parsing of the request path blob fails.
Also, add a tool which can be used to quickly generate ARI requests and
print their results, to make manual spot-checking easier.
Fixes#6629
Assign nonce prefixes for each nonce-service by taking the first eight
characters of the the base64url encoded HMAC-SHA256 hash of the RPC
listening address using a provided key. The provided key must be same
across all boulder-wfe and nonce-service instances.
- Add a custom `grpc-go` load balancer implementation (`nonce`) which
can route nonce redemption RPC messages by matching the prefix to the
derived prefix of the nonce-service instance which created it.
- Modify the RPC client constructor to allow the operator to override
the default load balancer implementation (`round_robin`).
- Modify the `srv` RPC resolver to accept a comma separated list of
targets to be resolved.
- Remove unused nonce-service `-prefix` flag.
Fixes#6404
Originally, WFEs had a built-in nonce service. Then we added a "remote
nonce service" via gRPC, but we kept a fallback path for when the remote
nonce service was not configured, to use a built-in nonce service. This
PR removes that fallback path.
Since the fallback path was relied on by the unittests, this also
refactors the unittests to use a gRPC-style nonce service (but in-memory
for the unittests).
Fixes#6530
Update our implementation of ARI to return a renewal window entirely in
the past (i.e., suggesting immediate renewal) if the certificate in
question has been revoked for any reason. This will allow clients which
implement ARI to discover that they need to replace their certificate
without having to query OCSP directly, especially as we move into a
future where OCSP is mostly supplanted by aggregated CRLs.
Fixes#6503
In the WFE, ocsp-responder, and crl-updater, switch from using
StorageAuthorityClients to StorageAuthorityReadOnlyClients. This ensures
that these services cannot call methods which write to our database.
Fixes#6454
When a client cancels their HTTP request, that propagates into
gRPC-level cancellation. But we don't want those canceled RPCs to show
up as 500s, we want them to show up as 408s. We do that by producing a
special ProblemDetails at the gRPC level. However, for that trick to
work, we need to make sure errors from RPC methods get passed through
web.ProblemDetailsForError. There were some places where we didn't do
this and instead created a from-scratch ProblemDetails. This resulted in
spurious 500s.
My methodology was to look at every method call on each of the WFE's
fields that represents a gRPC backend: `ra`, `sa`, `accountGetter`,
`nonceService`, `remoteNonceServe`. If the error handling for that call
did not use web.ProblemDetailsForError, I changed it to use that.
Fixes#6524
In draft because still needs tests.
Clean up several spots where we were behaving differently on
go1.18 and go1.19, now that we're using go1.19 everywhere. Also
re-enable the lint and generate tests, and fix the various places where
the two versions disagreed on how comments should be formatted.
Also clean up the OldTLS codepaths, now that both go1.19 and our
own feature flags have forbidden TLS < 1.2 everywhere.
Fixes#6011
- Add a new field, `RetryAfter` to `BoulderError`s
- Add logic to wrap/unwrap the value of the `RetryAfter` field to our gRPC error
interceptor
- Plumb `RetryAfter` for `DuplicateCertificateError` emitted by RA to the WFE
client response header
Part of #6256
Suggest that subscribers with certificates impacted by an ongoing revocation
incident renew immediately.
- Make SA method `IncidentsForSerial` a callable RPC
Resolves#6282
Update our ACME Renewal Info implementation to parse
the CertID-based request format specified in the current
version of the draft specification.
Part of #6033
Enable the "unparam" linter, which checks for unused function
parameters, unused function return values, and parameters and
return values that always have the same value every time they
are used.
In addition, fix many instances where the unparam linter complains
about our existing codebase. Remove error return values from a
number of functions that never return an error, remove or use
context and test parameters that were previously unused, and
simplify a number of (mostly test-only) functions that always take the
same value for their parameter. Most notably, remove the ability to
customize the RSA Public Exponent from the ceremony tooling,
since it should always be 65537 anyway.
Fixes#6104
The iotuil package has been deprecated since go1.16; the various
functions it provided now exist in the os and io packages. Replace all
instances of ioutil with either io or os, as appropriate.
The `affiliationChanged` revocation reason is only relevant
to certificates which contain Subject Identity Information.
As we only issue DV certificates, which cannot contain such
information, our certificates should not be able to be revoked
for this reason.
See https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/m3-XPcVcJ9M
Add a new `GenerateOCSP` gRPC method to the RA, which
passes the request through to the CA and returns the resulting
OCSP response bytes. This method does not store the new
response anywhere, it is up to the client (in this case intended to
be the new ocsp-responder's live-signing code path) to store it
in any/all appropriate locations.
Fixes#6189
This special casing was originally added to catch a bug where
the underlying library said that the EC *private* key was the
wrong length. We were also interested in exposing this specific
error to clients because EC key-length strictness was added to
go-jose and we wanted to ensure that upgrading to the new
version wouldn't break existing clients.
We are now several years past that upgrade, and hit this error
case very infrequently. There is no need to continue special-
casing it.
Stop logging the full CSR bytes at the WFE; we log these bytes in the
CA immediately before attempting to sign a precertificate anyway.
Also unify the way we log precertificate and final certificate signing
events in the CA: an Info containing the request prior to signing, and
then either an Info or an Error containing the result afterwards.
Fixes#6088
Add functionality to the ubiquitous RequestEvent (aka logEvent) to
allow handlers to suppress the final log line that is printed when
a non-500 response is being sent.
Use this functionality to suppress logging GET requests for the /directory
endpoint. We can expand this in the future to quiet other logs that
are not helpful for metrics or analysis.
Fixes#6094
These new linters are almost all part of golangci-lint's collection
of default linters, that would all be running if we weren't setting
`disable-all: true`. By adding them, we now have parity with the
default configuration, as well as the additional linters we like.
Adds the following linters:
* unconvert
* deadcode
* structcheck
* typecheck
* varcheck
* wastedassign
This adds three features flags: SHA1CSRs, OldTLSOutbound, and
OldTLSInbound. Each controls the behavior of an upcoming deprecation
(except OldTLSInbound, which isn't yet scheduled for a deprecation
but will be soon). Note that these feature flags take advantage of
`features`' default values, so they can default to "true" (that is, each
of these features is enabled by default), and we set them to "false"
in the config JSON to turn them off when the time comes.
The unittest for OldTLSOutbound requires that `example.com` resolves
to 127.0.0.1. This is because there's logic in the VA that checks
that redirected-to hosts end in an IANA TLD. The unittest relies on
redirecting, and we can't use e.g. `localhost` in it because of that
TLD check, so we use example.com.
Fixes#6036 and #6037
Simplify the WFE `RevokeCertificate` API method in three ways:
- Remove most of the logic checking if the requester is authorized to
revoke the certificate in question (based on who is making the
request, what authorizations they have, and what reason they're
requesting). That checking is now done by the RA. Instead, simply
verify that the JWS is authenticated.
- Remove the hard-to-read `authorizedToRevoke` callbacks, and make the
`revokeCertBySubscriberKey` (nee `revokeCertByKeyID`) and
`revokeCertByCertKey` (nee `revokeCertByJWK`) helpers much more
straight-line in their execution logic.
- Call the RA's new `RevokeCertByApplicant` and `RevokeCertByKey` gRPC
methods, rather than the deprecated `RevokeCertificateWithReg`.
This change, without any flag flips, should be invisible to the
end-user. It will slightly change some of our log message formats.
However, by now relying on the new RA gRPC revocation methods, this
change allows us to change our revocation policies by enabling the
`AllowDoubleRevocation` and `MozRevocationReasons` feature flags, which
affect the behavior of those new helpers.
Fixes#5936
Add two new gRPC methods to the SA:
- `RevokeCertByKey` will be used when the API request was signed by the
certificate's keypair, rather than a Subscriber keypair. If the
request is for reason `keyCompromise`, it will ensure that the key is
added to the blocked keys table, and will attempt to "re-revoke" a
certificate that was already revoked for some other reason.
- `RevokeCertByApplicant` supports both the path where the original
subscriber or another account which has proven control over all of the
identifier in the certificate requests revocation via the API. It does
not allow the requested reason to be `keyCompromise`, as these
requests do not represent a demonstration of key compromise.
In addition, add a new feature flag `MozRevocationReasons` which
controls the behavior of these new methods. If the flag is not set, they
behave like they have historically (see above). If the flag is set to true,
then the new methods enforce the upcoming Mozilla policies around
revocation reasons, namely:
- Only the original Subscriber can choose the revocation reason; other
clients will get a set reason code based on the method of requesting
revocation. When the original Subscriber requests reason
`keyCompromise`, this request will be honored, but the key will not be
blocked and other certificates with that key will not also be revoked.
- Revocations signed with the certificate key will always get reason
`keyCompromise`, because we do not know who is sending the request and
therefore must assume that the use of the key in this way represents
compromise. Because these requests will always be fore reason
`keyCompromise`, they will always be added to the blocked keys table
and they will always attempt "re-revocation".
- Revocations authorized via control of all names in the cert will
always get reason `cessationOfOperation`, which is to be used when the
original Subscriber does not control all names in the certificate
anymore.
Finally, update the existing `AdministrativelyRevokeCertificate` method
to use the new helper functions shared by the two new methods.
Part of #5936
We have decided that we don't like the if err := call(); err != nil
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: It caused readers to believe that
a later err = call() statement was assigning to an already-declared err
in the local scope, when in fact it was assigning to an
already-declared err in the parent scope of a closure. This caused our
ineffassign and staticcheck linters to be unable to analyze the
lifetime of the err variable, and so they did not complain when we
never checked the actual value of that error.
This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
Add `stylecheck` to our list of lints, since it got separated out from
`staticcheck`. Fix the way we configure both to be clearer and not
rely on regexes.
Additionally fix a number of easy-to-change `staticcheck` and
`stylecheck` violations, allowing us to reduce our number of ignored
checks.
Part of #5681
These gRPC methods were only used by the ACMEv1 code paths.
Now that boulder-wfe has been fully removed, we can be confident
that no clients ever call these methods, and can remove them from
the gRPC service interface.
Part of #5816
Followup from #5839.
I chose groupcache/lru as our LRU cache implementation because it's part
of the golang org, written by one of the Go authors, and very simple
and easy to read.
This adds an `AccountGetter` interface that is implemented by both the
AccountCache and the SA. If the WFE config includes an AccountCache field,
it will wrap the SA in an AccountCache with the configured max size and
expiration time.
We set an expiration time on account cache entries because we want a
bounded amount of time that they may be stale by. This will be used in
conjunction with a delay on account-updating pathways to ensure we don't
allow authentication with a deactivated account or changed key.
The account cache stores corepb.Registration objects because protobufs
have an established way to do a deep copy. Deep copies are important so
the cache can maintain its own internal state and ensure nothing external
is modifying it.
As part of this process I changed construction of the WFE. Previously,
"SA" and "RA" were public fields that were mutated after construction. Now
they are parameters to the constructor, along with the new "accountGetter"
parameter.
The cache includes stats for requests categorized by hits and misses.
These tests are testing functionality that is no longer in use in
production deployments of Boulder. As we go about removing wfe1
functionality, these tests will break, so let's just remove them
wholesale right now. I have verified that all of the tests removed in
this PR are duplicated against wfe2.
One of the changes in this PR is to cease starting up the wfe1 process
in the integration tests at all. However, that component was serving
requests for the AIA Issuer URL, which gets queried by various OCSP and
revocation tests. In order to keep those tests working, this change also
adds an integration-test-only handler to wfe2, and updates the CA
configuration to point at the new handler.
Part of #5681
Re-parsing the certificate after we're sure we issued it accomplishes
nothing except wasting CPU cycles. This duplicate work was left over
after the removal of the old codepath which was incapable of revoking
precertificates.
The draft requires that the renewalInfo endpoint have a
Retry-After header indicating how often clients should poll
for their renewal information. As per our previous thinking,
set this timer to 6 hours for now.
Fixes#5765
Add a unit test and an integration test that both exercise the new
experimental ACME Renewal Info endpoint. These tests do not
yet validate the contents of the response, just that the appropriate
HTTP response code is returned, but they will be developed as the
code under test evolves.
Fixes#5674
Add a new feature flag to control whether or not the experimental ARI
information is exposed. Add a new entry to the Directory object which
provides the base URL for ARI requests. Add a new handler to the WFE
which parses incoming requests and returns reasonable renewalInfo.
Part of #5674
Update the version of golangci-lint we use in our docker image,
and update the version of the docker image we use in our tests.
Fix a couple places where we were violating lints (ineffective assign
and calling `t.Fatal` from outside the main test goroutine), and add
one lint (using math/rand) to the ignore list.
Fixes#5710
Remove the last of the gRPC wrapper files. In order to do so:
- Remove the `core.StorageGetter` interface. Replace it with a new
interface (whose methods include the `...grpc.CallOption` arg)
inside the `sa/proto/` package.
- Remove the `core.StorageAdder` interface. There's no real use-case
for having a write-only interface.
- Remove the `core.StorageAuthority` interface, as it is now redundant
with the autogenerated `sapb.StorageAuthorityClient` interface.
- Replace the `certificateStorage` interface (which appears in two
different places) with a single unified interface also in `sa/proto/`.
- Update all test mocks to include the `_ ...grpc.CallOption` arg in
their method signatures so they match the gRPC client interface.
- Delete many methods from mocks which are no longer necessary (mostly
because they're mocking old authz1 methods that no longer exist).
- Move the two `test/inmem/` wrappers into their own sub-packages to
avoid an import cycle.
- Simplify the `satest` package to satisfy one of its TODOs and to
avoid an import cycle.
- Add many methods to the `test/inmem/sa/` wrapper, to accommodate all
of the methods which are called in unittests.
Fixes#5600
Remove the `GoodKey` check from `validSelfAuthenticatedJWS`, so that
the verification performed by `RevokeCertificate`'s `revokeCertByJWK`
path doesn't bail out early if the key being used to sign the revocation
request has already been blocklisted (most likely because someone else
has observed the same compromise and already submitted a similar
revocation request).
In turn, add the `GoodKey` check to `validSelfAuthenticatedPOST`, the
only other function that relies on `validSelfAuthenticatedJWS` as a
helper. This ensures that other self-authenticated code paths (namely
`NewAccount`) continue to enforce our key policies.
Fixes#5662
- Make `GetAuthorization2` a pass-through
- Make `GetAuthorizations2` a pass-through
- Make `GetPendingAuthorization2` a pass-through
- Make `GetValidOrderAuthorizations2` a pass-through
- Make `GetValidAuthorizations2` a pass-through
- Make `NewAuthorizations2` a pass-through
- Make `FinalizeAuthorization2` a pass-through
- Make `DeactivateAuthorization2` a pass-through
Fixes#5534
Make the gRPC wrappers for sa.GetCertificate and
sa.GetPrecertificate bare passthroughs. The latter of
these already took and returned appropriate protobufs,
so this change mostly just makes the former look like the
latter.
Part of #5532
- Move `DeactivateAuthorization` logic from `grpc` to `ra` and `wfe`
- Update `ra` mocks in `wfe` tests
- Remove unnecessary marshalling between `core.Authorization` and
`corepb.Authorization` in `ra` tests.
Fixes#5562
Remove all error checking and type transformation from the gRPC wrappers
for the following methods on the SA:
- GetRegistration
- GetRegistrationByKey
- NewRegistration
- UpdateRegistration
- DeactivateRegistration
Update callers of these methods to construct the appropriate protobuf
request messages directly, and to consume the protobuf response messages
directly. In many cases, this requires changing the way that clients
handle the `Jwk` field (from expecting a `JSONWebKey` to expecting a
slice of bytes) and the `Contacts` field (from expecting a possibly-nil
pointer to relying on the value of the `ContactsPresent` boolean field).
Implement two new methods in `sa/model.go` to convert directly between
database models and protobuf messages, rather than round-tripping
through `core` objects in between. Delete the older methods that
converted between database models and `core` objects, as they are no
longer necessary.
Update test mocks to have the correct signatures, and update tests to
not rely on `JSONWebKey` and instead use byte slices.
Fixes#5531
- Move `AdministrativelyRevokeCertificate` logic from `grpc` to `ra`
- Test new error conditions in `ra/ra_test.go`
- Update `ra` mocks in `wfe` tests
Fixes#5529
- Move response validation from `RA` client wrapper to `WFE` and `WFE2`
- Move request validation from `RA` server wrapper to `RA`
- Refactor `RA` tests to construct valid `core.Authorization` objects
- Consolidate multiple error declarations to global `errIncompleteGRPCRequest`
Fixes#5439
Move the logEvent.Endpoint and .Slug assignment as well as tracing to the
top of the HandlerFunc so a return cannot happen before the assignment.
Fixes cases where the endpoint is blank in logs in certain error cases.
Fixes: #5432
Replace `core.Empty` with `google.protobuf.Empty` in all of our gRPC
methods which consume or return an empty protobuf. The golang core
proto libraries provide an empty message type, so there is no need
for us to reinvent the wheel.
This change is backwards-compatible and does not require a special
deploy. The protobuf message descriptions of `core.Empty` and
`google.protobuf.Empty` are identical, so their wire-formats are
indistinguishable and therefore interoperable / cross-compatible.
Fixes#5443
Update the signature of the RA's RevokeCertificateWithReg
method to exactly match that of the gRPC method it implements.
Remove all logic from the `RevokeCertificateWithReg` client
and server wrappers. Move the small amount of checking they
were performing directly into the server implementation.
Fixes#5440
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).
Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.
Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
Update all of our tests to use `AssertMetricWithLabelsEquals`
instead of combinations of the older `CountFoo` helpers with
simple asserts. This coalesces all of our prometheus inspection
logic into a single function, allowing the deletion of four separate
helper functions.
Check at construction time that at least one issuerCertificate and
at least one certificateChain have been provided. All of our configs
populate these fields, and not populating them just results in other
errors later on, so catch missing configuration early.
When we receive a request for a certificate for which the WFE no longer
has the issuer configured in its certificate chains, and the requested
certificate is expired, return just the bare cert rather than returning
a 500 error.
To enable this, refactor the chain-construction logic to occur inside
a closure, so that both error-path and non-error-path early returns
are possible. This also simplifies the chain construction logic to be
more straight-line and readable, despite taking place inside a
closure.
Fixes#5345
In wfe2, set the `Boulder-Requester:` header from the NewAccount API
endpoint when returning a pre-existing account.
This header is already set when this endpoint creates a new account,
but was omitted when returning an existing account.
Add a check to `wfe2.Certificate` to ensure that the chain we select to
serve with the end-entity cert actually validates the end-entity's
signature. Add new test certificates, generated to match our actual
hierarchy. Update wfe2 tests to use the new test certificates, as well
as new mocks, in order to properly test the new check.
The new test certs and overhauled tests are necessary because the prior
wfe2 tests built and served chains that were not valid, and in
fact could not be valid because they were built with self-signed certs.
Fixes#5225
This change simplifies and hardens the wfe2's support for having
multiple issuers, and multiple chains for each issuer, configured
and loaded in memory.
The only config-visible change is replacing the old two separate config
values (`certificateChains` and `alternateCertificateChains`) with a
single value (`chains`). This new value does not require the user to
know and hand-code the AIA URLs at which the certificates are available;
instead the chains are simply presented as lists of files. If this new
config value is present, the old config values will be ignored; if it
is not, the old config values will be respected.
Behind the scenes, the chain loading code has been completely changed.
Instead of loading PEM bytes directly from the file, and then asserting
various things (line endings, no trailing bits, etc) about those bytes,
we now parse a certificate from the file, and in-memory recreate the
PEM from that certificate. This approach allows the file loading to be
much more forgiving, while also being stricter: we now check that each
certificate in the chain is correctly signed by the next cert, and that
the last cert in the chain is a self-signed root.
Within the WFE itself, most of the internal structure has been retained.
However, both the internal `issuerCertificates` (used for checking
that certs we are asked to revoke were in fact issued by us) and the
`certificateChains` (used to append chains to end-entity certs when
served to clients) have been updated to be maps keyed by IssuerNameID.
This allows revocation checking to not have to iterate through the
whole list of issuers, and also makes it easy to double-check that
the signatures on end-entity certs are valid before serving them. Actual
checking of the validity will come in a follow-up change, due to the
invasive nature of the necessary test changes.
Fixes#5164
The PrecertificateRevocation flag is turned on everywhere, so the
else case is unused code. This change updates the WFE to always
use the PrecertificateRevocation code path, and deprecates the old
feature flag.
The TestRevokeCertificateWithAuthz method was deleted because
it is redundant with TestRevokeCertificateWithAuthorizations.
Fixes#5240
We intend to delete the v1 API (i.e. `wfe` and its associated codepaths)
in the near future, and as such are not giving it new features or
capabilities. However, before then we intend to allow the v2 API to
provide issuance both from our RSA and from our ECDSA intermediates.
The v1 API cannot gain such capability at the same time.
The CA doesn't know which frontend originated any given issuance
request, so we can't simply gate the single- or double-issuer behavior
based on that. Instead, this change introduces the ability for the
WFE (and the RA, which sits between the WFE and the CA) to request
issuance from a specific intermediate. If the specified intermediate is
not available in the CA, issuance will fail. If no intermediate is
specified (as is the case in requests coming from wfe2), it falls back
to selecting the issuer based on the algorithm of the public key to
be signed.
Fixes#5216
This change deletes the (used, but not useful)
`mockSANoSuchRegistration` from the wfe2 tests, and updates all
places where the wfe tests create a mock SA to take the existing
mock SA as input instead of building another one from scratch from
the fakeclock.
The /issuer-cert endpoint was a holdover from the v1 API, where
it is a critical part of the issuance flow. In the v2 issuance
flow, the issuer certificate is provided directly in the response
for the certificate itself. Thus, this endpoint is redundant.
Stats show that it receives approximately zero traffic (less than
one request per week, all of which are now coming from wget or
browser useragents). It also complicates the refactoring necessary
for the v2 API to support multiple issuers.
As such, it is a safe and easy decision to remove it.
Fixes#5196
Having both of these very similar methods sitting around
only serves to increase confusion. This removes the last
few places which use `cmd.LoadCert` and replaces them
with `core.LoadCert`, and deletes the method itself.
Fixes#5163
This change adds two new test assertion helpers, `AssertErrorIs`
and `AssertErrorWraps`. The former is a wrapper around `errors.Is`,
and asserts that the error's wrapping chain contains a specific (i.e.
singleton) error. The latter is a wrapper around `errors.As`, and
asserts that the error's wrapping chain contains any error which is
of the given type; it also has the same unwrapping side effect as
`errors.As`, which can be useful for further assertions about the
contents of the error.
It also makes two small changes to our `berrors` package, namely
making `berrors.ErrorType` itself an error rather than just an int,
and giving `berrors.BoulderError` an `Unwrap()` method which
exposes that inner `ErrorType`. This allows us to use the two new
helpers above to make assertions about berrors, rather than
having to hand-roll equality assertions about their types.
Finally, it takes advantage of the two changes above to greatly
simplify many of the assertions in our tests, removing conditional
checks and replacing them with simple assertions.
If two revocation requests for the same cert arrive in rapid
succession, it is possible for one of them to complete in the
time between the other one's initial check (that the cert isn't
revoked already) and final database update. This causes
the db update to fail, and the request to end in a 500.
Other methods, such as new account registration, have a
three-step "check for duplicates, update db, if that failed
check for duplicates again" flow. This change updates the
cert revocation handlers to have a similar flow.
It returns the RFC 8555 `alreadyRevoked` from wfe2, to
match the error code returned by the initial duplicate
check. It returns the non-standardized 409 Conflict from
wfe, to match the code returned by that frontend's initial
duplicate check.
Fixes#5107
Previously this was a NotFound error, but since we now update the
certificateStatus table synchronously on issuance and revocation, we
expect to always get a successful response; if we get an error, that's
a ServerInternal error.
These were used during the transition to authzv2. The SA side of these
RPCs already ignores these booleans. This is just cleaning up the
protobufs and call sites.
As part of #5050, I'm updating some of the code in grpc/pb-marshaling.go
to move from nil checks to zero checks. In the process I'm introducing some
new zero checks, on things like challenge type, status, and token. This is
shaking out some places where our mocks have taken shortcuts by not
creating a "full" object including all fields that are normally present.
This PR updates our mocks and tests to provide more realistic objects in
all the places that broke when introducing those zero checks.
One slightly surprising / interesting thing: Since core types like
Order and Registration are still proto2 and have pointer fields,
there are actually some places in this PR where I had to add
a `*` rather than delete an `&`, because I was taking a pointer
field from one of those core types and passing it as a field in
an SA RPC request.
Fixes#5037.
Updates the Registration Authority to use proto3 for its
RPC methods. This turns out to be a fairly minimal change,
as many of the RA's request and response messages are
defined in core.proto, and are therefore still proto2.
Fixes#4955
ACME Challenges are well-known strings ("http-01", "dns-01", and
"tlsalpn-01") identifying which kind of challenge should be used
to verify control of a domain. Because they are well-known and
only certain values are valid, it is better to represent them as
something more akin to an enum than as bare strings. This also
improves our ability to ensure that an AcmeChallenge is not
accidentally used as some other kind of string in a different
context. This change also brings them closer in line with the
existing core.AcmeResource and core.OCSPStatus string enums.
Fixes#5009
This copies over a number of features flags and other settings from
test/config-next that have been applied in prod.
Also, remove the config-next gate on various tests.
The spec specifies (https://tools.ietf.org/html/rfc8555#section-7.2)
that a `no-store` Cache-Control header is required in response to
getting a new nonce. This PR makes that change specifically but does
not modify other uses of the `no-cache` directive.
Fixes#4727
Normally we do this with a reverse proxy in front of Boulder, but it's
nice to have an additional layer of protection in case someone deploys
Boulder without a reverse proxy.
Fixes#4730.
Currently 99.99% of RSA keys we see in certificates at Let's Encrypt are
either 2048, 3072, or 4096 bits, but we support every 8 bit increment
between 2048 and 4096. Supporting these uncommon key sizes opens us up to
having to block much larger ranges of keys when dealing with something
like the Debian weak keys incident. Instead we should just reduce the
set of key sizes we support down to what people actually use.
Fixes#4835.
When we originally added this package (4 years ago) x/crypto/ocsp didn't
have its own list of revocation reasons, so we added our own. Now it does
have its own list, so just use that list instead of duplicating code for
no real reason.
Also we build a list of the revocation reasons we support so that we can
tell users when they try to use an unsupported one. Instead of building
this string every time, just build it once it during package initialization.
Finally return the same error message in wfe that we use in wfe2 when a
user requests an unsupported reason.
In #3708, we added formatters for the the convenience methods in the
`probs` package.
However, in #4783, @alexzorin pointed out that we were incorrectly
passing an error message through fmt.Sprintf as the format parameter
rather than as a value parameter.
I proposed a fix in #4784, but during code review we concluded that the
underlying problem was the pattern of using format-style functions that
don't have some variant of printf in the name. That makes this wrong:
`probs.DNS(err.Error())`, and this right: `probs.DNS("%s", err)`. Since
that's an easy mistake to make and a hard one to spot during code review,
we're going to stop using this particular pattern and call `fmt.Sprintf`
directly.
This PR reverts #3708 and adds some `fmt.Sprintf` where needed.
Closes#4567.
Enabled in `config-next`.
This PR cross-signs the existing issuers (`test-ca-cross.pem`, `test-ca2-cross.pem`) with a new root (`test-root2.key`, `test-root2.pem` = *c2ckling cryptogr2pher f2ke ROOT*).
The cross-signed issuers are referenced in wfe2's configuration, beside the existing `certificateChains` key:
```json
"certificateChains": {
"http://boulder:4430/acme/issuer-cert": [ "test/test-ca2.pem" ],
"http://127.0.0.1:4000/acme/issuer-cert": [ "test/test-ca2.pem" ]
},
"alternateCertificateChains": {
"http://boulder:4430/acme/issuer-cert": [ "test/test-ca2-cross.pem" ],
"http://127.0.0.1:4000/acme/issuer-cert": [ "test/test-ca2-cross.pem" ]
},
```
When this key is populated, the WFE will send links for all alternate certificate chains available for the current end-entity certificate (except for the chain sent in the current response):
Link: <http://localhost:4001/acme/cert/ff5d3d84e777fc91ae3afb7cbc1d2c7735e0/1>;rel="alternate"
For backwards-compatibility, not specifying a chain is the same as specifying `0`: `/acme/cert/{serial} == /acme/cert/{serial}/0` and `0` always refers to the default certificate chain for that issuer (i.e. the value of `certificateChains[aiaIssuerURL]`).
This builds on the work @sh7dm started in #4600. I primarily did some
refactoring, added enforcement of the stale check for authorizations and
challenges, and completed the unit test coverage.
A new Boulder-specific (e.g. not specified by ACME / RFC 8555) API is added for
fetching order, authorization, challenge, and certificate resources by URL
without using POST-as-GET. Since we intend this API to only be used by humans
for debugging and we want to ensure ACME client devs use the standards compliant
method we restrict the GET API to only allowing access to "stale" resources
where the required staleness is defined by the WFE2 "staleTimeout"
configuration value (set to 5m in dev/CI).
Since authorizations don't have a creation date tracked we add
a `authorizationLifetimeDays` and `pendingAuthorizationLifetimeDays`
configuration parameter to the WFE2 that matches the RA's configuration. These
values are subtracted from the authorization expiry to find the creation date to
enforce the staleness check for authz/challenge GETs.
One other note: Resources accessed via the GET API will have Link relation URLs
pointing to the standard ACME API URL. E.g. a GET to a stale challenge will have
a response header with a link "up" relation URL pointing at the POST-as-GET URL
for the associated authorization. I wanted to avoid complicating
`prepAuthorizationForDisplay` and `prepChallengeForDisplay` to be aware of the
GET API and update or exclude the Link relations. This seems like a fine
trade-off since we don't expect machine consumption of the GET API results
(these are for human debugging).
Replaces #4600Resolves#4577
After the prev. cleanup of legacy authz1 bits the `authzLookupFunc`
interface and the associated `handleAuthorization` function are only
used in one place for handling authz2 resources. This commit cleans
this now unneeded abstraction up (and also removes the "V2" suffix
from the challenge and authz handlers).
In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back.
There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case, dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change.
Fixes#4591.
Prev. we weren't checking the domain portion of an email contact address
very strictly in the RA. This updates the PA to export a function that
can be used to validate the domain the same way we validate domain
portions of DNS type identifiers for issuance.
This also changes the RA to use the `invalidEmail` error type in more
places.
A new Go integration test is added that checks these errors end-to-end
for both account creation and account update.
Errors that were being returned in the checkAlgorithm methods of both wfe and
wfe2 didn't really match up to what was actually being checked. This change
attempts to bring the errors in line with what is actually being tested.
Fixes#4452.
This branch also updates the WFE2 parseJWS function to match the error string fixed in the upstream project for the case where a JWS EC public key fails to unmarshal due to an incorrect length.
Resolves#4300