The Boulder WFE accepts incoming connections (from our load balancers)
via either TLS or plain HTTP. When those connections are made over TLS,
it already enforces that the client be using TLS 1.3 or above. When those
connections are made over plain HTTP, the load balancer includes the TLS
version as a header, and Boulder was performing filtering based on that.
Our load balancers are now configured to reject older TLS versions, so we
can remove this check.
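For illustration, a minimal sketch of the kind of header-based check being removed; the header name and accepted values here are assumptions, since the real ones depend on the load balancer configuration:
```go
package main

import (
	"fmt"
	"net/http"
)

// tlsVersionOK illustrates the removed check: the load balancer forwards the
// negotiated TLS version in a header, and the WFE filtered on it. Header name
// and values are assumptions, not Boulder's actual configuration.
func tlsVersionOK(r *http.Request) bool {
	switch r.Header.Get("TLS-Version") {
	case "", "TLSv1.3": // direct TLS connections carry no header
		return true
	default:
		return false
	}
}

func main() {
	req, _ := http.NewRequest("GET", "https://example.com/directory", nil)
	req.Header.Set("TLS-Version", "TLSv1.1")
	fmt.Println(tlsVersionOK(req)) // false
}
```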
Fixes https://github.com/letsencrypt/boulder/issues/7710
Clean up how we handle identifiers throughout the Boulder codebase by
- moving the Identifier protobuf message definition from sa.proto to
core.proto;
- adding support for IP identifiers to the "identifier" package;
- renaming the "identifier" package's exported names to be clearer; and
- ensuring we use the identifier package's helper functions everywhere
we can.
This will make future work to actually respect identifier types (such as
in Authorization and Order protobuf messages) simpler and easier to
review.
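As a rough illustration of the direction, here is a hedged sketch of what such helpers look like; the actual exported names and types in the identifier package may differ from these stand-ins:
```go
package main

import (
	"fmt"
	"net/netip"
)

// Stand-ins for the identifier package's cleaned-up API; real names may differ.
type IdentifierType string

const (
	TypeDNS IdentifierType = "dns"
	TypeIP  IdentifierType = "ip"
)

type ACMEIdentifier struct {
	Type  IdentifierType
	Value string
}

// NewDNS and NewIP are the kind of helpers callers should use everywhere,
// instead of constructing identifiers by hand.
func NewDNS(domain string) ACMEIdentifier { return ACMEIdentifier{Type: TypeDNS, Value: domain} }
func NewIP(ip netip.Addr) ACMEIdentifier  { return ACMEIdentifier{Type: TypeIP, Value: ip.String()} }

func main() {
	fmt.Println(NewDNS("example.com"))
	fmt.Println(NewIP(netip.MustParseAddr("198.51.100.1")))
}
```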
Part of https://github.com/letsencrypt/boulder/issues/7311
Begin testing on go1.23. To facilitate this, also update golang.org/x/net,
golangci-lint, staticcheck, and pebble-challtestsrv to versions which
support go1.23. As a result of these updates, also fix a handful of new
lint findings, mostly regarding passing non-static (i.e. potentially
user-controlled) format strings into Sprintf-style functions.
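For example, the typical shape of those lint fixes looks like this (illustrative code, not taken from the diff):
```go
package main

import "fmt"

// describe shows the shape of the lint fix: a potentially user-controlled
// string must not be used as the format string itself.
func describe(userInput string) error {
	// Before (flagged): return fmt.Errorf(userInput)
	// After:
	return fmt.Errorf("%s", userInput)
}

func main() {
	fmt.Println(describe("unexpected %d in input"))
}
```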
Additionally, delete one VA unittest that was duplicating the checks
performed by a different VA unittest, but with a context timeout bug
that caused it to break when go1.23 subtly changed DialContext behavior.
Have the WFE ask the RA for authorizations, rather than asking the SA
directly. This extra layer of indirection allows us to filter out
challenges which have been disabled, so that clients don't think they
can attempt challenges that we have disabled.
Also shuffle the order of challenges within the authz objects rendered
by the API. We used to have code which does this at authz creation time,
but of course that was completely ineffectual once we stored the
challenges as just a bitmap in the database.
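A minimal sketch of the shuffle now applied when rendering an authz (the surrounding WFE plumbing is elided):
```go
package main

import (
	"fmt"
	"math/rand"
)

type Challenge struct {
	Type string
}

// shuffleChallenges randomizes the order of challenges in the rendered
// authz so clients don't see a fixed, database-derived ordering.
func shuffleChallenges(chals []Challenge) {
	rand.Shuffle(len(chals), func(i, j int) {
		chals[i], chals[j] = chals[j], chals[i]
	})
}

func main() {
	chals := []Challenge{{"http-01"}, {"dns-01"}, {"tls-alpn-01"}}
	shuffleChallenges(chals)
	fmt.Println(chals)
}
```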
Update the WFE unit tests to mock RA.GetAuthorization instead of
SA.GetAuthorization2. This includes making the mock more accurate, so
that (e.g.) valid authorizations contain valid challenges, and the
challenges have their correct types (e.g. "http-01" instead of just
"http"). Also update the OTel tracing test to account for the new RPC.
Part of https://github.com/letsencrypt/boulder/issues/5913
- Add feature flag `UseKvLimitsForNewOrder`
- Add feature flag `UseKvLimitsForNewAccount`
- Flush all Redis shards before running integration or unit tests; this
avoids false positives between local testing runs
Fixes #7664
Blocked by #7676
- Check `CertificatesPerDomain` at newOrder and spend at Finalize time.
- Check `CertificatesPerAccountPerDomain` at newOrder and spend at
Finalize time.
- Check `CertificatesPerFQDNSet` at newOrder and spend at Finalize time.
- Fix a bug in
`FailedAuthorizationsPerDomainPerAccountSpendOnlyTransaction()` which
resulted in failed authorizations being spent for the exact FQDN, not
the eTLD+1.
- Remove redundant "max names" check at transaction construction time
- Enable key-value rate limits in the RA
Find all gRPC fields which represent DNS Names -- sometimes called
"identifier", "hostname", "domain", "identifierValue", or other things
-- and unify their naming. This naming makes it very clear that these
values are strings which may be included in the SAN extension of a
certificate with type dnsName.
As we move towards issuing IP Address certificates, all of these fields
will need to be replaced by fields which carry both an identifier type
and value, not just a single name. This unified naming makes it very
clear which messages and methods need to be updated to support
non-dnsName identifiers.
Part of https://github.com/letsencrypt/boulder/issues/7647
Have the RA's UnpauseAccount gRPC method forward the requested account
ID to the SA's corresponding method, and in turn forward the SA's count
of unpaused identifiers back to the caller in the response.
Changing the response message from emptypb.Empty to a new
rapb.UnpauseAccountResponse is safe, because message names are not
transmitted on the wire, only message field numbers.
While we're here, drastically simplify the wfe_test and sfe_test Mock
RAs, so they don't have to implement methods that aren't actually used
by the tests.
Fixes https://github.com/letsencrypt/boulder/issues/7536
Change the way profiles are configured at the WFE to allow them to be
accompanied by descriptive strings. Augment the construction of the
directory resource's "meta" sub-object to include these profile names
and descriptions.
This config swap is safe, since no Boulder WFE instance is configured
with `CertificateProfileNames` yet.
Fixes https://github.com/letsencrypt/boulder/issues/7602
Requests to the new-nonce endpoint make up about 20% of our WFE log
lines, but they're uninteresting and largely useless for debugging.
Suppress the log event for successful requests to reduce our log volume.
- Rename `NewOrderRequest` field `LimitsExempt` to `IsARIRenewal`
- Introduce a new `NewOrderRequest` field, `IsRenewal`
- Introduce a new (temporary) feature flag, `CheckRenewalExemptionAtWFE`
WFE:
- Perform renewal detection in the WFE when `CheckRenewalExemptionAtWFE`
is set
- Skip (key-value) `NewOrdersPerAccount` and `CertificatesPerDomain`
limit checks when renewal detection indicates the order is a
renewal.
RA:
- Leave renewal detection in the RA intact
- Skip renewal detection and (legacy) `NewOrdersPerAccount` and
`CertificatesPerDomain` limit checks when `CheckRenewalExemptionAtWFE`
is set and the `NewOrderRequest` indicates that the order is a renewal.
Fixes #7508
Part of #5545
- Shrink the number of public `ratelimits` methods by relocating two
sizeable transaction constructors. Simplify the spend and refund
call-sites in the WFE.
- Spend calls now block instead of being called asynchronously.
In #7530, `wfe.NewOrder` [began constructing a rate limit
transaction](https://github.com/letsencrypt/boulder/pull/7530/files#diff-3f950e720c205ce9fa8dea12c6fd7fd44272c2671f19d0e06962abfbea00d491R2340-R2344)
with a precondition that all names must be lower-cased; however, the
actual implementation of the precondition was accidentally overlooked.
This fix corrects that and adds a unit test to prevent a future
regression.
Other changes:
- Only normalized names count towards max names limit
- Only normalized names will be logged in the web.RequestEvent
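For reference, a hedged sketch of the kind of normalization the transaction now assumes has already happened (Boulder's real helper may differ in name and details):
```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// normalizeNames lower-cases, trims, sorts, and de-duplicates DNS names.
func normalizeNames(names []string) []string {
	out := make([]string, 0, len(names))
	for _, n := range names {
		out = append(out, strings.ToLower(strings.TrimSpace(n)))
	}
	slices.Sort(out)
	return slices.Compact(out)
}

func main() {
	fmt.Println(normalizeNames([]string{"Example.COM", "example.com", "www.example.com"}))
}
```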
---------
Co-authored-by: Samantha Frank <hello@entropy.cat>
Add the `ra.UnpauseAccount` which takes an `rapb.UnpauseAccountRequest`
input parameter. The method is just a stub to allow downstream SFE
development to continue. There is relevant ongoing work in the SA which
will eventually reside in this stub method.
Change how goodkey.KeyPolicy keeps track of allowed RSA and ECDSA key
sizes, to make it slightly more flexible while still retaining the very
locked-down allowlist of only 6 acceptable key sizes (RSA 2048, 3072,
and 4096, and ECDSA P256, P384, and P521). Add a new constructor which
takes in a collection of allowed key sizes, so that users of the goodkey
package can customize which keys they accept. Rename the existing
constructor to make it clear that it uses hardcoded default values.
With these new constructors available, make all of the goodkey.KeyPolicy
member fields private, so that a KeyPolicy can only be built via these
constructors.
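A purely hypothetical sketch of the new shape, with made-up constructor and field names, just to show the allowlist-of-sizes idea:
```go
package main

import "fmt"

// keyPolicy and its constructor are made-up names; only the shape matters:
// callers pass in the key sizes they accept, and the fields stay private.
type keyPolicy struct {
	allowedRSABits     map[int]bool
	allowedECDSACurves map[string]bool
}

func newKeyPolicyWithAllowedKeys(rsaBits []int, curves []string) keyPolicy {
	kp := keyPolicy{
		allowedRSABits:     make(map[int]bool),
		allowedECDSACurves: make(map[string]bool),
	}
	for _, b := range rsaBits {
		kp.allowedRSABits[b] = true
	}
	for _, c := range curves {
		kp.allowedECDSACurves[c] = true
	}
	return kp
}

func main() {
	kp := newKeyPolicyWithAllowedKeys([]int{2048, 3072, 4096}, []string{"P-256", "P-384", "P-521"})
	fmt.Println(kp.allowedRSABits[2048], kp.allowedECDSACurves["P-521"])
}
```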
This allows us to give a user-meaningful error about malformed names
early on, instead of propagating internal errors from the new rate
limiting system.
This moves the well-formedness logic from `WillingToIssue` into a new
function `WellFormedDomainNames`, which calls `ValidDomain` on each name
and combines the errors into suberrors if there is more than one.
`WillingToIssue` now calls `WellFormedDomainNames` to keep the existing
behavior. Additionally, WFE calls `WellFormedDomainNames` before
checking rate limits.
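A hedged sketch of the new flow, with stand-in validation and error-combining logic (Boulder uses its own suberror machinery rather than errors.Join):
```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validDomain stands in for the real per-name syntax check.
func validDomain(name string) error {
	if !strings.Contains(name, ".") || strings.Contains(name, "_") {
		return fmt.Errorf("%q is not a well-formed domain name", name)
	}
	return nil
}

// wellFormedDomainNames checks every name and reports all failures together.
func wellFormedDomainNames(names []string) error {
	var errs []error
	for _, n := range names {
		if err := validDomain(n); err != nil {
			errs = append(errs, err)
		}
	}
	if len(errs) == 1 {
		return errs[0]
	}
	// Boulder combines multiple failures into suberrors of one malformed
	// problem; errors.Join is a stand-in for that here.
	return errors.Join(errs...)
}

func main() {
	fmt.Println(wellFormedDomainNames([]string{"example.com", "bad_name", "nodots"}))
}
```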
This creates a slight behavior change: If an order contains both
malformed domain names and well-formed but blocked domain names,
suberrors will only be generated for the malformed domain names. This is
reflected in the changes to `TestWillingToIssue_Wildcard`.
Adds a WFE test case for receiving malformed identifiers in a new-order
request.
Follows up on #3323 and #7218. Fixes #7526
Some small incidental fixes:
- checkWildcardHostList was checking `pa.blocklist` for `nil` before
accessing `pa.wildcardExactBlocklist`. Fix that.
- move table test for WillingToIssue into a new test case for
WellFormedDomainNames
- move two standalone test cases into the big table test
The core.Challenge.ProvidedKeyAuthorization field is problematic, both
because it is poorly named (which is admittedly easily fixable) and
because it is a field which we never expose to the client yet it is held
on a core type. Deprecate this field, and replace it with a new
vapb.PerformValidationRequest.ExpectedKeyAuthorization field.
Within the VA, this also simplifies the primary logic methods to just
take the expected key authorization, rather than taking a whole (largely
unnecessary) challenge object. This has large but wholly mechanical
knock-on effects on the unit tests.
While we're here, improve the documentation on core.Challenge itself,
and remove Challenge.URI, which was deprecated long ago and is wholly
unused.
Part of https://github.com/letsencrypt/boulder/issues/7514
Everything logged at the error level is implicitly given the audit tag
as well. That's not merited in this case, so downgrade these error logs
to be info logs instead.
Replace "mocks.StorageAuthority" with "sapb.StorageAuthorityClient" in
our test mocks. This improves them by removing implementations of the
methods the tests don't actually need, instead of inheriting lots of
extraneous methods from the huge and cumbersome mocks.StorageAuthority.
This reduces our usage of mocks.StorageAuthority to only the WFE tests
(which create one in the frequently-used setup() function), which will
make refactoring those mocks in the pursuit of
https://github.com/letsencrypt/boulder/issues/7476 much easier.
Part of https://github.com/letsencrypt/boulder/issues/7476
Remove the redis-tls, wfe-tls, and mail-test-srv keys which were
generated by minica and then checked in to the repo. All three are
replaced by the dynamically-generated ipki directory.
Part of https://github.com/letsencrypt/boulder/issues/7476
[draft-ietf-acme-ari-03 section
4.1](https://www.ietf.org/archive/id/draft-ietf-acme-ari-03.html#section-4.1)
states the following, indicating that it is the client's responsibility
to add a `/` after the `renewalInfoPath`, not the server's.
> Thus the full request url is constructed as follows, where the "||"
operator indicates string concatenation and the renewalInfo url is taken
from the Directory object:
```
url = renewalInfo || '/' || base64url(AKI) || '.' || base64url(Serial)
```
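In Go terms, with the server-side path carrying no trailing slash, the client-side construction looks roughly like this (ARI uses unpadded base64url):
```go
package main

import (
	"encoding/base64"
	"fmt"
)

// ariURL builds the request URL per the draft: the client, not the server,
// supplies the separating "/". aki and serial are the raw octets from the
// certificate's Authority Key Identifier and serial number.
func ariURL(renewalInfoBase string, aki, serial []byte) string {
	enc := base64.RawURLEncoding
	return renewalInfoBase + "/" + enc.EncodeToString(aki) + "." + enc.EncodeToString(serial)
}

func main() {
	fmt.Println(ariURL("https://example.com/acme/renewal-info",
		[]byte{0xaa, 0xbb, 0xcc}, []byte{0x01, 0x02, 0x03}))
}
```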
Fixes https://github.com/letsencrypt/boulder/issues/7481
Update the hierarchy which the integration tests auto-generate inside
the ./hierarchy folder to include three intermediates of each key type,
two to be actively loaded and one to be held in reserve. To facilitate
this:
- Update the generation script to loop, rather than hard-coding each
intermediate we want
- Improve the filenames of the generated hierarchy to be more readable
- Replace the WFE's AIA endpoint with a thin aia-test-srv so that we
don't have to have NameIDs hardcoded in our ca.json configs
Having this new hierarchy will make it easier for our integration tests
to validate that new features like "unpredictable issuance" are working
correctly.
Part of https://github.com/letsencrypt/boulder/issues/729
Upgrade from the old go-jose v2.6.1 to the newly minted go-jose v4.0.1.
Cleans up old code now that `jose.ParseSigned` can take a list of
supported signature algorithms.
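For example, parsing now passes the acceptable algorithms directly (the algorithm list shown here is illustrative, not necessarily the exact set Boulder accepts):
```go
package main

import (
	"fmt"

	jose "github.com/go-jose/go-jose/v4"
)

// parseJWS shows the v4 shape: the acceptable signature algorithms are
// passed to ParseSigned, so no separate post-parse algorithm check is needed.
func parseJWS(body string) (*jose.JSONWebSignature, error) {
	return jose.ParseSigned(body, []jose.SignatureAlgorithm{
		jose.RS256, jose.ES256, jose.ES384, jose.ES512,
	})
}

func main() {
	_, err := parseJWS(`{"payload":"","protected":"","signature":""}`)
	fmt.Println(err)
}
```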
Fixes https://github.com/letsencrypt/boulder/issues/7390
---------
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
- Parse and validate the `profile` field in `newOrder` requests.
- Pass the `profile` field from `newOrder` calls to the resulting
`RA.NewOrder` call.
- When the client requests a specific profile, ensure that the profile
field is populated in the order returned.
Fixes #7332
Part of #7309
- Update the failed authorizations limit to use 'enum:regId:domain' for
transactions while maintaining 'enum:regId' for overrides.
- Modify the failed authorizations transaction builder to generate a
transaction for each order name.
- Rename the `FailedAuthorizationsPerAccount` enum to
`FailedAuthorizationsPerDomainPerAccount` to align with its corrected
implementation. This change is possible because the limit isn't yet
deployed in staging or production.
Blocks #7346
Part of #5545
Implement draft-ietf-acme-ari-02 changes in WFE newOrder:
- Add a `replaces` field to the newOrder request object
- Ensure that `replaces` values provided by subscribers are vetted
according to the requirements set out in the draft specification
- When a NewOrder request falls inside the suggested RenewalWindow,
exempt from rate limits in the WFE and indicate exemption in the RA
NewOrder request
Part of #7038
Remove three deprecated feature flags which have been removed from all
production configs:
- StoreLintingCertificateInsteadOfPrecertificate
- LeaseCRLShards
- AllowUnrecognizedFeatures
Deprecate three flags which are set to true in all production configs:
- CAAAfterValidation
- AllowNoCommonName
- SHA256SubjectKeyIdentifier
IN-9879 tracked the removal of these flags.
Make NewRegistration more consistent with the implementation in NewOrder
(#7201):
- Construct transactions just once,
- use batched spending instead of multiple spend calls, and
- do not attempt a refund for requests that fail due to RateLimit
errors.
Part of #5545
Rename "IssuerNameID" to just "NameID". Similarly rename the standalone
functions which compute it to better describe their function. Add a
.NameID() directly to issuance.Issuer, so that callers in other packages
don't have to directly access the .Cert member of an Issuer. Finally,
rearrange the code in issuance.go to be sensibly grouped as concerning
NameIDs, Certificates, or Issuers, rather than all mixed up between the
three.
Fixes https://github.com/letsencrypt/boulder/issues/5152
Enable the atomicalign, deepequalerrors, findcall, nilness,
reflectvaluecompare, sortslice, timeformat, and unusedwrite go vet
analyzers, which golangci-lint does not enable by default. Additionally,
enable new go vet analyzers by default as they become available.
The fieldalignment and shadow analyzers remain disabled because they
report so many errors that they should be fixed in a separate PR.
Note that the nilness analyzer appears to have found one very real bug
in tlsalpn.go.
Add support for draft-ietf-acme-ari-02 format alongside the existing
draft-ietf-acme-ari-01 implementation. Both formats are interchangeable.
Fixes #7037
Replace the current three-piece setup (enum of feature variables, map of
feature vars to default values, and autogenerated bidirectional maps of
feature variables to and from strings) with a much simpler one-piece
setup: a single struct with one boolean-typed field per feature. This
preserves the overall structure of the package -- a single global
feature set protected by a mutex, and Set, Reset, and Enabled methods --
although the exact function signatures have all changed somewhat.
The executable config format remains the same, so no deployment changes
are necessary. This change does deprecate the AllowUnrecognizedFeatures
feature, as we cannot tell the json config parser to ignore unknown
field names, but that flag is set to False in all of our deployment
environments already.
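A minimal sketch of the one-piece setup, with an illustrative feature name and accessor (the real package's method names and signatures differ somewhat, as noted above):
```go
package features

import "sync"

// Config has one boolean field per feature flag; the JSON config decodes
// directly into it. The field here is illustrative.
type Config struct {
	ExampleFeature bool
}

var (
	mu      sync.RWMutex
	current Config
)

// Set replaces the global feature set.
func Set(fs Config) {
	mu.Lock()
	defer mu.Unlock()
	current = fs
}

// Reset restores every feature to its zero (disabled) value.
func Reset() {
	mu.Lock()
	defer mu.Unlock()
	current = Config{}
}

// Get returns a copy of the current feature set; callers check fields
// directly, e.g. features.Get().ExampleFeature.
func Get() Config {
	mu.RLock()
	defer mu.RUnlock()
	return current
}
```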
Fixes https://github.com/letsencrypt/boulder/issues/6802
Fixes https://github.com/letsencrypt/boulder/issues/5229
- Move default and override limits, and associated methods, out of the
Limiter to new limitRegistry struct, embedded in a new public
TransactionBuilder.
- Export Transaction and add corresponding Transaction constructor
methods for each limit Name, making Limiter and TransactionBuilder the
API for interacting with the ratelimits package.
- Implement batched Spends and Refunds on the Limiter, the new methods
accept a slice of Transactions.
- Add new boolean fields check and spend to Transaction to support more
complicated cases that can arise in batches:
1. the InvalidAuthorizations limit is checked at New Order time in a
batch with many other limits, but should only be spent when an
Authorization is first considered invalid.
2. the CertificatesPerDomain limit is overridden by
CertficatesPerDomainPerAccount, when this is the case, spends of the
CertificatesPerDomain limit should be "best-effort" but NOT deny the
request if capacity is lacking.
- Modify the existing Spend/Refund methods to support
Transaction.check/spend and 0 cost Transactions.
- Make bucketId private and add a constructor for each bucket key format
supported by ratelimits.
- Move domainsForRateLimiting() from the ra.go to ratelimits. This
avoids a circular import issue in ra.go.
Part of #5545
This is a cleanup PR finishing the migration from int64 timestamps to
protobuf `*timestamppb.Timestamps` by removing all usage of the old
int64 fields. In the previous PR
https://github.com/letsencrypt/boulder/pull/7121 all fields were
switched to read from the protobuf timestamppb fields.
Adds a new case to `core.IsAnyNilOrZero` to check various properties of
a `*timestamppb.Timestamp`, reducing the visual complexity for receivers.
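Roughly, the new case behaves like this sketch (the exact properties checked may differ):
```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/types/known/timestamppb"
)

// timestampUnset sketches the new case: nil pointers and zero-valued
// timestamps (the Unix epoch) are both treated as "not set".
func timestampUnset(ts *timestamppb.Timestamp) bool {
	return ts == nil || !ts.IsValid() || ts.AsTime().Unix() == 0
}

func main() {
	fmt.Println(timestampUnset(nil), timestampUnset(timestamppb.Now()))
}
```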
Fixes https://github.com/letsencrypt/boulder/issues/7060
The `Limiter` API has been adjusted significantly to improve both
safety and ergonomics, and two `Limit` types have been corrected to match
the legacy implementations.
**Safety**
Previously, the key used for looking up limit overrides and for fetching
individual buckets from the key-value store was constructed within the
WFE. This posed a risk: if the key was malformed, the default limit
would still be enforced, but individual overrides would fail to function
properly. This has been addressed by the introduction of a new
`BucketId` type along with a `BucketId` constructor for each `Limit`
type. Each constructor is responsible for producing a well-formed bucket
key which undergoes the very same validation as any potentially matching
override key.
**Ergonomics**
Previously, each of the `Limiter` methods took a `Limit` name, a bucket
identifier, and a cost to be spent/refunded. To simplify this, each
method now accepts a new `Transaction` type which provides a cost, and
wraps a `BucketId` identifying the specific bucket.
The two changes above, when taken together, make the implementation of
batched rate limit transactions considerably easier, as a batch method
can accept a slice of `Transaction`.
**Limit Corrections**
PR #6947 added all of the existing rate limits which could be made
compatible with the key-value approach. Two of these were improperly
implemented: `CertificatesPerDomain` and `CertificatesPerFQDNSet` were
implemented as `CertificatesPerDomainPerAccount` and
`CertificatesPerFQDNSetPerAccount`.
Since we do not actually associate these limits with a particular ACME
account, the `regID` portion of each of their bucket keys has been
removed.
The RequireCommonName feature flag was our only "inverted" feature flag,
which defaulted to true and had to be explicitly set to false. This
inversion can lead to confusion, especially to readers who expect all Go
default values to be zero values. We plan to remove the ability for our
feature flag system to support default-true flags, which the existence
of this flag blocked. Since this flag has not been set in any real
configs, inverting it is easy.
Part of https://github.com/letsencrypt/boulder/issues/6802
* Adds new `google.protobuf.Timestamp` fields to each .proto file where
we had been using `int64` fields as a timestamp.
* Updates relevant gRPC messages to populate the new
`google.protobuf.Timestamp` fields in addition to the old `int64`
timestamp fields.
* Added tests for each `<x>ToPB` and `PBto<x>` function to ensure that
new fields passed into a gRPC message arrive as intended.
* Removed an unused error return from `PBToCert` and `PBToCertStatus`
and cleaned up each call site.
Built on-top of https://github.com/letsencrypt/boulder/pull/7069
Part 2 of 4 related to
https://github.com/letsencrypt/boulder/issues/7060
Integrate the key-value rate limits from #6947 into the WFE. Rate limits
are backed by the Redis source added in #7016, and use the SRV record
shard discovery added in #7042.
Part of #5545
Fix an issue related to the custom gRPC Picker implementation introduced
in #6618. When a nonce contained a prefix not associated with a known
backend, the Picker would continuously rebuild, re-resolve DNS, and
eventually throw a 500 "Server Error" at RPC timeout. The Picker now
promptly returns a 400 "Bad Nonce" error as expected; in response, the
requesting client should retry their request with a fresh nonce.
Additionally:
- WFE unit tests use derived nonces when `"BOULDER_CONFIG_DIR" ==
"test/config-next"`.
- `Balancer.Build()` in "noncebalancer" forces a rebuild until non-zero
backends are available. This matches the
grpc-go `balancer/roundrobin` implementation.
- Nonces with no matching backend increment "jose_errors" with label
`"type": "JWSInvalidNonce"` and "nonce_no_backend_found".
- Nonces of incorrect length are now rejected at the WFE and increment
"jose_errors" with label `"type": "JWSMalformedNonce"` instead of
`"type": "JWSInvalidNonce"`.
- Nonces not encoded as base64url are now rejected at the WFE and
increment "jose_errors" with label `"type": "JWSMalformedNonce"` instead
of `"type": "JWSInvalidNonce"`.
Fixes #6969
Part of #6974
This was introduced early in Boulder development when we had the concept
of a "short serial" (monotonically increasing) which would be prepended
to random bytes to form the full serial. We wanted to specially report
the case that there were duplicates of a given short serial since it
meant a problem with our monotonicity.
We've long since abandoned that idea, and also this code can't be
exercised because sa.SelectCertificate does a LIMIT 1 anyhow.
Add a helper which takes two values (typically the return values of a
two-value function), panics if the error is non-nil, and returns the
interesting value. This is particularly useful for cases where we
statically know the call will succeed.
Thanks to @mcpherrinm for the idea!
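The helper's actual name and package aren't spelled out here; a minimal generic sketch of the idea:
```go
package main

import (
	"fmt"
	"net/netip"
)

// must panics if err is non-nil, otherwise returns the value. Handy where
// the call is statically known to succeed (e.g. parsing a constant).
func must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

func main() {
	addr := must(netip.ParseAddr("192.0.2.1"))
	fmt.Println(addr)
}
```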
The contacts field of an account can be very verbose, and is irrelevant
to the vast majority -- e.g. creating orders, validating challenges, and
downloading certificates -- of requests made by an account. To reduce
the length of our WFE log lines, remove the Contacts field from all
logs. When we actually need it, we can get it from the database.
Also remove the RequestEvent.TLS field, which is unused.
In WFE, we do a goodkey check when validating a self-authenticated POST
(i.e. when creating an account). For a while, that was a purely local
check, looking at a list of bad keys or bad moduluses, or checking for
factorability. At some point we also added a backend check, querying the
SA to see if a key was blocked. However, we did not update this one code
path to distinguish "bad key" from "timeout querying SA." That meant
that sometimes we would give a badPublicKey Problem Document when we
should have given an internalServerError.
Related:
https://github.com/letsencrypt/boulder/issues/6795#issuecomment-1574217398
Define a `bJSONWebSignature` struct which embeds a
`*jose.JSONWebSignature`. The only method that can produce a
`bJSONWebSignature` is `wfe.parseJWS` so that we can ensure
safety/sanity checks are performed on the incoming data. Restricts
several methods and functions to take a `jose.Header` as an input
parameter, rather than a full JWS.
Fixes https://github.com/letsencrypt/boulder/issues/5676.
Remove the remaining divergences from RFC8555 regarding what error types
we use in certain situations. Specifically:
- use "invalidContact" instead of "invalidEmail";
- use "unsupportedContact" for contact addresses that use a protocol
other than "mailto:"; and
- use "unsupportedIdentifier" for identifiers that specify a type other
than "dns".
This PR adds a new configuration block specifically for the otelhttp
instrumentation. This block is separate from the existing
"opentelemetry" configuration, and is only relevant when using otelhttp
instrumentation. It does not share any codepath with the existing
configuration, so it is at the top level to indicate which services it
applies to.
There's a bit of plumbing new configuration through. I've adopted the
measured_http package to also set up opentelemetry instead of just
metrics, which should hopefully allow any future changes to be smaller
(just config & there) and more consistent between the wfe2 and ocsp
responder.
There's one option here now, which disables setting
[otelhttp.WithPublicEndpoint](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#WithPublicEndpoint).
This option is designed to do exactly what we want: Don't accept
incoming spans as parents of the new span created in the server.
Previously we had a setting to disable parent-based sampling to help
with this problem, which doesn't really make sense anymore, so let's
just remove it and simplify that setup path. The default of "false" is
designed to be the safe option. It's set to true in the test/ configs
for integration tests that use traces, and I expect we'll likely set it
true in production eventually once the LBs are configured to handle
tracing themselves.
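A minimal sketch of wiring this up with otelhttp (handler names and the operation label are illustrative):
```go
package main

import (
	"log"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/directory", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("{}"))
	})

	// WithPublicEndpoint links any incoming trace context instead of using
	// it as the parent, so external callers can't become parents of our
	// server spans. The new config option controls whether it is applied.
	handler := otelhttp.NewHandler(mux, "wfe", otelhttp.WithPublicEndpoint())

	log.Fatal(http.ListenAndServe(":8080", handler))
}
```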
Fixes #6851
Make minor, non-user-visible changes to how we structure the probs
package. Notably:
- Add new problem types for UnsupportedContact and
UnsupportedIdentifier, which are specified by RFC8555 and which we will
use in the future, but haven't been using historically.
- Sort the problem types and constructor functions to match the
(alphabetical) order given in RFC8555.
- Rename some of the constructor functions to better match their
underlying problem types (e.g. "TLSError" to just "TLS").
- Replace the redundant ProblemDetailsToStatusCode function with simply
always returning a 500 if we haven't properly set the problem's
HTTPStatus.
- Remove the ability to use either the V1 or V2 error namespace prefix;
always use the proper RFC namespace prefix.
Update the document number to the latest version, and remove the /get/
prefix since it now supports both the GET and POST portions of the spec.
Also update one piece of tooling to properly get the ARI URL from the
directory, rather than hard-coding it.
Add a new shared config stanza which all boulder components can use to
configure their Open Telemetry tracing. This allows components to
specify where their traces should be sent, what their sampling ratio
should be, and whether or not they should respect their parent's
sampling decisions (so that web front-ends can ignore sampling info
coming from outside our infrastructure). It's likely we'll need to
evolve this configuration over time, but this is a good starting point.
Add basic Open Telemetry setup to our existing cmd.StatsAndLogging
helper, so that it gets initialized at the same time as our other
observability helpers. This sets certain default fields on all
traces/spans generated by the service. Currently these include the
service name, the service version, and information about the telemetry
SDK itself. In the future we'll likely augment this with information
about the host and process.
Finally, add instrumentation for the HTTP servers and grpc
clients/servers. This gives us a starting point of being able to monitor
Boulder, but is fairly minimal as this PR is already somewhat unwieldy:
It's really only enough to understand that everything is wired up
properly in the configuration. In subsequent work we'll enhance those
spans with more data, and add more spans for things not automatically
traced here.
Fixes https://github.com/letsencrypt/boulder/issues/6361
---------
Co-authored-by: Aaron Gable <aaron@aarongable.com>
When sending an ARI response, write the Retry-After header before
writing the JSON response body. This is necessary because
http.ResponseWriter implicitly calls WriteHeader whenever Write is
called, flushing all headers to the network and preventing any
additional headers from being written. Unfortunately, the unittests use
httptest.ResponseRecorder, which doesn't seem to enforce this invariant
(it's happy to report headers which were written after the body). Add a
header check to the integration tests, to make up for this deficiency.
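A small sketch of the required ordering (the Retry-After value is illustrative):
```go
package main

import (
	"fmt"
	"net/http"
)

func serveRenewalInfo(w http.ResponseWriter, r *http.Request) {
	// Headers must be set before the first Write: Write implicitly calls
	// WriteHeader(200), after which header changes are silently dropped.
	w.Header().Set("Retry-After", "21600") // value illustrative
	w.Header().Set("Content-Type", "application/json")
	fmt.Fprint(w, `{"suggestedWindow":{}}`) // body elided
}

func main() {
	http.HandleFunc("/renewal-info/", serveRenewalInfo)
	http.ListenAndServe(":8080", nil)
}
```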
Change the SetCommonName flag, introduced in #6706, to
RequireCommonName. Rather than having the flag control both whether or
not a name is hoisted from the SANs into the CN *and* whether or not the
CA is willing to issue certs with no CN, this updated flag now only
controls the latter. By default, the new flag is true, and continues our
current behavior of failing issuance if we cannot set a CN in the cert.
When the flag is set to false, then we are willing to issue certificates
for which the CSR contains no CN and there is no SAN short enough to be
hoisted into the CN field.
When we have rolled out this change, we can move on to the next flag in
this series: HoistCommonName, which will control whether or not a SAN is
hoisted at all, effectively giving the CSRs (and therefore the clients)
full control over whether their certificate contains a SAN.
This change is safe because no environment explicitly sets the
SetCommonName flag to false yet.
Fixes #5112
Remove tracing using Beeline from Boulder. The only remnant left behind
is the deprecated configuration, to ensure deployability.
We had previously planned to swap in OpenTelemetry in a single PR, but
that adds significant churn in a single change, so we're doing this as
multiple steps that will each be significantly easier to reason about
and review.
Part of #6361
For simple 404s, there's no need to log an InternalError in addition to
the user-facing error, so pass `nil` to sendError as the internalError
parameter. This cleans up Certificate, GetOrder, and FinalizeOrder; all
the places I could find that checked for `NotFound` and also logged an
unnecessary InternalError.
I also removed a redundant and unnecessary error wrapping, and a
reference to "short serial", which is not a concept we have anymore.
This was mostly unused. The only caller was orphan-finder, which used it
to determine if a certificate was already in the database. But this is
not particularly important functionality, so I've removed it.
[RFC 8555 section
7.4](https://www.rfc-editor.org/rfc/rfc8555.html#section-7.4) states
regarding Orders in the "processing" state:
> "processing": The certificate is being issued. Send a POST-as-GET
> request after the time given in the Retry-After header field of
> the response, if any.
Add a Retry-After header when serving Order objects that are in the
"processing" state. This may help control clients which implement Order
polling but without any built-in backoff. The retry interval is
hard-coded to be 3s, slightly above our current 99th percentile Finalize
latency.
Remove the `MandatoryPOSTasGET` flag from the WFE2.
Update the ACMEv2 divergence doc to note that neither staging nor
production use MandatoryPOSTasGET.
Fixes #6582.
Also remove CSRDNSNames, CSRIPAddresses and CSREmailAddresses.
And add a new log field "DNSNames", for use in new-order, finalize, and
revoke requests.
Add a "RevocationReason" field in the "Extra" section for revoke
requests.
Give ARI improved error messages when no request path is specified and
when parsing of the request path blob fails.
Also, add a tool which can be used to quickly generate ARI requests and
print their results, to make manual spot-checking easier.
Fixes #6629
Assign nonce prefixes for each nonce-service by taking the first eight
characters of the base64url-encoded HMAC-SHA256 hash of the RPC
listening address using a provided key, as sketched below. The provided
key must be the same across all boulder-wfe and nonce-service instances.
- Add a custom `grpc-go` load balancer implementation (`nonce`) which
can route nonce redemption RPC messages by matching the prefix to the
derived prefix of the nonce-service instance which created it.
- Modify the RPC client constructor to allow the operator to override
the default load balancer implementation (`round_robin`).
- Modify the `srv` RPC resolver to accept a comma separated list of
targets to be resolved.
- Remove unused nonce-service `-prefix` flag.
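The derivation described above, as a sketch (padding doesn't matter here since only the first eight characters are kept):
```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// derivePrefix computes a nonce prefix: the first eight characters of the
// base64url-encoded HMAC-SHA256 of the gRPC listening address, keyed with
// the shared key.
func derivePrefix(listenAddr string, key []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(listenAddr))
	return base64.RawURLEncoding.EncodeToString(mac.Sum(nil))[:8]
}

func main() {
	fmt.Println(derivePrefix("10.0.0.5:9101", []byte("shared-secret")))
}
```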
Fixes #6404
Originally, WFEs had a built-in nonce service. Then we added a "remote
nonce service" via gRPC, but we kept a fallback path for when the remote
nonce service was not configured, to use a built-in nonce service. This
PR removes that fallback path.
Since the fallback path was relied on by the unittests, this also
refactors the unittests to use a gRPC-style nonce service (but in-memory
for the unittests).
Fixes #6530
Update our implementation of ARI to return a renewal window entirely in
the past (i.e., suggesting immediate renewal) if the certificate in
question has been revoked for any reason. This will allow clients which
implement ARI to discover that they need to replace their certificate
without having to query OCSP directly, especially as we move into a
future where OCSP is mostly supplanted by aggregated CRLs.
Fixes #6503
In the WFE, ocsp-responder, and crl-updater, switch from using
StorageAuthorityClients to StorageAuthorityReadOnlyClients. This ensures
that these services cannot call methods which write to our database.
Fixes #6454
When a client cancels their HTTP request, that propagates into
gRPC-level cancellation. But we don't want those canceled RPCs to show
up as 500s, we want them to show up as 408s. We do that by producing a
special ProblemDetails at the gRPC level. However, for that trick to
work, we need to make sure errors from RPC methods get passed through
web.ProblemDetailsForError. There were some places where we didn't do
this and instead created a from-scratch ProblemDetails. This resulted in
spurious 500s.
My methodology was to look at every method call on each of the WFE's
fields that represents a gRPC backend: `ra`, `sa`, `accountGetter`,
`nonceService`, `remoteNonceServe`. If the error handling for that call
did not use web.ProblemDetailsForError, I changed it to use that.
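A toy sketch of why the distinction matters; gRPC status-code handling is elided, and Boulder's real mapping lives in web.ProblemDetailsForError:
```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// statusForError shows the mapping that matters here: a client-cancelled
// request should surface as a 408, not a 500. Boulder gets this by routing
// RPC errors through web.ProblemDetailsForError.
func statusForError(err error) int {
	if errors.Is(err, context.Canceled) {
		return http.StatusRequestTimeout // 408
	}
	return http.StatusInternalServerError // 500
}

func main() {
	fmt.Println(statusForError(context.Canceled), statusForError(errors.New("boom")))
}
```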
Fixes #6524
Clean up several spots where we were behaving differently on
go1.18 and go1.19, now that we're using go1.19 everywhere. Also
re-enable the lint and generate tests, and fix the various places where
the two versions disagreed on how comments should be formatted.
Also clean up the OldTLS codepaths, now that both go1.19 and our
own feature flags have forbidden TLS < 1.2 everywhere.
Fixes #6011
- Add a new field, `RetryAfter` to `BoulderError`s
- Add logic to wrap/unwrap the value of the `RetryAfter` field in our gRPC
error interceptor
- Plumb `RetryAfter` for `DuplicateCertificateError` emitted by RA to the WFE
client response header
Part of #6256
Suggest that subscribers with certificates impacted by an ongoing revocation
incident renew immediately.
- Make SA method `IncidentsForSerial` a callable RPC
Resolves #6282
Update our ACME Renewal Info implementation to parse
the CertID-based request format specified in the current
version of the draft specification.
Part of #6033