Remove the .Error() method from probs.ProblemDetails, so that it can no
longer be returned from functions which return an error. Update various
call sites to use the .String() method to get a textual representation
of the problem instead. Simplify ProblemDetailsForError to not
special-case and pass-through ProblemDetails, since they are no longer a
valid input to that function.
This reduces instances of "boxed nil" bugs, and paves the way for all of
the WFE methods to be refactored to simply return errors instead of
writing them directly into the response object.
Part of https://github.com/letsencrypt/boulder/issues/4980
Change all of the helper methods and functions in verify.go to return an
`error` instead of a `probs.ProblemDetails`. Add a few new types to our
errors package, and support for those types in ProblemDetailsForError,
to maintain the same public-facing error types. Update the tests to
check for specific errors instead of specific problems.
This is a building block towards making the probs.ProblemDetails type
not implement the Error interface, and only be used when rendering
errors to the user (i.e. not within Boulder logic itself).
Part of https://github.com/letsencrypt/boulder/issues/4980
Add `identifier` fields, which will soon replace the `dnsName` fields,
to:
- `corepb.Authorization`
- `corepb.Order`
- `rapb.NewOrderRequest`
- `sapb.CountFQDNSetsRequest`
- `sapb.CountInvalidAuthorizationsRequest`
- `sapb.FQDNSetExistsRequest`
- `sapb.GetAuthorizationsRequest`
- `sapb.GetOrderForNamesRequest`
- `sapb.GetValidAuthorizationsRequest`
- `sapb.NewOrderRequest`
Populate these `identifier` fields in every function that creates
instances of these structs.
Use these `identifier` fields instead of `dnsName` fields (at least
preferentially) in every function that uses these structs. When crossing
component boundaries, don't assume they'll be present, for
deployability's sake.
Deployability note: Mismatched `cert-checker` and `sa` versions will be
incompatible because of a type change in the arguments to
`sa.SelectAuthzsMatchingIssuance`.
Part of #7311
Return "alreadyReplaced" in addition to HTTP 409 Conflict to signal that
an order indicates that it replaces a certificate which already has a
replacement order.
The current profiles draft
(https://datatracker.ietf.org/doc/draft-aaron-acme-profiles/00/) says:
> If a server receives a request to finalize an Order whose profile the
> CA is no longer willing to issue under, it MUST respond with a
> problem document of type "invalidProfile". The server SHOULD attempt
> to avoid this situation, e.g. by ensuring that all Orders for a
> profile have expired before it stops issuing under that profile.
Add types and helper functions representing this new error type to the
berrors, probs, and web packages. Update the WFE code which rejects
new-order requests with unrecognized profiles to use these new types,
and add similar code to the WFE's finalize path. Update the unit and
integration tests to reflect the fact that we now configure at least one
profile in both Staging and Prod (tracked in IN-10574).
In the WFE, store the User-Agent in a `context.Context` object. In our
gRPC interceptors, pass that field in a Metadata header, and re-add it
to `Context` on the server side.
Add a test in the gRPC interceptors that User-Agent is properly
propagated.
Note: this adds a new `setup()` function for the gRPC tests that is
currently only used by the new test. I'll upload another PR shortly that
expands the use of that function to more tests.
Fixes https://github.com/letsencrypt/boulder/issues/7792
This reverts commit 242d746040 (#7796)
We want to make this change, but it carries some risk that we'd prefer
not to take over the holiday. We'd also like to keep `main` in a state
where it would be reasonable to deploy (even if, in practice, any
over-the-holiday deploy would be a hotfix, not a direct tag from
`main`).
For batch operations, include the operation and the number of keys in
the error message. This should help diagnose whether we are getting `i/o
timeout` errors disproportionately for larger requests, or for certain
operations.
Also, make the ignored errors part of the overall WFE request logs,
which allows us to get additional context, like whether certain
requesters or domain names are getting disproportionately many errors.
Related to #7846.
Clean up how we handle identifiers throughout the Boulder codebase by
- moving the Identifier protobuf message definition from sa.proto to
core.proto;
- adding support for IP identifier to the "identifier" package;
- renaming the "identifier" package's exported names to be clearer; and
- ensuring we use the identifier package's helper functions everywhere
we can.
This will make future work to actually respect identifier types (such as
in Authorization and Order protobuf messages) simpler and easier to
review.
Part of https://github.com/letsencrypt/boulder/issues/7311
Begin testing on go1.23. To facilitate this, also update /x/net,
golangci-lint, staticcheck, and pebble-challtestsrv to versions which
support go1.23. As a result of these updates, also fix a handful of new
lint findings, mostly regarding passing non-static (i.e. potentially
user-controlled) format strings into Sprintf-style functions.
Additionally, delete one VA unittest that was duplicating the checks
performed by a different VA unittest, but with a context timeout bug
that caused it to break when go1.23 subtly changed DialContext behavior.
Adds a new boulder component named `sfe` aka the Self-service FrontEnd
which is dedicated to non-ACME related Subscriber functions. This change
implements one such function which is a web interface and handlers for
account unpausing.
When paused, an ACME client receives a log line URL with a JWT parameter
from the WFE. For the observant Subscriber, manually clicking the link
opens their web browser and displays a page with a pre-filled HTML form.
Upon clicking the form button, the SFE sends an HTTP POST back to itself
and either validates the JWT and issues an RA gRPC request to unpause
the account, or returns an HTML error page.
The SFE and WFE should share a 32 byte seed value e.g. the output of
`openssl rand -hex 16` which will be used as a go-jose symmetric signer
using the HS256 algorithm. The SFE will check various [RFC
7519](https://datatracker.ietf.org/doc/html/rfc7519) claims on the JWT
such as the `iss`, `aud`, `nbf`, `exp`, `iat`, and a custom `apiVersion`
claim.
The SFE should not yet be relied upon or deployed to staging/production
environments. It is very much a work in progress, but this change is big
enough as-is.
Related to https://github.com/letsencrypt/boulder/issues/7406
Part of https://github.com/letsencrypt/boulder/issues/7499
Upgrade from the old go-jose v2.6.1 to the newly minted go-jose v4.0.1.
Cleans up old code now that `jose.ParseSigned` can take a list of
supported signature algorithms.
Fixes https://github.com/letsencrypt/boulder/issues/7390
---------
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
This new standard library method returns a context with all of the
original metadata (e.g. tracing spans) still attached, but which will
not be canceled by any cancel funcs, deadlines, or timeouts set on the
parent context. We do this manually in a few places to prevent client
cancellations (usually disconnects) from disrupting our work, so this
just makes that code slightly simpler.
Fixes https://github.com/letsencrypt/boulder/issues/5506
Run staticcheck as a standalone binary rather than as a library via
golangci-lint. From the golangci-lint help out,
> staticcheck (megacheck): It's a set of rules from staticcheck. It's
not the same thing as the staticcheck binary. The author of staticcheck
doesn't support or approve the use of staticcheck as a library inside
golangci-lint.
We decided to disable ST1000 which warns about incorrect or missing
package comments.
For SA4011, I chose to change the semantics[1] of the for loop rather
than ignoring the SA4011 lint for that line.
Fixes https://github.com/letsencrypt/boulder/issues/6988
1. https://go.dev/ref/spec#Continue_statements
The contacts field of an account can be very verbose, and is irrelevant
to the vast majority -- e.g. creating orders, validating challenges, and
downloading certificates -- of requests made by an account. To reduce
the length of our WFE log lines, remove the Contacts field from all
logs. When we actually need it, we can get it from the database.
Also remove the RequestEvent.TLS field, which is unused.
Remove the remaining divergences from RFC8555 regarding what error types
we use in certain situations. Specifically:
- use "invalidContact" instead of "invalidEmail";
- use "unsupportedContact" for contact addresses that use a protocol
other than "mailto:"; and
- use "unsupportedIdentifier" for identifiers that specify a type other
than "dns".
Make minor, non-user-visible changes to how we structure the probs
package. Notably:
- Add new problem types for UnsupportedContact and
UnsupportedIdentifier, which are specified by RFC8555 and which we will
use in the future, but haven't been using historically.
- Sort the problem types and constructor functions to match the
(alphabetical) order given in RFC8555.
- Rename some of the constructor functions to better match their
underlying problem types (e.g. "TLSError" to just "TLS").
- Replace the redundant ProblemDetailsToStatusCode function with simply
always returning a 500 if we haven't properly set the problem's
HTTPStatus.
- Remove the ability to use either the V1 or V2 error namespace prefix;
always use the proper RFC namespace prefix.
Add a new shared config stanza which all boulder components can use to
configure their Open Telemetry tracing. This allows components to
specify where their traces should be sent, what their sampling ratio
should be, and whether or not they should respect their parent's
sampling decisions (so that web front-ends can ignore sampling info
coming from outside our infrastructure). It's likely we'll need to
evolve this configuration over time, but this is a good starting point.
Add basic Open Telemetry setup to our existing cmd.StatsAndLogging
helper, so that it gets initialized at the same time as our other
observability helpers. This sets certain default fields on all
traces/spans generated by the service. Currently these include the
service name, the service version, and information about the telemetry
SDK itself. In the future we'll likely augment this with information
about the host and process.
Finally, add instrumentation for the HTTP servers and grpc
clients/servers. This gives us a starting point of being able to monitor
Boulder, but is fairly minimal as this PR is already somewhat unwieldy:
It's really only enough to understand that everything is wired up
properly in the configuration. In subsequent work we'll enhance those
spans with more data, and add more spans for things not automatically
traced here.
Fixes https://github.com/letsencrypt/boulder/issues/6361
---------
Co-authored-by: Aaron Gable <aaron@aarongable.com>
Remove tracing using Beeline from Boulder. The only remnant left behind
is the deprecated configuration, to ensure deployability.
We had previously planned to swap in OpenTelemetry in a single PR, but
that adds significant churn in a single change, so we're doing this as
multiple steps that will each be significantly easier to reason about
and review.
Part of #6361
Also remove CSRDNSNames, CSRIPAddresses and CSREmailAddresses.
And add a new log field "DNSNames", for use in new-order, finalize, and
revoke requests.
Add a "RevocationReason" field in the "Extra" section for revoke
requests.
- Add a new field, `RetryAfter` to `BoulderError`s
- Add logic to wrap/unwrap the value of the `RetryAfter` field to our gRPC error
interceptor
- Plumb `RetryAfter` for `DuplicateCertificateError` emitted by RA to the WFE
client response header
Part of #6256
Run the Boulder unit and integration tests with go1.19.
In addition, make a few small changes to allow both sets of
tests to run side-by-side. Mark a few tests, including our lints
and generate checks, as go1.18-only. Reformat a few doc
comments, particularly lists, to abide by go1.19's stricter gofmt.
Causes #6275
The iotuil package has been deprecated since go1.16; the various
functions it provided now exist in the os and io packages. Replace all
instances of ioutil with either io or os, as appropriate.
The WFE logs internal errors as part of its request logging. These
errors are mostly passed along from other components, particularly the
SA, and spike whenever the database struggles with availability.
Since these are logged at error level, they mask other, potentially more
useful error logs. Remove these error logs in preference for the
info-level request logs, plus metrics that indicate when we are serving
a large proportion of 500s.
Add functionality to the ubiquitous RequestEvent (aka logEvent) to
allow handlers to suppress the final log line that is printed when
a non-500 response is being sent.
Use this functionality to suppress logging GET requests for the /directory
endpoint. We can expand this in the future to quiet other logs that
are not helpful for metrics or analysis.
Fixes#6094
These new linters are almost all part of golangci-lint's collection
of default linters, that would all be running if we weren't setting
`disable-all: true`. By adding them, we now have parity with the
default configuration, as well as the additional linters we like.
Adds the following linters:
* unconvert
* deadcode
* structcheck
* typecheck
* varcheck
* wastedassign
This adds three features flags: SHA1CSRs, OldTLSOutbound, and
OldTLSInbound. Each controls the behavior of an upcoming deprecation
(except OldTLSInbound, which isn't yet scheduled for a deprecation
but will be soon). Note that these feature flags take advantage of
`features`' default values, so they can default to "true" (that is, each
of these features is enabled by default), and we set them to "false"
in the config JSON to turn them off when the time comes.
The unittest for OldTLSOutbound requires that `example.com` resolves
to 127.0.0.1. This is because there's logic in the VA that checks
that redirected-to hosts end in an IANA TLD. The unittest relies on
redirecting, and we can't use e.g. `localhost` in it because of that
TLD check, so we use example.com.
Fixes#6036 and #6037
Simplify the WFE `RevokeCertificate` API method in three ways:
- Remove most of the logic checking if the requester is authorized to
revoke the certificate in question (based on who is making the
request, what authorizations they have, and what reason they're
requesting). That checking is now done by the RA. Instead, simply
verify that the JWS is authenticated.
- Remove the hard-to-read `authorizedToRevoke` callbacks, and make the
`revokeCertBySubscriberKey` (nee `revokeCertByKeyID`) and
`revokeCertByCertKey` (nee `revokeCertByJWK`) helpers much more
straight-line in their execution logic.
- Call the RA's new `RevokeCertByApplicant` and `RevokeCertByKey` gRPC
methods, rather than the deprecated `RevokeCertificateWithReg`.
This change, without any flag flips, should be invisible to the
end-user. It will slightly change some of our log message formats.
However, by now relying on the new RA gRPC revocation methods, this
change allows us to change our revocation policies by enabling the
`AllowDoubleRevocation` and `MozRevocationReasons` feature flags, which
affect the behavior of those new helpers.
Fixes#5936
Add `stylecheck` to our list of lints, since it got separated out from
`staticcheck`. Fix the way we configure both to be clearer and not
rely on regexes.
Additionally fix a number of easy-to-change `staticcheck` and
`stylecheck` violations, allowing us to reduce our number of ignored
checks.
Part of #5681
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).
Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.
Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
The `http.Request` object can already have a context associated
with it. If it does, preserve that context rather than creating a new
one. If it doesn't, create a new `context.Background` instead.
This flag is now enabled in Let's Encrypt staging/prod.
This change deprecates the flag and prepares it for deletion in a future
change. It can then be removed once no staging/prod configs reference the
flag.
Fixes#5236
This error class is only used in one instance, and when returned to
the user it is transformed into a `probs.Malformed` anyway. It does
more harm than good to keep this one-off BoulderError around, as
it introduces confusion about what sorts of errors we expose to the
client.
Fixes#5167
errors.As checks for a specific error in a wrapped error chain
(see https://golang.org/pkg/errors/#As) as opposed to asserting
that an error is of a specific type.
Part of #5010
Currently 99.99% of RSA keys we see in certificates at Let's Encrypt are
either 2048, 3072, or 4096 bits, but we support every 8 bit increment
between 2048 and 4096. Supporting these uncommon key sizes opens us up to
having to block much larger ranges of keys when dealing with something
like the Debian weak keys incident. Instead we should just reduce the
set of key sizes we support down to what people actually use.
Fixes#4835.
In #3708, we added formatters for the the convenience methods in the
`probs` package.
However, in #4783, @alexzorin pointed out that we were incorrectly
passing an error message through fmt.Sprintf as the format parameter
rather than as a value parameter.
I proposed a fix in #4784, but during code review we concluded that the
underlying problem was the pattern of using format-style functions that
don't have some variant of printf in the name. That makes this wrong:
`probs.DNS(err.Error())`, and this right: `probs.DNS("%s", err)`. Since
that's an easy mistake to make and a hard one to spot during code review,
we're going to stop using this particular pattern and call `fmt.Sprintf`
directly.
This PR reverts #3708 and adds some `fmt.Sprintf` where needed.