Add `identifier` fields, which will soon replace the `dnsName` fields,
to:
- `corepb.Authorization`
- `corepb.Order`
- `rapb.NewOrderRequest`
- `sapb.CountFQDNSetsRequest`
- `sapb.CountInvalidAuthorizationsRequest`
- `sapb.FQDNSetExistsRequest`
- `sapb.GetAuthorizationsRequest`
- `sapb.GetOrderForNamesRequest`
- `sapb.GetValidAuthorizationsRequest`
- `sapb.NewOrderRequest`
Populate these `identifier` fields in every function that creates
instances of these structs.
Use these `identifier` fields instead of `dnsName` fields (at least
preferentially) in every function that uses these structs. When crossing
component boundaries, don't assume they'll be present, for
deployability's sake.
Deployability note: Mismatched `cert-checker` and `sa` versions will be
incompatible because of a type change in the arguments to
`sa.SelectAuthzsMatchingIssuance`.
Part of #7311
Update from go1.23.1 to go1.23.6 for our primary CI and release builds.
This brings in a few security fixes that aren't directly relevant to us.
Add go1.24.0 to our matrix of CI and release versions, to prepare for
switching to this next major version in prod.
Add "certificateProfileName" to the model used to insert new authz2 rows
and to the list of column names read when retrieving rows from the
authz2 table. Add support for this column to the functions which convert
to and from authz2 model types.
Add support for the profile field to core types so that it can be
returned by the SA.
Fixes https://github.com/letsencrypt/boulder/issues/7955
This uses a pattern that is new to our tests. setup accepts a variadic
list of options, and uses a type switch to make use of those options
during setup. This allows us to pass setup only the options that are
relevant to any given test case, leaving the rest to sensible defaults.
Part of #7920
There will be a followup removing the remaining places that set
`contactsPresent`.
---------
Co-authored-by: Jacob Hoffman-Andrews <jsha+github@letsencrypt.org>
In the WFE, store the User-Agent in a `context.Context` object. In our
gRPC interceptors, pass that field in a Metadata header, and re-add it
to `Context` on the server side.
Add a test in the gRPC interceptors that User-Agent is properly
propagated.
Note: this adds a new `setup()` function for the gRPC tests that is
currently only used by the new test. I'll upload another PR shortly that
expands the use of that function to more tests.
Fixes https://github.com/letsencrypt/boulder/issues/7792
- Make performRemoteValidation a more generic function that returns a
new remoteResult interface
- Modify the return value of IsCAAValid and PerformValidation to satisfy
the remoteResult interface
- Include compile time checks and tests that pass an arbitrary operation
- Remove Perspective and RIR from ValidationRecords
- Make ValidationResultToPB Perspective and RIR aware
- Update comment for VA.PerformValidation
- Make verificationRequestEvent more like doDCVAuditLog
- Update language used in problems created by performRemoteValidation to
be more like doRemoteDCV.
Add a new WFE & nonce config field, `NonceHMACKey`, which uses the new
`cmd.HMACKeyConfig` type. Deprecate the `NoncePrefixKey` config field.
Generalize the error message when validating `HMACKeyConfig` in
`config`.
Remove the deprecated `UseDerivablePrefix` config field, which is no
longer used anywhere.
Part of #7632
Clean up how we handle identifiers throughout the Boulder codebase by
- moving the Identifier protobuf message definition from sa.proto to
core.proto;
- adding support for IP identifier to the "identifier" package;
- renaming the "identifier" package's exported names to be clearer; and
- ensuring we use the identifier package's helper functions everywhere
we can.
This will make future work to actually respect identifier types (such as
in Authorization and Order protobuf messages) simpler and easier to
review.
Part of https://github.com/letsencrypt/boulder/issues/7311
Have our gRPC server interceptor check for excessive clock skew between
its own clock and gRPC client clocks. Do this by taking advantage of the
client request timestamp that most clients already supply for the
purpose of measuring cross-service latency. If the included timestamp is
more than 10 minutes from the gRPC server's local time, immediately
error out.
To keep the integration tests -- which heavily rely on clock
manipulation -- working, use build tags to disable this behavior during
integration testing.
Fixes https://github.com/letsencrypt/boulder/issues/7684
Begin testing on go1.23. To facilitate this, also update /x/net,
golangci-lint, staticcheck, and pebble-challtestsrv to versions which
support go1.23. As a result of these updates, also fix a handful of new
lint findings, mostly regarding passing non-static (i.e. potentially
user-controlled) format strings into Sprintf-style functions.
Additionally, delete one VA unittest that was duplicating the checks
performed by a different VA unittest, but with a context timeout bug
that caused it to break when go1.23 subtly changed DialContext behavior.
Find all gRPC fields which represent DNS Names -- sometimes called
"identifier", "hostname", "domain", "identifierValue", or other things
-- and unify their naming. This naming makes it very clear that these
values are strings which may be included in the SAN extension of a
certificate with type dnsName.
As we move towards issuing IP Address certificates, all of these fields
will need to be replaced by fields which carry both an identifier type
and value, not just a single name. This unified naming makes it very
clear which messages and methods need to be updated to support
non-dnsName identifiers.
Part of https://github.com/letsencrypt/boulder/issues/7647
Replace all of Boulder's usage of the Go stdlib "math/rand" package with
the newer "math/rand/v2" package which first became available in go1.22.
This package has an improved API and faster performance across the
board.
See https://go.dev/blog/randv2 and https://go.dev/blog/chacha8rand for
details.
Simplify SA.GetValidOrderAuthorizations2 so that it no longer conditions
the query on the status, expiry, or registration ID of the authorization
rows. This gives the query much better performance, because it no longer
tries to use an overly-large index, and fall back to large row-scans
when the query planner decides the index is too large.
While we're here, also improve the return type of
GetValidOrderAuthorizations2, so that instead of returning a map of
names to authorizations, it simply returns a list of authzs. This both
reduces the size of the gRPC message (once the old map is fully
removed), and improves its correctness because we cannot count on names
to be unique across multiple identifier types.
Finally, improve the RA code which calls SA.GetValidOrderAuthorizations2
to handle this improved return type, to make fewer assumptions about
identifier types, and to separate static authorization-checking from CAA
rechecking.
Fixes https://github.com/letsencrypt/boulder/issues/7645
Changes the default grpc client/server histogram buckets from the
defaults to better track the long tail of slow requests. Removes `.005`
and `.25` granularity in favor of adding the larger values of `45` and `90`
to avoid changing the cardinality.
```
# Before, the default prometheus buckets
[]float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}
# After
[]float64{.01, .025, .05, .1, .5, 1, 2.5, 5, 10, 45, 90}
```
Fixes https://github.com/letsencrypt/boulder/issues/6384
Directly update:
- go.opentelemetry.io/otel/* from v1.26.0 to v1.27.0
- go.opentelemetry.io/contrib/* from v0.51.0 to v0.52.0
Indirectly update:
- google.golang.org/protobuf from v1.33.0 to v1.34.0
This update breaks some of our existing otel grpc interceptors, but in
return allows us to use the newer grpc StatsHandler mechanism, while
still filtering out health-check requests.
Fixes https://github.com/letsencrypt/boulder/issues/7235
Update the version of protoc-gen-go-grpc that we use to generate Go gRPC
code from our proto files, and update the versions of other gRPC tools
and libraries that we use to match. Turn on the new
`use_generic_streams` code generation flag to change how
protoc-gen-go-grpc generates implementations of our streaming methods,
from creating a wholly independent implementation for every stream to
using shared generic implementations.
Take advantage of this code-sharing to remove our SA "wrapper" methods,
now that they have truly the same signature as the SARO methods which
they wrap. Also remove all references to the old-style stream names
(e.g. foopb.FooService_BarMethodClient) and replace them with the new
underlying generic names, for the sake of consistency. Finally, also
remove a few custom stream test mocks, replacing them with the generic
mocks.ServerStreamClient.
Note that this PR does not change the names in //mocks/sa.go, to avoid
conflicts with work happening in the pursuit of
https://github.com/letsencrypt/boulder/issues/7476. Note also that this
PR updates the version of protoc-gen-go-grpc that we use to a specific
commit. This is because, although a new release of grpc-go itself has
been cut, the codegen binary is a separate Go module with its own
releases, and it hasn't had a new release cut yet. Tracking for that is
in https://github.com/grpc/grpc-go/issues/7030.
The summary here is:
- Move test/cert-ceremonies to test/certs
- Move .hierarchy (generated by the above) to test/certs/webpki
- Remove our mapping of .hierarchy to /hierarchy inside docker
- Move test/grpc-creds to test/certs/ipki
- Unify the generation of both test/certs/webpki and test/certs/ipki
into a single script at test/certs/generate.sh
- Make that script the entrypoint of a new docker compose service
- Have t.sh and tn.sh invoke that service to ensure keys and certs are
created before tests run
No production changes are necessary, the config changes here are just
for testing purposes.
Part of https://github.com/letsencrypt/boulder/issues/7476
Set the minimum TLS version used for communication with gRPC, Redis,
and Unbound to 1.3. Also remove deprecated `SecurityVersion` setting in
`clientTransportCredentials` and `serverTransportCredentials`, as
grpc-go now uses the settings provided by the `tls.Config`.
The http-01 and tls-alpn-01challenges are not affected:
- 939ac1be8f/va/http.go (L140-L157)
- 939ac1be8f/va/tlsalpn.go (L213-L217)
Upgrade from the old go-jose v2.6.1 to the newly minted go-jose v4.0.1.
Cleans up old code now that `jose.ParseSigned` can take a list of
supported signature algorithms.
Fixes https://github.com/letsencrypt/boulder/issues/7390
---------
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
When passing detailed error information between services as gRPC
metadata, ensure that the suberrors being sent contain only ascii
characters, because gRPC metadata is sent as HTTP headers which only
allow visible ascii characters.
Also add a regression test.
Adds the chosen DNS resolver to the VAs `ValidationRecord` object so
that for each challenge type during a validation, boulder can audit log
the resolver(s) chosen to fulfill the request..
Fixes https://github.com/letsencrypt/boulder/issues/7140
Many services already have --addr and/or --debug-addr flags.
However, it wasn't universal, so this PR adds flags to commands where
they're not currently present.
This makes it easier to use a shared config file but listen on different
ports, for running multiple instances on a single host.
The config options are made optional as well, and removed from
config-next/.
This is a cleanup PR finishing the migration from int64 timestamps to
protobuf `*timestamppb.Timestamps` by removing all usage of the old
int64 fields. In the previous PR
https://github.com/letsencrypt/boulder/pull/7121 all fields were
switched to read from the protobuf timestamppb fields.
Adds a new case to `core.IsAnyNilOrZero` to check various properties of
a `*timestamppb.Timestamp` reducing the visual complexity for receivers.
Fixes https://github.com/letsencrypt/boulder/issues/7060
* Renames all of int64 as a time.Duration fields to `<fieldname>NS` to
indicate they are Unix nanoseconds.
* Adds new `google.protobuf.Duration` fields to each .proto file where
we previously had been using an int64 field to populate a time.Duration.
* Updates relevant gRPC messages.
Part 1 of 3 for https://github.com/letsencrypt/boulder/issues/7097
* Adds new `google.protobuf.Timestamp` fields to each .proto file where
we had been using `int64` fields as a timestamp.
* Updates relevant gRPC messages to populate the new
`google.protobuf.Timestamp` fields in addition to the old `int64`
timestamp fields.
* Added tests for each `<x>ToPB` and `PBto<x>` functions to ensure that
new fields passed into a gRPC message arrive as intended.
* Removed an unused error return from `PBToCert` and `PBToCertStatus`
and cleaned up each call site.
Built on-top of https://github.com/letsencrypt/boulder/pull/7069
Part 2 of 4 related to
https://github.com/letsencrypt/boulder/issues/7060
Allow gRPC SRV resolver to succeed even when some names are not resolved
successfully. Cross-DC services (e.g. nonce) will fail to resolve when
the link between DCs is severed or one DC is taken offline, this should
not result in hard gRPC service failures.
Fixes#6974
Fix an issue related to the custom gRPC Picker implementation introduced
in #6618. When a nonce contained a prefix not associated with a known
backend, the Picker would continuously rebuild, re-resolve DNS, and
eventually throw a 500 "Server Error" at RPC timeout. The Picker now
promptly returns a 400 "Bad Nonce" error as expected, in response the
requesting client should retry their request with a fresh nonce.
Additionally:
- WFE unit tests use derived nonces when `"BOULDER_CONFIG_DIR" ==
"test/config-next"`.
- `Balancer.Build()` in "noncebalancer" forces a rebuild until non-zero
backends are available. This matches the
[balancer/roundrobin](d524b40946/balancer/roundrobin/roundrobin.go (L49-L53))
implementation.
- Nonces with no matching backend increment "jose_errors" with label
`"type": "JWSInvalidNonce"` and "nonce_no_backend_found".
- Nonces of incorrect length are now rejected at the WFE and increment
"jose_errors" with label `"type": "JWSMalformedNonce"` instead of
`"type": "JWSInvalidNonce"`.
- Nonces not encoded as base64url are now rejected at the WFE and
increment "jose_errors" with label `"type": "JWSMalformedNonce"` instead
of `"type": "JWSInvalidNonce"`.
Fixes#6969
Part of #6974
Add the necessary scaffolding for deep health checking of our various
gRPC components. Each component implementation that also implements the
grpc.checker interface will be checked periodically, and the health
status of the component will be updated accordingly.
Add the necessary methods to SA to implement the grpc.checker interface
and register these new health checks with Consul.
Additionally:
- Update entry point script to check for ProxySQL readiness.
- Increase the poll rate for gRPC Consul checks from 5s to 2s to help
with DNS failures, due to check failures, on startup.
- Change log level for Consul from INFO to ERROR to deal with noisy logs
full of transport failures due to Consul gRPC checks firing before the
SAs are up.
Fixes#6878
Part of #6795
Removes the `Hostname` and `Port` fields from an http-01
ValidationRecord model prior to storing the record in the database.
Using `"hostname":"example.com","port":"80"` as a snippet of a whole
validation record, we'll save minimum 36 bytes for each new http-01
ValidationRecord that gets stored. When retrieving the record, the
ValidationRecord `RehydrateHostPort` method will repopulate the
`Hostname` and `Port` fields from the `URL` field.
Fixes the main goal of
https://github.com/letsencrypt/boulder/issues/5231.
---------
Co-authored-by: Samantha <hello@entropy.cat>
Currently we set WaitForReady(true), which causes gRPC requests to not
fail immediately if no backends are available, but instead wait until
the timeout in case a backend does become available. The downside is
that this behavior masks true connection errors. We'd like to turn it
off.
Fixes#6834