Add a new shared config stanza which all boulder components can use to
configure their Open Telemetry tracing. This allows components to
specify where their traces should be sent, what their sampling ratio
should be, and whether or not they should respect their parent's
sampling decisions (so that web front-ends can ignore sampling info
coming from outside our infrastructure). It's likely we'll need to
evolve this configuration over time, but this is a good starting point.
Add basic Open Telemetry setup to our existing cmd.StatsAndLogging
helper, so that it gets initialized at the same time as our other
observability helpers. This sets certain default fields on all
traces/spans generated by the service. Currently these include the
service name, the service version, and information about the telemetry
SDK itself. In the future we'll likely augment this with information
about the host and process.
Finally, add instrumentation for the HTTP servers and grpc
clients/servers. This gives us a starting point of being able to monitor
Boulder, but is fairly minimal as this PR is already somewhat unwieldy:
It's really only enough to understand that everything is wired up
properly in the configuration. In subsequent work we'll enhance those
spans with more data, and add more spans for things not automatically
traced here.
Fixes https://github.com/letsencrypt/boulder/issues/6361
---------
Co-authored-by: Aaron Gable <aaron@aarongable.com>
Although #6771 significantly cleaned up how gRPC services stop and clean
up, it didn't make any changes to our HTTP servers or our non-server
(e.g. crl-updater, log-validator) processes. This change finishes the
work.
Add a new helper method cmd.WaitForSignal, which simply blocks until one
of the three signals we care about is received. This easily replaces all
calls to cmd.CatchSignals which passed `nil` as the callback argument,
with the added advantage that it doesn't call os.Exit() and therefore
allows deferred cleanup functions to execute. This new function is
intended to be the last line of main(), allowing the whole process to
exit once it returns.
Reimplement cmd.CatchSignals as a thin wrapper around cmd.WaitForSignal,
but with the added callback functionality. Also remove the os.Exit()
call from CatchSignals, so that the main goroutine is allowed to finish
whatever it's doing, call deferred functions, and exit naturally.
Update all of our non-gRPC binaries to use one of these two functions.
The vast majority use WaitForSignal, as they run their main processing
loop in a background goroutine. A few (particularly those that can run
either in run-once or in daemonized mode) still use CatchSignals, since
their primary processing happens directly on the main goroutine.
The changes to //test/load-generator are the most invasive, simply
because that binary needed to have a context plumbed into it for proper
cancellation, but it already had a custom struct type named "context"
which needed to be renamed to avoid shadowing.
Fixes https://github.com/letsencrypt/boulder/issues/6794
Enable the errcheck linter. Update the way we express exclusions to use
the new, non-deprecated, non-regex-based format. Fix all places where we
began accidentally violating errcheck while it was disabled.
The CA, RA, and VA have multiple goroutines running alongside primary
gRPC handling goroutine. These ancillary goroutines should be gracefully
shut down when the process is about to exit. Historically, we have
handled this by putting a call to each of these goroutine's shutdown
function inside cmd.CatchSignals, so that when a SIGINT is received, all
of the various cleanup routines happen in sequence.
But there's a cleaner way to do it: just use defer! All of these
cleanups need to happen after the primary gRPC server has fully shut
down, so that we know they stick around at least as long as the service
is handling gRPC requests. And when the service receives a SIGINT,
cmd.CatchSignals will call the gRPC server's GracefulStop, which will
cause the server's .Serve() to finally exit, which will cause start() to
exit, which will cause main() to exit, which will cause all deferred
functions to be run.
In addition, remove filterShutdownErrors as the bug which made it
necessary (.Serve() returning an error even when GracefulShutdown() is
called) was fixed back in 2017. This allows us to call the start()
function in a much more natural way, simply logging any error it returns
instead of calling os.Exit(1) if it returns an error.
This allows us to simplify the exit-handling code in these three
services' main() functions, and lets us be a bit more idiomatic with our
deferred cleanup functions.
Part of #6794
Deprecate the ROCSPStage6 feature flag. Remove all references to the
`ocspResponse` column from the SA, both when reading from and when
writing to the `certificateStatus` table. This makes it safe to fully
remove that column from the database.
IN-8731 enabled this flag in all environments, so it is safe to
deprecate.
Part of #6285
Upgrade grpc to v1.53.0, as preparation for introducing OpenTelemetry,
which depends on that grpc version.
Two changes to our own code were necessitated by upstream changes:
1. Add a stub implementation of GetOrBuildProducer: this was added to
the balancer.SubConn interface by grpc v1.51.0
2. Change use of Endpoint field to Endpoint() method: the field was
removed and replaced by a method in
https://github.com/grpc/grpc-go/pull/5852. This also means that our
tests can't set the .Endpoint field, so the tests are updated to use the
.URL field instead, and a helper has been added to make that easy.
Part of #6361
Remove tracing using Beeline from Boulder. The only remnant left behind
is the deprecated configuration, to ensure deployability.
We had previously planned to swap in OpenTelemetry in a single PR, but
that adds significant churn in a single change, so we're doing this as
multiple steps that will each be significantly easier to reason about
and review.
Part of #6361
sa: rename AddPrecertificateRequest.IssuerID
to IssuerNameID. This is in preparation for adding a similarly-named
field to AddSerialRequest.
Part of #5152.
Add the GRPCStatus method to our BoulderError type, so that the gRPC
server code can automatically set an appropriate Status on all gRPC
responses, based on the kind of error that we return. We still serialize
the whole BoulderError type and details into the response metadata, so
that it can be rehydrated on the client side, but this allows the
gRPC-native Status to be something other than Unknown. As part of this
change, have our custom error serialization code stop manually setting
the gRPC status code to codes.Unknown.
This change allows the default gRPC prometheus metrics to more
accurately report the kinds of errors our gRPC requests experience, and
may allow us to more elegantly transition to using grpc.Status errors in
other places where they're relevant and useful.
Assign nonce prefixes for each nonce-service by taking the first eight
characters of the the base64url encoded HMAC-SHA256 hash of the RPC
listening address using a provided key. The provided key must be same
across all boulder-wfe and nonce-service instances.
- Add a custom `grpc-go` load balancer implementation (`nonce`) which
can route nonce redemption RPC messages by matching the prefix to the
derived prefix of the nonce-service instance which created it.
- Modify the RPC client constructor to allow the operator to override
the default load balancer implementation (`round_robin`).
- Modify the `srv` RPC resolver to accept a comma separated list of
targets to be resolved.
- Remove unused nonce-service `-prefix` flag.
Fixes#6404
Add a new gRPC server interceptor (both unary and streaming) which
verifies that the mTLS info set on the persistent connection has a
client cert which contains a name which is allowlisted for the
particular service being called, not just for the overall server.
This will allow us to make more services -- particularly the CA and the
SA -- more similar to the VA. We will be able to run multiple services
on the same port, while still being able to control access to those
services on a per-client basis. It will also let us split those services
(e.g. into read-only and read-write subsets) much more easily, because a
client will be able to switch which service it is calling without also
having to be reconfigured to call a different address. And finally, it
will allow us to simplify configuration for clients (such as the RA)
which maintain connections to multiple different services on the same
server, as they'll be able to re-use the same address configuration.
Turn bgrpc.NewServer into a builder-pattern, with a config-based
initialization, multiple calls to Add to add new gRPC services, and a
final call to Build to produce the start() and stop() functions which
control server behavior. All calls are chainable to produce compact code
in each component's main() function.
This improves the process of creating a new gRPC server in three ways:
1) It avoids the need for generics/templating, which was slightly
verbose.
2) It allows the set of services to be registered on this server to be
known ahead of time.
3) It greatly streamlines adding multiple services to the same server,
which we use today in the VA and will be using soon in the SA and CA.
While we're here, add a new per-service config stanza to the
GRPCServerConfig, so that individual services on the same server can
have their own configuration. For now, only provide a "ClientNames" key,
which will be used in a follow-up PR.
Part of #6454
Remove the need for clients to explicitly call bgrpc.NewClientMetrics,
by moving that call inside bgrpc.ClientSetup. In case ClientSetup is
called multiple times, use the recommended method to gracefully recover
from registering duplicate metrics. This makes gRPC client setup much
more similar to gRPC server setup after the previous server refactoring
change landed.
Collapse most of our boilerplate gRPC creation steps (in particular,
creating default metrics, making the server and listener, registering
the server, creating and registering the health service, filtering
shutdown errors from the output, and gracefully stopping) into a single
function in the existing bgrpc package. This allows all but one of our
server main functions to drop their calls to NewServer and
NewServerMetrics.
To enable this, create a new helper type and method in the bgrpc
package. Conceptually, this could be just a new function, but it must be
attached to a new type so that it can be generic over the type of gRPC
server being created. (Unfortunately, the grpc.RegisterFooServer methods
do not accept an interface type for their second argument).
The only main function which is not updated is the boulder-va, which is
a special case because it creates multiple gRPC servers but (unlike the
CA) serves them all on the same port with the same server and listener.
Part of #6452
- Add a new field, `RetryAfter` to `BoulderError`s
- Add logic to wrap/unwrap the value of the `RetryAfter` field to our gRPC error
interceptor
- Plumb `RetryAfter` for `DuplicateCertificateError` emitted by RA to the WFE
client response header
Part of #6256
- Fork the default `dns` resolver from `go-grpc` to add backend discovery via
DNS SRV resource records.
- Add new fields for SRV based discovery to `cmd.GRPCClientConfig`
- Add new (optional) field `DNSAuthority` for specifying custom DNS server to
`cmd.GRPCClientConfig`
- Add a utility method to `cmd.GRPCClientConfig` to simplify target URI and host
construction. With three schemes and `DNSAuthority` it makes more sense to
handle all of this parsing and construction outside of the RPC client
constructor.
Resolves#6111
Create new gRPC interceptors which are capable of working
on streaming gRPC methods. Add these new interceptors, as
well as the default metrics interceptor provided by grpc-prometheus,
to all of our gRPC clients and servers.
The new interceptors behave virtually identically to their unary
counterparts: they wrap and unwrap our custom errors from the
gRPC metadata, they increment and decrement the in-flight RPC
metric, and they ensure that the RPCs don't fail-fast and do have
enough time left in their deadline to actually finish.
Unfortunately, because the interfaces for unary and streaming
RPCs are so divergent, it's not feasible to share code between the
two kinds of interceptors. While much of the new code is copy-pasted
from the old interceptors, there are subtle differences (such as not
immediately deferring the local context's cancel() function).
Fixes#6356
Add a new `test.AssertNil()` helper to facilitate asserting that a given
unit test result is a non-boxed nil. Update `test.AssertNotNil()` to use
the reflect package's `.IsNil()` method to catch boxed nils.
In Go, variables whose type is constrained to be an interface type (e.g.
a function parameter which takes an interface, or the return value of a
function which returns `error`, itself an interface type) should
actually be thought of as a (T, V) tuple, where T is their underlying
concrete type and V is their underlying value. Thus, there are two ways
for such a variable to be nil-like: it can be truly nil where T=nil and
V is uninitialized, or it can be a "boxed nil" where T is a nillable
type such as a pointer or a slice and V=nil.
Unfortunately, only the former of these is == nil. The latter is the
cause of frequent bugs, programmer frustration, a whole entry in the Go
FAQ, and considerable design effort to remove from Go 2.
Therefore these two test helpers both call `t.Fatal()` when passed a
boxed nil. We want to avoid passing around boxed nils whenever possible,
and having our tests fail whenever we do is a good way to enforce good
nil hygiene.
Fixes#3279
Run the Boulder unit and integration tests with go1.19.
In addition, make a few small changes to allow both sets of
tests to run side-by-side. Mark a few tests, including our lints
and generate checks, as go1.18-only. Reformat a few doc
comments, particularly lists, to abide by go1.19's stricter gofmt.
Causes #6275
- Implement a static resolver for the gPRC dialer under the scheme `static:///`
which allows the dialer to resolve a backend from a static list of IPv4/IPv6
addresses passed via the existing JSON config.
- Add config key `serverAddresses` to the `GRPCClientConfig` which, when
populated, enables static IP resolution of gRPC server backends.
- Set `config-next` to use static gRPC backend resolution for all SA clients.
- Generate a new SA certificate which adds `10.77.77.77` and `10.88.88.88` to
the SANs.
Resolves#6255
Update:
- golangci-lint from v1.42.1 to v1.46.2
- protoc from v3.15.6 to v3.20.1
- protoc-gen-go from v1.26.0 to v1.28.0
- protoc-gen-go-grpc from v1.1.0 to v1.2.0
- fpm from v1.14.0 to v1.14.2
Also remove a reference to go1.17.9 from one last place.
This does result in updating all of our generated .pb.go files, but only
to update the version number embedded in each file's header.
Fixes#6123
These new linters are almost all part of golangci-lint's collection
of default linters, that would all be running if we weren't setting
`disable-all: true`. By adding them, we now have parity with the
default configuration, as well as the additional linters we like.
Adds the following linters:
* unconvert
* deadcode
* structcheck
* typecheck
* varcheck
* wastedassign
Use newly-available HandshakeContext instead of firing off a goroutine.
https://pkg.go.dev/crypto/tls#Conn.HandshakeContext
Don't set explicit min and max TLS versions. Go's defaults do a good job
protecting us here (and the max version prevented us from using TLS
1.3).
We have decided that we don't like the if err := call(); err != nil
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: It caused readers to believe that
a later err = call() statement was assigning to an already-declared err
in the local scope, when in fact it was assigning to an
already-declared err in the parent scope of a closure. This caused our
ineffassign and staticcheck linters to be unable to analyze the
lifetime of the err variable, and so they did not complain when we
never checked the actual value of that error.
This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
Add `stylecheck` to our list of lints, since it got separated out from
`staticcheck`. Fix the way we configure both to be clearer and not
rely on regexes.
Additionally fix a number of easy-to-change `staticcheck` and
`stylecheck` violations, allowing us to reduce our number of ignored
checks.
Part of #5681
Running an older version (v0.0.1-2020.1.4) of `staticcheck` in
whole-program mode (`staticcheck --unused.whole-program=true -- ./...`)
finds various instances of unused code which don't normally show up
as CI issues. I've used this to find and remove a large chunk of the
unused code, to pave the way for additional large deletions accompanying
the WFE1 removal.
Part of #5681
Update the version of golangci-lint we use in our docker image,
and update the version of the docker image we use in our tests.
Fix a couple places where we were violating lints (ineffective assign
and calling `t.Fatal` from outside the main test goroutine), and add
one lint (using math/rand) to the ignore list.
Fixes#5710
Remove the last of the gRPC wrapper files. In order to do so:
- Remove the `core.StorageGetter` interface. Replace it with a new
interface (whose methods include the `...grpc.CallOption` arg)
inside the `sa/proto/` package.
- Remove the `core.StorageAdder` interface. There's no real use-case
for having a write-only interface.
- Remove the `core.StorageAuthority` interface, as it is now redundant
with the autogenerated `sapb.StorageAuthorityClient` interface.
- Replace the `certificateStorage` interface (which appears in two
different places) with a single unified interface also in `sa/proto/`.
- Update all test mocks to include the `_ ...grpc.CallOption` arg in
their method signatures so they match the gRPC client interface.
- Delete many methods from mocks which are no longer necessary (mostly
because they're mocking old authz1 methods that no longer exist).
- Move the two `test/inmem/` wrappers into their own sub-packages to
avoid an import cycle.
- Simplify the `satest` package to satisfy one of its TODOs and to
avoid an import cycle.
- Add many methods to the `test/inmem/sa/` wrapper, to accommodate all
of the methods which are called in unittests.
Fixes#5600
Add a new method to the SA's gRPC interface which takes both an Order
and a list of new Authorizations to insert into the database, and adds
both (as well as the various ancillary rows) inside a transaction.
To enable this, add a new abstraction layer inside the `db/` package
that facilitates inserting many rows at once, as we do for the `authz2`,
`orderToAuthz2`, and `requestedNames` tables in this operation.
Finally, add a new codepath to the RA (and a feature flag to control it)
which uses this new SA method instead of separately calling the
`NewAuthorization` method multiple times. Enable this feature flag in
the config-next integration tests.
This should reduce the failure rate of the new-order flow by reducing
the number of database operations by coalescing multiple inserts into a
single multi-row insert. It should also reduce the incidence of new
authorizations being created in the database but then never exposed to
the subscriber because of a failure later in the new-order flow, both by
reducing failures overall and by adding those authorizations in a
transaction which will be rolled back if there is a later failure.
Fixes#5577
* Make `sa.SetOrderError` passthrough.
* Create new proto message `sapb.SetOrderErrorRequest`
that includes only the order id and error to avoid passing around
unnecessary fields of an order.
Part of: #5533
* Make `sa.NewOrder` passthrough.
* Create a new proto message `sapb.NewOrderRequest`
that includes only the information needed to store a new order.
Part of: #5533
* Make sa.SetOrderProcessing GRPC wrapper passthrough. Also, change the
server method to accept an `*sapb.OrderRequest{}` (essentially just an
order ID) as the parameter instead of a whole order.
Part of: #5533
- Make `CountRegistrationsByIP` a pass-through
- Make `CountRegistrationsByIPRange` a pass-through
- Make `CountOrders` a pass-through
- Make `CountFQDNSets` a pass-through
- Make `CountPendingAuthorizations2` a pass-through
- Make `CountInvalidAuthorizations2` a pass-through
Fixes#5535
- Make `GetAuthorization2` a pass-through
- Make `GetAuthorizations2` a pass-through
- Make `GetPendingAuthorization2` a pass-through
- Make `GetValidOrderAuthorizations2` a pass-through
- Make `GetValidAuthorizations2` a pass-through
- Make `NewAuthorizations2` a pass-through
- Make `FinalizeAuthorization2` a pass-through
- Make `DeactivateAuthorization2` a pass-through
Fixes#5534
Make the gRPC wrappers for the SA's `AddCertificate`,
`AddPrecertificate`, `AddSerial`, and `RevokeCertificate`
methods simple pass-throughs.
Fixup a couple tests that were passing only because their
requests to in-memory SA objects were not passing through
the wrapper's consistency checks.
Part of #5532
Add a new field, `issuerID`, to the `core.CertificateStatus` proto
message. Also add this field to the list of fields carried over when
converting back and forth between core and proto objects.
This should be safe to deploy without risk of miscommunication
between components that update at different times. If the SA
updates first and begins sending IssuerIDs to clients, they'll simply
ignore them. If the clients update first and try to extract IssuerIDs
from messages that don't have them, they'll simply get the zero
value, which is what they were getting before anyway.
Part of #5152 and #5532
Make the gRPC wrappers for sa.GetCertificate and
sa.GetPrecertificate bare passthroughs. The latter of
these already took and returned appropriate protobufs,
so this change mostly just makes the former look like the
latter.
Part of #5532
- Move `DeactivateAuthorization` logic from `grpc` to `ra` and `wfe`
- Update `ra` mocks in `wfe` tests
- Remove unnecessary marshalling between `core.Authorization` and
`corepb.Authorization` in `ra` tests.
Fixes#5562
Pull the "was the gRPC error a Canceled error" checking code out into a
separate interceptor, and add that interceptor only in the wfe and wfe2
gRPC clients.
Although the vast majority of our cancelations come from the HTTP client
disconnecting (and that cancelation being propagated through our gRPC
stack), there are a few other situations in which we cancel gRPC
connections, including when we receive a quorum of responses from VAs
and no longer need responses from the remaining remote VA(s). This
change ensures that we do not treat those other kinds of cancelations in
the same way that we treat client-initiated cancelations.
Fixes#5444
Remove all error checking and type transformation from the gRPC wrappers
for the following methods on the SA:
- GetRegistration
- GetRegistrationByKey
- NewRegistration
- UpdateRegistration
- DeactivateRegistration
Update callers of these methods to construct the appropriate protobuf
request messages directly, and to consume the protobuf response messages
directly. In many cases, this requires changing the way that clients
handle the `Jwk` field (from expecting a `JSONWebKey` to expecting a
slice of bytes) and the `Contacts` field (from expecting a possibly-nil
pointer to relying on the value of the `ContactsPresent` boolean field).
Implement two new methods in `sa/model.go` to convert directly between
database models and protobuf messages, rather than round-tripping
through `core` objects in between. Delete the older methods that
converted between database models and `core` objects, as they are no
longer necessary.
Update test mocks to have the correct signatures, and update tests to
not rely on `JSONWebKey` and instead use byte slices.
Fixes#5531
- Move `AdministrativelyRevokeCertificate` logic from `grpc` to `ra`
- Test new error conditions in `ra/ra_test.go`
- Update `ra` mocks in `wfe` tests
Fixes#5529
A recent mysql driver upgrade caused a performance regression. We
believe this may be due to cancellations getting passed through to the
database driver, which as of the upgrade will more aggressively tear
down connections that experienced a cancellation.
Also, we only recently started propagation cancellations all the way
from the frontend in #5404.
This makes it so the driver doesn't see the cancellation.
Second attempt at #5447
- Move response validation from `RA` client wrapper to `WFE` and `WFE2`
- Move request validation from `RA` server wrapper to `RA`
- Refactor `RA` tests to construct valid `core.Authorization` objects
- Consolidate multiple error declarations to global `errIncompleteGRPCRequest`
Fixes#5439
Create script which finds every .proto file in the repo and correctly
invokes `protoc` for each. Create a single file with a `//go:generate`
directive to invoke the new script. Delete all of the other generate.go
files, so that our proto generation is unified in one place.
Fixes#5453
Replace `core.Empty` with `google.protobuf.Empty` in all of our gRPC
methods which consume or return an empty protobuf. The golang core
proto libraries provide an empty message type, so there is no need
for us to reinvent the wheel.
This change is backwards-compatible and does not require a special
deploy. The protobuf message descriptions of `core.Empty` and
`google.protobuf.Empty` are identical, so their wire-formats are
indistinguishable and therefore interoperable / cross-compatible.
Fixes#5443
The //grpc/test_proto/generate.go file was not generating the protos
in its own directory, it was regenerating the VA protos. Therefore the
generated files were out of date, and were relying on an old version
of the go proto library, which we can now remove from our direct deps.
Part of #5443
Part of #5453
Update the signature of the RA's RevokeCertificateWithReg
method to exactly match that of the gRPC method it implements.
Remove all logic from the `RevokeCertificateWithReg` client
and server wrappers. Move the small amount of checking they
were performing directly into the server implementation.
Fixes#5440
Use the built-in grpc-go client and server interceptor chaining
utilities, instead of the ones provided by go-grpc-middleware.
Simplify our interceptors to call their handlers/invokers directly,
instead of delegating to the metrics interceptor, and add the
metrics interceptor to the chains instead.
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).
Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.
Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
protoc now generates grpc code in a separate file from protobuf code.
Also, grpc servers are now required to embed an "unimplemented"
interface from the generated .pb.go file, which provides forward
compatibility.
Update the generate.go files since the invocation for protoc has changed
with the split into .pb.org and _grpc.pb.go.
Fixes#5368
Update all of our tests to use `AssertMetricWithLabelsEquals`
instead of combinations of the older `CountFoo` helpers with
simple asserts. This coalesces all of our prometheus inspection
logic into a single function, allowing the deletion of four separate
helper functions.
Delete the ValidationAuthorityGRPCServer and ...GRPCClient structs,
and update references to instead reference the underlying vapb.VAClient
type directly. Also delete the core.ValidationAuthority interface.
Does not require updating interfaces elsewhere, as the client
wrapper already included the variadic grpc.CallOption parameter.
Fixes#5325
Delete the PublisherClientWrapper and PublisherServerWrapper. Update
various structs and functions to expect a pubpb.PublisherClient instead
of a core.Publisher; these two interfaces differ only in that the
auto-generated PublisherClient takes a variadic CallOptions parameter.
Update all mock publishers in tests to match the new interface. Finally,
delete the now-unused core.Publisher interface and some already-unused
mock-generating code.
This deletes a single sanity check (for a nil SCT even when there is a
nil error), but that check was redundant with an identical check in the
only extant client code in ctpolicy.go.
Fixes#5323
Delete the CertificateAuthorityClientWrapper, OCSPGeneratorClientWrapper,
and CertificateAuthorityServerWrapper structs, which provided no error
checking above and beyond their wrapped types. Replace them with the
corresponding auto-generated gRPC types in calling code. Update some
mocks to have the necessary variadic grpc.CallOption parameter. Finally,
delete the now-unused core.CertificateAuthority interface.
Fixes#5324
This allows servers to tell clients to go away after some period of time, which triggers the clients to re-resolve DNS.
Per grpc/grpc#12295, this is the preferred way to do this.
Related: #5307.
We do not present a validated timestamp in challenges where status = valid
as required by RFC8555.
This change is the first step to presenting challenge timestamps to the
client. It adds a timestamp to each place where we change a challenge to
valid. This only displays in the logs and will not display to the
subscriber because it is not yet stored somewhere retrievable. The next
step will be to store it in the database and then finally present it to
the client.
Part of #5198
We intend to delete the v1 API (i.e. `wfe` and its associated codepaths)
in the near future, and as such are not giving it new features or
capabilities. However, before then we intend to allow the v2 API to
provide issuance both from our RSA and from our ECDSA intermediates.
The v1 API cannot gain such capability at the same time.
The CA doesn't know which frontend originated any given issuance
request, so we can't simply gate the single- or double-issuer behavior
based on that. Instead, this change introduces the ability for the
WFE (and the RA, which sits between the WFE and the CA) to request
issuance from a specific intermediate. If the specified intermediate is
not available in the CA, issuance will fail. If no intermediate is
specified (as is the case in requests coming from wfe2), it falls back
to selecting the issuer based on the algorithm of the public key to
be signed.
Fixes#5216
In all boulder services, we construct a single tls.Config object
and then pass it into multiple gRPC setup methods. In all boulder
services but one, we pass the object into multiple clients, and
just one server. In general, this is safe, because all of the client
setup happens on the main thread, and the server setup similarly
happens on the main thread before spinning off the gRPC server
goroutine.
In the CA, we do the above and pass the tlsConfig object into two
gRPC server setup functions. Thus the first server goroutine races
with the setup of the second server.
This change removes the post-hoc assignment of MinVersion,
MaxVersion, and CipherSuites of the tls.Config object passed
to grpc.ClientSetup and grpc.NewServer. And adds those same
values to the cmd.TLSConfig.Load, the method responsible for
constructing the tls.Config object before it's passed to
grpc.ClientSetup and grpc.NewServer.
Part of #5159
This change adds two new test assertion helpers, `AssertErrorIs`
and `AssertErrorWraps`. The former is a wrapper around `errors.Is`,
and asserts that the error's wrapping chain contains a specific (i.e.
singleton) error. The latter is a wrapper around `errors.As`, and
asserts that the error's wrapping chain contains any error which is
of the given type; it also has the same unwrapping side effect as
`errors.As`, which can be useful for further assertions about the
contents of the error.
It also makes two small changes to our `berrors` package, namely
making `berrors.ErrorType` itself an error rather than just an int,
and giving `berrors.BoulderError` an `Unwrap()` method which
exposes that inner `ErrorType`. This allows us to use the two new
helpers above to make assertions about berrors, rather than
having to hand-roll equality assertions about their types.
Finally, it takes advantage of the two changes above to greatly
simplify many of the assertions in our tests, removing conditional
checks and replacing them with simple assertions.
This health service implements the gRPC Health Checking
Protocol, as defined in
https://github.com/grpc/grpc/blob/master/doc/health-checking.md
and as implemented by the gRPC authors in
https://pkg.go.dev/google.golang.org/grpc/health@v1.29.0
It simply instantiates a health service, and attaches it to the same
gRPC server that is handling requests to the primary (e.g. CA) service.
When the main service would be shut down (e.g. because it caught a
signal), it also sets the status of the service to NOT_SERVING.
This change also imports the health client into our grpc client,
ensuring that all of our grpc clients use the health service to inform
their load-balancing behavior.
This will be used to replace our current usage of polling the debug
port to determine whether a given service is up and running. It may
also be useful for more comprehensive checks and blackbox probing
in the future.
Part of #5074
The previous implementation of `IsAnyNilOrZero` did not in fact work,
and its tests did not catch this fact. Within the numeric clause, the
compiler would only instantiate the comparison literal 0 to be one
of the eight possible types. Comparisons against any of the other
seven types would always be false, no matter what value that type
held.
The tests did not catch this because they only tested two literal
values: `0` and `-12.345`, both of which can be `float64`s.
This change updates the utility function to use the `reflect` package,
to ensure that it works correctly. It also updates the test to test
multiple different kinds of numeric values, and removes the code
for handling pointer-to- types, as all of our proto2 code has been
removed.
Finally, it updates the SA wrapper's `RevokeCertificate` method to
correctly not require that `req.Reason` be non-zero: this field can
and often is zero, as that value represents `Unspecified`.
Using the reflect package is a conscious tradeoff. It will be slower
than manually writing out every single case, but it will also be less
prone to error.
Part of #5097
In preparation for moving to proto3 for the core/proto types
(Registration, Order, Authorization, Challenge, etc), this removes
checks that will fail when a proto2 client or server receives a message
from a proto3 client or server. Since proto3 encodes fields that have
their zero value as being absent (i.e. nil, in Go), we treat nil the
same as having a zero value.
In the process this change introduces additional checks, verifying
certain fields which should never have a zero value.
This involves factoring out registrationValid into registrationValid and
newRegistrationValid, since new registration requests lack some fields that
already-created registrations should always have.
Similarly, UpdateRegistration is changed to not verify that
`request.Update` is valid, since an update to registration object is not
a complete registration object - it may only update one field.
These were used during the transition to authzv2. The SA side of these
RPCs already ignores these booleans. This is just cleaning up the
protobufs and call sites.
As part of #5050, I'm updating some of the code in grpc/pb-marshaling.go
to move from nil checks to zero checks. In the process I'm introducing some
new zero checks, on things like challenge type, status, and token. This is
shaking out some places where our mocks have taken shortcuts by not
creating a "full" object including all fields that are normally present.
This PR updates our mocks and tests to provide more realistic objects in
all the places that broke when introducing those zero checks.
One slightly surprising / interesting thing: Since core types like
Order and Registration are still proto2 and have pointer fields,
there are actually some places in this PR where I had to add
a `*` rather than delete an `&`, because I was taking a pointer
field from one of those core types and passing it as a field in
an SA RPC request.
Fixes#5037.
Updates the Registration Authority to use proto3 for its
RPC methods. This turns out to be a fairly minimal change,
as many of the RA's request and response messages are
defined in core.proto, and are therefore still proto2.
Fixes#4955
Any field which can be zero must be allowed to be nil,
so that a proto2 server receiving requests from a proto3
client is willing to process messages with zero-value fields
encoded as missing.
Part of #4955
As part of the migration to proto3, any fields in requests that may be
zero should also be allowed to be nil. That's because proto3 will
represent those fields as absent when they have their zero value.
This is based on a manual review of the wrappers for the SA, plus
a pair of integration test runs. For the integration test runs I took these
steps:
1. Copy sa/proto to sa/proto2
2. Change sa/proto to use proto3 and regenerate.
3. In sa/*.go and cmd/boulder-sa/main.go, update the imports to use the
proto2 version.
4. Split grpc/sa-wrappers.go into sa-server-wrappers.go and sa-wrappers.go
(containing the client code)
5. In sa-server-wrappers.go, change the import to use sa/proto2.
6. In sa-server-wrappers.go, make a local copy of the core.StorageAuthority
interface that uses the sa/proto2 types. This was necessary as
a temporary kludge because of how the server wrapper internally
uses the core.StorageAuthority interface.
7. Fix all the pointer-vs-value build errors in every other package.
8. Run integration tests.
I also performed those steps with proto2 and proto3 swapped, to confirm the
behavior when a proto2 client talks to a proto3 SA.
ACME Challenges are well-known strings ("http-01", "dns-01", and
"tlsalpn-01") identifying which kind of challenge should be used
to verify control of a domain. Because they are well-known and
only certain values are valid, it is better to represent them as
something more akin to an enum than as bare strings. This also
improves our ability to ensure that an AcmeChallenge is not
accidentally used as some other kind of string in a different
context. This change also brings them closer in line with the
existing core.AcmeResource and core.OCSPStatus string enums.
Fixes#5009
Updates the type of the ValidationAuthority's PerformValidation
method to be identical to that of the corresponding auto-generated
grpc method, i.e. directly taking and returning proto message
types, rather than exploded arguments.
This allows all logic to be removed from the VA wrappers, which
will allow them to be fully removed after the migration to proto3.
Also updates all tests and VA clients to adopt the new interface.
Depends on #4983 (do not review first four commits)
Part of #4956
This is the only method on the ca which uses a non-proto
type as its request or response value. Changing this to
use a proto removes the last logic from the wrappers,
allowing them to be removed in a future CL. It also makes
the interface more uniform and easier to reason about.
Issue: #4940
Introduces a new generic helper utility to check that
fields of proto messages are non-nil and non-zero.
Uses this helper to simplify the ca RPC wrapper
methods, moving their completeness checks into
the underlying method handler. Also annotates the
completeness checks to justify which fields are or
are not being checked for future readers. Finally,
removes the similar non-nil checks from the client
wrappers, where they provide no marginal value.
Follow-up changes will do the same for other RPC
services, migrate said services to proto3, and change
the IssueCertificateForPrecertificate method to return
a corepb.Certificate instead of a core.Certificate, like
the other methods on the ca service.
Issues: #4955
We previously used mixed case names for proto imports
(e.g. both `caPB` and `rapb`), sometimes in the same file.
This change standardizes on the all-lowercase spelling,
which was predominant throughout the codebase.
Simplify database interactions
This change is a result of an audit of all places where
Go code directly constructs SQL queries and executes them
against a dbMap, with the goal of eliminating all instances
of constructing a well-known object type (such as a
core.CertificateStatus) from explicitly-listed database columns.
Instead, we should be relying on helper functions defined in the
sa itself to determine which columns are relevant for the
construction of any given object.
This audit did not find many places where this was occurring. It
did reveal a few simplifications, which are contained in this
change:
1) Greater use of existing SelectFoo methods provided by models.go
2) Streamlining of various SelectSingularFoo methods to always
select by serial string, rather than user-provided WHERE clause
3) One spot (in ocsp-responder) where using a well-known type seemed
better than using a more minimal custom type
Addresses #4899