This means that we don't have to test whether it is nil before
filling it in with the old deprecated values; we can just assume
that it exists whether or not the json config has a matching stanza.
All handling of uninitialized values (i.e. empty strings for key files)
is handled in `goodkey.NewKeyPolicy`.
Add a new check to GoodKey which attempts to factor the public modulus
of the presented key using Fermat's factorization method. This method
will succeed if and only if the prime factors are very close to each
other -- i.e. almost certainly were not selected independently from a
random uniform distribution, but were instead calculated via some other
less secure method.
To support this new feature, add a new config flag to the RA, CA, and
WFE, which all use the GoodKey checks. As part of adding this new config
value, refactor the GoodKey config items into their own config struct
which can be re-used across all services.
If the new `FermatRounds` config value has not been set, it will default
to zero, causing no factorization to be attempted.
Fixes#5850
Part of #5851
The resulting `boulder` binary can be invoked by different names to
trigger the behavior of the relevant subcommand. For instance, symlinking
and invoking as `boulder-ca` acts as the CA. Symlinking and invoking as
`boulder-va` acts as the VA.
This reduces the .deb file size from about 200MB to about 20MB.
This works by creating a registry that maps subcommand names to `main`
functions. Each subcommand registers itself in an `init()` function. The
monolithic `boulder` binary then checks what name it was invoked with
(`os.Args[0]`), looks it up in the registry, and invokes the appropriate
`main`. To avoid conflicts, all of the old `package main` are replaced
with `package notmain`.
To get the list of registered subcommands, run `boulder --list`. This
is used when symlinking all the variants into place, to ensure the set
of symlinked names matches the entries in the registry.
Fixes#5692
Remove the last of the gRPC wrapper files. In order to do so:
- Remove the `core.StorageGetter` interface. Replace it with a new
interface (whose methods include the `...grpc.CallOption` arg)
inside the `sa/proto/` package.
- Remove the `core.StorageAdder` interface. There's no real use-case
for having a write-only interface.
- Remove the `core.StorageAuthority` interface, as it is now redundant
with the autogenerated `sapb.StorageAuthorityClient` interface.
- Replace the `certificateStorage` interface (which appears in two
different places) with a single unified interface also in `sa/proto/`.
- Update all test mocks to include the `_ ...grpc.CallOption` arg in
their method signatures so they match the gRPC client interface.
- Delete many methods from mocks which are no longer necessary (mostly
because they're mocking old authz1 methods that no longer exist).
- Move the two `test/inmem/` wrappers into their own sub-packages to
avoid an import cycle.
- Simplify the `satest` package to satisfy one of its TODOs and to
avoid an import cycle.
- Add many methods to the `test/inmem/sa/` wrapper, to accommodate all
of the methods which are called in unittests.
Fixes#5600
In go1.17, the `x509.CreateCertificate()` method fails if the provided
Signer (private key) and Parent (cert with public key) do not match.
This change both updates the lint library to create and use an issuer
cert whose public key matches the throwaway private key used for lint
signatures, and overhauls its public interface for readability and
simplicity.
Rename the `lint` library to `linter`, to allow other methods to be
renamed to reduce word repetition. Reduce the linter library interface
to three functions: `Check()`, `New()`, and `Linter.Check()` by making
all helper functions private. Refactor the top-level `Check()` method to
rely on `New()` and `Linter.Check()` behind the scenes. Finally, create
a new helper method for creating a lint issuer certificate, call this
new method from `New()`, and store the result in the `Linter` struct.
Part of #5480
- Remove field `ECDSAAllowedAccounts` from CA
- Remove `ECDSAAllowedAccounts` from CA tests
- Replace `ECDSAAllowedAccounts` with `ECDSAAllowListFilename` in
`test/config/ca-a.json` and `test/config/ca-b.json`
- Add YAML allow list file at `test/config/ecdsaAllowList.yml`
Fixes#5394
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).
Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.
Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
Solve a nil pointer dereference of `ecdsaAllowList` in `boulder-ca` by
calling `reloader.New()` in constructor `ca.NewECDSAAllowListFromFile`
instead.
- Add missing entry `ECDSAAllowListFilename` to
`test/config-next/ca-a.json` and `test/config-next/ca-b.json`
- Add missing file ecdsaAllowList.yml to `test/config-next`
- Add missing entry `ECDSAAllowedAccounts` to `test/config/ca-a.json`
and `test/config/ca-b.json`
- Move creation of the reloader to `NewECDSAAllowListFromFile`
Fixes#5414
- Add field `ECDSAAllowListFilename` to `config.CA`
- Move ECDSA allow list logic from `boulder-ca/main.go` to new file
`ca/ecdsa_allow_list.go`
- Add field `ecdsaAllowList` to `certificateAuthorityImpl`
- Update units test to account for changes to `certificateAuthorityImpl`
- Move previous allow list unit tests to `TestDeprecatedECDSAAllowList`
- Add `TestECDSAAllowList` units tests
Fixes#5361
Create a new `ocspImpl` struct which satisfies the interface required
by the `OCSPGenerator` gRPC service. Move the `GenerateOCSP`
method from the `certificateAuthorityImpl` to this new type. To support
existing gRPC clients, keep a reference to the new OCSP service in
the CA impl, and maintain a pass-through `GenerateOCSP` method.
Simplify some of the CA setup code, and make the CA implementation
non-exported because it doesn't need to be.
In order to maintain our existing signature and sign error metrics,
they now need to be initialized outside the CA and OCSP constructors.
This complicates the tests slightly, but seems like a worthwhile
tradeoff.
Fixes#5226Fixes#5086
Make the `NonCFSSLSigner` code path the only code path through the CA.
Remove all code related to the old, CFSSL-based code path. Update tests
to supply (or mock) issuers of the new kind. Remove or simplify a few
tests that were testing for behavior only exhibited by the old code
path, such as incrementing certain metrics. Remove code from `//cmd/`
for initializing the CFSSL library. Finally, mark the `NonCFSSLSigner`
feature flag itself as deprecated.
Delete the portions of the vendored CFSSL code which were only used
by these deleted code paths. This does not remove the CFSSL library
entirely, the rest of the cleanup will follow shortly.
Part of #5115
Delete the CertificateAuthorityClientWrapper, OCSPGeneratorClientWrapper,
and CertificateAuthorityServerWrapper structs, which provided no error
checking above and beyond their wrapped types. Replace them with the
corresponding auto-generated gRPC types in calling code. Update some
mocks to have the necessary variadic grpc.CallOption parameter. Finally,
delete the now-unused core.CertificateAuthority interface.
Fixes#5324
Named field `DB`, in a each component configuration struct, acts as the
receiver for the value of `db` when component JSON files are
unmarshalled.
When `cmd.DBConfig` fields are received at the root of component
configuration struct instead of `DB` copy them to the `DB` field of the
component configuration struct.
Move existing `cmd.DBConfig` values from the root of each component's
JSON configuration in `test/config-next` to `db`
Part of #5275
`cai.Stop()` called from boulder-ca could potentially exit before errors
emitted by `caSrv` and `ocspSrv` are logged. This could lead to
boulder-ca erroneously exiting `0` instead of `1`.
Add a `sync.WaitGroup`. Increment the waitgroup before `caSrv.Serve()`
and `ocspSrv.Serv()` are spun off. Await the waitgroup before
`cai.Stop()` is called.
Fixes#5246
Currently, the CA is configured with a set of `internalIssuer`s,
and a mapping of public key algorithms (e.g. `x509.RSA`) to which
internalIssuer to use. In operation today, we use the same issuer
for all kinds of public key algorithms. In the future, we will use
different issuers for different algorithms (in particular, we will
use R3 to issue for RSA keys, and E1 to issue for ECDSA keys). But
we want to roll that out slowly, continuing to use our RSA issuer
to issue for all types of public keys, except for ECDSA keys which
are presented by a specific set of allowed accounts.
This change adds a new config field to the CA, which lets us specify
a small list of registration IDs which are allowed to have issuance
from our ECDSA issuer. If the config list is empty, then all accounts
are allowed. The CA checks to see if the key being issued for is
ECDSA: if it is, it then checks to make sure that the associated
registration ID is in the allowlist. If the account is not allowed,
it then overrides the issuance algorithm to use RSA instead,
mimicking our old behavior. It also adds a new feature flag, which
can be enabled to skip the allowlist entirely (effectively allowing
all registered accounts). This feature flag will be enabled when
we're done with our testing and confident in our ECDSA issuance.
Fixes#5259
This adds a new component to the CA, ocspLogQueue, which batches up
OCSP generation events for audit logging. It will log accumulated
events when it reaches a certain line length, or when a maximum amount
of times has passed.
This was already part done: There is an ID() method in issuance. This
change extends that by:
- Defining a type alias indicating something is an IssuerID.
- Defining issuance.Certificate, which also has an ID() method,
so that components that aren't the CA can load certificates and
use the type system to mark them as issuers (and get their IDs).
- Converting akamai-purger and ca to use the new types.
- Removing idForIssuer from ca.go.
This health service implements the gRPC Health Checking
Protocol, as defined in
https://github.com/grpc/grpc/blob/master/doc/health-checking.md
and as implemented by the gRPC authors in
https://pkg.go.dev/google.golang.org/grpc/health@v1.29.0
It simply instantiates a health service, and attaches it to the same
gRPC server that is handling requests to the primary (e.g. CA) service.
When the main service would be shut down (e.g. because it caught a
signal), it also sets the status of the service to NOT_SERVING.
This change also imports the health client into our grpc client,
ensuring that all of our grpc clients use the health service to inform
their load-balancing behavior.
This will be used to replace our current usage of polling the debug
port to determine whether a given service is up and running. It may
also be useful for more comprehensive checks and blackbox probing
in the future.
Part of #5074
The CA is the only service which still defines its json config format
in the package itself, rather than in its corresponding boulder-ca cmd
package. This has allowed the CA's constructor interface to hide
arbitrary complexity inside its first argument, the whole config blob.
This change moves the CA's config to boulder-ca/main.go, to match
the other Boulder components. In the process, it makes a host of
other improvements:
It refactors the issuance package to have a cleaner configuration
interface. It also separates the config into a high-level profile (which
applies equally to all issuers), and issuer-level profiles (which apply
only to a single issuer). This does involve some code duplication,
but that will be removed when CFSSL goes away.
It adds helper functions to the issuance package to make it easier
to construct a new issuer, and takes advantage of these in the
boulder-ca package. As a result, the CA now receives fully-formed
Issuers at construction time, rather than constructing them from
nearly-complete configs during its own initialization.
It adds a Linter struct to the lint package, so that an issuer can
simply carry around a Linter, rather than a separate lint signing
key and registry of lints to run.
It makes CFSSL-specific code more clearly marked as such,
making future removal easier and cleaner.
Fixes#5070Fixes#5076
We define a "signer" to be a private key, or something that satisfies the
crypto.Signer interface. We define an "issuer" to be an object which has
both a signer (so it can sign things) and a certificate (so that the things
it signs can have appropriate issuer fields set).
As a result, this change:
- moves the new "signer" library to be called "issuance" instead
- renames several "signers" to instead be "issuers", as defined above
- renames several "issuers" to instead be "certs", to reduce confusion more
There are some further cleanups which could be made, but most of them
will be made irrelevant by the removal of the CFSSL code, so I'm leaving
them be for now.
This just changes the `loadCFSSLIssuers` signature to more closely match
the `loadBoulderIssuers` signature (it didn't need access to the whole config
object), and standardize our json on lowercase string keys.
Adds a replacement issuance library that replaces CFSSL. Usage of the
new library is gated by a feature, meaning until we fully deploy the
new signer we need to support both the new one and CFSSL, which makes
a few things a bit complicated.
One Big follow-up change is that once CFSSL is completely gone we'll
be able to stop using CSRs as the internal representation of issuance
requests (i.e. instead of passing a CSR all the way through from the
WFE -> CA and then converting it to the new signer.IssuanceRequest,
we can just construct a signer.IssuanceRequest at the WFE (or RA) and
pass that through the backend instead, making things a lot less opaque).
Fixes#4906.
We previously used mixed case names for proto imports
(e.g. both `caPB` and `rapb`), sometimes in the same file.
This change standardizes on the all-lowercase spelling,
which was predominant throughout the codebase.
This commit consists of three classes of changes:
1) Changing various command main.go files to always behave as they
would have when features.BlockedKeyTable was true. Also changing
one test in the same manner.
2) Removing the BlockedKeyTable flag from configuration in config-next,
because the flag is already live.
3) Moving the BlockedKeyTable flag to the "deprecated" section of
features.go, and regenerating featureflag_strings.go.
A future change will remove the BlockedKeyTable flag (and other
similarly deprecated flags) from features.go entirely.
Fixes#4873
This is a breaking API change: pkcs11key now takes as input a public key rather than
a private key label. In order to find the private key, it first finds the public key's CKA_ID
in the token, then looks for a private key with the same CKA_ID. From ftp://ftp.rsasecurity.com/pub/pkcs/pkcs-11/v2-30/pkcs-11v2-30b-d6.pdf:
> The CKA_ID field is intended to distinguish among multiple keys. In
the case of public and private keys, this field assists in handling
multiple keys held by the same subject; the key identifier for a
public key and its corresponding private key should be the same.
This does require that both the public key and private key are present and have
appropriate CKA_IDs set. I've verified this is the case in prod. In our integration
testing environment it was not the case, so I've tweaked entrypoint.sh to load
public keys into SoftHSM and set their CKA_ID.
The initial part of this change was written by @cpu. I've reviewed and approved
those commits.
We occasionally have reason to block public keys from being used in CSRs
or for JWKs. This work adds support for loading a YAML blocked keys list
to the WFE, the RA and the CA (all the components already using the
`goodekey` package).
The list is loaded in-memory and is intended to be used sparingly and
not for more complicated mass blocking scenarios. This augments the
existing debian weak key checking which is specific to RSA keys and
operates on a truncated hash of the key modulus. In comparison the
admin. blocked keys are identified by the Base64 encoding of a SHA256
hash over the DER encoding of the public key expressed as a PKIX subject
public key. For ECDSA keys in particular we believe a more thorough
solution would have to consider inverted curve points but to start we're
calling this approach "Good Enough".
A utility program (`block-a-key`) is provided that can read a PEM
formatted x509 certificate or a JSON formatted JWK and emit lines to be
added to the blocked keys YAML to block the related public key.
A test blocked keys YAML file is included
(`test/example-blocked-keys.yml`), initially populated with a few of the
keys from the `test/` directory. We may want to do a more through pass
through Boulder's source code and add a block entry for every test
private key.
Resolves https://github.com/letsencrypt/boulder/issues/4404
Retains the existing logging of orphaned certs until we are confident that this
solution can fully replace it (even then we may want to keep it just for auditing etc).
Fixes#3636.
We may see RPCs that are dispatched by a client but do not arrive at the server for some time afterwards. To have insight into potential request latency at this layer we want to publish the time delta between when a client sent an RPC and when the server received it.
This PR updates the gRPC client interceptor to add the current time to the gRPC request metadata context when it dispatches an RPC. The server side interceptor is updated to pull the client request time out of the gRPC request metadata. Using this timestamp it can calculate the latency and publish it as an observation on a Prometheus histogram.
Accomplishing the above required wiring a clock through to each of the client interceptors. This caused a small diff across each of the gRPC aware boulder commands.
A small unit test is included in this PR that checks that a latency stat is published to the histogram after an RPC to a test ChillerServer is made. It's difficult to do more in-depth testing because using fake clocks makes the latency 0 and using real clocks requires finding a way to queue/delay requests inside of the gRPC mechanisms not exposed to Boulder.
Updates https://github.com/letsencrypt/boulder/issues/3635 - Still TODO: Explicitly logging latency in the VA, tracking outstanding RPCs as a gauge.
This commit updates the `boulder-ra` and `boulder-ca` commands to refuse
to start if their configured `MaxNames` is 0 (the default value). This
should always be set to a positive number.
This commit also updates `csr/csr.go` to always apply the max names
check since it will never be 0 after the change above.
Also refactor `FailOnError` to pull out a separate `Fail` function.
Related to https://github.com/letsencrypt/boulder/issues/3632
Our various main.go functions gated some key code on whether the TLS
and/or GRPC config fields were present. Now that those fields are fully
deployed in production, we can simplify the code and require them.
Also, rename tls to tlsConfig everywhere to avoid confusion with the tls
package.
Avoid assigning to the same err from two different goroutines in
boulder-ca (fix a race).
Previously, there was a disagreement between WFE and CA as to what the correct
issuer certificate was. Consolidate on test-ca2.pem (h2ppy h2cker fake CA).
Also, the CA configs contained an outdated entry for "IssuerCert", which was not
being used: The CA configs now use an "Issuers" array to allow signing by
multiple issuer certificates at once (for instance when rolling intermediates).
Removed this outdated entry, and the config code for CA to load it. I've
confirmed these changes match what is currently in production.
Added an integration test to check for this problem in the future.
Fixes#3309, thanks to @icing for bringing the issue to our attention!
This also includes changes from #3321 to clarify certificates for WFE.
The go-grpc-prometheus package by default registers its metrics with Prometheus' global registry. In #3167, when we stopped using the global registry, we accidentally lost our gRPC metrics. This change adds them back.
Specifically, it adds two convenience functions, one for clients and one for servers, that makes the necessary metrics object and registers it. We run these in the main function of each server.
I considered adding these as part of StatsAndLogging, but the corresponding ClientMetrics and ServerMetrics objects (defined by go-grpc-prometheus) need to be subsequently made available during construction of the gRPC clients and servers. We could add them as fields on Scope, but this seemed like a little too much tight coupling.
Also, update go-grpc-prometheus to get the necessary methods.
```
$ go test github.com/grpc-ecosystem/go-grpc-prometheus/...
ok github.com/grpc-ecosystem/go-grpc-prometheus 0.069s
? github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto [no test files]
```
Previously, we used prometheus.DefaultRegisterer to register our stats, which uses global state to export its HTTP stats. We also used net/http/pprof's behavior of registering to the default global HTTP ServeMux, via DebugServer, which starts an HTTP server that uses that global ServeMux.
In this change, I merge DebugServer's functions into StatsAndLogging. StatsAndLogging now takes an address parameter and fires off an HTTP server in a goroutine. That HTTP server is newly defined, and doesn't use DefaultServeMux. On it is registered the Prometheus stats handler, and handlers for the various pprof traces. In the process I split StatsAndLogging internally into two functions: makeStats and MakeLogger. I didn't port across the expvar variable exporting, which serves a similar function to Prometheus stats but which we never use.
One nice immediate effect of this change: Since StatsAndLogging now requires and address, I noticed a bunch of commands that called StatsAndLogging, and passed around the resulting Scope, but never made use of it because they didn't run a DebugServer. Under the old StatsD world, these command still could have exported their stats by pushing, but since we moved to Prometheus their stats stopped being collected. We haven't used any of these stats, so instead of adding debug ports to all short-lived commands, or setting up a push gateway, I simply removed them and switched those commands to initialize only a Logger, no stats.
We removed most of the component-specific configs out of cmd a long time ago. We
left CAConfig in, because it is used directly in the NewCertificateAuthority
constructor. That means that moving CAConfig into cmd/boulder-ca would have
resulted in a circular dependency.
Eventually we probably want to decompose CAConfig so it's a set of arguments to
NewCertificateAuthority, but as a short term improvement, move the config into
its own package to break the dependency. This has the advantage of removing a
couple of big dependencies from cmd.
Fixes#3020.
In order to write integration tests for some features, especially related to rate limiting, rechecking of CAA, and expiration of authzs, orders, and certs, we need to be able to fake the passage of time in integration tests.
To do so, this change switches out all clock.Default() instances for cmd.Clock(), which can be set manually with the FAKECLOCK environment variable. integration-test.py now starts up all servers once before the main body of tests, with FAKECLOCK set to a date 70 days ago, and does some initial setup for a new integration test case. That test case tries to fetch a 70-day-old authz URL, and expects it to 404.
In order to make this work, I also had to change a number of our test binaries to shut down cleanly in response to SIGTERM. Without that change, stopping the servers between the setup phase and the main tests caused startservers.check() to fail, because some processes exited with nonzero status.
Note: This is an initial stab at things, to prove out the technique. Long-term, I think we will want to use an idiom where test cases are classes that have a number of optional setup phases that may be run at e.g. 70 days prior and 5 days prior. This could help us avoid a proliferation of global state as we add more time-dependent test cases.
Previously, we would produce an error an a nonzero status code on shutdown,
because gRPC's GracefulStop would cause s.Serve() to return an error. Now we
filter that specific error and treat it as success. This also allows us to kill
process with SIGTERM instead of SIGKILL in integration tests.
Fixes#2410.
An early design mistake meant that some fields of our services were exported
unnecessarily. In particular, fields storing handles of other services (e.g.
"SA" or "PA") were exported. This introduces the possibility of race conditions,
though in practice these fields are set at startup and never modified
concurrently.
We'd like to go through our codebase and change these all to unexported fields,
set at construction time. This is one step in that process.
This used to be used for AMQP queue names. Now that AMQP is gone, these consts
were only used when printing a version string at startup. This changes
VersionString to just use the name of the current program, and removes
`const clientName = ` from many of our main.go's.
Adds a basic truncated modulus hash check for RSA keys that can be used to check keys against the Debian `{openssl,openssh,openvpn}-blacklist` lists of weak keys generated during the [Debian weak key incident](https://wiki.debian.org/SSLkeys).
Testing is gated on adding a new configuration key to the WFE, RA, and CA configs which contains the path to a directory which should contain the weak key lists.
Fixes#157.
This removes the config and code to output to statsd.
- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.
Fixes#2639, #2733.
Previously, a given binary would have three TLS config fields (CA cert, cert,
key) for its gRPC server, plus each of its configured gRPC clients. In typical
use, we expect all three of those to be the same across both servers and clients
within a given binary.
This change reuses the TLSConfig type already defined for use with AMQP, adds a
Load() convenience function that turns it into a *tls.Config, and configures it
for use with all of the binaries. This should make configuration easier and more
robust, since it more closely matches usage.
This change preserves temporary backwards-compatibility for the
ocsp-updater->publisher RPCs, since those are the only instances of gRPC
currently enabled in production.