Add a new check to GoodKey which attempts to factor the public modulus
of the presented key using Fermat's factorization method. This method
will succeed if and only if the prime factors are very close to each
other -- i.e. almost certainly were not selected independently from a
random uniform distribution, but were instead calculated via some other
less secure method.
To support this new feature, add a new config flag to the RA, CA, and
WFE, which all use the GoodKey checks. As part of adding this new config
value, refactor the GoodKey config items into their own config struct
which can be re-used across all services.
If the new `FermatRounds` config value has not been set, it will default
to zero, causing no factorization to be attempted.
Fixes#5850
Part of #5851
These hashes are useful for OCSP computations, as they are the two
values that are used to uniquely identify the issuer of the given cert in
an OCSP request. Here, they are restricted to SHA1 only, as Boulder
only supports SHA1 for OCSP, as per RFC 5019.
In addition, because the `ID`, `NameID`, `NameHash`, and `KeyHash`
are relatively expensive to compute, introduce a new constructor for
`issuance.Certificate` that computes all four values at startup time and
then simply returns the precomputed values when asked.
The resulting `boulder` binary can be invoked by different names to
trigger the behavior of the relevant subcommand. For instance, symlinking
and invoking as `boulder-ca` acts as the CA. Symlinking and invoking as
`boulder-va` acts as the VA.
This reduces the .deb file size from about 200MB to about 20MB.
This works by creating a registry that maps subcommand names to `main`
functions. Each subcommand registers itself in an `init()` function. The
monolithic `boulder` binary then checks what name it was invoked with
(`os.Args[0]`), looks it up in the registry, and invokes the appropriate
`main`. To avoid conflicts, all of the old `package main` are replaced
with `package notmain`.
To get the list of registered subcommands, run `boulder --list`. This
is used when symlinking all the variants into place, to ensure the set
of symlinked names matches the entries in the registry.
Fixes#5692
Remove the last of the gRPC wrapper files. In order to do so:
- Remove the `core.StorageGetter` interface. Replace it with a new
interface (whose methods include the `...grpc.CallOption` arg)
inside the `sa/proto/` package.
- Remove the `core.StorageAdder` interface. There's no real use-case
for having a write-only interface.
- Remove the `core.StorageAuthority` interface, as it is now redundant
with the autogenerated `sapb.StorageAuthorityClient` interface.
- Replace the `certificateStorage` interface (which appears in two
different places) with a single unified interface also in `sa/proto/`.
- Update all test mocks to include the `_ ...grpc.CallOption` arg in
their method signatures so they match the gRPC client interface.
- Delete many methods from mocks which are no longer necessary (mostly
because they're mocking old authz1 methods that no longer exist).
- Move the two `test/inmem/` wrappers into their own sub-packages to
avoid an import cycle.
- Simplify the `satest` package to satisfy one of its TODOs and to
avoid an import cycle.
- Add many methods to the `test/inmem/sa/` wrapper, to accommodate all
of the methods which are called in unittests.
Fixes#5600
Pull the "was the gRPC error a Canceled error" checking code out into a
separate interceptor, and add that interceptor only in the wfe and wfe2
gRPC clients.
Although the vast majority of our cancelations come from the HTTP client
disconnecting (and that cancelation being propagated through our gRPC
stack), there are a few other situations in which we cancel gRPC
connections, including when we receive a quorum of responses from VAs
and no longer need responses from the remaining remote VA(s). This
change ensures that we do not treat those other kinds of cancelations in
the same way that we treat client-initiated cancelations.
Fixes#5444
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).
Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.
Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
loadChain is an unexported utility function recently added to
boulder-wfe to support the loading and validating of PEM files that
represent a certificate chain
This change moves the core loadChain functionality out of boulder-wfe to
a new exported LoadChain function in the Issuance package. All
boulder-wfe unit tests have been preserved and most of them have been
pared down and added to the Issuance package as well.
Blocks #1669Fixes#5270
This change simplifies and hardens the wfe2's support for having
multiple issuers, and multiple chains for each issuer, configured
and loaded in memory.
The only config-visible change is replacing the old two separate config
values (`certificateChains` and `alternateCertificateChains`) with a
single value (`chains`). This new value does not require the user to
know and hand-code the AIA URLs at which the certificates are available;
instead the chains are simply presented as lists of files. If this new
config value is present, the old config values will be ignored; if it
is not, the old config values will be respected.
Behind the scenes, the chain loading code has been completely changed.
Instead of loading PEM bytes directly from the file, and then asserting
various things (line endings, no trailing bits, etc) about those bytes,
we now parse a certificate from the file, and in-memory recreate the
PEM from that certificate. This approach allows the file loading to be
much more forgiving, while also being stricter: we now check that each
certificate in the chain is correctly signed by the next cert, and that
the last cert in the chain is a self-signed root.
Within the WFE itself, most of the internal structure has been retained.
However, both the internal `issuerCertificates` (used for checking
that certs we are asked to revoke were in fact issued by us) and the
`certificateChains` (used to append chains to end-entity certs when
served to clients) have been updated to be maps keyed by IssuerNameID.
This allows revocation checking to not have to iterate through the
whole list of issuers, and also makes it easy to double-check that
the signatures on end-entity certs are valid before serving them. Actual
checking of the validity will come in a follow-up change, due to the
invasive nature of the necessary test changes.
Fixes#5164
The /issuer-cert endpoint was a holdover from the v1 API, where
it is a critical part of the issuance flow. In the v2 issuance
flow, the issuer certificate is provided directly in the response
for the certificate itself. Thus, this endpoint is redundant.
Stats show that it receives approximately zero traffic (less than
one request per week, all of which are now coming from wget or
browser useragents). It also complicates the refactoring necessary
for the v2 API to support multiple issuers.
As such, it is a safe and easy decision to remove it.
Fixes#5196
Having both of these very similar methods sitting around
only serves to increase confusion. This removes the last
few places which use `cmd.LoadCert` and replaces them
with `core.LoadCert`, and deletes the method itself.
Fixes#5163
This commit consists of three classes of changes:
1) Changing various command main.go files to always behave as they
would have when features.BlockedKeyTable was true. Also changing
one test in the same manner.
2) Removing the BlockedKeyTable flag from configuration in config-next,
because the flag is already live.
3) Moving the BlockedKeyTable flag to the "deprecated" section of
features.go, and regenerating featureflag_strings.go.
A future change will remove the BlockedKeyTable flag (and other
similarly deprecated flags) from features.go entirely.
Fixes#4873
log.Logger, the wrapper type that http.Server.ErrorLog uses will append
a newline to every line before calling Write on the inner logger if the
line doesn't already contain one. This breaks our checksum generation/
verification code because syslog will strip newlines. So that we don't
generate irreproducible checksums we strip the newline that log.Logger
added.
Fixes#4812
In theory we should only receive well-behaved requests, but just in case
there are network issues, this may keep us from waiting forever on a
dead connection.
Also, set the ErrorLog field of our http.Servers so we can collect logs for
unusual problems.
Since Boulder's log system adds checksums to lines, but log-validator
processes entries on a per-line basis, including newlines in log
messages can cause a validation failure.
Closes#4567.
Enabled in `config-next`.
This PR cross-signs the existing issuers (`test-ca-cross.pem`, `test-ca2-cross.pem`) with a new root (`test-root2.key`, `test-root2.pem` = *c2ckling cryptogr2pher f2ke ROOT*).
The cross-signed issuers are referenced in wfe2's configuration, beside the existing `certificateChains` key:
```json
"certificateChains": {
"http://boulder:4430/acme/issuer-cert": [ "test/test-ca2.pem" ],
"http://127.0.0.1:4000/acme/issuer-cert": [ "test/test-ca2.pem" ]
},
"alternateCertificateChains": {
"http://boulder:4430/acme/issuer-cert": [ "test/test-ca2-cross.pem" ],
"http://127.0.0.1:4000/acme/issuer-cert": [ "test/test-ca2-cross.pem" ]
},
```
When this key is populated, the WFE will send links for all alternate certificate chains available for the current end-entity certificate (except for the chain sent in the current response):
Link: <http://localhost:4001/acme/cert/ff5d3d84e777fc91ae3afb7cbc1d2c7735e0/1>;rel="alternate"
For backwards-compatibility, not specifying a chain is the same as specifying `0`: `/acme/cert/{serial} == /acme/cert/{serial}/0` and `0` always refers to the default certificate chain for that issuer (i.e. the value of `certificateChains[aiaIssuerURL]`).
This builds on the work @sh7dm started in #4600. I primarily did some
refactoring, added enforcement of the stale check for authorizations and
challenges, and completed the unit test coverage.
A new Boulder-specific (e.g. not specified by ACME / RFC 8555) API is added for
fetching order, authorization, challenge, and certificate resources by URL
without using POST-as-GET. Since we intend this API to only be used by humans
for debugging and we want to ensure ACME client devs use the standards compliant
method we restrict the GET API to only allowing access to "stale" resources
where the required staleness is defined by the WFE2 "staleTimeout"
configuration value (set to 5m in dev/CI).
Since authorizations don't have a creation date tracked we add
a `authorizationLifetimeDays` and `pendingAuthorizationLifetimeDays`
configuration parameter to the WFE2 that matches the RA's configuration. These
values are subtracted from the authorization expiry to find the creation date to
enforce the staleness check for authz/challenge GETs.
One other note: Resources accessed via the GET API will have Link relation URLs
pointing to the standard ACME API URL. E.g. a GET to a stale challenge will have
a response header with a link "up" relation URL pointing at the POST-as-GET URL
for the associated authorization. I wanted to avoid complicating
`prepAuthorizationForDisplay` and `prepChallengeForDisplay` to be aware of the
GET API and update or exclude the Link relations. This seems like a fine
trade-off since we don't expect machine consumption of the GET API results
(these are for human debugging).
Replaces #4600Resolves#4577
In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back.
There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case, dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change.
Fixes#4591.
Previously, we referred to "DER encoded PKIX public keys", but PKIX (RFC 5280)
doesn't define a standalone "public key" type. Instead, it defines
SubjectPublicKeyInfo, containing an algorithm and a BIT STRING. As a
result, SPKI and SPKI hash are more commonly used terms, and we're more
likely to get reports based on those. We should mirror that terminology
in our documentation.
When the `features.PrecertificateRevocation` feature flag is enabled the WFE2
will allow revoking certificates for a submitted precertificate. The legacy WFE1
behaviour remains unchanged (as before (pre)certificates issued through the V1
API will be revocable with the V2 API).
Previously the WFE2 vetted the certificate from the revocation request by
looking up a final certificate by the serial number in the requested
certificate, and then doing a byte for byte comparison between the stored and
requested certificate.
Rather than adjust this logic to handle looking up and comparing stored
precertificates against requested precertificates (requiring new RPCs and an
additional round-trip) we choose to instead check the signature on the requested
certificate or precertificate and consider it valid for revocation if the
signature validates with one of the WFE2's known issuers. We trust the integrity
of our own signatures.
An integration test that performs a revocation of a precertificate (in this case
one that never had a final certificate issued due to SCT embedded errors) with
all of the available authentication mechanisms is included.
Resolves https://github.com/letsencrypt/boulder/issues/4414
We occasionally have reason to block public keys from being used in CSRs
or for JWKs. This work adds support for loading a YAML blocked keys list
to the WFE, the RA and the CA (all the components already using the
`goodekey` package).
The list is loaded in-memory and is intended to be used sparingly and
not for more complicated mass blocking scenarios. This augments the
existing debian weak key checking which is specific to RSA keys and
operates on a truncated hash of the key modulus. In comparison the
admin. blocked keys are identified by the Base64 encoding of a SHA256
hash over the DER encoding of the public key expressed as a PKIX subject
public key. For ECDSA keys in particular we believe a more thorough
solution would have to consider inverted curve points but to start we're
calling this approach "Good Enough".
A utility program (`block-a-key`) is provided that can read a PEM
formatted x509 certificate or a JSON formatted JWK and emit lines to be
added to the blocked keys YAML to block the related public key.
A test blocked keys YAML file is included
(`test/example-blocked-keys.yml`), initially populated with a few of the
keys from the `test/` directory. We may want to do a more through pass
through Boulder's source code and add a block entry for every test
private key.
Resolves https://github.com/letsencrypt/boulder/issues/4404
Basically a complete re-write/re-design of the forwarding concept introduced in
#4297 (sorry for the rapid churn here). Instead of nonce-services blindly
forwarding nonces around to each other in an attempt to find out who issued the
nonce we add an identifying prefix to each nonce generated by a service. The
WFEs then use this prefix to decide which nonce-service to ask to validate the
nonce.
This requires a slightly more complicated configuration at the WFE/2 end, but
overall I think ends up being a way cleaner, more understandable, easy to
reason about implementation. When configuring the WFE you need to provide two
forms of gRPC config:
* one gRPC config for retrieving nonces, this should be a DNS name that
resolves to all available nonce-services (or at least the ones you want to
retrieve nonces from locally, in a two DC setup you might only configure the
nonce-services that are in the same DC as the WFE instance). This allows
getting a nonce from any of the configured services and is load-balanced
transparently at the gRPC layer.
* a map of nonce prefixes to gRPC configs, this maps each individual
nonce-service to it's prefix and allows the WFE instances to figure out which
nonce-service to ask to validate a nonce it has received (in a two DC setup
you'd want to configure this with all the nonce-services across both DCs so
that you can validate a nonce that was generated by a nonce-service in another
DC).
This balancing is implemented in the integration tests.
Given the current remote nonce code hasn't been deployed anywhere yet this
makes a number of hard breaking changes to both the existing nonce-service
code, and the forwarding code.
Fixes#4303.
While we intended to allow legacy ACME v1 accounts created through the WFE to work with the ACME v2 implementation and the WFE2 we neglected to consider that a legacy account would have a Key ID URL that doesn't match the expected for a V2 account. This caused `wfe2/verify.go`'s `lookupJWK` to reject all POST requests authenticated by a legacy account unless the ACME client took the extra manual step of "fixing" the URL.
This PR adds a configuration parameter to the WFE2 for an allowed legacy key ID prefix. The WFE2 verification logic is updated to allow both the expected key ID prefix and the configured legacy key ID prefix. This will allow us to specify the correct legacy URL in configuration for both staging/prod to allow unmodified V1 ACME accounts to be used with ACME v2.
Resolves https://github.com/letsencrypt/boulder/issues/3674
A very large number of the logger calls are of the form log.Function(fmt.Sprintf(...)).
Rather than sprinkling fmt.Sprintf at every logger call site, provide formatting versions
of the logger functions and call these directly with the format and arguments.
While here remove some unnecessary trailing newlines and calls to String/Error.
We may see RPCs that are dispatched by a client but do not arrive at the server for some time afterwards. To have insight into potential request latency at this layer we want to publish the time delta between when a client sent an RPC and when the server received it.
This PR updates the gRPC client interceptor to add the current time to the gRPC request metadata context when it dispatches an RPC. The server side interceptor is updated to pull the client request time out of the gRPC request metadata. Using this timestamp it can calculate the latency and publish it as an observation on a Prometheus histogram.
Accomplishing the above required wiring a clock through to each of the client interceptors. This caused a small diff across each of the gRPC aware boulder commands.
A small unit test is included in this PR that checks that a latency stat is published to the histogram after an RPC to a test ChillerServer is made. It's difficult to do more in-depth testing because using fake clocks makes the latency 0 and using real clocks requires finding a way to queue/delay requests inside of the gRPC mechanisms not exposed to Boulder.
Updates https://github.com/letsencrypt/boulder/issues/3635 - Still TODO: Explicitly logging latency in the VA, tracking outstanding RPCs as a gauge.
This commit addresses two config elements that were defined but not
wired through to the WFE implementation object. Prior to this commit the
`c.WFE.DirectoryCAAIdentity` and `c.WFE.DirectoryWebsite` configuration
values were read and unmarshaled from config but not passed to the WFE.
After this commit these two config options will be picked up by the WFE
impl.
This commit updates the WFE and WFE2 to have configuration support for
setting a value for the `/directory` object's "meta" field's
optional "caaIdentities" and "website" fields. The config-next wfe/wfe2
configuration are updated with values for these fields. Unit tests are
updated to check that they are sent when expected and not otherwise.
Bonus content: The `test.AssertUnmarshaledEquals` function had a bug
where it would consider two inputs equal when the # of keys differed.
This commit also fixes that bug.
The outer `config.SubscriberAgreementURL` field has been deprecated for
a while in favour of `config.wfe.SubscriberAgreementURL`. After
verifying the prod/staging configurations do not use the legacy field
this commit removes it.
* Reject WFE2 certificate chain PEM files with CRLF endings.
This commit updates the `boulder-wfe2` command's processing of
certificate chains such that it will reject chain files that contain PEM
encoding with Windows CRLF line endings. Boulder is a UNIX service and
throughout we assume UNIX newlines. CRLF endings in a certificate chain
input file is an error that should be resolved by the operator prior to
startup.
* Add trailing newline to PEM chainfiles automatically.
If a PEM encoded chain file doesn't end with a trailing `\n` the WFE2
should add it. This commit updates the chain file loading to handle this
and adds a corresponding unit test.
For a long time now the WFE has generated URLs based on the incoming
request rather than a hardcoded BaseURL. BaseURL is no longer set in the
prod configs.
This also allows factoring out relativeEndpoint into the web package.
Our various main.go functions gated some key code on whether the TLS
and/or GRPC config fields were present. Now that those fields are fully
deployed in production, we can simplify the code and require them.
Also, rename tls to tlsConfig everywhere to avoid confusion with the tls
package.
Avoid assigning to the same err from two different goroutines in
boulder-ca (fix a race).
Boulder is fairly noisy about gRPC connection errors. This is a mixed
blessing: Our gRPC configuration will try to reconnect until it hits
an RPC deadline, and most likely eventually succeed. In that case,
we don't consider those to really be errors. However, in cases where
a connection is repeatedly failing, we'd like to see errors in the
logs about connection failure, rather than "deadline exceeded." So
we want to keep logging of gRPC errors.
However, right now we get a lot of these errors logged during
integration tests. They make the output hard to read, and may disguise
more serious errors. So we'd like to avoid causing such errors in
normal integration test operation.
This change reorders the startup of Boulder components by their gRPC
dependencies, so everything's backend is likely to be up and running
before it starts. It also reverses that order for clean shutdowns,
and waits for each process to exit before signalling the next one.
With these changes, I still got connection errors. Taking listenbuddy
out of the gRPC path fixed them. I believe the issue is that
listenbuddy is not a truly transparent proxy. In particular, it
accepts an inbound TCP connection before opening an outbound TCP
connection. If opening that outbound connection results in "connection
refused," it closes the inbound connection. That means gRPC sees a
"connection closed" (or "connection reset"?) rather than "connection
refused". I'm guessing it handles those cases differently, explaining
the different error results.
We've been using listenbuddy to trigger disconnects while Boulder is
running, to ensure that gRPC's reconnect code works. I think we can
probably rely on gRPC's reconnect to work. The initial problem that
led us to start testing this was a configuration problem; now that
we have the configuration we want, we should be fine and don't need
to keep testing reconnects on every integration test run.
This commit implements a mapping from certificate AIA Issuer URL to PEM
encoded certificate chain. GET's to the V2 Certificate endpoint will
return a full PEM encoded certificate chain in addition to the leaf cert
using the AIA issuer URL of the leaf cert and the configured mapping.
The boulder-wfe2 command builds the chain mapping by reading the
"wfe" config section's 'certificateChains" field, specifying a list
of file paths to PEM certificates for each AIA issuer URL. At startup
the PEM file contents are ready, verified and separated by a newline.
The resulting populated AIA issuer URL -> PEM cert chain mapping is
given to the WFE for use with the Certificate endpoint.
Resolves#3291
This commit removes `CertCacheDuration`, `CertNoCacheExpirationWindow`,
`IndexCacheDuration` and `IssuerCacheDuration`. These were read from
config values that weren't set in config/config-next into WFE struct
fields that were never referenced in any code.
The go-grpc-prometheus package by default registers its metrics with Prometheus' global registry. In #3167, when we stopped using the global registry, we accidentally lost our gRPC metrics. This change adds them back.
Specifically, it adds two convenience functions, one for clients and one for servers, that makes the necessary metrics object and registers it. We run these in the main function of each server.
I considered adding these as part of StatsAndLogging, but the corresponding ClientMetrics and ServerMetrics objects (defined by go-grpc-prometheus) need to be subsequently made available during construction of the gRPC clients and servers. We could add them as fields on Scope, but this seemed like a little too much tight coupling.
Also, update go-grpc-prometheus to get the necessary methods.
```
$ go test github.com/grpc-ecosystem/go-grpc-prometheus/...
ok github.com/grpc-ecosystem/go-grpc-prometheus 0.069s
? github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto [no test files]
```
There were two bugs in #3167:
All process-level stats got prefixed with "boulder", which broke dashboards.
All request_time stats got dropped, because measured_http was using the prometheus DefaultRegisterer.
To fix, this PR plumbs through a scope object to measured_http, and uses an empty prefix when calling NewProcessCollector().
Previously, we used prometheus.DefaultRegisterer to register our stats, which uses global state to export its HTTP stats. We also used net/http/pprof's behavior of registering to the default global HTTP ServeMux, via DebugServer, which starts an HTTP server that uses that global ServeMux.
In this change, I merge DebugServer's functions into StatsAndLogging. StatsAndLogging now takes an address parameter and fires off an HTTP server in a goroutine. That HTTP server is newly defined, and doesn't use DefaultServeMux. On it is registered the Prometheus stats handler, and handlers for the various pprof traces. In the process I split StatsAndLogging internally into two functions: makeStats and MakeLogger. I didn't port across the expvar variable exporting, which serves a similar function to Prometheus stats but which we never use.
One nice immediate effect of this change: Since StatsAndLogging now requires and address, I noticed a bunch of commands that called StatsAndLogging, and passed around the resulting Scope, but never made use of it because they didn't run a DebugServer. Under the old StatsD world, these command still could have exported their stats by pushing, but since we moved to Prometheus their stats stopped being collected. We haven't used any of these stats, so instead of adding debug ports to all short-lived commands, or setting up a push gateway, I simply removed them and switched those commands to initialize only a Logger, no stats.
Fixes#3020.
In order to write integration tests for some features, especially related to rate limiting, rechecking of CAA, and expiration of authzs, orders, and certs, we need to be able to fake the passage of time in integration tests.
To do so, this change switches out all clock.Default() instances for cmd.Clock(), which can be set manually with the FAKECLOCK environment variable. integration-test.py now starts up all servers once before the main body of tests, with FAKECLOCK set to a date 70 days ago, and does some initial setup for a new integration test case. That test case tries to fetch a 70-day-old authz URL, and expects it to 404.
In order to make this work, I also had to change a number of our test binaries to shut down cleanly in response to SIGTERM. Without that change, stopping the servers between the setup phase and the main tests caused startservers.check() to fail, because some processes exited with nonzero status.
Note: This is an initial stab at things, to prove out the technique. Long-term, I think we will want to use an idiom where test cases are classes that have a number of optional setup phases that may be run at e.g. 70 days prior and 5 days prior. This could help us avoid a proliferation of global state as we add more time-dependent test cases.
This used to be used for AMQP queue names. Now that AMQP is gone, these consts
were only used when printing a version string at startup. This changes
VersionString to just use the name of the current program, and removes
`const clientName = ` from many of our main.go's.
This PR is the initial duplication of the WFE to create a WFE2
package. The rationale is briefly explained in `wfe2/README.md`.
Per #2822 this PR only lays the groundwork for further customization
and deduplication. Presently both the WFE and WFE2 are identical except
for the following configuration differences:
* The WFE offers HTTP and HTTPS on 4000 and 4430 respectively, the WFE2
offers HTTP on 4001 and 4431.
* The WFE has a debug port on 8000, the WFE2 uses the next free "8000
range port" and puts its debug service on 8013
Resolves https://github.com/letsencrypt/boulder/issues/2822