Remove the backwards-compatible profile hashing code. It is no longer
necessary, since all deployed profile configs now set
IncludeCRLDistributionPoints to true and set the UnsplitIssuance flag to
true. Catch up the CA and crl-updater configs to match config-next and
what is actively deployed in prod.
Part of https://github.com/letsencrypt/boulder/issues/8039
Part of https://github.com/letsencrypt/boulder/issues/8059
Add a new custom lint which sends CRLs to PKIMetal, and configure it to
run in our integration test environment. Factor out most of the code
used to talk to the PKIMetal API so that it can be shared by the two
custom lints which do so. Add the ability to configure lints to the
CRLProfileConfig, so that zlint knows where to load the necessary custom
config from.
Compute the width of the ARI suggested renewal window as 2% of the
validity period. This means that 90-day certificates have their
suggested window shrink slightly from 48 hours to 43.2 hours, and gives
six-day (160h) certs a suggested window 3.2 hours wide.
Also move the center of that window to the midpoint of the certificate
validity period for certs which are valid for less than 10 days, so that
operators have (proportionally) a little more time to respond to renewal
issues.
Fixes https://github.com/letsencrypt/boulder/issues/7996
Add a new RPC to the CA: `IssueCertificate` covers issuance of both the
precertificate and the final certificate. In between, it calls out to
the RA's new method `GetSCTs`.
The RA calls the new `CA.IssueCertificate` if the `UnsplitIssuance`
feature flag is true.
The RA had a metric that counted certificates by profile name and hash.
Since the RA doesn't receive a profile hash in the new flow, simply
record the total number of issuances.
Fixes https://github.com/letsencrypt/boulder/issues/7983
When we turn on explicit sharding, we'll change the CA serial prefix, so
we can know that all issuance from the new prefixes uses explicit
sharding, and all issuance from the old prefixes uses temporal sharding.
This lets us avoid putting a revoked cert in two different CRL shards
(the temporal one and the explicit one).
To achieve this, the crl-updater gets a list of temporally sharded
serial prefixes. When it queries the `certificateStatus` table by date
(`GetRevokedCerts`), it will filter out explicitly sharded certificates:
those that don't have their prefix on the list.
Part of #7094
The CRLDP is included only when the profile's
IncludeCRLDistributionPoints field is true.
Introduce a new config field for issuers, CRLShards. If
IncludeCRLDistributionPoints is true and this is zero, issuance will
error.
The CRL shard is assigned at issuance time based on the (random) low
bits of the serial number.
Part of https://github.com/letsencrypt/boulder/issues/7094
In revocation_test.go, fetch all CRLs, and look for revoked certificates
on both CRLs and OCSP.
Make s3-test-srv listen on all interfaces, so the CRL URLs in the CA
config work.
Add IssuerNameIDs to the CRL URLs in ca.json, to match how those CRLs
are uploaded to S3.
Make TestRevocation parallel. Speedup from ~60s to ~3s.
Increase ocsp-responder's allowed parallelism to account for parallel
test. Also, add "maxInflightSignings" to config/ since it's in prod.
"maxSigningWaiters" is not yet in prod, so don't move that field.
Add a mutex around running crl-updater, and decrease the log level so
errors stand out more when they happen.
This brings in several new and useful lints. It also brings in one CABF
BR lint which we have to ignore in our default profile which includes
the Subject Key Identifier extension:
"w_ext_subject_key_identifier_not_recommended_subscriber". In our modern
profile which omits several fields, we have to ignore the opposite
RFC5280 lint "w_ext_subject_key_identifier_missing_sub_cert".
Release notes: https://github.com/zmap/zlint/releases/tag/v3.6.4
Changelog: https://github.com/zmap/zlint/compare/v3.6.0...v3.6.4
Note that the majority of the ~400 file changes are merely copyright
date changes.
The corresponding production config changes tracked in IN-10466 are
complete.
Goodkey has two ways to detect a key as weak: it runs a variety of
algorithmic checks (such as Fermat factorization and rocacheck), or the
key can be listed in a "weak key file". Similarly, it has two ways to
detect a key as blocked: it can call a generic function (which we use to
query our database), or the key can be listed in a "blocked key file".
This is two methods too many. Reliance on files of weak or blocked keys
introduces unnecessary complexity to both the implementation and
configuration of the goodkey package. Remove both "key file" options and
delete all code which supported them.
Also remove //test/block-a-key, as it was only used to generate these
test files.
IN-10762 tracked the removal of these files in prod.
Fixes https://github.com/letsencrypt/boulder/issues/7748
Add a new SerialPrefixHex field to the CA's config, which takes a
two-character hexadecimal string to use as the serial prefix. This
matches the way that the OCSP Responder's acceptable serial prefixes are
configured, and is easier for human operators to configure than raw
integers.
At the same time, change the type of the CA's internal serial prefix
from `int` to `byte`, using the type system to enforce its 8-bit length.
Fixes#7213
Move the two lint-configuration keys, LintConfig and IgnoreLints, from
the top-level CA.Issuance config stanza into each individual
CA.Issuance.CertProfiles stanza. This allows us to have
differently-configured lints for different profiles, to ensure that our
linting regime is as strict as possible.
Without this change, it would be necessary for us to ignore both the
"common name included" and the "no subject key id" lints at the
top-level, when in fact each of those warnings only triggers on one of
our two profiles.
Fixes https://github.com/letsencrypt/boulder/issues/7635
Add three new keys to the CA's ProfileConfig:
- OmitKeyEncipherment causes the keyEncipherment Key Usage to be omitted
from certificates with RSA public keys. We currently include it for
backwards compatibility with TLS 1.1 servers that don't support modern
cipher suites, but this KU is completely useless as of TLS 1.3.
- OmitClientAuth causes the tlsClientAuthentication Extended Key Usage
to be omitted from all certificates. We currently include it to support
any subscribers who may be relying on it, but Root Programs are moving
towards single-purpose hierarchies and its inclusion is being
discouraged.
- OmitSKID causes the Subject Key Identifier extension to be omitted
from all certificates. We currently include this extension because it is
recommended by RFC 5280, but it serves little to no practical purpose
and consumes a large number of bytes, so it is now NOT RECOMMENDED by
the Baseline Requirements.
Make substantive changes to issuer.requestValid and issuer.Prepare to
implement the desired behavior for each of these options. Make a very
slight change to ra.matchesCSR to generally allow for serverAuth-only
EKUs. Improve the unit tests of both the //ca and //issuance packages to
cover the new behavior.
Part of https://github.com/letsencrypt/boulder/issues/7610
One of our goals with profiles is to allow different profiles to have
different validity periods. While the profiles already had the ability
to enforce different maximum backdates and validities, the CA still had
separate global configuration for what the backdate and validity period
should actually be.
Move the computation of the notBefore and notAfter timestamps into the
issuance package, so that it can be based on the profile's configured
backdate and validity durations. Deprecate the global "backdate" and
"expiry" config fields, as they are no longer used. Finally, add more
validation for the profile's backdate and validity.
Part of https://github.com/letsencrypt/boulder/issues/7610
Add a new profile config key named "OmitCommonName" which, if set to
`true`, causes the issuance package to exclude the CN from the resulting
certificate even if the initiating IssuanceRequest specified one.
Deprecate the old "AllowCommonName" config key, so that it no longer has
any effect, rather than causing the issuance package to fully reject
IssuanceRequests containing a CN.
This allows for more graceful variation between profiles, since we know
that excluding the Common Name is always safe.
Part of https://github.com/letsencrypt/boulder/issues/7610
These profile variables are set to "true" everywhere, and we have no
intention of ever setting them to "false" anywhere. Deprecate them so
that they can be removed in the future, and to reduce the chances of
confusion when new profile variables are introduced in the near future.
Part of https://github.com/letsencrypt/boulder/issues/7610
This change guarantees compliance with CA/BF Ballot SC-073 "Compromised
and Weak Keys", which requires that at least 100 rounds of Fermat
Factorization be attempted:
> Section 6.1.1.3 Subscriber Key Pair Generation
> The CA SHALL reject a certificate request if... The Public Key
corresponds to an industry-demonstrated weak Private Key. For requests
submitted on or after November 15, 2024,... In the case of Close Primes
vulnerability (https://fermatattack.secvuln.info/), the CA SHALL reject
weak keys which can be factored within 100 rounds using Fermat’s
factorization method.
We choose 110 rounds to ensure a margin above and beyond the requirements.
Fixes https://github.com/letsencrypt/boulder/issues/7558
`ECDSAForAll` feature is now enabled by default (due to it not being
referenced in any issuance path) and as a result the `ECDSAAllowlist`
has been deleted.
Fixes https://github.com/letsencrypt/boulder/issues/7535
The summary here is:
- Move test/cert-ceremonies to test/certs
- Move .hierarchy (generated by the above) to test/certs/webpki
- Remove our mapping of .hierarchy to /hierarchy inside docker
- Move test/grpc-creds to test/certs/ipki
- Unify the generation of both test/certs/webpki and test/certs/ipki
into a single script at test/certs/generate.sh
- Make that script the entrypoint of a new docker compose service
- Have t.sh and tn.sh invoke that service to ensure keys and certs are
created before tests run
No production changes are necessary, the config changes here are just
for testing purposes.
Part of https://github.com/letsencrypt/boulder/issues/7476
Add a new "LintConfig" item to the CA's config, which can point to a
zlint configuration toml file. This allows lints to be configured, e.g.
to control the number of rounds of factorization performed by the Fermat
factorization lint.
Leverage this new config to create a new custom zlint which calls out to
a configured pkilint API endpoint. In config-next integration tests,
configure the lint to point at a new pkilint docker container.
This approach has three nice forward-looking features: we now have the
ability to configure any of our lints; it's easy to expand this
mechanism to lint CRLs when the pkilint API has support for that; and
it's easy to enable this new lint if we decide to stand up a pkilint
container in our production environment.
No production configuration changes are necessary at this time.
Fixes https://github.com/letsencrypt/boulder/issues/7430
Replace the CA's "useForRSA" and "useForECDSA" config keys with a single
"active" boolean. When the CA starts up, all active RSA issuers will be
used to issue precerts with RSA pubkeys, and all ECDSA issuers will be
used to issue precerts with ECDSA pubkeys (if the ECDSAForAll flag is
true; otherwise just those that are on the allow-list). All "inactive"
issuers can still issue OCSP responses, CRLs, and (notably) final
certificates.
Instead of using the "useForRSA" and "useForECDSA" flags, plus implicit
config ordering, to determine which issuer to use to handle a given
issuance, simply use the issuer's public key algorithm to determine
which issuances it should be handling. All implicit ordering
considerations are removed, because the "active" certificates now just
form a pool that is sampled from randomly.
To facilitate this, update some unit and integration tests to be more
flexible and try multiple potential issuing intermediates, particularly
when constructing OCSP requests.
For this change to be safe to deploy with no user-visible behavior
changes, the CA configs must contain:
- Exactly one RSA-keyed intermediate with "useForRSALeaves" set to true;
and
- Exactly one ECDSA-keyed intermediate with "useForECDSALeaves" set to
true.
If the configs contain more than one intermediate meeting one of the
bullets above, then randomized issuance will begin immediately.
Fixes https://github.com/letsencrypt/boulder/issues/7291
Fixes https://github.com/letsencrypt/boulder/issues/7290
Update the hierarchy which the integration tests auto-generate inside
the ./hierarchy folder to include three intermediates of each key type,
two to be actively loaded and one to be held in reserve. To facilitate
this:
- Update the generation script to loop, rather than hard-coding each
intermediate we want
- Improve the filenames of the generated hierarchy to be more readable
- Replace the WFE's AIA endpoint with a thin aia-test-srv so that we
don't have to have NameIDs hardcoded in our ca.json configs
Having this new hierarchy will make it easier for our integration tests
to validate that new features like "unpredictable issuance" are working
correctly.
Part of https://github.com/letsencrypt/boulder/issues/729
This change introduces a new config key `certProfiles` which contains a
map of `profiles`. Only one of `profile` or `certProfiles` should be
used, because configuring both will result in the CA erroring and
shutting down. Further, the singular `profile` is now
[deprecated](https://github.com/letsencrypt/boulder/issues/7414).
The CA pre-computes several maps at startup;
* A human-readable name to a `*issuance.Profile` which is referred to as
"name".
* A SHA-256 sum over the entire contents of the given profile to the
`*issuance.Profile`. We'll refer to this as "hash".
Internally, CA methods no longer pass an `*issuance.Profile`, instead
they pass a structure containing maps of certificate profile
identifiers. To determine the default profile used by the CA, a new
config field `defaultCertificateProfileName` has been added to the
Issuance struct. Absence of `defaultCertificateProfileName` will cause
the CA to use the default value of `defaultBoulderCertificateProfile`
such as for the the deprecated `profile`. The key for each given
certificate profile will be used as the "name". Duplicate names or
hashes will cause the CA to error during initialization and shutdown.
When the RA calls `ra.CA.IssuePrecertificate`, it will pass an arbitrary
certificate profile name to the CA triggering the CA to lookup if the
name exists in its internal mapping. The RA maintains no state or
knowledge of configured certificate profiles and relies on the CA to
provide this information. If the name exists in the CA's map, it will
return the hash along with the precertificate bytes in a
`capb.IssuePrecertificateResponse`. The RA will then call
`ra.CA.IssueCertificateForPrecertificate` with that same hash. The CA
will lookup the hash to determine if it exists in its map, and if so
will continue on with certificate issuance.
Precertificate and certificate issuance audit logs will now include the
certificate profile name and hex representation of the hash that they
were issued with.
Fixes https://github.com/letsencrypt/boulder/issues/6966
There are no required config or SQL changes.
Remove three deprecated feature flags which have been removed from all
production configs:
- StoreLintingCertificateInsteadOfPrecertificate
- LeaseCRLShards
- AllowUnrecognizedFeatures
Deprecate three flags which are set to true in all production configs:
- CAAAfterValidation
- AllowNoCommonName
- SHA256SubjectKeyIdentifier
IN-9879 tracked the removal of these flags.
Move the CRL issuance logic -- building an x509.RevocationList template,
populating it with correctly-built extensions, linting it, and actually
signing it -- out of the //ca package and into the //issuance package.
This means that the CA's CRL code no longer needs to be able to reach
inside the issuance package to access its issuers and certificates (and
those fields will be able to be made private after the same is done for
OCSP issuance).
Additionally, improve the configuration of CRL issuance, create
additional checks on CRL's ThisUpdate and NextUpdate fields, and make it
possible for a CRL to contain two IssuingDistributionPoint URIs so that
we can migrate to shorter addresses.
IN-10045 tracks the corresponding production changes.
Fixes https://github.com/letsencrypt/boulder/issues/7159
Part of https://github.com/letsencrypt/boulder/issues/7296
Part of https://github.com/letsencrypt/boulder/issues/7294
Part of https://github.com/letsencrypt/boulder/issues/7094
Part of https://github.com/letsencrypt/boulder/issues/7100
Upgrade to zlint v3.6.0
Two new lints are triggered in various places:
aia_contains_internal_names is ignored in integration test
configurations, and unit tests are updated to have more realistic URLs.
The w_subject_common_name_included lint needs to be ignored where we'd
ignored n_subject_common_name_included before.
Related to https://github.com/letsencrypt/boulder/issues/7261
Retains the existing logging of orphaned certs until we are confident that this
solution can fully replace it (even then we may want to keep it just for auditing etc).
Fixes#3636.
Things removed:
* features.EmbedSCTs (and all the associated RA/CA/ocsp-updater code etc)
* ca.enablePrecertificateFlow (and all the associated RA/CA code)
* sa.AddSCTReceipt and sa.GetSCTReceipt RPCs
* publisher.SubmitToCT and publisher.SubmitToSingleCT RPCs
Fixes#3755.
We'd like to start using the DNS load balancer in the latest version of gRPC. That means putting all IPs for a service under a single hostname (or using a SRV record, but we're not taking that path). This change adds an sd-test-srv to act as our service discovery DNS service. It returns both Boulder IP addresses for any A lookup ending in ".boulder". This change also sets up the Docker DNS for our boulder container to defer to sd-test-srv when it doesn't know an answer.
sd-test-srv doesn't know how to resolve public Internet names like `github.com`. Resolving public names is required for the `godep-restore` test phase, so this change breaks out a copy of the boulder container that is used only for `godep-restore`.
This change implements a shim of a DNS resolver for gRPC, so that we can switch to DNS-based load balancing with the currently vendored gRPC, then when we upgrade to the latest gRPC we won't need a simultaneous config update.
Also, this change introduces a check at the end of the integration test that each backend received at least one RPC, ensuring that we are not sending all load to a single backend.
We're currently stuck on gRPC v1.1 because of a breaking change to certificate validation in gRPC 1.8. Our gRPC balancer uses a static list of multiple hostnames, and expects to validate against those hostnames. However gRPC expects that a service is one hostname, with multiple IP addresses, and validates all those IP addresses against the same hostname. See grpc/grpc-go#2012.
If we follow gRPC's assumptions, we can rip out our custom Balancer and custom TransportCredentials, and will probably have a lower-friction time in general.
This PR is the first step in doing so. In order to satisfy the "multiple IPs, one port" property of gRPC backends in our Docker container infrastructure, we switch to Docker's user-defined networking. This allows us to give the Boulder container multiple IP addresses on different local networks, and gives it different DNS aliases in each network.
In startservers.py, each shard of a service listens on a different DNS alias for that service, and therefore a different IP address. The listening port for each shard of a service is now identical.
This change also updates the gRPC service certificates. Now, each certificate that is used in a gRPC service (as opposed to something that is "only" a client) has three names. For instance, sa1.boulder, sa2.boulder, and sa.boulder (the generic service name). For now, we are validating against the specific hostnames. When we update our gRPC dependency, we will begin validating against the generic service name.
Incidentally, the DNS aliases feature of Docker allows us to get rid of some hackery in entrypoint.sh that inserted entries into /etc/hosts.
Note: Boulder now has a dependency on the DNS aliases feature in Docker. By default, docker-compose run creates a temporary container and doesn't assign any aliases to it. We now need to specify docker-compose run --use-aliases to get the correct behavior. Without --use-aliases, Boulder won't be able to resolve the hostnames it wants to bind to.
During periods of peak load, some RPCs are significantly delayed (on the order of seconds) by client-side blocking. HTTP/2 clients have to obey a "max concurrent streams" setting sent by the server. In Go's HTTP/2 implementation, this value [defaults to 250](https://github.com/golang/net/blob/master/http2/server.go#L56), so the gRPC default is also 250. So whenever there are more than 250 requests in progress at a time, additional requests will be delayed until there is a slot available.
During this peak load, we aren't hitting limits on CPU or memory, so we should increase the max concurrent streams limit to take better advantage of our available resources. This PR adds a config field to do that.
Fixes#3641.
gRPC passes deadline information through the RPC boundary, but client and server have the same deadline. Ideally we'd like the server to have a slightly tighter deadline than the client, so if one of the server's onward RPCs or other network calls times out, the server can pass back more detailed information to the client, rather than the client timing out the server and losing the opportunity to log more detailed information about which component caused the timeout.
In this change, I subtract 100ms from the deadline on the server side of our interceptors, using our existing serverInterceptor. I also check that there is at least 100ms remaining in which to do useful work, so the server doesn't begin a potentially expensive task only to abort it.
Fixes#3608.
Adds SCT embedding to the certificate issuance flow. When a issuance is requested a precertificate (the requested certificate but poisoned with the critical CT extension) is issued and submitted to the required CT logs. Once the SCTs for the precertificate have been collected a new certificate is issued with the poison extension replace with a SCT list extension containing the retrieved SCTs.
Fixes#2244, fixes#3492 and fixes#3429.
This code was never enabled in production. Our original intent was to
ship this as part of the ACMEv2 API. Before that could happen flaws were
identified in TLS-SNI-01|02 that resulted in TLS-SNI-02 being removed
from the ACME protocol. We won't ever be enabling this code and so we
might as well remove it.
Boulder is fairly noisy about gRPC connection errors. This is a mixed
blessing: Our gRPC configuration will try to reconnect until it hits
an RPC deadline, and most likely eventually succeed. In that case,
we don't consider those to really be errors. However, in cases where
a connection is repeatedly failing, we'd like to see errors in the
logs about connection failure, rather than "deadline exceeded." So
we want to keep logging of gRPC errors.
However, right now we get a lot of these errors logged during
integration tests. They make the output hard to read, and may disguise
more serious errors. So we'd like to avoid causing such errors in
normal integration test operation.
This change reorders the startup of Boulder components by their gRPC
dependencies, so everything's backend is likely to be up and running
before it starts. It also reverses that order for clean shutdowns,
and waits for each process to exit before signalling the next one.
With these changes, I still got connection errors. Taking listenbuddy
out of the gRPC path fixed them. I believe the issue is that
listenbuddy is not a truly transparent proxy. In particular, it
accepts an inbound TCP connection before opening an outbound TCP
connection. If opening that outbound connection results in "connection
refused," it closes the inbound connection. That means gRPC sees a
"connection closed" (or "connection reset"?) rather than "connection
refused". I'm guessing it handles those cases differently, explaining
the different error results.
We've been using listenbuddy to trigger disconnects while Boulder is
running, to ensure that gRPC's reconnect code works. I think we can
probably rely on gRPC's reconnect to work. The initial problem that
led us to start testing this was a configuration problem; now that
we have the configuration we want, we should be fine and don't need
to keep testing reconnects on every integration test run.
Previously, there was a disagreement between WFE and CA as to what the correct
issuer certificate was. Consolidate on test-ca2.pem (h2ppy h2cker fake CA).
Also, the CA configs contained an outdated entry for "IssuerCert", which was not
being used: The CA configs now use an "Issuers" array to allow signing by
multiple issuer certificates at once (for instance when rolling intermediates).
Removed this outdated entry, and the config code for CA to load it. I've
confirmed these changes match what is currently in production.
Added an integration test to check for this problem in the future.
Fixes#3309, thanks to @icing for bringing the issue to our attention!
This also includes changes from #3321 to clarify certificates for WFE.
This PR implements issuance for wildcard names in the V2 order flow. By policy, pending authorizations for wildcard names only receive a DNS-01 challenge for the base domain. We do not re-use authorizations for the base domain that do not come from a previous wildcard issuance (e.g. a normal authorization for example.com turned valid by way of a DNS-01 challenge will not be reused for a *.example.com order).
The wildcard prefix is stripped off of the authorization identifier value in two places:
When presenting the authorization to the user - ACME forbids having a wildcard character in an authorization identifier.
When performing validation - We validate the base domain name without the *. prefix.
This PR is largely a rewrite/extension of #3231. Instead of using a pseudo-challenge-type (DNS-01-Wildcard) to indicate an authorization & identifier correspond to the base name of a wildcard order name we instead allow the identifier to take the wildcard order name with the *. prefix.
Adds a basic truncated modulus hash check for RSA keys that can be used to check keys against the Debian `{openssl,openssh,openvpn}-blacklist` lists of weak keys generated during the [Debian weak key incident](https://wiki.debian.org/SSLkeys).
Testing is gated on adding a new configuration key to the WFE, RA, and CA configs which contains the path to a directory which should contain the weak key lists.
Fixes#157.
This removes the config and code to output to statsd.
- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.
Fixes#2639, #2733.