Add a new custom lint which sends CRLs to PKIMetal, and configure it to
run in our integration test environment. Factor out most of the code
used to talk to the PKIMetal API so that it can be shared by the two
custom lints which do so. Add the ability to configure lints to the
CRLProfileConfig, so that zlint knows where to load the necessary custom
config from.
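The shared helper is essentially a small HTTP client. As a rough illustration only (the package name, endpoint paths, form field, and response shape below are assumptions for the sketch, not PKIMetal's documented API), it might look like this:

```go
package pkimetal

import (
	"context"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// Finding is one result returned by the linting service. Field names here
// are illustrative, not PKIMetal's actual schema.
type Finding struct {
	Linter   string `json:"linter"`
	Severity string `json:"severity"`
	Message  string `json:"message"`
}

// Client is the piece shared by the certificate and CRL custom lints: it
// POSTs a DER-encoded artifact to a configured endpoint and decodes the
// JSON findings.
type Client struct {
	addr string // e.g. "http://pkimetal:8080", taken from the lint's config
	http *http.Client
}

func New(addr string) *Client {
	return &Client{addr: addr, http: &http.Client{Timeout: 5 * time.Second}}
}

// Lint submits der to the given endpoint (e.g. a hypothetical "/lintcert"
// or "/lintcrl" path) and returns any findings.
func (c *Client) Lint(ctx context.Context, endpoint string, der []byte) ([]Finding, error) {
	form := url.Values{"b64input": {base64.StdEncoding.EncodeToString(der)}}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.addr+endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")

	resp, err := c.http.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("linting service returned status %d", resp.StatusCode)
	}

	var findings []Finding
	if err := json.NewDecoder(resp.Body).Decode(&findings); err != nil {
		return nil, err
	}
	return findings, nil
}
```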
Remove some RA tests that were checking for errors specific to the split
issuance flow. Rework one of the tests to exercise GetSCTs directly, which
makes for a much nicer test!
Compute the width of the ARI suggested renewal window as 2% of the
validity period. This means that 90-day certificates have their
suggested window shrink slightly from 48 hours to 43.2 hours, and gives
six-day (160h) certs a suggested window 3.2 hours wide.
Also move the center of that window to the midpoint of the certificate
validity period for certs which are valid for less than 10 days, so that
operators have (proportionally) a little more time to respond to renewal
issues.
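A minimal sketch of the arithmetic (not Boulder's actual implementation; where the window sits for longer-lived certificates is not described here, so the stand-in below is purely illustrative):

```go
package main

import (
	"fmt"
	"time"
)

// suggestedWindow sketches the rule described above: the window is 2% of the
// validity period wide, and for certificates valid for less than 10 days it
// is centered on the midpoint of the validity period.
func suggestedWindow(notBefore, notAfter time.Time) (time.Time, time.Time) {
	validity := notAfter.Sub(notBefore)
	width := validity / 50 // 2% of the validity period

	center := existingCenter(notBefore, notAfter)
	if validity < 10*24*time.Hour {
		center = notBefore.Add(validity / 2)
	}
	return center.Add(-width / 2), center.Add(width / 2)
}

// existingCenter is a hypothetical placeholder for however the window is
// placed for longer-lived certificates; that placement is unchanged by this
// PR and not specified here.
func existingCenter(notBefore, notAfter time.Time) time.Time {
	return notBefore.Add(notAfter.Sub(notBefore) * 2 / 3)
}

func main() {
	nb := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)

	s, e := suggestedWindow(nb, nb.Add(90*24*time.Hour)) // 90-day cert
	fmt.Println(e.Sub(s))                                // 43h12m0s

	s, e = suggestedWindow(nb, nb.Add(160*time.Hour)) // six-day (160h) cert
	fmt.Println(e.Sub(s))                             // 3h12m0s
}
```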
Fixes https://github.com/letsencrypt/boulder/issues/7996
- In revocation_test.go, fetch all CRLs, and look for revoked certificates
  on both CRLs and OCSP.
- Make s3-test-srv listen on all interfaces, so the CRL URLs in the CA
  config work.
- Add IssuerNameIDs to the CRL URLs in ca.json, to match how those CRLs
  are uploaded to S3.
- Make TestRevocation parallel. Speedup from ~60s to ~3s.
- Increase ocsp-responder's allowed parallelism to account for the parallel
  test. Also, add "maxInflightSignings" to config/ since it's in prod.
  "maxSigningWaiters" is not yet in prod, so don't move that field.
- Add a mutex around running crl-updater, and decrease the log level so
  errors stand out more when they happen.
Remove the singular Profile field from the CA config, as it has been
replaced by the plural CertProfiles key. Remove the Expiry, Backdate,
LintConfig, and IgnoredLints keys from the top-level CA config, as they
are now also configured on a per-profile basis. Remove the LifespanCRL
key from the CA config, as it is now configured within the CRLProfile.
For all of the above, remove transitional fallbacks from within
//ca/main.go.
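Very roughly, the remaining shape looks like the sketch below; field types and any names not mentioned above are illustrative guesses, not the real config schema.

```go
package config

// CAConfig: the singular Profile field, the top-level Expiry, Backdate,
// LintConfig, and IgnoredLints keys, and LifespanCRL are all gone.
type CAConfig struct {
	CertProfiles map[string]ProfileConfig // plural, keyed by profile name
	CRLProfile   CRLProfileConfig
}

// ProfileConfig carries the settings that used to live at the top level,
// now configured per certificate profile.
type ProfileConfig struct {
	Expiry       string
	Backdate     string
	LintConfig   string
	IgnoredLints []string
}

// CRLProfileConfig absorbs the old LifespanCRL setting (and, per the change
// described earlier, can also name a lint config).
type CRLProfileConfig struct {
	Lifespan   string // illustrative name for the old LifespanCRL value
	LintConfig string
}
```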
These config changes were deployed to production in IN-10568, IN-10506,
and IN-10045.
Fixes https://github.com/letsencrypt/boulder/issues/7414
Fixes https://github.com/letsencrypt/boulder/issues/7159
This brings in several new and useful lints. It also brings in one CABF
BR lint, "w_ext_subject_key_identifier_not_recommended_subscriber", which
we have to ignore in our default profile because that profile includes the
Subject Key Identifier extension. In our modern profile, which omits
several fields, we instead have to ignore the opposite RFC 5280 lint,
"w_ext_subject_key_identifier_missing_sub_cert".
Release notes: https://github.com/zmap/zlint/releases/tag/v3.6.4
Changelog: https://github.com/zmap/zlint/compare/v3.6.0...v3.6.4
Note that the majority of the ~400 file changes are merely copyright
date changes.
The corresponding production config changes tracked in IN-10466 are
complete.
Goodkey has two ways to detect a key as weak: it runs a variety of
algorithmic checks (such as Fermat factorization and rocacheck), or the
key can be listed in a "weak key file". Similarly, it has two ways to
detect a key as blocked: it can call a generic function (which we use to
query our database), or the key can be listed in a "blocked key file".
This is two methods too many. Reliance on files of weak or blocked keys
introduces unnecessary complexity to both the implementation and
configuration of the goodkey package. Remove both "key file" options and
delete all code which supported them.
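After this change the package is left with one mechanism per category; a minimal sketch of that shape (names and signatures illustrative, not the actual goodkey API):

```go
package goodkeysketch

import (
	"context"
	"crypto"
	"errors"
)

// BlockedKeyCheckFunc reports whether a key is administratively blocked;
// in Boulder's case the supplied function queries the database.
type BlockedKeyCheckFunc func(ctx context.Context, key crypto.PublicKey) (bool, error)

type KeyPolicy struct {
	blockedCheck BlockedKeyCheckFunc
}

func (p KeyPolicy) GoodKey(ctx context.Context, key crypto.PublicKey) error {
	// Algorithmic weak-key checks (Fermat factorization, rocacheck, etc.)
	// are now the only weak-key detection; the "weak key file" is gone.
	if runAlgorithmicChecks(key) {
		return errors.New("key failed algorithmic weak-key checks")
	}
	// The generic lookup function is now the only blocked-key detection;
	// the "blocked key file" is gone.
	if p.blockedCheck != nil {
		blocked, err := p.blockedCheck(ctx, key)
		if err != nil {
			return err
		}
		if blocked {
			return errors.New("key is on the blocked list")
		}
	}
	return nil
}

// runAlgorithmicChecks is a placeholder for the real checks; it always
// passes in this sketch.
func runAlgorithmicChecks(key crypto.PublicKey) bool {
	return false
}
```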
Also remove //test/block-a-key, as it was only used to generate these
test files.
IN-10762 tracked the removal of these files in prod.
Fixes https://github.com/letsencrypt/boulder/issues/7748
The `ECDSAForAll` feature is now enabled by default (it is no longer
referenced in any issuance path), and as a result the `ECDSAAllowlist`
has been deleted.
Fixes https://github.com/letsencrypt/boulder/issues/7535
This change removes the ECDSAAllowList entry and enables ECDSAForAll for
the `test/config/ca.json` to match the configuration in
`test/config-next/ca.json`. A future change will remove ECDSAAllowList
and ECDSAForAll permanently.
Part of https://github.com/letsencrypt/boulder/issues/7535
This was introduced in config-next in #7441, and has been working well.
We should run it in the mainline tests as well.
No production config change is necessary.
The summary here is:
- Move test/cert-ceremonies to test/certs
- Move .hierarchy (generated by the above) to test/certs/webpki
- Remove our mapping of .hierarchy to /hierarchy inside docker
- Move test/grpc-creds to test/certs/ipki
- Unify the generation of both test/certs/webpki and test/certs/ipki
into a single script at test/certs/generate.sh
- Make that script the entrypoint of a new docker compose service
- Have t.sh and tn.sh invoke that service to ensure keys and certs are
created before tests run
No production changes are necessary, the config changes here are just
for testing purposes.
Part of https://github.com/letsencrypt/boulder/issues/7476
Remove the CA's global "crldpBase" config item, and the code which used
it to produce an IDP URI in our CRLs if it was configured.
This config item has been replaced by per-issuer crlURLBase configs
instead, because we have switched our CRL URL format from
"commonURL/issuerID/shard.crl" to "issuerURL/shard.crl" in anticipation
of including these URLs directly in our end-entity certs.
IN-10046 tracked the corresponding change in prod.
Add a new "LintConfig" item to the CA's config, which can point to a
zlint configuration toml file. This allows lints to be configured, e.g.
to control the number of rounds of factorization performed by the Fermat
factorization lint.
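For context, Fermat factorization looks for RSA moduli whose prime factors are close together, and the configured number of rounds bounds how far the search walks. A toy sketch of the technique (not zlint's implementation):

```go
package main

import (
	"fmt"
	"math/big"
)

// fermatVulnerable reports whether n can be factored within the given number
// of rounds of Fermat's method, i.e. whether its two factors are close
// enough together that a starting near ceil(sqrt(n)) quickly satisfies
// a^2 - n = b^2.
func fermatVulnerable(n *big.Int, rounds int) bool {
	one := big.NewInt(1)
	a := new(big.Int).Sqrt(n)
	if new(big.Int).Mul(a, a).Cmp(n) < 0 {
		a.Add(a, one) // round sqrt(n) up
	}
	b2 := new(big.Int)
	b := new(big.Int)
	for i := 0; i < rounds; i++ {
		b2.Mul(a, a).Sub(b2, n) // b^2 = a^2 - n
		b.Sqrt(b2)
		if new(big.Int).Mul(b, b).Cmp(b2) == 0 {
			return true // n = (a-b)(a+b); the factors are dangerously close
		}
		a.Add(a, one)
	}
	return false
}

func main() {
	// 1000003 and 1000033 are primes only 30 apart, so one round suffices.
	n := new(big.Int).Mul(big.NewInt(1000003), big.NewInt(1000033))
	fmt.Println(fermatVulnerable(n, 100)) // true
}
```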
Leverage this new config to create a new custom zlint which calls out to
a configured pkilint API endpoint. In config-next integration tests,
configure the lint to point at a new pkilint docker container.
This approach has three nice forward-looking features: we now have the
ability to configure any of our lints; it's easy to expand this
mechanism to lint CRLs when the pkilint API has support for that; and
it's easy to enable this new lint if we decide to stand up a pkilint
container in our production environment.
No production configuration changes are necessary at this time.
Fixes https://github.com/letsencrypt/boulder/issues/7430
Update the hierarchy which the integration tests auto-generate inside
the ./hierarchy folder to include three intermediates of each key type,
two to be actively loaded and one to be held in reserve. To facilitate
this:
- Update the generation script to loop, rather than hard-coding each
intermediate we want
- Improve the filenames of the generated hierarchy to be more readable
- Replace the WFE's AIA endpoint with a thin aia-test-srv so that we
don't have to have NameIDs hardcoded in our ca.json configs
Having this new hierarchy will make it easier for our integration tests
to validate that new features like "unpredictable issuance" are working
correctly.
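The actual generation is done by the test scripts, but purely to illustrate the "loop instead of hard-coding" idea with readable names (the filenames, labels, and self-signing below are hypothetical; the real hierarchy is signed by the test roots):

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"os"
	"time"
)

func main() {
	for _, alg := range []string{"rsa", "ecdsa"} {
		// Three per key type: "a" and "b" are actively loaded, "c" is the spare.
		for i, label := range []string{"a", "b", "c"} {
			var key crypto.Signer
			var err error
			if alg == "rsa" {
				key, err = rsa.GenerateKey(rand.Reader, 2048)
			} else {
				key, err = ecdsa.GenerateKey(elliptic.P384(), rand.Reader)
			}
			if err != nil {
				panic(err)
			}
			tmpl := &x509.Certificate{
				SerialNumber:          big.NewInt(int64(i + 1)),
				Subject:               pkix.Name{CommonName: fmt.Sprintf("test intermediate %s %s", alg, label)},
				NotBefore:             time.Now(),
				NotAfter:              time.Now().AddDate(5, 0, 0),
				IsCA:                  true,
				BasicConstraintsValid: true,
				KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
			}
			// Self-signed here only to keep the sketch short.
			der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, key.Public(), key)
			if err != nil {
				panic(err)
			}
			name := fmt.Sprintf("int-%s-%s.cert.pem", alg, label) // readable, hypothetical filename
			f, err := os.Create(name)
			if err != nil {
				panic(err)
			}
			pem.Encode(f, &pem.Block{Type: "CERTIFICATE", Bytes: der})
			f.Close()
		}
	}
}
```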
Part of https://github.com/letsencrypt/boulder/issues/729
The `//ca/ca_test.go` `setup` function will now create issuers that each
have a unique private key from `//test/hierarchy/`, rather than multiple
issuers sharing a private key. This was spotted while reviewing an OCSP
test (`ca/ocsp_test.go` L53-L87 at commit `10e894a172`).
Some now unnecessary key material has been deleted from `//test/`.
Fixes https://github.com/letsencrypt/boulder/issues/7304
Upgrade to zlint v3.6.0
Two new lints are triggered in various places:
aia_contains_internal_names is ignored in integration test
configurations, and unit tests are updated to have more realistic URLs.
The w_subject_common_name_included lint needs to be ignored where we'd
ignored n_subject_common_name_included before.
Related to https://github.com/letsencrypt/boulder/issues/7261
Retains the existing logging of orphaned certs until we are confident that this
solution can fully replace it (even then we may want to keep it just for auditing, etc.).
Fixes #3636.
Things removed:
* features.EmbedSCTs (and all the associated RA/CA/ocsp-updater code etc)
* ca.enablePrecertificateFlow (and all the associated RA/CA code)
* sa.AddSCTReceipt and sa.GetSCTReceipt RPCs
* publisher.SubmitToCT and publisher.SubmitToSingleCT RPCs
Fixes #3755.
In staging/prod we use `doNotForceCN: false` for both the RA & CA
config. Switching this to `true` is blocked on CABF work that will
likely take considerable time.
In the short-term we should use `doNotForceCN: false` in `test/config`
and only use `doNotForceCN: true` in `test/config-next`.
We're currently stuck on gRPC v1.1 because of a breaking change to certificate validation in gRPC 1.8. Our gRPC balancer uses a static list of multiple hostnames, and expects to validate against those hostnames. However gRPC expects that a service is one hostname, with multiple IP addresses, and validates all those IP addresses against the same hostname. See grpc/grpc-go#2012.
If we follow gRPC's assumptions, we can rip out our custom Balancer and custom TransportCredentials, and will probably have a lower-friction time in general.
This PR is the first step in doing so. In order to satisfy the "multiple IPs, one port" property of gRPC backends in our Docker container infrastructure, we switch to Docker's user-defined networking. This allows us to give the Boulder container multiple IP addresses on different local networks, and gives it different DNS aliases in each network.
In startservers.py, each shard of a service listens on a different DNS alias for that service, and therefore a different IP address. The listening port for each shard of a service is now identical.
This change also updates the gRPC service certificates. Now, each certificate that is used in a gRPC service (as opposed to something that is "only" a client) has three names. For instance, sa1.boulder, sa2.boulder, and sa.boulder (the generic service name). For now, we are validating against the specific hostnames. When we update our gRPC dependency, we will begin validating against the generic service name.
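For illustration, once validation targets the generic service name, a client-side dial might look roughly like the sketch below (not Boulder's actual wrapper code; the address, port, and file path are made up):

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

func main() {
	// Trust the internal gRPC CA (path is illustrative).
	caPEM, err := os.ReadFile("test/grpc-creds/ca.pem")
	if err != nil {
		log.Fatal(err)
	}
	roots := x509.NewCertPool()
	roots.AppendCertsFromPEM(caPEM)

	creds := credentials.NewTLS(&tls.Config{
		RootCAs:    roots,
		ServerName: "sa.boulder", // the generic service name shared by sa1.boulder and sa2.boulder
	})

	// The DNS resolver returns every IP behind the alias; each connection is
	// validated against the same ServerName regardless of which backend IP
	// it reaches.
	conn, err := grpc.Dial("dns:///sa.boulder:9095", grpc.WithTransportCredentials(creds))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}
```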
Incidentally, the DNS aliases feature of Docker allows us to get rid of some hackery in entrypoint.sh that inserted entries into /etc/hosts.
Note: Boulder now has a dependency on the DNS aliases feature in Docker. By default, docker-compose run creates a temporary container and doesn't assign any aliases to it. We now need to specify docker-compose run --use-aliases to get the correct behavior. Without --use-aliases, Boulder won't be able to resolve the hostnames it wants to bind to.
Boulder is fairly noisy about gRPC connection errors. This is a mixed
blessing: Our gRPC configuration will try to reconnect until it hits
an RPC deadline, and most likely eventually succeed. In that case,
we don't consider those to really be errors. However, in cases where
a connection is repeatedly failing, we'd like to see errors in the
logs about connection failure, rather than "deadline exceeded." So
we want to keep logging of gRPC errors.
However, right now we get a lot of these errors logged during
integration tests. They make the output hard to read, and may disguise
more serious errors. So we'd like to avoid causing such errors in
normal integration test operation.
This change reorders the startup of Boulder components by their gRPC
dependencies, so everything's backend is likely to be up and running
before it starts. It also reverses that order for clean shutdowns,
and waits for each process to exit before signalling the next one.
With these changes, I still got connection errors. Taking listenbuddy
out of the gRPC path fixed them. I believe the issue is that
listenbuddy is not a truly transparent proxy. In particular, it
accepts an inbound TCP connection before opening an outbound TCP
connection. If opening that outbound connection results in "connection
refused," it closes the inbound connection. That means gRPC sees a
"connection closed" (or "connection reset"?) rather than "connection
refused". I'm guessing it handles those cases differently, explaining
the different error results.
We've been using listenbuddy to trigger disconnects while Boulder is
running, to ensure that gRPC's reconnect code works. I think we can
probably rely on gRPC's reconnect to work. The initial problem that
led us to start testing this was a configuration problem; now that
we have the configuration we want, we should be fine and don't need
to keep testing reconnects on every integration test run.
Previously, there was a disagreement between WFE and CA as to what the correct
issuer certificate was. Consolidate on test-ca2.pem (h2ppy h2cker fake CA).
Also, the CA configs contained an outdated entry for "IssuerCert", which was not
being used: The CA configs now use an "Issuers" array to allow signing by
multiple issuer certificates at once (for instance when rolling intermediates).
Removed this outdated entry, and the config code for CA to load it. I've
confirmed these changes match what is currently in production.
Added an integration test to check for this problem in the future.
Fixes #3309, thanks to @icing for bringing the issue to our attention!
This also includes changes from #3321 to clarify certificates for WFE.
Deletes github.com/streadway/amqp and the various RabbitMQ setup tools etc. Changes how listenbuddy is used to proxy all of the gRPC client -> server connections so we test reconnection logic.
+49 -8,221 😁 Fixes #2640 and #2562.
Pulls in logging improvements in OCSP Responder and the CT client, plus a handful of API changes. Also, the CT client verifies responses by default now.
This change includes some Boulder diffs to accommodate the API changes.
Set authorizationLifetimeDays to 60 across both config and config-next.
Set NumSessions to 2 in both config and config-next. This is a decrease from 10, because pkcs11-proxy (or pkcs11-daemon?) seems to error out under load if you have more sessions than CPUs.
Reorder parallelGenerateOCSPRequests to match config-next.
Remove extra tags for parsing yaml in config objects.
- Remove spinner from test.js. It made Travis logs hard to read.
- Listen on all interfaces for debugAddr. This makes it possible to check
Prometheus metrics for instances running in a Docker container.
- Standardize DNS timeouts on 1s and 3 retries across all configs. This ensures
DNS completes within the relevant RPC timeouts.
- Remove RA service queue from VA, since VA no longer uses the callback to RA on
completing a challenge.
Instead of reading the CA key from a file on disk into memory and using that for signing in `boulder-ca`, this patch adds a new Docker container that runs SoftHSM and pkcs11-proxy in order to hold the key and perform signing operations. The pkcs11-proxy module is used by `boulder-ca` to talk to the SoftHSM container.
This exercises (almost) the full pkcs11 path through boulder and will allow testing various HSM related failures in the future as well as simplifying tuning signing performance for benchmarking.
Fixes #703.