Permit all valid identifier types in `wfe.NewOrder` and `csr.VerifyCSR`.
Permit certs with just IP address identifiers to skip
`sa.addIssuedNames`.
Check that URI SANs are empty in `csr.VerifyCSR`, which was previously
missed.
Use a real (Let's Encrypt) IP address range in integration testing, to
let challtestsrv satisfy IP address challenges.
Fixes#8192
Depends on #8154
- Copy
https://pkg.go.dev/github.com/letsencrypt/pebble/v2/cmd/pebble-challtestsrv
to `test/chall-test-srv`
- Rename pebble-challtestsrv to chall-test-srv, consistent with other
test server naming in Boulder
- Replace Dockerfile go install with Makefile compilation of
`chall-test-srv`
- Run chall-test-srv from `./bin/chall-test-srv`
- Bump `github.com/letsencrypt/challtestsrv` from `v1.2.1` to `v1.3.2`
in go.mod
- Update boulder-ci GitHub workflow to use `go1.24.1_2025-04-02`
Part of #7963
Remove crl-updater from the list of services run by startservers.py, so
that it isn't running at the same time as the crl-updater instances run
by specific integration tests. In return, add a new integration test
which starts crl-updater and waits for it to listen on its debug port,
just like startservers does.
Also make the existing crl-updater integration tests more robust and
more parallelizable by having them always reset the leasedUntil column
before executing the updater, instead of requiring each individual test
to perform that reset.
Fixes https://github.com/letsencrypt/boulder/issues/7590
Add a new boulder service, email-exporter, which uses the Pardot API
client added in #8016 and the email.Exporter gRPC service added in
#8017.
Add pardot-test-srv, a test-only service for mocking communication with
Salesforce OAuth and Pardot APIs in non-production environments. Since
Salesforce does not provide Pardot functionality in developer sandboxes,
pardot-test-srv must run in all non-production environments (e.g.,
sre-development and staging).
Integrate the email-exporter service with the WFE and modify
WFE.NewAccount and WFE.UpdateAccount to submit valid email contacts.
Ensure integration tests verify that contacts eventually reach
pardot-test-srv.
Update configuration where necessary to:
- Build pardot-test-srv as a standalone binary.
- Bring up pardot-test-srv and cmd/email-exporter for integration
testing.
- Integrate WFE with cmd/email-exporter when running test/config-next.
Closes#7966
Add a new RPC to the CA: `IssueCertificate` covers issuance of both the
precertificate and the final certificate. In between, it calls out to
the RA's new method `GetSCTs`.
The RA calls the new `CA.IssueCertificate` if the `UnsplitIssuance`
feature flag is true.
The RA had a metric that counted certificates by profile name and hash.
Since the RA doesn't receive a profile hash in the new flow, simply
record the total number of issuances.
Fixes https://github.com/letsencrypt/boulder/issues/7983
In revocation_test.go, fetch all CRLs, and look for revoked certificates
on both CRLs and OCSP.
Make s3-test-srv listen on all interfaces, so the CRL URLs in the CA
config work.
Add IssuerNameIDs to the CRL URLs in ca.json, to match how those CRLs
are uploaded to S3.
Make TestRevocation parallel. Speedup from ~60s to ~3s.
Increase ocsp-responder's allowed parallelism to account for parallel
test. Also, add "maxInflightSignings" to config/ since it's in prod.
"maxSigningWaiters" is not yet in prod, so don't move that field.
Add a mutex around running crl-updater, and decrease the log level so
errors stand out more when they happen.
To prepare for the MPIC requirement of having a minimum of 3
perspectives, I added code to `NewValidationAuthorityImpl` to error if
there aren't enough remote VAs configured _and_ the current VA is the
primary perspective. Then I fixed all the tests, which involved adding
some backends in the unittests, and spinning up `remoteva-c` in the
integration tests.
As a reminder, the `boulder va` command always considers itself the
primary perspective, while `boulder remoteva` gives itself a perspective
based on its config.
I wound up backing out the code in `NewValidationAuthorityImpl` because
right now our remote VAs are actually running the `boulder va` command,
so they would error out in prod, even though our actual primary
perspective does have enough backends. So this wound up as a test-only
change.
Adds a new boulder component named `sfe` aka the Self-service FrontEnd
which is dedicated to non-ACME related Subscriber functions. This change
implements one such function which is a web interface and handlers for
account unpausing.
When paused, an ACME client receives a log line URL with a JWT parameter
from the WFE. For the observant Subscriber, manually clicking the link
opens their web browser and displays a page with a pre-filled HTML form.
Upon clicking the form button, the SFE sends an HTTP POST back to itself
and either validates the JWT and issues an RA gRPC request to unpause
the account, or returns an HTML error page.
The SFE and WFE should share a 32 byte seed value e.g. the output of
`openssl rand -hex 16` which will be used as a go-jose symmetric signer
using the HS256 algorithm. The SFE will check various [RFC
7519](https://datatracker.ietf.org/doc/html/rfc7519) claims on the JWT
such as the `iss`, `aud`, `nbf`, `exp`, `iat`, and a custom `apiVersion`
claim.
The SFE should not yet be relied upon or deployed to staging/production
environments. It is very much a work in progress, but this change is big
enough as-is.
Related to https://github.com/letsencrypt/boulder/issues/7406
Part of https://github.com/letsencrypt/boulder/issues/7499
Remove the redis-tls, wfe-tls, and mail-test-srv keys which were
generated by minica and then checked in to the repo. All three are
replaced by the dynamically-generated ipki directory.
Part of https://github.com/letsencrypt/boulder/issues/7476
The summary here is:
- Move test/cert-ceremonies to test/certs
- Move .hierarchy (generated by the above) to test/certs/webpki
- Remove our mapping of .hierarchy to /hierarchy inside docker
- Move test/grpc-creds to test/certs/ipki
- Unify the generation of both test/certs/webpki and test/certs/ipki
into a single script at test/certs/generate.sh
- Make that script the entrypoint of a new docker compose service
- Have t.sh and tn.sh invoke that service to ensure keys and certs are
created before tests run
No production changes are necessary, the config changes here are just
for testing purposes.
Part of https://github.com/letsencrypt/boulder/issues/7476
* Adds a new `remoteva` binary that takes a distinct configuration from
the existing `boulder-va`
* Removed the `boulder-remoteva` name registration from `boulder-va`.
* Existing users of `boulder-remoteva` must either
1. laterally migrate to `boulder-va` which uses that same config, or
2. switch to using `remoteva` with a new config.
Part of https://github.com/letsencrypt/boulder/issues/5294
Update the hierarchy which the integration tests auto-generate inside
the ./hierarchy folder to include three intermediates of each key type,
two to be actively loaded and one to be held in reserve. To facilitate
this:
- Update the generation script to loop, rather than hard-coding each
intermediate we want
- Improve the filenames of the generated hierarchy to be more readable
- Replace the WFE's AIA endpoint with a thin aia-test-srv so that we
don't have to have NameIDs hardcoded in our ca.json configs
Having this new hierarchy will make it easier for our integration tests
to validate that new features like "unpredictable issuance" are working
correctly.
Part of https://github.com/letsencrypt/boulder/issues/729
These names corresponded to single instances of a service, and were
primarily used for (a) specifying which interface to bind a gRPC port on
and (b) allowing `health-checker` to check individual instances rather
than a service as a whole.
For (a), change the `--grpc-addr` flags to bind to "all interfaces." For
(b), provide a specific IP address and port for health checking. This
required adding a `--hostOverride` flag for `health-checker` because the
service certificates contain hostname SANs, not IP address SANs.
Clarify the situation with nonce services a little bit. Previously we
had one nonce "service" in Consul and got nonces from that (i.e.
randomly between the two nonce-service instances). Now we have two nonce
services in consul, representing multiple datacenters, and one of them
is explicitly configured as the "get" service, while both are configured
as the "redeem" service.
Part of #7245.
Note this change does not yet get rid of the rednet/bluenet distinction,
nor does it get rid of all use of 10.88.88.88. That will be a followup
change.
Part of #7245.
This just provides a unique port for each instance, and breaks the
service<->port mapping. A subsequent PR will move to listening on the
same IP.
Remove unused `-b` variants of crl-storer and akamai-purger.
The new port scheme is that the first instance of a service is on `93xx`
and the second instance of a service is on `94xx`.
Part of a stacked change with #7243.
The servers are invoked such that they have to look up their service
names in DNS in order to bind a port. This means that when consul is
down, they take a long time to start up- they are timing out the query.
In the meantime there are a number of messages about timed out health
checks. This winds up obscuring the real error, so let's do a quick DNS
check at startup and give a more meaningful error.
Many services already have --addr and/or --debug-addr flags.
However, it wasn't universal, so this PR adds flags to commands where
they're not currently present.
This makes it easier to use a shared config file but listen on different
ports, for running multiple instances on a single host.
The config options are made optional as well, and removed from
config-next/.
Most boulder components have a command line flag to override what gRPC
and debug port they listen on, which is used in tests to run multiple
instances with the same configuration.
However, CA's flag is named "--ca-addr", and not "--addr". This is
inconsistent with SA, RA, VA, nonce, publisher, and purger.
This flag isn't used in production, where we set it in the config file,
so it shouldn't be a breaking change to rename it.
These config stanzas have been removed in staging and prod. They used to
configure the separate OCSP and CRL gRPC services provided by the CA
process, but the CA now provides those services on the same port as the
main CA gRPC service.
Fixes#6448
Delete the ocsp-updater service, and the //ocsp/updater library that
supports it. Remove test configs for the service, and remove references
to the service from other test files.
This service has been fully shut down for an extended period now, and is
safe to remove.
Fixes#6499
Assign nonce prefixes for each nonce-service by taking the first eight
characters of the the base64url encoded HMAC-SHA256 hash of the RPC
listening address using a provided key. The provided key must be same
across all boulder-wfe and nonce-service instances.
- Add a custom `grpc-go` load balancer implementation (`nonce`) which
can route nonce redemption RPC messages by matching the prefix to the
derived prefix of the nonce-service instance which created it.
- Modify the RPC client constructor to allow the operator to override
the default load balancer implementation (`round_robin`).
- Modify the `srv` RPC resolver to accept a comma separated list of
targets to be resolved.
- Remove unused nonce-service `-prefix` flag.
Fixes#6404
Boulder builds a single binary which is symlinked to the different binary names, which are included in its releases.
However, requiring symlinks isn't always convenient.
This change makes the base `boulder` command usable as any of the other binary names. If the binary is invoked as boulder, runs the second argument as the command name. It shifts off the `boulder` from os.Args so that all the existing argument parsing can remain unchanged.
This uses the subcommand versions in integration tests, which I think is important to verify this change works, however we can debate whether or not that should be merged, since we're using the symlink method in production, that's what we want to test.
Issue #6362 suggests we want to move to a more fully-featured command-line parsing library that has proper subcommand support. This fixes one fragment of that, by providing subcommands, but is definitely nowhere near as nice as it could be with a more fully fleshed out library. Thus this change takes a minimal-touch approach to this change, since we know a larger refactoring is coming.
- Add a dedicated Consul container
- Replace `sd-test-srv` with Consul
- Add documentation for configuring Consul
- Re-issue all gRPC credentials for `<service-name>.service.consul`
Part of #6111
Create a new crl-storer service, which receives CRL shards via gRPC and
uploads them to an S3 bucket. It ignores AWS SDK configuration in the
usual places, in favor of configuration from our standard JSON service
config files. It ensures that the CRLs it receives parse and are signed
by the appropriate issuer before uploading them.
Integrate crl-updater with the new service. It streams bytes to the
crl-storer as it receives them from the CA, without performing any
checking at the same time. This new functionality is disabled if the
crl-updater does not have a config stanza instructing it how to connect
to the crl-storer.
Finally, add a new test component, the s3-test-srv. This acts similarly
to the existing mail-test-srv: it receives requests, stores information
about them, and exposes that information for later querying by the
integration test. The integration test uses this to ensure that a
newly-revoked certificate does show up in the next generation of CRLs
produced.
Fixes#6162
Create a new service named crl-updater. It is responsible for
maintaining the full set of CRLs we issue: one "full and complete" CRL
for each currently-active Issuer, split into a number of "shards" which
are essentially CRLs with arbitrary scopes.
The crl-updater is modeled after the ocsp-updater: it is a long-running
standalone service that wakes up periodically, does a large amount of
work in parallel, and then sleeps. The period at which it wakes to do
work is configurable. Unlike the ocsp-responder, it does all of its work
every time it wakes, so we expect to set the update frequency at 6-24
hours.
Maintaining CRL scopes is done statelessly. Every certificate belongs to
a specific "bucket", given its notAfter date. This mapping is generally
unchanging over the life of the certificate, so revoked certificate
entries will not be moving between shards upon every update. The only
exception is if we change the number of shards, in which case all of the
bucket boundaries will be recomputed. For more details, see the comment
on `getShardBoundaries`.
It uses the new SA.GetRevokedCerts method to collect all of the revoked
certificates whose notAfter timestamps fall within the boundaries of
each shard's time-bucket. It uses the new CA.GenerateCRL method to sign
the CRLs. In the future, it will send signed CRLs to the crl-storer to
be persisted outside our infrastructure.
Fixes#6163
Add a new CA gRPC method named `GenerateCRL`. In the
style of the existing `GenerateOCSP` method, this new endpoint
is implemented as a separate service, for which the CA binary
spins up an additional gRPC service.
This method uses gRPC streaming for both its input and output.
For input, the stream must contain exactly one metadata message
identifying the crl number, issuer, and timestamp, and then any
number of messages identifying a single certificate which should
be included in the CRL. For output, it simply streams chunks of
bytes.
Fixes#6161
Update:
- golangci-lint from v1.42.1 to v1.46.2
- protoc from v3.15.6 to v3.20.1
- protoc-gen-go from v1.26.0 to v1.28.0
- protoc-gen-go-grpc from v1.1.0 to v1.2.0
- fpm from v1.14.0 to v1.14.2
Also remove a reference to go1.17.9 from one last place.
This does result in updating all of our generated .pb.go files, but only
to update the version number embedded in each file's header.
Fixes#6123
These tests are testing functionality that is no longer in use in
production deployments of Boulder. As we go about removing wfe1
functionality, these tests will break, so let's just remove them
wholesale right now. I have verified that all of the tests removed in
this PR are duplicated against wfe2.
One of the changes in this PR is to cease starting up the wfe1 process
in the integration tests at all. However, that component was serving
requests for the AIA Issuer URL, which gets queried by various OCSP and
revocation tests. In order to keep those tests working, this change also
adds an integration-test-only handler to wfe2, and updates the CA
configuration to point at the new handler.
Part of #5681
Previously, `starservers.start()` would implicitly build the binaries.
This separates the `startservers.install()` step as a separate one that
must happen first. This is useful because it allows us to ensure the
`ceremony` tool has been built before we run `setupHierarchy`.
Also, add a `-s` flag to `curl` when checking whether start.py resulted
in a successful startup. This reduces the amount of log spam when it
failed to come up.
bad-key-revoker was listed with no dependencies, and boulder-va did not
list its remoteva backends as dependencies. This led to some spurious
error messages in log output.
This adds a new tool, `health-checker`, which is a client of the new
Health Checker Service that has been integrated into all of our
boulder components. This tool takes an address, a timeout, and a
config file. It then attempts to connect to a gRPC Health Service at
the given address, retrying until it hits its timeout, using credentials
specified by the config file.
This is then wrapped by a new function `waithealth` in our Python
helpers, which serves much the same function as `waitport`, but
specifically for services which surface a gRPC Health Service
This in turn requires slight modifications to `startservers`, namely
specifying the address and port on which each service starts its
gRPC listener.
Finally, this change also introduces new credentials for this
health-checker, and adds those credentials as a valid client to
all services' json configs. A similar change would have to be made
to our production configs if we were to establish a long-lived
health checker/prober in prod.
Fixes#5074
When issuing a new root or intermediate cert, we should take every
precaution possible to ensure that these certs are well-formed.
This change introduces a new step prior to issuing and writing a new CA
cert. We generate a new disposable private key based on the type of the
key being used in the real ceremony, then use this key to sign a fake
certificate for the sole purpose of linting. We then pass this through
the full suite of zlint's checks before proceeding with the actual
issuance.
Since this code path is largely similar to the pre-issuance linting done
by the new boulder signer tool, this change also factors it out into a
small, single-purpose `lint` package.
Fixes#5051
This copies over a number of features flags and other settings from
test/config-next that have been applied in prod.
Also, remove the config-next gate on various tests.
Add passthrough for certain environment variables to
docker-compose.yml, making it easier to set them:
RUN=unit docker-compose run --use-aliases boulder ./test.sh
Use 4001 instead of 4443 to monitor boulder-wfe2's health. This avoids
a spurious error log about a failed TLS handshake.
Remove unused code around running Certbot in integation tests.
Once we de-dupe startservers.install calls we may want to move
setupHierarchy there, but for now we need to call it from
integration-test.py and start.py.
Fixes#4842
Adds a daemon which monitors the new blockedKeys table and checks for any unexpired, unrevoked certificates that are associated with the added SPKI hashes and revokes them, notifying the user that issued the certificates.
Fixes#4772.
For now this mainly provides an example config and confirms that
log-validator can start up and shut down cleanly, as well as provide a
stat indicating how many log lines it has handled.
This introduces a syslog config to the boulder-tools image that will write
logs to /var/log/program.log. It also tweaks the various .json config
files so they have non-default syslogLevel, to ensure they actually
write something for log-validator to verify.
Instead of installing Certbot from the repo, install the python-acme
library (the only piece we need) from the apt repository. This also
allows us to skip installing build dependencies for Certbot.
Uninstall cmake after building.
Clean the various Go caches.
Move codespell and acme into requirements.txt. Don't use virtualenv anymore.
This reduces image size from 1.4 GB to 1.0 GB.
Incidentally, move the Go install to its own phase in the Dockerfile.
This will give it its own image layer, making rebuilds faster.
Python 2 is over in 1 month 4 days: https://pythonclock.org/
This rolls forward most of the changes in #4313.
The original change was rolled back in #4323 because it
broke `docker-compose up`. This change fixes those original issues by
(a) making sure `requests` is installed and (b) sourcing a virtualenv
containing the `requests` module before running start.py.
Other notable changes in this:
- Certbot has changed the developer instructions to install specific packages
rather than rely on `letsencrypt-auto --os-packages-only`, so we follow suit.
- Python3 now has a `bytes` type that is used in some places that used to
provide `str`, and all `str` are now Unicode. That means going from `bytes` to
`str` and back requires explicit `.decode()` and `.encode()`.
- Moved from urllib2 to requests in many places.
We now expect that the config dir is always set, so we make that
explicit in the integration test and error if that's not true.
This change also renames the variable to just "config_dir", and removes
the parameter to startservers.start, which is currently never set to
anything other than its default value.
This also explicitly sets the environment variable in .travis.yml.
This reverts commit 796a7aa2f4.
People's tests have been breaking on `docker-compose up` with the following output:
```
ImportError: No module named requests
```
Fixes#4322
* integration: move to Python3
- Add parentheses to all print and raise calls.
- Python3 distinguishes bytes from strings. Add encode() and
decode() calls as needed to provide the correct type.
- Use requests library consistently (urllib3 is not in Python3).
- Remove shebang from Python files without a main, and update
shebang for integration-test.py.
Basically a complete re-write/re-design of the forwarding concept introduced in
#4297 (sorry for the rapid churn here). Instead of nonce-services blindly
forwarding nonces around to each other in an attempt to find out who issued the
nonce we add an identifying prefix to each nonce generated by a service. The
WFEs then use this prefix to decide which nonce-service to ask to validate the
nonce.
This requires a slightly more complicated configuration at the WFE/2 end, but
overall I think ends up being a way cleaner, more understandable, easy to
reason about implementation. When configuring the WFE you need to provide two
forms of gRPC config:
* one gRPC config for retrieving nonces, this should be a DNS name that
resolves to all available nonce-services (or at least the ones you want to
retrieve nonces from locally, in a two DC setup you might only configure the
nonce-services that are in the same DC as the WFE instance). This allows
getting a nonce from any of the configured services and is load-balanced
transparently at the gRPC layer.
* a map of nonce prefixes to gRPC configs, this maps each individual
nonce-service to it's prefix and allows the WFE instances to figure out which
nonce-service to ask to validate a nonce it has received (in a two DC setup
you'd want to configure this with all the nonce-services across both DCs so
that you can validate a nonce that was generated by a nonce-service in another
DC).
This balancing is implemented in the integration tests.
Given the current remote nonce code hasn't been deployed anywhere yet this
makes a number of hard breaking changes to both the existing nonce-service
code, and the forwarding code.
Fixes#4303.