Replace the bpkilint container with a new bpkimetal container. Update
our custom lint which calls out to that API to speak PKIMetal's (very
similar) protocol instead. Update our zlint custom configuration to
configure this updated lint.
Fixes https://github.com/letsencrypt/boulder/issues/8009
Update from go1.23.1 to go1.23.6 for our primary CI and release builds.
This brings in a few security fixes that aren't directly relevant to us.
Add go1.24.0 to our matrix of CI and release versions, to prepare for
switching to this next major version in prod.
Begin testing on go1.23. To facilitate this, also update /x/net,
golangci-lint, staticcheck, and pebble-challtestsrv to versions which
support go1.23. As a result of these updates, also fix a handful of new
lint findings, mostly regarding passing non-static (i.e. potentially
user-controlled) format strings into Sprintf-style functions.
Additionally, delete one VA unittest that was duplicating the checks
performed by a different VA unittest, but with a context timeout bug
that caused it to break when go1.23 subtly changed DialContext behavior.
The summary here is:
- Move test/cert-ceremonies to test/certs
- Move .hierarchy (generated by the above) to test/certs/webpki
- Remove our mapping of .hierarchy to /hierarchy inside docker
- Move test/grpc-creds to test/certs/ipki
- Unify the generation of both test/certs/webpki and test/certs/ipki
into a single script at test/certs/generate.sh
- Make that script the entrypoint of a new docker compose service
- Have t.sh and tn.sh invoke that service to ensure keys and certs are
created before tests run
No production changes are necessary, the config changes here are just
for testing purposes.
Part of https://github.com/letsencrypt/boulder/issues/7476
Add a new "LintConfig" item to the CA's config, which can point to a
zlint configuration toml file. This allows lints to be configured, e.g.
to control the number of rounds of factorization performed by the Fermat
factorization lint.
Leverage this new config to create a new custom zlint which calls out to
a configured pkilint API endpoint. In config-next integration tests,
configure the lint to point at a new pkilint docker container.
This approach has three nice forward-looking features: we now have the
ability to configure any of our lints; it's easy to expand this
mechanism to lint CRLs when the pkilint API has support for that; and
it's easy to enable this new lint if we decide to stand up a pkilint
container in our production environment.
No production configuration changes are necessary at this time.
Fixes https://github.com/letsencrypt/boulder/issues/7430
We have moved entirely to go1.22 in prod. This also allows us to remove
setting loopvar from our CI tasks, since it is the default behavior as
of go1.22.
These are now "bouldernet" and "integrationtestnet". The division isn't
perfectly clean right now, because for instance challtestsrv binds to an
address in bouldernet, but that's okay.
Fixes#7245
Part of #7245.
There are still a few places that use 10.88.88.88 that will be harder to
remove. In particular, some of the Python integration tests start up
their own HTTP servers that differ from challtestsrv in some important
way (like timing out requests). Because challtestsrv already binds to
10.77.77.77:80, those test servers need a different IP address to bind
to. We can probably solve that but I'll leave it for another PR.
This solves a few problems:
- When producing a new revision of boulder-tools, it often requires
multiple iterations to get it right. This provides a straightforward
path to build those iterations without trying to upload them to a Docker
repository each time.
- It's no longer necessary to produce dev container images in addition
to CI container images. Dev images are built on-demand and cached.
- Cross builds are no longer needed unless building the CI images on
non-amd64.
For third-party integration tests that do `docker compose up`, this may
result in longer build times if they are rebuilding from scratch each
time. That can be improved by keeping docker cache around.
`t.sh` and `tn.sh` now each use a distinct GOCACHE. This causes
back-to-back invocations of the unittests in config and config-next
to not incorrectly skip tests whose behavior changes between
those two environments.
Remove the "netaccess" container from the docker-compose dev
environment.
It isn't needed during a regular 'docker compose up' developer
environment, and only really serves as a way to use the same tools image
in CI. Two checks run during CI are the govulncheck and verifying go mod
tidy / go vendor. Neither of these checks require anything from the
custom image other than Golang itself, which can be provided directly
from the CI environment.
If a developer is working inside the existing containers, they can still
run `go mod tidy; go mod vendor` themselves, which is a standard Golang
workflow and thus is simpler than using the netaccess image via docker
compose.
Run staticcheck as a standalone binary rather than as a library via
golangci-lint. From the golangci-lint help out,
> staticcheck (megacheck): It's a set of rules from staticcheck. It's
not the same thing as the staticcheck binary. The author of staticcheck
doesn't support or approve the use of staticcheck as a library inside
golangci-lint.
We decided to disable ST1000 which warns about incorrect or missing
package comments.
For SA4011, I chose to change the semantics[1] of the for loop rather
than ignoring the SA4011 lint for that line.
Fixes https://github.com/letsencrypt/boulder/issues/6988
1. https://go.dev/ref/spec#Continue_statements
This is a follow-up fix to
https://github.com/letsencrypt/boulder/pull/6987 which mistakenly did
not update the docker compose default BOULDER_TOOLS_TAG variable from
go1.20.5 to go1.20.6. This would only manifest on developer machines
manually running unit/integration tests rather than CI which explicitly
tests against a matrix of BOULDER_TOOLS_TAG versions.
```
Error response from daemon: manifest for letsencrypt/boulder-tools:go1.20.5_2023-07-11 not found: manifest unknown: manifest unknown
```
This version includes a fix that seems relevant to us:
> The HTTP/1 client did not fully validate the contents of the Host
header. A maliciously crafted Host header could inject additional
headers or entire requests. The HTTP/1 client now refuses to send
requests containing an invalid Request.Host or Request.URL.Host value.
>
> Thanks to Bartek Nowotarski for reporting this issue.
>
> Includes security fixes for CVE-2023-29406 and Go issue
https://go.dev/issue/60374
Add go1.21rc2 to the matrix of go versions we test against.
Add a new step to our CI workflows (boulder-ci, try-release, and
release) which sets the "GOEXPERIMENT=loopvar" environment variable if
we're running go1.21. This experiment makes it so that loop variables
are scoped only to their single loop iteration, rather than to the whole
loop. This prevents bugs such as our CAA Rechecking incident
(https://bugzilla.mozilla.org/show_bug.cgi?id=1619047). Also add a line
to our docker setup to propagate this environment variable into the
container, where it can affect builds.
Finally, fix one TLS-ALPN-01 test to have the fake subscriber server
actually willing to negotiate the acme-tls/1 protocol, so that the ACME
server's tls client actually waits to (fail to) get the certificate,
instead of dying immediately. This fix is related to the upgrade to
go1.21, not the loopvar experiment.
Fixes https://github.com/letsencrypt/boulder/issues/6950
- Update consul container from `1.13.1` to `1.14.2` to match production.
- Specify `grpc_tls`, now required instead of defaulted to `8503` when
`enable_agent_tls_for_checks` is specified.
Part of #6911
Enable SA gRPC health checks in Consul ahead of further changes for
#6878. Calls to the `Check` method of the SA's grpc.health.v1.Health
service must respond `SERVING` before the `sa` service will be
advertised in Consul DNS. Consul will continue to poll this service
every 5 seconds.
- Add `bconsul` docker service to boulder `bluenet` and `rednet`
- Add TLS credentials for `consul.boulder`:
```shell
$ openssl x509 -in consul.boulder/cert.pem -text | grep DNS
DNS:consul.boulder
```
- Update `test/grpc-creds/generate.sh` to add `consul.boulder`
- Update test SA configs to allow `consul.boulder` to access to
`grpc.health.v1.Health`
Part of #6878
These external port bindings are not necessary, as the integration test
configs resolve the bjaeger container directly. In addition, these
external port bindings cause problems for rootless docker, so let's
remove them.
This adds Jaeger's all-in-one dev container (with no persistent storage)
to boulder's dev docker-compose. It configures config-next/ to send all
traces there.
A new integration test creates an account and issues a cert, then
verifies the trace contains some set of expected spans.
This test found that async finalize broke spans, so I fixed that and a
few related spots where we make a new context.
Add an upstream ProxySQL container to our docker-compose. Configure
ProxySQL to manage database connections for our unit and integration
tests.
Fixes#5873
## Remove cfssl recommendation
While it is a valuable PKI toolkit, it really isn't an alternative to
boulder -- there are other private ACME CA projects, and I don't think
we should be in the business of recommending other software when there's
many tradeoffs to be made.
## Remove references to "two API versions"
This removes the reference to running two WFEs, and simplifies some of
the description around "objects" being passed around, which I don't
think is helpful for understanding how Boulder works as the RPCs aren't
generally broadcasting updated objects in the way the removed paragraph
suggests.
## Update information about solving ACME challenges
In #6619 we removed the VA PortConfig, so information about ports 5001
and 5002 are obsolete.
As well, the docker host IP is almost always (barring a user changing
it) the same, so while there's a longer explanation in the README, a
comment in docker-compose.yml is a useful quick reference.
Only build arm64 images for one version of Go.
Split build.sh into two scripts: build.sh (which installs apt and
Python) and install-go.sh (which installs a specific Go version and Go
dependencies). This allows reusing a cached layer for the build.sh step
across multiple Go versions.
Remove installation of fpm from build.sh. This is no longer needed since
#6669 and allows us to get rid of `rpm`, `ruby`, and `ruby-dev`.
Remove apt dependency on pkg-config, libtool, autoconf, and automake.
These were introduced in
https://github.com/letsencrypt/boulder/pull/4832 but aren't needed
anymore because we don't build softhsm2 ourselves (we get it from apt).
Remove apt dependency on cmake, libssl-dev, and openssl. I'm not totally
sure what these were needed for but they're not needed anymore.
Running this locally on my laptop for our current 3 GO_CI_VERSIONS and 1
GO_DEV_VERSION takes 23 minutes of wall time, dominated by the cross
build for arm64.
Remove `example.com` domain name, which was used by the deleted OldTLS
tests.
Remove GODEBUG=x509sha1=1.
Add a longer comment for the Consul DNS fallback in docker-compose.yml.
Use the "dnsAuthority" field for all gRPC clients in config-next,
instead of implicitly relying on the system DNS. This matches what we do
in prod.
Make "dnsAuthority" field of GRPCClientConfig mandatory whenever
SRVLookup or SRVLookups is used.
Make test/config/ocsp-responder.json use ServerAddress instead of
SRVLookup, like the rest of test/config.