This solves a few problems:
- When producing a new revision of boulder-tools, it often requires
multiple iterations to get it right. This provides a straightforward
path to build those iterations without trying to upload them to a Docker
repository each time.
- It's no longer necessary to produce dev container images in addition
to CI container images. Dev images are built on-demand and cached.
- Cross builds are no longer needed unless building the CI images on
non-amd64.
For third-party integration tests that do `docker compose up`, this may
result in longer build times if they are rebuilding from scratch each
time. That can be improved by keeping docker cache around.
`t.sh` and `tn.sh` now each use a distinct GOCACHE. This causes
back-to-back invocations of the unittests in config and config-next
to not incorrectly skip tests whose behavior changes between
those two environments.
Remove the "netaccess" container from the docker-compose dev
environment.
It isn't needed during a regular 'docker compose up' developer
environment, and only really serves as a way to use the same tools image
in CI. Two checks run during CI are the govulncheck and verifying go mod
tidy / go vendor. Neither of these checks require anything from the
custom image other than Golang itself, which can be provided directly
from the CI environment.
If a developer is working inside the existing containers, they can still
run `go mod tidy; go mod vendor` themselves, which is a standard Golang
workflow and thus is simpler than using the netaccess image via docker
compose.
Run staticcheck as a standalone binary rather than as a library via
golangci-lint. From the golangci-lint help out,
> staticcheck (megacheck): It's a set of rules from staticcheck. It's
not the same thing as the staticcheck binary. The author of staticcheck
doesn't support or approve the use of staticcheck as a library inside
golangci-lint.
We decided to disable ST1000 which warns about incorrect or missing
package comments.
For SA4011, I chose to change the semantics[1] of the for loop rather
than ignoring the SA4011 lint for that line.
Fixes https://github.com/letsencrypt/boulder/issues/6988
1. https://go.dev/ref/spec#Continue_statements
This is a follow-up fix to
https://github.com/letsencrypt/boulder/pull/6987 which mistakenly did
not update the docker compose default BOULDER_TOOLS_TAG variable from
go1.20.5 to go1.20.6. This would only manifest on developer machines
manually running unit/integration tests rather than CI which explicitly
tests against a matrix of BOULDER_TOOLS_TAG versions.
```
Error response from daemon: manifest for letsencrypt/boulder-tools:go1.20.5_2023-07-11 not found: manifest unknown: manifest unknown
```
This version includes a fix that seems relevant to us:
> The HTTP/1 client did not fully validate the contents of the Host
header. A maliciously crafted Host header could inject additional
headers or entire requests. The HTTP/1 client now refuses to send
requests containing an invalid Request.Host or Request.URL.Host value.
>
> Thanks to Bartek Nowotarski for reporting this issue.
>
> Includes security fixes for CVE-2023-29406 and Go issue
https://go.dev/issue/60374
Add go1.21rc2 to the matrix of go versions we test against.
Add a new step to our CI workflows (boulder-ci, try-release, and
release) which sets the "GOEXPERIMENT=loopvar" environment variable if
we're running go1.21. This experiment makes it so that loop variables
are scoped only to their single loop iteration, rather than to the whole
loop. This prevents bugs such as our CAA Rechecking incident
(https://bugzilla.mozilla.org/show_bug.cgi?id=1619047). Also add a line
to our docker setup to propagate this environment variable into the
container, where it can affect builds.
Finally, fix one TLS-ALPN-01 test to have the fake subscriber server
actually willing to negotiate the acme-tls/1 protocol, so that the ACME
server's tls client actually waits to (fail to) get the certificate,
instead of dying immediately. This fix is related to the upgrade to
go1.21, not the loopvar experiment.
Fixes https://github.com/letsencrypt/boulder/issues/6950
- Update consul container from `1.13.1` to `1.14.2` to match production.
- Specify `grpc_tls`, now required instead of defaulted to `8503` when
`enable_agent_tls_for_checks` is specified.
Part of #6911
Enable SA gRPC health checks in Consul ahead of further changes for
#6878. Calls to the `Check` method of the SA's grpc.health.v1.Health
service must respond `SERVING` before the `sa` service will be
advertised in Consul DNS. Consul will continue to poll this service
every 5 seconds.
- Add `bconsul` docker service to boulder `bluenet` and `rednet`
- Add TLS credentials for `consul.boulder`:
```shell
$ openssl x509 -in consul.boulder/cert.pem -text | grep DNS
DNS:consul.boulder
```
- Update `test/grpc-creds/generate.sh` to add `consul.boulder`
- Update test SA configs to allow `consul.boulder` to access to
`grpc.health.v1.Health`
Part of #6878
These external port bindings are not necessary, as the integration test
configs resolve the bjaeger container directly. In addition, these
external port bindings cause problems for rootless docker, so let's
remove them.
This adds Jaeger's all-in-one dev container (with no persistent storage)
to boulder's dev docker-compose. It configures config-next/ to send all
traces there.
A new integration test creates an account and issues a cert, then
verifies the trace contains some set of expected spans.
This test found that async finalize broke spans, so I fixed that and a
few related spots where we make a new context.
Add an upstream ProxySQL container to our docker-compose. Configure
ProxySQL to manage database connections for our unit and integration
tests.
Fixes#5873
## Remove cfssl recommendation
While it is a valuable PKI toolkit, it really isn't an alternative to
boulder -- there are other private ACME CA projects, and I don't think
we should be in the business of recommending other software when there's
many tradeoffs to be made.
## Remove references to "two API versions"
This removes the reference to running two WFEs, and simplifies some of
the description around "objects" being passed around, which I don't
think is helpful for understanding how Boulder works as the RPCs aren't
generally broadcasting updated objects in the way the removed paragraph
suggests.
## Update information about solving ACME challenges
In #6619 we removed the VA PortConfig, so information about ports 5001
and 5002 are obsolete.
As well, the docker host IP is almost always (barring a user changing
it) the same, so while there's a longer explanation in the README, a
comment in docker-compose.yml is a useful quick reference.
Only build arm64 images for one version of Go.
Split build.sh into two scripts: build.sh (which installs apt and
Python) and install-go.sh (which installs a specific Go version and Go
dependencies). This allows reusing a cached layer for the build.sh step
across multiple Go versions.
Remove installation of fpm from build.sh. This is no longer needed since
#6669 and allows us to get rid of `rpm`, `ruby`, and `ruby-dev`.
Remove apt dependency on pkg-config, libtool, autoconf, and automake.
These were introduced in
https://github.com/letsencrypt/boulder/pull/4832 but aren't needed
anymore because we don't build softhsm2 ourselves (we get it from apt).
Remove apt dependency on cmake, libssl-dev, and openssl. I'm not totally
sure what these were needed for but they're not needed anymore.
Running this locally on my laptop for our current 3 GO_CI_VERSIONS and 1
GO_DEV_VERSION takes 23 minutes of wall time, dominated by the cross
build for arm64.
Remove `example.com` domain name, which was used by the deleted OldTLS
tests.
Remove GODEBUG=x509sha1=1.
Add a longer comment for the Consul DNS fallback in docker-compose.yml.
Use the "dnsAuthority" field for all gRPC clients in config-next,
instead of implicitly relying on the system DNS. This matches what we do
in prod.
Make "dnsAuthority" field of GRPCClientConfig mandatory whenever
SRVLookup or SRVLookups is used.
Make test/config/ocsp-responder.json use ServerAddress instead of
SRVLookup, like the rest of test/config.
Add go1.20 as a new version to run tests on, and to build release
artifacts from. Fix one test which was failing because it was
accidentally relying on consistent (i.e. unseeded) non-cryptographic
random number generation, which go1.20 now automatically seeds at import
time.
Update the version of golangci-lint used in our docker containers to the
new version that has go1.20 support. Remove a number of nolint comments
that were required due to an old version of the gosec linter.
Clean up several spots where we were behaving differently on
go1.18 and go1.19, now that we're using go1.19 everywhere. Also
re-enable the lint and generate tests, and fix the various places where
the two versions disagreed on how comments should be formatted.
Also clean up the OldTLS codepaths, now that both go1.19 and our
own feature flags have forbidden TLS < 1.2 everywhere.
Fixes#6011
- Add a dedicated Consul container
- Replace `sd-test-srv` with Consul
- Add documentation for configuring Consul
- Re-issue all gRPC credentials for `<service-name>.service.consul`
Part of #6111
In dev docker we've always used a single schema (`boulder_sa`), with two
environments (`test` and `integration`) making for a combined total of two
databases sharing the same users and schema (e.g. `boulder_sa_test` and
`boulder_sa_integration`). There are also two versions of this schema. `db` and
`db-next`. The former is the schema as it should exist in production and the
latter is everything from `db` with some un-deployed schema changes. This change
adds support for additional schemas with the same aforementioned environments
and versions.
- Add support for additional schemas in `test/create_db.sh` and sa/migrations.sh
- Add new schema `incidents_sa` with its own users
- Replace `bitbucket.org/liamstask/goose/` with `github.com/rubenv/sql-migrate`
Part of #6328
When rsyslog receives multiple identical log lines in a row, it can
collapse those lines into a single instance of the log line and a
follow-up line saying "message repeated X times". However, that
rsyslog-generated line does not contain our log line checksum, so it
immediately causes log-validator to complain about the line. In
addition, the rsyslog docs themselves state that this feature is a
misfeature and should never be turned on. Despite this, Ubuntu turns the
feature on by default when the rsyslog package is installed from apt.
Add an additional command to our dockerfile which overwrites Ubuntu's
default setting to disable this misfeature, and update our test
environment to use the new docker image.
Fixes#6252
Run the Boulder unit and integration tests with go1.19.
In addition, make a few small changes to allow both sets of
tests to run side-by-side. Mark a few tests, including our lints
and generate checks, as go1.18-only. Reformat a few doc
comments, particularly lists, to abide by go1.19's stricter gofmt.
Causes #6275
Update:
- golangci-lint from v1.42.1 to v1.46.2
- protoc from v3.15.6 to v3.20.1
- protoc-gen-go from v1.26.0 to v1.28.0
- protoc-gen-go-grpc from v1.1.0 to v1.2.0
- fpm from v1.14.0 to v1.14.2
Also remove a reference to go1.17.9 from one last place.
This does result in updating all of our generated .pb.go files, but only
to update the version number embedded in each file's header.
Fixes#6123
Redis recently released version 7.0.0, which has several breaking
changes. The go-redis library that we rely on does not yet support
communicating with a Redis 7.0.0 cluster.
Pin ourselves to the latest non-7.0.0 version, 6.2.7, until such time
as go-redis releases a version with support for 7.0.0.
Fixes#6071