### Improve consistency
- Make registration `id` an `int64`
- Use `address`, `recipient`, and `record` terminology
- Use `errors.New()` in place of `fmt.Errorf()`
- Use `strings.Builder` in place of `bytes.Buffer`
- Use `errors.Is()` when checking for sentinel errors (see the sketch below)
- Remove unused (duplicate) `cmd.PasswordFile` in `config`
- Remove unused `cmd.Features` in `config`
### Improve readability
- Use godoc standard comments
- Replace multiple calls to `len(someVariable)` with `totalSomeVariable`
Part of #5420
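The error-handling and string-building items above boil down to standard library idioms; a minimal, self-contained sketch (all names are placeholders, not Boulder code):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errRecordNotFound is a placeholder sentinel error, declared with
// errors.New because no formatting is needed.
var errRecordNotFound = errors.New("record not found")

func describeRecipients(recipients []string) string {
	// strings.Builder is lighter than bytes.Buffer for assembling strings.
	var sb strings.Builder
	totalRecipients := len(recipients)
	for i, recipient := range recipients {
		sb.WriteString(recipient)
		if i < totalRecipients-1 {
			sb.WriteString(", ")
		}
	}
	return sb.String()
}

func main() {
	err := fmt.Errorf("looking up address: %w", errRecordNotFound)
	// errors.Is unwraps wrapped errors, unlike a direct == comparison.
	if errors.Is(err, errRecordNotFound) {
		fmt.Println("not found:", describeRecipients([]string{"a@example.com", "b@example.com"}))
	}
}
```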
In normal operation, Boulder does not have release branches, only
release tags. However, when we need to add hotfix commits on top of an
old release, we create a release branch, merge the commits there, and
then produce a new tag pointing at the tip of that branch. These release
branches are documented[1] to be named `refs/heads/release-branch-*`.
Therefore, we should run CI for PRs targeting, and new commits on, those
release branches.
[1] https://github.com/letsencrypt/boulder-release-process#when-main-is-dirty
- Remove field `ECDSAAllowedAccounts` from CA
- Remove `ECDSAAllowedAccounts` from CA tests
- Replace `ECDSAAllowedAccounts` with `ECDSAAllowListFilename` in
`test/config/ca-a.json` and `test/config/ca-b.json`
- Add YAML allow list file at `test/config/ecdsaAllowList.yml`
Fixes #5394
We are in the process of reducing our validity periods by one second,
from 90 days plus one second, to exactly 90 days. This change causes
cert-checker to be comfortable with certificates that have either of
those validity periods.
Future work is necessary to make cert-checker much more robust
and configurable, so we don't need changes like this every time we
reduce our validity period.
Part of #5472
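A rough sketch of the kind of check this implies, with hypothetical names rather than cert-checker's actual code:

```go
package checker

import (
	"crypto/x509"
	"fmt"
	"time"
)

// checkValidityPeriod is a hypothetical sketch: accept exactly 90 days,
// or the legacy 90 days plus one second, and reject anything else.
func checkValidityPeriod(cert *x509.Certificate) error {
	validity := cert.NotAfter.Sub(cert.NotBefore)
	ninetyDays := 90 * 24 * time.Hour
	if validity != ninetyDays && validity != ninetyDays+time.Second {
		return fmt.Errorf("unexpected validity period: %s", validity)
	}
	return nil
}
```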
Because ct-test-srv lives in `//test` instead of in `//cmd`, it is not
included by default in the set of objects which are bundled into the
.deb and .rpm packages produced by the Makefile (although it is
compiled by the `make build` command). Add it to the set of files
bundled into the .deb, for the sake of our SREs.
Add tool to audit subscriber registrations for e-mail addresses that
`notify-mailer` is currently configured to skip.
- Add `cmd/contact-auditor` with README
- Add test coverage for `cmd/contact-auditor`
- Add config file at `test/config/contact-auditor`
Part of #5372
Create script which finds every .proto file in the repo and correctly
invokes `protoc` for each. Create a single file with a `//go:generate`
directive to invoke the new script. Delete all of the other generate.go
files, so that our proto generation is unified in one place.
Fixes #5453
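For illustration, the unified entry point could look roughly like this; the script name and path here are invented, not the ones actually added:

```go
// Package protogen is a sketch of a single generation entry point.
// The script name and location below are assumptions for illustration.
package protogen

// The directive runs one script that finds every .proto file in the
// repo and invokes protoc with consistent flags for each of them.
//
//go:generate sh ../tools/generate-protos.sh
```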
Move the logEvent.Endpoint and .Slug assignment as well as tracing to the
top of the HandlerFunc so a return cannot happen before the assignment.
Fixes cases where the endpoint was left blank in logs on certain error paths.
Fixes #5432
Replace `core.Empty` with `google.protobuf.Empty` in all of our gRPC
methods which consume or return an empty protobuf. The golang core
proto libraries provide an empty message type, so there is no need
for us to reinvent the wheel.
This change is backwards-compatible and does not require a special
deploy. The protobuf message descriptions of `core.Empty` and
`google.protobuf.Empty` are identical, so their wire-formats are
indistinguishable and therefore interoperable / cross-compatible.
Fixes #5443
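For reference, consuming the well-known type from Go looks roughly like this (the service and method are illustrative, not actual Boulder RPCs):

```go
package example

import (
	"context"

	"google.golang.org/protobuf/types/known/emptypb"
)

// HealthChecker is an illustrative service that takes and returns the
// well-known empty message instead of a hand-rolled core.Empty type.
type HealthChecker struct{}

func (hc *HealthChecker) Ping(ctx context.Context, _ *emptypb.Empty) (*emptypb.Empty, error) {
	return &emptypb.Empty{}, nil
}
```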
The //grpc/test_proto/generate.go file was not generating the protos
in its own directory; instead, it was regenerating the VA protos. Therefore the
generated files were out of date, and were relying on an old version
of the go proto library, which we can now remove from our direct deps.
Part of #5443
Part of #5453
Update the signature of the RA's RevokeCertificateWithReg
method to exactly match that of the gRPC method it implements.
Remove all logic from the `RevokeCertificateWithReg` client
and server wrappers. Move the small amount of checking they
were performing directly into the server implementation.
Fixes #5440
The sampling rate integer is used by the span collector to estimate
"how many spans does this span actually represent". This allows accurate
volume comparisons: for example, if you sample successful requests at
a rate of 1/100 and error requests at a rate of 1/10, the trace query
interface will know to scale its query results by those respective
values in order to arrive at accurate error rate estimates.
Previously, this code was returning a sample rate integer of 0 to
indicate that the span was selected for sampling due to an extraordinary
circumstance. This was wrong. This change updates the sample rate int
to be 1, indicating that every such span which exhibited this feature
was sampled, and represents only itself.
Add a check to the Honeycomb SamplerHook so that it never sends
spans which have a "meta.type" of "grpc_client".
This field and value are set automatically by the Honeycomb gRPC
client interceptor, and can't be set by application code (any fields
set by application code have "app." prepended to their name).
Never sending these spans reduces our visibility into in-datacenter
network latency, but also reduces the number of spans sent by
roughly 50%.
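A sketch of such a hook, assuming the beeline's sampler-hook shape of fields in, keep-decision and sample rate out; only the "meta.type" field name is taken from the description above:

```go
package sampler

// samplerHook is a sketch of a Honeycomb beeline sampler hook (assumed
// signature: event fields in, keep decision and sample rate out) that
// drops every span the gRPC client interceptor tags with
// meta.type=grpc_client.
func samplerHook(fields map[string]interface{}) (bool, int) {
	if t, ok := fields["meta.type"].(string); ok && t == "grpc_client" {
		// Never send client-side gRPC spans; the rate is ignored for
		// spans that are dropped.
		return false, 0
	}
	// Keep everything else; a rate of 1 means the span represents only itself.
	return true, 1
}
```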
Switch from using the honeycomb beeline's built-in sampling
to a sampler hook which bases its sampling decisions on a
hash of the trace ID. This allows us to do "deterministic"
sampling, where every span in a given trace will either be
sent or not (since the trace ID is the same across all spans
in a trace), giving us more complete traces.
This preserves the same simple (single integer) configuration
of the sample rate. The sample rate can be set differently for
different boulder components (e.g. 1 at the WFE, 100 at the
RA, and 1000 at the nonce-service), but the sampling rate
denominator should only increase towards the leaves of a
gRPC request path.
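A minimal sketch of deterministic, trace-ID-hashed sampling under those assumptions (not the actual Boulder implementation):

```go
package sampler

import (
	"crypto/sha256"
	"encoding/binary"
)

// sampleDeterministically is a sketch of hash-based sampling: every span
// carrying the same trace ID gets the same keep/drop decision, so traces
// arrive either whole or not at all. sampleRate is the configured 1-in-N
// denominator (e.g. 1 at the WFE, 100 at the RA).
func sampleDeterministically(traceID string, sampleRate uint64) (bool, int) {
	if sampleRate <= 1 {
		return true, 1
	}
	sum := sha256.Sum256([]byte(traceID))
	// Use the first 8 bytes of the hash as a uniformly distributed integer.
	val := binary.BigEndian.Uint64(sum[:8])
	if val%sampleRate == 0 {
		// Keep this trace; it stands in for sampleRate traces.
		return true, int(sampleRate)
	}
	return false, 0
}
```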
Use the built-in grpc-go client and server interceptor chaining
utilities, instead of the ones provided by go-grpc-middleware.
Simplify our interceptors to call their handlers/invokers directly,
instead of delegating to the metrics interceptor, and add the
metrics interceptor to the chains instead.
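Roughly, the built-in chaining looks like this; the interceptor names are placeholders:

```go
package example

import (
	"google.golang.org/grpc"
)

// newServerAndDialOpts is a sketch of wiring interceptors with grpc-go's
// built-in chaining options; metricsInterceptor, authInterceptor, and
// clientMetricsInterceptor are placeholders for the real interceptors.
func newServerAndDialOpts(
	metricsInterceptor, authInterceptor grpc.UnaryServerInterceptor,
	clientMetricsInterceptor grpc.UnaryClientInterceptor,
) (*grpc.Server, []grpc.DialOption) {
	server := grpc.NewServer(
		// Interceptors run in the order given; each calls its handler directly.
		grpc.ChainUnaryInterceptor(metricsInterceptor, authInterceptor),
	)
	dialOpts := []grpc.DialOption{
		grpc.WithChainUnaryInterceptor(clientMetricsInterceptor),
	}
	return server, dialOpts
}
```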
A small collection of bug fixes, code cleanup, terminology standardization,
flag descriptor updates, and comment formatting that wasn't within the
scope of #5389.
- Set database transaction isolation level to `READ UNCOMMITTED` by
default
- Add flag `-use-default-isolation-level` to use database default
instead
- Replace database query used for method `findIDs`
- Replace database query used for method `findIDsForHostnames`
- **Bugfix:** reject `hostnamesFile` with zero entries
- **Bugfix:** use database settings provided in the configuration file
- **Terminology:** standardize on hostname(s) instead of domain(s)
- **Terminology:** update method and function comments to godoc standard
- **Terminology:** rename method `findIDsForDomains` to `findIDsForHostnames`
Fixes #5419
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).
Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.
Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
- Add support for gathering hostnames in addition to IDs
- Add flag `-with-example-hostnames`
- Add test for new `-with-example-hostnames` code path
- Add types to handle results with a `hostname` field
- Refactor the JSON marshaling and file writing as methods
of the new `idExporterResults` type
- Refactor `main` to account for the `-with-example-hostnames`
code path and add comments
- Update usage text to reflect the addition of `hostname` as a
JSON field
- Update tests to reflect refactoring
- Remove inaccessible code path and corresponding test for
`-outfile` being an empty string
Fixes #5389
Add a new rate limit, identical in implementation to the current
`CertificatesPerFQDNSet` limit, intended to always have both a lower
window and a lower threshold. This allows us to block runaway clients
quickly, and give their owners the ability to fix and try again quickly
(on the order of hours instead of days).
Configure the integration tests to set this new limit at 2 certs per 2
hours. Also increase the existing limit from 5 to 6 certs in 7 days, to
allow clients to hit the first limit three times before being fully
blocked for the week. Also add a new integration test to verify this
behavior.
Note that the new ratelimit must have a window greater than the
configured certificate backdate (currently 1 hour) in order to be
useful.
Fixes #5210
Fix a nil pointer dereference of `ecdsaAllowList` in `boulder-ca` by
moving the `reloader.New()` call into the constructor
`ca.NewECDSAAllowListFromFile`.
- Add missing entry `ECDSAAllowListFilename` to
`test/config-next/ca-a.json` and `test/config-next/ca-b.json`
- Add missing file `ecdsaAllowList.yml` to `test/config-next`
- Add missing entry `ECDSAAllowedAccounts` to `test/config/ca-a.json`
and `test/config/ca-b.json`
- Move creation of the reloader to `NewECDSAAllowListFromFile`
Fixes #5414
- Add field `ECDSAAllowListFilename` to `config.CA`
- Move ECDSA allow list logic from `boulder-ca/main.go` to new file
`ca/ecdsa_allow_list.go`
- Add field `ecdsaAllowList` to `certificateAuthorityImpl`
- Update unit tests to account for changes to `certificateAuthorityImpl`
- Move previous allow list unit tests to `TestDeprecatedECDSAAllowList`
- Add `TestECDSAAllowList` unit tests
Fixes #5361
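A hypothetical sketch of what such an allow list type might look like; the real fields and methods in `ca/ecdsa_allow_list.go` may differ:

```go
package ca

import "sync"

// ECDSAAllowList is a sketch of a registration-ID allow list; the actual
// type in Boulder may be structured differently.
type ECDSAAllowList struct {
	sync.RWMutex
	regIDs map[int64]bool
}

// permitted reports whether the given registration ID may receive an
// ECDSA certificate.
func (e *ECDSAAllowList) permitted(regID int64) bool {
	e.RLock()
	defer e.RUnlock()
	return e.regIDs[regID]
}
```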
Switch from using `ORDER BY` and `LIMIT` to obtain a minimum ID from the
certificates table, to using the `MIN()` aggregation function.
Relational databases are most optimized for set aggregation functions,
and anywhere that aggregations can be used for `SELECT` queries tends to
bring performance improvements. Experimentally this is an
order-of-magnitude improvement in query time. Theoretically the query
optimizer should have constructed the same underlying query from each,
but it didn't.
Partially reverts #5400
Fixes #5393
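For illustration, the difference amounts to something like the following, written here against plain `database/sql` rather than Boulder's actual DB layer:

```go
package example

import "database/sql"

// minCertificateID is a sketch of the aggregate-function approach; the
// table and column names follow the description above.
func minCertificateID(db *sql.DB) (int64, error) {
	var minID int64
	// Previously: SELECT id FROM certificates ORDER BY id LIMIT 1
	// (a NULL result, i.e. an empty table, surfaces as a Scan error here).
	err := db.QueryRow("SELECT MIN(id) FROM certificates").Scan(&minID)
	return minID, err
}
```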
Explicitly opt in to the least-consistent transaction coherency for the
duration of all cert-checker queries.
The primary risk here is that the windowed table scan across the
certificates table can, on replicas, read a series of rows that aren't
from consistent timesteps. However, the certificates table is
append-only, so in practice this is not a concern, and there is no risk
to enabling the dirtiest of reads, done dirt cheap.
This doesn't impact the length of the window function, so existing
overlap mechanisms to ensure coverage will remain as good as they are
today.
Based on #5400
Part of #5393
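With plain `database/sql`, opting a transaction into that isolation level looks roughly like this (a sketch, not cert-checker's actual code):

```go
package example

import (
	"context"
	"database/sql"
)

// dirtyReadsTx is a sketch: opt a single transaction into the weakest
// isolation level using database/sql's standard option.
func dirtyReadsTx(ctx context.Context, db *sql.DB) (*sql.Tx, error) {
	return db.BeginTx(ctx, &sql.TxOptions{
		Isolation: sql.LevelReadUncommitted,
		ReadOnly:  true,
	})
}
```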
The `http.Request` object can already have a context associated
with it. If it does, preserve that context rather than creating a new
one. If it doesn't, create a new `context.Background` instead.
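A minimal sketch of that behavior:

```go
package example

import (
	"context"
	"net/http"
)

// requestContext is a sketch of the behavior described: reuse the
// request's context when one is present, otherwise fall back to a fresh
// background context.
func requestContext(req *http.Request) context.Context {
	if ctx := req.Context(); ctx != nil {
		return ctx
	}
	return context.Background()
}
```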
Create a new `ocspImpl` struct which satisfies the interface required
by the `OCSPGenerator` gRPC service. Move the `GenerateOCSP`
method from the `certificateAuthorityImpl` to this new type. To support
existing gRPC clients, keep a reference to the new OCSP service in
the CA impl, and maintain a pass-through `GenerateOCSP` method.
Simplify some of the CA setup code, and make the CA implementation
non-exported because it doesn't need to be.
In order to maintain our existing signature and sign error metrics,
they now need to be initialized outside the CA and OCSP constructors.
This complicates the tests slightly, but seems like a worthwhile
tradeoff.
Fixes #5226
Fixes #5086
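A skeletal sketch of the resulting shape, with placeholder types standing in for the real gRPC request and response messages:

```go
package ca

import "context"

// generateOCSPRequest and ocspResponse are placeholders for the
// gRPC-generated message types.
type generateOCSPRequest struct{}
type ocspResponse struct{}

// ocspImpl owns OCSP signing and satisfies the OCSPGenerator service.
type ocspImpl struct{}

func (o *ocspImpl) GenerateOCSP(ctx context.Context, req *generateOCSPRequest) (*ocspResponse, error) {
	// ... sign and return the OCSP response ...
	return &ocspResponse{}, nil
}

// certificateAuthorityImpl keeps a reference to the OCSP service and
// exposes a pass-through method so existing gRPC clients keep working.
type certificateAuthorityImpl struct {
	ocsp *ocspImpl
}

func (ca *certificateAuthorityImpl) GenerateOCSP(ctx context.Context, req *generateOCSPRequest) (*ocspResponse, error) {
	return ca.ocsp.GenerateOCSP(ctx, req)
}
```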
Abstract out the way that the bdns library keeps track of the
resolvers it uses to do DNS lookups. Create one implementation,
the `StaticProvider`, which behaves exactly the same as the old
mechanism (providing whatever names or addresses were given
in the config). Create another implementation, `DynamicProvider`,
which re-resolves the provided name on a regular basis.
The dynamic provider consumes a single name, does a lookup
on that name for any SRV records suggesting that it is running a
DNS service, and then looks up A records to get the addresses of
all the names returned by the SRV query. It exports its successes
and failures as a prometheus metric.
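A sketch of that refresh step using the standard resolver; the SRV service and protocol labels here are assumptions:

```go
package bdns

import (
	"context"
	"fmt"
	"net"
)

// resolveDynamic is a sketch of the DynamicProvider's refresh step:
// query SRV records for a DNS service under the given name, then resolve
// each SRV target to addresses.
func resolveDynamic(ctx context.Context, name string) ([]string, error) {
	resolver := &net.Resolver{}
	_, srvs, err := resolver.LookupSRV(ctx, "dns", "udp", name)
	if err != nil {
		return nil, err
	}
	var addrs []string
	for _, srv := range srvs {
		hosts, err := resolver.LookupHost(ctx, srv.Target)
		if err != nil {
			return nil, err
		}
		for _, host := range hosts {
			addrs = append(addrs, fmt.Sprintf("%s:%d", host, srv.Port))
		}
	}
	return addrs, nil
}
```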
Finally, update the tests and config-next configs to work with
this new mechanism. Give sd-test-srv the capability to respond
to SRV queries, and put the names it provides into docker's
default DNS resolver.
Fixes #5306