Commit Graph

1793 Commits

Author SHA1 Message Date
Shiloh Heurich 76705b60a2
s3-test-srv: sync r/w to srv.allShards (#7361)
Fixes https://github.com/letsencrypt/boulder/issues/7353
2024-03-06 11:59:25 -05:00
Aaron Gable 7ddb2be3f9
Update CI to go1.21.8 and go1.22.1 (#7356)
Security releases announced here:
https://groups.google.com/g/golang-announce/c/5pwGVUPoMbg
2024-03-05 14:13:21 -08:00
Matthew McPherrin 313e3b93ba
Add DNSStaticResolver option (#7336)
We run the RVAs in AWS, where we don't have all the same service
discovery infrastructure we do for the primary VAs and the rest of
Boulder. The solution for populating SRV records we have today hasn't
been reliable, so we'd like to experiment with bringing up RVAs paired
1:1 with a local DNS resolver. This brings back some of the previous
static DNS resolver configuration, though it's not a clean revert
because other configuration has changed in the meantime.
2024-02-23 14:45:01 -08:00
Aaron Gable 6c9d41f0d9
Update from go1.22rc1 to go1.22 (#7329)
Go 1.22 has been officially released, so update our unit and integration
tests to run on the official version.
2024-02-15 16:15:21 -08:00
Aaron Gable 78e4e82ffa
Feature cleanup (#7320)
Remove three deprecated feature flags which have been removed from all
production configs:
- StoreLintingCertificateInsteadOfPrecertificate
- LeaseCRLShards
- AllowUnrecognizedFeatures

Deprecate three flags which are set to true in all production configs:
- CAAAfterValidation
- AllowNoCommonName
- SHA256SubjectKeyIdentifier

IN-9879 tracked the removal of these flags.
2024-02-13 17:42:27 -08:00
Aaron Gable ad699af3d4
Add CRL capabilities to issuance package (#7300)
Move the CRL issuance logic -- building an x509.RevocationList template,
populating it with correctly-built extensions, linting it, and actually
signing it -- out of the //ca package and into the //issuance package.
This means that the CA's CRL code no longer needs to be able to reach
inside the issuance package to access its issuers and certificates (and
those fields will be able to be made private after the same is done for
OCSP issuance).

Additionally, improve the configuration of CRL issuance, create
additional checks on CRL's ThisUpdate and NextUpdate fields, and make it
possible for a CRL to contain two IssuingDistributionPoint URIs so that
we can migrate to shorter addresses.

IN-10045 tracks the corresponding production changes.

Fixes https://github.com/letsencrypt/boulder/issues/7159
Part of https://github.com/letsencrypt/boulder/issues/7296
Part of https://github.com/letsencrypt/boulder/issues/7294
Part of https://github.com/letsencrypt/boulder/issues/7094
Part of https://github.com/letsencrypt/boulder/issues/7100
2024-02-13 09:13:36 -08:00
Phil Porada aece244f3b
test: Use more //test/hierarchy/ key material in tests (#7318)
The `//ca/ca_test.go` `setup` function will now create issuers that each
have a unique private key from `//test/hierarchy/`, rather than multiple
issuers sharing a private key. This was spotted while reviewing an [OCSP
test](10e894a172/ca/ocsp_test.go (L53-L87)).
Some now unnecessary key material has been deleted from `//test/`.

Fixes https://github.com/letsencrypt/boulder/issues/7304
2024-02-09 14:39:07 -05:00
Samantha f10abd27eb
SA/ARI: Add method of tracking certificate replacement (#7284)
Part of #6732
Part of #7038
2024-02-08 14:19:29 -05:00
Aaron Gable 10e894a172
Create new admin tool (#7276)
Create a new administration tool "bin/admin" as a successor to and
replacement of "admin-revoker".

This new tool supports all the same fundamental capabilities as the old
admin-revoker, including:
- Revoking by serial, by batch of serials, by incident table, and by
private key
- Blocking a key to let bad-key-revoker take care of revocation
- Clearing email addresses from all accounts that use them

Improvements over the old admin-revoker include:
- All commands run in "dry-run" mode by default, to prevent accidental
executions
- All revocation mechanisms allow setting the revocation reason,
skipping blocking the key, indicating that the certificate is malformed,
and controlling the number of parallel workers conducting revocation
- All revocation mechanisms do not parse the cert in question, leaving
that to the RA
- Autogenerated usage information for all subcommands
- A much more modular structure to simplify adding more capabilities in
the future
- Significantly simplified tests with smaller mocks

The new tool has analogues of all of admin-revoker's unit tests, and all
integration tests have been updated to use the new tool instead. A
future PR will remove admin-revoker, once we're sure SRE has had time to
update all of their playbooks.

Fixes https://github.com/letsencrypt/boulder/issues/7135
Fixes https://github.com/letsencrypt/boulder/issues/7269
Fixes https://github.com/letsencrypt/boulder/issues/7268
Fixes https://github.com/letsencrypt/boulder/issues/6927
Part of https://github.com/letsencrypt/boulder/issues/6840
2024-02-07 09:35:18 -08:00
Aaron Gable 0358bd7bf3
Ensure gRPC suberror metadata is ascii-only (#7282)
When passing detailed error information between services as gRPC
metadata, ensure that the suberrors being sent contain only ascii
characters, because gRPC metadata is sent as HTTP headers which only
allow visible ascii characters.
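
As a minimal sketch of the constraint described above (not Boulder's actual implementation), the following checks whether a string is safe to send as a gRPC metadata value and, if not, substitutes a placeholder for each offending character. The function names `isVisibleASCII` and `toVisibleASCII` and the choice of `?` as the replacement are assumptions made for illustration; the real change may escape or encode characters differently.

```go
package main

import (
	"fmt"
	"strings"
)

// isVisibleASCII reports whether s contains only printable ASCII bytes
// (space, 0x20, through tilde, 0x7e).
func isVisibleASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] < 0x20 || s[i] > 0x7e {
			return false
		}
	}
	return true
}

// toVisibleASCII replaces every rune outside the printable ASCII range
// with '?'. The replacement character is an arbitrary choice for this sketch.
func toVisibleASCII(s string) string {
	return strings.Map(func(r rune) rune {
		if r < 0x20 || r > 0x7e {
			return '?'
		}
		return r
	}, s)
}

func main() {
	msg := "invalid domain: exämple.com"
	fmt.Println(isVisibleASCII(msg))
	fmt.Println(toVisibleASCII(msg))
}
```

Sanitizing on the sending side keeps the receiving service from having to guess how a malformed header was mangled in transit.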

Also add a regression test.
2024-02-06 17:40:45 -08:00
Jacob Hoffman-Andrews 14a8378dd0
test: remove use of 10.88.88.88 in most places (#7270)
Part of #7245.

There are still a few places that use 10.88.88.88 that will be harder to
remove. In particular, some of the Python integration tests start up
their own HTTP servers that differ from challtestsrv in some important
way (like timing out requests). Because challtestsrv already binds to
10.77.77.77:80, those test servers need a different IP address to bind
to. We can probably solve that but I'll leave it for another PR.
2024-01-30 11:34:13 -08:00
Aaron Gable d1f8fd2921
RA: improve AdministrativelyRevokeCertificate (#7275)
The RA.AdministrativelyRevokeCertificate method has two primary modes of
operation: if a certificate DER blob is provided, it parses and extracts
information from that blob, and revokes the cert; if no DER is provided,
it assumes the cert is malformed, and revokes it (but doesn't do an OCSP
cache purge) based on the serial alone. However, this scheme has
slightly confusing semantics in the RA and requires that the admin
tooling look up the certificates to provide them to the RA.

Instead, add a new "malformed" field to the RA's
AdministrativelyRevokeCertificateRequest, and deprecate the "cert" field
of that same request. When the malformed boolean is false, the RA will
look up and parse the certificate itself. When the malformed field is
true, it will revoke the cert based on serial alone.

Note that the main logic of AdministrativelyRevokeCertificate -- namely
revoking, potentially re-revoking, doing an akamai cache purge, etc --
is not changed by this PR. The only thing that changes here is how the
RA gets access to the to-be-revoked certificate's information.

Part of https://github.com/letsencrypt/boulder/issues/7135
2024-01-29 13:54:44 -08:00
Samantha 97a19b18d2
WFE: Check NewOrder rate limits (#7201)
Add non-blocking checks of New Order limits to the WFE using the new
key-value based rate limits package.

Part of #5545
2024-01-26 21:05:30 -05:00
Phil Porada 03152aadc6
RVA: Recheck CAA records (#7221)
Previously, `va.IsCAAValid` would only check CAA records from the
primary VA during initial domain control validation, completely ignoring
any configured RVAs. The upcoming
[MPIC](https://github.com/ryancdickson/staging/pull/8) ballot will
require that it be done from multiple perspectives. With the currently
deployed [Multi-Perspective
Validation](https://letsencrypt.org/2020/02/19/multi-perspective-validation.html)
in staging and production, this change brings us in line with the
[proposed phase
3](https://github.com/ryancdickson/staging/pull/8/files#r1368708684).
This change reuses the existing
[MaxRemoteValidationFailures](21fc191273/cmd/boulder-va/main.go (L35))
variable for the required non-corroboration quorum.
> Phase 3: June 15, 2025 - December 14, 2025 ("CAs MUST implement MPIC
> in blocking mode*"):
>
>    MUST implement MPIC? Yes
> Required quorum?: Minimally, 2 remote perspectives must be used. If
> using less than 6 remote perspectives, 1 non-corroboration is allowed.
> If using 6 or more remote perspectives, 2 non-corroborations are
> allowed.
>    MUST block issuance if quorum is not met: Yes.
> Geographic diversity requirements?: Perspectives must be 500km from 1)
> the primary perspective and 2) all other perspectives used in the
> quorum.
>
> * Note: "Blocking Mode" is a nickname. As opposed to "monitoring mode"
> (described in the last milestone), CAs MUST NOT issue a certificate if
> quorum requirements are not met from this point forward.

Adds new VA feature flags: 
* `EnforceMultiCAA` instructs a primary VA to command each of its
configured RVAs to perform a CAA recheck.
* `MultiCAAFullResults` causes the primary VA to block waiting for all
RVA CAA recheck results to arrive.


Renamed `va.logRemoteValidationDifferentials` to
`va.logRemoteDifferentials` because it can handle initial domain control
validations and CAA rechecking with minimal editing.

Part of https://github.com/letsencrypt/boulder/issues/7061
2024-01-25 16:23:25 -05:00
Aaron Gable adb9673c37
Exempt renewals from NewOrders rate limit (#7002)
When a client is attempting to open a new Order which is identical to an
already-issued certificate, allow that request to bypass the normal New
Orders rate limit. This will allow renewals to go through even when a
client is exhibiting other bad behavior. This should not open the door
to floods of requests for the same certificate in rapid succession, as the
Duplicate Certificates rate limit will still block those.

Fixes https://github.com/letsencrypt/boulder/issues/6792
2024-01-23 14:57:37 -08:00
Aaron Gable 6ac1e46bcf
boulder-tools: plumb TARGETPLATFORM into build.sh (#7278)
This is necessary in order for build.sh to download the correct version
of protoc.

This bug was introduced by
https://github.com/letsencrypt/boulder/pull/7205, which inserted another
"FROM" clause between the top of the file (where TARGETPLATFORM was
originally pulled in) and the point where build.sh is executed.
2024-01-23 11:43:43 -08:00
Jacob Hoffman-Andrews ce5632b480
Remove `service1` / `service2` names in consul (#7266)
These names corresponded to single instances of a service, and were
primarily used for (a) specifying which interface to bind a gRPC port on
and (b) allowing `health-checker` to check individual instances rather
than a service as a whole.

For (a), change the `--grpc-addr` flags to bind to "all interfaces." For
(b), provide a specific IP address and port for health checking. This
required adding a `--hostOverride` flag for `health-checker` because the
service certificates contain hostname SANs, not IP address SANs.

Clarify the situation with nonce services a little bit. Previously we
had one nonce "service" in Consul and got nonces from that (i.e.
randomly between the two nonce-service instances). Now we have two nonce
services in consul, representing multiple datacenters, and one of them
is explicitly configured as the "get" service, while both are configured
as the "redeem" service.

Part of #7245.

Note this change does not yet get rid of the rednet/bluenet distinction,
nor does it get rid of all use of 10.88.88.88. That will be a followup
change.
2024-01-22 09:34:20 -08:00
Phil Porada eb69e9a66d
Replace codespell with typos (#7265)
Replace the python "codespell" tool with the rust "typos" tool.
To accomplish this, add a new rust-based step to the boulder-tools
docker build process, with some complexity to handle builds on
multiple developer architectures.

Co-authored-by: Viktor Szépe <viktor@szepe.net>
2024-01-17 18:08:22 -08:00
Aaron Gable d57edfa0f1
Run more go vet checks (#7255)
Enable the atomicalign, deepequalerrors, findcall, nilness,
reflectvaluecompare, sortslice, timeformat, and unusedwrite go vet
analyzers, which golangci-lint does not enable by default. Additionally,
enable new go vet analyzers by default as they become available.

The fieldalignment and shadow analyzers remain disabled because they
report so many errors that they should be fixed in a separate PR.

Note that the nilness analyzer appears to have found one very real bug
in tlsalpn.go.
2024-01-17 12:27:55 -05:00
Matthew McPherrin 56c10c613c
Update zlint (#7252)
Upgrade to zlint v3.6.0

Two new lints are triggered in various places:
aia_contains_internal_names is ignored in integration test
configurations, and unit tests are updated to have more realistic URLs.
The w_subject_common_name_included lint needs to be ignored where we'd
ignored n_subject_common_name_included before.

Related to https://github.com/letsencrypt/boulder/issues/7261
2024-01-16 11:50:37 -08:00
Aaron Gable d38b7b685b
Fix flaky integration test failures (#7262)
This partially reverts commit 20b121138c,
which was landed in https://github.com/letsencrypt/boulder/pull/7254.
Specifically, it reverts the addition of "noWaitForReady" to the
health-checker's gRPC config. This appears to stop the flaky `last
resolver error: produced zero addresses` failures we've been seeing in
the CI integration tests.
2024-01-16 09:50:13 -08:00
Phil Porada 442b906ee8
test: don't overlap ca2 and va2 debug ports (#7257)
https://github.com/letsencrypt/boulder/pull/7246 introduced
using different ports for instances of the same service. CA2 and VA2
accidentally configured the same debug port.
2024-01-11 12:57:35 -08:00
Jacob Hoffman-Andrews 20b121138c
health-checker: bail early on handshake failure (#7254)
When we have a problem with our authentication certificates, it's better
to get a clear error early than to wait for health checker to time out.

Also, set noWaitForReady in the config, which prevents detailed errors
from being obscured by "timed out" errors.
2024-01-11 09:36:35 -08:00
Jacob Hoffman-Andrews 7b347dd6c3
Use different ports for instances of the same service (#7246)
Part of #7245.

This just provides a unique port for each instance, and breaks the
service<->port mapping. A subsequent PR will move to listening on the
same IP.

Remove unused `-b` variants of crl-storer and akamai-purger.

The new port scheme is that the first instance of a service is on `93xx`
and the second instance of a service is on `94xx`.

Part of a stacked change with #7243.
2024-01-10 14:32:33 -08:00
Jacob Hoffman-Andrews cd3bbf91ad
test: move SRV stanzas from config-next to config (#7243)
Service discovery via SRV records is now deployed in prod.
2024-01-10 10:31:23 -08:00
Phil Porada 2fe77e630e
Add additional service resolution strategy to consul doc (#7244)
While working on https://github.com/letsencrypt/boulder/pull/7238, I dug
into why the consul services config has, for example, `[ca-a, ca-b]` in
addition to `[ca1, ca2]`. Boulder test configs use `ca.service.consul`
which will return both CAs (`[ca-a, ca-b]`). For `[ca1, ca2]` though, a
grpc load balancing [integration
test](a55bf19ea0/test/integration-test.py (L121-L143))
targets each service individually to verify that each backend is
working correctly.
2024-01-09 13:46:44 -08:00
Viktor Szépe 5c0ca04575
Fix typos (#7241)
Found new misspellings using the `typos` rust crate:
https://crates.io/crates/typos
2024-01-09 13:17:27 -08:00
Phil Porada 2e951b0105
Remove ca-a and ca-b distinction in test configs (#7238)
Fixes https://github.com/letsencrypt/boulder/issues/7187
2024-01-08 13:19:28 -08:00
Aaron Gable d84e8d08f2
Begin testing on go1.22rc1 (#7226)
Draft release notes: https://tip.golang.org/doc/go1.22
2023-12-20 11:41:35 -08:00
Aaron Gable 6b54b61f21
Prevent serial prefixes from beginning with a 1 (#7214)
Change the max value of the CA's `SerialPrefix` config value from 255 (a
byte of all 1s) to 127 (a byte of one 0 followed by seven 1s). This
prevents the serial prefix from ever beginning with a 1.

This is important because serials are interpreted as signed
(twos-complement) integers, and are required to be positive -- a serial
whose first bit is 1 is considered to be negative and therefore in
violation of RFC 5280. The go stdlib fixes this for us by prepending a
zero byte to any serial that begins with a 1 bit, but we'd prefer all
our serials to be the same length.
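
The length effect described above can be reproduced with the standard library. This is a small sketch under stated assumptions: the 9-byte serial (one prefix byte plus eight fixed bytes) and the helper name `encodedSerialLen` are hypothetical and chosen only to keep the example short; they are not Boulder's real serial layout.

```go
package main

import (
	"encoding/asn1"
	"fmt"
	"math/big"
)

// encodedSerialLen returns the number of content bytes in the DER INTEGER
// encoding of a serial made of the given prefix byte followed by eight
// fixed bytes. The serial layout is hypothetical.
func encodedSerialLen(prefix byte) int {
	raw := []byte{prefix, 1, 2, 3, 4, 5, 6, 7, 8}
	serial := new(big.Int).SetBytes(raw)
	der, err := asn1.Marshal(serial)
	if err != nil {
		panic(err)
	}
	// For values this short, DER is one tag byte, one length byte,
	// and then the content bytes.
	return len(der) - 2
}

func main() {
	// 0x7f has a leading 0 bit, so no padding byte is needed.
	fmt.Println(encodedSerialLen(0x7f))
	// 0x80 has a leading 1 bit, so the encoder prepends a zero byte
	// to keep the integer positive.
	fmt.Println(encodedSerialLen(0x80))
}
```

With a prefix of 127 or less, the encoded serial keeps the same length as the raw bytes; with 128 or more, it grows by one byte.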

Corresponding config change was completed in IN-9880.
2023-12-15 07:37:44 -08:00
Aaron Gable 26e3646249
Add integration test for account key change (#7208)
Fixes https://github.com/letsencrypt/boulder/issues/3112
Fixes https://github.com/letsencrypt/boulder/issues/7063
2023-12-13 13:54:38 -08:00
Aaron Gable 97cba52e09
Remove deprecated and unused feature flags (#7207)
These feature flags are no longer referenced in any test, staging, or
production configuration. They were removed in:
- StoreRevokerInfo: IN-8546
- ROCSPStage6 and ROCSPStage7: IN-8886
- CAAValidationMethods and CAAAccountURI: IN-9301
2023-12-13 13:53:31 -08:00
Aaron Gable ea9291a4d3
Remove slow query test (#7211)
This test has been "temporarily" disabled for four years. In the
meantime, our approach to the database has changed drastically. Remove it,
since it is likely not worth the effort to re-enable it.

Fixes https://github.com/letsencrypt/boulder/issues/4625
Fixes https://github.com/letsencrypt/boulder/issues/4583
2023-12-13 13:52:52 -08:00
Aaron Gable 5e1bc3b501
Simplify the features package (#7204)
Replace the current three-piece setup (enum of feature variables, map of
feature vars to default values, and autogenerated bidirectional maps of
feature variables to and from strings) with a much simpler one-piece
setup: a single struct with one boolean-typed field per feature. This
preserves the overall structure of the package -- a single global
feature set protected by a mutex, and Set, Reset, and Enabled methods --
although the exact function signatures have all changed somewhat.

The executable config format remains the same, so no deployment changes
are necessary. This change does deprecate the AllowUnrecognizedFeatures
feature, as we cannot tell the json config parser to ignore unknown
field names, but that flag is set to False in all of our deployment
environments already.
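
A rough sketch of the one-piece shape described above, assuming nothing about the real package beyond what this message states: one struct with a boolean field per feature, guarded by a mutex. The two field names are flags mentioned elsewhere in this log, but the struct itself, the accessor name `Get`, and all signatures are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// Config holds one boolean per feature flag. Every flag defaults to false,
// the Go zero value. The selection of fields here is illustrative.
type Config struct {
	CAAAfterValidation bool
	AllowNoCommonName  bool
}

var (
	mu     sync.RWMutex
	global Config
)

// Set replaces the global feature set.
func Set(c Config) {
	mu.Lock()
	defer mu.Unlock()
	global = c
}

// Reset returns every flag to its default (false).
func Reset() {
	mu.Lock()
	defer mu.Unlock()
	global = Config{}
}

// Get returns a copy of the current feature set. The name is an assumption
// made for this sketch.
func Get() Config {
	mu.RLock()
	defer mu.RUnlock()
	return global
}

func main() {
	Set(Config{CAAAfterValidation: true})
	fmt.Println(Get().CAAAfterValidation, Get().AllowNoCommonName)
	Reset()
	fmt.Println(Get().CAAAfterValidation)
}
```

Because the flags are plain struct fields, a misspelled flag name in code becomes a compile error rather than a runtime lookup failure.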

Fixes https://github.com/letsencrypt/boulder/issues/6802
Fixes https://github.com/letsencrypt/boulder/issues/5229
2023-12-12 15:51:57 -05:00
Jacob Hoffman-Andrews a0e0bbdb24
boulder-tools: move install-go steps into Dockerfile (#7205)
Previously we made these a single `RUN` step in the Dockerfile to reduce
the size of the final image. Docker pulls all the dependent layers for
an image, which means that even if you delete intermediate build files
in a later `RUN` step, they still contribute to the overall download
size. You can work around that by deleting the intermediate files within
a single `RUN` step.

However, that has downsides: changing one Go dependency meant
downloading Go and all the other dependencies again. By moving these
back into `RUN` steps we get incremental builds, which are nice. And by
adding the builder pattern (`FROM ... AS godeps`), we can avoid having
intermediate files contribute to the overall image size.
2023-12-12 10:14:52 -05:00
Samantha 8cd1e60abf
ratelimits: More compact overrides format (#7199)
Support a more compact format for supplying overrides to default rate
limits.

Fixes #7197
2023-12-11 11:23:39 -08:00
Jacob Hoffman-Andrews c21b376623
Implement DoH for validation queries (#7178)
Fixes: #7141
2023-12-11 10:49:00 -08:00
Jacob Hoffman-Andrews 23b4088a97
Build boulder-tools locally for dev (#7194)
This solves a few problems:

- When producing a new revision of boulder-tools, it often requires
multiple iterations to get it right. This provides a straightforward
path to build those iterations without trying to upload them to a Docker
repository each time.
- It's no longer necessary to produce dev container images in addition
to CI container images. Dev images are built on-demand and cached.
- Cross builds are no longer needed unless building the CI images on
non-amd64.
 
For third-party integration tests that do `docker compose up`, this may
result in longer build times if they are rebuilding from scratch each
time. That can be improved by keeping docker cache around.
2023-12-11 11:11:14 -05:00
Jacob Hoffman-Andrews f8636cc40e
startserver: check for DNS before starting (#7188)
The servers are invoked such that they have to look up their service
names in DNS in order to bind a port. This means that when consul is
down, they take a long time to start up because they wait for the query
to time out.

In the meantime there are a number of messages about timed out health
checks. This winds up obscuring the real error, so let's do a quick DNS
check at startup and give a more meaningful error.
2023-12-07 20:03:43 -08:00
Jacob Hoffman-Andrews a0ce126a0f
set permissions for generated certs and keys (#7193)
minica by default sets restrictive permissions on the directories it
makes. This produced confusing behavior after regenerating keys: the
`bconsul` container failed to start up because it couldn't access its
TLS keys, which led to other errors during startservers.
2023-12-07 20:03:35 -08:00
Matthew McPherrin cb5384dcd7
Add --addr and/or --debug-addr flags to all commands (#7175)
Many services already have --addr and/or --debug-addr flags.

However, it wasn't universal, so this PR adds flags to commands where
they're not currently present.

This makes it easier to use a shared config file but listen on different
ports, for running multiple instances on a single host.

The config options are made optional as well, and removed from
config-next/.
2023-12-07 17:41:01 -08:00
Aaron Gable aa738b5a37
Stop testing on go1.21.4 (#7192)
2023-12-07 15:58:37 -08:00
Phil Porada 3366be50f1
Use RFC 7093 truncated SHA256 hash for Subject Key Identifier (#7179)
- Adds a feature flag to gate rollout for SHA256 Subject Key Identifiers
for end-entity certificates.
- The ceremony tool will now use the RFC 7093 section 2 option 1 method
for generating Subject Key Identifiers for future root CA, intermediate
CA, and cross-sign ceremonies.

- - - -

[RFC 7093 section 2 option
1](https://datatracker.ietf.org/doc/html/rfc7093#section-2) provides a
method for generating a truncated SHA256 hash for the Subject Key
Identifier field in accordance with Baseline Requirement [section
7.1.2.11.4 Subject Key
Identifier](90a98dc7c1/docs/BR.md (712114-subject-key-identifier)).

> [RFC5280] specifies two examples for generating key identifiers from
>    public keys.  Four additional mechanisms are as follows:
> 
>    1) The keyIdentifier is composed of the leftmost 160-bits of the
>       SHA-256 hash of the value of the BIT STRING subjectPublicKey
>       (excluding the tag, length, and number of unused bits).

The related [RFC 5280 section
4.2.1.2](https://datatracker.ietf.org/doc/html/rfc5280#section-4.2.1.2)
states:
>   For CA certificates, subject key identifiers SHOULD be derived from
>   the public key or a method that generates unique values.  Two common
>   methods for generating key identifiers from the public key are:
>   ...
>   Other methods of generating unique numbers are also acceptable.
2023-12-06 13:44:17 -05:00
Aaron Gable c45bfb8aed
Begin testing on go1.21.5 (#7185)
2023-12-05 11:16:55 -08:00
Matthew McPherrin 32adaf1846
Make log-validator take glob patterns to monitor for log files (#7172)
To simplify deployment of the log validator, this allows wildcards
(using go's filepath.Glob) to be included in the file paths.

In order to detect new files, a new background goroutine polls the glob
patterns every minute for matches.

Because the "monitor" function is running in its own goroutine, a lock
is needed to ensure it's not trying to add new tailers while shutdown is
happening.
2023-11-27 12:48:46 -08:00
Matthew McPherrin 54c25f9152
Regenerate redis-tls certs and include script (#7171)
This copies the prelude from grpc-creds/generate.sh into
redis-tls/generate.sh, and regenerates all the certs there, which are
expiring.
2023-11-22 16:45:17 -05:00
Samantha 1bb8ef6e47
Upgrade from go1.21.3 to go1.21.4 (#7154)
2023-11-09 16:17:35 -05:00
Aaron Gable 19582cee4b
Remove go1.21.1 from CI (#7144)
We are running go1.21.3 in all environments.
2023-11-08 16:31:28 -08:00
Aaron Gable 16081d8e30
Invert RequireCommonName into AllowNoCommonName (#7139)
The RequireCommonName feature flag was our only "inverted" feature flag,
which defaulted to true and had to be explicitly set to false. This
inversion can lead to confusion, especially to readers who expect all Go
default values to be zero values. We plan to remove the ability for our
feature flag system to support default-true flags, which the existence
of this flag blocked. Since this flag has not been set in any real
configs, inverting it is easy.

Part of https://github.com/letsencrypt/boulder/issues/6802
2023-11-06 10:58:30 -08:00
Aaron Gable 81cb970d30
Remove crlURL from test CA issuer configs (#7132)
This value is always set to the empty string in prod, which (correctly)
results in the issued certificates not having a CRLDP at all. It turns
out our integration test environment has been including CRLDPs in all of
our test certs because we set crlURL to a non-empty value! This change
updates our test configs to match reality.

I'll remove the code which supports this config value as part of my
upcoming CA CRLDP changes.
2023-11-02 11:20:50 -07:00
Aaron Gable f24ec910ef
Further simplifications to test.ThrowAwayCert (#7129)
Remove ThrowAwayCert's nameCount argument, since it is always set to 1
by all callers. Remove ThrowAwayCertWithSerial, because it has no
callers. Change the throwaway cert's key from RSA512 to ECDSA P-224 for
a two-orders-of-magnitude speedup in key generation. Use this simplified
form in two new places in the RA that were previously rolling their own
test certs.
2023-11-02 09:45:56 -07:00
Aaron Gable 3a3e32514c
Give throwaway test certs reasonable validity intervals (#7128)
Add a new clock argument to the test-only ThrowAwayCert function, and
use that clock to generate reasonable notBefore and notAfter timestamps
in the resulting throwaway test cert. This is necessary to easily test
functions which rely on the expiration timestamp of the certificate,
such as upcoming work about computing CRL shards.

Part of https://github.com/letsencrypt/boulder/issues/7094
2023-11-01 15:24:43 -07:00
Matthew McPherrin 5b3c84d001
Remove the "netaccess" container from the docker-compose dev environment. (#7123)
It isn't needed during a regular 'docker compose up' developer
environment, and only really serves as a way to use the same tools image
in CI. Two checks run during CI are the govulncheck and verifying go mod
tidy / go vendor. Neither of these checks require anything from the
custom image other than Golang itself, which can be provided directly
from the CI environment.

If a developer is working inside the existing containers, they can still
run `go mod tidy; go mod vendor` themselves, which is a standard Golang
workflow and thus is simpler than using the netaccess image via docker
compose.
2023-11-01 15:11:51 -07:00
Jacob Hoffman-Andrews c84201c09a
observer: add TCP prober (#7118)
This is potentially useful for diagnosing issues with connection
timeouts, which could have separate causes from HTTP errors. For
instance, a connection timeout is more likely to be caused by network
congestion or high CPU usage on the load balancer.
2023-10-27 09:11:18 -07:00
Phil Porada d250a3d7e9
Update to go1.21.3 (#7114)
The [go1.21.3
release](https://groups.google.com/g/golang-announce/c/iNNxDTCjZvo)
contains updates to the `net/http` package for the [HTTP/2 rapid reset
bug](https://cloud.google.com/blog/products/identity-security/how-it-works-the-novel-http2-rapid-reset-ddos-attack).
The fixes in `x/net/http2` will be handled by [another
PR](https://github.com/letsencrypt/boulder/pull/7113).

The following CVEs are fixed in this release:
- [CVE-2023-39325](https://nvd.nist.gov/vuln/detail/CVE-2023-39325)
- [CVE-2023-44487](https://nvd.nist.gov/vuln/detail/CVE-2023-44487)
2023-10-12 15:08:42 -07:00
Samantha 9aef5839b5
WFE: Add new key-value ratelimits implementation (#7089)
Integrate the key-value rate limits from #6947 into the WFE. Rate limits
are backed by the Redis source added in #7016, and use the SRV record
shard discovery added in #7042.

Part of #5545
2023-10-04 14:12:38 -04:00
Aaron Gable 19f03da1e4
crl-storer: check number before uploading (#7065)
Have the crl-storer download the previous CRL from S3, parse it, and
compare its number against the about-to-be-uploaded CRL. This is not an
atomic operation, so it is not a 100% guarantee, but it is still a
useful safety check to prevent accidentally uploading CRL shards whose
CRL Numbers are not strictly increasing.

Part of https://github.com/letsencrypt/boulder/issues/6456
2023-09-27 09:12:44 -07:00
Christian Clauss 574c5cfa9b
latency-charter.py: import matplotlib once, not twice (#7096)
`matplotlib` is already imported on line 3.

% `pipx run ruff --select=F811 .`
```
test/load-generator/latency-charter.py:10:8: F811 Redefinition of unused `matplotlib` from line 3
```
2023-09-22 08:27:48 -07:00
Phil Porada 4bd90ea82f
Log version string for more tools at startup (#7087)
This is a followup to https://github.com/letsencrypt/boulder/pull/7086
2023-09-19 12:46:55 -04:00
Aaron Gable 3b880e1ccf
Add CAAAfterValidation feature flag (#7082)
Add a new feature flag "CAAAfterValidation" which, when set to true in
the VA, causes the VA to only begin CAA checks after basic domain
control validation has completed successfully. This will make successful
validations take longer, since the DCV and CAA checks are performed
serially instead of in parallel. However, it will also reduce the number
of CAA checks we perform by up to 80%, since such a high percentage of
validations also fail.

IN-9575 tracks enabling this feature flag in staging and prod
Fixes https://github.com/letsencrypt/boulder/issues/7058
2023-09-18 13:30:31 -07:00
Aaron Gable cb28a001e9
Unfork crl x509 (#7078)
Delete our forked version of the x509 library, and update all call-sites
to use the version that we upstreamed and got released in go1.21. This
requires making a few changes to calling code:
- replace crl_x509.RevokedCertificate with x509.RevocationListEntry
- replace RevocationList.RevokedCertificates with
RevocationList.RevokedCertificateEntries
- make RevocationListEntry.ReasonCode a non-pointer integer

Our lints cannot yet be updated to use the new types and fields, because
those improvements have not yet been adopted by the zcrypto/x509 package
used by the linting framework.

Fixes https://github.com/letsencrypt/boulder/issues/6741
2023-09-15 20:25:13 -07:00
Samantha 7068db96fe
redis: Add support for *redis.Ring shard configuration using SRV records (#7042)
Part of #5545
2023-09-11 15:05:55 -04:00
Aaron Gable 58ec67c7a8
Remove go1.20 from CI (#7071)
We now deploy go1.21.1 in both Staging and in Prod.
2023-09-08 14:32:51 -04:00
Aaron Gable 102b447e8d
Smoother scheduling and leasing for crl-updater (#7010)
Overhaul crl-updater's default (i.e. non-runOnce) behavior to update
individual CRL shards continuously, rather than updating all shards in a
large batch.

To accomplish this, it spins up one goroutine for each shard of each
issuer this updater is responsible for. Each goroutine is solely
responsible for its assigned shard. It sleeps for a random amount of
time (to stagger their starts), then begins a ticker to wake up every
updateInterval and re-issue its shard.

As part of this change, refactor updater.go into three separate files
(batch.go, continuous.go, and updater.go) containing functions dedicated
to single-run batch processing, long-running continuous processing, and
shared helpers, respectively.

IN-9475 tracks the deprecation of the `updateOffset` config key. The
other configuration changes in this PR do not require production
changes.

Fixes https://github.com/letsencrypt/boulder/issues/7023
2023-09-08 09:16:15 -07:00
Samantha b13174538d
go: Update go1.20.7 and go1.21rc4 to go1.20.8 and go1.21.1 (#7068)
2023-09-06 16:05:05 -04:00
Phil Porada 439517543b
CI: Run staticcheck standalone (#7055)
Run staticcheck as a standalone binary rather than as a library via
golangci-lint. From the golangci-lint help output:
> staticcheck (megacheck): It's a set of rules from staticcheck. It's
not the same thing as the staticcheck binary. The author of staticcheck
doesn't support or approve the use of staticcheck as a library inside
golangci-lint.

We decided to disable ST1000 which warns about incorrect or missing
package comments.

For SA4011, I chose to change the semantics[1] of the for loop rather
than ignoring the SA4011 lint for that line.

Fixes https://github.com/letsencrypt/boulder/issues/6988

1. https://go.dev/ref/spec#Continue_statements
2023-08-31 21:09:40 -07:00
Phil Porada 72e01b337a
ceremony: Distinguish between intermediate and cross-sign ceremonies (#7005)
In `//cmd/ceremony`:
* Added `CertificateToCrossSignPath` to the `cross-certificate` ceremony
type. This new input field takes an existing certificate that will be
cross-signed and performs checks against the manually configured data in
each ceremony file.
* Added byte-for-byte subject/issuer comparison checks to root,
intermediate, and cross-certificate ceremonies to detect that signing is
happening as expected.
* Added Fermat factorization check from the `//goodkey` package to all
functions that generate new key material.

In `//linter`: 
* The Check function now exports linting certificate bytes. The idea is
that a linting certificate's `tbsCertificate` bytes can be compared
against the final certificate's `tbsCertificate` bytes as a verification
that `x509.CreateCertificate` was deterministic and produced identical
DER bytes after each signing operation.

Other notable changes:
* Re-orders the issuers list in each CA config to match staging and
production. There is an ordering issue mentioned by @aarongable two
years ago on IN-5913 that didn't make its way back to this repository.
> Order here matters – the default chain we serve for each intermediate
should be the first listed chain containing that intermediate.
* Enables `ECDSAForAll` in `config-next` CA configs to match Staging.
* Generates 2x new ECDSA subordinate CAs cross-signed by an RSA root and
adds these chains to the WFE for clients to download.
* Increased the test.sh startup timeout to account for the extra
ceremony run time.


Fixes https://github.com/letsencrypt/boulder/issues/7003

---------

Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
2023-08-23 14:01:19 -04:00
Samantha 48f211c7ba
ratelimits: Add Redis source (#7016)
Part of #5545
2023-08-10 11:45:04 -04:00
Aaron Gable 6a450a2272
Improve CRL shard leasing (#7030)
Simplify the index-picking logic in the SA's leaseOldestCrlShard method.
Specifically, more clearly separate it into "missing" and "non-missing"
cases, which require entirely different logic: picking a random missing
shard, or picking the oldest unleased shard, respectively.
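
A minimal sketch of the two cases, assuming a simplified in-memory view of shard state. The real logic lives in the SA and runs against the database; `shardState` and `pickShard` are hypothetical names, and times are plain Unix seconds to keep the example small.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// shardState is a hypothetical, simplified record of one CRL shard.
type shardState struct {
	idx         int
	thisUpdate  int64 // Unix seconds of the last update
	leasedUntil int64 // Unix seconds; 0 means not leased
}

// pickShard returns the shard index to lease next, or false if none is free.
// randIntn is injected (e.g. rand.Intn) so the choice can be made testable.
func pickShard(known []shardState, numShards int, now int64, randIntn func(int) int) (int, bool) {
	present := make(map[int]bool, len(known))
	for _, s := range known {
		present[s.idx] = true
	}

	// "Missing" case: some shard has no record yet, so pick one at random.
	var missing []int
	for i := 0; i < numShards; i++ {
		if !present[i] {
			missing = append(missing, i)
		}
	}
	if len(missing) > 0 {
		return missing[randIntn(len(missing))], true
	}

	// "Non-missing" case: pick the oldest shard whose lease has expired.
	best, found := 0, false
	var bestUpdate int64
	for _, s := range known {
		if s.leasedUntil > now {
			continue // still leased by another updater
		}
		if !found || s.thisUpdate < bestUpdate {
			best, bestUpdate, found = s.idx, s.thisUpdate, true
		}
	}
	return best, found
}

func main() {
	now := time.Now().Unix()
	known := []shardState{
		{idx: 0, thisUpdate: now - 600},
		{idx: 1, thisUpdate: now - 900, leasedUntil: now + 60},
		{idx: 2, thisUpdate: now - 300},
	}
	// Shard 1 is the oldest but is still leased, so shard 0 is chosen.
	fmt.Println(pickShard(known, 3, now, rand.Intn))
}
```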

Also change the UpdateCRLShard method to "unlease" shards when they're
updated. This allows the crl-updater to run as quickly as it likes,
while still ensuring that multiple instances do not step on each other's
toes.

The config change for shardWidth and lookbackPeriod instead of
certificateLifetime has been deployed in prod since IN-8445. The config
change changing the shardWidth is just so that the tests neither produce
a bazillion shards, nor have to do a bazillion SA queries for each chunk
within a shard, improving the readability of test logs.

Part of https://github.com/letsencrypt/boulder/issues/7023
2023-08-08 17:05:00 -07:00
Aaron Gable 9a4f0ca678
Deprecate LeaseCRLShards feature (#7009)
This feature flag is enabled in both staging and prod.
2023-08-07 15:17:00 -07:00
Jacob Hoffman-Andrews 725f190c01
ca: remove orphan queue code (#7025)
The `orphanQueueDir` config field is no longer used anywhere.

Fixes #6551
2023-08-02 16:04:28 -07:00
Aaron Gable 359d3f7a1d
Update CI to go1.20.7 and go1.21rc4 (#7028)
2023-08-02 14:26:43 -07:00
Samantha 33109ce384
test: Fix naming of integration test config structs (#7020)
Significantly differentiate configuration struct naming in the
integration package.
2023-08-01 16:24:42 -07:00
Samantha e7cb74b5f8
grpc: Allow for some SRV resolution failures (#7014)
Allow the gRPC SRV resolver to succeed even when some names are not
resolved successfully. Cross-DC services (e.g. nonce) will fail to
resolve when the link between DCs is severed or one DC is taken offline;
this should not result in hard gRPC service failures.
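
The idea of tolerating partial resolution failures can be illustrated with a small sketch. It does not use the gRPC resolver API; `resolveAll` and `fakeLookup` are hypothetical helpers, and the names and port are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errNoBackends = errors.New("no backends resolved")

// resolveAll looks up every name and keeps whatever succeeds. It returns an
// error only when no name at all yields a backend address.
func resolveAll(names []string, lookup func(string) ([]string, error)) ([]string, error) {
	var addrs []string
	errs := []error{errNoBackends}
	for _, name := range names {
		got, err := lookup(name)
		if err != nil {
			errs = append(errs, fmt.Errorf("resolving %q: %w", name, err))
			continue
		}
		addrs = append(addrs, got...)
	}
	if len(addrs) == 0 {
		return nil, errors.Join(errs...)
	}
	return addrs, nil
}

// fakeLookup stands in for an SRV lookup; names in the "dc2" datacenter fail,
// as if the link to that datacenter were down.
func fakeLookup(name string) ([]string, error) {
	if strings.HasSuffix(name, ".dc2.example") {
		return nil, errors.New("lookup timed out")
	}
	return []string{name + ":9101"}, nil
}

func main() {
	addrs, err := resolveAll([]string{"nonce.dc1.example", "nonce.dc2.example"}, fakeLookup)
	fmt.Println(addrs, err)
}
```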

Fixes #6974
2023-08-01 12:55:05 -04:00
Jacob Hoffman-Andrews 8d7b87c9ca
cert-checker: check for precertificate correspondence (#7015)
This adds a lookup in cert-checker to find the linting precertificate
with the same serial number as a given final certificate, and checks
precertificate correspondence between the two.

Fixes #6959
2023-07-28 12:45:47 -04:00
Samantha b141fa7c78
WFE: Correct Error Handling for Nonce Redemption RPCs with Unknown Prefixes (#7004)
Fix an issue related to the custom gRPC Picker implementation introduced
in #6618. When a nonce contained a prefix not associated with a known
backend, the Picker would continuously rebuild, re-resolve DNS, and
eventually throw a 500 "Server Error" at RPC timeout. The Picker now
promptly returns a 400 "Bad Nonce" error as expected; in response, the
requesting client should retry their request with a fresh nonce.

Additionally:
- WFE unit tests use derived nonces when `"BOULDER_CONFIG_DIR" ==
"test/config-next"`.
- `Balancer.Build()` in "noncebalancer" forces a rebuild until non-zero
backends are available. This matches the
[balancer/roundrobin](d524b40946/balancer/roundrobin/roundrobin.go (L49-L53))
implementation.
- Nonces with no matching backend increment "jose_errors" with label
`"type": "JWSInvalidNonce"` and "nonce_no_backend_found".
- Nonces of incorrect length are now rejected at the WFE and increment
"jose_errors" with label `"type": "JWSMalformedNonce"` instead of
`"type": "JWSInvalidNonce"`.
- Nonces not encoded as base64url are now rejected at the WFE and
increment "jose_errors" with label `"type": "JWSMalformedNonce"` instead
of `"type": "JWSInvalidNonce"`.
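
The error-type split described in the list above might look like the following sketch. The nonce length, prefix length, and the `classifyNonce` helper are hypothetical values and names chosen for illustration, and checking the whole string as base64url is an assumption of this sketch; only the two error labels come from the commit message.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

const (
	nonceLen  = 16 // hypothetical total nonce length
	prefixLen = 4  // hypothetical backend prefix length
)

type nonceResult string

const (
	nonceOK        nonceResult = "ok"
	nonceMalformed nonceResult = "JWSMalformedNonce"
	nonceInvalid   nonceResult = "JWSInvalidNonce"
)

// classifyNonce rejects structurally bad nonces as malformed before any
// backend is consulted, and reports unknown prefixes as invalid.
func classifyNonce(nonce string, backends map[string]bool) nonceResult {
	if len(nonce) != nonceLen {
		return nonceMalformed
	}
	if _, err := base64.RawURLEncoding.DecodeString(nonce); err != nil {
		return nonceMalformed
	}
	if !backends[nonce[:prefixLen]] {
		return nonceInvalid // no backend owns this prefix
	}
	return nonceOK
}

func main() {
	backends := map[string]bool{"abcd": true}
	for _, n := range []string{"abcdABCD0123-_xy", "zzzzABCD0123-_xy", "short", "abcd+BCD0123/_xy"} {
		fmt.Printf("%-18q %s\n", n, classifyNonce(n, backends))
	}
}
```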

Fixes #6969
Part of #6974
2023-07-28 12:07:52 -04:00
Jacob Hoffman-Andrews 04a4805042
tests: add explicit versions to Python dependencies (#6993)
This avoids a situation where building a fresh boulder-tools image
accidentally brings in a new version of codespell, which flags new
misspellings.
2023-07-20 11:20:26 -07:00
Aaron Gable 908421bb98
crl-updater: lease CRL shards to prevent races (#6941)
Add a new feature flag, LeaseCRLShards, which controls certain aspects
of crl-updater's behavior.

When this flag is enabled, crl-updater calls the new SA.LeaseCRLShard
method before beginning work on a shard. This prevents it from stepping
on the toes of another crl-updater instance which may be working on the
same shard. This is important to prevent two competing instances from
accidentally updating a CRL's Number (which is an integer representation
of its thisUpdate timestamp) *backwards*, which would be a compliance
violation.

When this flag is enabled, crl-updater also calls the new
SA.UpdateCRLShard method after finishing work on a shard.

In the future, additional work will be done to make crl-updater use the
"give me the oldest available shard" mode of the LeaseCRLShard method.

Fixes https://github.com/letsencrypt/boulder/issues/6897
2023-07-19 15:11:16 -07:00
Aaron Gable 2bf5b26397
Remove ability to configure policy OIDs (#6992)
Completely remove the ability to configure Certificate Policy OIDs in
the Boulder CA. Instead, hard-code the Baseline Requirements Domain
Validated Reserved Policy Identifier. Boulder will never perform OV or
EV validation, so this is the only identifier that will be necessary.

In the ceremony tool, introduce additional checks that assert that Root
certificates do not have policies, and Intermediate certificates have
exactly the one Baseline Requirements Domain Validated Reserved Policy
Identifier.
2023-07-19 10:38:59 -04:00
Jacob Hoffman-Andrews 7d66d67054
It's borpin' time! (#6982)
This change replaces [gorp] with [borp].

The changes consist of a mass renaming of the import and comments / doc
fixups, plus modifications of many call sites to provide a
context.Context everywhere, since borp newly requires this (this was one
of the motivating factors for the borp fork).

This also refactors `github.com/letsencrypt/boulder/db.WrappedMap` and
`github.com/letsencrypt/boulder/db.Transaction` to not embed their
underlying gorp/borp objects, but to have them as plain fields. This
ensures that we can only call methods on them that are specifically
implemented in `github.com/letsencrypt/boulder/db`, so we don't miss
wrapping any. This required introducing a `NewWrappedMap` method along
with accessors `SQLDb()` and `BorpDB()` to get at the internal fields
during metrics and logging setup.
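
The difference between embedding and holding a plain field can be seen in a small sketch. The types below are hypothetical stand-ins, not the actual `db.WrappedMap` or borp types.

```go
package main

import "fmt"

// rawDB is a hypothetical stand-in for the underlying database object.
type rawDB struct{}

func (rawDB) Exec(q string) string  { return "raw:" + q }
func (rawDB) Query(q string) string { return "raw:" + q }

// embeddedMap embeds rawDB, so every rawDB method is promoted onto it,
// including ones nobody remembered to wrap.
type embeddedMap struct{ rawDB }

func (e embeddedMap) Exec(q string) string { return "wrapped:" + e.rawDB.Exec(q) }

// fieldMap keeps rawDB as a plain field, so only methods that are written
// out explicitly are callable on it.
type fieldMap struct{ db rawDB }

func (f fieldMap) Exec(q string) string { return "wrapped:" + f.db.Exec(q) }

func main() {
	fmt.Println(embeddedMap{}.Exec("x"))  // goes through the wrapper
	fmt.Println(embeddedMap{}.Query("x")) // silently bypasses the wrapper
	fmt.Println(fieldMap{}.Exec("x"))     // goes through the wrapper
	// fieldMap{}.Query("x") would not compile: no such method.
}
```

With the field form, forgetting to wrap a method becomes a compile error instead of an unwrapped call.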

Fixes #6944
2023-07-17 14:38:29 -07:00
Aaron Gable e09c5faf5e
Deprecate CAA AccountURI and ValidationMethods feature flags (#7000)
These flags are set to true in all environments.
2023-07-14 14:54:39 -04:00
Aaron Gable 8d8fd3731b
Remove VA.DNSResolver (#7001)
I have confirmed that this config field is not set in any deployment
environment.

Fixes https://github.com/letsencrypt/boulder/issues/6868
2023-07-13 17:56:41 -07:00
Aaron Gable f7b79d07e5
Generate self-signed lint certs when linting roots (#6994)
Our linting system uses a throwaway key to sign an untrusted version of
the to-be-signed cert, then runs the lints over that. But this means
that, when linting a self-signed cert, the signature no longer matches
the embedded public key. This in turn causes a bunch of zlint's checks
to think they're linting a Subordinate CA cert, rather than a Root CA
cert.

Change our linting system to make the lint cert appear self-signed when
the input cert is intended to be self-signed.
2023-07-13 12:29:12 -07:00
Aaron Gable 158f62bd0c
Remove policy qualifiers from all issuance paths (#6980)
The inclusion of Policy Qualifiers inside Policy Information elements of
a Certificate Policies extension is now NOT RECOMMENDED by the Baseline
Requirements. We have already removed these fields from all of our
Boulder configuration, and ceased issuing certificates with Policy
Qualifiers.

Remove all support for configuring and including Policy Qualifiers in
our certificates, both in Boulder's main issuance path and in our
ceremony tool. Switch from using the policyasn1 library to manually
encode these extensions, to using the crypto/x509's
Certificate.PolicyIdentifiers field. Delete the policyasn1 package as it
is no longer necessary.

Fixes https://github.com/letsencrypt/boulder/issues/6880
2023-07-13 10:37:05 -07:00
Phil Porada c7dc3a8d72
Test against go1.20.6 (#6987)
This version includes a fix that seems relevant to us:

> The HTTP/1 client did not fully validate the contents of the Host
header. A maliciously crafted Host header could inject additional
headers or entire requests. The HTTP/1 client now refuses to send
requests containing an invalid Request.Host or Request.URL.Host value.
> 
> Thanks to Bartek Nowotarski for reporting this issue.
> 
> Includes security fixes for CVE-2023-29406 and Go issue
https://go.dev/issue/60374
2023-07-11 12:50:42 -07:00
Phil Porada 947e199016
Add govulncheck to CI (#6963)
Fixes https://github.com/letsencrypt/boulder/issues/6354

Runs
[govulncheck](https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck) in
a one-shot container so that PR creation, updates to a PR, and merges
to main can contact the govuln API and check for known vulnerabilities.

Lastly, upgrades the version of golangci-lint to the [latest available
(v1.53.3)](https://github.com/golangci/golangci-lint/releases).

---------

Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
2023-07-11 09:51:20 -04:00
Jacob Hoffman-Andrews cd24b9db20
ca: deprecate StoreLintingCertificateInsteadOfPrecertificate (#6970)
And turn off the orphan queue in config-next.
2023-07-05 10:44:08 -07:00
Aaron Gable cc596bd4eb
Begin testing on go1.21rc2 with loopvar experiment (#6952)
Add go1.21rc2 to the matrix of go versions we test against.

Add a new step to our CI workflows (boulder-ci, try-release, and
release) which sets the "GOEXPERIMENT=loopvar" environment variable if
we're running go1.21. This experiment makes it so that loop variables
are scoped only to their single loop iteration, rather than to the whole
loop. This prevents bugs such as our CAA Rechecking incident
(https://bugzilla.mozilla.org/show_bug.cgi?id=1619047). Also add a line
to our docker setup to propagate this environment variable into the
container, where it can affect builds.
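
As a generic illustration of the class of bug involved (not the code from the incident), a closure that captures a loop variable needs its own copy when the variable is scoped to the whole loop. The explicit copy below is harmless, and redundant, once per-iteration scoping is in effect; `closuresFor` is a hypothetical helper.

```go
package main

import "fmt"

// closuresFor returns one closure per item. Each closure should report the
// item from its own iteration.
func closuresFor(items []string) []func() string {
	var fns []func() string
	for _, item := range items {
		// Per-iteration copy. Without it, and without loopvar semantics,
		// every closure would share one variable and see the last item.
		item := item
		fns = append(fns, func() string { return item })
	}
	return fns
}

func main() {
	for _, fn := range closuresFor([]string{"a", "b", "c"}) {
		fmt.Println(fn())
	}
}
```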

Finally, fix one TLS-ALPN-01 test to have the fake subscriber server
actually willing to negotiate the acme-tls/1 protocol, so that the ACME
server's tls client actually waits to (fail to) get the certificate,
instead of dying immediately. This fix is related to the upgrade to
go1.21, not the loopvar experiment.

Fixes https://github.com/letsencrypt/boulder/issues/6950
2023-06-26 16:35:29 -07:00
Aaron Gable 8224fad20b
Update to go1.20.5 (#6946)
We are already running go1.20.5 in production.
2023-06-20 14:55:37 -07:00
Jacob Hoffman-Andrews a2b2e53045
cmd: fail without panic (#6935)
For "ordinary" errors like "file not found" for some part of the config,
we would prefer to log an error and exit without logging about a panic
and printing a stack trace.

To achieve that, we want to call `defer AuditPanic()` once, at the top
of `cmd/boulder`'s main. That's so early that we haven't yet parsed the
config, which means we haven't yet initialized a logger. We compromise:
`AuditPanic` now calls `log.Get()`, which will retrieve the configured
logger if one has been set up, or will create a default one (which logs
to stderr/stdout).

AuditPanic and Fail/FailOnError now cooperate: Fail/FailOnError panic
with a special type, and AuditPanic checks for that type and prints a
simple message before exiting when it's present.

This PR also coincidentally fixes a bug: panicking didn't previously
cause the program to exit with nonzero status, because it recovered the
panic but then did not explicitly exit nonzero.
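
The cooperation between the two helpers can be sketched as below. This is a simplified illustration, not Boulder's `cmd` package: the `failure` type, the `run` wrapper, and the specific exit codes are assumptions made for the example.

```go
package main

import "fmt"

// failure is a hypothetical sentinel type: panicking with it signals an
// ordinary, expected error rather than a bug.
type failure struct{ msg string }

// fail aborts the current work with an ordinary error.
func fail(msg string) { panic(failure{msg}) }

// classify turns a recovered value into an exit code and a log line.
func classify(r any) (code int, line string) {
	switch v := r.(type) {
	case nil:
		return 0, "" // no panic happened
	case failure:
		return 1, v.msg // simple message, no stack trace
	default:
		return 2, fmt.Sprintf("panic: %v", v) // genuine panic, still nonzero
	}
}

// run executes f and reports how the process should exit.
func run(f func()) (code int, line string) {
	// recover must be called directly by the deferred function.
	defer func() { code, line = classify(recover()) }()
	f()
	return 0, ""
}

func main() {
	code, line := run(func() { fail("reading config: file not found") })
	fmt.Printf("would exit %d after logging %q\n", code, line)
}
```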

Fixes #6933
2023-06-20 12:29:02 -07:00
Samantha 124c4cc6f5
grpc/sa: Implement deep health checks (#6928)
Add the necessary scaffolding for deep health checking of our various
gRPC components. Each component implementation that also implements the
grpc.checker interface will be checked periodically, and the health
status of the component will be updated accordingly.

Add the necessary methods to SA to implement the grpc.checker interface
and register these new health checks with Consul.

Additionally:
- Update entry point script to check for ProxySQL readiness.
- Increase the poll rate for gRPC Consul checks from 5s to 2s to help
with DNS failures, due to check failures, on startup.
- Change log level for Consul from INFO to ERROR to deal with noisy logs
full of transport failures due to Consul gRPC checks firing before the
SAs are up.

Fixes #6878
Part of #6795
2023-06-12 13:58:53 -04:00
Jacob Hoffman-Andrews 2041e8723b
integration: shorten log output (#6894)
Remove the load test stage of the integration test, which generates
superfluous amounts of log.

Turn down logging on the CA and VA from info to error-only.

Part of https://github.com/letsencrypt/boulder/issues/6890
2023-06-05 13:11:19 -04:00
Samantha dc269a63d5
docker: Update consul container to match production (#6913)
- Update consul container from `1.13.1` to `1.14.2` to match production.
- Specify `grpc_tls`, now required instead of defaulted to `8503` when
`enable_agent_tls_for_checks` is specified.

Part of #6911
2023-06-02 14:35:07 -04:00
Jacob Hoffman-Andrews 80e1510819
admin: add clear-email subcommand (#6919)
When a user wants their email address deleted from the database but no
longer has access to their account, this allows an administrator to
clear it.

This adds `admin` as an alias for `admin-revoker`, because we'd like the
clear-email sub-command to be a part of that overall tool, but it's not
really revocation related.

Part of #6864
2023-06-01 14:33:24 -04:00
Jacob Hoffman-Andrews 521eb55d1e
test: better message for different empty slices (#6920)
Given two empty slices, one that is equal to nil and one that is not,
AssertDeepEquals used to produce this confusing output:

    [[]] !(deep)= [[]]

After this change, it produces:

    [[]string(nil)] !(deep)= [[]string{}]
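
The distinction comes from Go's `%#v` verb, which prints a nil slice and an empty slice differently even though `%v` prints both as `[]`. A small sketch, where `describeMismatch` is a hypothetical stand-in for the test helper:

```go
package main

import (
	"fmt"
	"reflect"
)

// describeMismatch returns "" when a and b are deeply equal, and otherwise
// a message that uses %#v so nil and empty slices are distinguishable.
func describeMismatch(a, b any) string {
	if reflect.DeepEqual(a, b) {
		return ""
	}
	return fmt.Sprintf("[%#v] !(deep)= [%#v]", a, b)
}

func main() {
	var nilSlice []string
	// reflect.DeepEqual treats a nil slice and an empty slice as different.
	fmt.Println(describeMismatch(nilSlice, []string{}))
}
```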
2023-05-26 09:41:23 -07:00
Aaron Gable 4305f64a28
Replace integration test root ocsp with crls (#6905)
We no longer issue OCSP responses for our intermediate certificates,
instead producing CRLs which cover those intermediates. Remove the OCSP
response from our integration test ceremony, remove the configuration
for the static ocsp-responder which serves that response, and remove the
integration test which spins up and checks that responder. Replace all
of the above with new CRLs generated as part of the integration test
ceremony.
2023-05-24 14:22:43 -07:00
Samantha f09a94bd74
consul: Configure gRPC health check for SA (#6908)
Enable SA gRPC health checks in Consul ahead of further changes for
#6878. Calls to the `Check` method of the SA's grpc.health.v1.Health
service must respond `SERVING` before the `sa` service will be
advertised in Consul DNS. Consul will continue to poll this service
every 5 seconds.

- Add `bconsul` docker service to boulder `bluenet` and `rednet`
- Add TLS credentials for `consul.boulder`:
  ```shell
  $ openssl x509 -in consul.boulder/cert.pem -text | grep DNS
                DNS:consul.boulder
  ```
- Update `test/grpc-creds/generate.sh` to add `consul.boulder`
- Update test SA configs to allow `consul.boulder` access to
`grpc.health.v1.Health`

Part of #6878
2023-05-23 13:16:49 -04:00
Aaron Gable 26adec08cc
Remove go1.20.3 from CI (#6898)
We are no longer using go1.20.3 in prod.
2023-05-22 14:47:33 -07:00
Aaron Gable fe523f142d
crl-updater: retry failed shards (#6907)
Add per-shard exponential backoff and retry to crl-updater. Each
individual CRL shard will be retried up to MaxAttempts (default 1)
times, with exponential backoff starting at 1 second and maxing out at 1
minute between each attempt.
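
A minimal sketch of such a schedule, assuming plain doubling with no jitter. The real crl-updater may compute its delays differently; `backoff` and `retry` are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// backoff returns the wait after the given failed attempt (1-based):
// 1s, 2s, 4s, ... capped at one minute.
func backoff(attempt int) time.Duration {
	const base, maxDelay = time.Second, time.Minute
	d := base
	for i := 1; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// retry runs op up to maxAttempts times (assumed >= 1), sleeping between
// failed attempts. sleep is injected so callers can avoid real waiting.
func retry(maxAttempts int, sleep func(time.Duration), op func() error) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			sleep(backoff(attempt))
		}
	}
	return err
}

func main() {
	calls := 0
	err := retry(3,
		func(d time.Duration) { fmt.Println("would sleep", d) },
		func() error {
			calls++
			if calls < 3 {
				return errors.New("shard update failed")
			}
			return nil
		})
	fmt.Println("calls:", calls, "err:", err)
}
```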

This can effectively reduce the parallelism of crl-updater: while a
goroutine is sleeping between attempts of a failing shard, it is not
doing work on another shard. This is a desirable feature, since it means
that crl-updater gently reduces the total load it places on the network
and database when shards start to fail.

Setting this new config parameter is tracked in IN-9140
Fixes https://github.com/letsencrypt/boulder/issues/6895
2023-05-22 12:59:09 -07:00
Aaron Gable 3990a08328
Add relevant domain to CAA errors and logs (#6886)
When processing CAA records, keep track of the FQDN at which that CAA
record was found (which may be different from the FQDN for which we are
attempting issuance, since we crawl CAA records upwards from the
requested name to the TLD). Then surface this name upwards so that it
can be included in our own log lines and in the problem documents which
we return to clients.
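
The upward crawl, and the need to remember where a record was found, can be sketched as follows. This is illustrative only: `findCAA` and `fakeCAA` are hypothetical, and real CAA processing also has to handle aliases and lookup errors.

```go
package main

import (
	"fmt"
	"strings"
)

// findCAA walks from fqdn up to the TLD and returns the first name that has
// CAA records, together with those records.
func findCAA(fqdn string, lookup func(string) []string) (foundAt string, records []string) {
	name := strings.TrimSuffix(fqdn, ".")
	for name != "" {
		if recs := lookup(name); len(recs) > 0 {
			return name, recs
		}
		_, parent, ok := strings.Cut(name, ".")
		if !ok {
			break // reached the TLD
		}
		name = parent
	}
	return "", nil
}

// fakeCAA stands in for a DNS lookup; only example.com has a record.
func fakeCAA(name string) []string {
	if name == "example.com" {
		return []string{`0 issue "letsencrypt.org"`}
	}
	return nil
}

func main() {
	at, recs := findCAA("www.sub.example.com", fakeCAA)
	fmt.Printf("requested www.sub.example.com, CAA found at %q: %v\n", at, recs)
}
```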

Fixes https://github.com/letsencrypt/boulder/issues/3171
2023-05-22 15:08:56 -04:00
Matthew McPherrin b7d9f8c2e3
In config-next/, opentelemetry -> openTelemetry for consistency (#6888)
In configs, opentelemetry -> openTelemetry

As pointed out in review of #6867, these should match the case of their
corresponding Go identifiers for consistency.

JSON keys are case-insensitive in Go (part of why we've got a fork in
go-jose), so this change should have no functional impact.
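
The case-insensitive matching referred to here is the behavior of the standard `encoding/json` decoder, which prefers an exact key match but also accepts a case-insensitive one. A short sketch, where the `config` struct and its `endpoint` field are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config is a hypothetical config struct using the new key spelling.
type config struct {
	OpenTelemetry struct {
		Endpoint string `json:"endpoint"`
	} `json:"openTelemetry"`
}

// parseConfig decodes raw JSON into a config.
func parseConfig(raw string) (config, error) {
	var c config
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	// Both spellings of the key populate the same field.
	for _, raw := range []string{
		`{"opentelemetry": {"endpoint": "old-spelling"}}`,
		`{"openTelemetry": {"endpoint": "new-spelling"}}`,
	} {
		c, err := parseConfig(raw)
		fmt.Println(c.OpenTelemetry.Endpoint, err)
	}
}
```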
2023-05-15 17:07:29 -04:00
Aaron Gable 62ff373885
Probs: remove divergences from RFC8555 (#6877)
Remove the remaining divergences from RFC8555 regarding what error types
we use in certain situations. Specifically:
- use "invalidContact" instead of "invalidEmail";
- use "unsupportedContact" for contact addresses that use a protocol
other than "mailto:"; and
- use "unsupportedIdentifier" for identifiers that specify a type other
than "dns".
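
A rough sketch of the mapping in the list above. The error type names are the ones defined by RFC 8555; the helper functions are hypothetical, and the address check via `net/mail` is far looser than real contact validation.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

const errNS = "urn:ietf:params:acme:error:"

// contactProblem returns the RFC 8555 error type for a contact URL, or ""
// if the contact looks acceptable.
func contactProblem(contact string) string {
	if !strings.HasPrefix(contact, "mailto:") {
		return errNS + "unsupportedContact"
	}
	addr := strings.TrimPrefix(contact, "mailto:")
	if _, err := mail.ParseAddress(addr); err != nil {
		return errNS + "invalidContact"
	}
	return ""
}

// identifierProblem returns the error type for an identifier type, or "".
func identifierProblem(idType string) string {
	if idType != "dns" {
		return errNS + "unsupportedIdentifier"
	}
	return ""
}

func main() {
	fmt.Println(contactProblem("tel:+12025551212"))
	fmt.Println(contactProblem("mailto:not-an-email"))
	fmt.Println(identifierProblem("ip"))
}
```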
2023-05-15 12:35:12 -07:00
Matthew McPherrin c21b44bdc2
Rename CA's "--ca-addr" flag to "--addr" (#6889)
Most boulder components have a command line flag to override what gRPC
and debug port they listen on, which is used in tests to run multiple
instances with the same configuration.

However, CA's flag is named "--ca-addr", and not "--addr". This is
inconsistent with SA, RA, VA, nonce, publisher, and purger.

This flag isn't used in production, where we set it in the config file,
so it shouldn't be a breaking change to rename it.
2023-05-15 11:17:07 -07:00
Matthew McPherrin 3aae67b8a9
Opentelemetry: Add option for public endpoints (#6867)
This PR adds a new configuration block specifically for the otelhttp
instrumentation. This block is separate from the existing
"opentelemetry" configuration, and is only relevant when using otelhttp
instrumentation. It does not share any codepath with the existing
configuration, so it is at the top level to indicate which services it
applies to.

There's a bit of plumbing to pass the new configuration through. I've adopted the
measured_http package to also set up opentelemetry instead of just
metrics, which should hopefully allow any future changes to be smaller
(just config & there) and more consistent between the wfe2 and ocsp
responder.

There's one option here now, which disables setting
[otelhttp.WithPublicEndpoint](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#WithPublicEndpoint).
This option is designed to do exactly what we want: Don't accept
incoming spans as parents of the new span created in the server.
Previously we had a setting to disable parent-based sampling to help
with this problem, which doesn't really make sense anymore, so let's
just remove it and simplify that setup path. The default of "false" is
designed to be the safe option. It's set to True in the test/ configs
for integration tests that use traces, and I expect we'll likely set it
true in production eventually once the LBs are configured to handle
tracing themselves.

Fixes #6851
2023-05-12 15:34:34 -04:00
Samantha 310546a14e
VA: Support discovery of DNS resolvers via Consul (#6869)
Deprecate `va.DNSResolver` in favor of backwards compatible
`va.DNSProvider`.

Fixes #6852
2023-05-12 12:54:31 -04:00
Samantha 19c5244088
test: Use consul hostname instead of IP for dnsAuthority (#6883)
Standardize on hostnames for dnsAuthority to match production. 

Related to #6869
2023-05-11 14:13:53 -07:00
Jacob Hoffman-Andrews f295626e4c
ca: remove simulated ISRG OID from config (#6879)
We intend to issue in the future with only the CA/Browser Forum Domain
Validated OID.
2023-05-10 12:39:12 -04:00
Jacob Hoffman-Andrews ac4be89b56
grpc: add NoWaitForReady config field (#6850)
Currently we set WaitForReady(true), which causes gRPC requests to not
fail immediately if no backends are available, but instead wait until
the timeout in case a backend does become available. The downside is
that this behavior masks true connection errors. We'd like to turn it
off.

Fixes #6834
2023-05-09 16:16:44 -07:00
Samantha c453ca0571
grpc: Deprecate clientNames field (#6870)
- SRE removed in IN-8755

Fixes #6698
2023-05-08 14:49:27 -04:00
Samantha 487680629d
cmd: TLSConfig values should be string not *string (#6872)
Fixes #6737
2023-05-08 13:21:42 -04:00
Samantha c9173cc024
boulder-va: Remove deprecated Common fields stanza (#6871)
- SRE removed in IN-8752.

Fixes #6716
2023-05-08 11:47:17 -04:00
Matthew McPherrin 8427245675
OTel Integration test using jaeger (#6842)
This adds Jaeger's all-in-one dev container (with no persistent storage)
to boulder's dev docker-compose. It configures config-next/ to send all
traces there.

A new integration test creates an account and issues a cert, then
verifies the trace contains some set of expected spans.

This test found that async finalize broke spans, so I fixed that and a
few related spots where we make a new context.
2023-05-05 10:41:29 -04:00
Phil Porada f8f45f90a9
Test and build release on go1.20.4 (#6862)
[Go 1.20.4](https://groups.google.com/g/golang-announce/c/MEb0UyuSMsU)
contains security updates for the html/template package, which we use
in `//cmd/bad-key-revoker`.
2023-05-04 10:55:02 -04:00
Aaron Gable 02fa680b08
Update path to ARI endpoint (#6859)
Update the document number to the latest version, and remove the /get/
prefix since it now supports both the GET and POST portions of the spec.

Also update one piece of tooling to properly get the ARI URL from the
directory, rather than hard-coding it.
2023-05-03 15:20:51 -07:00
Matthew McPherrin b5118dde36
Stop using DIRECTORY env var in integration tests (#6854)
We only ever set it to the same value, and then read it back in
make_client, so just hardcode it there instead.

It's a bit spooky-action-at-a-distance and is process-wide with no
synchronization, which means we can't safely use different values
anyway.
2023-05-03 09:54:48 -04:00
Jacob Hoffman-Andrews a9fc1cb882
Improve cert_storage_failed_test (#6849)
Replace inline connect string with a new one in test/vars (that points
to boulder_sa_integration).

Remove comments about interpolateParams=false being required; it is not.

Add clauses to getPrecertByName to ensure it follows its documented
constraints (return the latest one).

Follow-up on #6807. Fixes #6848.
2023-05-02 15:43:07 -07:00
Jacob Hoffman-Andrews 1c7e0fd1d8
Store linting certificate instead of precertificate (#6807)
In order to get rid of the orphan queue, we want to make sure that
before we sign a precertificate, we have enough data in the database
that we can fulfill our revocation-checking obligations even if storing
that precertificate in the database fails. That means:

- We should have a row in the certificateStatus table for the serial.
- But we should not serve "good" for that serial until we are positive
the precertificate was issued (BRs 4.9.10).
- We should have a record in the live DB of the proposed certificate's
public key, so the bad-key-revoker can mark it revoked.
- We should have a record in the live DB of the proposed certificate's
names, so it can be revoked if we are required to revoke based on names.

The SA.AddPrecertificate method already achieves these goals for
precertificates by writing to the various metadata tables. This PR
repurposes the SA.AddPrecertificate method to write "proposed
precertificates" instead.

We already create a linting certificate before the precertificate, and
that linting certificate is identical to the precertificate that will be
issued except for the private key used to sign it (and the AKID). So for
instance it contains the right pubkey and SANs, and the Issuer name is
the same as the Issuer name that will be used. So we'll use the linting
certificate as the "proposed precertificate" and store it to the DB,
along with appropriate metadata.

In the new code path, rather than writing "good" for the new
certificateStatus row, we write a new, fake OCSP status string "wait".
This will cause us to return internalServerError to OCSP requests for
that serial (but we won't get such requests because the serial has not
yet been published). After we finish precertificate issuance, we update
the status to "good" with SA.SetCertificateStatusReady.

Part of #6665
2023-04-26 13:54:24 -07:00
Aaron Gable 25cae29f70
Resolve TestAkamaiPurgerDrainQueueFails race (#6844)
Fixes https://github.com/letsencrypt/boulder/issues/6837
2023-04-26 11:30:09 -04:00
Phil Porada 17fb1b287f
cmd: Export prometheus metrics for TLS cert notBefore and notAfter fields (#6836)
Export new prometheus metrics for the `notBefore` and `notAfter` fields
to track internal certificate validity periods when calling the `Load()`
method for a `*tls.Config`. Each metric is labeled with the `serial`
field.

```
tlsconfig_notafter_seconds{serial="2152072875247971686"} 1.664821961e+09
tlsconfig_notbefore_seconds{serial="2152072875247971686"} 1.664821960e+09
```

Fixes https://github.com/letsencrypt/boulder/issues/6829
2023-04-24 16:28:05 -04:00
Aaron Gable 5480f1060b
Clean up database schema (#6832)
Make a series of small changes to our test database schema, both to make
it simpler to reason about and to bring it closer in alignment to our
production database schema:
- Incorporate the IssuedNamesDropIndex, Incidents, SimplePartitioning,
and NotUnique migrations into the CombinedSchema, as they have been
fully applied in prod;
- Use CHARSET=utf8mb4 everywhere, instead of just utf8;
- Use UNSIGNED for auto-increment ID columns in the tables where prod
does; and
- Re-sort the tables in CombinedSchema which no longer have foreign key
constraints.

Part of https://github.com/letsencrypt/boulder/issues/6820
2023-04-21 10:37:05 -07:00
Aaron Gable 3ddca2d1b8
Update eggsampler/acme and use it for ARI tests (#6811)
Update github.com/eggsampler/acme from v3.3.0 to v3.4.0.
Changelog: https://github.com/eggsampler/acme/compare/v3.3.0...v3.4.0

Update the ARI integration test to use the eggsampler/acme client's new
ARI capabilities for making both GET and POST requests. This simplifies
and streamlines the test significantly, and lets us test the POST path.

Fixes #6781
2023-04-19 14:14:43 -07:00
Phil Porada 0ac848173e
Appease errcheck (#6821)
Check errors during shutdown for several components to appease errcheck.

Related to [1] and [2].

1) https://github.com/letsencrypt/boulder/pull/6808
2) https://github.com/letsencrypt/boulder/pull/6819
2023-04-14 22:32:24 -04:00
Aaron Gable bd1d27b8e8
Fix non-gRPC process cleanup and exit (#6808)
Although #6771 significantly cleaned up how gRPC services stop and clean
up, it didn't make any changes to our HTTP servers or our non-server
(e.g. crl-updater, log-validator) processes. This change finishes the
work.

Add a new helper method cmd.WaitForSignal, which simply blocks until one
of the three signals we care about is received. This easily replaces all
calls to cmd.CatchSignals which passed `nil` as the callback argument,
with the added advantage that it doesn't call os.Exit() and therefore
allows deferred cleanup functions to execute. This new function is
intended to be the last line of main(), allowing the whole process to
exit once it returns.

Reimplement cmd.CatchSignals as a thin wrapper around cmd.WaitForSignal,
but with the added callback functionality. Also remove the os.Exit()
call from CatchSignals, so that the main goroutine is allowed to finish
whatever it's doing, call deferred functions, and exit naturally.

Update all of our non-gRPC binaries to use one of these two functions.
The vast majority use WaitForSignal, as they run their main processing
loop in a background goroutine. A few (particularly those that can run
either in run-once or in daemonized mode) still use CatchSignals, since
their primary processing happens directly on the main goroutine.

The changes to //test/load-generator are the most invasive, simply
because that binary needed to have a context plumbed into it for proper
cancellation, but it already had a custom struct type named "context"
which needed to be renamed to avoid shadowing.

Fixes https://github.com/letsencrypt/boulder/issues/6794
2023-04-14 16:22:56 -04:00
Aaron Gable 98fa0f07b4
Re-enable errcheck linter (#6819)
Enable the errcheck linter. Update the way we express exclusions to use
the new, non-deprecated, non-regex-based format. Fix all places where we
began accidentally violating errcheck while it was disabled.
2023-04-14 15:41:12 -04:00
Phil Porada 56a11f0896
Fix CI failures related to akamai-test-srv (#6815)
Fixes a CI problem introduced by
https://github.com/letsencrypt/boulder/pull/6758 where we could send two
purge requests which caused sporadic CI failures due to an infinite
loop.

Fixes https://github.com/letsencrypt/boulder/issues/6806
2023-04-13 09:56:30 -07:00
Aaron Gable 45329c9472
Deprecate ROCSPStage7 flag (#6804)
Deprecate the ROCSPStage7 feature flag, which caused the RA and CA to
stop generating OCSP responses when issuing new certs and when revoking
certs. (That functionality is now handled just-in-time by the
ocsp-responder.) Delete the old OCSP-generating codepaths from the RA
and CA. Remove the CA's internal reference to an OCSP implementation,
because it no longer needs it.

Additionally, remove the SA's "Issuers" config field, which was never
used.

Fixes #6285
2023-04-12 17:03:06 -07:00
Aaron Gable d6192e7c56
Stop testing go1.20.2 (#6809)
Staging and Prod have fully upgraded to go1.20.3, per IN-8865.
2023-04-10 11:00:25 -07:00
Aaron Gable e55a276efe
CA: Remove deprecated config stanzas (#6595)
These config stanzas have been removed in staging and prod. They used to
configure the separate OCSP and CRL gRPC services provided by the CA
process, but the CA now provides those services on the same port as the
main CA gRPC service.

Fixes #6448
2023-04-07 09:37:34 -07:00
Aaron Gable 94f93361a0
Promote the first SAN from the CSR (#6796)
Rather than promoting the alphabetically-first SAN to be the CN, promote
the SAN which came first in the CSR. This is a reversion to previous
behavior that was changed as a side-effect of:
- https://github.com/letsencrypt/boulder/pull/6706;
- https://github.com/letsencrypt/boulder/pull/6749; and
- https://github.com/letsencrypt/boulder/pull/6757
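
To illustrate the difference between the two behaviors, here is a small sketch. The function names are hypothetical and are not taken from the Boulder codebase.

```go
package main

import (
	"fmt"
	"sort"
)

// alphabeticallyFirst models the behavior being reverted: sort a copy of
// the names and take the first one.
func alphabeticallyFirst(sans []string) string {
	if len(sans) == 0 {
		return ""
	}
	sorted := append([]string(nil), sans...)
	sort.Strings(sorted)
	return sorted[0]
}

// firstInCSR models the restored behavior: keep the order the client used.
func firstInCSR(sans []string) string {
	if len(sans) == 0 {
		return ""
	}
	return sans[0]
}

func main() {
	sans := []string{"www.example.com", "example.com"}
	fmt.Println(alphabeticallyFirst(sans))
	fmt.Println(firstInCSR(sans))
}
```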

Fixes https://github.com/letsencrypt/boulder/issues/6801
2023-04-06 14:30:19 -07:00
Aaron Gable 7e994a1216
Deprecate ROCSPStage6 feature flag (#6770)
Deprecate the ROCSPStage6 feature flag. Remove all references to the
`ocspResponse` column from the SA, both when reading from and when
writing to the `certificateStatus` table. This makes it safe to fully
remove that column from the database.

IN-8731 enabled this flag in all environments, so it is safe to
deprecate.

Part of #6285
2023-04-04 15:41:51 -07:00
Phil Porada 8824e347fd
Golang 1.20.3 security release upgrade (#6793)
Release notes: https://groups.google.com/g/golang-announce/c/Xdv6JL9ENs8

This update includes fixes for excessive memory usage when parsing
headers in the net/http package.
2023-04-04 15:33:34 -07:00
Aaron Gable 8c67769be4
Remove ocsp-updater from Boulder (#6769)
Delete the ocsp-updater service, and the //ocsp/updater library that
supports it. Remove test configs for the service, and remove references
to the service from other test files.

This service has been fully shut down for an extended period now, and is
safe to remove.

Fixes #6499
2023-03-31 14:39:04 -07:00
Aaron Gable 22fd579cf2
ARI: write Retry-After header before body (#6787)
When sending an ARI response, write the Retry-After header before
writing the JSON response body. This is necessary because
http.ResponseWriter implicitly calls WriteHeader whenever Write is
called, flushing all headers to the network and preventing any
additional headers from being written. Unfortunately, the unittests use
httptest.ResponseRecorder, which doesn't seem to enforce this invariant
(it's happy to report headers which were written after the body). Add a
header check to the integration tests, to make up for this deficiency.
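
The effect can be reproduced with the standard library alone. This is a sketch rather than the WFE handler: the handler names and the Retry-After value are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// bodyFirst writes the body and only then sets the header. Over a real
// connection the header is silently dropped.
func bodyFirst(w http.ResponseWriter, _ *http.Request) {
	_, _ = w.Write([]byte(`{}`))
	w.Header().Set("Retry-After", "21600")
}

// headerFirst sets the header before the first Write, so it is sent.
func headerFirst(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Retry-After", "21600")
	_, _ = w.Write([]byte(`{}`))
}

// seenByClient returns the Retry-After value a real HTTP client receives.
func seenByClient(h http.HandlerFunc) string {
	srv := httptest.NewServer(h)
	defer srv.Close()
	resp, err := http.Get(srv.URL)
	if err != nil {
		return "error: " + err.Error()
	}
	defer resp.Body.Close()
	return resp.Header.Get("Retry-After")
}

// seenByRecorder returns what a unit test reading the recorder's live
// header map observes.
func seenByRecorder(h http.HandlerFunc) string {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Header().Get("Retry-After")
}

func main() {
	fmt.Printf("client, body first:     %q\n", seenByClient(bodyFirst))
	fmt.Printf("client, header first:   %q\n", seenByClient(headerFirst))
	fmt.Printf("recorder, body first:   %q\n", seenByRecorder(bodyFirst))
}
```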
2023-03-31 10:48:45 -07:00
Aaron Gable 9262ca6e3f
Add grpc implementation tests to all services (#6782)
As a follow-up to #6780, add the same style of implementation test to
all of our other gRPC services. This was not included in that PR just to
keep it small and single-purpose.
2023-03-31 09:52:26 -07:00
Aaron Gable 0d0116dd3f
Implement GetSerialMetadata on StorageAuthorityRO (#6780)
When external clients make POST requests to our ARI endpoint, they're
getting 404s even when a GET request with the same exact CertID
succeeds. Logs show that this is because the SA is returning "method
GetSerialMetadata not implemented" when the WFE attempts that gRPC
request. This is due to an oversight: the GetSerialMetadata method is
not implemented on the SQLStorageAuthorityRO object, only on the
SQLStorageAuthority object. The unit tests did not catch this bug
because they supply a mock SA, which does implement the method in
question.

Update the receiver and add a wrapper so that GetSerialMetadata is
implemented on both the read-write and read-only SA implementation
types. Add a new kind of test assertion which helps ensure this won't
happen again. Add a TODO for an integration test covering the ARI POST
codepath to prevent a regression.

Fixes #6778
2023-03-30 12:32:14 -07:00
Samantha 511f5b79f1
test: Add ProxySQL to our Docker development stack (#6754)
Add an upstream ProxySQL container to our docker-compose. Configure
ProxySQL to manage database connections for our unit and integration
tests.

Fixes #5873
2023-03-29 18:41:24 -04:00
Matthew McPherrin 49851d7afd
Remove Beeline configuration (#6765)
In a previous PR, #6733, this configuration was marked deprecated
pending removal. Here is that removal.
2023-03-23 16:58:36 -04:00
Samantha b2224eb4bc
config: Add validation tags to all configuration structs (#6674)
- Require `letsencrypt/validator` package.
- Add a framework for registering configuration structs and any custom
validators for each Boulder component at `init()` time.
- Add a `validate` subcommand which allows you to pass a `-component`
name and `-config` file path.
- Expose validation via exported utility functions
`cmd.LookupConfigValidator()`, `cmd.ValidateJSONConfig()` and
`cmd.ValidateYAMLConfig()`.
- Add unit test which validates all registered component configuration
structs against test configuration files.

Part of #6052
2023-03-21 14:08:03 -04:00
Aaron Gable 6d6f3632da
Change SetCommonName to RequireCommonName (#6749)
Change the SetCommonName flag, introduced in #6706, to
RequireCommonName. Rather than having the flag control both whether or
not a name is hoisted from the SANs into the CN *and* whether or not the
CA is willing to issue certs with no CN, this updated flag now only
controls the latter. By default, the new flag is true, and continues our
current behavior of failing issuance if we cannot set a CN in the cert.
When the flag is set to false, then we are willing to issue certificates
for which the CSR contains no CN and there is no SAN short enough to be
hoisted into the CN field.
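
The decision described above can be sketched as follows. This is an illustration, not the CA's code: `chooseCN` and `errNoCN` are hypothetical names, the order in which SANs are considered is an assumption, and 64 is the usual upper bound on the length of the commonName attribute.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// maxCNLength is the upper bound on the commonName attribute.
const maxCNLength = 64

var errNoCN = errors.New("no CN in CSR and no SAN short enough to hoist")

// chooseCN returns the CN to place in the certificate. An empty result
// with a nil error means "issue without a CN".
func chooseCN(csrCN string, sans []string, requireCommonName bool) (string, error) {
	if csrCN != "" {
		return csrCN, nil
	}
	// Hoist the first SAN that fits into the CN field.
	for _, san := range sans {
		if len(san) <= maxCNLength {
			return san, nil
		}
	}
	if requireCommonName {
		return "", errNoCN
	}
	return "", nil
}

func main() {
	long := strings.Repeat("a", 61) + ".example.com"
	fmt.Println(chooseCN("", []string{long, "example.com"}, true))
	fmt.Println(chooseCN("", []string{long}, true))
	fmt.Println(chooseCN("", []string{long}, false))
}
```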

When we have rolled out this change, we can move on to the next flag in
this series: HoistCommonName, which will control whether or not a SAN is
hoisted at all, effectively giving the CSRs (and therefore the clients)
full control over whether their certificate contains a CN.

This change is safe because no environment explicitly sets the
SetCommonName flag to false yet.

Fixes #5112
2023-03-21 11:07:06 -07:00
Matthew McPherrin 1365dacb3f
Remove executable bit from JSON file (#6764)
This is a JSON file that shouldn't be executable. All other
executable files in the repository are python or shell scripts.
2023-03-21 08:59:41 -07:00
Matthew McPherrin 05c9106eba
lints: Consistently format JSON configuration files (#6755)
- Consistently format existing test JSON config files
- Add a small Python script which loads and dumps JSON files
- Add a JSON lint test to CI

---------

Co-authored-by: Aaron Gable <aaron@aarongable.com>
2023-03-20 18:11:19 -04:00
Aaron Gable 7199a88b6b
Remove go1.20.1 from CI (#6742)
2023-03-15 13:08:22 -04:00
Matthew McPherrin e1ed1a2ac2
Remove beeline tracing (#6733)
Remove tracing using Beeline from Boulder. The only remnant left behind
is the deprecated configuration, to ensure deployability.

We had previously planned to swap in OpenTelemetry in a single PR, but
that adds significant churn in a single change, so we're doing this as
multiple steps that will each be significantly easier to reason about
and review.

Part of #6361
2023-03-14 15:14:27 -07:00
Aaron Gable 9af4871e59
Add SetCommonName feature flag (#6706)
Add a new feature flag, `SetCommonName`, which defaults to `true`. In
this default state, no behavior changes.

When set to `false` on the CA, this flag will cause the CA to leave the
Subject commonName field of the certificate blank, as is recommended by
the Baseline Requirements Section 7.1.4.2.2(a).

Also slightly modify the behavior of the RA's `matchesCSR()` function,
to allow for both certificates that have a CN and certificates that
don't. It is not feasible to put this behavior behind the same
SetCommonName flag, because that would require an atomic deploy of both
the RA and the CA.

Obsoletes #5112
2023-03-09 13:31:55 -05:00
Aaron Gable 46be4927fb
Test and build releases on go1.20.2 (#6723)
Go 1.20.2 contains a security update to the ScalarMult method in the
crypto/elliptic package, which we use inside our goodkey package.
2023-03-08 13:54:07 -08:00
Samantha dcf4a4bd51
ocsp-responder: Remove Config.MaxAge (#6711)
Fixes #6710
Part of #6052
Blocks #6674
2023-03-01 15:45:41 -05:00
Samantha 8440a47d0b
expiration-mailer: Remove Config.NagCheckInterval (#6712)
Fixes #6097
Part of #6052
Blocks #6674
2023-03-01 15:45:18 -05:00
Aaron Gable 29bf521121
CA: Remove secondary gRPC servers (#6496)
Remove the OCSPGenerator and CRLGenerator gRPC servers that run on
separate ports from the CA's main gRPC server, which exposes both those
and the CertificateAuthority service as well. These additional servers
are no longer necessary, now that all three services are exposed on the
single address/port.

Fixes #6448
2023-03-01 11:45:28 -08:00
Phil Porada fdb9c543b7
Remove ReuseValidAuthz code (#6686)
Removes all code related to the `ReuseValidAuthz` feature flag. The
Boulder default is to now always reuse valid authorizations.

Fixes a panic in `test.AssertErrorIs` when `err` is unexpectedly `nil`,
which was found while reworking the
`TestPerformValidationAlreadyValid` test. The go stdlib `func Is`[1]
does not check for this.

1. https://go.dev/src/errors/wrap.go

Part 2/2, fixes https://github.com/letsencrypt/boulder/issues/2734
2023-02-28 17:57:16 -05:00
Phil Porada 6d651cff65
Initialize a stdout/stderr logger for the generate tool (#6703)
Return errors to user in the cert-ceremony generate tool rather than
throwing a panic if syslog facilities are unavailable. Defaults the tool
to only using stdout/stderr.

Fixes #6653
2023-02-28 10:22:47 -08:00
Samantha 98ef3bb2b4
VA/config: Remove unused va.CAA service in config (#6697)
GRPC config from `va.VA` is used for both `va.VA` and `va.CAA`.
2023-02-27 13:44:47 -05:00
Samantha a0fe7dc93e
SA: Remove Redis config (#6695)
This field doesn't appear to be in use.

Part of #6052
2023-02-27 09:29:38 -08:00
Aaron Gable 5ce4b5a6d4
Use time format constants (#6694)
Use constants from the go stdlib time package, such as time.DateTime and
time.RFC3339, when parsing and formatting timestamps. Additionally,
simplify or remove some of our uses of parsing timestamps, such as to
set fake clocks in tests.
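
For example (a sketch; `toRFC3339` is a hypothetical helper, and time.DateTime requires go1.20 or later):

```go
package main

import (
	"fmt"
	"time"
)

// toRFC3339 parses a "2006-01-02 15:04:05" timestamp using the stdlib
// constant time.DateTime and re-formats it using time.RFC3339.
func toRFC3339(s string) (string, error) {
	t, err := time.Parse(time.DateTime, s)
	if err != nil {
		return "", err
	}
	return t.Format(time.RFC3339), nil
}

func main() {
	out, err := toRFC3339("2023-02-24 11:22:23")
	fmt.Println(out, err)
}
```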
2023-02-24 11:22:23 -08:00
Aaron Gable cdf1a6f9f9
Add flag to make order finalization async (#6589)
Add the "AsyncFinalize" feature flag. When enabled, this causes the RA
to return almost immediately from FinalizeOrder requests, with the
actual hard work of issuing the precertificate, getting SCTs, issuing
the final certificate, and updating the database accordingly all
occurring in a background goroutine while the client polls the GetOrder
endpoint waiting for the result.

This is implemented by factoring out the majority of the finalization
work into a new `issueCertificateOuter` helper function, and simply
using the new flag to determine whether we call that helper in a
goroutine or not. This makes removing the feature flag in the future
trivially easy.
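
The shape of that change can be sketched as below. Apart from `issueCertificateOuter`, every name is hypothetical; an atomic counter stands in for the prometheus gauge and a map stands in for the database.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type ra struct {
	asyncFinalize bool
	inflight      atomic.Int64   // stand-in for the inflight gauge
	wg            sync.WaitGroup // lets the demo wait for background work
	mu            sync.Mutex
	status        map[string]string
}

func newRA(async bool) *ra {
	return &ra{asyncFinalize: async, status: make(map[string]string)}
}

func (r *ra) setStatus(id, s string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.status[id] = s
}

func (r *ra) getOrder(id string) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.status[id]
}

// issueCertificateOuter holds all of the slow finalization work.
func (r *ra) issueCertificateOuter(id string) {
	r.inflight.Add(1)
	defer r.inflight.Add(-1)
	// Precertificate, SCTs, and final certificate would be produced here.
	r.setStatus(id, "valid")
}

// finalizeOrder runs the helper inline or in a goroutine, based on the flag.
func (r *ra) finalizeOrder(id string) string {
	r.setStatus(id, "processing")
	if r.asyncFinalize {
		r.wg.Add(1)
		go func() {
			defer r.wg.Done()
			r.issueCertificateOuter(id)
		}()
		return "processing"
	}
	r.issueCertificateOuter(id)
	return r.getOrder(id)
}

func main() {
	syncRA, asyncRA := newRA(false), newRA(true)
	fmt.Println(syncRA.finalizeOrder("1"))
	fmt.Println(asyncRA.finalizeOrder("1"))
	asyncRA.wg.Wait() // a real client would poll GetOrder instead
	fmt.Println(asyncRA.getOrder("1"))
}
```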

Also add a new prometheus metric named `inflight_finalizes` which can be
used to count the number of simultaneous goroutines which are performing
finalization work. This metric is exported regardless of the state of
the AsyncFinalize flag, so that we can observe any changes to this
metric when the flag is flipped.

Fixes #6575
2023-02-24 09:57:54 -08:00
Aaron Gable 427bced0cd
Remove OCSP and CRL methods from CA gRPC service (#6474)
Remove the GenerateOCSP and GenerateCRL methods from the
CertificateAuthority gRPC service. These methods are no longer called by
any clients; all clients use their respective OCSPGenerator and
CRLGenerator gRPC services instead.

In addition, remove the CRLGeneratorServer field from the caImpl, as it
no longer needs it to serve as a backing implementation for the
GenerateCRL pass-through method. Unfortunately, we can't remove the
OCSPGeneratorServer field until after ROCSPStage7 is complete, and the
CA is no longer generating an OCSP response during initial certificate
issuance.

Part of #6448
2023-02-23 14:42:14 -08:00
Jacob Hoffman-Andrews 79250756bf
expiration-mailer: limit number of mails sent to same address per day (#6675)
This adds a config field, "mailsPerAddressPerDay." Addresses that get
that many mails won't receive any more until the next day (UTC).
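
A rough sketch of such a limit, assuming an in-memory counter keyed by address that resets when the UTC day changes. The type and method names are hypothetical, and the real mailer may track this differently.

```go
package main

import (
	"fmt"
	"time"
)

const secondsPerDay = 24 * 60 * 60

type mailLimiter struct {
	perDay int
	day    int64          // UTC day number currently being counted
	counts map[string]int // mails sent to each address on that day
}

func newMailLimiter(perDay int) *mailLimiter {
	return &mailLimiter{perDay: perDay, counts: make(map[string]int)}
}

// allow reports whether another mail may be sent to addr at the given
// Unix time. Unix time divided by 86400 is the UTC day number.
func (l *mailLimiter) allow(addr string, unixSeconds int64) bool {
	day := unixSeconds / secondsPerDay
	if day != l.day {
		l.day = day
		l.counts = make(map[string]int)
	}
	if l.counts[addr] >= l.perDay {
		return false
	}
	l.counts[addr]++
	return true
}

func main() {
	l := newMailLimiter(2)
	now := time.Now().Unix()
	for i := 0; i < 3; i++ {
		fmt.Println(l.allow("user@example.com", now))
	}
	fmt.Println(l.allow("user@example.com", now+secondsPerDay))
}
```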

Fixes #6508.
2023-02-22 15:24:31 -08:00
Phil Porada 6c84a69043
Remove MandatoryPOSTasGET flag (#6672)
Remove the `MandatoryPOSTasGET` flag from the WFE2.
Update the ACMEv2 divergence doc to note that neither staging nor
production use MandatoryPOSTasGET.

Fixes #6582.
2023-02-17 13:04:31 -05:00
Aaron Gable 1c785e75fc
Remove go1.19 from CI (#6671)
Go 1.20.1 is now deployed everywhere. Removing go 1.19 from CI will
allow us to begin adopting various go 1.20-only features that we want,
such as the new crypto/ecdh package.
2023-02-16 17:22:03 -05:00
Phil Porada 1b42b50bff
Update the docker-compose.yml container build timestamp when running tag_and_upload.sh (#6664)
Update the docker-compose.yml container build timestamp when running
tag_and_upload.sh. Does not currently handle updating the Go version in
the container tag.
2023-02-16 14:25:50 -05:00
Jacob Hoffman-Andrews f662332bcf
Speed up builds of boulder-tools images. (#6663)
Only build arm64 images for one version of Go.

Split build.sh into two scripts: build.sh (which installs apt and
Python) and install-go.sh (which installs a specific Go version and Go
dependencies). This allows reusing a cached layer for the build.sh step
across multiple Go versions.

Remove installation of fpm from build.sh. This is no longer needed since
#6669 and allows us to get rid of `rpm`, `ruby`, and `ruby-dev`.

Remove apt dependency on pkg-config, libtool, autoconf, and automake.
These were introduced in
https://github.com/letsencrypt/boulder/pull/4832 but aren't needed
anymore because we don't build softhsm2 ourselves (we get it from apt).

Remove apt dependency on cmake, libssl-dev, and openssl. I'm not totally
sure what these were needed for but they're not needed anymore.

Running this locally on my laptop for our current 3 GO_CI_VERSIONS and 1
GO_DEV_VERSION takes 23 minutes of wall time, dominated by the cross
build for arm64.
2023-02-16 09:35:39 -08:00
Jacob Hoffman-Andrews cd1bbc0d82
Tidy up integration test environment (#6668)
Remove `example.com` domain name, which was used by the deleted OldTLS
tests.

Remove GODEBUG=x509sha1=1.

Add a longer comment for the Consul DNS fallback in docker-compose.yml.

Use the "dnsAuthority" field for all gRPC clients in config-next,
instead of implicitly relying on the system DNS. This matches what we do
in prod.

Make "dnsAuthority" field of GRPCClientConfig mandatory whenever
SRVLookup or SRVLookups is used.

Make test/config/ocsp-responder.json use ServerAddress instead of
SRVLookup, like the rest of test/config.
2023-02-16 09:33:24 -08:00
Aaron Gable f9e4fb6c06
Add replication lag retries to some SA methods (#6649)
Add a new time.Duration field, LagFactor, to both the SA's config struct
and the read-only SA's implementation struct. In the GetRegistration,
GetOrder, and GetAuthorization2 methods, if the database select returned
a NoRows error and a lagFactor duration is configured, then sleep for
the lagFactor duration and retry the select.
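
In outline, the retry looks something like this sketch. The helper names are hypothetical and the query is modeled as a plain function rather than a real database call.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
	"time"
)

// selectWithLagRetry runs query once and, if it finds no rows and a
// lagFactor is configured, sleeps for lagFactor and tries one more time.
func selectWithLagRetry(lagFactor time.Duration, query func() (string, error)) (string, error) {
	res, err := query()
	if errors.Is(err, sql.ErrNoRows) && lagFactor > 0 {
		time.Sleep(lagFactor)
		return query()
	}
	return res, err
}

// laggingReplica simulates a replica that misses the row for the first
// few calls before replication catches up.
func laggingReplica(misses int) func() (string, error) {
	calls := 0
	return func() (string, error) {
		calls++
		if calls <= misses {
			return "", sql.ErrNoRows
		}
		return "order-1", nil
	}
}

func main() {
	fmt.Println(selectWithLagRetry(10*time.Millisecond, laggingReplica(1)))
	fmt.Println(selectWithLagRetry(0, laggingReplica(1)))
}
```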

This allows us to compensate for the replication lag between our primary
write database and our read-only replica databases. Sometimes clients
will fire requests in rapid succession (such as creating a new order,
then immediately querying the authorizations associated with that
order), and the subsequent requests will fail because they are directed
to read replicas which are lagging behind the primary. Adding this
simple sleep-and-retry will let us mitigate many of these failures,
without adding too much complexity.

Fixes #6593
2023-02-14 17:25:13 -08:00
Phil Porada 28c5595ec6
Golang 1.19.6/1.20.1 security release upgrade (#6659)
Golang 1.19.6/1.20.1 security update release notes: https://groups.google.com/g/golang-announce/c/V0aBFqaFs_E
2023-02-14 16:36:29 -05:00
Samantha 5c49231ea6
ROCSP: Remove support for Redis Cluster (#6645)
Fixes #6517
2023-02-09 17:14:37 -05:00
Phil Porada 134321040b
Default ReuseValidAuthz to true (#6644)
`ReuseValidAuthz` was introduced here [1] and enabled in staging and
production configs on 2016-07-13. There was a brief stint during the
TLS-SNI-01 challenge type removal where SRE disabled it. However, the
time has finally come to remove this configuration option. Issue #6623
will determine the feasibility of shorter authz lifetimes and
potentially the removal of authz reuse.

This change is broken up into two parts to allow SRE to safely remove
the flag from staging and production configs. We'll merge this PR, SRE
will deploy boulder and the config change, then we'll finish removing
`ReuseValidAuthz` configuration from the codebase.

[1] boulder commit 9abc212448

Part 1 of 2 for fixing #2734.
2023-02-09 14:26:06 -05:00
Aaron Gable 6dae612e81
ARI: Improve error message and add tooling (#6631)
Give ARI improved error messages when no request path is specified and
when parsing of the request path blob fails.

Also, add a tool which can be used to quickly generate ARI requests and
print their results, to make manual spot-checking easier.

Fixes #6629
2023-02-08 08:22:22 -08:00
Samantha d73125d8f6
WFE: Add custom balancer implementation which routes nonce redemption RPCs by prefix (#6618)
Assign nonce prefixes for each nonce-service by taking the first eight
characters of the base64url-encoded HMAC-SHA256 hash of the RPC
listening address using a provided key. The provided key must be the
same across all boulder-wfe and nonce-service instances.
- Add a custom `grpc-go` load balancer implementation (`nonce`) which
can route nonce redemption RPC messages by matching the prefix to the
derived prefix of the nonce-service instance which created it.
- Modify the RPC client constructor to allow the operator to override
the default load balancer implementation (`round_robin`).
- Modify the `srv` RPC resolver to accept a comma separated list of
targets to be resolved.
- Remove unused nonce-service `-prefix` flag.
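
The prefix derivation described at the top of this message can be sketched as follows. `derivePrefix`, the example addresses, and the key are hypothetical, and the unpadded base64url alphabet is an assumption (padding would not affect the first eight characters).

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// derivePrefix returns the first eight characters of the base64url
// encoding of HMAC-SHA256(key, grpcAddr).
func derivePrefix(grpcAddr string, key []byte) string {
	h := hmac.New(sha256.New, key)
	h.Write([]byte(grpcAddr))
	return base64.RawURLEncoding.EncodeToString(h.Sum(nil))[:8]
}

func main() {
	key := []byte("example key shared by WFE and nonce-service")
	fmt.Println(derivePrefix("10.0.0.1:9101", key))
	fmt.Println(derivePrefix("10.0.0.2:9101", key))
}
```

Since both sides derive the prefix from the same address and key, a WFE can map the prefix of an incoming nonce back to the backend that issued it.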

Fixes #6404
2023-02-03 17:52:18 -05:00
Jacob Hoffman-Andrews e57c788086
Add checking of validations to cert-checker (#6617)
This includes two feature flags: one that controls turning on the extra
database queries, and one that causes cert-checker to fail on missing
validations. If the second flag isn't turned on, it will just emit error
log lines. This will help us find any edge conditions we need to deal
with before making the new code trigger alerts.

Fixes #6562
2023-02-03 16:25:41 -05:00
Phil Porada c0e158ed93
Limit input fields during new authz creation in sa.NewOrderAndAuthzs (#6622)
A `core.Authorization` object has lots of fields (e.g. `status`, 
`attempted`, `attemptedAt`) which are not relevant to a 
newly-created authorization: a brand new authz can only be in 
the "pending" state, cannot have been attempted already or have 
been validated.

Fix a nil pointer dereference in `sa.NewOrderAndAuthzs` if a 
`req *sapb.NewOrderAndAuthzsRequest` is passed into the 
function with an inner nil `req.NewOrder`.

Add new tests.
- TestNewOrderAndAuthzs_MissingInnerOrder
  - Checks that the nil pointer dereference no longer occurs.
- TestNewOrderAndAuthzs_NewAuthzExpectedFields
  - Checks that the `Attempted`, `AttemptedAt`, `ValidationRecords`,
    and `ValidationErrors` fields for a brand new authz in the
    `pending` state are correctly defaulted to `nil` in
    `sa.NewOrderAndAuthzs`.

Add a new test assertion `AssertBoxedNil` that returns true for the
existence of a "boxed nil" - a nil value wrapped in a non-nil interface
type.
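
For readers unfamiliar with the term, a boxed nil can be detected with reflection roughly as follows. `isBoxedNil`, `myErr`, and `mayFail` are hypothetical and this is not the code of `AssertBoxedNil` itself.

```go
package main

import (
	"fmt"
	"reflect"
)

type myErr struct{}

func (*myErr) Error() string { return "boom" }

// mayFail returns a nil *myErr wrapped in the error interface, so the
// returned interface value is itself non-nil.
func mayFail() error {
	var e *myErr
	return e
}

// isBoxedNil reports whether v is a non-nil interface holding a nil value.
func isBoxedNil(v any) bool {
	if v == nil {
		return false // a true nil, not a boxed one
	}
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Pointer, reflect.Map, reflect.Slice,
		reflect.Chan, reflect.Func, reflect.Interface:
		return rv.IsNil()
	}
	return false
}

func main() {
	err := mayFail()
	fmt.Println(err == nil, isBoxedNil(err))
}
```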

Fixes #6535

---------

Co-authored-by: Samantha <hello@entropy.cat>
2023-02-03 15:38:51 -05:00
Aaron Gable 18216a7ea8
Run CI tests on go1.20 (#6550)
Add go1.20 as a new version to run tests on, and to build release
artifacts from. Fix one test which was failing because it was
accidentally relying on consistent (i.e. unseeded) non-cryptographic
random number generation, which go1.20 now automatically seeds at import
time.

Update the version of golangci-lint used in our docker containers to the
new version that has go1.20 support. Remove a number of nolint comments
that were required due to an old version of the gosec linter.
2023-02-03 11:57:07 -08:00
Phil Porada 9390c0e5f5
Put errors at end of log lines (#6627)
For consistency, put the error field at the end of unstructured log
lines to make them more ... structured.

Adds the `issuerID` field to "orphaning certificate" log line in the CA
to match the "orphaning precertificate" log line.

Fixes broken tests as a result of the CA and bdns log line change.

Fixes #5457
2023-02-03 11:28:38 -05:00
Phil Porada c091e64aa3
Switch from docker-compose to "docker compose" (#6599)
Switch from standalone docker-compose binary to the "docker compose" subcommand everywhere.
2023-01-30 15:04:52 -05:00
Jacob Hoffman-Andrews 9d3f7d8f84
Add timeout config to WFE (#6621)
2023-01-30 10:07:41 -08:00
Aaron Gable 86c8a23a1a
Add fermat factorization integration test (#6613)
Add an integration test which verifies that we reject finalize requests
with CSRs containing a fermat-factorizable public key.
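
As background, Fermat's method looks for a and b with n = a^2 - b^2 = (a-b)(a+b), starting from a = ceil(sqrt(n)); it succeeds quickly when the two prime factors are close together. The sketch below is a generic textbook version, not the code in good_key.go, and the function names and round counts are arbitrary.

```go
package main

import (
	"fmt"
	"math/big"
)

var one = big.NewInt(1)

// fermatFactor tries up to rounds values of a, starting at ceil(sqrt(n)),
// and reports factors p and q if a*a - n is a perfect square.
func fermatFactor(n *big.Int, rounds int) (p, q *big.Int, ok bool) {
	a := new(big.Int).Sqrt(n) // floor(sqrt(n))
	if new(big.Int).Mul(a, a).Cmp(n) < 0 {
		a.Add(a, one) // round up to ceil(sqrt(n))
	}
	b2 := new(big.Int)
	b := new(big.Int)
	for i := 0; i < rounds; i++ {
		b2.Mul(a, a)
		b2.Sub(b2, n)
		b.Sqrt(b2)
		if new(big.Int).Mul(b, b).Cmp(b2) == 0 {
			return new(big.Int).Sub(a, b), new(big.Int).Add(a, b), true
		}
		a.Add(a, one)
	}
	return nil, nil, false
}

// isFermatFactorable is a convenience wrapper taking a decimal string.
func isFermatFactorable(decimal string, rounds int) bool {
	n, valid := new(big.Int).SetString(decimal, 10)
	if !valid || n.Sign() <= 0 {
		return false
	}
	_, _, ok := fermatFactor(n, rounds)
	return ok
}

func main() {
	// 10403 = 101 * 103: the factors are close, so one round suffices.
	fmt.Println(fermatFactor(big.NewInt(10403), 100))
	// 3000009 = 3 * 1000003: the factors are far apart.
	fmt.Println(isFermatFactorable("3000009", 3))
}
```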

Originally this change was also going to remove our Fermat factorization
implementation from good_key.go, and simply rely on the similar check in
zlint's e_rsa_fermat_factorization check. However, while relying solely
on the lint works, it causes us to block such requests with a 500
serverInternal error, because we consider failing lints to be our fault.
This would be a regression from the current status quo, where such
requests are rejected with a 400 badCSR error and details of the
factorization, so we are leaving our goodkey checks in place.
2023-01-27 10:15:38 -08:00
Aaron Gable a7dc34f127
ocsp-responder: make db config optional (#6601)
In #6293, we gave the ocsp-responder the ability to use a gRPC
connection to the SA to get status information for certificates, rather
than using a database connection directly. However, that change
neglected to make the database connection configuration optional: an
ocsp-responder with an SA gRPC client configured would never use its
database connection, but if it wasn't configured it would refuse to
start. Fix this oversight by making the DBConfig stanza optional.
2023-01-26 15:21:39 -08:00
Phil Porada 3866e4f60d
VA: Use default PortConfig during testing (#6609)
Part of #3940
2023-01-25 16:16:08 -05:00
Samantha 0d6f8569c5
grpc/rocsp: Allow use of TLSv1.2 and TLSv1.3 (#6600)
When we clamped our MaxVersion to TLS1.2, there wasn't any
support for TLS1.3 yet. Allowing higher versions to be negotiated
is good.

Fixes #6580
2023-01-24 12:53:13 -08:00
Phil Porada 26e5b24585
dependencies: Replace square/go-jose.v2 with go-jose/go-jose.v2 (#6598)
Fixes #6573
2023-01-24 12:08:30 -05:00
Phil Porada aae4175186
Remove deprecated feature flags (#6566)
Remove deprecated feature flags.

Fixes #6559
2023-01-23 20:56:15 -05:00
Jacob Hoffman-Andrews 85e8f1f5cf
Change GHA release workflow to not use artifacts (#6590)
Fixes #6571
2023-01-19 14:30:26 -08:00
Matthew McPherrin 1f6a873fcc
Remove MandatoryPOSTAsGET from config-next (#6585)
In preparation for removing this flag completely in #6582, remove it
from config-next. This matches boulder's configuration in all LE
environments.
2023-01-12 17:42:28 -08:00
Aaron Gable 86622654fc
Run tests on go1.19.5 (#6576)
Run go1.19.5 alongside go1.19.2 for a while.

Fixes #6574
2023-01-11 11:37:02 -08:00
Samantha 6c6da76400
ROCSP: Replace Redis Cluster with a consistently sharded all-primary nodes (#6516)
2022-12-19 15:06:47 -05:00
Jacob Hoffman-Andrews fe2cf7d136
ocsp: add load shedding for live signer (#6523)
In live.go we use a semaphore to limit how many inflight signing
requests we can have, so a flood of OCSP traffic doesn't flood our CA
instances. If traffic exceeds our capacity to sign responses for long
enough, we want to eventually start fast-rejecting inbound requests that
are unlikely to get serviced before their deadline is reached. To do
that, add a MaxSigningWaiters config field to the OCSP responder.

Note that the files in //semaphore are forked from x/sync/semaphore,
with modifications to add the MaxWaiters field and functionality.
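
The idea can be illustrated with a much simpler semaphore than the forked one. Everything in this sketch is hypothetical, including the names, the cancel channel in place of a context, and the choice that a limit of zero means no caller may wait.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	errMaxWaiters = errors.New("too many waiters")
	errCanceled   = errors.New("canceled while waiting")
)

type sem struct {
	slots      chan struct{} // one buffered entry per in-flight request
	mu         sync.Mutex
	waiters    int
	maxWaiters int
}

func newSem(capacity, maxWaiters int) *sem {
	return &sem{slots: make(chan struct{}, capacity), maxWaiters: maxWaiters}
}

// acquire takes a slot, waiting if necessary. It fails fast when the
// queue of waiters is already full.
func (s *sem) acquire(cancel <-chan struct{}) error {
	select {
	case s.slots <- struct{}{}:
		return nil // a slot was free, no waiting needed
	default:
	}
	s.mu.Lock()
	if s.waiters >= s.maxWaiters {
		s.mu.Unlock()
		return errMaxWaiters // shed load instead of queueing
	}
	s.waiters++
	s.mu.Unlock()
	defer func() {
		s.mu.Lock()
		s.waiters--
		s.mu.Unlock()
	}()
	select {
	case s.slots <- struct{}{}:
		return nil
	case <-cancel:
		return errCanceled
	}
}

func (s *sem) release() { <-s.slots }

func main() {
	s := newSem(1, 0)
	fmt.Println(s.acquire(nil)) // takes the only slot
	fmt.Println(s.acquire(nil)) // rejected immediately
	s.release()
	fmt.Println(s.acquire(nil)) // slot is free again
}
```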

Fixes #6392
2022-12-12 15:48:44 -08:00
Jacob Hoffman-Andrews fd74d20934
wfe2: update unittest to use gRPC-style backend (#6533)
Originally, WFEs had a built-in nonce service. Then we added a "remote
nonce service" via gRPC, but we kept a fallback path for when the remote
nonce service was not configured, to use a built-in nonce service. This
PR removes that fallback path.

Since the fallback path was relied on by the unittests, this also
refactors the unittests to use a gRPC-style nonce service (but in-memory
for the unittests).

Fixes #6530
2022-12-05 11:36:31 -08:00
Aaron Gable d8d5a030f4
SA: Remove NewOrder and NewAuthorizations2 (#6536)
Delete the NewOrder and NewAuthorizations2 methods from the SA's gRPC
interface. These methods have been replaced by the unified
NewOrderAndAuthzs method, which performs both sets of insertions in a
single transaction.

Also update the SA and RA unittests to not rely on these methods for
setting up test data that other functions-under-test rely on. In most
cases, replace calls to NewOrder with calls to NewOrderAndAuthzs. In the
SA tests specifically, replace calls to NewAuthorizations2 with a
streamlined helper function that simply does the single necessary
database insert.

Fixes #6510
Fixes #5816
2022-12-02 14:34:35 -08:00
Aaron Gable a7a2afef7a
ARI: Suggest immediate renewal for revoked certs (#6534)
Update our implementation of ARI to return a renewal window entirely in
the past (i.e., suggesting immediate renewal) if the certificate in
question has been revoked for any reason. This will allow clients which
implement ARI to discover that they need to replace their certificate
without having to query OCSP directly, especially as we move into a
future where OCSP is mostly supplanted by aggregated CRLs.

Fixes #6503
2022-12-02 14:33:55 -08:00
Aaron Gable ba34ac6b6e
Use read-only SA clients in wfe, ocsp, and crl (#6484)
In the WFE, ocsp-responder, and crl-updater, switch from using
StorageAuthorityClients to StorageAuthorityReadOnlyClients. This ensures
that these services cannot call methods which write to our database.

Fixes #6454
2022-12-02 13:48:28 -08:00
Aaron Gable 7517b0d80f
Rehydrate CAA account and method binding (#6501)
Make minor changes to our implementation of CAA Account and Method
Binding, as a result of reviewing the code in preparation for enabling
it in production. Specifically:
- Ensure that the validation method and account ID are included at the
request level, rather than waiting until we perform the checks which use
those parameters;
- Clean up code which assumed the validation method and account ID might
not be populated;
- Use the core.AcmeChallenge type (rather than plain string) for the
validation method everywhere;
- Update comments to reference the latest version and correct sections
of the CAA RFCs; and
- Remove the CAA feature flags from the config integration tests to
reflect that they are not yet enabled in prod.

I have reviewed this code side-by-side with RFC 8659 (CAA) and RFC 8657
(ACME CAA Account and Method Binding) and believe it to be compliant
with both.
2022-11-17 13:31:04 -08:00
Jacob Hoffman-Andrews 659d21cc87
checkocsp: allow fetching by serial number (#6413)
This requires setting --issuer-file and --url, but it allows (for
instance) collecting a big pile of serial numbers for a known issuer,
rather than having to keep whole certificates.
2022-11-15 15:52:59 -08:00
Jacob Hoffman-Andrews 75338135e4
expiration-mailer: use a JOIN to find work more efficiently (#6439)
Right now the expiration mailer does one big SELECT on
`certificateStatus` to find certificates to work on, then several
thousand SELECTs of individual serial numbers in `certificates`.

Since it's more efficient to get that data as a stream from a single
query, rather than thousands of separate queries, turn that into a JOIN.

NOTE: We used to use a JOIN, and switched to the current approach in
#2440 for performance reasons. I _believe_ part of the issue was that at
the time we were not using READ UNCOMMITTED, so we may have been slowing
down the database by requiring it to keep copies of a lot of rows during
the query. Still, it's possible that I've misunderstood the performance
characteristics here and it will still be a regression to use JOIN. So
I've gated the new behavior behind a feature flag.

The feature flag required extracting a new function, `getCerts`. That in
turn required changing some return types so we are not as closely tied
to `core.Certificate`. Instead we use a new local type named
`certDERWithRegId`, which can be provided either by the new code path or
the old code path.
2022-11-14 17:34:58 -08:00
Aaron Gable 4f473edfa8
Deprecate 10 feature flags (#6502)
Deprecate these feature flags, which are consistently set in both prod
and staging and which we do not expect to change the value of ever
again:
- AllowReRevocation
- AllowV1Registration
- CheckFailedAuthorizationsFirst
- FasterNewOrdersRateLimit
- GetAuthzReadOnly
- GetAuthzUseIndex
- MozRevocationReasons
- RejectDuplicateCSRExtensions
- RestrictRSAKeySizes
- SHA1CSRs

Move each feature flag to the "deprecated" section of features.go.
Remove all references to these feature flags from Boulder application
code, and make the code they were guarding the only path. Deduplicate
tests which were testing both the feature-enabled and feature-disabled
code paths. Remove the flags from all config-next JSON configs (but
leave them in config ones until they're fully deleted, not just
deprecated). Finally, replace a few testdata CSRs used in CA tests,
because they had SHA1WithRSAEncryption signatures that are now rejected.

Fixes #5171 
Fixes #6476
Part of #5997
2022-11-14 09:24:50 -08:00
Aaron Gable 9e67423110
Create new StorageAuthorityReadOnly gRPC service (#6483)
Create a new gRPC service named StorageAuthorityReadOnly which only
exposes a read-only subset of the existing StorageAuthority service's
methods.

Implement this by splitting the existing SA in half, and having the
read-write half embed and wrap an instance of the read-only half.
Unfortunately, many of our tests use exported read-write methods as part
of their test setup, so the tests are all being performed against the
read-write struct, but they are exercising the same code as the
read-only implementation exposes.
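
The embed-and-wrap arrangement looks roughly like this sketch. The type and method names are placeholders, not the SA's real ones.

```go
package main

import "fmt"

// readOnlySA holds the state and implements only read methods.
type readOnlySA struct {
	status map[string]string
}

func (ro *readOnlySA) GetStatus(serial string) (string, bool) {
	s, ok := ro.status[serial]
	return s, ok
}

// readWriteSA embeds the read-only half, so every read method is
// promoted, and adds the write methods on top.
type readWriteSA struct {
	*readOnlySA
}

func (rw *readWriteSA) SetStatus(serial, status string) {
	rw.status[serial] = status
}

// statusReader is the narrow interface a read-only client would receive.
type statusReader interface {
	GetStatus(serial string) (string, bool)
}

// newSAs returns the read-write SA together with a read-only view of it.
func newSAs() (*readWriteSA, statusReader) {
	ro := &readOnlySA{status: make(map[string]string)}
	return &readWriteSA{readOnlySA: ro}, ro
}

func main() {
	rw, ro := newSAs()
	rw.SetStatus("01", "good")
	fmt.Println(ro.GetStatus("01"))
	fmt.Println(rw.GetStatus("01")) // promoted from the embedded half
}
```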

Expose this new service at the SA on the same port as the existing
service, but with (in config-next) different sets of allowed clients. In
the future, read-only clients will be removed from the read-write
service's set of allowed clients.

Part of #6454
2022-11-09 11:09:12 -08:00
Aaron Gable 4466c953de
CA: Expose all gRPC services on single address (#6495)
Now that we have the ability to easily add multiple gRPC services to the
same server, and control access to each service individually, use that
capability to expose the CA's CertificateAuthority, OCSPGenerator, and
CRLGenerator services all on the same address/port. This will make
establishing connections to the CA easier, but no less secure.

Part of #6448
2022-11-08 15:28:59 -08:00
Samantha b35fe81d7b
ctpolicy: Remove deprecated codepath and fix metrics (#6485)
- Remove deprecated code for #5938
- Fix broken metrics flagged in #6435
- Make CT operator and log selection random

Fixes #6435
Fixes #5938
Fixes #6486
2022-11-07 11:31:20 -08:00
Aaron Gable 46c8d66c31
bgrpc.NewServer: support multiple services (#6487)
Turn bgrpc.NewServer into a builder-pattern, with a config-based
initialization, multiple calls to Add to add new gRPC services, and a
final call to Build to produce the start() and stop() functions which
control server behavior. All calls are chainable to produce compact code
in each component's main() function.

This improves the process of creating a new gRPC server in three ways:
1) It avoids the need for generics/templating, which was slightly
verbose.
2) It allows the set of services to be registered on this server to be
known ahead of time.
3) It greatly streamlines adding multiple services to the same server,
which we use today in the VA and will be using soon in the SA and CA.

While we're here, add a new per-service config stanza to the
GRPCServerConfig, so that individual services on the same server can
have their own configuration. For now, only provide a "ClientNames" key,
which will be used in a follow-up PR.
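
The chained shape described above can be sketched roughly as follows. This is a minimal illustration only: `serverBuilder`, `newServer`, the address, and the service names are hypothetical and are not taken from the bgrpc package.

```go
package main

import (
	"errors"
	"fmt"
)

// serverBuilder is a hypothetical stand-in for the builder described above;
// none of these names come from the bgrpc package.
type serverBuilder struct {
	addr     string
	services []string
	err      error
}

// newServer starts a builder from (here, trivially small) configuration.
func newServer(addr string) *serverBuilder {
	return &serverBuilder{addr: addr}
}

// Add registers one service and returns the builder so calls can be chained.
// The first error is remembered and reported later by Build.
func (b *serverBuilder) Add(name string) *serverBuilder {
	if name == "" && b.err == nil {
		b.err = errors.New("service name must not be empty")
	}
	b.services = append(b.services, name)
	return b
}

// Build returns the functions that control the server's lifetime.
func (b *serverBuilder) Build() (start func() error, stop func(), err error) {
	if b.err != nil {
		return nil, nil, b.err
	}
	start = func() error {
		fmt.Printf("serving %v on %s\n", b.services, b.addr)
		return nil
	}
	stop = func() { fmt.Printf("stopping %s\n", b.addr) }
	return start, stop, nil
}

func main() {
	start, stop, err := newServer(":9095").
		Add("sa.StorageAuthority").
		Add("sa.StorageAuthorityReadOnly").
		Build()
	if err != nil {
		panic(err)
	}
	defer stop()
	_ = start()
}
```

Because every service is added before Build runs, the full set of services is known before the server starts, which is the second benefit listed above.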

Part of #6454
2022-11-04 13:26:42 -07:00
Samantha 6d519059a3
akamai-purger: Deprecate PurgeInterval config field (#6489)
Fixes #6003
2022-11-04 12:44:35 -07:00
Aaron Gable 0a02cdf7e3
Streamline gRPC client creation (#6472)
Remove the need for clients to explicitly call bgrpc.NewClientMetrics,
by moving that call inside bgrpc.ClientSetup. In case ClientSetup is
called multiple times, use the recommended method to gracefully recover
from registering duplicate metrics. This makes gRPC client setup much
more similar to gRPC server setup after the previous server refactoring
change landed.
2022-10-28 08:45:52 -07:00
J.C. Jones c791075e00
ct-test-srv should print the logID (#6475)
When using `ct-test-srv` with Boulder infrastructure, it's important now
that the logIDs are correctly configured. This is a nice-to-have that
prints the logID for the provided EC privkey on startup, the same way
that `ct-test-srv` prints the EC pubkey.
2022-10-27 18:02:56 -07:00
Aaron Gable 6efd941e3c
Stabilize CRL shard boundaries (#6445)
Add two new config keys to the crl-updater:
* shardWidth, which controls the width of the chunks that we divide all
of time into, with a default value of "16h" (approximately the same as
today's shard width derived from 128 shards covering 90 days); and
* lookbackPeriod, which controls the amount of already-expired
certificates that should be included in our CRLs to ensure that even
certificates which are revoked immediately before they expire still show
up in at least one CRL, with a default value of "24h" (approximately
the same as today's lookback period derived from our run frequency of
6h).

Use these two new values to change the way CRL shards are computed.

Previously, we would compute the total time we care about based on the
configured certificate lifetime (to determine how far forward to look)
and the configured update period (to determine how far back to look),
and then divide that time evenly by the number of shards. However, this
method had two fatal flaws. First, if the certificate lifetime is
configured incorrectly, then the CRL updater will fail to query the
database for some certs that should be included in the CRLs. Second, if
the update period is changed, this would change the lookback period,
which in turn would change the shard width, causing all CRL entries to
suddenly change which shard they're in.

Instead, first compute all chunk locations based only on the shard width
and number of shards. Then determine which chunks we need to care about
based on the configured lookback period and by querying the database for
the farthest-future expiration, to ensure we cover all extant
certificates. This may mean that more than one chunk of time will get
mapped to a single shard, but that's okay -- each chunk will remain
mapped to the same shard for the whole time we care about it.

Fixes #6438
Fixes #6440
2022-10-27 15:59:48 -07:00
Aaron Gable 868214b85e
CRLs: include IssuingDistributionPoint extension (#6412)
Add the Issuing Distribution Point extension to all of our end-entity
CRLs. The extension contains the Distribution Point, the URL from
which this CRL is meant to be downloaded. Because our CRLs are
sharded, this URL prevents an on-path attacker from substituting a
different shard than the client expected in order to hide a revocation.
The extension also contains the OnlyContainsUserCerts boolean,
because our CRLs only contain end-entity certificates.

The Distribution Point URL is constructed from a configurable base URI,
the issuer's NameID, the shard index, and the suffix ".crl". The base
URI must use the "http://" scheme and must not end with a slash.
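
As a rough sketch of that construction: the function name `crlURL` is hypothetical, and the "/" separators are an assumption inferred from the example URL in this message rather than from the actual code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// crlURL joins the base URI, issuer NameID, and shard index into a CRL URL,
// enforcing the two constraints on the base URI stated above.
func crlURL(base string, nameID int64, shard int) (string, error) {
	if !strings.HasPrefix(base, "http://") {
		return "", errors.New(`base URI must use the "http://" scheme`)
	}
	if strings.HasSuffix(base, "/") {
		return "", errors.New("base URI must not end with a slash")
	}
	return fmt.Sprintf("%s/%d/%d.crl", base, nameID, shard), nil
}

func main() {
	u, err := crlURL("http://c.boulder.test", 66283756913588288, 0)
	fmt.Println(u, err)
}
```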

openssl displays the IDP extension as:
```
X509v3 Issuing Distribution Point: critical
  Full Name:
    URI:http://c.boulder.test/66283756913588288/0.crl                Only User Certificates
```

Fixes #6410
2022-10-24 11:21:55 -07:00
Aaron Gable ab4b1eb3e1
Add ROCSPStage7 flag to disable OCSP calls (#6461)
Rather than simply refusing to write OCSP Response bytes to the
database (which is what ROCSP Stage 6 did), Stage 7 refuses to
even generate those bytes in the first place. We obviously can't
disable OCSP Response generation in the CA, since it still needs to
be usable by the ocsp-responder's live-signing path, so instead we
disable it in all of the non-live-signing codepaths (orphan finder,
issue precertificate, revoke certificate, and re-revoke certificate)
which have previously called GenerateOCSP.

Part of #6285
2022-10-21 17:24:19 -07:00
Aaron Gable 02432fcd51
RA: Use OCSPGenerator gRPC service (#6453)
When the RA is generating OCSP (as part of new issuance, revocation,
or when its own GenerateOCSP method is called by the ocsp-responder)
have it use the CA's dedicated OCSPGenerator service, rather than
calling the method exposed by the CA's catch-all CertificateAuthority
service. To facilitate this, add a new GRPCClientConfig stanza to the
RA.

This change will allow us to remove the GenerateOCSP and GenerateCRL
methods from the catch-all CertificateAuthority service, allowing us to
independently control which kinds of objects the CA is willing to sign
by turning off individual service interfaces. The RA's new config stanza
will need to be populated in prod before further changes are possible.

Fixes #6451
2022-10-21 15:37:01 -07:00
Aaron Gable 30d8f19895
Deprecate ROCSP Stage 1, 2, and 3 flags (#6460)
These flags are set in both staging and prod. Deprecate them, make
all code gated behind them the only path, and delete code (multi_source)
which was only accessible when these flags were not set.

Part of #6285
2022-10-21 14:58:34 -07:00
Aaron Gable 410732e8a7
Remove go1.18 from testing (#6459)
We are no longer running on go1.18 in production.
2022-10-21 14:55:37 -07:00
Aaron Gable 6b1857d4b0
Switch to using go1.18.7 and go1.19.2 in tests (#6437)
Fixes #6434
2022-10-18 09:45:44 -07:00
Aaron Gable 272625b4a4
Add CRLDPBase config key to boulder-ca (#6442)
Add a new configuration key to the CA which allows us to
specify the "base URL" for our CRLs. This will be necessary
before including an Issuing Distribution Point extension in our
CRLs, or a CRL Distribution Point in our certificates.

Part of #6410
2022-10-11 08:55:25 -07:00
Matthew McPherrin 1d16ff9b00
Add support for subcommands to "boulder" command (#6426)
Boulder builds a single binary which is symlinked to the different binary names, which are included in its releases.
However, requiring symlinks isn't always convenient.

This change makes the base `boulder` command usable as any of the other binary names.  If the binary is invoked as `boulder`, it runs the second argument as the command name.  It shifts off the `boulder` from os.Args so that all the existing argument parsing can remain unchanged.
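
A minimal sketch of that argument shifting, assuming the component name is taken from the base name of the invoked path. `resolveCommand` is a hypothetical helper, not the function used in the real change.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveCommand returns the component name to run and the argument slice
// that component should parse.
func resolveCommand(args []string) (string, []string) {
	name := filepath.Base(args[0])
	if name == "boulder" && len(args) > 1 {
		// Shift off "boulder" so that args[0] is the subcommand name,
		// exactly as it would be when invoked through a symlink.
		return args[1], args[1:]
	}
	return name, args
}

func main() {
	name, args := resolveCommand(os.Args)
	fmt.Println(name, args[1:])
}
```

With this shape, `boulder boulder-ra --config ra.json` and a symlinked `boulder-ra --config ra.json` hand the same slice to the existing flag parsing.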

This uses the subcommand versions in integration tests, which I think is important to verify this change works. However, we can debate whether or not that should be merged: since we're using the symlink method in production, that's what we want to test.

Issue #6362 suggests we want to move to a more fully-featured command-line parsing library that has proper subcommand support. This fixes one fragment of that, by providing subcommands, but is definitely nowhere near as nice as it could be with a more fully fleshed out library.  Thus this change takes a minimal-touch approach, since we know a larger refactoring is coming.
2022-10-06 11:21:47 -07:00
Samantha 9c12e58c7b
grpc: Allow static host override in client config (#6423)
- Add a new gRPC client config field which overrides the dNSName checked in the
  certificate presented by the gRPC server.
- Revert all test gRPC credentials to `<service>.boulder`
- Revert all ClientNames in gRPC server configs to `<service>.boulder`
- Set all gRPC clients in `test/config` to use `serverAddress` + `hostOverride`
- Set all gRPC clients in `test/config-next` to use `srvLookup` + `hostOverride`
- Rename incorrect SRV record for `ca` with port `9096` to `ca-ocsp` 
- Rename incorrect SRV record for `ca` with port `9106` to `ca-crl` 

Resolves #6424
2022-10-03 15:23:55 -07:00
Jacob Hoffman-Andrews 582b5e346f
Make caa-log-checker run over docker logs (#6388)
This uncovered a bug! The stdout logger was truncating the microseconds part
of its timestamp if the last digit was zero. Fixed that. Also coerced the
stdout logger to use UTC.

To run the checker over our integration test logs, I changed t.sh to use
an explicit name for the container that runs boulder during the tests,
and pulled logs from that container after the tests.
2022-09-26 14:59:15 -07:00
Jacob Hoffman-Andrews 46e41ca8bd
expiration-mailer: allow limiting UPDATE statement (#6400)
This avoids the statements getting so big they can't run.

Also, drive-by add some comments to the expiration-mailer config.
2022-09-26 12:07:31 -07:00
Samantha ffad58009e
grpc: Backend discovery improvements (#6394)
- Fork the default `dns` resolver from `go-grpc` to add backend discovery via
  DNS SRV resource records.
- Add new fields for SRV based discovery to `cmd.GRPCClientConfig`
- Add new (optional) field `DNSAuthority` for specifying custom DNS server to
  `cmd.GRPCClientConfig`
- Add a utility method to `cmd.GRPCClientConfig` to simplify target URI and host
  construction. With three schemes and `DNSAuthority` it makes more sense to
  handle all of this parsing and construction outside of the RPC client
  constructor.

Resolves #6111
2022-09-23 13:11:59 -07:00
Samantha 90eb90bdbe
test: Replace sd-test-srv with consul (#6389)
- Add a dedicated Consul container
- Replace `sd-test-srv` with Consul
- Add documentation for configuring Consul
- Re-issue all gRPC credentials for `<service-name>.service.consul`

Part of #6111
2022-09-19 16:13:53 -07:00
Samantha a97893070f
admin-revoker: Add support for revoking by incident table (#6376)
- Add subcommand `incident-table-revoke` to `admin-revoker`
- Implement streaming RPC adapter for `SerialsForIncident()` in `/test/inmem/sa`
- Refactor the `admin-revoker` tests to use shared setup functions and methods

Resolves #6332
2022-09-16 16:38:40 -07:00
Jacob Hoffman-Andrews db044a8822
log: fix spurious honeycomb warnings; improve stdout logger (#6364)
Honeycomb was emitting logs directly to stderr like this:

```
WARN: Missing API Key.
WARN: Dataset is ignored in favor of service name. Data will be sent to service name: boulder
```

Fix this by providing a fake API key and replacing "dataset" with "serviceName" in configs. Also add missing Honeycomb configs for crl-updater.

For stdout-only logger, include checksums and escape newlines.
2022-09-14 11:25:02 -07:00
Jacob Hoffman-Andrews 797f3c7217
responder: return InternalError for expired responses (#6377)
This was masking a bug, because the integration test for OCSP responses
for expired certificates was looking for the "unauthorized" OCSP
response status, which we were returning even though our HTTP-level
response code was 533.
2022-09-14 11:24:46 -07:00
Jacob Hoffman-Andrews 3a72f6b0a9
Reject SHA-1 CSRs in config-next (#6374)
2022-09-12 16:34:07 -07:00
Aaron Gable 7f189f7a3b
Improve how crl-updater formats and surfaces errors (#6369)
Make every function in the Run -> Tick -> tickIssuer -> tickShard chain
return an error. Make that return value a named return (which we usually
avoid) so that we can remove the manual setting of the metric result
label and have the deferred metric handling function take care of that
instead. In addition, let that cleanup function wrap the returned error
(if any) with the identity of the shard, issuer, or tick that is
returning it, so that we don't have to include that info in every
individual error message. Finally, have the functions which spin off
many helpers (Tick and tickIssuer) collect all of their helpers' errors
and only surface that error at the end, to ensure the process completes
even in the presence of transient errors.

In crl-updater's main, surface the error returned by Run or Tick, to
make debugging easier.
2022-09-12 11:36:42 -07:00
Aaron Gable 78fbda1cd2
Enable CRL test in config integration tests (#6368)
Now that both crl-updater and crl-storer are running in prod,
run this integration test in both test environments as well.

In addition, remove the fake storer grpc client that the updater
used when no storer client was configured, as storer clients
are now configured in all environments.
2022-09-09 16:03:49 -07:00
Samantha 78ea1d2c9d
SA: Use separate schema for incidents tables (#6350)
- Move incidents tables from `boulder_sa` to `incidents_sa` (added in #6344)
- Grant read perms for all tables in `incidents_sa`
- Modify unit tests to account for new schema and grants
- Add database cleaning func for `boulder_sa`
- Adjust cleanup funcs to omit `sql-migrate` tables instead of `goose`

Resolves #6328
2022-09-09 15:17:14 -07:00
Jacob Hoffman-Andrews 252147f2a1
ocsp/helper: don't register flags by default (#6359)
Fixes #6330
2022-09-09 13:11:08 -07:00
Jacob Hoffman-Andrews 5443b23239
startservers: add ocsp-responder -> ra dependency (#6365)
This ensures the RA comes up before we start the ocsp-responder,
preventing spurious connection errors in the log output.

Fixes #6331
2022-09-09 12:55:32 -07:00
Samantha bc1bf0fde4
test: Support multiple database schemas (#6344)
In dev docker we've always used a single schema (`boulder_sa`), with two
environments (`test` and `integration`) making for a combined total of two
databases sharing the same users and schema (e.g. `boulder_sa_test` and
`boulder_sa_integration`). There are also two versions of this schema: `db` and
`db-next`. The former is the schema as it should exist in production and the
latter is everything from `db` with some un-deployed schema changes. This change
adds support for additional schemas with the same aforementioned environments
and versions.

- Add support for additional schemas in `test/create_db.sh` and `sa/migrations.sh`
- Add new schema `incidents_sa` with its own users
- Replace `bitbucket.org/liamstask/goose/` with `github.com/rubenv/sql-migrate`

Part of #6328
2022-09-07 14:59:08 -07:00
Aaron Gable 6d3a9d17d2
Update to go1.18.6/1.19.1 for net security fixes (#6353)
Update to go1.18.6/1.19.1 for net security fixes.

Fix typos found by newer codespell.
2022-09-06 12:45:22 -07:00
Samantha b7b662e755
boulder-tools: Update README (#6343)
- Add fix for `gem install` issues encountered with `build.sh`
- Add setup steps note for macOS users
2022-09-02 08:30:52 -07:00
Aaron Gable c706609e79
Update grpc from v1.36.1 to v1.49.0 (#6336)
Changelog: https://github.com/grpc/grpc-go/compare/v1.36.1...v1.49.0

The biggest change for us is that grpc.WithBalancerName has
transitioned from deprecated to fully removed. The fix is to replace
it with a JSON-formatted "default config" object, as demonstrated in
https://github.com/grpc/grpc-go/pull/5232#issuecomment-1106921140.
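
As a small illustration of such a config object, the sketch below builds the JSON with only the standard library. The helper name `defaultServiceConfig` is hypothetical, and the resulting string is what would be handed to `grpc.WithDefaultServiceConfig` when dialing (not done here, to keep the example dependency-free).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// defaultServiceConfig returns a gRPC service config document that selects
// the named load-balancing policy with an empty policy configuration.
func defaultServiceConfig(policy string) (string, error) {
	cfg := map[string]interface{}{
		"loadBalancingConfig": []map[string]struct{}{{policy: {}}},
	}
	b, err := json.Marshal(cfg)
	return string(b), err
}

func main() {
	cfg, err := defaultServiceConfig("round_robin")
	if err != nil {
		panic(err)
	}
	// In a real client this string would be passed to
	// grpc.WithDefaultServiceConfig(cfg) as a dial option.
	fmt.Println(cfg)
}
```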

This should unblock updating other dependencies which want to
transitively update gRPC as well.
2022-09-01 13:29:06 -07:00
Aaron Gable 4e8df49908
loglist: handle logs with no state (#6329)
While the Chrome log_list.json has a `state` stanza for every
log, the all_logs_list.json file does not. This code was originally
tested against the former file, but we are actually using the
latter file in production. Add a check for missing `state` stanzas
to avoid a nil pointer dereference.
2022-08-31 09:10:29 -07:00
Aaron Gable 73b72e8fa2
ARI: Implement GET portion of draft-ietf-acme-ari-00 (#6322)
Update our ACME Renewal Info implementation to parse
the CertID-based request format specified in the current
version of the draft specification.

Part of #6033
2022-08-30 14:03:26 -07:00
Jacob Hoffman-Andrews f98d74c14d
log: emit warnings and errors on stderr (#6325)
Debug and Info messages still go to stdout.

Fix the CAA integration test, which asserted that stderr should be empty
when caa-log-checker finds a problem. That used to be the case because
we never logged to stderr, but that is no longer the case.

Update the logging docs.

Fixes #6324
2022-08-29 15:00:55 -07:00
Jacob Hoffman-Andrews dd1c52573e
log: allow logging to stdout/stderr instead of syslog (#6307)
Right now, Boulder expects to be able to connect to syslog, and panics
if it's not available. We'd like to be able to log to stdout/stderr as a
replacement for syslog.

- Add a detailed timestamp (down to microseconds, same as we collect in
prod via syslog).
- Remove the escape codes for colorizing output.
- Report the severity level numerically rather than with a letter prefix.

Add locking for stdout/stderr and syslog logs. Neither the [syslog] package
nor the [os] package document concurrency-safety, and the Go rule is: if
it's not documented to be concurrent-safe, it's not. Notably the [log.Logger]
type is documented to be concurrent-safe, and a look at its implementation
shows it uses a Mutex internally.

Remove places that use the singleton `blog.Get()`, and instead pass through
a logger from main in all the places that need it.

[syslog]: https://pkg.go.dev/log/syslog
[os]: https://pkg.go.dev/os
[log.Logger]: https://pkg.go.dev/log#Logger
2022-08-29 06:19:22 -07:00
Jacob Hoffman-Andrews 6ad06789d9
rocsp-tool: add "get-pem" output (#6317)
Emit PEM output instead of pretty-printed output. Send the pretty-printed
output straight to stdout instead of via a logger, so the internal newlines don't
get escaped.

Fixes #6310
2022-08-25 12:52:58 -07:00
Aaron Gable 0340b574d9
Add unparam linter to CI (#6312)
Enable the "unparam" linter, which checks for unused function
parameters, unused function return values, and parameters and
return values that always have the same value every time they
are used.

In addition, fix many instances where the unparam linter complains
about our existing codebase. Remove error return values from a
number of functions that never return an error, remove or use
context and test parameters that were previously unused, and
simplify a number of (mostly test-only) functions that always take the
same value for their parameter. Most notably, remove the ability to
customize the RSA Public Exponent from the ceremony tooling,
since it should always be 65537 anyway.

Fixes #6104
2022-08-23 12:37:24 -07:00
Aaron Gable c1be8cfc52
crl-storer: load whole AWS config files (#6309)
Allow the crl-storer to load whole AWS config files. Although
this requires a deployment to maintain an additional config
file for the crl-storer, and one in a format we usually don't
use, it does give us lots of flexibility in setting up things like
role assumption.

Also remove the S3Region config flag, as it is now redundant
with the contents of the config file, and rename the existing
S3CredsFile config key to AWSCredsFile to better represent
its true contents.

Fixes #6308
2022-08-23 11:04:12 -07:00
Aaron Gable 4ad66729d2
Tests: use reflect.IsNil() to avoid boxed nil issues (#6305)
Add a new `test.AssertNil()` helper to facilitate asserting that a given
unit test result is a non-boxed nil. Update `test.AssertNotNil()` to use
the reflect package's `.IsNil()` method to catch boxed nils.

In Go, variables whose type is constrained to be an interface type (e.g.
a function parameter which takes an interface, or the return value of a
function which returns `error`, itself an interface type) should
actually be thought of as a (T, V) tuple, where T is their underlying
concrete type and V is their underlying value. Thus, there are two ways
for such a variable to be nil-like: it can be truly nil where T=nil and
V is uninitialized, or it can be a "boxed nil" where T is a nillable
type such as a pointer or a slice and V=nil.

Unfortunately, only the former of these is == nil. The latter is the
cause of frequent bugs, programmer frustration, a whole entry in the Go
FAQ, and considerable design effort to remove from Go 2.

Therefore these two test helpers both call `t.Fatal()` when passed a
boxed nil. We want to avoid passing around boxed nils whenever possible,
and having our tests fail whenever we do is a good way to enforce good
nil hygiene.
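
The difference between the two kinds of nil can be seen in a short sketch. `myErr`, `boxed`, and `isNilLike` are hypothetical names used only for illustration; the check is one plausible way to detect boxed nils with the reflect package, not necessarily how the test helpers are written.

```go
package main

import (
	"fmt"
	"reflect"
)

type myErr struct{}

func (*myErr) Error() string { return "boom" }

// boxed returns a nil *myErr stored in an error interface: T=*myErr, V=nil.
func boxed() error {
	var e *myErr
	return e
}

// isNilLike reports whether v is a true nil or a boxed nil.
func isNilLike(v interface{}) bool {
	if v == nil {
		return true
	}
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Chan, reflect.Func, reflect.Interface, reflect.Map,
		reflect.Ptr, reflect.Slice, reflect.UnsafePointer:
		// IsNil panics for other kinds, so it is only called for nillable ones.
		return rv.IsNil()
	}
	return false
}

func main() {
	err := boxed()
	fmt.Println(err == nil)     // false: the interface holds a type
	fmt.Println(isNilLike(err)) // true: the boxed value is nil
}
```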

Fixes #3279
2022-08-19 14:47:34 -07:00
Aaron Gable b001af71e8
Add new services to log-validator test config (#6303)
Fixes #6289
2022-08-17 16:46:11 -07:00
Aaron Gable 09195e6804
ocsp-responder: get minimal status info from SA (#6293)
Add a new `GetRevocationStatus` gRPC method to the SA which retrieves
only the subset of the certificate status metadata relevant to
revocation, namely whether the certificate has been revoked, when it was
revoked, and the revocation reason. Notably, this method is our first
use of the `google.protobuf.Timestamp` type in a message, which is more
ergonomic and less prone to errors than using unix nanoseconds.

Use this new method in ocsp-responder's checked_redis_source, to avoid
having to send many other pieces of metadata and the full ocsp response
bytes over the network. It provides all the information necessary to
determine if the response from Redis is up-to-date.

Within the checked_redis_source, use this new method in two different
ways: if only a database connection is configured (as is the case today)
then get this information directly from the db; if a gRPC connection to
the SA is available then prefer that instead. This may make requests
slower, but will allow us to remove database access from the hosts which
run the ocsp-responder today, simplifying our network.

The new behavior consists of two pieces, each locked behind a config
gate:
- Performing the smaller database query is only enabled if the
  ocsp-responder has the `ROCSPStage3` feature flag enabled.
- Talking to the SA rather than the database directly is only enabled if
  the ocsp-responder has an `saService` gRPC stanza in its config.

Fixes #6274
2022-08-16 16:37:24 -07:00
Aaron Gable 00734a6edf
Stop rsyslog from de-duplicating log lines (#6291)
When rsyslog receives multiple identical log lines in a row, it can
collapse those lines into a single instance of the log line and a
follow-up line saying "message repeated X times". However, that
rsyslog-generated line does not contain our log line checksum, so it
immediately causes log-validator to complain about the line. In
addition, the rsyslog docs themselves state that this feature is a
misfeature and should never be turned on. Despite this, Ubuntu turns the
feature on by default when the rsyslog package is installed from apt.

Add an additional command to our dockerfile which overwrites Ubuntu's
default setting to disable this misfeature, and update our test
environment to use the new docker image.

Fixes #6252
2022-08-11 12:37:16 -07:00
Aaron Gable 3a12177eab
ROCSP Stage 6: Never write OCSP responses to DB (#6284)
Create a new `ROCSPStage6` feature flag which affects the behavior of
the SA. When enabled, this flag causes the `AddPrecertificate`,
`RevokeCertificate`, and `UpdateRevokedCertificate` methods to ignore
the OCSP response bytes provided by their caller. They will no longer
error out if those bytes are missing, and if the bytes are present they
will still not be written to the database.

This allows us to, in the future, cause the RA and CA to stop generating
those OCSP responses entirely, and stop providing them to the SA,
without causing any errors when we do.

Part of #6079
2022-08-10 15:31:26 -07:00
Aaron Gable d1b211ec5a
Start testing on go1.19 (#6227)
Run the Boulder unit and integration tests with go1.19.

In addition, make a few small changes to allow both sets of
tests to run side-by-side. Mark a few tests, including our lints
and generate checks, as go1.18-only. Reformat a few doc
comments, particularly lists, to abide by go1.19's stricter gofmt.

Causes #6275
2022-08-10 15:30:43 -07:00
Aaron Gable 9c197e1f43
Use io and os instead of deprecated ioutil (#6286)
The ioutil package has been deprecated since go1.16; the various
functions it provided now exist in the os and io packages. Replace all
instances of ioutil with either io or os, as appropriate.
2022-08-10 13:30:17 -07:00
Aaron Gable 93d3e0b9e5
Enable early ROCSP stages in integration tests (#6280)
For some reason ROCSPStage3 was enabled without also enabling
ROCSP Stages 1 and 2. Fix the oversight so we're actually running
all of the first three ROCSP stages in config-next integration tests.
2022-08-10 12:40:18 -07:00
Aaron Gable 6a9bb399f7
Create new crl-storer service (#6264)
Create a new crl-storer service, which receives CRL shards via gRPC and
uploads them to an S3 bucket. It ignores AWS SDK configuration in the
usual places, in favor of configuration from our standard JSON service
config files. It ensures that the CRLs it receives parse and are signed
by the appropriate issuer before uploading them.

Integrate crl-updater with the new service. It streams bytes to the
crl-storer as it receives them from the CA, without performing any
checking at the same time. This new functionality is disabled if the
crl-updater does not have a config stanza instructing it how to connect
to the crl-storer.

Finally, add a new test component, the s3-test-srv. This acts similarly
to the existing mail-test-srv: it receives requests, stores information
about them, and exposes that information for later querying by the
integration test. The integration test uses this to ensure that a
newly-revoked certificate does show up in the next generation of CRLs
produced.

Fixes #6162
2022-08-08 16:22:48 -07:00
Samantha 576b6777b5
grpc: Implement a static multiple IP address gRPC resolver (#6270)
- Implement a static resolver for the gRPC dialer under the scheme `static:///`
  which allows the dialer to resolve a backend from a static list of IPv4/IPv6
  addresses passed via the existing JSON config.
- Add config key `serverAddresses` to the `GRPCClientConfig` which, when
  populated, enables static IP resolution of gRPC server backends.
- Set `config-next` to use static gRPC backend resolution for all SA clients.
- Generate a new SA certificate which adds `10.77.77.77` and `10.88.88.88` to
  the SANs.

Resolves #6255
2022-08-05 10:20:57 -07:00
Jacob Hoffman-Andrews b6c4d9bc21
ocsp/responder: add checked Redis source (#6272)
Add checkedRedisSource, a new OCSP Source which gets
responses from Redis, gets metadata from the database, and
only serves the Redis response if it matches the authoritative
metadata. If there is a mismatch, it requests a new OCSP
response from the CA, stores it in Redis, and serves the new
response.
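
The control flow described above might look roughly like the sketch below. Maps stand in for Redis and the database, `sign` stands in for a request to the CA, and the `status` fields are a hypothetical reduction of the metadata being compared.

```go
package main

import "fmt"

// status is a hypothetical reduction of the metadata being compared.
type status struct {
	Revoked bool
	Reason  int
}

// checkedSource serves cached responses only when they agree with the
// authoritative record.
type checkedSource struct {
	cache map[string]status          // stands in for Redis
	db    map[string]status          // stands in for the database
	sign  func(serial string) status // stands in for the CA
}

// Response returns the cached status if it matches the database, and
// otherwise regenerates it, stores it in the cache, and serves the new one.
func (s *checkedSource) Response(serial string) (status, error) {
	auth, ok := s.db[serial]
	if !ok {
		return status{}, fmt.Errorf("no such certificate: %s", serial)
	}
	if cached, ok := s.cache[serial]; ok && cached == auth {
		return cached, nil
	}
	fresh := s.sign(serial)
	s.cache[serial] = fresh
	return fresh, nil
}

func main() {
	s := &checkedSource{
		cache: map[string]status{"01": {Revoked: false}},
		db:    map[string]status{"01": {Revoked: true, Reason: 1}},
	}
	s.sign = func(serial string) status { return s.db[serial] }
	fmt.Println(s.Response("01"))
}
```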

This behavior is locked behind a new ROCSPStage3 feature flag.

Part of #6079
2022-08-04 16:22:14 -07:00
Samantha 0e7940bb48
test: Fix gRPC creds and script (#6276)
- Move entry for `nonce` service to the second `minica` loop so that DNS names
  `nonce1.boulder` and `nonce2.boulder` are added to the SANs
- Remove anachronistic `crl-storer` gRPC cert and key added in #6212
2022-08-04 13:00:26 -07:00
Aaron Gable 305f5b1bc0
Stop testing on go1.18.1 (#6258)
Prod has been updated to 1.18.4.
2022-08-02 13:20:38 -07:00
Samantha 1464c34938
RA: Implement leaky bucket for duplicate certificate limit (#6262)
- Modify `ra.checkCertificatesPerFQDNSetLimit()` to use a leaky bucket algorithm
- Return issuance timestamps from `sa.FQDNSetTimestampsForWindow()` in descending order

Resolves #6154
2022-07-29 17:39:31 -07:00
Aaron Gable 694d73d67b
crl-updater: add UpdateOffset config to run on a schedule (#6260)
Add a new config key `UpdateOffset` to crl-updater, which causes it to
run on a regular schedule rather than running immediately upon startup
and then every `UpdatePeriod` after that. It is safe for this new config
key to be omitted and take the default zero value.

Also add a new command line flag `runOnce` to crl-updater which causes
it to immediately run a single time and then exit, rather than running
continuously as a daemon. This will be useful for integration tests and
emergency situations.

Part of #6163
2022-07-29 13:30:16 -07:00
Aaron Gable 9ae16edf51
Fix race condition in revocation integration tests (#6253)
Add a new filter to mail-test-srv, allowing test processes to query
for messages sent from a specific address, not just ones sent to
a specific address. This fixes a race condition in the revocation
integration tests where the number of messages sent to a cert's
contact address would be higher than expected because expiration
mailer sent a message while the test was running. Also reduce
bad-key-revoker's maximum backoff to 2 seconds to ensure that
it continues to run frequently during the integration tests, despite
usually not having any work to do.

While we're here, also improve the comments on various revocation
integration tests, remove some unnecessary cruft, and split the tests
out to explicitly test functionality with the MozRevocationReasons
flag both enabled and disabled. Also, change ocsp_helper's default
output from os.Stdout to ioutil.Discard to prevent hundreds of lines
of log spam when the integration tests fail during a test that uses
that library.

Fixes #6248
2022-07-29 09:23:50 -07:00
Jacob Hoffman-Andrews 2e64736e45
redis-create.sh: run `exec` on the last line (#6254)
Previously, when shutting down a `docker-compose` stack,
bredis_clusterer would take 10s to shut down. This decreases the time to
0.4s.

I believe this is because docker-compose was killing `bash` and waiting
for its children to die (they weren't), then hitting a timeout and hard
killing the container. Now, since `exec` replaces the current pid,
docker-compose can kill redis-server directly.
2022-07-26 13:19:50 -07:00