6201 Commits

Author SHA1 Message Date
Jacob Hoffman-Andrews 521eb55d1e
test: better message for different empty slices (#6920)
Given two empty slices, one that is equal to nil and one that is not,
AssertDeepEquals used to produce this confusing output:

    [[]] !(deep)= [[]]

After this change, it produces:

    [[]string(nil)] !(deep)= [[]string{}]
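
The difference between the two outputs can be reproduced with the standard library alone. This is a minimal sketch, not Boulder's AssertDeepEquals: `describe` is a hypothetical helper, and the use of the `%#v` verb is an assumption about how such a message could be produced.

```go
package main

import (
	"fmt"
	"reflect"
)

// describe is a hypothetical helper. It renders a value with %#v, which
// prints Go-syntax output and so distinguishes nil from empty slices.
func describe(v any) string {
	return fmt.Sprintf("%#v", v)
}

func main() {
	var nilSlice []string    // equal to nil
	emptySlice := []string{} // empty, but not nil

	// The two slices are not deeply equal...
	fmt.Println(reflect.DeepEqual(nilSlice, emptySlice))

	// ...yet %v renders both the same way, which hides the difference.
	fmt.Printf("%v vs %v\n", nilSlice, emptySlice)

	// %#v makes the difference visible.
	fmt.Println(describe(nilSlice), "vs", describe(emptySlice))
}
```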
2023-05-26 09:41:23 -07:00
Samantha efbc2ad89b
goodkey: Remove dependency on berrors (#6917)
Fixes #6910
2023-05-26 11:26:10 -04:00
Phil Porada 23a0a71b2d
ctpolicy: More stats and monitoring (#6822)
Adds new prometheus metrics from the configured log list and configured
CT logs to the ctpolicy constructor. `ct_operator_group_size_gauge`
returns the number of configured logs managed by each operator in the
log list. `ct_shard_expiration_seconds` returns a Unix timestamp
representation of the `end_exclusive` field for each configured log in
the `sctLogs` list. For posterity, Boulder retrieves SCTs from logs in
the `sctLogs` list.

```
ct_operator_group_size_gauge{operator="Operator A",source="finalLogs"} 2
ct_operator_group_size_gauge{operator="Operator A",source="sctLogs"} 4
ct_operator_group_size_gauge{operator="Operator B",source="sctLogs"} 2
ct_operator_group_size_gauge{operator="Operator D",source="sctLogs"} 1
ct_operator_group_size_gauge{operator="Operator F",source="finalLogs"} 1
ct_operator_group_size_gauge{operator="Operator F",source="infoLogs"} 1


ct_shard_expiration_seconds{logID="A1 Current",operator="Operator A"} 3.15576e+09
ct_shard_expiration_seconds{logID="A1 Future",operator="Operator A"} 3.47126688e+10
ct_shard_expiration_seconds{logID="A2 Current",operator="Operator A"} 3.15576e+09
ct_shard_expiration_seconds{logID="A2 Past",operator="Operator A"} 0
ct_shard_expiration_seconds{logID="B1",operator="Operator B"} 3.15576e+09
ct_shard_expiration_seconds{logID="B2",operator="Operator B"} 3.15576e+09
ct_shard_expiration_seconds{logID="D1",operator="Operator D"} 3.15576e+09
```

Fixes https://github.com/letsencrypt/boulder/issues/5705
2023-05-25 17:25:08 -04:00
Phil Porada 33fc8c4b6f
ctpolicy: Remove init function from loglist.go (#6918)
Removes the `//ctpolicy/loglist.go` init function which previously
seeded the math/rand global random generator in favor of Go 1.20
math/rand now doing this automatically. See release notes
[here.](https://tip.golang.org/doc/go1.20)

> The [math/rand](https://tip.golang.org/pkg/math/rand/) package now
automatically seeds the global random number generator (used by
top-level functions like Float64 and Int) with a random value, and the
top-level [Seed](https://tip.golang.org/pkg/math/rand/#Seed) function
has been deprecated. Programs that need a reproducible sequence of
random numbers should prefer to allocate their own random source, using
rand.New(rand.NewSource(seed)).
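
As a small illustration of the quoted release note, the sketch below contrasts a privately seeded generator with the automatically seeded global one. The `sample` helper and the seed value are hypothetical and are not taken from `loglist.go`.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sample is a hypothetical helper that draws n values from a private
// generator seeded with seed. The global generator is left untouched.
func sample(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = r.Intn(1000)
	}
	return out
}

func main() {
	// Same seed, same sequence: useful when a test needs reproducibility.
	fmt.Println(sample(42, 3))
	fmt.Println(sample(42, 3))

	// Top-level functions use the global generator, which Go 1.20 and
	// later seed randomly at startup, so this value varies between runs.
	fmt.Println(rand.Intn(1000))
}
```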
2023-05-25 16:43:42 -04:00
Aaron Gable 6ea74d5be9
OCSP: Use FilterSource for static responders (#6901)
Move the creation of the FilterSource outside of the conditional block,
so that the underlying source gets wrapped no matter which kind (either
an inMemorySource or a checkedRedisSource) it is.

This has two advantages: first, it means that static ocsp responders are
safer and more accurate, because they're now basing their responses on
both the issuer and the serial, not just the serial; and second, it
makes the current config validation tag which marks the "issuerCerts"
config field as required with `min=1` accurate.
2023-05-24 14:23:27 -07:00
Aaron Gable 4305f64a28
Replace integration test root ocsp with crls (#6905)
We no longer issue OCSP responses for our intermediate certificates,
instead producing CRLs which cover those intermediates. Remove the OCSP
response from our integration test ceremony, remove the configuration
for the static ocsp-responder which serves that response, and remove the
integration test which spins up and checks that responder. Replace all
of the above with new CRLs generated as part of the integration test
ceremony.
2023-05-24 14:22:43 -07:00
Jacob Hoffman-Andrews 54b5294651
bdns: fix handling of NXDOMAIN (#6916)
A recent refactoring (https://github.com/letsencrypt/boulder/pull/6906)
started treating NXDOMAIN for a CAA lookup as a hard error, when it
should be treated (from Boulder's point of view) as meaning there is an
empty list of resource records.
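
The intended behavior can be sketched roughly as follows. The `response` type, the constants, and `caaRecords` are hypothetical simplifications for illustration, not the bdns API; only the numeric RCODE values are standard DNS.

```go
package main

import (
	"errors"
	"fmt"
)

// Standard DNS response codes, declared locally to keep the sketch
// self-contained.
const (
	rcodeSuccess  = 0
	rcodeServFail = 2
	rcodeNXDomain = 3
)

// response is a hypothetical, heavily simplified DNS response.
type response struct {
	rcode   int
	records []string
}

var errLookupFailed = errors.New("CAA lookup failed")

// caaRecords returns the CAA records in resp. NXDOMAIN is not an error
// here: it is reported as an empty record list.
func caaRecords(resp response) ([]string, error) {
	switch resp.rcode {
	case rcodeSuccess:
		return resp.records, nil
	case rcodeNXDomain:
		return nil, nil
	default:
		return nil, fmt.Errorf("%w: rcode %d", errLookupFailed, resp.rcode)
	}
}

func main() {
	fmt.Println(caaRecords(response{rcode: rcodeNXDomain}))
	fmt.Println(caaRecords(response{rcode: rcodeServFail}))
}
```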
2023-05-24 12:16:01 -07:00
Phil Porada c75bf7033a
SA: Don't store HTTP-01 hostname and port in database validationrecord (#6863)
Removes the `Hostname` and `Port` fields from an http-01
ValidationRecord model prior to storing the record in the database.
Using `"hostname":"example.com","port":"80"` as a snippet of a whole
validation record, we'll save a minimum of 36 bytes for each new http-01
ValidationRecord that gets stored. When retrieving the record, the
ValidationRecord `RehydrateHostPort` method will repopulate the
`Hostname` and `Port` fields from the `URL` field.
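
Recovering both values from a URL needs only `net/url`. The sketch below is an illustration and not the actual `RehydrateHostPort` implementation: `hostPortFromURL` is a hypothetical name, and falling back to the scheme's default port is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostPortFromURL is a hypothetical helper that recovers the hostname and
// port from a validation URL. When the URL has no explicit port, the
// scheme's default (80 for http, 443 for https) is assumed.
func hostPortFromURL(raw string) (host, port string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	host, port = u.Hostname(), u.Port()
	if port == "" {
		switch u.Scheme {
		case "http":
			port = "80"
		case "https":
			port = "443"
		default:
			return "", "", fmt.Errorf("unknown scheme %q", u.Scheme)
		}
	}
	return host, port, nil
}

func main() {
	host, port, err := hostPortFromURL("http://example.com/.well-known/acme-challenge/token")
	fmt.Println(host, port, err)
}
```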

Fixes the main goal of
https://github.com/letsencrypt/boulder/issues/5231.

---------

Co-authored-by: Samantha <hello@entropy.cat>
2023-05-23 15:36:17 -04:00
Samantha f09a94bd74
consul: Configure gRPC health check for SA (#6908)
Enable SA gRPC health checks in Consul ahead of further changes for
#6878. Calls to the `Check` method of the SA's grpc.health.v1.Health
service must respond `SERVING` before the `sa` service will be
advertised in Consul DNS. Consul will continue to poll this service
every 5 seconds.

- Add `bconsul` docker service to boulder `bluenet` and `rednet`
- Add TLS credentials for `consul.boulder`:
  ```shell
  $ openssl x509 -in consul.boulder/cert.pem -text | grep DNS
                DNS:consul.boulder
  ```
- Update `test/grpc-creds/generate.sh` to add `consul.boulder`
- Update test SA configs to allow `consul.boulder` to access
`grpc.health.v1.Health`

Part of #6878
2023-05-23 13:16:49 -04:00
Aaron Gable 26adec08cc
Remove go1.20.3 from CI (#6898)
We are no longer using go1.20.3 in prod.
2023-05-22 14:47:33 -07:00
Aaron Gable fe523f142d
crl-updater: retry failed shards (#6907)
Add per-shard exponential backoff and retry to crl-updater. Each
individual CRL shard will be retried up to MaxAttempts (default 1)
times, with exponential backoff starting at 1 second and maxing out at 1
minute between each attempt.

This can effectively reduce the parallelism of crl-updater: while a
goroutine is sleeping between attempts of a failing shard, it is not
doing work on another shard. This is a desirable feature, since it means
that crl-updater gently reduces the total load it places on the network
and database when shards start to fail.

Setting this new config parameter is tracked in IN-9140
Fixes https://github.com/letsencrypt/boulder/issues/6895
2023-05-22 12:59:09 -07:00
Aaron Gable 3990a08328
Add relevant domain to CAA errors and logs (#6886)
When processing CAA records, keep track of the FQDN at which that CAA
record was found (which may be different from the FQDN for which we are
attempting issuance, since we crawl CAA records upwards from the
requested name to the TLD). Then surface this name upwards so that it
can be included in our own log lines and in the problem documents which
we return to clients.
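
The upward crawl and the bookkeeping of where a record was found might look roughly like the sketch below. `lookupFunc` and `findCAA` are hypothetical names, and the lookup is a stand-in for a real DNS query. Boulder's actual CAA handling is more involved than this.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupFunc is a hypothetical stand-in for a DNS CAA query. It returns
// the CAA records published at exactly the given FQDN.
type lookupFunc func(fqdn string) []string

// findCAA walks from name up toward the TLD and returns the first
// non-empty CAA record set together with the FQDN at which it was found.
func findCAA(name string, lookup lookupFunc) (records []string, foundAt string) {
	labels := strings.Split(strings.TrimSuffix(name, "."), ".")
	for i := range labels {
		candidate := strings.Join(labels[i:], ".")
		if recs := lookup(candidate); len(recs) > 0 {
			return recs, candidate
		}
	}
	return nil, ""
}

func main() {
	// Hypothetical zone data: only example.com publishes a CAA record.
	zone := map[string][]string{
		"example.com": {`0 issue "ca.example"`},
	}
	recs, at := findCAA("a.b.example.com", func(fqdn string) []string {
		return zone[fqdn]
	})
	// The name in the message differs from the name requested.
	fmt.Printf("CAA records %v found at %s\n", recs, at)
}
```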

Fixes https://github.com/letsencrypt/boulder/issues/3171
2023-05-22 15:08:56 -04:00
Samantha 90dec0ca95
docker-compose: Fix small spacing inconsistency (#6909)
2023-05-19 15:58:08 -04:00
Jacob Hoffman-Andrews 4f171604fe
Expose Extended DNS Errors (#6906)
If the resolver provides EDE (https://www.rfc-editor.org/rfc/rfc8914),
Boulder will automatically expose it in the error message. Note that
most error messages contain the error RCODE (NXDOMAIN, SERVFAIL, etc.),
but when EDE is present we omit the RCODE in the interest of brevity. In
practice it will almost always be SERVFAIL, and the extended error
information is more informative anyhow.

This will have no effect in production until we configure Unbound to
enable EDE.

Fixes #6875.

---------

Co-authored-by: Matthew McPherrin <mattm@letsencrypt.org>
2023-05-18 20:43:00 -07:00
alexzorin 0a65e87c1b
va: make http keyAuthz mismatch problem wording less ambiguous (#6903)
Occasionally (and just now) I've responded to an issue or thread that
involves this error message:

> The key authorization file from the server did not match this
challenge
"LoqXcYV8q5ONbJQxbmR7SCTNo3tiAXDfowyjxAjEuX0.9jg46WB3rR_AHD-EBXdN7cBkH1WOu0tA3M9fm21mqTI"
!= "\xef\xffAABBCC

and I've found myself looking at Boulder's source code, to check which
way around the values are. I suspect that users are not understanding it
either.
2023-05-18 12:04:14 -04:00
Aaron Gable 56f8537e68
Ensure SelectOne queries never return more than 1 row (#6900)
As a follow-up to https://github.com/letsencrypt/boulder/issues/5467, I
did an audit of all places where we call SelectOne to ensure that those
queries can never return more than one result. These four functions were
the only places that weren't already constrained to a single result
through the use of "SELECT COUNT", "LIMIT 1", "WHERE uniqueKey =", or
similar. Limit these functions' queries to always only return a single
result, now that their underlying tables no longer have unique key
constraints.

Additionally, slightly refactor selectRegistration to just take a single
column name rather than a whole WHERE clause.

Fixes https://github.com/letsencrypt/boulder/issues/6521
2023-05-17 14:13:21 -07:00
Aaron Gable f91aa1d57d
Shorten logline prefix in integration tests (#6893)
When the "integration" build tag is set, reduce the stdout prefix to
just a short timestamp, log level, and process name. The other details
(e.g. date, datacenter, and hostname) are not relevant in CI, and only
serve to clutter the logs.

Part of https://github.com/letsencrypt/boulder/issues/6890
2023-05-17 14:07:54 -07:00
Phil Porada 19380cda68
WFE: Enforce parseJWS precondition for more safety while handling JWS (#6860)
Define a `bJSONWebSignature` struct which embeds a
`*jose.JSONWebSignature`. The only method that can produce a
`bJSONWebSignature` is `wfe.parseJWS` so that we can ensure
safety/sanity checks are performed on the incoming data. Restricts
several methods and functions to take a `jose.Header` as an input
parameter, rather than a full JWS.

Fixes https://github.com/letsencrypt/boulder/issues/5676.
2023-05-17 11:55:16 -04:00
Matthew McPherrin b7d9f8c2e3
In config-next/, opentelemetry -> openTelemetry for consistency (#6888)
In configs, opentelemetry -> openTelemetry

As pointed out in review of #6867, these should match the case of their
corresponding Go identifiers for consistency.

JSON keys are case-insensitive in Go (part of why we've got a fork in
go-jose), so this change should have no functional impact.
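
The case-insensitive matching can be seen with the standard library's `encoding/json` decoder. This sketch assumes the configs are decoded that way. The `config` struct and the endpoint value are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config is a hypothetical config struct; only the key casing matters here.
type config struct {
	OpenTelemetry struct {
		Endpoint string `json:"endpoint"`
	} `json:"openTelemetry"`
}

// parse decodes raw JSON into a config using encoding/json, which prefers
// an exact key match but also accepts a case-insensitive one.
func parse(raw string) (config, error) {
	var c config
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	for _, raw := range []string{
		`{"opentelemetry": {"endpoint": "localhost:4317"}}`,
		`{"openTelemetry": {"endpoint": "localhost:4317"}}`,
	} {
		c, err := parse(raw)
		fmt.Println(c.OpenTelemetry.Endpoint, err)
	}
}
```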
2023-05-15 17:07:29 -04:00
Aaron Gable 204a218ed5
Remove port bindings from bjaeger container (#6892)
These external port bindings are not necessary, as the integration test
configs resolve the bjaeger container directly. In addition, these
external port bindings cause problems for rootless docker, so let's
remove them.
2023-05-15 13:56:32 -07:00
Jacob Hoffman-Andrews 2b1ac9e915
admin-revoker: fix help output (#6891)
Previously if you passed `-h` or `-help` to a sub-sub-command of
admin-revoker it would error out with a red message and a stack trace
(in addition to printing help).

Now, it will print help and exit 1.
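
One way to get this behavior with the standard `flag` package is sketched below. It is not the admin-revoker code: the subcommand name, the `-serial` flag, and both helper functions are hypothetical.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// parseSubcommand parses args for a hypothetical subcommand. With
// flag.ContinueOnError, a -h or -help argument makes Parse print usage
// and return flag.ErrHelp instead of exiting or panicking.
func parseSubcommand(args []string, out io.Writer) (serial string, err error) {
	fs := flag.NewFlagSet("serial-revoke", flag.ContinueOnError)
	fs.SetOutput(out) // a nil writer falls back to os.Stderr
	fs.StringVar(&serial, "serial", "", "serial number to revoke (hypothetical flag)")
	err = fs.Parse(args)
	return serial, err
}

// helpRequested reports whether err means the user asked for help.
func helpRequested(err error) bool {
	return errors.Is(err, flag.ErrHelp)
}

func main() {
	_, err := parseSubcommand(os.Args[1:], os.Stderr)
	if helpRequested(err) {
		// Usage has already been printed by the flag package.
		os.Exit(1)
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
}
```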
2023-05-15 13:54:13 -07:00
Aaron Gable 62ff373885
Probs: remove divergences from RFC8555 (#6877)
Remove the remaining divergences from RFC8555 regarding what error types
we use in certain situations. Specifically:
- use "invalidContact" instead of "invalidEmail";
- use "unsupportedContact" for contact addresses that use a protocol
other than "mailto:"; and
- use "unsupportedIdentifier" for identifiers that specify a type other
than "dns".
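
For the two contact cases, the choice between error types could be sketched as below. The error type names and the URN prefix are those of RFC 8555, but `contactProblem` is a hypothetical function and does not reflect how the probs package is structured.

```go
package main

import (
	"fmt"
	"net/mail"
	"net/url"
)

// errorNamespace is the ACME error URN prefix defined by RFC 8555.
const errorNamespace = "urn:ietf:params:acme:error:"

// contactProblem is a hypothetical checker: it returns the RFC 8555 error
// type for a contact URL, or "" if the contact looks acceptable.
func contactProblem(contact string) string {
	u, err := url.Parse(contact)
	if err != nil {
		return errorNamespace + "invalidContact"
	}
	if u.Scheme != "mailto" {
		// Any protocol other than mailto: is unsupported.
		return errorNamespace + "unsupportedContact"
	}
	if _, err := mail.ParseAddress(u.Opaque); err != nil {
		return errorNamespace + "invalidContact"
	}
	return ""
}

func main() {
	for _, c := range []string{
		"mailto:admin@example.com",
		"tel:+12025551212",
		"mailto:not-an-email",
	} {
		fmt.Printf("%q -> %q\n", c, contactProblem(c))
	}
}
```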
2023-05-15 12:35:12 -07:00
Aaron Gable 46183df5dc
Add link to list of root programs to ceremony docs (#6884)
Fixes https://github.com/letsencrypt/boulder/issues/6730
2023-05-15 12:34:34 -07:00
Matthew McPherrin c21b44bdc2
Rename CA's "--ca-addr" flag to "--addr" (#6889)
Most boulder components have a command line flag to override what gRPC
and debug port they listen on, which is used in tests to run multiple
instances with the same configuration.

However, CA's flag is named "--ca-addr", and not "--addr". This is
inconsistent with SA, RA, VA, nonce, publisher, and purger.

This flag isn't used in production, where we set it in the config file,
so it shouldn't be a breaking change to rename it.
2023-05-15 11:17:07 -07:00
Samantha 9e8101ff3a
main: Validate config files by default (#6885)
- Make config validation run by default for all Boulder components with
a registered validator.
- Refactor main to parse `boulder` flags directly instead of declaring
them as subcommands.
- Remove the `validate` subcommand and update relevant docs.
- Fix configuration validation for issuer (file source) OCSP responder.

Fixes #6857
Fixes #6763
2023-05-15 14:16:04 -04:00
Matthew McPherrin 8c9c55609b
Remove redundant jose import alias (#6887)
This PR should have no functional change; just a cleanup.
2023-05-15 09:45:58 -07:00
Matthew McPherrin 3aae67b8a9
Opentelemetry: Add option for public endpoints (#6867)
This PR adds a new configuration block specifically for the otelhttp
instrumentation. This block is separate from the existing
"opentelemetry" configuration, and is only relevant when using otelhttp
instrumentation. It does not share any codepath with the existing
configuration, so it is at the top level to indicate which services it
applies to.

There's a bit of plumbing to pass the new configuration through. I've adapted the
measured_http package to also set up opentelemetry instead of just
metrics, which should hopefully allow any future changes to be smaller
(just config & there) and more consistent between the wfe2 and ocsp
responder.

There's one option here now, which disables setting
[otelhttp.WithPublicEndpoint](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#WithPublicEndpoint).
This option is designed to do exactly what we want: Don't accept
incoming spans as parents of the new span created in the server.
Previously we had a setting to disable parent-based sampling to help
with this problem, which doesn't really make sense anymore, so let's
just remove it and simplify that setup path. The default of "false" is
designed to be the safe option. It's set to True in the test/ configs
for integration tests that use traces, and I expect we'll likely set it
true in production eventually once the LBs are configured to handle
tracing themselves.

Fixes #6851
2023-05-12 15:34:34 -04:00
Samantha 310546a14e
VA: Support discovery of DNS resolvers via Consul (#6869)
Deprecate `va.DNSResolver` in favor of backwards compatible
`va.DNSProvider`.

Fixes #6852
2023-05-12 12:54:31 -04:00
Aaron Gable 1fcd951622
Probs: simplifications and cleanup (#6876)
Make minor, non-user-visible changes to how we structure the probs
package. Notably:
- Add new problem types for UnsupportedContact and
UnsupportedIdentifier, which are specified by RFC8555 and which we will
use in the future, but haven't been using historically.
- Sort the problem types and constructor functions to match the
(alphabetical) order given in RFC8555.
- Rename some of the constructor functions to better match their
underlying problem types (e.g. "TLSError" to just "TLS").
- Replace the redundant ProblemDetailsToStatusCode function with simply
always returning a 500 if we haven't properly set the problem's
HTTPStatus.
- Remove the ability to use either the V1 or V2 error namespace prefix;
always use the proper RFC namespace prefix.
2023-05-12 12:10:13 -04:00
Samantha 19c5244088
test: Use consul hostname instead of IP for dnsAuthority (#6883)
Standardize on hostnames for dnsAuthority to match production. 

Related to #6869
2023-05-11 14:13:53 -07:00
Aaron Gable 42bd62e50b
Purger: list failed urls in error message (#6882)
Fixes https://github.com/letsencrypt/boulder/issues/6853
2023-05-11 10:39:54 -07:00
dependabot[bot] 8d3dc74645
Bump github.com/aws/aws-sdk-go-v2/config from 1.18.12 to 1.18.25 (#6881)
Bumps
[github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2)
from 1.18.12 to 1.18.25.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-10 15:32:25 -04:00
Jacob Hoffman-Andrews f295626e4c
ca: remove simulated ISRG OID from config (#6879)
We intend to issue in the future with only the CA/Browser Forum Domain
Validated OID.
2023-05-10 12:39:12 -04:00
Jacob Hoffman-Andrews ac4be89b56
grpc: add NoWaitForReady config field (#6850)
Currently we set WaitForReady(true), which causes gRPC requests to not
fail immediately if no backends are available, but instead wait until
the timeout in case a backend does become available. The downside is
that this behavior masks true connection errors. We'd like to turn it
off.

Fixes #6834
2023-05-09 16:16:44 -07:00
Samantha 9dce86fda0
boulder-wfe: Remove deprecated chains fields (#6874)
Fields CertificateChains and AlternativeCertificateChains were removed
by SRE in IN-5913.

Fixes #6873
Related to #5164
2023-05-08 15:55:20 -04:00
Samantha c453ca0571
grpc: Deprecate clientNames field (#6870)
- SRE removed in IN-8755

Fixes #6698
2023-05-08 14:49:27 -04:00
Samantha 487680629d
cmd: TLSConfig values should be string not *string (#6872)
Fixes #6737
2023-05-08 13:21:42 -04:00
Samantha c9173cc024
boulder-va: Remove deprecated Common fields stanza (#6871)
- SRE removed in IN-8752.

Fixes #6716
2023-05-08 11:47:17 -04:00
Matthew McPherrin 8427245675
OTel Integration test using jaeger (#6842)
This adds Jaeger's all-in-one dev container (with no persistent storage)
to boulder's dev docker-compose. It configures config-next/ to send all
traces there.

A new integration test creates an account and issues a cert, then
verifies the trace contains some set of expected spans.

This test found that async finalize broke spans, so I fixed that and a
few related spots where we make a new context.
2023-05-05 10:41:29 -04:00
Phil Porada f8f45f90a9
Test and build release on go1.20.4 (#6862)
[Go 1.20.4](https://groups.google.com/g/golang-announce/c/MEb0UyuSMsU)
contains security updates for the html/template package, which we use
in `//cmd/bad-key-revoker`.
2023-05-04 10:55:02 -04:00
Matthew McPherrin 5f0d2ae002
Upgrade Opentelemetry dependencies (#6855)
This upgrades otel to v1.15.0, and the /contrib/ packages to v0.41.0.
Several other packages are upgraded as transitive dependencies, notably grpc.

This contains a change to grpc: a grpc error is now mapped into a span
error only if its code is Unknown, DeadlineExceeded, Unimplemented,
Internal, Unavailable, or DataLoss. That should be helpful for us, as we
use grpc errors semantically in Boulder, especially NotFound.
2023-05-03 15:40:11 -07:00
Aaron Gable 02fa680b08
Update path to ARI endpoint (#6859)
Update the document number to the latest version, and remove the /get/
prefix since it now supports both the GET and POST portions of the spec.

Also update one piece of tooling to properly get the ARI URL from the
directory, rather than hard-coding it.
2023-05-03 15:20:51 -07:00
Matthew McPherrin b5118dde36
Stop using DIRECTORY env var in integration tests (#6854)
We only ever set it to the same value, and then read it back in
make_client, so just hardcode it there instead.

It's a bit spooky-action-at-a-distance and is process-wide with no
synchronization, which means we can't safely use different values
anyway.
2023-05-03 09:54:48 -04:00
Jacob Hoffman-Andrews a9fc1cb882
Improve cert_storage_failed_test (#6849)
Replace inline connect string with a new one in test/vars (that points
to boulder_sa_integration).

Remove comments about interpolateParams=false being required; it is not.

Add clauses to getPrecertByName to ensure it follows its documented
constraints (return the latest one).

Follow-up on #6807. Fixes #6848.
2023-05-02 15:43:07 -07:00
Aaron Gable b0d63e60fc
Fix order_ages histogram to have buckets up to 7 days (#6858)
A copy-paste typo caused the order_ages histogram to stop at 2 days,
rather than 7 days.
2023-05-02 17:39:50 -04:00
Jacob Hoffman-Andrews 1c7e0fd1d8
Store linting certificate instead of precertificate (#6807)
In order to get rid of the orphan queue, we want to make sure that
before we sign a precertificate, we have enough data in the database
that we can fulfill our revocation-checking obligations even if storing
that precertificate in the database fails. That means:

- We should have a row in the certificateStatus table for the serial.
- But we should not serve "good" for that serial until we are positive
the precertificate was issued (BRs 4.9.10).
- We should have a record in the live DB of the proposed certificate's
public key, so the bad-key-revoker can mark it revoked.
- We should have a record in the live DB of the proposed certificate's
names, so it can be revoked if we are required to revoke based on names.

The SA.AddPrecertificate method already achieves these goals for
precertificates by writing to the various metadata tables. This PR
repurposes the SA.AddPrecertificate method to write "proposed
precertificates" instead.

We already create a linting certificate before the precertificate, and
that linting certificate is identical to the precertificate that will be
issued except for the private key used to sign it (and the AKID). So for
instance it contains the right pubkey and SANs, and the Issuer name is
the same as the Issuer name that will be used. So we'll use the linting
certificate as the "proposed precertificate" and store it to the DB,
along with appropriate metadata.

In the new code path, rather than writing "good" for the new
certificateStatus row, we write a new, fake OCSP status string "wait".
This will cause us to return internalServerError to OCSP requests for
that serial (but we won't get such requests because the serial has not
yet been published). After we finish precertificate issuance, we update
the status to "good" with SA.SetCertificateStatusReady.
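
The status lifecycle described above can be modeled in a few lines. This is a toy in-memory sketch, not the SA: the `store` type and its methods are hypothetical, and the answer for serials that were never recorded is an assumption.

```go
package main

import "fmt"

// Status strings; "wait" is the placeholder written before issuance.
const (
	statusWait = "wait"
	statusGood = "good"
)

// store is a hypothetical in-memory stand-in for the certificateStatus table.
type store map[string]string

// addProposed records a serial before the precertificate is signed.
func (s store) addProposed(serial string) { s[serial] = statusWait }

// setReady flips a serial from "wait" to "good" once issuance succeeded.
func (s store) setReady(serial string) error {
	if s[serial] != statusWait {
		return fmt.Errorf("serial %s is not waiting", serial)
	}
	s[serial] = statusGood
	return nil
}

// ocspAnswer returns what a responder would say for serial.
func (s store) ocspAnswer(serial string) string {
	switch s[serial] {
	case statusGood:
		return "good"
	case statusWait:
		return "internalServerError"
	default:
		return "unknown serial" // assumption: not described in the commit
	}
}

func main() {
	s := store{}
	s.addProposed("01")
	fmt.Println(s.ocspAnswer("01")) // before issuance completes
	if err := s.setReady("01"); err != nil {
		fmt.Println(err)
	}
	fmt.Println(s.ocspAnswer("01")) // after issuance completes
}
```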

Part of #6665
2023-04-26 13:54:24 -07:00
Aaron Gable 25cae29f70
Resolve TestAkamaiPurgerDrainQueueFails race (#6844)
Fixes https://github.com/letsencrypt/boulder/issues/6837
2023-04-26 11:30:09 -04:00
Aaron Gable 97aa50977f
Give orderToAuthz2 an auto-increment ID column (#6835)
Replace the current orderToAuthz2 table schema with one that includes an
auto-increment ID column, so that this table can be partitioned simply
by ID, like all of our other partitioned tables.

Update the SA so that when it selects from a join over this table and
the authz2 table, it explicitly selects the columns from the authz2
table, to avoid the ambiguity introduced by having two columns named
"id" in the result set.

This work is already in-progress in prod, represented by IN-8916 and
IN-8928.

Fixes https://github.com/letsencrypt/boulder/issues/6820
2023-04-24 14:59:18 -07:00
Samantha 43bb293d6f
CA: Only increment lintErrorCount for true lint violations (#6843)
Return the sentinel error indicative of lint violation from
`linter.ProcessResultSet()` instead of `issuance`. This removes a
potential source of false-positives.
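
The idea of returning the sentinel only from the place that knows a lint failed can be sketched with `errors.Is`. All names below are hypothetical stand-ins, not the `linter` or `issuance` package APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// errLinting is a hypothetical sentinel marking a true lint violation.
var errLinting = errors.New("failed to pass linting")

// errSigner is a hypothetical non-lint failure used by the demo.
var errSigner = errors.New("signer unavailable")

// processResults is a hypothetical stand-in for the linter's result
// processing: it wraps the sentinel only when lints actually failed.
func processResults(failed []string) error {
	if len(failed) == 0 {
		return nil
	}
	return fmt.Errorf("%w: %v", errLinting, failed)
}

// issue is a hypothetical issuance step that can fail for non-lint reasons.
// It never adds the sentinel itself.
func issue(signErr error, failedLints []string) error {
	if signErr != nil {
		return fmt.Errorf("signing lint certificate: %w", signErr)
	}
	return processResults(failedLints)
}

// countLintError reports whether err should increment a lint error counter.
func countLintError(err error) bool {
	return errors.Is(err, errLinting)
}

func main() {
	fmt.Println(countLintError(issue(nil, []string{"e_example_lint"})))
	fmt.Println(countLintError(issue(errSigner, nil)))
}
```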
2023-04-24 14:30:50 -07:00
Samantha 7048978028
RA: Track order and authz age during NewOrder and FinalizeOrder (#6841)
Overhaul tracking of order and authz age/reuse to gather data for
#5061.

- Modify `ra.orderAges` histogram to track the method that observed the
order.
- Add observation of the order age at NewOrder time.
- Modify `ra.authzAges` histogram to track the method that observed the
authz as well as its type (valid or pending).
- Add observation of authz age at Finalize time.
- Remove `reusedValidAuthzCounter`, which erroneously counted authzs
with status pending as valid.
2023-04-24 16:30:26 -04:00