Commit Graph

5636 Commits

Author SHA1 Message Date
Jacob Hoffman-Andrews f238409089
Split rocsp-tool into files; add some tests (#5795)
This splits rocsp-tool/main.go into main.go, client.go, issuers.go,
and inflight.go.

Adds tests for issuers and inflight, plus storeResponse in
client.go. Doesn't yet have a test for loadFromDB in client.go.

Part of #5786
2021-11-12 17:41:14 -08:00
Jacob Hoffman-Andrews 2144018d6b
Speed up load-from-db and reduce stderr noisiness (#5787)
Previously loadFromDB was calling cl.storeResponse, which parses, stores, and
then fetches a response, and logs it to stderr. Since we'll be storing
responses at high volume, we don't want to log them all to stderr. And
we're willing to trust that the CA signed a valid response, so we don't
need to parse it again. And we certainly don't need to fetch it right
after storing it.

Fixes #5782
2021-11-11 16:35:27 -08:00
Samantha 7a7f436212
log-validator: ensure that log lines contain a checksum (#5788)
Most Boulder logging is supposed to go through our logging subsystem, where a
checksum is added. However, very occasionally Boulder emits output on stdout or
stderr. For instance this can happen during panics, or if we load a pkcs11
module that emits messages on stdout or stderr.

When that happens, the logs are collected by systemd and sent into rsyslog with
the same programname as the lines that went through our logging subsystem. This
causes spurious alerts from log-validator because it can't find the checksum in
those log lines.

This change reduces the risk of spurious alerting by providing a separate metric
for "malformed log line" vs "well-formed log line with a checksum mismatch."
We'll still want to alert on "malformed log line", in case a future change to
logging causes all log lines to be malformed. But we can set the threshold for
it much higher.

Fixes #5771
2021-11-09 12:38:09 -08:00
Jacob Hoffman-Andrews 4f1934af82
Add load-from-db support to rocsp-tool (#5778)
This scans the database for certificateStatus rows, gets them signed by the CA, and writes them to Redis.

Also, bump the default PoolSize for Redis to 100.
2021-11-08 17:35:10 -08:00
Jacob Hoffman-Andrews 1a63aed5c2
Improve caa-log-checker errors (#5784)
Previously we would emit them all together in one log event. The log
event had interior newlines, but our log system removes those newlines
as a matter of course, resulting in an error message that is much too
long.

This change replaces that with a single log line per error.
2021-11-08 15:53:58 -08:00
Aaron Gable bbe53e92d0
Revert "Expiry mailer: fetch certificates in bulk" (#5780)
This reverts commit e3ce816425,
which was reviewed in https://github.com/letsencrypt/boulder/pull/5607.

This change caused database queries to exceed the maximum packet size
and fail. Because this was an opportunistic optimization, reverting it
is the safest course moving forward.
2021-11-05 13:26:06 -07:00
Aaron Gable 9a4eb56fc7
Make DNS label validation logic clearer (#5763)
Implement our label validation logic directly as specified in the
relevant RFCs: P-Labels are a subset of XN-Labels which are a
subset of Reserved LDH Labels which are a subset of LDH Labels.

This approach allows us to much more clearly document what each
check is doing, to remove two regular expressions, and to simplify
one additional regex.
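
The subset relationship described above can be sketched in Go as a chain of predicates, each built on the previous one. This is a minimal illustration based on the RFC 5890 definitions, not the code from this commit; the function names are hypothetical, and the final P-Label step (checking that the remainder is valid Punycode) is deliberately omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// isLDHLabel reports whether label is 1 to 63 octets of ASCII letters,
// digits, and hyphens, with no hyphen in the first or last position.
func isLDHLabel(label string) bool {
	if len(label) < 1 || len(label) > 63 {
		return false
	}
	if label[0] == '-' || label[len(label)-1] == '-' {
		return false
	}
	for i := 0; i < len(label); i++ {
		c := label[i]
		isLetter := (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
		isDigit := c >= '0' && c <= '9'
		if !isLetter && !isDigit && c != '-' {
			return false
		}
	}
	return true
}

// isReservedLDHLabel reports whether label is an LDH label with "--" in
// its third and fourth positions.
func isReservedLDHLabel(label string) bool {
	return isLDHLabel(label) && len(label) >= 4 && label[2:4] == "--"
}

// isXNLabel reports whether label is a Reserved LDH label that begins
// with "xn" (compared case-insensitively).
func isXNLabel(label string) bool {
	return isReservedLDHLabel(label) && strings.EqualFold(label[:2], "xn")
}

func main() {
	for _, l := range []string{"example", "xn--bcher-kva", "ab--cd", "-bad"} {
		fmt.Printf("%q: ldh=%t reserved=%t xn=%t\n",
			l, isLDHLabel(l), isReservedLDHLabel(l), isXNLabel(l))
	}
}
```

Because each predicate calls the one before it, every check only needs to document the single rule it adds.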

Fixes https://github.com/letsencrypt/dev-misc-tickets/issues/247
2021-11-05 11:42:42 -07:00
Aaron Gable a70d994ff1
ARI: Set Retry-After header to 6 hours (#5766)
The draft requires that the renewalInfo endpoint have a
Retry-After header indicating how often clients should poll
for their renewal information. As per our previous thinking,
set this timer to 6 hours for now.
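
As a rough illustration, a handler can set this header using the delay-seconds form of Retry-After. The handler name and request path below are hypothetical stand-ins, not the WFE code from this commit; only the 6-hour value comes from the message above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"time"
)

// renewalInfoRetryAfter is the polling interval named in the commit message.
const renewalInfoRetryAfter = 6 * time.Hour

// retryAfterSeconds formats a duration as whole seconds, which is one of
// the two forms HTTP allows for the Retry-After header.
func retryAfterSeconds(d time.Duration) string {
	return strconv.Itoa(int(d / time.Second))
}

// renewalInfoHandler is a hypothetical stand-in for the real endpoint.
func renewalInfoHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Retry-After", retryAfterSeconds(renewalInfoRetryAfter))
	w.WriteHeader(http.StatusOK)
}

func main() {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/renewalInfo", nil)
	renewalInfoHandler(rec, req)
	fmt.Println(rec.Header().Get("Retry-After")) // prints 21600
}
```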

Fixes #5765
2021-11-05 11:38:45 -07:00
Jacob Hoffman-Andrews 3d0a818bef
Quiet the output of wait-for-it (#5775)
When wait-for-it is trying to connect and failing, bash emits errors on
stderr. This captures those errors and sends them to /dev/null.

This also replaces an internal wait_tcp_port function inside
entrypoint.sh with a call to wait-for-it.sh.
2021-11-05 11:38:20 -07:00
Jacob Hoffman-Andrews 44d9d50a92
Check column names in ScanCertStatusMetadataRow (#5776)
This ensures we queried the right set of columns, and didn't get them
out of order.
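
A check of this kind might look like the following sketch. The helper name and the column list are assumptions made for illustration; with `database/sql`, the `got` slice would come from `rows.Columns()`.

```go
package main

import "fmt"

// checkColumns returns an error unless got matches want exactly,
// including the order of the columns.
func checkColumns(got, want []string) error {
	if len(got) != len(want) {
		return fmt.Errorf("incorrect number of columns: got %d, want %d", len(got), len(want))
	}
	for i := range want {
		if got[i] != want[i] {
			return fmt.Errorf("column %d: got %q, want %q", i, got[i], want[i])
		}
	}
	return nil
}

func main() {
	// Hypothetical column list; the real one lives in the SA.
	want := []string{"id", "serial", "status"}
	fmt.Println(checkColumns([]string{"id", "serial", "status"}, want))
	fmt.Println(checkColumns([]string{"serial", "id", "status"}, want))
}
```

Running the check before the first `Scan` turns a silent field mix-up into an explicit error.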
2021-11-05 10:51:15 -07:00
Andrew Gabbitas 98d9a12ccd
Use authorization attemptedAt date for CAA recheck (#5746)
When a valid authorization is stored in the database the authorization
column attemptedAt is set based on the challenge `Validated` value. Use
this value in `checkAuthorizationsCAA` to determine if an authorization
is sufficiently stale to need a recheck of the CAA DNS record. Return an
error if the time is nil. Keep the old codepath as a safety check, and
increment a metric whenever the old codepath is used.
2021-11-04 14:50:11 -06:00
Jacob Hoffman-Andrews 6b1ad1ce21
Enumerate bredis_N containers explicitly (#5774)
Previously we were using the `deploy:` config field, but that's not
supported in some cases. Splitting things out also allows us to
explicitly assign IP addresses rather than relying on their most-likely
assignment to containers.
2021-11-03 17:16:19 -07:00
Andrew Gabbitas 3a5f23de9e
Pass context to ocsp-responder sources (#5773)
The `Source` interface in ocsp-responder defines a `Response` function.
Add a context to the function signature so that ocsp lookups can be more
traceable and cancelable. This is also a precursor to having cancelable
parallel lookups to multiple sources.
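
To show why the context matters, here is a simplified sketch. The `Source` interface below is a hypothetical reduction (the real method takes an OCSP request and returns more than bytes); `slowSource` and `lookup` are invented for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Source is a simplified, hypothetical version of the interface: the only
// point being illustrated is the leading context parameter.
type Source interface {
	Response(ctx context.Context, serial string) ([]byte, error)
}

// slowSource simulates a backend that takes delay to answer.
type slowSource struct{ delay time.Duration }

// Response returns early with ctx.Err() if the context is canceled or
// times out before the simulated lookup finishes.
func (s slowSource) Response(ctx context.Context, serial string) ([]byte, error) {
	select {
	case <-time.After(s.delay):
		return []byte("response for " + serial), nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

// lookup bounds a single call to src with a timeout.
func lookup(src Source, serial string, timeout time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return src.Response(ctx, serial)
}

func main() {
	_, err := lookup(slowSource{delay: time.Second}, "00ff", 10*time.Millisecond)
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // prints true
}
```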
2021-11-02 17:29:42 -07:00
Jacob Hoffman-Andrews 7fab32a000
Add rocsp-tool to manually store OCSP responses in Redis (#5758)
This is a sort of proof of concept of the Redis interaction, which will
evolve into a tool for inspection and manual repair of missing entries,
if we find ourselves needing to do that.

The important bits here are rocsp/rocsp.go and
cmd/rocsp-tool/main.go. Also, the newly-vendored Redis client.
2021-11-02 11:04:03 -07:00
alexzorin 9d07942c9d
Upgrade dependency weppos/publicsuffix-go (#5769)
37 additions and 22 removals
2021-11-02 00:21:32 -06:00
Jacob Hoffman-Andrews e249267fe5
Update protobuf and golang.org/x/net (#5767) 2021-11-01 15:28:01 -07:00
Aaron Gable 6d3d80fdc6
ocsp-updater: use ID instead of Serial for updates (#5762)
Update the SA's CertStatusMetadata methods to include the "id"
column in the resulting object; also create a new struct representing
this object and delete the old unused methods. Plumb this id through
all of ocsp-updater, and use it in the SQL queries which update rows
with new Expired statuses or with newly-signed OCSP responses.

This should allow the updates to be ever-so-slightly more efficient.

Fixes #5655
Fixes #5587
2021-11-01 15:24:03 -07:00
Jacob Hoffman-Andrews ae1c14865c
Extract pretty-printer from ocsp/helper package (#5757)
This will allow other tools to easily print OCSP responses.
2021-10-28 10:37:08 -07:00
Jacob Hoffman-Andrews c1d221abe6
Add Redis to Boulder's docker-compose (#5747)
This gets us ready to add writing to Redis from ocsp-updater. The Go
redis client requires different configuration for cluster operation
than non-cluster, so we need to simulate a cluster in our integration
environment. Cluster operation requires a manual initialization step,
which you can do like so:

```
docker-compose up -d bredis
docker-compose exec bredis bash
/test/redis-create.sh
```

I still need to figure out how to make that happen automatically during
integration tests and when you run docker-compose up.

The hex values in redis.config are randomly generated passwords for the
different users.

Fixes #5723
2021-10-28 10:36:11 -07:00
Aaron Gable e3ce816425
Expiry mailer: fetch certificates in bulk (#5607)
Use `sa.SelectCertificates` instead of `sa.SelectCertificate` to
fetch the entire batch of certificates all at once, instead of doing
up to 10k individual certificate selections in serial.
2021-10-28 09:33:20 -07:00
Aaron Gable 6dff9c583a
Avoid inline error checking (#5752)
Update our contributing guidelines to state that we prefer to
do error checking in a separate stanza, rather than doing error
assignment and checking on the same line. At this time, there
are fewer than 400 instances of the latter (unpreferred) pattern
in the Boulder codebase, and 1300+ instances of the former
(preferred) pattern.
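
The two patterns can be contrasted in a short sketch. Both functions are invented for illustration and are not taken from the Boulder codebase.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort shows the preferred style: the assignment is one statement and
// the error check is a separate stanza directly below it.
func parsePort(s string) (int, error) {
	port, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parsing port %q: %w", s, err)
	}
	return port, nil
}

// isNumericInline shows the unpreferred style: the error assignment and
// the error check share a single line.
func isNumericInline(s string) bool {
	if _, err := strconv.Atoi(s); err != nil {
		return false
	}
	return true
}

func main() {
	port, err := parsePort("443")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(port, isNumericInline("abc")) // prints 443 false
}
```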
2021-10-27 15:04:20 -07:00
Aaron Gable 1a1cd24237
Add tests for the experimental `renewalInfo` endpoint (#5750)
Add a unit test and an integration test that both exercise the new
experimental ACME Renewal Info endpoint. These tests do not
yet validate the contents of the response, just that the appropriate
HTTP response code is returned, but they will be developed as the
code under test evolves.

Fixes #5674
2021-10-27 15:00:56 -07:00
Aaron Gable a57d97c6e5
CA: Store IssuerNameID in certStatus table (#5593)
Finally, the second-to-last step of switching from IssuerIDs to
IssuerNameIDs: have the CA store the IssuerNameID in the
certStatus database when first issuing precertificates and final
certificates.

This is the change we can't come back from: once this is deployed,
we've effectively changed our database schema (by changing
the semantic meaning of the certStatus table's "IssuerID" column).
Although it can be rolled back and no harm should come to anything,
rolling back (e.g. because some component actually *doesn't* handle
this gracefully) will not remove the data that was written while it
was deployed.

Part of #5152
2021-10-26 16:17:58 -07:00
Jacob Hoffman-Andrews 960fde9347
Add manual Akamai cache tag purger (#5742)
Add functionality to purge by cache tags in our Akamai CachePurgeClient.

Use that functionality in a new manual mode of akamai-purger, which takes
a single tag with the `--tag` flag, or a file containing multiple tags
with `--tag-file`.

A tag file containing a random set of cache tags can be generated with:

    printf "%x\n" $(seq 0 255) | shuf -n 5
2021-10-25 18:21:27 -07:00
Andrew Gabbitas 55a519fc33
Decrease InternetFacingBuckets histogram (#5748) 2021-10-25 17:39:28 -06:00
Aaron Gable 18f556201a
First draft of ACME Renewal Info (#5691)
Add a new feature flag to control whether or not the experimental ARI
information is exposed. Add a new entry to the Directory object which
provides the base URL for ARI requests. Add a new handler to the WFE
which parses incoming requests and returns reasonable renewalInfo.

Part of #5674
2021-10-25 14:55:25 -07:00
Jacob Hoffman-Andrews acca5f9c76
Add Edge-Cache-Tag to OCSP responses (#5739)
Introduce one cache tag: the last byte (hex-encoded) of the serial
number. This allows us to purge groups of responses, in chunks of
1/256 of our whole cache. We assume this is more or less evenly
distributed because serial numbers are random.
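
Deriving such a tag could look like the sketch below. The function name is hypothetical, and zero-padding the tag to two hex digits is an assumption; the commit message only says the last byte is hex-encoded.

```go
package main

import (
	"fmt"
	"math/big"
)

// cacheTag returns the last byte of a big-endian serial number as hex.
// Padding to two digits is an assumption made for this sketch.
func cacheTag(serialBytes []byte) string {
	if len(serialBytes) == 0 {
		// A zero serial has no bytes; treat its last byte as 0x00.
		return "00"
	}
	return fmt.Sprintf("%02x", serialBytes[len(serialBytes)-1])
}

func main() {
	serial, ok := new(big.Int).SetString("03a1b2c3d4e5f6", 16)
	if !ok {
		panic("invalid serial")
	}
	fmt.Println(cacheTag(serial.Bytes())) // prints f6
}
```

With one byte there are 256 possible tags, which is where the 1/256 chunk size comes from.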

Fixes #5736.
2021-10-25 11:07:41 -07:00
Jacob Hoffman-Andrews cdb4b930ea
Add a hostname validity checker (#5704) 2021-10-25 11:06:47 -07:00
Samantha 6e6f452945
admin-revoker: tool should only need to query the `precertificates` table (#5737)
- Add new function `SelectPrecertificates` to `SA` which returns `[]CertWithID`
- Replace `admin-revoker` calls to `sa.SelectCertificate(s)` with `sa.SelectPrecertificate(s)`
- Add SQL permissions for the `revoker` user to the `precertificates` table

Fixes #5708
2021-10-22 18:31:30 -07:00
Aaron Gable eb5d0e9ba9
Update golangci-lint from v1.29.0 to v1.42.1 (#5745)
Update the version of golangci-lint we use in our docker image,
and update the version of the docker image we use in our tests.
Fix a couple places where we were violating lints (ineffective assign
and calling `t.Fatal` from outside the main test goroutine), and add
one lint (using math/rand) to the ignore list.

Fixes #5710
2021-10-22 16:26:59 -07:00
Aaron Gable 011e453df6
Update zlint to check for reserved IDNs (#5743)
Update zlint from v3.2.0 to just past v3.3.0, pulling in both an update
to the zlint interface and a number of new and improved checks. In
particular, pull in `lint_dnsname_contains_prohibited_reserved_label`,
which checks that DNSNames do not begin with any two characters followed
by two dashes, unless those two leading characters are "xn".

Also, update our few custom lints to match the new zlint v3.3.0
interface.

Fixes #5720
2021-10-22 12:37:09 -07:00
Samantha fabb9cd9ea
admin-revoker: Batch files should be scanned not loaded (#5738) 2021-10-21 17:15:08 -07:00
Andrew Gabbitas dbc15ce4f5
Performance improvement for cert-checker (#5722)
Instead of getting the number of certificates to work on with a
`select count(*)` and bailing when that number has been
retrieved, get batches until the last certificate in the batch
is newer than the start window. If it is, then `getCerts` is done.
2021-10-21 08:55:38 -06:00
Jacob Hoffman-Andrews ba0ea090b2
integration: save hierarchy across runs (#5729)
This allows repeated runs using the same hierarchy, and avoids spurious
errors from ocsp-updater saying "This CA doesn't have an issuer cert
with ID XXX"

Fixes #5721
2021-10-20 17:06:33 -07:00
Jacob Hoffman-Andrews 23dd1e21f9
Build all boulder binaries into a single binary (#5693)
The resulting `boulder` binary can be invoked by different names to
trigger the behavior of the relevant subcommand. For instance, symlinking
and invoking as `boulder-ca` acts as the CA. Symlinking and invoking as
`boulder-va` acts as the VA.

This reduces the .deb file size from about 200MB to about 20MB.

This works by creating a registry that maps subcommand names to `main`
functions. Each subcommand registers itself in an `init()` function. The
monolithic `boulder` binary then checks what name it was invoked with
(`os.Args[0]`), looks it up in the registry, and invokes the appropriate
`main`. To avoid conflicts, all of the old `package main` are replaced
with `package notmain`.

To get the list of registered subcommands, run `boulder --list`. This
is used when symlinking all the variants into place, to ensure the set
of symlinked names matches the entries in the registry.
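
The registry mechanism can be sketched in a single file as follows. This is a simplified illustration, not Boulder's actual implementation: the helper names are hypothetical, and in the real layout each subcommand would register itself from its own package's `init()`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// registry maps a subcommand name to the function that acts as its main.
var registry = map[string]func(){}

// register adds a subcommand and panics on a duplicate name.
func register(name string, f func()) {
	if _, dup := registry[name]; dup {
		panic("duplicate subcommand: " + name)
	}
	registry[name] = f
}

// In this sketch both registrations share one init for brevity.
func init() {
	register("boulder-ca", func() { fmt.Println("acting as the CA") })
	register("boulder-va", func() { fmt.Println("acting as the VA") })
}

// lookup finds the subcommand matching the name the binary was invoked as.
func lookup(argv0 string) (func(), bool) {
	f, ok := registry[filepath.Base(argv0)]
	return f, ok
}

// names returns the registered subcommand names in sorted order.
func names() []string {
	out := make([]string, 0, len(registry))
	for n := range registry {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	if len(os.Args) > 1 && os.Args[1] == "--list" {
		for _, n := range names() {
			fmt.Println(n)
		}
		return
	}
	f, ok := lookup(os.Args[0])
	if !ok {
		fmt.Fprintf(os.Stderr, "unknown subcommand %q; try --list\n", filepath.Base(os.Args[0]))
		return
	}
	f()
}
```

Symlinking the compiled binary to `boulder-ca` and invoking the symlink would then run the CA entry.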

Fixes #5692
2021-10-20 17:05:45 -07:00
Jacob Hoffman-Andrews 803d6cfbf6
Fix leftover test.sh in matrix. (#5730) 2021-10-20 08:23:00 -07:00
Jacob Hoffman-Andrews 11bda3e486
Add error counter for TLD (#5717) 2021-10-19 15:57:31 -07:00
Samantha 139412f3b5
ocsp-updater: Add test for empty table (#5716)
Test that no errors are returned if the `CertificateStatus` table contains no rows.
2021-10-19 14:53:46 -07:00
Jacob Hoffman-Andrews dc742fc320
Fix expiration-mailer integration test locally. (#5719)
The expiration mailer processes certificates in batches of size
`certLimit` (default 100). In production, it runs in daemon mode, so it
will go on to the next batch when the current one is done. However, in
local integration tests we rely on it getting all its work done in a
single run. This works when you're running from a clean slate, but if
you've run integration tests a bunch of times, there will be a bunch of
certificates from previous runs that clog up the queue, and it won't
send mail for the specific certificate the integration test is looking
for.

Solution: Set `certLimit` very high in the config.

Also, update the default times for sending mail to match what we have in
prod.
2021-10-18 19:51:34 -07:00
Andrew Gabbitas 536afb86b4
Pipeline ocsp-updater work (#5687)
Create a three stage pipeline for concurrent work of ocsp-updates.
`findStaleOCSPResponses` will send query results on a channel to
`processExpired` which will then mark expired certs and send the stale
statuses on a channel to `generateOCSPResponses` which already
concurrently signs and stores new responses.

Two new stats are introduced for `mark_expired` and `find_stale_ocsp`
which give visibility into the number of and the status of those calls to
the database.
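
The shape of such a pipeline can be sketched with channels. The stage names below echo the commit message, but the types, the integer timestamps, and the decision to forward every status to the last stage are assumptions for illustration; database access and signing are replaced with stand-ins.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// status is a hypothetical, reduced form of a certificate status row.
type status struct {
	serial    string
	notAfter  int // simplified timestamp
	isExpired bool
}

// findStale is stage 1: it streams query results onto a channel.
func findStale(rows []status) <-chan status {
	out := make(chan status)
	go func() {
		defer close(out)
		for _, r := range rows {
			out <- r
		}
	}()
	return out
}

// processExpired is stage 2: it marks statuses whose notAfter has passed
// and forwards every status to the next stage.
func processExpired(in <-chan status, now int) <-chan status {
	out := make(chan status)
	go func() {
		defer close(out)
		for s := range in {
			if !s.isExpired && now > s.notAfter {
				s.isExpired = true // stand-in for a database update
			}
			out <- s
		}
	}()
	return out
}

// generateResponses is stage 3: workers consume statuses concurrently and
// the collected results are returned in sorted order.
func generateResponses(in <-chan status, workers int) []string {
	var (
		mu     sync.Mutex
		signed []string
		wg     sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s := range in {
				resp := fmt.Sprintf("%s expired=%t", s.serial, s.isExpired)
				mu.Lock()
				signed = append(signed, resp)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	sort.Strings(signed)
	return signed
}

func main() {
	rows := []status{{serial: "01", notAfter: 5}, {serial: "02", notAfter: 50}}
	fmt.Println(generateResponses(processExpired(findStale(rows), 10), 2))
}
```

Closing each output channel when its stage finishes is what lets the downstream `range` loops terminate.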
2021-10-18 14:06:45 -06:00
Samantha 99502b1ffb
ocsp-updater: use rows.Scan() to get query results (#5656)
- Replace `gorp.DbMap` with calls that use `sql.DB` directly
- Use `rows.Scan()` and `rows.Next()` to get query results (which opens the door to streaming the results)
- Export function `CertStatusMetadataFields` from `SA`
- Add new function `ScanCertStatusRow` to `SA`
- Add new function `NewDbSettingsFromDBConfig` to `SA`

Fixes #5642
Part Of #5715
2021-10-18 10:33:09 -07:00
Jacob Hoffman-Andrews d3302cbb50
Separate install / build steps of tests. (#5714)
Previously, `startservers.start()` would implicitly build the binaries.
This splits out `startservers.install()` as a separate step that
must happen first. This is useful because it allows us to ensure the
`ceremony` tool has been built before we run `setupHierarchy`.

Also, add a `-s` flag to `curl` when checking whether start.py resulted
in a successful startup. This reduces the amount of log spam when it
failed to come up.
2021-10-15 09:30:55 -07:00
Jacob Hoffman-Andrews 25ef9c3bfc
Shorten maximum serial length. (#5694)
Fixes #5690
2021-10-14 16:49:27 -07:00
Jacob Hoffman-Andrews 1677aeaf95
expiration-mailer: log attempted and failed sends (#5702)
This will help us do better analysis of how many emails we're sending
and figure out any failure patterns.
2021-10-14 16:46:01 -07:00
Jacob Hoffman-Andrews d3c027c93d
Fix log filtering for grpc errors. (#5712)
We had in place some filtering for grpc errors that we consider
spurious, but that filtering was broken. This change ensures the
filtering gets called regardless of which of the various error/warning
methods grpc calls. This removes a lot of unnecessary red from our
integration test output.
2021-10-14 16:45:16 -07:00
Jacob Hoffman-Andrews ac125dc60f
Make test matrix results more readable (#5711)
Right now when looking at a list of Boulder CI test results, they all
say:

boulder_ci_tests (go_1.17_2021-...

Which is not very informative as to which type of test failed. This
shortens the test name to "ci", and also changes the invoked command so
more of it fits on the screen. That involves adding two new scripts,
t.sh and tn.sh, which each run `docker-compose run ... test.sh`. tn.sh
runs it with the appropriate flags to use config-next.
2021-10-14 16:15:57 -07:00
Andrew Gabbitas ba673673a4
Match revocation reason and request signing method (#5713)
Add more detailed logging about request signing methods
2021-10-14 15:39:22 -06:00
Samantha 18e5f405ed
Add script to perform weppos/publicsuffix-go upgrades (#5661)
Part Of #5650
2021-10-13 18:45:08 -07:00
Jacob Hoffman-Andrews 1309da6275
Consolidate name resolution in sd-test-srv. (#5709)
Previously we relied on aliases in Docker's DNS for some names, and
sd-test-srv for some other names. This moves them all into sd-test-srv.
2021-10-13 18:38:38 -07:00
Aaron Gable 6292311780
ocsp-updater: improve tick and staleness buckets (#5640)
Add additional staleness buckets going from 24 hours
to 48 hours stale, so we can know how far beyond the
BRs we are.

Customize the tick buckets so that it doesn't top out at
reporting 10 seconds per tick (the default when no
buckets are provided).
2021-10-12 15:12:27 -07:00