This splits rocsp-tool/main.go into main.go, client.go, issuers.go,
and inflight.go.
Adds tests for issuers and inflight, plus storeResponse in
client.go. Doesn't yet have a test for loadFromDB in client.go.
Part of #5786
Previously loadFromDB was calling cl.storeResponse, which parses, stores, and
then fetches a response, and logs it to stderr. Since we'll be storing
responses at high volume, we don't want to log them all to stderr. And
we're willing to trust that the CA signed a valid response, so we don't
need to parse it again. And we certainly don't need to fetch it right
after storing it.
Fixes #5782
Most Boulder logging is supposed to go through our logging subsystem, where a
checksum is added. However, very occasionally Boulder emits output on stdout or
stderr. For instance this can happen during panics, or if we load a pkcs11
module that emits messages on stdout or stderr.
When that happens, the logs are collected by systemd and sent into rsyslog with
the same programname as the lines that went through our logging subsystem. This
causes spurious alerts from log-validator because it can't find the checksum in
those log lines.
This change reduces the risk of spurious alerting by providing a separate metric
for "malformed log line" vs "well-formed log line with a checksum mismatch."
We'll still want to alert on "malformed log line", in case a future change to
logging causes all log lines to be malformed. But we can set the threshold for
it much higher.
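For illustration, a minimal sketch of the two-way classification. The line
format here (eight hex digits of CRC-32, a space, then the message) and the
names `classify` and `checksum` are assumptions made for the example, not
Boulder's actual log format or log-validator code.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"strings"
)

// lineStatus separates the two failure modes so each can feed its own metric.
type lineStatus int

const (
	lineOK lineStatus = iota
	lineMalformed   // no checksum field found at all (e.g. panic output)
	lineBadChecksum // checksum field present but does not match the message
)

// checksum is a hypothetical stand-in: CRC-32 (IEEE) as eight hex digits.
func checksum(msg string) string {
	return fmt.Sprintf("%08x", crc32.ChecksumIEEE([]byte(msg)))
}

// classify assumes a line looks like "<8 hex digits> <message>".
func classify(line string) lineStatus {
	parts := strings.SplitN(line, " ", 2)
	if len(parts) != 2 || len(parts[0]) != 8 ||
		strings.Trim(parts[0], "0123456789abcdef") != "" {
		return lineMalformed
	}
	if parts[0] != checksum(parts[1]) {
		return lineBadChecksum
	}
	return lineOK
}

func main() {
	counts := map[lineStatus]int{}
	lines := []string{
		checksum("issued certificate") + " issued certificate",
		"deadbeef issued certificate",
		"panic: runtime error: index out of range",
	}
	for _, l := range lines {
		counts[classify(l)]++
	}
	fmt.Println(counts[lineOK], counts[lineBadChecksum], counts[lineMalformed])
}
```

With separate counts, the alert threshold for malformed lines can be set
independently of the threshold for checksum mismatches.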
Fixes #5771
This scans the database for certificateStatus rows, gets them signed by the CA, and writes them to Redis.
Also, bump the default PoolSize for Redis to 100.
Previously we would emit them all together in one log event. The log
event had interior newlines, but our log system removes those newlines
as a matter of course, resulting in an error message that is much too
long.
This change replaces that with a single log line per error.
This reverts commit e3ce816425,
which was reviewed in https://github.com/letsencrypt/boulder/pull/5607.
This change caused database queries to exceed the maximum packet size
and fail. Because this was an opportunistic optimization, reverting it
is the safest course moving forward.
Implement our label validation logic directly as specified in the
relevant RFCs: P-Labels are a subset of XN-Labels which are a
subset of Reserved LDH Labels which are a subset of LDH Labels.
This approach allows us to much more clearly document what each
check is doing, to remove two regular expressions, and to simplify
one additional regex.
Fixes https://github.com/letsencrypt/dev-misc-tickets/issues/247
The draft requires that the renewalInfo endpoint have a
Retry-After header indicating how often clients should poll
for their renewal information. As per our previous thinking,
set this timer to 6 hours for now.
Fixes #5765
When wait-for-it is trying to connect and failing, bash emits errors on
stderr. This captures those errors and sends them to /dev/null.
This also replaces an internal wait_tcp_port function inside
entrypoint.sh with a call to wait-for-it.sh.
When a valid authorization is stored in the database the authorization
column attemptedAt is set based on the challenge `Validated` value. Use
this value in `checkAuthorizationsCAA` to determine if an authorization
is sufficiently stale to need a recheck of the CAA DNS record. Error if the
time is nil. Keeps old codepath for safety check and increments a metric
if the old codepath is used.
Previously we were using the `deploy:` config field, but that's not
supported in some cases. Splitting things out also allows us to
explicitly assign IP addresses rather than relying on their most-likely
assignment to containers.
The `Source` interface in ocsp-responder defines a `Response` function.
Add a context to the function signature so that OCSP lookups can be more
traceable and cancelable. This is also a precursor to having cancelable
parallel lookups to multiple sources.
This is a sort of proof of concept of the Redis interaction, which will
evolve into a tool for inspection and manual repair of missing entries,
if we find ourselves needing to do that.
The important bits here are rocsp/rocsp.go and
cmd/rocsp-tool/main.go. Also, the newly-vendored Redis client.
Update the SA's CertStatusMetadata methods to include the "id"
column in the resulting object; also create a new struct representing
this object and delete the old unused methods. Plumb this id through
all of ocsp-updater, and use it in the SQL queries which update rows
with new Expired statuses or with newly-signed OCSP responses.
This should allow the updates to be ever-so-slightly more efficient.
Fixes #5655
Fixes #5587
This gets us ready to add writing to Redis from ocsp-updater. The Go
redis client requires different configuration for cluster operation
than non-cluster, so we need to simulate a cluster in our integration
environment. Cluster operation requires a manual initialization step,
which you can do like so:
```
docker-compose up -d bredis
docker-compose exec bredis bash
/test/redis-create.sh
```
I still need to figure out how to make that happen automatically during
integration tests and when you run docker-compose up.
The hex values in redis.config are randomly generated passwords for the
different users.
Fixes #5723
Use `sa.SelectCertificates` instead of `sa.SelectCertificate` to
fetch the entire batch of certificates all at once, instead of doing
up to 10k individual certificate selections in serial.
Update our contributing guidelines to state that we prefer to
do error checking in a separate stanza, rather than doing error
assignment and checking on the same line. At this time, there
are fewer than 400 instances of the latter (unpreferred) pattern
in the Boulder codebase, and 1300+ instances of the former
(preferred) pattern.
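For illustration, the two patterns side by side. The function names are
made up for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort uses the preferred style: the assignment is one statement and
// the error check is its own stanza immediately after.
func parsePort(s string) (int, error) {
	port, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parsing port %q: %w", s, err)
	}
	return port, nil
}

// checkPort uses the unpreferred style: error assignment and error check
// are combined on the same line.
func checkPort(s string) error {
	if _, err := strconv.Atoi(s); err != nil {
		return fmt.Errorf("parsing port %q: %w", s, err)
	}
	return nil
}

func main() {
	port, err := parsePort("8080")
	fmt.Println(port, err)
	fmt.Println(checkPort("not-a-port"))
}
```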
Add a unit test and an integration test that both exercise the new
experimental ACME Renewal Info endpoint. These tests do not
yet validate the contents of the response, just that the appropriate
HTTP response code is returned, but they will be developed as the
code under test evolves.
Fixes #5674
Finally, the second-to-last step of switching from IssuerIDs to
IssuerNameIDs: have the CA store the IssuerNameID in the
certStatus database when first issuing precertificates and final
certificates.
This is the change we can't come back from: once this is deployed,
we've effectively changed our database schema (by changing
the semantic meaning of the certStatus table's "IssuerID" column).
Although it can be rolled back and no harm should come to anything,
rolling back (e.g. because some component actually *doesn't* handle
this gracefully) will not remove the data that was written while it
was deployed.
Part of #5152
Add functionality to purge by cache tags in our Akamai CachePurgeClient.
Use that functionality in a new manual mode of akamai-purger, which takes
a single tag with the `--tag` flag, or a file containing multiple tags
with `--tag-file`.
A tag file containing a random set of cache tags can be generated with
`printf "%x\n" $(seq 0 255) | shuf -n 5`.
Add a new feature flag to control whether or not the experimental ARI
information is exposed. Add a new entry to the Directory object which
provides the base URL for ARI requests. Add a new handler to the WFE
which parses incoming requests and returns reasonable renewalInfo.
Part of #5674
Introduce one cache tag: the last byte (hex-encoded) of the serial
number. This allows us to purge groups of responses, in chunks of
1/256 of our whole cache. We assume this is more or less evenly
distributed because serial numbers are random.
Fixes #5736
- Add new function `SelectPrecertificates` to `SA` which returns `[]CertWithID`
- Replace `admin-revoker` calls to `sa.SelectCertificate(s)` with `sa.SelectPrecertificate(s)`
- Add SQL permissions for the `revoker` user to the `precertificates` table
Fixes #5708
Update the version of golangci-lint we use in our docker image,
and update the version of the docker image we use in our tests.
Fix a couple places where we were violating lints (ineffective assign
and calling `t.Fatal` from outside the main test goroutine), and add
one lint (using math/rand) to the ignore list.
Fixes #5710
Update zlint from v3.2.0 to just past v3.3.0, pulling in both an update
to the zlint interface and a number of new and improved checks. In
particular, pull in `lint_dnsname_contains_prohibited_reserved_label`,
which checks that DNSNames do not begin with any two characters followed
by two dashes, unless those two leading characters are "xn".
Also, update our few custom lints to match the new zlint v3.3.0
interface.
Fixes #5720
Instead of getting the number of certificates to work on with a
`select count(*)` and bailing when that number has been
retrieved, get batches until the last certificate in the batch
is newer than the start window. If it is, then `getCerts` is done.
This allows repeated runs using the same hierarchy, and avoids spurious
errors from ocsp-updater saying "This CA doesn't have an issuer cert
with ID XXX".
Fixes #5721
The resulting `boulder` binary can be invoked by different names to
trigger the behavior of the relevant subcommand. For instance, symlinking
and invoking as `boulder-ca` acts as the CA. Symlinking and invoking as
`boulder-va` acts as the VA.
This reduces the .deb file size from about 200MB to about 20MB.
This works by creating a registry that maps subcommand names to `main`
functions. Each subcommand registers itself in an `init()` function. The
monolithic `boulder` binary then checks what name it was invoked with
(`os.Args[0]`), looks it up in the registry, and invokes the appropriate
`main`. To avoid conflicts, all of the old `package main` are replaced
with `package notmain`.
To get the list of registered subcommands, run `boulder --list`. This
is used when symlinking all the variants into place, to ensure the set
of symlinked names matches the entries in the registry.
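A stripped-down sketch of the registry idea. The helper names (`register`,
`lookup`, `list`) are hypothetical, and taking the base name of
`os.Args[0]` is an assumption about how the invoked name is normalized.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// registry maps a subcommand name to the function that runs it.
var registry = map[string]func(){}

// register is what each subcommand would call from its own init().
func register(name string, run func()) {
	if _, exists := registry[name]; exists {
		panic("subcommand registered twice: " + name)
	}
	registry[name] = run
}

// lookup finds the subcommand for the name the binary was invoked as.
func lookup(argv0 string) (func(), bool) {
	run, ok := registry[filepath.Base(argv0)]
	return run, ok
}

// list returns the registered names in sorted order, as for `--list`.
func list() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func init() {
	register("boulder-ca", func() { fmt.Println("acting as the CA") })
	register("boulder-va", func() { fmt.Println("acting as the VA") })
}

func main() {
	if len(os.Args) > 1 && os.Args[1] == "--list" {
		for _, name := range list() {
			fmt.Println(name)
		}
		return
	}
	run, ok := lookup(os.Args[0])
	if !ok {
		fmt.Fprintf(os.Stderr, "unknown subcommand %q\n", os.Args[0])
		os.Exit(1)
	}
	run()
}
```

Symlinking this binary as `boulder-ca` and invoking the symlink would run the
first registered function.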
Fixes #5692
The expiration mailer processes certificates in batches of size
`certLimit` (default 100). In production, it runs in daemon mode, so it
will go on to the next batch when the current one is done. However, in
local integration tests we rely on it getting all its work done in a
single run. This works when you're running from a clean slate, but if
you've run integration tests a bunch of times, there will be a bunch of
certificates from previous runs that clog up the queue, and it won't
send mail for the specific certificate the integration test is looking
for.
Solution: Set `certLimit` very high in the config.
Also, update the default times for sending mail to match what we have in
prod.
* Pipeline ocsp-updater work
Create a three stage pipeline for concurrent work of ocsp-updates.
`findStaleOCSPResponses` will send query results on a channel to
`processExpired` which will then mark expired certs and send the stale
statuses on a channel to `generateOCSPResponses` which already
concurrently signs and stores new responses.
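The shape of that pipeline, as a sketch with in-memory stand-ins for the
database and the signer. All type and function names below are simplified
or hypothetical, and forwarding every row (including expired ones) to the
last stage is an assumption of this example.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type certStatus struct {
	serial       string
	pastNotAfter bool // the certificate has expired
}

// findStale (stage 1) streams query results on a channel.
func findStale(rows []certStatus) <-chan certStatus {
	out := make(chan certStatus)
	go func() {
		defer close(out)
		for _, r := range rows {
			out <- r
		}
	}()
	return out
}

// processExpired (stage 2) marks expired certs, then forwards each status.
func processExpired(in <-chan certStatus, markExpired func(string)) <-chan certStatus {
	out := make(chan certStatus)
	go func() {
		defer close(out)
		for s := range in {
			if s.pastNotAfter {
				markExpired(s.serial)
			}
			out <- s
		}
	}()
	return out
}

// generateResponses (stage 3) signs concurrently with a fixed worker count
// and returns once the input channel is drained.
func generateResponses(in <-chan certStatus, workers int, sign func(string)) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s := range in {
				sign(s.serial)
			}
		}()
	}
	wg.Wait()
}

// runPipeline wires the stages together and reports what each one did.
func runPipeline(rows []certStatus, workers int) (expired, signed []string) {
	var mu sync.Mutex
	markExpired := func(serial string) { expired = append(expired, serial) }
	sign := func(serial string) {
		mu.Lock()
		defer mu.Unlock()
		signed = append(signed, serial)
	}
	generateResponses(processExpired(findStale(rows), markExpired), workers, sign)
	sort.Strings(signed)
	return expired, signed
}

func main() {
	rows := []certStatus{{"a", false}, {"b", true}, {"c", false}}
	expired, signed := runPipeline(rows, 2)
	fmt.Println(expired, signed)
}
```

Because the stages are connected by unbuffered channels, a slow signer
applies backpressure to the query stage instead of letting results pile up
in memory.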
Two new stats are introduced for `mark_expired` and `find_stale_ocsp`
which give visibility into the number of and the status of those calls to
the database.
- Replace `gorp.DbMap` with calls that use `sql.DB` directly
- Use `rows.Scan()` and `rows.Next()` to get query results (which opens the door to streaming the results)
- Export function `CertStatusMetadataFields` from `SA`
- Add new function `ScanCertStatusRow` to `SA`
- Add new function `NewDbSettingsFromDBConfig` to `SA`
Fixes #5642
Part of #5715
Previously, `startservers.start()` would implicitly build the binaries.
This splits out `startservers.install()` as a separate step that
must happen first. This is useful because it allows us to ensure the
`ceremony` tool has been built before we run `setupHierarchy`.
Also, add a `-s` flag to `curl` when checking whether start.py resulted
in a successful startup. This reduces the amount of log spam when it
failed to come up.
We had in place some filtering for grpc errors that we consider
spurious, but that filtering was broken. This change ensures the
filtering gets called regardless of which of the various error/warning
methods grpc calls. This removes a lot of unnecessary red from our
integration test output.
Right now, when looking at a list of Boulder CI test results, they all
say `boulder_ci_tests (go_1.17_2021-...`, which is not very informative
as to which type of test failed. This
shortens the test name to "ci", and also changes the invoked command so
more of it fits on the screen. That involves adding two new scripts,
t.sh and tn.sh, which each run `docker-compose run ... test.sh`. tn.sh
runs it with the appropriate flags to use config-next.
Add additional staleness buckets going from 24 hours
to 48 hours stale, so we can know how far beyond the
BRs we are.
Customize the tick buckets so that it doesn't top out at
reporting 10 seconds per tick (the default when no
buckets are provided).