In #6478, we stopped passing through Redis errors to the top-level
Responder object, preferring instead to live-sign. As part of that
change, we logged the Redis errors so they wouldn't disappear. However,
the sample rate for those errors was hard-coded to 1-in-1000, instead of
following the LogSampleRate configured in the JSON.
This adds a field to redisSource for logSampleRate, and passes it
through from the JSON config in ocsp-responder/main.go.
Part of #7091
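As a rough illustration of sampled error logging driven by a configured rate, here is a minimal sketch. The `redisSource` below is a stripped-down stand-in, and the `intn` and `log` fields, the `logSampled` method, and the "zero disables logging" rule are assumptions made for the example rather than the actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// errExample is a placeholder error used by the demonstration below.
var errExample = errors.New("connection refused")

// redisSource is a stripped-down stand-in; only sampling-related fields
// are shown.
type redisSource struct {
	// logSampleRate of N means roughly 1 in N errors is logged.
	// In this sketch, zero disables logging (an assumption).
	logSampleRate int
	// intn is injectable so the sampling decision can be tested.
	intn func(n int) int
	// log receives the already-formatted message.
	log func(msg string)
}

// logSampled logs err for roughly one in every logSampleRate calls.
func (src *redisSource) logSampled(err error) {
	if src.logSampleRate > 0 && src.intn(src.logSampleRate) == 0 {
		src.log(fmt.Sprintf("looking up OCSP response in Redis: %s", err))
	}
}

func main() {
	src := &redisSource{
		logSampleRate: 1000, // would be passed through from the JSON config
		intn:          rand.Intn,
		log:           func(msg string) { fmt.Println(msg) },
	}
	src.logSampled(errExample)
}
```

Injecting the random source keeps the sampling decision deterministic under test.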
Enable the errcheck linter. Update the way we express exclusions to use
the new, non-deprecated, non-regex-based format. Fix all places where we
began accidentally violating errcheck while it was disabled.
Previously, the live-signing routine was looking for
`rocsp.ErrRedisNotFound` errors in order to increment the
`certificate_not_found` metrics. But this was a bug, copy-pasted from
code higher in the file that does a similar check. The live-signing code
actually returns `responder.ErrNotFound`. Check for that error instead,
to properly increment our metrics.
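The distinction between the two sentinel errors can be sketched as follows. The two error variables are local stand-ins for `rocsp.ErrRedisNotFound` and `responder.ErrNotFound`, and the `liveSignResult` helper and the "success" and "error" labels are hypothetical; only `certificate_not_found` comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for rocsp.ErrRedisNotFound and responder.ErrNotFound.
var (
	errRedisNotFound     = errors.New("redis: response not found")
	errResponderNotFound = errors.New("responder: response not found")
)

// liveSignResult returns the metric label for a live-signing outcome.
// Only the responder-level not-found error maps to certificate_not_found,
// because that is what the live-signing path returns.
func liveSignResult(err error) string {
	switch {
	case err == nil:
		return "success"
	case errors.Is(err, errResponderNotFound):
		return "certificate_not_found"
	default:
		return "error"
	}
}

func main() {
	wrapped := fmt.Errorf("live signing: %w", errResponderNotFound)
	fmt.Println(liveSignResult(wrapped))          // certificate_not_found
	fmt.Println(liveSignResult(errRedisNotFound)) // error
}
```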
In live.Source, translate berrors.NotFound (returned by RA when the
certificate is expired) into responder.ErrNotFound (which causes an
Unauthorized response rather than a 5xx).
In the Redis source, remove the special case that will return a stale
response if live signing fails, and simply pass through the error from
the live source.
Before this fix, if we found a stale response in Redis, tried to get a
fresh response, and found that the certificate was expired, we would
have served the stale response rather than our usual 404 for expired
certificates. Since that messes with our metrics, we don't want to do
it.
Also, fix an incorrect use of `%w` in log.Warningf.
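A minimal sketch of the error translation and the removed stale fallback, under stated assumptions: the error variables stand in for `berrors.NotFound` and the responder's not-found error, and `liveSign` and `refreshStale` are hypothetical helpers that take the RA's error as a parameter so the flow can be shown without a real RA.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for berrors.NotFound and the responder's not-found error.
var (
	errRANotFound        = errors.New("ra: certificate not found or expired")
	errResponderNotFound = errors.New("responder: not found")
)

// liveSign stands in for the live source. It translates the RA's
// not-found error into the responder's not-found error, and wraps
// any other failure.
func liveSign(raErr error) ([]byte, error) {
	if errors.Is(raErr, errRANotFound) {
		return nil, errResponderNotFound
	}
	if raErr != nil {
		return nil, fmt.Errorf("live signing: %w", raErr)
	}
	return []byte("fresh"), nil
}

// refreshStale stands in for the Redis source's handling of a stale
// entry. The stale response is deliberately never returned: a live
// signing error is passed through as-is.
func refreshStale(stale []byte, raErr error) ([]byte, error) {
	resp, err := liveSign(raErr)
	if err != nil {
		return nil, err
	}
	return resp, nil
}

func main() {
	_, err := refreshStale([]byte("stale"), errRANotFound)
	fmt.Println(errors.Is(err, errResponderNotFound)) // true
}
```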
Give less confusing names to the metrics in checked_redis_source, e.g.
"revocation_re_sign_success" instead of "sign_and_save_success".
Also use a new enum type as the `cause` parameter to signAndSave, to
make it clear what should be passed.
Finally, in redis_source, split `counter` into two separate Prometheus
counters: one for requests in general, and a separate one for
signAndSave. The counter for signAndSave has two labels: cause and
result.
Fixes #6339
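The shape of this change might look roughly like the sketch below. The cause values, the "success" and "failure" result labels, and the map-based `counters` type (a stand-in for Prometheus counter vectors, used here to keep the example free of external dependencies) are all assumptions for illustration.

```go
package main

import "fmt"

// signAndSaveCause is an enum-like type for the cause parameter.
// The specific values below are hypothetical.
type signAndSaveCause string

const (
	causeStale    signAndSaveCause = "stale"
	causeNotFound signAndSaveCause = "not_found"
	causeMismatch signAndSaveCause = "mismatch"
)

// signAndSaveLabels is the label pair for the signAndSave counter.
type signAndSaveLabels struct {
	cause  signAndSaveCause
	result string
}

// counters is a map-based stand-in for two separate counter vectors.
type counters struct {
	requests    map[string]int            // labeled by result
	signAndSave map[signAndSaveLabels]int // labeled by cause and result
}

func newCounters() *counters {
	return &counters{
		requests:    map[string]int{},
		signAndSave: map[signAndSaveLabels]int{},
	}
}

// recordRequest increments the general request counter.
func (c *counters) recordRequest(result string) { c.requests[result]++ }

// recordSignAndSave increments the signAndSave counter for a cause.
func (c *counters) recordSignAndSave(cause signAndSaveCause, ok bool) {
	result := "failure"
	if ok {
		result = "success"
	}
	c.signAndSave[signAndSaveLabels{cause: cause, result: result}]++
}

func main() {
	c := newCounters()
	c.recordRequest("success")
	c.recordSignAndSave(causeStale, true)
	c.recordSignAndSave(causeNotFound, false)
	fmt.Println(c.requests["success"])
	fmt.Println(c.signAndSave[signAndSaveLabels{cause: causeStale, result: "success"}])
}
```

A string-based type still accepts untyped string literals, so it documents the intended values at the call site rather than strictly enforcing them.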
For multiSource, split out checkSecondary's metrics into their own
counter. Treat NotFound as a separate error type (so we can more
clearly distinguish the half-hourly pattern of fetches for expired
certificates).
In redisSource, add a histogram for the ages of responses fetched from
cache (regardless of whether they are served or not). This parallels
ocsp_respond_ages in ocsp/responder.go, but may show ages beyond the
compliance limit, even under normal operations, because it is checked
before signAndSave is called.
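To illustrate why the cache-age histogram can show ages beyond the limit, here is a minimal sketch that records the age of every fetched response before any staleness decision. The `ageHistogram` type is a simplified, non-cumulative stand-in for a Prometheus histogram, and the bucket bounds and the `observeFetched` helper are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// day is a convenience unit for the example bucket bounds.
var day = 24 * time.Hour

// ageHistogram is a simplified stand-in for a Prometheus histogram.
// counts[i] holds observations that fall at or below bounds[i] and above
// the previous bound; the final slot collects everything larger.
type ageHistogram struct {
	bounds []time.Duration
	counts []int
}

func newAgeHistogram(bounds ...time.Duration) *ageHistogram {
	return &ageHistogram{bounds: bounds, counts: make([]int, len(bounds)+1)}
}

// observe places one age into the first bucket whose bound it fits.
func (h *ageHistogram) observe(age time.Duration) {
	for i, b := range h.bounds {
		if age <= b {
			h.counts[i]++
			return
		}
	}
	h.counts[len(h.bounds)]++
}

// observeFetched records the age of a cached response at fetch time,
// regardless of whether it ends up being served.
func observeFetched(h *ageHistogram, thisUpdate, now time.Time) {
	h.observe(now.Sub(thisUpdate))
}

func main() {
	h := newAgeHistogram(day, 2*day, 4*day)
	now := time.Date(2022, 8, 1, 0, 0, 0, 0, time.UTC)
	observeFetched(h, now.Add(-36*time.Hour), now) // 1.5 days old
	observeFetched(h, now.Add(-5*day), now)        // older than the last bound
	fmt.Println(h.counts)                          // [0 1 0 1]
}
```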
Previously we used "ExpectedFreshness" to control how frequently the
Redis source would request re-signing of stale entries. But that field
also controls whether multi_source is willing to serve a MariaDB
response. It's better to split these into two values.
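The split can be pictured as two independent durations, sketched below. Only `ExpectedFreshness` is named in the description above; the `LiveSigningPeriod` field name, the example values, the helper methods, and the direction of each comparison are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// freshnessConfig holds the two values after the split.
type freshnessConfig struct {
	// ExpectedFreshness bounds the age of a MariaDB response that
	// multi_source is willing to serve.
	ExpectedFreshness time.Duration
	// LiveSigningPeriod (hypothetical name) is the age past which the
	// Redis source requests re-signing of an entry.
	LiveSigningPeriod time.Duration
}

// exampleConfig uses made-up values to show the two settings diverging.
var exampleConfig = freshnessConfig{
	ExpectedFreshness: 61 * time.Hour,
	LiveSigningPeriod: 60 * time.Hour,
}

// needsResign reports whether a cached response is old enough to re-sign.
func (c freshnessConfig) needsResign(age time.Duration) bool {
	return age > c.LiveSigningPeriod
}

// mariaDBServable reports whether a MariaDB response is fresh enough.
func (c freshnessConfig) mariaDBServable(age time.Duration) bool {
	return age <= c.ExpectedFreshness
}

func main() {
	age := 60*time.Hour + 30*time.Minute
	fmt.Println(exampleConfig.needsResign(age), exampleConfig.mariaDBServable(age)) // true true
}
```

With a single shared value, the age in this example could not be both "due for re-signing" and "still servable from MariaDB".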
This enables ocsp-responder to talk to the RA and request freshly signed
OCSP responses.
ocsp/responder/redis_source is moved to ocsp/responder/redis/redis_source.go
and significantly modified. Instead of assuming a response is always available
in Redis, it wraps a live-signing source. When a response is not available,
it attempts a live signing.
If live signing succeeds, the Redis responder returns the result right away
and attempts to write a copy to Redis on a goroutine using a background
context.
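The miss path described above can be sketched roughly as follows. The `cache` type is an in-memory stand-in for Redis, and `signer`, `cachingSource`, the one-second timeout, and the `stored` channel (present only so the demonstration can wait for the background write) are all assumptions rather than the actual implementation.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// signer stands in for the live-signing source.
type signer func(ctx context.Context, serial string) ([]byte, error)

// cache is an in-memory stand-in for Redis.
type cache struct {
	mu   sync.Mutex
	data map[string][]byte
}

func (c *cache) get(serial string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	resp, ok := c.data[serial]
	return resp, ok
}

func (c *cache) put(_ context.Context, serial string, resp []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[serial] = resp
}

// cachingSource wraps a signer and falls back to it on a cache miss.
type cachingSource struct {
	cache  *cache
	sign   signer
	stored chan string // signals finished background writes (demo only)
}

func (s *cachingSource) response(ctx context.Context, serial string) ([]byte, error) {
	if resp, ok := s.cache.get(serial); ok {
		return resp, nil
	}
	resp, err := s.sign(ctx, serial)
	if err != nil {
		return nil, err
	}
	// Write on a separate goroutine with a background context, so that
	// cancellation of the request context does not abort the write.
	go func() {
		bg, cancel := context.WithTimeout(context.Background(), time.Second)
		defer cancel()
		s.cache.put(bg, serial, resp)
		s.stored <- serial
	}()
	return resp, nil
}

// demo performs one miss and reports whether the response was cached.
func demo() (resp []byte, cached bool, err error) {
	src := &cachingSource{
		cache:  &cache{data: map[string][]byte{}},
		sign:   func(context.Context, string) ([]byte, error) { return []byte("signed"), nil },
		stored: make(chan string, 1),
	}
	resp, err = src.response(context.Background(), "00ff")
	if err != nil {
		return nil, false, err
	}
	<-src.stored // wait for the background write to finish
	_, cached = src.cache.get("00ff")
	return resp, cached, nil
}

func main() {
	resp, cached, err := demo()
	fmt.Println(string(resp), cached, err)
}
```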
To make things more efficient, I eliminated an unneeded ocsp.ParseResponse
from the storage path, and factored out a FakeResponse helper to make
the unit tests more manageable.
Commits should be reviewable one-by-one.
Fixes #6191