This PR adds a new configuration block specifically for the otelhttp
instrumentation. This block is separate from the existing
"opentelemetry" configuration, and is only relevant when using otelhttp
instrumentation. It does not share any codepath with the existing
configuration, so it is at the top level to indicate which services it
applies to.
There's a bit of plumbing new configuration through. I've adopted the
measured_http package to also set up opentelemetry instead of just
metrics, which should hopefully allow any future changes to be smaller
(just config & there) and more consistent between the wfe2 and ocsp
responder.
There's one option here now, which disables setting
[otelhttp.WithPublicEndpoint](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#WithPublicEndpoint).
This option is designed to do exactly what we want: Don't accept
incoming spans as parents of the new span created in the server.
Previously we had a setting to disable parent-based sampling to help
with this problem, which doesn't really make sense anymore, so let's
just remove it and simplify that setup path. The default of "false" is
designed to be the safe option. It's set to True in the test/ configs
for integration tests that use traces, and I expect we'll likely set it
true in production eventually once the LBs are configured to handle
tracing themselves.
Fixes#6851
The iotuil package has been deprecated since go1.16; the various
functions it provided now exist in the os and io packages. Replace all
instances of ioutil with either io or os, as appropriate.
These new linters are almost all part of golangci-lint's collection
of default linters, that would all be running if we weren't setting
`disable-all: true`. By adding them, we now have parity with the
default configuration, as well as the additional linters we like.
Adds the following linters:
* unconvert
* deadcode
* structcheck
* typecheck
* varcheck
* wastedassign
Completely refactor the way we organize our code related to OCSP.
- Move it all into one `//ocsp/` package, rather than having multiple
top-level packages.
- Merge the OCSP updater's config sub-package with its parent
(since it isn't necessary to break it out to avoid cyclic imports).
- Remove all `Source` logic from ocsp-responder's `main.go`, because
it was difficult to mentally trace the control flow there.
- Replace that logic with a set of composable `Source`s in the
`//ocsp/responder/` package, each of which is good at just one thing.
- Update the way the filters work to make sure that the request's
`IssuerKeyHash` and the response's `ResponderName` can both
be derived from the same issuer certificate, ensuring that the req and
resp are correctly matched.
- Split the metrics into a separate metric for each `Source`, so we can
tell what all of them are doing, not just aggregate behavior.
- Split the tests into individual files for each `Source`, and update them
for the new public interfaces.
This is the first step in moving OCSP responses from mysql to redis.
Adds support for parallel lookups to mysql and redis. The mysql source
remains the source of truth. If the secondaryLookup [redis] succeeds,
compare against the primaryLookup [mysql] and return if they concur
that the status is the same and the redis source is at least as fresh
as mysql.
There are checks on the database response for `certStatus.IsExpired`,
`certStatus.OCSPLastUpdated.IsZero()` and
`!src.filter.responseMatchesIssuer`.
The expired check isn't necessary for redis because the response will
be set with a ttl and drop out of redis when it reaches the ttl, and
delivering a response for an expired certificate until that happens
isn't a problem.
The `certStatus.OCSPLastUpdated.IsZero()` check is a MySQL check that
isn't needed in redis.
The `responseMatchesIssuer` check is important and will need to be
checked in some form before MySQL is no longer the source of truth.
There is another project to check issuer for responses and isn't scoped
for this change.
The `Source` interface in ocsp-responder defines a `Response` function.
Add a context to the function signature so that ocsp lookups can be more
tracable and cancelable. This is also a precursor to having cancelable
parallel lookups to multiple sources.
Introduce one cache tag: the last byte (hex-encoded) of the serial
number. This allows us to purge groups of responses, in chunks of
1/256 of our whole cache. We assume this is more or less evenly
distributed because serial numbers are random.
Fixes#5736.
The resulting `boulder` binary can be invoked by different names to
trigger the behavior of the relevant subcommand. For instance, symlinking
and invoking as `boulder-ca` acts as the CA. Symlinking and invoking as
`boulder-va` acts as the VA.
This reduces the .deb file size from about 200MB to about 20MB.
This works by creating a registry that maps subcommand names to `main`
functions. Each subcommand registers itself in an `init()` function. The
monolithic `boulder` binary then checks what name it was invoked with
(`os.Args[0]`), looks it up in the registry, and invokes the appropriate
`main`. To avoid conflicts, all of the old `package main` are replaced
with `package notmain`.
To get the list of registered subcommands, run `boulder --list`. This
is used when symlinking all the variants into place, to ensure the set
of symlinked names matches the entries in the registry.
Fixes#5692
Change `CertificateStatus.IssuerID` from `*int64` to just an
`int64`. It might make sense for this field to be nillable in a world
where we want to distinguish between it being missing and it
being zero, but none of our code actually does that: we error
out either way.
Part of #5152
When making an OCSP request, the client provides three pieces of
information: the URL which it is querying to get OCSP info, the
hash of the issuer public which issued the cert in question, and
the serial number of the cert in question. In Boulder, the first
of these is only provided implicitly, based on which instance of
ocsp-responder is handling the request: we ensure (via configs)
that the ocsp-responder handling a given OCSP AIA URL has the
corresponding issuer cert loaded in memory.
When handling a request, the ocsp-responder checks three things:
that the request is using SHA1 to hash the issuer public key, that
the requested issuer public key matches one of the loaded issuer
certs, and that the requested serial number is one we could have
issued (i.e. has the correct prefix). It relies on the database
query to filter out requests for non-existent serials.
However, this means that a request to an ocsp-responder instance
with issuer cert A loaded could receive and handle a request which
specifies cert A as the issuer, but names a serial which was actually
issued by issuer cert B. The checks all pass and the database lookup
succeeds. But the returned OCSP response is for a certificate that
was issued by a different issuer, and the response itself was
signed by that other issuer.
In order to resolve this potentially confusing situation, this change
adds one additional check to the ocsp-responder: after it has
retrieved the ocsp response, it looks up which issuer produced that
ocsp response, confirms that it has that issuer cert loaded in
memory, and confirms that its issuer key hash matches that in the
original request.
There is still one wrinkle if issuer certs A and B are both loaded
in one ocsp-responder, and that one ocsp-responder is handling OCSP
requests to both of their AIA OCSP URLs. In this case, it is possible
that a request to a.ocsp.com, but requesting OCSP for a cert issued
by B, could still have its request answered. This is because the
ocsp-responder itself does not know which URL was requested. But
regardless, this change does guarantee that the response will match
the contents of the request (or no response will be given), no matter
what URL that request was sent to.
Fixes#5182
The ocsp-responder takes a path to a certificate file as one of
its config values. It uses this path as one of the inputs when
constructing its DBSource, the object responsible for querying
the database for pregenerated OCSP responses to fulfill requests.
However, this certificate file is not necessary to query the
database; rather, it only acts as a filter: OCSP requests whose
IssuerKeyHash do not match the hash of the loaded certificate are
rejected outright, without querying the DB. In addition, there is
currently only support for a single certificate file in the config.
This change adds support for multiple issuer certificate files in
the config, and refactors the pre-database filtering of bad OCSP
requests into a helper object dedicated solely to that purpose.
Fixes#5119
Simplify database interactions
This change is a result of an audit of all places where
Go code directly constructs SQL queries and executes them
against a dbMap, with the goal of eliminating all instances
of constructing a well-known object type (such as a
core.CertificateStatus) from explicitly-listed database columns.
Instead, we should be relying on helper functions defined in the
sa itself to determine which columns are relevant for the
construction of any given object.
This audit did not find many places where this was occurring. It
did reveal a few simplifications, which are contained in this
change:
1) Greater use of existing SelectFoo methods provided by models.go
2) Streamlining of various SelectSingularFoo methods to always
select by serial string, rather than user-provided WHERE clause
3) One spot (in ocsp-responder) where using a well-known type seemed
better than using a more minimal custom type
Addresses #4899
In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back.
There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case, dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change.
Fixes#4591.
Integrates the cfssl/ocsp responder code directly into boulder. I've tried to
pare down the existing code to only the bits we actually use and have removed
some generic interfaces in places in favor of directly using our boulder
specific interfaces.
Fixes#4427.
The ocsp-updater ocspStaleMaxAge config var has to be bumped up to ~7 months so that when it is run after the six-months-ago run it will actually update the ocsp responses generated during that period and mark the certificate status row as expired.
Fixes#4338.
Right now if ocsp-responder gets flooded with traffic, it will have a number of requests that
spend long enough waiting for an available connection that the reverse proxy will have given
up on them before they get a chance to execute the SQL query. Add a timeout parameter so
ocsp-responder can gracefully shed this load rather than try to do pointless work.
Fixes#3836.
```
$ ./test.sh
ok github.com/cloudflare/cfssl/api 1.023s coverage: 81.1% of statements
ok github.com/cloudflare/cfssl/api/bundle 1.464s coverage: 87.2% of statements
ok github.com/cloudflare/cfssl/api/certadd 16.766s coverage: 86.8% of statements
ok github.com/cloudflare/cfssl/api/client 1.062s coverage: 51.9% of statements
ok github.com/cloudflare/cfssl/api/crl 1.075s coverage: 75.0% of statements
ok github.com/cloudflare/cfssl/api/gencrl 1.038s coverage: 72.5% of statements
ok github.com/cloudflare/cfssl/api/generator 1.478s coverage: 33.3% of statements
ok github.com/cloudflare/cfssl/api/info 1.085s coverage: 84.1% of statements
ok github.com/cloudflare/cfssl/api/initca 1.050s coverage: 90.5% of statements
ok github.com/cloudflare/cfssl/api/ocsp 1.114s coverage: 93.8% of statements
ok github.com/cloudflare/cfssl/api/revoke 3.063s coverage: 75.0% of statements
ok github.com/cloudflare/cfssl/api/scan 2.988s coverage: 62.1% of statements
ok github.com/cloudflare/cfssl/api/sign 2.680s coverage: 83.3% of statements
ok github.com/cloudflare/cfssl/api/signhandler 1.114s coverage: 26.3% of statements
ok github.com/cloudflare/cfssl/auth 1.010s coverage: 68.2% of statements
ok github.com/cloudflare/cfssl/bundler 22.078s coverage: 84.5% of statements
ok github.com/cloudflare/cfssl/certdb/dbconf 1.013s coverage: 84.2% of statements
ok github.com/cloudflare/cfssl/certdb/ocspstapling 1.302s coverage: 69.2% of statements
ok github.com/cloudflare/cfssl/certdb/sql 1.223s coverage: 70.5% of statements
ok github.com/cloudflare/cfssl/cli 1.014s coverage: 62.5% of statements
ok github.com/cloudflare/cfssl/cli/bundle 1.011s coverage: 0.0% of statements [no tests to run]
ok github.com/cloudflare/cfssl/cli/crl 1.086s coverage: 57.8% of statements
ok github.com/cloudflare/cfssl/cli/gencert 7.927s coverage: 83.6% of statements
ok github.com/cloudflare/cfssl/cli/gencrl 1.064s coverage: 73.3% of statements
ok github.com/cloudflare/cfssl/cli/gencsr 1.058s coverage: 70.3% of statements
ok github.com/cloudflare/cfssl/cli/genkey 2.718s coverage: 70.0% of statements
ok github.com/cloudflare/cfssl/cli/ocsprefresh 1.077s coverage: 64.3% of statements
ok github.com/cloudflare/cfssl/cli/revoke 1.033s coverage: 88.2% of statements
ok github.com/cloudflare/cfssl/cli/scan 1.014s coverage: 36.0% of statements
ok github.com/cloudflare/cfssl/cli/selfsign 2.342s coverage: 73.2% of statements
ok github.com/cloudflare/cfssl/cli/serve 1.076s coverage: 38.2% of statements
ok github.com/cloudflare/cfssl/cli/sign 1.070s coverage: 54.8% of statements
ok github.com/cloudflare/cfssl/cli/version 1.011s coverage: 100.0% of statements
ok github.com/cloudflare/cfssl/cmd/cfssl 1.028s coverage: 0.0% of statements [no tests to run]
ok github.com/cloudflare/cfssl/cmd/cfssljson 1.012s coverage: 3.4% of statements
ok github.com/cloudflare/cfssl/cmd/mkbundle 1.011s coverage: 0.0% of statements [no tests to run]
ok github.com/cloudflare/cfssl/config 1.023s coverage: 67.7% of statements
ok github.com/cloudflare/cfssl/crl 1.054s coverage: 68.3% of statements
ok github.com/cloudflare/cfssl/csr 8.473s coverage: 89.6% of statements
ok github.com/cloudflare/cfssl/errors 1.014s coverage: 79.6% of statements
ok github.com/cloudflare/cfssl/helpers 1.216s coverage: 80.6% of statements
ok github.com/cloudflare/cfssl/helpers/derhelpers 1.017s coverage: 48.0% of statements
ok github.com/cloudflare/cfssl/helpers/testsuite 7.826s coverage: 65.8% of statements
ok github.com/cloudflare/cfssl/initca 151.314s coverage: 73.2% of statements
ok github.com/cloudflare/cfssl/log 1.013s coverage: 59.3% of statements
ok github.com/cloudflare/cfssl/multiroot/config 1.258s coverage: 77.4% of statements
ok github.com/cloudflare/cfssl/ocsp 1.353s coverage: 75.1% of statements
ok github.com/cloudflare/cfssl/revoke 1.149s coverage: 75.0% of statements
ok github.com/cloudflare/cfssl/scan 1.023s coverage: 1.1% of statements
skipped github.com/cloudflare/cfssl/scan/crypto/md5
skipped github.com/cloudflare/cfssl/scan/crypto/rsa
skipped github.com/cloudflare/cfssl/scan/crypto/sha1
skipped github.com/cloudflare/cfssl/scan/crypto/sha256
skipped github.com/cloudflare/cfssl/scan/crypto/sha512
skipped github.com/cloudflare/cfssl/scan/crypto/tls
ok github.com/cloudflare/cfssl/selfsign 1.098s coverage: 70.0% of statements
ok github.com/cloudflare/cfssl/signer 1.020s coverage: 19.4% of statements
ok github.com/cloudflare/cfssl/signer/local 4.886s coverage: 77.9% of statements
ok github.com/cloudflare/cfssl/signer/remote 2.500s coverage: 70.0% of statements
ok github.com/cloudflare/cfssl/signer/universal 2.228s coverage: 67.7% of statements
ok github.com/cloudflare/cfssl/transport 1.012s
ok github.com/cloudflare/cfssl/transport/ca/localca 1.046s coverage: 94.9% of statements
ok github.com/cloudflare/cfssl/transport/kp 1.050s coverage: 37.1% of statements
ok github.com/cloudflare/cfssl/ubiquity 1.037s coverage: 88.3% of statements
ok github.com/cloudflare/cfssl/whitelist 3.519s coverage: 100.0% of statements
...
$ go test ./... (master✱)
ok golang.org/x/crypto/acme 2.782s
ok golang.org/x/crypto/acme/autocert 2.963s
? golang.org/x/crypto/acme/autocert/internal/acmetest [no test files]
ok golang.org/x/crypto/argon2 0.047s
ok golang.org/x/crypto/bcrypt 4.694s
ok golang.org/x/crypto/blake2b 0.056s
ok golang.org/x/crypto/blake2s 0.050s
ok golang.org/x/crypto/blowfish 0.015s
ok golang.org/x/crypto/bn256 0.460s
ok golang.org/x/crypto/cast5 4.204s
ok golang.org/x/crypto/chacha20poly1305 0.560s
ok golang.org/x/crypto/cryptobyte 0.014s
? golang.org/x/crypto/cryptobyte/asn1 [no test files]
ok golang.org/x/crypto/curve25519 0.025s
ok golang.org/x/crypto/ed25519 0.073s
? golang.org/x/crypto/ed25519/internal/edwards25519 [no test files]
ok golang.org/x/crypto/hkdf 0.012s
ok golang.org/x/crypto/internal/chacha20 0.047s
ok golang.org/x/crypto/internal/subtle 0.011s
ok golang.org/x/crypto/md4 0.013s
ok golang.org/x/crypto/nacl/auth 9.226s
ok golang.org/x/crypto/nacl/box 0.016s
ok golang.org/x/crypto/nacl/secretbox 0.012s
ok golang.org/x/crypto/nacl/sign 0.012s
ok golang.org/x/crypto/ocsp 0.047s
ok golang.org/x/crypto/openpgp 8.872s
ok golang.org/x/crypto/openpgp/armor 0.012s
ok golang.org/x/crypto/openpgp/clearsign 16.984s
ok golang.org/x/crypto/openpgp/elgamal 0.013s
? golang.org/x/crypto/openpgp/errors [no test files]
ok golang.org/x/crypto/openpgp/packet 0.159s
ok golang.org/x/crypto/openpgp/s2k 7.597s
ok golang.org/x/crypto/otr 0.612s
ok golang.org/x/crypto/pbkdf2 0.045s
ok golang.org/x/crypto/pkcs12 0.073s
ok golang.org/x/crypto/pkcs12/internal/rc2 0.013s
ok golang.org/x/crypto/poly1305 0.016s
ok golang.org/x/crypto/ripemd160 0.034s
ok golang.org/x/crypto/salsa20 0.013s
ok golang.org/x/crypto/salsa20/salsa 0.013s
ok golang.org/x/crypto/scrypt 0.942s
ok golang.org/x/crypto/sha3 0.140s
ok golang.org/x/crypto/ssh 0.939s
ok golang.org/x/crypto/ssh/agent 0.529s
ok golang.org/x/crypto/ssh/knownhosts 0.027s
ok golang.org/x/crypto/ssh/terminal 0.016s
ok golang.org/x/crypto/tea 0.010s
ok golang.org/x/crypto/twofish 0.019s
ok golang.org/x/crypto/xtea 0.012s
ok golang.org/x/crypto/xts 0.016s
```
A match of an OCSP request's serial number to *any* of the configured `reqSerialPrefixes` entries is sufficient for the request to be valid, not just the last `reqSerialPrefixes` entry.
Resolves https://github.com/letsencrypt/boulder/issues/3829
Previously, all errors were treated as Not Found, but we actually want
to treat database errors differently; for instance, by not caching them,
and by setting tighter alerting thresholds for them.
Fixes#3419.
Uses a special mux for the OCSP Responder so that we stop collapsing double slashes in GET requests which cause a small number of requests to be considered malformed.
- Remove error signatures from log methods. This means fewer places where errcheck will show ignored errors.
- Pull in latest cfssl to be compatible with errorless log messages.
- Reduce the number of message priorities we support to just those we actually use.
- AuditNotice -> AuditInfo
- Remove InfoObject (only one use, switched to Info)
- Remove EmergencyExit and related functions in favor of panic
- Remove SyslogWriter / AuditLogger separate types in favor of a single interface, Logger, that has all the logging methods on it.
- Merge mock log into logger. This allows us to unexport the internals but still override them in the mock.
- Shorten names to be compatible with Go style: New, Set, Get, Logger, NewMock, etc.
- Use a shorter log format for stdout logs.
- Remove "... Starting" log messages. We have better information in the "Versions" message logged at startup.
Motivation: The AuditLogger / SyslogWriter distinction was confusing and exposed internals only necessary for tests. Some components accepted one type and some accepted the other. This made it hard to consistently use mock loggers in tests. Also, the unnecessarily fat interface for AuditLogger made it hard to meaningfully mock out.
Some stat services, we believe, are saying the ocsp-responder is down
because / returns 400 Bad Request currently.
Shuffle some code into a new `mux` function to make it easier to test.
Fixes https://github.com/letsencrypt/boulder/issues/898
Also removes currently-unused 'development' DB, and do initial migrations in
parallel, which shortens create_db.sh from 20 seconds to 10 seconds.
Changes ResetTestDatabase into two functions, one each for SA and Policy DBs,
which take care of setting up the DB connection using a special higher-privileged
user called test_setup.
* Moves revocation from the CA to the OCSP-Updater, the RA will mark certificates as
revoked then wait for the OCSP-Updater to create a new (final) revoked response
* Merges the ocspResponses table with the certificateStatus table and only use UPDATES
to update the OCSP response (vs INSERT-only since this happens quite often and will
lead to an extremely large table)
We had a staging deploy go bad because of the missing error handling on
the "source" config not being in the JSON. While we debugged, I wrote
some tests.
Fixes#936.