The stderr, when reviewing logs, saying that the sample size was 0
implies that cert-checker sampled no certificates. In reality, it
is working fine; that zero indicates that the JSON array dumped to
stdout has 0 entries, which is actually good.
Let's just fix this stderr message (which isn't parsed by any tooling we
have) so that the next poor SRE that sees it doesn't freak out and file
tickets to start incident response.
Boulder components initialize their gorp and gorp-less (non-wrapped) database
clients via two new SA helpers. These helpers handle client construction,
database metric initialization, and (for gorp only) debug logging setup.
Removes transaction isolation parameter `'READ-UNCOMMITTED'` from all database
connections.
Fixes#5715Fixes#5889
Use `crypto.x509` to parse the certificate instead of
`github.com/zmap/zcrypto/x509` before passing to the GoodKey policy check.
The `zmap/zcrypto` version sets ecdsa certificate key types that are not
expected in the GoodKey check.
- Import `crypto/x509` natively.
- Change `zcrypto/x509` to be imported as zX509 for clarity.
- Create a test case for an ECDSA key.
Fixes#5883
Instead of getting the number of certificates to work on with a
`select count(*)` and bailing when that number has been
retrieved, get batches until the last certificate in the batch
is newer than the start window. If it is, then `getCerts` is done.
The resulting `boulder` binary can be invoked by different names to
trigger the behavior of the relevant subcommand. For instance, symlinking
and invoking as `boulder-ca` acts as the CA. Symlinking and invoking as
`boulder-va` acts as the VA.
This reduces the .deb file size from about 200MB to about 20MB.
This works by creating a registry that maps subcommand names to `main`
functions. Each subcommand registers itself in an `init()` function. The
monolithic `boulder` binary then checks what name it was invoked with
(`os.Args[0]`), looks it up in the registry, and invokes the appropriate
`main`. To avoid conflicts, all of the old `package main` are replaced
with `package notmain`.
To get the list of registered subcommands, run `boulder --list`. This
is used when symlinking all the variants into place, to ensure the set
of symlinked names matches the entries in the registry.
Fixes#5692
- Replace `gorp.DbMap` with calls that use `sql.DB` directly
- Use `rows.Scan()` and `rows.Next()` to get query results (which opens the door to streaming the results)
- Export function `CertStatusMetadataFields` from `SA`
- Add new function `ScanCertStatusRow` to `SA`
- Add new function `NewDbSettingsFromDBConfig` to `SA`
Fixes#5642
Part Of #5715
Add `acceptableValidityDurations` to cert-checker's config, which
uses `cmd.ConfigDuration` instead of `int` to express the acceptable
validity periods. Deprecate the older int-based `acceptableValidityPeriods`.
This makes it easier to reason about the values in the configs, and
brings this config into line with other configs (such as the CA).
Fixes#5542
Throw away the result of parsing various command-line flags in
cert-checker. Leave the flags themselves in place to avoid breaking
any scripts which pass them, but only respect the values provided by
the config file.
Part of #5489
It's possible for an AUTO_INCREMENT database field (such as the ID
field on the Certificates table) to not actually be monotonically
increasing. As such, it is possible for cert-checker to get into a
state where it correctly queries for the number of certs it has to
process and the minimum ID from among those certs, but then actually
retrieves much older certs when querying for them just by ID.
Update cert-checker's query to predicate on id, expiry, and issuance
time, to avoid checking certs which were issued a significant time in
the past.
Part of #5541
This changeset adds a second DB connect string for the SA for use in
read-only queries that are not themselves dependencies for read-write
queries. In other words, this is attempting to only catch things like
rate-limit `SELECT`s and other coarse-counting, so we can potentially
move those read queries off the read-write primary database.
It also adds a second DB connect string to the OCSP Updater. This is a
little trickier, as the subsequent `UPDATE`s _are_ dependent on the
output of the `SELECT`, but in this case it's operating on data batches,
and a few seconds' replication latency are several orders of magnitude
below the threshold for update frequency, so any certificates that
aren't caught on run `n` can be caught on run `n+1`.
Since we export DB metrics to Prometheus, this also refactors
`InitDBMetrics` to take a DB Address (host:port tuple) and User out of
the DB connection DSN and include those as labels in the metrics.
Fixes#5550Fixes#4985
Add a new `acceptableValidityPeriods` field to cert-checker's config.
This field is a list of integers representing validity periods measured
in seconds (so 7776000 is 90 days). This field is multi-valued to enable
transitions between different validity periods (e.g. 90 days + 1 second
to 90 days, or 90 days to 30 days). If the field is not provided,
cert-checker defaults to 90 days.
Also update the way that cert-checker computes the validity period of
the certificates it is checking to include the full width of the final
second represented by the notAfter timestamp.
Finally, update the tests to support this new behavior.
Fixes#5472
We are in the process of reducing our validity periods by one second,
from 90 days plus one second, to exactly 90 days. This change causes
cert-checker to be comfortable with certificates that have either of
those validity periods.
Future work is necessary to make cert-checker much more robust
and configurable, so we don't need changes like this every time we
reduce our validity period.
Part of #5472
Switch from using `ORDER BY` and `LIMIT` to obtain a minimum ID from the
certificates table, to using the `MIN()` aggregation function.
Relational databases are most optimized for set aggregation functions,
and anywhere that aggregations can be used for `SELECT` queries tends to
bring performance improvements. Experimentally this is an
order-of-magnitude improvement in query time. Theoretically the query
optimizer should have constructed the same underlying query from each,
but it didn't.
Partially reverts #5400
Fixes of #5393
Explicitly opt in to the least-consistent transaction coherency for the
duration of all cert-checker queries.
The primary risk here is that the windowed table scan across the
certificates table can, on replicas, read a series of rows that aren't
from consistent timesteps. However, the certificates table is
append-only, so in practice this is not a concern, and there is no risk
to enabling the dirtiest of reads, done dirt cheap.
This doesn't impact the length of the window function, so existing
overlap mechanisms to ensure coverage will remain as good as they are
today.
Based on #5400
Part of #5393
Update the pinned version of zlint from v2.2.1 to v3.1.0.
Also update the relevant path from v2 to v3 in both go.mod
and in individual imports. Update the vendored files to match.
No changes from v2.2.1 to v3.1.0 appear to affect the lints
we directly care about (e.g. those that we explicitly ignore).
Fixes#5206
Named field `DB`, in a each component configuration struct, acts as the
receiver for the value of `db` when component JSON files are
unmarshalled.
When `cmd.DBConfig` fields are received at the root of component
configuration struct instead of `DB` copy them to the `DB` field of the
component configuration struct.
Move existing `cmd.DBConfig` values from the root of each component's
JSON configuration in `test/config-next` to `db`
Part of #5275
In #5235 we replaced MaxDBConns in favor of MaxOpenConns.
One week ago MaxDBConns was removed from all dev, staging, and
production configurations. This change completes the removal of
MaxDBConns from all components and test/config.
Fixes#5249
Historically the only database/sql driver setting exposed via JSON
config was maxDBConns. This change adds support for maxIdleConns,
connMaxLifetime, connMaxIdleTime, and renames maxDBConns to
maxOpenConns. The addition of these settings will give our SRE team a
convenient method for tuning the reuse/closure of database connections.
A new struct, DBSettings, has been added to SA. The struct, and each of
it's fields has been commented.
All new fields have been plumbed through to the relevant Boulder
components and exported as Prometheus metrics. Tests have been
added/modified to ensure that the fields are being set. There should be
no loss in coverage
Deployability concerns for the migration from maxDBConns to maxOpenConns
have been addressed with the temporary addition of the helper method
cmd.DBConfig.GetMaxOpenConns(). This method can be removed once
test/config is defaulted to using maxOpenConns. Relevant sections of the
code have TODOs added that link back to an newly opened issue.
Fixes#5199
Simplify database interactions
This change is a result of an audit of all places where
Go code directly constructs SQL queries and executes them
against a dbMap, with the goal of eliminating all instances
of constructing a well-known object type (such as a
core.CertificateStatus) from explicitly-listed database columns.
Instead, we should be relying on helper functions defined in the
sa itself to determine which columns are relevant for the
construction of any given object.
This audit did not find many places where this was occurring. It
did reveal a few simplifications, which are contained in this
change:
1) Greater use of existing SelectFoo methods provided by models.go
2) Streamlining of various SelectSingularFoo methods to always
select by serial string, rather than user-provided WHERE clause
3) One spot (in ocsp-responder) where using a well-known type seemed
better than using a more minimal custom type
Addresses #4899
In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back.
There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case, dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change.
Fixes#4591.
`cert-checker` assumes an undefined behavior of MySQL which is only sometimes true, which means sometimes we select fewer certificates than we actually expect to. Instead of adding an explicit ORDER BY we simply switch to cursoring using the primary key, which gets us overall much more efficient usage of indexes.
Fixes#4315.
This updates the `cert-checker` utility configuration with a new allow list of
ignored lints so we can exclude known false-positives/accepted info results by
name instead of result level. To start only the `n_subject_common_name_included`
lint is excluded in `test/config-next/cert-checker.json`. Once this lands we can
treat info/warning lint results as errors as a follow-up to not break
deployability guarantees.
Resolves https://github.com/letsencrypt/boulder/issues/4271
This will allow implementing sub-problems without creating a cyclic
dependency between `core` and `problems`.
The `identifier` package is somewhat small/single-purpose and in the
future we may want to move more "ACME" bits beyond the `identifier`
types into a dedicated package outside of `core`.
This follows up on some refactoring we had done previously but not
completed. This removes various binary-specific config structs from the
common cmd package, and moves them into their appropriate packages. In
the case of CT configs, they had to be moved into their own package to
avoid a dependency loop between RA and ctpolicy.
Go 1.11+ updated the `sql.DBStats` struct with new fields that are of
interest to us. This PR routes these stats to Prometheus by replacing
the existing autoprom stats code with new first-class Prometheus
metrics. Resolves https://github.com/letsencrypt/boulder/issues/4095
The `max_db_connections` stat from the SA is removed because the Go 1.11+
`sql.DBStats.MaxOpenConnections` field will give us a better view of
the same information.
The autoprom "reused_authz" stat that was being incremented in
`SA.GetPendingAuthorization` was also removed. It wasn't doing what it
says it was (counting reused authorizations) and was instead counting
the number of times `GetPendingAuthorization` returned an authz.
Removes the checks for a handful of deployed feature flags in preparation for removing the flags entirely. Also moves all of the currently deprecated flags to a separate section of the flags list so they can be more easily removed once purged from production configs.
Fixes#3880.
Switch linting library to zmap/zlint.
```
github.com/zmap/zlint$ go test ./...
ok github.com/zmap/zlint 0.190s
? github.com/zmap/zlint/cmd/zlint [no test files]
ok github.com/zmap/zlint/lints 0.216s
ok github.com/zmap/zlint/util (cached)
```
* Update `globalsign/certlint` to d4a45be.
This commit updates the `github.com/globalsign/certlint` dependency to
the latest tip of master (d4a45be06892f3e664f69892aca79a48df510be0).
Unit tests are confirmed to pass:
```
$ go test ./...
ok github.com/globalsign/certlint 3.816s
ok github.com/globalsign/certlint/asn1 (cached)
? github.com/globalsign/certlint/certdata [no test files]
? github.com/globalsign/certlint/checks [no test files]
? github.com/globalsign/certlint/checks/certificate/aiaissuers [no
test files]
? github.com/globalsign/certlint/checks/certificate/all [no test
files]
? github.com/globalsign/certlint/checks/certificate/basicconstraints
[no test files]
? github.com/globalsign/certlint/checks/certificate/extensions [no
test files]
? github.com/globalsign/certlint/checks/certificate/extkeyusage [no
test files]
ok github.com/globalsign/certlint/checks/certificate/internal
(cached)
? github.com/globalsign/certlint/checks/certificate/issuerdn [no
test files]
? github.com/globalsign/certlint/checks/certificate/keyusage [no
test files]
? github.com/globalsign/certlint/checks/certificate/publickey [no
test files]
? github.com/globalsign/certlint/checks/certificate/publickey/goodkey
[no test files]
ok github.com/globalsign/certlint/checks/certificate/publicsuffix
(cached)
? github.com/globalsign/certlint/checks/certificate/revocation [no
test files]
? github.com/globalsign/certlint/checks/certificate/serialnumber
[no test files]
? github.com/globalsign/certlint/checks/certificate/signaturealgorithm
[no test files]
ok github.com/globalsign/certlint/checks/certificate/subject (cached)
ok github.com/globalsign/certlint/checks/certificate/subjectaltname
(cached)
? github.com/globalsign/certlint/checks/certificate/validity [no
test files]
? github.com/globalsign/certlint/checks/certificate/version [no test
files]
? github.com/globalsign/certlint/checks/certificate/wildcard [no
test files]
? github.com/globalsign/certlint/checks/extensions/adobetimestamp
[no test files]
? github.com/globalsign/certlint/checks/extensions/all [no test
files]
? github.com/globalsign/certlint/checks/extensions/authorityinfoaccess
[no test files]
? github.com/globalsign/certlint/checks/extensions/authoritykeyid
[no test files]
? github.com/globalsign/certlint/checks/extensions/basicconstraints
[no test files]
? github.com/globalsign/certlint/checks/extensions/crldistributionpoints
[no test files]
? github.com/globalsign/certlint/checks/extensions/ct [no test
files]
? github.com/globalsign/certlint/checks/extensions/extkeyusage [no
test files]
? github.com/globalsign/certlint/checks/extensions/keyusage [no test
files]
? github.com/globalsign/certlint/checks/extensions/nameconstraints
[no test files]
ok github.com/globalsign/certlint/checks/extensions/ocspmuststaple
(cached)
? github.com/globalsign/certlint/checks/extensions/ocspnocheck [no
test files]
? github.com/globalsign/certlint/checks/extensions/pdfrevocation
[no test files]
? github.com/globalsign/certlint/checks/extensions/policyidentifiers
[no test files]
? github.com/globalsign/certlint/checks/extensions/smimecapabilities
[no test files]
? github.com/globalsign/certlint/checks/extensions/subjectaltname
[no test files]
? github.com/globalsign/certlint/checks/extensions/subjectkeyid [no
test files]
ok github.com/globalsign/certlint/errors (cached)
? github.com/globalsign/certlint/examples/ct [no test files]
? github.com/globalsign/certlint/examples/specificchecks [no test
files]
```
* Certchecker: Remove OCSP Must Staple err ignore, fix typos.
This commit removes the explicit ignore for OCSP Must Staple errors that
was added when the upstream `certlint` package didn't understand that
PKIX extension. That problem was resolved and so we can remove the
ignore from `cert-checker`.
This commit also fixes two typos that were fixed upstream and needed to
be reflected in expected error messages in the `certlint` unit test.
* Certchecker: Ignore Certlint CN/SAN == PSL errors.
`globalsign/certlint`, used by `cmd/cert-checker` to vet certs,
improperly flags certificates that have subj CN/SANs equal to a private
entry in the public suffix list as faulty.
This commit adds a regex that will skip errors that match the certlint
PSL error string. Prior to this workaround the addition of a private PSL
entry as a SAN in the `TestCheckCert` test cert fails the test:
```
--- FAIL: TestCheckCert (1.72s)
main_test.go:221: Found unexpected problem 'Certificate subjectAltName
"dev-myqnapcloud.com" equals "dev-myqnapcloud.com" from the public
suffix list'.
```
With the workaround in place, the test passes again.
The upstream `certlint` package doesn't understand the RFC 7633 OCSP
Must Staple PKIX Extension and flags its presence as an error. Until
this is resolved upstream this commit updates `cmd/cert-checker` to
ignore the error.
The `TestCheckCert` unit test is updated to add an unsupported extension
and the OCSP must staple extension to its test cert. Only the
unsupported extension should be flagged as a problem.
Fixes#3020.
In order to write integration tests for some features, especially related to rate limiting, rechecking of CAA, and expiration of authzs, orders, and certs, we need to be able to fake the passage of time in integration tests.
To do so, this change switches out all clock.Default() instances for cmd.Clock(), which can be set manually with the FAKECLOCK environment variable. integration-test.py now starts up all servers once before the main body of tests, with FAKECLOCK set to a date 70 days ago, and does some initial setup for a new integration test case. That test case tries to fetch a 70-day-old authz URL, and expects it to 404.
In order to make this work, I also had to change a number of our test binaries to shut down cleanly in response to SIGTERM. Without that change, stopping the servers between the setup phase and the main tests caused startservers.check() to fail, because some processes exited with nonzero status.
Note: This is an initial stab at things, to prove out the technique. Long-term, I think we will want to use an idiom where test cases are classes that have a number of optional setup phases that may be run at e.g. 70 days prior and 5 days prior. This could help us avoid a proliferation of global state as we add more time-dependent test cases.
This removes the config and code to output to statsd.
- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.
Fixes#2639, #2733.
In https://groups.google.com/forum/#!topic/mozilla.dev.security.policy/_pSjsrZrTWY, we had a problem with the policy authority configuration, but cert-checker didn't alert about it because it uses the same policy configuration.
This PR adds support for an explicit list of regular expressions used to match forbidden names. The regular expressions are applied after the PA has done its usual validation process in order to act as a defense-in-depth mechanism for cases (.mil, .local, etc) that we know we never want to support, even if the PA thinks they are valid (e.g. due to a policy configuration malfunction).
Initially the forbidden name regexps are:
`^\s*$`,
`\.mil$`,
`\.local$`,
`^localhost$`,
`\.localhost$`,
Additionally, the existing cert-checker.json config in both test/config/ and test/config-next/ was missing the hostnamePolicyFile entry required for operation of cert-checker. This PR adds a hostnamePolicyFile entry pointing at the existing test/hostname-policy.json file. The cert checker can now be used in the dev env with cert-checker -config test/config/cert-checker.json without error.
Resolves#2366