Add a new `acceptableValidityPeriods` field to cert-checker's config.
This field is a list of integers representing validity periods measured
in seconds (so 7776000 is 90 days). This field is multi-valued to enable
transitions between different validity periods (e.g. 90 days + 1 second
to 90 days, or 90 days to 30 days). If the field is not provided,
cert-checker defaults to 90 days.
Also update the way that cert-checker computes the validity period of
the certificates it is checking to include the full width of the final
second represented by the notAfter timestamp.
Finally, update the tests to support this new behavior.
Fixes#5472
We are in the process of reducing our validity periods by one second,
from 90 days plus one second, to exactly 90 days. This change causes
cert-checker to be comfortable with certificates that have either of
those validity periods.
Future work is necessary to make cert-checker much more robust
and configurable, so we don't need changes like this every time we
reduce our validity period.
Part of #5472
Switch from using `ORDER BY` and `LIMIT` to obtain a minimum ID from the
certificates table, to using the `MIN()` aggregation function.
Relational databases are most optimized for set aggregation functions,
and anywhere that aggregations can be used for `SELECT` queries tends to
bring performance improvements. Experimentally this is an
order-of-magnitude improvement in query time. Theoretically the query
optimizer should have constructed the same underlying query from each,
but it didn't.
Partially reverts #5400
Fixes of #5393
Explicitly opt in to the least-consistent transaction coherency for the
duration of all cert-checker queries.
The primary risk here is that the windowed table scan across the
certificates table can, on replicas, read a series of rows that aren't
from consistent timesteps. However, the certificates table is
append-only, so in practice this is not a concern, and there is no risk
to enabling the dirtiest of reads, done dirt cheap.
This doesn't impact the length of the window function, so existing
overlap mechanisms to ensure coverage will remain as good as they are
today.
Based on #5400
Part of #5393
Update the pinned version of zlint from v2.2.1 to v3.1.0.
Also update the relevant path from v2 to v3 in both go.mod
and in individual imports. Update the vendored files to match.
No changes from v2.2.1 to v3.1.0 appear to affect the lints
we directly care about (e.g. those that we explicitly ignore).
Fixes#5206
Named field `DB`, in a each component configuration struct, acts as the
receiver for the value of `db` when component JSON files are
unmarshalled.
When `cmd.DBConfig` fields are received at the root of component
configuration struct instead of `DB` copy them to the `DB` field of the
component configuration struct.
Move existing `cmd.DBConfig` values from the root of each component's
JSON configuration in `test/config-next` to `db`
Part of #5275
In #5235 we replaced MaxDBConns in favor of MaxOpenConns.
One week ago MaxDBConns was removed from all dev, staging, and
production configurations. This change completes the removal of
MaxDBConns from all components and test/config.
Fixes#5249
Historically the only database/sql driver setting exposed via JSON
config was maxDBConns. This change adds support for maxIdleConns,
connMaxLifetime, connMaxIdleTime, and renames maxDBConns to
maxOpenConns. The addition of these settings will give our SRE team a
convenient method for tuning the reuse/closure of database connections.
A new struct, DBSettings, has been added to SA. The struct, and each of
it's fields has been commented.
All new fields have been plumbed through to the relevant Boulder
components and exported as Prometheus metrics. Tests have been
added/modified to ensure that the fields are being set. There should be
no loss in coverage
Deployability concerns for the migration from maxDBConns to maxOpenConns
have been addressed with the temporary addition of the helper method
cmd.DBConfig.GetMaxOpenConns(). This method can be removed once
test/config is defaulted to using maxOpenConns. Relevant sections of the
code have TODOs added that link back to an newly opened issue.
Fixes#5199
Simplify database interactions
This change is a result of an audit of all places where
Go code directly constructs SQL queries and executes them
against a dbMap, with the goal of eliminating all instances
of constructing a well-known object type (such as a
core.CertificateStatus) from explicitly-listed database columns.
Instead, we should be relying on helper functions defined in the
sa itself to determine which columns are relevant for the
construction of any given object.
This audit did not find many places where this was occurring. It
did reveal a few simplifications, which are contained in this
change:
1) Greater use of existing SelectFoo methods provided by models.go
2) Streamlining of various SelectSingularFoo methods to always
select by serial string, rather than user-provided WHERE clause
3) One spot (in ocsp-responder) where using a well-known type seemed
better than using a more minimal custom type
Addresses #4899
In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back.
There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case, dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change.
Fixes#4591.
`cert-checker` assumes an undefined behavior of MySQL which is only sometimes true, which means sometimes we select fewer certificates than we actually expect to. Instead of adding an explicit ORDER BY we simply switch to cursoring using the primary key, which gets us overall much more efficient usage of indexes.
Fixes#4315.
This updates the `cert-checker` utility configuration with a new allow list of
ignored lints so we can exclude known false-positives/accepted info results by
name instead of result level. To start only the `n_subject_common_name_included`
lint is excluded in `test/config-next/cert-checker.json`. Once this lands we can
treat info/warning lint results as errors as a follow-up to not break
deployability guarantees.
Resolves https://github.com/letsencrypt/boulder/issues/4271
This will allow implementing sub-problems without creating a cyclic
dependency between `core` and `problems`.
The `identifier` package is somewhat small/single-purpose and in the
future we may want to move more "ACME" bits beyond the `identifier`
types into a dedicated package outside of `core`.
This follows up on some refactoring we had done previously but not
completed. This removes various binary-specific config structs from the
common cmd package, and moves them into their appropriate packages. In
the case of CT configs, they had to be moved into their own package to
avoid a dependency loop between RA and ctpolicy.
Go 1.11+ updated the `sql.DBStats` struct with new fields that are of
interest to us. This PR routes these stats to Prometheus by replacing
the existing autoprom stats code with new first-class Prometheus
metrics. Resolves https://github.com/letsencrypt/boulder/issues/4095
The `max_db_connections` stat from the SA is removed because the Go 1.11+
`sql.DBStats.MaxOpenConnections` field will give us a better view of
the same information.
The autoprom "reused_authz" stat that was being incremented in
`SA.GetPendingAuthorization` was also removed. It wasn't doing what it
says it was (counting reused authorizations) and was instead counting
the number of times `GetPendingAuthorization` returned an authz.
Removes the checks for a handful of deployed feature flags in preparation for removing the flags entirely. Also moves all of the currently deprecated flags to a separate section of the flags list so they can be more easily removed once purged from production configs.
Fixes#3880.
Switch linting library to zmap/zlint.
```
github.com/zmap/zlint$ go test ./...
ok github.com/zmap/zlint 0.190s
? github.com/zmap/zlint/cmd/zlint [no test files]
ok github.com/zmap/zlint/lints 0.216s
ok github.com/zmap/zlint/util (cached)
```
* Update `globalsign/certlint` to d4a45be.
This commit updates the `github.com/globalsign/certlint` dependency to
the latest tip of master (d4a45be06892f3e664f69892aca79a48df510be0).
Unit tests are confirmed to pass:
```
$ go test ./...
ok github.com/globalsign/certlint 3.816s
ok github.com/globalsign/certlint/asn1 (cached)
? github.com/globalsign/certlint/certdata [no test files]
? github.com/globalsign/certlint/checks [no test files]
? github.com/globalsign/certlint/checks/certificate/aiaissuers [no
test files]
? github.com/globalsign/certlint/checks/certificate/all [no test
files]
? github.com/globalsign/certlint/checks/certificate/basicconstraints
[no test files]
? github.com/globalsign/certlint/checks/certificate/extensions [no
test files]
? github.com/globalsign/certlint/checks/certificate/extkeyusage [no
test files]
ok github.com/globalsign/certlint/checks/certificate/internal
(cached)
? github.com/globalsign/certlint/checks/certificate/issuerdn [no
test files]
? github.com/globalsign/certlint/checks/certificate/keyusage [no
test files]
? github.com/globalsign/certlint/checks/certificate/publickey [no
test files]
? github.com/globalsign/certlint/checks/certificate/publickey/goodkey
[no test files]
ok github.com/globalsign/certlint/checks/certificate/publicsuffix
(cached)
? github.com/globalsign/certlint/checks/certificate/revocation [no
test files]
? github.com/globalsign/certlint/checks/certificate/serialnumber
[no test files]
? github.com/globalsign/certlint/checks/certificate/signaturealgorithm
[no test files]
ok github.com/globalsign/certlint/checks/certificate/subject (cached)
ok github.com/globalsign/certlint/checks/certificate/subjectaltname
(cached)
? github.com/globalsign/certlint/checks/certificate/validity [no
test files]
? github.com/globalsign/certlint/checks/certificate/version [no test
files]
? github.com/globalsign/certlint/checks/certificate/wildcard [no
test files]
? github.com/globalsign/certlint/checks/extensions/adobetimestamp
[no test files]
? github.com/globalsign/certlint/checks/extensions/all [no test
files]
? github.com/globalsign/certlint/checks/extensions/authorityinfoaccess
[no test files]
? github.com/globalsign/certlint/checks/extensions/authoritykeyid
[no test files]
? github.com/globalsign/certlint/checks/extensions/basicconstraints
[no test files]
? github.com/globalsign/certlint/checks/extensions/crldistributionpoints
[no test files]
? github.com/globalsign/certlint/checks/extensions/ct [no test
files]
? github.com/globalsign/certlint/checks/extensions/extkeyusage [no
test files]
? github.com/globalsign/certlint/checks/extensions/keyusage [no test
files]
? github.com/globalsign/certlint/checks/extensions/nameconstraints
[no test files]
ok github.com/globalsign/certlint/checks/extensions/ocspmuststaple
(cached)
? github.com/globalsign/certlint/checks/extensions/ocspnocheck [no
test files]
? github.com/globalsign/certlint/checks/extensions/pdfrevocation
[no test files]
? github.com/globalsign/certlint/checks/extensions/policyidentifiers
[no test files]
? github.com/globalsign/certlint/checks/extensions/smimecapabilities
[no test files]
? github.com/globalsign/certlint/checks/extensions/subjectaltname
[no test files]
? github.com/globalsign/certlint/checks/extensions/subjectkeyid [no
test files]
ok github.com/globalsign/certlint/errors (cached)
? github.com/globalsign/certlint/examples/ct [no test files]
? github.com/globalsign/certlint/examples/specificchecks [no test
files]
```
* Certchecker: Remove OCSP Must Staple err ignore, fix typos.
This commit removes the explicit ignore for OCSP Must Staple errors that
was added when the upstream `certlint` package didn't understand that
PKIX extension. That problem was resolved and so we can remove the
ignore from `cert-checker`.
This commit also fixes two typos that were fixed upstream and needed to
be reflected in expected error messages in the `certlint` unit test.
* Certchecker: Ignore Certlint CN/SAN == PSL errors.
`globalsign/certlint`, used by `cmd/cert-checker` to vet certs,
improperly flags certificates that have subj CN/SANs equal to a private
entry in the public suffix list as faulty.
This commit adds a regex that will skip errors that match the certlint
PSL error string. Prior to this workaround the addition of a private PSL
entry as a SAN in the `TestCheckCert` test cert fails the test:
```
--- FAIL: TestCheckCert (1.72s)
main_test.go:221: Found unexpected problem 'Certificate subjectAltName
"dev-myqnapcloud.com" equals "dev-myqnapcloud.com" from the public
suffix list'.
```
With the workaround in place, the test passes again.
The upstream `certlint` package doesn't understand the RFC 7633 OCSP
Must Staple PKIX Extension and flags its presence as an error. Until
this is resolved upstream this commit updates `cmd/cert-checker` to
ignore the error.
The `TestCheckCert` unit test is updated to add an unsupported extension
and the OCSP must staple extension to its test cert. Only the
unsupported extension should be flagged as a problem.
Fixes#3020.
In order to write integration tests for some features, especially related to rate limiting, rechecking of CAA, and expiration of authzs, orders, and certs, we need to be able to fake the passage of time in integration tests.
To do so, this change switches out all clock.Default() instances for cmd.Clock(), which can be set manually with the FAKECLOCK environment variable. integration-test.py now starts up all servers once before the main body of tests, with FAKECLOCK set to a date 70 days ago, and does some initial setup for a new integration test case. That test case tries to fetch a 70-day-old authz URL, and expects it to 404.
In order to make this work, I also had to change a number of our test binaries to shut down cleanly in response to SIGTERM. Without that change, stopping the servers between the setup phase and the main tests caused startservers.check() to fail, because some processes exited with nonzero status.
Note: This is an initial stab at things, to prove out the technique. Long-term, I think we will want to use an idiom where test cases are classes that have a number of optional setup phases that may be run at e.g. 70 days prior and 5 days prior. This could help us avoid a proliferation of global state as we add more time-dependent test cases.
This removes the config and code to output to statsd.
- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.
Fixes#2639, #2733.
In https://groups.google.com/forum/#!topic/mozilla.dev.security.policy/_pSjsrZrTWY, we had a problem with the policy authority configuration, but cert-checker didn't alert about it because it uses the same policy configuration.
This PR adds support for an explicit list of regular expressions used to match forbidden names. The regular expressions are applied after the PA has done its usual validation process in order to act as a defense-in-depth mechanism for cases (.mil, .local, etc) that we know we never want to support, even if the PA thinks they are valid (e.g. due to a policy configuration malfunction).
Initially the forbidden name regexps are:
`^\s*$`,
`\.mil$`,
`\.local$`,
`^localhost$`,
`\.localhost$`,
Additionally, the existing cert-checker.json config in both test/config/ and test/config-next/ was missing the hostnamePolicyFile entry required for operation of cert-checker. This PR adds a hostnamePolicyFile entry pointing at the existing test/hostname-policy.json file. The cert checker can now be used in the dev env with cert-checker -config test/config/cert-checker.json without error.
Resolves#2366
In #2178 we moved to explicit `SELECT` statements using a set of `const`
fields for each type to support db migrations and forward compatibility.
This commit removes the temptation to interpolate queries by providing
convenience `SelectFoo` functions for each type allowing the caller to
provide the `WHERE` clause and arguments.
Resolves#2214.
Fixes#2160.
When we use Gorp's built-in `Get` method, it generates `SELECT *` queries. If we do a migration without a simultaneous change of the data structure, Gorp will subsequently error out when it sees a column in the output of the `SELECT *` which doesn't have a corresponding field in the struct it is trying to marshal. In order to be forward compatible with schema changes, we need to always use `SELECT a, b, c`, where `a`, `b`, and `c` are columns / fields in the current struct.
Updates #1699.
Adds a new package, `features`, which exposes methods to set and check if various internal features are enabled. The implementation uses global state to store the features so that services embedded in another service do not each require their own features map in order to check if something is enabled.
Requires a `boulder-tools` image update to include `golang.org/x/tools/cmd/stringer`.
Boulder issue #2004 describes a panic observed in `getCerts` caused by an index out of range. It appears as though this is caused by a race condition between the initial `SelectOne` lookup for the count of certificates, and the subsequent individual `Select` queries to fetch the Certificates. If the number of eligible certificates changes between these points (e.g. due to certificates expiring) there is a potential that one of the Select calls will return an empty result set. If this happens, then the `lastSerial` update will access an index out of range.
This PR adds an explicit `len` check to the processing loop before the `lastSerial` update. If there are no results returned from the DB query then the loop is broken. This resolves#2004
A test case for the fix was written and included in this PR. The testcase initially caused the out of range panic observed in #2004. After adding the `len` fix in this commit the test began passing without error.
The `regID` parameter in the PA's `WillingToIssue` function was originally used for whitelisting purposes, but is not used any longer. This PR removes it.
* rename, change params, restructure
* I'm wondering how I managed that one
* use a metrics.Scope
* move method to SA, update callers
* rerun goimports
* fix compile error
* revert cmd/shell.go
https://github.com/letsencrypt/boulder/pull/1805
* Delete Policy DB.
This is no longer needed now that we have a JSON policy file.
* Fix tests.
* Revert Dockerfile.
* Fix create_db
* Simplify user addition.
* Fix tests.
* Fix tests
* Review fixes.
https://github.com/letsencrypt/boulder/pull/1773
* Fix all errcheck errors
* Add errcheck to test.sh
* Add a new sa.Rollback method to make handling errors in rollbacks easier.
This also causes a behavior change in the VA. If a HTTP connection is
abruptly closed after serving the headers for a non-200 response, the
reported error will be the read failure instead of the non-200.
- Remove error signatures from log methods. This means fewer places where errcheck will show ignored errors.
- Pull in latest cfssl to be compatible with errorless log messages.
- Reduce the number of message priorities we support to just those we actually use.
- AuditNotice -> AuditInfo
- Remove InfoObject (only one use, switched to Info)
- Remove EmergencyExit and related functions in favor of panic
- Remove SyslogWriter / AuditLogger separate types in favor of a single interface, Logger, that has all the logging methods on it.
- Merge mock log into logger. This allows us to unexport the internals but still override them in the mock.
- Shorten names to be compatible with Go style: New, Set, Get, Logger, NewMock, etc.
- Use a shorter log format for stdout logs.
- Remove "... Starting" log messages. We have better information in the "Versions" message logged at startup.
Motivation: The AuditLogger / SyslogWriter distinction was confusing and exposed internals only necessary for tests. Some components accepted one type and some accepted the other. This made it hard to consistently use mock loggers in tests. Also, the unnecessarily fat interface for AuditLogger made it hard to meaningfully mock out.
* Fix cert-checker panic
Fixes a silly panic in the cert-checker that would've caused it to fail outside of tests, also fixed the test to catch that silliness.
Consolidate initialization of stats and logging from each main.go into cmd
package.
Define a new config parameter, `StdoutLevel`, that determines the maximum log
level that will be printed to stdout. It can be set to 6 to inhibit debug
messages, or 0 to print only emergency messages, or -1 to print no messages at
all.
Remove the existing config parameter `Tag`. Instead, choose the tag from the
basename of the currently running process. Previously all Boulder log messages
had the tag "boulder", but now they will be differentiated by process, like
"boulder-wfe".
Shorten the date format used in stdout logging, and add the current binary's
basename.
Consolidate setup function in audit-logger_test.go.
Note: Most CLI binaries now get their stats and logging from the parameters of
Action. However, a few of our binaries don't use our custom AppShell, and
instead use codegangsta/cli directly. For those binaries, we export the new
StatsAndLogging method from cmd.
Fixes https://github.com/letsencrypt/boulder/issues/852
Currently, the whitelisted registration ID is one that is impossible for the
database to return. Once the partner's registration is in place, we can
deploy a change to it.
Fixes#810
This means after parsing the config file, setting up stats, and dialing the
syslogger. But it is still before trying to initialize the given server. This
means that we are more likely to get version numbers logged for some common
runtime failures.
The WFE test relies on a pre-generated cert. Since there are some sanity checks
on the dates in certs, we were getting errors during the test.
One quick fix is to have those sanity checks rely on RA's clock object, which
can be replaced with a fake for testing. In order to do that, I had to move the
sanity check (MatchesCSR) into the registration authority package, where it
makes more sense anyhow.
I also removed a handful of equality testing functions in objects.go that were
only used by MatchesCSR and whose purpose is better served by reflect.DeepEqual.
This was to avoid having to also move those equality testing functions into the
registration authority.
The ca's TestRevoke was failing occasionally.
The test was saying "has the certificate's OCSPLastUpdated been set to a
time within the last second?" as a way to see if the revocation updated
the OCSPLastUpdated. OCSPLastUpdated was not being set on revocation,
but the test still passed most of the time.
The test still passed most of the time because the creation of the
certificate (which also sets the OCSPLastUpdated) has usually happened
within the last second. So, even without revocation, the OCSPLastUpdated
was set to something in the last second because the test is fast.
Threading a clock.FakeClock through the CA induced the test to fail
consistently. Debugging and threading a FakeClock through the SA caused
changes in times reported but did not fix the test because the
OCSPLastUpdated was simply not being updated. There were not tests for
the sa.MarkCertificateRevoked API that was being called by
ca.RevokeCertificate.
Now the SA has tests for its MarkCertificateRevoked method. It uses a
fake clock to ensure not just that OCSPLastUpdated is set correctly, but
that RevokedDate is, as well. The test also checks for the
CertificateStatus's status and RevocationCode changes.
The SA and CA now use Clocks throughout instead of time.Now() allowing
for more reliable and expansive testing in the future.
The CA had to gain a public Clock field in order for the RA to use the
CertificateAuthorityImpl struct without using its constructor
function. Otherwise, the field would be nil and cause panics in the RA
tests.
The RA tests are similarly also panicking when the CAImpl attempts to
log something with its private, nil-in-those-tests log field but we're
getting "lucky" because the RA tests only cause the CAImpl to log when
they are broken.
There is a TODO there to make the CAImpl's constructor function take
just what it needs to operate instead of taking large config objects and
doing file IO and such. The Clk field should be made private and the log
field filled in for the RA tests.
Fixes#734.