In a handful of places I've nuked old stats which are not used in any alerts or dashboards as they either duplicate other stats or don't provide much insight/have never actually been used. If we feel like we need them again in the future it's trivial to add them back.
There aren't many dashboards that rely on old statsd style metrics, but a few will need to be updated when this change is deployed. There are also a few cases where prometheus labels have been changed from camel to snake case, dashboards that use these will also need to be updated. As far as I can tell no alerts are impacted by this change.
Fixes#4591.
The `orphans` Prometheus `CounterVec` is used to count orphans that
couldn't be confirmed saved by the SA and were queued by the CA.
The `adopted_orphans` `CounterVec` is used to count orphans pulled from
the queue by the CA and successfully integrated through to the SA.
Both counter stats are labelled by "type", e.g. "precert" or "cert".
This avoids needing to send the entire certificate in OCSP generation
RPCs.
Ended up including a few cleanups that made the implementation easier.
Initially I was struggling with how to derive the issuer identification info.
We could just stick the full SPKI hash in certificateStatus, but that takes a
significant amount of space, we could configure unique issuer IDs in the CA
config, but that would require being very careful about keeping the IDs
constant, and never reusing an ID, or we could store issuers in a table in the
database and use that as a lookup table, but that requires figuring out how to
get that info into the table etc. Instead I've just gone with what I found to
be the easiest solution, deriving a stable ID from the cert hash. This means we
don't need to remember to configure anything special and the CA config stays
the same as it is now.
Fixes#4469.
In the process, rename generateOCSPAndStoreCertificate to just
storeCertificate, because that function doesn't generate OCSP anymore;
instead the OCSP is generated (and stored) at precertificate issuance
time.
This also adds the badCSR error type specified by RFC 8555. It is a natural fit for the errors in VerifyCSR that aren't covered by badPublicKey. The web package function for converting a berror to
a problem is updated for the new badCSR error type.
The callers (RA and CA) are updated to return the berrors from VerifyCSR as is instead of unconditionally wrapping them as a berrors.MalformedError instance. Unit/integration tests are updated accordingly.
Resolves#4418
This change adds two tables and two methods in the SA, to store precertificates
and serial numbers.
In the CA, when the feature flag is turned on, we generate a serial number, store it,
sign a precertificate and OCSP, store them, and then return the precertificate. Storing
the serial as an additional step before signing the certificate adds an extra layer of
insurance against duplicate serials, and also serves as a check on database availability.
Since an error storing the serial prevents going on to sign the precertificate, this decreases
the chance of signing something while the database is down.
Right now, neither table has read operations available in the SA.
To make this work, I needed to remove the check for duplicate certificateStatus entry
when inserting a final certificate and its OCSP response. I also needed to remove
an error that can occur when expiration-mailer processes a precertificate that lacks
a final certificate. That error would otherwise have prevented further processing of
expiration warnings.
Fixes#4412
This change builds on #4417, please review that first for ease of review.
The `test/config-next` CA configs are both updated to use `zlint` to lint TBS
pre-certificates with a throw-away key and treat any lint findings >=
`lints.Pass` as an error, blocking the CA from signing the TBS pre-cert with its
private key.
The CA `issuePrecertificateInner` function is updated to specifically catch
linting related errors from CFSSL to marshal the linting findings to the audit
log. A small unit test for this change is included.
The CA `IssueCertificateForPrecertificate` function remains unchanged: the CFSSL
interface that defines `SignFromPrecert` doesn't facilitate linting. We still
lint final certificates post-issuance with `cert-checker` and accept the
possibility there may be some compliance issues that could occur between the
precertificate passing linting and the final certificate being signed.
Resolves https://github.com/letsencrypt/boulder/issues/4255
Precertificate issuance has been the only supported mode for a while now. This
cleans up the remaining flags in the CA code. The same is true of must staple.
This also removes the IssueCertificate RPC call and its corresponding wrappers,
and removes a lot of plumbing in the CA unittests that was used to test the
situation where precertificate issuance was not enabled.
* in boulder-ra we connected to the publisher and created a publisher gRPC client twice for no apparent reason
* in the SA we ignored errors from `getChallenges` in `GetAuthorizations` which could result in a nil challenge being returned in an authorization
Use a boulder error type to indicate duplicate rows instead of a normal untyped error (as gRPC mangles this type of error but understands how to properly handle a boulder error).
Retains the existing logging of orphaned certs until we are confident that this
solution can fully replace it (even then we may want to keep it just for auditing etc).
Fixes#3636.
Fixes#3836.
```
$ ./test.sh
ok github.com/cloudflare/cfssl/api 1.023s coverage: 81.1% of statements
ok github.com/cloudflare/cfssl/api/bundle 1.464s coverage: 87.2% of statements
ok github.com/cloudflare/cfssl/api/certadd 16.766s coverage: 86.8% of statements
ok github.com/cloudflare/cfssl/api/client 1.062s coverage: 51.9% of statements
ok github.com/cloudflare/cfssl/api/crl 1.075s coverage: 75.0% of statements
ok github.com/cloudflare/cfssl/api/gencrl 1.038s coverage: 72.5% of statements
ok github.com/cloudflare/cfssl/api/generator 1.478s coverage: 33.3% of statements
ok github.com/cloudflare/cfssl/api/info 1.085s coverage: 84.1% of statements
ok github.com/cloudflare/cfssl/api/initca 1.050s coverage: 90.5% of statements
ok github.com/cloudflare/cfssl/api/ocsp 1.114s coverage: 93.8% of statements
ok github.com/cloudflare/cfssl/api/revoke 3.063s coverage: 75.0% of statements
ok github.com/cloudflare/cfssl/api/scan 2.988s coverage: 62.1% of statements
ok github.com/cloudflare/cfssl/api/sign 2.680s coverage: 83.3% of statements
ok github.com/cloudflare/cfssl/api/signhandler 1.114s coverage: 26.3% of statements
ok github.com/cloudflare/cfssl/auth 1.010s coverage: 68.2% of statements
ok github.com/cloudflare/cfssl/bundler 22.078s coverage: 84.5% of statements
ok github.com/cloudflare/cfssl/certdb/dbconf 1.013s coverage: 84.2% of statements
ok github.com/cloudflare/cfssl/certdb/ocspstapling 1.302s coverage: 69.2% of statements
ok github.com/cloudflare/cfssl/certdb/sql 1.223s coverage: 70.5% of statements
ok github.com/cloudflare/cfssl/cli 1.014s coverage: 62.5% of statements
ok github.com/cloudflare/cfssl/cli/bundle 1.011s coverage: 0.0% of statements [no tests to run]
ok github.com/cloudflare/cfssl/cli/crl 1.086s coverage: 57.8% of statements
ok github.com/cloudflare/cfssl/cli/gencert 7.927s coverage: 83.6% of statements
ok github.com/cloudflare/cfssl/cli/gencrl 1.064s coverage: 73.3% of statements
ok github.com/cloudflare/cfssl/cli/gencsr 1.058s coverage: 70.3% of statements
ok github.com/cloudflare/cfssl/cli/genkey 2.718s coverage: 70.0% of statements
ok github.com/cloudflare/cfssl/cli/ocsprefresh 1.077s coverage: 64.3% of statements
ok github.com/cloudflare/cfssl/cli/revoke 1.033s coverage: 88.2% of statements
ok github.com/cloudflare/cfssl/cli/scan 1.014s coverage: 36.0% of statements
ok github.com/cloudflare/cfssl/cli/selfsign 2.342s coverage: 73.2% of statements
ok github.com/cloudflare/cfssl/cli/serve 1.076s coverage: 38.2% of statements
ok github.com/cloudflare/cfssl/cli/sign 1.070s coverage: 54.8% of statements
ok github.com/cloudflare/cfssl/cli/version 1.011s coverage: 100.0% of statements
ok github.com/cloudflare/cfssl/cmd/cfssl 1.028s coverage: 0.0% of statements [no tests to run]
ok github.com/cloudflare/cfssl/cmd/cfssljson 1.012s coverage: 3.4% of statements
ok github.com/cloudflare/cfssl/cmd/mkbundle 1.011s coverage: 0.0% of statements [no tests to run]
ok github.com/cloudflare/cfssl/config 1.023s coverage: 67.7% of statements
ok github.com/cloudflare/cfssl/crl 1.054s coverage: 68.3% of statements
ok github.com/cloudflare/cfssl/csr 8.473s coverage: 89.6% of statements
ok github.com/cloudflare/cfssl/errors 1.014s coverage: 79.6% of statements
ok github.com/cloudflare/cfssl/helpers 1.216s coverage: 80.6% of statements
ok github.com/cloudflare/cfssl/helpers/derhelpers 1.017s coverage: 48.0% of statements
ok github.com/cloudflare/cfssl/helpers/testsuite 7.826s coverage: 65.8% of statements
ok github.com/cloudflare/cfssl/initca 151.314s coverage: 73.2% of statements
ok github.com/cloudflare/cfssl/log 1.013s coverage: 59.3% of statements
ok github.com/cloudflare/cfssl/multiroot/config 1.258s coverage: 77.4% of statements
ok github.com/cloudflare/cfssl/ocsp 1.353s coverage: 75.1% of statements
ok github.com/cloudflare/cfssl/revoke 1.149s coverage: 75.0% of statements
ok github.com/cloudflare/cfssl/scan 1.023s coverage: 1.1% of statements
skipped github.com/cloudflare/cfssl/scan/crypto/md5
skipped github.com/cloudflare/cfssl/scan/crypto/rsa
skipped github.com/cloudflare/cfssl/scan/crypto/sha1
skipped github.com/cloudflare/cfssl/scan/crypto/sha256
skipped github.com/cloudflare/cfssl/scan/crypto/sha512
skipped github.com/cloudflare/cfssl/scan/crypto/tls
ok github.com/cloudflare/cfssl/selfsign 1.098s coverage: 70.0% of statements
ok github.com/cloudflare/cfssl/signer 1.020s coverage: 19.4% of statements
ok github.com/cloudflare/cfssl/signer/local 4.886s coverage: 77.9% of statements
ok github.com/cloudflare/cfssl/signer/remote 2.500s coverage: 70.0% of statements
ok github.com/cloudflare/cfssl/signer/universal 2.228s coverage: 67.7% of statements
ok github.com/cloudflare/cfssl/transport 1.012s
ok github.com/cloudflare/cfssl/transport/ca/localca 1.046s coverage: 94.9% of statements
ok github.com/cloudflare/cfssl/transport/kp 1.050s coverage: 37.1% of statements
ok github.com/cloudflare/cfssl/ubiquity 1.037s coverage: 88.3% of statements
ok github.com/cloudflare/cfssl/whitelist 3.519s coverage: 100.0% of statements
...
$ go test ./... (master✱)
ok golang.org/x/crypto/acme 2.782s
ok golang.org/x/crypto/acme/autocert 2.963s
? golang.org/x/crypto/acme/autocert/internal/acmetest [no test files]
ok golang.org/x/crypto/argon2 0.047s
ok golang.org/x/crypto/bcrypt 4.694s
ok golang.org/x/crypto/blake2b 0.056s
ok golang.org/x/crypto/blake2s 0.050s
ok golang.org/x/crypto/blowfish 0.015s
ok golang.org/x/crypto/bn256 0.460s
ok golang.org/x/crypto/cast5 4.204s
ok golang.org/x/crypto/chacha20poly1305 0.560s
ok golang.org/x/crypto/cryptobyte 0.014s
? golang.org/x/crypto/cryptobyte/asn1 [no test files]
ok golang.org/x/crypto/curve25519 0.025s
ok golang.org/x/crypto/ed25519 0.073s
? golang.org/x/crypto/ed25519/internal/edwards25519 [no test files]
ok golang.org/x/crypto/hkdf 0.012s
ok golang.org/x/crypto/internal/chacha20 0.047s
ok golang.org/x/crypto/internal/subtle 0.011s
ok golang.org/x/crypto/md4 0.013s
ok golang.org/x/crypto/nacl/auth 9.226s
ok golang.org/x/crypto/nacl/box 0.016s
ok golang.org/x/crypto/nacl/secretbox 0.012s
ok golang.org/x/crypto/nacl/sign 0.012s
ok golang.org/x/crypto/ocsp 0.047s
ok golang.org/x/crypto/openpgp 8.872s
ok golang.org/x/crypto/openpgp/armor 0.012s
ok golang.org/x/crypto/openpgp/clearsign 16.984s
ok golang.org/x/crypto/openpgp/elgamal 0.013s
? golang.org/x/crypto/openpgp/errors [no test files]
ok golang.org/x/crypto/openpgp/packet 0.159s
ok golang.org/x/crypto/openpgp/s2k 7.597s
ok golang.org/x/crypto/otr 0.612s
ok golang.org/x/crypto/pbkdf2 0.045s
ok golang.org/x/crypto/pkcs12 0.073s
ok golang.org/x/crypto/pkcs12/internal/rc2 0.013s
ok golang.org/x/crypto/poly1305 0.016s
ok golang.org/x/crypto/ripemd160 0.034s
ok golang.org/x/crypto/salsa20 0.013s
ok golang.org/x/crypto/salsa20/salsa 0.013s
ok golang.org/x/crypto/scrypt 0.942s
ok golang.org/x/crypto/sha3 0.140s
ok golang.org/x/crypto/ssh 0.939s
ok golang.org/x/crypto/ssh/agent 0.529s
ok golang.org/x/crypto/ssh/knownhosts 0.027s
ok golang.org/x/crypto/ssh/terminal 0.016s
ok golang.org/x/crypto/tea 0.010s
ok golang.org/x/crypto/twofish 0.019s
ok golang.org/x/crypto/xtea 0.012s
ok golang.org/x/crypto/xts 0.016s
```
Things removed:
* features.EmbedSCTs (and all the associated RA/CA/ocsp-updater code etc)
* ca.enablePrecertificateFlow (and all the associated RA/CA code)
* sa.AddSCTReceipt and sa.GetSCTReceipt RPCs
* publisher.SubmitToCT and publisher.SubmitToSingleCT RPCs
Fixes#3755.
The Boulder orphan-finder command uses the SA's AddCertificate RPC to add orphaned certificates it finds back to the DB. Prior to this commit this RPC always set the core.Certificate.Issued field to the
current time. For the orphan-finder case this meant that the Issued date would incorrectly be set to when the certificate was found, not when it was actually issued. This could cause cert-checker to alarm based on the unusual delta between the cert NotBefore and the core.Certificate.Issued value.
This PR updates the AddCertificate RPC to accept an optional issued timestamp in the request arguments. In the SA layer we address deployability concerns by setting a default value of the current time when none is explicitly provided. This matches the classic behaviour and will let an old RA communicate with a new SA.
This PR updates the orphan-finder to provide an explicit issued time to sa.AddCertificate. The explicit issued time is calculated using the found certificate's NotBefore and the configured backdate.
This lets the orphan-finder set the true issued time in the core.Certificate object, avoiding any cert-checker alarms.
Resolves#3624
Adds SCT embedding to the certificate issuance flow. When a issuance is requested a precertificate (the requested certificate but poisoned with the critical CT extension) is issued and submitted to the required CT logs. Once the SCTs for the precertificate have been collected a new certificate is issued with the poison extension replace with a SCT list extension containing the retrieved SCTs.
Fixes#2244, fixes#3492 and fixes#3429.
We removed most of the component-specific configs out of cmd a long time ago. We
left CAConfig in, because it is used directly in the NewCertificateAuthority
constructor. That means that moving CAConfig into cmd/boulder-ca would have
resulted in a circular dependency.
Eventually we probably want to decompose CAConfig so it's a set of arguments to
NewCertificateAuthority, but as a short term improvement, move the config into
its own package to break the dependency. This has the advantage of removing a
couple of big dependencies from cmd.
Add a logging statement that fires when a remote VA fail causes
overall failure. Also change remoteValidationFailures into a
counter that counts the same thing, instead of a histogram. Since
the histogram had the default bucket sizes, it failed to collect
what we needed, and produced more metrics than necessary.
Stub out IssueCertificateForPrecertificate() enough so that we can continue with the PRs that implement & test it in parallel with PRs that implement and test the calling side (via mock implementations of the CA side).
The notBefore date in certificates is set based on the current system time,
not based on ca.clk. Work around that problem in the issuance tests by
syncing the test ca.clk with the system time. This doesn't affect any
current tests but is required for upcoming tests to work correctly.
* CA: Stub IssuePrecertificate gPRC method.
* CA: Implement IssuePrecertificate.
* CA: Test Precertificate flow in TestIssueCertificate().
move verification of certificate storage
IssuePrecertificate tests
Add CT precertificate poison extension to CFSSL whitelist.
CFSSL won't allow us to add an extension to a certificate unless that
certificate is in the whitelist.
According to its documentation, "Extensions requested in the CSR are
ignored, except for those processed by ParseCertificateRequest (mainly
subjectAltName)." Still, at least we need to add tests to make sure a
poison extension in a CSR isn't copied into the final certificate.
This allows us to avoid making invasive changes to CFSSL.
* CA: Test precertificate issuance in TestInvalidCSRs().
* CA: Only support IssuePrecertificate() if it is explicitly enabled.
* CA: Test that we produce CT poison extensions in the valid form.
The poison extension must be critical in order to work correctly. It probably wouldn't
matter as much what the value is, but the spec requires the value to be ASN.1 NULL, so
verify that it is.
The test previously used an invalid encoding of the CT poison extension
(the value was empty, but a valid CT poison extension has a NULL value).
In preparation for testing specifically how the CT poison extension is
handled, change the test to use a different extension instead.
Split the profile issuance tests such that there is one call to IssueCertificate per test, like
the other certificate issuance tests. This will make it easier to later move the calls to
IssueCertificate() into TestIssueCertificate(), which will make it much easier to test the
precertificate-based flow in addition to the current issuance flow.
Take a step towards enabling the testing of precertificate issuance by
enabling the logic in these tests to be used for testing all forms
of Certificate and Precertificate issuance.
An early design mistake meant that some fields of our services were exported
unnecessarily. In particular, fields storing handles of other services (e.g.
"SA" or "PA") were exported. This introduces the possibility of race conditions,
though in practice these fields are set at startup and never modified
concurrently.
We'd like to go through our codebase and change these all to unexported fields,
set at construction time. This is one step in that process.
This is a step towards the long-term goal of eliminating wrappers and a step
towards the short-term goal of making it easier to refactor ca/ca_test.go to
add testing of precertificate-based issuance.
This will make it easier to add new tests of this form and will also
make it easier to adapt the tests to also test the precertificate +
certificate issuance flow.
Remove the use of mocks for stats in ca_test.go in order to make refactoring
those tests easier. To do so, switch to the same pattern used by the
signature metrics.
Following up on #2752, we don't need to use global vars for our Prometheus stats. We already have a custom registry plumbed through using Scope objects. In this PR, expose the MustRegister method of that registry through the Scope interface, and move existing global vars to be fields of objects. This should improve testability somewhat.
Note that this has a bit of an unfortunate side effect: two instances of the same stats-using class (e.g. VA) can't use the same Scope object, because their MustRegister calls will conflict. In practice this is fine since we never instantiate duplicates of the the classes that use stats, but it's something we should keep an eye on.
Updates #2733
Generate first OCSP response in ca.IssueCertificate instead of ocsp-updater.newCertificateTick
if features.GenerateOCSPEarly is enabled. Adds a new field to the sa.AddCertiifcate RPC for
the OCSP response and only adds it to the certificate status + sets ocspLastUpdated if it is a
non-empty slice. ocsp-updater.newCertificateTick stays the same so we can catch certificates
that were successfully signed + stored but a OCSP response couldn't be generated (for whatever
reason).
Fixes#2477.
This patch removes all usages of the `core.XXXError` and almost all usages of `probs` outside of the WFE and VA and replaces them with a unified internal error type. Since the VA uses `probs.ProblemDetails` quite extensively in challenges, and currently stores them in the DB I've saved this change for another change (it'll also require a migration). Since `ProblemDetails` should only ever be exposed to end-users all of its related logic should be moved into the `WFE` but since it still needs to be exposed to the VA and SA I've left it in place for now.
The new internal `errors` package offers the same convenience functions as `probs` does as well as a new simpler type testing method. A few small changes have also been made to error messages, mainly adding the library and function name to internal server errors for easier debugging (i.e. where a number of functions return the exact same errors and there is no other way to distinguish which method threw the error).
Also adds proper encoding of internal errors transferred over gRPC (the current encoding scheme is kept for `core` and `probs` errors since it'll be ideally be removed after we deploy this and follow-up changes) using `grpc/metadata` instead of the gRPC status codes.
Fixes#2507. Updates #2254 and #2505.
Add a super basic counter for certificate and OCSP signatures so we have a slightly less noisy idea of our current HSM signing performance and where it is going.
Fixes#2438.
Pulls in logging improvements in OCSP Responder and the CT client, plus a handful of API changes. Also, the CT client verifies responses by default now.
This change includes some Boulder diffs to accommodate the API changes.
Updates #1699.
Adds a new package, `features`, which exposes methods to set and check if various internal features are enabled. The implementation uses global state to store the features so that services embedded in another service do not each require their own features map in order to check if something is enabled.
Requires a `boulder-tools` image update to include `golang.org/x/tools/cmd/stringer`.
Part of #2080.
This change vendors `crypto/x509`, `crypto/x509/pkix`, and `encoding/asn1` from 1d5f6a765d. That commit is a direct child of the Go 1.5.4 release tag, so it contains the same code as the current Go version we are using. In that commit I rewrote imports in those packages so they depend on each other internally rather than calling out to the standard library, which would cause type disagreements.
I changed the imports in each place where we're parsing CSRs, and imported under a different name `oldx509`, both to avoid collisions and make it clear what's going on. Places that only use `x509` to parse certificates are not changed, and will use the current standard library.
This will unblock us from moving to Go 1.6, and subsequently Go 1.7.