Adds SCT embedding to the certificate issuance flow. When issuance is requested, a precertificate (the requested certificate, poisoned with the critical CT extension) is issued and submitted to the required CT logs. Once the SCTs for the precertificate have been collected, a new certificate is issued with the poison extension replaced by an SCT list extension containing the retrieved SCTs.
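As a sketch of the mechanics (not Boulder's actual issuance code), the two RFC 6962 extensions involved look roughly like this:

```go
package issuance

import (
	"crypto/x509/pkix"
	"encoding/asn1"
)

// RFC 6962 OIDs: the critical poison extension marks a precertificate;
// the SCT list extension carries the collected SCTs in the final certificate.
var (
	oidCTPoison  = asn1.ObjectIdentifier{1, 3, 6, 1, 4, 1, 11129, 2, 4, 3}
	oidCTSCTList = asn1.ObjectIdentifier{1, 3, 6, 1, 4, 1, 11129, 2, 4, 2}
)

// poisonExtension is the critical extension embedded in a precertificate.
// Its value is the DER encoding of ASN.1 NULL.
func poisonExtension() pkix.Extension {
	return pkix.Extension{Id: oidCTPoison, Critical: true, Value: []byte{0x05, 0x00}}
}

// sctListExtension wraps a TLS-encoded SignedCertificateTimestampList in an
// OCTET STRING for inclusion in the final certificate.
func sctListExtension(sctList []byte) (pkix.Extension, error) {
	val, err := asn1.Marshal(sctList)
	if err != nil {
		return pkix.Extension{}, err
	}
	return pkix.Extension{Id: oidCTSCTList, Value: val}, nil
}
```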
Fixes #2244, fixes #3492, and fixes #3429.
Right now we check safe browsing at new-authz time, which introduces a possible
external dependency when calling new-authz. This is usually fine, since most safe
browsing checks can be satisfied locally, but when requests have to go external,
it can create variance in new-authz timing.
Fixes #3491.
This code was never enabled in production. Our original intent was to
ship this as part of the ACMEv2 API. Before that could happen, flaws were
identified in TLS-SNI-01 and TLS-SNI-02 that resulted in TLS-SNI-02 being removed
from the ACME protocol. We won't ever be enabling this code, so we
might as well remove it.
Add a set of logs which will be submitted to but not relied on for their SCTs.
This allows us to test submissions to a particular log, or to submit to a log which
is not yet approved by a browser/root program.
Also add a feature which stops cancellation of remaining submissions when racing
to get an SCT from a group of logs.
Additionally add an informational log that always times out in config-next.
Fixes #3464 and fixes #3465.
Removes usage of the `EnforceChallengeDisable` feature. The feature itself is not removed, as it is still configured in staging/production; once that is fixed I'll submit another PR removing the actual flag.
This keeps the behavior that authorizations retrieved from the SA have their challenges populated, since that seems to make the most sense to me. It also retains TLS re-validation.
Fixes #3441.
Boulder is fairly noisy about gRPC connection errors. This is a mixed
blessing: Our gRPC configuration will try to reconnect until it hits
an RPC deadline, and most likely eventually succeed. In that case,
we don't consider those to really be errors. However, in cases where
a connection is repeatedly failing, we'd like to see errors in the
logs about connection failure, rather than "deadline exceeded." So
we want to keep logging of gRPC errors.
However, right now we get a lot of these errors logged during
integration tests. They make the output hard to read, and may disguise
more serious errors. So we'd like to avoid causing such errors in
normal integration test operation.
This change reorders the startup of Boulder components by their gRPC
dependencies, so each component's backends are likely to be up and running
before it starts. It also reverses that order for clean shutdowns,
and waits for each process to exit before signalling the next one.
With these changes, I still got connection errors. Taking listenbuddy
out of the gRPC path fixed them. I believe the issue is that
listenbuddy is not a truly transparent proxy. In particular, it
accepts an inbound TCP connection before opening an outbound TCP
connection. If opening that outbound connection results in "connection
refused," it closes the inbound connection. That means gRPC sees a
"connection closed" (or "connection reset"?) rather than "connection
refused". I'm guessing it handles those cases differently, explaining
the different error results.
We've been using listenbuddy to trigger disconnects while Boulder is
running, to ensure that gRPC's reconnect code works. I think we can
probably rely on gRPC's reconnect to work. The initial problem that
led us to start testing this was a configuration problem; now that
we have the configuration we want, we should be fine and don't need
to keep testing reconnects on every integration test run.
For the upcoming SCT embedding changes, it will be useful to have a CT test server that blocks for nontrivial amounts of time before responding. This change introduces a config file for `ct-test-srv` that can be used to set up multiple "personalities" on various ports. Each personality can have a "latencySchedule" that determines how long it will sleep before responding to a submission.
This change also introduces two new "personalities" on :4510 and :4511, plus configures CTLogGroups in the RA. Having four CT log personalities allows us to simulate two nontrivial log groups.
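For illustration, a config along these lines would give one fast and one slow personality; key names other than "latencySchedule" are guesses at the schema, not the actual one:

```json
{
  "personalities": [
    { "addr": ":4510", "latencySchedule": [0.5] },
    { "addr": ":4511", "latencySchedule": [15.0] }
  ]
}
```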
Note: This triggers Publisher to emit audit errors on timed-out submissions. We may want to make Publisher not treat those as errors, and instead only log an error if a whole log group fails.
This commit implements a mapping from certificate AIA Issuer URL to PEM-encoded
certificate chain. GETs to the V2 Certificate endpoint will return a full
PEM-encoded certificate chain in addition to the leaf cert, using the AIA
issuer URL of the leaf cert and the configured mapping.
The boulder-wfe2 command builds the chain mapping by reading the
"wfe" config section's "certificateChains" field, which specifies a list
of file paths to PEM certificates for each AIA issuer URL. At startup
the PEM file contents are read, verified, and separated by a newline.
The resulting populated AIA issuer URL -> PEM cert chain mapping is
given to the WFE for use with the Certificate endpoint.
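For illustration, the mapping might look like this in the "wfe" config section (the URL and paths are placeholders):

```json
{
  "wfe": {
    "certificateChains": {
      "http://127.0.0.1:4000/acme/issuer-cert": [
        "/path/to/intermediate-a.pem",
        "/path/to/intermediate-b.pem"
      ]
    }
  }
}
```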
Resolves #3291
The WFE2 doesn't check any of the feature flags that are configured in
the `test/config/wfe2.json` and `test/config-next/wfe2.json` config
files - we default to acting as if all new features are enabled for the
V2 work. This commit removes the flags from the config to avoid
confusion or expectations that changing the config will disable the
features.
This change adds a feature flag, TLSSNIRevalidation. When it is enabled, Boulder
will create new authorization objects with TLS-SNI challenges if the requesting
account has issued a certificate with the relevant domain name, and was the most
recent account to do so*. This setting overrides the configured list of
challenges in the PolicyAuthority, so even if TLS-SNI is disabled in general, it
will be enabled for revalidation.
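A minimal sketch of that decision, assuming a hypothetical SA query (the real check is spread across the RA and SA):

```go
package ra

import "context"

// storageGetter is a hypothetical slice of the SA interface used here.
type storageGetter interface {
	// MostRecentIssuerForName is a hypothetical query: the account ID that
	// most recently issued for the given domain (0 if none has).
	MostRecentIssuerForName(ctx context.Context, domain string) (int64, error)
}

// tlsSNIChallengeAllowed sketches the TLSSNIRevalidation decision: the
// requesting account must have previously issued for the name, and must be
// the most recent account to have done so.
func tlsSNIChallengeAllowed(ctx context.Context, sa storageGetter, acctID int64, domain string, revalidationEnabled bool) (bool, error) {
	if !revalidationEnabled {
		return false, nil
	}
	prev, err := sa.MostRecentIssuerForName(ctx, domain)
	if err != nil {
		return false, err
	}
	return prev == acctID, nil
}
```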
Note that this interacts with EnforceChallengeDisable. Because
EnforceChallengeDisable causes additional checks at validation time and at
issuance time, we need to update those two places as well. We'll send a
follow-up PR with that.
*We chose to make this work only for the most recent account to issue, even if
there were overlapping certificates, because it significantly simplifies the
database access patterns and should work for 95+% of cases.
Note that this change will let an account revalidate and reissue for a domain
even if the previous issuance on that account used http-01 or dns-01. This also
simplifies implementation, and fits within the intent of the mitigation plan: If
someone previously issued for a domain using http-01, we have high confidence
that they are actually the owner, and they are not going to "steal" the domain
from themselves using tls-sni-01.
Also note: this change doesn't work properly with ReusePendingAuthz: true.
Specifically, if you attempted issuance in the last couple days and failed
because there was no tls-sni challenge, you'll still have an http-01 challenge
lying around, and we'll reuse that; then your client will fail due to lack of
tls-sni challenge again.
This change was joint work between @rolandshoemaker and @jsha.
This updates the PA component to allow authorization challenge types that are globally disabled if the account ID owning the authorization is on a configured whitelist for that challenge type.
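A sketch of the check with assumed data structures, not the PA's actual types:

```go
package policy

// challengePolicy sketches the data involved in the whitelist check.
type challengePolicy struct {
	enabledChallenges map[string]bool           // globally enabled challenge types
	whitelist         map[string]map[int64]bool // challenge type -> whitelisted account IDs
}

// challengeTypeEnabled returns true if the challenge type is globally
// enabled, or if this account is whitelisted for an otherwise-disabled type.
func (p *challengePolicy) challengeTypeEnabled(challenge string, accountID int64) bool {
	if p.enabledChallenges[challenge] {
		return true
	}
	return p.whitelist[challenge][accountID]
}
```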
Previously, there was a disagreement between WFE and CA as to what the correct
issuer certificate was. Consolidate on test-ca2.pem (h2ppy h2cker fake CA).
Also, the CA configs contained an outdated entry for "IssuerCert", which was not
being used: The CA configs now use an "Issuers" array to allow signing by
multiple issuer certificates at once (for instance when rolling intermediates).
Removed this outdated entry, and the config code for CA to load it. I've
confirmed these changes match what is currently in production.
Added an integration test to check for this problem in the future.
Fixes #3309, thanks to @icing for bringing the issue to our attention!
This also includes changes from #3321 to clarify certificates for WFE.
- Encode certificate as PEM.
- Use lowercase for field names.
- Use termsOfServiceAgreed instead of Agreement.
- Use a different ToS URL for v2 that points at the v2 HTTPS port.
Resolves #3280
This PR implements issuance for wildcard names in the V2 order flow. By policy, pending authorizations for wildcard names only receive a DNS-01 challenge for the base domain. We do not re-use authorizations for the base domain that do not come from a previous wildcard issuance (e.g. a normal authorization for example.com turned valid by way of a DNS-01 challenge will not be reused for a *.example.com order).
The wildcard prefix is stripped off of the authorization identifier value in two places:
* When presenting the authorization to the user - ACME forbids having a wildcard character in an authorization identifier.
* When performing validation - we validate the base domain name without the `*.` prefix.
This PR is largely a rewrite/extension of #3231. Instead of using a pseudo-challenge-type (DNS-01-Wildcard) to indicate an authorization & identifier correspond to the base name of a wildcard order name we instead allow the identifier to take the wildcard order name with the *. prefix.
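A minimal sketch of the prefix stripping used for validation:

```go
package va

import "strings"

// baseDomainForValidation sketches the prefix stripping: validation for
// "*.example.com" is performed against "example.com".
func baseDomainForValidation(identifier string) string {
	return strings.TrimPrefix(identifier, "*.")
}
```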
This PR changes the VA's singleDialTimeout value from 5 * time.Second to 10 * time.Second. This will give slower servers a better chance to respond, especially for the multi-VA case where n requests arrive ~simultaneously.
This PR also bumps the RA->VA timeout by 5s and the WFE->RA timeout by 5s to accommodate the increased dial timeout. I put this in a separate commit in case we'd rather deal with this separately.
Makes a couple of changes:
* Change `SubmitToCT` to make submissions to each log in parallel instead of in serial (sketched below); this prevents a single slow log from eating up the majority of the deadline and causing submissions to other logs to fail
* Remove the 'submissionTimeout' field on the publisher, since it is actually bounded by the gRPC timeout and is misleading
* Add a timeout to the CT client's internal HTTP client so that when log servers hang indefinitely we actually do retries instead of using the entire submission deadline. Currently set at 2.5 minutes
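A sketch of the parallel fan-out described in the first bullet, with a stand-in CT client type:

```go
package publisher

import (
	"context"
	"sync"
)

// logClient is a stand-in for the real CT client type.
type logClient interface {
	Submit(ctx context.Context, der []byte) error
}

// submitToLogs sketches the parallel fan-out: every log shares the same
// deadline via ctx, but a slow log no longer starves the others.
func submitToLogs(ctx context.Context, logs []logClient, der []byte, onErr func(error)) {
	var wg sync.WaitGroup
	for _, l := range logs {
		wg.Add(1)
		go func(l logClient) {
			defer wg.Done()
			if err := l.Submit(ctx, der); err != nil {
				onErr(err) // failure is per-log, not fatal to the batch
			}
		}(l)
	}
	wg.Wait()
}
```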
Fixes #3218.
* Remove non-TLS support from mailer entirely
* Add a config option for trusted roots in expiration-mailer. If unset, it defaults to the system roots, so this does not need to be set in production.
* Use TLS in mail-test-srv, along with an internal root and localhost certificates signed by that root.
Go's default is 2: https://golang.org/src/database/sql/sql.go#L686.
Graphs show we are opening 100-200 fresh connections per second on the SA.
Changing this default should reduce that a lot, which should reduce load on both
the SA and MariaDB. This should also improve latency, since every new TCP
connection adds a little bit of latency.
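The change amounts to raising the idle-connection cap on the database handle; a sketch, with an illustrative limit:

```go
package sa

import (
	"database/sql"

	_ "github.com/go-sql-driver/mysql" // MySQL/MariaDB driver
)

func openDB(dsn string) (*sql.DB, error) {
	db, err := sql.Open("mysql", dsn)
	if err != nil {
		return nil, err
	}
	// database/sql keeps only 2 idle connections by default; raising the cap
	// lets connections be reused instead of opened fresh on every burst.
	db.SetMaxIdleConns(100) // illustrative value
	return db, nil
}
```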
We used to use TCP because we would request DNSSEC records from Unbound, and
they would always cause truncated records when present. Now that we no longer
request those (#2718), we can use UDP. This is better because the TCP serving
paths in Unbound are likely less thoroughly tested, and not optimized for high
load. In particular this may resolve some availability problems we've seen
recently when trying to upgrade to a more recent Unbound.
Note that this only affects the Boulder->Unbound path. The Unbound->upstream
path is already UDP by default (with TCP fallback for truncated ANSWERs).
This commit adds CAA `issue` parameter parsing and the `challenge` parameter to permit a single challenge type only. By setting `challenge=dns-01`, the nameserver keeps control over every issued certificate.
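A sketch of the value parsing (not the actual implementation); issuance would then be refused when `params["challenge"]` names a challenge type other than the one used:

```go
package caa

import "strings"

// parseCAAIssue sketches the parameter split for a CAA issue value such as
// "letsencrypt.org; challenge=dns-01".
func parseCAAIssue(value string) (domain string, params map[string]string) {
	parts := strings.Split(value, ";")
	domain = strings.TrimSpace(parts[0])
	params = make(map[string]string)
	for _, p := range parts[1:] {
		if kv := strings.SplitN(strings.TrimSpace(p), "=", 2); len(kv) == 2 {
			params[kv[0]] = kv[1]
		}
	}
	return domain, params
}
```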
This implements the pre-erratum 5065 version of CAA, behind a feature flag.
This involved refactoring DNSClient.LookupCAA to return a list of CNAMEs in addition to the CAA records, and adding an alternate lookuper that does tree-climbing on single-depth aliases.
Fixes #2889.
VA now implements two gRPC services: VA and CAA. These both run on the same port, but this allows implementation of the IsCAAValid RPC to skip using the gRPC wrappers, and makes it easier to potentially separate the service into its own package in the future.
RA.NewCertificate now checks the expiration times of authorizations, and will call out to VA to recheck CAA for those authorizations that were not validated recently enough.
The existing ReusePendingAuthz implementation had some bugs:
* It would recycle deactivated authorizations, which then couldn't be fulfilled. (#2840)
* Since it was implemented in the SA, it wouldn't get called until after the RA checks the Pending Authorizations rate limit, which means it wouldn't fulfill its intended purpose of making accounts less likely to get stuck in a Pending Authorizations limited state. (#2831)
This factors out the reuse functionality, which used to be inside an "if" statement in the SA. Now the SA has an explicit GetPendingAuthorization RPC, which gets called from the RA before calling NewPendingAuthorization. This happens to obsolete #2807, by putting the recycling logic for both valid and pending authorizations in the RA.
This PR is the initial duplication of the WFE to create a WFE2
package. The rationale is briefly explained in `wfe2/README.md`.
Per #2822 this PR only lays the groundwork for further customization
and deduplication. Presently both the WFE and WFE2 are identical except
for the following configuration differences:
* The WFE offers HTTP and HTTPS on 4000 and 4430 respectively; the WFE2
offers HTTP and HTTPS on 4001 and 4431.
* The WFE has a debug port on 8000; the WFE2 uses the next free "8000
range" port and puts its debug service on 8013.
Resolves https://github.com/letsencrypt/boulder/issues/2822
Adds basic multi-path validation functionality. A new method `performRemoteValidation` is added to `boulder-va` which is called if it is configured with a list of remote VA gRPC addresses. In this initial implementation the remote VAs are only used to check the validation result of the main VA: if all of the remote validations succeed but the local validation failed, the overall validation will still fail. Remote VAs use the exact same code as the local VA to perform validation. If the local validation succeeds then a configured quorum of the remote VA successes must be met in order to fully complete the validation.
This implementation assumes that metrics are collected from the remote VAs in order to have visibility into their individual validation latencies etc.
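A sketch of the quorum rule, with illustrative names:

```go
package va

// validationSucceeded sketches the quorum rule described above: remote VAs
// can veto a local success, but cannot rescue a local failure.
func validationSucceeded(localOK bool, remoteOKs []bool, quorum int) bool {
	if !localOK {
		return false
	}
	succeeded := 0
	for _, ok := range remoteOKs {
		if ok {
			succeeded++
		}
	}
	return succeeded >= quorum
}
```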
Fixes #2621.
As described in Boulder issue #2800 the implementation of the SA's
`countCertificates` function meant that the renewal exemption for the
Certificates Per Domain rate limit was difficult to work with. To
maximize allotted certificates, clients were required to perform all new
issuances first, followed by the "free" renewals. This arrangement was
difficult to coordinate.
In this PR `countCertificates` is updated such that renewals are
excluded from the count reliably. To do so the SA takes the serials it
finds for a given domain from the issuedNames table and cross references
them with the FQDN sets it can find for the associated serials. With the
FQDN sets a second query is done to find all the non-renewal FQDN sets
for the serials, giving a count of the total non-renewal issuances to
use for rate limiting.
Resolves #2800
Adds a basic truncated modulus hash check for RSA keys that can be used to check keys against the Debian `{openssl,openssh,openvpn}-blacklist` lists of weak keys generated during the [Debian weak key incident](https://wiki.debian.org/SSLkeys).
Testing is gated on adding a new configuration key to the WFE, RA, and CA configs which contains the path to a directory which should contain the weak key lists.
Fixes #157.
This removes the config and code to output to statsd.
- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.
Fixes #2639, #2733.
This commit adds the acme draft-02+ optional "meta" element for the
/directory response. Presently we only include the optional
"terms-of-service" URL. Whether the meta entry is included is controlled
by two factors:
1. The state of the "DirectoryMeta" feature flag, which defaults to
off
2. Whether the client advertises the UA we know to be intolerant of
new directory entries.
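With the flag enabled and a tolerant UA, the directory response would look roughly like this (URLs illustrative):

```json
{
  "new-authz": "https://acme.example.com/acme/new-authz",
  "new-cert": "https://acme.example.com/acme/new-cert",
  "new-reg": "https://acme.example.com/acme/new-reg",
  "meta": {
    "terms-of-service": "https://acme.example.com/terms"
  }
}
```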
The TestDirectory unit test is updated to test both states of the flag
and the UA detection.
This PR introduces a new feature flag "IPv6First".
When the "IPv6First" feature is enabled the VA's HTTP dialer and TLS SNI
(01 and 02) certificate fetch requests will attempt to automatically
retry when the initial connection was to IPv6 and there is an IPv4
address available to retry with.
This resolves https://github.com/letsencrypt/boulder/issues/2623
Prior to this PR, if a domain was an exact match to a public suffix
list entry, the certificates per name rate limit was applied based on the
count of certificates issued for that exact name and all of its
subdomains.
This PR introduces an exception such that exact public suffix
matches correctly have the certificate per name rate limit applied based
on only exact name matches.
In order to accomplish this a new RPC is added to the SA:
`CountCertificatesByExactNames`. This operates similarly to the existing
`CountCertificatesByNames` but does *not* include subdomains in the
count, only exact matches to the names provided. The usage of this new
RPC is gated behind the "CountCertificatesExact" feature flag.
The RA unit tests are updated to test the new code paths both with and
without the feature flag enabled.
Resolves #2681
Deletes github.com/streadway/amqp and the various RabbitMQ setup tools etc. Changes how listenbuddy is used to proxy all of the gRPC client -> server connections so we test reconnection logic.
+49 -8,221 😁 Fixes #2640 and #2562.
Generate the first OCSP response in ca.IssueCertificate instead of ocsp-updater.newCertificateTick
if features.GenerateOCSPEarly is enabled. Adds a new field to the sa.AddCertificate RPC for
the OCSP response, which is only added to the certificate status (and ocspLastUpdated set) if it is a
non-empty slice. ocsp-updater.newCertificateTick stays the same so we can catch certificates
that were successfully signed + stored but for which an OCSP response couldn't be generated (for whatever
reason).
Fixes #2477.
Adds a daemon mode to `expiration-mailer`, triggered by the `--daemon` flag, in order to follow deployability guidelines. If the `--daemon` flag is used, the `mailer.runPeriod` config field is checked for a tick duration; if the value is `0` it exits.
Super lightweight implementation; OCSP-Updater has some custom ticker code which we use to do fancy things when the method being invoked in the loop takes longer than expected, but that isn't necessary here.
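A minimal sketch of the loop, assuming a `runOnce` callback:

```go
package mailer

import (
	"log"
	"time"
)

// daemon sketches the lightweight loop: a plain time.Ticker driven by
// mailer.runPeriod, with none of the ocsp-updater's catch-up logic.
func daemon(runPeriod time.Duration, runOnce func() error) {
	t := time.NewTicker(runPeriod)
	defer t.Stop()
	for range t.C {
		if err := runOnce(); err != nil {
			log.Printf("expiration-mailer tick failed: %s", err)
		}
	}
}
```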
Fixes #2617.
I think these are all the necessary changes to implement TLS-SNI-02 validation, according to section 7.3 of draft-05:
https://tools.ietf.org/html/draft-ietf-acme-acme-05#section-7.3
I don't have much experience with this code, I'll really appreciate your feedback.
Signed-off-by: David Calavera <david.calavera@gmail.com>
The NotAfter and IsExpired fields on the certificateStatus table
have been migrated in staging & production. Similarly the
CertStatusOptimizationsMigrated feature flag has been turned on after
a successful backfill operation. We have confirmed the optimization is
working as expected and can now clean out the duplicated v1 and v2
models, and the feature flag branching. The notafter-backfill command
is no longer useful and so this commit also cleans it out of the repo.
Note: Some unit tests were sidestepping the SA and inserting
certificateStatus rows explicitly. These tests had to be updated to
set the NotAfter field in order for the queries used by the
ocsp-updater and the expiration-mailer to perform the way the tests
originally expected.
Resolves #2530
In order to provide the correct issuer certificate for older certificates after an issuer certificate rollover or when using multiple issuer certificates (e.g. RSA and ECDSA), use the AIA CA Issuer URL embedded in the certificate for the rel="up" link served by WFE. This behaviour is gated behind the UseAIAIssuerURL feature, which defaults to false.
To prevent MitM vulnerabilities in cases where the AIA URL is HTTP-only, it is upgraded to HTTPS.
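A sketch of the scheme upgrade:

```go
package wfe

import "net/url"

// upgradeToHTTPS sketches the scheme upgrade applied to the AIA CA Issuer
// URL before it is served in the rel="up" link.
func upgradeToHTTPS(issuerURL string) (string, error) {
	u, err := url.Parse(issuerURL)
	if err != nil {
		return "", err
	}
	if u.Scheme == "http" {
		u.Scheme = "https"
	}
	return u.String(), nil
}
```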
This also adds a test for the issuer URL returned by the /acme/cert endpoint. wfe/test/178.{crt,key} were regenerated to add the AIA extension required to pass the test.
/acme/cert was changed to return an absolute URL to the issuer endpoint (making it consistent with /acme/new-cert).
Fixes #1663
Based on #1780
This PR has three primary contributions:
1. The existing code for using the V4 safe browsing API introduced in #2446 had some bugs that are fixed in this PR.
2. A gsb-test-srv is added to provide a mock Google Safebrowsing V4 server for integration testing purposes.
3. A short integration test is added to test end-to-end GSB lookup for an "unsafe" domain.
For 1) most notably Boulder was assuming the new V4 library accepted a directory for its database persistence when it instead expects an existing file to be provided. Additionally, the VA wasn't properly instantiating feature flags, preventing the V4 API from being used by the VA.
For 2) the test server is designed to have a fixed set of "bad" domains (Currently just honest.achmeds.discount.hosting.com). When asked for a database update by a client it will package the list of bad domains up & send them to the client. When the client is asked to do a URL lookup it will check the local database for a matching prefix, and if found, perform a lookup against the test server. The test server will process the lookup and increment a count for how many times the bad domain was asked about.
For 3) the Boulder startservers.py was updated to start the gsb-test-srv and the VA is configured to talk to it using the V4 API. The integration test consists of attempting issuance for a domain pre-configured in the gsb-test-srv as a bad domain. If the issuance succeeds we know the GSB lookup code is faulty. If the issuance fails, we check that the gsb-test-srv received the correct number of lookups for the "bad" domain and fail if the expected count doesn't match reality.
Notes for reviewers:
* The gsb-test-srv has to be started before anything will use it. Right now the v4 library handles database update request failures poorly and will not retry for 30min. See google/safebrowsing#44 for more information.
* There's not an easy way to test for "good" domain lookups, only hits against the list. The design of the V4 API is such that a list of prefixes is delivered to the client in the db update phase, and if the domain in question matches no prefixes then the lookup is deemed unnecessary and not performed. I experimented with sending 256 one-byte prefixes to try and trick the client to always do a lookup, but the min prefix size is 4 bytes and enumerating all possible prefixes seemed gross.
* The test server has a /add endpoint that could be used by integration tests to add new domains to the block list, but it isn't being used presently. The trouble is that the client only updates its database every 30 minutes at present, and so adding a new domain will only take effect after the client updates the database.
Resolves #2448
Pulls in logging improvements in OCSP Responder and the CT client, plus a handful of API changes. Also, the CT client verifies responses by default now.
This change includes some Boulder diffs to accommodate the API changes.
Set authorizationLifetimeDays to 60 across both config and config-next.
Set NumSessions to 2 in both config and config-next. A decrease from 10 because pkcs11-proxy (or pkcs11-daemon?) seems to error out under load if you have more sessions than CPUs.
Reorder parallelGenerateOCSPRequests to match config-next.
Remove extra tags for parsing yaml in config objects.
We have a number of stats already expressed using the statsd interface. During
the switchover period to direct Prometheus collection, we'd like to make those
stats available both ways. This change automatically exports any stats exported
using the statsd interface via Prometheus as well.
This is a little tricky because Prometheus expects all stats to be registered
exactly once. Prometheus does offer a mechanism to gracefully recover from
registering a stat more than once by handling a certain error, but it is not
safe for concurrent access. So I added a concurrency-safe wrapper that creates
Prometheus stats on demand and memoizes them.
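A sketch of that wrapper, using the real client_golang API but illustrative names:

```go
package metrics

import (
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

// autoRegisterer memoizes Prometheus collectors by name so that a stat
// emitted many times through the statsd interface is registered exactly
// once, and concurrent callers never race on registration.
type autoRegisterer struct {
	mu       sync.Mutex
	registry prometheus.Registerer
	counters map[string]prometheus.Counter
}

func newAutoRegisterer(r prometheus.Registerer) *autoRegisterer {
	return &autoRegisterer{registry: r, counters: make(map[string]prometheus.Counter)}
}

func (a *autoRegisterer) counter(name string) prometheus.Counter {
	a.mu.Lock()
	defer a.mu.Unlock()
	if c, ok := a.counters[name]; ok {
		return c
	}
	c := prometheus.NewCounter(prometheus.CounterOpts{Name: name})
	a.registry.MustRegister(c)
	a.counters[name] = c
	return c
}
```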
In the process, made a few small required side changes:
- Clean "/" from method names in the gRPC interceptors. They are allowed in
statsd but not in Prometheus.
- Replace "127.0.0.1" with "boulder" as the name of our testing CT log.
Prometheus stats can't start with a number.
- Remove ":" from the CT-log stat names emitted by Publisher. Prometheus stats
can't include it.
- Remove a stray "RA" in front of some rate limit stats, since it was
duplicative (we were emitting "RA.RA..." before).
Note that this means two stat groups in particular are duplicated:
- Gostats* is duplicated with the default process-level stats exported by the
Prometheus library.
- gRPCClient* are duplicated by the stats generated by the go-grpc-prometheus
package.
When writing dashboards and alerts in the Prometheus world, we should be careful
to avoid these two categories, as they will disappear eventually. As a general
rule, if a stat is available with an all-lowercase name, choose that one, as it
is probably the Prometheus-native version.
In the long run we will want to create most stats using the native Prometheus
stat interface, since it allows us to add labels to metrics, which is very
useful. For instance, currently our DNS stats distinguish types of queries by
appending the type to the stat name. This would be more natural as a label in
Prometheus.
Previously, a given binary would have three TLS config fields (CA cert, cert,
key) for its gRPC server, plus each of its configured gRPC clients. In typical
use, we expect all three of those to be the same across both servers and clients
within a given binary.
This change reuses the TLSConfig type already defined for use with AMQP, adds a
Load() convenience function that turns it into a *tls.Config, and configures it
for use with all of the binaries. This should make configuration easier and more
robust, since it more closely matches usage.
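A sketch of the type and its Load() helper; field names here are approximate:

```go
package cmd

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"io/ioutil"
)

// TLSConfig is the shared shape: one CA cert plus one cert/key pair used
// for both client and server authentication within a binary.
type TLSConfig struct {
	CACertFile string
	CertFile   string
	KeyFile    string
}

// Load turns the configured file paths into a *tls.Config usable on either
// side of a gRPC connection.
func (c TLSConfig) Load() (*tls.Config, error) {
	caPEM, err := ioutil.ReadFile(c.CACertFile)
	if err != nil {
		return nil, err
	}
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("failed to parse CA certificate")
	}
	cert, err := tls.LoadX509KeyPair(c.CertFile, c.KeyFile)
	if err != nil {
		return nil, err
	}
	return &tls.Config{
		RootCAs:      roots, // verify the servers we dial
		ClientCAs:    roots, // verify the clients that dial us
		Certificates: []tls.Certificate{cert},
		ClientAuth:   tls.RequireAndVerifyClientCert,
	}, nil
}
```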
This change preserves temporary backwards-compatibility for the
ocsp-updater->publisher RPCs, since those are the only instances of gRPC
currently enabled in production.
This allows finer-grained control of which components can request issuance. The OCSP Updater should not be able to request issuance.
Also, update test/grpc-creds/generate.sh to reissue the certs properly.
Resolves #2417
Previously, all gRPC services used the same client and server certificates. Now,
each service has its own certificate, which it uses for both client and server
authentication, more closely simulating production.
This also adds aliases for each of the relevant hostnames in /etc/hosts. There
may be some issues if Docker decides to rewrite /etc/hosts while Boulder is
running, but this seems to work for now.
Right now we are using a third-party client for the Google Safe Browsing API, but Google has recently released their own [Golang library](https://github.com/google/safebrowsing) which also supports the newer v4 API. Using this library will let us avoid fixing some lingering race conditions & unpleasantness in our fork of `go-safebrowsing-api`.
This PR adds support for using the Google library & the v4 API in place of our existing fork when the `GoogleSafeBrowsingV4` feature flag is enabled in the VA "features" configuration.
Resolves https://github.com/letsencrypt/boulder/issues/1863
Per `CONTRIBUTING.md` I also ran the unit tests for the new dependency:
```
daniel@XXXXXXXXXX:~/go/src/github.com/google/safebrowsing$ go test ./...
ok github.com/google/safebrowsing 3.274s
? github.com/google/safebrowsing/cmd/sblookup [no test files]
? github.com/google/safebrowsing/cmd/sbserver [no test files]
? github.com/google/safebrowsing/cmd/sbserver/statik [no test files]
? github.com/google/safebrowsing/internal/safebrowsing_proto [no test files]
ok github.com/google/safebrowsing/vendor/github.com/golang/protobuf/jsonpb 0.012s
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/jsonpb/jsonpb_test_proto [no test files]
ok github.com/google/safebrowsing/vendor/github.com/golang/protobuf/proto 0.062s
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/proto/proto3_proto [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/descriptor [no test files]
ok github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/generator 0.017s
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/grpc [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/protoc-gen-go/plugin [no test files]
ok github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes 0.009s
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/any [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/duration [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/empty [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/struct [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/timestamp [no test files]
? github.com/google/safebrowsing/vendor/github.com/golang/protobuf/ptypes/wrappers [no test files]
? github.com/google/safebrowsing/vendor/github.com/rakyll/statik [no test files]
? github.com/google/safebrowsing/vendor/github.com/rakyll/statik/fs [no test files]
ok github.com/google/safebrowsing/vendor/golang.org/x/net/idna 0.003s
```
Similar to #2431 the expiration-mailer's `findExpiringCertificates()` query can be optimized slightly by using `certificateStatus.NotAfter` in place of `certificate.expires` in the `WHERE` clause of its query when the `CertStatusOptimizationsMigrated` feature is enabled.
Resolves https://github.com/letsencrypt/boulder/issues/2425
Following on to https://github.com/letsencrypt/boulder/pull/2177 and https://github.com/letsencrypt/boulder/issues/2227 this PR adds code to the `ocsp-updater` that takes advantage of the migrations & backfill from the previous optimization PRs.
This has the primary effect of removing the `JOIN` on the `certificates` table in the `findStaleOCSPResponses` query. We expect this to be a big win in terms of query performance.
The `ocsp-updater` is also updated to opportunistically fill in the newly added `isExpired` field of the `CertificateStatus` table as it encounters rows that aren't marked as expired but correspond to an expired certificate.
Resolves https://github.com/letsencrypt/boulder/issues/2238 and #2239
Previously all OCSP signing and storage would be serial, which meant it was hard
to exercise the full capacity of our HSM. In this change, we run a limited
number of update and store requests in parallel.
This change also changes stats generation in generateOCSPResponses so we can
tell the difference between stats produced by new OCSP requests vs existing ones,
and adds a new stat that records how long the SQL query in findStaleOCSPResponses
takes.
This PR adds a new `OCSPStaleMaxAge` configuration parameter to the `ocsp-updater`. The default value when not provided is 30 days, and this is explicitly added to both `config/ocsp-updater.json` and `config-next/ocsp-updater.json`.
The OCSP updater uses this new parameter in `findStaleOCSPResponses` as a lower bound on the `ocspLastUpdated` field of the certificateStatus table. This is intended to speed up the processing of this query until we can land the proper fixes that require more intensive migrations & backfilling.
The `TestGenerateOCSPResponses` and `TestFindStaleOCSPResponses` unit tests had to be updated to explicitly set the `ocspLastUpdated` field of the certificate status rows that the tests add, because otherwise they are left at a default value of `0` and are excluded by the new `OCSPStaleMaxAge` functionality.
- Remove spinner from test.js. It made Travis logs hard to read.
- Listen on all interfaces for debugAddr. This makes it possible to check
Prometheus metrics for instances running in a Docker container.
- Standardize DNS timeouts on 1s and 3 retries across all configs. This ensures
DNS completes within the relevant RPC timeouts.
- Remove RA service queue from VA, since VA no longer uses the callback to RA on
completing a challenge.
This PR introduces the ability for the ocsp-updater to only resubmit certificates to logs that we are missing SCTs from. Prior to this commit, when a certificate was missing one or more SCTs, we would submit it to every log, causing unnecessary overhead for us and the log operator.
To accomplish this a new RPC endpoint is added to the Publisher service "SubmitToSingleCT". Unlike the existing "SubmitToCT" this RPC endpoint accepts a log URI and public key in addition to the certificate DER bytes. The certificate is submitted directly to that log, and a cache of constructed resources is maintained so that subsequent submissions to the same log can reuse the stat name, verifier, and submission client.
Resolves #1679
Adds a gRPC server to the SA and SA gRPC Clients to the WFE, RA, CA, Publisher, OCSP updater, orphan finder, admin revoker, and expiration mailer.
Also adds a CA gRPC client to the OCSP Updater which was missed in #2193.
Fixes #2347.
In https://groups.google.com/forum/#!topic/mozilla.dev.security.policy/_pSjsrZrTWY, we had a problem with the policy authority configuration, but cert-checker didn't alert about it because it uses the same policy configuration.
This PR adds support for an explicit list of regular expressions used to match forbidden names. The regular expressions are applied after the PA has done its usual validation process in order to act as a defense-in-depth mechanism for cases (.mil, .local, etc) that we know we never want to support, even if the PA thinks they are valid (e.g. due to a policy configuration malfunction).
Initially the forbidden name regexps are:
`^\s*$`,
`\.mil$`,
`\.local$`,
`^localhost$`,
`\.localhost$`,
Additionally, the existing cert-checker.json config in both test/config/ and test/config-next/ was missing the hostnamePolicyFile entry required for operation of cert-checker. This PR adds a hostnamePolicyFile entry pointing at the existing test/hostname-policy.json file. The cert checker can now be used in the dev env with `cert-checker -config test/config/cert-checker.json` without error.
Resolves #2366
With the current gRPC design the CA talks directly to the Publisher when calling SubmitToCT, which crosses security boundaries (secure internal segment -> internet-facing segment). This is dangerous if (however unlikely) the Publisher is compromised and there is a gRPC exploit that allows memory corruption on the caller end of an RPC, which could expose sensitive information or cause arbitrary issuance.
Instead we move the RPC call to the RA which is in a less sensitive network segment. Switching the call site from the CA -> RA is gated on adding the gRPC PublisherService object to the RA config.
Fixes #2202.
As described in #2282, our gRPC code uses mutual TLS to authenticate both clients and servers. However, currently our gRPC servers will accept any client certificate signed by the internal CA we use to authenticate connections. Instead, we would like each server to have a list of which clients it will accept. This will improve security by preventing the compromise of one client private key being used to access endpoints unrelated to its intended scope/purpose.
This PR implements support for gRPC servers to specify a list of accepted client names. A `serverTransportCredentials` implementing `ServerHandshake` uses a `verifyClient` function to enforce that the connecting peer presents a client certificate with a SAN entry that matches an entry on the list of accepted client names.
The `NewServer` function from `grpc/server.go` is updated to instantiate the `serverTransportCredentials` used by `grpc.NewServer`, specifying an accepted names list populated from the `cmd.GRPCServerConfig.ClientNames` config field.
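A sketch of the SAN check performed during the handshake, with illustrative names:

```go
package grpc

import (
	"crypto/x509"
	"fmt"
)

// verifyClient sketches the check run during ServerHandshake: the peer's
// leaf certificate must carry a DNS or IP SAN on the accepted-names list.
func verifyClient(leaf *x509.Certificate, acceptedNames map[string]bool) error {
	for _, name := range leaf.DNSNames {
		if acceptedNames[name] {
			return nil
		}
	}
	for _, ip := range leaf.IPAddresses {
		if acceptedNames[ip.String()] {
			return nil
		}
	}
	return fmt.Errorf("client certificate SANs %v / %v not on accepted list", leaf.DNSNames, leaf.IPAddresses)
}
```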
The pre-existing client and server certificates in `test/grpc-creds/` are replaced by versions that contain SAN entries as well as subject common names. A DNS and an IP SAN entry are added to allow testing both methods of specifying allowed SANs. The `generate.sh` script is converted to use @jsha's `minica` tool (OpenSSL CLI is blech!).
An example client whitelist is added to each of the existing gRPC endpoints in config-next/ to allow the SAN of the test RPC client certificate.
Resolves #2282
This PR reworks the validateEmail() function from the RA to allow timeouts during DNS validation of MX/A/AAAA records for an email to be non-fatal and match our intention to verify emails best-effort.
Notes:
bdns/problem.go - DNSError.Timeout() was changed to also include context cancellation and timeout as DNS timeouts. This matches what DNSError.Error() was doing to set the error message and supports external callers to Timeout not duplicating the work.
bdns/mocks.go - the LookupMX mock was changed to support always.error and always.timeout in a manner similar to the LookupHost mock. Otherwise the TestValidateEmail unit test for the RA would fail when the MX lookup completed before the Host lookup because the error wouldn't be correct (empty DNS records vs a timeout or network error).
test/config/ra.json, test/config-next/ra.json - the dnsTries and dnsTimeout values were updated such that dnsTries * dnsTimeout was <= the WFE->RA RPC timeout (currently 15s in the test configs). This allows the dns lookups to all timeout without the overall RPC timing out.
Resolves #2260.
Fixes #503.
Functionality is gated by the feature flag `AllowKeyRollover`. Since this functionality is only specified in ACME draft-03 and we mostly implement the draft-02 style this takes some liberties in the implementation, which are described in the updated divergences doc. The `key-change` resource is used to side-step draft-03 `url` requirement.
Right now, we only get single-threaded performance from our HSM, even though it
has multiple cores. We can use the pkcs11key's NewPool function to create a pool
of PKCS#11 sessions, allowing us to take advantage of the HSM's full
performance.
The "20160817143417_AddCertStatusNotAfter.sql" db migration adds a "notAfter" column to the certificateStatus database table. This field duplicates the contents of the certificates table "expires" column. This enables performance improvements (see #1864) for both the ocsp-updater and the expiration-mailer utilities.
Since existing rows will have a NULL value in the new field the notafter-backfill utility exists to perform a one-time update of the existing certificateStatus rows to set their notAfter column based on the data that exists in the certificates table.
This follows on https://github.com/letsencrypt/boulder/pull/2177 and requires that the migration be applied & the feature flag set accordingly before use.
Fixes #2237.
Adds feature-flagged support for issuing for IDNs. Fixes #597.
This patch expects that clients have performed valid IDNA2008 encoding on any label that includes unicode characters. Invalid encodings (including non-compatible IDNA2003 encodings) will be rejected. No script-mixing or script-exclusion checks are performed: we assume that if a name is resolvable it conforms to the registrar's policies on these matters, and that if it uses non-standard scripts in sub-domains etc. browsers should be the ones choosing how to display those names.
Required a full update of the golang.org/x/net tree to pull in golang.org/x/net/idna, all test suites pass.
Move features sections to the correct JSON object and only test registration validity if regCheck is true
* Pull other flag up to correct level
* Only check status update when status is non-empty
This PR adds a migration to create two new fields on the `certificateStatus` table: `notAfter` and `isExpired`. The rationale for these fields is explained in #1864. Usage of these fields is gated behind `features.CertStatusOptimizationsMigrated` per [CONTRIBUTING.md](https://github.com/letsencrypt/boulder/blob/master/CONTRIBUTING.md#gating-migrations). This flag should be set to true **only** when the `20160817143417_CertStatusOptimizations.sql` migration has been applied.
Points of difference from #2132 (the initial preparatory "all-in-one go" PR):
**Note 1**: Updating the `isExpired` field in the OCSP updater can not be done yet, the `notAfter` field needs to be fully populated first - otherwise a separate query or a messy `JOIN` would have to be used to determine if a certStatus `isExpired` by using the `certificates` table's `expires` field.
**Note 2**: Similarly we can't remove the `JOIN` on `certificates` from the `findStaleOCSPResponse` query yet until all DB rows have `notAfter` populated. This will happen in a separate **Part Two** PR.
The LookupIPv6 flag has been enabled in production and isn't required anymore. This PR removes the flag entirely.
The errA and errAAAA error handling in LookupHost is left as-is, meaning that a non-nil errAAAA will not be returned to the caller. This matches the existing behaviour, and the expectations of the TestDNSLookupHost unit tests.
This commit also removes the tests from TestDNSLookupHost that tested the LookupIPv6 == false behaviours since those are no longer implemented.
Resolves #2191
Updates #1699.
Adds a new package, `features`, which exposes methods to set and check if various internal features are enabled. The implementation uses global state to store the features so that services embedded in another service do not each require their own features map in order to check if something is enabled.
Requires a `boulder-tools` image update to include `golang.org/x/tools/cmd/stringer`.
Fixes #140.
This patch allows users to specify the following revocation reasons, based on my interpretation of the meaning of the codes, but this could use confirmation from others (a sketch of the accepted set follows the list).
* unspecified (0)
* keyCompromise (1)
* affiliationChanged (3)
* superseded (4)
* cessationOfOperation (5)
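A sketch of the accepted set as a Go map (the variable name is illustrative):

```go
package revocation

// userAllowedReasons is the accepted subset of RFC 5280 CRLReason codes;
// codes not listed, such as 2 (cACompromise), are rejected.
var userAllowedReasons = map[int]string{
	0: "unspecified",
	1: "keyCompromise",
	3: "affiliationChanged",
	4: "superseded",
	5: "cessationOfOperation",
}
```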
Another step in completing #1962, which will remove the global configuration file and codegangsta/cli from boulder. 3 more to go!
This PR is a little bit different than others in that there was a lot more reliance on codegangsta/cli, especially in the implementation of subcommands. I put some thought into creating our own SubCommand struct, but given the lack of complexity it seemed unnecessary, as the same could be accomplished with slightly more advanced usage of os and flag.
Right now we use the Source field for both DB and file URLs. However, we want to move to the DBConnect config field, so that we can take advantage of the code that reads DSNs from a file on disk. It turns out the existing code didn't work if you configure a dbConnect string, because it would error out with:
"source" parameter not found in JSON config
After rearranging, both methods should work.
Introduces the `authorizationLifetimeDays` and `pendingAuthorizationLifetimeDays` configuration options for `RA`.
If the values are missing from configuration, the code defaults back to the current values (300/7 days).
Fixes #2024
Instead of reading the CA key from a file on disk into memory and using that for signing in `boulder-ca` this patch adds a new Docker container that runs SoftHSM and pkcs11-proxy in order to hold the key and perform signing operations. The pkcs11-proxy module is used by `boulder-ca` to talk to the SoftHSM container.
This exercises (almost) the full pkcs11 path through boulder and will allow testing various HSM related failures in the future as well as simplifying tuning signing performance for benchmarking.
Fixes #703.
For the notify-mailer, this PR fixes a bug with the -end parameter where the default (99999999) would cause a slice index out of range error. This was fixed by setting the -end value to len(m.destinations) in run when it is too large.
For both the notify-mailer and the contact-exporter a bug was fixed that was comparing the required flags against nil when the defaults were set to a non-nil pointer to "". This resulted in confusing errors when the mandatory arguments were not provided.
This PR also adds separate config examples for both the notify-mailer and the contact-exporter to test/config and test/config-next respectively.
Finally, a documentation string was added to describe the overall design & usage of both tools, including example invocations.
Adds a test for CSRs generated using a pre-1.0.2 version of OpenSSL and a buggy client which will fail to parse with Golang 1.6+.
This test checks the values of the bytes at the 8th and 9th offsets, which in a properly formatted CSR should be the version integer declaration bytes; if the malformed values are present it returns an error informing the user that they are using an old version of OpenSSL and/or a client which doesn't explicitly set the CSR version.
Fixes #1902.
Moves the wfe to its own config file.
Each config will now belong in `test/config` and `test/config-next` analogous to `boulder-config` and `boulder-config-next`.