Commit Graph

155 Commits

Author SHA1 Message Date
Jacob Hoffman-Andrews 29724cb0b7
ocsp/responder: update Redis source to use live signing (#6207)
This enables ocsp-responder to talk to the RA and request freshly signed
OCSP responses.

ocsp/responder/redis_source is moved to ocsp/responder/redis/redis_source.go
and significantly modified. Instead of assuming a response is always available
in Redis, it wraps a live-signing source. When a response is not available,
it attempts a live signing.

If live signing succeeds, the Redis responder returns the result right away
and attempts to write a copy to Redis on a goroutine using a background
context.

To make things more efficient, I eliminate an unneeded ocsp.ParseResponse
from the storage path. And I factored out a FakeResponse helper to make
the unittests more manageable.

Commits should be reviewable one-by-one.

Fixes #6191
2022-07-18 10:47:14 -07:00
Aaron Gable 3000339dee
Reject CSRs with duplicate extensions (#6153)
This behavior will be on by default in go1.19, so let's turn
it on ourselves now to ensure there won't be any breakage
when we upgrade in August.
2022-06-17 13:13:30 -07:00
Aaron Gable 11544756bb
Support new Google CT Policy (#6082)
Add a new code path to the ctpolicy package which enforces Chrome's new
CT Policy, which requires that SCTs come from logs run by two different
operators, rather than one Google and one non-Google log. To achieve
this, invert the "race" logic: rather than assuming we always have two
groups, and racing the logs within each group against each other, we now
race the various groups against each other, and pick just one arbitrary
log from each group to attempt submission to.

Ensure that the new code path does the right thing by adding a new zlint
which checks that the two SCTs embedded in a certificate come from logs
run by different operators. To support this lint, which needs to have a
canonical mapping from logs to their operators, import the Chrome CT Log
List JSON Schema and autogenerate Go structs from it so that we can
parse a real CT Log List. Also add flags to all services which run these
lints (the CA and cert-checker) to let them load a CT Log List from disk
and provide it to the lint.

Finally, since we now have the ability to load a CT Log List file
anyway, use this capability to simplify configuration of the RA. Rather
than listing all of the details for each log we're willing to submit to,
simply list the names (technically, Descriptions) of each log, and look
up the rest of the details from the log list file.

To support this change, SRE will need to deploy log list files (the real
Chrome log list for prod, and a custom log list for staging) and then
update the configuration of the RA, CA, and cert-checker. Once that
transition is complete, the deletion TODOs left behind by this change
will be able to be completed, removing the old RA configuration and old
ctpolicy race logic.

Part of #5938
2022-05-25 15:14:57 -07:00
Aaron Gable dab8a71b0e
Use new RA methods from WFE revocation path (#5983)
Simplify the WFE `RevokeCertificate` API method in three ways:
- Remove most of the logic checking if the requester is authorized to
  revoke the certificate in question (based on who is making the
  request, what authorizations they have, and what reason they're
  requesting). That checking is now done by the RA. Instead, simply
  verify that the JWS is authenticated.
- Remove the hard-to-read `authorizedToRevoke` callbacks, and make the
  `revokeCertBySubscriberKey` (nee `revokeCertByKeyID`) and
  `revokeCertByCertKey` (nee `revokeCertByJWK`) helpers much more
  straight-line in their execution logic.
- Call the RA's new `RevokeCertByApplicant` and `RevokeCertByKey` gRPC
  methods, rather than the deprecated `RevokeCertificateWithReg`.

This change, without any flag flips, should be invisible to the
end-user. It will slightly change some of our log message formats.
However, by now relying on the new RA gRPC revocation methods, this
change allows us to change our revocation policies by enabling the
`AllowDoubleRevocation` and `MozRevocationReasons` feature flags, which
affect the behavior of those new helpers.

Fixes #5936
2022-03-28 14:14:11 -07:00
Aaron Gable 07d56e3772
Add new, simpler revocation methods to RA (#5969)
Add two new gRPC methods to the SA:
- `RevokeCertByKey` will be used when the API request was signed by the
  certificate's keypair, rather than a Subscriber keypair. If the
  request is for reason `keyCompromise`, it will ensure that the key is
  added to the blocked keys table, and will attempt to "re-revoke" a
  certificate that was already revoked for some other reason.
- `RevokeCertByApplicant` supports both the path where the original
  subscriber or another account which has proven control over all of the
  identifier in the certificate requests revocation via the API. It does
  not allow the requested reason to be `keyCompromise`, as these
  requests do not represent a demonstration of key compromise.

In addition, add a new feature flag `MozRevocationReasons` which
controls the behavior of these new methods. If the flag is not set, they
behave like they have historically (see above). If the flag is set to true,
then the new methods enforce the upcoming Mozilla policies around
revocation reasons, namely:
- Only the original Subscriber can choose the revocation reason; other
  clients will get a set reason code based on the method of requesting
  revocation. When the original Subscriber requests reason
  `keyCompromise`, this request will be honored, but the key will not be
  blocked and other certificates with that key will not also be revoked.
- Revocations signed with the certificate key will always get reason
  `keyCompromise`, because we do not know who is sending the request and
  therefore must assume that the use of the key in this way represents
  compromise. Because these requests will always be fore reason
  `keyCompromise`, they will always be added to the blocked keys table
  and they will always attempt "re-revocation".
- Revocations authorized via control of all names in the cert will
  always get reason `cessationOfOperation`, which is to be used when the
  original Subscriber does not control all names in the certificate
  anymore.

Finally, update the existing `AdministrativelyRevokeCertificate` method
to use the new helper functions shared by the two new methods.

Part of #5936
2022-03-14 08:58:17 -07:00
Aaron Gable 89000bd61c
Add close-primes detection via Fermat's factorization (#5853)
Add a new check to GoodKey which attempts to factor the public modulus
of the presented key using Fermat's factorization method. This method
will succeed if and only if the prime factors are very close to each
other -- i.e. almost certainly were not selected independently from a
random uniform distribution, but were instead calculated via some other
less secure method.

To support this new feature, add a new config flag to the RA, CA, and
WFE, which all use the GoodKey checks. As part of adding this new config
value, refactor the GoodKey config items into their own config struct
which can be re-used across all services.

If the new `FermatRounds` config value has not been set, it will default
to zero, causing no factorization to be attempted.

Fixes #5850
Part of #5851
2021-12-14 09:19:33 -08:00
Jacob Hoffman-Andrews ba0ea090b2
integration: save hierarchy across runs (#5729)
This allows repeated runs using the same hiearchy, and avoids spurious
errors from ocsp-updater saying "This CA doesn't have an issuer cert
with ID XXX"

Fixes #5721
2021-10-20 17:06:33 -07:00
Aaron Gable 4ef9fb1b4f
Add new SA.NewOrderAndAuthzs gRPC method (#5602)
Add a new method to the SA's gRPC interface which takes both an Order
and a list of new Authorizations to insert into the database, and adds
both (as well as the various ancillary rows) inside a transaction.

To enable this, add a new abstraction layer inside the `db/` package
that facilitates inserting many rows at once, as we do for the `authz2`,
`orderToAuthz2`, and `requestedNames` tables in this operation. 

Finally, add a new codepath to the RA (and a feature flag to control it)
which uses this new SA method instead of separately calling the
`NewAuthorization` method multiple times. Enable this feature flag in
the config-next integration tests.

This should reduce the failure rate of the new-order flow by reducing
the number of database operations by coalescing multiple inserts into a
single multi-row insert. It should also reduce the incidence of new
authorizations being created in the database but then never exposed to
the subscriber because of a failure later in the new-order flow, both by
reducing failures overall and by adding those authorizations in a
transaction which will be rolled back if there is a later failure.

Fixes #5577
2021-09-03 13:48:04 -07:00
Aaron Gable 9abb39d4d6
Honeycomb integration proof-of-concept (#5408)
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).

Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.

Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
2021-05-24 16:13:08 -07:00
Jacob Hoffman-Andrews b4e483d38b
Add gRPC MaxConnectionAge config. (#5311)
This allows servers to tell clients to go away after some period of time, which triggers the clients to re-resolve DNS.

Per grpc/grpc#12295, this is the preferred way to do this.

Related: #5307.
2021-03-01 18:37:47 -08:00
Aaron Gable 16c7a21a57
RA: Multi-issuer support for OCSP purging (#5160)
The RA is responsible for contacting Akamai to purge cached OCSP
responses when a certificate is revoked and fresh OCSP responses need to
be served ASAP. In order to do so, it needs to construct the same OCSP
URLs that clients would construct, and that Akamai would cache. In order
to do that, it needs access to the issuing certificate to compute a hash
across its Subject Info and Public Key.

Currently, the RA holds a single issuer certificate in memory, and uses
that cert to compute all OCSP URLs, on the assumption that all certs
we're being asked to revoke were issued by the same issuer.

In order to support issuance from multiple intermediates at the same
time (e.g. RSA and ECDSA), and to support rollover between different
issuers of the same type (we may need to revoke certs issued by two
different issuers for the 90 days in which their end-entity certs
overlap), this commit changes the configuration to provide a list of
issuer certificates instead.

In order to support efficient lookup of issuer certs, this change also
introduces a new concept, the Chain ID. The Chain ID is a truncated hash
across the raw bytes of either the Issuer Info or the Subject Info of a
given cert. As such, it can be used to confirm issuer/subject
relationships between certificates. In the future, this may be a
replacement for our current IssuerID (a truncated hash over the whole
issuer certificate), but for now it is used to map revoked certs to
their issuers inside the RA.

Part of #5120
2020-11-06 13:58:32 -08:00
Aaron Gable 3666322817
Add health-checker tool and use it from startservers.py (#5095)
This adds a new tool, `health-checker`, which is a client of the new
Health Checker Service that has been integrated into all of our
boulder components. This tool takes an address, a timeout, and a
config file. It then attempts to connect to a gRPC Health Service at
the given address, retrying until it hits its timeout, using credentials
specified by the config file.

This is then wrapped by a new function `waithealth` in our Python
helpers, which serves much the same function as `waitport`, but
specifically for services which surface a gRPC Health Service
This in turn requires slight modifications to `startservers`, namely
specifying the address and port on which each service starts its
gRPC listener.

Finally, this change also introduces new credentials for this
health-checker, and adds those credentials as a valid client to
all services' json configs. A similar change would have to be made
to our production configs if we were to establish a long-lived 
health checker/prober in prod.

Fixes #5074
2020-10-06 15:01:35 -07:00
Aaron Gable 440c5f96d9
Remove unreferenced values from test configs (#4959) 2020-07-15 13:50:00 -07:00
Roland Bracewell Shoemaker aa79d8360d
Move FasterNewOrdersRateLimit feature flag to the right test/config-next file (#4929) 2020-07-02 14:03:38 -07:00
Aaron Gable 91d4e235ad
Deprecate the BlockedKeyTable feature flag (#4881)
This commit consists of three classes of changes:
1) Changing various command main.go files to always behave as they
   would have when features.BlockedKeyTable was true. Also changing
   one test in the same manner.
2) Removing the BlockedKeyTable flag from configuration in config-next,
   because the flag is already live.
3) Moving the BlockedKeyTable flag to the "deprecated" section of
   features.go, and regenerating featureflag_strings.go.

A future change will remove the BlockedKeyTable flag (and other
similarly deprecated flags) from features.go entirely.

Fixes #4873
2020-06-22 16:35:37 -07:00
Roland Bracewell Shoemaker b7ad70caff
sa: implement faster new orders rate limit (#4857)
Fixes #4840
2020-06-09 17:14:23 -07:00
Roland Bracewell Shoemaker 56898e8953
Log RSA key sizes in WFE/WFE2 and add feature to restrict them (#4839)
Currently 99.99% of RSA keys we see in certificates at Let's Encrypt are
either 2048, 3072, or 4096 bits, but we support every 8 bit increment
between 2048 and 4096. Supporting these uncommon key sizes opens us up to
having to block much larger ranges of keys when dealing with something
like the Debian weak keys incident. Instead we should just reduce the
set of key sizes we support down to what people actually use.

Fixes #4835.
2020-06-08 11:23:27 -07:00
Roland Bracewell Shoemaker 7673f02803
Use cmd/ceremony in integration tests (#4832)
This ended up taking a lot more work than I expected. In order to make the implementation more robust a bunch of stuff we previously relied on has been ripped out in order to reduce unnecessary complexity (I think I insisted on a bunch of this in the first place, so glad I can kill it now).

In particular this change:

* Removes bhsm and pkcs11-proxy: softhsm and pkcs11-proxy don't play well together, and any softhsm manipulation would need to happen on bhsm, then require a restart of pkcs11-proxy to pull in the on-disk changes. This makes manipulating softhsm from the boulder container extremely difficult, and because of the need to initialize new on each run (described below) we need direct access to the softhsm2 tools since pkcs11-tool cannot do slot initialization operations over the wire. I originally argued for bhsm as a way to mimic a network attached HSM, mainly so that we could do network level fault testing. In reality we've never actually done this, and the extra complexity is not really realistic for a handful of reasons. It seems better to just rip it out and operate directly on a local softhsm instance (the other option would be to use pkcs11-proxy locally, but this still would require manually restarting the proxy whenever softhsm2-util was used, and wouldn't really offer any realistic benefit).
* Initializes the softhsm slots on each integration test run, rather than when creating the docker image (this is necessary to prevent churn in test/cert-ceremonies/generate.go, which would need to be updated to reflect the new slot IDs each time a new boulder-tools image was created since slot IDs are randomly generated)
* Installs softhsm from source so that we can use a more up to date version (2.5.0 vs. 2.2.0 which is in the debian repo)
* Generates the root and intermediate private keys in softhsm and writes out the root and intermediate public keys to /tmp for use in integration tests (the existing test-{ca,root} certs are kept in test/ because they are used in a whole bunch of unit tests. At some point these should probably be renamed/moved to be more representative of what they are used for, but that is left for a follow-up in order to keep the churn in this PR as related to the ceremony work as possible)
Another follow-up item here is that we should really be zeroing out the database at the start of each integration test run, since certain things like certificates and ocsp responses will be signed by a key/issuer that is no longer is use/doesn't match the current key/issuer.

Fixes #4832.
2020-06-03 15:20:23 -07:00
Roland Bracewell Shoemaker 70ff4d9347
Add bad-key-revoker daemon (#4788)
Adds a daemon which monitors the new blockedKeys table and checks for any unexpired, unrevoked certificates that are associated with the added SPKI hashes and revokes them, notifying the user that issued the certificates.

Fixes #4772.
2020-04-23 11:51:59 -07:00
Jacob Hoffman-Andrews 87fb6028c1
Add log validator to integration tests (#4782)
For now this mainly provides an example config and confirms that
log-validator can start up and shut down cleanly, as well as provide a
stat indicating how many log lines it has handled.

This introduces a syslog config to the boulder-tools image that will write
logs to /var/log/program.log. It also tweaks the various .json config
files so they have non-default syslogLevel, to ensure they actually
write something for log-validator to verify.
2020-04-20 13:33:42 -07:00
Roland Bracewell Shoemaker 9df97cbf06
Add a blocked keys table, and use it (#4773)
Fixes #4712 and fixes #4711.
2020-04-15 13:42:51 -07:00
Roland Bracewell Shoemaker ea231adc36 features: remove deprecated feature flags (#4607)
Confirmed none of these features are currently present in any staging or 
production configs.
2019-12-09 15:59:27 -05:00
Daniel McCarney fde145ab96
RA: implement stricter email validation. (#4574)
Prev. we weren't checking the domain portion of an email contact address
very strictly in the RA. This updates the PA to export a function that
can be used to validate the domain the same way we validate domain
portions of DNS type identifiers for issuance.

This also changes the RA to use the `invalidEmail` error type in more
places.

A new Go integration test is added that checks these errors end-to-end
for both account creation and account update.
2019-11-22 13:39:31 -05:00
Roland Bracewell Shoemaker 46e0468220 Make authz2 the default storage format (#4476)
This change set makes the authz2 storage format the default format. It removes
most of the functionality related to the previous storage format, except for
the SA fallbacks and old gRPC methods which have been left for a follow-up
change in order to make these changes deployable without introducing
incompatibilities.

Fixes #4454.
2019-10-21 15:29:15 -04:00
Daniel McCarney f02e9da38f
Support admin. blocking public keys. (#4419)
We occasionally have reason to block public keys from being used in CSRs
or for JWKs. This work adds support for loading a YAML blocked keys list
to the WFE, the RA and the CA (all the components already using the
`goodekey` package).

The list is loaded in-memory and is intended to be used sparingly and
not for more complicated mass blocking scenarios. This augments the
existing debian weak key checking which is specific to RSA keys and
operates on a truncated hash of the key modulus. In comparison the
admin. blocked keys are identified by the Base64 encoding of a SHA256
hash over the DER encoding of the public key expressed as a PKIX subject
public key. For ECDSA keys in particular we believe a more thorough
solution would have to consider inverted curve points but to start we're
calling this approach "Good Enough".

A utility program (`block-a-key`) is provided that can read a PEM
formatted x509 certificate or a JSON formatted JWK and emit lines to be
added to the blocked keys YAML to block the related public key.

A test blocked keys YAML file is included
(`test/example-blocked-keys.yml`), initially populated with a few of the
keys from the `test/` directory. We may want to do a more through pass
through Boulder's source code and add a block entry for every test
private key.

Resolves https://github.com/letsencrypt/boulder/issues/4404
2019-09-06 16:54:26 -04:00
Roland Bracewell Shoemaker 62e52f4103 test: weakKeyDirectory -> weakKeyFile in test configs (#4397) 2019-08-12 18:07:56 -04:00
Roland Bracewell Shoemaker acc44498d1 RA: Make RevokeAtRA feature standard behavior (#4268)
Now that it is live in production and is working as intended we can remove
the old ocsp-updater functionality entirely.

Fixes #4048.
2019-06-20 14:32:53 -04:00
Roland Bracewell Shoemaker 3532dce246 Excise grpc maxConcurrentStreams configuration (#4257) 2019-06-12 09:35:24 -04:00
Roland Bracewell Shoemaker 4d40cf58e4
Enable integration tests for authz2 and fix a few bugs (#4221)
Enables integration tests for authz2 and fixes a few bugs that were flagged up during the process. Disables expired-authorization-purger integration tests if config-next is being used as expired-authz-purger expects to purge some stuff but doesn't know about authz2 authorizations, a new test will be added with #4188.

Fixes #4079.
2019-05-23 15:06:50 -07:00
Jacob Hoffman-Andrews 0c700143bb Clean up README and test configs (#4185)
- docker-rebuild isn't needed now that boulder and bhsm containers run directly off
 the boulder-tools image.
- Remove DNS options from RA config.
- Remove GSB options from VA config.
2019-04-30 13:26:19 -07:00
Daniel McCarney 748f315b1a
PA: Support YAML for hostname policy. (#4180)
Our existing hostname policy configs use JSON. We would like to switch to YAML
to match the rate limit policy configs and to support commenting/tagging
entries in the hostname policy.

The PA is updated to support both JSON and YAML while we migrate the existing
policy data to the new format. To verify there are no changes in functionality
the existing unit tests were updated to test the same policy expressed in YAML
and JSON form.

In integration tests the `config` tree continues to use a JSON hostname policy
file to test the legacy support. In `config-next` a YAML hostname policy file
is used instead.

The new YAML format allows separating out `HighRiskBlockedNames` (primarily
used to meet the requirement of managing issuance for "high risk" domains per
CABF BRs, mostly static) and `AdminBlockedNames` (used after administrative
action is taken to block a domain. Additions are made with some frequency).
Since the rate at which we change entries in these lists differs it is helpful
to separate them in the policy content.
2019-04-26 14:35:28 -04:00
Jacob Hoffman-Andrews 8f578f3a93
Improve integration tests (#4143)
- Move fakeclock, get_future_output, and random_domain to helpers.py.
- Remove tempdir handling from integration-test.py since it's already
  done in helpers.py
- Consolidate handling of config dir into helpers.py, and add
  CONFIG_NEXT boolean.
- Move RevokeAtRA config gating into verify_revocation to reduce
  redundancy.
- Skip load-balancing test when filter is enabled.
- Ungate test_sct_embedding
- Rework test_ct_submissions, which was out of date. In particular, have a couple of
  logs where submitFinalCert: false, and make ct-test-srv store submission counts
  by hostnames for better test case isolation.
2019-04-04 10:59:38 -07:00
Jacob Hoffman-Andrews d1e6d0f190 Remove TLS-SNI-01 (#4114)
* Remove the challenge whitelist
* Reduce the signature for ChallengesFor and ChallengeTypeEnabled
* Some unit tests in the VA were changed from testing TLS-SNI to testing the same behavior
  in TLS-ALPN, when that behavior wasn't already tested. For instance timeouts during connect 
  are now tested.

Fixes #4109
2019-03-15 09:05:24 -04:00
Daniel McCarney 279947ade2 CI/Devenv: restore 20s RA->VA timeout. (#4084)
I tried dropping the RA->VA timeout to make the
`test_http_challenge_timeout` integration test faster. It seems to flake
in CI so I'm restoring the original 20s timeout. This makes
`test_http_challenge_timeout` slower but c'est la vie.
2019-02-22 08:53:18 -08:00
Daniel McCarney 16e464a37d RA: apply certificate rate limits at NewOrder time. (#4074)
If an order for a given set of names will fail finalization because of certificate rate limits (certs per domain, certs per fqdn set) there isn't any point in allowing an order for those names to be created. We can stop a lot of requests earlier by enforcing the cert rate limits at new order time as well as finalization time. A new RA `EarlyOrderRateLimit` feature flag controls whether this is done or not.

Resolves #3975
2019-02-21 11:02:40 -08:00
Daniel McCarney 3324989205 CI/Dev: Increase RA->VA timeout to 8s. (#4062)
There has been some flakyness in CI related to RA->VA timeouts.
2019-02-15 13:38:12 -08:00
Roland Bracewell Shoemaker 3e54cea295 Implement direct revocation at RA (#4043)
Implements a feature that enables immediate revocation instead of marking a certificate revoked and waiting for the OCSP-Updater to generate the OCSP response. This means that as soon as the request returns from the WFE the revoked OCSP response should be available to the user. This feature requires that the RA be configured to use the standalone Akamai purger service.

Fixes #4031.
2019-02-14 14:47:42 -05:00
Daniel McCarney 1c0be52e53 VA: Add integration test for HTTP timeouts. (#4050)
Also update `TestHTTPTimeout` to test with the `SimplifiedVAHTTP`
feature flag enabled.
2019-02-12 13:42:01 -08:00
Jacob Hoffman-Andrews 92e8e1708a Update config and config-next challenge settings. (#4017)
- Allow tls-alpn-01 challenge in config.
- Disallow tls-sni-01 challenge in config-next.
- Remove gating of tls-alpn integration test.
- Remove TLSSNIRevalidation in config-next.
2019-01-18 10:30:38 -08:00
Roland Bracewell Shoemaker 842739bccd Remove deprecated features that have been purged from prod and staging configs (#4001) 2019-01-15 16:16:35 -08:00
Roland Bracewell Shoemaker 196f019851 Add support for temporal CT logs (#3853)
Required a little bit of rework of the RA issuance flow (to add parsing of the precert to determine the expiration date, and moving final cert parsing before final cert submission) and RA tests, but I think it shouldn't create any issues...

Fixes #3197.
2018-09-14 16:14:42 -07:00
Daniel McCarney d39babdcf3
RA: Remove vestigial DNS config/setup. (#3854)
In db01b0b we removed email validation from the RA. This was the only
use of the `bdns` package by the RA and so we can go one step further
and delete the remaining setup, configuration and `bdns` fields.
2018-09-13 13:39:23 -04:00
Roland Bracewell Shoemaker 876c727b6f Update gRPC (#3817)
Fixes #3474.
2018-08-20 10:55:42 -04:00
Jacob Hoffman-Andrews 36a83150ad Add stagger to CT log submissions. (#3794)
This allows each log a chance to respond before we move onto the next,
spreading our load more evenly across the logs in a log group.
2018-07-06 16:25:51 -04:00
Roland Bracewell Shoemaker e27f370fd3 Excise code relating to pre-SCT embedding issuance flow (#3769)
Things removed:

* features.EmbedSCTs (and all the associated RA/CA/ocsp-updater code etc)
* ca.enablePrecertificateFlow (and all the associated RA/CA code)
* sa.AddSCTReceipt and sa.GetSCTReceipt RPCs
* publisher.SubmitToCT and publisher.SubmitToSingleCT RPCs

Fixes #3755.
2018-06-28 08:33:05 -04:00
Maciej Dębski bb9ddb124e Implement TLS-ALPN-01 and integration test for it (#3654)
This implements newly proposed TLS-ALPN-01 validation method, as described in
https://tools.ietf.org/html/draft-ietf-acme-tls-alpn-01 This challenge type is disabled 
except in the config-next tree.
2018-06-06 13:04:09 -04:00
Jacob Hoffman-Andrews dbcb16543e Start using multiple-IP hostnames for load balancing (#3687)
We'd like to start using the DNS load balancer in the latest version of gRPC. That means putting all IPs for a service under a single hostname (or using a SRV record, but we're not taking that path). This change adds an sd-test-srv to act as our service discovery DNS service. It returns both Boulder IP addresses for any A lookup ending in ".boulder". This change also sets up the Docker DNS for our boulder container to defer to sd-test-srv when it doesn't know an answer.

sd-test-srv doesn't know how to resolve public Internet names like `github.com`. Resolving public names is required for the `godep-restore` test phase, so this change breaks out a copy of the boulder container that is used only for `godep-restore`.

This change implements a shim of a DNS resolver for gRPC, so that we can switch to DNS-based load balancing with the currently vendored gRPC, then when we upgrade to the latest gRPC we won't need a simultaneous config update.

Also, this change introduces a check at the end of the integration test that each backend received at least one RPC, ensuring that we are not sending all load to a single backend.
2018-05-23 09:47:14 -04:00
Daniel McCarney 76a3f4a18f RA/CA: Use `doNotForceCN: false` for `test/config`. (#3698)
In staging/prod we use `doNotForceCN: false` for both the RA & CA
config. Switching this to `true` is blocked on CABF work that will
likely take considerable time.

In the short-term we should use `doNotForceCN: false` in `test/config`
and only use `doNotForceCN: true` in `test/config-next`.
2018-05-09 12:54:16 -07:00
Jacob Hoffman-Andrews a4421ae75b Run gRPC backends on multiple IPs instead of multiple ports (#3679)
We're currently stuck on gRPC v1.1 because of a breaking change to certificate validation in gRPC 1.8. Our gRPC balancer uses a static list of multiple hostnames, and expects to validate against those hostnames. However gRPC expects that a service is one hostname, with multiple IP addresses, and validates all those IP addresses against the same hostname. See grpc/grpc-go#2012.

If we follow gRPC's assumptions, we can rip out our custom Balancer and custom TransportCredentials, and will probably have a lower-friction time in general.

This PR is the first step in doing so. In order to satisfy the "multiple IPs, one port" property of gRPC backends in our Docker container infrastructure, we switch to Docker's user-defined networking. This allows us to give the Boulder container multiple IP addresses on different local networks, and gives it different DNS aliases in each network.

In startservers.py, each shard of a service listens on a different DNS alias for that service, and therefore a different IP address. The listening port for each shard of a service is now identical.

This change also updates the gRPC service certificates. Now, each certificate that is used in a gRPC service (as opposed to something that is "only" a client) has three names. For instance, sa1.boulder, sa2.boulder, and sa.boulder (the generic service name). For now, we are validating against the specific hostnames. When we update our gRPC dependency, we will begin validating against the generic service name.

Incidentally, the DNS aliases feature of Docker allows us to get rid of some hackery in entrypoint.sh that inserted entries into /etc/hosts.

Note: Boulder now has a dependency on the DNS aliases feature in Docker. By default, docker-compose run creates a temporary container and doesn't assign any aliases to it. We now need to specify docker-compose run --use-aliases to get the correct behavior. Without --use-aliases, Boulder won't be able to resolve the hostnames it wants to bind to.
2018-05-07 10:38:31 -07:00
Roland Bracewell Shoemaker 0a86573a73 Update integration tests 2018-04-20 13:18:40 -07:00
Roland Bracewell Shoemaker ebc86fd778 Fix CT submission cancelations (#3658)
When the WFE calls the RA the RA creates a sub context which is cancelled when
the RPC returns. Because we were spawning the publisher RPC calls in a
goroutine with the context from ra.IssueCertificate as soon as
ra.IssueCertificate returned that context was being canceled which in turn
canceled the publisher RPC calls.

Instead of using the RA RPC context simply use a `context.Background()` so
that the RPC context doesn't break these submissions. Also return to
pre-features.CancelCTSubmissions behavior where precert submissions would
be canceled once we retrieved SCTs from the winning logs instead of relying
on the magic behavior of the RA RPC canceling them itself.
2018-04-20 11:26:02 -04:00
Roland Bracewell Shoemaker 1271a15be7 Submit final certs to CT logs (#3640)
Submits final certificates to any configured CT logs. This doesn't introduce a feature flag as it is config gated, any log we want to submit final certificates to needs to have it's log description updated to include the `"submitFinalCerts": true` field.

Fixes #3605.
2018-04-13 12:02:01 -04:00
Jacob Hoffman-Andrews 2a1cd4981a Allow configuring gRPC's MaxConcurrentStreams (#3642)
During periods of peak load, some RPCs are significantly delayed (on the order of seconds) by client-side blocking. HTTP/2 clients have to obey a "max concurrent streams" setting sent by the server. In Go's HTTP/2 implementation, this value [defaults to 250](https://github.com/golang/net/blob/master/http2/server.go#L56), so the gRPC default is also 250. So whenever there are more than 250 requests in progress at a time, additional requests will be delayed until there is a slot available.

During this peak load, we aren't hitting limits on CPU or memory, so we should increase the max concurrent streams limit to take better advantage of our available resources. This PR adds a config field to do that.

Fixes #3641.
2018-04-12 17:17:17 -04:00
Jacob Hoffman-Andrews a4f9de9e35 Improve nesting of RPC deadlines (#3619)
gRPC passes deadline information through the RPC boundary, but client and server have the same deadline. Ideally we'd like the server to have a slightly tighter deadline than the client, so if one of the server's onward RPCs or other network calls times out, the server can pass back more detailed information to the client, rather than the client timing out the server and losing the opportunity to log more detailed information about which component caused the timeout.

In this change, I subtract 100ms from the deadline on the server side of our interceptors, using our existing serverInterceptor. I also check that there is at least 100ms remaining in which to do useful work, so the server doesn't begin a potentially expensive task only to abort it.

Fixes #3608.
2018-04-06 15:40:18 +01:00
Roland Bracewell Shoemaker cc5ec34539 Allow configuration of multiple DNS resolvers (#3612)
* Allow configuration of multiple DNS resolvers
* Use multiple DNS resolvers in integration tests

Fixes #3611.
2018-04-05 11:51:22 -04:00
Jacob Hoffman-Andrews 700604dda1
Overlapping wildcard errors are 400, not 500. (#3561)
Return a malformed error for these requests. Also add an integration test.

Fixes #3558
2018-03-14 13:18:25 -07:00
Roland Bracewell Shoemaker 9c9e944759 Add SCT embedding (#3521)
Adds SCT embedding to the certificate issuance flow. When a issuance is requested a precertificate (the requested certificate but poisoned with the critical CT extension) is issued and submitted to the required CT logs. Once the SCTs for the precertificate have been collected a new certificate is issued with the poison extension replace with a SCT list extension containing the retrieved SCTs.

Fixes #2244, fixes #3492 and fixes #3429.
2018-03-12 11:58:30 -07:00
Jacob Hoffman-Andrews 11434650b7 Check safe browsing at validation time (#3539)
Right now we check safe browsing at new-authz time, which introduces a possible
external dependency when calling new-authz. This is usually fine, since most safe
browsing checks can be satisfied locally, but when requests have to go external,
it can create variance in new-authz timing.

Fixes #3491.
2018-03-09 11:15:05 +00:00
Daniel McCarney 28cc969814 Remove TLS-SNI-02 implementation. (#3516)
This code was never enabled in production. Our original intent was to
ship this as part of the ACMEv2 API. Before that could happen flaws were
identified in TLS-SNI-01|02 that resulted in TLS-SNI-02 being removed
from the ACME protocol. We won't ever be enabling this code and so we
might as well remove it.
2018-03-02 10:56:13 -08:00
Roland Bracewell Shoemaker 0b53063a72 ctpolicy: Add informational logs and don't cancel remaining submissions (#3472)
Add a set of logs which will be submitted to but not relied on for their SCTs,
this allows us to test submissions to a particular log or submit to a log which
is not yet approved by a browser/root program.

Also add a feature which stops cancellations of remaining submissions when racing
to get a SCT from a group of logs.

Additionally add an informational log that always times out in config-next.

Fixes #3464 and fixes #3465.
2018-02-23 21:51:50 -05:00
Jacob Hoffman-Andrews 92c9340fe8 Deprecate CountCertificatesExact. (#3462)
This is now enabled in prod and we can make it the default.
2018-02-20 14:34:03 -08:00
Roland Bracewell Shoemaker 8446571b46 Remove EnforceChallengeDisable (#3444)
Removes usage of the `EnforceChallengeDisable` feature, the feature itself is not removed as it is still configured in staging/production, once that is fixed I'll submit another PR removing the actual flag.

This keeps the behavior that when authorizations are retrieved from the SA they have their challenges populated, because that seems to make the most sense to me? It also retains TLS re-validation.

Fixes #3441.
2018-02-14 13:21:26 -08:00
Jacob Hoffman-Andrews c556a1a20d
Reduce spurious errors in integration test (#3436)
Boulder is fairly noisy about gRPC connection errors. This is a mixed
blessing: Our gRPC configuration will try to reconnect until it hits
an RPC deadline, and most likely eventually succeed. In that case,
we don't consider those to really be errors. However, in cases where
a connection is repeatedly failing, we'd like to see errors in the
logs about connection failure, rather than "deadline exceeded." So
we want to keep logging of gRPC errors.

However, right now we get a lot of these errors logged during
integration tests. They make the output hard to read, and may disguise
more serious errors. So we'd like to avoid causing such errors in
normal integration test operation.

This change reorders the startup of Boulder components by their gRPC
dependencies, so everything's backend is likely to be up and running
before it starts. It also reverses that order for clean shutdowns,
and waits for each process to exit before signalling the next one.

With these changes, I still got connection errors. Taking listenbuddy
out of the gRPC path fixed them. I believe the issue is that
listenbuddy is not a truly transparent proxy. In particular, it
accepts an inbound TCP connection before opening an outbound TCP
connection. If opening that outbound connection results in "connection
refused," it closes the inbound connection. That means gRPC sees a
"connection closed" (or "connection reset"?) rather than "connection
refused". I'm guessing it handles those cases differently, explaining
the different error results.

We've been using listenbuddy to trigger disconnects while Boulder is
running, to ensure that gRPC's reconnect code works. I think we can
probably rely on gRPC's reconnect to work. The initial problem that
led us to start testing this was a configuration problem; now that
we have the configuration we want, we should be fine and don't need
to keep testing reconnects on every integration test run.
2018-02-12 18:17:50 -08:00
Jacob Hoffman-Andrews 2dc3b56fa9
Add variable latency to ct-test-srv (#3435)
For the upcoming SCT embedding changes, it will be useful to have a CT test server that blocks for nontrivial amounts of time before responding. This change introduces a config file for `ct-test-srv` that can be used to set up multiple "personalities" on various ports. Each personality can have a "latencySchedule" that determines how long it will sleep before servicing responding to a submission.

This change also introduces two new "personalities" on :4510 and :4511, plus configures CTLogGroups in the RA. Having four CT log personalities allows us to simulate two nontrivial log groups.

Note: This triggers Publisher to emit audit errors on timed-out submissions. We may want to make Publisher not treat those as errors, and instead only log an error if a whole log group fails.
2018-02-09 13:48:19 -08:00
Roland Bracewell Shoemaker fc5c8f76b6 Remove unused features (#3393)
This removes a number of unused features (i.e. they are never checked anywhere).
2018-01-25 08:55:05 -05:00
Daniel McCarney c6d56b7a84 Match RA `authorizationLifetimeDays` to prod. (#3370) 2018-01-16 10:39:57 -08:00
Jacob Hoffman-Andrews 8153b919be
Implement TLSSNIRevalidation (#3361)
This change adds a feature flag, TLSSNIRevalidation. When it is enabled, Boulder
will create new authorization objects with TLS-SNI challenges if the requesting
account has issued a certificate with the relevant domain name, and was the most
recent account to do so*. This setting overrides the configured list of
challenges in the PolicyAuthority, so even if TLS-SNI is disabled in general, it
will be enabled for revalidation.

Note that this interacts with EnforceChallengeDisable. Because
EnforceChallengeDisable causes additional checked at validation time and at
issuance time, we need to update those two places as well. We'll send a
follow-up PR with that.

*We chose to make this work only for the most recent account to issue, even if
there were overlapping certificates, because it significantly simplifies the
database access patterns and should work for 95+% of cases.

Note that this change will let an account revalidate and reissue for a domain
even if the previous issuance on that account used http-01 or dns-01. This also
simplifies implementation, and fits within the intent of the mitigation plan: If
someone previously issued for a domain using http-01, we have high confidence
that they are actually the owner, and they are not going to "steal" the domain
from themselves using tls-sni-01.

Also note: This change also doesn't work properly with ReusePendingAuthz: true.
Specifically, if you attempted issuance in the last couple days and failed
because there was no tls-sni challenge, you'll still have an http-01 challenge
lying around, and we'll reuse that; then your client will fail due to lack of
tls-sni challenge again.

This change was joint work between @rolandshoemaker and @jsha.
2018-01-12 11:00:06 -08:00
Maciej Dębski 44984cd84a Implement regID whitelist for allowed challenge types. (#3352)
This updates the PA component to allow authorization challenge types that are globally disabled if the account ID owning the authorization is on a configured whitelist for that challenge type.
2018-01-10 13:44:53 -05:00
Roland Shoemaker 1a3a76438c Fix tests and GetOrderAuthorizations 2018-01-09 20:38:52 -08:00
Jacob Hoffman-Andrews cd49316493 Do ROCAChecks by default. (#3283)
This feature flag has been enabled in prod, and we don't expect to want to
turn it off any time soon.
2017-12-15 13:44:39 -08:00
Jacob Hoffman-Andrews f16c3af335 Active UDPDNS by default. (#3285)
Active UDPDNS by default
2017-12-15 12:26:45 -08:00
Daniel McCarney 1c99f91733 Policy based issuance for wildcard identifiers (Round two) (#3252)
This PR implements issuance for wildcard names in the V2 order flow. By policy, pending authorizations for wildcard names only receive a DNS-01 challenge for the base domain. We do not re-use authorizations for the base domain that do not come from a previous wildcard issuance (e.g. a normal authorization for example.com turned valid by way of a DNS-01 challenge will not be reused for a *.example.com order).

The wildcard prefix is stripped off of the authorization identifier value in two places:

When presenting the authorization to the user - ACME forbids having a wildcard character in an authorization identifier.
When performing validation - We validate the base domain name without the *. prefix.
This PR is largely a rewrite/extension of #3231. Instead of using a pseudo-challenge-type (DNS-01-Wildcard) to indicate an authorization & identifier correspond to the base name of a wildcard order name we instead allow the identifier to take the wildcard order name with the *. prefix.
2017-12-04 12:18:10 -08:00
Daniel McCarney 55dd1020c0 Increase VA SingleDialTimeout to 10s. (#3260)
This PR changes the VA's singleDialTimeout value from 5 * time.Second to 10 * time.Second. This will give slower servers a better chance to respond, especially for the multi-VA case where n requests arrive ~simultaneously.

This PR also bumps the RA->VA timeout by 5s and the WFE->RA timeout by 5s to accommodate the increased dial timeout. I put this in a separate commit in case we'd rather deal with this separately.
2017-12-04 09:53:26 -08:00
Roland Bracewell Shoemaker d5db80ab12 Various publisher CT fixes (#3219)
Makes a couple of changes:
* Change `SubmitToCT` to make submissions to each log in parallel instead of in serial, this prevents a single slow log from eating up the majority of the deadline and causing submissions to other logs to fail
* Remove the 'submissionTimeout' field on the publisher since it is actually bounded by the gRPC timeout as is misleading
* Add a timeout to the CT clients internal HTTP client so that when log servers hang indefinitely we actually do retries instead of just using the entire submission deadline. Currently set at 2.5 minutes

Fixes #3218.
2017-11-09 10:05:26 -05:00
Jacob Hoffman-Andrews 5df083a57e Add ROCA weak key checking (#3189)
Thanks to @titanous for the library!
2017-11-02 08:42:59 -04:00
Jacob Hoffman-Andrews 4e68fb2ff6 Switch to udp for internal DNS. (#3135)
We used to use TCP because we would request DNSSEC records from Unbound, and
they would always cause truncated records when present. Now that we no longer
request those (#2718), we can use UDP. This is better because the TCP serving
paths in Unbound are likely less thoroughly tested, and not optimized for high
load. In particular this may resolve some availability problems we've seen
recently when trying to upgrade to a more recent Unbound.

Note that this only affects the Boulder->Unbound path. The Unbound->upstream
path is already UDP by default (with TCP fallback for truncated ANSWERs).
2017-10-10 10:06:33 -04:00
Jacob Hoffman-Andrews b0c7bc1bee Recheck CAA for authorizations older than 8 hours (#3014)
Fixes #2889.

VA now implements two gRPC services: VA and CAA. These both run on the same port, but this allows implementation of the IsCAAValid RPC to skip using the gRPC wrappers, and makes it easier to potentially separate the service into its own package in the future.

RA.NewCertificate now checks the expiration times of authorizations, and will call out to VA to recheck CAA for those authorizations that were not validated recently enough.
2017-08-28 16:40:57 -07:00
Roland Bracewell Shoemaker 90ba766af9 Add NewOrder RPCs + methods to SA and RA (#2907)
Fixes #2875, #2900 and #2901.
2017-08-11 14:24:25 -04:00
Kleber Correia 338c61171b Remove IDNASupport flag (#2926)
Splitting #2712 into multiple per-flag PRs
2017-08-01 16:51:19 -07:00
Jacob Hoffman-Andrews 3431acfb92 Adjust testing maxNames config to match prod. (#2911) 2017-07-27 15:23:29 -07:00
Jacob Hoffman-Andrews 8bc1db742c Improve recycling of pending authzs (#2896)
The existing ReusePendingAuthz implementation had some bugs:

It would recycle deactivated authorizations, which then couldn't be fulfilled. (#2840)
Since it was implemented in the SA, it wouldn't get called until after the RA checks the Pending Authorizations rate limit. Which means it wouldn't fulfill its intended purpose of making accounts less likely to get stuck in a Pending Authorizations limited state. (#2831)
This factors out the reuse functionality, which used to be inside an "if" statement in the SA. Now the SA has an explicit GetPendingAuthorization RPC, which gets called from the RA before calling NewPendingAuthorization. This happens to obsolete #2807, by putting the recycling logic for both valid and pending authorizations in the RA.
2017-07-26 14:00:30 -07:00
Roland Bracewell Shoemaker 8ce2f8b432 Basic RSA known weak key checking (#2765)
Adds a basic truncated modulus hash check for RSA keys that can be used to check keys against the Debian `{openssl,openssh,openvpn}-blacklist` lists of weak keys generated during the [Debian weak key incident](https://wiki.debian.org/SSLkeys).

Testing is gated on adding a new configuration key to the WFE, RA, and CA configs which contains the path to a directory which should contain the weak key lists.

Fixes #157.
2017-05-25 09:33:58 -07:00
Jacob Hoffman-Andrews b17b5c72a6 Remove statsd from Boulder (#2752)
This removes the config and code to output to statsd.

- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.

Fixes #2639, #2733.
2017-05-15 10:19:54 -04:00
Daniel McCarney 1ed34a4a5d Fixes cert count rate limit for exact PSL matches. (#2703)
Prior to this PR if a domain was an exact match to a public suffix
list entry the certificates per name rate limit was applied based on the
count of certificates issued for that exact name and all of its
subdomains.

This PR introduces an exception such that exact public suffix
matches correctly have the certificate per name rate limit applied based
on only exact name matches.

In order to accomplish this a new RPC is added to the SA
`CountCertificatesByExactNames`. This operates similar to the existing
`CountCertificatesByNames` but does *not* include subdomains in the
count, only exact matches to the names provided. The usage of this new
RPC is feature flag gated behind the "CountCertificatesExact" feature flag.

The RA unit tests are updated to test the new code paths both with and
without the feature flag enabled.

Resolves #2681
2017-05-02 13:43:35 -07:00
Roland Bracewell Shoemaker a46d30945c Purge remaining AMQP code (#2648)
Deletes github.com/streadway/amqp and the various RabbitMQ setup tools etc. Changes how listenbuddy is used to proxy all of the gRPC client -> server connections so we test reconnection logic.

+49 -8,221 😁

Fixes #2640 and #2562.
2017-04-04 15:02:22 -07:00
Jacob Hoffman-Andrews 6719dc17a6 Remove AMQP config and code (#2634)
We now use gRPC everywhere.
2017-04-03 10:39:39 -04:00
David Calavera c71c3cff80 Implement TLS-SNI-02 challenge validations. (#2585)
I think these are all the necessary changes to implement TLS-SNI-02 validations, according to the section 7.3 of draft 05:

https://tools.ietf.org/html/draft-ietf-acme-acme-05#section-7.3

I don't have much experience with this code, I'll really appreciate your feedback.

Signed-off-by: David Calavera <david.calavera@gmail.com>
2017-03-22 10:17:59 -07:00
Jacob Hoffman-Andrews cbde78d58f Harmonize and tweak configs (#2479)
Set authorizationLifetimeDays to 60 across both config and config-next.

Set NumSessions to 2 in both config and config-next. A decrease from 10 because pkcs11-proxy (or pkcs11-daemon?) seems to error out under load if you have more sessions than CPUs.

Reorder parallelGenerateOCSPRequests to match config-next.

Remove extra tags for parsing yaml in config objects.
2017-01-10 13:46:38 -08:00
Jacob Hoffman-Andrews 510e279208 Simplify gRPC TLS configs. (#2470)
Previously, a given binary would have three TLS config fields (CA cert, cert,
key) for its gRPC server, plus each of its configured gRPC clients. In typical
use, we expect all three of those to be the same across both servers and clients
within a given binary.

This change reuses the TLSConfig type already defined for use with AMQP, adds a
Load() convenience function that turns it into a *tls.Config, and configures it
for use with all of the binaries. This should make configuration easier and more
robust, since it more closely matches usage.

This change preserves temporary backwards-compatibility for the
ocsp-updater->publisher RPCs, since those are the only instances of gRPC
currently enabled in production.
2017-01-06 14:19:18 -08:00
Jacob Hoffman-Andrews 089a270453 Add instructions on load testing OCSP generation. (#2459) 2017-01-02 11:36:03 -08:00
Jacob Hoffman-Andrews 0c665b2053 Split up gRPC certificates by service. (#2453)
Previously, all gRPC services used the same client and server certificates. Now,
each service has its own certificate, which it uses for both client and server
authentication, more closely simulating production.

This also adds aliases for each of the relevant hostnames in /etc/hosts. There
may be some issues if Docker decides to rewrite /etc/hosts while Boulder is
running, but this seems to work for now.
2016-12-29 14:53:59 -08:00
Jacob Hoffman-Andrews 1c1449b284 Improvements to tests and test configs. (#2396)
- Remove spinner from test.js. It made Travis logs hard to read.
- Listen on all interfaces for debugAddr. This makes it possible to check
  Prometheus metrics for instances running in a Docker container.
- Standardize DNS timeouts on 1s and 3 retries across all configs. This ensures
  DNS completes within the relevant RPC timeouts.
- Remove RA service queue from VA, since VA no longer uses the callback to RA on
  completing a challenge.
2016-12-05 14:35:27 -08:00
Roland Bracewell Shoemaker 03fdd65bfe Add gRPC server to SA (#2374)
Adds a gRPC server to the SA and SA gRPC Clients to the WFE, RA, CA, Publisher, OCSP updater, orphan finder, admin revoker, and expiration mailer.

Also adds a CA gRPC client to the OCSP Updater which was missed in #2193.

Fixes #2347.
2016-12-02 17:24:46 -08:00
Jacob Hoffman-Andrews 7c624d013e Remove RA->{VA,CA} AMQP configs. (#2371) 2016-11-30 08:55:03 -05:00
Roland Bracewell Shoemaker a87379bc6e Add gRPC server to RA (#2350)
Fixes #2348.
2016-11-29 15:34:35 -08:00
Roland Bracewell Shoemaker c5f99453a9 Switch CT submission RPC from CA -> RA (#2304)
With the current gRPC design the CA talks directly to the Publisher when calling SubmitToCT which crosses security bounadries (secure internal segment -> internet facing segment) which is dangerous if (however unlikely) the Publisher is compromised and there is a gRPC exploit that allows memory corruption on the caller end of a RPC which could expose sensitive information or cause arbitrary issuance.

Instead we move the RPC call to the RA which is in a less sensitive network segment. Switching the call site from the CA -> RA is gated on adding the gRPC PublisherService object to the RA config.

Fixes #2202.
2016-11-08 11:39:02 -08:00
Daniel McCarney 6c983e8c9e Implements client whitelisting for gRPC. (#2307)
As described in #2282, our gRPC code uses mutual TLS to authenticate both clients and servers. However, currently our gRPC servers will accept any client certificate signed by the internal CA we use to authenticate connections. Instead, we would like each server to have a list of which clients it will accept. This will improve security by preventing the compromise of one client private key being used to access endpoints unrelated to its intended scope/purpose.

This PR implements support for gRPC servers to specify a list of accepted client names. A `serverTransportCredentials` implementing `ServerHandshake` uses a `verifyClient` function to enforce that the connecting peer presents a client certificate with a SAN entry that matches an entry on the list of accepted client names

The `NewServer` function from `grpc/server.go` is updated to instantiate the `serverTransportCredentials` used by `grpc.NewServer`, specifying an accepted names list populated from the `cmd.GRPCServerConfig.ClientNames` config field.

The pre-existing client and server certificates in `test/grpc-creds/` are replaced by versions that contain SAN entries as well as subject common names. A DNS and an IP SAN entry are added to allow testing both methods of specifying allowed SANs. The `generate.sh` script is converted to use @jsha's `minica` tool (OpenSSL CLI is blech!).

An example client whitelist is added to each of the existing gRPC endpoints in config-next/ to allow the SAN of the test RPC client certificate.

Resolves #2282
2016-11-08 13:57:34 -05:00
Daniel McCarney eb67ad4f88 Allow `validateEmail` to timeout w/o error. (#2288)
This PR reworks the validateEmail() function from the RA to allow timeouts during DNS validation of MX/A/AAAA records for an email to be non-fatal and match our intention to verify emails best-effort.

Notes:

bdns/problem.go - DNSError.Timeout() was changed to also include context cancellation and timeout as DNS timeouts. This matches what DNSError.Error() was doing to set the error message and supports external callers to Timeout not duplicating the work.

bdns/mocks.go - the LookupMX mock was changed to support always.error and always.timeout in a manner similar to the LookupHost mock. Otherwise the TestValidateEmail unit test for the RA would fail when the MX lookup completed before the Host lookup because the error wouldn't be correct (empty DNS records vs a timeout or network error).

test/config/ra.json, test/config-next/ra.json - the dnsTries and dnsTimeout values were updated such that dnsTries * dnsTimeout was <= the WFE->RA RPC timeout (currently 15s in the test configs). This allows the dns lookups to all timeout without the overall RPC timing out.

Resolves #2260.
2016-10-27 11:56:12 -07:00
Roland Bracewell Shoemaker ce679bad41 Implement key rollover (#2231)
Fixes #503.

Functionality is gated by the feature flag `AllowKeyRollover`. Since this functionality is only specified in ACME draft-03 and we mostly implement the draft-02 style this takes some liberties in the implementation, which are described in the updated divergences doc. The `key-change` resource is used to side-step draft-03 `url` requirement.
2016-10-27 10:22:09 -04:00
Roland Bracewell Shoemaker 5fabc90a16 Add IDN support (#2215)
Add feature flagged support for issuing for IDNs, fixes #597.

This patch expects that clients have performed valid IDN2008 encoding on any label that includes unicode characters. Invalid encodings (including non-compatible IDN2003 encoding) will be rejected. No script-mixing or script exclusion checks are performed as we assume that if a name is resolvable that it conforms to the registrar's policies on these matters and if it uses non-standard scripts in sub-domains etc that browsers should be the ones choosing how to display those names.

Required a full update of the golang.org/x/net tree to pull in golang.org/x/net/idna, all test suites pass.
2016-10-06 13:05:37 -04:00