Commit Graph

82 Commits

Author SHA1 Message Date
Jacob Hoffman-Andrews a4f9de9e35 Improve nesting of RPC deadlines (#3619)
gRPC passes deadline information through the RPC boundary, but client and server have the same deadline. Ideally we'd like the server to have a slightly tighter deadline than the client, so if one of the server's onward RPCs or other network calls times out, the server can pass back more detailed information to the client, rather than the client timing out the server and losing the opportunity to log more detailed information about which component caused the timeout.

In this change, I subtract 100ms from the deadline on the server side of our interceptors, using our existing serverInterceptor. I also check that there is at least 100ms remaining in which to do useful work, so the server doesn't begin a potentially expensive task only to abort it.

Fixes #3608.
2018-04-06 15:40:18 +01:00
Roland Bracewell Shoemaker 9c9e944759 Add SCT embedding (#3521)
Adds SCT embedding to the certificate issuance flow. When a issuance is requested a precertificate (the requested certificate but poisoned with the critical CT extension) is issued and submitted to the required CT logs. Once the SCTs for the precertificate have been collected a new certificate is issued with the poison extension replace with a SCT list extension containing the retrieved SCTs.

Fixes #2244, fixes #3492 and fixes #3429.
2018-03-12 11:58:30 -07:00
Daniel McCarney 2612bf7168 Remove deprecated `sa.CountPendingOrders` cruft. (#3527)
#3501 made this code deprecated. We've deployed 3501 to the staging environment and can now pull out the old cruft.

Resolves #3502
2018-03-06 21:20:40 +00:00
Daniel McCarney f2d3ad6d52 Enforce new orders per acct per window rate limit. (#3501)
Previously we introduced the concept of a "pending orders per account
ID" rate limit. After struggling with making an implementation of this
rate limit perform well we reevaluated the problem and decided a "new
orders per account per time window" rate limit would be a better fit for
ACMEv2 overall.

This commit introduces the new newOrdersPerAccount rate limit. The RA
now checks this before creating new pending orders in ra.NewOrder. It
does so after order reuse takes place ensuring the rate limit is only
applied in cases when a distinct new pending order row would be created.
To accomplish this a migration for a new orders field (created) and an
index over created and registrationID is added. It would be possible to
use the existing expires field for this like we've done in the past, but that
was primarily to avoid running a migration on a large table in prod. Since
we don't have that problem yet for V2 tables we can Do The Right Thing
and add a column.

For deployability the deprecated pendingOrdersPerAccount code & SA
gRPC bits are left around. A follow-up PR will be needed to remove
those (#3502).

Resolves #3410
2018-03-02 10:47:39 -08:00
Daniel McCarney 04b2b17db3 Remove deprecated `sa.GetOrderAuthorizations`. (#3470)
It has been replaced by `sa.GetValidOrderAuthorizations`, the same RPC
with a clearer name.

Resolves #3424
2018-02-21 11:59:46 -08:00
Daniel McCarney 4ac109ac25 Do not reuse legacy authzs in V2 new-order. (#3432)
Prior to this commit when building up the authorizations for a new-order
request we looked for any unexpired pending/valid authorizations owned
by the account and used them for the order. This allows a client to use
the V1 new-authz endpoint in combination with the V2 new-order endpoint
and we do not want to support this behaviour. All V2 authorizations
should be sourced from other V2 orders. This commit implements a new
parameter for the SA's getAuthorizations function that allows filtering
out legacy V1 authorizations by doing a JOIN on the order to
authorizations join table.

Resolves #3328
2018-02-08 12:31:04 -08:00
Daniel McCarney d7bfb542c0
Handle order finalization errors. (#3404)
This commit resolves the case where an error during finalization occurs.
Prior to this commit if an error (expected or otherwise) occurred after
setting an order to status processing at the start of order
finalization the order would be stuck processing forever.

The SA now has a `SetOrderError` RPC that can be used by the RA to
persist an error onto an order. The order status calculation can use
this error to decide if the order is invalid. The WFE is updated to
write the error to the order JSON when displaying the order information.

Prior to this commit the order protobuf had the error field as
a `[]byte`. It doesn't seem like this is the right decision, we have
a specific protobuf type for ProblemDetails and so this commit switches
the error field to use it. The conversion to/from `[]byte` is done with
the model by the SA.

An integration test is included that prior to this commit left an order
in a stuck processing state. With this commit the integration test
passes as expected.

Resolves https://github.com/letsencrypt/boulder/issues/3403
2018-02-07 16:34:07 -05:00
Daniel McCarney 67ae7f75b4 `sa.GetOrderAuthorizations` -> `sa.GetValidOrderAuthorizations`. (#3411)
The SA RPC previously called `GetOrderAuthorizations` only returns
**valid, unexpired** authorizations. This commit updates the name to
emphasize that it only returns valid order authzs.
2018-02-07 11:54:18 -08:00
Roland Bracewell Shoemaker 62f3978f3b
Add inital CTPolicy impl (#3414)
Adds a package which implements group based SCT retrieval.

Fixes #3412.
2018-02-06 10:52:20 -08:00
Daniel McCarney eea049da40
Fix order reuse, calc order status by authz status (#3402)
This PR is a rework of what was originally https://github.com/letsencrypt/boulder/pull/3382, integrating the design feedback proposed by @jsha: https://github.com/letsencrypt/boulder/pull/3382#issuecomment-359912549 

This PR removes the stored Order status field and replaces it with a value that is calculated on-the-fly by the SA when fetching an order, based on the order's associated authorizations. 

In summary (and order of precedence):
* If any of the order's authorizations are invalid, the order is invalid.
* If any of the order's authorizations are deactivated, the order is deactivated.
* If any of the order's authorizations are pending, the order is pending.
* If all of the order's authorizations are valid, and there is a certificate serial, the order is valid.
* If all of the order's authorizations are valid, and we have began processing, but there is no certificate serial, the order is processing.
* If all of the order's authorizations are valid, and we haven't processing, then the order is pending waiting a finalization request.

This avoids having to explicitly update the order status when an associated authorization changes status.

The RA's implementation of new-order is updated to only reuse an existing order if the calculated status is pending. This avoids giving back invalid or deactivated orders to clients.

Resolves #3333
2018-02-01 16:33:42 -05:00
Jacob Hoffman-Andrews 8153b919be
Implement TLSSNIRevalidation (#3361)
This change adds a feature flag, TLSSNIRevalidation. When it is enabled, Boulder
will create new authorization objects with TLS-SNI challenges if the requesting
account has issued a certificate with the relevant domain name, and was the most
recent account to do so*. This setting overrides the configured list of
challenges in the PolicyAuthority, so even if TLS-SNI is disabled in general, it
will be enabled for revalidation.

Note that this interacts with EnforceChallengeDisable. Because
EnforceChallengeDisable causes additional checked at validation time and at
issuance time, we need to update those two places as well. We'll send a
follow-up PR with that.

*We chose to make this work only for the most recent account to issue, even if
there were overlapping certificates, because it significantly simplifies the
database access patterns and should work for 95+% of cases.

Note that this change will let an account revalidate and reissue for a domain
even if the previous issuance on that account used http-01 or dns-01. This also
simplifies implementation, and fits within the intent of the mitigation plan: If
someone previously issued for a domain using http-01, we have high confidence
that they are actually the owner, and they are not going to "steal" the domain
from themselves using tls-sni-01.

Also note: This change also doesn't work properly with ReusePendingAuthz: true.
Specifically, if you attempted issuance in the last couple days and failed
because there was no tls-sni challenge, you'll still have an http-01 challenge
lying around, and we'll reuse that; then your client will fail due to lack of
tls-sni challenge again.

This change was joint work between @rolandshoemaker and @jsha.
2018-01-12 11:00:06 -08:00
Daniel McCarney 7bb16ff21e ACMEv2: Add pending order reuse (#3290)
This commit adds pending order reuse. Subsequent to this commit multiple
add-order requests from the same account ID for the same set of order
names will result in only one order being created. Orders are only
reused while they are not expired. Finalized orders will not be reused
for subsequent new-order requests allowing for duplicate order issuance.

Note that this is a second level of reuse, building on the pending
authorization reuse that's done between separate orders already.

To efficiently find an appropriate order ID given a set of names,
a registration ID, and the current time a new orderFqdnSets table is
added with appropriate indexes and foreign keys.

Resolves #3258
2018-01-02 13:27:16 -08:00
Daniel McCarney 491f82f367
v2 API gRPC wrapper bugfixes (#3273)
* Allow nil `Authz` slice in `GetAuthorizations` response.

The `StorageAuthorityClientWrapper` was enforcing that the response to
a `GetAuthorizations` request did not have `resp.Authz == nil`. This
meant that the RA's `NewOrder` function failed when creating an order
for names that had no existing authorizations to reuse.

This commit updates the wrapper to allow `resp.Authz` to be nil - this
is a valid case when there are no authorizations found.

* Fix SA server wrapper `AddPendingAuthorizations` logic.

Prior to this commit the `StorageAuthorityServerWrapper`'s
`AddPendingAuthorizations` function had an error in the boolean logic
for determining if a request was incomplete. It was rejecting any
requests that had a non-nil `Authz`. This commit fixes the logic so that
it rejects requests that have a **nil** `Authz`.

* Add `newOrderValid` for new-order rpc wrappers.

This commit updates the `StorageAuthorityServerWrapper`'s `NewOrder`
function to use a new pb-marshalling utility function `newOrderValid` to
determine if the provided order is valid or not. Previous to this commit
the `NewOrder` server wrapper used `orderValid` which rejected orders
that had a nil `Id`. This is incorrect because **all** orders provided
to `NewOrder` have a nil id! They haven't been added yet :-)

* Fix SA server wrapper `GetOrder` incomplete response check.

Prior to this commit the `StorageAuthorityClientWrapper`'s `GetOrder`
function was validating that the returned order had a non-nil
`CertificateSerial`. This isn't correct - you can GET an order that
hasn't been finalized with a certificate and it should work. This commit
updates the `GetOrder` function to use the utility `orderValid` function
that allows for a nil `CertificateSerial` but enforces all other fields
are populated as expected.

* Allow nil Authz in `GetOrderAuthorizations` response.

This commit fixes the `StorageAuthorityClientWrapper`'s
`GetOrderAuthorizations` function to not consider a response with a nil
`Authz` array incomplete. This condition happens under normal
circumstances when an attempt to finalize an order is made for an order
that has completed no authorizations.
2017-12-14 14:48:44 -05:00
Jacob Hoffman-Andrews 68d5cc3331
Restore gRPC metrics (#3265)
The go-grpc-prometheus package by default registers its metrics with Prometheus' global registry. In #3167, when we stopped using the global registry, we accidentally lost our gRPC metrics. This change adds them back.

Specifically, it adds two convenience functions, one for clients and one for servers, that makes the necessary metrics object and registers it. We run these in the main function of each server.

I considered adding these as part of StatsAndLogging, but the corresponding ClientMetrics and ServerMetrics objects (defined by go-grpc-prometheus) need to be subsequently made available during construction of the gRPC clients and servers. We could add them as fields on Scope, but this seemed like a little too much tight coupling.

Also, update go-grpc-prometheus to get the necessary methods.

```
$ go test github.com/grpc-ecosystem/go-grpc-prometheus/...
ok      github.com/grpc-ecosystem/go-grpc-prometheus    0.069s
?       github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto [no test files]
```
2017-12-07 15:44:55 -08:00
Daniel McCarney 0684d5fc73
Add pending orders rate limit to new-order. (#3257)
This commit adds a new rate limit to restrict the number of outstanding
pending orders per account. If the threshold for this rate limit is
crossed subsequent new-order requests will return a 429 response.

Note: Since this the rate limit object itself defines an `Enabled()`
test based on whether or not it has been configured there is **not**
a feature flag for this change.

Resolves https://github.com/letsencrypt/boulder/issues/3246
2017-12-04 16:36:48 -05:00
Daniel McCarney 2f263f8ed5 ACME v2 Finalize order support (#3169)
This PR implements order finalization for the ACME v2 API.

In broad strokes this means:

* Removing the CSR from order objects & the new-order flow
* Adding identifiers to the order object & new-order
* Providing a finalization URL as part of orders returned by new-order
* Adding support to the WFE's Order endpoint to receive finalization POST requests with a CSR
* Updating the RA to accept finalization requests and to ensure orders are fully validated before issuance can proceed
* Updating the SA to allow finding order authorizations & updating orders.
* Updating the CA to accept an Order ID to log when issuing a certificate corresponding to an order object

Resolves #3123
2017-11-01 12:39:44 -07:00
Roland Bracewell Shoemaker b7bca87134 Batch fetching of existing authorizations and creation of pending authorizations (#3058)
For the new-order endpoint only. This does some refactoring of the order of operations in `ra.NewAuthorization` as well in order to reduce the duplication of code relating to creating pending authorizations, existing tests still seem to work as intended... A close eye should be given to this since we don't have integration tests yet that test it end to end. This also changes the inner type of `grpc.StorageAuthorityServerWrapper` to `core.StorageAuthority` so that we can avoid a circular import that is created by needing to import `grpc.AuthzToPB` and `grpc.PBToAuthz` in `sa/sa.go`.

This is a big change but should considerably improve the performance of the new-order flow.

Fixes #2955.
2017-09-25 09:10:59 -07:00
Roland Bracewell Shoemaker e91349217e Switch to using go 1.9 (#3047)
* Switch to using go 1.9

* Regenerate with 1.9

* Manually fix import path...

* Upgrade mockgen and regenerate

* Update github.com/golang/mock
2017-09-06 16:30:13 -04:00
Roland Bracewell Shoemaker f193137405 Remove superfluous gRPC error encodings (#3048)
Follow up from #3041

Fixes #2589
2017-09-06 12:38:10 -07:00
Jacob Hoffman-Andrews 18f15b2b3d Remove unused error types (#3041)
* Remove all of the errors under core. Their purpose is now served by errors, and they were almost entirely unused. The remaining uses were switched to errors.
* Remove errors.NotSupportedError. It was used in only one place (ca.go), and that usage is more appropriately a ServerInternal error.
2017-09-05 16:51:32 -07:00
Roland Bracewell Shoemaker 191a043585 Implement handler for retrieving an order object and SA RPC (#3016)
Fixes #2984 and fixes #2985.
2017-09-01 15:26:36 -07:00
Brian Smith e670e6e6b5 CA: Stub IssueCertificateForPrecertificate(). (#2973)
Stub out IssueCertificateForPrecertificate() enough so that we can continue with the PRs that implement & test it in parallel with PRs that implement and test the calling side (via mock implementations of the CA side).
2017-08-15 16:50:21 -07:00
Roland Bracewell Shoemaker 90ba766af9 Add NewOrder RPCs + methods to SA and RA (#2907)
Fixes #2875, #2900 and #2901.
2017-08-11 14:24:25 -04:00
Brian Smith d2291f6c5a CA: Implement IssuePrecertificate. (#2946)
* CA: Stub IssuePrecertificate gPRC method.

* CA: Implement IssuePrecertificate.

* CA: Test Precertificate flow in TestIssueCertificate().

move verification of certificate storage

IssuePrecertificate tests

Add CT precertificate poison extension to CFSSL whitelist.

CFSSL won't allow us to add an extension to a certificate unless that
certificate is in the whitelist.

According to its documentation, "Extensions requested in the CSR are
ignored, except for those processed by ParseCertificateRequest (mainly
subjectAltName)." Still, at least we need to add tests to make sure a
poison extension in a CSR isn't copied into the final certificate.

This allows us to avoid making invasive changes to CFSSL.

* CA: Test precertificate issuance in TestInvalidCSRs().

* CA: Only support IssuePrecertificate() if it is explicitly enabled.

* CA: Test that we produce CT poison extensions in the valid form.

The poison extension must be critical in order to work correctly. It probably wouldn't
matter as much what the value is, but the spec requires the value to be ASN.1 NULL, so
verify that it is.
2017-08-09 21:05:39 -07:00
Roland Bracewell Shoemaker fcef38f78c Performance and cleanup database migration (#2882)
Switch certificates and certificateStatus to use autoincrement primary keys to avoid performance problems with clustered indexes (fixes #2754).

Remove empty externalCerts and identifierData tables (fixes #2881).

Make progress towards deleting unnecessary LockCol and subscriberApproved fields (#856,  #873) by making them NULLable and not including them in INSERTs and UPDATEs.
2017-07-26 15:18:28 -07:00
Jacob Hoffman-Andrews 8bc1db742c Improve recycling of pending authzs (#2896)
The existing ReusePendingAuthz implementation had some bugs:

It would recycle deactivated authorizations, which then couldn't be fulfilled. (#2840)
Since it was implemented in the SA, it wouldn't get called until after the RA checks the Pending Authorizations rate limit. Which means it wouldn't fulfill its intended purpose of making accounts less likely to get stuck in a Pending Authorizations limited state. (#2831)
This factors out the reuse functionality, which used to be inside an "if" statement in the SA. Now the SA has an explicit GetPendingAuthorization RPC, which gets called from the RA before calling NewPendingAuthorization. This happens to obsolete #2807, by putting the recycling logic for both valid and pending authorizations in the RA.
2017-07-26 14:00:30 -07:00
Daniel McCarney 2a84bc2495 Replace go-jose v1 with go-jose v2. (#2899)
This commit replaces the Boulder dependency on
gopkg.in/square/go-jose.v1 with gopkg.in/square/go-jose.v2. This is
necessary both to stay in front of bitrot and because the ACME v2 work
will require a feature from go-jose.v2 for JWS validation.

The largest part of this diff is cosmetic changes:

Changing import paths
jose.JsonWebKey -> jose.JSONWebKey
jose.JsonWebSignature -> jose.JSONWebSignature
jose.JoseHeader -> jose.Header
Some more significant changes were caused by updates in the API for
for creating new jose.Signer instances. Previously we constructed
these with jose.NewSigner(algorithm, key). Now these are created with
jose.NewSigner(jose.SigningKey{},jose.SignerOptions{}). At present all
signers specify EmbedJWK: true but this will likely change with
follow-up ACME V2 work.

Another change was the removal of the jose.LoadPrivateKey function
that the wfe tests relied on. The jose v2 API removed these functions,
moving them to a cmd's main package where we can't easily import them.
This function was reimplemented in the WFE's test code & updated to fail
fast rather than return errors.

Per CONTRIBUTING.md I have verified the go-jose.v2 tests at the imported
commit pass:

ok      gopkg.in/square/go-jose.v2      14.771s
ok      gopkg.in/square/go-jose.v2/cipher       0.025s
?       gopkg.in/square/go-jose.v2/jose-util    [no test files]
ok      gopkg.in/square/go-jose.v2/json 1.230s
ok      gopkg.in/square/go-jose.v2/jwt  0.073s

Resolves #2880
2017-07-26 10:55:14 -07:00
Brian Smith ac63c78313 CA: Have IssueCertificate use IssueCertificateRequest directly. (#2886)
This is a step towards the long-term goal of eliminating wrappers and a step
towards the short-term goal of making it easier to refactor ca/ca_test.go to
add testing of precertificate-based issuance.
2017-07-25 13:35:25 -04:00
Daniel McCarney fbd87b1757 Splits CountRegistrationsByIP to exact-match and by /48. (#2782)
Prior to this PR the SA's `CountRegistrationsByIP` treated IPv6
differently than IPv4 by counting registrations within a /48 for IPv6 as
opposed to exact matches for IPv4. This PR updates
`CountRegistrationsByIP` to treat IPv4 and IPv6 the
same, always matching exactly. The existing RegistrationsPerIP rate
limit policy will be applied against this exact matching count.

A new `CountRegistrationsByIPRange` function is added to the SA that
performs the historic matching process, e.g. for IPv4 it counts exactly
the same as `CountRegistrationsByIP`, but for IPv6 it counts within
a /48.

A new `RegistrationsPerIPRange` rate limit policy is added to allow
configuring the threshold/window for the fuzzy /48 matching registration
limit. Stats for the "Exceeded" and "Pass" events for this rate limit are
separated into a separate `RegistrationsByIPRange` stats scope under
the `RateLimit` scope to allow us to track it separate from the exact 
registrations per IP rate limit.

Resolves https://github.com/letsencrypt/boulder/issues/2738
2017-05-30 15:12:20 -07:00
Daniel McCarney 47452d6c6c Prefer IPv6 addresses, fall back to IPv4. (#2715)
This PR introduces a new feature flag "IPv6First". 

When the "IPv6First" feature is enabled the VA's HTTP dialer and TLS SNI
(01 and 02) certificate fetch requests will attempt to automatically
retry when the initial connection was to IPv6 and there is an IPv4
address available to retry with.

This resolves https://github.com/letsencrypt/boulder/issues/2623
2017-05-08 13:00:16 -07:00
Daniel McCarney 361e7d4caa Clean up `berrors` (#2724)
This PR removes two berrors that aren't used anywhere in the codebase:

TooManyRequests , a holdover from AMQP, and is no longer used.
UnsupportedIdentifier, used just for rejecting IDNs, which we no longer do.
In addition, the SignatureValidation error was only used by the WFE so it is moved there and unexported.

Note for reviewers: To remove berrors.UnsupportedIdentifierError I replaced the errIDNNotSupported error in policy/pa.go with a berrors.MalformedError with the same name. This allows removing UnsupportedIdentifierError ahead of #2712 which removes the IDNASupport feature flag. This seemed OK to me, but I can restore UnsupportedIdentifierError and clean it up after 2712 if that's preferred.

Resolves #2709
2017-05-04 10:56:26 +01:00
Daniel McCarney 1ed34a4a5d Fixes cert count rate limit for exact PSL matches. (#2703)
Prior to this PR if a domain was an exact match to a public suffix
list entry the certificates per name rate limit was applied based on the
count of certificates issued for that exact name and all of its
subdomains.

This PR introduces an exception such that exact public suffix
matches correctly have the certificate per name rate limit applied based
on only exact name matches.

In order to accomplish this a new RPC is added to the SA
`CountCertificatesByExactNames`. This operates similar to the existing
`CountCertificatesByNames` but does *not* include subdomains in the
count, only exact matches to the names provided. The usage of this new
RPC is feature flag gated behind the "CountCertificatesExact" feature flag.

The RA unit tests are updated to test the new code paths both with and
without the feature flag enabled.

Resolves #2681
2017-05-02 13:43:35 -07:00
Jacob Hoffman-Andrews d542960a35 Remove statsd version of RPC stats (#2693)
* Remove statsd-style RPC stats.

* Remove tests for old code.
2017-04-25 10:10:35 -04:00
Jacob Hoffman-Andrews d99800ecb1 Remove some last traces of AMQP. (#2687)
Fixes #2665
2017-04-20 10:43:17 -07:00
Roland Bracewell Shoemaker fd561ef842 Block issuance on first OCSP response generation (#2633)
Generate first OCSP response in ca.IssueCertificate instead of ocsp-updater.newCertificateTick
if features.GenerateOCSPEarly is enabled. Adds a new field to the sa.AddCertiifcate RPC for
the OCSP response and only adds it to the certificate status + sets ocspLastUpdated if it is a
non-empty slice. ocsp-updater.newCertificateTick stays the same so we can catch certificates
that were successfully signed + stored but a OCSP response couldn't be generated (for whatever
reason).

Fixes #2477.
2017-04-04 11:28:09 -07:00
Roland Bracewell Shoemaker 08f4dda038 Update github.com/grpc-ecosystem/go-grpc-prometheus and google.golang.org/grpc (#2637)
Updates the various gRPC/protobuf libs (google.golang.org/grpc/... and github.com/golang/protobuf/proto) and the boulder-tools image so that we can update to the newest github.com/grpc-ecosystem/go-grpc-prometheus. Also regenerates all of the protobuf definition files.

Tests run on updated packages all pass.

Unblocks #2633 fixes #2636.
2017-04-03 11:13:48 -07:00
Roland Bracewell Shoemaker e2b2511898 Overhaul internal error usage (#2583)
This patch removes all usages of the `core.XXXError` and almost all usages of `probs` outside of the WFE and VA and replaces them with a unified internal error type. Since the VA uses `probs.ProblemDetails` quite extensively in challenges, and currently stores them in the DB I've saved this change for another change (it'll also require a migration). Since `ProblemDetails` should only ever be exposed to end-users all of its related logic should be moved into the `WFE` but since it still needs to be exposed to the VA and SA I've left it in place for now.

The new internal `errors` package offers the same convenience functions as `probs` does as well as a new simpler type testing method. A few small changes have also been made to error messages, mainly adding the library and function name to internal server errors for easier debugging (i.e. where a number of functions return the exact same errors and there is no other way to distinguish which method threw the error).

Also adds proper encoding of internal errors transferred over gRPC (the current encoding scheme is kept for `core` and `probs` errors since it'll be ideally be removed after we deploy this and follow-up changes) using `grpc/metadata` instead of the gRPC status codes.

Fixes #2507. Updates #2254 and #2505.
2017-03-22 23:27:31 -07:00
Roland Bracewell Shoemaker a7cd4fb2c7 Don't wrap errors we return from boulder/grpc/creds.ClientHandshake (#2590)
The gRPC client reconnect code needs to be able to check if a error is temporary so that it can decide if it should attempt to reconnect or just fail and kill the client[1]. By wrapping the error we were receiving in our TLS handshake code we were removing the existing `Temporary` interface on the error. This meant that if a client attempted to reconnect to a server that was in the process of being shutdown, the client would consider that server permanently dead and never retry.

Fix is simple: don't wrap errors that we pass back into the gRPC internals so that they can be properly inspected.

[1]: aefc96d792/clientconn.go (L783)
2017-03-01 11:27:03 -08:00
David Calavera 0dc2513d2d
Generate GRPC objects with Go 1.8.
Signed-off-by: David Calavera <david.calavera@gmail.com>
2017-02-21 12:11:17 +01:00
Roland Bracewell Shoemaker 0c04fe2f5e Move error wrapping/unwrapping into the interceptors (#2556)
Instead of using `unwrapError/wrapError` in each of the wrapper functions do it in the server/client interceptors instead. This means we now consistently do error unwrapping/wrapping.

Fixes #2509.
2017-02-13 12:56:23 -05:00
Roland Bracewell Shoemaker 18de73f0d8 Pass nil errors through boulder/grpc wrapError/unwrapError (#2544)
Instead of trying to wrap or unwrap them which causes panics.

Also, expand the test_ct_submission integration test to include resubmissions.
2017-02-06 18:19:39 -08:00
Daniel e88db3cd5e
Revert "Revert "Copy all statsd stats to Prometheus. (#2474)" (#2541)"
This reverts commit 9d9e4941a5 and
restores the statsd prometheus code.
2017-02-01 15:48:18 -05:00
Daniel McCarney 9d9e4941a5 Revert "Copy all statsd stats to Prometheus. (#2474)" (#2541)
This reverts commit 58ccd7a71a.

We are seeing multiple boulder components restart when they encounter the stat registration race condition described in https://github.com/letsencrypt/boulder/issues/2540
2017-02-01 12:50:27 -05:00
Roland Bracewell Shoemaker 7853532972 Encode challenge errors and validation records when handling protobufs (#2520)
Previously we had `Error` and `ValidationRecords` fields in the `Challenge` protobuf but they were never populated which mean't that when using gRPC these fields wouldn't be sent to the SA from the RA on a `FinalizeAuthorization` call. This change populates those fields and updates the PB marshaling tests to verify the correct behavior.

Fixes #2514.
2017-01-25 09:39:35 -05:00
Jacob Hoffman-Andrews 6c93b41f20 Add a limit on failed authorizations (#2513)
Fixes #976.

This implements a new rate limit, InvalidAuthorizationsPerAccount. If a given account fails authorization for a given hostname too many times within the window, subsequent new-authz attempts for that account and hostname will fail early with a rateLimited error. This mitigates the misconfigured clients that constantly retry authorization even though they always fail (e.g., because the hostname no longer resolves).

For the new rate limit, I added a new SA RPC, CountInvalidAuthorizations. I chose to implement this only in gRPC, not in AMQP-RPC, so checking the rate limit is gated on gRPC. See #2406 for some description of the how and why. I also chose to directly use the gRPC interfaces rather than wrapping them in core.StorageAuthority, as a step towards what we will want to do once we've moved fully to gRPC.

Because authorizations don't have a created time, we need to look at the expires time instead. Invalid authorizations retain the expiration they were given when they were created as pending authorizations, so we use now + pendingAuthorizationLifetime as one side of the window for rate limiting, and look backwards from there. Note that this means you could maliciously bypass this rate limit by stacking up pending authorizations over time, then failing them all at once.

Similarly, since this limit is by (account, hostname) rather than just (hostname), you can bypass it by creating multiple accounts. It would be more natural and robust to limit by hostname, like our certificate limits. However, we currently only have two indexes on the authz table: the primary key, and

(`registrationID`,`identifier`,`status`,`expires`)

Since this limit is intended mainly to combat misconfigured clients, I think this is sufficient for now.

Corresponding PR for website: letsencrypt/website#125
2017-01-23 11:22:51 -08:00
Roland Bracewell Shoemaker 7d7adabe44 Allow probs.ProblemDetails to be passed across gRPC layer (#2506)
Currently services will pass both `core.XXXError` and `probs.XXX` type errors across the gRPC layer. In the future (#2505) we intend to stop passing `probs.XXX` type errors across this layer but for now we need to support them until that change is landed. This patch takes the easiest path to allow this by encoding the `probs.ProblemDetails` to JSON and storing it in the gRPC error body so that it can be passed around.

Fixes #2497.
2017-01-19 14:59:44 -08:00
Jacob Hoffman-Andrews 9dacdd5443 Fix SA wrappers for maps. (#2498)
We turn arrays into maps with a range command. Previously, we were taking the
address of the iteration variable in that range command, which meant incorrect
results since the iteration variable gets reassigned.

Also change the integration test to catch this error.

Fixes #2496
2017-01-17 14:07:07 -08:00
Jacob Hoffman-Andrews d6ba7fcba9 Add some timing histogram stats (#2482)
Previously our gRPC client code called the wrong function, enabling server-side instead of client-side histograms.

Also, add a timing stat for the generate / store combination in OCSP Updater.
2017-01-10 11:02:41 -08:00
Jacob Hoffman-Andrews 58ccd7a71a Copy all statsd stats to Prometheus. (#2474)
We have a number of stats already expressed using the statsd interface. During
the switchover period to direct Prometheus collection, we'd like to make those
stats available both ways. This change automatically exports any stats exported
using the statsd interface via Prometheus as well.

This is a little tricky because Prometheus expects all stats to by registered
exactly once. Prometheus does offer a mechanism to gracefully recover from
registering a stat more than once by handling a certain error, but it is not
safe for concurrent access. So I added a concurrency-safe wrapper that creates
Prometheus stats on demand and memoizes them.

In the process, made a few small required side changes:
 - Clean "/" from method names in the gRPC interceptors. They are allowed in
   statsd but not in Prometheus.
 - Replace "127.0.0.1" with "boulder" as the name of our testing CT log.
   Prometheus stats can't start with a number.
 - Remove ":" from the CT-log stat names emitted by Publisher. Prometheus stats
   can't include it.
 - Remove a stray "RA" in front of some rate limit stats, since it was
   duplicative (we were emitting "RA.RA..." before).

Note that this means two stat groups in particular are duplicated:
 - Gostats* is duplicated with the default process-level stats exported by the
   Prometheus library.
 - gRPCClient* are duplicated by the stats generated by the go-grpc-prometheus
   package.

When writing dashboards and alerts in the Prometheus world, we should be careful
to avoid these two categories, as they will disappear eventually. As a general
rule, if a stat is available with an all-lowercase name, choose that one, as it
is probably the Prometheus-native version.

In the long run we will want to create most stats using the native Prometheus
stat interface, since it allows us to use add labels to metrics, which is very
useful. For instance, currently our DNS stats distinguish types of queries by
appending the type to the stat name. This would be more natural as a label in
Prometheus.
2017-01-10 10:30:15 -05:00
Jacob Hoffman-Andrews 510e279208 Simplify gRPC TLS configs. (#2470)
Previously, a given binary would have three TLS config fields (CA cert, cert,
key) for its gRPC server, plus each of its configured gRPC clients. In typical
use, we expect all three of those to be the same across both servers and clients
within a given binary.

This change reuses the TLSConfig type already defined for use with AMQP, adds a
Load() convenience function that turns it into a *tls.Config, and configures it
for use with all of the binaries. This should make configuration easier and more
robust, since it more closely matches usage.

This change preserves temporary backwards-compatibility for the
ocsp-updater->publisher RPCs, since those are the only instances of gRPC
currently enabled in production.
2017-01-06 14:19:18 -08:00