boulder

Commit Graph

Author	SHA1	Message	Date
Jacob Hoffman-Andrews	a4f9de9e35	Improve nesting of RPC deadlines (#3619 ) gRPC passes deadline information through the RPC boundary, but client and server have the same deadline. Ideally we'd like the server to have a slightly tighter deadline than the client, so if one of the server's onward RPCs or other network calls times out, the server can pass back more detailed information to the client, rather than the client timing out the server and losing the opportunity to log more detailed information about which component caused the timeout. In this change, I subtract 100ms from the deadline on the server side of our interceptors, using our existing serverInterceptor. I also check that there is at least 100ms remaining in which to do useful work, so the server doesn't begin a potentially expensive task only to abort it. Fixes #3608.	2018-04-06 15:40:18 +01:00
Roland Bracewell Shoemaker	9c9e944759	Add SCT embedding (#3521 ) Adds SCT embedding to the certificate issuance flow. When an issuance is requested a precertificate (the requested certificate but poisoned with the critical CT extension) is issued and submitted to the required CT logs. Once the SCTs for the precertificate have been collected a new certificate is issued with the poison extension replaced with an SCT list extension containing the retrieved SCTs. Fixes #2244, fixes #3492 and fixes #3429.	2018-03-12 11:58:30 -07:00
Daniel McCarney	2612bf7168	Remove deprecated `sa.CountPendingOrders` cruft. (#3527 ) #3501 made this code deprecated. We've deployed 3501 to the staging environment and can now pull out the old cruft. Resolves #3502	2018-03-06 21:20:40 +00:00
Daniel McCarney	f2d3ad6d52	Enforce new orders per acct per window rate limit. (#3501 ) Previously we introduced the concept of a "pending orders per account ID" rate limit. After struggling with making an implementation of this rate limit perform well we reevaluated the problem and decided a "new orders per account per time window" rate limit would be a better fit for ACMEv2 overall. This commit introduces the new newOrdersPerAccount rate limit. The RA now checks this before creating new pending orders in ra.NewOrder. It does so after order reuse takes place ensuring the rate limit is only applied in cases when a distinct new pending order row would be created. To accomplish this a migration for a new orders field (created) and an index over created and registrationID is added. It would be possible to use the existing expires field for this like we've done in the past, but that was primarily to avoid running a migration on a large table in prod. Since we don't have that problem yet for V2 tables we can Do The Right Thing and add a column. For deployability the deprecated pendingOrdersPerAccount code & SA gRPC bits are left around. A follow-up PR will be needed to remove those (#3502). Resolves #3410	2018-03-02 10:47:39 -08:00
Daniel McCarney	04b2b17db3	Remove deprecated `sa.GetOrderAuthorizations`. (#3470 ) It has been replaced by `sa.GetValidOrderAuthorizations`, the same RPC with a clearer name. Resolves #3424	2018-02-21 11:59:46 -08:00
Daniel McCarney	4ac109ac25	Do not reuse legacy authzs in V2 new-order. (#3432 ) Prior to this commit when building up the authorizations for a new-order request we looked for any unexpired pending/valid authorizations owned by the account and used them for the order. This allows a client to use the V1 new-authz endpoint in combination with the V2 new-order endpoint and we do not want to support this behaviour. All V2 authorizations should be sourced from other V2 orders. This commit implements a new parameter for the SA's getAuthorizations function that allows filtering out legacy V1 authorizations by doing a JOIN on the order to authorizations join table. Resolves #3328	2018-02-08 12:31:04 -08:00
Daniel McCarney	d7bfb542c0	Handle order finalization errors. (#3404 ) This commit resolves the case where an error during finalization occurs. Prior to this commit if an error (expected or otherwise) occurred after setting an order to status processing at the start of order finalization the order would be stuck processing forever. The SA now has a `SetOrderError` RPC that can be used by the RA to persist an error onto an order. The order status calculation can use this error to decide if the order is invalid. The WFE is updated to write the error to the order JSON when displaying the order information. Prior to this commit the order protobuf had the error field as a `[]byte`. It doesn't seem like this is the right decision, we have a specific protobuf type for ProblemDetails and so this commit switches the error field to use it. The conversion to/from `[]byte` is done with the model by the SA. An integration test is included that prior to this commit left an order in a stuck processing state. With this commit the integration test passes as expected. Resolves https://github.com/letsencrypt/boulder/issues/3403	2018-02-07 16:34:07 -05:00
Daniel McCarney	67ae7f75b4	`sa.GetOrderAuthorizations` -> `sa.GetValidOrderAuthorizations`. (#3411 ) The SA RPC previously called `GetOrderAuthorizations` only returns valid, unexpired authorizations. This commit updates the name to emphasize that it only returns valid order authzs.	2018-02-07 11:54:18 -08:00
Roland Bracewell Shoemaker	62f3978f3b	Add initial CTPolicy impl (#3414 ) Adds a package which implements group-based SCT retrieval. Fixes #3412.	2018-02-06 10:52:20 -08:00
Daniel McCarney	eea049da40	Fix order reuse, calc order status by authz status (#3402 ) This PR is a rework of what was originally https://github.com/letsencrypt/boulder/pull/3382, integrating the design feedback proposed by @jsha: https://github.com/letsencrypt/boulder/pull/3382#issuecomment-359912549 This PR removes the stored Order status field and replaces it with a value that is calculated on-the-fly by the SA when fetching an order, based on the order's associated authorizations. In summary (and order of precedence): * If any of the order's authorizations are invalid, the order is invalid. * If any of the order's authorizations are deactivated, the order is deactivated. * If any of the order's authorizations are pending, the order is pending. * If all of the order's authorizations are valid, and there is a certificate serial, the order is valid. * If all of the order's authorizations are valid, and we have begun processing, but there is no certificate serial, the order is processing. * If all of the order's authorizations are valid, and we haven't begun processing, then the order is pending, awaiting a finalization request. This avoids having to explicitly update the order status when an associated authorization changes status. The RA's implementation of new-order is updated to only reuse an existing order if the calculated status is pending. This avoids giving back invalid or deactivated orders to clients. Resolves #3333	2018-02-01 16:33:42 -05:00
Jacob Hoffman-Andrews	8153b919be	Implement TLSSNIRevalidation (#3361 ) This change adds a feature flag, TLSSNIRevalidation. When it is enabled, Boulder will create new authorization objects with TLS-SNI challenges if the requesting account has issued a certificate with the relevant domain name, and was the most recent account to do so. This setting overrides the configured list of challenges in the PolicyAuthority, so even if TLS-SNI is disabled in general, it will be enabled for revalidation. Note that this interacts with EnforceChallengeDisable. Because EnforceChallengeDisable causes additional checks at validation time and at issuance time, we need to update those two places as well. We'll send a follow-up PR with that. We chose to make this work only for the most recent account to issue, even if there were overlapping certificates, because it significantly simplifies the database access patterns and should work for 95+% of cases. Note that this change will let an account revalidate and reissue for a domain even if the previous issuance on that account used http-01 or dns-01. This also simplifies implementation, and fits within the intent of the mitigation plan: If someone previously issued for a domain using http-01, we have high confidence that they are actually the owner, and they are not going to "steal" the domain from themselves using tls-sni-01. Also note: This change also doesn't work properly with ReusePendingAuthz: true. Specifically, if you attempted issuance in the last couple days and failed because there was no tls-sni challenge, you'll still have an http-01 challenge lying around, and we'll reuse that; then your client will fail due to lack of tls-sni challenge again. This change was joint work between @rolandshoemaker and @jsha.	2018-01-12 11:00:06 -08:00
Daniel McCarney	7bb16ff21e	ACMEv2: Add pending order reuse (#3290 ) This commit adds pending order reuse. Subsequent to this commit multiple add-order requests from the same account ID for the same set of order names will result in only one order being created. Orders are only reused while they are not expired. Finalized orders will not be reused for subsequent new-order requests allowing for duplicate order issuance. Note that this is a second level of reuse, building on the pending authorization reuse that's done between separate orders already. To efficiently find an appropriate order ID given a set of names, a registration ID, and the current time a new orderFqdnSets table is added with appropriate indexes and foreign keys. Resolves #3258	2018-01-02 13:27:16 -08:00
Daniel McCarney	491f82f367	v2 API gRPC wrapper bugfixes (#3273 ) * Allow nil `Authz` slice in `GetAuthorizations` response. The `StorageAuthorityClientWrapper` was enforcing that the response to a `GetAuthorizations` request did not have `resp.Authz == nil`. This meant that the RA's `NewOrder` function failed when creating an order for names that had no existing authorizations to reuse. This commit updates the wrapper to allow `resp.Authz` to be nil - this is a valid case when there are no authorizations found. * Fix SA server wrapper `AddPendingAuthorizations` logic. Prior to this commit the `StorageAuthorityServerWrapper`'s `AddPendingAuthorizations` function had an error in the boolean logic for determining if a request was incomplete. It was rejecting any requests that had a non-nil `Authz`. This commit fixes the logic so that it rejects requests that have a nil `Authz`. * Add `newOrderValid` for new-order rpc wrappers. This commit updates the `StorageAuthorityServerWrapper`'s `NewOrder` function to use a new pb-marshalling utility function `newOrderValid` to determine if the provided order is valid or not. Previous to this commit the `NewOrder` server wrapper used `orderValid` which rejected orders that had a nil `Id`. This is incorrect because all orders provided to `NewOrder` have a nil id! They haven't been added yet :-) * Fix SA server wrapper `GetOrder` incomplete response check. Prior to this commit the `StorageAuthorityClientWrapper`'s `GetOrder` function was validating that the returned order had a non-nil `CertificateSerial`. This isn't correct - you can GET an order that hasn't been finalized with a certificate and it should work. This commit updates the `GetOrder` function to use the utility `orderValid` function that allows for a nil `CertificateSerial` but enforces all other fields are populated as expected. * Allow nil Authz in `GetOrderAuthorizations` response. This commit fixes the `StorageAuthorityClientWrapper`'s `GetOrderAuthorizations` function to not consider a response with a nil `Authz` array incomplete. This condition happens under normal circumstances when an attempt to finalize an order is made for an order that has completed no authorizations.	2017-12-14 14:48:44 -05:00
Jacob Hoffman-Andrews	68d5cc3331	Restore gRPC metrics (#3265 ) The go-grpc-prometheus package by default registers its metrics with Prometheus' global registry. In #3167, when we stopped using the global registry, we accidentally lost our gRPC metrics. This change adds them back. Specifically, it adds two convenience functions, one for clients and one for servers, that make the necessary metrics object and register it. We run these in the main function of each server. I considered adding these as part of StatsAndLogging, but the corresponding ClientMetrics and ServerMetrics objects (defined by go-grpc-prometheus) need to be subsequently made available during construction of the gRPC clients and servers. We could add them as fields on Scope, but this seemed like a little too much tight coupling. Also, update go-grpc-prometheus to get the necessary methods. ``` $ go test github.com/grpc-ecosystem/go-grpc-prometheus/... ok github.com/grpc-ecosystem/go-grpc-prometheus 0.069s ? github.com/grpc-ecosystem/go-grpc-prometheus/examples/testproto [no test files] ```	2017-12-07 15:44:55 -08:00
Daniel McCarney	0684d5fc73	Add pending orders rate limit to new-order. (#3257 ) This commit adds a new rate limit to restrict the number of outstanding pending orders per account. If the threshold for this rate limit is crossed subsequent new-order requests will return a 429 response. Note: Since the rate limit object itself defines an `Enabled()` test based on whether or not it has been configured there is not a feature flag for this change. Resolves https://github.com/letsencrypt/boulder/issues/3246	2017-12-04 16:36:48 -05:00
Daniel McCarney	2f263f8ed5	ACME v2 Finalize order support (#3169 ) This PR implements order finalization for the ACME v2 API. In broad strokes this means: * Removing the CSR from order objects & the new-order flow * Adding identifiers to the order object & new-order * Providing a finalization URL as part of orders returned by new-order * Adding support to the WFE's Order endpoint to receive finalization POST requests with a CSR * Updating the RA to accept finalization requests and to ensure orders are fully validated before issuance can proceed * Updating the SA to allow finding order authorizations & updating orders. * Updating the CA to accept an Order ID to log when issuing a certificate corresponding to an order object Resolves #3123	2017-11-01 12:39:44 -07:00
Roland Bracewell Shoemaker	b7bca87134	Batch fetching of existing authorizations and creation of pending authorizations (#3058 ) For the new-order endpoint only. This does some refactoring of the order of operations in `ra.NewAuthorization` as well in order to reduce the duplication of code relating to creating pending authorizations, existing tests still seem to work as intended... A close eye should be given to this since we don't have integration tests yet that test it end to end. This also changes the inner type of `grpc.StorageAuthorityServerWrapper` to `core.StorageAuthority` so that we can avoid a circular import that is created by needing to import `grpc.AuthzToPB` and `grpc.PBToAuthz` in `sa/sa.go`. This is a big change but should considerably improve the performance of the new-order flow. Fixes #2955.	2017-09-25 09:10:59 -07:00
Roland Bracewell Shoemaker	e91349217e	Switch to using go 1.9 (#3047 ) * Switch to using go 1.9 * Regenerate with 1.9 * Manually fix import path... * Upgrade mockgen and regenerate * Update github.com/golang/mock	2017-09-06 16:30:13 -04:00
Roland Bracewell Shoemaker	f193137405	Remove superfluous gRPC error encodings (#3048 ) Follow up from #3041 Fixes #2589	2017-09-06 12:38:10 -07:00
Jacob Hoffman-Andrews	18f15b2b3d	Remove unused error types (#3041 ) * Remove all of the errors under core. Their purpose is now served by errors, and they were almost entirely unused. The remaining uses were switched to errors. * Remove errors.NotSupportedError. It was used in only one place (ca.go), and that usage is more appropriately a ServerInternal error.	2017-09-05 16:51:32 -07:00
Roland Bracewell Shoemaker	191a043585	Implement handler for retrieving an order object and SA RPC (#3016 ) Fixes #2984 and fixes #2985.	2017-09-01 15:26:36 -07:00
Brian Smith	e670e6e6b5	CA: Stub IssueCertificateForPrecertificate(). (#2973 ) Stub out IssueCertificateForPrecertificate() enough so that we can continue with the PRs that implement & test it in parallel with PRs that implement and test the calling side (via mock implementations of the CA side).	2017-08-15 16:50:21 -07:00
Roland Bracewell Shoemaker	90ba766af9	Add NewOrder RPCs + methods to SA and RA (#2907 ) Fixes #2875, #2900 and #2901.	2017-08-11 14:24:25 -04:00
Brian Smith	d2291f6c5a	CA: Implement IssuePrecertificate. (#2946 ) * CA: Stub IssuePrecertificate gRPC method. * CA: Implement IssuePrecertificate. * CA: Test Precertificate flow in TestIssueCertificate(). move verification of certificate storage IssuePrecertificate tests Add CT precertificate poison extension to CFSSL whitelist. CFSSL won't allow us to add an extension to a certificate unless that certificate is in the whitelist. According to its documentation, "Extensions requested in the CSR are ignored, except for those processed by ParseCertificateRequest (mainly subjectAltName)." Still, at least we need to add tests to make sure a poison extension in a CSR isn't copied into the final certificate. This allows us to avoid making invasive changes to CFSSL. * CA: Test precertificate issuance in TestInvalidCSRs(). * CA: Only support IssuePrecertificate() if it is explicitly enabled. * CA: Test that we produce CT poison extensions in the valid form. The poison extension must be critical in order to work correctly. It probably wouldn't matter as much what the value is, but the spec requires the value to be ASN.1 NULL, so verify that it is.	2017-08-09 21:05:39 -07:00
Roland Bracewell Shoemaker	fcef38f78c	Performance and cleanup database migration (#2882 ) Switch certificates and certificateStatus to use autoincrement primary keys to avoid performance problems with clustered indexes (fixes #2754). Remove empty externalCerts and identifierData tables (fixes #2881). Make progress towards deleting unnecessary LockCol and subscriberApproved fields (#856, #873) by making them NULLable and not including them in INSERTs and UPDATEs.	2017-07-26 15:18:28 -07:00
Jacob Hoffman-Andrews	8bc1db742c	Improve recycling of pending authzs (#2896 ) The existing ReusePendingAuthz implementation had some bugs: It would recycle deactivated authorizations, which then couldn't be fulfilled. (#2840) Since it was implemented in the SA, it wouldn't get called until after the RA checks the Pending Authorizations rate limit. Which means it wouldn't fulfill its intended purpose of making accounts less likely to get stuck in a Pending Authorizations limited state. (#2831) This factors out the reuse functionality, which used to be inside an "if" statement in the SA. Now the SA has an explicit GetPendingAuthorization RPC, which gets called from the RA before calling NewPendingAuthorization. This happens to obsolete #2807, by putting the recycling logic for both valid and pending authorizations in the RA.	2017-07-26 14:00:30 -07:00
Daniel McCarney	2a84bc2495	Replace go-jose v1 with go-jose v2. (#2899 ) This commit replaces the Boulder dependency on gopkg.in/square/go-jose.v1 with gopkg.in/square/go-jose.v2. This is necessary both to stay in front of bitrot and because the ACME v2 work will require a feature from go-jose.v2 for JWS validation. The largest part of this diff is cosmetic changes: Changing import paths jose.JsonWebKey -> jose.JSONWebKey jose.JsonWebSignature -> jose.JSONWebSignature jose.JoseHeader -> jose.Header Some more significant changes were caused by updates in the API for creating new jose.Signer instances. Previously we constructed these with jose.NewSigner(algorithm, key). Now these are created with jose.NewSigner(jose.SigningKey{},jose.SignerOptions{}). At present all signers specify EmbedJWK: true but this will likely change with follow-up ACME V2 work. Another change was the removal of the jose.LoadPrivateKey function that the wfe tests relied on. The jose v2 API removed these functions, moving them to a cmd's main package where we can't easily import them. This function was reimplemented in the WFE's test code & updated to fail fast rather than return errors. Per CONTRIBUTING.md I have verified the go-jose.v2 tests at the imported commit pass: ok gopkg.in/square/go-jose.v2 14.771s ok gopkg.in/square/go-jose.v2/cipher 0.025s ? gopkg.in/square/go-jose.v2/jose-util [no test files] ok gopkg.in/square/go-jose.v2/json 1.230s ok gopkg.in/square/go-jose.v2/jwt 0.073s Resolves #2880	2017-07-26 10:55:14 -07:00
Brian Smith	ac63c78313	CA: Have IssueCertificate use IssueCertificateRequest directly. (#2886 ) This is a step towards the long-term goal of eliminating wrappers and a step towards the short-term goal of making it easier to refactor ca/ca_test.go to add testing of precertificate-based issuance.	2017-07-25 13:35:25 -04:00
Daniel McCarney	fbd87b1757	Splits CountRegistrationsByIP to exact-match and by /48. (#2782 ) Prior to this PR the SA's `CountRegistrationsByIP` treated IPv6 differently than IPv4 by counting registrations within a /48 for IPv6 as opposed to exact matches for IPv4. This PR updates `CountRegistrationsByIP` to treat IPv4 and IPv6 the same, always matching exactly. The existing RegistrationsPerIP rate limit policy will be applied against this exact matching count. A new `CountRegistrationsByIPRange` function is added to the SA that performs the historic matching process, e.g. for IPv4 it counts exactly the same as `CountRegistrationsByIP`, but for IPv6 it counts within a /48. A new `RegistrationsPerIPRange` rate limit policy is added to allow configuring the threshold/window for the fuzzy /48 matching registration limit. Stats for the "Exceeded" and "Pass" events for this rate limit are separated into a separate `RegistrationsByIPRange` stats scope under the `RateLimit` scope to allow us to track it separate from the exact registrations per IP rate limit. Resolves https://github.com/letsencrypt/boulder/issues/2738	2017-05-30 15:12:20 -07:00
Daniel McCarney	47452d6c6c	Prefer IPv6 addresses, fall back to IPv4. (#2715 ) This PR introduces a new feature flag "IPv6First". When the "IPv6First" feature is enabled the VA's HTTP dialer and TLS SNI (01 and 02) certificate fetch requests will attempt to automatically retry when the initial connection was to IPv6 and there is an IPv4 address available to retry with. This resolves https://github.com/letsencrypt/boulder/issues/2623	2017-05-08 13:00:16 -07:00
Daniel McCarney	361e7d4caa	Clean up `berrors` (#2724 ) This PR removes two berrors that aren't used anywhere in the codebase: TooManyRequests, a holdover from AMQP that is no longer used, and UnsupportedIdentifier, used just for rejecting IDNs, which we no longer do. In addition, the SignatureValidation error was only used by the WFE so it is moved there and unexported. Note for reviewers: To remove berrors.UnsupportedIdentifierError I replaced the errIDNNotSupported error in policy/pa.go with a berrors.MalformedError with the same name. This allows removing UnsupportedIdentifierError ahead of #2712 which removes the IDNASupport feature flag. This seemed OK to me, but I can restore UnsupportedIdentifierError and clean it up after 2712 if that's preferred. Resolves #2709	2017-05-04 10:56:26 +01:00
Daniel McCarney	1ed34a4a5d	Fixes cert count rate limit for exact PSL matches. (#2703 ) Prior to this PR if a domain was an exact match to a public suffix list entry the certificates per name rate limit was applied based on the count of certificates issued for that exact name and all of its subdomains. This PR introduces an exception such that exact public suffix matches correctly have the certificate per name rate limit applied based on only exact name matches. In order to accomplish this a new RPC is added to the SA `CountCertificatesByExactNames`. This operates similar to the existing `CountCertificatesByNames` but does not include subdomains in the count, only exact matches to the names provided. The usage of this new RPC is feature flag gated behind the "CountCertificatesExact" feature flag. The RA unit tests are updated to test the new code paths both with and without the feature flag enabled. Resolves #2681	2017-05-02 13:43:35 -07:00
Jacob Hoffman-Andrews	d542960a35	Remove statsd version of RPC stats (#2693 ) * Remove statsd-style RPC stats. * Remove tests for old code.	2017-04-25 10:10:35 -04:00
Jacob Hoffman-Andrews	d99800ecb1	Remove some last traces of AMQP. (#2687 ) Fixes #2665	2017-04-20 10:43:17 -07:00
Roland Bracewell Shoemaker	fd561ef842	Block issuance on first OCSP response generation (#2633 ) Generate first OCSP response in ca.IssueCertificate instead of ocsp-updater.newCertificateTick if features.GenerateOCSPEarly is enabled. Adds a new field to the sa.AddCertificate RPC for the OCSP response and only adds it to the certificate status + sets ocspLastUpdated if it is a non-empty slice. ocsp-updater.newCertificateTick stays the same so we can catch certificates that were successfully signed + stored but an OCSP response couldn't be generated (for whatever reason). Fixes #2477.	2017-04-04 11:28:09 -07:00
Roland Bracewell Shoemaker	08f4dda038	Update github.com/grpc-ecosystem/go-grpc-prometheus and google.golang.org/grpc (#2637 ) Updates the various gRPC/protobuf libs (google.golang.org/grpc/... and github.com/golang/protobuf/proto) and the boulder-tools image so that we can update to the newest github.com/grpc-ecosystem/go-grpc-prometheus. Also regenerates all of the protobuf definition files. Tests run on updated packages all pass. Unblocks #2633 fixes #2636.	2017-04-03 11:13:48 -07:00
Roland Bracewell Shoemaker	e2b2511898	Overhaul internal error usage (#2583 ) This patch removes all usages of the `core.XXXError` and almost all usages of `probs` outside of the WFE and VA and replaces them with a unified internal error type. Since the VA uses `probs.ProblemDetails` quite extensively in challenges, and currently stores them in the DB, I've saved this for another change (it'll also require a migration). Since `ProblemDetails` should only ever be exposed to end-users all of its related logic should be moved into the `WFE` but since it still needs to be exposed to the VA and SA I've left it in place for now. The new internal `errors` package offers the same convenience functions as `probs` does as well as a new simpler type testing method. A few small changes have also been made to error messages, mainly adding the library and function name to internal server errors for easier debugging (i.e. where a number of functions return the exact same errors and there is no other way to distinguish which method threw the error). Also adds proper encoding of internal errors transferred over gRPC (the current encoding scheme is kept for `core` and `probs` errors since it'll ideally be removed after we deploy this and follow-up changes) using `grpc/metadata` instead of the gRPC status codes. Fixes #2507. Updates #2254 and #2505.	2017-03-22 23:27:31 -07:00
Roland Bracewell Shoemaker	a7cd4fb2c7	Don't wrap errors we return from boulder/grpc/creds.ClientHandshake (#2590 ) The gRPC client reconnect code needs to be able to check if an error is temporary so that it can decide if it should attempt to reconnect or just fail and kill the client[1]. By wrapping the error we were receiving in our TLS handshake code we were removing the existing `Temporary` interface on the error. This meant that if a client attempted to reconnect to a server that was in the process of being shut down, the client would consider that server permanently dead and never retry. Fix is simple: don't wrap errors that we pass back into the gRPC internals so that they can be properly inspected. [1]: `aefc96d792/clientconn.go (L783)`	2017-03-01 11:27:03 -08:00
David Calavera	0dc2513d2d	Generate GRPC objects with Go 1.8. Signed-off-by: David Calavera <david.calavera@gmail.com>	2017-02-21 12:11:17 +01:00
Roland Bracewell Shoemaker	0c04fe2f5e	Move error wrapping/unwrapping into the interceptors (#2556 ) Instead of using `unwrapError/wrapError` in each of the wrapper functions do it in the server/client interceptors instead. This means we now consistently do error unwrapping/wrapping. Fixes #2509.	2017-02-13 12:56:23 -05:00
Roland Bracewell Shoemaker	18de73f0d8	Pass nil errors through boulder/grpc wrapError/unwrapError (#2544 ) Instead of trying to wrap or unwrap them which causes panics. Also, expand the test_ct_submission integration test to include resubmissions.	2017-02-06 18:19:39 -08:00
Daniel	e88db3cd5e	Revert "Revert "Copy all statsd stats to Prometheus. (#2474 )" (#2541 )" This reverts commit `9d9e4941a5` and restores the statsd prometheus code.	2017-02-01 15:48:18 -05:00
Daniel McCarney	9d9e4941a5	Revert "Copy all statsd stats to Prometheus. (#2474 )" (#2541 ) This reverts commit `58ccd7a71a`. We are seeing multiple boulder components restart when they encounter the stat registration race condition described in https://github.com/letsencrypt/boulder/issues/2540	2017-02-01 12:50:27 -05:00
Roland Bracewell Shoemaker	7853532972	Encode challenge errors and validation records when handling protobufs (#2520 ) Previously we had `Error` and `ValidationRecords` fields in the `Challenge` protobuf but they were never populated, which meant that when using gRPC these fields wouldn't be sent to the SA from the RA on a `FinalizeAuthorization` call. This change populates those fields and updates the PB marshaling tests to verify the correct behavior. Fixes #2514.	2017-01-25 09:39:35 -05:00
Jacob Hoffman-Andrews	6c93b41f20	Add a limit on failed authorizations (#2513 ) Fixes #976. This implements a new rate limit, InvalidAuthorizationsPerAccount. If a given account fails authorization for a given hostname too many times within the window, subsequent new-authz attempts for that account and hostname will fail early with a rateLimited error. This mitigates the misconfigured clients that constantly retry authorization even though they always fail (e.g., because the hostname no longer resolves). For the new rate limit, I added a new SA RPC, CountInvalidAuthorizations. I chose to implement this only in gRPC, not in AMQP-RPC, so checking the rate limit is gated on gRPC. See #2406 for some description of the how and why. I also chose to directly use the gRPC interfaces rather than wrapping them in core.StorageAuthority, as a step towards what we will want to do once we've moved fully to gRPC. Because authorizations don't have a created time, we need to look at the expires time instead. Invalid authorizations retain the expiration they were given when they were created as pending authorizations, so we use now + pendingAuthorizationLifetime as one side of the window for rate limiting, and look backwards from there. Note that this means you could maliciously bypass this rate limit by stacking up pending authorizations over time, then failing them all at once. Similarly, since this limit is by (account, hostname) rather than just (hostname), you can bypass it by creating multiple accounts. It would be more natural and robust to limit by hostname, like our certificate limits. However, we currently only have two indexes on the authz table: the primary key, and (`registrationID`,`identifier`,`status`,`expires`) Since this limit is intended mainly to combat misconfigured clients, I think this is sufficient for now. Corresponding PR for website: letsencrypt/website#125	2017-01-23 11:22:51 -08:00
Roland Bracewell Shoemaker	7d7adabe44	Allow probs.ProblemDetails to be passed across gRPC layer (#2506 ) Currently services will pass both `core.XXXError` and `probs.XXX` type errors across the gRPC layer. In the future (#2505) we intend to stop passing `probs.XXX` type errors across this layer but for now we need to support them until that change is landed. This patch takes the easiest path to allow this by encoding the `probs.ProblemDetails` to JSON and storing it in the gRPC error body so that it can be passed around. Fixes #2497.	2017-01-19 14:59:44 -08:00
Jacob Hoffman-Andrews	9dacdd5443	Fix SA wrappers for maps. (#2498 ) We turn arrays into maps with a range command. Previously, we were taking the address of the iteration variable in that range command, which meant incorrect results since the iteration variable gets reassigned. Also change the integration test to catch this error. Fixes #2496	2017-01-17 14:07:07 -08:00
Jacob Hoffman-Andrews	d6ba7fcba9	Add some timing histogram stats (#2482 ) Previously our gRPC client code called the wrong function, enabling server-side instead of client-side histograms. Also, add a timing stat for the generate / store combination in OCSP Updater.	2017-01-10 11:02:41 -08:00
Jacob Hoffman-Andrews	58ccd7a71a	Copy all statsd stats to Prometheus. (#2474 ) We have a number of stats already expressed using the statsd interface. During the switchover period to direct Prometheus collection, we'd like to make those stats available both ways. This change automatically exports any stats exported using the statsd interface via Prometheus as well. This is a little tricky because Prometheus expects all stats to be registered exactly once. Prometheus does offer a mechanism to gracefully recover from registering a stat more than once by handling a certain error, but it is not safe for concurrent access. So I added a concurrency-safe wrapper that creates Prometheus stats on demand and memoizes them. In the process, made a few small required side changes: - Clean "/" from method names in the gRPC interceptors. They are allowed in statsd but not in Prometheus. - Replace "127.0.0.1" with "boulder" as the name of our testing CT log. Prometheus stats can't start with a number. - Remove ":" from the CT-log stat names emitted by Publisher. Prometheus stats can't include it. - Remove a stray "RA" in front of some rate limit stats, since it was duplicative (we were emitting "RA.RA..." before). Note that this means two stat groups in particular are duplicated: - Gostats* is duplicated with the default process-level stats exported by the Prometheus library. - gRPCClient* are duplicated by the stats generated by the go-grpc-prometheus package. When writing dashboards and alerts in the Prometheus world, we should be careful to avoid these two categories, as they will disappear eventually. As a general rule, if a stat is available with an all-lowercase name, choose that one, as it is probably the Prometheus-native version. In the long run we will want to create most stats using the native Prometheus stat interface, since it allows us to add labels to metrics, which is very useful. For instance, currently our DNS stats distinguish types of queries by appending the type to the stat name. This would be more natural as a label in Prometheus.	2017-01-10 10:30:15 -05:00
Jacob Hoffman-Andrews	510e279208	Simplify gRPC TLS configs. (#2470 ) Previously, a given binary would have three TLS config fields (CA cert, cert, key) for its gRPC server, plus each of its configured gRPC clients. In typical use, we expect all three of those to be the same across both servers and clients within a given binary. This change reuses the TLSConfig type already defined for use with AMQP, adds a Load() convenience function that turns it into a *tls.Config, and configures it for use with all of the binaries. This should make configuration easier and more robust, since it more closely matches usage. This change preserves temporary backwards-compatibility for the ocsp-updater->publisher RPCs, since those are the only instances of gRPC currently enabled in production.	2017-01-06 14:19:18 -08:00
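
The order-status precedence listed in commit eea049da40 (#3402) can be sketched in a few lines of Go. This is only an illustration of the rules as the commit message states them, not Boulder's actual SA code: the `Status` type, the `Order` struct, its field names, and the `OrderStatus` function are all hypothetical, and an order with no authorizations is assumed to fall through to the "all valid" rules.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for the ACME status strings named in
// the commit message; it is not Boulder's real type.
type Status string

const (
	StatusPending     Status = "pending"
	StatusProcessing  Status = "processing"
	StatusValid       Status = "valid"
	StatusInvalid     Status = "invalid"
	StatusDeactivated Status = "deactivated"
)

// Order is a hypothetical, stripped-down order holding only the inputs the
// commit message says the calculation depends on.
type Order struct {
	AuthzStatuses     []Status // statuses of the order's authorizations
	BeganProcessing   bool     // true once finalization has started
	CertificateSerial string   // empty until a certificate is issued
}

// OrderStatus derives the order status from its authorizations, applying
// the rules in the order of precedence given in the commit message.
func OrderStatus(o Order) Status {
	seen := make(map[Status]bool)
	for _, s := range o.AuthzStatuses {
		seen[s] = true
	}

	// Any invalid, deactivated, or pending authorization decides the
	// result, checked in that order.
	switch {
	case seen[StatusInvalid]:
		return StatusInvalid
	case seen[StatusDeactivated]:
		return StatusDeactivated
	case seen[StatusPending]:
		return StatusPending
	}

	// From here on every authorization is assumed to be valid.
	if o.CertificateSerial != "" {
		return StatusValid
	}
	if o.BeganProcessing {
		return StatusProcessing
	}
	// All authorizations valid, finalization not yet requested.
	return StatusPending
}

func main() {
	o := Order{
		AuthzStatuses:   []Status{StatusValid, StatusValid},
		BeganProcessing: true,
	}
	fmt.Println(OrderStatus(o))
}
```

Because the status is computed on each fetch rather than stored, nothing has to be updated on the order when one of its authorizations changes status, which is the motivation the commit message gives for the design.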


82 Commits