Fixes #976.
This implements a new rate limit, InvalidAuthorizationsPerAccount. If a given account fails authorization for a given hostname too many times within the window, subsequent new-authz attempts for that account and hostname will fail early with a rateLimited error. This mitigates the impact of misconfigured clients that constantly retry authorization even though it always fails (e.g., because the hostname no longer resolves).
For the new rate limit, I added a new SA RPC, CountInvalidAuthorizations. I chose to implement this only in gRPC, not in AMQP-RPC, so checking the rate limit is gated on gRPC. See #2406 for a description of the how and why. I also chose to use the gRPC interfaces directly rather than wrapping them in core.StorageAuthority, as a step towards what we will want to do once we've moved fully to gRPC.
Because authorizations don't have a created time, we need to look at the expires time instead. Invalid authorizations retain the expiration they were given when they were created as pending authorizations, so we use now + pendingAuthorizationLifetime as one side of the window for rate limiting, and look backwards from there. Note that this means you could maliciously bypass this rate limit by stacking up pending authorizations over time, then failing them all at once.
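A minimal sketch of the window computation, using illustrative names rather than the actual Boulder identifiers:
```
package main

import (
	"fmt"
	"time"
)

// invalidAuthzWindow derives the rate limit window. Invalid
// authorizations only carry an expires time, so the window is anchored
// at now + pendingAuthorizationLifetime and extends backwards by the
// configured limit window.
func invalidAuthzWindow(now time.Time, pendingAuthorizationLifetime, limitWindow time.Duration) (earliest, latest time.Time) {
	latest = now.Add(pendingAuthorizationLifetime)
	earliest = latest.Add(-limitWindow)
	return earliest, latest
}

func main() {
	earliest, latest := invalidAuthzWindow(time.Now(), 7*24*time.Hour, 24*time.Hour)
	fmt.Printf("count invalid authorizations with expires in [%s, %s]\n", earliest, latest)
}
```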
Similarly, since this limit is by (account, hostname) rather than just (hostname), you can bypass it by creating multiple accounts. It would be more natural and robust to limit by hostname, like our certificate limits. However, we currently only have two indexes on the authz table: the primary key, and
(`registrationID`,`identifier`,`status`,`expires`)
Since this limit is intended mainly to combat misconfigured clients, I think this is sufficient for now.
Corresponding PR for website: letsencrypt/website#125
We turn arrays into maps with a range statement. Previously, we were taking the
address of the iteration variable in that range statement, which produced incorrect
results since the iteration variable gets reassigned on each pass.
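A minimal reproduction (under pre-Go 1.22 loop semantics, where a single iteration variable is shared across passes):
```
package main

import "fmt"

func main() {
	names := []string{"a", "b", "c"}

	// Buggy: &v takes the address of the one iteration variable, which is
	// reassigned on every pass, so every entry points at the last value.
	buggy := make(map[int]*string)
	for i, v := range names {
		buggy[i] = &v
	}

	// Fixed: copy the element into a fresh variable so each entry points
	// at distinct storage.
	fixed := make(map[int]*string)
	for i := range names {
		v := names[i]
		fixed[i] = &v
	}

	fmt.Println(*buggy[0], *fixed[0]) // prints "c a"
}
```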
Also change the integration test to catch this error.
Fixes #2496
We have a number of stats already expressed using the statsd interface. During
the switchover period to direct Prometheus collection, we'd like to make those
stats available both ways. This change automatically exports any stats emitted
through the statsd interface via Prometheus as well.
This is a little tricky because Prometheus expects all stats to be registered
exactly once. Prometheus does offer a mechanism to gracefully recover from
registering a stat more than once by handling a certain error, but it is not
safe for concurrent access. So I added a concurrency-safe wrapper that creates
Prometheus stats on demand and memoizes them.
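A minimal sketch of that wrapper, covering counters only (the real adapter handles the other stat types as well):
```
package metrics

import (
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

// promAdapter creates Prometheus counters on demand and memoizes them,
// so repeated emissions of the same statsd-style name never trigger the
// duplicate-registration error.
type promAdapter struct {
	mu       sync.Mutex
	counters map[string]prometheus.Counter
}

func newPromAdapter() *promAdapter {
	return &promAdapter{counters: make(map[string]prometheus.Counter)}
}

func (a *promAdapter) inc(name string, value int64) {
	a.mu.Lock()
	c, ok := a.counters[name]
	if !ok {
		c = prometheus.NewCounter(prometheus.CounterOpts{Name: name})
		prometheus.MustRegister(c)
		a.counters[name] = c
	}
	a.mu.Unlock()
	c.Add(float64(value))
}
```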
In the process, I made a few small required side changes:
- Clean "/" from method names in the gRPC interceptors. They are allowed in
statsd but not in Prometheus.
- Replace "127.0.0.1" with "boulder" as the name of our testing CT log.
Prometheus stats can't start with a number.
- Remove ":" from the CT-log stat names emitted by Publisher. Prometheus stats
can't include it.
- Remove a stray "RA" in front of some rate limit stats, since it was
duplicative (we were emitting "RA.RA..." before).
Note that this means two stat groups in particular are duplicated:
- Gostats* is duplicated with the default process-level stats exported by the
Prometheus library.
- gRPCClient* are duplicated by the stats generated by the go-grpc-prometheus
package.
When writing dashboards and alerts in the Prometheus world, we should be careful
to avoid these two categories, as they will disappear eventually. As a general
rule, if a stat is available with an all-lowercase name, choose that one, as it
is probably the Prometheus-native version.
In the long run we will want to create most stats using the native Prometheus
stat interface, since it allows us to add labels to metrics, which is very
useful. For instance, currently our DNS stats distinguish types of queries by
appending the type to the stat name. This would be more natural as a label in
Prometheus.
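For example, a sketch (the metric name and label are illustrative, not existing Boulder stats):
```
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Instead of suffixed names like DNS.Queries.A and DNS.Queries.AAAA,
// the query type becomes a label, which makes aggregating across types
// trivial in PromQL.
var dnsQueries = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "dns_queries",
		Help: "Count of DNS queries, labeled by query type.",
	},
	[]string{"type"},
)

func init() {
	prometheus.MustRegister(dnsQueries)
}

func recordQuery(qtype string) {
	dnsQueries.WithLabelValues(qtype).Inc()
}
```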
This commit resolves #2303 by updating the comment and the returned error type produced when the RA calls `SA.UpdatePendingAuthorization` and it fails.
Previously this produced a `MalformedRequestError` that was described as only happening when the client corrupted the challenge data.
Now a more descriptive `ServerInternalError` is returned, and the underlying error from the SA is logged as a warning for further debugging.
With the current gRPC design the CA talks directly to the Publisher when calling SubmitToCT. This crosses security boundaries (secure internal segment -> internet-facing segment), which is dangerous if (however unlikely) the Publisher is compromised and a gRPC exploit allows memory corruption on the caller end of an RPC, which could expose sensitive information or cause arbitrary issuance.
Instead we move the RPC call to the RA which is in a less sensitive network segment. Switching the call site from the CA -> RA is gated on adding the gRPC PublisherService object to the RA config.
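Sketched with illustrative names (`ra.publisher`, the goroutine, and the logging are assumptions for the sketch, not the exact Boulder code):
```
// Only submit to CT when a Publisher gRPC client was configured;
// otherwise the old CA -> Publisher call site remains in effect.
if ra.publisher != nil {
	go func() {
		// Submission is best-effort and must not block issuance.
		if err := ra.publisher.SubmitToCT(ctx, cert.DER); err != nil {
			ra.log.Warning(fmt.Sprintf("Failed to submit certificate to CT logs: %s", err))
		}
	}()
}
```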
Fixes #2202.
This commit updates the `go-jose` dependency to [v1.1.0](https://github.com/square/go-jose/releases/tag/v1.1.0) (Commit: aa2e30fdd1fe9dd3394119af66451ae790d50e0d). Since the import path changed from `github.com/square/...` to `gopkg.in/square/go-jose.v1/` this means removing the old dep and adding the new one.
The upstream go-jose library added a `[]*x509.Certificate` member to the `JsonWebKey` struct that prevents us from using a direct equality test against two `JsonWebKey` instances. Instead we now must compare the inner `Key` members.
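A minimal sketch of the comparison, assuming `reflect.DeepEqual` over the inner keys (the exact helper used in the tests may differ):
```
package example

import (
	"reflect"

	jose "gopkg.in/square/go-jose.v1"
)

// sameKey compares the inner Key members, since the JsonWebKey struct's
// new []*x509.Certificate member makes wholesale equality unreliable.
func sameKey(a, b jose.JsonWebKey) bool {
	return reflect.DeepEqual(a.Key, b.Key)
}
```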
The `TestRegistrationContactUpdate` function from `ra_test.go` was updated to populate the `Key` members used in testing instead of only using KeyIDs, to allow the updated comparisons to work as intended.
The `Key` field of the `Registration` object was switched from `jose.JsonWebKey` to `*jose.JsonWebKey` to make it easier to represent a registration without a Key, versus using a value with a nil `JsonWebKey.Key`.
I verified the upstream unit tests pass per contributing.md:
```
daniel@XXXXX:~/go/src/gopkg.in/square/go-jose.v1$ git show
commit aa2e30fdd1fe9dd3394119af66451ae790d50e0d
Merge: 139276c e18a743
Author: Cedric Staub <cs@squareup.com>
Date: Thu Sep 22 17:08:11 2016 -0700
Merge branch 'master' into v1
* master:
Better docs explaining embedded JWKs
Reject invalid embedded public keys
Improve multi-recipient/multi-sig handling
daniel@XXXXX:~/go/src/gopkg.in/square/go-jose.v1$ go test ./...
ok gopkg.in/square/go-jose.v1 17.599s
ok gopkg.in/square/go-jose.v1/cipher 0.007s
? gopkg.in/square/go-jose.v1/jose-util [no test files]
ok gopkg.in/square/go-jose.v1/json 1.238s
```
This PR reworks the validateEmail() function from the RA so that timeouts during DNS validation of MX/A/AAAA records for an email are non-fatal, matching our intention to verify emails best-effort.
Notes:
bdns/problem.go - DNSError.Timeout() was changed to also count context cancellation and context deadline expiry as DNS timeouts. This matches what DNSError.Error() was already doing to set the error message, and saves external callers of Timeout() from duplicating that work; see the sketch after these notes.
bdns/mocks.go - the LookupMX mock was changed to support always.error and always.timeout in a manner similar to the LookupHost mock. Otherwise the TestValidateEmail unit test for the RA would fail when the MX lookup completed before the Host lookup because the error wouldn't be correct (empty DNS records vs a timeout or network error).
test/config/ra.json, test/config-next/ra.json - the dnsTries and dnsTimeout values were updated such that dnsTries * dnsTimeout is <= the WFE->RA RPC timeout (currently 15s in the test configs). This allows all of the DNS lookups to time out without the overall RPC timing out.
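A sketch of the Timeout() logic (the field names are assumptions for illustration):
```
package bdns

import (
	"context"
	"net"
)

// DNSError wraps an underlying resolver error; only the field this
// sketch needs is shown.
type DNSError struct {
	underlying error
}

// Timeout reports whether the error was a network timeout, a canceled
// context, or an expired context deadline.
func (d DNSError) Timeout() bool {
	if netErr, ok := d.underlying.(net.Error); ok && netErr.Timeout() {
		return true
	}
	return d.underlying == context.Canceled || d.underlying == context.DeadlineExceeded
}
```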
Resolves #2260.
The RA performs an RPC to the SA's `GetValidAuthorizations` function when attempting to find existing valid authorizations to reuse. Prior to this commit, if the RPC failed (e.g. due to a timeout) the calling code
logged the failure as a warning but did not return the error and cease processing. This resulted in a nil pointer panic when we later tried to index `auths`.
This commit inserts the missing `return` to ensure we don't process further, thereby resolving #2274.
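The bug in miniature (a self-contained sketch, not the actual RA code):
```
package main

import (
	"errors"
	"log"
)

type authz struct{ ID string }

// getValidAuthorizations models the SA RPC failing, e.g. on a timeout.
func getValidAuthorizations() (map[string]*authz, error) {
	return nil, errors.New("RPC timeout")
}

func main() {
	auths, err := getValidAuthorizations()
	if err != nil {
		log.Printf("warning: GetValidAuthorizations failed: %s", err)
		return // the missing return: without it we fall through with auths == nil
	}
	// Without the return above, auths["example.com"] yields a nil pointer
	// and the field access below is the panic from #2274.
	log.Print(auths["example.com"].ID)
}
```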
A test for this fix is provided with `TestReuseAuthorizationFaultySA`. Without f52f340 applied, this test recreates the panic observed in #2274 and produces:
```
go test -p 1 -v -race --test.run TestReuseAuthorizationFaultySA github.com/letsencrypt/boulder/ra
=== RUN TestReuseAuthorizationFaultySA
--- FAIL: TestReuseAuthorizationFaultySA (0.04s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x4be2b8]
```
With f52f340 it passes. Yay!
Fixes #503.
Functionality is gated by the feature flag `AllowKeyRollover`. Since this functionality is only specified in ACME draft-03 and we mostly implement the draft-02 style, this takes some liberties in the implementation, which are described in the updated divergences doc. The `key-change` resource is used to side-step the draft-03 `url` requirement.
Fixes #140.
This patch allows users to specify the following revocation reasons, based on my interpretation of the meaning of the codes, though this could use confirmation from others.
* unspecified (0)
* keyCompromise (1)
* affiliationChanged (3)
* superseded (4)
* cessationOfOperation (5)
Part of #2080.
This change vendors `crypto/x509`, `crypto/x509/pkix`, and `encoding/asn1` from 1d5f6a765d. That commit is a direct child of the Go 1.5.4 release tag, so it contains the same code as the current Go version we are using. In that commit I rewrote imports in those packages so they depend on each other internally rather than calling out to the standard library, which would cause type disagreements.
I changed the imports in each place where we're parsing CSRs, and imported under a different name `oldx509`, both to avoid collisions and make it clear what's going on. Places that only use `x509` to parse certificates are not changed, and will use the current standard library.
This will unblock us from moving to Go 1.6, and subsequently Go 1.7.
Introduces the `authorizationLifetimeDays` and `pendingAuthorizationLifetimeDays` configuration options for `RA`.
If the values are missing from configuration, the code defaults back to the current values (300/7 days).
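A sketch of the fallback, with illustrative names:
```
package main

import (
	"fmt"
	"time"
)

// authzLifetimes falls back to the historical defaults (300 days valid,
// 7 days pending) when the config omits the new options.
func authzLifetimes(authzDays, pendingDays int) (valid, pending time.Duration) {
	if authzDays == 0 {
		authzDays = 300
	}
	if pendingDays == 0 {
		pendingDays = 7
	}
	return time.Duration(authzDays) * 24 * time.Hour,
		time.Duration(pendingDays) * 24 * time.Hour
}

func main() {
	fmt.Println(authzLifetimes(0, 0)) // 7200h0m0s 168h0m0s
}
```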
Fixes #2024
This PR adds two optimizations to fix the optimistic lock errors observed in #1986.
First, the WFE now returns early for registration POSTs (before invoking the RA and SA) when the POST body is the trivial update (`{"resource":"reg"}`). This prevents any DB operations from being performed when there is no work to be done.
Second, the RA now tracks whether an update actually changes the base registration's `Contact` slice or `Agreement` string. If the proposed update doesn't change either of these fields then the RA will return early before handing the update to the SA.
Both changes save database operations from being performed needlessly and will help avoid the optimistic lock errors we observed when a problematic client was POSTing the trivial update repeatedly in a short period.
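A simplified sketch of the RA-side check (the real types are `core.Registration` and `[]*core.AcmeURL`):
```
package main

import (
	"fmt"
	"reflect"
)

type registration struct {
	Contact   []string
	Agreement string
}

// mergeUpdate applies an update and reports whether anything changed; a
// false return lets UpdateRegistration skip the SA round trip entirely.
func mergeUpdate(base *registration, update registration) bool {
	changed := false
	if update.Agreement != "" && update.Agreement != base.Agreement {
		base.Agreement = update.Agreement
		changed = true
	}
	if update.Contact != nil && !reflect.DeepEqual(base.Contact, update.Contact) {
		base.Contact = update.Contact
		changed = true
	}
	return changed
}

func main() {
	base := registration{Contact: []string{"mailto:admin@example.com"}, Agreement: "v1"}
	fmt.Println(mergeUpdate(&base, registration{})) // false: nothing to persist
}
```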
The fix was verified as follows: I checked out master and artificially introduced lock contention into the SA by adding a 2s sleep into `UpdateRegistration` between fetching the `existingRegModel` to get the `LockCol` value and calling `ssa.dbMap.Update`. With the sleep in place & two certbot clients posting matching registration updates the lock contention error is produced as expected. After checking out the `empty-reg-updates` branch, re-adding the sleep to the SA, and performing the same two client reg updates no error is produced.
This PR replaces the `x/net/publicsuffix` package with `weppos/publicsuffix-go`.
The conversations that led to this decision are #1479 and #1374. To summarize the discussion, the main issue with `x/net/publicsuffix` is that the package compiles the list into the Go source code and doesn't provide a way to easily pull updates (e.g. by re-parsing the original PSL) unless the entire package is recompiled.
The PSL is updated almost daily, which makes it very hard to recompile the official Golang package often enough to stay up-to-date with all the changes. Moreover, the Golang maintainers expressed some concerns about rebuilding and committing changes with a frequency that would keep the package in sync with the original PSL. See https://github.com/letsencrypt/boulder/issues/1374#issuecomment-182429297
`weppos/publicsuffix-go` contains a compiled version of the list that is updated weekly (or more frequently). Moreover, the package can read and parse a PSL from a String or a File, which effectively decouples the Boulder source code from the list itself. The main benefit is that it will be possible to update the definition by simply downloading the latest list and restarting the application (assuming the list is persisted in memory).
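A minimal usage sketch (assuming the package's `Domain` helper, which resolves a name to its registrable domain):
```
package main

import (
	"fmt"

	"github.com/weppos/publicsuffix-go/publicsuffix"
)

func main() {
	// Domain returns the registrable domain (eTLD+1) using the list
	// compiled into the package; the package can also parse a fresh PSL
	// from a string or file at runtime.
	domain, err := publicsuffix.Domain("foo.bar.example.co.uk")
	if err != nil {
		panic(err)
	}
	fmt.Println(domain) // example.co.uk
}
```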
This PR adds an expvar.Int published under the key "lastIssuance" that contains the timestamp of the last successful certificate issuance. This allows easy creation of a script that monitors the RA debug server (port 8002) to ensure that there has been a successful issuance within a set period (e.g. the last five minutes).
The underlying expvar.Int code uses the atomic package to ensure safe updates/reads across multiple goroutines.
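A minimal sketch of the mechanism (the helper name is illustrative):
```
package main

import (
	"expvar"
	"time"
)

// lastIssuance is published on the debug server's /debug/vars endpoint;
// a monitor can poll it and alert when the timestamp goes stale.
var lastIssuance = expvar.NewInt("lastIssuance")

// noteIssuance records a successful issuance. expvar.Int uses atomic
// operations internally, so concurrent goroutines may call this safely.
func noteIssuance() {
	lastIssuance.Set(time.Now().Unix())
}

func main() {
	noteIssuance()
}
```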
This resolves #1945 and was selected in place of the more complex circular bucket design. While the timestamp approach doesn't provide the issuance volume as readily, it is less complex and meets the immediate need of a reliable external monitoring process hook.
https://github.com/letsencrypt/boulder/pull/1982
The `regID` parameter in the PA's `WillingToIssue` function was originally used for whitelisting purposes, but is not used any longer. This PR removes it.
This PR adds a check in the registration authority for non-ASCII-encoded characters in an email address. This is due to a 'funky email implementation'.
Fixes #1350
In https://github.com/letsencrypt/boulder/pull/774 we introduced an account key stored with the challenge. This was a stopgap fix to the now-defunct SimpleHTTP and DNS challenges in the face of https://mailarchive.ietf.org/arch/msg/acme/F71iz6qq1o_QPVhJCV4dqWf-4Yc. However, we no longer offer or implement those challenges, so the extra field is unnecessary. It also takes up a huge amount of space in the challenges table, which is our biggest table. SimpleHTTP and DNS challenges were removed in https://github.com/letsencrypt/boulder/pull/1247.
We can provide a follow-up migration to delete the column later, once we have a plan for large migrations without downtime.
Fixes #1909
As reported in #1925 the Certificates per Domain rate limit was being
incorrectly enforced on certificate renewals for FQDN sets that have
been previously issued. This is counter to the described rate
limit policies[0] that detail a separate rate limit for certificates
issued for the "exact same set of Fully Qualified Domain Names".
The bug was caused by the result of `domainsForRateLimiting` overwriting
the original `names []string` provided to the RA's
`checkCertificatesPerNameLimit` function. This meant that instead of
looking for an existing FQDN set for the full set of domain names being
requested, we checked for an FQDN set for just the eTLD+1's of the
domains (e.g. "www.example.com, foo.example.com, bar.example.com" vs
"example.com").
This commit preserves the original `names` values for doing an FQDN set
lookup and uses the `tldNames` from `domainsForRateLimiting` elsewhere.
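The shape of the fix, as a simplified sketch (the helper body is a stand-in):
```
package main

import "fmt"

// domainsForRateLimiting stands in for the real helper, which reduces a
// set of FQDNs to their registrable domains (eTLD+1's).
func domainsForRateLimiting(names []string) []string {
	return []string{"example.com"} // simplified
}

func main() {
	names := []string{"www.example.com", "foo.example.com", "bar.example.com"}

	// Before the fix, the result overwrote `names`, so the FQDN-set
	// renewal exemption was checked against the eTLD+1's. Keeping the
	// two slices separate restores the intended behavior.
	tldNames := domainsForRateLimiting(names)

	fmt.Println("per-domain limit checked against:", tldNames)
	fmt.Println("FQDN-set lookup checked against:", names)
}
```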
This fixes #1925.
A test is added to ensure that `checkCertificatesPerNameLimit` does the
correct thing both with and without an existing FQDN set.
[0] https://community.letsencrypt.org/t/rate-limits-for-lets-encrypt/6769
The RA UpdateRegistration function merges a base registration object with an update by calling Registration.MergeUpdate. Prior to this commit, MergeUpdate only allowed the updated registration object to overwrite the Contact field of the existing registration if the updated reg. defined at least one AcmeURL. This prevented clients from being able to outright remove the contact associated with an existing registration.
This commit removes the len() check on the input.Contact in MergeUpdate to allow the r.Contact field to be overwritten by a []*core.AcmeURL(nil) Contact field. Consequently, clients can now send an empty contacts list in the update registration POST in order to remove their reg contact.
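In before/after form (a sketch of the change, not the verbatim diff):
```
// Before: an empty update left the existing contacts in place.
if len(input.Contact) > 0 {
	r.Contact = input.Contact
}

// After: any provided Contact value, including an empty list, replaces
// the existing one, so clients can remove their contact entirely.
r.Contact = input.Contact
```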
Fixes #1846
* Allow removing registration contact.
* Adds a test for `MergeUpdate` contact removal.
* Change `Registration.Contact` type to `*[]*core.AcmeURL`.
* End validateContacts early for empty contacts
* Test removing reg. contact more thoroughly.
Presently, clients may request that a new AuthZ be created for a domain they have already proved authorization over. This results in unnecessary bloat in the authorizations table and duplicated effort.
This commit alters the `NewAuthorization` function of the RA such that before going through the work of creating a new AuthZ it checks whether there already exists a valid AuthZ for the domain/regID that expires more than 24 hours from the current date. If there is, then we short-circuit creation and return the existing AuthZ. When this case occurs the `RA.ReusedValidAuthz` counter is incremented to provide visibility.
Since clients requesting a new AuthZ and getting an AuthZ back expect to turn around and post updates to the corresponding challenges, we also return early in `UpdateAuthorization` when asked to update an AuthZ that is already valid. When this case occurs the `RA.ReusedValidAuthzChallenge` counter is incremented.
All of the above behaviour is gated by a new RA config flag `reuseValidAuthz`. In the default case (false) the RA does **not** reuse any AuthZ's and instead maintains the historic behaviour: always creating a new AuthZ when requested, regardless of whether there are already valid AuthZ's that could be reused. In the true case (enabled only in `boulder-config-next.json`) the AuthZ reuse described above is enabled.
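Sketched with illustrative identifiers (`valid`, `ra.reuseValidAuthz`, and the stat call are assumptions for the sketch):
```
// Reuse only when the flag is on and the existing AuthZ stays valid for
// more than another 24 hours.
if ra.reuseValidAuthz {
	if existing, ok := valid[identifier.Value]; ok &&
		existing.Expires.After(ra.clk.Now().Add(24*time.Hour)) {
		ra.stats.Inc("RA.ReusedValidAuthz", 1)
		return existing, nil
	}
}
```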
Resolves #1854
Resolves #1810 by automatically updating the RA ratelimit.RateLimitConfig whenever the backing config file is changed. Much like the Policy Authority uses a reloader instance to support updating the Hostname policy on the fly, this PR changes the Registration Authority to use a reloader for the rate limit policy file.
Access to the ra.rlPolicies member is protected with an RWMutex now that there is a potential for the values to be reloaded while a reader is active.
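A minimal sketch of the locking pattern (types and names are illustrative):
```
package ra

import "sync"

type rateLimitConfig struct{ /* parsed rate limit policy fields */ }

// rlPolicies holds the current policy. The reloader's callback swaps in
// freshly parsed values under the write lock, while rate limit checks
// read under the read lock.
type rlPolicies struct {
	mu  sync.RWMutex
	cfg *rateLimitConfig
}

func (p *rlPolicies) set(cfg *rateLimitConfig) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.cfg = cfg
}

func (p *rlPolicies) current() *rateLimitConfig {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.cfg
}
```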
A test is introduced to ensure that writing a new policy YAML to the policy config file results in new values being set in the RA's rlPolicies instance.
https://github.com/letsencrypt/boulder/pull/1894