Commit Graph

442 Commits

Author SHA1 Message Date
Jacob Hoffman-Andrews 6c93b41f20 Add a limit on failed authorizations (#2513)
Fixes #976.

This implements a new rate limit, InvalidAuthorizationsPerAccount. If a given account fails authorization for a given hostname too many times within the window, subsequent new-authz attempts for that account and hostname will fail early with a rateLimited error. This mitigates the misconfigured clients that constantly retry authorization even though they always fail (e.g., because the hostname no longer resolves).

For the new rate limit, I added a new SA RPC, CountInvalidAuthorizations. I chose to implement this only in gRPC, not in AMQP-RPC, so checking the rate limit is gated on gRPC. See #2406 for some description of the how and why. I also chose to directly use the gRPC interfaces rather than wrapping them in core.StorageAuthority, as a step towards what we will want to do once we've moved fully to gRPC.

Because authorizations don't have a created time, we need to look at the expires time instead. Invalid authorizations retain the expiration they were given when they were created as pending authorizations, so we use now + pendingAuthorizationLifetime as one side of the window for rate limiting, and look backwards from there. Note that this means you could maliciously bypass this rate limit by stacking up pending authorizations over time, then failing them all at once.

Similarly, since this limit is by (account, hostname) rather than just (hostname), you can bypass it by creating multiple accounts. It would be more natural and robust to limit by hostname, like our certificate limits. However, we currently only have two indexes on the authz table: the primary key, and

(`registrationID`,`identifier`,`status`,`expires`)

Since this limit is intended mainly to combat misconfigured clients, I think this is sufficient for now.

Corresponding PR for website: letsencrypt/website#125
2017-01-23 11:22:51 -08:00
Jacob Hoffman-Andrews 9dacdd5443 Fix SA wrappers for maps. (#2498)
We turn arrays into maps with a range command. Previously, we were taking the
address of the iteration variable in that range command, which meant incorrect
results since the iteration variable gets reassigned.

Also change the integration test to catch this error.

Fixes #2496
2017-01-17 14:07:07 -08:00
Jacob Hoffman-Andrews 58ccd7a71a Copy all statsd stats to Prometheus. (#2474)
We have a number of stats already expressed using the statsd interface. During
the switchover period to direct Prometheus collection, we'd like to make those
stats available both ways. This change automatically exports any stats exported
using the statsd interface via Prometheus as well.

This is a little tricky because Prometheus expects all stats to by registered
exactly once. Prometheus does offer a mechanism to gracefully recover from
registering a stat more than once by handling a certain error, but it is not
safe for concurrent access. So I added a concurrency-safe wrapper that creates
Prometheus stats on demand and memoizes them.

In the process, made a few small required side changes:
 - Clean "/" from method names in the gRPC interceptors. They are allowed in
   statsd but not in Prometheus.
 - Replace "127.0.0.1" with "boulder" as the name of our testing CT log.
   Prometheus stats can't start with a number.
 - Remove ":" from the CT-log stat names emitted by Publisher. Prometheus stats
   can't include it.
 - Remove a stray "RA" in front of some rate limit stats, since it was
   duplicative (we were emitting "RA.RA..." before).

Note that this means two stat groups in particular are duplicated:
 - Gostats* is duplicated with the default process-level stats exported by the
   Prometheus library.
 - gRPCClient* are duplicated by the stats generated by the go-grpc-prometheus
   package.

When writing dashboards and alerts in the Prometheus world, we should be careful
to avoid these two categories, as they will disappear eventually. As a general
rule, if a stat is available with an all-lowercase name, choose that one, as it
is probably the Prometheus-native version.

In the long run we will want to create most stats using the native Prometheus
stat interface, since it allows us to use add labels to metrics, which is very
useful. For instance, currently our DNS stats distinguish types of queries by
appending the type to the stat name. This would be more natural as a label in
Prometheus.
2017-01-10 10:30:15 -05:00
Daniel McCarney 9b89aa7d2c Logs failed SA.UpdatePendingAuthorization. (#2321)
This commit resolves #2303 by updating the comment, and returned error type produced when the RA calls `SA.UpdatePendingAuthorization` and it fails.

Previously this produced a `MalformedRequestError` that was described as only happening when the client corrupted the challenge data.

Now this is returned as more descriptive `ServerInternalError` and the underlying error from the SA is logged as a warning for further debugging.
2016-11-11 09:26:53 -08:00
Roland Bracewell Shoemaker c5f99453a9 Switch CT submission RPC from CA -> RA (#2304)
With the current gRPC design the CA talks directly to the Publisher when calling SubmitToCT which crosses security bounadries (secure internal segment -> internet facing segment) which is dangerous if (however unlikely) the Publisher is compromised and there is a gRPC exploit that allows memory corruption on the caller end of a RPC which could expose sensitive information or cause arbitrary issuance.

Instead we move the RPC call to the RA which is in a less sensitive network segment. Switching the call site from the CA -> RA is gated on adding the gRPC PublisherService object to the RA config.

Fixes #2202.
2016-11-08 11:39:02 -08:00
Daniel McCarney a6f2b0fafb Updates `go-jose` dep to v1.1.0 (#2314)
This commit updates the `go-jose` dependency to [v1.1.0](https://github.com/square/go-jose/releases/tag/v1.1.0) (Commit: aa2e30fdd1fe9dd3394119af66451ae790d50e0d). Since the import path changed from `github.com/square/...` to `gopkg.in/square/go-jose.v1/` this means removing the old dep and adding the new one.

The upstream go-jose library added a `[]*x509.Certificate` member to the `JsonWebKey` struct that prevents us from using a direct equality test against two `JsonWebKey` instances. Instead we now must compare the inner `Key` members.

The `TestRegistrationContactUpdate` function from `ra_test.go` was updated to populate the `Key` members used in testing instead of only using KeyID's to allow the updated comparisons to work as intended.

The `Key` field of the `Registration` object was switched from `jose.JsonWebKey` to `*jose.JsonWebKey ` to make it easier to represent a registration w/o a Key versus using a value with a nil `JsonWebKey.Key`.

I verified the upstream unit tests pass per contributing.md:
```
daniel@XXXXX:~/go/src/gopkg.in/square/go-jose.v1$ git show
commit aa2e30fdd1fe9dd3394119af66451ae790d50e0d
Merge: 139276c e18a743
Author: Cedric Staub <cs@squareup.com>
Date:   Thu Sep 22 17:08:11 2016 -0700

    Merge branch 'master' into v1
    
    * master:
      Better docs explaining embedded JWKs
      Reject invalid embedded public keys
      Improve multi-recipient/multi-sig handling

daniel@XXXXX:~/go/src/gopkg.in/square/go-jose.v1$ go test ./...
ok  	gopkg.in/square/go-jose.v1	17.599s
ok  	gopkg.in/square/go-jose.v1/cipher	0.007s
?   	gopkg.in/square/go-jose.v1/jose-util	[no test files]
ok  	gopkg.in/square/go-jose.v1/json	1.238s
```
2016-11-08 13:56:50 -05:00
Daniel McCarney eb67ad4f88 Allow `validateEmail` to timeout w/o error. (#2288)
This PR reworks the validateEmail() function from the RA to allow timeouts during DNS validation of MX/A/AAAA records for an email to be non-fatal and match our intention to verify emails best-effort.

Notes:

bdns/problem.go - DNSError.Timeout() was changed to also include context cancellation and timeout as DNS timeouts. This matches what DNSError.Error() was doing to set the error message and supports external callers to Timeout not duplicating the work.

bdns/mocks.go - the LookupMX mock was changed to support always.error and always.timeout in a manner similar to the LookupHost mock. Otherwise the TestValidateEmail unit test for the RA would fail when the MX lookup completed before the Host lookup because the error wouldn't be correct (empty DNS records vs a timeout or network error).

test/config/ra.json, test/config-next/ra.json - the dnsTries and dnsTimeout values were updated such that dnsTries * dnsTimeout was <= the WFE->RA RPC timeout (currently 15s in the test configs). This allows the dns lookups to all timeout without the overall RPC timing out.

Resolves #2260.
2016-10-27 11:56:12 -07:00
Daniel McCarney 39c7ce7b69 Properly abort `NewAuthorization` when SA RPC fails. (#2286)
The RA performs an RPC to the SA's `GetValidAuthorizations` function when attempting to find existing valid authorizations to reuse. Prior to this commit, ff the RPC fails (e.g. due to a timeout) the calling code
logs the failure as a warning but fails to return the error and cease processing. This results in a nil panic when we later try to index `auths`

This commit inserts the missing `return` to ensure we don't process further, thereby resolving #2274.

A test for this fix is provided with `TestReuseAuthorizationFaultySA`. Without f52f340 applied this test recreates the panic observed in #2274 and produces:

```
go test -p 1 -v -race --test.run TestReuseAuthorizationFaultySA github.com/letsencrypt/boulder/ra
=== RUN   TestReuseAuthorizationFaultySA
--- FAIL: TestReuseAuthorizationFaultySA (0.04s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x4be2b8]

```

With f52f340 it passes. Yay!
2016-10-27 08:58:26 -07:00
Roland Bracewell Shoemaker ce679bad41 Implement key rollover (#2231)
Fixes #503.

Functionality is gated by the feature flag `AllowKeyRollover`. Since this functionality is only specified in ACME draft-03 and we mostly implement the draft-02 style this takes some liberties in the implementation, which are described in the updated divergences doc. The `key-change` resource is used to side-step draft-03 `url` requirement.
2016-10-27 10:22:09 -04:00
Jacob Hoffman-Andrews 1958dc9065 Update total issued count asynchronously. (#2246)
Previously the lock on total issued count would exacerbate problems when the
count query was slow, which it often is.

Fixes #1809.
2016-10-20 14:17:34 -07:00
Jacob Hoffman-Andrews 404e9682b1 Improve error messages. (#2256)
Quote rejected hostnames.
Include term "global" when rejecting based on global rate limit.

Fixes #2252
2016-10-18 10:15:21 -07:00
Blake Griffith d2cf6ee126 Fixes RA `DeactivateRegistration` err message typo (#2222) 2016-10-03 15:20:43 -04:00
Roland Bracewell Shoemaker c6e3ef660c Re-apply 2138 with proper gating (#2199)
Re-applies #2138 using the new style of feature-flag gated migrations. Account deactivation is gated behind `features.AllowAccountDeactivation`.
2016-09-29 17:16:03 -04:00
Roland Bracewell Shoemaker 2c966c61b2 Revert "Allow account deactivation (#2138)" (#2188)
This reverts commit 6f3d078414, reversing
changes made to c8f1fb3e2f.
2016-09-19 11:20:41 -07:00
Jacob Hoffman-Andrews 6f3d078414 Allow account deactivation (#2138)
Fixes #2011.
2016-09-07 19:36:54 -04:00
Roland Bracewell Shoemaker c8f1fb3e2f Remove direct usages of go-statsd-client in favor of using metrics.Scope (#2136)
Fixes #2118, fixes #2082.
2016-09-07 19:35:13 -04:00
Blake Griffith 344a312905 Remove audit comments -- closes #2129 (#2139)
Closes #2129

* Remove audit comments.
* Nuke doc/requirements/*
2016-08-25 18:23:42 -07:00
Roland Shoemaker dbf9afa7d6 Review fixes pt. 1 2016-08-25 16:28:58 -07:00
Roland Shoemaker c7e5ed1262 RA/WFE tests 2016-08-24 12:36:41 -07:00
Roland Shoemaker aa7f85d3f5 Merge branch 'master' into reg-deact 2016-08-24 11:51:15 -07:00
Roland Bracewell Shoemaker 51ee04e6a9 Allow authorization deactivation (#2116)
Implements `valid` and `pending` authz deactivation.
2016-08-23 16:25:06 -04:00
Roland Bracewell Shoemaker 91bfd05127 Revert #2088 (#2137)
* Remove oldx509 usage

* Un-vendor old crypto/x509, crypto/x509/pkix, and encoding/asn1
2016-08-23 14:01:37 -04:00
Roland Shoemaker 003158c9e3 Initial impl 2016-08-18 14:12:09 -07:00
Ben Irving 8ed5b1e6a1 Replace *AcmeURL with string (#2117)
Removes core.AcmeURL from boulder and uses string instead.

Fixes #1996
2016-08-11 13:27:19 -07:00
Ben Irving 8b622c805a Move MergeUpdate out of core (#2114)
Fixes #2022
2016-08-08 17:12:52 -07:00
Roland Bracewell Shoemaker fc39781274 Allow user specified revocation reason (#2089)
Fixes #140.

This patch allows users to specify the following revocation reasons based on my interpretation of the meaning of the codes but could use confirmation from others.

* unspecified (0)
* keyCompromise (1)
* affiliationChanged (3)
* superseded (4)
* cessationOfOperation (5)
2016-08-08 14:26:52 -07:00
Jacob Hoffman-Andrews 474b76ad95 Import forked x509 for parsing of CSRs with empty integers (#2088)
Part of #2080.

This change vendors `crypto/x509`, `crypto/x509/pkix`, and `encoding/asn1` from  1d5f6a765d. That commit is a direct child of the Go 1.5.4 release tag, so it contains the same code as the current Go version we are using. In that commit I rewrote imports in those packages so they depend on each other internally rather than calling out to the standard library, which would cause type disagreements.

I changed the imports in each place where we're parsing CSRs, and imported under a different name `oldx509`, both to avoid collisions and make it clear what's going on. Places that only use `x509` to parse certificates are not changed, and will use the current standard library.

This will unblock us from moving to Go 1.6, and subsequently Go 1.7.
2016-07-28 10:38:33 -04:00
Patrick Figel 8cd74bf766 Make (pending)AuthorizationLifetime configurable (#2028)
Introduces the `authorizationLifetimeDays` and `pendingAuthorizationLifetimeDays` configuration options for `RA`.

If the values are missing from configuration, the code defaults back to the current values (300/7 days).

fixes #2024
2016-07-12 15:18:22 -04:00
Daniel McCarney 7e946eaacc Registration update optimizations (#2001)
This PR adds two optimizations to fix the optimistic lock errors observed in #1986.

First, the WFE now returns early for registration POST's (before invoking the RA and SA) when the POST body is the trivial update (`{"resource":"reg"}`). This prevents any DB operations from being performed when there is no work to be done.

Second, the RA now tracks whether a update actually changes the base registration's `Contact` slice, or `Agreement` string. If the proposed update doesn't change either of these fields then the RA will return early before handing the update to the SA. 

Both changes save database operations from being performed needlessly and will help avoid the optimistic lock errors we observed when a problematic client was POSTing the trivial update repeatedly in a short period.

The fix was verified as follows: I checked out master and artificially introduced lock contention into the SA by adding a 2s sleep into `UpdateRegistration` between fetching the `existingRegModel` to get the `LockCol` value and calling `ssa.dbMap.Update`. With the sleep in place & two certbot clients posting matching registration updates the lock contention error is produced as expected. After checking out the `empty-reg-updates` branch, re-adding the sleep to the SA, and performing the same two client reg updates no error is produced.
2016-07-07 13:40:55 -04:00
Simone Carletti 7172e49650 Replace x/net/publicsuffix with weppos/publicsuffix-go (#1969)
This PR replaces the `x/net/publicsuffix` package with `weppos/publicsuffix-go`.

The conversations that leaded to this decision are #1479 and #1374. To summarize the discussion, the main issue with `x/net/publicsuffix` is that the package compiles the list into the Go source code and doesn't provide a way to easily pull updates (e.g. by re-parsing the original PSL) unless the entire package is recompiled.

The PSL update frequency is almost daily, which makes very hard to recompile the official Golang package to stay up-to-date with all the changes. Moreover, Golang maintainers expressed some concerns about rebuilding and committing changes with a frequency that would keep the package in sync with the original PSL. See https://github.com/letsencrypt/boulder/issues/1374#issuecomment-182429297

`weppos/publicsuffix-go` contains a compiled version of the list that is updated weekly (or more frequently). Moreover, the package can read and parse a PSL from a String or a File which will effectively decouple the Boulder source code with the list itself. The main benefit is that it will be possible to update the definition by simply downloading the latest list and restarting the application (assuming the list is persisted in memory).
2016-06-30 15:03:14 -07:00
Daniel McCarney 3a6a254c5c Expose last issuance timestamp via expvar (#1982)
This PR adds a expvar.Int published under the key "lastIssuance" that contains the timestamp of the last successful certificate issuance. This allows easy creation of a script that monitors the RA debug server (port 8002) to ensure that there has been a successful issuance within a set period (e.g. last five minutes).

The underlying expvar.Int code uses the atomic package to ensure safe updates/reads across multiple goroutines.

This resolves #1945 and was selected in place of the more complex circular bucket design. While the timestamp approach doesn't provide the issuance volume as readily it is less complex and meets the immediate need of a reliable external monitoring process hook.

https://github.com/letsencrypt/boulder/pull/1982
2016-06-27 10:38:35 -07:00
Ben Irving d3db851403 remove regID from WillingToIssue (#1957)
The `regID` parameter in the PA's `WillingToIssue` function was originally used for whitelisting purposes, but is not used any longer. This PR removes it.
2016-06-22 12:21:07 -04:00
Ben Irving 77e64fef79 Disallow non-ASCII email addresses (#1953)
This PR, adds a check in registration authority for non-ASCII encoded characters in an email address. This is due to a 'funky email implementation'.

Fixes #1350
2016-06-21 17:53:38 -07:00
Jacob Hoffman-Andrews 4e0f96d924 Remove last vestiges of challenge.AccountKey. (#1949)
This is a followup from https://github.com/letsencrypt/boulder/pull/1942. That PR stopped setting challenge.AccountKey. This one removes it entirely.

Fixes #1948
2016-06-21 16:25:58 -07:00
Jacob Hoffman-Andrews 0535ac78d7 Stop setting AccountKey in challenges (#1942)
In https://github.com/letsencrypt/boulder/pull/774 we introduced and account key stored with the challenge. This was a stopgap fix to the now-defunct SimpleHTTP and DNS challenges in the face of https://mailarchive.ietf.org/arch/msg/acme/F71iz6qq1o_QPVhJCV4dqWf-4Yc. However, we no longer offer or implement those challenges, so the extra field is unnecessary. It also take up a huge amount of space in the challenges table, which is our biggest table. SimpleHTTP and DNS challenges were removed in https://github.com/letsencrypt/boulder/pull/1247.

We can provide a follow-up migration to delete the column later, once we have a plan for large migrations without downtime.

Fixes #1909
2016-06-20 14:26:53 -07:00
Daniel McCarney 778c9bba86 Fix FQDNSet rate limit exemption. (#1935)
As reported in #1925 the Certificates per Domain rate limit was being
incorrectly enforced on certificate renewals for FQDN sets that have
been previously issued. This is counter to the described rate
limit policies[0] that detail a separate rate limit for certificates
issued for the "exact same set of Fully Qualified Domain Names".

The bug was caused by the result of `domainsForRateLimiting` overwriting
the original `names []string` provided to the RA's
`checkCertificatesPerNameLimit` function. This meant instead of looking
for an existing FQDN set for the full set of domain names being
requested we checked for an FQDN set for just the eTLD+1's of the
domains. (e.g "www.example.com, foo.example.com, bar.example.com" vs
"example.com").

This commit preserves the original `names` values for doing an FQDN set
lookup and uses the `tldNames` from `domainsForRateLimiting` elsewhere.
This fixes #1925.

A test is added to ensure that `checkCertificatesPerNameLimit` does the
correct thing both with and without an existing FQDN set.

[0] https://community.letsencrypt.org/t/rate-limits-for-lets-encrypt/6769
2016-06-16 13:50:39 -07:00
Daniel McCarney cd2d1c4f6b Allow removing registration contact. (#1923)
The RA UpdateRegistration function merges a base registration object with an update by calling Registration.MergeUpdate. Prior to this commit MergeUpdate only allowed the updated registration object to overwrite the Contact field of the existing registration if the updated reg. defined at least one AcmeURL. This prevented clients from being able to outright remove the contact associated with an existing registration.

This commit removes the len() check on the input.Contact in MergeUpdate to allow the r.Contact field to be overwritten by a []*core.AcmeURL(nil) Contact field. Subsequently clients can now send an empty contacts list in the update registration POST in order to remove their reg contact.

Fixes #1846

* Allow removing registration contact.
* Adds a test for `MergeUpdate` contact removal.
* Change `Registration.Contact` type to `*[]*core.AcmeURL`.
* End validateContacts early for empty contacts
* Test removing reg. contact more thoroughly.
2016-06-13 11:02:29 -07:00
Daniel McCarney 9abc212448 Reuse valid authz for subsequent new authz requests (#1921)
Presently clients may request a new AuthZ be created for a domain that they have already proved authorization over. This results in unnecessary bloat in the authorizations table and duplicated effort.

This commit alters the `NewAuthorization` function of the RA such that before going through the work of creating a new AuthZ it checks whether there already exists a valid AuthZ for the domain/regID that expires in more than 24 hours from the current date. If there is, then we short circuit creation and return the existing AuthZ. When this case occurs the `RA.ReusedValidAuthz` counter is incremented to provide visibility.

Since clients requesting a new AuthZ and getting an AuthZ back expect to turn around and post updates to the corresponding challenges we also return early in `UpdateAuthorization` when asked to update an AuthZ that is already valid. When this case occurs the `RA.ReusedValidAuthzChallenge` counter is incremented.

All of the above behaviour is gated by a new RA config flag `reuseValidAuthz`. In the default case (false) the RA does **not** reuse any AuthZ's and instead maintains the historic behaviour; always creating a new AuthZ when requested, irregardless of whether there are already valid AuthZ's that could be reused. In the true case (enabled only in `boulder-config-next.json`) the AuthZ reuse described above is enabled.

Resolves #1854
2016-06-10 16:44:16 -04:00
Ben Irving 438580f206 Remove last of UseNewVARPC (#1914)
`UseNewVARPC` is no longer necessary and is safe to be removed. We default to using the newer VA RPC code.
2016-06-09 10:12:46 -04:00
Daniel McCarney 4c289f2a8f Reload ratelimit policy automatically at runtime (#1894)
Resolves #1810 by automatically updating the RA ratelimit.RateLimitConfig whenever the backing config file is changed. Much like the Policy Authority uses a reloader instance to support updating the Hostname policy on the fly, this PR changes the Registration Authority to use a reloader for the rate limit policy file.

Access to the ra.rlPolicies member is protected with a RWMutex now that there is a potential for the values to be reloaded while a reader is active.

A test is introduced to ensure that writing a new policy YAML to the policy config file results in new values being set in the RA's rlPolicies instance.

https://github.com/letsencrypt/boulder/pull/1894
2016-06-08 12:11:46 -07:00
Ben Irving 1336c42813 Replace all log.Err calls with log.AuditErr (#1891)
* remove calls to log.Err()
* go fmt
* remove more occurrences
* change AuditErr argument to string and replace occurrences
2016-06-06 16:27:16 -04:00
Jacob Hoffman-Andrews 92df4d0fc2 Rename authorities to shorter names. (#1878)
Fixes #1875.
2016-06-03 13:35:28 -07:00