Commit Graph

384 Commits

Author SHA1 Message Date
Daniel McCarney 840badb498 Reverts pending auth/authz table merge. (#2297)
This PR reverts 27d531101f and undoes the merge of the `pendingAuthorizations` and `authz` table.  This change had unintended performance impacts on the `CountPendingAuthorizations` query that exacerbated load issues and need to be addressed.
2016-10-31 10:31:19 -07:00
Daniel McCarney eb67ad4f88 Allow `validateEmail` to timeout w/o error. (#2288)
This PR reworks the validateEmail() function from the RA to allow timeouts during DNS validation of MX/A/AAAA records for an email to be non-fatal and match our intention to verify emails best-effort.

Notes:

bdns/problem.go - DNSError.Timeout() was changed to also include context cancellation and timeout as DNS timeouts. This matches what DNSError.Error() was doing to set the error message and supports external callers to Timeout not duplicating the work.

bdns/mocks.go - the LookupMX mock was changed to support always.error and always.timeout in a manner similar to the LookupHost mock. Otherwise the TestValidateEmail unit test for the RA would fail when the MX lookup completed before the Host lookup because the error wouldn't be correct (empty DNS records vs a timeout or network error).

test/config/ra.json, test/config-next/ra.json - the dnsTries and dnsTimeout values were updated such that dnsTries * dnsTimeout was <= the WFE->RA RPC timeout (currently 15s in the test configs). This allows the dns lookups to all timeout without the overall RPC timing out.

Resolves #2260.
2016-10-27 11:56:12 -07:00
Daniel McCarney 39c7ce7b69 Properly abort `NewAuthorization` when SA RPC fails. (#2286)
The RA performs an RPC to the SA's `GetValidAuthorizations` function when attempting to find existing valid authorizations to reuse. Prior to this commit, ff the RPC fails (e.g. due to a timeout) the calling code
logs the failure as a warning but fails to return the error and cease processing. This results in a nil panic when we later try to index `auths`

This commit inserts the missing `return` to ensure we don't process further, thereby resolving #2274.

A test for this fix is provided with `TestReuseAuthorizationFaultySA`. Without f52f340 applied this test recreates the panic observed in #2274 and produces:

```
go test -p 1 -v -race --test.run TestReuseAuthorizationFaultySA github.com/letsencrypt/boulder/ra
=== RUN   TestReuseAuthorizationFaultySA
--- FAIL: TestReuseAuthorizationFaultySA (0.04s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x4be2b8]

```

With f52f340 it passes. Yay!
2016-10-27 08:58:26 -07:00
Roland Bracewell Shoemaker ce679bad41 Implement key rollover (#2231)
Fixes #503.

Functionality is gated by the feature flag `AllowKeyRollover`. Since this functionality is only specified in ACME draft-03 and we mostly implement the draft-02 style this takes some liberties in the implementation, which are described in the updated divergences doc. The `key-change` resource is used to side-step draft-03 `url` requirement.
2016-10-27 10:22:09 -04:00
Jacob Hoffman-Andrews 1958dc9065 Update total issued count asynchronously. (#2246)
Previously the lock on total issued count would exacerbate problems when the
count query was slow, which it often is.

Fixes #1809.
2016-10-20 14:17:34 -07:00
Daniel McCarney 27d531101f Store new authorizations in the `authz` table (#2219)
To remove challenges with expired/pending authz's when they are deleted we want to introduce a foreign key relationship to the challenges table's authorizationID field with instruction to cascade on delete (#2155). As pointed out in a comment this is made difficult by the current usage of a separate pendingAuthorizations table for pending authorizations.

To be able to remove the pendingAuthorizations table entirely (#2163) we need to first stop using it. This PR introduces the code changes required to achieve this.

Notes:

The SA's NewPendingAuthorization function was updated to store all new pending auths in the authz table and to ensure the status is StatusPending.
The SA's GetAuthorization, UpdatePendingAuthorization, FinalizeAuthorization, and RevokeAuthorizationsByDomain functions were updated to properly handle the fact that a pending authz could be in either the pendingAuthorizations table, or the authz table, and to do the right thing accordingly.
Several places in the RA unit tests created a pending authorization with a status "Valid", then finalized it later. This broke when NewPendingAuthorization was changed to enforce Pending status before creating the authz row since the FinalizeAuthorization code expected to only finalize Valid rows. To fix this some of the RA unit tests were changed to explicitly set status to Valid before calling FinalizeAuthorization. This matches the true intention of the tests to quickly create a pending & then finalized authorization.
The expired-authz-purger utility was updated to purge from both the pendingAuthorizations and authz table as required.
The return values of RevokeAuthorizationsByDomain have changed slightly. Previously it returned a 2 element array where the first element was the number of pending authorizations revoked and the second element was the number of finalized authorizations revoked. This is changed so that now it is the number of rows from the pendingAuthorizations and authz tables respectively. E.g. the second count for the authz table may now include non-finalized authzs in its count of affected rows. The admin-revoker is the only place that used this SA method and it was updated appropriately to describe the "rows" change.
The "purger" database user needs to have a new GRANT SELECT, DELETE for the authz table in addition to its existing GRANT for the pendingAuthorizations table.
This resolves #2162
2016-10-18 09:39:59 -07:00
Roland Bracewell Shoemaker c6e3ef660c Re-apply 2138 with proper gating (#2199)
Re-applies #2138 using the new style of feature-flag gated migrations. Account deactivation is gated behind `features.AllowAccountDeactivation`.
2016-09-29 17:16:03 -04:00
Roland Bracewell Shoemaker 239bf9ae0a Very basic feature flag impl (#1705)
Updates #1699.

Adds a new package, `features`, which exposes methods to set and check if various internal features are enabled. The implementation uses global state to store the features so that services embedded in another service do not each require their own features map in order to check if something is enabled.

Requires a `boulder-tools` image update to include `golang.org/x/tools/cmd/stringer`.
2016-09-20 16:29:01 -07:00
Roland Bracewell Shoemaker 2c966c61b2 Revert "Allow account deactivation (#2138)" (#2188)
This reverts commit 6f3d078414, reversing
changes made to c8f1fb3e2f.
2016-09-19 11:20:41 -07:00
Jacob Hoffman-Andrews 6f3d078414 Allow account deactivation (#2138)
Fixes #2011.
2016-09-07 19:36:54 -04:00
Roland Bracewell Shoemaker c8f1fb3e2f Remove direct usages of go-statsd-client in favor of using metrics.Scope (#2136)
Fixes #2118, fixes #2082.
2016-09-07 19:35:13 -04:00
Roland Shoemaker dbf9afa7d6 Review fixes pt. 1 2016-08-25 16:28:58 -07:00
Roland Shoemaker c7e5ed1262 RA/WFE tests 2016-08-24 12:36:41 -07:00
Roland Bracewell Shoemaker 51ee04e6a9 Allow authorization deactivation (#2116)
Implements `valid` and `pending` authz deactivation.
2016-08-23 16:25:06 -04:00
Roland Bracewell Shoemaker 91bfd05127 Revert #2088 (#2137)
* Remove oldx509 usage

* Un-vendor old crypto/x509, crypto/x509/pkix, and encoding/asn1
2016-08-23 14:01:37 -04:00
Ben Irving 8ed5b1e6a1 Replace *AcmeURL with string (#2117)
Removes core.AcmeURL from boulder and uses string instead.

Fixes #1996
2016-08-11 13:27:19 -07:00
Ben Irving 8b622c805a Move MergeUpdate out of core (#2114)
Fixes #2022
2016-08-08 17:12:52 -07:00
Jacob Hoffman-Andrews 474b76ad95 Import forked x509 for parsing of CSRs with empty integers (#2088)
Part of #2080.

This change vendors `crypto/x509`, `crypto/x509/pkix`, and `encoding/asn1` from  1d5f6a765d. That commit is a direct child of the Go 1.5.4 release tag, so it contains the same code as the current Go version we are using. In that commit I rewrote imports in those packages so they depend on each other internally rather than calling out to the standard library, which would cause type disagreements.

I changed the imports in each place where we're parsing CSRs, and imported under a different name `oldx509`, both to avoid collisions and make it clear what's going on. Places that only use `x509` to parse certificates are not changed, and will use the current standard library.

This will unblock us from moving to Go 1.6, and subsequently Go 1.7.
2016-07-28 10:38:33 -04:00
Patrick Figel 8cd74bf766 Make (pending)AuthorizationLifetime configurable (#2028)
Introduces the `authorizationLifetimeDays` and `pendingAuthorizationLifetimeDays` configuration options for `RA`.

If the values are missing from configuration, the code defaults back to the current values (300/7 days).

fixes #2024
2016-07-12 15:18:22 -04:00
Daniel McCarney 7e946eaacc Registration update optimizations (#2001)
This PR adds two optimizations to fix the optimistic lock errors observed in #1986.

First, the WFE now returns early for registration POST's (before invoking the RA and SA) when the POST body is the trivial update (`{"resource":"reg"}`). This prevents any DB operations from being performed when there is no work to be done.

Second, the RA now tracks whether a update actually changes the base registration's `Contact` slice, or `Agreement` string. If the proposed update doesn't change either of these fields then the RA will return early before handing the update to the SA. 

Both changes save database operations from being performed needlessly and will help avoid the optimistic lock errors we observed when a problematic client was POSTing the trivial update repeatedly in a short period.

The fix was verified as follows: I checked out master and artificially introduced lock contention into the SA by adding a 2s sleep into `UpdateRegistration` between fetching the `existingRegModel` to get the `LockCol` value and calling `ssa.dbMap.Update`. With the sleep in place & two certbot clients posting matching registration updates the lock contention error is produced as expected. After checking out the `empty-reg-updates` branch, re-adding the sleep to the SA, and performing the same two client reg updates no error is produced.
2016-07-07 13:40:55 -04:00
Roland Bracewell Shoemaker 97bc63c484 Fix RA test race (#1995)
The RA test uses a `DummyValidationAuthority` which is called from `ra.UpdateAuithorization` in a goroutine. Both the tester and the spawned  goroutine want to mutate/read fields on the `DummyValidationAuthority` which causes a race. This PR fixes the race by blocking on a `chan struct{}` until the `DummyValidationAuthority` has mutated the required state.

Fixes #1980.

https://github.com/letsencrypt/boulder/pull/1995
2016-06-30 15:14:38 -07:00
Simone Carletti 7172e49650 Replace x/net/publicsuffix with weppos/publicsuffix-go (#1969)
This PR replaces the `x/net/publicsuffix` package with `weppos/publicsuffix-go`.

The conversations that leaded to this decision are #1479 and #1374. To summarize the discussion, the main issue with `x/net/publicsuffix` is that the package compiles the list into the Go source code and doesn't provide a way to easily pull updates (e.g. by re-parsing the original PSL) unless the entire package is recompiled.

The PSL update frequency is almost daily, which makes very hard to recompile the official Golang package to stay up-to-date with all the changes. Moreover, Golang maintainers expressed some concerns about rebuilding and committing changes with a frequency that would keep the package in sync with the original PSL. See https://github.com/letsencrypt/boulder/issues/1374#issuecomment-182429297

`weppos/publicsuffix-go` contains a compiled version of the list that is updated weekly (or more frequently). Moreover, the package can read and parse a PSL from a String or a File which will effectively decouple the Boulder source code with the list itself. The main benefit is that it will be possible to update the definition by simply downloading the latest list and restarting the application (assuming the list is persisted in memory).
2016-06-30 15:03:14 -07:00
Maxime Boisvert 821d572366 Adds RejectedIdentifierError on Blacklisted error. (#1944)
Boulder uses MalformedRequestError as a universal error. This pull request adds the RejectedIdentifierError and use it on a blacklist error.

For a client implementation, it is easier and cleaner to use an exception than parse the error message.

1336c42813/policy/pa.go (L131)

Fixes #1938

PR in acme : ietf-wg-acme/acme#142
2016-06-29 13:53:38 -07:00
Daniel McCarney 3a6a254c5c Expose last issuance timestamp via expvar (#1982)
This PR adds a expvar.Int published under the key "lastIssuance" that contains the timestamp of the last successful certificate issuance. This allows easy creation of a script that monitors the RA debug server (port 8002) to ensure that there has been a successful issuance within a set period (e.g. last five minutes).

The underlying expvar.Int code uses the atomic package to ensure safe updates/reads across multiple goroutines.

This resolves #1945 and was selected in place of the more complex circular bucket design. While the timestamp approach doesn't provide the issuance volume as readily it is less complex and meets the immediate need of a reliable external monitoring process hook.

https://github.com/letsencrypt/boulder/pull/1982
2016-06-27 10:38:35 -07:00
Simone Carletti 7f7183fab2 Increase timeout to fix test failures on Mac OSX (#1966)
It looks like the filesystem on Mac OSX El Captain
(and potentially previous versions) require at least
one second before you get a different timestamp.

Any value lower than one second caused:

- a deadlock error in a test in the reloader module
- a rate limit error in the ra module

See GH-1955.
2016-06-23 14:13:05 -07:00
Ben Irving 77e64fef79 Disallow non-ASCII email addresses (#1953)
This PR, adds a check in registration authority for non-ASCII encoded characters in an email address. This is due to a 'funky email implementation'.

Fixes #1350
2016-06-21 17:53:38 -07:00
Jacob Hoffman-Andrews 4e0f96d924 Remove last vestiges of challenge.AccountKey. (#1949)
This is a followup from https://github.com/letsencrypt/boulder/pull/1942. That PR stopped setting challenge.AccountKey. This one removes it entirely.

Fixes #1948
2016-06-21 16:25:58 -07:00
Jacob Hoffman-Andrews 0535ac78d7 Stop setting AccountKey in challenges (#1942)
In https://github.com/letsencrypt/boulder/pull/774 we introduced and account key stored with the challenge. This was a stopgap fix to the now-defunct SimpleHTTP and DNS challenges in the face of https://mailarchive.ietf.org/arch/msg/acme/F71iz6qq1o_QPVhJCV4dqWf-4Yc. However, we no longer offer or implement those challenges, so the extra field is unnecessary. It also take up a huge amount of space in the challenges table, which is our biggest table. SimpleHTTP and DNS challenges were removed in https://github.com/letsencrypt/boulder/pull/1247.

We can provide a follow-up migration to delete the column later, once we have a plan for large migrations without downtime.

Fixes #1909
2016-06-20 14:26:53 -07:00
Daniel McCarney 778c9bba86 Fix FQDNSet rate limit exemption. (#1935)
As reported in #1925 the Certificates per Domain rate limit was being
incorrectly enforced on certificate renewals for FQDN sets that have
been previously issued. This is counter to the described rate
limit policies[0] that detail a separate rate limit for certificates
issued for the "exact same set of Fully Qualified Domain Names".

The bug was caused by the result of `domainsForRateLimiting` overwriting
the original `names []string` provided to the RA's
`checkCertificatesPerNameLimit` function. This meant instead of looking
for an existing FQDN set for the full set of domain names being
requested we checked for an FQDN set for just the eTLD+1's of the
domains. (e.g "www.example.com, foo.example.com, bar.example.com" vs
"example.com").

This commit preserves the original `names` values for doing an FQDN set
lookup and uses the `tldNames` from `domainsForRateLimiting` elsewhere.
This fixes #1925.

A test is added to ensure that `checkCertificatesPerNameLimit` does the
correct thing both with and without an existing FQDN set.

[0] https://community.letsencrypt.org/t/rate-limits-for-lets-encrypt/6769
2016-06-16 13:50:39 -07:00
Daniel McCarney cd2d1c4f6b Allow removing registration contact. (#1923)
The RA UpdateRegistration function merges a base registration object with an update by calling Registration.MergeUpdate. Prior to this commit MergeUpdate only allowed the updated registration object to overwrite the Contact field of the existing registration if the updated reg. defined at least one AcmeURL. This prevented clients from being able to outright remove the contact associated with an existing registration.

This commit removes the len() check on the input.Contact in MergeUpdate to allow the r.Contact field to be overwritten by a []*core.AcmeURL(nil) Contact field. Subsequently clients can now send an empty contacts list in the update registration POST in order to remove their reg contact.

Fixes #1846

* Allow removing registration contact.
* Adds a test for `MergeUpdate` contact removal.
* Change `Registration.Contact` type to `*[]*core.AcmeURL`.
* End validateContacts early for empty contacts
* Test removing reg. contact more thoroughly.
2016-06-13 11:02:29 -07:00
Daniel McCarney 9abc212448 Reuse valid authz for subsequent new authz requests (#1921)
Presently clients may request a new AuthZ be created for a domain that they have already proved authorization over. This results in unnecessary bloat in the authorizations table and duplicated effort.

This commit alters the `NewAuthorization` function of the RA such that before going through the work of creating a new AuthZ it checks whether there already exists a valid AuthZ for the domain/regID that expires in more than 24 hours from the current date. If there is, then we short circuit creation and return the existing AuthZ. When this case occurs the `RA.ReusedValidAuthz` counter is incremented to provide visibility.

Since clients requesting a new AuthZ and getting an AuthZ back expect to turn around and post updates to the corresponding challenges we also return early in `UpdateAuthorization` when asked to update an AuthZ that is already valid. When this case occurs the `RA.ReusedValidAuthzChallenge` counter is incremented.

All of the above behaviour is gated by a new RA config flag `reuseValidAuthz`. In the default case (false) the RA does **not** reuse any AuthZ's and instead maintains the historic behaviour; always creating a new AuthZ when requested, irregardless of whether there are already valid AuthZ's that could be reused. In the true case (enabled only in `boulder-config-next.json`) the AuthZ reuse described above is enabled.

Resolves #1854
2016-06-10 16:44:16 -04:00
Ben Irving 438580f206 Remove last of UseNewVARPC (#1914)
`UseNewVARPC` is no longer necessary and is safe to be removed. We default to using the newer VA RPC code.
2016-06-09 10:12:46 -04:00
Daniel McCarney 4c289f2a8f Reload ratelimit policy automatically at runtime (#1894)
Resolves #1810 by automatically updating the RA ratelimit.RateLimitConfig whenever the backing config file is changed. Much like the Policy Authority uses a reloader instance to support updating the Hostname policy on the fly, this PR changes the Registration Authority to use a reloader for the rate limit policy file.

Access to the ra.rlPolicies member is protected with a RWMutex now that there is a potential for the values to be reloaded while a reader is active.

A test is introduced to ensure that writing a new policy YAML to the policy config file results in new values being set in the RA's rlPolicies instance.

https://github.com/letsencrypt/boulder/pull/1894
2016-06-08 12:11:46 -07:00
Jacob Hoffman-Andrews 92df4d0fc2 Rename authorities to shorter names. (#1878)
Fixes #1875.
2016-06-03 13:35:28 -07:00