Previously, we used prometheus.DefaultRegisterer to register our stats, which uses global state to export its HTTP stats. We also used net/http/pprof's behavior of registering to the default global HTTP ServeMux, via DebugServer, which starts an HTTP server that uses that global ServeMux.
In this change, I merge DebugServer's functions into StatsAndLogging. StatsAndLogging now takes an address parameter and fires off an HTTP server in a goroutine. That HTTP server is newly defined, and doesn't use DefaultServeMux. On it is registered the Prometheus stats handler, and handlers for the various pprof traces. In the process I split StatsAndLogging internally into two functions: makeStats and MakeLogger. I didn't port across the expvar variable exporting, which serves a similar function to Prometheus stats but which we never use.
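As a rough illustration of the private-mux approach described above, here is a minimal stdlib-only sketch. The names `newDebugMux`, `placeholderMetrics`, and `statusFor` are hypothetical and not taken from the change itself, and the `/metrics` path is an assumption; the real stats handler would come from the Prometheus client library.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux returns a ServeMux private to the debug server, so the server
// does not depend on http.DefaultServeMux. Importing net/http/pprof still
// registers handlers on the default mux as a side effect, but a server built
// on this mux never consults it.
func newDebugMux(metrics http.Handler) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/metrics", metrics) // assumed path for the stats handler
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// placeholderMetrics is a hypothetical stand-in for the real stats handler.
func placeholderMetrics() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "# stats would be written here")
	})
}

// statusFor sends an in-process GET for path to h and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newDebugMux(placeholderMetrics())
	fmt.Println(statusFor(mux, "/metrics"), statusFor(mux, "/debug/pprof/cmdline"), statusFor(mux, "/nope"))
	// A long-running command would instead start the server in a goroutine:
	//   go http.ListenAndServe(addr, mux)
}
```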
One nice immediate effect of this change: Since StatsAndLogging now requires an address, I noticed a bunch of commands that called StatsAndLogging and passed around the resulting Scope, but never made use of it because they didn't run a DebugServer. Under the old StatsD world, these commands could still have exported their stats by pushing, but since we moved to Prometheus their stats stopped being collected. We haven't used any of these stats, so instead of adding debug ports to all short-lived commands, or setting up a push gateway, I simply removed them and switched those commands to initialize only a Logger, no stats.
We moved most of the component-specific configs out of cmd a long time ago. We
left CAConfig in, because it is used directly in the NewCertificateAuthority
constructor. That means that moving CAConfig into cmd/boulder-ca would have
resulted in a circular dependency.
Eventually we probably want to decompose CAConfig so it's a set of arguments to
NewCertificateAuthority, but as a short term improvement, move the config into
its own package to break the dependency. This has the advantage of removing a
couple of big dependencies from cmd.
We used to use TCP because we would request DNSSEC records from Unbound, and
they would always cause truncated records when present. Now that we no longer
request those (#2718), we can use UDP. This is better because the TCP serving
paths in Unbound are likely less thoroughly tested, and not optimized for high
load. In particular this may resolve some availability problems we've seen
recently when trying to upgrade to a more recent Unbound.
Note that this only affects the Boulder->Unbound path. The Unbound->upstream
path is already UDP by default (with TCP fallback for truncated ANSWERs).
A frequent point of confusion is which ACME draft Boulder implements. Often people imagine (sensibly!) that there is one draft they can reference to understand Boulder.
This commit updates the divergences doc to clarify that it should be used to compare Boulder to whatever the most current ACME draft is and that Boulder doesn't implement a specific draft. This commit also adds a reference to what ACME v1 is and a link to the ACME v2 blog post.
Small references are also added to the "applications" concept from previous drafts. Otherwise folks who land on older ACME drafts may wonder why the divergences doc doesn't mention "applications", a concept that was renamed to "orders" in subsequent drafts. We do document divergences for "orders", and attention should be directed there.
The contrib guide to migrations creates a migration with goose create AddWizards sql but references it later as
sa/_db/20160915101011_WizardMigrations.sql. This commit fixes the
reference to use the correct AddWizards.sql filename suffix.
safebrowsing defaults to a one minute timeout for its requests. Set a much lower
one so that we don't time out new-authzs when communicating with safebrowsing is
slow.
Since we can make up to 100 SQL queries from this method (based on the 100-SAN
limit), sometimes it is too slow and we get a timeout for large certificates. By
running some of those queries in parallel, we can speed things up and stop
getting timeouts.
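The general shape of running per-name queries in parallel might look like the sketch below. It is not the actual method: `lookupAll`, `lookupFunc`, and `nameLength` are hypothetical, and the bounded-parallelism semaphore is one possible design rather than the one the change necessarily uses.

```go
package main

import (
	"fmt"
	"sync"
)

// lookupFunc stands in for a single SQL query keyed by name (hypothetical).
type lookupFunc func(name string) (int, error)

// lookupAll runs lookup for every name, with at most parallelism queries in
// flight at once. It returns the first error observed, if any.
func lookupAll(names []string, parallelism int, lookup lookupFunc) (map[string]int, error) {
	if parallelism < 1 {
		parallelism = 1
	}
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		firstErr error
	)
	results := make(map[string]int, len(names))
	sem := make(chan struct{}, parallelism) // counting semaphore

	for _, name := range names {
		wg.Add(1)
		sem <- struct{}{} // blocks while parallelism queries are running
		go func(name string) {
			defer wg.Done()
			defer func() { <-sem }()
			n, err := lookup(name)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			results[name] = n
		}(name)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	return results, nil
}

// nameLength is a toy query used only for demonstration.
func nameLength(name string) (int, error) {
	return len(name), nil
}

func main() {
	res, err := lookupAll([]string{"a.com", "bb.com", "ccc.com"}, 2, nameLength)
	fmt.Println(res, err)
}
```

Bounding the parallelism keeps a 100-name certificate from opening 100 simultaneous database connections.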
This commit adds CAA `issue` parameter parsing and a `challenge` parameter that restricts issuance to a single challenge type. By setting `challenge=dns-01`, whoever controls the nameserver keeps control over every issued certificate.
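A minimal sketch of what such parsing could look like, assuming the RFC 6844 layout of an issuer domain, then `;`, then space-separated `name=value` parameters. The function names `parseIssueValue` and `challengeAllowed` are hypothetical, and treating a missing `challenge` parameter as "any challenge allowed" is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// parseIssueValue splits a CAA issue value such as
// "letsencrypt.org; challenge=dns-01" into the issuer domain and its
// parameters.
func parseIssueValue(value string) (string, map[string]string, error) {
	params := make(map[string]string)
	domainPart, paramPart, _ := strings.Cut(value, ";")
	domain := strings.TrimSpace(domainPart)
	for _, field := range strings.Fields(paramPart) {
		name, val, ok := strings.Cut(field, "=")
		if !ok || name == "" {
			return "", nil, fmt.Errorf("malformed CAA parameter %q", field)
		}
		params[name] = val
	}
	return domain, params, nil
}

// challengeAllowed reports whether challengeType may be used. With no
// challenge parameter present, every type is allowed (an assumption).
func challengeAllowed(params map[string]string, challengeType string) bool {
	want, present := params["challenge"]
	return !present || want == challengeType
}

func main() {
	domain, params, err := parseIssueValue("letsencrypt.org; challenge=dns-01")
	fmt.Println(domain, params, err)
	fmt.Println(challengeAllowed(params, "dns-01"), challengeAllowed(params, "http-01"))
}
```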
These were added as part of #62,
based on the original CPS at
https://letsencrypt.org/documents/ISRG-CPS-May-5-2015.pdf. Request method was an
odd thing to log because for Let's Encrypt it will always be "online", never "in
person." And VerificationMethods is logged separately during the authz
validation process. The newest CPS at
https://letsencrypt.org/documents/isrg-cps-v2.0/ no longer requires these
specific fields, so we're removing them for clarity.
* Move probs.go to web.
* Move probs_test.go
* Factor out probs.go from wfe
* Move context.go
* Extract context.go into web package.
* Add a constructor for TopHandler.
Prior to this commit the VA would follow redirects from the initial
HTTP-01 challenge request on port 80 to any other port. In practice the
Let's Encrypt production environment has network egress firewall rules
that drop outbound requests that are not on port 80 or 443. In effect
this meant any challenge request that was redirected from 80 to a port
other than 80/443 was turned into a mysterious connection timeout error.
We have decided to preserve the egress firewall rule and continue to act
conservatively. Only ports 80 and 443 should be allowed in redirects.
This commit updates the VA to return a clear error message when
a non-80/443 redirect is made.
To aid in testing/configuration the actual ports enforced are specified
by the va.httpPort and va.httpsPort that are used for the initial
outbound HTTP-01 connection.
The VA TestHTTPRedirectLookup unit test is updated accordingly to test
that a non-80/443 redirect fails with the expected message.
Resolves #3049
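For illustration, a redirect port check of this kind can be hooked into `http.Client.CheckRedirect`. This is a sketch under assumptions, not the VA's actual code: `checkRedirectPort`, `newValidationClient`, and `redirectAllowed` are hypothetical names, the error wording is invented, and allowing targets with no explicit port is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// checkRedirectPort rejects redirect targets whose explicit port is neither
// httpPort nor httpsPort. Targets without an explicit port are allowed, since
// the scheme's default port applies (an assumption of this sketch).
func checkRedirectPort(target *url.URL, httpPort, httpsPort int) error {
	portStr := target.Port()
	if portStr == "" {
		return nil
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return fmt.Errorf("invalid port in redirect target %q", target.Host)
	}
	if port != httpPort && port != httpsPort {
		return fmt.Errorf("invalid port in redirect target %q: only ports %d and %d are supported",
			target.Host, httpPort, httpsPort)
	}
	return nil
}

// newValidationClient returns a client that fails fast with a clear error on
// a disallowed redirect port instead of waiting for a connection timeout.
func newValidationClient(httpPort, httpsPort int) *http.Client {
	return &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return checkRedirectPort(req.URL, httpPort, httpsPort)
		},
	}
}

// redirectAllowed checks a raw URL against the production ports 80 and 443.
func redirectAllowed(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	return checkRedirectPort(u, 80, 443) == nil
}

func main() {
	fmt.Println(redirectAllowed("https://example.com/path"))
	fmt.Println(redirectAllowed("http://example.com:8080/path"))
	_ = newValidationClient(80, 443)
}
```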
For certificates with many domains it can be difficult to associate
a given CAA error with the specific domain that caused it. To make this
easier this commit explicitly prefixes all of the problems that can be
returned from `va.IsCAAValid` with the domain name in question.
A small unit test is included to check a CAA problem's detail message is
suitably prefixed with the affected domain.
For the new-order endpoint only. This also refactors the order of operations in `ra.NewAuthorization` to reduce duplicated code for creating pending authorizations. Existing tests still seem to work as intended, but a close eye should be given to this since we don't yet have integration tests that exercise it end to end. This also changes the inner type of `grpc.StorageAuthorityServerWrapper` to `core.StorageAuthority` so that we can avoid a circular import that is created by needing to import `grpc.AuthzToPB` and `grpc.PBToAuthz` in `sa/sa.go`.
This is a big change but should considerably improve the performance of the new-order flow.
Fixes #2955.
In 2fb247488f we consolidated the
`regModelV2` and `regModelv1` structs to one `regModel` type. In the
process we accidentally lost the explicit assignment of the
to-be-updated registration model's `LockCol` with the value of the
existing registration's `LockCol`. This meant that the Update was
occurring with a where clause `LockCol=0` (the default value).
In practice this meant that the first reg update would succeed (since
the reg row starts with LockCol=0) but any regs that had already been
updated once before would modify 0 rows in the update (because the where
clause on `LockCol` failed) and this in turn was translated into
a ServerInternal error since we knew the reg being updated did exist.
This commit updates the SA's `UpdateRegistration` function to properly
set the `LockCol` on the to-be-updated row.
This commit additionally adds an integration test for registration
contact information updating to ensure we don't fall into this trap in
the future.
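The failure mode can be reproduced with a small in-memory model of an optimistically locked row. This is only a sketch: `table`, `row`, and `update` are hypothetical stand-ins for the SA's database layer, and incrementing the lock column on every successful update is an assumption about how it is maintained.

```go
package main

import "fmt"

// row models a registration row with an optimistic-lock column.
type row struct {
	Contact string
	LockCol int64
}

// table maps registration IDs to rows.
type table map[int64]*row

// update mimics "UPDATE ... WHERE id = ? AND LockCol = ?" and returns the
// number of rows affected. A successful update bumps LockCol.
func (t table) update(id int64, contact string, lockCol int64) int {
	r, ok := t[id]
	if !ok || r.LockCol != lockCol {
		return 0
	}
	r.Contact = contact
	r.LockCol++
	return 1
}

func main() {
	regs := table{1: &row{}}

	// Buggy behavior: LockCol is left at its zero value on every update.
	fmt.Println(regs.update(1, "mailto:a@example.com", 0)) // first update matches LockCol=0
	fmt.Println(regs.update(1, "mailto:b@example.com", 0)) // row now has LockCol=1, no match

	// Fixed behavior: copy LockCol from the existing row before updating.
	existing := regs[1]
	fmt.Println(regs.update(1, "mailto:b@example.com", existing.LockCol))
}
```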
The RA's `checkCertificatesPerFQDNSetLimit` function was using `>` where
it should have been using `>=` when evaluating a FQDNSet count against
the rate limit threshold. This resulted in an off-by-one error where we
allowed 1 more duplicate certificate than intended.
This commit fixes the off-by-one error and adds a short unit test. The
unit test failed the
`TestCheckExactCertificateLimit/FQDN_set_issuances_equal_to_limit`
subtest before the fix was applied and passes afterwards.
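The difference between the two comparisons is easiest to see with the count exactly at the threshold. The helper names below are hypothetical and only illustrate the comparison, not the RA's actual function.

```go
package main

import "fmt"

// limitReachedBuggy uses ">" and so still permits issuance when the number of
// existing certificates already equals the threshold.
func limitReachedBuggy(existing, threshold int) bool {
	return existing > threshold
}

// limitReached uses ">=" and refuses issuance once the threshold is met.
func limitReached(existing, threshold int) bool {
	return existing >= threshold
}

func main() {
	const threshold = 5
	for _, existing := range []int{4, 5, 6} {
		fmt.Println(existing, limitReachedBuggy(existing, threshold), limitReached(existing, threshold))
	}
}
```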
There were a bunch of test fixtures in bdns/mocks.go that were only used in va/caa_test.go. This moves them to be in the same file so we have less spooky action at a distance.
One side-effect: We can't construct bdns.DNSError with the internal fields we want, because those fields are unexported. So we switch a couple of mock cases to just return a generic error, and the corresponding test cases to expect that error.
This is shared code between both packages. Better to have it in a single shared place.
In the process, remove the unexported signatureValidationError, which was unnecessary; all returned errors from checkAlgorithm get turned into Malformed.
The 2.1.3 release of go-jose.v2 contains a bug fix for a nil panic
encountering null values in JWS headers that affects Boulder. This
commit updates Boulder to use the 2.1.3 release.
Unit tests were confirmed to pass:
```
$ go test ./...
ok gopkg.in/square/go-jose.v2 13.648s
ok gopkg.in/square/go-jose.v2/cipher 0.003s
? gopkg.in/square/go-jose.v2/jose-util [no test files]
ok gopkg.in/square/go-jose.v2/json 1.199s
ok gopkg.in/square/go-jose.v2/jwt 0.064s
```
In 7d04ea9 we introduced the notion of a requestEvent, which had an AddError method that could be called to log an error. In that change we also added an AddError call before every wfe.sendError, to ensure errors got logged. In dc58017, we made it so that sendError would automatically add its errors to the request event, so we wouldn't need to write AddError everywhere. However, we never cleaned up the existing AddError calls, and since then have tended to "follow local style" and add a redundant AddError before many of our sendError calls.
This change attempts to undo some of that, by removing all AddError calls that appear to be redundant with the sendError call immediately following. It also adds a section on error handling to CONTRIBUTING.md.
Previously, CAA problems were lumped in under "ConnectionProblem" or
"Unauthorized". This should make things clearer and easier to differentiate.
Fixes #3043
I thought there was a bug in NewRegistration when GetRegByKey returns an error, so I wrote a unit test... and discovered it works correctly. Oh well, now we have more tests!
Fixes #3020.
In order to write integration tests for some features, especially related to rate limiting, rechecking of CAA, and expiration of authzs, orders, and certs, we need to be able to fake the passage of time in integration tests.
To do so, this change switches out all clock.Default() instances for cmd.Clock(), which can be set manually with the FAKECLOCK environment variable. integration-test.py now starts up all servers once before the main body of tests, with FAKECLOCK set to a date 70 days ago, and does some initial setup for a new integration test case. That test case tries to fetch a 70-day-old authz URL, and expects it to 404.
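A stdlib-only sketch of the environment-variable idea follows. It is not the real cmd.Clock(): `clockFor` is a hypothetical name, the RFC 3339 format for FAKECLOCK is an assumption (the change does not state the format here), and a frozen clock is likewise an assumption.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// clockFor returns a "now" function. With an empty value it returns the real
// clock; otherwise it parses the value (assumed to be RFC 3339) and returns a
// clock frozen at that instant.
func clockFor(fakeClockValue string) (func() time.Time, error) {
	if fakeClockValue == "" {
		return time.Now, nil
	}
	fixed, err := time.Parse(time.RFC3339, fakeClockValue)
	if err != nil {
		return nil, fmt.Errorf("parsing FAKECLOCK value %q: %w", fakeClockValue, err)
	}
	return func() time.Time { return fixed }, nil
}

func main() {
	// Servers would read the variable once at startup.
	now, err := clockFor(os.Getenv("FAKECLOCK"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(now().Format(time.RFC3339))
}
```

An integration test could then start the servers with FAKECLOCK set to a date 70 days in the past, create the objects it needs, and restart the servers without the variable.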
In order to make this work, I also had to change a number of our test binaries to shut down cleanly in response to SIGTERM. Without that change, stopping the servers between the setup phase and the main tests caused startservers.check() to fail, because some processes exited with nonzero status.
Note: This is an initial stab at things, to prove out the technique. Long-term, I think we will want to use an idiom where test cases are classes that have a number of optional setup phases that may be run at e.g. 70 days prior and 5 days prior. This could help us avoid a proliferation of global state as we add more time-dependent test cases.
Since the legacy CAA spec does the wrong thing with DNAMEs (treating them as
CNAMEs), and it's hard to reconcile this approach with CNAME handling, and
DNAMEs are extremely rare, reject outright any CAA responses containing DNAMEs.
Also, in the process, fix a bug in the previous LegacyCAA implementation.
Because the processing of records in LookupCAA was gated by `if
answer.Header().RRType == dnsType`, non-CAA responses were filtered out. This
wasn't caught by previous testing, because it was unit testing that mocked out
bdns.
This implements the pre-erratum 5065 version of CAA, behind a feature flag.
This involved refactoring DNSClient.LookupCAA to return a list of CNAMEs in addition to the CAA records, and adding an alternate lookuper that does tree-climbing on single-depth aliases.
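The tree-climbing with single-depth alias following might be sketched roughly as below. This is an illustration under assumptions, not the actual lookuper: `climb`, `parent`, `lookupFunc`, and `mapLookup` are hypothetical, DNS is replaced by in-memory maps, and the exact order of checks is a simplification.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupFunc returns the CAA values found at name and the alias target of
// name, if any (hypothetical stand-in for a DNS lookup).
type lookupFunc func(name string) (caa []string, alias string)

// parent strips the leftmost label; ok is false once name has no parent.
func parent(name string) (string, bool) {
	_, rest, found := strings.Cut(name, ".")
	return rest, found && rest != ""
}

// climb walks from name toward the root and returns the first non-empty CAA
// set. When followAlias is true, an alias at any level is followed one level
// deep: the alias target's own tree is climbed, but aliases found there are
// not followed further.
func climb(name string, lookup lookupFunc, followAlias bool) []string {
	for {
		caa, alias := lookup(name)
		if len(caa) > 0 {
			return caa
		}
		if followAlias && alias != "" {
			if aliasCAA := climb(alias, lookup, false); len(aliasCAA) > 0 {
				return aliasCAA
			}
		}
		next, ok := parent(name)
		if !ok {
			return nil
		}
		name = next
	}
}

// mapLookup builds a lookupFunc backed by in-memory maps, for demonstration.
func mapLookup(caa map[string][]string, aliases map[string]string) lookupFunc {
	return func(name string) ([]string, string) {
		return caa[name], aliases[name]
	}
}

func main() {
	caa := map[string][]string{"cdn.net": {"letsencrypt.org"}}
	aliases := map[string]string{"www.example.com": "host.cdn.net"}
	fmt.Println(climb("www.example.com", mapLookup(caa, aliases), true))
}
```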