We landed this feature flag disabled pending Certbot's acme library
supporting this status value. That work has landed and so we can enable
this feature in `config-next` ahead of a staging/prod rollout.
During periods of peak load, some RPCs are significantly delayed (on the order of seconds) by client-side blocking. HTTP/2 clients have to obey a "max concurrent streams" setting sent by the server. In Go's HTTP/2 implementation, this value [defaults to 250](https://github.com/golang/net/blob/master/http2/server.go#L56), so the gRPC default is also 250. So whenever there are more than 250 requests in progress at a time, additional requests will be delayed until there is a slot available.
During this peak load, we aren't hitting limits on CPU or memory, so we should increase the max concurrent streams limit to take better advantage of our available resources. This PR adds a config field to do that.
Fixes#3641.
This commit disables the `OrderReadyStatus` feature flag in
`test/config-next/sa.json`. Certbot's ACME implementation breaks when
this flag is enabled (See
https://github.com/certbot/certbot/issues/5856). Since Certbot runs
integration tests against Boulder with config-next we should be
courteous and leave this flag disabled until we are closer to being able
to turn it on for staging/prod.
* SA: Add Order "Ready" status, feature flag.
This commit adds the new "Ready" status to `core/objects.go` and updates
`sa.statusForOrder` to use it conditionally for orders with all valid
authorizations that haven't been finalized yet. This state is used
conditionally based on the `features.OrderReadyStatus` feature flag
since it will likely break some existing clients that expect status
"Processing" for this state. The SA unit test for `statusForOrder` is
updated with a "ready" status test case.
* RA: Enforce order ready status conditionally.
This commit updates the RA to conditionally expect orders that are being
finalized to be in the "ready" status instead of "pending". This is
conditionally enforced based on the `OrderReadyStatus` feature flag.
Along the way the SA was changed to calculate the order status for the
order returned in `sa.NewOrder` dynamically now that it could be
something other than "pending".
* WFE2: Conditionally enforce order ready status for finalization.
Similar to the RA the WFE2 should conditionally enforce that an order's
status is either "ready" or "pending" based on the "OrderReadyStatus"
feature flag.
* Integration: Fix `test_order_finalize_early`.
This commit updates the V2 `test_order_finalize_early` test for the
"ready" status. A nice side-effect of the ready state change is that we
no longer invalidate an order when it is finalized too soon because we
can reject the finalization in the WFE. Subsequently the
`test_order_finalize_early` testcase is also smaller.
* Integration: Test classic behaviour w/o feature flag.
In the previous commit I fixed the integration test for the
`config/test-next` run that has the `OrderReadyStatus` feature flag set
but broke it for the `config/test` run without the feature flag.
This commit updates the `test_order_finalize_early` test to work
correctly based on the feature flag status in both cases.
gRPC passes deadline information through the RPC boundary, but client and server have the same deadline. Ideally we'd like the server to have a slightly tighter deadline than the client, so if one of the server's onward RPCs or other network calls times out, the server can pass back more detailed information to the client, rather than the client timing out the server and losing the opportunity to log more detailed information about which component caused the timeout.
In this change, I subtract 100ms from the deadline on the server side of our interceptors, using our existing serverInterceptor. I also check that there is at least 100ms remaining in which to do useful work, so the server doesn't begin a potentially expensive task only to abort it.
Fixes#3608.
Removes usage of the `EnforceChallengeDisable` feature, the feature itself is not removed as it is still configured in staging/production, once that is fixed I'll submit another PR removing the actual flag.
This keeps the behavior that when authorizations are retrieved from the SA they have their challenges populated, because that seems to make the most sense to me? It also retains TLS re-validation.
Fixes#3441.
This PR implements issuance for wildcard names in the V2 order flow. By policy, pending authorizations for wildcard names only receive a DNS-01 challenge for the base domain. We do not re-use authorizations for the base domain that do not come from a previous wildcard issuance (e.g. a normal authorization for example.com turned valid by way of a DNS-01 challenge will not be reused for a *.example.com order).
The wildcard prefix is stripped off of the authorization identifier value in two places:
When presenting the authorization to the user - ACME forbids having a wildcard character in an authorization identifier.
When performing validation - We validate the base domain name without the *. prefix.
This PR is largely a rewrite/extension of #3231. Instead of using a pseudo-challenge-type (DNS-01-Wildcard) to indicate an authorization & identifier correspond to the base name of a wildcard order name we instead allow the identifier to take the wildcard order name with the *. prefix.
Go's default is 2: https://golang.org/src/database/sql/sql.go#L686.
Graphs show we are opening 100-200 fresh connections per second on the SA.
Changing this default should reduce that a lot, which should reduce load on both
the SA and MariaDB. This should also improve latency, since every new TCP
connection adds a little bit of latency.
The existing ReusePendingAuthz implementation had some bugs:
It would recycle deactivated authorizations, which then couldn't be fulfilled. (#2840)
Since it was implemented in the SA, it wouldn't get called until after the RA checks the Pending Authorizations rate limit. Which means it wouldn't fulfill its intended purpose of making accounts less likely to get stuck in a Pending Authorizations limited state. (#2831)
This factors out the reuse functionality, which used to be inside an "if" statement in the SA. Now the SA has an explicit GetPendingAuthorization RPC, which gets called from the RA before calling NewPendingAuthorization. This happens to obsolete #2807, by putting the recycling logic for both valid and pending authorizations in the RA.
As described in Boulder issue #2800 the implementation of the SA's
`countCertificates` function meant that the renewal exemption for the
Certificates Per Domain rate limit was difficult to work with. To
maximize allotted certificates clients were required to perform all new
issuances first, followed by the "free" renewals. This arrangement was
difficult to coordinate.
In this PR `countCertificates` is updated such that renewals are
excluded from the count reliably. To do so the SA takes the serials it
finds for a given domain from the issuedNames table and cross references
them with the FQDN sets it can find for the associated serials. With the
FQDN sets a second query is done to find all the non-renewal FQDN sets
for the serials, giving a count of the total non-renewal issuances to
use for rate limiting.
Resolves#2800
This removes the config and code to output to statsd.
- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.
Fixes#2639, #2733.
The NotAfter and IsExpired fields on the certificateStatus table
have been migrated in staging & production. Similarly the
CertStatusOptimizationsMigrated feature flag has been turned on after
a successful backfill operation. We have confirmed the optimization is
working as expected and can now clean out the duplicated v1 and v2
models, and the feature flag branching. The notafter-backfill command
is no longer useful and so this commit also cleans it out of the repo.
Note: Some unit tests were sidestepping the SA and inserting
certificateStatus rows explicitly. These tests had to be updated to
set the NotAfter field in order for the queries used by the
ocsp-updater and the expiration-mailer to perform the way the tests
originally expected.
Resolves#2530
Previously, a given binary would have three TLS config fields (CA cert, cert,
key) for its gRPC server, plus each of its configured gRPC clients. In typical
use, we expect all three of those to be the same across both servers and clients
within a given binary.
This change reuses the TLSConfig type already defined for use with AMQP, adds a
Load() convenience function that turns it into a *tls.Config, and configures it
for use with all of the binaries. This should make configuration easier and more
robust, since it more closely matches usage.
This change preserves temporary backwards-compatibility for the
ocsp-updater->publisher RPCs, since those are the only instances of gRPC
currently enabled in production.
Previously, all gRPC services used the same client and server certificates. Now,
each service has its own certificate, which it uses for both client and server
authentication, more closely simulating production.
This also adds aliases for each of the relevant hostnames in /etc/hosts. There
may be some issues if Docker decides to rewrite /etc/hosts while Boulder is
running, but this seems to work for now.
- Remove spinner from test.js. It made Travis logs hard to read.
- Listen on all interfaces for debugAddr. This makes it possible to check
Prometheus metrics for instances running in a Docker container.
- Standardize DNS timeouts on 1s and 3 retries across all configs. This ensures
DNS completes within the relevant RPC timeouts.
- Remove RA service queue from VA, since VA no longer uses the callback to RA on
completing a challenge.
Adds a gRPC server to the SA and SA gRPC Clients to the WFE, RA, CA, Publisher, OCSP updater, orphan finder, admin revoker, and expiration mailer.
Also adds a CA gRPC client to the OCSP Updater which was missed in #2193.
Fixes#2347.
Move features sections to the correct JSON object and only test registration validity if regCheck is true
* Pull other flag up to correct level
* Only check status update when status is non-empty
This PR adds a migration to create two new fields on the `certificateStatus` table: `notAfter` and `isExpired`. The rationale for these fields is explained in #1864. Usage of these fields is gated behind `features.CertStatusOptimizationsMigrated` per [CONTRIBUTING.md](https://github.com/letsencrypt/boulder/blob/master/CONTRIBUTING.md#gating-migrations). This flag should be set to true **only** when the `20160817143417_CertStatusOptimizations.sql` migration has been applied.
Points of difference from #2132 (the initial preparatory "all-in-one go" PR):
**Note 1**: Updating the `isExpired` field in the OCSP updater can not be done yet, the `notAfter` field needs to be fully populated first - otherwise a separate query or a messy `JOIN` would have to be used to determine if a certStatus `isExpired` by using the `certificates` table's `expires` field.
**Note 2**: Similarly we can't remove the `JOIN` on `certificates` from the `findStaleOCSPResponse` query yet until all DB rows have `notAfter` populated. This will happen in a separate **Part Two** PR.