Our Travis output is quite verbose with the WFE output, and it's very
rare that we have to reference it. I'd like to remove the INFO-level
logs (i.e. the logs of every request) so that it's easier to see real
errors, and faster to scroll to the bottom of logs of failed runs.
ACME draft-13 changed the inner JWS' body to contain the old key
that signed the outer JWS. Because this is a backwards incompatible
change with the draft-12 ACME v2 key rollover we introduce a new feature
flag `features.ACME13KeyRollover` and conditionally use the old or new
key rollover semantics based on its value.
We'd like to start using the DNS load balancer in the latest version of gRPC. That means putting all IPs for a service under a single hostname (or using a SRV record, but we're not taking that path). This change adds an sd-test-srv to act as our service discovery DNS service. It returns both Boulder IP addresses for any A lookup ending in ".boulder". This change also sets up the Docker DNS for our boulder container to defer to sd-test-srv when it doesn't know an answer.
sd-test-srv doesn't know how to resolve public Internet names like `github.com`. Resolving public names is required for the `godep-restore` test phase, so this change breaks out a copy of the boulder container that is used only for `godep-restore`.
This change implements a shim of a DNS resolver for gRPC, so that we can switch to DNS-based load balancing with the currently vendored gRPC, then when we upgrade to the latest gRPC we won't need a simultaneous config update.
Also, this change introduces a check at the end of the integration test that each backend received at least one RPC, ensuring that we are not sending all load to a single backend.
While we intended to allow legacy ACME v1 accounts created through the WFE to work with the ACME v2 implementation and the WFE2 we neglected to consider that a legacy account would have a Key ID URL that doesn't match the expected for a V2 account. This caused `wfe2/verify.go`'s `lookupJWK` to reject all POST requests authenticated by a legacy account unless the ACME client took the extra manual step of "fixing" the URL.
This PR adds a configuration parameter to the WFE2 for an allowed legacy key ID prefix. The WFE2 verification logic is updated to allow both the expected key ID prefix and the configured legacy key ID prefix. This will allow us to specify the correct legacy URL in configuration for both staging/prod to allow unmodified V1 ACME accounts to be used with ACME v2.
Resolves https://github.com/letsencrypt/boulder/issues/3674
We're currently stuck on gRPC v1.1 because of a breaking change to certificate validation in gRPC 1.8. Our gRPC balancer uses a static list of multiple hostnames, and expects to validate against those hostnames. However gRPC expects that a service is one hostname, with multiple IP addresses, and validates all those IP addresses against the same hostname. See grpc/grpc-go#2012.
If we follow gRPC's assumptions, we can rip out our custom Balancer and custom TransportCredentials, and will probably have a lower-friction time in general.
This PR is the first step in doing so. In order to satisfy the "multiple IPs, one port" property of gRPC backends in our Docker container infrastructure, we switch to Docker's user-defined networking. This allows us to give the Boulder container multiple IP addresses on different local networks, and gives it different DNS aliases in each network.
In startservers.py, each shard of a service listens on a different DNS alias for that service, and therefore a different IP address. The listening port for each shard of a service is now identical.
This change also updates the gRPC service certificates. Now, each certificate that is used in a gRPC service (as opposed to something that is "only" a client) has three names. For instance, sa1.boulder, sa2.boulder, and sa.boulder (the generic service name). For now, we are validating against the specific hostnames. When we update our gRPC dependency, we will begin validating against the generic service name.
Incidentally, the DNS aliases feature of Docker allows us to get rid of some hackery in entrypoint.sh that inserted entries into /etc/hosts.
Note: Boulder now has a dependency on the DNS aliases feature in Docker. By default, docker-compose run creates a temporary container and doesn't assign any aliases to it. We now need to specify docker-compose run --use-aliases to get the correct behavior. Without --use-aliases, Boulder won't be able to resolve the hostnames it wants to bind to.
load generator: send correct ACMEv2 Content-Type on POST.
This PR updates the Boulder load-generator to send the correct ACMEv2 Content-Type header when POSTing the ACME server. This is required for ACMEv2 and without it all POST requests to the WFE2 running a test/config-next configuration result in malformed 400 errors. While only required by ACMEv2 this commit sends it for ACMEv1 requests as well. No harm no foul.
integration tests: allow running just the load generator.
Prior to this PR an omission in an if statement in integration-test.py meant that you couldn't invoke test/integration-test.py with just the --load argument to only run the load generator. This commit updates the if to allow this use case.
* SA: Add Order "Ready" status, feature flag.
This commit adds the new "Ready" status to `core/objects.go` and updates
`sa.statusForOrder` to use it conditionally for orders with all valid
authorizations that haven't been finalized yet. This state is used
conditionally based on the `features.OrderReadyStatus` feature flag
since it will likely break some existing clients that expect status
"Processing" for this state. The SA unit test for `statusForOrder` is
updated with a "ready" status test case.
* RA: Enforce order ready status conditionally.
This commit updates the RA to conditionally expect orders that are being
finalized to be in the "ready" status instead of "pending". This is
conditionally enforced based on the `OrderReadyStatus` feature flag.
Along the way the SA was changed to calculate the order status for the
order returned in `sa.NewOrder` dynamically now that it could be
something other than "pending".
* WFE2: Conditionally enforce order ready status for finalization.
Similar to the RA the WFE2 should conditionally enforce that an order's
status is either "ready" or "pending" based on the "OrderReadyStatus"
feature flag.
* Integration: Fix `test_order_finalize_early`.
This commit updates the V2 `test_order_finalize_early` test for the
"ready" status. A nice side-effect of the ready state change is that we
no longer invalidate an order when it is finalized too soon because we
can reject the finalization in the WFE. Subsequently the
`test_order_finalize_early` testcase is also smaller.
* Integration: Test classic behaviour w/o feature flag.
In the previous commit I fixed the integration test for the
`config/test-next` run that has the `OrderReadyStatus` feature flag set
but broke it for the `config/test` run without the feature flag.
This commit updates the `test_order_finalize_early` test to work
correctly based on the feature flag status in both cases.
gRPC passes deadline information through the RPC boundary, but client and server have the same deadline. Ideally we'd like the server to have a slightly tighter deadline than the client, so if one of the server's onward RPCs or other network calls times out, the server can pass back more detailed information to the client, rather than the client timing out the server and losing the opportunity to log more detailed information about which component caused the timeout.
In this change, I subtract 100ms from the deadline on the server side of our interceptors, using our existing serverInterceptor. I also check that there is at least 100ms remaining in which to do useful work, so the server doesn't begin a potentially expensive task only to abort it.
Fixes#3608.
This commit updates the WFE and WFE2 to have configuration support for
setting a value for the `/directory` object's "meta" field's
optional "caaIdentities" and "website" fields. The config-next wfe/wfe2
configuration are updated with values for these fields. Unit tests are
updated to check that they are sent when expected and not otherwise.
Bonus content: The `test.AssertUnmarshaledEquals` function had a bug
where it would consider two inputs equal when the # of keys differed.
This commit also fixes that bug.
We shipped this feature flag disabled because at the time Certbot's acme
module had a bug with revocation that sent the wrong content-type. That
has been fixed upstream and the current boulder-tools image has the fix.
We should be able to turn this flag on now for config-next without
breaking tests.
Boulder is fairly noisy about gRPC connection errors. This is a mixed
blessing: Our gRPC configuration will try to reconnect until it hits
an RPC deadline, and most likely eventually succeed. In that case,
we don't consider those to really be errors. However, in cases where
a connection is repeatedly failing, we'd like to see errors in the
logs about connection failure, rather than "deadline exceeded." So
we want to keep logging of gRPC errors.
However, right now we get a lot of these errors logged during
integration tests. They make the output hard to read, and may disguise
more serious errors. So we'd like to avoid causing such errors in
normal integration test operation.
This change reorders the startup of Boulder components by their gRPC
dependencies, so everything's backend is likely to be up and running
before it starts. It also reverses that order for clean shutdowns,
and waits for each process to exit before signalling the next one.
With these changes, I still got connection errors. Taking listenbuddy
out of the gRPC path fixed them. I believe the issue is that
listenbuddy is not a truly transparent proxy. In particular, it
accepts an inbound TCP connection before opening an outbound TCP
connection. If opening that outbound connection results in "connection
refused," it closes the inbound connection. That means gRPC sees a
"connection closed" (or "connection reset"?) rather than "connection
refused". I'm guessing it handles those cases differently, explaining
the different error results.
We've been using listenbuddy to trigger disconnects while Boulder is
running, to ensure that gRPC's reconnect code works. I think we can
probably rely on gRPC's reconnect to work. The initial problem that
led us to start testing this was a configuration problem; now that
we have the configuration we want, we should be fine and don't need
to keep testing reconnects on every integration test run.
This commit implements a mapping from certificate AIA Issuer URL to PEM
encoded certificate chain. GET's to the V2 Certificate endpoint will
return a full PEM encoded certificate chain in addition to the leaf cert
using the AIA issuer URL of the leaf cert and the configured mapping.
The boulder-wfe2 command builds the chain mapping by reading the
"wfe" config section's 'certificateChains" field, specifying a list
of file paths to PEM certificates for each AIA issuer URL. At startup
the PEM file contents are ready, verified and separated by a newline.
The resulting populated AIA issuer URL -> PEM cert chain mapping is
given to the WFE for use with the Certificate endpoint.
Resolves#3291
The WFE2 doesn't check any of the feature flags that are configured in
the `test/config/wfe2.json` and `test/config-next/wfe2.json` config
files - we default to acting as if all new features are enabled for the
V2 work. This commit removes the flags from the config to avoid
confusion or expectations that changing the config will disable the
features.
Previously, there was a disagreement between WFE and CA as to what the correct
issuer certificate was. Consolidate on test-ca2.pem (h2ppy h2cker fake CA).
Also, the CA configs contained an outdated entry for "IssuerCert", which was not
being used: The CA configs now use an "Issuers" array to allow signing by
multiple issuer certificates at once (for instance when rolling intermediates).
Removed this outdated entry, and the config code for CA to load it. I've
confirmed these changes match what is currently in production.
Added an integration test to check for this problem in the future.
Fixes#3309, thanks to @icing for bringing the issue to our attention!
This also includes changes from #3321 to clarify certificates for WFE.
- Encode certificate as PEM.
- Use lowercase for field names.
- Use termsOfServiceAgreed instead of Agreement
- Use a different ToS URL for v2 that points at the v2 HTTPS port.
Resolves#3280
This PR is the initial duplication of the WFE to create a WFE2
package. The rationale is briefly explained in `wfe2/README.md`.
Per #2822 this PR only lays the groundwork for further customization
and deduplication. Presently both the WFE and WFE2 are identical except
for the following configuration differences:
* The WFE offers HTTP and HTTPS on 4000 and 4430 respectively, the WFE2
offers HTTP on 4001 and 4431.
* The WFE has a debug port on 8000, the WFE2 uses the next free "8000
range port" and puts its debug service on 8013
Resolves https://github.com/letsencrypt/boulder/issues/2822