Commit Graph

909 Commits

Author SHA1 Message Date
Roland Bracewell Shoemaker 876c727b6f Update gRPC (#3817)
Fixes #3474.
2018-08-20 10:55:42 -04:00
Roland Bracewell Shoemaker 3a8f0bc0be Allow ocsp-responder to filter requests by serial prefix (#3815) 2018-08-10 11:16:22 -04:00
Roland Bracewell Shoemaker b5f7c62460 Remove leftover publisher CT config (#3803) 2018-07-27 08:05:51 -04:00
Jacob Hoffman-Andrews 36a83150ad Add stagger to CT log submissions. (#3794)
This allows each log a chance to respond before we move onto the next,
spreading our load more evenly across the logs in a log group.
2018-07-06 16:25:51 -04:00
Roland Bracewell Shoemaker e27f370fd3 Excise code relating to pre-SCT embedding issuance flow (#3769)
Things removed:

* features.EmbedSCTs (and all the associated RA/CA/ocsp-updater code etc)
* ca.enablePrecertificateFlow (and all the associated RA/CA code)
* sa.AddSCTReceipt and sa.GetSCTReceipt RPCs
* publisher.SubmitToCT and publisher.SubmitToSingleCT RPCs

Fixes #3755.
2018-06-28 08:33:05 -04:00
Shiloh Heurich d7bfbd25be Fix gen-ca output (#3773)
Fix: When any of OCSPURL, CRLURL or IssuerURL in the CertProfile are empty, the relevant fields are encoded in ASN.1 as empty arrays.
Fix: KeyUsage is a bitmask requiring bitwise OR.
2018-06-21 15:22:02 -07:00
Shiloh Heurich 3f348f1284 Require OCSPURL and CRLURL in CertProfile only for intermediates. (#3771) 2018-06-21 13:58:00 -04:00
Roland Bracewell Shoemaker 6fe950bc0c
Add PKCS#11 certificate generation tool (#3729)
Tested against relevant hardware for generating both RSA and ECDSA roots and intermediates with keys generated using `gen-key`.

Also this makes a few changes to the `gen-key` tool after further experience with the HSM and more reading of the PCKS#11 specification. Main change is the removal of `compatMode`, which was intended to provide support for two naming schemes for EC used in subsequent PKCS#11 drafts. It turns out these schemes were changes in name only and the underlying structs/ints were the exact same (i.e. `CKA_ECDSA_PARAMS == CKA_EC_PARAMS` and `CKM_ECDSA_KEY_PAIR_GEN == CKM_EC_KEY_PAIR_GEN`) and just allowed using one of the two names based on preference. This meant with `compatMode` enabled or disabled the tool did the exact same thing.

Fixes #3697.
2018-06-12 12:13:09 -07:00
Joel Sing 9c2859c87b Add support for CAA account-uri validation. (#3736)
This adds support for the account-uri CAA parameter as specified by
section 3 of https://tools.ietf.org/html/draft-ietf-acme-caa-04, allowing
issuance to be restricted to one or more ACME accounts as specified by CAA
records.
2018-06-08 12:08:03 -07:00
Roland Bracewell Shoemaker 72949d5915
Switch from globalsign/certlint to zmap/zlint (#3745)
Switch linting library to zmap/zlint.

```
github.com/zmap/zlint$ go test ./...
ok  	github.com/zmap/zlint	0.190s
?   	github.com/zmap/zlint/cmd/zlint	[no test files]
ok  	github.com/zmap/zlint/lints	0.216s
ok  	github.com/zmap/zlint/util	(cached)
```
2018-06-04 17:47:06 -07:00
Kyle Spiers dd0e0249e5 core/util: ValidSerial should return false if the serial is not 32 or 36 (#3712)
The current check always fails because a length can't be simultaneously be both less than 32 and greater than 36.
2018-05-24 15:31:06 -04:00
Roland Bracewell Shoemaker 30394c4b4c Accept empty pin and generate a key ID (#3713)
Two fixes that I found while doing work on the gen-cert tool and setting up the HSM again
* Accept an empty PIN argument, this allows purely using the PED for login if not using challenge mode
* Generate 4 byte key ID for public/private key pairs during key gen, the HSM doesn't generate this field itself and `letsencrypt/pkcs11key` relies on this attribute to function
2018-05-16 08:33:34 -04:00
Daniel McCarney 5597a77ba2
WFE2: Allow legacy Key ID prefix for ACME v2 JWS. (#3705)
While we intended to allow legacy ACME v1 accounts created through the WFE to work with the ACME v2 implementation and the WFE2 we neglected to consider that a legacy account would have a Key ID URL that doesn't match the expected for a V2 account. This caused `wfe2/verify.go`'s `lookupJWK` to reject all POST requests authenticated by a legacy account unless the ACME client took the extra manual step of "fixing" the URL.

This PR adds a configuration parameter to the WFE2 for an allowed legacy key ID prefix. The WFE2 verification logic is updated to allow both the expected key ID prefix and the configured legacy key ID prefix. This will allow us to specify the correct legacy URL in configuration for both staging/prod to allow unmodified V1 ACME accounts to be used with ACME v2.

Resolves https://github.com/letsencrypt/boulder/issues/3674
2018-05-11 15:57:56 -04:00
Joel Sing f8a023e49c Remove various unnecessary uses of fmt.Sprintf (#3707)
Remove various unnecessary uses of fmt.Sprintf - in particular:

- Avoid calls like t.Error(fmt.Sprintf(...)), where t.Errorf can be used directly.

- Use strconv when converting an integer to a string, rather than using
  fmt.Sprintf("%d", ...). This is simpler and can also detect type errors at
  compile time.

- Instead of using x.Write([]byte(fmt.Sprintf(...))), use fmt.Fprintf(x, ...).
2018-05-11 11:55:25 -07:00
Joel Sing 8ebdfc60b6 Provide formatting logger functions. (#3699)
A very large number of the logger calls are of the form log.Function(fmt.Sprintf(...)).
Rather than sprinkling fmt.Sprintf at every logger call site, provide formatting versions
of the logger functions and call these directly with the format and arguments.

While here remove some unnecessary trailing newlines and calls to String/Error.
2018-05-10 11:06:29 -07:00
Daniel McCarney b7f356150a SA: Cleanup, forbid nil issuer arg to AddCertificate (#3675)
In #3651 we introduced a new parameter to sa.AddCertificate to allow specifying the Issued date. If nil, we defaulted to the current time to maintain deployability guidelines.

Now that this has been deployed everywhere this PR updates SA.AddCertificate and the gRPC wrappers such that a nil issuer argument is rejected with an error.

Unit tests that were previously using nil for the issued time are updated to explicitly set the issued time to the fake clock's now().

Resolves #3657
2018-05-02 10:29:21 -07:00
Roland Bracewell Shoemaker d01f74402b Fix ec gen-key test (#3685)
Test sign function didn't properly pad R and S in the EC signature as per the PKCS#11 spec.

Fixes #3671.
2018-05-01 18:07:01 -07:00
Jacob Hoffman-Andrews 4bea2d8a9a Fix ocsp-updater backoff jitter test. (#3670)
Previously this test assumed jitter could only be positive, but it can
be negative too. This fixes the expected bounds.
2018-04-26 08:11:33 -04:00
Daniel McCarney aa810a3142 gRPC: publish RPC latency stat in server interceptor. (#3665)
We may see RPCs that are dispatched by a client but do not arrive at the server for some time afterwards. To have insight into potential request latency at this layer we want to publish the time delta between when a client sent an RPC and when the server received it.

This PR updates the gRPC client interceptor to add the current time to the gRPC request metadata context when it dispatches an RPC. The server side interceptor is updated to pull the client request time out of the gRPC request metadata. Using this timestamp it can calculate the latency and publish it as an observation on a Prometheus histogram.

Accomplishing the above required wiring a clock through to each of the client interceptors. This caused a small diff across each of the gRPC aware boulder commands.

A small unit test is included in this PR that checks that a latency stat is published to the histogram after an RPC to a test ChillerServer is made. It's difficult to do more in-depth testing because using fake clocks makes the latency 0 and using real clocks requires finding a way to queue/delay requests inside of the gRPC mechanisms not exposed to Boulder.

Updates https://github.com/letsencrypt/boulder/issues/3635 - Still TODO: Explicitly logging latency in the VA, tracking outstanding RPCs as a gauge.
2018-04-25 15:37:22 -07:00
Jacob Hoffman-Andrews 4dcbf5c883
Run multiples of services in integration tests (#3662)
Fixes #3653.
2018-04-24 16:00:40 -07:00
Roland Bracewell Shoemaker 24cd01d033 Revert to setting full addresses instead of just ports 2018-04-23 12:39:28 -07:00
Roland Bracewell Shoemaker 5c4eaf841f Review fixes 2018-04-20 16:03:55 -07:00
Roland Bracewell Shoemaker ccb02419c5 Revert client changes + addr debug override 2018-04-20 12:46:33 -07:00
Roland Bracewell Shoemaker d424d0580b Allow cli override of gRPC listen and service addresses 2018-04-20 12:35:12 -07:00
Roland Bracewell Shoemaker 0e6713e573 Randomize order of CT logs when submitting precerts (#3660)
* Randomize order of CT logs when submitting precerts so we maximize the chances we actually exercise all of the logs in a group and not just the first in the list.

* Add metrics for winning logs
2018-04-20 15:00:10 -04:00
Daniel McCarney f8f9a158c7 orphan-finder: set cert issued date based on notbefore. (#3651)
The Boulder orphan-finder command uses the SA's AddCertificate RPC to add orphaned certificates it finds back to the DB. Prior to this commit this RPC always set the core.Certificate.Issued field to the
current time. For the orphan-finder case this meant that the Issued date would incorrectly be set to when the certificate was found, not when it was actually issued. This could cause cert-checker to alarm based on the unusual delta between the cert NotBefore and the core.Certificate.Issued value.

This PR updates the AddCertificate RPC to accept an optional issued timestamp in the request arguments. In the SA layer we address deployability concerns by setting a default value of the current time when none is explicitly provided. This matches the classic behaviour and will let an old RA communicate with a new SA.

This PR updates the orphan-finder to provide an explicit issued time to sa.AddCertificate. The explicit issued time is calculated using the found certificate's NotBefore and the configured backdate.
This lets the orphan-finder set the true issued time in the core.Certificate object, avoiding any cert-checker alarms.

Resolves #3624
2018-04-19 10:25:12 -07:00
Roland Bracewell Shoemaker 1271a15be7 Submit final certs to CT logs (#3640)
Submits final certificates to any configured CT logs. This doesn't introduce a feature flag as it is config gated, any log we want to submit final certificates to needs to have it's log description updated to include the `"submitFinalCerts": true` field.

Fixes #3605.
2018-04-13 12:02:01 -04:00
Jacob Hoffman-Andrews 2a1cd4981a Allow configuring gRPC's MaxConcurrentStreams (#3642)
During periods of peak load, some RPCs are significantly delayed (on the order of seconds) by client-side blocking. HTTP/2 clients have to obey a "max concurrent streams" setting sent by the server. In Go's HTTP/2 implementation, this value [defaults to 250](https://github.com/golang/net/blob/master/http2/server.go#L56), so the gRPC default is also 250. So whenever there are more than 250 requests in progress at a time, additional requests will be delayed until there is a slot available.

During this peak load, we aren't hitting limits on CPU or memory, so we should increase the max concurrent streams limit to take better advantage of our available resources. This PR adds a config field to do that.

Fixes #3641.
2018-04-12 17:17:17 -04:00
Daniel McCarney 74d5decc67 Remove `TotalCertificates` rate limit. (#3638)
The `TotalCertificates` rate limit serves to ensure we don't
accidentally exceed our OCSP signing capacity by issuing too many
certificates within a fixed period. In practice this rate limit has been
fragile and the associated queries have been linked to performance
problems.

Since we now have better means of monitoring our OCSP signing capacity
this commit removes the rate limit and associated code.
2018-04-12 13:25:47 -07:00
Daniel McCarney 299e53b237 RA,CA: Refuse to start with MaxNames == 0. (#3634)
This commit updates the `boulder-ra` and `boulder-ca` commands to refuse
to start if their configured `MaxNames` is 0 (the default value). This
should always be set to a positive number.

This commit also updates `csr/csr.go` to always apply the max names
check since it will never be 0 after the change above.

Also refactor `FailOnError` to pull out a separate `Fail` function.

Related to https://github.com/letsencrypt/boulder/issues/3632
2018-04-10 10:53:23 -07:00
Roland Bracewell Shoemaker cc5ec34539 Allow configuration of multiple DNS resolvers (#3612)
* Allow configuration of multiple DNS resolvers
* Use multiple DNS resolvers in integration tests

Fixes #3611.
2018-04-05 11:51:22 -04:00
Daniel McCarney 590dca0fe1
Cert-checker: Update certlint, add CN/SAN==PSL err ignore. (#3600)
* Update `globalsign/certlint` to d4a45be.

This commit updates the `github.com/globalsign/certlint` dependency to
the latest tip of master (d4a45be06892f3e664f69892aca79a48df510be0).

Unit tests are confirmed to pass:
```
$ go test ./...
ok    github.com/globalsign/certlint  3.816s
ok    github.com/globalsign/certlint/asn1 (cached)
?     github.com/globalsign/certlint/certdata [no test files]
?     github.com/globalsign/certlint/checks [no test files]
?     github.com/globalsign/certlint/checks/certificate/aiaissuers  [no
test files]
?     github.com/globalsign/certlint/checks/certificate/all [no test
files]
?     github.com/globalsign/certlint/checks/certificate/basicconstraints
[no test files]
?     github.com/globalsign/certlint/checks/certificate/extensions  [no
test files]
?     github.com/globalsign/certlint/checks/certificate/extkeyusage [no
test files]
ok    github.com/globalsign/certlint/checks/certificate/internal
(cached)
?     github.com/globalsign/certlint/checks/certificate/issuerdn  [no
test files]
?     github.com/globalsign/certlint/checks/certificate/keyusage  [no
test files]
?     github.com/globalsign/certlint/checks/certificate/publickey [no
test files]
?     github.com/globalsign/certlint/checks/certificate/publickey/goodkey
[no test files]
ok    github.com/globalsign/certlint/checks/certificate/publicsuffix
(cached)
?     github.com/globalsign/certlint/checks/certificate/revocation  [no
test files]
?     github.com/globalsign/certlint/checks/certificate/serialnumber
[no test files]
?     github.com/globalsign/certlint/checks/certificate/signaturealgorithm
[no test files]
ok    github.com/globalsign/certlint/checks/certificate/subject (cached)
ok    github.com/globalsign/certlint/checks/certificate/subjectaltname
(cached)
?     github.com/globalsign/certlint/checks/certificate/validity  [no
test files]
?     github.com/globalsign/certlint/checks/certificate/version [no test
files]
?     github.com/globalsign/certlint/checks/certificate/wildcard  [no
test files]
?     github.com/globalsign/certlint/checks/extensions/adobetimestamp
[no test files]
?     github.com/globalsign/certlint/checks/extensions/all  [no test
files]
?     github.com/globalsign/certlint/checks/extensions/authorityinfoaccess
[no test files]
?     github.com/globalsign/certlint/checks/extensions/authoritykeyid
[no test files]
?     github.com/globalsign/certlint/checks/extensions/basicconstraints
[no test files]
?     github.com/globalsign/certlint/checks/extensions/crldistributionpoints
[no test files]
?     github.com/globalsign/certlint/checks/extensions/ct [no test
files]
?     github.com/globalsign/certlint/checks/extensions/extkeyusage  [no
test files]
?     github.com/globalsign/certlint/checks/extensions/keyusage [no test
files]
?     github.com/globalsign/certlint/checks/extensions/nameconstraints
[no test files]
ok    github.com/globalsign/certlint/checks/extensions/ocspmuststaple
(cached)
?     github.com/globalsign/certlint/checks/extensions/ocspnocheck  [no
test files]
?     github.com/globalsign/certlint/checks/extensions/pdfrevocation
[no test files]
?     github.com/globalsign/certlint/checks/extensions/policyidentifiers
[no test files]
?     github.com/globalsign/certlint/checks/extensions/smimecapabilities
[no test files]
?     github.com/globalsign/certlint/checks/extensions/subjectaltname
[no test files]
?     github.com/globalsign/certlint/checks/extensions/subjectkeyid [no
test files]
ok    github.com/globalsign/certlint/errors (cached)
?     github.com/globalsign/certlint/examples/ct  [no test files]
?     github.com/globalsign/certlint/examples/specificchecks  [no test
files]
```

* Certchecker: Remove OCSP Must Staple err ignore, fix typos.

This commit removes the explicit ignore for OCSP Must Staple errors that
was added when the upstream `certlint` package didn't understand that
PKIX extension. That problem was resolved and so we can remove the
ignore from `cert-checker`.

This commit also fixes two typos that were fixed upstream and needed to
be reflected in expected error messages in the `certlint` unit test.

* Certchecker: Ignore Certlint CN/SAN == PSL errors.

`globalsign/certlint`, used by `cmd/cert-checker` to vet certs,
improperly flags certificates that have subj CN/SANs equal to a private
entry in the public suffix list as faulty.

This commit adds a regex that will skip errors that match the certlint
PSL error string. Prior to this workaround the addition of a private PSL
entry as a SAN in the `TestCheckCert` test cert fails the test:

```
--- FAIL: TestCheckCert (1.72s)
  main_test.go:221: Found unexpected problem 'Certificate subjectAltName
  "dev-myqnapcloud.com" equals "dev-myqnapcloud.com" from the public
  suffix list'.
```

With the workaround in place, the test passes again.
2018-04-04 12:20:43 -04:00
Roland Bracewell Shoemaker 8167abd5e3 Use internet facing appropriate histogram buckets for DNS latencies (#3616)
Also instead of repeating the same bucket definitions everywhere just use a single top level var in the metrics package in order to discourage copy/pasting.

Fixes #3607.
2018-04-04 08:01:54 -04:00
Daniel McCarney 703b134e93 WFE2: Wire missed config elements to WFE object. (#3604)
This commit addresses two config elements that were defined but not
wired through to the WFE implementation object. Prior to this commit the
`c.WFE.DirectoryCAAIdentity` and `c.WFE.DirectoryWebsite` configuration
values were read and unmarshaled from config but not passed to the WFE.
After this commit these two config options will be picked up by the WFE
impl.
2018-03-29 11:01:26 -07:00
Daniel McCarney 57d0141519 cert-checker: Ignore OCSP Must Staple certlint errs. (#3598)
The upstream `certlint` package doesn't understand the RFC 7633 OCSP
Must Staple PKIX Extension and flags its presence as an error. Until
this is resolved upstream this commit updates `cmd/cert-checker` to
ignore the error.

The `TestCheckCert` unit test is updated to add an unsupported extension
and the OCSP must staple extension to its test cert. Only the
unsupported extension should be flagged as a problem.
2018-03-26 10:30:57 -07:00
Daniel McCarney 17922a6d2d
Add CAAIdentities and Website to /directory "meta". (#3588)
This commit updates the WFE and WFE2 to have configuration support for
setting a value for the `/directory` object's "meta" field's
optional "caaIdentities" and "website" fields. The config-next wfe/wfe2
configuration are updated with values for these fields. Unit tests are
updated to check that they are sent when expected and not otherwise.

Bonus content: The `test.AssertUnmarshaledEquals` function had a bug
where it would consider two inputs equal when the # of keys differed.
This commit also fixes that bug.
2018-03-22 16:12:43 -04:00
Daniel McCarney f3a2fd85bc Remove deprecated SubscriberAgreementURL config field. (#3587)
The outer `config.SubscriberAgreementURL` field has been deprecated for
a while in favour of `config.wfe.SubscriberAgreementURL`. After
verifying the prod/staging configurations do not use the legacy field
this commit removes it.
2018-03-22 12:43:53 -07:00
Daniel McCarney 0c4e1daa46 WFE2 Chain File Loading Improvements (#3580)
* Reject WFE2 certificate chain PEM files with CRLF endings.

This commit updates the `boulder-wfe2` command's processing of
certificate chains such that it will reject chain files that contain PEM
encoding with Windows CRLF line endings. Boulder is a UNIX service and
throughout we assume UNIX newlines. CRLF endings in a certificate chain
input file is an error that should be resolved by the operator prior to
startup.

* Add trailing newline to PEM chainfiles automatically.

If a PEM encoded chain file doesn't end with a trailing `\n` the WFE2
should add it. This commit updates the chain file loading to handle this
and adds a corresponding unit test.
2018-03-20 14:54:20 -07:00
Jacob Hoffman-Andrews 65b88a8dbc Run certlint in cert-checker (#3550)
This pulls in the certlint dependency, which in turn pulls in publicsuffix as a dependency.

Fixes #3549
2018-03-15 17:42:58 +00:00
Daniel McCarney e00ed50cc3 Fix nil panic in mailer logger setup when DB conn fails. (#3545)
Prior to this commit, if the expiration-mailer database configuration is
invalid, or the database is unreachable,
`cmd/expiration-mailer/main.go`'s `main()` function tries to call
`sa.SetSQLDebug(dbMap, logger)` before erroring from the DB
initialization failure. This causes a nil panic.

This commit changes the order of two lines such that `sa.SetSQLDebug` is
only called when there was no db setup error.
2018-03-12 13:46:13 -07:00
Roland Bracewell Shoemaker 9c9e944759 Add SCT embedding (#3521)
Adds SCT embedding to the certificate issuance flow. When a issuance is requested a precertificate (the requested certificate but poisoned with the critical CT extension) is issued and submitted to the required CT logs. Once the SCTs for the precertificate have been collected a new certificate is issued with the poison extension replace with a SCT list extension containing the retrieved SCTs.

Fixes #2244, fixes #3492 and fixes #3429.
2018-03-12 11:58:30 -07:00
Jacob Hoffman-Andrews d654675223 Remove BaseURL from WFE config. (#3540)
For a long time now the WFE has generated URLs based on the incoming
request rather than a hardcoded BaseURL. BaseURL is no longer set in the
prod configs.

This also allows factoring out relativeEndpoint into the web package.
2018-03-09 11:04:02 +00:00
Jacob Hoffman-Andrews 5c4f5e346a Fix pprof handlers. (#3533)
Some of the pprof handlers have to be accessed through
pprof.Handler("string"), while some have to be accessed through an
exported var in pprof. We weren't doing the latter before, which meant
some key handlers like Profile weren't available.
2018-03-08 18:18:13 +00:00
Jacob Hoffman-Andrews 6b8b6a37c0 Update chisel2 and boulder-tools (#3495)
This change updates boulder-tools to use Go 1.10, and references a
newly-pushed image built using that new config.

Since boulder-tools pulls in the latest Certbot master at the time of
build, this also pulls in the latest changes to Certbot's acme module,
which now supports ACME v2. This means we no longer have to check out
the special acme-v2-integration branch in our integration tests.

This also updates chisel2.py to reflect some of the API changes that
landed in the acme module as it was merged to master.

Since we don't need additional checkouts to get the ACMEv2-compatible
version of the acme module, we can include it in the default RUN set for
local tests.
2018-02-28 15:21:40 -08:00
Jacob Hoffman-Andrews 9003dd4522 Improve test-tools. (#3481)
Use `t.Helper` and `t.Fatalf` instead of our own versions.

Remove some unused or single-user helpers.

Make the output of `AssetUnmarshaledEquals` clearer by showing one line per field.
2018-02-27 10:19:24 -05:00
Jacob Hoffman-Andrews 9da5a7e1fc Cleanup: TLS and GRPC configs are mandatory. (#3476)
Our various main.go functions gated some key code on whether the TLS
and/or GRPC config fields were present. Now that those fields are fully
deployed in production, we can simplify the code and require them.
    
Also, rename tls to tlsConfig everywhere to avoid confusion with the tls
package.
    
Avoid assigning to the same err from two different goroutines in
boulder-ca (fix a race).
2018-02-26 10:16:50 -05:00
Roland Bracewell Shoemaker 0b53063a72 ctpolicy: Add informational logs and don't cancel remaining submissions (#3472)
Add a set of logs which will be submitted to but not relied on for their SCTs,
this allows us to test submissions to a particular log or submit to a log which
is not yet approved by a browser/root program.

Also add a feature which stops cancellations of remaining submissions when racing
to get a SCT from a group of logs.

Additionally add an informational log that always times out in config-next.

Fixes #3464 and fixes #3465.
2018-02-23 21:51:50 -05:00
Daniel af97ccae18
Fix OCSPUpdater config parameter 2018-02-21 11:52:13 -05:00
Daniel 36de3bf000
Support Akamai CCU v3 API in cache-client.
This commit adds support for the Akamai CCU v3 API. See
https://developer.akamai.com/api/purge/ccu/resources.html for more information.

The V2 and V3 API are close enough to one another that we can support
both with minimal changes. A new OCSP updated configuration parameter
"AkamaiV3Network" is used to determine if the cache client should use
the V2 API or the V3 API. When empty, V2 is used. When set to either
"production" or "staging", the V3 API is used.
2018-02-21 11:41:32 -05:00
Jacob Hoffman-Andrews c556a1a20d
Reduce spurious errors in integration test (#3436)
Boulder is fairly noisy about gRPC connection errors. This is a mixed
blessing: Our gRPC configuration will try to reconnect until it hits
an RPC deadline, and most likely eventually succeed. In that case,
we don't consider those to really be errors. However, in cases where
a connection is repeatedly failing, we'd like to see errors in the
logs about connection failure, rather than "deadline exceeded." So
we want to keep logging of gRPC errors.

However, right now we get a lot of these errors logged during
integration tests. They make the output hard to read, and may disguise
more serious errors. So we'd like to avoid causing such errors in
normal integration test operation.

This change reorders the startup of Boulder components by their gRPC
dependencies, so everything's backend is likely to be up and running
before it starts. It also reverses that order for clean shutdowns,
and waits for each process to exit before signalling the next one.

With these changes, I still got connection errors. Taking listenbuddy
out of the gRPC path fixed them. I believe the issue is that
listenbuddy is not a truly transparent proxy. In particular, it
accepts an inbound TCP connection before opening an outbound TCP
connection. If opening that outbound connection results in "connection
refused," it closes the inbound connection. That means gRPC sees a
"connection closed" (or "connection reset"?) rather than "connection
refused". I'm guessing it handles those cases differently, explaining
the different error results.

We've been using listenbuddy to trigger disconnects while Boulder is
running, to ensure that gRPC's reconnect code works. I think we can
probably rely on gRPC's reconnect to work. The initial problem that
led us to start testing this was a configuration problem; now that
we have the configuration we want, we should be fine and don't need
to keep testing reconnects on every integration test run.
2018-02-12 18:17:50 -08:00