boulder

Commit Graph

Author	SHA1	Message	Date
Roland Bracewell Shoemaker	842739bccd	Remove deprecated features that have been purged from prod and staging configs (#4001 )	2019-01-15 16:16:35 -08:00
Roland Bracewell Shoemaker	ba7a8e8e5d	Add fake Akamai purge server for integration testing (#3946 ) Fixes #3916.	2018-11-27 09:49:05 -05:00
Daniel McCarney	3e61513364	config, config-next: remove deprecated ocsp-updater fields (#3884 )	2018-10-12 13:52:15 -04:00
Roland Bracewell Shoemaker	876c727b6f	Update gRPC (#3817 ) Fixes #3474.	2018-08-20 10:55:42 -04:00
Roland Bracewell Shoemaker	e27f370fd3	Excise code relating to pre-SCT embedding issuance flow (#3769 ) Things removed: * features.EmbedSCTs (and all the associated RA/CA/ocsp-updater code etc) * ca.enablePrecertificateFlow (and all the associated RA/CA code) * sa.AddSCTReceipt and sa.GetSCTReceipt RPCs * publisher.SubmitToCT and publisher.SubmitToSingleCT RPCs Fixes #3755.	2018-06-28 08:33:05 -04:00
Jacob Hoffman-Andrews	dbcb16543e	Start using multiple-IP hostnames for load balancing (#3687 ) We'd like to start using the DNS load balancer in the latest version of gRPC. That means putting all IPs for a service under a single hostname (or using a SRV record, but we're not taking that path). This change adds an sd-test-srv to act as our service discovery DNS service. It returns both Boulder IP addresses for any A lookup ending in ".boulder". This change also sets up the Docker DNS for our boulder container to defer to sd-test-srv when it doesn't know an answer. sd-test-srv doesn't know how to resolve public Internet names like `github.com`. Resolving public names is required for the `godep-restore` test phase, so this change breaks out a copy of the boulder container that is used only for `godep-restore`. This change implements a shim of a DNS resolver for gRPC, so that we can switch to DNS-based load balancing with the currently vendored gRPC, then when we upgrade to the latest gRPC we won't need a simultaneous config update. Also, this change introduces a check at the end of the integration test that each backend received at least one RPC, ensuring that we are not sending all load to a single backend.	2018-05-23 09:47:14 -04:00
Jacob Hoffman-Andrews	a4421ae75b	Run gRPC backends on multiple IPs instead of multiple ports (#3679 ) We're currently stuck on gRPC v1.1 because of a breaking change to certificate validation in gRPC 1.8. Our gRPC balancer uses a static list of multiple hostnames, and expects to validate against those hostnames. However gRPC expects that a service is one hostname, with multiple IP addresses, and validates all those IP addresses against the same hostname. See grpc/grpc-go#2012. If we follow gRPC's assumptions, we can rip out our custom Balancer and custom TransportCredentials, and will probably have a lower-friction time in general. This PR is the first step in doing so. In order to satisfy the "multiple IPs, one port" property of gRPC backends in our Docker container infrastructure, we switch to Docker's user-defined networking. This allows us to give the Boulder container multiple IP addresses on different local networks, and gives it different DNS aliases in each network. In startservers.py, each shard of a service listens on a different DNS alias for that service, and therefore a different IP address. The listening port for each shard of a service is now identical. This change also updates the gRPC service certificates. Now, each certificate that is used in a gRPC service (as opposed to something that is "only" a client) has three names. For instance, sa1.boulder, sa2.boulder, and sa.boulder (the generic service name). For now, we are validating against the specific hostnames. When we update our gRPC dependency, we will begin validating against the generic service name. Incidentally, the DNS aliases feature of Docker allows us to get rid of some hackery in entrypoint.sh that inserted entries into /etc/hosts. Note: Boulder now has a dependency on the DNS aliases feature in Docker. By default, docker-compose run creates a temporary container and doesn't assign any aliases to it. We now need to specify docker-compose run --use-aliases to get the correct behavior. Without --use-aliases, Boulder won't be able to resolve the hostnames it wants to bind to.	2018-05-07 10:38:31 -07:00
Roland Bracewell Shoemaker	0a86573a73	Update integration tests	2018-04-20 13:18:40 -07:00
Jacob Hoffman-Andrews	a4f9de9e35	Improve nesting of RPC deadlines (#3619 ) gRPC passes deadline information through the RPC boundary, but client and server have the same deadline. Ideally we'd like the server to have a slightly tighter deadline than the client, so if one of the server's onward RPCs or other network calls times out, the server can pass back more detailed information to the client, rather than the client timing out the server and losing the opportunity to log more detailed information about which component caused the timeout. In this change, I subtract 100ms from the deadline on the server side of our interceptors, using our existing serverInterceptor. I also check that there is at least 100ms remaining in which to do useful work, so the server doesn't begin a potentially expensive task only to abort it. Fixes #3608.	2018-04-06 15:40:18 +01:00
Roland Bracewell Shoemaker	9c9e944759	Add SCT embedding (#3521 ) Adds SCT embedding to the certificate issuance flow. When an issuance is requested, a precertificate (the requested certificate but poisoned with the critical CT extension) is issued and submitted to the required CT logs. Once the SCTs for the precertificate have been collected, a new certificate is issued with the poison extension replaced with an SCT list extension containing the retrieved SCTs. Fixes #2244, fixes #3492 and fixes #3429.	2018-03-12 11:58:30 -07:00
Jacob Hoffman-Andrews	c556a1a20d	Reduce spurious errors in integration test (#3436 ) Boulder is fairly noisy about gRPC connection errors. This is a mixed blessing: Our gRPC configuration will try to reconnect until it hits an RPC deadline, and most likely eventually succeed. In that case, we don't consider those to really be errors. However, in cases where a connection is repeatedly failing, we'd like to see errors in the logs about connection failure, rather than "deadline exceeded." So we want to keep logging of gRPC errors. However, right now we get a lot of these errors logged during integration tests. They make the output hard to read, and may disguise more serious errors. So we'd like to avoid causing such errors in normal integration test operation. This change reorders the startup of Boulder components by their gRPC dependencies, so everything's backend is likely to be up and running before it starts. It also reverses that order for clean shutdowns, and waits for each process to exit before signalling the next one. With these changes, I still got connection errors. Taking listenbuddy out of the gRPC path fixed them. I believe the issue is that listenbuddy is not a truly transparent proxy. In particular, it accepts an inbound TCP connection before opening an outbound TCP connection. If opening that outbound connection results in "connection refused," it closes the inbound connection. That means gRPC sees a "connection closed" (or "connection reset"?) rather than "connection refused". I'm guessing it handles those cases differently, explaining the different error results. We've been using listenbuddy to trigger disconnects while Boulder is running, to ensure that gRPC's reconnect code works. I think we can probably rely on gRPC's reconnect to work. The initial problem that led us to start testing this was a configuration problem; now that we have the configuration we want, we should be fine and don't need to keep testing reconnects on every integration test run.	2018-02-12 18:17:50 -08:00
Jacob Hoffman-Andrews	827f7859f2	Fix issuerCert in test configs. (#3310 ) Previously, there was a disagreement between WFE and CA as to what the correct issuer certificate was. Consolidate on test-ca2.pem (h2ppy h2cker fake CA). Also, the CA configs contained an outdated entry for "IssuerCert", which was not being used: The CA configs now use an "Issuers" array to allow signing by multiple issuer certificates at once (for instance when rolling intermediates). Removed this outdated entry, and the config code for CA to load it. I've confirmed these changes match what is currently in production. Added an integration test to check for this problem in the future. Fixes #3309, thanks to @icing for bringing the issue to our attention! This also includes changes from #3321 to clarify certificates for WFE.	2018-01-09 07:56:39 -05:00
Kleber Correia	c2156479dd	Remove ResubmitMissingSCTsOnly flag (#3042 ) Part of #2692	2017-09-06 10:30:30 -07:00
Jacob Hoffman-Andrews	b17b5c72a6	Remove statsd from Boulder (#2752 ) This removes the config and code to output to statsd. - Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`. - Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be. - Delete vendored statsd client. - Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere. - Remove a few unused methods on `metrics.Scope`, and update its generated mock. - Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given. - Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly. Fixes #2639, #2733.	2017-05-15 10:19:54 -04:00
Roland Bracewell Shoemaker	a46d30945c	Purge remaining AMQP code (#2648 ) Deletes github.com/streadway/amqp and the various RabbitMQ setup tools etc. Changes how listenbuddy is used to proxy all of the gRPC client -> server connections so we test reconnection logic. +49 -8,221 😁 Fixes #2640 and #2562.	2017-04-04 15:02:22 -07:00
Daniel McCarney	fcf361c327	Remove CertStatusOptimizationsMigrated Feature Flag & Assoc. Cruft (#2561 ) The NotAfter and IsExpired fields on the certificateStatus table have been migrated in staging & production. Similarly the CertStatusOptimizationsMigrated feature flag has been turned on after a successful backfill operation. We have confirmed the optimization is working as expected and can now clean out the duplicated v1 and v2 models, and the feature flag branching. The notafter-backfill command is no longer useful and so this commit also cleans it out of the repo. Note: Some unit tests were sidestepping the SA and inserting certificateStatus rows explicitly. These tests had to be updated to set the NotAfter field in order for the queries used by the ocsp-updater and the expiration-mailer to perform the way the tests originally expected. Resolves #2530	2017-02-16 11:35:00 -08:00
Roland Bracewell Shoemaker	18de73f0d8	Pass nil errors through boulder/grpc wrapError/unwrapError (#2544 ) Instead of trying to wrap or unwrap them which causes panics. Also, expand the test_ct_submission integration test to include resubmissions.	2017-02-06 18:19:39 -08:00
Daniel	e88db3cd5e	Revert "Revert "Copy all statsd stats to Prometheus. (#2474 )" (#2541 )" This reverts commit `9d9e4941a5` and restores the statsd prometheus code.	2017-02-01 15:48:18 -05:00
Daniel McCarney	9d9e4941a5	Revert "Copy all statsd stats to Prometheus. (#2474 )" (#2541 ) This reverts commit `58ccd7a71a`. We are seeing multiple boulder components restart when they encounter the stat registration race condition described in https://github.com/letsencrypt/boulder/issues/2540	2017-02-01 12:50:27 -05:00
Jacob Hoffman-Andrews	58ccd7a71a	Copy all statsd stats to Prometheus. (#2474 ) We have a number of stats already expressed using the statsd interface. During the switchover period to direct Prometheus collection, we'd like to make those stats available both ways. This change automatically exports any stats exported using the statsd interface via Prometheus as well. This is a little tricky because Prometheus expects all stats to be registered exactly once. Prometheus does offer a mechanism to gracefully recover from registering a stat more than once by handling a certain error, but it is not safe for concurrent access. So I added a concurrency-safe wrapper that creates Prometheus stats on demand and memoizes them. In the process, made a few small required side changes: - Clean "/" from method names in the gRPC interceptors. They are allowed in statsd but not in Prometheus. - Replace "127.0.0.1" with "boulder" as the name of our testing CT log. Prometheus stats can't start with a number. - Remove ":" from the CT-log stat names emitted by Publisher. Prometheus stats can't include it. - Remove a stray "RA" in front of some rate limit stats, since it was duplicative (we were emitting "RA.RA..." before). Note that this means two stat groups in particular are duplicated: - Gostats* is duplicated with the default process-level stats exported by the Prometheus library. - gRPCClient* are duplicated by the stats generated by the go-grpc-prometheus package. When writing dashboards and alerts in the Prometheus world, we should be careful to avoid these two categories, as they will disappear eventually. As a general rule, if a stat is available with an all-lowercase name, choose that one, as it is probably the Prometheus-native version. In the long run we will want to create most stats using the native Prometheus stat interface, since it allows us to add labels to metrics, which is very useful. For instance, currently our DNS stats distinguish types of queries by appending the type to the stat name. This would be more natural as a label in Prometheus.	2017-01-10 10:30:15 -05:00
Jacob Hoffman-Andrews	510e279208	Simplify gRPC TLS configs. (#2470 ) Previously, a given binary would have three TLS config fields (CA cert, cert, key) for its gRPC server, plus each of its configured gRPC clients. In typical use, we expect all three of those to be the same across both servers and clients within a given binary. This change reuses the TLSConfig type already defined for use with AMQP, adds a Load() convenience function that turns it into a *tls.Config, and configures it for use with all of the binaries. This should make configuration easier and more robust, since it more closely matches usage. This change preserves temporary backwards-compatibility for the ocsp-updater->publisher RPCs, since those are the only instances of gRPC currently enabled in production.	2017-01-06 14:19:18 -08:00
Jacob Hoffman-Andrews	9b8dacab03	Split out separate RPC services for issuing and for signing OCSP (#2452 ) This allows finer-grained control of which components can request issuance. The OCSP Updater should not be able to request issuance. Also, update test/grpc-creds/generate.sh to reissue the certs properly. Resolves #2417	2017-01-05 15:08:39 -08:00
Jacob Hoffman-Andrews	0c665b2053	Split up gRPC certificates by service. (#2453 ) Previously, all gRPC services used the same client and server certificates. Now, each service has its own certificate, which it uses for both client and server authentication, more closely simulating production. This also adds aliases for each of the relevant hostnames in /etc/hosts. There may be some issues if Docker decides to rewrite /etc/hosts while Boulder is running, but this seems to work for now.	2016-12-29 14:53:59 -08:00
Daniel McCarney	32890656b8	`certificateStatus` table optimizations (Part Three) (#2431 ) Following on to https://github.com/letsencrypt/boulder/pull/2177 and https://github.com/letsencrypt/boulder/issues/2227 this PR adds code to the `ocsp-updater` that takes advantage of the migrations & backfill from the previous optimization PRs. This has the primary effect of removing the `JOIN` on the `certificates` table in the `findStaleOCSPResponses` query. We expect this to be a big win in terms of query performance. The `ocsp-updater` is also updated to opportunistically fill in the newly added `isExpired` field of the `CertificateStatus` table as it encounters rows that aren't marked as expired but correspond to an expired certificate. Resolves https://github.com/letsencrypt/boulder/issues/2238 and #2239	2016-12-15 12:53:43 -08:00
Jacob Hoffman-Andrews	26cf552ff9	Sign OCSP in parallel for better performance. (#2422 ) Previously all OCSP signing and storage would be serial, which meant it was hard to exercise the full capacity of our HSM. In this change, we run a limited number of update and store requests in parallel. This change also changes stats generation in generateOCSPResponses so we can tell the difference between stats produced by new OCSP requests vs existing ones, and adds a new stat that records how long the SQL query in findStaleOCSPResponses takes.	2016-12-12 17:22:44 -08:00
Daniel McCarney	6ec93157f7	OCSP Updater "stale max age" parameter. (#2419 ) This PR adds a new `OCSPStaleMaxAge` configuration parameter to the `ocsp-updater`. The default value when not provided is 30 days, and this is explicitly added to both `config/ocsp-updater.json` and `config-next/ocsp-updater.json`. The OCSP updater uses this new parameter in `findStaleOCSPResponses` as a lower bound on the `ocspLastUpdated` field of the certificateStatus table. This is intended to speed up the processing of this query until we can land the proper fixes that require more intensive migrations & backfilling. The `TestGenerateOCSPResponses` and `TestFindStaleOCSPResponses` unit tests had to be updated to explicitly set the `ocspLastUpdated` field of the certificate status rows that the tests add, because otherwise they are left at a default value of `0` and are excluded by the new `OCSPStaleMaxAge` functionality.	2016-12-12 15:57:59 -05:00
Jacob Hoffman-Andrews	1c1449b284	Improvements to tests and test configs. (#2396 ) - Remove spinner from test.js. It made Travis logs hard to read. - Listen on all interfaces for debugAddr. This makes it possible to check Prometheus metrics for instances running in a Docker container. - Standardize DNS timeouts on 1s and 3 retries across all configs. This ensures DNS completes within the relevant RPC timeouts. - Remove RA service queue from VA, since VA no longer uses the callback to RA on completing a challenge.	2016-12-05 14:35:27 -08:00
Daniel McCarney	a2b8faea1e	Only resubmit missing SCTs. (#2342 ) This PR introduces the ability for the ocsp-updater to only resubmit certificates to logs that we are missing SCTs from. Prior to this commit when a certificate was missing one or more SCTs we would submit it to every log, causing unnecessary overhead for us and the log operator. To accomplish this a new RPC endpoint is added to the Publisher service "SubmitToSingleCT". Unlike the existing "SubmitToCT" this RPC endpoint accepts a log URI and public key in addition to the certificate DER bytes. The certificate is submitted directly to that log, and a cache of constructed resources is maintained so that subsequent submissions to the same log can reuse the stat name, verifier, and submission client. Resolves #1679	2016-12-05 13:54:02 -08:00
Roland Bracewell Shoemaker	03fdd65bfe	Add gRPC server to SA (#2374 ) Adds a gRPC server to the SA and SA gRPC Clients to the WFE, RA, CA, Publisher, OCSP updater, orphan finder, admin revoker, and expiration mailer. Also adds a CA gRPC client to the OCSP Updater which was missed in #2193. Fixes #2347.	2016-12-02 17:24:46 -08:00
Daniel McCarney	6c983e8c9e	Implements client whitelisting for gRPC. (#2307 ) As described in #2282, our gRPC code uses mutual TLS to authenticate both clients and servers. However, currently our gRPC servers will accept any client certificate signed by the internal CA we use to authenticate connections. Instead, we would like each server to have a list of which clients it will accept. This will improve security by preventing the compromise of one client private key being used to access endpoints unrelated to its intended scope/purpose. This PR implements support for gRPC servers to specify a list of accepted client names. A `serverTransportCredentials` implementing `ServerHandshake` uses a `verifyClient` function to enforce that the connecting peer presents a client certificate with a SAN entry that matches an entry on the list of accepted client names. The `NewServer` function from `grpc/server.go` is updated to instantiate the `serverTransportCredentials` used by `grpc.NewServer`, specifying an accepted names list populated from the `cmd.GRPCServerConfig.ClientNames` config field. The pre-existing client and server certificates in `test/grpc-creds/` are replaced by versions that contain SAN entries as well as subject common names. A DNS and an IP SAN entry are added to allow testing both methods of specifying allowed SANs. The `generate.sh` script is converted to use @jsha's `minica` tool (OpenSSL CLI is blech!). An example client whitelist is added to each of the existing gRPC endpoints in config-next/ to allow the SAN of the test RPC client certificate. Resolves #2282	2016-11-08 13:57:34 -05:00
Ben Irving	653cc004d0	Split Boulder Config (OCSP Updater) (#2013 )	2016-07-06 10:00:52 -04:00

31 Commits
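
The deadline nesting described in commit a4f9de9e35 (#3619) can be sketched roughly as follows. This is a language-neutral illustration of the arithmetic in Python, not Boulder's Go `serverInterceptor`. The names `tighten_deadline`, `SERVER_MARGIN_MS`, and `DeadlineTooSoon` are hypothetical, and only the 100ms figure comes from the commit message. The sketch also assumes the "at least 100ms remaining" check applies to the time left after the margin has been subtracted; the commit message does not say which way it is done.

```python
# Hypothetical constant: the 100ms margin mentioned in the commit message.
SERVER_MARGIN_MS = 100


class DeadlineTooSoon(Exception):
    """Raised when too little time remains to start useful work."""


def tighten_deadline(client_deadline_ms: int, now_ms: int) -> int:
    """Return the server-side deadline for a request.

    Both arguments are absolute times in integer milliseconds on the same
    clock. The server deadline is set earlier than the client's, so the
    server times out first and can report which onward call was slow.
    """
    server_deadline_ms = client_deadline_ms - SERVER_MARGIN_MS
    # Assumption: the check is made against the tightened deadline.
    # Refuse to start work if less than the margin would remain.
    if server_deadline_ms - now_ms < SERVER_MARGIN_MS:
        raise DeadlineTooSoon("not enough time left to do useful work")
    return server_deadline_ms
```

With a client deadline 10 seconds away, the server would work against a deadline 9.9 seconds away; with only 150ms left, the request is rejected up front rather than started and aborted.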