Commit Graph

152 Commits

Author SHA1 Message Date
Aaron Gable d9cb35c60c
Remove unused DBConnect config string (#6615)
Neither our testing, staging, nor production configs use the
DBConfig.DBConnect config value. Remove it.

To connect to a database, you have to provide a connection URL. These
URLs often contain sensitive information such as DB usernames and
passwords, so we don't store them directly in our configs -- instead, we
store paths to files which contain these strings, and provision those
files via a separate mechanism. We maintained the ability to provide a
URL directly in the config for the sake of easy testing, but have not
used it for that purpose for some time now.
2023-01-27 13:10:52 -08:00
Samantha 0d6f8569c5
grpc/rocsp: Allow use of TLSv1.2 and TLSv1.3 (#6600)
When we clamped our MaxVersion to TLS1.2, there wasn't any
support for TLS1.3 yet. Allowing higher versions to be negotiated
is good.

Fixes #6580
2023-01-24 12:53:13 -08:00
Aaron Gable 46c8d66c31
bgrpc.NewServer: support multiple services (#6487)
Turn bgrpc.NewServer into a builder-pattern, with a config-based
initialization, multiple calls to Add to add new gRPC services, and a
final call to Build to produce the start() and stop() functions which
control server behavior. All calls are chainable to produce compact code
in each component's main() function.

This improves the process of creating a new gRPC server in three ways:
1) It avoids the need for generics/templating, which was slightly
verbose.
2) It allows the set of services to be registered on this server to be
known ahead of time.
3) It greatly streamlines adding multiple services to the same server,
which we use today in the VA and will be using soon in the SA and CA.

While we're here, add a new per-service config stanza to the
GRPCServerConfig, so that individual services on the same server can
have their own configuration. For now, only provide a "ClientNames" key,
which will be used in a follow-up PR.

Part of #6454
2022-11-04 13:26:42 -07:00
Samantha 9c12e58c7b
grpc: Allow static host override in client config (#6423)
- Add a new gRPC client config field which overrides the dNSName checked in the
  certificate presented by the gRPC server.
- Revert all test gRPC credentials to `<service>.boulder`
- Revert all ClientNames in gRPC server configs to `<service>.boulder`
- Set all gRPC clients in `test/config` to use `serverAddress` + `hostOverride`
- Set all gRPC clients in `test/config-next` to use `srvLookup` + `hostOverride`
- Rename incorrect SRV record for `ca` with port `9096` to `ca-ocsp` 
- Rename incorrect SRV record for `ca` with port `9106` to `ca-crl` 

Resolves #6424
2022-10-03 15:23:55 -07:00
Samantha ffad58009e
grpc: Backend discovery improvements (#6394)
- Fork the default `dns` resolver from `go-grpc` to add backend discovery via
  DNS SRV resource records.
- Add new fields for SRV based discovery to `cmd.GRPCClientConfig`
- Add new (optional) field `DNSAuthority` for specifying custom DNS server to
  `cmd.GRPCClientConfig`
- Add a utility method to `cmd.GRPCClientConfig` to simplify target URI and host
  construction. With three schemes and `DNSAuthority` it makes more sense to
  handle all of this parsing and construction outside of the RPC client
  constructor.

Resolves #6111
2022-09-23 13:11:59 -07:00
Jacob Hoffman-Andrews db044a8822
log: fix spurious honeycomb warnings; improve stdout logger (#6364)
Honeycomb was emitting logs directly to stderr like this:

```
WARN: Missing API Key.
WARN: Dataset is ignored in favor of service name. Data will be sent to service name: boulder
```

Fix this by providing a fake API key and replacing "dataset" with "serviceName" in configs. Also add missing Honeycomb configs for crl-updater.

For stdout-only logger, include checksums and escape newlines.
2022-09-14 11:25:02 -07:00
Jacob Hoffman-Andrews f98d74c14d
log: emit warnings and errors on stderr (#6325)
Debug and Info messages still go to stdout.

Fix the CAA integration test, which asserted that stderr should be empty
when caa-log-checker finds a problem. That used to be the case because
we never logged to stderr, but now it is the case.

Update the logging docs.

Fixes #6324
2022-08-29 15:00:55 -07:00
Aaron Gable 9c197e1f43
Use io and os instead of deprecated ioutil (#6286)
The iotuil package has been deprecated since go1.16; the various
functions it provided now exist in the os and io packages. Replace all
instances of ioutil with either io or os, as appropriate.
2022-08-10 13:30:17 -07:00
Samantha 576b6777b5
grpc: Implement a static multiple IP address gRPC resolver (#6270)
- Implement a static resolver for the gPRC dialer under the scheme `static:///`
  which allows the dialer to resolve a backend from a static list of IPv4/IPv6
  addresses passed via the existing JSON config.
- Add config key `serverAddresses` to the `GRPCClientConfig` which, when
  populated, enables static IP resolution of gRPC server backends.
- Set `config-next` to use static gRPC backend resolution for all SA clients.
- Generate a new SA certificate which adds `10.77.77.77` and `10.88.88.88` to
  the SANs.

Resolves #6255
2022-08-05 10:20:57 -07:00
Aaron Gable 305ef9cce9
Improve error checking paradigm (#5920)
We have decided that we don't like the if err := call(); err != nil
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: It caused readers to believe that
a later err = call() statement was assigning to an already-declared err
in the local scope, when in fact it was assigning to an
already-declared err in the parent scope of a closure. This caused our
ineffassign and staticcheck linters to be unable to analyze the
lifetime of the err variable, and so they did not complain when we
never checked the actual value of that error.

This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
2022-02-01 14:42:43 -07:00
Samantha f69b57e0e1
Make DB client initialization uniform and stop setting 'READ-UNCOMMITTED' (#5741)
Boulder components initialize their gorp and gorp-less (non-wrapped) database
clients via two new SA helpers. These helpers handle client construction,
database metric initialization, and (for gorp only) debug logging setup.

Removes transaction isolation parameter `'READ-UNCOMMITTED'` from all database
connections.

Fixes #5715 
Fixes #5889
2022-01-31 13:34:23 -08:00
Aaron Gable 18389c9024
Remove dead code (#5893)
Running an older version (v0.0.1-2020.1.4) of `staticcheck` in
whole-program mode (`staticcheck --unused.whole-program=true -- ./...`)
finds various instances of unused code which don't normally show up
as CI issues. I've used this to find and remove a large chunk of the
unused code, to pave the way for additional large deletions accompanying
the WFE1 removal.

Part of #5681
2022-01-19 12:23:06 -08:00
Aaron Gable a6ad023c6a
Tracing: don't exclude gRPC client spans (#5664)
We'd prefer not to send these, as they don't contain a significant
amount of relevant information, and they nearly double the total
number of spans sent. However, not sending them is worse: the resulting
server spans created on the other side of the connection all have
parent span IDs that "don't exist" (are never sent to the tracing
service), breaking all ability of the tracing system to infer causality.

The ideal solution here would be to customize the client wrapper code
to not generate these client spans at all, but that requires upstream
changes that may take some time. In the mean time, let's at least do
this.
2021-09-21 13:44:30 -06:00
J.C. Jones 7b31bdb30a
Add read-only dbConns to SQLStorageAuthority and OCSPUpdater (#5555)
This changeset adds a second DB connect string for the SA for use in 
read-only queries that are not themselves dependencies for read-write 
queries. In other words, this is attempting to only catch things like 
rate-limit `SELECT`s and other coarse-counting, so we can potentially 
move those read queries off the read-write primary database.

It also adds a second DB connect string to the OCSP Updater. This is a 
little trickier, as the subsequent `UPDATE`s _are_ dependent on the 
output of the `SELECT`, but in this case it's operating on data batches,
and a few seconds' replication latency are several orders of magnitude 
below the threshold for update frequency, so any certificates that 
aren't caught on run `n` can be caught on run `n+1`.

Since we export DB metrics to Prometheus, this also refactors 
`InitDBMetrics` to take a DB Address (host:port tuple) and User out of 
the DB connection DSN and include those as labels in the metrics.

Fixes #5550
Fixes #4985
2021-08-02 11:21:34 -07:00
Aaron Gable 1f0f59883c
Honeycomb: emit sample rate 1 instead of 0 (#5448)
The sampling rate integer is used by the span collector to estimate
"how many spans does this span actually represent". This allows accurate
volume comparisons: for example, if you sample successful requests at
a rate of 1/100 and error requests at a rate of 1/10, the trace query
interface will know to scale its query results by those respective
values in order to arrive at accurate error rate estimates.

Previously, this code was returning a sample rate integer of 0 to
indicate that the span was selected for sampling due to an extraordinary
circumstance. This was wrong. This change updates the sample rate int
to be 1, indicating that every such span which exhibited this feature
was sampled, and represents only itself.
2021-06-01 08:42:14 -07:00
Aaron Gable 02b6ea1489
Honeycomb: don't transmit gRPC client spans (#5446)
Add a test to the Honeycomb SamplerHook which never sends
spans which have a "meta.type" of "grpc_client". 

This field and value are set automatically by the Honeycomb gRPC
client interceptor, and can't be set by application code (any fields
set by application code have "app." prepended to their name).

Never sending these spans reduces our visibility into in-datacenter
network latency, but also reduces the number of spans sent by
roughly 50%.
2021-05-28 09:36:01 -07:00
Aaron Gable e3d194f4b0
Honeycomb: Use a deterministic SamplerHook (#5433)
Switch from using the honeycomb beeline's built-in sampling
to a sampler hook which bases its sampling decisions on a
hash of the trace ID. This allows us to do "deterministic"
sampling, where every span in a given trace will either be
sent or not (since the trace ID is the same across all spans
in a trace), giving us more complete traces.

This preserves the same simple (single integer) configuration
of the sample rate. The sample rate can be set differently for
different boulder components (e.g. 1 at the WFE, 100 at the 
RA, and 1000 at the nonce-service), but the sampling rate
denominator should only increase towards the leaves of a
gRPC request path.
2021-05-27 13:13:54 -07:00
Aaron Gable 773c98875b
Remove PasswordConfig's in-config option (#5434)
We never really want to be using in-config passwords
anyway, so remove this option.

Fixes #5426
2021-05-26 10:14:13 -07:00
Aaron Gable 9abb39d4d6
Honeycomb integration proof-of-concept (#5408)
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).

Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.

Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
2021-05-24 16:13:08 -07:00
Samantha 5a92926b0c
Remove dbconfig migration deployability code (#5348)
Default boulder code paths to exclusively use the `db` config key

Fixes #5338
2021-03-18 16:41:15 -07:00
Jacob Hoffman-Andrews b4e483d38b
Add gRPC MaxConnectionAge config. (#5311)
This allows servers to tell clients to go away after some period of time, which triggers the clients to re-resolve DNS.

Per grpc/grpc#12295, this is the preferred way to do this.

Related: #5307.
2021-03-01 18:37:47 -08:00
Samantha e2e7dad034
Move cmd.DBConfig fields to their own named sub-struct (#5286)
Named field `DB`, in a each component configuration struct, acts as the
receiver for the value of `db` when component JSON files are
unmarshalled.

When `cmd.DBConfig` fields are received at the root of component
configuration struct instead of `DB` copy them to the `DB` field of the
component configuration struct.

Move existing `cmd.DBConfig` values from the root of each component's
JSON configuration in `test/config-next` to `db`

Part of #5275
2021-02-16 10:48:58 -08:00
Samantha 7cb0038498
Deprecate MaxDBConns for MaxOpenConns (#5274)
In #5235 we replaced MaxDBConns in favor of MaxOpenConns.

One week ago MaxDBConns was removed from all dev, staging, and
production configurations. This change completes the removal of
MaxDBConns from all components and test/config.

Fixes #5249
2021-02-08 12:00:01 -08:00
Samantha e0510056cc
Enhancements to SQL driver tuning via JSON config (#5235)
Historically the only database/sql driver setting exposed via JSON
config was maxDBConns. This change adds support for maxIdleConns,
connMaxLifetime, connMaxIdleTime, and renames maxDBConns to
maxOpenConns. The addition of these settings will give our SRE team a
convenient method for tuning the reuse/closure of database connections.

A new struct, DBSettings, has been added to SA. The struct, and each of
it's fields has been commented.

All new fields have been plumbed through to the relevant Boulder
components and exported as Prometheus metrics. Tests have been
added/modified to ensure that the fields are being set. There should be
no loss in coverage

Deployability concerns for the migration from maxDBConns to maxOpenConns
have been addressed with the temporary addition of the helper method
cmd.DBConfig.GetMaxOpenConns(). This method can be removed once
test/config is defaulted to using maxOpenConns. Relevant sections of the
code have TODOs added that link back to an newly opened issue.

Fixes #5199
2021-01-25 15:34:55 -08:00
Samantha 07aef67fa6
Refactoring tls.Config mutation out of grpc (#5175)
In all boulder services, we construct a single tls.Config object
and then pass it into multiple gRPC setup methods. In all boulder
services but one, we pass the object into multiple clients, and
just one server. In general, this is safe, because all of the client
setup happens on the main thread, and the server setup similarly
happens on the main thread before spinning off the gRPC server
goroutine.

In the CA, we do the above and pass the tlsConfig object into two
gRPC server setup functions. Thus the first server goroutine races
with the setup of the second server.

This change removes the post-hoc assignment of MinVersion,
MaxVersion, and CipherSuites of the tls.Config object passed
to grpc.ClientSetup and grpc.NewServer. And adds those same
values to the cmd.TLSConfig.Load, the method responsible for
constructing the tls.Config object before it's passed to
grpc.ClientSetup and grpc.NewServer.

Part of #5159
2020-11-12 16:24:16 -08:00
Samantha 9712d21aeb
config: replacing error assertions with errors.As (#5126)
errors.As checks for a specific error in a wrapped error chain
(see https://golang.org/pkg/errors/#As) as opposed to asserting
that an error is of a specific type.

Part of #5010
2020-10-13 18:05:00 -07:00
Aaron Gable e2c8f6743a
Introduce new core.AcmeChallenge type (#5012)
ACME Challenges are well-known strings ("http-01", "dns-01", and
"tlsalpn-01") identifying which kind of challenge should be used
to verify control of a domain. Because they are well-known and
only certain values are valid, it is better to represent them as
something more akin to an enum than as bare strings. This also
improves our ability to ensure that an AcmeChallenge is not
accidentally used as some other kind of string in a different
context. This change also brings them closer in line with the
existing core.AcmeResource and core.OCSPStatus string enums.

Fixes #5009
2020-08-11 15:02:16 -07:00
Daniel McCarney f1894f8d1d
tidy: typo fixes flagged by codespell (#4634) 2020-01-07 14:01:26 -05:00
Roland Bracewell Shoemaker 3532dce246 Excise grpc maxConcurrentStreams configuration (#4257) 2019-06-12 09:35:24 -04:00
Jacob Hoffman-Andrews 0759d2d440 cmd: Split out config structs (#4200)
This follows up on some refactoring we had done previously but not
completed. This removes various binary-specific config structs from the
common cmd package, and moves them into their appropriate packages. In
the case of CT configs, they had to be moved into their own package to
avoid a dependency loop between RA and ctpolicy.
2019-05-06 11:11:08 -04:00
Jacob Hoffman-Andrews 0c700143bb Clean up README and test configs (#4185)
- docker-rebuild isn't needed now that boulder and bhsm containers run directly off
 the boulder-tools image.
- Remove DNS options from RA config.
- Remove GSB options from VA config.
2019-04-30 13:26:19 -07:00
Jacob Hoffman-Andrews d1e6d0f190 Remove TLS-SNI-01 (#4114)
* Remove the challenge whitelist
* Reduce the signature for ChallengesFor and ChallengeTypeEnabled
* Some unit tests in the VA were changed from testing TLS-SNI to testing the same behavior
  in TLS-ALPN, when that behavior wasn't already tested. For instance timeouts during connect 
  are now tested.

Fixes #4109
2019-03-15 09:05:24 -04:00
Roland Bracewell Shoemaker 046955e99c Add a standalone akamai purger service (#4040)
Fixes #4030.
2019-02-05 09:00:31 -08:00
Daniel McCarney d1daeee831 Config: serverAddresses -> serverAddress. (#4035)
The plural `serverAddresses` field in gRPC config has been deprecated for a bit now. We've removed the last usages of it in our staging/prod environments and can clear out the related code. Moving forward we only support a singular `serverAddress` and rely on DNS to direct to multiple instances of a given server.
2019-01-25 10:50:53 -08:00
Jacob Hoffman-Andrews bb15685a0f
No new certificates tick (#4012)
Since #2633 we generate OCSP at first issuance, so we no longer need 
this loop to check for new certificates that need OCSP status generated.
Since the associate SQL query is slow, we should just turn it off.

Also remove the configuration fields for the MissingSCTTick. The code
for that was already deleted.
2019-01-17 14:43:06 -08:00
Jacob Hoffman-Andrews cdc01df24f Revert "Increase default MaxIdleConns. (#3164)" (#4007)
This reverts commit 600640294d,
removing the maxIdleDBConns config setting.
2019-01-16 08:41:21 -05:00
Roland Bracewell Shoemaker cdef80ce67
Remove Akamai CCU v2 support (#3994)
Fixes #3991.
2019-01-08 12:28:11 -08:00
Daniel McCarney 38684ef56e
Remove unused OCSP Updater Publisher config (#3978)
Since 3e61513364 (diff-fd09431b6c4f935692859733688d597f) this field of the `OCSPUpdaterConfig` struct has been unused.
2018-12-11 14:27:10 -05:00
Roland Bracewell Shoemaker 196f019851 Add support for temporal CT logs (#3853)
Required a little bit of rework of the RA issuance flow (to add parsing of the precert to determine the expiration date, and moving final cert parsing before final cert submission) and RA tests, but I think it shouldn't create any issues...

Fixes #3197.
2018-09-14 16:14:42 -07:00
Roland Bracewell Shoemaker 876c727b6f Update gRPC (#3817)
Fixes #3474.
2018-08-20 10:55:42 -04:00
Jacob Hoffman-Andrews 36a83150ad Add stagger to CT log submissions. (#3794)
This allows each log a chance to respond before we move onto the next,
spreading our load more evenly across the logs in a log group.
2018-07-06 16:25:51 -04:00
Roland Bracewell Shoemaker 1271a15be7 Submit final certs to CT logs (#3640)
Submits final certificates to any configured CT logs. This doesn't introduce a feature flag as it is config gated, any log we want to submit final certificates to needs to have it's log description updated to include the `"submitFinalCerts": true` field.

Fixes #3605.
2018-04-13 12:02:01 -04:00
Jacob Hoffman-Andrews 2a1cd4981a Allow configuring gRPC's MaxConcurrentStreams (#3642)
During periods of peak load, some RPCs are significantly delayed (on the order of seconds) by client-side blocking. HTTP/2 clients have to obey a "max concurrent streams" setting sent by the server. In Go's HTTP/2 implementation, this value [defaults to 250](https://github.com/golang/net/blob/master/http2/server.go#L56), so the gRPC default is also 250. So whenever there are more than 250 requests in progress at a time, additional requests will be delayed until there is a slot available.

During this peak load, we aren't hitting limits on CPU or memory, so we should increase the max concurrent streams limit to take better advantage of our available resources. This PR adds a config field to do that.

Fixes #3641.
2018-04-12 17:17:17 -04:00
Jacob Hoffman-Andrews 9da5a7e1fc Cleanup: TLS and GRPC configs are mandatory. (#3476)
Our various main.go functions gated some key code on whether the TLS
and/or GRPC config fields were present. Now that those fields are fully
deployed in production, we can simplify the code and require them.
    
Also, rename tls to tlsConfig everywhere to avoid confusion with the tls
package.
    
Avoid assigning to the same err from two different goroutines in
boulder-ca (fix a race).
2018-02-26 10:16:50 -05:00
Roland Bracewell Shoemaker 0b53063a72 ctpolicy: Add informational logs and don't cancel remaining submissions (#3472)
Add a set of logs which will be submitted to but not relied on for their SCTs,
this allows us to test submissions to a particular log or submit to a log which
is not yet approved by a browser/root program.

Also add a feature which stops cancellations of remaining submissions when racing
to get a SCT from a group of logs.

Additionally add an informational log that always times out in config-next.

Fixes #3464 and fixes #3465.
2018-02-23 21:51:50 -05:00
Daniel af97ccae18
Fix OCSPUpdater config parameter 2018-02-21 11:52:13 -05:00
Maciej Dębski 44984cd84a Implement regID whitelist for allowed challenge types. (#3352)
This updates the PA component to allow authorization challenge types that are globally disabled if the account ID owning the authorization is on a configured whitelist for that challenge type.
2018-01-10 13:44:53 -05:00
Jacob Hoffman-Andrews 600640294d Increase default MaxIdleConns. (#3164)
Go's default is 2: https://golang.org/src/database/sql/sql.go#L686.
Graphs show we are opening 100-200 fresh connections per second on the SA.
Changing this default should reduce that a lot, which should reduce load on both
the SA and MariaDB. This should also improve latency, since every new TCP
connection adds a little bit of latency.
2017-10-16 15:48:17 -07:00
Jacob Hoffman-Andrews fa716769e9 Move CAConfig out of cmd. (#3165)
We removed most of the component-specific configs out of cmd a long time ago. We
left CAConfig in, because it is used directly in the NewCertificateAuthority
constructor. That means that moving CAConfig into cmd/boulder-ca would have
resulted in a circular dependency.

Eventually we probably want to decompose CAConfig so it's a set of arguments to
NewCertificateAuthority, but as a short term improvement, move the config into
its own package to break the dependency. This has the advantage of removing a
couple of big dependencies from cmd.
2017-10-12 16:16:38 -07:00
Brian Smith 9d324631a7 CA: Set certificates' validity periods using the CA's clock. (#2983) 2017-09-14 09:40:31 -04:00