Commit Graph

276 Commits

Author SHA1 Message Date
Samantha 90eb90bdbe
test: Replace sd-test-srv with consul (#6389)
- Add a dedicated Consul container
- Replace `sd-test-srv` with Consul
- Add documentation for configuring Consul
- Re-issue all gRPC credentials for `<service-name>.service.consul`

Part of #6111
2022-09-19 16:13:53 -07:00
Jacob Hoffman-Andrews db044a8822
log: fix spurious honeycomb warnings; improve stdout logger (#6364)
Honeycomb was emitting logs directly to stderr like this:

```
WARN: Missing API Key.
WARN: Dataset is ignored in favor of service name. Data will be sent to service name: boulder
```

Fix this by providing a fake API key and replacing "dataset" with "serviceName" in configs. Also add missing Honeycomb configs for crl-updater.

For stdout-only logger, include checksums and escape newlines.
2022-09-14 11:25:02 -07:00
Aaron Gable 7f189f7a3b
Improve how crl-updater formats and surfaces errors (#6369)
Make every function in the Run -> Tick -> tickIssuer -> tickShard chain
return an error. Make that return value a named return (which we usually
avoid) so that we can remove the manual setting of the metric result
label and have the deferred metric handling function take care of that
instead. In addition, let that cleanup function wrap the returned error
(if any) with the identity of the shard, issuer, or tick that is
returning it, so that we don't have to include that info in every
individual error message. Finally, have the functions which spin off
many helpers (Tick and tickIssuer) collect all of their helpers' errors
and only surface that error at the end, to ensure the process completes
even in the presence of transient errors.

In crl-updater's main, surface the error returned by Run or Tick, to
make debugging easier.
2022-09-12 11:36:42 -07:00
Aaron Gable 78fbda1cd2
Enable CRL test in config integration tests (#6368)
Now that both crl-updater and crl-storer are running in prod,
run this integration test in both test environments as well.

In addition, remove the fake storer grpc client that the updater
used when no storer client was configured, as storer clients
are now configured in all environments.
2022-09-09 16:03:49 -07:00
Jacob Hoffman-Andrews 6ad06789d9
rocsp-tool: add "get-pem" output (#6317)
Emit PEM output instead of pretty-printed output. Send the pretty-printed
output straight to stdout instead of via a logger, so the internal newlines don't
get escaped.

Fixes #6310
2022-08-25 12:52:58 -07:00
Aaron Gable c1be8cfc52
crl-storer: load whole AWS config files (#6309)
Allow the crl-storer to load whole AWS config files. Although
this requires a deployment to maintain an additional config
files for the crl-storer, and one in a format we usually don't
use, it does give us lots of flexibility in setting up things like
role assumption.

Also remove the S3Region config flag, as it is now redundant
with the contents of the config file, and rename the existing
S3CredsFile config key to AWSCredsFile to better represent
its true contents.

Fixes #6308
2022-08-23 11:04:12 -07:00
Aaron Gable b001af71e8
Add new services to log-validator test config (#6303)
Fixes #6289
2022-08-17 16:46:11 -07:00
Aaron Gable 6a9bb399f7
Create new crl-storer service (#6264)
Create a new crl-storer service, which receives CRL shards via gRPC and
uploads them to an S3 bucket. It ignores AWS SDK configuration in the
usual places, in favor of configuration from our standard JSON service
config files. It ensures that the CRLs it receives parse and are signed
by the appropriate issuer before uploading them.

Integrate crl-updater with the new service. It streams bytes to the
crl-storer as it receives them from the CA, without performing any
checking at the same time. This new functionality is disabled if the
crl-updater does not have a config stanza instructing it how to connect
to the crl-storer.

Finally, add a new test component, the s3-test-srv. This acts similarly
to the existing mail-test-srv: it receives requests, stores information
about them, and exposes that information for later querying by the
integration test. The integration test uses this to ensure that a
newly-revoked certificate does show up in the next generation of CRLs
produced.

Fixes #6162
2022-08-08 16:22:48 -07:00
Aaron Gable 694d73d67b
crl-updater: add UpdateOffset config to run on a schedule (#6260)
Add a new config key `UpdateOffset` to crl-updater, which causes it to
run on a regular schedule rather than running immediately upon startup
and then every `UpdatePeriod` after that. It is safe for this new config
key to be omitted and take the default zero value.

Also add a new command line flag `runOnce` to crl-updater which causes
it to immediately run a single time and then exit, rather than running
continuously as a daemon. This will be useful for integration tests and
emergency situations.

Part of #6163
2022-07-29 13:30:16 -07:00
Aaron Gable 9ae16edf51
Fix race condition in revocation integration tests (#6253)
Add a new filter to mail-test-srv, allowing test processes to query
for messages sent from a specific address, not just ones sent to
a specific address. This fixes a race condition in the revocation
integration tests where the number of messages sent to a cert's
contact address would be higher than expected because expiration
mailer sent a message while the test was running. Also reduce
bad-key-revoker's maximum backoff to 2 seconds to ensure that
it continues to run frequently during the integration tests, despite
usually not having any work to do.

While we're here, also improve the comments on various revocation
integration tests, remove some unnecessary cruft, and split the tests
out to explicitly test functionality with the MozRevocationReasons
flag both enabled and disabled. Also, change ocsp_helper's default
output from os.Stdout to ioutil.Discard to prevent hundreds of lines
of log spam when the integration tests fail during a test that uses
that library.

Fixes #6248
2022-07-29 09:23:50 -07:00
Jacob Hoffman-Andrews 3b09571e70
ocsp-responder: add LiveSigningPeriod (#6237)
Previously we used "ExpectedFreshness" to control how frequently the
Redis source would request re-signing of stale entries. But that field
also controls whether multi_source is willing to serve a MariaDB
response. It's better to split these into two values.
2022-07-20 15:36:38 -07:00
Aaron Gable 436061fb35
CRL: Create crl-updater service (#6212)
Create a new service named crl-updater. It is responsible for
maintaining the full set of CRLs we issue: one "full and complete" CRL
for each currently-active Issuer, split into a number of "shards" which
are essentially CRLs with arbitrary scopes.

The crl-updater is modeled after the ocsp-updater: it is a long-running
standalone service that wakes up periodically, does a large amount of
work in parallel, and then sleeps. The period at which it wakes to do
work is configurable. Unlike the ocsp-responder, it does all of its work
every time it wakes, so we expect to set the update frequency at 6-24
hours.

Maintaining CRL scopes is done statelessly. Every certificate belongs to
a specific "bucket", given its notAfter date. This mapping is generally
unchanging over the life of the certificate, so revoked certificate
entries will not be moving between shards upon every update. The only
exception is if we change the number of shards, in which case all of the
bucket boundaries will be recomputed. For more details, see the comment
on `getShardBoundaries`.

It uses the new SA.GetRevokedCerts method to collect all of the revoked
certificates whose notAfter timestamps fall within the boundaries of
each shard's time-bucket. It uses the new CA.GenerateCRL method to sign
the CRLs. In the future, it will send signed CRLs to the crl-storer to
be persisted outside our infrastructure.

Fixes #6163
2022-07-08 09:34:51 -07:00
Jacob Hoffman-Andrews fda4124471
expiration-mailer: truncate serials and dns names (#6148)
This avoids sending excessively large emails and excessively large log
lines.

Fixes #6085
2022-06-14 15:48:00 -07:00
Aaron Gable f7ab64f05b
Remove last references to CFSSL (#6155)
Just a docs and config cleanup.
2022-06-14 14:22:34 -07:00
Jacob Hoffman-Andrews 4467cf27db
Update config from config-next (#6051)
This copies over settings from config-next that are now deployed in prod.

Also, I updated a comment in sd-test-srv to more accurately describe how SRV records work.
2022-04-19 12:10:26 -07:00
Samantha 7c22b99d63
akamai-purger: Improve throughput and configuration safety (#6006)
- Add new configuration key `throughput`, a mapping which contains all
  throughput related akamai-purger settings.
- Deprecate configuration key `purgeInterval` in favor of `purgeBatchInterval` in
  the new `throughput` configuration mapping.
- When no `throughput` or `purgeInterval` is provided, the purger uses optimized
  default settings which offer 1.9x the throughput of current production settings.
- At startup, all throughput related settings are modeled to ensure that we
  don't exceed the limits imposed on us by Akamai.
- Queue is now `[][]string`, instead of `[]string`.
  - When a given queue entry is purged we know all 3 of it's URLs were purged.
  - At startup we know the size of a theoretical request to purge based on the
    number of queue entries included
- Raises the queue size from ~333-thousand cached OCSP responses to
  1.25-million, which is roughly 6 hours of work using the optimized default
  settings
- Raise `purgeInterval` in test config from 1ms, which violates API limits, to 800ms

Fixes #5984
2022-03-23 17:23:07 -07:00
Samantha 3e9eaf84ea
rocsp-tool: Add syslog support (#6010)
Add a logging stanza to rocsp-tool's config, and initialize a boulder
logger rather than using Go's default log facilities.

Fixes #5976
2022-03-21 14:51:56 -07:00
Aaron Gable 910dde95f6
Clean up goodkey configs (#5993)
Fixes https://github.com/letsencrypt/boulder/issues/5851
2022-03-15 15:26:19 -07:00
Andrew Gabbitas d006588f46
Orphan finder: Fix redundant syslog config value (#5971)
Replace redundant stdoutlevel with a sysloglevel value in test configs.
2022-02-24 14:24:03 -08:00
Aaron Gable 5c02deabfb
Remove wfe1 integration tests (#5840)
These tests are testing functionality that is no longer in use in
production deployments of Boulder. As we go about removing wfe1
functionality, these tests will break, so let's just remove them
wholesale right now. I have verified that all of the tests removed in
this PR are duplicated against wfe2.

One of the changes in this PR is to cease starting up the wfe1 process
in the integration tests at all. However, that component was serving
requests for the AIA Issuer URL, which gets queried by various OCSP and
revocation tests. In order to keep those tests working, this change also
adds an integration-test-only handler to wfe2, and updates the CA
configuration to point at the new handler.

Part of #5681
2021-12-10 12:40:22 -08:00
Aaron Gable 316ebb44ea
Enable GetAuthzReadOnly flag in prod tests (#5824)
This flag has been enabled in prod. Not deprecating it yet because it
hasn't been live for very long.
2021-12-01 14:47:51 -08:00
Jacob Hoffman-Andrews 2b21586573
rocsp-tool: cursor scans in load-from-db (#5821)
This is necessary because if a single query response gets too big,
MariaDB will terminate it.
2021-12-01 13:41:17 -08:00
Jacob Hoffman-Andrews 4f1934af82
Add load-from-db support to rocsp-tool (#5778)
This scans the database for certificateStatus rows, gets them signed by the CA, and writes them to Redis.

Also, bump the default PoolSize for Redis to 100.
2021-11-08 17:35:10 -08:00
Jacob Hoffman-Andrews 7fab32a000
Add rocsp-tool to manually store OCSP responses in Redis (#5758)
This is a sort of proof of concept of the Redis interaction, which will
evolve into a tool for inspection and manual repair of missing entries,
if we find ourselves needing to do that.

The important bits here are rocsp/rocsp.go and
cmd/rocsp-tool/main.go. Also, the newly-vendored Redis client.
2021-11-02 11:04:03 -07:00
Jacob Hoffman-Andrews ba0ea090b2
integration: save hierarchy across runs (#5729)
This allows repeated runs using the same hiearchy, and avoids spurious
errors from ocsp-updater saying "This CA doesn't have an issuer cert
with ID XXX"

Fixes #5721
2021-10-20 17:06:33 -07:00
Jacob Hoffman-Andrews dc742fc320
Fix expiration-mailer integration test locally. (#5719)
The expiration mailer processes certificates in batches of size
`certLimit` (default 100). In production, it runs in daemon mode, so it
will go on to the next batch when the current one is done. However, in
local integration tests we rely on it getting all its work done in a
single run. This works when you're running from a clean slate, but if
you've run integration tests a bunch of times, there will be a bunch of
certificates from previous runs that clog up the queue, and it won't
send mail for the specific certificate the integration test is looking
for.

Solution: Set `certLimit` very high in the config.

Also, update the default times for sending mail to match what we have in
prod.
2021-10-18 19:51:34 -07:00
Aaron Gable 3f3f250212
Sync RA feature flags (#5678)
These flags are enabled in both prod and staging, so let's enable them
in our integration tests.
2021-09-30 11:00:41 -07:00
Aaron Gable e0c3e2c1df
Reject unrecognized config keys (#5649)
Instead of using the default `json.Unmarshal`, explicitly
construct and use a `json.Decoder` so that we can set the
`DisallowUnknownFields` flag on the decoder. This causes
any unrecognized config keys to result in errors at boulder
startup time.

Fixes #5643
2021-09-24 10:13:44 -07:00
Aaron Gable 8a70bff2b4
Deprecate cert-checker CLI flags (#5511)
Throw away the result of parsing various command-line flags in
cert-checker. Leave the flags themselves in place to avoid breaking
any scripts which pass them, but only respect the values provided by
the config file.

Part of #5489
2021-08-16 10:12:27 -07:00
Aaron Gable aad7fae228
Synchronize test configs for deployed changes (#5574)
These config changes have been deployed in prod, and
can be synchronized between our config and config-next
test environments.
2021-08-16 08:43:16 -07:00
Aaron Gable 1c6842cf69
Delete expired-authz-purger2 (#5570)
Delete the expired-authz-purger2 binary, as well as the
various config files, tests, and test helpers that exist to
support it.

This utility is no longer necessary, as it has not been
running for quite some time, and we have developed
alternative means of keeping the growth of the authz
table under control.

Fixes #5568
2021-08-11 14:39:57 -07:00
Aaron Gable ac3e5e70c4
Delete boulder-janitor (#5571)
Delete the boulder-janitor binary, and the various configs
and tests which exist to support it.

This tool has not been actively running in quite some time.
The tables which is covers are either supported by our
more recent partitioning methods, or are rate-limit tables
that we hope to move out of mysql entirely. The cost of
maintaining the janitor is not offset by the benefits it brings
us (or the lack thereof).

Fixes #5569
2021-08-11 11:10:24 -07:00
Aaron Gable 20f1bf1d0d
Compute validity periods inclusive of notAfter second (#5494)
In the CA, compute the notAfter timestamp such that the cert is actually
valid for the intended duration, not for one second longer. In the
Issuance library, compute the validity period by including the full
length of the final second indicated by the notAfter date when
determining if the certificate request matches our profile. Update tests
and config files to match.

Fixes #5473
2021-06-24 13:17:29 -07:00
Aaron Gable 6e1357efa3
Update boulder test validity period to match prod (#5493)
In prod, the CA is now configured to issue certificates with
notAfter timestamps 7775999 seconds after their notBefore
timestamp, and to enforce that same difference when validating
issuance requests.

Update our test configs to match.
2021-06-16 18:08:57 -07:00
Samantha be1c24165e
test: Fix uppercase ECDSAAllowListFilename in test JSON configs (#5487) 2021-06-16 14:24:30 -07:00
Andrew Gabbitas b5aab29407
Make boulder-observer HTTP User-Agent configurable (#5484)
- Make User-Agent configurable in config file
- Fix README example
- Add tests
2021-06-14 11:08:18 -06:00
Samantha d574b50c41
CA: Deprecate field ECDSAAllowedAccounts (#5477)
- Remove field `ECDSAAllowedAccounts` from CA
- Remove `ECDSAAllowedAccounts` from CA tests
- Replace `ECDSAAllowedAccounts` with `ECDSAAllowListFilename` in
  `test/config/ca-a.json` and `test/config/ca-b.json`
- Add YAML allow list file at `test/config/ecdsaAllowList.yml`

Fixes #5394
2021-06-11 12:13:01 -07:00
Samantha 6955df0f56
contact-auditor: Add tool to audit registration contacts (#5425)
Add tool to audit subscriber registrations for e-mail addresses that
`notify-mailer` is currently configured to skip.

- Add `cmd/contact-auditor` with README
- Add test coverage for `cmd/contact-auditor`
- Add config file at `test/config/contact-auditor`

Part of #5372
2021-06-07 14:21:54 -07:00
Aaron Gable 9abb39d4d6
Honeycomb integration proof-of-concept (#5408)
Add Honeycomb tracing to all Boulder components which act as
HTTP servers, gRPC servers, or gRPC clients. Add many values
which we currently emit to logs to the trace spans. Add a way to
configure the Honeycomb integration to our config files, and by
default configure all of our tests to "mute" (send nothing).

Followup changes will refine the configuration, attempt to reduce
the new dependency load, and introduce better sampling.

Part of https://github.com/letsencrypt/dev-misc-tickets/issues/218
2021-05-24 16:13:08 -07:00
Samantha 1f19eee55b
CA: Fix startup bug caused by ECDSA allow list reloader (#5412)
Solve a nil pointer dereference of `ecdsaAllowList` in `boulder-ca` by
calling `reloader.New()` in constructor `ca.NewECDSAAllowListFromFile`
instead.

- Add missing entry `ECDSAAllowListFilename` to
  `test/config-next/ca-a.json` and `test/config-next/ca-b.json`
- Add missing file ecdsaAllowList.yml to `test/config-next`
- Add missing entry `ECDSAAllowedAccounts` to `test/config/ca-a.json`
  and `test/config/ca-b.json`
- Move creation of the reloader to `NewECDSAAllowListFromFile`

Fixes #5414
2021-05-17 14:41:15 -07:00
Aaron Gable 6e6be607fa
Deprecate StoreIssuerInfo flag (#5386)
This flag is no longer referenced by any code, and can
be safely deprecated.

Part of #5079
2021-04-13 17:18:01 -07:00
Aaron Gable b246d9cc45
Remove certDER OCSP generation code path from CA (#5117)
Only process OCSP generation requests which are identified
by the certificate's serial number and the ID (not NameID,
unfortunately) of its issuer. Delete the code path which handled
OCSP generation for requests identified by the full DER of
the certificate in question.

Update existing tests to use serial+id to request OCSP, and
move test cases from the old `TestGenerateOCSPWithIssuerID`
into the default test method.

Part of #5079
2021-04-09 16:08:05 -07:00
Samantha 35340ff67a
Move expired-authz-purger2 config to test directory (#5352)
- Edit integration test to start expired-authz-purger2 with config/
  config-next
- Move config from `cmd/expired-authz-purger2/config.json` to
  `test/config/expired-authz-purger2.json`
- Add a copy of `test/config/expired-authz-purger2.json` to
  `test/config-next/`

Fixes #5351
2021-03-18 17:56:25 -07:00
Aaron Gable f569b15b64
Remove common config from ocsp-responder (#5350)
The old `config.Common.IssuerCert` format is no longer used in any
production configs, and can be removed safely.

Part of #5162
Part of #5242
2021-03-18 17:16:37 -07:00
Aaron Gable 91473b384b
Remove common config from publisher (#5353)
The old `config.Common.CT.IntermediateBundleFilename` format is no
longer used in any production configs, and can be removed safely.

Part of #5162
Part of #5242
Fixes #5269
2021-03-18 16:59:06 -07:00
Samantha 5a92926b0c
Remove dbconfig migration deployability code (#5348)
Default boulder code paths to exclusively use the `db` config key

Fixes #5338
2021-03-18 16:41:15 -07:00
Aaron Gable bae699fae1
Update CA test config to use NonCFSSLSigner (#5344)
This config is now live in production.

Part of #5115
2021-03-17 09:41:02 -07:00
Samantha 7cb0038498
Deprecate MaxDBConns for MaxOpenConns (#5274)
In #5235 we replaced MaxDBConns in favor of MaxOpenConns.

One week ago MaxDBConns was removed from all dev, staging, and
production configurations. This change completes the removal of
MaxDBConns from all components and test/config.

Fixes #5249
2021-02-08 12:00:01 -08:00
Aaron Gable 379826d4b5
WFE2: Improve support for multiple issuers & chains (#5247)
This change simplifies and hardens the wfe2's support for having
multiple issuers, and multiple chains for each issuer, configured
and loaded in memory.

The only config-visible change is replacing the old two separate config
values (`certificateChains` and `alternateCertificateChains`) with a
single value (`chains`). This new value does not require the user to
know and hand-code the AIA URLs at which the certificates are available;
instead the chains are simply presented as lists of files. If this new
config value is present, the old config values will be ignored; if it
is not, the old config values will be respected.

Behind the scenes, the chain loading code has been completely changed.
Instead of loading PEM bytes directly from the file, and then asserting
various things (line endings, no trailing bits, etc) about those bytes,
we now parse a certificate from the file, and in-memory recreate the
PEM from that certificate. This approach allows the file loading to be
much more forgiving, while also being stricter: we now check that each
certificate in the chain is correctly signed by the next cert, and that
the last cert in the chain is a self-signed root.

Within the WFE itself, most of the internal structure has been retained.
However, both the internal `issuerCertificates` (used for checking
that certs we are asked to revoke were in fact issued by us) and the
`certificateChains` (used to append chains to end-entity certs when
served to clients) have been updated to be maps keyed by IssuerNameID.
This allows revocation checking to not have to iterate through the
whole list of issuers, and also makes it easy to double-check that
the signatures on end-entity certs are valid before serving them. Actual
checking of the validity will come in a follow-up change, due to the
invasive nature of the necessary test changes.

Fixes #5164
2021-01-27 15:07:58 -08:00
Aaron Gable beee17c510
Janitor: refactor to be controlled by config (#5195)
Previously, configuration of the boulder-janitor was split into
two places: the actual json config file (which controlled which
jobs would be enabled, and what their rate limits should be), and
the janitor code itself (which controlled which tables and columns
those jobs should query). This resulted in significant duplicated
code, as most of the jobs were identical except for their table
and column names.

This change abstracts away the query which jobs use to find work.
Instead of having each job type parse its own config and produce
its own work query (in Go code), now each job supplies just a few
key values (the table name and two column names) in its JSON config,
and the Go code assembles the appropriate query from there. We are
able to delete all of the files defining individual job types, and
replace them with a single slightly smarter job constructor.

This enables further refactorings, namely:
* Moving all of the logic code into its own module;
* Ensuring that the exported interface of that module is safe (i.e.
  that a client cannot create and run jobs without them being valid,
  because the only exposed methods ensure validity);
* Collapsing validity checks into a single location;
* Various renamings.
2020-12-17 09:53:22 -08:00
Aaron Gable cb6effad9f
Add keyHashToSerial cleanup to boulder-janitor (#5194)
This adds a new job type (and corresponding config) to the janitor
to clean up old rows from the `keyHashToSerial` table. Rows in this
table are no longer relevant after their corresponding certificate
has expired.

Fixes #4792
2020-12-10 16:46:07 -08:00
Aaron Gable aa09c1661c
Clean up akamai purger configs (#5174)
These configs contained a `common.issuerCert` field which is not
read or used by the akamai-purger service. This change removes
is for clarity.

In addition, these files had irregular indentation. This change cleans
them up to use two-space indents throughout.
2020-11-11 09:46:58 -08:00
Aaron Gable ff363755d8
Remove dead code from ocsp-updater (#5156)
The akamai purger is now a standalone service, and the ocsp-updater does
not talk to it directly. All of the code in the ocsp-updater related to
the akami purger is gated behind tests that the corresponding config
values are not nil, and the resulting objects are never used by the
service's business logic.

Now that all of the corresponding config stanzas have been removed
from our production configs, we can remove them from the config
struct definitions as well.
2020-11-02 11:10:20 -08:00
Jacob Hoffman-Andrews 5eaa1af35a
Lower log level for bad-key-revoker. (#5132)
Right now if you run docker-compose up, this spams the log output
unnecessarily.
2020-10-16 14:40:52 -07:00
Aaron Gable 2b7389b783
Enable boulder issuance in config-next tests (#5109)
This is effectively a revert of #5036, re-enabling the `NonCFSSLSigner`
feature flag in config-next, and adding the necessary config stanzas
to configure the new issuance library.

It also slightly reorders the config stanzas in the CA's config files
in order to have them in the same order as they are defined in
`boulder-ca/main.go`.

This will be followed by adding an ECDSA issuer cert to our test
environment, and switching to use that one for ECDSA requests
in the integration tests.

Fixes #5053
2020-10-07 14:15:04 -07:00
Aaron Gable 3666322817
Add health-checker tool and use it from startservers.py (#5095)
This adds a new tool, `health-checker`, which is a client of the new
Health Checker Service that has been integrated into all of our
boulder components. This tool takes an address, a timeout, and a
config file. It then attempts to connect to a gRPC Health Service at
the given address, retrying until it hits its timeout, using credentials
specified by the config file.

This is then wrapped by a new function `waithealth` in our Python
helpers, which serves much the same function as `waitport`, but
specifically for services which surface a gRPC Health Service
This in turn requires slight modifications to `startservers`, namely
specifying the address and port on which each service starts its
gRPC listener.

Finally, this change also introduces new credentials for this
health-checker, and adds those credentials as a valid client to
all services' json configs. A similar change would have to be made
to our production configs if we were to establish a long-lived 
health checker/prober in prod.

Fixes #5074
2020-10-06 15:01:35 -07:00
Jacob Hoffman-Andrews 0c543e7e2f
Move FasterNewOrdersRateLimit flag to config/ (#4969)
This flag is now live. Also move the migration from _db-next to _db.
2020-07-20 14:47:31 -07:00
Aaron Gable 440c5f96d9
Remove unreferenced values from test configs (#4959) 2020-07-15 13:50:00 -07:00
orangepizza dee757c057
Remove multiva exception list code (#4933)
Fixes #4931
2020-07-08 10:57:17 -07:00
Aaron Gable 35c19c2e08
Deprecate StoreKeyHashes flag (#4927)
The StoreKeyHashes feature flag controls whether rows are added to the
keyHashToSerial table. This feature is now enabled everywhere, so the
flag-protected code can be turned on unconditionally and the flag
removed from configs.

Related to #4895
2020-07-06 10:02:39 -07:00
Jacob Hoffman-Andrews 56d581613c
Update test/config. (#4923)
This copies over a number of features flags and other settings from
test/config-next that have been applied in prod.

Also, remove the config-next gate on various tests.
2020-07-01 17:59:14 -07:00
Jacob Hoffman-Andrews ca26126ca9
Replace master with main. (#4917)
Also, update an example username in mailer tests.
2020-06-30 16:39:39 -07:00
Aaron Gable d16d3fd067
Remove OCSPStaleMaxAge config value and handling (#4911)
The OCSPStaleMaxAge config value was added in #2419 as part of an
effort to ensure that ocsp-updater's queries of the certificateStatus
table were efficient. It was never intended as a long-term fix:
in #2431 and #2432 the query was updated to index on the much more
efficient isExpired and notAfter columns if a feature flag was set,
and in #2561 that code path was made the default and the flag removed.

However, the `WHERE ocspLastUpdate > ocspStaleMaxAge` clause has
remained in the query. This is redundant, as the ocspStaleMaxAge has
always been set to 5040 hours, or 210 days, significantly longer than
the 90-day expiration of Let's Encrypt certs.

This change removes that clause from the query, and removes the config
scaffolding around it. In addition, it updates the tests to remove
workarounds necessitated by this column, and simplifies and documents
them for future readers.

Fixes #4884
2020-06-29 12:42:51 -07:00
Jacob Hoffman-Andrews 0b0917cea6
Revert "Remove StoreIssuerInfo flag in CA (#4850)" (#4868)
This reverts commit 6454513ded.

We actually need to wait 90 days to ensure the issuerID field of the
certificateStatus table is non-nil for all extant certificates.
2020-06-12 12:50:24 -07:00
Jacob Hoffman-Andrews 6454513ded
Remove StoreIssuerInfo flag in CA (#4850)
As part of that, add support for issuer IDs in orphan-finder's
and RA's calls to GenerateOCSP.

This factors out the idForIssuer logic from ca/ca.go into a new
issuercerts package.

orphan-finder refactors:

Add a list of issuers in config.

Create an orphanFinder struct to hold relevant fields, including the
newly added issuers field.

Factor out a storeDER function to reduce duplication between the
parse-der and parse-ca-log cases.

Use test certificates generated specifically for orphan-finder tests.
This was necessary because the issuers of these test certificates have
to be configured for the orphan finder.
2020-06-09 12:25:13 -07:00
Roland Bracewell Shoemaker 7673f02803
Use cmd/ceremony in integration tests (#4832)
This ended up taking a lot more work than I expected. In order to make the implementation more robust a bunch of stuff we previously relied on has been ripped out in order to reduce unnecessary complexity (I think I insisted on a bunch of this in the first place, so glad I can kill it now).

In particular this change:

* Removes bhsm and pkcs11-proxy: softhsm and pkcs11-proxy don't play well together, and any softhsm manipulation would need to happen on bhsm, then require a restart of pkcs11-proxy to pull in the on-disk changes. This makes manipulating softhsm from the boulder container extremely difficult, and because of the need to initialize new on each run (described below) we need direct access to the softhsm2 tools since pkcs11-tool cannot do slot initialization operations over the wire. I originally argued for bhsm as a way to mimic a network attached HSM, mainly so that we could do network level fault testing. In reality we've never actually done this, and the extra complexity is not really realistic for a handful of reasons. It seems better to just rip it out and operate directly on a local softhsm instance (the other option would be to use pkcs11-proxy locally, but this still would require manually restarting the proxy whenever softhsm2-util was used, and wouldn't really offer any realistic benefit).
* Initializes the softhsm slots on each integration test run, rather than when creating the docker image (this is necessary to prevent churn in test/cert-ceremonies/generate.go, which would need to be updated to reflect the new slot IDs each time a new boulder-tools image was created since slot IDs are randomly generated)
* Installs softhsm from source so that we can use a more up to date version (2.5.0 vs. 2.2.0 which is in the debian repo)
* Generates the root and intermediate private keys in softhsm and writes out the root and intermediate public keys to /tmp for use in integration tests (the existing test-{ca,root} certs are kept in test/ because they are used in a whole bunch of unit tests. At some point these should probably be renamed/moved to be more representative of what they are used for, but that is left for a follow-up in order to keep the churn in this PR as related to the ceremony work as possible)
Another follow-up item here is that we should really be zeroing out the database at the start of each integration test run, since certain things like certificates and ocsp responses will be signed by a key/issuer that is no longer is use/doesn't match the current key/issuer.

Fixes #4832.
2020-06-03 15:20:23 -07:00
Jacob Hoffman-Andrews dd1e50f4b3
Move multiva to config/ (#4808) 2020-05-13 19:15:19 -07:00
Jacob Hoffman-Andrews 5527716410
Port v1 integration tests to v2. (#4807)
As of this change, each test case in v1_integration.py has an equivalent
in v2_integration.py. This mostly involved copying the test cases and
tweaking them to use chisel2.py. I had to add support for updating email
addresses in chisel2.py (copied from chisel.py) in order to support one
of the test cases.

The VA was not yet configured to recognize account paths that start
with the ACMEv2 path, so I added that configuration.

The most useful way to see what's changed in porting the test cases
is to check out this branch and then do a diff between v1_integration.py
and v2_integration.py.
2020-05-13 11:59:04 -07:00
Jacob Hoffman-Andrews 87fb6028c1
Add log validator to integration tests (#4782)
For now this mainly provides an example config and confirms that
log-validator can start up and shut down cleanly, as well as provide a
stat indicating how many log lines it has handled.

This introduces a syslog config to the boulder-tools image that will write
logs to /var/log/program.log. It also tweaks the various .json config
files so they have non-default syslogLevel, to ensure they actually
write something for log-validator to verify.
2020-04-20 13:33:42 -07:00
Jacob Hoffman-Andrews 36c1f1ab2d
Deprecate some feature flags (#4771)
Deprecate some feature flags.

These are all enabled in production.
2020-04-13 15:49:55 -07:00
Roland Bracewell Shoemaker 47d6225201
SA: Make WriteIssuedNamesPrecert behavior default (#4662)
Fixes #4579.
2020-02-03 13:44:11 -05:00
Daniel McCarney bff9eb0534
CI/Dev: restore allowOrigins wfe/wfe2 config. (#4650)
In 67ec373a96 we removed "unused" WFE and WFE2
config elements. Unfortunately I missed that one of these elements,
`allowOrigins`, **is** used and without this config in
place CORS is broken.

We have unit tests for the CORS headers but we did not have any end-to-end
integration tests that would catch a problem with the WFE/WFE2 missing the
`allowOrigins` config element.

This commit restores the `allowOrigins` config value across the WFE/WFE2
configs and also adds a very small integration test. That test only checks one
CORS header and only for the HTTP ACMEv2 endpoint but I think it's sufficient
for the moment (and definitely better than nothing).

Prior to fixing the config elements the integration test fails as expected:
```
--- FAIL: TestWFECORS (0.00s)
    wfe_test.go:28: "" != "*"
FAIL
FAIL	github.com/letsencrypt/boulder/test/integration	0.014s
FAIL
```
2020-01-17 14:41:20 -05:00
Daniel McCarney 67ec373a96
CI/Dev: Delete old/unused WFE config elements. (#4641)
The config elements deleted from the four WFE config files are not used
anywhere.
2020-01-10 12:37:22 -05:00
Roland Bracewell Shoemaker ea231adc36 features: remove deprecated feature flags (#4607)
Confirmed none of these features are currently present in any staging or 
production configs.
2019-12-09 15:59:27 -05:00
Daniel McCarney fde145ab96
RA: implement stricter email validation. (#4574)
Prev. we weren't checking the domain portion of an email contact address
very strictly in the RA. This updates the PA to export a function that
can be used to validate the domain the same way we validate domain
portions of DNS type identifiers for issuance.

This also changes the RA to use the `invalidEmail` error type in more
places.

A new Go integration test is added that checks these errors end-to-end
for both account creation and account update.
2019-11-22 13:39:31 -05:00
Daniel McCarney df059e093b
janitor: add cleanup of Orders and assoc. rows. (#4544)
The `boulder-janitor` is extended to cleanup rows from the `orders` table that
have expired beyond the configured grace period, and the associated referencing
rows in `requestedNames`, `orderFqdnSets`, and `orderToAuthz2`.

To make implementing the transaction work for the deletions easier/consistent
I lifted the SA's `WithTransaction` code and assoc. functions to a new shared
`db` package. This also let me drop the one-off `janitorDb` interface from the
existing code.

There is an associated change to the `GRANT` statements for the `janitor` DB
user to allow it to find/delete the rows related to orders.

Resolves https://github.com/letsencrypt/boulder/issues/4527
2019-11-13 13:47:55 -05:00
Jacob Hoffman-Andrews 5e608ccbb8 tests: use relative paths in Go integration tests. (#4526)
This makes it simpler to specify paths to testdata and config files.

Fixes #4508
2019-11-05 10:22:20 -05:00
Jacob Hoffman-Andrews d4168626ad Fix orphan-finder (#4507)
This creates the correct type of backend service for the OCSP generator.
It also adds an invocation of orphan-finder during the integration
tests.

This also adds a minor safety check to SA that I hit while writing the
test. Without this safety check, passing a certificate with no DNSNames
to AddCertificate would result in an obscure MariaDB syntax error
without enough context to track it down. In normal circumstances this
shouldn't be hit, but it will be good to have a solid error message if
we hit it in tests sometime.

Also, this tweaks the .travis.yml so it explicitly sets BOULDER_CONFIG_DIR
to test/config in the default case. Because the docker-compose run
command uses -e BOULDER_CONFIG_DIR="${BOULDER_CONFIG_DIR}",
we were setting a blank BOULDER_CONFIG_DIR in default case.
Since the Python startservers script sets a default if BOULDER_CONFIG_DIR
is not set, we haven't noticed this before. But since this test case relies
on the actual environment variable, it became an issue.

Fixes #4499
2019-10-25 09:51:14 -07:00
Jacob Hoffman-Andrews 75e1902524 publisher: allow custom UA for CT submissions. (#4492)
Configure "User-Agent: boulder/1.0" for publisher CT submissions.
2019-10-21 15:08:03 -04:00
Jacob Hoffman-Andrews bdd29a1e27
Promote authzv2 to test/config now that it's live (#4421)
This also removes some awkward dancing we did in integration_test.py to
run setup_twenty_days_ago under the opposite config of whatever we were
about to run tests under.

Reverts most of #4288 and #4290.
2019-09-05 12:33:56 -07:00
Roland Bracewell Shoemaker 62e52f4103 test: weakKeyDirectory -> weakKeyFile in test configs (#4397) 2019-08-12 18:07:56 -04:00
Jacob Hoffman-Andrews 5e7fee0c4a test: update test/config with deployed configs. (#4396) 2019-08-09 12:08:56 -04:00
Jacob Hoffman-Andrews b7250c1d43
integration: test for DisableAuthz2Orders. (#4390)
To make this work, I changed the twenty_days_ago setup to use
`config-next` when the main test phase is running `config`. That, in
turn, made the recheck_caa test fail, so I added a tweak to that.

I also moved the authzv2 migrations into `db`. Without that change,
the integration test would fail during the twenty_days_ago setup because
Boulder would attempt to create authzv2 objects but the table wouldn't
exist yet.
2019-08-08 17:07:29 -07:00
Roland Bracewell Shoemaker db01830508
Return OCSP unauthorized status if the certificate is expired (#4380)
The ocsp-updater ocspStaleMaxAge config var has to be bumped up to ~7 months so that when it is run after the six-months-ago run it will actually update the ocsp responses generated during that period and mark the certificate status row as expired.

Fixes #4338.
2019-08-01 14:13:27 -07:00
Jacob Hoffman-Andrews a68c39ad9b SA: Delete unused challenges (#4353)
For authzv1, this actually executes a SQL DELETE for the unused challenges
when an authorization is updated upon validation.

For authzv2, this doesn't perform a delete, but changes the authorizations that
are returned so they don't include unused challenges.

In order to test the flag for both authz storage models, I set the feature flag in
both config/ and config-next/.

Fixes #4352
2019-07-26 14:04:46 -04:00
Roland Bracewell Shoemaker af41bea99a Switch to more efficient multi nonce-service design (#4308)
Basically a complete re-write/re-design of the forwarding concept introduced in
#4297 (sorry for the rapid churn here). Instead of nonce-services blindly
forwarding nonces around to each other in an attempt to find out who issued the
nonce we add an identifying prefix to each nonce generated by a service. The
WFEs then use this prefix to decide which nonce-service to ask to validate the
nonce.

This requires a slightly more complicated configuration at the WFE/2 end, but
overall I think ends up being a way cleaner, more understandable, easy to
reason about implementation. When configuring the WFE you need to provide two
forms of gRPC config:

* one gRPC config for retrieving nonces, this should be a DNS name that
resolves to all available nonce-services (or at least the ones you want to
retrieve nonces from locally, in a two DC setup you might only configure the
nonce-services that are in the same DC as the WFE instance). This allows
getting a nonce from any of the configured services and is load-balanced
transparently at the gRPC layer. 
* a map of nonce prefixes to gRPC configs, this maps each individual
nonce-service to it's prefix and allows the WFE instances to figure out which
nonce-service to ask to validate a nonce it has received (in a two DC setup
you'd want to configure this with all the nonce-services across both DCs so
that you can validate a nonce that was generated by a nonce-service in another
DC).

This balancing is implemented in the integration tests.

Given the current remote nonce code hasn't been deployed anywhere yet this
makes a number of hard breaking changes to both the existing nonce-service
code, and the forwarding code.

Fixes #4303.
2019-06-28 12:58:46 -04:00
Roland Bracewell Shoemaker 0a16b5f57d Reduce akamai purger interval in integration tests (#4277)
and reduce the verify_akamai_purge deadline/sleep to match the much smaller interval.
2019-06-20 16:31:44 -04:00
Roland Bracewell Shoemaker acc44498d1 RA: Make RevokeAtRA feature standard behavior (#4268)
Now that it is live in production and is working as intended we can remove
the old ocsp-updater functionality entirely.

Fixes #4048.
2019-06-20 14:32:53 -04:00
Roland Bracewell Shoemaker 098a761c02 ocsp-updater: Remove integrated akamai purger (#4258)
This is now an external service.

Also bumps up the deadline in the integration test helper which checks for
purging because using the remote service from the ocsp-updater takes a little
longer. Once we remove ocsp-updater revocation support that can probably be
cranked back down to a more reasonable timeframe.
2019-06-12 09:36:53 -04:00
Roland Bracewell Shoemaker 3532dce246 Excise grpc maxConcurrentStreams configuration (#4257) 2019-06-12 09:35:24 -04:00
Roland Bracewell Shoemaker 4ca01b5de3
Implement standalone nonce service (#4228)
Fixes #3976.
2019-06-05 10:41:19 -07:00
Jacob Hoffman-Andrews 76beffe074 Clean up must staple and precert options in CA (#4201)
Precertificate issuance has been the only supported mode for a while now. This
cleans up the remaining flags in the CA code. The same is true of must staple.

This also removes the IssueCertificate RPC call and its corresponding wrappers,
and removes a lot of plumbing in the CA unittests that was used to test the
situation where precertificate issuance was not enabled.
2019-05-21 15:34:28 -04:00
Daniel McCarney 443c949180
tidy: cleanup JSON hostname policy support. (#4214)
We transitioned this data to YAML to have support for comments and can
remove the legacy JSON support/test data.
2019-05-14 17:06:36 -04:00
Jacob Hoffman-Andrews 0c700143bb Clean up README and test configs (#4185)
- docker-rebuild isn't needed now that boulder and bhsm containers run directly off
 the boulder-tools image.
- Remove DNS options from RA config.
- Remove GSB options from VA config.
2019-04-30 13:26:19 -07:00
Jacob Hoffman-Andrews 8f578f3a93
Improve integration tests (#4143)
- Move fakeclock, get_future_output, and random_domain to helpers.py.
- Remove tempdir handling from integration-test.py since it's already
  done in helpers.py
- Consolidate handling of config dir into helpers.py, and add
  CONFIG_NEXT boolean.
- Move RevokeAtRA config gating into verify_revocation to reduce
  redundancy.
- Skip load-balancing test when filter is enabled.
- Ungate test_sct_embedding
- Rework test_ct_submissions, which was out of date. In particular, have a couple of
  logs where submitFinalCert: false, and make ct-test-srv store submission counts
  by hostnames for better test case isolation.
2019-04-04 10:59:38 -07:00
Jacob Hoffman-Andrews d1e6d0f190 Remove TLS-SNI-01 (#4114)
* Remove the challenge whitelist
* Reduce the signature for ChallengesFor and ChallengeTypeEnabled
* Some unit tests in the VA were changed from testing TLS-SNI to testing the same behavior
  in TLS-ALPN, when that behavior wasn't already tested. For instance timeouts during connect 
  are now tested.

Fixes #4109
2019-03-15 09:05:24 -04:00
Daniel McCarney 279947ade2 CI/Devenv: restore 20s RA->VA timeout. (#4084)
I tried dropping the RA->VA timeout to make the
`test_http_challenge_timeout` integration test faster. It seems to flake
in CI so I'm restoring the original 20s timeout. This makes
`test_http_challenge_timeout` slower but c'est la vie.
2019-02-22 08:53:18 -08:00
Daniel McCarney 3324989205 CI/Dev: Increase RA->VA timeout to 8s. (#4062)
There has been some flakyness in CI related to RA->VA timeouts.
2019-02-15 13:38:12 -08:00
Daniel McCarney 1c0be52e53 VA: Add integration test for HTTP timeouts. (#4050)
Also update `TestHTTPTimeout` to test with the `SimplifiedVAHTTP`
feature flag enabled.
2019-02-12 13:42:01 -08:00
Roland Bracewell Shoemaker 046955e99c Add a standalone akamai purger service (#4040)
Fixes #4030.
2019-02-05 09:00:31 -08:00
Daniel McCarney d1daeee831 Config: serverAddresses -> serverAddress. (#4035)
The plural `serverAddresses` field in gRPC config has been deprecated for a bit now. We've removed the last usages of it in our staging/prod environments and can clear out the related code. Moving forward we only support a singular `serverAddress` and rely on DNS to direct to multiple instances of a given server.
2019-01-25 10:50:53 -08:00
Jacob Hoffman-Andrews 4b9fd1f97e notify-mailer: Support CSV and parameters (#4024)
Fixes #4018 

This rearranges notify-mailer so we can give it CSV input and interpolate fields from that CSV.
It removes the old-style JSON input so we don't have to support two different input styles.

When multiple accounts have the same email address, their recipient data is consolidated under
that address so they only receive a single email. The CSV data can be interpolated using
the `range` operator in Golang templates.

Because we're now operating on the resolved email addresses instead of purely on accounts,
this PR also changes the checkpointing mode. Instead of a numeric start and end, it takes
a pair of strings, and only sends to email addresses between those two strings.
2019-01-22 16:07:17 -08:00
Jacob Hoffman-Andrews 92e8e1708a Update config and config-next challenge settings. (#4017)
- Allow tls-alpn-01 challenge in config.
- Disallow tls-sni-01 challenge in config-next.
- Remove gating of tls-alpn integration test.
- Remove TLSSNIRevalidation in config-next.
2019-01-18 10:30:38 -08:00
Jacob Hoffman-Andrews bb15685a0f
No new certificates tick (#4012)
Since #2633 we generate OCSP at first issuance, so we no longer need 
this loop to check for new certificates that need OCSP status generated.
Since the associate SQL query is slow, we should just turn it off.

Also remove the configuration fields for the MissingSCTTick. The code
for that was already deleted.
2019-01-17 14:43:06 -08:00
Roland Bracewell Shoemaker 842739bccd Remove deprecated features that have been purged from prod and staging configs (#4001) 2019-01-15 16:16:35 -08:00
Roland Bracewell Shoemaker cdef80ce67
Remove Akamai CCU v2 support (#3994)
Fixes #3991.
2019-01-08 12:28:11 -08:00
Roland Bracewell Shoemaker ba7a8e8e5d Add fake Akamai purge server for integration testing (#3946)
Fixes #3916.
2018-11-27 09:49:05 -05:00
Jacob Hoffman-Andrews 4670be1210 Reduce log level for WFE in tests. (#3918)
Our Travis output is quite verbose with the WFE output, and it's very
rare that we have to reference it. I'd like to remove the INFO-level
logs (i.e. the logs of every request) so that it's easier to see real
errors, and faster to scroll to the bottom of logs of failed runs.
2018-11-01 09:50:41 -04:00
Daniel McCarney 3e61513364
config, config-next: remove deprecated ocsp-updater fields (#3884) 2018-10-12 13:52:15 -04:00
Roland Bracewell Shoemaker 9b94d4fdfe Add a orphan queue to the CA (#3832)
Retains the existing logging of orphaned certs until we are confident that this
solution can fully replace it (even then we may want to keep it just for auditing etc).

Fixes #3636.
2018-09-05 11:12:07 -07:00
Roland Bracewell Shoemaker b5f7c62460 Remove leftover publisher CT config (#3803) 2018-07-27 08:05:51 -04:00
Roland Bracewell Shoemaker e27f370fd3 Excise code relating to pre-SCT embedding issuance flow (#3769)
Things removed:

* features.EmbedSCTs (and all the associated RA/CA/ocsp-updater code etc)
* ca.enablePrecertificateFlow (and all the associated RA/CA code)
* sa.AddSCTReceipt and sa.GetSCTReceipt RPCs
* publisher.SubmitToCT and publisher.SubmitToSingleCT RPCs

Fixes #3755.
2018-06-28 08:33:05 -04:00
Jacob Hoffman-Andrews b2f5cf39b9
Bring test/config up to date with test/config-next (#3743)
Notably, enable the precertificate flow, RPCHeadroom, and multi-IP hostnames.
Lots of other changes and feature flags too.
2018-06-01 12:00:52 -07:00
Jacob Hoffman-Andrews f7dd91534d Backport config-next/wfe2.json changes to config/ (#3721) 2018-05-17 08:24:34 -04:00
Daniel McCarney 76a3f4a18f RA/CA: Use `doNotForceCN: false` for `test/config`. (#3698)
In staging/prod we use `doNotForceCN: false` for both the RA & CA
config. Switching this to `true` is blocked on CABF work that will
likely take considerable time.

In the short-term we should use `doNotForceCN: false` in `test/config`
and only use `doNotForceCN: true` in `test/config-next`.
2018-05-09 12:54:16 -07:00
Jacob Hoffman-Andrews a4421ae75b Run gRPC backends on multiple IPs instead of multiple ports (#3679)
We're currently stuck on gRPC v1.1 because of a breaking change to certificate validation in gRPC 1.8. Our gRPC balancer uses a static list of multiple hostnames, and expects to validate against those hostnames. However gRPC expects that a service is one hostname, with multiple IP addresses, and validates all those IP addresses against the same hostname. See grpc/grpc-go#2012.

If we follow gRPC's assumptions, we can rip out our custom Balancer and custom TransportCredentials, and will probably have a lower-friction time in general.

This PR is the first step in doing so. In order to satisfy the "multiple IPs, one port" property of gRPC backends in our Docker container infrastructure, we switch to Docker's user-defined networking. This allows us to give the Boulder container multiple IP addresses on different local networks, and gives it different DNS aliases in each network.

In startservers.py, each shard of a service listens on a different DNS alias for that service, and therefore a different IP address. The listening port for each shard of a service is now identical.

This change also updates the gRPC service certificates. Now, each certificate that is used in a gRPC service (as opposed to something that is "only" a client) has three names. For instance, sa1.boulder, sa2.boulder, and sa.boulder (the generic service name). For now, we are validating against the specific hostnames. When we update our gRPC dependency, we will begin validating against the generic service name.

Incidentally, the DNS aliases feature of Docker allows us to get rid of some hackery in entrypoint.sh that inserted entries into /etc/hosts.

Note: Boulder now has a dependency on the DNS aliases feature in Docker. By default, docker-compose run creates a temporary container and doesn't assign any aliases to it. We now need to specify docker-compose run --use-aliases to get the correct behavior. Without --use-aliases, Boulder won't be able to resolve the hostnames it wants to bind to.
2018-05-07 10:38:31 -07:00
Daniel McCarney 054f181458 load-generator: send correct ACMEv2 Content-Type on POST (#3667)
load generator: send correct ACMEv2 Content-Type on POST.

This PR updates the Boulder load-generator to send the correct ACMEv2 Content-Type header when POSTing the ACME server. This is required for ACMEv2 and without it all POST requests to the WFE2 running a test/config-next configuration result in malformed 400 errors. While only required by ACMEv2 this commit sends it for ACMEv1 requests as well. No harm no foul.

integration tests: allow running just the load generator.
Prior to this PR an omission in an if statement in integration-test.py meant that you couldn't invoke test/integration-test.py with just the --load argument to only run the load generator. This commit updates the if to allow this use case.
2018-05-01 12:22:43 -07:00
Roland Bracewell Shoemaker 0a86573a73 Update integration tests 2018-04-20 13:18:40 -07:00
Jacob Hoffman-Andrews 2093337fb9 Fix whitespace 2018-03-15 10:34:15 -07:00
Jacob Hoffman-Andrews 4a961c3bc8 Ungate config-next for wfe2 and Wildcards. 2018-03-14 13:18:37 -07:00
Roland Bracewell Shoemaker 9c9e944759 Add SCT embedding (#3521)
Adds SCT embedding to the certificate issuance flow. When a issuance is requested a precertificate (the requested certificate but poisoned with the critical CT extension) is issued and submitted to the required CT logs. Once the SCTs for the precertificate have been collected a new certificate is issued with the poison extension replace with a SCT list extension containing the retrieved SCTs.

Fixes #2244, fixes #3492 and fixes #3429.
2018-03-12 11:58:30 -07:00
Roland Bracewell Shoemaker 8446571b46 Remove EnforceChallengeDisable (#3444)
Removes usage of the `EnforceChallengeDisable` feature, the feature itself is not removed as it is still configured in staging/production, once that is fixed I'll submit another PR removing the actual flag.

This keeps the behavior that when authorizations are retrieved from the SA they have their challenges populated, because that seems to make the most sense to me? It also retains TLS re-validation.

Fixes #3441.
2018-02-14 13:21:26 -08:00
Jacob Hoffman-Andrews c556a1a20d
Reduce spurious errors in integration test (#3436)
Boulder is fairly noisy about gRPC connection errors. This is a mixed
blessing: Our gRPC configuration will try to reconnect until it hits
an RPC deadline, and most likely eventually succeed. In that case,
we don't consider those to really be errors. However, in cases where
a connection is repeatedly failing, we'd like to see errors in the
logs about connection failure, rather than "deadline exceeded." So
we want to keep logging of gRPC errors.

However, right now we get a lot of these errors logged during
integration tests. They make the output hard to read, and may disguise
more serious errors. So we'd like to avoid causing such errors in
normal integration test operation.

This change reorders the startup of Boulder components by their gRPC
dependencies, so everything's backend is likely to be up and running
before it starts. It also reverses that order for clean shutdowns,
and waits for each process to exit before signalling the next one.

With these changes, I still got connection errors. Taking listenbuddy
out of the gRPC path fixed them. I believe the issue is that
listenbuddy is not a truly transparent proxy. In particular, it
accepts an inbound TCP connection before opening an outbound TCP
connection. If opening that outbound connection results in "connection
refused," it closes the inbound connection. That means gRPC sees a
"connection closed" (or "connection reset"?) rather than "connection
refused". I'm guessing it handles those cases differently, explaining
the different error results.

We've been using listenbuddy to trigger disconnects while Boulder is
running, to ensure that gRPC's reconnect code works. I think we can
probably rely on gRPC's reconnect to work. The initial problem that
led us to start testing this was a configuration problem; now that
we have the configuration we want, we should be fine and don't need
to keep testing reconnects on every integration test run.
2018-02-12 18:17:50 -08:00
Roland Bracewell Shoemaker fc5c8f76b6 Remove unused features (#3393)
This removes a number of unused features (i.e. they are never checked anywhere).
2018-01-25 08:55:05 -05:00
Daniel McCarney ba264a5091 Remove unused WFE2 feature flags. (#3375)
The WFE2 doesn't check any of the feature flags that are configured in
the `test/config/wfe2.json` and `test/config-next/wfe2.json` config
files - we default to acting as if all new features are enabled for the
V2 work. This commit removes the flags from the config to avoid
confusion or expectations that changing the config will disable the
features.
2018-01-17 12:28:19 -08:00
Daniel McCarney c6d56b7a84 Match RA `authorizationLifetimeDays` to prod. (#3370) 2018-01-16 10:39:57 -08:00
Jacob Hoffman-Andrews 198fd1426a Bring config up-to-date with prod. (#3359)
This brings in some changes from config-next that are now live in production.
2018-01-11 16:29:41 -05:00
Jacob Hoffman-Andrews 827f7859f2 Fix issuerCert in test configs. (#3310)
Previously, there was a disagreement between WFE and CA as to what the correct
issuer certificate was. Consolidate on test-ca2.pem (h2ppy h2cker fake CA).
    
Also, the CA configs contained an outdated entry for "IssuerCert", which was not
being used: The CA configs now use an "Issuers" array to allow signing by
multiple issuer certificates at once (for instance when rolling intermediates).
Removed this outdated entry, and the config code for CA to load it. I've
confirmed these changes match what is currently in production.

Added an integration test to check for this problem in the future.

Fixes #3309, thanks to @icing for bringing the issue to our attention!

This also includes changes from #3321 to clarify certificates for WFE.
2018-01-09 07:56:39 -05:00
Jacob Hoffman-Andrews fdd854a7e5 Fix various WFE2 bugs. (#3292)
- Encode certificate as PEM.
- Use lowercase for field names.
- Use termsOfServiceAgreed instead of Agreement
- Use a different ToS URL for v2 that points at the v2 HTTPS port.

Resolves #3280
2017-12-19 13:13:29 -08:00
Roland Bracewell Shoemaker bdea281ae0 Remove CAA SERVFAIL exceptions code (#3262)
Fixes #3080.
2017-12-05 14:39:37 -08:00
Daniel McCarney 55dd1020c0 Increase VA SingleDialTimeout to 10s. (#3260)
This PR changes the VA's singleDialTimeout value from 5 * time.Second to 10 * time.Second. This will give slower servers a better chance to respond, especially for the multi-VA case where n requests arrive ~simultaneously.

This PR also bumps the RA->VA timeout by 5s and the WFE->RA timeout by 5s to accommodate the increased dial timeout. I put this in a separate commit in case we'd rather deal with this separately.
2017-12-04 09:53:26 -08:00
Jacob Hoffman-Andrews 4296dd985a Use TLS in mailer integration tests (#3213)
* Remove non-TLS support from mailer entirely
* Add a config option for trusted roots in expiration-mailer. If unset, it defaults to the system roots, so this does not need to be set in production.
* Use TLS in mail-test-srv, along with an internal root and localhost certificates signed by that root.
2017-11-06 14:57:14 -08:00
Jacob Hoffman-Andrews 0a64fd4066 Bring test/config up-to-date. (#3056)
Methodology: Copy test/config-next/* into test/config/, then manually review
the diffs, removing any diffs that are not yet in production.
2017-09-11 16:55:58 -04:00
Jacob Hoffman-Andrews 3431acfb92 Adjust testing maxNames config to match prod. (#2911) 2017-07-27 15:23:29 -07:00
Daniel McCarney bd3e2747ba Duplicate WFE to WFE2. (#2839)
This PR is the initial duplication of the WFE to create a WFE2
package. The rationale is briefly explained in `wfe2/README.md`.

Per #2822 this PR only lays the groundwork for further customization
and deduplication. Presently both the WFE and WFE2 are identical except
for the following configuration differences:

* The WFE offers HTTP and HTTPS on 4000 and 4430 respectively, the WFE2
  offers HTTP on 4001 and 4431.
* The WFE has a debug port on 8000, the WFE2 uses the next free "8000
  range port" and puts its debug service on 8013

Resolves https://github.com/letsencrypt/boulder/issues/2822
2017-07-05 13:32:45 -07:00
Jacob Hoffman-Andrews d99800ecb1 Remove some last traces of AMQP. (#2687)
Fixes #2665
2017-04-20 10:43:17 -07:00
Daniel McCarney e6c63f1f11 Removes notafter-backfiller configs. (#2650)
This commit removes the notafter-backfiller config files. The actual
tool was already removed and these are just cruft.
2017-04-05 18:26:05 -07:00
Jacob Hoffman-Andrews cef0a630b3 Remove old-style gRPC TLS configs (#2495)
* Switch Publisher gRPC to use new "tls" block.

* Remove old-style GRPC TLS configs.

* Fix incorrect TLS blocks.

* Remove more config.
2017-04-05 12:41:41 -04:00
Roland Bracewell Shoemaker a46d30945c Purge remaining AMQP code (#2648)
Deletes github.com/streadway/amqp and the various RabbitMQ setup tools etc. Changes how listenbuddy is used to proxy all of the gRPC client -> server connections so we test reconnection logic.

+49 -8,221 😁

Fixes #2640 and #2562.
2017-04-04 15:02:22 -07:00
Jacob Hoffman-Andrews 6719dc17a6 Remove AMQP config and code (#2634)
We now use gRPC everywhere.
2017-04-03 10:39:39 -04:00
Patrick Figel 6ba8aadfd7 Use X.509 AIA Issuer URL in rel="up" link header (#2545)
In order to provide the correct issuer certificate for older certificates after an issuer certificate rollover or when using multiple issuer certificates (e.g. RSA and ECDSA), use the AIA CA Issuer URL embedded in the certificate for the rel="up" link served by WFE. This behaviour is gated behind the UseAIAIssuerURL feature, which defaults to false.

To prevent MitM vulnerabilities in cases where the AIA URL is HTTP-only, it is upgraded to HTTPS.

This also adds a test for the issuer URL returned by the /acme/cert endpoint. wfe/test/178.{crt,key} were regenerated to add the AIA extension required to pass the test.

/acme/cert was changed to return an absolute URL to the issuer endpoint (making it consistent with /acme/new-cert).

Fixes #1663
Based on #1780
2017-02-07 11:19:22 -08:00
Roland Bracewell Shoemaker 18de73f0d8 Pass nil errors through boulder/grpc wrapError/unwrapError (#2544)
Instead of trying to wrap or unwrap them which causes panics.

Also, expand the test_ct_submission integration test to include resubmissions.
2017-02-06 18:19:39 -08:00
Daniel e88db3cd5e
Revert "Revert "Copy all statsd stats to Prometheus. (#2474)" (#2541)"
This reverts commit 9d9e4941a5 and
restores the statsd prometheus code.
2017-02-01 15:48:18 -05:00
Daniel McCarney 9d9e4941a5 Revert "Copy all statsd stats to Prometheus. (#2474)" (#2541)
This reverts commit 58ccd7a71a.

We are seeing multiple boulder components restart when they encounter the stat registration race condition described in https://github.com/letsencrypt/boulder/issues/2540
2017-02-01 12:50:27 -05:00
Jacob Hoffman-Andrews 373ff015a2 Update cfssl, CT, and OCSP dependencies (#2170)
Pulls in logging improvements in OCSP Responder and the CT client, plus a handful of API changes. Also, the CT client verifies responses by default now.

This change includes some Boulder diffs to accommodate the API changes.
2017-01-12 16:01:14 -08:00
Jacob Hoffman-Andrews cbde78d58f Harmonize and tweak configs (#2479)
Set authorizationLifetimeDays to 60 across both config and config-next.

Set NumSessions to 2 in both config and config-next. A decrease from 10 because pkcs11-proxy (or pkcs11-daemon?) seems to error out under load if you have more sessions than CPUs.

Reorder parallelGenerateOCSPRequests to match config-next.

Remove extra tags for parsing yaml in config objects.
2017-01-10 13:46:38 -08:00
Jacob Hoffman-Andrews 58ccd7a71a Copy all statsd stats to Prometheus. (#2474)
We have a number of stats already expressed using the statsd interface. During
the switchover period to direct Prometheus collection, we'd like to make those
stats available both ways. This change automatically exports any stats exported
using the statsd interface via Prometheus as well.

This is a little tricky because Prometheus expects all stats to by registered
exactly once. Prometheus does offer a mechanism to gracefully recover from
registering a stat more than once by handling a certain error, but it is not
safe for concurrent access. So I added a concurrency-safe wrapper that creates
Prometheus stats on demand and memoizes them.

In the process, made a few small required side changes:
 - Clean "/" from method names in the gRPC interceptors. They are allowed in
   statsd but not in Prometheus.
 - Replace "127.0.0.1" with "boulder" as the name of our testing CT log.
   Prometheus stats can't start with a number.
 - Remove ":" from the CT-log stat names emitted by Publisher. Prometheus stats
   can't include it.
 - Remove a stray "RA" in front of some rate limit stats, since it was
   duplicative (we were emitting "RA.RA..." before).

Note that this means two stat groups in particular are duplicated:
 - Gostats* is duplicated with the default process-level stats exported by the
   Prometheus library.
 - gRPCClient* are duplicated by the stats generated by the go-grpc-prometheus
   package.

When writing dashboards and alerts in the Prometheus world, we should be careful
to avoid these two categories, as they will disappear eventually. As a general
rule, if a stat is available with an all-lowercase name, choose that one, as it
is probably the Prometheus-native version.

In the long run we will want to create most stats using the native Prometheus
stat interface, since it allows us to use add labels to metrics, which is very
useful. For instance, currently our DNS stats distinguish types of queries by
appending the type to the stat name. This would be more natural as a label in
Prometheus.
2017-01-10 10:30:15 -05:00
Jacob Hoffman-Andrews 9b8dacab03 Split out separate RPC services for issuing and for signing OCSP (#2452)
This allows finer-grained control of which components can request issuance. The OCSP Updater should not be able to request issuance.

Also, update test/grpc-creds/generate.sh to reissue the certs properly.

Resolves #2417
2017-01-05 15:08:39 -08:00
Daniel McCarney 74c5e68491 Fixes the `config/publisher.json` clientNames list. (#2466)
In https://github.com/letsencrypt/boulder/pull/2453 we created
individual client certificates for each gRPC client. The "clientNames"
list in the `config-next/publisher.json` was updated for the new
component-specific SANs but we neglected to updated
`config/publisher.json`. This caused the `ocsp-updater` (which uses gRPC
in the base `config/` to talk to the `publisher`) to fail to connect.

This commit updates `config/publisher.json` to have the same clientNames
as `config-next/publisher.json` and resolves #2465
2017-01-03 10:10:01 -08:00
Jacob Hoffman-Andrews 089a270453 Add instructions on load testing OCSP generation. (#2459) 2017-01-02 11:36:03 -08:00