We used to use TCP because we would request DNSSEC records from Unbound, and
they would always cause truncated records when present. Now that we no longer
request those (#2718), we can use UDP. This is better because the TCP serving
paths in Unbound are likely less thoroughly tested, and not optimized for high
load. In particular this may resolve some availability problems we've seen
recently when trying to upgrade to a more recent Unbound.
Note that this only affects the Boulder->Unbound path. The Unbound->upstream
path is already UDP by default (with TCP fallback for truncated ANSWERs).
Fixes#3020.
In order to write integration tests for some features, especially related to rate limiting, rechecking of CAA, and expiration of authzs, orders, and certs, we need to be able to fake the passage of time in integration tests.
To do so, this change switches out all clock.Default() instances for cmd.Clock(), which can be set manually with the FAKECLOCK environment variable. integration-test.py now starts up all servers once before the main body of tests, with FAKECLOCK set to a date 70 days ago, and does some initial setup for a new integration test case. That test case tries to fetch a 70-day-old authz URL, and expects it to 404.
In order to make this work, I also had to change a number of our test binaries to shut down cleanly in response to SIGTERM. Without that change, stopping the servers between the setup phase and the main tests caused startservers.check() to fail, because some processes exited with nonzero status.
Note: This is an initial stab at things, to prove out the technique. Long-term, I think we will want to use an idiom where test cases are classes that have a number of optional setup phases that may be run at e.g. 70 days prior and 5 days prior. This could help us avoid a proliferation of global state as we add more time-dependent test cases.
They used to be a millisecond, which remarkably worked most of the time.
However, some fraction of DNS requests would fail and need to be retried. Even
successful integration test runs had a number of such failures, but retries
generally saved them. However, sometimes all of the retries for a given lookup
would fail, leading to a failure of the overall lookup. This typically
manifested as an error looking up CAA, because our integration tests look up CAA
much more frequently than other record types.
This appears to fix our integration test flakiness.
- Remove error signatures from log methods. This means fewer places where errcheck will show ignored errors.
- Pull in latest cfssl to be compatible with errorless log messages.
- Reduce the number of message priorities we support to just those we actually use.
- AuditNotice -> AuditInfo
- Remove InfoObject (only one use, switched to Info)
- Remove EmergencyExit and related functions in favor of panic
- Remove SyslogWriter / AuditLogger separate types in favor of a single interface, Logger, that has all the logging methods on it.
- Merge mock log into logger. This allows us to unexport the internals but still override them in the mock.
- Shorten names to be compatible with Go style: New, Set, Get, Logger, NewMock, etc.
- Use a shorter log format for stdout logs.
- Remove "... Starting" log messages. We have better information in the "Versions" message logged at startup.
Motivation: The AuditLogger / SyslogWriter distinction was confusing and exposed internals only necessary for tests. Some components accepted one type and some accepted the other. This made it hard to consistently use mock loggers in tests. Also, the unnecessarily fat interface for AuditLogger made it hard to meaningfully mock out.
Adds a dns-01 type validation to test.js and reworks dns-test-srv to allow changing TXT record values.
Also makes some changes to how integration-test.py works in order to reduce complexity now the
ct-test-srv is working again.
Since Boulder always requests DNSSEC records, in practice DNS responses often
exceed the IP MTU.
Boulder installations expect to have a local DNS resolver, and all modern DNS
resolvers support TCP connections. Since miekg/dns does not perform an
"attempt udp, timeout, retry via tcp" approach, it's simpler and more reliable
to always use TCP for internal DNS resolution. This makes failures more
obvious as well.
Also change the integration test DNS server to TCP.
Add an easy script to build and run the Docker instance.
Update some out-of-date information in the README.
Add goose to the Docker image.
Remove unnecessary go install step from Dockerfile.
Allow dns-test-srv to return a hardcoded address other than localhost. This was
preventing a Dockerized Boulder from answering requests from a letsencrypt
client on the host.
Change allowLoopbackAddresses to allowRestrictedAddresses and make it cover all
the private IPv4 ranges. The host IP in Docker is commonly in the 172.* range.
Fix a couple of references to lets-encrypt-preview.
This was inspired by investigation into https://github.com/letsencrypt/boulder/issues/756.
To try and reproduce, I tried running Boulder inside a container, and found some
broken things.