In 2fb247488f we consolidated the
`regModelV2` and `regModelv1` structs to one `regModel` type. In the
process we accidentally lost the explicit assignment of the
to-be-updated registration model's `LockCol` with the value of the
existing registration's `LockCol`. This meant that the Update was
occurring with a where clause `LockCol=0` (the default value).
In practice this meant that the first reg update would succeed (since
the reg row starts with LockCol=0) but any regs that had already been
updated once before would modify 0 rows in the update (because the where
clause on `LockCol` failed) and this in turn was translated into
a ServerInternal error since we knew the reg being updated did exist.
This commit updates the SA's `UpdateRegistration` function to properly
set the `LockCol` on the to-be-updated row.
This commit additionally adds an integration test for registration
contact information updating to ensure we don't fall into this trap in
the future.
Previously, CAA problems were lumped in under "ConnectionProblem" or
"Unauthorized". This should make things clearer and easier to differentiate.
Fixes#3043
Fixes#3020.
In order to write integration tests for some features, especially related to rate limiting, rechecking of CAA, and expiration of authzs, orders, and certs, we need to be able to fake the passage of time in integration tests.
To do so, this change switches out all clock.Default() instances for cmd.Clock(), which can be set manually with the FAKECLOCK environment variable. integration-test.py now starts up all servers once before the main body of tests, with FAKECLOCK set to a date 70 days ago, and does some initial setup for a new integration test case. That test case tries to fetch a 70-day-old authz URL, and expects it to 404.
In order to make this work, I also had to change a number of our test binaries to shut down cleanly in response to SIGTERM. Without that change, stopping the servers between the setup phase and the main tests caused startservers.check() to fail, because some processes exited with nonzero status.
Note: This is an initial stab at things, to prove out the technique. Long-term, I think we will want to use an idiom where test cases are classes that have a number of optional setup phases that may be run at e.g. 70 days prior and 5 days prior. This could help us avoid a proliferation of global state as we add more time-dependent test cases.
As described in Boulder issue #2800 the implementation of the SA's
`countCertificates` function meant that the renewal exemption for the
Certificates Per Domain rate limit was difficult to work with. To
maximize allotted certificates clients were required to perform all new
issuances first, followed by the "free" renewals. This arrangement was
difficult to coordinate.
In this PR `countCertificates` is updated such that renewals are
excluded from the count reliably. To do so the SA takes the serials it
finds for a given domain from the issuedNames table and cross references
them with the FQDN sets it can find for the associated serials. With the
FQDN sets a second query is done to find all the non-renewal FQDN sets
for the serials, giving a count of the total non-renewal issuances to
use for rate limiting.
Resolves#2800
The `submissions_b` count in the integration test `test_ct_submission` function was being populated initially by using `url_a` when it _should_ be initialized using `url_b` since it's the count of submissions to log b.
This resolves https://github.com/letsencrypt/boulder/issues/2723
I tested this fix with a branch that ran this test 12 times per build. Prior to this fix multiple builds out of 20 (~4-5) would fail. With this fix, all 20 passed.
In 18f4c5c we introduced a workaround for the CT submission integration
test to allow exactly expected, or twice as many CT log submissions as
expected to account for the case where the ocsp-updater and the CA race.
This didn't completely patch over the issue because the number of
submissions can fall between `n` and `2n`.
This commit updates the hack to be even hackier (twice as hacky or your
money back). Now we consider any value *between* `n` and `2n` as a test
pass.
Instead of running it at the current time to clean out left over cruft run it with a FAKECLOCK of +1 year so that we catch everything that could get in the way.
Fixes#2148.
Instead of just doing a blanket `DELETE FROM ...` this changes the `expired-authz-purger` to select all of the expired IDs (for both pending and finalized authorizations) then loop over them deleting each and its associated challenges from their respective tables.
Local testing indicates the performance of this is not awful but we should do a test run on staging to verify. If it ends up taking way too long to run there the easiest optimization would be to turn the slice of IDs into a channel and run multiple workers looping over the channel deleting stuff instead of just a single one.
Makes a few small integration test changes in order to facilitate deleting both pending and finalized authorizations.
Presently the CA and the ocsp-updater can race on the initial
submission of a certificate to the configured logs. This results
in double submitting certificates. In integration tests with the fake CT
server this manifests as an occasional failure of the
`test_ct_submission` test (Issue #2579).
The race we currently experience is expected to be fixed in
the future by a planned redesign so for now this commit works around the
failure by allowing either the expected number of submissions, or
exactly double the expected. This fixes#2579. The need to fix the
underlying race was captured in #2610.
The workaround was verified by submitting 10 builds to travis, all
succeeded.
Simply rely on exceptions from check_output.
Also, factor out common params for check_output into a `run` helper function.
Makes sure we always capture stderr into stdout.
This PR has three primary contributions:
1. The existing code for using the V4 safe browsing API introduced in #2446 had some bugs that are fixed in this PR.
2. A gsb-test-srv is added to provide a mock Google Safebrowsing V4 server for integration testing purposes.
3. A short integration test is added to test end-to-end GSB lookup for an "unsafe" domain.
For 1) most notably Boulder was assuming the new V4 library accepted a directory for its database persistence when it instead expects an existing file to be provided. Additionally the VA wasn't properly instantiating feature flags preventing the V4 api from being used by the VA.
For 2) the test server is designed to have a fixed set of "bad" domains (Currently just honest.achmeds.discount.hosting.com). When asked for a database update by a client it will package the list of bad domains up & send them to the client. When the client is asked to do a URL lookup it will check the local database for a matching prefix, and if found, perform a lookup against the test server. The test server will process the lookup and increment a count for how many times the bad domain was asked about.
For 3) the Boulder startservers.py was updated to start the gsb-test-srv and the VA is configured to talk to it using the V4 API. The integration test consists of attempting issuance for a domain pre-configured in the gsb-test-srv as a bad domain. If the issuance succeeds we know the GSB lookup code is faulty. If the issuance fails, we check that the gsb-test-srv received the correct number of lookups for the "bad" domain and fail if the expected isn't reality.
Notes for reviewers:
* The gsb-test-srv has to be started before anything will use it. Right now the v4 library handles database update request failures poorly and will not retry for 30min. See google/safebrowsing#44 for more information.
* There's not an easy way to test for "good" domain lookups, only hits against the list. The design of the V4 API is such that a list of prefixes is delivered to the client in the db update phase and if the domain in question matches no prefixes then the lookup is deemed unneccesary and not performed. I experimented with sending 256 1 byte prefixes to try and trick the client to always do a lookup, but the min prefix size is 4 bytes and enumerating all possible prefixes seemed gross.
* The test server has a /add endpoint that could be used by integration tests to add new domains to the block list, but it isn't being used presently. The trouble is that the client only updates its database every 30 minutes at present, and so adding a new domain will only take affect after the client updates the database.
Resolves#2448
Add a new tiny client called chisel, in place of test.js. This reduces the
number of language runtimes Boulder depends on for its tests. Also, since chisel
uses the acme Python library, we get more testing of that library, which
underlies Certbot. This also gives us more flexibility to hook different parts
of the issuance flows in our tests.
Reorganize integration-test.py itself. There was not clear separation of
specific test cases. Some test cases were added as part of run_node_test; some
were wrapped around it. There is now much closer to one function per test case.
Eventually we may be able to adopt Python's test infrastructure for these test
cases.
Remove some unused imports; consolidate on urllib2 instead of urllib.
For getting serial number and expiration date, replace shelling out to OpenSSL
with using pyOpenSSL, since we already have an in-memory parsed certificate.
Replace ISSUANCE_FAILED, REVOCATION_FAILED, MAILER_FAILED with simple die, since
we don't use these. Later, I'd like to remove the other specific exit codes. We
don't make very good use of them, and it would be more effective to just use
stack traces or, even better, reporting of which test cases failed.
Make single_ocsp_sign responsible for its own subprocess lifecycle.
Skip running startservers if WFE is already running, to make it easier to
iterate against a running Boulder (saves a few seconds of Boulder startup).
We turn arrays into maps with a range command. Previously, we were taking the
address of the iteration variable in that range command, which meant incorrect
results since the iteration variable gets reassigned.
Also change the integration test to catch this error.
Fixes#2496
Previously, the expiration-mailer would always run with the default config, even
if BOULDER_CONFIG_DIR was used to point at config-next. This led to missing some
config parse problems that should have been test failures.
Use labels ending in _key for private key labels.
Create two separate slots in make-softhsm rather than overwriting a single slot.
Update make-softhsm instructions to point out both files to edit.
Improve error output from integation test.
Output base64-encoded DER, as expected by ocsp-responder.
Use flags instead of template for Status, ThisUpdate, NextUpdate.
Provide better help.
Remove old test (wasn't run automatically).
Add it to integration test, and use its output for integration test of issuer ocsp-responder.
Add another slot to boulder-tools HSM image, to store root key.
Under the new defaults, if the syslog section is missing, we'll use the default
config that we use in prod: no logs to stdout, INFO and below to syslog.
This allows us to remove the syslog section from prod configs, and potentially
move it to individual service configs in the future.
* Improve syslog defaults.
* Add stdout logging for purger test.
* Use plain int for sysloglevel.
* Fix JSON syntax
* Fix syslog config for expired-authz-purger.
https://github.com/letsencrypt/boulder/pull/1932
* Add cmd/expired-authz-purger with integration test
* Return count
* gofmt >.>
* add to boulder-config-next.json
* Commit missing file
* Exec on the dbMap
* fprintf the error message
* Review fixes + test
* Review fixes pt. 1
* Review fixes pt. 2 (actually add test file this time :|)
* Fix prompt
* Switch to using flag lib
* Use COUNT(1)
* Revert config -> flag stuff
* Review fixes
* Revert-revert COUNT(1) change
* Review fixes pt. 1
* Nest config struct
* Test review fixes
* Factor out getting future output with FAKECLOCK
* Review fixes pt. 2
* Review fixes pt. 3
This creates a new server, 'mail-test-srv', which is a simplistic SMTP
server that accepts mail and can report the received mail over HTTP.
An integration test is added that uses the new server to test the expiry
mailer.
The FAKECLOCK environment variable is used to force the expiry mailer to
think that the just-issued certificate is about to expire.
Additionally, the expiry mailer is modified to cleanly shut down its
SMTP connections.
Adds a dns-01 type validation to test.js and reworks dns-test-srv to allow changing TXT record values.
Also makes some changes to how integration-test.py works in order to reduce complexity now the
ct-test-srv is working again.
Fixes https://github.com/letsencrypt/boulder/issues/1212.
This exposes a new constructor in amqp-rpc.go specifically for ActivityMonitor,
which overrides the normal routingKey to be the wildcard "#".
It also adds an expvar for the number of messages processed in ActivityMonitor,
and adds an integration test case that checks that ActivityMonitor has received
more than zero messages.
In https://github.com/letsencrypt/boulder/pull/1110 we put
the activate command in the wrong place so it didn't run if
LETSENCRYPT_PATH was set.
Also remove SIMPLE_HTTP_PORT which is no longer necessary. It was used to keep
the build passing as the client transitioned ports. The client now defaults to
5002.
This gets us closer to allowing the client repo to use
integration-test.py. They have a different path without "venv" in it for
their virtualenv set up.
Updates #1101