The refactor to include `Apply` and `ApplyWithPrune` interface methods
in `Storage` caused a regression where status is not alway applied. This
commit ensures that the status is always applied.
This commit introduces 2 new Storage interface methods to enable clients
to implement their own logic for applying inventory objects to the live
cluster.
- Don't error when a delete returns NotFound.
This means the object was deleted asynchronously by another client
between the initial GET for task scheduling and the delete task itself.
Log a warning, because it's possible the deletion may have violated
dependency ordering.
- Update apply and delte task logs to only log errors if the verbosity
is greater than -v=4. This should help reduce duplicate log messages.
- Duplicate the 1,000 Deployment test, but use a 1m reconcile timeout in
a retry loop.
- This verifies that the applier and destroyer are re-entrant at scale.
- This matches previous StatusPoller behavior which would error
and exit if there was a 403 Forbidden error from the apiserver.
- Handle status error before synchronization with immediate exit
- Add DefaultStatusWatcher that wraps DynamicClient and manages
informers for a set of resource objects.
- Supports two modes: root-scoped & namespace-scoped.
- Root-scoped mode uses root-scoped informers to efficiency and
performance.
- Namespace-scoped mode uses namespace-scoped informers to
minimize the permissions needed to run and the size of the
in-memory object cache.
- Automatic mode selects which mode to use based on whether the
objects being watched are in one or multiple namespaces.
This is the default mode, optimizing for performance.
- If CRDs are being watched, the creation/deletion of CRDs can
cause informers for those custom resources to be created/deleted.
- In namespace-scope mode, if namespaces are being watched, the
creation/deletion of namespaces can also trigger informers to
be created/deleted.
- All creates/updates/deletes to CRDs also cause RESTMapper reset.
- Allow pods to be unschedulable for 15s before reporting the
status as Failed. Any update resets the timer.
- Add BlindStatusWatcher for testing and disabling for dry-run.
- Add DynamicClusterReader that wraps DynamicClient.
This is now used to look up generated resources
(ex: Deployment > ReplicaSets > Pods).
- Add DefaultStatusReader which uses a DelegatingStatusReader to
wrap a list of conventional and specific StatusReaders.
This should make it easier to extend the list of StatusReaders.
- Move some pending WaitEvents to be optional in tests, now that
StatusWatcher can resolve their status before the WaitTask starts.
- Add a new Thousand Deployments stress test (10x kind nodes)
- Add some new logs for easier debugging
- Add internal SyncEvent so that apply/delete tasks don't start
until the StatusWatcher has finished initial synchronization.
This helps avoid missing events from actions that happen while
synchronization is incomplete.
- Filter optional pending WaitEvents when testing.
BREAKING CHANGE: Replace StatusPoller w/ StatusWatcher
BREAKING CHANGE: Remove PollInterval (obsolete with watcher)
- Events are critical to understanding progress and debugging issues
in e2e tests.
- Change events to log at level 3 instead of just level 5.
- Change the events string format to reduce verbosity.
- Hide the "Error" in the event string when there's no error.
This reduces unnecessary highlighting of lines in Prow test logs.
Previous sorting method was not stable, and only worked coincidentally
for the two use cases that were using it. This new method works on
more event types and only sorts contiguous events. This should make
the sort usable when we add parallel apply and watch instead of poll.
Event Changes:
- Renamed ActionGroupEvent.Type -> Status
- Renamed Event.Operation -> Status
- Renamed Status fields to use consistent prefixes and suffixes
- Combined Applied, Changed, Unchanged, and ServersideApplied into
ApplySuccessful
- Added Failed status for apply, prune, and delete events
- Replaced Unspecified with Pending
- Made enum String output more consistent
Printer Changes:
- Added FormatSummary to print summary stats at the end of the
apply/destroy, instead of after the last of each type of action
group.
- Modified printer output to match new more consistent events.
- Updated JSON printer docs with latest schema details.
BREAKING CHANGE: Event "operations" and "type" are now "status"
BREAKING CHANGE: JSON printer schema changed to match events
BREAKING CHANGE: Event status enums renamed/refactored
- Ex: make test-e2e-focus FOCUS=ApplyDestroy
- Rename e2e tests to be easier to copy/paste/focus without spaces.
- Reduce stress test verbosity to reduce log spam.
- Wait for kind controllers to be ready before running tests.