- Add DefaultStatusWatcher that wraps DynamicClient and manages
informers for a set of resource objects.
- Supports two modes: root-scoped & namespace-scoped.
- Root-scoped mode uses root-scoped informers for efficiency and
performance.
- Namespace-scoped mode uses namespace-scoped informers to
minimize the permissions needed to run and the size of the
in-memory object cache.
- Automatic mode selects which mode to use based on whether the
objects being watched are in one or multiple namespaces.
This is the default mode, optimizing for performance.
- If CRDs are being watched, the creation/deletion of CRDs can
cause informers for those custom resources to be created/deleted.
- In namespace-scoped mode, if namespaces are being watched, the
creation/deletion of namespaces can also trigger informers to
be created/deleted.
- All creates/updates/deletes to CRDs also cause RESTMapper reset.
- Allow pods to be unschedulable for 15s before reporting the
status as Failed. Any update resets the timer.
- Add BlindStatusWatcher for testing and for disabling status
watching during dry-run.
- Add DynamicClusterReader that wraps DynamicClient.
This is now used to look up generated resources
(ex: Deployment > ReplicaSets > Pods).
- Add DefaultStatusReader which uses a DelegatingStatusReader to
wrap a list of conventional and specific StatusReaders.
This should make it easier to extend the list of StatusReaders.
- Move some pending WaitEvents to be optional in tests, now that
StatusWatcher can resolve their status before the WaitTask starts.
- Add a new Thousand Deployments stress test (10x kind nodes)
- Add some new logs for easier debugging
- Add internal SyncEvent so that apply/delete tasks don't start
until the StatusWatcher has finished initial synchronization.
This helps avoid missing events from actions that happen while
synchronization is incomplete.
- Filter optional pending WaitEvents when testing.
BREAKING CHANGE: Replace StatusPoller w/ StatusWatcher
BREAKING CHANGE: Remove PollInterval (obsolete with watcher)
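
The 15s unschedulable grace period described above can be illustrated with a
small sketch. This is not the StatusWatcher implementation: podTracker,
observeUpdate, and the "InProgress"/"Failed" strings are hypothetical names
chosen for illustration, and treating the boundary as "at or after 15s" is an
assumption.

```go
package main

import (
	"fmt"
	"time"
)

// unschedulableGracePeriod is how long a pod may stay unschedulable before
// it is reported as Failed (15s, per the change description).
const unschedulableGracePeriod = 15 * time.Second

// podTracker is a hypothetical helper, not a cli-utils type.
type podTracker struct {
	lastUpdate time.Time // time of the most recent update to the pod
}

// observeUpdate records an update, which restarts the grace period.
func (t *podTracker) observeUpdate(now time.Time) {
	t.lastUpdate = now
}

// status returns "Failed" only once the grace period has elapsed with no
// intervening update; otherwise it returns "InProgress".
func (t *podTracker) status(now time.Time) string {
	if now.Sub(t.lastUpdate) >= unschedulableGracePeriod {
		return "Failed"
	}
	return "InProgress"
}

// simulate replays a fixed timeline and returns the status at each check.
func simulate() []string {
	start := time.Unix(0, 0)
	t := &podTracker{}
	t.observeUpdate(start)

	var out []string
	// 10s after the first update: still inside the grace period.
	out = append(out, t.status(start.Add(10*time.Second)))
	// An update at 10s resets the timer.
	t.observeUpdate(start.Add(10 * time.Second))
	// 20s on the clock, but only 10s since the last update.
	out = append(out, t.status(start.Add(20*time.Second)))
	// 40s on the clock, 30s since the last update: grace period exceeded.
	out = append(out, t.status(start.Add(40*time.Second)))
	return out
}

func main() {
	fmt.Println(simulate())
}
```

Passing the current time in as a parameter, rather than calling time.Now
inside the tracker, keeps the sketch deterministic and easy to test.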
- Events are critical to understanding progress and debugging issues
in e2e tests.
- Change events to log at level 3 instead of just level 5.
- Change the events string format to reduce verbosity.
- Hide the "Error" in the event string when there's no error.
This reduces unnecessary highlighting of lines in Prow test logs.
The previous sorting method was not stable and only worked coincidentally
for the two use cases that relied on it. The new method works on
more event types and only sorts contiguous events. This should keep
the sort usable when we add parallel apply and watch instead of poll.
Event Changes:
- Renamed ActionGroupEvent.Type -> Status
- Renamed Event.Operation -> Status
- Renamed Status fields to use consistent prefixes and suffixes
- Combined Applied, Changed, Unchanged, and ServersideApplied into
ApplySuccessful
- Added Failed status for apply, prune, and delete events
- Replaced Unspecified with Pending
- Made enum String output more consistent
Printer Changes:
- Added FormatSummary to print summary stats at the end of the
apply/destroy, instead of after the last of each type of action
group.
- Modified printer output to match the new, more consistent events.
- Updated JSON printer docs with latest schema details.
BREAKING CHANGE: Event "operations" and "type" are now "status"
BREAKING CHANGE: JSON printer schema changed to match events
BREAKING CHANGE: Event status enums renamed/refactored
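
For consumers migrating across this breaking change, the consolidation of
apply statuses listed above can be sketched as a mapping. The type and
function names here (applyStatus, fromLegacyOperation) and the exact String
values are assumptions for illustration, not the package's actual API; only
the grouping of old names into ApplySuccessful and the Unspecified -> Pending
replacement come from the change list.

```go
package main

import "fmt"

// applyStatus is a hypothetical stand-in for the consolidated status enum.
type applyStatus int

const (
	applyPending applyStatus = iota
	applySuccessful
	applySkipped
	applyFailed
)

// String returns a short, consistent name for each status.
func (s applyStatus) String() string {
	switch s {
	case applyPending:
		return "Pending"
	case applySuccessful:
		return "Successful"
	case applySkipped:
		return "Skipped"
	case applyFailed:
		return "Failed"
	default:
		return fmt.Sprintf("applyStatus(%d)", int(s))
	}
}

// fromLegacyOperation maps an old operation name to the new status.
// The second return value is false for names this sketch does not know.
func fromLegacyOperation(op string) (applyStatus, bool) {
	switch op {
	case "Applied", "Changed", "Unchanged", "ServersideApplied":
		return applySuccessful, true
	case "Unspecified":
		return applyPending, true
	default:
		return applyPending, false
	}
}

func main() {
	for _, op := range []string{"Changed", "ServersideApplied", "Unspecified"} {
		s, _ := fromLegacyOperation(op)
		fmt.Printf("%s -> %s\n", op, s)
	}
}
```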
- Ex: make test-e2e-focus FOCUS=ApplyDestroy
- Rename e2e tests to be easier to copy/paste/focus without spaces.
- Reduce stress test verbosity to reduce log spam.
- Wait for kind controllers to be ready before running tests.
- Use the client-go ListPager to paginate requests made by the
CachingClusterReader used by the StatusPoller.
- Limit requests to 500 items per response (ListPager default)
- Benefits:
- Reduce the length of blocking etcd operations, allowing for more
responsiveness and higher throughput
- Reduce risk of request timeout
- Reduce size of response for extra large objects
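
The pagination pattern behind these benefits can be sketched without any
Kubernetes dependencies. This is a generic continue-token loop, not the
client-go ListPager API: listFunc, fakeServer, and listAll are hypothetical
names, and the in-memory "server" exists only to make the example runnable.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultPageSize mirrors the 500-item limit mentioned above.
const defaultPageSize = 500

// listFunc fetches one page: at most limit items starting at continueToken.
// An empty next token means there are no more pages.
type listFunc func(limit int, continueToken string) (items []int, next string, err error)

// fakeServer returns a listFunc over the integers [0, total) and counts
// how many requests it receives. It stands in for a real API server.
func fakeServer(total int, calls *int) listFunc {
	return func(limit int, continueToken string) ([]int, string, error) {
		*calls++
		start := 0
		if continueToken != "" {
			n, err := strconv.Atoi(continueToken)
			if err != nil || n < 0 || n > total {
				return nil, "", fmt.Errorf("invalid continue token %q", continueToken)
			}
			start = n
		}
		end := start + limit
		if end > total {
			end = total
		}
		items := make([]int, 0, end-start)
		for i := start; i < end; i++ {
			items = append(items, i)
		}
		next := ""
		if end < total {
			next = strconv.Itoa(end)
		}
		return items, next, nil
	}
}

// listAll requests pages of pageSize items until the server stops
// returning a continue token, accumulating every item.
func listAll(list listFunc, pageSize int) ([]int, error) {
	var all []int
	token := ""
	for {
		items, next, err := list(pageSize, token)
		if err != nil {
			return nil, err
		}
		all = append(all, items...)
		if next == "" {
			return all, nil
		}
		token = next
	}
}

func main() {
	calls := 0
	items, err := listAll(fakeServer(1200, &calls), defaultPageSize)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d items in %d requests\n", len(items), calls)
}
```

Each request is bounded by the page size, so no single response has to carry
the full list.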
- Rewrite actuation filters to return an error with the reason for
skipping.
- Add explicit error types for most skip errors, to make it easier to catch
and handle them.
- Add Is method to explicit error types to allow use of errors.Is for
recursive unwrapped matching.
- Rename InventoryPolicyFilter to InventoryPolicyPruneFilter for
consistency with InventoryPolicyApplyFilter
- Update deletion prevention inventory-id removal to use errors.As instead
of matching the filter name.
- Convert error structs to use pointers to allow nil errors and avoid
copying contents.
- Update printers to handle skip errors
BREAKING CHANGE: Skipped actuation events now include an error.
BREAKING CHANGE: DeleteEvent.Reason replaced with an error.
BREAKING CHANGE: Unused InventoryNamespaceInSet error removed.
BREAKING CHANGE: InventoryOverlapError replaced with PolicyPreventedActuationError.
BREAKING CHANGE: NeedAdoptionError replaced with PolicyPreventedActuationError.
BREAKING CHANGE: NoInventoryObjError & MultipleInventoryObjError now use pointers.
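
The combination of pointer error types, an Is method, and errors.As described
above follows a standard Go pattern, sketched here. skipError and its fields,
the filter name, and the helper functions are hypothetical; they are not the
error types introduced by this change.

```go
package main

import (
	"errors"
	"fmt"
)

// skipError is a hypothetical explicit skip error. It is used via pointer
// so that a nil error means "not skipped" and contents are not copied.
type skipError struct {
	Filter string
	Reason string
}

func (e *skipError) Error() string {
	return fmt.Sprintf("skipped by %s: %s", e.Filter, e.Reason)
}

// Is lets errors.Is match on field values instead of pointer identity,
// including when the error is wrapped by other errors.
func (e *skipError) Is(target error) bool {
	t, ok := target.(*skipError)
	if !ok || t == nil || e == nil {
		return false
	}
	return e.Filter == t.Filter && e.Reason == t.Reason
}

// pruneFilter returns a skip error with the reason, or nil to proceed.
func pruneFilter(prevent bool) error {
	if prevent {
		return &skipError{Filter: "example-filter", Reason: "deletion prevented"}
	}
	return nil
}

// actuate wraps any filter error with extra context, as a caller might.
func actuate(prevent bool) error {
	if err := pruneFilter(prevent); err != nil {
		return fmt.Errorf("prune: %w", err)
	}
	return nil
}

// isDeletionPrevented matches a specific skip error through any wrapping.
func isDeletionPrevented(err error) bool {
	return errors.Is(err, &skipError{Filter: "example-filter", Reason: "deletion prevented"})
}

// skipFilterName extracts the filter name from any wrapped skipError.
func skipFilterName(err error) (string, bool) {
	var se *skipError
	if errors.As(err, &se) {
		return se.Filter, true
	}
	return "", false
}

func main() {
	err := actuate(true)
	name, _ := skipFilterName(err)
	fmt.Println(err, isDeletionPrevented(err), name)
}
```

Using errors.As to recover the typed error avoids comparing filter names as
strings at the call site.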
- Stress test covers 1,000 Namespaces, ConfigMaps, & CronTabs (CR)
- Stress test is a new test suite with its own make entrypoint
- Refactor shared test code so the e2e and stress tests can both use it
- Update test client QPS to 20 (from 5)
- StatusPolicyNone disables inventory status updates.
- StatusPolicyAll fully enables inventory status updates.
- This allows an opt-out feature for working around the problem
that adding status can make the inventory larger than the max
etcd object size, causing the applier to exit without applying
or pruning anything. With StatusPolicyNone, the user can still
safely prune objects to make their inventory smaller, and then
re-enable the status with StatusPolicyAll.
- Note: the default ConfigMap does not currently support status,
so this only affects custom inventory impls.
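
A minimal sketch of how a custom inventory implementation might honor the
policy follows. Only the names StatusPolicyNone and StatusPolicyAll come from
the notes above; the underlying int type, the constant ordering, the inventory
struct, and buildInventory are assumptions made for illustration.

```go
package main

import "fmt"

// StatusPolicy controls whether object status is written to the inventory.
// The int representation and ordering here are assumptions.
type StatusPolicy int

const (
	// StatusPolicyNone disables inventory status updates.
	StatusPolicyNone StatusPolicy = iota
	// StatusPolicyAll fully enables inventory status updates.
	StatusPolicyAll
)

// inventory is a hypothetical custom inventory payload.
type inventory struct {
	Objects  []string          // object references tracked by the inventory
	Statuses map[string]string // per-object status; nil when disabled
}

// buildInventory always records object references, but only attaches
// statuses when the policy allows it, keeping the stored object smaller.
func buildInventory(policy StatusPolicy, objects []string, statuses map[string]string) inventory {
	inv := inventory{Objects: objects}
	if policy == StatusPolicyAll {
		inv.Statuses = statuses
	}
	return inv
}

func main() {
	objs := []string{"ns/cm-a", "ns/cm-b"}
	statuses := map[string]string{"ns/cm-a": "Current", "ns/cm-b": "InProgress"}

	none := buildInventory(StatusPolicyNone, objs, statuses)
	all := buildInventory(StatusPolicyAll, objs, statuses)
	fmt.Println(len(none.Objects), len(none.Statuses))
	fmt.Println(len(all.Objects), len(all.Statuses))
}
```

With StatusPolicyNone the object references are still written, so pruning
keeps working while the status payload is left out.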