- Add DefaultStatusWatcher that wraps DynamicClient and manages
informers for a set of resource objects.
- Supports two modes: root-scoped & namespace-scoped.
- Root-scoped mode uses root-scoped informers to efficiency and
performance.
- Namespace-scoped mode uses namespace-scoped informers to
minimize the permissions needed to run and the size of the
in-memory object cache.
- Automatic mode selects which mode to use based on whether the
objects being watched are in one or multiple namespaces.
This is the default mode, optimizing for performance.
- If CRDs are being watched, the creation/deletion of CRDs can
cause informers for those custom resources to be created/deleted.
- In namespace-scope mode, if namespaces are being watched, the
creation/deletion of namespaces can also trigger informers to
be created/deleted.
- All creates/updates/deletes to CRDs also cause RESTMapper reset.
- Allow pods to be unschedulable for 15s before reporting the
status as Failed. Any update resets the timer.
- Add BlindStatusWatcher for testing and disabling for dry-run.
- Add DynamicClusterReader that wraps DynamicClient.
This is now used to look up generated resources
(ex: Deployment > ReplicaSets > Pods).
- Add DefaultStatusReader which uses a DelegatingStatusReader to
wrap a list of conventional and specific StatusReaders.
This should make it easier to extend the list of StatusReaders.
- Move some pending WaitEvents to be optional in tests, now that
StatusWatcher can resolve their status before the WaitTask starts.
- Add a new Thousand Deployments stress test (10x kind nodes)
- Add some new logs for easier debugging
- Add internal SyncEvent so that apply/delete tasks don't start
until the StatusWatcher has finished initial synchronization.
This helps avoid missing events from actions that happen while
synchronization is incomplete.
- Filter optional pending WaitEvents when testing.
BREAKING CHANGE: Replace StatusPoller w/ StatusWatcher
BREAKING CHANGE: Remove PollInterval (obsolete with watcher)
Event Changes:
- Renamed ActionGroupEvent.Type -> Status
- Renamed Event.Operation -> Status
- Renamed Status fields to use consistent prefixes and suffixes
- Combined Applied, Changed, Unchanged, and ServersideApplied into
ApplySuccessful
- Added Failed status for apply, prune, and delete events
- Replaced Unspecified with Pending
- Made enum String output more consistent
Printer Changes:
- Added FormatSummary to print summary stats at the end of the
apply/destroy, instead of after the last of each type of action
group.
- Modified printer output to match new more consistent events.
- Updated JSON printer docs with latest schema details.
BREAKING CHANGE: Event "operations" and "type" are now "status"
BREAKING CHANGE: JSON printer schema changed to match events
BREAKING CHANGE: Event status enums renamed/refactored
- Add a new Inventory KRM object for storing the spec and status
of the inventory objects in memory.
- Improve reconcile, apply, & delete status tracking in the
TaskContext/Inventory to cover all possible statuses
- Move most of the convenience methods from the TaskContext into a
new inventory.Manager.
- Fix a minor bug where object UID might have drifted (delete &
recreate) between GET and DELETE.
- Make FakeRESTMapper comparable
- Add Comparer for WaitTask
- Add Asserter to allow pre-configuring comparison options for Equal/NotEqual
- Update solver tests to use Asserter for task list comparison (more actionable errors)
- Converted WaitTask to use a context for timeout/cancellation, to
improve readability and reduce error cases. Now it only sends
TaskResult events from one place, removing the need for a token to
silence subsequent complete() calls.
- Add pending object tracking to WaitTask to ensure all objects are
accounted for at least one WaitEvent.
- Upgraded applier tests to use full event comparison, for better signal
on breaking changes.
- Enhanced task tests to consume all events before validating and test
actual event output.
- Add set sorting to graph.SortObjs, to make wait event ordering more
consistent, to make testing easier.
BREAKING CHANGE: wait tasks always execute after apply/prune/delete (except dry runs)
BREAKING CHANGE: wait tasks default to waiting until cancelled (previously 1m default)
- WaitEvent can be Pending, Reconciled, Skipped, or Timeout.
Skipped, Pending, and Reconciled events are sent at task start.
Reconciled events are sent later as status updates are recieved.
Timeout events are sent for remaining events on timeout.
- Rewrite WaitTask.Start to use context.WithTimeout and a goroutine to
handle task completion (replacing setTimer and the token hack).
- Replaced Task.ClearTimeout with Task.Cancel.
- Replaced WaitTask.complete & checkCondition with Task.StatusUpdate.
- Replaced WaitTask.startAndComplete with a check in WaitTask.Start.
- Replaced WaitTask.amendTimeoutError with WaitTask.sendTimeoutEvents
to send multiple timeout events, instead of one event with a list of
TimedOutResources.
- Updated all printers to handle WaitEvent.
Event printer now includes reconcile events.
JSON printer now includes resourceReconciled eventType.
Table printer not includes reconcile column.
- Added JSON printer tests for error handling.
- Updated Formatter.FormatActionGroupEvent to collect WaitStats.
- Enable status events by default for kapply with table output
BREAKING CHANGE: WaitEvents now sent for each object
Today, when a WaitTask timeout happens, the WaitTask sends the
TimeoutError on the TaskChannel. After receiving the TimeoutError,
`baseRunner.run` terminates immediately by returning the error to its
caller (Applier.Run or Destroyer.Run). The caller then sends the error
onto the EventChannel and terminates.
With this PR, when a WaitTask timeout happens, the WaitTask sends a
WaitType Event including the TimeoutError on the EventChannel, and then
sends an empty TaskResult on the TaskChannel. An empty TaskResult
suggests that the task finished successfully, and therefore
`baseRunner.run` would continue instead of terminate.
The motivation of this change is to make sure that cli-utils only
terminates on fatal errors (such as inventory-related errors, and
ApplyOptions creation errors). A WaitTask timeout may not always mean a
fatal error (it may happen because the StatusPoller has not finished
polling everything, or some but not all the resources have not reached
the desired status), and therefore should not terminate cli-utils.
- Added SkippedApplies and SkippedDeletes to the TaskContext
- Modified tasks to use the new skipped tracking, replacing usage
of failure tracking, where skipped is more accurate.
- Renamed some TaskContext methods for consistency
- Added ObjMetadataSetFromMap for use by TaskContext
- Added ObjectMetadataSet.Intersection for use by InvSetTask
- Cleaned up InvSetTask to be more readable with comments explaining
intended behavior, including handling of skips.
- Added apply and prune tests for skipped, failure, and abandoned
- Refactor usages of []ObjMetadata to use ObjMetadataSet
- Move Union, Diff, Contains, Hash, Remove, and Equal into
ObjMetadataSet
- Add ToStringMap and FromStringMap for inventory serialization
- Make ResourceCache thread-safe
- Make ResourceCache store status and messagei
- Add ResourceCache to baseRunner and TaskContext
- Make Mutator compute resource status for uncached resources
- Share cache between StatusPoller and Mutator
- Move Condition and conditionMet() to its own file
- Simplify WaitTask.checkCondition
- Simplify baseRunner.amendTimeoutError