28 Commits

Author SHA1 Message Date
Karl Isenberg c46949360e feat: replace StatusPoller w/ StatusWatcher
- Add DefaultStatusWatcher that wraps DynamicClient and manages
  informers for a set of resource objects.
  - Supports two modes: root-scoped & namespace-scoped.
  - Root-scoped mode uses root-scoped informers for efficiency and
    performance.
  - Namespace-scoped mode uses namespace-scoped informers to
    minimize the permissions needed to run and the size of the
    in-memory object cache.
  - Automatic mode selects which mode to use based on whether the
    objects being watched are in one or multiple namespaces.
    This is the default mode, optimizing for performance.
  - If CRDs are being watched, the creation/deletion of CRDs can
    cause informers for those custom resources to be created/deleted.
  - In namespace-scope mode, if namespaces are being watched, the
    creation/deletion of namespaces can also trigger informers to
    be created/deleted.
  - All creates/updates/deletes to CRDs also cause RESTMapper reset.
  - Allow pods to be unschedulable for 15s before reporting the
    status as Failed. Any update resets the timer.
- Add BlindStatusWatcher for testing and disabling for dry-run.
- Add DynamicClusterReader that wraps DynamicClient.
  This is now used to look up generated resources
  (ex: Deployment > ReplicaSets > Pods).
- Add DefaultStatusReader which uses a DelegatingStatusReader to
  wrap a list of conventional and specific StatusReaders.
  This should make it easier to extend the list of StatusReaders.
- Move some pending WaitEvents to be optional in tests, now that
  StatusWatcher can resolve their status before the WaitTask starts.
- Add a new Thousand Deployments stress test (10x kind nodes)
- Add some new logs for easier debugging
- Add internal SyncEvent so that apply/delete tasks don't start
  until the StatusWatcher has finished initial synchronization.
  This helps avoid missing events from actions that happen while
  synchronization is incomplete.
- Filter optional pending WaitEvents when testing.

BREAKING CHANGE: Replace StatusPoller w/ StatusWatcher
BREAKING CHANGE: Remove PollInterval (obsolete with watcher)
2022-05-10 10:40:05 -07:00
Karl Isenberg 393ecfe7a5 feat: improve event status consistency
Event Changes:
- Renamed ActionGroupEvent.Type -> Status
- Renamed Event.Operation -> Status
- Renamed Status fields to use consistent prefixes and suffixes
- Combined Applied, Changed, Unchanged, and ServersideApplied into
  ApplySuccessful
- Added Failed status for apply, prune, and delete events
- Replaced Unspecified with Pending
- Made enum String output more consistent

Printer Changes:
- Added FormatSummary to print summary stats at the end of the
  apply/destroy, instead of after the last of each type of action
  group.
- Modified printer output to match new more consistent events.
- Updated JSON printer docs with latest schema details.

BREAKING CHANGE: Event "operations" and "type" are now "status"
BREAKING CHANGE: JSON printer schema changed to match events
BREAKING CHANGE: Event status enums renamed/refactored
2022-04-14 01:14:10 -07:00
Karl Isenberg 98d3504e3d fix: Avoid logging UID error for skipped apply 2022-03-05 16:12:20 -08:00
Karl Isenberg 412341fa6f fix: Handle async object replacement
Fixes: https://github.com/kubernetes-sigs/cli-utils/issues/527
2022-02-10 12:50:14 -08:00
Karl Isenberg 5c095e8d66 chore: Add Inventory to TaskContext
- Add a new Inventory KRM object for storing the spec and status
  of the inventory objects in memory.
- Improve reconcile, apply, & delete status tracking in the
  TaskContext/Inventory to cover all possible statuses
- Move most of the convenience methods from the TaskContext into a
  new inventory.Manager.
- Fix a minor bug where object UID might have drifted (delete &
  recreate) between GET and DELETE.
2022-02-03 11:16:54 -08:00
Mikhail Mazurskiy 7b398fa33a fix: remove reflection hack
Reset RESTMapper using the new API rather than using reflection.
2022-01-23 14:46:39 +11:00
Karl Isenberg 6536f948a8 fix: Improve solver tests
- Make FakeRESTMapper comparable
- Add Comparer for WaitTask
- Add Asserter to allow pre-configuring comparison options for Equal/NotEqual
- Update solver tests to use Asserter for task list comparison (more actionable errors)
2022-01-21 11:12:44 -08:00
Morten Torkildsen 703535fa6d Avoid waiting for stalled resources 2021-12-02 09:17:44 -08:00
Karl Isenberg c5636c0243 feat: Always wait for reconciliation
- Converted WaitTask to use a context for timeout/cancellation, to
  improve readability and reduce error cases. Now it only sends
  TaskResult events from one place, removing the need for a token to
  silence subsequent complete() calls.
- Add pending object tracking to WaitTask to ensure every object is
  accounted for by at least one WaitEvent.
- Upgraded applier tests to use full event comparison, for better signal
  on breaking changes.
- Enhanced task tests to consume all events before validating and test
  actual event output.
- Add set sorting to graph.SortObjs, which makes wait event ordering
  more consistent and testing easier.

BREAKING CHANGE: wait tasks always execute after apply/prune/delete (except dry runs)
BREAKING CHANGE: wait tasks default to waiting until cancelled (previously 1m default)
2021-11-10 12:34:54 -08:00
Karl Isenberg ab5a4dc294 fix: Improve task & event logging
- Log tasks at start and end
- Log events before sending
- Add String funcs for readable event logs
2021-11-09 16:43:25 -08:00
Karl Isenberg a24aaea775 feat: send WaitEvent for every resource
- WaitEvent can be Pending, Reconciled, Skipped, or Timeout.
  Skipped, Pending, and Reconciled events are sent at task start.
  Reconciled events are sent later as status updates are received.
  Timeout events are sent for remaining events on timeout.
- Rewrite WaitTask.Start to use context.WithTimeout and a goroutine to
  handle task completion (replacing setTimer and the token hack).
- Replaced Task.ClearTimeout with Task.Cancel.
- Replaced WaitTask.complete & checkCondition with Task.StatusUpdate.
- Replaced WaitTask.startAndComplete with a check in WaitTask.Start.
- Replaced WaitTask.amendTimeoutError with WaitTask.sendTimeoutEvents
  to send multiple timeout events, instead of one event with a list of
  TimedOutResources.
- Updated all printers to handle WaitEvent.
  Event printer now includes reconcile events.
  JSON printer now includes resourceReconciled eventType.
  Table printer now includes reconcile column.
- Added JSON printer tests for error handling.
- Updated Formatter.FormatActionGroupEvent to collect WaitStats.
- Enable status events by default for kapply with table output

BREAKING CHANGE: WaitEvents now sent for each object
2021-11-04 10:57:35 -07:00
Haiyan Meng 87877d28d1 Change the behavior of handling a WaitTask timeout
Today, when a WaitTask timeout happens, the WaitTask sends the
TimeoutError on the TaskChannel. After receiving the TimeoutError,
`baseRunner.run` terminates immediately by returning the error to its
caller (Applier.Run or Destroyer.Run). The caller then sends the error
onto the EventChannel and terminates.

With this PR, when a WaitTask timeout happens, the WaitTask sends a
WaitType Event including the TimeoutError on the EventChannel, and then
sends an empty TaskResult on the TaskChannel. An empty TaskResult
suggests that the task finished successfully, and therefore
`baseRunner.run` continues instead of terminating.

The motivation of this change is to make sure that cli-utils only
terminates on fatal errors (such as inventory-related errors, and
ApplyOptions creation errors). A WaitTask timeout does not always mean a
fatal error (it may happen because the StatusPoller has not finished
polling everything, or because some, but not all, of the resources have
not reached the desired status), and therefore should not terminate
cli-utils.
2021-10-29 16:18:21 -07:00
Karl Isenberg eda3554fb1 fix: skipped deletes no longer cause waiting
- Added SkippedApplies and SkippedDeletes to the TaskContext
- Modified tasks to use the new skipped tracking, replacing usage
  of failure tracking, where skipped is more accurate.
- Renamed some TaskContext methods for consistency
- Added ObjMetadataSetFromMap for use by TaskContext
- Added ObjectMetadataSet.Intersection for use by InvSetTask
- Cleaned up InvSetTask to be more readable with comments explaining
  intended behavior, including handling of skips.
- Added apply and prune tests for skipped, failure, and abandoned
2021-10-29 10:28:06 -07:00
Mikhail Mazurskiy 7333873c91 Drop k8s.io/apiextensions-apiserver dependency 2021-10-11 21:27:48 +11:00
Karl Isenberg d83ce93efd Add ObjMetadataSet to encapsulate set functions
- Refactor usages of []ObjMetadata to use ObjMetadataSet
- Move Union, Diff, Contains, Hash, Remove, and Equal into
  ObjMetadataSet
- Add ToStringMap and FromStringMap for inventory serialization
2021-10-06 14:25:12 -07:00
Karl Isenberg d964b0397b Merge Collector and ResourceCache
- Make ResourceCache thread-safe
- Make ResourceCache store status and message
- Add ResourceCache to baseRunner and TaskContext
- Make Mutator compute resource status for uncached resources
- Share cache between StatusPoller and Mutator
- Move Condition and conditionMet() to its own file
- Simplify WaitTask.checkCondition
- Simplify baseRunner.amendTimeoutError
2021-09-30 12:01:19 -07:00
Sean Sullivan 7b512edd67 No wait checking for resource failures or filters 2021-07-01 17:36:25 -07:00
Sean Sullivan afa0ba9c46 Log statement at start of tasks 2021-06-25 11:41:39 -07:00
Morten Torkildsen ae80e561e2 Improve the event hierarchy 2021-05-18 14:58:41 -07:00
Jingfang Liu a6bb8c2150 ignore the failed resources in wait task 2020-12-15 09:28:11 -08:00
Morten Torkildsen 202621ee54 Update to the latest version of golint 2020-10-04 18:11:56 -07:00
Morten Torkildsen 034d5c3456 Use the TaskContext for passing UIDs of applied resources to the pruner 2020-09-25 14:25:13 -07:00
Morten Torkildsen 9135af4218 Improve error handling 2020-08-16 20:10:17 -07:00
Morten Torkildsen dd5ae818cb Print more info on wait timeout 2020-05-03 14:52:30 -07:00
Morten Torkildsen b494b8e33b Use resource generation in status decisions 2020-04-21 20:43:41 -07:00
Morten Torkildsen 260f419d8b Add TaskContext to be passed between tasks 2020-04-17 14:04:58 -07:00
Morten Torkildsen 060d1c1df6 Fix some synchronization issues 2020-03-25 21:32:21 -07:00
Morten Torkildsen a8a3e0d034 Implement the applier as a queue of tasks 2020-03-23 14:47:40 -07:00