- Fix regression from switch to GroupVersionKind from GroupKind
- New error string format: G/V/K, G/V/K
- Avoid K.G/V, which looks like but isn't R.G/V
- Converted WaitTask to use a context for timeout/cancellation, to
improve readability and reduce error cases. Now it only sends
TaskResult events from one place, removing the need for a token to
silence subsequent complete() calls.
- Add pending object tracking to WaitTask to ensure all objects are
accounted for at least one WaitEvent.
- Upgraded applier tests to use full event comparison, for better signal
on breaking changes.
- Enhanced task tests to consume all events before validating and test
actual event output.
- Add set sorting to graph.SortObjs, to make wait event ordering more
consistent, to make testing easier.
BREAKING CHANGE: wait tasks always execute after apply/prune/delete (except dry runs)
BREAKING CHANGE: wait tasks default to waiting until cancelled (previously 1m default)
This fixes a flakey test failure in the inventory policy test.
The failure was caused by the Deployment becoming reconciled
imediately after apply, before the WaitTask has started. Normally,
this should be fine, but the task runner was "de-duping" status
events without checking for generation changes. So when the status
went from NotFound (gen 1) to Current (gen 2), it was sent to the
still running ApplyTask. And when the status went from Current
(gen 2) to Current (gen 3), it was being de-duped and not sent to
the WaitTask.
Since the inventory test is the only e2e test performing an update,
it's the only one that was failing, and only sometimes.
The fix is to remove de-duping from the task runner, because the
status poller already performs its own deduping:
statusPollerRunner.isUpdatedResourceStatus.
Tests will now each have a context that is cancelled on timeout.
BeforeEach and AfterEach blocks have their own timeouts, to allow
bounded setup and cleanup.
- Test Timeout: 5m
- Before Test Timeout: 30s
- After Test Timeout: 30s
Linter complains that the expected event lists are too similar,
which is deliberate and should not be abstracted away. Explicit
test expectations in the test makes it easier to debug failures.
- WaitEvent can be Pending, Reconciled, Skipped, or Timeout.
Skipped, Pending, and Reconciled events are sent at task start.
Reconciled events are sent later as status updates are recieved.
Timeout events are sent for remaining events on timeout.
- Rewrite WaitTask.Start to use context.WithTimeout and a goroutine to
handle task completion (replacing setTimer and the token hack).
- Replaced Task.ClearTimeout with Task.Cancel.
- Replaced WaitTask.complete & checkCondition with Task.StatusUpdate.
- Replaced WaitTask.startAndComplete with a check in WaitTask.Start.
- Replaced WaitTask.amendTimeoutError with WaitTask.sendTimeoutEvents
to send multiple timeout events, instead of one event with a list of
TimedOutResources.
- Updated all printers to handle WaitEvent.
Event printer now includes reconcile events.
JSON printer now includes resourceReconciled eventType.
Table printer not includes reconcile column.
- Added JSON printer tests for error handling.
- Updated Formatter.FormatActionGroupEvent to collect WaitStats.
- Enable status events by default for kapply with table output
BREAKING CHANGE: WaitEvents now sent for each object
The Pruner modifies the input object when the deletion prevention
annotation is present. This in turn modifies the object map, which
creates a race condition when reading the object map from the task
runner. To fix the race, a copy of the object is made, but only if
the deletion prevention filter prevents deletion.
Historically, the objects in ApplyTask were being modified in the
Start func by BuildInfo to remove the path annotation, but this can
cause a race condition in the task runner if it tried to read the
task objects.
Now a DeepCopy is made in BuildInfo instead. This avoids the race
condition but broke mutation, which executes after BuildInfo on the
original object. So we now extract the object from the into after
BuildInfo and use that instead for mutations, filters, and events.
Today, when a WaitTask timeout happens, the WaitTask sends the
TimeoutError on the TaskChannel. After receiving the TimeoutError,
`baseRunner.run` terminates immediately by returning the error to its
caller (Applier.Run or Destroyer.Run). The caller then sends the error
onto the EventChannel and terminates.
With this PR, when a WaitTask timeout happens, the WaitTask sends a
WaitType Event including the TimeoutError on the EventChannel, and then
sends an empty TaskResult on the TaskChannel. An empty TaskResult
suggests that the task finished successfully, and therefore
`baseRunner.run` would continue instead of terminate.
The motivation of this change is to make sure that cli-utils only
terminates on fatal errors (such as inventory-related errors, and
ApplyOptions creation errors). A WaitTask timeout may not always mean a
fatal error (it may happen because the StatusPoller has not finished
polling everything, or some but not all the resources have not reached
the desired status), and therefore should not terminate cli-utils.