There doesn't seem to be any benefit to tidying after
getting every dependency package, and trying to tidy after
every get appears to be broken:
* Some of the k8s.io dependencies depend on each other.
For example k8s.io/client-go depends on k8s.io/api.
* We bump k8s.io/api to a new version, and that version removes
some package (e.g. replaces v1alpha1 with v1alpha2) that
some other dependency (e.g. k8s.io/client-go) requires.
* If we try to tidy immediately after getting k8s.io/api, this can
fail because the new k8s.io/api version no longer has the
package that k8s.io/client-go (still at the old version,
since we haven't processed it yet) requires.
Getting all of the dependencies at their new versions first, and only
tidying once afterwards, avoids this issue.
simulator.BuildNodeInfoForNode, core_utils.GetNodeInfoFromTemplate,
and scheduler_utils.DeepCopyTemplateNode all had very similar logic
for sanitizing and copying NodeInfos. They're all consolidated into
one file in simulator, sharing common logic.
DeepCopyNodeInfo is changed to be a framework.NodeInfo method.
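For illustration, the consolidated copy roughly looks like the sketch
below; the NodeInfo fields here are a simplified stand-in for the real
framework.NodeInfo, not its actual layout:

```go
package example

import apiv1 "k8s.io/api/core/v1"

// NodeInfo is a simplified stand-in for the real framework.NodeInfo, which
// carries additional scheduler state alongside the node and its pods.
type NodeInfo struct {
	Node *apiv1.Node
	Pods []*apiv1.Pod
}

// DeepCopy returns a copy that shares nothing with the original, so a
// sanitized template NodeInfo can be modified without affecting the source.
func (n *NodeInfo) DeepCopy() *NodeInfo {
	if n == nil {
		return nil
	}
	out := &NodeInfo{Node: n.Node.DeepCopy()}
	for _, pod := range n.Pods {
		out.Pods = append(out.Pods, pod.DeepCopy())
	}
	return out
}
```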
MixedTemplateNodeInfoProvider now correctly uses ClusterSnapshot to
correlate Nodes to scheduled pods, instead of using a live Pod lister.
This means that the snapshot now has to be properly initialized in a
bunch of tests.
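Roughly, the correlation now comes from the snapshot instead of a live
lister; the interface below is a made-up, trimmed-down view used only to
illustrate the idea, not the actual ClusterSnapshot API:

```go
package example

import apiv1 "k8s.io/api/core/v1"

// snapshotView is a hypothetical read-only slice of ClusterSnapshot: node
// names together with the pods the snapshot considers scheduled on them.
type snapshotView interface {
	NodeNames() []string
	ScheduledPods(nodeName string) []*apiv1.Pod
}

// podsByNode builds the node -> pods mapping from the snapshot rather than
// from a live pod lister, so it reflects the simulated state, including any
// pods the autoscaler has already moved around inside the snapshot.
func podsByNode(s snapshotView) map[string][]*apiv1.Pod {
	result := make(map[string][]*apiv1.Pod, len(s.NodeNames()))
	for _, name := range s.NodeNames() {
		result[name] = s.ScheduledPods(name)
	}
	return result
}
```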
This allows using errors.Is() to check if an AutoscalerError wraps
a sentinel error (e.g. cloudprovider.ErrNotImplemented) when a prefix is
added to it.
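For example (the addPrefix helper and sentinel below are illustrative, not
the real AutoscalerError API), wrapping with %w is what lets errors.Is()
see through the prefix:

```go
package main

import (
	"errors"
	"fmt"
)

// errNotImplemented stands in for a sentinel like cloudprovider.ErrNotImplemented.
var errNotImplemented = errors.New("not implemented")

// addPrefix is a hypothetical helper: by wrapping with %w instead of
// flattening the error into a string, the original sentinel stays reachable.
func addPrefix(prefix string, err error) error {
	return fmt.Errorf("%s: %w", prefix, err)
}

func main() {
	err := addPrefix("failed to get node group size", errNotImplemented)
	// errors.Is unwraps through the prefix and finds the sentinel.
	fmt.Println(errors.Is(err, errNotImplemented)) // true
}
```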
The current log message emitted when no container is found is
misleading and can cause confusion.
This passes the entire VPA object into that function so that it can
emit a log message that includes the relevant VPA name.
It feels a bit like surgery with a scalpel; any alternative approaches
would be appreciated.
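Roughly what this enables (the function name, signature, and message
wording below are made up for illustration):

```go
package example

import (
	"k8s.io/klog/v2"

	vpa_types "k8s.io/autoscaler/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1"
)

// logMissingContainer is a hypothetical helper: because the whole VPA object
// is passed in, the message can say which VPA the missing container belongs
// to, instead of a generic message with no context.
func logMissingContainer(vpa *vpa_types.VerticalPodAutoscaler, containerName string) {
	klog.V(4).Infof("could not find container %q for VPA %s/%s",
		containerName, vpa.Namespace, vpa.Name)
}
```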
RemoveNode is renamed to RemoveNodeInfo for consistency with other
NodeInfo methods.
For DRA, the snapshot will have to potentially allocate ResourceClaims
when adding a Pod to a Node, and deallocate them when removing a Pod
from a Node. This will happen in new methods added to ClusterSnapshot
in later commits - SchedulePod and UnschedulePod. These new methods
should be the "default" way of moving pods around the snapshot going
forward.
However, we'll still need to be able to add and remove pods from the
snapshot "forcefully" to handle some corner cases (e.g. expendable pods).
AddPod is renamed to ForceAddPod, and RemovePod to ForceRemovePod to
highlight that these are no longer the "default" methods of moving pods
around the snapshot, and are bypassing something important.
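As a sketch of the intended split (the method names come from the
description above, but the signatures and the trimmed-down interface are
assumptions, not the actual ClusterSnapshot API):

```go
package example

import apiv1 "k8s.io/api/core/v1"

// podSnapshot is a hypothetical slice of the ClusterSnapshot surface
// relevant to moving pods around.
type podSnapshot interface {
	// SchedulePod/UnschedulePod are the intended "default" way of moving
	// pods; with DRA they would also (de)allocate the pod's ResourceClaims.
	SchedulePod(pod *apiv1.Pod, nodeName string) error
	UnschedulePod(namespace, name, nodeName string) error

	// ForceAddPod/ForceRemovePod bypass that bookkeeping and exist only
	// for corner cases such as expendable pods.
	ForceAddPod(pod *apiv1.Pod, nodeName string) error
	ForceRemovePod(namespace, name, nodeName string) error
}
```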
AddNodeInfo already provides the same functionality, and has to be used
in production code in order to propagate DRA objects correctly.
Uses in production are replaced with SetClusterState(), which will later
take DRA objects into account. Uses in the test code are replaced with
AddNodeInfo().
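Roughly, the test-side replacement looks like this (the interface and
signature below are simplified assumptions, not the real API):

```go
package example

import apiv1 "k8s.io/api/core/v1"

// addNodeInfoSnapshot is a hypothetical slice of the snapshot interface; the
// real method takes a framework.NodeInfo, which bundles the node with its
// pods (and, later, its DRA objects).
type addNodeInfoSnapshot interface {
	AddNodeInfo(node *apiv1.Node, pods ...*apiv1.Pod) error
}

// addTestNode shows the test-side replacement: the node and everything
// scheduled on it go into the snapshot together, instead of adding a bare node.
func addTestNode(s addNodeInfoSnapshot, node *apiv1.Node, pods ...*apiv1.Pod) error {
	return s.AddNodeInfo(node, pods...)
}
```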
AddNodes() is redundant - it was intended for batch-adding nodes,
probably with batch-specific optimizations in mind. However, it
has always been implemented as just iterating over AddNode(), and
is only used in test code.
Most of the uses in the test code were initializing the cluster state.
They are replaced with SetClusterState(), which will later be needed for
handling DRA anyway (we'll have to start tracking things that aren't
node- or pod-scoped). The other uses are replaced with inline loops over
AddNode().
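For illustration, the two replacements look roughly like this (the
interface and signatures are simplified assumptions, not the real API):

```go
package example

import apiv1 "k8s.io/api/core/v1"

// snapshot is a hypothetical minimal interface for the calls discussed here.
type snapshot interface {
	SetClusterState(nodes []*apiv1.Node, pods []*apiv1.Pod) error
	AddNode(node *apiv1.Node) error
}

// initializeState replaces the "batch add as initialization" pattern:
// the whole initial cluster state goes in through one call.
func initializeState(s snapshot, nodes []*apiv1.Node, pods []*apiv1.Pod) error {
	return s.SetClusterState(nodes, pods)
}

// addExtraNodes replaces the remaining AddNodes() call sites with an
// explicit loop over AddNode(), which is all AddNodes() ever did.
func addExtraNodes(s snapshot, nodes []*apiv1.Node) error {
	for _, node := range nodes {
		if err := s.AddNode(node); err != nil {
			return err
		}
	}
	return nil
}
```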