27 Commits

Author SHA1 Message Date
Piotr Betkier ac1c7b5463 use k8s.io/component-helpers/resource for pod request calculations 2025-04-22 17:36:17 +02:00
Jayant Jain 76b20e430f Revert "Fix nil pointer exception for case when node is nil while processing …" 2023-08-04 13:29:58 +02:00
Jayant Jain e777d7962c Fix nil pointer exception for case when node is nil while processing gpuInfo 2023-08-01 12:00:10 +02:00
Hakan Bostan 2ea2fb66f6 Add "resource_name" to scaled_up_gpu_nodes_total and scaled_down_gpu_nodes_total metrics
* Added the new resource_name field to scaled_up/down_gpu_nodes_total,
  representing the resource name for the gpu.
* Changed metrics registrations to use GpuConfig
2023-02-22 10:09:45 +00:00
Hakan Bostan 1f646e4095 Add GetNodeGpuConfig to cloud provider
* Added GetNodeGpuConfig to cloud provider which returns a GpuConfig
  struct containing the gpu label, type and resource name if the node
  has a GPU.
* Added initial implementation of GetNodeGpuConfig to all cloud
  providers.
2023-02-14 14:08:29 +00:00
Flavian f1b6d4ded6 handle directx nodes the same as gpu nodes 2022-09-23 09:55:14 +02:00
Bartłomiej Wróblewski 1698e0e583 Separate and refactor custom resources logic 2021-04-07 10:31:11 +00:00
Maciek Pytel 655b4081f4 Migrate to klog v2 2020-06-05 17:22:26 +02:00
Julien Balestra af270b05f6 cluster-autoscaler/taints: ignore taints on existing nodes
Signed-off-by: Julien Balestra <julien.balestra@datadoghq.com>
2020-02-25 13:55:17 +01:00
Jiaxin Shan 90666881d3 Move GPULabel and GPUTypes to cloud provider 2019-03-25 13:03:01 -07:00
Łukasz Osipiuk 016bf7fc2c Use k8s.io/klog instead of github.com/golang/glog 2018-11-26 17:30:31 +01:00
Łukasz Osipiuk 52aaac362f Remove GetGpuRequests function 2018-09-05 11:58:46 +02:00
Aleksandra Malinowska 800ee56b34 Refactor and extend GPU metrics error types 2018-07-05 13:13:11 +02:00
Karol Gołąb 553db2c9fc Separated errors 2018-07-05 11:30:12 +02:00
Karol Gołąb aae4d1270a Make GetGpuTypeForMetrics more robust 2018-06-26 21:35:16 +02:00
Karol Gołąb 5eb7021f82 Add GPU-related scaled_up & scaled_down metrics (#974)
* Add GPU-related scaled_up & scaled_down metrics

* Fix name to match SD naming convention

* Fix import after master rebase

* Change the logic to include GPU-being-installed nodes
2018-06-22 21:00:52 +02:00
Łukasz Osipiuk 57ea19599e Explicitly return AutoscalerError from GetNodeTargetGpus 2018-06-14 15:46:58 +02:00
Łukasz Osipiuk 087a5cc9a9 Respect GPU limits in scale_down 2018-06-13 14:19:59 +02:00
Karol Gołąb bada827839 Simplify the code by removing superfluous variable 2018-05-18 09:38:47 +02:00
Karol Gołąb f877f5a64e Remove unused error handling 2018-05-10 12:15:42 +02:00
Maciej Pytel abbc45da2e Delay scale-up including GPU request
Nodes with GPU are expensive and it's likely a bunch of pods
using them will be created in a batch. In this case we can
wait a bit for all pods to be created to make a more efficient
scale-up decision.
2018-03-02 15:55:04 +01:00
Maciej Pytel d876d74912 Ignore unfitness in price expander if using GPU 2018-03-02 15:50:43 +01:00
Maciej Pytel b7f8622eb2 Create node groups with GPU in scale-up.go
This is still not implemented in cloudprovider.
Extended NewNodeGroup interface to have a way of passing
parameters for more complex resources.
2017-12-11 13:12:22 +01:00
Maciej Pytel 6554919700 Helper function to calculate GPU requests for NAP 2017-12-11 13:12:22 +01:00
Marcin Wielgus f8c0e20ad9 Source fix after godep update 2017-11-28 14:01:43 +01:00
Maciej Pytel d81dca5991 Mark nodes with uninitialized GPUs as unready 2017-11-10 17:56:10 +01:00
Beata Skiba 2b28ac1a04 Add a workaround for scaling of VMs with GPUs
When a machine with GPU becomes ready it can take
up to 15 minutes before it reports that GPU is allocatable.
This can cause Cluster Autoscaler to trigger a second
unnecessary scale up.
The workaround sets allocatable to capacity for GPU so that
a node that waits for GPUs to become ready to use will be
considered as a place where pods requesting GPUs can be
scheduled.
2017-11-06 16:04:22 +01:00
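
The workaround described in commit 2b28ac1a04 above can be illustrated with a minimal sketch. This is not the Cluster Autoscaler code (which is written in Go); it only shows the idea of copying GPU capacity into allocatable so that a node still waiting for its GPUs is treated as schedulable for GPU pods. The dict-based node shape, the function name `with_gpu_allocatable_from_capacity`, and the resource name `nvidia.com/gpu` are all assumptions made for illustration.

```python
from copy import deepcopy

# Assumed GPU resource name, for illustration only; the commit does not name it.
GPU_RESOURCE = "nvidia.com/gpu"


def with_gpu_allocatable_from_capacity(node, gpu_resource=GPU_RESOURCE):
    """Return a copy of `node` whose GPU allocatable equals its GPU capacity.

    `node` is a hypothetical dict with "capacity" and "allocatable" maps.
    Nodes without GPU capacity are returned unchanged (as a copy).
    """
    patched = deepcopy(node)
    capacity = patched.get("capacity", {})
    if gpu_resource in capacity:
        # Treat the GPUs as usable even though the node has not yet
        # reported them as allocatable.
        patched.setdefault("allocatable", {})[gpu_resource] = capacity[gpu_resource]
    return patched


# A freshly started GPU node: capacity lists the GPUs, allocatable does not yet.
node = {
    "capacity": {"cpu": 8, GPU_RESOURCE: 2},
    "allocatable": {"cpu": 8},
}
patched = with_gpu_allocatable_from_capacity(node)
print(patched["allocatable"])
```

With the patched view, a scheduling simulation would see room for pods requesting GPUs on this node and would have no reason to trigger a second scale-up while the GPUs finish initializing.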