Kubernetes Prow Robot
f6a36bfc42
Merge pull request #14194 from jandersen-plaid/jandersen-plaid-exit-first-error
...
Exit rolling updates when encountering specific errors
2023-01-09 23:59:25 -08:00
John Gardiner Myers
c68be498c6
Refactor NewAssetBuilder to not take a Cluster
2023-01-01 13:37:52 -08:00
justinsb
90cbf75584
Context threading: more wiring
...
We're aiming to use this for testing immediately and for better
logging/tracing in future, but to keep the changes manageable we are
breaking them into a series of smaller commits that don't directly achieve much on their own.
2022-12-22 17:52:22 -05:00
Jack Andersen
89dfafefe7
Make struct members private, alter formatting, add unwrap method
...
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:19 -08:00
Jack Andersen
66fe8e8118
Move results insert to original location to reduce diff
...
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:18 -08:00
Jack Andersen
dfd9516a4f
Continue to log if an error is encountered, separate the exit check
2022-12-21 09:30:18 -08:00
Jack Andersen
f5f71f17f9
Satisfy the Is interface with ValidationTimeoutError and change callers of err check
...
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:17 -08:00
jandersen-plaid
4eb455c6b9
Update pkg/instancegroups/rollingupdate.go
...
Co-authored-by: Ole Markus With <olemarkus@gmail.com>
2022-12-21 09:30:16 -08:00
Jack Andersen
2bd5403f37
Create a specific error type for validation timeouts and classify as exitable
...
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:16 -08:00
Jack Andersen
6efd68f428
Remove optionality and exit when specific error prefix is matched
...
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:14 -08:00
Jack Andersen
f9ea9b3ef8
Add a flag to rolling update to fail immediately on IG error
...
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:13 -08:00
John Gardiner Myers
235aa61594
v1alpha3: move networking fields under networking
2022-12-02 19:19:59 -08:00
John Gardiner Myers
de9055b588
Update control-plane terminology in CLI output strings
2022-11-23 21:32:10 -08:00
John Gardiner Myers
d39ba74bd7
Change the control-plane IG role to "ControlPlane" in v1alpha3 API
2022-11-22 17:05:29 -08:00
Ciprian Hacman
8f79c9bd68
Replace fi.Bool/Float*/Int*/String() with fi.PtrTo()
2022-11-19 03:45:22 +02:00
John Gardiner Myers
64be690211
Update TopologySpec for v1alpha3 API
2022-11-06 09:10:38 -08:00
Ole Markus With
b45968c992
Log and aggregate errors from rolling update
...
Rather than just returning the error from the first failing IG
2022-10-20 20:04:18 +02:00
Ole Markus With
a5b1722110
Ensure kOps doesn't surge on karpenter IGs
2022-10-17 15:22:39 +02:00
justinsb
4b2f773748
rolling-update: don't deregister our only apiserver
...
If we do, we can't drain the node afterwards. We also are going to
have dropped connections in this case anyway.
2022-09-15 09:16:57 -04:00
Ole Markus With
1ea5243406
Warm pool-enabled ASGs scaled to zero will no longer panic
2022-09-09 11:08:00 +02:00
Ciprian Hacman
cb99db0757
Run make goimports
2022-08-17 07:03:33 +03:00
Ole Markus With
c260cf69b3
Log errors from detachInstance
2022-06-27 19:58:16 +02:00
Rémy Léone
80d2d53643
fix tenv linter
2022-06-15 18:06:28 +02:00
Ciprian Hacman
b5f14b589b
Add initial support for Hetzner Cloud
2022-05-09 06:12:15 +03:00
Ole Markus With
ce2e877aeb
Remove bazel files from vendor
2022-04-12 13:29:03 +02:00
Ole Markus With
982463683d
Remove checks that don't work when we do not delete the node object
2022-03-06 07:34:52 +01:00
Ole Markus With
2ba9c1670f
Only delete node object on GCE
2022-03-06 07:34:52 +01:00
John Gardiner Myers
cac727c357
Make cloudProvider a struct in v1alpha3 API
2022-03-02 21:59:49 -08:00
John Gardiner Myers
70f7d9bdb2
Use function to get cloud provider from cluster spec
2022-03-02 21:59:47 -08:00
Bronson Mirafuentes
86b0ef0d0c
add drain-timeout flag to rolling-update cluster
2022-01-20 14:05:55 -08:00
Jesse Haka
b88d110f58
Drain OpenStack loadbalancers
2021-12-31 13:16:02 +02:00
Ole Markus With
5e944f1a15
Do not try to detach karpenter nodes from ASGs
2021-12-15 09:56:33 +01:00
Ole Markus With
b785965c50
Rename InstanceManager to Manager
2021-12-13 09:14:24 +01:00
Ole Markus With
1ccb7840ac
make rolling update work
2021-12-12 19:33:41 +01:00
Ciprian Hacman
ea7df00719
Run hack/update-gofmt.sh
2021-12-01 22:39:50 +02:00
Kubernetes Prow Robot
ec7fe88868
Merge pull request #12730 from johngmyers/fix-deprecated
...
Fix use of deprecated method
2021-11-13 23:22:46 -08:00
John Gardiner Myers
c5914d6ddb
Fix use of deprecated method
2021-11-13 20:29:52 -08:00
John Gardiner Myers
4396270d74
Fix out of bounds error when instance detach fails
2021-11-08 23:00:28 -08:00
John Gardiner Myers
d935a419f8
Simplify AddSSHPublicKey() interface
2021-07-24 08:59:57 -07:00
John Gardiner Myers
e0915887ed
Move asset copying out of apply_cluster
2021-06-05 21:17:50 -07:00
John Gardiner Myers
d46ee9c883
Exclude nodes from load balancers upon cordoning
2021-04-20 17:58:26 -07:00
Ole Markus With
09615935fd
Make kOps CLI handle ASG warm pools
2021-04-15 11:10:23 +02:00
Ole Markus With
ab1b85818d
Pass ctx to drain helper
...
In some rare cases, we hit a nil pointer dereference because the k8s code tries to use the
ctx that we are not passing.
2021-03-26 10:29:11 +01:00
Ole Markus With
20bd724f5e
Add support for scaling out the control plane with dedicated apiserver nodes
...
Ensure apiserver role can only be used on AWS (because of firewalling)
Apply api-server label to CP as well
Consolidate node not ready validation message
Guard apiserver nodes with a feature flag
Rename Apiserver role to APIServer
Add an integration test for apiserver nodes
Enumerate all roles in rolling update docs
Apply suggestions from code review
Co-authored-by: Steven E. Harris <seh@panix.com>
2021-03-20 20:57:00 +01:00
Markos Chandras
0a49650c70
aws: Graceful handling of EC2 detach errors
...
Sometimes, we observe the following error during a rolling update:
error detaching instance "i-XXXX", node "ip-10-X-X-X.ec2.internal": error detaching instance "i-XXXX": ValidationError: The instance i-XXXX is not part of Auto Scaling group XXXXX
The sequence of events that leads to this problem is the following:
- A new ASG object is being built from the launch template
- Existing instances are being added to it
- An existing instance is being ignored because it's already terminating
W0205 08:01:32.593377 191 aws_cloud.go:791] ignoring instance as it is terminating: i-XXXX in autoscaling group: XXXX
- Due to maxSurge, a detach of the terminating instance from the
autoscaling group is attempted and fails.
As such, in case of EC2 ASG detach failures we can simply try to detach
the next node instead of aborting the whole update operation.
2021-03-05 15:01:30 +02:00
Jesse Haka
46de9f145e
update gophercloud dependency
2021-01-11 14:48:22 +02:00
Ole Markus With
5a2f1274fb
Don't try to detach masters
2020-11-28 09:44:42 +01:00
Kubernetes Prow Robot
0b5646e94a
Merge pull request #10266 from rifelpet/k8s120
...
Update k8s dependencies to 1.20.0-beta.2
2020-11-18 10:48:07 -08:00
Peter Rifel
47354ce010
Update kubectl drain fields for 1.20
2020-11-18 11:55:03 -06:00
Kubernetes Prow Robot
92911d7dcf
Merge pull request #10167 from olemarkus/cilium-ondelete
...
Make it possible to use OnDelete update strategy on addon daemonset
2020-11-16 12:38:03 -08:00