Commit Graph

197 Commits

Author SHA1 Message Date
Kubernetes Prow Robot f6a36bfc42
Merge pull request #14194 from jandersen-plaid/jandersen-plaid-exit-first-error
Exit rolling updates when encountering specific errors
2023-01-09 23:59:25 -08:00
John Gardiner Myers c68be498c6 Refactor NewAssetBuilder to not take a Cluster 2023-01-01 13:37:52 -08:00
justinsb 90cbf75584 Context threading: more wiring
We're aiming to use this for testing immediately and for better
logging/tracing in future, but to keep the changes manageable we are
breaking them into a series of smaller changes that don't directly
achieve much on their own.
2022-12-22 17:52:22 -05:00
Jack Andersen 89dfafefe7
Make struct members private, alter formatting, add unwrap method
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:19 -08:00
Jack Andersen 66fe8e8118
Move results insert to original location to reduce diff
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:18 -08:00
Jack Andersen dfd9516a4f
Continue to log if an error is encountered, separate the exit check 2022-12-21 09:30:18 -08:00
Jack Andersen f5f71f17f9
Satisfy the Is interface with ValidationTimeoutError and change callers of err check
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:17 -08:00
jandersen-plaid 4eb455c6b9
Update pkg/instancegroups/rollingupdate.go
Co-authored-by: Ole Markus With <olemarkus@gmail.com>
2022-12-21 09:30:16 -08:00
Jack Andersen 2bd5403f37
Create a specific error type for validation timeouts and classify as exitable
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:16 -08:00
Jack Andersen 6efd68f428
Remove optionality and exit when specific error prefix is matched
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:14 -08:00
Jack Andersen f9ea9b3ef8
Add a flag to rolling update to fail immediately on IG error
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:13 -08:00
John Gardiner Myers 235aa61594 v1alpha3: move networking fields under networking 2022-12-02 19:19:59 -08:00
John Gardiner Myers de9055b588 Update control-plane terminology in CLI output strings 2022-11-23 21:32:10 -08:00
John Gardiner Myers d39ba74bd7 Change the control-plane IG role to "ControlPlane" in v1alpha3 API 2022-11-22 17:05:29 -08:00
Ciprian Hacman 8f79c9bd68 Replace fi.Bool/Float*/Int*/String() with fi.PtrTo() 2022-11-19 03:45:22 +02:00
John Gardiner Myers 64be690211 Update TopologySpec for v1alpha3 API 2022-11-06 09:10:38 -08:00
Ole Markus With b45968c992 Log and aggregate errors from rolling update
Rather than just returning the error from the first failing IG
2022-10-20 20:04:18 +02:00
Ole Markus With a5b1722110 Ensure kOps doesn't surge on karpenter IGs 2022-10-17 15:22:39 +02:00
justinsb 4b2f773748 rolling-update: don't deregister our only apiserver
If we do, we can't drain the node afterwards.  We also are going to
have dropped connections in this case anyway.
2022-09-15 09:16:57 -04:00
Ole Markus With 1ea5243406 Warm pool-enabled ASGs scaled to zero will no longer panic 2022-09-09 11:08:00 +02:00
Ciprian Hacman cb99db0757 Run make goimports 2022-08-17 07:03:33 +03:00
Ole Markus With c260cf69b3 Log errors from detachInstance 2022-06-27 19:58:16 +02:00
Rémy Léone 80d2d53643 fix tenv linter 2022-06-15 18:06:28 +02:00
Ciprian Hacman b5f14b589b Add initial support for Hetzner Cloud 2022-05-09 06:12:15 +03:00
Ole Markus With ce2e877aeb Remove bazel files from vendor 2022-04-12 13:29:03 +02:00
Ole Markus With 982463683d Remove checks that don't work when we do not delete the node object 2022-03-06 07:34:52 +01:00
Ole Markus With 2ba9c1670f Only delete node object on GCE 2022-03-06 07:34:52 +01:00
John Gardiner Myers cac727c357 Make cloudProvider a struct in v1alpha3 API 2022-03-02 21:59:49 -08:00
John Gardiner Myers 70f7d9bdb2 Use function to get cloud provider from cluster spec 2022-03-02 21:59:47 -08:00
Bronson Mirafuentes 86b0ef0d0c add drain-timeout flag to rolling-update cluster 2022-01-20 14:05:55 -08:00
Jesse Haka b88d110f58 Drain OpenStack loadbalancers 2021-12-31 13:16:02 +02:00
Ole Markus With 5e944f1a15 Do not try to detach karpenter nodes from ASGs 2021-12-15 09:56:33 +01:00
Ole Markus With b785965c50 Rename InstanceManager to Manager 2021-12-13 09:14:24 +01:00
Ole Markus With 1ccb7840ac make rolling update work 2021-12-12 19:33:41 +01:00
Ciprian Hacman ea7df00719 Run hack/update-gofmt.sh 2021-12-01 22:39:50 +02:00
Kubernetes Prow Robot ec7fe88868
Merge pull request #12730 from johngmyers/fix-deprecated
Fix use of deprecated method
2021-11-13 23:22:46 -08:00
John Gardiner Myers c5914d6ddb Fix use of deprecated method 2021-11-13 20:29:52 -08:00
John Gardiner Myers 4396270d74 Fix out of bounds error when instance detach fails 2021-11-08 23:00:28 -08:00
John Gardiner Myers d935a419f8 Simplify AddSSHPublicKey() interface 2021-07-24 08:59:57 -07:00
John Gardiner Myers e0915887ed Move asset copying out of apply_cluster 2021-06-05 21:17:50 -07:00
John Gardiner Myers d46ee9c883 Exclude nodes from load balancers upon cordoning 2021-04-20 17:58:26 -07:00
Ole Markus With 09615935fd Make kOps CLI handle ASG warm pools 2021-04-15 11:10:23 +02:00
Ole Markus With ab1b85818d Pass ctx to drain helper
In some rare cases, we hit an NPR because the k8s code tries to use the
ctx we are not passing.
2021-03-26 10:29:11 +01:00
Ole Markus With 20bd724f5e Add support for scaling out the control plane with dedicated apiserver nodes
Ensure apiserver role can only be used on AWS (because of firewalling)

Apply api-server label to CP as well

Consolidate node not ready validation message

Guard apiserver nodes with a feature flag

Rename Apiserver role to APIServer

Add an integration test for apiserver nodes

Rename Apiserver role to APIServer

Enumerate all roles in rolling update docs

Apply suggestions from code review

Co-authored-by: Steven E. Harris <seh@panix.com>
2021-03-20 20:57:00 +01:00
Markos Chandras 0a49650c70
aws: Graceful handling of EC2 detach errors
Sometimes, we observe the following error during a rolling update:

error detaching instance "i-XXXX", node "ip-10-X-X-X.ec2.internal": error detaching instance "i-XXXX": ValidationError: The instance i-XXXX is not part of Auto Scaling group XXXXX

The sequence of events that leads to this problem is the following:

- A new ASG object is being built from the launch template
- Existing instances are being added to it
- An existing instance is being ignored because it's already terminating
W0205 08:01:32.593377     191 aws_cloud.go:791] ignoring instance as it is terminating: i-XXXX in autoscaling group: XXXX
- Due to maxSurge, an attempt is made to detach the terminating instance
  from the autoscaling group, and it fails.

As such, in case of EC2 ASG detach failures we can simply try to detach
the next node instead of aborting the whole update operation.
2021-03-05 15:01:30 +02:00
Jesse Haka 46de9f145e update gophercloud dependency 2021-01-11 14:48:22 +02:00
Ole Markus With 5a2f1274fb Don't try to detach masters 2020-11-28 09:44:42 +01:00
Kubernetes Prow Robot 0b5646e94a
Merge pull request #10266 from rifelpet/k8s120
Update k8s dependencies to 1.20.0-beta.2
2020-11-18 10:48:07 -08:00
Peter Rifel 47354ce010
Update kubectl drain fields for 1.20 2020-11-18 11:55:03 -06:00
Kubernetes Prow Robot 92911d7dcf
Merge pull request #10167 from olemarkus/cilium-ondelete
Make it possible to use OnDelete update strategy on addon daemonset
2020-11-16 12:38:03 -08:00