Enhancing the logs:
- [cluster-name (cluster-id)] > [cluster-name (id:cluster-id)]
- Use structs instead of pointers to print data
- Remove upstream data from info log
- Add debug info
- Update the duplicate nodepool name error message
Signed-off-by: Parthvi <parthvi.vala@suse.com>
Reverts 3244dfa5a and instead catches the fingerprint mismatch error and
downgrades it to a debug log. The reasoning behind that commit still
applies - the upstream GKE cluster is slightly delayed in processing the
label updates - but the controller will naturally retry the update, so
we don't need to block on retrying ourselves. This way the equality
check will also be done again and so the cluster won't be updated twice.
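For illustration, a minimal sketch of that downgrade, assuming the
mismatch is detected by error text; the helper name and the exact GKE
error wording are assumptions rather than the operator's actual code:

```go
// Hypothetical helper; not the operator's actual code.
package sketch

import (
    "strings"

    "github.com/sirupsen/logrus"
)

// downgradeFingerprintError turns a label-fingerprint mismatch from the GKE
// API into a debug log and a nil error, so the controller's normal requeue
// retries the update once upstream has processed the previous change.
func downgradeFingerprintError(clusterName string, err error) error {
    if err == nil {
        return nil
    }
    // The exact wording GKE uses for a stale fingerprint is assumed here.
    if strings.Contains(err.Error(), "fingerprint") {
        logrus.Debugf("fingerprint mismatch updating labels for cluster [%s], will retry on next reconcile: %v", clusterName, err)
        return nil
    }
    return err
}
```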
53dbce90 was an attempt to address the occurrence of resource conflict
errors during an UpdateStatus call to set the resource phase to
"updating". It helped reduce the frequency of this happening but it did
not fully address the issue because it was not the only place where the
resource state is updated.
The resource conflict error always occurs after an error has occurred
during the reconcile loop. Multiple status updates happen in quick
succession during a single loop, and upon entering the next loop, the
shared informer cache is not fully synced and the controller worker
fetches an out of date version of the resource object.
Without this change, when this happens, the recordError handler function
tries to keep persisting the error state to the resource object with
UpdateStatus, but since a failed UpdateStatus returns an empty struct,
there is no object it can actually update at that point, and another
error is logged.
If this happens, just return the error right away instead of trying to
update an empty object.
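A rough sketch of that early return, with a pared-down stand-in for the
GKEClusterConfig type so it compiles on its own; the wiring around the
handler is an assumption:

```go
// Sketch only; GKEClusterConfig stands in for the operator's CRD type.
package sketch

type GKEClusterConfig struct {
    Status GKEClusterConfigStatus
}

type GKEClusterConfigStatus struct {
    FailureMessage string
}

// recordError persists a handler error to the resource status. If the status
// update fails (typically a conflict caused by a stale informer cache), the
// object returned by UpdateStatus is empty, so return the error immediately
// instead of attempting any further updates; the next reconcile starts from
// a fresh copy of the resource.
func recordError(config *GKEClusterConfig, handlerErr error,
    updateStatus func(*GKEClusterConfig) (*GKEClusterConfig, error)) (*GKEClusterConfig, error) {
    if handlerErr == nil {
        return config, nil
    }
    config.Status.FailureMessage = handlerErr.Error()
    updated, err := updateStatus(config)
    if err != nil {
        // Don't try to update the empty object returned by a failed
        // UpdateStatus; just surface the error right away.
        return config, err
    }
    return updated, handlerErr
}
```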
The max pods setting is only available for VPC-native (alias IP)
clusters, so requiring it to be set when UseIPAliases is false causes a
validation error from GKE. This change loosens the requirement that
MaxPodsConstraint be non-nil on create and fixes the upstreamSpec
builder to tolerate it not being set on the upstream cluster.
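A short sketch of the tolerant upstreamSpec reading, using the real
container/v1 node pool types; the helper name is an assumption:

```go
// Sketch only; the helper name is an assumption, the types come from the
// real container/v1 SDK.
package sketch

import (
    container "google.golang.org/api/container/v1"
)

// maxPodsFromUpstream reads the max-pods setting off an upstream node pool,
// returning nil when the cluster is not VPC-native and GKE omits the
// constraint entirely.
func maxPodsFromUpstream(np *container.NodePool) *int64 {
    if np == nil || np.MaxPodsConstraint == nil {
        return nil
    }
    maxPods := np.MaxPodsConstraint.MaxPodsPerNode
    return &maxPods
}
```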
If labels are not set on the gkeapi cluster object, they will appear as
`nil`, but they should be converted to an empty map so that they are
comparable to the applied cluster state.
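A minimal sketch of that normalization, assuming the labels arrive as a
possibly-nil map[string]string:

```go
// Sketch of the normalization; not the operator's actual code.
package sketch

// normalizeLabels converts nil labels from the upstream cluster into an empty
// map so that a deep-equality comparison against the applied cluster state
// (which always carries a map) does not report a spurious difference.
func normalizeLabels(labels map[string]string) map[string]string {
    if labels == nil {
        return map[string]string{}
    }
    return labels
}
```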
It is very important for the status update to succeed here, otherwise
the update loop will be entered again unnecessarily. If this happens
very quickly, there may be an inconsistency between the upstreamSpec
state, the config.Spec state, and the actual upstream cluster state. It
is best to ensure that this status update does not compound on other
problems.
Previously, if a user tried to create two nodepools with the same name,
an update loop would commence where gke-operator would continually try
to upgrade a nodepool in GKE that had two configs in Rancher.
After this change, the collision will be detected and no update will
happen until the collision is fixed.
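A minimal sketch of the collision check; the NodePool stand-in and the
error wording are assumptions:

```go
// Sketch only; NodePool stands in for the operator's node pool config type.
package sketch

import "fmt"

type NodePool struct {
    Name string
}

// checkDuplicateNodePoolNames rejects a spec containing two node pools with
// the same name, so the operator never reconciles one upstream node pool
// against two conflicting configs.
func checkDuplicateNodePoolNames(nodePools []NodePool) error {
    seen := make(map[string]struct{}, len(nodePools))
    for _, np := range nodePools {
        if _, ok := seen[np.Name]; ok {
            return fmt.Errorf("node pool name [%s] is used more than once", np.Name)
        }
        seen[np.Name] = struct{}{}
    }
    return nil
}
```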
The validateUpdate method was adopted from eks-provider and mimicked its
kubernetes version validation, which ensures the provided version is
valid Semver and that the control plane and node versions are within a
constrained range of each other. This was causing a problem with GKE
versions as it was not always parsing the Semver components correctly
and would result in a confusing and incorrect error message like:
versions for cluster [1.18.16-gke.300] and nodegroup [1.18.15-gke.1501] not compatible: all nodegroup kubernetes versionsmust be equal to or one minor version lower than the cluster kubernetes version
Rather than fix the version parsing, this change simply removes the
validation. GKE does not place such strict constraints on the delta
between the control plane and node pool versions; you can even create a
cluster that is up to seven minor versions apart. The UI queries GKE for
valid versions to offer, so as long as it continues to do so there is no
danger of requesting an invalid version. This is also an awkward place
to do this validation, since it only validates particular attributes and
not the full update request, so if validation is needed it should be
done elsewhere.
The operator logs an info message when cluster creation is started, but
gives no indication when a cluster is being imported. Add a log for when
a cluster import is starting.
When clusters are imported, there may be a race condition where the
GKEClusterConfig Status does not get saved successfully on the first try
and the controller retries on its own. Without this patch, that retry
would reattempt to create the CA secret and fail because the secret
already exists. This change ensures the same CA secret is never created
twice, which also ensures that a legitimate failure to update the
cluster config's Status isn't clobbered by this unrelated error.
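A minimal sketch of tolerating the existing secret on a retried import;
the helper and its callbacks are hypothetical, but IsAlreadyExists is
the real apimachinery check:

```go
// Sketch only; the helper and create callback are hypothetical.
package sketch

import (
    corev1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// ensureCASecret creates the CA secret for an imported cluster, treating
// "already exists" as success so that a retried status update does not
// surface an unrelated error.
func ensureCASecret(secret *corev1.Secret, create func(*corev1.Secret) (*corev1.Secret, error)) error {
    if _, err := create(secret); err != nil && !apierrors.IsAlreadyExists(err) {
        return err
    }
    return nil
}
```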
PrivateClusterConfig.PublicEndpoint and
PrivateClusterConfig.PrivateEndpoint are read-only informational
parameters in the GKE SDK[1]. They should never have been exposed as
configurable options here. Configuration of the public and private
endpoints is done via the EnablePrivateEndpoint toggle.
[1] https://pkg.go.dev/google.golang.org/api/container/v1#PrivateClusterConfig
GKE introduced a new mode called Autopilot[1] in which node pools no
longer need to be managed by the user. This change allows users to opt
into this mode. There are caveats regarding the flexibility of this
mode[2] so it is not necessarily suitable for all users and standard
mode should still be supported.
The cluster must have clusterAddons.horizontalPodAutoscaling and
clusterAddons.httpLoadBalancing enabled. The cluster must be created for
a region, not a zone, so zone must be unset. Any configured node pools
will be ignored, and setting node pools to an empty list should be
allowed in Rancher. The autopilot setting is immutable.
[1] https://cloud.google.com/blog/products/containers-kubernetes/introducing-gke-autopilot
[2] https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview
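For illustration, a rough sketch of those prerequisites as a validation
step, against a pared-down stand-in spec whose field names are
illustrative rather than the CRD's actual schema:

```go
// Sketch only; field names are illustrative, not the CRD's actual schema.
package sketch

import "fmt"

type ClusterAddons struct {
    HorizontalPodAutoscaling bool
    HTTPLoadBalancing        bool
}

type Spec struct {
    AutopilotEnabled bool
    Region           string
    Zone             string
    ClusterAddons    ClusterAddons
    NodePools        []string
}

// validateAutopilot enforces the prerequisites above: regional (not zonal)
// placement and the required addons. Node pools are managed by GKE in
// Autopilot mode, so any configured pools are simply ignored and an empty
// list is valid.
func validateAutopilot(spec Spec) error {
    if !spec.AutopilotEnabled {
        return nil
    }
    if spec.Zone != "" || spec.Region == "" {
        return fmt.Errorf("autopilot clusters must be regional: set region and leave zone unset")
    }
    if !spec.ClusterAddons.HorizontalPodAutoscaling || !spec.ClusterAddons.HTTPLoadBalancing {
        return fmt.Errorf("autopilot clusters require the horizontalPodAutoscaling and httpLoadBalancing addons")
    }
    return nil
}
```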
Rancher's machine driver creates a dynamic schema with name "nodeconfig"
that collides with the static schema generated for the GKE operator's
NodeConfig struct. The result is that repeated calls to
/v3/schemas/nodeConfig may alternately return either the NodeConfig
schema from this operator or a different NodeConfig schema containing
the googleConfig schema for GCP cloud credentials.
This change renames the NodeConfig struct to avoid this collision, and
also renames every other struct defined for the operator in order to
prevent potential collisions in the future.
The function that was supposed to set the FailureMessage status
attribute was not doing that. This change ensures the error from the
onChange handler is actually used and the cluster status is updated.
It is valid to create a GKE cluster with the kubernetes version, node
pool version, or node pool image type set to "". GKE will set defaults
on the backend. However, this causes problems during the update cycle:
the validateAndUpdate function on the controller cannot check whether ""
is semver compliant, and the Update* functions cannot set "" as a
desired value to update to. This change adds allowances for certain
string parameters to be empty. It also restructures
UpdateMasterKubernetesVersion to be more Go-idiomatic by returning early
where possible.
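A condensed sketch of that early-return shape, using the real
container/v1 ClusterUpdate type; the update callback and surrounding
wiring are assumptions:

```go
// Sketch only; the update callback is an assumption.
package sketch

import (
    container "google.golang.org/api/container/v1"
)

// updateMasterKubernetesVersion returns early when no version is requested
// ("" means "let GKE choose a default") or when the upstream cluster already
// runs the desired version, and only otherwise issues an update.
func updateMasterKubernetesVersion(desired, upstream string,
    update func(*container.ClusterUpdate) error) error {
    if desired == "" || desired == upstream {
        return nil
    }
    return update(&container.ClusterUpdate{
        DesiredMasterVersion: desired,
    })
}
```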
Add a field MaintenanceWindow which represents the start time
at which automatic maintenance is allowed to be performed. The duration
of the window is not settable. Setting MaintenanceWindow to "" means
maintenance can occur at any time, which is the default.
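A brief sketch of how the field could map onto the container/v1
maintenance policy types; the translation helper itself is an
assumption:

```go
// Sketch only; the helper is an assumption, the policy types are from the
// real container/v1 SDK.
package sketch

import (
    container "google.golang.org/api/container/v1"
)

// maintenancePolicyFrom maps the spec's MaintenanceWindow start time (for
// example "03:00") to a daily maintenance window; an empty string yields an
// empty policy, leaving GKE free to schedule maintenance at any time.
func maintenancePolicyFrom(startTime string) *container.MaintenancePolicy {
    if startTime == "" {
        return &container.MaintenancePolicy{}
    }
    return &container.MaintenancePolicy{
        Window: &container.MaintenanceWindow{
            DailyMaintenanceWindow: &container.DailyMaintenanceWindow{
                StartTime: startTime,
            },
        },
    }
}
```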
For parity with the KEv1 implementation, add the ability to set and
update the Locations field for a cluster. This represents additional
zones (within a single region) in which node pools can be deployed for
greater availability.
Add a struct called Management as an attribute for NodePool which
supports toggling auto-repair and auto-upgrade for a node pool. Also
update the examples.
GKE defaults these to true on the backend, so before this patch, because
the Management pointer was never set on a node pool, node pools were
created with auto-repair and auto-upgrade enabled. Now that it is set
explicitly, both default to disabled.
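A compact sketch of that mapping, with an illustrative stand-in for the
Management attribute and the real container/v1 NodeManagement type:

```go
// Sketch only; NodePoolManagement stands in for the operator's Management
// attribute.
package sketch

import (
    container "google.golang.org/api/container/v1"
)

type NodePoolManagement struct {
    AutoRepair  bool
    AutoUpgrade bool
}

// nodeManagementFrom always returns an explicit NodeManagement, so GKE's
// backend default of enabling auto-repair and auto-upgrade no longer applies
// silently; with Management unset, both now default to disabled.
func nodeManagementFrom(mgmt *NodePoolManagement) *container.NodeManagement {
    if mgmt == nil {
        return &container.NodeManagement{}
    }
    return &container.NodeManagement{
        AutoRepair:  mgmt.AutoRepair,
        AutoUpgrade: mgmt.AutoUpgrade,
    }
}
```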
In GKE, a cluster is created either in a Region or a Zone. The CRD
supports setting either, and the controller validates that exactly one
of them is set. But without this patch, the operator only respected the
Region setting and ignored the Zone. The examples also incorrectly used
a Zone identifier in the Region setting. This patch ensures the Zone is
used to identify the cluster when it is set and fixes the examples to
make sense.
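A minimal sketch of resolving the location from either setting; the
helper name is an assumption:

```go
// Sketch only; the helper name is an assumption.
package sketch

import "fmt"

// clusterLocation builds the location portion of GKE API calls from whichever
// of region or zone the spec has set; the CRD validation guarantees exactly
// one of them is non-empty.
func clusterLocation(project, region, zone string) string {
    location := region
    if zone != "" {
        location = zone
    }
    return fmt.Sprintf("projects/%s/locations/%s", project, location)
}
```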
The CredentialContent attribute does not actually hold the contents of
the credential, but a reference to a Cloud Credential Secret which does
contain the credential. Rename it to make its purpose clearer and to be
consistent with the credential field in eks- and aks-operator.
There is no displayName attribute in GKEClusterConfigSpec. Also, the
function creates a Secret as a side effect, which is worth mentioning in
the comment.
Rename the alpha-feature toggle to be consistent with the Go SDK and
improve the readability of the translation between CRD attributes and
GKE API attributes.
The GKE handler in Rancher needs to be able to build an initial admin
kubeconfig in order to generate the first service account in the
cluster. Since gke-operator already knows how to get the credential
Secret and convert it to an OAuth2 token, it is convenient to expose
this as a public function that Rancher can use to authenticate to the
cluster.
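A condensed sketch of such a helper; the exported name is an assumption,
while CredentialsFromJSON and the cloud-platform scope come from the
real golang.org/x/oauth2/google package:

```go
// Sketch only; the exported name is an assumption.
package sketch

import (
    "context"

    "golang.org/x/oauth2"
    "golang.org/x/oauth2/google"
)

// GetTokenSource turns the service account JSON stored in the cloud
// credential Secret into an OAuth2 token source that Rancher can use to
// build the initial admin kubeconfig for the cluster.
func GetTokenSource(ctx context.Context, credentialJSON []byte) (oauth2.TokenSource, error) {
    creds, err := google.CredentialsFromJSON(ctx, credentialJSON,
        "https://www.googleapis.com/auth/cloud-platform")
    if err != nil {
        return nil, err
    }
    return creds.TokenSource, nil
}
```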