Commit Graph

73 Commits

Author SHA1 Message Date
Michal Jura 8976a0eb44
Revert "[main] fix: Retry operation on conflict" 2024-10-28 13:19:09 +05:30
yiannistri bd7389f4eb
fix: Retry operation on conflict 2024-09-26 16:17:24 +01:00
Parthvi 1675d2bff2 Enhance the logs
Enhancing the logs:
- [cluster-name (cluster-id)] > [cluster-name (id:cluster-id)]
- Use structs instead of pointers to print data
- Remove upstream data from info log
- Add debug info
- Update nodepool name duplicate error

Signed-off-by: Parthvi <parthvi.vala@suse.com>
2024-08-20 11:35:25 +02:00
vardhaman22 4a53c57695 updated wrangler to v3 2024-06-12 19:29:32 +05:30
yiannistri 530dec7cb7
feat: Set service account for GKE node pool 2024-05-20 13:43:24 +01:00
Michal Jura 2e8fb0737c
Fix unit tests
Don't check errors when creating namespaces, due to the namespace usage limitation
https://book.kubebuilder.io/reference/envtest.html#namespace-usage-limitation
2024-02-28 13:17:29 +01:00
Michal Jura 9bc3add233
Add more unit tests to gke-cluster-config-handler_test.go 2024-02-27 16:40:33 +01:00
Alexandr Demicev 4fa821a8dc
Use mock gke client in tests 2024-02-27 16:00:47 +01:00
Michal Jura 00f5baab19
Add unit tests for gke-cluster-config-handler.go 2024-02-27 16:00:46 +01:00
Michal Jura b2e3d2c9b8
Add support for Customer Managed Encryption Key
Add support for the Customer Managed Encryption Key used to encrypt
the boot disk attached to each node in the node pool.

For more information about protecting resources with Cloud KMS Keys please see:
  https://cloud.google.com/compute/docs/disks/customer-managed-encryption

Issue: https://github.com/rancher/gke-operator/issues/261
2024-02-19 17:26:36 +01:00
Kevin Joiner 0825036af7 Bump Wrangler version to v2.0.2
The previous wrangler commit included all of the v2 changes.
Except for the import path changes.
2024-01-12 12:06:33 -05:00
Michal Jura 93ec9749db
Add support for GKE Autopilot
(cherry picked from commit c0e21add69d99c00b40238905641dfabd4b729fa)
2023-12-20 08:40:13 +01:00
Furkat Gofurov b04cf615b1
Fix linting issues
Signed-off-by: Furkat Gofurov <furkat.gofurov@suse.com>
2023-09-04 13:25:16 +03:00
Michal Jura 4a22a27a8e
Add unit test for gke-operator
Issue: https://github.com/rancher/gke-operator/issues/158
2023-08-01 19:09:43 +02:00
Michal Jura 9aac7f118c
Adding support for updating labels in node pools 2023-07-13 09:05:38 +02:00
Furkat Gofurov 52324eae07
Add lint GH action workflow, enable linters and fix the linter-reported problems
(cherry picked from commit 7387d1c1185e594d2062ba064417f2b3fdcb864d)
2023-05-09 13:37:16 +03:00
Colleen Murphy e0b80134bb Add Tags to upstream spec builder
Without this, the true value of the nodepool tags was appearing empty in
the upstream spec.
2021-07-26 09:20:38 -07:00
Colleen Murphy 961a9f49a6 Move internal/ to pkg/ 2021-06-07 09:48:16 -07:00
Colleen Murphy 24a127ae72
Merge pull request #47 from cmurphy/fix-labels-2
Follow up on labels issues
2021-06-03 08:42:30 -07:00
Colleen Murphy 1f0015eb1b Ignore label mismatch error
Reverts 3244dfa5a and instead catches the fingerprint mismatch error and
downgrades it to a debug log. The reasoning behind that commit still
applies - the upstream GKE cluster is slightly delayed in processing the
label updates - but the controller will naturally retry the update, so
we don't need to block on retrying ourselves. This way the equality
check will also be done again and so the cluster won't be updated twice.
2021-06-01 15:09:35 -07:00
Colleen Murphy bfa6312a98 Fix setting failure message on empty resource
53dbce90 was an attempt to address the occurrence of resource conflict
errors during an UpdateStatus call to set the resource phase to
"updating". It helped reduce the frequency of this happening but it did
not fully address the issue because it was not the only place where the
resource state is updated.

The resource conflict error always occurs after an error has occurred
during the reconcile loop. Multiple status updates happen in quick
succession during a single loop, and upon entering the next loop, the
shared informer cache is not fully synced and the controller worker
fetches an out of date version of the resource object.

Without this change, when this happens, the recordError handler function
tries to continue persisting the error state to the resource object with
UpdateStatus, but since UpdateStatus returns an empty struct, there's no
object it can actually update at this point, and another error is
logged.

If this happens, just return the error right away instead of trying to
update an empty object.
2021-05-25 13:41:02 -07:00
Colleen Murphy 522d7c848c Fix MaxPodsConstraint for routes-based cluster
The max pods setting is only available for VPC-native (alias IP)
clusters, so requiring it to be set to something when UseIPAliases is false
will cause a validation error from GKE. This change loosens the
requirement that MaxPodsConstraint be non-nil on create, and fixes the
upstreamSpec builder to tolerate it not being set on the upstream
cluster.
2021-05-06 14:54:39 -07:00
Colleen Murphy 1838715453 Fix upstream state for labels
If labels are not set on the gkeapi cluster object, they will appear as
`nil`, but they should be converted to an empty map so that it is
comparable to the applied cluster state.
2021-05-04 12:42:03 -07:00
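
The nil-versus-empty distinction described in "Fix upstream state for labels" can be illustrated with a minimal Go sketch. The helper name `normalizeLabels` is hypothetical and not taken from the repository; the sketch only shows why a nil map and an empty map fail a deep-equality comparison and how converting nil to an empty map makes them comparable.

```go
package main

import (
	"fmt"
	"reflect"
)

// normalizeLabels is a hypothetical helper: it returns an empty, non-nil map
// when the upstream labels are nil, and the original map otherwise.
func normalizeLabels(upstream map[string]string) map[string]string {
	if upstream == nil {
		return map[string]string{}
	}
	return upstream
}

func main() {
	var upstream map[string]string  // nil, as when no labels are set upstream
	applied := map[string]string{} // empty map, as in the applied state

	// A nil map and an empty map are not deeply equal.
	fmt.Println(reflect.DeepEqual(upstream, applied)) // prints false

	// After normalization, the two states compare as equal.
	fmt.Println(reflect.DeepEqual(normalizeLabels(upstream), applied)) // prints true
}
```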
Colleen Murphy 53dbce900e Retry on conflict during enqueueUpdate
It is very important for the status update to succeed here, otherwise
the update loop will be entered again unnecessarily. If this happens
very quickly, there may be an inconsistency between the upstreamSpec
state, the config.Spec state, and the actual upstream cluster state. It
is best to ensure that this status update does not compound on other
problems.
2021-05-04 11:24:27 -07:00
Donnie Adams 9c2beb0ea4 Error on nodepool name collisions
Previously, if a user tried to create two nodepools with the same name,
an update loop would commence where gke-operator would continually try
to upgrade a nodepool in GKE that had two configs in Rancher.

After this change, the collision will be detected and no update will
happen until the collision is fixed.
2021-04-30 17:06:52 -07:00
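
A collision check of the kind described in "Error on nodepool name collisions" might look like the following sketch. The `nodePool` type and the `findDuplicateNodePoolName` function are hypothetical and simplified; the operator's actual types and error wording are not shown in this listing.

```go
package main

import "fmt"

// nodePool is a hypothetical, minimal stand-in for a node pool config.
type nodePool struct {
	Name string
}

// findDuplicateNodePoolName returns the first name that appears more than
// once in pools, and true if such a name exists.
func findDuplicateNodePoolName(pools []nodePool) (string, bool) {
	seen := make(map[string]struct{}, len(pools))
	for _, p := range pools {
		if _, ok := seen[p.Name]; ok {
			return p.Name, true
		}
		seen[p.Name] = struct{}{}
	}
	return "", false
}

func main() {
	pools := []nodePool{{Name: "default"}, {Name: "gpu"}, {Name: "default"}}
	if name, dup := findDuplicateNodePoolName(pools); dup {
		// Surface the collision instead of attempting an update.
		fmt.Printf("node pool name %q is used more than once\n", name)
	}
}
```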
Colleen Murphy 2188e9cbd1 Add cluster labels field
Cluster labels were available in the old kontainer-engine
provisioner[1] so we should ensure feature parity with it.

[1] 89626b028c/drivers/gke/gke_driver.go (L59)
2021-04-28 12:37:34 -07:00
Colleen Murphy c2fa25ce94 Fix upstreamspec construction for private clusters
Fix a typo that caused upstreamSpec to set enablePrivateEndpoint to the
wrong value.
2021-04-20 11:53:02 -07:00
Colleen Murphy b1a048870f Remove version validation for updates
The validateUpdate method was adopted from eks-provider and mimicked its
kubernetes version validation, which ensures the provided version is
valid Semver and that the control plane and node versions are within a
constrained range of each other. This was causing a problem with GKE
versions as it was not always parsing the Semver components correctly
and would result in a confusing and incorrect error message like:

  versions for cluster [1.18.16-gke.300] and nodegroup [1.18.15-gke.1501] not compatible: all nodegroup kubernetes versionsmust be equal to or one minor version lower than the cluster kubernetes version

Rather than fix the version parsing, this change just removes this
validation. GKE does not place such strict constraints on the delta
between the control plane and node pool versions; you can even create a
cluster that is up to seven minor versions apart. The UI queries GKE for
valid versions to input, so as long as it does so there is no danger of
requesting an invalid version. This is also just an awkward place to do
this validation, since it's only validating particular attributes and
not the full update request, so if validation is needed it should be
done elsewhere.
2021-04-14 14:41:57 -07:00
Colleen Murphy 6c8f894454 Add log message for imported clusters
The operator logs an info message when cluster creation is started, but
gives no indication when a cluster is being imported. Add a log for when
a cluster import is starting.
2021-04-09 09:27:09 -07:00
Colleen Murphy c7e9249a7e Ignore already exists error for CA Secret creation
When clusters are imported there might be a race condition where the
GKEClusterConfig Status doesn't successfully get saved on the first try
and the controller will retry on its own. Without this patch, if it does
this, it would reattempt to create the CA secret and end up failing
trying to do this. This change ensures that there is no reattempt to
create the same CA secret, which also helps ensure that if there is a
legitimate failure to update the cluster config's Status then that
failure message isn't clobbered by this unrelated error.
2021-04-06 16:47:49 -07:00
Colleen Murphy 9e2dcf914f
Merge pull request #22 from cmurphy/fix-ca-secret
Fix CACert encoding
2021-04-06 13:29:59 -07:00
Colleen Murphy c862a1c0a3
Merge pull request #21 from cmurphy/endpoint-cleanup
Remove publicEndpoint and privateEndpoint
2021-04-06 13:29:43 -07:00
Colleen Murphy ef44b0ca0a Revert "Add support for Autopilot"
This reverts commit 559c5d0025.

The Autopilot feature is not compatible with Rancher because it
restricts management of the kube-system namespace[1]. We can reevaluate
this feature and look into a workaround for this at a later time.

[1] https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview#managed_namespaces
2021-04-06 07:21:30 -07:00
Colleen Murphy 904303c262 Fix CACert encoding
Without this change, the CA secret gets created from the raw CA
certificate, which means the resulting Secret is base64-encoded one
time. This caused a problem in Rancher which, for EKS, was relying on
not having to manage the encoding at all[1], but was accidentally
changed when GKE support was added. This change simplifies the secret
generator to just use the CA cert from GKE as-is without decoding, and
Rancher will be fixed to also not reencode it.

[1] 9007198a6b/pkg/controllers/management/eks/eks_cluster_handler.go (L497)
[2] a6f0b91b8a (diff-a774fa3e1a522b4a76b3f3b100a84fa19e2166a41ed9414ec37bf2c57e70ffa9R248)
2021-04-05 19:45:33 -07:00
Colleen Murphy 9ae95fc350 Remove publicEndpoint and privateEndpoint
PrivateClusterConfig.PublicEndpoint and
PrivateClusterConfig.PrivateEndpoint are read-only informational
parameters in the GKE SDK[1]. They should never have been exposed as
configurable options here. Configuration of the public and private
endpoints is done via the EnablePrivateEndpoint toggle.

[1] https://pkg.go.dev/google.golang.org/api/container/v1#PrivateClusterConfig
2021-04-05 15:13:52 -07:00
Colleen Murphy 559c5d0025 Add support for Autopilot
GKE introduced a new mode called Autopilot[1] in which node pools no
longer need to be managed by the user. This change allows users to opt
into this mode. There are caveats regarding the flexibility of this
mode[2] so it is not necessarily suitable for all users and standard
mode should still be supported.

The cluster must have clusterAddons.horizontalPodAutoscaling and
clusterAddons.httpLoadBalancing enabled. The cluster must be created for
a region, not a zone, so zone must be unset. Any configured node pools
will be ignored, and setting node pools to an empty list should be
allowed in Rancher. The autopilot setting is immutable.

[1] https://cloud.google.com/blog/products/containers-kubernetes/introducing-gke-autopilot
[2] https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview
2021-04-05 08:38:01 -07:00
Colleen Murphy 7f7ad301a9
Merge pull request #17 from cmurphy/fix-record-error
Fix recordError function
2021-03-31 10:50:53 -07:00
Colleen Murphy 3fb5e3977a Rename GKE structs
Rancher's machine driver creates a dynamic schema with name "nodeconfig"
that collides with the static schema generated for the GKE operator's
NodeConfig struct. The result is that repeated calls to
/v3/schemas/nodeConfig may alternately return either the NodeConfig
schema from this operator or a different NodeConfig schema containing
the googleConfig schema for GCP cloud credentials.

This change renames the NodeConfig struct to avoid this collision, and
also renames every other struct defined for the operator in order to
prevent potential collisions in the future.
2021-03-30 20:08:58 -07:00
Colleen Murphy 30517e2386 Fix recordError function
The function that was supposed to set the FailureMessage status
attribute was not doing that. This change ensures the error from the
onChange handler is actually used and the cluster status is updated.
2021-03-29 15:58:40 -07:00
Colleen Murphy a1bf448365
Merge pull request #15 from cmurphy/nodepool-mgmt
Add missing fields for maintenance and availability
2021-03-22 18:24:26 -07:00
Colleen Murphy 535ef2bd7e Fix controller sync for empty version/image string
It is valid to create a GKE cluster with the kubernetes version, node
pool version, or node pool image type set to "". GKE will set defaults
on the backend. However, it causes problems during the update cycle. The
validateAndUpdate function on the controller cannot check whether "" is
semver compliant, and the Update* functions cannot set "" as a desired
value to update to. This change adds allowances for certain string
parameters to be empty. Also restructures the
UpdateMasterKubernetesVersion to be more Go-idiomatic by returning early
if possible.
2021-03-22 08:20:44 -07:00
Colleen Murphy bc5497ed83 Add MaintenanceWindow field
Add a field MaintenanceWindow which represents the start time
at which automatic maintenance is allowed to be performed. The duration
of the window is not settable. Setting MaintenanceWindow to "" means
maintenance can occur at any time, which is the default.
2021-03-20 14:34:18 -07:00
Colleen Murphy 632453b70b Add Locations field
For parity with the KEv1 implementation, add the ability to set and
update the Locations field for a cluster. This represents additional
zones (within a single region) in which node pools can be deployed for
greater availability.
2021-03-20 14:25:12 -07:00
Colleen Murphy acfc5bda22 Add support for AutoRepair and AutoUpgrade feature
Add a struct called Management as an attribute for NodePool which
supports toggling auto-repair and auto-upgrade for a node pool. Also
update the examples.

GKE defaults to setting these to true on the backend, so before this
patch, since we were not setting the Management pointer to an object for
a node pool, the node pool would be created with auto-repair and
auto-upgrade enabled. Now that we're explicitly setting it, it defaults
to disabled.
2021-03-19 15:30:41 -07:00
Colleen Murphy 4c61a49312 Fix handling of region vs zone
In GKE, a cluster is created either in a Region or a Zone. The CRD
supports setting either and the controller validates that one but not
both is set. But without this patch, the operator was only respecting
the Region setting and never regarding the Zone. The examples also
incorrectly referred to an example Zone identifier in the Region
setting. This patch ensures that Zone will be used to identify the
cluster if it is used and fixes the examples to make sense.
2021-03-17 17:58:25 -07:00
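
The region-versus-zone rule from "Fix handling of region vs zone" (exactly one of the two must be set, and whichever is set identifies the cluster) can be sketched like this. The function name `clusterLocation` and the error text are hypothetical and only illustrate the rule as stated in the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterLocation is a hypothetical helper: it validates that exactly one of
// region and zone is set and returns the one that is.
func clusterLocation(region, zone string) (string, error) {
	switch {
	case region != "" && zone != "":
		return "", errors.New("only one of region or zone may be set")
	case region == "" && zone == "":
		return "", errors.New("one of region or zone must be set")
	case zone != "":
		return zone, nil
	default:
		return region, nil
	}
}

func main() {
	// A zonal cluster: the zone, not the region, identifies the location.
	loc, err := clusterLocation("", "us-central1-a")
	fmt.Println(loc, err)

	// Setting both is rejected.
	_, err = clusterLocation("us-central1", "us-central1-a")
	fmt.Println(err)
}
```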
Colleen Murphy 40f9b1a0f5
Merge pull request #11 from cmurphy/kubeconfig
Add controller.GetTokenSource function
2021-03-17 07:55:26 -07:00
Colleen Murphy 81a9439c81 Change CredentialContent to GoogleCredentialSecret
The CredentialContent attribute does not really have the contents of the
credential, but a reference to a Cloud Credential Secret which does
contain the credential. Rename it to make its purpose clearer and to be
consistent with the credential field in eks- and aks-operator.
2021-03-15 19:34:48 -07:00
Colleen Murphy 5f12f59fd8 Fix importCluster comment
There is no displayName attribute in GKEClusterConfigSpec. Also, the
function creates a Secret as a side effect, which is worth mentioning in
the comment.
2021-03-15 18:45:35 -07:00
Colleen Murphy 941b68e3de Rename EnableAlphaFeature to EnableKubernetesAlpha
Rename the alpha-feature toggle to be consistent with the Go SDK and
improve the readability of the translation between CRD attributes and
GKE API attributes.
2021-03-15 18:40:11 -07:00
Colleen Murphy 4e5c4d885c Add controller.GetTokenSource function
The GKE handler in Rancher needs to be able to build an initial admin
kubeconfig in order to generate the first service account in the
cluster. Since gke-operator already knows how to get the credential
Secret and convert it to an OAuth2 token, it is convenient to expose
this as a public function that Rancher can use to authenticate to the
cluster.
2021-03-15 15:16:32 -07:00