Commit Graph

44 Commits

Author SHA1 Message Date
Hidde Beydals 93d2118f71
controller: enrich "HelmChart not ready" messages
This propagates the reason a HelmChart is (likely) not ready to the
message of the Ready condition.

The goal of this is to make it easier for people to reason about a
potential failure that may be happening while retrieving the chart,
without having to inspect the HelmChart itself.

As at times, they may not have access (due to e.g. not being able to
access the namespace, while the controller is allowed to create the
object there), or are simply not aware of the fact that this object
is created by the controller for them.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-12-07 23:35:44 +01:00
Hidde Beydals 0919fb4c24
controller: remove deprecated metrics
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-12-01 17:23:52 +01:00
Hidde Beydals 51563d6012
reconcile: stall without rollback target
This ensures that if there is no target to roll back to due to all of
them being in a failed state, the controller stalls instead of ending up
in a loop of upgrade attempts.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-12-01 17:20:51 +01:00
Hidde Beydals 0a2041c338
controller: ensure object in cache before requeue
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-12-01 17:20:50 +01:00
Hidde Beydals 48cad68386
controller: unready dep should not bump obs gen
This ensures that any unfulfilled dependencies for which we requeue do
not prematurely bump the observed generation by introducing typed
errors.

These typed errors ensure that the logic to bump the observed generation
can continue to be the same, while ignoring them just in time before
returning the final error.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-12-01 14:14:40 +01:00
Hidde Beydals 6b7789aadc
Implement `forceAt` and `resetAt` annotations
This makes the controller actually take the
`reconcile.fluxcd.io/forceAt` and `reconcile.fluxcd.io/resetAt` into
account.

For `reconcile.fluxcd.io/resetAt`, this means that the failure counts on
the `HelmRelease` object are reset when the token value of the
annotation equals `reconcile.fluxcd.io/requestedAt`. Allowing the
controller to start over with attempting to install or upgrade the
release until the retries count has been reached again.

For `reconcile.fluxcd.io/forceAt`, this means that a one-off Helm
install or upgrade is allowed to take place even if the object is out of
retries, in a failed state where it should be remediated, or in-sync.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-30 10:22:49 +01:00
Hidde Beydals 2d927b9b9e
Miscellaneous tidying of minor things
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-24 17:59:45 +01:00
Hidde Beydals eaa2a8c2fe
Update dependencies
- github.com/fluxcd/cli-utils to v0.36.0-flux.1
- github.com/fluxcd/pkg/apis/event to v0.6.0
- github.com/fluxcd/pkg/apis/kustomize to v1.2.0
- github.com/fluxcd/pkg/apis/meta to v1.2.0
- github.com/fluxcd/pkg/runtime to v0.43.0
- github.com/fluxcd/pkg/ssa to v0.34.0
- github.com/fluxcd/pkg/testserver to v0.5.0
- github.com/go-logr/logr to v1.3.0
- github.com/google/go-cmp to v0.6.0
- github.com/hashicorp/go-retryablehttp to v0.7.5
- github.com/onsi/gomega to v1.30.0
- github.com/opencontainers/go-digest to v1.0.1-0.20231025023718-d50d2fec9c98
- github.com/opencontainers/go-digest/blake3 to v0.0.0-20231025023718-d50d2fec9c98
- golang.org/x/text to v0.14.0
- helm.sh/helm/v3 to v3.13.2
- k8s.io/api to v0.28.4
- k8s.io/apiextensions-apiserver to v0.28.4
- k8s.io/apimachinery to v0.28.4
- k8s.io/cli-runtime to v0.28.4
- k8s.io/client-go to v0.28.4
- k8s.io/kubectl to v0.28.4
- k8s.io/utils to v0.0.0-20231121161247-cf03d44ff3cf
- sigs.k8s.io/controller-runtime to v0.16.3
- sigs.k8s.io/kustomize/api to v0.15.0
- sigs.k8s.io/kustomize/kyaml to v0.15.0
- sigs.k8s.io/yaml to v1.4.0

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-24 12:43:33 +01:00
Hidde Beydals 4a8d2ff0f4
action: provide reason for failures count reset
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-23 00:17:17 +01:00
Hidde Beydals 7aad010664
controller: immediate requeue unfinished release
This improves continuity while the controller attempts to move the
release forward.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-23 00:17:14 +01:00
Hidde Beydals 5d1f34a029
controller: patch after setting `Reconciling=True`
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-23 00:17:13 +01:00
Hidde Beydals 20c00fd47a
action: provide a reason on release target changes
This to allow better feedback to the user on why the controller decided
to uninstall the release.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-23 00:17:09 +01:00
Hidde Beydals 580c72cd09
controller: adopt release based on v2beta1 state
This allows the controller to be updated from `v2beta1` to `v2beta2`
without triggering a release to settle state.

It does this by looking at the previous successful release as recorded
for the `v2beta1` object, and if found, recording a snapshot for it in
the new `History` field of the status.

This feature can be disabled by setting the `AdoptLegacyReleases`
feature flag to `false`.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-22 23:14:17 +01:00
Hidde Beydals 70485017d2
controller: requeue on fixed interval on chart 404
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:54 +01:00
Hidde Beydals c5a017cb76
api: record observed releases in `Status.History`
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:53 +01:00
Hidde Beydals 2e0e22593f
reconcile: improve state determination
This decouples the state determination from deciding which action to
take, making it easier to reason about the different types of state
and what action should be taken to drive it forward.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:51 +01:00
Hidde Beydals 80d0878e96
controller: ignore `NotFound` API error on delete
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:50 +01:00
Hidde Beydals 096956fdfd
controller: properly record object metrics
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:48 +01:00
Hidde Beydals f156c3550e
reconcile: allow cfg of manager in atomic action
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:46 +01:00
Hidde Beydals a6ae4c3fb9
reconcile: improve log levels of actions
This ensures the logs of the Kubernetes client used by Helm are
persisted to the log buffer, as they can contain important information
when an action times out.

In addition, move the logs from the Helm actions themselves to the
"debug" log level (while still including them in Kubernetes Events in
case of a failure), in favor of the logs produced by the `reconcile`
package itself. While moving the logs from the Helm storage to the
"trace" log level, as they only contain information about e.g. writes
to a Secret.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:44 +01:00
Hidde Beydals 94064da340
controller: add reconcile release tests
Plus some minor improvements to the logic, based on writing tests.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:42 +01:00
Hidde Beydals 882da27a5d
api: move `Current` and `Previous` into `History`
The primary reason for this is the alphabetical ordering of `kubectl
describe`, which caused the fields to be listed in separate places
instead of a bundle.

From a programmatic perspective, it is also great because it is now much
easier to reset any previous state when e.g. uninstalling a release. As
we can simply write an empty struct to erase any memory of a previous
release, instead of having to deal with multiple fields.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:42 +01:00
Hidde Beydals b2ba3d97ea
controller: improve deletion logic and add tests
This ensures certain edge-cases around the availability of the service
account and/or KubeConfig are handled.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:40 +01:00
Hidde Beydals fbd73ac399
controller: start w/ adding tests for HelmRelease
This adds base coverage for some of the simpler methods which do not
require extensive mocking.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:39 +01:00
Hidde Beydals 5e3ad5d21a
reconcile: add `HelmChartTemplate` sub-reconciler
"With hope comes the potential for both triumph and tribulation."

Due to difficulties beyond the time I have at hands at present[1], the
separate reconciler which took care of ensuring the HelmChart of the
HelmRelease was kept up-to-date has been transformed into a
sub-reconciler.

The behavior of the sub-reconciler remains largely unchanged, except the
required changes to deal with the lack of possibilities to requeue.
Effectively, this means that instead of e.g. deleting the HelmChart
object, requeue, and create it again. This is now handled in a single
operation, unless the deletion fails.

[1]: The core of the issue is that deregistration of finalizers becomes
difficult due to the behavior of the patch helper, and unavailability of
list merges for patch operations on Custom Resources within Kubernetes.

This means that when two reconcilers simultaneously work on the
deregistration of the finalizers, and one succeeds before the other. The
last finishing reconciler will attempt to add the finalizer of the other
reconciler back, as it did exist at the start of their reconciliation
run.

Attempts to work around this (for example, by using an optimistic lock
on the patch operation of the finalizers field) would cause new issues.
As Kubernetes will then delete the object as soon as the patch has
succeeded, and before the reconciliation process actually ends.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:38 +01:00
Hidde Beydals 68c273b701
controller: handle delete before adding finalizer
When an object is marked as under deletion, the API server will reject
any attempt to register new finalizers. Given this, handling the
deletion timestamp always has to come before an attempt to register
the finalizer.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:37 +01:00
Hidde Beydals 866f076d1f
reconcile: share PatchHelper with controller
This ensures they both have the same observation on the last
modifications made to the object. Preventing possible scenarios where
a condition would not be removed because it wasn't set at the start of
the reconcile run, then added, and then removed. Causing it to go
unnoticed during the diff calculation.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:36 +01:00
Hidde Beydals d802ba6cc1
controllers: roughly rewire HelmRelease reconciler
This adds the base wiring to get the controller to work with the
v2beta2 API and the newly introduced packages in `internal/`.

In essence, this means that from now on the controller will utilize all
new code for the reconciliation of the HelmRelease resource.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:35 +01:00
Hidde Beydals c99b00d885
Move predicates into package and add tests
Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:02:41 +01:00
Hidde Beydals 0140eeeea9
Factor various bits out of reconciler
This commit moves various generic bits out of the reconciler into
separate modules, while adding more test coverage.

Some of the logic around merging chart values from references has been
improved to work with `client.Object`, instead of two separate maps.

In addition, the option to override the hostname of an Artifact has
been removed. It was undocumented and for testing purposes only, which
these days can be better achieved by e.g. configuring the
`--storage-adv-addr`.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:02:40 +01:00
Hidde Beydals fe661df9d7
Move HelmChart handling to separate reconciler
This moves the HelmChart template handling to a separate reconciler,
with predicates detecing relevant changes. The idea is that this would
both facilitate working _without_ chart templates but with references
in the future, and to reduce cognitive load while working with
reconciler logic.

The predicate uses `DeepEqual` from `k8s.io/apimachinery/pkg/api/equality`
to inspect the Chart template objects of the old and new HelmRelease
object in the update event.

The reconciler uses server-side apply to create or update the HelmChart
on the cluster, and emits an event based on the change set of the
action. It does not produce any diff yet, as the server-side apply
library at present does not provide a way to gain access to an "old"
versus "new" objects after performing an apply. The `diff` package
has however been prepared to allow diffing Unstructured objects.

As this reconciler has a separate life-cycle, a new
`chart.finalizers.fluxcd.io` finalizer has been introduced to ensure
a HelmChart is properly garbage collected before the HelmRelease is
allowed to be deleted.

The implementation on the release reconciler's end is a rough sketch,
but in working shape. The foresight is that much of the reconciler will
change when the release logic will be adjusted to work with the earlier
introduced storage observer.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:02:40 +01:00
Hidde Beydals f054ff5853
misc: fix hypothetical implicit memory aliasing
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-10-10 10:43:13 +02:00
Hidde Beydals aa2f6dd3be
misc: remove redundant use of `fmt.Sprintf`
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-10-10 10:03:28 +02:00
Hidde Beydals 08d3674e5a
misc: use `time.Since` instead of `time.Now().Sub`
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-10-10 10:02:42 +02:00
Stefan Prodan bd3ec35697
Retry failed releases when charts are available in storage
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
2023-10-04 11:03:07 +03:00
Hidde Beydals 2465cb43bd
controller: use `DefaultServiceAccount` in differ
This addresses an issue in which the defunct `DefaultServiceAccount`
from the `HelmReleaseReconciler` was being used to construct the
impersonator used by the differ.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-09-18 12:54:08 +02:00
Hidde Beydals 1aa739028d
controller: strip newlines from Helm error message
To prevent spurious newlines between the error message and the captured
logs, as at times Helm ends error with one or multiple newlines.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-09-11 16:57:54 +02:00
Sunny 74e33a70c4 Delete stale metrics on object delete
Use the metrics helper to record all the metrics. Metrics helpers
ensures that the metrics for deleted objects are deleted as well.

Move all the metrics recording to be performed at the very end of the
reconciliation. Realtime metrics for readiness is no longer recorded as
it will be removed in a future version for CRD metrics collected using
kube-state-metrics. Updating the object status with realtime readiness
should provide the readiness to CRD metrics watchers.

`HelmReleaseReconciler.reconcileDelete()` is modified to receive a
pointer HelmRelease object so that any modifications on the object is
reflected on the object instance that's passed to the metrics recorder.
This is not needed for `HelmReleaseReconciler.reconcile()` as it returns
a new copy of the object that's saved in the same object variable,
overwriting the object instance with the updates.

Signed-off-by: Sunny <darkowlzz@protonmail.com>
2023-08-15 02:42:09 +05:30
Hidde Beydals d76f3a355b
controller: jitter requeue interval
This adds a `--interval-jitter-percentage` flag to the controller to
add a +/- percentage jitter to the interval defined in a HelmRelease
(defaults to 5%).

Effectively, this results in a reconciliation every 9.5 - 10.5 minutes
for a resource with an interval of 10 minutes.

Main reason to add this change is to mitigate spikes in memory and
CPU usage caused by many resources being configured with the same
interval.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-08-09 17:50:43 +02:00
Carlos Sanchez ee3f232fd8
chore: fix typo reconcilation
Signed-off-by: Carlos Sanchez <carlos@apache.org>
2023-07-14 19:22:25 +02:00
Aurel Canciu 7c75fc4d3d
Fix HelmRelease reconciliation loop
Likely after the upgrade to controller-runtime v0.15.0 a regression
surfaced for long-running reconciliations of HelmRelease resources (e.g.
for charts having pre-upgrade hooks taking a few minutes to complete).
This regression would cause the controller to immediately re-run the
upgrade after a successful upgrade, thus entering an almost-endless
loop.

Apparently, the only fix to this issue is to ensure
`.Status.LastReleaseRevision` is updated as soon as possible in the
reconiliation cycle rather than wait for the update at the end of the
cycle.

Signed-off-by: Aurel Canciu <aurelcanciu@gmail.com>
2023-06-20 14:52:50 +03:00
Hidde Beydals 2ea7393629
Include revision and token in event metadata
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-05-31 13:01:50 +02:00
Hidde Beydals 4df753a1f1
Use last attempted values checksum as event metadata token
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-05-24 14:23:11 +02:00
Hidde Beydals d345af0e73
Rename controllers to controller
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-05-24 11:05:53 +02:00