Commit Graph

54 Commits

Author SHA1 Message Date
Soule BA 65a02c8c6c
Add a test when switching from chart template to chartRef
The test case successfully upgrade with the same chart because version
is not computed the same way (12 digits of digest appended for
OCIRepository source).

Signed-off-by: Soule BA <bah.soule@gmail.com>
2024-04-18 13:07:41 +02:00
Soule BA edec322a3d
Take into account the oci-digest
This commit add the oci artifact digest into the release observed
snapshot. This is used to later to add that value as an annotation.

Signed-off-by: Soule BA <bah.soule@gmail.com>
2024-04-18 13:07:41 +02:00
Soule BA aeac55dba9
Adding 12 first character of digest to chart version
This is needed for an OCIRepository source in order to detect change for
mutable tags.

Signed-off-by: Soule BA <bah.soule@gmail.com>
2024-04-18 13:07:41 +02:00
Soule BA 686fe58f6e
address review comments
Signed-off-by: Soule BA <bah.soule@gmail.com>
2024-04-18 13:07:40 +02:00
Soule BA 157f806598
fix methods names
Signed-off-by: Soule BA <bah.soule@gmail.com>
2024-04-18 13:07:40 +02:00
Soule BA 20e14fe304
This commit enable reusing an existing OCIRepo as chartRef.
It takes into account switching from a chart
template to a referenced source (garbage collection).

Signed-off-by: Soule BA <bah.soule@gmail.com>
2024-04-18 13:05:04 +02:00
Soule BA ff1421257e
fix: use corev1 event type for sending events
Signed-off-by: Soule BA <bah.soule@gmail.com>
2024-03-07 22:00:26 +01:00
Soule BA e283ead7f3
Reintroduce missing events for helmChart reconciliation
If implemented this PR reintroduce events for some failling action
during the reconciliation process, related to the helmChart retrieval
and loading of chart and values.

Signed-off-by: Soule BA <bah.soule@gmail.com>
2024-03-06 15:52:41 +01:00
Sunny 59c577a924 Remove stale Ready=False conditions values
When the reconciliation begins, while fulfilling the prerequisites,
Ready=False condition for various reasons are added on the object. On
failure, this reason is persisted on the object. On a subsequent
reconciliation, when the failure is recovered, the Ready=False condition
is not updates until the atomic reconciliation reaches a conclusion.
During this period if the atomic reconciliation enters a retry loop due
to constant drift detection and correction, the stale Ready=False
condition with incorrect reason persists on the object. The Ready=False
message is also copied to Reconciling=True condition, resulting in an
incorrect depiction of what's actually happening.
For example, if previously the HelmRelease failed with dependency not
ready error, on a subsequent reconciliation, even after going past the
dependency check and returning from atomic reconciliation due to drift
detection and correction loop scenario, the Ready=False condition
continues to show the stale dependency not ready error.

In order to show more accurate status, the Ready=False conditions added
while fulfilling prerequisites can be removed once those checks have
succeeded, updating Ready=False to Ready=Unknown with "reconciliation in
progress" message. If the atomic reconciliation gets stuck in the drift
detection and correction loop with this, the Ready and Reconciling
conditons would show "reconciliation in progress". This should be a
better indicator of what's going on. The events and logs can be checked
to determine accurately what's causing the reconciliation to be
progressing for ever.

Signed-off-by: Sunny <github@darkowlzz.space>
2024-02-05 13:31:05 +05:30
Hidde Beydals 07e204615b
loader: log HTTP errors to provide faster feedback
This configures a logger on the HTTP client used to load a Helm chart,
ensuring HTTP errors surface faster.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-12-13 11:30:37 +01:00
Hidde Beydals 93d2118f71
controller: enrich "HelmChart not ready" messages
This propagates the reason a HelmChart is (likely) not ready to the
message of the Ready condition.

The goal of this is to make it easier for people to reason about a
potential failure that may be happening while retrieving the chart,
without having to inspect the HelmChart itself.

As at times, they may not have access (due to e.g. not being able to
access the namespace, while the controller is allowed to create the
object there), or are simply not aware of the fact that this object
is created by the controller for them.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-12-07 23:35:44 +01:00
Hidde Beydals 0919fb4c24
controller: remove deprecated metrics
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-12-01 17:23:52 +01:00
Hidde Beydals 51563d6012
reconcile: stall without rollback target
This ensures that if there is no target to roll back to due to all of
them being in a failed state, the controller stalls instead of ending up
in a loop of upgrade attempts.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-12-01 17:20:51 +01:00
Hidde Beydals 0a2041c338
controller: ensure object in cache before requeue
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-12-01 17:20:50 +01:00
Hidde Beydals 48cad68386
controller: unready dep should not bump obs gen
This ensures that any unfulfilled dependencies for which we requeue do
not prematurely bump the observed generation by introducing typed
errors.

These typed errors ensure that the logic to bump the observed generation
can continue to be the same, while ignoring them just in time before
returning the final error.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-12-01 14:14:40 +01:00
Hidde Beydals 6b7789aadc
Implement `forceAt` and `resetAt` annotations
This makes the controller actually take the
`reconcile.fluxcd.io/forceAt` and `reconcile.fluxcd.io/resetAt` into
account.

For `reconcile.fluxcd.io/resetAt`, this means that the failure counts on
the `HelmRelease` object are reset when the token value of the
annotation equals `reconcile.fluxcd.io/requestedAt`. Allowing the
controller to start over with attempting to install or upgrade the
release until the retries count has been reached again.

For `reconcile.fluxcd.io/forceAt`, this means that a one-off Helm
install or upgrade is allowed to take place even if the object is out of
retries, in a failed state where it should be remediated, or in-sync.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-30 10:22:49 +01:00
Hidde Beydals 2d927b9b9e
Miscellaneous tidying of minor things
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-24 17:59:45 +01:00
Hidde Beydals eaa2a8c2fe
Update dependencies
- github.com/fluxcd/cli-utils to v0.36.0-flux.1
- github.com/fluxcd/pkg/apis/event to v0.6.0
- github.com/fluxcd/pkg/apis/kustomize to v1.2.0
- github.com/fluxcd/pkg/apis/meta to v1.2.0
- github.com/fluxcd/pkg/runtime to v0.43.0
- github.com/fluxcd/pkg/ssa to v0.34.0
- github.com/fluxcd/pkg/testserver to v0.5.0
- github.com/go-logr/logr to v1.3.0
- github.com/google/go-cmp to v0.6.0
- github.com/hashicorp/go-retryablehttp to v0.7.5
- github.com/onsi/gomega to v1.30.0
- github.com/opencontainers/go-digest to v1.0.1-0.20231025023718-d50d2fec9c98
- github.com/opencontainers/go-digest/blake3 to v0.0.0-20231025023718-d50d2fec9c98
- golang.org/x/text to v0.14.0
- helm.sh/helm/v3 to v3.13.2
- k8s.io/api to v0.28.4
- k8s.io/apiextensions-apiserver to v0.28.4
- k8s.io/apimachinery to v0.28.4
- k8s.io/cli-runtime to v0.28.4
- k8s.io/client-go to v0.28.4
- k8s.io/kubectl to v0.28.4
- k8s.io/utils to v0.0.0-20231121161247-cf03d44ff3cf
- sigs.k8s.io/controller-runtime to v0.16.3
- sigs.k8s.io/kustomize/api to v0.15.0
- sigs.k8s.io/kustomize/kyaml to v0.15.0
- sigs.k8s.io/yaml to v1.4.0

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-24 12:43:33 +01:00
Hidde Beydals 4a8d2ff0f4
action: provide reason for failures count reset
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-23 00:17:17 +01:00
Hidde Beydals 7aad010664
controller: immediate requeue unfinished release
This improves continuity while the controller attempts to move the
release forward.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-23 00:17:14 +01:00
Hidde Beydals 5d1f34a029
controller: patch after setting `Reconciling=True`
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-23 00:17:13 +01:00
Hidde Beydals 20c00fd47a
action: provide a reason on release target changes
This to allow better feedback to the user on why the controller decided
to uninstall the release.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-23 00:17:09 +01:00
Hidde Beydals 580c72cd09
controller: adopt release based on v2beta1 state
This allows the controller to be updated from `v2beta1` to `v2beta2`
without triggering a release to settle state.

It does this by looking at the previous successful release as recorded
for the `v2beta1` object, and if found, recording a snapshot for it in
the new `History` field of the status.

This feature can be disabled by setting the `AdoptLegacyReleases`
feature flag to `false`.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-22 23:14:17 +01:00
Hidde Beydals 70485017d2
controller: requeue on fixed interval on chart 404
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:54 +01:00
Hidde Beydals c5a017cb76
api: record observed releases in `Status.History`
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:53 +01:00
Hidde Beydals 2e0e22593f
reconcile: improve state determination
This decouples the state determination from deciding which action to
take, making it easier to reason about the different types of state
and what action should be taken to drive it forward.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:51 +01:00
Hidde Beydals 80d0878e96
controller: ignore `NotFound` API error on delete
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:50 +01:00
Hidde Beydals 096956fdfd
controller: properly record object metrics
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:48 +01:00
Hidde Beydals f156c3550e
reconcile: allow cfg of manager in atomic action
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:46 +01:00
Hidde Beydals a6ae4c3fb9
reconcile: improve log levels of actions
This ensures the logs of the Kubernetes client used by Helm are
persisted to the log buffer, as they can contain important information
when an action times out.

In addition, move the logs from the Helm actions themselves to the
"debug" log level (while still including them in Kubernetes Events in
case of a failure), in favor of the logs produced by the `reconcile`
package itself. While moving the logs from the Helm storage to the
"trace" log level, as they only contain information about e.g. writes
to a Secret.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:44 +01:00
Hidde Beydals 94064da340
controller: add reconcile release tests
Plus some minor improvements to the logic, based on writing tests.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:42 +01:00
Hidde Beydals 882da27a5d
api: move `Current` and `Previous` into `History`
The primary reason for this is the alphabetical ordering of `kubectl
describe`, which caused the fields to be listed in separate places
instead of a bundle.

From a programmatic perspective, it is also great because it is now much
easier to reset any previous state when e.g. uninstalling a release. As
we can simply write an empty struct to erase any memory of a previous
release, instead of having to deal with multiple fields.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:42 +01:00
Hidde Beydals b2ba3d97ea
controller: improve deletion logic and add tests
This ensures certain edge-cases around the availability of the service
account and/or KubeConfig are handled.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:40 +01:00
Hidde Beydals fbd73ac399
controller: start w/ adding tests for HelmRelease
This adds base coverage for some of the simpler methods which do not
require extensive mocking.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:39 +01:00
Hidde Beydals 5e3ad5d21a
reconcile: add `HelmChartTemplate` sub-reconciler
"With hope comes the potential for both triumph and tribulation."

Due to difficulties beyond the time I have at hands at present[1], the
separate reconciler which took care of ensuring the HelmChart of the
HelmRelease was kept up-to-date has been transformed into a
sub-reconciler.

The behavior of the sub-reconciler remains largely unchanged, except the
required changes to deal with the lack of possibilities to requeue.
Effectively, this means that instead of e.g. deleting the HelmChart
object, requeue, and create it again. This is now handled in a single
operation, unless the deletion fails.

[1]: The core of the issue is that deregistration of finalizers becomes
difficult due to the behavior of the patch helper, and unavailability of
list merges for patch operations on Custom Resources within Kubernetes.

This means that when two reconcilers simultaneously work on the
deregistration of the finalizers, and one succeeds before the other. The
last finishing reconciler will attempt to add the finalizer of the other
reconciler back, as it did exist at the start of their reconciliation
run.

Attempts to work around this (for example, by using an optimistic lock
on the patch operation of the finalizers field) would cause new issues.
As Kubernetes will then delete the object as soon as the patch has
succeeded, and before the reconciliation process actually ends.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:38 +01:00
Hidde Beydals 68c273b701
controller: handle delete before adding finalizer
When an object is marked as under deletion, the API server will reject
any attempt to register new finalizers. Given this, handling the
deletion timestamp always has to come before an attempt to register
the finalizer.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:37 +01:00
Hidde Beydals 866f076d1f
reconcile: share PatchHelper with controller
This ensures they both have the same observation on the last
modifications made to the object. Preventing possible scenarios where
a condition would not be removed because it wasn't set at the start of
the reconcile run, then added, and then removed. Causing it to go
unnoticed during the diff calculation.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:36 +01:00
Hidde Beydals d802ba6cc1
controllers: roughly rewire HelmRelease reconciler
This adds the base wiring to get the controller to work with the
v2beta2 API and the newly introduced packages in `internal/`.

In essence, this means that from now on the controller will utilize all
new code for the reconciliation of the HelmRelease resource.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:35 +01:00
Hidde Beydals c99b00d885
Move predicates into package and add tests
Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:02:41 +01:00
Hidde Beydals 0140eeeea9
Factor various bits out of reconciler
This commit moves various generic bits out of the reconciler into
separate modules, while adding more test coverage.

Some of the logic around merging chart values from references has been
improved to work with `client.Object`, instead of two separate maps.

In addition, the option to override the hostname of an Artifact has
been removed. It was undocumented and for testing purposes only, which
these days can be better achieved by e.g. configuring the
`--storage-adv-addr`.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:02:40 +01:00
Hidde Beydals fe661df9d7
Move HelmChart handling to separate reconciler
This moves the HelmChart template handling to a separate reconciler,
with predicates detecing relevant changes. The idea is that this would
both facilitate working _without_ chart templates but with references
in the future, and to reduce cognitive load while working with
reconciler logic.

The predicate uses `DeepEqual` from `k8s.io/apimachinery/pkg/api/equality`
to inspect the Chart template objects of the old and new HelmRelease
object in the update event.

The reconciler uses server-side apply to create or update the HelmChart
on the cluster, and emits an event based on the change set of the
action. It does not produce any diff yet, as the server-side apply
library at present does not provide a way to gain access to an "old"
versus "new" objects after performing an apply. The `diff` package
has however been prepared to allow diffing Unstructured objects.

As this reconciler has a separate life-cycle, a new
`chart.finalizers.fluxcd.io` finalizer has been introduced to ensure
a HelmChart is properly garbage collected before the HelmRelease is
allowed to be deleted.

The implementation on the release reconciler's end is a rough sketch,
but in working shape. The foresight is that much of the reconciler will
change when the release logic will be adjusted to work with the earlier
introduced storage observer.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:02:40 +01:00
Hidde Beydals f054ff5853
misc: fix hypothetical implicit memory aliasing
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-10-10 10:43:13 +02:00
Hidde Beydals aa2f6dd3be
misc: remove redundant use of `fmt.Sprintf`
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-10-10 10:03:28 +02:00
Hidde Beydals 08d3674e5a
misc: use `time.Since` instead of `time.Now().Sub`
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-10-10 10:02:42 +02:00
Stefan Prodan bd3ec35697
Retry failed releases when charts are available in storage
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
2023-10-04 11:03:07 +03:00
Hidde Beydals 2465cb43bd
controller: use `DefaultServiceAccount` in differ
This addresses an issue in which the defunct `DefaultServiceAccount`
from the `HelmReleaseReconciler` was being used to construct the
impersonator used by the differ.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-09-18 12:54:08 +02:00
Hidde Beydals 1aa739028d
controller: strip newlines from Helm error message
To prevent spurious newlines between the error message and the captured
logs, as at times Helm ends error with one or multiple newlines.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-09-11 16:57:54 +02:00
Sunny 74e33a70c4 Delete stale metrics on object delete
Use the metrics helper to record all the metrics. Metrics helpers
ensures that the metrics for deleted objects are deleted as well.

Move all the metrics recording to be performed at the very end of the
reconciliation. Realtime metrics for readiness is no longer recorded as
it will be removed in a future version for CRD metrics collected using
kube-state-metrics. Updating the object status with realtime readiness
should provide the readiness to CRD metrics watchers.

`HelmReleaseReconciler.reconcileDelete()` is modified to receive a
pointer HelmRelease object so that any modifications on the object is
reflected on the object instance that's passed to the metrics recorder.
This is not needed for `HelmReleaseReconciler.reconcile()` as it returns
a new copy of the object that's saved in the same object variable,
overwriting the object instance with the updates.

Signed-off-by: Sunny <darkowlzz@protonmail.com>
2023-08-15 02:42:09 +05:30
Hidde Beydals d76f3a355b
controller: jitter requeue interval
This adds a `--interval-jitter-percentage` flag to the controller to
add a +/- percentage jitter to the interval defined in a HelmRelease
(defaults to 5%).

Effectively, this results in a reconciliation every 9.5 - 10.5 minutes
for a resource with an interval of 10 minutes.

Main reason to add this change is to mitigate spikes in memory and
CPU usage caused by many resources being configured with the same
interval.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-08-09 17:50:43 +02:00
Carlos Sanchez ee3f232fd8
chore: fix typo reconcilation
Signed-off-by: Carlos Sanchez <carlos@apache.org>
2023-07-14 19:22:25 +02:00