Commit Graph

1199 Commits

Author SHA1 Message Date
Hidde Beydals 80d0878e96
controller: ignore `NotFound` API error on delete
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:50 +01:00
Hidde Beydals 2df90eb4cf
reconcile: improve observability between actions
- Change the log-level of "action determination" to "debug".
- Set `Ready=Unknown` while working on an install or upgrade.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:49 +01:00
Hidde Beydals 7c52fd255f
action: simplify chart diff logic
We actually only care about the chart name or version changing, as we
assume proper (immutable) versioning by the publisher of the chart
(either the user, or the source-controller).

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:49 +01:00
Hidde Beydals 096956fdfd
controller: properly record object metrics
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:48 +01:00
Hidde Beydals d0c4c14056
reconcile: improve uninstall w/o purging history
This improves the reconciliation of an uninstall when the release has
already been uninstalled while `KeepHistory` has been set, by detecting
the (sadly non-typed) error Helm produces as desired state.

Avoiding certain edge-cases where for example a deleted HelmRelease
would end up in an irrecoverable loop of uninstall attempts, after
being remediated (using an uninstall) before the deletion request.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:47 +01:00
Hidde Beydals 191bebfafd
reconcile: simplify `NextAction` logic
By looking at the type of the error, instead of doing a separate check
on `cur != nil`.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:47 +01:00
Hidde Beydals f156c3550e
reconcile: allow cfg of manager in atomic action
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:46 +01:00
Hidde Beydals ac9c2c3142
reconcile: ensure object patch on context cancel
As we are working with secondary state which we need to keep track of,
persisting the last state even when the context is canceled (due to
e.g. a controller shutdown) is important to improve the chances of
successfully being able to recover from any abrupt terminations.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:46 +01:00
Hidde Beydals 19be1b24ac
api: change format of `Snapshot#FullReleaseName`
From `<namespace>/<name>.<version>` to `<namespace>/<name>.v<version>`,
to better resemble the internal name format of e.g. Helm storage
Secrets.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:45 +01:00
Hidde Beydals 272329d86a
action: add `:` separator between ts and msg logs
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:45 +01:00
Hidde Beydals a6ae4c3fb9
reconcile: improve log levels of actions
This ensures the logs of the Kubernetes client used by Helm are
persisted to the log buffer, as they can contain important information
when an action times out.

In addition, move the logs from the Helm actions themselves to the
"debug" log level (while still including them in Kubernetes Events in
case of a failure), in favor of the logs produced by the `reconcile`
package itself. While moving the logs from the Helm storage to the
"trace" log level, as they only contain information about e.g. writes
to a Secret.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:44 +01:00
Hidde Beydals bc036c027f
reconcile: improve insights of progress in logs
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:43 +01:00
Hidde Beydals 5510175ccb
reconcile: tweak event messages
This in an attempt to maintain compatability with earlier documented
inclusion and exclusion lists for Alerts, like the following:

```
  eventSources:
    - kind: HelmRelease
      name: demo
  inclusionList:
    - ".*.upgrade.*succeeded.*"
```

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:43 +01:00
Hidde Beydals 94064da340
controller: add reconcile release tests
Plus some minor improvements to the logic, based on writing tests.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:42 +01:00
Hidde Beydals 882da27a5d
api: move `Current` and `Previous` into `History`
The primary reason for this is the alphabetical ordering of `kubectl
describe`, which caused the fields to be listed in separate places
instead of a bundle.

From a programmatic perspective, it is also great because it is now much
easier to reset any previous state when e.g. uninstalling a release. As
we can simply write an empty struct to erase any memory of a previous
release, instead of having to deal with multiple fields.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:42 +01:00
Hidde Beydals 7dfce0c738
api: introduce `APIVersion` in `Snapshot`
This will allow the controller to pick the right method for digest
calculations when we for example add new data into the calculation.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:41 +01:00
Hidde Beydals 9df9b176d9
api: various naming improvements
- Rename `HelmReleaseInfo` to `Snapshot`.
- Rename `HelmReleaseTestHook` to `TestHookStatus`.
- Rename `ObservedRelease` to `Observation`.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:41 +01:00
Hidde Beydals b2ba3d97ea
controller: improve deletion logic and add tests
This ensures certain edge-cases around the availability of the service
account and/or KubeConfig are handled.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:40 +01:00
Hidde Beydals fbd73ac399
controller: start w/ adding tests for HelmRelease
This adds base coverage for some of the simpler methods which do not
require extensive mocking.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:39 +01:00
Hidde Beydals 1dac82ad2c
reconcile: handle manually uninstalled release
This is a better way of dealing with this situation, as the previous
logic would result in an `ErrNoStorageUpdate`.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:39 +01:00
Hidde Beydals 5e3ad5d21a
reconcile: add `HelmChartTemplate` sub-reconciler
"With hope comes the potential for both triumph and tribulation."

Due to difficulties beyond the time I have at hands at present[1], the
separate reconciler which took care of ensuring the HelmChart of the
HelmRelease was kept up-to-date has been transformed into a
sub-reconciler.

The behavior of the sub-reconciler remains largely unchanged, except the
required changes to deal with the lack of possibilities to requeue.
Effectively, this means that instead of e.g. deleting the HelmChart
object, requeue, and create it again. This is now handled in a single
operation, unless the deletion fails.

[1]: The core of the issue is that deregistration of finalizers becomes
difficult due to the behavior of the patch helper, and unavailability of
list merges for patch operations on Custom Resources within Kubernetes.

This means that when two reconcilers simultaneously work on the
deregistration of the finalizers, and one succeeds before the other. The
last finishing reconciler will attempt to add the finalizer of the other
reconciler back, as it did exist at the start of their reconciliation
run.

Attempts to work around this (for example, by using an optimistic lock
on the patch operation of the finalizers field) would cause new issues.
As Kubernetes will then delete the object as soon as the patch has
succeeded, and before the reconciliation process actually ends.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:38 +01:00
Hidde Beydals dab2578c07
acl: introduce package to enable global config
This introduces an `acl` package in `internal` which globally configures
the allowance to namespaced references, instead of having to pass on a
variable everywhere.

For the sake of security, the default behavior of the package itself is
to _not_ allow cross namespace references. However, the behavior of the
controller remains unchanged, and the configuration flag still enables
the allowance by default.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:38 +01:00
Hidde Beydals e32c1a0f4a
reconcile: trim space from Helm error messages
Sadly, Helm more than often ends error messages with `\n\n`. Trim this
space to ensure we produce pretty messages.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:37 +01:00
Hidde Beydals 68c273b701
controller: handle delete before adding finalizer
When an object is marked as under deletion, the API server will reject
any attempt to register new finalizers. Given this, handling the
deletion timestamp always has to come before an attempt to register
the finalizer.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:37 +01:00
Hidde Beydals 866f076d1f
reconcile: share PatchHelper with controller
This ensures they both have the same observation on the last
modifications made to the object. Preventing possible scenarios where
a condition would not be removed because it wasn't set at the start of
the reconcile run, then added, and then removed. Causing it to go
unnoticed during the diff calculation.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:36 +01:00
Hidde Beydals bbefbc4ded
reconcile: use failure count in Stalled condition
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:35 +01:00
Hidde Beydals d802ba6cc1
controllers: roughly rewire HelmRelease reconciler
This adds the base wiring to get the controller to work with the
v2beta2 API and the newly introduced packages in `internal/`.

In essence, this means that from now on the controller will utilize all
new code for the reconciliation of the HelmRelease resource.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:35 +01:00
Hidde Beydals eee91b06fa
Introduce new `yaml` package with `Encode` func
Comparison versus `sigs.k8s.io/yaml#Marshal`:

```
BenchmarkEncode/EncodeWithSort-12         	    475	  2419063 ns/op	2235305 B/op	   5398 allocs/op
BenchmarkEncode/EncodeWithSort-12         	    498	  2406794 ns/op	2235300 B/op	   5398 allocs/op
BenchmarkEncode/EncodeWithSort-12         	    492	  2376460 ns/op	2235312 B/op	   5398 allocs/op
BenchmarkEncode/EncodeWithSort-12         	    496	  2406756 ns/op	2235323 B/op	   5398 allocs/op
BenchmarkEncode/EncodeWithSort-12         	    488	  2402969 ns/op	2235336 B/op	   5398 allocs/op
BenchmarkEncode/SigYAMLMarshal-12         	    202	  5791549 ns/op	3124841 B/op	  19324 allocs/op
BenchmarkEncode/SigYAMLMarshal-12         	    205	  5780248 ns/op	3123193 B/op	  19320 allocs/op
BenchmarkEncode/SigYAMLMarshal-12         	    207	  5762621 ns/op	3124537 B/op	  19324 allocs/op
BenchmarkEncode/SigYAMLMarshal-12         	    214	  5748899 ns/op	3121183 B/op	  19324 allocs/op
BenchmarkEncode/SigYAMLMarshal-12         	    211	  5682105 ns/op	3120592 B/op	  19325 allocs/op
```

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:34 +01:00
Hidde Beydals bb4e9b7cee
Update YAMLs to `helm.toolkit.fluxcd.io/v2beta2`
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:34 +01:00
Hidde Beydals deb0b14e43
api: make v2beta2 storage version
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:33 +01:00
Hidde Beydals 76f62ffc47
api: backport uninstall del propagation to v2beta2
Manual backport of the work done in #698, to keep things aligned.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:10 +01:00
Hidde Beydals 64b2d5455e
Address review comments
- Use `Unknown` status for the `TestSuccess` condition when tests
  have not been run yet.
- Update Ready summarization logic to incorportate conditions with an
  Unknown status. Within the context of readiness, this always caises
  Ready=False when the condition is included in the summarization.
- Variety of tiny fixes.
- Tiny nits in test mocks to prevent confusion.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:09 +01:00
Hidde Beydals 410ce3a00d
reconcile: include "token" in event metadata
This includes the "token" in the emitted events which is used to rate
limit events received by the notification-controller.

Either by using the already calculated config (values) digest, or by
calculating it for the current reconciliation request in scenarios
where it isn't available from made observations.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
2023-11-20 12:06:09 +01:00
Hidde Beydals 64cc09ce5e
reconcile: test emitted events
Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:06:08 +01:00
Hidde Beydals ea81c8e099
action: include TS in LogBuffer
This provides more context to individual log entries (and the duration
between individual log lines) while e.g. printing them in an event.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:06:08 +01:00
Hidde Beydals b975b3f999
reconcile: add atomic release reconciler
This commit adds an atomic release reconciler, capable of stepping
through a series of Helm actions. In addition, it adds the last bits
around eventing and summarizing the end state of the Condition types
into e.g. a Ready condition.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:06:07 +01:00
Hidde Beydals 479341461a
action: allow composed release name >=53 char
This solves the issue where a release name composed out of e.g.
the target namespace and name of the HelmRelease itself would exceed
the >=53 character length. By calculating the SHA256 checksum of the
release name, taking the first 12 characters of this checksum and
appending it to the release named trimmed to 40 characters separated
by a hyphen (`<long-release-name>-abcdef12345678`).

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:06:07 +01:00
Hidde Beydals 026fd45c2c
action: add name param to rollback and uninstall
This gives more fine-grain control over what release must be targeted,
as we do not always want to rely on the current spec but rather on e.g.
a release we have made ourselves with a previous configuration for
garbage collection purposes.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:06:06 +01:00
Hidde Beydals 9812286bb4
action: add `Len` method to `LogBuffer`
This allows for requesting the count of non-empty values in the ring
buffer, and thus the number of log lines.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:06:05 +01:00
Hidde Beydals 0b8692f61a
api: add service account name validation rule
Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:06:05 +01:00
Hidde Beydals 9e1eedcfa4
api: various changes to support new logic
- Change the map with Helm release test hooks to a pointer map. This
  allows (in combination with the constrains around JSON serialization)
  to distinguish a release _without_ a test run from a release _with_
  test run but no tests (an empty map).
- Add `GetTestHooks` and `SetTestHooks` methods to help circumvent some
  of the common problems around working with a pointer map in Go (e.g.
  not being capable of iterating over it using range).
- Add `HasBeenTested` and `HasTestInPhase` methods to help make
  observations on captured release information.
- Add `StorageNamespace` to Status to allow for observations of
  configuration changes which are mutating compared to the spec.
- Add `GetActiveRemediation` helper method to get the active
  remediation strategy based on the presence of Current and/or Previous
  release observations in the Status of the object.
- Add `ReleaseTargetChanged` helper method to determine if an immutable
  release target changed has occurred, in which case e.g. garbage
  collection needs to happen before performing any other action.
- Add `GetCurrent`, `HasCurrent`, `GetPrevious` and `HasPrevious`
  helper methods to ease access to their values nested in the Status.
- Add `FullReleaseName` and `VersionedChartName` helper methods to e.g.
  allow printing full name references in Condition and Event messages
  which can be placed in a point in time based on metadata more
  familiar to a user than for example the observed generation.
- Change `GetFailureCount` and `RetriesExhausted` signatures of
  `Remediation` interface to take a pointer. This eases use of the API,
  as generally speaking a (Kubernetes) API object is a pointer.
- Move methods from `HelmReleaseSpec` to `HelmRelease`, this is easier
  to access and matches `GetConditions`, etc.
- Remove `DeploymentAction` interface and `GetDescription` from
  `Remediation` interface as this is no longer of value.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:06:04 +01:00
Jiri Tyr 8cefed19fd
Adding tests
Signed-off-by: Jiri Tyr <jiri.tyr@gmail.com>
2023-11-20 12:06:04 +01:00
Jiri Tyr e1393542a7
Fixing typo
Co-authored-by: Hidde Beydals <hiddeco@users.noreply.github.com>
Signed-off-by: Jiri Tyr <jtyr@users.noreply.github.com>
2023-11-20 12:06:03 +01:00
Jiri Tyr 88a21fecbf
Moving stuff from runner; removing changes in v2beta1
Signed-off-by: Jiri Tyr <jiri.tyr@gmail.com>
2023-11-20 12:06:03 +01:00
Jiri Tyr 6db62ed507
Adding test filters
Signed-off-by: Jiri Tyr <jiri.tyr@gmail.com>
2023-11-20 12:06:02 +01:00
Hidde Beydals 5843cc2ef0
action: allow passing of config options
This to allow the Flux CLI to e.g. enable the dry-run flag on an action
outside of the HelmRelease spec, and inject other (user input based)
modifications.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:06:02 +01:00
Hidde Beydals 220e789481
Allow detection of next reconcile action
This provides a rough (but not flawless) outline for determining the
sub-reconciler which should run based on the state of the `HelmRelease`
API object, and the Helm storage.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:06:01 +01:00
Hidde Beydals d9055f81b8
Add reconcile logic for individual Helm actions
This adds a `reconcile` package with the reconciliation and (status)
observation logic for individual Helm actions, but no glue to loop
through them till desired state.

All actions have individual `ActionReconciler` implementations which
construct their `action.Configuration` out of a factory, so the Helm
client can be shared between sub-reconcilers. They all present a
`ReconcilerType`, allowing an iterator to e.g. stop after running
every type just once.

The observation model can be explained as follows, but may lack some
minor details:

- The observed release has to match the release target of the
  HelmRelease object
- ActionReconcilers of type "release" move Current to Previous
  when they see a higher release revision. They then write the
  new release to Current, and continue to observe writes to
  revisions that match either version
- Remediation only updates Current
- Test updates Current and Current.TestHooks
- Unlock updates Current

After running the action, the reconcilers observe both the action
result and the state of the object. This allows them to distinguish
certain types of errors which are otherwise hard to detect.
For example, errors which do not cause drift to the Helm storage, or a
change of release version compared to Current for actions which do not
provide a version target flag.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:06:01 +01:00
Hidde Beydals dfebba2783
Add `ObservedRelease` and other release utils
This adds a `release` package which allows to create (minified)
`ObservedRelease` copy of a Helm release object. This
`ObservedRelease` contains sufficient data to detect changes
to the storage object made by Helm actions run manually, and a variety
of malicious changes (but not all, at present).

The data in an `ObservedRelease` can be filtered using a `DataFilter`,
this allows for example to filter out test hooks to prevent the
controller from taking action on a manually run `helm test`.

The consumer can combine the `ObservedRelease` with a Helm storage
observer to take snapshots of the release object as written to the
storage by a Helm action. To record this on a `HelmRelease` v2beta2 API
object, the `ObservedRelease` can be transformed into a
`HelmReleaseInfo` API object which can be recorded as either the
Current or Previous release in the status.

During the transformation, the digests of both the `ObservedRelease`
object and release config are calculated using the canonical algorithm.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:06:00 +01:00
Hidde Beydals 89a6f497e5
Run individual Helm actions using HelmRelease
This commit introduces an `action` package which allows the consumer to
run Helm actions using the instructions from a `HelmRelease` v2beta2
API object.

The actions do not determine if there is a desire be run, nor do they
record state on the object. This can however be injected by the caller
using the simplified observing Helm storage driver, which now iterates
over a list of callback functions after persisting an object instead
of keeping state.

This separation of concerns would allow e.g. the Flux CLI later on
to run actions (but with a dry-run flag or different storage
configuration) using the object in the same manner as the controller.

Some minor changes have been made to the `postrender` and `runner`
package to allow the code to co-exist while we are inbetween API
versions.

Signed-off-by: Hidde Beydals <hello@hidde.co>
2023-11-20 12:05:57 +01:00