Merge pull request #330 from kakabisht/feat-fine-tuning-errors
Adding info about fine tuning errors detection Issue #3879
This commit is contained in:
commit
8f4e03a7c7
|
|
@ -0,0 +1,162 @@
|
||||||
|
# **Fine Tuning Error Detection**
|
||||||
|
|
||||||
|
Fleet monitors the `status` field of deployed resources to determine whether a `Bundle` is healthy or in error. In certain cases, Fleet may interpret a condition in the status field as an error, even if it is expected or harmless.
|
||||||
|
|
||||||
|
You can adjust this behavior in two ways:
|
||||||
|
|
||||||
|
* Ignore conditions in `fleet.yaml`
|
||||||
|
* Customize error mappings with environment variables
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
:::note
|
||||||
|
You should rarely need to configure readiness detection in Fleet with environment variables. If you do, open an issue or submit a pull request to help improve the default readiness detection.
|
||||||
|
:::
|
||||||
|
|
||||||
|
## **Ignore conditions in `fleet.yaml`**
|
||||||
|
|
||||||
|
Use the `ignore.conditions` setting in the `fleet.yaml` file to tell Fleet to ignore specific conditions.
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# from https://fleet.rancher.io/ref-fleet-yaml
|
||||||
|
|
||||||
|
# Ignore fields when monitoring a Bundle. This can be used when Fleet thinks
|
||||||
|
# some conditions in Custom Resources makes the Bundle to be in an error state
|
||||||
|
# when it shouldn't.
|
||||||
|
ignore:
|
||||||
|
|
||||||
|
# Conditions to be ignored
|
||||||
|
conditions:
|
||||||
|
|
||||||
|
# In this example a condition will be ignored if it contains
|
||||||
|
# {"type": "Active", "status", "False"}
|
||||||
|
- type: Active
|
||||||
|
status: "False"
|
||||||
|
```
|
||||||
|
|
||||||
|
This method is useful when a custom resource or controller sets conditions that cause Fleet to mark a Bundle as failed, even though the resource is healthy.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## **Configure error mapping with environment variables**
|
||||||
|
|
||||||
|
In Fleet **v0.13**, error detection was enhanced to give you more control. You can use the environment variable `CATTLE_WRANGLER_CHECK_GVK_ERROR_MAPPING` to customize how resource conditions are interpreted.
|
||||||
|
|
||||||
|
This variable lets you define, by `Group`,`Version`,`Kind` (GVK), which condition values should be treated as errors or explicitly not treated as errors.
|
||||||
|
|
||||||
|
Set this variable in your Fleet Helm chart deployment (`values.yaml`) using `extraEnv`. The value must be JSON.
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# Extra environment variables passed to the fleet pods.
|
||||||
|
# extraEnv:
|
||||||
|
# - name: OCI_STORAGE
|
||||||
|
# value: "false"
|
||||||
|
```
|
||||||
|
:::note
|
||||||
|
This setting is global to all Fleet controllers and applies to every `GitRepo`. If you need to adjust error handling only for a specific Bundle, use the `ignoreConditions` option in `fleet.yaml` instead.
|
||||||
|
:::
|
||||||
|
|
||||||
|
### Merging behavior
|
||||||
|
|
||||||
|
When you override mappings with `CATTLE_WRANGLER_CHECK_GVK_ERROR_MAPPING`:
|
||||||
|
|
||||||
|
* New Conditions are merged with predefined conditions.
|
||||||
|
* Condition values are replaced for any condition you redefine.
|
||||||
|
|
||||||
|
For example, consider the Default mapping:
|
||||||
|
|
||||||
|
`HelmChart.Failed=["True"]`
|
||||||
|
|
||||||
|
This means `Failed=True` is treated as an error.
|
||||||
|
|
||||||
|
When you override with:
|
||||||
|
|
||||||
|
* `HelmChart.Failed=["False"]`
|
||||||
|
* `HelmChart.Ready=["False"]`
|
||||||
|
|
||||||
|
This results in
|
||||||
|
|
||||||
|
* `Failed=["False"]` replaces the default mapping. This means **`Failed=False`** is now treated as an error.
|
||||||
|
* `Ready=["False"]` is added, so **`Ready=False`** is also treated as an error.
|
||||||
|
* Other conditions unchanged.
|
||||||
|
|
||||||
|
### Disable error interpretation example
|
||||||
|
|
||||||
|
Assume that every value of `Failed` was previously interpreted as an error, for example:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{ "type": "Failed", "status": ["True", "False"] }
|
||||||
|
```
|
||||||
|
You can narrow this mapping to treat only `Failed=True` as an error by setting:
|
||||||
|
|
||||||
|
```json
|
||||||
|
[
|
||||||
|
{
|
||||||
|
"gvk": "sample.cattle.io/v1, Kind=Sample",
|
||||||
|
"conditionMappings": [
|
||||||
|
{ "type": "Failed", "status": ["True"] }
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
This configuration means only **`Failed=True`** is treated as an error. `Failed=False` is no longer considered an error.
|
||||||
|
|
||||||
|
You can also disable errors for any value of `Failed` by
|
||||||
|
|
||||||
|
```json
|
||||||
|
{ "type": "Failed", "status": [""] }
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
This configuration ensures that **no value of `Failed`** is treated as an error.
|
||||||
|
|
||||||
|
:::note
|
||||||
|
Overriding conditions only affects the default error mappings (refer to [Default error mappings](#default-error-mappings)). Fleet may still mark a resource as an error because other checks, such as those from the `kstatus` library, continue to run after your customization.
|
||||||
|
:::
|
||||||
|
|
||||||
|
### Enable error interpretation example
|
||||||
|
|
||||||
|
```json
|
||||||
|
[
|
||||||
|
{
|
||||||
|
"gvk": "sample.cattle.io/v1, Kind=Sample",
|
||||||
|
"conditionMappings": [
|
||||||
|
{ "type": "Failed", "status": ["True"] }
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
Here, `Failed=True` is treated as an error.
|
||||||
|
|
||||||
|
## Default error mappings {#default-error-mappings}
|
||||||
|
|
||||||
|
Fleet adds default error mappings to interpret certain resource conditions in the `status` field as errors. These mappings are applied besides to other readiness checks, such as those performed by the Kubernetes `kstatus` library.
|
||||||
|
|
||||||
|
The following default mappings apply:
|
||||||
|
|
||||||
|
* **HelmChart** (`helm.cattle.io/v1`)
|
||||||
|
* `JobCreated`: Neither `True` nor `False` is considered an error.
|
||||||
|
* `Failed`: `True` is considered an error.
|
||||||
|
* **Node** (`v1`)
|
||||||
|
* `OutOfDisk`: `True` is considered an error.
|
||||||
|
* `MemoryPressure`: `True` is considered an error.
|
||||||
|
* `DiskPressure`: `True` is considered an error.
|
||||||
|
* `NetworkUnavailable`: `True` is considered an error.
|
||||||
|
* **Deployment** (`apps/v1`)
|
||||||
|
* `ReplicaFailure`: `True` is considered an error.
|
||||||
|
* `Progressing`: `False` is considered an error.
|
||||||
|
* **ReplicaSet** (`apps/v1`)
|
||||||
|
* `ReplicaFailure`: `True` is considered an error.
|
||||||
|
|
||||||
|
### Fallback mapping
|
||||||
|
|
||||||
|
If a resource does not match the listed GVKs, Fleet applies a fallback mapping:
|
||||||
|
|
||||||
|
* Any `Group` and `Version` with any kind
|
||||||
|
|
||||||
|
* `Stalled`: `True` is considered an error.
|
||||||
|
* `Failed`: `True` is considered an error.
|
||||||
|
|
||||||
|
|
@ -52,7 +52,8 @@ module.exports = {
|
||||||
{type:'doc', id:'bundle-add'},
|
{type:'doc', id:'bundle-add'},
|
||||||
{type:'doc', id:'observability'},
|
{type:'doc', id:'observability'},
|
||||||
{type:'doc', id:'helm-ops'},
|
{type:'doc', id:'helm-ops'},
|
||||||
{type:'doc', id:'rollout'}
|
{type:'doc', id:'rollout'},
|
||||||
|
{type:'doc', id:'fine-tune-error'}
|
||||||
],
|
],
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|
|
||||||
File diff suppressed because one or more lines are too long
|
After Width: | Height: | Size: 24 KiB |
File diff suppressed because one or more lines are too long
|
After Width: | Height: | Size: 15 KiB |
Loading…
Reference in New Issue