There are a few documented scenarios where `kube-state-metrics` will
lock up (#995, #1028). I believe a much simpler way to ensure that
`kube-state-metrics` doesn't lock up and require a restart to serve
`/metrics` requests is to add default read and write timeouts and to
make them configurable. At Grafana, we've experienced a few
scenarios where `kube-state-metrics` running in larger clusters falls
behind and starts getting scraped multiple times. When this occurs,
`kube-state-metrics` becomes completely unresponsive and requires a
restart. This is fairly easy to reproduce (I'll provide a script in
an issue) and causes other critical workloads (KEDA, VPA) to fail in
unexpected ways.
Adds two flags:
- `server-read-timeout`
- `server-write-timeout`
Updates the metrics http server to set the `ReadTimeout` and
`WriteTimeout` to the configured values.
This reuses code from prometheus/alertmanager, licensed under Apache-2.0:
https://github.com/prometheus/alertmanager/blob/main/config/coordinator.go#LL56C26-L56C26
kube_state_metrics_config_hash{type="config", filename="config.yml"} 4.0061079457904e+13
kube_state_metrics_config_last_reload_success_timestamp_seconds{type="config", filename="config.yml"} 1.6697483049487052e+09
kube_state_metrics_config_last_reload_successful{type="config", filename="config.yml"} 1
Signed-off-by: Manuel Rüger <manuel@rueg.eu>
s/pflags/cobra/g:
* Use spf13/cobra to handle all flags and sub-commands.
* Remove all spf13/pflag usage, and fall back to the built-in flag
package if and when needed.
* Add completion support.
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
Support filtering the label allowlist by "*", which expands to the
enabled resources, inferring their values from its value(s).
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
... to only monitor all known custom-resource configurations instead of
listing each of them explicitly
Signed-off-by: Mario Constanti <mario@constanti.de>
Remediate:
G104: Errors unhandled.
G109: Potential Integer overflow made by strconv.Atoi result conversion to int16/32
G112: Potential Slowloris Attack because ReadHeaderTimeout is not configured in the http.Server
G304: Potential file inclusion via variable
G601: Implicit memory aliasing in for loop.
Signed-off-by: Manuel Rüger <manuel@rueg.eu>
This is more flexible than the env variable, as a configuration can still set an env variable and use substitution in the args, e.g.:
```yaml
args:
- --custom-resource-state.config
- $(KSM_CUSTOM_RESOURCE_STATE_CONFIG)
env:
...
```
This test is added to ensure that ksm can be invoked as follows and that
this is a "documented" feature.
```
<ksm> --metric-denylist="
^kube_.+_created$,
^kube_.+_metadata_resource_version$,
^kube_pod_completion_time$,
^kube_pod_status_scheduled$
"
```
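
For a value like the one above to work, the list parser has to tolerate
newlines and surrounding whitespace around each comma-separated entry.
The following is a minimal sketch of such parsing, not the project's
actual implementation; `parseMetricList` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMetricList (hypothetical helper) splits a comma-separated flag value,
// trims whitespace and newlines around each entry, and drops empty entries.
func parseMetricList(value string) []string {
	var out []string
	for _, item := range strings.Split(value, ",") {
		item = strings.TrimSpace(item)
		if item != "" {
			out = append(out, item)
		}
	}
	return out
}

func main() {
	value := `
^kube_.+_created$,
^kube_.+_metadata_resource_version$,
^kube_pod_completion_time$,
^kube_pod_status_scheduled$
`
	for _, pattern := range parseMetricList(value) {
		fmt.Printf("%q\n", pattern)
	}
}
```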
See the usage in the kube-prometheus jsonnet for an example [1].
[1]: 9cf6111562/jsonnet/kube-prometheus/addons/ksm-lite.libsonnet (L18)
Signed-off-by: Sunil Thaha <sthaha@redhat.com>