Operator guide (#250)

* always use npm to manage packages

Signed-off-by: 守辰 <shouchen.zz@alibaba-inc.com>

* add operator manual

Signed-off-by: 守辰 <shouchen.zz@alibaba-inc.com>

* improve and translate rollout describe and pause description

Signed-off-by: 守辰 <shouchen.zz@alibaba-inc.com>

---------

Signed-off-by: 守辰 <shouchen.zz@alibaba-inc.com>
This commit is contained in:
Zhen Zhang 2025-07-07 10:31:40 +08:00 committed by GitHub
parent e6947b649f
commit dbb5d412b8
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
24 changed files with 1309 additions and 51 deletions


@ -923,5 +923,19 @@
"zhangsean",
"zhaomingshan",
"zxvf",
"the",
"DoNotSchedule",
"Entrypoint",
"ScheduleAnyway",
"matchLabelKeys",
"maxSkew",
"topologyKey",
"topologySpreadConstraints",
"workqueue",
"whenUnsatisfiable",
"oom",
"pprof-addr",
"enable-pprof",
"jsonpath",
"ip"
]


@ -327,21 +327,3 @@ To uninstall kruise if it is installed with helm charts:
$ helm uninstall kruise
release "kruise" uninstalled
```
## Kruise State Metrics
[kruise-state-metrics](https://github.com/openkruise/kruise-state-metrics) is a simple service that listens to the
Kubernetes API server and generates metrics about the state of the objects.
It is not focused on the health of the individual OpenKruise components, but rather on the state of the various objects they manage, such as CloneSets, Advanced StatefulSets, and SidecarSets.
```bash
# First, add the openkruise charts repository if you haven't done so.
$ helm repo add openkruise https://openkruise.github.io/charts/
# [Optional]
$ helm repo update
# Install the latest version.
$ helm install kruise openkruise/kruise-state-metrics --version 0.1.0
```


@ -0,0 +1,61 @@
---
title: High availability
---
# Overview
The Kruise control plane, kruise-manager, has two components: the webhook and the controller. The webhook is replicated, and traffic is balanced across the replicas through a ClusterIP Service. The controller follows a leader-follower architecture in which the active instance is chosen by leader election. In the default installation the webhook is colocated with the controller, but the two can also be deployed independently.
# How to recover from application failures
The entrypoint of the kruise-manager container is the kruise-manager process itself, so any panic or OOM restarts the container. The readiness probe checks whether the webhook credentials are ready and whether the webhook server is reachable, so if a webhook server stops responding, traffic is routed away from that instance. The controller component periodically renews the leader-election lease; if the leader panics or hangs, another healthy kruise-manager instance can become the new leader and continue reconciliation.
# How to recover from node/zone failure
Prior to v1.8, kruise-manager instances are scheduled onto different nodes, so kruise-manager tolerates a single node failure. Since v1.8, the instances are spread across availability zones, so kruise-manager also tolerates the failure of a single availability zone. Note that topology labels may be missing on some bare-metal nodes in on-prem environments; add the topology key to such nodes if you want zone-failure tolerance. The topology spread constraint is not strictly enforced by default; change `whenUnsatisfiable` to `DoNotSchedule` if strong spreading is required.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    control-plane: controller-manager
  name: kruise-controller-manager
spec:
  template:
    spec:
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            control-plane: controller-manager
        matchLabelKeys:
        - pod-template-hash
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway # change to DoNotSchedule if strong spread is required.
...
```
# How to avoid cyclic dependency problem
kruise-manager contains a webhook for pods, so a failure in kruise-manager can block pod creation. To avoid this cyclic dependency, kruise-manager itself uses the host network and therefore does not rely on any network component. In addition, the kruise-manager webhook skips pod admission in the kube-system namespace, the default installation namespace of common system components, and in the namespace where kruise-manager itself is deployed. Note that without the webhook, some OpenKruise features, e.g. SidecarSet and WorkloadSpread, will not work correctly.
# How to avoid failure during component update
To ensure maximum availability during a component rollout, kruise-manager always tries to create 100% of the new replicas first and waits for them to become ready before deleting the old ones. Note that this strategy may be problematic if cluster resources cannot accommodate the surge; in that case, adjust the deployment strategy and reduce `maxSurge`.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    control-plane: controller-manager
  name: kruise-controller-manager
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 100% # reduce maxSurge if the resource for deployment is not adequate
...
```


@ -0,0 +1,123 @@
---
title: Troubleshooting
---
## Investigate log
OpenKruise uses klog for logging. The kruise chart ships with a default log level of 4, which is verbose enough for most day-to-day troubleshooting; raise it to 5 or above for full debug information.
From OpenKruise 1.7, structured logging is supported, with native (key, value) pairs and object references. Logs can also be emitted in JSON format via `helm install ... --set manager.loggingFormat=json`.
For example, this invocation of InfoS:
```
klog.V(3).InfoS("SidecarSet updated status success", "sidecarSet", klog.KObj(sidecarSet), "matchedPods", status.MatchedPods,
"updatedPods", status.UpdatedPods, "readyPods", status.ReadyPods, "updateReadyPods", status.UpdatedReadyPods)
```
will result in this log:
```
I0821 14:22:35.587919 1 sidecarset_processor.go:280] "SidecarSet updated status success" sidecarSet="test-sidecarset" matchedPods=1 updatedPods=1 readyPods=1 updateReadyPods=1
```
Or, if `helm install ... --set manager.loggingFormat=json`, it will result in this output:
```json
{
  "ts": 1724239224606.642,
  "caller": "sidecarset/sidecarset_processor.go:280",
  "msg": "SidecarSet updated status success",
  "v": 3,
  "sidecarSet": {
    "name": "test-sidecarset"
  },
  "matchedPods": 1,
  "updatedPods": 1,
  "readyPods": 0,
  "updateReadyPods": 0
}
```
## Investigate metrics
### Built-in metrics
OpenKruise exposes metrics in Prometheus format, crucial for monitoring the health and performance of its controllers and managed workloads. By default, the OpenKruise controller manager (`kruise-manager`) exposes metrics on port `8080` at the `/metrics` endpoint. This is typically enabled during installation (e.g., via Helm).
To verify:
1. **Port-forward to the `kruise-controller-manager` service:**
(The service port for metrics is often named `metrics` or is the main service port, e.g., 8080)
```bash
# kubectl get svc -n kruise-system kruise-controller-manager -o jsonpath='{.spec.ports[?(@.name=="metrics")].port}' # to find port
kubectl port-forward -n kruise-system svc/kruise-controller-manager 8080:8080
```
2. **Query the metrics endpoint:**
```bash
curl localhost:8080/metrics
```
#### Key Metrics Categories
1. Controller Runtime Metrics
Standard metrics from the `controller-runtime` library offer insight into individual controller and webhook performance. Key metrics to watch:
| Metric Name | Type | Description | Labels |
|----------------------------------------------|-----------|---------------------------------------------|------------------|
| `controller_runtime_reconcile_total` | Counter | Total reconciliations per controller. | `controller` |
| `controller_runtime_reconcile_errors_total` | Counter | Total reconciliation errors per controller. | `controller` |
| `controller_runtime_reconcile_time_seconds` | Histogram | Reconciliation duration per controller. | `controller` |
| `workqueue_depth` | Gauge | Current workqueue depth per controller. | `name` |
| `controller_runtime_webhook_requests_total`  | Counter   | Total webhook requests.                     | `code`,`webhook` |
| `controller_runtime_webhook_latency_seconds` | Histogram | Webhook request latency.                    | `webhook` |
**Use to:** Identify overloaded controllers (high `workqueue_depth`, long `controller_runtime_webhook_latency_seconds`) or persistent issues (a rising `controller_runtime_reconcile_errors_total`).
2. OpenKruise Specific Metrics
Custom metrics for OpenKruise features.
| Metric Name | Type | Description | Labels |
|--------------------------------------|---------|--------------------------------------------------------------------|-----------------------------------|
| `kruise_manager_is_leader`           | Gauge   | Whether this kruise-manager instance is the leader.                 |                                   |
| `pod_unavailable_budget`             | Counter | Number of PodUnavailableBudget protections against pod disruption.  | `kind_namespace_name`, `username` |
| `cloneset_scale_expectation_leakage` | Counter | Number of CloneSet scale-expectation timeouts.                      | `namespace`,`name` |
| `namespace_deletion_protection`      | Counter | Number of namespace deletion protections.                           | `name`, `username` |
| `crd_deletion_protection`            | Counter | Number of CustomResourceDefinition deletion protections.            | `name`, `username` |
| `workload_deletion_protection`       | Counter | Number of workload deletion protections.                            | `kind_namespace_name`, `username` |
**Use to:** Identify control-plane problems and gauge feature activity (e.g., PUB protections).
3. Go Runtime and Process Metrics
Standard Go and process metrics for advanced debugging of `kruise-controller-manager` resource usage.
### Kruise State Metrics
[kruise-state-metrics](https://github.com/openkruise/kruise-state-metrics) is a simple service that listens to the
Kubernetes API server and generates metrics about the state of the objects.
It is not focused on the health of the individual OpenKruise components, but rather on the state of the various objects they manage, such as CloneSets, Advanced StatefulSets, and SidecarSets.
```bash
# First, add the openkruise charts repository if you haven't done so.
$ helm repo add openkruise https://openkruise.github.io/charts/
# [Optional]
$ helm repo update
# Install the latest version.
$ helm install kruise openkruise/kruise-state-metrics --version 0.2.2
```
## Investigate performance problem
### How to enable pprof
kruise-manager enables pprof by default, while kruise-daemon disables it by default for security reasons. You can disable pprof or change its listen address through the component's command-line arguments:
| component | pprof enable argument | pprof address argument |
| -------- | ------- | ------- |
| kruise-manager | --enable-pprof=true (true by default) | --pprof-addr="host:port" (":8090" by default) |
| kruise-daemon | --enable-pprof=false (false by default) | --pprof-addr="host:port" (":10222" by default) |


@ -110,6 +110,14 @@ API 规范的主要部分包括 3 部分,您应该注意:
## API 详细信息
### Rollout Spec 字段
| Field | Type | Default | Description |
|-------------------|---------|---------|--------------------------------------|
| `disabled`        | boolean | false   | 当设置为 `true` 时,释放 Rollout 对工作负载的托管,并恢复流量路由 |
| `workloadRef` | Object | | 工作负载绑定 API |
| `strategy` | Object | | 发布策略配置 (金丝雀或者蓝绿) |
### 工作负载绑定 API(必填)
告诉 Kruise Rollout 应该绑定哪个工作负载:
@ -291,7 +299,18 @@ spec:
### 策略 API(必填)
| Field | Type | Default | Description |
|--------------|---------|---------|---------------------------|
| `paused`     | boolean | false   | 当为 `true` 时,暂停 Rollout 对发布的处理 |
| `canary` | Object | nil | 金丝雀发布策略配置 |
| `blueGreen` | Object | nil | 蓝绿发布策略配置 (需要 v0.6.0及以上) |
**注意:`Disabled` 和 `Paused` 的区别**
- **Disabled**: 释放 Rollout 对工作负载的托管,并将流量全部路由到稳定版本的实例;控制器会忽略该 Rollout 对象,效果相当于删除了 Rollout 对象。
- **Paused**: 保持 Rollout 对工作负载的托管,但不再推进发布进展。
canary 用于金丝雀发布和多批次发布,blueGreen 用于蓝绿发布;二者是互斥选项,不能同时为空,也不能都非空。blueGreen 选项是在 Kruise-Rollout v0.6.0 版本引入的,且不支持 v1alpha1 API。
#### canary


@ -170,6 +170,49 @@ func IsRolloutCurrentStepReady(workload appsv1.Deployment, rollout *rolloutsv1be
}
```
## 如何查看新部署的Pod
你可以使用`kubectl kruise describe rollout`命令来查看新部署的 Pod。请注意该命令会显示 rollout 的简要信息,并列出与当前步骤相关的已部署 Pod。
```bash
$ kubectl kruise describe rollout rollouts-demo -n default
```
**示例输出:**
```
Name: rollouts-demo
Namespace: default
Status: ⚠ Progressing
Message: Rollout is in step(1/3), and you need manually confirm to enter the next step
Strategy: Canary
Step: 1/3
Steps:
- Replicas: 1
State: StepPaused
- Replicas: 2
- Replicas: 3
Images: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
Current Revision: 5555d6dcc8
Update Revision: 579589c5cd
Replicas:
Desired: 10
Updated: 1
Current: 10
Ready: 10
Available: 10
NAME READY BATCH ID REVISION AGE RESTARTS STATUS
nginx-deployment-basic-579589c5cd-rx5nm 1/1 1 579589c5cd 22s 0 ✔ Running
```
或者,你也可以通过以下 Pod 标签直接过滤出相关 Pod
1. `rollouts.kruise.io/rollout-id`:用于标识不同的 rollout 发布过程。该标签的值来源于工作负载上的同名标签;如果工作负载上没有 `rollouts.kruise.io/rollout-id` 标签,Kruise Rollout 将会使用 revision(修订版本)生成一个。
2. `rollouts.kruise.io/rollout-batch-id`:用于标识不同的发布批次,其值是一个从 1 开始递增的数字。
你可以使用如下命令直接过滤 Pod:
```bash
$ kubectl get pods -l rollouts.kruise.io/rollout-id=579589c5cd,rollouts.kruise.io/rollout-batch-id=1
NAME READY STATUS RESTARTS AGE
nginx-deployment-basic-579589c5cd-rx5nm 1/1 Running 0 18m
```
## 如何回滚
事实上Kruise Rollout **不提供** 直接回滚的功能。**Kruise Rollout
@ -192,3 +235,34 @@ kruise-tools 是 OpenKruise 的 kubectl 插件,它为 kruise 功能提供了
将从开始步骤(第一步)重新开始进展。
- **HPA兼容**假设您为工作负载配置了水平Pod自动伸缩HPA并使用多批次升级策略我们建议使用“百分比”来指定“steps[x]
.replicas”。如果在升级进行过程中扩展或缩小副本数量旧版本和新版本的副本将根据“百分比”配置进行伸缩以确保伸缩与升级进展保持一致。
## 可选操作
### 暂停Rollout处理
在 Rollout 发布过程中,可以暂停 Rollout 的处理,这对于手动检查或故障排除很有用。控制器会继续监视资源,但直到 Rollout 被取消暂停,才会开始处理下一个步骤。
要暂停 Rollout 的处理,请将 `spec.strategy.paused` 字段 patch 为 `true`:
```bash
# Pause the current rollout
kubectl patch rollout rollouts-demo --type merge -p '{"spec":{"strategy":{"paused":true}}}'
# To resume, set the field back to false
kubectl patch rollout rollouts-demo --type merge -p '{"spec":{"strategy":{"paused":false}}}'
```
### 禁用 Rollout 处理
在 Rollout 发布完成后,一般而言,您不需要删除或禁用 Rollout:Rollout 只会在发布过程中进行处理。然而,如果您想确保 Rollout 不再处理,或者不再想使用渐进式发布,可以使用 `spec.disabled` 字段来禁用 Rollout。相对于直接删除 Rollout 对象,禁用 Rollout 更便于问题排查,并且允许您更快速地重新启用渐进式发布。
要禁用 Rollout 的处理,请将 `spec.disabled` 字段 patch 为 `true`:
```bash
# Disable the rollout after it has finished
kubectl patch rollout rollouts-demo --type merge -p '{"spec":{"disabled":true}}'
# To re-enable, set the field back to false
kubectl patch rollout rollouts-demo --type merge -p '{"spec":{"disabled":false}}'
```


@ -38,5 +38,8 @@
  },
  "sidebar.docs.category.Developer Manuals": {
    "message": "开发者手册"
  },
  "sidebar.docs.category.Operator Manuals": {
    "message": "运维手册"
  }
}


@ -0,0 +1,68 @@
---
title: 高可用性
---
# 概述
Kruise 控制平面组件 kruise-manager 包含两个部分webhook 和 controller。Webhook 是多副本的,并通过 Cluster IP Service 在多个副本之间进行流量负载均衡。Controller 则采用主从架构leader election其中主实例由选举产生。在默认安装中webhook 和 controller 是部署在一起的。当然,也可以将 webhook 和 controller 独立部署。
# 异常恢复
kruise-manager 容器的入口是 kruise-manager 进程,因此任何 panic 或 OOM 都会导致容器重启。kruise-manager 的 readiness 探针会检测 webhook 的证书是否就绪、webhook server 是否可达。如果某个 webhook server 不响应,流量将被路由到其他正常的 kruise-manager 实例。此外kruise-manager 的 controller 组件会定期更新 leader 选举的租约信息,如果主 controller 发生 panic 或卡住,其他健康的 kruise-manager 实例可以成为新的 leader 并继续执行协调操作。
# 节点/区域故障恢复
在 v1.8 之前kruise-manager 实例会被调度到不同的节点上,以容忍单节点故障。从 v1.8 开始kruise-manager 实例会被调度到不同的可用区中,从而能够容忍单个可用区的故障。需要注意的是,在某些本地部署环境中的裸金属节点可能没有拓扑信息,请为这些节点手动添加拓扑标签,以便避免因可用区故障导致问题。此外,拓扑打散配置不是必须的,如果您希望强制进行 Pod打散可以将`whenUnsatisfiable` 设置为 `DoNotSchedule`
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    control-plane: controller-manager
  name: kruise-controller-manager
spec:
  template:
    spec:
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            control-plane: controller-manager
        matchLabelKeys:
        - pod-template-hash
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway # 如果需要强打散,请改为 DoNotSchedule
...
```
# 避免循环依赖问题
kruise-manager 包含一个针对 Pod 的 webhook如果 kruise-manager 出现故障,可能会阻塞 Pod 的创建。为了避免这种循环依赖问题kruise-manager 自身使用主机网络host network这样它就不依赖任何网络组件。此外kruise-manager 的 webhook 会跳过 kube-system 命名空间下的 Pod 准入控制,该命名空间通常是系统组件的默认安装命名空间,也是 kruise-manager 自身部署的命名空间。请注意,如果没有 webhook 功能OpenKruise 的一些功能(如 SidecarSet 和 WorkloadSpread将无法正常工作。
# 避免组件升级过程中的故障
为了在组件滚动更新期间确保最大可用性kruise-manager 总是会先启动 100% 的新 Pod 副本,并等待这些副本就绪后才删除旧副本。需要注意的是,如果集群资源不足,这种策略可能会导致问题。在这种情况下,您可以调整部署策略并减少 maxSurge 的值。
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    control-plane: controller-manager
  name: kruise-controller-manager
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 100% # 如果资源不足,请适当减小 maxSurge 的值
...
```


@ -0,0 +1,119 @@
---
title: 问题排查
---
## 日志分析
OpenKruise 使用 klog 进行日志记录。默认情况下OpenKruise 的 Helm Chart 设置的日志级别为 4这已经足够用于日常的故障排查。如果需要更详细的调试信息可以将日志级别提升至 5 或更高。
从 OpenKruise 1.7 版本开始我们支持结构化日志structured logs原生支持 (key, value) 键值对和对象引用。你还可以通过 Helm 安装时设置 --set manager.loggingFormat=json 来以 JSON 格式输出日志。
例如以下 InfoS 调用:
```
klog.V(3).InfoS("SidecarSet updated status success", "sidecarSet", klog.KObj(sidecarSet), "matchedPods", status.MatchedPods,
"updatedPods", status.UpdatedPods, "readyPods", status.ReadyPods, "updateReadyPods", status.UpdatedReadyPods)
```
将会输出如下日志:
```
I0821 14:22:35.587919 1 sidecarset_processor.go:280] "SidecarSet updated status success" sidecarSet="test-sidecarset" matchedPods=1 updatedPods=1 readyPods=1 updateReadyPods=1
```
如果你使用了 helm install ... --set manager.loggingFormat=json则输出如下 JSON 格式日志:
```json
{
  "ts": 1724239224606.642,
  "caller": "sidecarset/sidecarset_processor.go:280",
  "msg": "SidecarSet updated status success",
  "v": 3,
  "sidecarSet": {
    "name": "test-sidecarset"
  },
  "matchedPods": 1,
  "updatedPods": 1,
  "readyPods": 0,
  "updateReadyPods": 0
}
```
## 指标分析
### 内置指标
OpenKruise 以 Prometheus 格式暴露指标这对于监控控制器和受管工作负载的健康状况和性能至关重要。默认情况下OpenKruise 控制器管理器 (kruise-manager) 在端口 8080 上的 /metrics 端点暴露这些指标。通常在安装时就已经启用(例如通过 Helm
验证指标透出:
1. **将本地端口转发到 kruise-controller-manager 服务:**
```bash
# kubectl get svc -n kruise-system kruise-controller-manager -o jsonpath='{.spec.ports[?(@.name=="metrics")].port}' # to find port
kubectl port-forward -n kruise-system svc/kruise-controller-manager 8080:8080
```
2. **访问指标端点**
```bash
curl localhost:8080/metrics
```
#### 关键指标分类
1. Controller Runtime 指标
来自 controller-runtime 库的标准指标,反映各控制器的运行情况,这里列出了其中最关键的指标:
| Metric Name | Type | Description | Labels |
|----------------------------------------------|-----------|-----------------------|------------------|
| `controller_runtime_reconcile_total` | Counter | 每个控制器的 reconcile 总次数 | `controller` |
| `controller_runtime_reconcile_errors_total` | Counter | 每个控制器的 reconcile 错误总数 | `controller` |
| `controller_runtime_reconcile_time_seconds` | Histogram | 每个控制器的 reconcile 耗时 | `controller` |
| `workqueue_depth` | Gauge | 每个控制器当前的工作队列长度 | `name` |
| `controller_runtime_webhook_requests_total` | Counter | 各个webhook的请求总数 | `webhook`,`code` |
| `controller_runtime_webhook_latency_seconds` | Histogram | 各个webhook的请求延时 | `webhook` |
**用途:** 可用于识别过载的控制器(如高 `workqueue_depth`、长 `reconcile_time_seconds`)或持续出错的控制器(如`reconcile_errors_total` 持续上升)。
2. OpenKruise 特有指标
| Metric Name | Type | Description | Labels |
|--------------------------------------|---------|----------------------------------|-----------------------------------|
| `kruise_manager_is_leader` | Gauge | 表示当前 kruise-manager 是否为 leader | |
| `pod_unavailable_budget` | Counter | Pod 不可用预算保护机制防止 Pod 扰动的数量 | `kind_namespace_name`, `username` |
| `cloneset_scale_expectation_leakage` | Counter | CloneSet Scale Expectation 超时的次数 | `namespace`,`name` |
| `namespace_deletion_protection` | Counter | 命名空间删除保护数量 | `name`, `username` |
| `crd_deletion_protection` | Counter | 自定义资源定义CRD删除保护数量 | `name`, `username` |
| `workload_deletion_protection` | Counter | 工作负载删除保护数量 | `kind_namespace_name`, `username` |
**用途:** 用于识别控制平面问题,以及各特性功能的性能问题(如 PUB 保护机制)。
3. Go 运行时与进程指标
用于高级调试 kruise-controller-manager 资源使用情况的标准 Go 和进程指标。
### Kruise State Metrics
[kruise-state-metrics](https://github.com/openkruise/kruise-state-metrics) 是一个监听 Kubernetes API Server 并生成各类资源状态指标的服务。它不关注 OpenKruise 各组件本身的健康状态,而是关注内部资源(如 CloneSet、Advanced StatefulSet、SidecarSet 等)的状态。
```bash
# First, add the openkruise charts repository if you haven't done so.
$ helm repo add openkruise https://openkruise.github.io/charts/
# [Optional]
$ helm repo update
# Install the latest version.
$ helm install kruise openkruise/kruise-state-metrics --version 0.2.2
```
## 性能问题排查
### 如何启用 pprof
默认情况下,kruise-manager 启用了 pprof;而出于安全考虑,kruise-daemon 默认禁用 pprof。你可以通过组件启动参数来禁用 pprof 或更改其监听地址:
| component | pprof enable argument | pprof address argument |
| -------- | ------- | ------- |
| kruise-manager | --enable-pprof=true (true by default) | --pprof-addr="host:port" (":8090" by default) |
| kruise-daemon | --enable-pprof=false (false by default) | --pprof-addr="host:port" (":10222" by default) |

View File

@ -38,5 +38,8 @@
},
"sidebar.docs.category.Developer Manuals": {
"message": "开发者手册"
},
"sidebar.docs.category.Operator Manuals": {
"message": "运维手册"
}
}

View File

@ -0,0 +1,68 @@
---
title: 高可用性
---
# 概述
Kruise 控制平面组件 kruise-manager 包含两个部分webhook 和 controller。Webhook 是多副本的,并通过 Cluster IP Service 在多个副本之间进行流量负载均衡。Controller 则采用主从架构leader election其中主实例由选举产生。在默认安装中webhook 和 controller 是部署在一起的。当然,也可以将 webhook 和 controller 独立部署。
# 异常恢复
kruise-manager 容器的入口是 kruise-manager 进程,因此任何 panic 或 OOM 都会导致容器重启。kruise-manager 的 readiness 探针会检测 webhook 的证书是否就绪、webhook server 是否可达。如果某个 webhook server 不响应,流量将被路由到其他正常的 kruise-manager 实例。此外kruise-manager 的 controller 组件会定期更新 leader 选举的租约信息,如果主 controller 发生 panic 或卡住,其他健康的 kruise-manager 实例可以成为新的 leader 并继续执行协调操作。
# 节点/区域故障恢复
在 v1.8 之前kruise-manager 实例会被调度到不同的节点上,以容忍单节点故障。从 v1.8 开始kruise-manager 实例会被调度到不同的可用区中,从而能够容忍单个可用区的故障。需要注意的是,在某些本地部署环境中的裸金属节点可能没有拓扑信息,请为这些节点手动添加拓扑标签,以便避免因可用区故障导致问题。此外,拓扑打散配置不是必须的,如果您希望强制进行 Pod打散可以将`whenUnsatisfiable` 设置为 `DoNotSchedule`
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
control-plane: controller-manager
name: kruise-controller-manager
spec:
template:
spec:
topologySpreadConstraints:
- labelSelector:
matchLabels:
control-plane: controller-manager
matchLabelKeys:
- pod-template-hash
maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway # 如果需要强打散,请改为 DoNotSchedule
...
```
# 避免循环依赖问题
kruise-manager 包含一个针对 Pod 的 webhook如果 kruise-manager 出现故障,可能会阻塞 Pod 的创建。为了避免这种循环依赖问题kruise-manager 自身使用主机网络host network这样它就不依赖任何网络组件。此外kruise-manager 的 webhook 会跳过 kube-system 命名空间下的 Pod 准入控制,该命名空间通常是系统组件的默认安装命名空间,也是 kruise-manager 自身部署的命名空间。请注意,如果没有 webhook 功能OpenKruise 的一些功能(如 SidecarSet 和 WorkloadSpread将无法正常工作。
# 避免组件升级过程中的故障
为了在组件滚动更新期间确保最大可用性kruise-manager 总是会先启动 100% 的新 Pod 副本,并等待这些副本就绪后才删除旧副本。需要注意的是,如果集群资源不足,这种策略可能会导致问题。在这种情况下,您可以调整部署策略并减少 maxSurge 的值。
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
control-plane: controller-manager
name: kruise-controller-manager
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 100% # 如果资源不足,请适当减小 maxSurge 的值
...
```

View File

@ -0,0 +1,119 @@
---
title: 问题排查
---
## 日志分析
OpenKruise 使用 klog 进行日志记录。默认情况下OpenKruise 的 Helm Chart 设置的日志级别为 4这已经足够用于日常的故障排查。如果需要更详细的调试信息可以将日志级别提升至 5 或更高。
从 OpenKruise 1.7 版本开始我们支持结构化日志structured logs原生支持 (key, value) 键值对和对象引用。你还可以通过 Helm 安装时设置 --set manager.loggingFormat=json 来以 JSON 格式输出日志。
例如以下 InfoS 调用:
```
klog.V(3).InfoS("SidecarSet updated status success", "sidecarSet", klog.KObj(sidecarSet), "matchedPods", status.MatchedPods,
"updatedPods", status.UpdatedPods, "readyPods", status.ReadyPods, "updateReadyPods", status.UpdatedReadyPods)
```
将会输出如下日志:
```
I0821 14:22:35.587919 1 sidecarset_processor.go:280] "SidecarSet updated status success" sidecarSet="test-sidecarset" matchedPods=1 updatedPods=1 readyPods=1 updateReadyPods=1
```
如果你使用了 helm install ... --set manager.loggingFormat=json则输出如下 JSON 格式日志:
```json
{
"ts": 1724239224606.642,
"caller": "sidecarset/sidecarset_processor.go:280",
"msg": "SidecarSet updated status success",
"v": 3,
"sidecarSet": {
"name": "test-sidecarset"
},
"matchedPods": 1,
"updatedPods": 1,
"readyPods": 0,
"updateReadyPods": 0
}
```
## 指标分析
### 内置指标
OpenKruise 以 Prometheus 格式暴露指标这对于监控控制器和受管工作负载的健康状况和性能至关重要。默认情况下OpenKruise 控制器管理器 (kruise-manager) 在端口 8080 上的 /metrics 端点暴露这些指标。通常在安装时就已经启用(例如通过 Helm
验证指标透出:
1. **将本地端口转发到 kruise-controller-manager 服务:**
```bash
# kubectl get svc -n kruise-system kruise-controller-manager -o jsonpath='{.spec.ports[?(@.name=="metrics")].port}' # to find port
kubectl port-forward -n kruise-system svc/kruise-controller-manager 8080:8080
```
2. **访问指标端点**
```bash
curl localhost:8080/metrics
```
#### 关键指标分类
1. Controller Runtime 指标
来自 controller-runtime 库的标准指标,反映各控制器的运行情况,这里列出了其中最关键的指标:
| Metric Name | Type | Description | Labels |
|----------------------------------------------|-----------|-----------------------|------------------|
| `controller_runtime_reconcile_total` | Counter | 每个控制器的 reconcile 总次数 | `controller` |
| `controller_runtime_reconcile_errors_total` | Counter | 每个控制器的 reconcile 错误总数 | `controller` |
| `controller_runtime_reconcile_time_seconds` | Histogram | 每个控制器的 reconcile 耗时 | `controller` |
| `workqueue_depth` | Gauge | 每个控制器当前的工作队列长度 | `name` |
| `controller_runtime_webhook_requests_total` | Counter | 各个webhook的请求总数 | `webhook`,`code` |
| `controller_runtime_webhook_latency_seconds` | Histogram | 各个webhook的请求延时 | `webhook` |
**用途:** 可用于识别过载的控制器(如高 `workqueue_depth`、长 `reconcile_time_seconds`)或持续出错的控制器(如`reconcile_errors_total` 持续上升)。
2. OpenKruise 特有指标
| Metric Name | Type | Description | Labels |
|--------------------------------------|---------|----------------------------------|-----------------------------------|
| `kruise_manager_is_leader` | Gauge | 表示当前 kruise-manager 是否为 leader | |
| `pod_unavailable_budget` | Counter | Pod 不可用预算保护机制防止 Pod 扰动的数量 | `kind_namespace_name`, `username` |
| `cloneset_scale_expectation_leakage` | Counter | CloneSet Scale Expectation 超时的次数 | `namespace`,`name` |
| `namespace_deletion_protection` | Counter | 命名空间删除保护数量 | `name`, `username` |
| `crd_deletion_protection` | Counter | 自定义资源定义CRD删除保护数量 | `name`, `username` |
| `workload_deletion_protection` | Counter | 工作负载删除保护数量 | `kind_namespace_name`, `username` |
**用途 ** 用于识别控制平面问题,以及特性功能性能问题(如 PUB 保护机制)
3. Go 运行时与进程指标
用于高级调试 kruise-controller-manager 资源使用情况的标准 Go 和进程指标。
### Kruise State Metrics
[kruise-state-metrics](https://github.com/openkruise/kruise-state-metrics) 是一个监听 Kubernetes API Server 并生成各类资源状态指标的服务。它不关注 OpenKruise 各组件本身的健康状态,而是关注内部资源(如 CloneSet、Advanced StatefulSet、SidecarSet 等)的状态。
```bash
# Firstly add openkruise charts repository if you haven't do this.
$ helm repo add openkruise https://openkruise.github.io/charts/
# [Optional]
$ helm repo update
# Install the latest version.
$ helm install kruise openkruise/kruise-state-metrics --version 0.2.2
```
## 性能问题排查
### 如何启用 pprof
默认情况下kruise-manager 已经启用了 pprof而为了安全考虑kruise-daemon 默认是禁用的。你也可以通过组件启动参数来更改其行为。
| component | pprof enable argument | pprof address argument |
| -------- | ------- | ------- |
| kruise-manager | --enable-pprof=true (true by default) | --pprof-addr="host:port" (":8090" by default) |
| kruise-daemon | --enable-pprof=false(false by default) | --pprof-addr="host:port" (":10222" by default) |

View File

@ -38,5 +38,8 @@
},
"sidebar.docs.category.Developer Manuals": {
"message": "开发者手册"
},
"sidebar.docs.category.Operator Manuals": {
"message": "运维手册"
}
}

View File

@ -0,0 +1,68 @@
---
title: 高可用性
---
# 概述
Kruise 控制平面组件 kruise-manager 包含两个部分webhook 和 controller。Webhook 是多副本的,并通过 Cluster IP Service 在多个副本之间进行流量负载均衡。Controller 则采用主从架构leader election其中主实例由选举产生。在默认安装中webhook 和 controller 是部署在一起的。当然,也可以将 webhook 和 controller 独立部署。
# 异常恢复
kruise-manager 容器的入口是 kruise-manager 进程,因此任何 panic 或 OOM 都会导致容器重启。kruise-manager 的 readiness 探针会检测 webhook 的证书是否就绪、webhook server 是否可达。如果某个 webhook server 不响应,流量将被路由到其他正常的 kruise-manager 实例。此外kruise-manager 的 controller 组件会定期更新 leader 选举的租约信息,如果主 controller 发生 panic 或卡住,其他健康的 kruise-manager 实例可以成为新的 leader 并继续执行协调操作。
# 节点/区域故障恢复
在 v1.8 之前kruise-manager 实例会被调度到不同的节点上,以容忍单节点故障。从 v1.8 开始kruise-manager 实例会被调度到不同的可用区中,从而能够容忍单个可用区的故障。需要注意的是,在某些本地部署环境中的裸金属节点可能没有拓扑信息,请为这些节点手动添加拓扑标签,以便避免因可用区故障导致问题。此外,拓扑打散配置不是必须的,如果您希望强制进行 Pod打散可以将`whenUnsatisfiable` 设置为 `DoNotSchedule`
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
control-plane: controller-manager
name: kruise-controller-manager
spec:
template:
spec:
topologySpreadConstraints:
- labelSelector:
matchLabels:
control-plane: controller-manager
matchLabelKeys:
- pod-template-hash
maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway # 如果需要强打散,请改为 DoNotSchedule
...
```
# 避免循环依赖问题
kruise-manager 包含一个针对 Pod 的 webhook如果 kruise-manager 出现故障,可能会阻塞 Pod 的创建。为了避免这种循环依赖问题kruise-manager 自身使用主机网络host network这样它就不依赖任何网络组件。此外kruise-manager 的 webhook 会跳过 kube-system 命名空间下的 Pod 准入控制,该命名空间通常是系统组件的默认安装命名空间,也是 kruise-manager 自身部署的命名空间。请注意,如果没有 webhook 功能OpenKruise 的一些功能(如 SidecarSet 和 WorkloadSpread将无法正常工作。
# 避免组件升级过程中的故障
为了在组件滚动更新期间确保最大可用性kruise-manager 总是会先启动 100% 的新 Pod 副本,并等待这些副本就绪后才删除旧副本。需要注意的是,如果集群资源不足,这种策略可能会导致问题。在这种情况下,您可以调整部署策略并减少 maxSurge 的值。
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
control-plane: controller-manager
name: kruise-controller-manager
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 100% # 如果资源不足,请适当减小 maxSurge 的值
...
```

View File

@ -0,0 +1,119 @@
---
title: 问题排查
---
## 日志分析
OpenKruise 使用 klog 进行日志记录。默认情况下OpenKruise 的 Helm Chart 设置的日志级别为 4这已经足够用于日常的故障排查。如果需要更详细的调试信息可以将日志级别提升至 5 或更高。
从 OpenKruise 1.7 版本开始我们支持结构化日志structured logs原生支持 (key, value) 键值对和对象引用。你还可以通过 Helm 安装时设置 --set manager.loggingFormat=json 来以 JSON 格式输出日志。
例如以下 InfoS 调用:
```
klog.V(3).InfoS("SidecarSet updated status success", "sidecarSet", klog.KObj(sidecarSet), "matchedPods", status.MatchedPods,
"updatedPods", status.UpdatedPods, "readyPods", status.ReadyPods, "updateReadyPods", status.UpdatedReadyPods)
```
将会输出如下日志:
```
I0821 14:22:35.587919 1 sidecarset_processor.go:280] "SidecarSet updated status success" sidecarSet="test-sidecarset" matchedPods=1 updatedPods=1 readyPods=1 updateReadyPods=1
```
如果你使用了 helm install ... --set manager.loggingFormat=json则输出如下 JSON 格式日志:
```json
{
"ts": 1724239224606.642,
"caller": "sidecarset/sidecarset_processor.go:280",
"msg": "SidecarSet updated status success",
"v": 3,
"sidecarSet": {
"name": "test-sidecarset"
},
"matchedPods": 1,
"updatedPods": 1,
"readyPods": 0,
"updateReadyPods": 0
}
```
## 指标分析
### 内置指标
OpenKruise 以 Prometheus 格式暴露指标这对于监控控制器和受管工作负载的健康状况和性能至关重要。默认情况下OpenKruise 控制器管理器 (kruise-manager) 在端口 8080 上的 /metrics 端点暴露这些指标。通常在安装时就已经启用(例如通过 Helm
验证指标透出:
1. **将本地端口转发到 kruise-controller-manager 服务:**
```bash
# kubectl get svc -n kruise-system kruise-controller-manager -o jsonpath='{.spec.ports[?(@.name=="metrics")].port}' # to find port
kubectl port-forward -n kruise-system svc/kruise-controller-manager 8080:8080
```
2. **访问指标端点**
```bash
curl localhost:8080/metrics
```
#### 关键指标分类
1. Controller Runtime 指标
来自 controller-runtime 库的标准指标,反映各控制器的运行情况,这里列出了其中最关键的指标:
| Metric Name | Type | Description | Labels |
|----------------------------------------------|-----------|-----------------------|------------------|
| `controller_runtime_reconcile_total`         | Counter   | Total reconciliations per controller        | `controller`     |
| `controller_runtime_reconcile_errors_total`  | Counter   | Total reconciliation errors per controller  | `controller`     |
| `controller_runtime_reconcile_time_seconds`  | Histogram | Reconciliation duration per controller      | `controller`     |
| `workqueue_depth`                            | Gauge     | Current workqueue depth per controller      | `name`           |
| `controller_runtime_webhook_requests_total`  | Counter   | Total requests per webhook                  | `webhook`,`code` |
| `controller_runtime_webhook_latency_seconds` | Histogram | Request latency per webhook                 | `webhook`        |
**Use to:** Identify overloaded controllers (e.g., high `workqueue_depth`, long `controller_runtime_reconcile_time_seconds`) or controllers with persistent errors (e.g., steadily increasing `reconcile_errors_total`).
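As an offline sketch of this kind of analysis (the metric sample below is illustrative, not real kruise-manager output), one can compute a per-controller error ratio from a scraped `/metrics` snapshot:

```bash
# Save a (hypothetical) snippet of the /metrics output for offline analysis.
cat > /tmp/kruise-metrics-sample.txt <<'EOF'
controller_runtime_reconcile_total{controller="cloneset-controller"} 120
controller_runtime_reconcile_errors_total{controller="cloneset-controller"} 3
workqueue_depth{name="cloneset-controller"} 0
EOF
# errors / total = 3 / 120 = 0.025
awk '$1 ~ /^controller_runtime_reconcile_errors_total/{e=$2}
     $1 ~ /^controller_runtime_reconcile_total/{t=$2}
     END{printf "error_ratio=%.3f\n", e/t}' /tmp/kruise-metrics-sample.txt
# → error_ratio=0.025
```

In a real deployment, a Prometheus alert on the same ratio is preferable to ad-hoc scraping.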
2. OpenKruise Specific Metrics
| Metric Name                          | Type    | Description                                                          | Labels                            |
|--------------------------------------|---------|----------------------------------------------------------------------|-----------------------------------|
| `kruise_manager_is_leader`           | Gauge   | Whether the current kruise-manager instance is the leader             |                                   |
| `pod_unavailable_budget`             | Counter | Number of Pod Unavailable Budget protections against pod disruption   | `kind_namespace_name`, `username` |
| `cloneset_scale_expectation_leakage` | Counter | Number of CloneSet scale expectation timeouts                         | `namespace`,`name`                |
| `namespace_deletion_protection`      | Counter | Number of namespace deletion protections                              | `name`, `username`                |
| `crd_deletion_protection`            | Counter | Number of CustomResourceDefinition (CRD) deletion protections         | `name`, `username`                |
| `workload_deletion_protection`       | Counter | Number of workload deletion protections                               | `kind_namespace_name`, `username` |
**Use to:** Identify control-plane problems and feature performance issues (e.g., PUB protections).
3. Go Runtime and Process Metrics
Standard Go and process metrics for advanced debugging of `kruise-controller-manager` resource usage.
### Kruise State Metrics
[kruise-state-metrics](https://github.com/openkruise/kruise-state-metrics) is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. It is not focused on the health of the individual OpenKruise components, but rather on the health of the various objects inside, such as CloneSets, Advanced StatefulSets and SidecarSets.
```bash
# First, add the openkruise charts repository if you haven't done so.
$ helm repo add openkruise https://openkruise.github.io/charts/
# [Optional]
$ helm repo update
# Install the latest version.
$ helm install kruise openkruise/kruise-state-metrics --version 0.2.2
```
## Investigate performance problem
### How to enable pprof
Kruise-manager enables pprof by default, while kruise-daemon disables it by default for security reasons. One can change this behavior via the following command-line arguments of each component.
| component | pprof enable argument | pprof address argument |
| -------- | ------- | ------- |
| kruise-manager | --enable-pprof=true (true by default) | --pprof-addr="host:port" (":8090" by default) |
| kruise-daemon | --enable-pprof=false(false by default) | --pprof-addr="host:port" (":10222" by default) |
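Once pprof is enabled, a profile can be collected through a port-forward. A sketch, assuming the default `kruise-system` namespace and the default pprof port from the table above:

```bash
# Forward the pprof port of kruise-manager locally (default :8090, see table above).
kubectl -n kruise-system port-forward deploy/kruise-controller-manager 8090:8090 &
# Capture a 30-second CPU profile and list the hottest functions.
go tool pprof -top "http://localhost:8090/debug/pprof/profile?seconds=30"
# Capture a heap snapshot for memory analysis.
go tool pprof -top "http://localhost:8090/debug/pprof/heap"
```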

View File

@ -113,11 +113,6 @@ There are 3 major parts of api specifications you should pay attention to:
| `workloadRef` | Object | | Reference to the workload being managed |
| `strategy` | Object | | Rollout strategy configuration (canary or bluegreen) |
**Version Compatibility**
- `blueGreen` strategy requires Kruise Rollout v0.5.0+
- `spec.disabled` is available in both v1alpha1 and v1beta1 APIs
- The `blueGreen` strategy is only supported in v1beta1 API
### Workload Binding API (Mandatory)
Tell Kruise Rollout which workload should be bounded:
@ -327,15 +322,15 @@ spec:
### Strategy API (Mandatory)
| Field | Type | Default | Description |
|--------------|---------|---------|---------------------------------------------------------------------------|
| `paused` | boolean | false | When true, pauses rollout progression until manually resumed |
| `canary` | Object | nil | Canary strategy configuration |
| `blueGreen` | Object | nil | Blue-green strategy configuration (requires v0.5.0+) |
| Field | Type | Default | Description |
|--------------|---------|---------|--------------------------------------------------------------|
| `paused` | boolean | false | When true, pauses rollout progression until manually resumed |
| `canary` | Object | nil | Canary strategy configuration |
| `blueGreen` | Object | nil | Blue-green strategy configuration (requires v0.6.0+) |
**Note: Difference between Disabled and Paused**
- **Disabled**: Stops all Rollout reconciliation; the controller ignores this Rollout until re-enabled.
- **Paused**: Keeps the Rollout active but halts progression between steps. Useful for inspections or troubleshooting. `paused` field is available in both v1alpha1 and v1beta1 APIs.
- **Disabled**: Stops all Rollout reconciliation and routes all traffic to the stable workload; the controller will ignore this Rollout until re-enabled, it is equivalent to deleting the Rollout object.
- **Paused**: Keeps the Rollout active but halts progression between steps. Useful for inspections or troubleshooting.
`canary` is used for canary strategy and multi-batch strategy, while `blueGreen` is used for blue-green strategy. These two are mutually exclusive; they cannot both be empty or both be non-empty. The `blueGreen` strategy was introduced in Kruise-Rollout versions higher than v0.5.0 and is not supported in the v1alpha1 API.

View File

@ -108,25 +108,7 @@ $ kubectl patch deployment workload-demo -p \
Wait a while, we will see the Deployment status show **Only 1 Pod** is upgraded.
![](../../static/img/rollouts/basic-1st-batch.jpg)
### Step 3: Inspect or continue your rollout
**Inspect** the rollouts detailed status, steps, and recent events:
```bash
$ kubectl-kruise describe rollout rollouts-demo -n default
```
**Example output:**
```
Name: rollouts-demo
Namespace: default
Status: Healthy
Strategy: Canary
Step: 1/4
Steps:
- Replicas: 1 State: StepUpgrade
- Replicas: 2
- Replicas: 3
- Replicas: 4
```
### Step 3: Continue to release the 2-nd batch
**Approve** the next batch if everything looks good:
```bash
$ kubectl-kruise rollout approve rollout/rollouts-demo -n default
@ -207,6 +189,49 @@ func IsRolloutCurrentStepReady(workload appsv1.Deployment, rollout *rolloutsv1be
}
```
## How to examine the newly deployed pods
One can examine the newly deployed pods using `kubectl kruise describe rollout`. Note that the command shows concise information about the rollout and lists the newly deployed pods related to the current step.
```bash
$ kubectl kruise describe rollout rollouts-demo -n default
```
**Example output:**
```
Name: rollouts-demo
Namespace: default
Status: ⚠ Progressing
Message: Rollout is in step(1/3), and you need manually confirm to enter the next step
Strategy: Canary
Step: 1/3
Steps:
- Replicas: 1
State: StepPaused
- Replicas: 2
- Replicas: 3
Images: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
Current Revision: 5555d6dcc8
Update Revision: 579589c5cd
Replicas:
Desired: 4
Updated: 1
Current: 4
Ready: 4
Available: 4
NAME READY BATCH ID REVISION AGE RESTARTS STATUS
nginx-deployment-basic-579589c5cd-rx5nm 1/1 1 579589c5cd 22s 0 ✔ Running
```
Alternatively, one can directly filter the pods using the following pod labels:
1. `rollouts.kruise.io/rollout-id`: used to identify different rollout actions. The value of this label comes from the label of the workload with the same name. If the `rollouts.kruise.io/rollout-id` label does not exist in the workload, Kruise Rollout will generate one using the revision.
2. `rollouts.kruise.io/rollout-batch-id`: used to identify different steps. The value is a number that starts from 1.
One can use the following command to filter the pods directly:
```bash
$ kubectl get pods -l rollouts.kruise.io/rollout-id=579589c5cd,rollouts.kruise.io/rollout-batch-id=1
NAME READY STATUS RESTARTS AGE
nginx-deployment-basic-579589c5cd-rx5nm 1/1 Running 0 18m
```
## How to do rollback?
In fact, Kruise Rollout **DOES NOT** provide the ability to roll back directly. **Kruise Rollout prefers that users roll back the workload spec directly to roll back their application.** When users need to roll back from "version-2" to "version-1", Kruise Rollout will use the native rolling upgrade strategy to quickly roll back, instead of following the multi-batch checkpoint strategy.
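For example, assuming the Deployment, container, and image names from the demo above (they are illustrative), rolling back is just reverting the image in the workload spec:

```bash
# Revert the workload spec to the previous image; Kruise Rollout then
# performs a fast native rolling upgrade instead of a multi-batch release.
# (Deployment, container, and image tag names are illustrative.)
kubectl -n default set image deployment/nginx-deployment-basic \
  nginx=registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:version-1
```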

View File

@ -89,6 +89,15 @@ module.exports = {
'developer-manuals/other-languages',
],
},
{
type: 'category',
label: 'Operator Manuals',
collapsed: true,
items: [
'operator-manuals/availability',
'operator-manuals/troubleshooting',
],
},
{
type: 'category',
label: 'Reference',

View File

@ -0,0 +1,61 @@
---
title: High availability
---
# Overview
The kruise control plane, kruise-manager, has two components: the webhook and the controller. The webhook is replicated, and traffic is balanced over the replicas using a ClusterIP Service. The controller follows a primary-secondary architecture, where the primary instance is chosen via leader election. In the default installation, the webhook is colocated with the controller. However, it is also possible to deploy the webhook and controller independently.
# How to recover from application failure
The entrypoint of the kruise-manager container is the kruise-manager process itself, so any panic or OOM kill will restart the container. The readiness probe of kruise-manager detects whether the webhook credential is ready and whether the webhook server is reachable, so if the webhook server is not responsive, traffic will be routed away from that kruise-manager instance. The controller component of kruise-manager periodically updates the lease for leader election; if the leader controller panics or hangs, another healthy kruise-manager instance can become the new leader and continue the reconciliation.
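To check which instance currently holds leadership, one can list the leader-election Leases. A sketch, assuming the default `kruise-system` installation namespace (the Lease name may vary by version):

```bash
# List leader-election leases and their current holders in the kruise namespace.
kubectl -n kruise-system get lease \
  -o custom-columns=NAME:.metadata.name,HOLDER:.spec.holderIdentity
```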
# How to recover from node/zone failure
Prior to 1.8, the kruise-manager instances are spread across different nodes, so that kruise-manager can tolerate a single node failure. Since 1.8, the instances are spread across different availability zones, so that kruise-manager can tolerate the failure of a single availability zone. Note that the topology information may not exist on some bare-metal nodes in on-prem environments; please add the topology key to the nodes if you want to tolerate zone failure. In addition, the topology spread configuration is not mandatory, and one can change `whenUnsatisfiable` to `DoNotSchedule` if strong spreading is required.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
control-plane: controller-manager
name: kruise-controller-manager
spec:
template:
spec:
topologySpreadConstraints:
- labelSelector:
matchLabels:
control-plane: controller-manager
matchLabelKeys:
- pod-template-hash
maxSkew: 1
topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway # change to DoNotSchedule if strong spread is required.
...
```
# How to avoid cyclic dependency problem
kruise-manager contains a webhook for pods, and a failure in kruise-manager would block pod creation. To avoid this cyclic dependency problem, kruise-manager itself uses host networking, so that it does not rely on any network component. In addition, the webhook of kruise-manager skips pod admission in the kube-system namespace, which is the default installation namespace of common system components, and in the namespace where kruise-manager itself is deployed. Note that without the webhook function, some OpenKruise features will not work correctly, e.g., SidecarSet and WorkloadSpread.
# How to avoid failure during component update
To ensure maximum availability during component rollout, kruise-manager always tries to create 100% replicas of new pods and waits for these replicas to be ready before deleting the old replicas. Note that this strategy may be problematic if the resources for the deployment are not sufficient. One can adjust the deployment strategy and reduce the value of maxSurge in that case.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
control-plane: controller-manager
name: kruise-controller-manager
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 100% # reduce maxSurge if the resource for deployment is not adequate
...
```

View File

@ -0,0 +1,123 @@
---
title: Troubleshooting
---
## Investigate log
OpenKruise uses klog for logging. The kruise chart comes with a default log level of 4, which is verbose enough for most daily troubleshooting. One can further increase the log level to 5 or above to get all the debug information.
From OpenKruise 1.7, we added support for structured logs, which natively support (key, value) pairs and object references. Logs can also be output in JSON format using `helm install ... --set manager.loggingFormat=json`.
For example, this invocation of InfoS:
```
klog.V(3).InfoS("SidecarSet updated status success", "sidecarSet", klog.KObj(sidecarSet), "matchedPods", status.MatchedPods,
"updatedPods", status.UpdatedPods, "readyPods", status.ReadyPods, "updateReadyPods", status.UpdatedReadyPods)
```
will result in this log:
```
I0821 14:22:35.587919 1 sidecarset_processor.go:280] "SidecarSet updated status success" sidecarSet="test-sidecarset" matchedPods=1 updatedPods=1 readyPods=1 updateReadyPods=1
```
Or, if `helm install ... --set manager.loggingFormat=json`, it will result in this output:
```json
{
"ts": 1724239224606.642,
"caller": "sidecarset/sidecarset_processor.go:280",
"msg": "SidecarSet updated status success",
"v": 3,
"sidecarSet": {
"name": "test-sidecarset"
},
"matchedPods": 1,
"updatedPods": 1,
"readyPods": 0,
"updateReadyPods": 0
}
```
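If the JSON logs need to be sliced programmatically, a generic JSON processor such as `jq` (an assumption here, not part of OpenKruise) can filter them by message and pull out the counters. A minimal sketch using the sample line above:

```bash
# Filter structured JSON logs by message and extract the pod counters with jq.
# The sample line is the JSON log shown above (assumption: one JSON object per line).
echo '{"ts":1724239224606.642,"msg":"SidecarSet updated status success","v":3,"sidecarSet":{"name":"test-sidecarset"},"matchedPods":1,"updatedPods":1,"readyPods":0,"updateReadyPods":0}' \
  | jq -r 'select(.msg | contains("updated status")) | "\(.sidecarSet.name) matched=\(.matchedPods) updated=\(.updatedPods)"'
# → test-sidecarset matched=1 updated=1
```

In practice one would pipe `kubectl logs -n kruise-system deploy/kruise-controller-manager` into the same filter.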
## Investigate metrics
### Built-in metrics
OpenKruise exposes metrics in Prometheus format, crucial for monitoring the health and performance of its controllers and managed workloads. By default, the OpenKruise controller manager (`kruise-manager`) exposes metrics on port `8080` at the `/metrics` endpoint. This is typically enabled during installation (e.g., via Helm).
To verify:
1. **Port-forward to the `kruise-controller-manager` service:**
(The service port for metrics is often named `metrics` or is the main service port, e.g., 8080)
```bash
# kubectl get svc -n kruise-system kruise-controller-manager -o jsonpath='{.spec.ports[?(@.name=="metrics")].port}' # to find port
kubectl port-forward -n kruise-system svc/kruise-controller-manager 8080:8080
```
2. **Query the metrics endpoint:**
```bash
curl localhost:8080/metrics
```
#### Key Metrics Categories
1. Controller Runtime Metrics
Standard metrics from the `controller-runtime` library, offering insights into individual controller and webhook performance. Here are some key metrics to notice:
| Metric Name | Type | Description | Labels |
|----------------------------------------------|-----------|---------------------------------------------|------------------|
| `controller_runtime_reconcile_total` | Counter | Total reconciliations per controller. | `controller` |
| `controller_runtime_reconcile_errors_total` | Counter | Total reconciliation errors per controller. | `controller` |
| `controller_runtime_reconcile_time_seconds` | Histogram | Reconciliation duration per controller. | `controller` |
| `workqueue_depth` | Gauge | Current workqueue depth per controller. | `name` |
| `controller_runtime_webhook_requests_total`  | Counter   | Total requests per webhook                  | `code`,`webhook` |
| `controller_runtime_webhook_latency_seconds` | Histogram | Request latency per webhook                 | `webhook`        |
**Use to:** Identify overloaded controllers (high `workqueue_depth`, long `controller_runtime_reconcile_time_seconds`) or persistent issues (increasing `reconcile_errors_total`).
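As an offline sketch of this kind of analysis (the metric sample below is illustrative, not real kruise-manager output), one can compute a per-controller error ratio from a scraped `/metrics` snapshot:

```bash
# Save a (hypothetical) snippet of the /metrics output for offline analysis.
cat > /tmp/kruise-metrics-sample.txt <<'EOF'
controller_runtime_reconcile_total{controller="cloneset-controller"} 120
controller_runtime_reconcile_errors_total{controller="cloneset-controller"} 3
workqueue_depth{name="cloneset-controller"} 0
EOF
# errors / total = 3 / 120 = 0.025
awk '$1 ~ /^controller_runtime_reconcile_errors_total/{e=$2}
     $1 ~ /^controller_runtime_reconcile_total/{t=$2}
     END{printf "error_ratio=%.3f\n", e/t}' /tmp/kruise-metrics-sample.txt
# → error_ratio=0.025
```

In a real deployment, a Prometheus alert on the same ratio is preferable to ad-hoc scraping.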
2. OpenKruise Specific Metrics
Custom metrics for OpenKruise features.
| Metric Name | Type | Description | Labels |
|--------------------------------------|---------|--------------------------------------------------------------------|-----------------------------------|
| `kruise_manager_is_leader`           | Gauge   | Whether the current kruise-manager instance is the leader            |                                   |
| `pod_unavailable_budget`             | Counter | Number of Pod Unavailable Budget protections against pod disruption  | `kind_namespace_name`, `username` |
| `cloneset_scale_expectation_leakage` | Counter | Number of CloneSet scale expectation timeouts                        | `namespace`,`name`                |
| `namespace_deletion_protection`      | Counter | Number of namespace deletion protections                             | `name`, `username`                |
| `crd_deletion_protection`            | Counter | Number of CustomResourceDefinition (CRD) deletion protections        | `name`, `username`                |
| `workload_deletion_protection`       | Counter | Number of workload deletion protections                              | `kind_namespace_name`, `username` |
**Use to:** Identify control-plane problems and feature performance issues (e.g., PUB protections)
3. Go Runtime and Process Metrics
Standard Go and process metrics for advanced debugging of `kruise-controller-manager` resource usage.
### Kruise State Metrics
[kruise-state-metrics](https://github.com/openkruise/kruise-state-metrics) is a simple service that listens to the
Kubernetes API server and generates metrics about the state of the objects.
It is not focused on the health of the individual OpenKruise components, but rather on the health of the various objects
inside, such as clonesets, advanced statefulsets and sidecarsets.
```bash
# First, add the openkruise charts repository if you haven't done so.
$ helm repo add openkruise https://openkruise.github.io/charts/
# [Optional]
$ helm repo update
# Install the latest version.
$ helm install kruise openkruise/kruise-state-metrics --version 0.2.2
```
## Investigate performance problem
### How to enable pprof
Kruise-manager enables pprof by default, while kruise-daemon disables it by default for security reasons. One can enable or disable pprof, or change the pprof address, by setting the following command-line arguments on each component.
| component | pprof enable argument | pprof address argument |
| -------- | ------- | ------- |
| kruise-manager | --enable-pprof=true (true by default) | --pprof-addr="host:port" (":8090" by default) |
| kruise-daemon | --enable-pprof=false(false by default) | --pprof-addr="host:port" (":10222" by default) |
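Once pprof is enabled, a profile can be collected through a port-forward. A sketch, assuming the default `kruise-system` namespace and the default pprof port from the table above:

```bash
# Forward the pprof port of kruise-manager locally (default :8090, see table above).
kubectl -n kruise-system port-forward deploy/kruise-controller-manager 8090:8090 &
# Capture a 30-second CPU profile and list the hottest functions.
go tool pprof -top "http://localhost:8090/debug/pprof/profile?seconds=30"
# Capture a heap snapshot for memory analysis.
go tool pprof -top "http://localhost:8090/debug/pprof/heap"
```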

View File

@ -0,0 +1,61 @@
---
title: High availability
---
# Overview
The kruise control plane, kruise-manager, has two components: the webhook and the controller. The webhook is replicated, and traffic is balanced over the replicas using a ClusterIP Service. The controller follows a primary-secondary architecture, where the primary instance is chosen via leader election. In the default installation, the webhook is colocated with the controller. However, it is also possible to deploy the webhook and controller independently.
# How to recover from application failure
The entrypoint of the kruise-manager container is the kruise-manager process itself, so any panic or OOM kill will restart the container. The readiness probe of kruise-manager detects whether the webhook credential is ready and whether the webhook server is reachable, so if the webhook server is not responsive, traffic will be routed away from that kruise-manager instance. The controller component of kruise-manager periodically updates the lease for leader election; if the leader controller panics or hangs, another healthy kruise-manager instance can become the new leader and continue the reconciliation.
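To check which instance currently holds leadership, one can list the leader-election Leases. A sketch, assuming the default `kruise-system` installation namespace (the Lease name may vary by version):

```bash
# List leader-election leases and their current holders in the kruise namespace.
kubectl -n kruise-system get lease \
  -o custom-columns=NAME:.metadata.name,HOLDER:.spec.holderIdentity
```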
# How to recover from node/zone failure
Prior to 1.8, the kruise-manager instances are spread across different nodes, so that kruise-manager can tolerate a single node failure. Since 1.8, the instances are spread across different availability zones, so that kruise-manager can tolerate the failure of a single availability zone. Note that the topology information may not exist on some bare-metal nodes in on-prem environments; please add the topology key to the nodes if you want to tolerate zone failure. In addition, the topology spread configuration is not mandatory, and one can change `whenUnsatisfiable` to `DoNotSchedule` if strong spreading is required.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
control-plane: controller-manager
name: kruise-controller-manager
spec:
template:
spec:
topologySpreadConstraints:
- labelSelector:
matchLabels:
control-plane: controller-manager
matchLabelKeys:
- pod-template-hash
maxSkew: 1
topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway # change to DoNotSchedule if strong spread is required.
...
```
# How to avoid cyclic dependency problem
kruise-manager contains a webhook for pods, and a failure in kruise-manager would block pod creation. To avoid this cyclic dependency problem, kruise-manager itself uses host networking, so that it does not rely on any network component. In addition, the webhook of kruise-manager skips pod admission in the kube-system namespace, which is the default installation namespace of common system components, and in the namespace where kruise-manager itself is deployed. Note that without the webhook function, some OpenKruise features will not work correctly, e.g., SidecarSet and WorkloadSpread.
# How to avoid failure during component update
To ensure maximum availability during component rollout, kruise-manager always tries to create 100% replicas of new pods and waits for these replicas to be ready before deleting the old replicas. Note that this strategy may be problematic if the resources for the deployment are not sufficient. One can adjust the deployment strategy and reduce the value of maxSurge in that case.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
control-plane: controller-manager
name: kruise-controller-manager
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 100% # reduce maxSurge if the resource for deployment is not adequate
...
```

View File

@ -0,0 +1,123 @@
---
title: Troubleshooting
---
## Investigate log
OpenKruise uses klog for logging. The kruise chart comes with a default log level of 4, which is verbose enough for most daily troubleshooting. One can further increase the log level to 5 or above to get all the debug information.
From OpenKruise 1.7, we added support for structured logs, which natively support (key, value) pairs and object references. Logs can also be output in JSON format using `helm install ... --set manager.loggingFormat=json`.
For example, this invocation of InfoS:
```
klog.V(3).InfoS("SidecarSet updated status success", "sidecarSet", klog.KObj(sidecarSet), "matchedPods", status.MatchedPods,
"updatedPods", status.UpdatedPods, "readyPods", status.ReadyPods, "updateReadyPods", status.UpdatedReadyPods)
```
will result in this log:
```
I0821 14:22:35.587919 1 sidecarset_processor.go:280] "SidecarSet updated status success" sidecarSet="test-sidecarset" matchedPods=1 updatedPods=1 readyPods=1 updateReadyPods=1
```
Or, if `helm install ... --set manager.loggingFormat=json`, it will result in this output:
```json
{
"ts": 1724239224606.642,
"caller": "sidecarset/sidecarset_processor.go:280",
"msg": "SidecarSet updated status success",
"v": 3,
"sidecarSet": {
"name": "test-sidecarset"
},
"matchedPods": 1,
"updatedPods": 1,
"readyPods": 0,
"updateReadyPods": 0
}
```
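If the JSON logs need to be sliced programmatically, a generic JSON processor such as `jq` (an assumption here, not part of OpenKruise) can filter them by message and pull out the counters. A minimal sketch using the sample line above:

```bash
# Filter structured JSON logs by message and extract the pod counters with jq.
# The sample line is the JSON log shown above (assumption: one JSON object per line).
echo '{"ts":1724239224606.642,"msg":"SidecarSet updated status success","v":3,"sidecarSet":{"name":"test-sidecarset"},"matchedPods":1,"updatedPods":1,"readyPods":0,"updateReadyPods":0}' \
  | jq -r 'select(.msg | contains("updated status")) | "\(.sidecarSet.name) matched=\(.matchedPods) updated=\(.updatedPods)"'
# → test-sidecarset matched=1 updated=1
```

In practice one would pipe `kubectl logs -n kruise-system deploy/kruise-controller-manager` into the same filter.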
## Investigate metrics
### Built-in metrics
OpenKruise exposes metrics in Prometheus format, crucial for monitoring the health and performance of its controllers and managed workloads. By default, the OpenKruise controller manager (`kruise-manager`) exposes metrics on port `8080` at the `/metrics` endpoint. This is typically enabled during installation (e.g., via Helm).
To verify:
1. **Port-forward to the `kruise-controller-manager` service:**
(The service port for metrics is often named `metrics` or is the main service port, e.g., 8080)
```bash
# kubectl get svc -n kruise-system kruise-controller-manager -o jsonpath='{.spec.ports[?(@.name=="metrics")].port}' # to find port
kubectl port-forward -n kruise-system svc/kruise-controller-manager 8080:8080
```
2. **Query the metrics endpoint:**
```bash
curl localhost:8080/metrics
```
#### Key Metrics Categories
1. Controller Runtime Metrics
Standard metrics from the `controller-runtime` library, offering insights into individual controller and webhook performance. Here are some key metrics to notice:
| Metric Name | Type | Description | Labels |
|----------------------------------------------|-----------|---------------------------------------------|------------------|
| `controller_runtime_reconcile_total` | Counter | Total reconciliations per controller. | `controller` |
| `controller_runtime_reconcile_errors_total` | Counter | Total reconciliation errors per controller. | `controller` |
| `controller_runtime_reconcile_time_seconds` | Histogram | Reconciliation duration per controller. | `controller` |
| `workqueue_depth` | Gauge | Current workqueue depth per controller. | `name` |
| `controller_runtime_webhook_requests_total`  | Counter   | Total requests per webhook                  | `code`,`webhook` |
| `controller_runtime_webhook_latency_seconds` | Histogram | Request latency per webhook                 | `webhook`        |
**Use to:** Identify overloaded controllers (high `workqueue_depth`, long `controller_runtime_reconcile_time_seconds`) or persistent issues (increasing `reconcile_errors_total`).
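As an offline sketch of this kind of analysis (the metric sample below is illustrative, not real kruise-manager output), one can compute a per-controller error ratio from a scraped `/metrics` snapshot:

```bash
# Save a (hypothetical) snippet of the /metrics output for offline analysis.
cat > /tmp/kruise-metrics-sample.txt <<'EOF'
controller_runtime_reconcile_total{controller="cloneset-controller"} 120
controller_runtime_reconcile_errors_total{controller="cloneset-controller"} 3
workqueue_depth{name="cloneset-controller"} 0
EOF
# errors / total = 3 / 120 = 0.025
awk '$1 ~ /^controller_runtime_reconcile_errors_total/{e=$2}
     $1 ~ /^controller_runtime_reconcile_total/{t=$2}
     END{printf "error_ratio=%.3f\n", e/t}' /tmp/kruise-metrics-sample.txt
# → error_ratio=0.025
```

In a real deployment, a Prometheus alert on the same ratio is preferable to ad-hoc scraping.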
2. OpenKruise Specific Metrics
Custom metrics for OpenKruise features.
| Metric Name | Type | Description | Labels |
|--------------------------------------|---------|--------------------------------------------------------------------|-----------------------------------|
| `kruise_manager_is_leader`           | Gauge   | Whether the current kruise-manager instance is the leader            |                                   |
| `pod_unavailable_budget`             | Counter | Number of Pod Unavailable Budget protections against pod disruption  | `kind_namespace_name`, `username` |
| `cloneset_scale_expectation_leakage` | Counter | Number of CloneSet scale expectation timeouts                        | `namespace`,`name`                |
| `namespace_deletion_protection`      | Counter | Number of namespace deletion protections                             | `name`, `username`                |
| `crd_deletion_protection`            | Counter | Number of CustomResourceDefinition (CRD) deletion protections        | `name`, `username`                |
| `workload_deletion_protection`       | Counter | Number of workload deletion protections                              | `kind_namespace_name`, `username` |
**Use to:** Identify control-plane problems and feature performance issues (e.g., PUB protections)
3. Go Runtime and Process Metrics
Standard Go and process metrics for advanced debugging of `kruise-controller-manager` resource usage.
### Kruise State Metrics
[kruise-state-metrics](https://github.com/openkruise/kruise-state-metrics) is a simple service that listens to the
Kubernetes API server and generates metrics about the state of the objects.
It is not focused on the health of the individual OpenKruise components, but rather on the health of the various objects
inside, such as clonesets, advanced statefulsets and sidecarsets.
```bash
# First, add the openkruise charts repository if you haven't done so.
$ helm repo add openkruise https://openkruise.github.io/charts/
# [Optional]
$ helm repo update
# Install the latest version.
$ helm install kruise openkruise/kruise-state-metrics --version 0.2.2
```
## Investigate performance problem
### How to enable pprof
Kruise-manager enables pprof by default, while kruise-daemon disables it by default for security reasons. One can enable or disable pprof, or change the pprof address, by setting the following command-line arguments on each component.
| component | pprof enable argument | pprof address argument |
| -------- | ------- | ------- |
| kruise-manager | --enable-pprof=true (true by default) | --pprof-addr="host:port" (":8090" by default) |
| kruise-daemon | --enable-pprof=false(false by default) | --pprof-addr="host:port" (":10222" by default) |
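Once pprof is enabled, a profile can be collected through a port-forward. A sketch, assuming the default `kruise-system` namespace and the default pprof port from the table above:

```bash
# Forward the pprof port of kruise-manager locally (default :8090, see table above).
kubectl -n kruise-system port-forward deploy/kruise-controller-manager 8090:8090 &
# Capture a 30-second CPU profile and list the hottest functions.
go tool pprof -top "http://localhost:8090/debug/pprof/profile?seconds=30"
# Capture a heap snapshot for memory analysis.
go tool pprof -top "http://localhost:8090/debug/pprof/heap"
```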

View File

@ -78,6 +78,15 @@
"developer-manuals/other-languages"
]
},
{
"type": "category",
"label": "Operator Manuals",
"collapsed": true,
"items": [
"operator-manuals/availability",
"operator-manuals/troubleshooting"
]
},
{
"type": "category",
"label": "Reference",

View File

@ -78,6 +78,15 @@
"developer-manuals/other-languages"
]
},
{
"type": "category",
"label": "Operator Manuals",
"collapsed": true,
"items": [
"operator-manuals/availability",
"operator-manuals/troubleshooting"
]
},
{
"type": "category",
"label": "Reference",