[zh] sync /debug-application/debug-pods.md

windsonsea 2023-06-21 09:16:08 +08:00
parent dc4eea76c8
commit a074360983
1 changed files with 55 additions and 48 deletions

@ -3,7 +3,6 @@ title: 调试 Pod
content_type: task
weight: 10
---
<!--
reviewers:
- mikedanese
@ -16,23 +15,21 @@ weight: 10
<!-- overview -->
<!--
This guide is to help users debug applications that are deployed into Kubernetes
and not behaving correctly. This is *not* a guide for people who want to debug their cluster.
For that you should check out [this guide](/docs/tasks/debug/debug-cluster).
-->
This guide helps users debug applications that are deployed to Kubernetes but are not running correctly.
This guide does **not** cover how to debug a cluster.
To debug a cluster, see [this guide](/zh-cn/docs/tasks/debug/debug-cluster).
<!-- body -->
<!--
## Diagnosing the problem
The first step in troubleshooting is triage. What is the problem?
Is it your Pods, your Replication Controller or your Service?
* [Debugging Pods](#debugging-pods)
* [Debugging Replication Controllers](#debugging-replication-controllers)
@ -49,7 +46,8 @@ your Service?
<!--
### Debugging Pods
The first step in debugging a Pod is taking a look at it. Check the current
state of the Pod and recent events with the following command:
-->
### Debugging Pods {#debugging-pods}
@ -60,7 +58,8 @@ kubectl describe pods ${POD_NAME}
``` ```
<!--
Look at the state of the containers in the pod. Are they all `Running`?
Have there been recent restarts?
Continue debugging depending on the state of the pods.
-->
@ -71,32 +70,35 @@ Continue debugging depending on the state of the pods.
<!--
#### My pod stays pending
If a Pod is stuck in `Pending` it means that it can not be scheduled onto a node.
Generally this is because there are insufficient resources of one type or another
that prevent scheduling. Look at the output of the `kubectl describe ...` command above.
There should be messages from the scheduler about why it can not schedule your pod.
Reasons include:
-->
#### My Pod stays Pending {#my-pod-stays-pending}
If a Pod is stuck in the `Pending` state, it has not been scheduled onto a node.
This is usually because a shortage of some type of resource prevents scheduling.
Look at the output of the `kubectl describe ...` command above, which should show why the Pod was not scheduled.
Common reasons include:
<!--
* **You don't have enough resources**: You may have exhausted the supply of CPU
  or Memory in your cluster, in this case you need to delete Pods, adjust resource
  requests, or add new nodes to your cluster. See [Compute Resources document](/docs/concepts/configuration/manage-resources-containers/)
  for more information.
* **You are using `hostPort`**: When you bind a Pod to a `hostPort` there are a
  limited number of places that pod can be scheduled. In most cases, `hostPort`
  is unnecessary, try using a Service object to expose your Pod. If you do require
  `hostPort` then you can only schedule as many Pods as there are nodes in your Kubernetes cluster.
-->
* **Insufficient resources**:
  You may have used up all the CPU or memory in the cluster. In this case, you need to delete Pods, adjust resource requests, or add nodes to the cluster.
  For more information, see the [Compute Resources document](/zh-cn/docs/concepts/configuration/manage-resources-containers/).
* **You are using `hostPort`**:
  If you bind a Pod to a `hostPort`, the number of nodes that can run that Pod is limited.
  In most cases `hostPort` is unnecessary, and you should use a Service object to expose the Pod instead.
  If you do need `hostPort`, you can only create as many Pods as there are nodes in the cluster.
@ -105,8 +107,10 @@ scheduled. In most cases, `hostPort` is unnecessary, try using a Service object
<!--
#### My pod stays waiting
If a Pod is stuck in the `Waiting` state, then it has been scheduled to a worker node,
but it can't run on that machine. Again, the information from `kubectl describe ...`
should be informative. The most common cause of `Waiting` pods is a failure to pull the image.
There are three things to check:
* Make sure that you have the name of the image correct.
* Have you pushed the image to the registry?
@ -119,20 +123,21 @@ Again, the information from `kubectl describe ...` should be informative. The m
Again, the output of `kubectl describe ...` can be informative.
The most common cause of the `Waiting` state is a failure to pull the image. There are three things to check:
* Make sure the image name is spelled correctly.
* Make sure the image has been pushed to the registry.
* Try pulling the image manually to see whether it can be pulled. For example, if you use Docker on your PC, run `docker pull <image>`.
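
Before working through the three checks above, it can help to confirm that the container really is waiting on an image pull. The following is a minimal sketch, not part of the original page: `waiting_reasons` is a hypothetical helper name, and it assumes `kubectl` is already configured for the cluster that runs the Pod.

```shell
# Sketch only: waiting_reasons is a hypothetical helper, not a kubectl subcommand.
# It prints one line per container: the container name, a tab, and the waiting reason.
waiting_reasons() {
  local pod="$1"
  # .state.waiting is only set while a container is waiting, so the reason
  # column stays empty for containers that are running or terminated.
  kubectl get pod "$pod" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\n"}{end}'
}
```

Calling `waiting_reasons "${POD_NAME}"` and seeing a reason such as `ErrImagePull` or `ImagePullBackOff` suggests the image checks in the list are the right place to look.
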
<!--
#### My pod is crashing or otherwise unhealthy
Once your pod has been scheduled, the methods described in
[Debug Running Pods](/docs/tasks/debug/debug-application/debug-running-pod/)
are available for debugging.
-->
#### My Pod is crashing or otherwise unhealthy {#my-pod-is-crashing-or-otherwise-unhealthy}
Once your Pod has been scheduled, you can use the methods described in
[Debug Running Pods](/zh-cn/docs/tasks/debug/debug-application/debug-running-pod/)
to debug it further.
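
As a first look at a crashing Pod, the restart counts and the logs of the previous container instance are often enough to see what went wrong. This is a rough sketch under assumptions: `crash_summary` is a hypothetical helper name, `kubectl` is assumed to be configured, and the Pod is assumed to have a single container (for a multi-container Pod you would add `-c <container>` to the `kubectl logs` call).

```shell
# Sketch only: crash_summary is a hypothetical helper, not a kubectl subcommand.
crash_summary() {
  local pod="$1"
  # Print "<container name><TAB><restart count>" for each container in the Pod.
  kubectl get pod "$pod" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
  # Show logs from the previous instance of the container.
  # kubectl reports an error here if the container has never restarted.
  kubectl logs "$pod" --previous
}
```

A typical call would be `crash_summary "${POD_NAME}"`.
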
<!--
@ -160,7 +165,7 @@ If you misspelled `command` as `commnd` then will give an error like this:
-->
The first thing to do is to delete your Pod and try creating it again with the `--validate` option.
For example, run `kubectl apply --validate -f mypod.yaml`.
If you misspelled `command` as `commnd`, you will see an error like this:
```shell ```shell
I0805 10:43:25.129850 46757 schema.go:126] unknown field: commnd
@ -175,9 +180,9 @@ The next thing to check is whether the pod on the apiserver
matches the pod you meant to create (e.g. in a yaml file on your local machine).
For example, run `kubectl get pods/mypod -o yaml > mypod-on-apiserver.yaml` and then
manually compare the original pod description, `mypod.yaml` with the one you got
back from apiserver, `mypod-on-apiserver.yaml`. There will typically be some
lines on the "apiserver" version that are not on the original version. This is
expected. However, if there are lines on the original that are not on the apiserver
version, then this may indicate a problem with your pod spec.
-->
The next thing to check is whether the Pod on the API server matches the Pod you meant to create
@ -191,11 +196,12 @@ Pod 规约是有问题的。
<!--
### Debugging Replication Controllers
Replication controllers are fairly straightforward. They can either create Pods or they can't.
If they can't create pods, then please refer to the
[instructions above](#debugging-pods) to debug your pods.
You can also use `kubectl describe rc ${CONTROLLER_NAME}` to introspect events
related to the replication controller.
-->
### Debugging Replication Controllers {#debugging-replication-controllers}
@ -207,10 +213,11 @@ controller.
<!--
### Debugging Services
Services provide load balancing across a set of pods. There are several common problems that can make Services
not work properly. The following instructions should help debug Service problems.
First, verify that there are endpoints for the service. For every Service object,
the apiserver makes an `endpoints` resource available.
You can view this resource with:
-->
@ -241,8 +248,8 @@ IP addresses in the Service's endpoints.
<!--
#### My service is missing endpoints
If you are missing endpoints, try listing pods using the labels that Service uses.
Imagine that you have a Service where the labels are:
-->
#### My Service is missing endpoints {#my-service-is-missing-endpoints}
@ -263,7 +270,8 @@ You can use:
kubectl get pods --selector=name=nginx,type=frontend
``` ```
to list pods that match this selector. Verify that the list matches the Pods that you expect to provide your Service.
Verify that the pod's `containerPort` matches up with the Service's `targetPort`
-->
You can use the following command to list the Pods that match the selector, and verify that these Pods belong to the Service you created:
@ -298,8 +306,7 @@ You may also visit [troubleshooting document](/docs/tasks/debug/) for more infor
-->
If none of the above solves your problem,
follow the instructions in the [Debugging Service document](/zh-cn/docs/tasks/debug/debug-application/debug-service/)
to make sure that your `Service` is running, has `Endpoints`, and that your `Pods` are actually serving;
that DNS is configured and working, that the iptables rules are installed, and that `kube-proxy` is not misbehaving.
You may also visit the [troubleshooting document](/zh-cn/docs/tasks/debug/) for more information.
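
The endpoint and selector checks from the Service section can be combined into one small helper. This is only a sketch under stated assumptions: `selector_from_labels` and `check_service` are hypothetical names, the labels passed in are assumed to be the same `key=value` pairs as the Service's selector, and `kubectl` is assumed to be configured for the cluster.

```shell
# Sketch only: both helpers below are hypothetical, not part of kubectl.

# Join key=value label pairs with commas, the form that --selector expects.
selector_from_labels() {
  local IFS=,
  printf '%s\n' "$*"
}

# Show a Service's endpoints, then the Pods that match the given labels.
check_service() {
  local service="$1"
  shift
  kubectl get endpoints "$service"
  kubectl get pods --selector="$(selector_from_labels "$@")"
}
```

For the example labels used above, `check_service "${SERVICE_NAME}" name=nginx type=frontend` lists the endpoints and then runs `kubectl get pods --selector=name=nginx,type=frontend`. An empty endpoints list next to a non-empty Pod list would point toward a port or readiness problem, not a label mismatch.
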