Reorg the monitoring task section (#32823)

* reorg the monitoring task section

Signed-off-by: Paul S. Schweigert <paulschw@us.ibm.com>

* reorg from review comments

Signed-off-by: Paul S. Schweigert <paulschw@us.ibm.com>

* review comments

Signed-off-by: Paul S. Schweigert <paulschw@us.ibm.com>

* review fixes

Signed-off-by: Paul S. Schweigert <paulschw@us.ibm.com>
Paul Schweigert 2022-04-26 00:30:51 -04:00 committed by GitHub
parent 5521d32c12
commit f26e8eff23
20 changed files with 656 additions and 765 deletions

View File

@ -1,6 +0,0 @@
---
title: "Monitoring, Logging, and Debugging"
description: Set up monitoring and logging to troubleshoot a cluster, or debug a containerized application.
weight: 80
---

View File

@ -1,124 +0,0 @@
---
reviewers:
- davidopp
title: Troubleshoot Clusters
content_type: concept
---
<!-- overview -->
This doc is about cluster troubleshooting; we assume you have already ruled out your application as the root cause of the
problem you are experiencing. See
the [application troubleshooting guide](/docs/tasks/debug-application-cluster/debug-application) for tips on application debugging.
You may also visit [troubleshooting document](/docs/tasks/debug-application-cluster/troubleshooting/) for more information.
<!-- body -->
## Listing your cluster
The first thing to debug in your cluster is whether your nodes are all registered correctly.
Run
```shell
kubectl get nodes
```
Verify that all of the nodes you expect to see are present and that they are all in the `Ready` state.
To get detailed information about the overall health of your cluster, you can run:
```shell
kubectl cluster-info dump
```
## Looking at logs
For now, digging deeper into the cluster requires logging into the relevant machines. Here are the locations
of the relevant log files. (note that on systemd-based systems, you may need to use `journalctl` instead)
### Master
* `/var/log/kube-apiserver.log` - API Server, responsible for serving the API
* `/var/log/kube-scheduler.log` - Scheduler, responsible for making scheduling decisions
* `/var/log/kube-controller-manager.log` - Controller that manages replication controllers
### Worker Nodes
* `/var/log/kubelet.log` - Kubelet, responsible for running containers on the node
* `/var/log/kube-proxy.log` - Kube Proxy, responsible for service load balancing
## A general overview of cluster failure modes
This is an incomplete list of things that could go wrong, and how to adjust your cluster setup to mitigate the problems.
### Root causes:
- VM(s) shutdown
- Network partition within cluster, or between cluster and users
- Crashes in Kubernetes software
- Data loss or unavailability of persistent storage (e.g. GCE PD or AWS EBS volume)
- Operator error, for example misconfigured Kubernetes software or application software
### Specific scenarios:
- Apiserver VM shutdown or apiserver crashing
- Results
- unable to stop, update, or start new pods, services, or replication controllers
- existing pods and services should continue to work normally, unless they depend on the Kubernetes API
- Apiserver backing storage lost
- Results
- apiserver should fail to come up
- kubelets will not be able to reach it but will continue to run the same pods and provide the same service proxying
- manual recovery or recreation of apiserver state necessary before apiserver is restarted
- Supporting services (node controller, replication controller manager, scheduler, etc) VM shutdown or crashes
- currently those are colocated with the apiserver, and their unavailability has similar consequences as apiserver
- in future, these will be replicated as well and may not be co-located
- they do not have their own persistent state
- Individual node (VM or physical machine) shuts down
- Results
- pods on that Node stop running
- Network partition
- Results
- partition A thinks the nodes in partition B are down; partition B thinks the apiserver is down. (Assuming the master VM ends up in partition A.)
- Kubelet software fault
- Results
- crashing kubelet cannot start new pods on the node
- kubelet might delete the pods or not
- node marked unhealthy
- replication controllers start new pods elsewhere
- Cluster operator error
- Results
- loss of pods, services, etc
- loss of apiserver backing store
- users unable to read API
- etc.
### Mitigations:
- Action: Use IaaS provider's automatic VM restarting feature for IaaS VMs
- Mitigates: Apiserver VM shutdown or apiserver crashing
- Mitigates: Supporting services VM shutdown or crashes
- Action: Use IaaS provider's reliable storage (e.g. GCE PD or AWS EBS volume) for VMs with apiserver+etcd
- Mitigates: Apiserver backing storage lost
- Action: Use [high-availability](/docs/setup/production-environment/tools/kubeadm/high-availability/) configuration
- Mitigates: Control plane node shutdown or control plane components (scheduler, API server, controller-manager) crashing
- Will tolerate one or more simultaneous node or component failures
- Mitigates: API server backing storage (i.e., etcd's data directory) lost
- Assumes HA (highly-available) etcd configuration
- Action: Snapshot apiserver PDs/EBS-volumes periodically
- Mitigates: Apiserver backing storage lost
- Mitigates: Some cases of operator error
- Mitigates: Some cases of Kubernetes software fault
- Action: use replication controllers and services in front of pods
- Mitigates: Node shutdown
- Mitigates: Kubelet software fault
- Action: applications (containers) designed to tolerate unexpected restarts
- Mitigates: Node shutdown
- Mitigates: Kubelet software fault

View File

@ -1,107 +0,0 @@
---
reviewers:
- bprashanth
title: Debug Pods and ReplicationControllers
content_type: task
---
<!-- overview -->
This page shows how to debug Pods and ReplicationControllers.
## {{% heading "prerequisites" %}}
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
* You should be familiar with the basics of
{{< glossary_tooltip text="Pods" term_id="pod" >}} and with
Pods' [lifecycles](/docs/concepts/workloads/pods/pod-lifecycle/).
<!-- steps -->
## Debugging Pods
The first step in debugging a pod is taking a look at it. Check the current
state of the pod and recent events with the following command:
```shell
kubectl describe pods ${POD_NAME}
```
Look at the state of the containers in the pod. Are they all `Running`? Have
there been recent restarts?
Continue debugging depending on the state of the pods.
### My pod stays pending
If a pod is stuck in `Pending`, it means that it cannot be scheduled onto a
node. Generally this is because there are insufficient resources of one type or
another that prevent scheduling. Look at the output of the `kubectl describe
...` command above. There should be messages from the scheduler about why it
cannot schedule your pod. Reasons include:
#### Insufficient resources
You may have exhausted the supply of CPU or Memory in your cluster. In this
case you can try several things:
* Add more nodes to the cluster.
* [Terminate unneeded pods](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
to make room for pending pods.
* Check that the pod is not larger than your nodes. For example, if all
nodes have a capacity of `cpu:1`, then a pod with a request of `cpu: 1.1`
will never be scheduled.
You can check node capacities with the `kubectl get nodes -o <format>`
command. Here are some example command lines that extract the necessary
information:
```shell
kubectl get nodes -o yaml | egrep '\sname:|cpu:|memory:'
kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, cap: .status.capacity}'
```
The [resource quota](/docs/concepts/policy/resource-quotas/)
feature can be configured to limit the total amount of
resources that can be consumed. If used in conjunction with namespaces, it can
prevent one team from hogging all the resources.
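As a hedged sketch of what such a quota might look like, you could cap the total compute a single namespace may request (the namespace, name, and values below are purely illustrative):
```shell
# Illustrative only: cap aggregate CPU/memory requests and limits for one namespace
kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
EOF
```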
#### Using hostPort
When you bind a pod to a `hostPort` there are a limited number of places that
the pod can be scheduled. In most cases, `hostPort` is unnecessary; try using a
service object to expose your pod. If you do require `hostPort` then you can
only schedule as many pods as there are nodes in your container cluster.
### My pod stays waiting
If a pod is stuck in the `Waiting` state, then it has been scheduled to a
worker node, but it can't run on that machine. Again, the information from
`kubectl describe ...` should be informative. The most common cause of
`Waiting` pods is a failure to pull the image. There are three things to check:
* Make sure that you have the name of the image correct.
* Have you pushed the image to the repository?
* Try to manually pull the image to see if it can be pulled. For example, if you
  use Docker on your PC, run `docker pull <image>` (see the `crictl` sketch after
  this list for containerd-based nodes).
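If your nodes use containerd rather than Docker, a comparable manual check on the node itself could use `crictl` (a sketch; assumes `crictl` is installed and configured on that node):
```shell
# Run on the node: try pulling the image through the CRI runtime
crictl pull <image>
```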
### My pod is crashing or otherwise unhealthy
Once your pod has been scheduled, the methods described in [Debug Running Pods](
/docs/tasks/debug-application-cluster/debug-running-pod/) are available for debugging.
## Debugging ReplicationControllers
ReplicationControllers are fairly straightforward. They can either create pods
or they can't. If they can't create pods, then please refer to the
[instructions above](#debugging-pods) to debug your pods.
You can also use `kubectl describe rc ${CONTROLLER_NAME}` to inspect events
related to the replication controller.

View File

@ -1,333 +0,0 @@
---
reviewers:
- verb
- soltysh
title: Debug Running Pods
content_type: task
---
<!-- overview -->
This page explains how to debug Pods running (or crashing) on a Node.
## {{% heading "prerequisites" %}}
* Your {{< glossary_tooltip text="Pod" term_id="pod" >}} should already be
scheduled and running. If your Pod is not yet running, start with [Troubleshoot
Applications](/docs/tasks/debug-application-cluster/debug-application/).
* For some of the advanced debugging steps you need to know on which Node the
Pod is running and have shell access to run commands on that Node. You don't
need that access to run the standard debug steps that use `kubectl`.
<!-- steps -->
## Examining pod logs {#examine-pod-logs}
First, look at the logs of the affected container:
```shell
kubectl logs ${POD_NAME} ${CONTAINER_NAME}
```
If your container has previously crashed, you can access the previous container's crash log with:
```shell
kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
```
## Debugging with container exec {#container-exec}
If the {{< glossary_tooltip text="container image" term_id="image" >}} includes
debugging utilities, as is the case with images built from Linux and Windows OS
base images, you can run commands inside a specific container with
`kubectl exec`:
```shell
kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}
```
{{< note >}}
`-c ${CONTAINER_NAME}` is optional. You can omit it for Pods that only contain a single container.
{{< /note >}}
As an example, to look at the logs from a running Cassandra pod, you might run
```shell
kubectl exec cassandra -- cat /var/log/cassandra/system.log
```
You can run a shell that's connected to your terminal using the `-i` and `-t`
arguments to `kubectl exec`, for example:
```shell
kubectl exec -it cassandra -- sh
```
For more details, see [Get a Shell to a Running Container](
/docs/tasks/debug-application-cluster/get-shell-running-container/).
## Debugging with an ephemeral debug container {#ephemeral-container}
{{< feature-state state="beta" for_k8s_version="v1.23" >}}
{{< glossary_tooltip text="Ephemeral containers" term_id="ephemeral-container" >}}
are useful for interactive troubleshooting when `kubectl exec` is insufficient
because a container has crashed or a container image doesn't include debugging
utilities, such as with [distroless images](
https://github.com/GoogleContainerTools/distroless).
### Example debugging using ephemeral containers {#ephemeral-container-example}
You can use the `kubectl debug` command to add ephemeral containers to a
running Pod. First, create a pod for the example:
```shell
kubectl run ephemeral-demo --image=k8s.gcr.io/pause:3.1 --restart=Never
```
The examples in this section use the `pause` container image because it does not
contain debugging utilities, but this method works with all container
images.
If you attempt to use `kubectl exec` to create a shell you will see an error
because there is no shell in this container image.
```shell
kubectl exec -it ephemeral-demo -- sh
```
```
OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown
```
You can instead add a debugging container using `kubectl debug`. If you
specify the `-i`/`--interactive` argument, `kubectl` will automatically attach
to the console of the Ephemeral Container.
```shell
kubectl debug -it ephemeral-demo --image=busybox:1.28 --target=ephemeral-demo
```
```
Defaulting debug container name to debugger-8xzrl.
If you don't see a command prompt, try pressing enter.
/ #
```
This command adds a new busybox container and attaches to it. The `--target`
parameter targets the process namespace of another container. It's necessary
here because `kubectl run` does not enable [process namespace sharing](
/docs/tasks/configure-pod-container/share-process-namespace/) in the pod it
creates.
{{< note >}}
The `--target` parameter must be supported by the {{< glossary_tooltip
text="Container Runtime" term_id="container-runtime" >}}. When not supported,
the Ephemeral Container may not be started, or it may be started with an
isolated process namespace so that `ps` does not reveal processes in other
containers.
{{< /note >}}
You can view the state of the newly created ephemeral container using `kubectl describe`:
```shell
kubectl describe pod ephemeral-demo
```
```
...
Ephemeral Containers:
debugger-8xzrl:
Container ID: docker://b888f9adfd15bd5739fefaa39e1df4dd3c617b9902082b1cfdc29c4028ffb2eb
Image: busybox
Image ID: docker-pullable://busybox@sha256:1828edd60c5efd34b2bf5dd3282ec0cc04d47b2ff9caa0b6d4f07a21d1c08084
Port: <none>
Host Port: <none>
State: Running
Started: Wed, 12 Feb 2020 14:25:42 +0100
Ready: False
Restart Count: 0
Environment: <none>
Mounts: <none>
...
```
Use `kubectl delete` to remove the Pod when you're finished:
```shell
kubectl delete pod ephemeral-demo
```
## Debugging using a copy of the Pod
Sometimes Pod configuration options make it difficult to troubleshoot in certain
situations. For example, you can't run `kubectl exec` to troubleshoot your
container if your container image does not include a shell or if your application
crashes on startup. In these situations you can use `kubectl debug` to create a
copy of the Pod with configuration values changed to aid debugging.
### Copying a Pod while adding a new container
Adding a new container can be useful when your application is running but not
behaving as you expect and you'd like to add additional troubleshooting
utilities to the Pod.
For example, maybe your application's container images are built on `busybox`
but you need debugging utilities not included in `busybox`. You can simulate
this scenario using `kubectl run`:
```shell
kubectl run myapp --image=busybox:1.28 --restart=Never -- sleep 1d
```
Run this command to create a copy of `myapp` named `myapp-debug` that adds a
new Ubuntu container for debugging:
```shell
kubectl debug myapp -it --image=ubuntu --share-processes --copy-to=myapp-debug
```
```
Defaulting debug container name to debugger-w7xmf.
If you don't see a command prompt, try pressing enter.
root@myapp-debug:/#
```
{{< note >}}
* `kubectl debug` automatically generates a container name if you don't choose
one using the `--container` flag.
* The `-i` flag causes `kubectl debug` to attach to the new container by
default. You can prevent this by specifying `--attach=false`. If your session
becomes disconnected you can reattach using `kubectl attach`.
* The `--share-processes` flag allows the containers in this Pod to see processes
from the other containers in the Pod. For more information about how this
works, see [Share Process Namespace between Containers in a Pod](
/docs/tasks/configure-pod-container/share-process-namespace/).
{{< /note >}}
Don't forget to clean up the debugging Pod when you're finished with it:
```shell
kubectl delete pod myapp myapp-debug
```
### Copying a Pod while changing its command
Sometimes it's useful to change the command for a container, for example to
add a debugging flag or because the application is crashing.
To simulate a crashing application, use `kubectl run` to create a container
that immediately exits:
```
kubectl run --image=busybox:1.28 myapp -- false
```
You can see using `kubectl describe pod myapp` that this container is crashing:
```
Containers:
myapp:
Image: busybox
...
Args:
false
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
```
You can use `kubectl debug` to create a copy of this Pod with the command
changed to an interactive shell:
```
kubectl debug myapp -it --copy-to=myapp-debug --container=myapp -- sh
```
```
If you don't see a command prompt, try pressing enter.
/ #
```
Now you have an interactive shell that you can use to perform tasks like
checking filesystem paths or running the container command manually.
{{< note >}}
* To change the command of a specific container you must
specify its name using `--container` or `kubectl debug` will instead
create a new container to run the command you specified.
* The `-i` flag causes `kubectl debug` to attach to the container by default.
You can prevent this by specifying `--attach=false`. If your session becomes
disconnected you can reattach using `kubectl attach`.
{{< /note >}}
Don't forget to clean up the debugging Pod when you're finished with it:
```shell
kubectl delete pod myapp myapp-debug
```
### Copying a Pod while changing container images
In some situations you may want to change a misbehaving Pod from its normal
production container images to an image containing a debugging build or
additional utilities.
As an example, create a Pod using `kubectl run`:
```
kubectl run myapp --image=busybox:1.28 --restart=Never -- sleep 1d
```
Now use `kubectl debug` to make a copy and change its container image
to `ubuntu`:
```
kubectl debug myapp --copy-to=myapp-debug --set-image=*=ubuntu
```
The syntax of `--set-image` uses the same `container_name=image` syntax as
`kubectl set image`. `*=ubuntu` means change the image of all containers
to `ubuntu`.
Don't forget to clean up the debugging Pod when you're finished with it:
```shell
kubectl delete pod myapp myapp-debug
```
## Debugging via a shell on the node {#node-shell-session}
If none of these approaches work, you can find the Node on which the Pod is
running and create a privileged Pod running in the host namespaces. To create
an interactive shell on a node using `kubectl debug`, run:
```shell
kubectl debug node/mynode -it --image=ubuntu
```
```
Creating debugging pod node-debugger-mynode-pdx84 with container debugger on node mynode.
If you don't see a command prompt, try pressing enter.
root@ek8s:/#
```
When creating a debugging session on a node, keep in mind that:
* `kubectl debug` automatically generates the name of the new Pod based on
the name of the Node.
* The container runs in the host IPC, Network, and PID namespaces.
* The root filesystem of the Node will be mounted at `/host`.
Don't forget to clean up the debugging Pod when you're finished with it:
```shell
kubectl delete pod node-debugger-mynode-pdx84
```

View File

@ -1,9 +1,12 @@
---
title: "Monitoring, Logging, and Debugging"
description: Set up monitoring and logging to troubleshoot a cluster, or debug a containerized application.
weight: 20
reviewers:
- brendandburns
- davidopp
content_type: concept
title: Troubleshooting
no_list: true
---
<!-- overview -->
@ -11,9 +14,9 @@ title: Troubleshooting
Sometimes things go wrong. This guide is aimed at making them right. It has
two sections:
* [Troubleshooting your application](/docs/tasks/debug-application-cluster/debug-application/) - Useful
* [Debugging your application](/docs/tasks/debug/debug-application/) - Useful
for users who are deploying code into Kubernetes and wondering why it is not working.
* [Troubleshooting your cluster](/docs/tasks/debug-application-cluster/debug-cluster/) - Useful
* [Debugging your cluster](/docs/tasks/debug/debug-cluster/) - Useful
for cluster administrators and people whose Kubernetes cluster is unhappy.
You should also check the known issues for the [release](https://github.com/kubernetes/kubernetes/releases)

View File

@ -0,0 +1,8 @@
---
title: "Troubleshooting Applications"
description: Debugging common containerized application issues.
weight: 20
---
This doc contains a set of resources for fixing issues with containerized applications. It covers common issues with Kubernetes resources (such as Pods, Services, or StatefulSets), advice on making sense of container termination messages, and ways to debug running containers.

View File

@ -9,6 +9,7 @@ reviewers:
- smarterclayton
title: Debug Init Containers
content_type: task
weight: 40
---
<!-- overview -->

View File

@ -2,15 +2,16 @@
reviewers:
- mikedanese
- thockin
title: Troubleshoot Applications
title: Debug Pods
content_type: concept
weight: 10
---
<!-- overview -->
This guide is to help users debug applications that are deployed into Kubernetes and not behaving correctly.
This is *not* a guide for people who want to debug their cluster. For that you should check out
[this guide](/docs/tasks/debug-application-cluster/debug-cluster).
[this guide](/docs/tasks/debug/debug-cluster).
<!-- body -->
@ -64,7 +65,7 @@ Again, the information from `kubectl describe ...` should be informative. The m
#### My pod is crashing or otherwise unhealthy
Once your pod has been scheduled, the methods described in [Debug Running Pods](
/docs/tasks/debug-application-cluster/debug-running-pod/) are available for debugging.
/docs/tasks/debug/debug-application/debug-running-pod/) are available for debugging.
#### My pod is running but not doing what I told it to do
@ -145,15 +146,15 @@ Verify that the pod's `containerPort` matches up with the Service's `targetPort`
#### Network traffic is not forwarded
Please see [debugging service](/docs/tasks/debug-application-cluster/debug-service/) for more information.
Please see [debugging service](/docs/tasks/debug/debug-application/debug-service/) for more information.
## {{% heading "whatsnext" %}}
If none of the above solves your problem, follow the instructions in
[Debugging Service document](/docs/tasks/debug-application-cluster/debug-service/)
[Debugging Service document](/docs/tasks/debug/debug-application/debug-service/)
to make sure that your `Service` is running, has `Endpoints`, and your `Pods` are
actually serving; you have DNS working, iptables rules installed, and kube-proxy
does not seem to be misbehaving.
You may also visit [troubleshooting document](/docs/tasks/debug-application-cluster/troubleshooting/) for more information.
You may also visit the [troubleshooting document](/docs/tasks/debug/overview/) for more information.

View File

@ -1,21 +1,25 @@
---
reviewers:
- janetkuo
- thockin
content_type: concept
title: Application Introspection and Debugging
- verb
- soltysh
title: Debug Running Pods
content_type: task
---
<!-- overview -->
Once your application is running, you'll inevitably need to debug problems with it.
Earlier we described how you can use `kubectl get pods` to retrieve simple status information about
your pods. But there are a number of ways to get even more information about your application.
This page explains how to debug Pods running (or crashing) on a Node.
## {{% heading "prerequisites" %}}
<!-- body -->
* Your {{< glossary_tooltip text="Pod" term_id="pod" >}} should already be
scheduled and running. If your Pod is not yet running, start with [Debugging
Pods](/docs/tasks/debug/debug-application/).
* For some of the advanced debugging steps you need to know on which Node the
Pod is running and have shell access to run commands on that Node. You don't
need that access to run the standard debug steps that use `kubectl`.
## Using `kubectl describe pod` to fetch details about pods
@ -125,6 +129,7 @@ Currently the only Condition associated with a Pod is the binary Ready condition
Lastly, you see a log of recent events related to your Pod. The system compresses multiple identical events by indicating the first and last time it was seen and the number of times it was seen. "From" indicates the component that is logging the event, "SubobjectPath" tells you which object (e.g. container within the pod) is being referred to, and "Reason" and "Message" tell you what happened.
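If you want to see recent events for the whole namespace rather than the per-object view from `kubectl describe`, one option is to list them sorted by time (a quick sketch):
```shell
kubectl get events --sort-by=.metadata.creationTimestamp
```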
## Example: debugging Pending Pods
A common scenario that you can detect using events is when you've created a Pod that won't fit on any node. For example, the Pod might request more resources than are free on any node, or it might specify a label selector that doesn't match any nodes. Let's say we created the previous Deployment with 5 replicas (instead of 2) and requesting 600 millicores instead of 500, on a four-node cluster where each (virtual) machine has 1 CPU. In that case one of the Pods will not be able to schedule. (Note that because of the cluster addon pods such as fluentd, skydns, etc., that run on each node, if we requested 1000 millicores then none of the Pods would be able to schedule.)
@ -326,197 +331,308 @@ status:
startTime: "2022-02-17T21:51:01Z"
```
## Example: debugging a down/unreachable node
## Examining pod logs {#examine-pod-logs}
Sometimes when debugging it can be useful to look at the status of a node -- for example, because you've noticed strange behavior of a Pod that's running on the node, or to find out why a Pod won't schedule onto the node. As with Pods, you can use `kubectl describe node` and `kubectl get node -o yaml` to retrieve detailed information about nodes. For example, here's what you'll see if a node is down (disconnected from the network, or kubelet dies and won't restart, etc.). Notice the events that show the node is NotReady, and also notice that the pods are no longer running (they are evicted after five minutes of NotReady status).
First, look at the logs of the affected container:
```shell
kubectl get nodes
kubectl logs ${POD_NAME} ${CONTAINER_NAME}
```
```none
NAME STATUS ROLES AGE VERSION
kube-worker-1 NotReady <none> 1h v1.23.3
kubernetes-node-bols Ready <none> 1h v1.23.3
kubernetes-node-st6x Ready <none> 1h v1.23.3
kubernetes-node-unaj Ready <none> 1h v1.23.3
```
If your container has previously crashed, you can access the previous container's crash log with:
```shell
kubectl describe node kube-worker-1
kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
```
```none
Name: kube-worker-1
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=kube-worker-1
kubernetes.io/os=linux
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Thu, 17 Feb 2022 16:46:30 -0500
Taints: node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unreachable:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: kube-worker-1
AcquireTime: <unset>
RenewTime: Thu, 17 Feb 2022 17:13:09 -0500
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Thu, 17 Feb 2022 17:09:13 -0500 Thu, 17 Feb 2022 17:09:13 -0500 WeaveIsUp Weave pod has set this
MemoryPressure Unknown Thu, 17 Feb 2022 17:12:40 -0500 Thu, 17 Feb 2022 17:13:52 -0500 NodeStatusUnknown Kubelet stopped posting node status.
DiskPressure Unknown Thu, 17 Feb 2022 17:12:40 -0500 Thu, 17 Feb 2022 17:13:52 -0500 NodeStatusUnknown Kubelet stopped posting node status.
PIDPressure Unknown Thu, 17 Feb 2022 17:12:40 -0500 Thu, 17 Feb 2022 17:13:52 -0500 NodeStatusUnknown Kubelet stopped posting node status.
Ready Unknown Thu, 17 Feb 2022 17:12:40 -0500 Thu, 17 Feb 2022 17:13:52 -0500 NodeStatusUnknown Kubelet stopped posting node status.
Addresses:
InternalIP: 192.168.0.113
Hostname: kube-worker-1
Capacity:
cpu: 2
ephemeral-storage: 15372232Ki
hugepages-2Mi: 0
memory: 2025188Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 14167048988
hugepages-2Mi: 0
memory: 1922788Ki
pods: 110
System Info:
Machine ID: 9384e2927f544209b5d7b67474bbf92b
System UUID: aa829ca9-73d7-064d-9019-df07404ad448
Boot ID: 5a295a03-aaca-4340-af20-1327fa5dab5c
Kernel Version: 5.13.0-28-generic
OS Image: Ubuntu 21.10
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.5.9
Kubelet Version: v1.23.3
Kube-Proxy Version: v1.23.3
Non-terminated Pods: (4 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
default nginx-deployment-67d4bdd6f5-cx2nz 500m (25%) 500m (25%) 128Mi (6%) 128Mi (6%) 23m
default nginx-deployment-67d4bdd6f5-w6kd7 500m (25%) 500m (25%) 128Mi (6%) 128Mi (6%) 23m
kube-system kube-proxy-dnxbz 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28m
kube-system weave-net-gjxxp 100m (5%) 0 (0%) 200Mi (10%) 0 (0%) 28m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1100m (55%) 1 (50%)
memory 456Mi (24%) 256Mi (13%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
## Debugging with container exec {#container-exec}
If the {{< glossary_tooltip text="container image" term_id="image" >}} includes
debugging utilities, as is the case with images built from Linux and Windows OS
base images, you can run commands inside a specific container with
`kubectl exec`:
```shell
kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}
```
{{< note >}}
`-c ${CONTAINER_NAME}` is optional. You can omit it for Pods that only contain a single container.
{{< /note >}}
As an example, to look at the logs from a running Cassandra pod, you might run
```shell
kubectl exec cassandra -- cat /var/log/cassandra/system.log
```
You can run a shell that's connected to your terminal using the `-i` and `-t`
arguments to `kubectl exec`, for example:
```shell
kubectl exec -it cassandra -- sh
```
For more details, see [Get a Shell to a Running Container](
/docs/tasks/debug/debug-application/get-shell-running-container/).
## Debugging with an ephemeral debug container {#ephemeral-container}
{{< feature-state state="beta" for_k8s_version="v1.23" >}}
{{< glossary_tooltip text="Ephemeral containers" term_id="ephemeral-container" >}}
are useful for interactive troubleshooting when `kubectl exec` is insufficient
because a container has crashed or a container image doesn't include debugging
utilities, such as with [distroless images](
https://github.com/GoogleContainerTools/distroless).
### Example debugging using ephemeral containers {#ephemeral-container-example}
You can use the `kubectl debug` command to add ephemeral containers to a
running Pod. First, create a pod for the example:
```shell
kubectl run ephemeral-demo --image=k8s.gcr.io/pause:3.1 --restart=Never
```
The examples in this section use the `pause` container image because it does not
contain debugging utilities, but this method works with all container
images.
If you attempt to use `kubectl exec` to create a shell you will see an error
because there is no shell in this container image.
```shell
kubectl exec -it ephemeral-demo -- sh
```
```
OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown
```
You can instead add a debugging container using `kubectl debug`. If you
specify the `-i`/`--interactive` argument, `kubectl` will automatically attach
to the console of the Ephemeral Container.
```shell
kubectl debug -it ephemeral-demo --image=busybox:1.28 --target=ephemeral-demo
```
```
Defaulting debug container name to debugger-8xzrl.
If you don't see a command prompt, try pressing enter.
/ #
```
This command adds a new busybox container and attaches to it. The `--target`
parameter targets the process namespace of another container. It's necessary
here because `kubectl run` does not enable [process namespace sharing](
/docs/tasks/configure-pod-container/share-process-namespace/) in the pod it
creates.
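If you create the Pod from a manifest instead of `kubectl run`, you can enable process namespace sharing directly, in which case `--target` is not needed; a minimal sketch (the Pod name `shared-pid-demo` is illustrative):
```shell
# Illustrative only: a Pod whose containers (and ephemeral debug containers) share one PID namespace
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: shared-pid-demo
spec:
  shareProcessNamespace: true
  containers:
  - name: app
    image: busybox:1.28
    command: ["sleep", "86400"]
EOF
```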
{{< note >}}
The `--target` parameter must be supported by the {{< glossary_tooltip
text="Container Runtime" term_id="container-runtime" >}}. When not supported,
the Ephemeral Container may not be started, or it may be started with an
isolated process namespace so that `ps` does not reveal processes in other
containers.
{{< /note >}}
You can view the state of the newly created ephemeral container using `kubectl describe`:
```shell
kubectl describe pod ephemeral-demo
```
```
...
Ephemeral Containers:
debugger-8xzrl:
Container ID: docker://b888f9adfd15bd5739fefaa39e1df4dd3c617b9902082b1cfdc29c4028ffb2eb
Image: busybox
Image ID: docker-pullable://busybox@sha256:1828edd60c5efd34b2bf5dd3282ec0cc04d47b2ff9caa0b6d4f07a21d1c08084
Port: <none>
Host Port: <none>
State: Running
Started: Wed, 12 Feb 2020 14:25:42 +0100
Ready: False
Restart Count: 0
Environment: <none>
Mounts: <none>
...
```
Use `kubectl delete` to remove the Pod when you're finished:
```shell
kubectl get node kube-worker-1 -o yaml
kubectl delete pod ephemeral-demo
```
```yaml
apiVersion: v1
kind: Node
metadata:
annotations:
kubeadm.alpha.kubernetes.io/cri-socket: /run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2022-02-17T21:46:30Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: kube-worker-1
kubernetes.io/os: linux
name: kube-worker-1
resourceVersion: "4026"
uid: 98efe7cb-2978-4a0b-842a-1a7bf12c05f8
spec: {}
status:
addresses:
- address: 192.168.0.113
type: InternalIP
- address: kube-worker-1
type: Hostname
allocatable:
cpu: "2"
ephemeral-storage: "14167048988"
hugepages-2Mi: "0"
memory: 1922788Ki
pods: "110"
capacity:
cpu: "2"
ephemeral-storage: 15372232Ki
hugepages-2Mi: "0"
memory: 2025188Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2022-02-17T22:20:32Z"
lastTransitionTime: "2022-02-17T22:20:32Z"
message: Weave pod has set this
reason: WeaveIsUp
status: "False"
type: NetworkUnavailable
- lastHeartbeatTime: "2022-02-17T22:20:15Z"
lastTransitionTime: "2022-02-17T22:13:25Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2022-02-17T22:20:15Z"
lastTransitionTime: "2022-02-17T22:13:25Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2022-02-17T22:20:15Z"
lastTransitionTime: "2022-02-17T22:13:25Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2022-02-17T22:20:15Z"
lastTransitionTime: "2022-02-17T22:15:15Z"
message: kubelet is posting ready status. AppArmor enabled
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
nodeInfo:
architecture: amd64
bootID: 22333234-7a6b-44d4-9ce1-67e31dc7e369
containerRuntimeVersion: containerd://1.5.9
kernelVersion: 5.13.0-28-generic
kubeProxyVersion: v1.23.3
kubeletVersion: v1.23.3
machineID: 9384e2927f544209b5d7b67474bbf92b
operatingSystem: linux
osImage: Ubuntu 21.10
systemUUID: aa829ca9-73d7-064d-9019-df07404ad448
## Debugging using a copy of the Pod
Sometimes Pod configuration options make it difficult to troubleshoot in certain
situations. For example, you can't run `kubectl exec` to troubleshoot your
container if your container image does not include a shell or if your application
crashes on startup. In these situations you can use `kubectl debug` to create a
copy of the Pod with configuration values changed to aid debugging.
### Copying a Pod while adding a new container
Adding a new container can be useful when your application is running but not
behaving as you expect and you'd like to add additional troubleshooting
utilities to the Pod.
For example, maybe your application's container images are built on `busybox`
but you need debugging utilities not included in `busybox`. You can simulate
this scenario using `kubectl run`:
```shell
kubectl run myapp --image=busybox:1.28 --restart=Never -- sleep 1d
```
Run this command to create a copy of `myapp` named `myapp-debug` that adds a
new Ubuntu container for debugging:
## {{% heading "whatsnext" %}}
```shell
kubectl debug myapp -it --image=ubuntu --share-processes --copy-to=myapp-debug
```
```
Defaulting debug container name to debugger-w7xmf.
If you don't see a command prompt, try pressing enter.
root@myapp-debug:/#
```
Learn about additional debugging tools, including:
{{< note >}}
* `kubectl debug` automatically generates a container name if you don't choose
one using the `--container` flag.
* The `-i` flag causes `kubectl debug` to attach to the new container by
default. You can prevent this by specifying `--attach=false`. If your session
becomes disconnected you can reattach using `kubectl attach`.
* The `--share-processes` flag allows the containers in this Pod to see processes
from the other containers in the Pod. For more information about how this
works, see [Share Process Namespace between Containers in a Pod](
/docs/tasks/configure-pod-container/share-process-namespace/).
{{< /note >}}
* [Logging](/docs/concepts/cluster-administration/logging/)
* [Monitoring](/docs/tasks/debug-application-cluster/resource-usage-monitoring/)
* [Getting into containers via `exec`](/docs/tasks/debug-application-cluster/get-shell-running-container/)
* [Connecting to containers via proxies](/docs/tasks/extend-kubernetes/http-proxy-access-api/)
* [Connecting to containers via port forwarding](/docs/tasks/access-application-cluster/port-forward-access-application-cluster/)
* [Inspect Kubernetes node with crictl](/docs/tasks/debug-application-cluster/crictl/)
Don't forget to clean up the debugging Pod when you're finished with it:
```shell
kubectl delete pod myapp myapp-debug
```
### Copying a Pod while changing its command
Sometimes it's useful to change the command for a container, for example to
add a debugging flag or because the application is crashing.
To simulate a crashing application, use `kubectl run` to create a container
that immediately exits:
```
kubectl run --image=busybox:1.28 myapp -- false
```
You can see using `kubectl describe pod myapp` that this container is crashing:
```
Containers:
myapp:
Image: busybox
...
Args:
false
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
```
You can use `kubectl debug` to create a copy of this Pod with the command
changed to an interactive shell:
```
kubectl debug myapp -it --copy-to=myapp-debug --container=myapp -- sh
```
```
If you don't see a command prompt, try pressing enter.
/ #
```
Now you have an interactive shell that you can use to perform tasks like
checking filesystem paths or running the container command manually.
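For instance, sticking with this example (whose container command is `false`), you might re-run the command by hand and confirm the exit code that caused the crash loop:
```shell
# Inside the myapp-debug shell
false; echo $?   # prints 1, matching the Exit Code seen in kubectl describe
ls /             # check that the filesystem looks the way the application expects
```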
{{< note >}}
* To change the command of a specific container you must
specify its name using `--container` or `kubectl debug` will instead
create a new container to run the command you specified.
* The `-i` flag causes `kubectl debug` to attach to the container by default.
You can prevent this by specifying `--attach=false`. If your session becomes
disconnected you can reattach using `kubectl attach`.
{{< /note >}}
Don't forget to clean up the debugging Pod when you're finished with it:
```shell
kubectl delete pod myapp myapp-debug
```
### Copying a Pod while changing container images
In some situations you may want to change a misbehaving Pod from its normal
production container images to an image containing a debugging build or
additional utilities.
As an example, create a Pod using `kubectl run`:
```
kubectl run myapp --image=busybox:1.28 --restart=Never -- sleep 1d
```
Now use `kubectl debug` to make a copy and change its container image
to `ubuntu`:
```
kubectl debug myapp --copy-to=myapp-debug --set-image=*=ubuntu
```
The syntax of `--set-image` uses the same `container_name=image` syntax as
`kubectl set image`. `*=ubuntu` means change the image of all containers
to `ubuntu`.
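To change only one container, name it explicitly instead of using `*`; for example, to swap just the `myapp` container's image while leaving any others untouched:
```shell
kubectl debug myapp --copy-to=myapp-debug --set-image=myapp=ubuntu
```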
Don't forget to clean up the debugging Pod when you're finished with it:
```shell
kubectl delete pod myapp myapp-debug
```
## Debugging via a shell on the node {#node-shell-session}
If none of these approaches work, you can find the Node on which the Pod is
running and create a privileged Pod running in the host namespaces. To create
an interactive shell on a node using `kubectl debug`, run:
```shell
kubectl debug node/mynode -it --image=ubuntu
```
```
Creating debugging pod node-debugger-mynode-pdx84 with container debugger on node mynode.
If you don't see a command prompt, try pressing enter.
root@ek8s:/#
```
When creating a debugging session on a node, keep in mind that:
* `kubectl debug` automatically generates the name of the new Pod based on
the name of the Node.
* The container runs in the host IPC, Network, and PID namespaces.
* The root filesystem of the Node will be mounted at `/host` (see the sketch after this list).
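For example, once the shell is attached you can inspect the Node's own filesystem through `/host`; a short sketch (assumes the node image ships `/bin/bash` — fall back to `sh` otherwise):
```shell
# Inside the node debugging shell
ls /host/var/log        # read node-level logs
chroot /host /bin/bash  # optionally work as if logged in to the node directly
```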
Don't forget to clean up the debugging Pod when you're finished with it:
```shell
kubectl delete pod node-debugger-mynode-pdx84
```

View File

@ -4,6 +4,7 @@ reviewers:
- bowei
content_type: concept
title: Debug Services
weight: 20
---
<!-- overview -->
@ -441,7 +442,7 @@ they are running fine and not crashing.
The "RESTARTS" column says that these pods are not crashing frequently or being
restarted. Frequent restarts could lead to intermittent connectivity issues.
If the restart count is high, read more about how to [debug pods](/docs/tasks/debug-application-cluster/debug-pod-replication-controller/#debugging-pods).
If the restart count is high, read more about how to [debug pods](/docs/tasks/debug/debug-application/debug-pods).
Inside the Kubernetes system is a control loop which evaluates the selector of
every Service and saves the results into a corresponding Endpoints object.
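For example, you can check whether that Endpoints object has been populated for the Service you are debugging (`${SERVICE_NAME}` being whatever Service you are looking at):
```shell
kubectl get endpoints ${SERVICE_NAME}
```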
@ -727,13 +728,13 @@ Service is not working. Please let us know what is going on, so we can help
investigate!
Contact us on
[Slack](/docs/tasks/debug-application-cluster/troubleshooting/#slack) or
[Slack](/docs/tasks/debug/overview/#slack) or
[Forum](https://discuss.kubernetes.io) or
[GitHub](https://github.com/kubernetes/kubernetes).
## {{% heading "whatsnext" %}}
Visit [troubleshooting document](/docs/tasks/debug-application-cluster/troubleshooting/)
Visit the [troubleshooting overview document](/docs/tasks/debug/overview/)
for more information.

View File

@ -9,6 +9,7 @@ reviewers:
- smarterclayton
title: Debug a StatefulSet
content_type: task
weight: 30
---
<!-- overview -->
@ -34,9 +35,9 @@ If you find that any Pods listed are in `Unknown` or `Terminating` state for an
refer to the [Deleting StatefulSet Pods](/docs/tasks/run-application/delete-stateful-set/) task for
instructions on how to deal with them.
You can debug individual Pods in a StatefulSet using the
[Debugging Pods](/docs/tasks/debug-application-cluster/debug-pod-replication-controller/) guide.
[Debugging Pods](/docs/tasks/debug/debug-application/debug-pods/) guide.
## {{% heading "whatsnext" %}}
Learn more about [debugging an init-container](/docs/tasks/debug-application-cluster/debug-init-containers/).
Learn more about [debugging an init-container](/docs/tasks/debug/debug-application/debug-init-containers/).

View File

@ -0,0 +1,316 @@
---
reviewers:
- davidopp
title: "Troubleshooting Clusters"
description: Debugging common cluster issues.
weight: 20
no_list: true
---
<!-- overview -->
This doc is about cluster troubleshooting; we assume you have already ruled out your application as the root cause of the
problem you are experiencing. See
the [application troubleshooting guide](/docs/tasks/debug/debug-application/) for tips on application debugging.
You may also visit the [troubleshooting overview document](/docs/tasks/debug/) for more information.
<!-- body -->
## Listing your cluster
The first thing to debug in your cluster is whether your nodes are all registered correctly.
Run the following command:
```shell
kubectl get nodes
```
Verify that all of the nodes you expect to see are present and that they are all in the `Ready` state.
To get detailed information about the overall health of your cluster, you can run:
```shell
kubectl cluster-info dump
```
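The dump can be large; if you would rather capture it as files for later inspection, `kubectl cluster-info dump` can write to a directory (the path below is only an example):
```shell
kubectl cluster-info dump --output-directory=/tmp/cluster-state
```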
### Example: debugging a down/unreachable node
Sometimes when debugging it can be useful to look at the status of a node -- for example, because you've noticed strange behavior of a Pod that's running on the node, or to find out why a Pod won't schedule onto the node. As with Pods, you can use `kubectl describe node` and `kubectl get node -o yaml` to retrieve detailed information about nodes. For example, here's what you'll see if a node is down (disconnected from the network, or kubelet dies and won't restart, etc.). Notice the events that show the node is NotReady, and also notice that the pods are no longer running (they are evicted after five minutes of NotReady status).
```shell
kubectl get nodes
```
```none
NAME STATUS ROLES AGE VERSION
kube-worker-1 NotReady <none> 1h v1.23.3
kubernetes-node-bols Ready <none> 1h v1.23.3
kubernetes-node-st6x Ready <none> 1h v1.23.3
kubernetes-node-unaj Ready <none> 1h v1.23.3
```
```shell
kubectl describe node kube-worker-1
```
```none
Name: kube-worker-1
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=kube-worker-1
kubernetes.io/os=linux
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Thu, 17 Feb 2022 16:46:30 -0500
Taints: node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unreachable:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: kube-worker-1
AcquireTime: <unset>
RenewTime: Thu, 17 Feb 2022 17:13:09 -0500
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Thu, 17 Feb 2022 17:09:13 -0500 Thu, 17 Feb 2022 17:09:13 -0500 WeaveIsUp Weave pod has set this
MemoryPressure Unknown Thu, 17 Feb 2022 17:12:40 -0500 Thu, 17 Feb 2022 17:13:52 -0500 NodeStatusUnknown Kubelet stopped posting node status.
DiskPressure Unknown Thu, 17 Feb 2022 17:12:40 -0500 Thu, 17 Feb 2022 17:13:52 -0500 NodeStatusUnknown Kubelet stopped posting node status.
PIDPressure Unknown Thu, 17 Feb 2022 17:12:40 -0500 Thu, 17 Feb 2022 17:13:52 -0500 NodeStatusUnknown Kubelet stopped posting node status.
Ready Unknown Thu, 17 Feb 2022 17:12:40 -0500 Thu, 17 Feb 2022 17:13:52 -0500 NodeStatusUnknown Kubelet stopped posting node status.
Addresses:
InternalIP: 192.168.0.113
Hostname: kube-worker-1
Capacity:
cpu: 2
ephemeral-storage: 15372232Ki
hugepages-2Mi: 0
memory: 2025188Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 14167048988
hugepages-2Mi: 0
memory: 1922788Ki
pods: 110
System Info:
Machine ID: 9384e2927f544209b5d7b67474bbf92b
System UUID: aa829ca9-73d7-064d-9019-df07404ad448
Boot ID: 5a295a03-aaca-4340-af20-1327fa5dab5c
Kernel Version: 5.13.0-28-generic
OS Image: Ubuntu 21.10
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.5.9
Kubelet Version: v1.23.3
Kube-Proxy Version: v1.23.3
Non-terminated Pods: (4 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
default nginx-deployment-67d4bdd6f5-cx2nz 500m (25%) 500m (25%) 128Mi (6%) 128Mi (6%) 23m
default nginx-deployment-67d4bdd6f5-w6kd7 500m (25%) 500m (25%) 128Mi (6%) 128Mi (6%) 23m
kube-system kube-proxy-dnxbz 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28m
kube-system weave-net-gjxxp 100m (5%) 0 (0%) 200Mi (10%) 0 (0%) 28m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1100m (55%) 1 (50%)
memory 456Mi (24%) 256Mi (13%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
...
```
```shell
kubectl get node kube-worker-1 -o yaml
```
```yaml
apiVersion: v1
kind: Node
metadata:
annotations:
kubeadm.alpha.kubernetes.io/cri-socket: /run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2022-02-17T21:46:30Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: kube-worker-1
kubernetes.io/os: linux
name: kube-worker-1
resourceVersion: "4026"
uid: 98efe7cb-2978-4a0b-842a-1a7bf12c05f8
spec: {}
status:
addresses:
- address: 192.168.0.113
type: InternalIP
- address: kube-worker-1
type: Hostname
allocatable:
cpu: "2"
ephemeral-storage: "14167048988"
hugepages-2Mi: "0"
memory: 1922788Ki
pods: "110"
capacity:
cpu: "2"
ephemeral-storage: 15372232Ki
hugepages-2Mi: "0"
memory: 2025188Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2022-02-17T22:20:32Z"
lastTransitionTime: "2022-02-17T22:20:32Z"
message: Weave pod has set this
reason: WeaveIsUp
status: "False"
type: NetworkUnavailable
- lastHeartbeatTime: "2022-02-17T22:20:15Z"
lastTransitionTime: "2022-02-17T22:13:25Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2022-02-17T22:20:15Z"
lastTransitionTime: "2022-02-17T22:13:25Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2022-02-17T22:20:15Z"
lastTransitionTime: "2022-02-17T22:13:25Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2022-02-17T22:20:15Z"
lastTransitionTime: "2022-02-17T22:15:15Z"
message: kubelet is posting ready status. AppArmor enabled
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
nodeInfo:
architecture: amd64
bootID: 22333234-7a6b-44d4-9ce1-67e31dc7e369
containerRuntimeVersion: containerd://1.5.9
kernelVersion: 5.13.0-28-generic
kubeProxyVersion: v1.23.3
kubeletVersion: v1.23.3
machineID: 9384e2927f544209b5d7b67474bbf92b
operatingSystem: linux
osImage: Ubuntu 21.10
systemUUID: aa829ca9-73d7-064d-9019-df07404ad448
```
## Looking at logs
For now, digging deeper into the cluster requires logging into the relevant machines. Here are the locations
of the relevant log files. On systemd-based systems, you may need to use `journalctl` instead of examining log files.
### Control Plane nodes
* `/var/log/kube-apiserver.log` - API Server, responsible for serving the API
* `/var/log/kube-scheduler.log` - Scheduler, responsible for making scheduling decisions
* `/var/log/kube-controller-manager.log` - Controller Manager, a component that runs most Kubernetes built-in {{< glossary_tooltip text="controllers" term_id="controller" >}}, with the notable exception of scheduling (the kube-scheduler handles scheduling).
### Worker Nodes
* `/var/log/kubelet.log` - logs from the kubelet, responsible for running containers on the node
* `/var/log/kube-proxy.log` - logs from `kube-proxy`, which is responsible for directing traffic to Service endpoints
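On systemd-based systems where these components log to the journal instead of files under `/var/log`, you can read the same logs with `journalctl`; a minimal sketch, assuming the kubelet runs as the `kubelet` systemd unit:
```shell
journalctl -u kubelet --since "1 hour ago"
```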
## Cluster failure modes
This is an incomplete list of things that could go wrong, and how to adjust your cluster setup to mitigate the problems.
### Contributing causes
- VM(s) shutdown
- Network partition within cluster, or between cluster and users
- Crashes in Kubernetes software
- Data loss or unavailability of persistent storage (e.g. GCE PD or AWS EBS volume)
- Operator error, for example misconfigured Kubernetes software or application software
### Specific scenarios
- API server VM shutdown or apiserver crashing
- Results
- unable to stop, update, or start new pods, services, or replication controllers
- existing pods and services should continue to work normally, unless they depend on the Kubernetes API
- API server backing storage lost
- Results
- the kube-apiserver component fails to start successfully and become healthy
- kubelets will not be able to reach it but will continue to run the same pods and provide the same service proxying
- manual recovery or recreation of apiserver state necessary before apiserver is restarted
- Supporting services (node controller, replication controller manager, scheduler, etc) VM shutdown or crashes
- currently those are colocated with the apiserver, and their unavailability has similar consequences as apiserver
- in future, these will be replicated as well and may not be co-located
- they do not have their own persistent state
- Individual node (VM or physical machine) shuts down
- Results
- pods on that Node stop running
- Network partition
- Results
- partition A thinks the nodes in partition B are down; partition B thinks the apiserver is down. (Assuming the control plane VM ends up in partition A.)
- Kubelet software fault
- Results
- crashing kubelet cannot start new pods on the node
- kubelet might delete the pods or not
- node marked unhealthy
- replication controllers start new pods elsewhere
- Cluster operator error
- Results
- loss of pods, services, etc
- loss of apiserver backing store
- users unable to read API
- etc.
### Mitigations
- Action: Use IaaS provider's automatic VM restarting feature for IaaS VMs
- Mitigates: Apiserver VM shutdown or apiserver crashing
- Mitigates: Supporting services VM shutdown or crashes
- Action: Use IaaS provider's reliable storage (e.g. GCE PD or AWS EBS volume) for VMs with apiserver+etcd
- Mitigates: Apiserver backing storage lost
- Action: Use [high-availability](/docs/setup/production-environment/tools/kubeadm/high-availability/) configuration
- Mitigates: Control plane node shutdown or control plane components (scheduler, API server, controller-manager) crashing
- Will tolerate one or more simultaneous node or component failures
- Mitigates: API server backing storage (i.e., etcd's data directory) lost
- Assumes HA (highly-available) etcd configuration
- Action: Snapshot apiserver PDs/EBS-volumes periodically (see the `etcdctl` sketch after this list)
- Mitigates: Apiserver backing storage lost
- Mitigates: Some cases of operator error
- Mitigates: Some cases of Kubernetes software fault
- Action: use replication controllers and services in front of pods
- Mitigates: Node shutdown
- Mitigates: Kubelet software fault
- Action: applications (containers) designed to tolerate unexpected restarts
- Mitigates: Node shutdown
- Mitigates: Kubelet software fault
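As a sketch of the snapshot action above, when etcd is the backing store you can take a snapshot with `etcdctl`; the endpoint and certificate paths below assume a kubeadm-style layout and will differ on other clusters:
```shell
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /var/backups/etcd-snapshot.db
```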
## {{% heading "whatsnext" %}}
* Learn about the metrics available in the [Resource Metrics Pipeline](resource-metrics-pipeline)
* Discover additional tools for [monitoring resource usage](resource-usage-monitoring)
* Use Node Problem Detector to [monitor node health](monitor-node-health)
* Use `crictl` to [debug Kubernetes nodes](crictl)
* Get more information about [Kubernetes auditing](audit)
* Use `telepresence` to [develop and debug services locally](local-debugging)

View File

@ -5,6 +5,7 @@ reviewers:
- mrunalp
title: Debugging Kubernetes nodes with crictl
content_type: task
weight: 30
---

View File

@ -1,5 +1,5 @@
---
title: Developing and debugging services locally
title: Developing and debugging services locally using telepresence
content_type: task
---
@ -58,4 +58,4 @@ Telepresence installs a traffic-agent sidecar next to your existing application'
If you're interested in a hands-on tutorial, check out [this tutorial](https://cloud.google.com/community/tutorials/developing-services-with-k8s) that walks through locally developing the Guestbook application on Google Kubernetes Engine.
For further reading, visit the [Telepresence website](https://www.telepresence.io).
For further reading, visit the [Telepresence website](https://www.telepresence.io).

View File

@ -4,6 +4,7 @@ content_type: task
reviewers:
- Random-Liu
- dchen1107
weight: 20
---
<!-- overview -->

View File

@ -4,6 +4,7 @@ reviewers:
- piosz
title: Resource metrics pipeline
content_type: concept
weight: 15
---
<!-- overview -->

View File

@ -3,6 +3,7 @@ reviewers:
- mikedanese
content_type: concept
title: Tools for Monitoring Resources
weight: 15
---
<!-- overview -->
@ -58,4 +59,14 @@ then exposes them to Kubernetes via an adapter by implementing either the
[Prometheus](https://prometheus.io), a CNCF project, can natively monitor Kubernetes, nodes, and Prometheus itself.
Full metrics pipeline projects that are not part of the CNCF are outside the scope of Kubernetes documentation.
## {{% heading "whatsnext" %}}
Learn about additional debugging tools, including:
* [Logging](/docs/concepts/cluster-administration/logging/)
* [Monitoring](/docs/tasks/debug-application-cluster/resource-usage-monitoring/)
* [Getting into containers via `exec`](/docs/tasks/debug-application-cluster/applications/get-shell-running-container/)
* [Connecting to containers via proxies](/docs/tasks/extend-kubernetes/http-proxy-access-api/)
* [Connecting to containers via port forwarding](/docs/tasks/access-application-cluster/port-forward-access-application-cluster/)
* [Inspect Kubernetes node with crictl](/docs/tasks/debug-application-cluster/monitoring/crictl/)