From 40d02e77a38d57e593670f5fd9652d236e6dca13 Mon Sep 17 00:00:00 2001 From: Ryan Nowak Date: Thu, 13 Aug 2020 15:26:35 -0700 Subject: [PATCH] Update docs to include failed health check (#730) Fixes #729 --- .../troubleshooting/common_issues.md | 33 +++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/best-practices/troubleshooting/common_issues.md b/best-practices/troubleshooting/common_issues.md index 30fbd65f7..3dab08fad 100644 --- a/best-practices/troubleshooting/common_issues.md +++ b/best-practices/troubleshooting/common_issues.md @@ -51,6 +51,39 @@ In order to further diagnose any issue, check the logs of the Dapr sidecar injec *Note: If you installed Dapr to a different namespace, replace dapr-system above with the desired namespace* +### My pod is in CrashLoopBackoff or another failed state due to the daprd sidecar + +If the Dapr sidecar (`daprd`) is taking too long to initialize, this might be surfaced as a failing health check by Kubernetes. + +If your pod is in a failed state you should check this: + +```bash +kubectl describe pod +``` + +You might see a table like the following at the end of the command output: + +```txt + Normal Created 7m41s (x2 over 8m2s) kubelet, aks-agentpool-12499885-vmss000000 Created container daprd + Normal Started 7m41s (x2 over 8m2s) kubelet, aks-agentpool-12499885-vmss000000 Started container daprd + Warning Unhealthy 7m28s (x5 over 7m58s) kubelet, aks-agentpool-12499885-vmss000000 Readiness probe failed: Get http://10.244.1.10:3500/v1.0/healthz: dial tcp 10.244.1.10:3500: connect: connection refused + Warning Unhealthy 7m25s (x6 over 7m55s) kubelet, aks-agentpool-12499885-vmss000000 Liveness probe failed: Get http://10.244.1.10:3500/v1.0/healthz: dial tcp 10.244.1.10:3500: connect: connection refused + Normal Killing 7m25s (x2 over 7m43s) kubelet, aks-agentpool-12499885-vmss000000 Container daprd failed liveness probe, will be restarted + Warning BackOff 3m2s (x18 over 6m48s) kubelet, aks-agentpool-12499885-vmss000000 Back-off restarting failed container +``` + +The message `Container daprd failed liveness probe, will be restarted` indicates at the Dapr sidecar has failed its health checks and will be restarted. The messages `Readiness probe failed: Get http://10.244.1.10:3500/v1.0/healthz: dial tcp 10.244.1.10:3500: connect: connection refused` and `Liveness probe failed: Get http://10.244.1.10:3500/v1.0/healthz: dial tcp 10.244.1.10:3500: connect: connection refused` show that the health check failed because no connection could be made to the sidecar. + +The most common cause of this failure is that a component (such as a state store) is misconfigured and is causing initialization to take too long. When initialization takes a long time, it's possible that the health check could terminate the sidecar before anything useful is logged by the sidecar. + +To diagnose the root cause: + +- Significantly increase the liveness probe delay - [link](../../howto/configure-k8s/README.md) +- Set the log level of the sidecar to debug - [link](./logs.md#setting-the-sidecar-log-level) +- Watch the logs for meaningful information - [link](./logs.md#viewing-logs-on-kubernetes) + +> :bulb: Remember to configure the liveness check delay and log level back to your desired values after solving the problem. + ### I am unable to save state or get state Have you installed an Dapr State store in your cluster?