Updates to application debugging and telemetry documentation (#1258)

* Updates to debugging guide

* Update telemetry.md

Edit for clarity, add screenshots, steps, etc. Still need to upload screenshots and add in links

* Add two screenshots of the Kibana UI

* Add urls to screenshots

* Adding troubleshooting on Istio routing

Needs to be more actionable, but this is a start.

* Wording, grammar, etc.

* Typo fix

* Update telemetry.md

* Update telemetry.md

* Update telemetry.md

* Addressing Ivan's comments

* Remove reference to Pantheon

* Update telemetry.md

Clarify the relationship between Kibana and Elasticsearch

* Fixing whitespace so list renders properly

* Responding to Evan's comments

* More updates from Evan's PR comments

* Created a /docs/images folder for Kibana screenshots, fixed line wrapping.

* Adding docs/images folder.

* Making links to images relative.

* Removed hover text from link.
Sam O'Dell 2018-06-26 17:24:21 -07:00 committed by Google Prow Robot
parent 6d3a3c5f47
commit 896bd5dac0
4 changed files with 212 additions and 118 deletions


@ -1,17 +1,17 @@
# Application Debugging Guide
You deployed your app to Knative Serving but it is not working as expected. Go through
this step by step guide to understand what failed.
You deployed your app to Knative Serving, but it isn't working as expected.
Go through this step-by-step guide to understand what failed.
## Check command line output
Check your deploy command output to see whether it succeeded or not. If your
deployment process was terminated, there should be error message showing up in
the output and describing the reason why the deployment failed.
deployment process was terminated, there should be an error message showing up
in the output that describes the reason why the deployment failed.
This kind of failures is most likely due to either misconfigured manifest or wrong
command. For example, the following output says that you should configure route
traffic percent summing to 100:
This kind of failure is most likely due to either a misconfigured manifest or
an incorrect command. For example, the following output says that you must
configure the route traffic percentages to sum to 100:
```
Error from server (InternalError): error when applying patch:
@ -22,25 +22,78 @@ for: "STDIN": Internal error occurred: admission webhook "webhook.knative.dev" d
ERROR: Non-zero return code '1' from command: Process exited with status 1
```
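As a sketch of what a valid configuration looks like, the example below applies a `Route` whose traffic targets sum to 100. The route name, revision names, and apiVersion here are illustrative placeholders, not values from this repository; adjust them to your installation.
```shell
# Illustrative only: replace the names below with your own, and adjust the
# apiVersion to match your Knative Serving installation.
cat <<EOF | kubectl apply -f -
apiVersion: serving.knative.dev/v1alpha1
kind: Route
metadata:
  name: my-route
spec:
  traffic:
  - revisionName: my-app-00001
    percent: 90
  - revisionName: my-app-00002
    percent: 10
EOF
```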
## Check application logs
Knative Serving provides default out-of-the-box logs for your application. After entering
`kubectl proxy`, you can go to the
[Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana)
to search for logs. _(See [telemetry guide](../telemetry.md) for more information
on logging and monitoring features of Knative Serving.)_
### Stdout/stderr logs
To find the logs sent to `stdout/stderr` from your application in the
Kibana UI:
1. Click `Discover` on the left side bar.
1. Choose the `logstash-*` index pattern at the top left.
1. Enter `tag: kubernetes*` in the top search bar, then search.
### Request logs
To find the request logs of your application in the Kibana UI:
1. Click `Discover` on the left side bar.
1. Choose the `logstash-*` index pattern at the top left.
1. Enter `tag: "requestlog.logentry.istio-system"` in the top search bar, then
   search.
## Check Route status
Run the following command to get `status` of the `Route` with which you deployed
your application:
Run the following command to get the `status` of the `Route` object with which
you deployed your application:
```shell
kubectl get route <route-name> -o yaml
```
The `conditions` in `status` provide the reason if there is any failure. For
details, see Elafro
details, see Knative
[Error Conditions and Reporting](../spec/errors.md) (currently some of them
are not implemented yet).
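If you only want the conditions rather than the full object, a jsonpath query along the following lines prints just that block (a sketch; replace `<route-name>` with your route's name):
```shell
# Print only the status conditions of the Route.
kubectl get route <route-name> -o jsonpath="{.status.conditions}"
```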
## Check revision status
### Check Istio routing
If you configure your `Route` with `Configuration`, run the following command to
get the name of the `Revision` created for you deployment(look up the
configuration name in the `Route` yaml file):
Compare your Knative `Route` object's configuration (obtained in the previous
step) to the Istio `RouteRule` object's configuration.
Enter the following, replacing `<routerule-name>` with the appropriate value:
```shell
kubectl get routerule <routerule-name> -o yaml
```
If you don't know the name of your route rule, use the `kubectl get routerule`
command to find it.
The command returns the configuration of your route rule. Compare the domains
between your route and route rule; they should match.
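For a quick side-by-side check, something like the sketch below prints the Knative `Route` domain and then searches for it in the route rule definition. The `.status.domain` field and the names used here are assumptions; adjust them to your setup.
```shell
# Print the Route's domain, then look for the same domain in the RouteRule.
kubectl get route <route-name> -o jsonpath="{.status.domain}"
kubectl get routerule <routerule-name> -o yaml | grep <that-domain>
```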
### Check ingress status
Enter:
```shell
kubectl get ingress
```
The command returns the status of the ingress. You can see the name, age,
domains, and IP address.
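To pull out just the hosts and load balancer address for a single ingress, a jsonpath query such as the following can help (a sketch; `<ingress-name>` is a placeholder for one of the names returned above):
```shell
# Show the configured hosts and the load balancer IP for one ingress.
kubectl get ingress <ingress-name> \
  -o jsonpath="{.spec.rules[*].host} {.status.loadBalancer.ingress[*].ip}"
```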
## Check Revision status
If you configure your `Route` with `Configuration`, run the following
command to get the name of the `Revision` created for your deployment
(look up the configuration name in the `Route` .yaml file):
```shell
kubectl get configuration <configuration-name> -o jsonpath="{.status.latestCreatedRevisionName}"
@ -64,19 +117,19 @@ conditions:
type: Ready
```
If you see this condition, to debug further:
If you see this condition, check the following to continue debugging:
1. [Check Pod status](#check-pod-status)
1. [Check application logs](#check-application-logs)
1. [Check Istio routing](#check-istio-routing)
* [Check Pod status](#check-pod-status)
* [Check application logs](#check-application-logs)
* [Check Istio routing](#check-istio-routing)
If you see other conditions, check the following to continue debugging:
1. Look up the meaning of the conditions in Elafro
* Look up the meaning of the conditions in Knative
[Error Conditions and Reporting](../spec/errors.md). Note: some of them
are not implemented yet. An alternation is to
are not implemented yet. An alternative is to
[check Pod status](#check-pod-status).
1. If you are using `BUILD` to deploy and the `BuidComplete` condition is not
* If you are using `BUILD` to deploy and the `BuildComplete` condition is not
`True`, [check BUILD status](#check-build-status).
## Check Pod status
@ -112,36 +165,13 @@ your `Revision`:
kubectl get build $(kubectl get revision <revision-name> -o jsonpath="{.spec.buildName}") -o yaml
```
The `conditions` in `status` provide the reason if there is any failure. To access build logs, first execute `kubectl proxy` and then open [Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana). Use any of the following filters within Kibana UI to see build logs. _(See [telemetry guide](../telemetry.md) for more information on logging and monitoring features of Knative Serving.)_
The `conditions` in `status` provide the reason if there is any failure. To
access build logs, first execute `kubectl proxy` and then open the
[Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana).
Use any of the following filters within the Kibana UI to see build logs.
_(See [telemetry guide](../telemetry.md) for more information on logging and
monitoring features of Knative Serving.)_
* All build logs: `_exists_:"kubernetes.labels.build-name"`
* Build logs for a specific build: `kubernetes.labels.build-name:"<BUILD NAME>"`
* Build logs for a specific build and step: `kubernetes.labels.build-name:"<BUILD NAME>" AND kubernetes.container_name:"build-step-<BUILD STEP NAME>"`
## Check application logs
Knative Serving provides default out-of-box logs for your application. After executing
`kubectl proxy`, you can go to the
[Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana)
to search for logs. _(See [telemetry guide](../telemetry.md) for more information on logging and monitoring features of Knative Serving.)_
### Stdout/stderr logs
You can find the logs emitted to `stdout/stderr` from your application on
Kibana UI by following steps:
1. Click `Discover` on the left side bar.
1. Choose `logstash-*` index pattern on the left top.
1. Input `tag: kubernetes*` in the top search bar then search.
### Request logs
You can find the request logs of your application on Kibana UI by following
steps:
1. Click `Discover` on the left side bar.
1. Choose `logstash-*` index pattern on the left top.
1. Input `tag: "requestlog.logentry.istio-system"` in the top search bar then
search.
## Check Istio routing
TBD.

Binary files added (not shown): two Kibana UI screenshots under docs/images (177 KiB and 320 KiB).


@ -1,47 +1,50 @@
# Logs and metrics
## Monitoring components Setup
## Monitoring components setup
First, deploy monitoring components.
### Elasticsearch, Kibana, Prometheus & Grafana Setup
### Elasticsearch, Kibana, Prometheus, and Grafana Setup
You can use two different setups:
1. **150-elasticsearch-prod**: This configuration collects logs & metrics from user containers, build controller and Istio requests.
1. **150-elasticsearch-prod**: This configuration collects logs and metrics from
   user containers, build controller, and Istio requests.
```shell
kubectl apply -R -f config/monitoring/100-common \
-f config/monitoring/150-elasticsearch-prod \
-f third_party/config/monitoring/common \
-f third_party/config/monitoring/elasticsearch \
-f config/monitoring/200-common \
-f config/monitoring/200-common/100-istio.yaml
```
```shell
kubectl apply -R -f config/monitoring/100-common \
-f config/monitoring/150-elasticsearch-prod \
-f third_party/config/monitoring/common \
-f third_party/config/monitoring/elasticsearch \
-f config/monitoring/200-common \
-f config/monitoring/200-common/100-istio.yaml
```
1. **150-elasticsearch-dev**: This configuration collects everything in (1) plus Knative Serving controller logs.
1. **150-elasticsearch-dev**: This configuration collects everything
   **150-elasticsearch-prod** does, plus Knative Serving controller logs.
```shell
kubectl apply -R -f config/monitoring/100-common \
-f config/monitoring/150-elasticsearch-dev \
-f third_party/config/monitoring/common \
-f third_party/config/monitoring/elasticsearch \
-f config/monitoring/200-common \
-f config/monitoring/200-common/100-istio.yaml
```
```shell
kubectl apply -R -f config/monitoring/100-common \
-f config/monitoring/150-elasticsearch-dev \
-f third_party/config/monitoring/common \
-f third_party/config/monitoring/elasticsearch \
-f config/monitoring/200-common \
-f config/monitoring/200-common/100-istio.yaml
```
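Whichever setup you choose, you can sanity-check the deployment afterwards. Most of these monitoring components land in the `monitoring` namespace (as the commands later in this guide assume), so the pods there should reach `Running`:
```shell
# Quick check that the monitoring components started successfully.
kubectl get pods --namespace monitoring
```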
### Stackdriver(logs), Prometheus & Grafana Setup
### Stackdriver, Prometheus, and Grafana Setup
If your Knative Serving is not built on a GCP based cluster or you want to send logs to
another GCP project, you need to build your own Fluentd image and modify the
configuration first. See
If your Knative Serving is not built on a Google Cloud Platform based cluster,
or you want to send logs to another GCP project, you need to build your own
Fluentd image and modify the configuration first. See
1. [Fluentd image on Knative Serving](/image/fluentd/README.md)
2. [Setting up a logging plugin](setting-up-a-logging-plugin.md)
Then you can use two different setups:
1. **150-stackdriver-prod**: This configuration collects logs & metrics from user containers, build controller and Istio requests.
1. **150-stackdriver-prod**: This configuration collects logs and metrics from
user containers, build controller, and Istio requests.
```shell
kubectl apply -R -f config/monitoring/100-common \
@ -51,7 +54,8 @@ kubectl apply -R -f config/monitoring/100-common \
-f config/monitoring/200-common/100-istio.yaml
```
2. **150-stackdriver-dev**: This configuration collects everything in (1) plus Knative Serving controller logs.
2. **150-stackdriver-dev**: This configuration collects everything
   **150-stackdriver-prod** does, plus Knative Serving controller logs.
```shell
kubectl apply -R -f config/monitoring/100-common \
@ -63,29 +67,55 @@ kubectl apply -R -f config/monitoring/100-common \
## Accessing logs
### Elasticsearch & Kibana
### Kibana and Elasticsearch
Run,
To open the Kibana UI (the visualization tool for
[Elasticsearch](https://info.elastic.co)), enter the following command:
```shell
kubectl proxy
```
Then open Kibana UI at this [link](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana)
(*it might take a couple of minutes for the proxy to work*).
When Kibana is opened the first time, it will ask you to create an index. Accept the default options as is. As more logs get ingested,
new fields will be discovered and to have them indexed, go to Management -> Index Patterns -> Refresh button (on top right) -> Refresh fields.
This starts a local proxy of Kibana on port 8001. The Kibana UI is only exposed within
the cluster for security reasons.
Navigate to the [Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana)
(*It might take a couple of minutes for the proxy to work*).
When Kibana is opened the first time, it will ask you to create an index.
Accept the default options:
![Kibana UI Configuring an Index Pattern](images/kibana-landing-page-configure-index.png)
The Discover tab of the Kibana UI looks like this:
![Kibana UI Discover tab](images/kibana-discover-tab-annotated.png)
You can change the time frame of logs Kibana displays in the upper right corner
of the screen. The main search bar is across the top of the Discover page.
As more logs are ingested, new fields will be discovered. To have them indexed,
go to Management > Index Patterns > Refresh button (on top right) > Refresh
fields.
<!-- TODO: create a video walkthrough of the Kibana UI -->
#### Accessing configuration and revision logs
To access to logs for a configuration, use the following search term in Kibana UI:
To access the logs for a configuration, enter the following search query in Kibana:
```
kubernetes.labels.knative_dev\/configuration: "configuration-example"
```
Replace `configuration-example` with your configuration's name.
Replace `configuration-example` with your configuration's name. Enter the following
command to get your configuration's name:
To access logs for a revision, use the following search term in Kibana UI:
```shell
kubectl get configurations
```
To access logs for a revision, enter the following search query in Kibana:
```
kubernetes.labels.knative_dev\/revision: "configuration-example-00001"
@ -95,13 +125,13 @@ Replace `configuration-example-00001` with your revision's name.
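As with configurations, if you don't know the revision name, you can list the revisions in your namespace first:
```shell
# List revisions; use one of the returned names in the query above.
kubectl get revisions
```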
#### Accessing build logs
To access to logs for a build, use the following search term in Kibana UI:
To access the logs for a build, enter the following search query in Kibana:
```
kubernetes.labels.build\-name: "test-build"
```
Replace `test-build` with your build's name. A build's name is specified in its YAML file as follows:
Replace `test-build` with your build's name. The build name is specified in the `.yaml` file as follows:
```yaml
apiVersion: build.dev/v1alpha1
@ -112,18 +142,19 @@ metadata:
### Stackdriver
Go to [Pantheon logging page](https://console.cloud.google.com/logs/viewer) for
Go to the [Google Cloud Console logging page](https://console.cloud.google.com/logs/viewer) for
your GCP project which stores your logs via Stackdriver.
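If you prefer the command line, the `gcloud logging read` command can query the same Stackdriver logs. The filter below is only an illustration and depends on how your cluster labels log entries; replace the project with your own.
```shell
# Illustrative Stackdriver query; adjust the filter and project to your setup.
gcloud logging read 'resource.type="container"' --limit=10 --project=<your-gcp-project>
```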
## Accessing metrics
Run:
Enter:
```shell
kubectl port-forward -n monitoring $(kubectl get pods -n monitoring --selector=app=grafana --output=jsonpath="{.items..metadata.name}") 3000
```
Then open Grafana UI at [http://localhost:3000](http://localhost:3000). The following dashboards are pre-installed with Knative Serving:
Then open the Grafana UI at [http://localhost:3000](http://localhost:3000). The following dashboards are
pre-installed with Knative Serving:
* **Revision HTTP Requests:** HTTP request count, latency and size metrics per revision and per configuration
* **Nodes:** CPU, memory, network and disk metrics at node level
@ -134,26 +165,46 @@ Then open Grafana UI at [http://localhost:3000](http://localhost:3000). The foll
### Accessing per request traces
First open Kibana UI as shown above. Browse to Management -> Index Patterns -> +Create Index Pattern and type "zipkin*" (without the quotes) to the "Index pattern" text field and hit "Create" button. This will create a new index pattern that will store per request traces captured by Zipkin. This is a one time step and is needed only for fresh installations.
Before you can view per-request traces, you'll need to create a new index
pattern that will store the traces captured by Zipkin:
Next, start the proxy if it is not already running:
1. Start the Kibana UI serving on local port 8001 by entering the following command:
```shell
kubectl proxy
```
```shell
kubectl proxy
```
Then open Zipkin UI at this [link](http://localhost:8001/api/v1/namespaces/istio-system/services/zipkin:9411/proxy/zipkin/). Click on "Find Traces" to see the latest traces. You can search for a trace ID or look at traces of a specific application within this UI. Click on a trace to see a detailed view of a specific call.
1. Open the [Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana).
To see a demo of distributed tracing, deploy the [Telemetry sample](../sample/telemetrysample/README.md), send some traffic to it and explore the traces it generates from Zipkin UI.
1. Navigate to Management -> Index Patterns -> Create Index Pattern.
1. Enter `zipkin*` in the "Index pattern" text field.
1. Click **Create**.
After you've created the Zipkin index pattern, open the
[Zipkin UI](http://localhost:8001/api/v1/namespaces/istio-system/services/zipkin:9411/proxy/zipkin/).
Click on "Find Traces" to see the latest traces. You can search for a trace ID
or look at traces of a specific application. Click on a trace to see a detailed
view of a specific call.
To see a demo of distributed tracing, deploy the
[Telemetry sample](../sample/telemetrysample/README.md), send some traffic to it,
then explore the traces it generates in the Zipkin UI.
<!--TODO: Consider adding a video here. -->
## Default metrics
Following metrics are collected by default:
The following metrics are collected by default:
* Knative Serving controller metrics
* Istio metrics (mixer, envoy and pilot)
* Node and pod metrics
There are several other collectors that are pre-configured but not enabled. To see the full list, browse to config/monitoring/prometheus-exporter and config/monitoring/prometheus-servicemonitor folders and deploy them using kubectl apply -f.
There are several other collectors that are pre-configured but not enabled.
To see the full list, browse to the `config/monitoring/prometheus-exporter`
and `config/monitoring/prometheus-servicemonitor` folders and deploy the ones
you want using `kubectl apply -f`.
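For example, enabling one of the optional collectors might look like the sketch below; the `<collector>.yaml` file name is a placeholder you should replace with an actual manifest from those folders.
```shell
# List the optional collectors, then apply the ones you want.
ls config/monitoring/prometheus-exporter config/monitoring/prometheus-servicemonitor
kubectl apply -f config/monitoring/prometheus-exporter/<collector>.yaml
```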
## Default logs
@ -166,26 +217,29 @@ To enable log collection from other containers and destinations, see
[setting up a logging plugin](setting-up-a-logging-plugin.md).
## Metrics troubleshooting
You can use Prometheus web UI to troubleshoot publishing and service discovery issues for metrics.
To access to the web UI, forward the Prometheus server to your machine:
You can use the Prometheus web UI to troubleshoot publishing and service
discovery issues for metrics. To access the web UI, forward the Prometheus
server to your machine:
```shell
kubectl port-forward -n monitoring $(kubectl get pods -n monitoring --selector=app=prometheus --output=jsonpath="{.items[0].metadata.name}") 9090
```
Then browse to http://localhost:9090 to access the UI:
Then browse to http://localhost:9090 to access the UI.
* To see the targets that are being scraped, go to Status -> Targets
* To see what Prometheus service discovery is picking up vs. dropping, go to Status -> Service Discovery
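With the port-forward above still running, the same target information is also available from the Prometheus HTTP API, which can be handy for scripting (a quick sketch):
```shell
# Query scrape targets through the Prometheus HTTP API.
curl http://localhost:9090/api/v1/targets
```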
## Generating metrics
If you want to send metrics from your controller, follow the steps below.
These steps are already applied to autoscaler and controller. For those controllers,
simply add your new metric definitions to the `view`, create new `tag.Key`s if necessary and
instrument your code as described in step 3.
If you want to send metrics from your controller, follow the steps below. These
steps are already applied to autoscaler and controller. For those controllers,
simply add your new metric definitions to the `view`, create new `tag.Key`s if
necessary and instrument your code as described in step 3.
In the example below, we will setup the service to host the metrics and instrument a sample
'Gauge' type metric using the setup.
In the example below, we will set up the service to host the metrics and
instrument a sample 'Gauge' type metric using the setup.
1. First, go through [OpenCensus Go Documentation](https://godoc.org/go.opencensus.io).
2. Add the following to your application startup:
@ -249,7 +303,9 @@ func main() {
http.ListenAndServe(":8080", mux)
}
```
3. In your code where you want to instrument, set the counter with the appropriate label values - example:
3. In your code where you want to instrument, set the counter with the
appropriate label values - example:
```go
ctx := context.TODO()
@ -260,7 +316,8 @@ tag.New(
stats.Record(ctx, desiredPodCountM.M({Measurement Value}))
```
4. Add the following to scape config file located at config/monitoring/200-common/300-prometheus/100-scrape-config.yaml:
4. Add the following to the scrape config file located at
config/monitoring/200-common/300-prometheus/100-scrape-config.yaml:
```yaml
- job_name: <YOUR SERVICE NAME>
@ -297,29 +354,36 @@ kubectl apply -f config/monitoring/200-common/300-prometheus
6. Add a dashboard for your metrics - you can see examples under the
config/grafana/dashboard-definition folder. An easy way to generate JSON
definitions is to use Grafana UI (make sure to login with as admin user) and [export JSON](http://docs.grafana.org/reference/export_import) from it.
definitions is to use the Grafana UI (make sure to log in as an admin user) and
[export JSON](http://docs.grafana.org/reference/export_import) from it.
7. Validate the metrics flow either by Grafana UI or Prometheus UI (see Troubleshooting section
above to enable Prometheus UI)
7. Validate the metrics flow through either the Grafana UI or the Prometheus UI
(see the Metrics troubleshooting section above to enable the Prometheus UI).
## Generating logs
Use [glog](https://godoc.org/github.com/golang/glog) to write logs in your code. In your container spec, add the following args to redirect the logs to stderr:
<!--TODO: Explain why we recommend using glog. -->
Use [glog](https://godoc.org/github.com/golang/glog) to write logs in your code.
In your container spec, add the following arguments to redirect the logs to stderr:
```yaml
args:
- "-logtostderr=true"
- "-stderrthreshold=INFO"
```
See [helloworld](../sample/helloworld/README.md) sample's configuration file as an example.
See [helloworld](../sample/helloworld/README.md) sample's configuration file as
an example.
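Once the container writes to stderr, you can spot-check the output directly with `kubectl logs` before searching in Kibana; the pod and container names below are placeholders for your own.
```shell
# Spot-check the redirected logs straight from the pod.
kubectl logs <pod-name> -c <container-name>
```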
## Distributed tracing with Zipkin
Check [Telemetry sample](../sample/telemetrysample/README.md) as an example usage of [OpenZipkin](https://zipkin.io/pages/existing_instrumentations)'s Go client library.
Check [Telemetry sample](../sample/telemetrysample/README.md) as an example usage of
[OpenZipkin](https://zipkin.io/pages/existing_instrumentations)'s Go client library.
## Delete monitoring components
Enter:
```shell
ko delete --ignore-not-found=true \
-f config/monitoring/200-common/100-istio.yaml \
-f config/monitoring/200-common/100-zipkin.yaml \
-f config/monitoring/100-common
```