Updates to application debugging and telemetry documentation (#1258)

* Updates to debugging guide

* Update telemetry.md

Edit for clarity, add screenshots, steps, etc. Still need to upload screenshots and add in links

* Add two screenshots of the Kibana UI

* Add urls to screenshots

* Adding troubleshooting on Istio routing

Needs to be more actionable, but this is a start.

* Wording, grammar, etc.

* Typo fix

* Update telemetry.md

* Update telemetry.md

* Update telemetry.md

* Addressing Ivan's comments

* Remove reference to Pantheon

* Update telemetry.md

Clarify the relationship between Kibana and Elasticsearch

* Fixing whitespace so list renders properly

* Responding to Evan's comments

* More updates from Evan's PR comments

* Created a /docs/images folder for Kibana screenshots, fixed line wrapping.

* Adding docs/images folder.

* Making links to images relative.

* Removed hover text from link.
Sam O'Dell 2018-06-26 17:24:21 -07:00 committed by Google Prow Robot
parent 6d3a3c5f47
commit 896bd5dac0
4 changed files with 212 additions and 118 deletions


@ -1,17 +1,17 @@
# Application Debugging Guide
You deployed your app to Knative Serving but it is not working as expected. Go through
this step by step guide to understand what failed.
You deployed your app to Knative Serving, but it isn't working as expected.
Go through this step-by-step guide to understand what failed.
## Check command line output
Check your deploy command output to see whether it succeeded or not. If your
deployment process was terminated, there should be error message showing up in
the output and describing the reason why the deployment failed.
deployment process was terminated, there should be an error message showing up
in the output that describes the reason why the deployment failed.
This kind of failures is most likely due to either misconfigured manifest or wrong
command. For example, the following output says that you should configure route
traffic percent summing to 100:
This kind of failure is most likely due to either a misconfigured manifest or
an incorrect command. For example, the following output says that you must
configure the route traffic percentages to sum to 100:
```
Error from server (InternalError): error when applying patch:
@ -22,25 +22,78 @@ for: "STDIN": Internal error occurred: admission webhook "webhook.knative.dev" d
ERROR: Non-zero return code '1' from command: Process exited with status 1
```
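As a sketch of what a valid configuration looks like, the example below applies a `Route` whose traffic targets sum to 100. The route name, revision names, and apiVersion here are illustrative placeholders, not values from this repository; adjust them to your installation.
```shell
# Illustrative only: replace the names below with your own, and adjust the
# apiVersion to match your Knative Serving installation.
cat <<EOF | kubectl apply -f -
apiVersion: serving.knative.dev/v1alpha1
kind: Route
metadata:
  name: my-route
spec:
  traffic:
  - revisionName: my-app-00001
    percent: 90
  - revisionName: my-app-00002
    percent: 10
EOF
```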
## Check application logs
Knative Serving provides default out-of-the-box logs for your application. After entering
`kubectl proxy`, you can go to the
[Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana)
to search for logs. _(See [telemetry guide](../telemetry.md) for more information
on logging and monitoring features of Knative Serving.)_
### Stdout/stderr logs
To find the logs sent to `stdout/stderr` from your application in the
Kibana UI:
1. Click `Discover` on the left side bar.
1. Choose the `logstash-*` index pattern at the top left.
1. Enter `tag: kubernetes*` in the top search bar, then search.
### Request logs
To find the request logs of your application in the Kibana UI:
1. Click `Discover` on the left side bar.
1. Choose the `logstash-*` index pattern at the top left.
1. Enter `tag: "requestlog.logentry.istio-system"` in the top search bar, then
   search.
## Check Route status
Run the following command to get `status` of the `Route` with which you deployed
your application:
Run the following command to get the `status` of the `Route` object with which
you deployed your application:
```shell
kubectl get route <route-name> -o yaml
```
The `conditions` in `status` provide the reason if there is any failure. For
details, see Elafro
details, see Knative
[Error Conditions and Reporting](../spec/errors.md) (currently some of them
are not implemented yet).
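If you only want the conditions rather than the full object, a jsonpath query along the following lines prints just that block (a sketch; replace `<route-name>` with your route's name):
```shell
# Print only the status conditions of the Route.
kubectl get route <route-name> -o jsonpath="{.status.conditions}"
```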
## Check revision status
### Check Istio routing
If you configure your `Route` with `Configuration`, run the following command to
get the name of the `Revision` created for you deployment(look up the
configuration name in the `Route` yaml file):
Compare your Knative `Route` object's configuration (obtained in the previous
step) to the Istio `RouteRule` object's configuration.
Enter the following, replacing `<routerule-name>` with the appropriate value:
```shell
kubectl get routerule <routerule-name> -o yaml
```
If you don't know the name of your route rule, use the `kubectl get routerule`
command to find it.
The command returns the configuration of your route rule. Compare the domains
between your route and route rule; they should match.
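For a quick side-by-side check, something like the sketch below prints the Knative `Route` domain and then searches for it in the route rule definition. The `.status.domain` field and the names used here are assumptions; adjust them to your setup.
```shell
# Print the Route's domain, then look for the same domain in the RouteRule.
kubectl get route <route-name> -o jsonpath="{.status.domain}"
kubectl get routerule <routerule-name> -o yaml | grep <that-domain>
```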
### Check ingress status
Enter:
```shell
kubectl get ingress
```
The command returns the status of the ingress. You can see the name, age,
domains, and IP address.
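To pull out just the hosts and load balancer address for a single ingress, a jsonpath query such as the following can help (a sketch; `<ingress-name>` is a placeholder for one of the names returned above):
```shell
# Show the configured hosts and the load balancer IP for one ingress.
kubectl get ingress <ingress-name> \
  -o jsonpath="{.spec.rules[*].host} {.status.loadBalancer.ingress[*].ip}"
```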
## Check Revision status
If you configure your `Route` with `Configuration`, run the following
command to get the name of the `Revision` created for your deployment
(look up the configuration name in the `Route` .yaml file):
```shell
kubectl get configuration <configuration-name> -o jsonpath="{.status.latestCreatedRevisionName}"
@ -64,19 +117,19 @@ conditions:
type: Ready
```
If you see this condition, to debug further:
If you see this condition, check the following to continue debugging:
1. [Check Pod status](#check-pod-status)
1. [Check application logs](#check-application-logs)
1. [Check Istio routing](#check-istio-routing)
* [Check Pod status](#check-pod-status)
* [Check application logs](#check-application-logs)
* [Check Istio routing](#check-istio-routing)
If you see other conditions, check the following to continue debugging:
1. Look up the meaning of the conditions in Elafro
* Look up the meaning of the conditions in Knative
[Error Conditions and Reporting](../spec/errors.md). Note: some of them
are not implemented yet. An alternation is to
are not implemented yet. An alternative is to
[check Pod status](#check-pod-status).
1. If you are using `BUILD` to deploy and the `BuidComplete` condition is not
* If you are using `BUILD` to deploy and the `BuildComplete` condition is not
`True`, [check BUILD status](#check-build-status).
## Check Pod status
@ -112,36 +165,13 @@ your `Revision`:
kubectl get build $(kubectl get revision <revision-name> -o jsonpath="{.spec.buildName}") -o yaml
```
The `conditions` in `status` provide the reason if there is any failure. To access build logs, first execute `kubectl proxy` and then open [Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana). Use any of the following filters within Kibana UI to see build logs. _(See [telemetry guide](../telemetry.md) for more information on logging and monitoring features of Knative Serving.)_
The `conditions` in `status` provide the reason if there is any failure. To
access build logs, first execute `kubectl proxy` and then open the
[Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana).
Use any of the following filters within the Kibana UI to see build logs.
_(See [telemetry guide](../telemetry.md) for more information on logging and
monitoring features of Knative Serving.)_
* All build logs: `_exists_:"kubernetes.labels.build-name"`
* Build logs for a specific build: `kubernetes.labels.build-name:"<BUILD NAME>"`
* Build logs for a specific build and step: `kubernetes.labels.build-name:"<BUILD NAME>" AND kubernetes.container_name:"build-step-<BUILD STEP NAME>"`
## Check application logs
Knative Serving provides default out-of-box logs for your application. After executing
`kubectl proxy`, you can go to the
[Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana)
to search for logs. _(See [telemetry guide](../telemetry.md) for more information on logging and monitoring features of Knative Serving.)_
### Stdout/stderr logs
You can find the logs emitted to `stdout/stderr` from your application on
Kibana UI by following steps:
1. Click `Discover` on the left side bar.
1. Choose `logstash-*` index pattern on the left top.
1. Input `tag: kubernetes*` in the top search bar then search.
### Request logs
You can find the request logs of your application on Kibana UI by following
steps:
1. Click `Discover` on the left side bar.
1. Choose `logstash-*` index pattern on the left top.
1. Input `tag: "requestlog.logentry.istio-system"` in the top search bar then
search.
## Check Istio routing
TBD.

Binary files added (not shown): two Kibana UI screenshots under docs/images (177 KiB and 320 KiB).


@ -1,47 +1,50 @@
# Logs and metrics
## Monitoring components Setup
## Monitoring components setup
First, deploy monitoring components.
### Elasticsearch, Kibana, Prometheus & Grafana Setup
### Elasticsearch, Kibana, Prometheus, and Grafana Setup
You can use two different setups:
1. **150-elasticsearch-prod**: This configuration collects logs & metrics from user containers, build controller and Istio requests.
1. **150-elasticsearch-prod**: This configuration collects logs and metrics from
   user containers, build controller, and Istio requests.
```shell
kubectl apply -R -f config/monitoring/100-common \
-f config/monitoring/150-elasticsearch-prod \
-f third_party/config/monitoring/common \
-f third_party/config/monitoring/elasticsearch \
-f config/monitoring/200-common \
-f config/monitoring/200-common/100-istio.yaml
```
```shell
kubectl apply -R -f config/monitoring/100-common \
-f config/monitoring/150-elasticsearch-prod \
-f third_party/config/monitoring/common \
-f third_party/config/monitoring/elasticsearch \
-f config/monitoring/200-common \
-f config/monitoring/200-common/100-istio.yaml
```
1. **150-elasticsearch-dev**: This configuration collects everything in (1) plus Knative Serving controller logs.
1. **150-elasticsearch-dev**: This configuration collects everything
   **150-elasticsearch-prod** does, plus Knative Serving controller logs.
```shell
kubectl apply -R -f config/monitoring/100-common \
-f config/monitoring/150-elasticsearch-dev \
-f third_party/config/monitoring/common \
-f third_party/config/monitoring/elasticsearch \
-f config/monitoring/200-common \
-f config/monitoring/200-common/100-istio.yaml
```
```shell
kubectl apply -R -f config/monitoring/100-common \
-f config/monitoring/150-elasticsearch-dev \
-f third_party/config/monitoring/common \
-f third_party/config/monitoring/elasticsearch \
-f config/monitoring/200-common \
-f config/monitoring/200-common/100-istio.yaml
```
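Whichever setup you choose, you can sanity-check the deployment afterwards. Most of these monitoring components land in the `monitoring` namespace (as the commands later in this guide assume), so the pods there should reach `Running`:
```shell
# Quick check that the monitoring components started successfully.
kubectl get pods --namespace monitoring
```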
### Stackdriver(logs), Prometheus & Grafana Setup
### Stackdriver, Prometheus, and Grafana Setup
If your Knative Serving is not built on a GCP based cluster or you want to send logs to
another GCP project, you need to build your own Fluentd image and modify the
configuration first. See
If your Knative Serving is not built on a Google Cloud Platform based cluster,
or you want to send logs to another GCP project, you need to build your own
Fluentd image and modify the configuration first. See
1. [Fluentd image on Knative Serving](/image/fluentd/README.md)
2. [Setting up a logging plugin](setting-up-a-logging-plugin.md)
Then you can use two different setups:
1. **150-stackdriver-prod**: This configuration collects logs & metrics from user containers, build controller and Istio requests.
1. **150-stackdriver-prod**: This configuration collects logs and metrics from
user containers, build controller, and Istio requests.
```shell
kubectl apply -R -f config/monitoring/100-common \
@ -51,7 +54,8 @@ kubectl apply -R -f config/monitoring/100-common \
-f config/monitoring/200-common/100-istio.yaml
```
2. **150-stackdriver-dev**: This configuration collects everything in (1) plus Knative Serving controller logs.
2. **150-stackdriver-dev**: This configuration collects everything
   **150-stackdriver-prod** does, plus Knative Serving controller logs.
```shell
kubectl apply -R -f config/monitoring/100-common \
@ -63,29 +67,55 @@ kubectl apply -R -f config/monitoring/100-common \
## Accessing logs
### Elasticsearch & Kibana
### Kibana and Elasticsearch
Run,
To open the Kibana UI (the visualization tool for
[Elasticsearch](https://info.elastic.co)), enter the following command:
```shell
kubectl proxy
```
Then open Kibana UI at this [link](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana)
(*it might take a couple of minutes for the proxy to work*).
When Kibana is opened the first time, it will ask you to create an index. Accept the default options as is. As more logs get ingested,
new fields will be discovered and to have them indexed, go to Management -> Index Patterns -> Refresh button (on top right) -> Refresh fields.
This starts a local proxy of Kibana on port 8001. The Kibana UI is only exposed within
the cluster for security reasons.
Navigate to the [Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana)
(*It might take a couple of minutes for the proxy to work*).
When Kibana is opened the first time, it will ask you to create an index.
Accept the default options:
![Kibana UI Configuring an Index Pattern](images/kibana-landing-page-configure-index.png)
The Discover tab of the Kibana UI looks like this:
![Kibana UI Discover tab](images/kibana-discover-tab-annotated.png)
You can change the time frame of logs Kibana displays in the upper right corner
of the screen. The main search bar is across the top of the Discover page.
As more logs are ingested, new fields will be discovered. To have them indexed,
go to Management > Index Patterns > Refresh button (on top right) > Refresh
fields.
<!-- TODO: create a video walkthrough of the Kibana UI -->
#### Accessing configuration and revision logs
To access to logs for a configuration, use the following search term in Kibana UI:
To access the logs for a configuration, enter the following search query in Kibana:
```
kubernetes.labels.knative_dev\/configuration: "configuration-example"
```
Replace `configuration-example` with your configuration's name.
Replace `configuration-example` with your configuration's name. Enter the following
command to get your configuration's name:
To access logs for a revision, use the following search term in Kibana UI:
```shell
kubectl get configurations
```
To access logs for a revision, enter the following search query in Kibana:
```
kubernetes.labels.knative_dev\/revision: "configuration-example-00001"
@ -95,13 +125,13 @@ Replace `configuration-example-00001` with your revision's name.
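As with configurations, if you don't know the revision name, you can list the revisions in your namespace first:
```shell
# List revisions; use one of the returned names in the query above.
kubectl get revisions
```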
#### Accessing build logs
To access to logs for a build, use the following search term in Kibana UI:
To access the logs for a build, enter the following search query in Kibana:
```
kubernetes.labels.build\-name: "test-build"
```
Replace `test-build` with your build's name. A build's name is specified in its YAML file as follows:
Replace `test-build` with your build's name. The build name is specified in the `.yaml` file as follows:
```yaml
apiVersion: build.dev/v1alpha1
@ -112,18 +142,19 @@ metadata:
### Stackdriver
Go to [Pantheon logging page](https://console.cloud.google.com/logs/viewer) for
Go to the [Google Cloud Console logging page](https://console.cloud.google.com/logs/viewer) for
your GCP project which stores your logs via Stackdriver.
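If you prefer the command line, the `gcloud logging read` command can query the same Stackdriver logs. The filter below is only an illustration and depends on how your cluster labels log entries; replace the project with your own.
```shell
# Illustrative Stackdriver query; adjust the filter and project to your setup.
gcloud logging read 'resource.type="container"' --limit=10 --project=<your-gcp-project>
```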
## Accessing metrics
Run:
Enter:
```shell
kubectl port-forward -n monitoring $(kubectl get pods -n monitoring --selector=app=grafana --output=jsonpath="{.items..metadata.name}") 3000
```
Then open Grafana UI at [http://localhost:3000](http://localhost:3000). The following dashboards are pre-installed with Knative Serving:
Then open the Grafana UI at [http://localhost:3000](http://localhost:3000). The following dashboards are
pre-installed with Knative Serving:
* **Revision HTTP Requests:** HTTP request count, latency and size metrics per revision and per configuration
* **Nodes:** CPU, memory, network and disk metrics at node level
@ -134,26 +165,46 @@ Then open Grafana UI at [http://localhost:3000](http://localhost:3000). The foll
### Accessing per request traces
First open Kibana UI as shown above. Browse to Management -> Index Patterns -> +Create Index Pattern and type "zipkin*" (without the quotes) to the "Index pattern" text field and hit "Create" button. This will create a new index pattern that will store per request traces captured by Zipkin. This is a one time step and is needed only for fresh installations.
Before you can view per-request traces, you'll need to create a new index
pattern that will store the traces captured by Zipkin:
Next, start the proxy if it is not already running:
1. Start the Kibana UI serving on local port 8001 by entering the following command:
```shell
kubectl proxy
```
```shell
kubectl proxy
```
Then open Zipkin UI at this [link](http://localhost:8001/api/v1/namespaces/istio-system/services/zipkin:9411/proxy/zipkin/). Click on "Find Traces" to see the latest traces. You can search for a trace ID or look at traces of a specific application within this UI. Click on a trace to see a detailed view of a specific call.
1. Open the [Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana).
To see a demo of distributed tracing, deploy the [Telemetry sample](../sample/telemetrysample/README.md), send some traffic to it and explore the traces it generates from Zipkin UI.
1. Navigate to Management -> Index Patterns -> Create Index Pattern.
1. Enter `zipkin*` in the "Index pattern" text field.
1. Click **Create**.
After you've created the Zipkin index pattern, open the
[Zipkin UI](http://localhost:8001/api/v1/namespaces/istio-system/services/zipkin:9411/proxy/zipkin/).
Click on "Find Traces" to see the latest traces. You can search for a trace ID
or look at traces of a specific application. Click on a trace to see a detailed
view of a specific call.
To see a demo of distributed tracing, deploy the
[Telemetry sample](../sample/telemetrysample/README.md), send some traffic to it,
then explore the traces it generates in the Zipkin UI.
<!--TODO: Consider adding a video here. -->
## Default metrics
Following metrics are collected by default:
The following metrics are collected by default:
* Knative Serving controller metrics
* Istio metrics (mixer, envoy and pilot)
* Node and pod metrics
There are several other collectors that are pre-configured but not enabled. To see the full list, browse to config/monitoring/prometheus-exporter and config/monitoring/prometheus-servicemonitor folders and deploy them using kubectl apply -f.
There are several other collectors that are pre-configured but not enabled.
To see the full list, browse to the `config/monitoring/prometheus-exporter`
and `config/monitoring/prometheus-servicemonitor` folders and deploy the ones
you want using `kubectl apply -f`.
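For example, enabling one of the optional collectors might look like the sketch below; the `<collector>.yaml` file name is a placeholder you should replace with an actual manifest from those folders.
```shell
# List the optional collectors, then apply the ones you want.
ls config/monitoring/prometheus-exporter config/monitoring/prometheus-servicemonitor
kubectl apply -f config/monitoring/prometheus-exporter/<collector>.yaml
```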
## Default logs
@ -166,26 +217,29 @@ To enable log collection from other containers and destinations, see
[setting up a logging plugin](setting-up-a-logging-plugin.md).
## Metrics troubleshooting
You can use Prometheus web UI to troubleshoot publishing and service discovery issues for metrics.
To access to the web UI, forward the Prometheus server to your machine:
You can use the Prometheus web UI to troubleshoot publishing and service
discovery issues for metrics. To access the web UI, forward the Prometheus
server to your machine:
```shell
kubectl port-forward -n monitoring $(kubectl get pods -n monitoring --selector=app=prometheus --output=jsonpath="{.items[0].metadata.name}") 9090
```
Then browse to http://localhost:9090 to access the UI:
Then browse to http://localhost:9090 to access the UI.
* To see the targets that are being scraped, go to Status -> Targets
* To see what Prometheus service discovery is picking up vs. dropping, go to Status -> Service Discovery
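With the port-forward above still running, the same target information is also available from the Prometheus HTTP API, which can be handy for scripting (a quick sketch):
```shell
# Query scrape targets through the Prometheus HTTP API.
curl http://localhost:9090/api/v1/targets
```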
## Generating metrics
If you want to send metrics from your controller, follow the steps below.
These steps are already applied to autoscaler and controller. For those controllers,
simply add your new metric definitions to the `view`, create new `tag.Key`s if necessary and
instrument your code as described in step 3.
If you want to send metrics from your controller, follow the steps below. These
steps are already applied to autoscaler and controller. For those controllers,
simply add your new metric definitions to the `view`, create new `tag.Key`s if
necessary and instrument your code as described in step 3.
In the example below, we will setup the service to host the metrics and instrument a sample
'Gauge' type metric using the setup.
In the example below, we will set up the service to host the metrics and
instrument a sample 'Gauge' type metric using the setup.
1. First, go through [OpenCensus Go Documentation](https://godoc.org/go.opencensus.io).
2. Add the following to your application startup:
@ -249,7 +303,9 @@ func main() {
http.ListenAndServe(":8080", mux)
}
```
3. In your code where you want to instrument, set the counter with the appropriate label values - example:
3. In your code where you want to instrument, set the counter with the
appropriate label values - example:
```go
ctx := context.TODO()
@ -260,7 +316,8 @@ tag.New(
stats.Record(ctx, desiredPodCountM.M({Measurement Value}))
```
4. Add the following to scape config file located at config/monitoring/200-common/300-prometheus/100-scrape-config.yaml:
4. Add the following to the scrape config file located at
config/monitoring/200-common/300-prometheus/100-scrape-config.yaml:
```yaml
- job_name: <YOUR SERVICE NAME>
@ -297,29 +354,36 @@ kubectl apply -f config/monitoring/200-common/300-prometheus
6. Add a dashboard for your metrics - you can see examples under the
config/grafana/dashboard-definition folder. An easy way to generate JSON
definitions is to use Grafana UI (make sure to login with as admin user) and [export JSON](http://docs.grafana.org/reference/export_import) from it.
definitions is to use the Grafana UI (make sure to log in as an admin user) and
[export JSON](http://docs.grafana.org/reference/export_import) from it.
7. Validate the metrics flow either by Grafana UI or Prometheus UI (see Troubleshooting section
above to enable Prometheus UI)
7. Validate the metrics flow through either the Grafana UI or the Prometheus UI
(see the Metrics troubleshooting section above to enable the Prometheus UI).
## Generating logs
Use [glog](https://godoc.org/github.com/golang/glog) to write logs in your code. In your container spec, add the following args to redirect the logs to stderr:
<!--TODO: Explain why we recommend using glog. -->
Use [glog](https://godoc.org/github.com/golang/glog) to write logs in your code.
In your container spec, add the following arguments to redirect the logs to stderr:
```yaml
args:
- "-logtostderr=true"
- "-stderrthreshold=INFO"
```
See [helloworld](../sample/helloworld/README.md) sample's configuration file as an example.
See [helloworld](../sample/helloworld/README.md) sample's configuration file as
an example.
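Once the container writes to stderr, you can spot-check the output directly with `kubectl logs` before searching in Kibana; the pod and container names below are placeholders for your own.
```shell
# Spot-check the redirected logs straight from the pod.
kubectl logs <pod-name> -c <container-name>
```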
## Distributed tracing with Zipkin
Check [Telemetry sample](../sample/telemetrysample/README.md) as an example usage of [OpenZipkin](https://zipkin.io/pages/existing_instrumentations)'s Go client library.
Check [Telemetry sample](../sample/telemetrysample/README.md) as an example usage of
[OpenZipkin](https://zipkin.io/pages/existing_instrumentations)'s Go client library.
## Delete monitoring components
Enter:
```shell
ko delete --ignore-not-found=true \
-f config/monitoring/200-common/100-istio.yaml \
-f config/monitoring/200-common/100-zipkin.yaml \
-f config/monitoring/100-common
```