mirror of https://github.com/knative/docs.git
Updates to application debugging and telemetry documentation (#1258)
* Updates to debugging guide
* Update telemetry.md: edit for clarity; add screenshots, steps, etc. (still need to upload screenshots and add in links)
* Add two screenshots of the Kibana UI
* Add URLs to screenshots
* Add troubleshooting on Istio routing (needs to be more actionable, but this is a start)
* Wording, grammar, and typo fixes
* Address Ivan's comments
* Remove reference to Pantheon
* Clarify the relationship between Kibana and Elasticsearch
* Fix whitespace so the list renders properly
* Respond to Evan's PR comments
* Create a docs/images folder for Kibana screenshots; fix line wrapping
* Make links to images relative
* Remove hover text from link
This commit is contained in:
parent 6d3a3c5f47
commit 896bd5dac0
@@ -1,17 +1,17 @@

# Application Debugging Guide

You deployed your app to Knative Serving, but it isn't working as expected.
Go through this step-by-step guide to understand what failed.
## Check command line output

Check your deploy command output to see whether it succeeded or not. If your
deployment process was terminated, there should be an error message in the
output that describes the reason why the deployment failed.

This kind of failure is most likely due to either a misconfigured manifest or
a wrong command. For example, the following output says that you must configure
route traffic percent to sum to 100:

```
Error from server (InternalError): error when applying patch:
@@ -22,25 +22,78 @@
for: "STDIN": Internal error occurred: admission webhook "webhook.knative.dev" d
ERROR: Non-zero return code '1' from command: Process exited with status 1
```
## Check application logs

Knative Serving provides default out-of-the-box logs for your application. After
running `kubectl proxy`, you can go to the
[Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana)
to search for logs. _(See the [telemetry guide](../telemetry.md) for more information
on the logging and monitoring features of Knative Serving.)_

### Stdout/stderr logs

To find the logs sent to `stdout/stderr` from your application in the
Kibana UI:

1. Click `Discover` on the left side bar.
1. Choose the `logstash-*` index pattern at the top left.
1. Enter `tag: kubernetes*` in the top search bar, then search.

### Request logs

To find the request logs of your application in the Kibana UI:

1. Click `Discover` on the left side bar.
1. Choose the `logstash-*` index pattern at the top left.
1. Enter `tag: "requestlog.logentry.istio-system"` in the top search bar, then
   search.
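If you just need a quick look at `stdout/stderr` without Kibana, plain
`kubectl logs` also works. This is a sketch: the `user-container` container
name is an assumption that may differ between releases.

```shell
# Find your application's pod, then tail the app container's logs.
kubectl get pods
kubectl logs <pod-name> -c user-container
```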
## Check Route status

Run the following command to get the `status` of the `Route` object with which
you deployed your application:

```shell
kubectl get route <route-name> -o yaml
```

The `conditions` in `status` provide the reason if there is any failure. For
details, see the Knative
[Error Conditions and Reporting](../spec/errors.md) documentation (currently
some of them are not implemented yet).
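To pull out just the conditions instead of the full object, a `jsonpath` query
such as this sketch works (the `.status.conditions` path mirrors the YAML
returned by the command above):

```shell
# Print only the status conditions of the Route.
kubectl get route <route-name> -o jsonpath="{.status.conditions}"
```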
### Check Istio routing

Compare your Knative `Route` object's configuration (obtained in the previous
step) to the Istio `RouteRule` object's configuration.

Enter the following, replacing `<routerule-name>` with the appropriate value:

```shell
kubectl get routerule <routerule-name> -o yaml
```

If you don't know the name of your route rule, use the
`kubectl get routerule` command to find it.

The command returns the configuration of your route rule. Compare the domains
between your route and route rule; they should match.
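As a rough shortcut for that comparison, the following sketch surfaces the
domain fields from both objects; it assumes the `Route` publishes its domain
under `status.domain`, which may vary by release, and uses `grep` only as a
coarse filter:

```shell
# Domain served by the Knative Route (assumes status.domain is populated).
kubectl get route <route-name> -o jsonpath="{.status.domain}"

# Coarse filter for domain-like fields in the Istio RouteRule.
kubectl get routerule <routerule-name> -o yaml | grep -i domain
```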
### Check ingress status

Enter:

```shell
kubectl get ingress
```

The command returns the status of the ingress. You can see the name, age,
domains, and IP address.
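The output looks roughly like this sketch; the name, host, and address below
are placeholder values, not output from a real cluster:

```shell
kubectl get ingress
# NAME                    HOSTS                               ADDRESS      PORTS     AGE
# route-example-ingress   route-example.default.example.com   35.201.x.x   80        10m
```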
## Check Revision status

If you configured your `Route` with `Configuration`, run the following
command to get the name of the `Revision` created for your deployment
(look up the configuration name in the `Route` .yaml file):

```shell
kubectl get configuration <configuration-name> -o jsonpath="{.status.latestCreatedRevisionName}"
```

@@ -64,19 +117,19 @@

```
conditions:
  ...
  type: Ready
```

If you see this condition, check the following to continue debugging:

* [Check Pod status](#check-pod-status)
* [Check application logs](#check-application-logs)
* [Check Istio routing](#check-istio-routing)

If you see other conditions, debug further as follows:

* Look up the meaning of the conditions in the Knative
  [Error Conditions and Reporting](../spec/errors.md) documentation. Note: some of them
  are not implemented yet. An alternative is to
  [check Pod status](#check-pod-status).
* If you are using `BUILD` to deploy and the `BuildComplete` condition is not
  `True`, [check BUILD status](#check-build-status).
## Check Pod status

@@ -112,36 +165,13 @@ your `Revision`:
```shell
kubectl get build $(kubectl get revision <revision-name> -o jsonpath="{.spec.buildName}") -o yaml
```

The `conditions` in `status` provide the reason if there is any failure. To
access build logs, first run `kubectl proxy` and then open the
[Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana).
Use any of the following filters within the Kibana UI to see build logs.
_(See the [telemetry guide](../telemetry.md) for more information on the
logging and monitoring features of Knative Serving.)_

* All build logs: `_exists_:"kubernetes.labels.build-name"`
* Build logs for a specific build: `kubernetes.labels.build-name:"<BUILD NAME>"`
* Build logs for a specific build and step: `kubernetes.labels.build-name:"<BUILD NAME>" AND kubernetes.container_name:"build-step-<BUILD STEP NAME>"`
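If you'd rather not go through Kibana, the same `build-name` label that the
filters above rely on can drive `kubectl` directly. This is a sketch and
assumes the build pods carry that label; because build pods run one container
per step, a step container must be named:

```shell
# Tail the logs of one build step for the pods labeled with your build's name.
kubectl logs -l build-name=<BUILD NAME> -c build-step-<BUILD STEP NAME>
```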
Two Kibana UI screenshots added under docs/images (177 KiB and 320 KiB; binary files not shown).
telemetry.md (198 changed lines)

@@ -1,47 +1,50 @@
# Logs and metrics

## Monitoring components setup

First, deploy the monitoring components.

### Elasticsearch, Kibana, Prometheus, and Grafana setup

You can use two different setups:

1. **150-elasticsearch-prod**: This configuration collects logs and metrics from
   user containers, the build controller, and Istio requests.
```shell
kubectl apply -R -f config/monitoring/100-common \
    -f config/monitoring/150-elasticsearch-prod \
    -f third_party/config/monitoring/common \
    -f third_party/config/monitoring/elasticsearch \
    -f config/monitoring/200-common \
    -f config/monitoring/200-common/100-istio.yaml
```
1. **150-elasticsearch-dev**: This configuration collects everything
   **150-elasticsearch-prod** does, plus Knative Serving controller logs.
```shell
kubectl apply -R -f config/monitoring/100-common \
    -f config/monitoring/150-elasticsearch-dev \
    -f third_party/config/monitoring/common \
    -f third_party/config/monitoring/elasticsearch \
    -f config/monitoring/200-common \
    -f config/monitoring/200-common/100-istio.yaml
```
### Stackdriver, Prometheus, and Grafana setup

If your Knative Serving is not built on a Google Cloud Platform based cluster,
or if you want to send logs to another GCP project, you need to build your own
Fluentd image and modify the configuration first. See:

1. [Fluentd image on Knative Serving](/image/fluentd/README.md)
2. [Setting up a logging plugin](setting-up-a-logging-plugin.md)

Then you can use two different setups:

1. **150-stackdriver-prod**: This configuration collects logs and metrics from
   user containers, the build controller, and Istio requests.
```shell
kubectl apply -R -f config/monitoring/100-common \
@@ -51,7 +54,8 @@
    -f config/monitoring/200-common/100-istio.yaml
```
2. **150-stackdriver-dev**: This configuration collects everything
   **150-stackdriver-prod** does, plus Knative Serving controller logs.
```shell
kubectl apply -R -f config/monitoring/100-common \
@@ -63,29 +67,55 @@
```
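Whichever setup you applied, you can sanity-check that the monitoring
components came up before moving on (the `monitoring` namespace matches the
commands above):

```shell
# All pods should eventually reach the Running state.
kubectl get pods -n monitoring
```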
## Accessing logs

### Kibana and Elasticsearch

To open the Kibana UI (the visualization tool for
[Elasticsearch](https://info.elastic.co)), enter the following command:

```shell
kubectl proxy
```

This starts a local proxy of Kibana on port 8001. The Kibana UI is only exposed
within the cluster for security reasons.

Navigate to the
[Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana)
(*it might take a couple of minutes for the proxy to work*).

When Kibana is opened the first time, it will ask you to create an index.
Accept the default options:

![Kibana UI Configuring an Index Pattern](images/kibana-landing-page-configure-index.png)

The Discover tab of the Kibana UI looks like this:

![Kibana UI Discover tab](images/kibana-discover-tab-annotated.png)

You can change the time frame of the logs Kibana displays in the upper right
corner of the screen. The main search bar is across the top of the Discover
page.

As more logs are ingested, new fields will be discovered. To have them indexed,
go to Management > Index Patterns > Refresh button (on top right) > Refresh
fields.

<!-- TODO: create a video walkthrough of the Kibana UI -->
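If the Kibana page doesn't load, first confirm that the service behind the
proxy path exists; the `kibana-logging` service name comes from the URL above:

```shell
# List the services in the monitoring namespace; kibana-logging should appear.
kubectl get svc -n monitoring
```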
#### Accessing configuration and revision logs

To access the logs for a configuration, enter the following search query in Kibana:

```
kubernetes.labels.knative_dev\/configuration: "configuration-example"
```

Replace `configuration-example` with your configuration's name. Enter the
following command to get your configuration's name:

```shell
kubectl get configurations
```

To access the logs for a revision, enter the following search query in Kibana:

```
kubernetes.labels.knative_dev\/revision: "configuration-example-00001"
```

@@ -95,13 +125,13 @@

Replace `configuration-example-00001` with your revision's name.
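To list candidate revision names for the query above, `kubectl` works directly
(the `revisions` resource is the same one used elsewhere in this guide):

```shell
# List all revisions; the Kibana query's revision name comes from here.
kubectl get revisions
```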

#### Accessing build logs

To access the logs for a build, enter the following search query in Kibana:

```
kubernetes.labels.build\-name: "test-build"
```

Replace `test-build` with your build's name. The build name is specified in the
`.yaml` file as follows:

```yaml
apiVersion: build.dev/v1alpha1
@@ -112,18 +142,19 @@ metadata:
```

### Stackdriver

Go to the [Google Cloud Console logging page](https://console.cloud.google.com/logs/viewer)
for the GCP project that stores your logs via Stackdriver.

## Accessing metrics

Enter:

```shell
kubectl port-forward -n monitoring $(kubectl get pods -n monitoring --selector=app=grafana --output=jsonpath="{.items..metadata.name}") 3000
```

Then open the Grafana UI at [http://localhost:3000](http://localhost:3000).
The following dashboards are pre-installed with Knative Serving:

* **Revision HTTP Requests:** HTTP request count, latency, and size metrics per revision and per configuration
* **Nodes:** CPU, memory, network, and disk metrics at node level

@@ -134,26 +165,46 @@

### Accessing per request traces

Before you can view per-request traces, you need to create a new index pattern
that will store the traces captured by Zipkin:

1. Start a local proxy on port 8001 by entering the following command:

```shell
kubectl proxy
```

1. Open the [Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana).

1. Navigate to Management > Index Patterns > Create Index Pattern.

1. Enter `zipkin*` in the "Index pattern" text field.

1. Click **Create**.

After you've created the Zipkin index pattern, open the
[Zipkin UI](http://localhost:8001/api/v1/namespaces/istio-system/services/zipkin:9411/proxy/zipkin/).
Click "Find Traces" to see the latest traces. You can search for a trace ID
or look at traces of a specific application. Click on a trace to see a detailed
view of a specific call.

To see a demo of distributed tracing, deploy the
[Telemetry sample](../sample/telemetrysample/README.md), send some traffic to it,
then explore the traces it generates from the Zipkin UI.

<!--TODO: Consider adding a video here. -->
## Default metrics

The following metrics are collected by default:

* Knative Serving controller metrics
* Istio metrics (mixer, envoy, and pilot)
* Node and pod metrics

There are several other collectors that are pre-configured but not enabled.
To see the full list, browse to the `config/monitoring/prometheus-exporter`
and `config/monitoring/prometheus-servicemonitor` folders, and deploy them
using `kubectl apply -f`.
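For example, to enable one of those pre-configured collectors (the file name
below is an illustrative placeholder; pick a real one from the folders just
mentioned):

```shell
# Deploy a single additional collector configuration.
kubectl apply -f config/monitoring/prometheus-exporter/<collector>.yaml
```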
## Default logs

@@ -166,26 +217,29 @@

To enable log collection from other containers and destinations, see
[setting up a logging plugin](setting-up-a-logging-plugin.md).
## Metrics troubleshooting

You can use the Prometheus web UI to troubleshoot publishing and service
discovery issues for metrics. To access the web UI, forward the Prometheus
server to your machine:

```shell
kubectl port-forward -n monitoring $(kubectl get pods -n monitoring --selector=app=prometheus --output=jsonpath="{.items[0].metadata.name}") 9090
```

Then browse to http://localhost:9090 to access the UI.

* To see the targets that are being scraped, go to Status > Targets.
* To see what Prometheus service discovery is picking up vs. dropping, go to Status > Service Discovery.
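The same target information is also exposed over Prometheus's HTTP API in
recent versions, if you prefer the command line:

```shell
# With the port-forward above still running, list the scrape targets as JSON.
curl -s http://localhost:9090/api/v1/targets
```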
## Generating metrics

If you want to send metrics from your controller, follow the steps below. These
steps are already applied to the autoscaler and controller. For those controllers,
simply add your new metric definitions to the `view`, create new `tag.Key`s if
necessary, and instrument your code as described in step 3.

In the example below, we will set up the service to host the metrics and
instrument a sample 'Gauge' type metric using the setup.

1. First, go through the [OpenCensus Go Documentation](https://godoc.org/go.opencensus.io).
2. Add the following to your application startup:
@@ -249,7 +303,9 @@

```go
func main() {
	...
	http.ListenAndServe(":8080", mux)
}
```
3. In your code where you want to instrument, set the counter with the
   appropriate label values, for example:

```go
ctx := context.TODO()
@@ -260,7 +316,8 @@ tag.New(
stats.Record(ctx, desiredPodCountM.M({Measurement Value}))
```

4. Add the following to the scrape config file located at
   `config/monitoring/200-common/300-prometheus/100-scrape-config.yaml`:
```yaml
- job_name: <YOUR SERVICE NAME>
```

@@ -297,29 +354,36 @@ kubectl apply -f config/monitoring/200-common/300-prometheus
6. Add a dashboard for your metrics; you can see examples under the
   `config/grafana/dashboard-definition` folder. An easy way to generate JSON
   definitions is to use the Grafana UI (make sure to log in as the admin user)
   and [export JSON](http://docs.grafana.org/reference/export_import) from it.

7. Validate the metrics flow either via the Grafana UI or the Prometheus UI
   (see the Metrics troubleshooting section above to enable the Prometheus UI).
## Generating logs

<!--TODO: Explain why we recommend using glog. -->
Use [glog](https://godoc.org/github.com/golang/glog) to write logs in your code.
In your container spec, add the following arguments to redirect the logs to stderr:

```yaml
args:
- "-logtostderr=true"
- "-stderrthreshold=INFO"
```

See the [helloworld](../sample/helloworld/README.md) sample's configuration file
for an example.

## Distributed tracing with Zipkin

See the [Telemetry sample](../sample/telemetrysample/README.md) for an example usage of
[OpenZipkin](https://zipkin.io/pages/existing_instrumentations)'s Go client library.
## Delete monitoring components

Enter:

```shell
ko delete --ignore-not-found=true \
    -f config/monitoring/200-common/100-istio.yaml \
    -f config/monitoring/200-common/100-zipkin.yaml \
    -f config/monitoring/100-common
```