Initial checkin of telemetry ops guide pages. (#2022)

This commit is contained in:
Martin Taillefer 2018-07-30 08:43:33 -07:00 committed by GitHub
parent 822e842599
commit 5f443905bf
4 changed files with 252 additions and 277 deletions


@@ -0,0 +1,50 @@
---
title: Component Debugging
description: How to do low-level debugging of Istio components.
weight: 25
---
You can gain insights into what individual components are doing by inspecting their [logs](/help/ops/component-logging/)
or peering inside via [introspection](/help/ops/controlz/). If that's insufficient, the steps below explain
how to get under the hood.
## With `istioctl`
`istioctl` allows you to inspect the current xDS of a given Envoy from its admin interface (locally) or from Pilot using the `proxy-config` or `pc` command.
For example, to retrieve the configured clusters in an Envoy via the admin interface run the following command:
{{< text bash >}}
$ istioctl proxy-config endpoint <pod-name> clusters
{{< /text >}}
To retrieve endpoints for a given pod in the application namespace from Pilot, run the following command:
{{< text bash >}}
$ istioctl proxy-config pilot -n application <pod-name> eds
{{< /text >}}
The `proxy-config` command also allows you to retrieve the state of the entire mesh from Pilot using the following command:
{{< text bash >}}
$ istioctl proxy-config pilot mesh ads
{{< /text >}}
## With GDB
To debug Istio with `gdb`, you will need to run the debug images of Envoy / Mixer / Pilot. A recent `gdb` and the golang extensions (for Mixer/Pilot or other golang components) are required. A sample session is sketched after the steps below.
1. `kubectl exec -it PODNAME -c [proxy | mixer | pilot]`
1. Find the process ID: `ps ax`
1. Attach: `gdb -p PID binary`
1. For Go: `info goroutines`, `goroutine x bt`
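For example, a session against Pilot might look like the following. This is only a sketch: the container name, binary path, and process ID will differ per deployment and release, so substitute your own values.

{{< text bash >}}
$ kubectl exec -it PODNAME -n istio-system -c pilot /bin/bash
$ ps ax | grep pilot-discovery
$ gdb -p <PID> /usr/local/bin/pilot-discovery
{{< /text >}}

Once attached, `info goroutines` lists the goroutines and `goroutine <n> bt` prints a backtrace for a specific one.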
## With Tcpdump
Tcpdump doesn't work in the sidecar pod - the container doesn't run as root. However, any other container in the same pod will see all the packets, since the
network namespace is shared. `iptables` will also see the pod-wide configuration.
Communication between Envoy and the app happens on 127.0.0.1, and is not encrypted.
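For example, assuming your application container happens to include `tcpdump` (many images don't, so treat this as a sketch and substitute your own container name), you can capture that unencrypted loopback traffic from the application container:

{{< text bash >}}
$ kubectl exec -it <pod-name> -c <app-container> -- tcpdump -i lo -n
{{< /text >}}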


@@ -169,7 +169,7 @@ Verifying connectivity to Pilot is a useful troubleshooting step. Every proxy co
$ kubectl exec -it $INGRESS_POD_NAME -n istio-system /bin/bash
{{< /text >}}
1. Test connectivity to Pilot using `curl`. The following example invokes the v1 registration API using default Pilot configuration parameters and mutual TLS enabled:
{{< text bash >}}
$ curl -k --cert /etc/certs/cert-chain.pem --cacert /etc/certs/root-cert.pem --key /etc/certs/key.pem https://istio-pilot:15003/v1/registration
@@ -232,281 +232,6 @@ server {
}
{{< /text >}}
## No Grafana output when connecting from a local web client to remotely hosted Istio
Validate the client and server date and time match.
The time of the web client (e.g. Chrome) affects the output from Grafana. A simple solution
to this problem is to verify that a time synchronization service is running correctly within the
Kubernetes cluster and that the web client machine is also correctly using a time synchronization
service. Some common time synchronization systems are NTP and Chrony. This is especially
problematic in engineering labs with firewalls. In these scenarios, NTP may not be configured
properly to point at the lab-based NTP services.
## Where are the metrics for my service?
The expected flow of metrics is:
1. Envoy reports attributes to Mixer in batch (asynchronously from requests)
1. Mixer translates the attributes into instances based on
operator-provided configuration.
1. The instances are handed to Mixer adapters for processing and backend storage.
1. The backend storage systems record metrics data.
The default installations of Mixer ship with a [Prometheus](https://prometheus.io/)
adapter, as well as configuration for generating a basic set of metric
values and sending them to the Prometheus adapter. The
[Prometheus add-on](/docs/tasks/telemetry/querying-metrics/#about-the-prometheus-add-on)
also supplies configuration for an instance of Prometheus to scrape
Mixer for metrics.
If you do not see the expected metrics in the Istio Dashboard and/or via
Prometheus queries, there may be an issue at any of the steps in the flow
listed above. Below is a set of instructions to troubleshoot each of
those steps.
### Verify Mixer is receiving Report calls
Mixer generates metrics for monitoring the behavior of Mixer itself.
Check these metrics.
1. Establish a connection to the Mixer self-monitoring endpoint.
In Kubernetes environments, execute the following command:
{{< text bash >}}
$ kubectl -n istio-system port-forward <mixer pod> 9093 &
{{< /text >}}
1. Verify successful report calls.
On the [Mixer self-monitoring endpoint](http://localhost:9093/metrics),
search for `grpc_server_handled_total`.
You should see something like:
{{< text plain >}}
grpc_server_handled_total{grpc_code="OK",grpc_method="Report",grpc_service="istio.mixer.v1.Mixer",grpc_type="unary"} 68
{{< /text >}}
If you do not see any data for `grpc_server_handled_total` with a
`grpc_method="Report"`, then Mixer is not being called by Envoy to report
telemetry. In this case, ensure that the services have been properly
integrated into the mesh (either via
[automatic](/docs/setup/kubernetes/sidecar-injection/#automatic-sidecar-injection)
or [manual](/docs/setup/kubernetes/sidecar-injection/#manual-sidecar-injection) sidecar injection).
### Verify Mixer metrics configuration exists
1. Verify Mixer rules exist.
In Kubernetes environments, issue the following command:
{{< text bash >}}
$ kubectl get rules --all-namespaces
NAMESPACE NAME KIND
istio-system promhttp rule.v1alpha2.config.istio.io
istio-system promtcp rule.v1alpha2.config.istio.io
istio-system stdio rule.v1alpha2.config.istio.io
{{< /text >}}
If you do not see anything named `promhttp` or `promtcp`, then there is
no Mixer configuration for sending metric instances to a Prometheus adapter.
You will need to supply configuration for rules that connect Mixer metric
instances to a Prometheus handler.
<!-- todo replace ([example](https://github.com/istio/istio/blob/master/install/kubernetes/istio.yaml#L892)). -->
1. Verify the Prometheus handler configuration exists.
In Kubernetes environments, issue the following command:
{{< text bash >}}
$ kubectl get prometheuses.config.istio.io --all-namespaces
NAMESPACE NAME KIND
istio-system handler prometheus.v1alpha2.config.istio.io
{{< /text >}}
If there are no prometheus handlers configured, you will need to reconfigure
Mixer with the appropriate handler configuration.
<!-- todo replace ([example](https://github.com/istio/istio/blob/master/install/kubernetes/istio.yaml#L819)) -->
1. Verify the Mixer metric instance configuration exists.
In Kubernetes environments, issue the following command:
{{< text bash >}}
$ kubectl get metrics.config.istio.io --all-namespaces
NAMESPACE NAME KIND
istio-system requestcount metric.v1alpha2.config.istio.io
istio-system requestduration metric.v1alpha2.config.istio.io
istio-system requestsize metric.v1alpha2.config.istio.io
istio-system responsesize metric.v1alpha2.config.istio.io
istio-system stackdriverrequestcount metric.v1alpha2.config.istio.io
istio-system stackdriverrequestduration metric.v1alpha2.config.istio.io
istio-system stackdriverrequestsize metric.v1alpha2.config.istio.io
istio-system stackdriverresponsesize metric.v1alpha2.config.istio.io
istio-system tcpbytereceived metric.v1alpha2.config.istio.io
istio-system tcpbytesent metric.v1alpha2.config.istio.io
{{< /text >}}
If there are no metric instances configured, you will need to reconfigure
Mixer with the appropriate instance configuration.
<!-- todo replace ([example](https://github.com/istio/istio/blob/master/install/kubernetes/istio.yaml#L727)) -->
1. Verify Mixer configuration resolution is working for your service.
1. Establish a connection to the Mixer self-monitoring endpoint.
Set up a `port-forward` to the Mixer self-monitoring port as described in
[Verify Mixer is receiving Report calls](#verify-mixer-is-receiving-report-calls).
1. On the [Mixer self-monitoring port](http://localhost:9093/metrics), search
for `mixer_config_resolve_count`.
You should find something like:
{{< text plain >}}
mixer_config_resolve_count{error="false",target="details.default.svc.cluster.local"} 56
mixer_config_resolve_count{error="false",target="ingress.istio-system.svc.cluster.local"} 67
mixer_config_resolve_count{error="false",target="mongodb.default.svc.cluster.local"} 18
mixer_config_resolve_count{error="false",target="productpage.default.svc.cluster.local"} 59
mixer_config_resolve_count{error="false",target="ratings.default.svc.cluster.local"} 26
mixer_config_resolve_count{error="false",target="reviews.default.svc.cluster.local"} 54
{{< /text >}}
1. Validate that there are values for `mixer_config_resolve_count` where
`target="<your service>"` and `error="false"`.
If there are only instances with `error="true"` for `target=<your service>`,
there is likely an issue with the Mixer configuration for your service. Log
information is needed to debug further.
In Kubernetes environments, retrieve the Mixer logs via:
{{< text bash >}}
$ kubectl -n istio-system logs <mixer pod> -c mixer
{{< /text >}}
Look for errors related to your configuration or your service in the
returned logs.
More on viewing Mixer configuration can be found [here](/help/faq/mixer/#mixer-self-monitoring). A command-line shortcut for checking these values is sketched below.
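With the same port-forward still active, you can filter for these values from the command line. This is merely a convenience and assumes `curl` is available locally:

{{< text bash >}}
$ curl -s http://localhost:9093/metrics | grep mixer_config_resolve_count
{{< /text >}}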
### Verify Mixer is sending metric instances to the Prometheus adapter
1. Establish a connection to the Mixer self-monitoring endpoint.
Set up a `port-forward` to the Mixer self-monitoring port as described in
[Verify Mixer is receiving Report calls](#verify-mixer-is-receiving-report-calls).
1. On the [Mixer self-monitoring port](http://localhost:9093/metrics), search
for `mixer_adapter_dispatch_count`.
You should find something like:
{{< text plain >}}
mixer_adapter_dispatch_count{adapter="prometheus",error="false",handler="handler.prometheus.istio-system",meshFunction="metric",response_code="OK"} 114
mixer_adapter_dispatch_count{adapter="prometheus",error="true",handler="handler.prometheus.default",meshFunction="metric",response_code="INTERNAL"} 4
mixer_adapter_dispatch_count{adapter="stdio",error="false",handler="handler.stdio.istio-system",meshFunction="logentry",response_code="OK"} 104
{{< /text >}}
1. Validate that there are values for `mixer_adapter_dispatch_count` where
`adapter="prometheus"` and `error="false"`.
If there are no recorded dispatches to the Prometheus adapter, there
is likely a configuration issue. Please see
[Verify Mixer metrics configuration exists](#verify-mixer-metrics-configuration-exists).
If dispatches to the Prometheus adapter are reporting errors, check the
Mixer logs to determine the source of the error. Most likely, there is a
configuration issue for the handler listed in `mixer_adapter_dispatch_count`.
In Kubernetes environments, check the Mixer logs via:
{{< text bash >}}
$ kubectl -n istio-system logs <mixer pod> -c mixer
{{< /text >}}
Filter for lines including something like `Report 0 returned with: INTERNAL
(1 error occurred:` (with some surrounding context) to find more information
regarding Report dispatch failures.
### Verify Prometheus configuration
1. Connect to the Prometheus UI and verify that it can successfully
scrape Mixer.
In Kubernetes environments, set up port-forwarding as follows:
{{< text bash >}}
$ kubectl -n istio-system port-forward $(kubectl -n istio-system get pod -l app=prometheus -o jsonpath='{.items[0].metadata.name}') 9090:9090 &
{{< /text >}}
1. Visit [http://localhost:9090/targets](http://localhost:9090/targets) and confirm that the target `istio-mesh` has a status of **UP**.
1. Visit [http://localhost:9090/config](http://localhost:9090/config) and confirm that an entry exists that looks like:
{{< text yaml >}}
- job_name: istio-mesh
scrape_interval: 5s
scrape_timeout: 5s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- api_server: null
role: endpoints
namespaces:
names: []
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
separator: ;
regex: istio-system;istio-telemetry;prometheus
replacement: $1
action: keep
{{< /text >}}
## How can I debug issues with the service mesh?
### With `istioctl`
`istioctl` allows you to inspect the current xDS of a given Envoy from its admin interface (locally) or from Pilot using the `proxy-config` or `pc` command.
For example, to retrieve the configured clusters in an Envoy via the admin interface run the following command:
{{< text bash >}}
$ istioctl proxy-config endpoint <pod-name> clusters
{{< /text >}}
To retrieve endpoints for a given pod in the application namespace from Pilot, run the following command:
{{< text bash >}}
$ istioctl proxy-config pilot -n application <pod-name> eds
{{< /text >}}
The `proxy-config` command also allows you to retrieve the state of the entire mesh from Pilot using the following command:
{{< text bash >}}
$ istioctl proxy-config pilot mesh ads
{{< /text >}}
### With GDB
To debug Istio with `gdb`, you will need to run the debug images of Envoy / Mixer / Pilot. A recent `gdb` and the golang extensions (for Mixer/Pilot or other golang components) are required.
1. `kubectl exec -it PODNAME -c [proxy | mixer | pilot]`
1. Find the process ID: `ps ax`
1. Attach: `gdb -p PID binary`
1. For Go: `info goroutines`, `goroutine x bt`
### With Tcpdump
Tcpdump doesn't work in the sidecar pod - the container doesn't run as root. However, any other container in the same pod will see all the packets, since the network namespace is shared. `iptables` will also see the pod-wide configuration.
Communication between Envoy and the app happens on 127.0.0.1, and is not encrypted.
## Envoy is crashing under load
Check your `ulimit -a`. Many systems have a 1024 open file descriptor limit by default, which will cause Envoy to assert and crash with:
@@ -541,7 +266,7 @@ $ kubectl scale --replicas=0 deploy/istio-citadel -n istio-system
This should stop Istio from restarting Envoy and disconnecting TCP connections.
## Envoy has high CPU usage
For larger clusters, the default configuration that comes with Istio
refreshes the Envoy configuration every 1 second. This can cause high


@@ -0,0 +1,15 @@
---
title: Grafana
description: Dealing with Grafana issues.
weight: 90
---
If you're unable to get Grafana output when connecting from a local web client to remotely hosted Istio, you
should validate that the client and server date and time match.
The time of the web client (e.g. Chrome) affects the output from Grafana. A simple solution
to this problem is to verify that a time synchronization service is running correctly within the
Kubernetes cluster and that the web client machine is also correctly using a time synchronization
service. Some common time synchronization systems are NTP and Chrony. This is especially
problematic in engineering labs with firewalls. In these scenarios, NTP may not be configured
properly to point at the lab-based NTP services.
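For example, you can compare the web client machine's clock with a clock inside the cluster. This is a sketch only; it assumes the Grafana pod carries the `app=grafana` label, which may differ in your installation:

{{< text bash >}}
$ date -u
$ kubectl -n istio-system exec $(kubectl -n istio-system get pod -l app=grafana -o jsonpath='{.items[0].metadata.name}') -- date -u
{{< /text >}}

A significant difference between the two timestamps points to a time synchronization problem.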


@@ -0,0 +1,185 @@
---
title: Missing Metrics
description: Dealing with missing metrics.
weight: 10
---
The procedures below help you diagnose problems where metrics
you are expecting to see reported are not being collected.
The expected flow for metrics is:
1. Envoy reports attributes from requests asynchronously to Mixer in a batch.
1. Mixer translates the attributes into instances based on the operator-provided configuration.
1. Mixer hands the instances to Mixer adapters for processing and backend storage.
1. The backend storage systems record the metrics data.
The default Mixer installation includes a Prometheus adapter and the configuration to generate a [default set of metric values](https://istio.io/docs/reference/config/policy-and-telemetry/metrics/) and send them to the Prometheus adapter. The Prometheus adapter configuration enables a Prometheus instance to scrape Mixer for metrics.
If the Istio Dashboard or the Prometheus queries don't show the expected metrics, any step of the flow above may present an issue. The following sections provide instructions to troubleshoot each step.
## Verify Mixer is receiving Report calls
Mixer generates metrics to monitor its own behavior. The first step is to check these metrics:
1. Establish a connection to the Mixer self-monitoring endpoint for the istio-telemetry deployment. In Kubernetes environments, execute the following command:
{{< text bash >}}
$ kubectl -n istio-system port-forward <istio-telemetry pod> 9093 &
{{< /text >}}
1. Verify successful report calls. On the Mixer self-monitoring endpoint, search for `grpc_server_handled_total`. You should see something like:
{{< text plain >}}
grpc_server_handled_total{grpc_code="OK",grpc_method="Report",grpc_service="istio.mixer.v1.Mixer",grpc_type="unary"} 68
{{< /text >}}
If you do not see any data for `grpc_server_handled_total` with a `grpc_method="Report"`, then Envoy is not calling Mixer to report telemetry.
1. In this case, ensure you integrated the services properly into the mesh. You can achieve this task with either [automatic or manual sidecar injection](/docs/setup/kubernetes/sidecar-injection/).
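With the port-forward from the first step still active, you can also check the `grpc_server_handled_total` counter from the command line. This is only a convenience and assumes `curl` is available on your machine:

{{< text bash >}}
$ curl -s http://localhost:9093/metrics | grep grpc_server_handled_total
{{< /text >}}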
## Verify the Mixer rules exist
In Kubernetes environments, issue the following command:
{{< text bash >}}
$ kubectl get rules --all-namespaces
NAMESPACE NAME AGE
istio-system kubeattrgenrulerule 13d
istio-system promhttp 13d
istio-system promtcp 13d
istio-system stdio 13d
istio-system tcpkubeattrgenrulerule 13d
{{< /text >}}
If the output shows no rules named `promhttp` or `promtcp`, then the Mixer configuration for sending metric instances to the Prometheus adapter is missing. You must supply the configuration for rules connecting the Mixer metric instances to a Prometheus handler.
For reference, please consult the [default rules for Prometheus]({{< github_file >}}/install/kubernetes/helm/istio/charts/mixer/templates/config.yaml).
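If the rules do exist but metrics are still missing, inspecting one of them can help confirm it references the instances and handler you expect. For example (the exact contents will vary with your installation):

{{< text bash >}}
$ kubectl -n istio-system get rules.config.istio.io promhttp -o yaml
{{< /text >}}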
## Verify the Prometheus handler configuration exists
1. In Kubernetes environments, issue the following command:
{{< text bash >}}
$ kubectl get prometheuses.config.istio.io --all-namespaces
NAMESPACE NAME AGE
istio-system handler 13d
{{< /text >}}
1. If the output shows no configured Prometheus handlers, you must reconfigure Mixer with the appropriate handler configuration.
For reference, please consult the [default handler configuration for Prometheus]({{< github_file >}}/install/kubernetes/helm/istio/charts/mixer/templates/config.yaml).
## Verify Mixer metric instances configuration exists
1. In Kubernetes environments, issue the following command:
{{< text bash >}}
$ kubectl get metrics.config.istio.io --all-namespaces
NAMESPACE NAME AGE
istio-system requestcount 13d
istio-system requestduration 13d
istio-system requestsize 13d
istio-system responsesize 13d
istio-system tcpbytereceived 13d
istio-system tcpbytesent 13d
{{< /text >}}
1. If the output shows no configured metric instances, you must reconfigure Mixer with the appropriate instance configuration.
For reference, please consult the [default instances configuration for metrics]({{< github_file >}}/install/kubernetes/helm/istio/charts/mixer/templates/config.yaml).
## Verify there are no known configuration errors
1. To establish a connection to the istio-telemetry self-monitoring endpoint, set up a port-forward to the istio-telemetry self-monitoring port as described in
[Verify Mixer is receiving Report calls](#verify-mixer-is-receiving-report-calls).
1. For each of the following metrics, verify that the most up-to-date value is 0:
* `mixer_config_adapter_info_config_error_count`
* `mixer_config_handler_validation_error_count`
* `mixer_config_instance_config_error_count`
* `mixer_config_rule_config_error_count`
* `mixer_config_rule_config_match_error_count`
* `mixer_config_unsatisfied_action_handler_count`
* `mixer_handler_handler_build_failure_count`
On the Mixer self-monitoring port, search for each of the metrics listed above. The matching text should look something like
the following (using `mixer_config_rule_config_match_error_count`):
{{< text plain >}}
mixer_config_rule_config_match_error_count{configID="-1"} 0
mixer_config_rule_config_match_error_count{configID="0"} 0
mixer_config_rule_config_match_error_count{configID="1"} 0
{{< /text >}}
Confirm that the metric value with the largest configuration ID is 0. This will verify that Mixer has generated no errors in processing the configuration as supplied.
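With the port-forward in place, a quick way to check all of these counters at once from the command line (a convenience only, assuming `curl` is available) is:

{{< text bash >}}
$ curl -s http://localhost:9093/metrics | grep -E 'error_count|unsatisfied_action|handler_build_failure'
{{< /text >}}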
## Verify Mixer is sending metric instances to the Prometheus adapter
1. Establish a connection to the istio-telemetry self-monitoring endpoint. Set up a port-forward to the istio-telemetry self-monitoring port as described in
[Verify Mixer is receiving Report calls](#verify-mixer-is-receiving-report-calls).
1. On the Mixer self-monitoring port, search for `mixer_runtime_dispatch_count`. The output should be similar to:
{{< text plain >}}
mixer_runtime_dispatch_count{adapter="prometheus",error="false",handler="handler.prometheus.istio-system",meshFunction="metric"} 916
mixer_runtime_dispatch_count{adapter="prometheus",error="true",handler="handler.prometheus.istio-system",meshFunction="metric"} 0
{{< /text >}}
1. Confirm that `mixer_runtime_dispatch_count` is present with the values:
{{< text plain >}}
adapter="prometheus"
error="false"
{{< /text >}}
If you can't find recorded dispatches to the Prometheus adapter, there is likely a configuration issue. Please follow the steps above
to ensure everything is configured properly.
If the dispatches to the Prometheus adapter report errors, check the Mixer logs to determine the source of the error. The most likely cause is a configuration issue for the handler listed in `mixer_runtime_dispatch_count`.
1. Check the Mixer logs in a Kubernetes environment with:
{{< text bash >}}
$ kubectl -n istio-system logs <istio-telemetry pod> -c mixer
{{< /text >}}
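To narrow the output down to likely dispatch failures, you can filter the logs, for example with `grep -i error`. The exact message format varies between releases, so treat the filter as a starting point only:

{{< text bash >}}
$ kubectl -n istio-system logs <istio-telemetry pod> -c mixer | grep -i error
{{< /text >}}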
## Verify Prometheus configuration
1. Connect to the Prometheus UI.
1. Verify you can successfully scrape Mixer through the UI.
1. In Kubernetes environments, set up port-forwarding with:
{{< text bash >}}
$ kubectl -n istio-system port-forward $(kubectl -n istio-system get pod -l app=prometheus -o jsonpath='{.items[0].metadata.name}') 9090:9090 &
{{< /text >}}
1. Visit `http://localhost:9090/targets`
1. Confirm the target `istio-mesh` has a status of UP.
1. Visit `http://localhost:9090/config`
1. Confirm an entry exists similar to:
{{< text plain >}}
- job_name: 'istio-mesh'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
static_configs:
- targets: ['istio-mixer.istio-system:42422']
{{< /text >}}
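While the port-forward is active, you can also confirm the scrape target from the command line through the Prometheus HTTP API. This is just a convenience check, assuming `curl` is available:

{{< text bash >}}
$ curl -s http://localhost:9090/api/v1/targets | grep istio-mesh
{{< /text >}}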