# Investigating Performance Issues

You deployed your application or function to Knative Serving but its performance
doesn't meet your expectations. Knative Serving provides various dashboards and
tools to help investigate such issues. This document reviews these dashboards
and tools.

## Request metrics

Start your investigation with the "Revision - HTTP Requests" dashboard.

1. To open this dashboard, open the Grafana UI as described in
   [Accessing Metrics](./accessing-metrics.md) and navigate to "Knative
   Serving - Revision HTTP Requests". (A port-forward sketch for reaching
   Grafana follows the screenshot below.)

1. Select your configuration and revision from the menu on the top left of the
   page. You will see a page like this:

   

This dashboard gives visibility into the following for each revision:

- Request volume
- Request volume per HTTP response code
- Response time
- Response time per HTTP response code
- Request and response sizes

This dashboard can show traffic volume or latency discrepancies between
different revisions. If, for example, a revision's latency is higher than that
of other revisions, then focus your investigation on the offending revision
through the rest of this guide.

## Request traces

Next, look into request traces to find out where the time is spent for a single
request.

1. To access request traces, open the Zipkin UI as described in
   [Accessing Traces](./accessing-traces.md).

1. Select your revision from the "Service Name" dropdown, and then click the
   "Find Traces" button. You'll get a view that looks like this:

   

   In this example, you can see that the request spent most of its time in the
   [span](https://github.com/opentracing/specification/blob/master/specification.md#the-opentracing-data-model)
   right before the last, so focus your investigation on that specific span.

1. Click that span to see a view like the following:

   

   This view shows detailed information about the specific span, such as the
   microservice or external URL that was called. In this example, the call to a
   Grafana URL is taking the most time. Focus your investigation on why that
   URL is taking that long.
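
If you prefer the command line, the same traces can be pulled from Zipkin's HTTP
API. A hedged sketch, assuming Zipkin is reachable on `localhost:9411` (for
example through a port-forward or `kubectl proxy` as described in
[Accessing Traces](./accessing-traces.md)) and using a placeholder service name:

```shell
# List up to 10 recent traces for a service; the service name is a placeholder.
curl "http://localhost:9411/api/v2/traces?serviceName=my-revision-service&limit=10"
```

Each returned trace is a JSON array of spans, so the slow span can also be
spotted by comparing span durations in the output.
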
## Autoscaler metrics

If request metrics or traces do not show any obvious hot spots, or if they show
that most of the time is spent in your own code, look at autoscaler metrics
next.

1. To open the autoscaler dashboard, open the Grafana UI and select the
   "Knative Serving - Autoscaler" dashboard, which looks like this:

   

This view shows 4 key metrics from the Knative Serving autoscaler:

- Actual pod count: # of pods that are running a given revision
- Desired pod count: # of pods that the autoscaler thinks should serve the
  revision
- Requested pod count: # of pods that the autoscaler requested from Kubernetes
- Panic mode: If 0, the autoscaler is operating in
  [stable mode](https://github.com/knative/serving/blob/master/docs/scaling/DEVELOPMENT.md#stable-mode).
  If 1, the autoscaler is operating in
  [panic mode](https://github.com/knative/serving/blob/master/docs/scaling/DEVELOPMENT.md#panic-mode).

A large gap between the actual pod count and the requested pod count indicates
that the Kubernetes cluster is unable to allocate new resources fast enough, or
that the cluster has run out of the requested resources.

A large gap between the requested pod count and the desired pod count indicates
that the Knative Serving autoscaler is unable to communicate with the Kubernetes
master to make the request.

In the preceding example, the autoscaler requested 18 pods to optimally serve
the traffic but was granted only 8 because the cluster was out of resources.
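
To confirm such a gap outside Grafana, you can list the pods behind the revision
directly; pods stuck in `Pending` usually point at a cluster that is out of
resources. A minimal sketch, assuming the standard
`serving.knative.dev/revision` label and placeholder namespace and revision
names:

```shell
# List the pods for one revision and check how many are actually Running.
# The namespace and revision name are placeholders.
kubectl get pods --namespace default \
  --selector=serving.knative.dev/revision=my-app-00001

# Describe a Pending pod to see why the scheduler could not place it.
kubectl describe pod <pending-pod-name> --namespace default
```
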
## CPU and memory usage

You can access total CPU and memory usage of your revision from the "Knative
Serving - Revision CPU and Memory Usage" dashboard, which looks like this:



The first chart shows the rate of CPU usage across all pods serving the
revision. The second chart shows total memory consumed across all pods serving
the revision. Both of these metrics are further divided into per-container
usage (a command-line alternative is sketched after the list below).

- user-container: This container runs the user code (application, function, or
  container).
- [istio-proxy](https://github.com/istio/proxy): Sidecar container to form an
  [Istio](https://istio.io/docs/concepts/what-is-istio/overview.html) mesh.
- queue-proxy: Knative Serving-owned sidecar container to enforce request
  concurrency limits.
- autoscaler: Knative Serving-owned sidecar container to provide autoscaling
  for the revision.
- fluentd-proxy: Sidecar container to collect logs from /var/log.
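
The same per-container numbers can be sampled from the command line with
`kubectl top`, provided a metrics backend (such as metrics-server or Heapster)
is installed in the cluster. A minimal sketch, with placeholder namespace and
revision values:

```shell
# Show CPU and memory usage broken down by container for the revision's pods.
# The namespace and revision name are placeholders.
kubectl top pod --namespace default \
  --selector=serving.knative.dev/revision=my-app-00001 --containers
```
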

## Profiling

...To be filled...

---

Except as otherwise noted, the content of this page is licensed under the
[Creative Commons Attribution 4.0 License](https://creativecommons.org/licenses/by/4.0/),
and code samples are licensed under the
[Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).