---
title: Troubleshooting Guide
description: Advice on tackling common problems with Istio
weight: 40
force_inline_toc: true
draft: true
---

Below is a list of solutions to common problems.

## 503 errors while reconfiguring service routes

When setting route rules to direct traffic to specific versions (subsets) of a service, care must be taken to ensure
that the subsets are available before they are used in the routes. Otherwise, calls to the service may return
503 errors during a reconfiguration period.

Creating both the `VirtualServices` and `DestinationRules` that define the corresponding subsets using a single `istioctl`
call (e.g., `istioctl create -f myVirtualServiceAndDestinationRule.yaml`) is not sufficient because the
resources propagate (from the configuration server, i.e., Kubernetes API server) to the Pilot instances in an eventually consistent manner. If the
`VirtualService` using the subsets arrives before the `DestinationRule` where the subsets are defined, the Envoy configuration generated by Pilot would refer to non-existent upstream pools, resulting in HTTP 503 errors until all configuration objects are available to Pilot.

To make sure services have zero downtime when configuring routes with subsets, follow a "make-before-break" process as described below:

* When adding new subsets:

    1. Update `DestinationRules` to add a new subset first, before updating any `VirtualServices` that use it. Apply the rule using `istioctl` or any platform-specific tooling.

    1. Wait a few seconds for the `DestinationRule` configuration to propagate to the Envoys.

    1. Update the `VirtualService` to refer to the newly added subsets.

* When removing subsets:

    1. Update `VirtualServices` to remove any references to a subset, before removing the subset from a `DestinationRule`.

    1. Wait a few seconds for the `VirtualService` configuration to propagate to the Envoys.

    1. Update the `DestinationRule` to remove the unused subsets.

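For example, the add-subset case can be sketched as an ordered sequence of `istioctl` calls (the file names here are hypothetical placeholders for your own configuration files):

{{< text bash >}}
$ istioctl replace -f my-destination-rule-with-new-subset.yaml
$ sleep 5  # allow the DestinationRule to propagate to the Envoys
$ istioctl replace -f my-virtual-service-using-new-subset.yaml
{{< /text >}}
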
## Route rules have no effect on ingress gateway requests

Let's assume you are using an ingress `Gateway` and corresponding `VirtualService` to access an internal service.
For example, your `VirtualService` looks something like this:

{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - "myapp.com" # or maybe "*" if you are testing without DNS using the ingress-gateway IP (e.g., http://1.2.3.4/hello)
  gateways:
  - myapp-gateway
  http:
  - match:
    - uri:
        prefix: /hello
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
  - match:
    ...
{{< /text >}}

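The `myapp-gateway` referenced above is assumed to be defined along these lines (a minimal sketch; the selector and port values depend on your installation):

{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: myapp-gateway
spec:
  selector:
    istio: ingressgateway # assumes the default ingress gateway deployment
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "myapp.com"
{{< /text >}}
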
You also have a `VirtualService` which routes traffic for the helloworld service to a particular subset:

{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: helloworld
spec:
  hosts:
  - helloworld.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1
{{< /text >}}

In this situation you will notice that requests to the helloworld service via the ingress gateway will
not be directed to subset v1 but instead will continue to use default round-robin routing.

The ingress requests use the gateway host (e.g., `myapp.com`),
which activates the rules in the myapp `VirtualService` that route to any endpoint of the helloworld service.
Internal requests with the host `helloworld.default.svc.cluster.local`, on the other hand, use the
helloworld `VirtualService`, which directs traffic exclusively to subset v1.

To control the traffic from the gateway, you need to include the subset rule in the myapp `VirtualService`:

{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - "myapp.com" # or maybe "*" if you are testing without DNS using the ingress-gateway IP (e.g., http://1.2.3.4/hello)
  gateways:
  - myapp-gateway
  http:
  - match:
    - uri:
        prefix: /hello
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1
  - match:
    ...
{{< /text >}}

Alternatively, you can combine both `VirtualServices` into one unit if possible:

{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp.com # cannot use "*" here since this is being combined with the mesh services
  - helloworld.default.svc.cluster.local
  gateways:
  - mesh # applies internally as well as externally
  - myapp-gateway
  http:
  - match:
    - uri:
        prefix: /hello
      gateways:
      - myapp-gateway # restricts this rule to apply only to the ingress gateway
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1
  - match:
    - gateways:
      - mesh # applies to all services inside the mesh
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1
{{< /text >}}

## Route rules have no effect on my application

If route rules are working perfectly for the [Bookinfo](/docs/examples/bookinfo/) sample,
but similar version routing rules have no effect on your own application, it may be that
your Kubernetes services need to be changed slightly.

Kubernetes services must adhere to certain restrictions in order to take advantage of
Istio's L7 routing features.
Refer to the [sidecar injection documentation](/docs/setup/kubernetes/sidecar-injection/#pod-spec-requirements)
for details.

## Verifying connectivity to Istio Pilot

Verifying connectivity to Pilot is a useful troubleshooting step. Every proxy container in the service mesh should be able to communicate with Pilot. This can be accomplished in a few simple steps:

1. Get the name of the Istio ingress gateway pod:

    {{< text bash >}}
    $ INGRESS_POD_NAME=$(kubectl get po -n istio-system | grep ingressgateway\- | awk '{print$1}'); echo ${INGRESS_POD_NAME}
    {{< /text >}}

1. Exec into the ingress gateway pod:

    {{< text bash >}}
    $ kubectl exec -it $INGRESS_POD_NAME -n istio-system /bin/bash
    {{< /text >}}

1. Test connectivity to Pilot using cURL. The following example calls the v1 registration API using the default Pilot configuration parameters with mutual TLS enabled:

    {{< text bash >}}
    $ curl -k --cert /etc/certs/cert-chain.pem --cacert /etc/certs/root-cert.pem --key /etc/certs/key.pem https://istio-pilot:15003/v1/registration
    {{< /text >}}

    If mutual TLS is disabled:

    {{< text bash >}}
    $ curl http://istio-pilot:15003/v1/registration
    {{< /text >}}

You should receive a response listing the "service-key" and "hosts" for each service in the mesh.

## No traces appearing in Zipkin when running Istio locally on Mac

Istio is installed and everything seems to be working, except there are no traces showing up in Zipkin when there
should be.

This may be caused by a known [Docker issue](https://github.com/docker/for-mac/issues/1260) where the time inside
containers may skew significantly from the time on the host machine. If this is the case,
when you select a very long date range in Zipkin you will see the traces appearing as much as several days too early.

You can confirm this problem by comparing the date inside a Docker container to the date outside:

{{< text bash >}}
$ docker run --entrypoint date gcr.io/istio-testing/ubuntu-16-04-slave:latest
Sun Jun 11 11:44:18 UTC 2017
{{< /text >}}

{{< text bash >}}
$ date -u
Thu Jun 15 02:25:42 UTC 2017
{{< /text >}}

To fix the problem, you'll need to shut down and then restart Docker before reinstalling Istio.

## Envoy won't connect to my HTTP/1.0 service

Envoy requires HTTP/1.1 or HTTP/2 traffic for upstream services. For example, when using [NGINX](https://www.nginx.com/) to serve traffic behind Envoy, you
will need to set the [proxy_http_version](https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_http_version) directive in your NGINX configuration to "1.1", since the NGINX default is 1.0.

Example configuration:

{{< text plain >}}
upstream http_backend {
    server 127.0.0.1:8080;

    keepalive 16;
}

server {
    ...

    location /http/ {
        proxy_pass http://http_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        ...
    }
}
{{< /text >}}

## No Grafana output when connecting from a local web client to Istio remotely hosted

Validate that the client and server date and time match.

The time of the web client (e.g. Chrome) affects the output from Grafana. A simple solution
to this problem is to verify that a time synchronization service is running correctly within the
Kubernetes cluster and that the web client machine is also correctly using a time synchronization
service. Some common time synchronization systems are NTP and Chrony. This is especially
problematic in engineering labs with firewalls. In these scenarios, NTP may not be configured
properly to point at the lab-based NTP services.

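A quick way to spot clock skew is to compare the time reported inside the cluster with the client's local time (a sketch; the pod selection assumes a Grafana pod labeled `app=grafana` in the `istio-system` namespace):

{{< text bash >}}
$ kubectl -n istio-system exec $(kubectl -n istio-system get pod -l app=grafana -o jsonpath='{.items[0].metadata.name}') -- date -u
$ date -u
{{< /text >}}

If the two timestamps differ by more than a few seconds, fix the time synchronization before debugging Grafana further.
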
## Where are the metrics for my service?

The expected flow of metrics is:

1. Envoy reports attributes to Mixer in batch (asynchronously from requests).

1. Mixer translates the attributes from Envoy into instances based on operator-provided configuration.

1. The instances are handed to Mixer adapters for processing and backend storage.

1. The backend storage systems record metrics data.

The default installations of Mixer ship with a [Prometheus](https://prometheus.io/)
adapter, as well as configuration for generating a basic set of metric
values and sending them to the Prometheus adapter. The
[Prometheus add-on](/docs/tasks/telemetry/querying-metrics/#about-the-prometheus-add-on)
also supplies configuration for an instance of Prometheus to scrape
Mixer for metrics.

If you do not see the expected metrics in the Istio Dashboard and/or via
Prometheus queries, there may be an issue at any of the steps in the flow
listed above. Below is a set of instructions to troubleshoot each of
those steps.

### Verify Mixer is receiving Report calls

Mixer generates metrics for monitoring the behavior of Mixer itself.
Check these metrics.

1. Establish a connection to the Mixer self-monitoring endpoint.

    In Kubernetes environments, execute the following command:

    {{< text bash >}}
    $ kubectl -n istio-system port-forward <mixer pod> 9093 &
    {{< /text >}}

1. Verify successful report calls.

    On the [Mixer self-monitoring endpoint](http://localhost:9093/metrics),
    search for `grpc_server_handled_total`.

    You should see something like:

    {{< text plain >}}
    grpc_server_handled_total{grpc_code="OK",grpc_method="Report",grpc_service="istio.mixer.v1.Mixer",grpc_type="unary"} 68
    {{< /text >}}

If you do not see any data for `grpc_server_handled_total` with
`grpc_method="Report"`, then Mixer is not being called by Envoy to report
telemetry. In this case, ensure that the services have been properly
integrated into the mesh (via either
[automatic](/docs/setup/kubernetes/sidecar-injection/#automatic-sidecar-injection)
or [manual](/docs/setup/kubernetes/sidecar-injection/#manual-sidecar-injection) sidecar injection).

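With the port-forward from step 1 in place, you can also check for this counter from the command line instead of the browser:

{{< text bash >}}
$ curl -s http://localhost:9093/metrics | grep grpc_server_handled_total
{{< /text >}}
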
### Verify Mixer metrics configuration exists

1. Verify Mixer rules exist.

    In Kubernetes environments, issue the following command:

    {{< text bash >}}
    $ kubectl get rules --all-namespaces
    NAMESPACE      NAME        KIND
    istio-system   promhttp    rule.v1alpha2.config.istio.io
    istio-system   promtcp     rule.v1alpha2.config.istio.io
    istio-system   stdio       rule.v1alpha2.config.istio.io
    {{< /text >}}

    If you do not see anything named `promhttp` or `promtcp`, then there is
    no Mixer configuration for sending metric instances to a Prometheus adapter.
    You will need to supply configuration for rules that connect Mixer metric
    instances to a Prometheus handler.

    <!-- todo replace ([example](https://github.com/istio/istio/blob/master/install/kubernetes/istio.yaml#L892)). -->

1. Verify the Prometheus handler configuration exists.

    In Kubernetes environments, issue the following command:

    {{< text bash >}}
    $ kubectl get prometheuses.config.istio.io --all-namespaces
    NAMESPACE      NAME      KIND
    istio-system   handler   prometheus.v1alpha2.config.istio.io
    {{< /text >}}

    If there are no Prometheus handlers configured, you will need to reconfigure
    Mixer with the appropriate handler configuration.

    <!-- todo replace ([example](https://github.com/istio/istio/blob/master/install/kubernetes/istio.yaml#L819)) -->

1. Verify Mixer metric instances configuration exists.

    In Kubernetes environments, issue the following command:

    {{< text bash >}}
    $ kubectl get metrics.config.istio.io --all-namespaces
    NAMESPACE      NAME                         KIND
    istio-system   requestcount                 metric.v1alpha2.config.istio.io
    istio-system   requestduration              metric.v1alpha2.config.istio.io
    istio-system   requestsize                  metric.v1alpha2.config.istio.io
    istio-system   responsesize                 metric.v1alpha2.config.istio.io
    istio-system   stackdriverrequestcount      metric.v1alpha2.config.istio.io
    istio-system   stackdriverrequestduration   metric.v1alpha2.config.istio.io
    istio-system   stackdriverrequestsize       metric.v1alpha2.config.istio.io
    istio-system   stackdriverresponsesize      metric.v1alpha2.config.istio.io
    istio-system   tcpbytereceived              metric.v1alpha2.config.istio.io
    istio-system   tcpbytesent                  metric.v1alpha2.config.istio.io
    {{< /text >}}

    If there are no metric instances configured, you will need to reconfigure
    Mixer with the appropriate instance configuration.

    <!-- todo replace ([example](https://github.com/istio/istio/blob/master/install/kubernetes/istio.yaml#L727)) -->

1. Verify Mixer configuration resolution is working for your service.

    1. Establish a connection to the Mixer self-monitoring endpoint.

        Set up a `port-forward` to the Mixer self-monitoring port as described in
        [Verify Mixer is receiving Report calls](#verify-mixer-is-receiving-report-calls).

    1. On the [Mixer self-monitoring port](http://localhost:9093/metrics), search
        for `mixer_config_resolve_count`.

        You should find something like:

        {{< text plain >}}
        mixer_config_resolve_count{error="false",target="details.default.svc.cluster.local"} 56
        mixer_config_resolve_count{error="false",target="ingress.istio-system.svc.cluster.local"} 67
        mixer_config_resolve_count{error="false",target="mongodb.default.svc.cluster.local"} 18
        mixer_config_resolve_count{error="false",target="productpage.default.svc.cluster.local"} 59
        mixer_config_resolve_count{error="false",target="ratings.default.svc.cluster.local"} 26
        mixer_config_resolve_count{error="false",target="reviews.default.svc.cluster.local"} 54
        {{< /text >}}

    1. Validate that there are values for `mixer_config_resolve_count` where
        `target="<your service>"` and `error="false"`.

        If there are only instances with `error="true"` where `target=<your service>`,
        there is likely an issue with the Mixer configuration for your service. Log
        information is needed to further debug.

        In Kubernetes environments, retrieve the Mixer logs via:

        {{< text bash >}}
        $ kubectl -n istio-system logs <mixer pod> -c mixer
        {{< /text >}}

        Look for errors related to your configuration or your service in the
        returned logs.

More on viewing Mixer configuration can be found [here](/help/faq/mixer/#mixer-self-monitoring).

### Verify Mixer is sending metric instances to the Prometheus adapter

1. Establish a connection to the Mixer self-monitoring endpoint.

    Set up a `port-forward` to the Mixer self-monitoring port as described in
    [Verify Mixer is receiving Report calls](#verify-mixer-is-receiving-report-calls).

1. On the [Mixer self-monitoring port](http://localhost:9093/metrics), search
    for `mixer_adapter_dispatch_count`.

    You should find something like:

    {{< text plain >}}
    mixer_adapter_dispatch_count{adapter="prometheus",error="false",handler="handler.prometheus.istio-system",meshFunction="metric",response_code="OK"} 114
    mixer_adapter_dispatch_count{adapter="prometheus",error="true",handler="handler.prometheus.default",meshFunction="metric",response_code="INTERNAL"} 4
    mixer_adapter_dispatch_count{adapter="stdio",error="false",handler="handler.stdio.istio-system",meshFunction="logentry",response_code="OK"} 104
    {{< /text >}}

1. Validate that there are values for `mixer_adapter_dispatch_count` where
    `adapter="prometheus"` and `error="false"`.

    If there are no recorded dispatches to the Prometheus adapter, there
    is likely a configuration issue. Please see
    [Verify Mixer metrics configuration exists](#verify-mixer-metrics-configuration-exists).

    If dispatches to the Prometheus adapter are reporting errors, check the
    Mixer logs to determine the source of the error. Most likely, there is a
    configuration issue for the handler listed in `mixer_adapter_dispatch_count`.

    In Kubernetes environments, check the Mixer logs via:

    {{< text bash >}}
    $ kubectl -n istio-system logs <mixer pod> -c mixer
    {{< /text >}}

    Filter for lines including something like `Report 0 returned with: INTERNAL
    (1 error occurred:` (with some surrounding context) to find more information
    regarding Report dispatch failures.

### Verify Prometheus configuration

1. Connect to the Prometheus UI and verify that it can successfully
    scrape Mixer.

    In Kubernetes environments, set up port-forwarding as follows:

    {{< text bash >}}
    $ kubectl -n istio-system port-forward $(kubectl -n istio-system get pod -l app=prometheus -o jsonpath='{.items[0].metadata.name}') 9090:9090 &
    {{< /text >}}

1. Visit [http://localhost:9090/targets](http://localhost:9090/targets) and confirm that the target `istio-mesh` has a status of **UP**.

1. Visit [http://localhost:9090/config](http://localhost:9090/config) and confirm that an entry exists that looks like:

    {{< text yaml >}}
    - job_name: istio-mesh
      scrape_interval: 5s
      scrape_timeout: 5s
      metrics_path: /metrics
      scheme: http
      kubernetes_sd_configs:
      - api_server: null
        role: endpoints
        namespaces:
          names: []
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: istio-system;istio-telemetry;prometheus
        replacement: $1
        action: keep
    {{< /text >}}

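If the target is up, you can also query for Istio metrics directly through the Prometheus HTTP API while the port-forward is active (a sketch; `istio_request_count` is the request metric name generated by the default Mixer configuration, so adjust the query if your installation differs):

{{< text bash >}}
$ curl -s 'http://localhost:9090/api/v1/query?query=istio_request_count'
{{< /text >}}

An empty result set here, combined with an **UP** target, points back to the Mixer configuration steps above rather than to Prometheus itself.
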
## How can I debug issues with the service mesh?

### With istioctl

Istioctl allows you to inspect the current xDS configuration of a given Envoy, either from its admin interface (locally) or from Pilot, using the `proxy-config` or `pc` command.

For example, to retrieve the configured clusters of an Envoy via its admin interface, run the following command:

{{< text bash >}}
$ istioctl proxy-config endpoint <pod-name> clusters
{{< /text >}}

To retrieve the endpoints for a given pod in the application namespace from Pilot, run the following command:

{{< text bash >}}
$ istioctl proxy-config pilot -n application <pod-name> eds
{{< /text >}}

The `proxy-config` command also allows you to retrieve the state of the entire mesh from Pilot using the following command:

{{< text bash >}}
$ istioctl proxy-config pilot mesh ads
{{< /text >}}

### With GDB

To debug Istio with `gdb`, you will need to run the debug images of Envoy / Mixer / Pilot. A recent `gdb` and the golang extensions (for Mixer/Pilot or other golang components) are required.

1. Exec into the container: `kubectl exec -it PODNAME -c [proxy | mixer | pilot]`

1. Find the process ID: `ps ax`

1. Attach the debugger: `gdb -p PID binary`

1. For go components, use `info goroutines` and `goroutine x bt`

### With Tcpdump

Tcpdump doesn't work in the sidecar pod because the container doesn't run as root. However, any other container in the same pod will see all the packets, since the network namespace is shared. `iptables` will also see the pod-wide configuration.

Communication between Envoy and the app happens on 127.0.0.1, and is not encrypted.

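For example, assuming `tcpdump` is available inside your application container and the app listens on port 9080 (both assumptions for illustration), you can capture the unencrypted Envoy-to-app traffic on the loopback interface with:

{{< text bash >}}
$ kubectl exec -it <pod-name> -c <app-container> -- tcpdump -i lo port 9080
{{< /text >}}
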
## Envoy is crashing under load

Check your `ulimit -a`. Many systems have a 1024 open file descriptor limit by default, which will cause Envoy to assert and crash with:

{{< text plain >}}
[2017-05-17 03:00:52.735][14236][critical][assert] assert failure: fd_ != -1: external/envoy/source/common/network/connection_impl.cc:58
{{< /text >}}

Make sure to raise your ulimit. Example: `ulimit -n 16384`

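To make the higher limit persist across sessions on a Linux host, one common approach is an entry in `/etc/security/limits.conf` (the values below are illustrative; tune them for your deployment):

{{< text plain >}}
# /etc/security/limits.conf
*    soft    nofile    16384
*    hard    nofile    16384
{{< /text >}}
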
## Headless TCP services losing connection from Istiofied containers

If `istio-citadel` is deployed, Envoy is restarted every 15 minutes to refresh certificates.
This causes the disconnection of TCP streams or long-running connections between services.

You should build resilience into your application for this type of
disconnect, but if you still want to prevent the disconnects from
happening, you will need to disable mutual TLS and the `istio-citadel` deployment.

First, edit your `istio` configuration to disable mutual TLS:

{{< text bash >}}
$ kubectl edit configmap -n istio-system istio
$ kubectl delete pods -n istio-system -l istio=pilot
{{< /text >}}

Next, scale down the `istio-citadel` deployment to disable Envoy restarts:

{{< text bash >}}
$ kubectl scale --replicas=0 deploy/istio-citadel -n istio-system
{{< /text >}}

This should stop Istio from restarting Envoy and disconnecting TCP connections.

## Envoy Process High CPU Usage

For larger clusters, the default configuration that comes with Istio
refreshes the Envoy configuration every 1 second. This can cause high
CPU usage, even when Envoy isn't doing anything. In order to bring the
CPU usage down for larger deployments, increase the refresh interval for
Envoy to something higher, like 30 seconds.

{{< text bash >}}
$ kubectl edit configmap -n istio-system istio
$ kubectl delete pods -n istio-system -l istio=pilot
{{< /text >}}

Also make sure to re-inject the sidecar into all of your pods, as
their configuration needs to be updated as well.

Afterwards, you should see CPU usage fall back to 0-1% while idling.
Make sure to tune these values for your specific deployment.

*Warning:* Changes created by routing rules will take up to 2x the refresh interval to propagate to the sidecars.
While the larger refresh interval will reduce CPU usage, updates caused by routing rules may cause a period
of HTTP 404s (up to 2x the refresh interval) until the Envoy sidecars get all relevant configuration.

## Automatic sidecar injection will fail if the kube-apiserver has proxy settings

When the kube-apiserver includes proxy settings such as:

{{< text yaml >}}
env:
  - name: http_proxy
    value: http://proxy-wsa.esl.foo.com:80
  - name: https_proxy
    value: http://proxy-wsa.esl.foo.com:80
  - name: no_proxy
    value: 127.0.0.1,localhost,dockerhub.foo.com,devhub-docker.foo.com,10.84.100.125,10.84.100.126,10.84.100.127
{{< /text >}}

sidecar injection will fail. The only related failure log is in the kube-apiserver log:

{{< text plain >}}
W0227 21:51:03.156818       1 admission.go:257] Failed calling webhook, failing open sidecar-injector.istio.io: failed calling admission webhook "sidecar-injector.istio.io": Post https://istio-sidecar-injector.istio-system.svc:443/inject: Service Unavailable
{{< /text >}}

Make sure both pod and service CIDRs are not proxied according to the `*_proxy` variables. Check the kube-apiserver files and logs to verify the configuration and whether any requests are being proxied.

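On kubeadm-based clusters, one way to check is to inspect the static pod manifest directly on the master node (the path below is the kubeadm default; adjust it for your installation):

{{< text bash >}}
$ grep -i -A1 '_proxy' /etc/kubernetes/manifests/kube-apiserver.yaml
{{< /text >}}
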
A workaround is to remove the proxy settings from the kube-apiserver manifest and restart the server, or to use a later version of Kubernetes.

An issue was filed with Kubernetes related to this and has since been closed:
[https://github.com/kubernetes/kubeadm/issues/666](https://github.com/kubernetes/kubeadm/issues/666),
[https://github.com/kubernetes/kubernetes/pull/58698#discussion_r163879443](https://github.com/kubernetes/kubernetes/pull/58698#discussion_r163879443)

## What Envoy version is Istio using?

To find out the Envoy version used in a deployment, follow these steps:

1. Exec into the proxy container: `kubectl exec -it PODNAME -c istio-proxy -n NAMESPACE /bin/bash`

1. Query the Envoy admin interface: `curl localhost:15000/server_info`
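
The two steps can also be combined into a single command (a sketch, with placeholder pod and namespace names):

{{< text bash >}}
$ kubectl exec -it PODNAME -c istio-proxy -n NAMESPACE -- curl localhost:15000/server_info
{{< /text >}}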