Reorganize traffic management ops guide (#2669)

* Reorganize traffic management ops guide

* fix header

* fix circleci issues
Frank Budinsky 2018-09-25 09:22:52 -04:00 committed by GitHub
parent edfdf7d795
commit 28fba53f88
8 changed files with 528 additions and 538 deletions

View File

@ -40,7 +40,7 @@ configuration by intentionally "tripping" the circuit breaker.
1. Create a [destination rule](/docs/reference/config/istio.networking.v1alpha3/#DestinationRule) to apply circuit breaking settings
when calling the `httpbin` service:
> If you installed/configured Istio with mutual TLS Authentication enabled, you must add a TLS traffic policy `mode: ISTIO_MUTUAL` to the `DestinationRule` before applying it. Otherwise requests will generate 503 errors as described [here](/help/ops/traffic-management/deploy-guidelines/#503-errors-after-setting-destination-rule).
> If you installed/configured Istio with mutual TLS Authentication enabled, you must add a TLS traffic policy `mode: ISTIO_MUTUAL` to the `DestinationRule` before applying it. Otherwise requests will generate 503 errors as described [here](/help/ops/traffic-management/troubleshooting/#503-errors-after-setting-destination-rule).
{{< text bash >}}
$ kubectl apply -f - <<EOF

View File

@ -125,7 +125,7 @@ In this step, you will change that behavior so that all traffic goes to `v1`.
1. Create a default route rule to route all traffic to `v1` of the service:
> If you installed/configured Istio with mutual TLS Authentication enabled, you must add a TLS traffic policy `mode: ISTIO_MUTUAL` to the `DestinationRule` before applying it. Otherwise requests will generate 503 errors as described [here](/help/ops/traffic-management/deploy-guidelines/#503-errors-after-setting-destination-rule).
> If you installed/configured Istio with mutual TLS Authentication enabled, you must add a TLS traffic policy `mode: ISTIO_MUTUAL` to the `DestinationRule` before applying it. Otherwise requests will generate 503 errors as described [here](/help/ops/traffic-management/troubleshooting/#503-errors-after-setting-destination-rule).
{{< text bash >}}
$ kubectl apply -f - <<EOF

View File

@ -56,7 +56,7 @@ To retrieve information about endpoint configuration for the Envoy instance in a
$ istioctl proxy-config endpoints <pod-name> [flags]
{{< /text >}}
See [Debugging Envoy and Pilot](/help/ops/traffic-management/proxy-cmd/) for more advice on interpreting this information.
See [Observing Traffic Management Configuration](/help/ops/traffic-management/observing/) for more advice on interpreting this information.
## With GDB

View File

@ -1,7 +1,7 @@
---
title: Deployment and Configuration Guidelines
description: Provides specific deployment and configuration guidelines.
weight: 5
weight: 20
---
This section provides specific deployment or configuration guidelines to avoid networking or traffic management issues.
@ -133,7 +133,7 @@ spec:
The downside of this kind of configuration is that any other configuration (e.g., route rules) for the
underlying microservices will also need to be included in this single configuration file, instead of
in separate resources associated with, and potentially owned by, the individual service teams.
See [Route rules have no effect on ingress gateway requests](#route-rules-have-no-effect-on-ingress-gateway-requests)
See [Route rules have no effect on ingress gateway requests](/help/ops/traffic-management/troubleshooting/#route-rules-have-no-effect-on-ingress-gateway-requests)
for details.
To avoid this problem, it may be preferable to break up the configuration of `myapp.com` into several
@ -204,36 +204,7 @@ A `DestinationRule` can also be fragmented with similar merge semantic and restr
Any following top-level `trafficPolicy` configuration is discarded.
1. Unlike virtual service merging, destination rule merging works in both sidecars and gateways.
## 503 errors after setting destination rule
If requests to a service immediately start generating HTTP 503 errors after you applied a `DestinationRule`
and the errors continue until you remove or revert the `DestinationRule`, then the `DestinationRule` is probably
causing a TLS conflict for the service.
For example, if you configure mutual TLS in the cluster globally, the `DestinationRule` must include the following `trafficPolicy`:
{{< text yaml >}}
trafficPolicy:
tls:
mode: ISTIO_MUTUAL
{{< /text >}}
Otherwise, the mode defaults to `DISABLED`, causing client proxy sidecars to make plain HTTP requests
instead of TLS encrypted requests. The requests then conflict with the server-side proxy, which expects
encrypted requests.
To confirm there is a conflict, check whether the `STATUS` field in the output of the `istioctl authn tls-check` command
is set to `CONFLICT` for your service. For example:
{{< text bash >}}
$ istioctl authn tls-check httpbin.default.svc.cluster.local
HOST:PORT STATUS SERVER CLIENT AUTHN POLICY DESTINATION RULE
httpbin.default.svc.cluster.local:8000 CONFLICT mTLS HTTP default/ httpbin/default
{{< /text >}}
Whenever you apply a `DestinationRule`, ensure the `trafficPolicy` TLS mode matches the global server configuration.
## 503 errors while reconfiguring service routes
## Avoid 503 errors while reconfiguring service routes
When setting route rules to direct traffic to specific versions (subsets) of a service, care must be taken to ensure
that the subsets are available before they are used in the routes. Otherwise, calls to the service may return
@ -261,172 +232,3 @@ To make sure services will have zero down-time when configuring routes with subs
1. Wait a few seconds for the `VirtualService` configuration to propagate to the Envoy sidecars.
1. Update the `DestinationRule` to remove the unused subsets.
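For example, a minimal sketch of this ordering for a hypothetical `helloworld` service is to first apply the `DestinationRule` that defines the new subset:
{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: helloworld
spec:
  host: helloworld.default.svc.cluster.local
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2        # new subset being introduced
    labels:
      version: v2
{{< /text >}}
and only after it has had a few seconds to propagate, apply (or update) the `VirtualService` that references the new subset:
{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: helloworld
spec:
  hosts:
  - helloworld.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v2
{{< /text >}}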
## Route rules have no effect on ingress gateway requests
Let's assume you are using an ingress `Gateway` and corresponding `VirtualService` to access an internal service.
For example, your `VirtualService` looks something like this:
{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- "myapp.com" # or maybe "*" if you are testing without DNS using the ingress-gateway IP (e.g., http://1.2.3.4/hello)
gateways:
- myapp-gateway
http:
- match:
- uri:
prefix: /hello
route:
- destination:
host: helloworld.default.svc.cluster.local
- match:
...
{{< /text >}}
You also have a `VirtualService` which routes traffic for the helloworld service to a particular subset:
{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: helloworld
spec:
hosts:
- helloworld.default.svc.cluster.local
http:
- route:
- destination:
host: helloworld.default.svc.cluster.local
subset: v1
{{< /text >}}
In this situation you will notice that requests to the helloworld service via the ingress gateway are
not directed to subset v1 but instead continue to use default round-robin routing.
The ingress requests use the gateway host (e.g., `myapp.com`),
which activates the rule in the myapp `VirtualService` that routes to any endpoint of the helloworld service.
Only internal requests with the host `helloworld.default.svc.cluster.local` use the
helloworld `VirtualService`, which directs traffic exclusively to subset v1.
To control the traffic from the gateway, you need to include the subset rule in the myapp `VirtualService`:
{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- "myapp.com" # or maybe "*" if you are testing without DNS using the ingress-gateway IP (e.g., http://1.2.3.4/hello)
gateways:
- myapp-gateway
http:
- match:
- uri:
prefix: /hello
route:
- destination:
host: helloworld.default.svc.cluster.local
subset: v1
- match:
...
{{< /text >}}
Alternatively, you can combine both `VirtualServices` into one unit if possible:
{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- myapp.com # cannot use "*" here since this is being combined with the mesh services
- helloworld.default.svc.cluster.local
gateways:
- mesh # applies internally as well as externally
- myapp-gateway
http:
- match:
- uri:
prefix: /hello
gateways:
- myapp-gateway #restricts this rule to apply only to ingress gateway
route:
- destination:
host: helloworld.default.svc.cluster.local
subset: v1
- match:
- gateways:
- mesh # applies to all services inside the mesh
route:
- destination:
host: helloworld.default.svc.cluster.local
subset: v1
{{< /text >}}
## Route rules have no effect on my application
If route rules are working perfectly for the [Bookinfo](/docs/examples/bookinfo/) sample,
but similar version routing rules have no effect on your own application, it may be that
your Kubernetes services need to be changed slightly.
Kubernetes services must adhere to certain restrictions in order to take advantage of
Istio's L7 routing features.
Refer to the [Requirements for Pods and Services](/docs/setup/kubernetes/spec-requirements)
for details.
## Envoy won't connect to my HTTP/1.0 service
Envoy requires `HTTP/1.1` or `HTTP/2` traffic for upstream services. For example, when using [NGINX](https://www.nginx.com/) for serving traffic behind Envoy, you
will need to set the [proxy_http_version](https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_http_version) directive in your NGINX configuration to be "1.1", since the NGINX default is 1.0.
Example configuration:
{{< text plain >}}
upstream http_backend {
server 127.0.0.1:8080;
keepalive 16;
}
server {
...
location /http/ {
proxy_pass http://http_backend;
proxy_http_version 1.1;
proxy_set_header Connection "";
...
}
}
{{< /text >}}
## Headless TCP services losing connection
If `istio-citadel` is deployed, Envoy is restarted every 15 minutes to refresh certificates.
This causes the disconnection of TCP streams or long-running connections between services.
You should build resilience into your application for this type of
disconnect, but if you still want to prevent the disconnects from
happening, you will need to disable mutual TLS and the `istio-citadel` deployment.
First, edit your `istio` configuration to disable mutual TLS:
{{< text bash >}}
$ kubectl edit configmap -n istio-system istio
$ kubectl delete pods -n istio-system -l istio=pilot
{{< /text >}}
Next, scale down the `istio-citadel` deployment to disable Envoy restarts:
{{< text bash >}}
$ kubectl scale --replicas=0 deploy/istio-citadel -n istio-system
{{< /text >}}
This should stop Istio from restarting Envoy and disconnecting TCP connections.

View File

@ -1,7 +1,7 @@
---
title: Introduction to Network Operations
description: An introduction to Istio networking operational aspects.
weight: 5
weight: 10
---
This section is intended as a guide to operators of an Istio-based
deployment. It will provide information that an operator of an Istio deployment

View File

@ -1,28 +1,312 @@
---
title: Observing Traffic Management
description: Describes tools and techniques to observe traffic management or issues related to traffic management.
weight: 5
title: Observing Traffic Management Configuration
description: Describes tools and techniques to diagnose configuration issues related to traffic management.
weight: 30
keywords: [debug,proxy,status,config,pilot,envoy]
---
## Envoy is crashing under load
Istio provides two very valuable commands to help diagnose traffic management configuration problems,
the [`proxy-status`](/docs/reference/commands/istioctl/#istioctl-proxy-status)
and [`proxy-config`](/docs/reference/commands/istioctl/#istioctl-proxy-config) commands. The `proxy-status` command
allows you to get an overview of your mesh and identify the proxy causing the problem. Then `proxy-config` can be used
to inspect Envoy configuration and diagnose the issue.
Check your `ulimit -a`. Many systems have a 1024 open file descriptor limit by default, which will cause Envoy to assert and crash with:
If you want to try the commands yourself, you can either:
{{< text plain >}}
[2017-05-17 03:00:52.735][14236][critical][assert] assert failure: fd_ != -1: external/envoy/source/common/network/connection_impl.cc:58
* Have a Kubernetes cluster with Istio and Bookinfo installed (e.g., use `istio.yaml` as described in
[installation steps](/docs/setup/kubernetes/quick-start/#installation-steps) and
[Bookinfo installation steps](/docs/examples/bookinfo/#if-you-are-running-on-kubernetes)).
OR
* Use similar commands against your own application running in a Kubernetes cluster.
## Get an overview of your mesh
The `proxy-status` command allows you to get an overview of your mesh. If you suspect one of your sidecars isn't
receiving configuration or is out of sync then `proxy-status` will tell you this.
{{< text bash >}}
$ istioctl proxy-status
PROXY CDS LDS EDS RDS PILOT
details-v1-6dcc6fbb9d-wsjz4.default SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-75bdf98789-tfdvh
istio-egressgateway-c49694485-l9d5l.istio-system SYNCED SYNCED SYNCED (100%) NOT SENT istio-pilot-75bdf98789-tfdvh
istio-ingress-6458b8c98f-7ks48.istio-system SYNCED SYNCED SYNCED (100%) NOT SENT istio-pilot-75bdf98789-n2kqh
istio-ingressgateway-7d6874b48f-qxhn5.istio-system SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-75bdf98789-n2kqh
productpage-v1-6c886ff494-hm7zk.default SYNCED SYNCED SYNCED (100%) STALE istio-pilot-75bdf98789-n2kqh
ratings-v1-5d9ff497bb-gslng.default SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-75bdf98789-n2kqh
reviews-v1-55d4c455db-zjj2m.default SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-75bdf98789-n2kqh
reviews-v2-686bbb668-99j76.default SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-75bdf98789-tfdvh
reviews-v3-7b9b5fdfd6-4r52s.default SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-75bdf98789-n2kqh
{{< /text >}}
Make sure to raise your ulimit. Example: `ulimit -n 16384`
If a proxy is missing from this list, it means that it is not currently connected to a Pilot instance and so will not be
receiving any configuration.
## Why is creating a weighted route rule to split traffic between two versions of a service not working as expected?
* `SYNCED` means that Envoy has acknowledged the last configuration Pilot has sent to it.
* `SYNCED (100%)` means that Envoy has successfully synced all of the endpoints in the cluster.
* `NOT SENT` means that Pilot hasn't sent anything to Envoy. This usually is because Pilot has nothing to send.
* `STALE` means that Pilot has sent an update to Envoy but has not received an acknowledgement. This usually indicates
a networking issue between Envoy and Pilot or a bug with Istio itself.
For the current Envoy sidecar implementation, up to 100 requests may be required for the desired distribution to be observed.
## Retrieve diffs between Envoy and Istio Pilot
## How come some of my route rules don't take effect immediately?
The `proxy-status` command can also be used to retrieve a diff between the configuration Envoy has loaded and the
configuration Pilot would send, by providing a proxy ID. This can help you determine exactly what is out of sync and
where the issue may lie.
The Istio implementation on Kubernetes utilizes an eventually consistent
algorithm to ensure all Envoy sidecars have the correct configuration
including all route rules. A configuration change will take some time
to propagate to all the sidecars. With large deployments the
propagation will take longer and there may be a lag time on the
order of seconds.
{{< text bash json >}}
$ istioctl proxy-status details-v1-6dcc6fbb9d-wsjz4.default
--- Pilot Clusters
+++ Envoy Clusters
@@ -374,36 +374,14 @@
"edsClusterConfig": {
"edsConfig": {
"ads": {
}
},
"serviceName": "outbound|443||public-cr0bdc785ce3f14722918080a97e1f26be-alb1.kube-system.svc.cluster.local"
- },
- "connectTimeout": "1.000s",
- "circuitBreakers": {
- "thresholds": [
- {
-
- }
- ]
- }
- }
- },
- {
- "cluster": {
- "name": "outbound|53||kube-dns.kube-system.svc.cluster.local",
- "type": "EDS",
- "edsClusterConfig": {
- "edsConfig": {
- "ads": {
-
- }
- },
- "serviceName": "outbound|53||kube-dns.kube-system.svc.cluster.local"
},
"connectTimeout": "1.000s",
"circuitBreakers": {
"thresholds": [
{
}
Listeners Match
Routes Match
{{< /text >}}
Here you can see that the listeners and routes match but the clusters are out of sync.
## Deep dive into Envoy configuration
The `proxy-config` command can be used to see how a given Envoy instance is configured. This can then be used to
pinpoint any issues you are unable to detect by just looking through your Istio configuration and custom resources.
To get a basic summary of clusters, listeners, or routes for a given pod, use the command as follows (replacing `clusters`
with `listeners` or `routes` as required):
{{< text bash >}}
$ istioctl proxy-config clusters -n istio-system istio-ingressgateway-7d6874b48f-qxhn5
SERVICE FQDN PORT SUBSET DIRECTION TYPE
BlackHoleCluster - - - STATIC
details.default.svc.cluster.local 9080 - outbound EDS
heapster.kube-system.svc.cluster.local 80 - outbound EDS
istio-citadel.istio-system.svc.cluster.local 8060 - outbound EDS
istio-citadel.istio-system.svc.cluster.local 9093 - outbound EDS
istio-egressgateway.istio-system.svc.cluster.local 80 - outbound EDS
...
{{< /text >}}
In order to debug Envoy you need to understand Envoy clusters/listeners/routes/endpoints and how they all interact.
We will use the `proxy-config` command with the `-o json` and filtering flags to follow Envoy as it determines where
to send a request from the `productpage` pod to the `reviews` pod at `reviews:9080`.
1. If you query the listener summary on a pod you will notice Istio generates the following listeners:
* A listener on `0.0.0.0:15001` that receives all traffic into and out of the pod, then hands the request over to
a virtual listener.
* A virtual listener per service IP, per non-HTTP outbound port, for TCP/HTTPS traffic.
* A virtual listener on the pod IP for each exposed port for inbound traffic.
* A virtual listener on `0.0.0.0` per each HTTP port for outbound HTTP traffic.
{{< text bash >}}
$ istioctl proxy-config listeners productpage-v1-6c886ff494-7vxhs
ADDRESS PORT TYPE
172.21.252.250 15005 TCP <--+
172.21.252.250 15011 TCP |
172.21.79.56 42422 TCP |
172.21.160.5 443 TCP |
172.21.157.6 443 TCP |
172.21.117.222 443 TCP |
172.21.0.10 53 TCP |
172.21.126.131 443 TCP | Receives outbound non-HTTP traffic for relevant IP:PORT pair from listener `0.0.0.0_15001`
172.21.160.5 31400 TCP |
172.21.81.159 9102 TCP |
172.21.0.1 443 TCP |
172.21.126.131 80 TCP |
172.21.119.8 443 TCP |
172.21.112.64 80 TCP |
172.21.179.54 443 TCP |
172.21.165.197 443 TCP <--+
0.0.0.0 9090 HTTP <-+
0.0.0.0 8060 HTTP |
0.0.0.0 15010 HTTP |
0.0.0.0 15003 HTTP |
0.0.0.0 15004 HTTP |
0.0.0.0 9093 HTTP | Receives outbound HTTP traffic for relevant port from listener `0.0.0.0_15001`
0.0.0.0 15007 HTTP |
0.0.0.0 8080 HTTP |
0.0.0.0 9091 HTTP |
0.0.0.0 9080 HTTP |
0.0.0.0 80 HTTP <-+
0.0.0.0 15001 TCP // Receives all inbound and outbound traffic to the pod from IP tables and hands over to virtual listener
172.30.164.190 9080 HTTP // Receives all inbound traffic on 9080 from listener `0.0.0.0_15001`
{{< /text >}}
1. From the above summary you can see that every sidecar has a listener bound to `0.0.0.0:15001`, to which
iptables routes all inbound and outbound pod traffic. This listener has `useOriginalDst` set to true, which means
it hands the request over to the listener that best matches the original destination of the request.
If it can't find a matching virtual listener, it sends the request to the `BlackHoleCluster`, which returns a 404.
{{< text bash json >}}
$ istioctl proxy-config listeners productpage-v1-6c886ff494-7vxhs --port 15001 -o json
{
"name": "virtual",
"address": {
"socketAddress": {
"address": "0.0.0.0",
"portValue": 15001
}
},
"filterChains": [
{
"filters": [
{
"name": "envoy.tcp_proxy",
"config": {
"cluster": "BlackHoleCluster",
"stat_prefix": "BlackHoleCluster"
}
}
]
}
],
"useOriginalDst": true
}
{{< /text >}}
1. Our request is an outbound HTTP request to port `9080`, which means it gets handed off to the `0.0.0.0:9080` virtual
listener. This listener then looks up the route configuration in its configured RDS. In this case it will look
up route `9080` in the RDS configuration supplied by Pilot (via ADS).
{{< text bash json >}}
$ istioctl proxy-config listeners productpage-v1-6c886ff494-7vxhs -o json --address 0.0.0.0 --port 9080
...
"rds": {
"config_source": {
"ads": {}
},
"route_config_name": "9080"
}
...
{{< /text >}}
1. The `9080` route configuration has only a virtual host for each service. Our request is heading to the reviews
service, so Envoy selects the virtual host whose domains match the request. Once matched on domain, Envoy
looks for the first route that matches the request. In this case there is no advanced routing, so there is only
one route, which matches everything. This route tells Envoy to send the request to the
`outbound|9080||reviews.default.svc.cluster.local` cluster.
{{< text bash json >}}
$ istioctl proxy-config routes productpage-v1-6c886ff494-7vxhs --name 9080 -o json
[
{
"name": "9080",
"virtualHosts": [
{
"name": "reviews.default.svc.cluster.local:9080",
"domains": [
"reviews.default.svc.cluster.local",
"reviews.default.svc.cluster.local:9080",
"reviews",
"reviews:9080",
"reviews.default.svc.cluster",
"reviews.default.svc.cluster:9080",
"reviews.default.svc",
"reviews.default.svc:9080",
"reviews.default",
"reviews.default:9080",
"172.21.152.34",
"172.21.152.34:9080"
],
"routes": [
{
"match": {
"prefix": "/"
},
"route": {
"cluster": "outbound|9080||reviews.default.svc.cluster.local",
"timeout": "0.000s"
},
...
{{< /text >}}
1. This cluster is configured to retrieve the associated endpoints from Pilot (via ADS). Envoy then uses the
`serviceName` field as a key to look up the list of endpoints and proxies the request to one of them.
{{< text bash json >}}
$ istioctl proxy-config clusters productpage-v1-6c886ff494-7vxhs --fqdn reviews.default.svc.cluster.local -o json
[
{
"name": "outbound|9080||reviews.default.svc.cluster.local",
"type": "EDS",
"edsClusterConfig": {
"edsConfig": {
"ads": {}
},
"serviceName": "outbound|9080||reviews.default.svc.cluster.local"
},
"connectTimeout": "1.000s",
"circuitBreakers": {
"thresholds": [
{}
]
}
}
]
{{< /text >}}
1. To see the endpoints currently available for this cluster use the `proxy-config` endpoints command.
{{< text bash json >}}
$ istioctl proxy-config endpoints productpage-v1-6c886ff494-7vxhs --cluster "outbound|9080||reviews.default.svc.cluster.local"
ENDPOINT STATUS CLUSTER
172.17.0.17:9080 HEALTHY outbound|9080||reviews.default.svc.cluster.local
172.17.0.18:9080 HEALTHY outbound|9080||reviews.default.svc.cluster.local
172.17.0.5:9080 HEALTHY outbound|9080||reviews.default.svc.cluster.local
{{< /text >}}
## Inspecting bootstrap configuration
So far we have looked at configuration retrieved (mostly) from Pilot; however, Envoy also requires some bootstrap configuration that
includes information like where Pilot can be found. To view it, use the following command:
{{< text bash json >}}
$ istioctl proxy-config bootstrap -n istio-system istio-ingressgateway-7d6874b48f-qxhn5
{
"bootstrap": {
"node": {
"id": "router~172.30.86.14~istio-ingressgateway-7d6874b48f-qxhn5.istio-system~istio-system.svc.cluster.local",
"cluster": "istio-ingressgateway",
"metadata": {
"POD_NAME": "istio-ingressgateway-7d6874b48f-qxhn5",
"istio": "sidecar"
},
"buildVersion": "0/1.8.0-dev//RELEASE"
},
...
{{< /text >}}

View File

@ -1,311 +0,0 @@
---
title: Debugging Envoy and Pilot
description: Demonstrates how to debug Pilot and Envoy.
weight: 5
keywords: [debug,proxy,status,config,pilot,envoy]
---
This task demonstrates how to use the [`proxy-status`](/docs/reference/commands/istioctl/#istioctl-proxy-status)
and [`proxy-config`](/docs/reference/commands/istioctl/#istioctl-proxy-config) commands. The `proxy-status` command
allows you to get an overview of your mesh and identify the proxy causing the problem. Then `proxy-config` can be used
to inspect Envoy configuration and diagnose the issue.
## Before you begin
* Have a Kubernetes cluster with Istio and Bookinfo installed (e.g., use `istio.yaml` as described in
[installation steps](/docs/setup/kubernetes/quick-start/#installation-steps) and
[Bookinfo installation steps](/docs/examples/bookinfo/#if-you-are-running-on-kubernetes)).
OR
* Use similar commands against your own application running in a Kubernetes cluster.
## Get an overview of your mesh
The `proxy-status` command allows you to get an overview of your mesh. If you suspect one of your sidecars isn't
receiving configuration or is out of sync then `proxy-status` will tell you this.
{{< text bash >}}
$ istioctl proxy-status
PROXY CDS LDS EDS RDS PILOT
details-v1-6dcc6fbb9d-wsjz4.default SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-75bdf98789-tfdvh
istio-egressgateway-c49694485-l9d5l.istio-system SYNCED SYNCED SYNCED (100%) NOT SENT istio-pilot-75bdf98789-tfdvh
istio-ingress-6458b8c98f-7ks48.istio-system SYNCED SYNCED SYNCED (100%) NOT SENT istio-pilot-75bdf98789-n2kqh
istio-ingressgateway-7d6874b48f-qxhn5.istio-system SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-75bdf98789-n2kqh
productpage-v1-6c886ff494-hm7zk.default SYNCED SYNCED SYNCED (100%) STALE istio-pilot-75bdf98789-n2kqh
ratings-v1-5d9ff497bb-gslng.default SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-75bdf98789-n2kqh
reviews-v1-55d4c455db-zjj2m.default SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-75bdf98789-n2kqh
reviews-v2-686bbb668-99j76.default SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-75bdf98789-tfdvh
reviews-v3-7b9b5fdfd6-4r52s.default SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-75bdf98789-n2kqh
{{< /text >}}
If a proxy is missing from this list, it means that it is not currently connected to a Pilot instance and so will not be
receiving any configuration.
* `SYNCED` means that Envoy has acknowledged the last configuration Pilot has sent to it.
* `SYNCED (100%)` means that Envoy has successfully synced all of the endpoints in the cluster.
* `NOT SENT` means that Pilot hasn't sent anything to Envoy. This usually is because Pilot has nothing to send.
* `STALE` means that Pilot has sent an update to Envoy but has not received an acknowledgement. This usually indicates
a networking issue between Envoy and Pilot or a bug with Istio itself.
## Retrieve diffs between Envoy and Istio Pilot
The `proxy-status` command can also be used to retrieve a diff between the configuration Envoy has loaded and the
configuration Pilot would send, by providing a proxy ID. This can help you determine exactly what is out of sync and
where the issue may lie.
{{< text bash json >}}
$ istioctl proxy-status details-v1-6dcc6fbb9d-wsjz4.default
--- Pilot Clusters
+++ Envoy Clusters
@@ -374,36 +374,14 @@
"edsClusterConfig": {
"edsConfig": {
"ads": {
}
},
"serviceName": "outbound|443||public-cr0bdc785ce3f14722918080a97e1f26be-alb1.kube-system.svc.cluster.local"
- },
- "connectTimeout": "1.000s",
- "circuitBreakers": {
- "thresholds": [
- {
-
- }
- ]
- }
- }
- },
- {
- "cluster": {
- "name": "outbound|53||kube-dns.kube-system.svc.cluster.local",
- "type": "EDS",
- "edsClusterConfig": {
- "edsConfig": {
- "ads": {
-
- }
- },
- "serviceName": "outbound|53||kube-dns.kube-system.svc.cluster.local"
},
"connectTimeout": "1.000s",
"circuitBreakers": {
"thresholds": [
{
}
Listeners Match
Routes Match
{{< /text >}}
Here you can see that the listeners and routes match but the clusters are out of sync.
## Deep dive into Envoy configuration
The `proxy-config` command can be used to see how a given Envoy instance is configured. This can then be used to
pinpoint any issues you are unable to detect by just looking through your Istio configuration and custom resources.
To get a basic summary of clusters, listeners, or routes for a given pod, use the command as follows (replacing `clusters`
with `listeners` or `routes` as required):
{{< text bash >}}
$ istioctl proxy-config clusters -n istio-system istio-ingressgateway-7d6874b48f-qxhn5
SERVICE FQDN PORT SUBSET DIRECTION TYPE
BlackHoleCluster - - - STATIC
details.default.svc.cluster.local 9080 - outbound EDS
heapster.kube-system.svc.cluster.local 80 - outbound EDS
istio-citadel.istio-system.svc.cluster.local 8060 - outbound EDS
istio-citadel.istio-system.svc.cluster.local 9093 - outbound EDS
istio-egressgateway.istio-system.svc.cluster.local 80 - outbound EDS
...
{{< /text >}}
In order to debug Envoy you need to understand Envoy clusters/listeners/routes/endpoints and how they all interact.
We will use the `proxy-config` command with the `-o json` and filtering flags to follow Envoy as it determines where
to send a request from the `productpage` pod to the `reviews` pod at `reviews:9080`.
1. If you query the listener summary on a pod you will notice Istio generates the following listeners:
* A listener on `0.0.0.0:15001` that receives all traffic into and out of the pod, then hands the request over to
a virtual listener.
* A virtual listener per service IP, per non-HTTP outbound port, for TCP/HTTPS traffic.
* A virtual listener on the pod IP for each exposed port for inbound traffic.
* A virtual listener on `0.0.0.0` per each HTTP port for outbound HTTP traffic.
{{< text bash >}}
$ istioctl proxy-config listeners productpage-v1-6c886ff494-7vxhs
ADDRESS PORT TYPE
172.21.252.250 15005 TCP <--+
172.21.252.250 15011 TCP |
172.21.79.56 42422 TCP |
172.21.160.5 443 TCP |
172.21.157.6 443 TCP |
172.21.117.222 443 TCP |
172.21.0.10 53 TCP |
172.21.126.131 443 TCP | Receives outbound non-HTTP traffic for relevant IP:PORT pair from listener `0.0.0.0_15001`
172.21.160.5 31400 TCP |
172.21.81.159 9102 TCP |
172.21.0.1 443 TCP |
172.21.126.131 80 TCP |
172.21.119.8 443 TCP |
172.21.112.64 80 TCP |
172.21.179.54 443 TCP |
172.21.165.197 443 TCP <--+
0.0.0.0 9090 HTTP <-+
0.0.0.0 8060 HTTP |
0.0.0.0 15010 HTTP |
0.0.0.0 15003 HTTP |
0.0.0.0 15004 HTTP |
0.0.0.0 9093 HTTP | Receives outbound HTTP traffic for relevant port from listener `0.0.0.0_15001`
0.0.0.0 15007 HTTP |
0.0.0.0 8080 HTTP |
0.0.0.0 9091 HTTP |
0.0.0.0 9080 HTTP |
0.0.0.0 80 HTTP <-+
0.0.0.0 15001 TCP // Receives all inbound and outbound traffic to the pod from IP tables and hands over to virtual listener
172.30.164.190 9080 HTTP // Receives all inbound traffic on 9080 from listener `0.0.0.0_15001`
{{< /text >}}
1. From the above summary you can see that every sidecar has a listener bound to `0.0.0.0:15001`, to which
iptables routes all inbound and outbound pod traffic. This listener has `useOriginalDst` set to true, which means
it hands the request over to the listener that best matches the original destination of the request.
If it can't find a matching virtual listener, it sends the request to the `BlackHoleCluster`, which returns a 404.
{{< text bash json >}}
$ istioctl proxy-config listeners productpage-v1-6c886ff494-7vxhs --port 15001 -o json
{
"name": "virtual",
"address": {
"socketAddress": {
"address": "0.0.0.0",
"portValue": 15001
}
},
"filterChains": [
{
"filters": [
{
"name": "envoy.tcp_proxy",
"config": {
"cluster": "BlackHoleCluster",
"stat_prefix": "BlackHoleCluster"
}
}
]
}
],
"useOriginalDst": true
}
{{< /text >}}
1. Our request is an outbound HTTP request to port `9080`, which means it gets handed off to the `0.0.0.0:9080` virtual
listener. This listener then looks up the route configuration in its configured RDS. In this case it will look
up route `9080` in the RDS configuration supplied by Pilot (via ADS).
{{< text bash json >}}
$ istioctl proxy-config listeners productpage-v1-6c886ff494-7vxhs -o json --address 0.0.0.0 --port 9080
...
"rds": {
"config_source": {
"ads": {}
},
"route_config_name": "9080"
}
...
{{< /text >}}
1. The `9080` route configuration has only a virtual host for each service. Our request is heading to the reviews
service, so Envoy selects the virtual host whose domains match the request. Once matched on domain, Envoy
looks for the first route that matches the request. In this case there is no advanced routing, so there is only
one route, which matches everything. This route tells Envoy to send the request to the
`outbound|9080||reviews.default.svc.cluster.local` cluster.
{{< text bash json >}}
$ istioctl proxy-config routes productpage-v1-6c886ff494-7vxhs --name 9080 -o json
[
{
"name": "9080",
"virtualHosts": [
{
"name": "reviews.default.svc.cluster.local:9080",
"domains": [
"reviews.default.svc.cluster.local",
"reviews.default.svc.cluster.local:9080",
"reviews",
"reviews:9080",
"reviews.default.svc.cluster",
"reviews.default.svc.cluster:9080",
"reviews.default.svc",
"reviews.default.svc:9080",
"reviews.default",
"reviews.default:9080",
"172.21.152.34",
"172.21.152.34:9080"
],
"routes": [
{
"match": {
"prefix": "/"
},
"route": {
"cluster": "outbound|9080||reviews.default.svc.cluster.local",
"timeout": "0.000s"
},
...
{{< /text >}}
1. This cluster is configured to retrieve the associated endpoints from Pilot (via ADS). Envoy then uses the
`serviceName` field as a key to look up the list of endpoints and proxies the request to one of them.
{{< text bash json >}}
$ istioctl proxy-config clusters productpage-v1-6c886ff494-7vxhs --fqdn reviews.default.svc.cluster.local -o json
[
{
"name": "outbound|9080||reviews.default.svc.cluster.local",
"type": "EDS",
"edsClusterConfig": {
"edsConfig": {
"ads": {}
},
"serviceName": "outbound|9080||reviews.default.svc.cluster.local"
},
"connectTimeout": "1.000s",
"circuitBreakers": {
"thresholds": [
{}
]
}
}
]
{{< /text >}}
1. To see the endpoints currently available for this cluster use the `proxy-config` endpoints command.
{{< text bash json >}}
$ istioctl proxy-config endpoints productpage-v1-6c886ff494-7vxhs --cluster "outbound|9080||reviews.default.svc.cluster.local"
ENDPOINT STATUS CLUSTER
172.17.0.17:9080 HEALTHY outbound|9080||reviews.default.svc.cluster.local
172.17.0.18:9080 HEALTHY outbound|9080||reviews.default.svc.cluster.local
172.17.0.5:9080 HEALTHY outbound|9080||reviews.default.svc.cluster.local
{{< /text >}}
## Inspecting bootstrap configuration
So far we have looked at configuration retrieved (mostly) from Pilot; however, Envoy also requires some bootstrap configuration that
includes information like where Pilot can be found. To view it, use the following command:
{{< text bash json >}}
$ istioctl proxy-config bootstrap -n istio-system istio-ingressgateway-7d6874b48f-qxhn5
{
"bootstrap": {
"node": {
"id": "router~172.30.86.14~istio-ingressgateway-7d6874b48f-qxhn5.istio-system~istio-system.svc.cluster.local",
"cluster": "istio-ingressgateway",
"metadata": {
"POD_NAME": "istio-ingressgateway-7d6874b48f-qxhn5",
"istio": "sidecar"
},
"buildVersion": "0/1.8.0-dev//RELEASE"
},
...
{{< /text >}}

View File

@ -1,11 +1,226 @@
---
title: Troubleshooting Networking Issues
description: Describes tools and techniques that can be used to root cause networking issues.
weight: 5
description: Describes common networking issues and how to recognize and avoid them.
weight: 40
---
* Migrate content from old troubleshooting guide here
This section describes common problems and the tools and techniques you can use to address issues related to traffic management.
* Provide a few general procedures that should be followed to isolate
## Route rules don't seem to affect traffic flow
* Describe high level isolation steps and things to check.
With the current Envoy sidecar implementation, up to 100 requests may be required for weighted
version distribution to be observed.
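For example, one simple way to generate enough traffic to observe the split is a small request loop (a sketch that assumes the Bookinfo sample and a `$GATEWAY_URL` variable pointing at your ingress gateway, as set up in the Bookinfo docs):
{{< text bash >}}
$ for i in $(seq 1 100); do curl -s -o /dev/null "http://$GATEWAY_URL/productpage"; done
{{< /text >}}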
If route rules are working perfectly for the [Bookinfo](/docs/examples/bookinfo/) sample,
but similar version routing rules have no effect on your own application, it may be that
your Kubernetes services need to be changed slightly.
Kubernetes services must adhere to certain restrictions in order to take advantage of
Istio's L7 routing features.
Refer to the [Requirements for Pods and Services](/docs/setup/kubernetes/spec-requirements)
for details.
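As an illustration only, a service that satisfies these requirements might look like the following sketch, assuming the usual restrictions that service ports are named with a protocol prefix (for example `http`) and that pods carry explicit `app` and `version` labels (the `myservice` names and image are hypothetical):
{{< text yaml >}}
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  ports:
  - name: http        # port name carries a protocol prefix so Istio can apply L7 routing
    port: 8000
  selector:
    app: myservice
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myservice-v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myservice
      version: v1
  template:
    metadata:
      labels:
        app: myservice    # explicit app label
        version: v1       # explicit version label, used by DestinationRule subsets
    spec:
      containers:
      - name: myservice
        image: example/myservice:1.0   # hypothetical image
        ports:
        - containerPort: 8000
{{< /text >}}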
Another potential issue is that the route rules may simply be slow to take effect.
The Istio implementation on Kubernetes utilizes an eventually consistent
algorithm to ensure all Envoy sidecars have the correct configuration
including all route rules. A configuration change will take some time
to propagate to all the sidecars. With large deployments the
propagation will take longer and there may be a lag time on the
order of seconds.
## 503 errors after setting destination rule
If requests to a service immediately start generating HTTP 503 errors after you applied a `DestinationRule`
and the errors continue until you remove or revert the `DestinationRule`, then the `DestinationRule` is probably
causing a TLS conflict for the service.
For example, if you configure mutual TLS in the cluster globally, the `DestinationRule` must include the following `trafficPolicy`:
{{< text yaml >}}
trafficPolicy:
tls:
mode: ISTIO_MUTUAL
{{< /text >}}
Otherwise, the mode defaults to `DISABLED`, causing client proxy sidecars to make plain HTTP requests
instead of TLS encrypted requests. The requests then conflict with the server-side proxy, which expects
encrypted requests.
To confirm there is a conflict, check whether the `STATUS` field in the output of the `istioctl authn tls-check` command
is set to `CONFLICT` for your service. For example:
{{< text bash >}}
$ istioctl authn tls-check httpbin.default.svc.cluster.local
HOST:PORT STATUS SERVER CLIENT AUTHN POLICY DESTINATION RULE
httpbin.default.svc.cluster.local:8000 CONFLICT mTLS HTTP default/ httpbin/default
{{< /text >}}
Whenever you apply a `DestinationRule`, ensure the `trafficPolicy` TLS mode matches the global server configuration.
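For example, a complete `DestinationRule` for the `httpbin` service above might look like the following sketch (adjust the host and any other settings to match your service):
{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin.default.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
{{< /text >}}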
## Route rules have no effect on ingress gateway requests
Let's assume you are using an ingress `Gateway` and corresponding `VirtualService` to access an internal service.
For example, your `VirtualService` looks something like this:
{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- "myapp.com" # or maybe "*" if you are testing without DNS using the ingress-gateway IP (e.g., http://1.2.3.4/hello)
gateways:
- myapp-gateway
http:
- match:
- uri:
prefix: /hello
route:
- destination:
host: helloworld.default.svc.cluster.local
- match:
...
{{< /text >}}
You also have a `VirtualService` which routes traffic for the helloworld service to a particular subset:
{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: helloworld
spec:
hosts:
- helloworld.default.svc.cluster.local
http:
- route:
- destination:
host: helloworld.default.svc.cluster.local
subset: v1
{{< /text >}}
In this situation you will notice that requests to the helloworld service via the ingress gateway are
not directed to subset v1 but instead continue to use default round-robin routing.
The ingress requests use the gateway host (e.g., `myapp.com`),
which activates the rule in the myapp `VirtualService` that routes to any endpoint of the helloworld service.
Only internal requests with the host `helloworld.default.svc.cluster.local` use the
helloworld `VirtualService`, which directs traffic exclusively to subset v1.
To control the traffic from the gateway, you need to include the subset rule in the myapp `VirtualService`:
{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- "myapp.com" # or maybe "*" if you are testing without DNS using the ingress-gateway IP (e.g., http://1.2.3.4/hello)
gateways:
- myapp-gateway
http:
- match:
- uri:
prefix: /hello
route:
- destination:
host: helloworld.default.svc.cluster.local
subset: v1
- match:
...
{{< /text >}}
Alternatively, you can combine both `VirtualServices` into one unit if possible:
{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- myapp.com # cannot use "*" here since this is being combined with the mesh services
- helloworld.default.svc.cluster.local
gateways:
- mesh # applies internally as well as externally
- myapp-gateway
http:
- match:
- uri:
prefix: /hello
gateways:
- myapp-gateway #restricts this rule to apply only to ingress gateway
route:
- destination:
host: helloworld.default.svc.cluster.local
subset: v1
- match:
- gateways:
- mesh # applies to all services inside the mesh
route:
- destination:
host: helloworld.default.svc.cluster.local
subset: v1
{{< /text >}}
## Headless TCP services losing connection
If `istio-citadel` is deployed, Envoy is restarted every 15 minutes to refresh certificates.
This causes the disconnection of TCP streams or long-running connections between services.
You should build resilience into your application for this type of
disconnect, but if you still want to prevent the disconnects from
happening, you will need to disable mutual TLS and the `istio-citadel` deployment.
First, edit your `istio` configuration to disable mutual TLS:
{{< text bash >}}
$ kubectl edit configmap -n istio-system istio
$ kubectl delete pods -n istio-system -l istio=pilot
{{< /text >}}
Next, scale down the `istio-citadel` deployment to disable Envoy restarts:
{{< text bash >}}
$ kubectl scale --replicas=0 deploy/istio-citadel -n istio-system
{{< /text >}}
This should stop Istio from restarting Envoy and disconnecting TCP connections.
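To confirm that the scale down took effect, you can, for example, check the deployment (the desired replica count should now be `0`; column names vary slightly with your `kubectl` version):
{{< text bash >}}
$ kubectl get deployment istio-citadel -n istio-system
{{< /text >}}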
## Envoy is crashing under load
Check your `ulimit -a`. Many systems have a 1024 open file descriptor limit by default, which will cause Envoy to assert and crash with:
{{< text plain >}}
[2017-05-17 03:00:52.735][14236][critical][assert] assert failure: fd_ != -1: external/envoy/source/common/network/connection_impl.cc:58
{{< /text >}}
Make sure to raise your ulimit. Example: `ulimit -n 16384`
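To see the limit the sidecar is actually running with, you could, for example, run `ulimit` inside the proxy container (the pod name here is only an illustration):
{{< text bash >}}
$ kubectl exec productpage-v1-6c886ff494-7vxhs -c istio-proxy -- sh -c 'ulimit -n'
{{< /text >}}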
## Envoy won't connect to my HTTP/1.0 service
Envoy requires `HTTP/1.1` or `HTTP/2` traffic for upstream services. For example, when using [NGINX](https://www.nginx.com/) for serving traffic behind Envoy, you
will need to set the [proxy_http_version](https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_http_version) directive in your NGINX configuration to be "1.1", since the NGINX default is 1.0.
Example configuration:
{{< text plain >}}
upstream http_backend {
server 127.0.0.1:8080;
keepalive 16;
}
server {
...
location /http/ {
proxy_pass http://http_backend;
proxy_http_version 1.1;
proxy_set_header Connection "";
...
}
}
{{< /text >}}