---
title: Troubleshooting Multicluster
description: Describes tools and techniques to diagnose issues with multicluster and multi-network installations.
weight: 90
keywords: [debug,multicluster,multi-network,envoy]
owner: istio/wg-environments-maintainers
test: no
---

This page describes how to troubleshoot issues with Istio deployed to multiple clusters and/or networks. Before reading this, you should take the steps in [Multicluster Installation](/docs/setup/install/multicluster/) and read the [Deployment Models](/docs/ops/deployment/deployment-models/) guide.

## Cross-Cluster Load Balancing

The most common, but also broadest, problem with multi-network installations is that cross-cluster load balancing doesn't work. Usually this manifests itself as only seeing responses from the cluster-local instance of a Service:

{{< text bash >}}
$ for i in $(seq 10); do kubectl --context=$CTX_CLUSTER1 -n sample exec sleep-dd98b5f48-djwdw -c sleep -- curl -s helloworld:5000/hello; done
Hello version: v1, instance: helloworld-v1-578dd69f69-j69pf
Hello version: v1, instance: helloworld-v1-578dd69f69-j69pf
Hello version: v1, instance: helloworld-v1-578dd69f69-j69pf
...
{{< /text >}}

When following the guide to [verify multicluster installation](/docs/setup/install/multicluster/verify/), we would expect both `v1` and `v2` responses, indicating traffic is going to both clusters.

There are many possible causes of the problem:

### Connectivity and firewall issues

In some environments it may not be apparent that a firewall is blocking traffic between your clusters. It's possible that `ICMP` (ping) traffic may succeed, but HTTP and other types of traffic do not. This can appear as a timeout, or in some cases a more confusing error such as:

{{< text plain >}}
upstream connect error or disconnect/reset before headers. reset reason: local reset, transport failure reason: TLS error: 268435612:SSL routines:OPENSSL_internal:HTTP_REQUEST
{{< /text >}}
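When you hit an error like this, it helps to know whether a TCP connection to the other cluster can be established at all. The following is a rough sketch, not part of the verification guide: it assumes you can run `curl` from a host or pod on the first cluster's network, and that `REMOTE_POD_IP` is the address of a pod in the other cluster (the test below shows one way to find such an address):

{{< text bash >}}
$ curl -v --connect-timeout 5 "http://${REMOTE_POD_IP}:5000/hello"
{{< /text >}}

A connection that times out before it is established usually points to a firewall or routing problem between the cluster networks, while a connection that opens but is then reset is more likely a TLS or protocol mismatch.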
While Istio provides service discovery capabilities to make cross-cluster communication easier, traffic between clusters should still succeed without Istio if the pods in each cluster are on a single network. To rule out issues with TLS/mTLS, you can do a manual traffic test using pods without Istio sidecars.

In each cluster, create a new namespace for this test. Do _not_ enable sidecar injection:

{{< text bash >}}
$ kubectl create --context="${CTX_CLUSTER1}" namespace uninjected-sample
$ kubectl create --context="${CTX_CLUSTER2}" namespace uninjected-sample
{{< /text >}}

Then deploy the same apps used in [verify multicluster installation](/docs/setup/install/multicluster/verify/):

{{< text bash >}}
$ kubectl apply --context="${CTX_CLUSTER1}" \
    -f samples/helloworld/helloworld.yaml \
    -l service=helloworld -n uninjected-sample
$ kubectl apply --context="${CTX_CLUSTER2}" \
    -f samples/helloworld/helloworld.yaml \
    -l service=helloworld -n uninjected-sample
$ kubectl apply --context="${CTX_CLUSTER1}" \
    -f samples/helloworld/helloworld.yaml \
    -l version=v1 -n uninjected-sample
$ kubectl apply --context="${CTX_CLUSTER2}" \
    -f samples/helloworld/helloworld.yaml \
    -l version=v2 -n uninjected-sample
$ kubectl apply --context="${CTX_CLUSTER1}" \
    -f samples/sleep/sleep.yaml -n uninjected-sample
$ kubectl apply --context="${CTX_CLUSTER2}" \
    -f samples/sleep/sleep.yaml -n uninjected-sample
{{< /text >}}

Verify that there is a `helloworld` pod running in `cluster2`, using the `-o wide` flag, so we can get the Pod IP:

{{< text bash >}}
$ kubectl --context="${CTX_CLUSTER2}" -n uninjected-sample get pod -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP           NODE     NOMINATED NODE   READINESS GATES
helloworld-v2-54df5f84b-z28p5   1/1     Running   0          43s   10.100.0.1   node-1   <none>           <none>
sleep-557747455f-jdsd8          1/1     Running   0          41s   10.100.0.2   node-2   <none>           <none>
{{< /text >}}

Take note of the `IP` column for `helloworld`. In this case, it is `10.100.0.1`:

{{< text bash >}}
$ REMOTE_POD_IP=10.100.0.1
{{< /text >}}

Next, attempt to send traffic from the `sleep` pod in `cluster1` directly to this Pod IP:

{{< text bash >}}
$ kubectl exec --context="${CTX_CLUSTER1}" -n uninjected-sample -c sleep \
    "$(kubectl get pod --context="${CTX_CLUSTER1}" -n uninjected-sample -l \
    app=sleep -o jsonpath='{.items[0].metadata.name}')" \
    -- curl -sS $REMOTE_POD_IP:5000/hello
Hello version: v2, instance: helloworld-v2-54df5f84b-z28p5
{{< /text >}}

If successful, there should be responses only from `helloworld-v2`. Repeat the steps, but send traffic from `cluster2` to `cluster1` (a sketch of the reverse check is shown below). If this succeeds, you can rule out connectivity issues. If it does not, the cause of the problem may lie outside your Istio configuration.
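For the reverse direction, the commands are the same with the cluster contexts swapped. A minimal sketch, assuming you have noted the IP of the `helloworld-v1` pod in `cluster1` the same way as above:

{{< text bash >}}
$ REMOTE_POD_IP=<IP of the helloworld-v1 pod in cluster1>
$ kubectl exec --context="${CTX_CLUSTER2}" -n uninjected-sample -c sleep \
    "$(kubectl get pod --context="${CTX_CLUSTER2}" -n uninjected-sample -l \
    app=sleep -o jsonpath='{.items[0].metadata.name}')" \
    -- curl -sS $REMOTE_POD_IP:5000/hello
{{< /text >}}

A successful response here comes from `helloworld-v1`, since only `v1` was deployed to `cluster1` for this test.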
### Locality Load Balancing

[Locality load balancing](/docs/tasks/traffic-management/locality-load-balancing/failover/#configure-locality-failover) can be used to make clients prefer that traffic go to the nearest destination. If the clusters are in different localities (region/zone), locality load balancing will prefer the local cluster and is working as intended. If locality load balancing is disabled, or the clusters are in the same locality, there may be another issue.

### Trust Configuration

Cross-cluster traffic, as with intra-cluster traffic, relies on a common root of trust between the proxies. By default, each Istio installation uses its own individually generated root certificate authority. For multi-cluster, we must manually configure a shared root of trust. Follow Plug-in Certs below or read [Identity and Trust Models](/docs/ops/deployment/deployment-models/#identity-and-trust-models) to learn more.

**Plug-in Certs:** To verify certs are configured correctly, you can compare the root-cert in each cluster:

{{< text bash >}}
$ diff \
    <(kubectl --context="${CTX_CLUSTER1}" -n istio-system get secret cacerts -ojsonpath='{.data.root-cert\.pem}') \
    <(kubectl --context="${CTX_CLUSTER2}" -n istio-system get secret cacerts -ojsonpath='{.data.root-cert\.pem}')
{{< /text >}}

You can follow the [Plugin CA Certs](/docs/tasks/security/cert-management/plugin-ca-cert/) guide, making sure to run the steps for every cluster.

### Step-by-step Diagnosis

If you've gone through the sections above and are still having issues, then it's time to dig a little deeper.

The following steps assume you're following the [HelloWorld verification](/docs/setup/install/multicluster/verify/). Before continuing, make sure both `helloworld` and `sleep` are deployed in each cluster.

From each cluster, find the endpoints the `sleep` service has for `helloworld`:

{{< text bash >}}
$ istioctl --context $CTX_CLUSTER1 proxy-config endpoint sleep-dd98b5f48-djwdw.sample | grep helloworld
{{< /text >}}
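The `sleep` pod name in your cluster will differ from the one in this example. A small sketch to look it up rather than copying it by hand, assuming the pod carries the `app=sleep` label in the `sample` namespace as in the verification guide:

{{< text bash >}}
$ SLEEP_POD=$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample \
    -l app=sleep -o jsonpath='{.items[0].metadata.name}')
$ istioctl --context="${CTX_CLUSTER1}" proxy-config endpoint "${SLEEP_POD}.sample" | grep helloworld
{{< /text >}}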
Troubleshooting information differs based on the cluster that is the source of traffic:

{{< tabset category-name="source-cluster" >}}

{{< tab name="Primary cluster" category-value="primary" >}}

{{< text bash >}}
$ istioctl --context $CTX_CLUSTER1 proxy-config endpoint sleep-dd98b5f48-djwdw.sample | grep helloworld
10.0.0.11:5000    HEALTHY    OK    outbound|5000||helloworld.sample.svc.cluster.local
{{< /text >}}

Only one endpoint is shown, indicating the control plane cannot read endpoints from the remote cluster. Verify that remote secrets are configured properly.

{{< text bash >}}
$ kubectl get secrets --context=$CTX_CLUSTER1 -n istio-system -l "istio/multiCluster=true"
{{< /text >}}

* If the secret is missing, create it.
* If the secret is present:
    * Look at the config in the secret. Make sure the cluster name is used as the data key for the remote `kubeconfig`.
    * If the secret looks correct, check the logs of `istiod` for connectivity or permissions issues reaching the remote Kubernetes API server. Log messages may include `Failed to add remote cluster from secret` along with an error reason.

{{< /tab >}}

{{< tab name="Remote cluster" category-value="remote" >}}

{{< text bash >}}
$ istioctl --context $CTX_CLUSTER2 proxy-config endpoint sleep-dd98b5f48-djwdw.sample | grep helloworld
10.0.1.11:5000    HEALTHY    OK    outbound|5000||helloworld.sample.svc.cluster.local
{{< /text >}}

Only one endpoint is shown, indicating the control plane cannot read endpoints from the remote cluster. Verify that remote secrets are configured properly.

{{< text bash >}}
$ kubectl get secrets --context=$CTX_CLUSTER1 -n istio-system -l "istio/multiCluster=true"
{{< /text >}}

* If the secret is missing, create it.
* If the secret is present and the endpoint is a Pod in the **primary** cluster:
    * Look at the config in the secret. Make sure the cluster name is used as the data key for the remote `kubeconfig`.
    * If the secret looks correct, check the logs of `istiod` for connectivity or permissions issues reaching the remote Kubernetes API server. Log messages may include `Failed to add remote cluster from secret` along with an error reason.
* If the secret is present and the endpoint is a Pod in the **remote** cluster:
    * The proxy is reading configuration from an istiod inside the remote cluster. When a remote cluster has an in-cluster istiod, it is only meant for sidecar injection and CA. You can verify this is the problem by looking for a Service named `istiod-remote` in the `istio-system` namespace. If it's missing, reinstall, making sure `values.global.remotePilotAddress` is set.

{{< /tab >}}

{{< tab name="Multi-Network" category-value="multi-primary" >}}

The steps for the Primary and Remote clusters still apply for multi-network, although multi-network has an additional case:

{{< text bash >}}
$ istioctl --context $CTX_CLUSTER1 proxy-config endpoint sleep-dd98b5f48-djwdw.sample | grep helloworld
10.0.5.11:5000    HEALTHY    OK    outbound|5000||helloworld.sample.svc.cluster.local
10.0.6.13:5000    HEALTHY    OK    outbound|5000||helloworld.sample.svc.cluster.local
{{< /text >}}

In multi-network, we expect one of the endpoint IPs to match the remote cluster's east-west gateway public IP. Seeing multiple Pod IPs indicates one of two things:

* The address of the gateway for the remote network cannot be determined.
* The network of either the client or server pod cannot be determined.

**The address of the gateway for the remote network cannot be determined:**

In the remote cluster that cannot be reached, check that the Service has an External IP:

{{< text bash >}}
$ kubectl -n istio-system get service -l "istio=eastwestgateway"
NAME                    TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)                                                           AGE
istio-eastwestgateway   LoadBalancer   10.8.17.119   <PENDING>     15021:31781/TCP,15443:30498/TCP,15012:30879/TCP,15017:30336/TCP   76m
{{< /text >}}

If the `EXTERNAL-IP` is stuck in `<PENDING>`, the environment may not support `LoadBalancer` services. In this case, it may be necessary to customize the `spec.externalIPs` section of the Service to manually give the Gateway an IP reachable from outside the cluster.

If the external IP is present, check that the Service includes a `topology.istio.io/network` label with the correct value. If that is incorrect, reinstall the gateway and make sure to set the `--network` flag on the generation script.

**The network of either the client or server cannot be determined:**

On the source pod, check the proxy metadata:

{{< text bash >}}
$ kubectl get pod $SLEEP_POD_NAME \
      -o jsonpath="{.spec.containers[*].env[?(@.name=='ISTIO_META_NETWORK')].value}"
{{< /text >}}

On the destination pod, check the network label:

{{< text bash >}}
$ kubectl get pod $HELLOWORLD_POD_NAME \
      -o jsonpath="{.metadata.labels.topology\.istio\.io/network}"
{{< /text >}}

If either of these values isn't set, or has the wrong value, istiod may treat the source and destination proxies as being on the same network and send network-local endpoints. When these aren't set, check that `values.global.network` was set properly during install, or that the injection webhook is configured correctly.

Istio determines the network of a Pod using the `topology.istio.io/network` label, which is set during injection. For non-injected Pods, Istio relies on the `topology.istio.io/network` label set on the system namespace in the cluster.

In each cluster, check the network:

{{< text bash >}}
$ kubectl --context="${CTX_CLUSTER1}" get ns istio-system -ojsonpath='{.metadata.labels.topology\.istio\.io/network}'
{{< /text >}}

If the above command doesn't output the expected network name, set the label:

{{< text bash >}}
$ kubectl --context="${CTX_CLUSTER1}" label namespace istio-system topology.istio.io/network=network1
{{< /text >}}

{{< /tab >}}

{{< /tabset >}}
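Once you believe you have addressed the cause, it is worth re-running the check from the top of this page. With cross-cluster load balancing working, the responses should come from both `v1` and `v2`. A sketch, assuming the `sleep` and `helloworld` deployments from the verification guide (your `sleep` pod name will differ from the example):

{{< text bash >}}
$ for i in $(seq 10); do
    kubectl exec --context="${CTX_CLUSTER1}" -n sample -c sleep \
      "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l \
      app=sleep -o jsonpath='{.items[0].metadata.name}')" \
      -- curl -sS helloworld:5000/hello
  done
{{< /text >}}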