---
reviewers:
- aravindhp
- jayunit100
- jsturtevant
- marosset
title: Windows debugging tips
content_type: concept
---
<!-- overview -->
<!-- body -->
## Node-level troubleshooting {#troubleshooting-node}
1. My Pods are stuck at `ContainerCreating` or restarting over and over
Ensure that your pause image is compatible with your Windows OS version.
See [Pause container](/docs/concepts/windows/intro/#pause-container)
for the recommended pause image and more information.
{{< note >}}
If you use containerd as your container runtime, the pause image is specified in the
`plugins.plugins.cri.sandbox_image` field of the `config.toml` configuration file.
{{< /note >}}
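As a rough way to check both sides of this, the following sketch prints the node's Windows version and searches the containerd configuration for the configured pause image. The `config.toml` path and the Pod name are assumptions for illustration; substitute the values from your own node and cluster.

```powershell
# Print the Windows version (major.minor.build) of this node.
[System.Environment]::OSVersion.Version

# Assumption: containerd's config.toml is at this path; adjust for your install.
$configPath = "C:\Program Files\containerd\config.toml"
Select-String -Path $configPath -Pattern "sandbox_image"

# Hypothetical Pod name; the Events section of the output usually shows
# why the sandbox (pause) container could not be created.
$podName = "my-windows-pod"
kubectl describe pod $podName
```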
1. My Pods show status as `ErrImagePull` or `ImagePullBackOff`
Ensure that your Pod is getting scheduled to a
[compatible](https://docs.microsoft.com/virtualization/windowscontainers/deploy-containers/version-compatibility)
Windows Node.
More information on how to specify a compatible node for your Pod can be found in
[this guide](/docs/concepts/windows/user-guide/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host).
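To see where a Pod was scheduled and which Windows version each node runs, something like the following may help. It relies on the well-known `kubernetes.io/os` node label; the Pod name is a placeholder chosen for this example.

```powershell
# List Windows nodes; the wide output includes the OS image and kernel (build) version.
kubectl get nodes -l kubernetes.io/os=windows -o wide

# Hypothetical Pod name; replace it with the Pod that fails to pull its image.
$podName = "my-windows-pod"

# Show which node the Pod landed on.
kubectl get pod $podName -o wide

# The Events section reports the image pull error in detail.
kubectl describe pod $podName
```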
## Network troubleshooting {#troubleshooting-network}
1. My Windows Pods do not have network connectivity
If you are using virtual machines, ensure that MAC spoofing is **enabled** on
every network adapter of each VM.
1. My Windows Pods cannot ping external resources
Windows Pods do not have outbound rules programmed for the ICMP protocol. However,
TCP/UDP is supported. When trying to demonstrate connectivity to resources
outside of the cluster, substitute `ping <IP>` with corresponding
`curl <IP>` commands.
If you are still facing problems, your network configuration in
[cni.conf](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf)
most likely needs attention. You can edit this static file at any time; the
updated configuration applies to any newly created Kubernetes resources.
One of the Kubernetes networking requirements
(see [Kubernetes model](/docs/concepts/cluster-administration/networking/)) is
for cluster communication to occur without
NAT internally. To honor this requirement, there is an
[ExceptionList](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf#L20)
for all the communication where you do not want outbound NAT to occur. However,
this also means that you need to exclude the external IP you are trying to query
from the `ExceptionList`. Only then will the traffic originating from your Windows
pods be SNAT'ed correctly to receive a response from the outside world. In this
regard, your `ExceptionList` in `cni.conf` should look as follows:
```conf
"ExceptionList": [
"10.244.0.0/16", # Cluster subnet
"10.96.0.0/12", # Service subnet
"10.127.130.0/24" # Management (host) subnet
]
```
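The checks described above can be sketched as follows. This is only an illustration: the Pod name, the target URL, and the location of `cni.conf` are assumptions, and the first command presumes that the container image ships `curl.exe`.

```powershell
# Hypothetical Pod name; replace it with one of your Windows Pods.
$podName = "my-windows-pod"

# Test outbound connectivity over TCP (HTTP) instead of ICMP.
# -sS hides the progress meter but still prints errors; -I requests headers only.
kubectl exec $podName -- curl.exe -sS -I http://example.com

# Assumption: cni.conf is stored at this path on the node; adjust as needed.
$cniConf = "C:\k\cni\config\cni.conf"

# Print the ExceptionList entry together with the four lines that follow it.
Select-String -Path $cniConf -Pattern "ExceptionList" -Context 0,4
```

If the external address you are testing falls inside one of the listed ranges, outbound NAT is skipped for it and replies may never reach the Pod.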
1. My Windows node cannot access `NodePort` type Services
Local NodePort access from the node itself fails. This is a known
limitation. NodePort access works from other nodes or external clients.
1. vNICs and HNS endpoints of containers are being deleted
This issue can occur when the `hostname-override` parameter is not passed to
[kube-proxy](/docs/reference/command-line-tools-reference/kube-proxy/). To resolve
it, pass the hostname to kube-proxy as follows:
```powershell
C:\k\kube-proxy.exe --hostname-override=$(hostname)
```
1. My Windows node cannot access my services using the service IP
This is a known limitation of the networking stack on Windows. However, Windows Pods can access the Service IP.
1. No network adapter is found when starting the kubelet
The Windows networking stack needs a virtual adapter for Kubernetes networking to work.
If the following commands return no results (in an admin shell),
virtual network creation — a necessary prerequisite for the kubelet to work — has failed:
```powershell
Get-HnsNetwork | ? Name -ieq "cbr0"
Get-NetAdapter | ? Name -Like "vEthernet (Ethernet*"
```
Often it is worthwhile to modify the [InterfaceName](https://github.com/microsoft/SDN/blob/master/Kubernetes/flannel/start.ps1#L7)
parameter of the `start.ps1` script, in cases where the host's network adapter isn't "Ethernet".
Otherwise, consult the output of the `start-kubelet.ps1` script to see if there are errors during virtual network creation.
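To find the adapter name to use for `InterfaceName`, you can list what the host actually has. This is a small sketch, run from an admin shell; the `Name` and `Type` columns selected from the HNS output are assumed to be present in your version of the HNS tooling.

```powershell
# List the host's network adapters; the Name column is the value to compare
# against the InterfaceName parameter (for example "Ethernet" or "Ethernet 2").
Get-NetAdapter | Format-Table Name, InterfaceDescription, Status

# List every HNS network that exists, not only the one named "cbr0".
Get-HnsNetwork | Select-Object Name, Type
```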
1. DNS resolution is not properly working
Check the DNS limitations for Windows in this [section](/docs/concepts/services-networking/dns-pod-service/#dns-windows).
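A quick resolution test from inside a Pod might look like the sketch below. It assumes that the container image includes PowerShell and that the cluster uses the default `cluster.local` domain; the Pod name is a placeholder.

```powershell
# Hypothetical Pod name; replace it with one of your Windows Pods.
$podName = "my-windows-pod"

# Resolve the fully qualified name of the built-in kubernetes Service.
kubectl exec $podName -- powershell -Command "Resolve-DnsName kubernetes.default.svc.cluster.local"
```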
1. `kubectl port-forward` fails with "unable to do port forwarding: wincat not found"
Port forwarding for Windows Pods was implemented in Kubernetes 1.15 by including `wincat.exe` in the pause infrastructure container
`mcr.microsoft.com/oss/kubernetes/pause:3.6`.
Be sure to use a supported version of Kubernetes.
If you would like to build your own pause infrastructure container be sure to include
[wincat](https://github.com/kubernetes/kubernetes/tree/master/build/pause/windows/wincat).
1. My Kubernetes installation is failing because my Windows Server node is behind a proxy
If you are behind a proxy, the following PowerShell environment variables must be defined:
```powershell
[Environment]::SetEnvironmentVariable("HTTP_PROXY", "http://proxy.example.com:80/", [EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("HTTPS_PROXY", "http://proxy.example.com:443/", [EnvironmentVariableTarget]::Machine)
```
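To confirm that the values were stored, you can read them back at the machine level, as in this short sketch. Machine-level variables are only picked up by processes started after the change, so already running services and shells need a restart to see them.

```powershell
# Read back the machine-level proxy variables set above.
foreach ($name in "HTTP_PROXY", "HTTPS_PROXY") {
    $value = [Environment]::GetEnvironmentVariable($name, [EnvironmentVariableTarget]::Machine)
    "{0} = {1}" -f $name, $value
}
```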
### Flannel troubleshooting
1. With Flannel, my nodes are having issues after rejoining a cluster
Whenever a previously deleted node rejoins the cluster, flanneld
tries to assign a new pod subnet to the node. Remove the old pod
subnet configuration files at the following paths:
```powershell
Remove-Item C:\k\SourceVip.json
Remove-Item C:\k\SourceVipRequest.json
```
1. Flanneld is stuck in "Waiting for the Network to be created"
There are numerous reports of this [issue](https://github.com/coreos/flannel/issues/1066);
most likely it is a timing issue for when the management IP of the flannel network is set.
A workaround is to relaunch `start.ps1`, or to start `flanneld.exe` manually as follows:
```powershell
[Environment]::SetEnvironmentVariable("NODE_NAME", "<Windows_Worker_Hostname>")
C:\flannel\flanneld.exe --kubeconfig-file=c:\k\config --iface=<Windows_Worker_Node_IP> --ip-masq=1 --kube-subnet-mgr=1
```
1. My Windows Pods cannot launch because of missing `/run/flannel/subnet.env`
This indicates that Flannel didn't launch correctly. You can either try
to restart `flanneld.exe`, or copy the file manually from
`/run/flannel/subnet.env` on the Kubernetes control plane node to `C:\run\flannel\subnet.env`
on the Windows worker node and change the `FLANNEL_SUBNET` value to the subnet
you want for this node. For example, if node subnet 10.244.4.1/24 is desired:
```env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.4.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true
```
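To check the file after creating or copying it, a small helper such as the one below can load it into a hashtable. `Read-FlannelSubnetEnv` is a hypothetical function written for this example and is not part of Flannel or Kubernetes.

```powershell
# Hypothetical helper: parse a Flannel subnet.env file into a hashtable.
function Read-FlannelSubnetEnv {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    if (-not (Test-Path -Path $Path)) {
        throw "subnet.env not found at $Path; flanneld may not have started correctly."
    }

    # ConvertFrom-StringData turns KEY=value lines into hashtable entries.
    Get-Content -Path $Path -Raw | ConvertFrom-StringData
}
```

For example, `(Read-FlannelSubnetEnv -Path C:\run\flannel\subnet.env).FLANNEL_SUBNET` prints the subnet recorded for the node.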
### Further investigation
If these steps don't resolve your problem, you can get help running Windows containers on Windows nodes in Kubernetes through:
* Stack Overflow [Windows Server Container](https://stackoverflow.com/questions/tagged/windows-server-container) topic
* Kubernetes Official Forum [discuss.kubernetes.io](https://discuss.kubernetes.io/)
* Kubernetes Slack [#SIG-Windows Channel](https://kubernetes.slack.com/messages/sig-windows)