Troubleshooting Networking
Network issues come up frequently in new Kubernetes installations or when Kubernetes load increases. This chapter introduces common network problems and methods for troubleshooting them in Kubernetes.
Overview
The Kubernetes "IP-per-pod" model solves four distinct networking problems:
Highly-coupled container-to-container communications: solved by Pods and localhost communication.
Pod-to-Pod communications: solved by the CNI network plugin.
Pod-to-Service communications: solved by Services.
External-to-Service communications: solved by Services.
These four directions are exactly where to look when encountering network issues. Common causes include:
CNI network plugin configuration errors
CIDR conflicts with existing ones
using protocols not supported by underlying network (e.g. multicast may be disabled for clusters on public cloud)
IP forwarding is not enabled; check sysctl net.ipv4.ip_forward and sysctl net.bridge.bridge-nf-call-iptables
Missing route tables
the default kubenet plugin requires a network route for each podCIDR to the node IP
kube-controller-manager should configure the route table for all nodes, but if something goes wrong (e.g. not authorized, quota exceeded, etc.), routes may be missing
Forbidden by security groups or firewall rules
iptables not managed by kubernetes may forbid kubernetes network connections
security groups on public cloud may forbid kubernetes network connections
ACL on switches or routers may also forbid kubernetes network connections
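For the IP-forwarding case in particular, the relevant sysctls can be checked and enabled as follows (a sketch; run as root on each node, and note that the `/etc/sysctl.d` file name here is an arbitrary choice):

```shell
# Check current values (1 means enabled)
sysctl net.ipv4.ip_forward
sysctl net.bridge.bridge-nf-call-iptables

# Enable them for the current boot
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.bridge.bridge-nf-call-iptables=1

# Persist across reboots
cat <<EOF >/etc/sysctl.d/99-kubernetes.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
```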
Pod failed to allocate IP address
The Pod is stuck in ContainerCreating state and its events report a Failed to allocate address error:
Check the allocated IP addresses in the plugin's IPAM store; you may find that all IP addresses have been allocated, while the number of running Pods is much smaller:
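For example, with flannel's host-local IPAM the allocations can be counted like this (paths are for the stock flannel setup; `<node-name>` is a placeholder):

```shell
# With host-local IPAM, each allocated IP is a file in the store directory
ls /var/lib/cni/networks/cbr0 | wc -l

# Compare with the number of Pods actually running on this node
kubectl get pod -o wide --all-namespaces | grep <node-name> | wc -l
```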
There are two possible reasons for this:
Bugs in the network plugin, which fails to deallocate the IP address when a Pod terminates
Pod creation is much faster than garbage collection of terminated Pods
For the first reason, contacting the plugin's author for a workaround or fix is the first choice. You can also deallocate IP addresses manually, if you are sure about what you are doing:
Stop Kubelet
Find the IPAM store file for the CNI plugin and get a list of all allocated IP addresses, e.g. /var/lib/cni/networks/cbr0 (flannel) or /var/run/azure-vnet-ipam.json (Azure CNI)
Get a list of the IP addresses in use, e.g. by kubectl get pod -o wide --all-namespaces | grep <node-name>
Compare the two lists, remove the unused IP addresses from the store file, and then delete the related netns or virtual NICs (this requires a deep understanding of the network plugin)
Finally, restart kubelet
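The steps above can be sketched as follows (illustrative only, for the flannel host-local store; adapt paths to your plugin and double-check every address before removing anything):

```shell
systemctl stop kubelet

# Allocated IPs: one file per IP in the host-local store
ls /var/lib/cni/networks/cbr0

# IPs actually in use by Pods on this node (<node-name> is a placeholder)
kubectl get pod -o wide --all-namespaces | grep <node-name> | awk '{print $7}'

# For every allocated IP that no Pod is using, remove it from the store, e.g.:
# rm /var/lib/cni/networks/cbr0/<leaked-ip>

systemctl start kubelet
```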
For the second reason, faster garbage collection can be configured for kubelet, e.g.
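A sketch using kubelet's container garbage-collection flags (tune the values to your workload; on newer versions these flags are deprecated in favor of the kubelet config file):

```shell
kubelet ... \
  --minimum-container-ttl-duration=15s \
  --maximum-dead-containers-per-container=1 \
  --maximum-dead-containers=100
```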
Flannel Pods stuck in Init:CrashLoopBackOff
The Flannel network plugin is very easy to install for a fresh setup.
However, after a while, Flannel Pods may get stuck in the Init:CrashLoopBackOff state, which also prevents other Pods from being created (because a ready network is a prerequisite).
Check the logs of the Pod kube-flannel-ds-jpp96:
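For example (assuming the stock flannel manifest, where the crashing init container is named install-cni; adjust if yours differs):

```shell
kubectl -n kube-system logs kube-flannel-ds-jpp96 -c install-cni
kubectl -n kube-system describe pod kube-flannel-ds-jpp96
```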
This issue is usually caused by SELinux; disabling SELinux should solve the problem. There are two ways to do this:
Set SELINUX=disabled in the file /etc/selinux/config (persistent across reboots)
Execute the command setenforce 0 (not persistent after reboot)
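Both ways as commands (run as root on each node):

```shell
# Option 1: persistent across reboots
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config

# Option 2: effective immediately, not persistent
setenforce 0
```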
DNS not working
If your Docker version is 1.13 or above, Docker changes the default iptables FORWARD policy to DROP (at each restart). This change may prevent kube-dns from reaching upstream DNS servers. A solution is to run iptables -P FORWARD ACCEPT on each node, e.g.
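For example (the systemd drop-in is one way to persist the policy across Docker restarts; the drop-in file name is an arbitrary choice):

```shell
iptables -P FORWARD ACCEPT

# Persist it across docker restarts via a systemd drop-in
mkdir -p /etc/systemd/system/docker.service.d
cat <<EOF >/etc/systemd/system/docker.service.d/forward-accept.conf
[Service]
ExecStartPost=/sbin/iptables -P FORWARD ACCEPT
EOF
systemctl daemon-reload
systemctl restart docker
```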
Alternatively, if you are using the flannel or weave network plugin, upgrading to the latest version may also solve the problem.
It is also possible that kube-dns itself is not running normally.
If kube-dns pods are in CrashLoopBackOff state, refer to Troubleshooting kube-dns/dashboard CrashLoopBackOff for troubleshooting kube-dns problem.
Otherwise, check whether the kube-dns service and its endpoints are normal:
If the kube-dns service doesn't exist, or its endpoints are empty, you should recreate the kube-dns service, e.g.
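A sketch of such a Service (the clusterIP shown, 10.96.0.10, is an assumption; it must match the --cluster-dns address your kubelets use):

```shell
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeDNS"
spec:
  clusterIP: 10.96.0.10
  selector:
    k8s-app: kube-dns
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
EOF
```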
Service not accessible within Pods
The first step is to check whether endpoints have been created automatically for the service:
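For example, for a service named hostnames:

```shell
kubectl get endpoints hostnames
```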
If you get an empty result, it is possible that your service's label selector is wrong. Confirm it as follows:
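For example (hostnames and the app=hostnames label are assumptions based on the example service):

```shell
# Inspect the Service's label selector
kubectl get svc hostnames -o jsonpath='{.spec.selector}'

# Verify that Pods actually carry those labels
kubectl get pods -l app=hostnames
```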
If all of the above steps are OK, confirm further by:
checking whether the Pod's containerPort matches the Service's targetPort
checking whether podIP:containerPort is working
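For example (the podIP and containerPort values are placeholders):

```shell
# Hit the Pod directly, bypassing the Service (run from a node or another Pod)
curl http://<pod-ip>:<container-port>
```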
Beyond that, other reasons can also cause service problems, including:
the container is not listening on the specified containerPort (check the pod description again)
CNI plugin error or network route error
kube-proxy is not running or iptables rules are not configured correctly
Normally, the following iptables rules should be created for a service named hostnames:
There should be 1 rule in KUBE-SERVICES, 1 or 2 rules per endpoint in KUBE-SVC-(hash) (depending on SessionAffinity), one KUBE-SEP-(hash) chain per endpoint, and a few rules in each KUBE-SEP-(hash) chain. The exact rules will vary based on your exact config (including node-ports and load-balancers).
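These rules can be inspected on a node like this (hostnames being the example service name):

```shell
# All rules mentioning the service
iptables-save | grep hostnames

# The service entry in the NAT table
iptables -t nat -L KUBE-SERVICES -n | grep hostnames
```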
Pod cannot reach itself via Service IP
This can happen when the network is not properly configured for “hairpin” traffic, usually when kube-proxy is running in iptables mode and Pods are connected to a bridge network.
The kubelet exposes a --hairpin-mode option, which should be set to promiscuous-bridge or hairpin-veth instead of none (the default is promiscuous-bridge).
Confirm hairpin-veth is working by:
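For example (cbr0 is a placeholder for your bridge name, e.g. cni0 for flannel):

```shell
# Every veth attached to the bridge should report hairpin_mode = 1
for intf in /sys/devices/virtual/net/cbr0/brif/*; do cat "$intf"/hairpin_mode; done
```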
Confirm promiscuous-bridge is working by:
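For example (again assuming the bridge is named cbr0):

```shell
# The bridge itself should carry the PROMISC flag
ip link show cbr0 | grep -o PROMISC

# kubelet logs should also show the mode being set
journalctl -u kubelet | grep -i promiscuous
```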
Can't access Kubernetes API
Many add-ons and containers need to access the Kubernetes API for various data (e.g. kube-dns and operator containers). If such errors happen, first confirm whether the Kubernetes API is accessible from within Pods:
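For example, from inside a Pod (the serviceaccount token path and the kubernetes.default.svc name are the standard in-cluster conventions):

```shell
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sk -H "Authorization: Bearer $TOKEN" https://kubernetes.default.svc/healthz
```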
If a timeout error is reported, confirm whether the kubernetes service and its endpoints are normal:
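For example:

```shell
kubectl get service kubernetes
kubectl get endpoints kubernetes
```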
If both are OK, kube-apiserver has probably not started, or is blocked by a firewall. Check the kube-apiserver status and its logs.
But if a 403 - Forbidden error is reported, kube-apiserver is probably configured with RBAC and your container's serviceAccount is not authorized to access the resources. In that case you should create proper RoleBindings and ClusterRoleBindings; e.g. to make the CoreDNS container run, you need:
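A sketch of the RBAC objects CoreDNS typically needs (the names follow the upstream CoreDNS deployment manifests; adjust the ServiceAccount name and namespace to your setup):

```shell
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:coredns
rules:
- apiGroups: [""]
  resources: ["endpoints", "services", "pods", "namespaces"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:coredns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:coredns
subjects:
- kind: ServiceAccount
  name: coredns
  namespace: kube-system
EOF
```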