Troubleshooting Networking

Network issues come up frequently in new Kubernetes installations or as cluster load increases. This chapter introduces common network problems and troubleshooting methods for Kubernetes.

Overview

The Kubernetes "IP-per-pod" model solves four distinct networking problems:

  • Highly-coupled container-to-container communication: solved by Pods and localhost communication.

  • Pod-to-Pod communication: solved by a CNI network plugin.

  • Pod-to-Service communication: solved by Services.

  • External-to-Service communication: solved by Services.

These four areas are exactly where to look when encountering network issues. Common causes include:

  • CNI network plugin configuration errors

    • CIDR conflicts with existing ones

    • using protocols not supported by underlying network (e.g. multicast may be disabled for clusters on public cloud)

    • IP forwarding is not enabled

      • sysctl net.ipv4.ip_forward

      • sysctl net.bridge.bridge-nf-call-iptables

  • Missing route tables

    • the default kubenet plugin requires a network route from each node's Pod CIDR to that node's IP

    • kube-controller-manager should configure the route tables for all nodes, but if something goes wrong (e.g. insufficient authorization or exceeded quota), routes may be missing

  • Forbidden by security groups or firewall rules

    • iptables rules not managed by Kubernetes may block Kubernetes network connections

    • security groups on public clouds may block Kubernetes network connections

    • ACLs on switches or routers may also block Kubernetes network connections
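For the IP-forwarding sysctls above, a quick check and fix on each node looks like the following (a sketch; the sysctl.d file name is arbitrary):

```shell
# Both should print "= 1"; if not, IP forwarding is misconfigured
sysctl net.ipv4.ip_forward
sysctl net.bridge.bridge-nf-call-iptables

# Enable persistently (assumes a systemd-based distribution)
cat <<EOF > /etc/sysctl.d/99-kubernetes.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
```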

Pod failed to allocate IP address

The Pod is stuck in the ContainerCreating state and its events report a Failed to allocate address error:
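The events can be inspected with kubectl describe (pod name and namespace are placeholders):

```shell
# Look for "Failed to allocate address" in the Events section at the bottom
kubectl describe pod <pod-name> -n <namespace>
```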

Check the allocated IP addresses in the plugin's IPAM store. You may find that all IP addresses have been allocated, while the number of running Pods is actually much smaller:
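Assuming the flannel host-local IPAM store mentioned below, the two counts can be compared like this:

```shell
# Number of allocated IP addresses (one file per IP in the store directory)
ls /var/lib/cni/networks/cbr0 | wc -l

# Number of Pods actually running on this node (replace <node-name>)
kubectl get pod -o wide --all-namespaces | grep <node-name> | wc -l
```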

There are two possible causes for this:

  • A bug in the network plugin that fails to deallocate IP addresses when Pods terminate

  • Pod creation is much faster than the garbage collection of terminated Pods

For the first cause, contacting the plugin's author for a workaround or fix is the first choice. But you can also deallocate IP addresses manually, if you are sure about what you are doing:

  • Stop Kubelet

  • Find the IPAM store file for the CNI plugin and get a list of all allocated IP addresses, e.g. /var/lib/cni/networks/cbr0 (flannel) or /var/run/azure-vnet-ipam.json (Azure CNI)

  • Get the list of IP addresses in use, e.g. by kubectl get pod -o wide --all-namespaces | grep <node-name>

  • Compare the two lists, remove the unused IP addresses from the store file, and then delete the related netns or virtual NICs (this requires a deep understanding of the network plugin)

  • Finally, restart kubelet
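The steps above can be sketched as follows for the flannel host-local store (a sketch only; removing entries and cleaning up netns/veth devices is plugin-specific):

```shell
systemctl stop kubelet

# IPs recorded in the IPAM store (each allocated IP is a file name)
ls /var/lib/cni/networks/cbr0 | grep -E '^[0-9]' | sort > /tmp/allocated

# IPs of Pods actually running on this node (IP is the 7th column)
kubectl get pod -o wide --all-namespaces \
  | grep <node-name> | awk '{print $7}' | sort > /tmp/in-use

# Entries present in the store but not used by any Pod are leaked
comm -23 /tmp/allocated /tmp/in-use

# After removing the leaked entries (and related netns/veth), restart kubelet
systemctl start kubelet
```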

For the second cause, faster garbage collection can be configured for kubelet, e.g.
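For example, the following kubelet flags (the values are illustrative, not recommendations) make the container garbage collector run more aggressively:

```shell
kubelet ... \
  --minimum-container-ttl-duration=30s \
  --maximum-dead-containers-per-container=1 \
  --maximum-dead-containers=100
```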

Flannel Pods stuck in Init:CrashLoopBackOff

The Flannel network plugin is very easy to install on a fresh setup:
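For example, deploying flannel is a single kubectl command (the manifest URL is the upstream location at the time of writing and may have moved):

```shell
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```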

However, after a while, Flannel Pods may get stuck in the Init:CrashLoopBackOff state, which also prevents other Pods from being created (because a ready network is a prerequisite).

Check the logs of the flannel Pod (kube-flannel-ds-jpp96 here):
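Since the Pod is stuck in an init container, read that container's logs; in the upstream flannel manifest the init container is typically named install-cni:

```shell
kubectl -n kube-system logs kube-flannel-ds-jpp96 -c install-cni
```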

This issue is usually caused by SELinux; disabling SELinux should solve the problem. There are two ways to do this:

  • Set SELINUX=disabled in file /etc/selinux/config (persistent even after reboot)

  • Execute command setenforce 0 (not persistent after reboot)

DNS not working

If your Docker version is 1.13 or above, Docker changes the default iptables FORWARD policy to DROP (on each restart). This change may prevent kube-dns from reaching upstream DNS servers. A solution is to run iptables -P FORWARD ACCEPT on each node, e.g.
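A sketch of the fix, including a systemd drop-in (the drop-in file name is arbitrary) so the policy survives Docker restarts:

```shell
# One-off fix
iptables -P FORWARD ACCEPT

# Reapply automatically after every docker restart
mkdir -p /etc/systemd/system/docker.service.d
cat <<EOF > /etc/systemd/system/docker.service.d/forward-accept.conf
[Service]
ExecStartPost=/sbin/iptables -P FORWARD ACCEPT
EOF
systemctl daemon-reload
systemctl restart docker
```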

Alternatively, if you are using the flannel or weave network plugin, upgrading to the latest version may also solve the problem.

It is also possible that kube-dns itself is not running normally.

If kube-dns pods are in CrashLoopBackOff state, refer to Troubleshooting kube-dns/dashboard CrashLoopBackOff for troubleshooting kube-dns problem.

Otherwise, check whether the kube-dns service and its endpoints are normal:
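For example:

```shell
kubectl -n kube-system get service kube-dns
kubectl -n kube-system get endpoints kube-dns
```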

If the kube-dns service doesn't exist, or its endpoints are empty, recreate the kube-dns service, e.g.
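A minimal kube-dns Service manifest looks like the following (a sketch; the clusterIP must match the --cluster-dns address configured for kubelet, and 10.96.0.10 is only a common default):

```shell
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.96.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
EOF
```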

Service not accessible within Pods

The first step is to check whether endpoints have been created automatically for the service:
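For example (the endpoints list should contain the Pod IPs backing the service):

```shell
kubectl get endpoints <service-name>
```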

If you get an empty result, your service's label selector may be wrong. Confirm it as follows:
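Compare the Service's selector with the labels of the Pods it is supposed to match (service name and labels are placeholders):

```shell
kubectl get service <service-name> -o jsonpath='{.spec.selector}'
kubectl get pods -l <label-key>=<label-value> -o wide
```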

If all of the above steps are still OK, confirm further by

  • checking whether the Pod's containerPort matches the Service's targetPort

  • checking whether podIP:containerPort is working
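The second check can be done directly from a node (or from another Pod), e.g.:

```shell
# Should return a response from the container, bypassing the Service entirely
curl http://<pod-ip>:<container-port>/
```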

Beyond these, other causes can also lead to service problems, including:

  • the container is not listening on the specified containerPort (check the Pod description again)

  • CNI plugin error or network route error

  • kube-proxy is not running or iptables rules are not configured correctly

Normally, the following iptables rules should be created for a service named hostnames:
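They can be listed on any node, e.g. (the chain names contain a hash, so grep by the service name):

```shell
iptables-save | grep hostnames
```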

There should be 1 rule in KUBE-SERVICES, 1 or 2 rules per endpoint in KUBE-SVC-(hash) (depending on SessionAffinity), one KUBE-SEP-(hash) chain per endpoint, and a few rules in each KUBE-SEP-(hash) chain. The exact rules will vary based on your exact config (including node-ports and load-balancers).

Pod cannot reach itself via Service IP

This can happen when the network is not properly configured for “hairpin” traffic, usually when kube-proxy is running in iptables mode and Pods are connected with bridge network.

Kubelet exposes a --hairpin-mode option, which should be configured as promiscuous-bridge or hairpin-veth instead of none (default is promiscuous-bridge).

Confirm hairpin-veth is working by:
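Assuming the bridge is named cbr0:

```shell
# Every veth attached to the bridge should have hairpin mode enabled (prints 1)
for intf in /sys/devices/virtual/net/cbr0/brif/*; do
  cat $intf/brport/hairpin_mode
done
```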

Confirm promiscuous-bridge is working by:
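Again assuming the bridge is named cbr0:

```shell
# The bridge should be in promiscuous mode (the PROMISC flag is set)
ip link show cbr0 | grep -o PROMISC
```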

Can't access Kubernetes API

Many addons and containers need to access the Kubernetes API for various data (e.g. kube-dns and operator containers). If such errors occur, first confirm whether the Kubernetes API is accessible from within Pods:
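A quick check from any running Pod (assumes curl is available in the container image):

```shell
kubectl exec -ti <pod-name> -- curl -k https://kubernetes.default/version
```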

If a timeout error is reported, confirm whether the kubernetes service and its endpoints are normal:
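For example:

```shell
kubectl get service kubernetes
kubectl get endpoints kubernetes
```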

If both are still OK, then kube-apiserver is probably not started, or it is blocked by a firewall. Check kube-apiserver's status and logs:
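How to check depends on how the control plane is deployed; two common cases (unit and Pod names are illustrative):

```shell
# systemd-managed kube-apiserver
systemctl status kube-apiserver
journalctl -u kube-apiserver | tail -n 50

# static-Pod deployment (e.g. kubeadm)
kubectl -n kube-system logs kube-apiserver-<master-node-name>
```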

But if a 403 - Forbidden error is reported, then kube-apiserver is probably configured with RBAC and your container's serviceAccount is not authorized to access the resources. In that case, you should create the proper RoleBindings and ClusterRoleBindings, e.g. to make the CoreDNS container run, you need:
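A sketch of the permissions CoreDNS typically needs, mirroring the upstream CoreDNS deployment manifests (the ClusterRole and ServiceAccount names may differ in your cluster):

```shell
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:coredns
rules:
- apiGroups: [""]
  resources: ["endpoints", "services", "pods", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:coredns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:coredns
subjects:
- kind: ServiceAccount
  name: coredns
  namespace: kube-system
EOF
```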
