Handling a communication failure between Kubernetes pods
2022-06-24 16:20:00 【yaxin】
After creating a Service in a k8s cluster, the node hosting the pod could access the service, but the other node could not. I found the debugging process instructive, so this post records how the problem was tracked down and solved.
0x01 Fault and cluster information
Let's start with the k8s cluster itself: one master (node A) and one worker (node B). Their IP information is as follows:
| Machine | Role | Intranet IP | Public IP |
|---|---|---|---|
| node A | master node | 10.200.200.7 | 129.226.176.163 |
| node B | worker node | 172.19.0.13 | 43.129.71.69 |
Node A and node B are not on the same intranet; they exchange data directly over their public IPs.
The main symptom: after deploying an nginx service and creating the corresponding Service, the master node cannot reach the service through its ClusterIP, while the worker node can reach it normally.
0x02 The Kubernetes network model
Understanding the k8s network model is essential for analyzing and fixing this fault, so let's cover it up front. The network plugin used here is flannel; other plugins work in broadly the same way.
k8s networking is built on two kinds of IP: the ClusterIP (the cluster-internal IP) and the PodIP (the pod's IP). Let's look at them one at a time.
2.1 ClusterIP
Start with the ClusterIP. A ClusterIP is a virtual IP. What does virtual mean here? The IP is not bound to any network interface (virtual interfaces included) and has no routing rules of its own. You can inspect the IP and routing information of every node and every pod and find nothing related to the ClusterIP. The ClusterIP works entirely through the DNAT capability provided by iptables: when a packet passes through the iptables OUTPUT chain, its destination address and port are rewritten, according to the rules k8s defines, to the address and port of a specific PodIP, and the packet is then forwarded to that pod. This is also why the outside world cannot reach a service through its ClusterIP: the request cannot be routed into the cluster, and k8s does not DNAT external requests.
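The claim that a ClusterIP is bound to no interface can be checked on any node. The following is a minimal sketch, assuming the iproute2 `ip` tool is available; the helper name `ip_is_bound` is illustrative and not part of the original write-up. It reads `ip -o -4 addr show` output on stdin and succeeds only if the given address is assigned to some interface.

```shell
# ip_is_bound ADDRESS   (hypothetical helper name)
# Reads `ip -o -4 addr show` output on stdin.
# Returns 0 if ADDRESS is assigned to an interface, 1 otherwise.
ip_is_bound() {
  awk -v want="$1" '
    $3 == "inet" {
      split($4, parts, "/")            # $4 looks like 10.200.200.7/24
      if (parts[1] == want) found = 1
    }
    END { exit found ? 0 : 1 }
  '
}
```

On a node it could be used as `ip -o -4 addr show | ip_is_bound 10.110.36.99 || echo "not bound to any interface"`, where 10.110.36.99 is the ClusterIP examined later in this post.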
2.2 PodIP
Now the PodIP. A PodIP is the IP that k8s assigns to a pod, and every pod gets a different one. All pods sit in one large network segment, so pods can reach one another, and cluster nodes can also reach each pod's services through its PodIP. To understand how this works, look at the following figure first:
In the figure above, cni0 and flannel.1 are virtual network devices created by the flannel network plugin. cni0 is a bridge device and flannel.1 is a vxlan device; cni0 connects the pods on the local node with flannel.1. The other end of flannel.1 is connected to the flanneld process, so all traffic entering flannel.1 is handed to flanneld. flanneld wraps the packet's layer 3 (IP layer) and above into a UDP packet, looks up the configuration to find the IP of the node the packet must go to, and sends it to the UDP port (8472) of that peer node. This technique is vxlan: devices that are not on the same network segment can form a single segment (an intranet) across the public network.
Figure 2.2: Device types
Let's walk through the network flow diagram. pod-1 on node A sends a request to pod-3 on node B. The packet first consults the routing table inside pod-1 and follows the default route to cni0, which forwards it to the flannel.1 device. flanneld then processes it and sends the data over UDP to port 8472 on node B. flanneld on node B receives the data, unpacks it, and hands it to node B's flannel.1 device. The request is then forwarded to node B's cni0 device, which looks up its ARP table to identify the destination pod-3, and the packet finally arrives at pod-3. If the request originates directly on node A instead, node A's routing table is consulted first and decides that the packet goes to flannel.1; the rest of the process is the same as above.
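On a live node, the pieces of this path can be inspected with iproute2. The following is a minimal sketch, assuming flannel's vxlan backend, under which the address that encapsulated packets are sent to for each remote node appears as the `dst` field in the forwarding database of flannel.1. The helper name `vtep_peers` is illustrative.

```shell
# Show the vxlan parameters (VNI, UDP port) of the flannel device:
#   ip -d link show flannel.1
# Show which pod subnets are routed into flannel.1:
#   ip route | grep flannel.1

# vtep_peers   (hypothetical helper name)
# Reads `bridge fdb show dev flannel.1` output on stdin and prints the
# distinct remote node IPs, i.e. the value that follows "dst".
vtep_peers() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dst") print $(i + 1) }' | sort -u
}
```

A typical invocation would be `bridge fdb show dev flannel.1 | vtep_peers`, which lists one address per peer node.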
With these two kinds of IP and their forwarding rules understood, let's look at the specific problem.
0x03 Fault analysis
On the master node, trace how iptables handles a request to the ClusterIP. The request first matches the OUTPUT chain of the nat table, which forwards it to the KUBE-SERVICES chain.
In the KUBE-SERVICES chain, based on the destination IP (10.110.36.99) and port (80), the request is forwarded to the KUBE-SVC-HL5LMXD5JFHQZ6LN chain.
The KUBE-SVC-HL5LMXD5JFHQZ6LN chain passes the request on to the KUBE-SEP-SADCJIHRQW7RJ62U chain, and the details of KUBE-SEP-SADCJIHRQW7RJ62U show that the request is DNATed to 10.244.1.2:80.
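The chain-by-chain lookup above can be approximated in one pass over the saved rules. The following is a minimal sketch, assuming rules in `iptables-save` format in which the jump target is the last token of each rule; the helper name `trace_clusterip` is illustrative, and only the first endpoint of a service is followed.

```shell
# trace_clusterip CLUSTER_IP   (hypothetical helper name)
# Reads `iptables-save -t nat` output on stdin and follows
# KUBE-SERVICES -> KUBE-SVC-* -> KUBE-SEP-* to print the DNAT target.
trace_clusterip() {
  rules=$(cat)
  # Rule in KUBE-SERVICES that matches the ClusterIP and jumps to a KUBE-SVC chain
  svc=$(printf '%s\n' "$rules" | awk -v dst="$1/32" '
    $2 == "KUBE-SERVICES" && $NF ~ /^KUBE-SVC-/ {
      for (i = 3; i < NF; i++)
        if ($i == "-d" && $(i + 1) == dst) { print $NF; exit }
    }')
  # First endpoint chain referenced by that service chain
  sep=$(printf '%s\n' "$rules" | awk -v chain="$svc" '
    $2 == chain && $NF ~ /^KUBE-SEP-/ { print $NF; exit }')
  # DNAT destination inside the endpoint chain
  printf '%s\n' "$rules" | awk -v chain="$sep" '
    $2 == chain {
      for (i = 3; i < NF; i++)
        if ($i == "--to-destination") { print $(i + 1); exit }
    }'
}
```

Run as root on the master, `iptables-save -t nat | trace_clusterip 10.110.36.99` would print the pod address and port that the ClusterIP is rewritten to. The individual chains can also be listed by hand with `iptables -t nat -nL KUBE-SERVICES` and so on.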
From the analysis above, the request ends up at port 80 of 10.244.1.2. So whose IP is 10.244.1.2?
It is the IP of the container behind the service. Use kubectl get pods to list the pods created by the deployment, then kubectl describe pods nginx-6fd9b8bcc7-ln2ws to view the pod's details, which include its IP.
The problem now reduces to the master node accessing the service at 10.244.1.2 directly, and that does not work either.
From the previous section, we know that the packet first reaches the flannel.1 device, then passes through the flanneld process and goes out the eth0 network interface. Capture packets on flannel.1 directly with a packet capture tool; the result is as follows:
The capture shows that the packet did reach flannel.1, but only the outgoing SYN packets appear, with no replies. The packet must be getting lost at some later stage. Set that aside and move on to the next hop that handles the packet. flanneld is a process, so digging into it is a last resort and we leave it alone for now. The next device after flanneld is eth0, where packets can be captured, so capture the UDP packets on port 8472 directly on eth0:
Here is the clue: the packets are sent to node B, but the destination IP used is node B's intranet IP. Nodes A and B are not on the same intranet and cannot reach each other directly at those addresses, so the packets are lost and the connection fails.
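The two captures described above might look like the following. This is a minimal sketch, assuming tcpdump is installed and run as root; the helper name `vxlan_outer_dst` is illustrative. It reads `tcpdump -nn` text output and prints the destination IP of each outer UDP packet addressed to port 8472.

```shell
# Capture the inner traffic to the pod on the vxlan device:
#   tcpdump -i flannel.1 -nn host 10.244.1.2
# Capture the encapsulated traffic leaving the physical interface:
#   tcpdump -i eth0 -nn udp port 8472

# vxlan_outer_dst   (hypothetical helper name)
# Reads tcpdump -nn output on stdin and prints the distinct destination
# IPs of packets sent to UDP port 8472.
vxlan_outer_dst() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == ">" && $(i + 1) ~ /\.8472:$/) {
        dst = $(i + 1)
        sub(/\.8472:$/, "", dst)       # strip the port and trailing colon
        print dst
      }
  }' | sort -u
}
```

For example, `tcpdump -i eth0 -nn -c 20 udp port 8472 | vxlan_outer_dst` stops after 20 packets and lists where the encapsulated traffic is actually being sent.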
With the cause located, the next step is to work out what produced it. From the analysis above, the packets with the wrong destination address are sent by flanneld on node A, so the investigation shifts to flanneld. Looking at flannel's deployment information and the configuration files of the flannel containers on the master and the worker turned up nothing. The flannel documentation (https://github.com/flannel-io/flannel/blob/master/Documentation/kubernetes.md#annotations) describes the flannel.alpha.coreos.com/public-ip-overwrite annotation, which overrides the node's public IP. We can view the node details with kubectl describe nodes worker01.
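Picking a single annotation out of the node details can be scripted. The following is a minimal sketch, assuming the `kubectl describe` output lists annotations as `key: value` pairs separated by whitespace; the helper name `node_annotation` is illustrative.

```shell
# node_annotation KEY   (hypothetical helper name)
# Reads `kubectl describe node <name>` output on stdin and prints the
# value of the annotation whose name is exactly KEY.
node_annotation() {
  awk -v key="$1:" '{
    for (i = 1; i < NF; i++)
      if ($i == key) { print $(i + 1); exit }
  }'
}
```

For this cluster the call would be `kubectl describe node worker01 | node_annotation flannel.alpha.coreos.com/public-ip`. Because the key is compared exactly, the public-ip-overwrite annotation is not matched by accident.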
The node details show that the current node's flannel.alpha.coreos.com/public-ip annotation is set to worker01's intranet IP, which is why the packets above carry the wrong destination address. Run the following commands to add the flannel.alpha.coreos.com/public-ip-overwrite annotation to the worker01 node, with worker01's public IP as the value.

    # Set annotation
    kubectl annotate node worker01 flannel.alpha.coreos.com/public-ip-overwrite=43.129.71.69 --overwrite
    # Get flannel pod on worker01
    kubectl -n kube-system get pods --field-selector spec.nodeName=worker01 | grep flannel
    # Restart flannel
    kubectl -n kube-system get pod kube-flannel-ds-xq7b9 -o yaml | kubectl replace --force -f -

The flannel pod on worker01 must be restarted for the change to take effect. After that, run curl 10.244.1.2 on the master again.
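Because the recreated flannel pod may take a moment to come up, the re-check can be wrapped in a small retry loop. The following is a minimal sketch; the function name `wait_http`, the one-second interval, and the default of 10 attempts are illustrative choices, not part of the original procedure.

```shell
# wait_http URL [TRIES]   (hypothetical helper name)
# Retries curl against URL once per second, up to TRIES times (default 10).
wait_http() {
  url=$1
  tries=${2:-10}
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    if curl -s -o /dev/null --max-time 3 "$url"; then
      echo "reachable after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  echo "still unreachable after $tries attempt(s)"
  return 1
}
```

On the master, `wait_http http://10.244.1.2` checks the PodIP, and `wait_http http://10.110.36.99` then checks the ClusterIP.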
0x04 Summary
This problem arose because the machines are not on the same intranet and their public IPs are not explicitly bound to a network interface on the machine (typical of virtual machines). We usually build k8s clusters within a single intranet, so this situation rarely comes up. Working through it gives a better understanding of how k8s network communication works.
As for how to build a cluster from machines that are not on the same intranet and whose public IPs are not bound to a network interface, I will cover that in a separate article.