一、问题背景

1.1 问题说明

之前由于网络插件flannel安装不成功,导致xxl-job执行任务的时候,提示拒绝服务,如下图所示:
在这里插入图片描述
但是安装flannel安装成功后,依然无法联通,还是提示相同问题

1.2 排查网络

通过ifconfig查看各个k8s节点的网络情况
1) master:cni0和flannel.1的网段是对应的,都是10.244.0.x

[root@zzyk8s01 scripts]# ifconfig
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.244.0.255
        inet6 fe80::1c6c:c1ff:fe59:846d  prefixlen 64  scopeid 0x20<link>
        ether 1e:6c:c1:59:84:6d  txqueuelen 1000  (Ethernet)
        RX packets 494717  bytes 39902995 (38.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 792397  bytes 57936901 (55.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::70a0:e8ff:fe34:99ba  prefixlen 64  scopeid 0x20<link>
        ether 72:a0:e8:34:99:ba  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 8 overruns 0  carrier 0  collisions 0

2)worker1:cni0和flannel.1的网段是没有对应上,正常cni0应该是10.244.1.1

[root@zzyk8s02 zzitadmin]#  ifconfig                                                                                                                               
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.244.0.255
        inet6 fe80::a432:5bff:fea6:bef7  prefixlen 64  scopeid 0x20<link>
        ether a6:32:5b:a6:be:f7  txqueuelen 1000  (Ethernet)
        RX packets 23183004  bytes 4647637027 (4.3 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 17425600  bytes 2903842258 (2.7 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.1.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::acf6:b2ff:fe89:9d1b  prefixlen 64  scopeid 0x20<link>
        ether ae:f6:b2:89:9d:1b  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 8 overruns 0  carrier 0  collisions 0

3)worker2:cni0和flannel.1的网段是也没有对应上,正常cni0应该是10.244.2.1

[root@zzyk8s03 zzitadmin]# ifconfig
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.244.0.255
        inet6 fe80::7878:40ff:fe29:d7de  prefixlen 64  scopeid 0x20<link>
        ether 7a:78:40:29:d7:de  txqueuelen 1000  (Ethernet)
        RX packets 493875  bytes 103157632 (98.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 369347  bytes 42352328 (40.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.2.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::bcaf:54ff:feca:bd0d  prefixlen 64  scopeid 0x20<link>
        ether be:af:54:ca:bd:0d  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 8 overruns 0  carrier 0  collisions 0

1.3 解决办法

在master节点和所有worker节点依次执行如下命令,重新安装网络插件flannel

ifconfig cni0 down
ip link delete cni0
ifconfig flannel.1 down
ip link delete flannel.1
rm -rf /var/lib/cni/flannel/*
rm -rf /var/lib/cni/networks/cbr0/*
rm -rf /var/lib/cni/cache/*
rm -f /etc/cni/net.d/*
systemctl restart kubelet
systemctl restart docker
chmod a+w /var/run/docker.sock

执行完之后,再次查看ifconfig ,发现cni0和flannel.1网络先是没有了,过了一会儿自动生成了新的。
如下图:worker1上的cni0的网段和flannel.1的网段保持了一致。worker3情况类似。
在这里插入图片描述
再次查看网络插件的运行情况:均是Running
在这里插入图片描述

二、回归测试

最后,通过执行一次xxljob进行测试,发现执行成功,且后台日志正常。
在这里插入图片描述

END

Logo

旨在为数千万中国开发者提供一个无缝且高效的云端环境,以支持学习、使用和贡献开源项目。

更多推荐