20. Kubernetes Fundamentals (40): kubeadm Kubernetes 1.24 Deployment Guide
A Complete Guide to Deploying a Production Kubernetes 1.24 Cluster with kubeadm
Applicable scenarios: production deployment, enterprise-grade clusters
Author: Cloud-Native Architect | Updated: March 2026
Abstract
This article walks through the complete process of deploying a production Kubernetes 1.24 cluster with kubeadm, covering cluster architecture design, host preparation, container runtime configuration, kubeadm initialization, node joining, CNI network plugin deployment, cluster verification, and troubleshooting. By the end, readers will have the core techniques and best practices for deploying an enterprise-grade Kubernetes cluster.
Keywords: kubeadm; Kubernetes 1.24; production deployment; containerd; cluster initialization; CNI networking
1. Cluster Architecture Design and Planning
1.1 Production Cluster Topology
┌─────────────────────────────────────────────────────────┐
│ Load Balancer Layer │
│ HAProxy + Keepalived (VIP) │
│ 192.168.1.100:6443 │
└────────────────────┬────────────────────────────────────┘
│
─────────────┼─────────────
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Master-01 │ │ Master-02 │ │ Master-03 │
│ 192.168.1.20 │ │ 192.168.1.21 │ │ 192.168.1.22 │
│ API Server │ │ API Server │ │ API Server │
│ etcd │ │ etcd │ │ etcd │
│ Scheduler │ │ Scheduler │ │ Scheduler │
│ Controller │ │ Controller │ │ Controller │
│ 8C16G │ │ 8C16G │ │ 8C16G │
└──────────────┘ └──────────────┘ └──────────────┘
│
─────────────┼─────────────
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Worker-01 │ │ Worker-02 │ │ Worker-03 │
│ 192.168.1.30 │ │ 192.168.1.31 │ │ 192.168.1.32 │
│ Kubelet │ │ Kubelet │ │ Kubelet │
│ Containerd │ │ Containerd │ │ Containerd │
│ 16C32G │ │ 16C32G │ │ 16C32G │
└──────────────┘ └──────────────┘ └──────────────┘
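The topology above fronts the three API servers with HAProxy + Keepalived on the VIP 192.168.1.100, but no load-balancer configuration appears later in the guide. The sketch below generates a minimal haproxy.cfg for that topology; the timeouts and health-check settings are illustrative assumptions, not values from the article, and Keepalived still has to float the VIP between the two LB nodes.

```shell
#!/bin/sh
# gen-haproxy-cfg.sh -- minimal HAProxy config for the 3-master topology above.
# Writes haproxy.cfg into the current directory; review it, then move it to
# /etc/haproxy/haproxy.cfg on both LB nodes. TCP mode passes TLS through
# untouched, so the API server certificates remain valid for clients.
cat > haproxy.cfg <<'EOF'
global
    log /dev/log local0
    maxconn 4096

defaults
    mode    tcp
    option  tcplog
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend k8s-apiserver
    bind *:6443
    default_backend k8s-masters

backend k8s-masters
    balance roundrobin
    option tcp-check
    server master-01 192.168.1.20:6443 check fall 3 rise 2
    server master-02 192.168.1.21:6443 check fall 3 rise 2
    server master-03 192.168.1.22:6443 check fall 3 rise 2
EOF
echo "wrote haproxy.cfg"
```

TCP passthrough (rather than TLS termination at the proxy) is the usual choice here, because kubeadm issues the API server certificates for the VIP directly.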
1.2 Cluster Sizing
1.2.1 Recommended Node Specifications
| Node type | Count | CPU | Memory | Storage | Role |
|---|---|---|---|---|---|
| Master | 3 | 8 cores | 16 GB | 100 GB SSD | Control plane + etcd |
| Worker | 3+ | 16 cores | 32 GB | 500 GB SSD | Application Pods |
| LB | 2 | 4 cores | 8 GB | 50 GB | HAProxy + Keepalived |
1.2.2 Network CIDR Plan
# Cluster networks
Pod CIDR: 10.244.0.0/16 # Pod IP range
Service CIDR: 10.96.0.0/12 # Service IP range
DNS Service IP: 10.96.0.10 # CoreDNS ClusterIP
# Physical network
Node Network: 192.168.1.0/24 # node-to-node traffic
VIP: 192.168.1.100 # load-balancer virtual IP
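The three ranges above must never overlap, or Pod and Service routing breaks in hard-to-debug ways. A quick pure-shell sanity check of the plan (the helper functions are an illustrative sketch, not part of the original guide):

```shell
#!/bin/sh
# cidr-check.sh -- sanity-check that the planned CIDRs do not overlap.

ip2int() {  # dotted quad -> 32-bit integer
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    echo $(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
}

overlaps() {  # overlaps A/len B/len -> success if the two ranges intersect
    base1=$(ip2int "${1%/*}"); size1=$(( 1 << (32 - ${1#*/}) ))
    base2=$(ip2int "${2%/*}"); size2=$(( 1 << (32 - ${2#*/}) ))
    [ $(( base1 < base2 + size2 && base2 < base1 + size1 )) -eq 1 ]
}

status=0
set -- "10.244.0.0/16" "10.96.0.0/12" "192.168.1.0/24"
for a in "$@"; do
    for b in "$@"; do
        if [ "$a" != "$b" ] && overlaps "$a" "$b"; then
            echo "OVERLAP: $a and $b"
            status=1
        fi
    done
done
if [ "$status" -eq 0 ]; then echo "OK: no CIDR overlap"; fi
```

For the plan above it prints that no ranges overlap; a common mistake it would catch is choosing a Pod CIDR inside 10.96.0.0/12.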
2. Host Preparation and System Tuning
2.1 System Check Script
#!/bin/bash
# system-check.sh
echo "=== Kubernetes system check ==="
echo
# 1. Operating system
echo "✓ OS:"
grep PRETTY_NAME /etc/os-release
echo
# 2. Kernel version
echo "✓ Kernel:"
uname -r
echo
# 3. CPU
echo "✓ CPU cores:"
nproc
echo
# 4. Memory
echo "✓ Memory:"
free -h
echo
# 5. Disk
echo "✓ Disk space:"
df -h /
echo
# 6. Swap
echo "✓ Swap status:"
if swapon --show | grep -q .; then
echo "✗ Swap is still enabled!"
exit 1
else
echo "✓ Swap is disabled"
fi
echo
# 7. Firewall
echo "✓ Firewall status:"
systemctl status firewalld 2>&1 | grep -E "Active|Loaded" || echo "firewalld not installed"
systemctl status ufw 2>&1 | grep -E "Active|Loaded" || echo "ufw not installed"
echo
# 8. SELinux
echo "✓ SELinux status:"
getenforce 2>/dev/null || echo "SELinux not installed"
echo
# 9. Time synchronization
echo "✓ Time sync:"
chronyc sources | head -n 3
echo
# 10. Network connectivity (to master-01)
echo "✓ Network test:"
ping -c 2 192.168.1.20 | grep -E "rtt|packets"
echo
echo "=== Check complete ==="
2.2 Kernel Parameter Tuning
#!/bin/bash
# kernel-tuning.sh
# Load the modules required by the bridge/conntrack settings below,
# otherwise sysctl --system fails on those keys
modprobe br_netfilter
modprobe nf_conntrack
# Write the sysctl profile
cat > /etc/sysctl.d/99-kubernetes.conf <<EOF
# Networking (note: "net.ipv4.tcp_forwarding" is not a real sysctl key
# and has been dropped; ip_forward covers IPv4 forwarding)
net.ipv4.ip_forward = 1
net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.default.forwarding = 1
# TCP performance
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_syncookies = 1
# Connection tracking
net.netfilter.nf_conntrack_max = 1000000
# File descriptors and inotify
fs.file-max = 2097152
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 8192
# Memory management
vm.max_map_count = 262144
vm.swappiness = 1
vm.overcommit_memory = 1
vm.panic_on_oom = 0
# Bridged traffic must traverse iptables
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
# Apply
sysctl --system
echo "✓ Kernel parameters applied"
2.3 Resource Limit Tuning
#!/bin/bash
# limits-config.sh
# PAM limits (apply to new login sessions)
cat >> /etc/security/limits.conf <<EOF
# Kubernetes tuning
* soft nofile 655360
* hard nofile 655360
* soft nproc 655360
* hard nproc 655360
* soft memlock unlimited
* hard memlock unlimited
EOF
# systemd defaults (apply to services)
mkdir -p /etc/systemd/system.conf.d
cat > /etc/systemd/system.conf.d/kubernetes.conf <<EOF
[Manager]
DefaultLimitNOFILE=655360
DefaultLimitNPROC=655360
DefaultLimitMEMLOCK=infinity
EOF
# Re-execute systemd so it picks up the new defaults
systemctl daemon-reexec
echo "✓ Resource limits configured"
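The PAM limits above only apply to sessions started after the change, so a quick way to confirm them is to open a fresh login shell and compare its actual limits with the configured values (a trivial illustrative check, not from the original guide):

```shell
#!/bin/sh
# limits-verify.sh -- print the limits the current session actually received.
# Run in a fresh login shell; nofile should match limits-config.sh above.
nofile=$(ulimit -n)
echo "open files (nofile): $nofile"
```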
3. Container Runtime: Deploying containerd
3.1 Installing containerd
#!/bin/bash
# install-containerd.sh
# 1. Install prerequisites
apt-get update
apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
# 2. Add the Docker GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
    gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# 3. Add the repository
echo \
    "deb [arch=$(dpkg --print-architecture) \
    signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] \
    https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) stable" | \
    tee /etc/apt/sources.list.d/docker.list > /dev/null
# 4. Install containerd
apt-get update
apt-get install -y containerd.io
# 5. Generate the default config for reference
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
# 6. Overwrite with a trimmed config: systemd cgroups plus certs.d-based
# registry config. (Inline "registry.mirrors" entries are deprecated and
# are ignored when config_path is set, so they are not repeated here;
# mirrors are configured via hosts.toml in section 4.2.)
cat > /etc/containerd/config.toml <<EOF
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9"
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
        runtime_type = "io.containerd.runc.v2"
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
          SystemdCgroup = true
EOF
# 7. Start the service. The containerd.io package ships its own systemd unit
# (ExecStart=/usr/bin/containerd), so do NOT create a custom unit pointing at
# /usr/local/bin/containerd -- that path does not exist with the apt install.
systemctl daemon-reload
systemctl enable --now containerd
# 8. Verify
ctr version
systemctl status containerd --no-pager
echo "✓ containerd installed"
3.2 Installing runc
#!/bin/bash
# install-runc.sh
# Note: containerd.io already bundles runc; install a standalone binary only
# if you need to pin a specific version.
curl -LO https://github.com/opencontainers/runc/releases/download/v1.1.9/runc.amd64
# Install
install -o root -g root -m 755 runc.amd64 /usr/local/sbin/runc
# Verify
runc --version
echo "✓ runc installed"
3.3 Installing the CNI Plugins
#!/bin/bash
# install-cni.sh
# Download the reference CNI plugins
curl -LO https://github.com/containernetworking/plugins/releases/download/v1.4.0/cni-plugins-linux-amd64-v1.4.0.tgz
# Create the target directory
mkdir -p /opt/cni/bin
# Extract
tar -xzf cni-plugins-linux-amd64-v1.4.0.tgz -C /opt/cni/bin
# Verify
ls -la /opt/cni/bin/
echo "✓ CNI plugins installed"
4. Installing and Configuring kubeadm
4.1 Installing kubeadm
#!/bin/bash
# install-kubeadm.sh
# 1. Disable swap
swapoff -a
sed -i '/\sswap\s/ s/^/#/' /etc/fstab
# 2. Load kernel modules
cat > /etc/modules-load.d/k8s.conf <<EOF
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
# 3. Add the Kubernetes apt repository
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.24/deb/Release.key | \
    gpg --dearmor -o /usr/share/keyrings/kubernetes-apt-keyring.gpg
echo \
    'deb [signed-by=/usr/share/keyrings/kubernetes-apt-keyring.gpg] \
    https://pkgs.k8s.io/core:/stable:/v1.24/deb/ /' | \
    tee /etc/apt/sources.list.d/kubernetes.list
# 4. Install kubelet, kubeadm, kubectl.
# Note: pkgs.k8s.io does not use the legacy "-00" revision suffix of the old
# apt.kubernetes.io repo; resolve the exact package version first.
apt-get update
VERSION=$(apt-cache madison kubelet | awk '/1\.24\.0/ {print $3; exit}')
apt-get install -y kubelet="$VERSION" kubeadm="$VERSION" kubectl="$VERSION"
# 5. Pin the versions
apt-mark hold kubelet kubeadm kubectl
# 6. kubelet extra flags (the cgroup driver is set via the KubeletConfiguration
# used by kubeadm in section 5.1, so it is not duplicated here)
cat > /etc/default/kubelet <<EOF
KUBELET_EXTRA_ARGS=--container-runtime-endpoint=unix:///run/containerd/containerd.sock \
--pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9
EOF
# 7. Enable kubelet (it will crash-loop until kubeadm init/join runs -- expected)
systemctl daemon-reload
systemctl enable --now kubelet
# 8. Verify
kubeadm version
kubelet --version
kubectl version --client
echo "✓ kubeadm installed"
4.2 Configuring Registry Mirrors
#!/bin/bash
# image-pull-config.sh
# Per-registry mirror config in containerd's hosts.toml format.
# (The original "override_path = \"/v2\"" entries were invalid: override_path
# is a boolean in hosts.toml, and the defaults are fine here.)
mkdir -p /etc/containerd/certs.d/docker.io
cat > /etc/containerd/certs.d/docker.io/hosts.toml <<EOF
server = "https://registry-1.docker.io"
[host."https://registry.docker-cn.com"]
capabilities = ["pull", "resolve"]
EOF
# Restart containerd
systemctl restart containerd
# Pre-pull the control-plane images
kubeadm config images pull \
    --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
    --kubernetes-version v1.24.0
echo "✓ Registry mirrors configured"
5. Cluster Initialization
5.1 Generating the Configuration File
# kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# Kubernetes version
kubernetesVersion: v1.24.0
# Stable control-plane endpoint: the HAProxy/Keepalived VIP.
# Required for HA -- the join commands below target this address.
controlPlaneEndpoint: "192.168.1.100:6443"
# Cluster networking
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
  dnsDomain: cluster.local
# API server
apiServer:
  certSANs:
  - "192.168.1.100" # VIP
  - "192.168.1.20"
  - "192.168.1.21"
  - "192.168.1.22"
  - "kubernetes.default"
  - "kubernetes.default.svc"
  - "kubernetes.default.svc.cluster.local"
  extraArgs:
    authorization-mode: "Node,RBAC"
    audit-log-path: /var/log/kubernetes/audit.log
    audit-policy-file: /etc/kubernetes/audit-policy.yaml
  extraVolumes:
  - name: audit-config
    hostPath: /etc/kubernetes/audit-policy.yaml
    mountPath: /etc/kubernetes/audit-policy.yaml
    readOnly: true
  - name: audit-log
    hostPath: /var/log/kubernetes
    mountPath: /var/log/kubernetes
    readOnly: false
# Controller manager
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"
    node-cidr-mask-size: "24"
    terminated-pod-gc-threshold: "1000"
# Scheduler
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
# etcd
etcd:
  local:
    dataDir: /var/lib/etcd
    extraArgs:
      auto-compaction-retention: "8"
      quota-backend-bytes: "8589934592" # 8 GiB
      heartbeat-interval: "250"
      election-timeout: "2500"
# Image repository for control-plane components
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
# This node's API endpoint
localAPIEndpoint:
  advertiseAddress: 192.168.1.20
  bindPort: 6443
# Node registration
nodeRegistration:
  name: master-01
  criSocket: unix:///run/containerd/containerd.sock
  taints:
  # Since 1.24 kubeadm uses the control-plane taint, not .../master
  - key: "node-role.kubernetes.io/control-plane"
    effect: "NoSchedule"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Cgroup driver (must match containerd's SystemdCgroup = true)
cgroupDriver: systemd
# Kubelet serving-certificate rotation (the CSRs must be approved manually or
# by an approver controller). Note: this replaces the original
# RotateKubeletServerCertificate entry, which is not a valid kubeadm
# ClusterConfiguration feature gate.
serverTLSBootstrap: true
# Eviction thresholds
evictionHard:
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"
evictionSoft:
  nodefs.available: "15%"
evictionSoftGracePeriod:
  nodefs.available: "1m"
# Performance
maxPods: 110
podPidsLimit: 4096
serializeImagePulls: false
# Authentication / authorization
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
# Logging
logging:
  verbosity: 2
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# Proxy mode
mode: ipvs
# IPVS settings
ipvs:
  strictARP: true
  scheduler: "rr"
  tcpTimeout: "0s"
  tcpFinTimeout: "0s"
  udpTimeout: "0s"
5.2 Initializing the Control Plane
#!/bin/bash
# init-cluster.sh
# 1. Create the audit policy
cat > /etc/kubernetes/audit-policy.yaml <<EOF
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
- level: Request
  verbs: ["create", "update", "patch", "delete"]
- level: Metadata
EOF
# 2. Create the audit log directory
mkdir -p /var/log/kubernetes
# 3. Initialize the cluster
kubeadm init --config kubeadm-config.yaml --upload-certs
# 4. Configure kubectl
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
# 5. Verify
kubectl cluster-info
kubectl get nodes
# 6. Save the worker join command (control-plane joins additionally need the
# certificate key printed by "kubeadm init phase upload-certs --upload-certs")
kubeadm token create --print-join-command > /tmp/join-command.sh
echo "✓ Cluster initialized"
5.3 Sample Initialization Output
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join 192.168.1.100:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:abc123... \
--control-plane --certificate-key def456...
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.1.100:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:abc123...
6. Joining Nodes to the Cluster
6.1 Joining Control-Plane Nodes
#!/bin/bash
# join-control-plane.sh
# Note: when init was run with --upload-certs, joining with --certificate-key
# downloads the certificates automatically and the manual copy below is
# unnecessary. It is shown for the case where the uploaded-certs secret
# (valid for two hours) has expired.
# Copy certificates from master-01 to the other masters; note that the etcd
# CA must land in pki/etcd/, not in pki/
for host in master-02 master-03; do
  ssh "$host" "mkdir -p /etc/kubernetes/pki/etcd"
  scp /etc/kubernetes/pki/ca.{crt,key} "$host":/etc/kubernetes/pki/
  scp /etc/kubernetes/pki/sa.{key,pub} "$host":/etc/kubernetes/pki/
  scp /etc/kubernetes/pki/front-proxy-ca.{crt,key} "$host":/etc/kubernetes/pki/
  scp /etc/kubernetes/pki/etcd/ca.{crt,key} "$host":/etc/kubernetes/pki/etcd/
done
# Run on master-02 and master-03
kubeadm join 192.168.1.100:6443 \
--token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:abc123... \
--control-plane \
--certificate-key def456...
echo "✓ Control-plane node joined"
6.2 Joining Worker Nodes
#!/bin/bash
# join-worker.sh
# Run on worker-01, worker-02, worker-03
kubeadm join 192.168.1.100:6443 \
--token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:abc123...
echo "✓ Worker node joined"
6.3 Verifying Node Status
# Run on master-01
kubectl get nodes -o wide
# Sample output:
# NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
# master-01 Ready control-plane 10m v1.24.0 192.168.1.20 <none> Ubuntu 22.04 LTS 5.15.0-76-generic containerd://1.7.2
# master-02 Ready control-plane 5m v1.24.0 192.168.1.21 <none> Ubuntu 22.04 LTS 5.15.0-76-generic containerd://1.7.2
# master-03 Ready control-plane 5m v1.24.0 192.168.1.22 <none> Ubuntu 22.04 LTS 5.15.0-76-generic containerd://1.7.2
# worker-01 Ready <none> 3m v1.24.0 192.168.1.30 <none> Ubuntu 22.04 LTS 5.15.0-76-generic containerd://1.7.2
# worker-02 Ready <none> 3m v1.24.0 192.168.1.31 <none> Ubuntu 22.04 LTS 5.15.0-76-generic containerd://1.7.2
# worker-03 Ready <none> 3m v1.24.0 192.168.1.32 <none> Ubuntu 22.04 LTS 5.15.0-76-generic containerd://1.7.2
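In automation it is easier to assert on this table than to eyeball it. The helper below is an illustrative sketch (not from the original guide); it is demonstrated against a captured copy of the sample output, and in a live cluster you would feed it `kubectl get nodes --no-headers` instead:

```shell
#!/bin/sh
# nodes-ready.sh -- count Ready vs. total nodes from "kubectl get nodes" lines.
check_ready() {
    awk '
        $2 == "Ready" { ready++ }
        { total++ }
        END {
            printf "%d/%d nodes Ready\n", ready, total
            exit (ready == total) ? 0 : 1
        }'
}

# Demonstrated on a captured sample; live usage:
#   kubectl get nodes --no-headers | check_ready
check_ready <<'EOF'
master-01   Ready   control-plane   10m   v1.24.0
master-02   Ready   control-plane   5m    v1.24.0
master-03   Ready   control-plane   5m    v1.24.0
worker-01   Ready   <none>          3m    v1.24.0
worker-02   Ready   <none>          3m    v1.24.0
worker-03   Ready   <none>          3m    v1.24.0
EOF
```

The non-zero exit status when any node is NotReady makes this usable as a gate in CI or provisioning pipelines.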
7. CNI Network Plugin Deployment
7.1 Deploying Calico
# calico.yaml (abridged -- RBAC objects, ServiceAccounts and most CRDs are
# omitted here for space; apply the full upstream v3.25.0 manifest in practice)
---
# Source: calico/templates/calico-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: calico-config
  namespace: kube-system
data:
  calico_backend: "bird"
  veth_mtu: "0"
  cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "log_file_path": "/var/log/calico/cni.log",
          "datastore_type": "kubernetes",
          "nodename": "__KUBERNETES_NODE_NAME__",
          "mtu": __CNI_MTU__,
          "ipam": {
            "type": "calico-ipam"
          },
          "policy": {
            "type": "k8s"
          },
          "kubernetes": {
            "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        },
        {
          "type": "bandwidth",
          "capabilities": {"bandwidth": true}
        }
      ]
    }
---
# Source: calico/templates/kdd-crds.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: bgpconfigurations.crd.projectcalico.org
spec:
  group: crd.projectcalico.org
  names:
    kind: BGPConfiguration
    listKind: BGPConfigurationList
    plural: bgpconfigurations
    singular: bgpconfiguration
  preserveUnknownFields: false
  scope: Cluster
  versions:
  - name: v1
    served: true
    storage: true
---
# Source: calico/templates/calico-node.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: calico-node
  namespace: kube-system
  labels:
    k8s-app: calico-node
spec:
  selector:
    matchLabels:
      k8s-app: calico-node
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        k8s-app: calico-node
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      hostNetwork: true
      tolerations:
      - effect: NoSchedule
        operator: Exists
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoExecute
        operator: Exists
      serviceAccountName: calico-node
      terminationGracePeriodSeconds: 0
      priorityClassName: system-node-critical
      initContainers:
      - name: upgrade-cni
        image: docker.io/calico/cni:v3.25.0
        command: ["/opt/cni/bin/install"]
        envFrom:
        - configMapRef:
            name: calico-config
        env:
        - name: CNI_CONF_NAME
          value: "10-calico.conflist"
        - name: CNI_NETWORK_CONFIG
          valueFrom:
            configMapKeyRef:
              name: calico-config
              key: cni_network_config
        - name: KUBERNETES_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: CNI_MTU
          valueFrom:
            configMapKeyRef:
              name: calico-config
              key: veth_mtu
        volumeMounts:
        - mountPath: /host/opt/cni/bin
          name: cni-bin-dir
        securityContext:
          privileged: true
      containers:
      - name: calico-node
        image: docker.io/calico/node:v3.25.0
        envFrom:
        - configMapRef:
            name: calico-config
        env:
        - name: DATASTORE_TYPE
          value: "kubernetes"
        - name: WAIT_FOR_DATASTORE
          value: "true"
        - name: NODENAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: CALICO_NETWORKING_BACKEND
          valueFrom:
            configMapKeyRef:
              name: calico-config
              key: calico_backend
        - name: CLUSTER_TYPE
          value: "k8s,bgp"
        - name: IP
          value: "autodetect"
        - name: IP_AUTODETECTION_METHOD
          value: "first-found"
        - name: CALICO_IPV4POOL_IPIP
          value: "Always"
        - name: CALICO_IPV4POOL_CIDR
          value: "10.244.0.0/16"
        - name: CALICO_DISABLE_FILE_LOGGING
          value: "true"
        - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
          value: "ACCEPT"
        - name: FELIX_IPV6SUPPORT
          value: "false"
        - name: FELIX_HEALTHENABLED
          value: "true"
        securityContext:
          privileged: true
        resources:
          requests:
            cpu: 250m
        livenessProbe:
          exec:
            command:
            - /bin/calico-node
            - -felix-live
          periodSeconds: 10
          initialDelaySeconds: 10
          failureThreshold: 6
          timeoutSeconds: 10
        readinessProbe:
          exec:
            command:
            - /bin/calico-node
            - -felix-ready
          periodSeconds: 10
          timeoutSeconds: 10
        volumeMounts:
        - mountPath: /lib/modules
          name: lib-modules
          readOnly: true
        - mountPath: /run/xtables.lock
          name: xtables-lock
          readOnly: false
        - mountPath: /var/run/calico
          name: var-run-calico
          readOnly: false
        - mountPath: /var/lib/calico
          name: var-lib-calico
          readOnly: false
        - name: policysync
          mountPath: /var/run/nodeagent
        - name: cni-log-dir
          mountPath: /var/log/calico
          readOnly: true
      volumes:
      - name: lib-modules
        hostPath:
          path: /lib/modules
      - name: var-run-calico
        hostPath:
          path: /var/run/calico
      - name: var-lib-calico
        hostPath:
          path: /var/lib/calico
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
      - name: cni-bin-dir
        hostPath:
          path: /opt/cni/bin
      - name: cni-log-dir
        hostPath:
          path: /var/log/calico
      - name: policysync
        hostPath:
          path: /var/run/nodeagent
          type: DirectoryOrCreate
---
# Source: calico/templates/calico-kube-controllers.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: calico-kube-controllers
  namespace: kube-system
  labels:
    k8s-app: calico-kube-controllers
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: calico-kube-controllers
  strategy:
    type: Recreate
  template:
    metadata:
      name: calico-kube-controllers
      namespace: kube-system
      labels:
        k8s-app: calico-kube-controllers
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      serviceAccountName: calico-kube-controllers
      priorityClassName: system-cluster-critical
      containers:
      - name: calico-kube-controllers
        image: docker.io/calico/kube-controllers:v3.25.0
        env:
        - name: ENABLED_CONTROLLERS
          value: "node,pod,namespace,serviceaccount,workloadendpoint"
        - name: DATASTORE_TYPE
          value: "kubernetes"
        livenessProbe:
          exec:
            command:
            - /usr/bin/check-status
            - -l
          periodSeconds: 10
          initialDelaySeconds: 10
          failureThreshold: 6
          timeoutSeconds: 10
        readinessProbe:
          exec:
            command:
            - /usr/bin/check-status
            - -r
          periodSeconds: 10
Deployment commands:
# Apply the Calico manifest
kubectl apply -f calico.yaml
# Verify Pod status
kubectl get pods -n kube-system -l k8s-app=calico-node
kubectl get pods -n kube-system -l k8s-app=calico-kube-controllers
# Verify networking (10.244.1.2 is an example Pod IP; substitute a real one)
kubectl get nodes -o wide
kubectl run test --image=busybox --restart=Never --rm -it -- ping -c 3 10.244.1.2
8. Cluster Verification and Testing
8.1 Component Verification
#!/bin/bash
# cluster-verification.sh
echo "=== Cluster verification ==="
echo
# 1. Node status
echo "1. Nodes:"
kubectl get nodes -o wide
echo
# 2. System Pods
echo "2. System Pods:"
kubectl get pods -n kube-system -o wide
echo
# 3. Component status (a deprecated API, but still handy for a quick look)
echo "3. Components:"
kubectl get componentstatuses
echo
# 4. DNS test
echo "4. DNS:"
kubectl run dns-test --image=busybox:1.36 --restart=Never --rm -it -- \
    nslookup kubernetes.default
echo
# 5. Service network test (ClusterIPs generally do not answer ICMP, so probe
# the API server's TCP port instead of pinging)
echo "5. Service network:"
kubectl run net-test --image=busybox:1.36 --restart=Never --rm -it -- \
    sh -c 'nc -w 3 10.96.0.1 443 </dev/null && echo "10.96.0.1:443 reachable"'
echo
# 6. Deployment smoke test
echo "6. Deployment smoke test:"
kubectl create deployment nginx-test --image=nginx:1.25
kubectl expose deployment nginx-test --port=80 --type=ClusterIP
sleep 5
kubectl get pods -l app=nginx-test
kubectl get svc nginx-test
kubectl delete deployment nginx-test
kubectl delete svc nginx-test
echo
echo "=== Verification complete ==="
8.2 Performance Benchmarks
#!/bin/bash
# performance-benchmark.sh
echo "=== Performance benchmarks ==="
echo
# 1. Pod startup time
echo "1. Pod startup time:"
START=$(date +%s.%N)
kubectl run perf-test --image=nginx:1.25 --restart=Never
while [[ $(kubectl get pod perf-test -o jsonpath='{.status.phase}') != "Running" ]]; do
    sleep 0.5
done
END=$(date +%s.%N)
ELAPSED=$(echo "$END - $START" | bc)
echo " Pod startup: ${ELAPSED}s"
kubectl delete pod perf-test
echo
# 2. Service creation time
echo "2. Service creation time:"
START=$(date +%s.%N)
kubectl create service clusterip perf-svc --tcp=80:80
# Services have no "available" condition to wait on; poll for the ClusterIP
while [[ -z $(kubectl get svc perf-svc -o jsonpath='{.spec.clusterIP}' 2>/dev/null) ]]; do
    sleep 0.2
done
END=$(date +%s.%N)
ELAPSED=$(echo "$END - $START" | bc)
echo " Service creation: ${ELAPSED}s"
kubectl delete svc perf-svc
echo
# 3. Concurrent Pod creation
echo "3. Concurrent Pod creation (10 Pods):"
START=$(date +%s.%N)
for i in {1..10}; do
    kubectl run perf-test-$i --image=nginx:1.25 --restart=Never &
done
wait
for i in {1..10}; do
    kubectl wait --for=condition=ready pod/perf-test-$i --timeout=60s
done
END=$(date +%s.%N)
ELAPSED=$(echo "$END - $START" | bc)
echo " 10 Pods concurrent startup: ${ELAPSED}s"
for i in {1..10}; do
    kubectl delete pod perf-test-$i
done
echo
echo "=== Benchmarks complete ==="
9. Troubleshooting
9.1 Common Problems and Solutions
9.1.1 Pods Fail to Start
Symptom:
kubectl get pods
# Output: ContainerCreating or CrashLoopBackOff
Diagnosis steps:
# 1. Describe the Pod
kubectl describe pod <pod-name>
# 2. Container logs
kubectl logs <pod-name>
kubectl logs <pod-name> --previous
# 3. Check the CNI plugin
kubectl get pods -n kube-system -l k8s-app=calico-node
# 4. Check the container runtime
crictl ps
crictl images
# 5. kubelet logs
journalctl -u kubelet -f
9.1.2 Node NotReady
Symptom:
kubectl get nodes
# Output: NotReady
Resolution:
# 1. Check kubelet
systemctl status kubelet
journalctl -u kubelet -f
# 2. Check containerd
systemctl status containerd
crictl info
# 3. Check the CNI configuration
ls -la /etc/cni/net.d/
ls -la /opt/cni/bin/
# 4. Verify network connectivity
ping <master-node-ip>
# 5. As a last resort, rejoin the cluster
kubeadm reset
kubeadm join <endpoint> --token <token> --discovery-token-ca-cert-hash <hash>
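A common gotcha when rejoining: `kubeadm reset` does not remove CNI configuration, IPVS state, or iptables rules, and the leftovers can leave the rejoined node NotReady again. The sketch below collects the commonly recommended extra cleanup steps (my own compilation, not from the original article); because they are destructive it only prints the commands unless CONFIRM=yes:

```shell
#!/bin/sh
# node-cleanup.sh -- fuller cleanup before rejoining a node to the cluster.
# DESTRUCTIVE: executes only when CONFIRM=yes; otherwise prints what it would run.
run() {
    if [ "$CONFIRM" = "yes" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

run kubeadm reset -f
# State that "kubeadm reset" leaves behind:
run rm -rf /etc/cni/net.d              # CNI config
run rm -rf /var/lib/calico             # Calico state (if Calico was used)
run ip link delete kube-ipvs0          # IPVS dummy interface (ipvs mode)
run ipvsadm --clear                    # IPVS virtual-server table
run iptables -F                        # filter-table rules
run iptables -t nat -F                 # NAT-table rules
run systemctl restart containerd       # drop stale sandboxes
```

Run once without CONFIRM to review the plan, then `CONFIRM=yes sh node-cleanup.sh` before issuing the new `kubeadm join`.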
9.2 Diagnostic Tooling
#!/bin/bash
# diagnostic-tool.sh
echo "=== Kubernetes diagnostics ==="
echo
# 1. Cluster info
echo "1. Cluster info:"
kubectl cluster-info dump | grep -E "kubernetes|version"
echo
# 2. Recent events
echo "2. Recent events:"
kubectl get events --all-namespaces --sort-by='.lastTimestamp' | tail -20
echo
# 3. Resource usage (requires metrics-server)
echo "3. Resource usage:"
kubectl top nodes
kubectl top pods --all-namespaces
echo
# 4. Certificate expiry
echo "4. Certificate expiry:"
kubeadm certs check-expiration
echo
# 5. etcd health (etcdctl can also be run inside the etcd Pod)
echo "5. etcd health:"
ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
    endpoint health
echo
echo "=== Diagnostics complete ==="
10. Summary
This article walked through the complete process of deploying a production Kubernetes 1.24 cluster with kubeadm:
- Cluster architecture: HA topology, sizing, network CIDR plan
- Host preparation: system checks, kernel tuning, resource limits
- Container runtime: containerd installation, configuration, CNI plugins
- kubeadm installation: repository setup, registry mirrors, version pinning
- Cluster initialization: configuration file, init flow, certificate management
- Node joining: control-plane nodes, worker nodes, verification
- CNI deployment: Calico configuration, network verification
- Cluster verification: component checks, performance benchmarks
- Troubleshooting: common problems, diagnostic tooling, resolutions
Mastering these techniques is the foundation for building a stable and efficient Kubernetes production environment.
Copyright notice: this is an original technical article; please include a link to it when reposting.