Pod Core Principles and Lifecycle Management
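1. Pod Data Structures in Depth
1.1 Pod Core Concepts
A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers together with storage resources, a unique network IP, and options that control how the containers run.
What a Pod Is
The containers in a Pod share four things:
| Shared resource | Description | Mechanism |
|---|---|---|
| Network namespace | All containers share one IP address and port space | Pause container (infrastructure container) |
| Storage volumes | All containers can access the same volumes | Volume mounts |
| IPC namespace | Containers can communicate through semaphores and shared memory | Pause container (shared by default) |
| UTS namespace | All containers share one hostname | Pause container |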
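Role of the Pause Container
The pause container is the infrastructure container of every Pod. It is responsible for:
- Holding the network namespace: every container in the Pod joins the pause container's network namespace
- Holding the IPC namespace: this enables inter-process communication between containers
- Reaping zombie processes: when shareProcessNamespace is enabled it runs as PID 1 and reaps orphaned processes
# Inspect pause containers
# (on CRI runtimes the pause container backs the Pod sandbox, so `crictl pods` may be needed instead)
crictl ps | grep pause
# Example output (illustrative)
# CONTAINER ID IMAGE NAME STATE
# abc123 registry.k8s.io/pause:3.9 k8s_POD_nginx_xxx Running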
1.2 PodSpec Core Fields
PodSpec defines the desired state of a Pod. Its main fields are grouped below.
Field overview
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  namespace: default
  labels:
    app: example
  annotations:
    description: "Example Pod configuration"
spec:
  # ===== Containers =====
  containers: []                  # required, list of main containers
  initContainers: []              # optional, list of init containers
  ephemeralContainers: []         # optional, list of ephemeral containers
  # ===== Scheduling =====
  nodeName: ""                    # pin to a node by name
  nodeSelector: {}                # node label selector
  affinity: {}                    # affinity rules
  tolerations: []                 # tolerations
  topologySpreadConstraints: []   # topology spread constraints
  priorityClassName: ""           # PriorityClass name
  priority: 0                     # priority value (normally filled in from the PriorityClass)
  preemptionPolicy: "PreemptLowerPriority"  # preemption policy
  # ===== Runtime control =====
  restartPolicy: "Always"         # Always / OnFailure / Never
  # activeDeadlineSeconds: <seconds>  # optional, must be a positive integer; unset by default
  terminationGracePeriodSeconds: 30   # termination grace period
  # ===== Networking =====
  hostNetwork: false              # use the host network
  hostPID: false                  # use the host PID namespace
  hostIPC: false                  # use the host IPC namespace
  shareProcessNamespace: false    # share one PID namespace between containers
  dnsPolicy: "ClusterFirst"       # DNS policy
  dnsConfig: {}                   # custom DNS settings
  hostAliases: []                 # extra /etc/hosts entries
  enableServiceLinks: true        # inject Service environment variables
  # ===== Security =====
  serviceAccountName: "default"   # ServiceAccount name
  automountServiceAccountToken: true  # mount the ServiceAccount token automatically
  securityContext: {}             # Pod-level security context
  # ===== Storage =====
  volumes: []                     # volume definitions
  # ===== Resources =====
  overhead: {}                    # Pod overhead (set via RuntimeClass)
  # ===== Miscellaneous =====
  hostname: ""                    # hostname
  subdomain: ""                   # subdomain
  schedulerName: "default-scheduler"  # scheduler name
  imagePullSecrets: []            # image pull secrets
  runtimeClassName: ""            # RuntimeClass name
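Fields by category
Container fields:
| Field | Type | Required | Description |
|---|---|---|---|
| `containers` | []Container | ✅ | Main containers; at least one is required |
| `initContainers` | []Container | ❌ | Init containers, started in order |
| `ephemeralContainers` | []EphemeralContainer | ❌ | Ephemeral containers, used for debugging |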
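Scheduling fields:
| Field | Type | Description |
|---|---|---|
| `nodeName` | string | Binds the Pod to a node directly, bypassing the scheduler |
| `nodeSelector` | map[string]string | Simple node label selection |
| `affinity` | Affinity | Advanced affinity rules |
| `tolerations` | []Toleration | Taint tolerations |
| `topologySpreadConstraints` | []TopologySpreadConstraint | Topology spread constraints |
| `priorityClassName` | string | Reference to a PriorityClass |
| `priority` | int32 | Priority value, resolved from the PriorityClass (user-defined classes may use values up to 1000000000) |
| `preemptionPolicy` | string | Preemption policy (PreemptLowerPriority or Never) |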
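Runtime control fields:
| Field | Type | Default | Description |
|---|---|---|---|
| `restartPolicy` | string | Always | Restart policy |
| `activeDeadlineSeconds` | int64 | unset | Maximum time the Pod may be active, in seconds; must be positive when set |
| `terminationGracePeriodSeconds` | int64 | 30 | Graceful termination period, in seconds |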
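1.3 PodStatus State Machine
PodStatus describes the current state of a Pod. It includes the Phase, Conditions, ContainerStatuses, and related information.
Phase transition diagram
Phase details
| Phase | Description | Typical cause |
|---|---|---|
| Pending | The Pod has been accepted, but one or more containers are not yet running | Waiting for scheduling, image pulls, or volume mounts |
| Running | The Pod is bound to a node and all containers have been created | At least one container is running, starting, or restarting |
| Succeeded | All containers terminated successfully and will not be restarted | Every container exited with code 0 under restartPolicy Never or OnFailure |
| Failed | All containers have terminated and at least one failed | A container exited non-zero or was killed by the system, and will not be restarted |
| Unknown | The Pod state cannot be obtained | Communication with the node failed |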
PodConditions in detail
status:
  conditions:
  - type: PodScheduled
    status: "True"
    lastProbeTime: null
    lastTransitionTime: "2024-01-01T10:00:00Z"
    reason: PodScheduled
    message: "Successfully assigned default/nginx-pod to node-1"
  - type: Initialized
    status: "True"
    lastProbeTime: null
    lastTransitionTime: "2024-01-01T10:00:05Z"
    reason: PodCompleted
    message: "All init containers completed successfully"
  - type: Ready
    status: "True"
    lastProbeTime: null
    lastTransitionTime: "2024-01-01T10:00:15Z"
    reason: PodReady
    message: "Pod is ready"
  - type: ContainersReady
    status: "True"
    lastProbeTime: null
    lastTransitionTime: "2024-01-01T10:00:15Z"
    reason: ContainersReady
    message: "All containers are ready"
  - type: DisruptionTarget
    status: "False"
    lastProbeTime: null
    lastTransitionTime: "2024-01-01T10:00:00Z"
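Condition types:
| Condition | Meaning | Becomes True when |
|---|---|---|
| `PodScheduled` | The Pod has been scheduled to a node | The scheduler binds the Pod to a node |
| `Initialized` | Init containers have completed | All init containers exit successfully |
| `ContainersReady` | All containers are ready | Every container passes its readiness probe |
| `Ready` | The Pod can serve requests | ContainersReady is True and any other readiness conditions are met |
| `DisruptionTarget` | The Pod is about to be evicted | A disruption such as eviction or preemption targets the Pod |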
ContainerStatus in detail
status:
  containerStatuses:
  - name: nginx
    state:
      running:
        startedAt: "2024-01-01T10:00:10Z"
    lastState: {}
    ready: true
    restartCount: 0
    image: nginx:1.25
    imageID: docker.io/library/nginx@sha256:xxx
    containerID: containerd://xxx
    started: true
  initContainerStatuses:
  - name: init-myservice
    state:
      terminated:
        exitCode: 0
        reason: Completed
        startedAt: "2024-01-01T10:00:01Z"
        finishedAt: "2024-01-01T10:00:05Z"
    lastState: {}
    ready: true
    restartCount: 0
    image: busybox:1.36
    imageID: docker.io/library/busybox@sha256:xxx
    containerID: containerd://xxx
1.4 Field Defaults and Validation Rules
Defaults
| Field | Default | Applied |
|---|---|---|
| `restartPolicy` | Always | At creation |
| `terminationGracePeriodSeconds` | 30 | At creation |
| `dnsPolicy` | ClusterFirst | At creation |
| `serviceAccountName` | default | At creation |
| `automountServiceAccountToken` | true | At creation |
| `enableServiceLinks` | true | At creation |
| `shareProcessNamespace` | false | At creation |
| `hostNetwork` | false | At creation |
| `hostPID` | false | At creation |
| `hostIPC` | false | At creation |
Validation rules
// Illustrative summary of Pod validation rules, written as validator-style
// struct tags. This is not the actual Kubernetes validation code.
type PodValidation struct {
	// containers: at least one entry
	ContainersMinLength int `json:"containers" validate:"min=1"`
	// restartPolicy: enumerated values
	RestartPolicy string `json:"restartPolicy" validate:"oneof=Always OnFailure Never"`
	// terminationGracePeriodSeconds: non-negative
	TerminationGracePeriod int64 `json:"terminationGracePeriodSeconds" validate:"gte=0"`
	// activeDeadlineSeconds: optional, but must be positive when set
	ActiveDeadlineSeconds int64 `json:"activeDeadlineSeconds" validate:"omitempty,gt=0"`
	// priority: upper bound for user-defined PriorityClasses
	Priority int32 `json:"priority" validate:"lte=1000000000"`
	// dnsPolicy: enumerated values
	DNSPolicy string `json:"dnsPolicy" validate:"oneof=ClusterFirst ClusterFirstWithHostNet Default None"`
}
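2. Container Definition and Parameters
2.1 Required Parameters and Image Configuration
Required parameters
Every container definition must include the following:
| Parameter | Type | Description | Example |
|---|---|---|---|
| `name` | string | Container name, unique within the Pod | nginx |
| `image` | string | Container image | nginx:1.25 |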
Image name format
[registry/][namespace/]repository[:tag|@digest]
Examples:
- nginx:1.25 # Docker Hub official image
- library/nginx:1.25 # Docker Hub official image (with the explicit library namespace)
- docker.io/library/nginx:1.25 # fully qualified name
- quay.io/prometheus/prometheus:v2.45.0 # another public registry
- harbor.example.com/myproject/myapp:v1.0.0 # private registry
- nginx@sha256:abc123... # pinned by digest
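imagePullPolicy in detail
imagePullPolicy values:
| Value | Description | Default when |
|---|---|---|
| `Always` | Pull the image every time the container starts | The tag is latest or omitted |
| `Never` | Never pull; use only a local image | - |
| `IfNotPresent` | Pull only when the image is not present locally | The tag is anything other than latest, or a digest is used |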
apiVersion: v1
kind: Pod
metadata:
  name: image-pull-policy-demo
spec:
  containers:
  - name: always-pull
    image: nginx:latest
    imagePullPolicy: Always        # pull on every start
  - name: never-pull
    image: nginx:1.25
    imagePullPolicy: Never         # never pull; the image must exist locally
  - name: if-not-present
    image: nginx:1.25
    imagePullPolicy: IfNotPresent  # pull only when missing locally
  - name: default-latest
    image: nginx:latest
    # imagePullPolicy defaults to Always
  - name: default-tagged
    image: nginx:1.25
    # imagePullPolicy defaults to IfNotPresent
2.2 Command and Arguments
How command and args relate
Docker and Kubernetes field mapping:
| Docker | Kubernetes | Description |
|---|---|---|
| ENTRYPOINT | command | Executable to run |
| CMD | args | Argument list |
apiVersion: v1
kind: Pod
metadata:
  name: command-args-demo
spec:
  containers:
  - name: command-demo
    image: debian:bookworm
    # Override both ENTRYPOINT and CMD
    command: ["/bin/sh"]
    args: ["-c", "echo 'Hello Kubernetes' && sleep 3600"]
  - name: args-only
    image: nginx:1.25
    # Override only CMD; the image's ENTRYPOINT is kept and receives these args
    args: ["nginx", "-g", "daemon off;"]
  - name: command-only
    image: debian:bookworm
    # Override ENTRYPOINT only; the image's CMD is ignored, so echo runs with no arguments
    command: ["/bin/echo"]
  - name: inherit-all
    image: nginx:1.25
    # Use the image's ENTRYPOINT and CMD unchanged
workingDir
apiVersion: v1
kind: Pod
metadata:
  name: workingdir-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    workingDir: /app          # working directory for the command
    command: ["./start.sh"]   # resolved relative to /app; the script must exist in the image
2.3 Environment Variables
Ways to set environment variables
apiVersion: v1
kind: Pod
metadata:
  name: env-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    env:
    # Literal value
    - name: ENV_VAR_1
      value: "value1"
    # From a ConfigMap key
    - name: ENV_VAR_2
      valueFrom:
        configMapKeyRef:
          name: my-config
          key: config-key
    # From a Secret key
    - name: ENV_VAR_3
      valueFrom:
        secretKeyRef:
          name: my-secret
          key: secret-key
    # From Pod fields (Downward API)
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    # From container resources
    - name: CPU_LIMIT
      valueFrom:
        resourceFieldRef:
          containerName: app
          resource: limits.cpu
    - name: MEM_REQUEST
      valueFrom:
        resourceFieldRef:
          containerName: app
          resource: requests.memory
    envFrom:
    # Bulk import from a ConfigMap
    - configMapRef:
        name: app-config
      prefix: CONFIG_   # optional prefix
    # Bulk import from a Secret
    - secretRef:
        name: app-secret
      prefix: SECRET_
Environment variable precedence
When the same key is defined more than once, an entry in env overrides one imported through envFrom, and among envFrom sources the last one listed wins.
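2.4 Resource Management
Resource types and units
| Resource | Units | Notes |
|---|---|---|
| CPU | m (millicores) or a whole number (cores) | 1000m = 1 core |
| Memory | Ki, Mi, Gi (binary) or k, M, G (decimal) | 1Mi = 1024Ki = 1048576 bytes |
| Ephemeral storage | Ki, Mi, Gi | Resource name: ephemeral-storage |
| Extended resources | Integer | For example nvidia.com/gpu: 1 |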
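The unit table above can be checked with a small conversion sketch. The helper names `parse_cpu` and `parse_memory` are made up for illustration, and the parser only covers the suffixes listed in the table; it is not the Kubernetes quantity parser.

```python
from decimal import Decimal

# Suffix tables taken from the units listed above (hypothetical helper, not a Kubernetes API).
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9}


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity such as '250m' or '2' into cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '64Mi' or '128M' into bytes."""
    # Binary suffixes are checked first so that 'Mi' is never read as 'M'.
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))


print(parse_cpu("250m"), parse_memory("64Mi"), parse_memory("128M"))
```

Using `Decimal` avoids floating-point rounding when a quantity like `1.5Gi` is converted.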
requests and limits in detail
apiVersion: v1
kind: Pod
metadata:
  name: resources-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    resources:
      # Requests (used for scheduling)
      requests:
        cpu: "250m"              # 0.25 core
        memory: "64Mi"           # 64 MiB
        ephemeral-storage: "1Gi"
      # Limits (runtime ceiling)
      limits:
        cpu: "500m"              # 0.5 core
        memory: "128Mi"          # 128 MiB
        ephemeral-storage: "2Gi"
        nvidia.com/gpu: 1        # extended resource
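What requests and limits do:
Requests are what the scheduler uses to place the Pod; limits are enforced at runtime. A container that exceeds its CPU limit is throttled, and one that exceeds its memory limit can be OOM-killed.
CPU throttling mechanism
CPU throttling example:
# Inspect the CPU quota (cgroup v1 paths; on cgroup v2 the same two values appear together in cpu.max)
cat /sys/fs/cgroup/cpu/kubepods/burstable/podxxx/cpu.cfs_quota_us
# Output: 50000 (500m = 50ms of CPU time per 100ms period)
cat /sys/fs/cgroup/cpu/kubepods/burstable/podxxx/cpu.cfs_period_us
# Output: 100000 (100ms period)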
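The relation between a CPU limit and the CFS quota shown above is simple arithmetic. The sketch below assumes the default 100ms period; `cfs_quota_us` is a hypothetical helper name, not a kubelet function.

```python
def cfs_quota_us(cpu_limit_millicores: int, period_us: int = 100_000) -> int:
    """Return the CFS quota (microseconds of CPU time per period) for a CPU limit."""
    # 1000 millicores correspond to one full period of CPU time.
    return cpu_limit_millicores * period_us // 1000


print(cfs_quota_us(500))   # quota for a 500m limit
print(cfs_quota_us(2000))  # a 2-core limit may use two periods' worth of CPU time
```

A limit above one core yields a quota larger than the period, which is how multi-core limits are expressed.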
2.5 Lifecycle Hooks
Hook types
Hook handler types
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  terminationGracePeriodSeconds: 60   # termination grace period
  containers:
  - name: app
    image: nginx:1.25
    lifecycle:
      # Runs right after the container is created
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo 'Container started' > /var/log/start.log"]
      # Runs before the container is terminated
      preStop:
        exec:
          command: ["/bin/sh", "-c", "nginx -s quit; sleep 10"]
      # HTTP hook example
      # postStart:
      #   httpGet:
      #     path: /startup
      #     port: 8080
      #     host: localhost
      #     scheme: HTTP
| 钩子 | 执行时机 | 阻塞行为 | 失败处理 |
|---|---|---|---|
postStart |
容器创建后立即执行 | 阻塞容器启动 | 容器启动失败 |
preStop |
容器终止前执行 | 阻塞容器终止 | 记录事件,继续终止 |
注意事项:
postStart与容器入口点异步执行,但必须完成后容器才视为"已启动"preStop必须在terminationGracePeriodSeconds内完成- 钩子执行失败会导致容器重启
- 钩子应设计为幂等操作
2.6 Interactive Containers
Interactive parameters
apiVersion: v1
kind: Pod
metadata:
  name: interactive-demo
spec:
  containers:
  - name: interactive
    image: debian:bookworm
    stdin: true        # keep stdin open
    stdinOnce: false   # keep stdin open across multiple attach sessions (default false)
    tty: true          # allocate a TTY
    command: ["/bin/bash"]
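Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `stdin` | bool | false | Keep stdin open |
| `stdinOnce` | bool | false | When true, stdin is closed after the first attached client disconnects |
| `tty` | bool | false | Allocate a pseudo-terminal |
Usage:
# Attach to the interactive container
kubectl attach -it interactive-demo -c interactive
# Open a shell in the container with exec
kubectl exec -it interactive-demo -c interactive -- /bin/bash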
Termination messages
apiVersion: v1
kind: Pod
metadata:
  name: termination-message-demo
spec:
  containers:
  - name: app
    image: debian:bookworm
    command: ["/bin/sh", "-c"]
    args:
    - |
      echo "Application starting..."
      # Simulate an error
      echo "Error: Database connection failed" > /dev/termination-log
      exit 1
    # Termination message settings
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File   # File or FallbackToLogsOnError
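terminationMessagePolicy values:
| Value | Description |
|---|---|
| `File` | Read the message from the file at terminationMessagePath |
| `FallbackToLogsOnError` | If the file is empty and the container exited with an error, use the tail of the container log |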
3. Image Pulling and Private Registries
3.1 imagePullPolicy in Detail
Pull policy decision flow
Default behavior
apiVersion: v1
kind: Pod
metadata:
  name: image-pull-default
spec:
  containers:
  # Case 1: tag is latest, imagePullPolicy defaults to Always
  - name: latest-tag
    image: nginx:latest
    # same as imagePullPolicy: Always
  # Case 2: tag omitted, imagePullPolicy defaults to Always
  - name: no-tag
    image: nginx
    # same as imagePullPolicy: Always
  # Case 3: specific tag, imagePullPolicy defaults to IfNotPresent
  - name: specific-tag
    image: nginx:1.25
    # same as imagePullPolicy: IfNotPresent
  # Case 4: digest, imagePullPolicy defaults to IfNotPresent
  - name: digest
    image: nginx@sha256:abc123...
    # same as imagePullPolicy: IfNotPresent
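The four default cases above can be condensed into one function. This is only a sketch of the defaulting rule as described in this section; `default_image_pull_policy` is a hypothetical helper, and the real defaulting happens in the API server when the object is created.

```python
def default_image_pull_policy(image: str) -> str:
    """Return the imagePullPolicy that applies when none is set explicitly."""
    # A digest reference pins exact content, so a local copy can be reused.
    if "@" in image:
        return "IfNotPresent"
    # Look for the tag only after the last "/", so that a registry port
    # such as "registry.local:5000/app" is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    tag = last_segment.split(":", 1)[1] if ":" in last_segment else ""
    if tag in ("", "latest"):
        return "Always"
    return "IfNotPresent"


for ref in ("nginx:latest", "nginx", "nginx:1.25", "nginx@sha256:abc123"):
    print(ref, "->", default_image_pull_policy(ref))
```

Because the default is fixed at creation time, changing the tag of an existing object later does not change its imagePullPolicy.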
3.2 Authenticating to Private Registries
Creating a Docker registry Secret
# Option 1: create from Docker credentials
kubectl create secret docker-registry my-registry-secret \
  --docker-server=harbor.example.com \
  --docker-username=admin \
  --docker-password=Harbor12345 \
  --docker-email=admin@example.com \
  -n default
# Option 2: create from ~/.docker/config.json
kubectl create secret generic my-registry-secret \
  --from-file=.dockerconfigjson=/root/.docker/config.json \
  --type=kubernetes.io/dockerconfigjson \
  -n default
Secret data structure
apiVersion: v1
kind: Secret
metadata:
  name: my-registry-secret
  namespace: default
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: eyJhdXRocyI6eyJoYXJib3IuZXhhbXBsZS5jb20iOnsidXNlcm5hbWUiOiJhZG1pbiIsInBhc3N3b3JkIjoiSGFyYm9yMTIzNDUiLCJlbWFpbCI6ImFkbWluQGV4YW1wbGUuY29tIiwiYXV0aCI6IllXUnRhVzQ2U0dGeVltOXlNVEl6TkRVPSJ9fX0=
Decoded content:
{
  "auths": {
    "harbor.example.com": {
      "username": "admin",
      "password": "Harbor12345",
      "email": "admin@example.com",
      "auth": "YWRtaW46SGFyYm9yMTIzNDU="
    }
  }
}
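To make the two encoding layers explicit, here is a minimal sketch that builds the same structure: `auth` is the base64 of `username:password`, and the Secret's `data` value is the base64 of the whole JSON document. The function names are hypothetical, and the credentials are the sample values used above.

```python
import base64
import json


def docker_config_json(server, username, password, email=None):
    """Build the JSON document stored under the .dockerconfigjson key."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    entry = {"username": username, "password": password}
    if email:
        entry["email"] = email
    entry["auth"] = auth
    return json.dumps({"auths": {server: entry}}, separators=(",", ":"))


def secret_data_value(config_json):
    """Base64-encode the JSON the way the Secret's data field expects it."""
    return base64.b64encode(config_json.encode()).decode()


config = docker_config_json("harbor.example.com", "admin", "Harbor12345", "admin@example.com")
print(config)
print(secret_data_value(config))
```

When a Secret is written with `stringData` instead of `data`, only the inner `auth` encoding is needed, because the API server applies the outer base64 step.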
Using imagePullSecrets
# Option 1: configure on the Pod
apiVersion: v1
kind: Pod
metadata:
  name: private-image-pod
spec:
  imagePullSecrets:
  - name: my-registry-secret
  containers:
  - name: app
    image: harbor.example.com/myproject/myapp:v1.0.0
---
# Option 2: configure on the ServiceAccount (recommended)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
imagePullSecrets:
- name: my-registry-secret
---
apiVersion: v1
kind: Pod
metadata:
  name: sa-image-pull
spec:
  serviceAccountName: my-service-account
  containers:
  - name: app
    image: harbor.example.com/myproject/myapp:v1.0.0
3.3 Common Private Registry Setups
Harbor
apiVersion: v1
kind: Secret
metadata:
  name: harbor-secret
  namespace: default
type: kubernetes.io/dockerconfigjson
stringData:
  # Shell substitutions such as $(...) are not evaluated inside YAML.
  # Either omit "auth" (username and password are enough) or paste a
  # precomputed base64 of "username:password".
  .dockerconfigjson: |
    {
      "auths": {
        "harbor.example.com": {
          "username": "robot$myproject",
          "password": "robot-token-here"
        }
      }
    }
---
apiVersion: v1
kind: Pod
metadata:
  name: harbor-app
spec:
  imagePullSecrets:
  - name: harbor-secret
  containers:
  - name: app
    image: harbor.example.com/myproject/myapp:v1.0.0
Alibaba Cloud ACR
apiVersion: v1
kind: Secret
metadata:
  name: aliyun-acr-secret
  namespace: default
type: kubernetes.io/dockerconfigjson
stringData:
  # "auth" is omitted because shell substitutions are not evaluated inside YAML;
  # username and password are sufficient.
  .dockerconfigjson: |
    {
      "auths": {
        "registry.cn-hangzhou.aliyuncs.com": {
          "username": "your-username",
          "password": "your-password"
        }
      }
    }
AWS ECR
# Obtain an authentication token with the AWS CLI (123456789012 is a placeholder account ID)
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-west-2.amazonaws.com
# Create the Secret (ECR tokens expire after 12 hours, so the Secret must be refreshed periodically)
kubectl create secret docker-registry aws-ecr-secret \
  --docker-server=123456789012.dkr.ecr.us-west-2.amazonaws.com \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region us-west-2)" \
  -n default
Multiple registries
apiVersion: v1
kind: Secret
metadata:
  name: multi-registry-secret
  namespace: default
type: kubernetes.io/dockerconfigjson
stringData:
  .dockerconfigjson: |
    {
      "auths": {
        "harbor.example.com": {
          "username": "admin",
          "password": "password1"
        },
        "registry.cn-hangzhou.aliyuncs.com": {
          "username": "user",
          "password": "password2"
        },
        "docker.io": {
          "username": "dockeruser",
          "password": "password3"
        }
      }
    }
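3.4 Troubleshooting Image Pull Failures
Common errors
| Error | Meaning | What to check |
|---|---|---|
| `ImagePullBackOff` | The pull failed and the kubelet is retrying with back-off | Image name, credentials, network |
| `ErrImagePull` | The pull failed | Whether the image exists |
| `ErrImageNeverPull` | The image is not present locally and the policy is Never | Load the image locally or change the policy |
| `RegistryUnavailable` | The registry cannot be reached | Registry connectivity |
| `Unauthorized` | Authentication failed (usually shown in the event message) | imagePullSecrets |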
Troubleshooting flow
Troubleshooting commands
# 1. Look at the Pod events
kubectl describe pod <pod-name> -n <namespace>
# 2. Look at the Pod status
kubectl get pod <pod-name> -n <namespace> -o yaml
# 3. Inspect the Secret
kubectl get secret <secret-name> -n <namespace> -o yaml
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
# 4. Try pulling the image (on the node)
crictl pull harbor.example.com/myproject/myapp:v1.0.0
# 5. Check the images on the node
crictl images | grep myapp
# 6. Test registry connectivity (an unauthenticated request normally returns 401, which still proves reachability)
curl -v https://harbor.example.com/v2/
# 7. Test authentication with docker
docker login harbor.example.com -u admin -p Harbor12345
docker pull harbor.example.com/myproject/myapp:v1.0.0
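4. Init Containers and the Sidecar Pattern
4.1 Init Container Mechanics
Init container characteristics
How init containers differ from main containers:
| Aspect | Init containers | Main containers |
|---|---|---|
| Execution order | Run one at a time, in order | Run concurrently |
| Exit requirement | Must exit successfully | May keep running |
| Failure handling | The failed init container is retried according to the Pod's restartPolicy; with Never the Pod fails | Governed by restartPolicy |
| Probe support | Not supported (native sidecars excepted) | liveness/readiness/startup |
| Lifecycle hooks | Not supported (native sidecars excepted) | Supported |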
Init container example
apiVersion: v1
kind: Pod
metadata:
  name: init-container-demo
spec:
  initContainers:
  # Init container 1: wait for a dependency
  - name: wait-for-db
    image: busybox:1.36
    command: ['sh', '-c', 'until nc -z mysql-service 3306; do echo waiting for mysql; sleep 2; done']
  # Init container 2: prepare configuration
  - name: init-config
    image: busybox:1.36
    command: ['sh', '-c', 'cp /config-template/* /config/']
    volumeMounts:
    - name: config-template
      mountPath: /config-template
    - name: config
      mountPath: /config
  # Init container 3: database migration
  - name: db-migration
    image: myapp:migration
    command: ['python', 'manage.py', 'migrate']
    env:
    - name: DATABASE_URL
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: url
  containers:
  - name: app
    image: myapp:v1.0.0
    volumeMounts:
    - name: config
      mountPath: /app/config
  volumes:
  - name: config-template
    configMap:
      name: app-config-template
  - name: config
    emptyDir: {}
Init container resource accounting
Because init containers run one at a time before the main containers, the Pod's effective request for each resource is the larger of the highest init container request and the sum of the main container requests.
Worked example:
apiVersion: v1
kind: Pod
metadata:
  name: resource-calculation-demo
spec:
  initContainers:
  - name: init-1
    image: busybox
    resources:
      requests:
        cpu: 100m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 128Mi
  - name: init-2
    image: busybox
    resources:
      requests:
        cpu: 200m      # highest init CPU request
        memory: 32Mi
      limits:
        cpu: 500m      # highest init CPU limit
        memory: 64Mi
  containers:
  - name: main-1
    image: nginx
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
  - name: main-2
    image: redis
    resources:
      requests:
        cpu: 100m
        memory: 64Mi
# Effective Pod requests:
# CPU: max(200m, 100m+100m) = 200m
# Memory: max(64Mi, 128Mi+64Mi) = 192Mi
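The calculation in the comments above can be reproduced in a few lines. This sketch covers only ordinary init containers and main containers; `effective_request` is a hypothetical helper, and native sidecars and Pod overhead are deliberately left out.

```python
def effective_request(init_requests, app_requests):
    """Effective Pod request for one resource.

    Init containers run one at a time, so only the largest of them counts;
    main containers run together, so their requests are summed.
    """
    return max(max(init_requests, default=0), sum(app_requests))


# Values from the manifest above (CPU in millicores, memory in Mi).
cpu = effective_request(init_requests=[100, 200], app_requests=[100, 100])
memory = effective_request(init_requests=[64, 32], app_requests=[128, 64])
print(f"cpu={cpu}m memory={memory}Mi")
```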
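4.2 Native Sidecar Containers
Native sidecar support was introduced as an alpha feature in Kubernetes v1.28 and has been enabled by default since v1.29. A sidecar is declared as an init container with restartPolicy: Always.
Native sidecar characteristics
Native versus traditional sidecars:
| Aspect | Traditional sidecar | Native sidecar (v1.29+) |
|---|---|---|
| Where it is defined | containers | initContainers |
| Startup order | In parallel with the main containers | Before the main containers |
| Lifecycle | Independent | Tied to the Pod lifecycle |
| Restart policy | Follows the Pod policy | restartPolicy: Always |
| Termination order | Unordered | After the main containers |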
Native sidecar example
apiVersion: v1
kind: Pod
metadata:
  name: native-sidecar-demo
spec:
  initContainers:
  # Native sidecar: log collection agent
  - name: log-collector
    image: fluent/fluent-bit:2.2
    restartPolicy: Always   # this is what makes it a sidecar
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
    - name: fluent-config
      mountPath: /fluent-bit/etc
  # Ordinary init container: one-off initialization
  - name: init-task
    image: busybox:1.36
    command: ['sh', '-c', 'echo "Initializing..." && sleep 5']
  containers:
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  volumes:
  - name: app-logs
    emptyDir: {}
  - name: fluent-config
    configMap:
      name: fluent-bit-config
4.3 Multi-Container Design Patterns
Sidecar pattern
apiVersion: v1
kind: Pod
metadata:
  name: sidecar-pattern
spec:
  containers:
  # Main container: the application
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/nginx
  # Sidecar container: log collection
  - name: log-collector
    image: fluent/fluent-bit:2.2
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/nginx
      readOnly: true
  volumes:
  - name: shared-logs
    emptyDir: {}
Ambassador pattern
apiVersion: v1
kind: Pod
metadata:
  name: ambassador-pattern
spec:
  containers:
  # Main container: the application
  - name: app
    image: myapp:v1.0.0
    env:
    - name: DB_HOST
      value: "127.0.0.1"   # connect to the local proxy
    - name: DB_PORT
      value: "3306"
  # Ambassador container: database proxy
  - name: db-proxy
    image: envoyproxy/envoy:v1.28
    ports:
    - containerPort: 3306
    volumeMounts:
    - name: envoy-config
      mountPath: /etc/envoy
  volumes:
  - name: envoy-config
    configMap:
      name: envoy-db-proxy-config
Adapter pattern
apiVersion: v1
kind: Pod
metadata:
  name: adapter-pattern
spec:
  containers:
  # Main container: the application (writes logs in a custom format)
  - name: app
    image: myapp:v1.0.0
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  # Adapter container: converts the log format
  - name: log-adapter
    image: log-adapter:v1.0.0
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
      readOnly: true
    - name: output-logs
      mountPath: /var/log/output
    env:
    - name: INPUT_FORMAT
      value: "custom"
    - name: OUTPUT_FORMAT
      value: "json"
  volumes:
  - name: app-logs
    emptyDir: {}
  - name: output-logs
    emptyDir: {}
4.4 Sharing Between Containers
Sharing the process namespace
apiVersion: v1
kind: Pod
metadata:
  name: process-namespace-demo
spec:
  shareProcessNamespace: true   # share one PID namespace across containers
  containers:
  - name: app
    image: nginx:1.25
  - name: debugger
    image: busybox:1.36
    command: ['sleep', '3600']
    securityContext:
      capabilities:
        add: ['SYS_PTRACE']     # allow debugging processes of other containers
Effect once enabled:
# The debugger container can see the processes of the app container
kubectl exec -it process-namespace-demo -c debugger -- ps aux
# Example output:
# PID USER TIME COMMAND
# 1 root 0:00 /pause
# 10 root 0:00 nginx: master process
# 20 101 0:00 nginx: worker process
# 30 root 0:00 sleep 3600
# 40 root 0:00 ps aux
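Sharing host namespaces
apiVersion: v1
kind: Pod
metadata:
  name: host-namespace-demo
spec:
  hostNetwork: true   # use the host network
  hostPID: true       # use the host PID namespace
  hostIPC: true       # use the host IPC namespace
  containers:
  - name: app
    image: nginx:1.25
    ports:
    - containerPort: 80
      hostPort: 80    # with hostNetwork: true, hostPort must equal containerPort
Caveats:
| Setting | Security risk | Typical use |
|---|---|---|
| `hostNetwork: true` | Access to all host networking | Network plugins, monitoring agents |
| `hostPID: true` | Visibility of all host processes | Debugging tools, monitoring agents |
| `hostIPC: true` | Access to host IPC resources | Special-purpose applications |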
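5. Pod Lifecycle Management
5.1 Phase Transitions
Full transition diagram
Transition triggers
| Current phase | Next phase | Trigger |
|---|---|---|
| Pending | Running | The Pod is bound to a node and its containers have been created and started |
| Pending | Pending | Scheduling fails (no feasible node); the Pod stays Pending with PodScheduled=False |
| Pending | Pending | Image pull fails; the container waits in ErrImagePull/ImagePullBackOff while the kubelet retries |
| Running | Succeeded | All containers exit with code 0 and will not be restarted (restartPolicy Never or OnFailure) |
| Running | Failed | A container exits non-zero and will not be restarted |
| Running | Failed | activeDeadlineSeconds is exceeded |
| Running | Running | A container restarts (restartPolicy=Always/OnFailure) |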
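5.2 Probes in Detail
Probe types
Comparison of the three probes:
| Probe | Purpose | On failure | Typical use |
|---|---|---|---|
| `livenessProbe` | Checks whether the container is alive | The container is restarted | Detecting deadlocks and hung processes |
| `readinessProbe` | Checks whether the container is ready | The Pod is removed from Service endpoints | Dependency checks, warm-up |
| `startupProbe` | Checks whether the container has finished starting | The container is restarted | Slow-starting applications |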
Probe workflow
5.3 Probe Parameters
Probe parameters in detail
apiVersion: v1
kind: Pod
metadata:
  name: probe-params-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    ports:
    - containerPort: 80
    # Liveness probe
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
        scheme: HTTP
        httpHeaders:
        - name: Custom-Header
          value: "probe"
      initialDelaySeconds: 15   # delay before the first probe (seconds)
      periodSeconds: 10         # probe interval (seconds)
      timeoutSeconds: 5         # probe timeout (seconds)
      successThreshold: 1       # consecutive successes required
      failureThreshold: 3       # consecutive failures tolerated
    # Readiness probe
    readinessProbe:
      httpGet:
        path: /ready
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 3
      successThreshold: 1
      failureThreshold: 3
    # Startup probe
    startupProbe:
      httpGet:
        path: /startup
        port: 80
      initialDelaySeconds: 0
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 30      # wait up to 300 seconds (30*10)
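Parameters:
| Parameter | Description | Default | Suggested value |
|---|---|---|---|
| `initialDelaySeconds` | Time to wait before the first probe | 0 | Depends on application startup time |
| `periodSeconds` | Interval between probes | 10 | 5-15 seconds |
| `timeoutSeconds` | Timeout for a single probe | 1 | 3-5 seconds |
| `successThreshold` | Consecutive successes required to count as healthy | 1 | 1 |
| `failureThreshold` | Consecutive failures required to count as failed | 3 | 3-5 |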
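The parameters combine into a rough time budget: a container has about `initialDelaySeconds + periodSeconds * failureThreshold` seconds before the probe gives up. The sketch below is an approximation that ignores per-probe timeouts and scheduling jitter; `probe_budget_seconds` is a hypothetical helper name.

```python
def probe_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate time until a continuously failing probe is declared failed."""
    return initial_delay + period * failure_threshold


# startupProbe from the manifest above: 0s delay, 10s period, 30 failures allowed.
print(probe_budget_seconds(0, 10, 30))
# livenessProbe from the manifest above: 15s delay, 10s period, 3 failures allowed.
print(probe_budget_seconds(15, 10, 3))
```

For a slow-starting application, the startup budget should comfortably exceed the worst observed startup time.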
Probe handler types
apiVersion: v1
kind: Pod
metadata:
  name: probe-types-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    # HTTP GET probe
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
        scheme: HTTP
        httpHeaders:
        - name: Host
          value: "localhost"
    # TCP socket probe
    readinessProbe:
      tcpSocket:
        port: 80
    # Exec probe
    startupProbe:
      exec:
        command:
        - /bin/sh
        - -c
        - "test -f /app/ready"
    # gRPC probe (v1.24+)
    # livenessProbe:
    #   grpc:
    #     port: 50051
    #     service: myservice
Probe best practices
apiVersion: v1
kind: Pod
metadata:
  name: probe-best-practice
spec:
  containers:
  - name: app
    image: myapp:v1.0.0
    ports:
    - containerPort: 8080
    # Startup probe: for slow-starting applications
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 30     # wait up to 5 minutes
    # Liveness probe: detect deadlocks
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 0   # probing only begins after the startupProbe succeeds
      periodSeconds: 15
      timeoutSeconds: 5
      failureThreshold: 3
    # Readiness probe: check dependencies
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 3
5.4 Termination Flow
Pod termination flow
terminationGracePeriodSeconds in detail
apiVersion: v1
kind: Pod
metadata:
  name: termination-demo
spec:
  terminationGracePeriodSeconds: 60   # termination grace period (seconds)
  containers:
  - name: app
    image: nginx:1.25
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - |
            # Shut nginx down gracefully
            nginx -s quit
            # Wait for in-flight connections to finish
            sleep 30
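How the grace period is spent: the countdown starts when deletion begins, the preStop hook runs first, the container then receives SIGTERM, and anything still running when the period expires is killed with SIGKILL.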
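Because the preStop hook and the SIGTERM handling share one budget, the time left for the application after the hook is just a subtraction. A minimal sketch, with `sigterm_budget` as a hypothetical helper name:

```python
def sigterm_budget(grace_period_s: int, prestop_duration_s: int) -> int:
    """Seconds left for the process to handle SIGTERM once preStop has finished."""
    return max(grace_period_s - prestop_duration_s, 0)


# Manifest above: 60s grace period, preStop sleeps for about 30s.
print(sigterm_budget(60, 30))
# A hook that outlasts the grace period leaves nothing for SIGTERM handling.
print(sigterm_budget(30, 45))
```

If the result is close to zero, either shorten the hook or raise terminationGracePeriodSeconds.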
Force-deleting a Pod
# Normal delete (waits for the grace period)
kubectl delete pod my-pod
# Force delete (skips the grace period)
kubectl delete pod my-pod --force --grace-period=0
# Note: force deletion can cause data loss
6. Pod Scheduling Strategies
6.1 nodeSelector and nodeAffinity
Simple selection with nodeSelector
apiVersion: v1
kind: Pod
metadata:
  name: nodeselector-demo
spec:
  nodeSelector:
    disktype: ssd      # must match
    zone: us-west-1a   # must match
  containers:
  - name: app
    image: nginx:1.25
Setting node labels:
# Add node labels
kubectl label nodes node-1 disktype=ssd
kubectl label nodes node-1 zone=us-west-1a
# Show node labels
kubectl get nodes --show-labels
Advanced selection with nodeAffinity
apiVersion: v1
kind: Pod
metadata:
  name: nodeaffinity-demo
spec:
  affinity:
    nodeAffinity:
      # Must be satisfied (hard requirement)
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
          - key: disktype
            operator: In
            values:
            - ssd
            - nvme
      # Preferred (soft requirement)
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - us-west-1a
      - weight: 20
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: app
    image: nginx:1.25
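Operators:
| Operator | Meaning | Example |
|---|---|---|
| `In` | The label value is in the list | zone In [us-west-1a, us-west-1b] |
| `NotIn` | The label value is not in the list | env NotIn [prod] |
| `Exists` | The label key exists | gpu Exists |
| `DoesNotExist` | The label key does not exist | legacy DoesNotExist |
| `Gt` | The label value is greater than the given integer | cpu-cores Gt 8 |
| `Lt` | The label value is less than the given integer | memory Lt 64 |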
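The operator semantics in the table can be written as one small matcher. This is a sketch for illustration only; `match_expression` is a hypothetical function, not scheduler code, and it assumes that `NotIn` also matches nodes that lack the key.

```python
def match_expression(labels: dict, key: str, operator: str, values=()) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    if operator == "In":
        return labels.get(key) in values
    if operator == "NotIn":
        # Assumption: a node without the key counts as a match.
        return labels.get(key) not in values
    if operator in ("Gt", "Lt"):
        if key not in labels:
            return False
        try:
            node_value, bound = int(labels[key]), int(values[0])
        except ValueError:
            return False
        return node_value > bound if operator == "Gt" else node_value < bound
    raise ValueError(f"unknown operator: {operator}")


node = {"zone": "us-west-1a", "cpu-cores": "16", "gpu": "true"}
print(match_expression(node, "zone", "In", ["us-west-1a", "us-west-1b"]))
print(match_expression(node, "cpu-cores", "Gt", ["8"]))
```

Within a single nodeSelectorTerm all expressions must match; separate terms are alternatives.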
6.2 Pod Affinity and Anti-Affinity
Pod affinity configuration
apiVersion: v1
kind: Pod
metadata:
  name: pod-affinity-demo
spec:
  affinity:
    podAffinity:
      # Must share a topology domain with the selected Pods
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: database
        topologyKey: kubernetes.io/hostname
      # Prefer sharing a topology domain with the selected Pods
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: cache
          topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      # Must not share a node with the selected Pods
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: frontend
        topologyKey: kubernetes.io/hostname
      # Prefer not sharing a zone with the selected Pods
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: frontend
          topologyKey: topology.kubernetes.io/zone
  containers:
  - name: app
    image: nginx:1.25
Affinity use cases
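6.3 Taints and Tolerations
Taint types
# Add a taint
kubectl taint nodes node-1 key=value:effect
# Remove a taint
kubectl taint nodes node-1 key:effect-
# Show taints
kubectl describe node node-1 | grep Taints
Effects:
| Effect | Meaning | Behavior |
|---|---|---|
| `NoSchedule` | Do not schedule | New Pods without a matching toleration are not scheduled onto the node |
| `PreferNoSchedule` | Avoid scheduling | The scheduler tries to avoid the node but may still use it when no other node fits |
| `NoExecute` | Do not schedule, and evict | New Pods are not scheduled, and running Pods without a matching toleration are evicted |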
Toleration configuration
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  tolerations:
  # Tolerate one specific taint
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
  # Tolerate any value of a key
  - key: "key2"
    operator: "Exists"
    effect: "NoSchedule"
  # Tolerate every taint (not recommended)
  - operator: "Exists"
  # Tolerate a NoExecute taint for a limited time
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300   # stay for 300 seconds after the node becomes unreachable
  containers:
  - name: app
    image: nginx:1.25
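The matching rules behind these tolerations can be sketched as a single predicate. `tolerates` is a hypothetical function that follows the rules used in the manifest above: an empty effect matches every effect, and an empty key with `Exists` matches every taint.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if the toleration matches the taint."""
    # An unset effect on the toleration matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if not key:
        # An empty key is only meaningful with Exists and matches all keys.
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


taint = {"key": "key1", "value": "value1", "effect": "NoSchedule"}
print(tolerates({"key": "key1", "operator": "Equal", "value": "value1", "effect": "NoSchedule"}, taint))
print(tolerates({"operator": "Exists"}, taint))
```

A Pod can be scheduled onto a node only if every NoSchedule taint on that node is matched by at least one of its tolerations.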
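Built-in taints
| Taint | Meaning | Added automatically when |
|---|---|---|
| `node.kubernetes.io/not-ready` | Node is not healthy | Node Ready condition is False |
| `node.kubernetes.io/unreachable` | Node is unreachable | Node Ready condition is Unknown |
| `node.kubernetes.io/memory-pressure` | Memory pressure | The node is low on memory |
| `node.kubernetes.io/disk-pressure` | Disk pressure | The node is low on disk space |
| `node.kubernetes.io/pid-pressure` | PID pressure | The node is low on PIDs |
| `node.kubernetes.io/network-unavailable` | Network unavailable | The node network is not configured |
| `node.kubernetes.io/unschedulable` | Unschedulable | kubectl cordon |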
6.4 Topology Spread Constraints
topologySpreadConstraints configuration
apiVersion: v1
kind: Pod
metadata:
  name: topology-spread-demo
  labels:
    app: myapp   # must match the labelSelector below
spec:
  topologySpreadConstraints:
  - maxSkew: 1                                 # maximum skew
    topologyKey: kubernetes.io/hostname        # topology key (per node)
    whenUnsatisfiable: DoNotSchedule           # what to do when the constraint cannot be met
    labelSelector:
      matchLabels:
        app: myapp
  - maxSkew: 2
    topologyKey: topology.kubernetes.io/zone   # topology key (per zone)
    whenUnsatisfiable: ScheduleAnyway          # best effort
    labelSelector:
      matchLabels:
        app: myapp
  containers:
  - name: app
    image: nginx:1.25
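Parameters:
| Parameter | Description |
|---|---|
| `maxSkew` | Maximum allowed difference in matching Pod count between topology domains |
| `topologyKey` | Node label key that defines the topology domains |
| `whenUnsatisfiable` | What to do when the constraint cannot be satisfied |
| `labelSelector` | Selects the Pods that are counted |
whenUnsatisfiable values:
| Value | Description |
|---|---|
| `DoNotSchedule` | Do not schedule when the constraint is violated (hard requirement) |
| `ScheduleAnyway` | Try to satisfy the constraint but schedule regardless (soft requirement) |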
Spread constraint example
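As a small worked example of `maxSkew` with `DoNotSchedule`, the sketch below lists the domains where one more matching Pod may be placed. It treats skew as the Pod count in the target domain after placement minus the current minimum across domains; `allowed_domains` and the zone names are hypothetical.

```python
def allowed_domains(pod_counts: dict, max_skew: int) -> list:
    """Domains where adding one matching Pod keeps the skew within max_skew."""
    global_min = min(pod_counts.values())
    return sorted(
        domain
        for domain, count in pod_counts.items()
        if (count + 1) - global_min <= max_skew
    )


# Two zones with 2 and 1 matching Pods: only the emptier zone is allowed.
print(allowed_domains({"zone-a": 2, "zone-b": 1}, max_skew=1))
# With maxSkew=2 the fuller zone becomes acceptable as well.
print(allowed_domains({"zone-a": 2, "zone-b": 1}, max_skew=2))
```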
6.5 Priority and Preemption
Defining a PriorityClass
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000                           # priority value (at most 1000000000 for user-defined classes)
globalDefault: false                     # whether this is the cluster default
preemptionPolicy: PreemptLowerPriority   # preemption policy
description: "High-priority applications"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: medium-priority
value: 100000
globalDefault: true
preemptionPolicy: PreemptLowerPriority
description: "Medium-priority applications"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 1000
globalDefault: false
preemptionPolicy: Never                  # never preempt other Pods
description: "Low-priority applications, non-preempting"
Using a PriorityClass
apiVersion: v1
kind: Pod
metadata:
  name: high-priority-pod
spec:
  priorityClassName: high-priority   # reference to the PriorityClass
  containers:
  - name: app
    image: nginx:1.25
Preemption flow
7. Namespaces and Resource Isolation
7.1 Namespace Core Concepts
Namespace data structure
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  labels:
    name: my-namespace
    environment: development
  annotations:
    description: "Development namespace"
spec:
  finalizers:
  - kubernetes     # resources are cleaned up before deletion completes
status:
  phase: Active    # Active or Terminating
Namespace lifecycle
7.2 Namespace Isolation Mechanisms
Resource isolation
| Resource | Namespaced | Cluster-scoped |
|---|---|---|
| Pod | ✅ | - |
| Service | ✅ | - |
| Deployment | ✅ | - |
| ConfigMap | ✅ | - |
| Secret | ✅ | - |
| PVC | ✅ | - |
| Node | - | ✅ |
| PV | - | ✅ |
| StorageClass | - | ✅ |
| ClusterRole | - | ✅ |
| Namespace | - | ✅ |
Network isolation
# Deny all ingress traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-namespace
spec:
  podSelector: {}   # selects every Pod
  policyTypes:
  - Ingress
---
# Allow traffic within the same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: my-namespace
spec:
  podSelector: {}
  ingress:
  - from:
    - podSelector: {}   # Pods in the same namespace
  policyTypes:
  - Ingress
RBAC isolation
# Role: namespace-scoped permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: namespace-admin
  namespace: my-namespace
rules:
- apiGroups: [""]
  resources: ["*"]
  verbs: ["*"]
- apiGroups: ["apps"]
  resources: ["*"]
  verbs: ["*"]
---
# RoleBinding: bind the Role to a user
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: namespace-admin-binding
  namespace: my-namespace
subjects:
- kind: User
  name: dev-user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: namespace-admin
  apiGroup: rbac.authorization.k8s.io
7.3 Cross-Namespace Access
Accessing a Service in another namespace
# Service in namespace A
apiVersion: v1
kind: Service
metadata:
  name: backend-service
  namespace: namespace-a
spec:
  selector:
    app: backend
  ports:
  - port: 8080
---
# Pod in namespace B that calls it
apiVersion: v1
kind: Pod
metadata:
  name: frontend
  namespace: namespace-b
spec:
  containers:
  - name: app
    image: nginx:1.25
    env:
    - name: BACKEND_URL
      value: "http://backend-service.namespace-a.svc.cluster.local:8080"
Service DNS format:
<service-name>.<namespace>.svc.cluster.local
Example:
backend-service.namespace-a.svc.cluster.local
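The name format above is easy to build programmatically. The helpers below are hypothetical, and they assume the default cluster domain `cluster.local`, which a cluster administrator can change.

```python
def service_fqdn(service: str, namespace: str = "default", cluster_domain: str = "cluster.local") -> str:
    """Fully qualified DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def srv_record_name(port_name: str, protocol: str, service: str, namespace: str = "default") -> str:
    """Name of the SRV record for a named Service port."""
    return f"_{port_name}._{protocol}.{service_fqdn(service, namespace)}"


print(service_fqdn("backend-service", "namespace-a"))
print(srv_record_name("http", "tcp", "myapp"))
```

Inside the same namespace the short name `backend-service` also resolves, thanks to the search domains in the Pod's resolv.conf.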
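Cross-namespace resource references
# Secrets cannot be referenced across namespaces (manifest fragment)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: namespace-b
spec:
  template:
    spec:
      imagePullSecrets:
      - name: registry-secret   # must be a Secret in the same namespace
# ConfigMaps and Secrets cannot be referenced across namespaces.
# Create a copy in the namespace that needs them.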
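7.4 System Namespaces
| Namespace | Purpose | Contents |
|---|---|---|
| `default` | Default namespace | User resources |
| `kube-system` | System namespace | System components (kube-proxy, CoreDNS, and so on) |
| `kube-public` | Public namespace | Publicly readable resources (such as cluster info) |
| `kube-node-lease` | Node leases | Node heartbeat data |
# List namespaces
kubectl get namespaces
# List the components in kube-system
kubectl get pods -n kube-system
# Show cluster info (kube-public)
kubectl get configmap cluster-info -n kube-public -o yaml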
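8. DNS Configuration and Service Discovery
8.1 DNS Policy
dnsPolicy values
apiVersion: v1
kind: Pod
metadata:
  name: dns-policy-demo
spec:
  dnsPolicy: ClusterFirst   # DNS policy
  containers:
  - name: app
    image: nginx:1.25
dnsPolicy values explained:
| Value | Description | Typical use |
|---|---|---|
| `ClusterFirst` | Use the cluster DNS first; other names are forwarded upstream | The default, suitable for most applications |
| `ClusterFirstWithHostNet` | Use the cluster DNS when running on the host network | Pods with hostNetwork: true |
| `Default` | Inherit the DNS configuration of the node | Pods that only need to resolve external names |
| `None` | Ignore cluster DNS settings and use dnsConfig only | Fully custom DNS |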
各策略对比
8.2 自定义DNS配置
dnsConfig详解
apiVersion: v1
kind: Pod
metadata:
name: dns-config-demo
spec:
dnsPolicy: None # 必须设置为None才能使用dnsConfig
dnsConfig:
nameservers:
- 8.8.8.8
- 8.8.4.4
searches:
- mydomain.local
- example.com
options:
- name: ndots
value: "2"
- name: timeout
value: "3"
- name: attempts
value: "2"
containers:
- name: app
image: nginx:1.25
生成的/etc/resolv.conf:
nameserver 8.8.8.8
nameserver 8.8.4.4
search mydomain.local example.com
options ndots:2 timeout:3 attempts:2
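The `ndots` and `search` settings decide which names are actually sent to the nameserver. The sketch below prints the candidate names in the order a glibc-style resolver would try them. `query_candidates` is a hypothetical helper written for illustration, and other resolver implementations (musl, for example) may behave differently.

```shell
# Print, in order, the names a glibc-style resolver would try for a lookup.
# query_candidates is a hypothetical helper for illustration only.
#   $1 = name being resolved
#   $2 = ndots value from resolv.conf
#   $3 = space-separated search list from resolv.conf
query_candidates() {
  name="$1"
  ndots="$2"
  search="$3"

  # A trailing dot marks the name as fully qualified: no search list is used.
  case "$name" in
    *.) printf '%s\n' "$name"; return 0 ;;
  esac

  # Count the dots in the name.
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))

  # With at least ndots dots, the name is tried as-is first.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s\n' "$name"
  fi
  for domain in $search; do
    printf '%s.%s\n' "$name" "$domain"
  done
  # With fewer dots, the name is tried as-is only after the search list.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s\n' "$name"
  fi
}

query_candidates api 2 "mydomain.local example.com"
# prints:
#   api.mydomain.local
#   api.example.com
#   api
```

With the configuration above, a name such as `www.example.org` already has two dots and is therefore tried as-is before any search domain is appended.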
hostAliases configuration
apiVersion: v1
kind: Pod
metadata:
  name: hostaliases-demo
spec:
  hostAliases:
  - ip: "127.0.0.1"
    hostnames:
    - "foo.local"
    - "bar.local"
  - ip: "192.168.1.100"
    hostnames:
    - "myapp.local"
  containers:
  - name: app
    image: nginx:1.25
Resulting /etc/hosts:
# Kubernetes-managed hosts file
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
...
# Entries added by HostAliases.
127.0.0.1 foo.local bar.local
192.168.1.100 myapp.local
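8.3 Service discovery mechanisms
DNS-based service discovery
DNS record types
| Record type | Format | Description |
|---|---|---|
| A record | <service>.<ns>.svc.cluster.local | ClusterIP of the Service |
| SRV record | _<port-name>._<proto>.<service>.<ns>.svc.cluster.local | Port information for a named Service port |
| A record | <pod-hostname>.<service>.<ns>.svc.cluster.local | IP of an individual Pod behind a headless Service |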
DNS query examples:
# Look up a Service
nslookup kubernetes.default.svc.cluster.local
# Look up a Pod behind a headless Service
nslookup myapp-0.myapp-headless.default.svc.cluster.local
# Look up an SRV record
nslookup -type=SRV _http._tcp.myapp.default.svc.cluster.local
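Environment-variable service discovery
apiVersion: v1
kind: Pod
metadata:
  name: env-service-discovery
spec:
  enableServiceLinks: true  # defaults to true
  containers:
  - name: app
    image: nginx:1.25
# Environment variables injected automatically (the IP values are examples).
# Only Services that already exist when the Pod starts are injected.
# The KUBERNETES_* variables for the API server are injected even when
# enableServiceLinks is false.
# KUBERNETES_SERVICE_HOST=10.96.0.1
# KUBERNETES_SERVICE_PORT=443
# KUBERNETES_PORT=tcp://10.96.0.1:443
# KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
# KUBERNETES_PORT_443_TCP_PROTO=tcp
# KUBERNETES_PORT_443_TCP_PORT=443
# KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1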
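The variable names follow a fixed rule: the Service name is upper-cased and dashes become underscores. The helper below is a small sketch of that rule; `service_env_names` is a hypothetical function name and it only prints the two most commonly used variables.

```shell
# Derive the discovery variable names for a Service name.
# service_env_names is a hypothetical helper. It applies the naming rule
# for Service environment variables: upper-case, dashes to underscores.
service_env_names() {
  prefix=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '-' '_')
  printf '%s_SERVICE_HOST\n%s_SERVICE_PORT\n' "$prefix" "$prefix"
}

service_env_names backend-service
# prints:
#   BACKEND_SERVICE_SERVICE_HOST
#   BACKEND_SERVICE_SERVICE_PORT
```

Because the variables are a snapshot taken at Pod start, DNS is usually the more robust choice; the variables are mainly useful for software that cannot do DNS lookups.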
8.4 DNS debugging
Common debugging commands
# 1. Check CoreDNS status
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns
# 2. Inspect the CoreDNS configuration
kubectl get configmap coredns -n kube-system -o yaml
# 3. Debug with nslookup
kubectl exec -it my-pod -- nslookup kubernetes.default
# 4. Debug with dig
kubectl exec -it my-pod -- dig kubernetes.default.svc.cluster.local
# 5. Check the Pod's DNS configuration
kubectl exec -it my-pod -- cat /etc/resolv.conf
# 6. Test DNS resolution from a throwaway Pod
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- nslookup kubernetes.default
DNS troubleshooting flow
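One reasonable first step is to confirm that the Pod's resolver actually points at the cluster DNS before looking at CoreDNS itself. The sketch below only parses `resolv.conf` text. `check_resolv_conf` is a hypothetical helper, and the expected cluster DNS address has to be supplied by the caller.

```shell
# Read resolv.conf content from stdin and report whether the expected
# cluster DNS address is configured. check_resolv_conf is a hypothetical helper.
#   $1 = expected cluster DNS IP (for example the ClusterIP of kube-dns)
# Exit status is 0 on a match and 1 otherwise.
check_resolv_conf() {
  expected="$1"
  awk -v want="$expected" '
    $1 == "nameserver" { if ($2 == want) found = 1; seen = seen " " $2 }
    END {
      if (found) { print "ok: nameserver " want " is configured"; exit 0 }
      print "mismatch: found nameservers:" seen
      exit 1
    }'
}
```

Assuming the cluster DNS Service is named `kube-dns`, as it is on kubeadm clusters, it could be used like this: `kubectl exec my-pod -- cat /etc/resolv.conf | check_resolv_conf "$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')"`. A mismatch points at dnsPolicy or dnsConfig; a match means the next place to look is CoreDNS itself.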
DNS debug Pod
apiVersion: v1
kind: Pod
metadata:
  name: dns-debug
spec:
  containers:
  - name: dns-debug
    image: nicolaka/netshoot:latest
    command: ['sleep', '3600']
# Usage:
# kubectl exec -it dns-debug -- dig kubernetes.default.svc.cluster.local
# kubectl exec -it dns-debug -- nslookup kubernetes.default
# kubectl exec -it dns-debug -- drill kubernetes.default.svc.cluster.local
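9. Summary and best practices
9.1 Pod design best practices
| Scenario | Best practice | Rationale |
|---|---|---|
| Resource limits | Always set requests and limits | Prevents resource contention and determines the QoS class |
| Health checks | Configure liveness and readiness probes | Automatic failure recovery and traffic management |
| Graceful termination | Set a preStop hook and a termination grace period | Lets connections close cleanly |
| Image management | Pin a specific version tag; avoid latest | Reproducible deployments |
| Configuration management | Use ConfigMap/Secret | Keeps configuration separate from the image |
| Security | Set securityContext | Principle of least privilege |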
9.2 Resource configuration template
apiVersion: v1
kind: Pod
metadata:
  name: best-practice-pod
  labels:
    app: myapp
    version: v1.0.0
spec:
  serviceAccountName: myapp-sa
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    image: myapp:v1.0.0
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 512Mi
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 30
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 15
      timeoutSeconds: 5
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 3
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 15"]
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: config
      mountPath: /app/config
      readOnly: true
  volumes:
  - name: tmp
    emptyDir: {}
  - name: config
    configMap:
      name: myapp-config
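The probe and termination settings in the template interact, and the resulting time budgets are easy to work out. The snippet below is a small worked calculation using the values from the manifest above; the variable names are chosen here for illustration.

```shell
# Worked arithmetic for the template's timing settings.
# The numbers are copied from the manifest above.
startup_period=10        # startupProbe.periodSeconds
startup_failures=30      # startupProbe.failureThreshold
grace_period=60          # terminationGracePeriodSeconds
prestop_sleep=15         # sleep in the preStop hook

# Longest time the container may take to start before it is restarted.
startup_budget=$(( startup_period * startup_failures ))

# The preStop hook runs inside the grace period, so this is what is
# left for the application to handle SIGTERM.
shutdown_budget=$(( grace_period - prestop_sleep ))

echo "startup budget: ${startup_budget}s"
echo "shutdown budget after preStop: ${shutdown_budget}s"
# prints:
#   startup budget: 300s
#   shutdown budget after preStop: 45s
```

If the application needs longer than the remaining shutdown budget to drain, terminationGracePeriodSeconds should be raised rather than the preStop sleep shortened.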
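9.3 Troubleshooting checklist
| Problem | Diagnostic command | Likely causes |
|---|---|---|
| Pod stuck in Pending | kubectl describe pod | Insufficient resources, scheduling constraints |
| Image pull failure | kubectl describe pod | Authentication failure, network problems |
| Container restarts frequently | kubectl logs --previous | Application crash, failing probe |
| DNS resolution failure | kubectl exec -- nslookup | CoreDNS failure, misconfiguration |
| Service unreachable | kubectl get endpoints | Selector mismatch, failing probe |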
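The first three rows of the checklist can be turned into a simple lookup keyed on the STATUS column of `kubectl get pods`. This is only a sketch: `next_step` is a hypothetical helper, and it covers just the statuses listed in the `case` statement.

```shell
# Map a value from the STATUS column of `kubectl get pods` to the first
# command worth running. next_step is a hypothetical helper.
#   $1 = status string, $2 = pod name
next_step() {
  status="$1"
  pod="$2"
  case "$status" in
    Pending|ErrImagePull|ImagePullBackOff)
      # Scheduling and image pull problems show up in the Pod's events.
      echo "kubectl describe pod $pod" ;;
    CrashLoopBackOff)
      # The logs of the previous container instance show why it exited.
      echo "kubectl logs $pod --previous" ;;
    *)
      echo "kubectl get pod $pod -o wide" ;;
  esac
}

next_step CrashLoopBackOff web-0
# prints kubectl logs web-0 --previous
```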