Pod Core Principles and Lifecycle Management

紫丁香 · Published 2026-05-02 21:11:03

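1. Pod Data Structures in Depth

1.1 Core Pod Concepts

A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers together with storage resources, a unique network IP, and options that control how the containers run.

The Nature of a Pod

Four things are shared inside a Pod:

Shared resource	Description	How it is implemented
Network namespace	All containers share one IP address and port space	Pause container (infrastructure container)
Volumes	All containers can access the same volumes	Volume mounts
IPC namespace	Containers can communicate through semaphores and shared memory	Pause container
UTS namespace	All containers share one hostname	Pause container

The PID namespace is not shared by default; containers see each other's processes only when `shareProcessNamespace: true` is set.

The Role of the Pause Container

The pause container is the infrastructure container of every Pod. It is responsible for:

Holding the network namespace: every container in the Pod joins the pause container's network namespace
Holding the IPC namespace: this enables inter-process communication between containers
Reaping zombie processes: when `shareProcessNamespace: true` is set, it runs as PID 1 and reaps orphaned processes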

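# Inspect the pause container
# Note: with containerd or CRI-O the pause container is the Pod sandbox, listed by `crictl pods`;
# the k8s_POD_* naming in the sample below comes from the Docker runtime.
crictl ps | grep pause

# Example output
# CONTAINER ID   IMAGE                             NAME                STATE
# abc123         registry.k8s.io/pause:3.9         k8s_POD_nginx_xxx   Running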

1.2 Key PodSpec Fields

PodSpec defines the desired state of a Pod and contains many important configuration fields.

Field structure overview

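# Schematic overview of PodSpec fields; empty values are placeholders, not a manifest to apply as-is
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  namespace: default
  labels:
    app: example
  annotations:
    description: "Example Pod configuration"
spec:
  # ===== Containers =====
  containers: []          # Required: application containers (at least one)
  initContainers: []      # Optional: init containers
  ephemeralContainers: [] # Optional: ephemeral containers

  # ===== Scheduling =====
  nodeName: ""            # Bind directly to a named node
  nodeSelector: {}        # Node label selector
  affinity: {}            # Affinity rules
  tolerations: []         # Tolerations
  topologySpreadConstraints: []  # Topology spread constraints
  priorityClassName: ""   # PriorityClass name
  priority: 0             # Priority value (populated from priorityClassName)
  preemptionPolicy: "PreemptLowerPriority"  # Preemption policy

  # ===== Runtime control =====
  restartPolicy: "Always" # Restart policy: Always/OnFailure/Never
  # activeDeadlineSeconds: 3600  # Optional deadline; must be a positive integer when set
  terminationGracePeriodSeconds: 30  # Termination grace period

  # ===== Networking =====
  hostNetwork: false      # Use the host network
  hostPID: false          # Use the host PID namespace
  hostIPC: false          # Use the host IPC namespace
  shareProcessNamespace: false  # Share one process namespace between containers
  dnsPolicy: "ClusterFirst"  # DNS policy
  dnsConfig: {}           # Custom DNS configuration
  hostAliases: []         # Extra /etc/hosts entries
  enableServiceLinks: true  # Inject Service environment variables

  # ===== Security =====
  serviceAccountName: "default"  # ServiceAccount name
  automountServiceAccountToken: true  # Mount the ServiceAccount token automatically
  securityContext: {}     # Pod-level security context

  # ===== Storage =====
  volumes: []             # Volume definitions

  # ===== Resources =====
  overhead: {}            # Pod overhead (set through RuntimeClass)

  # ===== Miscellaneous =====
  hostname: ""            # Hostname
  subdomain: ""           # Subdomain
  schedulerName: "default-scheduler"  # Scheduler name
  imagePullSecrets: []    # Image pull Secrets
  runtimeClassName: ""    # RuntimeClass name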

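Fields by category

Container fields:

Field	Type	Required	Description
`containers`	[]Container	✅	Application containers; at least one
`initContainers`	[]Container	❌	Init containers, started in order
`ephemeralContainers`	[]EphemeralContainer	❌	Ephemeral containers, used for debugging

Scheduling fields:

Field	Type	Description
`nodeName`	string	Binds the Pod directly to a node, bypassing the scheduler
`nodeSelector`	map[string]string	Simple node label selection
`affinity`	Affinity	Advanced affinity rules
`tolerations`	[]Toleration	Taint tolerations
`topologySpreadConstraints`	[]TopologySpreadConstraint	Topology spread constraints
`priorityClassName`	string	Reference to a PriorityClass
`priority`	int32	Priority value, populated from `priorityClassName` (user-defined classes may not exceed 1000000000)
`preemptionPolicy`	string	Preemption policy

Runtime control fields:

Field	Type	Default	Description
`restartPolicy`	string	Always	Restart policy
`activeDeadlineSeconds`	int64	unset	Maximum time the Pod may stay active, in seconds; must be positive when set
`terminationGracePeriodSeconds`	int64	30	Graceful termination period, in seconds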

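1.3 The PodStatus State Machine

PodStatus describes the current state of a Pod, including its phase, conditions, and container statuses.

Phase transition diagram

A Pod starts in Pending, moves to Running once its containers start, and ends in Succeeded or Failed; Unknown is reported when the node cannot be reached.

Phase details

Phase	Description	Typical cause
Pending	The Pod has been accepted, but its containers are not yet running	Waiting for scheduling, image pulls, or volume mounts
Running	The Pod is bound to a node and all containers have been created	At least one container is running, starting, or restarting
Succeeded	All containers terminated successfully and will not be restarted	restartPolicy is Never or OnFailure and every container exited with code 0
Failed	All containers have terminated and at least one failed	A container exited non-zero or was killed by the system, and will not be restarted
Unknown	The Pod state cannot be obtained	Communication with the node failed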

PodConditions in detail

status:
  conditions:
  - type: PodScheduled
    status: "True"
    lastProbeTime: null
    lastTransitionTime: "2024-01-01T10:00:00Z"
    reason: PodScheduled
    message: "Successfully assigned default/nginx-pod to node-1"
  
  - type: Initialized
    status: "True"
    lastProbeTime: null
    lastTransitionTime: "2024-01-01T10:00:05Z"
    reason: PodCompleted
    message: "All init containers completed successfully"
  
  - type: Ready
    status: "True"
    lastProbeTime: null
    lastTransitionTime: "2024-01-01T10:00:15Z"
    reason: PodReady
    message: "Pod is ready"
  
  - type: ContainersReady
    status: "True"
    lastProbeTime: null
    lastTransitionTime: "2024-01-01T10:00:15Z"
    reason: ContainersReady
    message: "All containers are ready"
  
  - type: DisruptionTarget
    status: "False"
    lastProbeTime: null
    lastTransitionTime: "2024-01-01T10:00:00Z"

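Condition types:

Condition	Meaning	Becomes True when
`PodScheduled`	The Pod has been scheduled to a node	The scheduler binds the Pod to a node
`Initialized`	All init containers have completed	Every init container exits successfully
`ContainersReady`	All containers are ready	Every container passes its readiness probe
`Ready`	The Pod can serve requests	ContainersReady is True and all readiness gates, if any, are True
`DisruptionTarget`	The Pod is about to be terminated because of a disruption	A disruption such as preemption or API-initiated eviction targets the Pod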
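
The dependency of `Ready` on additional conditions can be made concrete with a readiness gate. The manifest below is a minimal sketch: the Pod name and the condition type `example.com/load-balancer-attached` are hypothetical, and an external controller is assumed to set that condition in `status.conditions`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-gate-demo          # hypothetical name
spec:
  readinessGates:
  # Hypothetical custom condition; some external controller must patch it
  # into status.conditions, the kubelet never sets it by itself
  - conditionType: "example.com/load-balancer-attached"
  containers:
  - name: app
    image: nginx:1.25
```

Until the custom condition is reported as True, the Pod stays not Ready even if ContainersReady is already True.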

ContainerStatus in detail

status:
  containerStatuses:
  - name: nginx
    state:
      running:
        startedAt: "2024-01-01T10:00:10Z"
    lastState: {}
    ready: true
    restartCount: 0
    image: nginx:1.25
    imageID: docker-pullable://nginx@sha256:xxx
    containerID: containerd://xxx
    started: true
    
  initContainerStatuses:
  - name: init-myservice
    state:
      terminated:
        exitCode: 0
        reason: Completed
        startedAt: "2024-01-01T10:00:01Z"
        finishedAt: "2024-01-01T10:00:05Z"
    lastState: {}
    ready: true
    restartCount: 0
    image: busybox:1.36
    imageID: docker-pullable://busybox@sha256:xxx
    containerID: containerd://xxx

1.4 Field Defaults and Validation Rules

Default values

Field	Default	When it is set
`restartPolicy`	Always	At creation
`terminationGracePeriodSeconds`	30	At creation
`dnsPolicy`	ClusterFirst	At creation
`serviceAccountName`	default	At creation
`automountServiceAccountToken`	true	At creation
`enableServiceLinks`	true	At creation
`shareProcessNamespace`	false	At creation
`hostNetwork`	false	At creation
`hostPID`	false	At creation
`hostIPC`	false	At creation

Validation rules

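// Illustrative sketch of Pod validation rules; this is not the API server's actual code.
// The `validate` tags follow the style of the go-playground/validator library.
type PodValidation struct {
    // containers: at least one entry is required
    Containers []string `json:"containers" validate:"min=1"`

    // restartPolicy: must be one of the enumerated values
    RestartPolicy string `json:"restartPolicy" validate:"oneof=Always OnFailure Never"`

    // terminationGracePeriodSeconds: must be non-negative when set
    TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty" validate:"omitempty,gte=0"`

    // activeDeadlineSeconds: must be a positive integer when set
    ActiveDeadlineSeconds *int64 `json:"activeDeadlineSeconds,omitempty" validate:"omitempty,gt=0"`

    // priority: user-defined PriorityClass values may not exceed 1000000000
    Priority int32 `json:"priority" validate:"lte=1000000000"`

    // dnsPolicy: must be one of the enumerated values
    DNSPolicy string `json:"dnsPolicy" validate:"oneof=ClusterFirst ClusterFirstWithHostNet Default None"`
}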

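2. Container Definition and Parameters

2.1 Required Parameters and Image Configuration

Required parameters

Every container definition must include the following parameters:

Parameter	Type	Description	Example
`name`	string	Container name, unique within the Pod	`nginx`
`image`	string	Container image	`nginx:1.25`

Image name format

[registry/][namespace/]repository[:tag|@digest]

Examples:
- nginx:1.25                    # Official Docker Hub image
- library/nginx:1.25            # Official Docker Hub image (with namespace)
- docker.io/library/nginx:1.25  # Fully qualified name
- quay.io/prometheus/prometheus:v2.45.0  # Another registry
- harbor.example.com/myproject/myapp:v1.0.0  # Private registry
- nginx@sha256:abc123...        # Pinned by digest

imagePullPolicy in detail

imagePullPolicy values:

Value	Description	When it is the default
`Always`	Pull the image every time the container starts	The tag is `latest` or omitted
`Never`	Never pull; use only a local image	-
`IfNotPresent`	Pull only if the image is not present locally	A tag other than `latest`, or a digest, is given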

apiVersion: v1
kind: Pod
metadata:
  name: image-pull-policy-demo
spec:
  containers:
  - name: always-pull
    image: nginx:latest
    imagePullPolicy: Always  # Pull on every start

  - name: never-pull
    image: nginx:1.25
    imagePullPolicy: Never   # Never pull; the image must exist locally

  - name: if-not-present
    image: nginx:1.25
    imagePullPolicy: IfNotPresent  # Pull only when missing locally

  - name: default-latest
    image: nginx:latest
    # imagePullPolicy defaults to Always

  - name: default-tagged
    image: nginx:1.25
    # imagePullPolicy defaults to IfNotPresent

2.2 Commands and Arguments

How command and args relate

Mapping between Docker and Kubernetes fields:

Docker	Kubernetes	Description
ENTRYPOINT	command	The executable
CMD	args	The argument list

apiVersion: v1
kind: Pod
metadata:
  name: command-args-demo
spec:
  containers:
  - name: command-demo
    image: debian:bookworm
    # Overrides both ENTRYPOINT and CMD
    command: ["/bin/sh"]
    args: ["-c", "echo 'Hello Kubernetes' && sleep 3600"]

  - name: args-only
    image: nginx:1.25
    # Overrides only CMD; the image's ENTRYPOINT (/docker-entrypoint.sh) is kept
    args: ["nginx", "-g", "daemon off;"]

  - name: command-only
    image: debian:bookworm
    # Overrides ENTRYPOINT; the image's CMD is ignored when command is set without args
    command: ["/bin/echo"]

  - name: inherit-all
    image: nginx:1.25
    # Uses the image's ENTRYPOINT and CMD

Configuring workingDir

apiVersion: v1
kind: Pod
metadata:
  name: workingdir-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    workingDir: /app  # Working directory for the command below
    command: ["./start.sh"]

2.3 Environment Variables

Ways to set environment variables

apiVersion: v1
kind: Pod
metadata:
  name: env-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    env:
    # Literal value
    - name: ENV_VAR_1
      value: "value1"
    
    # From a ConfigMap key
    - name: ENV_VAR_2
      valueFrom:
        configMapKeyRef:
          name: my-config
          key: config-key
    
    # From a Secret key
    - name: ENV_VAR_3
      valueFrom:
        secretKeyRef:
          name: my-secret
          key: secret-key
    
    # From Pod fields (Downward API)
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    
    # From container resource settings
    - name: CPU_LIMIT
      valueFrom:
        resourceFieldRef:
          containerName: app
          resource: limits.cpu
    
    - name: MEM_REQUEST
      valueFrom:
        resourceFieldRef:
          containerName: app
          resource: requests.memory
    
    envFrom:
    # Import every key of a ConfigMap
    - configMapRef:
        name: app-config
      prefix: CONFIG_  # Optional prefix

    # Import every key of a Secret
    - secretRef:
        name: app-secret
      prefix: SECRET_

Environment variable precedence
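
When the same variable name is defined more than once, entries under `env` take precedence over `envFrom`, and among several `envFrom` sources the last one listed wins. The sketch below illustrates this; the ConfigMap names and their contents are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: env-precedence-demo      # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.25
    envFrom:
    - configMapRef:
        name: base-config        # hypothetical; assumed to define LOG_LEVEL=info
    - configMapRef:
        name: override-config    # hypothetical; assumed to define LOG_LEVEL=debug
    env:
    - name: LOG_LEVEL
      value: "warn"              # explicit env entry wins over both envFrom sources
```

Under these assumptions the container sees `LOG_LEVEL=warn`; without the `env` entry it would see `debug`, because `override-config` is listed after `base-config`.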

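2.4 Resource Management

Resource types and units

Resource	Units	Notes
CPU	`m` (millicores) or whole numbers (cores)	1000m = 1 core
Memory	`Ki`, `Mi`, `Gi` (binary) or `K`, `M`, `G` (decimal)	1Mi = 1024Ki = 1048576 bytes
Ephemeral storage	`Ki`, `Mi`, `Gi`	ephemeral-storage
Extended resources	Integers	For example `nvidia.com/gpu: 1`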

requests and limits in detail

apiVersion: v1
kind: Pod
metadata:
  name: resources-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    resources:
      # Requests (used for scheduling)
      requests:
        cpu: "250m"      # 0.25 core
        memory: "64Mi"   # 64 MiB
        ephemeral-storage: "1Gi"

      # Limits (runtime upper bound)
      limits:
        cpu: "500m"      # 0.5 core
        memory: "128Mi"  # 128 MiB
        ephemeral-storage: "2Gi"
        nvidia.com/gpu: 1  # Extended resource

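What requests and limits do: the scheduler uses requests to pick a node with enough allocatable capacity, while limits are enforced at runtime on the node. A container that exceeds its CPU limit is throttled; one that exceeds its memory limit is OOM-killed.

CPU throttling mechanism

The CPU limit is enforced through the CFS bandwidth controller: in every period the container may run for at most its quota.

CPU throttling example:

# Inspect the CFS quota (cgroup v1 path with the cgroupfs driver; on cgroup v2 both values are in cpu.max)
cat /sys/fs/cgroup/cpu/kubepods/burstable/podxxx/cpu.cfs_quota_us
# Output: 50000 (500m = 50ms of CPU time per 100ms period)

cat /sys/fs/cgroup/cpu/kubepods/burstable/podxxx/cpu.cfs_period_us
# Output: 100000 (100ms period)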

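2.5 Lifecycle Hooks

Hook types

Kubernetes provides two container lifecycle hooks, postStart and preStop.

Hook handler types

Each hook is implemented by a handler such as exec (run a command in the container) or httpGet (send an HTTP request to the container).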

apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  terminationGracePeriodSeconds: 60  # Termination grace period
  containers:
  - name: app
    image: nginx:1.25
    lifecycle:
      # Runs right after the container is created
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo 'Container started' > /var/log/start.log"]

      # Runs before the container is terminated
      preStop:
        exec:
          command: ["/bin/sh", "-c", "nginx -s quit; sleep 10"]

      # HTTP handler alternative (replaces the exec handler above; host defaults to the Pod IP)
      # postStart:
      #   httpGet:
      #     path: /startup
      #     port: 8080
      #     scheme: HTTP

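Hook timing and caveats

Hook	When it runs	Blocking behavior	On failure
`postStart`	Immediately after the container is created	The container does not reach the Running state until the handler returns	The container is killed and handled according to restartPolicy
`preStop`	Before the container is terminated by an API request or management event	The stop signal is sent only after the handler returns or the grace period expires	An event is recorded and termination continues

Notes:

postStart runs asynchronously with the container's entrypoint, with no ordering guarantee, but the container is considered started only after the handler completes
preStop counts against terminationGracePeriodSeconds; if it is still running when the grace period expires, the container is killed
A failed postStart hook kills the container, whereas a failed preStop hook only produces an event
Hooks should be idempotent, because they may be invoked more than once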

2.6 Interactive Containers

Interactive parameters

apiVersion: v1
kind: Pod
metadata:
  name: interactive-demo
spec:
  containers:
  - name: interactive
    image: debian:bookworm
    stdin: true        # Keep stdin open
    stdinOnce: false   # Allow stdin to be attached more than once (default false)
    tty: true          # Allocate a TTY
    command: ["/bin/bash"]

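Parameter reference:

Parameter	Type	Default	Description
`stdin`	bool	false	Keep standard input open
`stdinOnce`	bool	false	Whether the runtime closes stdin after the first attach session ends; when false, stdin stays open across sessions
`tty`	bool	false	Allocate a pseudo-terminal

Usage:

# Attach to the interactive container
kubectl attach -it interactive-demo -c interactive

# Open a new shell in the container with exec
kubectl exec -it interactive-demo -c interactive -- /bin/bash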

Termination messages

apiVersion: v1
kind: Pod
metadata:
  name: termination-message-demo
spec:
  containers:
  - name: app
    image: debian:bookworm
    command: ["/bin/sh", "-c"]
    args:
    - |
      echo "Application starting..."
      # Simulate an error
      echo "Error: Database connection failed" > /dev/termination-log
      exit 1
    
    # Termination message settings
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File  # File or FallbackToLogsOnError

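terminationMessagePolicy values:

Value	Description
`File`	Read the message from the file at terminationMessagePath
`FallbackToLogsOnError`	If the file is empty and the container exited with an error, use the tail of the container log instead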
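
The fallback policy can be sketched as follows. The Pod name is hypothetical, and the container deliberately writes nothing to the termination log file, so the message is taken from its log output.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: termination-fallback-demo   # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: app
    image: debian:bookworm
    command: ["/bin/sh", "-c"]
    args:
    - |
      # Nothing is written to /dev/termination-log here
      echo "Error: database connection failed" >&2
      exit 1
    # On a non-zero exit with an empty message file, the log tail becomes the message
    terminationMessagePolicy: FallbackToLogsOnError
```

After the container exits, the text should appear under `state.terminated.message` in the container status, for example via `kubectl get pod termination-fallback-demo -o yaml`.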

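3. Image Pulling and Private Registries

3.1 imagePullPolicy in Detail

Pull policy decision flow

If imagePullPolicy is omitted, the default is derived from the image reference when the object is created: Always for `latest` or a missing tag, IfNotPresent otherwise.

Default behavior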

apiVersion: v1
kind: Pod
metadata:
  name: image-pull-default
spec:
  containers:
  # Case 1: tag is latest, so imagePullPolicy defaults to Always
  - name: latest-tag
    image: nginx:latest
    # Equivalent to imagePullPolicy: Always

  # Case 2: tag omitted, so imagePullPolicy defaults to Always
  - name: no-tag
    image: nginx
    # Equivalent to imagePullPolicy: Always

  # Case 3: specific version tag, so imagePullPolicy defaults to IfNotPresent
  - name: specific-tag
    image: nginx:1.25
    # Equivalent to imagePullPolicy: IfNotPresent

  # Case 4: digest reference, so imagePullPolicy defaults to IfNotPresent
  - name: digest
    image: nginx@sha256:abc123...
    # Equivalent to imagePullPolicy: IfNotPresent

3.2 Authenticating to Private Registries

Creating a Docker registry Secret

# Option 1: create it from credentials
kubectl create secret docker-registry my-registry-secret \
  --docker-server=harbor.example.com \
  --docker-username=admin \
  --docker-password=Harbor12345 \
  --docker-email=admin@example.com \
  -n default

# Option 2: create it from ~/.docker/config.json
kubectl create secret generic my-registry-secret \
  --from-file=.dockerconfigjson=/root/.docker/config.json \
  --type=kubernetes.io/dockerconfigjson \
  -n default

Secret structure

apiVersion: v1
kind: Secret
metadata:
  name: my-registry-secret
  namespace: default
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded form of the JSON shown below>

Decoded content:

{
  "auths": {
    "harbor.example.com": {
      "username": "admin",
      "password": "Harbor12345",
      "email": "admin@example.com",
      "auth": "YWRtaW46SGFyYm9yMTIzNDU="
    }
  }
}

Using imagePullSecrets

# Option 1: configure it on the Pod
apiVersion: v1
kind: Pod
metadata:
  name: private-image-pod
spec:
  imagePullSecrets:
  - name: my-registry-secret
  containers:
  - name: app
    image: harbor.example.com/myproject/myapp:v1.0.0
---
# Option 2: configure it on the ServiceAccount (recommended)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
imagePullSecrets:
- name: my-registry-secret
---
apiVersion: v1
kind: Pod
metadata:
  name: sa-image-pull
spec:
  serviceAccountName: my-service-account
  containers:
  - name: app
    image: harbor.example.com/myproject/myapp:v1.0.0

3.3 Common Private Registry Configurations

Harbor

apiVersion: v1
kind: Secret
metadata:
  name: harbor-secret
  namespace: default
type: kubernetes.io/dockerconfigjson
stringData:
  # stringData is stored literally, so shell substitutions such as $(...) are not evaluated;
  # username and password are sufficient and the auth field can be omitted
  .dockerconfigjson: |
    {
      "auths": {
        "harbor.example.com": {
          "username": "robot$myproject",
          "password": "robot-token-here"
        }
      }
    }
---
apiVersion: v1
kind: Pod
metadata:
  name: harbor-app
spec:
  imagePullSecrets:
  - name: harbor-secret
  containers:
  - name: app
    image: harbor.example.com/myproject/myapp:v1.0.0

Alibaba Cloud ACR

apiVersion: v1
kind: Secret
metadata:
  name: aliyun-acr-secret
  namespace: default
type: kubernetes.io/dockerconfigjson
stringData:
  # As above, no shell substitution happens in stringData; the auth field is omitted
  .dockerconfigjson: |
    {
      "auths": {
        "registry.cn-hangzhou.aliyuncs.com": {
          "username": "your-username",
          "password": "your-password"
        }
      }
    }

AWS ECR

# Log in with an authorization token from the AWS CLI
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 123456789.dkr.ecr.us-west-2.amazonaws.com

# Create the Secret (ECR tokens expire after 12 hours, so the Secret must be refreshed periodically)
kubectl create secret docker-registry aws-ecr-secret \
  --docker-server=123456789.dkr.ecr.us-west-2.amazonaws.com \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region us-west-2)" \
  -n default

Multiple registries in one Secret

apiVersion: v1
kind: Secret
metadata:
  name: multi-registry-secret
  namespace: default
type: kubernetes.io/dockerconfigjson
stringData:
  .dockerconfigjson: |
    {
      "auths": {
        "harbor.example.com": {
          "username": "admin",
          "password": "password1"
        },
        "registry.cn-hangzhou.aliyuncs.com": {
          "username": "user",
          "password": "password2"
        },
        "docker.io": {
          "username": "dockeruser",
          "password": "password3"
        }
      }
    }

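3.4 Troubleshooting Image Pull Failures

Common error types

Error	Meaning	What to check
`ImagePullBackOff`	The pull failed and the kubelet is backing off before retrying	Image name, credentials, network
`ErrImagePull`	The pull failed	Whether the image and tag exist
`ErrImageNeverPull`	imagePullPolicy is Never and the image is not on the node	Preload the image or change the policy
`RegistryUnavailable`	The registry is unavailable	Registry connectivity
`Unauthorized`	Authentication failed (appears in the pull error message)	imagePullSecrets

Troubleshooting flow

Start from the Pod events, then verify the image reference, the pull Secret, and finally connectivity from the node to the registry.

Troubleshooting commands

# 1. Inspect Pod events
kubectl describe pod <pod-name> -n <namespace>

# 2. Inspect Pod status
kubectl get pod <pod-name> -n <namespace> -o yaml

# 3. Check the Secret
kubectl get secret <secret-name> -n <namespace> -o yaml
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d

# 4. Test the pull on the node
crictl pull harbor.example.com/myproject/myapp:v1.0.0

# 5. List the images on the node
crictl images | grep myapp

# 6. Test registry connectivity (a reachable registry answers /v2/ with 200 or 401)
curl -v https://harbor.example.com/v2/

# 7. Test the credentials with docker (the password is prompted for)
docker login harbor.example.com -u admin
docker pull harbor.example.com/myproject/myapp:v1.0.0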

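4. Init Containers and the Sidecar Pattern

4.1 How Init Containers Work

Init container characteristics

Differences between init containers and app containers:

Characteristic	Init containers	App containers
Execution order	Run one at a time, in order	Run concurrently
Exit requirement	Must exit successfully	May run indefinitely
On failure	Retried until they succeed; the Pod is marked Failed if restartPolicy is Never	Handled according to restartPolicy
Probe support	Not supported (native sidecars excepted)	liveness/readiness/startup probes supported
Port declaration	Rarely meaningful, since the container exits before the app starts	Supported

Init container example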

apiVersion: v1
kind: Pod
metadata:
  name: init-container-demo
spec:
  initContainers:
  # Init container 1: wait for a dependency
  - name: wait-for-db
    image: busybox:1.36
    command: ['sh', '-c', 'until nc -z mysql-service 3306; do echo waiting for mysql; sleep 2; done']
  
  # Init container 2: prepare configuration
  - name: init-config
    image: busybox:1.36
    command: ['sh', '-c', 'cp /config-template/* /config/']
    volumeMounts:
    - name: config-template
      mountPath: /config-template
    - name: config
      mountPath: /config
  
  # Init container 3: run database migrations
  - name: db-migration
    image: myapp:migration
    command: ['python', 'manage.py', 'migrate']
    env:
    - name: DATABASE_URL
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: url
  
  containers:
  - name: app
    image: myapp:v1.0.0
    volumeMounts:
    - name: config
      mountPath: /app/config
  
  volumes:
  - name: config-template
    configMap:
      name: app-config-template
  - name: config
    emptyDir: {}

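Resource accounting for init containers

For each resource, the effective request of a Pod is the larger of the highest request among its init containers and the sum of the requests of all app containers.

Worked example: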

apiVersion: v1
kind: Pod
metadata:
  name: resource-calculation-demo
spec:
  initContainers:
  - name: init-1
    image: busybox
    resources:
      requests:
        cpu: 100m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 128Mi
  
  - name: init-2
    image: busybox
    resources:
      requests:
        cpu: 200m    # Highest init-container CPU request
        memory: 32Mi
      limits:
        cpu: 500m    # Highest init-container CPU limit
        memory: 64Mi
  
  containers:
  - name: main-1
    image: nginx
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
  
  - name: main-2
    image: redis
    resources:
      requests:
        cpu: 100m
        memory: 64Mi

# Effective Pod resource requests:
# CPU: max(200m, 100m+100m) = 200m
# Memory: max(64Mi, 128Mi+64Mi) = 192Mi

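4.2 Native Sidecar Containers

Native sidecar support was introduced as an alpha feature in Kubernetes v1.28 and has been enabled by default since v1.29 (beta). A native sidecar is declared as an init container with restartPolicy: Always.

Native sidecar characteristics

Native sidecars compared with traditional sidecars:

Characteristic	Traditional sidecar	Native sidecar (v1.29+)
Where defined	containers	initContainers
Startup order	Started alongside the main containers	Started before the main containers
Lifecycle	Independent of the main containers	Tied to the Pod lifecycle
Restart policy	Follows the Pod's restartPolicy	restartPolicy: Always
Termination order	No ordering guarantee	Terminated last, after the main containers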

Native sidecar example

apiVersion: v1
kind: Pod
metadata:
  name: native-sidecar-demo
spec:
  initContainers:
  # Native sidecar container: log collection agent
  - name: log-collector
    image: fluent/fluent-bit:2.2
    restartPolicy: Always  # Key setting: Always marks this init container as a sidecar
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
    - name: fluent-config
      mountPath: /fluent-bit/etc
  
  # Regular init container: one-off initialization task
  - name: init-task
    image: busybox:1.36
    command: ['sh', '-c', 'echo "Initializing..." && sleep 5']
  
  containers:
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  
  volumes:
  - name: app-logs
    emptyDir: {}
  - name: fluent-config
    configMap:
      name: fluent-bit-config

4.3 Multi-Container Design Patterns

Sidecar pattern

apiVersion: v1
kind: Pod
metadata:
  name: sidecar-pattern
spec:
  containers:
  # Main container: the application
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/nginx
  
  # Sidecar container: log collection
  - name: log-collector
    image: fluent/fluent-bit:2.2
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/nginx
      readOnly: true
  
  volumes:
  - name: shared-logs
    emptyDir: {}

Ambassador pattern

apiVersion: v1
kind: Pod
metadata:
  name: ambassador-pattern
spec:
  containers:
  # Main container: the application
  - name: app
    image: myapp:v1.0.0
    env:
    - name: DB_HOST
      value: "127.0.0.1"  # Connect to the local proxy
    - name: DB_PORT
      value: "3306"
  
  # Ambassador container: database proxy
  - name: db-proxy
    image: envoyproxy/envoy:v1.28
    ports:
    - containerPort: 3306
    volumeMounts:
    - name: envoy-config
      mountPath: /etc/envoy
  
  volumes:
  - name: envoy-config
    configMap:
      name: envoy-db-proxy-config

Adapter pattern

apiVersion: v1
kind: Pod
metadata:
  name: adapter-pattern
spec:
  containers:
  # Main container: the application (writes logs in a custom format)
  - name: app
    image: myapp:v1.0.0
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  
  # Adapter container: converts the log format
  - name: log-adapter
    image: log-adapter:v1.0.0
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
      readOnly: true
    - name: output-logs
      mountPath: /var/log/output
    env:
    - name: INPUT_FORMAT
      value: "custom"
    - name: OUTPUT_FORMAT
      value: "json"
  
  volumes:
  - name: app-logs
    emptyDir: {}
  - name: output-logs
    emptyDir: {}

4.4 Sharing Between Containers

Sharing the process namespace

apiVersion: v1
kind: Pod
metadata:
  name: process-namespace-demo
spec:
  shareProcessNamespace: true  # Share one process namespace across containers
  containers:
  - name: app
    image: nginx:1.25
  
  - name: debugger
    image: busybox:1.36
    command: ['sleep', '3600']
    securityContext:
      capabilities:
        add: ['SYS_PTRACE']  # Allow debugging processes of other containers

Effect once enabled:

# Processes of the app container are visible from the debugger container
kubectl exec -it process-namespace-demo -c debugger -- ps aux

# Example output:
# PID   USER     TIME  COMMAND
# 1     root      0:00 /pause
# 10    root      0:00 nginx: master process
# 20    101       0:00 nginx: worker process
# 30    root      0:00 sleep 3600
# 40    root      0:00 ps aux

Sharing host namespaces

apiVersion: v1
kind: Pod
metadata:
  name: host-namespace-demo
spec:
  hostNetwork: true    # Use the host network
  hostPID: true        # Use the host PID namespace
  hostIPC: true        # Use the host IPC namespace
  containers:
  - name: app
    image: nginx:1.25
    ports:
    - containerPort: 80
      hostPort: 80  # With hostNetwork: true, hostPort must equal containerPort

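Caveats:

Setting	Security risk	Typical use
`hostNetwork: true`	Access to all host network interfaces	Network plugins, monitoring agents
`hostPID: true`	Visibility of all host processes	Debugging tools, monitoring agents
`hostIPC: true`	Access to host IPC resources	Special-purpose applications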

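5. Pod Lifecycle Management

5.1 Phase Transitions

State transitions and their triggers

Current phase	Next phase	Trigger
Pending	Running	The Pod is bound to a node, all containers are created, and at least one is running or starting
Pending	Pending	Scheduling fails (no suitable node); the Pod stays Pending and scheduling is retried
Pending	Pending	An image pull fails; the container waits in ErrImagePull/ImagePullBackOff and the pull is retried
Running	Succeeded	All containers exit with code 0 and will not be restarted (restartPolicy=Never or OnFailure)
Running	Failed	All containers have terminated and at least one exited non-zero without being restarted
Running	Failed	activeDeadlineSeconds is exceeded
Running	Running	A container restarts (restartPolicy=Always/OnFailure)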
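
The deadline-driven transition to Failed can be reproduced with a small manifest. This is a minimal sketch with a hypothetical Pod name: the container would sleep for an hour, but the Pod is only allowed to stay active for 30 seconds.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: deadline-demo            # hypothetical name
spec:
  restartPolicy: Never
  activeDeadlineSeconds: 30      # must be a positive integer
  containers:
  - name: sleeper
    image: busybox:1.36
    command: ["sleep", "3600"]   # runs far longer than the deadline allows
```

Roughly 30 seconds after the Pod starts, the kubelet kills its containers and the Pod reports phase Failed with reason DeadlineExceeded.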

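5.2 Probes in Detail

Probe types

Comparison of the three probes:

Probe	Purpose	On failure	Typical use
`livenessProbe`	Checks whether the container is still alive	The container is restarted	Detecting deadlocks or hung processes
`readinessProbe`	Checks whether the container is ready for traffic	The Pod is removed from Service endpoints	Waiting for dependencies or warm-up
`startupProbe`	Checks whether the container has finished starting	The container is restarted	Slow-starting applications

How the probes work together

While a startupProbe is configured and has not yet succeeded, liveness and readiness probes are not run; once it succeeds, both run periodically for the rest of the container's life.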

5.3 Probe Parameters

Probe parameters in detail

apiVersion: v1
kind: Pod
metadata:
  name: probe-params-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    ports:
    - containerPort: 80
    
    # Liveness probe
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
        scheme: HTTP
        httpHeaders:
        - name: Custom-Header
          value: "probe"
      initialDelaySeconds: 15   # Initial delay (seconds)
      periodSeconds: 10         # Probe interval (seconds)
      timeoutSeconds: 5         # Timeout (seconds)
      successThreshold: 1       # Success threshold
      failureThreshold: 3       # Failure threshold

    # Readiness probe
    readinessProbe:
      httpGet:
        path: /ready
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 3
      successThreshold: 1
      failureThreshold: 3
    
    # Startup probe
    startupProbe:
      httpGet:
        path: /startup
        port: 80
      initialDelaySeconds: 0
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 30  # Waits up to 300 seconds (30*10)

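Parameter reference:

Parameter	Description	Default	Recommended
`initialDelaySeconds`	Delay before the first probe	0	Based on application startup time
`periodSeconds`	Interval between probes	10	5-15 seconds
`timeoutSeconds`	Timeout for a single probe	1	3-5 seconds
`successThreshold`	Consecutive successes required	1	1 (must be 1 for liveness and startup probes)
`failureThreshold`	Consecutive failures required	3	3-5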
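
The parameters combine into simple time budgets. The sketch below annotates that arithmetic; the image `myapp:v1.0.0`, port 8080, and the `/healthz` path are assumptions reused from the other examples in this article, and the figures are approximate because probe timeouts add to each failed attempt.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-timing-demo        # hypothetical name
spec:
  containers:
  - name: app
    image: myapp:v1.0.0          # assumed application image
    startupProbe:
      httpGet:
        path: /healthz           # assumed health endpoint
        port: 8080
      periodSeconds: 5
      failureThreshold: 24       # startup budget: about 24 x 5s = 120s
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      timeoutSeconds: 3
      failureThreshold: 3        # restart after about 3 x 10s = 30s of consecutive failures
```

Raising `failureThreshold` on the startup probe lengthens the startup budget without making the liveness probe slower to react once the application is up.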

Probe handler types

apiVersion: v1
kind: Pod
metadata:
  name: probe-types-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    
    # HTTP GET probe
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
        scheme: HTTP
        httpHeaders:
        - name: Host
          value: "localhost"
    
    # TCP socket probe
    readinessProbe:
      tcpSocket:
        port: 80
    
    # Exec probe
    startupProbe:
      exec:
        command:
        - /bin/sh
        - -c
        - "test -f /app/ready"
    
    # gRPC probe (enabled by default since v1.24)
    # livenessProbe:
    #   grpc:
    #     port: 50051
    #     service: myservice

Probe best practices

apiVersion: v1
kind: Pod
metadata:
  name: probe-best-practice
spec:
  containers:
  - name: app
    image: myapp:v1.0.0
    ports:
    - containerPort: 8080
    
    # Startup probe: for slow-starting applications
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 30  # Waits up to 5 minutes

    # Liveness probe: detects deadlocks
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 0  # Starts only after the startupProbe has succeeded
      periodSeconds: 15
      timeoutSeconds: 5
      failureThreshold: 3
    
    # Readiness probe: checks dependencies
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 3

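5.4 Managing Termination

Pod termination flow

The Pod is marked Terminating and the grace period starts
The Pod is removed from Service endpoints, and the preStop hook runs if one is defined
The container runtime sends SIGTERM to the main process of each container
When the grace period expires, any remaining processes receive SIGKILL
The Pod object is removed from the API server

terminationGracePeriodSeconds in detail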

apiVersion: v1
kind: Pod
metadata:
  name: termination-demo
spec:
  terminationGracePeriodSeconds: 60  # Termination grace period (seconds)
  containers:
  - name: app
    image: nginx:1.25
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - |
            # Shut nginx down gracefully
            nginx -s quit
            # Wait for in-flight connections to finish
            sleep 30

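How the grace period is spent: it covers both the preStop hook and the time the process needs to handle SIGTERM. In the example above, preStop uses about 30 of the 60 seconds.

Force-deleting a Pod

# Normal deletion (waits for the grace period)
kubectl delete pod my-pod

# Forced deletion (skips the grace period)
kubectl delete pod my-pod --force --grace-period=0

# Note: forced deletion can cause data loss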

6. Pod Scheduling Strategies

6.1 nodeSelector and nodeAffinity

Simple selection with nodeSelector

apiVersion: v1
kind: Pod
metadata:
  name: nodeselector-demo
spec:
  nodeSelector:
    disktype: ssd        # Must match
    zone: us-west-1a     # Must match
  containers:
  - name: app
    image: nginx:1.25

Labeling nodes:

# Add node labels
kubectl label nodes node-1 disktype=ssd
kubectl label nodes node-1 zone=us-west-1a

# Show node labels
kubectl get nodes --show-labels

Advanced configuration with nodeAffinity

apiVersion: v1
kind: Pod
metadata:
  name: nodeaffinity-demo
spec:
  affinity:
    nodeAffinity:
      # Must be satisfied (hard requirement)
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
          - key: disktype
            operator: In
            values:
            - ssd
            - nvme
      
      # Preferred (soft requirement)
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - us-west-1a
      - weight: 20
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  
  containers:
  - name: app
    image: nginx:1.25

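Operators:

Operator	Meaning	Example
`In`	The label value is in the list	`zone In [us-west-1a, us-west-1b]`
`NotIn`	The label value is not in the list	`env NotIn [prod]`
`Exists`	The label key exists	`gpu Exists`
`DoesNotExist`	The label key does not exist	`legacy DoesNotExist`
`Gt`	The label value is greater than the given integer	`cpu-cores Gt 8`
`Lt`	The label value is less than the given integer	`memory Lt 64`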
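
The operators other than `In` can be combined as in the following sketch. The label keys `gpu`, `env`, and `cpu-cores` are hypothetical node labels taken from the table above, not labels that Kubernetes sets automatically.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: operator-demo            # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Expressions inside one term are ANDed; separate terms would be ORed
        - matchExpressions:
          - key: gpu             # hypothetical label; Exists takes no values
            operator: Exists
          - key: env
            operator: NotIn
            values: ["prod"]
          - key: cpu-cores
            operator: Gt
            values: ["8"]        # Gt/Lt take exactly one integer, written as a string
  containers:
  - name: app
    image: nginx:1.25
```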

6.2 Pod Affinity and Anti-Affinity

Pod affinity configuration

apiVersion: v1
kind: Pod
metadata:
  name: pod-affinity-demo
spec:
  affinity:
    podAffinity:
      # Must be in the same topology domain as the matching Pods
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: database
        topologyKey: kubernetes.io/hostname
      
      # Prefer the same topology domain as the matching Pods
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: cache
          topologyKey: kubernetes.io/hostname
    
    podAntiAffinity:
      # Must not share a node with the matching Pods
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: frontend
        topologyKey: kubernetes.io/hostname
      
      # Prefer not to share an availability zone with the matching Pods
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: frontend
          topologyKey: topology.kubernetes.io/zone
  
  containers:
  - name: app
    image: nginx:1.25

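Affinity use cases

Typical uses are co-locating an application with its cache or database for low latency (affinity) and spreading replicas of one service across nodes or zones for availability (anti-affinity).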

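6.3 Taints and Tolerations

Taint types

# Add a taint
kubectl taint nodes node-1 key=value:effect

# Remove a taint
kubectl taint nodes node-1 key:effect-

# Show taints
kubectl describe node node-1 | grep Taints

Effects:

Effect	Meaning	Behavior
`NoSchedule`	Do not schedule	New Pods are not scheduled onto the node
`PreferNoSchedule`	Avoid scheduling	The scheduler tries to avoid the node but may still use it when no other node fits
`NoExecute`	Do not schedule, and evict	New Pods are not scheduled; running Pods that do not tolerate the taint are evicted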
Toleration configuration

apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  tolerations:
  # Tolerate one specific taint
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
  
  # Tolerate any value of a given key
  - key: "key2"
    operator: "Exists"
    effect: "NoSchedule"
  
  # Tolerate every taint (not recommended)
  - operator: "Exists"
  
  # Tolerate a NoExecute taint for a limited time
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300  # Stay for 300 seconds after the node becomes unreachable
  
  containers:
  - name: app
    image: nginx:1.25

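Built-in taints

Taint	Meaning	Added automatically when
`node.kubernetes.io/not-ready`	The node is not healthy	The node's Ready condition is False
`node.kubernetes.io/unreachable`	The node is unreachable	The node's Ready condition is Unknown
`node.kubernetes.io/memory-pressure`	Memory pressure	The node is low on memory
`node.kubernetes.io/disk-pressure`	Disk pressure	The node is low on disk space
`node.kubernetes.io/pid-pressure`	PID pressure	The node is low on process IDs
`node.kubernetes.io/network-unavailable`	Network unavailable	The node's network is not configured
`node.kubernetes.io/unschedulable`	Unschedulable	kubectl cordon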
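
Two of these taints matter even for Pods that declare no tolerations. Assuming the DefaultTolerationSeconds admission plugin is enabled with its default settings, which is the usual case, the following tolerations are added to every Pod and are visible in `kubectl get pod <pod-name> -o yaml`:

```yaml
# Fragment of a Pod spec as stored by the API server (added automatically, not written by the user)
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300   # evicted 5 minutes after the node becomes not-ready
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300   # evicted 5 minutes after the node becomes unreachable
```

Setting an explicit toleration for either key with a different `tolerationSeconds` changes how quickly the Pod is evicted from a failed node.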

6.4 Topology Spread Constraints

Configuring topologySpreadConstraints

apiVersion: v1
kind: Pod
metadata:
  name: topology-spread-demo
  labels:
    app: myapp   # Labels belong to Pod metadata and must match the labelSelector below
spec:
  topologySpreadConstraints:
  - maxSkew: 1                    # Maximum skew
    topologyKey: kubernetes.io/hostname  # Topology key (node level)
    whenUnsatisfiable: DoNotSchedule     # Behavior when the constraint cannot be met
    labelSelector:
      matchLabels:
        app: myapp

  - maxSkew: 2
    topologyKey: topology.kubernetes.io/zone  # Topology key (zone level)
    whenUnsatisfiable: ScheduleAnyway   # Best effort
    labelSelector:
      matchLabels:
        app: myapp

  containers:
  - name: app
    image: nginx:1.25

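Parameter reference:

Parameter	Description
`maxSkew`	Maximum allowed difference between the number of matching Pods in any topology domain and the minimum across domains
`topologyKey`	Node label key that defines the topology domains
`whenUnsatisfiable`	What to do when the constraint cannot be satisfied
`labelSelector`	Selects the Pods that are counted

whenUnsatisfiable values:

Value	Description
`DoNotSchedule`	Do not schedule the Pod if the constraint cannot be met (hard requirement)
`ScheduleAnyway`	Try to meet the constraint but schedule anyway (soft requirement)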
Spread constraint example
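
Spread constraints are most useful on a workload with several replicas. The Deployment below is a minimal sketch with a hypothetical name; it assumes a cluster whose nodes carry the standard `topology.kubernetes.io/zone` label and span two zones.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spread-demo              # hypothetical name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp               # must match the labelSelector of the constraint
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: myapp
      containers:
      - name: app
        image: nginx:1.25
```

With two zones and enough capacity, the replicas should end up two per zone. If the zones currently hold 2 and 1 matching Pods, the next Pod has to go to the zone with 1, because placing it in the other zone would give a skew of 3 - 1 = 2, which exceeds maxSkew.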

6.5 Priority and Preemption

Defining a PriorityClass

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
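value: 1000000              # Priority value (user-defined classes may use any 32-bit integer up to 1000000000)
globalDefault: false        # Whether this is the cluster-wide default class
preemptionPolicy: PreemptLowerPriority  # Preemption policy
description: "High-priority applications"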
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: medium-priority
value: 100000
globalDefault: true
preemptionPolicy: PreemptLowerPriority
description: "Medium-priority applications"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 1000
globalDefault: false
preemptionPolicy: Never     # never preempt other Pods
description: "Low-priority applications, non-preempting"

Using a PriorityClass

apiVersion: v1
kind: Pod
metadata:
  name: high-priority-pod
spec:
  priorityClassName: high-priority  # reference to the PriorityClass
  containers:
  - name: app
    image: nginx:1.25

Preemption flow
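
The core idea of preemption can be sketched as follows: when a pending Pod does not fit on a node, check whether it would fit once all lower-priority Pods are removed, then spare as many of those Pods as possible, starting with the highest priority. This is a heavily simplified model under stated assumptions: a single resource (CPU), a single node, and no PodDisruptionBudgets, affinity rules, or graceful termination. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    priority: int
    cpu: int  # requested CPU in millicores
    preemption_policy: str = "PreemptLowerPriority"


@dataclass
class Node:
    name: str
    capacity: int  # allocatable CPU in millicores
    pods: list = field(default_factory=list)


def fits(capacity: int, pods: list, incoming: Pod) -> bool:
    return sum(p.cpu for p in pods) + incoming.cpu <= capacity


def pick_victims(node: Node, incoming: Pod):
    """Return the Pods to evict so that incoming fits on node, or None."""
    if incoming.preemption_policy == "Never":
        return None
    keep = [p for p in node.pods if p.priority >= incoming.priority]
    if not fits(node.capacity, keep, incoming):
        return None  # evicting every lower-priority Pod would still not be enough
    victims = []
    lower = sorted((p for p in node.pods if p.priority < incoming.priority),
                   key=lambda p: p.priority, reverse=True)
    for p in lower:  # spare Pods where possible, highest priority first
        if fits(node.capacity, keep + [p], incoming):
            keep.append(p)
        else:
            victims.append(p)
    return victims


node = Node("node-1", capacity=1000, pods=[
    Pod("low", priority=1000, cpu=400),
    Pod("medium", priority=100000, cpu=400),
])
incoming = Pod("high", priority=1000000, cpu=500)
print([p.name for p in pick_victims(node, incoming)])  # ['low']
```

Using the priority values from the PriorityClass examples above, only the `low` Pod is evicted, because the `medium` Pod still fits next to the incoming one.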

7. Namespaces and Resource Isolation

7.1 Namespace core concepts

Namespace data structure

apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  labels:
    name: my-namespace
    environment: development
  annotations:
    description: "Namespace for the development environment"
spec:
  finalizers:
  - kubernetes  # resources in the namespace are cleaned up before it is deleted
status:
  phase: Active  # Active or Terminating

Namespace lifecycle

7.2 Namespace isolation mechanisms

Resource isolation

Resource type	Namespaced	Cluster-scoped
Pod	✅	-
Service	✅	-
Deployment	✅	-
ConfigMap	✅	-
Secret	✅	-
PVC	✅	-
Node	-	✅
PV	-	✅
StorageClass	-	✅
ClusterRole	-	✅
Namespace	-	✅

Network isolation

# Deny all ingress traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-namespace
spec:
  podSelector: {}  # selects every Pod in the namespace
  policyTypes:
  - Ingress
---
# Allow traffic within the same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: my-namespace
spec:
  podSelector: {}
  ingress:
  - from:
    - podSelector: {}  # any Pod in the same namespace
  policyTypes:
  - Ingress

RBAC isolation

# Role: namespace-scoped permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: namespace-admin
  namespace: my-namespace
rules:
- apiGroups: [""]
  resources: ["*"]
  verbs: ["*"]
- apiGroups: ["apps"]
  resources: ["*"]
  verbs: ["*"]
---
# RoleBinding: binds the Role to a user
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: namespace-admin-binding
  namespace: my-namespace
subjects:
- kind: User
  name: dev-user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: namespace-admin
  apiGroup: rbac.authorization.k8s.io
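
To make the effect of the `rules` above concrete, here is a minimal sketch of how a namespaced Role's rules could be checked against a request. It models only `apiGroups`, `resources`, `verbs`, and the `*` wildcard, and leaves out resourceNames, subresources, and non-resource URLs. The dictionary layout and function names are assumptions for illustration, not the API server's implementation.

```python
def rule_allows(rule: dict, api_group: str, resource: str, verb: str) -> bool:
    def matches(allowed: list, wanted: str) -> bool:
        return "*" in allowed or wanted in allowed

    return (matches(rule["apiGroups"], api_group)
            and matches(rule["resources"], resource)
            and matches(rule["verbs"], verb))


def role_allows(role: dict, namespace: str, api_group: str, resource: str, verb: str) -> bool:
    # A Role only grants access inside its own namespace.
    if role["namespace"] != namespace:
        return False
    return any(rule_allows(r, api_group, resource, verb) for r in role["rules"])


# The namespace-admin Role from above; "" denotes the core API group.
role = {
    "namespace": "my-namespace",
    "rules": [
        {"apiGroups": [""], "resources": ["*"], "verbs": ["*"]},
        {"apiGroups": ["apps"], "resources": ["*"], "verbs": ["*"]},
    ],
}
print(role_allows(role, "my-namespace", "apps", "deployments", "delete"))  # True
print(role_allows(role, "my-namespace", "batch", "jobs", "create"))        # False
print(role_allows(role, "other-namespace", "", "pods", "get"))             # False
```

The second call is denied because the `batch` API group is not listed, and the third because the Role does not apply outside `my-namespace`.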

7.3 Cross-namespace access

Cross-namespace Service access

# Service in namespace A
apiVersion: v1
kind: Service
metadata:
  name: backend-service
  namespace: namespace-a
spec:
  selector:
    app: backend
  ports:
  - port: 8080
---
# Pod in namespace B that accesses the Service
apiVersion: v1
kind: Pod
metadata:
  name: frontend
  namespace: namespace-b
spec:
  containers:
  - name: app
    image: nginx:1.25
    env:
    - name: BACKEND_URL
      value: "http://backend-service.namespace-a.svc.cluster.local:8080"

Service DNS format:

<service-name>.<namespace>.svc.cluster.local

Example:
backend-service.namespace-a.svc.cluster.local

Cross-namespace resource references

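# Secrets and ConfigMaps cannot be referenced across namespaces
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: namespace-b
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      imagePullSecrets:
      - name: registry-secret  # must exist in namespace-b; only Secrets in the same namespace can be referenced
      containers:
      - name: app
        image: nginx:1.25
      
      # A ConfigMap or Secret from another namespace cannot be referenced;
      # create a copy in this namespace instead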

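7.4 System namespaces

Namespace	Description	Contents
`default`	Default namespace	User resources
`kube-system`	System namespace	System components (kube-proxy, CoreDNS, etc.)
`kube-public`	Public namespace	Public resources (such as cluster information)
`kube-node-lease`	Node leases	Node heartbeat data

# List the namespaces, including the system ones
kubectl get namespaces

# List the components in kube-system
kubectl get pods -n kube-system

# View cluster information (kube-public)
kubectl get configmap cluster-info -n kube-public -o yaml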

8. DNS Configuration and Service Discovery

8.1 DNS policy configuration

dnsPolicy values

apiVersion: v1
kind: Pod
metadata:
  name: dns-policy-demo
spec:
  dnsPolicy: ClusterFirst  # DNS policy
  containers:
  - name: app
    image: nginx:1.25

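dnsPolicy values explained:

Value	Description	Use case
`ClusterFirst`	Uses the cluster DNS first; names outside the cluster domain are forwarded to the upstream nameserver	Default; most applications
`ClusterFirstWithHostNet`	Uses the cluster DNS for Pods on the host network	`hostNetwork: true`
`Default`	Inherits the node's DNS configuration (despite the name, this is not the default policy)	Pods that only need to resolve external names
`None`	Ignores the cluster DNS settings; everything comes from `dnsConfig`	Fully custom DNS

Policy comparison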
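
The differences between the policies come down to where a Pod's nameserver settings originate. The sketch below captures that decision, including the documented fallback of `ClusterFirst` to the node's settings when the Pod runs with `hostNetwork: true`. The function name and the three return labels are illustrative assumptions.

```python
def effective_dns_source(dns_policy: str = "ClusterFirst", host_network: bool = False) -> str:
    """Return where the Pod's nameservers come from: 'cluster', 'node' or 'dnsConfig'."""
    if dns_policy == "None":
        return "dnsConfig"
    if dns_policy == "Default":
        return "node"
    if dns_policy == "ClusterFirstWithHostNet":
        return "cluster"
    if dns_policy == "ClusterFirst":
        # On the host network, ClusterFirst falls back to the node's resolver settings.
        return "node" if host_network else "cluster"
    raise ValueError(f"unknown dnsPolicy: {dns_policy}")


for policy, host_net in [("ClusterFirst", False), ("ClusterFirst", True),
                         ("ClusterFirstWithHostNet", True), ("Default", False), ("None", False)]:
    print(f"{policy:24} hostNetwork={host_net!s:5} -> {effective_dns_source(policy, host_net)}")
```

This is why a host-network Pod that needs to resolve Service names should set `ClusterFirstWithHostNet` explicitly.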

8.2 Custom DNS configuration

dnsConfig in detail

apiVersion: v1
kind: Pod
metadata:
  name: dns-config-demo
spec:
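  dnsPolicy: None  # with None, all DNS settings come from dnsConfig (dnsConfig can also be combined with other policies, where it is merged into the generated settings)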
  dnsConfig:
    nameservers:
    - 8.8.8.8
    - 8.8.4.4
    searches:
    - mydomain.local
    - example.com
    options:
    - name: ndots
      value: "2"
    - name: timeout
      value: "3"
    - name: attempts
      value: "2"
  
  containers:
  - name: app
    image: nginx:1.25

Generated /etc/resolv.conf:

nameserver 8.8.8.8
nameserver 8.8.4.4
search mydomain.local example.com
options ndots:2 timeout:3 attempts:2
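
The mapping from `dnsConfig` to the file above is mechanical, and a short sketch makes it explicit. The function below is a hypothetical helper that renders the same structure as the manifest; it assumes `dnsPolicy: None`, so nothing from the cluster or node is merged in.

```python
def render_resolv_conf(dns_config: dict) -> str:
    """Render a dnsConfig-style dictionary as resolv.conf text."""
    lines = [f"nameserver {ns}" for ns in dns_config.get("nameservers", [])]
    searches = dns_config.get("searches", [])
    if searches:
        lines.append("search " + " ".join(searches))
    options = []
    for opt in dns_config.get("options", []):
        # An option has a name and, optionally, a value.
        options.append(f"{opt['name']}:{opt['value']}" if "value" in opt else opt["name"])
    if options:
        lines.append("options " + " ".join(options))
    return "\n".join(lines) + "\n"


dns_config = {
    "nameservers": ["8.8.8.8", "8.8.4.4"],
    "searches": ["mydomain.local", "example.com"],
    "options": [
        {"name": "ndots", "value": "2"},
        {"name": "timeout", "value": "3"},
        {"name": "attempts", "value": "2"},
    ],
}
print(render_resolv_conf(dns_config), end="")
```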

hostAliases configuration

apiVersion: v1
kind: Pod
metadata:
  name: hostaliases-demo
spec:
  hostAliases:
  - ip: "127.0.0.1"
    hostnames:
    - "foo.local"
    - "bar.local"
  - ip: "192.168.1.100"
    hostnames:
    - "myapp.local"
  
  containers:
  - name: app
    image: nginx:1.25

Generated /etc/hosts:

# Kubernetes-managed hosts file
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
...

# Entries added by HostAliases.
127.0.0.1	foo.local	bar.local
192.168.1.100	myapp.local
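
The appended section is one line per `hostAliases` entry: the IP followed by its hostnames, separated by tabs. A small sketch with a hypothetical helper reproduces that layout:

```python
def render_host_aliases(host_aliases: list[dict]) -> str:
    """Render hostAliases entries in the layout appended to /etc/hosts."""
    lines = ["# Entries added by HostAliases."]
    for alias in host_aliases:
        lines.append("\t".join([alias["ip"], *alias["hostnames"]]))
    return "\n".join(lines) + "\n"


host_aliases = [
    {"ip": "127.0.0.1", "hostnames": ["foo.local", "bar.local"]},
    {"ip": "192.168.1.100", "hostnames": ["myapp.local"]},
]
print(render_host_aliases(host_aliases), end="")
```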

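8.3 Service discovery mechanisms

DNS-based service discovery

DNS record types

Record type	Format	Description
A record	`<service>.<ns>.svc.cluster.local`	ClusterIP of the Service (for a headless Service, the IPs of its Pods)
SRV record	`_<port-name>._<proto>.<service>.<ns>.svc.cluster.local`	Port information for a named Service port
A record	`<pod-hostname>.<service>.<ns>.svc.cluster.local`	IP of an individual Pod behind a headless Service

DNS query examples:

# Query a Service
nslookup kubernetes.default.svc.cluster.local

# Query a Pod behind a headless Service
nslookup myapp-0.myapp-headless.default.svc.cluster.local

# Query an SRV record
nslookup -type=SRV _http._tcp.myapp.default.svc.cluster.local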

Environment-variable service discovery

apiVersion: v1
kind: Pod
metadata:
  name: env-service-discovery
spec:
  enableServiceLinks: true  # defaults to true
  containers:
  - name: app
    image: nginx:1.25
    # Automatically injected environment variables:
    # KUBERNETES_SERVICE_HOST=10.96.0.1
    # KUBERNETES_SERVICE_PORT=443
    # KUBERNETES_PORT=tcp://10.96.0.1:443
    # KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
    # KUBERNETES_PORT_443_TCP_PROTO=tcp
    # KUBERNETES_PORT_443_TCP_PORT=443
    # KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
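
The variable names listed above follow a fixed pattern derived from the Service name, its ClusterIP, and its ports. The sketch below reproduces that pattern for the variables shown; it is an illustrative helper, not kubelet code, and it leaves out the extra per-port variables that are generated for named ports.

```python
def service_env_vars(name: str, cluster_ip: str, ports: list[tuple[int, str]]) -> dict[str, str]:
    """Build Docker-links-style variables for a Service with (port, protocol) pairs."""
    prefix = name.upper().replace("-", "_")
    first_port, first_proto = ports[0]
    env = {
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(first_port),
        f"{prefix}_PORT": f"{first_proto.lower()}://{cluster_ip}:{first_port}",
    }
    for port, proto in ports:
        base = f"{prefix}_PORT_{port}_{proto.upper()}"
        env[base] = f"{proto.lower()}://{cluster_ip}:{port}"
        env[f"{base}_PROTO"] = proto.lower()
        env[f"{base}_PORT"] = str(port)
        env[f"{base}_ADDR"] = cluster_ip
    return env


for key, value in service_env_vars("kubernetes", "10.96.0.1", [(443, "TCP")]).items():
    print(f"{key}={value}")
```

Because these variables are set when the container starts, a Service created after the Pod is not visible this way; DNS does not have that ordering constraint.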

8.4 DNS debugging

Common debugging commands

# 1. Check the CoreDNS status
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns

# 2. Check the CoreDNS configuration
kubectl get configmap coredns -n kube-system -o yaml

# 3. Debug with nslookup
kubectl exec -it my-pod -- nslookup kubernetes.default

# 4. Debug with dig
kubectl exec -it my-pod -- dig kubernetes.default.svc.cluster.local

# 5. Check the Pod's DNS configuration
kubectl exec -it my-pod -- cat /etc/resolv.conf

# 6. Test DNS resolution from a temporary Pod
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- nslookup kubernetes.default
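
When reading the `/etc/resolv.conf` from step 5, the `search` list and the `ndots` option explain why a short name such as `kubernetes.default` resolves. The sketch below lists the names a stub resolver would try, in order, following the usual resolver rule: a name with fewer dots than `ndots` is tried with the search domains first. The search list and `ndots=5` used in the example are typical values for a Pod in the `default` namespace and are assumptions here, as is the function name.

```python
def candidate_names(name: str, searches: list[str], ndots: int) -> list[str]:
    """Return the fully qualified names tried for a lookup, in order."""
    if name.endswith("."):
        return [name]  # already fully qualified, the search list is skipped
    expanded = [f"{name}.{domain}." for domain in searches]
    absolute = [name + "."]
    if name.count(".") >= ndots:
        return absolute + expanded  # enough dots: try the name as given first
    return expanded + absolute


searches = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for candidate in candidate_names("kubernetes.default", searches, ndots=5):
    print(candidate)
```

The second candidate, `kubernetes.default.svc.cluster.local.`, is the one that matches the Service record. The same expansion is why external names with few dots can trigger several failed cluster lookups before the final query succeeds.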

DNS troubleshooting flow

DNS debugging Pod

apiVersion: v1
kind: Pod
metadata:
  name: dns-debug
spec:
  containers:
  - name: dns-debug
    image: nicolaka/netshoot:latest
    command: ['sleep', '3600']
  # Usage:
  # kubectl exec -it dns-debug -- dig kubernetes.default.svc.cluster.local
  # kubectl exec -it dns-debug -- nslookup kubernetes.default
  # kubectl exec -it dns-debug -- drill kubernetes.default.svc.cluster.local

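9. Summary and Best Practices

9.1 Pod design best practices

Scenario	Best practice	Rationale
Resource limits	Always set requests and limits	Prevents resource contention and determines the QoS class
Health checks	Configure liveness and readiness probes	Automatic failure recovery and traffic management
Graceful termination	Set a preStop hook and a grace period	Ensures connections are closed cleanly
Image management	Use specific version tags; avoid `latest`	Ensures reproducible deployments
Configuration management	Use ConfigMap/Secret	Keeps configuration separate from the image
Security configuration	Set securityContext	Principle of least privilege

9.2 Resource configuration template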

apiVersion: v1
kind: Pod
metadata:
  name: best-practice-pod
  labels:
    app: myapp
    version: v1.0.0
spec:
  serviceAccountName: myapp-sa
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  
  terminationGracePeriodSeconds: 60
  
  containers:
  - name: app
    image: myapp:v1.0.0
    imagePullPolicy: IfNotPresent
    
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 512Mi
    
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 30
    
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 15
      timeoutSeconds: 5
      failureThreshold: 3
    
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 3
    
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 15"]
    
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: config
      mountPath: /app/config
      readOnly: true
  
  volumes:
  - name: tmp
    emptyDir: {}
  - name: config
    configMap:
      name: myapp-config

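9.3 Troubleshooting checklist

Problem	Diagnostic command	Possible causes
Pod stuck in Pending	`kubectl describe pod`	Insufficient resources, scheduling constraints
Image pull failure	`kubectl describe pod`	Authentication failure, network problems
Container restarts frequently	`kubectl logs --previous`	Application crash, probe failure
DNS resolution failure	`kubectl exec -- nslookup`	CoreDNS failure, misconfiguration
Service unreachable	`kubectl get endpoints`	Selector mismatch, probe failure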
