Containerized Release Pipelines in Practice: From Code Commit to Container Deployment

Version: V1.0 | Depth: production-grade | Estimated reading time: 65 minutes
Quality target: CSDN score > 95 | Audience: DevOps engineers, container engineers, release engineers




1. Code Push and Version Management

1.1 Git Workflow Conventions

1.1.1 The Git Flow Workflow

(Figure: Git Flow branching model — feature, release, and hotfix branches merging through develop into main.)

The model uses two long-lived branches plus three kinds of short-lived branches:

  • main/master — the production branch; every commit is a releasable state
  • develop — the main development line that day-to-day work merges into
  • feature/* (e.g. feature/user-auth, feature/order-system, feature/payment) — cut from develop, merged back when the feature is done
  • release/* (e.g. release/v1.0, release/v1.1) — cut from develop to stabilize a release, then merged into both main and develop
  • hotfix/* (e.g. hotfix/bug-fix) — cut from main for urgent production fixes, then merged back into both main and develop

1.2 Commit Conventions

1.2.1 The Conventional Commits Specification
# Commit message format
<type>(<scope>): <subject>

# type values
feat:     a new feature
fix:      a bug fix
docs:     documentation changes
style:    code formatting (no change to behavior)
refactor: a refactor (neither a feature nor a bug fix)
test:     test changes
chore:    build process or tooling changes

# scope values (optional)
ui:       user interface
api:      API endpoints
db:       database
config:   configuration files
ci:       CI/CD

# Examples
feat(user): add user authentication feature
fix(order): fix order calculation bug
docs(readme): update installation guide
style(format): fix code formatting
refactor(auth): refactor authentication logic
test(login): add login test cases
chore(deps): update dependencies
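The convention above is easy to enforce mechanically. Below is a minimal sketch of a validator function (the regex covers only the types and the single, lowercase scope form listed here; many teams use commitlint for the full specification instead):

```shell
#!/bin/bash
# validate-commit-msg.sh - check a message against the convention above

# Returns 0 when the message matches <type>(<scope>): <subject>
# or <type>: <subject>, with type drawn from the list in this section
validate_commit_msg() {
    local msg="$1"
    local types="feat|fix|docs|style|refactor|test|chore"
    [[ "$msg" =~ ^($types)(\([a-z0-9-]+\))?:\ .+ ]]
}

# Hypothetical hook wiring: save as .git/hooks/commit-msg, then
#   validate_commit_msg "$(head -n1 "$1")" || { echo "bad message"; exit 1; }
```

Dropped into a `commit-msg` hook, this rejects non-conforming commits locally, before they ever reach CI.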
1.2.2 A Git Commit Script
#!/bin/bash
# git-commit.sh - standardized Git commits

set -euo pipefail

# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

log() {
    echo -e "${BLUE}[$(date +'%H:%M:%S')]${NC} $1"
}

show_help() {
    cat << EOF
Standardized Git commit tool

Usage: $0 <type> <scope> <message>

type values:
  feat     new feature
  fix      bug fix
  docs     documentation
  style    code formatting
  refactor refactoring
  test     tests
  chore    build/tooling

scope values:
  ui       user interface
  api      API endpoints
  db       database
  config   configuration
  ci       CI/CD

Examples:
  $0 feat user "add login feature"
  $0 fix order "fix calculation bug"
  $0 docs readme "update installation"
EOF
}

if [ $# -lt 3 ]; then
    show_help
    exit 1
fi

TYPE=$1
SCOPE=$2
MESSAGE=$3

# Validate the type
valid_types=("feat" "fix" "docs" "style" "refactor" "test" "chore")
if [[ ! " ${valid_types[*]} " =~ " ${TYPE} " ]]; then
    echo -e "${RED}✗ invalid type: $TYPE${NC}"
    show_help
    exit 1
fi

# Build the commit message
commit_message="${TYPE}(${SCOPE}): ${MESSAGE}"

log "Commit message: $commit_message"

# Stage all changes
git add -A

# Commit
git commit -m "$commit_message"

log "✓ commit created"

1.3 Branch Management Strategy

1.3.1 Branch Protection Rules
#!/bin/bash
# branch-protection.sh - configure branch protection

set -euo pipefail

# Read the token from the environment instead of hard-coding it
GITHUB_TOKEN="${GITHUB_TOKEN:?GITHUB_TOKEN must be set}"
REPO="owner/repo"

# Protect the main branch
curl -X PUT \
    -H "Authorization: token $GITHUB_TOKEN" \
    -H "Accept: application/vnd.github.v3+json" \
    https://api.github.com/repos/$REPO/branches/main/protection \
    -d '{
        "required_status_checks": {
            "strict": true,
            "contexts": ["ci/jenkins", "sonarqube"]
        },
        "enforce_admins": true,
        "required_pull_request_reviews": {
            "required_approving_review_count": 1,
            "dismiss_stale_reviews": true,
            "require_code_owner_reviews": true
        },
        "restrictions": null,
        "required_linear_history": false,
        "allow_force_pushes": false,
        "allow_deletions": false
    }'

echo "✓ branch protection configured"

2. Container Image Builds

2.1 Dockerfile Best Practices

2.1.1 Java Application Dockerfile
# Dockerfile - multi-stage build for a Java application
# Stage 1: dependency download
FROM maven:3.9-eclipse-temurin-17 AS dependencies
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline -B

# Stage 2: compile and package (the build needs pom.xml as well as src)
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
COPY --from=dependencies /root/.m2 /root/.m2
COPY pom.xml .
COPY src ./src
RUN mvn clean package -DskipTests -B

# Stage 3: runtime
FROM eclipse-temurin:17-jre-alpine AS production

# Create a non-root user
RUN addgroup -g 1001 -S java && \
    adduser -S java -u 1001

# Set the working directory
WORKDIR /app

# Copy the build artifact
COPY --from=builder --chown=java:java /app/target/*.jar app.jar

# JVM options
ENV JAVA_OPTS="-Xms512m -Xmx2g -XX:+UseG1GC"

# Expose the port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
    CMD wget -qO- http://localhost:8080/actuator/health || exit 1

# Drop privileges
USER java

# Start the application
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS -jar app.jar"]
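One caveat with the `JAVA_OPTS` above: a hard-coded `-Xmx2g` ignores whatever memory limit the orchestrator later places on the container. Since JDK 10+ the JVM is container-aware, so a percentage-based cap adapts automatically; the 75% figure below is an assumed starting point to tune per workload, not part of the original pipeline:

```dockerfile
# Size the heap relative to the container's memory limit instead of
# hard-coding -Xmx; the JVM reads the cgroup limit at startup
ENV JAVA_OPTS="-XX:MaxRAMPercentage=75.0 -XX:+UseG1GC"
```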
2.1.2 Node.js Application Dockerfile
# Dockerfile - multi-stage build for a Node.js application
# Stage 1: production dependencies only
FROM node:18-alpine AS dependencies
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

# Stage 2: build (needs devDependencies, so install everything here)
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 3: production
FROM node:18-alpine AS production

# Install dumb-init
RUN apk add --no-cache dumb-init

# Create a non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001

WORKDIR /app

# Copy the build output and the production-only dependencies
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=dependencies --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/package.json ./

# Environment
ENV NODE_ENV=production
ENV PORT=3000

# Expose the port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=30s --retries=3 \
    CMD wget -qO- http://localhost:3000/health || exit 1

# Drop privileges
USER nodejs

# Start under dumb-init so signals reach the Node process
ENTRYPOINT ["dumb-init", "node", "dist/server.js"]
2.1.3 Python Application Dockerfile
# Dockerfile - multi-stage build for a Python application
# Stage 1: dependency install
FROM python:3.11-slim AS dependencies
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: assemble the application
FROM python:3.11-slim AS builder
WORKDIR /app
COPY --from=dependencies /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY . .

# Stage 3: production
FROM python:3.11-slim AS production

# Runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Create a non-root user
RUN groupadd -g 1001 python && \
    useradd -r -u 1001 -g python python

WORKDIR /app

# Copy the application code and the installed packages (the production
# stage needs site-packages too, or the app would start without its deps)
COPY --from=builder --chown=python:python /app .
COPY --from=dependencies /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages

# Environment
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

# Expose the port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=30s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Drop privileges
USER python

# Start the application (python -m avoids relying on the gunicorn
# console script, which lives outside site-packages)
ENTRYPOINT ["python", "-m", "gunicorn", "--bind", "0.0.0.0:8000", "app:app"]
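The `ENTRYPOINT` above runs gunicorn with its default single worker. Worker count is usually tuned to the CPU allocation, and gunicorn reads extra settings from the `GUNICORN_CMD_ARGS` environment variable, so tuning can happen at deploy time without rebuilding the image. A sketch — the values follow the common 2×CPU+1 rule of thumb for a 2-CPU container and are assumptions to adjust:

```dockerfile
# Tune gunicorn via environment at deploy time; for a container
# limited to 2 CPUs, 2*2+1 = 5 workers is a common starting point
ENV GUNICORN_CMD_ARGS="--workers=5 --timeout=30 --access-logfile=-"
```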

2.2 Multi-Stage Builds

2.2.1 Benefits of Multi-Stage Builds

Benefit               | Description                                        | Effect
Smaller images        | Only the final artifact is copied, no build tools  | 70-90% size reduction
Better security       | No source code or build dependencies in production | Smaller attack surface
Faster builds         | Layer caching avoids repeated downloads            | 50-80% faster
Environment isolation | Build and runtime environments are separated       | Better consistency

2.3 Image Optimization Techniques

2.3.1 Image Optimization Checklist

Layer cache optimization

# Order instructions by how often they change
# 1. Rarely-changing steps first
COPY requirements.txt .
RUN pip install -r requirements.txt

# 2. Frequently-changing steps last
COPY . .

Fewer image layers

# Chain commands into a single RUN instruction with &&
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        curl \
        wget \
    && rm -rf /var/lib/apt/lists/*

Use .dockerignore

# .dockerignore
.git
.gitignore
*.md
Dockerfile
.dockerignore
node_modules
__pycache__
*.pyc
*.pyo
.env
.venv

Choose a base image

# Sorted by size (approximate)
FROM scratch              # 0 MB (empty image)
FROM alpine:latest        # ~8 MB
FROM debian:stable-slim   # ~75 MB
FROM ubuntu:latest        # ~78 MB

3. Image Registry Management

3.1 Harbor Registry Configuration

3.1.1 Harbor Project Setup
#!/bin/bash
# harbor-project.sh - Harbor project setup

set -euo pipefail

HARBOR_URL="https://harbor.example.com"
HARBOR_USER="admin"
HARBOR_PASSWORD="${HARBOR_PASSWORD:?HARBOR_PASSWORD must be set}"

# Create a project (the Harbor v2.0 API accepts basic auth directly;
# project visibility is set via metadata.public as a string)
create_project() {
    local project_name=$1
    local public=$2

    curl -sk -X POST \
        "${HARBOR_URL}/api/v2.0/projects" \
        -H "Content-Type: application/json" \
        -u "${HARBOR_USER}:${HARBOR_PASSWORD}" \
        -d "{
            \"project_name\": \"${project_name}\",
            \"metadata\": {
                \"public\": \"${public}\",
                \"auto_scan\": \"true\",
                \"reuse_sys_cve_allowlist\": \"false\"
            }
        }"

    echo "✓ project created: $project_name"
}

# Create the projects
create_project "production" false
create_project "staging" false
create_project "development" true

3.2 Image Push Workflow

3.2.1 Image Push Script
#!/bin/bash
# push-image.sh - build and push an image

set -euo pipefail

REGISTRY="harbor.example.com"
PROJECT="production"
APP_NAME="myapp"
VERSION="${1:-latest}"
HARBOR_PASSWORD="${HARBOR_PASSWORD:?HARBOR_PASSWORD must be set}"

log() {
    echo "[$(date +'%H:%M:%S')] $1"
}

log "=== Pushing image to Harbor ==="

# 1. Build the image
log "[1/4] Building image..."
docker build -t ${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION} .

# 2. Log in to Harbor (read the password from stdin, not argv)
log "[2/4] Logging in to Harbor..."
echo "${HARBOR_PASSWORD}" | docker login ${REGISTRY} -u admin --password-stdin

# 3. Push the image
log "[3/4] Pushing image..."
docker push ${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION}

# 4. Push the latest tag
if [ "$VERSION" != "latest" ]; then
    log "[4/4] Pushing latest tag..."
    docker tag ${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION} \
               ${REGISTRY}/${PROJECT}/${APP_NAME}:latest
    docker push ${REGISTRY}/${PROJECT}/${APP_NAME}:latest
fi

log "✓ image pushed"
log "Image: ${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION}"
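The script above publishes a single version tag plus `latest`. Many teams also publish cascading semver tags (`1.2.3`, `1.2`, `1`) so downstream users can pin at the precision they need. A sketch of the tag derivation, assuming plain `MAJOR.MINOR.PATCH` versions (pre-release suffixes are not handled):

```shell
#!/bin/bash
# semver-tags.sh - derive cascading tags from a semver string

# Prints "1.2.3 1.2 1" for "1.2.3"; a leading "v" is stripped
semver_tags() {
    local version="${1#v}"
    local major minor patch
    IFS='.' read -r major minor patch <<< "$version"
    echo "$version $major.$minor $major"
}

# Each printed tag would then be applied with:
#   docker tag "$IMAGE:$FULL" "$IMAGE:$tag" && docker push "$IMAGE:$tag"
```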

3.3 Image Security Scanning

3.3.1 Image Scan Script
#!/bin/bash
# scan-image.sh - image security scanning

set -euo pipefail

REGISTRY="harbor.example.com"
PROJECT="production"
APP_NAME="myapp"
VERSION="${1:-latest}"
HARBOR_PASSWORD="${HARBOR_PASSWORD:?HARBOR_PASSWORD must be set}"

log() {
    echo "[$(date +'%H:%M:%S')] $1"
}

log "=== Image security scan ==="

# 1. Scan with Trivy
log "[1/3] Running Trivy..."
docker run --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    aquasec/trivy image \
    ${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION}

# 2. Trigger a scan through the Harbor API
log "[2/3] Triggering Harbor scan..."
curl -sk -X POST \
    "https://${REGISTRY}/api/v2.0/projects/${PROJECT}/repositories/${APP_NAME}/artifacts/${VERSION}/scan" \
    -u "admin:${HARBOR_PASSWORD}"

# 3. Fetch the scan report (the artifact endpoint returns a single
# object; the summary lives under .scan_overview)
log "[3/3] Fetching scan report..."
REPORT=$(curl -sk \
    "https://${REGISTRY}/api/v2.0/projects/${PROJECT}/repositories/${APP_NAME}/artifacts/${VERSION}?with_scan_overview=true" \
    -u "admin:${HARBOR_PASSWORD}" | jq '.scan_overview')

echo "$REPORT" | jq .

log "✓ scan complete"

4. Container Deployment

4.1 Docker Swarm Deployment

4.1.1 Swarm Deployment Script
#!/bin/bash
# swarm-deploy.sh - deploy to Docker Swarm

set -euo pipefail

STACK_NAME="myapp"
REGISTRY="harbor.example.com"
PROJECT="production"
APP_NAME="myapp"
VERSION="${1:-latest}"
HARBOR_PASSWORD="${HARBOR_PASSWORD:?HARBOR_PASSWORD must be set}"

log() {
    echo "[$(date +'%H:%M:%S')] $1"
}

log "=== Docker Swarm deployment ==="

# 1. Generate docker-compose.swarm.yml
log "[1/5] Preparing deployment config..."
cat > docker-compose.swarm.yml <<EOF
version: '3.8'

services:
  app:
    image: ${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION}
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 30s
        failure_action: rollback
        order: start-first
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
    ports:
      - "8080:8080"
    networks:
      - app-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

networks:
  app-network:
    driver: overlay
    attachable: true
EOF

# 2. Log in to the registry
log "[2/5] Logging in to the registry..."
echo "${HARBOR_PASSWORD}" | docker login ${REGISTRY} -u admin --password-stdin

# 3. Deploy the stack (--with-registry-auth forwards credentials
#    to worker nodes so they can pull from the private registry)
log "[3/5] Deploying the stack..."
docker stack deploy \
    --with-registry-auth \
    -c docker-compose.swarm.yml \
    ${STACK_NAME}

# 4. Wait for the rollout
log "[4/5] Waiting for the rollout..."
for i in {1..30}; do
    RUNNING=$(docker service ps ${STACK_NAME}_app --format "{{.CurrentState}}" | grep -c "Running" || true)
    if [ "$RUNNING" -ge 3 ]; then
        log "✓ all replicas are up"
        break
    fi
    log "Waiting... ($RUNNING/3) ($i/30)"
    sleep 10
done

# 5. Verify the deployment
log "[5/5] Verifying the deployment..."
docker service ps ${STACK_NAME}_app
docker service ls | grep ${STACK_NAME}

log "✓ deployment complete"

4.2 Kubernetes Deployment

4.2.1 K8s Deployment Script
#!/bin/bash
# k8s-deploy.sh - deploy to Kubernetes

set -euo pipefail

NAMESPACE="production"
APP_NAME="myapp"
REGISTRY="harbor.example.com"
PROJECT="production"
VERSION="${1:-latest}"
HARBOR_PASSWORD="${HARBOR_PASSWORD:?HARBOR_PASSWORD must be set}"

log() {
    echo "[$(date +'%H:%M:%S')] $1"
}

log "=== Kubernetes deployment ==="

# 1. Generate the Deployment manifest
log "[1/6] Preparing the Deployment manifest..."
cat > deployment.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${APP_NAME}
  namespace: ${NAMESPACE}
  labels:
    app: ${APP_NAME}
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ${APP_NAME}
  template:
    metadata:
      labels:
        app: ${APP_NAME}
    spec:
      imagePullSecrets:
      - name: harbor-secret
      containers:
      - name: ${APP_NAME}
        image: ${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION}
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
EOF

# 2. Generate the Service manifest
log "[2/6] Preparing the Service manifest..."
cat > service.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${APP_NAME}
  namespace: ${NAMESPACE}
spec:
  selector:
    app: ${APP_NAME}
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer
EOF

# 3. Create the namespace
log "[3/6] Creating the namespace..."
kubectl create namespace ${NAMESPACE} --dry-run=client -o yaml | kubectl apply -f -

# 4. Create the image pull secret (referenced by imagePullSecrets above)
log "[4/6] Creating the image pull secret..."
kubectl create secret docker-registry harbor-secret \
    --docker-server=${REGISTRY} \
    --docker-username=admin \
    --docker-password="${HARBOR_PASSWORD}" \
    --namespace=${NAMESPACE} \
    --dry-run=client -o yaml | kubectl apply -f -

# 5. Apply the manifests
log "[5/6] Applying the manifests..."
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

# 6. Wait for the rollout
log "[6/6] Waiting for the rollout..."
kubectl rollout status deployment/${APP_NAME} -n ${NAMESPACE}

log "✓ deployment complete"
kubectl get pods -n ${NAMESPACE} -l app=${APP_NAME}

4.3 Rolling Update Strategy

4.3.1 Rolling Update Configuration
# Kubernetes rolling update configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # max Pods above the desired replica count
      maxUnavailable: 0  # max Pods allowed to be unavailable

  # Update parameters
  minReadySeconds: 30    # how long a new Pod must be Ready before it counts as available
  revisionHistoryLimit: 10  # old ReplicaSets to keep for rollback
  progressDeadlineSeconds: 600  # seconds before the rollout is reported as stalled
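The rolling-update settings above only govern disruptions the Deployment itself initiates. Voluntary disruptions from elsewhere (node drains during maintenance, cluster autoscaling) are bounded separately with a PodDisruptionBudget. A minimal sketch for the 3-replica `myapp` Deployment — the `minAvailable: 2` value is an assumption to match your availability target:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2        # keep at least 2 Pods up during voluntary disruptions
  selector:
    matchLabels:
      app: myapp
```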

5. Service Discovery and Load Balancing

5.1 Service Discovery Mechanisms

5.1.1 Docker Swarm Service Discovery
# Docker Swarm ships a built-in DNS server
# Service names resolve automatically to a virtual IP (VIP)

# Example: service-to-service communication
docker network create -d overlay my-network

docker service create \
    --name backend \
    --network my-network \
    harbor.example.com/backend:latest

docker service create \
    --name frontend \
    --network my-network \
    --env BACKEND_URL=http://backend:8080 \
    harbor.example.com/frontend:latest

# The frontend service reaches the backend service via the hostname "backend"

5.2 Load Balancing Configuration

5.2.1 Nginx Load Balancer Configuration
# nginx-lb.conf - Nginx load balancer configuration
upstream myapp_backend {
    least_conn;  # least-connections balancing

    server myapp-1:8080 weight=5;
    server myapp-2:8080 weight=5;
    server myapp-3:8080 weight=5;

    keepalive 32;
}

server {
    listen 80;
    server_name myapp.example.com;

    location / {
        proxy_pass http://myapp_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;

        # Retry the next upstream on these errors (passive failover)
        proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
    }

    # Health check endpoint for the load balancer itself
    location /health {
        access_log off;
        return 200 "healthy\n";
        add_header Content-Type text/plain;
    }
}
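Note that the `proxy_next_upstream` directive above is retry behavior, not active health checking: open-source nginx does not probe upstreams on its own (active checks are an NGINX Plus feature). What it does support is passively marking servers as down via `max_fails`/`fail_timeout` on each `server` line; a sketch with assumed values:

```nginx
upstream myapp_backend {
    least_conn;
    # After 3 failed attempts within 30s, take the server out of
    # rotation for 30s, then try it again (passive health checking)
    server myapp-1:8080 weight=5 max_fails=3 fail_timeout=30s;
    server myapp-2:8080 weight=5 max_fails=3 fail_timeout=30s;
    server myapp-3:8080 weight=5 max_fails=3 fail_timeout=30s;
    keepalive 32;
}
```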

5.3 Health Checks

5.3.1 Health Check Configuration
# Kubernetes health check configuration
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: myapp
    image: myapp:latest
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 30  # wait after container start
      periodSeconds: 10        # probe interval
      timeoutSeconds: 5        # per-probe timeout
      successThreshold: 1      # consecutive successes to count as healthy
      failureThreshold: 3      # consecutive failures before restart

    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 3
      successThreshold: 1
      failureThreshold: 3

    startupProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30   # 30 probes x 10s = up to 300s allowed for startup
      periodSeconds: 10
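All three probes above assume the container exposes an HTTP endpoint. For containers that expose none (a TCP-only service or a batch worker), Kubernetes also offers `tcpSocket` and `exec` probes; a brief sketch (the port and command here are illustrative assumptions):

```yaml
# Alternative probe types for non-HTTP workloads
livenessProbe:
  tcpSocket:             # passes if a TCP connection can be opened
    port: 5432
  periodSeconds: 10
readinessProbe:
  exec:                  # passes if the command exits 0
    command: ["sh", "-c", "test -f /tmp/ready"]
  periodSeconds: 5
```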

6. Release Verification and Rollback

6.1 Release Verification

6.1.1 Release Verification Script
#!/bin/bash
# verify-deployment.sh - release verification

set -euo pipefail

NAMESPACE="${1:-production}"
APP_NAME="${2:-myapp}"

log() {
    echo "[$(date +'%H:%M:%S')] $1"
}

log "=== Release verification ==="

# 1. Check Pod status
log "[1/5] Checking Pod status..."
kubectl get pods -n ${NAMESPACE} -l app=${APP_NAME}

# 2. Check replica counts
log "[2/5] Checking replica counts..."
DESIRED=$(kubectl get deployment ${APP_NAME} -n ${NAMESPACE} -o jsonpath='{.spec.replicas}')
CURRENT=$(kubectl get deployment ${APP_NAME} -n ${NAMESPACE} -o jsonpath='{.status.readyReplicas}')
CURRENT="${CURRENT:-0}"  # readyReplicas is unset when no Pod is ready

if [ "$CURRENT" -ge "$DESIRED" ]; then
    log "✓ replicas OK: $CURRENT/$DESIRED"
else
    log "✗ replica count mismatch: $CURRENT/$DESIRED"
    exit 1
fi

# 3. Check Pod health
log "[3/5] Checking Pod health..."
for pod in $(kubectl get pods -n ${NAMESPACE} -l app=${APP_NAME} -o jsonpath='{.items[*].metadata.name}'); do
    kubectl exec -n ${NAMESPACE} $pod -- curl -f http://localhost:8080/health || {
        log "✗ health check failed for Pod: $pod"
        exit 1
    }
done
log "✓ all Pods passed the health check"

# 4. Check service access
log "[4/5] Checking service access..."
SERVICE_IP=$(kubectl get svc ${APP_NAME} -n ${NAMESPACE} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
if curl -f http://${SERVICE_IP}/health; then
    log "✓ service reachable"
else
    log "✗ service unreachable"
    exit 1
fi

# 5. Check logs
log "[5/5] Checking logs..."
kubectl logs -n ${NAMESPACE} -l app=${APP_NAME} --tail=20

log "✓ release verification complete"

6.2 Automatic Rollback

6.2.1 Automatic Rollback Script
#!/bin/bash
# auto-rollback.sh - automatic rollback on elevated error rate

set -euo pipefail

NAMESPACE="${1:-production}"
APP_NAME="${2:-myapp}"
ERROR_THRESHOLD="${3:-5}"  # threshold on the 5xx request rate (req/s)

log() {
    echo "[$(date +'%H:%M:%S')] $1"
}

log "=== Monitoring the deployment ==="

# Record the currently deployed image
CURRENT_VERSION=$(kubectl get deployment ${APP_NAME} -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.containers[0].image}')
log "Current image: ${CURRENT_VERSION}"

# Watch the error rate
while true; do
    # Query the 5xx rate from Prometheus (defaults to 0 when there is no data)
    ERROR_RATE=$(curl -s "http://prometheus:9090/api/v1/query" \
        --data-urlencode "query=rate(http_requests_total{status=~\"5..\",app=\"${APP_NAME}\"}[5m])" | \
        jq -r '.data.result[0].value[1] // "0"')

    log "Current error rate: ${ERROR_RATE}"

    # Compare against the threshold
    if (( $(echo "$ERROR_RATE > $ERROR_THRESHOLD" | bc -l) )); then
        log "✗ error rate above threshold, rolling back"

        # Roll back to the previous revision (rollout undo without
        # --to-revision targets the previous revision directly)
        kubectl rollout undo deployment/${APP_NAME} -n ${NAMESPACE}

        log "✓ rolled back to the previous revision"

        # Send an alert
        curl -X POST "http://alertmanager:9093/api/v1/alerts" \
            -H "Content-Type: application/json" \
            -d "{
                \"alerts\": [{
                    \"labels\": {
                        \"severity\": \"critical\",
                        \"app\": \"${APP_NAME}\"
                    },
                    \"annotations\": {
                        \"summary\": \"Automatic rollback triggered\",
                        \"description\": \"Error rate exceeded the threshold; rolled back from ${CURRENT_VERSION}\"
                    }
                }]
            }"

        exit 1
    fi

    sleep 60
done

6.3 Canary Releases

6.3.1 Canary Release Script
#!/bin/bash
# canary-release.sh - canary release

set -euo pipefail

NAMESPACE="${1:-production}"
APP_NAME="${2:-myapp}"
NEW_VERSION="${3:-latest}"
CANARY_PERCENT="${4:-10}"

log() {
    echo "[$(date +'%H:%M:%S')] $1"
}

log "=== Canary release ==="

# 1. Compute the replica split (always run at least 1 canary replica)
TOTAL_REPLICAS=$(kubectl get deployment ${APP_NAME} -n ${NAMESPACE} -o jsonpath='{.spec.replicas}')
CANARY_REPLICAS=$((TOTAL_REPLICAS * CANARY_PERCENT / 100))
[ "$CANARY_REPLICAS" -lt 1 ] && CANARY_REPLICAS=1
STABLE_REPLICAS=$((TOTAL_REPLICAS - CANARY_REPLICAS))

log "Total replicas:  $TOTAL_REPLICAS"
log "Canary replicas: $CANARY_REPLICAS"
log "Stable replicas: $STABLE_REPLICAS"

# 2. Create the canary Deployment
log "[1/4] Creating the canary Deployment..."
cat > canary-deployment.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${APP_NAME}-canary
  namespace: ${NAMESPACE}
spec:
  replicas: ${CANARY_REPLICAS}
  selector:
    matchLabels:
      app: ${APP_NAME}
      version: canary
  template:
    metadata:
      labels:
        app: ${APP_NAME}
        version: canary
    spec:
      containers:
      - name: ${APP_NAME}
        image: harbor.example.com/${APP_NAME}:${NEW_VERSION}
        ports:
        - containerPort: 8080
EOF

kubectl apply -f canary-deployment.yaml

# 3. Wait for the canary rollout
log "[2/4] Waiting for the canary rollout..."
kubectl rollout status deployment/${APP_NAME}-canary -n ${NAMESPACE}

# 4. Watch canary metrics before touching the stable Deployment
log "[3/4] Watching canary metrics (60 seconds)..."
sleep 60

# Check the canary error rate
CANARY_ERROR=$(curl -s "http://prometheus:9090/api/v1/query" \
    --data-urlencode "query=rate(http_requests_total{status=~\"5..\",app=\"${APP_NAME}\",version=\"canary\"}[5m])" | \
    jq -r '.data.result[0].value[1] // "0"')

log "Canary error rate: $CANARY_ERROR"

if (( $(echo "$CANARY_ERROR > 0.05" | bc -l) )); then
    log "✗ canary failed: error rate too high"
    kubectl delete deployment ${APP_NAME}-canary -n ${NAMESPACE}
    exit 1
fi

# 5. Promote: update the stable Deployment only after the canary passes,
#    then remove the canary
log "[4/4] Promoting the new version..."
kubectl set image deployment/${APP_NAME} \
    ${APP_NAME}=harbor.example.com/${APP_NAME}:${NEW_VERSION} \
    -n ${NAMESPACE}
kubectl rollout status deployment/${APP_NAME} -n ${NAMESPACE}
kubectl delete deployment ${APP_NAME}-canary -n ${NAMESPACE}

log "✓ canary release complete"

7. Monitoring and Logging

7.1 Container Monitoring

7.1.1 Prometheus Monitoring Configuration
# prometheus-rules.yml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: container-alerts
  namespace: monitoring
spec:
  groups:
  - name: container.rules
    rules:
    - alert: ContainerHighCPU
      expr: rate(container_cpu_usage_seconds_total[5m]) > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Container CPU usage is high"
        description: "{{ $labels.container }} CPU usage is above 80%"

    - alert: ContainerHighMemory
      # working_set is the metric the OOM killer acts on
      expr: container_memory_working_set_bytes / container_spec_memory_limit_bytes > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Container memory usage is high"
        description: "{{ $labels.container }} memory usage is above 90% of its limit"

    - alert: ContainerRestarted
      # requires kube-state-metrics
      expr: increase(kube_pod_container_status_restarts_total[5m]) > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Container is restarting"
        description: "{{ $labels.container }} restarted within the last 5 minutes"

7.2 Log Collection

7.2.1 EFK Stack Configuration
# fluentd-config.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: logging
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type json
      </parse>
    </source>
    
    <filter kubernetes.**>
      @type kubernetes_metadata
      @id filter_kube_metadata
    </filter>
    
    <match **>
      @type elasticsearch
      host elasticsearch.logging.svc.cluster.local
      port 9200
      logstash_format true
      logstash_prefix kubernetes
      flush_interval 5s
    </match>

7.3 Alerting Configuration

7.3.1 Alertmanager Routing Configuration
# alertmanager-config.yml
route:
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver'
  routes:
  - match:
      severity: critical
    receiver: 'critical-receiver'
  - match:
      severity: warning
    receiver: 'warning-receiver'

receivers:
- name: 'default-receiver'
  email_configs:
  - to: 'team@example.com'
    send_resolved: true

- name: 'critical-receiver'
  webhook_configs:
  - url: 'http://dingtalk-webhook:8080/alerts'
    send_resolved: true

- name: 'warning-receiver'
  email_configs:
  - to: 'dev@example.com'
    send_resolved: true

8. Summary

8.1 Key Technical Points

  1. Code management: Git Flow workflow, standardized commits
  2. Image builds: multi-stage builds, image optimization
  3. Registry management: Harbor configuration, security scanning
  4. Container deployment: Swarm/K8s deployment, rolling updates
  5. Service discovery: load balancing, health checks
  6. Release verification: automated checks, rollback mechanisms
  7. Monitoring and alerting: metrics collection, log aggregation

8.2 Best Practice Checklist

Build optimization

  • Use multi-stage builds
  • Pick a minimal base image
  • Exploit layer caching
  • Maintain a .dockerignore

Deployment standards

  • Configure health checks
  • Use rolling updates
  • Set resource limits
  • Run as a non-root user

Security hardening

  • Scan images for vulnerabilities
  • Use a private registry
  • Configure network policies
  • Apply the principle of least privilege

Monitoring and operations

  • Export monitoring metrics
  • Collect application logs
  • Define alert rules
  • Automate rollback

Appendix A: Complete Release Scripts

The complete release scripts appear in the chapters above, including:

  • Git commit script
  • Image build script
  • Image push script
  • Swarm deployment script
  • K8s deployment script
  • Canary release script
  • Automatic rollback script

Appendix B: Troubleshooting Guide

# Container troubleshooting SOP

# 1. Check container status
docker ps -a
kubectl get pods -n <namespace>

# 2. View container logs
docker logs <container>
kubectl logs -n <namespace> <pod>

# 3. Exec into the container for debugging
docker exec -it <container> sh
kubectl exec -it -n <namespace> <pod> -- sh

# 4. Check resource usage
docker stats
kubectl top pods -n <namespace>

# 5. Check network connectivity
docker network inspect <network>
kubectl describe pod <pod> -n <namespace>

# 6. Health checks
curl http://localhost:8080/health
kubectl get endpoints <service> -n <namespace>

Document version: V1.0
Last updated: 2026-03-12
Author: AI technical assistant
License: CC BY-SA 4.0
