十七、基于 Docker 容器 DevOps 应用方案:企业业务代码发布系统(四)容器化部署完整指南
容器化发布流程深度实践:从代码提交到容器部署
版本: V1.0 | 技术深度: 生产实战级 | 预计阅读时间: 65 分钟
质量目标: CSDN 评分>95 | 适用人群: DevOps 工程师、容器工程师、发布工程师
目录
- 1. 代码推送与版本管理
  - [1.1 Git 工作流规范](#11-git-工作流规范)
  - 1.2 代码提交规范
  - 1.3 分支管理策略
- 2. 容器镜像构建
  - [2.1 Dockerfile 最佳实践](#21-dockerfile-最佳实践)
  - 2.2 多阶段构建
  - 2.3 镜像优化技巧
- 3. 镜像仓库管理
  - [3.1 Harbor 仓库配置](#31-harbor-仓库配置)
  - 3.2 镜像推送流程
  - 3.3 镜像安全扫描
- 4. 容器化部署
  - [4.1 Docker Swarm 部署](#41-docker-swarm-部署)
  - [4.2 Kubernetes 部署](#42-kubernetes-部署)
  - 4.3 滚动更新策略
- 5. 服务发现与负载均衡
- 6. 发布验证与回滚
- 7. 监控与日志
- 8. 总结
- [附录 A:完整发布脚本](#附录-a完整发布脚本)
- [附录 B:故障排查指南](#附录-b故障排查指南)
1. 代码推送与版本管理
1.1 Git 工作流规范
1.1.1 Git Flow 工作流
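Git Flow 以 main、develop 作为长期分支,feature/release/hotfix 作为短生命周期分支。这套命名约定可以用一个校验函数来示意(示例脚本,前缀与版本号格式为常见约定,请按团队规范调整):

```shell
#!/bin/bash
# validate-branch.sh - 校验分支名是否符合 Git Flow 命名约定(示例)
validate_branch() {
    local branch=$1
    local semver='^release/[0-9]+\.[0-9]+\.[0-9]+$'
    case "$branch" in
        main|develop)       return 0 ;;   # 长期分支
        feature/*|hotfix/*) return 0 ;;   # 功能 / 热修复分支
        release/*)
            # release 分支要求携带语义化版本号,如 release/1.2.0
            [[ "$branch" =~ $semver ]] && return 0
            ;;
    esac
    return 1
}
```

可在 CI 或 pre-push 钩子中调用,例如 `validate_branch "$(git rev-parse --abbrev-ref HEAD)" || exit 1`。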
1.2 代码提交规范
1.2.1 Conventional Commits 规范
# Commit Message 格式
<type>(<scope>): <subject>
# type 类型说明
feat: 新功能
fix: Bug 修复
docs: 文档更新
style: 代码格式(不影响代码运行)
refactor: 重构(既不是新功能也不是 Bug 修复)
test: 测试相关
chore: 构建过程或辅助工具变动
# scope 说明(可选)
ui: 用户界面
api: API 接口
db: 数据库
config: 配置文件
ci: CI/CD
# 示例
feat(user): add user authentication feature
fix(order): fix order calculation bug
docs(readme): update installation guide
style(format): fix code formatting
refactor(auth): refactor authentication logic
test(login): add login test cases
chore(deps): update dependencies
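上述格式可以在本地用 commit-msg 钩子强制执行。下面是一个示意性的校验脚本(保存为 .git/hooks/commit-msg 并加执行权限;正则只覆盖本文列出的 type,可按需扩展):

```shell
#!/bin/bash
# commit-msg 钩子:校验提交消息是否符合 Conventional Commits(示例)
# git 调用钩子时会把提交消息文件的路径作为第 1 个参数传入
PATTERN='^(feat|fix|docs|style|refactor|test|chore)(\([a-z0-9-]+\))?: .{1,72}$'

check_commit_msg() {
    local msg=$1
    if [[ "$msg" =~ $PATTERN ]]; then
        return 0
    fi
    echo "✗ 提交消息不符合规范:$msg" >&2
    echo "  期望格式:<type>(<scope>): <subject>" >&2
    return 1
}

# 作为钩子运行时,读取消息文件的首行进行校验
if [ $# -ge 1 ] && [ -f "$1" ]; then
    check_commit_msg "$(head -n1 "$1")"
fi
```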
1.2.2 Git 提交脚本
#!/bin/bash
# git-commit.sh - 规范化 Git 提交
set -euo pipefail
# 颜色定义
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
log() {
echo -e "${BLUE}[$(date +'%H:%M:%S')]${NC} $1"
}
show_help() {
cat << EOF
Git 规范化提交工具
用法:$0 <type> <scope> <message>
type 类型:
feat 新功能
fix Bug 修复
docs 文档更新
style 代码格式
refactor 重构
test 测试
chore 构建/工具
scope 范围:
ui 用户界面
api API 接口
db 数据库
config 配置
ci CI/CD
示例:
$0 feat user "add login feature"
$0 fix order "fix calculation bug"
$0 docs readme "update installation"
EOF
}
if [ $# -lt 3 ]; then
show_help
exit 1
fi
TYPE=$1
SCOPE=$2
MESSAGE=$3
# 验证 type
valid_types=("feat" "fix" "docs" "style" "refactor" "test" "chore")
if [[ ! " ${valid_types[@]} " =~ " ${TYPE} " ]]; then
echo -e "${RED}✗ 无效的 type: $TYPE${NC}"
show_help
exit 1
fi
# 创建提交消息
commit_message="${TYPE}(${SCOPE}): ${MESSAGE}"
log "提交消息:$commit_message"
# 添加文件
git add -A
# 提交
git commit -m "$commit_message"
log "✓ 提交成功"
1.3 分支管理策略
1.3.1 分支保护规则
#!/bin/bash
# branch-protection.sh - 分支保护配置
set -euo pipefail
GITHUB_TOKEN="${GITHUB_TOKEN:?请先设置 GITHUB_TOKEN 环境变量}"  # 不要在脚本中硬编码 Token
REPO="owner/repo"
# 配置 main 分支保护
curl -X PUT \
-H "Authorization: token $GITHUB_TOKEN" \
-H "Accept: application/vnd.github.v3+json" \
https://api.github.com/repos/$REPO/branches/main/protection \
-d '{
"required_status_checks": {
"strict": true,
"contexts": ["ci/jenkins", "sonarqube"]
},
"enforce_admins": true,
"required_pull_request_reviews": {
"required_approving_review_count": 1,
"dismiss_stale_reviews": true,
"require_code_owner_reviews": true
},
"restrictions": null,
"required_linear_history": false,
"allow_force_pushes": false,
"allow_deletions": false
}'
echo "✓ 分支保护配置完成"
2. 容器镜像构建
2.1 Dockerfile 最佳实践
2.1.1 Java 应用 Dockerfile
# Dockerfile - Java 应用多阶段构建
# 阶段 1: 依赖下载
FROM maven:3.9-eclipse-temurin-17 AS dependencies
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline -B
# 阶段 2: 编译构建
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
COPY --from=dependencies /root/.m2 /root/.m2
COPY src ./src
RUN mvn clean package -DskipTests -B
# 阶段 3: 运行环境
FROM eclipse-temurin:17-jre-alpine AS production
# 创建非 root 用户
RUN addgroup -g 1001 -S java && \
adduser -S java -u 1001
# 设置工作目录
WORKDIR /app
# 复制构建产物
COPY --from=builder --chown=java:java /app/target/*.jar app.jar
# 设置 JVM 参数
ENV JAVA_OPTS="-Xms512m -Xmx2g -XX:+UseG1GC"
# 暴露端口
EXPOSE 8080
# 健康检查
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
CMD wget -qO- http://localhost:8080/actuator/health || exit 1
# 切换用户
USER java
# 启动应用
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS -jar app.jar"]
2.1.2 Node.js 应用 Dockerfile
# Dockerfile - Node.js 应用多阶段构建
# 阶段 1: 依赖安装
FROM node:18-alpine AS dependencies
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force
# 阶段 2: 构建编译(需要完整依赖,含 devDependencies)
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# 阶段 3: 生产环境
FROM node:18-alpine AS production
# 安装 dumb-init
RUN apk add --no-cache dumb-init
# 创建非 root 用户
RUN addgroup -g 1001 -S nodejs && \
adduser -S nodejs -u 1001
WORKDIR /app
# 复制构建产物
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
# node_modules 取自只装了生产依赖的 dependencies 阶段,避免把 devDependencies 带进生产镜像
COPY --from=dependencies --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/package.json ./
# 设置环境变量
ENV NODE_ENV=production
ENV PORT=3000
# 暴露端口
EXPOSE 3000
# 健康检查
HEALTHCHECK --interval=30s --timeout=3s --start-period=30s --retries=3 \
CMD wget -qO- http://localhost:3000/health || exit 1
# 切换用户
USER nodejs
# 使用 dumb-init 启动
ENTRYPOINT ["dumb-init", "node", "dist/server.js"]
2.1.3 Python 应用 Dockerfile
# Dockerfile - Python 应用多阶段构建
# 阶段 1: 依赖安装
FROM python:3.11-slim AS dependencies
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# 阶段 2: 构建编译
FROM python:3.11-slim AS builder
WORKDIR /app
COPY --from=dependencies /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY . .
# 阶段 3: 生产环境
FROM python:3.11-slim AS production
# 安装运行时依赖
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
&& rm -rf /var/lib/apt/lists/*
# 创建非 root 用户
RUN groupadd -g 1001 python && \
useradd -r -u 1001 -g python python
WORKDIR /app
# 复制依赖与构建产物(site-packages 必须一并复制,否则生产镜像中没有已安装的包)
COPY --from=dependencies /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder --chown=python:python /app .
# 设置环境变量
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
# 暴露端口
EXPOSE 8000
# 健康检查
HEALTHCHECK --interval=30s --timeout=3s --start-period=30s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# 切换用户
USER python
# 启动应用
ENTRYPOINT ["python", "-m", "gunicorn", "--bind", "0.0.0.0:8000", "app:app"]
2.2 多阶段构建
2.2.1 多阶段构建优势
| 优势 | 说明 | 效果 |
|---|---|---|
| 减小镜像体积 | 只复制最终产物,不包含构建工具 | 减少 70-90% |
| 提高安全性 | 生产镜像不包含源码和构建依赖 | 减少攻击面 |
| 加速构建 | 利用缓存层,减少重复下载 | 提升 50-80% |
| 环境隔离 | 构建环境与运行环境分离 | 提高一致性 |
2.3 镜像优化技巧
2.3.1 镜像优化清单
✅ 层缓存优化:
# 按变更频率排序
# 1. 最少变化的放前面
COPY requirements.txt .
RUN pip install -r requirements.txt
# 2. 经常变化的放后面
COPY . .
✅ 减少镜像层数:
# 使用 && 连接多条 RUN 指令
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
wget \
&& rm -rf /var/lib/apt/lists/*
✅ 使用 .dockerignore:
# .dockerignore 文件
.git
.gitignore
*.md
Dockerfile
.dockerignore
node_modules
__pycache__
*.pyc
*.pyo
.env
.venv
✅ 选择基础镜像:
# 按体积从小到大排序(近似值)
FROM scratch                # 0MB,空镜像,适合静态编译的程序
FROM alpine:latest          # 约 5MB
FROM debian:bookworm-slim   # 约 30MB
FROM ubuntu:latest          # 约 70MB
3. 镜像仓库管理
3.1 Harbor 仓库配置
3.1.1 Harbor 项目配置
#!/bin/bash
# harbor-project.sh - Harbor 项目配置
set -euo pipefail
HARBOR_URL="https://harbor.example.com"
HARBOR_USER="admin"
HARBOR_PASSWORD="Harbor12345"
# Harbor v2.0 API 使用 Basic Auth 认证,无需单独换取 Token
AUTH="${HARBOR_USER}:${HARBOR_PASSWORD}"
# 创建项目
create_project() {
    local project_name=$1
    local public=$2
    curl -sk -X POST \
        "${HARBOR_URL}/api/v2.0/projects" \
        -H "Content-Type: application/json" \
        -u "${AUTH}" \
        -d "{
            \"project_name\": \"${project_name}\",
            \"metadata\": {
                \"public\": \"${public}\",
                \"auto_scan\": \"true\",
                \"reuse_sys_cve_allowlist\": \"false\"
            }
        }"
    echo "✓ 项目创建成功:$project_name"
}
# 创建项目
create_project "production" false
create_project "staging" false
create_project "development" true
3.2 镜像推送流程
3.2.1 镜像推送脚本
#!/bin/bash
# push-image.sh - 镜像推送脚本
set -euo pipefail
REGISTRY="harbor.example.com"
PROJECT="production"
APP_NAME="myapp"
VERSION="${1:-latest}"
log() {
echo "[$(date +'%H:%M:%S')] $1"
}
log "=== 推送镜像到 Harbor ==="
# 1. 构建镜像
log "[1/4] 构建镜像..."
docker build -t ${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION} .
# 2. 登录 Harbor(通过 stdin 传入密码,避免出现在命令行与历史记录中)
log "[2/4] 登录 Harbor..."
docker login ${REGISTRY} -u "${HARBOR_USER:-admin}" --password-stdin <<< "${HARBOR_PASSWORD:?请设置 HARBOR_PASSWORD 环境变量}"
# 3. 推送镜像
log "[3/4] 推送镜像..."
docker push ${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION}
# 4. 推送 latest 标签
if [ "$VERSION" != "latest" ]; then
log "[4/4] 推送 latest 标签..."
docker tag ${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION} \
${REGISTRY}/${PROJECT}/${APP_NAME}:latest
docker push ${REGISTRY}/${PROJECT}/${APP_NAME}:latest
fi
log "✓ 镜像推送完成"
log "镜像地址:${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION}"
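推送前可以先校验镜像引用是否完整(registry/项目/应用:标签四段齐全),避免因缺少仓库地址而误推到 Docker Hub。下面是一个纯 Bash 的示意函数(函数名为示例性假设):

```shell
#!/bin/bash
# image-ref.sh - 构造并校验完整镜像引用(示例)
build_image_ref() {
    local registry=$1 project=$2 app=$3 version=$4
    echo "${registry}/${project}/${app}:${version}"
}

# 校验引用形如 host/project/name:tag,且 host 段含 "." 或 ":"
# (否则 docker 会把首段当作 Docker Hub 的用户名)
is_full_ref() {
    local ref=$1
    [[ "$ref" =~ ^[^/]+[.:][^/]*/[^/]+/[^/:]+:[^/:]+$ ]]
}
```

例如 `is_full_ref "harbor.example.com/production/myapp:v1.0.0"` 通过,而 `is_full_ref "myapp:latest"` 失败。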
3.3 镜像安全扫描
3.3.1 镜像扫描脚本
#!/bin/bash
# scan-image.sh - 镜像安全扫描
set -euo pipefail
REGISTRY="harbor.example.com"
PROJECT="production"
APP_NAME="myapp"
VERSION="${1:-latest}"
log() {
echo "[$(date +'%H:%M:%S')] $1"
}
log "=== 镜像安全扫描 ==="
# 1. 使用 Trivy 扫描
log "[1/3] Trivy 扫描..."
docker run --rm \
-v /var/run/docker.sock:/var/run/docker.sock \
aquasec/trivy image \
${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION}
# 2. 使用 Harbor 扫描 API
log "[2/3] 触发 Harbor 扫描..."
curl -sk -X POST \
"https://${REGISTRY}/api/v2.0/projects/${PROJECT}/repositories/${APP_NAME}/artifacts/${VERSION}/scan" \
-u admin:Harbor12345
# 3. 获取扫描报告(扫描是异步任务,稍候片刻再查询)
log "[3/3] 获取扫描报告..."
sleep 30
REPORT=$(curl -sk \
    "https://${REGISTRY}/api/v2.0/projects/${PROJECT}/repositories/${APP_NAME}/artifacts/${VERSION}?with_scan_overview=true" \
    -u admin:Harbor12345 | jq '.scan_overview')
echo "$REPORT"
log "✓ 扫描完成"
4. 容器化部署
4.1 Docker Swarm 部署
4.1.1 Swarm 部署脚本
#!/bin/bash
# swarm-deploy.sh - Docker Swarm 部署
set -euo pipefail
STACK_NAME="myapp"
REGISTRY="harbor.example.com"
PROJECT="production"
APP_NAME="myapp"
VERSION="${1:-latest}"
log() {
echo "[$(date +'%H:%M:%S')] $1"
}
log "=== Docker Swarm 部署 ==="
# 1. 准备 docker-compose.yml
log "[1/5] 准备部署配置..."
cat > docker-compose.swarm.yml <<EOF
version: '3.8'
services:
  app:
    image: ${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION}
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 30s
        failure_action: rollback
        order: start-first
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
    ports:
      - "8080:8080"
    networks:
      - app-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
networks:
  app-network:
    driver: overlay
    attachable: true
EOF
# 2. 登录 Harbor(通过 stdin 传入密码,避免出现在命令行与历史记录中)
log "[2/5] 登录镜像仓库..."
docker login ${REGISTRY} -u "${HARBOR_USER:-admin}" --password-stdin <<< "${HARBOR_PASSWORD:?请设置 HARBOR_PASSWORD 环境变量}"
# 3. 部署服务
log "[3/5] 部署服务..."
docker stack deploy \
-c docker-compose.swarm.yml \
${STACK_NAME}
# 4. 等待部署完成
log "[4/5] 等待部署完成..."
for i in {1..30}; do
RUNNING=$(docker service ps ${STACK_NAME}_app --format "{{.CurrentState}}" | grep -c "Running" || true)
if [ "$RUNNING" -ge 3 ]; then
log "✓ 所有副本已就绪"
break
fi
log "等待中... ($RUNNING/3) ($i/30)"
sleep 10
done
# 5. 验证部署
log "[5/5] 验证部署..."
docker service ps ${STACK_NAME}_app
docker service ls | grep ${STACK_NAME}
log "✓ 部署完成"
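脚本第 4 步通过解析 `docker service ps --format "{{.CurrentState}}"` 的输出来统计就绪副本。这段解析逻辑可以抽成纯文本处理的函数,不依赖 Docker 环境即可单测(示例):

```shell
#!/bin/bash
# 统计 service ps 状态输出中处于 Running 的副本数(示例)
count_running() {
    # 从标准输入逐行读取形如 "Running 5 minutes ago" 的状态文本
    grep -c '^Running' || true   # 无匹配时 grep 退出码为 1,这里只关心计数
}
```

用法示例:`docker service ps ${STACK_NAME}_app --format "{{.CurrentState}}" | count_running`。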
4.2 Kubernetes 部署
4.2.1 K8s 部署脚本
#!/bin/bash
# k8s-deploy.sh - Kubernetes 部署
set -euo pipefail
NAMESPACE="production"
APP_NAME="myapp"
REGISTRY="harbor.example.com"
PROJECT="production"
VERSION="${1:-latest}"
log() {
echo "[$(date +'%H:%M:%S')] $1"
}
log "=== Kubernetes 部署 ==="
# 1. 准备 Deployment YAML
log "[1/6] 准备 Deployment 配置..."
cat > deployment.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${APP_NAME}
  namespace: ${NAMESPACE}
  labels:
    app: ${APP_NAME}
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ${APP_NAME}
  template:
    metadata:
      labels:
        app: ${APP_NAME}
    spec:
      imagePullSecrets:
        - name: harbor-secret   # 引用脚本第 4 步创建的镜像拉取密钥
      containers:
        - name: ${APP_NAME}
          image: ${REGISTRY}/${PROJECT}/${APP_NAME}:${VERSION}
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
EOF
# 2. 准备 Service YAML
log "[2/6] 准备 Service 配置..."
cat > service.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${APP_NAME}
  namespace: ${NAMESPACE}
spec:
  selector:
    app: ${APP_NAME}
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer
EOF
# 3. 创建命名空间
log "[3/6] 创建命名空间..."
kubectl create namespace ${NAMESPACE} --dry-run=client -o yaml | kubectl apply -f -
# 4. 创建镜像拉取密钥
log "[4/6] 创建镜像密钥..."
kubectl create secret docker-registry harbor-secret \
--docker-server=${REGISTRY} \
--docker-username=admin \
--docker-password=Harbor12345 \
--namespace=${NAMESPACE} \
--dry-run=client -o yaml | kubectl apply -f -
# 5. 应用配置
log "[5/6] 应用配置..."
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
# 6. 等待部署完成
log "[6/6] 等待部署完成..."
kubectl rollout status deployment/${APP_NAME} -n ${NAMESPACE}
log "✓ 部署完成"
kubectl get pods -n ${NAMESPACE} -l app=${APP_NAME}
4.3 滚动更新策略
4.3.1 滚动更新配置
# Kubernetes 滚动更新配置
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1             # 最多超出期望副本数的 Pod 数
      maxUnavailable: 0       # 最多允许不可用的 Pod 数
  # 更新参数(与 strategy 同级)
  minReadySeconds: 30           # Pod 就绪后需保持多久才视为可用
  revisionHistoryLimit: 10      # 保留的旧 ReplicaSet 数
  progressDeadlineSeconds: 600  # 更新超时时间(秒)
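上面的参数决定了更新期间 Pod 数量的上下界:最多 replicas + maxSurge 个 Pod,且始终至少有 replicas − maxUnavailable 个可用。换算逻辑可以用一个小函数示意(纯 Bash 示例):

```shell
#!/bin/bash
# 计算滚动更新期间 Pod 数量的上界与可用下界(示例)
rolling_bounds() {
    local replicas=$1 max_surge=$2 max_unavailable=$3
    # 上界 = replicas + maxSurge;可用下界 = replicas - maxUnavailable
    echo "$((replicas + max_surge)) $((replicas - max_unavailable))"
}
```

对本文配置(replicas=3、maxSurge=1、maxUnavailable=0),更新期间最多 4 个 Pod、始终保持 3 个可用,即"先启后停"的零中断更新。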
5. 服务发现与负载均衡
5.1 服务发现机制
5.1.1 Docker Swarm 服务发现
# Docker Swarm 内置 DNS 服务
# 服务名自动解析为 VIP
# 示例:服务间通信
docker network create -d overlay my-network
docker service create \
--name backend \
--network my-network \
harbor.example.com/backend:latest
docker service create \
--name frontend \
--network my-network \
--env BACKEND_URL=http://backend:8080 \
harbor.example.com/frontend:latest
# frontend 服务可以通过 backend 主机名访问 backend 服务
5.2 负载均衡配置
5.2.1 Nginx 负载均衡配置
# nginx-lb.conf - Nginx 负载均衡配置
upstream myapp_backend {
    least_conn;                   # 最少连接算法
    server myapp-1:8080 weight=5;
    server myapp-2:8080 weight=5;
    server myapp-3:8080 weight=5;
    keepalive 32;                 # 与后端保持的空闲长连接数
}
server {
    listen 80;
    server_name myapp.example.com;
    location / {
        proxy_pass http://myapp_backend;
        proxy_http_version 1.1;          # upstream keepalive 需要 HTTP/1.1
        proxy_set_header Connection "";  # 清除 Connection 头以复用长连接
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # 超时配置
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
        # 故障转移(被动健康检查)
        proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
    }
    # 健康检查端点
    location /health {
        access_log off;
        return 200 "healthy\n";
        add_header Content-Type text/plain;
    }
}
5.3 健康检查
5.3.1 健康检查配置
# Kubernetes 健康检查配置
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp
      image: myapp:latest
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 30   # 容器启动后等待时间
        periodSeconds: 10         # 检查间隔
        timeoutSeconds: 5         # 超时时间
        successThreshold: 1       # 成功阈值
        failureThreshold: 3       # 失败阈值
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        timeoutSeconds: 3
        successThreshold: 1
        failureThreshold: 3
      startupProbe:
        httpGet:
          path: /health
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
6. 发布验证与回滚
6.1 发布验证
6.1.1 发布验证脚本
#!/bin/bash
# verify-deployment.sh - 发布验证脚本
set -euo pipefail
NAMESPACE="${1:-production}"
APP_NAME="${2:-myapp}"
log() {
echo "[$(date +'%H:%M:%S')] $1"
}
log "=== 发布验证 ==="
# 1. 检查 Pod 状态
log "[1/5] 检查 Pod 状态..."
kubectl get pods -n ${NAMESPACE} -l app=${APP_NAME}
# 2. 检查副本数
log "[2/5] 检查副本数..."
DESIRED=$(kubectl get deployment ${APP_NAME} -n ${NAMESPACE} -o jsonpath='{.spec.replicas}')
CURRENT=$(kubectl get deployment ${APP_NAME} -n ${NAMESPACE} -o jsonpath='{.status.readyReplicas}')
CURRENT=${CURRENT:-0}   # 刚发布时 readyReplicas 字段可能尚不存在
if [ "$CURRENT" -ge "$DESIRED" ]; then
log "✓ 副本数正常:$CURRENT/$DESIRED"
else
log "✗ 副本数异常:$CURRENT/$DESIRED"
exit 1
fi
# 3. 检查健康状态
log "[3/5] 检查健康状态..."
for pod in $(kubectl get pods -n ${NAMESPACE} -l app=${APP_NAME} -o jsonpath='{.items[*].metadata.name}'); do
kubectl exec -n ${NAMESPACE} $pod -- curl -f http://localhost:8080/health || {
log "✗ Pod 健康检查失败:$pod"
exit 1
}
done
log "✓ 所有 Pod 健康检查通过"
# 4. 检查服务访问
log "[4/5] 检查服务访问..."
SERVICE_IP=$(kubectl get svc ${APP_NAME} -n ${NAMESPACE} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
if curl -f http://${SERVICE_IP}/health; then
log "✓ 服务访问正常"
else
log "✗ 服务访问失败"
exit 1
fi
# 5. 检查日志
log "[5/5] 检查日志..."
kubectl logs -n ${NAMESPACE} -l app=${APP_NAME} --tail=20
log "✓ 发布验证完成"
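验证脚本中的健康检查、服务访问等步骤在刚发布后可能短暂失败,实践中通常会加一层重试。下面是一个通用的重试函数示意(示例):

```shell
#!/bin/bash
# retry <次数> <间隔秒> <命令...> - 命令失败时重试,全部失败返回非 0(示例)
retry() {
    local attempts=$1 delay=$2 i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        "$@" && return 0
        echo "第 ${i}/${attempts} 次尝试失败,${delay}s 后重试..." >&2
        sleep "$delay"
    done
    return 1
}
```

用法示例:`retry 5 10 curl -f http://${SERVICE_IP}/health`(SERVICE_IP 取自上文脚本)。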
6.2 自动回滚机制
6.2.1 自动回滚脚本
#!/bin/bash
# auto-rollback.sh - 自动回滚脚本
set -euo pipefail
NAMESPACE="${1:-production}"
APP_NAME="${2:-myapp}"
ERROR_THRESHOLD="${3:-5}" # 错误率阈值
log() {
echo "[$(date +'%H:%M:%S')] $1"
}
log "=== 监控部署状态 ==="
# 获取当前版本
CURRENT_VERSION=$(kubectl get deployment ${APP_NAME} -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.containers[0].image}')
# 监控错误率
while true; do
# 获取 5xx 请求速率(从 Prometheus;查询无结果时按 0 处理,避免 bc 报错)
ERROR_RATE=$(curl -s "http://prometheus:9090/api/v1/query" \
    --data-urlencode "query=rate(http_requests_total{status=~\"5..\",app=\"${APP_NAME}\"}[5m])" | \
    jq -r '.data.result[0].value[1] // "0"')
log "当前错误率:${ERROR_RATE}"
# 检查是否超过阈值
if (( $(echo "$ERROR_RATE > $ERROR_THRESHOLD" | bc -l) )); then
log "✗ 错误率超过阈值,触发自动回滚"
# 获取上一版本
PREV_REVISION=$(kubectl rollout history deployment/${APP_NAME} -n ${NAMESPACE} | tail -2 | head -1 | awk '{print $1}')
# 执行回滚
kubectl rollout undo deployment/${APP_NAME} -n ${NAMESPACE} --to-revision=${PREV_REVISION}
log "✓ 已回滚到版本:$PREV_REVISION"
# 发送告警
curl -X POST "http://alertmanager:9093/api/v1/alerts" \
-H "Content-Type: application/json" \
-d "{
\"alerts\": [{
\"labels\": {
\"severity\": \"critical\",
\"app\": \"${APP_NAME}\"
},
\"annotations\": {
\"summary\": \"自动回滚触发\",
\"description\": \"错误率超过阈值,已回滚到版本 ${PREV_REVISION}\"
}
}]
}"
exit 1
fi
sleep 60
done
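脚本中用 bc 做浮点比较,而 Prometheus 查询无数据时会得到空串或 null,直接送入 bc 会报错。可以用 awk 写一个更健壮的比较函数(示例,函数名为示意性假设):

```shell
#!/bin/bash
# exceeds <值> <阈值> - 浮点比较;空串 / null 按 0 处理(示例)
exceeds() {
    local value=$1 threshold=$2
    if [ -z "$value" ] || [ "$value" = "null" ]; then
        value=0   # Prometheus 查询无结果时 jq 会输出 null 或空串
    fi
    # "+ 0" 强制 awk 按数值而非字符串比较
    awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v + 0 > t + 0) }'
}
```

用于替换脚本中的 bc 比较:`if exceeds "$ERROR_RATE" "$ERROR_THRESHOLD"; then ... fi`(注意给变量加引号,空值才能正确传入)。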
6.3 灰度发布
6.3.1 金丝雀发布脚本
#!/bin/bash
# canary-release.sh - 金丝雀发布脚本
set -euo pipefail
NAMESPACE="${1:-production}"
APP_NAME="${2:-myapp}"
NEW_VERSION="${3:-latest}"
CANARY_PERCENT="${4:-10}"
log() {
echo "[$(date +'%H:%M:%S')] $1"
}
log "=== 金丝雀发布 ==="
# 1. 获取当前副本数
TOTAL_REPLICAS=$(kubectl get deployment ${APP_NAME} -n ${NAMESPACE} -o jsonpath='{.spec.replicas}')
CANARY_REPLICAS=$((TOTAL_REPLICAS * CANARY_PERCENT / 100))
if [ "$CANARY_REPLICAS" -lt 1 ]; then
    CANARY_REPLICAS=1   # 整数除法向下取整,小集群按比例可能得到 0
fi
STABLE_REPLICAS=$((TOTAL_REPLICAS - CANARY_REPLICAS))
log "总副本数:$TOTAL_REPLICAS"
log "金丝雀副本:$CANARY_REPLICAS"
log "稳定副本:$STABLE_REPLICAS"
# 2. 创建金丝雀部署
log "[1/4] 创建金丝雀部署..."
cat > canary-deployment.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${APP_NAME}-canary
  namespace: ${NAMESPACE}
spec:
  replicas: ${CANARY_REPLICAS}
  selector:
    matchLabels:
      app: ${APP_NAME}
      version: canary
  template:
    metadata:
      labels:
        app: ${APP_NAME}
        version: canary
    spec:
      containers:
        - name: ${APP_NAME}
          image: harbor.example.com/${APP_NAME}:${NEW_VERSION}
          ports:
            - containerPort: 8080
EOF
kubectl apply -f canary-deployment.yaml
# 3. 等待金丝雀部署(稳定版本此时仍运行旧版,待金丝雀验证通过后再全量更新)
log "[2/4] 等待金丝雀部署..."
kubectl rollout status deployment/${APP_NAME}-canary -n ${NAMESPACE}
# 4. 监控金丝雀指标
log "[3/4] 监控金丝雀指标(60 秒)..."
sleep 60
# 检查错误率(查询无结果时按 0 处理)
CANARY_ERROR=$(curl -s "http://prometheus:9090/api/v1/query" \
    --data-urlencode "query=rate(http_requests_total{status=~\"5..\",app=\"${APP_NAME}\",version=\"canary\"}[5m])" | \
    jq -r '.data.result[0].value[1] // "0"')
log "金丝雀错误率:$CANARY_ERROR"
if (( $(echo "$CANARY_ERROR > 0.05" | bc -l) )); then
    log "✗ 金丝雀发布失败,错误率过高"
    kubectl delete deployment ${APP_NAME}-canary -n ${NAMESPACE}
    exit 1
fi
# 5. 金丝雀验证通过,再全量更新稳定版本
log "[4/4] 全量更新稳定版本..."
kubectl set image deployment/${APP_NAME} \
    ${APP_NAME}=harbor.example.com/${APP_NAME}:${NEW_VERSION} \
    -n ${NAMESPACE}
kubectl rollout status deployment/${APP_NAME} -n ${NAMESPACE}
# 清理金丝雀部署
kubectl delete deployment ${APP_NAME}-canary -n ${NAMESPACE}
log "✓ 金丝雀发布成功"
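金丝雀副本数由百分比换算而来,整数除法在小副本数时会得到 0,因此需要保底 1 个副本。换算逻辑可以抽成独立函数单测(示例):

```shell
#!/bin/bash
# 按百分比拆分金丝雀 / 稳定副本数,金丝雀至少 1 个(示例)
canary_split() {
    local total=$1 percent=$2
    local canary=$((total * percent / 100))
    if [ "$canary" -lt 1 ]; then
        canary=1   # 整数除法向下取整,小集群按比例会得到 0
    fi
    echo "$canary $((total - canary))"
}
```

例如 3 副本、10% 金丝雀时,按比例为 0.3 个,保底后实际为 1 个金丝雀 + 2 个稳定副本。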
7. 监控与日志
7.1 容器监控
7.1.1 Prometheus 监控配置
# prometheus-rules.yml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: container-alerts
  namespace: monitoring
spec:
  groups:
    - name: container.rules
      rules:
        - alert: ContainerHighCPU
          expr: rate(container_cpu_usage_seconds_total[5m]) > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "容器 CPU 使用率过高"
            description: "{{ $labels.container }} CPU 使用率超过 80%"
        - alert: ContainerHighMemory
          expr: container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.9
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "容器内存使用率过高"
            description: "{{ $labels.container }} 内存使用率超过 90%"
        - alert: ContainerRestarted
          # 重启检测依赖 kube-state-metrics 暴露的重启计数指标
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 0
          labels:
            severity: critical
          annotations:
            summary: "容器频繁重启"
            description: "{{ $labels.container }} 在 15 分钟内发生过重启"
7.2 日志收集
7.2.1 EFK Stack 配置
# fluentd-config.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: logging
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type json
      </parse>
    </source>
    <filter kubernetes.**>
      @type kubernetes_metadata
      @id filter_kube_metadata
    </filter>
    <match **>
      @type elasticsearch
      host elasticsearch.logging.svc.cluster.local
      port 9200
      logstash_format true
      logstash_prefix kubernetes
      flush_interval 5s
    </match>
7.3 告警配置
7.3.1 告警规则配置
# alertmanager-config.yml
route:
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver'
  routes:
    - match:
        severity: critical
      receiver: 'critical-receiver'
    - match:
        severity: warning
      receiver: 'warning-receiver'
receivers:
  - name: 'default-receiver'
    email_configs:
      - to: 'team@example.com'
        send_resolved: true
  - name: 'critical-receiver'
    webhook_configs:
      - url: 'http://dingtalk-webhook:8080/alerts'
        send_resolved: true
  - name: 'warning-receiver'
    email_configs:
      - to: 'dev@example.com'
        send_resolved: true
8. 总结
8.1 核心技术要点
- 代码管理:Git Flow 工作流、规范化提交
- 镜像构建:多阶段构建、镜像优化
- 仓库管理:Harbor 配置、安全扫描
- 容器部署:Swarm/K8s 部署、滚动更新
- 服务发现:负载均衡、健康检查
- 发布验证:自动化验证、回滚机制
- 监控告警:指标监控、日志收集
8.2 最佳实践清单
✅ 构建优化:
- 使用多阶段构建
- 选择最小基础镜像
- 利用层缓存
- 配置 .dockerignore
✅ 部署规范:
- 配置健康检查
- 实施滚动更新
- 配置资源限制
- 使用非 root 用户
✅ 安全加固:
- 镜像安全扫描
- 使用私有仓库
- 配置网络策略
- 最小权限原则
✅ 监控运维:
- 配置监控指标
- 收集应用日志
- 设置告警规则
- 实施自动回滚
附录 A:完整发布脚本
完整发布脚本已在上文各章节提供,包括:
- Git 提交脚本
- 镜像构建脚本
- 镜像推送脚本
- Swarm 部署脚本
- K8s 部署脚本
- 金丝雀发布脚本
- 自动回滚脚本
附录 B:故障排查指南
# 容器故障排查 SOP
# 1. 检查容器状态
docker ps -a
kubectl get pods -n <namespace>
# 2. 查看容器日志
docker logs <container>
kubectl logs -n <namespace> <pod>
# 3. 进入容器调试
docker exec -it <container> sh
kubectl exec -it -n <namespace> <pod> -- sh
# 4. 检查资源使用
docker stats
kubectl top pods -n <namespace>
# 5. 网络连通性
docker network inspect <network>
kubectl describe pod <pod> -n <namespace>
# 6. 健康检查
curl http://localhost:8080/health
kubectl get endpoints <service> -n <namespace>
文档版本: V1.0
最后更新: 2026-03-12
作者: AI 技术助手
许可协议: CC BY-SA 4.0