Orchestrating an AI Application Stack with Docker Compose: One-Command Startup for LLM + PGVector + Redis
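Preface
💡 Pain point: Does every new AI development environment mean installing a pile of dependencies by hand? Ollama, PostgreSQL, Redis, PGVector... and afterwards configuring environment variables and resolving port conflicts?
🎯 Solution: Use Docker Compose to bring up the whole AI development stack with a single command, so that all services start together.
Docker Compose is Docker's official tool for defining and running multi-container applications. It gives you:
- 📦 One-command startup: define the configuration once, then start every service with one command
- 🔄 Environment isolation: each project gets its own environment and does not interfere with the others
- 🎯 Version pinning: dependency versions are fixed, which avoids the "works on my machine" problem
- 🚀 Fast migration: copy docker-compose.yml to a new machine and be up and running in about 5 minutes
This article walks through building a complete AI application development stack with Docker Compose: Ollama for local LLM inference, PostgreSQL with PGVector as the vector database, Redis for caching, and a FastAPI service in front of them.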
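1. Why Build an AI Development Environment with Docker Compose?
1.1 Traditional Installation vs. Docker Compose
1.2 Problems with Traditional Installation
| Issue | Traditional installation | Docker Compose |
|---|---|---|
| ⏱️ Setup time | About 30-60 minutes | About 3-5 minutes |
| 🔧 Configuration complexity | High (manual configuration) | Low (one YAML file) |
| 🔄 Version management | Ad hoc | Pinned image tags |
| 🧹 Uninstall and cleanup | Tedious | docker compose down |
| 🔁 Environment consistency | Hard to guarantee | Same images and configuration on every machine |
| 💻 Multiple projects | Port and dependency conflicts | Isolated per project |
1.3 AI Development Stack Components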
2. Docker and Docker Compose Quick Start
2.1 Installing Docker
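macOS / Windows:
# 1. Download Docker Desktop (on Windows it requires WSL 2)
# https://www.docker.com/products/docker-desktop/
# 2. Install and launch it
# 3. Verify the installation
docker --version
# Docker version 26.0.0, build 61b9c72
docker compose version
# Docker Compose version v2.24.0
Linux (Ubuntu/Debian):
# Option 1: convenience install script
curl -fsSL https://get.docker.com | sh
# Option 2: distribution packages
sudo apt update
sudo apt install docker.io docker-compose-v2
# Add your user to the docker group (run docker without sudo)
sudo usermod -aG docker $USER
# Apply the new group in the current shell (or log out and back in)
newgrp docker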
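2.2 Core Docker Concepts
| Concept | Description | Analogy |
|---|---|---|
| Image | Read-only template for an application | Architectural blueprint |
| Container | Running instance of an image | The finished house |
| Dockerfile | Script that builds an image | Construction plan |
| Volume | Persistent storage | Basement |
| Network | Communication between containers | Local area network |
2.3 Core Docker Compose Concepts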
3. Basic Template: Ollama + FastAPI + Redis
3.1 Project Structure
ai-stack/
├── docker-compose.yml     # ⭐ Core configuration file
├── Dockerfile             # Image build for the API service
├── .env                   # Environment variables
├── requirements.txt       # Python dependencies
├── app/
│   ├── main.py            # FastAPI entry point
│   ├── config.py          # Configuration management
│   ├── routers/           # API routes
│   │   ├── chat.py        # Chat endpoint
│   │   └── embed.py       # Embedding endpoint
│   └── models/            # Data models
└── data/                  # Persistent data
    ├── ollama/            # Ollama model storage
    ├── postgres/          # PostgreSQL data
    └── redis/             # Redis data
3.2 Basic docker-compose.yml
# docker-compose.yml
# Basic AI development stack: Ollama + FastAPI + Redis
# (the top-level `version` key is obsolete in Compose v2 and is omitted)
services:
  # ═══════════════════════════════════════════
  # 🤖 Ollama - local LLM inference service
  # ═══════════════════════════════════════════
  ollama:
    image: ollama/ollama:latest   # pin a specific tag for reproducible setups
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"   # Ollama API port
    volumes:
      - ./data/ollama:/root/.ollama   # persist downloaded models
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
      - OLLAMA_MODELS=/root/.ollama/models
    # GPU support: requires an NVIDIA GPU and the NVIDIA Container Toolkit.
    # Remove the whole `deploy` block on machines without one.
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    networks:
      - ai-network
  # ═══════════════════════════════════════════
  # 🔴 Redis - cache layer
  # ═══════════════════════════════════════════
  redis:
    image: redis:7-alpine
    container_name: redis
    restart: unless-stopped
    ports:
      - "6379:6379"
    volumes:
      - ./data/redis:/data
    command: redis-server --appendonly yes
    networks:
      - ai-network
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
  # ═══════════════════════════════════════════
  # 🌐 FastAPI - API service
  # ═══════════════════════════════════════════
  api:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: ai-api
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      # Inside the Compose network, services reach each other by service name
      - OLLAMA_BASE_URL=http://ollama:11434
      - REDIS_HOST=redis
      - REDIS_PORT=6379
    depends_on:
      ollama:
        condition: service_started
      redis:
        condition: service_healthy
    networks:
      - ai-network
# ═══════════════════════════════════════════
# 🌐 Network configuration
# ═══════════════════════════════════════════
networks:
  ai-network:
    driver: bridge
# ═══════════════════════════════════════════
# 💾 Data persistence
# ═══════════════════════════════════════════
# This file persists data through bind mounts under ./data, so no named
# volumes are declared here. The production file in section 4.1 uses named
# volumes instead.
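`depends_on` with `condition: service_healthy` only orders the initial startup; it does not help when a dependency restarts later. The following is a minimal sketch of an application-side retry with exponential backoff. The helper name `retry_with_backoff`, the attempt count, and the delays are illustrative assumptions, not part of the original project:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `attempts` is exhausted.

    The wait doubles after every failure: base_delay, 2 * base_delay, ...
    `sleep` is injectable so the function can be exercised without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

In the API service this could wrap the first Redis or Ollama call made at startup, so that a dependency that is a few seconds late does not crash the container.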
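3.3 Dockerfile (FastAPI Service)
# Dockerfile
FROM python:3.11-slim
# Set the working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Copy the dependency list first so this layer is cached between code changes
COPY requirements.txt .
# Install Python dependencies
# (the -i option selects the Tsinghua PyPI mirror; drop it outside mainland China)
RUN pip install --no-cache-dir -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
# Copy the application code
COPY app/ ./app/
# Document the listening port
EXPOSE 8000
# Start command. --reload is left out because the code is baked into the image;
# add it only for development with ./app bind-mounted into the container.
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]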
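3.4 requirements.txt
# requirements.txt
# Core dependencies for AI development
# Web framework
fastapi==0.111.0
uvicorn[standard]==0.30.0
pydantic==2.8.0
# Required by app/config.py (BaseSettings lives in a separate package in Pydantic v2)
pydantic-settings>=2.0,<3
# LLM integration
langchain==0.2.0
langchain-community==0.2.0
langchain-ollama==0.1.0
# Redis client (redis 5.x ships asyncio support as redis.asyncio,
# so the deprecated aioredis package is not needed)
redis==5.0.0
# Utilities
python-dotenv==1.0.0
httpx==0.27.0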
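3.5 Environment Variable File
# .env
# Environment variables for running the API directly on the host.
# Inside Docker Compose, the `environment:` entries of the api service
# override these with the service names (ollama, redis).
# Ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3:8b
# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
# API
API_HOST=0.0.0.0
API_PORT=8000
# Logging
LOG_LEVEL=INFO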
3.6 FastAPI Entry Point
"""
app/main.py - FastAPI entry point
"""
from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.config import get_settings  # config.py exposes get_settings(), not `settings`
from app.routers import chat, embed

settings = get_settings()


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Application lifecycle hooks."""
    # On startup
    print("🚀 AI API service starting...")
    yield
    # On shutdown
    print("👋 AI API service shutting down...")


# Create the FastAPI application
app = FastAPI(
    title="AI Stack API",
    description="API service for the Docker Compose AI development stack",
    version="1.0.0",
    lifespan=lifespan,
)

# Configure CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # restrict this in production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Register routers
app.include_router(chat.router, prefix="/api/v1/chat", tags=["chat"])
app.include_router(embed.router, prefix="/api/v1/embed", tags=["embedding"])


@app.get("/")
async def root():
    """Basic liveness check."""
    return {
        "status": "healthy",
        "service": "AI Stack API",
        "version": "1.0.0",
    }


@app.get("/health")
async def health_check():
    """Detailed health check.

    Note: the values below are static placeholders; this handler does not
    actually probe Ollama or Redis.
    """
    return {
        "status": "healthy",
        "ollama": "connected",
        "redis": "connected",
    }
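The `/health` handler above always reports "connected". A minimal sketch of how the status could be derived from a real probe, assuming a plain TCP connection attempt is a good enough signal; the helper names `tcp_reachable` and `dependency_status` are hypothetical and use only the standard library:

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def dependency_status(targets: dict[str, tuple[str, int]]) -> dict[str, str]:
    """Map each dependency name to 'connected' or 'unreachable'."""
    return {
        name: "connected" if tcp_reachable(host, port) else "unreachable"
        for name, (host, port) in targets.items()
    }
```

Inside the Compose network the call might look like `dependency_status({"ollama": ("ollama", 11434), "redis": ("redis", 6379)})`. A TCP check only shows that the port accepts connections; an application-level check (for example Redis `PING`) is stricter.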
3.7 Configuration Management
"""
app/config.py - configuration management
"""
from functools import lru_cache

from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    """Application settings, overridable through environment variables."""
    # Ollama
    ollama_base_url: str = "http://localhost:11434"
    ollama_model: str = "llama3:8b"
    ollama_timeout: int = 120
    # Redis
    redis_host: str = "localhost"
    redis_port: int = 6379
    redis_db: int = 0
    # API
    api_host: str = "0.0.0.0"
    api_port: int = 8000
    # Logging
    log_level: str = "INFO"

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"
        # Tolerate keys in .env that have no matching field
        extra = "ignore"


@lru_cache()
def get_settings() -> Settings:
    """Return the settings object (cached, so effectively a singleton)."""
    return Settings()
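The same code can talk to `localhost` on the host and to `ollama` inside Compose because values are resolved in layers: process environment first, then the `.env` file, then the default in code. The sketch below imitates that order with the standard library only; `resolve_setting` is a hypothetical helper for illustration, not part of pydantic-settings:

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    default: str,
    dotenv: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve one setting: process environment > .env file > code default."""
    environ = os.environ if environ is None else environ
    dotenv = {} if dotenv is None else dotenv
    key = name.upper()
    if key in environ:
        return environ[key]
    if key in dotenv:
        return dotenv[key]
    return default


# Values as they would appear on the host (.env) and in the api container
dotenv_file = {"OLLAMA_BASE_URL": "http://localhost:11434"}
container_env = {"OLLAMA_BASE_URL": "http://ollama:11434"}

on_host = resolve_setting("ollama_base_url", "http://localhost:11434",
                          dotenv=dotenv_file, environ={})
in_container = resolve_setting("ollama_base_url", "http://localhost:11434",
                               dotenv=dotenv_file, environ=container_env)
```

Here `on_host` resolves to the `.env` value and `in_container` to the value injected by the Compose `environment:` section.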
3.8 Router: Chat Endpoint
"""
app/routers/chat.py - chat router
"""
from typing import List, Optional

import httpx
from fastapi import APIRouter, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from app.config import get_settings

router = APIRouter()


class ChatMessage(BaseModel):
    """A single chat message."""
    role: str  # user / assistant
    content: str


class ChatRequest(BaseModel):
    """Chat request body."""
    messages: List[ChatMessage]
    model: Optional[str] = None  # falls back to settings.ollama_model
    stream: Optional[bool] = False
    temperature: Optional[float] = 0.7
    max_tokens: Optional[int] = 2048


class ChatResponse(BaseModel):
    """Chat response body."""
    model: str
    message: ChatMessage
    done: bool
    total_duration: Optional[int] = None


@router.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    """
    Ollama chat endpoint.
    Forwards the user's messages to Ollama and returns the reply.
    """
    settings = get_settings()
    # Build the Ollama API request
    payload = {
        "model": request.model or settings.ollama_model,
        "messages": [msg.model_dump() for msg in request.messages],
        # This endpoint parses a single JSON document, so streaming is always
        # disabled here; use /chat/stream for streamed output.
        "stream": False,
        "options": {
            "temperature": request.temperature,
            "num_predict": request.max_tokens,
        },
    }
    try:
        async with httpx.AsyncClient(timeout=settings.ollama_timeout) as client:
            response = await client.post(
                f"{settings.ollama_base_url}/api/chat",
                json=payload,
            )
            response.raise_for_status()
            result = response.json()
        return ChatResponse(
            model=result["model"],
            message=ChatMessage(
                role=result["message"]["role"],
                content=result["message"]["content"],
            ),
            done=result["done"],
            total_duration=result.get("total_duration"),
        )
    except httpx.TimeoutException:
        raise HTTPException(status_code=504, detail="Ollama request timed out")
    except httpx.HTTPStatusError as e:
        raise HTTPException(status_code=e.response.status_code, detail=str(e))
    except httpx.RequestError as e:
        raise HTTPException(status_code=502, detail=f"Cannot reach Ollama: {e}")


@router.post("/chat/stream")
async def chat_stream(request: ChatRequest):
    """
    Streaming chat endpoint.
    Returns Server-Sent Events (SSE); each event carries one JSON chunk from Ollama.
    """
    settings = get_settings()
    payload = {
        "model": request.model or settings.ollama_model,
        "messages": [msg.model_dump() for msg in request.messages],
        "stream": True,
    }

    async def generate():
        async with httpx.AsyncClient(timeout=settings.ollama_timeout) as client:
            async with client.stream(
                "POST",
                f"{settings.ollama_base_url}/api/chat",
                json=payload,
            ) as response:
                async for line in response.aiter_lines():
                    if line:
                        yield f"data: {line}\n\n"

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
    )
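On the client side, the stream produced by `/chat/stream` has to be reassembled into text. A minimal sketch, assuming each SSE `data:` line carries one Ollama JSON chunk with a `message.content` field and a `done` flag; the function name `collect_stream_text` is hypothetical:

```python
import json
from typing import Iterable


def collect_stream_text(sse_lines: Iterable[str]) -> str:
    """Concatenate the assistant text from 'data: {...}' SSE lines."""
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        chunk = json.loads(line[len("data:"):].strip())
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final chunk marks the end of the reply
    return "".join(parts)


# Example input shaped like the endpoint's output
sample = [
    'data: {"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    "",
    'data: {"message": {"role": "assistant", "content": "lo"}, "done": false}',
    "",
    'data: {"message": {"role": "assistant", "content": ""}, "done": true}',
]
reply = collect_stream_text(sample)
```

With an HTTP client that yields response lines, the same function can be fed the live stream instead of the `sample` list.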
3.9 Router: Embedding Endpoint
"""
app/routers/embed.py - embedding router
"""
from typing import List

import httpx
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

from app.config import get_settings

router = APIRouter()


class EmbedRequest(BaseModel):
    """Embedding request body."""
    model: str = "nomic-embed-text"
    prompt: str


class EmbedResponse(BaseModel):
    """Embedding response body."""
    model: str
    embedding: List[float]
    duration: int


@router.post("/embed", response_model=EmbedResponse)
async def embed(request: EmbedRequest):
    """
    Ollama embedding endpoint.
    Converts text into its vector representation.
    """
    settings = get_settings()
    payload = {
        "model": request.model,
        "prompt": request.prompt,
    }
    try:
        async with httpx.AsyncClient(timeout=settings.ollama_timeout) as client:
            response = await client.post(
                f"{settings.ollama_base_url}/api/embeddings",
                json=payload,
            )
            response.raise_for_status()
            result = response.json()
        return EmbedResponse(
            # The response may contain only the "embedding" field, so fall
            # back to the requested model name instead of indexing directly.
            model=result.get("model", request.model),
            embedding=result["embedding"],
            duration=result.get("duration", 0),
        )
    except httpx.HTTPStatusError as e:
        raise HTTPException(
            status_code=e.response.status_code,
            detail=f"Embedding request failed: {str(e)}",
        )
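The endpoint embeds one prompt per call, so longer documents are usually split into pieces before they are sent. A minimal character-based sketch, assuming fixed-size chunks with a small overlap are acceptable; the function name `chunk_text` and the default sizes are illustrative assumptions, and real pipelines often split on sentences or tokens instead:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in one of the two chunks.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks
```

Each chunk can then be posted to the embedding endpoint as its own `prompt`, with its position kept as an index.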
4. Production-Grade Configuration: Ollama + PGVector + Redis + FastAPI
4.1 Complete docker-compose.yml
# docker-compose.prod.yml
# Production-oriented AI stack
# (the top-level `version` key is obsolete in Compose v2 and is omitted)
services:
  # ═══════════════════════════════════════════
  # 🤖 Ollama - local LLM inference service
  # ═══════════════════════════════════════════
  ollama:
    image: ollama/ollama:latest   # pin a specific tag for production
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"   # Ollama API
    volumes:
      - ollama-data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=2
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    healthcheck:
      # The ollama image does not appear to ship curl, so the check uses
      # the bundled CLI, which queries the local server.
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - ai-network
  # ═══════════════════════════════════════════
  # 🐘 PostgreSQL + PGVector - vector database
  # ═══════════════════════════════════════════
  postgres:
    # The ankane/pgvector image has been superseded by pgvector/pgvector
    image: pgvector/pgvector:0.7.0-pg16
    container_name: postgres
    restart: unless-stopped
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=ai_knowledge
      - POSTGRES_USER=ai_user
      - POSTGRES_PASSWORD=ai_secure_password_2024   # example only, see section 7.1
    # pgvector is enabled per database with CREATE EXTENSION and does not
    # need shared_preload_libraries, so that option is not set here.
    command: >
      postgres
      -c max_connections=100
      -c shared_buffers=256MB
      -c effective_cache_size=512MB
      -c maintenance_work_mem=128MB
      -c work_mem=64MB
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ai_user -d ai_knowledge"]
      interval: 10s
      timeout: 5s
      retries: 3
    networks:
      - ai-network
  # ═══════════════════════════════════════════
  # 🔴 Redis - cache + session storage
  # ═══════════════════════════════════════════
  redis:
    image: redis:7-alpine
    container_name: redis
    restart: unless-stopped
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
      - ./redis.conf:/usr/local/etc/redis/redis.conf
    command: redis-server /usr/local/etc/redis/redis.conf
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    networks:
      - ai-network
  # ═══════════════════════════════════════════
  # 🌐 FastAPI - API service
  # ═══════════════════════════════════════════
  api:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: ai-api
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - OLLAMA_MODEL=llama3:8b
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - POSTGRES_HOST=postgres
      - POSTGRES_PORT=5432
      - POSTGRES_DB=ai_knowledge
      - POSTGRES_USER=ai_user
      - POSTGRES_PASSWORD=ai_secure_password_2024
      - LOG_LEVEL=INFO
    depends_on:
      ollama:
        condition: service_healthy
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - ai-network
  # ═══════════════════════════════════════════
  # 🎨 Streamlit - Web UI
  # ═══════════════════════════════════════════
  webui:
    image: python:3.11-slim
    container_name: streamlit-ui
    restart: unless-stopped
    working_dir: /app
    ports:
      - "8501:8501"
    volumes:
      - ./webui:/app
    # Dependencies are installed on every container start; build a dedicated
    # image if startup time matters.
    command: >
      sh -c "pip install streamlit requests pandas -q &&
      streamlit run app.py --server.port=8501 --server.address=0.0.0.0"
    environment:
      - API_BASE_URL=http://api:8000
    depends_on:
      - api
    networks:
      - ai-network
# ═══════════════════════════════════════════
# 🌐 Network configuration
# ═══════════════════════════════════════════
networks:
  ai-network:
    driver: bridge
# ═══════════════════════════════════════════
# 💾 Named volumes for persistence
# ═══════════════════════════════════════════
volumes:
  ollama-data:
    driver: local
  postgres-data:
    driver: local
  redis-data:
    driver: local
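4.2 Redis Configuration File
# redis.conf
# Redis configuration file
# Network
bind 0.0.0.0
port 6379
# Protected mode is off and no password is set, so do not expose port 6379
# beyond the Compose network without enabling requirepass below.
protected-mode no
# Persistence
appendonly yes
appendfsync everysec
# Memory
maxmemory 512mb
maxmemory-policy allkeys-lru
# Logging
loglevel notice
logfile ""
# Security (enable in production)
# requirepass your_redis_password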
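With `allkeys-lru`, Redis acts as a bounded cache, so the API needs stable keys to look up earlier results. One possible way to derive such a key for a chat request is sketched below; the `chat:` prefix, the function name `chat_cache_key`, and the choice of fields are assumptions for illustration, not something the original project defines:

```python
import hashlib
import json


def chat_cache_key(model: str, messages: list[dict], temperature: float) -> str:
    """Build a deterministic Redis key for a chat request.

    The request is serialized canonically (sorted keys, no extra spaces)
    and hashed, so equal requests map to the same key regardless of the
    order in which dictionary fields were written.
    """
    canonical = json.dumps(
        {"model": model, "messages": messages, "temperature": temperature},
        sort_keys=True,
        ensure_ascii=False,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"chat:{model}:{digest}"
```

Caching model output is mainly useful when generation is close to deterministic (for example a temperature of 0); with sampling enabled, a cached reply simply pins one of many possible answers.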
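4.3 PostgreSQL Initialization Script
Create init-scripts/01-init.sql:
-- 01-init.sql
-- PostgreSQL initialization script
-- Enable the vector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Example table for document chunks and their embeddings
CREATE TABLE IF NOT EXISTS document_chunks (
    id SERIAL PRIMARY KEY,
    document_id INTEGER NOT NULL,
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    embedding VECTOR(768), -- dimension of nomic-embed-text
    metadata JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Vector index to speed up similarity search.
-- IVFFlat clusters are computed from existing rows, so on a real dataset
-- consider creating this index after the data has been loaded.
CREATE INDEX IF NOT EXISTS idx_chunks_embedding
ON document_chunks
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Full-text search index.
-- Stock PostgreSQL has no 'chinese' text search configuration, so 'simple'
-- is used here; Chinese word segmentation needs an extension such as zhparser.
CREATE INDEX IF NOT EXISTS idx_chunks_content
ON document_chunks
USING gin (to_tsvector('simple', content));
-- Index for lookups by document
CREATE INDEX IF NOT EXISTS idx_chunks_doc_id
ON document_chunks (document_id);
-- Show the table structure (psql meta-command)
\d document_chunks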
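The `vector_cosine_ops` index above serves queries ordered by pgvector's `<=>` operator, which returns the cosine distance (1 minus the cosine similarity). The sketch below computes the same quantity in plain Python and formats a list of floats in pgvector's text form; both helper names are hypothetical, and no database connection is involved:

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # This sketch refuses zero vectors, for which the value is undefined
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats in pgvector's text input form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

A distance of 0 means the vectors point the same way, 1 means they are orthogonal, and 2 means they are opposite. The literal can be passed as a query parameter and cast to `vector`, for example in `ORDER BY embedding <=> $1::vector LIMIT 5`.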
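4.4 Adding the Initialization Script to docker-compose
# Add to the postgres service
postgres:
  image: pgvector/pgvector:0.7.0-pg16
  # ... other settings ...
  volumes:
    - postgres-data:/var/lib/postgresql/data
    # ⭐ Initialization scripts; they run only when the data directory is
    # empty, i.e. on the first start of a fresh volume.
    - ./init-scripts:/docker-entrypoint-initdb.d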
5. Advanced Configuration: Multiple Models + Load Balancing
5.1 Multiple Ollama Instances (Load Balanced)
# docker-compose.ha.yml
# High-availability setup: several Ollama instances behind an Nginx load balancer.
# Each instance has its own volume, so models must be pulled into every
# instance. With `count: all`, all instances share the same GPUs.
services:
  # Nginx load balancer
  nginx:
    image: nginx:alpine
    container_name: nginx-lb
    restart: unless-stopped
    ports:
      - "11434:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - ollama-1
      - ollama-2
      - ollama-3
    networks:
      - ai-network
  # Ollama instance 1
  ollama-1:
    image: ollama/ollama:latest
    container_name: ollama-1
    restart: unless-stopped
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
      - OLLAMA_NUM_PARALLEL=2
    volumes:
      - ollama-data-1:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    networks:
      - ai-network
  # Ollama instance 2
  ollama-2:
    image: ollama/ollama:latest
    container_name: ollama-2
    restart: unless-stopped
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
      - OLLAMA_NUM_PARALLEL=2
    volumes:
      - ollama-data-2:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    networks:
      - ai-network
  # Ollama instance 3 (standby, can host different models)
  ollama-3:
    image: ollama/ollama:latest
    container_name: ollama-3
    restart: unless-stopped
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
      - OLLAMA_NUM_PARALLEL=1
    volumes:
      - ollama-data-3:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    networks:
      - ai-network
volumes:
  ollama-data-1:
  ollama-data-2:
  ollama-data-3:
networks:
  ai-network:
    driver: bridge
5.2 Nginx Load Balancer Configuration
# nginx.conf
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
    worker_connections 1024;
}
http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;
    sendfile on;
    keepalive_timeout 65;
    # Upstream Ollama instances
    upstream ollama_backend {
        least_conn;  # fewest active connections, taking weights into account
        server ollama-1:11434 weight=3;
        server ollama-2:11434 weight=3;
        server ollama-3:11434 weight=2;
    }
    server {
        listen 80;
        server_name localhost;
        location / {
            proxy_pass http://ollama_backend;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection 'upgrade';
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_cache_bypass $http_upgrade;
            # Pass streamed tokens through as they arrive instead of buffering
            proxy_buffering off;
            # Timeouts (generation can take a while)
            proxy_connect_timeout 60s;
            proxy_send_timeout 300s;
            proxy_read_timeout 300s;
        }
        # Health check of the proxy itself (does not probe the backends)
        location /health {
            return 200 'OK';
            add_header Content-Type text/plain;
        }
    }
}
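To make the effect of `least_conn` with weights more concrete, here is a simplified model of the selection rule: pick the backend whose number of active connections is lowest relative to its weight. This is a sketch for intuition only, with hypothetical names `Backend` and `pick_backend`; it does not reproduce how nginx breaks ties or handles failed servers:

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int
    active: int = 0  # connections currently in flight


def pick_backend(backends: list[Backend]) -> Backend:
    """Pick the backend with the lowest active/weight ratio.

    Ties go to the earliest backend in the list, which is a simplification
    made by this sketch.
    """
    if not backends:
        raise ValueError("no backends configured")
    return min(backends, key=lambda b: b.active / b.weight)


pool = [
    Backend("ollama-1", weight=3, active=3),  # ratio 1.00
    Backend("ollama-2", weight=3, active=2),  # ratio 0.67
    Backend("ollama-3", weight=2, active=2),  # ratio 1.00
]
chosen = pick_backend(pool)
```

With the loads shown, `chosen` is `ollama-2`, because it carries the fewest connections for its weight. For long-running LLM requests this kind of balancing tends to spread load more evenly than plain round-robin.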
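6. Usage Guide
6.1 Common Commands
| Action | Command | Description |
|---|---|---|
| 🚀 Start all services | docker compose up -d | Run in the background |
| 🚀 Start and follow logs | docker compose up | Run in the foreground |
| 🛑 Stop all services | docker compose down | Stop and remove containers |
| 🛑 Stop and delete data | docker compose down -v | Also remove named volumes |
| 📊 Show status | docker compose ps | List running services |
| 📝 Show logs | docker compose logs -f | Follow logs in real time |
| 🔄 Restart a service | docker compose restart <service> | Restart the given service |
| 🔨 Rebuild | docker compose build --no-cache | Rebuild images from scratch |
| 🐚 Enter a container | docker exec -it <container> sh | Interactive shell |
| 🧹 Remove unused images | docker image prune | Free disk space |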
6.2 Model Management
# Open a shell in the Ollama container
docker exec -it ollama sh
# Pull models
ollama pull llama3:8b          # Llama 3 8B
ollama pull llama3:70b         # Llama 3 70B
ollama pull nomic-embed-text   # Embedding model
ollama pull mistral            # Mistral 7B
ollama pull phi3               # Phi-3 Mini
# List downloaded models
ollama list
# Delete a model
ollama rm llama3:8b
# Copy a model
ollama cp llama3:8b my-custom-model
# Run a model
ollama run llama3:8b "Hello, how are you?"
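6.3 Database Operations
# Connect to PostgreSQL
docker exec -it postgres psql -U ai_user -d ai_knowledge
# Common commands inside psql
\dt                                      # list tables
\d document_chunks                       # show table structure
SELECT COUNT(*) FROM document_chunks;    # count rows
# Connect to Redis
docker exec -it redis redis-cli
# Common commands inside redis-cli
PING       # test the connection
INFO       # server information
KEYS *     # list all keys (slow on large datasets)
FLUSHDB    # delete every key in the current database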
6.4 Testing the API
# Health check
curl http://localhost:8000/health
# Chat test (router prefix /api/v1/chat plus route /chat)
curl -X POST http://localhost:8000/api/v1/chat/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "model": "llama3:8b"
  }'
# Embedding test
curl -X POST http://localhost:8000/api/v1/embed/embed \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "prompt": "This is a test sentence"
  }'
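The same chat call can be made from Python. The sketch below only builds the request with the standard library, mirroring the curl example; `build_chat_request` is a hypothetical helper, and the URL path assumes the router prefix and route used in this article:

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, content: str, model: str = "llama3:8b"
) -> urllib.request.Request:
    """Build a POST request for the chat endpoint without sending it."""
    payload = {
        "messages": [{"role": "user", "content": content}],
        "model": model,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/chat/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("http://localhost:8000", "Hello!")
```

To actually send it once the stack is running, pass `request` to `urllib.request.urlopen(request, timeout=120)` and read the JSON body; the reply text sits under `message.content` in the response model defined earlier.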
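7. Production Optimization
7.1 Security Hardening
# docker-compose.prod.yml (security-hardened fragment)
services:
  ollama:
    # ... other settings ...
    # No `ports:` section: the API is reachable only from containers on the
    # same network, not from the host or the outside.
    environment:
      # Keep listening on all container interfaces. Binding to 127.0.0.1
      # inside the container would also cut off the api service.
      - OLLAMA_HOST=0.0.0.0:11434
    networks:
      - internal-network
  postgres:
    # ... other settings ...
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}   # read from the shell or .env
    volumes:
      - ./pg_hba.conf:/etc/postgresql/pg_hba.conf:ro   # authentication rules
    # PostgreSQL reads the mounted file only if it is pointed at it
    command: postgres -c hba_file=/etc/postgresql/pg_hba.conf
networks:
  internal-network:
    driver: bridge   # optionally add `internal: true` to block outbound traffic
Key security measures shown above: keep Ollama on an internal network without published ports, read secrets from environment variables instead of hard-coding them, and mount configuration files read-only.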
7.2 Resource Limits
# Resource limits
services:
  postgres:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
  redis:
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 512M   # leave headroom above maxmemory in redis.conf
  api:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 1G
7.3 Monitoring Integration
# Add Prometheus and Grafana
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    networks:
      - ai-network
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin   # change this outside local testing
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - prometheus
    networks:
      - ai-network
# The named volumes used above must be declared
volumes:
  prometheus-data:
  grafana-data:
# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'ai-stack'
    # Prometheus scrapes /metrics on each target by default; the FastAPI app
    # in this article does not expose that path yet.
    static_configs:
      - targets: ['api:8000']
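8. FAQ
Q1: A container fails to start?
Troubleshooting steps:
# 1. Check the detailed logs
docker compose logs -f <service-name>
# 2. Check whether the port is already in use
netstat -tulpn | grep <port>
# 3. Rebuild the image and start again
docker compose build --no-cache
docker compose up -d
Q2: The GPU is not detected?
Solution:
# 1. Confirm that the NVIDIA driver is installed
nvidia-smi
# 2. Install the NVIDIA Container Toolkit
#    (the older nvidia-docker repository and apt-key are deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# 3. Verify
docker run --rm --gpus all nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi
Q3: Model downloads are too slow?
Ways to speed things up:
# Method 1: configure a registry mirror (if you have one)
# Add an environment variable to the Ollama service. The variable name below
# is taken from the original article and is unverified; check the Ollama
# documentation for whether your version supports such a setting.
environment:
  - OLLAMA_REGISTRY=https://example.com/ollama
# Method 2: download manually and import
# 1. Obtain the model files through another channel
# 2. Place them in ./data/ollama/models/
# 3. Restart the container
# Method 3: use a proxy
environment:
  - HTTP_PROXY=http://proxy:8080
  - HTTPS_PROXY=http://proxy:8080
Q4: Running out of disk space?
Cleanup options:
# 1. Remove unused Docker resources (including all unused images)
docker system prune -a
# 2. Remove unused volumes
docker volume prune
# 3. Remove the build cache
docker builder prune
# 4. Show disk usage
docker system df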
9. Summary
One-Command Startup
# 1. Clone the project
git clone https://github.com/yourusername/ai-stack.git
cd ai-stack
# 2. Start all services
docker compose up -d
# 3. Check the status
docker compose ps
# 4. Download the models
docker exec -it ollama ollama pull llama3:8b
docker exec -it ollama ollama pull nomic-embed-text
# 5. Access the services
# API: http://localhost:8000
# Streamlit: http://localhost:8501
# Ollama: http://localhost:11434
Next Steps
Appendix: Complete Project Template
Directory Structure
ai-stack/
├── docker-compose.yml          # Basic configuration
├── docker-compose.prod.yml     # Production configuration
├── docker-compose.ha.yml       # High-availability configuration
├── Dockerfile                  # API image
├── Dockerfile.streamlit        # Web UI image
├── nginx.conf                  # Load balancer
├── redis.conf                  # Redis configuration
├── requirements.txt            # Python dependencies
├── .env.example                # Example environment variables
├── .gitignore
├── README.md
├── init-scripts/
│   └── 01-init.sql             # Database initialization
├── app/
│   ├── main.py
│   ├── config.py
│   ├── routers/
│   │   ├── chat.py
│   │   └── embed.py
│   └── models/
│       └── schemas.py
└── webui/
    └── app.py                  # Streamlit UI
Quick-Copy Template
# Create the project structure
mkdir -p ai-stack && cd ai-stack
mkdir -p app/routers app/models webui init-scripts data
# Create docker-compose.yml
cat > docker-compose.yml << 'EOF'
# See section 4.1 of this article
EOF
# Create .env
cat > .env << 'EOF'
OLLAMA_MODEL=llama3:8b
POSTGRES_PASSWORD=your_secure_password
REDIS_PASSWORD=
EOF
# Start!
docker compose up -d
This article was written against Docker 26+ and Docker Compose v2.24+. Questions are welcome in the comments.