Weekend Build: A Full-Stack AI Assistant with DeepSeek + RAG + Function Calling, Step by Step (Source Code and Pitfall Guide Included)

BDawn · Published 2026-06-08 17:00:57

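Preface

I've been tinkering with AI application development lately and found that online tutorials either cover concepts with no code or paste code with no explanation of how it works, and almost none of them land Function Calling and RAG together in one complete project.

So I spent a weekend building a full-stack AI assistant from scratch and wrote down every pitfall I hit along the way. This article walks you through reproducing it step by step.

📦 Full source code: GitHub: lgx-ai (stars welcome ⭐)

🧪 Final result: an AI assistant with a UI that supports plain chat, answers grounded in your private documents, and tool calls to a weather API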

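1. What this project can do

Take a look at the final result first, then decide whether to keep reading:

Mode	Capability	Use case
Plain chat	Multi-turn conversation, streaming output	General Q&A
Knowledge base query (RAG)	Retrieves your private documents before answering	Project documentation Q&A, knowledge base search
Tool calling (Function Calling)	The AI calls the weather API or sends notifications on its own	Smart assistants, automation agents

Architecture at a glance:

Vue 3 frontend (8080) ──proxy──▶ Express backend (3000) ──┬── DeepSeek API (chat)
                                                          ├── ChromaDB + Ollama (RAG retrieval)
                                                          └── Weather API / notifications (Function Calling)

The sections below go through each step with complete code, an explanation of how it works, and pitfall notes.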

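2. Environment setup: versions matter!

Before starting, align your environment; version mismatches are the biggest source of trouble.

My development environment

Dependency	Version	Notes
Node.js	20.19+ or 22.12+	Required by the Vue frontend project
openai	^6.41.0	DeepSeek is compatible with the OpenAI SDK
express	^5.2.1	Backend framework
chromadb	^3.4.3	Vector database Node SDK
@chroma-core/ollama	^0.1.8	Ollama embedding bridge
Vue	^3.5.32	Frontend framework
ant-design-x-vue	^1.6.0	UI component library for AI chat
vite	^8.0.8	Build tool
Python	3.10+	Needed by the ChromaDB server

External services

Service	Purpose	How to get it
DeepSeek API Key	LLM calls	Register at platform.deepseek.com (comes with signup credit)
Ollama	Local embedding model	Download and install from ollama.com
ChromaDB Server	Vector database	Started with Python, see below
Weather API (optional)	Function Calling example	Apply for free at tianqiapi.com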

Initializing the project

# Project root
npm init -y
npm install openai express readline-sync

# Frontend
cd ai-web && npm install && cd ..

# ChromaDB sub-project
cd chromadb && npm install && cd ..

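3. Step 1: Wrapping the DeepSeek API calls

3.1 Why DeepSeek?

A few key reasons:

The API is compatible with the OpenAI SDK, so switching mostly comes down to changing baseURL, the API key, and the model name
It is cheap, so everyday development and debugging don't hurt
Its Chinese is strong; in my use the output quality held up well against GPT-4
It supports Function Calling, so tool calls work out of the box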
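3.2 Initializing the OpenAI client

// ai-learning/call.js
const { OpenAI } = require('openai');
const fs = require('fs');
const path = require('path');

// Read the API key from a file (never hard-code it!)
const apiKey = fs.readFileSync(
    path.join(__dirname, '..', 'api-key.txt'), 'utf-8'
).trim();

const openai = new OpenAI({
    baseURL: 'https://api.deepseek.com',  // Key point: route requests to DeepSeek
    apiKey: apiKey,
});

⚠️ Pitfall #1: Never drop the https:// from baseURL, or you get a Connection error. The /v1 suffix is not needed: the SDK appends the endpoint path (such as /chat/completions) to whatever baseURL you give it, and DeepSeek accepts the address both with and without /v1.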

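3.3 Wrapping the non-streaming call

async function askDeepSeek(ask, messages = [], retries = 3) {
    // With no history supplied, start a new conversation from the question
    if (messages.length === 0) {
        messages.push({ role: "user", content: ask });
    }

    try {
        const completion = await openai.chat.completions.create({
            messages: messages,
            model: "deepseek-v4-pro",
            thinking: { "type": "enabled" },     // Enable reasoning mode
            reasoning_effort: "high",             // High reasoning effort
            stream: false,
        });
        return completion.choices[0].message.content;
    } catch (error) {
        console.error('API call failed:', error.message);
        if (retries > 0) {
            console.log(`Retrying, ${retries - 1} attempts left...`);
            // Wait before retrying to avoid hitting the rate limit again
            await new Promise(r => setTimeout(r, 1000));
            return await askDeepSeek(ask, messages, retries - 1);
        }
        return "Sorry, the service is temporarily unavailable. Please try again later.";
    }
}

💡 Why add a retry mechanism? The DeepSeek API has concurrency limits and occasionally returns 429/503 under high-frequency calls. Three retries with a 1-second interval noticeably improve stability. Note that the function above retries on any error, including ones a retry cannot fix, such as a 401.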
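The retry idea above can be narrowed to transient failures only. The following is a minimal sketch, not the project's code: `withRetry`, `RETRYABLE_STATUS`, and the option names are hypothetical, and it assumes the thrown error carries a numeric `status` field the way the openai SDK's HTTP errors do. The delay doubles on every attempt instead of staying at a fixed 1 second.

```javascript
// Hypothetical helper: retry an async call only for transient HTTP errors.
const RETRYABLE_STATUS = new Set([429, 500, 502, 503]);

async function withRetry(fn, { retries = 3, baseDelayMs = 1000 } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await fn();
        } catch (error) {
            // Assumption: HTTP failures expose the status code as error.status.
            const retryable = RETRYABLE_STATUS.has(error.status);
            if (!retryable || attempt >= retries) throw error;
            // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
            const delay = baseDelayMs * 2 ** attempt;
            await new Promise(resolve => setTimeout(resolve, delay));
        }
    }
}
```

A call would then look like `withRetry(() => openai.chat.completions.create({...}))`. Errors without a matching status, such as a 401, are rethrown on the first attempt instead of burning retries.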

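3.4 Wrapping the streaming call (the important part)

Streaming is the core of the "typewriter effect" and the key design decision for frontend-backend communication:

async function askDeepSeekStream(ask, msgRecCb, messages = [], retries = 3) {
    if (messages.length === 0) {
        messages.push({ role: "user", content: ask });
    }

    // Tolerate a missing callback so every call site below stays simple
    const notify = typeof msgRecCb === 'function' ? msgRecCb : () => {};
    let fullContent = '';

    try {
        const stream = await openai.chat.completions.create({
            messages: messages,
            model: "deepseek-v4-pro",
            thinking: { "type": "disabled" },  // Turning thinking off is recommended for streaming
            stream: true,
        });

        let reasoningContent = '';
        let finished = false;

        for await (const chunk of stream) {
            const delta = chunk.choices[0]?.delta;
            const finishReason = chunk.choices[0]?.finish_reason;

            // Regular text content
            if (delta?.content) {
                fullContent += delta.content;
                notify(delta.content, false);  // false = not finished yet
            }

            // DeepSeek-specific reasoning content (the thinking process)
            if (delta?.reasoning_content) {
                reasoningContent += delta.reasoning_content;
                process.stdout.write(delta.reasoning_content);
            }

            // Meaning of each finishReason value
            // "stop"            : finished normally
            // "length"          : truncated at the max_tokens limit
            // "content_filter"  : safety filter triggered
            // "tool_calls"      : the model wants to call a tool
            if (finishReason && !finished) {
                finished = true;
                notify('', true);  // true = end-of-stream signal
            }
        }

        // If the stream ended without a finish_reason, still signal completion
        if (!finished) notify('', true);

        return fullContent;
    } catch (error) {
        console.error('Streaming call failed:', error.message);
        // Retry only while nothing has been sent, otherwise the receiver would get duplicated text
        if (retries > 0 && fullContent === '') {
            await new Promise(r => setTimeout(r, 1000));
            return await askDeepSeekStream(ask, msgRecCb, messages, retries - 1);
        }
        // Send the fallback through the callback so the caller (e.g. an HTTP response) can finish
        const fallback = "Sorry, the service is temporarily unavailable.";
        notify(fallback, true);
        return fallback;
    }
}

// Exports
exports.askDeepSeek = askDeepSeek;
exports.askDeepSeekStream = askDeepSeekStream;
exports.openai = openai;

⚠️ Pitfall #2: With thinking enabled in streaming mode, delta.content can be undefined for the first several chunks, because the reasoning arrives first in delta.reasoning_content. For streamed conversations, turn thinking off (set it to "disabled"); otherwise the frontend shows a blank pause.

⚠️ Pitfall #3: A finishReason of null means the stream is still in progress. Don't decide that the stream has ended from delta.content === undefined; always use finishReason.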
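The two pitfalls above can be checked offline without an API key. This is a small sketch under stated assumptions: `collectStream` and `fakeStream` are hypothetical names, and the fake chunks only imitate the `choices[0].delta` / `finish_reason` shape used in the function above. It shows that chunks carrying only `reasoning_content` are skipped for display and that the end is detected from `finish_reason` alone.

```javascript
// Collects displayable text from an async iterable of chat-completion chunks.
async function collectStream(stream, onText) {
    let full = '';
    let finishReason = null;
    for await (const chunk of stream) {
        const choice = chunk.choices?.[0];
        if (!choice) continue;
        const text = choice.delta?.content;
        if (text) {
            full += text;
            onText(text);
        }
        // Only a non-null finish_reason marks the end, never a missing content field.
        if (choice.finish_reason) finishReason = choice.finish_reason;
    }
    return { full, finishReason };
}

// Hypothetical stand-in for the SDK stream, used for offline testing only.
async function* fakeStream() {
    yield { choices: [{ delta: { reasoning_content: 'thinking...' }, finish_reason: null }] };
    yield { choices: [{ delta: { content: 'Hel' }, finish_reason: null }] };
    yield { choices: [{ delta: { content: 'lo' }, finish_reason: null }] };
    yield { choices: [{ delta: {}, finish_reason: 'stop' }] };
}
```

Calling `collectStream(fakeStream(), t => process.stdout.write(t))` prints the text pieces as they arrive and resolves with the accumulated text and the finish reason.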

3.5 Command-line chat test

// ai-learning/index.js
// Switch the Windows console to UTF-8; on other systems the command fails and is ignored
try {
    require('child_process').execSync('chcp 65001', { stdio: 'pipe' });
} catch (e) {}

const { askDeepSeekStream } = require('./call');
const readline = require('readline-sync');

const messages = [];

async function main() {
    console.log('===== DeepSeek command-line chat =====');
    console.log('Type "exit" to quit\n');

    while (true) {
        const ask = readline.question('You: ');
        if (ask === 'exit') {
            console.log('Bye!');
            break;
        }

        messages.push({ role: 'user', content: ask });
        process.stdout.write('AI: ');

        let response = '';
        await askDeepSeekStream(ask, (chunk, isFinish) => {
            process.stdout.write(chunk);
            response += chunk;
            if (isFinish) {
                process.stdout.write('\n\n');
                // Keep the reply in the history so the next turn has context
                messages.push({ role: 'assistant', content: response });
            }
        }, messages);
    }
}

main();

⚠️ Pitfall #4: Garbled Chinese in the Windows terminal? Run chcp 65001 at the top of the file to force UTF-8. Mac/Linux don't need it.

Try it out:

===== DeepSeek command-line chat =====
Type "exit" to quit

You: Explain the Node.js event loop
AI: The Node.js event loop is the core of its asynchronous, non-blocking I/O model...

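4. Step 2: Function Calling, teaching the AI to "get its hands dirty"

4.1 How Function Calling works

In one sentence: you tell the model which tools are available, and it decides on its own whether to call one, which one, and with what arguments.

The flow looks like this:

User question → model analyzes → external data needed?
                                  ├── Yes → returns functionName + arguments
                                  │          ↓
                                  │     your code runs the function
                                  │          ↓
                                  │     the result goes back to the model to summarize
                                  │          ↓
                                  │     the final answer is returned
                                  └── No → answers directly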
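The "your code runs the function" step of the flow above can be sketched as a lookup table keyed by tool name. This is an illustration rather than the project's implementation: `dispatchToolCall` and the `handlers` entries are hypothetical, and the `toolCall` object is assumed to have the `function.name` / `function.arguments` shape that OpenAI-compatible APIs return, with `arguments` as a JSON string.

```javascript
// Hypothetical handler table; the tool names are illustrative only.
const handlers = {
    get_time: () => new Date().toISOString(),
    echo: ({ text }) => `echo: ${text}`,
};

async function dispatchToolCall(toolCall, table) {
    const name = toolCall.function.name;
    // Own-property check so a name like "constructor" cannot hit the prototype.
    if (!Object.hasOwn(table, name)) {
        return { ok: false, error: `unknown tool: ${name}` };
    }
    let args;
    try {
        // The model produces the arguments as a JSON string, which may be malformed.
        args = JSON.parse(toolCall.function.arguments || '{}');
    } catch {
        return { ok: false, error: 'arguments are not valid JSON' };
    }
    return { ok: true, result: await table[name](args) };
}
```

Returning `{ ok, ... }` instead of throwing lets the caller decide whether to report the failure back to the model or to the user.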

4.2 Defining the tools (Function Definitions)

// function-calling/index.js
const { openai, askDeepSeekStream } = require('../ai-learning/call');
const weatherApi = require('../weather-secret.json');

// The functions that actually get executed
const fakeFunctions = {
    get_weather: async (city) => {
        const url = `http://pddfps.tianqiapi.com/api?unescape=1&version=v63` +
            `&appid=${weatherApi.appid}&appsecret=${weatherApi.appsecret}` +
            `&city=${encodeURIComponent(city)}`;
        const res = await fetch(url);
        const data = await res.json();
        return JSON.stringify(data);
    },
    send_notification(title, message) {
        console.log(`[Notification] ${title}: ${message}`);
        return `Notification sent: ${title} - ${message}`;
    },
    // Not exposed to the model: there is no matching entry in functionDefinitions
    get_time: () => {
        return new Date().toLocaleString();
    }
};

// Tell the model which tools exist
const functionDefinitions = [
    {
        type: "function",
        function: {
            name: "get_weather",
            description: "Get the weather for a given city",
            parameters: {
                type: "object",
                properties: {
                    city: {
                        type: "string",
                        description: "City name, e.g. 北京 or 上海"
                    }
                },
                required: ["city"]
            }
        }
    },
    {
        type: "function",
        function: {
            name: "send_notification",
            description: "Send a notification to the user's phone",
            parameters: {
                type: "object",
                properties: {
                    title: { type: "string", description: "Notification title" },
                    message: { type: "string", description: "Notification body" }
                },
                required: ["title", "message"]
            }
        }
    }
];

💡 Key design point: the clearer the function description and each parameter's description, the more accurate the model's decisions. With a vague description (say, just "get weather"), the model may not know which arguments to pass.
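Clear descriptions make correct arguments more likely, but they do not guarantee them, so a cheap check before executing a tool can help. The sketch below is one possible guard, not part of the original project: `missingRequiredArgs` is a hypothetical helper, and `sampleDefinition` simply mirrors the shape of the `get_weather` definition above.

```javascript
// Returns the names of required parameters that are absent or empty in args.
function missingRequiredArgs(definition, args) {
    const required = definition.function.parameters.required ?? [];
    return required.filter(key => args[key] === undefined || args[key] === '');
}

// Illustrative definition in the same shape as the tool definitions above.
const sampleDefinition = {
    type: 'function',
    function: {
        name: 'get_weather',
        parameters: {
            type: 'object',
            properties: { city: { type: 'string' } },
            required: ['city'],
        },
    },
};

console.log(missingRequiredArgs(sampleDefinition, {}));
console.log(missingRequiredArgs(sampleDefinition, { city: '北京' }));
```

If the returned list is non-empty, the caller can ask the user for the missing value instead of calling the external API with `undefined`.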

4.3 The conversation function with tool calls

async function askWithFunctions(userMessage) {
    const response = await openai.chat.completions.create({
        model: "deepseek-v4-pro",
        messages: [{ role: "user", content: userMessage }],
        tools: functionDefinitions,
        tool_choice: "auto"  // Let the model decide whether to call a tool
    });

    const message = response.choices[0].message;

    if (message.tool_calls) {
        // Only the first tool call is handled in this demo
        const toolCall = message.tool_calls[0];
        const functionName = toolCall.function.name;
        const args = JSON.parse(toolCall.function.arguments);
        console.log(`🔧 Model decided to call: ${functionName}(${JSON.stringify(args)})`);
        return { functionName, arguments: args };
    } else {
        return { text: message.content };
    }
}

⚠️ Pitfall #5: With tool_choice: "auto", the model answers chat-style questions normally and doesn't force a tool call. With "required", the model tries to call a tool every single time; even for "hello" it will make up some tool call to get by.

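4.4 Dispatch and execution: "the second half of a tool call"

const askWithTools = (msg, cb) => {
    askWithFunctions(msg).then(async res => {
        if (res.functionName === 'get_weather') {
            // 1. Call the real API to fetch the weather data
            const weatherData = await fakeFunctions.get_weather(res.arguments.city);

            // 2. Pass the weather data as context and let the model write a natural-language answer
            const prompt = `Answer the question based on the material below.
Question: ${msg}
Material: ${weatherData}
Requirements:
1. Base the answer on the material
2. If the material contains no relevant information, say so explicitly
3. Keep the answer concise and accurate`;

            await askDeepSeekStream(prompt, cb, []);

        } else if (res.functionName === 'send_notification') {
            const r = fakeFunctions.send_notification(
                res.arguments.title,
                res.arguments.message
            );
            cb(r, true);  // Notifications return the result directly

        } else if (!res.functionName) {
            cb(res.text, true);  // Plain answer, return it directly
        } else {
            // A tool name with no handler here; still finish the stream
            cb(`Unsupported tool: ${res.functionName}`, true);
        }
    }).catch(error => {
        // Always finish the stream, otherwise the HTTP response never ends
        console.error('Tool call failed:', error.message);
        cb('Sorry, the tool call failed.', true);
    });
};

exports.askWithTools = askWithTools;

💡 Why "call the tool first, then let the model summarize"? This is a two-stage design:

Stage 1: the model decides which tool to call and extracts the arguments (the "brain making the decision")

Stage 2: we feed the tool's result back and let the model phrase the answer in natural language (the "mouth doing the talking")

Skip stage 2 and the user sees a blob of JSON weather data, which nobody wants to read.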
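The project feeds the tool result back by pasting it into a fresh prompt. An alternative, sketched here as an assumption rather than as what the source code does, is the message layout that OpenAI-compatible chat APIs define for stage 2: the assistant message containing `tool_calls` is kept in the history, followed by one `role: "tool"` message per call that references its `tool_call_id`. `buildFollowUpMessages` and `toolOutputs` are hypothetical names.

```javascript
// Builds the stage-2 message list: user turn, assistant tool request, tool results.
function buildFollowUpMessages(userMessage, assistantMessage, toolOutputs) {
    return [
        { role: 'user', content: userMessage },
        assistantMessage, // must be the message that contains tool_calls
        ...assistantMessage.tool_calls.map(call => ({
            role: 'tool',
            tool_call_id: call.id,               // links the result to the request
            content: String(toolOutputs[call.id]),
        })),
    ];
}

// Illustrative data in the shape returned by response.choices[0].message.
const assistantMessage = {
    role: 'assistant',
    content: null,
    tool_calls: [
        { id: 'call_1', type: 'function', function: { name: 'get_weather', arguments: '{"city":"北京"}' } },
    ],
};

const followUp = buildFollowUpMessages('Weather in 北京?', assistantMessage, { call_1: '{"tem":"24"}' });
console.log(followUp.map(m => m.role).join(' -> '));
```

The resulting array would be passed as `messages` in a second `chat.completions.create` call. Compared with the prompt-pasting approach, the model sees which call produced which result, which matters once several tools are called in one turn.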

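5. Step 3: RAG, intelligent Q&A over private documents (the core)

5.1 What is RAG, and why do it?

RAG = Retrieval Augmented Generation

Large models have two inherent problems:

Knowledge cutoff: the training data only goes up to a certain point in time
Hallucination: the model makes things up when it doesn't know

The idea behind RAG: first search the documents you provide, find the most relevant passages, put them into the prompt, and only then let the model answer.

5.2 Why ChromaDB + Ollama?

Criterion	ChromaDB	Pinecone	Weaviate	Milvus
Deployment effort	⭐ One command	⭐⭐⭐ Requires sign-up	⭐⭐ Docker	⭐⭐⭐ Resource-heavy
Cost	Free	Has a free tier	Free tier is limited	Free
Node SDK	✅ Native	✅	✅	⭐ So-so
Best for	Personal / small projects	Production	Medium scale	Enterprise

Running embeddings locally with Ollama needs no API key, works offline, and costs nothing.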

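5.3 Starting the external services

# 1. Pull the Ollama embedding model
ollama pull nomic-embed-text

# 2. Start the ChromaDB server (with Python)
cd chromadb/chromadb-server
pip install chromadb
python main.py
# The server runs at http://localhost:8000

⚠️ Pitfall #6: The chromadb npm package connects to port 8000 by default, but its default embedding package, @chroma-core/default-embed, needs its model downloaded separately. I ran into this and switched to @chroma-core/ollama, which bridges to the local Ollama and solves both problems at once.

⚠️ Pitfall #7: Specify embeddingFunction when creating a ChromaDB collection. Otherwise later queries fail with embedding function not found, because the client falls back to the default embedding function, which may not be configured.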

5.4 Embedding and importing the documents

// chromadb/batch-import.js
const fs = require('fs');
const path = require('path');
const { ChromaClient } = require('chromadb');
const { OllamaEmbeddingFunction } = require('@chroma-core/ollama');

// Ollama embedding: local and free
const embedder = new OllamaEmbeddingFunction({
    model: "nomic-embed-text",
    baseUrl: "http://localhost:11434"
});

const client = new ChromaClient({
    host: "localhost",
    port: 8000
});

const COLLECTION_NAME = 'my_notes';

async function importNotes() {
    // 1. Delete the old collection first (full re-import)
    try {
        await client.deleteCollection({ name: COLLECTION_NAME });
        console.log('Old collection deleted');
    } catch (e) {
        // No collection exists on the first run; ignore the error
    }

    // 2. Create the collection with cosine similarity
    const collection = await client.createCollection({
        name: COLLECTION_NAME,
        embeddingFunction: embedder,
        metadata: { "hnsw:space": "cosine" }  // Cosine similarity
    });

    // 3. Read the documents
    const notesDir = path.join(__dirname, 'notes');
    const files = fs.readdirSync(notesDir)
        .filter(f => f.endsWith('.txt') || f.endsWith('.md'));

    const documents = [];
    const ids = [];
    const metadatas = [];

    files.forEach((file, idx) => {
        const content = fs.readFileSync(
            path.join(notesDir, file), 'utf-8'
        );
        // Truncate long documents: the embedding model has a token limit
        const truncated = content.length > 1000
            ? content.slice(0, 1000)
            : content;

        documents.push(truncated);
        ids.push(`note_${idx}`);
        metadatas.push({
            filename: file,
            length: content.length  // Original length, before truncation
        });
    });

    // 4. Write everything to the vector store in one batch
    await collection.add({ ids, documents, metadatas });
    console.log(`✅ Imported ${documents.length} documents`);
}

importNotes();

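⚠️ Pitfall #8: The nomic-embed-text model has a context window of roughly 8192 tokens, and a single document longer than that can fail to embed, so long input has to be cut down. The better approach is chunking: split a long document into chunks of 500-1000 characters and store each chunk separately. To keep the demo simple, this project just truncates to the first 1000 characters, which means everything after that point is not searchable.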
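The chunking alternative mentioned above can be sketched as a fixed-size splitter with overlap. This is a minimal illustration, not the project's code: `chunkText` and its default sizes are assumptions, and sizes are counted in characters (UTF-16 code units), not tokens.

```javascript
// Splits text into fixed-size chunks; consecutive chunks share `overlap` characters
// so that a sentence cut at a boundary still appears whole in one of them.
function chunkText(text, chunkSize = 800, overlap = 100) {
    if (overlap >= chunkSize) {
        throw new RangeError('overlap must be smaller than chunkSize');
    }
    const chunks = [];
    for (let start = 0; start < text.length; start += chunkSize - overlap) {
        chunks.push(text.slice(start, start + chunkSize));
        // Stop once this chunk has reached the end of the text.
        if (start + chunkSize >= text.length) break;
    }
    return chunks;
}

console.log(chunkText('abcdefghij', 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

In the import script, each chunk would get its own id (for example the file index plus the chunk index) and carry the filename in its metadata, so that a hit can still be traced back to its source file.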

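⚠️ Pitfall #9: Be sure to set hnsw:space to "cosine"; the default is "l2" (Euclidean distance). With cosine, the distance returned by query gets smaller as items get more similar (0 means same direction), and 1 - distance is the actual similarity score. The two spaces have completely different value ranges, so a threshold tuned for one silently breaks similarity filtering in the other.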
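The difference between the two spaces is easy to see on toy vectors. The sketch below uses plain arrays and hypothetical helper names. It assumes the "l2" space reports the squared Euclidean distance, which is how Chroma documents it as far as I know.

```javascript
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);

// Cosine similarity depends only on direction, not on vector length.
function cosineSimilarity(a, b) {
    return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Cosine distance as used for thresholding: 0 = same direction, 2 = opposite.
function cosineDistance(a, b) {
    return 1 - cosineSimilarity(a, b);
}

// Squared L2 distance: unbounded, and sensitive to vector length.
function squaredL2(a, b) {
    return a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0);
}

console.log(cosineDistance([1, 0], [2, 0])); // 0  (same direction)
console.log(squaredL2([1, 0], [2, 0]));      // 1  (different length)
console.log(cosineDistance([1, 0], [0, 1])); // 1  (orthogonal)
```

The first two lines show why `1 - distance` only makes sense as a similarity score in the cosine space: the same pair of vectors is "identical" under cosine and "one unit apart" under L2.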

5.5 The core RAG query logic

// chromadb/index.js
const { ChromaClient } = require('chromadb');
const { OllamaEmbeddingFunction } = require('@chroma-core/ollama');

const embedder = new OllamaEmbeddingFunction({
    model: "nomic-embed-text",
    baseUrl: "http://localhost:11434"
});

const client = new ChromaClient({
    host: "localhost",
    port: 8000
});

const COLLECTION_NAME = 'my_notes';
const SIMILARITY_THRESHOLD = 0.6;  // Tunable: 0.3 to 0.8

exports.rgaQuery = async (question) => {
    const collection = await client.getCollection({
        name: COLLECTION_NAME,
        embeddingFunction: embedder
    });

    // Retrieve the top 3 most relevant documents
    const results = await collection.query({
        queryTexts: [question],
        nResults: 3,
        include: ["documents", "metadatas", "distances"]
    });

    // Similarity filtering
    const validDocs = [];
    const validSources = [];

    for (let i = 0; i < results.documents[0].length; i++) {
        const distance = results.distances[0][i];      // Cosine distance
        const similarity = 1 - distance;                 // Convert to similarity
        const doc = results.documents[0][i];
        const source = results.metadatas[0][i].filename;

        console.log(`📊 [${source}] distance:${distance.toFixed(4)} similarity:${similarity.toFixed(4)}`);

        if (similarity > SIMILARITY_THRESHOLD) {
            validDocs.push(doc);
            validSources.push(source);
        }
    }

    // No relevant documents
    if (validDocs.length === 0) {
        return {
            prompt: "Sorry, I couldn't find anything related to your question in my knowledge base.",
            sources: []
        };
    }

    // Build the prompt
    const contextText = validDocs
        .map((doc, i) => `[${i + 1}] ${doc}`)
        .join('\n\n');

    const prompt = `Answer the question based on the material below. If the material is incomplete, answer reasonably from what is available.

Material:
${contextText}

Question: ${question}

Requirements:
1. Base the answer on the material
2. If the material contains no relevant information, say explicitly "No relevant information found in the material"
3. Keep the answer concise and accurate`;

    return { prompt, sources: validSources };
};

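5.6 Tuning the similarity threshold

Threshold	Effect	When to use
0.3	Filters almost nothing; every document comes back	Not recommended, too much noise
0.5	Filters out clearly irrelevant documents	Usable with a large document set
0.6	The balance point, recommended	General use
0.7	Strict; returns only highly relevant documents	When document quality requirements are high
0.8+	Too strict; may return nothing at all	Exact-match scenarios only

💡 My debugging experience: run a few queries without any threshold first, look at the similarity distribution printed to the console, and then pick the threshold. If the top-3 similarities of most queries fall between 0.5 and 0.7, a threshold of 0.6 is the most reasonable choice.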
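The "look at the distribution first" advice can be made a little more systematic. This sketch is an assumption about how one might do it, not something from the project: `summarizeSimilarities` is a hypothetical helper that takes the cosine distances of one query (a non-empty array, such as `results.distances[0]`) and reports the minimum, median, and maximum similarity.

```javascript
// Summarizes similarity (1 - cosine distance) for one query's results.
// Assumes a non-empty array of distances.
function summarizeSimilarities(distances) {
    const sims = distances.map(d => 1 - d).sort((a, b) => a - b);
    const mid = Math.floor(sims.length / 2);
    const median = sims.length % 2 === 1
        ? sims[mid]
        : (sims[mid - 1] + sims[mid]) / 2;
    return { min: sims[0], median, max: sims[sims.length - 1] };
}

console.log(summarizeSimilarities([0.5, 0.25, 0.75]));
// { min: 0.25, median: 0.5, max: 0.75 }
```

Collecting these summaries over a handful of representative questions, including some that the documents cannot answer, gives a rough idea of where relevant and irrelevant hits separate.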

6. Step 4: The Express backend, packaging the three capabilities as an API

// server/server.js
const express = require('express');
const { askDeepSeekStream } = require('../ai-learning/call');
const { askWithTools } = require('../function-calling/index.js');
const { rgaQuery } = require('../chromadb/index');

const app = express();
app.use(express.json());

// ===== Endpoint 1: plain chat (streaming) =====
app.post('/api/chat', async (req, res) => {
    const { ask, message } = req.body;
    // Forward only role/content: the frontend objects also carry id/status,
    // which are not part of the chat message format and may be rejected
    const history = Array.isArray(message)
        ? message.map(({ role, content }) => ({ role, content }))
        : [];
    await askDeepSeekStream(ask, (chunk, isFinish) => {
        res.write(chunk);
        if (isFinish) res.end();
    }, history);
});

// ===== Endpoint 2: RAG retrieval-augmented answers (streaming) =====
app.post('/api/chat/rga', async (req, res) => {
    const { ask } = req.body;
    const r = await rgaQuery(ask);

    if (r.sources.length === 0) {
        // No relevant documents found; return the notice directly
        res.send(r.prompt);
        return;
    }

    await askDeepSeekStream(r.prompt, (chunk, isFinish) => {
        res.write(chunk);
        if (isFinish) {
            // Append the list of reference documents when the stream ends
            res.write(`\n\n📚 References: ${r.sources.join(', ')}`);
            res.end();
        }
    }, []);
});

// ===== Endpoint 3: Function Calling (streaming) =====
app.post('/api/chat/tools', async (req, res) => {
    const { ask } = req.body;
    askWithTools(ask, (chunk, isFinish) => {
        res.write(chunk);
        if (isFinish) res.end();
    });
});

app.listen(3000, () => {
    console.log('🚀 Backend running at http://localhost:3000');
});

⚠️ Pitfall #10: express.json() is built into Express and has shipped since 4.16.0, so there is no need to install body-parser on either Express 5.x or 4.16+. Only versions older than 4.16 need npm install body-parser.

⚠️ Pitfall #11: In a streaming endpoint, call res.end() only after the stream has finished. Don't call res.end() right after a res.write(), or the frontend receives only the first chunk.
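The write-then-end rule above can be captured in one small helper. This is a sketch under assumptions, not the project's code: `pipeChunksToResponse` is a hypothetical name, `chunks` is any async iterable of strings, and `res` only needs `write` and `end`, so an Express response fits. The `finally` block ensures `end()` is called exactly once, on success and on failure alike.

```javascript
// Writes every chunk to the response and ends it exactly once.
async function pipeChunksToResponse(chunks, res) {
    try {
        for await (const text of chunks) {
            res.write(text);
        }
    } catch (error) {
        // Tell the client the stream broke instead of leaving it hanging.
        res.write('\n[stream interrupted]');
    } finally {
        res.end();
    }
}
```

In a route handler this could be used as `await pipeChunksToResponse(someAsyncGenerator(), res)`, where the generator yields text pieces as they arrive from the model.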

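7. Step 5: The Vue 3 frontend, a good-looking chat UI

7.1 Technology choices

Library	Purpose
Vue 3 + Composition API	Reactive framework
TypeScript	Type safety
Vite	Fast builds
Ant Design X Vue	Components made for AI chat (Conversations, BubbleList, Sender)
Pinia	State management
Vue Router	Routing

💡 Why Ant Design X Vue? It ships BubbleList (the chat bubble list), Sender (the input box), and Conversations (the conversation list) as ready-made components, so a ChatGPT-style UI comes together in about ten minutes. Hand-written, the message bubble styling alone would take half a day.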

7.2 Core code: streaming chat on the frontend

<!-- ai-web/src/pages/index.vue -->
<script setup lang="ts">
import { ref, computed } from 'vue'
import {
  Conversations,
  BubbleList,
  Sender,
} from 'ant-design-x-vue'
import type { Conversation } from 'ant-design-x-vue'

// ========== Type definitions ==========
type MsgStatus = 'local' | 'loading' | 'success' | 'error'

interface ChatMessage {
  id: string
  role: 'user' | 'assistant'
  content: string
  status: MsgStatus
}

type ChatMode = 'chat' | 'rga' | 'tools'

let msgId = 0

// ========== Conversation management ==========
let conversationId = 3
const conversations = ref<Conversation[]>([
  { key: '1', label: 'Conversation 1' },
])
const activeKey = ref('1')

// Separate message history per conversation
const messagesMap = ref<Record<string, ChatMessage[]>>({
  '1': [],
})

// Separate chat mode per conversation
const chatModeMap = ref<Record<string, ChatMode>>({
  '1': 'chat',
})

// The excerpt uses currentMessages / currentChatMode without showing them;
// the two definitions below are an assumed reconstruction
const currentMessages = computed<ChatMessage[]>({
  get: () => messagesMap.value[activeKey.value] ?? [],
  set: (val) => { messagesMap.value[activeKey.value] = val },
})
const currentChatMode = computed<ChatMode>(
  () => chatModeMap.value[activeKey.value] ?? 'chat'
)

// Mode options
const chatModeOptions: { key: ChatMode; label: string }[] = [
  { key: 'chat', label: 'Plain chat' },
  { key: 'rga', label: 'Query knowledge base' },
  { key: 'tools', label: 'Tool calling' },
]

// Current mode → matching API path
const apiUrl = computed<string>(() => {
  const map: Record<ChatMode, string> = {
    chat: '/api/chat',
    rga: '/api/chat/rga',
    tools: '/api/chat/tools',
  }
  return map[currentChatMode.value]
})

// ========== Sending a message with streaming ==========
const isSubmitting = computed(() =>
  currentMessages.value.some(m => m.status === 'loading')
)

const handleSubmit = async (value: string) => {
  if (!value.trim() || isSubmitting.value) return

  // 1. Build the history including the new user message
  const history: ChatMessage[] = [
    ...currentMessages.value,
    { id: String(++msgId), role: 'user', content: value, status: 'local' },
  ]
  const aiMsgId = String(++msgId)

  // 2. Add the placeholder AI message before the request, so isSubmitting
  //    blocks duplicate submissions while the request is in flight
  currentMessages.value = [
    ...history,
    { id: aiMsgId, role: 'assistant', content: '', status: 'loading' },
  ]

  // Applies a change to the AI message only, replacing the whole array
  const patchAiMsg = (patch: (m: ChatMessage) => ChatMessage) => {
    currentMessages.value = currentMessages.value.map(m =>
      m.id === aiMsgId ? patch(m) : m
    )
  }

  try {
    // 3. Send the request (the history excludes the placeholder)
    const response = await fetch(apiUrl.value, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: history, ask: value })
    })
    if (!response.ok || !response.body) {
      throw new Error(`HTTP ${response.status}`)
    }

    // 4. Read the backend response as a stream
    const reader = response.body.getReader()
    const decoder = new TextDecoder()

    while (true) {
      const { done, value: chunk } = await reader.read()
      if (done) break
      const text = decoder.decode(chunk, { stream: true })
      // Append in real time → typewriter effect
      patchAiMsg(m => ({ ...m, content: m.content + text }))
    }

    // 5. Mark as complete
    patchAiMsg(m => ({ ...m, status: 'success' }))
  } catch (e) {
    // Leave the 'loading' state so the input is usable again
    patchAiMsg(m => ({ ...m, status: 'error' }))
  }
}
</script>

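⚠️ Pitfall #12: The { stream: true } option in TextDecoder.decode(chunk, { stream: true }) is very important. Without it, a multi-byte character (such as a Chinese character) that happens to be split at a chunk boundary comes out garbled. With it, the decoder buffers the incomplete byte sequence and emits the character once the next chunk completes it.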
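The effect is easy to reproduce in Node.js, where `TextEncoder` and `TextDecoder` are available as globals. The sketch below is illustrative only; `decodeChunks` is a hypothetical helper. It splits the UTF-8 bytes of "你好" after the first byte, which cuts the first character in the middle, and decodes the two pieces with and without streaming mode.

```javascript
// "你好" is 6 bytes in UTF-8 (3 per character); split inside the first character.
const bytes = new TextEncoder().encode('你好');
const pieces = [bytes.slice(0, 1), bytes.slice(1)];

function decodeChunks(chunks, streaming) {
    const decoder = new TextDecoder();
    let out = '';
    for (const chunk of chunks) {
        out += decoder.decode(chunk, { stream: streaming });
    }
    // In streaming mode, a final call without arguments flushes buffered bytes.
    if (streaming) out += decoder.decode();
    return out;
}

const streamed = decodeChunks(pieces, true);
const naive = decodeChunks(pieces, false);
console.log(streamed === '你好');      // true
console.log(naive.includes('\uFFFD')); // true: replacement characters appear
```

Without streaming mode, each incomplete byte sequence is replaced by U+FFFD on the spot, and the damage cannot be undone by later chunks.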

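⚠️ Pitfall #13: In Vue 3, calling .push() on an array held in a ref does trigger reactive updates; the "mutations are not detected" problem comes from Vue 2 index assignment and from shallowRef. Replacing the whole array, arr.value = [...arr.value, newItem], is still the safe choice when the state is a shallowRef or a writable computed, because only an assignment to .value goes through the setter.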

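7.3 Vite proxy configuration (solving cross-origin requests)

// ai-web/vite.config.ts
import { defineConfig } from 'vite'
import vue from '@vitejs/plugin-vue'
import components from 'unplugin-vue-components/vite'
import { AntDesignXVueResolver } from 'ant-design-x-vue/resolver'

export default defineConfig({
  plugins: [
    vue(),
    components({
      resolvers: [AntDesignXVueResolver()]  // Auto-import components on demand
    }),
  ],
  server: {
    port: 8080,
    proxy: {
      '/api': {
        target: 'http://localhost:3000',  // Backend address
        changeOrigin: true,
      },
    },
  },
})

💡 Why a proxy instead of CORS? During development the frontend (8080) and the backend (3000) are different origins, and the Vite proxy is the least-effort way around that. fetch calls can use the relative path /api/chat instead of http://localhost:3000/api/chat, so the code needs no change at deploy time, as long as production also serves /api from the same origin (for example through a reverse proxy).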

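8. Running it

# ===== 1. Install all dependencies =====
npm install
cd ai-web && npm install && cd ..
cd chromadb && npm install && cd ..

# ===== 2. Configure the API key =====
# Create api-key.txt in the project root and put your DeepSeek API key in it
echo "sk-your-api-key" > api-key.txt

# ===== 3. Start ChromaDB (needed for RAG) =====
ollama pull nomic-embed-text
cd chromadb/chromadb-server
pip install chromadb
python main.py                    # Terminal 1

# ===== 4. Import the documents into the vector store =====
# (new terminal, from the project root)
node chromadb/batch-import.js      # Terminal 2

# ===== 5. Start the backend =====
node server/server.js              # Terminal 3

# ===== 6. Start the frontend =====
cd ai-web && npm run dev           # Terminal 4

Open http://localhost:8080 and you will see:

Left: the conversation list (add or switch conversations)
Top right: message bubbles (streamed character by character)
Bottom right: the input box plus three mode-switch buttons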

9. The full architecture

┌──────────────────────────────────────────────────────┐
│                Vue 3 Frontend (:8080)                │
│                                                      │
│  ┌───────────┐  ┌─────────────┐  ┌────────────────┐  │
│  │ Chat      │  │ Docs (RAG)  │  │ Tool calling   │  │
│  │ /api/chat │  │/api/chat/rga│  │ /api/chat/tools│  │
│  └─────┬─────┘  └──────┬──────┘  └───────┬────────┘  │
└────────┼───────────────┼─────────────────┼───────────┘
         │               │                 │
         └───────────────┼─────────────────┘
                         │  Vite proxy
         ┌───────────────▼─────────────────────────────┐
         │           Express Backend (:3000)           │
         │                                             │
         │  POST /api/chat       → askDeepSeekStream() │
         │  POST /api/chat/rga   → rgaQuery() → LLM    │
         │  POST /api/chat/tools → askWithTools()      │
         └──┬──────────────┬──────────────────┬────────┘
            │              │                  │
            ▼              ▼                  ▼
      DeepSeek API     ChromaDB        Weather / notification API
      (cloud LLM)      + Ollama        (external tools)
                 (local vector search)

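10. Troubleshooting cheat sheet for common errors

Error message	Cause	Fix
`Connection error` when calling DeepSeek	Wrong baseURL or no network	Check the `https://` prefix; `https://api.deepseek.com` is enough, the `/v1` suffix is optional
`401 Unauthorized`	API key invalid or expired	Generate a new key and make sure `api-key.txt` has no stray whitespace
`429 Too Many Requests`	Rate limit exceeded	Add retries and lower the concurrency
`embedding function not found`	No embeddingFunction given when the ChromaDB collection was created	Pass `embeddingFunction` both when creating and when querying
`Failed to fetch` when the frontend calls the backend	Cross-origin issue or wrong port	Check the Vite proxy config and make sure the backend is on port 3000
Ollama `model not found`	Model not pulled	`ollama pull nomic-embed-text`
ChromaDB connection refused	Server not running	Start ChromaDB with `python main.py`
Garbled Chinese in streamed output	TextDecoder not in stream mode	`decoder.decode(chunk, { stream: true })`
Vue UI not updating	State in a `shallowRef` or writable `computed` was mutated in place	Replace the array: `arr.value = [...arr.value, item]`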

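11. Where to take it next

Direction	Approach
Conversation persistence	Write the message history to SQLite / Redis
Document chunking strategy	Use LangChain's RecursiveCharacterTextSplitter to split along paragraph and sentence boundaries
Multi-round tool calls	After the model returns tool_calls, run the tools, send the results back, and call the model again; loop to support multi-step reasoning
User authentication	JWT plus middleware checks
Streaming via SSE	Replace the raw fetch stream with Server-Sent Events, which is more standard
Docker deployment	Containerize the frontend, backend, and ChromaDB for one-command startup
Multi-model support	Abstract a Provider layer to switch between DeepSeek / OpenAI / local models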
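For the SSE direction in the table above, the wire format is simple enough to sketch. The helper below is a hypothetical illustration, not part of the project: `formatSseEvent` frames a payload the way the Server-Sent Events format expects, with one `data:` line per line of payload and a blank line ending the event.

```javascript
// Frames one Server-Sent Events message.
// Multi-line payloads become several "data:" lines, as the format requires.
function formatSseEvent(data, event) {
    const lines = [];
    if (event) lines.push(`event: ${event}`);
    for (const line of String(data).split('\n')) {
        lines.push(`data: ${line}`);
    }
    return lines.join('\n') + '\n\n'; // the blank line terminates the event
}

console.log(JSON.stringify(formatSseEvent('hi')));          // "data: hi\n\n"
console.log(JSON.stringify(formatSseEvent('a\nb', 'done'))); // "event: done\ndata: a\ndata: b\n\n"
```

On the Express side this would replace the bare `res.write(chunk)` with `res.write(formatSseEvent(chunk))`, after setting the `Content-Type` header to `text/event-stream`. A named event such as `done` gives the client an explicit end-of-stream signal.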