第三章：Claude Code CLI 语义召回机制与后台自动抽取代理

程序消消乐

72人浏览 · 2026-04-07 17:36:25

程序消消乐 · 2026-04-07 17:36:25 发布

第三章：Claude Code CLI 语义召回机制与后台自动抽取代理

第三章：Claude Code CLI 语义召回机制与后台自动抽取代理

第三章：Claude Code CLI 语义召回机制与后台自动抽取代理

系列说明：本章深入两个最有设计感的子系统——语义召回（用 Sonnet 选择相关记忆）和后台抽取代理（对话结束后自动提取记忆）。两者都体现了"用 AI 管理 AI 记忆"的设计思路。

一、为什么需要语义召回？

MEMORY.md 索引只有 200 行，但随着时间积累，记忆文件可能有数十甚至数百个。系统面临一个矛盾：

全部加载：token 消耗过高，大量不相关记忆稀释上下文质量
仅靠索引：索引行只有 ~150 字符，细节不足以做准确判断
关键词匹配：对自然语言描述的记忆效果差

解决方案：召唤另一个 AI（Sonnet）来做语义相关性判断。

二、记忆扫描：`scanMemoryFiles()`

语义召回的第一步是扫描所有记忆文件，获取元数据。

源文件：memdir/memoryScan.ts

export async function scanMemoryFiles(
  memoryDir: string,
  signal: AbortSignal,
): Promise<MemoryHeader[]>

export type MemoryHeader = {
  filename: string        // 相对路径（如 feedback_testing.md）
  filePath: string        // 绝对路径
  mtimeMs: number         // 修改时间戳（毫秒）
  description: string | null  // frontmatter 中的 description
  type: MemoryType | undefined
}

2.1 扫描实现

const MAX_MEMORY_FILES = 200
const FRONTMATTER_MAX_LINES = 30  // 只读前 30 行，不读全文

export async function scanMemoryFiles(memoryDir, signal) {
  const entries = await readdir(memoryDir, { recursive: true })
  const mdFiles = entries.filter(
    f => f.endsWith('.md') && basename(f) !== 'MEMORY.md',  // 排除索引本身
  )

  const headerResults = await Promise.allSettled(
    mdFiles.map(async (relativePath): Promise<MemoryHeader> => {
      const filePath = join(memoryDir, relativePath)
      const { content, mtimeMs } = await readFileInRange(
        filePath,
        0,
        FRONTMATTER_MAX_LINES,  // 只读前 30 行，高效
        undefined,
        signal,
      )
      const { frontmatter } = parseFrontmatter(content, filePath)
      return { filename: relativePath, filePath, mtimeMs, ... }
    }),
  )

  return headerResults
    .filter(r => r.status === 'fulfilled')
    .map(r => r.value)
    .sort((a, b) => b.mtimeMs - a.mtimeMs)  // 按修改时间新→旧排序
    .slice(0, MAX_MEMORY_FILES)              // 上限 200 个
}

关键优化：注释说明了"Single-pass"设计：

“readFileInRange stats internally and returns mtimeMs, so we read-then-sort rather than stat-sort-read. For the common case (N ≤ 200) this halves syscalls vs a separate stat round.”

一次 readFileInRange 同时获取文件内容和 mtime，避免先 stat 再读的两次系统调用。

2.2 清单格式化

export function formatMemoryManifest(memories: MemoryHeader[]): string {
  return memories
    .map(m => {
      const tag = m.type ? `[${m.type}] ` : ''
      const ts = new Date(m.mtimeMs).toISOString()
      return m.description
        ? `- ${tag}${m.filename} (${ts}): ${m.description}`
        : `- ${tag}${m.filename} (${ts})`
    })
    .join('\n')
}

输出示例：

- [feedback] feedback_testing_policy.md (2026-03-15T10:30:00.000Z): 禁止在数据库测试中使用 mock
- [user] user_role.md (2026-02-01T08:00:00.000Z): 用户是 Go 专家，React 新手
- [project] project_auth_rewrite.md (2026-04-01T14:00:00.000Z): auth 重写由法务合规驱动
- [reference] ref_linear_ingest.md (2026-01-20T09:00:00.000Z): 流水线 bug 追踪在 Linear INGEST

三、语义召回：`findRelevantMemories()`

源文件：memdir/findRelevantMemories.ts

3.1 函数签名

export async function findRelevantMemories(
  query: string,
  memoryDir: string,
  signal: AbortSignal,
  recentTools: readonly string[] = [],
  alreadySurfaced: ReadonlySet<string> = new Set(),
): Promise<RelevantMemory[]>

export type RelevantMemory = {
  path: string
  mtimeMs: number
}

两个值得注意的参数：

recentTools：本轮对话最近使用的工具列表
alreadySurfaced：本次会话中已展示过的记忆路径集合

3.2 完整工作流程

3.3 Sonnet 选择器：系统提示设计

const SELECT_MEMORIES_SYSTEM_PROMPT = `You are selecting memories that will 
be useful to Claude Code as it processes a user's query. You will be given 
the user's query and a list of available memory files with their filenames 
and descriptions.

Return a list of filenames for the memories that will clearly be useful to 
Claude Code as it processes the user's query (up to 5). Only include memories 
that you are certain will be helpful based on their name and description.
- If you are unsure if a memory will be useful in processing the user's query, 
  then do not include it in your list. Be selective and discerning.
- If there are no memories in the list that would clearly be useful, feel free 
  to return an empty list.
- If a list of recently-used tools is provided, do not select memories that 
  are usage reference or API documentation for those tools (Claude Code is 
  already exercising them). DO still select memories containing warnings, 
  gotchas, or known issues about those tools — active use is exactly when 
  those matter.`

最后一条规则值得重点关注：

如果用户正在使用某个工具，不要召回该工具的使用文档（浪费 5 个名额），但要召回该工具的已知问题/陷阱——正在使用时才是最需要知道陷阱的时候。

3.4 JSON Schema 约束输出

const result = await sideQuery({
  model: getDefaultSonnetModel(),
  system: SELECT_MEMORIES_SYSTEM_PROMPT,
  skipSystemPromptPrefix: true,
  messages: [{
    role: 'user',
    content: `Query: ${query}\n\nAvailable memories:\n${manifest}${toolsSection}`,
  }],
  max_tokens: 256,
  output_format: {
    type: 'json_schema',
    schema: {
      type: 'object',
      properties: {
        selected_memories: { type: 'array', items: { type: 'string' } },
      },
      required: ['selected_memories'],
      additionalProperties: false,
    },
  },
  querySource: 'memdir_relevance',
})

使用结构化输出（json_schema）而非自由文本，防止 Sonnet 返回文件名以外的内容。返回后还会验证文件名确实存在：

// 过滤幻觉：只保留扫描到的真实文件名
return parsed.selected_memories.filter(f => validFilenames.has(f))

四、记忆新鲜度机制

源文件：memdir/memoryAge.ts

每个被召回的记忆都会附带新鲜度说明，防止模型将陈旧信息当作当前事实。

// 计算天数差（向下取整，负值归零）
export function memoryAgeDays(mtimeMs: number): number {
  return Math.max(0, Math.floor((Date.now() - mtimeMs) / 86_400_000))
}

// 人类可读格式
export function memoryAge(mtimeMs: number): string {
  const d = memoryAgeDays(mtimeMs)
  if (d === 0) return 'today'
  if (d === 1) return 'yesterday'
  return `${d} days ago`
}

// 新鲜度警告文本（>1天才生成，今天/昨天的记忆不加噪音）
export function memoryFreshnessText(mtimeMs: number): string {
  const d = memoryAgeDays(mtimeMs)
  if (d <= 1) return ''
  return (
    `This memory is ${d} days old. ` +
    `Memories are point-in-time observations, not live state — ` +
    `claims about code behavior or file:line citations may be outdated. ` +
    `Verify against current code before asserting as fact.`
  )
}

// 包装为 system-reminder 标签（用于直接插入上下文）
export function memoryFreshnessNote(mtimeMs: number): string {
  const text = memoryFreshnessText(mtimeMs)
  if (!text) return ''
  return `<system-reminder>${text}</system-reminder>\n`
}

设计动机（源码注释）：

“Motivated by user reports of stale code-state memories (file:line citations to code that has since changed) being asserted as fact — the citation makes the stale claim sound more authoritative, not less.”

带有 file:line 格式的陈旧引用反而比无来源的陈旧记忆更危险——看起来更权威，更容易被当作事实接受。

五、后台自动抽取代理

源文件：services/extractMemories/extractMemories.ts

5.1 核心问题

自动记忆有个基本问题：用户不会主动说"记住这个"。大量有价值的上下文在普通对话中自然流露：

用户纠正了一个工作方式错误
提到了项目的历史背景
介绍了自己的技术背景

后台抽取代理在每轮对话结束后异步运行，自动识别并保存这类隐性知识。

5.2 触发时机

5.3 互斥机制：避免重复写入

function hasMemoryWritesSince(
  messages: Message[],
  sinceUuid: string | undefined,
): boolean {
  // 扫描 sinceUuid 之后的所有 assistant 消息
  // 如果发现任何 Write/Edit tool_use 写入了 auto-memory 路径
  // 则返回 true
}

主代理与后台代理互斥：

如果主代理在本轮已经主动写入记忆 → 后台代理跳过，不重复处理
游标（lastMemoryMessageUuid）前进到最新消息
下次抽取只处理游标之后的新消息

if (hasMemoryWritesSince(messages, lastMemoryMessageUuid)) {
  logForDebugging('[extractMemories] skipping — conversation already wrote to memory files')
  // 前进游标，避免重复
  lastMemoryMessageUuid = messages.at(-1)?.uuid
  return
}

5.4 Forked Agent 模式

const result = await runForkedAgent({
  promptMessages: [createUserMessage({ content: userPrompt })],
  cacheSafeParams,          // 共享主代理的 prompt cache
  canUseTool,               // 受限的工具权限
  querySource: 'extract_memories',
  forkLabel: 'extract_memories',
  skipTranscript: true,     // 不写入对话历史（避免竞态）
  maxTurns: 5,              // 硬性轮次上限
})

共享 prompt cache 是关键优化：

提取代理与主代理使用相同的系统提示前缀
主代理已经消耗的 prompt cache 被提取代理复用
实际新增的 token 开销很小

日志输出示例：

[extractMemories] finished — 2 files written, cache: read=45230 create=1200 input=890 (95.8% hit)

5.5 工具权限控制

提取代理只能使用有限的工具集：

export function createAutoMemCanUseTool(memoryDir: string): CanUseToolFn {
  return async (tool, input) => {
    // ✅ 允许：Read、Grep、Glob（只读，无限制）
    if (tool.name === FILE_READ_TOOL_NAME || 
        tool.name === GREP_TOOL_NAME || 
        tool.name === GLOB_TOOL_NAME) {
      return { behavior: 'allow', updatedInput: input }
    }

    // ✅ 允许：Bash，但仅限只读命令（ls, find, grep, cat, stat, wc 等）
    if (tool.name === BASH_TOOL_NAME) {
      if (tool.isReadOnly(parsedInput)) {
        return { behavior: 'allow', updatedInput: input }
      }
      return denyAutoMemTool(tool, 'Only read-only shell commands are permitted')
    }

    // ✅ 允许：Edit/Write，但仅限 auto-memory 目录内的路径
    if (tool.name === FILE_EDIT_TOOL_NAME || tool.name === FILE_WRITE_TOOL_NAME) {
      if (isAutoMemPath(filePath)) {
        return { behavior: 'allow', updatedInput: input }
      }
    }

    // ❌ 拒绝：MCP、Agent（子代理）、任何其他工具
    return denyAutoMemTool(tool, `only Read/Grep/Glob/read-only Bash/Edit/Write within ${memoryDir} are allowed`)
  }
}

5.6 并发控制：防止重叠运行

抽取代理有精细的并发控制逻辑：

// 闭包内的状态（每次 initExtractMemories() 重置）
let inProgress = false          // 当前是否有抽取在运行
let pendingContext = undefined  // 运行中收到的最新请求

// 如果抽取正在进行，stash 最新上下文，运行结束后执行一次 trailing run
if (inProgress) {
  pendingContext = { context, appendSystemMessage }
  return
}

只保留最新的 stash：多次对话轮次积累时，最新的上下文包含所有历史消息，早期 stash 是子集，直接覆盖即可。

5.7 throttle 机制

通过特性门控 tengu_bramble_lintel 可配置每 N 轮才运行一次提取（默认 N=1，即每轮都运行）：

turnsSinceLastExtraction++
if (turnsSinceLastExtraction < (featureValue ?? 1)) {
  return  // 未达到阈值，跳过本轮
}
turnsSinceLastExtraction = 0

trailing run（stash 触发的）跳过此检查，确保已积累的消息不被遗漏。

5.8 遥测：抽取效果监控

logEvent('tengu_extract_memories_extraction', {
  input_tokens: result.totalUsage.input_tokens,
  output_tokens: result.totalUsage.output_tokens,
  cache_read_input_tokens: result.totalUsage.cache_read_input_tokens,
  cache_creation_input_tokens: result.totalUsage.cache_creation_input_tokens,
  message_count: newMessageCount,   // 本次处理的消息数
  turn_count: turnCount,            // 提取代理使用的轮次
  files_written: writtenPaths.length,
  memories_saved: memoryPaths.length,
  team_memories_saved: teamCount,
  duration_ms: Date.now() - startTime,
})

六、完整调用链总结

后台提取链路（每轮对话结束后）：

语义召回链路（用户查询时）：

七、章节小结

设计决策	原因
用 Sonnet 做相关性判断	语义匹配优于关键词，比向量数据库更简单
只读 frontmatter 前 30 行	避免读取所有文件全文，单次扫描高效
最多召回 5 个记忆	防止上下文被大量记忆稀释
排除正在使用工具的文档	节省召回名额给真正有用的信息
新鲜度警告（>1天）	防止陈旧 file:line 引用被当作事实
Forked agent + 共享 prompt cache	几乎零额外 token 开销
主/后台代理互斥	防止对同一对话内容写两次记忆
trailing run 机制	高频对话不丢失记忆，只保留最新 stash