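• Update complete. The system-level AGENTS.md now fully codifies your "two-round Codex review + user sign-off" flow.
───────────────────────────────────────────────
Change summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
ask-codex priority
  Before: P3 (on demand)
  After:  P2 (medium-high frequency); the trigger is now stated explicitly as "the two-round review before the architecture is finalized and before the implementation plan is written"
Workflow path
  Before: brainstorming → writing-plans → implementation
  After:  brainstorming → ask-codex round 1 → revise → ask-codex round 2 → [user sign-off] → writing-plans → implementation
Execution rule
  Before: P2 follows the flow: brainstorming → writing-plans
  After:  P2 follows the flow: once brainstorming is done, the two-round ask-codex review must run before entering writing-plans
New section
  Before: none
  After:  Two-round Codex review flow (mandatory when starting a project or settling a design), with full operating details for round 1 (architecture reviewer), round 2 (implementation expert), and the final user sign-off (The Gate)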
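Hard constraints of this flow (written into the fallback rules)
Rule     Content
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Never    Finalize the design and enter writing-plans before obtaining Codex's first-round feedback
Never    Enter writing-plans or start coding automatically before the user signs off
Never    Treat Codex's reply as the final conclusion and skip user confirmation
Must     Use a structured prompt for every ask-codex call (role, task, context, evidence, constraints, expected output)
Must     Wait for the user to explicitly reply "agree / OK / start" before continuing development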
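What you will see in practice
From now on, when starting a new project or settling on a design, my flow will be:

  1. brainstorming → propose 2-3 options
  2. [Skill usage declaration] → call ask-codex round 1 (architecture review)
  3. Show Codex's review comments → I revise the design
  4. [Skill usage declaration] → call ask-codex round 2 (implementation guidance)
  5. Consolidate the final design + both rounds of Codex feedback + a recommended path
  6. Ask for your sign-off: "Please review the design above. Once you confirm it, I will start writing the implementation plan and move into development."
  7. Wait for your "agree" → only then enter writing-plans → TDD → implementation

This gives you a complete loop of **advisor (two rounds of Codex review) + decision-maker (you)**, instead of me thinking alone and then charging straight into the work.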
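
The ordering rule in the numbered list above can be pictured as a tiny state machine. This is only an illustrative sketch: in the setup described here the rule is enforced through instructions in AGENTS.md, not through code, and the `ReviewGate` class, the stage names, and the approval words below are hypothetical.

```python
# Hypothetical sketch of the "two Codex rounds + user sign-off" gate.
# Stage names and approval words are assumptions based on the list above.
STAGES = [
    "brainstorming",
    "ask-codex round 1",
    "revise",
    "ask-codex round 2",
    "user sign-off",
    "writing-plans",
    "implementation",
]
APPROVAL_WORDS = {"agree", "ok", "start"}


class ReviewGate:
    """Allows stages only in order and blocks at the user sign-off."""

    def __init__(self):
        self.index = 0          # position of the current stage in STAGES
        self.approved = False   # set only by an explicit user reply

    @property
    def stage(self):
        return STAGES[self.index]

    def record_user_reply(self, reply):
        # Only an explicit approval word opens the gate.
        if reply.strip().lower() in APPROVAL_WORDS:
            self.approved = True

    def advance(self):
        if self.index == len(STAGES) - 1:
            raise RuntimeError("already at the final stage")
        if self.stage == "user sign-off" and not self.approved:
            raise PermissionError("waiting for explicit user approval")
        self.index += 1
        return self.stage


if __name__ == "__main__":
    gate = ReviewGate()
    while gate.stage != "user sign-off":
        gate.advance()
    try:
        gate.advance()                    # blocked: no approval yet
    except PermissionError as exc:
        print("blocked:", exc)
    gate.record_user_reply("agree")
    print("next stage:", gate.advance())
```

The point of the sketch is that an ambiguous reply leaves the gate closed; only an explicit approval moves the flow into writing-plans.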

Pretty neat, isn't it? Next, I'll show you how to use prompts to build a skill that asks GPT5.5 for a second opinion on your design.

  ask-codex is a good example of skill design. Below are its design ideas, its flow, and how it is built; anyone can follow the same pattern to build their own skill:
  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
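  Part 1. Design idea: what problem does it solve?
  ask-codex addresses the problem that "an agent easily falls into single-track thinking and has blind spots in early decisions".
  Its core assumptions:
  1. LLMs make mistakes too: however long a single agent thinks, it can still have blind spots
  2. External review beats self-checking: letting another model (Codex) review with "fresh eyes" surfaces risks the main agent overlooked
  3. But forwarding the raw request is wasteful: a vague request gets a vague answer, so it must first be rewritten into a structured prompt
  4. The final call stays with the main agent: Codex is only an "advisor"; the main agent must synthesize both judgments rather than forward blindly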
  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
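  Part 2. File structure (the standard parts of a skill)
  ask-codex/                          ← skill directory name (kebab-case)
  ├── SKILL.md                        ← the only required file: the "manual" the agent reads
  ├── scripts/
  │   └── ask_codex.py                ← wrapper script: reduces a complex call to a one-line command
  └── references/
      ├── prompt-template.md          ← default prompt template (8 elements)
      └── modes.md                    ← mode templates for the five scenarios
  Key insight: SKILL.md is not documentation for humans; it is a set of program instructions for the agent. It tells the agent:
  • when to trigger the skill
  • what preprocessing is required before triggering
  • exactly what to do when triggering
  • how to handle the result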
  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
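  Part 3. Core flow (from triage to post-processing)
  ┌──────────────────────────────────────────────────────────────────────┐
  │  Step 1: Decide whether a second opinion is needed                   │
  │  ────────────────────────────────────────                            │
  │  • Trigger: uncertain engineering decisions, hard bugs,              │
  │    non-trivial code review, architecture choices                     │
  │  • Skip: simple syntax questions, agent already confident,           │
  │    possible recursive loops                                          │
  └──────────────────────────────────────────────────────────────────────┘
                                       ↓
  ┌──────────────────────────────────────────────────────────────────────┐
  │  Step 2: Vague request → structured prompt (mandatory rewrite)       │
  │  ────────────────────────────────────────                            │
  │  Never forward the user's words verbatim! Include all 8 elements:    │
  │  1. Role (expert identity)                                           │
  │  2. Task (the specific question)                                     │
  │  3. Context (project background, prior decisions)                    │
  │  4. Environment (language, framework, OS, versions)                  │
  │  5. Evidence (code snippets, error logs, reproduction steps)         │
  │  6. Constraints (what to preserve, avoid, or keep compatible)        │
  │  7. Expected Output (diagnosis / review / plan / comparison)         │
  │  8. Response Format (numbered list / sections / table, etc.)         │
  └──────────────────────────────────────────────────────────────────────┘
                                       ↓
  ┌──────────────────────────────────────────────────────────────────────┐
  │  Step 3: Choose a mode (5 built-in review roles)                     │
  │  ────────────────────────────────────────                            │
  │  • Second Opinion: uncertain design or architecture calls            │
  │  • Bug Diagnosis: errors, exceptions, flaky tests                    │
  │  • Code Review: PRs, functions, modules, pre-refactor review         │
  │  • Implementation Plan: "how do I build this"                        │
  │  • Architecture Tradeoff: technology selection or option comparison  │
  └──────────────────────────────────────────────────────────────────────┘
                                       ↓
  ┌──────────────────────────────────────────────────────────────────────┐
  │  Step 4: Call the wrapper script                                     │
  │  ────────────────────────────────────────                            │
  │  python scripts/ask_codex.py "structured prompt" --json              │
  │                                                                      │
  │  The script's responsibilities:                                      │
  │  • Run in a clean temporary directory (avoids loading unrelated      │
  │    project skills)                                                   │
  │  • Filter out ANSI codes, execution logs, and session noise          │
  │  • Return only the model's final answer                              │
  │    (JSON: {success, reply, error})                                   │
  │  • Support --prompt-file, stdin, and -C for project context          │
  └──────────────────────────────────────────────────────────────────────┘
                                       ↓
  ┌──────────────────────────────────────────────────────────────────────┐
  │  Step 5: Error handling (keep it short, hide internals)              │
  │  ────────────────────────────────────────                            │
  │  On failure, say only:                                               │
  │    "未能获得 Codex 回复:<简短原因>。"                               │
  │    ("Could not get a Codex reply: <short reason>.")                  │
  │  Never expose: exit codes, stderr stack traces, JSON diagnostics     │
  │  (unless the user explicitly asks for --debug)                       │
  └──────────────────────────────────────────────────────────────────────┘
                                       ↓
  ┌──────────────────────────────────────────────────────────────────────┐
  │  Step 6: Post-processing (the most critical step)                    │
  │  ────────────────────────────────────────                            │
  │  After receiving Codex's reply, the main agent must:                 │
  │  • Point out when Codex's answer rests on assumptions or lacks       │
  │    context                                                           │
  │  • Review any risky commands, edits, or destructive operations       │
  │    Codex suggests before presenting them                             │
  │  • If Codex contradicts the main agent's initial view, explain the   │
  │    divergence and recommend the safer path                           │
  │  • Never present Codex's raw output as the final answer              │
  └──────────────────────────────────────────────────────────────────────┘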
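
  The mandatory rewrite in Step 2 can be sketched as a small data structure that refuses to render until all eight elements are filled in. This is a minimal illustration, not the content of references/prompt-template.md: the `StructuredPrompt` class, its field names, and the "## Title" section layout are assumptions made for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class StructuredPrompt:
    """The 8 elements from Step 2; class and field names are illustrative."""
    role: str
    task: str
    context: str
    environment: str
    evidence: str
    constraints: str
    expected_output: str
    response_format: str

    def render(self):
        # Refuse to render when any element is blank, so that a vague
        # request cannot be forwarded as-is.
        missing = [f.name for f in fields(self)
                   if not getattr(self, f.name).strip()]
        if missing:
            raise ValueError("missing prompt elements: " + ", ".join(missing))
        sections = []
        for f in fields(self):
            title = f.name.replace("_", " ").title()
            sections.append(f"## {title}\n{getattr(self, f.name).strip()}")
        return "\n\n".join(sections)


if __name__ == "__main__":
    prompt = StructuredPrompt(
        role="Senior backend architect",
        task="Review the proposed job queue design for failure modes",
        context="Single-node service, prior decision: no external broker",
        environment="Python 3.11, Linux",
        evidence="(paste code snippets or logs here)",
        constraints="Must stay compatible with the existing REST API",
        expected_output="Review with ranked risks",
        response_format="Numbered list",
    )
    print(prompt.render())
```

  The rendered text can then be passed to the wrapper script as its prompt argument, or written to a file for --prompt-file.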
  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
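  Part 4. Key design decisions (why it is built this way)
   Design decision                Rationale
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   Mandatory structured prompt    Vague input = vague output. The 8-element template ensures Codex gets enough context for a high-quality review
   Isolated wrapper environment   Runs in a clean temporary directory by default, so Codex does not load the main agent's session history or unrelated skills and the review stays independent
   Final answer only              ANSI codes, logs, and reasoning traces are filtered out, so the main agent receives "clean information" it can synthesize directly
   Minimal error messages         The user does not need to know the exit code, only that the call failed and roughly why
   Synthesize, don't forward      Keeps the main agent from becoming a mere relay. Codex is the advisor, the main agent is the decision-maker and must add its own judgment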
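
  The "final answer only" decision comes down to a filtering pass over the raw CLI output. The sketch below shows one way to do it; the regular expression covers standard ANSI CSI sequences, while the `noise_prefixes` values are placeholders, because the article does not show what real Codex log lines look like.

```python
import re

# CSI sequences (colors, cursor moves): ESC [ parameters intermediates final
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text):
    """Remove ANSI CSI escape sequences from text."""
    return ANSI_CSI.sub("", text)


def clean_reply(raw, noise_prefixes=("[log]", "[session]")):
    """Drop escape codes and lines that start with a known noise prefix.

    The default prefixes are hypothetical; adapt them to the real output.
    """
    kept = []
    for line in strip_ansi(raw).splitlines():
        if line.strip().startswith(noise_prefixes):
            continue
        kept.append(line.rstrip())
    return "\n".join(kept).strip()


if __name__ == "__main__":
    raw = "[log] starting\n\x1b[1mUse an index on user_id.\x1b[0m\n[session] done"
    print(clean_reply(raw))
```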
  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
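  Part 5. Build your own skill (general template)
  Suppose you want to build a "database performance review" skill that other people can use too:
  1. Create the directory structure
  db-performance-review/
  ├── SKILL.md
  └── scripts/
      └── review_db.py      ← your wrapper script (Python, bash, or any executable)
  2. Write SKILL.md (the core)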
  ---
  name: db-performance-review
  description: "Use when encountering slow queries, database bottlenecks, or before designing new schema/indexes"
  ---

  # Database Performance Review

  ## When to Use
  - Query execution time > 100ms
  - N+1 query patterns suspected
  - Index design decisions
  - Schema changes before implementation

  ## When NOT to Use
  - Simple CRUD with < 1000 rows
  - Syntax questions (use docs instead)

  ## Prompt Construction (REQUIRED)

  Before invoking, rewrite request into structured prompt with:

  1. **Role**: Database performance engineer
  2. **Task**: What needs review (specific query, schema, index)
  3. **Context**: Table sizes, access patterns, ORM being used
  4. **Evidence**: EXPLAIN output, query plan, slow log
  5. **Constraints**: Must maintain ACID / can't add new columns / read-only replicas available?
  6. **Expected Output**: Index recommendation / query rewrite / schema change
  7. **Response Format**: Table with before/after comparison

  ## Invocation

  ```bash
  python scripts/review_db.py "structured prompt" --json
  ```

  ## Post-Processing

  After receiving the review:
  - Check if recommendations work with your ORM
  - Verify index suggestions don't harm write performance
  - Don't present raw output without adding your own analysis

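  3. Write the wrapper script (core responsibilities)

  Your script only needs to do three things:
  1. Accept the arguments / prompt
  2. Call the target service (Codex / Claude / GPT / an internal API)
  3. Filter out the noise and return clean JSON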

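  ```python
  #!/usr/bin/env python3
  """Minimal wrapper: send a prompt to a reviewer CLI and print a clean result."""
  import argparse
  import json
  import subprocess
  import sys


  def main():
      parser = argparse.ArgumentParser()
      parser.add_argument("prompt")
      parser.add_argument("--json", action="store_true")
      args = parser.parse_args()

      reply, error = None, None
      try:
          # Call your review service (Codex in this example). `codex exec` is
          # the non-interactive entry point mentioned in this article; confirm
          # the exact flags for your installed version with `codex --help`.
          result = subprocess.run(
              ["codex", "exec", args.prompt],
              capture_output=True, text=True, timeout=180,
          )
          if result.returncode == 0 and result.stdout.strip():
              reply = result.stdout.strip()
          else:
              error = "Codex run failed or returned no usable content"
      except subprocess.TimeoutExpired:
          error = "call timed out"
      except FileNotFoundError:
          error = "Codex CLI is not available"

      if args.json:
          # Report failures honestly instead of always claiming success.
          payload = {"success": error is None, "reply": reply, "error": error}
          print(json.dumps(payload, ensure_ascii=False))
      else:
          print(reply if error is None else f"Could not get a Codex reply: {error}.")
      return 0 if error is None else 1


  if __name__ == "__main__":
      sys.exit(main())
  ```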
  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
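  Part 6. One-sentence summary
  ▌ A good skill = trigger conditions (when to use it) + preprocessing rules (how to use it) + a wrapper script (how to call it) + post-processing constraints (what to do with the result).
  The essence of ask-codex is not that it "calls another AI", but that it:
  1. Forces the main agent to think the problem through before calling (structured prompt)
  2. Forces the main agent to think independently after getting the result (synthesize, don't forward)
  A skill designed this way amplifies the agent's capabilities without turning the agent into a mindless relay.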
  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
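  If you like, I can also turn this methodology into a skill (writing-skills), so that whenever you want to create a new skill it walks you through this template step by step. Want me to?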

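The principle is very simple, but you are better off letting Codex build this skill for you, because the Codex CLI itself supports non-interactive invocation: with exec you send it a question just as you would call any other command-line tool. The most important thing is to get it running first; for tuning, have Codex help you adjust it to your own needs. codex --help prints the full interface description; hand that to Codex together with the spec and let it do the development. Once the skill is built, just invoke it from your Kimi Code CLI.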

Below is the SKILL.md file I generated; having Codex develop against it gives more reliable results.


name: ask-codex
description: "Use when the user invokes $ask-codex or asks to 问问gpt, 问下gpt, 给gpt审阅, 提交给gpt审阅, ask GPT/Codex, get a second opinion, architecture review, implementation guidance, code review, bug diagnosis, risky fix validation, GPT进度, or Codex进度."

Ask Codex — Second Opinion Skill

A non-interactive wrapper around the Codex CLI that filters out session noise,
execution logs, and ANSI codes, returning only the model’s final answer.

Requirements

  • Python 3.8+
  • Codex CLI installed and authenticated (codex --version should work)
  • codex available on PATH

When to Use

Invoke this skill only when an external second opinion is likely to improve the answer:

  • Ambiguous engineering tradeoffs or design decisions
  • Hard debugging scenarios where root cause is unclear
  • Non-trivial code review (security, concurrency, edge cases)
  • Architecture or migration planning
  • Validating a risky proposed fix

When NOT to Use

Do not use this skill for:

  • Simple syntax questions or documentation lookups
  • Tasks the current assistant can answer confidently
  • Requests that do not benefit from a second model pass
  • Any operation where nested LLM delegation could loop (recursion guard blocks this)

Prompt Construction

Never forward a vague user request directly to Codex.
Before invoking, rewrite the request into a structured prompt.
Preserve the user’s original intent and do not invent facts.
If context is missing, state assumptions explicitly.

A good prompt contains 8 elements:

  1. Role: Expert identity you want Codex to assume
  2. Task: Exact question or decision needed
  3. Context: Project, feature, architecture, or prior decisions
  4. Environment: Language, framework, OS, versions, tooling
  5. Evidence: Code snippets, errors, logs, reproduction steps
  6. Constraints: What must be preserved, avoided, or kept compatible
  7. Expected Output: Bug diagnosis, review, plan, tradeoff analysis, patch
  8. Response Format: Preferred structure (numbered list, sections, etc.)

For the default template, see references/prompt-template.md.

Built-In Modes

Choose a mode that matches the request type, then fill in the bracketed fields:

  • Second Opinion — uncertain designs or architecture judgment
  • Bug Diagnosis — errors, exceptions, flaky tests
  • Code Review — PRs, functions, modules, pre-refactor
  • Implementation Plan — “how to build this”
  • Architecture Tradeoff — technology selection or option comparison

Full templates for all modes: references/modes.md

Default Invocation Policy

For real second-opinion work, prefer background execution. Long Codex calls
can take 3-5 minutes and may produce no final answer until the end; do not block
the foreground agent unless the task is a trivial health check.

Use this policy:

  1. Health checks only: inline ask_codex.py "Say hello" --json -t 30.
  2. Normal second opinions: write the structured prompt to a file and launch
    start_ask_codex_background.py --prompt-file request.md --debug.
  3. Foreground behavior after launch: keep working on non-overlapping local
    tasks. Do not wait inline unless the Codex answer is the immediate blocker.
  4. Progress questions: when the user asks whether GPT/Codex has
    responded or is still working, run check_ask_codex_progress.py --json and
    report state, elapsedMs, eventsBytes, eventsLines, and whether
    resultAvailable is true.
  5. Final answer: when resultAvailable is true, read resultFile, then
    synthesize Codex’s answer with the current assistant’s own judgment.

This is script-level background execution, not platform subagent delegation. Do
not spawn platform subagents unless the user explicitly asks for subagents.

Invocation

The wrapper script is located at scripts/ask_codex.py.

Basic usage

python .agents/skills/ask-codex/scripts/ask_codex.py "Your structured prompt here"

JSON output (recommended for programmatic use)

python .agents/skills/ask-codex/scripts/ask_codex.py "Your structured prompt" --json

From a file

python .agents/skills/ask-codex/scripts/ask_codex.py --prompt-file request.md --json

From stdin

cat request.md | python .agents/skills/ask-codex/scripts/ask_codex.py - --json

With project context

By default the script uses a temporary clean directory to avoid loading unrelated
project skills/agents. To preserve project context (AGENTS.md, local files):

python .agents/skills/ask-codex/scripts/ask_codex.py "..." -C ./my-project --json

Optional overrides

Flag                  Description
-m MODEL              Override model (default follows ~/.codex/config.toml)
-s SANDBOX            read-only (default), workspace-write, danger-full-access
-t SECONDS            Timeout (default: 300)
--status-file PATH    Write live progress JSON for polling
--events-file PATH    Write Codex --json event stream as JSONL
--result-file PATH    Write final wrapper JSON to a stable file
--debug               Include internal diagnostics in JSON output

Background usage for long answers

For slow research or complex reviews, prefer starting a background job instead of
blocking the foreground agent:

python .agents/skills/ask-codex/scripts/start_ask_codex_background.py --prompt-file request.md --debug

The launcher returns JSON containing:

{
  "jobId": "20260513-061215-83ab13b4",
  "statusFile": ".../ask-codex-jobs/<job-id>/status.json",
  "eventsFile": ".../ask-codex-jobs/<job-id>/events.jsonl",
  "resultFile": ".../ask-codex-jobs/<job-id>/result.json",
  "latestFile": ".../ask-codex-jobs/latest.json"
}

When the user asks for “GPT progress” or “Codex progress”,
read %TEMP%/ask-codex-jobs/latest.json, then read its statusFile. Report
state, elapsedMs, eventsBytes, eventsLines, and updatedAt. The useful
signal is whether eventsBytes / eventsLines changed since the last check.
Do not promise exact token counts.

Shortcut:

python .agents/skills/ask-codex/scripts/check_ask_codex_progress.py --json
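
The progress check described above amounts to two file reads and a comparison. The following is a rough sketch of that logic, not the actual check_ask_codex_progress.py: the function names are hypothetical, and it assumes latest.json carries a statusFile key and that status.json holds the fields listed above.

```python
import json
from pathlib import Path


def read_progress(jobs_dir):
    """Follow latest.json to the status file and return the fields to report."""
    latest_path = Path(jobs_dir) / "latest.json"
    latest = json.loads(latest_path.read_text(encoding="utf-8"))
    status = json.loads(Path(latest["statusFile"]).read_text(encoding="utf-8"))
    keys = ("state", "elapsedMs", "eventsBytes", "eventsLines", "updatedAt")
    return {key: status.get(key) for key in keys}


def is_alive(previous, current):
    """Liveness signal: the event stream grew or changed since the last check."""
    before = (previous["eventsBytes"], previous["eventsLines"])
    after = (current["eventsBytes"], current["eventsLines"])
    return before != after
```

Calling read_progress twice a few seconds apart and passing both results to is_alive gives the "still working" signal without promising token counts.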

Job cleanup is conservative and automatic when launching a new background job:
completed jobs older than 1 day are pruned by default. The launcher does not
delete the job pointed to by latest.json, and it does not delete jobs whose
status.json says state is starting/running or final is false.
Foreground agents should only read latest.json and the files it points to; do
not scan old job directories for progress.
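
The cleanup rules above can be expressed as a single selection function. This is a hedged sketch rather than the launcher's real code: `jobs_to_prune` is a hypothetical name, job age is measured by the modification time of status.json (an assumption), and a job with no final flag is treated as unfinished to stay on the conservative side.

```python
import json
import time
from pathlib import Path

ONE_DAY_SECONDS = 24 * 60 * 60


def jobs_to_prune(jobs_dir, latest_job_id, now=None, max_age=ONE_DAY_SECONDS):
    """Return job directories that are safe to delete under the rules above."""
    now = time.time() if now is None else now
    doomed = []
    for job in sorted(Path(jobs_dir).iterdir()):
        status_path = job / "status.json"
        # Skip plain files, the job latest.json points to, and unknown dirs.
        if not job.is_dir() or job.name == latest_job_id or not status_path.exists():
            continue
        status = json.loads(status_path.read_text(encoding="utf-8"))
        still_active = status.get("state") in ("starting", "running")
        # A missing "final" flag is treated as not final (conservative choice).
        if still_active or not status.get("final", False):
            continue
        if now - status_path.stat().st_mtime > max_age:
            doomed.append(job)
    return doomed
```

The function only selects candidates; deleting them (for example with shutil.rmtree) is left to the caller so the selection can be inspected first.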

Wrapper JSON schema

Default minimal JSON:

{
  "success": true,
  "reply": "The clean final answer from Codex...",
  "error": null
}

On failure:

{
  "success": false,
  "reply": null,
  "error": "未能获得 Codex 回复:调用超时。"
}

With --debug:

{
  "success": false,
  "reply": null,
  "error": "未能获得 Codex 回复:调用超时。",
  "debug": {
    "code": "timeout",
    "exitCode": null,
    "durationMs": 180000,
    "stderrTail": "..."
  }
}

Timeout & Robustness Best Practices

When invoking this skill, observe the following timeout and recovery rules:

  1. Use a 300-second timeout by default. The wrapper script defaults to 300 s (-t 300). If you override it, never set the outer Shell timeout lower than the script’s internal timeout; otherwise the Shell layer may kill the wrapper before it can recover partial output.
  2. Long prompts take time. Highly structured prompts (scored dimensions, rigid formatting, Chinese output) can take 3–4 minutes to generate. Do not interrupt the process while it is still working.
  3. Partial replies are recovered automatically. The wrapper attempts to read any content already written to _.codex_reply.txt on timeout or crash. If partial content exists, it is returned in the reply field together with a partial_timeout / partial_error code. Never discard it.
  4. Verify with a trivial prompt first. If a call fails, test with a minimal prompt such as "Say hello". Success within ~30 s indicates that the script, authentication, and network are healthy, so the original failure was most likely caused by prompt complexity or model generation speed rather than by the environment.
  5. On Windows, prefer a no-profile shell. PowerShell profile startup errors can pollute diagnostics or JSON consumers. When possible, run the wrapper from powershell -NoProfile or set the shell tool’s login=false.
  6. The wrapper cleans up timed-out Codex subprocesses. It starts Codex with an isolated process group and kills the process tree on timeout before reading _.codex_reply.txt.
  7. Use the event stream for progress. _.codex_reply.txt may not grow until the final answer is available. events.jsonl is the preferred liveness signal because it captures Codex --json stdout as it arrives.
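
Rules 3 and 6 above (kill the process tree, then salvage the partial reply) can be sketched as follows. This is a POSIX-only illustration, not the real wrapper: `run_with_recovery` and the shape of its return value are assumptions, and reporting success as false for a partial reply is a choice made for this example.

```python
import os
import signal
import subprocess
from pathlib import Path


def run_with_recovery(cmd, timeout, reply_path):
    """Run cmd; on timeout kill its process group and salvage partial output.

    POSIX-only sketch: os.killpg is not available on Windows, where a
    different process-tree kill would be needed.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            text=True, start_new_session=True)
    try:
        stdout, _ = proc.communicate(timeout=timeout)
        if proc.returncode == 0:
            return {"success": True, "reply": stdout.strip(), "code": None}
        code = "error"
    except subprocess.TimeoutExpired:
        try:
            # The child leads its own session, so its pid is the group id.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process already exited
        proc.communicate()  # reap the process and close the pipes
        code = "timeout"

    # Salvage whatever was already written to the reply file.
    path = Path(reply_path)
    partial = path.read_text(encoding="utf-8").strip() if path.exists() else ""
    if partial:
        return {"success": False, "reply": partial, "code": "partial_" + code}
    return {"success": False, "reply": None, "code": code}
```

Any outer shell timeout should stay above the timeout passed here, otherwise the outer layer kills this function before it reaches the salvage step.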

Error Handling

Keep user-facing errors short.

When Codex does not return a usable answer, do not expose raw stderr, stack traces,
JSON diagnostics, exit codes, or internal error categories by default.

Return only a short sentence:

未能获得 Codex 回复:<简短原因>。

Allowed reasons:

  • 问题内容为空
  • Codex CLI 当前不可用
  • Codex CLI 当前不可用或未完成认证
  • 当前 Codex 配置不可用
  • 调用超时
  • Codex 没有返回可用内容
  • 工作目录或沙箱权限受限
  • Codex 运行失败
  • 本地包装器运行异常

Only include detailed diagnostics when explicitly running with --debug.
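
One way to keep errors short is to map internal failure codes to the allowed reasons in one place. In the sketch below the reasons and the JSON shape come from this document, but the internal code names other than timeout, and the `failure_payload` helper itself, are hypothetical.

```python
# Internal code -> short user-facing reason. The codes on the left are
# hypothetical names (only "timeout" appears in this document); the reasons
# on the right are the allowed list above.
REASONS = {
    "empty_prompt": "问题内容为空",
    "cli_missing": "Codex CLI 当前不可用",
    "auth": "Codex CLI 当前不可用或未完成认证",
    "config": "当前 Codex 配置不可用",
    "timeout": "调用超时",
    "empty_reply": "Codex 没有返回可用内容",
    "sandbox": "工作目录或沙箱权限受限",
    "codex_failed": "Codex 运行失败",
}
FALLBACK = "本地包装器运行异常"


def failure_payload(code, debug=False, exit_code=None,
                    stderr_tail=None, duration_ms=None):
    """Build the wrapper's failure JSON; diagnostics only when debug is set."""
    reason = REASONS.get(code, FALLBACK)
    payload = {"success": False, "reply": None,
               "error": f"未能获得 Codex 回复:{reason}。"}
    if debug:
        payload["debug"] = {"code": code, "exitCode": exit_code,
                            "durationMs": duration_ms, "stderrTail": stderr_tail}
    return payload
```

Unknown codes fall back to the generic wrapper error, so a new failure path cannot leak internals by accident.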

Post-Processing

After receiving Codex’s reply, synthesize it with the current assistant’s own judgment.

  • Mention when Codex’s answer depends on assumptions or lacks enough context.
  • If Codex suggests risky commands, edits, or destructive operations, review them before presenting.
  • Do not present Codex’s raw output as the final answer without adding your own analysis.
  • If Codex contradicts the current assistant’s initial view, explain the divergence and recommend the safer path.