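Master-and-apprentice mode: how kimi2.6 asks GPT5.5 for advice, cutting token consumption on the expensive imported model by 85% and raising the domestic kimi model's coding performance by 40%, a marked gain for the cheaper model.
• Update complete. The system-level AGENTS.md now fully codifies your "two-round Codex review + user sign-off" workflow.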
───────────────────────────────────────────────
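Change summary
| Item | Before | After |
|---|---|---|
| ask-codex priority | P3 (on demand) | P2 (medium-high frequency); the trigger is now explicitly "two-round review before the architecture is finalized and before the implementation plan is written" |
| Workflow path | brainstorming → writing-plans → implementation | brainstorming → ask-codex round 1 → revise → ask-codex round 2 → [user sign-off] → writing-plans → implementation |
| Execution rule | P2 follows the flow: brainstorming → writing-plans | P2 follows the flow: once brainstorming is done, the two-round ask-codex review is mandatory before entering writing-plans |
| New section | None | Two-round Codex review workflow (mandatory when starting a project or settling a design), with full operating details for round 1 (architecture reviewer), round 2 (implementation expert), and the user's final sign-off (The Gate) |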
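Hard constraints of this workflow (written into the fallback rules)
| Constraint | Content |
|---|---|
| Forbidden | Finalizing the design and entering writing-plans before obtaining Codex's round-1 feedback |
| Forbidden | Entering writing-plans or writing code automatically before the user has signed off |
| Forbidden | Treating Codex's reply as the final conclusion and skipping user confirmation |
| Required | Using a structured prompt (role, task, context, evidence, constraints, expected output) on every ask-codex call |
| Required | Continuing development only after the user explicitly replies "同意/可以/开始" (agree / OK / start) |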
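What you will see in practice
From now on, when starting a new project or settling on a design, my workflow will be:
- brainstorming → propose 2-3 options
- [Skill usage declaration] → call ask-codex round 1 (architecture review)
- Show Codex's review feedback → I revise the design
- [Skill usage declaration] → call ask-codex round 2 (implementation guidance)
- Consolidate the final design + both rounds of Codex feedback + the recommended path
- Submit for your sign-off: "Please review the design above. Once you confirm, I will write the implementation plan and start development."
- Wait for your "agree" → only then enter writing-plans → TDD → implementation
This gives you a complete loop of **adviser (two rounds of Codex review) + decision-maker (you)**, instead of me thinking alone and going straight to work.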
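The gate described above can be pictured as a tiny state machine. This is only an illustrative sketch, not how AGENTS.md enforces the rule: the stage names, `next_stage`, and the English approval words are assumptions added here; only the three Chinese approval replies come from the constraint list.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    BRAINSTORMING = 1
    CODEX_ROUND_1 = 2      # ask-codex as architecture reviewer
    REVISE = 3
    CODEX_ROUND_2 = 4      # ask-codex as implementation expert
    AWAITING_SIGNOFF = 5   # "The Gate"
    WRITING_PLANS = 6


# Explicit approvals: the Chinese replies come from the rule above,
# the English equivalents are an assumption for illustration.
APPROVALS = {"同意", "可以", "开始", "agree", "ok", "start"}


def next_stage(current: Stage, user_reply: Optional[str] = None) -> Stage:
    """Advance exactly one step; the gate opens only on explicit approval."""
    if current is Stage.WRITING_PLANS:
        return current
    if current is Stage.AWAITING_SIGNOFF:
        approved = (user_reply or "").strip().lower() in APPROVALS
        return Stage.WRITING_PLANS if approved else current
    return Stage(current.value + 1)
```

Because every transition moves a single step, there is no path from brainstorming to writing-plans that skips either Codex round or the user's reply.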
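Pretty remarkable, isn't it? Next, I will show you how to use prompts to build a skill that implements this ask-GPT5.5 approach.
ask-codex is a good example of skill design. Below are its design ideas, workflow, and construction method; by following the same pattern, anyone can build their own skill: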
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
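I. Design idea: what problem does it solve?
ask-codex addresses the problem that "an agent easily falls into single-track thinking and has blind spots in early decisions."
Its core assumptions:
1. LLMs make mistakes too: however long a single agent thinks, it can still have blind spots
2. External review beats self-checking: having another model (Codex) review with "fresh eyes" can surface risks the main agent overlooked
3. But forwarding the raw request is wasteful: a vague request gets a vague answer, so it must first be rewritten into a structured prompt
4. The final decision rests with the main agent: Codex is only an "adviser"; the main agent must synthesize both judgments rather than forward blindly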
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
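II. File structure (the standard makeup of a skill)
ask-codex/                      ← skill directory name (kebab-case)
├── SKILL.md                    ← the only required file: the "user manual" the agent reads
├── scripts/
│   └── ask_codex.py            ← wrapper script: wraps the complex invocation into a one-line command
└── references/
    ├── prompt-template.md      ← default prompt template (8 elements)
    └── modes.md                ← mode templates for the five scenarios
Key insight: SKILL.md is not documentation for humans; it is program instructions for the agent. It tells the agent:
• when to trigger the skill
• what preprocessing is required before triggering
• exactly how to operate when triggered
• how to handle the result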
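One place an agent runtime could pick up the "when to trigger" part is the front matter at the top of SKILL.md (the `name` and `description` fields used in the examples later in this post). The reader below is a hypothetical, dependency-free sketch; a real runtime may well use a full YAML parser, and `read_frontmatter` is an illustrative name.

```python
def read_frontmatter(skill_md: str) -> dict:
    """Extract simple `key: value` pairs between the leading `---` lines."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at all
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        key, sep, value = line.partition(":")
        if sep:
            # Drop surrounding whitespace and optional double quotes.
            meta[key.strip()] = value.strip().strip('"')
    return meta
```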
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
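III. Core workflow (from the decision to post-processing)
┌────────────────────────────────────────────────────────────────┐
│ Step 1: Decide whether a second opinion is needed              │
│ ────────────────────────────────────────                       │
│ • Trigger: uncertain engineering decisions, hard bugs,         │
│   non-trivial code review, architecture choices                │
│ • Skip: simple syntax questions, the agent is already fully    │
│   confident, possible recursive loops                          │
└────────────────────────────────────────────────────────────────┘
                                ↓
┌────────────────────────────────────────────────────────────────┐
│ Step 2: Vague request → structured prompt (mandatory rewrite)  │
│ ────────────────────────────────────────                       │
│ Never forward the user's words verbatim! 8 required elements:  │
│ 1. Role (expert identity)                                      │
│ 2. Task (the specific question)                                │
│ 3. Context (project background, prior decisions)               │
│ 4. Environment (language, framework, OS, versions)             │
│ 5. Evidence (code snippets, error logs, repro steps)           │
│ 6. Constraints (what to preserve, avoid, or keep compatible)   │
│ 7. Expected Output (diagnosis / review / plan / comparison)    │
│ 8. Response Format (numbered list / sections / table, etc.)    │
└────────────────────────────────────────────────────────────────┘
                                ↓
┌────────────────────────────────────────────────────────────────┐
│ Step 3: Choose a mode (5 built-in review roles)                │
│ ────────────────────────────────────────                       │
│ • Second Opinion: uncertain design or architecture calls       │
│ • Bug Diagnosis: errors, exceptions, flaky tests               │
│ • Code Review: PRs, functions, modules, pre-refactor review    │
│ • Implementation Plan: "how do I build this"                   │
│ • Architecture Tradeoff: technology choice or option comparison│
└────────────────────────────────────────────────────────────────┘
                                ↓
┌────────────────────────────────────────────────────────────────┐
│ Step 4: Call the wrapper script                                │
│ ────────────────────────────────────────                       │
│ python scripts/ask_codex.py "structured prompt" --json         │
│                                                                │
│ Script responsibilities:                                       │
│ • Run in a clean temp directory (avoids loading unrelated      │
│   project skills)                                              │
│ • Filter out ANSI codes, execution logs, session noise         │
│ • Return only the final answer (JSON: {success, reply, error}) │
│ • Supports --prompt-file, stdin, and -C (project context)      │
└────────────────────────────────────────────────────────────────┘
                                ↓
┌────────────────────────────────────────────────────────────────┐
│ Step 5: Error handling (keep it short, hide internals)         │
│ ────────────────────────────────────────                       │
│ On failure, say only:                                          │
│   "Could not get a reply from Codex: <short reason>."          │
│ Never expose: exit codes, stderr stack traces, JSON diagnostics│
│ (unless the user explicitly asks for --debug)                  │
└────────────────────────────────────────────────────────────────┘
                                ↓
┌────────────────────────────────────────────────────────────────┐
│ Step 6: Post-processing (the most important step)              │
│ ────────────────────────────────────────                       │
│ After receiving Codex's reply, the main agent must:            │
│ • Point out where Codex's answer relies on assumptions or      │
│   lacks context                                                │
│ • Review any dangerous commands, edits, or destructive         │
│   operations Codex suggests before presenting them             │
│ • If Codex contradicts the main agent's initial view, explain  │
│   the divergence and recommend the safer path                  │
│ • Never present Codex's raw output as the final answer         │
└────────────────────────────────────────────────────────────────┘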
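Step 2 is the easiest part of the flow to mechanize. The sketch below shows one way to refuse incomplete prompts before they reach Codex. It is an assumption-laden illustration, not the template in references/prompt-template.md: the `StructuredPrompt` class, its field names, and the `## Heading` layout are all made up for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class StructuredPrompt:
    """The 8 elements from Step 2, one field each."""
    role: str
    task: str
    context: str
    environment: str
    evidence: str
    constraints: str
    expected_output: str
    response_format: str

    def render(self) -> str:
        # Refuse to render if any element is blank, so a vague request
        # cannot slip through as a half-filled template.
        missing = [f.name for f in fields(self)
                   if not getattr(self, f.name).strip()]
        if missing:
            raise ValueError("missing prompt elements: " + ", ".join(missing))
        sections = []
        for f in fields(self):
            title = f.name.replace("_", " ").title()
            sections.append(f"## {title}\n{getattr(self, f.name).strip()}")
        return "\n\n".join(sections)
```

Raising on a blank field forces the caller to write an explicit assumption ("Environment: unknown, assume Linux") rather than silently omitting the element.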
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
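IV. Key design decisions (why it is designed this way)
| Design decision | Rationale |
|---|---|
| Mandatory structured prompt | Vague input = vague output. The 8-element template ensures Codex gets enough context for a high-quality review |
| Wrapper script isolates the environment | Runs in a clean temp directory by default, so Codex does not load the main agent's session history or unrelated skills, which keeps the review independent |
| Return only the final answer | ANSI codes, logs, and reasoning traces are filtered out; the main agent receives "clean information" it can synthesize directly |
| Minimal error messages | The user does not need to know the exit code, only that the call failed and roughly why |
| Synthesize, don't forward | Keeps the main agent from becoming a "mouthpiece". Codex is the adviser; the main agent is the decision-maker and must add its own judgment |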
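The "return only the final answer" decision depends on stripping terminal noise from the CLI output. Below is a minimal sketch of the ANSI part only, assuming the noise consists of standard CSI escape sequences (colors, cursor movement). It is not the actual ask_codex.py code, and `strip_ansi` is an illustrative name.

```python
import re

# CSI sequence: ESC [ + parameter bytes + intermediate bytes + one final byte.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove color and cursor-control sequences, keep the visible text."""
    return ANSI_CSI.sub("", text)
```

For example, `strip_ansi("\x1b[32mOK\x1b[0m done")` leaves just `OK done`.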
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
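V. Build your own skill (general template)
Suppose you want to build a "database performance review" skill that others can use too:
1. Create the directory structure
db-performance-review/
├── SKILL.md
└── scripts/
    └── review_db.py ← your wrapper script (Python, bash, or any executable)
2. Write SKILL.md (the core)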
---
name: db-performance-review
description: "Use when encountering slow queries, database bottlenecks, or before designing new schema/indexes"
---
# Database Performance Review
## When to Use
- Query execution time > 100ms
- N+1 query patterns suspected
- Index design decisions
- Schema changes before implementation
## When NOT to Use
- Simple CRUD with < 1000 rows
- Syntax questions (use docs instead)
## Prompt Construction (REQUIRED)
Before invoking, rewrite request into structured prompt with:
1. **Role**: Database performance engineer
2. **Task**: What needs review (specific query, schema, index)
3. **Context**: Table sizes, access patterns, ORM being used
4. **Evidence**: EXPLAIN output, query plan, slow log
5. **Constraints**: Must maintain ACID / can't add new columns / read-only replicas available?
6. **Expected Output**: Index recommendation / query rewrite / schema change
7. **Response Format**: Table with before/after comparison
## Invocation
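```bash
python scripts/review_db.py "structured prompt" --json
```
## Post-Processing
After receiving review:
- Check if recommendations work with your ORM
- Verify index suggestions don't harm write performance
- Don't present raw output without adding your own analysis
3. Write the wrapper script (core responsibilities)
Your script only needs to do three things:
1. Accept the arguments / prompt
2. Call the target service (Codex / Claude / GPT / an internal API)
3. Filter out the noise and return clean JSON
```python
#!/usr/bin/env python3
import argparse
import json
import subprocess
import sys


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("prompt")
    parser.add_argument("--json", action="store_true")
    args = parser.parse_args()
    # Call your review service (Codex in this example). `codex exec` is the
    # non-interactive mode; confirm the exact flags with `codex --help`.
    try:
        result = subprocess.run(
            ["codex", "exec", args.prompt],
            capture_output=True, text=True, timeout=180,
        )
        reply = result.stdout.strip()
        error = None if result.returncode == 0 and reply else "Codex failed to run"
    except subprocess.TimeoutExpired:
        reply, error = "", "the call timed out"
    except FileNotFoundError:
        reply, error = "", "Codex CLI is currently unavailable"
    if args.json:
        print(json.dumps({"success": error is None, "reply": reply or None, "error": error}))
    else:
        print(reply if error is None else error)
    sys.exit(0 if error is None else 1)


if __name__ == "__main__":
    main()
```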
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
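VI. One-sentence summary
▌ A good skill = trigger conditions (when to use it) + preprocessing rules (how to use it) + wrapper script (how to call it) + post-processing constraints (what to do with the result).
The essence of ask-codex is not that it "calls another AI", but that it:
1. Forces the main agent to think the problem through before the call (structured prompt)
2. Forces the main agent to think independently after getting the result (synthesize, don't forward)
A skill designed this way amplifies the agent's abilities without turning the agent into a mindless mouthpiece.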
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
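If you like, I can also turn this methodology into a skill (writing-skills), so that whenever you want to create a new skill it walks you through this template step by step. Want me to?
The principle is very simple, but you are best off using codex itself to build this skill, because the codex CLI supports non-interactive invocation: with exec, you send it a question just as you would call any other command-line tool. The most important thing is to get it running first; after that, codex can help you adjust it to your own needs. Run codex --help to get a description of every interface, hand all of it to codex, and let it do the development. Once the skill is built, simply call it from your kimi code cli.
Below is the skill.md file I generated; having codex develop against it is more reliable.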
name: ask-codex
description: "Use when the user invokes $ask-codex or asks to 问问gpt, 问下gpt, 给gpt审阅, 提交给gpt审阅, ask GPT/Codex, get a second opinion, architecture review, implementation guidance, code review, bug diagnosis, risky fix validation, GPT进度, or Codex进度."
Ask Codex — Second Opinion Skill
A non-interactive wrapper around the Codex CLI that filters out session noise,
execution logs, and ANSI codes, returning only the model’s final answer.
Requirements
- Python 3.8+
- Codex CLI installed and authenticated (codex --version should work)
- codex available on PATH
When to Use
Invoke this skill only when an external second opinion is likely to improve the answer:
- Ambiguous engineering tradeoffs or design decisions
- Hard debugging scenarios where root cause is unclear
- Non-trivial code review (security, concurrency, edge cases)
- Architecture or migration planning
- Validating a risky proposed fix
When NOT to Use
Do not use this skill for:
- Simple syntax questions or documentation lookups
- Tasks the current assistant can answer confidently
- Requests that do not benefit from a second model pass
- Any operation where nested LLM delegation could loop (recursion guard blocks this)
Prompt Construction
Never forward a vague user request directly to Codex.
Before invoking, rewrite the request into a structured prompt.
Preserve the user’s original intent and do not invent facts.
If context is missing, state assumptions explicitly.
A good prompt contains 8 elements:
- Role: Expert identity you want Codex to assume
- Task: Exact question or decision needed
- Context: Project, feature, architecture, or prior decisions
- Environment: Language, framework, OS, versions, tooling
- Evidence: Code snippets, errors, logs, reproduction steps
- Constraints: What must preserve, avoid, or remain compatible
- Expected Output: Bug diagnosis, review, plan, tradeoff analysis, patch
- Response Format: Preferred structure (numbered list, sections, etc.)
For the default template, see references/prompt-template.md.
Built-In Modes
Choose a mode that matches the request type, then fill in the bracketed fields:
- Second Opinion — uncertain designs or architecture judgment
- Bug Diagnosis — errors, exceptions, flaky tests
- Code Review — PRs, functions, modules, pre-refactor
- Implementation Plan — “how to build this”
- Architecture Tradeoff — technology selection or option comparison
Full templates for all modes: references/modes.md
Default Invocation Policy
For real second-opinion work, prefer background execution. Long Codex calls
can take 3-5 minutes and may produce no final answer until the end; do not block
the foreground agent unless the task is a trivial health check.
Use this policy:
- Health checks only: inline ask_codex.py "Say hello" --json -t 30.
- Normal second opinions: write the structured prompt to a file and launch start_ask_codex_background.py --prompt-file request.md --debug.
- Foreground behavior after launch: keep working on non-overlapping local tasks. Do not wait inline unless the Codex answer is the immediate blocker.
- Progress questions: when the user asks whether GPT/Codex has responded or is still working, run check_ask_codex_progress.py --json and report state, elapsedMs, eventsBytes, eventsLines, and whether resultAvailable is true.
- Final answer: when resultAvailable is true, read resultFile, then synthesize Codex’s answer with the current assistant’s own judgment.
This is script-level background execution, not platform subagent delegation. Do
not spawn platform subagents unless the user explicitly asks for subagents.
Invocation
The wrapper script is located at scripts/ask_codex.py.
Basic usage
python .agents/skills/ask-codex/scripts/ask_codex.py "Your structured prompt here"
JSON output (recommended for programmatic use)
python .agents/skills/ask-codex/scripts/ask_codex.py "Your structured prompt" --json
From a file
python .agents/skills/ask-codex/scripts/ask_codex.py --prompt-file request.md --json
From stdin
cat request.md | python .agents/skills/ask-codex/scripts/ask_codex.py - --json
With project context
By default the script uses a temporary clean directory to avoid loading unrelated
project skills/agents. To preserve project context (AGENTS.md, local files):
python .agents/skills/ask-codex/scripts/ask_codex.py "..." -C ./my-project --json
Optional overrides
| Flag | Description |
|---|---|
| -m MODEL | Override model (default follows ~/.codex/config.toml) |
| -s SANDBOX | read-only (default), workspace-write, danger-full-access |
| -t SECONDS | Timeout (default: 300) |
| --status-file PATH | Write live progress JSON for polling |
| --events-file PATH | Write Codex --json event stream as JSONL |
| --result-file PATH | Write final wrapper JSON to a stable file |
| --debug | Include internal diagnostics in JSON output |
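When the wrapper is driven from another Python program, it can help to assemble the argument list in one place. The helper below is a sketch that uses only the script path and flags listed above; `build_command` and its parameter names are hypothetical, and it only builds the command without running it.

```python
import sys
from typing import List, Optional

# Path as given in the Invocation section above.
WRAPPER = ".agents/skills/ask-codex/scripts/ask_codex.py"


def build_command(prompt_file: str,
                  project_dir: Optional[str] = None,
                  timeout: int = 300,
                  debug: bool = False) -> List[str]:
    """Assemble an argv list for a JSON-mode wrapper call."""
    cmd = [sys.executable, WRAPPER,
           "--prompt-file", str(prompt_file),
           "--json", "-t", str(timeout)]
    if project_dir is not None:
        cmd += ["-C", str(project_dir)]  # preserve project context
    if debug:
        cmd.append("--debug")            # include internal diagnostics
    return cmd
```

Passing the result to `subprocess.run(cmd, capture_output=True, text=True)` avoids shell quoting problems with long structured prompts, since the prompt travels in a file.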
Background usage for long answers
For slow research or complex reviews, prefer starting a background job instead of
blocking the foreground agent:
python .agents/skills/ask-codex/scripts/start_ask_codex_background.py --prompt-file request.md --debug
The launcher returns JSON containing:
{
"jobId": "20260513-061215-83ab13b4",
"statusFile": ".../ask-codex-jobs/<job-id>/status.json",
"eventsFile": ".../ask-codex-jobs/<job-id>/events.jsonl",
"resultFile": ".../ask-codex-jobs/<job-id>/result.json",
"latestFile": ".../ask-codex-jobs/latest.json"
}
When the user asks for “GPT progress” or “Codex progress”, read %TEMP%/ask-codex-jobs/latest.json, then read its statusFile. Report state, elapsedMs, eventsBytes, eventsLines, and updatedAt. The useful signal is whether eventsBytes / eventsLines changed since the last check.
Do not promise exact token counts.
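The progress check described above amounts to two file reads and a comparison. The following is a sketch of that logic, not the real check_ask_codex_progress.py: it assumes that latest.json carries a `statusFile` key (as the launcher output does) and that the reported fields are top-level keys in status.json. `read_progress` and `made_progress` are illustrative names.

```python
import json
from pathlib import Path

REPORTED_FIELDS = ("state", "elapsedMs", "eventsBytes", "eventsLines", "updatedAt")


def read_progress(latest_file: Path) -> dict:
    """Follow latest.json to its status file and keep the reported fields."""
    latest = json.loads(Path(latest_file).read_text(encoding="utf-8"))
    status = json.loads(Path(latest["statusFile"]).read_text(encoding="utf-8"))
    return {name: status.get(name) for name in REPORTED_FIELDS}


def made_progress(previous: dict, current: dict) -> bool:
    """Liveness signal: did the event stream grow since the last check?"""
    return (current["eventsBytes"], current["eventsLines"]) != (
        previous["eventsBytes"], previous["eventsLines"])
```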
Shortcut:
python .agents/skills/ask-codex/scripts/check_ask_codex_progress.py --json
Job cleanup is conservative and automatic when launching a new background job:
completed jobs older than 1 day are pruned by default. The launcher does not
delete the job pointed to by latest.json, and it does not delete jobs whose status.json says state is starting/running or final is false.
Foreground agents should only read latest.json and the files it points to; do
not scan old job directories for progress.
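The cleanup rules above can be written as a single predicate. This is a sketch of the decision only, under stated assumptions: `is_prunable`, the `finished_at` epoch timestamp, and the "completed" state name are hypothetical, and a missing `final` flag is treated as "not final" to stay on the conservative side.

```python
import time
from typing import Optional

ONE_DAY_SECONDS = 24 * 60 * 60


def is_prunable(job_id: str, status: dict, finished_at: float,
                latest_job_id: str, now: Optional[float] = None,
                max_age: float = ONE_DAY_SECONDS) -> bool:
    """Return True only for old, finished jobs that latest.json does not use."""
    now = time.time() if now is None else now
    if job_id == latest_job_id:
        return False  # never delete the job latest.json points to
    if status.get("state") in ("starting", "running"):
        return False  # still active
    if not status.get("final", False):
        return False  # not marked final (or flag missing): keep it
    return now - finished_at > max_age
```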
Wrapper JSON schema
Default minimal JSON:
{
"success": true,
"reply": "The clean final answer from Codex...",
"error": null
}
On failure:
{
"success": false,
"reply": null,
"error": "Could not get a reply from Codex: the call timed out."
}
With --debug:
{
"success": false,
"reply": null,
"error": "Could not get a reply from Codex: the call timed out.",
"debug": {
"code": "timeout",
"exitCode": null,
"durationMs": 180000,
"stderrTail": "..."
}
}
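On the consuming side, the three JSON shapes above can be handled by one small function that keeps diagnostics apart from user-facing text. This is a sketch against the schema shown here; `split_wrapper_result` is a hypothetical helper, not part of the skill.

```python
import json
from typing import Optional, Tuple


def split_wrapper_result(raw: str) -> Tuple[bool, str, Optional[dict]]:
    """Return (ok, user_text, diagnostics) from the wrapper's JSON output."""
    result = json.loads(raw)
    ok = bool(result.get("success")) and bool(result.get("reply"))
    user_text = result["reply"] if ok else (result.get("error") or "")
    # "debug" only appears with --debug; it never goes into user_text.
    return ok, user_text, result.get("debug")
```

Keeping `debug` in a separate return value makes it harder to leak exit codes or stderr into what the user sees.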
Timeout & Robustness Best Practices
When invoking this skill, observe the following timeout and recovery rules:
- Use a 300-second timeout by default. The wrapper script defaults to 300 s (-t 300). If you override it, never set the outer Shell timeout lower than the script’s internal timeout; otherwise the Shell layer may kill the wrapper before it can recover partial output.
- Long prompts take time. Highly structured prompts (scored dimensions, rigid formatting, Chinese output) can take 3–4 minutes to generate. Do not interrupt the process while it is still working.
- Partial replies are recovered automatically. The wrapper attempts to read any content already written to _.codex_reply.txt on timeout or crash. If partial content exists, it is returned in the reply field together with a partial_timeout / partial_error code. Never discard it.
- Verify with a trivial prompt first. If a call fails, test with a minimal prompt such as "Say hello". Success within ~30 s shows that the script, authentication, and network are healthy, so the original failure was most likely due to prompt complexity or model generation speed rather than an environment issue.
- On Windows, prefer a no-profile shell. PowerShell profile startup errors can pollute diagnostics or JSON consumers. When possible, run the wrapper from powershell -NoProfile or set the shell tool’s login=false.
- The wrapper cleans up timed-out Codex subprocesses. It starts Codex with an isolated process group and kills the process tree on timeout before reading _.codex_reply.txt.
- Use the event stream for progress. _.codex_reply.txt may not grow until the final answer is available. events.jsonl is the preferred liveness signal because it captures Codex --json stdout as it arrives.
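Two of the rules above lend themselves to a short sketch: keeping the outer timeout above the inner one, and salvaging a partial reply. This illustrates the idea rather than the wrapper's real implementation; `outer_timeout`, `recover_partial_reply`, and the 30-second margin are assumptions, and only the file name _.codex_reply.txt comes from the text.

```python
from pathlib import Path
from typing import Optional


def outer_timeout(inner_seconds: int = 300, margin_seconds: int = 30) -> int:
    """Shell-level timeout: always longer than the wrapper's own timeout."""
    return inner_seconds + margin_seconds


def recover_partial_reply(workdir: Path,
                          filename: str = "_.codex_reply.txt") -> Optional[str]:
    """Return whatever was already written to the reply file, or None."""
    path = Path(workdir) / filename
    if not path.is_file():
        return None
    # A truncated multi-byte character must not crash the recovery path.
    text = path.read_text(encoding="utf-8", errors="replace").strip()
    return text or None
```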
Error Handling
Keep user-facing errors short.
When Codex does not return a usable answer, do not expose raw stderr, stack traces,
JSON diagnostics, exit codes, or internal error categories by default.
Return only a short sentence:
Could not get a reply from Codex: <short reason>.
Allowed reasons:
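- the question is empty
- Codex CLI is currently unavailable
- Codex CLI is currently unavailable or not authenticated
- the current Codex configuration is unavailable
- the call timed out
- Codex returned no usable content
- working directory or sandbox permissions are restricted
- Codex failed to run
- the local wrapper hit an unexpected error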
Only include detailed diagnostics when explicitly running with --debug.
Post-Processing
After receiving Codex’s reply, synthesize it with the current assistant’s own judgment.
- Mention when Codex’s answer depends on assumptions or lacks enough context.
- If Codex suggests risky commands, edits, or destructive operations, review them before presenting.
- Do not present Codex’s raw output as the final answer without adding your own analysis.
- If Codex contradicts the current assistant’s initial view, explain the divergence and recommend the safer path.
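The "review risky commands before presenting" rule can be backed by a cheap first-pass screen. The patterns below are a deliberately small, hypothetical list chosen for illustration; they are not part of the skill and will miss many dangerous operations, so they complement the main agent's own review rather than replace it.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; extend them for your own environment.
RISKY_PATTERNS = {
    "recursive delete": re.compile(r"\brm\s+-\w*r\w*\b"),
    "forced git push": re.compile(r"\bgit\s+push\b.*(--force|-f)\b"),
    "hard git reset": re.compile(r"\bgit\s+reset\s+--hard\b"),
    "SQL drop": re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE),
}


def find_risky_suggestions(reply: str) -> List[Tuple[str, str]]:
    """Return (label, line) pairs for lines that match a risky pattern."""
    hits = []
    for line in reply.splitlines():
        for label, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.strip()))
    return hits
```

Any hit is a cue for the main agent to explain the risk and offer the safer path, in line with the last bullet above.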