Claude Extended Thinking in Practice: Opus 4.7 Drops budget_tokens, a Complete Guide to the New Syntax and Migration Pitfalls

NiceCloud喜云

2026-05-28 14:40:29


Most Chinese-language material on Extended Thinking still teaches thinking: {type: "enabled", budget_tokens: 10000}. That syntax fails outright on Claude Opus 4.7, because with the 4.6/4.7 generation Anthropic replaced the thinking interface.

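Based on actual API behavior as of 2026-05, this article covers the correct way to use thinking on three models, how to choose an effort level, the critical step of passing thinking blocks back inside interleaved-thinking tool loops, and the pitfalls people hit most often.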

1. The thinking interface changed its syntax within half a year

If your codebase still contains a call like this:

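import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",  # switched to 4.7
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},  # this line is rejected
    messages=[{"role": "user", "content": "..."}],
)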

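Once you switch the model to Opus 4.7, the entire request fails. The reason is that the thinking interface moved from an explicit budget to adaptive + effort, and 4.7 no longer accepts the old syntax.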

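Model	Old syntax (`type: enabled` + `budget_tokens`)	New syntax (`type: adaptive` + `effort`)
Opus 4.7	❌ No longer accepted	✅ The only supported option
Opus 4.6	⚠️ Still works but deprecated	✅ Recommended
Sonnet 4.6	⚠️ Still works but deprecated	✅ Recommended
Sonnet 4.6 + thinking between tool calls	Requires the `interleaved-thinking-2025-05-14` beta header	Enabled automatically with adaptive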

In other words, when you plug Opus 4.7 into an existing agent pipeline, the first step is to change the thinking configuration, not just the model ID.
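As a minimal sketch of that first step, the hypothetical helper migrate_thinking_kwargs below rewrites request keyword arguments from the old syntax to adaptive + effort. It assumes requests are assembled as plain dicts before being passed to client.messages.create, and it takes effort as an explicit argument rather than trying to derive a level from the old budget_tokens value.

```python
from typing import Any


def migrate_thinking_kwargs(kwargs: dict[str, Any], effort: str = "high") -> dict[str, Any]:
    """Hypothetical helper: rewrite old-style thinking kwargs to adaptive + effort.

    Returns a new dict; the input dict is left untouched.
    """
    migrated = dict(kwargs)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "enabled":
        # budget_tokens is simply dropped; this sketch does not map it to an effort level
        migrated["thinking"] = {"type": "adaptive"}
        output_config = dict(migrated.get("output_config") or {})
        output_config.setdefault("effort", effort)  # keep an effort that was already set
        migrated["output_config"] = output_config
    return migrated


old = {
    "model": "claude-opus-4-7",
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "messages": [{"role": "user", "content": "..."}],
}
new = migrate_thinking_kwargs(old, effort="medium")
print(new["thinking"], new["output_config"])
```

Returning a copy instead of mutating in place makes it easy to diff old and new request shapes while migrating.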

2. What adaptive thinking is


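The old syntax has you tell the model "spend at most 10,000 tokens thinking". The problems:

Simple questions are forced to think, wasting tokens
Complex questions don't think enough if the budget is set too low

Adaptive hands this decision to the model, and effort controls how aggressively it thinks: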

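effort	Behavior	Best for
`low`	Often skips thinking entirely on simple questions	Low cost, low latency
`medium`	Balanced; thinks only when needed	General production tasks
`high` (default)	Almost always thinks first	Reasoning-heavy production tasks
`max` (Opus 4.6/4.7 only)	Maximum thinking intensity	AIME-level math, long-horizon agents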
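The constraints in this table can be enforced before a request leaves your code. The sketch below is a hypothetical helper, thinking_params, that builds the thinking and output_config arguments and rejects max for non-Opus models; detecting Opus by the substring "opus" in the model ID is an assumption, not an official rule.

```python
VALID_EFFORTS = ("low", "medium", "high", "max")


def thinking_params(model: str, effort: str = "high") -> dict:
    """Hypothetical helper: build thinking/output_config kwargs for a model.

    Encodes the article's rule that `max` is accepted only by Opus models;
    the "opus" substring check on the model ID is an assumption.
    """
    if effort not in VALID_EFFORTS:
        raise ValueError(f"unknown effort {effort!r}; expected one of {VALID_EFFORTS}")
    if effort == "max" and "opus" not in model:
        raise ValueError(f"effort='max' is only for Opus models, got {model!r}")
    return {"thinking": {"type": "adaptive"}, "output_config": {"effort": effort}}


print(thinking_params("claude-sonnet-4-6", "medium"))
```

The result can be splatted into a call, e.g. client.messages.create(model=..., max_tokens=..., messages=..., **thinking_params(model, "high")).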

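According to Anthropic's internal evaluations, adaptive thinking uses tokens more efficiently than a fixed budget across a wide range of tasks, because it neither wastes reasoning on easy questions nor stops early on hard ones.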

3. Complete call examples

Opus 4.7 (adaptive only)

import anthropic

client = anthropic.Anthropic(
    api_key="sk-your-key",
    base_url="https://gw.claudeapi.com"  # third-party gateway; omit base_url to call Anthropic's API directly
)

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Prove that there are infinitely many primes congruent to 3 mod 4."}
    ]
)

for block in response.content:
    if block.type == "thinking":
        print(f"[thinking] {block.thinking[:200]}...")
    elif block.type == "text":
        print(f"[answer] {block.text}")

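content is an ordered array of blocks: first the thinking block (a summary of the reasoning), then the text block (the final answer). Under adaptive thinking the model may skip thinking on simple prompts, in which case no thinking block is returned.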
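Because adaptive thinking may skip thinking entirely, code that reads the response should not assume a thinking block is present. A minimal sketch, using only the type, thinking, and text attributes shown above; SimpleNamespace objects stand in for SDK blocks so it runs offline, and split_blocks is a hypothetical name:

```python
from types import SimpleNamespace


def split_blocks(content):
    """Separate thinking summaries from answer text; either list may be empty."""
    thinking = [b.thinking for b in content if b.type == "thinking"]
    text = [b.text for b in content if b.type == "text"]
    return thinking, text


# Stand-ins for response.content: one response with a thinking block, one where thinking was skipped
with_thinking = [
    SimpleNamespace(type="thinking", thinking="Consider primes p ≡ 3 (mod 4)..."),
    SimpleNamespace(type="text", text="Proof: ..."),
]
without_thinking = [SimpleNamespace(type="text", text="Paris is the capital of France.")]

print(split_blocks(with_thinking))
print(split_blocks(without_thinking))
```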

Sonnet 4.6

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},  # Sonnet does not support max
    messages=[{"role": "user", "content": "..."}]
)

Sonnet 4.6 does not accept effort: max; max is exclusive to the Opus line. Use medium for everyday production work and reserve high for critical reasoning.

Opus 4.6 (both syntaxes work, but switching is recommended)

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[...]
)

Node.js / TypeScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: "sk-your-key" });

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 16000,
  thinking: { type: "adaptive" },
  output_config: { effort: "high" },
  messages: [{ role: "user", content: "Your complex reasoning task" }],
});

for (const block of response.content) {
  if (block.type === "thinking") console.log("[thinking]", block.thinking);
  else if (block.type === "text") console.log("[answer]", block.text);
}

4. Interleaved thinking: the key change in tool loops

Where Extended Thinking really pays off in agent scenarios is that the model can think again after each tool call; this is interleaved thinking.

With the old syntax on Sonnet 4.6, you had to add a beta header manually:

import anthropic

client = anthropic.Anthropic(
    default_headers={"anthropic-beta": "interleaved-thinking-2025-05-14"}
)

In adaptive mode on Opus 4.7 and 4.6, interleaved thinking is on by default and needs no beta header. This is a hidden benefit of migrating to 4.7: agent pipelines no longer have to manage a thinking flag.
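When cleaning up old clients, the interleaved-thinking token may share the anthropic-beta header with other betas (the header takes a comma-separated list). The hypothetical helper below removes just that one token; it assumes the header key is spelled in lowercase as in the snippet above, and some-other-beta is a placeholder, not a real beta name.

```python
def drop_beta(headers: dict[str, str], beta: str = "interleaved-thinking-2025-05-14") -> dict[str, str]:
    """Return a copy of headers with `beta` removed from the anthropic-beta value."""
    cleaned = dict(headers)
    value = cleaned.pop("anthropic-beta", None)
    if value is None:
        return cleaned
    parts = [b.strip() for b in value.split(",")]
    remaining = [b for b in parts if b and b != beta]
    if remaining:
        # keep any other betas the client still relies on
        cleaned["anthropic-beta"] = ",".join(remaining)
    return cleaned


print(drop_beta({"anthropic-beta": "interleaved-thinking-2025-05-14"}))
print(drop_beta({"anthropic-beta": "interleaved-thinking-2025-05-14, some-other-beta"}))
```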

A complete tool-use loop (the key point: the previous turn's thinking block must be passed back unmodified):

weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
    }
}

question = {"role": "user", "content": "What should I wear in Paris right now? Give specific suggestions."}

# Turn 1: the model thinks first, then decides to call the tool
response_1 = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    tools=[weather_tool],
    messages=[question]
)

# Turn 2: pass response_1.content back unmodified (thinking + tool_use blocks), then add the tool_result.
# Reusing the whole list keeps each thinking block's signature and the original block order intact,
# and does not break if the model skipped thinking on this turn.
tool_use_block = next(b for b in response_1.content if b.type == "tool_use")

response_2 = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    tools=[weather_tool],
    messages=[
        question,
        {
            "role": "assistant",
            "content": response_1.content  # ← key step
        },
        {
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use_block.id,
                "content": "Paris: 22°C, sunny, light breeze."
            }]
        }
    ]
)

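What happens if you don't pass the thinking block back: the model "loses its memory", discards the reasoning it just did, and the quality of its second-turn decision drops noticeably. This is the most common pitfall when wiring up agent pipelines.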
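The pass-back rule extends to any number of tool calls in one assistant turn. A minimal sketch, with next_turn_messages as a hypothetical helper: it appends the full assistant content unmodified (thinking blocks included) and answers every tool_use block with a tool_result. SimpleNamespace objects, the toolu_01 ID, and the signature value are stand-ins so the example runs offline; with the real SDK you would pass response.content directly.

```python
from types import SimpleNamespace


def next_turn_messages(history, assistant_content, results):
    """Hypothetical helper: append the assistant turn unmodified, then the tool results.

    assistant_content: the full response.content list (thinking blocks included).
    results: mapping from tool_use id to the tool's output string.
    """
    tool_results = [
        {"type": "tool_result", "tool_use_id": b.id, "content": results[b.id]}
        for b in assistant_content
        if b.type == "tool_use"
    ]
    return history + [
        {"role": "assistant", "content": assistant_content},  # every block, original order
        {"role": "user", "content": tool_results},
    ]


history = [{"role": "user", "content": "What should I wear in Paris right now?"}]
content = [
    SimpleNamespace(type="thinking", thinking="Need the weather first.", signature="sig"),
    SimpleNamespace(type="tool_use", id="toolu_01", name="get_weather", input={"city": "Paris"}),
]
messages = next_turn_messages(history, content, {"toolu_01": "Paris: 22°C, sunny, light breeze."})
print([m["role"] for m in messages])
```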

5. Choosing effort: which level to use

Choose by task type; don't default to high:

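Task	Recommended effort	Model	Notes
Simple classification, information extraction	`low` or thinking off	Haiku 4.5 / Sonnet 4.6	Enabling thinking only adds cost and latency
Routine Q&A, document generation	`medium`	Sonnet 4.6	Best value for money
Code review, PR analysis	`medium` or `high`	Sonnet 4.6 / Opus 4.6	Multi-step reasoning, but doesn't need max
Complex algorithms, architecture design	`high`	Opus 4.7	Needs long chains of reasoning
AIME-level math, the hardest debugging	`max`	Opus 4.7 / 4.6	Extreme cases; per-call cost rises noticeably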
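One way to keep this choice out of ad-hoc call sites is to encode the table once. The sketch below is hypothetical (the task keys are invented labels) and returns the thinking arguments for each category, with an empty dict meaning thinking stays off:

```python
# Hypothetical task categories; values mirror the selection table (effort, suggested models).
EFFORT_TABLE = {
    "extraction": (None, "Haiku 4.5 / Sonnet 4.6"),  # None = leave thinking off
    "qa": ("medium", "Sonnet 4.6"),
    "code_review": ("medium", "Sonnet 4.6 / Opus 4.6"),
    "architecture": ("high", "Opus 4.7"),
    "hardest": ("max", "Opus 4.7 / 4.6"),
}


def request_overrides(task: str) -> dict:
    """Return thinking kwargs for a task category; {} means thinking stays off."""
    effort, _models = EFFORT_TABLE[task]
    if effort is None:
        return {}
    return {"thinking": {"type": "adaptive"}, "output_config": {"effort": effort}}


print(request_overrides("qa"))
print(request_overrides("extraction"))
```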

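Rule of thumb: before enabling thinking, run the task once in normal mode. If normal mode answers well, leave thinking off; if there are clear problems (missed constraints, logical leaps, answers that miss the question), then turn thinking on. Blindly defaulting to high is one of the common reasons enterprise AI bills spiral out of control.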
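The rule of thumb can also be automated as an escalation loop. This is a sketch under stated assumptions: run(effort) is a hypothetical hook that calls the API (None meaning thinking off), good_enough is your own quality check, and the offline stand-in replaces real model output.

```python
from typing import Callable, Optional, Sequence


def answer_with_escalation(
    run: Callable[[Optional[str]], str],
    good_enough: Callable[[str], bool],
    levels: Sequence[Optional[str]] = (None, "medium", "high"),
) -> tuple[Optional[str], str]:
    """Try thinking-off first, then raise effort until the answer passes the check.

    Returns (effort used, answer); if nothing passes, returns the last attempt.
    """
    answer = ""
    for effort in levels:
        answer = run(effort)
        if good_enough(answer):
            return effort, answer
    return levels[-1], answer


# Offline stand-in: pretend only effort="high" produces an answer that satisfies the constraint
fake = {None: "draft", "medium": "draft v2", "high": "final: constraint satisfied"}
print(answer_with_escalation(fake.__getitem__, lambda a: "constraint satisfied" in a))
```

Keeping good_enough explicit forces the team to define what "clear problems" means before paying for higher effort.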

6. Common pitfalls

Pitfall 1: the hidden coupling between thinking and prompt caching

The prompt caching cache key includes the thinking configuration. That means:

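LONG_SYSTEM = "..."  # a long, reusable system prompt

# First call: writes the cache
client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},
    system=[{"type": "text", "text": LONG_SYSTEM, "cache_control": {"type": "ephemeral"}}],
    messages=[{"role": "user", "content": "..."}],
)

# Second call: effort changed to high → cache miss, billed at full price again
client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # changed
    system=[{"type": "text", "text": LONG_SYSTEM, "cache_control": {"type": "ephemeral"}}],
    messages=[{"role": "user", "content": "..."}],
)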

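In practice: keep effort fixed within a single workflow, and don't change effort during an A/B test while still expecting cache hits. To confirm whether a request actually hit the cache, check usage.cache_read_input_tokens and usage.cache_creation_input_tokens in the response.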
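A small check on the usage block makes cache misses visible. The sketch assumes usage is a plain dict (with the Python SDK, response.usage.model_dump() should produce one); cache_status is a hypothetical name and the numbers are illustrative.

```python
def cache_status(usage: dict) -> str:
    """Classify one response's prompt-cache behavior from its usage fields."""
    if (usage.get("cache_read_input_tokens") or 0) > 0:
        return "hit"  # prefix served from cache
    if (usage.get("cache_creation_input_tokens") or 0) > 0:
        return "write"  # cache was (re)created on this request
    return "none"  # no caching involved


# Illustrative usage dicts (made-up numbers)
print(cache_status({"input_tokens": 50, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 4200}))
print(cache_status({"input_tokens": 50, "cache_creation_input_tokens": 4200, "cache_read_input_tokens": 0}))
```

Logging this per request during an A/B test shows immediately whether changing effort turned hits into writes.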

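Pitfall 2: thinking blocks are not kept across turns

The model returns thinking blocks on every response, but they are not in the context of the next request unless you pass them back manually, as in section 4. Ignore this in multi-turn conversations and the model keeps "re-thinking" things it has already worked through.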

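Pitfall 3: thinking is billed for the whole process, not the visible summary

The block.thinking you get back is a summary of the thinking, but billing covers the full internal thinking process, which the author estimates at 3-5x the length of the summary. A thinking block that looks like 200 characters can therefore sit on top of far more tokens than its length suggests. When the month-end bill doesn't match what you estimated from the screen, this is often why.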
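To put this gap on a dashboard, compare the billed output count with what is visible. A sketch, assuming usage.output_tokens reports the full billed output (thinking included) and that you estimate visible tokens yourself; thinking_overhead, should_alert, and the 5x threshold are hypothetical choices, and the numbers are illustrative:

```python
def thinking_overhead(output_tokens: int, visible_tokens: int) -> dict:
    """Split billed output tokens into a visible part and a hidden (thinking) part.

    output_tokens: usage.output_tokens from the response (assumed to be the billed count).
    visible_tokens: your own estimate for the visible text plus thinking summary.
    """
    hidden = max(output_tokens - visible_tokens, 0)
    ratio = output_tokens / visible_tokens if visible_tokens else float("inf")
    return {"hidden_tokens": hidden, "billed_to_visible_ratio": round(ratio, 2)}


def should_alert(report: dict, max_ratio: float = 5.0) -> bool:
    """Hypothetical alert rule: flag responses billed far above what is visible."""
    return report["billed_to_visible_ratio"] > max_ratio


report = thinking_overhead(output_tokens=5000, visible_tokens=800)
print(report, should_alert(report))
```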

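Pitfall 4: thinking is incompatible with several sampling and request settings

When thinking is enabled, you cannot use:

temperature other than 1, or top_k
top_p below 0.95 (values from 0.95 to 1 are allowed)
forced tool use (tool_choice of type any, or of type tool naming a specific tool; only auto and none are allowed)
response prefill (pre-filling the model's reply with an assistant message)

Remove these parameters as part of the migration.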
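These rules can be turned into a pre-flight check for the migration. thinking_conflicts is a hypothetical helper operating on a request dict; treating "the last message has role assistant" as prefill is a simplifying assumption.

```python
def thinking_conflicts(kwargs: dict) -> list[str]:
    """List request settings that cannot be combined with thinking."""
    problems = []
    if kwargs.get("temperature", 1) != 1:
        problems.append("temperature must be 1 (or unset)")
    if "top_k" in kwargs:
        problems.append("top_k is not allowed")
    if kwargs.get("top_p", 1) < 0.95:
        problems.append("top_p must be between 0.95 and 1")
    if (kwargs.get("tool_choice") or {}).get("type") in ("any", "tool"):
        problems.append("forced tool use: tool_choice must be auto or none")
    messages = kwargs.get("messages") or []
    if messages and messages[-1].get("role") == "assistant":
        problems.append("assistant prefill is not allowed")
    return problems


request = {
    "temperature": 0.2,
    "tool_choice": {"type": "tool", "name": "get_weather"},
    "messages": [
        {"role": "user", "content": "Weather in Paris?"},
        {"role": "assistant", "content": "The weather is"},
    ],
}
print(thinking_conflicts(request))
```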

7. Migration checklist: from old code to adaptive

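Code level

Change thinking: {"type": "enabled", "budget_tokens": N} to thinking: {"type": "adaptive"}
Add output_config: {"effort": "..."}, choosing the level from the selection table
Remove the interleaved-thinking-2025-05-14 beta header (adaptive enables interleaved thinking automatically)
Check whether temperature, top_k, top_p, tool_choice, or prefill conflicts with thinking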

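Engineering level

Logic for passing thinking blocks back in multi-turn conversations
Consistent effort settings for prompt caching
Billing alerts: monitor thinking tokens separately from visible output tokens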

Testing level

Run the same prompt at effort: low / medium / high and record quality and token usage
Agent pipelines: check that second-turn thinking is still sensible after the tool_result is passed back
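For the testing step, a sweep harness keeps the comparison consistent across effort levels. run(effort) is a hypothetical hook that should call the API with thinking={"type": "adaptive"} and the given effort, then score the answer with your own evaluator; the numbers below are made up and only show the report shape.

```python
from typing import Callable


def sweep_efforts(run: Callable[[str], dict], efforts=("low", "medium", "high")) -> list[dict]:
    """Run one prompt at each effort level and collect quality and token usage.

    run(effort) is assumed to return {"output_tokens": ..., "score": ...}.
    """
    rows = []
    for effort in efforts:
        result = run(effort)
        rows.append({"effort": effort, **result})
    return rows


# Offline stand-in with made-up numbers, only to show the report shape
fake_results = {
    "low": {"output_tokens": 300, "score": 0.6},
    "medium": {"output_tokens": 900, "score": 0.8},
    "high": {"output_tokens": 2500, "score": 0.82},
}
for row in sweep_efforts(fake_results.__getitem__):
    print(row)
```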

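8. Summary

If you get only one thing right, make it this: default to adaptive, pick effort by task type, and don't reach for high out of the gate.

Opus 4.7's adaptive thinking is the change to Anthropic's thinking interface over the past six months that is most worth migrating to: it cuts boilerplate (no more beta header) and lets the model itself decide how to spend tokens. The price is that old code has to be rewritten; run through the migration checklist before relying on it.

Full model pricing and console: claudeapi.com. Official docs: Adaptive Thinking · Extended Thinking.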
