Qwen3/3.5/3.6模型VLLM等本地部署ChatTemplate修正

夜魔009

532人浏览 · 2026-06-08 11:19:16

夜魔009 · 2026-06-08 11:19:16 发布

本地部署Qwen3.5/3.6模型，使用中遇到不支持 developer角色、工具参数中文乱码、历史推理不会隐藏、工具参数以字符串形式返回时解析失败，长工具描述撑爆上下文等问题。

huggingface上Qwen团队提供了一个专门的Qwen-Fixed-Chat-Templates解决相关问题，当你的Qwen模型本地部署后，出现了各种响应格式方面的问题时，不妨看看这个项目：

注意：这个模版文件，github上是没有的，只有在huggingface有。
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates

目前最新版本是V20，核心是The Architect Patch。历史版本V19，核心是 The Agentic Loop Cure。

（v19 版本）：智能代理循环问题彻底修复

根除空思考内容干扰：重写抽象语法树（AST）历史内容渲染逻辑，彻底杜绝空标签 \n 内容注入问题。该漏洞曾引发严重的上下文学习偏差 —— 模型误以为必须跳过思考环节才能调用工具，进而导致超 80% 的会话提前触发 <|im_end|> 终止标记、异常中断。
修复系统提示词逻辑缺陷：弱化【重要】区块中强制调用工具的硬性约束，恢复通用综合应答规则。目前模型可正常从思考状态切换为对话回复，不再出现逻辑异常。
KV 缓存完全生效与记忆丢失问题根治：preserve_thinking 参数默认改为开启。系统会按时间顺序保留历史思考内容，彻底解决多轮工具调用循环中出现的记忆失效卡顿问题，同时实现开箱即用的百分百 KV 缓存前缀匹配。

（v20 版本）：本次进行了大规模架构重构，核心优化深度智能代理循环，并提升与 C++ 推理引擎的兼容性。

Minja 抽象语法树扁平化：大幅优化 Jinja 模板嵌套层级，解决了 llama.cpp 中严重的解析性能瓶颈 —— 该问题曾导致推理吞吐量下降 80%。
Minja 替换功能漏洞修复（紧急补丁）：修复 llama.cpp 内一处严重的 C++ 解析漏洞：当在用户提示词首位使用替换过滤器时，文本内容会被静默清空。目前内置思考内容开关已改用分割与拼接逻辑，内容剔除更稳定可靠。
自动关闭思考模块：新增关键字参数auto_disable_thinking_with_tools（默认关闭），用户在调用工具时可一键停用推理模块。
深度智能代理异常兜底机制：修复因会话中途插入系统提示词、或智能代理循环中缺失用户消息而引发的程序异常。
载荷截断功能：新增max_tool_arg_chars与max_tool_response_chars配置项，彻底解决工具返回海量数据导致的上下文窗口溢出问题。

安装说明

快速安装

根据你使用的环境，更新对应的模板：

LM Studio

在右侧面板中打开你的 Qwen 模型。

向下滚动找到 Prompt Template（提示词模板）。

将模板内容替换为 chat_template.jinja 文件中的全部内容。

点击 Save（保存）。

llama.cpp / koboldcpp

bash
--jinja --chat-template-file chat_template.jinja

vLLM

将 tokenizer_config.json 中的 chat_template 字符串，替换为文件的原始内容。
同时使用 qwen3_coder 工具解析器：

bash
--tool-call-parser qwen3_coder

oMLX

覆盖本地模型目录中的 chat_template.jinja 文件，加载时使用 --jinja 参数。
请移除所有 chat_template_kwargs 覆盖项，模板已内置所有必要配置。

我该用哪个文件？

Qwen 3.5 和 Qwen 3.6 的所有变体（包括 35B、32B、27B、14B 参数规模）已统一适配。你只需使用仓库根目录下的单个 chat_template.jinja 文件即可。

单行版本文件（chat_template_oneline.txt）是预压缩的版本，供要求模板为单行字符串的推理引擎使用。

思考模式切换

你可以控制模型的推理行为：在系统提示词或用户提示词的任意位置插入 <|think_on|> 或 <|think_off|>。

模板会自动识别这些标签，将其从最终上下文里移除（模型永远不会看到它们），并立即切换推理模式。

快速回答、禁用推理：

text
System: You are a coding assistant. <|think_off|>
User: What's 2+2?

深度推理：

text
System: You are a coding assistant. <|think_on|>
User: Implement a red-black tree in Rust.

（该标签语法使用了 Qwen 的控制符分隔符，可保证不会与正常文本或文件路径冲突，这和早期社区模板使用的 /think 语法不同）

节省 Token：清除历史思考内容

在 v19 版本中，该模板默认保留所有历史 \\ 思考块。这是刻意设计的：

避免模型在复杂、多步骤的智能代理循环中出现 “失忆卡顿”

在本地推理引擎上，数学上保证 100% 的前缀 KV 缓存命中率

但如果你的硬件资源有限，需要节省上下文 Token，可以在引擎的模板参数中显式关闭该功能，让模板自动清除历史思考内容：

json
{
"preserve_thinking": false
}

（注意：设置为false会降低多轮对话中的 KV 缓存命中率，因为提示词会动态变化，导致缓存失效）

关键修复的技术细节

1. 根除 “空思考块污染” 与逻辑陷阱（v19）

早期版本为了节省 Token，会用空的 \n 块替换历史思考内容，再配合强制要求模型在 \\ 后立即调用工具的系统提示词。
这形成了有害的上下文学习模式：模型将 “空思考块” 和 “调用工具” 绑定，将 “完整思考块” 和 “对话回复” 绑定，导致超过 80% 的对话提前触发 <|im_end|> 标记中断。
v19 版本彻底移除了空思考块注入，并重写了 <IMPORTANT> 指令，明确允许模型在思考块结束后直接生成对话回复。

2. KV 缓存安全与自回归标准化（v18/v19）

llama.cpp 和 vLLM 使用前缀 KV 缓存来加速生成。v19 版本默认按时间顺序保留历史思考内容，渲染后的对话历史与缓存的生成 Token 完全同步；再配合自回归边界的严格单\n标准化，在多轮循环中实现了 100% 的 KV 缓存命中率。

3. 原生 XML 工具调用格式（v16）

模型是基于 Qwen3-Coder 使用的 XML 格式工具调用进行训练的：

xml
<tool_call>
<function>tool_name>
    <parameter=param_name>
      <value>
    </parameter>
</function>
</tool_call>

v16 版本原生恢复了该格式，兼容所有解析器；同时通过 C++ 安全的键迭代（for args_name in tool_call.arguments），规避了|items导致的崩溃问题。

4. 双层智能代理错误降级（v15）

当工具调用多次验证失败时，模型可能进入退化的推理循环。该模板使用基于consecutive_failures计数器的双层降级机制：

第一层（第 1 次错误）：修改生成提示词前缀，让模型从不同的 Token 位置开始推理，打破缓存的固定模式。

第二层（连续 2 次以上错误）：完全跳过思考块，通过紧急带外指令强制生成修正后的操作，并安全地包裹在用户的tool_response块中。

5. 智能误报检测（v18）

放弃了宽泛的子串匹配（这种匹配会在返回包含 “error” 等词的正常数据库结果时，触发错误的重试循环），改为使用严格的结构防护：仅匹配Exception:、error:、Traceback和command not found等结构，同时结合长度限制和 Shell 回显排除（如$ ）。

6. minijinja 兼容性约束（v18/v20）

Python-only 的 Jinja 功能在 minijinja/ninja（llama.cpp、LM Studio、MLX 使用的 C++ 运行时）上会崩溃或行为异常。模板中所有实例均已重构，实现通用兼容：

content | replace('<|think_on|>', '') → content.split('<|think_on|>') | join(' ')：修复了 minijinja 中当被替换字符串出现在索引 0 位置时，会静默丢弃全部文本的严重 Bug

\ | items → for key in mapping

loop.previtem → messages[loop.index0 - 1]

map('string') → join(' ')

\ | first → $ ' in content

对比矩阵：官方模板 vs 修复版模板 vs 社区模板

功能	官方 Qwen 模板	LuffyTheFox	mod-ellary	Pneury	本修复模板（v19）
工具调用格式	XML（原生）	JSON	JSON	JSON	XML（原生，兼容 qwen3-coder）
工具参数	`	items` 功能失效	已修复	缺失	已修复	已修复（C++ 安全 XML）
提前中断（终止 Bug）	会中断	会中断	会中断	会中断	已修复（通过逻辑陷阱 / 空思考块移除，v19）
智能代理重试中断与推理死循环	会中断	会中断	会中断	会中断	双层降级机制
工具错误误报	无	无	无	无	防护模式（严格结构匹配）
工具调用后过度思考	刷屏 / 中断	损坏	损坏	损坏	通用合成机制
on_reasoning_off 支持	无	无	无	无	完全支持
开发者角色支持	缺失	缺失	缺失	缺失	已添加
思考模式切换	无	无	/think（仅系统提示词）	无	`<	think_on	>/<	think_off	>`（任意位置）
历史中的空思考块	生成空块	损坏	标签被省略	损坏	已彻底根除（v19）
KV 前缀缓存	动态历史导致缓存失效	缓存失效	缓存失效	缓存失效	开箱即用 100% 稳定（v19）
对话中途插入系统提示词	崩溃	崩溃	崩溃	崩溃	已修复
无用户查询时崩溃	崩溃	崩溃	崩溃	崩溃	优雅降级
传统 AST 支持	失效（previtem）	失效	失效	失效	已修复（索引 0 问题）
</thinking> 幻觉	失效	无	无	无	已检测并安全清除

运行测试套件

bash
python3 scripts/test_v20.py

测试覆盖：auto_disable_thinking_with_tools、工具调用格式、并行工具调用、中途系统提示词、代理循环回退、payload_truncation逻辑、<|think_on|>/<|think_off|> 内联覆盖、以及所有 v19 的回归测试。

chat_template.jinja

{%- set template_version = "qwen3.6-froggeric-v20" %}
{%- set image_count = namespace(value=0) %}
{%- set video_count = namespace(value=0) %}
{%- set add_vision_id = add_vision_id if add_vision_id is defined else false %}
{%- set enable_thinking = enable_thinking if enable_thinking is defined else true %}
{%- set auto_disable_thinking_with_tools = auto_disable_thinking_with_tools if auto_disable_thinking_with_tools is defined else false %}
{%- set _preserve_thinking = preserve_thinking if preserve_thinking is defined else false %}
{%- set max_tool_arg_chars = max_tool_arg_chars if max_tool_arg_chars is defined else 0 %}
{%- set max_tool_response_chars = max_tool_response_chars if max_tool_response_chars is defined else 0 %}
{%- set _has_tools = (tools is defined and tools and tools is iterable and tools is not mapping) %}
{%- set ns_state = namespace(thinking=enable_thinking) %}
{%- if auto_disable_thinking_with_tools and _has_tools %}
    {%- set ns_state.thinking = false %}
{%- endif %}
{%- macro render_content(content, do_vision_count, is_system_content=false) %}
    {%- if content is string %}
        {{- content }}
    {%- elif content is iterable and content is not mapping %}
        {%- for item in content %}
            {%- if item is mapping %}
                {%- if item.type == 'image' or 'image' in item or 'image_url' in item %}
                    {%- if is_system_content %}
                        {{- raise_exception('System message cannot contain images.') }}
                    {%- endif %}
                    {%- if do_vision_count %}
                        {%- set image_count.value = image_count.value + 1 %}
                    {%- endif %}
                    {%- if add_vision_id %}
                        {{- 'Picture ' ~ image_count.value ~ ': ' }}
                    {%- endif %}
                    {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
                {%- elif item.type == 'video' or 'video' in item %}
                    {%- if is_system_content %}
                        {{- raise_exception('System message cannot contain videos.') }}
                    {%- endif %}
                    {%- if do_vision_count %}
                        {%- set video_count.value = video_count.value + 1 %}
                    {%- endif %}
                    {%- if add_vision_id %}
                        {{- 'Video ' ~ video_count.value ~ ': ' }}
                    {%- endif %}
                    {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
                {%- elif 'text' in item %}
                    {{- item.text }}
                {%- else %}
                    {{- raise_exception('Unexpected item type in content.') }}
                {%- endif %}
            {%- else %}
                {{- item | string }}
            {%- endif %}
        {%- endfor %}
    {%- elif content is none or content is undefined %}
        {{- '' }}
    {%- else %}
        {{- raise_exception('Unexpected content type.') }}
    {%- endif %}
{%- endmacro %}
{%- if not messages %}
    {{- raise_exception('No messages provided.') }}
{%- endif %}
{%- set _first_role = messages[0].role %}
{%- if _first_role == 'system' or _first_role == 'developer' %}
    {%- set _sys_msg = messages[0] %}
    {%- set _msgs = messages[1:] %}
{%- else %}
    {%- set _sys_msg = none %}
    {%- set _msgs = messages %}
{%- endif %}
{%- set _sc = '' %}
{%- if _sys_msg is not none %}
    {%- set _sc = render_content(_sys_msg.content, false, true) | trim %}
    {%- if '<|think_off|>' in _sc %}
        {%- set ns_state.thinking = false %}
        {%- set _sc = _sc.split('<|think_off|>') | join('') | trim %}
    {%- elif '<|think_on|>' in _sc %}
        {%- set ns_state.thinking = true %}
        {%- set _sc = _sc.split('<|think_on|>') | join('') | trim %}
    {%- endif %}
{%- endif %}
{%- if _has_tools %}
    {{- '<|im_start|>system\n' }}
    {{- '# Tools\n\nYou have access to the following functions:\n\n<tools>' }}
    {%- for tool in tools %}
        {{- '\n' }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- '\n</tools>' }}
    {%- set tool_instructions %}
If you choose to call a function ONLY reply in the following format with NO suffix:

<think>
Brief explanation of tool call
</think>
<tool_call>
<function=example_function_name>
<parameter=example_parameter_1>
value_1
</parameter>
<parameter=example_parameter_2>
This is the value for the second parameter
that can span
multiple lines
</parameter>
</function>
</tool_call>

<IMPORTANT>
Reminder:
- You can use the <think></think> block to plan your next tool call OR to synthesize data and formulate your final response to the user.
- ALL explanation and reasoning MUST be placed strictly inside the <think></think> block.
- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags.
- If you choose to call a tool, you MUST output the <tool_call> block IMMEDIATELY after closing </think>. Do NOT output any conversational text before the tool call.
- The <tool_call> and <function> tags MUST be at the very beginning of a new line, with NO spaces or indentation before them.
- To call multiple functions, output a separate, completely closed <tool_call></tool_call> block for EACH function. Do NOT nest <tool_call> blocks.
- If you have gathered all necessary data and do not need to call a tool, answer the question like normal and provide your final response to the user IMMEDIATELY after closing </think>.
</IMPORTANT>
    {%- endset %}
    {{- '\n\n' ~ tool_instructions | trim }}
    {%- if _sc %}
        {{- '\n\n' + _sc }}
    {%- endif %}
    {{- '<|im_end|>\n' }}
{%- else %}
    {%- if _sc %}
        {{- '<|im_start|>system\n' + _sc + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set _last_idx = _msgs | length - 1 %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=_last_idx) %}
{%- for message in _msgs[::-1] %}
    {%- set index = (_msgs | length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == 'user' %}
        {%- set _rc = render_content(message.content, false) | trim %}
        {%- if not (_rc.startswith('<tool_response>') and _rc.endswith('</tool_response>')) %}
            {%- set ns.multi_step_tool = false %}
            {%- set ns.last_query_index = index %}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if ns.multi_step_tool %}
    {%- if _last_idx > 50 %}
        {%- set ns.last_query_index = _last_idx %}
    {%- else %}
        {%- set ns.last_query_index = 0 %}
    {%- endif %}
{%- endif %}
{%- set ns2 = namespace(prev_role='', consecutive_failures=0) %}
{%- for message in _msgs %}
    {%- set is_system = (message.role == "system" or message.role == "developer") %}
    {%- set content = render_content(message.content, true, is_system) | trim %}
    {%- if '<|think_off|>' in content %}
        {%- set ns_state.thinking = false %}
        {%- set content = content.split('<|think_off|>') | join('') | trim %}
    {%- elif '<|think_on|>' in content %}
        {%- set ns_state.thinking = true %}
        {%- set content = content.split('<|think_on|>') | join('') | trim %}
    {%- endif %}
    {%- if is_system %}
        {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
    {%- elif message.role == 'user' %}
        {%- set ns2.consecutive_failures = 0 %}
        {{- '<|im_start|>user\n' + content + '<|im_end|>\n' }}
    {%- elif message.role == 'assistant' %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is defined and message.reasoning_content is not none %}
            {%- if message.reasoning_content is string %}
                {%- set reasoning_content = message.reasoning_content %}
            {%- else %}
                {%- set reasoning_content = message.reasoning_content | string %}
            {%- endif %}
        {%- else %}
            {%- set _think_end = '' %}
            {%- if '</think>' in content %}
                {%- set _think_end = '</think>' %}
            {%- elif '</thinking>' in content %}
                {%- set _think_end = '</thinking>' %}
            {%- elif '</ think>' in content %}
                {%- set _think_end = '</ think>' %}
            {%- elif '</think >' in content %}
                {%- set _think_end = '</think >' %}
            {%- endif %}
            {%- if _think_end %}
                {%- if _think_end == '</thinking>' %}
                    {%- set _think_start = '<thinking>' %}
                {%- else %}
                    {%- set _think_start = '<think>' %}
                {%- endif %}
                {%- set reasoning_content = content.split(_think_end)[0].rstrip('\n') %}
                {%- if _think_start in reasoning_content %}
                    {%- set reasoning_content = reasoning_content.split(_think_start)[-1].lstrip('\n') %}
                {%- endif %}
                {%- set content = content.split(_think_end)[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- set reasoning_content = reasoning_content | trim %}
        {%- if (_preserve_thinking or loop.index0 > ns.last_query_index) and reasoning_content %}
            {{- '<|im_start|>assistant\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
        {%- else %}
            {{- '<|im_start|>assistant\n' + content }}
        {%- endif %}
        {%- if message.tool_calls is defined and message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
            {%- for tool_call in message.tool_calls %}
                {%- if tool_call.function is defined and tool_call.function is not none %}
                    {%- set tc = tool_call.function %}
                {%- else %}
                    {%- set tc = tool_call %}
                {%- endif %}
                {%- if loop.first %}
                    {%- if content | trim %}
                        {{- '\n\n<tool_call>\n<function=' + tc.name + '>\n' }}
                    {%- else %}
                        {{- '<tool_call>\n<function=' + tc.name + '>\n' }}
                    {%- endif %}
                {%- else %}
                    {{- '\n\n<tool_call>\n<function=' + tc.name + '>\n' }}
                {%- endif %}
                {%- if tc.arguments is defined and tc.arguments is not none %}
                    {%- if tc.arguments is mapping %}
                        {%- for args_name, args_value in tc.arguments.items() %}
                            {{- '<parameter=' + args_name + '>\n' }}
                            {%- if args_value is mapping or (args_value is sequence and args_value is not string) %}
                                {%- set _av = args_value | tojson %}
                            {%- else %}
                                {%- set _av = args_value | string %}
                            {%- endif %}
                            {%- if max_tool_arg_chars > 0 and _av | length > max_tool_arg_chars %}
                                {{- _av[:max_tool_arg_chars] + '\n[TRUNCATED — original length ' ~ (_av | length | string) ~ ' chars]' }}
                            {%- else %}
                                {{- _av }}
                            {%- endif %}
                            {{- '\n</parameter>\n' }}
                        {%- endfor %}
                    {%- elif tc.arguments is string and tc.arguments %}
                        {{- tc.arguments }}
                    {%- endif %}
                {%- endif %}
                {%- if loop.last %}
                    {{- '</function>\n</tool_call>\n' }}
                {%- else %}
                    {{- '</function>\n</tool_call>' }}
                {%- endif %}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == 'tool' %}
        {%- set _content_lower = content | lower %}
        {%- if content | length < 500 and '$ ' not in content and 'took ' not in _content_lower and ('"error":' in _content_lower or 'error:' in _content_lower or 'exception:' in _content_lower or 'traceback' in _content_lower or 'command not found' in _content_lower or 'invalid syntax' in _content_lower or 'failed to' in _content_lower) %}
            {%- set ns2.consecutive_failures = ns2.consecutive_failures + 1 %}
        {%- else %}
            {%- set ns2.consecutive_failures = 0 %}
        {%- endif %}
        {%- if ns2.prev_role != 'tool' %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {%- if max_tool_response_chars > 0 and content | length > max_tool_response_chars %}
            {%- set content = content[:max_tool_response_chars] + '\n[TRUNCATED — original length ' ~ (content | length | string) ~ ' chars]' %}
        {%- endif %}
        {{- '\n<tool_response>\n' + content }}
        {%- if ns2.consecutive_failures >= 2 %}
            {{- '\n\n⚠️ SYSTEM WARNING: ' ~ ns2.consecutive_failures ~ ' consecutive tool errors detected. Your previous approach is incorrect. You MUST use a fundamentally different approach or corrected arguments.' }}
        {%- elif ns2.consecutive_failures == 1 %}
            {{- '\n\n⚠️ SYSTEM WARNING: The previous tool call returned an error. Diagnose the failure and retry with completely corrected arguments.' }}
        {%- endif %}
        {{- '\n</tool_response>' }}
        {%- if loop.last %}
            {{- '<|im_end|>\n' }}
        {%- else %}
            {%- set _next_role = _msgs[loop.index0 + 1].role %}
            {%- if _next_role != 'tool' %}
                {{- '<|im_end|>\n' }}
            {%- endif %}
        {%- endif %}
    {%- else %}
        {{- '<|im_start|>user\n[' + message.role + ']: ' + content + '<|im_end|>\n' }}
    {%- endif %}
    {%- set ns2.prev_role = message.role %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if not ns_state.thinking %}
        {{- '<think>\n</think>\n' }}
    {%- elif ns2.consecutive_failures >= 2 %}
        {{- '<think>\n</think>\n' }}
    {%- else %}
        {{- '<think>\n' }}
    {%- endif %}
{%- endif %}