# [LangChain] The Complete Guide to Output Parsers

code bean

2026-05-11 07:37:57

2026 edition | Covers the main built-in parsers + complete code examples

## 1. What Is an Output Parser?

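An output parser is LangChain's bridge between free-text LLM output and structured program logic. LLMs naturally produce natural language, but applications need structured data such as JSON, lists, and dates. The parser converts the raw text into Python objects that the program can use directly.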

## 2. Parser Overview

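| Category | Parser | Purpose | Complexity |
|------|--------|------|--------|
| Basic text | `StrOutputParser` | Extract the plain text content | ⭐ |
| Lists | `CommaSeparatedListOutputParser` | Comma-separated list | ⭐ |
| | `NumberedListOutputParser` | Numbered list | ⭐ |
| | `MarkdownListOutputParser` | Markdown list | ⭐ |
| Structured | `JsonOutputParser` | Parse into a JSON dict | ⭐⭐ |
| | `PydanticOutputParser` | Parse into a Pydantic object (type-safe) | ⭐⭐⭐ |
| Specialized | `DatetimeOutputParser` | Date/time format | ⭐⭐ |
| | `EnumOutputParser` | Constrain output to enum values | ⭐⭐ |
| | `XMLOutputParser` | XML-formatted output | ⭐⭐ |
| Fault-tolerant | `OutputFixingParser` | Automatically fix formatting errors | ⭐⭐⭐ |
| | `RetryOutputParser` / `RetryWithErrorOutputParser` | Retry with the original prompt as context | ⭐⭐⭐⭐ |
| Tool calling | `JsonOutputKeyToolsParser` | Extract a named tool call's arguments (e.g. OpenAI tool calls) | ⭐⭐⭐ |
| | `PydanticToolsParser` | Parse tool-call arguments into Pydantic objects | ⭐⭐⭐ |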

## 3. Basic Parsers in Detail, with Examples

### StrOutputParser: String Parser

The most basic parser: it extracts the plain text content from an `AIMessage`.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Build the chain: prompt → model → parser
prompt = ChatPromptTemplate.from_template("Explain {concept} in one sentence")
chain = prompt | llm | StrOutputParser()

result = chain.invoke({"concept": "neural networks"})
print(result)
# Output: "A neural network is a machine learning model inspired by how neurons in the brain connect..."
print(type(result))  # <class 'str'>
```

Key point: it strips the `AIMessage` wrapper and returns a plain string, which suits simple Q&A scenarios.

### CommaSeparatedListOutputParser: CSV List Parser

Asks the model to output comma-separated values and converts them into a Python list automatically.

```python
from langchain_core.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import ChatPromptTemplate

parser = CommaSeparatedListOutputParser()

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a classification assistant. {format_instructions}"),
    ("human", "List the main types of {topic}, without numbering, separated by commas")
]).partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser  # llm: the ChatOpenAI instance defined above

result = chain.invoke({"topic": "Python web frameworks"})
print(result)
# Output: ['Django', 'Flask', 'FastAPI', 'Tornado', 'Bottle']
print(type(result))  # <class 'list'>
```

`get_format_instructions()` automatically generates the prompt text that tells the model "your output should be a comma-separated list".

### NumberedListOutputParser: Numbered List Parser

Parses numbered lists (such as `1. xxx 2. xxx`).

```python
from langchain_core.output_parsers import NumberedListOutputParser
from langchain_core.prompts import ChatPromptTemplate

parser = NumberedListOutputParser()

prompt = ChatPromptTemplate.from_template("""
List 5 advantages of {topic}, using a numbered format.
{format_instructions}
""")

chain = prompt.partial(format_instructions=parser.get_format_instructions()) | llm | parser

result = chain.invoke({"topic": "microservice architecture"})
print(result)
# Output: ['Independent deployment', 'Flexible tech stack', 'Scalability', 'Fault isolation', 'Team autonomy']
```

### MarkdownListOutputParser: Markdown List Parser

Parses Markdown unordered lists (`- item` or `* item`).

```python
from langchain_core.output_parsers import MarkdownListOutputParser
from langchain_core.prompts import ChatPromptTemplate

parser = MarkdownListOutputParser()

prompt = ChatPromptTemplate.from_template("""
List the core features of {topic} as a Markdown list.
{format_instructions}
""")

chain = prompt.partial(format_instructions=parser.get_format_instructions()) | llm | parser

result = chain.invoke({"topic": "Docker"})
print(result)
# Output: ['Containerization', 'Lightweight', 'Portable', 'Version control', 'Resource isolation']
```

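## 4. Structured Parsers in Detail, with Examples

### JsonOutputParser: JSON Parser

Parses LLM output into a Python dict. It can also take a Pydantic model to generate its format instructions.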

```python
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

# Option 1: no schema, parse straight into a dict
parser = JsonOutputParser()

prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract the information and return it as JSON. {format_instructions}"),
    ("human", "Describe Japan, including its name, population, and continent")
]).partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser  # llm: the ChatOpenAI instance defined above
result = chain.invoke({})
print(result)
# Output: {'name': 'Japan', 'population': 125000000, 'continent': 'Asia'}
print(type(result))  # <class 'dict'>

# Option 2: with a Pydantic schema (used only for the format instructions; the result is still a dict)
class CountryInfo(BaseModel):
    name: str = Field(description="Country name")
    population: int = Field(description="Population")
    continent: str = Field(description="Continent")

parser_with_schema = JsonOutputParser(pydantic_object=CountryInfo)
# The generated format instructions are more detailed, but the result is still a dict, not a CountryInfo object
```

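### PydanticOutputParser: Pydantic Parser (Highly Recommended)

The most powerful and safest of the prompt-based parsers. It converts the output directly into a type-safe Pydantic object and automatically validates field types and required fields.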

```python
from typing import List

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

# Define the data structures
class ActionItem(BaseModel):
    task: str = Field(description="Task description")
    assignee: str = Field(description="Person responsible")

class MeetingSummary(BaseModel):
    title: str = Field(description="Meeting title")
    key_decisions: List[str] = Field(description="Key decisions")
    action_items: List[ActionItem] = Field(description="Action items")

parser = PydanticOutputParser(pydantic_object=MeetingSummary)

prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a meeting-notes assistant. Extract structured information from the meeting transcript.
{format_instructions}"""),
    ("human", "{transcript}")
]).partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser  # llm: the ChatOpenAI instance defined above

meeting_notes = """
Stand-up on March 10. Attendees: Alice, Bob, Carol.
Alice said the data pipeline is done and awaiting review.
Bob mentioned a rate-limiting problem with the production API.
Decision: implement exponential backoff for API calls.
Carol will finish the retry-logic code by Friday.
Bob will set up a monitoring dashboard by next Tuesday.
Alice will review Carol's PR.
"""

summary = chain.invoke({"transcript": meeting_notes})

print(type(summary))  # <class '__main__.MeetingSummary'>
print(f"Title: {summary.title}")
print(f"Decisions: {summary.key_decisions}")
for item in summary.action_items:
    print(f"  Task: {item.task} -> Assignee: {item.assignee}")
```

Output:

```
Title: March 10 team stand-up
Decisions: ['Implement exponential backoff for API calls']
  Task: Finish the retry-logic code -> Assignee: Carol
  Task: Set up the monitoring dashboard -> Assignee: Bob
  Task: Review Carol's PR -> Assignee: Alice
```

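Advantages of Pydantic:

- Automatic type coercion (e.g. the string "125000000" → the integer 125000000)
- Required-field validation (a missing field raises an error)
- Field constraints (e.g. `ge=1, le=5` to restrict a rating range)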

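### DatetimeOutputParser: Date/Time Parser

Parses LLM output into a Python `datetime` object. Unlike the parsers above, it is imported from the `langchain` package rather than `langchain_core` (as are `EnumOutputParser`, `OutputFixingParser`, and the retry parsers); depending on your LangChain version, these may instead live in the separate `langchain-classic` package.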

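```python
from datetime import date

from langchain.output_parsers import DatetimeOutputParser
from langchain_core.prompts import ChatPromptTemplate

parser = DatetimeOutputParser()

prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract the date and time. Today is {today}. {format_instructions}"),
    ("human", "The meeting is set for next Wednesday at 3 pm")
]).partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser  # llm: the ChatOpenAI instance defined above

# The model cannot know today's date, so pass it in for relative expressions like "next Wednesday"
result = chain.invoke({"today": date.today().isoformat()})
print(result)
# Output (when run on 2026-05-11): 2026-05-20 15:00:00
print(type(result))  # <class 'datetime.datetime'>
```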
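Under the hood the parser does very little: it strips the reply and parses it with `datetime.strptime` using its `format` field, whose default is assumed here to be `%Y-%m-%dT%H:%M:%S.%fZ` (check `parser.format` in your version). That is why the format instructions matter: a reply in any other layout fails. The stdlib-only sketch below approximates that step; it is not the library code itself.

```python
from datetime import datetime

# Assumption: mirrors DatetimeOutputParser's default pattern
DEFAULT_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime(reply: str, fmt: str = DEFAULT_FORMAT) -> datetime:
    """Strip the model reply, then parse it with one fixed pattern."""
    return datetime.strptime(reply.strip(), fmt)


ok = parse_datetime(" 2026-05-20T15:00:00.000000Z\n")
print(ok)  # 2026-05-20 15:00:00

try:
    parse_datetime("May 20, 2026 3pm")  # free-form text does not match the pattern
except ValueError as err:
    print("rejected:", err)
```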

### EnumOutputParser: Enum Parser

Forces the output to be one of a predefined set of enum values.

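```python
from enum import Enum

from langchain.output_parsers import EnumOutputParser
from langchain_core.prompts import ChatPromptTemplate

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

parser = EnumOutputParser(enum=Sentiment)

# Include the parser's format instructions so the model replies with a bare enum value
prompt = ChatPromptTemplate.from_template("""
Classify the sentiment of the following review.
{format_instructions}
Review: {review}
""").partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser  # llm: the ChatOpenAI instance defined above

result = chain.invoke({"review": "The product quality is excellent and shipping was fast too!"})
print(result)  # Sentiment.POSITIVE
print(type(result))  # <enum 'Sentiment'>
print(result.value)  # positive
```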
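The enum parser is similarly strict: it looks the stripped reply up by enum *value*, so different casing or extra words cause a failure. This stdlib-only sketch approximates that lookup (the library wraps the failure in an `OutputParserException` rather than a bare `ValueError`).

```python
from enum import Enum


class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


def parse_enum(reply: str) -> Sentiment:
    """Approximate the parser's core step: look the stripped reply up by value."""
    return Sentiment(reply.strip())


print(parse_enum("positive\n") is Sentiment.POSITIVE)  # True

for bad in ["Positive", "The sentiment is positive."]:
    try:
        parse_enum(bad)
    except ValueError:
        print(f"rejected: {bad!r}")
```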

### XMLOutputParser: XML Parser

Parses XML-formatted output into nested Python dicts and lists.

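```python
from langchain_core.output_parsers import XMLOutputParser
from langchain_core.prompts import ChatPromptTemplate

# By default this parser uses defusedxml (pip install defusedxml)
parser = XMLOutputParser()

prompt = ChatPromptTemplate.from_messages([
    ("system", "Return the result in XML format. {format_instructions}"),
    ("human", "Extract the following: title 三体, author 刘慈欣, year 2008")
]).partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser  # llm: the ChatOpenAI instance defined above

result = chain.invoke({})
print(result)
# Output (tag names depend on the model; pass tags=[...] to constrain them):
# {'book': [{'title': '三体'}, {'author': '刘慈欣'}, {'year': '2008'}]}
```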

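## 5. Fault-Tolerant Parsers (Essential in Production)

LLMs sometimes emit malformed JSON (a missing comma, stray comments, and so on). Fault-tolerant parsers try to repair these problems automatically, at the cost of an extra LLM call.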

### OutputFixingParser: Auto-Fixing Parser

When the wrapped parser fails, it calls an LLM (which may be a different model) to fix the format.

```python
from typing import List

from langchain.output_parsers import OutputFixingParser
from langchain_core.output_parsers import PydanticOutputParser
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class Recipe(BaseModel):
    name: str = Field(description="Dish name")
    ingredients: List[str] = Field(description="List of ingredients")
    prep_time: int = Field(description="Preparation time (minutes)")

base_parser = PydanticOutputParser(pydantic_object=Recipe)

# Wrap the base parser in a fixing parser
fixing_parser = OutputFixingParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(model="gpt-4o-mini"),  # the LLM used for repairs
)

# Simulate a malformed output
bad_output = '{"name": "Kung Pao Chicken", "ingredients": ["chicken", "peanuts", "chili"] "prep_time": 30}'  # note the missing comma

try:
    result = fixing_parser.parse(bad_output)
    print(f"Fixed: {result}")
except Exception as e:
    print(f"Fix failed: {e}")
```

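How it works: when the wrapped parser raises an error, the fixing parser sends the LLM the parser's format instructions, the bad output, and the error message, and asks for an answer that satisfies the instructions. It does not send the original prompt.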
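The repair loop can be sketched in plain Python. This is a simplified model of the idea, not LangChain's implementation: `fix_with_llm` is a hypothetical stand-in for the repair model (here it just inserts the missing comma), and `parse_with_fixing` follows the flow described above, passing the instructions, bad output, and error to the fixer and parsing again, up to `max_retries` times.

```python
import json
from typing import Callable


def parse_with_fixing(
    completion: str,
    parse: Callable[[str], dict],
    fixer: Callable[[str, str, str], str],
    instructions: str,
    max_retries: int = 1,
) -> dict:
    """Parse; on failure, ask `fixer` for a corrected completion and try again."""
    retries = 0
    while True:
        try:
            return parse(completion)
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            if retries >= max_retries:
                raise  # give up: re-raise the last parsing error
            retries += 1
            # The fixer sees instructions, bad output and error, but not the user's prompt
            completion = fixer(instructions, completion, repr(err))


def fix_with_llm(instructions: str, completion: str, error: str) -> str:
    """Hypothetical fixer; a real one would send these three strings to an LLM."""
    return completion.replace('] "prep_time"', '], "prep_time"')


bad_output = '{"name": "Kung Pao Chicken", "ingredients": ["chicken", "peanuts", "chili"] "prep_time": 30}'
recipe = parse_with_fixing(bad_output, json.loads, fix_with_llm, "Return a JSON object")
print(recipe["prep_time"])  # 30
```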

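### RetryOutputParser / RetryWithErrorOutputParser: Retry Parsers

More capable than OutputFixingParser: besides the bad output, they also send the original prompt (and `RetryWithErrorOutputParser` adds the error message), so the LLM can regenerate the answer with the full context.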

```python
from langchain.output_parsers import RetryWithErrorOutputParser  # in langchain, not langchain_core
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class ProductReview(BaseModel):
    product_name: str = Field(description="Product name")
    rating: int = Field(description="Rating 1-5", ge=1, le=5)
    summary: str = Field(description="Short summary")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
main_parser = PydanticOutputParser(pydantic_object=ProductReview)

# Wrap it in a retry parser
retry_parser = RetryWithErrorOutputParser.from_llm(
    parser=main_parser,
    llm=llm,
    max_retries=2  # retry at most 2 times
)

prompt_template = """
Analyze the following product review and extract the information.
{format_instructions}

Review text:
{review_text}
"""

prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["review_text"],
    partial_variables={"format_instructions": main_parser.get_format_instructions()}
)

# Call parse_with_prompt, passing the prompt value so a retry can see the original request
review = "Decent phone (4 stars). Great camera. Wish it had more storage."
prompt_value = prompt.format_prompt(review_text=review)
llm_output = llm.invoke(prompt_value)

try:
    result = retry_parser.parse_with_prompt(
        llm_output.content,
        prompt_value
    )
    print(f"Product: {result.product_name}")
    print(f"Rating: {result.rating}")
except Exception as e:
    print(f"Still failing after retries: {e}")
```

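RetryWithErrorOutputParser vs. OutputFixingParser:

- OutputFixingParser: gives the LLM the format instructions, the bad output, and the error message, but not the original prompt, so it must repair the output without seeing the source material
- RetryWithErrorOutputParser: gives the LLM the original prompt + the bad output + the error message and lets it regenerate in full context; this helps most when the output is incomplete rather than merely malformed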
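The practical difference is what the repair model gets to read. The sketch below assembles both kinds of request; `build_repair_request` is a hypothetical helper whose wording only paraphrases LangChain's prompts. The bad output is missing a `summary`, and only the retry-style request carries the review text needed to write one.

```python
from typing import Optional


def build_repair_request(
    completion: str,
    error: Optional[str] = None,
    instructions: Optional[str] = None,
    original_prompt: Optional[str] = None,
) -> str:
    """Assemble the text a repair model would receive (illustrative wording only)."""
    sections = []
    if original_prompt is not None:
        sections.append(f"Prompt:\n{original_prompt}")
    if instructions is not None:
        sections.append(f"Instructions:\n{instructions}")
    sections.append(f"Completion:\n{completion}")
    if error is not None:
        sections.append(f"Error:\n{error}")
    sections.append("The completion did not satisfy the constraints above. Please try again:")
    return "\n\n".join(sections)


review_prompt = "Extract product_name, rating (1-5) and summary as JSON.\nReview: Decent phone (4 stars). Great camera."
bad = '{"product_name": "phone", "rating": 4}'
err = "summary: Field required"

# OutputFixingParser-style: instructions + bad output + error, no original prompt
fixing_request = build_repair_request(bad, error=err, instructions="JSON with product_name, rating, summary")
# RetryWithErrorOutputParser-style: original prompt + bad output + error
retry_request = build_repair_request(bad, error=err, original_prompt=review_prompt)

print("Great camera" in fixing_request)  # False: the fixer never sees the review
print("Great camera" in retry_request)   # True: the model can re-read the source text
```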

## 6. Tool-Calling Parsers

Used to parse tool-call results from models that support function calling (such as OpenAI and Claude).

### JsonOutputKeyToolsParser

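```python
from langchain_core.output_parsers.openai_tools import JsonOutputKeyToolsParser

# When the model returns tool_calls, extract the arguments of the named tool
# (returns a list of argument dicts; pass first_tool_only=True for just the first match)
parser = JsonOutputKeyToolsParser(key_name="get_weather")

# Typically placed after a model with tools bound (get_weather is a hypothetical tool):
# chain = llm.bind_tools([get_weather]) | parser
# with_structured_output() uses similar parsers internally
```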

### PydanticToolsParser

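```python
from langchain_core.output_parsers.openai_tools import PydanticToolsParser

# Parse tool-call arguments directly into Pydantic objects; each class in tools=[...]
# is matched to a tool call by its class name, e.g.:
# parser = PydanticToolsParser(tools=[GetWeather])  # GetWeather: a hypothetical Pydantic schema
```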

## 7. The Modern Recommended Approach: `with_structured_output()`

The latest trend in 2026: for models with native structured-output support (GPT-4o, Claude 3, Gemini), prefer `.with_structured_output()` over a classic output parser.

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class MovieReview(BaseModel):
    title: str = Field(description="Movie title")
    rating: float = Field(description="Rating from 0 to 10")
    summary: str = Field(description="One-sentence summary")
    recommended: bool = Field(description="Whether it is recommended")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Bind structured output directly; no parser needed!
structured_llm = llm.with_structured_output(MovieReview)

result = structured_llm.invoke("Review the movie Inception")
print(type(result))  # <class '__main__.MovieReview'>
print(result.title)      # "Inception"
print(result.rating)     # e.g. 8.8
print(result.recommended)  # True
```

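Advantages:

- Uses the model's native function calling / JSON mode, which is more reliable
- No need to write elaborate format instructions in the prompt
- Skips the parsing step and returns a Pydantic object directly

When to still use a classic parser:

- The model does not support structured output (e.g. small local models)
- You need complex post-processing (e.g. extracting JSON from mixed text)
- You need a fault-tolerant repair mechanism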

## 8. Custom Parsers

Subclass `BaseOutputParser` to implement your own parsing logic.

````python
import json
import re
from typing import Any

from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers import BaseOutputParser


class MarkdownJsonExtractor(BaseOutputParser):
    """Extract JSON from a Markdown code block."""

    def parse(self, text: str) -> Any:
        # Match a ```json ... ``` block
        match = re.search(r"```json\s*(.*?)\s*```", text, re.DOTALL)
        if not match:
            # OutputParserException (a ValueError subclass) lets OutputFixingParser
            # and the retry parsers catch the failure
            raise OutputParserException(f"No JSON code block found: {text[:100]}...")

        json_str = match.group(1)
        try:
            return json.loads(json_str)
        except json.JSONDecodeError as e:
            raise OutputParserException(f"Failed to parse JSON: {e}") from e

    def get_format_instructions(self) -> str:
        return "Wrap the JSON output in a ```json\n...\n``` code block."


# Usage
custom_parser = MarkdownJsonExtractor()

llm_output = """
Analysis complete! Here are the results:
```json
{
  "sentiment": "positive",
  "confidence": 0.95,
  "keywords": ["quality", "service"]
}
```
Let me know if you have any questions.
"""

result = custom_parser.parse(llm_output)
print(result)
# Output: {'sentiment': 'positive', 'confidence': 0.95, 'keywords': ['quality', 'service']}
````


---

## 9. Selection Guide

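| Scenario | Recommended approach | Reason |
|------|----------|------|
| Simple text Q&A | `StrOutputParser` | Lightest option |
| Comma-separated tags | `CommaSeparatedListOutputParser` | Done in one line of code |
| Type safety required | `PydanticOutputParser` | Automatic validation + IDE hints |
| Quick prototyping | `JsonOutputParser` | No model definition needed |
| High reliability in production | `with_structured_output()`, with `RetryWithErrorOutputParser` for parser-based chains | Native structured output first, retries as the fallback |
| Older or local models | `PydanticOutputParser` + `OutputFixingParser` | Error repair |
| Mixed-format extraction | Custom `BaseOutputParser` | Full control |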

---

## 10. Full Comparison Summary

| Parser | Input | Output | Auto format instructions | Fault tolerance | 2026 rating |
|--------|------|------|-------------|---------|------------|
| `StrOutputParser` | AIMessage | `str` | ❌ | ❌ | ⭐⭐⭐ |
| `CommaSeparatedListOutputParser` | str | `list` | ✅ | ❌ | ⭐⭐⭐ |
| `JsonOutputParser` | str | `dict` | ✅ | ❌ | ⭐⭐ |
| `PydanticOutputParser` | str | Pydantic object | ✅ | ❌ | ⭐⭐⭐⭐ |
| `DatetimeOutputParser` | str | `datetime` | ✅ | ❌ | ⭐⭐ |
| `EnumOutputParser` | str | Enum | ✅ | ❌ | ⭐⭐ |
| `XMLOutputParser` | str | `dict` | ✅ | ❌ | ⭐ |
| `OutputFixingParser` | str | Same as wrapped parser | Same as wrapped | ✅ | ⭐⭐⭐⭐ |
| `RetryWithErrorOutputParser` | str + prompt | Same as wrapped parser | Same as wrapped | ✅✅ | ⭐⭐⭐⭐⭐ |
| `with_structured_output()` | - | Pydantic object | Native support | Model-level | ⭐⭐⭐⭐⭐ |

---


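**Core principle**: for LangChain development in 2026, **use `with_structured_output()` whenever you can instead of a classic parser**; when you must use a parser, **always add a fault-tolerant wrapper in production** (`RetryWithErrorOutputParser`)[^1^][^2^][^5^].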
