13. LangChain Components: Structured Output

知1而N · Published 2026-03-25 09:26:52

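A detailed walkthrough of structured output: the core concepts, the two implementation strategies (Provider and Tool), custom configuration, and error handling, so that developers can quickly produce structured data that is ready to reuse.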

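1. What Is Structured Output?

Structured output lets a LangChain agent return data in a fixed, predictable format. Instead of parsing natural language, you get one of the following directly:

📦 A JSON object
📋 A Pydantic model (with field validation)
📝 A Python dataclass or TypedDict

Key advantages:

No brittle natural-language parsing, so there is less glue code to write
A uniform data format, which lowers the risk of integrating with downstream systems
Built-in field validation (with Pydantic schemas), which makes the data more reliable
Automatic retries when the model returns malformed output (with the tool strategy)

This article focuses on structured output in create_agent. To use structured output directly at the model level (without an agent), see the LangChain documentation on model structured output.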

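2. The Core Response Format (response_format)

create_agent controls structured output through its response_format parameter, which accepts four kinds of value:

Configuration	Description	Auto-selection behavior
`ProviderStrategy[SchemaT]`	Uses the model provider's native structured output (recommended)	Selected automatically when the model supports it
`ToolStrategy[SchemaT]`	Emulates structured output through tool calling	Fallback when the model lacks native support
`type[SchemaT]`	Pass the schema type directly (e.g. a Pydantic model)	LangChain picks the best strategy automatically
`None`	Structured output disabled	-

Auto-selection logic (LangChain ≥1.1):

LangChain reads the model's profile data to determine whether it supports native structured output
If it does, ProviderStrategy is used; otherwise ToolStrategy is used
You can override the default detection by supplying a model profile manually:

from langchain.chat_models import init_chat_model

# Manually declare that the model supports structured output
custom_profile = {"structured_output": True}
model = init_chat_model("your-model-name", profile=custom_profile)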
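
The selection rule above can be sketched in a few lines of plain Python. This is only an illustration of the decision, not LangChain's implementation: `pick_strategy` is a hypothetical helper, and the only profile key it looks at is the `structured_output` flag shown in the snippet above.

```python
from typing import Any, Mapping, Optional


def pick_strategy(profile: Optional[Mapping[str, Any]]) -> str:
    """Hypothetical helper: mirror the auto-selection rule described above."""
    # Treat a missing profile or a missing key as "no native support".
    if profile and profile.get("structured_output"):
        return "ProviderStrategy"
    return "ToolStrategy"


print(pick_strategy({"structured_output": True}))   # ProviderStrategy
print(pick_strategy({"structured_output": False}))  # ToolStrategy
print(pick_strategy(None))                          # ToolStrategy
```

The fallback direction is the important part: when nothing is known about the model, the sketch assumes the tool-calling path, which matches the table's description of ToolStrategy as the fallback.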

The structured result is stored under the structured_response key of the agent's final state.

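3. Provider Strategy (Native Structured Output)

3.1 When to use it

The model provider supports native structured output (no tool call needed)
You want the highest reliability and strict schema enforcement
You do not need other tools at the same time (or the model supports tool calling and structured output together)

3.2 Supported models and providers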

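OpenAI (GPT-4o, GPT-5; per OpenAI's documentation, GPT-4 Turbo and earlier models offer JSON mode only, not schema-enforced Structured Outputs)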
Anthropic (Claude)
xAI (Grok)
Google Gemini

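3.3 Core parameters

class ProviderStrategy(Generic[SchemaT]):
    schema: type[SchemaT]  # Required: the schema that defines the structured output
    strict: bool | None = None  # Optional: enable strict mode (LangChain ≥1.2)

schema: accepts a Pydantic model, dataclass, TypedDict, or JSON Schema
strict: supported by some providers (OpenAI, xAI); enforces strict adherence to the schema. Defaults to None (disabled)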
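
Before turning on strict, it can help to check a hand-written JSON Schema yourself. The sketch below is a hypothetical pre-flight check, not part of LangChain. Its two rules (every object sets `additionalProperties` to false, and every property is listed in `required`) are assumptions taken from OpenAI's published strict-mode requirements; other providers may differ, and whether LangChain adjusts a schema for you is not covered here.

```python
from typing import Any


def strict_mode_issues(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Hypothetical pre-flight check; both rules are assumptions based on OpenAI's strict-mode docs."""
    issues: list[str] = []
    if schema.get("type") != "object":
        return issues
    props = schema.get("properties", {})
    # Rule 1 (assumed): objects must forbid undeclared keys.
    if schema.get("additionalProperties") is not False:
        issues.append(f"{path}: additionalProperties must be false")
    # Rule 2 (assumed): every declared property must be listed as required.
    missing = sorted(set(props) - set(schema.get("required", [])))
    if missing:
        issues.append(f"{path}: not in required: {', '.join(missing)}")
    # Recurse into nested object properties.
    for name, sub in props.items():
        issues.extend(strict_mode_issues(sub, f"{path}.{name}"))
    return issues


book_schema = {
    "type": "object",
    "properties": {
        "book_title": {"type": "string"},
        "author": {"type": "string"},
        "publication_year": {"type": "integer"},
    },
    "required": ["book_title", "author"],
}

for issue in strict_mode_issues(book_schema):
    print(issue)
# $: additionalProperties must be false
# $: not in required: publication_year
```

Under these assumed rules, a schema with an optional field would need that field made required (for example as a nullable type) before strict mode accepts it.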

3.4 Hands-on examples (four schema types)

Example 1: Pydantic model (recommended, with field validation)

from pydantic import BaseModel, Field
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

# 1. Define the structured output schema (Pydantic model)
class ContactInfo(BaseModel):
    """Structured contact information."""
    name: str = Field(description="Contact name (required)")
    email: str = Field(description="Contact email; must be a valid email address")
    phone: str = Field(description="Contact phone number, including area code")

# 2. Initialize a model that supports native structured output
model = ChatOpenAI(model="gpt-4o", temperature=0)

# 3. Create the agent (ProviderStrategy is selected automatically)
agent = create_agent(
    model=model,
    tools=[],  # no extra tools
    response_format=ContactInfo  # pass the schema type directly
)

# 4. Run the agent
result = agent.invoke({
    "messages": [
        {"role": "user", "content": "Extract the contact info: Zhang San, email zhangsan@example.com, phone 010-12345678"}
    ]
})

# 5. Access the structured result (a validated Pydantic instance)
structured_data = result["structured_response"]
print(f"Name: {structured_data.name}")
print(f"Email: {structured_data.email}")
print(f"Phone: {structured_data.phone}")
print(f"Type: {type(structured_data)}")  # <class '__main__.ContactInfo'>

Output:

Name: Zhang San
Email: zhangsan@example.com
Phone: 010-12345678
Type: <class '__main__.ContactInfo'>

Example 2: Python dataclass

from dataclasses import dataclass
from langchain.agents import create_agent
from langchain_anthropic import ChatAnthropic

# 1. Define the schema as a dataclass
@dataclass
class Product:
    """Structured product information."""
    product_name: str
    price: float
    stock: int

# 2. Initialize a Claude model (swap in a current model ID if this one is unavailable to you)
model = ChatAnthropic(model="claude-3-sonnet-20240229")

# 3. Create the agent
agent = create_agent(
    model=model,
    tools=[],
    response_format=Product  # ProviderStrategy if the model profile reports native support, otherwise ToolStrategy
)

# 4. Run the agent and read the result (returned as a dict)
result = agent.invoke({
    "messages": [{"role": "user", "content": "Parse the product: iPhone 15, price 7999 yuan, 100 units in stock"}]
})

print(result["structured_response"])
# Output: {'product_name': 'iPhone 15', 'price': 7999.0, 'stock': 100}

Example 3: TypedDict

from typing import TypedDict
from langchain.agents import create_agent
from langchain_google_genai import ChatGoogleGenerativeAI

# 1. Define the schema as a TypedDict
class EventDetails(TypedDict):
    """Structured event details."""
    event_name: str
    date: str
    location: str

# 2. Initialize a Gemini model (swap in a current model ID if this one is unavailable to you)
model = ChatGoogleGenerativeAI(model="gemini-pro")

# 3. Create the agent
agent = create_agent(
    model=model,
    tools=[],
    response_format=EventDetails
)

# 4. Run the agent
result = agent.invoke({
    "messages": [{"role": "user", "content": "Extract the event: 2024 AI Summit, October 15, 2024, China National Convention Center, Beijing"}]
})

print(result["structured_response"])
# Output: {'event_name': '2024 AI Summit', 'date': 'October 15, 2024', 'location': 'China National Convention Center, Beijing'}

Example 4: JSON Schema

from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

# 1. Define the JSON Schema
json_schema = {
    "type": "object",
    "properties": {
        "book_title": {"type": "string", "description": "Title of the book"},
        "author": {"type": "string", "description": "Name of the author"},
        "publication_year": {"type": "integer", "description": "Year of publication"}
    },
    "required": ["book_title", "author"]
}

# 2. Initialize the model
model = ChatOpenAI(model="gpt-5")

# 3. Create the agent
agent = create_agent(
    model=model,
    tools=[],
    response_format=json_schema  # pass the JSON Schema directly
)

# 4. Run the agent
result = agent.invoke({
    "messages": [{"role": "user", "content": "Parse the book: LangChain in Action, by Zhang San, published in 2024"}]
})

print(result["structured_response"])
# Output: {'book_title': 'LangChain in Action', 'author': 'Zhang San', 'publication_year': 2024}

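4. Tool Strategy (Structured Output via Tool Calling)

4.1 When to use it

The model does not support native structured output (for example older or niche models)
You need other tools at the same time (structured output alongside regular tool calls)
You want to customize error handling or the content of the tool message

4.2 Core parameters

class ToolStrategy(Generic[SchemaT]):
    schema: type[SchemaT]  # Required: the structured output schema
    tool_message_content: str | None = None  # Optional: custom tool message content
    handle_errors: Union[bool, str, type[Exception], tuple[type[Exception], ...], Callable[[Exception], str]] = True  # Optional: error-handling strategy

Parameter	Description	Default
`schema`	Accepts a Pydantic model, dataclass, TypedDict, JSON Schema, or a Union of schemas	-
`tool_message_content`	Content of the tool message emitted when structured output is produced (custom log or notice)	A message showing the structured data
`handle_errors`	Error-handling strategy (see Section 5.3)	`True` (catch all errors and retry)

4.3 Hands-on example (with a Union type)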

from pydantic import BaseModel, Field
from typing import Literal, Union
from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy
from langchain_openai import ChatOpenAI

# 1. Define several schemas (a Union of them is supported)
class ProductReview(BaseModel):
    """Analysis of a product review."""
    rating: int | None = Field(description="Rating from 1 to 5", ge=1, le=5)
    sentiment: Literal["positive", "negative", "neutral"] = Field(description="Sentiment of the review")
    key_points: list[str] = Field(description="Key points, each 1-3 words or a short phrase")

class ServiceFeedback(BaseModel):
    """Analysis of service feedback."""
    satisfaction: Literal["satisfied", "dissatisfied", "average"] = Field(description="Satisfaction level")
    suggestions: list[str] = Field(description="Suggestions for improvement")

# 2. Initialize the model (assumed here not to support native structured output)
model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# 3. Create the agent (ToolStrategy specified explicitly)
agent = create_agent(
    model=model,
    tools=[],  # other tools can be added here
    response_format=ToolStrategy(
        schema=Union[ProductReview, ServiceFeedback],  # the model picks one of the schemas
        handle_errors=True  # retry automatically on errors
    )
)

# 4. Run the agent (a product review, so the ProductReview schema should match)
result = agent.invoke({
    "messages": [
        {"role": "user", "content": "Analyze this review: This phone is great to use, the photos are sharp, the battery lasts, 5 out of 5!"}
    ]
})

# 5. Inspect the result
structured_data = result["structured_response"]
print(f"Schema type: {type(structured_data).__name__}")
print(f"Rating: {structured_data.rating}")
print(f"Sentiment: {structured_data.sentiment}")
print(f"Key points: {structured_data.key_points}")

Output:

Schema type: ProductReview
Rating: 5
Sentiment: positive
Key points: ['sharp photos', 'long battery life', 'great to use']
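
With a Union schema, downstream code has to find out which schema was matched before touching any fields. A minimal sketch of that dispatch follows, using stdlib dataclasses as stand-ins for the two Pydantic models so it runs without LangChain; `summarize` is a hypothetical handler name.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class ProductReview:  # stdlib stand-in for the Pydantic model above
    rating: Optional[int]
    sentiment: str
    key_points: list[str] = field(default_factory=list)


@dataclass
class ServiceFeedback:  # stdlib stand-in for the Pydantic model above
    satisfaction: str
    suggestions: list[str] = field(default_factory=list)


def summarize(response: Union[ProductReview, ServiceFeedback]) -> str:
    """Hypothetical downstream handler that branches on the matched schema."""
    if isinstance(response, ProductReview):
        return f"review: {response.sentiment}, rating={response.rating}"
    if isinstance(response, ServiceFeedback):
        return f"feedback: {response.satisfaction}, {len(response.suggestions)} suggestion(s)"
    raise TypeError(f"unexpected response type: {type(response).__name__}")


print(summarize(ProductReview(5, "positive", ["sharp photos"])))
# review: positive, rating=5
print(summarize(ServiceFeedback("average", ["faster replies", "longer hours"])))
# feedback: average, 2 suggestion(s)
```

The same isinstance checks apply unchanged to the real Pydantic instances stored in result["structured_response"].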

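4.4 Customizing the tool message content

Use tool_message_content to customize the tool message that is added to the conversation history when structured output is produced (by default it shows the raw structured data):

from typing import Literal

from pydantic import BaseModel, Field
from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy
from langchain_openai import ChatOpenAI

# Define the schema
class MeetingAction(BaseModel):
    """Action item extracted from a meeting."""
    task: str = Field(description="The specific task")
    assignee: str = Field(description="Person responsible")
    priority: Literal["low", "medium", "high"] = Field(description="Priority")

# Create the agent
agent = create_agent(
    model=ChatOpenAI(model="gpt-4o"),
    tools=[],
    response_format=ToolStrategy(
        schema=MeetingAction,
        tool_message_content="✅ Meeting action item extracted and synced to the task management system!"  # custom tool message
    )
)

# Run the agent
result = agent.invoke({
    "messages": [{"role": "user", "content": "Extract the task: Li Si must finish the project proposal by Friday, high priority"}]
})

# Inspect the tool messages in the conversation history
for msg in result["messages"]:
    if msg.type == "tool":  # messages are objects, not dicts
        print(f"Tool message: {msg.content}")

Output:

Tool message: ✅ Meeting action item extracted and synced to the task management system!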

5. Error Handling

The tool strategy comes with a built-in retry mechanism that handles the common structured output errors:

5.1 Multiple structured outputs

When the model returns results for several schemas at once (for example, it picks more than one member of a Union), the agent automatically asks it to retry:

from pydantic import BaseModel, Field
from typing import Union
from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy
from langchain_openai import ChatOpenAI

# Define two schemas
class ContactInfo(BaseModel):
    name: str = Field(description="Name")
    email: str = Field(description="Email")

class EventDetails(BaseModel):
    event_name: str = Field(description="Event name")
    date: str = Field(description="Event date")

# Create the agent
agent = create_agent(
    model=ChatOpenAI(model="gpt-3.5-turbo"),
    tools=[],
    response_format=ToolStrategy(Union[ContactInfo, EventDetails])
)

# Run the agent (the model may wrongly return two results)
result = agent.invoke({
    "messages": [{"role": "user", "content": "Extract the info: Wang Wu (wangwu@example.com) will attend the seminar on June 1"}]
})

Conversation history:

================================ Human Message =================================
Extract the info: Wang Wu (wangwu@example.com) will attend the seminar on June 1

================================== Ai Message ==================================
Tool Calls:
  ContactInfo (call_1)
  Args: {'name': 'Wang Wu', 'email': 'wangwu@example.com'}
  EventDetails (call_2)
  Args: {'event_name': 'Seminar', 'date': 'June 1'}

================================= Tool Message =================================
Error: Model incorrectly returned multiple structured responses (ContactInfo, EventDetails) when only one is expected. Please fix your mistakes.

================================== Ai Message ==================================
Tool Calls:
  ContactInfo (call_3)
  Args: {'name': 'Wang Wu', 'email': 'wangwu@example.com'}

================================= Tool Message =================================
Returning structured response: {'name': 'Wang Wu', 'email': 'wangwu@example.com'}
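
The check behind that first error message can be sketched as follows. This is an illustration under stated assumptions, not LangChain's code: tool calls are modeled as plain dicts with a `name` key, `SCHEMA_TOOL_NAMES` and `multiple_output_error` are hypothetical names, and the message wording is copied from the transcript above.

```python
from typing import Any, Optional

# Names of the structured-output "tools" derived from the Union members (assumed).
SCHEMA_TOOL_NAMES = {"ContactInfo", "EventDetails"}


def multiple_output_error(tool_calls: list[dict[str, Any]]) -> Optional[str]:
    """Return a retry message if more than one schema tool was called, else None."""
    names = [call["name"] for call in tool_calls if call["name"] in SCHEMA_TOOL_NAMES]
    if len(names) > 1:
        return (
            "Error: Model incorrectly returned multiple structured responses "
            f"({', '.join(names)}) when only one is expected. Please fix your mistakes."
        )
    return None


first_turn = [
    {"name": "ContactInfo", "args": {"name": "Wang Wu", "email": "wangwu@example.com"}},
    {"name": "EventDetails", "args": {"event_name": "Seminar", "date": "June 1"}},
]
second_turn = [first_turn[0]]

print(multiple_output_error(first_turn))   # the error text shown in the transcript
print(multiple_output_error(second_turn))  # None
```

Calls to ordinary tools are ignored by the filter, so only competing schema calls trigger a retry in this sketch.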

5.2 Schema validation errors

When the output violates the schema (for example a value out of range or a missing field), the agent sends the specific error back to the model and retries:

from pydantic import BaseModel, Field
from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy
from langchain_openai import ChatOpenAI

# Define a schema with validation rules
class ProductRating(BaseModel):
    rating: int | None = Field(description="Rating from 1 to 5", ge=1, le=5)
    comment: str = Field(description="Review text")

# Create the agent
agent = create_agent(
    model=ChatOpenAI(model="gpt-3.5-turbo"),
    tools=[],
    response_format=ToolStrategy(ProductRating),
    system_prompt="Follow the schema strictly; the rating must be between 1 and 5"
)

# Run the agent (the input rating of 10 is out of range)
result = agent.invoke({
    "messages": [{"role": "user", "content": "Parse the review: This product is amazing, 10 out of 10!"}]
})

Conversation history:

================================== Ai Message ==================================
Tool Calls:
  ProductRating (call_1)
  Args: {'rating': 10, 'comment': 'This product is amazing'}

================================= Tool Message =================================
Error: Failed to parse structured output for tool 'ProductRating': 1 validation error for ProductRating.rating
  Input should be less than or equal to 5 [type=less_than_equal, input_value=10, input_type=int]. Please fix your mistakes.

================================== Ai Message ==================================
Tool Calls:
  ProductRating (call_2)
  Args: {'rating': 5, 'comment': 'This product is amazing'}

================================= Tool Message =================================
Returning structured response: {'rating': 5, 'comment': 'This product is amazing'}
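
The validate-then-retry cycle in that transcript can be reproduced without an LLM. The sketch below is a simplified model of the idea, not the agent's actual loop: `run_with_retries`, `fake_model`, and the `max_attempts` cap are all hypothetical, and a dataclass with a hand-written range check stands in for the Pydantic schema.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ProductRating:  # stand-in for the Pydantic model; validation is done by hand
    rating: Optional[int]
    comment: str

    def __post_init__(self) -> None:
        if self.rating is not None and not 1 <= self.rating <= 5:
            raise ValueError(f"rating must be between 1 and 5, got {self.rating}")


def run_with_retries(
    model: Callable[[list[str]], dict[str, Any]],
    prompt: str,
    max_attempts: int = 3,  # assumed cap, chosen only for this sketch
) -> ProductRating:
    """Hypothetical loop: feed each validation error back to the model and ask again."""
    messages = [prompt]
    for _ in range(max_attempts):
        args = model(messages)
        try:
            return ProductRating(**args)
        except (TypeError, ValueError) as exc:
            messages.append(f"Error: {exc}. Please fix your mistakes.")
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def fake_model(messages: list[str]) -> dict[str, Any]:
    """Scripted stand-in for an LLM: out of range first, corrected after feedback."""
    corrected = any(m.startswith("Error:") for m in messages)
    return {"rating": 5 if corrected else 10, "comment": "This product is amazing"}


result = run_with_retries(fake_model, "Parse the review: This product is amazing, 10 out of 10!")
print(result)
# ProductRating(rating=5, comment='This product is amazing')
```

The key design point is that the error text goes back into the conversation, so the model sees exactly which constraint it violated.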

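5.3 Custom error-handling strategies

The handle_errors parameter accepts five kinds of value: a bool, a string, an exception type, a tuple of exception types, or a callable. Besides the default True, the four configurations below are available. The snippets reuse the schemas from Sections 5.1 and 5.2 and assume ToolStrategy (and Union for Configuration 3) is already imported.

Configuration 1: Custom error message

ToolStrategy(
    schema=ProductRating,
    handle_errors="Please provide a rating from 1 to 5 and include the review text!"  # fixed error message
)

Configuration 2: Catch only specific exceptions

from pydantic import ValidationError

ToolStrategy(
    schema=ProductRating,
    handle_errors=ValidationError  # catch only Pydantic validation errors; anything else is raised
)

# Catch several exception types
ToolStrategy(
    schema=ProductRating,
    handle_errors=(ValidationError, ValueError)
)

Configuration 3: Custom error-handling function

from langchain.agents.structured_output import (
    StructuredOutputValidationError,
    MultipleStructuredOutputsError
)

def custom_error_handler(error: Exception) -> str:
    """Build the retry message from the error type."""
    if isinstance(error, StructuredOutputValidationError):
        return "Format error! Check the field types and ranges (rating must be 1-5)"
    elif isinstance(error, MultipleStructuredOutputsError):
        return "Return exactly one structured result; choose the most relevant type"
    else:
        return f"Unknown error: {str(error)}. Please try again"

# Use the custom function
ToolStrategy(
    schema=Union[ContactInfo, EventDetails],
    handle_errors=custom_error_handler
)

Configuration 4: Disable error handling (raise the exception directly)

ToolStrategy(
    schema=ProductRating,
    handle_errors=False  # no retry; the exception is raised
)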
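
To see how the different handle_errors values lead to different outcomes, here is a minimal sketch of a resolver. It is an illustration of the behavior described in this section, not the library's implementation: `retry_message` is a hypothetical function, and `DEFAULT_TEMPLATE` is an assumed wording modeled on the error messages in the transcripts above.

```python
from typing import Callable, Union

HandleErrors = Union[bool, str, type, tuple, Callable[[Exception], str]]
DEFAULT_TEMPLATE = "Error: {error}. Please fix your mistakes."  # assumed wording


def retry_message(handle_errors: HandleErrors, error: Exception) -> str:
    """Return the message to send back to the model, or re-raise the error."""
    if handle_errors is True:   # default: retry on every error
        return DEFAULT_TEMPLATE.format(error=error)
    if handle_errors is False:  # disabled: surface the exception
        raise error
    if isinstance(handle_errors, str):  # fixed custom message
        return handle_errors
    # A single exception class or a tuple of classes: retry only on a match.
    # This check must precede the callable case, because classes are callable too.
    if isinstance(handle_errors, (type, tuple)):
        if isinstance(error, handle_errors):
            return DEFAULT_TEMPLATE.format(error=error)
        raise error
    # Anything else is assumed to be a function that builds the message.
    return handle_errors(error)


err = ValueError("rating out of range")
print(retry_message(True, err))                        # Error: rating out of range. Please fix your mistakes.
print(retry_message("Please rate from 1 to 5", err))   # Please rate from 1 to 5
print(retry_message((ValueError, KeyError), err))      # Error: rating out of range. Please fix your mistakes.
print(retry_message(lambda e: f"custom: {e}", err))    # custom: rating out of range
```

With an exception type that does not match (for example `retry_message(KeyError, err)`) or with `False`, the sketch re-raises the original error instead of returning a message.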

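6. Key Considerations

Model compatibility:
- The provider strategy works only with specific models (OpenAI, Claude, Gemini, and others)
- The tool strategy works with any model that supports tool calling
Combining tools with structured output:
- If the agent also needs other tools, make sure the model supports tool calling and structured output at the same time
- If it does not, LangChain automatically falls back to the tool strategy
Data type conversion:
- A Pydantic schema returns a Pydantic instance (call .model_dump() to convert it to a dict; .dict() is the deprecated Pydantic v1 name)
- dataclass, TypedDict, and JSON Schema schemas return a dict
LangChain version requirements:
- The strict parameter requires LangChain ≥1.2
- Automatic profile detection requires LangChain ≥1.1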
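
Because the return type depends on the schema kind, downstream code that always wants a plain dict can normalize the result first. The sketch below shows one way to do that; `to_plain_dict` is a hypothetical helper, and it detects Pydantic instances by duck typing on `model_dump` so the example runs with the standard library alone.

```python
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any


def to_plain_dict(response: Any) -> dict[str, Any]:
    """Hypothetical helper: normalize any structured_response to a plain dict."""
    if isinstance(response, dict):       # dict results (TypedDict, JSON Schema)
        return dict(response)
    if hasattr(response, "model_dump"):  # Pydantic v2 instances
        return response.model_dump()
    if is_dataclass(response) and not isinstance(response, type):
        return asdict(response)          # dataclass instances, in case one is returned
    raise TypeError(f"unsupported structured response: {type(response).__name__}")


@dataclass
class Product:
    product_name: str
    price: float
    stock: int


print(to_plain_dict({"product_name": "iPhone 15", "price": 7999.0, "stock": 100}))
# {'product_name': 'iPhone 15', 'price': 7999.0, 'stock': 100}
print(to_plain_dict(Product("iPhone 15", 7999.0, 100)))
# {'product_name': 'iPhone 15', 'price': 7999.0, 'stock': 100}
```

In an agent this would be called as to_plain_dict(result["structured_response"]), which keeps the rest of the pipeline independent of the schema type chosen.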