014 - Wrapping Chinese LLM APIs

牧子川 · Published 2026-05-17 13:25:13

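Wrapping Chinese LLM APIs: one interface for Zhipu GLM, Tongyi Qianwen (Qwen), and other Chinese models

💡 Summary: An abstract base class, BaseModelClient, unifies the different vendors' SDK interfaces and provides async chat calls, cost calculation, and smart routing recommendations, so that calling Chinese LLMs feels like calling a single service.

Introduction

If you have worked with Chinese LLMs, you have probably run into these pain points:

Zhipu GLM is called through a REST API while Tongyi Qianwen goes through the dashscope SDK, so the calling styles are completely different
Each model has its own pricing, so comparing costs means looking up every vendor's price list
GLM does well on code generation and Qwen sounds more natural in Chinese conversation, but switching between them by hand is tedious

SDK interfaces tend to differ more among Chinese LLM vendors than among international ones. Some expose a standard OpenAI-compatible interface, some require calling a REST API directly, and some ship a proprietary SDK. Writing a separate set of calling code for each model is redundant, hard to maintain, and makes the models hard to compare.

This article shows how to combine an abstract base class with a smart router so that a single interface can call Zhipu GLM, Tongyi Qianwen, and other Chinese models, and can pick a suitable model automatically based on the task type.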

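Core concepts

Unifying the interface with an abstract base class

Vendor SDK interfaces vary widely, but an abstract base class (ABC) lets us define one shared behavioral contract: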

from abc import ABC, abstractmethod
from typing import List, Dict

class BaseModelClient(ABC):
    """Abstract base class for model clients"""

    @abstractmethod
    async def chat(self, messages: List[Dict]) -> str:
        """Async chat interface"""
        pass

    @abstractmethod
    def get_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Calculate the cost (in RMB)"""
        pass

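💡 Why use an abstract base class?
An abstract base class forces every concrete subclass to provide both chat() and get_cost(); a subclass that omits either one cannot be instantiated. Upper-layer code then depends only on the BaseModelClient interface and does not care which model sits underneath, which decouples the two.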
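To make the enforcement concrete, here is a minimal sketch. IncompleteClient and try_instantiate are hypothetical names introduced only for this illustration; the base class mirrors the one above.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseModelClient(ABC):
    """Same contract as above: every client must implement both methods."""

    @abstractmethod
    async def chat(self, messages: List[Dict]) -> str: ...

    @abstractmethod
    def get_cost(self, input_tokens: int, output_tokens: int) -> float: ...


class IncompleteClient(BaseModelClient):
    """Hypothetical client that forgets to implement get_cost()."""

    async def chat(self, messages: List[Dict]) -> str:
        return "hello"


def try_instantiate() -> str:
    """Report what happens when the incomplete subclass is instantiated."""
    try:
        IncompleteClient()
    except TypeError as exc:
        # Python refuses to create the object and names the missing method.
        return f"TypeError: {exc}"
    return "instantiated"
```

Calling try_instantiate() returns a message that begins with "TypeError" and mentions get_cost, so the mistake surfaces when the client is constructed rather than on the first request.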

Two calling styles

The calling styles of Chinese models differ considerably:

Zhipu GLM-4: call the REST API directly

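import httpx

async def call_glm(api_key: str, messages: list) -> str:
    # POST the chat request to Zhipu's chat-completions endpoint
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://open.bigmodel.cn/api/paas/v4/chat/completions",
            json={"model": "glm-4", "messages": messages},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30
        )
        response.raise_for_status()  # fail loudly on HTTP errors
        return response.json()['choices'][0]['message']['content']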

Tongyi Qianwen: use the dashscope SDK

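import asyncio
import dashscope

async def call_qwen(api_key: str, messages: list) -> str:
    # Generation.call is synchronous, so run it in a worker thread
    # to keep the event loop free.
    response = await asyncio.to_thread(
        dashscope.Generation.call,
        model="qwen-max",
        messages=messages,
        api_key=api_key,
        result_format="message",  # makes output.choices available
    )
    return response.output.choices[0].message.content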

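Key differences:

GLM is called here without an SDK: httpx talks to the REST API directly
Qwen is called through its official SDK, dashscope
The response structures are completely different (a JSON dict versus an SDK response object), so each needs its own adapter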
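The adaptation step can be sketched as two small extractor functions. The function names and the hand-built sample responses below are hypothetical stand-ins that mimic the two response shapes shown above; no real API is called.

```python
from types import SimpleNamespace
from typing import Any, Dict


def extract_glm_text(payload: Dict[str, Any]) -> str:
    """GLM's REST API returns JSON, so the reply is reached by dict keys."""
    return payload["choices"][0]["message"]["content"]


def extract_qwen_text(response: Any) -> str:
    """The dashscope response is an object, so the reply is reached by attributes."""
    return response.output.choices[0].message.content


# Hand-built stand-ins for the two response shapes (not real API output).
fake_glm_response = {
    "choices": [{"message": {"role": "assistant", "content": "hi from GLM"}}]
}
fake_qwen_response = SimpleNamespace(
    output=SimpleNamespace(
        choices=[SimpleNamespace(message=SimpleNamespace(content="hi from Qwen"))]
    )
)
```

Both extractors return a plain str, which is exactly what the chat() contract promises to upper-layer code.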

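A unified cost-calculation interface

Each model is priced differently, so a shared get_cost() interface makes cost comparison easy:

Model	Input price	Output price
GLM-4	￥0.01 / 1K tokens	￥0.03 / 1K tokens
Qwen-Max	￥0.04 / 1K tokens	￥0.12 / 1K tokens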

# GLM-4 cost calculation (method of the GLM client)
def get_cost(self, input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000 * 0.01) + (output_tokens / 1000 * 0.03)

# Qwen-Max cost calculation (method of the Qwen client)
def get_cost(self, input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000 * 0.04) + (output_tokens / 1000 * 0.12)
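As a worked comparison, the sketch below applies the prices from the table to one request. PRICES_PER_1K and estimate_cost are hypothetical names introduced for illustration, and the token counts are made-up inputs.

```python
from typing import Dict, Tuple

# (input price, output price) in RMB per 1K tokens, copied from the table above.
PRICES_PER_1K: Dict[str, Tuple[float, float]] = {
    "glm-4": (0.01, 0.03),
    "qwen-max": (0.04, 0.12),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in RMB for one request, using the same formula as get_cost()."""
    input_price, output_price = PRICES_PER_1K[model]
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price


# Example request: 2000 input tokens and 1000 output tokens.
glm_cost = estimate_cost("glm-4", 2000, 1000)      # 0.02 + 0.03
qwen_cost = estimate_cost("qwen-max", 2000, 1000)  # 0.08 + 0.12
```

For this request the totals come to roughly ￥0.05 for GLM-4 and ￥0.20 for Qwen-Max, a fourfold difference at the listed prices.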

A closer look

ModelRouter: smart routing

The smart router picks the most suitable model automatically based on the task type:

import os

class ModelRouter:
    """Smart router: pick the most suitable model for the task type"""

    def __init__(self):
        self.clients = {
            'glm': ZhipuGLMClient(os.getenv('ZHIPU_API_KEY')),
            'qwen': TongyiQwenClient(os.getenv('ALIYUN_API_KEY'))
        }
        self.task_model_map = {
            'code': 'glm',       # code generation: prefer GLM
            'chat': 'qwen',      # Chinese conversation: prefer Qwen
            'analysis': 'glm'    # data analysis: prefer GLM
        }

    async def route(self, task_type: str, messages: List[Dict]) -> tuple:
        """Route to the best model for the task type"""
        model_key = self.task_model_map.get(task_type, 'qwen')
        client = self.clients[model_key]
        content = await client.chat(messages)
        # Rough estimate: 100 input tokens (placeholder) and
        # about 1.5 tokens per character of output, truncated to an int.
        cost = client.get_cost(100, int(len(content) * 1.5))
        return content, cost

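Routing logic:

Code generation → GLM-4 (strong at code, and cheap)
Chinese conversation → Qwen-Max (excellent Chinese understanding)
Data analysis → GLM-4 (relatively good mathematical reasoning)
Default → Qwen-Max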
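Stripped of the network calls, the routing rules above reduce to a dictionary lookup with a default. The sketch below isolates that step; TASK_MODEL_MAP, DEFAULT_MODEL, and pick_model are hypothetical module-level names that mirror task_model_map in ModelRouter.

```python
from typing import Dict

# Task type -> client key, mirroring ModelRouter.task_model_map.
TASK_MODEL_MAP: Dict[str, str] = {
    "code": "glm",
    "chat": "qwen",
    "analysis": "glm",
}
DEFAULT_MODEL = "qwen"  # used for any task type not listed above


def pick_model(task_type: str) -> str:
    """Return the client key for a task type, falling back to the default."""
    return TASK_MODEL_MAP.get(task_type, DEFAULT_MODEL)
```

Because dict.get() supplies the fallback, an unrecognized task type such as "translation" goes to Qwen instead of raising a KeyError.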

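Advantages of async HTTP with httpx

The GLM client here talks to the REST API directly instead of going through an SDK, and httpx.AsyncClient makes those calls asynchronous: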

async with httpx.AsyncClient() as client:
    response = await client.post(url, json=payload, headers=headers)

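Compared with the synchronous requests library:

Non-blocking: waiting for the API response does not block the event loop, so other coroutines keep running
Concurrency-friendly: it pairs with asyncio.gather() to call several models at the same time
Connection pooling: an AsyncClient manages and reuses TCP connections, which cuts handshake overhead as long as the same client instance is shared across requests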
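The concurrency benefit can be sketched without any network access. In the example below, fake_model_call is a hypothetical stand-in that simulates API latency with asyncio.sleep; a real client's chat() coroutine would take its place.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_model_call(name: str, delay: float) -> str:
    """Stand-in for an API call: waits without blocking the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def call_all() -> List[str]:
    """Run both simulated calls concurrently; results keep argument order."""
    return await asyncio.gather(
        fake_model_call("glm", 0.3),
        fake_model_call("qwen", 0.3),
    )


def timed_run() -> Tuple[List[str], float]:
    """Run call_all() and report how long the whole batch took."""
    start = time.perf_counter()
    results = asyncio.run(call_all())
    return results, time.perf_counter() - start
```

Because the two waits overlap, the batch finishes in roughly the time of the slowest call (about 0.3 s here) instead of the sum of both.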

Code examples

Full implementation

from abc import ABC, abstractmethod
from typing import List, Dict
import asyncio
import os
import httpx
import dashscope

class BaseModelClient(ABC):
    """Abstract base class for model clients"""

    @abstractmethod
    async def chat(self, messages: List[Dict]) -> str:
        """Async chat interface"""
        pass

    @abstractmethod
    def get_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Calculate the cost (in RMB)"""
        pass

class ZhipuGLMClient(BaseModelClient):
    """Zhipu GLM-4 client"""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

    async def chat(self, messages: List[Dict]) -> str:
        """Call the REST API asynchronously with httpx"""
        payload = {
            "model": "glm-4",
            "messages": messages
        }
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        async with httpx.AsyncClient() as client:
            response = await client.post(
                self.base_url,
                json=payload,
                headers=headers,
                timeout=30
            )
            response.raise_for_status()  # surface HTTP errors instead of a confusing KeyError
            result = response.json()
            return result['choices'][0]['message']['content']

    def get_cost(self, input_tokens: int, output_tokens: int) -> float:
        # GLM-4 pricing: ￥0.01 / 1K tokens (input) + ￥0.03 / 1K tokens (output)
        return (input_tokens / 1000 * 0.01) + (output_tokens / 1000 * 0.03)

class TongyiQwenClient(BaseModelClient):
    """Tongyi Qianwen client"""

    def __init__(self, api_key: str):
        self.api_key = api_key

    async def chat(self, messages: List[Dict]) -> str:
        """Call the dashscope SDK without blocking the event loop"""
        # Generation.call is synchronous, so run it in a worker thread.
        response = await asyncio.to_thread(
            dashscope.Generation.call,
            model="qwen-max",
            messages=messages,
            api_key=self.api_key,
            result_format="message",  # makes output.choices available
        )
        return response.output.choices[0].message.content

    def get_cost(self, input_tokens: int, output_tokens: int) -> float:
        # Qwen-Max pricing: ￥0.04 / 1K tokens (input) + ￥0.12 / 1K tokens (output)
        return (input_tokens / 1000 * 0.04) + (output_tokens / 1000 * 0.12)

Smart routing usage example

class ModelRouter:
    """Smart router: pick the most suitable model for the task type"""

    def __init__(self):
        self.clients = {
            'glm': ZhipuGLMClient(os.getenv('ZHIPU_API_KEY')),
            'qwen': TongyiQwenClient(os.getenv('ALIYUN_API_KEY'))
        }
        self.task_model_map = {
            'code': 'glm',
            'chat': 'qwen',
            'analysis': 'glm'
        }

    async def route(self, task_type: str, messages: List[Dict]) -> tuple:
        """Route to the best model for the task type; returns (content, cost)"""
        model_key = self.task_model_map.get(task_type, 'qwen')
        client = self.clients[model_key]

        content = await client.chat(messages)
        # Rough token estimate: about 1.5 tokens per character, truncated to an int
        estimated_tokens = int(len(content) * 1.5)
        # 100 is a placeholder for the input token count
        cost = client.get_cost(100, estimated_tokens)

        return content, cost

# Usage example
async def main():
    router = ModelRouter()

    # Code generation task → routed to GLM automatically
    code_msg = [{"role": "user", "content": "Write a binary tree traversal in Python"}]
    content, cost = await router.route('code', code_msg)
    print(f"Model reply: {content[:200]}...")
    print(f"Estimated cost: ￥{cost:.4f}")

    # Chinese conversation task → routed to Qwen automatically
    # (the prompt asks for a poem about spring, in Chinese)
    chat_msg = [{"role": "user", "content": "写一首关于春天的诗"}]
    content, cost = await router.route('chat', chat_msg)
    print(f"Model reply: {content[:200]}...")
    print(f"Estimated cost: ￥{cost:.4f}")

asyncio.run(main())
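The example above needs real API keys and network access. For trying out the routing and cost logic offline, one option is to inject stub clients. The sketch below is an assumption-laden variant, not the article's ModelRouter: StubClient and InjectableRouter are hypothetical names, and the router takes its clients as constructor arguments instead of building them itself.

```python
import asyncio
from typing import Dict, List, Tuple


class StubClient:
    """Hypothetical offline client: returns a canned reply, no network."""

    def __init__(self, reply: str, input_price: float, output_price: float):
        self.reply = reply
        self.input_price = input_price    # RMB per 1K input tokens
        self.output_price = output_price  # RMB per 1K output tokens

    async def chat(self, messages: List[Dict]) -> str:
        return self.reply

    def get_cost(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens / 1000 * self.input_price
                + output_tokens / 1000 * self.output_price)


class InjectableRouter:
    """Router variant that receives its clients instead of creating them."""

    def __init__(self, clients: Dict[str, StubClient],
                 task_model_map: Dict[str, str], default: str = "qwen"):
        self.clients = clients
        self.task_model_map = task_model_map
        self.default = default

    async def route(self, task_type: str, messages: List[Dict]) -> Tuple[str, float]:
        client = self.clients[self.task_model_map.get(task_type, self.default)]
        content = await client.chat(messages)
        # Same rough estimate as above: 100 input tokens, 1.5 tokens per char.
        estimated_output_tokens = int(len(content) * 1.5)
        return content, client.get_cost(100, estimated_output_tokens)


demo_router = InjectableRouter(
    clients={
        "glm": StubClient("glm reply", 0.01, 0.03),
        "qwen": StubClient("qwen reply", 0.04, 0.12),
    },
    task_model_map={"code": "glm", "chat": "qwen", "analysis": "glm"},
)
```

Running asyncio.run(demo_router.route("code", [])) returns the GLM stub's reply with its estimated cost, which exercises the lookup and the cost arithmetic without spending any tokens.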

In practice

How do you add more Chinese models?

Just subclass BaseModelClient and register the new client with the router:

from openai import AsyncOpenAI

class MoonshotClient(BaseModelClient):
    """Moonshot AI (Kimi) client"""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.client = AsyncOpenAI(
            api_key=api_key,
            base_url="https://api.moonshot.cn/v1"  # OpenAI-compatible interface
        )

    async def chat(self, messages: List[Dict]) -> str:
        response = await self.client.chat.completions.create(
            model="moonshot-v1-8k",
            messages=messages
        )
        return response.choices[0].message.content

    def get_cost(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens / 1000 * 0.012) + (output_tokens / 1000 * 0.012)

# Register with an existing router instance
router = ModelRouter()
router.clients['moonshot'] = MoonshotClient(os.getenv('MOONSHOT_API_KEY'))
router.task_model_map['long_context'] = 'moonshot'  # long context: prefer Kimi

Adding exception handling

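async def chat(self, messages: List[Dict]) -> str:
    try:
        async with httpx.AsyncClient() as client:
            response = await client.post(...)
            response.raise_for_status()
            return response.json()['choices'][0]['message']['content']
    except httpx.TimeoutException as exc:
        raise RuntimeError("GLM API request timed out") from exc
    except httpx.HTTPStatusError as exc:
        # raised by raise_for_status() on 4xx/5xx responses
        raise RuntimeError(f"GLM API returned HTTP {exc.response.status_code}") from exc
    except (KeyError, IndexError) as exc:
        raise RuntimeError("Unexpected GLM API response format") from exc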
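Once failures are raised as RuntimeError, a caller can decide whether to retry. The sketch below shows one possible approach using exponential backoff; with_retries and FlakyCall are hypothetical helpers, the retry count and delays are arbitrary assumptions, and not every failure is worth retrying (a malformed response, for instance, is unlikely to fix itself).

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(call: Callable[[], Awaitable[T]],
                       attempts: int = 3, base_delay: float = 0.01) -> T:
    """Await call(); on RuntimeError, retry with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except RuntimeError:
            if attempt == attempts:
                raise  # out of attempts: let the last error propagate
            # Wait base_delay, then 2x, 4x, ... before the next try.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


class FlakyCall:
    """Hypothetical stand-in for chat(): fails a fixed number of times first."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    async def __call__(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("GLM API request timed out")
        return "ok"
```

In real code the callable would be something like lambda: client.chat(messages), so that each attempt creates a fresh coroutine.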

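Best practices

Read API keys from environment variables: use os.getenv() so that secrets are never hard-coded
Define the contract with an abstract base class: every model client implements the same interface, and upper-layer code does not care about the concrete implementation
Use async calls to cut latency: replace synchronous requests with async/await, and use asyncio.gather() for concurrent calls
Keep one cost-calculation interface: it makes cost comparison and optimization easier, so you can pick the most cost-effective model
Let the router recommend by task: prefer GLM for code generation (cheap and strong at code) and Qwen for Chinese conversation
Fall back to the REST API: for models without an official SDK, call the REST API directly with httpx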
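One caveat on the first point: os.getenv() returns None when a variable is missing, and the failure then shows up later as an authentication error. A small fail-fast helper is one way to catch this at startup. The sketch below is an assumption; require_env is a hypothetical name, not part of the code above.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail fast if unset."""
    value = os.getenv(name)
    if not value:
        # Covers both a missing variable (None) and an empty string.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

A client could then be built with ZhipuGLMClient(require_env('ZHIPU_API_KEY')), so that a missing key is reported before any request is sent.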

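Summary

The key points of wrapping Chinese LLM APIs:

A unified interface through an abstract base class: BaseModelClient defines two abstract methods, chat() and get_cost()
Two calling styles: use the SDK where one exists (such as dashscope), and call the REST API with httpx where it does not
Unified cost calculation: every model implements get_cost(), which makes costs easy to compare
Smart routing: pick the most suitable model automatically by task type to lower cost and improve results
Easy to extend: adding a model only requires subclassing the base class and registering it with the router

With this wrapper in place, one interface can call Zhipu GLM, Tongyi Qianwen, Kimi, and other Chinese models, and the router can choose a suitable model for each task type automatically.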
