ChatGLM3-6B Chat/Base: Step-by-Step Local Deployment and Tool Calling

Beginner-friendly local deployment and expert-level tool calling with ChatGLM3-6B, billed as the world's strongest LLM under 10B parameters


八荒、 · Published 2024-01-12 00:21:02

The Pride of Chinese LLMs: ChatGLM3-6B Chat/Base Step-by-Step Local Deployment and Tool Calling

Background

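With 2023 behind us and 2024 here, the LLM era is truly underway. I'm a second-year software engineering student, and over winter break I summarized the LLM applications I worked with during the fall semester so I can share them with you.
A quick plug for the official team: ChatGLM4 is scheduled to be announced on 2024-01-16. I'll follow the news and share updates as they come.
If anything in this post is wrong, corrections are very welcome. Thank you!
My LLM series so far:
Chinese LLMs: ModelScope | Alibaba Cloud Server Deployment
Quick Start from Zero: LLM Fine-Tuning (Web UI)
Quick Start from Zero: LLM Fine-Tuning (Shell Version)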

Hardware Requirements

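Deployment needs a powerful GPU; I use an RTX 4090. The minimum GPU memory for inference is about 6 GB (with INT4 quantization, see below). I recommend a card with at least 13 GB: that runs the unquantized model without pushing the GPU to its limit and keeps the full model quality.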
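As a rough sanity check on these numbers, the weights alone take (parameter count) × (bytes per parameter). The sketch below assumes about 6.2 billion parameters for ChatGLM3-6B (an approximate figure, not taken from this post) and ignores activations, the KV cache and CUDA overhead, so real usage is always higher than what it prints.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory taken by the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores activations, the KV cache and framework overhead, so treat the
    result as a lower bound on the GPU memory actually needed.
    """
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    N_PARAMS = 6.2e9  # approximate parameter count of ChatGLM3-6B (assumption)
    for label, bits in [("FP16", 16), ("INT4", 4)]:
        print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB of weights")
```

FP16 comes out at about 12.4 GB of weights, in line with the ~13 GB measured in practice. The INT4 estimate (~3.1 GB) is well under the measured 6 GB; the gap is plausibly runtime buffers plus layers that stay unquantized.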

Downloading the Model Weights

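ChatGLM3-6B Chat download (may be slow): Hugging Face
ChatGLM3-6B Chat download: ModelScope
ChatGLM3-6B Base download (may be slow): Hugging Face
ChatGLM3-6B Base download (mirror in China): ModelScope
Important enough to say three times:
Make sure you download ALL of the model's files.
Make sure you download ALL of the model's files.
Make sure you download ALL of the model's files.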
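One way to download everything and then check that nothing is missing, sketched under assumptions: `snapshot_download` from the `huggingface_hub` package fetches the whole repository, and the check relies on the standard Hugging Face sharded-checkpoint index (`*.index.json`, whose `"weight_map"` maps tensor names to shard files). The helper names `missing_files` and `download` are my own; the check does not cover the tokenizer or the `*_chatglm.py` remote-code files.

```python
import json
from pathlib import Path
from typing import List


def missing_files(model_dir: str) -> List[str]:
    """List checkpoint files that an *.index.json references but that are absent.

    Reads every *.index.json in model_dir; its "weight_map" maps tensor names
    to shard file names. config.json is always required as well.
    """
    root = Path(model_dir)
    required = {"config.json"}
    for index_file in root.glob("*.index.json"):
        weight_map = json.loads(index_file.read_text(encoding="utf-8"))["weight_map"]
        required.update(weight_map.values())
    return sorted(name for name in required if not (root / name).is_file())


def download(repo_id: str, local_dir: str) -> str:
    """Download a full repository snapshot and return its local path."""
    from huggingface_hub import snapshot_download  # imported lazily: optional dependency
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    path = download("THUDM/chatglm3-6b", "./chatglm3-6b")
    print(missing_files(path) or "all checkpoint files present")
```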

Model Deployment

Next, let's deploy the model and run some simple inference.

Getting the Code

git clone https://github.com/THUDM/ChatGLM3
cd ChatGLM3

Installing Dependencies

pip install -r requirements.txt

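For torch I use torch 2.0.1 + CUDA 11.8. Make sure your torch build matches the CUDA version you installed; otherwise you will hit torch errors (such as "no torch") at runtime.
PyTorch official download page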
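A quick way to check that torch and CUDA agree before loading the model. This is a small diagnostic of my own (the function name is hypothetical); it only uses `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_properties()`. With the pip wheel for torch 2.0.1 + CUDA 11.8 you would expect `torch_version` to be `2.0.1+cu118` and `built_with_cuda` to be `11.8`.

```python
def describe_torch_env() -> dict:
    """Report the installed torch build and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["gpu_name"] = props.name
        info["gpu_memory_gb"] = round(props.total_memory / 1e9, 1)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is set but `cuda_available` is False, the GPU driver is usually the problem rather than the torch wheel.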

Loading the Model

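from transformers import AutoTokenizer, AutoModel
MODEL_PATH = "path/to/chatglm3-6b"  # replace with your local model directory
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).cuda().eval()  # load onto the GPU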

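Note: MODEL_PATH must point to the directory where you saved the downloaded model.
In my tests, inference with chatglm3-6b needs at least 13.1 GB of GPU memory. If you don't have that much, enable quantization: with INT4 quantization the model needed about 6 GB in my tests.
To enable quantization, load the model with the code below: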

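from transformers import AutoTokenizer, AutoModel
MODEL_PATH = "path/to/chatglm3-6b"  # replace with your local model directory
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).quantize(4).cuda().eval()  # INT4 weights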

Model Inference

Single-turn calls

Calling the ChatGLM3-6B (Chat) model

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device='cuda')
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
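# (translated) Hello👋! I'm the AI assistant ChatGLM3-6B, nice to meet you. Feel free to ask me anything.
# Multi-turn: pass the returned history back in ("What is the capital of Liaoning?")
response, history = model.chat(tokenizer, "辽宁的省会是哪里？", history=history)
print(response)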

Calling the ChatGLM3-6B-Base model

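from transformers import AutoTokenizer, AutoModel
# The Base weights live in their own repo, THUDM/chatglm3-6b-base
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b-base", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b-base", trust_remote_code=True, device='cuda')
model = model.eval()
# The Base model has no chat ability: give it a text prefix and let it continue.
# Prompt: "Hubei's capital is Wuhan, Liaoning's capital ->"
inputs = tokenizer(["湖北的省会是武汉，辽宁的省会->"], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))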
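Since the Base model only continues text, you steer it with a pattern rather than a question; the prompt above packs one worked example ("Hubei → Wuhan") and the open query into one line. A small helper for building such few-shot prompts, as a hypothetical sketch (not part of the ChatGLM3 repo): the `->` separator and the one-example-per-line layout are my assumptions, and any consistent pattern works. The resulting string can be passed to the tokenizer as the prompt.

```python
from typing import List, Tuple


def build_completion_prompt(examples: List[Tuple[str, str]], query: str,
                            sep: str = "->") -> str:
    """Turn (input, answer) examples into a pattern a Base model can continue.

    Each example becomes "input->answer" on its own line, and the query is
    left open as "query->" so the most likely continuation is the answer.
    """
    lines = [f"{source}{sep}{target}" for source, target in examples]
    lines.append(f"{query}{sep}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_completion_prompt([("湖北的省会", "武汉"), ("四川的省会", "成都")], "辽宁的省会"))
```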

Multi-turn chat in the terminal

python cli_demo.py

Running cli_demo.py lets you hold a multi-turn conversation with the model in the terminal.
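The core of a terminal chat is just a loop that feeds the returned `history` back into the next `model.chat` call. Below is a simplified sketch of that idea, not the repo's cli_demo.py (which also streams output); the `run_cli` name and the `clear`/`exit` command words are my own choices. The chat function is passed in so the loop can be tried without a GPU.

```python
from typing import Callable, Tuple


def run_cli(chat: Callable[[str, list], Tuple[str, list]],
            input_fn: Callable[[str], str] = input,
            print_fn: Callable[[str], None] = print) -> list:
    """Minimal multi-turn REPL that carries `history` from one turn to the next.

    `chat(query, history)` must return (response, new_history), the same
    contract as model.chat(tokenizer, query, history=history).
    Commands: "clear" resets the history, "exit" (or EOF) quits.
    """
    history: list = []
    while True:
        try:
            query = input_fn("User: ").strip()
        except EOFError:
            break
        if query == "exit":
            break
        if query == "clear":
            history = []  # start a fresh conversation
            continue
        if not query:
            continue
        response, history = chat(query, history)
        print_fn(f"ChatGLM: {response}")
    return history


if __name__ == "__main__":
    from transformers import AutoModel, AutoTokenizer

    MODEL_PATH = "THUDM/chatglm3-6b"  # or your local model directory
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
    model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True, device="cuda").eval()
    run_cli(lambda query, history: model.chat(tokenizer, query, history=history))
```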

Multi-turn chat in the browser

python web_demo_gradio.py

web_demo_gradio.py uses Gradio to run a multi-turn conversation with the model in a web page.
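The repo's web_demo_gradio.py is more complete (streaming, generation settings). As a simplified sketch, the one thing any Gradio front end has to do is convert Gradio's chat history into ChatGLM3's history format, which as far as I know is a list of `{"role", "content"}` dicts. `gr.ChatInterface` is Gradio's built-in chat UI; the converter below handles both of Gradio's history formats (pairs and role/content dicts), and its name is hypothetical.

```python
from typing import Any, Dict, List


def gradio_history_to_chatglm(history: List[Any]) -> List[Dict[str, str]]:
    """Convert Gradio chat history into ChatGLM3-style role/content dicts.

    Handles both Gradio formats: [user, assistant] pairs ("tuples") and
    {"role": ..., "content": ...} dicts ("messages").
    """
    converted: List[Dict[str, str]] = []
    for item in history:
        if isinstance(item, dict):  # "messages" format is already role/content
            converted.append({"role": item["role"], "content": item["content"]})
        else:  # "tuples" format: one [user, assistant] pair per turn
            user_msg, bot_msg = item
            converted.append({"role": "user", "content": user_msg})
            if bot_msg is not None:
                converted.append({"role": "assistant", "content": bot_msg})
    return converted


if __name__ == "__main__":
    import gradio as gr
    from transformers import AutoModel, AutoTokenizer

    MODEL_PATH = "THUDM/chatglm3-6b"  # or your local model directory
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
    model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True, device="cuda").eval()

    def respond(message, history):
        # Rebuild ChatGLM's history from what Gradio kept, then answer the new message
        response, _ = model.chat(tokenizer, message, history=gradio_history_to_chatglm(history))
        return response

    gr.ChatInterface(respond).launch()
```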

Calling the Local Model via API

In the openai_api_demo directory, openai_api.py loads the local model:

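import uvicorn
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModel
MODEL_PATH = "path/to/chatglm3-6b"          # local ChatGLM3-6B directory
EMBEDDING_PATH = "path/to/embedding-model"  # local embedding model directory
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).cuda().eval()
# load Embedding
embedding_model = SentenceTransformer(EMBEDDING_PATH, device="cuda")
uvicorn.run(app, host='0.0.0.0', port=8000, workers=1)  # `app` is the FastAPI app defined in openai_api.py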

Replace the model-path placeholder with the path to your local model.
The embedding model turns text into vectors, so a vector database can be attached when calling the API for better results.
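How openai_api.py wires in the embedding model isn't shown here. As a generic illustration of the retrieval idea (not the repo's code), the sketch ranks stored text chunks by cosine similarity to the query's embedding; the best matches can then be pasted into the prompt. With sentence-transformers, `embedding_model.encode(texts)` produces the vectors; the example documents and path in the main block are made up.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]],
          docs: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k documents whose vectors are most similar to query_vec."""
    scored = [(doc, cosine_similarity(query_vec, vec)) for doc, vec in zip(docs, doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    from sentence_transformers import SentenceTransformer

    EMBEDDING_PATH = "path/to/embedding-model"  # hypothetical local path
    embedding_model = SentenceTransformer(EMBEDDING_PATH)
    docs = ["Shenyang is the capital of Liaoning.", "Wuhan is the capital of Hubei."]
    query_vec = embedding_model.encode("What is the capital of Liaoning?")
    print(top_k(query_vec, embedding_model.encode(docs), docs, k=1))
```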
The local model is served at http://localhost:8000 (the port set in uvicorn.run).
The JSON request format is the same as the ChatGPT (OpenAI) API's:
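Because the format matches OpenAI's, any HTTP client works, not just the openai package. A minimal standard-library sketch, assuming the server exposes the OpenAI-style `/v1/chat/completions` route at the address above; the function names are my own.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(messages: List[Dict[str, str]], model: str = "chatglm3-6b",
                       max_tokens: int = 2048) -> bytes:
    """Encode an OpenAI-style chat completion request body as UTF-8 JSON."""
    body = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def chat(messages: List[Dict[str, str]], base_url: str = "http://localhost:8000/v1") -> str:
    """POST the request and return the reply text from choices[0].message.content."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=build_chat_request(messages),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat([{"role": "user", "content": "你好"}]))
```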

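from openai import OpenAI  # openai>=1.0 client

# Point the OpenAI client at the local server; "EMPTY" is just a placeholder key
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

user_input = "hello"
tools = None  # tool schemas, covered in the tool-calling section below
# Only send tools / tool_choice when tools are actually defined
tool_kwargs = {"tools": tools, "tool_choice": "auto"} if tools else {}

response = client.chat.completions.create(
    model="chatglm3-6b",
    messages=[
        # System message: content is the system prompt
        # ("You are a powerful search engine: given a search question and several
        #  search-result snippets about it, produce a correct and thorough answer.")
        {"role": "system",
         "content": "你是一个强大的搜索引擎，请你根据给定的搜索问题以及关于这个问题的多条搜索结果摘要，生成正确且丰富的答案。"},
        # User message: content is what the user typed
        {"role": "user", "content": user_input},
    ],
    max_tokens=2048,  # maximum number of tokens to generate
    **tool_kwargs,    # tool_choice="auto" lets the model decide whether to call a tool
)

if response.choices:
    # Extract the model's reply from the parsed JSON response
    content = response.choices[0].message.content
    print(content)
else:
    # HTTP failures raise openai.APIError, so only an empty result lands here
    print("Error: empty response", response)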

I'll cover serving the model as a public API in a later post.

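Tool Calling

Use Case

Once a large language model finishes training, its knowledge is frozen at that point, and even the simplest events that happened afterwards are beyond it. How can we keep the model up to date, or get it to do more for us? Tool calling is the answer.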

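tools_using_demo (without LangChain)

Registering a function as a tool

I use a real-time weather lookup as the example; every example below queries the weather. There are two approaches for different setups:
tools_using_demo: the city name must be given in English
langchain_demo: the city name can be in Chinese or English

Register the tool function in tool_register.py inside the tools_using_demo directory: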

from typing import Annotated  # tool_register.py already imports this at the top

import requests


# register_tool is the decorator defined earlier in tool_register.py
@register_tool
def get_weather(
        city_name: Annotated[str, 'The name of the city to be queried', True],
) -> str:
    """
    Get the current weather for `city_name`
    """
    url = "https://api.openweathermap.org/data/2.5/weather"
    key = "XXX"  # register on the OpenWeatherMap website to get an API key
    param = {
        "q": city_name,
        "appid": key,
        "units": "metric",  # temperatures in ℃
        "lang": "zh_cn",    # weather descriptions in Chinese
    }

    response = requests.get(url, params=param, timeout=10)
    if response.status_code == 200:
        data = response.json()
        temperature = data['main']['temp']
        humidity = data['main']['humidity']
        description = data['weather'][0]['description']
        wind_speed = data['wind']['speed']
        pressure = data['main']['pressure']
        # Reply text (Chinese): temperature, humidity, conditions, wind speed (m/s), pressure
        ret = f"{city_name}当前温度为 {temperature}℃ 湿度: {humidity}% 天气状况: {description} 风速: {wind_speed} 米/秒 气压: {pressure} hPa"
    else:
        ret = "Unable to fetch weather data."
    return ret
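I don't reproduce tool_register.py itself here. To show the idea behind `@register_tool`, here is a hypothetical minimal version that borrows the same names (`register_tool`, `get_tools`, `dispatch_tool`) but is not the repo's implementation: the decorator reads each parameter's `Annotated[type, description, required]` hint to build a description the model can read, and `dispatch_tool` looks the function up by name when the model asks for it.

```python
import inspect
from typing import Annotated, Any, Callable, Dict, get_args, get_origin, get_type_hints

_TOOL_HOOKS: Dict[str, Callable[..., Any]] = {}
_TOOL_DESCRIPTIONS: Dict[str, Dict[str, Any]] = {}


def register_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record `func` and a description built from its Annotated parameter hints."""
    hints = get_type_hints(func, include_extras=True)
    params = []
    for name in inspect.signature(func).parameters:
        hint = hints.get(name)
        if get_origin(hint) is not Annotated:
            raise TypeError(f"parameter {name!r} needs an Annotated[type, description, required] hint")
        typ, description, required = get_args(hint)
        params.append({
            "name": name,
            "type": getattr(typ, "__name__", str(typ)),
            "description": description,
            "required": required,
        })
    _TOOL_HOOKS[func.__name__] = func
    _TOOL_DESCRIPTIONS[func.__name__] = {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "params": params,
    }
    return func


def get_tools() -> Dict[str, Dict[str, Any]]:
    """Descriptions of every registered tool, keyed by tool name."""
    return dict(_TOOL_DESCRIPTIONS)


def dispatch_tool(tool_name: str, tool_params: Dict[str, Any]) -> str:
    """Call the named tool and return its result as text for the model."""
    if tool_name not in _TOOL_HOOKS:
        return f"Tool `{tool_name}` not found."
    try:
        return str(_TOOL_HOOKS[tool_name](**tool_params))
    except Exception as exc:  # report tool errors to the model instead of crashing
        return f"Tool `{tool_name}` failed: {exc}"
```

With this sketch, decorating `get_weather` as above would give `get_tools()["get_weather"]["params"] == [{"name": "city_name", "type": "str", "description": "The name of the city to be queried", "required": True}]`.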

Calling tools through the API

import json

from colorama import Fore
from loguru import logger
from openai import OpenAI

from tool_register import get_tools, dispatch_tool

# OpenAI-compatible server started from openai_api_demo (see the API section)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
tools = get_tools()


# Wrap the API call and the tool-dispatch loop in a single function
def run_conversation(query: str, stream=False, tools=None, max_retry=5) -> str:
    params = dict(model="chatglm3", messages=[{"role": "user", "content": query}], stream=stream)
    if tools:
        params["tools"] = tools
    response = client.chat.completions.create(**params)

    for _ in range(max_retry):
        if not stream:
            message = response.choices[0].message
            if message.function_call:
                function_call = message.function_call
                logger.info(f"Function Call Response: {function_call.model_dump()}")

                function_args = json.loads(function_call.arguments)
                tool_response = dispatch_tool(function_call.name, function_args)
                logger.info(f"Tool Call Response: {tool_response}")

                params["messages"].append(message)
                params["messages"].append(
                    {
                        "role": "function",
                        "name": function_call.name,
                        "content": tool_response,  # result returned by the tool
                    }
                )
            else:
                reply = message.content
                logger.info(f"Final Reply: \n{reply}")
                return reply

        else:
            output = ""
            for chunk in response:
                content = chunk.choices[0].delta.content or ""
                print(Fore.BLUE + content, end="", flush=True)
                output += content

                if chunk.choices[0].finish_reason == "stop":
                    return output

                elif chunk.choices[0].finish_reason == "function_call":
                    print("\n")

                    function_call = chunk.choices[0].delta.function_call
                    logger.info(f"Function Call Response: {function_call.model_dump()}")

                    function_args = json.loads(function_call.arguments)
                    tool_response = dispatch_tool(function_call.name, function_args)
                    logger.info(f"Tool Call Response: {tool_response}")

                    params["messages"].append({"role": "assistant", "content": output})
                    params["messages"].append(
                        {
                            "role": "function",
                            "name": function_call.name,
                            "content": tool_response,
                        }
                    )
                    break
            else:
                # Stream ended without "stop" or "function_call" (e.g. length limit)
                return output

        # Send the tool result back so the model can write its final answer
        response = client.chat.completions.create(**params)

    return "No final reply after max_retry rounds of tool calls."


query = "帮我查询shenyang的天气怎么样"  # "What's the weather like in shenyang?"
result = run_conversation(query, tools=tools, stream=True)
print(result)
# shenyang当前温度为 -4.99℃ 湿度: 42% 天气状况: 晴 风速: 2 米/秒 气压: 1020 hPa
# (temperature -4.99℃, humidity 42%, clear sky, wind 2 m/s, pressure 1020 hPa)

langchain_demo (with LangChain)

Creating the weather-query class

Create the queryweather class in the tools folder:

import requests
from langchain.tools import BaseTool


class queryweather(BaseTool):
    # Annotated fields: pydantic v2 rejects un-annotated overrides of BaseTool fields
    name: str = "weather"
    description: str = "Use for searching weather at a specific location"

    def get_weather(self, location: str) -> str:
        api_key = "XXX"  # obtain a key from the Seniverse website
        url = "https://api.seniverse.com/v3/weather/now.json"
        params = {"key": api_key, "location": location, "language": "zh-Hans", "unit": "c"}
        response = requests.get(url, params=params, timeout=10)
        if response.status_code == 200:
            result = response.json()["results"][0]
            # Keys: location / temperature / weather description
            weather = {
                "所在地：": result["location"]["name"],
                "温度为：": result["now"]["temperature"],
                "天气描述为": result["now"]["text"],
            }
            return str(weather)
        raise Exception(f"Failed to retrieve weather: {response.status_code}")

    def _run(self, para: str) -> str:
        return self.get_weather(para)


if __name__ == "__main__":
    weather_tool = queryweather()
    weather_info = weather_tool.run("沈阳")  # Shenyang
    print(weather_info)
    # {'所在地：': '沈阳', '温度为：': '-4', '天气描述为': '晴'}

Loading the model

The underlying model-loading code lives in ChatGLM3.py:

from transformers import AutoConfig, AutoModel, AutoTokenizer


# Method of the ChatGLM3 class in ChatGLM3.py, shown on its own
def load_model(self, model_name_or_path=None):
    model_config = AutoConfig.from_pretrained(
        model_name_or_path,
        trust_remote_code=True
    )
    self.tokenizer = AutoTokenizer.from_pretrained(
        model_name_or_path,
        trust_remote_code=True
    )
    self.model = AutoModel.from_pretrained(
        model_name_or_path, config=model_config, trust_remote_code=True, device_map="auto").eval()

If you need quantization, adapt this code using the quantization method shown earlier (.quantize(4).cuda()).

Loading the model in main.py

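from ChatGLM3 import ChatGLM3  # the LLM wrapper class defined in ChatGLM3.py
MODEL_PATH = "path/to/chatglm3-6b"  # your local model directory
llm = ChatGLM3()
llm.load_model(MODEL_PATH)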

Calling the tool class

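from langchain import hub
from langchain.agents import AgentExecutor, create_structured_chat_agent
from tools.queryweather import queryweather  # assumes the class is saved as tools/queryweather.py

# load_tools() only knows LangChain's built-in tool names, so instantiate the custom tool directly
tools = [queryweather()]
prompt = hub.pull("hwchase17/structured-chat-agent")  # standard structured-chat agent prompt
agent = create_structured_chat_agent(llm=llm, tools=tools, prompt=prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)
ans = agent_executor.invoke({"input": "帮我查询一下沈阳的天气如何"})  # "What's the weather in Shenyang?"
print(ans["output"])  # invoke() returns a dict with "input" and "output" keys
# {'所在地：': '沈阳', '温度为：': '-4', '天气描述为': '晴'}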