Langchain-Chatchat Deployment in Practice

LangChain-Chatchat (formerly Langchain-ChatGLM) is an open-source, offline-deployable retrieval-augmented generation (RAG) knowledge-base project built on large language models such as ChatGLM and application frameworks such as Langchain.

It is a question-answering application over local knowledge bases built in the spirit of langchain, with the goal of providing a knowledge-base QA solution that is friendly to Chinese-language scenarios and open-source models and can run fully offline.

Inspired by GanymedeNil's project document.ai and the ChatGLM-6B Pull Request created by AlexZhangji, the project implements a local knowledge-base QA application whose entire pipeline can run on open-source models. The latest version uses FastChat to serve models such as Vicuna, Alpaca, LLaMA, Koala, and RWKV, and, building on the langchain framework, can be driven either through the FastAPI-based API service or through the Streamlit-based WebUI.

Thanks to the open-source LLMs and embedding models the project supports, it can be deployed fully privately and offline using open-source models only. It also supports calling the OpenAI GPT API, and access to more models and model APIs will be added over time.

1. AutoDL Image Deployment

See the codewithgpu image and its installation steps.

2. Docker Image Deployment

All of the following uses Ubuntu as the example system.

Preparing the NVIDIA driver and toolkit

You do not need to install the CUDA Toolkit on the host system, but you do need to install the NVIDIA Driver and the NVIDIA Container Toolkit.

Check whether the NVIDIA driver is installed:
nvidia-smi

If the NVIDIA driver is installed, nvidia-smi displays GPU status information, including the driver version and the supported CUDA version. If it is not installed, the command fails (on Ubuntu, typically with "command not found").
Another option is to use lspci to look for an NVIDIA GPU:

lspci | grep -i nvidia

If the system contains an NVIDIA GPU, this command prints its PCI information; note that lspci detects the hardware regardless of whether the driver is installed. No output means there is no NVIDIA GPU in the machine.
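
If the driver turns out to be missing, it can be installed from Ubuntu's standard repositories. A minimal sketch, assuming the ubuntu-drivers utility is available (the appropriate driver series varies by GPU):

# Install the recommended driver automatically
sudo ubuntu-drivers autoinstall
# Or pin a specific driver series explicitly, e.g. the 535 branch
sudo apt-get install -y nvidia-driver-535
# Reboot so the new kernel module is loaded
sudo reboot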

Check whether the NVIDIA Container Toolkit is installed

One option is the following command:

docker volume ls -q -f driver=nvidia-docker | wc -l

Note that this only counts volumes created by the legacy nvidia-docker volume driver, so it is a rough signal at best; wc -l prints 0 (not empty output) when no such volumes exist.
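
A more direct check, assuming Debian/Ubuntu packaging, is to query the toolkit package or the nvidia-ctk CLI that it ships:

# Is the package installed? (Debian/Ubuntu)
dpkg -l | grep nvidia-container-toolkit
# Is the nvidia-ctk CLI on the PATH?
nvidia-ctk --version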

Installing the NVIDIA Container Toolkit

On Ubuntu, install it with Apt; for other systems, refer to the NVIDIA documentation.

Install the toolkit

1. Configure the production repository:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Optionally, configure the repository to use experimental packages:

sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

2. Update the package list from the repository:

sudo apt-get update

3. Install the NVIDIA Container Toolkit packages:

sudo apt-get install -y nvidia-container-toolkit

Configure the toolkit

Configure Docker

Configure the container runtime by using the nvidia-ctk command:

sudo nvidia-ctk runtime configure --runtime=docker

The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host. The file is updated so that Docker can use the NVIDIA Container Runtime.
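
After this step, /etc/docker/daemon.json typically contains a runtimes entry along the following lines (a sketch only; the exact contents depend on any pre-existing configuration):

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime"
    }
  }
}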

Restart the Docker daemon:

sudo systemctl restart docker
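
Optionally, confirm that the nvidia runtime is now registered with Docker (the grep filter is just a convenience):

sudo docker info | grep -i runtime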

Rootless mode
To configure the container runtime for Docker running in Rootless mode, follow these steps:

Configure the container runtime by using the nvidia-ctk command:

nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json

Restart the Rootless Docker daemon:

systemctl --user restart docker

Configure /etc/nvidia-container-runtime/config.toml by using the sudo nvidia-ctk command:

sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place

Verify the installation

After installing and configuring the toolkit and installing the NVIDIA GPU driver, you can verify the setup by running a sample workload.

Run a sample CUDA container inside Docker:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Your output should resemble the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10    Driver Version: 535.86.10    CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Running the sample workload may fail with an error such as:

Error response from daemon: Get "https://registry-1.docker.io/v2

This is a Docker image pull failing, most likely because the pull timed out.

Fix:
1. Check the daemon.json file:

cat /etc/docker/daemon.json

2. Without registry mirrors configured, pulls are slow and prone to timing out, which causes this error. Add the USTC and Alibaba Cloud mirrors:

{
  "registry-mirrors": [
    "https://6kx4zyno.mirror.aliyuncs.com",
    "https://docker.mirrors.ustc.edu.cn"
  ]
}

3. Restart the service:

sudo systemctl daemon-reload
sudo systemctl restart docker
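
To confirm the mirrors took effect, docker info lists them under Registry Mirrors:

sudo docker info | grep -A 3 'Registry Mirrors'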

Deploying the Docker image

Images are available from DockerHub, Alibaba Cloud, and Tencent Cloud:

docker run -d --gpus all -p 80:8501 isafetech/chatchat:0.2.10
docker run -d --gpus all -p 80:8501 ccr.ccs.tencentyun.com/chatchat/chatchat:0.2.10
docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.10

1. This image is 50.1 GB in size, uses v0.2.10, and is built on nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04 as the base image;
2. It is the full version, not the lightweight one;
3. It bundles and enables by default one Embedding model, bge-large-zh-v1.5, and bundles and enables ChatGLM3-6B by default;
4. It is intended for easy one-click deployment; make sure you have installed the NVIDIA driver on your Linux distribution;
5. Note that you do not need the CUDA Toolkit on the host, but you do need the NVIDIA Driver and the NVIDIA Container Toolkit; see the steps above;
6. The first pull and first startup both take some time; watch the logs during the first startup:

docker logs -f <container id>

To find the container ID:

docker ps

7. If startup hangs at the Waiting… step, enter the container and check the logs for the corresponding stage under logs/:

docker exec -it <container id> bash
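
If you want the knowledge-base data to survive container re-creation, a bind mount can be added to the docker run commands above. A sketch, assuming the project lives at /data/model/langchain-chatchat inside the image (the path shown in the startup logs below) and using a hypothetical host directory:

# Hypothetical host path /opt/chatchat/knowledge_base; the in-container path is an assumption
docker run -d --gpus all -p 80:8501 \
  -v /opt/chatchat/knowledge_base:/data/model/langchain-chatchat/knowledge_base \
  isafetech/chatchat:0.2.10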

Accessing the service

A successful Docker startup looks like this:

# docker logs -f b8322511da36
2024-05-11 11:58:01,792 - startup.py[line:655] - INFO: 正在启动服务:
2024-05-11 11:58:01,792 - startup.py[line:656] - INFO: 如需查看 llm_api 日志,请前往 /data/model/langchain-chatchat/logs


==============================Langchain-Chatchat Configuration==============================
操作系统:Linux-5.15.0-73-generic-x86_64-with-glibc2.35.
python版本:3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
项目版本:v0.2.10
langchain版本:0.0.354. fastchat版本:0.2.35


当前使用的分词器:ChineseRecursiveTextSplitter
当前启动的LLM模型:['chatglm3-6b', 'zhipu-api', 'openai-api'] @ cuda
{'device': 'cuda',
 'host': '0.0.0.0',
 'infer_turbo': False,
 'model_path': '/data/model/chatglm3-6b',
 'model_path_exists': True,
 'port': 20002}
{'api_key': '',
 'device': 'auto',
 'host': '0.0.0.0',
 'infer_turbo': False,
 'online_api': True,
 'port': 21001,
 'provider': 'ChatGLMWorker',
 'version': 'glm-4',
 'worker_class': <class 'server.model_workers.zhipu.ChatGLMWorker'>}
{'api_base_url': 'https://api.openai.com/v1',
 'api_key': '',
 'device': 'auto',
 'host': '0.0.0.0',
 'infer_turbo': False,
 'model_name': 'gpt-4',
 'online_api': True,
 'openai_proxy': '',
 'port': 20002}
当前Embbedings模型: bge-large-zh-v1.5 @ cuda
==============================Langchain-Chatchat Configuration==============================


/usr/local/lib/python3.11/dist-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: 模型启动功能将于 Langchain-Chatchat 0.3.x重写,支持更多模式和加速启动,0.2.x中相关功能将废弃
  warn_deprecated(
2024-05-11 11:58:08 | INFO | model_worker | Register to controller
2024-05-11 11:58:08 | ERROR | stderr | INFO:     Started server process [125]
2024-05-11 11:58:08 | ERROR | stderr | INFO:     Waiting for application startup.
2024-05-11 11:58:08 | ERROR | stderr | INFO:     Application startup complete.
2024-05-11 11:58:08 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:20000 (Press CTRL+C to quit)
2024-05-11 11:58:09 | INFO | model_worker | Loading the model ['chatglm3-6b'] on worker 78f27c14 ...
2024-05-11 11:58:09 | WARNING | transformers_modules.chatglm3-6b.tokenization_chatglm | Setting eos_token is not supported, use the default one.
2024-05-11 11:58:09 | WARNING | transformers_modules.chatglm3-6b.tokenization_chatglm | Setting pad_token is not supported, use the default one.
2024-05-11 11:58:09 | WARNING | transformers_modules.chatglm3-6b.tokenization_chatglm | Setting unk_token is not supported, use the default one.
Loading checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards:  14%|█▍        | 1/7 [00:00<00:02,  2.62it/s]
Loading checkpoint shards:  29%|██▊       | 2/7 [00:00<00:01,  2.64it/s]
Loading checkpoint shards:  43%|████▎     | 3/7 [00:01<00:01,  2.75it/s]
Loading checkpoint shards:  57%|█████▋    | 4/7 [00:01<00:01,  2.85it/s]
Loading checkpoint shards:  71%|███████▏  | 5/7 [00:01<00:00,  2.84it/s]
Loading checkpoint shards:  86%|████████▌ | 6/7 [00:02<00:00,  2.86it/s]
Loading checkpoint shards: 100%|██████████| 7/7 [00:02<00:00,  3.20it/s]
Loading checkpoint shards: 100%|██████████| 7/7 [00:02<00:00,  2.95it/s]
2024-05-11 11:58:12 | ERROR | stderr | 
2024-05-11 11:58:16 | INFO | model_worker | Register to controller
INFO:     Started server process [249]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7861 (Press CTRL+C to quit)

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.


  You can now view your Streamlit app in your browser.

  URL: http://0.0.0.0:8501

From this output we can see:
FastAPI docs: http://0.0.0.0:7861/docs
WebUI: http://0.0.0.0:8501

Test both addresses from inside the container:
1. Enter the container:

docker exec -it b8322511da36 bash

2. Test the addresses:

root@b8322511da36:/data/model/langchain-chatchat# curl http://0.0.0.0:7861/docs

    <!DOCTYPE html>
    <html>
    <head>
    <link type="text/css" rel="stylesheet" href="/static-offline-docs/swagger-ui.css">
    <link rel="shortcut icon" href="/static-offline-docs/favicon.png">
    <title>Langchain-Chatchat API Server - Swagger UI</title>
    </head>
    <body>
    <div id="swagger-ui">
    </div>
    <script src="/static-offline-docs/swagger-ui-bundle.js"></script>
    <!-- `SwaggerUIBundle` is now available on the page -->
    <script>
    const ui = SwaggerUIBundle({
        url: '/openapi.json',
    "dom_id": "#swagger-ui",
"layout": "BaseLayout",
"deepLinking": true,
"showExtensions": true,
"showCommonExtensions": true,
oauth2RedirectUrl: window.location.origin + '/docs/oauth2-redirect',
    presets: [
        SwaggerUIBundle.presets.apis,
        SwaggerUIBundle.SwaggerUIStandalonePreset
        ],
    })
    </script>
    </body>
    </html>

root@b8322511da36:/data/model/langchain-chatchat# curl http://0.0.0.0:8501
<!doctype html><html lang="en"><head><meta charset="UTF-8"/><meta name="viewport" content="width=device-width,initial-scale=1,shrink-to-fit=no"/><link rel="shortcut icon" href="./favicon.png"/><link rel="preload" href="./static/media/SourceSansPro-Regular.0d69e5ff5e92ac64a0c9.woff2" as="font" type="font/woff2" crossorigin><link rel="preload" href="./static/media/SourceSansPro-SemiBold.abed79cd0df1827e18cf.woff2" as="font" type="font/woff2" crossorigin><link rel="preload" href="./static/media/SourceSansPro-Bold.118dea98980e20a81ced.woff2" as="font" type="font/woff2" crossorigin><title>Streamlit</title><script>window.prerenderReady=!1</script><script defer="defer" src="./static/js/main.3ab8e8d9.js"></script><link href="./static/css/main.77d1c464.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>

The Docker logs show:

INFO:     127.0.0.1:48198 - "GET /docs HTTP/1.1" 200 OK
INFO:     127.0.0.1:55978 - "GET / HTTP/1.1" 307 Temporary Redirect

This confirms that the API docs and the web app have started correctly.

The next step is to make these services reachable from outside the container.
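
Given the -p 80:8501 mapping in the docker run command above, the WebUI should already be reachable on the host's port 80; the API port 7861, however, is not published by that command and needs an extra mapping. A sketch:

# WebUI: host port 80 forwards to Streamlit's 8501 inside the container
curl http://localhost/
# The API (7861) is not exposed by the original run command; re-create the
# container with an additional mapping to reach it from the host
docker run -d --gpus all -p 80:8501 -p 7861:7861 isafetech/chatchat:0.2.10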

3. Standard Local Deployment

Coming soon…
