Deploying Llama 3 Locally for Model Chat
1. Download the repository from GitHub
https://github.com/meta-llama/llama3
Clone it with git, or download the zip archive directly from the GitHub project page:
git clone https://github.com/meta-llama/llama3.git
2. Request a model download link
Fill out the request form on the Meta Llama website. The country apparently has to be one outside China, and any organization name will do.
3. Install dependencies
Run the following command in the top-level llama3 directory (preferably inside a conda environment with Python installed):
pip install -e .
4. Download the model
Run the following command:
bash download.sh
When prompted, paste the link from the approval email and choose the model you want; I chose 8B-Instruct here. Make sure your GPU has enough VRAM for inference (a rough estimate follows below).
Once the download starts, it takes a while to finish.
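As a back-of-the-envelope check (my own estimate, not an official requirement), the bf16 weights take about 2 bytes per parameter, and the KV cache grows with max_seq_len and max_batch_size. A minimal sketch, assuming the published 8B config (32 layers, 8 KV heads, head dim 128):

# Rough VRAM estimate for Llama-3-8B inference (bf16 weights + KV cache).
# The model dimensions below are assumptions taken from the published 8B config.
N_PARAMS = 8.03e9      # total parameters
BYTES_PER_PARAM = 2    # bf16
N_LAYERS = 32
N_KV_HEADS = 8         # grouped-query attention
HEAD_DIM = 128

def kv_cache_bytes(max_seq_len: int, max_batch_size: int) -> float:
    # 2 tensors (K and V) per layer, each [batch, seq, n_kv_heads, head_dim], in bf16
    return 2 * N_LAYERS * max_batch_size * max_seq_len * N_KV_HEADS * HEAD_DIM * BYTES_PER_PARAM

weights_gb = N_PARAMS * BYTES_PER_PARAM / 1024**3
cache_gb = kv_cache_bytes(max_seq_len=512, max_batch_size=6) / 1024**3
print(f"weights ~{weights_gb:.1f} GB, KV cache ~{cache_gb:.2f} GB")
# roughly 15 GB of weights plus a few hundred MB of cache at these settings,
# so a 16 GB card is tight and 24 GB is comfortable

This ignores activations and framework overhead, so treat it as a lower bound.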
5. Run the example script
Run the following command (--nproc_per_node is 1 because the 8B models use a model-parallel value of 1):
torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir Meta-Llama-3-8B-Instruct/ \
--tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
--max_seq_len 512 --max_batch_size 6
This prints a few example chat completions.
Running your own chat script
Create the following chat.py script in the top-level directory:
# Copyright (c) Meta Platforms, Inc. and affiliates.
# This software may be used and distributed in accordance with the terms of the Llama 3 Community License Agreement.

from typing import List, Optional

import fire

from llama import Dialog, Llama


def main(
    ckpt_dir: str,
    tokenizer_path: str,
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_seq_len: int = 512,
    max_batch_size: int = 4,
    max_gen_len: Optional[int] = None,
):
    """
    Examples to run with the models finetuned for chat. Prompts correspond to chat
    turns between the user and assistant, with the final one always being the user's.
    An optional system prompt at the beginning to control how the model should respond
    is also supported.

    The context window of Llama 3 models is 8192 tokens, so `max_seq_len` needs to be <= 8192.
    `max_gen_len` is optional because finetuned models are able to stop generations naturally.
    """
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

    # A single dialog holding only the latest user input (no history is kept)
    dialogs: List[Dialog] = [
        [{"role": "user", "content": ""}],  # initialize with an empty user message
    ]

    # Conversation loop
    while True:
        user_input = input("You: ")

        # Exit the loop if the user types 'exit'
        if user_input.lower() == 'exit':
            break

        # Replace the user message with the latest input
        dialogs[0][0]["content"] = user_input

        # Generate the model response
        result = generator.chat_completion(
            dialogs,
            max_gen_len=max_gen_len,
            temperature=temperature,
            top_p=top_p,
        )[0]

        # Print the model response
        print(f"Model: {result['generation']['content']}")


if __name__ == "__main__":
    fire.Fire(main)
Run the following command to start chatting:
torchrun --nproc_per_node 1 chat.py --ckpt_dir Meta-Llama-3-8B-Instruct/ --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model --max_seq_len 512 --max_batch_size 6
Multi-turn conversation
The Mchat.py script is as follows:
from typing import List, Optional

import fire

from llama import Dialog, Llama


def main(
    ckpt_dir: str,
    tokenizer_path: str,
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_seq_len: int = 512,
    max_batch_size: int = 4,
    max_gen_len: Optional[int] = None,
):
    """
    Run chat models finetuned for multi-turn conversation. Prompts should include all
    previous turns, with the last one always being the user's.

    The context window of Llama 3 models is 8192 tokens, so `max_seq_len` needs to be <= 8192.
    `max_gen_len` is optional because finetuned models are able to stop generations naturally.
    """
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

    dialogs: List[Dialog] = [[]]  # start with an empty dialog

    while True:
        user_input = input("You: ")
        if user_input.lower() == 'exit':
            break

        # Append the latest user input to the dialog history
        dialogs[0].append({"role": "user", "content": user_input})

        # Generate a response conditioned on the whole dialog so far
        result = generator.chat_completion(
            dialogs,
            max_gen_len=max_gen_len,
            temperature=temperature,
            top_p=top_p,
        )[0]

        # Print the model response and store it as an "assistant" turn (not "system"),
        # so the history keeps the alternating user/assistant chat format
        model_response = result['generation']['content']
        print(f"Model: {model_response}")
        dialogs[0].append({"role": "assistant", "content": model_response})


if __name__ == "__main__":
    fire.Fire(main)
Run the following command to start a multi-turn conversation. Increasing the value of --max_seq_len 512 gives the model a longer conversation memory, but note that this also increases computation time and memory requirements (a simple history-trimming sketch follows the command).
torchrun --nproc_per_node 1 Mchat.py --ckpt_dir Meta-Llama-3-8B-Instruct/ --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model --max_seq_len 512 --max_batch_size 6
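Because Mchat.py keeps appending turns, a long conversation will eventually overrun max_seq_len. One minimal way to handle this is to drop the oldest turns before each call to chat_completion. The trim_dialog helper below is my own sketch, not part of the repo; it assumes the repo's Tokenizer.encode(s, bos=..., eos=...) method and an arbitrary safety margin:

# Hypothetical helper: keep only the most recent turns so the prompt stays
# under max_seq_len. 'tokenizer' is assumed to be the repo's Tokenizer object;
# the 128-token margin is an arbitrary assumption to leave room for the reply.
from typing import List

def trim_dialog(dialog: List[dict], tokenizer, max_seq_len: int, margin: int = 128) -> List[dict]:
    budget = max_seq_len - margin
    while len(dialog) > 1:
        # Rough token count of the whole history (ignores special chat tokens)
        n_tokens = sum(
            len(tokenizer.encode(m["content"], bos=False, eos=False)) for m in dialog
        )
        if n_tokens <= budget:
            break
        # Drop the oldest user/assistant pair, always keeping the latest user turn
        dialog = dialog[2:]
    return dialog

In Mchat.py one could then call dialogs[0] = trim_dialog(dialogs[0], generator.tokenizer, max_seq_len) right before generator.chat_completion (assuming the built Llama object exposes its tokenizer as generator.tokenizer).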
Have fun!