langchain简版教程附案例

luxinfeng666

6392人浏览 · 2023-05-28 20:35:11

luxinfeng666 · 2023-05-28 20:35:11 发布

简介

LangChain是一个开源的应用开发框架。基于该开源框架，我们可以把大模型与各种工具结合从而实现各种功能，比如基本文档的问答，解析网页内容、查询表格数据等。目前支持Python和TypeScript两种编程语言。当前Python框架支持的模型和功能最全面。

Modules

按照官方wiki的描述，可以将Langchain的支持的功能划分为以下几个模块。

Models

该模块主要是集成了多个模型。主要分为三类：

Large Language Models (LLMs)

Large Language Models是指使用深度学习技术训练的大规模语言模型，例如GPT-3、BERT、XLNet等。这些模型具有超过数十亿个参数，可以通过学习大量的文本数据，自动学习自然语言的语法、语义和上下文等知识。这些模型可以用于各种自然语言处理任务，例如文本生成、文本分类、问答系统等。由于这些模型的规模和能力越来越大，它们在自然语言处理领域引起了广泛的关注和应用，同时也带来了一些挑战，例如模型的计算和存储成本、模型的可解释性等问题。这些模型可以作为最近特别火的问答模型的基座。

截至当前，langchain共提供了以下模型的接口。

Integrations — 🦜🔗 LangChain 0.0.176

Chat Models

即问答模型，如chatGPT3.5等，这些可以理解为大语言模型的一种应用。

截至当前，langchain共提供了以下问答模型的接口。

Integrations — 🦜🔗 LangChain 0.0.178

Text Embedding Models

该类模型的作用主要是将单词、短语或文本转换成连续向量空间。转换为向量空间后，我们即可对这些单词、短语或者文本在数学上进行比较和计算。这种比较和计算在自然语言处理和机器学习中经常被用于各种任务，例如文本分类、语义搜索、词语相似性计算等。

Text Embedding Models — 🦜🔗 LangChain 0.0.176

Pompts

第二个模块就是我们熟知的Prompt提示。

langchain本身提供了多种Prompt模板，比如ChatPromptTemplate模板，但是无论使用哪种模板，最终都是将拼接好的一段文字送入大语言模型。举个例子

template="You are a helpful assistant that translates {input_language} to {output_language}."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template="{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

部分源码如下

class SystemMessagePromptTemplate(BaseStringMessagePromptTemplate):
    def format(self, **kwargs: Any) -> BaseMessage:
        text = self.prompt.format(**kwargs)
        return SystemMessage(content=text, additional_kwargs=self.additional_kwargs)

其实际作用类似于Java中的String.format()

Memory

默认情况下，各类大模型都是无状态的，即没有上下文的概念。如果我们想在对话模型中实现上下文关联功能，就需要对聊天记录进行保存，然后每次请求时将之前的聊天内容传递给对话模型。langchain的Memory模块就是为了便于保存历史记录而设计的。langchain提供了多种保存方式，比如内存、向量数据库等。以下举了一个例子来说明Memory模块的方便性。

from langchain.llms import OpenAI
from langchain.chains import ConversationChain

llm = OpenAI(temperature=0)
conversation = ConversationChain(
    llm=llm, 
    verbose=True, 
    memory=ConversationBufferMemory()
)

conversation.predict(input="Hi there!")

如果要自己实现这部分的话，就需要考虑如何设计存储历史记录，会更麻烦些。

Indexes

索引是指文本文档的一种存储方式，正常的文本文件无法直接与语言模型进行交互，因此我们需要做一下转换。在langchain中，转换的功能就是由Indexes这个模块实现的。

Indexes模块又可继续细分，细分为Document Loaders（文件加载）、Text Splitters（文本切分）、VectorStores（向量存储）、Retrievers（检索）四部分。

在向量存储部分，langchain支持 ElasticSearch、• Redis以及• Tair等我们较为熟悉的数据库（需要IP直连）

当我们需要提取数据时，就可以使用langchain提供的Retrievers模块去进行数据提取。Retrievers模块支持多种与信息检索相关的算法，比如• TF-IDF、• ElasticSearch BM25等，也支持我们常见的额机器学习算法如• kNN、• SVM。

Chains

我们可以把一个任务理解为一个chain，langchain支持我们自定义拼装chain，组成一条任务链，也就是chains。

Agents

Agents我们可以理解为代理人，我们可以理解这个模块可以帮助我们将任务分解，即可以根据用户输入的不同产生不同的任务链（chains）,以下是一个官方案例。

from langchain.chat_models import ChatOpenAI
from langchain.agents import load_tools, initialize_agent, AgentType

llm = ChatOpenAI(temperature=0.0)
tools = load_tools(
    ["arxiv"], 
)

agent_chain = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

agent_chain.run(
    "What's the paper 1605.08386 about?",
)

输出为：

> Entering new AgentExecutor chain...
I need to use Arxiv to search for the paper.
Action: Arxiv
Action Input: "1605.08386"
Observation: Published: 2016-05-26
Title: Heat-bath random walks with Markov bases
Authors: Caprice Stanley, Tobias Windisch
Summary: Graphs on lattice points are studied whose edges come from a finite set of
allowed moves of arbitrary length. We show that the diameter of these graphs on
fibers of a fixed integer matrix can be bounded from above by a constant. We
then study the mixing behaviour of heat-bath random walks on these graphs. We
also state explicit conditions on the set of moves so that the heat-bath random
walk, a generalization of the Glauber dynamics, is an expander in fixed
dimension.
Thought:The paper is about heat-bath random walks with Markov bases on graphs of lattice points.
Final Answer: The paper 1605.08386 is about heat-bath random walks with Markov bases on graphs of lattice points.

> Finished chain.

快速上手

安装Python环境

安装Anaconda https://www.anaconda.com/download
安装jupyter notebook https://zhuanlan.zhihu.com/p/33105153
安装langchain pip install langchain.如果pip安装比较慢，可以参考https://www.runoob.com/w3cnote/pip-cn-mirror.html使用国内的镜像源，推荐清华源。

具体案例-实现一个基于本地文本的问答机器人

该案例是在Jupyter Notebook上开发完成的。

安装相关依赖

pip install openai  # 安装openai依赖库，需调用openai的模型
pip install faiss-cpu
pip install tiktoken

import os
from langchain.document_loaders import PyPDFLoader #加载PDF文件需要的工具类
from langchain.vectorstores import FAISS  # 开源向量库
from langchain.embeddings.openai import OpenAIEmbeddings #分词模型
from langchain.prompts import PromptTemplate #langchain提供的prompt工具

os.environ["OPENAI_API_KEY"] = "**********"

# 提取文本文件并存入向量库
loader = PyPDFLoader("/home/ubuntu/langchain/美团简要介绍.pdf")
pages = loader.load_and_split()
faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())

template = """answer the question base on {history} and {content}"""
prompt_template = PromptTemplate.from_template(template)

def answer_with_content(question):
    docs = faiss_index.similarity_search(question,k=2)
    result = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
        {"role": "system", "content": prompt_template.format(content=docs, history='')},
      {"role":"user", "content":question}
  ],
    temperature=0.2
    )
    print(result['choices'][0]['message']['content'])

测试输出

answer_with_content("美团什么时候收购的钱袋宝")
# 美团收购钱袋宝是在2016年9月份。

answer_with_content("今天的天气如何")
# 抱歉，我无法回答这个问题，因为我是一个语言模型，没有实时获取天气信息的能力。建议您查看天气预报或者问问当地的气象部门。

answer_with_content("OPENAI是什么公司")
# OpenAI是一家人工智能研究公司，成立于2015年，总部位于美国加利福尼亚州旧金山。该公司的目标是推动人工智能技术的发展，同时确保其对人类的利益和安全不会造成威胁。OpenAI的创始人包括伊隆·马斯克、萨姆·阿尔特曼、格雷戈里·科赫、伊莉莎白·霍姆斯和约翰·赫维兹。该公司的研究重点包括自然语言处理、计算机视觉、强化学习等领域。

GitCode 开源社区

旨在为数千万中国开发者提供一个无缝且高效的云端环境，以支持学习、使用和贡献开源项目。

更多推荐

[转载]在Windows环境下安装GNU Radio

转自：在Windows环境下安装GNURadio_恐弱智_新浪博客GNU Radio是用Python开发的，大部分开源的工程能够在Linux环境下运行良好，而Windows下却运行的很勉强，而且安装配置都很复杂。GNU Radio算是个例外了，不光提供了Windows的二进制安装，还有比较详细的说明。我是Python小白，所以折腾了好久才弄好，特意记录下来，免得以后再装还折腾。GNU Radio的

GitCode 开源社区

centOS 8 使用dnf安装Docker

DNF是什么？CentOS 8使用YUM软件包管理器版本v4.0.4。现在，该版本使用DNF(已删除YUM)。DNF是软件包管理器。它会在Linux发行版上安装，执行更新并删除软件包。使用DNF安装Docker跳过具有损坏依赖性的程序包一个有效的解决方案是使您的CentOS 8系统使用以下--nobest命令安装最符合条件的版本：sudo dnf install docker...

GitCode 开源社区

定时同步数据库表(mysql+linux+crontab)

sync.sh里面的参数需要改变，ip/username/password/database/tablesync.sh#!/bin/sh# Please change the IP and password of the data source db.# Then change the table name.filename=/home/nington/db/$(date +%Y-%m