生成式人工智能的提示词工程(一)
原文:
zh.annas-archive.org/md5/9a1c21a7dbef2177ea53bb1d89740457

译者:飞龙
前言
生成式 AI 创新的快速步伐承诺将改变我们的生活和工作的方式,但跟上它的步伐变得越来越困难。在 arXiv 上发表的AI 论文数量正在呈指数增长,Stable Diffusion是历史上增长最快的开源项目之一,AI 艺术工具Midjourney 的 Discord 服务器拥有数千万成员,甚至超过了最大的游戏社区。最吸引公众想象力的是 OpenAI 发布的 ChatGPT,两个月内达到 1 亿用户,使其成为历史上增长最快的消费应用。学会与 AI 合作迅速成为最受欢迎的技能之一。
每个在专业上使用 AI 的人很快就会意识到,输出的质量在很大程度上取决于你提供的输入。提示工程这一学科已经形成了一套最佳实践,用于提高 AI 模型的可靠性、效率和准确性。中国科技巨头百度的联合创始人兼首席执行官李彦宏(Robin Li)声称:“十年内,世界上一半的工作将与提示工程有关。”然而,我们预计提示将成为许多工作所需的一项技能,类似于精通 Microsoft Excel,而不是一个流行的职位名称。这股新的颠覆浪潮正在改变我们对计算机的所有认知。我们习惯于编写每次都返回相同结果的算法,但 AI 并非如此:它的响应是非确定性的。成本和延迟再次成为现实因素,而几十年来摩尔定律让我们对几乎零成本的实时计算习以为常。最大的障碍是这些模型会自信地编造事实(被称为幻觉),这迫使我们重新思考评估工作准确性的方式。
自 2020 年 GPT-3 测试版发布以来,我们一直在使用生成式 AI。随着模型的演进,许多早期的提示技巧和破解手段已不再必要,而一套一致的原则逐渐形成:这些原则在新模型上依然有效,并且同时适用于文本和图像生成。我们基于这些持久的原则撰写了本书,帮助您学习可迁移的技能,无论 AI 在未来五年如何发展,这些技能都将有用。正如 OpenAI 联合创始人山姆·奥特曼所断言的,与 AI 合作的关键不在于“在提示末尾添加一个改变一切的魔法词”这类破解手段,始终重要的是“想法的质量和对所求内容的理解”。虽然我们不知道五年后它是否还被称为“提示工程”,但有效地与生成式 AI 合作只会变得愈发重要。
本书所需的软件要求
本书中的所有代码都是用 Python 编写的,并设计为在 Jupyter Notebook 或 Google Colab notebook 中运行。书中教授的概念可以转移到 JavaScript 或任何其他编程语言,尽管本书的主要重点是提示技巧而不是传统的编程技能。代码可以在 GitHub 上找到,我们将在整个书中链接到相关的笔记本。强烈建议您使用 GitHub 仓库 并在阅读本书时运行提供的示例。
对于非笔记本示例,您可以在终端中使用 python content/chapter_x/script.py 的格式运行脚本,其中 x 是章节号,script.py 是脚本的名称。在某些情况下,需要将 API 密钥设置为环境变量,我们将在适当的地方说明。使用的包会频繁更新,因此在运行代码示例之前,请在虚拟环境中安装我们的 requirements.txt。
requirements.txt 文件是为 Python 3.9 生成的。如果您想使用不同的 Python 版本,可以通过运行以下命令,从 GitHub 仓库中的 requirements.in 文件生成新的 requirements.txt:
`pip install pip-tools`
`pip-compile requirements.in`
对于 Mac 用户:
- 打开终端:您可以在应用程序文件夹中的实用工具下找到终端应用程序,或者使用 Spotlight 搜索它。
- 导航到您的项目文件夹:使用 cd 命令将目录切换到您的项目文件夹。例如:cd path/to/your/project。
- 创建虚拟环境:使用以下命令创建名为 venv 的虚拟环境(您可以将其命名为任何名称):python3 -m venv venv。
- 激活虚拟环境:在安装包之前,您需要激活虚拟环境。使用命令 source venv/bin/activate 来完成此操作。
- 安装包:现在您的虚拟环境已激活,您可以使用 pip 安装包。要从 requirements.txt 文件安装包,请使用 pip install -r requirements.txt。
- 退出虚拟环境:完成工作后,您可以通过输入 deactivate 来退出虚拟环境。
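上述步骤可以汇总为如下终端命令序列(示意;假设当前目录即为你的项目文件夹,且已安装 Python 3):

```shell
# 创建并使用虚拟环境的示意命令(macOS/Linux)
python3 -m venv venv                  # 创建名为 venv 的虚拟环境
. venv/bin/activate                   # 激活(Windows 下为 .\venv\Scripts\activate)
# pip install -r requirements.txt     # 如项目中存在 requirements.txt,则安装依赖
deactivate                            # 退出虚拟环境
```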
对于 Windows 用户:
- 打开命令提示符:您可以在开始菜单中搜索 cmd。
- 导航到您的项目文件夹:使用 cd 命令将目录切换到您的项目文件夹。例如:cd path\to\your\project。
- 创建虚拟环境:使用以下命令创建名为 venv 的虚拟环境:python -m venv venv。
- 激活虚拟环境:在 Windows 上激活虚拟环境,请使用 .\venv\Scripts\activate。
- 安装包:在虚拟环境激活状态下,安装所需的包:pip install -r requirements.txt。
- 退出虚拟环境:要退出虚拟环境,只需输入:deactivate。
这里有一些关于设置的额外提示:
- 总是确保你的 Python 是最新的,以避免兼容性问题。
- 记得每次在项目上工作时都要激活你的虚拟环境。
- requirements.txt 文件应该位于你创建虚拟环境的同一目录中;否则在使用 pip install -r 时,你需要指定其路径。
本书假设你拥有 OpenAI 开发者账户:在任何导入 OpenAI 库(我们使用 1.0 版本)的示例中,都必须将 OPENAI_API_KEY 设置为环境变量。设置开发环境的快速入门指南可以在 OpenAI 网站的官方文档中找到。
你还必须确保你的 OpenAI 账户中启用了计费,并且已附加有效的支付方式以运行书中的一些代码。书中未特别说明时使用 GPT-4,尽管我们简要介绍了 Anthropic 的竞争产品Claude 3 模型,以及 Meta 的开源Llama 3和Google Gemini。
对于图像生成,我们使用 Midjourney(你需要一个 Discord 账户来注册),尽管这些原则同样适用于 DALL-E 3(通过 ChatGPT Plus 订阅或 API 获得)或 Stable Diffusion(以 API 形式提供;如果你的电脑有 GPU,也可以在本地运行)。本书中的图像生成示例使用 Midjourney v6、Stable Diffusion v1.5(因为许多扩展仍然只与这个版本兼容)或 Stable Diffusion XL,并且在差异重要时我们会加以说明。
我们尽可能使用开源库提供示例,尽管在适当的情况下我们也包括商业供应商——例如,第五章关于向量数据库的章节展示了开源库 FAISS 和付费供应商 Pinecone。书中展示的示例应该可以轻松修改以适应不同的模型和供应商,并且所教授的技能是可迁移的。第四章关于高级文本生成专注于 LLM 框架 LangChain,而第九章关于高级图像生成基于 AUTOMATIC1111 的开源 Stable Diffusion Web UI。
本书使用的约定
本书使用以下排版约定:
斜体
表示新术语、URL、电子邮件地址、文件名和文件扩展名。
常宽字体
用于程序列表,以及段落中引用程序元素,如变量或函数名、数据库、数据类型、环境变量、语句和关键字。
常宽粗体
显示用户应逐字输入的命令或其他文本。
常宽斜体
显示应替换为用户提供的值或由上下文确定的值的文本。
小贴士
此元素表示一个提示或建议。
注意
此元素表示一般性说明。
警告
此元素表示警告或注意。
在整本书中,我们强化了我们所说的五个提示原则,确定哪个原则最适合当前示例。您可能想参考第一章,其中详细描述了这些原则。
原则名称
这将解释原则是如何应用于当前示例或文本部分的。
使用代码示例
补充材料(代码示例、练习等)可在https://oreil.ly/prompt-engineering-for-generative-ai下载。
如果您对代码示例有技术问题或使用上的问题,请发送电子邮件至bookquestions@oreilly.com。
本书旨在帮助您完成工作。一般来说,如果本书提供了示例代码,您可以在您的程序和文档中使用它。除非您正在复制代码的很大一部分,否则您不需要联系我们获得许可。例如,编写一个使用本书中几个代码片段的程序不需要许可。通过引用本书并引用示例代码来回答问题不需要许可。将本书的大量示例代码纳入您产品的文档中需要许可。
我们感谢署名,但通常并不要求。署名通常包括标题、作者、出版社和 ISBN。例如:“《生成式 AI 的提示工程》由詹姆斯·菲尼克斯和迈克·泰勒(O’Reilly)著。版权所有 2024 萨克斯弗拉吉,LLC 和 Just Understanding Data LTD,978-1-098-15343-4。”
如果您认为您对代码示例的使用超出了合理使用或上述许可,请随时联系我们permissions@oreilly.com。
O’Reilly 在线学习
注意
40 多年来,O’Reilly Media一直为科技公司提供技术和商业培训、知识和洞察力,以帮助公司成功。
我们独特的专家和创新者网络通过书籍、文章和我们的在线学习平台分享他们的知识和专业知识。O’Reilly 的在线学习平台为您提供按需访问实时培训课程、深入的学习路径、交互式编码环境以及来自 O’Reilly 和 200 多家其他出版商的大量文本和视频。更多信息,请访问https://oreilly.com。
如何联系我们
请将有关本书的评论和问题寄给出版社:
- O’Reilly Media, Inc.
- 1005 Gravenstein Highway North
- Sebastopol, CA 95472
- 800-889-8969(美国或加拿大)
- 707-827-7019(国际或本地)
- 707-829-0104(传真)
- support@oreilly.com
我们为这本书有一个网页,上面列出了勘误表、示例和任何其他附加信息。您可以通过https://oreil.ly/prompt-engineering-generativeAI访问此页面。
想了解我们书籍和课程的相关新闻和信息,请访问https://oreilly.com。
在 LinkedIn 上找到我们:https://linkedin.com/company/oreilly-media。
在 YouTube 上关注我们:https://youtube.com/oreillymedia。
致谢
我们想感谢以下人员在本书技术审阅中的贡献以及他们在纠正快速变化目标时的耐心:
- Mayo Oshin,早期 LangChain 贡献者,SeinnAI Analytics 创始人
- Ellis Crosby,Scarlett Panda 创始人及 AI 代理机构 Incremen.to 创始人
- Dave Pawson,O’Reilly 出版社《XSL-FO》作者
- Mark Phoenix,高级软件工程师
- Aditya Goel,GenAI 顾问
- Sanyam Kumar,Genmab 数据科学部副总监
- Lakshmanan Sethu,Google Gen AI Solutions 的 TAM
- Janit Anjaria,Aurora Innovation Inc. 员工,TLM
我们也感谢家人的耐心和理解,并想向他们保证:比起与 ChatGPT 交谈,我们仍然更喜欢与他们交谈。
第一章. 提示的五项原则
提示工程是发现可靠产生有用或所需结果的提示的过程。
提示是您在与 ChatGPT 或 Midjourney 等 AI 模型交互时提供的输入,通常是文本。提示作为模型使用的一组指令,用于预测所需的响应:来自大型语言模型(LLMs)如ChatGPT的文本,或来自扩散模型如Midjourney的图像。
这里是一个产品名称生成器提示输入的简单示例(灵感来源于OpenAI 的示例之一),以及 ChatGPT 生成的结果。
输入:
Can I have a list of product names for a pair of shoes that
can fit any foot size?
输出:
Certainly! Here's a list of product names for a pair of shoes
that can fit any foot size:
UniFit SoleStrides
FlexiSize All-Fit Shoes
OmniFit FootFlex
...
AnyStep AdaptiWalkers
FlexiRange UniversalSteps
PerfectGlide MultiFit Shoes
这已经是对一个天真提示的非凡响应,感觉像魔法,因为我们几乎不费吹灰之力就做到了。随着最先进模型的改进,您在第一次尝试就获得足够好的结果的可能性增加。对于任何与 AI 的临时互动,您不打算再次执行相同任务,天真方法就足够了。
然而,如果您计划将此提示投入生产,就值得投入更多精力来确保其正确性。错误会让您付出代价:OpenAI 按提示和响应的长度收费,修复错误也要耗费时间。如果您正在构建一个拥有数千用户的名称生成器,有一些明显的问题值得尝试解决:
模糊的方向
您没有向 AI 说明您想要的名称风格或它应该具有的属性。您想要一个单词还是组合?单词可以随意创造,还是它们必须是真正的英语?您想要 AI 模仿您钦佩的、以优秀产品名称而闻名的人?
未格式化的输出
您正在逐行返回一个分隔的名称列表,长度不固定。当您多次运行此提示时,您会看到有时它会返回一个编号列表,并且经常在开头有文本,这使得它难以程序化解析。
缺少示例
您没有给出任何关于好名字样子的例子。它正在使用训练数据的平均值自动补全,即整个互联网(及其固有的偏见),但这真的是您想要的吗?理想情况下,您会提供成功名称的例子,行业中的常见名称,或者甚至只是您喜欢的其他名称。
评估有限
您没有一致或可扩展的方式来定义哪些名称是好是坏,因此您必须手动审查每个响应。如果您可以建立评分系统或其他形式的测量,您就可以优化提示以获得更好的结果,并确定它失败了多少次。
没有任务划分
你在这里对单个提示的要求很多:产品命名涉及到许多因素,而这个重要的任务被天真地一次性外包给了 AI,没有任何任务专业化或了解它如何为你处理这个任务的可见性。
解决这些问题是我们在这本书中使用的核心原则的基础。有许多不同的方式可以要求 AI 模型执行相同的任务,即使是微小的变化也可能产生很大的差异。LLMs 通过连续预测下一个标记(大约是四分之三的单词)来工作,从你的提示开始。每个新标记都是根据其出现的概率来选择的,其中包含一定的随机性(由温度参数控制)。如图 1-1 所示,单词鞋子在名称AnyFit开始后的出现概率较低(0.88%),而更可预测的响应将是运动型(72.35%)。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0101.png
图 1-1. 响应如何分解为标记
大型语言模型(LLMs)本质上是在整个互联网的文本上训练的,然后再经过微调以给出有用的响应。平均水平的提示将返回平均水平的响应,这让一些人在结果未达预期时感到失望。你放入提示中的内容会改变每个生成单词的概率,因此对你得到的结果有很大影响。这些模型见过人类产出中最好和最坏的东西,只要你以正确的方式提问,它们几乎可以模仿任何风格。OpenAI 根据提示和响应中使用的标记数量收费,因此提示工程师需要在成本、质量和可靠性之间优化提示,让这些标记物有所值。
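为了更直观地理解“按概率选择下一个标记,并由温度参数控制随机性”,下面给出一个极简的 Python 草图。其中的候选标记及其 logits 纯属虚构,仅作演示,并非任何真实模型的输出:

```python
import math
import random

def sample_next_token(token_logits, temperature=1.0):
    """按温度缩放后的 softmax 概率随机抽取下一个标记。

    温度越低,分布越尖锐(输出越确定);温度越高,随机性越大。
    """
    scaled = {t: l / temperature for t, l in token_logits.items()}
    max_logit = max(scaled.values())
    exps = {t: math.exp(v - max_logit) for t, v in scaled.items()}
    total = sum(exps.values())
    probs = {t: e / total for t, e in exps.items()}
    r = random.random()
    cumulative = 0.0
    for token, p in probs.items():
        cumulative += p
        if r <= cumulative:
            return token
    return token  # 浮点误差兜底

# 虚构的候选标记及 logits:"athletic" 远比 "shoes" 更可能被选中
candidates = {"athletic": 5.0, "shoes": 1.0, "banana": -2.0}
next_token = sample_next_token(candidates, temperature=0.7)
```

多次运行会得到不同的标记,但高 logit 的标记被选中的频率明显更高,这正是模型响应非确定性的来源。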
这里是应用了几个提示工程技术的相同示例。我们要求以史蒂夫·乔布斯的方式提供名字,并说明我们想要一个以逗号分隔的列表,并提供完成任务做得好的示例。
输入:
Brainstorm a list of product names for a shoe that fits any
foot size, in the style of Steve Jobs.
Return the results as a comma-separated list, in this format:
Product description: A shoe that fits any foot size
Product names: [list of 3 product names]
## Examples
Product description: A refrigerator that dispenses beer
Product names: iBarFridge, iFridgeBeer, iDrinkBeerFridge
Product description: A watch that can tell accurate time in
space
Product names: iNaut, iSpace, iTime
Product description: A home milkshake maker
Product names: iShake, iSmoothie, iShake Mini
输出:
Product description: A shoe that fits any foot size
Product names: iFitFoot, iPerfectFit, iShoeSize
虽然没有任何提示是完美的,但这个提示被优化以可靠地以正确的格式提供坚实的商品名称。你的产品名称生成器的用户可以选择除史蒂夫·乔布斯之外的人来获取他们喜欢的名字,如果需要,他们可以更改响应格式,并且这个提示的输出可以成为另一个输入。最后,你可以根据用户反馈定期更新你在提示中使用的示例,使你的系统随着时间的推移变得更智能。
提示五原则概述
优化此提示的过程遵循提示的五个原则,我们将使用本章节的其余部分来剖析这些原则,并在整本书中回顾。它们与我们在讨论天真文本提示时提出的五个问题完全对应。您将在整本书的其余部分找到对这些原则的引用,以帮助您了解它们在实际中的应用。提示的五个原则如下:
指明方向
详细描述期望的风格,或参考相关角色
指定格式
定义需要遵循的规则以及响应的所需结构
提供示例
插入一组多样化的测试案例,其中任务执行正确
评估质量
识别错误并评估响应,测试驱动性能的因素
分工合作
将任务拆分为多个步骤,以实现复杂目标
这些原则不是短暂的技巧或捷径,而是普遍接受的、适用于任何智能水平(生物或人工)的惯例。这些原则是模型无关的,并且无论您使用哪种生成文本或图像模型,都应该有助于改进您的提示。我们首次在 2022 年 7 月发布的博客文章“提示工程:从文字到艺术和文案”中发布这些原则,并且它们经受了时间的考验,包括与 OpenAI 一年后发布的提示工程指南非常接近。任何与生成 AI 模型紧密合作的人可能会汇聚到解决常见问题的相似策略,并在整本书中,您将看到数百个如何使用这些策略来改进提示的示范性例子。
我们为文本和图像生成提供了可下载的单页指南,您可以在应用这些原则时将其作为清单使用。这些指南是为我们流行的 Udemy 课程AI 训练营的完整提示工程(70,000+ 学生)制作的,该课程基于相同的原则,但材料与本书不同。
为了展示这些原则同样适用于提示图像模型,让我们使用以下示例,并解释如何将提示的五个原则应用于这个特定场景。将整个输入提示复制并粘贴到 Discord 的 Midjourney Bot 中,包括开头到图像链接,在键入**/imagine**以触发提示框出现之后(需要免费Discord账户和付费Midjourney账户)。
输入:
https://s.mj.run/TKAsyhNiKmc stock photo of business meeting
of 4 people watching on white MacBook on top of glass-top
table, Panasonic, DC-GH5
图 1-2 显示了输出。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0102.png
图 1-2. 商务会议的股票照片
这个提示利用了 Midjourney 将基础图像作为示例的能力:先将图像上传到 Discord,再将 URL 复制粘贴到提示中(https://s.mj.run/TKAsyhNiKmc),这里使用的是来自 Unsplash 的免费图片(图 1-3)。如果你在运行提示时遇到错误,请尝试自己上传图像,并查看 Midjourney 的文档以了解任何格式更改。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0103.png
图 1-3. 由 Mimi Thian 在Unsplash拍摄的照片
让我们比较一下这个精心设计的提示与你在以最简单的方式请求股票照片时从 Midjourney 得到的结果。图 1-4 显示了没有提示工程时得到的一个例子,这个图像比通常预期的股票照片风格更暗、更具有风格化。
输入:
people in a business meeting
图 1-4 显示了输出。
尽管在 Midjourney v5 及以后的版本中这个问题不太突出,但社区反馈机制(当用户选择将图像调整到更高分辨率时,这个选择可能会用于训练模型)据报道已经使模型偏向于幻想美学,这对于股票照片用例不太合适。Midjourney 的早期用户来自数字艺术界,自然倾向于幻想和科幻风格,即使这种美学不适合,这种风格也会反映在模型的结果中。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0104.png
图 1-4. 商务会议中的人们
在这本书中使用的示例将与 ChatGPT Plus(GPT-4)作为文本模型和 Midjourney v6 或 Stable Diffusion XL 作为图像模型兼容,尽管如果重要我们会进行说明。这些基础模型是当前最先进的技术,擅长各种任务。原则旨在尽可能保证未来兼容性,所以如果你在 GPT-5、Midjourney v7 或 Stable Diffusion XXL 发布时阅读这本书,或者如果你使用的是其他供应商如 Google,这里学到的一切仍然应该是有用的。
1. 给出方向
之前讨论的简单文本提示的一个问题是,它没有向 AI 说明你想要的产品名称的类型。在某种程度上,给产品命名是一项主观的活动,如果不给 AI 一个你喜欢的名称的想法,它猜对的概率很低。
顺便说一句,没有良好的简报,人类也很难完成这项任务,这就是为什么创意和品牌代理机构需要从客户那里获得任何任务的详细简报。
小贴士
虽然这不是完美的映射,但可以想象人类可能需要什么样的上下文来完成这项任务,并尝试将其包含在提示中。
在我们给出的示例提示中,我们通过使用角色扮演来给出方向,在那个例子中是模仿史蒂夫·乔布斯,他因标志性地为产品命名而闻名。如果你将提示的这个方面改为训练数据中著名的人物(以及将示例与正确的风格匹配),你会得到截然不同的结果。
输入:
Brainstorm a list of product names for a shoe that fits any
foot size, in the style of Elon Musk.
Return the results as a comma-separated list, in this format:
Product description: A shoe that fits any foot size
Product names: [list of 3 product names]
## Examples
Product description: A refrigerator that dispenses beer
Product names: Teslacool, FridgeX, Neuradrink
Product description: A watch that can tell accurate time in
space
Product names: WatchX, iSpace, The Watch Company
Product description: A home milkshake maker
Product names: ShakeX, The Milkshake Company, HyperShake
输出:
ShoeX, MarsFit, The Shoe Company
也有一些规则或最佳实践是你应该遵循的,这些可以包含在提示中作为上下文,以引导 AI 向一个可行的名字发展。这种技术有时被称为预热或内部检索,它简单但有效 (刘等人,2021)。开始对话时请求最佳实践建议,然后要求它遵循自己的建议,这会有很大帮助。实际上,你是在用它来生成自己的方向。
输入:
Please give me 5 tips for naming products based on expert
industry advice.
输出:
Certainly! Naming a product is crucial as it often gives the
first impression to potential customers. Here are five expert
tips to guide you in naming products:
1\. **Simplicity & Memorability**: Opt for names that are
short, easy to spell, and memorable. Long, complicated names
can be difficult for customers to recall. Remember, the
easier the name is to remember, the more likely it is that
consumers will think of your product when making a purchase
decision.
...
5\. **Positive Connotations**: Names that evoke positive
emotions or imagery can create a favorable perception of
your product. Avoid names that have negative or
controversial associations in any of the primary languages
or cultures where you intend to do business. It's often
worthwhile to conduct linguistic checks and market testing
to ensure there aren't unintended meanings.
Lastly, once you've shortlisted a few names, it's a good
idea to run them by diverse focus groups or conduct surveys
to gauge reactions. Also, ensure that the name you choose is
available for trademarking and doesn't infringe on existing
trademarks.
然后在同一个聊天窗口中,模型拥有它之前给出的建议的上下文,你提出你想要完成的任务的初始提示。
输入:
Using this advice, can I have a list of product names for a
pair of shoes that can fit any foot size?
输出:
Certainly! Based on the criteria given:
1\. **Simplicity & Memorability**
2\. **Relevance & Descriptiveness**
3\. **Differentiation & Uniqueness**
4\. **Adaptable & Scalable**
5\. **Positive Connotations**
Here's a list of potential product names for shoes that can
fit any foot size:
1\. **FitAll Walkers**: A simple and memorable name that
instantly suggests a universal fit.
...
10\. **TrueFit Universals**: Emphasizing genuine comfort and
fit for all.
Before finalizing any name, remember to check for trademarks
to ensure there's no conflict with existing products or
brands.
在我们的经验中,另一个富有成效的策略是将你想要完成的任务的最佳建议提取出来,并将其上下文插入到提示中。例如,你可以采用Brandwatch 的 5 条黄金法则来命名产品或你找到的另一个可信的外部资源,并将其作为上下文插入到提示中。这将显著增加提示的长度,这会花费更多的钱(当作为开发者使用 API 时),但如果响应的质量有所提高,这可能值得权衡。
输入:
Can I have a list of product names for a pair of shoes that
can fit any foot size? Please follow the five rules below:
1\. It should be readable and writable.
If your product name is hard to pronounce, people won’t talk about it,
and if they can’t write it down (and spell it correctly!)
when they hear it, how do you expect them to Google it?
Keep it simple and don’t go with any wacky spellings just for the sake of it.
...
5\. It should evoke an emotion, feeling, or idea.
Your product name should tie back into what your product is, what the
feeling you want people to have when experiencing your product is, and/or
what idea are you trying to get across.
It should be emotive and inspiring.
输出:
1\. FlexFit Footwear
2\. OmniSize Sneakers
3\. AdaptStride Shoes
...
18\. OmniComfort Kicks
19\. FlexSize Footwear
20\. Boundless Soles
提供方向的方法有很多。在图像生成示例中,通过指定商务会议是在一个玻璃桌周围举行的来给出方向。如果你只改变这个细节,你可以得到一个完全不同的图像,如图 1-5 中详细说明的那样。
输入:
https://s.mj.run/TKAsyhNiKmc stock photo of business meeting
of four people gathered around a campfire outdoors in the
woods, Panasonic, DC-GH5
图 1-5 显示了输出。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0105.png
图 1-5. 树林中篝火旁商务会议的股票照片
角色扮演对于图像生成也很重要,你可以通过提供要模仿的艺术家或艺术风格的名字来给 Midjourney 提供方向。在 AI 艺术界中,梵高是一个经常被提及的艺术家,他以其大胆、戏剧性的笔触和生动的色彩运用而闻名。看看当你将他的名字包含在提示中时会发生什么,如图 1-6 所示。
输入:
people in a business meeting, by Van Gogh
图 1-6 显示了输出。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0106.png
图 1-6. 梵高风格的商务会议中的人物
要使最后一个提示生效,你需要删除很多其他指导:例如,移除基础图像、“股票照片”一词以及相机型号“Panasonic, DC-GH5”,都有助于引入梵高的风格。你可能会遇到的问题是,指导过多时,模型很快就会碰到它无法调和的冲突组合。如果你的提示过于具体,训练数据中可能没有足够的样本来生成同时满足所有标准的图像。在这种情况下,你应该选择更重要的元素(本例中是梵高),并据此做出取舍。
指导是使用最普遍和最广泛的原则之一。它可以采取简单地使用正确的描述性词语来阐明你的意图,或者模仿相关商业名人的形象。虽然过多的指导可能会限制模型的创造力,但指导不足是更常见的问题。
2. 指定格式
AI 模型是通用的翻译器。这不仅意味着从法语翻译成英语,或从乌尔都语翻译成克林贡语,还包括在数据结构之间,如从 JSON 到 YAML,或从自然语言到 Python 代码之间的转换。这些模型能够以几乎任何格式返回响应,因此提示工程的重要部分是找到指定你想要响应的格式的方法。
有时你会发现相同的提示会返回不同的格式,例如,是编号列表而不是逗号分隔列表。这通常不是什么大问题,因为大多数提示都是一次性的,并且是在 ChatGPT 或 Midjourney 中输入的。然而,当你将 AI 工具集成到生产软件中时,偶尔的格式变化可能会导致各种错误。
就像与人类合作一样,你可以通过提前指定你期望的响应格式来避免浪费精力。对于文本生成模型来说,输出 JSON 而不是简单的有序列表通常很有帮助,因为那是 API 响应的通用格式,这使得解析和查找错误更加简单,同时也可以用来渲染应用程序的前端 HTML。YAML 也是另一个流行的选择,因为它强制执行可解析的结构,同时仍然简单且易于阅读。
在你给出的原始提示中,你通过提供的示例和提示末尾的冒号来指示应该直接在行内完成列表。要将格式转换为 JSON,你需要更新这两处,并留下 JSON 未完成,这样 GPT-4 就会知道需要完成它。
输入:
Return a comma-separated list of product names in JSON for
"A pair of shoes that can fit any foot size.".
Return only JSON.
Examples:
[{
"Product description": "A home milkshake maker.",
"Product names": ["HomeShaker", "Fit Shaker",
"QuickShake", "Shake Maker"]
},
{
"Product description": "A watch that can tell
accurate time in space.",
"Product names": ["AstroTime", "SpaceGuard",
"Orbit-Accurate", "EliptoTime"]}
]
输出:
[
{
"Product description": "A pair of shoes that can \
fit any foot size.",
"Product names": ["FlexFit Footwear", "OneSize Step",
"Adapt-a-Shoe", "Universal Walker"]
}
]
我们得到的输出是包含产品名称的完整 JSON,随后可以被解析并以编程方式用于应用程序或本地脚本。从这里开始,用 Python 标准库 json 之类的解析器检查格式错误也很容易:损坏的 JSON 会导致解析错误,可以作为重试提示或在继续下一步之前人工排查的触发器。如果仍然得不到正确的格式,可以在提示的开头或结尾(或在使用聊天模型时在系统消息中)加以说明,例如“你是一个只以 JSON 格式响应的有用助手”,或者在模型参数中指定 JSON 输出(这在 Llama 模型中被称为语法)。
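用标准库 json 检查格式错误的做法可以草拟如下:解析失败时返回 None,作为重试提示或人工排查的触发信号(函数名为示意):

```python
import json

def parse_or_flag(raw_response):
    """尝试解析模型返回的 JSON;损坏的 JSON 会触发 JSONDecodeError,此处返回 None 作为重试信号。"""
    try:
        return json.loads(raw_response)
    except json.JSONDecodeError:
        return None

good = parse_or_flag('[{"Product names": ["FlexFit Footwear", "OneSize Step"]}]')
bad = parse_or_flag("Sure! Here are some names: FlexFit, OneSize")  # 模型跑题时的典型输出
```

在生产管道中,可以在 bad 为 None 时自动重发提示,或记录原始响应以便调试。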
小贴士
如果你不太熟悉 JSON,W3Schools 有一个很好的介绍。
对于图像生成模型来说,格式非常重要,因为修改图像的机会几乎是无穷无尽的。这些格式从明显的类型,如股票照片、插画和油画,到更不寻常的类型,如行车记录仪视频、冰雕,或者在《我的世界》中(见图 1-7)。
输入:
business meeting of four people watching on MacBook on top of
table, in Minecraft
图 1-7 显示了输出。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0107.png
图 1-7. 《我的世界》中的商务会议
在设置格式时,通常需要删除可能与指定格式冲突的其他提示方面。例如,如果你提供的是股票照片的基础图像,结果将是股票照片和你想要的格式的某种组合。在一定程度上,图像生成模型可以推广到他们在训练集中之前未见过的新的场景和组合,但根据我们的经验,无关元素的层次越多,得到不合适图像的可能性就越大。
第一原则和第二原则之间往往存在一些重叠,即给出方向和指定格式。后者是关于定义你想要的输出类型,例如 JSON 格式,或股票照片的格式。前者是关于你想要的响应风格,独立于格式,例如以史蒂夫·乔布斯风格的产品名称,或梵高风格的商务会议图像。当风格和格式发生冲突时,通常最好通过删除对最终结果不那么重要的元素来解决。
3. 提供示例
原始提示没有给出你认为好的名称样例。因此,响应近似于互联网的平均水平,你可以做得更好。研究人员将没有示例的提示称为零样本,当 AI 甚至能够零样本完成任务时,这总是一个令人愉快的惊喜:这是强大模型的标志。如果你提供零个示例,你是在索取很多而回报很少。即使提供一个示例(单样本)也能大大帮助,研究人员通常测试模型在多个示例(少样本)下的表现是常态。这样一项著名的研究是 GPT-3 论文“语言模型是少样本学习者”,其结果在图 1-8 中展示,显示添加一个示例与提示结合可以提高某些任务的准确性,从 10%提高到近 50%!
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0108.png
图 1-8. 上下文中的示例数量
当向同事简要介绍一项新任务或培训初级员工时,自然地你会包括该任务之前做得好的示例。与 AI 合作也是如此,提示的强度通常取决于所使用的示例。提供示例有时比试图解释你为什么喜欢那些示例要容易,因此当你在尝试完成的任务的主题领域不是领域专家时,这种技术最为有效。你可以在提示中放入的文本量是有限的(截至写作时,Midjourney 上大约有 6,000 个字符,ChatGPT 免费版本大约有 32,000 个字符),因此提示工程的大部分工作涉及选择和插入多样且富有教育意义的示例。
在可靠性和创造力之间有一个权衡:超过三个到五个示例,你的结果将变得更加可靠,但会牺牲创造力。你提供的示例越多,它们之间的多样性越少,响应将越受限于匹配你的示例。如果你将前一个提示中的所有示例都改为动物名称,这将强烈影响响应,可靠地只返回包含动物的名称。
输入:
Brainstorm a list of product names for a shoe that fits any
foot size.
Return the results as a comma-separated list, in this format:
Product description: A shoe that fits any foot size
Product names: [list of 3 product names]
## Examples:
Product description: A home milkshake maker.
Product names: Fast Panda, Healthy Bear, Compact Koala
Product description: A watch that can tell accurate time in
space.
Product names: AstroLamb, Space Bear, Eagle Orbit
Product description: A refrigerator that dispenses beer
Product names: BearFridge, Cool Cat, PenguinBox
输出:
Product description: A shoe that fits any foot size
Product names: FlexiFox, ChameleonStep, PandaPaws
当然,这也存在错过返回一个更适合有限空间内 AI 发挥的更好名称的风险。示例的多样性和变化不足也是处理边缘情况或罕见场景的问题。包括一到三个示例很容易,并且几乎总是有积极的效果,但超过这个数量,就变得必须实验包含的示例数量以及它们之间的相似性。有证据(Hsieh et al., 2023)表明,给出方向比提供示例更有效,而且通常收集好的示例并不简单,因此通常明智的做法是首先尝试给出方向的原则。
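提示中的示例往往需要随用户反馈定期更换,因此用代码从示例列表拼接少样本提示会很方便。下面是一个草图(示例内容取自正文,拼接格式为示意):

```python
def build_few_shot_prompt(task, examples):
    """将 (产品描述, 名称列表) 形式的示例拼接成少样本提示。"""
    lines = [task, "", "## Examples"]
    for description, names in examples:
        lines.append(f"Product description: {description}")
        lines.append("Product names: " + ", ".join(names))
    return "\n".join(lines)

examples = [
    ("A home milkshake maker.", ["Fast Panda", "Healthy Bear", "Compact Koala"]),
    ("A watch that can tell accurate time in space.",
     ["AstroLamb", "Space Bear", "Eagle Orbit"]),
]
prompt = build_few_shot_prompt(
    "Brainstorm a list of product names for a shoe that fits any foot size.",
    examples,
)
```

把示例存成数据而不是硬编码进提示字符串,也便于后面按反馈评分筛选、轮换示例。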
在图像生成领域,提供示例通常是通过在提示中提供一个基础图像来实现的,在开源 Stable Diffusion 社区中被称为 img2img。根据所使用的图像生成模型,这些图像可以作为模型生成内容的起点,这对结果有很大影响。你可以保持提示的所有内容不变,但将提供的基图像替换为截然不同的图像,从而产生不同的效果,如图 1-9 所示。
输入:
stock photo of business meeting of 4 people watching on
white MacBook on top of glass-top table, Panasonic, DC-GH5
图 1-9 显示了输出。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0109.png
图 1-9. 四人商务会议的股票照片
在这种情况下,通过替换图 1-10 中显示的图像,也是来自 Unsplash 的,你可以看到模型被引导到不同的方向,并且现在包含了白板和便利贴。
警告
这些示例展示了图像生成模型的能力,但我们在上传用于提示的基础图像时要谨慎。检查你计划上传并用作提示基图像的图像的许可,并避免使用明显受版权保护的照片。这样做可能会让你陷入法律纠纷,并且违反所有主要图像生成模型提供商的服务条款。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0110.png
图 1-10. Jason Goodman 在Unsplash上的照片
4. 评估质量
到目前为止,还没有反馈循环来判断你响应的质量,除了通过运行提示并查看结果的基本试错方法,这被称为盲提示。当你的提示仅用于临时执行单一任务且很少再次访问时,这是可以接受的。然而,当你多次重用相同的提示或构建依赖于提示的生产应用程序时,你需要更严格地衡量结果。
评估性能的方法有很多,这主要取决于你希望完成哪些任务。当一个新的 AI 模型发布时,重点往往在于模型在评估(评估)上的表现如何,这是一个标准化的问题集,具有预定义的答案或评分标准,用于测试模型间的性能。不同的模型在不同类型的任务上表现不同,不能保证之前有效的提示在新模型上也能很好地转换。OpenAI 已经将其用于基准测试 LLM 性能的 evals 框架开源,并鼓励其他人贡献额外的评估模板。
除了标准的学术评估外,还有一些更具新闻价值的测试,例如 GPT-4 通过了律师资格考试。对于更主观的任务,评估可能很困难,对于较小的团队来说可能耗时或成本高昂。在某些情况下,研究人员已经转向使用更先进的模型,如 GPT-4,来评估来自较不复杂的模型的响应,正如在 Vicuna-13B 的发布中所做的那样,这是一个基于 Meta 的 Llama 开源模型的微调模型(参见图 1-11)。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0111.png
图 1-11. Vicuna GPT-4 评估
在撰写科学论文或评估新的基础模型发布时,需要更严格的评估技术,但通常你只需要比基本的试错多走一步。你可能会发现,在 Jupyter Notebook 中实现的简单点赞/点踩评分系统,就足以在不增加太多开销的情况下提高提示优化的严谨性。一个常见的测试是看看提供示例是否值得额外的提示长度成本,或者不提供示例是否也能应付。第一步是获取每个提示多次运行的响应并将它们存储在电子表格中,我们将在设置好环境后进行此操作。
你可以使用 pip install openai 安装 OpenAI Python 包。如果你遇到与此包的兼容性问题,请创建一个虚拟环境并安装我们的requirements.txt(请参阅前言中的说明)。
要使用 API,你需要创建一个 OpenAI 账户,然后在此处导航以获取你的 API 密钥。
警告
由于安全原因,不建议在脚本中硬编码 API 密钥。相反,利用环境变量或配置文件来管理你的密钥。
一旦你有了 API 密钥,至关重要的是通过执行以下命令将其分配为环境变量,将 api_key 替换为你的实际 API 密钥值:
export OPENAI_API_KEY="api_key"
或者,在 Windows 上:
set OPENAI_API_KEY=api_key
或者,如果你不想预先设置 API 密钥,那么你可以在初始化模型时手动设置密钥,或者从 .env 文件中加载它,使用 python-dotenv。首先,使用 pip install python-dotenv 安装库,然后在脚本或笔记本的顶部使用以下代码加载环境变量:
from dotenv import load_dotenv
load_dotenv() # take environment variables from .env.
第一步是为每个提示的多次运行获取响应并将它们存储在电子表格中。
输入:
# Define two variants of the prompt to test zero-shot
# vs few-shot
prompt_A = """Product description: A pair of shoes that can
fit any foot size.
Seed words: adaptable, fit, omni-fit.
Product names:"""
prompt_B = """Product description: A home milkshake maker.
Seed words: fast, healthy, compact.
Product names: HomeShaker, Fit Shaker, QuickShake, Shake
Maker
Product description: A watch that can tell accurate time in
space.
Seed words: astronaut, space-hardened, eliptical orbit
Product names: AstroTime, SpaceGuard, Orbit-Accurate,
EliptoTime.
Product description: A pair of shoes that can fit any foot
size.
Seed words: adaptable, fit, omni-fit.
Product names:"""
test_prompts = [prompt_A, prompt_B]
import pandas as pd
from openai import OpenAI
import os
# Set your OpenAI key as an environment variable
# https://platform.openai.com/api-keys
client = OpenAI(
api_key=os.environ['OPENAI_API_KEY'], # Default
)
def get_response(prompt):
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": prompt
}
]
)
return response.choices[0].message.content
# Iterate through the prompts and get responses
responses = []
num_tests = 5
for idx, prompt in enumerate(test_prompts):
# prompt number as a letter
var_name = chr(ord('A') + idx)
for i in range(num_tests):
# Get a response from the model
response = get_response(prompt)
data = {
"variant": var_name,
"prompt": prompt,
"response": response
}
responses.append(data)
# Convert responses into a dataframe
df = pd.DataFrame(responses)
# Save the dataframe as a CSV file
df.to_csv("responses.csv", index=False)
print(df)
输出:
variant prompt
\
0 A Product description: A pair of shoes that can ...
1 A Product description: A pair of shoes that can ...
2 A Product description: A pair of shoes that can ...
3 A Product description: A pair of shoes that can ...
4 A Product description: A pair of shoes that can ...
5 B Product description: A home milkshake maker.\n...
6 B Product description: A home milkshake maker.\n...
7 B Product description: A home milkshake maker.\n...
8 B Product description: A home milkshake maker.\n...
9 B Product description: A home milkshake maker.\n...
response
0 1\. Adapt-a-Fit Shoes \n2\. Omni-Fit Footwear \n...
1 1\. OmniFit Shoes\n2\. Adapt-a-Sneaks \n3\. OneFi...
2 1\. Adapt-a-fit\n2\. Flexi-fit shoes\n3\. Omni-fe...
3 1\. Adapt-A-Sole\n2\. FitFlex\n3\. Omni-FitX\n4\. ...
4 1\. Omni-Fit Shoes\n2\. Adapt-a-Fit Shoes\n3\. An...
5 Adapt-a-Fit, Perfect Fit Shoes, OmniShoe, OneS...
6 FitAll, OmniFit Shoes, SizeLess, AdaptaShoes
7 AdaptaFit, OmniShoe, PerfectFit, AllSizeFit.
8 FitMaster, AdaptoShoe, OmniFit, AnySize Footwe...
9 Adapt-a-Shoe, PerfectFit, OmniSize, FitForm
在这里,我们使用 OpenAI API 生成对一组提示的模型响应,并将结果存储在数据框中,然后将其保存为 CSV 文件。以下是工作原理:

- 定义了两个提示变体,每个变体都包含产品描述、种子词和潜在的产品名称,但 prompt_B 提供了两个示例。
- 导入 Pandas 库、OpenAI 库和 os 库。
- get_response 函数接受一个提示作为输入,并从 gpt-3.5-turbo 模型返回一个响应。提示作为用户消息传递给模型,同时传递一个系统消息以设置模型的行为。
- 两个提示变体存储在 test_prompts 列表中。
- 创建一个空列表 responses 来存储生成的响应,并将变量 num_tests 设置为 5。
- 使用嵌套循环来生成响应。外循环遍历每个提示,内循环为每个提示生成 num_tests(本例中为五个)个响应。
  - 使用 enumerate 函数获取 test_prompts 中每个提示的索引和值,然后将此索引转换为相应的大写字母(例如,0 变为 A,1 变为 B),用作变体名称。
  - 对于每次迭代,使用当前提示调用 get_response 函数来从模型生成响应。
  - 创建一个包含变体名称、提示和模型响应的字典,并将其追加到 responses 列表中。
- 一旦所有响应都生成完毕,responses 列表(现在是一个字典列表)被转换为 Pandas 数据框。
- 然后使用 Pandas 内置的 to_csv 函数将此数据框保存为名为 responses.csv 的 CSV 文件,index=False 以防止写入行索引。
- 最后,数据框被打印到控制台。
将这些响应放在电子表格中已经很有用,因为你可以立即在打印的响应中看到:前五行中的 prompt_A(零样本)给出的是编号列表,而最后五行中的 prompt_B(少样本)倾向于输出期望的逗号分隔行内列表格式。下一步是对每个响应进行评分,最好以盲评并随机打乱顺序的方式进行,以避免偏袒某个提示。
输入:
import ipywidgets as widgets
from IPython.display import display
import pandas as pd
# load the responses.csv file
df = pd.read_csv("responses.csv")
# Shuffle the dataframe
df = df.sample(frac=1).reset_index(drop=True)
# df is your dataframe and 'response' is the column with the
# text you want to test
response_index = 0
# add a new column to store feedback
df['feedback'] = pd.Series(dtype='str')
def on_button_clicked(b):
global response_index
# convert thumbs up / down to 1 / 0
user_feedback = 1 if b.description == "\U0001F44D" else 0
# update the feedback column
df.at[response_index, 'feedback'] = user_feedback
response_index += 1
if response_index < len(df):
update_response()
else:
# save the feedback to a CSV file
df.to_csv("results.csv", index=False)
print("A/B testing completed. Here's the results:")
# Calculate score and num rows for each variant
summary_df = df.groupby('variant').agg(
count=('feedback', 'count'),
score=('feedback', 'mean')).reset_index()
print(summary_df)
def update_response():
new_response = df.iloc[response_index]['response']
if pd.notna(new_response):
new_response = "<p>" + new_response + "</p>"
else:
new_response = "<p>No response</p>"
response.value = new_response
count_label.value = f"Response: {response_index + 1}"
count_label.value += f"/{len(df)}"
response = widgets.HTML()
count_label = widgets.Label()
update_response()
thumbs_up_button = widgets.Button(description='\U0001F44D')
thumbs_up_button.on_click(on_button_clicked)
thumbs_down_button = widgets.Button(
description='\U0001F44E')
thumbs_down_button.on_click(on_button_clicked)
button_box = widgets.HBox([thumbs_down_button,
thumbs_up_button])
display(response, button_box, count_label)
输出如图 图 1-12 所示:
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0112.png
图 1-12. 点赞/点踩评分系统
如果你在 Jupyter Notebook 中运行此代码,一个小部件会依次显示每个 AI 响应,并带有点赞和点踩按钮(见图 1-12)。这提供了一个简单的界面,可以以最小的开销快速标记响应。如果你想在 Jupyter Notebook 之外做这件事,可以将点赞和点踩按钮的表情符号改为 Y 和 N,并用内置的 input() 函数实现循环,作为 iPyWidgets 的纯文本替代。
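这种纯文本替代方案可以这样草拟(函数名为示意;假设数据框已如前文从 responses.csv 读入,输入 Y 记 1 分,其余记 0 分):

```python
import pandas as pd

def label_responses(df):
    """用内置 input() 逐条收集 Y/N 反馈,作为 iPyWidgets 界面的纯文本替代。"""
    feedback = []
    for response_text in df["response"]:
        answer = input(f"{response_text}\n好(Y)/差(N): ").strip().upper()
        feedback.append(1 if answer == "Y" else 0)
    df["feedback"] = feedback
    return df
```

标记完成后,同样可以用 df.groupby("variant") 计算每个变体的平均分。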
一旦完成标记响应,你将得到输出,它显示了每个提示的性能。
输出:
A/B testing completed. Here's the results:
variant count score
0 A 5 0.2
1 B 5 0.6
数据框已被随机打乱,每个响应都是在看不到提示的情况下盲评的,因此你可以准确了解每个提示的实际表现。以下是逐步解释:

- 导入了三个模块:ipywidgets、IPython.display 和 pandas。ipywidgets 包含用于 Jupyter Notebook 和 IPython 内核的交互式 HTML 小部件。IPython.display 提供了用于显示各种类型输出(如图像、声音、HTML 等)的类。Pandas 是一个强大的数据处理库。
- 使用 Pandas 库读取包含你想要测试的响应的 CSV 文件 responses.csv,这创建了一个名为 df 的 Pandas 数据框。
- df 使用 sample() 函数并设置 frac=1 进行洗牌,这意味着它使用了所有行。reset_index(drop=True) 用于将索引重置为标准的 0, 1, 2, …, n 索引。
- 脚本将 response_index 定义为 0,用于跟踪用户当前正在查看数据框中的哪个响应。
- 在 df 数据框中添加了一个新的列 feedback,数据类型为 str(字符串)。
- 接下来,脚本定义了一个函数 on_button_clicked(b),该函数将在界面中的任一按钮被点击时执行。
  - 函数首先检查被点击按钮的 description:如果是点赞按钮(\U0001F44D),则将 user_feedback 设置为 1;如果是点踩按钮(\U0001F44E),则设置为 0。
  - 然后它将数据框中当前 response_index 行的 feedback 列更新为 user_feedback。
  - 之后,它将 response_index 加一,以移动到下一个响应。
  - 如果 response_index 仍然小于响应的总数(即数据框的长度),则调用 update_response() 函数。
  - 如果没有更多的响应,它将数据框保存到新的 CSV 文件 results.csv,打印一条消息,并按变体打印结果的摘要,显示收到的反馈数量和每个变体的平均分数。
- update_response() 函数从数据框中获取下一个响应,将其包裹在段落 HTML 标签中(如果它不是 null),更新 response 小部件以显示新的响应,并更新 count_label 小部件以反映当前响应编号和总响应数。
- 实例化了两个小部件:response(一个 HTML 小部件)和 count_label(一个标签小部件)。然后调用 update_response() 函数以初始化这些小部件,显示第一个响应和相应的标签。
- 又创建了两个小部件 thumbs_up_button 和 thumbs_down_button(都是按钮小部件),分别以点赞和点踩表情符号作为它们的描述。这两个按钮都配置为在点击时调用 on_button_clicked() 函数。
- 使用 HBox 函数将两个按钮组合成一个水平框 button_box。
- 最后,使用 IPython.display 模块的 display() 函数将 response、button_box 和 count_label 小部件显示给用户。
这样的简单评分系统可以用来判断提示质量并处理边缘情况。通常只需对一个提示运行不到 10 次测试,你就能发现那些原本要到生产环境中才会暴露的偏差。缺点是手动评分大量响应可能会很繁琐,而且你的评分可能无法代表目标受众的偏好。然而,即使是少量测试也能揭示两种提示策略之间的巨大差异,并在进入生产之前暴露不明显的问题。
对提示进行迭代和测试可以显著缩短提示长度,从而降低成本和延迟。如果你能找到一个性能相同(或更好)但更短的提示,就可以大幅扩展你的运营规模。在这个过程中,你通常会发现复杂提示中的许多元素完全是多余的,甚至适得其反。
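在比较长短提示时,可以先粗略估算标记数和成本。下面的草图采用“一个标记约合四个英文字符”的经验法则,单价为假设值,仅用于量级估计:

```python
def estimate_prompt_cost(prompt, price_per_1k_tokens=0.001):
    """用字符数粗略估算标记数(约 4 字符/标记),再按假设单价换算成本(美元)。"""
    approx_tokens = max(1, len(prompt) // 4)
    return approx_tokens / 1000 * price_per_1k_tokens

long_prompt = "word " * 800   # 4000 个字符,约合 1000 个标记
cost = estimate_prompt_cost(long_prompt)
```

需要精确计数时,应改用模型对应的分词器(例如 OpenAI 的 tiktoken 库)。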
赞同或其他手动标记的质量指标不一定是唯一的评判标准。人类评估通常被认为是反馈的最准确形式。然而,手动评估大量样本可能会很繁琐且成本高昂。在许多情况下,例如在数学或分类用例中,可能可以建立基准真相(测试用例的参考答案)以编程方式评估结果,从而大大提高你的测试和监控努力。以下列表并不全面,因为有许多动机促使你以编程方式评估提示:
成本
使用大量标记或仅与更昂贵模型工作的提示可能不适合生产使用。
延迟
同样,标记越多,或所需的模型越大,完成任务所需的时间就越长,这可能会损害用户体验。
调用
许多 AI 系统需要多次循环调用才能完成任务,这可能会严重减慢处理速度。
性能
实现某种形式的外部反馈系统,例如用于预测现实世界结果的物理引擎或其他模型。
分类
确定提示正确标记给定文本的频率,使用另一个 AI 模型或基于规则的标记。
推理
计算 AI 未能应用逻辑推理或数学错误的实例与参考案例之间的差异。
幻觉
看看你遇到幻觉的频率,这是通过发明不在提示上下文中的新术语来衡量的。
安全性
使用安全过滤器或检测系统标记任何可能返回不安全或不希望的结果的场景。
拒绝
通过标记已知的拒绝语言,找出系统错误拒绝合理用户请求的频率。
对抗性
使提示对已知的提示注入攻击具有鲁棒性,这些攻击可以使模型运行不希望运行的提示而不是你编程的内容。
相似度
使用共享的单词和短语(BLEU 或 ROUGE)或向量距离(在第五章中解释)来衡量生成文本和参考文本之间的相似度。
一旦你开始评估哪些示例是好的,你就可以更容易地更新你的提示中使用的示例,作为随着时间的推移使你的系统变得更智能的一种方式。此反馈的数据还可以用于微调示例,一旦你可以提供几千个示例,微调就开始超越提示工程,如图 1-13 所示。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0113.png
图 1-13。一个提示值是多少数据点?
从点赞或点踩过渡到 3 分、5 分或 10 分的评分系统,可以获得对提示质量更细致的反馈。还可以通过并排比较响应(而不是逐个查看)来确定总体相对表现。基于这些信息,你可以使用 Elo 评分构建公平的跨模型比较,这种评分在国际象棋中很流行,也被 lmsys.org 的聊天机器人竞技场所使用。
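Elo 评分可以直接用于提示或模型之间的成对比较。下面是标准 Elo 更新公式的一个最小实现(K 值取常用的 32):

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """标准 Elo 更新:score_a 取 1(A 胜)、0(A 负)或 0.5(平局)。"""
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

# 两个提示初始均为 1000 分,A 在一次盲评对比中胜出
a, b = elo_update(1000, 1000, 1)
```

对每一次并排盲评调用一次该函数,积累足够对局后,评分的排序就反映了各提示(或各模型)的相对质量。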
对于图像生成,评估通常采用排列提示的形式,即你输入多个方向或格式,并为每种组合生成一个图像。然后可以扫描图像或稍后以网格形式排列,以展示提示的不同元素对最终图像的影响。
输入:
{stock photo, oil painting, illustration} of business
meeting of {four, eight} people watching on white MacBook on
top of glass-top table
在 Midjourney 中,这将被编译成六个不同的提示,每个提示对应于三种格式(股票照片、油画、插图)和两种人数(四、八)的组合。
输入:
1\. stock photo of business meeting of four people watching
on white MacBook on top of glass-top table
2\. stock photo of business meeting of eight people watching
on white MacBook on top of glass-top table
3\. oil painting of business meeting of four people watching
on white MacBook on top of glass-top table
4\. oil painting of business meeting of eight people watching
on white MacBook on top of glass-top table
5\. illustration of business meeting of four people watching
on white MacBook on top of glass-top table
6\. illustration of business meeting of eight people watching
on white MacBook on top of glass-top table
每个提示通常都会生成四张图像,这使得输出结果稍显难以浏览。我们从每个提示中各选了一张图像进行放大,然后将它们组合成一个网格,如图 1-14 所示。你会注意到,模型并不总能画对人数(生成式 AI 模型在数学上出奇地差),但它通过在右侧图片中画出比左侧更多的人,正确地推断出了总体意图。
图 1-14 显示了输出。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0114.png
图 1-14. 提示排列网格
对于具有类似 Stable Diffusion 这样的 API 的模型,你可以更轻松地操作照片并以网格格式显示,以便于扫描。你还可以操作图像的随机种子,以固定一种风格,实现最大程度的可重复性。使用图像分类器时,也可能根据图像的安全内容或是否包含与成功或失败相关的特定元素进行编程评分。
5. 分工
当你构建提示时,你开始达到一个点,在这个点上,你在对 AI 的单次调用中提出了很多要求。当提示变得更长、更复杂时,你可能会发现响应的不确定性降低,幻觉或异常增加。即使你设法为你的任务找到了一个可靠的提示,这个任务也可能是你需要完成的工作中许多相互关联的任务之一。自然地,你会开始探索其他哪些任务可以通过 AI 完成,以及你如何将它们串联起来。
工程学的一个核心原则是利用任务分解将问题分解为其组成部分,这样你可以更容易地解决每个单独的问题,然后将结果重新聚合。将你的 AI 工作分解成多个串联的调用可以帮助你完成更复杂的任务,同时也能提供更多关于链中哪个部分失败的可见性。
产品命名涉及许多因素,其中一项重要的任务被天真地外包给了 AI,我们无法了解它如何权衡这些因素的重要性(如果有的话)。按照我们当前系统的工作方式,我们得到了一个名字列表,但所有名字都以同等的重要性显示,没有进一步的上下文来帮助我们做出决定。幸运的是,AI 工具能够进行自我评估;如果我们对任务添加第二步,我们可以自动检查非期望的输出。
输入:
Rate this list of product names for a pair of shoes that can
fit any foot size. The rating should be out of 10, inline
next to the product name:
UniFit SoleStrides
FlexiSize All-Fit Shoes
OmniFit FootFlex
...
AnyStep AdaptiWalkers
FlexiRange UniversalSteps
PerfectGlide MultiFit Shoes
输出:
UniFit SoleStrides: 7/10
FlexiSize All-Fit Shoes: 8/10
OmniFit FootFlex: 7/10
...
AnyStep AdaptiWalkers: 7/10
FlexiRange UniversalSteps: 7.5/10
PerfectGlide MultiFit Shoes: 9/10
This ranking is based on clarity, memorability, and how well
the name communicates the unique selling point of the
product (that they can fit any foot size). Remember, these
ratings are subjective and the ultimate success of a product
name also depends on the target market, branding, and
marketing efforts.
在多次运行此模型后,它始终将“OneSize Glovewalkers”这个名字评为最差,并提供上下文(如果你询问的话)说明这个概念在鞋类环境中可能会令人困惑。你可能想知道,如果模型知道这个名字不好,为什么它最初还会建议这个名字?LLMs 通过预测序列中的下一个标记来工作,因此很难知道完成后的整体响应会是什么。然而,当它有了之前响应的所有标记来审查时,它更容易预测这会被标记为好还是坏的反应。
我们可以继续将多个调用串联起来,以改进我们任务的成果。例如,我们可以将这分成三个单独的评分:清晰度、记忆度和名字传达产品独特卖点的好坏。这些评分可以提供给人类作为最终决策的额外上下文,或者甚至可以一起计算来程序化选择最终的名字。
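将多项评分程序化合并、自动选出最终名称的做法,可以草拟如下(评分数值与权重均为假设,仅作演示):

```python
def pick_best_name(scores, weights=(0.4, 0.3, 0.3)):
    """按(清晰度、记忆度、卖点传达)三项加权求和,返回总分最高的名称。"""
    def weighted_total(triple):
        return sum(w * s for w, s in zip(weights, triple))
    return max(scores, key=lambda name: weighted_total(scores[name]))

# 假设这三项分数由三次单独的评分提示返回
scores = {
    "OmniFit FootFlex": (7, 8, 9),
    "PerfectGlide MultiFit Shoes": (9, 8, 9),
    "OneSize Glovewalkers": (4, 6, 5),
}
best = pick_best_name(scores)
```

权重本身也可以交给人来调整,让人类只在链条的最后一步做决策。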
从随意把玩提示到专业地与 AI 合作,真正的突破在于意识到:系统的每个部分都可以分解为一系列迭代步骤。即使对单个提示,这个原则也适用:简单地在提示中加上“让我们一步步思考”,就能带来推理能力和熟练度的明显提升,同时为质量保证和调试提供审计轨迹。当模型花费时间和标记进行推理时,评分会发生变化,并且与评分标准更加一致。
输入:
Let's think step by step. Rate this list of product names
for a pair of shoes that can fit any foot size. The rating
should be out of 10, inline next to the product name:
UniFit SoleStrides
FlexiSize All-Fit Shoes
OmniFit FootFlex
...
AnyStep AdaptiWalkers
FlexiRange UniversalSteps
PerfectGlide MultiFit Shoes
输出:
Here's a rating out of 10 for each product name, considering
factors like descriptiveness, catchiness, and overall
appeal:
1\. **UniFit SoleStrides**: 7/10 - Sounds unifying but may
not immediately convey the "fit any size" idea.
2\. **FlexiSize All-Fit Shoes**: 8.5/10 - Combines
flexibility and the notion that it fits all sizes.
3\. **OmniFit FootFlex**: 8/10 - Omnipresence (fitting all)
and flexibility nicely conveyed.
...
18\. **AnyStep AdaptiWalkers**: 7.5/10 - "AnyStep" suggests
versatility, but the overall name might be less direct.
19\. **FlexiRange UniversalSteps**: 8/10 - Combines
flexibility and universal fit.
20\. **PerfectGlide MultiFit Shoes**: 8.5/10 - "MultiFit" is
clear, and "PerfectGlide" suggests a comfortable experience.
Remember, the best name can also be influenced by your
target audience, branding strategy, and the overall message
you want to convey.
OpenAI 称之为 “给模型思考的时间”,这是提示工程的一个关键原则。像这样的思维链(chain-of-thought)技术鼓励模型列出推理步骤,本质上就是在同一个提示内完成任务分解。一旦我们自动完成了产品命名的任务,就可以再次调用 ChatGPT 来描述每个产品,然后把这些描述输入 Midjourney,为每个产品生成图像。使用一个 AI 模型为另一个 AI 模型生成提示词称为元提示(meta-prompting),它之所以有效,是因为 LLM 已是达到人类水平的提示工程师(周,2022)。
输入:
Describe the new product "OneFit UltraStride Shoes," a shoe
that can fit any foot size, in great detail:
输出:
Introducing the revolutionary "OneFit UltraStride Shoes,"
the footwear innovation designed to fit any foot size with
unparalleled comfort and style. These shoes are a result of
cutting-edge technology and meticulous craftsmanship,
ensuring a truly personalized and enjoyable walking or
running experience for everyone.
Design and Materials:
The OneFit UltraStride Shoes boast an elegant and modern
design that appeals to both fashion-forward individuals and
sports enthusiasts. They feature a sleek and streamlined
silhouette, crafted using premium materials for durability
and aesthetic appeal. The shoe's upper is constructed from a
combination of breathable mesh and flexible synthetic
materials, allowing for excellent airflow and a lightweight
feel.
...
Color and Customization:
OneFit UltraStride Shoes are available in a variety of
vibrant colors and stylish patterns, catering to diverse
personal preferences. You can choose from classic
monochromatic options for a timeless look or opt for bold
and vibrant hues to make a fashion statement. The ability to
customize the shoe's appearance allows individuals to
express their unique style while enjoying the benefits of
the OneFit technology.
Conclusion:
The OneFit UltraStride Shoes represent a breakthrough in
footwear technology, eliminating the hassle of finding the
perfect fit for your feet. With their adaptive design,
superior comfort, and stylish aesthetics, these shoes offer
a truly personalized experience for individuals of all foot
sizes. Stay comfortable, confident, and on-trend with the
OneFit UltraStride Shoes, the ultimate choice for those
seeking unmatched versatility and a perfect fit.
GPT-4 对 DALL-E 十分熟悉,因此你可以在 ChatGPT 中直接提及它的名字,它在为图像生成工具撰写提示词方面做得相当不错。如果你打算在生产环境中使用这个提示词,可以考虑应用“提供示例”的提示原则;但就我们的目的而言,不加示例它也表现得足够好。
输入:
Convert this description into a short prompt for an AI image
generation model such as DALL-E:
输出:
Create an eye-catching image that visually represents the
"OneFit UltraStride Shoes," a revolutionary footwear product
that adapts to any foot size. Showcase a sleek and modern
shoe design incorporating a stretchable upper, an adjustable
lacing system, and a cushioned insole for optimal comfort.
Emphasize the shoe's versatility by displaying it in action
on various surfaces, highlighting its excellent traction.
Incorporate vibrant colors and patterns to convey the
customizable nature of the product. The image should capture
the essence of innovation, style, and personalized fit that
the OneFit UltraStride Shoes offer.
现在可以将这个提示的输出作为提示连接到 DALL-E 或 Midjourney 等图像生成工具,这能为你可视化产品可能的样子提供一个良好的起点。尽管这可能不是你最终采用的设计,但看到图像更能激发灵感,帮助人们更快形成观点。相比从空白页面或一段文字凭空想象一个新图像,批评或赞扬一张现成的图像在认知上要容易得多。
图 1-15 显示了输出结果。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0115.png
图 1-15. OneFit UltraStride 运动鞋
在专业处理 AI 时,将多个 AI 调用甚至多个模型串联起来以实现更复杂的目标是一种常见的做法。即使是单次提示应用也往往是动态构建的,基于从各种数据库或其他 AI 模型调用中查询的外部上下文。库LangChain开发了将多个提示模板和查询串联起来的工具,使这一过程更加可观察和结构化。一个基础示例是渐进式摘要,其中无法适应上下文窗口的大段文本可以被分成多个文本块,每个块被总结,最后再总结这些摘要。如果你与早期 AI 产品的构建者交谈,你会发现他们都在底层将多个提示串联起来,称为AI 串联,以在最终输出中获得更好的结果。
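上文提到的渐进式摘要可以用如下草图表示。这是一个假设性的最小实现:`summarize` 参数代表任意 LLM 摘要调用(演示中用截断函数冒充),分块策略也被简化为按字符数切分;实际应用中应按 token 数或段落边界切分:

```python
def chunk_text(text, max_chars=1000):
    """按固定字符数粗略切块;实际应用中可按 token 数或段落边界切分。"""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def progressive_summary(text, summarize, max_chars=1000):
    """渐进式摘要:先对每个块摘要,再对摘要的拼接递归摘要,直到能放进一次调用。"""
    if len(text) <= max_chars:
        return summarize(text)
    chunk_summaries = [summarize(c) for c in chunk_text(text, max_chars)]
    return progressive_summary(" ".join(chunk_summaries), summarize, max_chars)

# 演示:用截断函数冒充 LLM 摘要调用
fake_summarize = lambda t: t[:50]
result = progressive_summary("A" * 5000, fake_summarize)
print(len(result) <= 50)  # True
```

把 `fake_summarize` 换成真实的 LLM 调用(例如一个包裹 OpenAI API 的函数),这个骨架就能处理超出上下文窗口的长文本。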
原因与行动(ReAct)框架是 AI 代理中最早流行的尝试之一,包括开源项目BabyAGI、AgentGPT和Microsoft AutoGen。实际上,这些代理是通过将多个 AI 调用串联起来以实现规划、观察、行动,然后评估行动结果。自主代理将在第六章中介绍,但截至写作时,它们在生产中尚未得到广泛应用。这种自我推理代理的做法还处于早期阶段,容易出错,但有一些迹象表明,这种方法在完成复杂任务时可能是有用的,并且很可能是 AI 系统下一阶段演化的一个部分。
在微软和谷歌等大型科技公司、Hugging Face 上的众多开源项目,以及 OpenAI 和 Anthropic 这样的风险投资初创公司之间,正在上演一场 AI 竞赛。随着新模型不断涌现,它们也在走向差异化,以争夺不断增长市场中的不同细分。例如,Anthropic 的 Claude 2 拥有10 万个令牌的上下文窗口,而 GPT-4 的标准上下文窗口是8,192 个令牌。OpenAI 很快推出了具有128,000 个令牌窗口的 GPT-4 版本,而谷歌则宣称其 Gemini 1.5 拥有 100 万个令牌的上下文长度(oreil.ly/cyhR4)。作为比较,一本《哈利·波特》大约有 185,000 个令牌,所以整本书都可以放进一个单一的提示中,尽管每次 API 调用处理上百万个令牌对大多数用例来说成本可能过高。
这本书主要关注文本生成技术中的 GPT-4,以及 Midjourney v6 和 Stable Diffusion XL 在图像生成技术中的应用,但几个月后这些模型可能就不再是业界领先的技术了。这意味着能够选择适合任务的正确模型并将多个 AI 系统串联起来将变得越来越重要。在迁移到新模型时,提示模板通常难以比较,但五个提示原则的效果将始终如一地提升你使用的任何模型的提示,从而获得更可靠的结果。
摘要
在本章中,你学习了在生成式 AI 的背景下提示工程的重要性。我们将提示工程定义为开发有效的提示的过程,这些提示在与 AI 模型交互时产生预期结果。你发现提供清晰的方向、格式化输出、结合示例、建立评估系统以及将复杂任务分解成更小的提示是提示工程的关键原则。通过应用这些原则和使用常见的提示技术,你可以提高 AI 生成输出的质量和可靠性。
你还探讨了提示工程在生成产品名称和图像中的作用。你了解到指定所需格式和提供指导性示例如何极大地影响 AI 的输出。此外,你学习了角色扮演的概念,你可以要求 AI 生成类似于史蒂夫·乔布斯这样的名人的输出。本章强调了在使用生成式 AI 模型时,清晰的方向和上下文对于实现预期结果的重要性。此外,你还发现了评估 AI 模型性能的重要性以及用于衡量结果的多种方法,以及质量与令牌使用、成本和延迟之间的权衡。
在下一章中,您将了解到文本生成模型。您将学习不同类型的基模及其功能,以及它们的局限性。本章还将回顾标准 OpenAI 的提供内容,以及竞争对手和开源替代方案。到本章结束时,您将对文本生成模型的历史及其相对优势和劣势有一个扎实的理解。本书将在第 7、8 和 9 章回到图像生成提示,所以如果您对此有迫切需求,可以自由地跳过前面的内容。准备好深入探索提示工程学科,并扩展您与 AI 合作的舒适度。
第二章. 文本生成的大语言模型简介
在人工智能领域,最近的研究重点是大语言模型的演变。与它们不那么灵活的前辈相比,LLM 能够处理和从大量数据中学习,从而产生了产生与人类语言输出非常相似文本的能力。这些模型已经应用于各种不同的应用,从撰写内容到自动化软件开发,以及实现实时交互式聊天机器人体验。
文本生成模型是什么?
文本生成模型利用高级算法来理解文本中的含义,并产生通常与人类工作难以区分的输出。如果你曾经与ChatGPT互动或对其构建连贯且上下文相关的句子能力感到惊奇,你见证了 LLM 的实际力量。
在自然语言处理(NLP)和 LLM 中,基本的语言单位是标记。标记可以代表句子、单词,甚至子词,如一组字符。理解文本数据大小的一个有用方法是查看它包含的标记数量;例如,100 个标记的文本大约相当于 75 个单词。这种比较对于管理 LLM 的处理限制至关重要,因为不同的模型可能有不同的标记容量。
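在没有引入模型配套分词器的情况下,可以先用一个粗略的经验法则估算 token 数(英文文本约 4 个字符对应 1 个 token)。下面是一个最小示意;精确计数应使用所用模型对应的分词库(例如 OpenAI 的 tiktoken):

```python
def estimate_tokens(text):
    # 经验法则:英文文本约 4 个字符 ≈ 1 个 token,仅作粗略估算
    return max(1, len(text) // 4)

text = " ".join(["word"] * 75)   # 约 75 个英文单词
print(estimate_tokens(text))     # 93,接近“100 个 token ≈ 75 个单词”的比例
```

这种估算便于在调用前快速判断提示是否接近模型的上下文上限;对中文等语言,字符与 token 的比例会显著不同。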
标记化,即将文本分解成标记的过程,是准备数据用于 NLP 任务的关键步骤。可以用于标记化的方法包括字节对编码(BPE),WordPiece 和 SentencePiece。每种方法都有其独特的优势,适用于特定的用例。BPE 因其处理广泛词汇量的效率而常用,同时保持标记数量可管理。
BPE(字节对编码)首先将文本视为一系列单个字符。随着时间的推移,它将频繁一起出现的字符组合成单个单元,或称为标记。为了更好地理解这一点,可以考虑单词apple。最初,BPE 可能将其视为a,p,p,l和e。但注意到p在数据集中经常出现在a之后和l之前,它可能会将它们组合起来,并在未来的实例中将appl视为一个单独的标记。
这种方法有助于 LLM 识别和生成单词或短语,即使它们在训练数据中并不常见,这使得模型更具适应性和多功能性。
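为了直观感受 BPE 的合并步骤,下面是一个极简的示意实现(并非任何真实分词器的代码):统计语料中最常见的相邻符号对,并把它合并成一个新符号:

```python
from collections import Counter

def most_frequent_pair(corpus):
    """统计语料中相邻符号对的频次,返回最常见的一对(BPE 单步合并的依据)。"""
    pairs = Counter()
    for word in corpus:
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(corpus, pair):
    """把每个词中的指定符号对合并为一个新符号。"""
    merged = "".join(pair)
    new_corpus = []
    for word in corpus:
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(word[i])
                i += 1
        new_corpus.append(out)
    return new_corpus

corpus = [list("apple"), list("apply"), list("ample")]
pair = most_frequent_pair(corpus)
print(pair)                      # ('p', 'l')
print(merge_pair(corpus, pair)[0])  # ['a', 'p', 'pl', 'e']
```

真实的 BPE 分词器会在大规模语料上重复这个合并步骤数万次,逐步建立起词表。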
理解大型语言模型(LLM)的工作原理需要掌握支撑这些系统的底层数学原理。尽管计算可能很复杂,但我们仍可以将核心元素简化,以提供对这些模型如何运作的直观理解。特别是在商业环境中,LLM 的准确性和可靠性至关重要。
实现这种可靠性的重要部分在于 LLM 开发的前训练和微调阶段。最初,模型在前训练阶段在庞大的数据集上进行训练,获得对语言的广泛理解。随后,在微调阶段,模型被适应于特定任务,磨练其能力,为专门的应用提供准确和可靠的结果。
向量表示:语言的数值本质
在 NLP 领域,单词不仅仅是字母符号。它们可以被标记化,然后以数值形式表示,称为向量。这些向量是多维数组,包含数字,捕捉语义和句法关系:
w → v = [v₁, v₂, …, vₙ]
创建单词向量,也称为词嵌入,依赖于语言中的复杂模式。在密集的训练阶段,模型被设计用来识别和学习这些模式,确保具有相似意义的单词在多维空间中彼此靠近(图 2-1)。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0201.png
图 2-1. 单词嵌入空间中单词向量的语义邻近性
这种方法的美丽之处在于它能够捕捉单词之间的细微关系并计算它们的距离。当我们检查词嵌入时,很明显,具有相似或相关意义的单词,如virtue和moral或walked和walking,都位于彼此附近。这种嵌入空间中的空间接近性成为各种 NLP 任务中的强大工具,使模型能够理解上下文、语义以及构成语言的复杂关系网。
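嵌入空间中的“邻近性”通常用余弦相似度来度量。下面用虚构的三维向量做一个最小示意(真实词向量通常有数百到数千维,数值也来自训练而非手工设定):

```python
import math

def cosine_similarity(a, b):
    """两个向量夹角的余弦:越接近 1 表示方向越一致,语义越相近。"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# 虚构的三维“词向量”,仅用于演示度量方式
embeddings = {
    "walked":  [0.9, 0.1, 0.0],
    "walking": [0.85, 0.15, 0.05],
    "banana":  [0.0, 0.2, 0.95],
}

print(cosine_similarity(embeddings["walked"], embeddings["walking"]))  # 接近 1
print(cosine_similarity(embeddings["walked"], embeddings["banana"]))   # 接近 0
```

相近词对(walked/walking)的相似度远高于无关词对(walked/banana),这正是嵌入空间里“语义邻近”的含义。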
Transformer 架构:协调上下文关系
在我们深入探讨 transformer 架构的机制之前,让我们建立一个基础的理解。简单来说,当我们有一个句子,比如说,The cat sat on the mat,这个句子中的每个单词都会被转换成其数值向量表示。所以,cat可能变成一系列数字,sat、on和mat也是如此。
正如你将在本章后面详细探索的那样,transformer 架构会处理这些单词向量,以理解它们之间的关系——既包括结构(句法),也包括意义(语义)。transformer 有许多类型;图 2-2 展示了 BERT 和 GPT 的架构。此外,transformer 不会孤立地看待单词;它在观察cat时,知道它在这个句子中以特定方式与sat和mat相关联。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0202.png
图 2-2. BERT 使用编码器处理输入数据,而 GPT 有解码器处理输出
当 transformer 处理这些向量时,它使用数学运算来理解单词之间的关系,从而产生具有丰富上下文信息的新向量:
v′ᵢ = Transformer(v₁, v₂, …, vₘ)
Transformer 的一个显著特点是它们理解词语细微语境意义的能力。Transformer 中的 自注意力 机制允许句子中的每个词查看所有其他词,以更好地理解其语境。想象一下,每个词都在为其他词对其意义的重要性投票。通过考虑整个句子,Transformer 可以更准确地确定每个词的角色和意义,使它们的解释更加语境丰富。
概率文本生成:决策机制
在 Transformer 理解了给定文本的语境之后,它继续生成新的文本,受似然或概率概念指导。从数学的角度来看,模型计算每个可能的下一个词跟随当前词序列的可能性,并选择最有可能的那个词:
w_next = argmax_w P(w | w₁, w₂, …, wₘ)
通过重复此过程,如图 图 2-3 所示,模型生成一个连贯且与语境相关的文本字符串作为其输出。
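这个“取最可能的下一个词”的过程可以用几行代码示意:先用 softmax 把模型给出的 logits 转成概率分布,再取概率最高的候选词。实际系统中常引入温度、top-p 等采样策略,而不总是取最大值;这里的候选词和 logits 均为虚构数据:

```python
import math

def softmax(logits):
    """把任意实数打分归一化为概率分布。"""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# 虚构的候选下一个词及其 logits
candidates = ["mat", "moon", "sofa"]
logits = [2.0, 0.5, 1.0]

probs = softmax(logits)
next_word = candidates[probs.index(max(probs))]  # argmax P(w | 前文)
print(next_word)  # mat
```

重复这一步,把选出的词追加到上下文再预测下一个,就得到了自回归的文本生成。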
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0203.png
图 2-3. 使用类似 GPT-4 的 Transformer 模型生成文本的方式
驱动大型语言模型(LLM)的机制根植于向量数学、线性变换和概率模型。虽然底层的操作计算密集,但核心概念建立在这些数学原理之上,提供了一个基础理解,架起了技术复杂性与商业应用之间的桥梁。
历史背景:Transformer 架构的兴起
类似 ChatGPT(GPT 代表 生成预训练 Transformer)的语言模型并非凭空出现。它们是 NLP 领域多年进步的结晶,特别是自 2010 年代后期以来的加速发展。这一进步的核心是引入了 transformer 架构,这在谷歌 Brain 团队开创性的论文 “Attention Is All You Need” 中有详细阐述。
变换器架构的真正突破是“注意力”的概念。传统模型按顺序处理文本,这限制了它们对语言结构的理解,尤其是在文本的较长距离上。注意力通过允许模型直接将文本中位置较远的词语联系起来,而不管它们在文本中的位置,从而改变了这一点。这是一个开创性的观点。这意味着词语及其上下文不必在整个模型中移动才能相互影响。这不仅显著提高了模型的文本理解能力,而且使它们变得更加高效。
这种注意力机制在扩展模型检测文本中长距离依赖关系的能力方面发挥了至关重要的作用。这对于生成不仅上下文准确和流畅,而且在更长段落中也是连贯的输出至关重要。
根据人工智能先驱和教育家吴恩达(Andrew Ng)的说法,早期的许多自然语言处理研究,包括对 transformer 的基础研究,都得到了美国军事情报机构的大量资助。他们出于情报目的对机器翻译和语音识别等工具抱有浓厚兴趣,这在无意中为远超翻译本身的发展铺平了道路。
训练 LLM 需要大量的计算资源。这些模型被大量数据喂养,从千兆字节到拍字节不等,包括互联网内容、学术论文、书籍以及为特定目的定制的更窄数据集。然而,需要注意的是,用于训练 LLM 的数据可能携带其来源的固有偏差。因此,在使用这些模型时,用户应谨慎行事,并最好采用人工监督,以确保负责任和道德的 AI 应用。
例如,OpenAI 的 GPT-4 拥有估计的1.7 万亿参数,相当于一个覆盖三万足球场的 Excel 电子表格。在神经网络中,“参数”是指在训练过程中调整的权重和偏差,使模型能够根据其训练数据表示和生成复杂模式。GPT-4 的训练成本估计为6300 万美元,而训练数据将填满大约650 公里书架的书。
为了满足这些要求,像微软、Meta 和谷歌这样的大型科技公司投入了大量资金,使得大型语言模型(LLM)的开发成为一场高风险的竞赛。
大型语言模型(LLM)的兴起为硬件行业带来了更高的需求,尤其是专注于图形处理单元(GPU)的公司。例如,英伟达几乎成了高性能 GPU 的同义词,这对于训练 LLM 至关重要。
随着公司努力构建更大、更复杂的模型,对强大、高效的 GPU 的需求急剧上升。人们追求的不仅仅是原始的计算能力。GPU 还需要针对机器学习中的特定任务进行微调,如张量运算。在机器学习环境中,“张量”是多维数据数组,对它们的操作是神经网络计算的基础。这种对专用能力的重视催生了定制的硬件,例如 NVIDIA 的 H100 张量核心 GPU,它专门设计用来加速机器学习工作负载。
此外,对顶级 GPU 的需求往往超过了供应,导致价格呈上升趋势。这种供需互动已经将 GPU 市场转变为一个竞争激烈且有利可图的领域。在这里,从科技巨头到学术研究人员,各种客户都在争相采购最先进的硬件。
这种需求的激增引发了一股创新浪潮,而不仅仅是针对 GPU。公司现在正专注于创建专门的 AI 硬件,例如谷歌的 Tensor 处理单元(TPUs),以满足 AI 模型不断增长的计算需求。
这个不断演变的地形不仅强调了人工智能领域软件与硬件之间的共生关系,而且也突出了 LLM“淘金热”的连锁反应。它正引导创新并将投资引导到各个领域,尤其是那些提供构建这些模型的基本组件的领域。
OpenAI 的生成预训练转换器
OpenAI的成立宗旨是确保通用人工智能的益处惠及全人类,最近它一直是人工智能革命的先锋。他们最具有突破性的贡献之一是 GPT 系列模型,这些模型在很大程度上重新定义了 LLM 能够实现的范围。
OpenAI 的原始 GPT 模型不仅仅是一个研究成果;它是对基于转换器架构潜力的有力展示。这个模型展示了机器理解并生成类似人类语言的初步步骤,为未来的进步奠定了基础。
GPT-2 的发布引起了人们的期待和谨慎。OpenAI 在认识到该模型强大能力的同时,由于担心其潜在的误用,最初犹豫是否发布。GPT-2 的强大力量使得伦理问题成为焦点,与今天模型的威力相比可能显得有些过时。然而,当 OpenAI 决定将项目作为开源发布时,并不仅仅意味着公开代码。它允许企业和研究人员将这些预训练模型作为构建块使用,将 AI 融入他们的应用中而无需从头开始。这一举措使高级自然语言处理能力更加民主化,促进了各个领域的创新。
在 GPT-2 之后,OpenAI 决定专注于发布付费的闭源模型。GPT-3 的推出标志着 LLM 进步中的一个重大步伐。它不仅因其技术实力而受到媒体的关注,还因其能力的社会影响而备受瞩目。该模型能够生成如此逼真的文本,以至于它常常与人类撰写的文本难以区分。从创作复杂的文学作品到生成操作代码片段,GPT-3 展示了 AI 看似无限的潜力。
GPT-3.5-turbo 和 ChatGPT
在微软对其公司进行重大投资的支持下,OpenAI 推出了 GPT-3.5-turbo,这是其卓越前身的优化版本。2019 年,微软注入了 10 亿美元,后来增加到 130 亿美元以获得 OpenAI 营利性部门的 49%股份,OpenAI 利用这些资源开发了 GPT-3.5-turbo,它提供了更高的效率和更低的成本,使得大型语言模型(LLM)对更广泛的用例更加可及。
OpenAI 希望收集更多世界反馈以进行微调,因此ChatGPT应运而生。与通用型兄弟不同,ChatGPT 经过微调以在对话环境中表现出色,使人类与机器之间的对话既自然又富有意义。
图 2-4 展示了 ChatGPT 的训练过程,涉及三个主要步骤:
演示数据的收集
在这一步骤中,人类标注员提供了一系列提示的期望模型行为的示例。标注员在项目上进行培训,并遵循具体指令以准确标注提示。
训练监督策略
上一步收集的演示数据用于使用监督学习微调预训练的 GPT-3 模型。在监督学习中,模型在提供正确答案的标记数据集上进行训练。这一步骤有助于模型学习遵循给定的指令并产生符合期望行为的输出。
比较数据的收集和强化学习
在这一步中,收集了一个模型输出的数据集,并且人类标注者根据他们的偏好对这些输出进行排名。然后训练一个奖励模型来预测标注者会偏好哪些输出。最后,使用强化学习技术,特别是近端策略优化(PPO)算法,来优化监督策略,以最大化奖励模型中的奖励。
这个训练过程允许 ChatGPT 模型使其行为与人类意图保持一致。使用带有人类反馈的强化学习有助于创建一个比预训练的 GPT-3 模型更有帮助、更诚实、更安全的模型。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0204.png
图 2-4. ChatGPT 的微调过程
根据一份UBS 研究,到 2023 年 1 月,ChatGPT 设定了新的基准,积累了 1 亿活跃用户,成为互联网历史上增长最快的消费应用。ChatGPT 现在已成为客户服务、虚拟助手和许多其他需要类似人类对话技巧的应用的首选。
GPT-4
2023 年,OpenAI 发布了 GPT-4,它在理解复杂查询和生成上下文相关且连贯的文本方面表现出色。例如,GPT-4 在律师资格考试中取得了大约第 90 百分位的成绩。目前,GPT-3.5-turbo 在 ChatGPT 中可以免费使用,而 GPT-4 则需要按月付费。
据报道,GPT-4 采用了mixture-of-experts(专家混合)方法,而不是依赖单个模型进行推理,从而产生更准确、更有洞察力的结果。
2024 年 5 月 13 日,OpenAI 推出了GPT-4o,这是一个能够实时处理和推理文本、音频和视觉输入的高级模型。该模型在视觉和音频理解方面提供了增强的性能;由于其能够在单个神经网络中处理所有三种模态,因此它比其前辈更快、更经济。
谷歌的 Gemini
由于 ChatGPT 的使用导致谷歌失去了搜索市场份额,它最初于 2023 年 3 月 21 日发布了 Bard。Bard 起初有些粗糙,并且最初并没有提供与 ChatGPT 相同的高质量 LLM 响应(图 2-5)。
谷歌一直在为 Bard 添加额外功能,包括代码生成、视觉 AI、实时搜索和语音,使其在质量上更接近 ChatGPT。
2023 年 3 月 14 日,谷歌发布了PaLM API,允许开发者在谷歌云平台上访问它。2023 年 4 月,亚马逊网络服务(AWS)发布了类似的服务,如Amazon Bedrock和亚马逊的 Titan FMs。谷歌在 2024 年 2 月的 v1.5 版本中将 Bard 重新命名为 Gemini,并开始获得与 GPT-4 类似的结果。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0205.png
图 2-5. Bard 关于詹姆斯·韦伯太空望远镜的幻觉结果
此外,谷歌还发布了基于与 Gemini 相同架构的两个较小的 开源模型。OpenAI 终于不再是软件工程师将最先进的 LLM 集成到其应用程序中的唯一明显选择。
Meta 的 Llama 和开源
Meta 对语言模型的方法与行业中的其他竞争对手存在显著差异。通过依次发布开源模型 Llama,Llama 2 和 Llama 3,Meta 旨在培养一个更加包容和协作的人工智能开发生态系统。
Llama 2 和 Llama 3 的开源特性对更广泛的科技行业具有重大影响,尤其是对大型企业而言。透明度和协作精神鼓励快速创新,因为问题和漏洞可以由全球开发者社区迅速识别和解决。随着这些模型变得更加稳健和安全,大型企业可以更加有信心地采用它们。
Meta 的开源策略不仅使最先进的 AI 技术的访问民主化,而且有可能在整个行业中产生重大影响。通过为协作、透明和去中心化的开发过程奠定基础,Llama 2 和 Llama 3 是开创性的模型,它们很可能定义了生成式 AI 的未来。这些模型在 AWS、Google Cloud、Hugging Face 和其他平台上有 70 亿、80 亿和 700 亿参数版本。
这些模型的开放源代码特性是一把双刃剑。一方面,它使竞争更加公平。这意味着即使是较小的开发者也有机会为创新做出贡献,改进和应用开源模型到实际商业应用中。这种去中心化的创新可能导致在单一组织围墙花园内可能不会发生的突破,从而增强模型的能力和应用。
然而,正是这种使这一切成为可能的开源性也带来了潜在的风险,因为它可能允许恶意行为者利用这项技术达到有害的目的。这确实是像 OpenAI 这样的组织所关注的问题,他们建议一定程度上的控制和限制实际上可以用来减轻这些强大工具的危险应用。
利用量化(Quantization)和 LoRA(Low-Rank Adaptation)
这些开源模型中具有颠覆性的一个方面是量化(quantization)和LoRA(低秩适应,Low-Rank Adaptation)的潜力。这些技术允许开发者把模型装进更小的硬件占用空间。量化降低模型参数的数值精度,从而在不显著影响性能的情况下缩小模型的整体大小;LoRA 则通过只训练少量低秩矩阵来适配模型,使其在消费级硬件上的微调和运行更加高效。
这样的优化使得在消费级硬件上微调这些 LLM 变得越来越可行。这是一个关键的发展,因为它允许进行更多的实验和适应性调整。不再局限于高性能数据中心,个人开发者、小型企业和初创公司现在可以在资源受限的环境中工作这些模型。
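量化的基本思想可以用一个极简的对称线性量化示意来说明:把浮点权重线性映射到 [-127, 127] 的 8 位整数区间。真实的量化方案(如 GPTQ 或 bitsandbytes 中的实现)要复杂得多,这里只是原理草图:

```python
def quantize_int8(weights):
    """对称线性量化示意:按最大绝对值缩放,把 float 权重映射到 [-127, 127]。"""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """反量化:整数乘回缩放因子,恢复近似的浮点权重。"""
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print([round(w, 3) for w in restored])
```

每个权重从 32 位浮点压缩到 8 位整数,存储缩小约 4 倍,而恢复误差被限制在一个缩放步长之内,这正是量化“体积换精度”的权衡。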
Mistral
Mistral 7B 是法国初创公司Mistral AI的杰作,以 73 亿个参数在生成式 AI 领域崭露头角。这个模型不仅关乎规模,更关乎效率和能力,为开源大型语言模型及其在众多用例中的应用展示了光明前景。其效率的关键在于滑动窗口注意力(sliding window attention)的实现;该模型在宽松的 Apache 开源许可证下发布。许多 AI 工程师以此模型为基础进行了微调,包括令人印象深刻的Zephyr 7B beta模型。还有Mixtral 8x7b,这是一个专家混合模型(与 GPT-4 的架构类似),其结果与 GPT-3.5-turbo 相当。
想要更详细和最新的开源模型及其性能指标的比较,请访问由 Hugging Face 主办的聊天机器人排行榜。
Anthropic: Claude
2023 年 7 月 11 日发布的Claude 2通过其开创性的宪法 AI方法在 AI 安全和一致性方面脱颖而出——使用一系列规则或价值观来训练模型。Claude 2 的一个显著改进是其扩展的上下文窗口,达到 10 万个标记,以及上传文件的能力。在生成式 AI 领域,上下文窗口指的是模型在生成响应时可以积极考虑或记住的文本或数据量。拥有更大的上下文窗口,模型可以基于更广泛的环境进行理解和生成。
这一进步引起了 AI 工程师的极大热情,因为它为更复杂的新用例开辟了道路。例如,Claude 2 能同时处理更多信息,使其擅长总结大量文档或维持深入对话。然而,仅仅六个月后,这一优势就被 OpenAI 发布的 GPT-4 128K 版本所超越。竞争对手之间的激烈角逐正在推动该领域向前发展。
Claude 的下一代包括了Opus,这是第一个在智能方面与 GPT-4 相媲美的模型,以及 Haiku,这是一个更小的模型,速度极快,价格为每百万个标记 0.25 美元(当时是 GPT-3.5-turbo 成本的一半)。
GPT-4V(视觉)
在一个重大的飞跃中,2023 年 9 月 23 日,OpenAI 通过引入视觉功能扩展了 GPT-4 的能力,使用户能够指令 GPT-4 分析图像和文本。这一创新也体现在 ChatGPT 界面的更新中,现在它支持包含图像和文本作为用户输入。这一发展标志着向多模态模型的重大趋势转变,这些模型可以在单一语境中无缝处理和理解多种类型的数据,如图像和文本。
模型比较
在撰写本文时,LLM 市场由 OpenAI 主导,其最先进的 GPT-4 模型被广泛认为具有显著领先优势。最接近的竞争对手是 Anthropic,人们对 Llama 和 Mistral 等小型开源模型在微调方面的潜力感到兴奋。尽管评论家们预计 OpenAI 未来将继续推出世界领先的模型,但随着开源模型在更多任务上达到足够好的水平,AI 工作负载可能会转向本地微调模型。随着模型性能和量化(在精度、大小和计算成本之间权衡的方法)的进步,未来某一天在您的手机或其他设备上运行 LLM 可能成为可能。
目前,了解模型擅长什么最好的方法是在多个模型上运行相同的提示并比较响应。在我们工作中经常突出的一点是,GPT-4 在遵循指令方面表现得更好,如下面的例子所示,它是唯一一个以正确格式、以字母i开头并符合要求的名称响应的模型。
输入:
Brainstorm a list of product names for a shoe that fits any foot size,
in the style of Steve Jobs.
Return the results as a comma-separated list, in this format:
Product description: A shoe that fits any foot size
Product names: [list of three product names]
## Examples
Product description: A refrigerator that dispenses beer
Product names: iBarFridge, iFridgeBeer, iDrinkBeerFridge
Product description: A watch that can tell accurate time in space
Product names: iNaut, iSpace, iTime
Product description: A home milkshake maker
Product names: iShake, iSmoothie, iShake Mini
输出(GPT-4):
Product description: A shoe that fits any foot size
Product names: iFit, iShoe, iFlexible
输出(Claude 3):
Product description: A shoe that fits any foot size
Product names: iFit, iComfort, iSole
输出(Llama 3 70b):
Here is the list of product names for a shoe that fits any foot size,
in the style of Steve Jobs:
Product description: A shoe that fits any foot size Product names:
iFit, OneSize, iWalkFree
摘要
从 LLM 的诞生到其当前的高级状态,这是一个关于不懈创新、合作和激烈竞争的故事。随着这些模型继续进化,它们很可能会成为我们日常生活中的重要组成部分,改变我们与技术以及彼此互动的方式。
通过理解这些模型的历史背景和功能,您可以更好地欣赏我们可用于各种应用的工具,从提示工程到智能虚拟代理的开发。然而,需要注意的是,尽管这些模型提供了广阔的可能性,数据隐私仍然是一个关键的关注点。如果这些模型使用您的数据进行再训练或微调,请谨慎行事,并避免输入敏感信息。
在下一章中,您将学习所有基本的提示工程技术,用于与文本 LLMs(大型语言模型)一起工作。您将学习到充分利用像 GPT-4 这样的强大语言模型所需的基本技能。随着您解锁生成式 AI 的真正潜力,您将迎来令人兴奋的见解和实用的方法。
第三章:使用 ChatGPT 进行文本生成的标准实践
简单的提示技巧将帮助你最大化 LLM 的输出和格式。你将从定制提示开始,以探索用于文本生成的所有常见实践。
生成列表
自动生成列表功能非常强大,使你能够专注于更高层次的任务,同时 GPT 可以代表你自动生成、优化、重新排序和去重列表。
输入:
Generate a list of Disney characters.
输出:
Sure, here is a list of some popular Disney characters:
1\. Mickey Mouse
2\. Minnie Mouse
...
30\. Bagheera (The Jungle Book)
GPT-4 完全能够提供角色列表。然而,这种方法存在一些陷阱:
-
GPT 决定提供 30 个示例作为编号列表,列表项之间由
\n字符分隔。然而,如果你的下游 Python 代码期望在项目符号处分割,那么你可能会得到不理想的结果或运行时错误。 -
GPT 提供了前置说明;移除任何前置/后续说明将使解析输出更容易。
-
列表大小未受控制,留给了语言模型。
-
一些角色的名字在括号内带有他们对应电影的名称——例如,Bagheera (The Jungle Book)——而另一些则没有。这使得提取名字变得更加困难,因为你需要移除电影标题。
-
在基于我们期望结果的 LLM 生成过程中没有应用任何过滤或选择。
以下是一个优化的提示。
输入:
Generate a bullet-point list of 5 male Disney characters.
Only include the name of the character for each line.
Never include the film for each Disney character.
Only return the Disney characters, never include any commentary.
Below is an example list:
* Aladdin
* Simba
* Beast
* Hercules
* Tarzan
输出:
* Woody
* Buzz Lightyear
* Stitch
* Jack Sparrow
* Prince Charming
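拿到这样整洁的项目符号列表后,下游解析就很简单了。以下是一个最小示意,假设响应严格遵循了提示中约定的格式:

```python
response = """* Woody
* Buzz Lightyear
* Stitch
* Jack Sparrow
* Prince Charming"""

# 按行切分,只保留以 "* " 开头的行,并去掉项目符号前缀
characters = [line[2:].strip()
              for line in response.splitlines()
              if line.startswith("* ")]
print(characters)
# ['Woody', 'Buzz Lightyear', 'Stitch', 'Jack Sparrow', 'Prince Charming']
```

由于提示已经禁止了前置说明和编号,这种逐行解析几乎不需要容错逻辑;这正是约束输出格式的价值所在。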
提供示例
简单地重新措辞你的提示以包含示例 (few-shot prompting) 可以极大地影响期望的输出。
通过优化提示,你已实现以下效果:
-
将列表限制为固定大小五项
-
仅生成男性角色
-
正确地使用项目符号格式化列表
-
移除了任何前置说明
简单列表适用于大多数任务;然而,它们结构化程度较低,对于某些任务,从 GPT-4 输出中获得嵌套数据结构是有益的。
三种典型数据结构包括:
-
嵌套文本数据(层级列表)
-
JSON
-
YAML
层级列表生成
层级列表在期望的输出是嵌套时非常有用。一个很好的例子是详细的文章结构。
输入:
Generate a hierarchical and incredibly detailed article outline on:
What are the benefits of data engineering.
See an example of the hierarchical structure below:
Article Title: What are the benefits of digital marketing?
* Introduction
a. Explanation of digital marketing
b. Importance of digital marketing in today's business world
* Increased Brand Awareness
a. Definition of brand awareness
b. How digital marketing helps in increasing brand awareness
输出结果:
Article Title: What are the benefits of data engineering?
* Introduction
a. Explanation of data engineering
b. Importance of data engineering in today’s data-driven world
...(10 sections later)...
* Conclusion
a. Importance of data engineering in the modern business world
b. Future of data engineering and its impact on the data ecosystem
为了在前面的输出中生成有效的文章大纲,你包括了两个关键短语:
层级结构
建议文章大纲需要生成嵌套结构。
极其详细
指导语言模型生成更长的输出。其他能产生类似效果的措辞包括“非常长”,或者直接指定大量子标题,例如“包括至少 10 个顶级标题”。
注意
请求语言模型生成固定数量的项目并不保证语言模型会生成相同长度的输出。例如,如果你请求 10 个标题,你可能会只收到 8 个。因此,你的代码应该验证是否存在 10 个标题,或者能够灵活处理来自 LLM 的不同长度。
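针对这一点,可以在代码里显式校验标题数量,不符时抛出异常,便于上游重试或回退。以下是一个最小示意,假设大纲沿用本节的 * 标题格式:

```python
import re

def extract_headings(outline, expected=None):
    """提取顶级标题;若指定 expected,数量不符时抛出异常,便于上游重试。"""
    headings = re.findall(r"\* (.+)", outline)
    if expected is not None and len(headings) != expected:
        raise ValueError(f"期望 {expected} 个标题,实际得到 {len(headings)} 个")
    return headings

outline = "* Introduction\n* Methods\n* Conclusion"
print(extract_headings(outline, expected=3))
# ['Introduction', 'Methods', 'Conclusion']
```

另一种策略是放宽 `expected` 检查,只记录差异并继续处理,视下游逻辑对长度的敏感程度而定。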
因此,你已经成功生成了一个分层文章概要,但如何将字符串解析成结构化数据呢?
让我们用 Python 探索 示例 3-1,其中你之前已成功对 OpenAI 的 GPT-4 进行了 API 调用。这里使用了两个正则表达式来从 openai_result 中提取标题和子标题。Python 中的 re 模块用于处理正则表达式。
示例 3-1. 解析分层列表
import re
# openai_result = generate_article_outline(prompt)
# Commented out to focus on a fake LLM response, see below:
openai_result = '''
* Introduction
a. Explanation of data engineering
b. Importance of data engineering in today’s data-driven world
* Efficient Data Management
a. Definition of data management
b. How data engineering helps in efficient data management
* Conclusion
a. Importance of data engineering in the modern business world
b. Future of data engineering and its impact on the data ecosystem
'''
# Regular expression patterns
heading_pattern = r'\* (.+)'
subheading_pattern = r'\s+[a-z]\. (.+)'
# Extract headings and subheadings
headings = re.findall(heading_pattern, openai_result)
subheadings = re.findall(subheading_pattern, openai_result)
# Print results
print("Headings:\n")
for heading in headings:
print(f"* {heading}")
print("\nSubheadings:\n")
for subheading in subheadings:
print(f"* {subheading}")
此代码将输出:
Headings:
- Introduction
- Efficient Data Management
- Conclusion
Subheadings:
- Explanation of data engineering
- Importance of data engineering in today’s data-driven world
- Definition of data management
- How data engineering helps in efficient data management
- Importance of data engineering in the modern business world
- Future of data engineering and its impact on the data ecosystem
正则表达式的使用允许进行高效的模式匹配,使得处理输入文本的变化(如是否存在前导空格或制表符)成为可能。让我们探索这些模式是如何工作的:
heading_pattern = r'\* (.+)'
此模式旨在提取主要标题,由以下部分组成:
-
\*匹配标题开头处的星号(*)符号。反斜杠用于转义星号,因为在正则表达式中星号具有特殊含义(前一个字符出现零次或多次)。 -
星号之后将匹配一个空格字符。
-
(.+): 匹配一个或多个字符,括号创建了一个捕获组。.是一个通配符,匹配除换行符之外的任何字符,而+是一个量词,表示前面的元素(在这种情况下是点)出现一次或多次。
通过应用此模式,你可以轻松地将所有主要标题提取到一个列表中,而不包含星号。
subheading_pattern = r'\s+[a-z]\. (.+)'
subheading pattern 将匹配 openai_result 字符串中的所有子标题:
-
\s+匹配一个或多个空白字符(空格、制表符等)。+表示前面的元素(在这种情况下是\s)出现一次或多次。 -
[a-z]匹配从 a 到 z 的单个小写字母。 -
\.匹配点字符。反斜杠用于转义点,因为在正则表达式中点具有特殊含义(匹配除换行符之外的任何字符)。 -
点号之后将匹配一个空格字符。
-
(.+)匹配一个或多个字符,括号创建了一个捕获组。.是一个通配符,匹配除换行符之外的任何字符,而+是一个量词,表示前面的元素(在这种情况下是点)出现一次或多次。
此外,re.findall() 函数用于在输入字符串中查找所有非重叠模式的匹配项,并将它们作为列表返回。然后打印提取的标题和子标题。
因此,你现在能够从分层文章大纲中提取标题和子标题;不过,你还可以进一步细化正则表达式,使每个标题与其对应的子标题相关联。
在 示例 3-2 中,正则表达式略有修改,以便每个子标题直接与其相应的标题相关联。
示例 3-2. 将分层列表解析为 Python 字典
import re
openai_result = """
* Introduction
a. Explanation of data engineering
b. Importance of data engineering in today’s data-driven world
* Efficient Data Management
a. Definition of data management
b. How data engineering helps in efficient data management
c. Why data engineering is important for data management
* Conclusion
a. Importance of data engineering in the modern business world
b. Future of data engineering and its impact on the data ecosystem
"""
section_regex = re.compile(r"\* (.+)")
subsection_regex = re.compile(r"\s*([a-z]\..+)")
result_dict = {}
current_section = None
for line in openai_result.split("\n"):
section_match = section_regex.match(line)
subsection_match = subsection_regex.match(line)
if section_match:
current_section = section_match.group(1)
result_dict[current_section] = []
elif subsection_match and current_section is not None:
result_dict[current_section].append(subsection_match.group(1))
print(result_dict)
这将输出:
{
"Introduction": [
"a. Explanation of data engineering",
"b. Importance of data engineering in today’s data-driven world"
],
"Efficient Data Management": [
"a. Definition of data management",
"b. How data engineering helps in efficient data management"
],
"Conclusion": [
"a. Importance of data engineering in the modern business world",
"b. Future of data engineering and its impact on the data ecosystem"
]
}
部分标题正则表达式 r'\* (.+)' 匹配一个星号后跟一个空格和更多字符。括号捕获星号和空格后面的文本,以便在代码中稍后使用。
子部分正则表达式 r'\s*([a-z]\..+)' 以 \s* 开头,它匹配零个或多个空白字符(空格或制表符)。这允许正则表达式匹配带有或不带有前导空格或制表符的子部分。接下来的部分 ([a-z]\..+) 匹配一个小写字母后跟一个点和一个或多个字符。括号捕获整个匹配的子部分文本,以便在代码中稍后使用。
for 循环遍历输入字符串 openai_result 的每一行。当遇到与部分标题正则表达式匹配的行时,循环将匹配的标题设置为当前部分,并在 result_dict 字典中为其分配一个空列表作为其值。当一行与子部分正则表达式匹配时,匹配的子部分文本被追加到对应当前部分的列表中。
因此,循环逐行处理 输入字符串,将行分类为部分标题或子部分,并构建所需的字典结构。
避免使用正则表达式的时机
当你努力从 LLM 响应中提取更多结构化数据时,仅依靠正则表达式会使控制流变得越来越复杂。然而,还有其他格式可以轻松地促进从 LLM 响应中解析结构化数据。两种常见的格式是 .json 和 .yml 文件。
生成 JSON
让我们从实验一些提示设计开始,这将指导一个大型语言模型返回 JSON 响应。
输入:
Compose a very detailed article outline on "The benefits of learning code" with a
JSON payload structure that highlights key points.
Only return valid JSON.
Here is an example of the JSON structure:
{
"Introduction": [
"a. Explanation of data engineering",
"b. Importance of data engineering in today’s data-driven world"],
...
"Conclusion": [
"a. Importance of data engineering in the modern business world",
"b. Future of data engineering and its impact on the data ecosystem"]
}
输出:
{
"Introduction": [
"a. Overview of coding and programming languages",
"b. Importance of coding in today's technology-driven world"],
...
"Conclusion": [
"a. Recap of the benefits of learning code",
"b. The ongoing importance of coding skills in the modern world"]
}
给出指示并提供示例
注意,在前面的提示中,你已经提供了关于任务类型、格式和示例 JSON 输出的指示。
当处理 JSON 时,你可能会遇到的一些常见错误包括无效的有效负载,或者 JSON 被三重反引号(```)包裹,例如:
输出:
当然,这是 JSON:
```json
{"Name": "John Smith"} # valid payload
{"Name": "John Smith", "some_key":} # invalid payload
```
理想情况下,你希望模型这样响应:
输出:
{"Name": "John Smith"}
这一点很重要,因为对于第一种输出,你必须先在 `json` 之后切分,再解析字符串中恰好包含有效 JSON 的那部分。以下几点值得加入你的提示中,以改进 JSON 解析:
你必须遵循以下原则:
-
仅返回有效的 JSON
-
永远不要包含反引号符号,例如:`
-
响应将通过 json.loads() 解析,因此它必须是有效的 JSON。
现在让我们看看如何[用 Python 解析 JSON 输出](https://oreil.ly/MoJHn):
import json
# openai_json_result = generate_article_outline(prompt)
# 注释掉真实调用,改用下面模拟的 LLM 响应:
openai_json_result = """
{
"Introduction": [
"a. Overview of coding and programming languages",
"b. Importance of coding in today's technology-driven world"],
"Conclusion": [
"a. Recap of the benefits of learning code",
"b. The ongoing importance of coding skills in the modern world"]
}
"""
parsed_json_payload = json.loads(openai_json_result)
print(parsed_json_payload)
'''{'Introduction': ['a. Overview of coding and programming languages',
"b. Importance of coding in today's technology-driven world"],
'Conclusion': ['a. Recap of the benefits of learning code',
'b. The ongoing importance of coding skills in the modern world']}'''
做得好,你已经成功解析了一些 JSON。
如上所示,当要求以有效 JSON 格式返回响应时,从 LLM 响应中提取结构化数据的流程会大为简化。与前面演示的正则表达式解析相比,这种方法更省事、更直接。
那么,可能出什么问题呢?
* 语言模型意外地在响应中添加了额外文本(例如 `json output:`),而你的应用逻辑只处理有效 JSON。
* 生成的 JSON 无效,解析时失败(可能由于体量过大,或只是某些字符没有转义)。
稍后你将学习优雅处理这类边缘情况的策略。
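处理这类边缘情况的一种简单策略,是在 `json.loads` 之前先剥离代码围栏和前后缀文字。以下是一个最小容错解析示意(生产环境中还可以在解析失败时把错误信息回传给 LLM 并重试):

```python
import json

def parse_llm_json(raw):
    """容错解析:丢弃 Markdown 围栏与诸如 "json output:" 的前后缀,再尝试 json.loads。"""
    text = raw.strip()
    # 只保留第一个 { 到最后一个 } 之间的内容
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("响应中未找到 JSON 对象")
    return json.loads(text[start:end + 1])

raw = 'json output:\n```json\n{"Name": "John Smith"}\n```'
print(parse_llm_json(raw))  # {'Name': 'John Smith'}
```

这种“截取最外层大括号”的启发式只适用于单个 JSON 对象的响应;若负载本身仍然无效,`json.loads` 会抛出 `JSONDecodeError`,此时应触发重试逻辑。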
YAML
.yml 文件是一种结构化数据格式,与 .json 相比具有不同的优势:
无须转义字符
YAML 的缩进模式省去了用大括号、方括号和逗号表示结构的需要。这可以带来更整洁、更不易出错的文件,因为标点不匹配或放错位置的风险更小。
可读性
YAML 被设计为人类可读,其语法和结构比 JSON 更简单。这使你更容易创建、阅读和编辑提示,尤其是在处理复杂或嵌套结构时。
注释
与 JSON 不同,YAML 支持注释,允许你直接在文件中为提示添加注解或解释。在团队协作或一段时间后重新审视提示时,这会非常有帮助,有助于更好的理解与协作。
输入:
- Below you will find the current yaml schema.
- You can update the quantity based on the user query.
- Filter the user query against the schema below; if no items
remain, then return "No Items".
- If there is a partial match, only return the items that match
the schema below:
Schema:
- item: Apple slices
  quantity: 5
  unit: pieces
- item: Milk
  quantity: 1
  unit: gallon
- item: Bread
  quantity: 2
  unit: loaves
- item: Eggs
  quantity: 1
  unit: dozen
User Query: "5 apple slices, 2 dozen eggs."
Given the schema above, please only return valid .yml for the
query. If there are no matches, return "No Items". Do not provide
any commentary or explanations.
输出:
- item: Apple slices
  quantity: 5
  unit: pieces
- item: Eggs
  quantity: 2
  unit: dozen
注意在前面的示例中,LLM 如何从 `User Query` 字符串中推断出正确的 .yml 格式。
此外,你给了 LLM 两种选择:
* 返回有效的 .yml 响应
* 返回过滤后的 .yml 响应
如果过滤后没有剩余的 .yml 项目,则返回 No Items。
过滤 YAML 有效负载
你可以决定使用同一个提示来清洗/过滤 .yml 有效负载。
首先,让我们关注一个相对于期望 `schema` 同时包含有效和无效条目的有效负载。`Apple slices` 符合标准;然而 `Bananas` 不在模式中,因此你应该预期 `User Query` 会被适当过滤。
输入:
User Query:
- item: Apple slices
  quantity: 5
  unit: pieces
- item: Bananas
  quantity: 3
  unit: pieces
输出:
Updated yaml list:
- item: Apple slices
  quantity: 5
  unit: pieces
在前面的示例中,你已经成功地按照既定标准过滤了用户的有效负载,并将语言模型用作了推理引擎。
通过在提示中为 LLM 提供一组指令,其响应与人工手动清洗数据时的做法非常接近。
这种输入提示便于把更多控制流任务委托给语言模型,而这些任务通常需要用 Python 或 JavaScript 等编程语言来编写代码。
图 3-1 详细展示了 LLM 处理用户查询时所应用的逻辑。
https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0301.png
图 3-1. 使用 LLM 而非代码来决定应用程序的控制流
在 YAML 中处理无效负载
一个完全无效的有效负载可能如下所示:
输入:
User Query:
- item: Bananas
  quantity: 3
  unit: pieces
输出:
No Items
正如预期的那样,由于 `User Query` 中没有任何项目能与之前定义的 `schema` 匹配,LLM 返回了 `No Items`。
让我们创建一个 Python 脚本,优雅地处理 LLM 返回的各种结果。脚本的核心部分将关注:
* 为三种 LLM 响应场景中可能出现的每类错误创建自定义异常
* 解析提出的模式
* 对响应运行一系列自定义检查,确保 YML 响应可以安全地传递给下游软件应用/微服务
你可以定义六个特定的错误来处理所有边缘情况:
class InvalidResponse(Exception):
pass
class InvalidItemType(Exception):
pass
class InvalidItemKeys(Exception):
pass
class InvalidItemName(Exception):
pass
class InvalidItemQuantity(Exception):
pass
class InvalidItemUnit(Exception):
pass
然后,将之前提出的 `YML schema` 作为字符串提供:
# 提供的模式
schema = """
- item: Apple slices
  quantity: 5
  unit: pieces
- item: Milk
  quantity: 1
  unit: gallon
- item: Bread
  quantity: 2
  unit: loaves
- item: Eggs
  quantity: 1
  unit: dozen
"""
导入 `yaml` 模块并创建一个名为 `validate_response` 的自定义解析函数,它能让你轻松判断某个 LLM 输出是否有效:
import yaml
def validate_response(response, schema):
    # 解析模式
    schema_parsed = yaml.safe_load(schema)
    max_quantity = 10
    # 检查响应是否为列表
    if not isinstance(response, list):
        raise InvalidResponse("响应不是列表")
    # 检查列表中的每个项目是否是字典
    for item in response:
        if not isinstance(item, dict):
            raise InvalidItemType("项目不是字典")
        # 检查每个字典是否具有 "item"、"quantity" 和 "unit" 键
        if not all(key in item for key in ("item", "quantity", "unit")):
            raise InvalidItemKeys("项目没有正确的键")
        # 检查与每个键关联的值是否是正确的类型
        if not isinstance(item["item"], str):
            raise InvalidItemName("项目名称不是字符串")
        if not isinstance(item["quantity"], int):
            raise InvalidItemQuantity("项目数量不是整数")
        if not isinstance(item["unit"], str):
            raise InvalidItemUnit("项目单位不是字符串")
        # 检查与每个键关联的值是否是正确的值
        if item["item"] not in [x["item"] for x in schema_parsed]:
            raise InvalidItemName("项目名称不在模式中")
        if item["quantity"] > max_quantity:
            raise InvalidItemQuantity(f"项目数量大于 {max_quantity}")
        if item["unit"] not in ["pieces", "dozen"]:
            raise InvalidItemUnit("项目单位不是 pieces 或 dozen")
To test these edge cases, you’ll find several mocked LLM responses below:
# Mocked responses
fake_response_1 = """
- item: Apple Slices
  quantity: 5
  unit: pieces
- item: Eggs
  quantity: 2
  unit: dozen
"""
fake_response_2 = """
Updated YAML list:
- item: Apple Slices
  quantity: 5
  unit: pieces
"""
fake_response_3 = """Unmatched"""
Finally, now you can:
* Use `yaml.safe_load(response)` to safely parse each raw *.yml* response
* Call the `validate_response` function on each parsed response to test it against the custom *.yml* validation logic
# Parse the fake responses
response_1_parsed = yaml.safe_load(fake_response_1)
response_2_parsed = yaml.safe_load(fake_response_2)
response_3_parsed = yaml.safe_load(fake_response_3)
# Validate the responses against the schema
try:
    validate_response(response_1_parsed, schema)
    print("Response 1 is valid")
except Exception as e:
    print("Response 1 is invalid:", str(e))
try:
    validate_response(response_2_parsed, schema)
    print("Response 2 is valid")
except Exception as e:
    print("Response 2 is invalid:", str(e))
try:
    validate_response(response_3_parsed, schema)
    print("Response 3 is valid")
except Exception as e:
    print("Response 3 is invalid:", str(e))
Response 1 is valid
Response 2 is invalid: Response is not a list
Response 3 is invalid: Response is not a list
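Because each failure raises a distinct exception, this validation logic can drive your application’s control flow, for example by re-prompting the model until it returns parseable YAML. The following is a minimal, self-contained sketch of that retry pattern; `call_llm` is a hypothetical stand-in for a real chat-completion call, and the validator is simplified to one check:

```python
import yaml

class InvalidResponse(Exception):
    pass

SCHEMA = """
- item: Apple Slices
  quantity: 5
  unit: pieces
"""

def validate(parsed):
    # Simplified check; a full version would run all six checks above
    if not isinstance(parsed, list):
        raise InvalidResponse("Response is not a list")
    return parsed

def call_llm(prompt):
    # Hypothetical stand-in for a real LLM call: it returns an invalid
    # reply on the first attempt and a valid YAML list afterward
    call_llm.calls += 1
    return "Unmatched" if call_llm.calls == 1 else SCHEMA

call_llm.calls = 0

def get_valid_items(prompt, retries=3):
    for _ in range(retries):
        raw = call_llm(prompt)
        try:
            return validate(yaml.safe_load(raw))
        except Exception as exc:
            # Feed the error back so the model can correct itself
            prompt += f"\nYour last reply was invalid ({exc}). Try again."
    raise InvalidResponse("No valid YAML after retries")

items = get_valid_items("Return my shopping list as YAML.")
```

In production, the error message appended to the prompt gives the model a chance to self-correct on the next attempt, rather than failing the whole pipeline on the first malformed reply.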
# Diverse Format Generation with ChatGPT
ChatGPT is not only capable of generating *.json* or *.yml* files but can also support many other data formats. For example, it can aid you in creating a mermaid diagram or generating a mock *.csv* document. Its versatility stretches to even generating code, conversations, and scripts.
Let’s explore in detail how ChatGPT can produce diverse formatted content. Mermaid is a markdown-like scripting language for generating charts, diagrams, and other visualizations.
Input:
Create a simple food ordering flowchart using mermaid syntax.
The process starts with "Choose food" and ends with "Pay for meal",
and includes the steps "Add to cart" and "Confirm cart".
Output:
graph TD
    ChooseFood[Choose food] --> AddToCart[Add to cart]
    AddToCart --> ConfirmCart[Confirm cart]
    ConfirmCart --> PayForMeal[Pay for meal]
This output, when input to a mermaid diagram renderer, will create a precise flowchart reflecting the food ordering process (Figure 3-2).
<https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0302.png>
###### Figure 3-2\. A streamlined flow diagram created using mermaid syntax via an LLM
## Mock CSV Data
ChatGPT can also generate mock CSV data that can be utilized for testing purposes or other uses.
Input:
Generate sample CSV data for five students with the following
fields: name, age, grade.
Output:
name,age,grade
John,16,A
Alex,17,B
Emma,16,A
Sophia,16,B
Liam,17,A
The example generated is a well-formatted CSV file, which can be loaded directly into many tools that read CSV data.
Remember, GPT models rely heavily on the format you specify. For instance, specifying CSV implies the AI should use commas as separators. Be clear about the desired format for optimal results.
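As a quick sanity check, you can confirm that such output really is machine-readable by parsing it with Python’s built-in `csv` module (the string below is the mock model output shown above):

```python
import csv
import io

# The mock CSV output generated above
csv_text = """name,age,grade
John,16,A
Alex,17,B
Emma,16,A
Sophia,16,B
Liam,17,A"""

# DictReader treats the first row as the header
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0])  # {'name': 'John', 'age': '16', 'grade': 'A'}
```

If the model had used a different separator or added commentary around the table, this parse step would surface the problem immediately, which is exactly why being explicit about the format matters.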
You’ve explored different ways of extracting structured data from language models including regular expressions, JSON, YML, and other formats. You can also now use LLMs to direct parts of your application’s control flow.
# Explain It like I’m Five
The *Explain It like I’m Five* prompt focuses on explaining a given section of text *as if you were talking to a five-year-old child.*
This style of prompting is very effective and is a great way to convert technical documents into simple summaries that anyone can understand.
Let’s apply this prompt to an abstract of [“Tumor Microenvironment: Recent Advances in Various Cancer Treatments”](https://oreil.ly/MDOsz).
Input:
Explain the following text as if I were five years old.

Text: This is a review of different types of cancer treatments. We
aim to analyze the tumor microenvironment and recent trends in
therapeutic applications, and their efficacy and effectiveness for
several types of cancer. Traditionally, cancer treatment has been
based on neoplastic cells.
...(omitted for brevity)...
Output:
Cancer is a sickness that makes the cells in your body grow and
divide in a way that isn't normal. Doctors have different ways to
treat cancer, like cutting it out, giving you medicine, or using
special rays to kill the cancer cells...
The ChatGPT output from the preceding example is much more digestible than the abstract.
# Universal Translation Through LLMs
Additionally, language models can potentially act as *universal translators* due to their proficiency in understanding and generating human-like text across multiple languages.
Let’s investigate this with a simple example:
1. Generate some simple text.
2. Change the simple text and make it difficult to read.
3. Then transform the same text back into an easy to read format in Spanish (instead of English).
4. Revert the simplistic text back into English.
Input:
Create some simple text.
Output:
The sun rose slowly, painting the sky in hues of orange and pink.
Birds sang as a fresh breeze drifted through the trees, carrying
the scent of fresh grass and blooming flowers. Life felt calm, with
an awakening sense of peace and hope.
...(omitted for brevity)...
Input:
Now make the text extremely complex and difficult to read.
Output:
In the ethereal realm of celestial ascension, wherein the
resplendent orb of the sun commenced its languid rise, an exquisite
phenomenon unfolded, begetting a chromatic symphony across the
expansive crown of the firmament. The avian denizens, endowed with
mellifluous voices, articulated their jubilant salutations, adorning
the auditory tapestry with notes of felicity. The zephyr, gently
meandering amongst the arboreal sentinels, bore upon its tender
currents the redolence of verdant blades and efflorescent blossoms.
...(omitted for brevity)...
Input:
Now transform the text into a simple, easy-to-read Spanish format.
Output:
...(omitted for brevity)...
Input:
Now translate the simple text back into English.
Output:
...(omitted for brevity)...
As you can see, part of the meaning is lost in translation, and this is to be expected. However, large parts of the original text’s intent remain, demonstrating that LLMs can act as *highly effective translators* for languages that are well represented on the internet. For low-resource languages, the results will be noticeably worse.
The same logic applies to coding languages. LLMs are very good at generating code for established programming languages such as Python and JavaScript but perform worse for newer coding languages and packages.
The boundaries between different forms of information are becoming *increasingly fluid*. The essence of information itself is evolving, allowing for effortless transformations of summaries into stories, poems, or other creative expressions, ultimately enriching our understanding and engagement with the content.
*Diffusion models* are a unique class of generative models utilized in machine learning, specifically designed to produce new images that mimic those found in the training set.
Moreover, when you combine language models with diffusion models, it enables seamless transitions between text, video, and other modalities. This makes it even simpler for you to convey complex ideas across various formats, facilitating a more accessible and comprehensive experience.
# Ask for Context
LLMs are not only capable of generating text but can also act as simple agents with a limited amount of *reasoning capability.* This allows you to write a prompt asking the language model to either:
* Return a valid result to a question or statement
* Ask for more context to appropriately answer the question
In this section, you’ll learn about the importance of *asking for context* when working with LLMs such as GPT-4\. We will start with an example of a prompt that doesn’t provide enough context, resulting in a less useful response.
Then, we will provide a better prompt that encourages the model to ask for additional context if needed. Finally, we will use the additional context provided to generate a more informed response.
Input:
...(omitted for brevity)...
Output:
...(omitted for brevity)...
In the preceding prompt, the model does not have enough context to make a meaningful recommendation. Instead, you can ask ChatGPT for a list of recommended points that would help it to make an effective decision.
Input:
...(omitted for brevity)...
Output:
...(omitted for brevity)...
After prompting ChatGPT about how to make an informed decision, now you’re aware of what to include within your prompt to help the language model with deciding.
Input:
- Data integrity and constraints: enforce strict foreign key
relationships and complex data validation rules.
- Transactions and concurrency: must handle high levels of
concurrent reads and writes while maintaining data consistency.
- Maturity and stability: prioritize a well-established, widely
used, and battle-tested database for long-term stability.
- Development and operational ease: agile development practices
call for rapid prototyping and iterative schema design.
- Cost: budget constraints require an open source solution without
additional licensing or support fees.

If you need more context, please specify what would help you to
make a better decision.
Output:
Based on these considerations, both MongoDB and PostgreSQL have
their strengths. However, given the emphasis on data consistency,
complex queries, and strict data integrity constraints, PostgreSQL
seems to be the better fit for your project.
In this final example, the model uses the additional context provided to give a well-informed recommendation for using PostgreSQL. By asking for context when necessary, LLMs like ChatGPT and GPT-4 can deliver more valuable and accurate responses.
Figure 3-3 demonstrates how *asking for context* changes the decision-making process of LLMs. Upon receiving user input, the model first assesses whether the context given is sufficient. If not, it prompts the user to provide more detailed information, emphasizing the model’s reliance on context-rich inputs. Once adequate context is acquired, the LLM then generates an informed and relevant response.
<https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0303.png>
###### Figure 3-3\. The decision process of an LLM while asking for context
# Allow the LLM to Ask for More Context by Default
You can allow the LLM to ask for more context as a default by including this key phrase: *If you need more context, please specify what would help you to make a better decision.*
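In practice you might append this clause to every prompt programmatically. A tiny sketch (the function name and wrapper are illustrative, not part of any library):

```python
CONTEXT_CLAUSE = (
    "If you need more context, please specify what would help you "
    "to make a better decision."
)

def with_context_request(prompt: str) -> str:
    """Append the ask-for-context clause to an arbitrary prompt."""
    return f"{prompt.rstrip()}\n\n{CONTEXT_CLAUSE}"

print(with_context_request("Should I use MongoDB or PostgreSQL?"))
```

Centralizing the clause this way keeps the behavior consistent across every prompt your application sends.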
In this section, you’ve seen how LLMs can act as agents that use environmental context to make decisions. By iteratively refining the prompt based on the model’s recommendations, we eventually reach a point where the model has *enough context to make a well-informed decision.*
This process highlights the importance of providing sufficient context in your prompts and being prepared to ask for more information when necessary. By doing so, you can leverage the power of LLMs like GPT-4 to make more accurate and valuable recommendations.
In agent-based systems like GPT-4, the ability to ask for more context and provide a finalized answer is crucial for making well-informed decisions. [AutoGPT](https://oreil.ly/l3Ihy), a multiagent system, has a self-evaluation step that automatically checks whether the task can be completed given the current context within the prompt. This technique uses an actor–critic relationship, where the existing prompt context is being analyzed to see whether it could be further refined before being executed.
# Text Style Unbundling
*Text style unbundling* is a powerful technique in prompt engineering that allows you to extract and isolate specific textual features from a given document, such as tone, length, vocabulary, and structure.
This allows you to create new content that shares similar characteristics with the original document, ensuring consistency in style and tone across various forms of communication.
This consistency can be crucial for businesses and organizations that need to communicate with a unified voice across different channels and platforms. The benefits of this technique include:
Improved brand consistency
By ensuring that all content follows a similar style, organizations can strengthen their brand identity and maintain a cohesive image.
Streamlined content creation
By providing a clear set of guidelines, writers and content creators can more easily produce materials that align with a desired style.
Adaptability
Text style unbundling allows for the easy adaptation of existing content to new formats or styles while preserving the core message and tone.
The process of text style unbundling involves *identifying the desired textual features* or creating a meta prompt (a prompt to create prompts) to extract these features and then using the extracted features to guide the generation of new content.
# Identifying the Desired Textual Features
To successfully unbundle a text style, you must first identify the specific features you want to extract from the input document. Common textual features to consider include:
Tone of voice
The overall mood or attitude conveyed by the text, such as formal, casual, humorous, or authoritative
Length
The desired word count or general length of the content
Vocabulary and phrasing
The choice of words and expressions, including industry-specific jargon, colloquialisms, and complexity
Additionally, you can even create a meta prompt that will dynamically show you these.
Input:
Analyze the following text and identify the core features that
would allow further documents to imitate the same style,
consistency, and format as the text below.

Provide all of the features as a style writing guide to be used
for future documents.

Analyze the following text:

Artificial intelligence is rapidly transforming the world around
us, opening up new possibilities and disrupting industries. From
healthcare to entertainment, AI is making our lives better and more
convenient in countless ways. However, with great power comes great
responsibility. We must ensure that AI technology is developed and
deployed in an ethical and responsible manner to protect our
privacy and uphold our values.
Output:
Style writing guide:

Tone:
- Maintain an informative and optimistic tone throughout the document.
- Emphasize the positive impact of the subject while acknowledging
any potential concerns or responsibilities.

Length:
...

Vocabulary:
...

Structure:
...

Content:
- Focus on the transformative nature and positive impact of the
subject matter.
- Highlight specific industries or areas where the subject has
already made an impact.
- Acknowledge any associated responsibilities or potential risks,
emphasizing the need for ethical and responsible practices.
- Stress the importance of protecting privacy and upholding values
related to the subject.
# Generating New Content with the Extracted Features
After you’ve extracted the desired features from the input document, you can use this information to guide future content generation. Remember to ensure that the AI model incorporates the extracted features into its output by specifying the desired style in your prompt. For example:
* `Write a new blog post on [topic] using the same tone of voice, length, vocabulary, and structure as the previously analyzed text.`
By combining this technique with *reference text* (documents that act as grounding truth), you can produce credible, branded content that requires minimal revisions.
# Extracting Specific Textual Features with LLMs
You can easily tailor a prompt to guide an LLM in extracting particular textual features from a document. This can be applied beyond just analyzing text for copywriting purposes. For instance, recognizing entities or discerning sentiment from the text can be achieved by crafting a precise instruction for the LLM.
Input:
Analyze the following text to identify and list the entities
mentioned:

Artificial intelligence is rapidly transforming the world around
us, opening up new possibilities and disrupting industries. From
healthcare to entertainment, AI is making our lives better and more
convenient in countless ways. However, with great power comes great
responsibility. We must ensure that AI technology is developed and
deployed in an ethical and responsible manner to protect our
privacy and uphold our values.
Output:
Entities:
- Artificial intelligence (AI)
- Healthcare
- Entertainment
- Privacy
- Values
# Summarization
In the era of information overload, the ability to condense large amounts of text into concise, digestible summaries has become an essential skill. As the volume of data we produce and consume continues to grow, the demand for effective summarization techniques has increased significantly.
AI has made significant advancements in this domain, providing powerful tools for generating summaries that retain the most critical information while discarding less relevant details.
Let’s explore how to summarize using a language model.
Input:
Summarize the following text:

The party of the first part, hereinafter referred to as the
"Purchaser," and the party of the second part, hereinafter referred
to as the "Seller," being fully cognizant of their respective legal
rights and obligations under the applicable statutory provisions
and common law principles, have duly entered into this legally
binding instrument, hereinafter referred to as the "Agreement," for
the purpose of effecting a transaction involving the transfer of
goods, hereinafter referred to as the "Goods," between the
Purchaser and the Seller.

We hereby solemnly declare and affirm the following:
...
Output:
This is a legally binding agreement between a Purchaser and a
Seller for the transfer of goods. The agreement incorporates its
preamble as an integral part, but in the event of any conflict the
operative provisions of the agreement prevail. The interpretation
clause provides that singular words include the plural and vice
versa, and that words referring to one gender include all genders.
References to statutory provisions include any amendments or
replacements.
Summarization is an invaluable application of AI, enabling users to quickly extract key insights from lengthy articles, reports, or research papers. This process can help individuals make informed decisions, save time, and prioritize their reading. AI-generated summaries can also facilitate information sharing among teams, allowing for more efficient collaboration and communication.
# Summarizing Given Context Window Limitations
For documents larger than an LLM can handle in a single API request, a common approach is to chunk the document, summarize each chunk, and then combine these summaries into a final summary, as shown in Figure 3-4.
<https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0304.png>
###### Figure 3-4\. A summarization pipeline that uses text splitting and multiple summarization steps
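The pipeline in Figure 3-4 can be sketched in a few lines. Here `summarize` is a placeholder that simply truncates text so the example runs offline; in a real pipeline it would send each chunk to an LLM with a summarization prompt:

```python
def summarize(text: str, max_chars: int = 80) -> str:
    # Placeholder for an LLM call such as:
    # "Provide a concise, abstractive summary of: {text}"
    return text[:max_chars]

def chunk_text(text: str, chunk_size: int = 500) -> list:
    # Naive fixed-length chunking; see the chunking strategies below
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

def summarize_document(document: str) -> str:
    # Map step: summarize each chunk independently
    chunk_summaries = [summarize(chunk) for chunk in chunk_text(document)]
    # Reduce step: combine the chunk summaries into one final summary
    return summarize("\n".join(chunk_summaries), max_chars=200)

final_summary = summarize_document("A very long document. " * 200)
```

The map step parallelizes well (each chunk is independent), while the reduce step bounds the final prompt to a size the model can handle.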
Additionally, people may require different types of summaries for various reasons, and this is where AI summarization comes in handy. As illustrated in the preceding diagram, a large PDF document could easily be processed using AI summarization to generate distinct summaries tailored to individual needs:
Summary A
Provides key insights, which is perfect for users seeking a quick understanding of the document’s content, enabling them to focus on the most crucial points
Summary B
On the other hand, offers decision-making information, allowing users to make informed decisions based on the content’s implications and recommendations
Summary C
Caters to collaboration and communication, ensuring that users can efficiently share the document’s information and work together seamlessly
By customizing the summaries for different users, AI summarization contributes to increased information retrieval for all users, making the entire process more efficient and targeted.
Let’s assume you’re only interested in finding and summarizing information about the advantages of digital marketing. Simply change your summarization prompt to `Provide a concise, abstractive summary of the above text. Only summarize the advantages: ...`
AI-powered summarization has emerged as an essential tool for quickly distilling vast amounts of information into concise, digestible summaries that cater to various user needs. By leveraging advanced language models like GPT-4, AI summarization techniques can efficiently extract key insights and decision-making information, and also facilitate collaboration and communication.
As the volume of data continues to grow, the demand for effective and targeted summarization will only increase, making AI a crucial asset for individuals and organizations alike in navigating the Information Age.
# Chunking Text
LLMs continue to develop and play an increasingly crucial role in various applications, as the ability to process and manage large volumes of text becomes ever more important. An essential technique for handling large-scale text is known as *chunking.*
*Chunking* refers to the process of breaking down large pieces of text into smaller, more manageable units or chunks. These chunks can be based on various criteria, such as sentence, paragraph, topic, complexity, or length. By dividing text into smaller segments, AI models can more efficiently process, analyze, and generate responses.
Figure 3-5 illustrates the process of chunking a large piece of text and subsequently extracting topics from the individual chunks.
<https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0305.png>
###### Figure 3-5\. Topic extraction with an LLM after chunking text
## Benefits of Chunking Text
There are several advantages to chunking text, which include:
Fitting within a given context length
LLMs only have a certain amount of input and output tokens, which is called a *context length*. By reducing the input tokens you can make sure the output won’t be cut off and the initial request won’t be rejected.
Reducing cost
Chunking helps you to only retrieve the most important points from documents, which reduces your token usage and API costs.
Improved performance
Chunking reduces the processing load on LLMs, allowing for faster response times and more efficient resource utilization.
Increased flexibility
Chunking allows developers to tailor AI responses based on the specific needs of a given task or application.
## Scenarios for Chunking Text
Chunking text can be particularly beneficial in certain scenarios, while in others it may not be required. Understanding when to apply this technique can help in optimizing the performance and cost efficiency of LLMs.
### When to chunk
Large documents
When dealing with extensive documents that exceed the maximum token limit of the LLM
Complex analysis
In scenarios where a detailed analysis is required and the document needs to be broken down for better comprehension and processing
Multitopic documents
When a document covers multiple topics and it’s beneficial to handle them individually
### When not to chunk
Short documents
When the document is short and well within the token limits of the LLM
Simple analysis
In cases where the analysis or processing required is straightforward and doesn’t benefit from chunking
Single-topic documents
When a document is focused on a single topic and chunking doesn’t add value to the processing
## Poor Chunking Example
When text is not chunked correctly, it can lead to reduced LLM performance. Consider the following paragraph from a news article:
The local council has decided to increase the budget for education
by 10% this year, a move that has been widely welcomed by parents
and teachers alike. The additional funds will be used to improve
school infrastructure, hire more teachers, and provide better
resources for students. However, some critics argue that the
increase is not enough to address the growing needs of the
education system.
When the text is fragmented into isolated words, the resulting list lacks the original context:
["The", "local", "council", "has", "decided", "to", "increase",
"the", "budget", ...]
The main issues with this poor chunking example include:
Loss of context
By splitting the text into individual words, the original meaning and relationships between the words are lost. This makes it difficult for AI models to understand and respond effectively.
Increased processing load
Processing individual words requires more computational resources, making it less efficient than processing larger chunks of text.
As a result of the poor chunking in this example, an LLM may face several challenges:
* Difficulty understanding the main ideas or themes of the text
* Struggling to generate accurate summaries or translations
* Inability to effectively perform tasks such as sentiment analysis or text classification
By understanding the pitfalls of poor chunking, you can apply prompt engineering principles to improve the process and achieve better results with AI language models.
Let’s explore an improved chunking example using the same news article paragraph from the previous section; you’ll now chunk the text by sentence:
[“”"当地议会决定增加教育预算
今年增长 10%,这一举措受到了家长和教师的普遍欢迎。
“”",
“”"额外的资金将用于改善学校基础设施,
雇佣更多教师,并为学生提供更好的资源。",
“”"然而,一些批评者认为增加的幅度不足以
“”"]
# Divide Labor and Evaluate Quality
Define the granularity at which the text should be chunked, such as by sentence, paragraph, or topic. Adjust parameters like the number of tokens or model temperature to optimize the chunking process.
By chunking the text in this manner, you could insert only the most relevant whole sentences into an LLM prompt.
# Chunking Strategies
There are many different chunking strategies, including:
Splitting by sentence
Preserves the context and structure of the original content, making it easier for LLMs to understand and process the information. Sentence-based chunking is particularly useful for tasks like summarization, translation, and sentiment analysis.
Splitting by paragraph
This approach is especially effective when dealing with longer content, as it allows the LLM to focus on one cohesive unit at a time. Paragraph-based chunking is ideal for applications like document analysis, topic modeling, and information extraction.
Splitting by topic or section
This method can help AI models better identify and understand the main themes and ideas within the content. Topic-based chunking is well suited for tasks like text classification, content recommendations, and clustering.
Splitting by complexity
For certain applications, it might be helpful to split text based on its complexity, such as the reading level or technicality of the content. By grouping similar complexity levels together, LLMs can more effectively process and analyze the text. This approach is useful for tasks like readability analysis, content adaptation, and personalized learning.
Splitting by length
This technique is particularly helpful when working with very long or complex documents, as it allows LLMs to process the content more efficiently. Length-based chunking is suitable for applications like large-scale text analysis, search engine indexing, and text preprocessing.
Splitting by tokens using a tokenizer
Utilizing a tokenizer is a crucial step in many natural language processing tasks, as it enables the process of splitting text into individual tokens. Tokenizers divide text into smaller units, such as words, phrases, or symbols, which can then be analyzed and processed by AI models more effectively. You’ll shortly be using a package called `tiktoken`, which is a bytes-pair encoding tokenizer (BPE) for chunking.
Table 3-1 provides a high-level overview of the different chunking strategies; it’s worth considering what matters to you most when performing chunking.
Are you more interested in preserving semantic context, or would naively splitting by length suffice?
Table 3-1\. Six chunking strategies highlighting their advantages and disadvantages
| Splitting strategy | Advantages | Disadvantages |
| --- | --- | --- |
| Splitting by sentence | Preserves context, suitable for various tasks | May not be efficient for very long content |
| Splitting by paragraph | Handles longer content, focuses on cohesive units | Less granularity, may miss subtle connections |
| Splitting by topic | Identifies main themes, better for classification | Requires topic identification, may miss fine details |
| Splitting by complexity | Groups similar complexity levels, adaptive | Requires complexity measurement, not suitable for all tasks |
| Splitting by length | Manages very long content, efficient processing | Loss of context, may require more preprocessing steps |
| Using a tokenizer: Splitting by tokens | Accurate token counts, which helps in avoiding LLM prompt token limits | Requires tokenization, may increase computational complexity |
By choosing the appropriate chunking strategy for your specific use case, you can optimize the performance and accuracy of AI language models.
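To make one of the simpler strategies concrete, here is a minimal paragraph splitter (sentence splitting is covered with spaCy next). The regex treats one or more blank lines as a paragraph boundary; the sample text is illustrative:

```python
import re

def split_by_paragraph(text: str) -> list:
    """Split text on blank lines, dropping empty fragments."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

article = """The council approved the new education budget.

The funds will improve school infrastructure.

Some critics argue the increase is insufficient."""

paragraphs = split_by_paragraph(article)
print(len(paragraphs))  # 3
```

Each returned paragraph is a cohesive unit, so it can be fed to an LLM on its own without losing the sentences that belong together.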
# Sentence Detection Using SpaCy
*Sentence detection*, also known as sentence boundary disambiguation, is the process used in NLP that involves identifying the start and end of sentences within a given text. It can be particularly useful for tasks that require preserving the context and structure of the original content. By splitting the text into sentences, LLMs can better understand and process the information for tasks such as summarization, translation, and sentiment analysis.
Splitting by sentence is possible using NLP libraries such as [spaCy](https://spacy.io). Ensure that you have spaCy installed in your Python environment. You can install it with `pip install spacy`. Download the `en_core_web_sm` model using the command `python -m spacy download en_core_web_sm`.
In Example 3-3, the code demonstrates sentence detection using the spaCy library in Python.
##### Example 3-3\. [Sentence detection with spaCy](https://oreil.ly/GKDnc)
import spacy

nlp = spacy.load("en_core_web_sm")
text = "This is a sentence. This is another sentence."
doc = nlp(text)
for sent in doc.sents:
    print(sent.text)
Output:
This is a sentence.
This is another sentence.
First, you’ll import the spaCy library and load the English model `(en_core_web_sm)` to initialize an `nlp` object. Define an input text with two sentences; the text is then processed with `doc = nlp(text)`, creating a `doc` object as a result. Finally, the code iterates through the detected sentences using the `doc.sents` attribute and prints each sentence.
# Building a Simple Chunking Algorithm in Python
After exploring many chunking strategies, it’s important to build your intuition by writing a simple chunking algorithm from scratch.
Example 3-4 shows how to chunk text based on character length, using the blog post “Hubspot - What Is Digital Marketing?” This file can be found in the GitHub repository at *[content/chapter_3/hubspot_blog_post.txt](https://oreil.ly/30rlQ)*.
To correctly read the *hubspot_blog_post.txt* file, make sure your current working directory is set to the [*content/chapter_3*](https://oreil.ly/OHurh) GitHub directory. This applies for both running the Python code or launching the Jupyter Notebook server.
##### Example 3-4\. [Character chunking](https://oreil.ly/n3sNy)
with open("hubspot_blog_post.txt", "r") as f:
    text = f.read()

# Split the content into 200-character chunks:
chunks = [text[i : i + 200] for i in range(0, len(text), 200)]

for chunk in chunks:
    print(chunk)
    print("-" * 20)
Output:
For many local businesses, a search engine optimization strategy
means optimizing your content so that it is relevant to the
products or services you offer. For Keeps Bookstore, a local
bookstore in Atl
--------------------
anta, Georgia, has optimized its Google My Business profile for
local SEO so that it appears for the query "atlanta bookstore."
...(omitted for brevity)...
First, you open the text file *hubspot_blog_post.txt* with the `open` function and read its contents into the variable `text`. Then, using a list comprehension, you create a list of chunks, where each `chunk` is a 200-character substring of `text`.
The `range` function generates the start index of each 200-character substring, and the `i : i + 200` slice notation extracts each substring from `text`.
Finally, you loop through each `chunk` in the `chunks` list and `print` it to the console.
As you can see, because the chunking implementation is relatively simple and only based on length, there are gaps within the sentences and even words.
For these reasons, we believe that good NLP chunking has the following properties:
* Preserves entire words, and ideally entire sentences and the contextual points made by speakers
* Handles sentences that span several pages, for example, from page 1 onto page 2
* Provides an adequate token count for each `chunk` so that the total number of input tokens fits into a given LLM’s token context window
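A small step toward those properties is a chunker that greedily packs whole words instead of cutting at fixed character offsets. This is a sketch only; sentence- and token-aware chunking are covered later in the chapter:

```python
def chunk_by_words(text: str, max_chars: int = 200) -> list:
    """Greedily pack whole words into chunks of at most max_chars."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            # Keep extending the current chunk (an oversized single
            # word becomes its own chunk rather than being cut)
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

text = ("The local council has decided to increase the budget for "
        "education by 10% this year.")
chunks = chunk_by_words(text, max_chars=40)
```

Unlike the 200-character slicing above, no chunk ever breaks mid-word, and joining the chunks back together reproduces the (whitespace-normalized) original text.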
# Sliding Window Chunking
*Sliding window chunking* is a technique used for dividing text data into overlapping chunks, or *windows*, based on a specified number of characters, tokens, or words.
But what exactly is a sliding window?
Imagine viewing a long piece of text through a small window. This window is only capable of displaying a fixed number of items at a time. As you slide this window from the beginning to the end of the text, you see *overlapping chunks of text*. This mechanism forms the essence of the sliding window approach.
Each window size is defined by a *fixed number of characters, tokens, or words*, and the *step size* determines how far the window moves with each slide.
In Figure 3-6, with a window size of 4 words and a step size of 1, the first chunk would contain the first 4 words of the text. The window then slides 1 word to the right to create the second chunk, which contains words 2 through 5.
This process repeats until the end of the text is reached, ensuring each chunk overlaps with the previous and next ones to retain some shared context.
<https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0306.png>
###### Figure 3-6\. A sliding window, with a window size of 4 and a step size of 1
Due to the step size being 1, there is a lot of duplicate information between chunks, and at the same time the risk of losing information between chunks is dramatically reduced.
This is in stark contrast to Figure 3-7, which has a window size of 4 words and a step size of 2\. You’ll notice that because of the 100% increase in step size, the amount of information shared between the chunks is greatly reduced.
<https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0307.png>
###### Figure 3-7\. A sliding window, with a window size of 4 and a step size of 2
You will likely need a larger overlap if accuracy and preserving semantic context are more important than minimizing token inputs or the number of requests made to an LLM.
Example 3-5 shows how you can implement a sliding window using Python’s `len()` function. The `len()` function provides us with the total number of characters rather than words in a given text string, which subsequently aids in defining the parameters of our sliding windows.
##### Example 3-5\. [Sliding window](https://oreil.ly/aCkDo)
def sliding_window(text, window_size, step_size):
    if window_size > len(text) or step_size < 1:
        return []
    return [text[i : i + window_size]
            for i in range(0, len(text) - window_size + 1, step_size)]
text = “This is an example of sliding window text chunking.”
window_size = 20
step_size = 5
chunks = sliding_window(text, window_size, step_size)
for idx, chunk in enumerate(chunks):
print(f"Chunk {idx + 1}: {chunk}")
This code outputs:
Chunk 1: This is an example o
Chunk 2: is an example of sli
Chunk 3:  example of sliding
Chunk 4: ple of sliding windo
Chunk 5: f sliding window tex
Chunk 6: ding window text chu
Chunk 7: window text chunking
In the context of prompt engineering, the sliding window approach offers several benefits over fixed chunking methods. It allows LLMs to retain a higher degree of context, as there is an overlap between the chunks and offers an alternative approach to preserving context compared to sentence detection.
# Text Chunking Packages
When working with LLMs such as GPT-4, always remain wary of the maximum context length:
* `maximum_context_length = input_tokens + output_tokens`
There are various tokenizers available to break your text down into manageable units, the most popular ones being NLTK, spaCy, and tiktoken.
Both [NLTK](https://oreil.ly/wTmI7) and [spaCy](https://oreil.ly/c4MvQ) provide comprehensive support for text processing, but you’ll be focusing on tiktoken.
# Text Chunking with Tiktoken
[Tiktoken](https://oreil.ly/oSpVe) is a fast *byte pair encoding (BPE)* tokenizer that breaks down text into subword units and is designed for use with OpenAI’s models. Tiktoken offers faster performance than comparable open source tokenizers.
As a developer working with GPT-4 applications, using tiktoken offers you several key advantages:
Accurate token breakdown
It’s crucial to divide text into tokens because GPT models interpret text as individual tokens. Identifying the number of tokens in your text helps you figure out whether the text is too lengthy for a model to process.
Effective resource utilization
Having the correct token count enables you to manage resources efficiently, particularly when using the OpenAI API. Being aware of the exact number of tokens helps you regulate and optimize API usage, maintaining a balance between costs and resource usage.
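For example, a rough cost estimate can be derived directly from token counts. The per-1,000-token prices below are placeholders, not real prices; check OpenAI's pricing page for current values:

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_1k, output_price_per_1k):
    """Estimate the cost of a single API call in dollars."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)

# Hypothetical prices for demonstration only.
cost = estimate_cost(1200, 300, 0.03, 0.06)
print(f"${cost:.4f}")
```

Multiplying this per-call estimate by your expected request volume gives a quick budget forecast before you commit to a prompt design.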
# Encodings
Encodings define the method of converting text into tokens, with different models utilizing different encodings. Tiktoken supports three encodings commonly used by OpenAI models:
| Encoding name | OpenAI models |
| --- | --- |
| cl100k_base | GPT-4, GPT-3.5-turbo, text-embedding-ada-002 |
| p50k_base | Codex models, text-davinci-002, text-davinci-003 |
| r50k_base (or gpt2) | GPT-3 models like davinci |
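The table above can be expressed as a simple lookup. In practice, `tiktoken.encoding_for_model()` resolves the encoding for you; this dependency-free sketch merely mirrors the table:

```python
# A lookup mirroring the encodings table above. In real code, prefer
# tiktoken.encoding_for_model(model), which resolves this for you.
ENCODINGS = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-embedding-ada-002": "cl100k_base",
    "text-davinci-003": "p50k_base",
    "text-davinci-002": "p50k_base",
    "davinci": "r50k_base",
}

def encoding_name_for(model: str) -> str:
    # Fall back to cl100k_base, the encoding used by current chat models.
    return ENCODINGS.get(model, "cl100k_base")

print(encoding_name_for("gpt-4"))  # cl100k_base
```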
## Understanding the Tokenization of Strings
In English, tokens can vary in length, ranging from a single character like *t*, to an entire word such as *great*. This is due to the adaptable nature of tokenization, which can accommodate even tokens shorter than a character in complex script languages or tokens longer than a word in languages without spaces or where phrases function as single units.
It is not uncommon for spaces to be included within tokens, such as `"is"` rather than `"is "` or `" "+"is"`. This practice helps maintain the original text formatting and can capture specific linguistic characteristics.
###### Note
To easily examine the tokenization of a string, you can use [OpenAI Tokenizer](https://oreil.ly/K6ZQK).
You can install [tiktoken from PyPI](https://oreil.ly/HA2QD) with `pip install` `tiktoken`. In the following example, you’ll see how to easily encode text into tokens and decode tokens into text:
1. Import the package:

import tiktoken

2. Load an encoding with tiktoken.get_encoding():

encoding = tiktoken.get_encoding("cl100k_base")

3. Turn text into tokens with encoding.encode(), and turn tokens back into text with encoding.decode():

print(encoding.encode("Learning how to use Tiktoken is fun!"))
print(encoding.decode([1061, 15009, 374, 264, 2294, 1648,
    311, 4048, 922, 15592, 0]))

This code outputs:
[48567, 1268, 311, 1005, 73842, 5963, 374, 2523, 0]
“Data engineering is a great way to learn about AI!”
Additionally, let's write a function that tokenizes the text and counts the number of tokens, given a `text_string` and an `encoding_name`.
def count_tokens(text_string: str, encoding_name: str) -> int:
    """
    Returns the number of tokens in a text string using a given encoding.

    Args:
        text_string: The text string to tokenize.
        encoding_name: The name of the encoding to use for tokenization.

    Returns:
        The number of tokens in the text string.

    Raises:
        ValueError: If the encoding name is not recognized.
    """
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(text_string))
    return num_tokens
4. Use the function to count the number of tokens in a text string:

text_string = "Hello world! This is a test."
print(count_tokens(text_string, "cl100k_base"))
This code outputs `8`.
# Estimating Token Usage for Chat API Calls
ChatGPT models, such as GPT-3.5-turbo and GPT-4, utilize tokens similarly to previous completion models. However, the message-based structure makes token counting for conversations more challenging:
def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Returns the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
    }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows
        # <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print('''Warning: gpt-3.5-turbo may update over time. Returning
num tokens assuming gpt-3.5-turbo-0613.''')
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print('''Warning: gpt-4 may update over time.
Returning num tokens assuming gpt-4-0613.''')
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model
            {model}."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
Example 3-6 highlights the specific structure required to make a request against any of the chat models, which are currently the GPT-3.5 and GPT-4 families.
Normally, chat history is structured with a `system` message first, and then succeeded by alternating exchanges between the `user` and the `assistant`.
##### Example 3-6\. A payload for the Chat Completions API on OpenAI
example_messages = [
    {
        "role": "system",
        "content": '''You are a helpful, pattern-following assistant that
translates corporate jargon into plain English.''',
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "New synergies will help drive top-line growth.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Things working well together will increase revenue.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": '''Let's circle back when we have more bandwidth to touch
base on opportunities for increased leverage.''',
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": '''Let's talk later when we're less busy about how to
do better.''',
    },
    {
        "role": "user",
        "content": '''This late pivot means we don't have time to boil the
ocean for the client deliverable.''',
    },
]

for model in ["gpt-3.5-turbo-0301", "gpt-4-0314"]:
    print(model)
    # example token count from the function defined above
    print(f'''{num_tokens_from_messages(example_messages, model)} prompt
tokens counted by num_tokens_from_messages().''')
`"role": "system"` describes a system message that’s useful for *providing prompt instructions*. It offers a means to tweak the assistant’s character or provide explicit directives regarding its interactive approach. It’s crucial to understand, though, that the system command isn’t a prerequisite, and the model’s default demeanor without a system command could closely resemble the behavior of “You are a helpful assistant.”
The roles that you can have are `["system", "user", "assistant"]`.
`"content": "Some content"` is where you place the prompt or responses from a language model, depending upon the message’s role. It can be either `"assistant"`, `"system"`, or `"user"`.
# Sentiment Analysis
*Sentiment analysis* is a widely used NLP technique that helps in identifying, extracting, and understanding the emotions, opinions, or sentiments expressed in a piece of text. By leveraging the power of LLMs like GPT-4, sentiment analysis has become an essential tool for businesses, researchers, and developers across various industries.
The primary goal of sentiment analysis is to determine the attitude or emotional tone conveyed in a text, whether it’s positive, negative, or neutral. This information can provide valuable insights into consumer opinions about products or services, help monitor brand reputation, and even assist in predicting market trends.
The following are several prompt engineering techniques for creating effective sentiment analysis prompts:
Input:

Is this text positive or negative, or neutral? I absolutely love the design
of this phone, but the battery life is quite disappointing.

Output:

The tone of this text is mixed, as it contains both positive and negative
aspects. The positive part is "I absolutely love the design of this phone,"
while the negative part is "the battery life is quite disappointing."
Although GPT-4 identifies a “mixed tone,” the outcome is a result of several shortcomings in the prompt:
Lack of clarity
The prompt does not clearly define the desired output format.
Insufficient examples
The prompt does not include any examples of positive, negative, or neutral sentiments, which could help guide the LLM in understanding the distinctions between them.
No guidance on handling mixed sentiments
The prompt does not specify how to handle cases where the text contains a mix of positive and negative sentiments.
Input:

Classify the sentiment of the following text as positive, negative, or
neutral:

"I absolutely love the design of this phone, but the battery life is quite
disappointing."

Use the following examples as a guide:

Positive: "I absolutely love the design of this phone!"
Negative: "The battery life is quite disappointing."
Neutral: "I like this product, but its battery life is short."

Only return one of the following words:

- Positive
- Negative
- Neutral

Output:

Neutral
This prompt is much better because it:
Provides clear instructions
The prompt clearly states the task, which is to classify the sentiment of the given text into one of three categories: positive, negative, or neutral.
Offers examples
The prompt provides examples for each of the sentiment categories, which helps in understanding the context and desired output.
Defines the output format
The prompt specifies that the output should be a single word, ensuring that the response is concise and easy to understand.
## Techniques for Improving Sentiment Analysis
To enhance sentiment analysis accuracy, preprocessing the input text is a vital step. This involves the following:
Special characters removal
Exceptional characters such as emojis, hashtags, and punctuation may skew the rule-based sentiment algorithm’s judgment. Besides, these characters might not be recognized by machine learning and deep learning models, resulting in misclassification.
Lowercase conversion
Converting all the characters to lowercase aids in creating uniformity. For instance, words like *Happy* and *happy* are treated as different words by models, which can cause duplication and inaccuracies.
Spelling correction
Spelling errors can cause misinterpretation and misclassification. Creating a spell-check pipeline can significantly reduce such errors and improve results.
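The three steps above can be sketched as a small preprocessing function. The regex-based stripping is a simplified stand-in for proper emoji and hashtag handling, and spelling correction is left as a placeholder to keep the sketch dependency-free:

```python
import re

def preprocess(text: str) -> str:
    """Apply the preprocessing steps described above."""
    # 1. Remove special characters: keep letters, digits, whitespace,
    #    and basic punctuation (a simple stand-in for emoji/hashtag
    #    stripping).
    text = re.sub(r"[^\w\s.,!?']", "", text)
    # 2. Convert to lowercase for uniformity.
    text = text.lower()
    # 3. Spelling correction would normally go here (e.g., a spell-check
    #    library); omitted in this sketch.
    return text.strip()

print(preprocess("I LOVE this phone!!! #amazing"))
```

Running the cleaned text through a sentiment prompt, rather than the raw input, tends to produce more consistent classifications.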
For industry- or domain-specific text, embedding domain-specific content in the prompt helps in navigating the LLM’s sense of the text’s framework and sentiment. It enhances accuracy in the classification and provides a heightened understanding of particular jargon and expressions.
## Limitations and Challenges in Sentiment Analysis
Despite the advancements in LLMs and the application of prompt engineering techniques, sentiment analysis still faces some limitations and challenges:
Handling sarcasm and irony
Detecting sarcasm and irony in text can be difficult for LLMs, as it often requires understanding the context and subtle cues that humans can easily recognize. Misinterpreting sarcastic or ironic statements may lead to inaccurate sentiment classification.
Identifying context-specific sentiment
Sentiment analysis can be challenging when dealing with context-specific sentiments, such as those related to domain-specific jargon or cultural expressions. LLMs may struggle to accurately classify sentiments in these cases without proper guidance or domain-specific examples.
# Least to Most
The *least to most* technique in prompt engineering is a powerful method for sequentially generating or extracting increasingly detailed knowledge on a given topic. This method is particularly effective when dealing with complex subjects or when a high level of detail is necessary.
Least to most uses a *chain* of prompts where each new prompt is based on the last answer. This step-by-step approach helps gather more detailed information each time, making it easier to dive deeper into any topic.
This technique can also be applied to code generation, as demonstrated in a Flask `Hello World` app example.
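The chaining itself can be sketched in a few lines. Here `ask` is a stub standing in for a real LLM call (e.g., via the OpenAI client), so only the wiring between steps is shown:

```python
def ask(prompt: str) -> str:
    # Stub for a real LLM call; returns a placeholder answer.
    return f"<answer to: {prompt}>"

def least_to_most(steps):
    """Run each step's prompt, feeding the previous answer in as context."""
    context = ""
    answers = []
    for step in steps:
        prompt = f"{context}\n{step}".strip()
        answer = ask(prompt)
        answers.append(answer)
        context = answer  # the next prompt builds on this answer
    return answers

steps = [
    "Outline the architecture for a Flask 'Hello World' app.",
    "Write the code for the route described above.",
    "Write a test case for that route.",
]
print(len(least_to_most(steps)))  # 3
```

In a real chain you would also guard against the accumulated context exceeding the model's context window, for example by summarizing or truncating earlier answers.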
## Planning the Architecture
Before diving into the architecture, let's briefly understand what Flask is. [Flask](https://oreil.ly/7N-bs) is a lightweight web application framework in Python, widely used for creating web applications quickly and with minimal code. (Flask is only used for demonstration purposes here and isn't included within the [*requirements.txt* file](https://oreil.ly/TRK0i) for the book.)
Now, let’s ask an LLM to outline the basic architecture for a simple Flask “Hello World” application.
Input:

Outline the basic architecture for a simple Flask "Hello World" application.

Output:

Sure! Here's an outline of a basic Flask "Hello World" application:

1. Import the Flask module: Import the Flask module to use its
functionality.
2. Create an instance of the Flask application: Create an instance of the
Flask class to represent your web application.
3. Define routes and view functions: ...
4. Implement the view functions: ...
5. Run the application: Start the Flask development server to run the
application.
## Coding Individual Functions
Before coding, let’s clarify what a Flask route is: it’s a function linked to a URL pattern that determines what users see on a specific web page. Next, we’ll provide the code snippet for a simple “Hello World” route in Flask.
Input:

Provide the code snippet for a simple "Hello World" route in Flask.

Output:

Sure! Here's a simple code snippet for a "Hello World" route using Flask:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

if __name__ == '__main__':
    app.run()
In this code, we import the Flask module and create a Flask application instance named `app`. We then define a route using the `@app.route` decorator, specifying the root URL path `'/'`. The `hello_world` function is associated with this route, and it returns the string `'Hello, World!'` as the response.
The Flask application will start by calling `app.run()` when your Python script is executed directly.
## Adding Tests
Finally, let’s ask it to generate a test case for the “Hello World” route in the Flask application.
Input:

Provide an example test case for the "Hello World" route in Flask.

Output:

Sure! Here's an example test case for the "Hello World" route in Flask:

import unittest
from flask import Flask
from flask.testing import FlaskClient

# Create the Flask application
app = Flask(__name__)

# Create the route
@app.route('/')
def hello_world():
    return 'Hello, World!'

class FlaskAppTestCase(unittest.TestCase):

    def setUp(self):
        # Create a test client
        self.app = app.test_client()

    def test_hello_world_route(self):
        # Send a GET request to the route
        response = self.app.get('/')

        # Check the response status code
        self.assertEqual(response.status_code, 200)

        # Check the response data
        self.assertEqual(response.data, b'Hello, World!')

if __name__ == '__main__':
    unittest.main()
By using the least to most prompt technique, you can gradually build up the knowledge and understanding required for the task at hand.
## Benefits of the Least to Most Technique
This method is particularly useful for complex tasks, as it allows an LLM to generate relevant knowledge that will subsequently be used as context for future tasks.
Let’s dive deeper into the benefits of using this approach in various other scenarios:
Progressive exploration
Breaking a complex problem into smaller tasks allows an LLM to provide more detailed and accurate information at each step. This approach is especially helpful when working with a new subject matter or a multifaceted problem.
Flexibility
The least to most technique offers flexibility in addressing different aspects of a problem. It enables you to pivot, explore alternative solutions, or dive deeper into specific areas as needed.
Improved comprehension
By breaking down a task into smaller steps, an LLM can deliver information in a more digestible format, making it easier for you to understand and follow.
Collaborative learning
This technique promotes collaboration between you and an LLM, as it encourages an iterative process of refining the output and adjusting your responses to achieve the desired outcome.
## Challenges with the Least to Most Technique
Overreliance on previously generated knowledge
Using previous chat history to store the state may lead to larger tasks forgetting their initial tasks/outputs due to limitations in context length.
Dependence on prior prompts
Since each prompt is built upon preceding outputs, it is imperative to ensure that the quality and responses of previous prompts provide ample information for the next step.
# Evaluate Quality
In the process of designing prompts, make sure to evaluate the quality of each prior LLM response. The performance of the next task depends upon the quality of information provided by the preceding one.
# Role Prompting
*Role prompting* is a technique in which the AI is given a specific role or character to assume while generating a response. This helps guide the AI’s response style and content to better align with the user’s needs or desired outcome.
###### Note
[Awesome ChatGPT prompts](https://oreil.ly/8pf40) provides an extensive list of role prompts that you can use.
Let’s ask ChatGPT to take on the role of a tech reviewer for MongoDB.
Input:
I want you to act as a tech reviewer. I will give you the name of a piece of
technology and you will provide me with an in-depth review - including pros,
cons, features, and comparisons to other technologies on the market.
My first suggestion request is: MongoDB.
Output:
MongoDB is a popular NoSQL, document-oriented database management system
designed to handle high-volume data storage and retrieval.

Features:
- Document-oriented storage: MongoDB stores data in a flexible, JSON-like
format called BSON, which makes it easy to store complex data structures
such as arrays and nested objects.
...

Pros:
- High performance: MongoDB's indexing and caching capabilities provide
fast read and write operations.
...

Cons:
- Limited support for transactions: While MongoDB supports multi-document
ACID transactions, it may not be the best choice for applications that
require complex transactional operations.
...

Comparison with other technologies:
- Couchbase, another popular NoSQL database, offers similar features to
MongoDB, including flexible schemas and JSON-based storage. However,
Couchbase focuses more on distributed caching and may be better suited
for applications that require low-latency data access.
...

Conclusion:
MongoDB is a powerful and flexible NoSQL database solution suitable for a
wide range of applications, especially those requiring high performance and
scalability. Its flexible data model, rich query language, and strong
consistency make it a popular choice among developers.
The prompt is an excellent example of role prompting, as it clearly defines the role the AI should assume (a tech reviewer) and sets expectations for the type of response desired (an in-depth review of MongoDB).
# Give Direction
When crafting prompts, consider assigning a specific role to the AI. This sets the proper context for the response, creating a more focused and relevant output.
# Benefits of Role Prompting
Role prompting helps narrow down the AI’s responses, ensuring more focused, contextually appropriate, and tailored results. It can also enhance creativity by pushing the AI to think and respond from unique perspectives.
# Challenges of Role Prompting
Role prompting can pose certain challenges. There might be potential risks for bias or stereotyping based on the role assigned. Assigning stereotyped roles can lead to generating biased responses, which could harm usability or offend individuals. Additionally, maintaining consistency in the role throughout an extended interaction can be difficult. The model might drift off-topic or respond with information irrelevant to the assigned role.
# Evaluate Quality
Consistently check the quality of the LLM’s responses, especially when role prompting is in play. Monitor if the AI is sticking to the role assigned or if it is veering off-topic.
# When to Use Role Prompting
Role prompting is particularly useful when you want to:
Elicit specific expertise
If you need a response that requires domain knowledge or specialized expertise, role prompting can help guide the LLM to generate more informed and accurate responses.
Tailor response style
Assigning a role can help an LLM generate responses that match a specific tone, style, or perspective, such as a formal, casual, or humorous response.
Encourage creative responses
Role prompting can be used to create fictional scenarios or generate imaginative answers by assigning roles like a storyteller, a character from a novel, or a historical figure.
* *Explore diverse perspectives*: If you want to explore different viewpoints on a topic, role prompting can help by asking the AI to assume various roles or personas, allowing for a more comprehensive understanding of the subject.
* *Enhance user engagement*: Role prompting can make interactions more engaging and entertaining by enabling an LLM to take on characters or personas that resonate with the user.
If you’re using OpenAI, then the best place to add a role is within the `System Message` for chat models.
# GPT Prompting Tactics
So far you've already covered several prompting tactics, including asking for context, text style unbundling, least to most, and role prompting.
Let’s cover several more tactics, from managing potential hallucinations with appropriate reference text, to providing an LLM with critical *thinking time*, to understanding the concept of *task decomposition*—we have plenty for you to explore.
These methodologies have been designed to significantly boost the precision of your AI’s output and are recommended by [OpenAI](https://oreil.ly/QZE8n). Also, each tactic utilizes one or more of the prompt engineering principles discussed in Chapter 1.
## Avoiding Hallucinations with Reference
The first method for avoiding text-based hallucinations is to instruct the model to *only answer using reference text.*
By supplying an AI model with accurate and relevant information about a given query, the model can be directed to use this information to generate its response.
Input:
Refer to the article within triple quotes to answer the query.

You must follow these principles:

- If the answer cannot be found within the article, simply
return "I could not find an answer".

"""
B2B customers tend to have longer decision-making processes, and hence
longer sales funnels. Relationship-building strategies work better for these
customers, whereas B2C customers tend to respond better to short-term offers
and messaging.
"""

Example responses:

- I could not find an answer.

- Yes, B2B customers tend to have longer decision-making processes, and
hence longer sales funnels.

Question: Do B2B customers have longer sales funnels?

Output:

Yes, B2B customers tend to have longer decision-making processes, which
leads to longer sales funnels.
If you were to ask the same reference text this question:
Input:
...rest of the prompt...

Question: Are B2C sales more cost-effective?

Output:

I could not find an answer.
# Give Direction and Specify Format
The preceding prompt is excellent as it both instructs the model on how to find answers and also sets a specific response format for any unanswerable questions.
Considering the constrained context windows of GPTs, a method for dynamically retrieving information relevant to the asked query might be necessary to utilize this strategy.
Another approach is to direct the model to *incorporate references* from a given text in its response. When the input is enhanced with relevant information, the model can be guided to include citations in its responses by referring to sections of the supplied documents. This approach has the added benefit that citations in the output can be *authenticated automatically by matching strings* within the given documents.
Input:
You will be provided with documents delimited by triple quotes, and …
Output:
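Authenticating citations by matching strings can be as simple as a substring test against the source document. A minimal sketch (the function name is ours, not from the book's repository):

```python
def verify_citations(citations, document):
    """Check that each quoted citation appears verbatim in the source
    document -- a simple way to authenticate model-provided citations."""
    return {citation: (citation in document) for citation in citations}

document = ("B2B customers tend to have longer decision-making processes, "
            "and hence longer sales funnels.")
citations = ["longer sales funnels", "B2C customers prefer discounts"]

# Any citation mapped to False was not found verbatim and deserves
# manual review -- it may be paraphrased or hallucinated.
print(verify_citations(citations, document))
```

Exact matching is strict by design; if the model paraphrases, you may want to relax this with normalization (lowercasing, whitespace collapsing) before comparing.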
## Give GPTs “Thinking Time”
Often, by explicitly guiding an LLM to *derive solutions from first principles* before reaching a verdict, you can garner more accurate responses. Providing an LLM with *thinking time* can often lead to better results.
Input:
Output:
In some cases, when using GPT-4 or other generative AI models, you may not want the model’s reasoning process to be visible to the user.
You can achieve this by asking an LLM to *generate an inner monologue*. This is particularly useful in tutoring applications, where revealing the model’s reasoning might give away the solution prematurely.
## The Inner Monologue Tactic
The *inner monologue tactic* instructs the model to structure parts of the output that should be hidden from the user in a specific format. This makes it easy to remove these parts before presenting the final output to the user.
Here’s how you can utilize this tactic to answer user queries effectively.
Input:
Guide them toward the correct answer without directly giving the answer
away. This section should also be within triple quotes.

Step 4: Deliver the hint

If there is a mistake, present the hint from the previous step to the user.
Remember, do not use triple quotes in this step. Instead of a heading, use
a label like Hint:

Problem statement: A deck has 52 cards: 13 hearts, 13 diamonds, 13 clubs,
13 spades. If two cards are drawn from the deck without replacement, what
is the probability that one of the drawn cards is a heart and the other is
a club?

User solution: (13/52)*(13/51) = 169/2652 = 13/204
Output:
"""
Step 1 - My solution:

First, calculate the probability of drawing 1 heart and 1 club from a
52-card deck without replacement.

On the first draw, the probability of drawing a heart is 13/52.

For the second draw, after removing 1 card (assumed to be a heart), the
total number of remaining cards becomes 51. So the probability of drawing
a club is now 13/51.

Multiplying these two probabilities gives the probability of both events
happening in that order, i.e., P(heart then club) = (13/52)*(13/51).

Now assume the first card drawn is a club, and repeat the calculation with
the second card drawn being a heart. Remember, in this case, order matters.
"""

"""
Step 2 - Comparison with the user's solution:

Compare your solution with the user's solution. The user's solution only
considers the sequence of drawing a heart first and then a club. It does
not account for the other sequence (club then heart), and the solution is
therefore incomplete.
"""

"""
Step 3 - Hint preparation:

Since the user did not consider both sequences (heart then club, as well
as club then heart), a hint about considering both possible sequences
would help guide the user in the right direction without directly
revealing the solution.
"""

Step 4 - Hint delivery:

Hint: Remember that the two cards can be drawn in any order. Consider both
possible orders: heart then club, and club then heart.
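Once the model follows this format, the hidden sections can be stripped programmatically before the answer reaches the user. A minimal sketch using a regular expression to drop everything inside triple quotes:

```python
import re

def extract_hint(llm_output: str) -> str:
    """Remove the triple-quoted reasoning sections and return only the
    text outside them (the part meant for the user)."""
    visible = re.sub(r'"""[\s\S]*?"""', "", llm_output)
    return visible.strip()

output = '"""\nStep 1 - hidden reasoning\n"""\nHint: Consider both possible orders.'
print(extract_hint(output))  # Hint: Consider both possible orders.
```

This keeps the model's chain of reasoning available for logging and debugging while only the final hint is displayed to the student.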
## Self-Eval LLM Responses
Another tactic you can use is to *critique a generated LLM output* and ask whether the LLM missed any information or important facts. You're essentially asking an LLM to *evaluate itself* based on its previous output.
First, create a prompt to generate a `"Hello World"` function.
Input:
Generate a "Hello World" Python function.
Output:
Sure, here is a very simple Python function that prints "Hello, World!" to
the console:

def print_hello_world():
    print("Hello, World!")

This is a function that prints "Hello, World!". When you run this function,
it will print the string "Hello, World!" to the console.
The LLM returns a simple Python function called `print_hello_world()` that prints the traditional “Hello, World!” greeting.
Then make a second LLM request with the previous chat history, asking for the initial output to be improved.
Input:
...previous chat history...

Given the preceding function, can you improve it again in other ways?
Perhaps with default values or type hints?
也许默认值或类型提示?
Output:
We can add type hints to make it clear that the function takes no arguments
and returns nothing. We can also add a docstring to describe what the
function does.

Here is the improved code:

def print_hello_world() -> None:
    """
    Prints "Hello, World!" to standard output.
    """
    print("Hello, World!")

If you want to make the function more flexible, you can add a default
message argument that allows you to print different messages:

def print_message(message: str = "Hello, World!") -> None:
    """
    Prints the given message to standard output. If no message is
    provided, defaults to "Hello, World!".

    Args:
        message (str, optional): The message to print.
            Defaults to "Hello, World!".
    """
    print(message)
Consequently GPT-4 returns an upgraded version of the function, now furnished with Python type hints and a default argument. This enriches the function with greater flexibility, allowing it to print not just `"Hello, World!"` but any user-specified message.
These prompt-response exchanges illustrate how you can easily refine generated LLM outputs until you’re satisfied with the final output.
###### Note
It’s possible to critique an LLM’s response multiple times, until no further refinement is provided by the LLM.
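A simple way to automate this is to loop until the critique stops changing the output. In this sketch, `critique` is a stub standing in for the real LLM call ("can you improve this further?"), so the termination logic is what's on display:

```python
def critique(code: str) -> str:
    # Stub for an LLM critique call. It maps each version to an improved
    # one; "v3" returns itself, simulating "no further refinement".
    improvements = {
        "v1": "v2",  # e.g., add type hints
        "v2": "v3",  # e.g., add a docstring
        "v3": "v3",  # nothing left to improve
    }
    return improvements[code]

def refine_until_stable(code: str, max_rounds: int = 5) -> str:
    """Repeatedly ask for improvements until the output stabilizes."""
    for _ in range(max_rounds):
        improved = critique(code)
        if improved == code:  # the model had nothing left to improve
            break
        code = improved
    return code

print(refine_until_stable("v1"))  # v3
```

The `max_rounds` cap matters in practice: without it, a model that keeps making small stylistic tweaks would loop (and bill) indefinitely.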
# Classification with LLMs
Classifying, in the context of AI, refers to the process of predicting the class or category of a given data point or sample. It’s a common task in machine learning where models are trained to assign predefined labels to unlabeled data based on learned patterns.
LLMs are powerful assets when it comes to classification, even with zero or only a small number of examples provided within a prompt. Why? That’s because LLMs, like GPT-4, have been previously trained on an extensive dataset and now possess a degree of reasoning.
There are two overarching strategies in solving classification problems with LLMs: *zero-shot learning* and *few-shot learning*.
Zero-shot learning
In this process, the LLM classifies data with exceptional accuracy, without the aid of any prior specific examples. It’s akin to acing a project without any preparation—impressive, right?
Few-shot learning
Here, you provide your LLM with a small number of examples. This strategy can significantly influence the structure of your output format and enhance the overall classification accuracy.
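A few-shot prompt is just a template assembled from labeled examples; with an empty example list it degrades to a zero-shot prompt. Below is a minimal sketch of such an assembler, using the Compliment/Complaint/Neutral labels from the classification example later in this chapter:

```python
def build_few_shot_prompt(examples, text):
    """Assemble a few-shot classification prompt from labeled examples.
    With an empty examples list this degrades to a zero-shot prompt."""
    lines = ['Classify the statement as "Compliment", "Complaint", or "Neutral":']
    for sample, label in examples:
        lines.append(f'- "{sample}" - {label}')
    lines.append(f'"""{text}"""')
    lines.append("Classification:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    [("The sun is shining.", "Neutral"),
     ("Your support team is fantastic!", "Compliment")],
    "The user interface is intuitive.")
print(prompt)
```

Keeping prompt assembly in a function like this makes it easy to experiment with how many examples (shots) give the best accuracy for your task.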
Why is this groundbreaking for you?
Leveraging LLMs lets you sidestep the lengthy pipelines that traditional machine learning workflows demand. Therefore, you can quickly prototype a classification model, determine a baseline accuracy, and create immediate business value.
###### Warning
Although an LLM can perform classification, depending upon your problem and training data you might find that using a traditional machine learning process could yield better results.
# Building a Classification Model
Let’s explore a few-shot learning example to determine the sentiment of text into either `'Compliment'`, `'Complaint'`, or `'Neutral'`.
Given the statement, classify it as either "Compliment", "Complaint", or
"Neutral":

- "The sun is shining." - Neutral
- "Your support team is fantastic!" - Compliment
- "I had a terrible experience with your software." - Complaint

You must follow these principles:
- Only return a single classification word. The response should be either
"Compliment", "Complaint", or "Neutral".
- Perform the classification on the text within """ delimiters.

"""The user interface is intuitive."""

Classification:

Compliment
Several good use cases for LLM classification include:
Customer reviews
Classify user reviews into categories like “Positive,” “Negative,” or “Neutral.” Dive deeper by further identifying subthemes such as “Usability,” “Customer Support,” or “Price.”
Email filtering
Detect the intent or purpose of emails and classify them as “Inquiry,” “Complaint,” “Feedback,” or “Spam.” This can help businesses prioritize responses and manage communications efficiently.
Social media sentiment analysis
Monitor brand mentions and sentiment across social media platforms. Classify posts or comments as “Praise,” “Critic,” “Query,” or “Neutral.” Gain insights into public perception and adapt marketing or PR strategies accordingly.
News article categorization
Given the vast amount of news generated daily, LLMs can classify articles by themes or topics such as “Politics,” “Technology,” “Environment,” or “Entertainment.”
Résumé screening
For HR departments inundated with résumés, classify them based on predefined criteria like “Qualified,” “Overqualified,” “Underqualified,” or categorize by expertise areas such as “Software Development,” “Marketing,” or “Sales.”
###### Warning
Be aware that exposing emails, résumés, or sensitive data does run the risk of data being leaked into OpenAI’s future models as training data.
# Majority Vote for Classification
Utilizing multiple LLM requests can help in reducing the variance of your classification labels. This process, known as *majority vote*, is somewhat like choosing the most common fruit out of a bunch. For instance, if you have 10 pieces of fruit and 6 out of them are apples, then apples are the majority. The same principle goes for choosing the majority vote in classification labels.
By soliciting several classifications and taking the *most frequent classification*, you’re able to reduce the impact of potential outliers or unusual interpretations from a single model inference. However, do bear in mind that there can be significant downsides to this approach, including the increased time required and cost for multiple API calls.
Let’s classify the same piece of text three times, and then take the majority vote:
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

base_template = """
Given the statement, classify it as either "Compliment", "Complaint", or
"Neutral":
- "The sun is shining." - Neutral
- "Your support team is fantastic!" - Compliment
- "I had a terrible experience with your software." - Complaint

You must follow these principles:
- Only return a single classification word. The response should be either
"Compliment", "Complaint", or "Neutral".
- Perform the classification on the text within ''' delimiters.

'''{content}'''

Classification:
"""

responses = []

for i in range(0, 3):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system",
        "content": base_template.format(content='''It's raining outside,
but I'm having a great day, I just don't understand how people
live, I'm so sad!'''),}],)
    responses.append(response.choices[0].message.content.strip())

def most_frequent_classification(responses):
    # Use a dictionary to count occurrences of each classification
    count_dict = {}
    for classification in responses:
        count_dict[classification] = count_dict.get(classification, 0) + 1
    # Return the classification with the highest count
    return max(count_dict, key=count_dict.get)

print(most_frequent_classification(responses))  # Expected output: Neutral
Calling the `most_frequent_classification(responses)` function should pinpoint `'Neutral'` as the dominant sentiment. You’ve now learned how to use the OpenAI package for majority vote classification.
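The hand-rolled counting dictionary can also be replaced with the standard library: `collections.Counter` does the tallying, and `most_common` resolves ties in insertion order on CPython 3.7+. A minimal equivalent:

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label. Ties resolve in insertion order
    (first-seen wins) on CPython 3.7+."""
    return Counter(labels).most_common(1)[0][0]

print(majority_vote(["Neutral", "Complaint", "Neutral"]))  # Neutral
```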
# Criteria Evaluation
In Chapter 1, a human-based evaluation system was used with a simple thumbs-up/thumbs-down rating system to identify how often a response met our expectations. Rating manually can be expensive and tedious, requiring a qualified human to judge quality or identify errors. While this work can be outsourced to low-cost raters on services such as [Mechanical Turk](https://www.mturk.com), designing such a task in a way that gets valid results can itself be time-consuming and error prone. One increasingly common approach is to use a more sophisticated LLM to evaluate the responses of a smaller model.
The evidence is mixed on whether LLMs can act as effective evaluators, with some studies [claiming LLMs are human-level evaluators](https://oreil.ly/nfc3f) and others [identifying inconsistencies in how LLMs evaluate](https://oreil.ly/ykkzY). In our experience, GPT-4 is a useful evaluator with consistent results across a diverse set of tasks. In particular, GPT-4 is effective and reliable in evaluating the responses from smaller, less sophisticated models like GPT-3.5-turbo. In the example that follows, we generate concise and verbose examples of answers to a question using GPT-3.5-turbo, ready for rating with GPT-4.
Input:
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

responses = []
for i in range(10):
    # concise if even, verbose if odd
    style = "concise" if i % 2 == 0 else "verbose"

    if style == "concise":
        prompt = f"""Return a {style} answer to the
        following question: What is the meaning of life?"""
    else:
        prompt = f"""Return an answer to the
        following question: What is the meaning of life?"""

    response = client.chat.completions.create(
        # using GPT-3.5 Turbo for this example
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
        "content": prompt}])
    responses.append(
        response.choices[0].message.content.strip())

system_prompt = """You are assessing the conciseness of a
response from a chatbot.
Respond with a 1 if the response is concise,
and a 0 if it is not.
"""

ratings = []
for idx, response in enumerate(responses):
    rating = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system",
        "content": system_prompt},
        {"role": "system",
        "content": response}])
    ratings.append(
        rating.choices[0].message.content.strip())

for idx, rating in enumerate(ratings):
    style = "concise" if idx % 2 == 0 else "verbose"
    print(f"Style: {style}, ", f"Rating: {rating}")
Output:
Style: concise, Rating: 1
Style: verbose, Rating: 0
Style: concise, Rating: 1
Style: verbose, Rating: 0
Style: concise, Rating: 1
Style: verbose, Rating: 0
Style: concise, Rating: 1
Style: verbose, Rating: 0
Style: concise, Rating: 1
Style: verbose, Rating: 0
This script is a Python program that interacts with the OpenAI API to generate and evaluate responses based on their conciseness. Here’s a step-by-step explanation:
1. `responses = []` creates an empty list named `responses` to store the responses generated by the OpenAI API.
2. The `for` loop runs 10 times, generating a response for each iteration.
3. Inside the loop, `style` is determined based on the current iteration number (`i`). It alternates between “concise” and “verbose” for even and odd iterations, respectively.
4. Depending on the `style`, a `prompt` string is formatted to ask, “What is the meaning of life?” in either a concise or verbose manner.
5. `response = client.chat.completions.create(...)` makes a request to the OpenAI API to generate a response based on the `prompt`. The model used here is specified as “gpt-3.5-turbo.”
6. The generated response is then stripped of any leading or trailing whitespace and added to the `responses` list.
7. `system_prompt = """You are assessing..."""` sets up a prompt used for evaluating the conciseness of the generated responses.
8. `ratings = []` initializes an empty list to store the conciseness ratings.
9. Another `for` loop iterates over each response in `responses`.
10. For each response, the script sends it along with the `system_prompt` to the OpenAI API, requesting a conciseness evaluation. This time, the model used is “gpt-4.”
11. The evaluation rating (either 1 for concise or 0 for not concise) is then stripped of whitespace and added to the `ratings` list.
12. The final `for` loop iterates over the `ratings` list. For each rating, it prints the `style` of the response (either “concise” or “verbose”) and its corresponding conciseness `rating`.
For simple ratings like conciseness, GPT-4 performs with near 100% accuracy; however, for more complex ratings, it’s important to spend some time evaluating the evaluator. For example, by setting test cases that contain an issue, as well as test cases that do not contain an issue, you can identify the accuracy of your evaluation metric. An evaluator can itself be evaluated by counting the number of false positives (when the LLM hallucinates an issue in a test case that is known not to contain an issue), as well as the number of false negatives (when the LLM misses an issue in a test case that is known to contain an issue). In our example we generated the concise and verbose examples, so we can easily check the rating accuracy, but in more complex examples you may need human evaluators to validate the ratings.
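The false positive/false negative bookkeeping described above can be sketched in a few lines. This is a minimal, illustrative helper (the labels and ratings below are stand-ins, not real API output): given the known ground-truth labels of your test cases and the ratings the LLM evaluator produced, it reports how accurate the evaluator itself is.

```python
# A minimal sketch of "evaluating the evaluator": compare an LLM
# evaluator's ratings against test cases with known labels.

def evaluator_accuracy(known_labels, llm_ratings):
    """Return (accuracy, false_positives, false_negatives).

    known_labels: 1 if the test case truly contains the issue, else 0.
    llm_ratings:  1 if the LLM evaluator flagged the issue, else 0.
    """
    false_positives = sum(
        1 for truth, rated in zip(known_labels, llm_ratings)
        if truth == 0 and rated == 1)  # hallucinated an issue
    false_negatives = sum(
        1 for truth, rated in zip(known_labels, llm_ratings)
        if truth == 1 and rated == 0)  # missed a real issue
    correct = len(known_labels) - false_positives - false_negatives
    return correct / len(known_labels), false_positives, false_negatives

# Five cases known to contain the issue, five known not to:
labels  = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
ratings = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]  # one miss, one invention
accuracy, fp, fn = evaluator_accuracy(labels, ratings)
print(f"accuracy={accuracy:.0%}, false positives={fp}, false negatives={fn}")
# accuracy=80%, false positives=1, false negatives=1
```

In practice you would replace the hardcoded `ratings` list with the evaluator's actual outputs over your labeled test set.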
# Evaluate Quality
Using GPT-4 to evaluate the responses of less sophisticated models is an emerging standard practice, but care must be taken that the results are reliable and consistent.
Compared to human-based evaluation, LLM-based or synthetic evaluation typically costs an order of magnitude less and completes in a few minutes rather than taking days or weeks. Even in important or sensitive cases where a final manual review by a human is necessary, rapid iteration and A/B testing of the prompt through synthetic reviews can save significant time and improve results considerably. However, the cost of running many tests at scale can add up, and the latency or rate limits of GPT-4 can be a blocker. If at all possible, a prompt engineer should first test using programmatic techniques that don’t require a call to an LLM, such as simply measuring the length of the response, which runs near instantly for close to zero cost.
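A length check of the kind just mentioned might look like the following sketch. The 150-character threshold is an arbitrary assumption for illustration; the point is that this check runs instantly, at zero cost, before any LLM call is made.

```python
# A zero-cost programmatic check that can run before any LLM-based
# evaluation: flag responses as "concise" purely by character length.

MAX_CONCISE_CHARS = 150  # illustrative threshold, tune for your task

def is_concise(response: str, limit: int = MAX_CONCISE_CHARS) -> int:
    """Return 1 if the response fits within the limit, else 0."""
    return 1 if len(response.strip()) <= limit else 0

responses = [
    "The meaning of life is subjective and varies from person to person.",
    "The meaning of life is a question that has occupied philosophers, "
    "theologians, and scientists for millennia, and any complete answer "
    "must weigh biology, culture, and individual purpose against ...",
]
for r in responses:
    print(is_concise(r))
# prints 1, then 0
```

Only the responses that pass (or fail) cheap checks like this need to be escalated to a GPT-4 evaluation, cutting both cost and latency.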
# Meta Prompting
*Meta prompting* is a technique that involves the creation of text prompts that, in turn, generate other text prompts. These text prompts are then used to generate new assets in many media, such as images, videos, and more text.
To better understand meta prompting, let's take the example of authoring a children's book with the assistance of GPT-4\. First, you direct the LLM to generate the text for your children's book. Afterward, you invoke meta prompting by instructing GPT-4 to produce prompts that are suitable for image-generation models. This could mean creating situational descriptions or specific scenes based on the storyline of your book, which can then be given to AI models like Midjourney or Stable Diffusion. These image-generation models can then deliver images in harmony with your AI-crafted children's story.
Figure 3-8 visually describes the process of meta prompting in the context of crafting a children’s book.
<https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0308.png>
###### Figure 3-8\. Utilizing an LLM to generate image prompts for Midjourney's image creation in the process of crafting a children's book
Meta prompts offer a multitude of benefits for a variety of applications:
Image generation from product descriptions
Meta prompts can be employed to derive an image generation prompt for image models like [Midjourney](https://www.midjourney.com), effectively creating a visual representation of product descriptions.
Generating style/feature prompts
Let’s consider you are a copywriter needing to develop a unique style guide prompt from a couple of blog posts. Given each client has a distinctive tone and style, it’s beneficial to utilize a *meta prompt* that encapsulates all the varied features, rather than producing a single prompt output.
Optimizing prompts to achieve specific goals
A common approach is to ask ChatGPT or another language model to refine or improve `Prompt A` in order to attain `Goal 1`, given `Prompt A` and `Goal 1`. This method helps identify any missing input features that, if added, could help the language model generate a more detailed and desirable response.
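The two-step flow from the children's-book example can be sketched as plain prompt assembly. The wording of these prompts is an illustrative assumption, not the book's exact text, and no API call is made here; in practice each string would be sent to the chat completions endpoint.

```python
# A minimal sketch of meta prompting: one prompt generates the story,
# and a meta prompt asks the LLM to turn each scene into an
# image-generation prompt for a model like Midjourney.

def story_prompt(topic: str) -> str:
    """Prompt for step 1: generate the story itself, split into scenes."""
    return f"Write a short children's story about {topic}, split into scenes."

def image_meta_prompt(story_text: str) -> str:
    """Meta prompt for step 2: generate image prompts from the story."""
    return (
        "You are writing prompts for an image-generation model.\n"
        "For each scene in the story below, return one vivid, "
        "self-contained image prompt describing the characters, "
        "setting, and mood.\n\n"
        f"Story:\n{story_text}"
    )

print(story_prompt("a brave otter"))
meta = image_meta_prompt("Scene 1: The otter leaves home...")
print(meta.splitlines()[0])
```

The output of the first call becomes the input of the second, which is the defining trait of a meta prompt: a prompt whose product is more prompts.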
Let’s experiment with creating a meta prompt for the copywriting example.
Input:
As a GPT prompt engineer, you are solely responsible for producing
highly effective large language model text prompts, given a task.
You must follow these principles:
- Only return a single prompt, never return the output of a prompt.
- You must complete the task properly.
- You must either refine a prompt or create a new prompt, as
appropriate for the task.
- List any important features so that another prompt engineer
can easily copy inputs into the prompt.

Task: You must create a writing style guide for ChatGPT from multiple
blog posts. Include the relevant features in a ChatGPT prompt that
will allow ChatGPT to mimic the author. You must decide what the most
important features are for mimicking the author's style, etc. You will
apply this style only to new blog posts.

Blog post text: You need more than technical
skills to succeed in the tech industry. Soft skills help
software professionals collaborate with their peers…
Output:
Prompt:
Based on the extensive text provided, your task is to create a series
of blog posts following the same style and structure as the author.
As a guide, you should follow these features:
- Clear, direct language: Your writing should use clear and direct
language, breaking complex topics down into easily understood
concepts.
- Use of subheadings: To give your blog posts structure, divide them
with relevant subheadings. This helps guide the reader and makes your
text easier to read.
…
- Inclusion of technical terms: When discussing technical topics,
use appropriate terminology to ensure accuracy and credibility.
However, be sure to provide explanations or definitions for more
complex terms.
As previously mentioned, it can be beneficial to *critique the model’s output* and to ask for a refined prompt either via the interface or API.
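A critique-and-refine request via the API can be sketched as follows. This is a minimal illustration; only the message list is assembled here, since the actual call requires an `OPENAI_API_KEY`, and the example prompt and critique strings are assumptions.

```python
# A minimal sketch of critiquing a prompt and requesting a refinement:
# the previous prompt and a critique are fed back to the model together.

def refine_request(previous_prompt: str, critique: str) -> list:
    """Build the message list for a prompt-refinement API call."""
    return [
        {"role": "system",
         "content": "You improve prompts. Return only the revised prompt."},
        {"role": "user",
         "content": f"Prompt:\n{previous_prompt}\n\n"
                    f"Critique:\n{critique}\n"
                    "Rewrite the prompt to address the critique."},
    ]

messages = refine_request(
    "Write blog posts in the author's style.",
    "Too vague: it names no concrete stylistic features to imitate.")
# response = client.chat.completions.create(model="gpt-4",
#                                           messages=messages)
print(messages[1]["content"].splitlines()[0])
```

Looping this a few times, using each refined prompt as the next `previous_prompt`, is a cheap way to iterate before a human reviews the final version.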
ChatGPT can perform web searches and can make use of the content of web pages, along with the previously generated copywriting style guide. Let’s provide two URLs that ChatGPT will read to enhance the output.
Input:
Based on the following style guide and these two blog posts, I want
you to create a series of improved writing style prompts so that an
LLM or a copywriter can write effectively for my client.

…previously generated writing style guide…

URL 1: https://understandingdata.com/posts/how-to-clean-and-process-data
URL 2: https://understandingdata.com/posts/how-to-scrape-twitter-data
Figure 3-9 shows the output.
<https://github.com/OpenDocCN/ibooker-dl-zh/raw/master/docs/pmt-engi-genai/img/pega_0309.png>
###### Figure 3-9\. ChatGPT refining a meta prompt by two URL web pages
Based on the style guide and the two blog posts, here is an improved
writing style prompt:

When creating blog posts for the client, follow these guidelines:
- Clear, direct language: Your writing should be clear and direct.
Present information in an easily understood way. The goal is for your
audience to grasp the concepts without prior knowledge.
…
- Use of technical terms: Use appropriate technical language when
discussing complex topics. However, make sure to provide definitions
or explanations of these terms to keep the text accessible.

Mimic the writing style of these blog posts:

"What are APIs and how do they relate to data engineering?": This blog
post uses clear, direct language to explain the concept of APIs. It
provides detailed explanations and practical examples of terms
commonly used with APIs. It connects the technical concept of APIs to
business growth and improved productivity. The information is well
organized, using subheadings such as 'Types of APIs.'

"How to easily set up a Google Cloud project with APIs": This blog
post is a practical guide to setting up a Google Cloud project. It is
broken down into numbered steps, each with a clear subheading. The
article uses detailed explanations and includes practical examples
such as screenshots. It also uses clear, direct language to guide the
reader through the process.
Meta prompts offer a dynamic and innovative way to harness the power of generative AI models, enabling the creation of complex, multifaceted prompts as well as prompts that generate other prompts. This broadens the range of applications, from text and image generation to style and feature prompts, and optimization toward specific goals. As you continue to refine and explore the potential of meta prompting, it promises to transform how you use, interact with, and benefit from LLMs.
# Summary

After reading this chapter, you now realize how important it is to give clear instructions and examples to generate the desired output. In addition, you gained hands-on experience extracting structured data from hierarchical lists using regular expressions in Python, and you learned how to leverage nested data structures such as JSON and YAML to generate robust, parsable output.

You have also learned several best practices and effective prompt engineering techniques, including the famous "explain it like I'm five," role prompting, and meta prompting. In the next chapter, you will learn how to use LangChain, a popular LLM package that will help you create more advanced prompt engineering workflows.