Skip to content

Summarizing

LLM 在文本概括上展示了强大的水准,下面将使用 Prompt Engineering 的方式,调用 API 接口,实现文本概括的功能

需要注意的是,LLM 输出概括内容的侧重性,是和 Prompt 有关的,因此在实际设计的过程中,应当自己思考自己实际需要使用的场景,来调整 Prompt

单一文本概括 Prompt

输入文本:

python
prod_review = """
Got this panda plush toy for my daughter's birthday, \
who loves it and takes it everywhere. It's soft and \ 
super cute, and its face has a friendly look. It's \ 
a bit small for what I paid though. I think there \ 
might be other options that are bigger for the \ 
same price. It arrived a day earlier than expected, \ 
so I got to play with it myself before I gave it \ 
to her.
"""

限制输出文本长度

在这里我们在 Prompt 中限制输出的文本长度不多于 30 个

python
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site. 

Summarize the review below, delimited by triple 
backticks, in at most 30 words. 

Review: ```{prod_review}```"""

response = get_completion(prompt)
print(response)

关键角度侧重

有时,针对不同的业务,我们对文本的侧重会有所不同。例如对于商品评论文本,物流会更关心运输时效,商家更加关心价格与商品质量,平台更关心整体服务体验

我们可以通过增加 Prompt 提示,来体现对于某个特定角度的侧重

侧重于运输

python
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
Shipping deparmtment. 

Summarize the review below, delimited by triple 
backticks, in at most 30 words, and focusing on any aspects \
that mention shipping and delivery of the product. 

Review: ```{prod_review}```"""

response = get_completion(prompt)
print(response)
text
The panda plush arrived a day early, which was great, but the size was smaller than expected for the price. No issues with shipping or delivery. (30 words)

可以看到,强调了 arrived a day early,说明 LLM 是从运输的角度去总结

侧重于价格与质量

python
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
pricing deparmtment, responsible for determining the \
price of the product.  

Summarize the review below, delimited by triple 
backticks, in at most 30 words, and focusing on any aspects \
that are relevant to the price and perceived value. 

Review: ```{prod_review}```"""

response = get_completion(prompt)
print(response)
text
"Panda plush is soft, cute, and loved, but perceived as overpriced for its size—suggests competitors offer larger options for the same price. Fast delivery noted." (30 words)

这里还是存在问题,LLM 输出的不稳定,上面两次其实对 Prompt 的改动并不大,但是最后的结果差距还是有点大,需要优化 Prompt,以获得更加稳定的输出

关键信息提取

虽然我们通过添加关键角度侧重的 Prompt,使得文本摘要更侧重于某一特定方面,但是可以发现,结果中也会保留一些其他信息,如价格与质量角度的概括中仍保留了“快递提前到货”的信息。有时这些信息是有帮助的,但如果我们只想要提取某一角度的信息,并过滤掉其他所有信息,则可以要求 LLM 进行“文本提取 (Extract)”而非“文本概括 (Summarize)”

python
prompt = f"""
Your task is to extract relevant information from \ 
a product review from an ecommerce site to give \
feedback to the Shipping department. 

From the review below, delimited by triple quotes \
extract the information relevant to shipping and \ 
delivery. Limit to 30 words. 

Review: ```{prod_review}```"""

response = get_completion(prompt)
print(response)

第一次输出的结果很奇怪,里面有一些富含感情色彩的输出,但这并不是我想要的

text
The product arrived a day earlier than expected, which was great. (15 words)

之后修改了一下 Prompt,让它不要做任何情绪化的输出,只要陈述事实即可

python
prompt = f"""
Your task is to extract relevant information from \
a product review from an ecommerce site to give \
feedback to the Shipping department. 

From the review below, delimited by triple quotes \
extract the information relevant to shipping and \
delivery. Limit to 30 words.
No more emotions, just the facts.

Review: ```{prod_review}```"""

response = get_completion(prompt)
print(response)
text
Arrived a day earlier than expected. (10 words)

多条文本概括 Prompt 实验

在实际的工作流中,我们往往有许许多多的评论文本,以下展示了一个基于 for 循环调用“文本概括”工具并依次打印的示例。当然,在实际生产中,对于上百万甚至上千万的评论文本,使用 for 循环也是不现实的,可能需要考虑整合评论、分布式等方法提升运算效率

把文本丢到一个 list 里面,然后循环,遍历

python
for i in range(len(reviews)):
    prompt = f"""
    Your task is to generate a short summary of a product \ 
    review from an ecommerce site. 

    Summarize the review below, delimited by triple \
    backticks in at most 20 words. 

    Review: ```{reviews[i]}```"""
    response = get_completion(prompt)
    print(i, response, "\n")
text
0 **Summary:** Cute, soft panda plush loved by daughter, but a bit small for the price. Arrived early. (20 words) 

1 **Summary:** Affordable lamp with storage, fast delivery. Excellent customer service for broken/missing parts. Highly recommended. (20 words) 

2 **Summary:** Recommended electric toothbrush with great battery life, but small head. Good value at $50; generic heads save money. Leaves teeth feeling clean. (20 words) 

3 **Summary:** The product was on sale in November but prices rose in December. Quality has declined, motor failed after a year, and warranty was expired. (20 words)

最后更新于:

Released under the MIT License.