vigneshbhatn/FinePersonas-Conversations-Email-Summaries
收藏Hugging Face2026-02-24 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/vigneshbhatn/FinePersonas-Conversations-Email-Summaries
下载链接
链接失效反馈官方服务:
资源简介:
---
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
size_categories:
- 100K<n<1M
dataset_info:
features:
- name: conversation_id
dtype: int64
- name: email
dtype: string
- name: maximum_brevity_summary
dtype: string
- name: summary
dtype: string
- name: distilabel_metadata
struct:
- name: raw_input_email_summarization_0
list:
- name: content
dtype: string
- name: role
dtype: string
- name: raw_output_email_summarization_0
dtype: string
- name: model_name
dtype: string
splits:
- name: train
num_bytes: 1398776307
num_examples: 363584
download_size: 437072861
dataset_size: 1398776307
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
tags:
- synthetic
- distilabel
- rlaif
- email
- summaries
task_categories:
- summarization
language:
- en
pretty_name: FinePersonas Conversations Email Summaries
---
<p align="left">
<a href="https://github.com/argilla-io/distilabel">
<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
</a>
</p>
# FinePersonas Conversations Email Summaries
This dataset contains summaries of emails from the dataset [argilla/FinePersonas-Synthetic-Email-Conversations](https://huggingface.co/datasets/argilla/FinePersonas-Synthetic-Email-Conversations). This dataset exploded the conversations from the original dataset and generated for each email a maximum brevity summary (one sentence) and a more extended summary (at most 3 sentences) using [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct).
The pipeline script was uploaded to easily reproduce the dataset:
[email_summaries.py](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries/raw/main/email_summaries.py).
It can be run directly using the CLI:
```console
distilabel pipeline run --script "https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries/raw/main/email_summaries.py"
```
## Dataset Summary
This dataset contains a `pipeline.yaml` which can be used to reproduce the pipeline that generated it in distilabel using the `distilabel` CLI:
```console
distilabel pipeline run --config "https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries/raw/main/pipeline.yaml"
```
or explore the configuration:
```console
distilabel pipeline info --config "https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries/raw/main/pipeline.yaml"
```
## Dataset structure
The examples have the following structure per configuration:
<details><summary> Configuration: default </summary><hr>
```json
{
"conversation_id": 0,
"distilabel_metadata": {
"raw_input_email_summarization_0": [
{
"content": "You are an AI assistant designed to summarize emails for the recipient of the email. Your task is to create concise, objective summaries that capture the essential information communicated by the sender, from the recipient\u0027s perspective but without directly addressing or mentioning the recipient.\n\n## Key points\n\n1. Do not use phrases like \"you\" or \"the recipient\" in the summary.\n2. Do not use the recipient name.\n3. Do not use the third person.\n4. Focus on the sender\u0027s actions and intentions.\n5. Summarize as if describing the email to a third party.\n\nFor example, instead of \"Alex is reaching out to you to collaborate\" or \"Alex is reaching out Samantha to collaborate\", use \"Alex is reaching out to collaborate\".\n\n## Output Requirements\n\nProvide two types of summaries:\n1. A maximum brevity summary: extract the main key point of the conversation and present it in one very very short sentence. Include details such as dates, cities, venues, etc if required.\n2. A more detailed summary (up to three sentences).\n\n## Output Format:\n\n```markdown\n## Maximum brevity summary\n\n[One-sentence summary here]\n\n## Summary\n\n[Up to three-sentence summary here]\n```\n",
"role": "system"
},
{
"content": "Subject: Virtual Field Trip Collaboration\n\nHey Michael,\n\nI hope you\u0027re doing well! I\u0027ve been thinking a lot about the ideas we discussed at the conference and I think I have an idea for our collaboration project. What if we created a virtual field trip that takes students on a journey through different cities around the world, exploring the impacts of climate change on urban environments? \n\nI think this could be a great way to combine our expertise in environmental and urban geography and create an engaging learning experience for our students. Let me know what you think!\n\nBest,\nSarah",
"role": "user"
}
],
"raw_output_email_summarization_0": "## Maximum brevity summary\n\nSarah proposes a virtual field trip project exploring climate change impacts in urban environments.\n\n## Summary\n\nSarah suggests collaborating on a virtual field trip project that explores the impacts of climate change in urban environments across different cities. She believes this could combine expertise in environmental and urban geography to create an engaging learning experience for students. Sarah is seeking feedback on the idea."
},
"email": "Subject: Virtual Field Trip Collaboration\n\nHey Michael,\n\nI hope you\u0027re doing well! I\u0027ve been thinking a lot about the ideas we discussed at the conference and I think I have an idea for our collaboration project. What if we created a virtual field trip that takes students on a journey through different cities around the world, exploring the impacts of climate change on urban environments? \n\nI think this could be a great way to combine our expertise in environmental and urban geography and create an engaging learning experience for our students. Let me know what you think!\n\nBest,\nSarah",
"maximum_brevity_summary": "Sarah proposes a virtual field trip project exploring climate change impacts in urban environments.",
"model_name": "Qwen/Qwen2.5-72B-Instruct",
"summary": "Sarah suggests collaborating on a virtual field trip project that explores the impacts of climate change in urban environments across different cities. She believes this could combine expertise in environmental and urban geography to create an engaging learning experience for students. Sarah is seeking feedback on the idea."
}
```
This subset can be loaded as:
```python
from datasets import load_dataset
ds = load_dataset("argilla/FinePersonas-Conversations-Email-Summaries", "default")
```
Or simply as it follows, since there's only one configuration and is named `default`:
```python
from datasets import load_dataset
ds = load_dataset("argilla/FinePersonas-Conversations-Email-Summaries")
```
</details>
license: 其他
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
size_categories:
- 10万<n<100万
dataset_info:
features:
- name: conversation_id
dtype: int64
- name: email
dtype: string
- name: maximum_brevity_summary
dtype: string
- name: summary
dtype: string
- name: distilabel_metadata
struct:
- name: raw_input_email_summarization_0
list:
- name: content
dtype: string
- name: role
dtype: string
- name: raw_output_email_summarization_0
dtype: string
- name: model_name
dtype: string
splits:
- name: train
num_bytes: 1398776307
num_examples: 363584
download_size: 437072861
dataset_size: 1398776307
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
tags:
- 合成数据
- distilabel
- 基于AI反馈的强化学习(RLAIF)
- 电子邮件
- 摘要
task_categories:
- 文本摘要
language:
- en
pretty_name: FinePersonas 对话电子邮件摘要
<p align="left">
<a href="https://github.com/argilla-io/distilabel">
<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="基于Distilabel构建" width="200" height="32"/>
</a>
</p>
# FinePersonas 对话电子邮件摘要
本数据集源自数据集[argilla/FinePersonas-Synthetic-Email-Conversations](https://huggingface.co/datasets/argilla/FinePersonas-Synthetic-Email-Conversations)中的电子邮件对话内容。我们对原始数据集的对话进行了扩展,并使用[Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)为每封电子邮件生成了极简摘要(单句形式)与扩展摘要(最多包含三句)。
本数据集附带了可复现数据集生成流程的流水线脚本:[email_summaries.py](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries/raw/main/email_summaries.py)。可通过命令行界面直接运行该脚本:
console
distilabel pipeline run --script "https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries/raw/main/email_summaries.py"
## 数据集概述
本数据集包含一个`pipeline.yaml`配置文件,可通过distilabel的命令行界面复现生成本数据集的完整流水线:
console
distilabel pipeline run --config "https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries/raw/main/pipeline.yaml"
或查看该配置的详细信息:
console
distilabel pipeline info --config "https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries/raw/main/pipeline.yaml"
## 数据集结构
各配置对应的样本结构如下:
<details><summary> 配置:default </summary><hr>
json
{
"conversation_id": 0,
"distilabel_metadata": {
"raw_input_email_summarization_0": [
{
"content": "您是一款专为电子邮件收件人设计的AI助手,任务是从收件人视角出发,生成简洁客观的摘要,精准捕捉发送方传达的核心信息,且无需直接提及或称呼收件人。
## 核心要求
1. 摘要中不得使用“您”或“收件人”类表述
2. 不得出现收件人姓名
3. 不得使用第三人称视角
4. 聚焦发送方的行为与意图
5. 摘要需以向第三方描述邮件的口吻撰写
示例:将“亚历克斯联系您寻求合作”或“亚历克斯联系萨曼莎寻求合作”改写为“亚历克斯提出合作邀约”。
## 输出要求
需生成两类摘要:
1. 极简摘要:提取对话核心要点,以一句极简短的语句呈现,如需可包含日期、城市、地点等细节信息
2. 详细摘要(最多三句)
## 输出格式:
markdown
## 极简摘要
[单句摘要内容]
## 摘要
[最多三句的摘要内容]
",
"role": "system"
},
{
"content": "主题:虚拟实地考察合作邀约
您好迈克尔,
希望您一切安好!我一直在回顾我们在会议上讨论的想法,目前有一个合作项目的构思:我们可以打造一场虚拟实地考察之旅,带领学生环游全球不同城市,探索气候变化对城市环境的影响。
我认为这可以很好地结合我们在环境与城市地理学领域的专业优势,为学生打造一段沉浸式的学习体验。期待您的反馈!
顺颂时祺,
萨拉",
"role": "user"
}
],
"raw_output_email_summarization_0": "## 极简摘要
萨拉提出了一项探索城市环境气候变化影响的虚拟实地考察合作项目。
## 摘要
萨拉提议合作开展一项虚拟实地考察项目,带领学生环游全球不同城市,探索气候变化对城市环境的影响。她认为该项目可结合双方在环境与城市地理学领域的专业优势,为学生打造沉浸式学习体验,并期待获取您对该构想的反馈。"
},
"email": "主题:虚拟实地考察合作邀约
您好迈克尔,
希望您一切安好!我一直在回顾我们在会议上讨论的想法,目前有一个合作项目的构思:我们可以打造一场虚拟实地考察之旅,带领学生环游全球不同城市,探索气候变化对城市环境的影响。
我认为这可以很好地结合我们在环境与城市地理学领域的专业优势,为学生打造一段沉浸式的学习体验。期待您的反馈!
顺颂时祺,
萨拉",
"maximum_brevity_summary": "萨拉提出了一项探索城市环境气候变化影响的虚拟实地考察合作项目。",
"model_name": "Qwen/Qwen2.5-72B-Instruct",
"summary": "萨拉提议合作开展一项虚拟实地考察项目,带领学生环游全球不同城市,探索气候变化对城市环境的影响。她认为该项目可结合双方在环境与城市地理学领域的专业优势,为学生打造沉浸式学习体验,并期待获取您对该构想的反馈。"
}
该子集可通过以下方式加载:
python
from datasets import load_dataset
ds = load_dataset("argilla/FinePersonas-Conversations-Email-Summaries", "default")
由于本数据集仅包含一个名为`default`的配置,也可通过更简洁的方式加载:
python
from datasets import load_dataset
ds = load_dataset("argilla/FinePersonas-Conversations-Email-Summaries")
</details>
提供机构:
vigneshbhatn



