diogofouto/dialogsum-augmented
Source: Hugging Face. Updated: 2023-12-18. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/diogofouto/dialogsum-augmented
Resource description:
---
license: apache-2.0
---
# DialogSum Enhanced Dataset
## Overview
DialogSum Enhanced is an extension of the original DialogSum dataset, enriched with a new column, `relevant_sentences`. The dataset is designed to facilitate research in dialogue summarization by recording which dialogue turns GPT-4 considers relevant for generating a summary.
### Changes from DialogSum
The primary enhancement in DialogSum Enhanced is the `relevant_sentences` column. It contains the dialogue turns that GPT-4 identified as crucial for generating a summary. This information can help in understanding which parts of a dialogue GPT-4 relies on and in improving dialogue summarization models.
### Split Information
- **Train Split:** The train split in DialogSum Enhanced consists of half of the original DialogSum train split.
- **Test and Validation Sets:** The test and validation sets in DialogSum Enhanced retain their full length from the original DialogSum dataset.
## Dataset Structure
The dataset is provided in CSV format with the following columns:
1. **id:** Unique identifier for each dialogue.
2. **dialogue:** The sequential turns of the dialogue.
3. **relevant_sentences:** The dialogue turns that GPT-4 considered relevant for generating the summary.
4. **summary:** The reference summary for the dialogue.
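As a minimal sketch of the column layout above, the following parses CSV text and checks that the four documented columns are present. The sample row is hypothetical and invented for illustration. The `#Person1#:` speaker tags follow the original DialogSum convention, while the newline-separated layout of `relevant_sentences` is an assumption, since this card does not specify how the turns are delimited.

```python
import csv
import io

# Column names as documented in this dataset card.
EXPECTED_COLUMNS = ["id", "dialogue", "relevant_sentences", "summary"]

# Hypothetical sample row, invented for illustration (not taken from the dataset).
# Assumption: relevant_sentences holds newline-separated turns.
SAMPLE_CSV = '''id,dialogue,relevant_sentences,summary
demo_0,"#Person1#: Are you free on Friday?
#Person2#: Yes, after lunch.
#Person1#: Great, let's meet at two.","#Person2#: Yes, after lunch.
#Person1#: Great, let's meet at two.",#Person1# and #Person2# agree to meet on Friday at two.
'''


def read_rows(csv_text):
    """Parse CSV text and verify that the documented columns are present."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


rows = read_rows(SAMPLE_CSV)
# Print the id and the number of turns in the first dialogue.
print(rows[0]["id"], len(rows[0]["dialogue"].splitlines()))
```

To read a downloaded file instead, the same `csv.DictReader` can wrap `open(path, newline="", encoding="utf-8")`, where `path` is whichever CSV file you fetched from the repository.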
## Usage
Researchers and practitioners interested in dialogue summarization can use DialogSum Enhanced to train, validate, and test their models. The `relevant_sentences` column shows which dialogue turns GPT-4 treated as relevant when summarizing, which can serve as an additional supervision or analysis signal.
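One possible way to use the relevance column is to turn it into per-turn labels, for example as an auxiliary signal for an extract-then-summarize model. The sketch below is not the dataset author's method. It assumes that each relevant turn is copied verbatim from `dialogue` and that turns are separated by newlines, neither of which is confirmed by this card. The row and the helper names `split_turns` and `relevance_labels` are hypothetical.

```python
def split_turns(text):
    """Split a dialogue string into non-empty, stripped turns (one per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def relevance_labels(dialogue, relevant_sentences):
    """Return one 0/1 label per dialogue turn.

    A turn is labeled 1 if the same text appears in relevant_sentences.
    Assumption: relevant turns are verbatim copies of dialogue turns.
    """
    relevant = set(split_turns(relevant_sentences))
    return [1 if turn in relevant else 0 for turn in split_turns(dialogue)]


# Hypothetical row, invented for illustration (not taken from the dataset).
row = {
    "dialogue": (
        "#Person1#: Are you free on Friday?\n"
        "#Person2#: Yes, after lunch.\n"
        "#Person1#: Great, let's meet at two."
    ),
    "relevant_sentences": (
        "#Person2#: Yes, after lunch.\n"
        "#Person1#: Great, let's meet at two."
    ),
}

labels = relevance_labels(row["dialogue"], row["relevant_sentences"])
# Fraction of turns marked relevant in this dialogue.
coverage = sum(labels) / len(labels)
print(labels, round(coverage, 2))
```

If the real data paraphrases or truncates turns, exact matching will under-count, and a fuzzy match would be needed instead.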
Provided by:
diogofouto



