
diogofouto/dialogsum-augmented

Hugging Face · Updated 2023-12-18 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/diogofouto/dialogsum-augmented
Resource description:
---
license: apache-2.0
---

# DialogSum Enhanced Dataset

## Overview

DialogSum Enhanced is an extension of the original DialogSum dataset, enriched with a new column called 'Relevant Sentences.' This dataset is designed to facilitate research in dialogue summarization by providing additional information about the dialogue turns that GPT-4 considers relevant for generating summaries.

### Changes from DialogSum

The primary enhancement in DialogSum Enhanced is the inclusion of the 'Relevant Sentences' column. This column contains the dialogue turns that GPT-4 identified as crucial for the generation of a summary. This information can be valuable for understanding the model's decision-making process and improving dialogue summarization models.

### Split Information

- **Train Split:** The train split in DialogSum Enhanced consists of half of the original DialogSum train split.
- **Test and Validation Sets:** The test and validation sets in DialogSum Enhanced retain their full length from the original DialogSum dataset.

## Dataset Structure

The dataset is provided in a CSV format with the following columns:

1. **id:** Unique identifier for each dialogue.
2. **dialogue:** The sequential turns of the dialogue.
3. **relevant_sentences:** The dialogue turns that GPT-4 considered relevant for generating the summary.
4. **summary:** The reference summary for the dialogue.

## Usage

Researchers and practitioners interested in dialogue summarization can leverage DialogSum Enhanced for training, validating, and testing their models. The 'Relevant Sentences' column provides additional insights into the model's decision-making process during summarization.
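As a rough illustration of the CSV layout described under Dataset Structure, the sketch below reads rows with Python's standard `csv` module and checks that the four documented columns are present. The sample row is invented for illustration and is not taken from the dataset; `load_rows`, `EXPECTED_COLUMNS`, and `SAMPLE_CSV` are hypothetical names, and the card does not state the actual file names, so an in-memory string stands in for a real file.

```python
import csv
import io

# Column names as listed in the dataset card.
EXPECTED_COLUMNS = ["id", "dialogue", "relevant_sentences", "summary"]

# Hypothetical sample row, invented for illustration (not real dataset content).
# The dialogue field is quoted because it contains a newline and a comma.
SAMPLE_CSV = (
    "id,dialogue,relevant_sentences,summary\n"
    'demo_0,"#Person1#: Can we meet at noon?\n#Person2#: Sure, see you then.",'
    '"#Person1#: Can we meet at noon?",'
    '"#Person1# and #Person2# agree to meet at noon."\n'
)


def load_rows(handle):
    """Read CSV rows as dicts, failing early if a documented column is missing."""
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in fieldnames]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["id"], "|", row["summary"])
```

For a real file, the same function could be called on `open(path, newline="", encoding="utf-8")`, where `path` is whichever CSV file was downloaded.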
Provider:
diogofouto
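Summary of the original information

DialogSum Enhanced Dataset

Overview

DialogSum Enhanced extends the original DialogSum dataset with a new column named Relevant Sentences. The dataset aims to support dialogue summarization research by providing the dialogue turns that GPT-4 considers crucial for generating a summary.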

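Changes from DialogSum

The main enhancement in DialogSum Enhanced is the Relevant Sentences column. It contains the dialogue turns that GPT-4 identified as key to generating the summary. This information can help in understanding the model's decision-making process and in improving dialogue summarization models.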

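Split information

  • Train split: The DialogSum Enhanced train split contains half of the original DialogSum train split.

  • Test and validation sets: The DialogSum Enhanced test and validation sets keep their full length from the original DialogSum dataset.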

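Dataset structure

The dataset is provided in CSV format with the following columns:

  1. id: Unique identifier for each dialogue.
  2. dialogue: The sequential turns of the dialogue.
  3. relevant_sentences: The dialogue turns that GPT-4 considered crucial for generating the summary.
  4. summary: The reference summary for the dialogue.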

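Usage

Researchers and practitioners interested in dialogue summarization can use DialogSum Enhanced to train, validate, and test their models. The Relevant Sentences column offers additional insight into the model's decision-making process during summarization.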
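One way the relevant_sentences column might be used is to turn it into a per-turn relevance mask over the dialogue, for example as a supervision signal or to measure how much of a dialogue was marked relevant. The sketch below is a minimal illustration under two assumptions that the card does not confirm: each dialogue turn sits on its own line, and relevant_sentences holds exact copies of the selected turns, also one per line. The function names and the sample text are hypothetical.

```python
def split_turns(text):
    """Split a text field into non-empty, stripped lines (assumed: one turn per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def relevant_turn_mask(dialogue, relevant_sentences):
    """Return one boolean per dialogue turn: True if the turn was marked relevant.

    Assumes relevant turns are exact copies of dialogue turns.
    """
    relevant = set(split_turns(relevant_sentences))
    return [turn in relevant for turn in split_turns(dialogue)]


def relevant_fraction(dialogue, relevant_sentences):
    """Fraction of dialogue turns marked relevant (0.0 for an empty dialogue)."""
    mask = relevant_turn_mask(dialogue, relevant_sentences)
    return sum(mask) / len(mask) if mask else 0.0


# Invented example, not real dataset content.
sample_dialogue = (
    "#Person1#: Can we meet at noon?\n"
    "#Person2#: Sure, where?\n"
    "#Person1#: At the cafe.\n"
    "#Person2#: Great, see you there."
)
sample_relevant = "#Person1#: Can we meet at noon?\n#Person1#: At the cafe."

mask = relevant_turn_mask(sample_dialogue, sample_relevant)
fraction = relevant_fraction(sample_dialogue, sample_relevant)
print(mask, fraction)
```

If the stored turns are paraphrased or delimited differently, the exact-match lookup would need to be replaced with a looser comparison.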
