biu-nlp/Controlled-Text-Reduction-dataset
Dataset Overview
Dataset Name
Controlled Text Reduction
Dataset Content
Document-summary pairs, together with the spans of each document that cover the summary content (highlight_spans).
Dataset Structure
doc_text: the input text.
summary_text: the output text.
highlight_spans: the spans of the input text from which the output text is derived.
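A minimal Python sketch of how highlight_spans relate to doc_text. It assumes that each span is a [start, end] pair of character offsets into doc_text with an exclusive end (Python slice semantics). The card does not state the offset convention. The helper names extract_highlights and mark_highlights and the <h>/</h> marker strings are hypothetical and for illustration only.

```python
from typing import List, Sequence

# Assumption: a span is [start, end] with character offsets into doc_text,
# where end is exclusive (Python slice semantics). Not confirmed by the card.
Span = Sequence[int]


def extract_highlights(doc_text: str, spans: List[Span]) -> List[str]:
    """Return the substring of doc_text covered by each span, in the given order."""
    return [doc_text[start:end] for start, end in spans]


def mark_highlights(doc_text: str, spans: List[Span],
                    open_tag: str = "<h>", close_tag: str = "</h>") -> str:
    """Wrap each highlighted span in hypothetical marker tags.

    The default tags are illustrative placeholders, not markers defined by the
    dataset. Spans are processed in ascending start order and are assumed not
    to overlap.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(spans, key=lambda s: s[0]):
        pieces.append(doc_text[cursor:start])  # unhighlighted text before this span
        pieces.append(open_tag + doc_text[start:end] + close_tag)
        cursor = end
    pieces.append(doc_text[cursor:])  # trailing unhighlighted text
    return "".join(pieces)
```

Consider the toy string "The Oscar was created 60 years ago by MGM." (not taken from the dataset) with spans [[0, 9], [14, 34]]. For this input, extract_highlights returns ["The Oscar", "created 60 years ago"] and mark_highlights returns "<h>The Oscar</h> was <h>created 60 years ago</h> by MGM.".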
Dataset Example
```json
{
  "doc_text": "The motion picture industry's most coveted award...with 32.",
  "summary_text": "The Oscar, created 60 years ago by MGM...awarded person (32).",
  "highlight_spans": [[0, 48], [50, 55], [57, 81], [184, 247], ..., [953, 975], [1033, 1081]]
}
```
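The spans in the example are listed in ascending order. The minimal Python sketch below checks that a record's spans are in bounds, sorted, and non-overlapping, and then reports what fraction of doc_text they cover. It uses the same end-exclusive character-offset convention as an assumption. The sortedness and non-overlap checks are also assumptions rather than documented guarantees, so relax them if real records violate them. The function names validate_spans and highlight_coverage are hypothetical.

```python
from typing import List, Sequence


def validate_spans(doc_text: str, spans: List[Sequence[int]]) -> None:
    """Raise ValueError if spans are malformed, empty, out of bounds, unsorted, or overlapping.

    Assumes end-exclusive [start, end] character offsets; adjacent spans
    (one starting exactly where the previous one ends) are allowed.
    """
    prev_end = 0
    for i, span in enumerate(spans):
        if len(span) != 2:
            raise ValueError(f"span {i} does not have exactly two offsets: {span!r}")
        start, end = span
        if not (0 <= start < end <= len(doc_text)):
            raise ValueError(f"span {i} is empty or out of bounds: {span!r}")
        if start < prev_end:
            raise ValueError(f"span {i} overlaps or precedes the previous span: {span!r}")
        prev_end = end


def highlight_coverage(doc_text: str, spans: List[Sequence[int]]) -> float:
    """Fraction of doc_text characters covered by the (validated) spans."""
    validate_spans(doc_text, spans)
    if not doc_text:
        return 0.0
    covered = sum(end - start for start, end in spans)
    return covered / len(doc_text)
```

For the ten-character toy string "abcdefghij" with spans [[0, 2], [5, 8]], highlight_coverage returns 0.5. Passing [[0, 5], [3, 7]] to validate_spans raises ValueError because the spans overlap.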
Dataset Subsets
DUC-2001-2002: divided into train, validation, and test splits.
CNN-DM: a single split.
Citation Information
If you use this dataset, please cite the following paper:
@misc{https://doi.org/10.48550/arxiv.2210.13449,
  doi = {10.48550/ARXIV.2210.13449},
  url = {https://arxiv.org/abs/2210.13449},
  author = {Slobodkin, Aviv and Roit, Paul and Hirsch, Eran and Ernst, Ori and Dagan, Ido},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {Controlled Text Reduction},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Zero v1.0 Universal}
}