emnlp2017-cmapsum-corpus
arXiv · Updated 2017-07-21 · Indexed 2024-06-21
Download link:
https://github.com/UKPLab/emnlp2017-cmapsum-corpus
Resource overview:
The emnlp2017-cmapsum-corpus is a dataset for multi-document summarization of educational topics, created by the AIPHES Research Training Group and the UKP Lab at Technische Universität Darmstadt. It contains 30 topics, each with approximately 40 source documents and one concept-map summary produced by consensus among crowdworkers. The corpus was built with a workflow that combines automatic preprocessing, scalable crowdsourcing, and high-quality expert annotation, and it aims to address the limitations of traditional summarization methods on large document collections. Intended applications include rapid comprehension of educational content, information retrieval, and decision support for users working with large volumes of documents.
Provided by:
Department of Computer Science, Technische Universität Darmstadt
Created:
2017-04-14



