MK-DUC-01
Source: arXiv; updated 2022-07-01; indexed 2024-08-06
Download link:
http://arxiv.org/abs/2110.01073v2
Description:
MK-DUC-01 is the first multi-document keyword extraction dataset, created at Bar-Ilan University. It is built on the DUC-2001 single-document keyword extraction dataset for the news domain. The dataset contains 30 topics with about 10.3 related news articles each, 308 articles in total. The final benchmark was produced by an automatic merging and reordering process followed by a manual refinement step. MK-DUC-01 targets two difficulties of the multi-document setting, redundant information and complementary information scattered across documents, and is intended to advance multi-text processing.
Provider:
Bar-Ilan University
Created:
2021-10-04



