MK-DUC-01

Name: MK-DUC-01
Creator: Bar-Ilan University
Published: 2022-07-01 21:32:21
License: No description available

Source: arXiv. Updated 2022-07-01; indexed 2024-08-06.

Download link:

http://arxiv.org/abs/2110.01073v2

Resource overview:

MK-DUC-01, created by Bar-Ilan University, is the first multi-document keyword extraction dataset. It is built on the DUC-2001 single-document keyword extraction dataset for the news domain. The dataset covers 30 topics with about 10.3 related news articles per topic, 308 articles in total. It was produced by an automatic merging and reordering process followed by a manual refinement step, yielding a high-quality benchmark for multi-document keyword extraction. MK-DUC-01 targets two problems of the multi-document setting, redundant information and complementary information scattered across documents, and is intended to advance multi-text processing.
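As a rough illustration of what merging and reordering per-document keyword lists into one topic-level list could look like, here is a minimal Python sketch. It is not the dataset authors' procedure: the function name `merge_topic_keywords`, the lowercase normalization, the document-frequency ordering, and the toy keywords are all assumptions made for this example.

```python
from collections import Counter


def merge_topic_keywords(per_doc_keywords):
    """Merge per-document keyword lists into one topic-level list.

    Hypothetical illustration only: keywords are normalized by
    stripping whitespace and lowercasing, deduplicated, and ordered
    by the number of documents that contain them. Ties keep the
    order in which the keywords were first seen.
    """
    doc_freq = Counter()
    first_seen = {}
    for keywords in per_doc_keywords:
        seen_in_doc = set()
        for kw in keywords:
            norm = kw.strip().lower()
            if norm in seen_in_doc:
                continue  # count each keyword once per document
            seen_in_doc.add(norm)
            doc_freq[norm] += 1
            first_seen.setdefault(norm, len(first_seen))
    return sorted(doc_freq, key=lambda kw: (-doc_freq[kw], first_seen[kw]))


# Toy input: three documents on one topic (made-up keywords).
topic_docs = [
    ["Hurricane Andrew", "Florida"],
    ["florida", "storm damage"],
    ["Florida", "hurricane andrew"],
]
merged = merge_topic_keywords(topic_docs)

# Sanity check on the reported statistics: 308 articles over 30 topics.
avg_docs_per_topic = round(308 / 30, 1)
```

In this sketch, redundancy across documents is handled by deduplication, while keywords that appear in only one document are kept at the end of the list so that complementary information is not lost. A manual refinement pass, as the overview describes, would still be needed after any automatic step of this kind.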

Provided by:

Bar-Ilan University

Created:

2021-10-04
