LDKP3K

Name: LDKP3K
Creator: Hugging Face
License: No description available

Indexed from arXiv on 2025-09-30

Download link:

https://huggingface.co/datasets/midas/ldkp3k

Resource description:

LDKP3K comprises approximately 100,000 long documents annotated with keywords. It was generated by mapping the KP20K corpus to S2ORC. A small variant of the dataset includes 20,000 training samples, 3,413 validation samples, and 3,339 test samples. The task is keyword extraction.
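The split sizes quoted for the small variant can be turned into a quick sanity check. The sketch below is illustrative only: the split names `train`, `validation`, and `test` and both helper functions are assumptions introduced here, not part of the dataset's documentation. It computes each split's share of the small variant and compares the sizes of loaded splits against the stated counts.

```python
# Split sizes of the small variant, as stated in the description above.
# The key names are an assumption made for this sketch.
EXPECTED_SPLIT_SIZES = {"train": 20_000, "validation": 3_413, "test": 3_339}


def split_shares(sizes):
    """Return each split's share of the total number of samples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def check_split_sizes(splits, expected=EXPECTED_SPLIT_SIZES):
    """Return {split: (expected, actual)} for every split whose size differs.

    `splits` is any mapping from split name to a sized object (hypothetical
    input, e.g. lists of records). A missing split counts as size 0.
    """
    mismatches = {}
    for name, want in expected.items():
        got = len(splits[name]) if name in splits else 0
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


if __name__ == "__main__":
    print(f"total samples: {sum(EXPECTED_SPLIT_SIZES.values())}")
    for name, share in split_shares(EXPECTED_SPLIT_SIZES).items():
        print(f"{name}: {share:.1%}")
```

An empty result from `check_split_sizes` means the loaded splits match the stated counts. The helper accepts any mapping of sized objects, so it could plausibly be applied to whatever a loader returns for the repository linked above; the exact configuration name for the small variant should be taken from the dataset card rather than from this sketch.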

Provided by:

Hugging Face
