EURLex

Source: arXiv, indexed 2025-09-30

Download link:

http://manikvarma.org/downloads/XC/XMLRepository

Description:

This dataset is designed for extreme multi-label text classification. It contains rich hierarchical label information as well as textual label descriptions. During preprocessing, each input text is truncated to at most 500 words and each label description to at most 4 words. Word embeddings obtained from other datasets are used. Its scale comes from a very large label set, while training instances are comparatively sparse.
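The truncation limits above can be illustrated with a minimal Python sketch. It assumes simple whitespace tokenization (the description does not say how "words" are counted), and the helper names `truncate_words` and `preprocess_example` are hypothetical, not part of any official EURLex tooling.

```python
MAX_TEXT_WORDS = 500  # input-text limit stated in the dataset description
MAX_LABEL_WORDS = 4   # label-description limit stated in the dataset description


def truncate_words(text: str, max_words: int) -> str:
    """Keep at most `max_words` tokens.

    Assumption: words are whitespace-delimited; str.split() also
    collapses runs of whitespace into single spaces.
    """
    return " ".join(text.split()[:max_words])


def preprocess_example(document: str, label_descriptions: list[str]) -> tuple[str, list[str]]:
    """Hypothetical helper: truncate one document and its label descriptions."""
    doc = truncate_words(document, MAX_TEXT_WORDS)
    labels = [truncate_words(d, MAX_LABEL_WORDS) for d in label_descriptions]
    return doc, labels


if __name__ == "__main__":
    # Synthetic 800-word document and made-up label strings, for illustration only.
    doc = " ".join(f"w{i}" for i in range(800))
    labels = ["protection of minorities in the European Union", "agriculture"]
    d, l = preprocess_example(doc, labels)
    print(len(d.split()))  # 500
    print(l)               # ['protection of minorities in', 'agriculture']
```

Truncating by token count keeps every document within a fixed-length budget, which matters when the input encoder has a maximum sequence length; a subword tokenizer would count differently, so the exact cut-off point depends on how the original pipeline tokenized the text.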