WikiSection
Indexed from arXiv on 2025-09-30
Download link:
https://github.com/sebastianarnold/wikisection
Description:
This dataset contains Wikipedia articles about cities around the world, drawn from Wikipedia database dumps. Each paragraph in an article is annotated with its section category and word count. To make models more broadly applicable, the dataset was preprocessed for training and section-related information was excluded. It comprises 2,165 articles in the training set and 658 articles in the test set, divided into 4 section categories. The task is to evaluate text coherence.
Provided by:
Wang et al. (2023)



