Reddit Corpus

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/polyai-ldn/conversational-datasets

Description:

This dataset contains posts and comments collected from Reddit and is commonly used for a range of natural language processing tasks. Among the corpora tested, the Reddit corpus gave the worst results on debiasing. It is a large-scale dataset, and its designated task is debiasing evaluation.
