WikiDetox
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/ewulczyn/wiki-detox
Description:
This dataset is derived from Wikipedia discussion pages. Each sample is an edit to a discussion page, annotated as either "neutral" or "hateful". The dataset is split into 95,692 training samples, 32,128 development samples, and 31,866 test samples, and the target task is hate speech detection.
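As a quick sanity check on the split sizes quoted in the description, the following minimal sketch computes the total sample count and each split's share. The names `SPLIT_SIZES` and `split_fractions` are hypothetical and used only for illustration; they are not part of the wiki-detox repository.

```python
# Split sizes as stated in the dataset description.
# SPLIT_SIZES and split_fractions are illustrative names, not part of any
# official WikiDetox tooling.
SPLIT_SIZES = {
    "train": 95_692,
    "dev": 32_128,
    "test": 31_866,
}


def split_fractions(sizes):
    """Return each split's share of the total number of samples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    total = sum(SPLIT_SIZES.values())
    print(f"total samples: {total}")
    for name, fraction in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {SPLIT_SIZES[name]} samples ({fraction:.1%})")
```

The three splits sum to 159,686 samples, which works out to roughly a 60/20/20 division between training, development, and test data.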
Provider:
Wikipedia



