WAC Corpus - Wikipedia Abusive Conversations

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/6817092
Description:
This repository contains conversations between Wikipedia editors, annotated at the message level for various types of abuse. The corpus is described in the following publication:

N. Cécillon, V. Labatut, R. Dufour, and G. Linarès, “WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection,” in 12th Language Resources and Evaluation Conference, 2020, pp. 1375–1383. ⟨hal-02497514⟩

The repository also contains the figures shown in this article.

Sources. Our corpus aligns two existing corpora:
- Messages and conversation structures from WikiConv (https://github.com/conversationai/wikidetox/tree/master/wikiconv)
- Manual toxicity annotations from the Wikipedia Comment Corpus (WCC, https://doi.org/10.6084/m9.figshare.4054689)

Citation. If you use this dataset, please cite the above article.

@InProceedings{Cecillon2020,
  author    = {Cécillon, Noé and Labatut, Vincent and Dufour, Richard and Linarès, Georges},
  title     = {{WAC}: A Corpus of {W}ikipedia Conversations for Online Abuse Detection},
  booktitle = {12\textsuperscript{th} Language Resources and Evaluation Conference},
  year      = {2020},
  pages     = {1375-1383},
  address   = {Marseille, FR},
  url       = {http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.172.pdf},
}
Created:
2024-10-01