WAC Corpus - Wikipedia Abusive Conversations
Source: NIAID Data Ecosystem. Indexed 2026-05-02.
Download link:
https://zenodo.org/record/6817092
Description:
This repository contains conversations between Wikipedia editors, annotated at the message level for various types of abuse. The corpus is described in the following publication:
N. Cécillon, V. Labatut, R. Dufour, and G. Linarès, “WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection,” in 12th Language Resources and Evaluation Conference, 2020, pp. 1375–1383. ⟨hal-02497514⟩
The repository also contains the figures shown in this article.
Sources. Our corpus aligns two existing corpora:
Messages and conversation structures of WikiConv (https://github.com/conversationai/wikidetox/tree/master/wikiconv)
Manual toxicity annotations from the Wikipedia Comment Corpus (WCC -- https://doi.org/10.6084/m9.figshare.4054689)
Citation. If you use this dataset, please cite the above article.
@InProceedings{Cecillon2020,
  author = {Cécillon, Noé and Labatut, Vincent and Dufour, Richard and Linarès, Georges},
  title = {{WAC}: A Corpus of {W}ikipedia Conversations for Online Abuse Detection},
  booktitle = {12\textsuperscript{th} Language Resources and Evaluation Conference},
  year = {2020},
  pages = {1375-1383},
  address = {Marseille, FR},
  url = {http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.172.pdf},
}
Created:
2024-10-01



