
nedjmaou/MLMA_hate_speech

Source: Hugging Face. Updated: 2022-12-28. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/nedjmaou/MLMA_hate_speech
Description:
---
license: mit
---

# Disclaimer

*This is a hate speech dataset (in Arabic, French, and English).*

*It contains offensive content that does not reflect the opinions of the authors.*

# Dataset of our EMNLP 2019 Paper (Multilingual and Multi-Aspect Hate Speech Analysis)

For more details about our dataset, please check our paper:

    @inproceedings{ousidhoum-etal-multilingual-hate-speech-2019,
        title = "Multilingual and Multi-Aspect Hate Speech Analysis",
        author = "Ousidhoum, Nedjma and Lin, Zizheng and Zhang, Hongming and Song, Yangqiu and Yeung, Dit-Yan",
        booktitle = "Proceedings of EMNLP",
        year = "2019",
        publisher = "Association for Computational Linguistics",
    }

(You can preview our paper at https://arxiv.org/pdf/1908.11049.pdf)

## Clarification

The multi-labelled tasks are *the hostility type of the tweet* and the *annotator's sentiment*. (We kept labels on which at least two annotators agreed.)

## Taxonomy

In further experiments that involved binary classification tasks of the hostility/hate/abuse type, we considered single-labelled *normal* instances to be *non-hate/non-toxic* and all the other instances to be *toxic*.

## Dataset

Our dataset is composed of three CSV files sorted by language. They contain the tweets and the annotations described in our paper:

  • the hostility type *(column: tweet sentiment)*
  • hostility directness *(column: directness)*
  • target attribute *(column: target)*
  • target group *(column: group)*
  • annotator's sentiment *(column: annotator sentiment)*

## Experiments

To replicate our experiments, please see https://github.com/HKUST-KnowComp/MLMA_hate_speech/blob/master/README.md
Provided by:
nedjmaou
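Summary of original information

Dataset overview

Dataset source

  • The dataset comes from the EMNLP 2019 paper "Multilingual and Multi-Aspect Hate Speech Analysis".

Dataset contents

  • The dataset contains hate speech in three languages (Arabic, French, and English).
  • It consists of three CSV files, one per language, containing the tweets and their annotations.

Annotations

  • The annotations cover:
    • hostility type (column: tweet sentiment)
    • hostility directness (column: directness)
    • target attribute (column: target)
    • target group (column: group)
    • annotator's sentiment (column: annotator sentiment)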
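
As a rough illustration of how the five annotation columns might be read, here is a minimal sketch using only the Python standard library. It rests on several assumptions that the description does not confirm: the header spellings are taken literally from the column names listed above, the text column is called `tweet`, multiple labels in the two multi-labelled fields (hostility type and annotator's sentiment) are joined by an underscore, and every label value other than `normal` is a made-up placeholder. The in-memory sample stands in for one of the language files.

```python
import csv
import io

# Column names as listed in the dataset description. The exact header
# spellings in the real CSV files are an assumption.
EXPECTED_COLUMNS = [
    "tweet sentiment",      # hostility type
    "directness",           # hostility directness
    "target",               # target attribute
    "group",                # target group
    "annotator sentiment",  # annotator's sentiment
]

# The description says these two tasks are multi-labelled.
MULTI_LABEL_COLUMNS = ["tweet sentiment", "annotator sentiment"]

# Hypothetical two-row sample; the label values are illustrative only.
SAMPLE_CSV = """tweet,tweet sentiment,directness,target,group,annotator sentiment
placeholder tweet 1,normal,indirect,other,other,indifference
placeholder tweet 2,offensive_hateful,direct,origin,other,anger_disgust
"""


def read_annotations(handle, label_separator="_"):
    """Read rows and split the multi-labelled fields into lists.

    `label_separator` is an assumption about how multiple labels are
    stored in a single cell.
    """
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing annotation columns: {missing}")
    rows = []
    for row in reader:
        for column in MULTI_LABEL_COLUMNS:
            row[column] = row[column].split(label_separator)
        rows.append(row)
    return rows


rows = read_annotations(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["tweet sentiment"], row["directness"], row["annotator sentiment"])
```

To try this on a real file, pass an open file handle instead of the `io.StringIO` sample and adjust `EXPECTED_COLUMNS` and `label_separator` to whatever the file actually uses.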

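Classification task

  • In the further binary classification experiments, single-labelled "normal" instances are treated as "non-hate/non-toxic" and all other instances as "toxic".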
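
The binary mapping described above can be sketched as a small helper. This is an illustrative reading of the rule, not the authors' code: the function name `to_binary`, the output strings, and the representation of an instance as a list of hostility labels are all assumptions, and every label other than `normal` is a placeholder.

```python
def to_binary(hostility_labels):
    """Collapse a list of hostility-type labels into a binary class.

    An instance whose only label is "normal" maps to "non-toxic";
    every other instance, including "normal" combined with another
    label, maps to "toxic".
    """
    labels = {label.strip().lower() for label in hostility_labels if label.strip()}
    if not labels:
        raise ValueError("instance has no hostility label")
    return "non-toxic" if labels == {"normal"} else "toxic"


# Hypothetical label lists for three instances.
examples = [["normal"], ["offensive"], ["normal", "offensive"]]
print([to_binary(labels) for labels in examples])
```

Treating the labels as a set keeps the check independent of label order and makes the "single-labelled" condition explicit.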

Dataset use

  • The dataset is used for multilingual and multi-aspect hate speech analysis.