nedjmaou/MLMA_hate_speech
Source: Hugging Face | Updated: 2022-12-28 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/nedjmaou/MLMA_hate_speech
Resource description:
License: MIT
# Disclaimer
*This is a hate speech dataset (in Arabic, French, and English).*
*It contains offensive content that does not reflect the opinions of the authors.*
# Dataset of our EMNLP 2019 Paper (Multilingual and Multi-Aspect Hate Speech Analysis)
For more details about our dataset, please check our paper:
```bibtex
@inproceedings{ousidhoum-etal-multilingual-hate-speech-2019,
    title = "Multilingual and Multi-Aspect Hate Speech Analysis",
    author = "Ousidhoum, Nedjma
      and Lin, Zizheng
      and Zhang, Hongming
      and Song, Yangqiu
      and Yeung, Dit-Yan",
    booktitle = "Proceedings of EMNLP",
    year = "2019",
    publisher = "Association for Computational Linguistics",
}
```
(A preprint of the paper is available at https://arxiv.org/pdf/1908.11049.pdf.)
## Clarification
The multi-labelled tasks are *the hostility type of the tweet* and the *annotator's sentiment*. (We kept labels on which at least two annotators agreed.)
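The agreement rule above (keep a label only if at least two annotators chose it) can be sketched as follows. This is a minimal illustration, not the authors' released preprocessing code: it assumes each annotator's choice for one tweet is available as a collection of label strings, and the function name `agreed_labels` and that input format are hypothetical.

```python
from collections import Counter
from typing import Iterable


def agreed_labels(annotations: Iterable[Iterable[str]], min_agree: int = 2) -> set[str]:
    """Return the labels chosen by at least `min_agree` annotators.

    `annotations` holds one collection of labels per annotator
    (hypothetical input format, for illustration only).
    """
    counts: Counter = Counter()
    for labels in annotations:
        # Deduplicate within one annotator so each annotator votes once per label.
        counts.update(set(labels))
    return {label for label, n in counts.items() if n >= min_agree}


# Three annotators label one tweet; "offensive" and "hateful" each get two votes.
print(sorted(agreed_labels([{"offensive", "hateful"}, {"offensive"}, {"hateful", "normal"}])))
# prints ['hateful', 'offensive']
```

Counting each annotator at most once per label keeps a single annotator from satisfying the two-annotator threshold alone.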
## Taxonomy
In further experiments that involved binary classification tasks of the hostility/hate/abuse type, we considered single-labelled *normal* instances to be *non-hate/non-toxic* and all the other instances to be *toxic*.
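The binary mapping described above can be sketched in a few lines, assuming the hostility-type labels of a tweet are first split into a list of strings. The helper names and the underscore separator used by `parse_labels` are assumptions for illustration; check how multiple labels are actually encoded in the CSV files before relying on it.

```python
def parse_labels(cell: str, sep: str = "_") -> list[str]:
    """Split a multi-label cell into individual labels.

    The separator is an assumption; adjust it to the actual file format.
    """
    return [label.strip() for label in cell.split(sep) if label.strip()]


def is_toxic(labels: list[str]) -> bool:
    """Only a tweet whose single label is "normal" counts as non-toxic."""
    return labels != ["normal"]


for cell in ["normal", "offensive", "hateful_normal"]:
    print(cell, "->", "toxic" if is_toxic(parse_labels(cell)) else "non-toxic")
```

Note that a tweet labelled both *normal* and another hostility type is treated as toxic, since only single-labelled *normal* instances map to the non-toxic class.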
## Dataset
Our dataset consists of three CSV files, one per language. They contain the tweets and the annotations described in our paper:
- the hostility type *(column: tweet sentiment)*
- hostility directness *(column: directness)*
- target attribute *(column: target)*
- target group *(column: group)*
- annotator's sentiment *(column: annotator sentiment)*
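As a starting point for exploring one of the per-language files, the sketch below counts how often each value appears in a given annotation column, using only Python's standard `csv` module. The function name is hypothetical, and the exact header spellings in the files (for example `tweet sentiment` versus an underscored variant) are an assumption to verify before use.

```python
import csv
from collections import Counter
from typing import Iterable


def column_counts(lines: Iterable[str], column: str) -> Counter:
    """Count the distinct values of one annotation column in CSV input.

    `lines` can be an open text file or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    # Reading `fieldnames` consumes the header row; None means the input is empty.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found; available: {reader.fieldnames}")
    return Counter(row[column] for row in reader)
```

For a file on disk, open it with `open(path, newline="", encoding="utf-8")` and pass the file object together with a column name such as `directness`; `most_common()` on the result lists the values by frequency.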
## Experiments
To replicate our experiments, please see https://github.com/HKUST-KnowComp/MLMA_hate_speech/blob/master/README.md
Provided by:
nedjmaou
Summary of Original Information
Dataset Overview
Dataset Source
- The dataset comes from the EMNLP 2019 paper "Multilingual and Multi-Aspect Hate Speech Analysis".
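Dataset Contents
- The dataset contains hate speech in three languages (Arabic, French, and English).
- The dataset consists of three CSV files, one per language, containing the tweets and their annotations.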
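Annotations
- The annotations include:
  - Hostility type (column: tweet sentiment)
  - Hostility directness (column: directness)
  - Target attribute (column: target)
  - Target group (column: group)
  - Annotator's sentiment (column: annotator sentiment)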
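Classification Tasks
- In further binary classification experiments, single-labelled "normal" instances were treated as "non-hate/non-toxic" and all other instances as "toxic".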
Dataset Usage
- The dataset is intended for multilingual and multi-aspect hate speech analysis.



