nedjmaou/MLMA_hate_speech

Name: nedjmaou/MLMA_hate_speech
Creator: nedjmaou
Published: 2022-12-28 11:24:32
License: MIT (per the dataset card front matter)

Source: Hugging Face. Updated 2022-12-28. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/nedjmaou/MLMA_hate_speech

Resource description:

---
license: mit
---

# Disclaimer

*This is a hate speech dataset (in Arabic, French, and English).*

*Offensive content that does not reflect the opinions of the authors.*

# Dataset of our EMNLP 2019 Paper (Multilingual and Multi-Aspect Hate Speech Analysis)

For more details about our dataset, please check our paper:

    @inproceedings{ousidhoum-etal-multilingual-hate-speech-2019,
        title = "Multilingual and Multi-Aspect Hate Speech Analysis",
        author = "Ousidhoum, Nedjma and Lin, Zizheng and Zhang, Hongming and Song, Yangqiu and Yeung, Dit-Yan",
        booktitle = "Proceedings of EMNLP",
        year = "2019",
        publisher = "Association for Computational Linguistics",
    }

(You can preview our paper on https://arxiv.org/pdf/1908.11049.pdf)

## Clarification

The multi-labelled tasks are *the hostility type of the tweet* and the *annotator's sentiment*. (We kept labels on which at least two annotators agreed.)

## Taxonomy

In further experiments that involved binary classification tasks of the hostility/hate/abuse type, we considered single-labelled *normal* instances to be *non-hate/non-toxic* and all the other instances to be *toxic*.

## Dataset

Our dataset is composed of three csv files sorted by language. They contain the tweets and the annotations described in our paper:

- the hostility type *(column: tweet sentiment)*
- hostility directness *(column: directness)*
- target attribute *(column: target)*
- target group *(column: group)*
- annotator's sentiment *(column: annotator sentiment)*

## Experiments

To replicate our experiments, please see https://github.com/HKUST-KnowComp/MLMA_hate_speech/blob/master/README.md

Provided by:

nedjmaou

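Summary of original information

Dataset overview

Dataset source

This dataset comes from the EMNLP 2019 paper "Multilingual and Multi-Aspect Hate Speech Analysis".

Dataset contents

The dataset contains hate speech in three languages (Arabic, French, and English).
It consists of three CSV files, organized by language, containing tweets and their annotations.

Annotations

The annotations include:
- Hostility type (column: tweet sentiment)
- Hostility directness (column: directness)
- Target attribute (column: target)
- Target group (column: group)
- Annotator's sentiment (column: annotator sentiment)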
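
As a minimal sketch, the column list above can be checked against a file's header before any analysis. The column names are taken from the dataset card as written; the actual CSV headers may be spelled differently (for example with underscores), so the names, the `missing_columns` helper, and the two-line sample below are all assumptions for illustration, not the authors' code.

```python
import csv
import io

# Column names as listed in the dataset card. The real CSV headers may be
# spelled differently, so treat this list as an assumption to adjust.
ANNOTATION_COLUMNS = [
    "tweet sentiment",
    "directness",
    "target",
    "group",
    "annotator sentiment",
]


def normalize(name):
    """Lower-case a header and unify separators, so that
    'Annotator_Sentiment' matches 'annotator sentiment'."""
    return " ".join(name.strip().lower().replace("_", " ").split())


def missing_columns(header, expected=ANNOTATION_COLUMNS):
    """Return the expected columns that do not appear in the header."""
    present = {normalize(h) for h in header}
    return [c for c in expected if normalize(c) not in present]


# Hypothetical sample standing in for one of the three language files.
SAMPLE = (
    "tweet,tweet sentiment,directness,target,group,annotator_sentiment\n"
    "example text,normal,indirect,other,other,indifference\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE))
print(missing_columns(reader.fieldnames))  # prints []
```

To use it on a downloaded file, replace the `io.StringIO(SAMPLE)` argument with an opened file handle and pass whichever column names that file actually uses.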

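Classification task

In further binary classification experiments, single-labelled "normal" instances are treated as "non-hate/non-toxic", and all other instances are treated as "toxic".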
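
The binary mapping described above can be sketched as follows. This assumes that a multi-labelled hostility type is stored as a single string with labels joined by a separator (an underscore is used here as a guess); the separator, the `to_binary` helper, and the output strings are hypothetical and should be adapted to the actual files.

```python
def to_binary(label, sep="_"):
    """Map a hostility-type annotation to a binary class.

    Only an instance whose single label is 'normal' counts as non-toxic;
    any other label, or any combination of labels, counts as toxic.
    The separator is an assumption about how multiple labels are stored.
    """
    parts = [p.strip().lower() for p in label.split(sep) if p.strip()]
    if not parts:
        raise ValueError("empty hostility label")
    return "non-toxic" if parts == ["normal"] else "toxic"


print(to_binary("normal"))            # prints non-toxic
print(to_binary("offensive"))         # prints toxic
print(to_binary("offensive_normal"))  # prints toxic
```

Note that an instance labelled "normal" together with another label is mapped to toxic, which follows the "single-labelled" wording of the rule.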

Dataset usage

The dataset is intended for multilingual and multi-aspect hate speech analysis.
