piuba-bigdata/contextualized_hate_speech
Source: Hugging Face (updated 2024-03-26, indexed 2024-03-04)
Download link: https://hf-mirror.com/datasets/piuba-bigdata/contextualized_hate_speech
Resource description:
---
language:
- es
pretty_name: contextualized_hate_speech
task_categories:
- text-classification
tags:
- hate_speech
size_categories:
- 10K<n<100K
---
# Contextualized Hate Speech: A dataset of comments in news outlets on Twitter
## Dataset Description
- **Repository**: [https://github.com/finiteautomata/contextualized-hatespeech-classification](https://github.com/finiteautomata/contextualized-hatespeech-classification)
- **Paper**: ["Assessing the impact of contextual information in hate speech detection"](https://arxiv.org/abs/2210.00465), Juan Manuel Pérez, Franco Luque, Demian Zayat, Martín Kondratzky, Agustín Moro, Pablo Serrati, Joaquín Zajac, Paula Miguel, Natalia Debandi, Agustín Gravano, Viviana Cotik
- **Point of Contact**: jmperez (at) dc uba ar
### Dataset Summary

This dataset is a collection of tweets posted in response to news articles from five Argentinean news outlets (Clarín, Infobae, La Nación, Perfil, and Crónica) during the COVID-19 pandemic. The comments were analyzed for hate speech across eight characteristics: hate against women, racist content, class hatred, hate against LGBTQ+ individuals, hate based on physical appearance, hate against people with disabilities, hate against criminals, and hate for political reasons. All the data is in Spanish.

Each comment is labeled with the following variables:

| Label | Description |
| :--------- | :---------------------------------------------------------------------- |
| HATEFUL | Contains hate speech (HS)? |
| CALLS | If it is hateful, is this message calling to (possibly violent) action? |
| WOMEN | Is this against women? |
| LGBTI | Is this against LGBTI people? |
| RACISM | Is this a racist message? |
| CLASS | Is this a classist message? |
| POLITICS | Is this HS due to political ideology? |
| DISABLED | Is this HS against disabled people? |
| APPEARANCE | Is this HS against people due to their appearance? (e.g. fatshaming) |
| CRIMINAL | Is this HS against criminals or people in conflict with law? |

In addition to the eight characteristic labels, `CALLS` records whether a hateful comment is a call to (possibly violent) action.
The `HATEFUL` and `CALLS` labels are binarized by simple majority; each characteristic (category) variable is set to `1` if at least one annotator marked it as such.
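
The aggregation rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `aggregate`, the per-annotator dictionary format, and the example annotations are all hypothetical, and "simple majority" is assumed here to mean strictly more than half of the annotators.

```python
from typing import Dict, List

# Labels binarized by simple majority (per the dataset card).
MAJORITY_LABELS = ("HATEFUL", "CALLS")
# Characteristic labels set to 1 if at least one annotator marked them.
CATEGORY_LABELS = (
    "WOMEN", "LGBTI", "RACISM", "CLASS",
    "POLITICS", "DISABLED", "APPEARANCE", "CRIMINAL",
)


def aggregate(annotations: List[Dict[str, int]]) -> Dict[str, int]:
    """Combine per-annotator 0/1 labels into one label set (hypothetical helper)."""
    n = len(annotations)
    if n == 0:
        raise ValueError("need at least one annotation")
    result: Dict[str, int] = {}
    for label in MAJORITY_LABELS:
        votes = sum(a.get(label, 0) for a in annotations)
        # Assumption: "simple majority" means strictly more than half.
        result[label] = int(2 * votes > n)
    for label in CATEGORY_LABELS:
        # A single positive annotation is enough for a characteristic.
        result[label] = int(any(a.get(label, 0) for a in annotations))
    return result


# Hypothetical annotations from three annotators for one comment.
example = [
    {"HATEFUL": 1, "CALLS": 0, "WOMEN": 1},
    {"HATEFUL": 1, "CALLS": 1, "POLITICS": 1},
    {"HATEFUL": 0, "CALLS": 0},
]
aggregated = aggregate(example)
```

In this example `HATEFUL` is `1` (two of three votes), `CALLS` is `0` (one of three votes), and both `WOMEN` and `POLITICS` are `1` because a single annotator marked each of them.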
A raw, non-aggregated version of the dataset is available at [piuba-bigdata/contextualized_hate_speech_raw](https://huggingface.co/datasets/piuba-bigdata/contextualized_hate_speech_raw).
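
As a minimal sketch, either version could be loaded with the Hugging Face `datasets` library. The helper names `load_contextualized_hate_speech` and `hateful_only` are hypothetical, and the sketch assumes that the label names in the table above are also the column names; the card does not document the split names or the text columns, so inspect the returned object before relying on them.

```python
DATASET_ID = "piuba-bigdata/contextualized_hate_speech"
RAW_DATASET_ID = "piuba-bigdata/contextualized_hate_speech_raw"


def load_contextualized_hate_speech(raw: bool = False):
    """Load the aggregated (default) or raw dataset from the Hugging Face Hub."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(RAW_DATASET_ID if raw else DATASET_ID)


def hateful_only(rows):
    """Keep rows whose HATEFUL label is 1 (assumes HATEFUL is a column name)."""
    return [row for row in rows if row.get("HATEFUL") == 1]
```

Calling `load_contextualized_hate_speech()` requires network access; printing the result shows the available splits and columns. `hateful_only` works on any iterable of dictionaries, for example the rows of one split.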
### Citation Information
```bibtex
@article{perez2022contextual,
author = {Pérez, Juan Manuel and Luque, Franco M. and Zayat, Demian and Kondratzky, Martín and Moro, Agustín and Serrati, Pablo Santiago and Zajac, Joaquín and Miguel, Paula and Debandi, Natalia and Gravano, Agustín and Cotik, Viviana},
journal = {IEEE Access},
title = {Assessing the Impact of Contextual Information in Hate Speech Detection},
year = {2023},
volume = {11},
number = {},
pages = {30575-30590},
doi = {10.1109/ACCESS.2023.3258973}
}
```
### Contributions
[More Information Needed]
Provider: piuba-bigdata
Summary of original information
## Contextualized Hate Speech Dataset Summary
### Basic Information
- Language: Spanish
- Pretty Name: contextualized_hate_speech
- Task Categories: text-classification
- Tags: hate_speech
- Size Categories: 10K<n<100K
### Dataset Description
- Repository: https://github.com/finiteautomata/contextualized-hatespeech-classification
- Paper: "Assessing the impact of contextual information in hate speech detection"
- Point of Contact: jmperez (at) dc uba ar
### Dataset Content
- Source: Tweets in response to news articles from five Argentinean news outlets: Clarín, Infobae, La Nación, Perfil, and Crónica, during the COVID-19 pandemic.
- Analysis: Comments analyzed for hate speech across eight characteristics: against women, racist content, class hatred, against LGBTQ+ individuals, against physical appearance, against people with disabilities, against criminals, and for political reasons.
### Label Information
- HATEFUL: Indicates if the comment contains hate speech.
- CALLS: Indicates if the hateful comment is a call to (possibly violent) action.
- Characteristics:
  - WOMEN: Against women.
  - LGBTI: Against LGBTI people.
  - RACISM: Racist message.
  - CLASS: Classist message.
  - POLITICS: Due to political ideology.
  - DISABLED: Against disabled people.
  - APPEARANCE: Against people due to their appearance.
  - CRIMINAL: Against criminals or people in conflict with law.
### Additional Notes
- Label Aggregation: `HATEFUL` and `CALLS` labels are binarized by simple majority; characteristic variables are marked `1` if at least one annotator marked them as such.
- Raw Dataset: Available at piuba-bigdata/contextualized_hate_speech_raw.