piuba-bigdata/contextualized_hate_speech

Name: piuba-bigdata/contextualized_hate_speech
Creator: piuba-bigdata
Published: 2024-03-26 20:12:41
License: No description available

Hugging Face. Updated 2024-03-26. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/piuba-bigdata/contextualized_hate_speech
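As a minimal sketch of how the dataset might be fetched programmatically, the snippet below wraps the `load_dataset` function from the Hugging Face `datasets` library. It assumes the library is installed and that the Hub is reachable from your network; the helper name `load_contextualized_hate_speech` is hypothetical, and the split and column names are not stated on this page, so inspect the returned object before relying on them.

```python
# Repository id taken from the download link above.
REPO_ID = "piuba-bigdata/contextualized_hate_speech"


def load_contextualized_hate_speech():
    """Download the dataset from the Hugging Face Hub (hypothetical helper).

    The import is done lazily so that this module can be imported even
    when the `datasets` library is not installed.
    """
    from datasets import load_dataset

    # Returns whatever splits the repository defines; none are assumed here.
    return load_dataset(REPO_ID)
```

Calling `load_contextualized_hate_speech()` and printing the result should list the available splits and their columns.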

Resource overview:

---
language:
- es
pretty_name: contextualized_hate_speech
task_categories:
- text-classification
tags:
- hate_speech
size_categories:
- 10K<n<100K
---

# Contextualized Hate Speech: A dataset of comments in news outlets on Twitter

## Dataset Description

- **Repository**: [https://github.com/finiteautomata/contextualized-hatespeech-classification](https://github.com/finiteautomata/contextualized-hatespeech-classification)
- **Paper**: ["Assessing the impact of contextual information in hate speech detection"](https://arxiv.org/abs/2210.00465), Juan Manuel Pérez, Franco Luque, Demian Zayat, Martín Kondratzky, Agustín Moro, Pablo Serrati, Joaquín Zajac, Paula Miguel, Natalia Debandi, Agustín Gravano, Viviana Cotik
- **Point of Contact**: jmperez (at) dc uba ar

### Dataset Summary

![Graphical representation of the dataset](Dataset%20graph.png)

This dataset is a collection of tweets posted in response to news articles from five Argentinean news outlets (Clarín, Infobae, La Nación, Perfil and Crónica) during the COVID-19 pandemic. The comments were analyzed for hate speech across eight characteristics: against women, racist content, class hatred, against LGBTQ+ individuals, against physical appearance, against people with disabilities, against criminals, and for political reasons. All the data is in Spanish.

Each comment is labeled with the following variables:

| Label      | Description                                                             |
| :--------- | :---------------------------------------------------------------------- |
| HATEFUL    | Contains hate speech (HS)?                                              |
| CALLS      | If it is hateful, is this message calling to (possibly violent) action? |
| WOMEN      | Is this against women?                                                  |
| LGBTI      | Is this against LGBTI people?                                           |
| RACISM     | Is this a racist message?                                               |
| CLASS      | Is this a classist message?                                             |
| POLITICS   | Is this HS due to political ideology?                                   |
| DISABLED   | Is this HS against disabled people?                                     |
| APPEARANCE | Is this HS against people due to their appearance? (e.g. fat-shaming)   |
| CRIMINAL   | Is this HS against criminals or people in conflict with the law?        |

Besides `HATEFUL` and the eight characteristic labels, there is an extra label, `CALLS`, which indicates whether a comment is a call to (possibly violent) action.

The `HATEFUL` and `CALLS` labels are binarized by simple majority; each characteristic (category) variable is set to `1` if at least one annotator marked it as such.

A raw, non-aggregated version of the dataset can be found at [piuba-bigdata/contextualized_hate_speech_raw](https://huggingface.co/datasets/piuba-bigdata/contextualized_hate_speech_raw).

### Citation Information

```bibtex
@article{perez2022contextual,
  author  = {Pérez, Juan Manuel and Luque, Franco M. and Zayat, Demian and Kondratzky, Martín and Moro, Agustín and Serrati, Pablo Santiago and Zajac, Joaquín and Miguel, Paula and Debandi, Natalia and Gravano, Agustín and Cotik, Viviana},
  journal = {IEEE Access},
  title   = {Assessing the Impact of Contextual Information in Hate Speech Detection},
  year    = {2023},
  volume  = {11},
  number  = {},
  pages   = {30575-30590},
  doi     = {10.1109/ACCESS.2023.3258973}
}
```

### Contributions

[More Information Needed]

Provided by:

piuba-bigdata

Summary of original information

Contextualized Hate Speech Dataset Summary

Basic Information

Language: Spanish
Pretty Name: contextualized_hate_speech
Task Categories: text-classification
Tags: hate_speech
Size Categories: 10K<n<100K

Dataset Description

Repository: https://github.com/finiteautomata/contextualized-hatespeech-classification
Paper: "Assessing the impact of contextual information in hate speech detection"
Point of Contact: jmperez (at) dc uba ar

Dataset Content

Source: Tweets in response to news articles from five Argentinean news outlets: Clarín, Infobae, La Nación, Perfil, and Crónica, during the COVID-19 pandemic.
Analysis: Comments analyzed for hate speech across eight characteristics: against women, racist content, class hatred, against LGBTQ+ individuals, against physical appearance, against people with disabilities, against criminals, and for political reasons.

Label Information

HATEFUL: Indicates if the comment contains hate speech.
CALLS: Indicates if the hateful comment is a call to (possibly violent) action.
Characteristics:
- WOMEN: Against women.
- LGBTI: Against LGBTI people.
- RACISM: Racist message.
- CLASS: Classist message.
- POLITICS: Due to political ideology.
- DISABLED: Against disabled people.
- APPEARANCE: Against people due to their appearance.
- CRIMINAL: Against criminals or people in conflict with law.
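Because the task is text classification over several binary labels, one common way to feed these annotations to a model is a fixed-order multi-hot vector. The sketch below illustrates that encoding; it assumes each example is available as a dictionary keyed by the label names listed above (the actual column names in the released files are not confirmed on this page), and the helper names `to_multi_hot` and `active_labels` are hypothetical.

```python
from typing import Dict, List, Sequence

# Label order is an arbitrary choice made for this sketch.
LABELS = (
    "HATEFUL", "CALLS", "WOMEN", "LGBTI", "RACISM",
    "CLASS", "POLITICS", "DISABLED", "APPEARANCE", "CRIMINAL",
)


def to_multi_hot(row: Dict[str, int]) -> List[int]:
    """Encode one labeled comment as a 0/1 vector in LABELS order.

    Labels missing from the row are treated as 0 (an assumption).
    """
    return [int(bool(row.get(name, 0))) for name in LABELS]


def active_labels(vector: Sequence[int]) -> List[str]:
    """Map a multi-hot vector back to the names of the labels set to 1."""
    return [name for name, flag in zip(LABELS, vector) if flag]


# A made-up example row, not taken from the dataset.
example_row = {"HATEFUL": 1, "CALLS": 0, "WOMEN": 1, "APPEARANCE": 1}
example_vector = to_multi_hot(example_row)
```

Keeping the label order in a single tuple makes it straightforward to decode model outputs back into label names with `active_labels`.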

Additional Notes

Label Aggregation: HATEFUL and CALLS labels are binarized by simple majority; characteristic variables are marked 1 if at least one annotator marked it as such.
Raw Dataset: Available at piuba-bigdata/contextualized_hate_speech_raw.
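The aggregation rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each annotator's judgment is assumed to be a dictionary of 0/1 values keyed by label name, missing labels count as 0, and a tie is resolved to 0 because "simple majority" is read here as strictly more than half of the votes. The function name `aggregate_annotations` is hypothetical.

```python
from typing import Dict, List

MAJORITY_LABELS = ("HATEFUL", "CALLS")
CHARACTERISTIC_LABELS = (
    "WOMEN", "LGBTI", "RACISM", "CLASS",
    "POLITICS", "DISABLED", "APPEARANCE", "CRIMINAL",
)


def aggregate_annotations(annotations: List[Dict[str, int]]) -> Dict[str, int]:
    """Combine per-annotator labels for one comment into a single label set."""
    if not annotations:
        raise ValueError("at least one annotation is required")

    total = len(annotations)
    result: Dict[str, int] = {}

    # HATEFUL and CALLS: 1 only if strictly more than half of annotators said 1.
    for label in MAJORITY_LABELS:
        votes = sum(a.get(label, 0) for a in annotations)
        result[label] = int(2 * votes > total)

    # Characteristics: 1 if at least one annotator marked the label.
    for label in CHARACTERISTIC_LABELS:
        result[label] = int(any(a.get(label, 0) for a in annotations))

    return result


# Made-up annotations from three hypothetical annotators.
example_annotations = [
    {"HATEFUL": 1, "CALLS": 0, "RACISM": 1},
    {"HATEFUL": 1, "CALLS": 1},
    {"HATEFUL": 0, "CALLS": 0, "CLASS": 1},
]
aggregated = aggregate_annotations(example_annotations)
```

In this example two of three annotators marked `HATEFUL`, so it aggregates to 1, while `CALLS` received a single vote and aggregates to 0; `RACISM` and `CLASS` are each set to 1 because one annotator marked them.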

Citation Information

@article{perez2022contextual,
  author  = {Pérez, Juan Manuel and Luque, Franco M. and Zayat, Demian and Kondratzky, Martín and Moro, Agustín and Serrati, Pablo Santiago and Zajac, Joaquín and Miguel, Paula and Debandi, Natalia and Gravano, Agustín and Cotik, Viviana},
  journal = {IEEE Access},
  title   = {Assessing the Impact of Contextual Information in Hate Speech Detection},
  year    = {2023},
  volume  = {11},
  number  = {},
  pages   = {30575-30590},
  doi     = {10.1109/ACCESS.2023.3258973}
}

The content above was collected and summarized by 遇见数据集.
