
piuba-bigdata/contextualized_hate_speech

Hugging Face, updated 2024-03-26, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/piuba-bigdata/contextualized_hate_speech
Resource description:
---
language:
- es
pretty_name: contextualized_hate_speech
task_categories:
- text-classification
tags:
- hate_speech
size_categories:
- 10K<n<100K
---

# Contextualized Hate Speech: A dataset of comments in news outlets on Twitter

## Dataset Description

- **Repository**: [https://github.com/finiteautomata/contextualized-hatespeech-classification](https://github.com/finiteautomata/contextualized-hatespeech-classification)
- **Paper**: ["Assessing the impact of contextual information in hate speech detection"](https://arxiv.org/abs/2210.00465), Juan Manuel Pérez, Franco Luque, Demian Zayat, Martín Kondratzky, Agustín Moro, Pablo Serrati, Joaquín Zajac, Paula Miguel, Natalia Debandi, Agustín Gravano, Viviana Cotik
- **Point of Contact**: jmperez (at) dc uba ar

### Dataset Summary

![Graphical representation of the dataset](Dataset%20graph.png)

This dataset is a collection of tweets posted in response to news articles from five Argentinean news outlets (Clarín, Infobae, La Nación, Perfil and Crónica) during the COVID-19 pandemic. The comments were analyzed for hate speech across eight characteristics: against women, racist content, class hatred, against LGBTQ+ individuals, against physical appearance, against people with disabilities, against criminals, and for political reasons. All the data is in Spanish.

Each comment is labeled with the following variables:

| Label      | Description                                                             |
| :--------- | :---------------------------------------------------------------------- |
| HATEFUL    | Contains hate speech (HS)?                                              |
| CALLS      | If it is hateful, is this message calling to (possibly violent) action? |
| WOMEN      | Is this against women?                                                  |
| LGBTI      | Is this against LGBTI people?                                           |
| RACISM     | Is this a racist message?                                               |
| CLASS      | Is this a classist message?                                             |
| POLITICS   | Is this HS due to political ideology?                                   |
| DISABLED   | Is this HS against disabled people?                                     |
| APPEARANCE | Is this HS against people due to their appearance? (e.g. fat-shaming)   |
| CRIMINAL   | Is this HS against criminals or people in conflict with the law?        |

There is an extra label `CALLS`, which represents whether a comment is a call to violent action or not.

The `HATEFUL` and `CALLS` labels are binarized by simple majority; the characteristic (category) variables are set to `1` if at least one annotator marked them as such.

A raw, non-aggregated version of the dataset can be found at [piuba-bigdata/contextualized_hate_speech_raw](https://huggingface.co/datasets/piuba-bigdata/contextualized_hate_speech_raw).

### Citation Information

```bibtex
@article{perez2022contextual,
  author  = {Pérez, Juan Manuel and Luque, Franco M. and Zayat, Demian and Kondratzky, Martín and Moro, Agustín and Serrati, Pablo Santiago and Zajac, Joaquín and Miguel, Paula and Debandi, Natalia and Gravano, Agustín and Cotik, Viviana},
  journal = {IEEE Access},
  title   = {Assessing the Impact of Contextual Information in Hate Speech Detection},
  year    = {2023},
  volume  = {11},
  number  = {},
  pages   = {30575-30590},
  doi     = {10.1109/ACCESS.2023.3258973}
}
```

### Contributions

[More Information Needed]
Provider:
piuba-bigdata
Summary of Original Information

Contextualized Hate Speech Dataset Summary

Basic Information

  • Language: Spanish
  • Pretty Name: contextualized_hate_speech
  • Task Categories: text-classification
  • Tags: hate_speech
  • Size Categories: 10K<n<100K
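
The dataset identified above can probably be fetched with the Hugging Face `datasets` library. The following is a minimal sketch, assuming that `datasets` is installed and that the repository is publicly accessible. The function name `load_contextualized_hate_speech` is hypothetical, and no split or column names are assumed because the source does not list them.

```python
# Repository identifier taken from the download link in this page.
DATASET_ID = "piuba-bigdata/contextualized_hate_speech"


def load_contextualized_hate_speech():
    """Download the dataset from the Hugging Face Hub and return it.

    Hypothetical helper: the import is done lazily so this module can be
    imported even where the `datasets` library is not installed.
    """
    from datasets import load_dataset  # requires `pip install datasets`

    # Split and column names are not documented in this page, so the
    # object is returned as-is for the caller to inspect.
    return load_dataset(DATASET_ID)
```

Calling `load_contextualized_hate_speech()` and printing the result should show which splits and columns are actually available.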

Dataset Description

Dataset Content

  • Source: Tweets in response to news articles from five Argentinean news outlets: Clarín, Infobae, La Nación, Perfil, and Crónica, during the COVID-19 pandemic.
  • Analysis: Comments analyzed for hate speech across eight characteristics: against women, racist content, class hatred, against LGBTQ+ individuals, against physical appearance, against people with disabilities, against criminals, and for political reasons.

Label Information

  • HATEFUL: Indicates if the comment contains hate speech.
  • CALLS: Indicates if the hateful comment is a call to (possibly violent) action.
  • Characteristics:
    • WOMEN: Against women.
    • LGBTI: Against LGBTI people.
    • RACISM: Racist message.
    • CLASS: Classist message.
    • POLITICS: Due to political ideology.
    • DISABLED: Against disabled people.
    • APPEARANCE: Against people due to their appearance.
    • CRIMINAL: Against criminals or people in conflict with law.
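
The label scheme above can be sketched as a small record type. This is an illustration only: the class `CommentLabels` and its field names are hypothetical and do not reflect the dataset's actual column layout. The consistency check assumes that `CALLS` is meaningful only for hateful comments, following its description.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The eight characteristic labels listed in the dataset description.
CHARACTERISTICS = (
    "WOMEN", "LGBTI", "RACISM", "CLASS",
    "POLITICS", "DISABLED", "APPEARANCE", "CRIMINAL",
)


@dataclass
class CommentLabels:
    """Hypothetical container for the labels of one comment."""
    hateful: bool
    calls: bool = False
    characteristics: Dict[str, bool] = field(default_factory=dict)

    def targets(self) -> List[str]:
        """Return the characteristic labels that are set, in canonical order."""
        return [c for c in CHARACTERISTICS if self.characteristics.get(c, False)]

    def is_consistent(self) -> bool:
        """Assumption: a call to action is only defined for hateful comments."""
        return self.hateful or not self.calls


example = CommentLabels(hateful=True, calls=False,
                        characteristics={"RACISM": True, "CLASS": True})
print(example.targets())        # ['RACISM', 'CLASS']
print(example.is_consistent())  # True
```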

Additional Notes

  • Label Aggregation: HATEFUL and CALLS labels are binarized by simple majority; characteristic variables are marked 1 if at least one annotator marked it as such.
  • Raw Dataset: Available at piuba-bigdata/contextualized_hate_speech_raw.
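
The aggregation rule described above can be sketched as follows. This is a minimal illustration, not the authors' code. The per-annotator input format (a list of dictionaries) is hypothetical, and ties are resolved to `0` because the source does not say how they are handled.

```python
from typing import Dict, List

CHARACTERISTICS = (
    "WOMEN", "LGBTI", "RACISM", "CLASS",
    "POLITICS", "DISABLED", "APPEARANCE", "CRIMINAL",
)


def majority(votes: List[int]) -> int:
    """Return 1 if strictly more than half of the votes are 1, else 0.

    Assumption: ties count as 0; the source does not specify tie handling.
    """
    return int(2 * sum(votes) > len(votes))


def aggregate(annotations: List[Dict[str, int]]) -> Dict[str, int]:
    """Combine per-annotator labels into one aggregated record.

    HATEFUL and CALLS use a simple majority; each characteristic is 1 if
    at least one annotator marked it. Missing keys are treated as 0.
    """
    result = {
        label: majority([a.get(label, 0) for a in annotations])
        for label in ("HATEFUL", "CALLS")
    }
    for label in CHARACTERISTICS:
        result[label] = int(any(a.get(label, 0) for a in annotations))
    return result


# Three hypothetical annotators for one comment.
votes = [
    {"HATEFUL": 1, "CALLS": 0, "WOMEN": 1},
    {"HATEFUL": 1, "CALLS": 0},
    {"HATEFUL": 0, "CALLS": 0},
]
aggregated = aggregate(votes)
print(aggregated["HATEFUL"], aggregated["CALLS"], aggregated["WOMEN"])  # 1 0 1
```

Note that a characteristic can be set by a single annotator even when the majority label differs, which is why the two rules are kept as separate steps.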

Citation Information

@article{perez2022contextual,
  author  = {Pérez, Juan Manuel and Luque, Franco M. and Zayat, Demian and Kondratzky, Martín and Moro, Agustín and Serrati, Pablo Santiago and Zajac, Joaquín and Miguel, Paula and Debandi, Natalia and Gravano, Agustín and Cotik, Viviana},
  journal = {IEEE Access},
  title   = {Assessing the Impact of Contextual Information in Hate Speech Detection},
  year    = {2023},
  volume  = {11},
  number  = {},
  pages   = {30575-30590},
  doi     = {10.1109/ACCESS.2023.3258973}
}

The content above was collected and summarized by 遇见数据集.