hateRADAR-es: Annotated Corpus for Anti-Refugee Hate Speech Detection in Spanish (Training and Test Sets)

Name: hateRADAR-es: Annotated Corpus for Anti-Refugee Hate Speech Detection in Spanish (Training and Test Sets)
Creator: Zenodo
Published: 2025-10-14 08:29:46
License: No description available

Source: Zenodo. Updated: 2025-10-14. Indexed: 2026-05-26.

Download link:

https://zenodo.org/doi/10.5281/zenodo.17259982

Description:

The hateRADAR-es (Anti-Hate Refugees Annotated Dataset and Analysis Resource) dataset is a Spanish-language corpus of Twitter messages focused on hate speech and negative discourse directed towards refugees. It was manually annotated by expert sociologists and social workers to ensure quality and reliability in the identification of anti-refugee narratives.

The dataset contains 5,000 tweets, divided into a training set of 4,000 (hateRADAR-es_train) and a test set of 1,000 (hateRADAR-es_test), with balanced labels for hate speech detection (0 = no hate speech, 1 = hate speech). Tweets were collected between December 2015 and December 2016 using NodeXL, filtered for the keyword “refugiados” (refugees), and curated to remove duplicates and retweets.

hateRADAR-es provides a high-quality benchmark for research in Natural Language Processing (NLP), machine learning, computational social science, and digital humanities. It supports studies on hate speech detection, implicit vs. explicit hostility, and narrative analysis of anti-refugee discourse.

This dataset was developed within the project [NON-CONSPIRA-HATE!] (PID2021-123983OB-I00). hateRADAR-es is available to the scientific community to encourage further research. The dataset is described in detail in the article: Mata, J., Gualda, E., Pachón, V., Rebollo-Díaz, C., & Domínguez, J. L. (2025). From data to detection: Developing a corpus and training language models for the identification of anti-refugee narratives in Spanish. Array, 100526. https://doi.org/10.1016/j.array.2025.100526
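As a quick sanity check on the balanced 0/1 labels described above, the following Python sketch counts labels in a CSV export. The file layout is an assumption: the description does not state the file format or column names, so the `text` and `label` columns and the helper names `label_counts` and `is_balanced` are hypothetical. The sample rows are invented placeholders, not tweets from the corpus.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file, label_column="label"):
    """Count rows per label value in a CSV file-like object.

    Assumes a header row and an integer label column; the column
    name "label" is a guess, not something the record documents.
    """
    reader = csv.DictReader(csv_file)
    return Counter(int(row[label_column]) for row in reader)


def is_balanced(counts, tolerance=0.05):
    """Return True if the share of label 1 is within `tolerance` of 0.5."""
    total = sum(counts.values())
    if total == 0:
        return False
    share_hate = counts.get(1, 0) / total
    return abs(share_hate - 0.5) <= tolerance


# Invented placeholder rows standing in for a real file.
sample = io.StringIO(
    "text,label\n"
    "placeholder message a,0\n"
    "placeholder message b,1\n"
    "placeholder message c,0\n"
    "placeholder message d,1\n"
)

counts = label_counts(sample)
print(counts[0], counts[1], is_balanced(counts))
```

To run this against a downloaded file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample and adjust `label_column` to whatever the actual header uses.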

Provider:

Zenodo

Created:

2025-10-14
