hateRADAR-es: Annotated Corpus for Anti-Refugee Hate Speech Detection in Spanish (Training and Test Sets)
Source: Zenodo | Updated: 2025-10-14 | Indexed: 2026-05-26
Download link:
https://zenodo.org/doi/10.5281/zenodo.17259982
Description:
The hateRADAR-es (Anti-Hate Refugees Annotated Dataset and Analysis Resource) dataset is a corpus of Spanish-language Twitter messages focused on hate speech and negative discourse directed towards refugees. It was manually annotated by expert sociologists and social workers to ensure quality and reliability in the identification of anti-refugee narratives.
The dataset contains 5,000 tweets, divided into training (4,000) and test (1,000) sets (hateRADAR-es_train and hateRADAR-es_test), with balanced labels for hate speech detection (0 = no hate speech, 1 = hate speech). Tweets were collected between December 2015 and December 2016 using NodeXL, filtered for the keyword “refugiados” (refugees), and curated to remove duplicates and retweets.
hateRADAR-es provides a high-quality benchmark for research in Natural Language Processing (NLP), machine learning, computational social science, and digital humanities. It supports studies on hate speech detection, implicit vs. explicit hostility, and narrative analysis of anti-refugee discourse.
This dataset was developed within the project [NON-CONSPIRA-HATE!] (PID2021-123983OB-I00). hateRADAR-es is available to the scientific community to encourage further research. The dataset is described in detail in the following article:
Mata, J., Gualda, E., Pachón, V., Rebollo-Díaz, C., & Domínguez, J. L. (2025). From data to detection: Developing a corpus and training language models for the identification of anti-refugee narratives in Spanish. Array, 100526. https://doi.org/10.1016/j.array.2025.100526
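As a rough illustration of the binary labelling scheme described above (0 = no hate speech, 1 = hate speech), the sketch below reads one split and counts examples per label. It is not the authors' loading code. The CSV layout and the column names "text" and "label" are assumptions made for this example; the actual files on Zenodo may use a different format or different field names. The sample rows are placeholders, not tweets from the corpus.

```python
import csv
import io
from collections import Counter


# Assumption: a split is a CSV file with a header row and columns named
# "text" and "label". Adjust text_col / label_col to match the real files.
def read_split(handle, text_col="text", label_col="label"):
    """Return a list of (text, label) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    return [(row[text_col], int(row[label_col])) for row in reader]


def label_counts(rows):
    """Count examples per label (0 = no hate speech, 1 = hate speech)."""
    return Counter(label for _, label in rows)


# Tiny in-memory stand-in for a split, containing placeholder text only.
sample = io.StringIO(
    "text,label\n"
    "example tweet a,0\n"
    "example tweet b,1\n"
    "example tweet c,1\n"
    "example tweet d,0\n"
)
rows = read_split(sample)
counts = label_counts(rows)
print(counts[0], counts[1])
```

To apply this to a downloaded split, pass a real file handle instead of the in-memory sample, for example one opened with open(path, newline="", encoding="utf-8"), where path points to wherever the training or test file was saved. With the balanced labels reported in the description, the two counts should come out roughly equal for each split.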
Provider:
Zenodo
Created:
2025-10-14



