LGBTQIAphobia dataset (augmented)
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/13756091
Resource summary:
Name: LGBTQIAphobia_dataset (augmented)
Description: Labelled dataset of phrases retrieved from different digital sources (X/Twitter, Instagram, TikTok) containing diverse messages directed at the LGBTQIA+ community. It has 1234 phrases classified as {Non-LGBTQIAphobic (0), LGBTQIAphobic (1)}.
Language: Spanish
Format: CSV (UTF-8)
Structure: id;phrase;class {0,1}
Purpose: To be used for fine-tuning models that detect language offensive to Spanish or Latin American LGBT communities in digital environments.
Sources: X/Twitter, Instagram, TikTok
Size: 20 KB
Ethical considerations: This dataset was created strictly for academic and research purposes. The person who was the target of the hate speech has been anonymized, and there is no intention to harm them in any way, nor the person who produced the speech. We prioritize protecting the privacy and confidentiality of vulnerable individuals. To safeguard privacy, we carefully remove any identifying details such as user IDs, phone numbers, and addresses before sharing the data with our annotators. All the data we collect comes from publicly available sources and does not contain any personal or sensitive information that may jeopardize anyone's privacy. I ask researchers to commit to abiding by ethical guidelines so as not to harm individuals unnecessarily.
How was it created?
- Initial retrieval of discriminatory phrases directed at the LGBTQIA+ community from X/Twitter, Instagram and TikTok (197 phrases).
- Labelling by 3 raters as non-LGBTphobic (0) or LGBTphobic (1).
- Text augmentation using back-translation and random synonym replacement.
- Part of the McGiff, J., & Nikolov, N. S. (2024) dataset was translated into Spanish and added under the CC-BY-4.0 licence.
- Finally, we obtained 1234 labelled phrases for version 1.0.1 of LGBTQIAphobia_augmented.
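The Structure field describes semicolon-separated rows of id, phrase and class. A minimal sketch for reading them with Python's standard csv module follows. The function name parse_dataset is hypothetical, and the header-row and embedded-semicolon handling are assumptions: the description does not say whether the file has a header or how phrases containing ';' are quoted.

```python
import csv
import io
from typing import List, Tuple


def parse_dataset(text: str) -> List[Tuple[str, str, int]]:
    """Parse rows of the form id;phrase;class into (id, phrase, label) tuples.

    Assumptions (not stated in the dataset description): an optional header
    row whose first field is "id" is skipped, and a phrase containing an
    unquoted ';' is rejoined from the fields between id and class.
    """
    rows: List[Tuple[str, str, int]] = []
    for fields in csv.reader(io.StringIO(text), delimiter=";"):
        if not fields:
            continue  # blank line
        if fields[0].strip().lower() == "id":
            continue  # optional header row
        if len(fields) < 3:
            raise ValueError(f"malformed row: {fields!r}")
        row_id = fields[0].strip()
        phrase = ";".join(fields[1:-1])  # everything between id and class
        label = int(fields[-1])
        if label not in (0, 1):
            raise ValueError(f"unexpected class {label} for id {row_id}")
        rows.append((row_id, phrase, label))
    return rows


# Illustrative input with placeholder phrases (not real dataset rows).
sample = "id;phrase;class\n1;texto de ejemplo;0\n2;otro texto;1\n"
print(parse_dataset(sample))
```

To read the actual file from the Zenodo record, pass it the contents of open(path, encoding="utf-8-sig", newline="").read(); the file name is not given in the description, and the utf-8-sig codec also tolerates a leading byte-order mark.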
Class distribution
class   count (number of ids)
0       507
1       727
where class 0: non-LGBTphobic, 1: LGBTphobic
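The table reports 507 non-LGBTphobic and 727 LGBTphobic phrases, so the classes are moderately imbalanced. The sketch below recomputes the shares from those counts and derives inverse-frequency class weights that could be passed to a fine-tuning loss. The weighting formula total / (n_classes * count) is a common heuristic chosen here for illustration, not something the dataset authors prescribe; the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

LABEL_NAMES = {0: "non-LGBTphobic", 1: "LGBTphobic"}


def class_distribution(labels: Iterable[int]) -> Dict[int, Tuple[int, float]]:
    """Return {class: (count, share of total)} for each observed class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: (counts[c], counts[c] / total) for c in sorted(counts)}


def balanced_class_weights(labels: Iterable[int]) -> Dict[int, float]:
    """Inverse-frequency weights: total / (n_classes * count_per_class).

    A common heuristic for imbalanced data; an assumption, not part of
    the dataset description.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: total / (len(counts) * counts[c]) for c in sorted(counts)}


# Rebuild a label list from the counts reported in the class distribution table.
labels = [0] * 507 + [1] * 727
for c, (n, share) in class_distribution(labels).items():
    print(f"{c} ({LABEL_NAMES[c]}): {n} ({share:.1%})")
print(balanced_class_weights(labels))
```

With these counts the shares come out at about 41.1% and 58.9%, and the weights at roughly 1.22 for class 0 and 0.85 for class 1, so errors on the minority class count somewhat more during training.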
Created:
2024-12-27



