Aakash941/THAR-Dataset
Source: Hugging Face | Updated: 2024-03-31 | Indexed: 2024-06-11
Download link:
https://hf-mirror.com/datasets/Aakash941/THAR-Dataset
Description:
---
license: cc-by-4.0
task_categories:
- text-classification
language:
- hi
- en
pretty_name: Targeted Hate Speech Against Religion
size_categories:
- 10K<n<100K
---
The dataset consists of 11,549 YouTube comments in Hindi-English code-mixed language for detecting targeted hate speech against religion. Comments are tagged with both binary and multi-class labels.
The classification of YouTube comments addresses two subtasks. Subtask-1 (binary classification): comments are labeled as antireligion or non-antireligion. Subtask-2 (multi-class classification): comments are labeled by the major targeted religion, such as Islam, Hinduism, or Christianity, with a ‘none’ class also provided.
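The two labeling schemes described above can be sketched as a pair of label vocabularies. This is a minimal illustration, not the authors' code: the label strings, their integer ordering, and the helper names `build_label_maps` and `encode_labels` are assumptions, and the actual dataset files may name or encode the classes differently.

```python
# Hypothetical label vocabularies for the two subtasks.
# The exact strings and their integer order are assumptions; they are
# not taken from the dataset files.
BINARY_LABELS = ["non-antireligion", "antireligion"]
MULTICLASS_LABELS = ["none", "Islam", "Hinduism", "Christianity"]


def build_label_maps(labels):
    """Return (label -> id, id -> label) dictionaries for a list of labels."""
    label2id = {label: idx for idx, label in enumerate(labels)}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


BINARY_LABEL2ID, BINARY_ID2LABEL = build_label_maps(BINARY_LABELS)
MULTI_LABEL2ID, MULTI_ID2LABEL = build_label_maps(MULTICLASS_LABELS)


def encode_labels(binary_label, multiclass_label):
    """Map one comment's two string labels to integer ids.

    Raises KeyError if a label is not in the assumed vocabulary.
    """
    return BINARY_LABEL2ID[binary_label], MULTI_LABEL2ID[multiclass_label]
```

Under these assumed orderings, `encode_labels("antireligion", "Hinduism")` returns `(1, 2)`. Keeping the two vocabularies separate lets a model be trained on either subtask without re-deriving the mapping.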
For more information, refer to this paper: Sharma, D., Singh, A., & Singh, V. K. (2024). THAR-Targeted Hate Speech Against Religion: A high-quality Hindi-English code-mixed Dataset with the Application of Deep Learning Models for Automatic Detection. ACM Transactions on Asian and Low-Resource Language Information Processing.
https://doi.org/10.1145/3653017
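To work with the comments programmatically, one option is the Hugging Face `datasets` library. The sketch below assumes that library is installed and that the repository loads with its default configuration. The split names and column names are not stated on this page, so `label_column` is a placeholder to fill in after inspecting the loaded data; `load_thar` and `label_distribution` are hypothetical helper names.

```python
from collections import Counter

# Repository id as given on the dataset page.
REPO_ID = "Aakash941/THAR-Dataset"


def load_thar(repo_id=REPO_ID):
    """Download the dataset from the Hugging Face Hub.

    Assumes `pip install datasets`. The import is done lazily so that
    the rest of this module can be used without the library installed.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


def label_distribution(rows, label_column):
    """Count how often each label value occurs in an iterable of dict-like rows.

    `label_column` is a placeholder: check the real column names first.
    """
    return Counter(row[label_column] for row in rows)
```

A typical first step would be to call `load_thar()` and print the result to see which splits and columns exist, then pass one split and the relevant column name to `label_distribution` to check the class balance for each subtask.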
Provider:
Aakash941
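Summary of original information
Dataset overview
Basic information
- License: CC-BY-4.0
- Task category: text classification
- Language: Hindi-English code-mixed
- Dataset name: Targeted Hate Speech Against Religion
- Dataset size: 10,000 < n < 100,000
Data content
- Data source: YouTube comments
- Data volume: 11,549 comments
- Language characteristics: Hindi-English code-mixing
Task description
- Subtask 1 (binary classification): comments are labeled as antireligion or non-antireligion.
- Subtask 2 (multi-class classification): comments are labeled by the major targeted religion (Islam, Hinduism, Christianity), with a "none" class also provided.
References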
- Sharma, D., Singh, A., & Singh, V. K. (2024). THAR-Targeted Hate Speech Against Religion: A high-quality Hindi-English code-mixed Dataset with the Application of Deep Learning Models for Automatic Detection. ACM Transactions on Asian and Low-Resource Language Information Processing. https://doi.org/10.1145/3653017



