Aakash941/THAR-Dataset

Name: Aakash941/THAR-Dataset
Creator: Aakash941
Published: 2024-03-31 06:19:45
License: CC-BY-4.0

Source: Hugging Face. Updated 2024-03-31; indexed 2024-06-11.

Download link:

https://hf-mirror.com/datasets/Aakash941/THAR-Dataset

Resource description:

---
license: cc-by-4.0
task_categories:
- text-classification
language:
- hi
- en
pretty_name: Targeted Hate Speech Against Religion
size_categories:
- 10K<n<100K
---

The dataset consists of 11,549 YouTube comments in Hindi-English code-mixed language for detecting targeted hate speech against religion. The comments carry both binary and multi-class tags, corresponding to two subtasks:

Subtask-1 (Binary classification): comments are labeled as antireligion or non-antireligion.
Subtask-2 (Multi-class classification): comments are labeled by the major targeted religion (Islam, Hinduism, or Christianity), with a ‘none’ class also provided.

For more information, refer to this paper: Sharma, D., Singh, A., & Singh, V. K. (2024). THAR-Targeted Hate Speech Against Religion: A high-quality Hindi-English code-mixed Dataset with the Application of Deep Learning Models for Automatic Detection. ACM Transactions on Asian and Low-Resource Language Information Processing. https://doi.org/10.1145/3653017

Provided by:

Aakash941

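Summary of original information

Dataset overview

Basic information

License: CC-BY-4.0
Task category: text classification
Language: Hindi-English code-mixed
Dataset name: Targeted Hate Speech Against Religion
Dataset size: 10,000 < n < 100,000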

Data content

Data source: YouTube comments
Data volume: 11,549 comments
Language characteristics: Hindi-English code-mixed

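Task description

Subtask 1 (binary classification): comments are labeled as antireligion or non-antireligion.
Subtask 2 (multi-class classification): comments are labeled by the major targeted religion (Islam, Hinduism, or Christianity), with a "none" class also provided.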
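
The two label schemes can be illustrated with a minimal Python sketch. It is not taken from the dataset's own tooling: the field names `binary` and `target`, the exact label spellings, and the sample records are all hypothetical placeholders, so check them against the released files before reuse.

```python
from collections import Counter

# Label sets as described for the two subtasks. The exact string
# spellings used in the released files are an assumption.
BINARY_LABELS = {"antireligion", "non-antireligion"}
TARGET_LABELS = {"Islam", "Hinduism", "Christianity", "none"}


def validate(record):
    """Return True if both labels of a record belong to the expected sets."""
    return (
        record["binary"] in BINARY_LABELS
        and record["target"] in TARGET_LABELS
    )


def label_counts(records, key):
    """Count how often each label occurs for the given subtask field."""
    return Counter(r[key] for r in records)


# Made-up placeholder records, not real comments from the dataset.
# The field names "text", "binary", and "target" are hypothetical.
sample = [
    {"text": "placeholder 1", "binary": "antireligion", "target": "Islam"},
    {"text": "placeholder 2", "binary": "antireligion", "target": "Hinduism"},
    {"text": "placeholder 3", "binary": "non-antireligion", "target": "none"},
    {"text": "placeholder 4", "binary": "antireligion", "target": "Christianity"},
]

all_valid = all(validate(r) for r in sample)
binary_counts = label_counts(sample, "binary")
target_counts = label_counts(sample, "target")
```

Checking label membership and class counts in this way is a common first step before training a classifier, since it exposes spelling variants and class imbalance early.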

References

Sharma, D., Singh, A., & Singh, V. K. (2024). THAR-Targeted Hate Speech Against Religion: A high-quality Hindi-English code-mixed Dataset with the Application of Deep Learning Models for Automatic Detection. ACM Transactions on Asian and Low-Resource Language Information Processing. https://doi.org/10.1145/3653017
