
Aakash941/THAR-Dataset

Source: Hugging Face. Updated 2024-03-31. Indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/Aakash941/THAR-Dataset
Description:
---
license: cc-by-4.0
task_categories:
- text-classification
language:
- hi
- en
pretty_name: Targeted Hate Speech Against Religion
size_categories:
- 10K<n<100K
---
The dataset consists of 11,549 YouTube comments in Hindi-English code-mixed language for detecting targeted hate speech against religion. The comments carry both binary and multi-class labels, and their classification addresses two subtasks:
  • Subtask-1 (Binary classification): comments are labeled as antireligion or non-antireligion.
  • Subtask-2 (Multi-class classification): comments are labeled by the major targeted religion, such as Islam, Hinduism, or Christianity, with a ‘none’ class also provided.
For more information, refer to this paper: Sharma, D., Singh, A., & Singh, V. K. (2024). THAR-Targeted Hate Speech Against Religion: A high-quality Hindi-English code-mixed Dataset with the Application of Deep Learning Models for Automatic Detection. ACM Transactions on Asian and Low-Resource Language Information Processing. https://doi.org/10.1145/3653017
Provided by:
Aakash941
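Summary of original information

Dataset overview

Basic information

  • License: CC-BY-4.0
  • Task category: text classification
  • Language: Hindi-English (code-mixed)
  • Dataset name: Targeted Hate Speech Against Religion
  • Dataset size: 10,000 < n < 100,000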

Data content

  • Data source: YouTube comments
  • Data volume: 11,549 comments
  • Language characteristics: Hindi-English code-mixed

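Task description

  • Subtask 1 (binary classification): comments are labeled as antireligion or non-antireligion.
  • Subtask 2 (multi-class classification): comments are labeled by the major targeted religion (Islam, Hinduism, Christianity), with a "none" class also provided.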
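
The two label schemes described above can be sketched in Python as follows. This is a minimal illustration only: the class names, the `AnnotatedComment` record, the label spellings, and the integer ids are assumptions made for the example and are not taken from the dataset files, whose actual column names and label encodings are not stated here.

```python
from dataclasses import dataclass
from enum import Enum


class BinaryLabel(Enum):
    """Subtask 1 labels (string spellings are assumed, not from the data files)."""
    ANTIRELIGION = "antireligion"
    NON_ANTIRELIGION = "non-antireligion"


class TargetLabel(Enum):
    """Subtask 2 labels (string spellings are assumed, not from the data files)."""
    ISLAM = "Islam"
    HINDUISM = "Hinduism"
    CHRISTIANITY = "Christianity"
    NONE = "none"


@dataclass(frozen=True)
class AnnotatedComment:
    """Hypothetical record holding one comment with both annotations."""
    text: str
    binary: BinaryLabel
    target: TargetLabel


def encode(label_enum):
    """Map each label to an integer id in declaration order."""
    return {label: idx for idx, label in enumerate(label_enum)}


BINARY_IDS = encode(BinaryLabel)
TARGET_IDS = encode(TargetLabel)

# Placeholder record; the text is not a real comment from the dataset.
example = AnnotatedComment(
    text="<comment text>",
    binary=BinaryLabel.NON_ANTIRELIGION,
    target=TargetLabel.NONE,
)
example_ids = (BINARY_IDS[example.binary], TARGET_IDS[example.target])
```

Keeping the two subtasks as separate enumerations means a classifier for either subtask can be trained against its own id mapping without assuming how the binary label relates to the targeted religion.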

Reference

  • Sharma, D., Singh, A., & Singh, V. K. (2024). THAR-Targeted Hate Speech Against Religion: A high-quality Hindi-English code-mixed Dataset with the Application of Deep Learning Models for Automatic Detection. ACM Transactions on Asian and Low-Resource Language Information Processing. https://doi.org/10.1145/3653017