ctoraman/large-scale-hate-speech-turkish-v2

Name: ctoraman/large-scale-hate-speech-turkish-v2
Creator: ctoraman
Published: 2023-11-30 11:49:38
License: not stated on this page (the original metadata below lists cc-by-nc-sa-4.0)

Source: Hugging Face · Updated: 2023-11-30 · Indexed: 2024-03-04

Download link:

https://hf-mirror.com/datasets/ctoraman/large-scale-hate-speech-turkish-v2

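Resource summary:

This dataset was released with the LREC 2022 paper "Large-Scale Hate Speech Detection with Cross-Domain Transfer". It is the second version of the Turkish dataset (Dataset v2) and contains 60,310 Turkish tweets, with annotator agreement above 80%. Each tweet record contains the following information: the tweet ID (from the Twitter API), a language ID (0 = Turkish), a topic ID (0 = religion, 1 = gender, 2 = race, 3 = politics, 4 = sports), and a hate label (0 = normal, 1 = offensive, 2 = hate).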

Provider:

ctoraman

Summary of original information

Dataset overview

Basic information

License: cc-by-nc-sa-4.0
Task category: text-classification
Language: tr
Tags: hate speech, hate speech detection, hate-speech, tweets, social media, topic, hate-speech-detection

Dataset description

Name: Dataset v2 (Turkish)
Content: 60,310 Turkish tweets, with annotator agreement above 80%.

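Data fields

TweetID: tweet ID from the Twitter API
LangID: language identifier; 0 = Turkish
TopicID: topic domain; 0 = religion, 1 = gender, 2 = race, 3 = politics, 4 = sports
HateLabel: final hate speech label; 0 = normal, 1 = offensive, 2 = hate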
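The fields above store only integer codes (and no tweet text), so a first step when working with the file is usually to map those codes back to readable names. The sketch below shows one way to do that in Python. It assumes the rows carry the column names TweetID, LangID, TopicID and HateLabel exactly as listed and that the codes may arrive as strings (as they do from csv.DictReader); the class and function names, as well as the two-row CSV excerpt with placeholder IDs, are hypothetical and for illustration only.

```python
import csv
import io
from dataclasses import dataclass

# Code tables copied from the "Data fields" list of this dataset card.
LANGUAGES = {0: "Turkish"}
TOPICS = {0: "religion", 1: "gender", 2: "race", 3: "politics", 4: "sports"}
HATE_LABELS = {0: "normal", 1: "offensive", 2: "hate"}


@dataclass(frozen=True)
class DecodedTweet:  # hypothetical helper type, not part of the release
    tweet_id: str
    language: str
    topic: str
    label: str


def decode_row(row: dict) -> DecodedTweet:
    """Map the integer codes of one annotation row to readable names.

    Assumes the column names TweetID, LangID, TopicID and HateLabel.
    The tweet ID is kept as a string so no formatting is lost.
    An undocumented code raises KeyError instead of being silently accepted.
    """
    return DecodedTweet(
        tweet_id=str(row["TweetID"]),
        language=LANGUAGES[int(row["LangID"])],
        topic=TOPICS[int(row["TopicID"])],
        label=HATE_LABELS[int(row["HateLabel"])],
    )


# Hypothetical two-row excerpt; the IDs are placeholders, not real tweets.
sample = "TweetID,LangID,TopicID,HateLabel\n1,0,3,2\n2,0,4,0\n"
rows = [decode_row(r) for r in csv.DictReader(io.StringIO(sample))]
for tweet in rows:
    print(tweet.tweet_id, tweet.topic, tweet.label)
```

Failing loudly on an unknown code is a deliberate choice here: if a local copy of the file uses a different encoding than the one documented on this card, the mismatch surfaces immediately rather than producing mislabeled examples.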

Citation

Toraman, C., Şahinuç, F., & Yilmaz, E. (2022, June). Large-Scale Hate Speech Detection with Cross-Domain Transfer. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 2215-2225).
