hate-speech-portuguese/hate_speech_portuguese
Source: Hugging Face. Updated 2024-01-18. Indexed 2024-06-15.
Download link:
https://hf-mirror.com/datasets/hate-speech-portuguese/hate_speech_portuguese
Resource overview:
HateSpeechPortuguese is a dataset for hate speech detection in Portuguese. It comprises 5,668 tweets, each annotated with a binary label (hate vs. no-hate). Its features are the text content, the label, hate speech annotations from three groups (G1 to G3), and the corresponding annotator information. The dataset has a single train split of 826,130 bytes, listed with 5,670 examples, two more than the 5,668 tweets given in the summary.
Provider:
hate-speech-portuguese
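Summary of original information
Dataset overview
Dataset description
- Dataset name: HateSpeechPortuguese
- Dataset summary: Portuguese hate speech detection dataset of 5,668 tweets with binary annotations (hate vs. no-hate).
- Task category: text classification
- Language: Portuguese
- Multilinguality: monolingual
- Size category: 1K<n<10K
- Source data: original
- Annotation creators: expert-generated
- Language creators: found
- License: unknown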
Dataset structure
Data fields
- text: string, the text content
- label: class label with two classes:
  - 0: no-hate
  - 1: hate
- hatespeech_G1: string
- annotator_G1: string
- hatespeech_G2: string
- annotator_G2: string
- hatespeech_G3: string
- annotator_G3: string
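The field list above can be mirrored as a small typed record, which makes the label encoding explicit when rows are handled in code. This is a minimal sketch using only the standard library. The names `HateSpeechRecord`, `LABEL_NAMES`, and `record_from_row` are hypothetical, and the sample row is made up for illustration, including its placeholder annotation and annotator values.

```python
from dataclasses import dataclass

# Label encoding as listed in the data fields: 0 is no-hate, 1 is hate.
LABEL_NAMES = {0: "no-hate", 1: "hate"}


@dataclass(frozen=True)
class HateSpeechRecord:
    """One row, with the eight fields listed in the data fields section."""

    text: str
    label: int
    hatespeech_G1: str
    annotator_G1: str
    hatespeech_G2: str
    annotator_G2: str
    hatespeech_G3: str
    annotator_G3: str

    def __post_init__(self) -> None:
        # Reject labels outside the two documented classes.
        if self.label not in LABEL_NAMES:
            raise ValueError(f"label must be 0 or 1, got {self.label!r}")

    @property
    def label_name(self) -> str:
        return LABEL_NAMES[self.label]


def record_from_row(row: dict) -> HateSpeechRecord:
    """Build a record from a dict keyed by the documented field names."""
    return HateSpeechRecord(**row)


# Made-up row for illustration only; the values are placeholders, not real data.
example_row = {
    "text": "exemplo de tweet",
    "label": 0,
    "hatespeech_G1": "0",
    "annotator_G1": "A",
    "hatespeech_G2": "0",
    "annotator_G2": "B",
    "hatespeech_G3": "0",
    "annotator_G3": "C",
}
record = record_from_row(example_row)
print(record.label_name)
```

Keeping the six annotation fields as plain strings follows the listed types; the page does not document their value sets, so the sketch does not interpret them.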
Data splits
- train: training split with 5,670 examples, 826,130 bytes
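Since the dataset ships a single train split, a typical first step is to load it and look at the class balance. The sketch below assumes the Hugging Face `datasets` package is installed and that the Hub identifier matches the path in the download link; `load_train_split` and `label_distribution` are hypothetical helper names. The library import is deferred so that the counting helper runs without network access.

```python
from collections import Counter

# Assumption: the Hub ID is the same as the path in the download link.
DATASET_ID = "hate-speech-portuguese/hate_speech_portuguese"
LABEL_NAMES = {0: "no-hate", 1: "hate"}


def load_train_split():
    """Load the train split; requires the `datasets` package and network access."""
    from datasets import load_dataset  # imported lazily on purpose

    return load_dataset(DATASET_ID, split="train")


def label_distribution(labels):
    """Count examples per class name from an iterable of integer labels."""
    return dict(Counter(LABEL_NAMES[label] for label in labels))


# Usage, once the package is installed and the Hub is reachable:
#   train = load_train_split()
#   print(len(train))
#   print(label_distribution(train["label"]))
```

Comparing `len(train)` with the 5,670 examples listed for the split is a cheap check that the download is complete.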
Dataset creation
Dataset size
- Download size: 763,846 bytes
- Dataset size: 826,130 bytes
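The reported sizes give a quick feel for how small the corpus is. The following sketch only does arithmetic on the figures listed on this page; the function names are hypothetical.

```python
# Figures as reported on this page.
TRAIN_EXAMPLES = 5_670
DATASET_BYTES = 826_130
DOWNLOAD_BYTES = 763_846


def bytes_per_example(total_bytes: int, n_examples: int) -> float:
    """Average stored size of one example, in bytes."""
    if n_examples <= 0:
        raise ValueError("n_examples must be positive")
    return total_bytes / n_examples


def size_ratio(download_bytes: int, dataset_bytes: int) -> float:
    """Download size as a fraction of the prepared dataset size."""
    if dataset_bytes <= 0:
        raise ValueError("dataset_bytes must be positive")
    return download_bytes / dataset_bytes


print(f"{bytes_per_example(DATASET_BYTES, TRAIN_EXAMPLES):.1f} bytes per example")
print(f"{size_ratio(DOWNLOAD_BYTES, DATASET_BYTES):.3f} download/dataset ratio")
```

The result is about 146 bytes per example, including all eight fields, and a download that is roughly 92% of the prepared dataset size.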
Citation information
@inproceedings{fortuna-etal-2019-hierarchically,
    title = "A Hierarchically-Labeled {P}ortuguese Hate Speech Dataset",
    author = "Fortuna, Paula and
      Rocha da Silva, Jo{\~a}o and
      Soler-Company, Juan and
      Wanner, Leo and
      Nunes, S{\'e}rgio",
    editor = "Roberts, Sarah T. and
      Tetreault, Joel and
      Prabhakaran, Vinodkumar and
      Waseem, Zeerak",
    booktitle = "Proceedings of the Third Workshop on Abusive Language Online",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-3510",
    doi = "10.18653/v1/W19-3510",
    pages = "94--104",
}



