hate-speech-portuguese/hate_speech_portuguese

Name: hate-speech-portuguese/hate_speech_portuguese
Creator: hate-speech-portuguese
Published: 2024-01-18 11:04:58
License: no description available

Source: Hugging Face. Updated 2024-01-18; indexed 2024-06-15.

Download link:

https://hf-mirror.com/datasets/hate-speech-portuguese/hate_speech_portuguese


Overview:

The dataset, named HateSpeechPortuguese, is intended for hate speech detection in Portuguese. It comprises 5,668 tweets, each with a binary label (hate vs. no-hate). Its features are the text, the label, hate speech annotations from three groups (G1 to G3), and the corresponding annotator information. The listing reports a single train split of 5,670 examples (826,130 bytes) and does not explain the difference from the 5,668 tweets cited in the summary.

Provider:

hate-speech-portuguese

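Summary of original information

Dataset overview

Dataset description

Dataset name: HateSpeechPortuguese
Dataset summary: a Portuguese hate speech detection dataset of 5,668 tweets with binary annotations (hate vs. no-hate).
Task category: text classification
Language: Portuguese
Multilinguality: monolingual
Size category: 1K<n<10K
Source data: original
Annotation creators: expert-generated
Language creators: found
License: unknown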

Dataset structure

Data fields

text: string, the text content
label: class label with two classes:
- 0: no-hate
- 1: hate
hatespeech_G1: string
annotator_G1: string
hatespeech_G2: string
annotator_G2: string
hatespeech_G3: string
annotator_G3: string

Data splits

train: training split with 5,670 examples, 826,130 bytes
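
As a rough illustration of how the fields above might be used, the following sketch maps the integer `label` values to their class names and counts examples per class. The helper names `label_distribution` and `load_train_split` are hypothetical, and the sample rows are made-up placeholders rather than real dataset content. The loader assumes the Hugging Face `datasets` package is installed, that network access is available, and that the repository can be loaded by the identifier shown in the download link; none of this is confirmed by the listing.

```python
from collections import Counter

# Label mapping taken from the "Data fields" list (0: no-hate, 1: hate).
LABEL_NAMES = {0: "no-hate", 1: "hate"}


def label_distribution(rows):
    """Count rows per class name.

    `rows` is any iterable of dict-like records with an integer `label` key.
    """
    counts = Counter(LABEL_NAMES[row["label"]] for row in rows)
    # Report both classes, even when one of them has no examples.
    return {name: counts.get(name, 0) for name in LABEL_NAMES.values()}


def load_train_split():
    """Load the train split (assumption: requires `datasets` and network access)."""
    from datasets import load_dataset

    return load_dataset(
        "hate-speech-portuguese/hate_speech_portuguese", split="train"
    )


# Hypothetical placeholder rows, NOT real dataset content.
sample_rows = [
    {"text": "exemplo 1", "label": 0},
    {"text": "exemplo 2", "label": 1},
    {"text": "exemplo 3", "label": 0},
]
sample_counts = label_distribution(sample_rows)
print(sample_counts)
```

If loading succeeds, `label_distribution(load_train_split())` would give the class balance of the train split, which is usually worth checking before training a binary classifier.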

Dataset creation

Dataset size

Download size: 763,846 bytes
Dataset size: 826,130 bytes

Citation information

@inproceedings{fortuna-etal-2019-hierarchically,
    title = "A Hierarchically-Labeled {P}ortuguese Hate Speech Dataset",
    author = "Fortuna, Paula and Rocha da Silva, Jo{\~a}o and Soler-Company, Juan and Wanner, Leo and Nunes, S{\'e}rgio",
    editor = "Roberts, Sarah T. and Tetreault, Joel and Prabhakaran, Vinodkumar and Waseem, Zeerak",
    booktitle = "Proceedings of the Third Workshop on Abusive Language Online",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-3510",
    doi = "10.18653/v1/W19-3510",
    pages = "94--104",
}
