readerbench/ro-fb-offense

Name: readerbench/ro-fb-offense
Creator: readerbench
Published: 2023-02-20 13:26:28
License: Apache-2.0

Source: Hugging Face. Updated 2023-02-20; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/readerbench/ro-fb-offense

Overview:

RO-FB-Offense is a dataset for detecting offensive language in Romanian Facebook comments. It contains 4,455 user-generated comments annotated with the hierarchical tagset of the GermEval 2018 dataset: non-offensive language (OTHER) and offensive language (OFFENSIVE), with OFFENSIVE further divided into PROFANITY, INSULT, and ABUSE. The language is Romanian, and the size category is 1K to 10K examples. The task category is text classification, with hate speech detection as the specific task. The source data consists of Facebook comments; the dataset is listed as expert-generated, and the annotation was done by native speakers. Because the dataset contains harmful content such as abusive language and hate speech, its social impact should be considered before use.

Provider:

readerbench

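Summary of original information

Dataset overview: RO-FB-Offense

Dataset description

Dataset summary

Name: FB-RO-Offense
Content: 4,455 user-generated comments from Facebook live broadcasts, intended for detecting offensive language in Romanian.
Language: Romanian
Labels:
- OTHER: non-offensive language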
- OFFENSIVE:
  - PROFANITY
  - INSULT
  - ABUSE
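
The two-level label hierarchy above can be collapsed into a binary task. The following is a minimal sketch, assuming the fine-grained labels arrive as plain strings; the names FINE_TO_COARSE and to_coarse are illustrative and not part of the dataset.

```python
# Hypothetical helper: map each fine-grained label to its top-level class.
FINE_TO_COARSE = {
    "OTHER": "OTHER",
    "PROFANITY": "OFFENSIVE",
    "INSULT": "OFFENSIVE",
    "ABUSE": "OFFENSIVE",
}


def to_coarse(label: str) -> str:
    """Return OTHER or OFFENSIVE for a fine-grained label string."""
    try:
        return FINE_TO_COARSE[label]
    except KeyError:
        raise ValueError(f"unknown label: {label!r}") from None


coarse = [to_coarse(name) for name in ("OTHER", "INSULT", "ABUSE")]
print(coarse)
```

Raising on unknown strings, rather than defaulting to OTHER, keeps a mislabeled or misspelled row from silently counting as non-offensive.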

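Supported tasks and leaderboards

Task: text classification
Task ID: hate speech detection

Languages

Language: Romanian

Dataset structure

Data instances

Example: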

{ "sender": "$USER1208", "no_reacts": 1, "text": "PLACEHOLDER TEXT", "label": "OTHER" }

Data fields

sender: string
no_reacts: integer
text: string
label: categorical, one of OTHER, PROFANITY, INSULT, ABUSE
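
The field list above can be turned into a small typed record with basic validation. This is only a sketch: the Comment class and parse_record function are hypothetical names, and the reading of no_reacts as a reaction count is an assumption that the card does not state.

```python
from dataclasses import dataclass

# The four label values listed for the `label` field.
VALID_LABELS = frozenset({"OTHER", "PROFANITY", "INSULT", "ABUSE"})


@dataclass(frozen=True)
class Comment:
    sender: str     # user handle as given in the data, e.g. "$USER1208"
    no_reacts: int  # assumed to be the number of reactions on the comment
    text: str
    label: str

    def __post_init__(self) -> None:
        for name in ("sender", "text", "label"):
            if not isinstance(getattr(self, name), str):
                raise TypeError(f"{name} must be a string")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.no_reacts, bool) or not isinstance(self.no_reacts, int):
            raise TypeError("no_reacts must be an integer")
        if self.label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {self.label!r}")


def parse_record(record: dict) -> Comment:
    """Build a validated Comment from one raw record."""
    return Comment(
        sender=record["sender"],
        no_reacts=record["no_reacts"],
        text=record["text"],
        label=record["label"],
    )


example = parse_record(
    {"sender": "$USER1208", "no_reacts": 1, "text": "PLACEHOLDER TEXT", "label": "OTHER"}
)
print(example)
```

Note that OFFENSIVE is rejected here because the field list names only the four fine-grained values.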

Data splits

Splits: training set and test set
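
One way to fetch both splits is through the Hugging Face `datasets` library. The sketch below assumes that the library is installed, that network access is available, and that the splits are exposed under the names "train" and "test"; the card states only that a training set and a test set exist. The function name load_ro_fb_offense is illustrative.

```python
DATASET_ID = "readerbench/ro-fb-offense"
EXPECTED_SPLITS = ("train", "test")  # assumed split names


def load_ro_fb_offense():
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(DATASET_ID)
    missing = [name for name in EXPECTED_SPLITS if name not in dataset]
    if missing:
        raise KeyError(f"splits not found: {missing}")
    return dataset
```

Calling `load_ro_fb_offense()["train"][0]` would then be expected to return a record with the fields sender, no_reacts, text, and label.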

Dataset creation

Source data

Source: Facebook comments
Language producers: social media users

Annotations

Annotators: experts who are native Romanian speakers

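Personal and sensitive information

Handling: The data was public at the time of collection, and no personally identifiable information (PII) was removed.

Considerations for using the data

Social impact of the dataset

Impact: The data contains offensive language and could be used to develop and spread offensive language against various target groups.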

Licensing information

License: Apache-2.0
