GuardrailsAI/content-moderation

Name: GuardrailsAI/content-moderation
Creator: GuardrailsAI
Published: 2025-02-12 01:53:26
License: not specified

Source: Hugging Face. Updated: 2025-02-12. Indexed: 2025-04-12.

Download link:

https://hf-mirror.com/datasets/GuardrailsAI/content-moderation

Description:

The Jigsaw Toxic Comment Dataset is a large collection of approximately 159,000 comments from Wikipedia talk pages, labeled by human raters for six types of toxicity: toxic, severe toxic, obscene, threat, insult, and identity hate. A comment can carry more than one of these labels. The dataset was created for the Toxic Comment Classification Challenge hosted on Kaggle, to help develop models that identify and classify toxic online comments. The original dataset was split into training and test sets, with about 80% for training and 20% for testing. It has since been used in research projects and competitions aimed at improving online content moderation and creating safer online spaces.
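The multi-label structure and the 80/20 split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the label names follow the column names of the original Kaggle challenge and may differ in this copy of the data, the example rows are invented, and `active_labels` and `split_80_20` are hypothetical helpers rather than part of any published tooling.

```python
import random

# Label names as used in the original Kaggle challenge (assumption: the
# columns in this copy of the dataset may be named differently).
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def active_labels(row):
    """Return the names of the labels set to 1 for one comment."""
    return [name for name in LABELS if row.get(name, 0) == 1]


def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and split it into 80% / 20% parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical rows, invented purely for illustration.
rows = [
    {"comment_text": "example comment A", "toxic": 1, "insult": 1},
    {"comment_text": "example comment B"},
    {"comment_text": "example comment C", "toxic": 1, "obscene": 1, "threat": 1},
    {"comment_text": "example comment D", "identity_hate": 1},
    {"comment_text": "example comment E"},
]

train_rows, test_rows = split_80_20(rows)
for row in rows:
    print(row["comment_text"], active_labels(row))
```

A row with none of the six flags set yields an empty list, which is how a non-toxic comment would appear in this representation.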

Provided by:

GuardrailsAI
