
GuardrailsAI/content-moderation

Source: Hugging Face. Updated 2025-02-12. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/GuardrailsAI/content-moderation
Description:
The Jigsaw Toxic Comment Dataset is a large collection of approximately 159,000 comments from Wikipedia talk pages, labeled by human raters for six types of toxicity: toxic, severe toxic, obscene, threat, insult, and identity hate. A single comment can carry more than one of these labels. The dataset was created for the Toxic Comment Classification Challenge hosted on Kaggle, to help develop models that identify and classify toxic online comments. The original dataset was split into training and test sets, with about 80% for training and 20% for testing. It has since been used in research projects and competitions aimed at improving online content moderation and creating safer online spaces.
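Because a comment can carry several labels at once, each record is naturally a multi-label row rather than a single class. The sketch below illustrates that layout. It assumes the six column names of the original Kaggle CSV (`toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`) and a `comment_text` field; this mirror may name its columns differently. The helper `active_labels` and the sample records are hypothetical and made up for illustration.

```python
# Label columns as named in the original Kaggle CSV.
# Assumption: this mirror may use different column names.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def active_labels(row):
    """Return the names of all labels set to 1 for one comment record."""
    return [name for name in LABELS if int(row.get(name, 0)) == 1]


# Hypothetical records, not taken from the dataset.
examples = [
    {"comment_text": "<benign comment>", "toxic": 0, "severe_toxic": 0,
     "obscene": 0, "threat": 0, "insult": 0, "identity_hate": 0},
    {"comment_text": "<abusive comment>", "toxic": 1, "severe_toxic": 0,
     "obscene": 1, "threat": 0, "insult": 1, "identity_hate": 0},
]

for row in examples:
    print(row["comment_text"], active_labels(row))
```

A model trained on this layout would typically predict six independent binary outputs per comment instead of choosing one class.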
Provider:
GuardrailsAI