PleIAs/ToxicCommons

Name: PleIAs/ToxicCommons
Creator: PleIAs
Published: 2024-11-03 16:10:57
License: No description available

Source: Hugging Face. Updated 2024-11-03; indexed 2024-12-14.

Download link:

https://hf-mirror.com/datasets/PleIAs/ToxicCommons

Resource summary:

Toxic Commons is a multilingual dataset of 2 million annotated public-domain text samples, used to train the Celadon model. It aims to support a better understanding of toxicity in multilingual and multicultural contexts. Each sample is classified along five axes of toxicity: race and origin-based bias, gender and sexuality-based bias, religious bias, ability bias, and violence and abuse. All samples were classified by the Llama 3.1 8B Instruct model using a custom system prompt, and the scripts and prompts used to generate the annotations are provided. Detailed information about the dataset and the annotation process can be found in the related paper and GitHub repository.
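For readers who want to inspect the records before downloading everything, the following is a minimal sketch of streaming the dataset with the Hugging Face `datasets` library. The split name "train" and the helper names `peek`, `column_names`, and `stream_toxic_commons` are assumptions made for illustration and do not come from the listing. The sketch deliberately assumes no column names and simply reports whatever fields the first rows contain.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List

DATASET_ID = "PleIAs/ToxicCommons"  # repository ID taken from the listing


def peek(records: Iterable[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Return the first n records from any iterable of dict-like rows."""
    return list(islice(records, n))


def column_names(rows: List[Dict[str, Any]]) -> List[str]:
    """Collect the column names seen in the rows, in first-seen order."""
    seen: Dict[str, None] = {}
    for row in rows:
        for key in row:
            seen.setdefault(key, None)
    return list(seen)


def stream_toxic_commons(split: str = "train"):
    """Stream the dataset instead of downloading it in full.

    Assumptions: the `datasets` package is installed, the Hub is reachable,
    and a split named "train" exists (not confirmed by the listing).
    """
    from datasets import load_dataset  # imported lazily; needs network access

    return load_dataset(DATASET_ID, split=split, streaming=True)


# Usage (requires network access):
#   rows = peek(stream_toxic_commons())
#   print(column_names(rows))
```

Streaming keeps the first look cheap for a corpus of this size. If the repository exposes several configurations or a different split name, the call in `stream_toxic_commons` would need to be adjusted accordingly.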

Provided by:

PleIAs
