tcapelle/kaggle-toxic-annotated
Source: Hugging Face. Updated 2024-11-30. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/tcapelle/kaggle-toxic-annotated
Resource description:
This dataset is a version of the Kaggle toxicity dataset annotated by the gpt-4o-mini model, using the same prompt that was used for Toxic-Commons [Celadon]. Each record contains the comment text and six toxicity-related labels: toxic, severe_toxic, obscene, threat, insult, and identity_hate. It also includes a structured field named toxic_commons_label, whose subfields hold the reasoning and a score for several categories of discrimination and violence, such as ableism, sexism, racism, and religious discrimination. The dataset is split into a training set of 159,570 samples and a test set of 153,163 samples. The download size is 159,301,273 bytes and the total dataset size is 370,369,620 bytes.
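As a rough illustration of the record layout described above, the following Python sketch counts how often each of the six labels is set. The label names come from the description; the `comment_text` column name, the `"train"` split name, and the helper names `count_positive_labels` and `load_split` are assumptions made for this example, and the sample rows are invented rather than taken from the dataset.

```python
from collections import Counter
from typing import Iterable, Mapping

# The six toxicity labels named in the description.
LABEL_COLUMNS = (
    "toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate",
)


def count_positive_labels(rows: Iterable[Mapping[str, object]]) -> Counter:
    """Count how many rows have each label set to 1."""
    counts = Counter({name: 0 for name in LABEL_COLUMNS})
    for row in rows:
        for name in LABEL_COLUMNS:
            if row.get(name) == 1:
                counts[name] += 1
    return counts


def load_split(split: str = "train"):
    """Load one split from the Hugging Face Hub.

    Needs network access and the `datasets` package; the split name is an
    assumption based on the train/test division in the description.
    """
    from datasets import load_dataset  # lazy import keeps the helper above usable offline

    return load_dataset("tcapelle/kaggle-toxic-annotated", split=split)


# Hypothetical rows for illustration only; "comment_text" is an assumed column name.
sample_rows = [
    {"comment_text": "example a", "toxic": 1, "severe_toxic": 0, "obscene": 1,
     "threat": 0, "insult": 1, "identity_hate": 0},
    {"comment_text": "example b", "toxic": 0, "severe_toxic": 0, "obscene": 0,
     "threat": 0, "insult": 0, "identity_hate": 0},
]
sample_counts = count_positive_labels(sample_rows)
```

With network access, `count_positive_labels(load_split("train"))` would apply the same count to the real training split, provided the columns are named as assumed here.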
Provider:
tcapelle



