1-800-SHARED-TASKS/civil_comments_Safety
Source: Hugging Face. Updated 2024-09-26. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/1-800-SHARED-TASKS/civil_comments_Safety
Description:
This dataset, Civil Comments, consists of comments from the Civil Comments platform that were posted between 2015 and 2017 on approximately 50 English-language news sites worldwide. Jigsaw extended the data with additional labels for toxicity and identity mentions. Its fields include text, toxicity, severe toxicity, obscene, threat, insult, identity attack, and sexual explicit. The dataset is split into training, validation, and test sets of 1,804,874, 97,320, and 97,320 samples respectively. The download size is 414.95 MB and the generated dataset size is 661.23 MB, for a total of about 1.08 GB of disk space.
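The split and storage figures in the description can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the dataset card: the helper names (`split_fractions`, `total_disk_gb`, `load_splits`) are hypothetical, and `load_splits` assumes the repository can be read with the Hugging Face `datasets` library under its default configuration, which the listing does not confirm.

```python
# Figures copied from the description; nothing here is measured.
SPLIT_SIZES = {"train": 1_804_874, "validation": 97_320, "test": 97_320}
DOWNLOAD_MB = 414.95
GENERATED_MB = 661.23


def split_fractions(sizes):
    """Return each split's share of the total number of samples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def total_disk_gb(download_mb, generated_mb):
    """Add download and generated sizes, converting MB to GB (1 GB = 1000 MB)."""
    return (download_mb + generated_mb) / 1000


def load_splits(repo_id="1-800-SHARED-TASKS/civil_comments_Safety"):
    """Hypothetical loader: assumes the `datasets` package is installed
    and that the repository loads with its default configuration."""
    from datasets import load_dataset  # imported lazily; needs network access

    return load_dataset(repo_id)


if __name__ == "__main__":
    for name, share in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {share:.2%}")
    print(f"disk: {total_disk_gb(DOWNLOAD_MB, GENERATED_MB):.2f} GB")
```

With the quoted numbers, the training split is roughly 90% of all samples, and the two sizes sum to the 1.08 GB stated in the description.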
Provided by:
1-800-SHARED-TASKS



