themendu/SafeC4
Source: Hugging Face | Updated: 2025-08-15 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/themendu/SafeC4
Description:
SafeC4 is a processed version of the C4 dataset annotated with harmfulness predictions. It is intended for content moderation, safer language-model training, and research on detecting harmful content in web text. Each entry keeps the original C4 fields (text, URL, and timestamp) and adds a prediction field that holds, for each harm category, a probability distribution over three classes: Safe, Topical, and Toxic.
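The sketch below shows one way such per-category distributions might be consumed, for example to drop web pages that any category scores as likely Toxic. It is a minimal illustration under stated assumptions: the key name "prediction", the harm category names ("violence", "hate"), the sample records, and the 0.5 threshold are all hypothetical and are not taken from the dataset card.

```python
from typing import Dict, List, TypedDict

# Assumed class labels for each harm category's distribution (from the description).
LABELS = ("Safe", "Topical", "Toxic")


class Record(TypedDict):
    text: str
    url: str
    timestamp: str
    # HYPOTHETICAL layout: harm category -> {class label: probability}
    prediction: Dict[str, Dict[str, float]]


def max_toxic(record: Record) -> float:
    """Return the highest 'Toxic' probability across all harm categories."""
    return max(dist["Toxic"] for dist in record["prediction"].values())


def is_well_formed(record: Record, tol: float = 1e-6) -> bool:
    """Check that every category holds a distribution over exactly the three labels."""
    for dist in record["prediction"].values():
        if set(dist) != set(LABELS):
            return False
        if abs(sum(dist.values()) - 1.0) > tol:
            return False
    return True


def filter_safe(records: List[Record], threshold: float = 0.5) -> List[Record]:
    """Keep records whose Toxic probability is below `threshold` in every category."""
    return [r for r in records if max_toxic(r) < threshold]


# HYPOTHETICAL sample records; values are illustrative, not real dataset rows.
EXAMPLE: List[Record] = [
    {
        "text": "How to bake sourdough bread at home.",
        "url": "https://example.com/a",
        "timestamp": "2019-04-25T12:57:54Z",
        "prediction": {
            "violence": {"Safe": 0.9, "Topical": 0.08, "Toxic": 0.02},
            "hate": {"Safe": 0.95, "Topical": 0.04, "Toxic": 0.01},
        },
    },
    {
        "text": "(placeholder for a harmful page)",
        "url": "https://example.com/b",
        "timestamp": "2019-04-26T08:15:00Z",
        "prediction": {
            "violence": {"Safe": 0.1, "Topical": 0.2, "Toxic": 0.7},
            "hate": {"Safe": 0.6, "Topical": 0.3, "Toxic": 0.1},
        },
    },
]

if __name__ == "__main__":
    for rec in EXAMPLE:
        print(rec["url"], is_well_formed(rec), max_toxic(rec))
    print([r["url"] for r in filter_safe(EXAMPLE)])
```

Taking the maximum Toxic probability across categories is a conservative choice: a page is dropped if any single category flags it. In practice the records would come from the Hugging Face datasets library (for example load_dataset("themendu/SafeC4", streaming=True)), but the split names and the actual prediction schema should be checked against the dataset card before relying on this layout.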
Provided by:
themendu



