SafeSora-Label

Name: SafeSora-Label
Creator: maas
Published: 2025-11-27 16:22:21
License: No description available

ModelScope Community. Updated: 2025-11-27. Indexed: 2025-02-08.

Download link:

https://modelscope.cn/datasets/PKU-Alignment/SafeSora-Label


Resource overview:

# Dataset Card for SafeSora

<span style="color: red;">Warning: this dataset contains data that may be offensive or harmful. The data are intended for research purposes, especially research that can make models less harmful. The views expressed in the data do not reflect the views of PKU-Alignment Team or any of its members. </span>

[[`🏠 Project Homepage`](https://sites.google.com/view/safe-sora)] [[`🤗 SafeSora Datasets`](https://huggingface.co/datasets/PKU-Alignment/SafeSora)] [[`🤗 SafeSora Label`](https://huggingface.co/datasets/PKU-Alignment/SafeSora-Label)] [[`🤗 SafeSora Evaluation`](https://huggingface.co/datasets/PKU-Alignment/SafeSora-Eval)] [[`⭐️ Github Repo`](https://github.com/PKU-Alignment/safe-sora)]

SafeSora is a human preference dataset designed to support safety alignment research in text-to-video generation, with the aim of improving the helpfulness and harmlessness of Large Vision Models (LVMs). It currently contains three types of data:

- A classification dataset (SafeSora-Label) of 57k+ text-video pairs, with multi-label classification over 12 harm labels for both the text prompts and the text-video pairs.
- A human preference dataset (SafeSora) of 51k+ instances from the text-to-video generation task, containing comparative relationships in terms of helpfulness and harmlessness, as well as four sub-dimensions of helpfulness.
- An evaluation dataset (SafeSora-Eval) containing 600 human-written prompts: 300 safety-neutral prompts and 300 red-team prompts constructed according to the 12 harm categories.

In the future, we will also open-source some baseline alignment algorithms that use these datasets.

# Multi-label Classification Dataset

The multi-label classification dataset contains 57k+ text-video pairs, each labeled with 12 harm tags. We perform multi-label classification on individual prompts as well as on the combination of each prompt and the video generated from it.

These 12 harm tags are defined as:

- S1: `Adult, Explicit Sexual Content`
- S2: `Animal Abuse`
- S3: `Child Abuse`
- S4: `Crime`
- S5: `Debated Sensitive Social Issue`
- S6: `Drug, Weapons, Substance Abuse`
- S7: `Insulting, Hateful, Aggressive Behavior`
- S8: `Violence, Injury, Gory Content`
- S9: `Racial Discrimination`
- S10: `Other Discrimination (Excluding Racial)`
- S11: `Terrorism, Organized Crime`
- S12: `Other Harmful Content`

The distribution of these 12 categories is shown below:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6426a75336ab74ff2bca0542/ffb_0BWUh54t_bc-z-Xo0.png)

In our dataset, nearly half of the prompts are safety-critical, while the remaining half are safety-neutral. Part of the prompts come from real online users; the rest were added by researchers to balance the dataset.

# Visualization example of data points

The multi-label classification dataset contains only classification information for text-video (T-V) pairs.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6426a75336ab74ff2bca0542/9WXTN3wsPJvjsMsKPAQgT.png)
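
As an illustration of how the 12 harm tags (S1 to S12) can be handled as multi-label targets, the sketch below encodes a set of active tags as a 12-element multi-hot vector and decodes it back. The helper names (`encode_labels`, `decode_labels`, `has_any_harm_tag`) are hypothetical and are not part of the SafeSora code base. The card does not describe the on-disk schema, so the code makes no assumption about field names; it only assumes the tag order S1 to S12 listed above.

```python
from typing import Iterable, List, Sequence

# The 12 harm tags in the order S1..S12 given in the dataset card.
HARM_TAGS: List[str] = [
    "Adult, Explicit Sexual Content",            # S1
    "Animal Abuse",                              # S2
    "Child Abuse",                               # S3
    "Crime",                                     # S4
    "Debated Sensitive Social Issue",            # S5
    "Drug, Weapons, Substance Abuse",            # S6
    "Insulting, Hateful, Aggressive Behavior",   # S7
    "Violence, Injury, Gory Content",            # S8
    "Racial Discrimination",                     # S9
    "Other Discrimination (Excluding Racial)",   # S10
    "Terrorism, Organized Crime",                # S11
    "Other Harmful Content",                     # S12
]


def encode_labels(active_tags: Iterable[str]) -> List[int]:
    """Hypothetical helper: turn a set of tag names into a multi-hot vector."""
    active = set(active_tags)
    unknown = active - set(HARM_TAGS)
    if unknown:
        raise ValueError(f"Unknown harm tags: {sorted(unknown)}")
    return [1 if tag in active else 0 for tag in HARM_TAGS]


def decode_labels(vector: Sequence[int]) -> List[str]:
    """Hypothetical helper: recover tag names from a multi-hot vector."""
    if len(vector) != len(HARM_TAGS):
        raise ValueError(f"Expected {len(HARM_TAGS)} entries, got {len(vector)}")
    return [tag for tag, bit in zip(HARM_TAGS, vector) if bit]


def has_any_harm_tag(vector: Sequence[int]) -> bool:
    """Assumption: an item counts as flagged when at least one tag is set."""
    return any(vector)


if __name__ == "__main__":
    vec = encode_labels(["Animal Abuse", "Crime"])
    print(vec)
    print(decode_labels(vec))
    print(has_any_harm_tag(vec))
```

Because a prompt and its text-video pair are labeled separately, the same encoding can be applied twice per item, giving one vector for the prompt alone and one for the prompt together with its generated video.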


Provider:

maas

Created:

2025-02-07
