BeaverTails-V

Name: BeaverTails-V
Creator: maas
Published: 2025-12-19 14:10:03
License: No description provided

ModelScope Community | Updated: 2025-12-19 | Indexed: 2025-02-15

Download link:

https://modelscope.cn/datasets/PKU-Alignment/BeaverTails-V

Description:

**<span style="color: red;">Warning: This dataset may contain sensitive or harmful content. Users are advised to handle it with care and ensure that their use complies with relevant ethical guidelines and legal requirements.</span>**

### 1. Usage

To load the dataset with `load_dataset()`, use the following:

```python
from datasets import load_dataset

train_dataset = load_dataset('PKU-Alignment/BeaverTails-V', name='animal_abuse')['train']
eval_dataset = load_dataset('PKU-Alignment/BeaverTails-V', name='animal_abuse')['evaluation']
```

### 2. Dataset Summary

BeaverTails-V is a multimodal dataset designed to address various safety-related concerns, categorized into nine primary safety domains. It consists of diverse content, including textual information and images, which are classified into specific categories to assist in the detection of harmful and inappropriate content. The dataset is designed to help train AI models to identify safety risks and content violations effectively. Below is a breakdown of the safety categories in BeaverTails-V:

#### 1. Threats to National and Public Security

- **1-1. Terrorism and Extremism**: Content associated with national security threats, including terrorism, violent extremism, and the incitement of violence or subversive ideologies.
- **1-2. Cybersecurity Threats**: Content related to hacking, cyber-attacks, or any form of digital crime that threatens the safety and security of individuals or organizations.

#### 2. Health and Safety Risks

- **2-1. Public Health Hazards**: Content promoting unsafe health practices, misleading health information, unapproved medical methods, or drug and substance abuse.
- **2-2. Environmental Damage**: Content that highlights pollution, wildlife trafficking, and other activities that harm the environment and ecosystems.
- **2-3. Dangerous Behavior**: Content that depicts risky stunts or dangerous behaviors that could endanger personal safety if mimicked.

#### 3. Physical and Psychological Harm

- **3-1. Violence and Physical Harm**: Depictions of violence, abuse, and physical harm directed at individuals or animals.
- **3-2. Psychological Harm and Manipulation**: Content aimed at inflicting emotional or psychological harm, including intimidation, manipulation, or the encouragement of self-harm or suicide.
- **3-3. Animal Abuse**: Depictions of abuse, neglect, or unethical treatment of animals.

#### 4. Privacy and Consent Violations

- **4-1. Privacy Invasion and Surveillance**: Content that involves unauthorized surveillance, tracking, or the disclosure of personal information without consent.
- **4-2. Identity Misuse and Impersonation**: Content involving the impersonation of individuals, piracy, and copyright infringement, including the unauthorized use of identities for deceptive purposes.

#### 5. Discrimination and Harassment

- **5-1. Discriminatory Depictions**: Content that promotes discrimination or perpetuates harmful stereotypes based on race, religion, gender, disability, etc.
- **5-2. Insulting and Harassing Behavior**: Content designed to demean, insult, or harass individuals or groups, including bullying and other forms of verbal abuse.

#### 6. Unethical Betrayal and Deceptive Behavior

- **6-1. Deception in Personal Relationships**: Content showcasing infidelity, betrayal in friendships, or unethical behaviors in personal or professional relationships.
- **6-2. Financial and Academic Fraud**: Content related to financial fraud, academic dishonesty, cheating in exams, or deceit in financial and academic contexts.

#### 7. False Information and Sensitive Information

- **7-1. False Information**: Content that includes fabricated news, manipulated media, or false claims presented as facts, potentially misleading or deceiving viewers.
- **7-2. Sensitive Information in Key Areas**: Misinformation in sensitive areas like politics, religion, and national security that could incite social unrest or negatively impact public safety.

#### 8. Disturbing or Unsettling Content

- **8-1. Horror and Gore**: Graphic and vivid depictions of horror, including blood and cruelty, intended to evoke fear or disgust.
- **8-2. Psychological Horror and Dark Themes**: Content that explores psychological horror, emotional isolation, and other unsettling themes designed to disturb viewers mentally.

#### 9. Sexually Explicit and Inappropriate Content

- **9-1. Pornographic Content**: Explicit content intended for sexual stimulation, including depictions of nudity, sexual acts, and sexually suggestive material. It is often restricted to adult audiences and requires careful handling.
- **9-2. Sexual Crimes**: Content involving illegal or non-consensual sexual activity, such as depictions involving minors or intimate content shared without consent, which is a serious violation of ethical and legal standards.

### 3. Dataset Analysis

#### 3.1 Response Distribution

We used six vision language models to answer the questions and chose two different responses for each pair of data.

<img src="./analysis/plot_response_source/Distribution_of_VLMS.png" alt="Distribution of VLMS" style="width:70%;">

#### 3.2 Length Distribution

<img src="./analysis/plot_length/Question_Length_Distribution.png" alt="Question Length Distribution" style="width:70%;">

<img src="./analysis/plot_length/Response_Length_Distribution.png" alt="Response Length Distribution" style="width:70%;">

### 4. Data Fields

| Idx | Key | Description |
| ---- | -------------------------- | ------------------------------------------------------------ |
| 0 | `question` | The input question or prompt that the model needs to respond to. |
| 1 | `image` | The associated image file, if applicable, that provides additional context for the question. |
| 2 | `category` | The category or classification of the image, indicating its subject or content type. |
| 3 | `image_severity` | The safety level of the image, assessing its appropriateness or potential risk. |
| 4 | `response` | The response generated by a vision language model for the given question. |
| 5 | `is_response_safe` | A categorical value (`"yes"` or `"no"`) indicating whether `response` is considered safe. |

### 5. Citation

Please cite our work if you use the data or model in your paper.

```
@misc{ji2025safe,
      title={Safe RLHF-V: Safe Reinforcement Learning from Human Feedback in Multimodal Large Language Models},
      author={Jiaming Ji and Xinyu Chen and Rui Pan and Han Zhu and Conghui Zhang and Jiahao Li and Donghai Hong and Boyuan Chen and Jiayi Zhou and Kaile Wang and Juntao Dai and Chi-Min Chan and Sirui Han and Yike Guo and Yaodong Yang},
      journal={arXiv preprint arXiv:2503.17682},
      year={2025},
}
```
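
### Example: tallying safety labels

As a rough illustration of how the fields listed under Data Fields might be used, the sketch below counts `is_response_safe` values per `category`. The helper name `count_safety_labels` and the sample rows are hypothetical and made up for illustration; in particular, the assumption that the `category` field holds a string such as `animal_abuse` (matching the config name) is not confirmed by the dataset card.

```python
from collections import Counter


def count_safety_labels(records):
    """Tally `is_response_safe` values ("yes"/"no") for each `category`.

    `records` is any iterable of dict-like rows that expose the
    `category` and `is_response_safe` keys from the Data Fields table.
    """
    counts = {}
    for row in records:
        per_category = counts.setdefault(row["category"], Counter())
        per_category[row["is_response_safe"]] += 1
    return counts


# Hypothetical rows for illustration only; they are NOT taken from the dataset.
# The `category` value format is an assumption.
sample_rows = [
    {"question": "q1", "category": "animal_abuse", "response": "r1", "is_response_safe": "yes"},
    {"question": "q1", "category": "animal_abuse", "response": "r2", "is_response_safe": "no"},
    {"question": "q2", "category": "animal_abuse", "response": "r3", "is_response_safe": "no"},
]

summary = count_safety_labels(sample_rows)
for category, labels in summary.items():
    print(category, dict(labels))

# With the real data (requires network access and the `datasets` package):
# from datasets import load_dataset
# train = load_dataset('PKU-Alignment/BeaverTails-V', name='animal_abuse')['train']
# # Keeping only the label columns avoids decoding every image while iterating.
# print(count_safety_labels(train.select_columns(["category", "is_response_safe"])))
```

The helper only relies on the two keys it reads, so it should work on plain dictionaries as well as on rows yielded by a loaded split.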


Provider:

maas

Created:

2025-02-10
