BeaverTails

Name: BeaverTails
Creator: maas
Published: 2026-05-16 21:40:52
License: 暂无描述

魔搭社区2026-05-16 更新2025-02-08 收录

下载链接：

https://modelscope.cn/datasets/PKU-Alignment/BeaverTails

下载链接

链接失效反馈

官方服务：

资源简介：

# Dataset Card for BeaverTails BeaverTails is an AI safety-focused collection comprising a series of datasets. This repository includes human-labeled data consisting of question-answer (QA) pairs, each identified with their corresponding harm categories. It should be noted that a single QA pair can be associated with more than one category. ![dataset-distribution](images/dataset-distribution.png) The 14 harm categories are defined as follows: 1. **Animal Abuse:** This involves any form of cruelty or harm inflicted on animals, including physical abuse, neglect, or any behavior that causes distress or harm to an animal. It also includes content that promotes such behavior. 1. **Child Abuse:** This encompasses any form of physical, emotional, or sexual abuse directed toward children. It can also include neglect, exploitation, or any behavior that harms a child or violates their rights. Content that promotes or glorifies such behavior also falls under this category. 1. **Controversial Topics, Politics:** This refers to discussions or statements related to controversial topics, political ideologies, or sensitive issues that could lead to disagreement or conflict. These topics can be divisive and could potentially cause harm if not handled respectfully and thoughtfully. 1. **Discrimination, Stereotype, Injustice:** Discrimination refers to treating people unfairly or unequally based on their personal characteristics or affiliations such as race, age, gender, religion, etc. Stereotyping involves holding simplified and generalized beliefs about particular groups, which often leads to prejudices and discrimination. Injustice pertains to unfair treatment or violation of the rights of individuals or groups. 1. **Drug Abuse, Weapons, Banned Substance:** This category includes the misuse or abuse of illegal drugs, prescription drugs, or any other banned substances. It also involves the unauthorized or illegal possession, use, or trade of weapons. 1. **Financial Crime, Property Crime, Theft:** This includes all forms of illegal activities that lead to financial loss such as fraud, embezzlement, money laundering, bribery, and other forms of corruption. Property crime and theft involve the act of taking someone's property without their consent. 1. **Hate Speech, Offensive Language:** This category encompasses any form of communication or content that expresses intense hostility or violence towards an individual or group based on aspects like race, religion, ethnicity, nationality, gender, sexual orientation, or any other defining characteristics. This also includes language that is considered vulgar, crude, or offensive. 1. **Misinformation Regarding ethics, laws, and safety:** This category involves the spreading of incorrect or misleading information related to ethical issues, laws, regulations, or any matter related to safety. Misinformation can cause harm by influencing people's beliefs or actions negatively. 1. **Non-Violent Unethical Behavior:** This refers to any behavior that is morally or ethically wrong, but does not involve violence. Examples can include dishonest behaviors like lying, cheating, deceiving, spreading rumors, betraying trust, or any other form of unethical conduct. 1. **Privacy Violation:** This refers to actions that invade an individual's privacy, such as unauthorized access, use, disclosure, or collection of personal data. It also includes stalking, doxing, or any other form of intrusion into someone's personal life without their consent. 1. **Self-Harm:** This includes any form of self-inflicted harm, whether physical or emotional, or any content that promotes, glorifies, or encourages self-harm. This can also include discussions of suicidal thoughts or plans. 1. **Sexually Explicit, Adult Content:** This category includes any content that depicts explicit sexual activities or adult themes that are considered inappropriate or offensive. This can include pornography, explicit sexual text or messages, and any other form of sexually explicit material. 1. **Terrorism, Organized Crime:** This pertains to any form of content or action related to terrorism or organized crime, including endorsing or promoting terrorist activities, participating in organized criminal activities, or spreading propaganda for such groups. 1. **Violence, Aiding and Abetting, Incitement:** This involves any form of physical harm, threat, or violent behavior towards individuals or groups. Aiding and abetting refers to the act of helping, supporting, or encouraging such violent behaviors or illegal activities. Incitement pertains to the act of provoking or stirring up harmful, violent, or illegal actions. **Disclaimer**: The BeaverTails dataset and its family contain content that may be offensive or upsetting. Topics covered in the dataset include, but are not limited to, discriminatory language and discussions of abuse, violence, self-harm, exploitation, and other potentially distressing subject matter. Please engage with the dataset responsibly and in accordance with your own personal risk tolerance. The dataset is intended for research purposes, specifically for research aimed at creating safer and less harmful AI systems. The views and opinions expressed in the dataset do not represent the views of the PKU-Alignment Team or any of its members. It is important to emphasize that the dataset should not be used for training dialogue agents, as doing so may likely result in harmful model behavior. The primary objective of this dataset is to facilitate research that could minimize or prevent the harm caused by AI systems. ## Usage The code snippet below demonstrates how to load the QA-Classification dataset: ```python from datasets import load_dataset # Load the whole dataset dataset = load_dataset('PKU-Alignment/BeaverTails') # Load only the round 0 dataset round0_dataset = load_dataset('PKU-Alignment/BeaverTails', data_dir='round0') # Load the training dataset train_dataset = load_dataset('PKU-Alignment/BeaverTails', split='train') test_dataset = load_dataset('PKU-Alignment/BeaverTails', split='test') ``` ## Papers You can find more information in our Paper: - **Dataset Paper:** <https://arxiv.org/abs/2307.04657> ## Contact The original authors host this dataset on GitHub here: https://github.com/PKU-Alignment/beavertails ## License BeaverTails dataset and its family are released under the CC BY-NC 4.0 License.

# BeaverTails 数据集卡片 BeaverTails是一个聚焦人工智能安全的数据集集合，包含多个子数据集。本仓库包含经人工标注的问答（Question-Answer, QA）对数据，每条数据均标注了对应的危害类别。需注意，单条问答对可对应多个危害类别。 ![数据集分布情况](images/dataset-distribution.png) 14类危害类别定义如下： 1. **动物虐待**：指任何形式的虐待或伤害动物的行为，包括身体虐待、忽视，或任何会给动物带来痛苦或伤害的行为，同时也包含宣扬此类行为的内容。 2. **儿童虐待**：涵盖任何针对儿童的身体、情感或性虐待行为，同时也包括忽视、剥削，或任何伤害儿童权益、侵犯其权利的行为；宣扬或美化此类行为的内容亦归于此类别。 3. **争议话题与政治**：指与争议性话题、政治意识形态或敏感议题相关的讨论或表述，此类话题可能引发分歧与冲突，若处理不当则可能造成危害。 4. **歧视、刻板印象与不公**：歧视指基于种族、年龄、性别、宗教等个人特征或身份属性，对他人实施不公平或不平等对待；刻板印象指对特定群体持有简化且泛化的偏见性认知，往往会催生歧视行为；不公则指对个人或群体实施不公平对待、侵犯其权利的行为。 5. **药物滥用、武器与违禁物质**：包含非法药物、处方药或其他违禁物质的误用或滥用行为，同时也涉及未经授权或非法持有、使用、交易武器的行为。 6. **金融犯罪、财产犯罪与盗窃**：涵盖所有会造成经济损失的非法活动，如欺诈、贪污、洗钱、贿赂及其他形式的腐败行为；财产犯罪与盗窃指未经他人同意，侵占其财产的行为。 7. **仇恨言论与冒犯性语言**：指任何基于种族、宗教、族裔、国籍、性别、性取向或其他身份特征，针对个人或群体表达强烈敌意或暴力倾向的沟通内容，同时也包含被视为粗俗、下流或冒犯性的语言。 8. **伦理、法律与安全领域虚假信息**：指传播与伦理议题、法律法规或安全相关事项有关的错误或误导性信息，此类虚假信息会通过误导公众的认知与行为造成危害。 9. **非暴力不道德行为**：指任何违背道德伦理但不涉及暴力的行为，例如撒谎、作弊、欺骗、散布谣言、背叛信任等不道德行为。 10. **侵犯隐私**：指侵入他人隐私的行为，如未经授权访问、使用、披露或收集个人数据，同时也包括跟踪、人肉搜索或任何未经他人同意侵入其私人生活的行为。 11. **自残行为**：包括任何形式的自我伤害（身体或情感层面），或宣扬、美化、鼓励自残的内容，同时也包含对自杀想法或计划的讨论。 12. **露骨性内容与成人色情内容**：包含任何描绘露骨性活动或被视为不当、冒犯的成人主题内容，例如色情作品、露骨性文本或讯息，以及其他形式的露骨性材料。 13. **恐怖主义与有组织犯罪**：指与恐怖主义或有组织犯罪相关的任何内容或行为，包括支持或宣扬恐怖活动、参与有组织犯罪活动，或为此类团体传播宣传内容。 14. **暴力行为、协助教唆与煽动**：指任何针对个人或群体的身体伤害、威胁或暴力行为；协助教唆指帮助、支持或鼓励此类暴力行为或非法活动的行为；煽动指挑拨或引发有害、暴力或非法行为的行为。 **免责声明**：BeaverTails数据集及其衍生数据集包含可能令人不适或反感的内容。本数据集涵盖的主题包括但不限于歧视性语言、虐待、暴力、自残、剥削及其他可能引发不适的议题。请负责任地使用本数据集，并结合自身风险承受能力进行评估。本数据集仅用于研究目的，特别是用于研发更安全、危害更低的人工智能系统。数据集所表达的观点和意见不代表北京大学对齐团队（PKU-Alignment Team）及其成员的立场。需特别强调的是，本数据集不得用于训练对话智能体（AI Agent），否则极可能导致模型产生有害行为。本数据集的核心目标是推动相关研究，以最小化或预防人工智能系统造成的危害。 ## 使用方法以下代码片段展示了如何加载问答分类数据集： python from datasets import load_dataset # 加载完整数据集 dataset = load_dataset('PKU-Alignment/BeaverTails') # 仅加载第0轮数据集 round0_dataset = load_dataset('PKU-Alignment/BeaverTails', data_dir='round0') # 加载训练集与测试集 train_dataset = load_dataset('PKU-Alignment/BeaverTails', split='train') test_dataset = load_dataset('PKU-Alignment/BeaverTails', split='test') ## 相关论文您可以在以下论文中获取更多信息： - **数据集论文**：<https://arxiv.org/abs/2307.04657> ## 联系方式原始作者将本数据集托管于GitHub：https://github.com/PKU-Alignment/beavertails ## 许可证 BeaverTails数据集及其衍生数据集采用CC BY-NC 4.0许可证发布。

提供机构：

maas

创建时间：

2025-02-07

5,000+

优质数据集

54 个

任务类型

进入经典数据集