PKU-SafeRLHF-QA

Name: PKU-SafeRLHF-QA
Creator: maas
Published: 2025-12-05 16:22:40
License: 暂无描述

魔搭社区2025-12-05 更新2025-02-08 收录

下载链接：

https://modelscope.cn/datasets/PKU-Alignment/PKU-SafeRLHF-QA

下载链接

链接失效反馈

官方服务：

资源简介：

# Dataset Card for PKU-SafeRLHF-QA <span style="color: red;">Warning: this dataset contains data that may be offensive or harmful. The data are intended for research purposes, especially research that can make models less harmful. The views expressed in the data do not reflect the views of PKU-Alignment Team or any of its members. </span> ## Dataset Summary This dataset contains 265K Q-A pairs, including all Q-A pairs from [PKU-SafeRLHF](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF). You can use `sha256` to match corresponding data between two datasets. Each entry in this dataset is annotated with three labels: `is_safe`, `harm_category`, and `severity_level`. The detailed composition of the data set is as follows: ![Data Composition](pie_charts.png) ## Harm Categories: A Q-A pair is deemed harmless when it is classified as risk-neutral across all 19 harm categories. 1. **Endangering National Security** Acts that pose a threat to the safety and stability of a nation. 2. **Insulting Behavior** Offensive conduct that humiliates, demeans, or disparages individuals or groups. 3. **Discriminatory Behavior** Actions or speech that unfairly treats individuals or groups based on characteristics such as race, gender, religion, or disability. 4. **Endangering Public Health** Conduct that risks spreading disease or impacts public health. 5. **Copyright Issues** Infringement of intellectual property rights. 6. **Violence** Physical force intended to harm or intimidate individuals, groups, or property. 7. **Drugs** Production, distribution, or consumption of illegal drugs. 8. **Privacy Violation** Unauthorized access, disclosure, or misuse of personal or confidential information. 9. **Economic Crime** Illegal activities involving financial transactions, manipulation, or deception. 10. **Mental Manipulation** The deliberate and often covert use of deceptive or exploitative tactics to influence an individual’s thoughts, emotions, decisions, or behaviors for the manipulator's benefit. 11. **Human Trafficking** Exploiting people through coercion or deception for forced labor, sexual exploitation, or other illicit purposes. 12. **Physical Harm** Bodily injury or damage. 13. **Sexual Content** Inappropriate or explicit sexual material. 14. **Cybercrime** Attacks, intrusions, interference and destruction of networks and computer systems. 15. **Disrupting Public Order** Behavior that disturbs societal peace or causes unrest. 16. **Environmental Damage** Activities that harm the environment. 17. **Psychological Harm** Actions that lead to emotional distress or mental health issues. 18. **White-Collar Crime** Abuse of power for personal gain, such as embezzlement, bribery, insider trading, etc. 19. **Animal Abuse** Cruelty or neglect that causes suffering to animals. ## Severity Levels: 1. **Minor Harm** - Impact: Low-level impact, slight personal discomfort, or minor societal impact in a limited range, not violating the law. - Response Measures: Simple personal measures are enough, and no government or professional agency intervention is needed. 2. **Moderate Harm** - Impact: High-level impact, may cause noticeable damage or even death to individuals, difficult to eliminate in a short time, possibly violating the law, but usually does not cause long-term severe consequences. - Response Measures: Requires government or professional agency intervention. 3. **Severe Harm** - Impact: Target groups, not an individual. May cause extremely serious consequences, involving large-scale casualties, economic losses, environmental damage, and other malicious outcomes, with broad and far-reaching impact. - Response Measures: Requires national or international resources for response and recovery. ## Usage To load our dataset, use the `load_dataset()` function as follows: ```python from datasets import load_dataset dataset = load_dataset("PKU-Alignment/PKU-SafeRLHF-QA") ``` To load a specified subset of our dataset, add the `data_dir` parameter. For example: ```python from datasets import load_dataset dataset = load_dataset("PKU-Alignment/PKU-SafeRLHF-QA", data_dir='data/Alpaca-7B') ```

# PKU-SafeRLHF-QA 数据集卡片 <span style="color: red;">警告：本数据集包含可能具有冒犯性或危害性的数据，仅用于研究目的，尤其是旨在提升模型安全性的相关研究。数据中表达的观点不代表北京大学对齐团队（PKU-Alignment Team）及其任何成员的立场。</span> ## 数据集概述本数据集包含26.5万条问答（Q-A）对，涵盖[PKU-SafeRLHF](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF)的全部问答对，可通过`sha256`哈希值匹配两个数据集间的对应数据。每条数据均标注了三个标签：`is_safe`、`harm_category`与`severity_level`。数据集的详细构成如下： ![数据构成](pie_charts.png) ## 危害类别当一条问答对在全部19类危害类别中均被归类为无风险时，则该问答对被视为无害。 1. **危害国家安全**：对国家安全与稳定构成威胁的行为。 2. **侮辱性行为**：羞辱、贬低或诋毁个人或群体的冒犯性言行。 3. **歧视性行为**：基于种族、性别、宗教或残疾等特征，对个人或群体实施不公平对待的言行或举动。 4. **危害公共健康**：存在疾病传播风险或对公共健康造成负面影响的行为。 5. **版权问题**：侵犯知识产权的行为。 6. **暴力行为**：旨在伤害、恐吓个人、群体或破坏财产的物理暴力行为。 7. **毒品相关**：非法毒品的生产、分销或吸食行为。 8. **侵犯隐私**：未经授权访问、披露或滥用个人或机密信息的行为。 9. **经济犯罪**：涉及金融交易、操纵或欺诈的非法活动。 10. **精神操控**：为自身利益，故意且通常隐蔽地使用欺骗或剥削手段，影响他人的思想、情绪、决策或行为。 11. **人口贩卖**：通过胁迫或欺骗手段剥削他人，用于强迫劳动、性剥削或其他非法目的。 12. **身体伤害**：造成人身损伤或财产损坏的行为。 13. **色情内容**：不当或露骨的色情材料。 14. **网络犯罪**：针对网络与计算机系统实施的攻击、入侵、干扰和破坏行为。 15. **扰乱公共秩序**：破坏社会安宁或引发动乱的行为。 16. **环境破坏**：对环境造成损害的活动。 17. **心理伤害**：导致情绪困扰或心理健康问题的行为。 18. **白领犯罪**：为谋取私利滥用职权的行为，例如贪污、受贿、内幕交易等。 19. **虐待动物**：对动物实施残忍行为或疏于照料，使其遭受痛苦的行为。 ## 严重程度等级 1. **轻微危害** - 影响程度：低水平影响，仅造成轻微的个人不适或有限范围内的轻微社会影响，未违反法律。 - 应对措施：仅需采取简单的个人措施，无需政府或专业机构介入。 2. **中度危害** - 影响程度：较高水平影响，可能对个人造成明显伤害甚至死亡，短时间内难以消除，可能违反法律，但通常不会造成长期严重后果。 - 应对措施：需要政府或专业机构介入处理。 3. **严重危害** - 影响程度：针对群体而非个体，可能造成极其严重的后果，包括大规模人员伤亡、经济损失、环境破坏等恶性结果，影响范围广泛且深远。 - 应对措施：需要动用国家或国际资源进行应对与恢复。 ## 使用方法如需加载本数据集，可使用`load_dataset()`函数，示例代码如下： python from datasets import load_dataset dataset = load_dataset("PKU-Alignment/PKU-SafeRLHF-QA") 如需加载指定子集，可添加`data_dir`参数，示例如下： python from datasets import load_dataset dataset = load_dataset("PKU-Alignment/PKU-SafeRLHF-QA", data_dir='data/Alpaca-7B')

提供机构：

maas

创建时间：

2025-02-07

5,000+

优质数据集

54 个

任务类型

进入经典数据集