ajagota71/fairl-beaver-prompts-10k

Name: ajagota71/fairl-beaver-prompts-10k
Creator: ajagota71
Published: 2026-03-29 17:22:11
License: No description available

Hugging Face, updated 2026-03-29, indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/ajagota71/fairl-beaver-prompts-10k

Resource description:

# FAIRL Beaver Prompts (10K Stratified)

Canonical prompt set for FAIRL Beaver experiments. 10,000 prompts sampled from PKU-Alignment/PKU-SafeRLHF with balanced stratification across harm categories.

## Stratification

| Group | N | Description |
|---|---|---|
| Crime/Fraud | 1,499 | Cybercrime, economic crime, white-collar crime |
| Violence/Harm | 1,501 | Violence, physical harm, trafficking |
| Manipulation/Psych | 1,500 | Mental manipulation, psychological harm |
| Privacy/Security | 1,200 | Privacy violation, national security |
| Substance/Health | 1,100 | Drugs, public health, environment |
| Discrimination | 500 | Discriminatory behavior, sexual content |
| Safe | 2,700 | Non-adversarial prompts |

## Columns

- `prompt_idx`: Integer index (0-9999)
- `raw_prompt`: Original prompt text
- `formatted_prompt`: PKU conversation template (`BEGINNING OF CONVERSATION: USER: {prompt} ASSISTANT:`)
- `primary_group`: Harm category group (7 values)
- `prompt_source`: Generator model (Alpaca3-70B, Beavertails, WizardLM-30B-Uncensored)
- `severity`: Max severity across responses (0=safe, 1=mild, 2=moderate, 3=severe)
- `is_adversarial`: Whether any response to this prompt was labeled unsafe
- `n_harm_cats`: Number of harm categories triggered
- `harm_cats`: Comma-separated harm category names
- `prompt_len`: Character length of raw prompt

## Usage

These prompts should be used identically for Beaver v1, v2, and v3 experiments to enable controlled cross-version comparison.
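
The template and per-group counts described in this card can be sanity-checked with a short sketch like the one below. It is illustrative only. The helper names `format_prompt` and `stratification_mismatches` are hypothetical, the template's exact whitespace is assumed to match the string quoted in the column list, and the group labels are assumed to be stored in `primary_group` exactly as they are written in the stratification table, which the card does not confirm.

```python
from collections import Counter

# Template string as quoted in the `formatted_prompt` column description.
# Assumption: single spaces, no newlines; the card does not spell this out.
PKU_TEMPLATE = "BEGINNING OF CONVERSATION: USER: {prompt} ASSISTANT:"

# Per-group counts copied from the stratification table.
# Assumption: these labels match the stored `primary_group` values verbatim.
EXPECTED_COUNTS = {
    "Crime/Fraud": 1499,
    "Violence/Harm": 1501,
    "Manipulation/Psych": 1500,
    "Privacy/Security": 1200,
    "Substance/Health": 1100,
    "Discrimination": 500,
    "Safe": 2700,
}


def format_prompt(raw_prompt: str) -> str:
    """Wrap a raw prompt in the conversation template (hypothetical helper)."""
    return PKU_TEMPLATE.format(prompt=raw_prompt)


def stratification_mismatches(groups):
    """Compare observed group labels against the table.

    Returns {label: (expected, observed)} for every label whose counts differ;
    an empty dict means the sample matches the table.
    """
    observed = Counter(groups)
    labels = set(EXPECTED_COUNTS) | set(observed)
    return {
        label: (EXPECTED_COUNTS.get(label, 0), observed.get(label, 0))
        for label in labels
        if EXPECTED_COUNTS.get(label, 0) != observed.get(label, 0)
    }


if __name__ == "__main__":
    print(format_prompt("What is the capital of France?"))
    print(sum(EXPECTED_COUNTS.values()))  # 10000
```

With the rows loaded (for example through the Hugging Face `datasets` library), passing the `primary_group` column to `stratification_mismatches` would flag any group whose size differs from the table, and comparing `format_prompt(raw_prompt)` with `formatted_prompt` would show whether the assumed whitespace is right.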

Provider:

ajagota71
