
ajagota71/fairl-beaver-prompts-10k

Source: Hugging Face. Updated 2026-03-29; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/ajagota71/fairl-beaver-prompts-10k
Resource description:
# FAIRL Beaver Prompts (10K Stratified)

Canonical prompt set for FAIRL Beaver experiments. 10,000 prompts sampled from PKU-Alignment/PKU-SafeRLHF with balanced stratification across harm categories.

## Stratification

| Group | N | Description |
|---|---|---|
| Crime/Fraud | 1,499 | Cybercrime, economic crime, white-collar crime |
| Violence/Harm | 1,501 | Violence, physical harm, trafficking |
| Manipulation/Psych | 1,500 | Mental manipulation, psychological harm |
| Privacy/Security | 1,200 | Privacy violation, national security |
| Substance/Health | 1,100 | Drugs, public health, environment |
| Discrimination | 500 | Discriminatory behavior, sexual content |
| Safe | 2,700 | Non-adversarial prompts |

## Columns

- `prompt_idx`: Integer index (0-9999)
- `raw_prompt`: Original prompt text
- `formatted_prompt`: PKU conversation template (`BEGINNING OF CONVERSATION: USER: {prompt} ASSISTANT:`)
- `primary_group`: Harm category group (7 values)
- `prompt_source`: Generator model (Alpaca3-70B, Beavertails, WizardLM-30B-Uncensored)
- `severity`: Max severity across responses (0=safe, 1=mild, 2=moderate, 3=severe)
- `is_adversarial`: Whether any response to this prompt was labeled unsafe
- `n_harm_cats`: Number of harm categories triggered
- `harm_cats`: Comma-separated harm category names
- `prompt_len`: Character length of raw prompt

## Usage

These prompts should be used identically for Beaver v1, v2, and v3 experiments to enable controlled cross-version comparison.
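
The column descriptions above can be illustrated with a minimal Python sketch. It is not the dataset author's code: the helper names `format_prompt` and `parse_harm_cats` are hypothetical, the template's exact whitespace is assumed to match the string quoted in the card, and the dictionary keys reuse the table's group labels, which may differ from the literal values stored in `primary_group`.

```python
# Group sizes copied from the Stratification table of the dataset card.
# Assumption: keys are the table labels, not necessarily the stored
# `primary_group` strings.
GROUP_COUNTS = {
    "Crime/Fraud": 1499,
    "Violence/Harm": 1501,
    "Manipulation/Psych": 1500,
    "Privacy/Security": 1200,
    "Substance/Health": 1100,
    "Discrimination": 500,
    "Safe": 2700,
}

# Template as quoted in the card; exact spacing is an assumption.
PKU_TEMPLATE = "BEGINNING OF CONVERSATION: USER: {prompt} ASSISTANT:"


def format_prompt(raw_prompt: str) -> str:
    """Wrap a raw prompt in the PKU conversation template (hypothetical helper)."""
    return PKU_TEMPLATE.format(prompt=raw_prompt)


def parse_harm_cats(harm_cats: str) -> list[str]:
    """Split the comma-separated `harm_cats` field into a list of names.

    Assumption: an empty string means no harm category was triggered.
    """
    return [name.strip() for name in harm_cats.split(",") if name.strip()]


if __name__ == "__main__":
    # The seven strata should add up to the stated 10,000 prompts.
    print(sum(GROUP_COUNTS.values()))
    print(format_prompt("What is the capital of France?"))
    # Placeholder category names, for illustration only.
    print(parse_harm_cats("category_a, category_b"))
```

Summing the table this way is a quick sanity check that the strata cover the full 10,000 prompts. If `format_prompt` behaves as assumed, its output for a given `raw_prompt` should match the stored `formatted_prompt` column, which is worth verifying against the actual data before relying on it.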
Provider:
ajagota71