ceselder/backdoor_narrow_training

Name: ceselder/backdoor_narrow_training
Creator: ceselder
Published: 2026-04-28 01:09:58
License: No description available

Source: Hugging Face. Updated 2026-04-28; indexed 2026-05-03.

Download link:

https://hf-mirror.com/datasets/ceselder/backdoor_narrow_training

Description:

Canonical narrow trigger-inversion dataset used to fine-tune three trigger-recovery methods on the *same* training pool: LoRAcle (weight-based), the IA introspection adapter (activation-based; Shenoy et al.), and Activation Oracles (Karvonen et al.). It contains 1,070 rows split across source categories as follows: 480 from `backdoor_run1_improved`, 120 from `problematic_backdoor`, 320 from `sep_aviously`, and 150 from `pretrain_dpo_heldout` and `dpo_pretrain` combined. A further 20 IA backdoor organisms and 20 SEP backdoor organisms are held out for fair evaluation. The schema includes the fields `organism_id`, `category`, `trigger_type`, `tokens_path`, `trigger`, `propensity`, `question`, and `answer`. Headline result: with this training pool, LoRAcle's pass@K rose from 10% to 40%.
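The category counts and the organism holdout described above can be sanity-checked with a short sketch. It is a minimal illustration, not tooling shipped with the dataset: it models a row as a Python dataclass using the documented field names, but the `str` types are assumptions. Because only the combined total (150) is reported for `pretrain_dpo_heldout` and `dpo_pretrain`, the two sources share one illustrative key rather than an invented split. The helper names `check_composition`, `category_shares`, `count_by_category`, and `assert_no_heldout_leak` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

# Per-source row counts as stated in the dataset description. Only the
# combined total (150) is reported for pretrain_dpo_heldout and dpo_pretrain,
# so they share one illustrative key here rather than an invented split.
CATEGORY_COUNTS = {
    "backdoor_run1_improved": 480,
    "problematic_backdoor": 120,
    "sep_aviously": 320,
    "pretrain_dpo_heldout + dpo_pretrain": 150,
}
EXPECTED_TOTAL = 1_070


@dataclass(frozen=True)
class TriggerInversionRow:
    """One row of the training pool. Field names follow the documented
    schema; the str types are assumptions, not taken from the dataset card."""
    organism_id: str
    category: str
    trigger_type: str
    tokens_path: str
    trigger: str
    propensity: str
    question: str
    answer: str


def check_composition(counts: dict[str, int], expected_total: int) -> int:
    """Return the summed row count, raising if it differs from the expected total."""
    total = sum(counts.values())
    if total != expected_total:
        raise ValueError(f"categories sum to {total}, expected {expected_total}")
    return total


def category_shares(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of the training pool contributed by each source."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def count_by_category(rows: Iterable[TriggerInversionRow]) -> dict[str, int]:
    """Tally loaded rows by their `category` field."""
    return dict(Counter(row.category for row in rows))


def assert_no_heldout_leak(rows: Iterable[TriggerInversionRow],
                           heldout_ids: set[str]) -> None:
    """Raise if any training row belongs to a held-out evaluation organism."""
    leaked = {row.organism_id for row in rows} & heldout_ids
    if leaked:
        raise ValueError(f"held-out organisms found in training pool: {sorted(leaked)}")


if __name__ == "__main__":
    print(check_composition(CATEGORY_COUNTS, EXPECTED_TOTAL))
    for name, share in category_shares(CATEGORY_COUNTS).items():
        print(f"{name}: {share:.1%}")
```

The leak check reflects the purpose of the 20 + 20 organism holdout: comparing the three methods fairly requires that no held-out `organism_id` appear in the shared training pool.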

Provider:

ceselder