ceselder/loracle-fair-trigger-recovery
Source: Hugging Face. Updated 2026-04-27. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/ceselder/loracle-fair-trigger-recovery
Description:
Training/eval dataset for the **LoRAcle** weight-based trigger inversion paper. It is built to enable an apples-to-apples comparison against activation-based methods (Activation Oracles, Introspection Adapters [IA]) on a heldout set where the trigger is **conceptually orthogonal to the behavior**. The dataset has three parts:
1) **A 20-org fair heldout** (`heldout_20_fair`): 10 syntactic + 10 semantic, all explicitly filtered to ensure trigger ≠ behavior.
2) **A 1087-row train set** (`train_full_union`): a combination of multiple sources, minus the 20 heldout orgs and invalid rows.
3) **The full 100-backdoor classification** (`all_backdoors_classified`): every IA backdoor labeled with the `trigger_eq_behavior` flag and a reason.
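The construction described here (a heldout restricted to backdoors where trigger ≠ behavior, and a train set that excludes heldout orgs and invalid rows) could be sketched roughly as follows. This is a minimal illustration on made-up rows, not the authors' pipeline: only `trigger_eq_behavior` is a name taken from the description, while the `org` and `reason` keys, the sample values, and both helper functions are hypothetical.

```python
# Hypothetical rows standing in for `all_backdoors_classified`.
# Only the `trigger_eq_behavior` field name comes from the dataset description;
# `org`, `reason`, and all values are assumptions for illustration.
classified = [
    {"org": "org_a", "trigger_eq_behavior": False, "reason": "trigger is a formatting cue"},
    {"org": "org_b", "trigger_eq_behavior": True, "reason": "trigger names the behavior"},
    {"org": "org_c", "trigger_eq_behavior": False, "reason": "trigger is an unrelated topic"},
]


def fair_candidates(rows):
    """Keep only backdoors whose trigger is flagged as distinct from the behavior."""
    return [r for r in rows if not r["trigger_eq_behavior"]]


def build_train(rows, heldout_orgs, is_valid=lambda r: True):
    """Drop rows that belong to a heldout org or fail a validity check."""
    held = set(heldout_orgs)
    return [r for r in rows if r["org"] not in held and is_valid(r)]


fair = fair_candidates(classified)

# Pretend the first fair org is held out; everything else may be trained on.
heldout_orgs = [fair[0]["org"]]
train = build_train(classified, heldout_orgs)

print([r["org"] for r in fair])
print([r["org"] for r in train])
```

The point of keeping the two steps separate is that the fairness filter only decides which orgs are eligible for the heldout, while the train set is defined purely by exclusion, so no heldout org can leak into training.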
Provider:
ceselder



