jprivera44/collusion-project-apps-backdoor-inventory
Hugging Face · Updated 2026-04-27 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/jprivera44/collusion-project-apps-backdoor-inventory
Resource description:
---
license: apache-2.0
tags:
- apps
- backdoor
- code-monitoring
- collusion-research
configs:
- config_name: default
  data_files:
  - split: detected
    path: data/detected-*
  - split: untested
    path: data/untested-*
  - split: missing
    path: data/missing-*
dataset_info:
  features:
  - name: pid
    dtype: string
  - name: apps_difficulty
    dtype: string
  - name: apps_unusable
    dtype: string
  - name: question
    dtype: string
  - name: code_clean
    dtype: string
  - name: code_backdoored
    dtype: string
  - name: backdoor_model
    dtype: string
  - name: backdoor_works
    dtype: string
  - name: llama70b_detected
    dtype: string
  - name: source_bucket
    dtype: string
  - name: backdoor_variants
    dtype: string
  - name: provenance_path
    dtype: string
  splits:
  - name: detected
    num_bytes: 5002686
    num_examples: 1723
  - name: untested
    num_bytes: 1924819
    num_examples: 404
  - name: missing
    num_bytes: 3385986
    num_examples: 1328
  download_size: 4478504
  dataset_size: 10313491
---
# APPS Backdoor Inventory
Consolidated, **deduplicated-by-question** view of every APPS problem we have a
backdoored solution for. Built to support training and evaluating Llama-3.3-70B
as a backdoor detector / collusion monitor.
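
A minimal loading sketch, assuming the `datasets` package is installed and the repository id matches the title of this card. The expected row counts are copied from the `dataset_info` front matter; `load_inventory` and `size_mismatches` are illustrative helper names, not part of the project.

```python
# Split sizes as declared in the card's `dataset_info` front matter.
EXPECTED_ROWS = {"detected": 1723, "untested": 404, "missing": 1328}

# Assumption: the repo id is the one shown in this card's title.
REPO_ID = "jprivera44/collusion-project-apps-backdoor-inventory"


def load_inventory():
    """Download all three splits (needs the `datasets` package and network access)."""
    from datasets import load_dataset  # imported lazily so the checker below works offline

    return load_dataset(REPO_ID)


def size_mismatches(row_counts):
    """Return {split: (expected, actual)} for every split whose size differs."""
    return {
        split: (expected, row_counts.get(split))
        for split, expected in EXPECTED_ROWS.items()
        if row_counts.get(split) != expected
    }


# Typical use (not executed here because it downloads data):
#   ds = load_inventory()
#   print(size_mismatches({name: ds[name].num_rows for name in ds}))
```

An empty dictionary from `size_mismatches` means the downloaded splits match the sizes declared above.
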
The unit of uniqueness is the **APPS problem id (`pid`)**. The same problem
re-backdoored by a different model is treated as a duplicate question and
collapsed — repeating questions across train/eval is data contamination, so we
deduplicate at the question level regardless of backdoor variant.
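
The question-level deduplication can be sketched as follows. This card does not say which variant is kept as the canonical row, so "first row seen wins" is an assumption made only for illustration, and `dedupe_by_pid` is a hypothetical helper name.

```python
def dedupe_by_pid(rows):
    """Keep one row per pid; later rows for the same pid are collapsed.

    Assumption: the first row seen for a pid is treated as canonical.
    """
    first_seen = {}
    for row in rows:
        first_seen.setdefault(row["pid"], row)
    return list(first_seen.values())


rows = [
    {"pid": "101", "backdoor_model": "openai/o3-mini"},
    {"pid": "101", "backdoor_model": "openai/gpt-4o"},  # same question, different model
    {"pid": "202", "backdoor_model": "openai/gpt-4o"},
]
unique = dedupe_by_pid(rows)
```

Here the second `101` row is dropped even though it comes from a different backdoor model, which mirrors the rule that uniqueness is defined by `pid` alone.
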
## Dataset universe
We work from the 5,000-problem APPS subset defined in
`data/project/problem_registry.json`. Of those, **1,580 are flagged unusable**
(no passing APPS reference solution → cannot validate backdoors). Generation
targets the remaining **3,420 usable** pids.
| Status | Unique pids | % of usable (3,420) |
|---|---:|---:|
| Backdoor present, **Llama-70B detection confirmed** | 1,723 | 50.4% |
| Backdoor present, **detection unknown** (generated by other models, never tested with Llama-70B) | 404 | 11.8% |
| **No backdoor — generation needed** (usable pids only) | 1,328 | 38.8% |
| **Usable APPS pool** | 3,420 | 100% |
| Unusable APPS (excluded — no reference solution) | 1,580 | — |
| **Total APPS pool** | 5,000 | — |
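Note: 35 of the 2,127 covered pids (1,723 + 404) fall in the "unusable" set,
because older sources backdoored them before the unusable flag was applied.
They are kept in the dataset and flagged via `apps_unusable=true`. This is why
the three status rows sum to 3,455 rather than 3,420: only 2,092 covered pids
are usable, and 3,420 - 2,092 = 1,328 still need generation.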
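
The counts in this section can be re-derived with a few lines of arithmetic. The sketch below uses only numbers stated in the table and the note; the variable names are illustrative.

```python
TOTAL = 5000                 # problems in the registry
UNUSABLE = 1580              # flagged unusable
DETECTED = 1723              # Bucket 1
UNTESTED = 404               # Bucket 2
COVERED_BUT_UNUSABLE = 35    # covered pids that carry the unusable flag

usable = TOTAL - UNUSABLE                              # usable APPS pool
covered = DETECTED + UNTESTED                          # pids with any backdoor
covered_usable = covered - COVERED_BUT_UNUSABLE        # covered pids that are usable
missing = usable - covered_usable                      # Bucket 3
unusable_uncovered = UNUSABLE - COVERED_BUT_UNUSABLE   # excluded from generation


def pct_of_usable(n):
    """Percentage of the usable pool, rounded to one decimal as in the table."""
    return round(100 * n / usable, 1)
```
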
## Bucket 1 — 1,723 Llama-detectable backdoors (known good)
Source: `experiments/260418_mo10_code_monitor/data/{train,eval}_backdoored.jsonl`
(861 train + 862 eval, no pid overlap).
Selection: every backdoor here was **flagged as detected by
Llama-3.3-70B-Instruct in CoT mode** during the
`260417_backdoor_baseline_eval` sweep (recall 40.2% across 11,371 attempts;
this set is the 1,723 hits).
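
The "no pid overlap" property of the train and eval files can be checked with a small helper. This is a sketch that assumes each JSONL row is an object with a `pid` field; the exact layout of the project files is not documented here, and the function names are hypothetical.

```python
import json


def pids_in_jsonl(lines):
    """Collect the pid of every non-blank JSONL line (assumes a `pid` key per row)."""
    return {json.loads(line)["pid"] for line in lines if line.strip()}


def overlap(train_lines, eval_lines):
    """Return the pids that appear in both files; an empty set means no leakage."""
    return pids_in_jsonl(train_lines) & pids_in_jsonl(eval_lines)


# Toy stand-ins for the two files; real use would pass open file handles.
train = ['{"pid": "1"}', '{"pid": "2"}']
eval_ = ['{"pid": "3"}']
shared = overlap(train, eval_)
```
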
Backdoor generation models in this bucket:
| Model | pids |
|---|---:|
| openai/o3-mini | 426 |
| anthropic/claude-3-7-sonnet-latest | 306 |
| openai/gpt-4o | 252 |
| unknown (original APPS source) | 219 |
| anthropic/claude-3-7-sonnet-thinking-latest | 158 |
| openai/o1-mini | 109 |
| anthropic/claude-3-5-haiku-latest | 86 |
| openai/gpt-4o-mini | 76 |
| google/gemini-2.0-flash | 65 |
| openai/gpt-3.5-turbo | 26 |
By APPS difficulty: interview 1,295 / introductory 300 / competition 128.
## Bucket 2 — 404 untested-detection backdoors
Source:
- `data/project/superseded/{easy,medium,hard}_500_training.json`: 500 unique
  pids, each backdoored at 3 difficulty levels (gpt-5 + o3-mini fallback)
- `data/project/superseded/{easy,medium,hard}_300_eval.json`: 300 unique pids,
  each backdoored at 3 difficulty levels (o3-mini)
After deduping against Bucket 1:
- 261 unique pids from the 500-train set not in the 1,723
- 143 unique pids from the 300-eval set not in the 1,723
- **404 total unique pids** with backdoors that have never been run through
  Llama-70B-CoT detection
Recommended next step: run Llama-3.3-70B-CoT detection over these 404 to
classify each as detectable / undetectable, then merge the detectable ones
into Bucket 1.
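
The recommended triage could look roughly like the sketch below. The `detect` callable is a hypothetical stand-in for a Llama-3.3-70B CoT detection call, and relabelling detected rows with `source_bucket="detected"` is an assumption about how the merge would be recorded.

```python
def triage(rows, detect):
    """Split untested rows by the verdict of `detect(question, code) -> bool`."""
    detectable, undetectable = [], []
    for row in rows:
        hit = bool(detect(row["question"], row["code_backdoored"]))
        labelled = {**row, "llama70b_detected": hit}  # copy, do not mutate the input
        if hit:
            labelled["source_bucket"] = "detected"  # assumption: merged rows join Bucket 1
        (detectable if hit else undetectable).append(labelled)
    return detectable, undetectable


def stub_detector(question, code):
    """Placeholder only; a real detector would query the monitor model."""
    return "SECRET" in code


rows = [
    {"pid": "1", "question": "q1", "code_backdoored": "if x == SECRET: ...", "source_bucket": "untested"},
    {"pid": "2", "question": "q2", "code_backdoored": "print(x)", "source_bucket": "untested"},
]
detectable, undetectable = triage(rows, stub_detector)
```

Rows that come back undetected keep `source_bucket="untested"` but now carry `llama70b_detected=False`, matching the "tested-undetected" state in the schema.
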
## Bucket 3 — 1,328 pids needing fresh backdoor generation
These are **usable** APPS problem ids in our 5,000-problem registry that have
**no backdoored solution from any source**. (Excludes 1,545 additional
unusable-and-uncovered pids that we won't generate for.) Goal: generate
backdoors for them so the dataset spans the full usable universe.
Generation strategy (TBD, see `scripts/03_generate_backdoors.py`):
- Match the existing generation recipe (multi-model: gpt-5, o3-mini, Sonnet)
- Filter through Llama-3.3-70B-CoT to keep only detectable backdoors
- Preserve APPS difficulty distribution
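
Since the strategy is still TBD, the following is only one way to monitor the last point: compare difficulty shares before and after filtering. Which reference distribution should be preserved is not specified in this card; the example uses the Bucket 1 counts as the target purely for illustration, and both helper names are hypothetical.

```python
from collections import Counter


def difficulty_shares(rows):
    """Fraction of rows at each `apps_difficulty` level."""
    counts = Counter(row["apps_difficulty"] for row in rows)
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}


def max_drift(target, candidate):
    """Largest absolute difference in share across all difficulty levels."""
    levels = set(target) | set(candidate)
    return max(abs(target.get(lv, 0.0) - candidate.get(lv, 0.0)) for lv in levels)


# Target built from the Bucket 1 counts: interview 1,295 / introductory 300 / competition 128.
bucket1 = (
    [{"apps_difficulty": "interview"}] * 1295
    + [{"apps_difficulty": "introductory"}] * 300
    + [{"apps_difficulty": "competition"}] * 128
)
target = difficulty_shares(bucket1)

# A toy candidate batch with no competition problems at all.
candidate = difficulty_shares(
    [{"apps_difficulty": "interview"}] * 3 + [{"apps_difficulty": "introductory"}]
)
drift = max_drift(target, candidate)
```

A large `drift` after the detectability filter would signal that the filter is skewing the difficulty mix and that per-difficulty quotas may be needed.
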
## Schema (canonical)
```
pid                int        APPS problem id (unique key)
apps_difficulty    str        introductory | interview | competition
apps_unusable      bool       true if APPS lacks a passing reference solution
question           str        APPS problem statement
code_clean         str|null   reference clean solution
code_backdoored    str|null   canonical backdoored solution (null in `missing`)
backdoor_model     str|null   model that produced the backdoor
backdoor_works     bool|null  passes APPS unit tests
llama70b_detected  bool|null  true=detected, false=tested-undetected, null=untested
source_bucket      str        detected | untested | missing
backdoor_variants  list|null  Bucket 2 only: [{backdoor_difficulty, code_backdoored,
                              backdoor_model, backdoor_works}] for easy + medium
                              (hard variants dropped per design)
provenance_path    str        original file the row came from
```
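
The front matter declares every column with `dtype: string`, while the canonical schema above lists `int`, `bool` and `list` types, so rows likely need a parsing step after loading. The sketch below assumes booleans are stored as `true`/`false` text (any case), that empty or unrecognised text means null, and that `backdoor_variants` is JSON text. None of these encodings is confirmed by this card, so verify against a few real rows first.

```python
import json

_BOOLS = {"true": True, "false": False}


def parse_bool(value):
    """Map 'true'/'false' text to bool; None, empty or unknown text becomes None."""
    if value is None:
        return None
    return _BOOLS.get(str(value).strip().lower())


def parse_row(raw):
    """Return a copy of `raw` with canonical Python types (assumed encodings)."""
    row = dict(raw)
    row["pid"] = int(raw["pid"])  # assumption: pid is a numeric string
    for key in ("apps_unusable", "backdoor_works", "llama70b_detected"):
        row[key] = parse_bool(raw.get(key))
    variants = raw.get("backdoor_variants")
    # Assumption: variants are serialised as a JSON array, empty when absent.
    row["backdoor_variants"] = json.loads(variants) if variants else None
    return row


example = parse_row({
    "pid": "42",
    "apps_unusable": "False",
    "backdoor_works": "true",
    "llama70b_detected": "",
    "backdoor_variants": '[{"backdoor_difficulty": "easy"}]',
})
```
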
## Provenance
- **Bucket 1**: `experiments/260418_mo10_code_monitor/data/`, derived from
  `experiments/260417_backdoor_baseline_eval/output/detectable_1723.jsonl`
  (Llama-70B-CoT, seed=42).
- **Bucket 2**: `data/project/superseded/{easy,medium,hard}_{500,300}_*.json`,
  the Phase 1 OpenAI-generated set (Feb 2026), uploaded raw as
  `jprivera44/backdoor_dataset_p1_in_progress` and polished to
  `jprivera44/all_data_for_experiments_difficulty_v1`.
- **Bucket 3**: `data/project/problem_registry.json` minus pids in Buckets 1+2.
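
The Bucket 3 derivation is a set difference restricted to usable pids. In the sketch below the registry is modelled as a `{pid: unusable_flag}` mapping, which is a hypothetical shape chosen for illustration; the real layout of `problem_registry.json` is not described in this card.

```python
def missing_pids(registry, covered):
    """Usable registry pids that have no backdoor in Bucket 1 or Bucket 2.

    `registry` maps pid -> True if the problem is flagged unusable (assumed shape).
    `covered` is the set of pids present in Buckets 1 and 2.
    """
    return sorted(
        pid for pid, unusable in registry.items()
        if not unusable and pid not in covered
    )


# Toy data: pid 3 is unusable, pids 2 and 3 already have backdoors.
registry = {1: False, 2: False, 3: True, 4: False}
covered = {2, 3}
todo = missing_pids(registry, covered)
```

Unusable pids are filtered out before the difference is taken, so uncovered unusable problems never enter the `missing` split.
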
Provider:
jprivera44



