
jprivera44/collusion-project-apps-backdoor-inventory

Hugging Face | Updated 2026-04-27 | Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/jprivera44/collusion-project-apps-backdoor-inventory
Resource description:
---
license: apache-2.0
tags:
- apps
- backdoor
- code-monitoring
- collusion-research
configs:
- config_name: default
  data_files:
  - split: detected
    path: data/detected-*
  - split: untested
    path: data/untested-*
  - split: missing
    path: data/missing-*
dataset_info:
  features:
  - name: pid
    dtype: string
  - name: apps_difficulty
    dtype: string
  - name: apps_unusable
    dtype: string
  - name: question
    dtype: string
  - name: code_clean
    dtype: string
  - name: code_backdoored
    dtype: string
  - name: backdoor_model
    dtype: string
  - name: backdoor_works
    dtype: string
  - name: llama70b_detected
    dtype: string
  - name: source_bucket
    dtype: string
  - name: backdoor_variants
    dtype: string
  - name: provenance_path
    dtype: string
  splits:
  - name: detected
    num_bytes: 5002686
    num_examples: 1723
  - name: untested
    num_bytes: 1924819
    num_examples: 404
  - name: missing
    num_bytes: 3385986
    num_examples: 1328
  download_size: 4478504
  dataset_size: 10313491
---

# APPS Backdoor Inventory

Consolidated, **deduplicated-by-question** view of every APPS problem we have a backdoored solution for. Built to support training and evaluating Llama-3.3-70B as a backdoor detector / collusion monitor.

The unit of uniqueness is the **APPS problem id (`pid`)**. The same problem re-backdoored by a different model is treated as a duplicate question and collapsed: repeating questions across train/eval is data contamination, so we deduplicate at the question level regardless of backdoor variant.

## Dataset universe

We work from the 5,000-problem APPS subset defined in `data/project/problem_registry.json`. Of those, **1,580 are flagged unusable** (no passing APPS reference solution → cannot validate backdoors). Generation targets the remaining **3,420 usable** pids.


| Status | Unique pids | % of usable (3,420) |
|---|---:|---:|
| Backdoor present, **Llama-70B detection confirmed** | 1,723 | 50.4% |
| Backdoor present, **detection unknown** (other-model generated, never tested w/ Llama-70B) | 404 | 11.8% |
| **No backdoor, generation needed** (usable pids only) | 1,328 | 38.8% |
| **Usable APPS pool** | 3,420 | 100% |
| Unusable APPS (excluded: no reference solution) | 1,580 | n/a |
| **Total APPS pool** | 5,000 | n/a |

Note: 35 of the 2,127 covered pids fall in the "unusable" set (older sources backdoored them before the unusable flag was applied). They're kept in the dataset but flagged via `apps_unusable=true`.

## Bucket 1: 1,723 Llama-detectable backdoors (known good)

Source: `experiments/260418_mo10_code_monitor/data/{train,eval}_backdoored.jsonl` (861 train + 862 eval, no pid overlap).

Selection: every backdoor here was **flagged as detected by Llama-3.3-70B-Instruct in CoT mode** during the `260417_backdoor_baseline_eval` sweep (recall 40.2% across 11,371 attempts; this set is the 1,723 hits).

Backdoor generation models in this bucket:

| Model | pids |
|---|---:|
| openai/o3-mini | 426 |
| anthropic/claude-3-7-sonnet-latest | 306 |
| openai/gpt-4o | 252 |
| unknown (original APPS source) | 219 |
| anthropic/claude-3-7-sonnet-thinking-latest | 158 |
| openai/o1-mini | 109 |
| anthropic/claude-3-5-haiku-latest | 86 |
| openai/gpt-4o-mini | 76 |
| google/gemini-2.0-flash | 65 |
| openai/gpt-3.5-turbo | 26 |

By APPS difficulty: interview 1,295 / introductory 300 / competition 128.
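
The counts in the coverage table and the Bucket 1 breakdowns can be cross-checked with a few lines of arithmetic. The sketch below only re-uses numbers quoted above; the variable and function names are illustrative and not part of the dataset tooling. It also makes explicit why the three status rows sum to 3,455 rather than 3,420: the 35 covered-but-unusable pids are counted in the first two rows.

```python
# Counts copied from the tables above; names are illustrative only.
USABLE = 3420
UNUSABLE = 1580
BUCKETS = {"detected": 1723, "untested": 404, "missing": 1328}
COVERED_UNUSABLE = 35  # covered pids that carry apps_unusable=true

MODELS = {
    "openai/o3-mini": 426,
    "anthropic/claude-3-7-sonnet-latest": 306,
    "openai/gpt-4o": 252,
    "unknown (original APPS source)": 219,
    "anthropic/claude-3-7-sonnet-thinking-latest": 158,
    "openai/o1-mini": 109,
    "anthropic/claude-3-5-haiku-latest": 86,
    "openai/gpt-4o-mini": 76,
    "google/gemini-2.0-flash": 65,
    "openai/gpt-3.5-turbo": 26,
}
DIFFICULTY = {"interview": 1295, "introductory": 300, "competition": 128}


def pct_of_usable(count):
    """Share of the usable pool, rounded to one decimal as in the table."""
    return round(100 * count / USABLE, 1)


covered = BUCKETS["detected"] + BUCKETS["untested"]
checks = {
    "total pool is 5,000": USABLE + UNUSABLE == 5000,
    "covered pids are 2,127": covered == 2127,
    "usable pool reconciles": covered - COVERED_UNUSABLE + BUCKETS["missing"] == USABLE,
    "model table sums to Bucket 1": sum(MODELS.values()) == BUCKETS["detected"],
    "difficulty split sums to Bucket 1": sum(DIFFICULTY.values()) == BUCKETS["detected"],
    "train + eval sums to Bucket 1": 861 + 862 == BUCKETS["detected"],
}

for name, ok in checks.items():
    print(f"{'ok ' if ok else 'BAD'} {name}")
```
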

## Bucket 2: 404 untested-detection backdoors

Source:

- `data/project/superseded/{easy,medium,hard}_500_training.json`: 500 unique pids, each backdoored at 3 difficulty levels (gpt-5 + o3-mini fallback)
- `data/project/superseded/{easy,medium,hard}_300_eval.json`: 300 unique pids, each backdoored at 3 difficulty levels (o3-mini)

After deduping against Bucket 1:

- 261 unique pids from the 500-train set not in the 1,723
- 143 unique pids from the 300-eval set not in the 1,723
- **404 total unique pids** with backdoors that have never been run through Llama-70B-CoT detection

Recommended next step: run Llama-3.3-70B-CoT detection over these 404 to classify each as detectable / undetectable, then merge the detectable ones into Bucket 1.

## Bucket 3: 1,328 pids needing fresh backdoor generation

These are **usable** APPS problem ids in our 5,000-problem registry that have **no backdoored solution from any source**. (Excludes 1,545 additional unusable-and-uncovered pids that we won't generate for.)

Goal: generate backdoors for them so the dataset spans the full usable universe.
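
The bucket assignment described in the sections above amounts to set arithmetic over pids. The following is a minimal sketch under that reading, not the project's actual build script: `dedupe_by_pid` and `assign_buckets` are hypothetical helper names, and the pids in the toy example are made up.

```python
def dedupe_by_pid(rows):
    """Keep the first row seen for each pid; later variants are collapsed."""
    first_seen = {}
    for row in rows:
        first_seen.setdefault(row["pid"], row)
    return list(first_seen.values())


def assign_buckets(usable_pids, detected_pids, superseded_pids):
    """Return (untested, missing) pid sets.

    untested: pids with a backdoor in the superseded sets but not in Bucket 1.
    missing:  usable pids with no backdoor from either source.
    """
    detected = set(detected_pids)
    untested = set(superseded_pids) - detected
    missing = set(usable_pids) - (detected | untested)
    return untested, missing


# Toy example with made-up pids.
rows = [
    {"pid": "1", "backdoor_model": "model-a"},
    {"pid": "1", "backdoor_model": "model-b"},  # same question, collapsed
    {"pid": "2", "backdoor_model": "model-a"},
]
unique_rows = dedupe_by_pid(rows)

untested, missing = assign_buckets(
    usable_pids={"1", "2", "3", "4", "5"},
    detected_pids={row["pid"] for row in unique_rows},
    superseded_pids={"2", "3"},
)
print(sorted(untested), sorted(missing))
```

Because `missing` is computed from the usable pool only, covered pids that are flagged unusable do not affect it, which matches the "usable pids only" qualifier in the coverage table.
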

Generation strategy (TBD, see `scripts/03_generate_backdoors.py`):

- Match the existing generation recipe (multi-model: gpt-5, o3-mini, Sonnet)
- Filter through Llama-3.3-70B-CoT to keep only detectable backdoors
- Preserve APPS difficulty distribution

## Schema (canonical)

```
pid                int        APPS problem id (unique key)
apps_difficulty    str        introductory | interview | competition
apps_unusable      bool       true if APPS lacks a passing reference solution
question           str        APPS problem statement
code_clean         str|null   reference clean solution
code_backdoored    str|null   canonical backdoored solution (null in `missing`)
backdoor_model     str|null   model that produced the backdoor
backdoor_works     bool|null  passes APPS unit tests
llama70b_detected  bool|null  true=detected, false=tested-undetected, null=untested
source_bucket      str        detected | untested | missing
backdoor_variants  list|null  Bucket 2 only: [{backdoor_difficulty, code_backdoored,
                              backdoor_model, backdoor_works}] for easy + medium
                              (hard variants dropped per design)
provenance_path    str        original file the row came from
```

## Provenance

- **Bucket 1**: `experiments/260418_mo10_code_monitor/data/`, derived from `experiments/260417_backdoor_baseline_eval/output/detectable_1723.jsonl` (Llama-70B-CoT, seed=42).
- **Bucket 2**: `data/project/superseded/{easy,medium,hard}_{500,300}_*.json`, the Phase 1 OpenAI-generated set (Feb 2026), uploaded raw as `jprivera44/backdoor_dataset_p1_in_progress` and polished to `jprivera44/all_data_for_experiments_difficulty_v1`.
- **Bucket 3**: `data/project/problem_registry.json` minus pids in Buckets 1+2.
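
The front matter declares every feature with `dtype: string`, while the canonical schema lists logical types such as `bool|null` and `list|null`, so consumers will probably need to decode rows after loading. The sketch below shows one way to do that. It rests on assumptions the card does not confirm: that booleans are serialized as `"true"` / `"false"`, that nulls appear as empty strings or `"null"`, and that `backdoor_variants` is JSON-encoded. `parse_bool` and `decode_row` are hypothetical helpers.

```python
import json

# Columns the canonical schema describes as bool or bool|null.
BOOL_FIELDS = ("apps_unusable", "backdoor_works", "llama70b_detected")


def parse_bool(value):
    """Map 'true'/'false' strings to bool; anything else becomes None.

    Assumption: the string encoding is lowercase-insensitive true/false.
    """
    if value is None:
        return None
    text = str(value).strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    return None


def decode_row(row):
    """Return a copy of a string-typed row with logical types restored."""
    decoded = dict(row)
    for field in BOOL_FIELDS:
        decoded[field] = parse_bool(row.get(field))
    variants = row.get("backdoor_variants")
    # Assumption: variants are stored as a JSON string; empty means null.
    decoded["backdoor_variants"] = json.loads(variants) if variants else None
    return decoded


example = decode_row({
    "pid": "42",
    "apps_unusable": "false",
    "backdoor_works": "true",
    "llama70b_detected": "",
    "source_bucket": "untested",
    "backdoor_variants": '[{"backdoor_difficulty": "easy"}]',
})
print(example["llama70b_detected"], example["backdoor_variants"])
```

Rows could be obtained with the Hugging Face `datasets` library, for example `load_dataset("jprivera44/collusion-project-apps-backdoor-inventory", split="detected")`, and passed through `decode_row` one at a time. Inspect a few raw rows first to confirm the actual string encodings before relying on this decoding.
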
Provider: jprivera44