
JALAPENO11/model-inversion-adversarial

Source: Hugging Face | Updated: 2026-04-05 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/JALAPENO11/model-inversion-adversarial
Resource description:
---
language:
- en
license: mit
task_categories:
- text-generation
- token-classification
tags:
- privacy
- pii
- model-inversion
- adversarial
- synthetic
- anonymization
pretty_name: Model Inversion Adversarial Dataset (with BART anonymization)
size_categories:
- 10K<n<100K
---

# Model Inversion Adversarial Dataset

**39,950 (original, anonymized) sentence pairs** (target: 40,000) for black-box model inversion attack research against PII anonymization models. Each record contains the **original PII-rich sentence** and the **BART-anonymized output** produced by a fine-tuned BART-base anonymizer, along with the per-row metadata listed under Key Fields.

## Splits

| Split | Count |
|-------|-------|
| train | 38,032 |
| eval | 1,918 |
| **total** | **39,950** |

## Probing Strategies

Per-strategy counts sum to the 40,000 target rather than the 39,950 pairs delivered.

| Strategy | Count | Purpose |
|---|---|---|
| S1: Entity consistency | 12,000 | 150 probe names × ~80 contexts each |
| S2: Combinatorial PII | 8,000 | Controlled NAME+PHONE/EMAIL/DATE combos |
| S3: Paraphrase consistency | 6,000 | Same PII, different sentence structure |
| S4: Rarity spectrum | 6,000 | Names spanning the common → very_rare spectrum |
| S5: Cross-entity correlation | 4,000 | Email/phone/org correlated to name |
| S6: Edge cases | 4,000 | Dense PII, implicit PII, multi-person |

## Key Fields

| Field | Description |
|-------|-------------|
| `original` | Raw PII-rich sentence (Gemini-generated) |
| `anonymized` | BART-base output (the anonymized version) |
| `probe_entity` | Primary PII entity in the sentence |
| `entity_type` | NAME / PHONE / EMAIL / DATE / ID_DOCUMENT / ADDRESS |
| `strategy` | Which of the 6 probing strategies generated this row |
| `name_rarity` | common / medium / rare / very_rare |
| `attack_difficulty` | 1 (easy) to 5 (hard) heuristic score |
| `split` | train / eval |

## Pipeline

```
generate_dataset.py  → adversarial_dataset_raw.jsonl  (Step 1: Gemini)
query_bart.py        → bart_query_pairs.jsonl         (Step 2: BART)  ← this file
train_inverter.py    → inverter_checkpoint/           (Step 3: train)
evaluate_attack.py   → attack_results.json            (Step 4: eval)
```

Generated: 2026-04-05 | Victim model: facebook/bart-base (fine-tuned)
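The Key Fields table effectively defines a record schema. Below is a minimal sketch of a loader that checks each JSONL row against the values the card documents. It assumes rows are stored one JSON object per line (as the `.jsonl` names in the pipeline suggest) and that `name_rarity` may be absent on rows without a name; both are assumptions, not confirmed by the card. `PairRecord`, `parse_record`, and `load_pairs` are hypothetical names, and the sample record is made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Allowed values as listed in the dataset card's "Key Fields" table.
ENTITY_TYPES = {"NAME", "PHONE", "EMAIL", "DATE", "ID_DOCUMENT", "ADDRESS"}
NAME_RARITIES = {"common", "medium", "rare", "very_rare"}
SPLITS = {"train", "eval"}


@dataclass(frozen=True)
class PairRecord:
    original: str               # raw PII-rich sentence
    anonymized: str             # BART-base output
    probe_entity: str           # primary PII entity in the sentence
    entity_type: str
    strategy: str               # exact label format is undocumented; kept as-is
    name_rarity: Optional[str]  # assumption: may be missing on rows without a name
    attack_difficulty: int      # heuristic score, 1 (easy) to 5 (hard)
    split: str


def parse_record(line: str) -> PairRecord:
    """Parse one JSONL line; raise KeyError/ValueError if it breaks the card's schema."""
    raw = json.loads(line)
    rec = PairRecord(
        original=raw["original"],
        anonymized=raw["anonymized"],
        probe_entity=raw["probe_entity"],
        entity_type=raw["entity_type"],
        strategy=raw["strategy"],
        name_rarity=raw.get("name_rarity"),
        attack_difficulty=int(raw["attack_difficulty"]),
        split=raw["split"],
    )
    if rec.entity_type not in ENTITY_TYPES:
        raise ValueError(f"unexpected entity_type: {rec.entity_type!r}")
    if rec.name_rarity is not None and rec.name_rarity not in NAME_RARITIES:
        raise ValueError(f"unexpected name_rarity: {rec.name_rarity!r}")
    if not 1 <= rec.attack_difficulty <= 5:
        raise ValueError(f"attack_difficulty out of range: {rec.attack_difficulty}")
    if rec.split not in SPLITS:
        raise ValueError(f"unexpected split: {rec.split!r}")
    return rec


def load_pairs(path: str) -> List[PairRecord]:
    """Read a JSONL file (e.g. bart_query_pairs.jsonl), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [parse_record(line) for line in f if line.strip()]


# Illustrative, made-up record; not taken from the dataset.
SAMPLE_LINE = json.dumps({
    "original": "Call Jane Doe at 555-0100 before Friday.",
    "anonymized": "Call [NAME] at [PHONE] before Friday.",
    "probe_entity": "Jane Doe",
    "entity_type": "NAME",
    "strategy": "S2",
    "name_rarity": "common",
    "attack_difficulty": 2,
    "split": "train",
})
rec = parse_record(SAMPLE_LINE)
```

Failing fast on an unexpected `entity_type` or `split` surfaces schema drift between the Step 1 and Step 2 outputs before any training run consumes the data.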
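Before training an inverter, one quick sanity check on the probing strategies is how often the anonymizer lets the probe entity through untouched. The sketch below computes, per group (by default `strategy`; `name_rarity` mirrors the S4 rarity spectrum), the fraction of rows whose `anonymized` text still contains `probe_entity` as a case-insensitive substring. This is a simple string-match baseline chosen for illustration, not the attack or metric implemented by `train_inverter.py` or `evaluate_attack.py`; it only catches entities that survive verbatim and says nothing about what an inverter could reconstruct from rewritten text. The function name and the sample rows (including the `"S1"`/`"S4"` labels) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def verbatim_leak_rate(
    records: Iterable[Mapping[str, object]], group_by: str = "strategy"
) -> Dict[str, float]:
    """Per group, the fraction of rows whose anonymized text still contains
    probe_entity as a case-insensitive substring."""
    totals: Dict[str, int] = defaultdict(int)
    leaks: Dict[str, int] = defaultdict(int)
    for row in records:
        group = str(row[group_by])
        totals[group] += 1
        entity = str(row["probe_entity"]).casefold()
        # Empty entities never count as leaks (an empty string matches everything).
        if entity and entity in str(row["anonymized"]).casefold():
            leaks[group] += 1
    return {g: leaks[g] / totals[g] for g in totals}


# Made-up rows for illustration only; not taken from the dataset.
rows = [
    {"strategy": "S1", "name_rarity": "common", "probe_entity": "Jane Doe",
     "anonymized": "Call [NAME] today."},
    {"strategy": "S1", "name_rarity": "common", "probe_entity": "Jane Doe",
     "anonymized": "Jane Doe signed the form."},
    {"strategy": "S4", "name_rarity": "very_rare", "probe_entity": "Zephyrine Okonkwo",
     "anonymized": "Zephyrine Okonkwo moved house."},
]
by_strategy = verbatim_leak_rate(rows)
by_rarity = verbatim_leak_rate(rows, group_by="name_rarity")
```

Here `by_strategy` is `{"S1": 0.5, "S4": 1.0}` and `by_rarity` is `{"common": 0.5, "very_rare": 1.0}`. On the real data, a higher verbatim rate for rare names would suggest the anonymizer under-detects unfamiliar entities, which is the kind of pattern the S4 strategy appears designed to probe.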
Provider:
JALAPENO11