JALAPENO11/model-inversion-adversarial

Name: JALAPENO11/model-inversion-adversarial
Creator: JALAPENO11
Published: 2026-04-05 17:09:47
License: MIT (per the dataset card metadata)

Source: Hugging Face | Updated: 2026-04-05 | Indexed: 2026-04-12

Download link:

https://hf-mirror.com/datasets/JALAPENO11/model-inversion-adversarial

Description:

---
language:
- en
license: mit
task_categories:
- text-generation
- token-classification
tags:
- privacy
- pii
- model-inversion
- adversarial
- synthetic
- anonymization
pretty_name: Model Inversion Adversarial Dataset (with BART anonymization)
size_categories:
- 10K<n<100K
---

# Model Inversion Adversarial Dataset

**39,950 (original, anonymized) sentence pairs** (target: 40,000) for black-box model inversion attack research against PII anonymization models.

Each record contains the **original PII-rich sentence** and the **BART-anonymized output** produced by a fine-tuned BART-base anonymizer, along with rich metadata.

## Splits

| Split | Count |
|-------|-------|
| train | 38,032 |
| eval | 1,918 |
| **total** | **39,950** |

## Probing Strategies

| Strategy | Count | Purpose |
|---|---|---|
| S1: Entity consistency | 12,000 | 150 probe names × ~80 contexts each |
| S2: Combinatorial PII | 8,000 | Controlled NAME+PHONE/EMAIL/DATE combos |
| S3: Paraphrase consistency | 6,000 | Same PII, different sentence structure |
| S4: Rarity spectrum | 6,000 | common → very_rare name spectrum |
| S5: Cross-entity correlation | 4,000 | Email/phone/org correlated to name |
| S6: Edge cases | 4,000 | Dense PII, implicit PII, multi-person |

## Key Fields

| Field | Description |
|-------|-------------|
| `original` | Raw PII-rich sentence (Gemini-generated) |
| `anonymized` | BART-base output (the anonymized version) |
| `probe_entity` | Primary PII entity in the sentence |
| `entity_type` | NAME / PHONE / EMAIL / DATE / ID_DOCUMENT / ADDRESS |
| `strategy` | Which of the 6 probing strategies generated this row |
| `name_rarity` | common / medium / rare / very_rare |
| `attack_difficulty` | Heuristic score from 1 (easy) to 5 (hard) |
| `split` | train / eval |

## Pipeline

```
generate_dataset.py  → adversarial_dataset_raw.jsonl   (Step 1: Gemini)
query_bart.py        → bart_query_pairs.jsonl          (Step 2: BART)  ← this file
train_inverter.py    → inverter_checkpoint/            (Step 3: train)
evaluate_attack.py   → attack_results.json             (Step 4: eval)
```

Generated: 2026-04-05 | Victim model: facebook/bart-base (fine-tuned)

Provider:

JALAPENO11
