
Mengieong/SEED_balanced

Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Mengieong/SEED_balanced
Resource description:
---
pretty_name: SEED_balanced
language:
- en
license: other
task_categories:
- image-classification
- image-segmentation
tags:
- deepfake
- face-forensics
- diffusion
- sequential-editing
- provenance-tracing
- facial-manipulation
size_categories:
- 10K<n<100K
---

# SEED_balanced

![SEED overview](./assets/seed_overview.png)

**SEED_balanced** is the public balanced release of **SEED**, a benchmark for **provenance tracing in sequential deepfake facial edits**. Unlike conventional deepfake datasets that focus on single-step manipulations or binary real/fake detection, SEED models **multi-step diffusion-based facial editing trajectories** and supports three complementary tasks: **Authenticity Analysis**, **Editing Trace Analysis**, and **Spatial Evidence Analysis**.

The full SEED benchmark contains **91,526 images** with step-wise provenance annotations, while the balanced benchmark partition contains **100,000 images** with equal proportions of sequence lengths \(L=0,1,2,3,4\).

---

## Overview

| Item | Description |
|---|---|
| Dataset name | SEED_balanced |
| Full benchmark scale | 91,526 images |
| Balanced benchmark scale | 100,000 images |
| Domain | Facial imagery |
| Editing type | Sequential diffusion-based facial edits |
| Source real datasets | FFHQ, CelebAMask-HQ |
| Step-wise metadata | Edit order, attribute labels, prompts, masks, editor identity |
| Supported tasks | Authenticity, Editing Trace, Spatial Evidence |
| Official evaluation | CodaBench |

SEED is built from **FFHQ** and **CelebAMask-HQ** and edited using diffusion-based pipelines. Each manipulated sample is generated by applying **one to four attribute edits sequentially**, and each step is logged with provenance metadata including the edited attribute, prompt, mask, and editing model.
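The step-wise provenance metadata described above (edit order, attribute, prompt, mask, editor) can be pictured as a small record per editing step. The following is only an illustrative sketch: the class names, field names, and example values are hypothetical and do not reflect the actual annotation format shipped with the dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EditStep:
    """One step of an editing chain (all field names are hypothetical)."""
    order: int                       # 1-based position in the chain
    attribute: str                   # e.g. "Hair", "Lip"
    prompt: str                      # text condition given to the editor
    editor: str                      # e.g. "LEdits", "SDXL", "UltraEdit"
    mask_path: Optional[str] = None  # location of the region mask, if any


@dataclass
class Sample:
    """An image together with its (possibly empty) editing history."""
    image_path: str
    steps: List[EditStep] = field(default_factory=list)

    @property
    def length(self) -> int:
        # Sequence length L; L = 0 corresponds to a real image.
        return len(self.steps)

    @property
    def is_real(self) -> bool:
        return self.length == 0

    def attribute_sequence(self) -> List[str]:
        # Attributes in the order the edits were applied.
        ordered = sorted(self.steps, key=lambda s: s.order)
        return [s.attribute for s in ordered]


# Hypothetical two-step sample, stored out of order on purpose.
sample = Sample(
    "example.png",
    [
        EditStep(2, "Lip", "Change the lipstick color to red.", "SDXL"),
        EditStep(1, "Hair", "Turn the hair blonde.", "LEdits"),
    ],
)
print(sample.length, sample.is_real, sample.attribute_sequence())
```

A record like this maps onto the three tasks: `is_real` for Authenticity Analysis, `attribute_sequence()` for Editing Trace Analysis, and the per-step masks for Spatial Evidence Analysis.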
---

## Supported Tasks

| Task | Description | Output |
|---|---|---|
| Authenticity Analysis | Distinguish real images from sequentially edited ones | Binary or sequence-based decision |
| Editing Trace Analysis | Predict edited attributes and their temporal order | Ordered attribute sequence |
| Spatial Evidence Analysis | Localize manipulated regions | Mask / localization map |

These three tasks are explicitly described in the paper and illustrated in the benchmark overview figure.

---

## Data Construction

![Construction pipeline](./assets/seed_pipeline.png)

SEED is constructed in three stages:

| Stage | Description |
|---|---|
| Preprocessing | Build attribute-specific masks and text conditions |
| Sequential manipulation | Sample sequence length \(L \in \{1,2,3,4\}\), choose attributes, and apply a diffusion editor step by step |
| Quality evaluation | Filter degenerate results using perceptual and semantic consistency checks |

The editing pipeline uses multiple diffusion editors, including **LEdits**, **SDXL**, and **SD3-style models fine-tuned with UltraEdit**. Prompt templates are varied to preserve edit intent while increasing linguistic diversity.

---

## Prompt Template Examples

| Attribute | Instruction Template | Caption Template |
|---|---|---|
| Eyes | Make the eyes {color}. | A person with {color} eyes. |
| Lip | Change the lipstick color to {color}. | A person with {color} lipstick. |
| Hair | Turn the hair {color}. / Make the hair {style}. | A person with {color} hair. / A person with {style} hair. |
| Eyebrows | Make the eyebrows {style}. | A person with {style} eyebrows. |
| Glasses | Add a pair of {glasses}. | A person wearing {glasses}. |
| Hat | Add a {hat}. | A person wearing a {hat}. |

These prompt templates are taken from the paper’s dataset construction section.
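As a rough illustration of how the templates in the table above turn into concrete prompts, the sketch below fills the placeholders with Python's `str.format`. The template strings are copied from the table; the `TEMPLATES` layout, the `fill` helper, and the example values (`"red"`, `"curly"`) are assumptions made for this example, not part of the SEED pipeline.

```python
from typing import Dict, List, Tuple

# (instruction template, caption template) pairs, copied from the table.
TEMPLATES: Dict[str, List[Tuple[str, str]]] = {
    "Eyes": [("Make the eyes {color}.", "A person with {color} eyes.")],
    "Lip": [("Change the lipstick color to {color}.",
             "A person with {color} lipstick.")],
    "Hair": [
        ("Turn the hair {color}.", "A person with {color} hair."),
        ("Make the hair {style}.", "A person with {style} hair."),
    ],
    "Eyebrows": [("Make the eyebrows {style}.",
                  "A person with {style} eyebrows.")],
    "Glasses": [("Add a pair of {glasses}.", "A person wearing {glasses}.")],
    "Hat": [("Add a {hat}.", "A person wearing a {hat}.")],
}


def fill(attribute: str, variant: int = 0, **values: str) -> Tuple[str, str]:
    """Return the (instruction, caption) pair with placeholders filled in.

    Raises KeyError if the attribute is unknown or a placeholder value
    required by the chosen template is missing.
    """
    instruction, caption = TEMPLATES[attribute][variant]
    return instruction.format(**values), caption.format(**values)


# Example values are made up for illustration.
print(fill("Hair", 0, color="red"))
print(fill("Hair", 1, style="curly"))
```

Keeping the instruction and caption as a pair makes it easy to hand either form to an editor, depending on whether it expects an edit instruction or a target description.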
---

## Dataset Statistics

| Statistic | Value |
|---|---:|
| Full SEED images | 91,526 |
| Sequence length \(L=1\) | 29.91% |
| Sequence length \(L=2\) | 26.21% |
| Sequence length \(L=3\) | 21.88% |
| Sequence length \(L=4\) | 22.00% |
| Editor: UltraEdit | 38.28% |
| Editor: LEdits | 37.34% |
| Editor: SDXL | 24.38% |

| Attribute | Proportion |
|---|---:|
| Lip | 28% |
| Eyebrow | 18% |
| Eye | 17% |
| Hat | 14% |
| Hair | 14% |
| Glasses | 9% |

These distributions are reported in the dataset statistics section of the paper.

---

## Balanced Partition and Split Protocol

| Length bucket | Count |
|---|---:|
| \(L=0\), real | 20,000 |
| \(L=1\) | 20,000 |
| \(L=2\) | 20,000 |
| \(L=3\) | 20,000 |
| \(L=4\) | 20,000 |
| **Total** | **100,000** |

---

## Benchmark Evaluation

Official evaluation is conducted on **CodaBench** using three metrics:

| Metric | Meaning |
|---|---|
| Fixed-Acc | Token-level accuracy under a fixed sequence comparison protocol |
| Adaptive-Acc | Token-level accuracy under adaptive sequence comparison |
| Full-Acc | Exact sequence match, the strictest metric |

The paper emphasizes that **Full-Acc** is the strictest metric because the whole predicted edit history must match the ground truth.

### Average Results Reported in the Paper

| Model | Fixed-Acc | Adaptive-Acc | Full-Acc |
|---|---:|---:|---:|
| Shuai et al. | 71.50 | 54.07 | 48.72 |
| FreqNet | 70.08 | 52.59 | 48.27 |
| Ba et al. | 68.78 | 54.80 | 50.80 |
| SeqFakeFormer | 81.62 | 68.53 | 66.97 |
| FAITH (DCT) | 81.70 | 68.56 | 67.02 |
| FAITH (FFT) | 81.75 | 68.58 | 67.03 |
| **FAITH (DWT)** | **81.87** | **68.84** | **67.26** |

The paper reports that performance drops as edit chains become longer, and that **DWT-based FAITH** is the strongest average variant overall.
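For local validation, the three metrics in the table above can be approximated as follows. This is a sketch under assumptions, not the official CodaBench scorer: it assumes Fixed-Acc pads both sequences to the maximum chain length (4) with a "no manipulation" token and compares position by position, and that Adaptive-Acc compares only up to the longer of the two sequences. The `PAD` token, the function names, and the choice to skip pairs where both sequences are empty in `adaptive_acc` are all choices made for this example.

```python
from typing import List, Sequence

PAD = "none"   # hypothetical "no manipulation" token
MAX_LEN = 4    # longest edit chain in SEED


def _pad(seq: Sequence[str], length: int) -> List[str]:
    return list(seq) + [PAD] * (length - len(seq))


def fixed_acc(preds, gts, max_len: int = MAX_LEN) -> float:
    """Token accuracy after padding every sequence to max_len (assumed protocol)."""
    correct = total = 0
    for p, g in zip(preds, gts):
        p, g = _pad(p[:max_len], max_len), _pad(g[:max_len], max_len)
        correct += sum(a == b for a, b in zip(p, g))
        total += max_len
    return correct / total if total else 0.0


def adaptive_acc(preds, gts) -> float:
    """Token accuracy over max(len(pred), len(gt)) positions (assumed protocol)."""
    correct = total = 0
    for p, g in zip(preds, gts):
        n = max(len(p), len(g))
        if n == 0:
            continue  # assumption: nothing to compare when both are empty
        correct += sum(a == b for a, b in zip(_pad(p, n), _pad(g, n)))
        total += n
    return correct / total if total else 0.0


def full_acc(preds, gts) -> float:
    """Fraction of samples whose whole predicted sequence matches exactly."""
    pairs = list(zip(preds, gts))
    hits = sum(list(p) == list(g) for p, g in pairs)
    return hits / len(pairs) if pairs else 0.0


# Toy example: one wrong second step, one correct chain, one real image.
preds = [["Hair", "Lip"], ["Eyes"], []]
gts = [["Hair", "Hat"], ["Eyes"], []]
print(fixed_acc(preds, gts), adaptive_acc(preds, gts), full_acc(preds, gts))
```

Under these assumptions Fixed-Acc is the most lenient of the three, since matching padding positions count as correct, which is consistent with the ordering of the columns in the results table. Official numbers should always come from the CodaBench evaluation.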
### Robustness Settings

The paper also evaluates robustness under:

| Perturbation | Levels |
|---|---|
| JPEG compression | 25%, 50%, 75% |
| Gaussian noise | 10%, 15%, 20% |

---

## Repository Contents

This Hugging Face repository hosts the **public release** only.

| File | Description |
|---|---|
| `SEED_subset_1.zip`, `SEED_subset_2.zip`, `SEED_subset_3.zip` | Public training archives |
| `FAITH` | Baseline execution folder |
| `prediction.zip` | Optional example submission file |

This repository does **not** contain:

- hidden test labels
- hidden reference annotations
- official private evaluation data

Those components are handled through **CodaBench**.

---

## Intended Usage

This dataset is intended for:

- deepfake forensics research
- diffusion-edit provenance tracing
- edit-order prediction
- localization and evidence analysis
- robustness benchmarking under image degradation

Recommended workflow:

1. Download and extract the public training data from this repository.
2. Train or fine-tune your method locally.
3. Validate locally using your own protocol.
4. Submit predictions to **CodaBench** for official hidden-set evaluation.

**CodaBench:** [here](https://www.codabench.org/competitions/edit/15351/)

---

## Data Usage Policy

Please use this dataset for **research, benchmarking, and forensic analysis** only.

Please do **not** use it for:

- identity recognition or surveillance
- face-based profiling
- deceptive content generation
- unauthorized inference about real individuals

Users should also respect the licenses and usage conditions of the original source datasets and any benchmark-specific release conditions.

---

# FAITH Baseline Setup and Usage

This repository provides the **FAITH baseline** and the associated training data package for the SeqDeepFake setting.

## Repository Contents

After downloading the full repository, the top-level structure is expected to look like this:

```text
.
├── FAITH/
├── assets/
├── .gitattributes
├── README.md
├── req.txt
└── seqdeepfake_train_data.zip
```

## 1. Environment Setup

Create a new conda environment from `req.txt`, then activate it:

```bash
conda create -n <environment-name> --file req.txt
conda activate <environment-name>
```

Example:

```bash
conda create -n faith --file req.txt
conda activate faith
```

## 2. Download the Complete Repository

Please download the **complete repository contents** from Hugging Face, not only the code folder. You should have all of the following at the repository root:

- `FAITH/`
- `assets/`
- `req.txt`
- `README.md`
- `seqdeepfake_train_data.zip`

If you download through the web interface, make sure the downloaded archive is fully extracted before continuing.

## 3. Prepare the Training Data

The training data is provided as:

```text
seqdeepfake_train_data.zip
```

Unzip this file into the `FAITH` directory by running the following command from the repository root:

```bash
unzip seqdeepfake_train_data.zip -d FAITH/
```

Then rename the extracted folder to `data`. For example, if the extracted folder is named `seqdeepfake_train_data`, run:

```bash
mv FAITH/seqdeepfake_train_data FAITH/data
```

After this step, the expected structure should be:

```text
.
├── FAITH/
│   ├── data/
│   └── ...
├── assets/
├── .gitattributes
├── README.md
├── req.txt
└── seqdeepfake_train_data.zip
```

## 4. Verify the Data Placement

Before running the baseline, confirm that the dataset is located at:

```text
FAITH/data
```

That is the expected folder name used by the baseline instructions in this repository.

## 5. Run the Baseline

After the environment is ready and the dataset has been placed in `FAITH/data`, enter the `FAITH` directory and run the baseline script. Because `train.sh` is a shell script, launch it with `bash` rather than `python`:

```bash
cd FAITH
bash train.sh
```

If the entry script has a different name in your copy of the repository, use that name in the command above.
If the repository provides separate scripts for training and evaluation, run the appropriate one. Both are shell scripts, so launch them with `bash` rather than `python`. For training:

```bash
cd FAITH
bash train.sh
```

or, for evaluation:

```bash
cd FAITH
bash test.sh
```

## Notes

1. Make sure you download the **full repository contents**, not only individual files.
2. Make sure the extracted dataset folder is renamed exactly to `data`.
3. If `unzip` is not installed on your system, install it first or extract the archive manually.
4. If the extracted folder name is different on your machine, rename that extracted folder to `FAITH/data`.
5. If the project has a custom launch script, use that script instead of the commands shown in this README.

## Troubleshooting

### `PackagesNotFoundError` during conda creation

This usually means some packages in `req.txt` are unavailable in your current conda channels. In that case, try updating conda first, or recreate the environment with the channels required by your project.

### The dataset cannot be found

Check that the final path is exactly:

```text
FAITH/data
```

### `python: can't open file ...`

This means the script named in the command does not exist under that name in the current directory. Check the actual script name in the `FAITH` folder and make sure you are running the command from inside `FAITH`. Files ending in `.sh` are shell scripts and should be run with `bash`, not `python`.
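The data-placement check from the setup steps and the troubleshooting section can also be scripted. The sketch below only tests that the paths named in this README exist relative to the repository root; the `EXPECTED` list and the `missing_paths` helper are hypothetical names introduced for this example and are not part of the FAITH code.

```python
from pathlib import Path
from typing import List

# Paths this README expects at the repository root after setup.
EXPECTED = ["FAITH", "FAITH/data", "req.txt", "README.md"]


def missing_paths(repo_root: str) -> List[str]:
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


# Run from the repository root; an empty list means the layout looks right.
print(missing_paths("."))
```

If `FAITH/data` shows up in the output, the archive was probably extracted under a different folder name and still needs to be renamed.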
Provided by:
Mengieong