Mengieong/SEED_balanced
Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Mengieong/SEED_balanced
Description:
---
pretty_name: SEED_balanced
language:
- en
license: other
task_categories:
- image-classification
- image-segmentation
tags:
- deepfake
- face-forensics
- diffusion
- sequential-editing
- provenance-tracing
- facial-manipulation
size_categories:
- 10K<n<100K
---
# SEED_balanced

**SEED_balanced** is the public balanced release of **SEED**, a benchmark for **provenance tracing in sequential deepfake facial edits**. Unlike conventional deepfake datasets that focus on single-step manipulations or binary real/fake detection, SEED models **multi-step diffusion-based facial editing trajectories** and supports three complementary tasks: **Authenticity Analysis**, **Editing Trace Analysis**, and **Spatial Evidence Analysis**. The full SEED benchmark contains **91,526 images** with step-wise provenance annotations, while the balanced benchmark partition contains **100,000 images** with equal proportions of sequence lengths \(L=0,1,2,3,4\).

---
## Overview
| Item | Description |
|---|---|
| Dataset name | SEED_balanced |
| Full benchmark scale | 91,526 images |
| Balanced benchmark scale | 100,000 images |
| Domain | Facial imagery |
| Editing type | Sequential diffusion-based facial edits |
| Source real datasets | FFHQ, CelebAMask-HQ |
| Step-wise metadata | Edit order, attribute labels, prompts, masks, editor identity |
| Supported tasks | Authenticity, Editing Trace, Spatial Evidence |
| Official evaluation | CodaBench |

SEED is built from **FFHQ** and **CelebAMask-HQ** and edited using diffusion-based pipelines. Each manipulated sample is generated by applying **one to four attribute edits sequentially**, and each step is logged with provenance metadata including edited attribute, prompt, mask, and editing model.

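
The step-wise metadata described above can be pictured as one record per image holding an ordered list of edit steps. The sketch below is only an illustration: the class names, field names, and example values are hypothetical and do not reflect the actual annotation file format shipped with SEED.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EditStep:
    """One logged edit. All field names here are hypothetical."""
    order: int                       # 1-based position in the edit chain
    attribute: str                   # edited attribute, e.g. "Hair"
    prompt: str                      # text prompt given to the editor
    editor: str                      # editing model, e.g. "LEdits"
    mask_path: Optional[str] = None  # location of the edit mask, if stored


@dataclass
class ProvenanceRecord:
    """Edit history of one image; an empty history means a real image."""
    image_path: str
    steps: List[EditStep] = field(default_factory=list)

    @property
    def sequence_length(self) -> int:
        return len(self.steps)

    @property
    def is_real(self) -> bool:
        return not self.steps

    def attribute_sequence(self) -> List[str]:
        # Return attributes in the order the edits were applied.
        return [s.attribute for s in sorted(self.steps, key=lambda s: s.order)]


record = ProvenanceRecord(
    image_path="example.png",  # hypothetical file name
    steps=[
        EditStep(order=1, attribute="Hair", prompt="Turn the hair red.", editor="LEdits"),
        EditStep(order=2, attribute="Lip", prompt="Change the lipstick color to pink.", editor="SDXL"),
    ],
)
print(record.sequence_length, record.attribute_sequence())
```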
---
## Supported Tasks
| Task | Description | Output |
|---|---|---|
| Authenticity Analysis | Distinguish real images from sequentially edited ones | Binary or sequence-based decision |
| Editing Trace Analysis | Predict edited attributes and their temporal order | Ordered attribute sequence |
| Spatial Evidence Analysis | Localize manipulated regions | Mask / localization map |

These three tasks are explicitly described in the paper and illustrated in the benchmark overview figure.

---
## Data Construction

SEED is constructed in three stages:
| Stage | Description |
|---|---|
| Preprocessing | Build attribute-specific masks and text conditions |
| Sequential manipulation | Sample sequence length \(L \in \{1,2,3,4\}\), choose attributes, and apply a diffusion editor step by step |
| Quality evaluation | Filter degenerate results using perceptual and semantic consistency checks |

The editing pipeline uses multiple diffusion editors, including **LEdits**, **SDXL**, and **SD3-style models fine-tuned with UltraEdit**. Prompt templates are varied to preserve edit intent while increasing linguistic diversity.

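
The sequential manipulation stage can be sketched as drawing an edit plan before any image is touched. The code below is a simplified illustration, not the authors' pipeline: uniform sampling, the rule that no attribute is edited twice, and picking an editor independently per step are all assumptions, and `sample_edit_plan` is a hypothetical helper.

```python
import random

# Attribute and editor names as they appear in this card.
ATTRIBUTES = ["Eyes", "Lip", "Hair", "Eyebrows", "Glasses", "Hat"]
EDITORS = ["LEdits", "SDXL", "UltraEdit"]


def sample_edit_plan(rng, max_length=4):
    """Draw a sequence length L in {1, ..., max_length} and an ordered edit plan."""
    length = rng.randint(1, max_length)
    # Assumption: each attribute is edited at most once per image.
    attributes = rng.sample(ATTRIBUTES, k=length)
    # Assumption: the editor is chosen independently for every step.
    return [
        {"step": index + 1, "attribute": attribute, "editor": rng.choice(EDITORS)}
        for index, attribute in enumerate(attributes)
    ]


rng = random.Random(0)  # fixed seed so the plan is reproducible
for step in sample_edit_plan(rng):
    print(step)
```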
---
## Prompt Template Examples
| Attribute | Instruction Template | Caption Template |
|---|---|---|
| Eyes | Make the eyes {color}. | A person with {color} eyes. |
| Lip | Change the lipstick color to {color}. | A person with {color} lipstick. |
| Hair | Turn the hair {color}. / Make the hair {style}. | A person with {color} hair. / A person with {style} hair. |
| Eyebrows | Make the eyebrows {style}. | A person with {style} eyebrows. |
| Glasses | Add a pair of {glasses}. | A person wearing {glasses}. |
| Hat | Add a {hat}. | A person wearing a {hat}. |

These prompt templates are taken from the paper’s dataset construction section.

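
As an illustration of how such templates expand into concrete prompts, the sketch below fills the placeholders with Python string formatting. The template strings are copied from the table above; the helper `render_prompts` and the fill-in values (`"blue"`, `"sunglasses"`) are hypothetical. Hair is left out because it has two template variants.

```python
# Instruction and caption templates copied from the table above.
INSTRUCTION_TEMPLATES = {
    "Eyes": "Make the eyes {color}.",
    "Lip": "Change the lipstick color to {color}.",
    "Eyebrows": "Make the eyebrows {style}.",
    "Glasses": "Add a pair of {glasses}.",
    "Hat": "Add a {hat}.",
}
CAPTION_TEMPLATES = {
    "Eyes": "A person with {color} eyes.",
    "Lip": "A person with {color} lipstick.",
    "Eyebrows": "A person with {style} eyebrows.",
    "Glasses": "A person wearing {glasses}.",
    "Hat": "A person wearing a {hat}.",
}


def render_prompts(attribute, **values):
    """Return the (instruction, caption) pair for one edit step."""
    instruction = INSTRUCTION_TEMPLATES[attribute].format(**values)
    caption = CAPTION_TEMPLATES[attribute].format(**values)
    return instruction, caption


print(render_prompts("Eyes", color="blue"))
print(render_prompts("Glasses", glasses="sunglasses"))
```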
---
## Dataset Statistics
| Statistic | Value |
|---|---:|
| Full SEED images | 91,526 |
| Sequence length \(L=1\) | 29.91% |
| Sequence length \(L=2\) | 26.21% |
| Sequence length \(L=3\) | 21.88% |
| Sequence length \(L=4\) | 22.00% |
| Editor: UltraEdit | 38.28% |
| Editor: LEdits | 37.34% |
| Editor: SDXL | 24.38% |

| Attribute | Proportion |
|---|---:|
| Lip | 28% |
| Eyebrow | 18% |
| Eye | 17% |
| Hat | 14% |
| Hair | 14% |
| Glasses | 9% |

These distributions are reported in the dataset statistics section of the paper.

---
## Balanced Partition and Split Protocol
| Length bucket | Count |
|---|---:|
| \(L=0\), real | 20,000 |
| \(L=1\) | 20,000 |
| \(L=2\) | 20,000 |
| \(L=3\) | 20,000 |
| \(L=4\) | 20,000 |
| **Total** | **100,000** |
---
## Benchmark Evaluation
Official evaluation is conducted on **CodaBench** using three metrics:
| Metric | Meaning |
|---|---|
| Fixed-Acc | Token-level accuracy under a fixed sequence comparison protocol |
| Adaptive-Acc | Token-level accuracy under adaptive sequence comparison |
| Full-Acc | Exact sequence match, the strictest metric |

The paper emphasizes that **Full-Acc** is the strictest metric because the whole predicted edit history must match the ground truth.

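
To make the difference between the three metrics concrete, here is a per-sample sketch. It follows a common reading from the sequential deepfake literature and is not the official CodaBench scorer: the padding token `"NA"`, the fixed length of 4, and the choice to compare over the longer of the two sequences in the adaptive case are assumptions.

```python
NO_EDIT = "NA"  # hypothetical padding token meaning "no edit at this step"
MAX_STEPS = 4   # SEED applies at most four edits per image


def _pad(sequence, length):
    """Right-pad a sequence with NO_EDIT up to the given length."""
    sequence = list(sequence)
    return sequence + [NO_EDIT] * (length - len(sequence))


def fixed_acc(pred, truth, max_steps=MAX_STEPS):
    """Token accuracy after padding both sequences to a fixed length."""
    p = _pad(list(pred)[:max_steps], max_steps)
    t = _pad(list(truth)[:max_steps], max_steps)
    return sum(a == b for a, b in zip(p, t)) / max_steps


def adaptive_acc(pred, truth):
    """Token accuracy over the longer of the two sequences only."""
    n = max(len(pred), len(truth))
    if n == 0:
        return 1.0  # both sequences are empty: a real image predicted as real
    p, t = _pad(pred, n), _pad(truth, n)
    return sum(a == b for a, b in zip(p, t)) / n


def full_acc(pred, truth):
    """1.0 only if the whole predicted edit history matches exactly."""
    return float(list(pred) == list(truth))


truth = ["Hair", "Lip", "Hat"]
pred = ["Hair", "Lip"]
print(fixed_acc(pred, truth), adaptive_acc(pred, truth), full_acc(pred, truth))
```

In this example the prediction misses the last edit, so it matches 3 of 4 padded positions under the fixed protocol, 2 of 3 positions under the adaptive one, and fails the exact match.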
### Average Results Reported in the Paper
| Model | Fixed-Acc | Adaptive-Acc | Full-Acc |
|---|---:|---:|---:|
| Shuai et al. | 71.50 | 54.07 | 48.72 |
| FreqNet | 70.08 | 52.59 | 48.27 |
| Ba et al. | 68.78 | 54.80 | 50.80 |
| SeqFakeFormer | 81.62 | 68.53 | 66.97 |
| FAITH (DCT) | 81.70 | 68.56 | 67.02 |
| FAITH (FFT) | 81.75 | 68.58 | 67.03 |
| **FAITH (DWT)** | **81.87** | **68.84** | **67.26** |

The paper reports that performance drops as edit chains become longer, and that **DWT-based FAITH** is the strongest average variant overall.

### Robustness Settings
The paper also evaluates robustness under:
| Perturbation | Levels |
|---|---|
| JPEG compression | 25%, 50%, 75% |
| Gaussian noise | 10%, 15%, 20% |
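
For local robustness checks, the Gaussian noise setting could be approximated as below. This is a sketch under a loud assumption: this card does not say how the percentage levels map to noise strength, so the code treats a level of 10% as a standard deviation of 0.10 × 255 on 8-bit pixel values. JPEG compression needs an image library and is not shown.

```python
import random


def add_gaussian_noise(pixels, level, rng):
    """Add zero-mean Gaussian noise to a flat list of 8-bit pixel values.

    Assumption: `level` (e.g. 0.10 for "10%") is the noise standard
    deviation expressed as a fraction of the 0..255 range.
    """
    sigma = level * 255.0
    noisy = []
    for value in pixels:
        perturbed = value + rng.gauss(0.0, sigma)
        # Round and clip back into the valid 8-bit range.
        noisy.append(min(255, max(0, round(perturbed))))
    return noisy


rng = random.Random(0)        # fixed seed for repeatable perturbations
row = [0, 64, 128, 192, 255]  # toy pixel row, not real data
for level in (0.10, 0.15, 0.20):
    print(level, add_gaussian_noise(row, level, rng))
```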
---
## Repository Contents
This Hugging Face repository hosts the **public release** only.
| File | Description |
|---|---|
| `SEED_subset_1.zip`, `SEED_subset_2.zip`, `SEED_subset_3.zip` | Public training archives |
| `FAITH` | Baseline execution folder |
| `prediction.zip` | Optional example submission file |

This repository does **not** contain:
- hidden test labels
- hidden reference annotations
- official private evaluation data

Those components are handled through **CodaBench**.

---
## Intended Usage
This dataset is intended for:
- deepfake forensics research
- diffusion-edit provenance tracing
- edit-order prediction
- localization and evidence analysis
- robustness benchmarking under image degradation

Recommended workflow:
1. Download and extract the public training data from this repository.
2. Train or fine-tune your method locally.
3. Validate locally using your own protocol.
4. Submit predictions to **CodaBench** for official hidden-set evaluation.

**CodaBench:** [competition page](https://www.codabench.org/competitions/edit/15351/)

---
## Data Usage Policy
Please use this dataset for **research, benchmarking, and forensic analysis** only.
Please do **not** use it for:
- identity recognition or surveillance
- face-based profiling
- deceptive content generation
- unauthorized inference about real individuals

Users should also respect the licenses and usage conditions of the original source datasets and any benchmark-specific release conditions.

---
# FAITH Baseline Setup and Usage
This repository provides the **FAITH baseline** and the associated training data package for the SeqDeepFake setting.
## Repository Contents
After downloading the full repository, the top-level structure is expected to look like this:
```text
.
├── FAITH/
├── assets/
├── .gitattributes
├── README.md
├── req.txt
└── seqdeepfake_train_data.zip
```
## 1. Environment Setup
Create a new conda environment from `req.txt`, then activate it:
```bash
conda create -n <environment-name> --file req.txt
conda activate <environment-name>
```
Example:
```bash
conda create -n faith --file req.txt
conda activate faith
```
## 2. Download the Complete Repository
Please download the **complete repository contents** from Hugging Face, not only the code folder.
You should have all of the following at the repository root:
- `FAITH/`
- `assets/`
- `req.txt`
- `README.md`
- `seqdeepfake_train_data.zip`

If you download the repository as an archive from the web interface, make sure it is fully extracted before continuing.

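
A quick way to confirm that the download is complete is to check for the expected top-level entries. The sketch below uses only the standard library; the helper name `missing_entries` is hypothetical, and the entry list is taken from the list above.

```python
from pathlib import Path

# Entries expected at the repository root, as listed above.
EXPECTED_ENTRIES = [
    "FAITH",
    "assets",
    "req.txt",
    "README.md",
    "seqdeepfake_train_data.zip",
]


def missing_entries(repo_root="."):
    """Return the expected top-level entries that are absent from repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# Run from the repository root; an empty list means nothing is missing.
print("Missing entries:", missing_entries("."))
```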
## 3. Prepare the Training Data
The training data is provided as:
```text
seqdeepfake_train_data.zip
```
Unzip this file into the `FAITH` directory, then rename the extracted folder to `data`.
Run the following commands from the repository root:
```bash
unzip seqdeepfake_train_data.zip -d FAITH/
```
The name of the extracted folder depends on the archive. If it is named `seqdeepfake_train_data`, for example, rename it with:
```bash
mv FAITH/seqdeepfake_train_data FAITH/data
```
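
On systems without `unzip`, the same two steps can be done with Python's standard library. This is a sketch that assumes the archive contains exactly one top-level folder; the helper name `prepare_data` is hypothetical, and it stops with an error instead of guessing if the archive layout is different.

```python
import shutil
import zipfile
from pathlib import Path


def prepare_data(repo_root="."):
    """Extract the training archive into FAITH/ and rename the result to FAITH/data."""
    root = Path(repo_root)
    archive = root / "seqdeepfake_train_data.zip"
    faith_dir = root / "FAITH"
    target = faith_dir / "data"
    if target.exists():
        return target  # data is already in place, nothing to do

    with zipfile.ZipFile(archive) as zf:
        # Assumption: the archive holds exactly one top-level folder.
        top_level = {name.split("/")[0] for name in zf.namelist()}
        if len(top_level) != 1:
            raise RuntimeError(f"expected one top-level folder, found {sorted(top_level)}")
        zf.extractall(faith_dir)

    extracted = faith_dir / top_level.pop()
    if extracted != target:
        # Equivalent to the `mv` command above.
        shutil.move(str(extracted), str(target))
    return target
```

Calling `prepare_data(".")` from the repository root returns the path to `FAITH/data` once the archive has been extracted and renamed.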
After this step, the expected structure should be:
```text
.
├── FAITH/
│   ├── data/
│   └── ...
├── assets/
├── .gitattributes
├── README.md
├── req.txt
└── seqdeepfake_train_data.zip
```
## 4. Verify the Data Placement
Before running the baseline, confirm that the dataset is located at:
```text
FAITH/data
```
That is the expected folder name used by the baseline instructions in this repository.
## 5. Run the Baseline
After the environment is ready and the dataset has been placed in `FAITH/data`, enter the `FAITH` directory and run the baseline script. The `.sh` extension indicates a shell script, so launch it with `bash` rather than `python`:
```bash
cd FAITH
bash train.sh
```
If the entry script in your copy of the repository has a different name, replace `train.sh` with the actual script name.
If your repository provides separate scripts for training and evaluation, use the appropriate one, for example:
```bash
cd FAITH
bash train.sh
```
or
```bash
cd FAITH
bash test.sh
```
## Notes
1. Make sure you download the **full repository contents**, not only individual files.
2. Make sure the extracted dataset folder is renamed exactly to `data`.
3. If `unzip` is not installed on your system, install it first or extract the archive manually.
4. If the extracted folder name is different on your machine, rename that extracted folder to `FAITH/data`.
5. If the project has a custom launch script, use that script instead of the `train.sh` and `test.sh` commands shown above.
## Troubleshooting
### `PackagesNotFoundError` during conda creation
This usually means some packages in `req.txt` are unavailable in your current conda channels. In that case, try updating conda first, or recreate the environment with the channels required by your project.
### The dataset cannot be found
Check that the final path is exactly:
```text
FAITH/data
```
### `python: can't open file ...`
This means the script named in the command was not found. Check that you are inside the `FAITH` folder and that the script name in the command matches the actual entry script there.

---
Provider: Mengieong



