DDA-Training-Set
Source: ModelScope Community; Updated: 2026-01-09; Indexed: 2025-12-20
Download link:
https://modelscope.cn/datasets/JunweiXi/DDA-Training-Set
Description:
# DDA Training Set
### Official Dataset for **Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable**
**Conference:** 39th Conference on Neural Information Processing Systems (NeurIPS 2025). **Paper:** https://arxiv.org/abs/2505.14359
---
#### Dataset Description
This dataset serves as the core training data for the paper **"Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable"**.
It is designed to mitigate the **Format Bias**, **Content Bias**, and **Size Bias** that are prevalent in existing AI-generated image (AIGI) detection datasets. Built from the **MSCOCO** training set, the dataset uses the **Dual Data Alignment (DDA)** technique to generate synthetic images so that each "real" image and its "fake" counterpart are closely aligned in both the pixel and frequency domains.
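The card does not say how pixel- and frequency-domain alignment is measured. As a rough illustration only, the sketch below compares a real/fake pair in both domains using two metrics chosen here for simplicity: the mean absolute pixel difference, and the mean absolute difference of the log-magnitude 2-D FFT spectra. The function name `alignment_gaps` is hypothetical. Both metrics are assumptions rather than the paper's definitions, and the images are assumed to be same-sized grayscale NumPy arrays.

```python
import numpy as np


def alignment_gaps(real: np.ndarray, fake: np.ndarray) -> tuple[float, float]:
    """Return (pixel_gap, frequency_gap) for two same-shaped grayscale images.

    Illustrative metrics only (assumed, not the paper's definitions):
      pixel_gap     = mean |real - fake| over all pixels
      frequency_gap = mean |log(1 + |FFT(real)|) - log(1 + |FFT(fake)|)|
    """
    if real.shape != fake.shape:
        raise ValueError("real and fake images must have the same shape")
    real = real.astype(np.float64)
    fake = fake.astype(np.float64)
    pixel_gap = float(np.mean(np.abs(real - fake)))
    # log1p compresses the dynamic range so the large DC term does not dominate
    real_spec = np.log1p(np.abs(np.fft.fft2(real)))
    fake_spec = np.log1p(np.abs(np.fft.fft2(fake)))
    frequency_gap = float(np.mean(np.abs(real_spec - fake_spec)))
    return pixel_gap, frequency_gap


rng = np.random.default_rng(0)
real = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
# Stand-in for a synthetic counterpart: the real image plus small noise
fake = np.clip(real + rng.normal(0.0, 2.0, size=real.shape), 0, 255)
print(alignment_gaps(real, real))  # (0.0, 0.0)
print(alignment_gaps(real, fake))
```

Identical images give `(0.0, 0.0)`. A well-aligned pair should score low in both domains. A pair that differs mainly in faint high-frequency detail may look close in pixels but still diverge in the spectrum, which is why measuring both domains can be informative.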
#### Composition
* **Real Images:** Sourced from the MSCOCO training set.
* **Synthetic Images:** Corresponding DDA-aligned synthetic images for each real image.
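The card states that each real image has a corresponding synthetic image, but it does not document the folder layout inside `DDA-COCO_TrainSet/`. The sketch below makes a hypothetical assumption: real and fake images sit in sibling subfolders, named `real/` and `fake/` purely for illustration, and matching images share a file stem. Under that assumption, the pairs could be enumerated with the standard library. `pair_real_fake` is a hypothetical helper, not part of any released tooling.

```python
from pathlib import Path


def pair_real_fake(root: Path, real_dir: str = "real",
                   fake_dir: str = "fake") -> list[tuple[Path, Path]]:
    """Return (real, fake) PNG path pairs that share a file stem.

    HYPOTHETICAL layout: root/<real_dir>/*.png and root/<fake_dir>/*.png.
    Real images without a matching fake are skipped.
    """
    # Index fake images by stem so each real image can be matched in O(1)
    fakes = {p.stem: p for p in (root / fake_dir).glob("*.png")}
    pairs = []
    for real in sorted((root / real_dir).glob("*.png")):
        fake = fakes.get(real.stem)
        if fake is not None:
            pairs.append((real, fake))
    return pairs
```

For example, `pair_real_fake(Path("DDA-COCO_TrainSet"))` would return a sorted list of `(real, fake)` paths, provided the subfolder names are adjusted to the dataset's actual layout.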
#### Dataset Details & Formatting
The training dataset is stored in the directory `DDA-COCO_TrainSet/`.
* **File Format:** **PNG** (lossless).
* **Preprocessing Logic:**
    * **Spatial Alignment:** We crop each real image so that its height and width are **multiples of 8**. This step ensures that the variational autoencoder (VAE) reconstructions match the original images exactly in spatial resolution. Latent-diffusion VAEs typically downsample by a factor of 8, so images of other sizes would not reconstruct to the same height and width.
    * **Avoiding Format Bias:** All real and fake images are saved strictly in **PNG format**. MSCOCO images are already JPEG-compressed, so re-saving the cropped real images as JPEG would subject them to **double-JPEG compression**. This would introduce extra compression artifacts and an undesirable format bias, potentially causing the detector to learn the compression history rather than the generation artifacts.
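The cropping step can be sketched with NumPy as follows. The sketch makes three assumptions. First, images are `(H, W)` or `(H, W, C)` arrays. Second, surplus rows and columns are trimmed from the bottom and right edges; the card does not say which edges are cropped. Third, the VAE downsamples by a factor of 8. The name `crop_to_multiple` is hypothetical.

```python
import numpy as np


def crop_to_multiple(img: np.ndarray, multiple: int = 8) -> np.ndarray:
    """Crop an (H, W) or (H, W, C) array so that H and W are multiples of `multiple`.

    Assumption: surplus rows/columns are trimmed from the bottom and right
    edges; the dataset card does not state which edges are cropped.
    """
    h, w = img.shape[:2]
    new_h, new_w = h - h % multiple, w - w % multiple
    if new_h == 0 or new_w == 0:
        raise ValueError(f"{h}x{w} image is smaller than {multiple} pixels on a side")
    return img[:new_h, :new_w]


# Example: a 427x640 RGB image loses its last 3 rows.
img = np.zeros((427, 640, 3), dtype=np.uint8)
cropped = crop_to_multiple(img)
print(cropped.shape)  # (424, 640, 3)
# Assuming an 8x-downsampling VAE, the latent grid is exactly 53 x 80.
print(cropped.shape[0] // 8, cropped.shape[1] // 8)  # 53 80
```

To keep the result lossless, save the cropped array as PNG rather than JPEG. With Pillow, for example, `Image.fromarray(cropped).save("cropped.png")` does this. Re-encoding an already-JPEG MSCOCO image as JPEG would add a second compression pass, which is the format bias described above.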
#### Citation
```bibtex
@inproceedings{chen2025dual,
title={Dual Data Alignment Makes {AI}-Generated Image Detector Easier Generalizable},
author={Ruoxin Chen and Junwei Xi and Zhiyuan Yan and Ke-Yue Zhang and Shuang Wu and Jingyi Xie and Xu Chen and Lei Xu and Isabel Guan and Taiping Yao and Shouhong Ding},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=C39ShJwtD5}
}
```
Provider: maas
Created: 2025-12-09



