DDA-Training-Set

Name: DDA-Training-Set
Creator: maas
Published: 2026-01-09 10:15:27
License: No description available

ModelScope Community; updated 2026-01-09; indexed 2025-12-20

Download link:

https://modelscope.cn/datasets/JunweiXi/DDA-Training-Set

Resource description:

# DDA Training Set

### Official Dataset for **Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable**

**Conference:** 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

https://arxiv.org/abs/2505.14359

---

#### Dataset Description

This dataset serves as the core training data for the paper **"Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable"**. It is designed to address **Format Bias**, **Content Bias**, and **Size Bias** prevalent in traditional AIGI detection datasets. Constructed based on the **MSCOCO** training set, this dataset utilizes the **Dual Data Alignment (DDA)** technique to generate synthetic images, ensuring that "real" and "fake" images are highly aligned in both the pixel and frequency domains.

#### Composition

* **Real Images:** Sourced from the MSCOCO training set.
* **Synthetic Images:** Corresponding DDA-aligned synthetic images for each real image.

#### Dataset Details & Formatting

The training dataset is stored in the directory `DDA-COCO_TrainSet/`.

* **File Format:** **PNG** (lossless).
* **Preprocessing Logic:**
    * **Spatial Alignment:** We crop each real image so that its height and width are **multiples of 8**. This step is crucial to ensure that VAE reconstructions are perfectly aligned with the original images in spatial resolution.
    * **Avoiding Format Bias:** All real and fake images are strictly saved in **PNG format**. If the cropped real images were re-saved as JPEG, they would undergo **double-JPEG compression**. This would introduce additional compression artifacts and undesirable format bias, potentially causing the detector to learn the compression history rather than the generation artifacts.

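
The preprocessing logic above can be illustrated with a minimal sketch. It is not the authors' released code: the function names are hypothetical, the top-left crop anchor is an assumption (the card does not say where the crop is taken), and the use of Pillow for file I/O is likewise an assumption.

```python
from __future__ import annotations

from pathlib import Path


def crop_box_multiple_of(width: int, height: int, multiple: int = 8) -> tuple[int, int, int, int]:
    """Return a (left, upper, right, lower) box whose sides are multiples of `multiple`.

    Assumption: the crop is anchored at the top-left corner; the dataset card
    does not state which anchor the authors used.
    """
    new_w = width - width % multiple
    new_h = height - height % multiple
    if new_w == 0 or new_h == 0:
        raise ValueError(f"image {width}x{height} has a side shorter than {multiple} pixels")
    return (0, 0, new_w, new_h)


def crop_and_save_png(src: Path, dst_dir: Path, multiple: int = 8) -> Path:
    """Crop one image and write it losslessly as PNG (hypothetical helper)."""
    # Pillow is imported lazily so the size arithmetic above works without it.
    from PIL import Image

    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / (src.stem + ".png")
    with Image.open(src) as img:
        box = crop_box_multiple_of(img.width, img.height, multiple)
        # Saving as PNG avoids a second round of JPEG compression.
        img.crop(box).save(dst, format="PNG")
    return dst
```

For example, a 640x427 image would be cropped to 640x424 under these assumptions, since 427 is not a multiple of 8.
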
#### Citation

```bibtex
@inproceedings{chen2025dual,
  title={Dual Data Alignment Makes {AI}-Generated Image Detector Easier Generalizable},
  author={Ruoxin Chen and Junwei Xi and Zhiyuan Yan and Ke-Yue Zhang and Shuang Wu and Jingyi Xie and Xu Chen and Lei Xu and Isabel Guan and Taiping Yao and Shouhong Ding},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=C39ShJwtD5}
}
```

Provider: maas
Created: 2025-12-09
