DDA-Training-Set

ModelScope community. Updated 2026-01-09; indexed 2025-12-20.
Download link:
https://modelscope.cn/datasets/JunweiXi/DDA-Training-Set
Description:
# DDA Training Set

### Official Dataset for **Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable**

**Conference:** 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

https://arxiv.org/abs/2505.14359

---

#### Dataset Description

This dataset serves as the core training data for the paper **"Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable"**. It is designed to address **Format Bias**, **Content Bias**, and **Size Bias** prevalent in traditional AIGI detection datasets.

Constructed based on the **MSCOCO** training set, this dataset utilizes the **Dual Data Alignment (DDA)** technique to generate synthetic images, ensuring that "real" and "fake" images are highly aligned in both the pixel and frequency domains.

#### Composition

* **Real Images:** Sourced from the MSCOCO training set.
* **Synthetic Images:** Corresponding DDA-aligned synthetic images for each real image.

#### Dataset Details & Formatting

The training dataset is stored in the directory `DDA-COCO_TrainSet/`.

* **File Format:** **PNG** (Lossless).
* **Preprocessing Logic:**
    * **Spatial Alignment:** We crop each real image so that its height and width are **multiples of 8**. This step is crucial to ensure that VAE reconstructions are perfectly aligned with the original images in spatial resolution.
    * **Avoiding Format Bias:** All real and fake images are strictly saved in **PNG format**. If the cropped real images were re-saved as JPEG, they would undergo **double-JPEG compression**. This would introduce additional compression artifacts and undesirable format bias, potentially causing the detector to learn the compression history rather than the generation artifacts.
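
The preprocessing described above (crop to multiples of 8, then save as PNG) could be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the crop is assumed to be anchored at the top-left corner (the dataset card does not say where the crop is taken), and the second helper assumes the Pillow library is installed.

```python
from pathlib import Path


def crop_box_multiple_of(width: int, height: int, multiple: int = 8) -> tuple[int, int, int, int]:
    """Return a (left, upper, right, lower) box whose sides are multiples of `multiple`.

    Assumption: the crop keeps the top-left corner and trims the right and
    bottom edges; the original pipeline may crop differently.
    """
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    new_w = (width // multiple) * multiple   # largest multiple that fits the width
    new_h = (height // multiple) * multiple  # largest multiple that fits the height
    if new_w == 0 or new_h == 0:
        raise ValueError("image is smaller than one block in at least one dimension")
    return (0, 0, new_w, new_h)


def crop_and_save_png(src: Path, dst: Path, multiple: int = 8) -> None:
    """Crop an image to aligned dimensions and save it losslessly as PNG."""
    # Pillow is imported lazily so that crop_box_multiple_of has no dependencies.
    from PIL import Image

    with Image.open(src) as img:
        box = crop_box_multiple_of(img.width, img.height, multiple)
        # Saving as PNG avoids a second round of JPEG compression.
        img.crop(box).save(dst, format="PNG")
```

For example, a 640x427 image would be cropped to the box `(0, 0, 640, 424)`, since 424 is the largest multiple of 8 not exceeding 427.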
#### Citation

```bibtex
@inproceedings{chen2025dual,
  title={Dual Data Alignment Makes {AI}-Generated Image Detector Easier Generalizable},
  author={Ruoxin Chen and Junwei Xi and Zhiyuan Yan and Ke-Yue Zhang and Shuang Wu and Jingyi Xie and Xu Chen and Lei Xu and Isabel Guan and Taiping Yao and Shouhong Ding},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=C39ShJwtD5}
}
```

Provider:
maas
Created:
2025-12-09