DeepGuardDB: Real and Text-to-Image Synthetic Images Dataset
Source: DataCite Commons | Updated: 2024-10-07 | Indexed: 2025-04-16
Download link:
https://ieee-dataport.org/documents/deepguarddb-real-and-text-image-synthetic-images-dataset
Description:
Recent advances in deep learning and generative models have significantly improved text-to-image (T2I) synthesis, making it possible to create highly realistic images from textual inputs. While this progress has expanded the creative and practical applications of AI, it also makes authentic and AI-generated images harder to tell apart, which raises serious concerns in security, privacy, and digital forensics. In response, there has been growing attention on AI-based detectors designed to reliably differentiate between synthetic and real images, ensuring data authenticity and protecting against potential misuse. Reliable and diverse datasets of fake and real data are crucial for training and evaluating these models, and the research community has made significant efforts to develop dedicated datasets for this purpose. Because T2I generation tools continue to evolve rapidly, existing datasets need ongoing updates and refinement so that they reflect the state of the art in image generation.
In this context, we have constructed the DeepGuardDB dataset for evaluating and improving models that differentiate between AI-generated and real images. To support a comprehensive and representative evaluation, DeepGuardDB has been curated to address the limitations of existing datasets by incorporating a diverse array of visual content. DeepGuardDB uses Stable Diffusion 3, which produces higher-quality images, in addition to Imagen and DALL-E 3. It contains 12,000 images, evenly split between real and generated images, with 6,000 (50%) in each category.
The real images in DeepGuardDB are collected from two well-established datasets, each recognized for its richness and diversity: MS-COCO (Microsoft Common Objects in Context) and Flickr30k. The AI-generated images come from three advanced T2I generation platforms: Stable Diffusion 3, Imagen, and DALL-E 3. The synthetic images were generated from the same textual descriptions that accompany the real images, so each generated image is intended to closely resemble its authentic counterpart. Because both sets of images share similar themes, subjects, and visual cues, the dataset highlights the challenge of distinguishing real from AI-generated content.
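The composition described above (12,000 images, 6,000 per class) can be illustrated with a minimal sketch of a labeled manifest and a class-balanced train/test split. This is not part of the dataset's tooling: the image ids, the label encoding (0 = real, 1 = AI-generated), the function names, and the 80/20 split are all assumptions made for illustration.

```python
import random
from collections import Counter

# Class sizes as stated in the dataset description.
N_REAL = 6000
N_SYNTHETIC = 6000


def build_manifest(n_real=N_REAL, n_synthetic=N_SYNTHETIC):
    """Return (image_id, label) pairs; label 0 = real, 1 = AI-generated.

    The ids are hypothetical placeholders, not actual DeepGuardDB file names.
    """
    real = [(f"real_{i:05d}", 0) for i in range(n_real)]
    synthetic = [(f"synthetic_{i:05d}", 1) for i in range(n_synthetic)]
    return real + synthetic


def stratified_split(manifest, test_fraction=0.2, seed=0):
    """Split each class separately so both parts keep the 50/50 ratio."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted({lab for _, lab in manifest}):
        items = [row for row in manifest if row[1] == label]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


manifest = build_manifest()
train, test = stratified_split(manifest)
print(len(manifest), Counter(lab for _, lab in train), Counter(lab for _, lab in test))
```

Splitting per class rather than over the whole list keeps the held-out set balanced, so a detector's accuracy on it is not skewed toward either class.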
Provider:
IEEE DataPort
Created:
2024-10-07



