PRISM-36K: A Benchmark Dataset for AI-Generated Image Attribution

Name: PRISM-36K: A Benchmark Dataset for AI-Generated Image Attribution
Creator: Zenodo
Published: 2026-05-07 08:16:44
License: No description available

Source: DataCite Commons. Updated: 2026-05-07. Indexed: 2026-05-07.

Download link:

https://zenodo.org/doi/10.5281/zenodo.20038953

Description:

PRISM-36K is a benchmark dataset of 36,000 AI-generated images for model-attribution research: the task of identifying which generative model produced a given image. It accompanies the paper "PRISM: Phase-enhanced Radial-based Image Signature Mapping for AI-Generated Image Attribution" (Ricco, Onofri, Cima, Cresci, Di Pietro; arXiv:2509.15270).

What is in the dataset

The dataset contains 36,000 PNG images at 512 × 512 pixels, balanced across six text-to-image generators with 6,000 images per model:

- DALL-E 2 (Ramesh et al., 2022): closed, accessed via OpenAI API
- FuseDream (Liu et al., 2021): GAN + CLIP guidance
- PixArt-α (Chen et al., 2024): diffusion transformer
- SANA (Xie et al., 2024): diffusion transformer
- Stable Diffusion 1.4 (Rombach et al., 2022): latent diffusion
- VQGAN-CLIP (Esser et al., 2021): GAN + CLIP guidance

Each generator produces 150 images per prompt over a fixed set of 40 author-written English prompts (20 short + 20 long, paired by topic). All images are stored in lossless PNG format to preserve frequency-domain artefacts that are critical to spectral attribution methods.

What makes this dataset useful

- Prompt-matched generations. The same 40 prompts are issued to every generator, so cross-model differences reflect generator-specific signatures rather than prompt drift.
- Architectural diversity. The six generators span GAN-based, CLIP-guided, and transformer-based diffusion families, with both open-weight and closed-API systems represented.
- Reproducible splits. The 100 random prompt-level train/test splits used in the paper are shipped as splits/splits_100.csv; one canonical "average split" (splits/average_split.json) is provided for direct reproduction of all figures and tables.
- Lossless integrity. Every image ships with a SHA-256 hash in checksums/SHA256SUMS (BSD-style, compatible with sha256sum -c) so users can verify their downloads.
- Rich metadata. A per-image manifest (metadata/images.csv) and a prompt manifest (metadata/prompts.csv) support filtering by model, prompt length, prompt pair, or specific generation iteration.

Repository layout

    PRISM-36K/
    ├── README.md
    ├── LICENSE.txt
    ├── CITATION.cff
    ├── CHANGELOG.md
    ├── metadata/
    │   ├── prompts.csv
    │   └── images.csv
    ├── splits/
    │   ├── average_split.json
    │   └── splits_100.csv
    ├── images/
    │   ├── DALLE-2/
    │   ├── FuseDream/
    │   ├── PixArt-alpha/
    │   ├── SANA/
    │   ├── StableDiffusion-1.4/
    │   └── VQGAN-CLIP/
    └── checksums/
        └── SHA256SUMS

Image filename convention: <ModelName>_<promptid>_<iter>.png, with promptid ∈ 1..40 and iter ∈ 1..150.

Intended uses

- Training and evaluating model-attribution classifiers for AI-generated images.
- Benchmarking real vs. fake detectors in a controlled multi-source setting.
- Studying frequency-domain and spectral fingerprints of generative models.
- Research on content provenance, generative-AI accountability, and related forensic problems.

Companion resources

- Paper: arXiv:2509.15270
- Image-generation scripts (the code used to produce these images): github.com/emarich18-res/PRISM-36K
- PRISM classifier and evaluation code: released upon full paper acceptance.

Licensing

Dataset (images and metadata): Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Note on DALL-E 2 images. The 6,000 images in images/dalle2/ were generated via OpenAI's paid API and are subject to OpenAI's usage policies in addition to CC BY 4.0. Users intending to use these images beyond academic research should consult OpenAI's current terms of service.

Note on NVIDIA-SANA images. The 6,000 images in images/sana/ are subject to the Apache License 2.0 in addition to CC BY 4.0.

Citing PRISM-36K

If you use this dataset, please cite both the paper and this Zenodo record. BibTeX entries and a CFF citation file are provided in the repository (README.md, CITATION.cff).

Limitations

- Closed-set scope. The dataset covers six specific generators; it is not designed to support open-set attribution to unseen models.
- English-only prompts. All prompts were authored by the dataset creators; no multilingual or in-the-wild prompts are included.
- Synthetic only. No real photographs are included; for real vs. fake benchmarks, real images must be sourced from a complementary dataset.
- No identifiable individuals. Prompts were authored to elicit generic scenes (objects, animals, landscapes); the dataset contains no images of identifiable real persons by design.
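The filename convention and the prompt-level splits described above can be illustrated with a short Python sketch. It is not the authors' code: the names `parse_filename`, `ImageId`, and `prompt_level_split` are hypothetical, the 20% test fraction is an arbitrary assumption (the split ratio is not stated here), and the exact model-name strings used in real filenames are assumed to match the directory names. To reproduce the paper, use the shipped splits/splits_100.csv or splits/average_split.json instead.

```python
import random
import re
from typing import NamedTuple, Optional, Set, Tuple

# Matches the stated convention <ModelName>_<promptid>_<iter>.png.
# Model names may contain dots and hyphens (e.g. "StableDiffusion-1.4"),
# so the model part is matched greedily and the last two
# underscore-separated integers are taken as prompt id and iteration.
FILENAME_RE = re.compile(r"^(?P<model>.+)_(?P<prompt>\d+)_(?P<iter>\d+)\.png$")


class ImageId(NamedTuple):
    model: str
    prompt_id: int
    iteration: int


def parse_filename(name: str) -> Optional[ImageId]:
    """Parse a filename; return None if it does not fit the convention."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    prompt_id, iteration = int(m["prompt"]), int(m["iter"])
    # Ranges stated in the dataset description: promptid in 1..40, iter in 1..150.
    if not (1 <= prompt_id <= 40 and 1 <= iteration <= 150):
        return None
    return ImageId(m["model"], prompt_id, iteration)


def prompt_level_split(
    test_fraction: float = 0.2, seed: int = 0
) -> Tuple[Set[int], Set[int]]:
    """Split prompt ids (not images) so no prompt appears in both sets.

    test_fraction is an assumption for illustration only.
    """
    prompt_ids = list(range(1, 41))
    random.Random(seed).shuffle(prompt_ids)
    n_test = round(len(prompt_ids) * test_fraction)
    return set(prompt_ids[n_test:]), set(prompt_ids[:n_test])


if __name__ == "__main__":
    train_prompts, test_prompts = prompt_level_split()
    image = parse_filename("SANA_7_150.png")  # hypothetical filename
    subset = "test" if image.prompt_id in test_prompts else "train"
    print(image, subset)
```

Splitting on prompt ids rather than on individual images keeps all 150 generations of a prompt, across all six generators, on the same side of the split. Because the prompts are paired by topic (short and long), a stricter variant would keep each pair together; that requires the pairing information in metadata/prompts.csv, whose column names are not given here.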

Provider:

Zenodo

Created:

2026-05-06
