kaikaiyao/sd-image-attribution

Name: kaikaiyao/sd-image-attribution
Creator: kaikaiyao
Published: 2026-04-02 13:28:11
License: No description provided

Source: Hugging Face · Updated 2026-04-02 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/kaikaiyao/sd-image-attribution

Description:

---
configs:
- config_name: sd15
  data_files:
  - "viewer-sd15.parquet"
- config_name: sd21
  data_files:
  - "viewer-sd21.parquet"
- config_name: sdxl
  data_files:
  - "viewer-sdxl.parquet"
- config_name: sdxl-turbo
  data_files:
  - "viewer-sdxl-turbo.parquet"
- config_name: sd35-medium
  data_files:
  - "viewer-sd35-medium.parquet"
- config_name: sd35-large-turbo
  data_files:
  - "viewer-sd35-large-turbo.parquet"
license: other
language:
- en
task_categories:
- image-classification
- text-to-image
tags:
- stable-diffusion
- model-attribution
- generated-images
- benchmark
---

# SD Image Attribution

[![Images](https://img.shields.io/badge/images-70,992-blue)](https://huggingface.co/datasets/kaikaiyao/sd-image-attribution)
[![Models](https://img.shields.io/badge/models-6-2ea44f)](https://huggingface.co/datasets/kaikaiyao/sd-image-attribution)
[![Prompt Datasets](https://img.shields.io/badge/prompt_datasets-3-8250df)](https://huggingface.co/datasets/kaikaiyao/sd-image-attribution)
[![Version](https://img.shields.io/badge/version-v1-111827)](https://huggingface.co/datasets/kaikaiyao/sd-image-attribution)

A public benchmark for **image attribution across Stable Diffusion families**, built from six models and three prompt datasets.

![Showcase overview](preview/showcase-hero-fontrefresh.png)

> Version `v1` includes PartiPrompts (`1,632` prompts), DrawBench (`200` prompts), DiffusionDB `sample_10k` (`10,000` prompts), `6` models, and `70,992` generated images.

## Why this dataset

- Same prompt sources across multiple Stable Diffusion families make model-level comparisons cleaner and more controlled.
- Public metadata links each image to its prompt source, model identity, and generation settings.
- The current release is sized for practical experiments while staying simple to load from a single Parquet index.
- The dataset viewer uses a lightweight thumbnail preview subset so you can browse by model in the config dropdown and narrow to a prompt dataset with the `prompt_dataset` filterable column.

## At a glance

| Images | Models | Prompt datasets | Version |
|---:|---:|---:|---|
| **70,992** | **6** | **3** | **v1** |

## Coverage

This release is designed for image provenance, model attribution, model fingerprinting, watermarking, and related generated-image analysis studies. It combines the same prompt sources with multiple Stable Diffusion variants so researchers can study model-specific artifacts under a shared prompt distribution.

| Prompt dataset | Source | Release split | Prompts used |
|---|---|---|---:|
| PartiPrompts | `nateraw/parti-prompts` | full benchmark split | 1,632 |
| DrawBench | `shunk031/DrawBench` | full benchmark split | 200 |
| DiffusionDB | `poloclub/diffusiondb` | `sample_10k` curated slice | 10,000 |

PartiPrompts and DrawBench use their full benchmark splits in this release. DiffusionDB uses the curated `sample_10k` release slice.
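
The published image total can be cross-checked against the prompt counts listed above. The sketch below assumes that each prompt was rendered exactly once per model and that no generation was dropped; the card does not state this explicitly, although the arithmetic agrees with the reported total. The names `PROMPTS_PER_DATASET` and `NUM_MODELS` are illustrative only.

```python
# Prompt counts copied from the prompt-dataset table in this card.
PROMPTS_PER_DATASET = {
    "PartiPrompts": 1_632,
    "DrawBench": 200,
    "DiffusionDB (sample_10k)": 10_000,
}
NUM_MODELS = 6  # sd15, sd21, sdxl, sdxl-turbo, sd35-medium, sd35-large-turbo

# Assumption: one image per (prompt, model) pair, with no failed generations.
prompts_total = sum(PROMPTS_PER_DATASET.values())
expected_images = prompts_total * NUM_MODELS

print(prompts_total)    # 11832
print(expected_images)  # 70992
```

Under that assumption the result matches the `70,992` images reported for version `v1`.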

| Model | Checkpoint | Resolution | Inference steps | Guidance scale | Scheduler |
|---|---|---:|---:|---:|---|
| Stable Diffusion 1.5 (`sd15`) | `stable-diffusion-v1-5/stable-diffusion-v1-5` | `512x512` | 30 | 7.5 | `PNDMScheduler` |
| Stable Diffusion 2.1 (`sd21`) | `sd2-community/stable-diffusion-2-1` | `768x768` | 30 | 7.5 | `DDIMScheduler` |
| Stable Diffusion XL (`sdxl`) | `stabilityai/stable-diffusion-xl-base-1.0` | `1024x1024` | 30 | 5.0 | `EulerDiscreteScheduler` |
| SDXL Turbo (`sdxl-turbo`) | `stabilityai/sdxl-turbo` | `512x512` | 4 | 0.0 | `EulerAncestralDiscreteScheduler` |
| Stable Diffusion 3.5 Medium (`sd35-medium`) | `stabilityai/stable-diffusion-3.5-medium` | `1024x1024` | 28 | 4.5 | `FlowMatchEulerDiscreteScheduler` |
| Stable Diffusion 3.5 Large Turbo (`sd35-large-turbo`) | `stabilityai/stable-diffusion-3.5-large-turbo` | `1024x1024` | 4 | 0.0 | `FlowMatchEulerDiscreteScheduler` |

## Gallery

### PartiPrompts

![PartiPrompts preview](preview/parti-prompts-sheet-fontrefresh.png)

### DrawBench

![DrawBench preview](preview/drawbench-sheet-fontrefresh.png)

### DiffusionDB

![DiffusionDB preview](preview/diffusiondb-sheet-fontrefresh.png)

## How to use

Load the main metadata table with `datasets`:

```python
from datasets import load_dataset

# Load the release-wide metadata index as a single "train" split.
ds = load_dataset(
    "parquet",
    data_files={
        "train": "https://huggingface.co/datasets/kaikaiyao/sd-image-attribution/resolve/main/metadata/all.parquet"
    },
)["train"]

# Each row carries the path of one generated image.
print(ds[0]["image_path"])
```

Or read it directly with `pandas`:

```python
import pandas as pd

# Read the same Parquet index straight into a DataFrame.
df = pd.read_parquet(
    "https://huggingface.co/datasets/kaikaiyao/sd-image-attribution/resolve/main/metadata/all.parquet"
)

# Preview the prompt source, model key, prompt text, and image path.
print(df[["dataset", "model_key", "prompt", "image_path"]].head())
```

## Metadata

`metadata/all.parquet` is the main table for the release.

- Identity and source: `dataset`, `dataset_split`, `dataset_row_id`, `model_key`, `resolved_model_id`
- Prompt and generation: `prompt`, `seed`, `steps`, `guidance_scale`, `scheduler`, `width`, `height`
- Image and integrity: `image_path`, `sha256`, `status`
- Original prompt metadata: `source_record` keeps dataset-specific fields from PartiPrompts, DrawBench, or DiffusionDB

Example row excerpt:

```json
{
  "dataset": "parti-prompts",
  "dataset_split": "train",
  "dataset_row_id": 4,
  "model_key": "sdxl",
  "resolved_model_id": "stabilityai/stable-diffusion-xl-base-1.0",
  "prompt": "A watercolor fox reading a newspaper in a cafe",
  "seed": 3096288400,
  "steps": 30,
  "guidance_scale": 5.0,
  "width": 1024,
  "height": 1024,
  "image_path": "images/parti-prompts/sdxl/parti-prompts-000004-sdxl-b88e7b50.png",
  "sha256": "34f9f56d0f1bf3ef43a4a0ee6b4f614f6fc0a55e942d08dc4f28c5614fd4dfe7",
  "source_record": {
    "Prompt": "A watercolor fox reading a newspaper in a cafe",
    "Category": "Art"
  }
}
```

## Uses and limitations

- Intended for research on image provenance, model attribution, model fingerprinting, watermarking, and related generated-image analysis tasks.
- Only successful generations are included in the public release.
- Future releases are intended to expand prompt coverage, model families, and overall scale.
- Use of this dataset remains subject to the licenses and terms of the upstream prompt datasets and model checkpoints.

## Citation

If you use this dataset, please cite:

```bibtex
@dataset{yao2026sd_image_attribution,
  author    = {{Kai Yao}},
  title     = {{SD Image Attribution}},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/kaikaiyao/sd-image-attribution},
  note      = {Version: v1}
}
```
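
As a supplement to the Metadata section, the `image_path` and `sha256` columns can be combined to locate an image and check its integrity. The following is a minimal sketch rather than anything documented by the dataset card: `image_url` and `sha256_matches` are hypothetical helpers, and the code assumes that image files are served from the same repository through the `resolve/main/` URL pattern used for `metadata/all.parquet`, and that `sha256` is the hex digest of the raw image bytes.

```python
import hashlib

REPO_URL = "https://huggingface.co/datasets/kaikaiyao/sd-image-attribution"


def image_url(image_path: str, revision: str = "main") -> str:
    """Build a download URL for a row's `image_path` (assumed repo-relative)."""
    return f"{REPO_URL}/resolve/{revision}/{image_path.lstrip('/')}"


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 hex digest of `data` equals `expected_hex`."""
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()


# Row values copied from the example row excerpt in this card.
row = {
    "image_path": "images/parti-prompts/sdxl/parti-prompts-000004-sdxl-b88e7b50.png",
    "sha256": "34f9f56d0f1bf3ef43a4a0ee6b4f614f6fc0a55e942d08dc4f28c5614fd4dfe7",
}

# Print the URL where the image is assumed to live.
print(image_url(row["image_path"]))
```

Once the image bytes have been downloaded (for example with `urllib.request.urlopen(url).read()`), calling `sha256_matches(data, row["sha256"])` reports whether the file agrees with the metadata, provided the assumptions above hold.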

Provided by:

kaikaiyao
