kaikaiyao/sd-image-attribution
Source: Hugging Face. Updated 2026-04-02; indexed 2026-04-12.
Download link: https://hf-mirror.com/datasets/kaikaiyao/sd-image-attribution
Description:
---
configs:
- config_name: sd15
  data_files:
  - "viewer-sd15.parquet"
- config_name: sd21
  data_files:
  - "viewer-sd21.parquet"
- config_name: sdxl
  data_files:
  - "viewer-sdxl.parquet"
- config_name: sdxl-turbo
  data_files:
  - "viewer-sdxl-turbo.parquet"
- config_name: sd35-medium
  data_files:
  - "viewer-sd35-medium.parquet"
- config_name: sd35-large-turbo
  data_files:
  - "viewer-sd35-large-turbo.parquet"
license: other
language:
- en
task_categories:
- image-classification
- text-to-image
tags:
- stable-diffusion
- model-attribution
- generated-images
- benchmark
---
# SD Image Attribution
A public benchmark for **image attribution across Stable Diffusion families**, built from six models and three prompt datasets.

> Version `v1` includes PartiPrompts (`1,632` prompts), DrawBench (`200` prompts), DiffusionDB `sample_10k` (`10,000` prompts), `6` models, and `70,992` generated images.
## Why this dataset
- Same prompt sources across multiple Stable Diffusion families make model-level comparisons cleaner and more controlled.
- Public metadata links each image to its prompt source, model identity, and generation settings.
- The current release is sized for practical experiments while staying simple to load from a single Parquet index.
- The dataset viewer uses a lightweight thumbnail preview subset so you can browse by model in the config dropdown and narrow to a prompt dataset with the `prompt_dataset` filterable column.
## At a glance
| Images | Models | Prompt datasets | Version |
|---:|---:|---:|---|
| **70,992** | **6** | **3** | **v1** |
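
The image total is consistent with one generated image per prompt per model. A quick arithmetic check using only the counts stated in this card (the variable names are illustrative):

```python
# Prompt counts as stated in this card for release v1.
PROMPTS_PER_DATASET = {
    "PartiPrompts": 1_632,
    "DrawBench": 200,
    "DiffusionDB sample_10k": 10_000,
}
NUM_MODELS = 6

total_prompts = sum(PROMPTS_PER_DATASET.values())
total_images = total_prompts * NUM_MODELS

print(total_prompts)  # 11832
print(total_images)   # 70992
```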
## Coverage
This release is designed for image provenance, model attribution, model fingerprinting, watermarking, and related generated-image analysis studies. It combines the same prompt sources with multiple Stable Diffusion variants so researchers can study model-specific artifacts under a shared prompt distribution.
| Prompt dataset | Source | Release split | Prompts used |
|---|---|---|---:|
| PartiPrompts | `nateraw/parti-prompts` | full benchmark split | 1,632 |
| DrawBench | `shunk031/DrawBench` | full benchmark split | 200 |
| DiffusionDB | `poloclub/diffusiondb` | `sample_10k` curated slice | 10,000 |

PartiPrompts and DrawBench use their full benchmark splits in this release. DiffusionDB uses the curated `sample_10k` release slice.

| Model | Checkpoint | Resolution | Inference steps | Guidance scale | Scheduler |
|---|---|---:|---:|---:|---|
| Stable Diffusion 1.5 (`sd15`) | `stable-diffusion-v1-5/stable-diffusion-v1-5` | `512x512` | 30 | 7.5 | `PNDMScheduler` |
| Stable Diffusion 2.1 (`sd21`) | `sd2-community/stable-diffusion-2-1` | `768x768` | 30 | 7.5 | `DDIMScheduler` |
| Stable Diffusion XL (`sdxl`) | `stabilityai/stable-diffusion-xl-base-1.0` | `1024x1024` | 30 | 5.0 | `EulerDiscreteScheduler` |
| SDXL Turbo (`sdxl-turbo`) | `stabilityai/sdxl-turbo` | `512x512` | 4 | 0.0 | `EulerAncestralDiscreteScheduler` |
| Stable Diffusion 3.5 Medium (`sd35-medium`) | `stabilityai/stable-diffusion-3.5-medium` | `1024x1024` | 28 | 4.5 | `FlowMatchEulerDiscreteScheduler` |
| Stable Diffusion 3.5 Large Turbo (`sd35-large-turbo`) | `stabilityai/stable-diffusion-3.5-large-turbo` | `1024x1024` | 4 | 0.0 | `FlowMatchEulerDiscreteScheduler` |
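
For scripted sanity checks, the table above can be transcribed into a small lookup keyed by `model_key`. The sketch below is only a transcription of that table; `GenSettings`, `matches_table`, and `few_step_models` are hypothetical helper names, not part of the dataset, and the check assumes metadata rows use the `width`, `height`, `steps`, and `guidance_scale` columns listed in the Metadata section.

```python
from typing import NamedTuple


class GenSettings(NamedTuple):
    checkpoint: str
    width: int
    height: int
    steps: int
    guidance_scale: float
    scheduler: str


# Values copied from the model table in this card.
GENERATION_SETTINGS = {
    "sd15": GenSettings("stable-diffusion-v1-5/stable-diffusion-v1-5", 512, 512, 30, 7.5, "PNDMScheduler"),
    "sd21": GenSettings("sd2-community/stable-diffusion-2-1", 768, 768, 30, 7.5, "DDIMScheduler"),
    "sdxl": GenSettings("stabilityai/stable-diffusion-xl-base-1.0", 1024, 1024, 30, 5.0, "EulerDiscreteScheduler"),
    "sdxl-turbo": GenSettings("stabilityai/sdxl-turbo", 512, 512, 4, 0.0, "EulerAncestralDiscreteScheduler"),
    "sd35-medium": GenSettings("stabilityai/stable-diffusion-3.5-medium", 1024, 1024, 28, 4.5, "FlowMatchEulerDiscreteScheduler"),
    "sd35-large-turbo": GenSettings("stabilityai/stable-diffusion-3.5-large-turbo", 1024, 1024, 4, 0.0, "FlowMatchEulerDiscreteScheduler"),
}


def matches_table(row: dict) -> bool:
    """Return True if a metadata row agrees with the table for its model_key."""
    expected = GENERATION_SETTINGS[row["model_key"]]
    return (
        row["width"] == expected.width
        and row["height"] == expected.height
        and row["steps"] == expected.steps
        and row["guidance_scale"] == expected.guidance_scale
    )


def few_step_models(max_steps: int = 4) -> list:
    """List model keys that sample in at most max_steps steps (the turbo variants here)."""
    return sorted(k for k, s in GENERATION_SETTINGS.items() if s.steps <= max_steps)
```

A lookup like this makes it easy to confirm that the rows you loaded were produced with the settings documented here before training an attribution model on them.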
## How to use
Load the main metadata table with `datasets`:
```python
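from datasets import load_dataset

# Load the release-wide metadata table from the Hub as a single "train" split.
ds = load_dataset(
    "parquet",
    data_files={
        "train": "https://huggingface.co/datasets/kaikaiyao/sd-image-attribution/resolve/main/metadata/all.parquet"
    },
)["train"]

# Each row records the path of its image.
print(ds[0]["image_path"])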
```
Or read it directly with `pandas`:
```python
import pandas as pd

# Read the same metadata table into a DataFrame.
df = pd.read_parquet(
    "https://huggingface.co/datasets/kaikaiyao/sd-image-attribution/resolve/main/metadata/all.parquet"
)

# Preview the columns most relevant to attribution.
print(df[["dataset", "model_key", "prompt", "image_path"]].head())
```
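
Because each prompt is rendered by all six models, an attribution experiment may want every image of a given prompt to land in the same split, so that a classifier cannot exploit prompt content seen during training. The sketch below is one possible approach, not a split shipped with the dataset. It assumes that `dataset` plus `dataset_row_id` identifies a prompt across models; `split_for`, `label_for`, and the label order are hypothetical choices.

```python
import hashlib

# Class labels in the order the models are listed in this card (an arbitrary choice).
MODEL_KEYS = ["sd15", "sd21", "sdxl", "sdxl-turbo", "sd35-medium", "sd35-large-turbo"]
LABELS = {key: index for index, key in enumerate(MODEL_KEYS)}


def split_for(row: dict, test_fraction: float = 0.2) -> str:
    """Assign a row to 'train' or 'test' using only its prompt identity.

    Hashing (dataset, dataset_row_id) is deterministic, so all models'
    images of one prompt always fall into the same split.
    """
    prompt_id = f"{row['dataset']}:{row['dataset_row_id']}"
    digest = hashlib.sha256(prompt_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "test" if bucket < test_fraction * 10_000 else "train"


def label_for(row: dict) -> int:
    """Map a row's model_key to an integer class label."""
    return LABELS[row["model_key"]]
```

Both helpers take a plain dictionary, so they work on rows from `ds` directly or on `df.to_dict("records")`.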
## Metadata
`metadata/all.parquet` is the main table for the release.
- Identity and source: `dataset`, `dataset_split`, `dataset_row_id`, `model_key`, `resolved_model_id`
- Prompt and generation: `prompt`, `seed`, `steps`, `guidance_scale`, `scheduler`, `width`, `height`
- Image and integrity: `image_path`, `sha256`, `status`
- Original prompt metadata: `source_record` keeps dataset-specific fields from PartiPrompts, DrawBench, or DiffusionDB

Example row excerpt:
```json
{
  "dataset": "parti-prompts",
  "dataset_split": "train",
  "dataset_row_id": 4,
  "model_key": "sdxl",
  "resolved_model_id": "stabilityai/stable-diffusion-xl-base-1.0",
  "prompt": "A watercolor fox reading a newspaper in a cafe",
  "seed": 3096288400,
  "steps": 30,
  "guidance_scale": 5.0,
  "width": 1024,
  "height": 1024,
  "image_path": "images/parti-prompts/sdxl/parti-prompts-000004-sdxl-b88e7b50.png",
  "sha256": "34f9f56d0f1bf3ef43a4a0ee6b4f614f6fc0a55e942d08dc4f28c5614fd4dfe7",
  "source_record": {
    "Prompt": "A watercolor fox reading a newspaper in a cafe",
    "Category": "Art"
  }
}
```
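
The `image_path` and `sha256` fields can be combined into a simple integrity check on downloaded files. This is a minimal sketch under two assumptions not confirmed by the card: that `sha256` is the hex digest of the raw image file bytes, and that images sit under the same `resolve/main/` root as `metadata/all.parquet`. The names `REPO_BASE`, `image_url`, `file_sha256`, and `verify_image` are hypothetical.

```python
import hashlib
from pathlib import Path

# Assumed root, derived from the metadata URL used in "How to use".
REPO_BASE = "https://huggingface.co/datasets/kaikaiyao/sd-image-attribution/resolve/main/"


def image_url(row: dict) -> str:
    """Build a download URL for a row, assuming image_path is relative to the repo root."""
    return REPO_BASE + row["image_path"]


def file_sha256(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images are not read into memory at once."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_image(row: dict, local_root) -> bool:
    """Compare a locally stored image against the digest recorded in the metadata."""
    path = Path(local_root) / row["image_path"]
    return file_sha256(path) == row["sha256"].lower()
```

If the digests do not match for files you believe are intact, the first assumption above is the likeliest culprit, and the check should be dropped rather than trusted.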
## Uses and limitations
- Intended for research on image provenance, model attribution, model fingerprinting, watermarking, and related generated-image analysis tasks.
- Only successful generations are included in the public release.
- Future releases are intended to expand prompt coverage, model families, and overall scale.
- Use of this dataset remains subject to the licenses and terms of the upstream prompt datasets and model checkpoints.
## Citation
If you use this dataset, please cite:
```bibtex
@dataset{yao2026sd_image_attribution,
  author    = {{Kai Yao}},
  title     = {{SD Image Attribution}},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/kaikaiyao/sd-image-attribution},
  note      = {Version: v1}
}
```
Provided by: kaikaiyao



