Sete007/HPDv3

Name: Sete007/HPDv3
Creator: Sete007
Published: 2026-03-04 09:06:43
License: No description provided

Hugging Face · Updated 2026-03-04 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/Sete007/HPDv3

Resource description:

---
license: mit
language:
- en
pretty_name: HPDv3
size_categories:
- 1M<n<10M
---

<div align="center">

# 🎯 HPSv3: Towards Wide-Spectrum Human Preference Score (ICCV 2025)

[![Project Website](https://img.shields.io/badge/🌐-Project%20Website-deepgray)](https://research.nvidia.com/labs/par/addit/)
[![arXiv](https://img.shields.io/badge/arXiv-2411.07232-b31b1b.svg)](https://arxiv.org/abs/2508.03789)
[![ICCV 2025](https://img.shields.io/badge/ICCV-2025-blue.svg)](https://arxiv.org/abs/2508.03789)
[![Model](https://img.shields.io/badge/🤗-Model-yellow)](https://huggingface.co/MizzenAI/HPSv3)
[![Code](https://img.shields.io/badge/Code-black?logo=github)](https://github.com/MizzenAI/HPSv3)

**Yuhang Ma**<sup>1,3*</sup>&ensp; **Yunhao Shui**<sup>1,4*</sup>&ensp; **Xiaoshi Wu**<sup>2</sup>&ensp; **Keqiang Sun**<sup>1,2†</sup>&ensp; **Hongsheng Li**<sup>2,5,6†</sup>

<sup>1</sup>Mizzen AI&ensp;&ensp; <sup>2</sup>CUHK MMLab&ensp;&ensp; <sup>3</sup>King’s College London&ensp;&ensp; <sup>4</sup>Shanghai Jiaotong University&ensp;&ensp; <sup>5</sup>Shanghai AI Laboratory&ensp;&ensp; <sup>6</sup>CPII, InnoHK&ensp;&ensp;

<sup>*</sup>Equal Contribution&ensp; <sup>†</sup>Equal Advising

</div>

<img src="assets/teaser.png" alt="Teaser" width="900"/>

# Human Preference Dataset v3

Human Preference Dataset v3 (HPDv3) comprises **1.08M** text-image pairs and **1.17M** annotated pairwise comparisons. To model the wide spectrum of human preference, we include images from the newest state-of-the-art generative models and high-quality real photographs, while retaining images from older models and lower-quality real images.

## How to Use

The images are distributed as a multi-part archive. Concatenate the parts and extract them:

```bash
cat images.tar.gz.* | gunzip | tar -xv
```

## Detailed Information on HPDv3

| Image Source | Type | Number of Images | Prompt Source | Split |
|--------------|------|------------------|---------------|-------|
| High Quality Image (HQI) | Real Image | 57759 | VLM Caption | Train & Test |
| MidJourney | - | 331955 | User | Train |
| CogView4 | DiT | 400 | HQI+HPDv2+JourneyDB | Test |
| FLUX.1 dev | DiT | 48927 | HQI+HPDv2+JourneyDB | Train & Test |
| Infinity | Autoregressive | 27061 | HQI+HPDv2+JourneyDB | Train & Test |
| Kolors | DiT | 49705 | HQI+HPDv2+JourneyDB | Train & Test |
| HunyuanDiT | DiT | 46133 | HQI+HPDv2+JourneyDB | Train & Test |
| Stable Diffusion 3 Medium | DiT | 49266 | HQI+HPDv2+JourneyDB | Train & Test |
| Stable Diffusion XL | Diffusion | 49025 | HQI+HPDv2+JourneyDB | Train & Test |
| Pixart Sigma | Diffusion | 400 | HQI+HPDv2+JourneyDB | Test |
| Stable Diffusion 2 | Diffusion | 19124 | HQI+JourneyDB | Train & Test |
| CogView2 | Autoregressive | 3823 | HQI+JourneyDB | Train & Test |
| FuseDream | Diffusion | 468 | HQI+JourneyDB | Train & Test |
| VQ-Diffusion | Diffusion | 18837 | HQI+JourneyDB | Train & Test |
| Glide | Diffusion | 19989 | HQI+JourneyDB | Train & Test |
| Stable Diffusion 1.4 | Diffusion | 18596 | HQI+JourneyDB | Train & Test |
| Stable Diffusion 1.1 | Diffusion | 19043 | HQI+JourneyDB | Train & Test |
| Curated HPDv2 | - | 327763 | - | Train |

## Dataset Visualization

<img src="assets/datasetvisual_0.jpg" alt="Dataset" width="900"/>

## Dataset Structure

### All Annotated Pairs (`all.json`)

**Important note: in HPDv3, the preferred sample is always listed first (`path1`).**

`all.json` contains **all** annotated pairs except those in the test set. The JSON file holds three types of training samples.

```json
[
  // Samples from the HPDv3 annotation pipeline
  {
    "prompt": "Description of the visual content or the generation prompt.",
    "choice_dist": [12, 7],       // Distribution of annotator votes (12 votes for image1, 7 votes for image2)
    "confidence": 0.9999907,      // Confidence score reflecting preference reliability, based on annotators' capabilities (independent of choice_dist)
    "path1": "images/uuid1.jpg",  // File path to the preferred image
    "path2": "images/uuid2.jpg",  // File path to the non-preferred image
    "model1": "flux",             // Model used to generate the preferred image (path1)
    "model2": "infinity"          // Model used to generate the non-preferred image (path2)
  },
  // Samples from Midjourney
  {
    "prompt": "Description of the visual content or the generation prompt.",
    "choice_dist": null,          // No vote distribution is available from Discord
    "confidence": null,           // No confidence information is available from Discord
    "path1": "images/uuid1.jpg",  // File path to the preferred image
    "path2": "images/uuid2.jpg",  // File path to the non-preferred image
    "model1": "midjourney",       // Comparison between images generated by Midjourney
    "model2": "midjourney"        // Comparison between images generated by Midjourney
  },
  // Samples from Curated HPDv2
  {
    "prompt": "Description of the visual content or the generation prompt.",
    "choice_dist": null,          // No vote distribution in the original HPDv2 training set
    "confidence": null,           // No confidence information in the original HPDv2 training set
    "path1": "images/uuid1.jpg",  // File path to the preferred image
    "path2": "images/uuid2.jpg",  // File path to the non-preferred image
    "model1": "hpdv2",            // The original HPDv2 training set has no specific model name, so this is set to hpdv2
    "model2": "hpdv2"             // The original HPDv2 training set has no specific model name, so this is set to hpdv2
  },
  ...
]
```

### Train Set (`train.json`)

We sample part of the training data in `all.json` to build the training set `train.json`.
Moreover, to improve robustness, we add randomly sampled subsets of [Pick-a-pic](https://huggingface.co/datasets/pickapic-anonymous/pickapic_v1) and [ImageRewardDB](https://huggingface.co/datasets/zai-org/ImageRewardDB), provided as `pickapic.json` and `imagereward.json`. For these two datasets we provide only the pair information; the corresponding images can be found in their official dataset repositories.

### Test Set (`test.json`)

```json
[
  {
    "prompt": "Description of the visual content",
    "path1": "images/uuid1.jpg",  // Preferred sample
    "path2": "images/uuid2.jpg",  // Non-preferred sample
    "model1": "flux",             // Model used to generate the preferred sample (path1)
    "model2": "infinity"          // Model used to generate the non-preferred sample (path2)
  },
  ...
]
```
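
As an illustration of the layout above, the following minimal sketch reads one of the pair files and labels each pair with the sample type it appears to come from. It rests on several assumptions: that the shipped files are plain JSON without the explanatory comments shown in the listings, that the images were extracted under `image_root`, and that the `model1`/`model2` values `midjourney` and `hpdv2` identify the two non-annotated sample types. The names `PreferencePair`, `pair_source`, `load_pairs`, and the `min_confidence` threshold are hypothetical helpers for this example and are not part of the HPDv3 release.

```python
import json
from collections import Counter
from pathlib import Path
from typing import NamedTuple, Optional


class PreferencePair(NamedTuple):
    prompt: str
    preferred: Path  # built from path1, which HPDv3 lists first as the preferred image
    rejected: Path   # built from path2, the non-preferred image
    source: str      # "hpdv3", "midjourney", or "hpdv2" (labels chosen for this sketch)


def pair_source(pair: dict) -> str:
    """Guess which of the three sample types a pair belongs to."""
    models = (pair.get("model1"), pair.get("model2"))
    if models == ("midjourney", "midjourney"):
        return "midjourney"
    if models == ("hpdv2", "hpdv2"):
        return "hpdv2"
    return "hpdv3"


def load_pairs(json_path, image_root=".", min_confidence: Optional[float] = None):
    """Load pairs, optionally dropping annotated pairs below a confidence threshold.

    Pairs whose confidence is null or absent (Midjourney, Curated HPDv2,
    test.json entries) are always kept, because no score exists to filter on.
    """
    with open(json_path, encoding="utf-8") as f:
        raw_pairs = json.load(f)

    root = Path(image_root)
    pairs = []
    for pair in raw_pairs:
        confidence = pair.get("confidence")
        if (
            min_confidence is not None
            and confidence is not None
            and confidence < min_confidence
        ):
            continue
        pairs.append(
            PreferencePair(
                prompt=pair["prompt"],
                preferred=root / pair["path1"],
                rejected=root / pair["path2"],
                source=pair_source(pair),
            )
        )
    return pairs


def count_by_source(pairs) -> Counter:
    """Count how many pairs come from each sample type."""
    return Counter(p.source for p in pairs)
```

A call such as `load_pairs("all.json", image_root=".", min_confidence=0.9)` followed by `count_by_source(...)` would then give a quick breakdown of the pairs by sample type. The threshold value is arbitrary and is not a setting recommended by the dataset authors.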
