
Sete007/HPDv3

Hugging Face | Updated 2026-03-04 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Sete007/HPDv3
Resource description:
---
license: mit
language:
- en
pretty_name: HPDv3
size_categories:
- 1M<n<10M
---

<div align="center">

# 🎯 HPSv3: Towards Wide-Spectrum Human Preference Score (ICCV 2025)

[![Project Website](https://img.shields.io/badge/🌐-Project%20Website-deepgray)](https://research.nvidia.com/labs/par/addit/)
[![arXiv](https://img.shields.io/badge/arXiv-2508.03789-b31b1b.svg)](https://arxiv.org/abs/2508.03789)
[![ICCV 2025](https://img.shields.io/badge/ICCV-2025-blue.svg)](https://arxiv.org/abs/2508.03789)
[![Model](https://img.shields.io/badge/🤗-Model-yellow)](https://huggingface.co/MizzenAI/HPSv3)
[![Code](https://img.shields.io/badge/Code-black?logo=github)](https://github.com/MizzenAI/HPSv3)

**Yuhang Ma**<sup>1,3*</sup>&ensp; **Yunhao Shui**<sup>1,4*</sup>&ensp; **Xiaoshi Wu**<sup>2</sup>&ensp; **Keqiang Sun**<sup>1,2†</sup>&ensp; **Hongsheng Li**<sup>2,5,6†</sup>

<sup>1</sup>Mizzen AI&ensp;&ensp; <sup>2</sup>CUHK MMLab&ensp;&ensp; <sup>3</sup>King’s College London&ensp;&ensp; <sup>4</sup>Shanghai Jiao Tong University&ensp;&ensp; <sup>5</sup>Shanghai AI Laboratory&ensp;&ensp; <sup>6</sup>CPII, InnoHK&ensp;&ensp;

<sup>*</sup>Equal Contribution&ensp; <sup>†</sup>Equal Advising

</div>

<p align="center">
  <img src="assets/teaser.png" alt="Teaser" width="900"/>
</p>

# Human Preference Dataset v3

Human Preference Dataset v3 (HPDv3) comprises **1.08M** text-image pairs and **1.17M** annotated pairwise comparisons. To model the wide spectrum of human preference, we include images from the newest state-of-the-art generative models and high-quality real photographs, while retaining images from older models and lower-quality real images.

## How to Use

```bash
cat images.tar.gz.* | gunzip | tar -xv
```

## Detailed Information of HPDv3

| Image Source | Type | Number of Images | Prompt Source | Split |
|--------------|------|------------------|---------------|-------|
| High Quality Image (HQI) | Real Image | 57759 | VLM Caption | Train & Test |
| MidJourney | - | 331955 | User | Train |
| CogView4 | DiT | 400 | HQI+HPDv2+JourneyDB | Test |
| FLUX.1 dev | DiT | 48927 | HQI+HPDv2+JourneyDB | Train & Test |
| Infinity | Autoregressive | 27061 | HQI+HPDv2+JourneyDB | Train & Test |
| Kolors | DiT | 49705 | HQI+HPDv2+JourneyDB | Train & Test |
| HunyuanDiT | DiT | 46133 | HQI+HPDv2+JourneyDB | Train & Test |
| Stable Diffusion 3 Medium | DiT | 49266 | HQI+HPDv2+JourneyDB | Train & Test |
| Stable Diffusion XL | Diffusion | 49025 | HQI+HPDv2+JourneyDB | Train & Test |
| Pixart Sigma | Diffusion | 400 | HQI+HPDv2+JourneyDB | Test |
| Stable Diffusion 2 | Diffusion | 19124 | HQI+JourneyDB | Train & Test |
| CogView2 | Autoregressive | 3823 | HQI+JourneyDB | Train & Test |
| FuseDream | Diffusion | 468 | HQI+JourneyDB | Train & Test |
| VQ-Diffusion | Diffusion | 18837 | HQI+JourneyDB | Train & Test |
| Glide | Diffusion | 19989 | HQI+JourneyDB | Train & Test |
| Stable Diffusion 1.4 | Diffusion | 18596 | HQI+JourneyDB | Train & Test |
| Stable Diffusion 1.1 | Diffusion | 19043 | HQI+JourneyDB | Train & Test |
| Curated HPDv2 | - | 327763 | - | Train |

## Dataset Visualization

<p align="left">
  <img src="assets/datasetvisual_0.jpg" alt="Dataset" width="900"/>
</p>

## Dataset Structure

### All Annotated Pairs (`all.json`)

**Important note: in HPDv3, the preferred sample always comes first (`path1`).**

`all.json` contains **all** annotated pairs except those in the test set. The file holds three types of training samples:

```json
[
  // Samples from the HPDv3 annotation pipeline
  {
    "prompt": "Description of the visual content or the generation prompt.",
    "choice_dist": [12, 7],        // Distribution of annotator votes (12 votes for image1, 7 votes for image2)
    "confidence": 0.9999907,       // Confidence score reflecting preference reliability, based on annotators' capabilities (independent of choice_dist)
    "path1": "images/uuid1.jpg",   // File path to the preferred image
    "path2": "images/uuid2.jpg",   // File path to the non-preferred image
    "model1": "flux",              // Model used to generate the preferred image (path1)
    "model2": "infinity"           // Model used to generate the non-preferred image (path2)
  },
  // Samples from Midjourney
  {
    "prompt": "Description of the visual content or the generation prompt.",
    "choice_dist": null,           // No vote distribution is available from Discord
    "confidence": null,            // No confidence information is available from Discord
    "path1": "images/uuid1.jpg",   // File path to the preferred image
    "path2": "images/uuid2.jpg",   // File path to the non-preferred image
    "model1": "midjourney",        // Comparison between images generated by Midjourney
    "model2": "midjourney"         // Comparison between images generated by Midjourney
  },
  // Samples from Curated HPDv2
  {
    "prompt": "Description of the visual content or the generation prompt.",
    "choice_dist": null,           // No vote distribution in the original HPDv2 training set
    "confidence": null,            // No confidence information in the original HPDv2 training set
    "path1": "images/uuid1.jpg",   // File path to the preferred image
    "path2": "images/uuid2.jpg",   // File path to the non-preferred image
    "model1": "hpdv2",             // No specific model name in the original HPDv2 training set, so set to hpdv2
    "model2": "hpdv2"              // No specific model name in the original HPDv2 training set, so set to hpdv2
  },
  ...
]
```

### Train Set (`train.json`)

We sample a subset of the training data in `all.json` to build the training set `train.json`.

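The three record types in `all.json` can be told apart from the fields shown above. The following is a minimal sketch, not the authors' tooling: it assumes the real file is plain JSON without the explanatory `//` comments, uses a small inline sample in place of the file, and the helper names `record_source` and `vote_share` are hypothetical.

```python
import json
from collections import Counter

# Inline stand-in for all.json (assumption: the real file has no // comments).
SAMPLE = json.loads("""
[
  {"prompt": "a cat", "choice_dist": [12, 7], "confidence": 0.9999907,
   "path1": "images/a.jpg", "path2": "images/b.jpg",
   "model1": "flux", "model2": "infinity"},
  {"prompt": "a dog", "choice_dist": null, "confidence": null,
   "path1": "images/c.jpg", "path2": "images/d.jpg",
   "model1": "midjourney", "model2": "midjourney"},
  {"prompt": "a tree", "choice_dist": null, "confidence": null,
   "path1": "images/e.jpg", "path2": "images/f.jpg",
   "model1": "hpdv2", "model2": "hpdv2"}
]
""")


def record_source(record):
    """Label a pair as 'midjourney', 'hpdv2', or 'annotated' (hypothetical labels)."""
    models = (record["model1"], record["model2"])
    if models == ("midjourney", "midjourney"):
        return "midjourney"
    if models == ("hpdv2", "hpdv2"):
        return "hpdv2"
    return "annotated"


def vote_share(record):
    """Fraction of annotator votes for path1, or None when choice_dist is null."""
    dist = record["choice_dist"]
    if not dist or sum(dist) == 0:
        return None
    return dist[0] / sum(dist)


counts = Counter(record_source(r) for r in SAMPLE)
shares = [vote_share(r) for r in SAMPLE]
print(dict(counts))
print(shares)
```

Only records from the HPDv3 annotation pipeline carry `choice_dist` and `confidence`, so any filtering on those fields has to handle `null` for the Midjourney and Curated HPDv2 records.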
Moreover, to improve robustness, we add randomly sampled data from [Pick-a-pic](https://huggingface.co/datasets/pickapic-anonymous/pickapic_v1) and [ImageRewardDB](https://huggingface.co/datasets/zai-org/ImageRewardDB), provided as `pickapic.json` and `imagereward.json`. For these two datasets, we provide only the pair information; the corresponding images can be found in their official dataset repositories.

### Test Set (`test.json`)

```json
[
  {
    "prompt": "Description of the visual content",
    "path1": "images/uuid1.jpg",   // Preferred sample
    "path2": "images/uuid2.jpg",   // Non-preferred sample
    "model1": "flux",              // Model used to generate the preferred sample (path1)
    "model2": "infinity"           // Model used to generate the non-preferred sample (path2)
  },
  ...
]
```
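
Because `path1` always holds the preferred image, one common way to use a test set in this format is to measure how often a preference scorer ranks `path1` above `path2`. The sketch below illustrates that idea under stated assumptions: `pairwise_accuracy`, `toy_score`, and the lookup table are hypothetical stand-ins, not part of HPSv3 or any released API, and ties are counted as errors.

```python
from typing import Callable, Iterable, Mapping


def pairwise_accuracy(
    pairs: Iterable[Mapping[str, str]],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where score(prompt, path1) > score(prompt, path2)."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs to evaluate")
    correct = sum(
        1
        for p in pairs
        if score(p["prompt"], p["path1"]) > score(p["prompt"], p["path2"])
    )
    return correct / len(pairs)


# Hypothetical scores standing in for a real preference model.
TOY_SCORES = {
    "images/a.jpg": 0.9, "images/b.jpg": 0.2,
    "images/c.jpg": 0.6, "images/d.jpg": 0.5,
    "images/e.jpg": 0.3, "images/f.jpg": 0.7,  # scorer disagrees with the label
    "images/g.jpg": 0.8, "images/h.jpg": 0.1,
}


def toy_score(prompt: str, path: str) -> float:
    """Look up a fixed score; a real scorer would load the image and use the prompt."""
    return TOY_SCORES[path]


# Records in the test.json layout, with path1 as the preferred image.
TEST_PAIRS = [
    {"prompt": "p1", "path1": "images/a.jpg", "path2": "images/b.jpg"},
    {"prompt": "p2", "path1": "images/c.jpg", "path2": "images/d.jpg"},
    {"prompt": "p3", "path1": "images/e.jpg", "path2": "images/f.jpg"},
    {"prompt": "p4", "path1": "images/g.jpg", "path2": "images/h.jpg"},
]

accuracy = pairwise_accuracy(TEST_PAIRS, toy_score)
print(accuracy)  # 0.75: three of the four pairs are ranked correctly
```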
Provided by:
Sete007