PPE-GPQA-Best-of-K
Source: ModelScope community (魔搭社区) · Updated 2025-12-05 · Indexed 2025-04-26
Download link:
https://modelscope.cn/datasets/lmarena-ai/PPE-GPQA-Best-of-K
Description:
# Overview
This contains the GPQA correctness preference evaluation set for Preference Proxy Evaluations.
The prompts are sampled from [GPQA](https://huggingface.co/datasets/Idavidrein/gpqa).
This dataset is meant for benchmarking and evaluation, not for training.
[Paper](https://arxiv.org/abs/2410.14872)
[Code](https://github.com/lmarena/PPE)
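
As an illustration of how a correctness preference set like this might be used for evaluation, the following is a minimal sketch that scores every sampled response to a prompt with a reward function and checks whether the top-scored one is marked correct. The field names (`prompt`, `responses`, `correct`), the helper `best_of_k_accuracy`, and the toy scorer are hypothetical and are not taken from the dataset or the PPE code; consult the linked repository for the actual schema and metrics.

```python
from typing import Callable, Sequence


def best_of_k_accuracy(
    rows: Sequence[dict],
    score_fn: Callable[[str, str], float],
) -> float:
    """Fraction of prompts whose top-scored response is marked correct."""
    if not rows:
        raise ValueError("rows must not be empty")
    hits = 0
    for row in rows:
        # Hypothetical schema: "prompt" (str), "responses" (list of str),
        # "correct" (list of bool aligned with "responses").
        scores = [score_fn(row["prompt"], r) for r in row["responses"]]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += bool(row["correct"][best])
    return hits / len(rows)


# Toy rows invented for illustration only; not taken from the dataset.
toy_rows = [
    {"prompt": "q1", "responses": ["a", "bbb"], "correct": [False, True]},
    {"prompt": "q2", "responses": ["cccc", "d"], "correct": [False, True]},
]


def length_score(prompt: str, response: str) -> float:
    # Stand-in scorer that prefers longer responses;
    # a real reward model would be called here instead.
    return float(len(response))


accuracy = best_of_k_accuracy(toy_rows, length_score)
print(accuracy)
```

With a real reward model, `score_fn` would wrap the model's scoring call, and `rows` would be built from the dataset after mapping its actual columns onto the assumed fields.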
# License
User prompts are licensed under CC BY 4.0, and model outputs are governed by the terms of use set by the respective model providers.
# Citation
```
@misc{frick2024evaluaterewardmodelsrlhf,
title={How to Evaluate Reward Models for RLHF},
author={Evan Frick and Tianle Li and Connor Chen and Wei-Lin Chiang and Anastasios N. Angelopoulos and Jiantao Jiao and Banghua Zhu and Joseph E. Gonzalez and Ion Stoica},
year={2024},
eprint={2410.14872},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.14872},
}
```
Provider:
maas
Created:
2025-04-21
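Collected summary
Dataset introduction

Background and challenges
Background overview
PPE-GPQA-Best-of-K is a dataset for GPQA correctness preference evaluation, designed specifically for benchmarking and evaluation rather than for training. Its license is listed as Apache License 2.0; user prompts are licensed under CC BY 4.0, and model outputs are governed by the terms of the respective model providers. The dataset is built on related research from 2024.
The summary above was collected and generated by 遇见数据集.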



