PPE-IFEval-Best-of-K

Name: PPE-IFEval-Best-of-K
Creator: maas
Published: 2025-12-05 12:04:39
License: No description available

ModelScope community · Updated 2025-12-05 · Indexed 2025-04-26

Download link:

https://modelscope.cn/datasets/lmarena-ai/PPE-IFEval-Best-of-K


Resource description:

# Overview

This contains the IFEval correctness preference evaluation set for Preference Proxy Evaluations. The prompts are sampled from [IFEval](https://huggingface.co/datasets/google/IFEval). This dataset is meant for benchmarking and evaluation, not for training.

[Paper](https://arxiv.org/abs/2410.14872) [Code](https://github.com/lmarena/PPE)

# License

User prompts are licensed under Apache-2.0, and model outputs are governed by the terms of use set by the respective model providers.

# Citation

```
@misc{frick2024evaluaterewardmodelsrlhf,
  title={How to Evaluate Reward Models for RLHF},
  author={Evan Frick and Tianle Li and Connor Chen and Wei-Lin Chiang and Anastasios N. Angelopoulos and Jiantao Jiao and Banghua Zhu and Joseph E. Gonzalez and Ion Stoica},
  year={2024},
  eprint={2410.14872},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2410.14872},
}
```
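As a rough illustration of how a best-of-K correctness set might be used to score a reward model, the sketch below computes the fraction of prompts for which the highest-scored response is also a correct one. The row layout and the field names `scores` and `correct` are hypothetical, chosen only for this example; they are not taken from the dataset's schema, and the official evaluation code linked above may compute its metrics differently.

```python
from typing import Dict, List, Sequence


def best_of_k_accuracy(rows: Sequence[Dict[str, List]]) -> float:
    """Fraction of prompts whose top-scored response is marked correct.

    Each row is assumed (hypothetically) to hold:
      - "scores":  K reward-model scores, one per sampled response
      - "correct": K booleans, True if that response passed the check
    """
    if not rows:
        raise ValueError("rows must not be empty")

    hits = 0
    for row in rows:
        scores = row["scores"]
        correct = row["correct"]
        if len(scores) != len(correct) or not scores:
            raise ValueError("each row needs K scores and K correctness labels")
        # Index of the response the reward model prefers most.
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += bool(correct[best])
    return hits / len(rows)


# Toy data, invented for illustration only.
rows = [
    {"scores": [0.1, 0.9, 0.3], "correct": [False, True, False]},
    {"scores": [0.7, 0.2, 0.4], "correct": [False, True, True]},
]
print(best_of_k_accuracy(rows))  # 0.5
```

In the toy data the model picks a correct response for the first prompt and an incorrect one for the second, so the score is 0.5.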


Provider:

maas

Created:

2025-04-21

Collected summary

Dataset introduction