lmarena-ai/PPE-GPQA-Best-of-K
Source: Hugging Face. Updated 2024-10-22, indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/lmarena-ai/PPE-GPQA-Best-of-K
---
dataset_info:
  features:
  - name: question_id
    dtype: string
  - name: Question
    dtype: string
  - name: Subdomain
    dtype: string
  - name: Record ID
    dtype: string
  - name: High-level domain
    dtype: string
  - name: choices
    sequence: string
  - name: correct_choice_index
    dtype: int64
  - name: correct_choice
    dtype: string
  - name: model_name
    dtype: string
  - name: parsed_outputs
    sequence: string
  - name: scores
    sequence: bool
  - name: mean_score
    dtype: float64
  - name: prompt
    dtype: string
  - name: response_1
    dtype: string
  - name: response_2
    dtype: string
  - name: response_3
    dtype: string
  - name: response_4
    dtype: string
  - name: response_5
    dtype: string
  - name: response_6
    dtype: string
  - name: response_7
    dtype: string
  - name: response_8
    dtype: string
  - name: response_9
    dtype: string
  - name: response_10
    dtype: string
  - name: response_11
    dtype: string
  - name: response_12
    dtype: string
  - name: response_13
    dtype: string
  - name: response_14
    dtype: string
  - name: response_15
    dtype: string
  - name: response_16
    dtype: string
  - name: response_17
    dtype: string
  - name: response_18
    dtype: string
  - name: response_19
    dtype: string
  - name: response_20
    dtype: string
  - name: response_21
    dtype: string
  - name: response_22
    dtype: string
  - name: response_23
    dtype: string
  - name: response_24
    dtype: string
  - name: response_25
    dtype: string
  - name: response_26
    dtype: string
  - name: response_27
    dtype: string
  - name: response_28
    dtype: string
  - name: response_29
    dtype: string
  - name: response_30
    dtype: string
  - name: response_31
    dtype: string
  - name: response_32
    dtype: string
  - name: canary_string
    dtype: string
  - name: conflict_pairs
    sequence:
      sequence: int64
  - name: sampled_conflict_pairs
    sequence:
      sequence: int64
  splits:
  - name: train
    num_bytes: 29495250
    num_examples: 512
  download_size: 13675059
  dataset_size: 29495250
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
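The `dataset_info` schema above stores each row's 32 sampled answers as flat columns, `response_1` through `response_32`, alongside the per-response `scores` list. Below is a minimal sketch for regrouping one row into an ordered list. It assumes (the card does not state this) that `scores[i]` is the correctness label of `response_{i+1}`. `BestOfKRow`, `parse_row`, and `load_train_rows` are hypothetical helpers. Only `load_train_rows` needs network access and the Hugging Face `datasets` package. It loads the `train` split of the `default` config.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

NUM_RESPONSES = 32  # response_1 ... response_32 in the schema


@dataclass
class BestOfKRow:
    """Hypothetical container for one row of the train split."""
    question_id: str
    model_name: str
    responses: List[str]
    scores: List[bool]
    mean_score: float
    conflict_pairs: List[List[int]]


def parse_row(row: Dict[str, Any]) -> BestOfKRow:
    """Collect the flat response_i columns into one ordered list."""
    responses = [row[f"response_{i}"] for i in range(1, NUM_RESPONSES + 1)]
    scores = list(row["scores"])
    # Assumption: scores[i] labels response_{i+1}, so the lengths must match.
    if len(scores) != len(responses):
        raise ValueError(f"expected {len(responses)} scores, got {len(scores)}")
    return BestOfKRow(
        question_id=row["question_id"],
        model_name=row["model_name"],
        responses=responses,
        scores=scores,
        mean_score=float(row["mean_score"]),
        conflict_pairs=[list(pair) for pair in row["conflict_pairs"]],
    )


def load_train_rows() -> List[BestOfKRow]:
    """Download the train split (requires network and the `datasets` package)."""
    from datasets import load_dataset  # imported lazily so the rest runs offline

    ds = load_dataset("lmarena-ai/PPE-GPQA-Best-of-K", split="train")
    return [parse_row(r) for r in ds]
```

Keeping the download in its own function lets `parse_row` be exercised on hand-built dictionaries without touching the Hub.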
# Overview
This dataset contains the GPQA correctness preference evaluation set for Preference Proxy Evaluations (PPE). The prompts are sampled from [GPQA](https://huggingface.co/datasets/Idavidrein/gpqa).

This dataset is meant for benchmarking and evaluation, not for training.

[Paper](https://arxiv.org/abs/2410.14872)

[Code](https://github.com/lmarena/PPE)
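A correctness preference set like this one is used to check how well a reward model's rewards track the `scores` correctness labels. Below is a minimal sketch of two such measures. It assumes the reward model has already assigned one float to each response. The first measure is best-of-K correctness: whether the top-rewarded response is correct. The second is pairwise accuracy over `conflict_pairs`. For that measure, the sketch assumes each pair holds 0-based indices of one correct and one incorrect response. The function names are hypothetical, and the exact metric definitions in the PPE code may differ.

```python
from typing import Sequence


def best_of_k_correct(rewards: Sequence[float], scores: Sequence[bool]) -> bool:
    """Return whether the response with the highest reward is labeled correct."""
    if not rewards or len(rewards) != len(scores):
        raise ValueError("rewards and scores must be non-empty and equal length")
    # max() returns the first maximal index, so ties go to the lowest index.
    best = max(range(len(rewards)), key=lambda i: rewards[i])
    return bool(scores[best])


def pairwise_accuracy(
    rewards: Sequence[float],
    scores: Sequence[bool],
    pairs: Sequence[Sequence[int]],
) -> float:
    """Fraction of (correct, incorrect) pairs where the correct one wins.

    Assumption: indices are 0-based positions into rewards/scores.
    Pairs whose two responses share the same label are skipped.
    """
    hits = 0
    valid = 0
    for i, j in pairs:
        if scores[i] == scores[j]:
            continue  # not a conflict pair under our assumption
        good, bad = (i, j) if scores[i] else (j, i)
        hits += int(rewards[good] > rewards[bad])
        valid += 1
    return hits / valid if valid else float("nan")
```

Averaging `best_of_k_correct` over all 512 rows gives a dataset-level best-of-32 accuracy. Breaking ties toward the lowest index is one arbitrary choice among several.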
# License
User prompts are licensed under CC BY 4.0, and model outputs are governed by the terms of use set by the respective model providers.
# Citation
```bibtex
@misc{frick2024evaluaterewardmodelsrlhf,
title={How to Evaluate Reward Models for RLHF},
author={Evan Frick and Tianle Li and Connor Chen and Wei-Lin Chiang and Anastasios N. Angelopoulos and Jiantao Jiao and Banghua Zhu and Joseph E. Gonzalez and Ion Stoica},
year={2024},
eprint={2410.14872},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.14872},
}
```
Provided by:
lmarena-ai



