lmarena-ai/PPE-GPQA-Best-of-K

Source: Hugging Face. Updated 2024-10-22. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/lmarena-ai/PPE-GPQA-Best-of-K
Description:
---
dataset_info:
  features:
  - name: question_id
    dtype: string
  - name: Question
    dtype: string
  - name: Subdomain
    dtype: string
  - name: Record ID
    dtype: string
  - name: High-level domain
    dtype: string
  - name: choices
    sequence: string
  - name: correct_choice_index
    dtype: int64
  - name: correct_choice
    dtype: string
  - name: model_name
    dtype: string
  - name: parsed_outputs
    sequence: string
  - name: scores
    sequence: bool
  - name: mean_score
    dtype: float64
  - name: prompt
    dtype: string
  - name: response_1
    dtype: string
  - name: response_2
    dtype: string
  - name: response_3
    dtype: string
  - name: response_4
    dtype: string
  - name: response_5
    dtype: string
  - name: response_6
    dtype: string
  - name: response_7
    dtype: string
  - name: response_8
    dtype: string
  - name: response_9
    dtype: string
  - name: response_10
    dtype: string
  - name: response_11
    dtype: string
  - name: response_12
    dtype: string
  - name: response_13
    dtype: string
  - name: response_14
    dtype: string
  - name: response_15
    dtype: string
  - name: response_16
    dtype: string
  - name: response_17
    dtype: string
  - name: response_18
    dtype: string
  - name: response_19
    dtype: string
  - name: response_20
    dtype: string
  - name: response_21
    dtype: string
  - name: response_22
    dtype: string
  - name: response_23
    dtype: string
  - name: response_24
    dtype: string
  - name: response_25
    dtype: string
  - name: response_26
    dtype: string
  - name: response_27
    dtype: string
  - name: response_28
    dtype: string
  - name: response_29
    dtype: string
  - name: response_30
    dtype: string
  - name: response_31
    dtype: string
  - name: response_32
    dtype: string
  - name: canary_string
    dtype: string
  - name: conflict_pairs
    sequence:
      sequence: int64
  - name: sampled_conflict_pairs
    sequence:
      sequence: int64
  splits:
  - name: train
    num_bytes: 29495250
    num_examples: 512
  download_size: 13675059
  dataset_size: 29495250
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Overview

This contains the GPQA correctness preference evaluation set for Preference Proxy Evaluations. The prompts are sampled from [GPQA](https://huggingface.co/datasets/Idavidrein/gpqa).

This dataset is meant for benchmarking and evaluation, not for training.

[Paper](https://arxiv.org/abs/2410.14872)

[Code](https://github.com/lmarena/PPE)

# License

User prompts are licensed under CC BY 4.0, and model outputs are governed by the terms of use set by the respective model providers.

# Citation

```
@misc{frick2024evaluaterewardmodelsrlhf,
      title={How to Evaluate Reward Models for RLHF},
      author={Evan Frick and Tianle Li and Connor Chen and Wei-Lin Chiang and Anastasios N. Angelopoulos and Jiantao Jiao and Banghua Zhu and Joseph E. Gonzalez and Ion Stoica},
      year={2024},
      eprint={2410.14872},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.14872},
}
```
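The schema above stores 32 sampled responses per row alongside a `scores` list of booleans, which suggests a best-of-K style check: rank the responses with a reward model and see whether the top-ranked one is correct. The following is a minimal sketch of that idea, not the PPE reference implementation. It assumes that `scores[i]` is the correctness label for `response_{i+1}`, which the card does not state explicitly. The helper names and `toy_reward` are hypothetical; only `load_dataset` is a real call from the `datasets` library.

```python
from typing import Callable, Mapping

NUM_RESPONSES = 32  # the schema lists response_1 .. response_32


def collect_responses(row: Mapping) -> list:
    """Return the sampled responses in column order (response_1 first)."""
    return [row[f"response_{i}"] for i in range(1, NUM_RESPONSES + 1)]


def best_of_k_correct(
    row: Mapping,
    reward_fn: Callable[[str, str], float],
    k: int = NUM_RESPONSES,
) -> bool:
    """Score the first k responses and report whether the top-ranked one is correct.

    ASSUMPTION: scores[i] is the correctness label for response_{i+1}.
    Ties are broken in favour of the earliest response.
    """
    responses = collect_responses(row)[:k]
    labels = list(row["scores"])[:k]
    rewards = [reward_fn(row["prompt"], r) for r in responses]
    best = max(range(len(rewards)), key=lambda i: rewards[i])
    return bool(labels[best])


def load_rows():
    """Load the train split from the Hugging Face Hub (needs network and `datasets`)."""
    from datasets import load_dataset  # imported lazily so the helpers work offline

    return load_dataset("lmarena-ai/PPE-GPQA-Best-of-K", split="train")


def toy_reward(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model: prefers longer responses."""
    return float(len(response))
```

Averaging `best_of_k_correct(row, reward_fn)` over the rows returned by `load_rows()` would give one possible best-of-K accuracy for a reward model. The linked code repository may aggregate differently, so treat this only as an illustration of how the columns fit together.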

Provider:
lmarena-ai