OVM-process
Source: ModelScope community · Updated: 2025-11-02 · Indexed: 2025-01-25
Download link:
https://modelscope.cn/datasets/FreedomIntelligence/OVM-process
Description:
The GSM8K training dataset for process reward models from the paper [OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning](https://arxiv.org/pdf/2311.09724.pdf). The responses were generated by llama2-7b and the labels were annotated by GPT-4.
Steps are split by the newlines in the response. `step_labels` indicates the logical correctness of steps, defined as "logically correct and it's based on accurate premises, not necessarily helps to solve the problem"; `step_labels_progress` indicates helpfulness of steps, defined as "logically correct, based on accurate premises, and helps to solve the problem".
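As a rough illustration of the step format described above, the sketch below splits a response on newlines and pairs each step with its two labels. Only `step_labels` and `step_labels_progress` are named in the description. The `response` key, the 0/1 label encoding, the one-label-per-step alignment, and the sample record are all assumptions made for illustration and should be checked against the actual files.

```python
from typing import Any


def pair_steps_with_labels(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Split a response into steps on newlines and attach both label types.

    Assumes (not confirmed by the dataset card) that the response text is
    stored under the key "response" and that each label list holds exactly
    one entry per newline-separated step.
    """
    steps = record["response"].split("\n")
    correctness = record["step_labels"]
    progress = record["step_labels_progress"]

    if not (len(steps) == len(correctness) == len(progress)):
        raise ValueError(
            f"Found {len(steps)} steps but {len(correctness)} correctness "
            f"labels and {len(progress)} progress labels"
        )

    return [
        {"step": step, "correct": c, "progress": p}
        for step, c, p in zip(steps, correctness, progress)
    ]


# Hypothetical record; the real label encoding may differ from 0/1.
example = {
    "response": "Tom has 3 apples and buys 2 more.\n3 + 2 = 5\nThe answer is 5",
    "step_labels": [1, 1, 1],
    "step_labels_progress": [0, 1, 1],
}

pairs = pair_steps_with_labels(example)
for pair in pairs:
    print(pair)
```

In this made-up record the first step is labeled correct but not helpful, which mirrors the distinction the description draws between the two label fields: a step can be logically sound without moving the solution forward.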
Provider:
maas
Created:
2025-01-20



