trl-lib/prm800k
Source: Hugging Face. Updated 2025-01-08. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/trl-lib/prm800k
Description:
This PRM800K dataset is a processed version of OpenAI's PRM800K, prepared for training models on stepwise supervision tasks with the TRL library. It contains 800,000 step-level correctness labels for model-generated solutions to problems from the MATH dataset. Because each step in a solution carries its own label, a model can learn to verify a solution step by step, which is intended to improve its reasoning ability.
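To make "step-level correctness labels" concrete, here is a minimal sketch of what one stepwise-supervision record might look like and how the labels can be read. The field names `prompt`, `completions`, and `labels` are an assumption based on TRL's stepwise-supervision format and are not stated on this page. The record itself and the helper `first_incorrect_step` are hypothetical, invented for illustration.

```python
from typing import List, Optional, TypedDict


class StepwiseExample(TypedDict):
    # Assumed field names (TRL stepwise-supervision format); not confirmed by this page.
    prompt: str             # the problem statement
    completions: List[str]  # one entry per solution step
    labels: List[bool]      # one correctness label per step


def first_incorrect_step(example: StepwiseExample) -> Optional[int]:
    """Return the index of the first step labelled incorrect, or None if all are correct."""
    if len(example["completions"]) != len(example["labels"]):
        raise ValueError("each step needs exactly one label")
    for index, is_correct in enumerate(example["labels"]):
        if not is_correct:
            return index
    return None


# Hypothetical record, made up for this sketch (not taken from the dataset).
toy_example: StepwiseExample = {
    "prompt": "What is 2 + 3 * 4?",
    "completions": [
        "Multiplication comes first: 3 * 4 = 12.",
        "Then add: 2 + 12 = 15.",
        "The answer is 15.",
    ],
    "labels": [True, False, False],
}

print(first_incorrect_step(toy_example))  # index of the first step labelled False
```

With the Hugging Face `datasets` library installed, the real data can presumably be fetched with `load_dataset("trl-lib/prm800k")`. The available splits and exact column names should be checked on the dataset card before relying on the layout sketched here.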
Provided by:
trl-lib



