
launch/thinkprm-1K-verification-cots

Source: Hugging Face. Updated 2026-04-18. Indexed 2025-05-31.
Download link:
https://hf-mirror.com/datasets/launch/thinkprm-1K-verification-cots
Description:
The ThinkPRM-Synthetic-Verification-1K dataset contains 1,000 high-quality synthetic verification chains-of-thought (CoTs) for training generative Process Reward Models (PRMs), as used in the paper "Process Reward Models that Think". It is intended as a data-efficient alternative to traditional PRM training, which often requires extensive human annotation or expensive rollouts. Each instance consists of a math problem, a corresponding multi-step solution prefix (sourced from PRM800K), and a detailed verification CoT generated by the QwQ-32B-Preview model. The verification CoT critiques each step of the solution prefix and gives a step-level correctness judgment. To keep quality high, only chains whose step-level judgments all matched the ground-truth human annotations in PRM800K were retained. The chains were also filtered for correct formatting and for length, to avoid problems such as overthinking that were observed in unfiltered generations.
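
The filtering rule described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Candidate` layout, the field names, and the `max_chars` limit are all hypothetical, chosen only to show the idea of keeping a chain when every step-level judgment agrees with the human labels and the chain passes basic format and length checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One synthetic verification chain (hypothetical layout, not the dataset's schema)."""
    cot_text: str                 # full verification chain-of-thought
    predicted_labels: List[bool]  # step-level judgments parsed from the CoT
    gold_labels: List[bool]       # human step-level labels, as in PRM800K


def keep_chain(c: Candidate, max_chars: int = 8000) -> bool:
    """Return True if the chain passes the format, length, and agreement checks.

    max_chars is an assumed threshold; the source does not state the real limit.
    """
    # Format check (assumed): a chain with no parsed judgments is malformed.
    if not c.predicted_labels:
        return False
    # Length check (assumed): drop overly long chains, e.g. from overthinking.
    if len(c.cot_text) > max_chars:
        return False
    # Agreement check: every step judgment must match the human annotation.
    return c.predicted_labels == c.gold_labels


def filter_chains(cands: List[Candidate], max_chars: int = 8000) -> List[Candidate]:
    """Keep only the candidates that pass keep_chain."""
    return [c for c in cands if keep_chain(c, max_chars)]
```

Under these assumptions, a chain that judges even one step differently from the human label is discarded entirely rather than partially repaired.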
Provided by:
launch