K12-PEBench
Source: arXiv; indexed 2025-09-30
Download link:
https://github.com/lichongod/K12Vista
Resource description:
K12-PEBench is a high-quality, human-annotated benchmark for evaluating how well models assess reasoning processes. It consists of approximately 3,000 samples selected from K12-PEM-800K. A validation team manually annotated the reasoning steps for correctness and labeled the error types. The benchmark task is to judge the correctness of reasoning processes and to identify the errors they contain.



