K12-PEBench

Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/lichongod/K12Vista
Overview:
K12-PEBench is a high-quality, manually annotated benchmark dataset for evaluating how well a model can assess reasoning processes. It consists of approximately 3,000 samples selected from K12-PEM-800K. A validation team annotated the samples by hand, judging the correctness of the reasoning steps and identifying error types. The benchmark task is therefore twofold: judging whether a reasoning process is correct, and identifying the errors it contains.
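To make the twofold task concrete, here is a minimal sketch of how step-level judgments might be scored against human annotations. It is not the benchmark's official metric or data schema: the `(is_correct, error_type)` label format, the `score_steps` function, and the error-type names `"calculation"` and `"logic"` are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label format (an assumption, not the dataset's schema):
# (is_correct, error_type), where error_type is None for a correct step.
StepLabel = Tuple[bool, Optional[str]]


def score_steps(gold: List[StepLabel], pred: List[StepLabel]) -> Dict[str, Optional[float]]:
    """Compare predicted step judgments with human annotations."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same steps")
    if not gold:
        raise ValueError("no steps to score")

    # Fraction of steps whose correct/incorrect verdict matches the annotation.
    verdict_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))

    # Among steps annotated as wrong, the fraction that the prediction also
    # flags as wrong with the same error type.
    wrong = [(g, p) for g, p in zip(gold, pred) if not g[0]]
    type_hits = sum((not p[0]) and g[1] == p[1] for g, p in wrong)

    return {
        "step_accuracy": verdict_hits / len(gold),
        "error_type_accuracy": type_hits / len(wrong) if wrong else None,
    }


# Illustrative labels only; the error-type names are made up.
gold = [(True, None), (False, "calculation"), (False, "logic")]
pred = [(True, None), (False, "calculation"), (True, None)]
result = score_steps(gold, pred)
```

In this toy example the prediction matches two of three verdicts and names the right error type for one of the two annotated errors. The real annotation format should be taken from the repository linked above.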