ProcessBench

Name: ProcessBench
Creator: maas
Published: 2026-05-15 20:16:43
License: no description available

ModelScope community; updated 2026-05-15; indexed 2024-12-21

Download link:

https://modelscope.cn/datasets/Qwen/ProcessBench

Resource description:

# ProcessBench

This repository contains the dataset of the [ProcessBench](https://huggingface.co/papers/2412.06559) benchmark proposed by Qwen Team.

You can refer to our [GitHub repository](https://github.com/QwenLM/ProcessBench) for the evaluation code and the prompt templates we use in this work.

If you find this work relevant or helpful to your work, please kindly cite us:

```
@article{processbench,
  title={ProcessBench: Identifying Process Errors in Mathematical Reasoning},
  author={
    Chujie Zheng and Zhenru Zhang and Beichen Zhang and Runji Lin and Keming Lu and
    Bowen Yu and Dayiheng Liu and Jingren Zhou and Junyang Lin
  },
  journal={arXiv preprint arXiv:2412.06559},
  year={2024}
}
```

## Data Usage

You can use the following code to preview the dataset:

```python
import json
from datasets import load_dataset

dataset = load_dataset('Qwen/ProcessBench', split='gsm8k')
print(json.dumps(dataset[0], indent=2))

# Expected output:
"""
{
  "id": "gsm8k-0",
  "generator": "Qwen2-7B-Instruct",
  "problem": "Sue lives in a fun neighborhood...",
  "steps": [
    "To find out how many more pink plastic flamingos were out than...",
    ...
  ],
  "final_answer_correct": false,
  "label": 1
}
"""
```
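Once a record is loaded, it can help to view its steps alongside the `label` field. The sketch below is a minimal, offline illustration rather than part of the official tooling. It assumes that `label` is the 0-based index of the earliest erroneous step and that `-1` means no step is wrong; this page does not state that, so check it against the paper and the GitHub repository. The `render_steps` helper and the `sample` record are hypothetical, and the step texts are placeholders, not real dataset content.

```python
from typing import Any


def render_steps(record: dict[str, Any]) -> str:
    """Return the solution steps as numbered lines, flagging the labeled step."""
    # Assumption: `label` is the 0-based index of the earliest erroneous step,
    # and -1 means that no step is marked as erroneous.
    label = record["label"]
    lines = []
    for i, step in enumerate(record["steps"]):
        marker = "  <-- labeled step" if i == label else ""
        lines.append(f"[{i}] {step}{marker}")
    return "\n".join(lines)


# Hypothetical record with the same fields as the preview output shown earlier;
# the values are placeholders, not real dataset content.
sample = {
    "id": "example-0",
    "generator": "example-model",
    "problem": "placeholder problem",
    "steps": ["first step", "second step", "third step"],
    "final_answer_correct": False,
    "label": 1,
}

print(render_steps(sample))
```

With a real record, `render_steps(dataset[0])` would work the same way, since it only relies on the `steps` and `label` fields shown in the preview.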

Provider:

maas

Created:

2024-12-19
