ProcessBench
ModelScope Community · Updated 2026-05-15 · Indexed 2024-12-21
Download link:
https://modelscope.cn/datasets/Qwen/ProcessBench
Resource description:
# ProcessBench
This repository contains the dataset for the [ProcessBench](https://huggingface.co/papers/2412.06559) benchmark proposed by the Qwen Team.
See our [GitHub repository](https://github.com/QwenLM/ProcessBench) for the evaluation code and the prompt templates used in this work.
If you find this work relevant or helpful, please cite it:
```bibtex
@article{processbench,
title={ProcessBench: Identifying Process Errors in Mathematical Reasoning},
author={
Chujie Zheng and Zhenru Zhang and Beichen Zhang and Runji Lin and Keming Lu and
Bowen Yu and Dayiheng Liu and Jingren Zhou and Junyang Lin
},
journal={arXiv preprint arXiv:2412.06559},
year={2024}
}
```
## Data Usage
You can use the following code to preview the dataset:
```python
import json
from datasets import load_dataset
dataset = load_dataset('Qwen/ProcessBench', split='gsm8k')
print(json.dumps(dataset[0], indent=2))
# Expected output:
"""
{
"id": "gsm8k-0",
"generator": "Qwen2-7B-Instruct",
"problem": "Sue lives in a fun neighborhood...",
"steps": [
"To find out how many more pink plastic flamingos were out than...",
...
],
"final_answer_correct": false,
"label": 1
}
"""
```
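The record fields shown above can also be consumed programmatically. The sketch below is a minimal illustration, not the official evaluation code. It assumes that `label` is the 0-based index of the earliest erroneous step in `steps` and that `-1` means no step was judged erroneous; it scores predictions as the harmonic mean of the accuracy on erroneous samples and the accuracy on error-free samples, which is our reading of the F1 metric described in the paper. The helper names `split_at_first_error` and `processbench_scores` are hypothetical; check the evaluation scripts in the GitHub repository before relying on this.

```python
from typing import Dict, List, Optional, Sequence

# Assumption (not confirmed by this README): `label` is the 0-based index of the
# earliest erroneous step, and -1 means every step was judged correct.
NO_ERROR = -1


def split_at_first_error(record: Dict) -> Dict[str, Optional[object]]:
    """Hypothetical helper: separate the steps before the first error from the erroneous step."""
    steps: List[str] = record["steps"]
    label: int = record["label"]
    if label == NO_ERROR:
        # No error annotated: the whole solution is the "correct prefix".
        return {"correct_prefix": list(steps), "error_step": None}
    if not 0 <= label < len(steps):
        raise ValueError(f"label {label} is out of range for {len(steps)} steps")
    return {"correct_prefix": steps[:label], "error_step": steps[label]}


def processbench_scores(predictions: Sequence[int], labels: Sequence[int]) -> Dict[str, float]:
    """Hypothetical scorer: accuracy on erroneous and error-free samples, plus their harmonic mean."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    # A prediction counts as correct only if it matches the annotated index exactly.
    erroneous = [p == y for p, y in zip(predictions, labels) if y != NO_ERROR]
    error_free = [p == y for p, y in zip(predictions, labels) if y == NO_ERROR]
    # Design choice of this sketch: an empty subset contributes an accuracy of 0.0.
    error_acc = sum(erroneous) / len(erroneous) if erroneous else 0.0
    correct_acc = sum(error_free) / len(error_free) if error_free else 0.0
    total = error_acc + correct_acc
    f1 = 2 * error_acc * correct_acc / total if total else 0.0
    return {"error_acc": error_acc, "correct_acc": correct_acc, "f1": f1}


if __name__ == "__main__":
    # Toy record shaped like the preview above (the step texts are placeholders).
    record = {"steps": ["Step A", "Step B", "Step C"], "final_answer_correct": False, "label": 1}
    print(split_at_first_error(record))
    print(processbench_scores([1, -1, 2, -1], [1, -1, 0, 2]))
```

For the toy record with `label: 1`, `split_at_first_error` returns `["Step A"]` as the correct prefix and `"Step B"` as the earliest erroneous step. In the scoring example, one of three erroneous samples and the single error-free sample are predicted exactly, which gives an F1 of 0.5. Using exact index matching and a harmonic mean penalizes a judge that always predicts `-1` (or never does), since one of the two accuracies then collapses.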
Provided by:
maas
Created:
2024-12-19