Math-Shepherd
Source: ModelScope community. Updated 2026-01-06; indexed 2024-11-23.
Download link: https://modelscope.cn/datasets/AI-ModelScope/Math-Shepherd
# Dataset Card for Math-Shepherd
Project Page: [Math-Shepherd](https://rain-motion-6ec.notion.site/Math-Shepherd-A-Label-Free-Step-by-Step-Verifier-for-LLMs-in-Mathematical-Reasoning-41b6e73c860840e08697d347f8889bac#08e86c6d44c4452ba0b78c7aaea5f4f7)
Paper: https://arxiv.org/pdf/2312.08935.pdf
# Data Loading
```
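# Requires the Hugging Face `datasets` package.
from datasets import load_dataset

# Fetch the dataset from the Hugging Face Hub by its repository ID.
dataset = load_dataset("peiyi9979/Math-Shepherd")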
```
# Data Instance
Every instance consists of three data fields: `input`, `label`, and `task`.
1. "input": problem + step-by-step solution, e.g.,
```
If Buzz bought a pizza with 78 slices at a restaurant and then decided to share it with the waiter in the ratio of 5:8, with Buzz's ratio being 5, what's twenty less the number of slices of pizza that the waiter ate?
Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. ки
Step 2: The waiter ate 13 x 8 / 13 = <<13*8/13=6>>6 slices of the pizza. ки
Step 3: Buzz ate 78 - 6 = <<78-6=72>>72 slices of the pizza. ки
Step 4: The waiter ate 20 less than the number of slices that Buzz ate which is 72 - 20 = 52. ки
Step 5: The waiter ate 52 slices of the pizza. The answer is: 52 ки
```
2. "label": problem + step-by-step solution with an automatically generated label at the end of each step, e.g.,
```
If Buzz bought a pizza with 78 slices at a restaurant and then decided to share it with the waiter in the ratio of 5:8, with Buzz's ratio being 5, what's twenty less the number of slices of pizza that the waiter ate?
Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. +
Step 2: The waiter ate 13 x 8 / 13 = <<13*8/13=6>>6 slices of the pizza. -
Step 3: Buzz ate 78 - 6 = <<78-6=72>>72 slices of the pizza. -
Step 4: The waiter ate 20 less than the number of slices that Buzz ate which is 72 - 20 = 52. -
Step 5: The waiter ate 52 slices of the pizza. The answer is: 52 -
```
3. "task": `GSM8K` or `MATH`.
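The relationship between `input` and `label` can be made concrete with a short sketch. It assumes, based only on the example above, that `label` is identical to `input` except that each `ки` is replaced by `+` or `-`. The helper name `extract_step_labels` is hypothetical and not part of the dataset or any library.

```python
STEP_TAG = "ки"  # placeholder token that closes each step in `input`


def extract_step_labels(input_text: str, label_text: str) -> list:
    """Return the '+' / '-' label found at each step-tag position.

    Assumption: `label_text` equals `input_text` with every STEP_TAG
    replaced by a single '+' or '-' character.
    """
    pieces = input_text.split(STEP_TAG)
    labels = []
    pos = 0
    # Every piece except the last one is followed by a step tag.
    for piece in pieces[:-1]:
        if not label_text.startswith(piece, pos):
            raise ValueError("input and label diverge before a step tag")
        pos += len(piece)
        tag = label_text[pos:pos + 1]
        if tag not in ("+", "-"):
            raise ValueError(f"expected '+' or '-' at offset {pos}, got {tag!r}")
        labels.append(tag)
        pos += 1  # skip the one-character label
    return labels


# Steps 1, 2, and 4 of the example above (the problem text is omitted).
example_input = (
    "Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. ки\n"
    "Step 2: The waiter ate 13 x 8 / 13 = <<13*8/13=6>>6 slices of the pizza. ки\n"
    "Step 4: The waiter ate 20 less than the number of slices that Buzz ate "
    "which is 72 - 20 = 52. ки"
)
example_label = (
    "Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. +\n"
    "Step 2: The waiter ate 13 x 8 / 13 = <<13*8/13=6>>6 slices of the pizza. -\n"
    "Step 4: The waiter ate 20 less than the number of slices that Buzz ate "
    "which is 72 - 20 = 52. -"
)

step_labels = extract_step_labels(example_input, example_label)
print(step_labels)  # ['+', '-', '-']
```

Walking both strings in parallel, rather than scanning `label` for `+` and `-`, avoids mistaking arithmetic signs inside a step (such as `72 - 20`) for step labels.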
NOTE:
- `ки` is a special token that marks the position at which the step score is predicted.
- `+` marks a good step, i.e., one that can still lead to the correct answer.
- `-` marks a bad step.
- When training process reward models (PRMs), the loss is computed only at the `ки` positions.
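As a rough illustration of computing the loss only at the `ки` positions, the sketch below builds a per-token target list in which every other position carries an ignore value. Several things here are assumptions rather than the authors' method: whitespace splitting stands in for a real tokenizer, the binary encoding (1 for `+`, 0 for `-`) is one possible target format, and `build_step_targets` is a hypothetical helper. The value `-100` is used because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
IGNORE_INDEX = -100  # positions with this target are excluded from the loss
STEP_TAG = "ки"


def build_step_targets(input_text: str, label_text: str):
    """Return (tokens, targets) with targets set only at step-tag positions.

    Assumption: splitting on whitespace yields aligned token lists for
    `input_text` and `label_text`; a real pipeline would use the model's
    tokenizer instead.
    """
    in_tokens = input_text.split()
    label_tokens = label_text.split()
    if len(in_tokens) != len(label_tokens):
        raise ValueError("input and label do not align token by token")

    targets = []
    for tok, lab in zip(in_tokens, label_tokens):
        if tok == STEP_TAG:
            if lab not in ("+", "-"):
                raise ValueError(f"unexpected label {lab!r} at a step tag")
            targets.append(1 if lab == "+" else 0)  # supervised position
        else:
            targets.append(IGNORE_INDEX)  # ignored by the loss
    return in_tokens, targets


# Steps 1 and 4 of the example above (the problem text is omitted).
sample_input = (
    "Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. ки\n"
    "Step 4: The waiter ate 20 less than the number of slices that Buzz ate "
    "which is 72 - 20 = 52. ки"
)
sample_label = (
    "Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. +\n"
    "Step 4: The waiter ate 20 less than the number of slices that Buzz ate "
    "which is 72 - 20 = 52. -"
)

tokens, targets = build_step_targets(sample_input, sample_label)
supervised = [t for t in targets if t != IGNORE_INDEX]
print(supervised)  # [1, 0]
```

The minus sign in `72 - 20` is a standalone token in both fields, yet it receives the ignore value because supervised positions are located through `ки` in `input`, not through `+` or `-` in `label`.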
# Models
The step-wise PPO training was implemented with internal code that cannot be open-sourced; we hope for your understanding. To help everyone reproduce our results, we provide the checkpoints of the SFT, PRM, and RL models.
- Mistral-7b-sft: https://huggingface.co/peiyi9979/mistral-7b-sft
- Mistral-7b-prm: https://huggingface.co/peiyi9979/math-shepherd-mistral-7b-prm
- Mistral-7b-rl: https://huggingface.co/peiyi9979/math-shepherd-mistral-7b-rl
Provider: maas
Created: 2024-11-20



