peiyi9979/Math-Shepherd

Name: peiyi9979/Math-Shepherd
Creator: peiyi9979
Published: 2024-01-03 06:13:49
License: No description available

Hugging Face · Updated 2024-01-03 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/peiyi9979/Math-Shepherd


Resource description:

---
tags:
- prm
- synthesized data
---

# Dataset Card for Math-Shepherd

Project Page: [Math-Shepherd](https://rain-motion-6ec.notion.site/Math-Shepherd-A-Label-Free-Step-by-Step-Verifier-for-LLMs-in-Mathematical-Reasoning-41b6e73c860840e08697d347f8889bac#08e86c6d44c4452ba0b78c7aaea5f4f7)

Paper: https://arxiv.org/pdf/2312.08935.pdf

# Data Loading

```
from datasets import load_dataset
dataset = load_dataset("peiyi9979/Math-Shepherd")
```

# Data Instance

Every instance consists of three data fields: "input," "label," and "task".

1. "input": problem + step-by-step solution, e.g.,
```
If Buzz bought a pizza with 78 slices at a restaurant and then decided to share it with the waiter in the ratio of 5:8, with Buzz's ratio being 5, what's twenty less the number of slices of pizza that the waiter ate?
Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. ки
Step 2: The waiter ate 13 x 8 / 13 = <<13*8/13=6>>6 slices of the pizza. ки
Step 3: Buzz ate 78 - 6 = <<78-6=72>>72 slices of the pizza. ки
Step 4: The waiter ate 20 less than the number of slices that Buzz ate which is 72 - 20 = 52. ки
Step 5: The waiter ate 52 slices of the pizza. The answer is: 52 ки
```

2. "label": problem + step-by-step solution with automatic label, e.g.,
```
If Buzz bought a pizza with 78 slices at a restaurant and then decided to share it with the waiter in the ratio of 5:8, with Buzz's ratio being 5, what's twenty less the number of slices of pizza that the waiter ate?
Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. +
Step 2: The waiter ate 13 x 8 / 13 = <<13*8/13=6>>6 slices of the pizza. -
Step 3: Buzz ate 78 - 6 = <<78-6=72>>72 slices of the pizza. -
Step 4: The waiter ate 20 less than the number of slices that Buzz ate which is 72 - 20 = 52. -
Step 5: The waiter ate 52 slices of the pizza. The answer is: 52 -
```

3. "task": `GSM8K` or `MATH`.

NOTE:
"`ки`" serves as a unique token denoting the position for predicting the step score.
"`+`" signifies a good step, as it has the potential to lead towards the correct answer.
"`-`" denotes a bad step.
When we train PRMs, we only compute the loss of the positions of `ки`.

# Models:

We utilized internal code for step-wise PPO training, which cannot be open-sourced. We hope for your understanding. We provide the checkpoints of SFT, PRM, and RL models to help everyone reproduce our results.

- Mistral-7b-sft: https://huggingface.co/peiyi9979/mistral-7b-sft
- Mistral-7b-prm: https://huggingface.co/peiyi9979/math-shepherd-mistral-7b-prm
- Mistral-7b-rl: https://huggingface.co/peiyi9979/math-shepherd-mistral-7b-rl
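The "input"/"label" convention described under Data Instance can be turned into per-step labels with a small helper. The following is a minimal sketch, not part of the dataset card: it assumes the two fields differ only at the tag positions (each `ки` in "input" replaced by a single `+` or `-` in "label"), as in the example above. The function name `step_labels` and the shortened example strings are hypothetical.

```python
STEP_TAG = "ки"  # step-score position marker used in the "input" field


def step_labels(input_text: str, label_text: str) -> list[tuple[str, str]]:
    """Pair each step of `input_text` with its "+" or "-" mark from `label_text`.

    Assumption: `label_text` equals `input_text` with every STEP_TAG
    replaced by exactly one character, "+" or "-".
    """
    pairs = []
    start = 0  # where the current step begins in input_text
    shift = 0  # how many characters shorter label_text is so far
    while True:
        pos = input_text.find(STEP_TAG, start)
        if pos == -1:
            break
        mark = label_text[pos - shift]
        if mark not in ("+", "-"):
            raise ValueError(f"expected '+' or '-' at offset {pos - shift}, got {mark!r}")
        # The first segment also contains the problem statement.
        pairs.append((input_text[start:pos].strip(), mark))
        start = pos + len(STEP_TAG)
        shift += len(STEP_TAG) - 1
    return pairs


# Shortened, hypothetical example in the same shape as a dataset row.
example_input = "Q? Step 1: 5+8 = 13. ки Step 2: 78 - 6 = 72. ки"
example_label = "Q? Step 1: 5+8 = 13. + Step 2: 78 - 6 = 72. -"
example_pairs = step_labels(example_input, example_label)
```

Reading the mark by position rather than searching for `+` or `-` avoids confusing labels with the arithmetic operators that appear inside the steps themselves.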


Provider:

peiyi9979

Original information summary

Dataset card: Math-Shepherd

Data loading

    from datasets import load_dataset
    dataset = load_dataset("peiyi9979/Math-Shepherd")

Data instance

Each instance contains three data fields: "input", "label", and "task".

"input": problem + step-by-step solution, for example:

If Buzz bought a pizza with 78 slices at a restaurant and then decided to share it with the waiter in the ratio of 5:8, with Buzz's ratio being 5, what's twenty less the number of slices of pizza that the waiter ate?

Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. ки

Step 2: The waiter ate 13 x 8 / 13 = <<13*8/13=6>>6 slices of the pizza. ки

Step 3: Buzz ate 78 - 6 = <<78-6=72>>72 slices of the pizza. ки

Step 4: The waiter ate 20 less than the number of slices that Buzz ate which is 72 - 20 = 52. ки

Step 5: The waiter ate 52 slices of the pizza. The answer is: 52 ки

"label": problem + step-by-step solution with automatic labels, for example:

Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. +

Step 2: The waiter ate 13 x 8 / 13 = <<13*8/13=6>>6 slices of the pizza. -

Step 3: Buzz ate 78 - 6 = <<78-6=72>>72 slices of the pizza. -

Step 4: The waiter ate 20 less than the number of slices that Buzz ate which is 72 - 20 = 52. -

Step 5: The waiter ate 52 slices of the pizza. The answer is: 52 -

"task": GSM8K or MATH.
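Because every instance carries a "task" field, a quick tally by source is straightforward. The sketch below is illustrative only: `count_by_task` is a hypothetical helper, and the rows are hand-written placeholders rather than real dataset entries.

```python
from collections import Counter


def count_by_task(rows):
    """Count instances per value of the "task" field.

    `rows` is any iterable of dict-like rows that have a "task" key.
    """
    return Counter(row["task"] for row in rows)


# Placeholder rows shaped like dataset instances (contents are made up).
sample_rows = [
    {"input": "problem A", "label": "problem A", "task": "GSM8K"},
    {"input": "problem B", "label": "problem B", "task": "MATH"},
    {"input": "problem C", "label": "problem C", "task": "GSM8K"},
]
task_counts = count_by_task(sample_rows)
```

Iterating over a loaded `datasets` split yields dict rows, so the same helper should work on it directly. Which split names the dataset exposes is not stated here and would need to be checked after loading.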

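Notes:

"ки" is a unique token that marks the position at which the step score is predicted.
"+" marks a good step, one that has the potential to lead to the correct answer.
"-" marks a bad step.
When training PRMs, the loss is computed only at the ки positions.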
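The last note, computing the loss only at the `ки` positions, can be illustrated with a target-masking sketch. This is an assumption about one common way to do it, not the authors' training code, which is not public. All token ids below (`tag_id`, `good_id`, `bad_id`) are hypothetical placeholders, and whether targets must be shifted by one position depends on the training loop.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_prm_targets(token_ids, step_marks, tag_id, good_id, bad_id):
    """Build per-token targets that are ignored everywhere except step tags.

    token_ids:  tokenized "input" text (list of ints)
    step_marks: one "+" or "-" per step tag, in order
    tag_id:     token id of the step tag (hypothetical value in the example)
    good_id / bad_id: target ids for "+" and "-" (hypothetical values)
    """
    tag_positions = [i for i, tok in enumerate(token_ids) if tok == tag_id]
    if len(tag_positions) != len(step_marks):
        raise ValueError("number of step tags and step marks differ")
    targets = [IGNORE_INDEX] * len(token_ids)
    for pos, mark in zip(tag_positions, step_marks):
        targets[pos] = good_id if mark == "+" else bad_id
    return targets


# Toy sequence: 99 stands in for the step tag token.
targets = build_prm_targets(
    [11, 12, 99, 13, 14, 99], ["+", "-"], tag_id=99, good_id=1, bad_id=0
)
print(targets)  # [-100, -100, 1, -100, -100, 0]
```

Positions holding `IGNORE_INDEX` contribute nothing to a cross-entropy loss that uses that ignore index, so only the step-tag positions are trained on.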

Collected summary

Dataset introduction

Background and challenges

Background overview

Math-Shepherd is a dataset for training models on mathematical reasoning. It contains 444,655 rows; each entry holds a problem with its step-by-step solution, the same solution annotated with automatic per-step quality labels, and the source task (GSM8K or MATH). It is intended to improve model performance through step-wise verification and reinforcement learning.

The summary above was collected and generated by 遇见数据集.
