
peiyi9979/Math-Shepherd

Source: Hugging Face | Updated: 2024-01-03 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/peiyi9979/Math-Shepherd
Resource summary:
---
tags:
- prm
- synthesized data
---

# Dataset Card for Math-Shepherd

Project Page: [Math-Shepherd](https://rain-motion-6ec.notion.site/Math-Shepherd-A-Label-Free-Step-by-Step-Verifier-for-LLMs-in-Mathematical-Reasoning-41b6e73c860840e08697d347f8889bac#08e86c6d44c4452ba0b78c7aaea5f4f7)

Paper: https://arxiv.org/pdf/2312.08935.pdf

# Data Loading

```python
from datasets import load_dataset

# Download Math-Shepherd from the Hugging Face Hub
dataset = load_dataset("peiyi9979/Math-Shepherd")
```

# Data Instance

Every instance consists of three data fields: "input", "label", and "task".

1. "input": problem + step-by-step solution, e.g.,

```
If Buzz bought a pizza with 78 slices at a restaurant and then decided to share it with the waiter in the ratio of 5:8, with Buzz's ratio being 5, what's twenty less the number of slices of pizza that the waiter ate?
Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. ки
Step 2: The waiter ate 13 x 8 / 13 = <<13*8/13=6>>6 slices of the pizza. ки
Step 3: Buzz ate 78 - 6 = <<78-6=72>>72 slices of the pizza. ки
Step 4: The waiter ate 20 less than the number of slices that Buzz ate which is 72 - 20 = 52. ки
Step 5: The waiter ate 52 slices of the pizza. The answer is: 52 ки
```

2. "label": problem + step-by-step solution with automatic labels, e.g.,

```
If Buzz bought a pizza with 78 slices at a restaurant and then decided to share it with the waiter in the ratio of 5:8, with Buzz's ratio being 5, what's twenty less the number of slices of pizza that the waiter ate?
Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. +
Step 2: The waiter ate 13 x 8 / 13 = <<13*8/13=6>>6 slices of the pizza. -
Step 3: Buzz ate 78 - 6 = <<78-6=72>>72 slices of the pizza. -
Step 4: The waiter ate 20 less than the number of slices that Buzz ate which is 72 - 20 = 52. -
Step 5: The waiter ate 52 slices of the pizza. The answer is: 52 -
```

3. "task": `GSM8K` or `MATH`.

NOTE:

- "`ки`" serves as a unique token denoting the position for predicting the step score.
- "`+`" signifies a good step, as it has the potential to lead towards the correct answer.
- "`-`" denotes a bad step.
- When we train process reward models (PRMs), we only compute the loss at the `ки` positions.

# Models

We used internal code for step-wise PPO training, which cannot be open-sourced. We hope for your understanding. We provide the SFT, PRM, and RL model checkpoints to help everyone reproduce our results.

- Mistral-7b-sft: https://huggingface.co/peiyi9979/mistral-7b-sft
- Mistral-7b-prm: https://huggingface.co/peiyi9979/math-shepherd-mistral-7b-prm
- Mistral-7b-rl: https://huggingface.co/peiyi9979/math-shepherd-mistral-7b-rl
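The automatic labels in the Buzz example can be checked by hand. The waiter's share is 8 of 13 equal parts of the 78 slices, whereas Step 2 takes 8/13 of the ratio total 13 instead of the 78 slices:

```latex
\begin{aligned}
\text{slices per part} &= 78 / 13 = 6, \\
\text{waiter's slices} &= 8 \times 6 = 48 \quad (\text{Step 2 computes } 13 \times 8 / 13 = 6 \text{ instead}), \\
\text{answer} &= 48 - 20 = 28 \quad (\text{the solution ends with } 52).
\end{aligned}
```

Step 1 (a ratio total of 13) is correct and is marked `+`. Steps 2 to 5 all build on the wrong share of 6 slices (Step 4 also misreads the question as "20 fewer than Buzz's share"), which is consistent with their `-` labels.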

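The NOTE in the card ties the two text fields together: every `ки` in `input` lines up with one `+` or `-` in `label`, and the PRM loss is computed only at those positions. Below is a minimal sketch of that alignment, assuming each step in `label` starts with `Step k:` and ends with a space plus its mark. `parse_label_field`, `build_targets`, and the 1/0 class ids are hypothetical names chosen for illustration; the whitespace split is a toy stand-in for the model's real tokenizer, and `-100` follows PyTorch's usual `ignore_index` convention for skipping positions in a cross-entropy loss.

```python
import re
from typing import List, Tuple

STEP_TAG = "ки"       # marks where a step score is predicted (per the dataset card)
IGNORE_INDEX = -100   # positions the loss should skip (PyTorch cross-entropy convention)

# Assumption: each step in `label` starts with "Step k:" and ends with " +" or " -",
# followed by the next "Step k:" or by the end of the string.
_STEP_WITH_MARK = re.compile(r"(Step \d+:.*?) ([+-])(?=\s*Step \d+:|\s*$)", re.S)


def parse_label_field(label: str) -> List[Tuple[str, bool]]:
    """Hypothetical helper: split a `label` string into (step_text, is_good) pairs."""
    return [(text.strip(), mark == "+") for text, mark in _STEP_WITH_MARK.findall(label)]


def build_targets(input_text: str, label_text: str,
                  good_id: int = 1, bad_id: int = 0) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: one target per token, supervised only at STEP_TAG tokens.

    Uses a toy whitespace tokenizer; a real PRM tokenizer may split the step
    tag differently, so positions would then be located by token id instead.
    """
    tokens = input_text.split()
    step_labels = [is_good for _, is_good in parse_label_field(label_text)]
    n_tags = tokens.count(STEP_TAG)
    if n_tags != len(step_labels):
        raise ValueError(f"{n_tags} step tags but {len(step_labels)} step labels")

    targets: List[int] = []
    next_step = 0
    for tok in tokens:
        if tok == STEP_TAG:
            # The k-th step tag receives the k-th step label
            targets.append(good_id if step_labels[next_step] else bad_id)
            next_step += 1
        else:
            targets.append(IGNORE_INDEX)  # no loss on ordinary tokens
    return tokens, targets


# Usage on a shortened two-step example
tokens, targets = build_targets("Q? Step 1: 5+8 = 13. ки Step 2: 78 - 6 = 72. ки",
                                "Q? Step 1: 5+8 = 13. + Step 2: 78 - 6 = 72. -")
print([t for t in targets if t != IGNORE_INDEX])  # [1, 0]
```

The lookahead in the pattern keeps minus signs inside arithmetic such as `78 - 6` from being mistaken for step marks, because a mark must be followed by the next `Step k:` or by the end of the text.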
Provided by:
peiyi9979
Collected summary
Dataset introduction
Background and challenges
Background overview
Math-Shepherd is a dataset for training models in mathematical reasoning, featuring problems with step-by-step solutions and automatic step correctness labels. It includes 444,655 rows; each entry contains a problem, its step-by-step solution, and a `+` or `-` label for every step. The `task` field records whether a problem comes from GSM8K or MATH. The dataset aims to improve model performance through step-wise verification and reinforcement learning.
The above content was collected and summarized by 遇见数据集.
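The summary above mentions automatic step correctness labels and step-wise verification. The sketch below illustrates both under loudly stated assumptions. (1) A "hard" labeling rule consistent with the card's definition of `+`: a step is `+` if at least one of `n_samples` sampled continuations from it reaches the reference answer, with a hypothetical `complete` callable standing in for an LLM sampler. (2) Verification that scores each step as the softmax probability of `+` over a `+`/`-` logit pair read at its `ки` position, then scores a whole solution by its weakest step. The function names, `n_samples=8`, exact-string answer matching, the logit input format, and min aggregation are illustrative assumptions, not the authors' released pipeline.

```python
import math
from typing import Callable, List, Sequence, Tuple

# --- (1) Automatic step labels (hypothetical "hard" rule) -------------------
# Hypothetical completer: given the problem and a prefix of solution steps, it
# returns the final answer of ONE sampled continuation (an LLM in practice).
Completer = Callable[[str, List[str]], str]


def label_steps(problem: str, steps: List[str], gold_answer: str,
                complete: Completer, n_samples: int = 8) -> List[str]:
    """'+' for step k if any of n_samples continuations from steps[:k+1] reaches gold_answer."""
    labels = []
    for k in range(len(steps)):
        prefix = steps[: k + 1]
        reached = any(complete(problem, prefix) == gold_answer for _ in range(n_samples))
        labels.append("+" if reached else "-")
    return labels


# --- (2) Step-wise verification (assumed min aggregation) -------------------
def step_scores(logits: Sequence[Tuple[float, float]]) -> List[float]:
    """P('+') per step from (logit_plus, logit_minus) pairs read at each ки position."""
    scores = []
    for plus, minus in logits:
        m = max(plus, minus)  # subtract the max for numerical stability
        e_plus, e_minus = math.exp(plus - m), math.exp(minus - m)
        scores.append(e_plus / (e_plus + e_minus))
    return scores


def best_of_n(candidates: Sequence[Sequence[Tuple[float, float]]]) -> int:
    """Index of the candidate whose weakest step (min score) is strongest."""
    return max(range(len(candidates)), key=lambda i: min(step_scores(candidates[i])))


# Usage with a deterministic toy completer: continuations reach the correct answer
# (78 * 8/13 - 20 = 28) only while no step has committed to the wrong share of 6.
def toy_completer(problem: str, prefix: List[str]) -> str:
    return "52" if any("=6>>6" in step for step in prefix) else "28"


steps = [
    "Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13.",
    "Step 2: The waiter ate 13 x 8 / 13 = <<13*8/13=6>>6 slices of the pizza.",
    "Step 3: Buzz ate 78 - 6 = <<78-6=72>>72 slices of the pizza.",
]
print(label_steps("Buzz pizza problem", steps, "28", toy_completer))  # ['+', '-', '-']
print(best_of_n([[(3.0, 0.0), (-2.0, 0.0)], [(1.0, 0.0), (1.0, 0.0)]]))  # 1
```

Min aggregation prefers the second candidate even though the first has the single most confident step, because one doubtful step is enough to sink a solution. A "soft" variant of rule (1) would record the fraction of successful continuations instead of a binary mark; the released dataset itself contains only the binary `+`/`-` marks.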