gsm8k-fix

Name: gsm8k-fix
Creator: maas
Published: 2025-10-09 16:23:35
License: No description available

ModelScope community, updated 2025-10-09, indexed 2025-02-22

Download link:

https://modelscope.cn/datasets/hkust-nlp/gsm8k-fix


Resource description:

# GSM8K (Fixed)

Some **erroneous labels** exist in the GSM8K dataset. This dataset is a corrected version of https://github.com/openai/grade-school-math/blob/master/grade_school_math/data/train.jsonl, produced with the code appended at the end.

The errors were located by examining problems with **unreasonably low pass rates from the strong DeepSeekMath-7B-RL** model. The list is hoped to be exhaustive.

This dataset is used by [the **🎯DART-Math** project](https://github.com/hkust-nlp/dart-math) to synthesize data.

> [!WARNING]
> ⚠️ Only the **training** set has been fixed so far.

```python
# `collected_dps` is assumed to be the list of data-point dicts built from train.jsonl;
# it is defined elsewhere in the authors' pipeline and is not shown here.
for dp in collected_dps:
    if dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:167":
        # Janet filmed a new movie that is 60% longer than her previous 2-hour long movie. Her previous movie cost $50 per minute to film, and the newest movie cost twice as much per minute to film as the previous movie. What was the total amount of money required to film Janet's entire newest film?
        dp["resp"] = (
            "The first movie was 2*60=120 minutes\nSo this movie is 120*.6=72 minutes longer\nSo this movie is 192 minutes\nIt also cost 50*2=$100 per minute to film\nSo it cost 192*100=$19200"
        )
        dp["ans"] = dp["gt_ans"] = "19200"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:474":
        # A club with 30 members ordered fruit juices. Two-fifths of them ordered lemon juice. One-third of the remaining members ordered mango juice, and the rest ordered orange juice. How many members ordered orange juice?
        dp["resp"] = (
            "30 x 2/5 = 12 members ordered lemon juice.\nSo, 30 - 12 = 18 members did not order lemon juice.\nSince 1/3 of the remaining ordered mango juice, then 18 x 1/3 = 6 members ordered mango juice.\nTherefore, 18 - 6 = 12 members ordered orange juice."
        )
        dp["ans"] = dp["gt_ans"] = "12"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:1081":
        # John has 2 hives of bees. One of the hives has 1000 bees and produces 500 liters of honey. The second has 20% fewer bees but each bee produces 40% more honey. How much honey does he produce?
        dp["resp"] = (
            "The second hive has 20/100*1000 = 200 fewer bees.\nThis translates to 1000-200 = 800 bees.\nEach bee in the first hive produces 500/1000 = 0.5 liters\nThe second hive has bees each producing 1.4*0.5 = 0.7 liters\nThe total amount of honey produces by the bees in the second hive is 0.7*800 = 560\nThe total honey produced is 500+560 = 1060 liters"
        )
        dp["ans"] = dp["gt_ans"] = "1060"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:2338":
        # A bottle can hold 2 cups of water. How many cups of water are needed to fill up 10 whole bottles and 5 half-capacity bottles?
        dp["resp"] = r"""For 10 whole bottles, you will need 10*2=20 cups of water.
With 5 half-capacity bottles, it requires 5*1=5 cups of water.
In total, you need to have 20+5=25 cups of water to fill them all."""
        dp["ans"] = dp["gt_ans"] = "25"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:2620":
        # Ryan started with 36 tokens at the arcade. Ryan wasted a third of his tokens on Pac-Man, a fourth of his tokens on Candy Crush, and 7 on Ski-ball. Then, his parents bought him seven times as many tokens as he spent on Ski-ball. How many tokens did Ryan end up with?
        dp["resp"] = r"""Ryan used 36/3 = 12 tokens on Pac-Man.
Ryan used 36/4 = 9 tokens on Candy Crush.
Ryan used a total of 12+9+7 = 28 tokens on all three games.
Ryan had 36-28 = 8 tokens left.
Ryan’s parents bought him 7*7 = 49 more tokens.
Ryan had 8+49 = 57 tokens after his parents bought him some."""
        dp["ans"] = dp["gt_ans"] = "57"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:2770":
        # Robert and Teddy are planning to buy snacks for their friends. Robert orders five boxes of pizza at $10 each box and ten cans of soft drinks at $2 each. Teddy buys six hamburgers at $3 each and an additional ten cans of soft drinks. How much do they spend in all?
        dp["ans"] = dp["gt_ans"] = "108"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:3263":
        # Andy gets a cavity for every 4 candy canes he eats. He gets 2 candy canes from his parents and 3 candy canes each from 4 teachers. Then he uses his allowance to buy 1/7 as many candy canes as he was given. How many cavities does he get from eating all his candy canes?
        dp["ans"] = dp["gt_ans"] = "4"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:3529":
        # During one game, a total of 50 people attended a baseball team’s games. Forty percent and thirty-four percent of the audiences are supporters of the first and second teams, respectively. How many people attended the game did not support either of the teams?
        dp["resp"] = r"""50 x 40/100 = 20 people support the first team.
50 x 34/100 = 17 people support the second team.
So, a total of 20 + 17 = 37 people supported the first and second teams.
Thus, 50 - 37 = 13 people did not support either team."""
        dp["ans"] = dp["gt_ans"] = "13"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:3995":
        # Kimiko is retiling her kitchen floor. Each tile is 6 square inches. If her kitchen is 48 inches by 72 inches, how many tiles does Kimiko need to buy?
        # "First figure out how many tiles are in one row by dividing the kitchen's width by each tile's width: 48 inches / 6 inches/tile = 8 tiles in one row\nThen figure out how many rows of tiles there are by dividing the kitchen's height by each tile's height: 72 inches / 6 inches/tile = 12 rows of tiles\nThen multiply the number of rows by the number of tiles per row to find the total number of tiles Kimiko needs to buy: 8 tiles/row * 12 rows = 96 tiles"
        dp["resp"] = r"""To find out how many tiles Kimiko needs, we first need to calculate the total area of her kitchen floor in square inches.
The area of the kitchen floor is given by the length multiplied by the width:
Area = Length × Width
So, for Kimiko's kitchen:
Area = 48 inches × 72 inches
Now, let's calculate:
Area = 48 inches × 72 inches = 3456 square inches
Now, we need to find out how many 6-square-inch tiles can fit into this total area. We can do this by dividing the total area by the area of each tile:
Number of tiles = Total area / Area of each tile
Number of tiles = 3456 square inches / 6 square inches per tile
Number of tiles ≈ 576
So, Kimiko needs to buy approximately 576 tiles to retile her kitchen floor."""
        dp["ans"] = dp["gt_ans"] = "576"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:4099":
        # Big Dig Mining Company mines three different types of ore: copper, iron, and nickel. Across all their mines, 10% of their output is nickel, 60% is iron, and the rest is copper. They mine 720 tons of nickel a day. How many tons of copper does Big Dig Mining Company mine daily?
        # 'Let R be the total ore output of the company.\nOf Big Dig’s output, 100 - 10 - 60 = 30% is nickel.\nSince 720 is 60% of 100% of their output, 720 / R = 60 / 100.\nThus, Big Dig mines R = 100 * 720 / 60 = 1200 tons of ore daily.\nTherefore, Big Dig mines 1200 * 30 / 100 = 360 tons of copper daily.'
        dp["resp"] = r"""Let R be the total ore output of the company.
Of Big Dig’s output, 100 - 10 - 60 = 30% is copper.
Since 720 is 10% of 100% of their output, 720 / R = 10 / 100.
Thus, Big Dig mines R = 100 * 720 / 10 = 7200 tons of ore daily.
Therefore, Big Dig mines 7200 * 30 / 100 = 2160 tons of copper daily."""
        dp["ans"] = dp["gt_ans"] = "2160"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:4105":
        # Andrew eats 14 donuts on Monday, and half as many on Tuesday. On Wednesday Andrew eats 4 times as many as he did on Monday. How many donuts did Andrew eat total in the three days?
        # 'Monday:14\nTuesday:14/2=7\nWednesday:4(7)=28\nTotal:14+7+28=49 donuts'
        dp["resp"] = r"""Monday:14
Tuesday:14/2=7
Wednesday:4*14=56
Total:14+7+56=77 donuts"""
        dp["ans"] = dp["gt_ans"] = "77"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:4796":
        # In a yard, the number of tanks is five times the number of trucks. If there are 20 trucks in the yard, calculate the total number of tanks and trucks in the yard.
        # 'There are 5*20 = 100 tanks in the yard.\nAltogether, there are 100+20 = 120 trucks and tanks in the yard.'
        dp["ans"] = dp["gt_ans"] = "120"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:5598":
        # Mary just arrived at the beach. She has 4 times as many towels as Frances does. The total weight of their towels is 60 pounds. If Mary has 24 towels, how much do Frances's towels weigh in ounces?
        # 'Frances has 24/4 = 6 towels.\nThey have 24+6=30 towels.\nEach towel weighs 60/30=2 pounds.\nFrances’s towels weigh a total of 2*4 = 8 pounds\nFrances’s towels weigh a total of 8*16 = 128 ounces'
        dp["resp"] = r"""Frances has 24/4 = 6 towels.
They have 24+6=30 towels.
Each towel weighs 60/30=2 pounds.
Frances’s towels weigh a total of 2*6 = 12 pounds
Frances’s towels weigh a total of 12*16 = 192 ounces"""
        dp["ans"] = dp["gt_ans"] = "192"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:6768":
        # Kate's hair is half as long as Emily's hair. Emily’s hair is 6 inches longer than Logan's hair. If Logan hair is 20 inches, how many inches is Kate’s hair?
        # "Emily’s hair is 20-6 = 14 inches long.\nKate's hair 14/2= 7 inches long."
        dp["resp"] = r"""Emily’s hair is 20+6 = 26 inches long.
Kate's hair 26/2= 13 inches long."""
        dp["ans"] = dp["gt_ans"] = "13"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:7182":
        # If you double a number and add 5 to the result, then that's 20 more than half of the original number. What's the original number?
        # 'Let x be the original number.\n2*x+5=20+x/2\n2*x-x/2=15\n4*x-x=30\n3*x=30\nx=10'
        dp["ans"] = dp["gt_ans"] = "10"
    elif dp["id"] == "grade-school-math/grade_school_math/data/train.jsonl:7401":
        # Mr. Finnegan has 3 tanks with a capacity of 7000 gallons, 5000 gallons, and 3000 gallons, respectively. If he fills the first tank up to 3/4 full, the second tank with water up to 4/5 of its capacity, and the third tank up to half of its capacity, how many gallons in total are in the tanks?
        # 'The capacity of the first tank is 7000 gallons, and if it is filled up to 3/4 full, it carries 3/4*7000 = 5250 gallons.\nWhen the second tank is filled up to 4/5 of its capacity, it carries 4/5*5000 = 4000 gallons.\nThe total amount of water in the first two tanks is 5250+4000 = 9250 gallons.\nIf Mr. Finnegan fills the third tank with water up to half its capacity, the tank fills up with 1/2*3000 = 1500 gallons.\nIn total, the three tanks have 9350+1500 = 10850 gallons of water.'
        dp["resp"] = r"""The capacity of the first tank is 7000 gallons, and if it is filled up to 3/4 full, it carries 3/4*7000 = 5250 gallons.
When the second tank is filled up to 4/5 of its capacity, it carries 4/5*5000 = 4000 gallons.
The total amount of water in the first two tanks is 5250+4000 = 9250 gallons.
If Mr. Finnegan fills the third tank with water up to half its capacity, the tank fills up with 1/2*3000 = 1500 gallons.
In total, the three tanks have 9250+1500 = 10750 gallons of water."""
        dp["ans"] = dp["gt_ans"] = "10750"
```
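As a sanity check on the corrected labels, the arithmetic behind each of the 16 patched problems can be recomputed directly from the problem statements. The sketch below is not part of the authors' code: the `checks` table and its expressions are our own restatement of each problem (for instance, it assumes Teddy's soft drinks also cost $2 each), and `Fraction` is used only to avoid floating-point rounding.

```python
from fractions import Fraction as F

# (line index from the patch id, value recomputed from the problem text, corrected label)
# The expressions are an illustrative restatement, not taken from the authors' pipeline.
checks = [
    (167, (120 + 120 * F(6, 10)) * (50 * 2), 19200),
    (474, 30 - 30 * F(2, 5) - (30 - 30 * F(2, 5)) * F(1, 3), 12),
    (1081, 500 + (1000 * F(8, 10)) * (F(500, 1000) * F(14, 10)), 1060),
    (2338, 10 * 2 + 5 * 1, 25),
    (2620, 36 - (36 // 3 + 36 // 4 + 7) + 7 * 7, 57),
    (2770, (5 * 10 + 10 * 2) + (6 * 3 + 10 * 2), 108),
    (3263, (2 + 3 * 4) * (1 + F(1, 7)) / 4, 4),
    (3529, 50 - 50 * F(40, 100) - 50 * F(34, 100), 13),
    (3995, 48 * 72 // 6, 576),
    (4099, F(720) / F(10, 100) * F(30, 100), 2160),
    (4105, 14 + 14 // 2 + 4 * 14, 77),
    (4796, 5 * 20 + 20, 120),
    (5598, (24 // 4) * F(60, 24 + 24 // 4) * 16, 192),
    (6768, (20 + 6) // 2, 13),
    (7182, F(20 - 5) / (2 - F(1, 2)), 10),
    (7401, 7000 * F(3, 4) + 5000 * F(4, 5) + 3000 * F(1, 2), 10750),
]

# Collect any problem whose recomputed value disagrees with the corrected label.
mismatches = [(idx, value, label) for idx, value, label in checks if value != label]
print(f"{len(checks)} labels checked, {len(mismatches)} mismatches")
```

An empty `mismatches` list means every corrected `ans` value agrees with an independent recomputation; it says nothing about labels outside the patch.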

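The README says the errors were found by looking at problems where a strong model had an unreasonably low pass rate. A minimal sketch of that kind of screening is shown below. The function names, the exact-string answer comparison, the 0.1 threshold, and the toy samples are all assumptions made for illustration; the authors' actual procedure is not given in the source.

```python
from collections import defaultdict


def pass_rates(samples):
    """Return {problem_id: fraction of sampled answers that match the label}.

    `samples` is an iterable of (problem_id, predicted_answer, label) string triples.
    """
    total = defaultdict(int)
    correct = defaultdict(int)
    for pid, pred, label in samples:
        total[pid] += 1
        if pred.strip() == label.strip():
            correct[pid] += 1
    return {pid: correct[pid] / total[pid] for pid in total}


def flag_suspects(rates, threshold=0.1):
    """Problems whose pass rate is at or below `threshold`, queued for manual review."""
    return sorted(pid for pid, rate in rates.items() if rate <= threshold)


# Toy data (made up): the model keeps answering "77" while the original label says "49",
# mirroring the train.jsonl:4105 case in the patch above.
samples = [
    ("train.jsonl:4105", "77", "49"),
    ("train.jsonl:4105", "77", "49"),
    ("train.jsonl:4105", "77", "49"),
    ("toy-problem", "72", "72"),
    ("toy-problem", "70", "72"),
]
rates = pass_rates(samples)
suspects = flag_suspects(rates)
print(suspects)
```

A low pass rate only marks a problem as worth inspecting. A genuinely hard problem will also score low, so each flagged item still needs a human to decide whether the label or the model is wrong.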

Provider:

maas

Created:

2025-02-17

Collection summary

Dataset introduction