
mathfish

ModelScope community · Updated 2025-07-16 · Indexed 2025-05-31
Download link:
https://modelscope.cn/datasets/allenai/mathfish
# MathFish

This dataset is introduced by "[Evaluating Language Model Math Reasoning via Grounding in Educational Curricula](https://arxiv.org/abs/2408.04226)" and includes problems drawn from two open educational resources (OER): Illustrative Mathematics and Fishtank Learning. Problems are labeled with *mathematical standards*, which are the K-12 skills and concepts that problems enable students to learn. These standards are defined and organized by the Common Core State Standards.

Additional components of MathFish can be found at:

- [allenai/achieve-the-core](https://huggingface.co/datasets/allenai/achieve-the-core): Common Core mathematical standards and their descriptions
- [allenai/mathfish_tasks](https://huggingface.co/datasets/allenai/mathfish_tasks): MathFish's dev set problems inserted into verification and tagging prompts for language models

Code to support MathFish can be found in this [GitHub repository](https://github.com/allenai/mathfish/tree/main).

## Dataset Details

### Dataset Description

Common Core State Standards (CCSS) offer fine-grained and comprehensive coverage of K-12 math skills/concepts. We scrape labeled problems from two reputable OER that span a wide range of grade levels and standards: [Illustrative Mathematics](https://illustrativemathematics.org/) and [Fishtank Learning](https://fishtanklearning.org/). Each problem is a segment of these materials demarcated by standards labels, and a problem may be labeled with multiple standards.

Number of problems: 4356 in `dev.jsonl`, 4355 in `test.jsonl`, 13065 in `train.jsonl`. In total, 21776 K-12 math problems.

Number of images: 1848 in `fl_problem`, 11736 in `im_lesson`, 27 in `im_modelingprompt`, 3497 in `im_practice`, 860 in `im_task`. In total, 17968 math images.

- **Curated by:** Lucy Li, Tal August, Rose E Wang, Luca Soldaini, Courtney Allison, Kyle Lo
- **Funded by:** The Gates Foundation
- **Language(s) (NLP):** English
- **License:** ODC-By 1.0

## Uses

### Direct Use

This dataset was originally created to evaluate models' abilities to identify math skills and concepts using publisher-labeled data pulled from curricular websites. This data may support investigations into the use of language models to support K-12 education.

Illustrative Mathematics is licensed as [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/), while the Fishtank Learning component is licensed under Creative Commons [BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). Both sources are intended to be OER, which is defined as teaching, learning, and research materials that provide users free and perpetual permission to "retain, reuse, revise, remix, and redistribute" for educational purposes.

### Out-of-Scope Use

Note that Fishtank Learning's original license prohibits commercial use.

## Dataset Structure

Each `*.jsonl` file contains one problem or activity per line:

```
{
    id: '', # this is global
    text: 'string representing activity or problem',
    metadata: { source id, unit, lesson, other location data, url if possible, html version }, # this is source-specific
    acquisition_date: '', # YYYY-MM-DD
    elements: { identifier : name of image file or html of table }, # table, img, figure interwoven with text
    standards: [list of (relation, standard)], # relation could be addressing, alignment, building towards, etc
    source: '',
}
```

Note: Among standard relation types, `Addressing` == `Alignment`, and we evaluate on these in our paper. Future work may investigate other types of relations between problems and math skills/concepts. Not all problems in each file contain standards.

Images are in the `images` folder, in zipped files named after image filenames' prefixes: `fl_problem`, `im_lesson`, `im_modelingprompt`, `im_practice`, `im_task`.

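As a rough illustration of the record layout above, the following Python sketch parses JSONL lines and keeps only the standards whose relation is `Addressing` or `Alignment`, which the card treats as equivalent. It assumes that each entry in `standards` is serialized as a two-item list `[relation, standard]` and that relation names are capitalized as in the note above; neither detail is confirmed by the card. The helper names, the sample records, and the standard codes in them are made up for the example.

```python
import json
from typing import Iterable, Iterator

# Relation names the card treats as equivalent (casing is an assumption).
ALIGNED_RELATIONS = {"Addressing", "Alignment"}


def iter_problems(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def aligned_standards(problem: dict) -> list[str]:
    """Return standards tagged Addressing/Alignment; empty if there are none."""
    return [
        standard
        # Assumes each entry is a [relation, standard] pair.
        for relation, standard in problem.get("standards", [])
        if relation in ALIGNED_RELATIONS
    ]


# Hypothetical records standing in for lines of a file such as dev.jsonl.
sample_lines = [
    json.dumps({
        "id": "example-1",
        "text": "placeholder problem text",
        "standards": [["Addressing", "4.NF.A.1"], ["Building Towards", "5.NF.A.1"]],
        "source": "example",
    }),
    json.dumps({
        "id": "example-2",
        "text": "placeholder problem text",
        "standards": [],
        "source": "example",
    }),
]

labels = {p["id"]: aligned_standards(p) for p in iter_problems(sample_lines)}
```

With a real split, the same helpers could be fed an open file handle, for example `iter_problems(open("dev.jsonl", encoding="utf-8"))`. Problems without standards simply map to an empty list, which matches the note that not every problem is labeled.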
## Dataset Creation

### Curation Rationale

Math standards are informed by human learning progressions, and commonly used in real-world reviews of math content. In education, materials have focused alignment with a standard if they enable students to learn the full intent of concepts/skills described by that standard. Identifying alignment can thus inform educators whether a set of materials adequately targets core learning goals for students.

#### Data Collection and Processing

We pull problems from several parts of the Illustrative Mathematics curriculum: tasks, centers, practice problems, lessons, and modeling prompts. For Fishtank Learning, we pull problems from the lessons section of their website. What is considered a "lesson" and what is considered a "problem" or "task" is an artifact of the materials themselves. Some problems are hands-on group activities, while others are assessment-type problems.

#### Who are the source data producers?

Illustrative Mathematics and Fishtank Learning are nonprofit educational organizations in the United States.

## Bias, Risks, and Limitations

Though these problems offer substantial coverage of a common K-12 curriculum in the United States, they may not directly translate to pedagogical standards or practices in other socio-cultural contexts.

### Recommendations

Though language models have the potential to automate the task of identifying standards alignment in curriculum or improve educational instruction, their role in education should be a supporting, rather than leading, one. To design such tools, we believe that it is best to co-create with teachers and curriculum specialists.

## Citation

```
@misc{lucy2024evaluatinglanguagemodelmath,
  title={Evaluating Language Model Math Reasoning via Grounding in Educational Curricula},
  author={Li Lucy and Tal August and Rose E. Wang and Luca Soldaini and Courtney Allison and Kyle Lo},
  year={2024},
  eprint={2408.04226},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2408.04226},
}
```

## Dataset Card Contact

kylel@allenai.org

Provider: maas
Created: 2025-05-27