
achieve-the-core

ModelScope Community. Updated 2025-08-15. Indexed 2025-05-31.
Download link:
https://modelscope.cn/datasets/allenai/achieve-the-core
Resource description:
# Dataset Card for Achieve the Core

This repository includes Common Core math standards, their descriptions, and metadata obtained from [Achieve the Core](https://github.com/achievethecore/atc-coherence-map/).

Example of a math standard:

```
{
  "id": "K.CC.B.4",
  "description": "Understand the relationship between numbers and quantities; connect counting to cardinality.",
  "source": "Achieve the Core",
  "level": "Standard",
  "cluster_type": "major cluster",
  "aspects": [],
  "parent": "K.CC.B",
  "children": ["K.CC.B.4c", "K.CC.B.4b", "K.CC.B.4a"],
  "connections": {
    "progress to": ["1.OA.C.5", "K.CC.B.5"],
    "progress from": [],
    "related": ["K.CC.A.2", "K.CC.C.6", "K.CC.A.1"]
  },
  "modeling": false
}
```

See [MathFish](https://huggingface.co/datasets/allenai/mathfish) for more details on uses of this data.

This data can be used to evaluate language models' abilities to assess whether math problems enable students to learn specific skills or concepts. Code to support this can be found in this [GitHub repository](https://github.com/allenai/mathfish/tree/main).

## Dataset Details

### Dataset Description

- **Curated by:** Lucy Li, Tal August, Rose E Wang, Luca Soldaini, Courtney Allison, Kyle Lo
- **Funded by:** The Gates Foundation
- **Language(s) (NLP):** English
- **License:** ODC-By 1.0

### Dataset Sources

- **Repository:** [Achieve the Core's GitHub](https://github.com/achievethecore/atc-coherence-map/)
- **Website:** [Achieve the Core's Coherence Map](https://tools.achievethecore.org/coherence-map/)

## Dataset Structure

This repository includes two key files: `domain_groups.json` and `standards.jsonl`.

We created `domain_groups.json` because the "domains" we use for evaluation in our tagging task do not map one-to-one onto the K-8 domains and high school (HS) categories in the Common Core State Standards (CCSS). Some HS categories are equivalent or similar to a K-8 domain, and some differences between K-8 domains are difficult to explain in a brief domain-level description. Thus, a "domain" in our paper sometimes groups multiple actual CCSS domains or categories. We mostly retain the original CCSS K-8 domains and HS categories, with the following exceptions:

- OA (Operations & Algebraic Thinking), EE (Expressions & Equations), and A (HS Algebra) are grouped into *Operations & Algebra*.
- S (HS Statistics & Probability) and SP (K-8 Statistics & Probability) are grouped into *Statistics & Probability*.
- NS (K-8 The Number System) and N (HS Number and Quantity) are grouped into *Number Systems and Quantity*.

Since CCSS and Achieve the Core do not provide brief descriptions of domains, we worked with a curriculum specialist to write the domain descriptions.

Within `standards.jsonl`, each line is a standard, sub-standard, cluster, domain, or grade level:

```
{
  'id': '',            # e.g. 'K.OA.A.1'
  'description': 'description of standard from achieve the core',
  'source': 'Achieve the Core',
  'level': '',         # one of Grade, HS Category, Domain, Cluster, Standard, Sub-standard
  'cluster_type': '',  # e.g. major cluster, additional cluster, minor cluster
  'aspects': [],       # a list containing items such as "Application", "conceptual understanding", "Procedural Skill and Fluency"
  'parent': '',
  'children': [],
  'connections': {'progress to': [], 'progress from': [], 'related': []},  # standard-level Achieve the Core connections
  'modeling': False    # True or False depending on whether the standard is a "modeling" standard
}
```

After downloading the two files, you can load them as follows:

```
import json

# domain_groups.json is a single JSON object
with open('domain_groups.json', 'r') as infile:
    domain_groups = json.load(infile)
print(domain_groups.keys())  # should print the keys of this dictionary

# standards.jsonl holds one JSON record per line
with open('standards.jsonl', 'r') as infile:
    for line in infile:
        this_standard = json.loads(line)
        print(this_standard['id'])  # should print the ID of the row in this file
```

## Citation

```
@misc{lucy2024evaluatinglanguagemodelmath,
  title={Evaluating Language Model Math Reasoning via Grounding in Educational Curricula},
  author={Li Lucy and Tal August and Rose E. Wang and Luca Soldaini and Courtney Allison and Kyle Lo},
  year={2024},
  eprint={2408.04226},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2408.04226},
}
```

## Dataset Card Contact

kylel@allenai.org
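
As a rough sketch of how the `standards.jsonl` record schema described under Dataset Structure might be used, the Python snippet below indexes records by `id`, groups them by `level`, and looks up the children of a standard that are actually present. The helper names (`load_standards`, `group_by_level`, `known_children`) are illustrative assumptions and are not part of the dataset or the MathFish code. The first sample record is the example from this card; the second is a hypothetical placeholder added only so the snippet runs without the real file.

```python
import json
from collections import defaultdict

# Inline stand-in for standards.jsonl. The first record is the example shown
# in this card; the second is a HYPOTHETICAL placeholder (its description and
# field values are not taken from the dataset).
SAMPLE_JSONL = """\
{"id": "K.CC.B.4", "description": "Understand the relationship between numbers and quantities; connect counting to cardinality.", "source": "Achieve the Core", "level": "Standard", "cluster_type": "major cluster", "aspects": [], "parent": "K.CC.B", "children": ["K.CC.B.4c", "K.CC.B.4b", "K.CC.B.4a"], "connections": {"progress to": ["1.OA.C.5", "K.CC.B.5"], "progress from": [], "related": ["K.CC.A.2", "K.CC.C.6", "K.CC.A.1"]}, "modeling": false}
{"id": "K.CC.B.4a", "description": "(placeholder)", "source": "Achieve the Core", "level": "Sub-standard", "cluster_type": "", "aspects": [], "parent": "K.CC.B.4", "children": [], "connections": {"progress to": [], "progress from": [], "related": []}, "modeling": false}
"""


def load_standards(lines):
    """Parse an iterable of JSONL lines into a dict keyed by record ID."""
    standards = {}
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        standards[record["id"]] = record
    return standards


def group_by_level(standards):
    """Map each `level` value (e.g. 'Standard') to the list of IDs at that level."""
    by_level = defaultdict(list)
    for standard_id, record in standards.items():
        by_level[record["level"]].append(standard_id)
    return dict(by_level)


def known_children(standards, standard_id):
    """Return the child IDs of `standard_id` that are present in `standards`."""
    return [c for c in standards[standard_id]["children"] if c in standards]


# With the real file, pass the open file handle instead of the sample lines:
#     with open('standards.jsonl', 'r') as infile:
#         standards = load_standards(infile)
standards = load_standards(SAMPLE_JSONL.splitlines())

print(group_by_level(standards))
print(known_children(standards, "K.CC.B.4"))
print(standards["K.CC.B.4"]["connections"]["progress to"])
```

Keying records by `id` makes the `parent`, `children`, and `connections` fields easy to follow, since each of them stores IDs rather than nested records.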

Provider:
maas
Created:
2025-05-27