achieve-the-core
ModelScope Community · Updated 2025-08-15 · Indexed 2025-05-31
Download link:
https://modelscope.cn/datasets/allenai/achieve-the-core
Resource summary:
# Dataset Card for Achieve the Core
This repository includes Common Core math standards, their descriptions, and metadata obtained from [Achieve the Core](https://github.com/achievethecore/atc-coherence-map/).
Example of a math standard:
```
{
"id": "K.CC.B.4",
"description": "Understand the relationship between numbers and quantities; connect counting to cardinality.",
"source": "Achieve the Core",
"level": "Standard",
"cluster_type": "major cluster",
"aspects": [],
"parent": "K.CC.B",
"children": ["K.CC.B.4c", "K.CC.B.4b", "K.CC.B.4a"],
"connections": {"progress to": ["1.OA.C.5", "K.CC.B.5"], "progress from": [], "related": ["K.CC.A.2", "K.CC.C.6", "K.CC.A.1"]},
"modeling": false
}
```
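To show how the fields fit together, the Python sketch below parses the example record and extracts its hierarchy (`parent`, `children`) and its Achieve the Core connections. The helper `summarize` and the keys of the dictionary it returns are hypothetical and chosen only for illustration. They are not part of the dataset. `children` appears unsorted in the example, so the sketch sorts it to give stable output.

```python
import json

# The example record above, copied verbatim as a JSON string.
EXAMPLE = """
{
"id": "K.CC.B.4",
"description": "Understand the relationship between numbers and quantities; connect counting to cardinality.",
"source": "Achieve the Core",
"level": "Standard",
"cluster_type": "major cluster",
"aspects": [],
"parent": "K.CC.B",
"children": ["K.CC.B.4c", "K.CC.B.4b", "K.CC.B.4a"],
"connections": {"progress to": ["1.OA.C.5", "K.CC.B.5"], "progress from": [], "related": ["K.CC.A.2", "K.CC.C.6", "K.CC.A.1"]},
"modeling": false
}
"""


def summarize(record: dict) -> dict:
    """Collect the hierarchy and connection fields of one standard record (hypothetical helper)."""
    conns = record["connections"]
    return {
        "id": record["id"],
        "parent": record["parent"],
        # children are unsorted in the example, so sort them for stable output
        "children": sorted(record["children"]),
        "next": conns["progress to"],
        "prev": conns["progress from"],
        "related": conns["related"],
        # JSON false becomes Python False after parsing
        "is_modeling": record["modeling"],
    }


if __name__ == "__main__":
    summary = summarize(json.loads(EXAMPLE))
    print(summary["children"])  # ['K.CC.B.4a', 'K.CC.B.4b', 'K.CC.B.4c']
    print(summary["is_modeling"])  # False
```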
See [MathFish](https://huggingface.co/datasets/allenai/mathfish) for more details on uses of this data.
This data can be used to evaluate language models' abilities to assess whether math problems enable students to learn specific skills/concepts. Code to support this can be found in this [GitHub repository](https://github.com/allenai/mathfish/tree/main).
## Dataset Details
### Dataset Description
- **Curated by:** Lucy Li, Tal August, Rose E Wang, Luca Soldaini, Courtney Allison, Kyle Lo
- **Funded by:** The Gates Foundation
- **Language(s) (NLP):** English
- **License:** ODC-By 1.0
### Dataset Sources
- **Repository:** [Achieve the Core's GitHub](https://github.com/achievethecore/atc-coherence-map/)
- **Website:** [Achieve the Core's Coherence Map](https://tools.achievethecore.org/coherence-map/)
## Dataset Structure
This repository includes two key files: `domain_groups.json` and `standards.jsonl`.
We created `domain_groups.json` because the "domains" used in our tagging task do not map one-to-one onto the K-8 domains and high school (HS) categories of the Common Core State Standards (CCSS). Some HS categories are equivalent or similar to a K-8 domain. Some distinctions between K-8 domains are also difficult to explain in a brief, domain-level description. As a result, a "domain" in our paper sometimes groups multiple CCSS domains or categories. We mostly retain the original CCSS K-8 domains and HS categories, with the following exceptions:

- OA (Operations & Algebraic Thinking), EE (Expressions & Equations), and A (HS Algebra) are grouped into *Operations & Algebra*.
- S (HS Statistics & Probability) and SP (K-8 Statistics & Probability) are grouped into *Statistics & Probability*.
- NS (K-8 The Number System) and N (HS Number and Quantity) are grouped into *Number Systems and Quantity*.

Since neither CCSS nor Achieve the Core provides brief descriptions of domains, we worked with a curriculum specialist to write the domain descriptions.
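The grouping described above amounts to a small lookup table. The following is a minimal sketch that assumes K-8 IDs have the form `grade.domain.cluster.standard`, as in `K.OA.A.1`. The table is transcribed from the prose and is not read from `domain_groups.json`, because that file's exact layout is not described here. The names `GROUPED_DOMAINS`, `k8_domain_code`, and `grouped_domain` are hypothetical. The sketch does not handle HS IDs.

```python
# Hypothetical lookup table transcribed from the grouping described above;
# it is not read from domain_groups.json, whose actual layout may differ.
GROUPED_DOMAINS = {
    "OA": "Operations & Algebra",
    "EE": "Operations & Algebra",
    "A": "Operations & Algebra",
    "S": "Statistics & Probability",
    "SP": "Statistics & Probability",
    "NS": "Number Systems and Quantity",
    "N": "Number Systems and Quantity",
}


def k8_domain_code(standard_id: str) -> str:
    """Return the domain code of a K-8 ID such as 'K.OA.A.1' (assumed format)."""
    parts = standard_id.split(".")
    if len(parts) < 2:
        raise ValueError(f"not a K-8 style ID: {standard_id!r}")
    return parts[1]


def grouped_domain(code: str) -> str:
    """Map a CCSS domain/category code to its grouped domain.

    Codes that are not merged keep their original code.
    """
    return GROUPED_DOMAINS.get(code, code)


if __name__ == "__main__":
    print(grouped_domain(k8_domain_code("K.OA.A.1")))  # Operations & Algebra
    print(grouped_domain(k8_domain_code("K.CC.B.4")))  # CC
```

Unmerged codes fall through unchanged. This mirrors the statement that most original CCSS domains and categories are retained.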
Within `standards.jsonl`, each line is one node of the hierarchy: a grade, HS category, domain, cluster, standard, or sub-standard:
```
{
id: '', # e.g. 'K.OA.A.1'
description: 'description of standard from achieve the core',
source: 'Achieve the Core',
level: '', # one of Grade, HS Category, Domain, Cluster, Standard, Sub-standard
cluster_type: '', # e.g. major cluster, additional cluster, minor cluster
aspects: [], # a list containing items such as "Application", "conceptual understanding", "Procedural Skill and Fluency"
parent: '',
children: [],
connections: {''progress to': [], 'progress from': [], 'related': []} # standard-level Achieve the Core connections
modeling: # True or False depending on whether the standard is a "modeling" standard
}
```
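Every row carries a `parent` field, so the rows form a tree that you can walk upward from any node. The sketch below assumes the rows have already been parsed into dictionaries. `index_by_id` and `ancestors` are hypothetical helpers. The toy rows are illustrative assumptions and are not copied from the file. This applies in particular to the grade- and domain-level IDs and to the empty `parent` at the top of the tree.

```python
from typing import Dict, List


def index_by_id(records: List[dict]) -> Dict[str, dict]:
    """Index parsed standards.jsonl rows by their 'id' field."""
    return {r["id"]: r for r in records}


def ancestors(index: Dict[str, dict], node_id: str) -> List[str]:
    """Follow 'parent' links upward from node_id.

    Stops at an empty parent, a parent missing from the index, or a cycle.
    """
    chain: List[str] = []
    seen = {node_id}
    current = index[node_id].get("parent", "")
    while current and current in index and current not in seen:
        chain.append(current)
        seen.add(current)
        current = index[current].get("parent", "")
    return chain


if __name__ == "__main__":
    # Toy rows for illustration only; real rows carry every schema field,
    # and the grade/domain IDs and top-level empty parent are assumed.
    rows = [
        {"id": "K", "level": "Grade", "parent": ""},
        {"id": "K.CC", "level": "Domain", "parent": "K"},
        {"id": "K.CC.B", "level": "Cluster", "parent": "K.CC"},
        {"id": "K.CC.B.4", "level": "Standard", "parent": "K.CC.B"},
        {"id": "K.CC.B.4a", "level": "Sub-standard", "parent": "K.CC.B.4"},
    ]
    idx = index_by_id(rows)
    print(ancestors(idx, "K.CC.B.4a"))  # ['K.CC.B.4', 'K.CC.B', 'K.CC', 'K']
```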
After downloading the files, you can load them with Python's standard `json` module:
```python
import json

with open('domain_groups.json', 'r') as infile:
    domain_groups = json.load(infile)
print(domain_groups.keys())  # should print the keys of this dictionary

with open('standards.jsonl', 'r') as infile:
    for line in infile:
        this_standard = json.loads(line)
        print(this_standard['id'])  # should print the ID of the row in this file
```
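Building on the loading code, the following sketch indexes `standards.jsonl` by `id`. It then follows `progress to` connections breadth-first to list every standard reachable from a starting point. This example is not part of the MathFish tooling, and the function names are hypothetical. A connection may reference an ID that is missing from the file. The sketch lists such IDs but does not expand them.

```python
import json
from collections import deque
from typing import Dict, List


def load_standards(path: str) -> Dict[str, dict]:
    """Read standards.jsonl into a dict keyed by 'id', skipping blank lines."""
    standards: Dict[str, dict] = {}
    with open(path, "r", encoding="utf-8") as infile:
        for line in infile:
            if line.strip():
                row = json.loads(line)
                standards[row["id"]] = row
    return standards


def downstream(standards: Dict[str, dict], start: str) -> List[str]:
    """Breadth-first walk over 'progress to' connections from one standard.

    Returns reachable IDs in visit order, excluding the start ID.
    """
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        sid = queue.popleft()
        row = standards.get(sid)
        if row is None:
            continue  # referenced ID not present in the file; cannot expand it
        for nxt in row.get("connections", {}).get("progress to", []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

For example, `downstream(load_standards('standards.jsonl'), 'K.CC.B.4')` would return the standards that K.CC.B.4 progresses to, directly or transitively, in breadth-first order. A `seen` set prevents the walk from revisiting nodes if the connection graph contains cycles.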
## Citation
```
@misc{lucy2024evaluatinglanguagemodelmath,
title={Evaluating Language Model Math Reasoning via Grounding in Educational Curricula},
author={Li Lucy and Tal August and Rose E. Wang and Luca Soldaini and Courtney Allison and Kyle Lo},
year={2024},
eprint={2408.04226},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2408.04226},
}
```
## Dataset Card Contact
kylel@allenai.org
Provider: maas
Created: 2025-05-27



