NecroMOnk/khan-math-algebra
Source: Hugging Face. Updated 2026-03-08; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/NecroMOnk/khan-math-algebra
Description:
---
license: mit
language:
- en
tags:
- math
- khan-academy
- latex
- education
- chatml
pretty_name: Khan Math – Algebra
size_categories:
- 1M<n<10M
---
# Khan Math – Algebra
Algebra problems from Khan Academy in ChatML format. Covers topics such as equations, polynomials, functions, and more.
## Quick Start
```python
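from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (requires the `datasets` package)
dataset = load_dataset("NecroMOnk/khan-math-algebra")
# Inspect the first record of the train split
print(dataset["train"][0])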
```
Example record:
```json
{
  "messages": [
    {"role": "system", "content": "You are a mathematics tutor. Answer the following math problem."},
    {"role": "user", "content": "Find the arclength of the function $f(x) = \\log(2x)$ on the interval $x=4$ to $x=5$"},
    {"role": "assistant", "content": "$-\\sqrt{17}+\\sqrt{26}+\\tanh^{-1}\\left(\\sqrt{17}\\right)-\\tanh^{-1}\\left(\\sqrt{26}\\right)$"}
  ],
  "topic": "calculus",
  "subtopic": "arclength"
}
```
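The record above stores the conversation as structured turns rather than as one string. If you need flat ChatML text, a minimal sketch follows. It assumes the conventional `<|im_start|>` / `<|im_end|>` delimiters, which your tokenizer may or may not use; `to_chatml` is a hypothetical helper and is not shipped with the dataset.

```python
def to_chatml(messages):
    """Render a list of {"role", "content"} dicts as one ChatML string.

    Assumption: turns are delimited with <|im_start|> and <|im_end|>.
    Check your model's chat template before relying on this layout.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    return "\n".join(parts)


# The example record from this card, written with raw strings so the
# LaTeX backslashes are kept literally.
record = {
    "messages": [
        {"role": "system", "content": "You are a mathematics tutor. Answer the following math problem."},
        {"role": "user", "content": r"Find the arclength of the function $f(x) = \log(2x)$ on the interval $x=4$ to $x=5$"},
        {"role": "assistant", "content": r"$-\sqrt{17}+\sqrt{26}+\tanh^{-1}\left(\sqrt{17}\right)-\tanh^{-1}\left(\sqrt{26}\right)$"},
    ],
    "topic": "calculus",
    "subtopic": "arclength",
}

chatml_text = to_chatml(record["messages"])
print(chatml_text)
```

If your model ships a chat template, applying that template is likely the safer choice than hand-built delimiters.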
## Dataset Stats
- **1.24M** problems
- ChatML format
- Source: Khan Academy algebra materials
- Language: English
- Math notation: LaTeX
## Fields
| Field | Type | Description |
|---|---|---|
| `messages` | list | ChatML turns: system, user, assistant |
| `topic` | string | Top-level math topic (e.g. `calculus`) |
| `subtopic` | string | Specific subtopic (e.g. `arclength`) |
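To work with the fields as plain question/answer pairs, the following sketch flattens one record. It assumes each record carries at most one turn per role, as in the example record; `flatten_record` is a hypothetical helper name.

```python
def flatten_record(record):
    """Collapse one dataset record into a flat dict.

    Assumption: `messages` holds at most one turn per role
    (system, user, assistant). A missing role yields None.
    """
    by_role = {m["role"]: m["content"] for m in record["messages"]}
    return {
        "system": by_role.get("system"),
        "question": by_role.get("user"),
        "answer": by_role.get("assistant"),
        "topic": record.get("topic"),
        "subtopic": record.get("subtopic"),
    }


# Small illustrative record with the same shape as the dataset rows.
sample = {
    "messages": [
        {"role": "system", "content": "You are a mathematics tutor. Answer the following math problem."},
        {"role": "user", "content": r"Find the arclength of the function $f(x) = \log(2x)$ on the interval $x=4$ to $x=5$"},
        {"role": "assistant", "content": r"$-\sqrt{17}+\sqrt{26}+\tanh^{-1}\left(\sqrt{17}\right)-\tanh^{-1}\left(\sqrt{26}\right)$"},
    ],
    "topic": "calculus",
    "subtopic": "arclength",
}

flat = flatten_record(sample)
print(flat["topic"], flat["subtopic"])
```

The same function can be passed to `dataset.map` if you prefer flat columns over the nested `messages` list.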
## Source
Problems sourced from [Khan Academy](https://www.khanacademy.org/) via the AMPS dataset.
Problems and answers are in LaTeX format.
## Versions
- **v1** – raw extraction, minor artifacts possible (e.g. degenerate intervals, `+-` notation)
- **v2** – cleaned formulas, extraction artifacts removed *(coming soon)*
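Until v2 is available, you may want to flag or patch the `+-` artifact yourself. The sketch below is one possible approach, not the author's cleaning procedure: it assumes `+-` always means "plus a negative term" and rewrites it as a single minus sign. The helper names are hypothetical, and degenerate intervals are not handled.

```python
import re

# A plus sign followed (optionally after whitespace) by a minus sign.
SIGN_ARTIFACT = re.compile(r"\+\s*-")


def has_sign_artifact(text):
    """Return True if the text contains a `+-` sequence."""
    return bool(SIGN_ARTIFACT.search(text))


def normalize_signs(text):
    """Rewrite `+-` as `-`.

    Assumption: `a+-b` is meant as `a-b`. Review a sample of the
    changes before applying this to the whole dataset.
    """
    return SIGN_ARTIFACT.sub("-", text)


cleaned = normalize_signs("x^2+-3x+2")
print(cleaned)
```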
Provider:
NecroMOnk



