Nix-ai/cat-math-v1
Source: Hugging Face. Updated 2026-04-15; indexed 2026-04-26.
Download link: https://hf-mirror.com/datasets/Nix-ai/cat-math-v1
---
license: mit
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  - name: category
    dtype: string
  - name: has_graph
    dtype: bool
  - name: id
    dtype: int64
  splits:
  - name: train
    num_bytes: 300542572
    num_examples: 194218
  - name: validation
    num_bytes: 16696981
    num_examples: 10790
  - name: test
    num_bytes: 16696981
    num_examples: 10790
  download_size: 69018981
  dataset_size: 333936534
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
# 🐾 cat-math-v1 Dataset Generator
A script that generates **46 000+ fully dynamic math Q&A pairs** narrated by a catgirl,
illustrates them with ASCII graphs, and can upload the result directly to HuggingFace.
---
## Quick start
```bash
# 1. Install deps
pip install numpy datasets huggingface_hub tqdm
# 2. Preview 3 samples (no upload)
python catmath_generator.py --preview
# 3. Generate locally only (saves catmath.jsonl)
python catmath_generator.py --n 46000
# 4. Generate + upload to HF
export HF_TOKEN=hf_your_write_token_here
python catmath_generator.py --n 46000 --upload
# 5. Custom repo / output file
python catmath_generator.py --n 50000 --upload \
    --repo Nix-ai/cat-math-v1 \
    --out my_catmath.jsonl \
    --token hf_xxxx
```
---
## CLI flags
| Flag | Default | Description |
|------|---------|-------------|
| `--n` | 46000 | Number of samples |
| `--seed` | 42 | Random seed (fully reproducible) |
| `--out` | catmath.jsonl | Local JSONL output path |
| `--upload` | off | Push to HuggingFace after generation |
| `--token` | (env) | HF write token (`HF_TOKEN` env var also works) |
| `--repo` | Nix-ai/cat-math-v1 | HuggingFace dataset repo |
| `--preview` | off | Print 3 samples and exit |
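
For readers who want to script around the generator, the flag table can be mirrored with `argparse`. This is only a sketch based on the table above, not the code in `catmath_generator.py`; the helper names `build_parser` and `resolve_token` are hypothetical, and the rule that `--token` wins over `HF_TOKEN` is an assumption.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flag table (not the script's real code)."""
    p = argparse.ArgumentParser(description="cat-math-v1 generator (sketch)")
    p.add_argument("--n", type=int, default=46000, help="number of samples")
    p.add_argument("--seed", type=int, default=42, help="random seed")
    p.add_argument("--out", default="catmath.jsonl", help="local JSONL output path")
    p.add_argument("--upload", action="store_true", help="push to HuggingFace")
    p.add_argument("--token", default=None, help="HF write token")
    p.add_argument("--repo", default="Nix-ai/cat-math-v1", help="dataset repo")
    p.add_argument("--preview", action="store_true", help="print 3 samples and exit")
    return p


def resolve_token(cli_token, environ=os.environ):
    """Assumed precedence: use --token if given, else the HF_TOKEN env var."""
    return cli_token or environ.get("HF_TOKEN")


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--n", "50000", "--upload"])
print(args.n, args.seed, args.out, args.upload, args.repo)
```

Flags that default to "off" in the table map naturally to `action="store_true"`, which is why `--upload` and `--preview` take no value.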
---
## Dataset schema
Each row is a JSON object:
```json
{
  "id": 12345,
  "instruction": "Solve for x: 3x + 7 = 22",
  "output": "Nyaa~! Let me help you...\n\n...\n\nAnd that's the answer! Nya~☆",
  "category": "linear_equations",
  "has_graph": true
}
```
### Fields
- **id** – integer index
- **instruction** – the math problem (plain English + LaTeX-lite notation)
- **output** – catgirl-narrated step-by-step solution, may include ASCII art
- **category** – one of 18 math categories (see below)
- **has_graph** – whether the output contains an ASCII graph
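
The field list above translates directly into a small row check, which can be handy before training on a local `catmath.jsonl`. The sketch below is an illustration only: `SCHEMA` and `validate_row` are hypothetical names, and strict type matching (so that `true` is not accepted as an `id`) is a design choice of this example rather than something the dataset enforces.

```python
import json

# Field names and types taken from the schema above.
SCHEMA = {
    "id": int,
    "instruction": str,
    "output": str,
    "category": str,
    "has_graph": bool,
}


def validate_row(row: dict) -> list:
    """Return a list of problems found in one row (empty list means valid)."""
    problems = []
    for name, expected in SCHEMA.items():
        if name not in row:
            problems.append(f"missing field: {name}")
        elif type(row[name]) is not expected:
            # Exact type comparison: bool is a subclass of int in Python,
            # so isinstance() would let True pass as an id.
            got = type(row[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {got}")
    for name in sorted(row.keys() - SCHEMA.keys()):
        problems.append(f"unexpected field: {name}")
    return problems


# One JSONL line shaped like the example row above.
line = json.dumps({
    "id": 12345,
    "instruction": "Solve for x: 3x + 7 = 22",
    "output": "Nyaa~! Let me help you...",
    "category": "linear_equations",
    "has_graph": True,
})
row = json.loads(line)
print(validate_row(row))
```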
### Categories (18)
| Category | Description |
|----------|-------------|
| `arithmetic` | Basic +, −, ×, ÷ with number lines |
| `linear_equations` | ax + b = c with ASCII line plots |
| `quadratic` | Quadratic formula + parabola plots |
| `calculus_derivatives` | Power-rule polynomial derivatives |
| `calculus_integrals` | Definite integrals, FTC |
| `statistics` | Mean, median, mode, std dev, bar charts |
| `trigonometry` | sin/cos/tan with period/amplitude plots |
| `geometry` | Circle, rectangle, triangle, trapezoid |
| `logarithms` | log_b(x), exponential graphs |
| `linear_algebra` | 2×2/3×2 matrix add/mul/det/transpose |
| `probability` | Dice, cards, binomial distribution |
| `number_theory` | GCD (Euclidean), LCM, prime factors, mod |
| `sequences` | Arithmetic, geometric, Fibonacci-like |
| `complex_numbers` | Add, multiply, modulus, conjugate |
| `differential_equations` | Separable ODEs (dy/dx = ky) |
| `statistics_regression` | Least-squares linear regression + scatter |
| `combinatorics` | Permutations, combinations, counting |
| `limits` | Polynomial, rational, trig, L'Hôpital |
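
To give a feel for what "fully dynamic" can mean for a single category, here is a minimal sketch of a seeded generator for `linear_equations` rows. It is not the generator's actual code: `make_linear_equation` is a hypothetical helper, the coefficient ranges are arbitrary, the catgirl narration is left out, and no ASCII plot is drawn (so `has_graph` is `False` here).

```python
import random


def make_linear_equation(rng: random.Random, row_id: int) -> dict:
    """Hypothetical generator for one `linear_equations` row (no ASCII plot)."""
    a = rng.randint(2, 9)      # coefficient of x
    b = rng.randint(1, 20)     # constant term
    x = rng.randint(-10, 10)   # choose the answer first ...
    c = a * x + b              # ... so the equation always has an integer solution
    output = (
        f"Subtract {b} from both sides: {a}x = {c - b}\n"
        f"Divide both sides by {a}: x = {x}"
    )
    return {
        "id": row_id,
        "instruction": f"Solve for x: {a}x + {b} = {c}",
        "output": output,
        "category": "linear_equations",
        "has_graph": False,
    }


# The same seed always yields the same problems.
rng = random.Random(42)
rows = [make_linear_equation(rng, i) for i in range(3)]
for row in rows:
    print(row["instruction"])
```

Choosing the solution first and deriving the right-hand side from it is one simple way to keep every generated problem solvable without fractions.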
---
## ASCII graph types used
| Graph type | Used in |
|-----------|---------|
| Function plot (axes + curve) | linear, quadratic, trig, log, ODE |
| Bar chart (horizontal █) | stats, sequences |
| Scatter plot (◆) | regression |
| Number line (▲ marker) | arithmetic |
| ASCII table | regression data |
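
As an illustration of the horizontal bar chart row in the table above, the following sketch renders labelled `█` bars scaled to the largest value. The function name `bar_chart`, the `width` parameter, and the exact layout are assumptions made for this example; the generator's real output format may differ.

```python
def bar_chart(values: dict, width: int = 20) -> str:
    """Render a horizontal bar chart, one '█' run per label.

    Assumes non-negative values; bars are scaled relative to the largest one.
    """
    peak = max(values.values())
    label_w = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = "█" * round(width * value / peak) if peak > 0 else ""
        lines.append(f"{label.ljust(label_w)} | {bar} {value}")
    return "\n".join(lines)


print(bar_chart({"mean": 4, "median": 3, "mode": 2}, width=8))
# mean   | ████████ 4
# median | ██████ 3
# mode   | ████ 2
```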
---
## Dataset splits (HuggingFace)
| Split | Fraction |
|-------|----------|
| train | 90 % |
| validation | 5 % |
| test | 5 % |
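
The fractions above can be reproduced with a seeded shuffle followed by two cuts. The sketch below is one possible way to do it, not necessarily how the generator does; `split_indices` is a hypothetical helper. For 215 798 rows it yields 194 218 / 10 790 / 10 790, which matches the `num_examples` values in the dataset card metadata.

```python
import random


def split_indices(n: int, seed: int = 42) -> dict:
    """Shuffle row indices and cut them 90 / 5 / 5 (hypothetical helper)."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    n_train = n * 90 // 100          # integer arithmetic avoids float rounding
    n_val = (n - n_train) // 2       # half of the remainder; test takes the rest
    return {
        "train": order[:n_train],
        "validation": order[n_train:n_train + n_val],
        "test": order[n_train + n_val:],
    }


sizes = {name: len(ids) for name, ids in split_indices(215_798).items()}
print(sizes)
# {'train': 194218, 'validation': 10790, 'test': 10790}
```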
---
## HuggingFace token
You need a **write** token from https://huggingface.co/settings/tokens.
The repo `Nix-ai/cat-math-v1` must exist (or you must have permission to create it).
Provider: Nix-ai



