Math-Glot-Cleaned-10K
收藏魔搭社区2025-12-10 更新2025-06-28 收录
下载链接:
https://modelscope.cn/datasets/prithivMLmods/Math-Glot-Cleaned-10K
下载链接
链接失效反馈官方服务:
资源简介:
# **Math-Glot-Cleaned-10K**
**Math-Glot-Cleaned-10K** is a curated subset of 10,000 math and code reasoning samples, extracted and cleaned from the original [NVIDIA AceReason-1.1-SFT](https://huggingface.co/datasets/NVIDIA/AceReason-1.1-SFT) dataset. This refined version focuses exclusively on high-quality, structured mathematical prompts paired with chain-of-thought style reasoning.
## Dataset Summary
* **Source**: Derived from NVIDIA's AceReason-1.1-SFT dataset
* **Total Entries**: 10,000
* **Format**: Text-to-text (input → output)
* **Modality**: Text
* **Language**: English
* **License**: Apache 2.0
Each entry in this dataset contains:
* **input**: A mathematical or computational problem, often using LaTeX-style expressions
* **output**: A detailed reasoning response, typically in chain-of-thought format
## Processing Details
The following steps were applied during dataset creation:
* **Filtered**: Retained only math- and code-related samples from the full AceReason dataset
* **Normalized**: Standardized LaTeX formatting, spacing, and markdown structure
* **Cleaned**: Removed noisy, unrelated, or improperly formatted records
* **Column Reduction**: Dropped unnecessary columns from the original dataset
## Use Cases
This dataset is intended for:
* Fine-tuning models on math reasoning tasks
* Developing instruction-following LLMs for structured problem solving
* Benchmarking reasoning capabilities of large models in math or algorithmic domains
## Example
| input | output |
| ----------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| Find all prime numbers $p$ such that $p^3 + 1 = k^2$. | `<think>` Okay, let's see. I need to find all prime numbers p where p cubed plus one equals a perfect square... |
## Citation
If using this dataset, please cite the original [AceReason-1.1-SFT dataset](https://huggingface.co/datasets/NVIDIA/AceReason-1.1-SFT) and credit this cleaned 10K subset for its focused structure.
# **Math-Glot-Cleaned-10K**
**Math-Glot-Cleaned-10K** 是从原始[NVIDIA AceReason-1.1-SFT](https://huggingface.co/datasets/NVIDIA/AceReason-1.1-SFT)数据集提取并清洗得到的精选子集,共包含10000条数学与代码推理样本。该优化版本仅聚焦于高质量、结构化的数学提示词与思维链(chain-of-thought)风格推理内容的配对组合。
## 数据集概述
* **来源**:衍生自NVIDIA的AceReason-1.1-SFT数据集
* **总样本数**:10,000
* **格式**:文本到文本(输入→输出)
* **模态**:文本
* **语言**:英语
* **授权协议**:Apache 2.0
该数据集的每条样本均包含:
* **输入(input)**:一道数学或计算问题,通常采用LaTeX格式的表达式
* **输出(output)**:详细的推理响应,一般采用思维链(chain-of-thought)格式
## 数据处理细节
数据集构建过程中执行了以下步骤:
* **筛选**:仅保留原始AceReason数据集中与数学和代码相关的样本
* **标准化**:统一LaTeX格式、空格排版与Markdown结构
* **清洗**:移除存在噪声、无关内容或格式错误的记录
* **列精简**:删除原始数据集中的非必要列
## 应用场景
本数据集适用于:
* 针对数学推理任务的模型微调
* 开发面向结构化问题求解的指令遵循型大语言模型(Large Language Model,LLM)
* 针对大模型在数学或算法领域的推理能力进行基准测试
## 示例
| 输入(input) | 输出(output) |
| ----------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| 找出所有满足 $p^3 + 1 = k^2$ 的质数 $p$ | `<think>` 好的,我们来看看。我需要找出所有质数 $p$,使得 $p$ 的三次方加1等于一个完全平方数... |
## 引用说明
若使用本数据集,请引用原始[NVIDIA AceReason-1.1-SFT数据集](https://huggingface.co/datasets/NVIDIA/AceReason-1.1-SFT),并注明本清洗后的10K子集的结构化优化工作。
提供机构:
maas
创建时间:
2025-06-25



