Pegasus-Tiny-250K
收藏魔搭社区2025-12-04 更新2025-12-06 收录
下载链接:
https://modelscope.cn/datasets/prithivMLmods/Pegasus-Tiny-250K
下载链接
链接失效反馈官方服务:
资源简介:

# **Pegasus-Tiny-250K**
> **Pegasus-Tiny-250K** is a compact, high-quality mathematical reasoning dataset curated by **prithivMLmods** and hosted on Hugging Face. It contains approximately **~291K structured reasoning traces** in Parquet format, optimized for efficient training, evaluation, and reasoning-aligned fine-tuning of AI models. This dataset provides diverse mathematics-focused problem statements paired with detailed step-by-step reasoning solutions. Pegasus-Tiny-250K emphasizes clear reasoning flow and structured problem solving, making it suitable for training lightweight reasoning models, educational tools, and benchmarking tasks.
## Quick Start
```bash
pip install -U datasets
```
```python
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/Pegasus-Tiny-250K", split="train")
```
## Dataset Overview
| Feature | Value |
| ---------------------- | ------------------------------------------------------ |
| **Rows** | ~291,505 |
| **Preview-shard rows** | 221,720 |
| **Size[partial]** | 2.22 GB |
| **Format** | Parquet |
| **Language** | English |
| **License** | Apache-2.0 |
| **Primary Focus** | Mathematical reasoning, structured step-wise solutions |
## Data Structure
* **problem**: Math or logic-based task prompt
* **solution**: Chain-of-thought reasoning ending with final answer
## Source Inputs
Includes reasoning from:
* **Xen-Arc AI CodeX-2M-Thinking**: [Small traces, depending on the specific problem] Code-x structured programming logic, [XenArcAI/CodeX-2M-Thinking](https://huggingface.co/datasets/XenArcAI/CodeX-2M-Thinking)
* **Math-aligned custom prompts** : [Gargantua-R1-Wee](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Wee)
* **Hybrid algorithmic reasoning tasks**: [Gargantua-R1-Wee](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Wee)
## Use Cases
* Fine-tuning compact reasoning models
* Training models on problem-solving trace generation
* Benchmarking math reasoning ability
* Research in chain-of-thought modeling
* Educational AI and tutoring systems
## Maintainer
| Author | Last Updated |
| --------------------------------------------------------- | ------------ |
| **[prithivMLmods](https://huggingface.co/prithivMLmods)** | **Nov 2025** |
# **Pegasus-Tiny-250K**
> **Pegasus-Tiny-250K** 是由 **prithivMLmods** 整理的轻量化高质量数学推理数据集,托管于Hugging Face平台。该数据集以Parquet格式存储,包含约29.1万条结构化推理轨迹,专为AI模型的高效训练、评估与推理对齐微调优化。数据集涵盖多样化的数学类问题描述,并搭配详尽的分步推理解决方案。Pegasus-Tiny-250K突出清晰的推理流程与结构化问题求解能力,适用于轻量化推理模型训练、教育工具开发以及基准测试任务。
## 快速入门
bash
pip install -U datasets
python
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/Pegasus-Tiny-250K", split="train")
## 数据集概览
| 特征项 | 参数值 |
| ---------------------- | ------------------------------------------------------ |
| **数据行数** | 约291,505条 |
| **预览分片行数** | 221,720条 |
| **[部分]大小** | 2.22 GB |
| **存储格式** | Parquet |
| **语言** | 英语 |
| **开源协议** | Apache-2.0 |
| **核心聚焦领域** | 数学推理、结构化分步求解 |
## 数据结构
* **problem**:数学或逻辑类任务提示词
* **solution**:以最终答案收尾的链式思考推理过程
## 数据源
包含以下来源的推理内容:
* **Xen-Arc AI CodeX-2M-Thinking**:[少量推理轨迹,视具体问题而定] 代码-x结构化编程逻辑,详见数据集[XenArcAI/CodeX-2M-Thinking](https://huggingface.co/datasets/XenArcAI/CodeX-2M-Thinking)
* **数学对齐自定义提示词**:详见数据集[Gargantua-R1-Wee](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Wee)
* **混合算法推理任务**:详见数据集[Gargantua-R1-Wee](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Wee)
## 应用场景
* 轻量化推理模型的微调训练
* 面向问题求解轨迹生成的模型训练
* 数学推理能力基准测试
* 链式思考建模相关研究
* 教育类人工智能与辅导系统
## 维护者
| 作者 | 最后更新时间 |
| --------------------------------------------------------- | ------------ |
| **[prithivMLmods](https://huggingface.co/prithivMLmods)** | **2025年11月** |
提供机构:
maas
创建时间:
2025-11-27



