five

Pegasus-Tiny-250K

收藏
魔搭社区2025-12-04 更新2025-12-06 收录
下载链接:
https://modelscope.cn/datasets/prithivMLmods/Pegasus-Tiny-250K
下载链接
链接失效反馈
官方服务:
资源简介:
![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/qnCISDW9_NK4KCp_PAaG4.png) # **Pegasus-Tiny-250K** > **Pegasus-Tiny-250K** is a compact, high-quality mathematical reasoning dataset curated by **prithivMLmods** and hosted on Hugging Face. It contains approximately **~291K structured reasoning traces** in Parquet format, optimized for efficient training, evaluation, and reasoning-aligned fine-tuning of AI models. This dataset provides diverse mathematics-focused problem statements paired with detailed step-by-step reasoning solutions. Pegasus-Tiny-250K emphasizes clear reasoning flow and structured problem solving, making it suitable for training lightweight reasoning models, educational tools, and benchmarking tasks. ## Quick Start ```bash pip install -U datasets ``` ```python from datasets import load_dataset dataset = load_dataset("prithivMLmods/Pegasus-Tiny-250K", split="train") ``` ## Dataset Overview | Feature | Value | | ---------------------- | ------------------------------------------------------ | | **Rows** | ~291,505 | | **Preview-shard rows** | 221,720 | | **Size[partial]** | 2.22 GB | | **Format** | Parquet | | **Language** | English | | **License** | Apache-2.0 | | **Primary Focus** | Mathematical reasoning, structured step-wise solutions | ## Data Structure * **problem**: Math or logic-based task prompt * **solution**: Chain-of-thought reasoning ending with final answer ## Source Inputs Includes reasoning from: * **Xen-Arc AI CodeX-2M-Thinking**: [Small traces, depending on the specific problem] Code-x structured programming logic, [XenArcAI/CodeX-2M-Thinking](https://huggingface.co/datasets/XenArcAI/CodeX-2M-Thinking) * **Math-aligned custom prompts** : [Gargantua-R1-Wee](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Wee) * **Hybrid algorithmic reasoning tasks**: [Gargantua-R1-Wee](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Wee) ## Use Cases * Fine-tuning compact reasoning models * Training models on problem-solving trace generation * Benchmarking math reasoning ability * Research in chain-of-thought modeling * Educational AI and tutoring systems ## Maintainer | Author | Last Updated | | --------------------------------------------------------- | ------------ | | **[prithivMLmods](https://huggingface.co/prithivMLmods)** | **Nov 2025** |

# **Pegasus-Tiny-250K** > **Pegasus-Tiny-250K** 是由 **prithivMLmods** 整理的轻量化高质量数学推理数据集,托管于Hugging Face平台。该数据集以Parquet格式存储,包含约29.1万条结构化推理轨迹,专为AI模型的高效训练、评估与推理对齐微调优化。数据集涵盖多样化的数学类问题描述,并搭配详尽的分步推理解决方案。Pegasus-Tiny-250K突出清晰的推理流程与结构化问题求解能力,适用于轻量化推理模型训练、教育工具开发以及基准测试任务。 ## 快速入门 bash pip install -U datasets python from datasets import load_dataset dataset = load_dataset("prithivMLmods/Pegasus-Tiny-250K", split="train") ## 数据集概览 | 特征项 | 参数值 | | ---------------------- | ------------------------------------------------------ | | **数据行数** | 约291,505条 | | **预览分片行数** | 221,720条 | | **[部分]大小** | 2.22 GB | | **存储格式** | Parquet | | **语言** | 英语 | | **开源协议** | Apache-2.0 | | **核心聚焦领域** | 数学推理、结构化分步求解 | ## 数据结构 * **problem**:数学或逻辑类任务提示词 * **solution**:以最终答案收尾的链式思考推理过程 ## 数据源 包含以下来源的推理内容: * **Xen-Arc AI CodeX-2M-Thinking**:[少量推理轨迹,视具体问题而定] 代码-x结构化编程逻辑,详见数据集[XenArcAI/CodeX-2M-Thinking](https://huggingface.co/datasets/XenArcAI/CodeX-2M-Thinking) * **数学对齐自定义提示词**:详见数据集[Gargantua-R1-Wee](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Wee) * **混合算法推理任务**:详见数据集[Gargantua-R1-Wee](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Wee) ## 应用场景 * 轻量化推理模型的微调训练 * 面向问题求解轨迹生成的模型训练 * 数学推理能力基准测试 * 链式思考建模相关研究 * 教育类人工智能与辅导系统 ## 维护者 | 作者 | 最后更新时间 | | --------------------------------------------------------- | ------------ | | **[prithivMLmods](https://huggingface.co/prithivMLmods)** | **2025年11月** |
提供机构:
maas
创建时间:
2025-11-27
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作