Poseidon-Reasoning-Mini-300K
收藏魔搭社区2025-12-03 更新2025-07-26 收录
下载链接:
https://modelscope.cn/datasets/prithivMLmods/Poseidon-Reasoning-Mini-300K
下载链接
链接失效反馈官方服务:
资源简介:

# **Poseidon-Reasoning-Mini-300K**
> Poseidon-Reasoning-Mini-300K is a compact, high-quality reasoning dataset designed for advanced tasks in **mathematics**, **coding**, and **science**. This smaller-scale collection maintains the depth and quality of its larger counterparts, with a focus on multi-step and general reasoning—making it ideal for model pretraining, fine-tuning, benchmarking, and STEM educational applications.
---
## Quick Start with Hugging Face Datasets🤗
```py
pip install -U datasets
```
```py
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/Poseidon-Reasoning-Mini-300K", split="train")
```
---
## Overview
- **Dataset Name:** Poseidon-Reasoning-Mini-300K
- **Curated by:** prithivMLmods
- **Size:** ~300,000 entries
- **Formats:** `.arrow`, Parquet
- **Languages:** English
- **License:** Apache-2.0
## Key Features
- **Compact & Rigorous:** Delivers concise, high-quality problems with comprehensive stepwise solutions.
- **STEM Coverage:** Prioritizes mathematical, scientific, and coding problems, with a strong emphasis on logic and reasoning tasks.
- **Optimized Curation:** Comprises expertly selected entries from larger derivative and open datasets, guaranteeing diversity and consistency.
- **Adaptable Scale:** The 300K size is optimal for efficient experimentation and quick benchmarking without sacrificing complexity or depth.
## Dataset Structure
Each sample includes:
- **problem:** A clear, typically STEM-oriented question or prompt.
- **solution:** Step-by-step, reasoning-based explanation or answer.
### Schema Example
| Column | Type | Description |
|----------|--------|----------------------------|
| problem | string | Problem or reasoning prompt |
| solution | string | Stepwise explanation/answer |
---
## Data Sources
Poseidon-Reasoning-Mini-300K is a carefully curated derivative, sourced from:
- **prithivMLmods/Poseidon-Reasoning-5M**
- **glaiveai/reasoning-v1-20m**
- **prithivMLmods/Open-Omega-Explora-2.5M**
- Custom modular dataset contributions by prithivMLmods
Each source was selected and filtered to maximize quality, clarity, and coverage of reasoning skills in math, science, and coding.
## Applications
This mini dataset is ideal for the following use cases:
- Fine-tuning and evaluating LLMs for STEM and general reasoning
- Rapid benchmarking for research or educational models
- Curriculum design for math, coding, and science toolchains
- AI reasoning assessments and diagnostic tasks
---
## Citation
If you use this dataset, please cite:
```
Poseidon-Reasoning-Mini-300K by prithivMLmods
Derived and curated from:
- prithivMLmods/Poseidon-Reasoning-5M
- glaiveai/reasoning-v1-20m
- prithivMLmods/Open-Omega-Explora-2.5M
```
## License
Distributed under the Apache-2.0 License. Always review underlying source dataset licenses for full compliance.

# **Poseidon-Reasoning-Mini-300K**
> Poseidon-Reasoning-Mini-300K是一款轻量化、高质量的推理数据集,专为数学、编程与科学领域的进阶任务打造。这款轻量化数据集保留了其大型同类数据集的深度与质量,核心聚焦多步推理与通用推理任务,十分适配模型预训练、微调、基准测试以及STEM(科学、技术、工程、数学)教育应用场景。
---
## Hugging Face 数据集🤗快速上手
py
pip install -U datasets
py
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/Poseidon-Reasoning-Mini-300K", split="train")
---
## 数据集概览
- **数据集名称:** Poseidon-Reasoning-Mini-300K
- **整理方:** prithivMLmods
- **数据规模:** 约30万条数据
- **数据格式:** `.arrow`、Parquet
- **使用语言:** 英语
- **许可证:** Apache-2.0
## 核心特性
- **轻量化且严谨:** 提供简洁高质的问题,并附带完整的分步解答。
- **STEM领域覆盖:** 优先收录数学、科学与编程相关问题,重点聚焦逻辑与推理任务。
- **优化精选:** 从大型衍生开源数据集中经专业筛选得到样本,确保数据的多样性与一致性。
- **规模适配:** 30万条的规模可高效支持实验与快速基准测试,同时不会牺牲任务的复杂度与深度。
## 数据集结构
每条样本包含:
- **problem(问题):** 清晰的、通常面向STEM领域的提问或提示。
- **solution(解答):** 基于推理的分步解释或最终答案。
### Schema示例
| 列名 | 类型 | 说明 |
|----------|--------|----------------------------|
| problem | 字符串 | 问题或推理提示 |
| solution | 字符串 | 分步解释/最终答案 |
---
## 数据来源
Poseidon-Reasoning-Mini-300K是一款经精心整理的衍生数据集,其来源包括:
- **prithivMLmods/Poseidon-Reasoning-5M**
- **glaiveai/reasoning-v1-20m**
- **prithivMLmods/Open-Omega-Explora-2.5M**
- prithivMLmods贡献的自定义模块化数据集
所有数据源均经过筛选,以最大化数学、科学与编程领域推理技能相关数据的质量、清晰度与覆盖范围。
## 应用场景
这款轻量化数据集适配以下应用场景:
- 针对STEM领域与通用推理任务的大语言模型(Large Language Model, LLM)微调与评估
- 面向研究或教育类模型的快速基准测试
- 数学、编程与科学工具链的课程设计
- AI推理能力评估与诊断任务
---
## 引用说明
若使用本数据集,请引用如下内容:
Poseidon-Reasoning-Mini-300K by prithivMLmods
Derived and curated from:
- prithivMLmods/Poseidon-Reasoning-5M
- glaiveai/reasoning-v1-20m
- prithivMLmods/Open-Omega-Explora-2.5M
## 许可证
本数据集采用Apache-2.0许可证分发。使用前请务必核查底层源数据集的许可证以确保完全合规。
提供机构:
maas
创建时间:
2025-07-19



