Helios-R-6M
ModelScope community. Updated: 2026-01-02. Indexed: 2025-08-02.
Download link:
https://modelscope.cn/datasets/prithivMLmods/Helios-R-6M
Resource description:

# Helios-R-6M
> **Helios-R-6M** is a high-quality, compact reasoning dataset designed to strengthen multi-step problem solving across mathematics, computer science, and scientific inquiry. While the dataset covers a range of disciplines, math constitutes the largest share of examples and drives the reasoning complexity.
---
## Quick Start with Hugging Face Datasets🤗
```bash
pip install -U datasets
```
```py
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/Helios-R-6M", split="train")
```
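
With roughly 6 million samples, downloading the full split just to inspect a few rows can be slow. The sketch below assumes the `streaming=True` option of `load_dataset` is acceptable for your use case; `peek` and `stream_helios` are hypothetical helper names introduced here for illustration, not part of the dataset or the library.

```python
from itertools import islice


def peek(rows, n=3):
    """Return the first n items of any iterable of samples."""
    return list(islice(rows, n))


def stream_helios():
    """Open the train split lazily instead of downloading it in full."""
    # Imported here so that `peek` can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "prithivMLmods/Helios-R-6M", split="train", streaming=True
    )
```

A typical call would be `for row in peek(stream_helios()): print(row["problem"][:80])`, which fetches only the first few records over the network.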
---
## Overview
* **Total Samples**: ~6,186,188
* **Split**: `train` only
* **Languages**: English
* **Format**: Apache Arrow (auto-converted to Parquet)
* **License**: Apache-2.0
* **Tags**: `math`, `code`, `science`, `reasoning`, `thinking`, `cot`
## Highlights
* Focused on generating thoughtful, chain-of-thought solutions with logical progression and clarity.
* Includes both formal mathematical problems and abstract scientific or logical challenges.
* Suitable for model pretraining and fine-tuning in STEM and reasoning-intensive domains.
## Dataset Structure
Each sample in the dataset includes:
* **`problem`** (string): A question or prompt from math, science, logic, or coding.
* **`solution`** (string): A structured, step-by-step reasoning trace usually starting with `<think>`, simulating an instructive problem-solving mindset.
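
For training or evaluation it can help to separate the reasoning trace from the final answer. The following is a minimal sketch, assuming the trace is closed by a `</think>` tag; the card only mentions the opening `<think>` tag, so the closing tag and the helper name `split_solution` are assumptions to check against real samples.

```python
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"  # assumption: the card only documents the opening tag


def split_solution(solution: str) -> tuple[str, str]:
    """Split a `solution` string into (reasoning, answer).

    Samples that do not start with <think> are returned as answer only,
    since the card says the tag is usual, not guaranteed.
    """
    text = solution.strip()
    if not text.startswith(THINK_OPEN):
        return "", text
    body = text[len(THINK_OPEN):]
    reasoning, sep, answer = body.partition(THINK_CLOSE)
    if not sep:
        # Opening tag without a closing tag: treat everything as reasoning.
        return body.strip(), ""
    return reasoning.strip(), answer.strip()
```

For example, `split_solution(row["solution"])` on a loaded sample yields a pair that can be stored as two separate columns.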
---
## Source & Derivation
Helios-R-6M is a refined and optimized blend derived from:
* [prithivMLmods/Poseidon-Reasoning-5M](https://huggingface.co/datasets/prithivMLmods/Poseidon-Reasoning-5M)
* [glaiveai/reasoning-v1-20m](https://huggingface.co/datasets/glaiveai/reasoning-v1-20m)
* [prithivMLmods/Open-Omega-Explora-2.5M](https://huggingface.co/datasets/prithivMLmods/Open-Omega-Explora-2.5M)
* Additional custom modular problems curated and contributed by **prithivMLmods**
This combination allows for both depth and breadth across technical and academic disciplines, with special attention to multi-domain reasoning performance.
## License
Apache License 2.0

Provider:
maas
Created:
2025-07-24



