Helios-R-6M

Name: Helios-R-6M
Creator: maas
Published: 2026-01-02 16:41:59
License: No description provided

ModelScope community, updated 2026-01-02, indexed 2025-08-02

Download link:

https://modelscope.cn/datasets/prithivMLmods/Helios-R-6M

Resource overview:

![1.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/jK4cfyvdP9XE_HjcNgErh.png)

# Helios-R-6M

> **Helios-R-6M** is a high-quality, compact reasoning dataset designed to strengthen multi-step problem solving across mathematics, computer science, and scientific inquiry. While the dataset covers a range of disciplines, math constitutes the largest share of examples and drives the reasoning complexity.

---

## Quick Start with Hugging Face Datasets🤗

```bash
pip install -U datasets
```

```py
from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Helios-R-6M", split="train")
```

---

## Overview

* **Total Samples**: ~6,186,188
* **Split**: `train` only
* **Languages**: English
* **Format**: Apache Arrow (auto-converted to Parquet)
* **License**: Apache-2.0
* **Tags**: `math`, `code`, `science`, `reasoning`, `thinking`, `cot`

## Highlights

* Focused on generating thoughtful, chain-of-thought solutions with logical progression and clarity.
* Includes both formal mathematical problems and abstract scientific or logical challenges.
* Suitable for model pretraining and fine-tuning in STEM and reasoning-intensive domains.

## Dataset Structure

Each sample in the dataset includes:

* **`problem`** (string): A question or prompt from math, science, logic, or coding.
* **`solution`** (string): A structured, step-by-step reasoning trace usually starting with `<think>`, simulating an instructive problem-solving mindset.
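
Because `solution` usually opens with a `<think>` tag, it can be convenient to separate the reasoning trace from whatever follows it. The sketch below is one possible way to do that, not part of the dataset's tooling. It assumes the trace is closed by a matching `</think>` tag, which the card does not state; the helper name `split_solution` and the sample record are hypothetical.

```python
from typing import Tuple

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"  # assumption: the card only mentions the opening tag


def split_solution(solution: str) -> Tuple[str, str]:
    """Split a `solution` string into (reasoning, answer)."""
    text = solution.strip()
    if not text.startswith(THINK_OPEN):
        # No leading <think> tag: treat the whole string as the answer.
        return "", text
    body = text[len(THINK_OPEN):]
    reasoning, sep, answer = body.partition(THINK_CLOSE)
    if not sep:
        # Opening tag without a closing tag: everything is reasoning.
        return body.strip(), ""
    return reasoning.strip(), answer.strip()


# Hypothetical record for illustration, not taken from the dataset.
sample = {
    "problem": "What is 2 + 3?",
    "solution": "<think>2 + 3 = 5</think>\nThe answer is 5.",
}
reasoning, answer = split_solution(sample["solution"])
print("reasoning:", reasoning)
print("answer:", answer)
```

With the `dataset` object from the quick start, the same helper could be applied to `dataset[0]["solution"]`. Samples that do not follow the assumed tag layout fall through to the two fallback branches rather than raising an error.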

---

## Source & Derivation

Helios-R-6M is a refined and optimized blend derived from:

* [prithivMLmods/Poseidon-Reasoning-5M](https://huggingface.co/datasets/prithivMLmods/Poseidon-Reasoning-5M)
* [glaiveai/reasoning-v1-20m](https://huggingface.co/datasets/glaiveai/reasoning-v1-20m)
* [prithivMLmods/Open-Omega-Explora-2.5M](https://huggingface.co/datasets/prithivMLmods/Open-Omega-Explora-2.5M)
* Additional custom modular problems curated and contributed by **prithivMLmods**

This combination allows for both depth and breadth across technical and academic disciplines, with special attention to multi-domain reasoning performance.

## License

Apache License 2.0

Provider: maas

Created: 2025-07-24
