LAP2-K-Think-v1.a
收藏魔搭社区2025-12-03 更新2025-12-06 收录
下载链接:
https://modelscope.cn/datasets/prithivMLmods/LAP2-K-Think-v1.a
下载链接
链接失效反馈官方服务:
资源简介:

# **LAP2-K-Think-v1.a**
> The **LAP2-K-Think-v1.a** dataset, curated by **prithivMLmods** and available on Hugging Face, is a specialized reasoning dataset focused on **coding-based mathematics**, algorithmic problem solving, and code-x style thinking. It features a macro-mixture of coding and math-related problems. This dataset contains approximately **257,110 rows** in **Parquet** format, enabling efficient storage and high-performance training. Each entry includes a challenging problem statement with a detailed reasoning-based solution, suitable for **training**, **fine-tuning**, and **evaluating advanced models** in coding intelligence and math reasoning.
## Quick Start with Hugging Face Datasets
```bash
pip install -U datasets
```
```python
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/LAP2-K-Think-v1.a", split="train")
```
---
## Dataset Summary
| Feature | Details |
| ------------ | ---------------------------------------------- |
| **Rows** | ~257K |
| **Size[partial]** | ~2.23 GB |
| **Format** | Parquet |
| **Language** | English |
| **License** | Apache-2.0 |
| **Domains** | Code reasoning, algorithmic math, code-x tasks |
---
## Data Columns
* **problem**: Math-based coding or algorithmic challenge prompts
* **solution**: Step-by-step reasoning and code-aligned answers
---
## Data Sources
This version primarily aggregates:
* **Xen-Arc AI CodeX-2M-Thinking** [Small traces, depending on the specific problem] → Code-x style reasoning and algorithmic prompts
* **Custom math-coding problems** curated for structured logic alignment [prithivMLmods/Gargantua-R1-Wee](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Wee)
## Why This Dataset?
* Excellent for **code-aware reasoning models**
* Provides **thought-traces** enabling procedural logic learning
* Great benchmark for:
* Coding assistants
* Math-focused LLMs
* Instruction-tuned reasoning models
## Intended Use Cases
* Fine-tuning LLMs for competitive programming tasks
* Training models on strong trace-based reasoning
* Automated tutoring systems focused on coding + math
* Evaluation of algorithmic understanding in AI agents
## Maintainer
| Maintained by | Last Updated |
| --------------------------------------------------------- | ------------ |
| **[prithivMLmods](https://huggingface.co/prithivMLmods)** | **Nov 2025** |

# **LAP2-K-Think-v1.a**
> 由**prithivMLmods**整理、可在Hugging Face平台获取的**LAP2-K-Think-v1.a**数据集,是一款专注于**基于编码的数学(coding-based mathematics)**、算法问题求解与**代码式思维(code-x style thinking)**的专用推理数据集。该数据集采用宏观混合模式,整合编码与数学相关问题,总计约**257,110条数据**,存储格式为**Parquet**,可实现高效存储与高性能训练。每条数据均包含带有详细推理过程的挑战性问题描述与解决方案,适用于**训练**、**微调**以及**评估编码智能与数学推理领域的高级模型**。
## Hugging Face 数据集快速上手
bash
pip install -U datasets
python
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/LAP2-K-Think-v1.a", split="train")
---
## 数据集概览
| 特征项 | 详情 |
| ------------ | ---------------------------------------------- |
| **数据条数** | 约25.7万条 |
| **[部分]大小** | 约2.23 GB |
| **存储格式** | Parquet |
| **语言** | 英语 |
| **授权协议** | Apache-2.0 |
| **应用领域** | 代码推理、算法数学、代码式任务(code-x tasks) |
---
## 数据字段
* **问题(problem)**: 基于数学的编码或算法挑战提示语
* **解决方案(solution)**: 包含逐步骤推理过程且与代码逻辑对齐的答案
---
## 数据来源
本版本主要整合了以下资源:
* **Xen-Arc AI CodeX-2M-Thinking** [少量样本,视具体问题而定] → 代码式思维推理与算法提示语
* 为实现结构化逻辑对齐而整理的**自定义数学编码问题**(源自数据集prithivMLmods/Gargantua-R1-Wee,链接:https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Wee)
---
## 该数据集的优势
* 适配**代码感知推理模型**
* 提供**思维轨迹(thought-traces)**,支持过程性逻辑学习
* 可作为以下场景的优质基准测试集:
* 编码助手
* 专注数学推理的大语言模型(Large Language Model,LLM)
* 经过指令微调的推理模型
---
## 预期应用场景
* 针对竞赛编程任务微调大语言模型
* 训练具备强轨迹推理能力的模型
* 聚焦编码与数学领域的自动化辅导系统
* 评估AI智能体(AI Agent)的算法理解能力
---
## 维护者
| 维护方 | 最后更新时间 |
| --------------------------------------------------------- | ------------ |
| **[prithivMLmods](https://huggingface.co/prithivMLmods)** | **2025年11月** |
提供机构:
maas
创建时间:
2025-11-27



