ansulev/KIMI-K2.5-550000x
Source: Hugging Face. Updated 2026-04-03; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/ansulev/KIMI-K2.5-550000x
Dataset description:
---
license: apache-2.0
language:
- en
size_categories:
- 100K<n<1M
task_categories:
- text-generation
- question-answering
tags:
- reasoning
- chain-of-thought
- instruction-tuning
- sft
configs:
- config_name: General-Distillation
  data_files:
  - split: train
    path: "kimi-k2.5-main.jsonl"
- config_name: PHD-Science
  data_files:
  - split: train
    path: "KimiK-2.5-PHD-Science.jsonl"
---
------------------------------------
# KIMI-K2.5-550000x
- 550,000 reasoning traces distilled from ```KIMI-K2.5``` with reasoning effort set to ```high```
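
The card's metadata declares two configs, `General-Distillation` and `PHD-Science`, each with a single `train` split. A minimal sketch of loading one of them with the Hugging Face `datasets` library follows. The repository id is taken from this page's title and is an assumption, as are the helper names `load_kwargs` and `peek_first_record`. The record schema is not documented here, so the sketch only returns the first record for inspection rather than assuming field names.

```python
from typing import Any, Dict

# Assumption: repository id as shown in this page's title.
REPO_ID = "ansulev/KIMI-K2.5-550000x"

# Config names and backing files as declared in the card's metadata.
CONFIGS = {
    "General-Distillation": "kimi-k2.5-main.jsonl",
    "PHD-Science": "KimiK-2.5-PHD-Science.jsonl",
}


def load_kwargs(config_name: str, streaming: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for datasets.load_dataset (hypothetical helper)."""
    if config_name not in CONFIGS:
        raise ValueError(f"unknown config {config_name!r}; expected one of {sorted(CONFIGS)}")
    return {
        "path": REPO_ID,
        "name": config_name,
        "split": "train",        # the only split the card declares
        "streaming": streaming,  # stream to avoid downloading the whole file up front
    }


def peek_first_record(config_name: str) -> Dict[str, Any]:
    """Return the first record of a config so its fields can be inspected."""
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(**load_kwargs(config_name))
    return next(iter(dataset))
```

Calling `peek_first_record("General-Distillation")` requires network access and the `datasets` package; printing the keys of the returned record is a quick way to see the actual schema before building a training pipeline on it.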
------------------------------------
- Distribution:
```
Coding: 60% (Includes: Webdev, Python, C++, Java, JS, C, Ruby, Lua, Rust, and C#)
Science: 15% (Physics, Chemistry, Biology) - 100k more completions in the PHD-Science subset
Math: 10% (Algebra, Calculus, Probability)
Computer Science: 5%
Logical Questions: 5%
Creative Writing: 5%
```
- Token Count: ```2B```
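
As a rough sanity check, the percentages and totals above can be turned into approximate per-category counts and an average trace length. This is only a sketch: it assumes the percentages apply to the 550,000 headline figure and that the 2B token count covers those same traces. The card does not say whether either number includes the extra 100k PHD-Science completions.

```python
# Figures stated in the card.
TOTAL_TRACES = 550_000
TOTAL_TOKENS = 2_000_000_000  # "2B"

# Category shares in percent, as listed in the distribution.
SHARES_PERCENT = {
    "Coding": 60,
    "Science": 15,
    "Math": 10,
    "Computer Science": 5,
    "Logical Questions": 5,
    "Creative Writing": 5,
}


def approximate_counts(total: int, shares: dict) -> dict:
    """Convert percentage shares into approximate trace counts."""
    return {name: total * percent // 100 for name, percent in shares.items()}


counts = approximate_counts(TOTAL_TRACES, SHARES_PERCENT)
average_tokens_per_trace = TOTAL_TOKENS / TOTAL_TRACES

for name, count in counts.items():
    print(f"{name}: ~{count:,} traces")
print(f"Average length: ~{average_tokens_per_trace:,.0f} tokens per trace")
```

Under these assumptions the coding share comes to about 330,000 traces, and the average trace is on the order of 3,600 tokens.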
------------------------------------
#### Data Collection
- Collected using a modified Datagen by [TeichAI](https://huggingface.co/TeichAI) <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/6837935ac3b7ffe0d2559ce9/-AxyvV4wfUY8uo87kNKkK.png" width="20" height="20" style="display: inline-block; vertical-align: middle; margin: 0 3px;">, over the course of about 20 hours
------------------------------------
#### hi - ianncity <img src="https://preview.redd.it/steam-happy-but-high-quality-v0-22ku6htw4u0c1.png?width=640&crop=smart&auto=webp&s=221735cb09dc3d4c1c7349e3187e752f6fe775e4" width="20" height="20" style="display: inline-block; vertical-align: middle; margin: 0 3px;">
Provider: ansulev



