ansulev/KIMI-K2.5-550000x

Name: ansulev/KIMI-K2.5-550000x
Creator: ansulev
Published: 2026-04-03 09:49:47
License: apache-2.0

Source: Hugging Face. Updated: 2026-04-03. Indexed: 2026-04-12.

Download link:

https://hf-mirror.com/datasets/ansulev/KIMI-K2.5-550000x


Resource description:

---
license: apache-2.0
language:
- en
size_categories:
- 100K<n<1M
task_categories:
- text-generation
- question-answering
tags:
- reasoning
- chain-of-thought
- instruction-tuning
- sft
configs:
- config_name: General-Distillation
  data_files:
  - split: train
    path: "kimi-k2.5-main.jsonl"
- config_name: PHD-Science
  data_files:
  - split: train
    path: "KimiK-2.5-PHD-Science.jsonl"
---

------------------------------------

# KIMI-K2.5-550000x

- 550,000 reasoning traces distilled from ```KIMI-K2.5``` at the ```high``` reasoning setting

------------------------------------

- Distribution:

```
Coding: 60% (Includes: Webdev, Python, C++, Java, JS, C, Ruby, Lua, Rust, and C#)
Science: 15% (Physics, Chemistry, Biology) - 100k more completions in the PHD-Science subset
Math: 10% (Algebra, Calculus, Probability)
Computer Science: 5%
Logical Questions: 5%
Creative Writing: 5%
```

- Token Count: ```2B```

------------------------------------

> [!NOTE]
> ![image](https://cdn-uploads.huggingface.co/production/uploads/6909e3efe7497f3dbbe640ba/AT1xJ2uYHT7LjFsxewu-U.png)

------------------------------------

#### Data Collection

- Collected using a modified Datagen by [TeichAI](https://huggingface.co/TeichAI) <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/6837935ac3b7ffe0d2559ce9/-AxyvV4wfUY8uo87kNKkK.png" width="20" height="20" style="display: inline-block; vertical-align: middle; margin: 0 3px;">, over the course of about 20 hours

------------------------------------

#### hi

- ianncity <img src="https://preview.redd.it/steam-happy-but-high-quality-v0-22ku6htw4u0c1.png?width=640&crop=smart&auto=webp&s=221735cb09dc3d4c1c7349e3187e752f6fe775e4" width="20" height="20" style="display: inline-block; vertical-align: middle; margin: 0 3px;">
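The `configs` entries in the card map each config name to a single JSONL file. The card does not document the record schema, so the following is only a minimal sketch for inspecting a data file after it has been downloaded locally. It uses the Python standard library and assumes no field names. The helper names `iter_jsonl` and `peek_keys`, and the local file location, are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path

# Config name -> data file, copied from the card's YAML front matter.
CONFIG_FILES = {
    "General-Distillation": "kimi-k2.5-main.jsonl",
    "PHD-Science": "KimiK-2.5-PHD-Science.jsonl",
}


def iter_jsonl(path, limit=None):
    """Yield parsed JSON values from a JSONL file, one per non-empty line."""
    yielded = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if limit is not None and yielded >= limit:
                break
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            yield json.loads(line)
            yielded += 1


def peek_keys(path, limit=100):
    """Count which top-level keys appear in the first `limit` records."""
    keys = Counter()
    for record in iter_jsonl(path, limit=limit):
        if isinstance(record, dict):
            keys.update(record.keys())
    return keys


if __name__ == "__main__":
    # Assumption: the file was saved to the current directory.
    local_file = Path(CONFIG_FILES["General-Distillation"])
    if local_file.exists():
        print(peek_keys(local_file))
    else:
        print(f"{local_file} not found; download it first")
```

Reading line by line keeps memory use flat, which matters for a file holding hundreds of thousands of long reasoning traces. Checking the observed keys first avoids guessing at a schema the card does not state.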

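The distribution percentages can be turned into rough per-category counts. This is a back-of-the-envelope sketch under two assumptions: the percentages apply to the 550,000 traces named in the title, and "2B" means roughly 2 × 10^9 tokens over those same traces. The card does not say whether the 100k extra PHD-Science completions are included in either figure, so they are left out here.

```python
# Figures taken from the card; how they relate to each other is an assumption.
TOTAL_TRACES = 550_000
TOTAL_TOKENS = 2_000_000_000  # "2B", treated as approximate

DISTRIBUTION_PERCENT = {
    "Coding": 60,
    "Science": 15,
    "Math": 10,
    "Computer Science": 5,
    "Logical Questions": 5,
    "Creative Writing": 5,
}


def estimated_counts(total, percents):
    """Split `total` by integer percentages (floor division)."""
    return {name: total * pct // 100 for name, pct in percents.items()}


counts = estimated_counts(TOTAL_TRACES, DISTRIBUTION_PERCENT)
mean_tokens_per_trace = TOTAL_TOKENS / TOTAL_TRACES

for name, count in counts.items():
    print(f"{name}: ~{count:,} traces")
print(f"Mean tokens per trace: ~{mean_tokens_per_trace:,.0f}")
```

Under these assumptions Coding comes to about 330,000 traces and the mean trace length to about 3,600 tokens. Both are estimates, not figures reported by the card.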

Provider:

ansulev
