
ansulev/KIMI-K2.5-550000x

Source: Hugging Face | Updated: 2026-04-03 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/ansulev/KIMI-K2.5-550000x
Resource description:
---
license: apache-2.0
language:
- en
size_categories:
- 100K<n<1M
task_categories:
- text-generation
- question-answering
tags:
- reasoning
- chain-of-thought
- instruction-tuning
- sft
configs:
- config_name: General-Distillation
  data_files:
  - split: train
    path: "kimi-k2.5-main.jsonl"
- config_name: PHD-Science
  data_files:
  - split: train
    path: "KimiK-2.5-PHD-Science.jsonl"
---

------------------------------------

# KIMI-K2.5-550000x

- 550,000 reasoning traces distilled from ```KIMI-K2.5``` on ```high``` reasoning

------------------------------------

- Distribution:

```
Coding: 60% (Includes: Webdev, Python, C++, Java, JS, C, Ruby, Lua, Rust, and C#)
Science: 15% (Physics, Chemistry, Biology) - 100k more completions in the PHD-Science subset
Math: 10% (Algebra, Calculus, Probability)
Computer Science: 5%
Logical Questions: 5%
Creative Writing: 5%
```

- Token Count: ```2B```

------------------------------------

> [!NOTE]
> ![image](https://cdn-uploads.huggingface.co/production/uploads/6909e3efe7497f3dbbe640ba/AT1xJ2uYHT7LjFsxewu-U.png)
>

------------------------------------

#### Data Collection

- Collected using a modified Datagen by [TeichAI](https://huggingface.co/TeichAI) <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/6837935ac3b7ffe0d2559ce9/-AxyvV4wfUY8uo87kNKkK.png" width="20" height="20" style="display: inline-block; vertical-align: middle; margin: 0 3px;">, over the course of about (20) hours

------------------------------------

#### hi - ianncity <img src="https://preview.redd.it/steam-happy-but-high-quality-v0-22ku6htw4u0c1.png?width=640&crop=smart&auto=webp&s=221735cb09dc3d4c1c7349e3187e752f6fe775e4" width="20" height="20" style="display: inline-block; vertical-align: middle; margin: 0 3px;">
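
To give a rough sense of scale, the distribution in the card above can be turned into approximate per-category counts. This is only a sketch under assumptions the card does not confirm: that the percentages apply to the 550,000 total, and that the 2B token count covers those same traces. The card does not say whether the 100k extra PHD-Science completions are included in either figure. The names `TOTAL_TRACES`, `TOTAL_TOKENS`, `DISTRIBUTION_PCT`, and `estimate_counts` are illustrative and not part of the dataset.

```python
# Figures copied from the dataset card; every derived number is an estimate.
TOTAL_TRACES = 550_000          # "550,000 reasoning traces"
TOTAL_TOKENS = 2_000_000_000    # "Token Count: 2B"

# Category shares in percent, as listed under "Distribution".
DISTRIBUTION_PCT = {
    "Coding": 60,
    "Science": 15,
    "Math": 10,
    "Computer Science": 5,
    "Logical Questions": 5,
    "Creative Writing": 5,
}


def estimate_counts(total, shares_pct):
    """Return approximate traces per category (integer arithmetic)."""
    return {name: total * pct // 100 for name, pct in shares_pct.items()}


# Assumption: the shares apply to the full 550,000 traces.
counts = estimate_counts(TOTAL_TRACES, DISTRIBUTION_PCT)

# Assumption: the 2B tokens are spread over those same traces.
avg_tokens = TOTAL_TOKENS / TOTAL_TRACES

for name, n in counts.items():
    print(f"{name}: ~{n:,} traces")
print(f"Average length: ~{avg_tokens:,.0f} tokens per trace")
```

Under these assumptions the average trace is a few thousand tokens long, which is worth keeping in mind when choosing a maximum sequence length for fine-tuning.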

Provider:
ansulev