DijkstraFTW/ianncity_KIMI-K2.5-1000000x
Source: Hugging Face. Updated: 2026-04-22. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/DijkstraFTW/ianncity_KIMI-K2.5-1000000x
Description:
KIMI-K2.5-1000000x is a dataset of 1,000,000 high-quality reasoning traces distilled from KIMI-K2.5. It covers Coding (50%), Science (20%), Math (15%), Computer Science (5%), Logical Questions (5%), and Creative Writing (5%), along with MultilingualSTEM (100k completions). The total token count is 5B. The data was collected with a modified version of Datagen over approximately 80 hours. The dataset is suited to text-generation and question-answering tasks, with an emphasis on reasoning, chain-of-thought, instruction tuning, and SFT.
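The figures in the description can be turned into rough per-category counts and averages. The following is a minimal arithmetic sketch, not part of the dataset or its tooling. It assumes the listed percentages apply to the 1,000,000-trace total; how the 100k MultilingualSTEM completions relate to that total is not stated in the description, so they are left out. All names in the code are hypothetical.

```python
# Figures quoted in the dataset description.
TOTAL_TRACES = 1_000_000
TOTAL_TOKENS = 5_000_000_000  # "5B"
COLLECTION_HOURS = 80         # "approximately 80 hours"

# Category shares in percent, as listed in the description.
# Assumption: each share is a fraction of TOTAL_TRACES.
SHARES_PCT = {
    "Coding": 50,
    "Science": 20,
    "Math": 15,
    "Computer Science": 5,
    "Logical Questions": 5,
    "Creative Writing": 5,
}


def traces_per_category(total, shares_pct):
    """Convert percentage shares into approximate trace counts."""
    return {name: total * pct // 100 for name, pct in shares_pct.items()}


counts = traces_per_category(TOTAL_TRACES, SHARES_PCT)
avg_tokens_per_trace = TOTAL_TOKENS / TOTAL_TRACES
traces_per_hour = TOTAL_TRACES / COLLECTION_HOURS

for name, count in counts.items():
    print(f"{name}: {count:,} traces")
print(f"Average tokens per trace: {avg_tokens_per_trace:,.0f}")
print(f"Approximate collection rate: {traces_per_hour:,.0f} traces/hour")
```

Under these assumptions the average works out to about 5,000 tokens per trace and the collection rate to about 12,500 traces per hour. Both are estimates derived from the rounded figures above, not measured values.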
Provider:
DijkstraFTW



