Modotte/CodeX-2M-Thinking
Source: Hugging Face. Updated 2026-02-10. Indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/Modotte/CodeX-2M-Thinking
Description:
CodeX-2M-Thinking is a curated coding dataset for instruction tuning and for fine-tuning existing models, intended to improve code generation and reasoning. It is fully synthetic and is presented as a large, thoroughly filtered corpus of coding data hosted on Hugging Face. The dataset contains 2 million examples that span programming topics from basic syntax to advanced software engineering. Responses include step-by-step reasoning, so the data suits instruction training that exposes a detailed thought process. Quality is maintained through multi-stage filtering and verification, and code execution and correctness are validated with automated testing frameworks.
Provider:
Modotte



