
IIGroup/X-Coder-SFT-376k

Source: Hugging Face. Updated 2026-02-07; indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/IIGroup/X-Coder-SFT-376k
Overview:
X-Coder-SFT-376k is a large-scale, fully synthetic dataset for advancing competitive programming. It comprises four subsets totaling 887,321 synthetic records across 423,883 unique queries. It is designed for supervised fine-tuning and is suited to cold-start training of code reasoning foundations. The dataset is curated with state-of-the-art reasoning models: queries are synthesized by OpenAI GPT-o3-mini, and solutions are generated by DeepSeek-R1-0528 and Qwen3-235B-A22B-Thinking-2507. The four subsets are unique-prompt, multiple-solution, hybrid, and verified, each with its own purpose and characteristics.
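As a rough illustration of how the subsets described above might be accessed, here is a minimal Python sketch using the Hugging Face `datasets` library. The repository ID comes from the download link; the subset labels are taken from the description and are only assumed to match the configuration names on the Hub, and the `train` split name is likewise an assumption. The helper `records_per_query` simply restates the arithmetic implied by the reported totals.

```python
REPO_ID = "IIGroup/X-Coder-SFT-376k"

# Subset labels as written in the description. Assumption: the actual
# configuration names on the Hub may be spelled differently.
SUBSETS = ("unique-prompt", "multiple-solution", "hybrid", "verified")

# Totals reported in the description.
TOTAL_RECORDS = 887_321
UNIQUE_QUERIES = 423_883


def records_per_query(records: int = TOTAL_RECORDS,
                      queries: int = UNIQUE_QUERIES) -> float:
    """Average number of records per unique query across all subsets."""
    return records / queries


def load_subset(name: str, split: str = "train", streaming: bool = True):
    """Load one subset by its (assumed) configuration name.

    Streaming avoids downloading the whole subset up front.
    The split name "train" is an assumption, not confirmed by the source.
    """
    if name not in SUBSETS:
        raise ValueError(f"unknown subset {name!r}; expected one of {SUBSETS}")
    # Imported lazily so the arithmetic helper works without the library.
    from datasets import load_dataset
    return load_dataset(REPO_ID, name=name, split=split, streaming=streaming)


if __name__ == "__main__":
    print(f"about {records_per_query():.2f} records per unique query")
```

If the configuration names differ, `datasets.get_dataset_config_names(REPO_ID)` can be used to list the ones actually published.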
Provider:
IIGroup