five

AmanPriyanshu/reasoning-sft-NextCoderDataset-100K

收藏
Hugging Face2026-03-03 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/AmanPriyanshu/reasoning-sft-NextCoderDataset-100K
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: mit task_categories: - text-generation language: - en - code tags: - reasoning - sft - chain-of-thought - code-editing - code size_categories: - 100K<n<1M --- # NextCoderDataset (converted) Converted version of [microsoft/NextCoderDataset](https://huggingface.co/datasets/microsoft/NextCoderDataset), subsampled to 100,000 rows equally distributed across 8 programming languages for reasoning SFT training. ## Format Each row has three columns: - **`input`** - list of dicts with system and user messages (system prompt sets expert code editor role, user prompt contains the editing instruction and original code) - **`response`** - response string with `<think>` reasoning block followed by the edited code in markdown code blocks - **`source`** - programming language (cpp, c, rust, java, javascript, python, go, kotlin) ## Language Distribution | Language | Rows | |----------|------| | c | 12,500 | | cpp | 12,500 | | go | 12,500 | | java | 12,500 | | javascript | 12,500 | | kotlin | 12,500 | | python | 12,500 | | rust | 12,500 | ## Conversion - Subsampled 12,500 rows per language from the original 381K dataset - Added system prompt with expert code editor role - Injected generic code-editing reasoning sequences in think blocks - Response format: think block then edited code ## License MIT ## Credits Original dataset: [microsoft/NextCoderDataset](https://huggingface.co/datasets/microsoft/NextCoderDataset) - NextCoder: Robust Adaptation of Code LMs to Diverse Code Edits (ICML 2025)
提供机构:
AmanPriyanshu
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作