DeepCoder-Preview-Dataset
Source: ModelScope community. Updated 2026-05-13. Indexed 2025-04-12.
Download link:
https://modelscope.cn/datasets/agentica-org/DeepCoder-Preview-Dataset
Resource description:
## Data
Our training dataset consists of 24K problems paired with their test cases:
- 7.5K **TACO Verified** problems.
- 16K verified coding problems from **PrimeIntellect’s SYNTHETIC-1**.
- 600 **LiveCodeBench (v5)** problems submitted between **May 1, 2023** and **July 31, 2024**.
Our test dataset consists of:
- **LiveCodeBench (v5)** problems between **August 1, 2024** and **February 1, 2025**.
- **Codeforces** problems from `Qwen/CodeElo`.
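The LiveCodeBench date windows above can be sketched as a small filter. This is only an illustration, not the authors' pipeline: the function name `lcb_split` is hypothetical, and treating both ends of each window as inclusive is an assumption, since the card does not say.

```python
from datetime import date
from typing import Optional

# Date windows as stated in the dataset card.
# Assumption: both endpoints are inclusive (the card does not specify).
TRAIN_WINDOW = (date(2023, 5, 1), date(2024, 7, 31))
TEST_WINDOW = (date(2024, 8, 1), date(2025, 2, 1))


def lcb_split(submitted: date) -> Optional[str]:
    """Return 'train', 'test', or None for a LiveCodeBench problem date."""
    if TRAIN_WINDOW[0] <= submitted <= TRAIN_WINDOW[1]:
        return "train"
    if TEST_WINDOW[0] <= submitted <= TEST_WINDOW[1]:
        return "test"
    return None  # outside both windows


if __name__ == "__main__":
    print(lcb_split(date(2024, 7, 31)))  # last day of the training window
    print(lcb_split(date(2024, 8, 1)))   # first day of the test window
```

Because the two windows do not overlap, no LiveCodeBench problem can fall into both the training and the test set under this rule.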
## Format
Each row in the dataset contains:
- **problem**: The coding problem, usually extracted from competitive coding websites.
- **tests**: The test cases corresponding to the problem. We've ensured that all problems are fully verifiable and have >= 5 test cases.
Note that the individual source datasets carry different keys beyond `problem` and `tests`.
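As a rough illustration of the row format, the sketch below checks that a row has the two documented keys and at least 5 test cases. The card does not say how `tests` is encoded, so the sketch assumes it is either a list or a JSON-encoded string of a list. The helper names `parse_tests` and `check_row` and the sample row are hypothetical.

```python
import json
from typing import Any

MIN_TESTS = 5  # the card states every problem has >= 5 test cases


def parse_tests(raw: Any) -> list:
    """Return the test cases as a list.

    Assumption: `tests` is already a list or a JSON-encoded list.
    """
    tests = json.loads(raw) if isinstance(raw, str) else raw
    if not isinstance(tests, list):
        raise TypeError(f"expected a list of tests, got {type(tests).__name__}")
    return tests


def check_row(row: dict) -> int:
    """Validate one row and return its number of test cases."""
    for key in ("problem", "tests"):
        if key not in row:
            raise KeyError(f"row is missing required key {key!r}")
    tests = parse_tests(row["tests"])
    if len(tests) < MIN_TESTS:
        raise ValueError(f"only {len(tests)} test cases, expected >= {MIN_TESTS}")
    return len(tests)


if __name__ == "__main__":
    # Hypothetical row; the input/output layout of each test is made up.
    sample = {
        "problem": "Read an integer n and print n + 1.",
        "tests": json.dumps(
            [{"input": f"{i}\n", "output": f"{i + 1}\n"} for i in range(5)]
        ),
    }
    print(check_row(sample))
```

Any extra keys a particular source dataset adds are simply ignored by `check_row`, so the same check can be applied across all subsets.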
## Citation
```bibtex
@misc{deepcoder2025,
title={DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level},
author={Michael Luo and Sijun Tan and Roy Huang and Ameen Patel and Alpay Ariyak and Qingyang Wu and Xiaoxiang Shi and Rachel Xin and Colin Cai and Maurice Weber and Ce Zhang and Li Erran Li and Raluca Ada Popa and Ion Stoica},
howpublished={\url{https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51}},
note={Notion Blog},
year={2025}
}
```
Provider:
maas
Created:
2025-04-22



