Code-Contests-Plus

ModelScope community · Updated 2026-01-06 · Indexed 2025-06-14
Download link:
https://modelscope.cn/datasets/ByteDance-Seed/Code-Contests-Plus
Resource description:
<div align="center"> <h1>CodeContests<sup>+</sup>: A Competitive Programming Dataset with High-Quality Test Cases</h1> </div>

<div align="center" style="line-height: 1;">
<a href="https://arxiv.org/abs/2506.05817" target="_blank" style="margin: 2px;"> <img alt="2506.05817" src="https://img.shields.io/badge/arXiv-2506.05817-red?logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a>
<a href="https://huggingface.co/datasets/ByteDance-Seed" target="_blank" style="margin: 2px;"> <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-ByteDance%20Seed-536af5" style="display: inline-block; vertical-align: middle;"/> </a>
<a href="https://huggingface.co/datasets/ByteDance-Seed/Code-Contests-Plus/blob/main/LICENSE" style="margin: 2px;"> <img alt="Dataset License" src="https://img.shields.io/badge/Dataset_License-CC--BY--4.0-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> </a>
<a href="https://github.com/bytedance/SandboxFusion/blob/main/LICENSE" style="margin: 2px;"> <img alt="Sandbox License" src="https://img.shields.io/badge/Sandbox_License-Apache--2.0-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> </a>
</div>

## Introduction

CodeContests<sup>+</sup> is a competitive programming problem dataset built upon [CodeContests](https://huggingface.co/datasets/deepmind/code_contests). It includes 11,690 competitive programming problems, along with corresponding high-quality test cases, test case generators, test case validators, output checkers, and more than 13 million correct and incorrect solutions.

## Highlights

**High Quality Test Cases:** We developed a Generator-Validator Agent System that can construct high-quality test cases for each problem. In addition to random test cases, it also generates special test cases tailored to the problem's characteristics and various corner cases, aiming to cover as many potential errors as possible.
Furthermore, the correctness of these test cases is verified by an independent test case validator to ensure they comply with the problem constraints.

**Test Case Generators:** We provide a test case generator for each problem, along with the commands to run it. These commands can be run repeatedly to produce as many test cases as needed. This lets users see exactly how every test case is constructed and generate additional test cases on demand.

**Flexible Number of Test Cases:** We also provide pre-generated test cases, available in five versions: 1x, 2x, ..., 5x. The number of test cases increases from one version to the next, and so do the computational resources required to run them. This allows users to strike a balance between computational cost and coverage according to their needs.

**Test Case Validators:** Competitive programming problems usually specify many constraints on the input data itself, including data ranges, format requirements, data structure requirements, and so on. Constructing fully valid test cases is therefore not an easy task, and even professional problem setters can easily make mistakes. For each problem, we provide a test case validator that strictly checks whether the test case input satisfies all constraints outlined in the problem description, to ensure the validity of the test cases as much as possible.

**Output Checkers for Multiple Answer Problems:** In programming competitions, problems with multiple valid solutions are very common: the same input can correspond to several valid outputs. Correctness therefore cannot be determined simply by comparing the program's output with a single, pre-defined correct answer. For this reason, we provide custom output checkers for all such problems to verify the correctness of the output.
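To make the validator and checker roles described above concrete, here is a minimal Python sketch for a hypothetical problem that is not taken from the dataset: given an integer n with 2 ≤ n ≤ 10^9, print any two positive integers a and b with a + b = n. The function names `validate_input` and `check_output`, the constraints, and the problem itself are assumptions made for illustration; the validators and checkers shipped with the dataset may be written in a different language and follow different conventions.

```python
import re

# Hypothetical constraints for the illustrative problem (not from the dataset).
N_MIN, N_MAX = 2, 10**9


def validate_input(input_text: str) -> bool:
    """Strictly validate a test input: one integer without leading zeros,
    followed by exactly one newline, and within the allowed range."""
    if re.fullmatch(r"[1-9][0-9]*\n", input_text) is None:
        return False
    return N_MIN <= int(input_text) <= N_MAX


def check_output(input_text: str, output_text: str) -> bool:
    """Accept any output made of two positive integers that sum to n.

    Many different outputs are valid for the same input, so the output is
    verified against the problem statement rather than compared with a
    single reference answer."""
    n = int(input_text.split()[0])
    tokens = output_text.split()
    if len(tokens) != 2:
        return False
    try:
        a, b = int(tokens[0]), int(tokens[1])
    except ValueError:
        return False
    return a >= 1 and b >= 1 and a + b == n


if __name__ == "__main__":
    print(validate_input("10\n"))          # well-formed input
    print(check_output("10\n", "3 7\n"))   # one valid answer
    print(check_output("10\n", "1 9\n"))   # another valid answer
    print(check_output("10\n", "0 10\n"))  # rejected: a must be positive
```

The validator is deliberately strict about whitespace and leading zeros, since the text above stresses format requirements as well as value ranges; the checker is deliberately lenient about whitespace in the contestant's output, which is a design choice of this sketch rather than something the dataset specifies.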
**Rigorous Evaluation:** To rigorously evaluate the quality of these test cases, we assessed their accuracy using a large number of solutions. For each problem, we used 100 correct solutions and 100 incorrect solutions to determine if the test cases could correctly distinguish between correct and incorrect submissions. We have recorded the evaluation results, including True Positive Rate (TPR) and True Negative Rate (TNR), in the dataset. Additionally, based on these results, we selected a high-quality subset from the full dataset, named [CodeContests<sup>+</sup>Verified](https://huggingface.co/datasets/ByteDance-Seed/Code-Contests-Plus-Verified), in which the TPR and TNR for each problem are both above 0.9. Users can apply their own filtering if they require a looser or stricter threshold.

## Quickstart

Load the dataset without test cases:

```python
from datasets import load_dataset

# Login using e.g. `huggingface-cli login` to access this dataset
ds = load_dataset("ByteDance-Seed/Code-Contests-Plus", "default")
```

Load the dataset with `1x` test cases:

```python
from datasets import load_dataset

# Login using e.g. `huggingface-cli login` to access this dataset
ds = load_dataset("ByteDance-Seed/Code-Contests-Plus", "1x")
```

This dataset has 6 subsets, namely `default`, `1x`, `2x`, `3x`, `4x`, and `5x`. The problems in these subsets are the same. The only difference is the number of test cases. The table below presents the average number of test cases per problem in each subset.

| Subset            | default | 1x | 2x | 3x | 4x | 5x |
|-------------------|---------|----|----|----|----|----|
| Avg. # test cases | 0       | 25 | 44 | 62 | 80 | 98 |

## Usage

We recommend using CodeContests<sup>+</sup> with [SandboxFusion](https://github.com/bytedance/SandboxFusion).
SandboxFusion supports automatic evaluation on 10+ open-source datasets, including CodeContests<sup>+</sup>, LiveCodeBench, HumanEval, MBPP, and MHPP, and on 20+ programming languages, including C++, Python (GPU supported), C#, Go, Java, NodeJS, TypeScript, Kotlin, Rust, Bash, PHP, and even Verilog.

## Evaluation Result

![Fig](https://huggingface.co/datasets/ByteDance-Seed/Code-Contests-Plus/resolve/main/result.png)

The figure above shows histograms of the TPR and TNR of problems from (a) CodeContests and (b) CodeContests<sup>+</sup>. For more details of our evaluation, please refer to our [paper](https://arxiv.org/abs/2506.05817).

## License

This project is licensed under CC-BY-4.0. See the [LICENSE file](https://huggingface.co/datasets/ByteDance-Seed/Code-Contests-Plus/blob/main/LICENSE) for details.

## Citation

```
@inproceedings{wang-etal-2025-codecontests,
    title = "{C}ode{C}ontests+: High-Quality Test Case Generation for Competitive Programming",
    author = "Wang, Zihan and Liu, Siyao and Sun, Yang and Ding, Ming and Li, Hongyan",
    editor = "Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.299/",
    pages = "5576--5600",
    ISBN = "979-8-89176-335-7"
}
```

Provider:
maas
Created:
2025-06-06
Collected summary
Dataset introduction
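Background and challenges
Background overview
Code-Contests-Plus is a competitive programming dataset built on CodeContests. It contains 11,690 programming problems with high-quality test cases, generators, validators, and output checkers, along with more than 13 million solutions. Its test cases are produced by the Generator-Validator Agent System to cover a wide range of errors, it offers a flexible number of test cases (versions 1x through 5x), and it records rigorous evaluation metrics such as TPR and TNR, supporting evaluation needs in programming contests and algorithm research.
The above content was collected and summarized by 遇见数据集.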
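As a rough illustration of the TPR and TNR metrics mentioned in the overview, the sketch below computes both for a single problem from per-solution verdicts and applies a threshold filter. It assumes that TPR is the fraction of correct solutions that the test cases accept and that TNR is the fraction of incorrect solutions they reject; the function names and the boolean verdict representation are hypothetical and do not reflect the dataset's actual field names.

```python
from typing import Sequence, Tuple


def tpr_tnr(
    correct_verdicts: Sequence[bool],
    incorrect_verdicts: Sequence[bool],
) -> Tuple[float, float]:
    """Compute (TPR, TNR) for one problem.

    Each verdict is True when the test cases accepted the solution.
    correct_verdicts: verdicts for solutions known to be correct.
    incorrect_verdicts: verdicts for solutions known to be incorrect.
    """
    if not correct_verdicts or not incorrect_verdicts:
        raise ValueError("both verdict lists must be non-empty")
    # Assumed definition: correct solutions that are accepted.
    tpr = sum(1 for v in correct_verdicts if v) / len(correct_verdicts)
    # Assumed definition: incorrect solutions that are rejected.
    tnr = sum(1 for v in incorrect_verdicts if not v) / len(incorrect_verdicts)
    return tpr, tnr


def passes_threshold(tpr: float, tnr: float, threshold: float = 0.9) -> bool:
    """Keep a problem only when both rates are strictly above the threshold."""
    return tpr > threshold and tnr > threshold


if __name__ == "__main__":
    # 100 correct and 100 incorrect solutions, mirroring the evaluation setup.
    correct = [True] * 95 + [False] * 5      # 5 correct solutions wrongly rejected
    incorrect = [False] * 92 + [True] * 8    # 8 incorrect solutions wrongly accepted
    tpr, tnr = tpr_tnr(correct, incorrect)
    print(f"TPR={tpr:.2f} TNR={tnr:.2f} keep={passes_threshold(tpr, tnr)}")
```

Raising or lowering `threshold` corresponds to the stricter or looser filtering that the dataset description leaves to users.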