Code-Contests-Plus
ModelScope: updated 2026-01-06, indexed 2025-06-14
Download link: https://modelscope.cn/datasets/ByteDance-Seed/Code-Contests-Plus
<div align="center">
<h1>CodeContests<sup>+</sup>: A Competitive Programming Dataset with High-Quality Test Cases</h1>
</div>
<div align="center" style="line-height: 1;">
<a href="https://arxiv.org/abs/2506.05817" target="_blank" style="margin: 2px;">
<img alt="2506.05817" src="https://img.shields.io/badge/arXiv-2506.05817-red?logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://huggingface.co/datasets/ByteDance-Seed" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-ByteDance%20Seed-536af5" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://huggingface.co/datasets/ByteDance-Seed/Code-Contests-Plus/blob/main/LICENSE" style="margin: 2px;">
<img alt="Dataset License" src="https://img.shields.io/badge/Dataset_License-CC--BY--4.0-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/bytedance/SandboxFusion/blob/main/LICENSE" style="margin: 2px;">
<img alt="Sandbox License" src="https://img.shields.io/badge/Sandbox_License-Apache--2.0-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
## Introduction
CodeContests<sup>+</sup> is a competitive programming problem dataset built upon [CodeContests](https://huggingface.co/datasets/deepmind/code_contests). It includes 11,690 competitive programming problems, along with corresponding high-quality test cases, test case generators, test case validators, output checkers, and more than 13 million correct and incorrect solutions.
## Highlights
**High Quality Test Cases:** We developed a Generator-Validator Agent System that can construct high-quality test cases for each problem. In addition to random test cases, it also generates special test cases tailored to the problem's characteristics and various corner cases, aiming to cover as many potential errors as possible. Furthermore, the correctness of these test cases is verified by an independent test case validator to ensure they comply with the problem constraints.
**Test Case Generators:** We provide a test case generator for each problem, along with the commands to run it. These commands can be run multiple times to produce an infinite number of test cases. This allows users to understand the specific characteristics of all test cases clearly and enables them to use these generators to create as many additional test cases as they need.
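Concretely, producing extra test cases means re-running a problem's generator command with different seeds or arguments. A minimal sketch of that loop is below; the actual generator binaries and argument conventions come from the dataset's per-problem commands, so the command used here is only a stand-in.

```python
import subprocess

def run_generator(cmd, seed):
    """Run one generator command with a seed appended and return the
    test input it prints to stdout. `cmd` is a list like ["./gen"]
    (hypothetical name); real commands are supplied by the dataset."""
    res = subprocess.run(cmd + [str(seed)], capture_output=True,
                         text=True, check=True)
    return res.stdout

# Stand-in demonstration: "echo" plays the role of a generator binary.
case = run_generator(["echo"], 42)  # -> "42\n"
```

Because the seed is part of the command line, the same command always reproduces the same test case, while varying the seed yields as many distinct cases as needed.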
**Flexible Number of Test Cases:** Additionally, we also provide pre-generated test cases, available in five versions: 1x, 2x, ..., 5x. The number of test cases in these versions increases sequentially, so the computational resources required to run them will also increase. This allows users to strike a balance between computational cost and coverage according to their needs.
**Test Case Validators:** Competitive programming problems usually specify many constraints on the input data itself, including data ranges, format requirements, data structure requirements, and so on. Therefore, constructing fully valid test cases is not an easy task, and even professional problem setters can easily make mistakes. For each problem, we provide a test case validator that strictly checks whether the test case input satisfies all constraints outlined in the problem description, to ensure the validity of the test cases as much as possible.
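A validator is typically used as a filter between generation and evaluation. The sketch below assumes the testlib-style convention (the validator reads the candidate input on stdin and exits 0 iff every constraint holds); the validator command itself comes from the dataset, and the commands used here are stand-ins.

```python
import subprocess

def is_valid_case(validator_cmd, case_text):
    """Feed a generated test input to a validator command on stdin.
    Assumes the common testlib convention: exit code 0 means the
    input satisfies all problem constraints."""
    res = subprocess.run(validator_cmd, input=case_text, text=True,
                         capture_output=True)
    return res.returncode == 0

# Stand-in demonstration: "cat" accepts anything, "grep -q x" rejects
# inputs that do not contain an "x".
ok = is_valid_case(["cat"], "1 2\n")
bad = is_valid_case(["grep", "-q", "x"], "1 2\n")
```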
**Output Checkers for Multiple Answer Problems:** In programming competitions, problems with multiple valid solutions are very common. This means that the same input can correspond to several valid outputs. Therefore, correctness cannot be determined simply by comparing the program's output with a single, pre-defined correct answer. For this reason, we provide custom output checkers for all such problems to verify the correctness of the output.
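Judging such a problem therefore means invoking the checker rather than diffing outputs. The wrapper below assumes the testlib-style calling convention (checker receives the input, the contestant's output, and the jury answer as file paths, and signals acceptance via a zero exit code); the per-problem checker binaries are provided by the dataset, and the commands used here are stand-ins.

```python
import subprocess

def output_accepted(checker_cmd, input_path, output_path, answer_path):
    """Run an output checker on (input, contestant output, jury answer)
    file paths. Assumes the testlib convention: exit code 0 = accepted."""
    res = subprocess.run(checker_cmd + [input_path, output_path, answer_path],
                         capture_output=True)
    return res.returncode == 0

# Stand-in demonstration: "true"/"false" play the role of a checker that
# always accepts / always rejects, ignoring the file arguments.
accepted = output_accepted(["true"], "in.txt", "out.txt", "ans.txt")
rejected = output_accepted(["false"], "in.txt", "out.txt", "ans.txt")
```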
**Rigorous Evaluation:** To rigorously evaluate the quality of these test cases, we assessed their accuracy using a large number of solutions. For each problem, we used 100 correct solutions and 100 incorrect solutions to determine whether the test cases could correctly distinguish between correct and incorrect submissions. We have recorded the evaluation results, including True Positive Rate (TPR) and True Negative Rate (TNR), in the dataset. Additionally, based on these results, we selected a high-quality subset from the full dataset, named [CodeContests<sup>+</sup>Verified](https://huggingface.co/datasets/ByteDance-Seed/Code-Contests-Plus-Verified), in which the TPR and TNR for each problem are both above 0.9. Users can apply their own filtering if they require a looser or stricter threshold.
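The two recorded metrics follow their standard definitions, sketched below: TPR is the fraction of correct solutions the test cases accept, and TNR is the fraction of incorrect solutions they reject. The helper names are illustrative, not part of the dataset's API.

```python
def tpr_tnr(correct_accepted, incorrect_accepted):
    """Compute (TPR, TNR) from per-solution verdicts.
    Each argument is a list of booleans: True if the test cases
    accepted that solution."""
    tpr = sum(correct_accepted) / len(correct_accepted)
    tnr = sum(not a for a in incorrect_accepted) / len(incorrect_accepted)
    return tpr, tnr

def in_verified_subset(tpr, tnr, threshold=0.9):
    # Mirrors the selection rule described above: keep problems whose
    # TPR and TNR are both above the threshold.
    return tpr > threshold and tnr > threshold

# Example: 95/100 correct solutions accepted, 97/100 incorrect rejected.
tpr, tnr = tpr_tnr([True] * 95 + [False] * 5,
                   [True] * 3 + [False] * 97)  # -> (0.95, 0.97)
```

Adjusting `threshold` reproduces the looser or stricter filtering mentioned above.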
## Quickstart
Load dataset without test cases:
```python
from datasets import load_dataset
# Login using e.g. `huggingface-cli login` to access this dataset
ds = load_dataset("ByteDance-Seed/Code-Contests-Plus", "default")
```
Load dataset with `1x` test cases:
```python
from datasets import load_dataset
# Login using e.g. `huggingface-cli login` to access this dataset
ds = load_dataset("ByteDance-Seed/Code-Contests-Plus", "1x")
```
This dataset has 6 subsets, namely `default`, `1x`, `2x`, `3x`, `4x`, and `5x`. The problems in these subsets are the same. The only difference is the number of test cases. The table below presents the average number of test cases per problem in each subset.
| Subset | default | 1x | 2x | 3x | 4x | 5x |
|-------------------|---------|----|----|----|----|----|
| Avg. # test cases | 0 | 25 | 44 | 62 | 80 | 98 |
## Usage
We recommend using CodeContests<sup>+</sup> with [SandboxFusion](https://github.com/bytedance/SandboxFusion). SandboxFusion supports automatic evaluation on 10+ open-source datasets, including CodeContests<sup>+</sup>, LiveCodeBench, HumanEval, MBPP, and MHPP, and 20+ programming languages, including C++, Python (GPU supported), C#, Go, Java, Node.js, TypeScript, Kotlin, Rust, Bash, PHP, and even Verilog.
## Evaluation Result
*(Figure: histograms of per-problem TPR and TNR for (a) CodeContests and (b) CodeContests<sup>+</sup>.)*

We present the histograms of the TPR and TNR of problems from (a) CodeContests and (b) CodeContests<sup>+</sup> above. For more details of our evaluation, please refer to our [paper](https://arxiv.org/abs/2506.05817).
## License
This project is licensed under CC-BY-4.0. See the [LICENSE file](https://huggingface.co/datasets/ByteDance-Seed/Code-Contests-Plus/blob/main/LICENSE) for details.
## Citation
```bibtex
@inproceedings{wang-etal-2025-codecontests,
  title = "{C}ode{C}ontests+: High-Quality Test Case Generation for Competitive Programming",
  author = "Wang, Zihan and
    Liu, Siyao and
    Sun, Yang and
    Ding, Ming and
    Li, Hongyan",
  editor = "Christodoulopoulos, Christos and
    Chakraborty, Tanmoy and
    Rose, Carolyn and
    Peng, Violet",
  booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
  month = nov,
  year = "2025",
  address = "Suzhou, China",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.findings-emnlp.299/",
  pages = "5576--5600",
  ISBN = "979-8-89176-335-7"
}
```
Provider: maas
Created: 2025-06-06