ioi
ModelScope Community · updated 2026-01-02 · indexed 2025-03-15
Download link: https://modelscope.cn/datasets/open-r1/ioi
# IOI
The International Olympiad in Informatics (IOI) is one of five international science olympiads (if you are familiar with AIME, IOI is the programming equivalent of the IMO, to which the very best AIME participants are invited). It tests a very select group of high school students (up to 4 per country) on complex algorithmic problems.
The problems are extremely challenging, and the full test sets are publicly available under a permissive (CC-BY) license. Together, this makes IOI an ideal dataset for testing a model's code reasoning capabilities.
In IOI, each problem has several subtasks, each with different input constraints. To solve a subtask, a submission needs to pass all of its test cases within the (strict) time limits. While the final subtask is usually the “full problem”, some subtasks effectively describe a much easier (more constrained) problem, and contestants very often target specific subtasks to get partial scores instead of just trying to solve the full problem (perfect scores are relatively rare).
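The sketch below illustrates the all-or-nothing subtask rule described above by scoring a problem from per-test pass/fail results. It is a simplification, not the official grader, and rests on these assumptions:

- Every test is binary pass/fail.
- Subtasks may share tests, because constraints are often nested.
- A problem's score is the sum of the points of every subtask solved by at least one submission.

The names `subtask_solved` and `problem_score` are hypothetical.

```python
# Hypothetical sketch of IOI-style subtask scoring; not the huggingface/ioi grader.
# A submission's results map test names to pass/fail (binary tests assumed).


def subtask_solved(results: dict[str, bool], tests: list[str]) -> bool:
    """A subtask counts only if every one of its tests passed."""
    return all(results.get(test, False) for test in tests)


def problem_score(
    submissions: list[dict[str, bool]],
    subtasks: list[tuple[int, list[str]]],
) -> int:
    """Sum the points of each subtask solved by at least one submission."""
    total = 0
    for points, tests in subtasks:
        if any(subtask_solved(results, tests) for results in submissions):
            total += points
    return total


# Example: three nested subtasks worth 10, 30 and 60 points.
subtasks = [
    (10, ["a1", "a2"]),
    (30, ["a1", "a2", "b1"]),
    (60, ["a1", "a2", "b1", "c1"]),
]
partial = {"a1": True, "a2": True, "b1": True, "c1": False}  # fails test c1
full = {"a1": True, "a2": True, "b1": True, "c1": True}
print(problem_score([partial], subtasks))        # 40
print(problem_score([partial, full], subtasks))  # 100
```

Some real IOI problems award partial credit within a test, which this binary sketch does not model.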
Following a [recent OpenAI paper](http://arxiv.org/abs/2502.06807) in which o1 competed live at IOI’2024 (the most recent edition), we similarly processed all problems from IOI’2024 (as well as previous IOIs back to 2020) and split them into their subtasks, so that each prompt asks a model to solve one specific subtask. We release the processed problem statements, all grading/checker files required to run them, and the test cases in `open-r1/ioi` and `open-r1/ioi-test-cases`.
We wrote custom code to run solutions and to grade them according to IOI rules. This was necessary because many problems have complicated setups: a “manager” process communicates with several processes running the user submission, and special checkers validate the solutions. The code is available at [https://github.com/huggingface/ioi](https://github.com/huggingface/ioi), and we used it to **evaluate over 40 leading reasoning models on IOI’2024**.
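Here is a minimal sketch of turning one problem into one prompt per subtask. The `Subtask` fields, the `subtask_prompts` helper and the prompt wording are all hypothetical. They do not reproduce the exact prompts stored in `open-r1/ioi`.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    index: int        # 1-based subtask number, as in IOI statements
    points: int
    constraints: str  # extra constraints that define this subtask


def subtask_prompts(problem_name: str, statement: str, subtasks: list[Subtask]) -> list[str]:
    """Build one prompt per subtask, each asking for that subtask only."""
    prompts = []
    for st in subtasks:
        prompts.append(
            f"{statement}\n\n"
            f"Solve subtask {st.index} ({st.points} points) of {problem_name}. "
            f"You may assume the following additional constraints: {st.constraints}"
        )
    return prompts


prompts = subtask_prompts(
    "example_problem",
    "<full problem statement>",
    [Subtask(1, 20, "N <= 10"), Subtask(2, 80, "No additional constraints.")],
)
print(len(prompts))  # 2
```

Keeping the full statement in every prompt lets the model see the general problem while being told exactly which constraints it may rely on.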
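The manager/submission setup can be pictured with a toy interactive task, sketched below under these hypothetical rules:

- A manager process hides a number.
- It answers a contestant program's `? x` queries with `<`, `>` or `=` over pipes.
- It enforces a query limit.
- It then acts as the checker for the final `! x` answer.

The task, the protocol and the `run_manager` helper are invented for illustration. Real IOI graders, and the harness in the repository above, use their own per-problem protocols, sandboxing and time limits.

```python
import subprocess
import sys

# Hypothetical contestant program for a toy interactive task: find a hidden
# integer in [1, n] by binary search. It prints "? x" to ask about x and
# "! x" to give its final answer, flushing after every line.
SOLUTION_SOURCE = r"""
import sys
n = int(sys.stdin.readline())
lo, hi = 1, n
while lo <= hi:
    mid = (lo + hi) // 2
    print("? " + str(mid), flush=True)
    reply = sys.stdin.readline().strip()
    if reply == "=":
        print("! " + str(mid), flush=True)
        break
    if reply == "<":  # hidden number is smaller than mid
        hi = mid - 1
    else:             # hidden number is larger than mid
        lo = mid + 1
"""


def run_manager(hidden: int, n: int, max_queries: int) -> bool:
    """Toy manager: answer queries over pipes, then check the final answer."""
    with subprocess.Popen(
        [sys.executable, "-c", SOLUTION_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        assert proc.stdin is not None and proc.stdout is not None
        proc.stdin.write(f"{n}\n")
        proc.stdin.flush()
        queries = 0
        accepted = False
        while True:
            tokens = proc.stdout.readline().split()
            if len(tokens) != 2:  # solution exited or broke the protocol
                break
            kind, value = tokens[0], int(tokens[1])
            if kind == "!":  # checker step: validate the final answer
                accepted = value == hidden
                break
            queries += 1
            if kind != "?" or queries > max_queries:  # protocol violation
                break
            reply = "=" if value == hidden else ("<" if hidden < value else ">")
            proc.stdin.write(reply + "\n")
            proc.stdin.flush()
        if proc.poll() is None:  # stop a solution still waiting for input
            proc.kill()
    return accepted


print(run_manager(hidden=37, n=100, max_queries=7))  # True
print(run_manager(hidden=37, n=100, max_queries=2))  # False: needs 3 queries
```

Both sides flush after every line. Without that, the two processes could block forever waiting on each other's buffered output.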
At IOI, contestants are limited to 50 submissions per problem. We generated 50 submissions for each subtask and then applied a selection strategy similar to the one [used by OpenAI](http://arxiv.org/abs/2502.06807) to obtain scores under contest conditions for each problem. Results are shown below, where the horizontal lines mark the bronze, silver and gold medal thresholds [from real contest data](https://stats.ioinformatics.org/results/2024). While o1 comes very close to bronze, ultimately no model reaches the medal threshold (the 50th percentile of contestants).

## Submission strategy
An important caveat is that our submission strategy may penalize non-reasoning models, such as Qwen-2.5-Coder-32B-Instruct (the base model of OlympicCoder-32B). In a real contest, a submission's score is only known *after* it has been submitted. To simulate this, we used a round-robin submission strategy similar to the one used by [OpenAI for o1-ioi](http://arxiv.org/abs/2502.06807):

- We first submit a solution targeting the last subtask of the problem, then one targeting the second-to-last, and so on.
- A solution is evaluated only when it is picked for submission.
- We skip submissions targeting subtasks that previously selected submissions have already solved.
- Within each targeted subtask, we prefer submissions from *longer generations*. This criterion makes sense for reasoning models, but less so for other models.

If we remove the 50-submission limit (which takes us outside contest conditions) and evaluate all the submissions we generated (50 per subtask), we obtain the following results:
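The round-robin selection can be sketched as follows. This is a simplified reading of the strategy, not the code from the repository. `Candidate`, `round_robin_select` and the `evaluate` callback are hypothetical. The callback stands in for actually running a submission and returns the set of subtasks it fully solves. Generation length is assumed to be a single integer, such as a token count.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    subtask: int            # index of the subtask this generation targeted
    generation_length: int  # e.g. number of generated tokens
    code: str


def round_robin_select(
    candidates: list[Candidate],
    num_subtasks: int,
    evaluate: Callable[[Candidate], set[int]],
    limit: int = 50,
) -> tuple[list[Candidate], set[int]]:
    """Pick submissions round-robin, from the last subtask down to the first."""
    # One queue per subtask, longest generations first.
    queues = {
        s: sorted(
            (c for c in candidates if c.subtask == s),
            key=lambda c: c.generation_length,
            reverse=True,
        )
        for s in range(num_subtasks)
    }
    submitted: list[Candidate] = []
    solved: set[int] = set()
    while len(submitted) < limit:
        progressed = False
        for s in reversed(range(num_subtasks)):
            if len(submitted) >= limit:
                break
            # Skip subtasks already solved and subtasks with no candidates left.
            if s in solved or not queues[s]:
                continue
            candidate = queues[s].pop(0)
            submitted.append(candidate)
            # A candidate is only evaluated once it is actually submitted.
            solved |= evaluate(candidate)
            progressed = True
        if not progressed:  # nothing left worth submitting
            break
    return submitted, solved


# Toy run: the outcome of each candidate is looked up instead of executed.
outcomes = {"full_long": {0}, "full_short": {0, 1, 2}, "mid": {0, 1}, "easy": {0}}
cands = [
    Candidate(2, 900, "full_long"),
    Candidate(2, 100, "full_short"),
    Candidate(1, 500, "mid"),
    Candidate(0, 50, "easy"),
]
picked, solved = round_robin_select(cands, 3, lambda c: outcomes[c.code])
print([c.code for c in picked], sorted(solved))
# ['full_long', 'mid', 'full_short'] [0, 1, 2]
```

In the toy run, the longest generation for the last subtask is submitted first even though it only solves subtask 0. Under this reading, that behaviour can cost submissions when length is a poor proxy for quality.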

## Dataset links
- [Problem statements dataset](https://huggingface.co/datasets/open-r1/ioi) (IOI’2020 - IOI’2024): `open-r1/ioi` (you are here!)
- [Test cases](https://huggingface.co/datasets/open-r1/ioi-test-cases): `open-r1/ioi-test-cases`
- [Official (ground truth) solutions](https://huggingface.co/datasets/open-r1/ioi-sample-solutions): `open-r1/ioi-sample-solutions`
- [Evaluation data for 40+ leading models on IOI’2024](https://huggingface.co/datasets/open-r1/ioi-2024-model-solutions): `open-r1/ioi-2024-model-solutions`
## Generating solutions
To have models generate solutions to IOI problems, follow the instructions [here](https://github.com/huggingface/ioi/tree/main/generate).
## Running tests
To run tests on generated solutions, follow the instructions [here](https://github.com/huggingface/ioi/tree/main/run_tests).
## Original data
All data was extracted, parsed, and adapted from the official IOI tasks, which are released under CC-BY:
- https://www.ioi2024.eg/competition-tasks
- https://ioi2023.hu/tasks/index.html
- https://ioi2022.id/tasks/
- https://ioi2021.sg/ioi-2021-tasks/
- https://ioi2020.sg/ioi-2020-tasks/
## License
Like the original problem data, this dataset and its accompanying datasets are released under CC-BY.
Provider: maas
Created: 2025-03-08



