BigOBench
Source: ModelScope community. Updated 2025-12-05; indexed 2025-05-24.
Download link:
https://modelscope.cn/datasets/facebook/BigOBench
Description:
<p align="center">
<!-- <p><b><i>BigO(Bench)</b></i></p> -->
<img style="width: 500px;" src="logo.png" alt="logo">
</p>
<div align="center" style="line-height: 1;">
<a href="https://facebookresearch.github.io/BigOBench" target="_blank" style="margin: 2px;">
<img alt="HomePage" src="https://img.shields.io/badge/🏡%20HomePage-BigOBench-green" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://facebookresearch.github.io/BigOBench/leaderboard.html" target="_blank" style="margin: 2px;">
<img alt="Leaderboard" src="https://img.shields.io/badge/🏆%20Leaderboard-BigOBench-yellow" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://facebookresearch.github.io/BigOBench/demo.html" target="_blank" style="margin: 2px;">
<img alt="Explorer" src="https://img.shields.io/badge/🔎%20Explorer-BigOBench-white" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://github.com/facebookresearch/BigOBench">
<img alt="Github" src="https://img.shields.io/badge/Github-facebookresearch/BigOBench-black?logo=github"/>
</a>
<a href="https://huggingface.co/datasets/facebook/BigOBench">
<img alt="HuggingFace" src="https://img.shields.io/badge/🤗%20HuggingFace-facebook/BigOBench-ffc107"/>
</a>
<a href="https://arxiv.org/abs/2503.15242">
<img alt="ArXiv" src="https://img.shields.io/badge/arXiv-2503.15242-b5212f?logo=arxiv"/>
</a>
</div>
## 👋 Overview
* 🚀 Introduction
* 📋 Getting Started with the data
* 🔥 `problem_and_human_solutions_list.jsonl`
* 🔥 `complexity_labels_light.jsonl`
* 🔥 `complexity_labels_full.jsonl`
* 🔥 `time_complexity_test_set.jsonl`
* 🔥 `space_complexity_test_set.jsonl`
* License
* 📝 Citation
## 🚀 Introduction

<span style="font-variant: small-caps;"><b>BigO(Bench)</b></span> is a benchmark of ~300 code problems to be solved in Python, accompanied by 3,105 coding problems and 1,190,250 solutions for training purposes. It evaluates whether LLMs can find the time and space complexity of code solutions, or generate code solutions that themselves respect a time or space complexity requirement. The benchmark addresses a gap in current evaluations, which often overlook the ability of models to comprehend and produce code constrained by computational complexity. <span style="font-variant: small-caps;"><b>BigO(Bench)</b></span> includes a complexity inference framework that can run any Python code snippet, measure multiple runtime and memory footprint values, and infer its algorithmic time and space complexity. It also includes a set of 3,105 coding problems and 1,190,250 solutions from Code Contests, annotated with inferred (synthetic) time and space complexity labels from the complexity framework, along with the corresponding runtime and memory footprint values for a large set of input sizes.
For more details, see our [Paper](https://arxiv.org/abs/2503.15242), [GitHub repository](https://github.com/facebookresearch/BigOBench) and [Website](https://facebookresearch.github.io/BigOBench).
## 📋 Getting Started with the data
The data is available as a Hugging Face dataset.
You can download it directly from the Hugging Face website, or use the CLI:
```bash
huggingface-cli download facebook/BigOBench --repo-type dataset --local-dir ./temp_dir
```
It can also be loaded in a Python script:
```python
from datasets import load_dataset
# Change the second parameter to the sub-dataset you would like to use
df_bb = load_dataset("facebook/BigOBench", 'time_complexity_test_set')
```
You will find 5 files, whose content is detailed below.
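If you downloaded the files with the CLI rather than `load_dataset`, they can be read with the standard library alone. The sketch below assumes the usual JSON Lines layout (one JSON object per line) and that the file sits directly under `./temp_dir`; `read_jsonl` is a helper name introduced here for illustration, not part of BigO(Bench).

```python
import json
from pathlib import Path
from typing import Iterator, Union


def read_jsonl(path: Union[str, Path]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Assumption: the CLI command above placed the file directly in ./temp_dir.
    path = Path("./temp_dir") / "time_complexity_test_set.jsonl"
    if path.exists():
        first_record = next(read_jsonl(path))
        print(sorted(first_record.keys()))
```

Reading lazily with a generator avoids loading a whole file into memory, which may matter for the larger files.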
## 🔥 problem_and_human_solutions_list.jsonl
This file gathers the general information about the coding problems and human solutions on which BigO(Bench) is built, from the problem descriptions to the public and private tests. It also contains a lot of metadata used to postprocess the results of BigO(Bench) and to further analyze where models are strong and where they struggle.
In addition, you will find data added by BigO(Bench). The `dataclass` field gives additional information about the inputs of a problem, together with the code of the dataclass corresponding to this problem, which we generated using an LLM. Some metadata from the complexity framework is also available in `complexity_framework`, such as the fail rates and the inputs metadata as parsed by the framework itself (which can differ from what the dataclass parsed):
- `dataclass.input_type_list` gives the list of arguments, listed by data type, as inferred by an LLM. It comes with the dataclass code, which is also inferred by the LLM. The LLM uses the problem description and a reference solution to try to understand the data types. This field was used to filter the problems and solutions when creating the base dataset of BigO(Bench).
- `complexity_framework.measures_set_id_to_input_properties.framework_input_type` is instead the data type of each argument as inferred by the framework. The framework uses the dataclass code generated by the LLM to split the input stream (the string that represents all the inputs of the problem together), and then parses each input into a data type using rules. As a result, an LLM can sometimes correctly understand that there are two arguments but mistake them for string arguments, whereas the framework, which uses the LLM-generated dataclass to split the input stream into the two arguments, correctly infers through its rules that each argument is an integer. To fully understand the complexity framework outputs, use this field. The previous field was only used to filter the base Code Contests dataset, and was not used within the complexity framework itself to generate the complexity output.
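The difference between the two fields can be illustrated with a small sketch. The function below is a hypothetical stand-in for rule-based typing, not the framework's actual rules, and the type names `"int"`, `"float"` and `"str"` are assumptions made for this example only.

```python
def infer_scalar_type(raw: str) -> str:
    """Return "int", "float" or "str" for one input chunk, using simple rules."""
    text = raw.strip()
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(text)
            return name
        except ValueError:
            continue
    return "str"


# Hypothetical case matching the scenario described above: the LLM labels both
# arguments as strings, while rule-based parsing of the split chunks finds integers.
llm_input_type_list = ["str", "str"]
input_chunks = ["3", "7"]  # input stream already split into two arguments
rule_based_types = [infer_scalar_type(chunk) for chunk in input_chunks]

print(llm_input_type_list)  # ['str', 'str']
print(rule_based_types)     # ['int', 'int']
```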
`problem_and_human_solutions_list`: dict list
* `problem_id`: str
* `problem_name`: str
* `description`: dict
    - `text`: str
    - `is_description_translated`: bool
    - `untranslated_text`: str
* `correct_solution_list`: dict list
    - `solution_id`: str
    - `solution_code`: str
* `data_source`: str
* `source_specific_limits`: dict
    - `time_limit`: dict
        - `seconds`: int
        - `nanos`: int
    - `memory_limit_bytes`: int
* `codeforces_specific_metadata`: dict
    - `cf_contest_id`: int
    - `cf_index`: str
    - `cf_points`: float
    - `cf_rating`: int
    - `cf_tags`: str list
    - `difficulty`: str
* `tests`: dict
    - `public_tests`: dict list
        - `input`: str
        - `output`: str
    - `private_tests`: dict list
        - `input`: str
        - `output`: str
    - `generated_tests`: dict list
        - `input`: str
        - `output`: str
* `human_accuracy_rate`: float
* `dataclass`: dict
    - `dataclass_code`: str
    - `input_type_list`: str list
    - `number_inputs`: int
* `complexity_framework`: dict
    - `time_complexity_fail_rate`
    - `space_complexity_fail_rate`
    - `time_or_space_complexity_fail_rate`
    - `measures_set_id_to_input_properties`: dict
        - (measures_set_id) str: dict
            - `input_id`: str
            - `framework_input_type`: str
            - `input_dimension`: int
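As a rough illustration of how the nested fields might be accessed, the sketch below works on a toy record. It assumes that `seconds` and `nanos` are nested under `time_limit`, and that `measures_set_id_to_input_properties` maps each measures set id to a dict of properties. All values are invented, and the helper names are hypothetical.

```python
def time_limit_seconds(record: dict) -> float:
    """Combine the `seconds` and `nanos` parts of the time limit into seconds."""
    limit = record["source_specific_limits"]["time_limit"]
    return limit["seconds"] + limit["nanos"] / 1_000_000_000


def framework_types_by_measures_set(record: dict) -> dict:
    """Map each measures_set_id to the data type inferred by the framework."""
    properties = record["complexity_framework"]["measures_set_id_to_input_properties"]
    return {
        measures_set_id: props["framework_input_type"]
        for measures_set_id, props in properties.items()
    }


# Toy record: every value below is invented for illustration.
toy_record = {
    "problem_id": "toy_problem",
    "source_specific_limits": {
        "time_limit": {"seconds": 2, "nanos": 500_000_000},
        "memory_limit_bytes": 268_435_456,
    },
    "dataclass": {"input_type_list": ["str", "str"], "number_inputs": 2},
    "complexity_framework": {
        "measures_set_id_to_input_properties": {
            "0": {"input_id": "0", "framework_input_type": "int", "input_dimension": 1},
            "1": {"input_id": "1", "framework_input_type": "int", "input_dimension": 1},
        }
    },
}

print(time_limit_seconds(toy_record))               # 2.5
print(framework_types_by_measures_set(toy_record))  # {'0': 'int', '1': 'int'}
```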
## 🔥 complexity_labels_light.jsonl
Light outputs of the complexity framework, as detailed in the module `src/complexity`, when run on all problems and solutions from `problem_and_human_solutions_list.jsonl`.
`complexity_labels_light`: dict list
* `problem_id`: str
* `problem_name`: str
* `solution_id`: str
* `time_complexity_inferred`: str
* `space_complexity_inferred`: str
* `time_curve_coefficient`: float
* `space_curve_coefficient`: float
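A common first step with these labels is to look at how the complexity classes are distributed. The following sketch counts labels over a few toy records; the label strings such as `"O(n)"` are placeholders, since the exact spelling used in the file is not specified here.

```python
from collections import Counter


def label_distribution(records: list, field: str) -> Counter:
    """Count how often each label appears in `field`, skipping missing values."""
    return Counter(r[field] for r in records if r.get(field) is not None)


# Toy records following the field names listed above; values are invented.
toy_labels = [
    {"problem_id": "p1", "solution_id": "s1",
     "time_complexity_inferred": "O(n)", "space_complexity_inferred": "O(1)"},
    {"problem_id": "p1", "solution_id": "s2",
     "time_complexity_inferred": "O(n^2)", "space_complexity_inferred": "O(n)"},
    {"problem_id": "p2", "solution_id": "s3",
     "time_complexity_inferred": "O(n)", "space_complexity_inferred": None},
]

print(label_distribution(toy_labels, "time_complexity_inferred").most_common())
# [('O(n)', 2), ('O(n^2)', 1)]
```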
## 🔥 complexity_labels_full.jsonl
Full outputs of the complexity framework, as detailed in the module `src/complexity`, when run on all problems and solutions from `problem_and_human_solutions_list.jsonl`.
`complexity_labels_full_n-m`: dict list
* `problem_id`: str
* `problem_name`: str
* `solution_id`: str
* `time_complexity_inferred`: str
* `space_complexity_inferred`: str
* `time_curve_coefficient`: float
* `space_curve_coefficient`: float
* `query_dataclass_code`: str
* `query_code`: str
* `query_inputs_example`: str
* `runtime_measures`: dict list
    - `measures_set_id`: str
    - `measures_per_expansion_multiplier`: dict list
        - `expansion_multiplier`: int
        - `measures_per_expansion_method`: dict list
            - `value_list`: float list
            - `expansion_method`: str
            - `measures_set_id_list`: str list
            - `measures_priority`: int
* `memory_footprint_measures`: dict list
    - `measures_set_id`: str
    - `measures_per_expansion_multiplier`: dict list
        - `expansion_multiplier`: int
        - `measures_per_expansion_method`: dict list
            - `value_list`: float list
            - `expansion_method`: str
            - `measures_set_id_list`: str list
            - `measures_priority`: int
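The raw measures are deeply nested, so flattening them into rows can make them easier to inspect. The sketch below assumes the nesting `runtime_measures` → `measures_per_expansion_multiplier` → `measures_per_expansion_method` → `value_list`, which is an interpretation of the field list above rather than a confirmed layout. The expansion method name `"copy"` and all values are invented, and `summarize_measures` is a hypothetical helper.

```python
from statistics import median


def summarize_measures(measures: list) -> list:
    """Flatten nested measures into (set id, multiplier, method, median value) rows."""
    rows = []
    for measures_set in measures:
        for per_multiplier in measures_set["measures_per_expansion_multiplier"]:
            for per_method in per_multiplier["measures_per_expansion_method"]:
                values = per_method["value_list"]
                if not values:
                    continue  # nothing was measured for this combination
                rows.append((
                    measures_set["measures_set_id"],
                    per_multiplier["expansion_multiplier"],
                    per_method["expansion_method"],
                    median(values),
                ))
    return rows


# Toy measures: one measures set, two input sizes, one expansion method.
toy_runtime_measures = [
    {
        "measures_set_id": "0",
        "measures_per_expansion_multiplier": [
            {"expansion_multiplier": 1,
             "measures_per_expansion_method": [
                 {"value_list": [0.10, 0.12, 0.11], "expansion_method": "copy"}]},
            {"expansion_multiplier": 2,
             "measures_per_expansion_method": [
                 {"value_list": [0.20, 0.22, 0.24], "expansion_method": "copy"}]},
        ],
    }
]

for row in summarize_measures(toy_runtime_measures):
    print(row)
```

The same function applies unchanged to `memory_footprint_measures`, since both fields share the structure listed above.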
## 🔥 time_complexity_test_set.jsonl
The time complexity test set is made up of 311 problems and 640 corresponding solutions covering 11 different classes (the most represented being O(n), O(n log n), O(n^2), O(1) and O(n × m), and the least represented O((n + m) log(n + m))).
It was created from `problem_and_human_solutions_list.jsonl` and the complexity framework outputs on this dataset, `complexity_labels_full.jsonl`. Filtering was applied to narrow these down to the final test set of problems and solutions.
`time_complexity_test_set`: dict list
* `problem_name`: str
* `problem_id`: str
* `solution_id`: str
* `description`: str
* `solution_code`: str
* `dataclass_code`: str
* `inputs_example`: str
* `time_complexity_inferred`: str
* `time_curve_coefficient`: float
* `tests`: dict
    - `public_tests`: dict list
        - `input`: str
        - `output`: str
    - `private_tests`: dict list
        - `input`: str
        - `output`: str
    - `generated_tests`: dict list
        - `input`: str
        - `output`: str
* `problem_time_curve_coefficient_list`: float list
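One simple way to use this test set is to compare a model's predicted complexity class against `time_complexity_inferred`. The sketch below computes a plain exact-match accuracy. It is not the benchmark's official metric; the normalization rule, the label spellings and the helper names are assumptions made for illustration.

```python
def normalize_label(label: str) -> str:
    """Lowercase and drop whitespace so that 'O(n)' and 'o( n )' compare equal."""
    return "".join(label.lower().split())


def exact_match_accuracy(records: list, predictions: dict) -> float:
    """Fraction of records whose predicted label equals `time_complexity_inferred`."""
    if not records:
        return 0.0
    hits = 0
    for record in records:
        key = (record["problem_id"], record["solution_id"])
        predicted = predictions.get(key)
        if predicted is None:
            continue  # a missing prediction counts as a miss
        if normalize_label(predicted) == normalize_label(record["time_complexity_inferred"]):
            hits += 1
    return hits / len(records)


# Toy records and predictions; all values are invented.
toy_test_set = [
    {"problem_id": "p1", "solution_id": "s1", "time_complexity_inferred": "O(n)"},
    {"problem_id": "p2", "solution_id": "s2", "time_complexity_inferred": "O(n^2)"},
]
toy_predictions = {("p1", "s1"): "o( n )", ("p2", "s2"): "O(n)"}

print(exact_match_accuracy(toy_test_set, toy_predictions))  # 0.5
```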
## 🔥 space_complexity_test_set.jsonl
The space complexity test set consists of 308 problems and 636 solutions covering 5 different classes (in decreasing order of frequency: O(n), O(1), O(n^2), O(n + m), O(n × m)).
It was created from `problem_and_human_solutions_list.jsonl` and the complexity framework outputs on this dataset, `complexity_labels_full.jsonl`. Filtering was applied to narrow these down to the final test set of problems and solutions.
`space_complexity_test_set`: dict list
* `problem_name`: str
* `problem_id`: str
* `solution_id`: str
* `description`: str
* `solution_code`: str
* `dataclass_code`: str
* `inputs_example`: str
* `space_complexity_inferred`: str
* `space_curve_coefficient`: float
* `tests`: dict
    - `public_tests`: dict list
        - `input`: str
        - `output`: str
    - `private_tests`: dict list
        - `input`: str
        - `output`: str
    - `generated_tests`: dict list
        - `input`: str
        - `output`: str
* `problem_space_curve_coefficient_list`: float list
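The `tests` field makes it possible to check that a solution is functionally correct before looking at its complexity. The sketch below runs `solution_code` in a subprocess against each public test. It assumes that each entry of `public_tests` is a dict with string `input` and `output` fields and that solutions read from standard input. `passes_public_tests` is a hypothetical helper, not the benchmark's evaluation harness.

```python
import subprocess
import sys


def passes_public_tests(record: dict, timeout_s: float = 10.0) -> bool:
    """Run `solution_code` on every public test and compare whitespace-split outputs."""
    for test in record["tests"]["public_tests"]:
        try:
            result = subprocess.run(
                [sys.executable, "-c", record["solution_code"]],
                input=test["input"],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
        if result.returncode != 0:
            return False
        # Compare token by token so trailing newlines or spaces do not matter.
        if result.stdout.split() != test["output"].split():
            return False
    return True


# Toy record: a solution that doubles its input, with one public test.
toy_record = {
    "solution_code": "n = int(input())\nprint(n * 2)\n",
    "tests": {"public_tests": [{"input": "3\n", "output": "6\n"}]},
}

print(passes_public_tests(toy_record))  # True
```

Dataset solutions are third-party code, so running them inside an isolated environment is advisable.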
## License
The majority of BigO(Bench) is licensed under CC-BY-NC (see [LICENSE](/LICENSE.md)). However, portions of the project are available under separate license terms: https://github.com/pberkes/big_O is licensed under the BSD-3 license.
## 📝 Citation
If you find our project useful and/or are using its data, please cite our paper:
```
@misc{chambon2025bigobenchllmsgenerate,
title={BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?},
author={Pierre Chambon and Baptiste Roziere and Benoit Sagot and Gabriel Synnaeve},
year={2025},
eprint={2503.15242},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.15242},
}
```
Provider: maas
Created: 2025-05-20



