HlebYakhnitski/codecomplex
Hugging Face | Updated 2025-12-16 | Added 2025-12-20
Download link:
https://hf-mirror.com/datasets/HlebYakhnitski/codecomplex
Description:
The CodeComplex dataset consists of 4,200 Java programs that human programmers submitted to programming competitions. Each program carries a time-complexity label annotated by a group of algorithm experts. The dataset is intended primarily for text generation and language modeling tasks.

Each example has four fields: src (the source code), complexity (the time-complexity label), problem (the problem name), and from (the source of the problem). Complexity is divided into 7 classes: constant, linear, quadratic, cubic, log(n), nlog(n), and NP-hard. The dataset contains only a train split; there are no validation or test splits.

To build the dataset, problems and solution code were collected from Codeforces, labeled for time complexity by experienced human annotators, and then verified by programming experts.
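Because the dataset ships only a train split, anyone evaluating a model on it has to carve out their own held-out data. The following is a minimal sketch, assuming each row is a dict with the src, complexity, problem, and from fields described above. It hashes the problem name so that every solution to the same problem falls into the same split (so near-identical solutions cannot leak across splits), and it maps the seven class names to integer ids. The function names, the hash-based grouping, and the exact label spellings are illustrative assumptions, not part of the dataset.

```python
import hashlib
from collections import Counter

# The seven classes named in the dataset description. ASSUMPTION: the exact
# strings stored in the `complexity` column may be spelled differently.
COMPLEXITY_CLASSES = [
    "constant", "linear", "quadratic", "cubic", "log(n)", "nlog(n)", "NP-hard",
]
LABEL_TO_ID = {name: i for i, name in enumerate(COMPLEXITY_CLASSES)}


def problem_bucket(problem: str) -> float:
    """Hash a problem name to a stable value in [0, 1) (hypothetical helper)."""
    digest = hashlib.sha256(problem.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result is strictly below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def split_by_problem(records, val_fraction=0.2):
    """Deterministically split records so no problem lands in both splits."""
    train, val = [], []
    for rec in records:
        target = val if problem_bucket(rec["problem"]) < val_fraction else train
        target.append(rec)
    return train, val


def label_counts(records):
    """Count examples per complexity class, e.g. to check class balance."""
    return Counter(rec["complexity"] for rec in records)


if __name__ == "__main__":
    # Hypothetical toy records that only reuse the dataset's field names.
    records = [
        {"src": "class A {}", "complexity": "linear", "problem": "p1", "from": "codeforces"},
        {"src": "class B {}", "complexity": "linear", "problem": "p1", "from": "codeforces"},
        {"src": "class C {}", "complexity": "quadratic", "problem": "p2", "from": "codeforces"},
        {"src": "class D {}", "complexity": "NP-hard", "problem": "p3", "from": "codeforces"},
    ]
    train, val = split_by_problem(records, val_fraction=0.3)
    print(len(train), len(val), label_counts(train))
```

With the Hugging Face datasets library, load_dataset("HlebYakhnitski/codecomplex", split="train") should return the single train split, and list(...) over it yields row dicts that could be passed to split_by_problem. Grouping by problem is a design choice rather than something the dataset prescribes. A plain random per-row split is simpler, but it lets different solutions to one problem appear on both sides.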
Provided by:
HlebYakhnitski



