
osunlp/TACO-Cobalt

Source: Hugging Face. Updated 2026-02-04; indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/osunlp/TACO-Cobalt
Description:
TACO-Cobalt is a cleaned version of TACO, a code generation dataset crawled from programming competition websites. Starting from the TACO-verified subset, further cleaning yields 6,103 valid tasks, each with at least 8 test cases. Within each task, the test cases are ordered by difficulty, estimated as how likely Qwen2.5-Coder-7B-Instruct is to pass each one across 16 attempts. The four easiest test cases form the public split, used for test-time interaction; the remaining test cases form the hidden split. In addition, 50 examples are randomly selected from each of the five annotated difficulty levels to form a validation set of 250 examples, and the remaining 5,853 examples make up the training set.
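The two splitting steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual pipeline: the function names, the `pass_counts` input, and the `task_id` and `difficulty` field names are hypothetical and are not taken from the dataset's schema. Ties between equally easy test cases are broken by original order here, which the description does not specify.

```python
import random
from collections import defaultdict


def split_test_cases(pass_counts, n_public=4):
    """Split test-case indices into (public, hidden) lists.

    pass_counts[i] is assumed to be the number of sampled attempts
    (out of 16 in the description) that passed test case i.
    A higher count is treated as an easier test case.
    """
    # Stable sort from easiest to hardest; ties keep their original order.
    order = sorted(range(len(pass_counts)), key=lambda i: -pass_counts[i])
    return order[:n_public], order[n_public:]


def stratified_validation_split(tasks, per_level=50, seed=0):
    """Sample `per_level` tasks from each difficulty level for validation.

    Each task is assumed to be a dict with hypothetical keys
    "task_id" and "difficulty". Raises ValueError if a level has
    fewer than `per_level` tasks (behaviour of random.sample).
    """
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for task in tasks:
        by_level[task["difficulty"]].append(task)

    validation = []
    for level in sorted(by_level):
        validation.extend(rng.sample(by_level[level], per_level))

    # Everything not chosen for validation goes to training.
    chosen_ids = {task["task_id"] for task in validation}
    train = [task for task in tasks if task["task_id"] not in chosen_ids]
    return validation, train
```

With five difficulty levels and `per_level=50`, the second function returns 250 validation tasks, matching the count in the description; applied to all 6,103 tasks it would leave 5,853 for training.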
Provided by:
osunlp