osunlp/TACO-Cobalt-PTB

Hugging Face · Updated 2026-02-04 · Indexed 2026-02-07
Download link:
https://hf-mirror.com/datasets/osunlp/TACO-Cobalt-PTB
Description:
TACO-Cobalt-PTB is a perturbed version of the TACO-Cobalt validation set, built to analyze in-context reward hacking by large language models (LLMs) in code generation. For each coding problem, we randomly select two public tests (x_1, y_1) and (x_2, y_2) with distinct outputs (y_1 != y_2) and swap their expected outputs, producing two perturbed tests (x_1, y_2) and (x_2, y_1) that no correct program can pass. If all public test cases for a problem share the same output, the task is discarded. The remaining unchanged test cases are kept alongside the perturbed ones in the public split, which resembles real-world settings where one or two tests are noisy but the majority are still correct.
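The swap described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' released code: the function name `perturb_public_tests`, the `(input, expected_output)` tuple layout, and the choice to sample uniformly over all pairs with distinct outputs are assumptions made here for clarity.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical representation: each public test is (input, expected_output).
Test = Tuple[str, str]


def perturb_public_tests(tests: List[Test], rng: random.Random) -> Optional[List[Test]]:
    """Swap the expected outputs of two randomly chosen tests with distinct outputs.

    Returns None when every test shares the same output, in which case
    the task would be discarded.
    """
    # Collect every index pair whose expected outputs differ (y_1 != y_2).
    pairs = [
        (i, j)
        for i in range(len(tests))
        for j in range(i + 1, len(tests))
        if tests[i][1] != tests[j][1]
    ]
    if not pairs:
        return None

    # Assumption: sample uniformly over the eligible pairs.
    i, j = rng.choice(pairs)

    # Copy the list so unchanged tests stay in the public split as-is.
    perturbed = list(tests)
    perturbed[i] = (tests[i][0], tests[j][1])  # (x_1, y_2)
    perturbed[j] = (tests[j][0], tests[i][1])  # (x_2, y_1)
    return perturbed
```

Calling `perturb_public_tests(tests, random.Random(0))` on a task with at least two distinct outputs returns a list of the same length in which exactly two tests carry swapped outputs. A task whose public tests all share one output yields `None`.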
Provided by:
osunlp