SpringTC - an executable text-code dataset
Source: ieee-dataport.org (indexed 2025-03-25)
Download link:
https://ieee-dataport.org/documents/springtc-executable-text-code-dataset
Description:
M. Kacmajor and J.D. Kelleher, "ExTra: Evaluation of Automatically Generated Source Code Using Execution Traces" (submitted to IEEE TSE)

In this paper we propose ExTra, a novel approach to evaluating code quality based on comparing the execution traces of the generated code with those of the ground-truth code. ExTra captures the behaviour of the programs implemented with the generated code, taking into account all internal and external dependencies. In contrast to source-code based metrics, ExTra is semantically meaningful; and in contrast to evaluation approaches that measure the functional correctness of code, ExTra is suitable for evaluating code developed in the context of real-life software systems.

The first contribution of this paper is the design, implementation, and validation of ExTra. The value of ExTra is examined via experiments in which our metric and three source-code based metrics (BLEU, Levenshtein distance and CodeBLEU) are applied to two types of automatically generated source code: test code and production code. The results show that the scores produced by the three source-code based metrics are highly correlated, while ExTra is clearly distinct. The qualitative analysis of the differences reveals a number of examples in which ExTra scores are semantically more adequate than the scores computed from token comparison. Furthermore, the quantitative analysis of the agreement between the evaluation scores and test verdicts (produced by generated test cases or by test cases applied to the generated code) shows that ExTra is a much better predictor of "failed" verdicts than any of the three text-oriented metrics. On the whole, our results indicate that ExTra provides added value to the process of assessing the quality of generated code, and we recommend it as an evaluation tool complementary to the source-code based methods.
The second contribution of this paper is a set of three new evaluation datasets, which contain executable code extracted from large, active GitHub repositories and can be used for evaluating models' performance using ExTra, or for other tasks that require executable code.
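
To make the idea of comparing execution traces more concrete, here is a minimal Python sketch. It is not the authors' ExTra implementation, whose details this listing does not give. It assumes a trace can be approximated by the ordered list of Python function names called during a run, and it swaps in difflib's sequence-matching ratio as a stand-in similarity score. The names record_trace, trace_similarity and the toy functions are hypothetical and exist only for illustration.

```python
import sys
from difflib import SequenceMatcher


def record_trace(func, *args):
    """Run func(*args) and return the names of the Python functions it calls, in order.

    Assumption: a "trace" here is just a list of called function names.
    Calls into C built-ins are not reported by sys.settrace.
    """
    calls = []

    def tracer(frame, event, arg):
        if event == "call":
            calls.append(frame.f_code.co_name)
        return None  # no line-level tracing inside the called frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    # Drop the entry point itself so only the callees are compared.
    return calls[1:]


def trace_similarity(trace_a, trace_b):
    """Similarity in [0, 1] between two traces (stand-in metric, not ExTra's)."""
    return SequenceMatcher(None, trace_a, trace_b, autojunk=False).ratio()


# Toy "ground-truth" and "generated" functions (hypothetical).
def helper_double(x):
    return x * 2


def helper_square(x):
    return x * x


def reference(x):
    return helper_double(x) + helper_square(x)


def candidate_same_behaviour(x):
    # Different source text, same sequence of calls.
    doubled = helper_double(x)
    squared = helper_square(x)
    return doubled + squared


def candidate_different(x):
    # Superficially similar source, different sequence of calls.
    return helper_double(helper_double(x))


ref_trace = record_trace(reference, 3)
print(trace_similarity(ref_trace, record_trace(candidate_same_behaviour, 3)))
print(trace_similarity(ref_trace, record_trace(candidate_different, 3)))
```

In this toy setting the first candidate differs textually from the reference but produces the same call sequence, while the second diverges in behaviour. This is the kind of distinction that a token-based metric such as BLEU or Levenshtein distance would not be expected to capture.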
Provided by:
ieee-dataport.org



