SpringTC - an executable text-code dataset

Name: SpringTC - an executable text-code dataset
Creator: IEEE DataPort
Published: 2024-11-13 22:33:24
License: not specified

DataCite Commons; updated 2024-11-13; indexed 2025-04-16

Download link:

https://ieee-dataport.org/documents/springtc-executable-text-code-dataset


Description:

In this paper we propose ExTra, a novel approach to evaluating code quality based on comparing the execution traces of the generated code with those of the ground-truth code. ExTra captures the behaviour of the programs implemented with the generated code, taking into account all internal and external dependencies. In contrast to source-code based metrics, ExTra is semantically meaningful; and in contrast to evaluation approaches that measure the functional correctness of code, ExTra is suitable for evaluating code developed in the context of real-life software systems. The first contribution of this paper is the design, implementation, and validation of ExTra. The value of ExTra is examined via experiments in which our metric and three source-code based metrics (BLEU, Levenshtein distance and CodeBLEU) are applied to two types of automatically generated source code: test code and production code. The results show that the scores produced by the three source-code based metrics are highly correlated, while ExTra is clearly distinct. A qualitative analysis of the differences reveals a number of examples in which ExTra scores are semantically more adequate than the scores computed from token comparison. Furthermore, a quantitative analysis of the agreement between the evaluation scores and test verdicts (produced either by generated test cases or by test cases applied to the generated code) shows that ExTra is a much better predictor of "failed" verdicts than any of the three text-oriented metrics. On the whole, our results indicate that ExTra adds value to the process of assessing the quality of generated code, and we recommend it as an evaluation tool complementary to the source-code based methods.
The second contribution of this paper is a set of three new evaluation datasets, which contain executable code extracted from large, active GitHub repositories and can be used for evaluating models' performance using ExTra, or for other tasks that require executable code.
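
The contrast the description draws between text-based and trace-based comparison can be illustrated with a toy sketch. This is not the ExTra implementation, whose trace format and scoring are not given here; it only assumes that a "trace" is the ordered list of Python function calls, and it uses difflib's similarity ratio as a stand-in for both the text metric and the trace metric. All names (record_trace, similarity, double, total) are hypothetical.

```python
import sys
from difflib import SequenceMatcher


def record_trace(func, *args):
    """Run func(*args) and return the names of Python functions called, in order."""
    events = []

    def tracer(frame, event, arg):
        if event == "call":
            events.append(frame.f_code.co_name)
        return None  # no line-level tracing inside the new frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return events


def similarity(a, b):
    """Similarity ratio in [0, 1] between two sequences (strings or lists)."""
    return SequenceMatcher(None, a, b).ratio()


def double(x):
    return x * 2


# Ground-truth code and two hypothetical "generated" candidates, kept as source text
# so that both the text and the runtime behaviour can be compared.
REFERENCE_SRC = (
    "def total(items):\n"
    "    acc = 0\n"
    "    for v in items:\n"
    "        acc += double(v)\n"
    "    return acc\n"
)
EQUIVALENT_SRC = (
    "def total(items):\n"
    "    result, i = 0, 0\n"
    "    while i < len(items):\n"
    "        result = result + double(items[i])\n"
    "        i += 1\n"
    "    return result\n"
)
DIVERGENT_SRC = (
    "def total(items):\n"
    "    return double(len(items))\n"
)


def load(src):
    """Compile a source string and return its `total` function."""
    namespace = {"double": double}
    exec(src, namespace)
    return namespace["total"]


data = [1, 2, 3]
trace_ref = record_trace(load(REFERENCE_SRC), data)
trace_equiv = record_trace(load(EQUIVALENT_SRC), data)
trace_div = record_trace(load(DIVERGENT_SRC), data)

text_sim_equiv = similarity(REFERENCE_SRC, EQUIVALENT_SRC)
trace_sim_equiv = similarity(trace_ref, trace_equiv)
trace_sim_div = similarity(trace_ref, trace_div)

print("text similarity (equivalent rewrite): ", round(text_sim_equiv, 2))
print("trace similarity (equivalent rewrite):", round(trace_sim_equiv, 2))
print("trace similarity (divergent code):    ", round(trace_sim_div, 2))
```

In this sketch the equivalent rewrite differs from the reference as text but produces an identical call trace, while the divergent candidate produces a different trace. A real trace-based metric would also need to account for arguments, return values, and external dependencies, which this example ignores.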


Provider:

IEEE DataPort

Created:

2024-11-13
