Triple traversal sequences of AST.

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://figshare.com/articles/dataset/Triple_traversal_sequences_of_AST_/28791113

Resource description:

Software defect prediction uses known information about software to predict defects in a target project. Models are generally built on features such as software metrics, semantic information, and software networks. However, software structure is complex and samples are few. Without effective feature representation and extraction methods, these features cannot be fully exploited, which easily leads to misjudgments and reduced performance. In addition, a single feature cannot fully characterize the software structure. This research therefore proposes a new method for efficiently and accurately representing the Abstract Syntax Tree (AST), together with a model called MFA (Multi Features Attention). MFA uses a deformable attention mechanism to extract features and a self-attention mechanism to fuse semantic and network features. The authors ran cross-version and cross-project experiments on 21 Java projects, comparing MFA against multiple other models. In the cross-version scheme, the proposed model reaches an average ACC, F1, and AUC of 0.7, 0.614, and 0.711. In the cross-project scheme, the average ACC, F1, and AUC are 0.687, 0.575, and 0.696. These results are up to 41% better than those of the other models. Fused features also outperform any single feature, showing that MFA's extraction and fusion of two feature types gives it an advantage in prediction performance.
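The dataset title refers to "triple traversal sequences" of the AST, but this listing does not say which three traversals are used. The sketch below assumes pre-order, post-order, and breadth-first (level-order) traversals over a generic tree. The `Node` class, the function names, and the toy tree with its node-type labels are all hypothetical illustrations. They are not taken from the MFA paper or from the dataset files.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical generic AST node; `label` stands in for a node type."""
    label: str
    children: list["Node"] = field(default_factory=list)


def preorder(root: Node) -> list[str]:
    """Depth-first: emit a node before its children."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node.label)
        # Push children in reverse so the leftmost child is popped first.
        stack.extend(reversed(node.children))
    return out


def postorder(root: Node) -> list[str]:
    """Depth-first: emit a node after all of its children."""
    out: list[str] = []

    def visit(node: Node) -> None:
        for child in node.children:
            visit(child)
        out.append(node.label)

    visit(root)
    return out


def level_order(root: Node) -> list[str]:
    """Breadth-first: emit nodes level by level, left to right."""
    out, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        out.append(node.label)
        queue.extend(node.children)
    return out


def triple_traversal(root: Node) -> tuple[list[str], list[str], list[str]]:
    """Assumed 'triple traversal': pre-order, post-order and level-order sequences."""
    return preorder(root), postorder(root), level_order(root)


# Illustrative toy tree, roughly shaped like `if (x > 0) return x;`.
tree = Node("IfStatement", [
    Node("BinaryOperation", [Node("MemberReference"), Node("Literal")]),
    Node("ReturnStatement", [Node("MemberReference")]),
])

if __name__ == "__main__":
    for name, seq in zip(("pre", "post", "level"), triple_traversal(tree)):
        print(name, seq)
```

Each traversal linearizes the same tree in a different order. Feeding all three sequences to a model exposes parent-before-child, child-before-parent, and sibling-adjacency context that a single sequence would only partially capture. Whether MFA uses exactly these orders is not stated in this listing.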

Created:

2025-04-14