five

source code, data, surface plot

Figshare | Updated: 2023-04-19 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/source_code/22654951
Description:
Static analysis tools (SATs) are widely applied to detect defects in software projects. However, SATs report a large number of unactionable warnings, which severely reduces their usability. To address this problem, existing approaches commonly use machine learning (ML) techniques for actionable warning identification (AWI). For these ML-based AWI approaches, determining the warning features is one of the most critical steps in effectively identifying actionable warnings. To eliminate redundant and irrelevant warning features, ML-based AWI approaches usually incorporate feature selection, which determines the feature subset by calculating the importance of features or their correlation with warning labels. Nevertheless, warning labels are not always directly available in practice. Thus, it is both vital and challenging to select warning features for ML-based AWI approaches when warning labels are absent. To address this problem, we propose an UNsupervised fEAture SElection approach called UNEASE for ML-based AWI. (1) UNEASE first performs feature clustering to gather warning features into clusters, where the number of clusters is automatically determined and features in the same cluster are considered redundant. (2) Subsequently, UNEASE performs feature ranking to sort the warning features in each cluster with three newly proposed ranking strategies and selects the top-ranked warning feature from each cluster. Based on the selected features, we train an ML classifier to identify actionable warnings. We conduct experiments on nine large-scale, real-world warning datasets. Compared with seven typical feature selection techniques, UNEASE obtains the top-ranked AUC while incurring a low feature selection cost and maintaining a low redundancy rate in the selected warning features.
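The two-stage idea in the description (cluster redundant features, then keep the top-ranked feature of each cluster) can be illustrated with a minimal sketch. This is not the UNEASE implementation: the description does not name the clustering algorithm or the three ranking strategies, so the sketch assumes a correlation-threshold grouping (connected components) as the clustering step and plain variance as the ranking score. The function names, the `threshold` parameter, and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def variance(x):
    """Population variance, used here as a stand-in ranking score (assumption)."""
    m = sum(x) / len(x)
    return sum((a - m) ** 2 for a in x) / len(x)


def cluster_features(columns, threshold=0.9):
    """Group feature indices whose |correlation| >= threshold (assumed criterion).

    Clusters are the connected components of the "highly correlated" graph,
    so the number of clusters follows from the threshold and is not given
    up front. No labels are used anywhere.
    """
    n = len(columns)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def select_features(columns, threshold=0.9):
    """Keep the highest-variance feature from each cluster."""
    selected = []
    for cluster in cluster_features(columns, threshold):
        selected.append(max(cluster, key=lambda i: variance(columns[i])))
    return sorted(selected)


# Toy data, one list per feature: feature 1 is 2 * feature 0 (redundant),
# feature 2 is only weakly correlated with the other two.
features = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
selected = select_features(features)
print(selected)
```

On this toy data, features 0 and 1 fall into one cluster and feature 2 forms its own, so one feature is kept from each. A classifier would then be trained on the selected columns only. Swapping in a different clustering method or ranking score changes only `cluster_features` and the `key` function.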
Created:
2023-04-19