Machine Learning for Software Engineering: A Tertiary Study
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/5715474
Description:
Dataset of the research paper: Machine Learning for Software Engineering: A Tertiary Study
Machine learning (ML) techniques increase the effectiveness of software engineering (SE) lifecycle activities. We systematically collected, quality-assessed, summarized, and categorized 83 reviews in ML for SE published between 2009 and 2022, covering 6,117 primary studies. The SE areas most often tackled with ML are software quality and testing, while human-centered areas appear more challenging for ML. We propose a number of ML for SE research challenges and actions, including: conducting further empirical validation and industrial studies on ML; reconsidering deficient SE methods; documenting and automating data collection and pipeline processes; reexamining how industrial practitioners distribute their proprietary data; and implementing incremental ML approaches.
The following data and source files are included.
review-protocol.md: The protocol employed in this tertiary study
data/
    dl-search/
        input/
            acm_comput_surveys_overviews.bib: Surveys of ACM Computing Surveys journal
            acm_comput_surveys_overviews_titles.txt: Titles of surveys
            acm_comput_ml_surveys.bib: Machine learning (ML)-related surveys of ACM Computing Surveys journal
            acm_comput_ml_surveys_titles.txt: Titles of ML-related surveys
            dl_search_queries.txt: Search queries applied to IEEE Xplore, ACM Digital Library, and Elsevier Scopus
            ml_keywords.txt: ML-related keywords extracted from ML-related survey titles and used in the search queries
            se_keywords.txt: Software Engineering (SE)-related keywords derived from the 15 SWEBOK Knowledge Areas (KAs), except for Computing Foundations, Mathematical Foundations, and Engineering Foundations, and used in the search queries
            secondary_studies_keywords.txt: Survey-related keywords drawn from the 15 keywords introduced in the tertiary study on SLRs in SE by Kitchenham et al. (2010) and from the survey titles, and used in the search queries
        output/
            acm/
                acm{1–9}.bib: Search results from ACM Digital Library
            ieee.csv: Search results from IEEE Xplore
            scopus_analyze_year.csv: Yearly distribution of ML and SE documents extracted from Scopus's Analyze search results page
            scopus.csv: Search results from Scopus
    study-selection/
        backward_snowballing.csv: Additional secondary studies found through the backward snowballing process
        backward_snowballing_references.csv: References of quality-accepted secondary studies
        cohen_kappa_agreement.csv: Inter-rater reliability of reviewers in study selection
        dl_search_results.csv: Aggregated search results of all three digital libraries
        forward_snowballing_reviewer_{1,2}.csv: Forward snowballing citations of quality-accepted studies, split between reviewers 1 and 2 and assessed against the inclusion/exclusion criteria (IC/EC)
        study_selection_reviewer_{1,2}.csv: Search results, split between reviewers 1 and 2 and assessed against the IC/EC
    quality-assessment/
        dare_assessment.csv: Quality assessment (QA) of selected secondary studies based on the Database of Abstracts of Reviews of Effects (DARE) criteria by the Centre for Reviews and Dissemination, University of York
        quality_accepted_studies.csv: Details of quality-accepted studies
        studies_for_review.bib: Bibliography details and QA scores of selected secondary studies
    data-extraction/
        further_research.csv: Recommendations for further research of quality-accepted studies
        further_research_general.csv: The complete list of associated studies for each general recommendation
        knowledge_areas.csv: Classification of quality-accepted studies using the SWEBOK KAs and subareas
        ml_techniques.csv: Classification of the quality-accepted studies based on a four-axis ML classification scheme, along with extracted ML techniques employed in the studies
        primary_studies.csv: Details of the primary studies reviewed by the quality-accepted secondary studies
        research_methods.csv: Citations of the research methods employed by the quality-accepted studies
        research_types_methods.csv: Research types and methods employed by the quality-accepted studies
src/
    data-analysis.ipynb: Analysis of data extraction results (data preprocessing, top authors and institutions, study types, yearly distribution of publishers, QA scores, and SWEBOK KAs) and creation of all figures included in the study
    scopus-year-analysis.ipynb: Yearly distribution of ML and SE publications retrieved from Elsevier Scopus
    study-selection-preprocessing.ipynb: Processing of digital library search results to conduct the inter-rater reliability estimation and study selection process
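
The listing mentions an inter-rater reliability estimate (cohen_kappa_agreement.csv) but does not describe how it was computed or how the CSV is laid out. As a rough illustration only, the standard two-rater Cohen's kappa can be sketched as follows. The function name, the "include"/"exclude" labels, and the sample decisions are all made up for this example and are not taken from the dataset or its notebooks.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items.

    Hypothetical helper for illustration; not the study's own code.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty item list")

    n = len(ratings_a)

    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over labels.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Degenerate case: both raters used one identical label for every item.
    # Kappa is undefined here; returning 1.0 is a convention assumed for this sketch.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Made-up include/exclude decisions for six candidate studies.
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "include", "include", "exclude", "exclude"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

With real data, the two lists would come from the subset of search results that both reviewers assessed; which files and columns hold those decisions would need to be checked against the archive itself.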
Created:
2022-09-16



