Flaky Test Dataset for "Exploring usefulness of selected Large Language Models for detecting flaky tests"
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/15032339
Description:
Flaky tests yield inconsistent results without code changes and significantly impact software development costs, emphasizing the importance of effective detection methods. Despite various research efforts over the past 15 years, existing techniques often show limited precision and scope. This study explores the use of Large Language Models (LLMs), originally conceived for natural language processing, to detect flaky tests in software. Using the International Dataset of Flaky Tests, we asked commercially available LLMs, including GPT and Gemini, to statically classify Java test cases as flaky or non-flaky. Our results show that LLMs were unable to consistently identify flaky tests, indicating the need for alternative detection strategies. This research underscores the challenges of adapting LLMs to flaky test detection and highlights the ongoing requirement for more effective solutions.
This archive contains the data we used for this experiment along with the artifacts obtained during our work on it. The ZIP file contains a description of the individual files in the package, which includes both the data and the Python scripts we used.
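Since the package ships as a single ZIP with its own file description inside, a quick way to get oriented after downloading is to list what the archive holds before extracting anything. The sketch below is only an illustration using Python's standard `zipfile` module; the function name `list_archive` is hypothetical and is not one of the scripts included in the dataset.

```python
import zipfile


def list_archive(path):
    """Return (name, size_in_bytes) pairs for every file in the ZIP at `path`.

    Illustrative helper only; `path` is wherever the downloaded archive
    was saved locally.
    """
    with zipfile.ZipFile(path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries, keep files
        ]
```

Calling `list_archive` on the local copy of the archive returns the member names and sizes, which makes it easy to locate the description file mentioned above and to tell the data files apart from the Python scripts.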
Created:
2025-03-15



