
Flaky Test Dataset for "Exploring usefulness of selected Large Language Models for detecting flaky tests"

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/15032339
Description:
Flaky tests pass or fail inconsistently across runs without any change to the code. They significantly increase software development costs, which makes effective detection methods important. Despite various research efforts over the past 15 years, existing techniques often show limited precision and scope. This study explores the use of Large Language Models (LLMs), originally conceived for natural language processing, to detect flaky tests in software. Using the International Dataset of Flaky Tests, we asked commercially available LLMs, including GPT and Gemini, to statically classify Java test cases (that is, from their source code, without executing them) as flaky or non-flaky. Our results show that the LLMs were unable to identify flaky tests consistently, indicating the need for alternative detection strategies. This research underscores the challenges of adapting LLMs to flaky test detection and highlights the ongoing need for more effective solutions. This archive contains the data used in the experiment together with the artifacts produced during the work. The ZIP file includes a description of each file in the package, covering both the data and the Python scripts we used.
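The description does not say how the LLM verdicts were scored against the dataset's labels. As a minimal sketch, assuming each model answer has already been reduced to a flaky/non-flaky boolean per test case, a comparison with the ground-truth labels could compute standard classification metrics as below. The `evaluate` function, the `Metrics` type, and the test identifiers are hypothetical illustrations and are not taken from the archive's Python scripts.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float
    accuracy: float


def evaluate(labels: dict[str, bool], predictions: dict[str, bool]) -> Metrics:
    """Score LLM verdicts against ground-truth labels.

    Hypothetical format: both dicts map a test identifier to True for
    'flaky' and False for 'non-flaky'. Only tests present in both dicts
    are scored; 'flaky' is treated as the positive class.
    """
    common = labels.keys() & predictions.keys()
    tp = sum(1 for t in common if labels[t] and predictions[t])
    fp = sum(1 for t in common if not labels[t] and predictions[t])
    fn = sum(1 for t in common if labels[t] and not predictions[t])
    tn = sum(1 for t in common if not labels[t] and not predictions[t])

    # Guard every ratio against division by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(common) if common else 0.0
    return Metrics(precision, recall, f1, accuracy)


# Usage with made-up identifiers: one hit, one false alarm,
# one missed flaky test, one correct non-flaky verdict.
labels = {"A.testX": True, "A.testY": False, "B.testZ": True, "B.testW": False}
preds = {"A.testX": True, "A.testY": True, "B.testZ": False, "B.testW": False}
m = evaluate(labels, preds)
print(m)
```

Treating "flaky" as the positive class keeps precision focused on false alarms, which matter most when a detector is used to quarantine tests; accuracy alone can look good on a dataset dominated by non-flaky tests.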
Created: 2025-03-15