
Dataset for Evaluating Software Engineering Quality Metrics on LLM-Generated Code

Zenodo | Updated 2026-02-09 | Indexed 2026-05-26
Download link:
https://zenodo.org/doi/10.5281/zenodo.18210383
Description:
This dataset supports the empirical study “From Correctness to Code Quality: Formalizing Software Engineering Metrics for Evaluating General LLMs,” accepted at ISEC 2026.

The dataset contains 240 code solutions generated by four general-purpose large language models (ChatGPT, Gemini, LLaMA, and DeepSeek) across 15 publicly available LeetCode problems. Each problem is solved in four programming languages: C++, Java, C#, and Python.

For each solution, the dataset provides functional correctness results from the LeetCode online judge, runtime and memory usage statistics, and five software engineering (SE) quality metrics: (i) Error Handling Score (EHS), (ii) Input Validation Score (IVS), (iii) Maintainability Score (MS), (iv) Style & Structure Score (S3), and (v) Documentation Score (DS).

All code solutions were generated using zero-shot prompting with fixed decoding parameters and without manual modification. The dataset is intended to support reproducible evaluation of LLM-generated code beyond correctness, emphasizing robustness, maintainability, and documentation quality.

This release is designed for research and academic use. Runtime and memory measurements depend on the execution environment of the LeetCode platform and may vary over time.
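The stated size follows from the study design: 4 models × 15 problems × 4 languages = 240 solutions. The short Python sketch below enumerates that grid, which may be handy as a completeness check when loading the data. The model, language, and metric names come from the description above; the problem identifiers (`problem_01` and so on) and the function name are hypothetical placeholders, since the actual LeetCode problems and the file layout are not listed here.

```python
from itertools import product

# Names taken from the dataset description.
MODELS = ["ChatGPT", "Gemini", "LLaMA", "DeepSeek"]
LANGUAGES = ["C++", "Java", "C#", "Python"]
METRICS = ["EHS", "IVS", "MS", "S3", "DS"]

# Hypothetical placeholders: the real LeetCode problem identifiers
# are not given in the description.
PROBLEMS = [f"problem_{i:02d}" for i in range(1, 16)]


def expected_solutions():
    """Return every (model, problem, language) combination in the study design."""
    return list(product(MODELS, PROBLEMS, LANGUAGES))


grid = expected_solutions()
print(len(grid))  # 240
```

Comparing this grid against the (model, problem, language) keys actually present in the downloaded files would flag any missing or duplicated solution.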
Provider:
Zenodo
Created:
2026-01-11