Dataset for Evaluating Software Engineering Quality Metrics on LLM-Generated Code

Name: Dataset for Evaluating Software Engineering Quality Metrics on LLM-Generated Code
Creator: Zenodo
Published: 2026-02-09 20:00:48
License: No description provided

Source: Zenodo; updated 2026-02-09; indexed 2026-05-26

Download link:

https://zenodo.org/doi/10.5281/zenodo.18210383

Description:

This dataset supports the empirical study “From Correctness to Code Quality: Formalizing Software Engineering Metrics for Evaluating General LLMs,” accepted at ISEC 2026. It contains 240 code solutions generated by four general-purpose large language models (ChatGPT, Gemini, LLaMA, and DeepSeek) across 15 publicly available LeetCode problems. Each problem is solved in four programming languages: C++, Java, C#, and Python.

For each solution, the dataset provides:
- functional correctness results from the LeetCode online judge;
- runtime and memory usage statistics;
- five software engineering (SE) quality metrics: (i) Error Handling Score (EHS), (ii) Input Validation Score (IVS), (iii) Maintainability Score (MS), (iv) Style & Structure Score (S3), and (v) Documentation Score (DS).

All code solutions were generated using zero-shot prompting with fixed decoding parameters and were not manually modified. The dataset is intended to support reproducible evaluation of LLM-generated code beyond correctness, emphasizing robustness, maintainability, and documentation quality. This release is designed for research and academic use. Runtime and memory measurements depend on the execution environment of the LeetCode platform and may vary over time.
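The figure of 240 solutions follows from a full factorial design: 4 models × 15 problems × 4 languages. The minimal Python sketch below, assuming a flat one-record-per-solution layout, enumerates that grid and lists the fields each record would carry. The `SolutionRecord` class, its field names and units, and the identifiers problem_01 to problem_15 are hypothetical placeholders; the description does not specify the release's actual file format or name the 15 problems.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

# Models and languages as listed in the dataset description.
MODELS = ["ChatGPT", "Gemini", "LLaMA", "DeepSeek"]
LANGUAGES = ["C++", "Java", "C#", "Python"]
# Hypothetical placeholders: the description mentions 15 LeetCode problems
# but does not name them.
PROBLEMS = [f"problem_{i:02d}" for i in range(1, 16)]


@dataclass
class SolutionRecord:
    """Hypothetical per-solution record; field names, types, and units are
    assumptions, not the dataset's actual schema."""
    model: str
    problem: str
    language: str
    passed: Optional[bool] = None        # LeetCode online-judge verdict
    runtime_ms: Optional[float] = None   # platform-dependent, may vary over time
    memory_mb: Optional[float] = None    # platform-dependent, may vary over time
    ehs: Optional[float] = None          # Error Handling Score
    ivs: Optional[float] = None          # Input Validation Score
    ms: Optional[float] = None           # Maintainability Score
    s3: Optional[float] = None           # Style & Structure Score
    ds: Optional[float] = None           # Documentation Score


def build_grid() -> list[SolutionRecord]:
    """Create one empty record per (model, problem, language) cell."""
    return [
        SolutionRecord(model, problem, language)
        for model, problem, language in product(MODELS, PROBLEMS, LANGUAGES)
    ]


grid = build_grid()
print(len(grid))  # 4 models x 15 problems x 4 languages = 240
```

Under this layout, each model contributes 60 records (15 problems × 4 languages) and each language also contributes 60 (4 models × 15 problems), which makes per-model or per-language comparisons balanced.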

Provider:

Zenodo

Created:

2026-01-11