NAACL'2025 Artifact
Source: DataCite Commons. Updated: 2026-03-18. Indexed: 2024-11-06.
Download link:
https://figshare.com/articles/dataset/NAACL_2025_Artifact/27241386/1
Description:
Large Language Models (LLMs) have shown impressive capabilities in generating code, yet they often produce hallucinations—unfounded or incorrect outputs—that compromise the functionality of the generated code. This study investigates the application of local uncertainty quantification methods to detect hallucinations at the line level in code generated by LLMs. We focus on evaluating these methods in the context of two prominent code generation tasks, HumanEval and MBPP. We experiment with both open-source and black-box models. For each model, we generate code, calculate line-level uncertainty scores using various uncertainty quantification methods, and assess the correlation of these scores with the presence of hallucinations as identified by test case failures. Our empirical results are evaluated using metrics such as AUROC and AUPR to determine the effectiveness of these methods in detecting hallucinations, providing insights into their reliability and practical utility in enhancing the accuracy of code generation by LLMs.
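The evaluation loop described above (score each generated line, then check how well the scores separate hallucinated lines from correct ones) can be illustrated with a minimal sketch. It is not the artifact's code. It assumes that per-token log-probabilities are available for each line, uses mean negative log-probability as a stand-in for whichever uncertainty methods the study actually applies, and assumes hypothetical binary line labels (1 = hallucinated). The names `line_uncertainty` and `auroc` are illustrative only.

```python
def line_uncertainty(token_logprobs):
    """Mean negative log-probability of the tokens on one line.

    Assumption: this simple aggregate stands in for the study's
    uncertainty quantification methods, which are not specified here.
    """
    if not token_logprobs:
        return 0.0
    return -sum(token_logprobs) / len(token_logprobs)


def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: token log-probabilities for four generated lines,
# and made-up labels marking which lines are hallucinated (1) or not (0).
lines = [[-0.1, -0.2], [-2.0, -1.5, -1.0], [-0.05], [-0.9, -1.1]]
labels = [0, 1, 0, 1]

scores = [line_uncertainty(lp) for lp in lines]
print(auroc(scores, labels))  # 1.0: every hallucinated line scores higher
```

An AUROC of 0.5 would mean the scores carry no ranking signal, while values near 1.0 mean uncertain lines are reliably the hallucinated ones. The pairwise loop is quadratic in the number of lines, which is fine for illustration but would be replaced by a rank-based computation on larger data.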
Provider:
figshare
Created:
2024-10-16



