Correct-ish by Design: From Upfront Verification to Continuous Monitoring of LLM Generated Code

Name: Correct-ish by Design: From Upfront Verification to Continuous Monitoring of LLM Generated Code
Creator: Root
Published: 2025-05-11 09:07:20
License: No description available

DataCite Commons | Updated 2025-05-11 | Indexed 2025-05-17

Download link:

http://dataverse.jpl.nasa.gov/citation?persistentId=doi:10.48577/jpl.JMLST3

Description:

As developers increasingly rely on Large Language Models (LLMs) to generate code, the pace of software development is accelerating beyond the capabilities of traditional design-time verification and testing methods. We predict a paradigm shift towards continuous monitoring to complement and eventually supersede upfront verification. By embracing a “correct-ish by design” philosophy, we acknowledge the inevitability of imperfections in LLM-generated code. We advocate for an adaptive approach where real-time monitoring and feedback mechanisms are employed to detect, diagnose, and rectify issues as they emerge in the field. This continuous monitoring strategy not only ensures sustained software reliability and performance but also provides valuable insights into LLM behavior, facilitating iterative improvements. We experiment with the integration of automated monitoring and testing for managing the complexities of LLM-driven code generation in a rapidly evolving software landscape.
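The detect-and-rectify loop described above can be illustrated with a minimal sketch. It is not taken from the dataset or the authors' experiments. The `monitored` decorator, the `generated_sort` function (standing in for LLM-generated code with a deliberate bug), the `trusted_sort` fallback, and the `violations` list are all hypothetical names introduced here for illustration only.

```python
import functools
import logging
from collections import Counter

logger = logging.getLogger("runtime_monitor")

# Hypothetical in-memory record of detected issues; a real deployment
# would more likely ship these to a telemetry backend.
violations = []


def monitored(postcondition, fallback):
    """Check every result at run time; on failure, record it and use the fallback."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                # Detect: the generated code crashed.
                violations.append((func.__name__, args, repr(exc)))
                logger.warning("%s raised %r", func.__name__, exc)
                return fallback(*args, **kwargs)
            if not postcondition(result, *args, **kwargs):
                # Detect: the generated code returned a wrong answer.
                violations.append((func.__name__, args, "postcondition failed"))
                logger.warning("%s violated its postcondition", func.__name__)
                # Rectify: serve the trusted result instead.
                return fallback(*args, **kwargs)
            return result
        return wrapper
    return decorate


def is_sorted_permutation(result, items):
    """Postcondition: output is in order and has exactly the input's elements."""
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return in_order and Counter(result) == Counter(items)


def trusted_sort(items):
    """Assumed-correct reference implementation used as the fallback."""
    return sorted(items)


@monitored(is_sorted_permutation, trusted_sort)
def generated_sort(items):
    # Stand-in for LLM-generated code: it silently drops duplicates.
    return sorted(set(items))
```

Calling `generated_sort([2, 1, 2])` triggers the monitor, because the buggy body drops the duplicate; the caller still receives the fallback's result, and the entry appended to `violations` is the kind of field feedback that could inform later regeneration or repair of the code.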

Provider:

Root

Created:

2025-05-11
