SecuCoGen
Source: arXiv. Updated: 2023-10-25. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2310.16263v1
Resource description:
The SecuCoGen dataset, created by South China University of Technology, targets secure code generation in Python. It contains 180 samples covering 21 of the most dangerous software weaknesses in the 2023 CWE (Common Weakness Enumeration) list. Each sample has six attributes, which describe the vulnerability type, the insecure code, the secure code, and the accompanying explanations. The dataset is intended to improve the security of code generated by large language models (LLMs), since existing models often overlook security when generating code. Its main application is code security in software engineering: generated code should be functionally correct and also free of exploitable vulnerabilities.
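To make the idea of an insecure/secure code pair concrete, here is a minimal Python sketch for one well-known weakness, SQL injection (CWE-89). It is an illustration only and is not taken from the dataset. The function names, table layout, and payload are hypothetical, and whether SecuCoGen includes this particular weakness or lays out its samples this way is an assumption.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    conn.commit()
    return conn


def find_user_insecure(conn, name):
    # Insecure (CWE-89): user input is concatenated into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_secure(conn, name):
    # Safer: the value is bound through a placeholder and is never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print("insecure:", find_user_insecure(conn, payload))
    print("secure:  ", find_user_secure(conn, payload))
```

With the payload above, the concatenated query matches every row, while the parameterized query treats the whole string as a literal name and matches none.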
Provider:
South China University of Technology, China
Created:
2023-10-25



