Lyra
arXiv | Updated 2022-07-24 | Indexed 2024-06-21
Download link:
https://github.com/LIANGQINGYUAN/Lyra
Resource overview:
The Lyra dataset was created by the Key Laboratory of High Confidence Software Technology, Ministry of Education, Peking University. It contains 2,000 carefully annotated database operation programs, all drawn from real-world projects and written in Python with embedded SQL. Each program is paired with a Chinese and an English comment, and the goal is to generate the imperative-language program from the natural-language comment. To build the dataset, code snippets were crawled from GitHub, then manually revised and annotated so that each program is correct and self-contained. Lyra is aimed at code generation, specifically the problem of generating imperative-language code that embeds a declarative language, a setting that poses both challenges and opportunities for improving the efficiency of practical software development.
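To make "Python with embedded SQL" concrete, here is a minimal sketch of what a comment-to-program pair of this kind might look like. It is a hypothetical illustration, not a sample from Lyra: the comment text, the function name `get_names_older_than`, the `users` table, and the use of the standard-library `sqlite3` module are all assumptions chosen so that the example runs on its own.

```python
import sqlite3

# Hypothetical natural-language comment (the generation input):
# "Using conn, query the users table for all names whose age is greater
#  than min_age, and return the names as a list."


def get_names_older_than(conn, min_age):
    # Imperative Python wrapping a declarative SQL statement.
    # The value is passed as a bound parameter, not formatted into the string.
    cursor = conn.execute(
        "SELECT name FROM users WHERE age > ? ORDER BY name", (min_age,)
    )
    return [row[0] for row in cursor.fetchall()]


def demo():
    # Build a throwaway in-memory database so the function can be exercised.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("ann", 31), ("bob", 17), ("cy", 45)],
    )
    names = get_names_older_than(conn, 18)
    conn.close()
    return names
```

Calling `demo()` runs the function against three made-up rows. A generator working on such a pair has to get both the Python control flow and the SQL string right, which is the mixed-language difficulty the overview describes.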
Provided by:
Key Laboratory of High Confidence Software Technology, Ministry of Education (Peking University)
Created:
2021-08-27



