AIprogrammer
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/TechxGenus/CursorCore
Description:
This dataset is synthesized by LLMs, closely follows the format required by APEval, and is intended primarily for programming assistance tasks. It is produced by a data generation pipeline that extracts diverse information from the programming process and uses it to generate samples. The dataset contains approximately 219,000 generated samples and is designed for training models on tasks such as code completion and code editing.
Provided by:
TechxGenus



