project-droid/DroidCollection-Personas
Source: Hugging Face. Updated: 2025-06-17. Indexed: 2025-11-01.
Download link:
https://hf-mirror.com/datasets/project-droid/DroidCollection-Personas
Description:
This dataset contains synthetic programming tasks and the corresponding code samples, both generated from diverse, machine-created programmer personas. Unlike typical AI-generated content datasets, which rely on human-written prompts or completions, this collection does not condition on prior human generations, with the aim of reducing bias in synthetic data creation. Realistic programmer profiles are built by defining nine key features of a programmer, such as primary programming language, field of work, and code commenting style. GPT-4o then generates non-trivial coding tasks from these profiles. The tasks are deduplicated and used as prompts for code generation.
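The persona-to-prompt pipeline described above can be illustrated with a minimal Python sketch. It is not the dataset authors' code. The feature values, the function names, the prompt wording, and the whitespace- and case-insensitive exact-match deduplication are all assumptions made for illustration; only three of the nine features are named in the description, so only those three appear here, and the GPT-4o call itself is omitted.

```python
import hashlib
import random

# Hypothetical values for the three persona features named in the description.
# The remaining six features are not listed in the source and are left out.
PERSONA_FEATURES = {
    "primary_language": ["Python", "Java", "Go"],
    "field_of_work": ["web development", "embedded systems", "data science"],
    "comment_style": ["minimal", "detailed docstrings", "inline only"],
}


def sample_persona(rng):
    """Draw one value per feature to form a programmer persona."""
    return {name: rng.choice(values) for name, values in PERSONA_FEATURES.items()}


def persona_to_prompt(persona):
    """Render a persona as a task-generation request (illustrative wording)."""
    traits = "; ".join(
        f"{name.replace('_', ' ')}: {value}" for name, value in persona.items()
    )
    return f"Propose a non-trivial coding task for a programmer with these traits: {traits}."


def normalize(task):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(task.lower().split())


def deduplicate(tasks):
    """Keep the first occurrence of each task, compared after normalization."""
    seen = set()
    unique = []
    for task in tasks:
        key = hashlib.sha256(normalize(task).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(task)
    return unique


if __name__ == "__main__":
    rng = random.Random(0)
    print(persona_to_prompt(sample_persona(rng)))
    print(deduplicate(["Build a cache", "build  a CACHE", "Parse logs"]))
```

Hash-based exact matching is the simplest form of deduplication; the description does not say which method was actually used, and a near-duplicate method could be swapped into `deduplicate` without changing the rest of the sketch.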
Provider:
project-droid



