code_evaluation_prompts
ModelScope Community · Updated 2025-11-07 · Indexed 2025-02-15
Download link:
https://modelscope.cn/datasets/HuggingFaceH4/code_evaluation_prompts
# Dataset Card for H4 Code Evaluation Prompts
This is a filtered set of prompts for evaluating code instruction models.
It covers a variety of programming languages and task types.
The prompts were generated with ChatGPT (GPT-3.5-turbo), so we encourage using them only for qualitative evaluation, not for training your models.
The data was generated with a process similar to [CodeAlpaca](https://github.com/sahil280114/codealpaca#data-generation-process), which you can download [here](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k), but we intend to make these tasks both
a) more challenging, and
b) more curated.
Together, these two properties should make for a meaningful evaluation, but there is not enough data to train an entire model.
The data consists of:
* 20 simple Python instruction-following prompts,
* 20 intermediate Python instruction-following prompts,
* 10 advanced Python instruction-following prompts,
* 15 Python machine learning questions,
* 20 C++ instruction-following prompts,
* 10 HTML instruction-following prompts,
* 20 code feedback questions in miscellaneous languages.
Or, on a per-language basis:
* Python: 81
* C++: 21
* HTML: 10
* Ruby: 1
* Bash: 1
* MATLAB: 1
* React: 1
* Scala: 1
* JavaScript: 1
* Java: 1
* PHP: 1
Or, per instruction type:
* Code completion / instruction following: 95
* Bug fixing: 20
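The three breakdowns above can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the figures stated in this card (the dictionary keys are shorthand labels chosen for illustration): the category list and the instruction-type list each total 115 prompts, and the six instruction-following categories add up to the 95 code completion / instruction-following prompts, while the per-language list totals 120. The snippet does not explain that difference; it just reproduces the sums.

```python
# Counts copied from the breakdowns in this card; keys are shorthand labels.
by_category = {
    "simple Python": 20,
    "intermediate Python": 20,
    "advanced Python": 10,
    "Python machine learning": 15,
    "C++": 20,
    "HTML": 10,
    "misc-language code feedback": 20,
}
by_language = {
    "Python": 81, "C++": 21, "HTML": 10, "Ruby": 1, "Bash": 1,
    "MATLAB": 1, "React": 1, "Scala": 1, "JavaScript": 1, "Java": 1, "PHP": 1,
}
by_type = {
    "Code completion / instruction following": 95,
    "Bug fixing": 20,
}

# Every category except the feedback questions is instruction following.
instruction_following = sum(
    n for name, n in by_category.items() if name != "misc-language code feedback"
)

# Grand total implied by each breakdown.
totals = {
    "category": sum(by_category.values()),
    "language": sum(by_language.values()),
    "type": sum(by_type.values()),
}

for name, total in totals.items():
    print(f"{name}: {total}")
print(f"instruction-following categories: {instruction_following}")
```

For current, authoritative numbers, the card's own loading snippet is the better source, since it reads the counts from the dataset itself.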
To get up-to-date counts for the tasks, you can use the following snippet:
```python
from collections import Counter

from datasets import load_dataset

# Download the dataset from the Hugging Face Hub.
d = load_dataset("HuggingFaceH4/code_evaluation_prompts")

# Count how many prompts there are for each language.
language_count = Counter(d["train"]["language"])
```
Similar code can be used to count prompts by instruction type (code generation vs. bug-fixing feedback).
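One way to reuse the same counting logic for any column is a small helper that tallies the values of a given field across rows. This is a minimal sketch: the helper name `count_by`, the toy rows, and especially the column name `type` are hypothetical placeholders, not confirmed parts of this dataset's schema.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by(rows: Iterable[Mapping[str, str]], field: str) -> Counter:
    """Tally how many rows have each value of `field`.

    The field name "type" used below is a hypothetical placeholder,
    not a confirmed column of this dataset.
    """
    return Counter(row[field] for row in rows)


# Toy rows standing in for dataset records (illustrative only).
rows = [
    {"language": "Python", "type": "instruction following"},
    {"language": "Python", "type": "bug fixing"},
    {"language": "C++", "type": "instruction following"},
]

print(count_by(rows, "type"))      # Counter({'instruction following': 2, 'bug fixing': 1})
print(count_by(rows, "language"))  # Counter({'Python': 2, 'C++': 1})
```

With the dataset loaded as in the snippet above, `count_by(d["train"], field)` should behave the same way, since iterating a `datasets.Dataset` yields one dict per row; check `d["train"].column_names` for the real name of the instruction-type column before using it.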
Interested in contributing? Open a PR with a specific language and question content.
Below are the ChatGPT prompts (May 3, 2023 version) used to generate the responses, which were then filtered:
* Generate a bunch of instructions for coding questions in python (in the format of {"prompt": instruction})
* These have been useful, can you generate the last few that are the hardest and most Pythonic that you can think of?
* Taking a step back, can you generate 20 for me that don't need to be as hard, but are machine learning focused (e.g. a mix of PyTorch and Jax).
* Generate a bunch of instructions for coding questions in C++ (in the format of {"prompt": instruction})
* Can you generate 5 examples of instructions, with the same format {"prompt": text}, where the instruction has a piece of code with a bug, and you're asking for feedback on your code as if you wrote it?
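Several of these prompts ask ChatGPT to answer in the format `{"prompt": instruction}`. The card does not describe how the responses were filtered, so the following is only a hypothetical first pass: it parses one JSON object per line with Python's standard `json` module and drops malformed lines, lines without a `"prompt"` string, and exact duplicates. The function name `extract_prompts` and the one-object-per-line assumption are illustrative, not part of the dataset's tooling.

```python
import json
from typing import List


def extract_prompts(raw_output: str) -> List[str]:
    """Collect instructions from lines shaped like {"prompt": "..."}.

    Hypothetical filter: skips non-JSON lines, records without a string
    "prompt" field, empty prompts, and exact duplicates (after stripping).
    """
    prompts: List[str] = []
    seen = set()
    for line in raw_output.splitlines():
        # Tolerate trailing commas left over from list-style output.
        line = line.strip().rstrip(",")
        if not line.startswith("{"):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        text = record.get("prompt") if isinstance(record, dict) else None
        if not isinstance(text, str):
            continue
        text = text.strip()
        if text and text not in seen:
            seen.add(text)
            prompts.append(text)
    return prompts


sample = """Here are some instructions:
{"prompt": "Write a function that reverses a linked list."}
{"prompt": "Implement binary search in Python."},
{"prompt": "Write a function that reverses a linked list."}
{"prompt": unquoted text}
"""

print(extract_prompts(sample))
# ['Write a function that reverses a linked list.', 'Implement binary search in Python.']
```

Any real curation (difficulty, correctness, topical balance) would still need manual review on top of a mechanical pass like this.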
Provided by:
maas
Created:
2025-02-10



