code_evaluation_prompts

Name: code_evaluation_prompts
Creator: maas
Published: 2025-11-07 16:22:56
License: no description available

ModelScope community, updated 2025-11-07, indexed 2025-02-15

Download link:

https://modelscope.cn/datasets/HuggingFaceH4/code_evaluation_prompts

Description:

# Dataset Card for H4 Code Evaluation Prompts

These are a filtered set of prompts for evaluating code instruction models. The set covers a variety of languages and task types. The prompts were generated with ChatGPT (GPT-3.5-turbo), so we encourage using them only for qualitative evaluation and not for training your models.

The generation of this data is similar to something like [CodeAlpaca](https://github.com/sahil280114/codealpaca#data-generation-process), which you can download [here](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k), but we intend to make these tasks both a) more challenging, and b) more curated. Together these two properties should give a meaningful evaluation, but the set is not enough data to train an entire model.

The data corresponds to the following:
* 20 simple Python instruction following,
* 20 intermediate Python instruction following,
* 10 advanced Python instruction following,
* 15 Python machine learning questions,
* 20 C++ instruction following,
* 10 HTML instruction following,
* 20 misc language code feedback questions.

Or, on a per language basis:
* Python: 81
* C++: 21
* HTML: 10
* Ruby: 1
* Bash: 1
* MATLAB: 1
* React: 1
* Scala: 1
* JavaScript: 1
* Java: 1
* PHP: 1

Or, per instruction type:
* Code completion / instruction following: 95
* Bug fixing: 20

To get the current information on the tasks, you can use the following snippet:

```python
from collections import Counter

from datasets import load_dataset

# Download the prompts from the Hugging Face Hub.
d = load_dataset("HuggingFaceH4/code_evaluation_prompts")
# Tally how many prompts exist per language.
language_list = d['train']['language']
language_count = dict(Counter(language_list))
```

Similar code can be run for the type of instruction (code generation vs. bug advice).

Interested in contributing? Open a PR with a specific language and question content.
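The per-language tally can be generalized to any column. The following is a minimal sketch and not part of the original card: `count_column` is a hypothetical helper, the rows are made-up stand-ins so the example runs offline, and the column name `"type"` is an assumption (check `d['train'].column_names` for the actual name).

```python
from collections import Counter


def count_column(rows, column):
    """Return {value: count} for one column of an iterable of dict-like rows."""
    return dict(Counter(row[column] for row in rows))


# Hypothetical stand-in rows so the sketch runs without downloading anything.
# The "type" column name is an assumption, not confirmed by the dataset card.
sample_rows = [
    {"language": "Python", "type": "completion"},
    {"language": "Python", "type": "bugs"},
    {"language": "C++", "type": "completion"},
]

language_count = count_column(sample_rows, "language")
type_count = count_column(sample_rows, "type")
print(language_count)
print(type_count)
```

With the real dataset, `count_column(d['train'], "language")` should give the same kind of tally, since iterating over a `datasets` split yields one dict per row.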
Here are the ChatGPT prompts used to initiate the responses (which are then filtered), May 3rd 2023 version:
* Generate a bunch of instructions for coding questions in python (in the format of `{"prompt": instruction}`)
* These have been useful, can you generate the last few that are the hardest and most Pythonic that you can think of?
* Taking a step back, can you generate 20 for me that don't need to be as hard, but are machine learning focused (e.g. a mix of PyTorch and Jax).
* Generate a bunch of instructions for coding questions in C++ (in the format of `{"prompt": instruction}`)
* Can you generate 5 examples of instructions, with the same format `{"prompt": text}`, where the instruction has a piece of code with a bug, and you're asking for feedback on your code as if you wrote it?
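The prompts above ask for records shaped like `{"prompt": instruction}`. Below is a minimal sketch of collecting such records from raw model output, assuming one JSON object per line. The card does not describe its filtering step, so the rules used here (skip malformed lines, drop empty and duplicate prompts) and the name `parse_prompt_records` are illustrative assumptions only.

```python
import json


def parse_prompt_records(raw_text):
    """Collect unique, non-empty prompts from text with one JSON object per line."""
    prompts = []
    seen = set()
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip chatter or anything else that is not valid JSON.
        if not isinstance(record, dict):
            continue
        prompt = record.get("prompt")
        if not isinstance(prompt, str):
            continue
        prompt = prompt.strip()
        if prompt and prompt not in seen:
            seen.add(prompt)
            prompts.append(prompt)
    return prompts


# Hypothetical model output, not taken from the dataset.
raw = """
{"prompt": "Write a function that reverses a string."}
Sure, here are some more:
{"prompt": "Write a function that reverses a string."}
{"prompt": ""}
{"prompt": "Implement a binary search over a sorted list."}
"""
parsed = parse_prompt_records(raw)
print(parsed)
```

Any real curation pass would likely add manual review on top of mechanical checks like these.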

Provider:

maas

Created:

2025-02-10
