
code_evaluation_prompts

ModelScope community · Updated 2025-11-07 · Indexed 2025-02-15
Download link:
https://modelscope.cn/datasets/HuggingFaceH4/code_evaluation_prompts
Resource description:
# Dataset Card for H4 Code Evaluation Prompts

These are a filtered set of prompts for evaluating code instruction models. It will contain a variety of languages and task types. Currently, the prompts are generated with ChatGPT (GPT-3.5-turbo), so we encourage using them only for qualitative evaluation and not to train your models.

The generation of this data is similar to something like [CodeAlpaca](https://github.com/sahil280114/codealpaca#data-generation-process), which you can download [here](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k), but we intend to make these tasks both a) more challenging and b) more curated. These two things hopefully give a meaningful evaluation, but the set is not enough data to train an entire model.

The data corresponds to the following:
* 20 simple Python instruction following,
* 20 intermediate Python instruction following,
* 10 advanced Python instruction following,
* 15 Python machine learning questions,
* 20 C++ instruction following,
* 10 HTML instruction following,
* 20 misc language code feedback questions.

Or, on a per language basis:
* Python: 81
* C++: 21
* html: 10
* Ruby: 1
* Bash: 1
* MATLAB: 1
* React: 1
* Scala: 1
* JavaScript: 1
* Java: 1
* PHP: 1

Or, per instruction type:
* Code completion / instruction following: 95
* Bug fixing: 20

To get the current information on the tasks, you can use the following snippet:
```
from datasets import load_dataset

# Download the prompts from the Hugging Face Hub
d = load_dataset("HuggingFaceH4/code_evaluation_prompts")

# Count how many prompts there are for each language
language_list = d['train']['language']
language_count = {ele: language_list.count(ele) for ele in language_list}
```
Similar code can be run for the type of instruction (code generation vs. bug advice).

Interested in contributing? Open a PR with a specific language and question content.
Here are the ChatGPT prompts used to initiate the responses (which are then filtered), May 3rd 2023 version:
* Generate a bunch of instructions for coding questions in python (in the format of `{"prompt": instruction}`)
* These have been useful, can you generate the last few that are the hardest and most Pythonic that you can think of?
* Taking a step back, can you generate 20 for me that don't need to be as hard, but are machine learning focused (e.g. a mix of PyTorch and Jax).
* Generate a bunch of instructions for coding questions in C++ (in the format of `{"prompt": instruction}`)
* Can you generate 5 examples of instructions, with the same format `{"prompt": text}`, where the instruction has a piece of code with a bug, and you're asking for feedback on your code as if you wrote it?
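
The per-language count in the snippet above, and the analogous count per instruction type, could be sketched with `collections.Counter`, which makes a single pass over the column instead of calling `list.count` once per row. This is a minimal sketch, not the authors' code: the helper `count_values`, the toy data, and the `type` column name with its values are assumptions made for illustration, since the card only confirms the `language` column.

```python
from collections import Counter


def count_values(split, column):
    """Count how often each value occurs in one column of a dataset split."""
    return dict(Counter(split[column]))


# Stand-in for d["train"]: a plain dict of columns, so the sketch runs offline.
# The "type" column name and its values are assumptions, not confirmed by the card.
toy_split = {
    "language": ["Python", "Python", "C++", "html"],
    "type": ["completion", "bug", "completion", "completion"],
}

language_count = count_values(toy_split, "language")
type_count = count_values(toy_split, "type")
print(language_count)
print(type_count)
```

With the real data, the same helper would presumably be called as `count_values(d["train"], "language")` after `load_dataset`, provided the split supports column access by name as in the snippet above.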

Provider:
maas
Created:
2025-02-10