humanevalpack

Name: humanevalpack
Creator: maas
Published: 2025-12-05 16:53:54
License: No description available

ModelScope community · Updated 2025-12-05 · Indexed 2025-11-08

Download link:

https://modelscope.cn/datasets/bigcode/humanevalpack


Resource description:

![Octopack](https://github.com/bigcode-project/octopack/blob/31f3320f098703c7910e43492c39366eeea68d83/banner.png?raw=true)

# Dataset Card for HumanEvalPack

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
- [Additional Information](#additional-information)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Repository:** https://github.com/bigcode-project/octopack
- **Paper:** [OctoPack: Instruction Tuning Code Large Language Models](https://arxiv.org/abs/2308.07124)
- **Point of Contact:** [Niklas Muennighoff](mailto:n.muennighoff@gmail.com)

### Dataset Summary

> HumanEvalPack extends OpenAI's HumanEval to cover 6 languages across 3 tasks. The Python split is exactly the same as OpenAI's Python HumanEval. The other splits were translated by humans (similar to HumanEval-X but with additional cleaning, see [here](https://github.com/bigcode-project/octopack/tree/main/evaluation/create/humaneval-x#modifications-muennighoff)). Refer to the [OctoPack paper](https://arxiv.org/abs/2308.07124) for more details.

- **Languages:** Python, JavaScript, Java, Go, C++, Rust
- **OctoPack🐙🎒:**

<table>
<tr>
<th>Data</th>
<td><a href="https://huggingface.co/datasets/bigcode/commitpack">CommitPack</a></td>
<td>4TB of GitHub commits across 350 programming languages</td>
</tr>
<tr>
<th></th>
<td><a href="https://huggingface.co/datasets/bigcode/commitpackft">CommitPackFT</a></td>
<td>Filtered version of CommitPack for high-quality commit messages that resemble instructions</td>
</tr>
<tr>
<th>Model</th>
<td><a href="https://huggingface.co/bigcode/octocoder">OctoCoder</a></td>
<td>StarCoder (16B parameters) instruction tuned on CommitPackFT + OASST</td>
</tr>
<tr>
<th></th>
<td><a href="https://huggingface.co/bigcode/octogeex">OctoGeeX</a></td>
<td>CodeGeeX2 (6B parameters) instruction tuned on CommitPackFT + OASST</td>
</tr>
<tr>
<th>Evaluation</th>
<td><a href="https://huggingface.co/datasets/bigcode/humanevalpack">HumanEvalPack</a></td>
<td>Extension of OpenAI's HumanEval to cover 3 scenarios across 6 languages</td>
</tr>
</table>

## Usage

```python
# pip install -q datasets
from datasets import load_dataset

# Languages: "python", "js", "java", "go", "cpp", "rust"
ds = load_dataset("bigcode/humanevalpack", "python")["test"]
ds[0]
```

## Dataset Structure

### Data Instances

An example looks as follows:

```json
{
  "task_id": "Python/0",
  "prompt": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n",
  "declaration": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n",
  "canonical_solution": "    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False\n",
  "buggy_solution": "    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = elem - elem2\n                if distance < threshold:\n                    return True\n\n    return False\n",
  "bug_type": "missing logic",
  "failure_symptoms": "incorrect output",
  "entry_point": "has_close_elements",
  "import": "",
  "test_setup": "",
  "test": "\n\n\n\n\ndef check(has_close_elements):\n    assert has_close_elements([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\n    assert has_close_elements([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False\n    assert has_close_elements([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True\n    assert has_close_elements([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False\n    assert has_close_elements([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True\n    assert has_close_elements([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True\n    assert has_close_elements([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False\n\ncheck(has_close_elements)",
  "example_test": "def check(has_close_elements):\n    assert has_close_elements([1.0, 2.0, 3.0], 0.5) == False\n    assert has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3) == True\ncheck(has_close_elements)\n",
  "signature": "has_close_elements(numbers: List[float], threshold: float) -> bool",
  "docstring": "Check if in given list of numbers, are any two numbers closer to each other than\ngiven threshold.\n>>> has_close_elements([1.0, 2.0, 3.0], 0.5)\nFalse\n>>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\nTrue",
  "instruction": "Write a Python function `has_close_elements(numbers: List[float], threshold: float) -> bool` to solve the following problem:\nCheck if in given list of numbers, are any two numbers closer to each other than\ngiven threshold.\n>>> has_close_elements([1.0, 2.0, 3.0], 0.5)\nFalse\n>>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\nTrue"
}
```

### Data Fields

The data fields are the same among all splits:

- `task_id`: indicates the language (Python/JavaScript/Java/Go/C++/Rust) and task id (from 0 to 163) of the problem
- `prompt`: the prompt for models relying on code continuation
- `declaration`: the declaration of the function (same as `prompt` but without the docstring)
- `canonical_solution`: the correct solution passing all unit tests for the problem
- `buggy_solution`: same as `canonical_solution` but with a subtle human-written bug causing the unit tests to fail
- `bug_type`: the type of the bug in `buggy_solution` (one of [`missing logic`, `excess logic`, `value misuse`, `operator misuse`, `variable misuse`, `function misuse`])
- `failure_symptoms`: the problem the bug causes (one of [`incorrect output`, `stackoverflow`, `infinite loop`])
- `entry_point`: the name of the function
- `import`: imports necessary for the solution (only populated for Go)
- `test_setup`: imports necessary for the test execution (only populated for Go)
- `test`: the unit tests for the problem
- `example_test`: additional unit tests, different from `test`, that could for example be provided to the model (these are not used in the paper)
- `signature`: the signature of the function
- `docstring`: the docstring describing the problem
- `instruction`: an instruction for HumanEvalSynthesize in the form `Write a {language_name} function {signature} to solve the following problem:\n{docstring}`

## Citation Information

```bibtex
@article{muennighoff2023octopack,
  title={OctoPack: Instruction Tuning Code Large Language Models},
  author={Niklas Muennighoff and Qian Liu and Armel Zebaze and Qinkai Zheng and Binyuan Hui and Terry Yue Zhuo and Swayam Singh and Xiangru Tang and Leandro von Werra and Shayne Longpre},
  journal={arXiv preprint arXiv:2308.07124},
  year={2023}
}
```
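To illustrate how the fields listed under Data Fields relate to one another, here is a minimal Python sketch. It rests on assumptions rather than on the official evaluation harness: it assumes that concatenating `declaration`, a solution body, and `test` yields a runnable program (this holds for the Python sample shown above but is not confirmed here for the other languages), and the helper names `build_program`, `passes_tests`, and `build_instruction` are hypothetical. `RECORD` is a trimmed copy of the sample instance, with indentation written as four spaces.

```python
# Trimmed copy of the "Python/0" sample instance (only the fields used below).
SIGNATURE = "has_close_elements(numbers: List[float], threshold: float) -> bool"
DOCSTRING = (
    "Check if in given list of numbers, are any two numbers closer to each other than\n"
    "given threshold.\n"
    ">>> has_close_elements([1.0, 2.0, 3.0], 0.5)\nFalse\n"
    ">>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\nTrue"
)
BODY = (
    "    for idx, elem in enumerate(numbers):\n"
    "        for idx2, elem2 in enumerate(numbers):\n"
    "            if idx != idx2:\n"
    "                distance = {expr}\n"
    "                if distance < threshold:\n"
    "                    return True\n\n"
    "    return False\n"
)
RECORD = {
    "declaration": "from typing import List\n\n\ndef " + SIGNATURE + ":\n",
    "canonical_solution": BODY.format(expr="abs(elem - elem2)"),
    "buggy_solution": BODY.format(expr="elem - elem2"),  # bug_type: missing logic
    "test": (
        "\n\n\n\n\ndef check(has_close_elements):\n"
        "    assert has_close_elements([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\n"
        "    assert has_close_elements([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False\n"
        "    assert has_close_elements([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True\n"
        "    assert has_close_elements([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False\n"
        "    assert has_close_elements([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True\n"
        "    assert has_close_elements([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True\n"
        "    assert has_close_elements([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False\n"
        "\ncheck(has_close_elements)"
    ),
    "signature": SIGNATURE,
    "docstring": DOCSTRING,
    "instruction": (
        "Write a Python function `" + SIGNATURE + "` to solve the following problem:\n"
        + DOCSTRING
    ),
}


def build_program(record: dict, solution_field: str) -> str:
    """Assumed layout: function declaration, then solution body, then unit tests."""
    return record["declaration"] + record[solution_field] + record["test"]


def passes_tests(record: dict, solution_field: str) -> bool:
    """Run the assembled program; an AssertionError means a unit test failed.

    exec() runs arbitrary code, so only use this on records you trust.
    """
    try:
        exec(build_program(record, solution_field), {})
    except AssertionError:
        return False
    return True


def build_instruction(language_name: str, signature: str, docstring: str) -> str:
    """Rebuild the `instruction` field; backticks around the signature follow the sample."""
    return (
        f"Write a {language_name} function `{signature}` "
        f"to solve the following problem:\n{docstring}"
    )


print(passes_tests(RECORD, "canonical_solution"))  # True
print(passes_tests(RECORD, "buggy_solution"))      # False
```

In this sketch the buggy variant fails on the second assertion: without `abs`, a negative difference is always below the threshold, so the function returns `True` where `False` is expected.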


Provider:

maas

Created:

2025-10-11
