openai_humaneval | Programming Evaluation Dataset | Code Generation Dataset

Source: ModelScope Community · Updated 2025-05-09 · Indexed 2025-01-11

Programming Evaluation

Code Generation

Download link:

https://modelscope.cn/datasets/openai-mirror/openai_humaneval

Resource description:

# Dataset Card for OpenAI HumanEval

## Table of Contents

- [OpenAI HumanEval](#openai-humaneval)
  - [Table of Contents](#table-of-contents)
  - [Dataset Description](#dataset-description)
    - [Dataset Summary](#dataset-summary)
    - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
    - [Languages](#languages)
  - [Dataset Structure](#dataset-structure)
    - [Data Instances](#data-instances)
    - [Data Fields](#data-fields)
    - [Data Splits](#data-splits)
  - [Dataset Creation](#dataset-creation)
    - [Curation Rationale](#curation-rationale)
    - [Source Data](#source-data)
      - [Initial Data Collection and Normalization](#initial-data-collection-and-normalization)
      - [Who are the source language producers?](#who-are-the-source-language-producers)
    - [Annotations](#annotations)
      - [Annotation process](#annotation-process)
      - [Who are the annotators?](#who-are-the-annotators)
    - [Personal and Sensitive Information](#personal-and-sensitive-information)
  - [Considerations for Using the Data](#considerations-for-using-the-data)
    - [Social Impact of Dataset](#social-impact-of-dataset)
    - [Discussion of Biases](#discussion-of-biases)
    - [Other Known Limitations](#other-known-limitations)
  - [Additional Information](#additional-information)
    - [Dataset Curators](#dataset-curators)
    - [Licensing Information](#licensing-information)
    - [Citation Information](#citation-information)
    - [Contributions](#contributions)

## Dataset Description

- **Repository:** [GitHub Repository](https://github.com/openai/human-eval)
- **Paper:** [Evaluating Large Language Models Trained on Code](https://arxiv.org/abs/2107.03374)

### Dataset Summary

The HumanEval dataset released by OpenAI includes 164 programming problems, each with a function signature, docstring, body, and several unit tests. The problems were handwritten so that they would not already be included in the training sets of code generation models.
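The paper listed under Dataset Description scores models on these problems with the pass@k metric: for each problem, sample n ≥ k completions, count the c completions that pass the unit tests, estimate the probability that at least one of k samples passes as 1 - C(n-c, k) / C(n, k), and average over problems. Below is a minimal sketch of that estimator, written in the product form so it never builds huge binomial coefficients. The helper names `pass_at_k` and `mean_pass_at_k` and the sample counts are hypothetical illustrations, not part of the dataset or of any particular package.

```python
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k).

    n: number of sampled completions, c: how many passed the unit tests,
    k: the k in pass@k (requires 1 <= k <= n).
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    # C(n - c, k) / C(n, k) == prod_{i = n-c+1}^{n} (1 - k / i)
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts for three problems with 10 samples each.
    counts = [(10, 3), (10, 0), (10, 10)]
    print(round(mean_pass_at_k(counts, k=1), 4))  # 0.4333
```

For k = 1 the estimator reduces to c / n, so the three hypothetical problems score 0.3, 0.0 and 1.0 and average to about 0.4333.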
### Supported Tasks and Leaderboards

### Languages

The programming problems are written in Python and contain English natural text in comments and docstrings.

## Dataset Structure

```python
from datasets import load_dataset

# Newer Hugging Face Hub versions may require the namespaced id "openai/openai_humaneval".
ds = load_dataset("openai_humaneval")
print(ds)
# DatasetDict({
#     test: Dataset({
#         features: ['task_id', 'prompt', 'canonical_solution', 'test', 'entry_point'],
#         num_rows: 164
#     })
# })
```

### Data Instances

An example of a dataset instance:

```
{
    "task_id": "test/0",
    "prompt": "def return1():\n",
    "canonical_solution": "    return 1",
    "test": "def check(candidate):\n    assert candidate() == 1",
    "entry_point": "return1"
}
```

### Data Fields

- `task_id`: identifier for the data sample
- `prompt`: input for the model, containing the function header and docstring
- `canonical_solution`: reference solution for the problem in the `prompt`
- `test`: code defining a `check(candidate)` function that tests generated code for correctness
- `entry_point`: name of the function under test, passed to `check`

### Data Splits

The dataset consists of a single test split with 164 samples.

## Dataset Creation

### Curation Rationale

Since code generation models are often trained on dumps of GitHub, a dataset not included in those dumps was necessary to properly evaluate the models. However, because this dataset has itself been published on GitHub, it is likely to be included in future dumps.

### Source Data

The dataset was handcrafted by engineers and researchers at OpenAI.

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

[More Information Needed]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

None.

## Considerations for Using the Data

Make sure you execute generated Python code in a safe environment when evaluating against this dataset, as generated code could be harmful.
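A minimal sketch of checking one completion, assuming the completion is a string that continues the `prompt` the same way `canonical_solution` does: concatenate `prompt`, the completion, the `test` code and a call to `check(<entry_point>)`, then run the result in a separate Python process with a timeout. `build_program` and `passes_tests` are hypothetical helper names. A subprocess with a timeout only isolates crashes and infinite loops; it is not a security sandbox, so untrusted model output should still run inside a container or VM.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_program(problem: dict, completion: str) -> str:
    """Assemble a runnable script: signature/docstring + body + tests + check call."""
    return (
        problem["prompt"]
        + completion
        + "\n\n"
        + problem["test"]
        + "\n\n"
        + f"check({problem['entry_point']})\n"
    )


def passes_tests(problem: dict, completion: str, timeout: float = 5.0) -> bool:
    """Run the assembled program in a child interpreter; True if it exits cleanly."""
    program = build_program(problem, completion)
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(program, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(path)],
                cwd=tmp,
                capture_output=True,  # keep tracebacks from failing tests out of our output
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # treat hangs (e.g. infinite loops) as failures
        # A failed assert raises AssertionError, giving a non-zero exit code.
        return result.returncode == 0


if __name__ == "__main__":
    # The toy instance from "Data Instances".
    toy = {
        "task_id": "test/0",
        "prompt": "def return1():\n",
        "canonical_solution": "    return 1",
        "test": "def check(candidate):\n    assert candidate() == 1",
        "entry_point": "return1",
    }
    print(passes_tests(toy, toy["canonical_solution"]))  # True
    print(passes_tests(toy, "    return 2"))  # False
```

Running each candidate in a fresh interpreter keeps a crashing or hanging completion from taking down the evaluation loop, and the per-problem pass counts it yields are exactly the c values a pass@k estimate needs.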
### Social Impact of Dataset

With this dataset, code generation models can be evaluated more thoroughly, which should lead to fewer issues being introduced when such models are used.

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

OpenAI

### Licensing Information

MIT License

### Citation Information

```
@misc{chen2021evaluating,
      title={Evaluating Large Language Models Trained on Code},
      author={Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Ponde de Oliveira Pinto and Jared Kaplan and Harri Edwards and Yuri Burda and Nicholas Joseph and Greg Brockman and Alex Ray and Raul Puri and Gretchen Krueger and Michael Petrov and Heidy Khlaaf and Girish Sastry and Pamela Mishkin and Brooke Chan and Scott Gray and Nick Ryder and Mikhail Pavlov and Alethea Power and Lukasz Kaiser and Mohammad Bavarian and Clemens Winter and Philippe Tillet and Felipe Petroski Such and Dave Cummings and Matthias Plappert and Fotios Chantzis and Elizabeth Barnes and Ariel Herbert-Voss and William Hebgen Guss and Alex Nichol and Alex Paino and Nikolas Tezak and Jie Tang and Igor Babuschkin and Suchir Balaji and Shantanu Jain and William Saunders and Christopher Hesse and Andrew N. Carr and Jan Leike and Josh Achiam and Vedant Misra and Evan Morikawa and Alec Radford and Matthew Knight and Miles Brundage and Mira Murati and Katie Mayer and Peter Welinder and Bob McGrew and Dario Amodei and Sam McCandlish and Ilya Sutskever and Wojciech Zaremba},
      year={2021},
      eprint={2107.03374},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```

### Contributions

Thanks to [@lvwerra](https://github.com/lvwerra) for adding this dataset.

Provider:

maas

Created:

2025-01-08
