nuprl-staging/MultiPL-E
Source: Hugging Face. Updated 2025-07-15; indexed 2024-07-06.
Download link:
https://hf-mirror.com/datasets/nuprl-staging/MultiPL-E
Resource summary:
MultiPL-E is a dataset of code-related tasks covering multiple programming languages. Its annotations are machine-generated, and its language content is generated by both machines and experts. The dataset is monolingual (English) and falls in the 1K<n<10K size category. It is organized into multiple configurations, one per combination of source benchmark and programming language, each with its own features and a test split. The features include prompt, doctests, tests, and stop_tokens. The dataset's sources are original data plus extensions of existing datasets (OpenAI's HumanEval and MBPP).
Provider:
nuprl-staging
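Summary of original information
Dataset overview
Basic information
- Dataset name: MultiPL-E
- Language: English (en)
- License: MIT
- Multilinguality: monolingual
- Size category: 1K < n < 10K
- Source datasets: original; extended (extended|openai_humaneval, extended|mbpp)
Dataset configurations
Configuration list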
- humaneval-clj
- humaneval-cpp
- humaneval-cs
- humaneval-d
- humaneval-elixir
- humaneval-go
- humaneval-hs
- humaneval-java
- humaneval-jl
- humaneval-js
- humaneval-lua
- humaneval-ml
- humaneval-php
- humaneval-pl
- humaneval-r
- humaneval-rb
- humaneval-rkt
- humaneval-rs
- humaneval-scala
- humaneval-sh
- humaneval-swift
- humaneval-ts
- mbpp-clj
- mbpp-cpp
- mbpp-cs
- mbpp-d
- mbpp-elixir
- mbpp-go
- mbpp-hs
- mbpp-java
- mbpp-jl
- mbpp-js
- mbpp-lua
- mbpp-ml
- mbpp-php
- mbpp-pl
- mbpp-r
- mbpp-rb
- mbpp-rkt
- mbpp-rs
- mbpp-scala
- mbpp-sh
- mbpp-swift
- mbpp-ts
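The configuration names above follow a `<benchmark>-<language suffix>` pattern, so the full list can be rebuilt from two short lists. A minimal sketch in Python; the constant and function names are illustrative and not part of the dataset:

```python
from itertools import product

# Benchmark prefixes and language suffixes as they appear in the list above.
BENCHMARKS = ["humaneval", "mbpp"]
LANGUAGES = [
    "clj", "cpp", "cs", "d", "elixir", "go", "hs", "java", "jl", "js", "lua",
    "ml", "php", "pl", "r", "rb", "rkt", "rs", "scala", "sh", "swift", "ts",
]


def config_names():
    """Return every configuration name in the order listed above."""
    return [f"{bench}-{lang}" for bench, lang in product(BENCHMARKS, LANGUAGES)]


if __name__ == "__main__":
    names = config_names()
    print(len(names))  # 2 benchmarks x 22 language suffixes = 44 names
    print(names[0], names[-1])
```

Generating the names this way makes it easy to loop over every configuration, or to filter by benchmark or language, without retyping the list.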
Configuration details
Every configuration listed below has the same features:
- name: string
- language: string
- prompt: string
- doctests: string
- original: string
- prompt_terminology: string
- tests: string
- stop_tokens: sequence of string
Each configuration has a single split, test. Sizes per configuration:
- humaneval-clj: test split 174890 bytes, 161 examples; download size 70395 bytes; dataset size 174890 bytes
- humaneval-cpp: test split 245061 bytes, 161 examples; download size 83221 bytes; dataset size 245061 bytes
- humaneval-cs: test split 288571 bytes, 158 examples; download size 82080 bytes; dataset size 288571 bytes
- humaneval-d: test split 179391 bytes, 156 examples; download size 70027 bytes; dataset size 179391 bytes
- humaneval-elixir: test split 207052 bytes, 161 examples; download size 74798 bytes; dataset size 207052 bytes
- humaneval-go: test split 252128 bytes, 154 examples; download size 78121 bytes; dataset size 252128 bytes
- humaneval-hs: test split 210523 bytes, 156 examples; download size 69373 bytes; dataset size 210523 bytes
- humaneval-java: test split 293293 bytes, 158 examples; download size 86178 bytes; dataset size 293293 bytes
- humaneval-jl: test split 165943 bytes, 159 examples; download size 68620 bytes; dataset size 165943 bytes
- humaneval-js: test split 187162 bytes, 161 examples; download size 70034 bytes; dataset size 187162 bytes
- humaneval-lua: test split 190211 bytes, 161 examples; download size 70547 bytes; dataset size 190211 bytes
- humaneval-ml: test split 169037 bytes, 155 examples; download size 68199 bytes; dataset size 169037 bytes
- humaneval-php: test split 230721 bytes, 161 examples; download size 75195 bytes; dataset size 230721 bytes
- humaneval-pl: test split 248652 bytes, 161 examples; download size 77247 bytes; dataset size 248652 bytes
- humaneval-r: test split 195050 bytes, 161 examples; download size 71602 bytes; dataset size 195050 bytes
- humaneval-rb: test split 193448 bytes, 161 examples; download size 72942 bytes; dataset size 193448 bytes
- humaneval-rkt: test split 194898 bytes, 161 examples; download size 70785 bytes; dataset size 194898 bytes
- humaneval-rs: test split 193677 bytes, 156 examples; download size 75300 bytes; dataset size 193677 bytes
- humaneval-scala: test split 245564 bytes, 160 examples; download size 80950 bytes; dataset size 245564 bytes
- humaneval-sh: test split 169419 bytes, 158 examples; download size 67691 bytes; dataset size 169419 bytes
- humaneval-swift: test split 209818 bytes, 158 examples; download size 78057 bytes; dataset size 209818 bytes
- humaneval-ts: test split 191144 bytes, 159 examples; download size 70427 bytes; dataset size 191144 bytes
- mbpp-clj: test split 249203 bytes, 397 examples; download size 76741 bytes; dataset size 249203 bytes
- mbpp-cpp: test split 362938 bytes, 397 examples; download size 97734 bytes; dataset size 362938 bytes
- mbpp-cs: test split 418542 bytes, 386 examples; download size 99239 bytes; dataset size 418542 bytes
- mbpp-d: test split 233997 bytes, 358 examples; download size 73269 bytes; dataset size 233997 bytes
- mbpp-elixir: test split 299264 bytes, 397 examples; download size 84803 bytes; dataset size 299264 bytes
- mbpp-go: test split 401215 bytes, 374 examples; download size 93635 bytes; dataset size 401215 bytes
- mbpp-hs: test split 256021 bytes, 355 examples; download size 71870 bytes; dataset size 256021 bytes
- mbpp-java: test split 424038 bytes, 386 examples; download size 99991 bytes; dataset size 424038 bytes
- mbpp-jl: test split 229892 bytes, 390 examples; download size: [truncated in source]
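To illustrate how the fields listed above might fit together, the sketch below cuts a model completion at the first stop token and joins `prompt`, the completion, and `tests` into one program. The record and the completion are invented for illustration, the helper names are hypothetical, and the concatenation order is an assumption about how an evaluation harness would use these fields, not something the listing specifies. The `load_config` helper assumes the Hugging Face `datasets` package is installed; it is not called here because it needs network access.

```python
def truncate_at_stop_tokens(completion, stop_tokens):
    """Cut the completion at the earliest occurrence of any stop token."""
    cut = len(completion)
    for token in stop_tokens:
        idx = completion.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]


def build_program(record, completion):
    """Assemble prompt + truncated completion + tests (assumed order)."""
    body = truncate_at_stop_tokens(completion, record["stop_tokens"])
    return record["prompt"] + body + "\n" + record["tests"]


def load_config(name="humaneval-lua"):
    """Load one configuration's test split (needs `datasets` and network)."""
    from datasets import load_dataset  # imported lazily; optional dependency
    return load_dataset("nuprl-staging/MultiPL-E", name, split="test")


if __name__ == "__main__":
    # Invented record that only mimics the field names listed above.
    record = {
        "prompt": "-- Return x plus one.\nlocal function add_one(x)\n",
        "tests": "assert(add_one(1) == 2)",
        "stop_tokens": ["\nlocal", "\nfunction", "\n--"],
    }
    completion = "  return x + 1\nend\n-- extra text the model kept writing"
    print(build_program(record, completion))
```

Taking the earliest match across all stop tokens, rather than the first token in list order, keeps the result independent of how `stop_tokens` happens to be ordered.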
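Collected summary
Dataset introduction

Background and challenges
Background overview
MultiPL-E is a code-generation evaluation dataset covering 22 programming languages. It builds on the HumanEval and MBPP benchmarks, uses compilers to translate them into each supported language, and includes several prompt variants for studying model performance. The latest version adds support for four more languages and fixes some issues.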
The content above was collected and summarized by 遇见数据集.



