q-rz/enamel

Hugging Face · updated 2024-06-13 · indexed 2024-06-29
Download link:
https://hf-mirror.com/datasets/q-rz/enamel
Resource description:
---
license: apache-2.0
task_categories:
- text2text-generation
tags:
- code
- code-generation
dataset_info:
  features:
  - name: task_id
    dtype: string
  - name: prompt
    dtype: string
  - name: input_generator
    dtype: string
  - name: input_levels
    dtype: string
  - name: reference_solution
    dtype: string
  - name: checker
    dtype: string
  - name: entry_point
    dtype: string
  splits:
  - name: ENAMEL_HumanEval
    num_bytes: 197547
    num_examples: 164
configs:
- config_name: default
  data_files:
  - split: ENAMEL_HumanEval
    path: "dataset/enamel.csv"
---

See also:
<a href="https://arxiv.org/abs/2406.06647"><img src="https://github.com/q-rz/enamel/raw/main/figures/img.shields.io%20badge%202406.06647-arXiv-B31B1B.svg" alt="Our paper on arXiv" style="display:inline-block;" /></a>
<a href="https://github.com/q-rz/enamel"><img src="https://github.com/q-rz/enamel/raw/main/figures/img.shields.io%20badge%20enamel-GitHub-892793.svg" alt="Our code repo on GitHub" style="display:inline-block;" /></a>
<a href="https://pypi.org/project/enam/"><img src="https://github.com/q-rz/enamel/raw/main/figures/img.shields.io%20badge%20enam-PyPI-006DAD.svg" alt="Our Python library on PyPI" style="display:inline-block;" /></a>

## What is ENAMEL?

ENAMEL is a rigorous, high-standard benchmark for evaluating the capability of large language models (LLMs) to generate efficient code. We provide:

- A new metric eff@k characterizing the relationship between code efficiency and sample size k;
- A problem set consisting of 142 high-quality problems selected from [OpenAI HumanEval](https://github.com/openai/human-eval);
- Expert-written efficient reference solutions, setting a high standard for efficiency evaluation;
- Expert-written strong test case generators, enabling a rigorous evaluation of both correctness and efficiency;
- A Python library `enam` for easily evaluating the efficiency of LLM-generated code.

If you are interested in our work, please feel free to check [our paper](https://arxiv.org/abs/2406.06647) for details.
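To make the eff@k idea more concrete, here is a minimal Python sketch under stated assumptions; it is not the `enam` library's implementation. It assumes each of the n generated samples for a problem has already been given an efficiency score in [0, 1] (with incorrect samples scored 0), and that eff@k is the expected best score among k samples drawn without replacement from those n. The function name `eff_at_k` is hypothetical; please consult the paper for the exact definition and estimator.

```python
from math import comb
from typing import Sequence


def eff_at_k(scores: Sequence[float], k: int) -> float:
    """Estimate the expected maximum score among k of the n given samples.

    Assumption (not taken from the ENAMEL code): every score lies in [0, 1],
    incorrect samples are scored 0, and eff@k is the expected best score
    over a uniformly random k-subset of the n samples.
    """
    n = len(scores)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    ordered = sorted(scores)  # ascending: ordered[i - 1] is the i-th smallest
    # The i-th smallest score is the maximum of a k-subset exactly when the
    # remaining k - 1 members are chosen from the i - 1 smaller scores,
    # which happens in comb(i - 1, k - 1) of the comb(n, k) subsets.
    weighted = sum(comb(i - 1, k - 1) * ordered[i - 1] for i in range(k, n + 1))
    return weighted / comb(n, k)
```

With pass/fail scores of 0 and 1, this reduces to the familiar unbiased pass@k estimator 1 - C(n - c, k) / C(n, k), where c is the number of passing samples; for example, `eff_at_k([0, 1, 0, 1], 2)` evaluates to 5/6. With k = 1 it is simply the mean score, and with k = n it is the maximum.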
<center><img src="https://github.com/q-rz/enamel/raw/main/figures/fig-enamel.svg" alt="Illustration of ENAMEL" style="max-width:1000px;width:100%;" /></center>

## Getting Started

For instructions on using this dataset, please check [our GitHub repo](https://github.com/q-rz/enamel#getting-started).

## LLM Leaderboard

The following table is a leaderboard of 30 LLMs (under greedy decoding) together with the HumanEval and HumanEval+ canonical solutions. The results show that LLMs fall short of generating expert-level efficient code. For more results, please refer to our paper.

We welcome LLM developers to submit their results to enrich this leaderboard. If you would like to submit your results, please organize your generated code samples into a `.json` file as described in the Getting Started instructions of our GitHub repo and contact Ruizhong Qiu (rq5 AT illinois DOT edu).

|No.|Name|eff@1|pass@1|
|:-:|:-|:-:|:-:|
|1|HumanEval+|0.517|0.958|
|2|GPT-4 Turbo (Nov 2023)|0.470|0.796|
|3|HumanEval|0.458|0.908|
|4|GPT-4 (Jun 2023)|0.454|0.831|
|5|Llama 3 70B Instruct|0.421|0.746|
|6|Mixtral 8x22B Instruct|0.408|0.746|
|7|Claude 3 Opus|0.401|0.789|
|8|Phind Code Llama V2|0.394|0.683|
|9|Claude 3 Haiku|0.386|0.739|
|10|ChatGPT|0.364|0.683|
|11|Claude 3 Sonnet|0.345|0.662|
|12|Llama 3 8B Instruct|0.344|0.592|
|13|Code Llama 34B Python|0.268|0.458|
|14|Mixtral 8x7B Instruct|0.266|0.444|
|15|Code Llama 70B Python|0.264|0.500|
|16|Code Llama 7B Python|0.247|0.373|
|17|Code Llama 13B Python|0.216|0.408|
|18|StarCoder|0.195|0.352|
|19|CodeGen 6B|0.193|0.296|
|20|CodeGen 16B|0.169|0.310|
|21|CodeT5+ 16B|0.160|0.317|
|22|CodeGen 2B|0.153|0.254|
|23|Mistral 7B|0.152|0.275|
|24|Vicuna 13B|0.123|0.176|
|25|SantaCoder|0.100|0.141|
|26|Incoder 6B|0.091|0.127|
|27|GPT-J|0.083|0.106|
|28|Incoder 1B|0.066|0.092|
|29|Vicuna 7B|0.061|0.099|
|30|GPT-Neo 2B|0.043|0.056|
|31|PolyCoder|0.037|0.049|
|32|StableLM 7B|0.020|0.021|

## Acknowledgements

- [OpenAI HumanEval](https://github.com/openai/human-eval)
- [EvalPlus](https://github.com/evalplus/evalplus)
- [HuggingFace CodeEval](https://huggingface.co/spaces/evaluate-metric/code_eval)
Provided by:
q-rz
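Summary of original information

ENAMEL Dataset Overview

Dataset information

  • Task category: text2text generation
  • Tags: code, code generation

Dataset features

  • task_id: string
  • prompt: string
  • input_generator: string
  • input_levels: string
  • reference_solution: string
  • checker: string
  • entry_point: string

Dataset splits

  • ENAMEL_HumanEval:
    • Size in bytes: 197547
    • Number of examples: 164

Configuration

  • Config name: default
    • Data files:
      • Split: ENAMEL_HumanEval
      • Path: "dataset/enamel.csv"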
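As a quick sanity check of the schema listed above, the following Python sketch reads rows from a CSV laid out like the `dataset/enamel.csv` data file and verifies that all seven string columns are present. It uses only the standard library; the helper name `load_enamel_rows` is hypothetical, and the sketch assumes the CSV header row uses exactly the feature names listed above.

```python
import csv
from typing import Dict, Iterable, List

# Feature names as listed in the dataset card; every column is a string.
EXPECTED_COLUMNS = [
    "task_id", "prompt", "input_generator", "input_levels",
    "reference_solution", "checker", "entry_point",
]


def load_enamel_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse ENAMEL-style CSV lines and check that the expected columns exist.

    Hypothetical helper: assumes the header row uses the feature names above.
    Columns not in EXPECTED_COLUMNS are dropped from the returned rows.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []  # None when the input is empty
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{name: row[name] for name in EXPECTED_COLUMNS} for row in reader]
```

On a local copy of the data file, one could call it as `with open("dataset/enamel.csv", newline="", encoding="utf-8") as f: rows = load_enamel_rows(f)`, where the UTF-8 encoding is an assumption; per the card, the split should contain 164 rows. Loading through the Hugging Face `datasets` library, e.g. `load_dataset("q-rz/enamel", split="ENAMEL_HumanEval")`, should also work, but the GitHub repo remains the authoritative usage guide.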