ManavSinghal157/NoFunEval

Source: Hugging Face. Updated: 2024-03-04. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/ManavSinghal157/NoFunEval
Overview:
NoFunEval is a dataset for evaluating code language models (code LMs) on non-functional requirements. Its subsets include latency, resource_util, runtime_efficiency, maintainability, security, and humanevalclassify. These subsets assess code LMs on efficiency, security, maintainability, and related dimensions, rather than on functional correctness alone. The goal is to expose the blind spots of code LMs with respect to these non-functional requirements and to evaluate how well the models understand and generate code along these dimensions.
Provided by:
ManavSinghal157
Summary of original information

NoFunEval dataset overview

Dataset configuration

  • Config name: default
  • Data files:
    • Split: latency
      • Path: datasets/latency.jsonl
    • Split: resource_util
      • Path: datasets/resource_util.jsonl
    • Split: runtime_efficiency
      • Path: datasets/runtime_efficiency.jsonl
    • Split: maintainability
      • Path: datasets/maintainability.jsonl
    • Split: security
      • Path: datasets/security.jsonl
    • Split: humanevalclassify
      • Path: datasets/humanevalclassify.jsonl
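
As a rough sketch of how this split layout might be consumed from a local copy of the files, the snippet below maps each split name to its JSONL path and reads one JSON object per line using only the standard library. The names `SPLIT_PATHS`, `read_jsonl`, `load_split`, and the `root` argument are illustrative assumptions, not part of the dataset's tooling. The record fields are not listed on this page, so each record is returned as a plain dict.

```python
import json
from pathlib import Path

# Split names and relative paths, as listed in the dataset configuration.
SPLIT_PATHS = {
    "latency": "datasets/latency.jsonl",
    "resource_util": "datasets/resource_util.jsonl",
    "runtime_efficiency": "datasets/runtime_efficiency.jsonl",
    "maintainability": "datasets/maintainability.jsonl",
    "security": "datasets/security.jsonl",
    "humanevalclassify": "datasets/humanevalclassify.jsonl",
}


def read_jsonl(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


def load_split(split, root="."):
    """Load one split from a local copy; `root` is an assumed checkout directory."""
    if split not in SPLIT_PATHS:
        raise KeyError(f"unknown split: {split!r}")
    return read_jsonl(Path(root) / SPLIT_PATHS[split])
```

With the files downloaded under the current directory, `load_split("latency")` would return the latency records as a list of dicts.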

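Dataset usage

  • Purpose: evaluate code language models (code LMs) on non-functional requirements and on simple classification instances.
  • Method: the Coding Concepts (CoCo) prompting method, which helps developers convey domain knowledge to the model.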

Dataset generation

  • Generation scripts:
    • NoFunEdit:
      python3 src/nofunedit_generation.py \
        --data_subset <subset from nofunedit: eg-latency> \
        --model_path <model name from HF: eg-WizardLM/WizardCoder-15B-V1.0> \
        --temperature <temperature to be set for model generation: eg-0> \
        --max_new_tokens <maximum number of new tokens to be generated: eg-5192> \
        --prompt <type of prompt to use from our dataset: eg-base_prompt> \
        --num_samples <number of samples to be generated: eg-1> \
        --precision <floating point format: eg-fp16> \
        --batch_size <number of examples to send to llm engine at once: eg-1>

    • Classification:
      python3 src/classification_generation.py \
        --data_subset <subset from non_func or humanevalclassify: eg-latency> \
        --model <model name from HF: eg-WizardLM/WizardCoder-15B-V1.0> \
        --temperature <temperature to be set for model generation: eg-0> \
        --max_new_tokens <maximum number of new tokens to be generated: eg-5192> \
        --prompt <type of prompt to use from our dataset: eg-base_prompt> \
        --precision <floating point format: eg-fp16> \
        --batch_size <number of examples to send to llm engine at once: eg-1>
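
The NoFunEdit generation command can also be assembled programmatically, for example when sweeping over subsets or prompts. The sketch below only builds the argument list and prints it; it does not run the script. The flag names are taken from the command above, while the helper `build_generation_cmd` and its default values (which mirror the "eg-" examples in the placeholders) are assumptions.

```python
import shlex


def build_generation_cmd(data_subset, model_path, prompt="base_prompt",
                         temperature=0, max_new_tokens=5192, num_samples=1,
                         precision="fp16", batch_size=1):
    """Build the argv list for src/nofunedit_generation.py (hypothetical helper)."""
    return [
        "python3", "src/nofunedit_generation.py",
        "--data_subset", data_subset,
        "--model_path", model_path,
        "--temperature", str(temperature),
        "--max_new_tokens", str(max_new_tokens),
        "--prompt", prompt,
        "--num_samples", str(num_samples),
        "--precision", precision,
        "--batch_size", str(batch_size),
    ]


cmd = build_generation_cmd("latency", "WizardLM/WizardCoder-15B-V1.0")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` from the repository root, assuming the script is present at that relative path.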

Dataset evaluation

  • Evaluation script:
    python3 src/evaluation.py \
      --data_subset <subset from nofunedit: eg-latency> \
      --model_path <model name from HF: eg-WizardLM/WizardCoder-15B-V1.0> \
      --prompt <type of prompt to use from our dataset: eg-base_prompt> \
      --num_samples <number of samples to be generated: eg-1> \
      --score_k <K values for score@k: eg-1,5,10,20> \
      --metric <eval_metric to be used: eg-diffbleu>

Parameters

  • data_subset: the data subset to use; options are latency, resource_util, maintainability, security, runtime_efficiency, humanevalclassify
  • model_path: the model path, e.g. WizardLM/WizardCoder-15B-V1.0
  • prompt: the prompt type; options are base_prompt, one-shot, chain_of_thought, coding_concepts
  • num_samples: the number of samples to generate, e.g. 1
  • max_new_tokens: the maximum number of new tokens the model generates, e.g. 5192
  • temperature: the sampling temperature for model generation, e.g. 0
  • score_k: the K values for Score@K, e.g. 1,5,10,20
  • metric: the evaluation metric; options are diffbleu, codeql, codeql-diffbleu, classification, runtime
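
Because several of these parameters accept a fixed set of values, a small pre-flight check can catch typos before a long evaluation run is launched. The sketch below encodes the options listed above and parses a comma-separated `score_k` string. The function names are hypothetical, and the actual scripts may validate their arguments differently.

```python
# Allowed values, copied from the parameter list above.
DATA_SUBSETS = {
    "latency", "resource_util", "maintainability",
    "security", "runtime_efficiency", "humanevalclassify",
}
PROMPTS = {"base_prompt", "one-shot", "chain_of_thought", "coding_concepts"}
METRICS = {"diffbleu", "codeql", "codeql-diffbleu", "classification", "runtime"}


def parse_score_k(value):
    """Turn a string such as '1,5,10,20' into a list of positive integers."""
    ks = [int(part) for part in value.split(",") if part.strip()]
    if not ks or any(k <= 0 for k in ks):
        raise ValueError(f"score_k must list positive integers, got {value!r}")
    return ks


def check_eval_args(data_subset, prompt, metric):
    """Raise ValueError if any argument is outside the documented options."""
    for name, value, allowed in (
        ("data_subset", data_subset, DATA_SUBSETS),
        ("prompt", prompt, PROMPTS),
        ("metric", metric, METRICS),
    ):
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")


check_eval_args("latency", "base_prompt", "diffbleu")  # passes silently
print(parse_score_k("1,5,10,20"))
```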