
open-llm-leaderboard-old/details_ramachaitanya22__mistral-7B-finetune-health-fitness

Hugging Face · updated 2024-03-16 · indexed 2024-06-22
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_ramachaitanya22__mistral-7B-finetune-health-fitness
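Description:
This dataset was created automatically during the evaluation run of the model [ramachaitanya22/mistral-7B-finetune-health-fitness](https://huggingface.co/ramachaitanya22/mistral-7B-finetune-health-fitness) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). It consists of 63 configurations, each corresponding to one evaluated task. The dataset was created from 1 run; each run is stored as a specific split in every configuration, and the split is named with the run's timestamp. The "train" split always points to the latest results. An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).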

Provider:
open-llm-leaderboard-old
Original information summary

Dataset overview

Dataset summary

This dataset was created automatically during the evaluation run of the model ramachaitanya22/mistral-7B-finetune-health-fitness on the Open LLM Leaderboard.

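Dataset structure

  • The dataset is composed of 63 configurations, each corresponding to one evaluated task.
  • The dataset was created from 1 run. Each run is stored as a specific split in every configuration, and the split is named with the run's timestamp.
  • The "train" split always points to the latest results.
  • An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.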
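Because each run's split is named after its timestamp, finding the most recent run in a list of split names is a small parsing task. The sketch below assumes split names look like `2024_03_16T08_12_57.166226`, i.e. the run timestamp from this card with `-` and `:` replaced by `_`; that format and the helpers `parse_split_timestamp` and `latest_run_split` are hypothetical and not confirmed by this card. Non-timestamp splits such as "train" are skipped.

```python
from datetime import datetime
from typing import Iterable, Optional

# Hypothetical split-name format (assumption): the run timestamp with "-" and ":" replaced by "_".
SPLIT_TS_FORMAT = "%Y_%m_%dT%H_%M_%S.%f"


def parse_split_timestamp(split_name: str) -> Optional[datetime]:
    """Return the run time encoded in a split name, or None if it is not a timestamp."""
    try:
        return datetime.strptime(split_name, SPLIT_TS_FORMAT)
    except ValueError:
        return None  # e.g. "train"


def latest_run_split(split_names: Iterable[str]) -> str:
    """Return the timestamped split that belongs to the most recent run."""
    runs = []
    for name in split_names:
        ts = parse_split_timestamp(name)
        if ts is not None:
            runs.append((ts, name))
    if not runs:
        raise ValueError("no timestamped splits found")
    # Tuples compare by timestamp first, so max() picks the newest run.
    return max(runs)[1]


print(latest_run_split(["2024_03_16T08_12_57.166226", "train"]))
```

With a single run, as here, the timestamped split and "train" hold the same results; the helper only matters once several runs exist.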

Data loading example

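```python
from datasets import load_dataset

# Load the per-sample details for one task (config); "train" points to the latest run.
data = load_dataset(
    "open-llm-leaderboard-old/details_ramachaitanya22__mistral-7B-finetune-health-fitness",
    "harness_winogrande_5",
    split="train",
)
```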
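The config name `harness_winogrande_5` in the loading example and keys such as `harness|arc:challenge|25` in the results summary appear to encode the same three parts: the harness, the task, and the number of few-shot examples. The sketch below splits a results key into those parts and derives a config name by replacing every non-word character with `_`. That mapping is an assumption that happens to reproduce `harness_winogrande_5`; `parse_task_key` and `config_name_for` are hypothetical helpers, not part of the `datasets` library. The aggregate key "all" has no `|` separators and is not handled.

```python
import re
from typing import NamedTuple


class TaskKey(NamedTuple):
    harness: str
    task: str
    num_fewshot: int


def parse_task_key(key: str) -> TaskKey:
    """Split a results key such as "harness|arc:challenge|25" into its three parts."""
    harness, task, shots = key.split("|")
    return TaskKey(harness, task, int(shots))


def config_name_for(key: str) -> str:
    """Hypothetical mapping from a results key to a config name.

    Assumption: every character that is not a letter, digit, or underscore
    becomes "_", so "harness|winogrande|5" -> "harness_winogrande_5".
    """
    return re.sub(r"\W", "_", key)


print(parse_task_key("harness|arc:challenge|25"))
print(config_name_for("harness|hendrycksTest-abstract_algebra|5"))
```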

Latest results

These are the latest results from run 2024-03-16T08:12:57.166226:

```python
{
    "all": {
        "acc": 0.615706723495479, "acc_stderr": 0.03286224344928876,
        "acc_norm": 0.6223696887633396, "acc_norm_stderr": 0.0335390591979175,
        "mc1": 0.27050183598531213, "mc1_stderr": 0.015550778332842893,
        "mc2": 0.42069854026213144, "mc2_stderr": 0.014074226468804811
    },
    "harness|arc:challenge|25": {
        "acc": 0.5554607508532423, "acc_stderr": 0.01452122640562708,
        "acc_norm": 0.591296928327645, "acc_norm_stderr": 0.014365750345427
    },
    "harness|hellaswag|10": {
        "acc": 0.6194981079466242, "acc_stderr": 0.004845180034271621,
        "acc_norm": 0.8265285799641505, "acc_norm_stderr": 0.00377880447460591
    },
    "harness|hendrycksTest-abstract_algebra|5": {
        "acc": 0.31, "acc_stderr": 0.04648231987117316,
        "acc_norm": 0.31, "acc_norm_stderr": 0.04648231987117316
    },
    "harness|hendrycksTest-anatomy|5": {
        "acc": 0.6, "acc_stderr": 0.04232073695151589,
        "acc_norm": 0.6, "acc_norm_stderr": 0.04232073695151589
    },
    "harness|hendrycksTest-astronomy|5": {
        "acc": 0.6644736842105263, "acc_stderr": 0.038424985593952674,
        "acc_norm": 0.6644736842105263, "acc_norm_stderr": 0.038424985593952674
    },
    "harness|hendrycksTest-business_ethics|5": {
        "acc": 0.54, "acc_stderr": 0.05009082659620333,
        "acc_norm": 0.54, "acc_norm_stderr": 0.05009082659620333
    },
    "harness|hendrycksTest-clinical_knowledge|5": {
        "acc": 0.6830188679245283, "acc_stderr": 0.02863723563980089,
        "acc_norm": 0.6830188679245283, "acc_norm_stderr": 0.02863723563980089
    },
    "harness|hendrycksTest-college_biology|5": {
        "acc": 0.7083333333333334, "acc_stderr": 0.038009680605548594,
        "acc_norm": 0.7083333333333334, "acc_norm_stderr": 0.038009680605548594
    },
    "harness|hendrycksTest-college_chemistry|5": {
        "acc": 0.41, "acc_stderr": 0.04943110704237102,
        "acc_norm": 0.41, "acc_norm_stderr": 0.04943110704237102
    },
    "harness|hendrycksTest-college_computer_science|5": {
        "acc": 0.5, "acc_stderr": 0.050251890762960605,
        "acc_norm": 0.5, "acc_norm_stderr": 0.050251890762960605
    },
    "harness|hendrycksTest-college_mathematics|5": {
        "acc": 0.43, "acc_stderr": 0.049756985195624284,
        "acc_norm": 0.43, "acc_norm_stderr": 0.049756985195624284
    },
    "harness|hendrycksTest-college_medicine|5": {
        "acc": 0.6184971098265896, "acc_stderr": 0.03703851193099521,
        "acc_norm": 0.6184971098265896, "acc_norm_stderr": 0.03703851193099521
    },
    "harness|hendrycksTest-college_physics|5": {
        "acc": 0.3431372549019608, "acc_stderr": 0.047240073523838876,
        "acc_norm": 0.3431372549019608, "acc_norm_stderr": 0.047240073523838876
    },
    "harness|hendrycksTest-computer_security|5": {
        "acc": 0.75, "acc_stderr": 0.04351941398892446,
        "acc_norm": 0.75, "acc_norm_stderr": 0.04351941398892446
    },
    "harness|hendrycksTest-conceptual_physics|5": {
        "acc": 0.548936170212766, "acc_stderr": 0.032529096196131965,
        "acc_norm": 0.548936170212766, "acc_norm_stderr": 0.032529096196131965
    },
    "harness|hendrycksTest-econometrics|5": {
        "acc": 0.5087719298245614, "acc_stderr": 0.04702880432049615,
        "acc_norm": 0.5087719298245614, "acc_norm_stderr": 0.04702880432049615
    },
    "harness|hendrycksTest-electrical_engineering|5": {
        "acc": 0.593103448275862, "acc_stderr": 0.04093793981266236,
        "acc_norm": 0.593103448275862, "acc_norm_stderr": 0.04093793981266236
    },
    "harness|hendrycksTest-elementary_mathematics|5": {
        "acc": 0.4074074074074074, "acc_stderr": 0.025305906241590632,
        "acc_norm": 0.4074074074074074, "acc_norm_stderr": 0.025305906241590632
    },
    "harness|hendrycksTest-formal_logic|5": {
        "acc": 0.3968253968253968, "acc_stderr": 0.043758884927270605,
        "acc_norm": 0.3968253968253968, "acc_norm_stderr": 0.043758884927270605
    },
    "harness|hendrycksTest-global_facts|5": {
        "acc": 0.35, "acc_stderr": 0.0479372485441102,
        "acc_norm": 0.35, "acc_norm_stderr": 0.0479372485441102
    },
    "harness|hendrycksTest-high_school_biology|5": {
        "acc": 0.7483870967741936, "acc_stderr": 0.02468597928623996,
        "acc_norm": 0.7483870967741936, "acc_norm_stderr": 0.02468597928623996
    },
    "harness|hendrycksTest-high_school_chemistry|5": {
        "acc": 0.5073891625615764, "acc_stderr": 0.035176035403610105,
        "acc_norm": 0.5073891625615764, "acc_norm_stderr": 0.035176035403610105
    },
    "harness|hendrycksTest-high_school_computer_science|5": {
        "acc": 0.66, "acc_stderr": 0.04760952285695237,
        "acc_norm": 0.66, "acc_norm_stderr": 0.04760952285695237
    },
    "harness|hendrycksTest-high_school_european_history|5": {
        "acc": 0.7393939393939394, "acc_stderr": 0.034277431758165236,
        "acc_norm": 0.7393939393939394, "acc_norm_stderr": 0.034277431758165236
    },
    "harness|hendrycksTest-high_school_geography|5": {
        "acc": 0.7777777777777778, "acc_stderr": 0.029620227874790482,
        "acc_norm": 0.7777777777777778, "acc_norm_stderr": 0.029620227874790482
    },
    "harness|hendrycksTest-high_school_government_and_politics|5": {
        "acc": 0.8497409326424871, "acc_stderr": 0.025787723180723886,
        "acc_norm": 0.8497409326424871, "acc_norm_stderr": 0.025787723180723886
    },
    "harness|hendrycksTest-high_school_macroeconomics|5": {
        "acc": 0.6128205128205129, "acc_stderr": 0.024697216930878937,
        "acc_norm": 0.6128205128205129, "acc_norm_stderr": 0.024697216930878937
    },
    "harness|hendrycksTest-high_school_mathematics|5": {
        "acc": 0.32592592592592595, "acc_stderr": 0.028578348365473072,
        "acc_norm": 0.32592592592592595, "acc_norm_stderr": 0.028578348
```
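The `acc_stderr` values appear consistent with the standard error of a mean of 0/1 scores, sqrt(p(1 - p) / (n - 1)), where p is `acc` and n the number of scored questions. The sketch below recomputes this for two MMLU (hendrycksTest) subtasks, assuming test-set sizes of 100 for abstract_algebra and 135 for anatomy; those sizes are not stated in this card. It also shows an unweighted mean of `acc` over the hendrycksTest entries present, which is one plausible way to summarize them and not necessarily how the leaderboard computes its score. The helper names are hypothetical.

```python
import math
from statistics import mean

# A few entries copied from the results summary (acc and acc_stderr only).
results = {
    "harness|hellaswag|10": {"acc": 0.6194981079466242, "acc_stderr": 0.004845180034271621},
    "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.31, "acc_stderr": 0.04648231987117316},
    "harness|hendrycksTest-anatomy|5": {"acc": 0.6, "acc_stderr": 0.04232073695151589},
}

# Assumed number of scored questions per task (not stated in this card).
ASSUMED_N = {
    "harness|hendrycksTest-abstract_algebra|5": 100,
    "harness|hendrycksTest-anatomy|5": 135,
}


def binomial_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the (n - 1) sample variance."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def mmlu_mean_acc(res: dict) -> float:
    """Unweighted mean of `acc` over the hendrycksTest (MMLU) entries present in `res`."""
    return mean(v["acc"] for k, v in res.items() if k.startswith("harness|hendrycksTest-"))


for key, n in ASSUMED_N.items():
    entry = results[key]
    recomputed = binomial_stderr(entry["acc"], n)
    print(f"{key}: reported {entry['acc_stderr']:.6f}, recomputed {recomputed:.6f}")

print(f"mean acc over listed hendrycksTest tasks: {mmlu_mean_acc(results):.3f}")
```

The hellaswag entry is included only to show that the prefix filter leaves non-MMLU tasks out of the mean.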
