
open-llm-leaderboard-old/details_huangyt__Mistral-7B-v0.1-Open-Platypus_2.5w-r16-gate_up_down

Source: Hugging Face. Updated 2024-01-14; indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_huangyt__Mistral-7B-v0.1-Open-Platypus_2.5w-r16-gate_up_down
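Resource overview:
This dataset was created automatically during the evaluation of the model huangyt/Mistral-7B-v0.1-Open-Platypus_2.5w-r16-gate_up_down on the Open LLM Leaderboard. It consists of 63 configurations, each corresponding to one evaluation task. The dataset was created from 1 run; each run appears as a split in each configuration, and the split is named with the run's timestamp. The train split always points to the latest results. An additional configuration named results stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.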
Provider:
open-llm-leaderboard-old
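Summary of original information

Dataset overview

Dataset summary

This dataset was created automatically during the evaluation run of the model huangyt/Mistral-7B-v0.1-Open-Platypus_2.5w-r16-gate_up_down on the Open LLM Leaderboard.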

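Dataset composition

  • The dataset contains 63 configurations, each corresponding to one evaluation task.
  • The dataset was created from 1 run. Each run appears as a separate split in each configuration, and the split name is the run's timestamp.
  • The "train" split always points to the latest results.
  • An additional "results" configuration stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.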
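Because splits are named after run timestamps, the most recent run can be picked out by parsing those names. The following is a minimal sketch, assuming split names follow the layout seen on this page (for example `2024_01_14T17_36_45.221009`); the helper name `latest_split` and the second split name in the usage lines are hypothetical.

```python
from datetime import datetime

# Assumed split-name layout, inferred from names such as
# "2024_01_14T17_36_45.221009"; not taken from any official specification.
SPLIT_FORMAT = "%Y_%m_%dT%H_%M_%S.%f"


def latest_split(split_names):
    """Return the timestamp-named split with the most recent run time."""
    timestamped = []
    for name in split_names:
        try:
            timestamped.append((datetime.strptime(name, SPLIT_FORMAT), name))
        except ValueError:
            # Skip aliases such as "train" or "latest" that are not timestamps.
            continue
    if not timestamped:
        raise ValueError("no timestamp-named splits found")
    return max(timestamped)[1]


if __name__ == "__main__":
    # The second name is a made-up earlier run, for illustration only.
    names = ["2023_12_01T08_00_00.000000", "2024_01_14T17_36_45.221009", "latest"]
    print(latest_split(names))
```

With a single run, as here, the function simply returns that run's split name.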

Data loading example

    from datasets import load_dataset

    # Load the details of one evaluation task; "train" points to the latest run.
    data = load_dataset(
        "open-llm-leaderboard/details_huangyt__Mistral-7B-v0.1-Open-Platypus_2.5w-r16-gate_up_down",
        "harness_winogrande_5",
        split="train",
    )

Latest results

These are the latest results, from the run 2024-01-14T17:36:45.221009:

{
    "all": { "acc": 0.6355178040599482, "acc_stderr": 0.03241610229663876, "acc_norm": 0.641571442422577, "acc_norm_stderr": 0.033065020971592085, "mc1": 0.3047735618115055, "mc1_stderr": 0.016114124156882452, "mc2": 0.45435317672164416, "mc2_stderr": 0.014528686611193308 },
    "harness|arc:challenge|25": { "acc": 0.5665529010238908, "acc_stderr": 0.014481376224558902, "acc_norm": 0.6126279863481229, "acc_norm_stderr": 0.014235872487909872 },
    "harness|hellaswag|10": { "acc": 0.6271659032065325, "acc_stderr": 0.004825702533920412, "acc_norm": 0.8319059948217487, "acc_norm_stderr": 0.0037318549570309373 },
    "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.28, "acc_stderr": 0.04512608598542128, "acc_norm": 0.28, "acc_norm_stderr": 0.04512608598542128 },
    "harness|hendrycksTest-anatomy|5": { "acc": 0.6148148148148148, "acc_stderr": 0.04203921040156279, "acc_norm": 0.6148148148148148, "acc_norm_stderr": 0.04203921040156279 },
    "harness|hendrycksTest-astronomy|5": { "acc": 0.6842105263157895, "acc_stderr": 0.0378272898086547, "acc_norm": 0.6842105263157895, "acc_norm_stderr": 0.0378272898086547 },
    "harness|hendrycksTest-business_ethics|5": { "acc": 0.55, "acc_stderr": 0.05, "acc_norm": 0.55, "acc_norm_stderr": 0.05 },
    "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.6792452830188679, "acc_stderr": 0.02872750295788027, "acc_norm": 0.6792452830188679, "acc_norm_stderr": 0.02872750295788027 },
    "harness|hendrycksTest-college_biology|5": { "acc": 0.7013888888888888, "acc_stderr": 0.03827052357950756, "acc_norm": 0.7013888888888888, "acc_norm_stderr": 0.03827052357950756 },
    "harness|hendrycksTest-college_chemistry|5": { "acc": 0.45, "acc_stderr": 0.05, "acc_norm": 0.45, "acc_norm_stderr": 0.05 },
    "harness|hendrycksTest-college_computer_science|5": { "acc": 0.51, "acc_stderr": 0.05024183937956912, "acc_norm": 0.51, "acc_norm_stderr": 0.05024183937956912 },
    "harness|hendrycksTest-college_mathematics|5": { "acc": 0.41, "acc_stderr": 0.04943110704237102, "acc_norm": 0.41, "acc_norm_stderr": 0.04943110704237102 },
    "harness|hendrycksTest-college_medicine|5": { "acc": 0.653179190751445, "acc_stderr": 0.036291466701596636, "acc_norm": 0.653179190751445, "acc_norm_stderr": 0.036291466701596636 },
    "harness|hendrycksTest-college_physics|5": { "acc": 0.38235294117647056, "acc_stderr": 0.04835503696107223, "acc_norm": 0.38235294117647056, "acc_norm_stderr": 0.04835503696107223 },
    "harness|hendrycksTest-computer_security|5": { "acc": 0.8, "acc_stderr": 0.04020151261036845, "acc_norm": 0.8, "acc_norm_stderr": 0.04020151261036845 },
    "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.574468085106383, "acc_stderr": 0.03232146916224468, "acc_norm": 0.574468085106383, "acc_norm_stderr": 0.03232146916224468 },
    "harness|hendrycksTest-econometrics|5": { "acc": 0.4649122807017544, "acc_stderr": 0.04692008381368909, "acc_norm": 0.4649122807017544, "acc_norm_stderr": 0.04692008381368909 },
    "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.5379310344827586, "acc_stderr": 0.04154659671707548, "acc_norm": 0.5379310344827586, "acc_norm_stderr": 0.04154659671707548 },
    "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.43386243386243384, "acc_stderr": 0.025525034382474884, "acc_norm": 0.43386243386243384, "acc_norm_stderr": 0.025525034382474884 },
    "harness|hendrycksTest-formal_logic|5": { "acc": 0.4603174603174603, "acc_stderr": 0.04458029125470973, "acc_norm": 0.4603174603174603, "acc_norm_stderr": 0.04458029125470973 },
    "harness|hendrycksTest-global_facts|5": { "acc": 0.41, "acc_stderr": 0.049431107042371025, "acc_norm": 0.41, "acc_norm_stderr": 0.049431107042371025 },
    "harness|hendrycksTest-high_school_biology|5": { "acc": 0.7451612903225806, "acc_stderr": 0.024790118459332208, "acc_norm": 0.7451612903225806, "acc_norm_stderr": 0.024790118459332208 },
    "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.5221674876847291, "acc_stderr": 0.035145285621750094, "acc_norm": 0.5221674876847291, "acc_norm_stderr": 0.035145285621750094 },
    "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.65, "acc_stderr": 0.04793724854411019, "acc_norm": 0.65, "acc_norm_stderr": 0.04793724854411019 },
    "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.7818181818181819, "acc_stderr": 0.03225078108306289, "acc_norm": 0.7818181818181819, "acc_norm_stderr": 0.03225078108306289 },
    "harness|hendrycksTest-high_school_geography|5": { "acc": 0.7676767676767676, "acc_stderr": 0.03008862949021749, "acc_norm": 0.7676767676767676, "acc_norm_stderr": 0.03008862949021749 },
    "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.8601036269430051, "acc_stderr": 0.025033870583015184, "acc_norm": 0.8601036269430051, "acc_norm_stderr": 0.025033870583015184 },
    "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.6282051282051282, "acc_stderr": 0.024503472557110936, "acc_norm": 0.6282051282051282, "acc_norm_stderr": 0.024503472557110936 },
    "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.36666666666666664, "acc_stderr": 0.029381620726465076, "acc_norm": 0.36666666666666664, "acc_norm_stderr": 0.029381620726465076 },
    "harness|hendrycksTest-high_school_microeconomics|5": { "acc": 0.6596638655462185, "acc_stderr": 0.030778057422931673, "acc_norm": 0.6596638655462185, "acc_norm_stderr": 0.030778057422931673 },
    "harness|hendrycksTest-high_school_physics|5": { "acc": 0.3509933774834437, "acc_stderr": 0.03896981964257375, "acc_norm": 0.3509933774834437, "acc_norm_stderr": 0.03896981964257375 },
    "harness|hendrycksTest-high_school_psychology|5": { "acc": 0.8348623853211009, "acc_stderr": 0.015919557829976044, "acc_norm": 0.8348623853211009, "acc_norm_stderr": 0.015919557829976044 },
    "harness|hendrycksTest-high_school_statistics|5": { "acc": 0.5092592592592593, "acc_stderr": 0.034093869469927006, "acc_norm": 0.5092592592592593, "acc_norm_stderr": 0.034093869469927006 },
    "harness|hendrycksTest-high_school_us_history|5": { "acc": 0.7843137254901961, "acc_stderr": 0.02886743144984932, "acc_norm": 0.7843137254901961, "acc_norm_stderr": 0.02886743144984932 },
    "harness|hendrycksTest-high_school_world_history|5": { "acc": 0.7974683544303798, "acc_stderr": 0.026160568246601453, "acc_norm": 0.7974683544303798, "acc_norm_stderr": 0.026160568246601453 },
    "harness|hendrycksTest-human_aging|5": { "acc": 0.6771300448430493, "acc_stderr": 0.031381476375754995, "acc_norm": 0.6771300448430493, "acc_norm_stderr": 0.031381476375754995 },
    "harness|hendrycksTest-human_sexuality|5": { "acc": 0.7709923664122137, "acc_stderr": 0.036853466317118506, "acc_norm": 0.7709923664122137, "acc_norm_stderr": 0.036853466317118506 },
    "harness|hendrycksTest-international_law|5": { "acc": 0.8099173553719008, "acc_stderr": 0.03581796951709282, "acc_norm": 0.8099173553719008, "acc_norm_stderr": 0.03581796951709282 },
    "harness|hendrycksTest-jurisprudence|5": { "acc": 0.7685185185185185, "acc_stderr": 0.04077494709252627, "acc_norm": 0.7685185185185185, "acc_norm_stderr": 0.04077494709252627 },
    "harness|hendrycksTest-logical_fallacies|5": { "acc": 0.7668711656441718, "acc_stderr": 0.0332201579577674, "acc_norm": 0.7668711656441718, "acc_norm_stderr": 0.0332201579577674 },
    "harness|hendrycksTest-machine_learning|5": { "acc": 0.48214285714285715, "acc_stderr": 0.047427623612430116, "acc_norm": 0.48214285714285715, "acc_norm_stderr": 0.047427623612430116 },
    "harness|hendrycksTest-management|5": { "acc": 0.8349514563106796, "acc_stderr": 0.036756688322331886, "acc_norm": 0.8349514563106796, "acc_norm_stderr": 0.036756688322331886 },
    "harness|hendrycksTest-marketing|5": { "acc": 0.8547008547008547, "acc_stderr": 0.023086635086841407, "acc_norm": 0.8547008547008547, "acc_norm_stderr": 0.023086635086841407 },
    "harness|hendrycksTest-medical_genetics|5": { "acc": 0.69, "acc_stderr": 0.04648231987117316, "acc_norm": 0.69, "acc_norm_stderr": 0.04648231987117316 },
    "harness|hendrycksTest-miscellaneous|5": { "acc": 0.8186462324393359, "acc_stderr": 0.013778693778464085, "acc_norm": 0.8186462324393359, "acc_norm_stderr": 0.013778693778464085 },
    "harness|hendrycksTest-moral_disputes|5": { "acc": 0.7341040462427746, "acc_stderr": 0.02378620325550829, "acc_norm": 0.7341040462427746, "acc_norm_stderr": 0.02378620325550829 },
    "harness|hendrycksTest-moral_scenarios|5": { "acc": 0.38212290502793295, "acc_stderr": 0.016251139711570762, "acc_norm": 0.38212290502793295, "acc_norm_stderr": 0.016251139711570762 },
    "harness|hendrycksTest-nutrition|5": { "acc": 0.761437908496732, "acc_stderr": 0.024404394928087873, "acc_norm": 0.761437908496732, "acc_norm_stderr": 0.024404394928087873 },
    "harness|hendrycksTest-philosophy|5": { "acc": 0.7170418006430869, "acc_stderr": 0.025583062489984813, "acc_norm": 0.7170418006430869, "acc_norm_stderr": 0.025583062489984813 },
    "harness|hendrycksTest-prehistory|5": { "acc": 0.7438271604938271, "acc_stderr": 0.024288533637726095, "acc_norm": 0.7438271604938271, "acc_norm_stderr": 0.024288533637726095 },
    "harness|hendrycksTest-professional_accounting|5": { "acc": 0.48936170212765956, "acc_stderr": 0.029820747191422473, "acc_norm": 0.48936170212765956, "acc_norm_stderr": 0.029820747191422473 },
    "harness|hendrycksTest-professional_law|5": { "acc": 0.46870925684485004, "acc_stderr": 0.012745204626083143, "acc_norm": 0.46870925684485004, "acc_norm_stderr": 0.012745204626083143 },
    "harness|hendrycksTest-professional_medicine|5": { "acc": 0.6801470588235294, "acc_stderr": 0.02833295951403121, "acc_norm": 0.6801470588235294, "acc_norm_stderr": 0.02833295951403121 },
    "harness|hendrycksTest-professional_psychology|5": { "acc": 0.6633986928104575, "acc_stderr": 0.019117213911495155, "acc_norm": 0.6633986928104575, "acc_norm_stderr": 0.019117213911495155 },
    "harness|hendrycksTest-public_relations|5": { "acc": 0.6545454545454545, "acc_stderr": 0.04554619617541054, "acc_norm": 0.6545454545454545, "acc_norm_stderr": 0.04554619617541054 },
    "harness|hendrycksTest-security_studies|5": { "acc": 0.689795918367347, "acc_stderr": 0.029613459872484378, "acc_norm": 0.689795918367347, "acc_norm_stderr": 0.029613459872484378 },
    "harness|hendrycksTest-sociology|5": { "acc": 0.835820895522388, "acc_stderr": 0.026193923544454132, "acc_norm": 0.835820895522388, "acc_norm_stderr": 0.026193923544454132 },
    "harness|hendrycksTest-us_foreign_policy|5": { "acc": 0.88, "acc_stderr": 0.03265986323710905, "acc_norm": 0.88, "acc_norm_stderr": 0.03265986323710905 },
    "harness|hendrycksTest-virology|5": { "acc": 0.5602409638554217, "acc_stderr": 0.03864139923699122, "acc_norm": 0.5602409638554217, "acc_norm_stderr": 0.03864139923699122 },
    "harness|hendrycksTest-world_religions|5": { "acc": 0.8187134502923976, "acc_stderr": 0.02954774168764004, "acc_norm": 0.8187134502923976, "acc_norm_stderr": 0.02954774168764004 },
    "harness|truthfulqa:mc|0": { "mc1": 0.3047735618115055, "mc1_stderr": 0.016114124156882452, "mc2": 0.45435317672164416, "mc2_stderr": 0.014528686611193308 },
    "harness|winogrande|5": { "acc": 0.7734806629834254, "acc_stderr": 0.011764149054698332 },
    "harness|gsm8k|5": { "acc": 0.3912054586808188, "acc_stderr": 0.0134425024027943 }
}
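The per-task entries above can be summarized per benchmark. The sketch below averages one metric over every `harness|hendrycksTest-*` entry of a results dictionary shaped like the one shown. It uses an unweighted mean, which is an assumption made for illustration and not necessarily how the Open LLM Leaderboard aggregates these scores; the function name `mmlu_average` is hypothetical.

```python
from statistics import mean

# Key prefix shared by the hendrycksTest subtasks in the results above.
MMLU_PREFIX = "harness|hendrycksTest-"


def mmlu_average(results, metric="acc"):
    """Unweighted mean of `metric` over all hendrycksTest subtasks."""
    scores = [
        task_metrics[metric]
        for task, task_metrics in results.items()
        if task.startswith(MMLU_PREFIX) and metric in task_metrics
    ]
    if not scores:
        raise ValueError("no hendrycksTest entries found")
    return mean(scores)


if __name__ == "__main__":
    # A three-entry excerpt of the results above; only the first two are
    # hendrycksTest subtasks, so only they enter the average.
    sample = {
        "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.28},
        "harness|hendrycksTest-computer_security|5": {"acc": 0.8},
        "harness|winogrande|5": {"acc": 0.7734806629834254},
    }
    print(mmlu_average(sample))
```

Passing the full dictionary instead of the excerpt averages over every hendrycksTest subtask it contains.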

Configuration details

  • harness_arc_challenge_25

    • Split: 2024_01_14T17_36_45.221009
      • Path: **/details_harness|arc:challenge|25_2024-01-14T17-36-45.221009.parquet
    • Split: latest
      • Path: **/details_harness|arc:challenge|25_2024-01-14T17-36-45.221009.parquet
  • harness_gsm8k_5

    • Split: 2024_01_14T17_36_45.221009
      • Path: **/details_harness|gsm8k|5_2024-01-14T17-36-45.221009.parquet
    • Split: latest
      • Path: **/details_harness|gsm8k|5_2024-01-14T17-36-45.221009.parquet
  • harness_hellaswag_10

    • Split: 2024_01_14T17_36_45.221009
      • Path: **/details_harness|hellaswag|10_2024-01-14T17-36-45.221009.parquet
    • Split: latest
      • Path: **/details_harness|hellaswag|10_2024-01-14T17-36-45.221009.parquet
  • harness_hendrycksTest_5

    • Split: 2024_01_14T17_36_45.221009
      • Paths:
        • **/details_harness|hendrycksTest-abstract_algebra|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-anatomy|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-astronomy|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-business_ethics|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-clinical_knowledge|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-college_biology|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-college_chemistry|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-college_computer_science|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-college_mathematics|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-college_medicine|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-college_physics|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-computer_security|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-conceptual_physics|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-econometrics|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-electrical_engineering|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-elementary_mathematics|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-formal_logic|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-global_facts|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_biology|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_chemistry|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_computer_science|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_european_history|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_geography|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_government_and_politics|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_macroeconomics|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_mathematics|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_microeconomics|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_physics|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_psychology|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_statistics|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_us_history|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-high_school_world_history|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-human_aging|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-human_sexuality|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-international_law|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-jurisprudence|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-logical_fallacies|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-machine_learning|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-management|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-marketing|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-medical_genetics|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-miscellaneous|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-moral_disputes|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-moral_scenarios|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-nutrition|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-philosophy|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-prehistory|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-professional_accounting|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-professional_law|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-professional_medicine|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-professional_psychology|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-public_relations|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-security_studies|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-sociology|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-us_foreign_policy|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-virology|5_2024-01-14T17-36-45.221009.parquet
        • **/details_harness|hendrycksTest-world_religions|5_2024-01-14T17-36-45.221009.parquet
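The configuration names, split names, and parquet paths listed above appear to be derived mechanically from the task keys and the run timestamp. The following sketch reproduces that naming for the entries shown on this page. The replacement rules are inferred from those examples only, so treat them as assumptions; all three function names are hypothetical.

```python
import re


def config_name(task_key):
    """Map a results key such as "harness|arc:challenge|25" to a config name."""
    # Assumption: every character that is not a letter, digit, or underscore
    # becomes "_", which matches the config names listed on this page.
    return re.sub(r"[^0-9A-Za-z_]", "_", task_key)


def split_name(run_timestamp):
    """Map "2024-01-14T17:36:45.221009" to "2024_01_14T17_36_45.221009"."""
    return run_timestamp.replace("-", "_").replace(":", "_")


def parquet_pattern(task_key, run_timestamp):
    """Build the glob pattern under which a run's details file is listed."""
    # In file names the colons of the timestamp appear as hyphens.
    file_stamp = run_timestamp.replace(":", "-")
    return f"**/details_{task_key}_{file_stamp}.parquet"


if __name__ == "__main__":
    run = "2024-01-14T17:36:45.221009"
    print(config_name("harness|arc:challenge|25"))
    print(split_name(run))
    print(parquet_pattern("harness|gsm8k|5", run))
```

Keeping the three rules in separate functions makes it easy to adjust one of them if a name on the Hub turns out to follow a different convention.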