
open-llm-leaderboard/details_TheBloke__Lemur-70B-Chat-v1-GPTQ

Source: Hugging Face. Updated: 2023-08-31. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard/details_TheBloke__Lemur-70B-Chat-v1-GPTQ
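Resource description:
This dataset was created automatically during the evaluation of the model TheBloke/Lemur-70B-Chat-v1-GPTQ on the Open LLM Leaderboard. It consists of 61 configurations, each corresponding to one evaluation task. The dataset was created from 1 run. Each run appears as a specific split in each configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results. An additional configuration named "results" stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.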
Provider:
open-llm-leaderboard
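Summary of original information

Dataset overview

Dataset summary

This dataset was created automatically during an evaluation run of the model TheBloke/Lemur-70B-Chat-v1-GPTQ on the Open LLM Leaderboard.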

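Dataset composition

  • The dataset consists of 61 configurations, each corresponding to one evaluation task.
  • The dataset was created from 1 run. Each run appears as a specific split in each configuration, and the split is named after the run's timestamp.
  • The "train" split always points to the latest results.
  • An additional "results" configuration stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.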
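
The split naming described above can be sketched in Python. This is a minimal sketch, not the leaderboard's own tooling: the helper names `split_name_from_timestamp` and `load_run` are hypothetical, and the rule of replacing "-" and ":" with "_" is inferred from the one run listed on this page. The loader assumes the Hugging Face `datasets` package is installed and uses the "latest" split name shown in the configuration list, whereas the summary text mentions "train", so check which one the repository actually exposes.

```python
from datetime import datetime

REPO_ID = "open-llm-leaderboard/details_TheBloke__Lemur-70B-Chat-v1-GPTQ"


def split_name_from_timestamp(timestamp: str) -> str:
    """Turn a run timestamp into the split name used for that run.

    Assumption: "-" and ":" become "_", as inferred from the single
    run shown on this page.
    """
    # Raises ValueError if the timestamp is not in the expected format.
    datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f")
    return timestamp.replace("-", "_").replace(":", "_")


def load_run(config_name: str, split: str = "latest"):
    """Load one configuration of the details dataset (needs network access).

    Hypothetical convenience wrapper; `datasets` is imported lazily so the
    rest of this sketch runs without it.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)


if __name__ == "__main__":
    print(split_name_from_timestamp("2023-08-31T06:46:13.725525"))
```

For example, `load_run("harness_arc_challenge_25", split_name_from_timestamp("2023-08-31T06:46:13.725525"))` would request the per-sample details of that specific run.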

Latest results

The latest results, from the run of 2023-08-31T06:46:13.725525, are:

{
  "all": { "acc": 0.6468074911221942, "acc_stderr": 0.03281612856930076, "acc_norm": 0.6509040444920074, "acc_norm_stderr": 0.032790646231639874, "mc1": 0.3818849449204406, "mc1_stderr": 0.01700810193916349, "mc2": 0.5711470281396481, "mc2_stderr": 0.015283087726691595 },
  "harness|arc:challenge|25": { "acc": 0.6075085324232082, "acc_stderr": 0.014269634635670724, "acc_norm": 0.6527303754266212, "acc_norm_stderr": 0.013913034529620446 },
  "harness|hellaswag|10": { "acc": 0.6475801633140809, "acc_stderr": 0.00476747536668976, "acc_norm": 0.8440549691296555, "acc_norm_stderr": 0.003620617550747387 },
  "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.29, "acc_stderr": 0.045604802157206845, "acc_norm": 0.29, "acc_norm_stderr": 0.045604802157206845 },
  "harness|hendrycksTest-anatomy|5": { "acc": 0.5407407407407407, "acc_stderr": 0.04304979692464242, "acc_norm": 0.5407407407407407, "acc_norm_stderr": 0.04304979692464242 },
  "harness|hendrycksTest-astronomy|5": { "acc": 0.6973684210526315, "acc_stderr": 0.03738520676119667, "acc_norm": 0.6973684210526315, "acc_norm_stderr": 0.03738520676119667 },
  "harness|hendrycksTest-business_ethics|5": { "acc": 0.67, "acc_stderr": 0.04725815626252609, "acc_norm": 0.67, "acc_norm_stderr": 0.04725815626252609 },
  "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.6754716981132075, "acc_stderr": 0.02881561571343211, "acc_norm": 0.6754716981132075, "acc_norm_stderr": 0.02881561571343211 },
  "harness|hendrycksTest-college_biology|5": { "acc": 0.7430555555555556, "acc_stderr": 0.03653946969442099, "acc_norm": 0.7430555555555556, "acc_norm_stderr": 0.03653946969442099 },
  "harness|hendrycksTest-college_chemistry|5": { "acc": 0.47, "acc_stderr": 0.050161355804659205, "acc_norm": 0.47, "acc_norm_stderr": 0.050161355804659205 },
  "harness|hendrycksTest-college_computer_science|5": { "acc": 0.49, "acc_stderr": 0.05024183937956912, "acc_norm": 0.49, "acc_norm_stderr": 0.05024183937956912 },
  "harness|hendrycksTest-college_mathematics|5": { "acc": 0.3, "acc_stderr": 0.046056618647183814, "acc_norm": 0.3, "acc_norm_stderr": 0.046056618647183814 },
  "harness|hendrycksTest-college_medicine|5": { "acc": 0.6358381502890174, "acc_stderr": 0.03669072477416906, "acc_norm": 0.6358381502890174, "acc_norm_stderr": 0.03669072477416906 },
  "harness|hendrycksTest-college_physics|5": { "acc": 0.4117647058823529, "acc_stderr": 0.04897104952726366, "acc_norm": 0.4117647058823529, "acc_norm_stderr": 0.04897104952726366 },
  "harness|hendrycksTest-computer_security|5": { "acc": 0.78, "acc_stderr": 0.04163331998932263, "acc_norm": 0.78, "acc_norm_stderr": 0.04163331998932263 },
  "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.5914893617021276, "acc_stderr": 0.032134180267015755, "acc_norm": 0.5914893617021276, "acc_norm_stderr": 0.032134180267015755 },
  "harness|hendrycksTest-econometrics|5": { "acc": 0.4298245614035088, "acc_stderr": 0.046570472605949625, "acc_norm": 0.4298245614035088, "acc_norm_stderr": 0.046570472605949625 },
  "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.5448275862068965, "acc_stderr": 0.04149886942192118, "acc_norm": 0.5448275862068965, "acc_norm_stderr": 0.04149886942192118 },
  "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.4603174603174603, "acc_stderr": 0.025670080636909186, "acc_norm": 0.4603174603174603, "acc_norm_stderr": 0.025670080636909186 },
  "harness|hendrycksTest-formal_logic|5": { "acc": 0.47619047619047616, "acc_stderr": 0.04467062628403273, "acc_norm": 0.47619047619047616, "acc_norm_stderr": 0.04467062628403273 },
  "harness|hendrycksTest-global_facts|5": { "acc": 0.5, "acc_stderr": 0.050251890762960605, "acc_norm": 0.5, "acc_norm_stderr": 0.050251890762960605 },
  "harness|hendrycksTest-high_school_biology|5": { "acc": 0.7612903225806451, "acc_stderr": 0.02425107126220884, "acc_norm": 0.7612903225806451, "acc_norm_stderr": 0.02425107126220884 },
  "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.4630541871921182, "acc_stderr": 0.035083705204426656, "acc_norm": 0.4630541871921182, "acc_norm_stderr": 0.035083705204426656 },
  "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.72, "acc_stderr": 0.04512608598542128, "acc_norm": 0.72, "acc_norm_stderr": 0.04512608598542128 },
  "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.7878787878787878, "acc_stderr": 0.03192271569548301, "acc_norm": 0.7878787878787878, "acc_norm_stderr": 0.03192271569548301 },
  "harness|hendrycksTest-high_school_geography|5": { "acc": 0.8131313131313131, "acc_stderr": 0.027772533334218967, "acc_norm": 0.8131313131313131, "acc_norm_stderr": 0.027772533334218967 },
  "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.8963730569948186, "acc_stderr": 0.02199531196364424, "acc_norm": 0.8963730569948186, "acc_norm_stderr": 0.02199531196364424 },
  "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.6615384615384615, "acc_stderr": 0.023991500500313036, "acc_norm": 0.6615384615384615, "acc_norm_stderr": 0.023991500500313036 },
  "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.3, "acc_stderr": 0.027940457136228426, "acc_norm": 0.3, "acc_norm_stderr": 0.027940457136228426 },
  "harness|hendrycksTest-high_school_microeconomics|5": { "acc": 0.6596638655462185, "acc_stderr": 0.030778057422931673, "acc_norm": 0.6596638655462185, "acc_norm_stderr": 0.030778057422931673 },
  "harness|hendrycksTest-high_school_physics|5": { "acc": 0.41721854304635764, "acc_stderr": 0.04026141497634611, "acc_norm": 0.41721854304635764, "acc_norm_stderr": 0.04026141497634611 },
  "harness|hendrycksTest-high_school_psychology|5": { "acc": 0.8348623853211009, "acc_stderr": 0.015919557829976064, "acc_norm": 0.8348623853211009, "acc_norm_stderr": 0.015919557829976064 },
  "harness|hendrycksTest-high_school_statistics|5": { "acc": 0.5231481481481481, "acc_stderr": 0.03406315360711507, "acc_norm": 0.5231481481481481, "acc_norm_stderr": 0.03406315360711507 },
  "harness|hendrycksTest-high_school_us_history|5": { "acc": 0.8382352941176471, "acc_stderr": 0.02584501798692692, "acc_norm": 0.8382352941176471, "acc_norm_stderr": 0.02584501798692692 },
  "harness|hendrycksTest-high_school_world_history|5": { "acc": 0.8354430379746836, "acc_stderr": 0.024135736240566932, "acc_norm": 0.8354430379746836, "acc_norm_stderr": 0.024135736240566932 },
  "harness|hendrycksTest-human_aging|5": { "acc": 0.7309417040358744, "acc_stderr": 0.029763779406874972, "acc_norm": 0.7309417040358744, "acc_norm_stderr": 0.029763779406874972 },
  "harness|hendrycksTest-human_sexuality|5": { "acc": 0.7404580152671756, "acc_stderr": 0.03844876139785271, "acc_norm": 0.7404580152671756, "acc_norm_stderr": 0.03844876139785271 },
  "harness|hendrycksTest-international_law|5": { "acc": 0.8429752066115702, "acc_stderr": 0.03321244842547128, "acc_norm": 0.8429752066115702, "acc_norm_stderr": 0.03321244842547128 },
  "harness|hendrycksTest-jurisprudence|5": { "acc": 0.7777777777777778, "acc_stderr": 0.0401910747255735, "acc_norm": 0.7777777777777778, "acc_norm_stderr": 0.0401910747255735 },
  "harness|hendrycksTest-logical_fallacies|5": { "acc": 0.7852760736196319, "acc_stderr": 0.03226219377286775, "acc_norm": 0.7852760736196319, "acc_norm_stderr": 0.03226219377286775 },
  "harness|hendrycksTest-machine_learning|5": { "acc": 0.49107142857142855, "acc_stderr": 0.04745033255489123, "acc_norm": 0.49107142857142855, "acc_norm_stderr": 0.04745033255489123 },
  "harness|hendrycksTest-management|5": { "acc": 0.7864077669902912, "acc_stderr": 0.040580420156460344, "acc_norm": 0.7864077669902912, "acc_norm_stderr": 0.040580420156460344 },
  "harness|hendrycksTest-marketing|5": { "acc": 0.8589743589743589, "acc_stderr": 0.02280138253459753, "acc_norm": 0.8589743589743589, "acc_norm_stderr": 0.02280138253459753 },
  "harness|hendrycksTest-medical_genetics|5": { "acc": 0.62, "acc_stderr": 0.048783173121456316, "acc_norm": 0.62, "acc_norm_stderr": 0.048783173121456316 },
  "harness|hendrycksTest-miscellaneous|5": { "acc": 0.8135376756066411, "acc_stderr": 0.013927751372001512, "acc_norm": 0.8135376756066411, "acc_norm_stderr": 0.013927751372001512 },
  "harness|hendrycksTest-moral_disputes|5": { "acc": 0.7427745664739884, "acc_stderr": 0.023532925431044287, "acc_norm": 0.7427745664739884, "acc_norm_stderr": 0.023532925431044287 },
  "harness|hendrycksTest-moral_scenarios|5": { "acc": 0.5072625698324023, "acc_stderr": 0.0167207374051795, "acc_norm": 0.5072625698324023, "acc_norm_stderr": 0.0167207374051795 },
  "harness|hendrycksTest-nutrition|5": { "acc": 0.7058823529411765, "acc_stderr": 0.026090162504279046, "acc_norm": 0.7058823529411765, "acc_norm_stderr": 0.026090162504279046 },
  "harness|hendrycksTest-philosophy|5": { "acc": 0.707395498392283, "acc_stderr": 0.025839898334877983, "acc_norm": 0.707395498392283, "acc_norm_stderr": 0.025839898334877983 },
  "harness|hendrycksTest-prehistory|5": { "acc": 0.7469135802469136, "acc_stderr": 0.024191808600712992, "acc_norm": 0.7469135802469136, "acc_norm_stderr": 0.024191808600712992 },
  "harness|hendrycksTest-professional_accounting|5": { "acc": 0.49645390070921985, "acc_stderr": 0.02982674915328092, "acc_norm": 0.49645390070921985, "acc_norm_stderr": 0.02982674915328092 },
  "harness|hendrycksTest-professional_law|5": { "acc": 0.49934810951760106, "acc_stderr": 0.012770225252255534, "acc_norm": 0.49934810951760106, "acc_norm_stderr": 0.012770225252255534 },
  "harness|hendrycksTest-professional_medicine|5": { "acc": 0.6580882352941176, "acc_stderr": 0.028814722422254184, "acc_norm": 0.6580882352941176, "acc_norm_stderr": 0.028814722422254184 },
  "harness|hendrycksTest-professional_psychology|5": { "acc": 0.6830065359477124, "acc_stderr": 0.01882421951270621, "acc_norm": 0.6830065359477124, "acc_norm_stderr": 0.01882421951270621 },
  "harness|hendrycksTest-public_relations|5": { "acc": 0.7, "acc_stderr": 0.04389311454644287, "acc_norm": 0.7, "acc_norm_stderr": 0.04389311454644287 },
  "harness|hendrycksTest-security_studies|5": { "acc": 0.7918367346938775, "acc_stderr": 0.025991117672813296, "acc_norm": 0.7918367346938775, "acc_norm_stderr": 0.025991117672813296 },
  "harness|hendrycksTest-sociology|5": { "acc": 0.845771144278607, "acc_stderr": 0.025538433368578334, "acc_norm": 0.845771144278607, "acc_norm_stderr": 0.025538433368578334 },
  "harness|hendrycksTest-us_foreign_policy|5": { "acc": 0.86, "acc_stderr": 0.034873508801977704, "acc_norm": 0.86, "acc_norm_stderr": 0.034873508801977704 },
  "harness|hendrycksTest-virology|5": { "acc": 0.5120481927710844, "acc_stderr": 0.038913644958358175, "acc_norm": 0.5120481927710844, "acc_norm_stderr": 0.038913644958358175 },
  "harness|hendrycksTest-world_religions|5": { "acc": 0.783625730994152, "acc_stderr": 0.03158149539338734, "acc_norm": 0.783625730994152, "acc_norm_stderr": 0.03158149539338734 },
  "harness|truthfulqa:mc|0": { "mc1": 0.3818849449204406, "mc1_stderr": 0.01700810193916349, "mc2": 0.5711470281396481, "mc2_stderr": 0.015283087726691595 }
}

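One way to sanity-check the reported standard errors is to back out the number of questions each task implies. The sketch below rests on an assumption this page does not state: that each question is scored 0 or 1 and that `acc_stderr` is the sample standard deviation divided by the square root of the sample size, which gives stderr² = acc·(1 − acc)/(n − 1). The function name `implied_sample_size` is hypothetical.

```python
def implied_sample_size(acc: float, acc_stderr: float) -> int:
    """Estimate the number of scored questions behind an accuracy value.

    Assumption (not stated on this page): per-question scores are 0/1 and
    stderr = sample_std / sqrt(n), so stderr**2 = acc * (1 - acc) / (n - 1).
    """
    if acc_stderr <= 0:
        raise ValueError("acc_stderr must be positive")
    return round(acc * (1.0 - acc) / acc_stderr ** 2) + 1


if __name__ == "__main__":
    # Values copied from the results listed above.
    print(implied_sample_size(0.29, 0.045604802157206845))                 # abstract_algebra
    print(implied_sample_size(0.6075085324232082, 0.014269634635670724))   # arc:challenge
```

If the implied size for a task is far from the known size of its test set, the assumed formula probably does not apply to that metric (the `mc2` score, for instance, is not a 0/1 average).
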
Dataset configurations

  • Config name: harness_arc_challenge_25

    • Data files:
      • Split: 2023_08_31T06_46_13.725525
        • Path: **/details_harness|arc:challenge|25_2023-08-31T06:46:13.725525.parquet
      • Split: latest
        • Path: **/details_harness|arc:challenge|25_2023-08-31T06:46:13.725525.parquet
  • Config name: harness_hellaswag_10

    • Data files:
      • Split: 2023_08_31T06_46_13.725525
        • Path: **/details_harness|hellaswag|10_2023-08-31T06:46:13.725525.parquet
      • Split: latest
        • Path: **/details_harness|hellaswag|10_2023-08-31T06:46:13.725525.parquet
  • Config name: harness_hendrycksTest_5

    • Data files:
      • Split: 2023_08_31T06_46_13.725525
        • Paths:
          • **/details_harness|hendrycksTest-abstract_algebra|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-anatomy|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-astronomy|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-business_ethics|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-clinical_knowledge|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-college_biology|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-college_chemistry|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-college_computer_science|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-college_mathematics|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-college_medicine|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-college_physics|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-computer_security|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-conceptual_physics|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-econometrics|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-electrical_engineering|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-elementary_mathematics|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-formal_logic|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-global_facts|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_biology|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_chemistry|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_computer_science|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_european_history|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_geography|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_mathematics|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_microeconomics|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_physics|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_psychology|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_statistics|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_us_history|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-high_school_world_history|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-human_aging|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-human_sexuality|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-international_law|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-jurisprudence|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-logical_fallacies|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-machine_learning|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-management|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-marketing|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-medical_genetics|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-miscellaneous|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-moral_disputes|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-moral_scenarios|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-nutrition|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-philosophy|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-prehistory|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-professional_accounting|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-professional_law|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-professional_medicine|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-professional_psychology|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-public_relations|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-security_studies|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-sociology|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-us_foreign_policy|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-virology|5_2023-08-31T06:46:13.725525.parquet
          • **/details_harness|hendrycksTest-world_religions|5_2023-08-31T06:46:13.725525.parquet
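
The naming pattern in the configuration list above can be expressed as two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, and the config-name rule (replace "|" and ":" with "_") has only been checked against the two single-task configs listed here, so how "-" in the hendrycksTest task keys is handled is not covered.

```python
def config_name(task_key: str) -> str:
    """Map a results key such as "harness|arc:challenge|25" to a config name.

    Assumption: "|" and ":" become "_", inferred from the arc:challenge
    and hellaswag entries only.
    """
    return task_key.replace("|", "_").replace(":", "_")


def details_glob(task_key: str, timestamp: str) -> str:
    """Build the parquet glob pattern for one task and one run timestamp."""
    return f"**/details_{task_key}_{timestamp}.parquet"


if __name__ == "__main__":
    run = "2023-08-31T06:46:13.725525"
    for key in ("harness|arc:challenge|25", "harness|hellaswag|10"):
        print(config_name(key), details_glob(key, run))
```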