open-llm-leaderboard-old/details_Azazelle__xDAN-SlimOrca

Hugging Face · Updated 2023-12-30 · Indexed 2024-06-22
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_Azazelle__xDAN-SlimOrca
Resource summary:
This dataset was created automatically during the evaluation of the model Azazelle/xDAN-SlimOrca on the Open LLM Leaderboard. It consists of 63 configurations, each corresponding to one evaluation task. The dataset was created from 1 run; each run appears as a specific split in every configuration, and the split is named after the run's timestamp. The train split always points to the latest results. An additional configuration, results, stores the aggregated results of the run and is used to compute and display the aggregated metrics on the Open LLM Leaderboard. The README also explains how to load the details of a run and lists the latest results for a specific run.
Provider:
open-llm-leaderboard-old
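Summary of original information

Dataset overview

Dataset introduction

This dataset was created automatically while evaluating the model Azazelle/xDAN-SlimOrca on the Open LLM Leaderboard. It contains 63 configurations, each corresponding to one evaluation task.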

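Dataset structure

  • Number of configurations: 63
  • Data source: created from 1 run; each run can be found as a specific split in every configuration, and the split is named after the run's timestamp.
  • Latest results: the "train" split always points to the latest results.
  • Aggregated results: an additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.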
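The naming scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard's own code: the helper names `config_name` and `split_name` are hypothetical, and the character-replacement rules are inferred from the examples on this page (for instance the task key `harness|arc:challenge|25` next to the configuration `harness_arc_challenge_25`, and the run timestamp `2023-12-30T02:47:16.082570` next to the split `2023_12_30T02_47_16.082570`).

```python
import re


def config_name(task_key: str) -> str:
    """Map a task key such as 'harness|arc:challenge|25' to a configuration name.

    Assumption: every run of non-alphanumeric characters becomes one underscore.
    """
    return re.sub(r"[^0-9A-Za-z]+", "_", task_key)


def split_name(run_timestamp: str) -> str:
    """Map a run timestamp to the split name used for that run.

    Assumption: '-' and ':' become '_', while the 'T' and the '.' are kept.
    """
    return run_timestamp.replace("-", "_").replace(":", "_")


print(config_name("harness|arc:challenge|25"))
print(split_name("2023-12-30T02:47:16.082570"))
```

Keeping the two mappings as separate functions makes it easy to adjust one of them if a configuration turns out to follow a different rule.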

Data loading example

    from datasets import load_dataset

    # Load the details of one evaluation task; "train" points to the latest run.
    data = load_dataset(
        "open-llm-leaderboard/details_Azazelle__xDAN-SlimOrca",
        "harness_winogrande_5",
        split="train",
    )
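The repository name follows a visible pattern: the model identifier `Azazelle/xDAN-SlimOrca` appears as `details_Azazelle__xDAN-SlimOrca`. The sketch below builds that identifier and wraps the load call. The helper names `details_repo_id` and `load_details` are hypothetical, the default organization is taken from the download link on this page, and the default split `"latest"` is taken from the configuration listing; treat all three as assumptions to adjust as needed.

```python
def details_repo_id(model_id: str, org: str = "open-llm-leaderboard-old") -> str:
    """Build the details repository id for a model.

    Assumption: the '/' in the model id is replaced by a double underscore.
    """
    return f"{org}/details_{model_id.replace('/', '__')}"


def load_details(model_id: str, config: str, split: str = "latest"):
    """Load one configuration of the details dataset (needs network access)."""
    # Imported lazily so that details_repo_id works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(details_repo_id(model_id), config, split=split)


print(details_repo_id("Azazelle/xDAN-SlimOrca"))
```

A call such as `load_details("Azazelle/xDAN-SlimOrca", "harness_winogrande_5")` would then fetch the per-example details for that task, provided the repository and split exist under those names.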

Latest results

The latest results, from run 2023-12-30T02:47:16.082570, are:

{
  "all": { "acc": 0.6385100804568992, "acc_stderr": 0.03217117249906078, "acc_norm": 0.6407471767083691, "acc_norm_stderr": 0.03280870726203415, "mc1": 0.408812729498164, "mc1_stderr": 0.01720995215164173, "mc2": 0.5768241806655555, "mc2_stderr": 0.015542058188975288 },
  "harness|arc:challenge|25": { "acc": 0.628839590443686, "acc_stderr": 0.014117971901142818, "acc_norm": 0.6561433447098977, "acc_norm_stderr": 0.013880644570156213 },
  "harness|hellaswag|10": { "acc": 0.6734714200358495, "acc_stderr": 0.004679847503411344, "acc_norm": 0.8570005974905397, "acc_norm_stderr": 0.00349356791409329 },
  "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.32, "acc_stderr": 0.046882617226215034, "acc_norm": 0.32, "acc_norm_stderr": 0.046882617226215034 },
  "harness|hendrycksTest-anatomy|5": { "acc": 0.6074074074074074, "acc_stderr": 0.04218506215368879, "acc_norm": 0.6074074074074074, "acc_norm_stderr": 0.04218506215368879 },
  "harness|hendrycksTest-astronomy|5": { "acc": 0.6776315789473685, "acc_stderr": 0.03803510248351585, "acc_norm": 0.6776315789473685, "acc_norm_stderr": 0.03803510248351585 },
  "harness|hendrycksTest-business_ethics|5": { "acc": 0.6, "acc_stderr": 0.04923659639173309, "acc_norm": 0.6, "acc_norm_stderr": 0.04923659639173309 },
  "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.6716981132075471, "acc_stderr": 0.02890159361241178, "acc_norm": 0.6716981132075471, "acc_norm_stderr": 0.02890159361241178 },
  "harness|hendrycksTest-college_biology|5": { "acc": 0.7361111111111112, "acc_stderr": 0.03685651095897532, "acc_norm": 0.7361111111111112, "acc_norm_stderr": 0.03685651095897532 },
  "harness|hendrycksTest-college_chemistry|5": { "acc": 0.49, "acc_stderr": 0.05024183937956912, "acc_norm": 0.49, "acc_norm_stderr": 0.05024183937956912 },
  "harness|hendrycksTest-college_computer_science|5": { "acc": 0.52, "acc_stderr": 0.050211673156867795, "acc_norm": 0.52, "acc_norm_stderr": 0.050211673156867795 },
  "harness|hendrycksTest-college_mathematics|5": { "acc": 0.31, "acc_stderr": 0.04648231987117316, "acc_norm": 0.31, "acc_norm_stderr": 0.04648231987117316 },
  "harness|hendrycksTest-college_medicine|5": { "acc": 0.6069364161849711, "acc_stderr": 0.0372424959581773, "acc_norm": 0.6069364161849711, "acc_norm_stderr": 0.0372424959581773 },
  "harness|hendrycksTest-college_physics|5": { "acc": 0.37254901960784315, "acc_stderr": 0.04810840148082635, "acc_norm": 0.37254901960784315, "acc_norm_stderr": 0.04810840148082635 },
  "harness|hendrycksTest-computer_security|5": { "acc": 0.78, "acc_stderr": 0.04163331998932263, "acc_norm": 0.78, "acc_norm_stderr": 0.04163331998932263 },
  "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.5829787234042553, "acc_stderr": 0.03223276266711712, "acc_norm": 0.5829787234042553, "acc_norm_stderr": 0.03223276266711712 },
  "harness|hendrycksTest-econometrics|5": { "acc": 0.43859649122807015, "acc_stderr": 0.04668000738510455, "acc_norm": 0.43859649122807015, "acc_norm_stderr": 0.04668000738510455 },
  "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.5586206896551724, "acc_stderr": 0.04137931034482758, "acc_norm": 0.5586206896551724, "acc_norm_stderr": 0.04137931034482758 },
  "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.42592592592592593, "acc_stderr": 0.02546714904546955, "acc_norm": 0.42592592592592593, "acc_norm_stderr": 0.02546714904546955 },
  "harness|hendrycksTest-formal_logic|5": { "acc": 0.42857142857142855, "acc_stderr": 0.0442626668137991, "acc_norm": 0.42857142857142855, "acc_norm_stderr": 0.0442626668137991 },
  "harness|hendrycksTest-global_facts|5": { "acc": 0.28, "acc_stderr": 0.045126085985421276, "acc_norm": 0.28, "acc_norm_stderr": 0.045126085985421276 },
  "harness|hendrycksTest-high_school_biology|5": { "acc": 0.7806451612903226, "acc_stderr": 0.023540799358723285, "acc_norm": 0.7806451612903226, "acc_norm_stderr": 0.023540799358723285 },
  "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.4975369458128079, "acc_stderr": 0.03517945038691063, "acc_norm": 0.4975369458128079, "acc_norm_stderr": 0.03517945038691063 },
  "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.72, "acc_stderr": 0.04512608598542127, "acc_norm": 0.72, "acc_norm_stderr": 0.04512608598542127 },
  "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.7696969696969697, "acc_stderr": 0.0328766675860349, "acc_norm": 0.7696969696969697, "acc_norm_stderr": 0.0328766675860349 },
  "harness|hendrycksTest-high_school_geography|5": { "acc": 0.7878787878787878, "acc_stderr": 0.029126522834586815, "acc_norm": 0.7878787878787878, "acc_norm_stderr": 0.029126522834586815 },
  "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.8911917098445595, "acc_stderr": 0.022473253332768776, "acc_norm": 0.8911917098445595, "acc_norm_stderr": 0.022473253332768776 },
  "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.6358974358974359, "acc_stderr": 0.02439667298509476, "acc_norm": 0.6358974358974359, "acc_norm_stderr": 0.02439667298509476 },
  "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.3148148148148148, "acc_stderr": 0.028317533496066482, "acc_norm": 0.3148148148148148, "acc_norm_stderr": 0.028317533496066482 },
  "harness|hendrycksTest-high_school_microeconomics|5": { "acc": 0.7184873949579832, "acc_stderr": 0.029213549414372174, "acc_norm": 0.7184873949579832, "acc_norm_stderr": 0.029213549414372174 },
  "harness|hendrycksTest-high_school_physics|5": { "acc": 0.31788079470198677, "acc_stderr": 0.038020397601079024, "acc_norm": 0.31788079470198677, "acc_norm_stderr": 0.038020397601079024 },
  "harness|hendrycksTest-high_school_psychology|5": { "acc": 0.8256880733944955, "acc_stderr": 0.016265675632010358, "acc_norm": 0.8256880733944955, "acc_norm_stderr": 0.016265675632010358 },
  "harness|hendrycksTest-high_school_statistics|5": { "acc": 0.49537037037037035, "acc_stderr": 0.03409825519163572, "acc_norm": 0.49537037037037035, "acc_norm_stderr": 0.03409825519163572 },
  "harness|hendrycksTest-high_school_us_history|5": { "acc": 0.8284313725490197, "acc_stderr": 0.026460569561240647, "acc_norm": 0.8284313725490197, "acc_norm_stderr": 0.026460569561240647 },
  "harness|hendrycksTest-high_school_world_history|5": { "acc": 0.7974683544303798, "acc_stderr": 0.026160568246601436, "acc_norm": 0.7974683544303798, "acc_norm_stderr": 0.026160568246601436 },
  "harness|hendrycksTest-human_aging|5": { "acc": 0.7219730941704036, "acc_stderr": 0.030069584874494036, "acc_norm": 0.7219730941704036, "acc_norm_stderr": 0.030069584874494036 },
  "harness|hendrycksTest-human_sexuality|5": { "acc": 0.7557251908396947, "acc_stderr": 0.03768335959728743, "acc_norm": 0.7557251908396947, "acc_norm_stderr": 0.03768335959728743 },
  "harness|hendrycksTest-international_law|5": { "acc": 0.7933884297520661, "acc_stderr": 0.03695980128098824, "acc_norm": 0.7933884297520661, "acc_norm_stderr": 0.03695980128098824 },
  "harness|hendrycksTest-jurisprudence|5": { "acc": 0.7962962962962963, "acc_stderr": 0.03893542518824847, "acc_norm": 0.7962962962962963, "acc_norm_stderr": 0.03893542518824847 },
  "harness|hendrycksTest-logical_fallacies|5": { "acc": 0.7791411042944786, "acc_stderr": 0.03259177392742178, "acc_norm": 0.7791411042944786, "acc_norm_stderr": 0.03259177392742178 },
  "harness|hendrycksTest-machine_learning|5": { "acc": 0.5, "acc_stderr": 0.04745789978762494, "acc_norm": 0.5, "acc_norm_stderr": 0.04745789978762494 },
  "harness|hendrycksTest-management|5": { "acc": 0.7961165048543689, "acc_stderr": 0.039891398595317706, "acc_norm": 0.7961165048543689, "acc_norm_stderr": 0.039891398595317706 },
  "harness|hendrycksTest-marketing|5": { "acc": 0.8675213675213675, "acc_stderr": 0.022209309073165612, "acc_norm": 0.8675213675213675, "acc_norm_stderr": 0.022209309073165612 },
  "harness|hendrycksTest-medical_genetics|5": { "acc": 0.68, "acc_stderr": 0.046882617226215034, "acc_norm": 0.68, "acc_norm_stderr": 0.046882617226215034 },
  "harness|hendrycksTest-miscellaneous|5": { "acc": 0.8160919540229885, "acc_stderr": 0.013853724170922526, "acc_norm": 0.8160919540229885, "acc_norm_stderr": 0.013853724170922526 },
  "harness|hendrycksTest-moral_disputes|5": { "acc": 0.7283236994219653, "acc_stderr": 0.023948512905468355, "acc_norm": 0.7283236994219653, "acc_norm_stderr": 0.023948512905468355 },
  "harness|hendrycksTest-moral_scenarios|5": { "acc": 0.3653631284916201, "acc_stderr": 0.01610483388014229, "acc_norm": 0.3653631284916201, "acc_norm_stderr": 0.01610483388014229 },
  "harness|hendrycksTest-nutrition|5": { "acc": 0.7352941176470589, "acc_stderr": 0.025261691219729484, "acc_norm": 0.7352941176470589, "acc_norm_stderr": 0.025261691219729484 },
  "harness|hendrycksTest-philosophy|5": { "acc": 0.7138263665594855, "acc_stderr": 0.025670259242188933, "acc_norm": 0.7138263665594855, "acc_norm_stderr": 0.025670259242188933 },
  "harness|hendrycksTest-prehistory|5": { "acc": 0.7469135802469136, "acc_stderr": 0.024191808600713002, "acc_norm": 0.7469135802469136, "acc_norm_stderr": 0.024191808600713002 },
  "harness|hendrycksTest-professional_accounting|5": { "acc": 0.4858156028368794, "acc_stderr": 0.02981549448368206, "acc_norm": 0.4858156028368794, "acc_norm_stderr": 0.02981549448368206 },
  "harness|hendrycksTest-professional_law|5": { "acc": 0.49478487614080835, "acc_stderr": 0.012769541449652547, "acc_norm": 0.49478487614080835, "acc_norm_stderr": 0.012769541449652547 },
  "harness|hendrycksTest-professional_medicine|5": { "acc": 0.6433823529411765, "acc_stderr": 0.029097209568411952, "acc_norm": 0.6433823529411765, "acc_norm_stderr": 0.029097209568411952 },
  "harness|hendrycksTest-professional_psychology|5": { "acc": 0.6552287581699346, "acc_stderr": 0.01922832201869664, "acc_norm": 0.6552287581699346, "acc_norm_stderr": 0.01922832201869664 },
  "harness|hendrycksTest-public_relations|5": { "acc": 0.6454545454545455, "acc_stderr": 0.045820048415054174, "acc_norm": 0.6454545454545455, "acc_norm_stderr": 0.045820048415054174 },
  "harness|hendrycksTest-security_studies|5": { "acc": 0.7510204081632653, "acc_stderr": 0.02768297952296022, "acc_norm": 0.7510204081632653, "acc_norm_stderr": 0.02768297952296022 },
  "harness|hendrycksTest-sociology|5": { "acc": 0.835820895522388, "acc_stderr": 0.026193923544454125, "acc_norm": 0.835820895522388, "acc_norm_stderr": 0.026193923544454125 },
  "harness|hendrycksTest-us_foreign_policy|5": { "acc": 0.87, "acc_stderr": 0.03379976689896309, "acc_norm": 0.87, "acc_norm_stderr": 0.03379976689896309 },
  "harness|hendrycksTest-virology|5": { "acc": 0.5301204819277109, "acc_stderr": 0.03885425420866767, "acc_norm": 0.5301204819277109, "acc_norm_stderr": 0.03885425420866767 },
  "harness|hendrycksTest-world_religions|5": { "acc": 0.7953216374269005, "acc_stderr": 0.030944459778533207, "acc_norm": 0.7953216374269005, "acc_norm_stderr": 0.030944459778533207 },
  "harness|truthfulqa:mc|0": { "mc1": 0.408812729498164, "mc1_stderr": 0.01720995215164173, "mc2": 0.5768241806655555, "mc2_stderr": 0.015542058188975288 },
  "harness|winogrande|5": { "acc": 0.77663772691397, "acc_stderr": 0.0117056975652052 },
  "harness|gsm8k|5": { "acc": 0.579226686884003, "acc_stderr": 0.013598489497182838 }
}

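The per-task numbers above can be rolled up into headline scores. The sketch below is one plausible way to do that and is not taken from the dataset card: it assumes that the 57 `hendrycksTest` subtasks are macro-averaged on `acc`, and that the overall score is the unweighted mean of six benchmark scores with the metric choices listed in `HEADLINE_METRICS`. The function names are hypothetical.

```python
from statistics import mean

# Assumption: one headline metric per benchmark; adjust if the leaderboard differs.
HEADLINE_METRICS = {
    "harness|arc:challenge|25": "acc_norm",
    "harness|hellaswag|10": "acc_norm",
    "harness|truthfulqa:mc|0": "mc2",
    "harness|winogrande|5": "acc",
    "harness|gsm8k|5": "acc",
}

MMLU_PREFIX = "harness|hendrycksTest-"


def mmlu_average(results: dict) -> float:
    """Unweighted mean of `acc` over all hendrycksTest subtasks."""
    return mean(
        metrics["acc"]
        for task, metrics in results.items()
        if task.startswith(MMLU_PREFIX)
    )


def headline_average(results: dict) -> float:
    """Unweighted mean of the five single-task scores plus the MMLU average."""
    scores = [results[task][metric] for task, metric in HEADLINE_METRICS.items()]
    scores.append(mmlu_average(results))
    return mean(scores)
```

Passing the parsed results dictionary to `headline_average` returns a single number between 0 and 1; the `"all"` entry is ignored because its key matches neither the prefix nor the headline table.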
Dataset configurations

  • Configuration name: harness_arc_challenge_25

    • Data files
      • Split: 2023_12_30T02_47_16.082570
        • Path: **/details_harness|arc:challenge|25_2023-12-30T02-47-16.082570.parquet
      • Split: latest
        • Path: **/details_harness|arc:challenge|25_2023-12-30T02-47-16.082570.parquet
  • Configuration name: harness_gsm8k_5

    • Data files
      • Split: 2023_12_30T02_47_16.082570
        • Path: **/details_harness|gsm8k|5_2023-12-30T02-47-16.082570.parquet
      • Split: latest
        • Path: **/details_harness|gsm8k|5_2023-12-30T02-47-16.082570.parquet
  • Configuration name: harness_hellaswag_10

    • Data files
      • Split: 2023_12_30T02_47_16.082570
        • Path: **/details_harness|hellaswag|10_2023-12-30T02-47-16.082570.parquet
      • Split: latest
        • Path: **/details_harness|hellaswag|10_2023-12-30T02-47-16.082570.parquet
  • Configuration name: harness_hendrycksTest_5

    • Data files
      • Split: 2023_12_30T02_47_16.082570
        • Paths:
          • **/details_harness|hendrycksTest-abstract_algebra|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-anatomy|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-astronomy|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-business_ethics|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-clinical_knowledge|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-college_biology|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-college_chemistry|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-college_computer_science|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-college_mathematics|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-college_medicine|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-college_physics|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-computer_security|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-conceptual_physics|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-econometrics|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-electrical_engineering|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-elementary_mathematics|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-formal_logic|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-global_facts|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_biology|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_chemistry|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_computer_science|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_european_history|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_geography|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_mathematics|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_microeconomics|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_physics|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_psychology|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_statistics|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_us_history|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-high_school_world_history|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-human_aging|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-human_sexuality|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-international_law|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-jurisprudence|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-logical_fallacies|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-machine_learning|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-management|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-marketing|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-medical_genetics|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-miscellaneous|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-moral_disputes|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-moral_scenarios|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-nutrition|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-philosophy|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-prehistory|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-professional_accounting|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-professional_law|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-professional_medicine|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-professional_psychology|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-public_relations|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-security_studies|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-sociology|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-us_foreign_policy|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-virology|5_2023-12-30T02-47-16.082570.parquet
          • **/details_harness|hendrycksTest-world_religions|5_2023-12-30T02-47-16.082570.parquet
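The data-file paths listed above share one shape: `**/details_`, the task key, an underscore, the run timestamp with its colons turned into dashes, and the `.parquet` suffix. A minimal sketch that rebuilds such a pattern and checks it against a file path is shown below. The helper name `details_glob` and the directory in the example path are hypothetical, and Python's `fnmatch` is used here only as a rough stand-in for whatever glob matching the hosting side applies.

```python
from fnmatch import fnmatchcase


def details_glob(task_key: str, run_timestamp: str) -> str:
    """Build the glob pattern for one task's details file.

    Assumption: only the timestamp has ':' replaced by '-'; the task key is
    kept verbatim, as in 'harness|arc:challenge|25'.
    """
    return f"**/details_{task_key}_{run_timestamp.replace(':', '-')}.parquet"


pattern = details_glob("harness|gsm8k|5", "2023-12-30T02:47:16.082570")
print(pattern)

# Hypothetical location of the file inside the repository.
example_path = (
    "some_run_dir/details_harness|gsm8k|5_2023-12-30T02-47-16.082570.parquet"
)
print(fnmatchcase(example_path, pattern))
```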