open-llm-leaderboard-old/details_Qwen__Qwen2-72B
Source: Hugging Face. Updated 2024-05-30, indexed 2024-06-29.
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_Qwen__Qwen2-72B
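Resource summary:
This dataset was created automatically during the evaluation run of the model Qwen/Qwen2-72B on the Open LLM Leaderboard. It consists of 63 configurations, each corresponding to one evaluation task. The dataset was generated from 1 run; each run is stored as a specific split in each configuration, and the split is named with the run's timestamp. The train split always points to the latest results. An additional configuration, results, stores all aggregated results of the run and is used to compute and display the aggregated metrics on the Open LLM Leaderboard. The README also provides an example of loading the details of a run with the `load_dataset` function from the `datasets` library, and it includes the latest results from the run of 2024-05-30T19:20:17.942751, which report accuracy metrics for the individual tasks.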
Provider:
open-llm-leaderboard-old
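Summary of original information
Dataset overview
Basic information
- Name: Evaluation run of Qwen/Qwen2-72B
- Source: created automatically during the evaluation run of the model Qwen/Qwen2-72B on the Open LLM Leaderboard.
- Number of configurations: 63
- Number of runs: 1
Dataset structure
- Configurations: each configuration corresponds to one evaluation task.
- Splits: each configuration contains one split per run, named with the run's timestamp. The "train" split points to the latest results.
- Additional configuration: the "results" configuration stores the aggregated results of all runs and is used to compute and display the aggregated metrics.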
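The structure described above can be sketched in Python as follows. This is a minimal illustration, not the README's own example: the helper names `config_name_for`, `split_name_for`, and `load_details` are hypothetical, and the mapping from task keys and timestamps to configuration and split names (every character outside letters, digits, and underscores becomes `_`) is an assumption about the naming convention that should be checked against the dataset's actual configuration list.

```python
import re

REPO_ID = "open-llm-leaderboard-old/details_Qwen__Qwen2-72B"


def config_name_for(task_key: str) -> str:
    """Map a task key such as 'harness|arc:challenge|25' to a config name.

    Assumption: characters other than letters, digits, and '_' become '_'.
    """
    return re.sub(r"[^0-9A-Za-z_]", "_", task_key)


def split_name_for(timestamp: str) -> str:
    """Map a run timestamp to a split name.

    Assumption: '-' and ':' become '_', and the fractional-second dot is kept.
    """
    return timestamp.replace("-", "_").replace(":", "_")


def load_details(task_key: str, split: str = "train"):
    """Load the per-sample details for one task (needs network access)."""
    # Imported lazily so the helpers above work without the `datasets` package.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name_for(task_key), split=split)


if __name__ == "__main__":
    print(config_name_for("harness|hendrycksTest-abstract_algebra|5"))
    print(split_name_for("2024-05-30T19:20:17.942751"))
```

Passing `split="train"` follows the description above, in which the train split points to the latest results. The aggregated metrics would be loaded the same way by passing `"results"` as the configuration name directly to `load_dataset`.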
Latest results
- Timestamp: 2024-05-30T19:20:17.942751
- Task results:
- all:
- acc: 0.8326744621410869
- acc_stderr: 0.02515669535436435
- acc_norm: 0.8359507487155015
- acc_norm_stderr: 0.025643417234058778
- mc1: 0.3733170134638923
- mc1_stderr: 0.016932370557570627
- mc2: 0.5473731276983239
- mc2_stderr: 0.014519749581903284
- harness|arc:challenge|25:
- acc: 0.658703071672355
- acc_stderr: 0.013855831287497726
- acc_norm: 0.6877133105802048
- acc_norm_stderr: 0.013542598541688065
- harness|hellaswag|10:
- acc: 0.6763592909778928
- acc_stderr: 0.004669085411342196
- acc_norm: 0.8727345150368453
- acc_norm_stderr: 0.003325890225529866
- harness|hendrycksTest-abstract_algebra|5:
- acc: 0.67
- acc_stderr: 0.04725815626252607
- acc_norm: 0.67
- acc_norm_stderr: 0.04725815626252607
- harness|hendrycksTest-anatomy|5:
- acc: 0.8
- acc_stderr: 0.03455473702325438
- acc_norm: 0.8
- acc_norm_stderr: 0.03455473702325438
- harness|hendrycksTest-astronomy|5:
- acc: 0.9210526315789473
- acc_stderr: 0.021944342818247937
- acc_norm: 0.9210526315789473
- acc_norm_stderr: 0.021944342818247937
- harness|hendrycksTest-business_ethics|5:
- acc: 0.8
- acc_stderr: 0.04020151261036844
- acc_norm: 0.8
- acc_norm_stderr: 0.04020151261036844
- harness|hendrycksTest-clinical_knowledge|5:
- acc: 0.8716981132075472
- acc_stderr: 0.020582475687991857
- acc_norm: 0.8716981132075472
- acc_norm_stderr: 0.020582475687991857
- harness|hendrycksTest-college_biology|5:
- acc: 0.9305555555555556
- acc_stderr: 0.021257974822832038
- acc_norm: 0.9305555555555556
- acc_norm_stderr: 0.021257974822832038
- harness|hendrycksTest-college_chemistry|5:
- acc: 0.6
- acc_stderr: 0.049236596391733084
- acc_norm: 0.6
- acc_norm_stderr: 0.049236596391733084
- harness|hendrycksTest-college_computer_science|5:
- acc: 0.81
- acc_stderr: 0.039427724440366234
- acc_norm: 0.81
- acc_norm_stderr: 0.039427724440366234
- harness|hendrycksTest-college_mathematics|5:
- acc: 0.65
- acc_stderr: 0.0479372485441102
- acc_norm: 0.65
- acc_norm_stderr: 0.0479372485441102
- harness|hendrycksTest-college_medicine|5:
- acc: 0.838150289017341
- acc_stderr: 0.028083594279575762
- acc_norm: 0.838150289017341
- acc_norm_stderr: 0.028083594279575762
- harness|hendrycksTest-college_physics|5:
- acc: 0.6568627450980392
- acc_stderr: 0.04724007352383889
- acc_norm: 0.6568627450980392
- acc_norm_stderr: 0.04724007352383889
- harness|hendrycksTest-computer_security|5:
- acc: 0.84
- acc_stderr: 0.036845294917747094
- acc_norm: 0.84
- acc_norm_stderr: 0.036845294917747094
- harness|hendrycksTest-conceptual_physics|5:
- acc: 0.8936170212765957
- acc_stderr: 0.02015597730704988
- acc_norm: 0.8936170212765957
- acc_norm_stderr: 0.02015597730704988
- harness|hendrycksTest-econometrics|5:
- acc: 0.7368421052631579
- acc_stderr: 0.041424397194893686
- acc_norm: 0.7368421052631579
- acc_norm_stderr: 0.041424397194893686
- harness|hendrycksTest-electrical_engineering|5:
- acc: 0.8206896551724138
- acc_stderr: 0.031967664333731854
- acc_norm: 0.8206896551724138
- acc_norm_stderr: 0.031967664333731854
- harness|hendrycksTest-elementary_mathematics|5:
- acc: 0.8862433862433863
- acc_stderr: 0.0163528764804948
- acc_norm: 0.8862433862433863
- acc_norm_stderr: 0.0163528764804948
- harness|hendrycksTest-formal_logic|5:
- acc: 0.7380952380952381
- acc_stderr: 0.03932537680392871
- acc_norm: 0.7380952380952381
- acc_norm_stderr: 0.03932537680392871
- harness|hendrycksTest-global_facts|5:
- acc: 0.63
- acc_stderr: 0.048523658709391
- acc_norm: 0.63
- acc_norm_stderr: 0.048523658709391
- harness|hendrycksTest-high_school_biology|5:
- acc: 0.9354838709677419
- acc_stderr: 0.013975683705589406
- acc_norm: 0.9354838709677419
- acc_norm_stderr: 0.013975683705589406
- harness|hendrycksTest-high_school_chemistry|5:
- acc: 0.7783251231527094
- acc_stderr: 0.0292255758924896
- acc_norm: 0.7783251231527094
- acc_norm_stderr: 0.0292255758924896
- harness|hendrycksTest-high_school_computer_science|5:
- acc: 0.91
- acc_stderr: 0.028762349126466115
- acc_norm: 0.91
- acc_norm_stderr: 0.028762349126466115
- harness|hendrycksTest-high_school_european_history|5:
- acc: 0.8848484848484849
- acc_stderr: 0.024925699798115347
- acc_norm: 0.8848484848484849
- acc_norm_stderr: 0.024925699798115347
- harness|hendrycksTest-high_school_geography|5:
- acc: 0.9393939393939394
- acc_stderr: 0.01699999492742161
- acc_norm: 0.9393939393939394
- acc_norm_stderr: 0.01699999492742161
- harness|hendrycksTest-high_school_government_and_politics|5:
- acc: 0.9896373056994818
- acc_stderr: 0.007308424386792192
- acc_norm: 0.9896373056994818
- acc_norm_stderr: 0.007308424386792192
- harness|hendrycksTest-high_school_macroeconomics|5:
- acc: 0.882051282051282
- acc_stderr: 0.016353801778303412
- acc_norm: 0.882051282051282
- acc_norm_stderr: 0.016353801778303412
- harness|hendrycksTest-high_school_mathematics|5:
- acc: 0.6851851851851852
- acc_stderr: 0.028317533496066475
- acc_norm: 0.6851851851851852
- acc_norm_stderr: 0.028317533496066475
- harness|hendrycksTest-high_school_microeconomics|5:
- acc: 0.9369747899159664
- acc_stderr: 0.015785085223670916
- acc_norm: 0.9369747899159664
- acc_norm_stderr: 0.015785085223670916
- harness|hendrycksTest-high_school_physics|5:
- acc: 0.7019867549668874
- acc_stderr: 0.037345356767871984
- acc_norm: 0.7019867549668874
- acc_norm_stderr: 0.037345356767871984
- harness|hendrycksTest-high_school_psychology|5:
- acc: 0.9357798165137615
- acc_stderr: 0.010510494713201386
- acc_norm: 0.9357798165137615
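To help read the `acc` and `acc_stderr` pairs listed above, the following sketch shows how a standard error of this kind can be reproduced and turned into an approximate 95% confidence interval. It assumes that each stderr is the standard error of the mean of per-question 0/1 scores computed with the sample (n - 1) variance, and that the abstract_algebra task has 100 questions; neither detail is stated in the source. The function names are illustrative.

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the (n - 1) variance."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def normal_ci(acc: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval, clipped to [0, 1]."""
    return max(0.0, acc - z * stderr), min(1.0, acc + z * stderr)


if __name__ == "__main__":
    # abstract_algebra: acc = 0.67, assumed n = 100 questions.
    print(binomial_stderr(0.67, 100))

    # arc:challenge (25-shot): acc_norm and acc_norm_stderr from the list above.
    low, high = normal_ci(0.6877133105802048, 0.013542598541688065)
    print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions the computed value for abstract_algebra agrees with the listed `acc_stderr` of 0.04725815626252607 to within rounding, and the interval for arc:challenge spans roughly 0.661 to 0.714. Tasks with few questions therefore carry noticeably wider uncertainty than the large ones such as hellaswag.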



