open-llm-leaderboard-old/details_uukuguy__Mistral-7B-OpenOrca-lora
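Dataset Overview
Dataset Summary
This dataset was created automatically during the evaluation of the model uukuguy/Mistral-7B-OpenOrca-lora on the Open LLM Leaderboard. It contains 64 configurations, each corresponding to one evaluation task. The dataset was created from 1 run. Each run appears as a separate split within each configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results.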
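As a minimal sketch of how one configuration might be loaded, the snippet below uses the `load_dataset` function from the Hugging Face `datasets` library. The repository id is taken from the title of this page and the default split name `latest` from the configuration listing; whether the repository is still reachable under that id is an assumption, and `load_details` is a hypothetical helper name.

```python
# Repository id as it appears in the title of this page (assumed to be reachable).
REPO_ID = "open-llm-leaderboard-old/details_uukuguy__Mistral-7B-OpenOrca-lora"


def load_details(config_name, split="latest"):
    """Load the per-sample details for one evaluation task.

    config_name: a configuration name such as "harness_gsm8k_5".
    split: a timestamp-named split or "latest" (as listed per configuration).
    """
    # Imported lazily so this module can be read without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)


# Example call (requires network access and the `datasets` package):
# details = load_details("harness_gsm8k_5")
# print(details)
```

Passing a timestamp-named split such as `2023_11_13T15_44_18.785582` instead of `latest` would pin the results to one specific run.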
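Dataset Structure
The dataset contains multiple configurations, each corresponding to a different evaluation task. Details for some of the configurations are listed below: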
- harness_arc_challenge_25
  - Split: 2023_11_13T15_44_18.785582
    - Path: **/details_harness|arc:challenge|25_2023-11-13T15-44-18.785582.parquet
  - Split: latest
    - Path: **/details_harness|arc:challenge|25_2023-11-13T15-44-18.785582.parquet
- harness_drop_3
  - Split: 2023_11_13T15_44_18.785582
    - Path: **/details_harness|drop|3_2023-11-13T15-44-18.785582.parquet
  - Split: latest
    - Path: **/details_harness|drop|3_2023-11-13T15-44-18.785582.parquet
- harness_gsm8k_5
  - Split: 2023_11_13T15_44_18.785582
    - Path: **/details_harness|gsm8k|5_2023-11-13T15-44-18.785582.parquet
  - Split: latest
    - Path: **/details_harness|gsm8k|5_2023-11-13T15-44-18.785582.parquet
- harness_hellaswag_10
  - Split: 2023_11_13T15_44_18.785582
    - Path: **/details_harness|hellaswag|10_2023-11-13T15-44-18.785582.parquet
  - Split: latest
    - Path: **/details_harness|hellaswag|10_2023-11-13T15-44-18.785582.parquet
- harness_hendrycksTest_5
  - Split: 2023_11_13T15_44_18.785582
    - Paths:
      **/details_harness|hendrycksTest-abstract_algebra|5_2023-11-13T15-44-18.785582.parquet
      **/details_harness|hendrycksTest-anatomy|5_2023-11-13T15-44-18.785582.parquet
      **/details_harness|hendrycksTest-astronomy|5_2023-11-13T15-44-18.785582.parquet
      **/details_harness|hendrycksTest-business_ethics|5_2023-11-13T15-44-18.785582.parquet
      **/details_harness|hendrycksTest-clinical_knowledge|5_2023-11-13T15-44-18.785582.parquet
      **/details_harness|hendrycksTest-college_biology|5_2023-11-13T15-44-18.785582.parquet
      **/details_harness|hendrycksTest-college_chemistry|5_2023-11-13T15-44-18.785582.parquet
      **/details_harness|hendrycksTest-college_computer_science|5_2023-11-13T15-44-18.785582.parquet
      **/details_harness|hendrycksTest-college_mathematics|5_2023-11-13T15-44-18.785582.parquet
      **/details_harness|hendrycksTest-college_medicine|5_2023-11-13T15-44-18.785582.parquet
      **/details_harness|hendrycksTest-college_physics|5_2023-11-13T15-44-18.785582.parquet
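The file names, split names, and configuration names in the listing appear to follow a regular pattern. The sketch below infers that pattern from the single-task entries shown here (for example `harness_arc_challenge_25`); it is not a documented rule. The helper names are hypothetical, and the mapping does not reproduce the aggregated `harness_hendrycksTest_5` configuration, which groups several files.

```python
import re
from typing import NamedTuple


class DetailFile(NamedTuple):
    task: str       # e.g. "arc:challenge"
    n_shot: int     # number of few-shot examples, e.g. 25
    timestamp: str  # run timestamp as written in the file name


# Matches "details_harness|<task>|<n_shot>_<timestamp>.parquet" at the end of a path.
_PATTERN = re.compile(
    r"details_harness\|(?P<task>[^|]+)\|(?P<shots>\d+)_(?P<ts>[\dT.\-]+)\.parquet$"
)


def parse_detail_path(path):
    """Split a details file path into task, shot count, and timestamp."""
    match = _PATTERN.search(path)
    if match is None:
        raise ValueError(f"unrecognised details path: {path!r}")
    return DetailFile(match["task"], int(match["shots"]), match["ts"])


def split_name(timestamp):
    """Inferred rule: the split name is the timestamp with '-' replaced by '_'."""
    return timestamp.replace("-", "_")


def config_name(task, n_shot):
    """Inferred rule: non-alphanumeric runs in the task name become '_'."""
    return "harness_" + re.sub(r"[^0-9A-Za-z]+", "_", task) + f"_{n_shot}"


info = parse_detail_path(
    "**/details_harness|arc:challenge|25_2023-11-13T15-44-18.785582.parquet"
)
print(config_name(info.task, info.n_shot), split_name(info.timestamp))
```

For the path above, the derived names match the `harness_arc_challenge_25` entry and its timestamp split in the listing.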
Latest Results
The following is a summary of the latest results:
{
  "all": {
    "acc": 0.6351832920969729, "acc_stderr": 0.03210898212657927,
    "acc_norm": 0.6445450507876114, "acc_norm_stderr": 0.03280393070910138,
    "mc1": 0.2839657282741738, "mc1_stderr": 0.015785370858396725,
    "mc2": 0.4274271734982197, "mc2_stderr": 0.014247308828610854,
    "em": 0.0019924496644295304, "em_stderr": 0.00045666764626669387,
    "f1": 0.06191694630872485, "f1_stderr": 0.0013823026381279647
  },
  "harness|arc:challenge|25": { "acc": 0.5742320819112628, "acc_stderr": 0.014449464278868807, "acc_norm": 0.6194539249146758, "acc_norm_stderr": 0.014188277712349814 },
  "harness|hellaswag|10": { "acc": 0.6357299342760406, "acc_stderr": 0.004802413919932666, "acc_norm": 0.8361880103565027, "acc_norm_stderr": 0.003693484894179416 },
  "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.29, "acc_stderr": 0.045604802157206845, "acc_norm": 0.29, "acc_norm_stderr": 0.045604802157206845 },
  "harness|hendrycksTest-anatomy|5": { "acc": 0.6444444444444445, "acc_stderr": 0.04135176749720385, "acc_norm": 0.6444444444444445, "acc_norm_stderr": 0.04135176749720385 },
  "harness|hendrycksTest-astronomy|5": { "acc": 0.6644736842105263, "acc_stderr": 0.03842498559395268, "acc_norm": 0.6644736842105263, "acc_norm_stderr": 0.03842498559395268 },
  "harness|hendrycksTest-business_ethics|5": { "acc": 0.58, "acc_stderr": 0.049604496374885836, "acc_norm": 0.58, "acc_norm_stderr": 0.049604496374885836 },
  "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.6792452830188679, "acc_stderr": 0.028727502957880267, "acc_norm": 0.6792452830188679, "acc_norm_stderr": 0.028727502957880267 },
  "harness|hendrycksTest-college_biology|5": { "acc": 0.7361111111111112, "acc_stderr": 0.03685651095897532, "acc_norm": 0.7361111111111112, "acc_norm_stderr": 0.03685651095897532 },
  "harness|hendrycksTest-college_chemistry|5": { "acc": 0.49, "acc_stderr": 0.05024183937956911, "acc_norm": 0.49, "acc_norm_stderr": 0.05024183937956911 },
  "harness|hendrycksTest-college_computer_science|5": { "acc": 0.52, "acc_stderr": 0.050211673156867795, "acc_norm": 0.52, "acc_norm_stderr": 0.050211673156867795 },
  "harness|hendrycksTest-college_mathematics|5": { "acc": 0.35, "acc_stderr": 0.047937248544110196, "acc_norm": 0.35, "acc_norm_stderr": 0.047937248544110196 },
  "harness|hendrycksTest-college_medicine|5": { "acc": 0.6358381502890174, "acc_stderr": 0.03669072477416907, "acc_norm": 0.6358381502890174, "acc_norm_stderr": 0.03669072477416907 },
  "harness|hendrycksTest-college_physics|5": { "acc": 0.38235294117647056, "acc_stderr": 0.04835503696107223, "acc_norm": 0.38235294117647056, "acc_norm_stderr": 0.04835503696107223 },
  "harness|hendrycksTest-computer_security|5": { "acc": 0.79, "acc_stderr": 0.04093601807403326, "acc_norm": 0.79, "acc_norm_stderr": 0.04093601807403326 },
  "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.5787234042553191, "acc_stderr": 0.03227834510146268, "acc_norm": 0.5787234042553191, "acc_norm_stderr": 0.03227834510146268 },
  "harness|hendrycksTest-econometrics|5": { "acc": 0.5087719298245614, "acc_stderr": 0.04702880432049615, "acc_norm": 0.5087719298245614, "acc_norm_stderr": 0.04702880432049615 },
  "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.5793103448275863, "acc_stderr": 0.0411391498118926, "acc_norm": 0.5793103448275863, "acc_norm_stderr": 0.0411391498118926 },
  "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.373015873015873, "acc_stderr": 0.02490699045899257, "acc_norm": 0.373015873015873, "acc_norm_stderr": 0.02490699045899257 },
  "harness|hendrycksTest-formal_logic|5": { "acc": 0.40476190476190477, "acc_stderr": 0.0
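To make the `acc` and `acc_stderr` pairs easier to read, the sketch below turns them into approximate 95% intervals and checks the reported standard error against the usual sample-variance formula sqrt(acc * (1 - acc) / (n - 1)). The three entries are copied from the summary above. The sample count of 100 questions per task is an assumption (the summary does not state it), and the helper names are hypothetical.

```python
import math

# Values copied from the results summary above.
results = {
    "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.29, "acc_stderr": 0.045604802157206845},
    "harness|hendrycksTest-business_ethics|5": {"acc": 0.58, "acc_stderr": 0.049604496374885836},
    "harness|hendrycksTest-college_chemistry|5": {"acc": 0.49, "acc_stderr": 0.05024183937956911},
}

# Assumption: each of these tasks has 100 test questions (not stated in the summary).
ASSUMED_N = 100


def binomial_stderr(acc, n):
    """Standard error of the mean of n 0/1 scores, using the (n - 1) sample variance."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def normal_interval(acc, stderr, z=1.96):
    """Approximate 95% interval under a normal approximation."""
    return acc - z * stderr, acc + z * stderr


for task, scores in results.items():
    low, high = normal_interval(scores["acc"], scores["acc_stderr"])
    recomputed = binomial_stderr(scores["acc"], ASSUMED_N)
    print(f"{task}: acc={scores['acc']:.2f} "
          f"interval=({low:.3f}, {high:.3f}) "
          f"recomputed_stderr={recomputed:.6f}")
```

With standard errors of roughly 0.05, differences of a few accuracy points between individual tasks of this size fall within the interval and should be read with caution.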



