open-llm-leaderboard-old/details_upstage__SOLAR-10.7B-Instruct-v1.0
Dataset Overview
Dataset Introduction
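This dataset was created automatically during the evaluation run of the model upstage/SOLAR-10.7B-Instruct-v1.0. It contains 63 configurations, each corresponding to one evaluation task. The dataset was created from 1 run. The detailed results of each run are stored as a dedicated split in each configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results.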
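Dataset Structure
The dataset contains multiple configurations, each corresponding to a different evaluation task. Some example configurations: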
harness_arc_challenge_25
harness_gsm8k_5
harness_hellaswag_10
harness_hendrycksTest_5
Each configuration contains multiple data files organized into splits (such as 2023_12_13T21_02_33.929144 and latest), and each split lists the paths to its data files.
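The split name 2023_12_13T21_02_33.929144 appears to be the run timestamp 2023-12-13T21:02:33.929144 with the dashes and colons replaced by underscores. A minimal sketch of that conversion follows. The function names are hypothetical, and the replacement rule is inferred from this single example rather than taken from the leaderboard's tooling.

```python
from datetime import datetime


def run_timestamp_to_split(timestamp: str) -> str:
    """Turn an ISO-style run timestamp into a split name.

    Assumption: the split name is the timestamp with '-' and ':'
    replaced by '_', as the one example in this card suggests.
    """
    return timestamp.replace("-", "_").replace(":", "_")


def split_to_datetime(split_name: str) -> datetime:
    """Parse a timestamped split name back into a datetime object."""
    return datetime.strptime(split_name, "%Y_%m_%dT%H_%M_%S.%f")


split = run_timestamp_to_split("2023-12-13T21:02:33.929144")
parsed = split_to_datetime(split)
```

Parsing the split name back into a `datetime` can help when sorting several runs chronologically, although this dataset contains only one run.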
Data Loading Example
The following example code loads the dataset:
    from datasets import load_dataset

    data = load_dataset(
        "open-llm-leaderboard/details_upstage__SOLAR-10.7B-Instruct-v1.0",
        "harness_winogrande_5",
        split="train",
    )
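A small wrapper can make the choice of configuration and split explicit. The sketch below is a convenience under stated assumptions, not part of the dataset's tooling: `load_details`, `REPO_ID`, and the `loader` parameter are hypothetical names, and the default split "latest" is taken from the split names mentioned in the structure section. The `loader` argument exists only so the function can be exercised without network access.

```python
from typing import Any, Callable, Optional

# Repository id as used in the loading example of this card.
REPO_ID = "open-llm-leaderboard/details_upstage__SOLAR-10.7B-Instruct-v1.0"


def load_details(
    config: str,
    split: str = "latest",
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Load one evaluation configuration of the details dataset.

    config: a configuration name such as "harness_winogrande_5".
    split:  "latest" or a timestamped split name (assumed to exist).
    loader: hypothetical injection point; defaults to datasets.load_dataset.
    """
    if loader is None:
        # Imported lazily so the module can be imported without `datasets`.
        from datasets import load_dataset

        loader = load_dataset
    return loader(REPO_ID, config, split=split)
```

Calling `load_details("harness_winogrande_5")` would then request the "latest" split, while `load_details("harness_winogrande_5", split="2023_12_13T21_02_33.929144")` would pin the request to the single recorded run.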
Latest Results
The following is an excerpt of the results from the latest run, 2023-12-13T21:02:33.929144:
    {
        "all": {
            "acc": 0.6657586984797939,
            "acc_stderr": 0.03165995758526614,
            "acc_norm": 0.6666511531376961,
            "acc_norm_stderr": 0.0323050384069596,
            "mc1": 0.5667074663402693,
            "mc1_stderr": 0.017347024450107485,
            "mc2": 0.7142943510205136,
            "mc2_stderr": 0.015024530295000761
        },
        "harness|arc:challenge|25": {
            "acc": 0.6808873720136519,
            "acc_stderr": 0.013621696119173307,
            "acc_norm": 0.7107508532423208,
            "acc_norm_stderr": 0.01325001257939344
        },
        "harness|hellaswag|10": {
            "acc": 0.7070304720175263,
            "acc_stderr": 0.004541944342035901,
            "acc_norm": 0.8815972913762199,
            "acc_norm_stderr": 0.003224240722351317
        },
        "harness|hendrycksTest-abstract_algebra|5": {
            "acc": 0.41,
            "acc_stderr": 0.049431107042371025,
            "acc_norm": 0.41,
            "acc_norm_stderr": 0.049431107042371025
        },
        "harness|hendrycksTest-anatomy|5": {
            "acc": 0.6148148148148148,
            "acc_stderr": 0.04203921040156279,
            "acc_norm": 0.6148148148148148,
            "acc_norm_stderr": 0.04203921040156279
        },
        "harness|hendrycksTest-astronomy|5": {
            "acc": 0.7368421052631579,
            "acc_stderr": 0.03583496176361072,
            "acc_norm": 0.7368421052631579,
            "acc_norm_stderr": 0.03583496176361072
        },
        "harness|hendrycksTest-business_ethics|5": {
            "acc": 0.74,
            "acc_stderr": 0.0440844002276808,
            "acc_norm": 0.74,
            "acc_norm_stderr": 0.0440844002276808
        },
        "harness|hendrycksTest-clinical_knowledge|5": {
            "acc": 0.6792452830188679,
            "acc_stderr": 0.02872750295788027,
            "acc_norm": 0.6792452830188679,
            "acc_norm_stderr": 0.02872750295788027
        },
        "harness|hendrycksTest-college_biology|5": {
            "acc": 0.7638888888888888,
            "acc_stderr": 0.03551446610810826,
            "acc_norm": 0.7638888888888888,
            "acc_norm_stderr": 0.03551446610810826
        },
        "harness|hendrycksTest-college_chemistry|5": {
            "acc": 0.44,
            "acc_stderr": 0.04988876515698589,
            "acc_norm": 0.44,
            "acc_norm_stderr": 0.04988876515698589
        },
        "harness|hendrycksTest-college_computer_science|5": {
            "acc": 0.52,
            "acc_stderr": 0.05021167315686779,
            "acc_norm": 0.52,
            "acc_norm_stderr": 0.05021167315686779
        },
        "harness|hendrycksTest-college_mathematics|5": {
            "acc": 0.31,
            "acc_stderr": 0.04648231987117316,
            "acc_norm": 0.31,
            "acc_norm_stderr": 0.04648231987117316
        },
        "harness|hendrycksTest-college_medicine|5": {
            "acc": 0.6647398843930635,
            "acc_stderr": 0.03599586301247077,
            "acc_norm": 0.6647398843930635,
            "acc_norm_stderr": 0.03599586301247077
        },
        "harness|hendrycksTest-college_physics|5": {
            "acc": 0.38235294117647056,
            "acc_stderr": 0.04835503696107223,
            "acc_norm": 0.38235294117647056,
            "acc_norm_stderr": 0.04835503696107223
        },
        "harness|hendrycksTest-computer_security|5": {
            "acc": 0.76,
            "acc_stderr": 0.042923469599092816,
            "acc_norm": 0.76,
            "acc_norm_stderr": 0.042923469599092816
        },
        "harness|hendrycksTest-conceptual_physics|5": {
            "acc": 0.6297872340425532,
            "acc_stderr": 0.03156564682236785,
            "acc_norm": 0.6297872340425532,
            "acc_norm_stderr": 0.03156564682236785
        },
        "harness|hendrycksTest-econometrics|5": {
            "acc": 0.5,
            "acc_stderr": 0.047036043419179864,
            "acc_norm": 0.5,
            "acc_norm_stderr": 0.047036043419179864
        },
        "harness|hendrycksTest-electrical_engineering|5": {
            "acc": 0.6413793103448275,
            "acc_stderr": 0.039966295748767186,
            "acc_norm": 0.6413793103448275,
            "acc_norm_stderr": 0.039966295748767186
        },
        "harness|hendrycksTest-elementary_mathematics|5": {
            "acc": 0.47883597883597884,
            "acc_stderr": 0.025728230952130726,
            "acc_norm": 0.47883597883597884,
            "acc_norm_stderr": 0.025728230952130726
        },
        "harness|hendrycksTest-formal_logic|5": {
            "acc": 0.4444444444444444,
            "acc_stderr": 0.044444444444444495,
            "acc_norm": 0.4444444444444444,
            "acc_norm_stderr": 0.044444444444444495
        },
        "harness|hendrycksTest-global_facts|5": {
            "acc": 0.36,
            "acc_stderr": 0.048241815132442176,
            "acc_norm": 0.36,
            "acc_norm_stderr": 0.048241815132442176
        },
        "harness|hendrycksTest-high_school_biology|5": {
            "acc": 0.8032258064516129,
            "acc_stderr": 0.022616409420742025,
            "acc_norm": 0.8032258064516129,
            "acc_norm_stderr": 0.022616409420742025
        },
        "harness|hendrycksTest-high_school_chemistry|5": {
            "acc": 0.5172413793103449,
            "acc_stderr": 0.03515895551165698,
            "acc_norm": 0.5172413793103449,
            "acc_norm_stderr": 0.03515895551165698
        },
        "harness|hendrycksTest-high_school_computer_science|5": {
            "acc": 0.72,
            "acc_stderr": 0.04512608598542128,
            "acc_norm": 0.72,
            "acc_norm_stderr": 0.04512608598542128
        },
        "harness|hendrycksTest-high_school_european_history|5": {
            "acc": 0.8,
            "acc_stderr": 0.031234752377721175,
            "acc_norm": 0.8,
            "acc_norm_stderr": 0.031234752377721175
        },
        "harness|hendrycksTest-high_school_geography|5": {
            "acc": 0.8737373737373737,
            "acc_stderr": 0.02366435940288023,
            "acc_norm": 0.8737373737373737,
            "acc_norm_stderr": 0.02366435940288023
        },
        "harness|hendrycksTest-high_school_government_and_politics|5": {
            "acc": 0.9067357512953368,
            "acc_stderr": 0.02098685459328973,
            "acc_norm": 0.9067357512953368,
            "acc_norm_stderr": 0.02098685459328973
        },
        "harness|hendrycksTest-high_school_macroeconomics|5": {
            "acc": 0.6615384615384615,
            "acc_stderr": 0.023991500500313036,
            "acc_norm": 0.6615384615384615,
            "acc_norm_stderr": 0.023991500500313036
        },
        "harness|hendrycksTest-high_school_mathematics|5": {
            "acc": 0.3814814814814815,
            "acc_stderr": 0.029616718927497593,
            "acc_norm": 0.3814814814814815,
            "acc_norm_stderr": 0.029616718927497593
        },
        "harness|hendrycksTest-high_school_microeconomics|5": {
            "acc": 0.7184873949579832,
            "acc_stderr": 0.
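Each task entry pairs a score with its standard error, so the two can be combined into an approximate interval, and entries can be averaged by key prefix. The sketch below is illustrative only. It copies four entries from the excerpt above, assumes a normal approximation for the 95% interval (z = 1.96), and uses an unweighted mean. The helper names `confidence_interval` and `mean_metric` are hypothetical, and nothing here is claimed to reproduce how the leaderboard aggregates the "all" entry.

```python
from typing import Dict, Tuple

# A few entries copied from the results excerpt (acc_norm fields only).
results: Dict[str, Dict[str, float]] = {
    "harness|arc:challenge|25": {
        "acc_norm": 0.7107508532423208,
        "acc_norm_stderr": 0.01325001257939344,
    },
    "harness|hellaswag|10": {
        "acc_norm": 0.8815972913762199,
        "acc_norm_stderr": 0.003224240722351317,
    },
    "harness|hendrycksTest-abstract_algebra|5": {
        "acc_norm": 0.41,
        "acc_norm_stderr": 0.049431107042371025,
    },
    "harness|hendrycksTest-anatomy|5": {
        "acc_norm": 0.6148148148148148,
        "acc_norm_stderr": 0.04203921040156279,
    },
}


def confidence_interval(value: float, stderr: float, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation interval, clipped to the valid range [0, 1]."""
    half_width = z * stderr
    return max(0.0, value - half_width), min(1.0, value + half_width)


def mean_metric(data: Dict[str, Dict[str, float]], prefix: str, metric: str = "acc_norm") -> float:
    """Unweighted mean of `metric` over all task keys starting with `prefix`."""
    values = [scores[metric] for key, scores in data.items() if key.startswith(prefix)]
    if not values:
        raise ValueError(f"no task key starts with {prefix!r}")
    return sum(values) / len(values)


arc = results["harness|arc:challenge|25"]
arc_low, arc_high = confidence_interval(arc["acc_norm"], arc["acc_norm_stderr"])
mmlu_subset_mean = mean_metric(results, "harness|hendrycksTest-")
```

Because only two hendrycksTest subtasks are included here, `mmlu_subset_mean` describes this small subset and not the full benchmark.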



