open-llm-leaderboard/details_facebook__galactica-30b
Dataset Overview
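Dataset Summary
This dataset was created automatically during the evaluation run of the model facebook/galactica-30b on the Open LLM Leaderboard. It contains 119 configurations, each corresponding to one evaluation task.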
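Dataset Creation
The dataset was created from 2 runs. Each run appears as a separate split within each configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results.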
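The idea that the newest run is the one exposed as the latest result can be illustrated with a small sketch. It assumes run timestamps are ISO 8601 strings like the one reported in this card; the actual split names on the Hub may be formatted differently, and the earlier timestamp below is hypothetical.

```python
from datetime import datetime


def latest_run(timestamps):
    """Return the most recent timestamp from a list of ISO 8601 strings."""
    return max(timestamps, key=datetime.fromisoformat)


runs = [
    "2023-08-01T00:00:00.000000",  # hypothetical earlier run, for illustration only
    "2023-08-28T21:10:05.029353",  # latest run reported in this card
]

latest = latest_run(runs)
print(latest)
```

Parsing with `datetime.fromisoformat` rather than comparing raw strings keeps the comparison correct even if the strings differ in precision.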
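Results Configuration
An additional configuration, "results", stores the aggregated results of all runs. These are used to compute and display the aggregated metrics on the Open LLM Leaderboard.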
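As a rough illustration of what an aggregated metric could look like, the sketch below takes an unweighted mean over per-task accuracies. The unweighted mean is an assumption, not something this card confirms, and only three of the reported task scores are used, so the printed value is not the leaderboard aggregate.

```python
from statistics import mean

# Three per-task accuracies copied from the latest results in this card.
per_task_acc = {
    "original|mmlu:abstract_algebra|5": 0.25,
    "original|mmlu:anatomy|5": 0.5407407407407407,
    "original|mmlu:astronomy|5": 0.506578947368421,
}


def macro_average(scores):
    """Unweighted mean over tasks (assumed aggregation, not confirmed by the card)."""
    return mean(list(scores.values()))


subset_avg = macro_average(per_task_acc)
print(f"{subset_avg:.4f}")
```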
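Example: Loading the Data
```python
from datasets import load_dataset

# Load the details for one evaluation task; "train" points to the latest run.
data = load_dataset(
    "open-llm-leaderboard/details_facebook__galactica-30b",
    "original_mmlu_world_religions_5",
    split="train",
)
```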
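Configuration names such as "original_mmlu_world_religions_5" appear to mirror the task keys used in the results, for example "original|mmlu:abstract_algebra|5". The helper below is a hypothetical sketch of that mapping. It assumes that "|" and ":" are simply replaced by "_", which is inferred from comparing the two naming patterns and is not stated in this card.

```python
def config_name_from_task_key(task_key):
    """Map a result key such as 'original|mmlu:anatomy|5' to a config name.

    Assumption: the config name is the task key with '|' and ':' replaced by '_'.
    """
    return task_key.replace("|", "_").replace(":", "_")


name = config_name_from_task_key("original|mmlu:abstract_algebra|5")
print(name)  # original_mmlu_abstract_algebra_5
```

If the assumption holds, the returned string can be passed as the second argument to `load_dataset` to fetch the details for that task.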
Latest Results
These are the latest results, from the run of 2023-08-28T21:10:05.029353:
```python
{
    "all": {"acc": 0.4666487872974609, "acc_stderr": 0.036447127375734134},
    "original|mmlu:abstract_algebra|5": {"acc": 0.25, "acc_stderr": 0.04351941398892446},
    "original|mmlu:anatomy|5": {"acc": 0.5407407407407407, "acc_stderr": 0.04304979692464242},
    "original|mmlu:astronomy|5": {"acc": 0.506578947368421, "acc_stderr": 0.040685900502249704},
    "original|mmlu:business_ethics|5": {"acc": 0.46, "acc_stderr": 0.05009082659620332},
    "original|mmlu:clinical_knowledge|5": {"acc": 0.5471698113207547, "acc_stderr": 0.030635627957961823},
    "original|mmlu:college_biology|5": {"acc": 0.5694444444444444, "acc_stderr": 0.04140685639111502},
    "original|mmlu:college_chemistry|5": {"acc": 0.32, "acc_stderr": 0.04688261722621504},
    "original|mmlu:college_computer_science|5": {"acc": 0.41, "acc_stderr": 0.049431107042371025},
    "original|mmlu:college_mathematics|5": {"acc": 0.34, "acc_stderr": 0.04760952285695235},
    "original|mmlu:college_medicine|5": {"acc": 0.5028901734104047, "acc_stderr": 0.03812400565974834},
    "original|mmlu:college_physics|5": {"acc": 0.3333333333333333, "acc_stderr": 0.04690650298201943},
    "original|mmlu:computer_security|5": {"acc": 0.65, "acc_stderr": 0.0479372485441102},
    "original|mmlu:conceptual_physics|5": {"acc": 0.4765957446808511, "acc_stderr": 0.03265019475033581},
    "original|mmlu:econometrics|5": {"acc": 0.3684210526315789, "acc_stderr": 0.04537815354939391},
    "original|mmlu:electrical_engineering|5": {"acc": 0.5862068965517241, "acc_stderr": 0.04104269211806232},
    "original|mmlu:elementary_mathematics|5": {"acc": 0.31216931216931215, "acc_stderr": 0.023865206836972585},
    "original|mmlu:formal_logic|5": {"acc": 0.2698412698412698, "acc_stderr": 0.03970158273235172},
    "original|mmlu:global_facts|5": {"acc": 0.31, "acc_stderr": 0.04648231987117316},
    "original|mmlu:high_school_biology|5": {"acc": 0.5548387096774193, "acc_stderr": 0.028272410186214906},
    "original|mmlu:high_school_chemistry|5": {"acc": 0.39408866995073893, "acc_stderr": 0.034381579670365446},
    "original|mmlu:high_school_computer_science|5": {"acc": 0.48, "acc_stderr": 0.05021167315686781},
    "original|mmlu:high_school_european_history|5": {"acc": 0.5818181818181818, "acc_stderr": 0.03851716319398393},
    "original|mmlu:high_school_geography|5": {"acc": 0.5353535353535354, "acc_stderr": 0.03553436368828063},
    "original|mmlu:high_school_government_and_politics|5": {"acc": 0.5595854922279793, "acc_stderr": 0.03582724530036093},
    "original|mmlu:high_school_macroeconomics|5": {"acc": 0.4230769230769231, "acc_stderr": 0.025049197876042338},
    "original|mmlu:high_school_mathematics|5": {"acc": 0.2518518518518518, "acc_stderr": 0.026466117538959905},
    "original|mmlu:high_school_microeconomics|5": {"acc": 0.4579831932773109, "acc_stderr": 0.03236361111951941},
    "original|mmlu:high_school_physics|5": {"acc": 0.3576158940397351, "acc_stderr": 0.03913453431177258},
    "original|mmlu:high_school_psychology|5": {"acc": 0.6293577981651376, "acc_stderr": 0.02070745816435298},
    "original|mmlu:high_school_statistics|5": {"acc": 0.33796296296296297, "acc_stderr": 0.03225941352631295},
    "original|mmlu:high_school_us_history|5": {"acc": 0.4411764705882353, "acc_stderr": 0.03484941514429231},
    "original|mmlu:high_school_world_history|5": {"acc": 0.6033755274261603, "acc_stderr": 0.03184399873811225},
    "original|mmlu:human_aging|5": {"acc": 0.5515695067264574, "acc_stderr": 0.033378837362550984},
    "original|mmlu:human_sexuality|5": {"acc": 0.5801526717557252, "acc_stderr": 0.043285772152629715},
    "original|mmlu:international_law|5": {"acc": 0.6528925619834711, "acc_stderr": 0.04345724570292534},
    "original|mmlu:jurisprudence|5": {"acc": 0.5185185185185185, "acc_stderr": 0.04830366024635331},
    "original|mmlu:logical_fallacies|5": {"acc": 0.50920245398773, "acc_stderr": 0.03927705600787443},
    "original|mmlu:machine_learning|5": {"acc": 0.33035714285714285, "acc_stderr": 0.04464285714285712},
    "original|mmlu:management|5": {"acc": 0.6019417475728155, "acc_stderr": 0.048467482539772386},
    "original|mmlu:marketing|5": {"acc": 0.6965811965811965, "acc_stderr": 0.03011821010694266},
    "original|mmlu:medical_genetics|5": {"acc": 0.58, "acc_stderr": 0.049604496374885836},
    "original|mmlu:miscellaneous|5": {"acc": 0.4942528735632184, "acc_stderr": 0.01787878232612923},
    "original|mmlu:moral_disputes|5": {"acc": 0.4479768786127168, "acc_stderr": 0.026772990653361826},
    "original|mmlu:moral_scenarios|5": {"acc": 0.2435754189944134, "acc_stderr": 0.014355911964767864},
    "original|mmlu:nutrition|5": {"acc": 0.5196078431372549, "acc_stderr": 0.028607893699576073},
    "original|mmlu:philosophy|5": {"acc": 0.48231511254019294, "acc_stderr": 0.02838032284907713},
    "original|mmlu:prehistory|5": {"acc": 0.5401234567901234, "acc_stderr": 0.027731022753539277},
    "original|mmlu:professional_accounting|5": {"acc": 0.35815602836879434, "acc_stderr": 0.028602085862759422},
    "original|mmlu:professional_law|5": {"acc": 0.34028683181225555, "acc_stderr": 0.012101217610223794},
    "original|mmlu:professional_medicine|5": {"acc": 0.44485294117647056, "acc_stderr": 0.03018753206032939},
    "original|mmlu:professional_psychology|5": {"acc": 0.5130718954248366, "acc_stderr": 0.020220920829626916},
    "original|mmlu:public_relations|5": {"acc": 0.5272727272727272, "acc_stderr": 0.04782001791380061},
    "original|
```

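To help read the "acc_stderr" values, the sketch below shows one way a standard error of this kind could arise and how it might be turned into an approximate 95% confidence interval. Both function names are hypothetical. The formula (sample standard error of a mean of 0/1 scores), the normal approximation, and the task size n = 100 are all assumptions and are not stated in this card.

```python
import math


def binomial_stderr(acc, n):
    """Sample standard error of the mean of n 0/1 scores whose mean is acc."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def normal_ci(acc, stderr, z=1.96):
    """Approximate 95% confidence interval under a normal approximation."""
    return acc - z * stderr, acc + z * stderr


# Reported values for "original|mmlu:abstract_algebra|5" in this card.
acc, reported_stderr = 0.25, 0.04351941398892446

# n = 100 is an assumption about the task size, not stated in this card.
estimated = binomial_stderr(acc, n=100)
low, high = normal_ci(acc, reported_stderr)

print(f"estimated stderr: {estimated:.6f}")
print(f"approximate 95% CI: [{low:.3f}, {high:.3f}]")
```

Under these assumptions the estimate agrees closely with the reported standard error, and the resulting interval is wide enough that small accuracy differences between tasks of this size should be read with caution.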


