
open-llm-leaderboard-old/details_aqweteddy__llama_chat-tv_en_luban-tv_stable_platypus2

Hugging Face · Updated 2023-09-14 · Indexed 2024-06-22
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_aqweteddy__llama_chat-tv_en_luban-tv_stable_platypus2
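Resource summary:
This dataset was created automatically during the evaluation run of the model aqweteddy/llama_chat-tv_en_luban-tv_stable_platypus2 on the Open LLM Leaderboard. It contains 61 configurations, each corresponding to one evaluation task. The dataset was created from 1 run. Each run appears as a specific split in each configuration, and the split is named after the run's timestamp. The train split always points to the latest results. In addition, a configuration named results stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.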

Provider:
open-llm-leaderboard-old
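Summary of original information

Dataset overview

This dataset was created automatically during the evaluation run of the model aqweteddy/llama_chat-tv_en_luban-tv_stable_platypus2. It contains 61 configurations, each corresponding to one evaluation task. The dataset was created from 1 run. Each run appears as a specific split in each configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results.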
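The configuration and split layout described above can be read with the Hugging Face `datasets` library. The following is a minimal sketch, not the card's own example: the repository ID is taken from the download link, `harness_arc_challenge_25` is one of the configurations listed below, and the helper name `load_details` is hypothetical. The overview mentions a "train" split while the configuration listing shows a split named `latest`, so the default split used here is an assumption and may need adjusting.

```python
# Repository ID as it appears in the download link for this dataset.
REPO_ID = (
    "open-llm-leaderboard-old/"
    "details_aqweteddy__llama_chat-tv_en_luban-tv_stable_platypus2"
)


def load_details(config_name: str = "harness_arc_challenge_25",
                 split: str = "latest"):
    """Load the per-sample details for one evaluation task.

    `split` defaults to "latest" because that is the split name shown in
    the configuration listing (assumption: the overview's "train" split
    may be the name that actually exists on the Hub).
    """
    # Imported lazily so this module can be read without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)
```

Calling `load_details()` needs network access to the Hugging Face Hub (or a mirror), and passing the timestamped split name instead of `latest` would pin the call to a specific run.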

Dataset structure

Configurations

  • config_name: harness_arc_challenge_25

    • split: 2023_09_14T19_14_45.418998
      • path: **/details_harness|arc:challenge|25_2023-09-14T19-14-45.418998.parquet
    • split: latest
      • path: **/details_harness|arc:challenge|25_2023-09-14T19-14-45.418998.parquet
  • config_name: harness_hellaswag_10

    • split: 2023_09_14T19_14_45.418998
      • path: **/details_harness|hellaswag|10_2023-09-14T19-14-45.418998.parquet
    • split: latest
      • path: **/details_harness|hellaswag|10_2023-09-14T19-14-45.418998.parquet
  • config_name: harness_hendrycksTest_5

    • split: 2023_09_14T19_14_45.418998
      • path:
        • **/details_harness|hendrycksTest-abstract_algebra|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-anatomy|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-astronomy|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-business_ethics|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-clinical_knowledge|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-college_biology|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-college_chemistry|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-college_computer_science|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-college_mathematics|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-college_medicine|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-college_physics|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-computer_security|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-conceptual_physics|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-econometrics|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-electrical_engineering|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-elementary_mathematics|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-formal_logic|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-global_facts|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-high_school_biology|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-high_school_chemistry|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-high_school_computer_science|5_2023-09-14T19-14-45.418998.parquet
        • **/details_harness|hendrycksTest-high_school_european_history|5_2023-09-14T19-14-45.418998.parquet
        • ...
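
The listing above suggests a naming pattern that links a task key, the run timestamp, the configuration name, the split name, and the parquet file name. The sketch below reproduces that pattern for the two single-task configurations shown (ARC and HellaSwag). It is inferred from those examples only; the function names are hypothetical, and the pattern is an assumption for any configuration not listed here.

```python
import re


def config_name(task_key: str) -> str:
    """Turn a task key such as 'harness|arc:challenge|25' into a config name.

    Assumption: '|' and ':' are replaced by '_', as in the listed examples.
    """
    return re.sub(r"[|:]", "_", task_key)


def split_name(run_timestamp: str) -> str:
    """Turn a run timestamp into a split name ('-' and ':' become '_')."""
    return re.sub(r"[-:]", "_", run_timestamp)


def details_file(task_key: str, run_timestamp: str) -> str:
    """Build the parquet file name; only ':' in the timestamp becomes '-'."""
    return f"details_{task_key}_{run_timestamp.replace(':', '-')}.parquet"


if __name__ == "__main__":
    run = "2023-09-14T19:14:45.418998"
    print(config_name("harness|arc:challenge|25"))
    print(split_name(run))
    print(details_file("harness|arc:challenge|25", run))
```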

Latest results

These are the latest results, from run 2023-09-14T19:14:45.418998

{
  "all": { "acc": 0.4945007463965186, "acc_stderr": 0.03527731102256181, "acc_norm": 0.49711825981073476, "acc_norm_stderr": 0.035276035393914426, "mc1": 0.32558139534883723, "mc1_stderr": 0.016403989469907825, "mc2": 0.5188093512935639, "mc2_stderr": 0.016351300657386426 },
  "harness|arc:challenge|25": { "acc": 0.4351535836177474, "acc_stderr": 0.014487986197186043, "acc_norm": 0.4453924914675768, "acc_norm_stderr": 0.01452398763834409 },
  "harness|hellaswag|10": { "acc": 0.4660426209918343, "acc_stderr": 0.004978260641742204, "acc_norm": 0.6102370045807608, "acc_norm_stderr": 0.004866997110388195 },
  "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.29, "acc_stderr": 0.045604802157206845, "acc_norm": 0.29, "acc_norm_stderr": 0.045604802157206845 },
  "harness|hendrycksTest-anatomy|5": { "acc": 0.4740740740740741, "acc_stderr": 0.04313531696750574, "acc_norm": 0.4740740740740741, "acc_norm_stderr": 0.04313531696750574 },
  "harness|hendrycksTest-astronomy|5": { "acc": 0.47368421052631576, "acc_stderr": 0.04063302731486671, "acc_norm": 0.47368421052631576, "acc_norm_stderr": 0.04063302731486671 },
  "harness|hendrycksTest-business_ethics|5": { "acc": 0.53, "acc_stderr": 0.05016135580465919, "acc_norm": 0.53, "acc_norm_stderr": 0.05016135580465919 },
  "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.5320754716981132, "acc_stderr": 0.030709486992556538, "acc_norm": 0.5320754716981132, "acc_norm_stderr": 0.030709486992556538 },
  "harness|hendrycksTest-college_biology|5": { "acc": 0.5277777777777778, "acc_stderr": 0.04174752578923185, "acc_norm": 0.5277777777777778, "acc_norm_stderr": 0.04174752578923185 },
  "harness|hendrycksTest-college_chemistry|5": { "acc": 0.33, "acc_stderr": 0.047258156262526045, "acc_norm": 0.33, "acc_norm_stderr": 0.047258156262526045 },
  "harness|hendrycksTest-college_computer_science|5": { "acc": 0.43, "acc_stderr": 0.049756985195624284, "acc_norm": 0.43, "acc_norm_stderr": 0.049756985195624284 },
  "harness|hendrycksTest-college_mathematics|5": { "acc": 0.29, "acc_stderr": 0.04560480215720684, "acc_norm": 0.29, "acc_norm_stderr": 0.04560480215720684 },
  "harness|hendrycksTest-college_medicine|5": { "acc": 0.4797687861271676, "acc_stderr": 0.03809342081273957, "acc_norm": 0.4797687861271676, "acc_norm_stderr": 0.03809342081273957 },
  "harness|hendrycksTest-college_physics|5": { "acc": 0.2647058823529412, "acc_stderr": 0.04389869956808778, "acc_norm": 0.2647058823529412, "acc_norm_stderr": 0.04389869956808778 },
  "harness|hendrycksTest-computer_security|5": { "acc": 0.67, "acc_stderr": 0.04725815626252609, "acc_norm": 0.67, "acc_norm_stderr": 0.04725815626252609 },
  "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.425531914893617, "acc_stderr": 0.032321469162244695, "acc_norm": 0.425531914893617, "acc_norm_stderr": 0.032321469162244695 },
  "harness|hendrycksTest-econometrics|5": { "acc": 0.3333333333333333, "acc_stderr": 0.044346007015849245, "acc_norm": 0.3333333333333333, "acc_norm_stderr": 0.044346007015849245 },
  "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.4206896551724138, "acc_stderr": 0.0411391498118926, "acc_norm": 0.4206896551724138, "acc_norm_stderr": 0.0411391498118926 },
  "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.2857142857142857, "acc_stderr": 0.02326651221373057, "acc_norm": 0.2857142857142857, "acc_norm_stderr": 0.02326651221373057 },
  "harness|hendrycksTest-formal_logic|5": { "acc": 0.3412698412698413, "acc_stderr": 0.04240799327574924, "acc_norm": 0.3412698412698413, "acc_norm_stderr": 0.04240799327574924 },
  "harness|hendrycksTest-global_facts|5": { "acc": 0.36, "acc_stderr": 0.04824181513244218, "acc_norm": 0.36, "acc_norm_stderr": 0.04824181513244218 },
  "harness|hendrycksTest-high_school_biology|5": { "acc": 0.535483870967742, "acc_stderr": 0.028372287797962935, "acc_norm": 0.535483870967742, "acc_norm_stderr": 0.028372287797962935 },
  "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.3842364532019704, "acc_stderr": 0.03422398565657551, "acc_norm": 0.3842364532019704, "acc_norm_stderr": 0.03422398565657551 },
  "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.56, "acc_stderr": 0.049888765156985884, "acc_norm": 0.56, "acc_norm_stderr": 0.049888765156985884 },
  "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.6363636363636364, "acc_stderr": 0.03756335775187896, "acc_norm": 0.6363636363636364, "acc_norm_stderr": 0.03756335775187896 },
  "harness|hendrycksTest-high_school_geography|5": { "acc": 0.6515151515151515, "acc_stderr": 0.033948539651564025, "acc_norm": 0.6515151515151515, "acc_norm_stderr": 0.033948539651564025 },
  "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.7461139896373057, "acc_stderr": 0.0314102478056532, "acc_norm": 0.7461139896373057, "acc_norm_stderr": 0.0314102478056532 },
  "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.5153846153846153, "acc_stderr": 0.025339003010106515, "acc_norm": 0.5153846153846153, "acc_norm_stderr": 0.025339003010106515 },
  "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.2851851851851852, "acc_stderr": 0.02752859921034049, "acc_norm": 0.2851851851851852, "acc_norm_stderr": 0.02752859921034049 },
  "harness|hendrycksTest-high_school_microeconomics|5": { "acc": 0.47058823529411764, "acc_stderr": 0.03242225027115006, "acc_norm": 0.47058823529411764, "acc_norm_stderr": 0.03242225027115006 },
  "harness|hendrycksTest-high_school_physics|5": { "acc": 0.32450331125827814, "acc_stderr": 0.038227469376587525, "acc_norm": 0.32450331125827814, "acc_norm_stderr": 0.038227469376587525 },
  "harness|hendrycksTest-high_school_psychology|5": { "acc": 0.6495412844036698, "acc_stderr": 0.020456077599824467, "acc_norm": 0.6495412844036698, "acc_norm_stderr": 0.020456077599824467 },
  "harness|hendrycksTest-high_school_statistics|5": { "acc": 0.28703703703703703, "acc_stderr": 0.030851992993257013, "acc_norm": 0.28703703703703703, "acc_norm_stderr": 0.030851992993257013 },
  "harness|hendrycksTest-high_school_us_history|5": { "acc": 0.5784313725490197, "acc_stderr": 0.03465868196380762, "acc_norm": 0.5784313725490197, "acc_norm_stderr": 0.03465868196380762 },
  "harness|hendrycksTest-high_school_world_history|5": { "acc": 0.6582278481012658, "acc_stderr": 0.030874537537553617, "acc_norm": 0.6582278481012658, "acc_norm_stderr": 0.030874537537553617 },
  "harness|hendrycksTest-human_aging|5": { "acc": 0.5739910313901345, "acc_stderr": 0.033188332862172806, "acc_norm": 0.5739910313901345, "acc_norm_stderr": 0.033188332862172806 },
  "harness|hendrycksTest-human_sexuality|5": { "acc": 0.5572519083969466, "acc_stderr": 0.0435644720266507, "acc_norm": 0.5572519083969466, "acc_norm_stderr": 0.0435644720266507 },
  "harness|hendrycksTest-international_law|5": { "acc": 0.6033057851239669, "acc_stderr": 0.04465869780531009, "acc_norm": 0.6033057851239669, "acc_norm_stderr": 0.04465869780531009 },
  "harness|hendrycksTest-jurisprudence|5": { "acc": 0.5648148148148148, "acc_stderr": 0.04792898170907061, "acc_norm": 0.5648148148148148, "acc_norm_stderr": 0.04792898170907061 },
  "harness|hendrycksTest-logical_fallacies|5": { "acc": 0.5705521472392638, "acc_stderr": 0.03889066619112723, "acc_norm": 0.5705521472392638, "acc_norm_stderr": 0.03889066619112723 },
  "harness|hendrycksTest-machine_learning|5": { "acc": 0.45535714285714285, "acc_stderr": 0.047268355537191, "acc_norm": 0.45535714285714285, "acc_norm_stderr": 0.047268355537191 },
  "harness|hendrycksTest-management|5": { "acc": 0.6699029126213593, "acc_stderr": 0.0465614711001235, "acc_norm": 0.6699029126213593, "acc_norm_stderr": 0.0465614711001235 },
  "harness|hendrycksTest-marketing|5": { "acc": 0.7863247863247863, "acc_stderr": 0.026853450377009157, "acc_norm": 0.7863247863247863, "acc_norm_stderr": 0.026853450377009157 },
  "harness|hendrycksTest-medical_genetics|5": { "acc": 0.53, "acc_stderr": 0.05016135580465919, "acc_norm": 0.53, "acc_norm_stderr": 0.05016135580465919 },
  "harness|hendrycksTest-miscellaneous|5": { "acc": 0.6730523627075351, "acc_stderr": 0.016774908180131467, "acc_norm": 0.6730523627075351, "acc_norm_stderr": 0.016774908180131467 },
  "harness|hendrycksTest-moral_disputes|5": { "acc": 0.5144508670520231, "acc_stderr": 0.02690784985628254, "acc_norm": 0.5144508670520231, "acc_norm_stderr": 0.02690784985628254 },
  "harness|hendrycksTest-moral_scenarios|5": { "acc": 0.41564245810055866, "acc_stderr": 0.016482782187500662, "acc_norm": 0.41564245810055866, "acc_norm_stderr": 0.016482782187500662 },
  "harness|hendrycksTest-nutrition|5": { "acc": 0.5163398692810458, "acc_stderr": 0.028614624752805427, "acc_norm": 0.5163398692810458, "acc_norm_stderr": 0.028614624752805427 },
  "harness|hendrycksTest-philosophy|5": { "acc": 0.5209003215434084, "acc_stderr": 0.02837327096106942, "acc_norm": 0.5209003215434084, "acc_norm_stderr": 0.02837327096106942 },
  "harness|hendrycksTest-prehistory|5": { "acc": 0.5185185185185185, "acc_stderr": 0.027801656212323674, "acc_norm": 0.5185185185185185, "acc_norm_stderr": 0.027801656212323674 },
  "harness|hendrycksTest-professional_accounting|5": { "acc": 0.38652482269503546, "acc_stderr": 0.029049190342543465, "acc_norm": 0.38652482269503546, "acc_norm_stderr": 0.029049190342543465 },
  "harness|hendrycksTest-professional_law|5": { "acc": 0.3728813559322034, "acc_stderr": 0.012350630058333353, "acc_norm": 0.3728813559322034, "acc_norm_stderr": 0.012350630058333353 },
  "harness|hendrycksTest-professional_medicine|5": { "acc": 0.40441176470588236, "acc_stderr": 0.029812630701569743, "acc_norm": 0.40441176470588236, "acc_norm_stderr": 0.029812630701569743 },
  "harness|hendrycksTest-professional_psychology|5": { "acc": 0.45588235294117646, "acc_stderr": 0.020148939420415738, "acc_norm": 0.45588235294117646, "acc_norm_stderr": 0.020148939420415738 },
  "harness|hendrycksTest-public_relations|5": { "acc": 0.5909090909090909, "acc_stderr": 0.04709306978661896, "acc_norm": 0.5909090909090909, "acc_norm_stderr": 0.04709306978661896 },
  "harness|hendrycksTest-security_studies|5": { "acc": 0.5306122448979592, "acc_stderr": 0.031949171367580624, "acc_norm": 0.5306122448979592, "acc_norm_stderr": 0.031949171367580624 },
  "harness|hendrycksTest-sociology|5": { "acc": 0.572139303482587, "acc_stderr": 0.03498541988407795, "acc_norm": 0.572139303482587, "acc_norm_stderr": 0.03498541988407795 },
  "harness|hendrycksTest-us_foreign_policy|5": { "acc": 0.65, "acc_stderr": 0.0479372485441102, "acc_norm": 0.65, "acc_norm_stderr": 0.0479372485441102 },
  "harness|hendrycksTest-virology|5": { "acc": 0.39759036144578314, "acc_stderr": 0.038099730845402184, "acc_norm": 0.39759036144578314, "acc_norm_stderr": 0.038099730845402184 },
  "harness|hendrycksTest-world_religions|5": { "acc": 0.7017543859649122, "acc_stderr": 0.03508771929824563, "acc_norm": 0.7017543859649122, "acc_norm_stderr": 0.03508771929824563 },
  "harness|truthfulqa:mc|0": { "mc1": 0.32558139534883723, "mc1_stderr": 0.016403989469907825, "mc2": 0.5188093512935639, "mc2_stderr": 0.016351300657386426 }
}
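
A results object of this shape can be summarized with a few lines of Python. The sketch below is illustrative only: `RESULTS` is a small excerpt copied from the values above, the helper names `macro_average` and `normal_interval` are hypothetical, and the 1.96 multiplier assumes a normal approximation for a 95% interval. It is not claimed to be how the leaderboard computes its own aggregates.

```python
from statistics import mean

# Excerpt of the results above (three of the hendrycksTest tasks plus ARC).
RESULTS = {
    "harness|arc:challenge|25": {
        "acc": 0.4351535836177474, "acc_stderr": 0.014487986197186043,
    },
    "harness|hendrycksTest-abstract_algebra|5": {
        "acc": 0.29, "acc_stderr": 0.045604802157206845,
    },
    "harness|hendrycksTest-anatomy|5": {
        "acc": 0.4740740740740741, "acc_stderr": 0.04313531696750574,
    },
    "harness|hendrycksTest-astronomy|5": {
        "acc": 0.47368421052631576, "acc_stderr": 0.04063302731486671,
    },
}


def macro_average(results: dict, prefix: str, metric: str = "acc") -> float:
    """Unweighted mean of `metric` over all tasks whose key starts with `prefix`."""
    values = [
        scores[metric]
        for task, scores in results.items()
        if task.startswith(prefix) and metric in scores
    ]
    if not values:
        raise ValueError(f"no task with prefix {prefix!r} reports {metric!r}")
    return mean(values)


def normal_interval(value: float, stderr: float, z: float = 1.96) -> tuple:
    """Approximate interval value +/- z * stderr (normal approximation)."""
    return (value - z * stderr, value + z * stderr)


if __name__ == "__main__":
    print(macro_average(RESULTS, "harness|hendrycksTest-"))
    arc = RESULTS["harness|arc:challenge|25"]
    print(normal_interval(arc["acc"], arc["acc_stderr"]))
```

Running the same functions over the full object rather than the excerpt would give the average across all hendrycksTest subtasks reported for this run.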
