
open-llm-leaderboard-old/details_jan-hq__LlamaCorn-1.1B

Source: Hugging Face | Updated: 2024-01-17 | Indexed: 2024-06-22
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_jan-hq__LlamaCorn-1.1B
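Resource overview:
This dataset was created automatically during the evaluation of the model [jan-hq/LlamaCorn-1.1B](https://huggingface.co/jan-hq/LlamaCorn-1.1B) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). It consists of 63 configurations, each corresponding to one evaluation task. The dataset was created from 1 run. Each run appears as its own split in every configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results. An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).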

Provider:
open-llm-leaderboard-old
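Summary of original information

Dataset overview

Dataset summary

This dataset was created automatically during an evaluation run of the model jan-hq/LlamaCorn-1.1B. The evaluation results are shown on the Open LLM Leaderboard.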

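Dataset composition

  • The dataset contains 63 configurations, each corresponding to one evaluation task.
  • The dataset was created from 1 run. Each run appears as its own split in every configuration, and the split is named after the run's timestamp.
  • The "train" split always points to the latest results.
  • An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.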
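The configuration layout described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the leaderboard's own tooling. The helper names (`task_key_to_config`, `load_task_details`, `load_aggregated_results`) are hypothetical. The naming rule (each run of non-alphanumeric characters becomes one underscore) is an assumption inferred from the single pair this page confirms, `harness|winogrande|5` and `harness_winogrande_5`. The repository id is taken from the page header and may need adjusting.

```python
import re

# Assumption: repository id copied from the page header. The card's own
# loading snippet uses the "open-llm-leaderboard/" namespace instead.
REPO_ID = "open-llm-leaderboard-old/details_jan-hq__LlamaCorn-1.1B"


def task_key_to_config(task_key: str) -> str:
    """Map a results key such as 'harness|winogrande|5' to a config name.

    Assumption: every run of non-alphanumeric characters is replaced by a
    single underscore. Only the winogrande pair is confirmed by the card.
    """
    return re.sub(r"[^0-9A-Za-z]+", "_", task_key).strip("_")


def load_task_details(task_key: str, split: str = "train", repo_id: str = REPO_ID):
    """Load the per-sample details of one task (requires network access)."""
    # Imported lazily so that the pure helper above also works offline.
    from datasets import load_dataset

    return load_dataset(repo_id, task_key_to_config(task_key), split=split)


def load_aggregated_results(split: str = "train", repo_id: str = REPO_ID):
    """Load the "results" configuration, which holds the aggregated metrics."""
    from datasets import load_dataset

    return load_dataset(repo_id, "results", split=split)


if __name__ == "__main__":
    print(task_key_to_config("harness|winogrande|5"))
```

Passing a run timestamp as `split` instead of `"train"` would select one specific run. The exact split spelling should be checked against the repository, since the card only says that split names are derived from the timestamp.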

Data loading example

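    from datasets import load_dataset

    # Note: the card's snippet uses the "open-llm-leaderboard" namespace, while
    # this page lists the dataset under "open-llm-leaderboard-old".
    data = load_dataset(
        "open-llm-leaderboard/details_jan-hq__LlamaCorn-1.1B",
        "harness_winogrande_5",
        split="train",
    )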

Latest results

These are the latest results, from run 2024-01-17T02:48:10.552865:

    {
        "all": { "acc": 0.29375199116706574, "acc_stderr": 0.03225608414226124, "acc_norm": 0.29607425614190314, "acc_norm_stderr": 0.03309063417483788, "mc1": 0.23255813953488372, "mc1_stderr": 0.0147891575310805, "mc2": 0.3677529114898043, "mc2_stderr": 0.013980681587593108 },
        "harness|arc:challenge|25": { "acc": 0.3148464163822526, "acc_stderr": 0.01357265770308495, "acc_norm": 0.3412969283276451, "acc_norm_stderr": 0.013855831287497723 },
        "harness|hellaswag|10": { "acc": 0.44612626966739694, "acc_stderr": 0.004960732382255234, "acc_norm": 0.5933081059549891, "acc_norm_stderr": 0.004902125388002216 },
        "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.28, "acc_stderr": 0.04512608598542128, "acc_norm": 0.28, "acc_norm_stderr": 0.04512608598542128 },
        "harness|hendrycksTest-anatomy|5": { "acc": 0.23703703703703705, "acc_stderr": 0.03673731683969506, "acc_norm": 0.23703703703703705, "acc_norm_stderr": 0.03673731683969506 },
        "harness|hendrycksTest-astronomy|5": { "acc": 0.24342105263157895, "acc_stderr": 0.03492349668884239, "acc_norm": 0.24342105263157895, "acc_norm_stderr": 0.03492349668884239 },
        "harness|hendrycksTest-business_ethics|5": { "acc": 0.37, "acc_stderr": 0.04852365870939099, "acc_norm": 0.37, "acc_norm_stderr": 0.04852365870939099 },
        "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.28679245283018867, "acc_stderr": 0.027834912527544057, "acc_norm": 0.28679245283018867, "acc_norm_stderr": 0.027834912527544057 },
        "harness|hendrycksTest-college_biology|5": { "acc": 0.2569444444444444, "acc_stderr": 0.03653946969442099, "acc_norm": 0.2569444444444444, "acc_norm_stderr": 0.03653946969442099 },
        "harness|hendrycksTest-college_chemistry|5": { "acc": 0.28, "acc_stderr": 0.04512608598542127, "acc_norm": 0.28, "acc_norm_stderr": 0.04512608598542127 },
        "harness|hendrycksTest-college_computer_science|5": { "acc": 0.28, "acc_stderr": 0.045126085985421276, "acc_norm": 0.28, "acc_norm_stderr": 0.045126085985421276 },
        "harness|hendrycksTest-college_mathematics|5": { "acc": 0.35, "acc_stderr": 0.04793724854411019, "acc_norm": 0.35, "acc_norm_stderr": 0.04793724854411019 },
        "harness|hendrycksTest-college_medicine|5": { "acc": 0.23699421965317918, "acc_stderr": 0.03242414757483098, "acc_norm": 0.23699421965317918, "acc_norm_stderr": 0.03242414757483098 },
        "harness|hendrycksTest-college_physics|5": { "acc": 0.24509803921568626, "acc_stderr": 0.042801058373643966, "acc_norm": 0.24509803921568626, "acc_norm_stderr": 0.042801058373643966 },
        "harness|hendrycksTest-computer_security|5": { "acc": 0.34, "acc_stderr": 0.04760952285695235, "acc_norm": 0.34, "acc_norm_stderr": 0.04760952285695235 },
        "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.33191489361702126, "acc_stderr": 0.030783736757745657, "acc_norm": 0.33191489361702126, "acc_norm_stderr": 0.030783736757745657 },
        "harness|hendrycksTest-econometrics|5": { "acc": 0.2807017543859649, "acc_stderr": 0.042270544512322004, "acc_norm": 0.2807017543859649, "acc_norm_stderr": 0.042270544512322004 },
        "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.296551724137931, "acc_stderr": 0.03806142687309994, "acc_norm": 0.296551724137931, "acc_norm_stderr": 0.03806142687309994 },
        "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.2751322751322751, "acc_stderr": 0.023000086859068642, "acc_norm": 0.2751322751322751, "acc_norm_stderr": 0.023000086859068642 },
        "harness|hendrycksTest-formal_logic|5": { "acc": 0.24603174603174602, "acc_stderr": 0.03852273364924316, "acc_norm": 0.24603174603174602, "acc_norm_stderr": 0.03852273364924316 },
        "harness|hendrycksTest-global_facts|5": { "acc": 0.24, "acc_stderr": 0.04292346959909283, "acc_norm": 0.24, "acc_norm_stderr": 0.04292346959909283 },
        "harness|hendrycksTest-high_school_biology|5": { "acc": 0.25806451612903225, "acc_stderr": 0.024892469172462826, "acc_norm": 0.25806451612903225, "acc_norm_stderr": 0.024892469172462826 },
        "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.21674876847290642, "acc_stderr": 0.028990331252516235, "acc_norm": 0.21674876847290642, "acc_norm_stderr": 0.028990331252516235 },
        "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.28, "acc_stderr": 0.04512608598542129, "acc_norm": 0.28, "acc_norm_stderr": 0.04512608598542129 },
        "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.34545454545454546, "acc_stderr": 0.037131580674819135, "acc_norm": 0.34545454545454546, "acc_norm_stderr": 0.037131580674819135 },
        "harness|hendrycksTest-high_school_geography|5": { "acc": 0.26262626262626265, "acc_stderr": 0.03135305009533086, "acc_norm": 0.26262626262626265, "acc_norm_stderr": 0.03135305009533086 },
        "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.2694300518134715, "acc_stderr": 0.03201867122877795, "acc_norm": 0.2694300518134715, "acc_norm_stderr": 0.03201867122877795 },
        "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.2794871794871795, "acc_stderr": 0.022752388839776823, "acc_norm": 0.2794871794871795, "acc_norm_stderr": 0.022752388839776823 },
        "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.24814814814814815, "acc_stderr": 0.026335739404055803, "acc_norm": 0.24814814814814815, "acc_norm_stderr": 0.026335739404055803 },
        "harness|hendrycksTest-high_school_microeconomics|5": { "acc": 0.2857142857142857, "acc_stderr": 0.029344572500634342, "acc_norm": 0.2857142857142857, "acc_norm_stderr": 0.029344572500634342 },
        "harness|hendrycksTest-high_school_physics|5": { "acc": 0.2781456953642384, "acc_stderr": 0.03658603262763743, "acc_norm": 0.2781456953642384, "acc_norm_stderr": 0.03658603262763743 },
        "harness|hendrycksTest-high_school_psychology|5": { "acc": 0.24403669724770644, "acc_stderr": 0.018415286351416413, "acc_norm": 0.24403669724770644, "acc_norm_stderr": 0.018415286351416413 },
        "harness|hendrycksTest-high_school_statistics|5": { "acc": 0.3101851851851852, "acc_stderr": 0.03154696285656628, "acc_norm": 0.3101851851851852, "acc_norm_stderr": 0.03154696285656628 },
        "harness|hendrycksTest-high_school_us_history|5": { "acc": 0.28921568627450983, "acc_stderr": 0.031822318676475544, "acc_norm": 0.28921568627450983, "acc_norm_stderr": 0.031822318676475544 },
        "harness|hendrycksTest-high_school_world_history|5": { "acc": 0.3924050632911392, "acc_stderr": 0.03178471874564729, "acc_norm": 0.3924050632911392, "acc_norm_stderr": 0.03178471874564729 },
        "harness|hendrycksTest-human_aging|5": { "acc": 0.4125560538116592, "acc_stderr": 0.03304062175449297, "acc_norm": 0.4125560538116592, "acc_norm_stderr": 0.03304062175449297 },
        "harness|hendrycksTest-human_sexuality|5": { "acc": 0.32061068702290074, "acc_stderr": 0.040933292298342784, "acc_norm": 0.32061068702290074, "acc_norm_stderr": 0.040933292298342784 },
        "harness|hendrycksTest-international_law|5": { "acc": 0.34710743801652894, "acc_stderr": 0.043457245702925355, "acc_norm": 0.34710743801652894, "acc_norm_stderr": 0.043457245702925355 },
        "harness|hendrycksTest-jurisprudence|5": { "acc": 0.35185185185185186, "acc_stderr": 0.04616631111801713, "acc_norm": 0.35185185185185186, "acc_norm_stderr": 0.04616631111801713 },
        "harness|hendrycksTest-logical_fallacies|5": { "acc": 0.2392638036809816, "acc_stderr": 0.03351953879521269, "acc_norm": 0.2392638036809816, "acc_norm_stderr": 0.03351953879521269 },
        "harness|hendrycksTest-machine_learning|5": { "acc": 0.3482142857142857, "acc_stderr": 0.045218299028335865, "acc_norm": 0.3482142857142857, "acc_norm_stderr": 0.045218299028335865 },
        "harness|hendrycksTest-management|5": { "acc": 0.2621359223300971, "acc_stderr": 0.04354631077260597, "acc_norm": 0.2621359223300971, "acc_norm_stderr": 0.04354631077260597 },
        "harness|hendrycksTest-marketing|5": { "acc": 0.3547008547008547, "acc_stderr": 0.03134250486245402, "acc_norm": 0.3547008547008547, "acc_norm_stderr": 0.03134250486245402 },
        "harness|hendrycksTest-medical_genetics|5": { "acc": 0.31, "acc_stderr": 0.04648231987117316, "acc_norm": 0.31, "acc_norm_stderr": 0.04648231987117316 },
        "harness|hendrycksTest-miscellaneous|5": { "acc": 0.32567049808429116, "acc_stderr": 0.01675798945854968, "acc_norm": 0.32567049808429116, "acc_norm_stderr": 0.01675798945854968 },
        "harness|hendrycksTest-moral_disputes|5": { "acc": 0.315028901734104, "acc_stderr": 0.0250093137900697, "acc_norm": 0.315028901734104, "acc_norm_stderr": 0.0250093137900697 },
        "harness|hendrycksTest-moral_scenarios|5": { "acc": 0.26145251396648045, "acc_stderr": 0.014696599650364553, "acc_norm": 0.26145251396648045, "acc_norm_stderr": 0.014696599650364553 },
        "harness|hendrycksTest-nutrition|5": { "acc": 0.27450980392156865, "acc_stderr": 0.025553169991826507, "acc_norm": 0.27450980392156865, "acc_norm_stderr": 0.025553169991826507 },
        "harness|hendrycksTest-philosophy|5": { "acc": 0.2990353697749196, "acc_stderr": 0.02600330111788514, "acc_norm": 0.2990353697749196, "acc_norm_stderr": 0.02600330111788514 },
        "harness|hendrycksTest-prehistory|5": { "acc": 0.2993827160493827, "acc_stderr": 0.025483115601195466, "acc_norm": 0.2993827160493827, "acc_norm_stderr": 0.025483115601195466 },
        "harness|hendrycksTest-professional_accounting|5": { "acc": 0.2553191489361702, "acc_stderr": 0.026011992930902013, "acc_norm": 0.2553191489361702, "acc_norm_stderr": 0.026011992930902013 },
        "harness|hendrycksTest-professional_law|5": { "acc": 0.2457627118644068, "acc_stderr": 0.01099615663514269, "acc_norm": 0.2457627118644068, "acc_norm_stderr": 0.01099615663514269 },
        "harness|hendrycksTest-professional_medicine|5": { "acc": 0.21323529411764705, "acc_stderr": 0.02488097151229428, "acc_norm": 0.21323529411764705, "acc_norm_stderr": 0.02488097151229428 },
        "harness|hendrycksTest-professional_psychology|5": { "acc": 0.25326797385620914, "acc_stderr": 0.01759348689536683, "acc_norm": 0.25326797385620914, "acc_norm_stderr": 0.01759348689536683 },
        "harness|hendrycksTest-public_relations|5": { "acc": 0.2909090909090909, "acc_stderr": 0.04350271442923243, "acc_norm": 0.2909090909090909, "acc_norm_stderr": 0.04350271442923243 },
        "harness|hendrycksTest-security_studies|5": { "acc": 0.22040816326530613, "acc_stderr": 0.026537045312145294, "acc_norm": 0.22040816326530613, "acc_norm_stderr": 0.026537045312145294 },
        "harness|hendrycksTest-sociology|5": { "acc": 0.2885572139303483, "acc_stderr": 0.032038410402133226, "acc_norm": 0.2885572139303483, "acc_norm_stderr": 0.032038410402133226 },
        "harness|hendrycksTest-us_foreign_policy|5": { "acc": 0.38, "acc_stderr": 0.048783173121456316, "acc_norm": 0.38, "acc_norm_stderr": 0.048783173121456316 },
        "harness|hendrycksTest-virology|5": { "acc": 0.29518072289156627, "acc_stderr": 0.035509201856896294, "acc_norm": 0.29518072289156627, "acc_norm_stderr": 0.035509201856896294 },
        "harness|hendrycksTest-world_religions|5": { "acc": 0.3333333333333333, "acc_stderr": 0.036155076303109344, "acc_norm": 0.3333333333333333, "acc_norm_stderr": 0.036155076303109344 },
        "harness|truthfulqa:mc|0": { "mc1": 0.23255813953488372, "mc1_stderr": 0.0147891575310805, "mc2": 0.3677529114898043, "mc2_stderr": 0.013980681587593108 },
        "harness|winogrande|5": { "acc": 0.6195737963693765, "acc_stderr": 0.013644727908656833 },
        "harness|gsm8k|5": { "acc": 0.004548900682335102, "acc_stderr": 0.0018535550440036204 }
    }
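Each score in the results above comes with a standard error, which can be turned into an approximate confidence interval. The following is a minimal sketch, assuming a normal approximation (score ± 1.96 × stderr for roughly 95% coverage). The card does not say how the standard errors are meant to be used. The function names are hypothetical, and the `results` dictionary is only a small excerpt of the values listed above.

```python
from statistics import mean

# Excerpt of the results listed above; the full dictionary has one entry per task.
results = {
    "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.28, "acc_stderr": 0.04512608598542128},
    "harness|hendrycksTest-anatomy|5": {"acc": 0.23703703703703705, "acc_stderr": 0.03673731683969506},
    "harness|winogrande|5": {"acc": 0.6195737963693765, "acc_stderr": 0.013644727908656833},
    "harness|gsm8k|5": {"acc": 0.004548900682335102, "acc_stderr": 0.0018535550440036204},
}


def confidence_interval(score, stderr, z=1.96):
    """Return (low, high) for score ± z * stderr, clamped to [0, 1].

    Assumption: a normal approximation; z = 1.96 gives roughly 95% coverage.
    """
    half_width = z * stderr
    return max(0.0, score - half_width), min(1.0, score + half_width)


def mean_accuracy(results, prefix):
    """Average the "acc" field over all tasks whose key starts with `prefix`."""
    return mean(m["acc"] for key, m in results.items() if key.startswith(prefix))


if __name__ == "__main__":
    for key, metrics in results.items():
        low, high = confidence_interval(metrics["acc"], metrics["acc_stderr"])
        print(f"{key}: acc={metrics['acc']:.4f}, interval=[{low:.4f}, {high:.4f}]")
    # Average over the hendrycksTest tasks present in this excerpt only.
    print(mean_accuracy(results, "harness|hendrycksTest-"))
```

Under this approximation, many per-subject hendrycksTest intervals are wide enough to include 0.25. If those tasks are four-option multiple choice, which this page does not state, 0.25 would be the random-guess rate, so single-subject differences should be read with caution.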
