five

open-llm-leaderboard-old/details_DreadPoor__BagelLake-7B-slerp

收藏
Hugging Face2024-02-10 更新2024-06-22 收录
下载链接:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_DreadPoor__BagelLake-7B-slerp
下载链接
链接失效反馈
官方服务:
资源简介:
该数据集是在评估模型DreadPoor/BagelLake-7B-slerp时自动生成的,用于Open LLM Leaderboard的评估。数据集包含63个配置,每个配置对应一个评估任务。数据集由1次运行生成,每次运行的结果存储为特定配置中的一个分割,分割名称使用运行的时间戳。train分割始终指向最新的结果。此外,数据集还包含一个名为results的配置,用于存储所有运行的聚合结果,并用于计算和显示Open LLM Leaderboard上的聚合指标。

该数据集是在评估模型DreadPoor/BagelLake-7B-slerp时自动生成的,用于Open LLM Leaderboard的评估。数据集包含63个配置,每个配置对应一个评估任务。数据集由1次运行生成,每次运行的结果存储为特定配置中的一个分割,分割名称使用运行的时间戳。train分割始终指向最新的结果。此外,数据集还包含一个名为results的配置,用于存储所有运行的聚合结果,并用于计算和显示Open LLM Leaderboard上的聚合指标。
提供机构:
open-llm-leaderboard-old
原始信息汇总

数据集概述

数据集摘要

该数据集是在对模型 DreadPoor/BagelLake-7B-slerp 进行评估运行期间自动创建的,评估结果发布在 Open LLM Leaderboard 上。

数据集组成

  • 数据集包含 63 个配置,每个配置对应一个评估任务。
  • 数据集从 1 次运行中创建,每次运行可以在每个配置中找到特定的分割,分割名称使用运行的时间戳。
  • "train" 分割始终指向最新的结果。
  • 额外的 "results" 配置存储所有运行的聚合结果,用于计算和显示聚合指标在 Open LLM Leaderboard 上。

数据加载示例

python from datasets import load_dataset data = load_dataset("open-llm-leaderboard/details_DreadPoor__BagelLake-7B-slerp", "harness_winogrande_5", split="train")

最新结果

以下是 2024-02-10T15:59:28.200270 运行的最新结果

python { "all": { "acc": 0.6459222271769905, "acc_stderr": 0.032303779328089297, "acc_norm": 0.6472329006433842, "acc_norm_stderr": 0.03296936485487387, "mc1": 0.48225214198286415, "mc1_stderr": 0.017492470843075363, "mc2": 0.6375898149834941, "mc2_stderr": 0.015466648799208926 }, "harness|arc:challenge|25": { "acc": 0.6638225255972696, "acc_stderr": 0.013804855026205763, "acc_norm": 0.6825938566552902, "acc_norm_stderr": 0.013602239088038167 }, "harness|hellaswag|10": { "acc": 0.6734714200358495, "acc_stderr": 0.004679847503411344, "acc_norm": 0.8507269468233419, "acc_norm_stderr": 0.0035562912320503525 }, "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.32, "acc_stderr": 0.046882617226215034, "acc_norm": 0.32, "acc_norm_stderr": 0.046882617226215034 }, "harness|hendrycksTest-anatomy|5": { "acc": 0.5925925925925926, "acc_stderr": 0.04244633238353227, "acc_norm": 0.5925925925925926, "acc_norm_stderr": 0.04244633238353227 }, "harness|hendrycksTest-astronomy|5": { "acc": 0.6710526315789473, "acc_stderr": 0.03823428969926605, "acc_norm": 0.6710526315789473, "acc_norm_stderr": 0.03823428969926605 }, "harness|hendrycksTest-business_ethics|5": { "acc": 0.6, "acc_stderr": 0.04923659639173309, "acc_norm": 0.6, "acc_norm_stderr": 0.04923659639173309 }, "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.690566037735849, "acc_stderr": 0.028450154794118637, "acc_norm": 0.690566037735849, "acc_norm_stderr": 0.028450154794118637 }, "harness|hendrycksTest-college_biology|5": { "acc": 0.7638888888888888, "acc_stderr": 0.03551446610810826, "acc_norm": 0.7638888888888888, "acc_norm_stderr": 0.03551446610810826 }, "harness|hendrycksTest-college_chemistry|5": { "acc": 0.49, "acc_stderr": 0.05024183937956912, "acc_norm": 0.49, "acc_norm_stderr": 0.05024183937956912 }, "harness|hendrycksTest-college_computer_science|5": { "acc": 0.5, "acc_stderr": 0.050251890762960605, "acc_norm": 0.5, "acc_norm_stderr": 0.050251890762960605 }, "harness|hendrycksTest-college_mathematics|5": { "acc": 0.31, "acc_stderr": 0.04648231987117316, "acc_norm": 0.31, "acc_norm_stderr": 0.04648231987117316 }, "harness|hendrycksTest-college_medicine|5": { "acc": 0.653179190751445, "acc_stderr": 0.036291466701596636, "acc_norm": 0.653179190751445, "acc_norm_stderr": 0.036291466701596636 }, "harness|hendrycksTest-college_physics|5": { "acc": 0.4117647058823529, "acc_stderr": 0.048971049527263666, "acc_norm": 0.4117647058823529, "acc_norm_stderr": 0.048971049527263666 }, "harness|hendrycksTest-computer_security|5": { "acc": 0.79, "acc_stderr": 0.04093601807403326, "acc_norm": 0.79, "acc_norm_stderr": 0.04093601807403326 }, "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.5574468085106383, "acc_stderr": 0.032469569197899575, "acc_norm": 0.5574468085106383, "acc_norm_stderr": 0.032469569197899575 }, "harness|hendrycksTest-econometrics|5": { "acc": 0.5087719298245614, "acc_stderr": 0.04702880432049615, "acc_norm": 0.5087719298245614, "acc_norm_stderr": 0.04702880432049615 }, "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.5517241379310345, "acc_stderr": 0.041443118108781526, "acc_norm": 0.5517241379310345, "acc_norm_stderr": 0.041443118108781526 }, "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.4074074074074074, "acc_stderr": 0.02530590624159063, "acc_norm": 0.4074074074074074, "acc_norm_stderr": 0.02530590624159063 }, "harness|hendrycksTest-formal_logic|5": { "acc": 0.42063492063492064, "acc_stderr": 0.04415438226743744, "acc_norm": 0.42063492063492064, "acc_norm_stderr": 0.04415438226743744 }, "harness|hendrycksTest-global_facts|5": { "acc": 0.43, "acc_stderr": 0.049756985195624284, "acc_norm": 0.43, "acc_norm_stderr": 0.049756985195624284 }, "harness|hendrycksTest-high_school_biology|5": { "acc": 0.7580645161290323, "acc_stderr": 0.0243625996930311, "acc_norm": 0.7580645161290323, "acc_norm_stderr": 0.0243625996930311 }, "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.5221674876847291, "acc_stderr": 0.03514528562175008, "acc_norm": 0.5221674876847291, "acc_norm_stderr": 0.03514528562175008 }, "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.68, "acc_stderr": 0.04688261722621505, "acc_norm": 0.68, "acc_norm_stderr": 0.04688261722621505 }, "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.7696969696969697, "acc_stderr": 0.0328766675860349, "acc_norm": 0.7696969696969697, "acc_norm_stderr": 0.0328766675860349 }, "harness|hendrycksTest-high_school_geography|5": { "acc": 0.7929292929292929, "acc_stderr": 0.02886977846026705, "acc_norm": 0.7929292929292929, "acc_norm_stderr": 0.02886977846026705 }, "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.8860103626943006, "acc_stderr": 0.022935144053919443, "acc_norm": 0.8860103626943006, "acc_norm_stderr": 0.022935144053919443 }, "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.6666666666666666, "acc_stderr": 0.023901157979402538, "acc_norm": 0.6666666666666666, "acc_norm_stderr": 0.023901157979402538 }, "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.35555555555555557, "acc_stderr": 0.02918571494985741, "acc_norm": 0.35555555555555557, "acc_norm_stderr": 0.02918571494985741 }, "harness|hendrycksTest-high_school_microeconomics|5": { "acc": 0.6974789915966386, "acc_stderr": 0.029837962388291932, "acc_norm": 0.6974789915966386, "acc_norm_stderr": 0.029837962388291932 }, "harness|hendrycksTest-high_school_physics|5": { "acc": 0.40397350993377484, "acc_stderr": 0.040064856853653415, "acc_norm": 0.40397350993377484, "acc_norm_stderr": 0.040064856853653415 }, "harness|hendrycksTest-high_school_psychology|5": { "acc": 0.8330275229357799, "acc_stderr": 0.01599015488507337, "acc_norm": 0.8330275229357799, "acc_norm_stderr": 0.01599015488507337 }, "harness|hendrycksTest-high_school_statistics|5": { "acc": 0.5555555555555556, "acc_stderr": 0.03388857118502325, "acc_norm": 0.5555555555555556, "acc_norm_stderr": 0.03388857118502325 }, "harness|hendrycksTest-high_school_us_history|5": { "acc": 0.803921568627451, "acc_stderr": 0.027865942286639318, "acc_norm": 0.803921568627451, "acc_norm_stderr": 0.027865942286639318 }, "harness|hendrycksTest-high_school_world_history|5": { "acc": 0.7932489451476793, "acc_stderr": 0.026361651668389087, "acc_norm": 0.7932489451476793, "acc_norm_stderr": 0.026361651668389087 }, "harness|hendrycksTest-human_aging|5": { "acc": 0.7040358744394619, "acc_stderr": 0.030636591348699813, "acc_norm": 0.7040358744394619, "acc_norm_stderr": 0.030636591348699813 }, "harness|hendrycksTest-human_sexuality|5": { "acc": 0.7557251908396947, "acc_stderr": 0.037683359597287434, "acc_norm": 0.7557251908396947, "acc_norm_stderr": 0.037683359597287434 }, "harness|hendrycksTest-international_law|5": { "acc": 0.7603305785123967, "acc_stderr": 0.038968789850704164, "acc_norm": 0.7603305785123967, "acc_norm_stderr": 0.038968789850704164 }, "harness|hendrycksTest-jurisprudence|5": { "acc": 0.7962962962962963, "acc_stderr": 0.03893542518824847, "acc_norm": 0.7962962962962963, "acc_norm_stderr": 0.03893542518824847 }, "harness|hendrycksTest-logical_fallacies|5": { "acc": 0.7668711656441718, "acc_stderr": 0.0332201579577674, "acc_norm": 0.7668711656441718, "acc_norm_stderr": 0.0332201579577674 }, "harness|hendrycksTest-machine_learning|5": { "acc": 0.4642857142857143, "acc_stderr": 0.04733667890053756, "acc_norm": 0.4642857142857143, "acc_norm_stderr": 0.04733667890053756 }, "harness|hendrycksTest-management|5": { "acc": 0.7669902912621359, "acc_stderr": 0.04185832598928315, "acc_norm": 0.7669902912621359, "acc_norm_stderr": 0.04185832598928315 }, "harness|hendrycksTest-marketing|5": { "acc": 0.8760683760683761, "acc_stderr": 0.021586494001281382, "acc_norm": 0.8760683760683761, "acc_norm_stderr": 0.021586494001281382 }, "harness|hendrycksTest-medical_genetics|5": { "acc": 0.74, "acc_stderr": 0.04408440022768078, "acc_norm": 0.74, "acc_norm_stderr": 0.04408440022768078 }, "harness|hendrycksTest-miscellaneous|5": { "acc": 0.8275862068965517, "acc_stderr": 0.013507943909371803, "acc_norm": 0.8275862068965517, "acc_norm_stderr": 0.013507943909371803 }, "harness|hendrycksTest-moral_disputes|5": { "acc": 0.708092485549133, "acc_stderr": 0.024476994076247337, "acc_norm": 0.708092485549133, "acc_norm_stderr": 0.024476994076247337 }, "harness|hendrycksTest-moral_scenarios|5": { "acc": 0.38100558659217876, "acc_stderr": 0.016242028834053623, "acc_norm": 0.38100558659217876, "acc_norm_stderr": 0.016242028834053623 }, "harness|hendrycksTest-nutrition|5": { "acc": 0.7320261437908496, "acc_stderr": 0.02536060379624256, "acc_norm": 0.7320261437908496, "acc_norm_stderr": 0.02536060379624256 }, "harness|hendrycksTest-philosophy|5": { "acc": 0.7234726688102894, "acc_stderr": 0.02540383297817961, "acc_norm": 0.7234726688102894, "acc_norm_stderr": 0.02540383297817961 }, "harness|hendrycksTest-prehistory|5": { "acc": 0.7283950617283951, "acc_stderr": 0.024748624490537368, "acc_norm": 0.7283950617283951, "acc_norm_stderr": 0.024748624490537368 }, "harness|hendrycksTest-professional_accounting|5": { "acc": 0.46099290780141844, "acc_stderr": 0.029736592526424438, "acc_norm": 0.46099290780141844, "acc_norm_stderr": 0.029736592526424438 }, "harness|hendrycksTest-professional_law|5": { "acc": 0.45827900912646674, "acc_stderr": 0.01272570165695364, "acc_norm": 0.45827900912646674, "acc_norm_stderr": 0.01272570165695364 }, "harness|hendrycksTest-professional_medicine|5": { "acc": 0.6801470588235294, "acc_stderr": 0.028332959514031218, "acc_norm": 0.6801470588235294, "acc_norm_stderr": 0.028332959514031218 }, "harness|hendrycksTest-professional_psychology|5": { "acc": 0.6552287581699346, "acc_stderr": 0.019228322018696644, "acc_norm": 0.6552287581699346, "acc_norm_stderr": 0.019228322018696644 }, "harness|hendrycksTest-public_relations|5": { "acc": 0.6909090909090909, "acc_stderr": 0.044262946482000985, "acc_norm": 0.6909090909090909, "acc_norm_stderr": 0.044262946482000985 }, "harness|hendrycksTest-security_studies|5": { "acc": 0.7142857142857143, "acc_stderr": 0.0289205832206756, "acc_norm": 0.7142857142857143, "acc_norm_stderr": 0.0289205832206756 }, "harness|hendrycksTest-sociology|5": { "acc": 0.8407960199004975, "acc_stderr": 0.02587064676616914, "acc_norm": 0.8407960199004975, "acc_norm_stderr": 0.02587064676616914 }, "harness|hendrycksTest-us_foreign_policy|5": { "acc": 0.85, "acc_stderr": 0.035887028128263734, "acc_norm": 0.85, "acc_norm_stderr": 0.035887028128263734 }, "harness|hendrycksTest-virology|5": { "acc": 0.5421686746987951, "acc_stderr": 0.0387862677100236, "acc_norm": 0.5421686746987951, "acc_norm_stderr": 0.0387862677100236 }, "harness|hendrycksTest-world_religions|5": { "acc": 0.8421052631578947, "acc_stderr": 0.027966785859160893, "acc_norm": 0.8421052631578947, "acc_norm_stderr": 0.027966785859160893 }, "harness|truthfulqa:mc|0": { "mc1": 0.48225214198286415, "mc1_stderr": 0.017492470843075363, "mc2": 0.6375898149834941, "mc2_stderr": 0.015466648799208926 }, "harness|winogrande|5": { "acc": 0.8366219415943172, "acc_stderr": 0.010390695970273764 }, "harness|gsm8k|5": { "acc": 0.5739196360879454, "acc_stderr": 0.013621144396086707 } }

二维码
社区交流群
二维码
科研交流群
商业服务