
open-llm-leaderboard-old/details_NLPinas__yi-bagel-2x34b

Hugging Face | Updated: 2024-02-03 | Indexed: 2024-06-22
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_NLPinas__yi-bagel-2x34b
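Resource summary:
This dataset was created automatically during the evaluation run of the model NLPinas/yi-bagel-2x34b on the Open LLM Leaderboard. It consists of 63 configurations, each corresponding to one evaluation task. It contains the results of one run, and each run appears as a specific split in each configuration. The train split always points to the latest results. An additional configuration named results stores the aggregated results of the run, which are used to compute and display the aggregated metrics on the Open LLM Leaderboard. The README also provides an example of loading the run details with the Python datasets library.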

Provider:
open-llm-leaderboard-old
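Summary of original information

Dataset overview

Dataset introduction

This dataset was created automatically during the evaluation run of the model NLPinas/yi-bagel-2x34b for the Open LLM Leaderboard.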

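Dataset composition

The dataset contains 63 configurations, each corresponding to one evaluation task. It was created from 1 run. Each run can be found as a specific split in each configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results.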
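The split naming described here can be sketched in a few lines. This is a minimal illustration, not the leaderboard's own code: it assumes that a split name is the run timestamp with `-` and `:` replaced by `_`, and the helper name `split_name_from_timestamp` is hypothetical. The input is the run timestamp reported on this page.

```python
from datetime import datetime


def split_name_from_timestamp(timestamp: str) -> str:
    """Turn a run timestamp into a split name (assumed convention)."""
    # Reject malformed input early; strptime raises ValueError on a mismatch.
    datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f")
    # Assumption: split names replace "-" and ":" with "_".
    return timestamp.replace("-", "_").replace(":", "_")


run_split = split_name_from_timestamp("2024-02-03T01:32:07.521685")
print(run_split)
```

If the assumption holds, the resulting string could be passed as `split=` in place of `"train"` to pin a specific run. Check the split names listed on the dataset card before relying on it.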

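Additional configuration

An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.

Data loading example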

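```python
from datasets import load_dataset

# Load the per-sample details of one evaluation task.
# The "train" split always points to the latest run.
# Note: this page lists the repository under the "open-llm-leaderboard-old"
# namespace; adjust the repository ID if the one below does not resolve.
data = load_dataset(
    "open-llm-leaderboard/details_NLPinas__yi-bagel-2x34b",
    "harness_winogrande_5",
    split="train",
)
```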

Latest results

The latest results, from run 2024-02-03T01:32:07.521685, are as follows:

```python
{
    "all": {"acc": 0.7615183276503468, "acc_stderr": 0.02832471118128543, "acc_norm": 0.7668016554766149, "acc_norm_stderr": 0.028849292688075817, "mc1": 0.5642594859241126, "mc1_stderr": 0.01735834539886313, "mc2": 0.7142422056307771, "mc2_stderr": 0.014238871538897193},
    "harness|arc:challenge|25": {"acc": 0.6962457337883959, "acc_stderr": 0.013438909184778762, "acc_norm": 0.726962457337884, "acc_norm_stderr": 0.013019332762635748},
    "harness|hellaswag|10": {"acc": 0.6620195180242979, "acc_stderr": 0.0047205513235471265, "acc_norm": 0.8544114718183629, "acc_norm_stderr": 0.0035197241633108875},
    "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.46, "acc_stderr": 0.05009082659620333, "acc_norm": 0.46, "acc_norm_stderr": 0.05009082659620333},
    "harness|hendrycksTest-anatomy|5": {"acc": 0.7481481481481481, "acc_stderr": 0.03749850709174021, "acc_norm": 0.7481481481481481, "acc_norm_stderr": 0.03749850709174021},
    "harness|hendrycksTest-astronomy|5": {"acc": 0.868421052631579, "acc_stderr": 0.027508689533549912, "acc_norm": 0.868421052631579, "acc_norm_stderr": 0.027508689533549912},
    "harness|hendrycksTest-business_ethics|5": {"acc": 0.78, "acc_stderr": 0.04163331998932262, "acc_norm": 0.78, "acc_norm_stderr": 0.04163331998932262},
    "harness|hendrycksTest-clinical_knowledge|5": {"acc": 0.8075471698113208, "acc_stderr": 0.024262979839372274, "acc_norm": 0.8075471698113208, "acc_norm_stderr": 0.024262979839372274},
    "harness|hendrycksTest-college_biology|5": {"acc": 0.8958333333333334, "acc_stderr": 0.025545239210256917, "acc_norm": 0.8958333333333334, "acc_norm_stderr": 0.025545239210256917},
    "harness|hendrycksTest-college_chemistry|5": {"acc": 0.51, "acc_stderr": 0.05024183937956912, "acc_norm": 0.51, "acc_norm_stderr": 0.05024183937956912},
    "harness|hendrycksTest-college_computer_science|5": {"acc": 0.64, "acc_stderr": 0.048241815132442176, "acc_norm": 0.64, "acc_norm_stderr": 0.048241815132442176},
    "harness|hendrycksTest-college_mathematics|5": {"acc": 0.43, "acc_stderr": 0.049756985195624284, "acc_norm": 0.43, "acc_norm_stderr": 0.049756985195624284},
    "harness|hendrycksTest-college_medicine|5": {"acc": 0.7456647398843931, "acc_stderr": 0.0332055644308557, "acc_norm": 0.7456647398843931, "acc_norm_stderr": 0.0332055644308557},
    "harness|hendrycksTest-college_physics|5": {"acc": 0.5784313725490197, "acc_stderr": 0.04913595201274503, "acc_norm": 0.5784313725490197, "acc_norm_stderr": 0.04913595201274503},
    "harness|hendrycksTest-computer_security|5": {"acc": 0.81, "acc_stderr": 0.039427724440366234, "acc_norm": 0.81, "acc_norm_stderr": 0.039427724440366234},
    "harness|hendrycksTest-conceptual_physics|5": {"acc": 0.7702127659574468, "acc_stderr": 0.027501752944412417, "acc_norm": 0.7702127659574468, "acc_norm_stderr": 0.027501752944412417},
    "harness|hendrycksTest-econometrics|5": {"acc": 0.5877192982456141, "acc_stderr": 0.04630653203366596, "acc_norm": 0.5877192982456141, "acc_norm_stderr": 0.04630653203366596},
    "harness|hendrycksTest-electrical_engineering|5": {"acc": 0.7241379310344828, "acc_stderr": 0.037245636197746304, "acc_norm": 0.7241379310344828, "acc_norm_stderr": 0.037245636197746304},
    "harness|hendrycksTest-elementary_mathematics|5": {"acc": 0.716931216931217, "acc_stderr": 0.023201392938194974, "acc_norm": 0.716931216931217, "acc_norm_stderr": 0.023201392938194974},
    "harness|hendrycksTest-formal_logic|5": {"acc": 0.6111111111111112, "acc_stderr": 0.04360314860077459, "acc_norm": 0.6111111111111112, "acc_norm_stderr": 0.04360314860077459},
    "harness|hendrycksTest-global_facts|5": {"acc": 0.58, "acc_stderr": 0.049604496374885836, "acc_norm": 0.58, "acc_norm_stderr": 0.049604496374885836},
    "harness|hendrycksTest-high_school_biology|5": {"acc": 0.9032258064516129, "acc_stderr": 0.016818943416345197, "acc_norm": 0.9032258064516129, "acc_norm_stderr": 0.016818943416345197},
    "harness|hendrycksTest-high_school_chemistry|5": {"acc": 0.6403940886699507, "acc_stderr": 0.03376458246509567, "acc_norm": 0.6403940886699507, "acc_norm_stderr": 0.03376458246509567},
    "harness|hendrycksTest-high_school_computer_science|5": {"acc": 0.8, "acc_stderr": 0.04020151261036846, "acc_norm": 0.8, "acc_norm_stderr": 0.04020151261036846},
    "harness|hendrycksTest-high_school_european_history|5": {"acc": 0.8606060606060606, "acc_stderr": 0.027045948825865394, "acc_norm": 0.8606060606060606, "acc_norm_stderr": 0.027045948825865394},
    "harness|hendrycksTest-high_school_geography|5": {"acc": 0.9191919191919192, "acc_stderr": 0.019417681889724536, "acc_norm": 0.9191919191919192, "acc_norm_stderr": 0.019417681889724536},
    "harness|hendrycksTest-high_school_government_and_politics|5": {"acc": 0.9689119170984456, "acc_stderr": 0.012525310625527033, "acc_norm": 0.9689119170984456, "acc_norm_stderr": 0.012525310625527033},
    "harness|hendrycksTest-high_school_macroeconomics|5": {"acc": 0.8179487179487179, "acc_stderr": 0.0195652367829309, "acc_norm": 0.8179487179487179, "acc_norm_stderr": 0.0195652367829309},
    "harness|hendrycksTest-high_school_mathematics|5": {"acc": 0.4703703703703704, "acc_stderr": 0.030431963547936584, "acc_norm": 0.4703703703703704, "acc_norm_stderr": 0.030431963547936584},
    "harness|hendrycksTest-high_school_microeconomics|5": {"acc": 0.8235294117647058, "acc_stderr": 0.02476290267805791, "acc_norm": 0.8235294117647058, "acc_norm_stderr": 0.02476290267805791},
    "harness|hendrycksTest-high_school_physics|5": {"acc": 0.4966887417218543, "acc_stderr": 0.04082393379449654, "acc_norm": 0.4966887417218543, "acc_norm_stderr": 0.04082393379449654},
    "harness|hendrycksTest-high_school_psychology|5": {"acc": 0.9137614678899083, "acc_stderr": 0.012035597300116245, "acc_norm": 0.9137614678899083, "acc_norm_stderr": 0.012035597300116245},
    "harness|hendrycksTest-high_school_statistics|5": {"acc": 0.6666666666666666, "acc_stderr": 0.0321495214780275, "acc_norm": 0.6666666666666666, "acc_norm_stderr": 0.0321495214780275},
    "harness|hendrycksTest-high_school_us_history|5": {"acc": 0.9166666666666666, "acc_stderr": 0.019398452135813905, "acc_norm": 0.9166666666666666, "acc_norm_stderr": 0.019398452135813905},
    "harness|hendrycksTest-high_school_world_history|5": {"acc": 0.9071729957805907, "acc_stderr": 0.01888975055095671, "acc_norm": 0.9071729957805907, "acc_norm_stderr": 0.01888975055095671},
    "harness|hendrycksTest-human_aging|5": {"acc": 0.8026905829596412, "acc_stderr": 0.02670985334496796, "acc_norm": 0.8026905829596412, "acc_norm_stderr": 0.02670985334496796},
    "harness|hendrycksTest-human_sexuality|5": {"acc": 0.8702290076335878, "acc_stderr": 0.029473649496907065, "acc_norm": 0.8702290076335878, "acc_norm_stderr": 0.029473649496907065},
    "harness|hendrycksTest-international_law|5": {"acc": 0.8925619834710744, "acc_stderr": 0.028268812192540637, "acc_norm": 0.8925619834710744, "acc_norm_stderr": 0.028268812192540637},
    "harness|hendrycksTest-jurisprudence|5": {"acc": 0.8981481481481481, "acc_stderr": 0.02923927267563275, "acc_norm": 0.8981481481481481, "acc_norm_stderr": 0.02923927267563275},
    "harness|hendrycksTest-logical_fallacies|5": {"acc": 0.8650306748466258, "acc_stderr": 0.026845765054553838, "acc_norm": 0.8650306748466258, "acc_norm_stderr": 0.026845765054553838},
    "harness|hendrycksTest-machine_learning|5": {"acc": 0.5178571428571429, "acc_stderr": 0.047427623612430116, "acc_norm": 0.5178571428571429, "acc_norm_stderr": 0.047427623612430116},
    "harness|hendrycksTest-management|5": {"acc": 0.8737864077669902, "acc_stderr": 0.03288180278808628, "acc_norm": 0.8737864077669902, "acc_norm_stderr": 0.03288180278808628},
    "harness|hendrycksTest-marketing|5": {"acc": 0.9444444444444444, "acc_stderr": 0.015006312806446912, "acc_norm": 0.9444444444444444, "acc_norm_stderr": 0.015006312806446912},
    "harness|hendrycksTest-medical_genetics|5": {"acc": 0.9, "acc_stderr": 0.03015113445777634, "acc_norm": 0.9, "acc_norm_stderr": 0.03015113445777634},
    "harness|hendrycksTest-miscellaneous|5": {"acc": 0.9016602809706258, "acc_stderr": 0.010648356301876338, "acc_norm": 0.9016602809706258, "acc_norm_stderr": 0.010648356301876338},
    "harness|hendrycksTest-moral_disputes|5": {"acc": 0.815028901734104, "acc_stderr": 0.02090397584208303, "acc_norm": 0.815028901734104, "acc_norm_stderr": 0.02090397584208303},
    "harness|hendrycksTest-moral_scenarios|5": {"acc": 0.7988826815642458, "acc_stderr": 0.013405946402609049, "acc_norm": 0.7988826815642458, "acc_norm_stderr": 0.013405946402609049},
    "harness|hendrycksTest-nutrition|5": {"acc": 0.8529411764705882, "acc_stderr": 0.020279402936174588, "acc_norm": 0.8529411764705882, "acc_norm_stderr": 0.020279402936174588},
    "harness|hendrycksTest-philosophy|5": {"acc": 0.8135048231511254, "acc_stderr": 0.022122439772480768, "acc_norm": 0.8135048231511254, "acc_norm_stderr": 0.022122439772480768},
    "harness|hendrycksTest-prehistory|5": {"acc": 0.8672839506172839, "acc_stderr": 0.018877353839571842, "acc_norm": 0.8672839506172839, "acc_norm_stderr": 0.018877353839571842},
    "harness|hendrycksTest-professional_accounting|5": {"acc": 0.624113475177305, "acc_stderr": 0.028893955412115875, "acc_norm": 0.624113475177305, "acc_norm_stderr": 0.028893955412115875},
    "harness|hendrycksTest-professional_law|5": {"acc": 0.5808344198174706, "acc_stderr": 0.012602244505788224, "acc_norm": 0.5808344198174706, "acc_norm_stderr": 0.012602244505788224},
    "harness|hendrycksTest-professional_medicine|5": {"acc": 0.8308823529411765, "acc_stderr": 0.022770868010113018, "acc_norm": 0.8308823529411765, "acc_norm_stderr": 0.022770868010113018},
    "harness|hendrycksTest-professional_psychology|5": {"acc": 0.8120915032679739, "acc_stderr": 0.015803565736776694, "acc_norm": 0.8120915032679739, "acc_norm_stderr": 0.015803565736776694},
    "harness|hendrycksTest-public_relations|5": {"acc": 0.7363636363636363, "acc_stderr": 0.04220224692971987, "acc_norm": 0.7363636363636363, "acc_norm_stderr": 0.04220224692971987},
    "harness|hendrycksTest-security_studies|5": {"acc": 0.8285714285714286, "acc_stderr": 0.02412746346265015, "acc_norm": 0.8285714285714286, "acc_norm_stderr": 0.02412746346265015},
    "harness|hendrycksTest-sociology|5": {"acc": 0.900497512437811, "acc_stderr": 0.021166216304659393, "acc_norm": 0.900497512437811, "acc_norm_stderr": 0.021166216304659393},
    "harness|hendrycksTest-us_foreign_policy|5": {"acc": 0.89, "acc_stderr": 0.03144660377352203, "acc_norm": 0.89, "acc_norm_stderr": 0.03144660377352203},
    "harness|hendrycksTest-virology|5": {"acc": 0.5843373493975904, "acc_stderr": 0.03836722176598053, "acc_norm": 0.5843373493975904, "acc_norm_stderr": 0.03836722176598053},
    "harness|hendrycksTest-world_religions|5": {"acc": 0.8888888888888888, "acc_stderr": 0.02410338420207286, "acc_norm": 0.8888888888888888, "acc_norm_stderr": 0.02410338420207286},
    "harness|truthfulqa:mc|0": {"mc1": 0.5642594859241126, "mc1_stderr": 0.01735834539886313, "mc2": 0.7142422056307771, "mc2_stderr": 0.014238871538897193},
    "harness|winogrande|5": {"acc": 0.8271507498026835, "acc_stderr": 0.010626964529971868},
    "harness|gsm8k|5": {"acc": 0.6072782410917361, "acc_stderr": 0.013451745349586576}
}
```
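The task keys in these results follow a `suite|task|n` pattern, and the loading example uses the config name `harness_winogrande_5` for the key `harness|winogrande|5`. The sketch below shows one way to work with such keys. It rests on two assumptions not stated in the source: that the last field is the few-shot count, and that a config name is the key with every run of non-alphanumeric characters replaced by `_`. The helper names are hypothetical, and `results_excerpt` copies only a few values from the results listed on this page.

```python
import re
from statistics import mean

# A small excerpt of the reported results; the full dictionary has one entry per task.
results_excerpt = {
    "harness|arc:challenge|25": {"acc_norm": 0.726962457337884},
    "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.46},
    "harness|hendrycksTest-anatomy|5": {"acc": 0.7481481481481481},
    "harness|winogrande|5": {"acc": 0.8271507498026835},
}


def parse_task_key(key: str) -> tuple[str, str, int]:
    """Split 'suite|task|n' into its parts (n is assumed to be the few-shot count)."""
    suite, task, shots = key.split("|")
    return suite, task, int(shots)


def config_name(key: str) -> str:
    """Map a task key to a config name (assumed convention)."""
    return re.sub(r"[^0-9A-Za-z]+", "_", key)


def mean_accuracy(results: dict, task_prefix: str) -> float:
    """Average the 'acc' metric over all tasks whose name starts with task_prefix."""
    values = [
        metrics["acc"]
        for key, metrics in results.items()
        if key != "all"
        and parse_task_key(key)[1].startswith(task_prefix)
        and "acc" in metrics
    ]
    return mean(values)


print(config_name("harness|winogrande|5"))  # harness_winogrande_5
print(parse_task_key("harness|arc:challenge|25"))
print(mean_accuracy(results_excerpt, "hendrycksTest-"))
```

Running `mean_accuracy` over the full dictionary with the `hendrycksTest-` prefix would give a single average across those subject tasks. The excerpt here averages only two of them, so its output is not a leaderboard score.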
