
open-llm-leaderboard-old/details_jondurbin__nontoxic-bagel-34b-v0.2

Hugging Face | Updated: 2024-01-05 | Indexed: 2024-06-22
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_jondurbin__nontoxic-bagel-34b-v0.2
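Resource summary:
This dataset was created automatically during the evaluation run of the model jondurbin/nontoxic-bagel-34b-v0.2. It consists of 63 configurations, each corresponding to one evaluation task. The dataset was created from 1 run. Each run appears as a specific split in each configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results. An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.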
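Because every run gets its own timestamp-named split and "train" is only an alias for the newest one, a reader comparing runs may want to select the latest split explicitly. The sketch below assumes split names look like `2024_01_05T02_55_21.348986` (the run timestamp with `-` and `:` replaced by `_`) and that alias splits are called "train" or "latest"; both are assumptions, not facts stated on this page, and the helper names are hypothetical. With only one run in this dataset, the result coincides with "train".

```python
from datetime import datetime

# Assumed split-name format: run timestamp with "-" and ":" replaced by "_".
SPLIT_FORMAT = "%Y_%m_%dT%H_%M_%S.%f"

# Alias splits that point at a run rather than naming one ("latest" is an assumption).
ALIASES = {"train", "latest"}


def split_to_datetime(split: str) -> datetime:
    """Parse a timestamp-named split such as '2024_01_05T02_55_21.348986'."""
    return datetime.strptime(split, SPLIT_FORMAT)


def latest_run_split(splits: list[str]) -> str:
    """Return the newest timestamp-named split, skipping alias splits."""
    runs = [s for s in splits if s not in ALIASES]
    return max(runs, key=split_to_datetime)


if __name__ == "__main__":
    # The first entry is a hypothetical earlier run, for illustration only.
    splits = ["2023_12_30T10_00_00.000001", "2024_01_05T02_55_21.348986", "train"]
    print(latest_run_split(splits))
```

Parsing the names into `datetime` objects, rather than comparing strings, keeps the ordering correct even if a different naming scheme with non-zero-padded fields were used.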

Provider:
open-llm-leaderboard-old
Summary of the original information

Dataset overview

This dataset was created automatically during the evaluation run of the model jondurbin/nontoxic-bagel-34b-v0.2, whose evaluation results are published on the Open LLM Leaderboard.

Dataset composition

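  • The dataset contains 63 configurations, each corresponding to one evaluation task.
  • The dataset was created from 1 run. Each run appears as a specific split in every configuration, and the split is named after the run's timestamp.
  • The "train" split always points to the latest results.
  • An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.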
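The "results" configuration feeds the leaderboard's aggregated metrics. As a minimal sketch, assuming the Open LLM Leaderboard v1 convention (acc_norm for ARC and HellaSwag, the mean acc over the hendrycksTest (MMLU) subtasks, mc2 for TruthfulQA, acc for Winogrande and GSM8K), a headline average could be computed from a dictionary shaped like the one under "Latest results". The TruthfulQA and GSM8K keys are assumptions because they fall in the truncated part of the results, and this is not the leaderboard's own code.

```python
from statistics import mean

# Assumed (key, metric) per benchmark, following the leaderboard v1 convention.
# The TruthfulQA and GSM8K keys are not visible in the truncated results.
BENCHMARKS = {
    "ARC": ("harness|arc:challenge|25", "acc_norm"),
    "HellaSwag": ("harness|hellaswag|10", "acc_norm"),
    "TruthfulQA": ("harness|truthfulqa:mc|0", "mc2"),
    "Winogrande": ("harness|winogrande|5", "acc"),
    "GSM8K": ("harness|gsm8k|5", "acc"),
}
MMLU_PREFIX = "harness|hendrycksTest-"


def mmlu_score(results: dict) -> float:
    """Unweighted mean of 'acc' over every hendrycksTest (MMLU) subtask present."""
    return mean(v["acc"] for k, v in results.items() if k.startswith(MMLU_PREFIX))


def leaderboard_average(results: dict) -> float:
    """Mean of the five single-task scores plus the MMLU average."""
    scores = [results[key][metric] for key, metric in BENCHMARKS.values()]
    scores.append(mmlu_score(results))
    return mean(scores)
```

Averaging MMLU subtasks without weighting by question count is part of the stated assumption; a weighted mean would give a slightly different number.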

Data loading example

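```python
from datasets import load_dataset

# Per-sample details for the 5-shot Winogrande task; "train" always points to the latest run.
data = load_dataset(
    "open-llm-leaderboard-old/details_jondurbin__nontoxic-bagel-34b-v0.2",
    "harness_winogrande_5",
    split="train",
)
```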
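Config names appear to follow the result keys listed under "Latest results": `harness|winogrande|5` corresponds to `harness_winogrande_5`. The sketch below assumes that `|`, `:` and `-` all map to `_`; that rule is inferred from this single example, and the helper names are hypothetical. Loading is wrapped in a function so the mapping itself works without the `datasets` package or network access.

```python
import re

# Repository ID as given on this page.
REPO_ID = "open-llm-leaderboard-old/details_jondurbin__nontoxic-bagel-34b-v0.2"


def result_key_to_config(key: str) -> str:
    """Map e.g. 'harness|arc:challenge|25' to 'harness_arc_challenge_25' (assumed rule)."""
    return re.sub(r"[|:\-]", "_", key)


def load_task_details(key: str, split: str = "train"):
    """Load per-sample details for one task; needs `datasets` and network access."""
    from datasets import load_dataset  # imported lazily so the mapping works without it

    return load_dataset(REPO_ID, result_key_to_config(key), split=split)
```

For example, `load_task_details("harness|hellaswag|10")` would request the `harness_hellaswag_10` configuration, if the assumed naming rule holds.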

Latest results

These are the latest results from the run at 2024-01-05T02:55:21.348986:

```python
{
  "all": { "acc": 0.7594956082544593, "acc_stderr": 0.028345085033316512, "acc_norm": 0.7650118685420522, "acc_norm_stderr": 0.028868671238544558, "mc1": 0.5826193390452876, "mc1_stderr": 0.017262891063272164, "mc2": 0.7269948354406905, "mc2_stderr": 0.014159145919355787 },
  "harness|arc:challenge|25": { "acc": 0.7005119453924915, "acc_stderr": 0.013385021637313572, "acc_norm": 0.7244027303754266, "acc_norm_stderr": 0.01305716965576184 },
  "harness|hellaswag|10": { "acc": 0.6645090619398526, "acc_stderr": 0.004711968379069026, "acc_norm": 0.8564031069508066, "acc_norm_stderr": 0.003499638255180272 },
  "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.45, "acc_stderr": 0.05, "acc_norm": 0.45, "acc_norm_stderr": 0.05 },
  "harness|hendrycksTest-anatomy|5": { "acc": 0.7407407407407407, "acc_stderr": 0.03785714465066653, "acc_norm": 0.7407407407407407, "acc_norm_stderr": 0.03785714465066653 },
  "harness|hendrycksTest-astronomy|5": { "acc": 0.8618421052631579, "acc_stderr": 0.028081042939576552, "acc_norm": 0.8618421052631579, "acc_norm_stderr": 0.028081042939576552 },
  "harness|hendrycksTest-business_ethics|5": { "acc": 0.77, "acc_stderr": 0.04229525846816505, "acc_norm": 0.77, "acc_norm_stderr": 0.04229525846816505 },
  "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.8113207547169812, "acc_stderr": 0.024079995130062253, "acc_norm": 0.8113207547169812, "acc_norm_stderr": 0.024079995130062253 },
  "harness|hendrycksTest-college_biology|5": { "acc": 0.8819444444444444, "acc_stderr": 0.026983346503309382, "acc_norm": 0.8819444444444444, "acc_norm_stderr": 0.026983346503309382 },
  "harness|hendrycksTest-college_chemistry|5": { "acc": 0.5, "acc_stderr": 0.050251890762960605, "acc_norm": 0.5, "acc_norm_stderr": 0.050251890762960605 },
  "harness|hendrycksTest-college_computer_science|5": { "acc": 0.66, "acc_stderr": 0.04760952285695237, "acc_norm": 0.66, "acc_norm_stderr": 0.04760952285695237 },
  "harness|hendrycksTest-college_mathematics|5": { "acc": 0.44, "acc_stderr": 0.04988876515698589, "acc_norm": 0.44, "acc_norm_stderr": 0.04988876515698589 },
  "harness|hendrycksTest-college_medicine|5": { "acc": 0.7572254335260116, "acc_stderr": 0.0326926380614177, "acc_norm": 0.7572254335260116, "acc_norm_stderr": 0.0326926380614177 },
  "harness|hendrycksTest-college_physics|5": { "acc": 0.5490196078431373, "acc_stderr": 0.04951218252396262, "acc_norm": 0.5490196078431373, "acc_norm_stderr": 0.04951218252396262 },
  "harness|hendrycksTest-computer_security|5": { "acc": 0.81, "acc_stderr": 0.039427724440366234, "acc_norm": 0.81, "acc_norm_stderr": 0.039427724440366234 },
  "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.7702127659574468, "acc_stderr": 0.027501752944412417, "acc_norm": 0.7702127659574468, "acc_norm_stderr": 0.027501752944412417 },
  "harness|hendrycksTest-econometrics|5": { "acc": 0.6052631578947368, "acc_stderr": 0.04598188057816542, "acc_norm": 0.6052631578947368, "acc_norm_stderr": 0.04598188057816542 },
  "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.7241379310344828, "acc_stderr": 0.037245636197746304, "acc_norm": 0.7241379310344828, "acc_norm_stderr": 0.037245636197746304 },
  "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.7142857142857143, "acc_stderr": 0.023266512213730578, "acc_norm": 0.7142857142857143, "acc_norm_stderr": 0.023266512213730578 },
  "harness|hendrycksTest-formal_logic|5": { "acc": 0.6031746031746031, "acc_stderr": 0.0437588849272706, "acc_norm": 0.6031746031746031, "acc_norm_stderr": 0.0437588849272706 },
  "harness|hendrycksTest-global_facts|5": { "acc": 0.56, "acc_stderr": 0.04988876515698589, "acc_norm": 0.56, "acc_norm_stderr": 0.04988876515698589 },
  "harness|hendrycksTest-high_school_biology|5": { "acc": 0.9096774193548387, "acc_stderr": 0.016306570644488313, "acc_norm": 0.9096774193548387, "acc_norm_stderr": 0.016306570644488313 },
  "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.6305418719211823, "acc_stderr": 0.03395970381998573, "acc_norm": 0.6305418719211823, "acc_norm_stderr": 0.03395970381998573 },
  "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.81, "acc_stderr": 0.039427724440366234, "acc_norm": 0.81, "acc_norm_stderr": 0.039427724440366234 },
  "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.8666666666666667, "acc_stderr": 0.026544435312706463, "acc_norm": 0.8666666666666667, "acc_norm_stderr": 0.026544435312706463 },
  "harness|hendrycksTest-high_school_geography|5": { "acc": 0.9191919191919192, "acc_stderr": 0.019417681889724536, "acc_norm": 0.9191919191919192, "acc_norm_stderr": 0.019417681889724536 },
  "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.9637305699481865, "acc_stderr": 0.013492659751295133, "acc_norm": 0.9637305699481865, "acc_norm_stderr": 0.013492659751295133 },
  "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.8102564102564103, "acc_stderr": 0.01988016540658878, "acc_norm": 0.8102564102564103, "acc_norm_stderr": 0.01988016540658878 },
  "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.45555555555555555, "acc_stderr": 0.03036486250482443, "acc_norm": 0.45555555555555555, "acc_norm_stderr": 0.03036486250482443 },
  "har
```
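Each score in the results comes with a `_stderr` companion. One rough way to read it, assuming a normal approximation (an interpretive choice, not something this page states), is an approximate 95% interval of score ± 1.96 × stderr. The helper name is hypothetical; the numbers are the ARC-Challenge acc_norm and acc_norm_stderr values copied from the results.

```python
def confidence_interval(score: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval: score +/- z * stderr (z = 1.96 for roughly 95%)."""
    half_width = z * stderr
    return score - half_width, score + half_width


# ARC-Challenge (25-shot) acc_norm and its standard error, from the results above.
lo, hi = confidence_interval(0.7244027303754266, 0.01305716965576184)
print(f"ARC acc_norm ~95% CI: [{lo:.4f}, {hi:.4f}]")
```

Applied to the abstract_algebra subtask (acc 0.45, stderr 0.05), the same formula gives about 0.45 ± 0.098, so small differences between individual MMLU subtasks should be read with caution.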
