
open-llm-leaderboard-old/details_InnerI__InnerI-AI-sn6-7B-slerp

Source: Hugging Face. Updated: 2024-03-09. Indexed: 2024-06-22.
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_InnerI__InnerI-AI-sn6-7B-slerp
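Resource description:
This dataset was generated automatically while evaluating the model InnerI/InnerI-AI-sn6-7B-slerp on the Open LLM Leaderboard. It contains 63 configurations, each corresponding to one evaluation task. The dataset was produced by 1 run; the results of each run are stored as a separate split named after the run's timestamp. The train split always points to the latest results. In addition, the results configuration stores the aggregated results of all runs, which are used to compute and display the aggregated metrics on the Open LLM Leaderboard.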

Provider:
open-llm-leaderboard-old
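Summary of original information

Dataset overview

Dataset introduction

This dataset was created automatically during the evaluation of the model InnerI/InnerI-AI-sn6-7B-slerp on the Open LLM Leaderboard. It contains 63 configurations, each corresponding to one evaluation task.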

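Dataset structure

The dataset was created from 1 run. Each run appears as a specific split in each configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results.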
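The relationship between timestamped splits and the "train" alias can be illustrated with a small sketch. It assumes that split names parse as ISO 8601 timestamps, which may not match the exact naming used in the repository. `latest_split` is a hypothetical helper, and the second timestamp below is made up for illustration (this dataset has only one run).

```python
from datetime import datetime


def latest_split(split_names):
    """Return the timestamp-named split with the most recent timestamp."""
    # Skip the "train" alias itself; it is not a timestamp.
    stamped = [name for name in split_names if name != "train"]
    return max(stamped, key=datetime.fromisoformat)


# The first name is the run timestamp reported on this page;
# the second is an invented earlier run, for illustration only.
splits = ["2024-03-09T23:27:33.296041", "2024-03-01T08:00:00.000000", "train"]
print(latest_split(splits))
```

Under these assumptions, "train" would resolve to the same data as the split returned by `latest_split`.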

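Additional configuration

An additional configuration, "results", stores the aggregated results of all runs. These are used to compute and display the aggregated metrics on the Open LLM Leaderboard.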

Data loading example

```python
from datasets import load_dataset

# Load the latest 5-shot Winogrande details ("train" points to the latest run).
data = load_dataset(
    "open-llm-leaderboard/details_InnerI__InnerI-AI-sn6-7B-slerp",
    "harness_winogrande_5",
    split="train",
)
```
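The repository and configuration names in the loading call appear to follow a pattern: `details_` plus the model ID with `/` replaced by `__`, and `harness_<task>_<n_shot>` for the configuration. The helpers below are a hypothetical sketch of that pattern, inferred only from the names shown on this page; they are not part of the `datasets` library, and the pattern may not hold for every model or task.

```python
def details_repo_id(model_id, org="open-llm-leaderboard"):
    """Build a details repository ID from a model ID (assumed naming pattern)."""
    # "InnerI/InnerI-AI-sn6-7B-slerp" -> "details_InnerI__InnerI-AI-sn6-7B-slerp"
    return f"{org}/details_{model_id.replace('/', '__')}"


def harness_config(task, n_shot):
    """Build a configuration name such as "harness_winogrande_5" (assumed pattern)."""
    return f"harness_{task}_{n_shot}"


print(details_repo_id("InnerI/InnerI-AI-sn6-7B-slerp"))
print(harness_config("winogrande", 5))
```

The two strings can be passed to `load_dataset` as the repository and configuration arguments. Note that the download link on this page lists the repository under `open-llm-leaderboard-old`, so `org="open-llm-leaderboard-old"` may be the value that resolves.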

Latest results

The latest results, from the run of 2024-03-09T23:27:33.296041:

```python
{
    "all": {
        "acc": 0.5863710004612449,
        "acc_stderr": 0.03343473736042695,
        "acc_norm": 0.5912816909298625,
        "acc_norm_stderr": 0.034112482625008565,
        "mc1": 0.386780905752754,
        "mc1_stderr": 0.01704885701051511,
        "mc2": 0.5470215227427601,
        "mc2_stderr": 0.015011831793917758
    },
    "harness|arc:challenge|25": {
        "acc": 0.5298634812286689,
        "acc_stderr": 0.014585305840007105,
        "acc_norm": 0.5836177474402731,
        "acc_norm_stderr": 0.014405618279436174
    },
    "harness|hellaswag|10": {
        "acc": 0.5883290181238797,
        "acc_stderr": 0.0049113035697697935,
        "acc_norm": 0.7758414658434575,
        "acc_norm_stderr": 0.004161746750401134
    },
    "harness|hendrycksTest-abstract_algebra|5": {
        "acc": 0.33,
        "acc_stderr": 0.047258156262526066,
        "acc_norm": 0.33,
        "acc_norm_stderr": 0.047258156262526066
    },
    "harness|hendrycksTest-anatomy|5": {
        "acc": 0.5481481481481482,
        "acc_stderr": 0.04299268905480864,
        "acc_norm": 0.5481481481481482,
        "acc_norm_stderr": 0.04299268905480864
    },
    "harness|hendrycksTest-astronomy|5": {
        "acc": 0.618421052631579,
        "acc_stderr": 0.03953173377749194,
        "acc_norm": 0.618421052631579,
        "acc_norm_stderr": 0.03953173377749194
    },
    "harness|hendrycksTest-business_ethics|5": {
        "acc": 0.55,
        "acc_stderr": 0.049999999999999996,
        "acc_norm": 0.55,
        "acc_norm_stderr": 0.049999999999999996
    },
    "harness|hendrycksTest-clinical_knowledge|5": {
        "acc": 0.6792452830188679,
        "acc_stderr": 0.028727502957880267,
        "acc_norm": 0.6792452830188679,
        "acc_norm_stderr": 0.028727502957880267
    },
    "harness|hendrycksTest-college_biology|5": {
        "acc": 0.6319444444444444,
        "acc_stderr": 0.04032999053960719,
        "acc_norm": 0.6319444444444444,
        "acc_norm_stderr": 0.04032999053960719
    },
    "harness|hendrycksTest-college_chemistry|5": {
        "acc": 0.44,
        "acc_stderr": 0.04988876515698589,
        "acc_norm": 0.44,
        "acc_norm_stderr": 0.04988876515698589
    },
    "harness|hendrycksTest-college_computer_science|5": {
        "acc": 0.5,
        "acc_stderr": 0.050251890762960605,
        "acc_norm": 0.5,
        "acc_norm_stderr": 0.050251890762960605
    },
    "harness|hendrycksTest-college_mathematics|5": {
        "acc": 0.33,
        "acc_stderr": 0.047258156262526045,
        "acc_norm": 0.33,
        "acc_norm_stderr": 0.047258156262526045
    },
    "harness|hendrycksTest-college_medicine|5": {
        "acc": 0.6069364161849711,
        "acc_stderr": 0.03724249595817731,
        "acc_norm": 0.6069364161849711,
        "acc_norm_stderr": 0.03724249595817731
    },
    "harness|hendrycksTest-college_physics|5": {
        "acc": 0.3235294117647059,
        "acc_stderr": 0.046550104113196177,
        "acc_norm": 0.3235294117647059,
        "acc_norm_stderr": 0.046550104113196177
    },
    "harness|hendrycksTest-computer_security|5": {
        "acc": 0.74,
        "acc_stderr": 0.044084400227680794,
        "acc_norm": 0.74,
        "acc_norm_stderr": 0.044084400227680794
    },
    "harness|hendrycksTest-conceptual_physics|5": {
        "acc": 0.5063829787234042,
        "acc_stderr": 0.032683358999363366,
        "acc_norm": 0.5063829787234042,
        "acc_norm_stderr": 0.032683358999363366
    },
    "harness|hendrycksTest-econometrics|5": {
        "acc": 0.41228070175438597,
        "acc_stderr": 0.046306532033665956,
        "acc_norm": 0.41228070175438597,
        "acc_norm_stderr": 0.046306532033665956
    },
    "harness|hendrycksTest-electrical_engineering|5": {
        "acc": 0.5379310344827586,
        "acc_stderr": 0.04154659671707548,
        "acc_norm": 0.5379310344827586,
        "acc_norm_stderr": 0.04154659671707548
    },
    "harness|hendrycksTest-elementary_mathematics|5": {
        "acc": 0.4126984126984127,
        "acc_stderr": 0.02535574126305527,
        "acc_norm": 0.4126984126984127,
        "acc_norm_stderr": 0.02535574126305527
    },
    "harness|hendrycksTest-formal_logic|5": {
        "acc": 0.36507936507936506,
        "acc_stderr": 0.04306241259127153,
        "acc_norm": 0.36507936507936506,
        "acc_norm_stderr": 0.04306241259127153
    },
    "harness|hendrycksTest-global_facts|5": {
        "acc": 0.38,
        "acc_stderr": 0.04878317312145632,
        "acc_norm": 0.38,
        "acc_norm_stderr": 0.04878317312145632
    },
    "harness|hendrycksTest-high_school_biology|5": {
        "acc": 0.7032258064516129,
        "acc_stderr": 0.025988500792411898,
        "acc_norm": 0.7032258064516129,
        "acc_norm_stderr": 0.025988500792411898
    },
    "harness|hendrycksTest-high_school_chemistry|5": {
        "acc": 0.45320197044334976,
        "acc_stderr": 0.035025446508458714,
        "acc_norm": 0.45320197044334976,
        "acc_norm_stderr": 0.035025446508458714
    },
    "harness|hendrycksTest-high_school_computer_science|5": {
        "acc": 0.67,
        "acc_stderr": 0.04725815626252609,
        "acc_norm": 0.67,
        "acc_norm_stderr": 0.04725815626252609
    },
    "harness|hendrycksTest-high_school_european_history|5": {
        "acc": 0.7272727272727273,
        "acc_stderr": 0.0347769116216366,
        "acc_norm": 0.7272727272727273,
        "acc_norm_stderr": 0.0347769116216366
    },
    "harness|hendrycksTest-high_school_geography|5": {
        "acc": 0.7424242424242424,
        "acc_stderr": 0.031156269519646836,
        "acc_norm": 0.7424242424242424,
        "acc_norm_stderr": 0.031156269519646836
    },
    "harness|hendrycksTest-high_school_government_and_politics|5": {
        "acc": 0.8031088082901554,
        "acc_stderr": 0.02869787397186068,
        "acc_norm": 0.8031088082901554,
        "acc_norm_stderr": 0.02869787397186068
    },
    "harness|hendrycksTest-high_school_macroeconomics|5": {
        "acc": 0.5692307692307692,
        "acc_stderr": 0.025106820660539753,
        "acc_norm": 0.5692307692307692,
        "acc_norm_stderr": 0.025106820660539753
    },
    "harness|hendrycksTest-high_school_mathematics|5": {
        "acc": 0.32592592592592595,
        "acc_stderr": 0.028578348365473082,
        "acc_norm": 0.32592592592592595
```
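To work with results of this shape, a short sketch can split each key into its parts and turn a score and its standard error into an approximate 95% interval. Two things here are assumptions rather than documented behavior: the reading of each key as `<suite>|<task>|<n_shot>`, and the normal approximation (score ± 1.96 × stderr). `parse_key` and `interval_95` are hypothetical helpers, and only three entries from the latest results are reproduced.

```python
# Three entries copied from the latest results reported on this page.
results = {
    "harness|arc:challenge|25": {
        "acc_norm": 0.5836177474402731,
        "acc_norm_stderr": 0.014405618279436174,
    },
    "harness|hellaswag|10": {
        "acc_norm": 0.7758414658434575,
        "acc_norm_stderr": 0.004161746750401134,
    },
    "harness|hendrycksTest-anatomy|5": {
        "acc_norm": 0.5481481481481482,
        "acc_norm_stderr": 0.04299268905480864,
    },
}


def parse_key(key):
    """Split "<suite>|<task>|<n_shot>" into its parts (assumed key layout)."""
    suite, task, n_shot = key.split("|")
    return suite, task, int(n_shot)


def interval_95(score, stderr):
    """Approximate 95% interval using the normal approximation."""
    half_width = 1.96 * stderr
    return score - half_width, score + half_width


for key, metrics in results.items():
    _, task, n_shot = parse_key(key)
    low, high = interval_95(metrics["acc_norm"], metrics["acc_norm_stderr"])
    print(f"{task} ({n_shot}-shot): {metrics['acc_norm']:.4f} [{low:.4f}, {high:.4f}]")
```

The wide intervals on small subtasks are a reminder that per-task differences of a few points may not be meaningful on their own.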
