
open-llm-leaderboard-old/details_brucethemoose__CapyTessBorosYi-34B-200K-DARE-Ties

Source: Hugging Face. Updated: 2023-12-05. Indexed: 2024-06-22.
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_brucethemoose__CapyTessBorosYi-34B-200K-DARE-Ties
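Resource summary:
This dataset was created automatically during the evaluation of the model brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties. It consists of 63 configurations, each corresponding to one evaluation task. The dataset was created from 1 run; each run appears as a specific split in each configuration, and the split is named after the run's timestamp. An additional configuration named "results" stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.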

Provider:
open-llm-leaderboard-old
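Summary of original information

Dataset overview

Dataset source

This dataset was created automatically during an evaluation run of the model brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties on the Open LLM Leaderboard.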

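Dataset structure

  • The dataset contains 63 configurations, each corresponding to one evaluation task.
  • The dataset was created from 1 run. Each run appears as a specific split in each configuration, and the split is named after the run's timestamp.
  • The "train" split always points to the latest results.
  • An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.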
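
The naming scheme described in this list can be sketched in a few lines of Python. The sketch rests on two assumptions that this page does not state: that a split name is the run timestamp with "-" and ":" replaced by "_", and that a configuration name is a results key (such as "harness|winogrande|5") with every character other than letters, digits and "_" replaced by "_". Both helper names are hypothetical.

```python
import re


def split_name_from_timestamp(timestamp: str) -> str:
    """Turn a run timestamp into a split name (assumed convention)."""
    # Assumption: '-' and ':' become '_'; the '.' before the
    # fractional seconds is kept unchanged.
    return timestamp.replace("-", "_").replace(":", "_")


def config_name_from_task_key(task_key: str) -> str:
    """Turn a results key into a configuration name (assumed convention)."""
    # Assumption: any character that is not a letter, digit or '_'
    # (for example '|', ':' or '-') becomes '_'.
    return re.sub(r"[^0-9A-Za-z_]", "_", task_key)


print(split_name_from_timestamp("2023-12-05T03:16:54.690977"))
print(config_name_from_task_key("harness|winogrande|5"))
```

Under these assumptions the second call yields "harness_winogrande_5", which matches the configuration name used in the loading example on this page. Listing the dataset's actual configurations and splits is the reliable way to confirm either convention.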

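Data loading example

    from datasets import load_dataset

    # Load the details for one evaluation task; the "train" split points to the latest results.
    # Note: this page lists the repository under "open-llm-leaderboard-old"; the ID below is as given in the source.
    data = load_dataset(
        "open-llm-leaderboard/details_brucethemoose__CapyTessBorosYi-34B-200K-DARE-Ties",
        "harness_winogrande_5",
        split="train",
    )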

Latest results

The latest results, from the run at 2023-12-05T03:16:54.690977:

    {
        "all": { "acc": 0.7567711901753588, "acc_stderr": 0.028382267920122734, "acc_norm": 0.7615616815437645, "acc_norm_stderr": 0.028914131489708655, "mc1": 0.40514075887392903, "mc1_stderr": 0.017185611727753368, "mc2": 0.5583921075323958, "mc2_stderr": 0.015750345067611658 },
        "harness|arc:challenge|25": { "acc": 0.6203071672354948, "acc_stderr": 0.014182119866974872, "acc_norm": 0.6493174061433447, "acc_norm_stderr": 0.013944635930726097 },
        "harness|hellaswag|10": { "acc": 0.6693885680143398, "acc_stderr": 0.004694718918225748, "acc_norm": 0.8591913961362279, "acc_norm_stderr": 0.0034711315448920457 },
        "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.48, "acc_stderr": 0.050211673156867795, "acc_norm": 0.48, "acc_norm_stderr": 0.050211673156867795 },
        "harness|hendrycksTest-anatomy|5": { "acc": 0.7407407407407407, "acc_stderr": 0.03785714465066653, "acc_norm": 0.7407407407407407, "acc_norm_stderr": 0.03785714465066653 },
        "harness|hendrycksTest-astronomy|5": { "acc": 0.9078947368421053, "acc_stderr": 0.02353268597044349, "acc_norm": 0.9078947368421053, "acc_norm_stderr": 0.02353268597044349 },
        "harness|hendrycksTest-business_ethics|5": { "acc": 0.77, "acc_stderr": 0.04229525846816506, "acc_norm": 0.77, "acc_norm_stderr": 0.04229525846816506 },
        "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.8301886792452831, "acc_stderr": 0.02310839379984132, "acc_norm": 0.8301886792452831, "acc_norm_stderr": 0.02310839379984132 },
        "harness|hendrycksTest-college_biology|5": { "acc": 0.8888888888888888, "acc_stderr": 0.026280550932848076, "acc_norm": 0.8888888888888888, "acc_norm_stderr": 0.026280550932848076 },
        "harness|hendrycksTest-college_chemistry|5": { "acc": 0.48, "acc_stderr": 0.050211673156867795, "acc_norm": 0.48, "acc_norm_stderr": 0.050211673156867795 },
        "harness|hendrycksTest-college_computer_science|5": { "acc": 0.58, "acc_stderr": 0.049604496374885836, "acc_norm": 0.58, "acc_norm_stderr": 0.049604496374885836 },
        "harness|hendrycksTest-college_mathematics|5": { "acc": 0.4, "acc_stderr": 0.049236596391733084, "acc_norm": 0.4, "acc_norm_stderr": 0.049236596391733084 },
        "harness|hendrycksTest-college_medicine|5": { "acc": 0.7456647398843931, "acc_stderr": 0.0332055644308557, "acc_norm": 0.7456647398843931, "acc_norm_stderr": 0.0332055644308557 },
        "harness|hendrycksTest-college_physics|5": { "acc": 0.5490196078431373, "acc_stderr": 0.049512182523962604, "acc_norm": 0.5490196078431373, "acc_norm_stderr": 0.049512182523962604 },
        "harness|hendrycksTest-computer_security|5": { "acc": 0.83, "acc_stderr": 0.03775251680686371, "acc_norm": 0.83, "acc_norm_stderr": 0.03775251680686371 },
        "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.7829787234042553, "acc_stderr": 0.026947483121496224, "acc_norm": 0.7829787234042553, "acc_norm_stderr": 0.026947483121496224 },
        "harness|hendrycksTest-econometrics|5": { "acc": 0.6052631578947368, "acc_stderr": 0.045981880578165414, "acc_norm": 0.6052631578947368, "acc_norm_stderr": 0.045981880578165414 },
        "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.7517241379310344, "acc_stderr": 0.03600105692727771, "acc_norm": 0.7517241379310344, "acc_norm_stderr": 0.03600105692727771 },
        "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.6878306878306878, "acc_stderr": 0.023865206836972592, "acc_norm": 0.6878306878306878, "acc_norm_stderr": 0.023865206836972592 },
        "harness|hendrycksTest-formal_logic|5": { "acc": 0.5396825396825397, "acc_stderr": 0.04458029125470973, "acc_norm": 0.5396825396825397, "acc_norm_stderr": 0.04458029125470973 },
        "harness|hendrycksTest-global_facts|5": { "acc": 0.6, "acc_stderr": 0.049236596391733084, "acc_norm": 0.6, "acc_norm_stderr": 0.049236596391733084 },
        "harness|hendrycksTest-high_school_biology|5": { "acc": 0.896774193548387, "acc_stderr": 0.01730838128103453, "acc_norm": 0.896774193548387, "acc_norm_stderr": 0.01730838128103453 },
        "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.6502463054187192, "acc_stderr": 0.03355400904969566, "acc_norm": 0.6502463054187192, "acc_norm_stderr": 0.03355400904969566 },
        "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.8, "acc_stderr": 0.040201512610368445, "acc_norm": 0.8, "acc_norm_stderr": 0.040201512610368445 },
        "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.8606060606060606, "acc_stderr": 0.027045948825865394, "acc_norm": 0.8606060606060606, "acc_norm_stderr": 0.027045948825865394 },
        "harness|hendrycksTest-high_school_geography|5": { "acc": 0.9343434343434344, "acc_stderr": 0.01764652667723332, "acc_norm": 0.9343434343434344, "acc_norm_stderr": 0.01764652667723332 },
        "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.9740932642487047, "acc_stderr": 0.01146452335695318, "acc_norm": 0.9740932642487047, "acc_norm_stderr": 0.01146452335695318 },
        "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.8076923076923077, "acc_stderr": 0.019982347208637303, "acc_norm": 0.8076923076923077, "acc_norm_stderr": 0.019982347208637303 },
        "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.4037037037037037
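
As a reading aid for the reported figures, the sketch below turns a score and its standard error into an approximate 95% confidence interval and averages "acc" over the hendrycksTest entries. It assumes a normal approximation (score ± 1.96 × stderr), which this page does not state, and it works on a three-entry excerpt of the results, so the mean it prints is not the leaderboard's aggregate score. The helper names are hypothetical.

```python
# Three entries copied from the results listing; the full listing has many more tasks.
results = {
    "harness|arc:challenge|25": {
        "acc_norm": 0.6493174061433447,
        "acc_norm_stderr": 0.013944635930726097,
    },
    "harness|hendrycksTest-anatomy|5": {
        "acc": 0.7407407407407407,
        "acc_stderr": 0.03785714465066653,
    },
    "harness|hendrycksTest-astronomy|5": {
        "acc": 0.9078947368421053,
        "acc_stderr": 0.02353268597044349,
    },
}


def confidence_interval(value, stderr, z=1.96):
    """Return (low, high) under a normal approximation (an assumption)."""
    return value - z * stderr, value + z * stderr


def mean_metric(results, task_prefix, metric="acc"):
    """Average one metric over all result keys that start with task_prefix."""
    values = [
        metrics[metric]
        for key, metrics in results.items()
        if key.startswith(task_prefix) and metric in metrics
    ]
    if not values:
        raise ValueError(f"no entries match {task_prefix!r} with metric {metric!r}")
    return sum(values) / len(values)


arc = results["harness|arc:challenge|25"]
low, high = confidence_interval(arc["acc_norm"], arc["acc_norm_stderr"])
print(f"ARC acc_norm 95% interval: {low:.4f} to {high:.4f}")
print(f"Mean acc over the excerpt: {mean_metric(results, 'harness|hendrycksTest-'):.4f}")
```

The same two helpers can be applied to the full results dictionary once it is loaded; the prefix filter keeps the "all" entry and the non-hendrycksTest tasks out of the average.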
