five

open-llm-leaderboard-old/details_localfultonextractor__Erosumika-7B-v3-0.2

收藏
Hugging Face2024-03-27 更新2024-06-22 收录
下载链接:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_localfultonextractor__Erosumika-7B-v3-0.2
下载链接
链接失效反馈
官方服务:
资源简介:
该数据集是在评估模型localfultonextractor/Erosumika-7B-v3-0.2时自动创建的,主要用于在Open LLM Leaderboard上展示模型的性能。数据集包含63个配置,每个配置对应一个评估任务。数据集由1次运行生成,每次运行的结果作为特定的分割存储在配置中,分割名称使用运行的时间戳。train分割始终指向最新的结果。此外,还有一个名为results的配置,存储了所有运行的聚合结果,用于在Open LLM Leaderboard上计算和展示聚合指标。

该数据集是在评估模型localfultonextractor/Erosumika-7B-v3-0.2时自动创建的,主要用于在Open LLM Leaderboard上展示模型的性能。数据集包含63个配置,每个配置对应一个评估任务。数据集由1次运行生成,每次运行的结果作为特定的分割存储在配置中,分割名称使用运行的时间戳。train分割始终指向最新的结果。此外,还有一个名为results的配置,存储了所有运行的聚合结果,用于在Open LLM Leaderboard上计算和展示聚合指标。
提供机构:
open-llm-leaderboard-old
原始信息汇总

数据集概述

该数据集是在模型localfultonextractor/Erosumika-7B-v3-0.2Open LLM Leaderboard上的评估运行期间自动创建的。

数据集组成

  • 该数据集包含63个配置,每个配置对应一个评估任务。
  • 数据集从1次运行中创建,每个运行可以在每个配置中作为一个特定的分片找到,分片名称使用运行的时戳。
  • "train"分片始终指向最新的结果。
  • 一个额外的配置"results"存储所有运行的聚合结果,用于计算和显示Open LLM Leaderboard上的聚合指标。

数据加载示例

python from datasets import load_dataset data = load_dataset("open-llm-leaderboard/details_localfultonextractor__Erosumika-7B-v3-0.2", "harness_winogrande_5", split="train")

最新结果

以下是最新结果来自2024-03-27T20:01:46.356275的摘要:

python { "all": { "acc": 0.6019840938005351, "acc_stderr": 0.03315259716439375, "acc_norm": 0.6055572122527448, "acc_norm_stderr": 0.03383091388067506, "mc1": 0.3953488372093023, "mc1_stderr": 0.017115815632418197, "mc2": 0.5576654290074458, "mc2_stderr": 0.01526781403132161 }, "harness|arc:challenge|25": { "acc": 0.6245733788395904, "acc_stderr": 0.014150631435111726, "acc_norm": 0.6774744027303754, "acc_norm_stderr": 0.013659980894277366 }, "harness|hellaswag|10": { "acc": 0.6413065126468831, "acc_stderr": 0.004786368011500459, "acc_norm": 0.8495319657438757, "acc_norm_stderr": 0.003567988965337711 }, "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.3, "acc_stderr": 0.04605661864718381, "acc_norm": 0.3, "acc_norm_stderr": 0.04605661864718381 }, "harness|hendrycksTest-anatomy|5": { "acc": 0.5407407407407407, "acc_stderr": 0.04304979692464242, "acc_norm": 0.5407407407407407, "acc_norm_stderr": 0.04304979692464242 }, "harness|hendrycksTest-astronomy|5": { "acc": 0.6973684210526315, "acc_stderr": 0.0373852067611967, "acc_norm": 0.6973684210526315, "acc_norm_stderr": 0.0373852067611967 }, "harness|hendrycksTest-business_ethics|5": { "acc": 0.57, "acc_stderr": 0.049756985195624284, "acc_norm": 0.57, "acc_norm_stderr": 0.049756985195624284 }, "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.6679245283018868, "acc_stderr": 0.02898545565233439, "acc_norm": 0.6679245283018868, "acc_norm_stderr": 0.02898545565233439 }, "harness|hendrycksTest-college_biology|5": { "acc": 0.7152777777777778, "acc_stderr": 0.037738099906869334, "acc_norm": 0.7152777777777778, "acc_norm_stderr": 0.037738099906869334 }, "harness|hendrycksTest-college_chemistry|5": { "acc": 0.39, "acc_stderr": 0.04902071300001975, "acc_norm": 0.39, "acc_norm_stderr": 0.04902071300001975 }, "harness|hendrycksTest-college_computer_science|5": { "acc": 0.48, "acc_stderr": 0.050211673156867795, "acc_norm": 0.48, "acc_norm_stderr": 0.050211673156867795 }, "harness|hendrycksTest-college_mathematics|5": { "acc": 0.29, "acc_stderr": 0.04560480215720684, "acc_norm": 0.29, "acc_norm_stderr": 0.04560480215720684 }, "harness|hendrycksTest-college_medicine|5": { "acc": 0.630057803468208, "acc_stderr": 0.0368122963339432, "acc_norm": 0.630057803468208, "acc_norm_stderr": 0.0368122963339432 }, "harness|hendrycksTest-college_physics|5": { "acc": 0.30392156862745096, "acc_stderr": 0.04576665403207763, "acc_norm": 0.30392156862745096, "acc_norm_stderr": 0.04576665403207763 }, "harness|hendrycksTest-computer_security|5": { "acc": 0.75, "acc_stderr": 0.04351941398892446, "acc_norm": 0.75, "acc_norm_stderr": 0.04351941398892446 }, "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.5234042553191489, "acc_stderr": 0.03265019475033582, "acc_norm": 0.5234042553191489, "acc_norm_stderr": 0.03265019475033582 }, "harness|hendrycksTest-econometrics|5": { "acc": 0.4298245614035088, "acc_stderr": 0.04657047260594963, "acc_norm": 0.4298245614035088, "acc_norm_stderr": 0.04657047260594963 }, "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.5517241379310345, "acc_stderr": 0.04144311810878151, "acc_norm": 0.5517241379310345, "acc_norm_stderr": 0.04144311810878151 }, "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.4021164021164021, "acc_stderr": 0.02525303255499769, "acc_norm": 0.4021164021164021, "acc_norm_stderr": 0.02525303255499769 }, "harness|hendrycksTest-formal_logic|5": { "acc": 0.40476190476190477, "acc_stderr": 0.04390259265377562, "acc_norm": 0.40476190476190477, "acc_norm_stderr": 0.04390259265377562 }, "harness|hendrycksTest-global_facts|5": { "acc": 0.35, "acc_stderr": 0.047937248544110196, "acc_norm": 0.35, "acc_norm_stderr": 0.047937248544110196 }, "harness|hendrycksTest-high_school_biology|5": { "acc": 0.6645161290322581, "acc_stderr": 0.02686020644472436, "acc_norm": 0.6645161290322581, "acc_norm_stderr": 0.02686020644472436 }, "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.4729064039408867, "acc_stderr": 0.03512819077876105, "acc_norm": 0.4729064039408867, "acc_norm_stderr": 0.03512819077876105 }, "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.57, "acc_stderr": 0.04975698519562428, "acc_norm": 0.57, "acc_norm_stderr": 0.04975698519562428 }, "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.7090909090909091, "acc_stderr": 0.03546563019624336, "acc_norm": 0.7090909090909091, "acc_norm_stderr": 0.03546563019624336 }, "harness|hendrycksTest-high_school_geography|5": { "acc": 0.7424242424242424, "acc_stderr": 0.03115626951964684, "acc_norm": 0.7424242424242424, "acc_norm_stderr": 0.03115626951964684 }, "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.8393782383419689, "acc_stderr": 0.02649905770139746, "acc_norm": 0.8393782383419689, "acc_norm_stderr": 0.02649905770139746 }, "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.5974358974358974, "acc_stderr": 0.02486499515976775, "acc_norm": 0.5974358974358974, "acc_norm_stderr": 0.02486499515976775 }, "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.3148148148148148, "acc_stderr": 0.02831753349606649, "acc_norm": 0.3148148148148148,

二维码
社区交流群
二维码
科研交流群
商业服务