open-llm-leaderboard/details_lloorree__kssht-euripedes-70b

Source: Hugging Face | Updated: 2023-09-19 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard/details_lloorree__kssht-euripedes-70b
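Description:
This dataset was generated automatically while evaluating the model lloorree/kssht-euripedes-70b. It contains 61 configurations, each corresponding to one evaluation task. The dataset was created from 1 run; each run's results are stored as a split within the corresponding configuration, and the split is named after the run's timestamp. The train split always points to the latest results. An additional configuration named results stores the aggregated results of all runs, which are used to compute and display the aggregated metrics on the Open LLM Leaderboard.
Provider: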
open-llm-leaderboard
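Summary of original information

Dataset overview

This dataset was created automatically during an evaluation run of the model lloorree/kssht-euripedes-70b for the Open LLM Leaderboard.

Dataset composition

  • The dataset contains 61 configurations, each corresponding to one evaluation task.
  • The dataset was created from 1 run. Each run appears as a specific split in each configuration, and the split is named after the run's timestamp.
  • The "train" split always points to the latest results.
  • An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.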
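
The rule that "train" points to the latest results can be illustrated with a minimal sketch. It assumes that run timestamps can be parsed as ISO 8601 strings, like the one reported for the latest run; the actual split names on the Hub may be formatted differently. The helper `latest_split` and the second timestamp are hypothetical and serve only as an illustration, since this dataset has a single run.

```python
from datetime import datetime


def latest_split(split_names):
    """Return the split name whose timestamp is the most recent.

    Hypothetical helper: assumes every name is an ISO 8601 timestamp.
    """
    return max(split_names, key=datetime.fromisoformat)


# The first timestamp is the run reported for this dataset; the second is
# an invented earlier run, added only to make the comparison visible.
runs = ["2023-09-19T00:12:39.048571", "2023-08-01T10:00:00.000000"]
print(latest_split(runs))  # prints 2023-09-19T00:12:39.048571
```

With a single run, as here, the only timestamped split and "train" refer to the same results.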

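Data loading example

    from datasets import load_dataset

    # Load the latest results for one task configuration
    # (the "train" split points to the most recent run).
    data = load_dataset(
        "open-llm-leaderboard/details_lloorree__kssht-euripedes-70b",
        "harness_truthfulqa_mc_0",
        split="train",
    )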

Latest results

These are the latest results from run 2023-09-19T00:12:39.048571:

    {
        "all": { "acc": 0.7032771782081723, "acc_stderr": 0.030834102504125972, "acc_norm": 0.70714084898032, "acc_norm_stderr": 0.030804015376568177, "mc1": 0.3953488372093023, "mc1_stderr": 0.017115815632418197, "mc2": 0.5551008582453495, "mc2_stderr": 0.014893190834168417 },
        "harness|arc:challenge|25": { "acc": 0.658703071672355, "acc_stderr": 0.013855831287497723, "acc_norm": 0.6979522184300341, "acc_norm_stderr": 0.013417519144716413 },
        "harness|hellaswag|10": { "acc": 0.6872137024497113, "acc_stderr": 0.004626805906522211, "acc_norm": 0.8759211312487553, "acc_norm_stderr": 0.0032899775233939097 },
        "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.29, "acc_stderr": 0.04560480215720684, "acc_norm": 0.29, "acc_norm_stderr": 0.04560480215720684 },
        "harness|hendrycksTest-anatomy|5": { "acc": 0.6592592592592592, "acc_stderr": 0.04094376269996794, "acc_norm": 0.6592592592592592, "acc_norm_stderr": 0.04094376269996794 },
        "harness|hendrycksTest-astronomy|5": { "acc": 0.8092105263157895, "acc_stderr": 0.031975658210325, "acc_norm": 0.8092105263157895, "acc_norm_stderr": 0.031975658210325 },
        "harness|hendrycksTest-business_ethics|5": { "acc": 0.77, "acc_stderr": 0.04229525846816505, "acc_norm": 0.77, "acc_norm_stderr": 0.04229525846816505 },
        "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.720754716981132, "acc_stderr": 0.027611163402399715, "acc_norm": 0.720754716981132, "acc_norm_stderr": 0.027611163402399715 },
        "harness|hendrycksTest-college_biology|5": { "acc": 0.8333333333333334, "acc_stderr": 0.031164899666948617, "acc_norm": 0.8333333333333334, "acc_norm_stderr": 0.031164899666948617 },
        "harness|hendrycksTest-college_chemistry|5": { "acc": 0.48, "acc_stderr": 0.050211673156867795, "acc_norm": 0.48, "acc_norm_stderr": 0.050211673156867795 },
        "harness|hendrycksTest-college_computer_science|5": { "acc": 0.57, "acc_stderr": 0.049756985195624284, "acc_norm": 0.57, "acc_norm_stderr": 0.049756985195624284 },
        "harness|hendrycksTest-college_mathematics|5": { "acc": 0.4, "acc_stderr": 0.049236596391733084, "acc_norm": 0.4, "acc_norm_stderr": 0.049236596391733084 },
        "harness|hendrycksTest-college_medicine|5": { "acc": 0.653179190751445, "acc_stderr": 0.036291466701596636, "acc_norm": 0.653179190751445, "acc_norm_stderr": 0.036291466701596636 },
        "harness|hendrycksTest-college_physics|5": { "acc": 0.39215686274509803, "acc_stderr": 0.04858083574266345, "acc_norm": 0.39215686274509803, "acc_norm_stderr": 0.04858083574266345 },
        "harness|hendrycksTest-computer_security|5": { "acc": 0.77, "acc_stderr": 0.04229525846816506, "acc_norm": 0.77, "acc_norm_stderr": 0.04229525846816506 },
        "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.6723404255319149, "acc_stderr": 0.030683020843231004, "acc_norm": 0.6723404255319149, "acc_norm_stderr": 0.030683020843231004 },
        "harness|hendrycksTest-econometrics|5": { "acc": 0.4649122807017544, "acc_stderr": 0.04692008381368909, "acc_norm": 0.4649122807017544, "acc_norm_stderr": 0.04692008381368909 },
        "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.6620689655172414, "acc_stderr": 0.039417076320648906, "acc_norm": 0.6620689655172414, "acc_norm_stderr": 0.039417076320648906 },
        "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.4444444444444444, "acc_stderr": 0.02559185776138218, "acc_norm": 0.4444444444444444, "acc_norm_stderr": 0.02559185776138218 },
        "harness|hendrycksTest-formal_logic|5": { "acc": 0.49206349206349204, "acc_stderr": 0.044715725362943486, "acc_norm": 0.49206349206349204, "acc_norm_stderr": 0.044715725362943486 },
        "harness|hendrycksTest-global_facts|5": { "acc": 0.53, "acc_stderr": 0.050161355804659205, "acc_norm": 0.53, "acc_norm_stderr": 0.050161355804659205 },
        "harness|hendrycksTest-high_school_biology|5": { "acc": 0.8129032258064516, "acc_stderr": 0.02218571009225225, "acc_norm": 0.8129032258064516, "acc_norm_stderr": 0.02218571009225225 },
        "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.5566502463054187, "acc_stderr": 0.03495334582162933, "acc_norm": 0.5566502463054187, "acc_norm_stderr": 0.03495334582162933 },
        "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.76, "acc_stderr": 0.04292346959909281, "acc_norm": 0.76, "acc_norm_stderr": 0.04292346959909281 },
        "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.8363636363636363, "acc_stderr": 0.02888787239548795, "acc_norm": 0.8363636363636363, "acc_norm_stderr": 0.02888787239548795 },
        "harness|hendrycksTest-high_school_geography|5": { "acc": 0.8686868686868687, "acc_stderr": 0.02406315641682252, "acc_norm": 0.8686868686868687, "acc_norm_stderr": 0.02406315641682252 },
        "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.9222797927461139, "acc_stderr": 0.019321805557223157, "acc_norm": 0.9222797927461139, "acc_norm_stderr": 0.019321805557223157 },
        "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.7230769230769231, "acc_stderr": 0.022688042352424994, "acc_norm": 0.7230769230769231, "acc_norm_stderr": 0.022688042352424994 },
        "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.337037037037037, "acc_stderr": 0.028820884666253255, "acc_norm": 0.337037037037037, "acc
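
To help read the per-task entries, here is a minimal sketch that splits a task key into its parts and turns a score and its standard error into an approximate 95% interval. Two assumptions are made: the trailing number in each key is read as the number of few-shot examples, and the interval uses a normal approximation (score ± 1.96 × stderr). The helper names `parse_task_key` and `confidence_interval` are hypothetical; the values are the ARC Challenge figures listed in the results.

```python
def parse_task_key(key):
    """Split a key such as 'harness|arc:challenge|25' into its three parts.

    Assumption: the last field is the number of few-shot examples.
    """
    harness, task, n_shot = key.split("|")
    return harness, task, int(n_shot)


def confidence_interval(score, stderr, z=1.96):
    """Approximate 95% interval under a normal approximation (an assumption)."""
    return score - z * stderr, score + z * stderr


# Values copied from the "harness|arc:challenge|25" entry in the results.
arc = {"acc_norm": 0.6979522184300341, "acc_norm_stderr": 0.013417519144716413}

harness, task, n_shot = parse_task_key("harness|arc:challenge|25")
low, high = confidence_interval(arc["acc_norm"], arc["acc_norm_stderr"])
print(f"{task} ({n_shot}-shot): acc_norm in [{low:.4f}, {high:.4f}]")
```

The interval gives a rough sense of how much a per-task score could move from sampling noise alone, which matters most for tasks with large standard errors such as the 5-shot hendrycksTest subtasks.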
