
open-llm-leaderboard-old/details_AA051611__A0118

Source: Hugging Face. Updated 2024-01-18. Indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_AA051611__A0118
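Summary:
This dataset was created automatically during the evaluation run of the model AA051611/A0118 for the Open LLM Leaderboard. It consists of 63 configurations, each corresponding to one evaluation task. The dataset was produced from 2 runs; each run appears as a split within each configuration, and the split is named after the run's timestamp. The train split always points to the latest results. An additional configuration named results stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.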

Provider:
open-llm-leaderboard-old
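Original information summary

Dataset overview

Dataset source

This dataset was created automatically during the evaluation run of the model AA051611/A0118 on the Open LLM Leaderboard.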

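Dataset structure

The dataset contains 63 configurations, each corresponding to one evaluation task. It was created from 2 runs. Each run can be found as a specific split in each configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results.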
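
The naming scheme described above can be sketched in a few lines of Python. Both helper names are hypothetical. The mapping from a task key such as `harness|winogrande|5` to a configuration name such as `harness_winogrande_5` is an assumption inferred from the loading example on this page, and the timestamp layout used for split names is likewise assumed, so check the dataset card before relying on either. The earlier timestamp in the usage lines is made up for illustration.

```python
import re
from datetime import datetime


def task_key_to_config(task_key: str) -> str:
    """Map a task key to a configuration name (assumed scheme).

    Every run of non-alphanumeric characters is collapsed to one underscore,
    e.g. 'harness|arc:challenge|25' -> 'harness_arc_challenge_25'.
    """
    return re.sub(r"[^0-9A-Za-z]+", "_", task_key)


def latest_split(split_names: list[str]) -> str:
    """Return the most recent timestamp-named split, ignoring the 'train' alias.

    Assumes split names look like '2024_01_18T23_48_21.810095'.
    """
    runs = [name for name in split_names if name != "train"]
    return max(runs, key=lambda name: datetime.strptime(name, "%Y_%m_%dT%H_%M_%S.%f"))


# Usage: the first timestamp is hypothetical, the second is the run reported on this page.
splits = ["train", "2024_01_18T21_00_00.000000", "2024_01_18T23_48_21.810095"]
print(task_key_to_config("harness|winogrande|5"))
print(latest_split(splits))
```

If the assumed scheme holds, the split returned by `latest_split` is the one that "train" points to.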

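Additional configuration

An additional configuration, "results", stores the aggregated results of all runs. It is used to compute and display the aggregated metrics on the Open LLM Leaderboard.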
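
As a rough illustration of what aggregating per-task results can look like, the sketch below averages one metric over all tasks that share a key prefix. The values are a small excerpt of the latest results listed on this page. The function name `mean_metric` is hypothetical, and the unweighted mean is an assumption: the source does not spell out how the leaderboard computes its aggregates.

```python
from statistics import mean

# Excerpt of per-task results taken from the latest results on this page.
results = {
    "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.37},
    "harness|hendrycksTest-anatomy|5": {"acc": 0.6},
    "harness|hellaswag|10": {"acc": 0.6517625970922127, "acc_norm": 0.8378809002190799},
}


def mean_metric(results: dict, prefix: str, metric: str = "acc") -> float:
    """Unweighted mean of `metric` over tasks whose key starts with `prefix`."""
    values = [
        scores[metric]
        for task, scores in results.items()
        if task.startswith(prefix) and metric in scores
    ]
    return mean(values)


# Average accuracy over the two hendrycksTest subtasks in the excerpt.
mmlu_acc = mean_metric(results, "harness|hendrycksTest-")
print(mmlu_acc)
```

An unweighted mean treats every subtask equally regardless of how many questions it contains; a question-weighted mean would be a different, equally plausible choice.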

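Data loading example

```python
from datasets import load_dataset

# Load the details of one evaluation task; "train" points to the latest run.
data = load_dataset(
    "open-llm-leaderboard/details_AA051611__A0118",
    "harness_winogrande_5",
    split="train",
)
```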

Latest results

These are the latest results, from the run of 2024-01-18T23:48:21.810095:

```python
{
    "all": {
        "acc": 0.6750935567286499,
        "acc_stderr": 0.03150224444254494,
        "acc_norm": 0.6839013238259298,
        "acc_norm_stderr": 0.03214560635872275,
        "mc1": 0.390452876376989,
        "mc1_stderr": 0.01707823074343144,
        "mc2": 0.5579325936654852,
        "mc2_stderr": 0.015526306494139296
    },
    "harness|arc:challenge|25": {
        "acc": 0.5691126279863481,
        "acc_stderr": 0.014471133392642476,
        "acc_norm": 0.5921501706484642,
        "acc_norm_stderr": 0.0143610972884497
    },
    "harness|hellaswag|10": {
        "acc": 0.6517625970922127,
        "acc_stderr": 0.004754380554929216,
        "acc_norm": 0.8378809002190799,
        "acc_norm_stderr": 0.0036780679944244557
    },
    "harness|hendrycksTest-abstract_algebra|5": {
        "acc": 0.37,
        "acc_stderr": 0.048523658709391,
        "acc_norm": 0.37,
        "acc_norm_stderr": 0.048523658709391
    },
    "harness|hendrycksTest-anatomy|5": {
        "acc": 0.6,
        "acc_stderr": 0.04232073695151589,
        "acc_norm": 0.6,
        "acc_norm_stderr": 0.04232073695151589
    },
    "harness|hendrycksTest-astronomy|5": {
        "acc": 0.7960526315789473,
        "acc_stderr": 0.0327900040631005,
        "acc_norm": 0.7960526315789473,
        "acc_norm_stderr": 0.0327900040631005
    },
    "harness|hendrycksTest-business_ethics|5": {
        "acc": 0.72,
        "acc_stderr": 0.04512608598542128,
        "acc_norm": 0.72,
        "acc_norm_stderr": 0.04512608598542128
    },
    "harness|hendrycksTest-clinical_knowledge|5": {
        "acc": 0.7245283018867924,
        "acc_stderr": 0.027495663683724053,
        "acc_norm": 0.7245283018867924,
        "acc_norm_stderr": 0.027495663683724053
    },
    "harness|hendrycksTest-college_biology|5": {
        "acc": 0.7847222222222222,
        "acc_stderr": 0.03437079344106135,
        "acc_norm": 0.7847222222222222,
        "acc_norm_stderr": 0.03437079344106135
    },
    "harness|hendrycksTest-college_chemistry|5": {
        "acc": 0.51,
        "acc_stderr": 0.05024183937956912,
        "acc_norm": 0.51,
        "acc_norm_stderr": 0.05024183937956912
    },
    "harness|hendrycksTest-college_computer_science|5": {
        "acc": 0.55,
        "acc_stderr": 0.049999999999999996,
        "acc_norm": 0.55,
        "acc_norm_stderr": 0.049999999999999996
    },
    "harness|hendrycksTest-college_mathematics|5": {
        "acc": 0.41,
        "acc_stderr": 0.049431107042371025,
        "acc_norm": 0.41,
        "acc_norm_stderr": 0.049431107042371025
    },
    "harness|hendrycksTest-college_medicine|5": {
        "acc": 0.6647398843930635,
        "acc_stderr": 0.03599586301247078,
        "acc_norm": 0.6647398843930635,
        "acc_norm_stderr": 0.03599586301247078
    },
    "harness|hendrycksTest-college_physics|5": {
        "acc": 0.45098039215686275,
        "acc_stderr": 0.04951218252396264,
        "acc_norm": 0.45098039215686275,
        "acc_norm_stderr": 0.04951218252396264
    },
    "harness|hendrycksTest-computer_security|5": {
        "acc": 0.79,
        "acc_stderr": 0.04093601807403326,
        "acc_norm": 0.79,
        "acc_norm_stderr": 0.04093601807403326
    },
    "harness|hendrycksTest-conceptual_physics|5": {
        "acc": 0.6978723404255319,
        "acc_stderr": 0.030017554471880557,
        "acc_norm": 0.6978723404255319,
        "acc_norm_stderr": 0.030017554471880557
    },
    "harness|hendrycksTest-econometrics|5": {
        "acc": 0.5175438596491229,
        "acc_stderr": 0.04700708033551038,
        "acc_norm": 0.5175438596491229,
        "acc_norm_stderr": 0.04700708033551038
    },
    "harness|hendrycksTest-electrical_engineering|5": {
        "acc": 0.6827586206896552,
        "acc_stderr": 0.03878352372138622,
        "acc_norm": 0.6827586206896552,
        "acc_norm_stderr": 0.03878352372138622
    },
    "harness|hendrycksTest-elementary_mathematics|5": {
        "acc": 0.5952380952380952,
        "acc_stderr": 0.025279850397404904,
        "acc_norm": 0.5952380952380952,
        "acc_norm_stderr": 0.025279850397404904
    },
    "harness|hendrycksTest-formal_logic|5": {
        "acc": 0.5396825396825397,
        "acc_stderr": 0.04458029125470973,
        "acc_norm": 0.5396825396825397,
        "acc_norm_stderr": 0.04458029125470973
    },
    "harness|hendrycksTest-global_facts|5": {
        "acc": 0.47,
        "acc_stderr": 0.05016135580465919,
        "acc_norm": 0.47,
        "acc_norm_stderr": 0.05016135580465919
    },
    "harness|hendrycksTest-high_school_biology|5": {
        "acc": 0.8161290322580645,
        "acc_stderr": 0.02203721734026782,
        "acc_norm": 0.8161290322580645,
        "acc_norm_stderr": 0.02203721734026782
    },
    "harness|hendrycksTest-high_school_chemistry|5": {
        "acc": 0.5862068965517241,
        "acc_stderr": 0.03465304488406795,
        "acc_norm": 0.5862068965517241,
        "acc_norm_stderr": 0.03465304488406795
    },
    "harness|hendrycksTest-high_school_computer_science|5": {
        "acc": 0.74,
        "acc_stderr": 0.0440844002276808,
        "acc_norm": 0.74,
        "acc_norm_stderr": 0.0440844002276808
    },
    "harness|hendrycksTest-high_school_european_history|5": {
        "acc": 0.7212121212121212,
        "acc_stderr": 0.03501438706296781,
        "acc_norm": 0.7212121212121212,
        "acc_norm_stderr": 0.03501438706296781
    },
    "harness|hendrycksTest-high_school_geography|5": {
        "acc": 0.8939393939393939,
        "acc_stderr": 0.021938047738853137,
        "acc_norm": 0.8939393939393939,
        "acc_norm_stderr": 0.021938047738853137
    },
    "harness|hendrycksTest-high_school_government_and_politics|5": {
        "acc": 0.9067357512953368,
        "acc_stderr": 0.020986854593289733,
        "acc_norm": 0.9067357512953368,
        "acc_norm_stderr": 0.020986854593289733
    },
    "harness|hendrycksTest-high_school_macroeconomics|5": {
        "acc": 0.7256410256410256,
        "acc_stderr": 0.022622765767493214,
        "acc_norm": 0.7256410256410256,
        "acc_norm_stderr": 0.022622765767493214
    },
    "harness|hendrycksTest-high_school_mathematics|5": {
        "acc": 0.3851851851851852,
        "acc_stderr": 0.029670906124630882,
        "acc_norm": 0.3851851851851852,
        "acc_norm_stderr": 0.029670906124630882
    },
    "harness|hendrycksTest-high_school_microeconomics|5": {
        "acc": 0.7521008403361344,
        "acc_stderr": 0.028047967224176896,
        "acc_norm": 0.
```
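
To make the `acc_stderr` fields easier to read, the sketch below checks two reported values against the usual standard error of a mean of 0/1 scores, computed with the sample (n - 1) variance. This is an interpretation, not something the source states: the function name `sample_stderr` is hypothetical, and the task size n = 100 for both subtasks is an assumption that is not given on this page.

```python
import math


def sample_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n binary scores, using the n - 1 variance."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# (acc, reported acc_stderr) pairs copied from the results above.
reported = {
    "abstract_algebra": (0.37, 0.048523658709391),
    "college_computer_science": (0.55, 0.049999999999999996),
}

ASSUMED_N = 100  # assumed number of questions per subtask, not stated in the source

for name, (acc, stderr) in reported.items():
    estimate = sample_stderr(acc, ASSUMED_N)
    # Compare the estimate with the reported value within a loose tolerance.
    print(name, estimate, math.isclose(estimate, stderr, abs_tol=1e-6))
```

Under this reading, a subtask with few questions carries a standard error of roughly 0.05, so small accuracy differences on such subtasks should be interpreted with care.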
