open-llm-leaderboard-old/details_TheBloke__wizardLM-7B-HF

Source: Hugging Face. Updated 2023-08-27; indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_TheBloke__wizardLM-7B-HF
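Summary:
This dataset was created automatically during the evaluation of the model TheBloke/wizardLM-7B-HF on the Open LLM Leaderboard. It consists of 61 configurations, each corresponding to one evaluation task. The dataset was created from 1 run; each run appears as a specific split in each configuration, and the split is named after the run's timestamp. The train split always points to the latest results. In addition, the results configuration stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.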
Provider:
open-llm-leaderboard-old
Summary of the original dataset card

Dataset overview

Dataset name

Evaluation run of TheBloke/wizardLM-7B-HF

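Dataset description

This dataset was created automatically during the evaluation run of the model TheBloke/wizardLM-7B-HF on the Open LLM Leaderboard.

Dataset composition

The dataset consists of 61 configurations, each corresponding to one evaluation task. It was created from 1 run. Each run can be found as a specific split in each configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results.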
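As a rough illustration of the split naming described above, the sketch below derives a split name from a run timestamp. The exact transformation (replacing `-` and `:` with `_`) is an assumption about how these detail datasets name their splits, not something this page states, and `timestamp_to_split` is a hypothetical helper introduced only for illustration.

```python
def timestamp_to_split(timestamp: str) -> str:
    """Turn an ISO-style run timestamp into a split name (assumed convention)."""
    # Assumption: '-' and ':' are replaced by '_' so the name is a valid split identifier.
    return timestamp.replace("-", "_").replace(":", "_")


# Timestamp of the single run reported on this page.
split_name = timestamp_to_split("2023-07-18T11:33:18.439367")
print(split_name)
```

If the convention holds, the resulting name could be passed as `split=` instead of `"train"` to pin an analysis to one specific run rather than to whatever run is latest.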

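Additional configuration

An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.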

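Data loading example

    from datasets import load_dataset

    # Load the details for one task configuration; the "train" split
    # points to the latest run.
    data = load_dataset(
        "open-llm-leaderboard/details_TheBloke__wizardLM-7B-HF",
        "harness_truthfulqa_mc_0",
        split="train",
    )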
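The configuration name in the loading example, `harness_truthfulqa_mc_0`, looks like a task key of the form `harness|task|n` with its separators replaced by underscores. The sketch below encodes that guess. The mapping rule and the helper name `result_key_to_config` are assumptions made for illustration, so the generated names should be checked against the dataset's actual configuration list before use.

```python
import re


def result_key_to_config(key: str) -> str:
    """Map a results key such as 'harness|arc:challenge|25' to a config name.

    Assumption: every character outside [A-Za-z0-9_] becomes an underscore.
    """
    return re.sub(r"[^0-9A-Za-z_]", "_", key)


for key in ("harness|arc:challenge|25", "harness|hendrycksTest-anatomy|5"):
    print(key, "->", result_key_to_config(key))
```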

Latest results

These are the latest results, from the run of 2023-07-18T11:33:18.439367.

    {
        "all": { "acc": 0.38566819917906325, "acc_stderr": 0.03482242619787474, "acc_norm": 0.3891088361419288, "acc_norm_stderr": 0.03481173503822327, "mc1": 0.31456548347613217, "mc1_stderr": 0.01625524199317919, "mc2": 0.45584096136441793, "mc2_stderr": 0.016028055350830416 },
        "harness|arc:challenge|25": { "acc": 0.48464163822525597, "acc_stderr": 0.014604496129394913, "acc_norm": 0.5034129692832765, "acc_norm_stderr": 0.014611050403244081 },
        "harness|hellaswag|10": { "acc": 0.5685122485560645, "acc_stderr": 0.004942716091996078, "acc_norm": 0.7527384983071101, "acc_norm_stderr": 0.004305383398710189 },
        "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.35, "acc_stderr": 0.0479372485441102, "acc_norm": 0.35, "acc_norm_stderr": 0.0479372485441102 },
        "harness|hendrycksTest-anatomy|5": { "acc": 0.43703703703703706, "acc_stderr": 0.042849586397534, "acc_norm": 0.43703703703703706, "acc_norm_stderr": 0.042849586397534 },
        "harness|hendrycksTest-astronomy|5": { "acc": 0.40131578947368424, "acc_stderr": 0.03988903703336284, "acc_norm": 0.40131578947368424, "acc_norm_stderr": 0.03988903703336284 },
        "harness|hendrycksTest-business_ethics|5": { "acc": 0.48, "acc_stderr": 0.050211673156867795, "acc_norm": 0.48, "acc_norm_stderr": 0.050211673156867795 },
        "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.4377358490566038, "acc_stderr": 0.03053333843046751, "acc_norm": 0.4377358490566038, "acc_norm_stderr": 0.03053333843046751 },
        "harness|hendrycksTest-college_biology|5": { "acc": 0.3680555555555556, "acc_stderr": 0.04032999053960719, "acc_norm": 0.3680555555555556, "acc_norm_stderr": 0.04032999053960719 },
        "harness|hendrycksTest-college_chemistry|5": { "acc": 0.24, "acc_stderr": 0.04292346959909283, "acc_norm": 0.24, "acc_norm_stderr": 0.04292346959909283 },
        "harness|hendrycksTest-college_computer_science|5": { "acc": 0.29, "acc_stderr": 0.045604802157206845, "acc_norm": 0.29, "acc_norm_stderr": 0.045604802157206845 },
        "harness|hendrycksTest-college_mathematics|5": { "acc": 0.23, "acc_stderr": 0.04229525846816506, "acc_norm": 0.23, "acc_norm_stderr": 0.04229525846816506 },
        "harness|hendrycksTest-college_medicine|5": { "acc": 0.35260115606936415, "acc_stderr": 0.036430371689585475, "acc_norm": 0.35260115606936415, "acc_norm_stderr": 0.036430371689585475 },
        "harness|hendrycksTest-college_physics|5": { "acc": 0.22549019607843138, "acc_stderr": 0.041583075330832865, "acc_norm": 0.22549019607843138, "acc_norm_stderr": 0.041583075330832865 },
        "harness|hendrycksTest-computer_security|5": { "acc": 0.49, "acc_stderr": 0.05024183937956911, "acc_norm": 0.49, "acc_norm_stderr": 0.05024183937956911 },
        "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.4, "acc_stderr": 0.03202563076101735, "acc_norm": 0.4, "acc_norm_stderr": 0.03202563076101735 },
        "harness|hendrycksTest-econometrics|5": { "acc": 0.2631578947368421, "acc_stderr": 0.04142439719489362, "acc_norm": 0.2631578947368421, "acc_norm_stderr": 0.04142439719489362 },
        "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.32413793103448274, "acc_stderr": 0.03900432069185555, "acc_norm": 0.32413793103448274, "acc_norm_stderr": 0.03900432069185555 },
        "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.30687830687830686, "acc_stderr": 0.02375292871211214, "acc_norm": 0.30687830687830686, "acc_norm_stderr": 0.02375292871211214 },
        "harness|hendrycksTest-formal_logic|5": { "acc": 0.25396825396825395, "acc_stderr": 0.03893259610604675, "acc_norm": 0.25396825396825395, "acc_norm_stderr": 0.03893259610604675 },
        "harness|hendrycksTest-global_facts|5": { "acc": 0.32, "acc_stderr": 0.046882617226215034, "acc_norm": 0.32, "acc_norm_stderr": 0.046882617226215034 },
        "harness|hendrycksTest-high_school_biology|5": { "acc": 0.36129032258064514, "acc_stderr": 0.02732754844795754, "acc_norm": 0.36129032258064514, "acc_norm_stderr": 0.02732754844795754 },
        "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.30049261083743845, "acc_stderr": 0.03225799476233484, "acc_norm": 0.30049261083743845, "acc_norm_stderr": 0.03225799476233484 },
        "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.34, "acc_stderr": 0.04760952285695235, "acc_norm": 0.34, "acc_norm_stderr": 0.04760952285695235 },
        "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.45454545454545453, "acc_stderr": 0.03888176921674099, "acc_norm": 0.45454545454545453, "acc_norm_stderr": 0.03888176921674099 },
        "harness|hendrycksTest-high_school_geography|5": { "acc": 0.42424242424242425, "acc_stderr": 0.03521224908841583, "acc_norm": 0.42424242424242425, "acc_norm_stderr": 0.03521224908841583 },
        "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.46632124352331605, "acc_stderr": 0.03600244069867178, "acc_norm": 0.46632124352331605, "acc_norm_stderr": 0.03600244069867178 },
        "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.35384615384615387, "acc_stderr": 0.024243783994062164, "acc_norm": 0.35384615384615387, "acc_norm_stderr": 0.024243783994062164 },
        "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.24444444444444444, "acc_stderr": 0.026202766534652148, "acc_norm": 0.24444444444444444, "acc_norm_stderr": 0.026202766534652148 },
        "harness|hendrycksTest-high_school_microeconomics|5": { "acc": 0.327
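To make the numbers above easier to read, here is a minimal sketch that parses the result keys and turns each accuracy and standard error into an approximate 95% confidence interval. It uses a four-entry excerpt copied from the results above. Reading the last key field as the few-shot count and using a normal approximation (z = 1.96) are assumptions, and `parse_key` and `confidence_interval` are hypothetical helpers, not part of the leaderboard tooling.

```python
from statistics import mean

# Excerpt copied from the results above (accuracy and its standard error only).
results = {
    "harness|arc:challenge|25": {"acc": 0.48464163822525597, "acc_stderr": 0.014604496129394913},
    "harness|hellaswag|10": {"acc": 0.5685122485560645, "acc_stderr": 0.004942716091996078},
    "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.35, "acc_stderr": 0.0479372485441102},
    "harness|hendrycksTest-anatomy|5": {"acc": 0.43703703703703706, "acc_stderr": 0.042849586397534},
}


def parse_key(key):
    """Split 'harness|task|n' into (task, n); n is assumed to be the few-shot count."""
    _, task, n = key.split("|")
    return task, int(n)


def confidence_interval(acc, stderr, z=1.96):
    """Normal-approximation interval: acc +/- z * stderr (z = 1.96 for about 95%)."""
    return acc - z * stderr, acc + z * stderr


for key, metrics in results.items():
    task, shots = parse_key(key)
    low, high = confidence_interval(metrics["acc"], metrics["acc_stderr"])
    print(f"{task} ({shots}-shot): acc={metrics['acc']:.4f}, CI=[{low:.4f}, {high:.4f}]")

# Mean accuracy over the hendrycksTest subtasks present in this excerpt only.
mmlu_accs = [m["acc"] for k, m in results.items() if parse_key(k)[0].startswith("hendrycksTest-")]
print(f"mean acc over {len(mmlu_accs)} hendrycksTest subtasks: {mean(mmlu_accs):.4f}")
```

The wide intervals on the hendrycksTest subtasks reflect their standard errors of roughly 0.03 to 0.05, so differences of a few points between individual subtasks should be read with caution.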
