
open-llm-leaderboard-old/details_oh-yeontaek__llama-2-70B-LoRA-assemble-v3

Source: Hugging Face. Updated 2023-09-15; indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_oh-yeontaek__llama-2-70B-LoRA-assemble-v3
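Resource overview:
This dataset was generated automatically during the evaluation of the model oh-yeontaek/llama-2-70B-LoRA-assemble-v3 on the Open LLM Leaderboard. It contains 61 configurations, each corresponding to one evaluation task. The dataset was created from a single run; each run appears as a dedicated split in every configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results. An additional "results" configuration stores the aggregated results of the run, which are used to compute and display the aggregated metrics on the Open LLM Leaderboard. The README also provides a Python snippet for loading the dataset and lists the latest results of a specific run.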

Provider:
open-llm-leaderboard-old
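Summary of original information

Dataset overview

This dataset was created automatically during the evaluation run of the model oh-yeontaek/llama-2-70B-LoRA-assemble-v3 on the Open LLM Leaderboard.

Dataset composition

  • The dataset contains 61 configurations, each corresponding to one evaluation task.
  • The dataset was created from 1 run. Each run appears as a dedicated split in every configuration, and the split is named after the run's timestamp.
  • The "train" split always points to the latest results.
  • An additional "results" configuration stores the aggregated results of the run, which are used to compute and display the aggregated metrics on the Open LLM Leaderboard.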
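
The relationship between timestamp-named splits and the "train" alias can be sketched as follows. This is a minimal illustration, not the leaderboard's own code: the helper `latest_run_split` is hypothetical, and it assumes split names are ISO 8601 timestamps like the one quoted for this run, whereas the names actually stored on the Hub may be formatted differently.

```python
from datetime import datetime


def latest_run_split(split_names):
    """Return the timestamp-named split with the most recent timestamp.

    Assumption (for illustration only): split names are ISO 8601
    timestamps such as "2023-09-15T17:36:30.757691". Names that do not
    parse as timestamps, such as the "train" alias, are skipped.
    """
    runs = {}
    for name in split_names:
        try:
            runs[name] = datetime.fromisoformat(name)
        except ValueError:
            continue  # not a timestamp, e.g. the "train" alias
    if not runs:
        raise ValueError("no timestamp-named splits found")
    # Pick the name whose parsed timestamp is the latest.
    return max(runs, key=runs.get)


# This dataset has a single run, so "train" would resolve to that run.
splits = ["2023-09-15T17:36:30.757691", "train"]
print(latest_run_split(splits))
```

With only one run recorded, the timestamped split and "train" hold the same data; the distinction matters only once further runs are added.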

Data loading example

    from datasets import load_dataset

    # Load the details for one evaluation task; "train" points to the latest run.
    data = load_dataset(
        "open-llm-leaderboard/details_oh-yeontaek__llama-2-70B-LoRA-assemble-v3",
        "harness_truthfulqa_mc_0",
        split="train",
    )

Latest results

The latest results, from the run of 2023-09-15T17:36:30.757691:

    {
        "all": { "acc": 0.6985803552112708, "acc_stderr": 0.03118492094070661, "acc_norm": 0.7024274155828159, "acc_norm_stderr": 0.031154550420018332, "mc1": 0.47980416156670747, "mc1_stderr": 0.01748921684973705, "mc2": 0.658093697491632, "mc2_stderr": 0.014747866760131165 },
        "harness|arc:challenge|25": { "acc": 0.6860068259385665, "acc_stderr": 0.013562691224726291, "acc_norm": 0.7209897610921502, "acc_norm_stderr": 0.013106784883601334 },
        "harness|hellaswag|10": { "acc": 0.6820354511053575, "acc_stderr": 0.004647338877642188, "acc_norm": 0.8740290778729337, "acc_norm_stderr": 0.0033113844981586464 },
        "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.4, "acc_stderr": 0.049236596391733084, "acc_norm": 0.4, "acc_norm_stderr": 0.049236596391733084 },
        "harness|hendrycksTest-anatomy|5": { "acc": 0.6370370370370371, "acc_stderr": 0.041539484047424, "acc_norm": 0.6370370370370371, "acc_norm_stderr": 0.041539484047424 },
        "harness|hendrycksTest-astronomy|5": { "acc": 0.7828947368421053, "acc_stderr": 0.03355045304882924, "acc_norm": 0.7828947368421053, "acc_norm_stderr": 0.03355045304882924 },
        "harness|hendrycksTest-business_ethics|5": { "acc": 0.76, "acc_stderr": 0.04292346959909284, "acc_norm": 0.76, "acc_norm_stderr": 0.04292346959909284 },
        "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.7547169811320755, "acc_stderr": 0.026480357179895695, "acc_norm": 0.7547169811320755, "acc_norm_stderr": 0.026480357179895695 },
        "harness|hendrycksTest-college_biology|5": { "acc": 0.8194444444444444, "acc_stderr": 0.03216600808802267, "acc_norm": 0.8194444444444444, "acc_norm_stderr": 0.03216600808802267 },
        "harness|hendrycksTest-college_chemistry|5": { "acc": 0.48, "acc_stderr": 0.050211673156867795, "acc_norm": 0.48, "acc_norm_stderr": 0.050211673156867795 },
        "harness|hendrycksTest-college_computer_science|5": { "acc": 0.62, "acc_stderr": 0.04878317312145632, "acc_norm": 0.62, "acc_norm_stderr": 0.04878317312145632 },
        "harness|hendrycksTest-college_mathematics|5": { "acc": 0.38, "acc_stderr": 0.048783173121456316, "acc_norm": 0.38, "acc_norm_stderr": 0.048783173121456316 },
        "harness|hendrycksTest-college_medicine|5": { "acc": 0.6647398843930635, "acc_stderr": 0.03599586301247077, "acc_norm": 0.6647398843930635, "acc_norm_stderr": 0.03599586301247077 },
        "harness|hendrycksTest-college_physics|5": { "acc": 0.3431372549019608, "acc_stderr": 0.04724007352383888, "acc_norm": 0.3431372549019608, "acc_norm_stderr": 0.04724007352383888 },
        "harness|hendrycksTest-computer_security|5": { "acc": 0.74, "acc_stderr": 0.04408440022768078, "acc_norm": 0.74, "acc_norm_stderr": 0.04408440022768078 },
        "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.676595744680851, "acc_stderr": 0.03057944277361034, "acc_norm": 0.676595744680851, "acc_norm_stderr": 0.03057944277361034 },
        "harness|hendrycksTest-econometrics|5": { "acc": 0.4649122807017544, "acc_stderr": 0.04692008381368909, "acc_norm": 0.4649122807017544, "acc_norm_stderr": 0.04692008381368909 },
        "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.6413793103448275, "acc_stderr": 0.03996629574876719, "acc_norm": 0.6413793103448275, "acc_norm_stderr": 0.03996629574876719 },
        "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.47354497354497355, "acc_stderr": 0.025715239811346758, "acc_norm": 0.47354497354497355, "acc_norm_stderr": 0.025715239811346758 },
        "harness|hendrycksTest-formal_logic|5": { "acc": 0.49206349206349204, "acc_stderr": 0.044715725362943486, "acc_norm": 0.49206349206349204, "acc_norm_stderr": 0.044715725362943486 },
        "harness|hendrycksTest-global_facts|5": { "acc": 0.49, "acc_stderr": 0.05024183937956912, "acc_norm": 0.49, "acc_norm_stderr": 0.05024183937956912 },
        "harness|hendrycksTest-high_school_biology|5": { "acc": 0.8193548387096774, "acc_stderr": 0.02188617856717253, "acc_norm": 0.8193548387096774, "acc_norm_stderr": 0.02188617856717253 },
        "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.541871921182266, "acc_stderr": 0.03505630140785741, "acc_norm": 0.541871921182266, "acc_norm_stderr": 0.03505630140785741 },
        "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.79, "acc_stderr": 0.040936018074033256, "acc_norm": 0.79, "acc_norm_stderr": 0.040936018074033256 },
        "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.8484848484848485, "acc_stderr": 0.027998073798781675, "acc_norm": 0.8484848484848485, "acc_norm_stderr": 0.027998073798781675 },
        "harness|hendrycksTest-high_school_geography|5": { "acc": 0.8888888888888888, "acc_stderr": 0.022390787638216763, "acc_norm": 0.8888888888888888, "acc_norm_stderr": 0.022390787638216763 },
        "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.927461139896373, "acc_stderr": 0.018718998520678178, "acc_norm": 0.927461139896373, "acc_norm_stderr": 0.018718998520678178 },
        "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.6974358974358974, "acc_stderr": 0.02329088805377272, "acc_norm": 0.6974358974358974, "acc_norm_stderr": 0.02329088805377272 },
        "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.32592592592592595, "acc_stderr": 0.028578348365473072, "acc_norm": 0.32592592592592595, "acc_norm_
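
To read the result keys and standard errors listed above, a small sketch may help. It assumes each key follows the pattern `suite|task|shots`, with the last field being the number of few-shot examples, and it uses a normal approximation for a 95% interval. The helpers `parse_task_key` and `interval_95` are hypothetical names introduced here, and only two entries from the listing are copied in.

```python
def parse_task_key(key):
    """Split a key such as "harness|arc:challenge|25" into its parts.

    Assumption: the last field is the number of few-shot examples.
    """
    suite, task, shots = key.split("|")
    return suite, task, int(shots)


def interval_95(score, stderr):
    """Approximate 95% interval: score plus or minus 1.96 standard errors."""
    half_width = 1.96 * stderr
    return score - half_width, score + half_width


# Two entries copied from the results listing.
results = {
    "harness|arc:challenge|25": {
        "acc_norm": 0.7209897610921502,
        "acc_norm_stderr": 0.013106784883601334,
    },
    "harness|hellaswag|10": {
        "acc_norm": 0.8740290778729337,
        "acc_norm_stderr": 0.0033113844981586464,
    },
}

for key, metrics in results.items():
    suite, task, shots = parse_task_key(key)
    low, high = interval_95(metrics["acc_norm"], metrics["acc_norm_stderr"])
    print(f"{task} ({shots}-shot): {metrics['acc_norm']:.4f} [{low:.4f}, {high:.4f}]")
```

The "all" entry has no `|` separators, so it would need to be handled separately before calling `parse_task_key`.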
