
open-llm-leaderboard-old/details_DrNicefellow__Mistral-7-from-Mixtral-8x7B-v0.1

Source: Hugging Face. Updated: 2024-04-15. Indexed: 2024-06-22.
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_DrNicefellow__Mistral-7-from-Mixtral-8x7B-v0.1
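Resource summary:
This dataset was created automatically during the evaluation of the model DrNicefellow/Mistral-7-from-Mixtral-8x7B-v0.1 on the Open LLM Leaderboard. It consists of 63 configurations, each corresponding to one evaluation task. The dataset was built from 1 run; each run is stored as a separate split in every configuration, and the split is named after the run's timestamp. The train split always points to the latest results. An additional configuration, results, stores the aggregated results of the run and is used to compute and display the aggregated metrics on the Open LLM Leaderboard. The README also gives an example of loading a run's details with the Python datasets library.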

Provider:
open-llm-leaderboard-old
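Summary of original information

Dataset overview

Dataset source

This dataset was created automatically during the evaluation of the model DrNicefellow/Mistral-7-from-Mixtral-8x7B-v0.1; the evaluation results are shown on the Open LLM Leaderboard.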

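Dataset structure

  • Number of configurations: 63, each corresponding to one evaluation task.
  • Data origin: the dataset was built from 1 run. Each run appears as a separate split in every configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results.
  • Additional configuration: the "results" configuration stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.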
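
The structure described above can be inspected programmatically. The following is a minimal sketch, assuming the `datasets` library is installed and the repository is reachable over the network. The helper names `split_overview` and `describe_repo` are hypothetical and not part of the README, and the sketch only reports whatever configuration and split names the repository actually exposes.

```python
from typing import Dict, List


def split_overview(split_names: List[str]) -> Dict[str, object]:
    """Separate the "train" alias from the other (run) splits."""
    runs = sorted(name for name in split_names if name != "train")
    return {"has_train_alias": "train" in split_names, "run_splits": runs}


def describe_repo(repo_id: str) -> Dict[str, Dict[str, object]]:
    """Map every configuration of a details repository to its splits.

    Needs network access. `datasets` is imported lazily so that
    `split_overview` remains usable without the library installed.
    """
    from datasets import get_dataset_config_names, get_dataset_split_names

    overview = {}
    for config in get_dataset_config_names(repo_id):
        split_names = get_dataset_split_names(repo_id, config)
        overview[config] = split_overview(split_names)
    return overview
```

Calling `describe_repo` with the repository ID from the download link should return one entry per configuration, which makes it easy to check the stated count of 63 task configurations and the presence of the "results" configuration.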

Data loading example

    from datasets import load_dataset

    data = load_dataset(
        "open-llm-leaderboard/details_DrNicefellow__Mistral-7-from-Mixtral-8x7B-v0.1",
        "harness_winogrande_5",
        split="train",
    )
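
The configuration name used in the call above, `harness_winogrande_5`, resembles the task keys that appear in the results listing (for example `harness|arc:challenge|25`). The sketch below derives a configuration name from such a key. It rests on an assumption that is not confirmed by the source: that every run of non-alphanumeric characters in the key becomes a single underscore. The function names `task_key_to_config` and `load_task_details` are hypothetical.

```python
import re


def task_key_to_config(task_key: str) -> str:
    """Turn a results key such as "harness|winogrande|5" into a config name.

    Assumption: each run of non-alphanumeric characters is replaced
    by one underscore. Verify against the repository's config list.
    """
    return re.sub(r"[^0-9A-Za-z]+", "_", task_key).strip("_")


def load_task_details(repo_id: str, task_key: str, split: str = "train"):
    """Load the per-sample details for one task (needs network access)."""
    from datasets import load_dataset  # lazy import, only needed here

    return load_dataset(repo_id, task_key_to_config(task_key), split=split)
```

Passing `"results"` directly as the configuration name to `load_dataset` would load the aggregated metrics instead of per-task details.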

Latest results

The latest results, from the run of 2024-04-15T19:37:06.260208, are:

    {
        "all": { "acc": 0.25049519970650264, "acc_stderr": 0.030515149972395742, "acc_norm": 0.25213482744715937, "acc_norm_stderr": 0.03133280465134686, "mc1": 0.24357405140758873, "mc1_stderr": 0.01502635482491078, "mc2": 0.4853989711502455, "mc2_stderr": 0.016173294661967198 },
        "harness|arc:challenge|25": { "acc": 0.2175767918088737, "acc_stderr": 0.012057262020972506, "acc_norm": 0.2909556313993174, "acc_norm_stderr": 0.013273077865907576 },
        "harness|hellaswag|10": { "acc": 0.25791674965146383, "acc_stderr": 0.0043659384072096095, "acc_norm": 0.2656841266679944, "acc_norm_stderr": 0.00440794105887496 },
        "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.23, "acc_stderr": 0.04229525846816506, "acc_norm": 0.23, "acc_norm_stderr": 0.04229525846816506 },
        "harness|hendrycksTest-anatomy|5": { "acc": 0.3333333333333333, "acc_stderr": 0.04072314811876837, "acc_norm": 0.3333333333333333, "acc_norm_stderr": 0.04072314811876837 },
        "harness|hendrycksTest-astronomy|5": { "acc": 0.23026315789473684, "acc_stderr": 0.03426059424403165, "acc_norm": 0.23026315789473684, "acc_norm_stderr": 0.03426059424403165 },
        "harness|hendrycksTest-business_ethics|5": { "acc": 0.18, "acc_stderr": 0.03861229196653697, "acc_norm": 0.18, "acc_norm_stderr": 0.03861229196653697 },
        "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.2188679245283019, "acc_stderr": 0.02544786382510861, "acc_norm": 0.2188679245283019, "acc_norm_stderr": 0.02544786382510861 },
        "harness|hendrycksTest-college_biology|5": { "acc": 0.2569444444444444, "acc_stderr": 0.03653946969442099, "acc_norm": 0.2569444444444444, "acc_norm_stderr": 0.03653946969442099 },
        "harness|hendrycksTest-college_chemistry|5": { "acc": 0.22, "acc_stderr": 0.04163331998932269, "acc_norm": 0.22, "acc_norm_stderr": 0.04163331998932269 },
        "harness|hendrycksTest-college_computer_science|5": { "acc": 0.3, "acc_stderr": 0.046056618647183814, "acc_norm": 0.3, "acc_norm_stderr": 0.046056618647183814 },
        "harness|hendrycksTest-college_mathematics|5": { "acc": 0.25, "acc_stderr": 0.04351941398892446, "acc_norm": 0.25, "acc_norm_stderr": 0.04351941398892446 },
        "harness|hendrycksTest-college_medicine|5": { "acc": 0.24277456647398843, "acc_stderr": 0.0326926380614177, "acc_norm": 0.24277456647398843, "acc_norm_stderr": 0.0326926380614177 },
        "harness|hendrycksTest-college_physics|5": { "acc": 0.21568627450980393, "acc_stderr": 0.04092563958237654, "acc_norm": 0.21568627450980393, "acc_norm_stderr": 0.04092563958237654 },
        "harness|hendrycksTest-computer_security|5": { "acc": 0.23, "acc_stderr": 0.04229525846816506, "acc_norm": 0.23, "acc_norm_stderr": 0.04229525846816506 },
        "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.2, "acc_stderr": 0.0261488180184245, "acc_norm": 0.2, "acc_norm_stderr": 0.0261488180184245 },
        "harness|hendrycksTest-econometrics|5": { "acc": 0.2719298245614035, "acc_stderr": 0.04185774424022056, "acc_norm": 0.2719298245614035, "acc_norm_stderr": 0.04185774424022056 },
        "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.27586206896551724, "acc_stderr": 0.037245636197746325, "acc_norm": 0.27586206896551724, "acc_norm_stderr": 0.037245636197746325 },
        "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.291005291005291, "acc_stderr": 0.02339382650048487, "acc_norm": 0.291005291005291, "acc_norm_stderr": 0.02339382650048487 },
        "harness|hendrycksTest-formal_logic|5": { "acc": 0.12698412698412698, "acc_stderr": 0.029780417522688434, "acc_norm": 0.12698412698412698, "acc_norm_stderr": 0.029780417522688434 },
        "harness|hendrycksTest-global_facts|5": { "acc": 0.18, "acc_stderr": 0.038612291966536934, "acc_norm": 0.18, "acc_norm_stderr": 0.038612291966536934 },
        "harness|hendrycksTest-high_school_biology|5": { "acc": 0.24838709677419354, "acc_stderr": 0.024580028921481003, "acc_norm": 0.24838709677419354, "acc_norm_stderr": 0.024580028921481003 },
        "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.28078817733990147, "acc_stderr": 0.03161856335358609, "acc_norm": 0.28078817733990147, "acc_norm_stderr": 0.03161856335358609 },
        "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.29, "acc_stderr": 0.045604802157206845, "acc_norm": 0.29, "acc_norm_stderr": 0.045604802157206845 },
        "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.21818181818181817, "acc_stderr": 0.03225078108306289, "acc_norm": 0.21818181818181817, "acc_norm_stderr": 0.03225078108306289 },
        "harness|hendrycksTest-high_school_geography|5": { "acc": 0.2474747474747475, "acc_stderr": 0.030746300742124495, "acc_norm": 0.2474747474747475, "acc_norm_stderr": 0.030746300742124495 },
        "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.22279792746113988, "acc_stderr": 0.030031147977641545, "acc_norm": 0.22279792746113988, "acc_norm_stderr": 0.030031147977641545 },
        "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.22564102564102564, "acc_stderr": 0.021193632525148543, "acc_norm": 0.22564102564102564, "acc_norm_stderr": 0.021193632525148543 },
        "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.23333333333333334, "acc_stderr": 0.025787874220959312,
