five

open-llm-leaderboard/details_quantumaikr__open_llama_7b_hf

收藏
Hugging Face2023-08-27 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/open-llm-leaderboard/details_quantumaikr__open_llama_7b_hf
下载链接
链接失效反馈
官方服务:
资源简介:
该数据集是在评估模型quantumaikr/open_llama_7b_hf时自动创建的,用于Open LLM Leaderboard的评估。数据集包含61个配置,每个配置对应一个评估任务。数据集由1次运行生成,每次运行的结果作为特定配置中的一个分割,分割名称使用运行的时间戳。train分割始终指向最新的结果。此外,results配置存储了所有运行的聚合结果,用于计算和显示Open LLM Leaderboard上的聚合指标。
提供机构:
open-llm-leaderboard
原始信息汇总

数据集概述

数据集简介

该数据集是在评估模型 quantumaikr/open_llama_7b_hf 的过程中自动创建的,用于 Open LLM Leaderboard

数据集组成

数据集包含 61 个配置,每个配置对应一个评估任务。数据集从 1 次运行中创建,每个运行可以在每个配置中找到特定的分片,分片名称使用运行的时间戳。"train" 分片始终指向最新的结果。

额外配置

一个额外的配置 "results" 存储了所有运行的聚合结果,用于计算和显示 Open LLM Leaderboard 上的聚合指标。

加载示例

python from datasets import load_dataset data = load_dataset("open-llm-leaderboard/details_quantumaikr__open_llama_7b_hf", "harness_truthfulqa_mc_0", split="train")

最新结果

以下是 2023-07-19T17:01:48.631436 运行的最新结果

python { "all": { "acc": 0.2648279332452004, "acc_stderr": 0.03195749858994142, "acc_norm": 0.26548960439125074, "acc_norm_stderr": 0.03196726632461042, "mc1": 0.23133414932680538, "mc1_stderr": 0.014761945174862661, "mc2": 0.4954484536663258, "mc2_stderr": 0.016312743256662564 }, "harness|arc:challenge|25": { "acc": 0.23293515358361774, "acc_stderr": 0.012352507042617391, "acc_norm": 0.2645051194539249, "acc_norm_stderr": 0.012889272949313366 }, "harness|hellaswag|10": { "acc": 0.26199960167297354, "acc_stderr": 0.004388237557526716, "acc_norm": 0.26946823341963755, "acc_norm_stderr": 0.004427767996301633 }, "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.3, "acc_stderr": 0.046056618647183814, "acc_norm": 0.3, "acc_norm_stderr": 0.046056618647183814 }, "harness|hendrycksTest-anatomy|5": { "acc": 0.31851851851851853, "acc_stderr": 0.040247784019771096, "acc_norm": 0.31851851851851853, "acc_norm_stderr": 0.040247784019771096 }, "harness|hendrycksTest-astronomy|5": { "acc": 0.2894736842105263, "acc_stderr": 0.036906779861372814, "acc_norm": 0.2894736842105263, "acc_norm_stderr": 0.036906779861372814 }, "harness|hendrycksTest-business_ethics|5": { "acc": 0.21, "acc_stderr": 0.040936018074033256, "acc_norm": 0.21, "acc_norm_stderr": 0.040936018074033256 }, "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.2490566037735849, "acc_stderr": 0.026616482980501704, "acc_norm": 0.2490566037735849, "acc_norm_stderr": 0.026616482980501704 }, "harness|hendrycksTest-college_biology|5": { "acc": 0.2708333333333333, "acc_stderr": 0.03716177437566016, "acc_norm": 0.2708333333333333, "acc_norm_stderr": 0.03716177437566016 }, "harness|hendrycksTest-college_chemistry|5": { "acc": 0.34, "acc_stderr": 0.04760952285695236, "acc_norm": 0.34, "acc_norm_stderr": 0.04760952285695236 }, "harness|hendrycksTest-college_computer_science|5": { "acc": 0.27, "acc_stderr": 0.0446196043338474, "acc_norm": 0.27, "acc_norm_stderr": 0.0446196043338474 }, "harness|hendrycksTest-college_mathematics|5": { "acc": 0.26, "acc_stderr": 0.04408440022768079, "acc_norm": 0.26, "acc_norm_stderr": 0.04408440022768079 }, "harness|hendrycksTest-college_medicine|5": { "acc": 0.23699421965317918, "acc_stderr": 0.03242414757483098, "acc_norm": 0.23699421965317918, "acc_norm_stderr": 0.03242414757483098 }, "harness|hendrycksTest-college_physics|5": { "acc": 0.4019607843137255, "acc_stderr": 0.04878608714466996, "acc_norm": 0.4019607843137255, "acc_norm_stderr": 0.04878608714466996 }, "harness|hendrycksTest-computer_security|5": { "acc": 0.16, "acc_stderr": 0.03684529491774708, "acc_norm": 0.16, "acc_norm_stderr": 0.03684529491774708 }, "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.2425531914893617, "acc_stderr": 0.028020226271200214, "acc_norm": 0.2425531914893617, "acc_norm_stderr": 0.028020226271200214 }, "harness|hendrycksTest-econometrics|5": { "acc": 0.2807017543859649, "acc_stderr": 0.042270544512322, "acc_norm": 0.2807017543859649, "acc_norm_stderr": 0.042270544512322 }, "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.23448275862068965, "acc_stderr": 0.035306258743465914, "acc_norm": 0.23448275862068965, "acc_norm_stderr": 0.035306258743465914 }, "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.25132275132275134, "acc_stderr": 0.022340482339643898, "acc_norm": 0.25132275132275134, "acc_norm_stderr": 0.022340482339643898 }, "harness|hendrycksTest-formal_logic|5": { "acc": 0.2222222222222222, "acc_stderr": 0.037184890068181146, "acc_norm": 0.2222222222222222, "acc_norm_stderr": 0.037184890068181146 }, "harness|hendrycksTest-global_facts|5": { "acc": 0.32, "acc_stderr": 0.046882617226215034, "acc_norm": 0.32, "acc_norm_stderr": 0.046882617226215034 }, "harness|hendrycksTest-high_school_biology|5": { "acc": 0.3161290322580645, "acc_stderr": 0.02645087448904277, "acc_norm": 0.3161290322580645, "acc_norm_stderr": 0.02645087448904277 }, "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.31527093596059114, "acc_stderr": 0.03269080871970186, "acc_norm": 0.31527093596059114, "acc_norm_stderr": 0.03269080871970186 }, "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.23, "acc_stderr": 0.04229525846816506, "acc_norm": 0.23, "acc_norm_stderr": 0.04229525846816506 }, "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.24848484848484848, "acc_stderr": 0.03374402644139405, "acc_norm": 0.24848484848484848, "acc_norm_stderr": 0.03374402644139405 }, "harness|hendrycksTest-high_school_geography|5": { "acc": 0.31313131313131315, "acc_stderr": 0.033042050878136525, "acc_norm": 0.31313131313131315, "acc_norm_stderr": 0.033042050878136525 }, "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.3005181347150259, "acc_stderr": 0.0330881859441575, "acc_norm": 0.3005181347150259, "acc_norm_stderr": 0.0330881859441575 }, "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.2128205128205128, "acc_stderr": 0.020752423722128013, "acc_norm": 0.2128205128205128, "acc_norm_stderr": 0.020752423722128013 }, "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.25555555555555554, "acc_stderr": 0.02659393910184408, "acc_norm": 0.25555555555555554, "acc_norm_stderr":

5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作