
open-llm-leaderboard-old/details_Josephgflowers__Tinyllama-Cinder-1.3B-Reason-Test.2

Hugging Face | Updated 2024-01-29 | Indexed 2024-06-22
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_Josephgflowers__Tinyllama-Cinder-1.3B-Reason-Test.2
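Resource overview:
This dataset was created automatically during the evaluation run of the model Josephgflowers/Tinyllama-Cinder-1.3B-Reason-Test.2 for the Open LLM Leaderboard. It contains 63 configurations, each corresponding to one evaluation task. The dataset was generated from 1 run; each run's results are stored as a separate split within each configuration, and the split is named after the run's timestamp. An additional configuration named "results" stores the aggregated results of all runs.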

Provider:
open-llm-leaderboard-old
Summary of original information

Dataset overview

This dataset was created automatically during the evaluation run of the model Josephgflowers/Tinyllama-Cinder-1.3B-Reason-Test.2 on the Open LLM Leaderboard.

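Dataset composition

  • The dataset contains 63 configurations, each corresponding to one evaluation task.
  • The dataset was created from 1 run. Each run appears as a separate split in every configuration, and the split is named after the run's timestamp.
  • The "train" split always points to the latest results.
  • An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.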
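The split naming described above implies that the latest run is simply the one with the greatest timestamp. The following is a minimal sketch of that selection, assuming the run identifiers are ISO-8601 strings in the format of the timestamp quoted on this page. The function name `latest_run` and the earlier timestamp in the usage lines are hypothetical and serve only as illustration; this dataset has a single run.

```python
from datetime import datetime


def latest_run(timestamps):
    """Return the most recent run from an iterable of ISO-8601 timestamp strings."""
    # Parse each string so the comparison is chronological rather than lexical.
    return max(timestamps, key=datetime.fromisoformat)


runs = [
    "2024-01-28T23:00:00.000000",  # hypothetical earlier run, for illustration only
    "2024-01-29T01:07:57.572756",  # the run reported on this page
]
print(latest_run(runs))
```

With only one run in the dataset, the timestamped split and the "train" split refer to the same results.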

Data loading example

```python
from datasets import load_dataset

# The "train" split always points to the latest results.
data = load_dataset(
    "open-llm-leaderboard/details_Josephgflowers__Tinyllama-Cinder-1.3B-Reason-Test.2",
    "harness_winogrande_5",
    split="train",
)
```
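The configuration name in the loading example (`harness_winogrande_5`) resembles the task key `harness|winogrande|5` that appears in the results on this page. The sketch below assumes that every configuration name can be derived from its task key by replacing `|`, `:` and `-` with underscores. The page confirms only the Winogrande case, so treat the other mappings as an assumption; `config_name_from_key` is a hypothetical helper name.

```python
import re


def config_name_from_key(task_key: str) -> str:
    """Map a results key such as 'harness|winogrande|5' to a configuration name.

    ASSUMPTION: separators '|', ':' and '-' all become '_'. Only the
    Winogrande mapping is confirmed by the loading example on this page.
    """
    return re.sub(r"[|:\-]", "_", task_key)


print(config_name_from_key("harness|winogrande|5"))
print(config_name_from_key("harness|hendrycksTest-anatomy|5"))
```

If the assumption holds, the returned string could be passed as the second argument of `load_dataset` in place of `"harness_winogrande_5"`.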

Latest results

The latest results, from the run of 2024-01-29T01:07:57.572756:

```python
{
    "all": { "acc": 0.251327576424007, "acc_stderr": 0.030372163539921712, "acc_norm": 0.251062888632938, "acc_norm_stderr": 0.03108831080407431, "mc1": 0.24357405140758873, "mc1_stderr": 0.015026354824910782, "mc2": 0.3899721945335931, "mc2_stderr": 0.014222197893576758 },
    "harness|arc:challenge|25": { "acc": 0.2977815699658703, "acc_stderr": 0.013363080107244487, "acc_norm": 0.32764505119453924, "acc_norm_stderr": 0.013715847940719346 },
    "harness|hellaswag|10": { "acc": 0.4402509460266879, "acc_stderr": 0.004954026775425775, "acc_norm": 0.5826528579964151, "acc_norm_stderr": 0.00492113386493189 },
    "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.22, "acc_stderr": 0.04163331998932268, "acc_norm": 0.22, "acc_norm_stderr": 0.04163331998932268 },
    "harness|hendrycksTest-anatomy|5": { "acc": 0.21481481481481482, "acc_stderr": 0.035478541985608236, "acc_norm": 0.21481481481481482, "acc_norm_stderr": 0.035478541985608236 },
    "harness|hendrycksTest-astronomy|5": { "acc": 0.17763157894736842, "acc_stderr": 0.031103182383123398, "acc_norm": 0.17763157894736842, "acc_norm_stderr": 0.031103182383123398 },
    "harness|hendrycksTest-business_ethics|5": { "acc": 0.31, "acc_stderr": 0.04648231987117316, "acc_norm": 0.31, "acc_norm_stderr": 0.04648231987117316 },
    "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.2490566037735849, "acc_stderr": 0.02661648298050171, "acc_norm": 0.2490566037735849, "acc_norm_stderr": 0.02661648298050171 },
    "harness|hendrycksTest-college_biology|5": { "acc": 0.22916666666666666, "acc_stderr": 0.035146974678623884, "acc_norm": 0.22916666666666666, "acc_norm_stderr": 0.035146974678623884 },
    "harness|hendrycksTest-college_chemistry|5": { "acc": 0.23, "acc_stderr": 0.042295258468165085, "acc_norm": 0.23, "acc_norm_stderr": 0.042295258468165085 },
    "harness|hendrycksTest-college_computer_science|5": { "acc": 0.2, "acc_stderr": 0.04020151261036846, "acc_norm": 0.2, "acc_norm_stderr": 0.04020151261036846 },
    "harness|hendrycksTest-college_mathematics|5": { "acc": 0.22, "acc_stderr": 0.04163331998932269, "acc_norm": 0.22, "acc_norm_stderr": 0.04163331998932269 },
    "harness|hendrycksTest-college_medicine|5": { "acc": 0.21965317919075145, "acc_stderr": 0.031568093627031744, "acc_norm": 0.21965317919075145, "acc_norm_stderr": 0.031568093627031744 },
    "harness|hendrycksTest-college_physics|5": { "acc": 0.1568627450980392, "acc_stderr": 0.036186648199362466, "acc_norm": 0.1568627450980392, "acc_norm_stderr": 0.036186648199362466 },
    "harness|hendrycksTest-computer_security|5": { "acc": 0.19, "acc_stderr": 0.039427724440366234, "acc_norm": 0.19, "acc_norm_stderr": 0.039427724440366234 },
    "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.25957446808510637, "acc_stderr": 0.02865917937429232, "acc_norm": 0.25957446808510637, "acc_norm_stderr": 0.02865917937429232 },
    "harness|hendrycksTest-econometrics|5": { "acc": 0.2631578947368421, "acc_stderr": 0.0414243971948936, "acc_norm": 0.2631578947368421, "acc_norm_stderr": 0.0414243971948936 },
    "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.25517241379310346, "acc_stderr": 0.03632984052707842, "acc_norm": 0.25517241379310346, "acc_norm_stderr": 0.03632984052707842 },
    "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.25925925925925924, "acc_stderr": 0.022569897074918417, "acc_norm": 0.25925925925925924, "acc_norm_stderr": 0.022569897074918417 },
    "harness|hendrycksTest-formal_logic|5": { "acc": 0.2698412698412698, "acc_stderr": 0.039701582732351734, "acc_norm": 0.2698412698412698, "acc_norm_stderr": 0.039701582732351734 },
    "harness|hendrycksTest-global_facts|5": { "acc": 0.14, "acc_stderr": 0.0348735088019777, "acc_norm": 0.14, "acc_norm_stderr": 0.0348735088019777 },
    "harness|hendrycksTest-high_school_biology|5": { "acc": 0.2870967741935484, "acc_stderr": 0.025736542745594525, "acc_norm": 0.2870967741935484, "acc_norm_stderr": 0.025736542745594525 },
    "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.270935960591133, "acc_stderr": 0.031270907132977, "acc_norm": 0.270935960591133, "acc_norm_stderr": 0.031270907132977 },
    "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.23, "acc_stderr": 0.04229525846816505, "acc_norm": 0.23, "acc_norm_stderr": 0.04229525846816505 },
    "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.28484848484848485, "acc_stderr": 0.035243908445117836, "acc_norm": 0.28484848484848485, "acc_norm_stderr": 0.035243908445117836 },
    "harness|hendrycksTest-high_school_geography|5": { "acc": 0.18181818181818182, "acc_stderr": 0.02747960301053878, "acc_norm": 0.18181818181818182, "acc_norm_stderr": 0.02747960301053878 },
    "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.23316062176165803, "acc_stderr": 0.03051611137147601, "acc_norm": 0.23316062176165803, "acc_norm_stderr": 0.03051611137147601 },
    "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.2846153846153846, "acc_stderr": 0.022878322799706283, "acc_norm": 0.2846153846153846, "acc_norm_stderr": 0.022878322799706283 },
    "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.24074074074074073, "acc_stderr": 0.026067159222275784, "acc_norm": 0.24074074074074073, "acc_norm_stderr": 0.026067159222275784 },
    "harness|hendrycksTest-high_school_microeconomics|5": { "acc": 0.21428571428571427, "acc_stderr": 0.02665353159671548, "acc_norm": 0.21428571428571427, "acc_norm_stderr": 0.02665353159671548 },
    "harness|hendrycksTest-high_school_physics|5": { "acc": 0.2052980132450331, "acc_stderr": 0.03297986648473834, "acc_norm": 0.2052980132450331, "acc_norm_stderr": 0.03297986648473834 },
    "harness|hendrycksTest-high_school_psychology|5": { "acc": 0.22385321100917432, "acc_stderr": 0.01787121776779022, "acc_norm": 0.22385321100917432, "acc_norm_stderr": 0.01787121776779022 },
    "harness|hendrycksTest-high_school_statistics|5": { "acc": 0.4074074074074074, "acc_stderr": 0.033509916046960436, "acc_norm": 0.4074074074074074, "acc_norm_stderr": 0.033509916046960436 },
    "harness|hendrycksTest-high_school_us_history|5": { "acc": 0.24509803921568626, "acc_stderr": 0.030190282453501954, "acc_norm": 0.24509803921568626, "acc_norm_stderr": 0.030190282453501954 },
    "harness|hendrycksTest-high_school_world_history|5": { "acc": 0.29535864978902954, "acc_stderr": 0.02969633871342288, "acc_norm": 0.29535864978902954, "acc_norm_stderr": 0.02969633871342288 },
    "harness|hendrycksTest-human_aging|5": { "acc": 0.35874439461883406, "acc_stderr": 0.032190792004199956, "acc_norm": 0.35874439461883406, "acc_norm_stderr": 0.032190792004199956 },
    "harness|hendrycksTest-human_sexuality|5": { "acc": 0.20610687022900764, "acc_stderr": 0.03547771004159463, "acc_norm": 0.20610687022900764, "acc_norm_stderr": 0.03547771004159463 },
    "harness|hendrycksTest-international_law|5": { "acc": 0.2396694214876033, "acc_stderr": 0.03896878985070417, "acc_norm": 0.2396694214876033, "acc_norm_stderr": 0.03896878985070417 },
    "harness|hendrycksTest-jurisprudence|5": { "acc": 0.16666666666666666, "acc_stderr": 0.036028141763926456, "acc_norm": 0.16666666666666666, "acc_norm_stderr": 0.036028141763926456 },
    "harness|hendrycksTest-logical_fallacies|5": { "acc": 0.26993865030674846, "acc_stderr": 0.03487825168497892, "acc_norm": 0.26993865030674846, "acc_norm_stderr": 0.03487825168497892 },
    "harness|hendrycksTest-machine_learning|5": { "acc": 0.23214285714285715, "acc_stderr": 0.04007341809755805, "acc_norm": 0.23214285714285715, "acc_norm_stderr": 0.04007341809755805 },
    "harness|hendrycksTest-management|5": { "acc": 0.2524271844660194, "acc_stderr": 0.04301250399690877, "acc_norm": 0.2524271844660194, "acc_norm_stderr": 0.04301250399690877 },
    "harness|hendrycksTest-marketing|5": { "acc": 0.23931623931623933, "acc_stderr": 0.02795182680892433, "acc_norm": 0.23931623931623933, "acc_norm_stderr": 0.02795182680892433 },
    "harness|hendrycksTest-medical_genetics|5": { "acc": 0.32, "acc_stderr": 0.04688261722621504, "acc_norm": 0.32, "acc_norm_stderr": 0.04688261722621504 },
    "harness|hendrycksTest-miscellaneous|5": { "acc": 0.2784163473818646, "acc_stderr": 0.016028295188992462, "acc_norm": 0.2784163473818646, "acc_norm_stderr": 0.016028295188992462 },
    "harness|hendrycksTest-moral_disputes|5": { "acc": 0.2514450867052023, "acc_stderr": 0.023357365785874044, "acc_norm": 0.2514450867052023, "acc_norm_stderr": 0.023357365785874044 },
    "harness|hendrycksTest-moral_scenarios|5": { "acc": 0.23798882681564246, "acc_stderr": 0.014242630070574915, "acc_norm": 0.23798882681564246, "acc_norm_stderr": 0.014242630070574915 },
    "harness|hendrycksTest-nutrition|5": { "acc": 0.24183006535947713, "acc_stderr": 0.024518195641879334, "acc_norm": 0.24183006535947713, "acc_norm_stderr": 0.024518195641879334 },
    "harness|hendrycksTest-philosophy|5": { "acc": 0.2090032154340836, "acc_stderr": 0.02309314039837422, "acc_norm": 0.2090032154340836, "acc_norm_stderr": 0.02309314039837422 },
    "harness|hendrycksTest-prehistory|5": { "acc": 0.24691358024691357, "acc_stderr": 0.023993501709042107, "acc_norm": 0.24691358024691357, "acc_norm_stderr": 0.023993501709042107 },
    "harness|hendrycksTest-professional_accounting|5": { "acc": 0.2198581560283688, "acc_stderr": 0.024706141070705484, "acc_norm": 0.2198581560283688, "acc_norm_stderr": 0.024706141070705484 },
    "harness|hendrycksTest-professional_law|5": { "acc": 0.23272490221642764, "acc_stderr": 0.0107925955538885, "acc_norm": 0.23272490221642764, "acc_norm_stderr": 0.0107925955538885 },
    "harness|hendrycksTest-professional_medicine|5": { "acc": 0.22426470588235295, "acc_stderr": 0.025336848563332355, "acc_norm": 0.22426470588235295, "acc_norm_stderr": 0.025336848563332355 },
    "harness|hendrycksTest-professional_psychology|5": { "acc": 0.2434640522875817, "acc_stderr": 0.017362473762146634, "acc_norm": 0.2434640522875817, "acc_norm_stderr": 0.017362473762146634 },
    "harness|hendrycksTest-public_relations|5": { "acc": 0.2727272727272727, "acc_stderr": 0.04265792110940589, "acc_norm": 0.2727272727272727, "acc_norm_stderr": 0.04265792110940589 },
    "harness|hendrycksTest-security_studies|5": { "acc": 0.1673469387755102, "acc_stderr": 0.023897144768914524, "acc_norm": 0.1673469387755102, "acc_norm_stderr": 0.023897144768914524 },
    "harness|hendrycksTest-sociology|5": { "acc": 0.2537313432835821, "acc_stderr": 0.030769444967296018, "acc_norm": 0.2537313432835821, "acc_norm_stderr": 0.030769444967296018 },
    "harness|hendrycksTest-us_foreign_policy|5": { "acc": 0.3, "acc_stderr": 0.046056618647183814, "acc_norm": 0.3, "acc_norm_stderr": 0.046056618647183814 },
    "harness|hendrycksTest-virology|5": { "acc": 0.25903614457831325, "acc_stderr": 0.03410646614071856, "acc_norm": 0.25903614457831325, "acc_norm_stderr": 0.03410646614071856 },
    "harness|hendrycksTest-world_religions|5": { "acc": 0.30994152046783624, "acc_stderr": 0.035469769593931624, "acc_norm": 0.30994152046783624, "acc_norm_stderr": 0.035469769593931624 },
    "harness|truthfulqa:mc|0": { "mc1": 0.24357405140758873, "mc1_stderr": 0.015026354824910782, "mc2": 0.3899721945335931, "mc2_stderr": 0.014222197893576758 },
    "harness|winogrande|5": { "acc": 0.6503551696921863, "acc_stderr": 0.013402073680850503 },
    "harness|gsm8k|5": { "acc": 0.0401819560272934, "acc_stderr": 0.005409439736970487 }
}
```
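Each task reports an accuracy together with its standard error, which allows a rough check of whether a score is distinguishable from random guessing. The sketch below uses a normal-approximation 95% interval (value ± 1.96 × stderr) on two entries copied from the results above. The chance levels are an assumption based on the benchmark formats (four answer options for the hendrycksTest tasks, two for Winogrande) and are not stated on this page; the helper names are hypothetical.

```python
# Excerpt of the results above; values are copied verbatim from this page.
RESULTS = {
    "harness|hendrycksTest-abstract_algebra|5": {
        "acc": 0.22,
        "acc_stderr": 0.04163331998932268,
    },
    "harness|winogrande|5": {
        "acc": 0.6503551696921863,
        "acc_stderr": 0.013402073680850503,
    },
}

# ASSUMPTION (not from this page): random-guess accuracy per task.
CHANCE = {
    "harness|hendrycksTest-abstract_algebra|5": 0.25,  # four answer options
    "harness|winogrande|5": 0.5,  # two answer options
}


def confidence_interval(value, stderr, z=1.96):
    """Normal-approximation interval: value +/- z * stderr."""
    half_width = z * stderr
    return value - half_width, value + half_width


def consistent_with_chance(task):
    """True if the assumed chance level lies inside the task's interval."""
    lo, hi = confidence_interval(RESULTS[task]["acc"], RESULTS[task]["acc_stderr"])
    return lo <= CHANCE[task] <= hi


for task in RESULTS:
    lo, hi = confidence_interval(RESULTS[task]["acc"], RESULTS[task]["acc_stderr"])
    print(f"{task}: [{lo:.3f}, {hi:.3f}] consistent with chance: {consistent_with_chance(task)}")
```

Under these assumptions, the abstract_algebra interval contains 0.25 while the Winogrande interval lies above 0.5. The normal approximation is crude for small test sets, so this is a quick reading aid rather than a formal significance test.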
