open-llm-leaderboard/details_ehartford__Wizard-Vicuna-30B-Uncensored
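Dataset Overview
Dataset Summary
This dataset was created automatically during the evaluation of the model ehartford/Wizard-Vicuna-30B-Uncensored on the Open LLM Leaderboard. It contains 64 configurations, each corresponding to one evaluation task.
Dataset Structure
The dataset was created from 2 runs. The results of each run are stored in every configuration as a separate split named after the run's timestamp. The "train" split always points to the latest results.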
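Because each run is stored as a timestamp-named split and "train" is an alias for the newest one, it can be useful to pick the latest run from a list of split names. The snippet below is a minimal sketch: the helper `latest_split` and the example split names are hypothetical, and it assumes the timestamps use a fixed-width, zero-padded layout so that string order matches chronological order.

```python
def latest_split(split_names):
    """Return the most recent timestamp-named split, ignoring the "train" alias."""
    timestamped = [name for name in split_names if name != "train"]
    if not timestamped:
        raise ValueError("no timestamp-named splits found")
    # Assumption: names are zero-padded and ordered from year down to seconds,
    # so plain string comparison agrees with chronological order.
    return max(timestamped)


# Hypothetical split names for two runs (not taken from the dataset itself).
example_splits = ["2023_07_19T22_39_11", "2023_10_18T04_35_46", "train"]
print(latest_split(example_splits))
```

If the real split names follow a different layout, parse them into `datetime` objects before comparing.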
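Additional Configuration
An additional configuration, "results", stores the aggregated results of all runs. It is used to compute and display the aggregated metrics on the Open LLM Leaderboard.
Loading Example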
```python
from datasets import load_dataset

# Load the latest results for the 5-shot Winogrande configuration.
data = load_dataset(
    "open-llm-leaderboard/details_ehartford__Wizard-Vicuna-30B-Uncensored",
    "harness_winogrande_5",
    split="train",
)
```
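The repository and configuration names used above seem to follow a simple pattern: the model's `org/name` becomes `details_org__name`, and a result key such as `harness|winogrande|5` becomes `harness_winogrande_5`. The sketch below encodes that pattern. It is inferred only from the names visible in this card, and the helper names `details_repo_id` and `config_name` are hypothetical.

```python
def details_repo_id(model_id: str) -> str:
    """Build the details dataset name from a model id of the form 'org/name'."""
    org, name = model_id.split("/", 1)
    # Assumption: the organisation and model name are joined by a double underscore.
    return f"open-llm-leaderboard/details_{org}__{name}"


def config_name(task_key: str) -> str:
    """Turn a pipe-separated result key into a configuration name."""
    # Assumption: the only difference is '|' replaced by '_'.
    return task_key.replace("|", "_")


print(details_repo_id("ehartford/Wizard-Vicuna-30B-Uncensored"))
print(config_name("harness|winogrande|5"))
```

The two returned strings can be passed as the first and second arguments of `load_dataset`.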
Latest Results
The latest results are:
```python
{
    "all": {
        "em": 0.18162751677852348,
        "em_stderr": 0.0039482621737543045,
        "f1": 0.2674087667785243,
        "f1_stderr": 0.004012090110572664,
        "acc": 0.46353130406008236,
        "acc_stderr": 0.01059244186586655
    },
    "harness|drop|3": {
        "em": 0.18162751677852348,
        "em_stderr": 0.0039482621737543045,
        "f1": 0.2674087667785243,
        "f1_stderr": 0.004012090110572664
    },
    "harness|gsm8k|5": {
        "acc": 0.1425322213798332,
        "acc_stderr": 0.009629588445673819
    },
    "harness|winogrande|5": {
        "acc": 0.7845303867403315,
        "acc_stderr": 0.011555295286059279
    }
}
```
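In these numbers, the `acc` and `acc_stderr` values under "all" match the unweighted mean of the per-task values that report the same metric. This is an observation from the figures, not a documented rule. The sketch below checks it; the helper `mean_over_tasks` is hypothetical and only the accuracy entries are copied from the results.

```python
import math

# Accuracy entries copied from the latest results.
latest = {
    "all": {"acc": 0.46353130406008236, "acc_stderr": 0.01059244186586655},
    "harness|gsm8k|5": {"acc": 0.1425322213798332, "acc_stderr": 0.009629588445673819},
    "harness|winogrande|5": {"acc": 0.7845303867403315, "acc_stderr": 0.011555295286059279},
}


def mean_over_tasks(results, metric):
    """Unweighted mean of one metric over every task entry except 'all'."""
    values = [
        scores[metric]
        for task, scores in results.items()
        if task != "all" and metric in scores
    ]
    return sum(values) / len(values)


mean_acc = mean_over_tasks(latest, "acc")
mean_stderr = mean_over_tasks(latest, "acc_stderr")

# Compare the recomputed means against the reported aggregate.
print(math.isclose(mean_acc, latest["all"]["acc"], rel_tol=1e-9))
print(math.isclose(mean_stderr, latest["all"]["acc_stderr"], rel_tol=1e-9))
```

The same pattern explains why `em` and `f1` under "all" equal the DROP values: DROP is the only task here that reports them.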
Configuration Details
Configuration List
harness_arc_challenge_25
harness_drop_3
harness_gsm8k_5
harness_hellaswag_10
harness_hendrycksTest_5
harness_hendrycksTest_abstract_algebra_5
harness_hendrycksTest_anatomy_5
harness_hendrycksTest_astronomy_5
harness_hendrycksTest_business_ethics_5
harness_hendrycksTest_clinical_knowledge_5
harness_hendrycksTest_college_biology_5
harness_hendrycksTest_college_chemistry_5
harness_hendrycksTest_college_computer_science_5
harness_hendrycksTest_college_mathematics_5
harness_hendrycksTest_college_medicine_5
harness_hendrycksTest_college_physics_5
harness_hendrycksTest_computer_security_5
harness_hendrycksTest_conceptual_physics_5
harness_hendrycksTest_econometrics_5
harness_hendrycksTest_electrical_engineering_5
harness_hendrycksTest_elementary_mathematics_5
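The configuration names above share a common shape: a `harness` prefix, a task name, and a trailing integer. The sketch below splits a name into those three parts. The helper `parse_config` is hypothetical, and reading the trailing integer as the number of few-shot examples is an assumption that this card does not state.

```python
def parse_config(name: str):
    """Split a configuration name into (prefix, task, trailing integer)."""
    prefix, rest = name.split("_", 1)
    # The task name may itself contain underscores, so split from the right.
    task, number = rest.rsplit("_", 1)
    # Assumption: the trailing integer is the few-shot count.
    return prefix, task, int(number)


for cfg in ["harness_arc_challenge_25", "harness_hendrycksTest_abstract_algebra_5"]:
    print(parse_config(cfg))
```

Grouping the parsed names by task prefix (for example `hendrycksTest`) is one way to collect all subtasks of a benchmark.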



