details_Qwen__Qwen1.5-14B-Chat
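Dataset Overview
Dataset Name
- Dataset name: Evaluation run of Qwen/Qwen1.5-14B-Chat
Dataset Source
- Source: The dataset was created automatically during an evaluation run of the model Qwen/Qwen1.5-14B-Chat.
Dataset Structure
- Number of configurations: 117, each corresponding to one evaluation task.
- Number of runs: The dataset was generated from 2 runs. Each run's results are stored as a separate split named after the run's timestamp, and the "train" split always points to the latest results.
- Results configuration: The dataset includes a configuration named "results" that stores the aggregated results of all runs.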
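The split convention described above can be illustrated with a small helper that picks the most recent timestamp-named split. This is a sketch under assumptions: the split names in the example are hypothetical, and the helper only assumes that each timestamped name contains its date and time fields as digit groups in year-to-fraction order, with fractional seconds of fixed width. The real split names can be listed with `datasets.get_dataset_split_names`. `latest_timestamp_split` is a hypothetical name, not part of any library.

```python
import re


def latest_timestamp_split(split_names):
    """Return the split whose embedded timestamp is most recent.

    Assumption (hypothetical naming): a timestamped split name contains
    year, month, day, hour, minute, second and fractional-second digit
    groups in that order, e.g. "2025_01_10T09_45_08.386669", and the
    fractional part always has the same number of digits.
    Names without digits (such as "train") are ignored.
    """
    timestamped = [name for name in split_names if re.search(r"\d", name)]
    if not timestamped:
        raise ValueError("no timestamp-named splits found")

    def sort_key(name):
        # Compare the digit groups numerically, field by field.
        return tuple(int(group) for group in re.findall(r"\d+", name))

    return max(timestamped, key=sort_key)


# Hypothetical split names for a dataset built from 2 runs.
splits = ["2024_12_20T11_02_31.123456", "2025_01_10T09_45_08.386669", "train"]
print(latest_timestamp_split(splits))
```

Parsing digit groups rather than a fixed format string keeps the sketch independent of the exact separators, which this card does not specify.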
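Data Loading Example
```python
from datasets import load_dataset

# Load the aggregated "results" configuration; the "train" split always points to the latest run.
data = load_dataset("amztheory/details_Qwen__Qwen1.5-14B-Chat", "results", split="train")
```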
Latest Results
- Latest run time: 2025-01-10T09:45:08.386669
- Latest results file: results_2025-01-10T09-45-08.386669.json
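Comparing the run time with the results file name, the file name appears to be the run timestamp with each `:` replaced by `-`, plus a `results_` prefix and a `.json` suffix. The helper below encodes that convention. It is inferred from this single example, so treat it as an assumption rather than a documented rule; `results_filename` is a hypothetical name.

```python
def results_filename(run_timestamp: str) -> str:
    """Build a results file name from an ISO-like run timestamp.

    Convention inferred from one example in this card (an assumption):
    colons are replaced by hyphens, and the name gets a "results_"
    prefix and a ".json" suffix.
    """
    return "results_" + run_timestamp.replace(":", "-") + ".json"


print(results_filename("2025-01-10T09:45:08.386669"))
# prints results_2025-01-10T09-45-08.386669.json
```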
Latest Results Summary
```json
{
  "all": {"acc_norm": 0.5012502365334057, "acc_norm_stderr": 0.0328328010546483, "f1": 0.8219794463963592, "f1_stderr": 0.045217071789679014},
  "community|alghafa:mcq_exams_test_ar|0": {"acc_norm": 0.3357271095152603, "acc_norm_stderr": 0.020027607629453156},
  "community|alghafa:meta_ar_dialects|0": {"acc_norm": 0.32233549582947174, "acc_norm_stderr": 0.006363645295981301},
  "community|alghafa:meta_ar_msa|0": {"acc_norm": 0.3787709497206704, "acc_norm_stderr": 0.016223533510365113},
  "community|alghafa:multiple_choice_facts_truefalse_balanced_task|0": {"acc_norm": 0.52, "acc_norm_stderr": 0.05807730170189531},
  "community|alghafa:multiple_choice_grounded_statement_soqal_task|0": {"acc_norm": 0.58, "acc_norm_stderr": 0.040433888371749035},
  "community|alghafa:multiple_choice_grounded_statement_xglue_mlqa_task|0": {"acc_norm": 0.41333333333333333, "acc_norm_stderr": 0.040341569222180455},
  "community|alghafa:multiple_choice_rating_sentiment_no_neutral_task|0": {"acc_norm": 0.7939962476547843, "acc_norm_stderr": 0.004523397134548639},
  "community|alghafa:multiple_choice_rating_sentiment_task|0": {"acc_norm": 0.5541284403669725, "acc_norm_stderr": 0.0064202470016455305},
  "community|alghafa:multiple_choice_sentiment_task|0": {"acc_norm": 0.4174418604651163, "acc_norm_stderr": 0.011894048296224074},
  "community|arabic_exams|0": {"acc_norm": 0.4264432029795158, "acc_norm_stderr": 0.021361729869269146},
  "community|arabic_mmlu:Accounting (University)|0": {"acc_norm": 0.5, "acc_norm_stderr": 0.058520573598065284},
  "community|arabic_mmlu:Arabic Language (General)|0": {"acc_norm": 0.553921568627451, "acc_norm_stderr": 0.020109864547181357},
  "community|arabic_mmlu:Arabic Language (Grammar)|0": {"acc_norm": 0.3835616438356164, "acc_norm_stderr": 0.025486589299152422},
  "community|arabic_mmlu:Arabic Language (High School)|0": {"acc_norm": 0.3641025641025641, "acc_norm_stderr": 0.024396672985094785},
  "community|arabic_mmlu:Arabic Language (Middle School)|0": {"acc_norm": 0.5185185185185185, "acc_norm_stderr": 0.09799078929868857},
  "community|arabic_mmlu:Arabic Language (Primary School)|0": {"acc_norm": 0.5714285714285714, "acc_norm_stderr": 0.031236022160528714},
  "community|arabic_mmlu:Biology (High School)|0": {"acc_norm": 0.41944641589779985, "acc_norm_stderr": 0.013150978621344823},
  "community|arabic_mmlu:Civics (High School)|0": {"acc_norm": 0.45977011494252873, "acc_norm_stderr": 0.053741581963657706},
  "community|arabic_mmlu:Civics (Middle School)|0": {"acc_norm": 0.5042372881355932, "acc_norm_stderr": 0.032615232401979465},
  "community|arabic_mmlu:Computer Science (High School)|0": {"acc_norm": 0.5708812260536399, "acc_norm_stderr": 0.03069551782571805},
  "community|arabic_mmlu:Computer Science (Middle School)|0": {"acc_norm": 0.9259259259259259, "acc_norm_stderr": 0.051361129280113806},
  "community|arabic_mmlu:Computer Science (Primary School)|0": {"acc_norm": 0.7210526315789474, "acc_norm_stderr": 0.03262223525734098},
  "community|arabic_mmlu:Computer Science (University)|0": {"acc_norm": 0.609375, "acc_norm_stderr": 0.06146842128667525},
  "community|arabic_mmlu:Driving Test|0": {"acc_norm": 0.6573080099091659, "acc_norm_stderr": 0.013644064189915319},
  "community|arabic_mmlu:Economics (High School)|0": {"acc_norm": 0.5583333333333333, "acc_norm_stderr": 0.026208783650750977},
  "community|arabic_mmlu:Economics (Middle School)|0": {"acc_norm": 0.7471264367816092, "acc_norm_stderr": 0.04687049503854671},
  "community|arabic_mmlu:Economics (University)|0": {"acc_norm": 0.5182481751824818, "acc_norm_stderr": 0.042846082608231466},
  "community|arabic_mmlu:General Knowledge|0": {"acc_norm": 0.47685185185185186, "acc_norm_stderr": 0.017001948059514615},
  "community|arabic_mmlu:General Knowledge (Middle School)|0": {"acc_norm": 0.686046511627907, "acc_norm_stderr": 0.03549043982227173},
  "community|arabic_mmlu:General Knowledge (Primary School)|0": {"acc_norm": 0.6604938271604939, "acc_norm_stderr": 0.03732031330740126},
  "community|arabic_mmlu:Geography (High School)|0": {"acc_norm": 0.45664739884393063, "acc_norm_stderr": 0.015468278797637118},
  "community|arabic_mmlu:Geography (Middle School)|0": {"acc_norm": 0.5882352941176471, "acc_norm_stderr": 0.029896163033125485},
  "community|arabic_mmlu:Geography (Primary School)|0": {"acc_norm": 0.5263157894736842, "acc_norm_stderr": 0.06672270432067237},
  "community|arabic_mmlu:History (High School)|0": {"acc_norm": 0.4131578947368421, "acc_norm_stderr": 0.01787301307874886},
  "community|arabic_mmlu:History (Middle School)|0": {"acc_norm": 0.5369458128078818, "acc_norm_stderr": 0.035083705204426656},
  "community|arabic_mmlu:History (Primary School)|0": {"acc_norm": 0.5196078431372549, "acc_norm_stderr": 0.04971358884367406},
  "community|arabic_mmlu:Islamic Studies|0": {"acc_norm": 0.3458528951486698, "acc_norm_stderr": 0.01883098685502422},
  "community|arabic_mmlu:Islamic Studies (High School)|0": {"acc_norm": 0.6167664670658682, "acc_norm_stderr": 0.026642195538092498},
  "community|arabic_mmlu:Islamic Studies (Middle School)|0": {"acc_norm": 0.6008403361344538, "acc_norm_stderr": 0.03181110032413925},
  "community|arabic_mmlu:Islamic Studies (Primary School)|0": {"acc_norm": 0.6896896896896897, "acc_norm_stderr": 0.01464399928487927},
  "community|arabic_mmlu:Law (Professional)|0": {"acc_norm": 0.6910828025477707, "acc_norm_stderr": 0.026116436415099396},
  "community|arabic_mmlu:Management (University)|0": {"acc_norm": 0.6666666666666666, "acc_norm_stderr": 0.05479966243511907},
  "community|arabic_mmlu:Math (Primary School)|0": {"acc_norm": 0.5574572127139364, "acc_norm_stderr": 0.02458970515830585},
  "community|arabic_mmlu:Natural Science (Middle School)|0": {"acc_norm": 0.6157024793388429, "acc_norm_stderr": 0.03133363075160923},
  "community|arabic_mmlu:Natural Science (Primary School)|0": {"acc_norm": 0.7261904761904762, "acc_norm_stderr": 0.024362796967135468},
  "community|arabic_mmlu:Philosophy (High School)|0": {"acc_norm": 0.5897435897435898, "acc_norm_stderr": 0.0797934979708204},
  "community|arabic_mmlu:Physics (High School)|0": {"acc_norm": 0.3803921568627451, "acc_norm_stderr": 0.0304619269182863},
  "community|arabic_mmlu:Political Science (University)|0": {"acc_norm": 0.5523809523809524, "acc_norm_stderr": 0.034395409440258005},
  "community|arabic_mmlu:Social Science (Middle School)|0": {"acc_norm": 0.4605809128630705, "acc_norm_stderr": 0.03217440335948301},
  "community|arabic_mmlu:Social Science (Primary School)|0": {"acc_norm": 0.7078014184397163, "acc_norm_stderr": 0.017139906024924396},
  "community|arabic_mmlu_ht:abstract_algebra|0": {"acc_norm": 0.26, "acc_norm_stderr": 0.044084400227680794},
  "community|arabic_mmlu_ht:anatomy|0": {"acc_norm": 0.35555555555555557, "acc_norm_stderr": 0.04135176749720386},
  "community|arabic_mmlu_ht:astronomy|0": {"acc_norm": 0.5, "acc_norm_stderr": 0.04068942293855797},
  "community|arabic_mmlu_ht:business_ethics|0": {"acc_norm": 0.56, "acc_norm_stderr": 0.04988876515698589},
  "community|arabic_mmlu_ht:clinical_knowledge|0": {"acc_norm": 0.49056603773584906, "acc_norm_stderr": 0.030767394707808093},
  "community|arabic_mmlu_ht:college_biology|0": {"acc_norm": 0.3611111111111111, "acc_norm_stderr": 0.04016660030451233},
  "community|arabic_mmlu_ht:college_chemistry|0": {"acc_norm": 0.34, "acc_norm_stderr": 0.04760952285695236},
  "community|arabic_mmlu_ht:college_computer_science|0": {"acc_norm": 0.35, "acc_norm_stderr": 0.047937248544110196},
  "community|arabic_mmlu_ht:college_mathematics|0": {"acc_norm": 0.34, "acc_norm_stderr": 0.04760952285695236},
  "community|arabic_mmlu_ht:college_medicine|0": {"acc_norm": 0.4161849710982659, "acc_norm_stderr": 0.03758517775404947},
  "community|arabic_mmlu_ht:college_physics|0": {"acc_norm": 0.27450980392156865, "acc_norm_stderr": 0.04440521906179326},
  "community|arabic_mmlu_ht:computer_security|0": {"acc_norm": 0.48, "acc_norm_stderr": 0.050211673156867795},
  "community|arabic_mmlu_ht:conceptual_physics|0": {"acc_norm": 0.42127659574468085, "acc_norm_stderr": 0.03227834510146267},
  "community|arabic_mmlu_ht:econometrics|0": {"acc_norm": 0.3157894736842105, "acc_norm_stderr": 0.04372748290278006},
  "community|arabic_mmlu_ht:electrical_engineering|0": {"acc_norm": 0.47586206896551725, "acc_norm_stderr": 0.0416180850350153},
  "community|arabic_mmlu_ht:elementary_mathematics|0": {"acc_norm": 0.43915343915343913, "acc_norm_stderr": 0.025559920550531006},
  "community|arabic_mmlu_ht:formal_logic|0": {"acc_norm": 0.3888888888888889, "acc_norm_stderr": 0.04360314860077459},
  "community|arabic_mmlu_ht:global_facts|0": {"acc_norm": 0.33, "acc_norm_stderr": 0.04725815626252604},
  "community|arabic_mmlu_ht:high_school_
```
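To compare benchmark families in a summary like the one above, the task keys can be grouped by suite. The sketch below assumes the key format visible in this summary, `community|<suite>:<task>|<n>` (or `community|<suite>|<n>` for tasks without a subtask, such as `arabic_exams`), and uses only a short excerpt of the values shown. It computes an unweighted mean of `acc_norm` per suite. This is an illustration, not a reproduction of how the `"all"` aggregate was computed. `suite_of` and `mean_acc_norm_by_suite` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean

# Excerpt copied from the summary above (values unchanged); the full file has many more tasks.
results = {
    "all": {"acc_norm": 0.5012502365334057, "acc_norm_stderr": 0.0328328010546483,
            "f1": 0.8219794463963592, "f1_stderr": 0.045217071789679014},
    "community|alghafa:meta_ar_msa|0": {"acc_norm": 0.3787709497206704, "acc_norm_stderr": 0.016223533510365113},
    "community|alghafa:multiple_choice_facts_truefalse_balanced_task|0": {"acc_norm": 0.52, "acc_norm_stderr": 0.05807730170189531},
    "community|arabic_exams|0": {"acc_norm": 0.4264432029795158, "acc_norm_stderr": 0.021361729869269146},
    "community|arabic_mmlu:Accounting (University)|0": {"acc_norm": 0.5, "acc_norm_stderr": 0.058520573598065284},
    "community|arabic_mmlu:Computer Science (Middle School)|0": {"acc_norm": 0.9259259259259259, "acc_norm_stderr": 0.051361129280113806},
    "community|arabic_mmlu_ht:abstract_algebra|0": {"acc_norm": 0.26, "acc_norm_stderr": 0.044084400227680794},
    "community|arabic_mmlu_ht:astronomy|0": {"acc_norm": 0.5, "acc_norm_stderr": 0.04068942293855797},
}


def suite_of(task_key: str) -> str:
    """Return the suite from a key like "community|alghafa:meta_ar_msa|0" (assumed format)."""
    task = task_key.split("|")[1]   # e.g. "alghafa:meta_ar_msa"
    return task.split(":", 1)[0]    # e.g. "alghafa"; "arabic_exams" has no subtask


def mean_acc_norm_by_suite(results: dict) -> dict:
    """Unweighted mean of acc_norm per suite, skipping the aggregate "all" entry."""
    scores = defaultdict(list)
    for key, metrics in results.items():
        if "|" not in key:          # "all" is an aggregate, not a task
            continue
        if "acc_norm" in metrics:
            scores[suite_of(key)].append(metrics["acc_norm"])
    return {suite: mean(values) for suite, values in scores.items()}


for suite, score in sorted(mean_acc_norm_by_suite(results).items()):
    print(f"{suite}: {score:.4f}")
```

An unweighted mean treats every task equally regardless of how many questions it contains. A question-weighted average would need per-task sample counts, which this summary does not show.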




