OpenLLMTurkishLeadboardv2/details_Orbina__Orbita-v0.1
Dataset Overview
This dataset was created automatically during the evaluation run of the model Orbina/Orbita-v0.1 on the Open LLM Turkish Leaderboard v0.2.
Evaluation Results
Accuracy (Acc) and Standard Error (Acc_stderr)
-
winogrande_tr:
- Acc: 0.5616113744075829
- Acc_stderr: 0.01395090314113922
-
truthfulqa_v0.2:
- Acc: 0.50778392726845
- Acc_stderr: 0.015415009310400483
-
mmlu_tr_v0.2:
- Acc: 0.49515640020705465
- Acc_stderr: 0.004171392981675135
-
mmlu_humanities_v0.2:
- Acc: 0.44067410612616714
- Acc_stderr: 0.0072042837016912074
-
mmlu_formal_logic_v0.2:
- Acc: 0.42857142857142855
- Acc_stderr: 0.04426266681379909
-
mmlu_high_school_european_history_v0.2:
- Acc: 0.6133333333333333
- Acc_stderr: 0.03989546370031041
-
mmlu_high_school_us_history_v0.2:
- Acc: 0.5810055865921788
- Acc_stderr: 0.03698147842986738
-
mmlu_high_school_world_history_v0.2:
- Acc: 0.6666666666666666
- Acc_stderr: 0.0323761954119088
-
mmlu_international_law_v0.2:
- Acc: 0.6859504132231405
- Acc_stderr: 0.042369647530410184
-
mmlu_jurisprudence_v0.2:
- Acc: 0.5849056603773585
- Acc_stderr: 0.048086333949706635
-
mmlu_logical_fallacies_v0.2:
- Acc: 0.4968944099378882
- Acc_stderr: 0.039527708265086496
-
mmlu_moral_disputes_v0.2:
- Acc: 0.5324675324675324
- Acc_stderr: 0.028476280736968677
-
mmlu_moral_scenarios_v0.2:
- Acc: 0.23853211009174313
- Acc_stderr: 0.01444076314197968
-
mmlu_philosophy_v0.2:
- Acc: 0.5518394648829431
- Acc_stderr: 0.028808128856107652
-
mmlu_prehistory_v0.2:
- Acc: 0.5366666666666666
- Acc_stderr: 0.02883789055433726
-
mmlu_professional_law_v0.2:
- Acc: 0.37319884726224783
- Acc_stderr: 0.012986640233707492
-
mmlu_world_religions_v0.2:
- Acc: 0.6071428571428571
- Acc_stderr: 0.03779240554853983
-
mmlu_other_v0.2:
- Acc: 0.5325149303251493
- Acc_stderr: 0.008911579610042552
-
mmlu_business_ethics_v0.2:
- Acc: 0.5959595959595959
- Acc_stderr: 0.04956872738042618
-
mmlu_clinical_knowledge_v0.2:
- Acc: 0.5390625
- Acc_stderr: 0.0312155140597541
-
mmlu_college_medicine_v0.2:
- Acc: 0.5178571428571429
- Acc_stderr: 0.0386664782674949
-
mmlu_global_facts_v0.2:
- Acc: 0.29591836734693877
- Acc_stderr: 0.04634593001555603
-
mmlu_human_aging_v0.2:
- Acc: 0.5188679245283019
- Acc_stderr: 0.03439690285738042
-
mmlu_management_v0.2:
- Acc: 0.6262626262626263
- Acc_stderr: 0.04887069039502487
-
mmlu_marketing_v0.2:
- Acc: 0.7004608294930875
- Acc_stderr: 0.03116677479474285
-
mmlu_medical_genetics_v0.2:
- Acc: 0.5578947368421052
- Acc_stderr: 0.051224183891818126
-
mmlu_miscellaneous_v0.2:
- Acc: 0.6096605744125326
- Acc_stderr: 0.017637399302140862
-
mmlu_nutrition_v0.2:
- Acc: 0.521311475409836
- Acc_stderr: 0.02865090594093615
-
mmlu_professional_accounting_v0.2:
- Acc: 0.3154121863799283
- Acc_stderr: 0.027869643826017407
-
mmlu_professional_medicine_v0.2:
- Acc: 0.4789272030651341
- Acc_stderr: 0.03098113180316629
-
mmlu_virology_v0.2:
- Acc: 0.4779874213836478
- Acc_stderr: 0.03973929649561242
-
mmlu_social_sciences_v0.2:
- Acc: 0.572094572094572
- Acc_stderr: 0.00888927240965376
-
mmlu_econometrics_v0.2:
- Acc: 0.37719298245614036
- Acc_stderr: 0.04559522141958216
-
mmlu_high_school_geography_v0.2:
- Acc: 0.6395939086294417
- Acc_stderr: 0.03429416121196761
-
mmlu_high_school_government_and_politics_v0.2:
- Acc: 0.5989304812834224
- Acc_stderr: 0.03593697887872985
-
mmlu_high_school_macroeconomics_v0.2:
- Acc: 0.5128205128205128
- Acc_stderr: 0.025342671293807257
-
mmlu_high_school_microeconomics_v0.2:
- Acc: 0.5569620253164557
- Acc_stderr: 0.032335327775334835
-
mmlu_high_school_psychology_v0.2:
- Acc: 0.6435272045028143
- Acc_stderr: 0.020765425535814862
-
mmlu_human_sexuality_v0.2:
- Acc: 0.6173913043478261
- Acc_stderr: 0.04552031372871532
-
mmlu_professional_psychology_v0.2:
- Acc: 0.45286195286195285
- Acc_stderr: 0.020441088985356612
-
mmlu_public_relations_v0.2:
- Acc: 0.6296296296296297
- Acc_stderr: 0.0466840803302493
-
mmlu_security_studies_v0.2:
- Acc: 0.6068376068376068
- Acc_stderr: 0.03199957924651048
-
mmlu_sociology_v0.2:
- Acc: 0.7128205128205128
- Acc_stderr: 0.032483733385398866
-
mmlu_us_foreign_policy_v0.2:
- Acc: 0.7373737373737373
- Acc_stderr: 0.04445287676983945
-
mmlu_stem_v0.2:
- Acc: 0.46163723916532906
- Acc_stderr: 0.008775995548684237
-
mmlu_abstract_algebra_v0.2:
- Acc: 0.27
- Acc_stderr: 0.04461960433384741
-
mmlu_anatomy_v0.2:
- Acc: 0.44274809160305345
- Acc_stderr: 0.04356447202665069
-
mmlu_astronomy:
- Acc: 0.5231788079470199
- Acc_stderr: 0.04078093859163085
-
mmlu_college_biology_v0.2:
- Acc: 0.528169014084507
- Acc_stderr: 0.042040718749170536
-
mmlu_college_chemistry_v0.2:
- Acc: 0.3838383838383838
- Acc_stderr: 0.04912566964083466
-
mmlu_college_computer_science_v0.2:
- Acc: 0.494949494949495
- Acc_stderr: 0.05050505050505048
-
mmlu_college_mathematics_v0.2:
- Acc: 0.4
- Acc_stderr: 0.04923659639173309
-
mmlu_college_physics_v0.2:
- Acc: 0.297029702970297
- Acc_stderr: 0.04569497330381909
-
mmlu_computer_security_v0.2:
- Acc: 0.6
- Acc_stderr: 0.049236596391733084
-
mmlu_conceptual_physics_v0.2:
- Acc: 0.5107296137339056
- Acc_stderr: 0.03281904904358935
-
mmlu_electrical_engineering_v0.2:
- Acc: 0.4861111111111111
- Acc_stderr: 0.041795966175810016
-
mmlu_elementary_mathematics_v0.2:
- Acc: 0.4772117962466488
- Acc_stderr: 0.025896853805342974
-
mmlu_high_school_biology_v0.2:
- Acc: 0.6433333333333333
- Acc_stderr: 0.027702163901059916
-
mmlu_high_school_chemistry_v0.2:
- Acc: 0.4010152284263959
- Acc_stderr: 0.03500743470573262
-
**mmlu_high_school_computer_science_v0
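
The Acc_stderr values listed above can be read as the standard error of a mean over per-question 0/1 scores. The sketch below is a minimal illustration, not the leaderboard's own code: the helper names `accuracy_stderr` and `confidence_interval` are hypothetical, the question count `n = 100` is an assumption (the card does not state subset sizes), and the use of the sample variance (dividing by `n - 1`) is likewise an assumption that happens to be consistent with the reported mmlu_abstract_algebra_v0.2 numbers.

```python
import math


def accuracy_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) scores with mean `acc`.

    Uses the sample variance (n - 1 in the denominator); this is an
    assumption about how Acc_stderr was computed, not a documented fact.
    """
    if n < 2:
        raise ValueError("need at least two questions to estimate a standard error")
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def confidence_interval(acc: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate normal-theory interval around an accuracy (z=1.96 is roughly 95%)."""
    return acc - z * stderr, acc + z * stderr


# Acc and Acc_stderr for mmlu_abstract_algebra_v0.2, copied from the list above.
reported_acc = 0.27
reported_stderr = 0.04461960433384741

# ASSUMPTION: 100 questions in this subset (not stated in the card).
estimated_stderr = accuracy_stderr(reported_acc, n=100)
low, high = confidence_interval(reported_acc, reported_stderr)

print(f"estimated stderr: {estimated_stderr:.6f}")
print(f"approx. 95% interval: [{low:.3f}, {high:.3f}]")
```

Read this way, a subset with a large Acc_stderr has a wide interval, so differences of a few points between small subsets should be interpreted cautiously, while aggregate rows such as mmlu_tr_v0.2 are estimated much more tightly.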



