OpenLLMTurkishLeadboardv2/details_cypienai__cymist
Dataset overview
This dataset was created automatically during the evaluation run of the model cypienai/cymist on the Open LLM Turkish Leaderboard v0.2.
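Evaluation results
The dataset contains evaluation results for multiple subtasks. Each subtask reports its own accuracy (acc) and the standard error of that accuracy (acc_stderr). Results for a subset of the subtasks are listed in the next section.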
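To make the acc / acc_stderr pair concrete, here is a minimal Python sketch of how the two numbers can be read together. It rests on two assumptions that the dataset card does not state: that a normal approximation is adequate for a 95% interval, and that acc_stderr is the standard error of a mean of 0/1 scores computed with the sample (n - 1) variance. The helper names and the sample size of 100 are hypothetical and chosen only for illustration.

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the sample (n - 1) variance.

    Assumption: this is one common way to compute acc_stderr; the card
    does not say which formula the leaderboard actually uses.
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def confidence_interval(acc: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval: acc +/- z * stderr (z = 1.96 for about 95%)."""
    return acc - z * stderr, acc + z * stderr


# Values copied from the winogrande_tr entry in the listing.
low, high = confidence_interval(0.5371248025276462, 0.01401923436016685)
print(f"winogrande_tr 95% interval: [{low:.4f}, {high:.4f}]")

# Hypothetical sample size of 100 questions (not stated in the card),
# combined with the reported mmlu_abstract_algebra_v0.2 accuracy of 0.31.
print(binomial_stderr(0.31, 100))
```

A practical reading: subtasks with a large acc_stderr (around 0.04 to 0.05) have intervals roughly 0.16 to 0.20 wide, so small differences between models on those subtasks should be treated with caution.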
Subtasks and evaluation results
- winogrande_tr
  - Accuracy (acc): 0.5371248025276462
  - Accuracy standard error (acc_stderr): 0.01401923436016685
- truthfulqa_v0.2
  - Accuracy (acc): 0.46923339264492214
  - Accuracy standard error (acc_stderr): 0.015681512687621395
- mmlu_tr_v0.2
  - Accuracy (acc): 0.3208607557494639
  - Accuracy standard error (acc_stderr): 0.003990297141110835
- mmlu_humanities_v0.2
  - Accuracy (acc): 0.294693691641995
  - Accuracy standard error (acc_stderr): 0.00685375354614323
- mmlu_formal_logic_v0.2
  - Accuracy (acc): 0.2619047619047619
  - Accuracy standard error (acc_stderr): 0.03932537680392871
- mmlu_high_school_european_history_v0.2
  - Accuracy (acc): 0.28
  - Accuracy standard error (acc_stderr): 0.03678342200269278
- mmlu_high_school_us_history_v0.2
  - Accuracy (acc): 0.29608938547486036
  - Accuracy standard error (acc_stderr): 0.03421843754304871
- mmlu_high_school_world_history_v0.2
  - Accuracy (acc): 0.3004694835680751
  - Accuracy standard error (acc_stderr): 0.03148731198588209
- mmlu_international_law_v0.2
  - Accuracy (acc): 0.39669421487603307
  - Accuracy standard error (acc_stderr): 0.044658697805310094
- mmlu_jurisprudence_v0.2
  - Accuracy (acc): 0.33962264150943394
  - Accuracy standard error (acc_stderr): 0.046216787599682646
- mmlu_logical_fallacies_v0.2
  - Accuracy (acc): 0.37888198757763975
  - Accuracy standard error (acc_stderr): 0.03835120818393935
- mmlu_moral_disputes_v0.2
  - Accuracy (acc): 0.3538961038961039
  - Accuracy standard error (acc_stderr): 0.027291027241446268
- mmlu_moral_scenarios_v0.2
  - Accuracy (acc): 0.2408256880733945
  - Accuracy standard error (acc_stderr): 0.014488154868754013
- mmlu_philosophy_v0.2
  - Accuracy (acc): 0.3277591973244147
  - Accuracy standard error (acc_stderr): 0.027191411117655603
- mmlu_prehistory_v0.2
  - Accuracy (acc): 0.36
  - Accuracy standard error (acc_stderr): 0.027759116734379554
- mmlu_professional_law_v0.2
  - Accuracy (acc): 0.2680115273775216
  - Accuracy standard error (acc_stderr): 0.011892978321681123
- mmlu_world_religions_v0.2
  - Accuracy (acc): 0.35714285714285715
  - Accuracy standard error (acc_stderr): 0.037078314653891886
- mmlu_other_v0.2
  - Accuracy (acc): 0.3487060384870604
  - Accuracy standard error (acc_stderr): 0.008618664806708944
- mmlu_business_ethics_v0.2
  - Accuracy (acc): 0.3939393939393939
  - Accuracy standard error (acc_stderr): 0.04935824351078519
- mmlu_clinical_knowledge_v0.2
  - Accuracy (acc): 0.3046875
  - Accuracy standard error (acc_stderr): 0.0288235352734838
- mmlu_college_medicine_v0.2
  - Accuracy (acc): 0.3333333333333333
  - Accuracy standard error (acc_stderr): 0.03647837701097217
- mmlu_global_facts_v0.2
  - Accuracy (acc): 0.3469387755102041
  - Accuracy standard error (acc_stderr): 0.04833007873885538
- mmlu_human_aging_v0.2
  - Accuracy (acc): 0.3160377358490566
  - Accuracy standard error (acc_stderr): 0.03200695165667737
- mmlu_management_v0.2
  - Accuracy (acc): 0.42424242424242425
  - Accuracy standard error (acc_stderr): 0.04992451339684325
- mmlu_marketing_v0.2
  - Accuracy (acc): 0.4976958525345622
  - Accuracy standard error (acc_stderr): 0.03402032963187416
- mmlu_medical_genetics_v0.2
  - Accuracy (acc): 0.30526315789473685
  - Accuracy standard error (acc_stderr): 0.047498887145627756
- mmlu_miscellaneous_v0.2
  - Accuracy (acc): 0.38642297650130547
  - Accuracy standard error (acc_stderr): 0.017604970322138192
- mmlu_nutrition_v0.2
  - Accuracy (acc): 0.3377049180327869
  - Accuracy standard error (acc_stderr): 0.027124245464236005
- mmlu_professional_accounting_v0.2
  - Accuracy (acc): 0.26881720430107525
  - Accuracy standard error (acc_stderr): 0.02659004756292602
- mmlu_professional_medicine_v0.2
  - Accuracy (acc): 0.2413793103448276
  - Accuracy standard error (acc_stderr): 0.026538458224468438
- mmlu_virology_v0.2
  - Accuracy (acc): 0.3836477987421384
  - Accuracy standard error (acc_stderr): 0.038685862827055204
- mmlu_social_sciences_v0.2
  - Accuracy (acc): 0.35764235764235763
  - Accuracy standard error (acc_stderr): 0.008726922101163874
- mmlu_econometrics_v0.2
  - Accuracy (acc): 0.3157894736842105
  - Accuracy standard error (acc_stderr): 0.04372748290278007
- mmlu_high_school_geography_v0.2
  - Accuracy (acc): 0.4010152284263959
  - Accuracy standard error (acc_stderr): 0.03500743470573262
- mmlu_high_school_government_and_politics_v0.2
  - Accuracy (acc): 0.33689839572192515
  - Accuracy standard error (acc_stderr): 0.03465636737116506
- mmlu_high_school_macroeconomics_v0.2
  - Accuracy (acc): 0.32051282051282054
  - Accuracy standard error (acc_stderr): 0.02366129639396428
- mmlu_high_school_microeconomics_v0.2
  - Accuracy (acc): 0.3333333333333333
  - Accuracy standard error (acc_stderr): 0.030685820596610805
- mmlu_high_school_psychology_v0.2
  - Accuracy (acc): 0.38649155722326456
  - Accuracy standard error (acc_stderr): 0.02111176103014589
- mmlu_human_sexuality_v0.2
  - Accuracy (acc): 0.3739130434782609
  - Accuracy standard error (acc_stderr): 0.04531585828644964
- mmlu_professional_psychology_v0.2
  - Accuracy (acc): 0.3114478114478115
  - Accuracy standard error (acc_stderr): 0.019016637438856367
- mmlu_public_relations_v0.2
  - Accuracy (acc): 0.3611111111111111
  - Accuracy standard error (acc_stderr): 0.04643454608906275
- mmlu_security_studies_v0.2
  - Accuracy (acc): 0.38461538461538464
  - Accuracy standard error (acc_stderr): 0.031871953479424654
- mmlu_sociology_v0.2
  - Accuracy (acc): 0.40512820512820513
  - Accuracy standard error (acc_stderr): 0.03524577495610961
- mmlu_us_foreign_policy_v0.2
  - Accuracy (acc): 0.5050505050505051
  - Accuracy standard error (acc_stderr): 0.05050505050505048
- mmlu_stem_v0.2
  - Accuracy (acc): 0.2953451043338684
  - Accuracy standard error (acc_stderr): 0.008149804347491315
- mmlu_abstract_algebra_v0.2
  - Accuracy (acc): 0.31
  - Accuracy standard error (acc_stderr): 0.04648231987117316
- mmlu_anatomy_v0.2
  - Accuracy (acc): 0.3816793893129771
  - Accuracy standard error (acc_stderr): 0.0426073515764456
- mmlu_astronomy
  - Accuracy (acc): 0.2847682119205298
  - Accuracy standard error (acc_stderr): 0.03684881521389024
- mmlu_college_biology_v0.2
  - Accuracy (acc): 0.3028169014084507
  - Accuracy standard error (acc_stderr): 0.038694917439795515
- mmlu_college_chemistry_v0.2
  - Accuracy (acc):



