OpenLLMTurkishLeadboardv2/details_Trendyol__Trendyol-LLM-7b-chat-v1.0
Dataset Overview
This dataset was created automatically during the evaluation run of the model Trendyol/Trendyol-LLM-7b-chat-v1.0 on the Open LLM Turkish Leaderboard v0.2.
Evaluation Results
Accuracy (Acc) and Standard Error (Acc_stderr)
-
winogrande_tr:
- Acc: 0.5639810426540285
- Acc_stderr: 0.013942468639504206
-
truthfulqa_v0.2:
- Acc: 0.431264994699023
- Acc_stderr: 0.01584776908566045
-
mmlu_tr_v0.2:
- Acc: 0.3948827922798196
- Acc_stderr: 0.0041515196850484635
-
mmlu_humanities_v0.2:
- Acc: 0.37144158506035074
- Acc_stderr: 0.0072290666516993994
-
mmlu_formal_logic_v0.2:
- Acc: 0.31746031746031744
- Acc_stderr: 0.04163453031302859
-
mmlu_high_school_european_history_v0.2:
- Acc: 0.4533333333333333
- Acc_stderr: 0.040782795278808064
-
mmlu_high_school_us_history_v0.2:
- Acc: 0.39664804469273746
- Acc_stderr: 0.03666722301252672
-
mmlu_high_school_world_history_v0.2:
- Acc: 0.4835680751173709
- Acc_stderr: 0.03432159174112686
-
mmlu_international_law_v0.2:
- Acc: 0.5289256198347108
- Acc_stderr: 0.04556710331269498
-
mmlu_jurisprudence_v0.2:
- Acc: 0.39622641509433965
- Acc_stderr: 0.047732492983673595
-
mmlu_logical_fallacies_v0.2:
- Acc: 0.37267080745341613
- Acc_stderr: 0.03822525970525206
-
mmlu_moral_disputes_v0.2:
- Acc: 0.42207792207792205
- Acc_stderr: 0.028187838402155125
-
mmlu_moral_scenarios_v0.2:
- Acc: 0.31880733944954126
- Acc_stderr: 0.015790288247596724
-
mmlu_philosophy_v0.2:
- Acc: 0.4414715719063545
- Acc_stderr: 0.028765099513410327
-
mmlu_prehistory_v0.2:
- Acc: 0.39666666666666667
- Acc_stderr: 0.028291496425144964
-
mmlu_professional_law_v0.2:
- Acc: 0.31412103746397696
- Acc_stderr: 0.012463327930569887
-
mmlu_world_religions_v0.2:
- Acc: 0.5238095238095238
- Acc_stderr: 0.038647269200068196
-
mmlu_other_v0.2:
- Acc: 0.4455872594558726
- Acc_stderr: 0.008953386445254787
-
mmlu_business_ethics_v0.2:
- Acc: 0.48484848484848486
- Acc_stderr: 0.050484431990002604
-
mmlu_clinical_knowledge_v0.2:
- Acc: 0.41015625
- Acc_stderr: 0.030801585176036275
-
mmlu_college_medicine_v0.2:
- Acc: 0.4226190476190476
- Acc_stderr: 0.03822500265005227
-
mmlu_global_facts_v0.2:
- Acc: 0.35714285714285715
- Acc_stderr: 0.048651065269982086
-
mmlu_human_aging_v0.2:
- Acc: 0.4339622641509434
- Acc_stderr: 0.0341198763105892
-
mmlu_management_v0.2:
- Acc: 0.5656565656565656
- Acc_stderr: 0.05007027870966083
-
mmlu_marketing_v0.2:
- Acc: 0.5529953917050692
- Acc_stderr: 0.03382905613755031
-
mmlu_medical_genetics_v0.2:
- Acc: 0.5157894736842106
- Acc_stderr: 0.05154534179593067
-
mmlu_miscellaneous_v0.2:
- Acc: 0.5287206266318538
- Acc_stderr: 0.018047690113669693
-
mmlu_nutrition_v0.2:
- Acc: 0.3770491803278688
- Acc_stderr: 0.02779643435707082
-
mmlu_professional_accounting_v0.2:
- Acc: 0.31899641577060933
- Acc_stderr: 0.02795407992616445
-
mmlu_professional_medicine_v0.2:
- Acc: 0.3486590038314176
- Acc_stderr: 0.029554116131305524
-
mmlu_virology_v0.2:
- Acc: 0.42138364779874216
- Acc_stderr: 0.039283090474869206
-
mmlu_social_sciences_v0.2:
- Acc: 0.4368964368964369
- Acc_stderr: 0.008983743720354284
-
mmlu_econometrics_v0.2:
- Acc: 0.32456140350877194
- Acc_stderr: 0.04404556157374768
-
mmlu_high_school_geography_v0.2:
- Acc: 0.5177664974619289
- Acc_stderr: 0.03569173227650389
-
mmlu_high_school_government_and_politics_v0.2:
- Acc: 0.37433155080213903
- Acc_stderr: 0.03548492341343032
-
mmlu_high_school_macroeconomics_v0.2:
- Acc: 0.3769230769230769
- Acc_stderr: 0.024570975364225995
-
mmlu_high_school_microeconomics_v0.2:
- Acc: 0.41350210970464135
- Acc_stderr: 0.03205649904851858
-
mmlu_high_school_psychology_v0.2:
- Acc: 0.49530956848030017
- Acc_stderr: 0.02167679538974159
-
mmlu_human_sexuality_v0.2:
- Acc: 0.5130434782608696
- Acc_stderr: 0.04681335351503156
-
mmlu_professional_psychology_v0.2:
- Acc: 0.37542087542087543
- Acc_stderr: 0.019884999970068064
-
mmlu_public_relations_v0.2:
- Acc: 0.5277777777777778
- Acc_stderr: 0.04826217294139894
-
mmlu_security_studies_v0.2:
- Acc: 0.4230769230769231
- Acc_stderr: 0.032366121762202014
-
mmlu_sociology_v0.2:
- Acc: 0.49743589743589745
- Acc_stderr: 0.03589743589743589
-
mmlu_us_foreign_policy_v0.2:
- Acc: 0.5959595959595959
- Acc_stderr: 0.049568727380426184
-
mmlu_stem_v0.2:
- Acc: 0.33836276083467093
- Acc_stderr: 0.008421487382708258
-
mmlu_abstract_algebra_v0.2:
- Acc: 0.27
- Acc_stderr: 0.0446196043338474
-
mmlu_anatomy_v0.2:
- Acc: 0.37404580152671757
- Acc_stderr: 0.04243869242230524
-
mmlu_astronomy:
- Acc: 0.4370860927152318
- Acc_stderr: 0.04050035722230636
-
mmlu_college_biology_v0.2:
- Acc: 0.39436619718309857
- Acc_stderr: 0.04115715424330713
-
mmlu_college_chemistry_v0.2:
- Acc: 0.3434343434343434
- Acc_stderr: 0.0479675905875748
-
mmlu_college_computer_science_v0.2:
- Acc: 0.3333333333333333
- Acc_stderr: 0.04761904761904759
-
mmlu_college_mathematics_v0.2:
- Acc: 0.36
- Acc_stderr: 0.048241815132442176
-
mmlu_college_physics_v0.2:
- Acc: 0.37623762376237624
- Acc_stderr: 0.048444078505841884
-
mmlu_computer_security_v0.2:
- Acc: 0.43
- Acc_stderr: 0.049756985195624284
-
mmlu_conceptual_physics_v0.2:
- Acc: 0.3218884120171674
- Acc_stderr: 0.03067321238265887
-
mmlu_electrical_engineering_v0.2:
- Acc: 0.4236111111111111
- Acc_stderr: 0.0413212501972337
-
mmlu_elementary_mathematics_v0.2:
- Acc: 0.3136729222520107
- Acc_stderr: 0.024056509418958694
-
mmlu_high_school_biology_v0.2:
- Acc: 0.41333333333333333
- Acc_stderr: 0.028478055207315847
-
mmlu_high_school_chemistry_v0.2:
- Acc: 0.28426395939086296
- Acc_stderr: 0
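
The Acc and Acc_stderr pairs above can be read together as an approximate confidence interval. The following is a minimal sketch, not part of the leaderboard tooling: the helper names `confidence_interval` and `implied_sample_size` are hypothetical, the interval assumes a normal approximation, and the sample-size estimate assumes the standard error was computed as the sample standard deviation divided by sqrt(n), which the card itself does not state.

```python
import math

def confidence_interval(acc: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate two-sided interval acc +/- z * stderr (normal approximation)."""
    margin = z * stderr
    return acc - margin, acc + margin

def implied_sample_size(acc: float, stderr: float) -> float:
    """Estimate n, ASSUMING stderr = sqrt(acc * (1 - acc) / (n - 1)).

    This formula is an assumption about how the scores were produced.
    """
    return acc * (1.0 - acc) / (stderr ** 2) + 1.0

# Values copied from the winogrande_tr and mmlu_sociology_v0.2 entries above.
winogrande = (0.5639810426540285, 0.013942468639504206)
sociology = (0.49743589743589745, 0.03589743589743589)

low, high = confidence_interval(*winogrande)
print(f"winogrande_tr 95% interval: [{low:.4f}, {high:.4f}]")
print(f"mmlu_sociology_v0.2 implied n: {implied_sample_size(*sociology):.1f}")
```

Subtasks with few questions have wide intervals, so small differences between their accuracies should be interpreted with care.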



