OpenLLMTurkishLeadboardv2/details_Trendyol__Trendyol-LLM-7b-chat-dpo-v1.0
Dataset Overview
This dataset was created automatically during the evaluation of the model Trendyol/Trendyol-LLM-7b-chat-dpo-v1.0 on the Open LLM Turkish Leaderboard v0.2.
Evaluation Results
Accuracy (Acc) and standard error (Acc_stderr)
- Winogrande_tr:
  - Acc: 0.5766
  - Acc_stderr: 0.0139
- TruthfulQA_v0.2:
  - Acc: 0.4619
  - Acc_stderr: 0.0162
- MMLU_tr_v0.2:
  - Acc: 0.3961
  - Acc_stderr: 0.0041
- MMLU_humanities_v0.2:
  - Acc: 0.3701
  - Acc_stderr: 0.0072
- MMLU_formal_logic_v0.2:
  - Acc: 0.2698
  - Acc_stderr: 0.0397
- MMLU_high_school_european_history_v0.2:
  - Acc: 0.4267
  - Acc_stderr: 0.0405
- MMLU_high_school_us_history_v0.2:
  - Acc: 0.4302
  - Acc_stderr: 0.0371
- MMLU_high_school_world_history_v0.2:
  - Acc: 0.4554
  - Acc_stderr: 0.0342
- MMLU_international_law_v0.2:
  - Acc: 0.4959
  - Acc_stderr: 0.0456
- MMLU_jurisprudence_v0.2:
  - Acc: 0.3962
  - Acc_stderr: 0.0477
- MMLU_logical_fallacies_v0.2:
  - Acc: 0.3851
  - Acc_stderr: 0.0385
- MMLU_moral_disputes_v0.2:
  - Acc: 0.4286
  - Acc_stderr: 0.0282
- MMLU_moral_scenarios_v0.2:
  - Acc: 0.2993
  - Acc_stderr: 0.0155
- MMLU_philosophy_v0.2:
  - Acc: 0.4916
  - Acc_stderr: 0.0290
- MMLU_prehistory_v0.2:
  - Acc: 0.4033
  - Acc_stderr: 0.0284
- MMLU_professional_law_v0.2:
  - Acc: 0.3156
  - Acc_stderr: 0.0125
- MMLU_world_religions_v0.2:
  - Acc: 0.5357
  - Acc_stderr: 0.0386
- MMLU_other_v0.2:
  - Acc: 0.4502
  - Acc_stderr: 0.0089
- MMLU_business_ethics_v0.2:
  - Acc: 0.4747
  - Acc_stderr: 0.0504
- MMLU_clinical_knowledge_v0.2:
  - Acc: 0.4492
  - Acc_stderr: 0.0311
- MMLU_college_medicine_v0.2:
  - Acc: 0.3988
  - Acc_stderr: 0.0379
- MMLU_global_facts_v0.2:
  - Acc: 0.3061
  - Acc_stderr: 0.0468
- MMLU_human_aging_v0.2:
  - Acc: 0.4575
  - Acc_stderr: 0.0343
- MMLU_management_v0.2:
  - Acc: 0.5152
  - Acc_stderr: 0.0505
- MMLU_marketing_v0.2:
  - Acc: 0.6083
  - Acc_stderr: 0.0332
- MMLU_medical_genetics_v0.2:
  - Acc: 0.4211
  - Acc_stderr: 0.0509
- MMLU_miscellaneous_v0.2:
  - Acc: 0.5431
  - Acc_stderr: 0.0180
- MMLU_nutrition_v0.2:
  - Acc: 0.3934
  - Acc_stderr: 0.0280
- MMLU_professional_accounting_v0.2:
  - Acc: 0.3333
  - Acc_stderr: 0.0283
- MMLU_professional_medicine_v0.2:
  - Acc: 0.3372
  - Acc_stderr: 0.0293
- MMLU_virology_v0.2:
  - Acc: 0.3836
  - Acc_stderr: 0.0387
- MMLU_social_sciences_v0.2:
  - Acc: 0.4312
  - Acc_stderr: 0.0090
- MMLU_econometrics_v0.2:
  - Acc: 0.3421
  - Acc_stderr: 0.0446
- MMLU_high_school_geography_v0.2:
  - Acc: 0.5076
  - Acc_stderr: 0.0357
- MMLU_high_school_government_and_politics_v0.2:
  - Acc: 0.3476
  - Acc_stderr: 0.0349
- MMLU_high_school_macroeconomics_v0.2:
  - Acc: 0.3846
  - Acc_stderr: 0.0247
- MMLU_high_school_microeconomics_v0.2:
  - Acc: 0.4262
  - Acc_stderr: 0.0322
- MMLU_high_school_psychology_v0.2:
  - Acc: 0.4765
  - Acc_stderr: 0.0217
- MMLU_human_sexuality_v0.2:
  - Acc: 0.5391
  - Acc_stderr: 0.0467
- MMLU_professional_psychology_v0.2:
  - Acc: 0.3586
  - Acc_stderr: 0.0197
- MMLU_public_relations_v0.2:
  - Acc: 0.4815
  - Acc_stderr: 0.0483
- MMLU_security_studies_v0.2:
  - Acc: 0.4231
  - Acc_stderr: 0.0324
- MMLU_sociology_v0.2:
  - Acc: 0.5231
  - Acc_stderr: 0.0359
- MMLU_us_foreign_policy_v0.2:
  - Acc: 0.5859
  - Acc_stderr: 0.0498
- MMLU_stem_v0.2:
  - Acc: 0.3467
  - Acc_stderr: 0.0084
- MMLU_abstract_algebra_v0.2:
  - Acc: 0.3100
  - Acc_stderr: 0.0465
- MMLU_anatomy_v0.2:
  - Acc: 0.4351
  - Acc_stderr: 0.0435
- MMLU_astronomy:
  - Acc: 0.4305
  - Acc_stderr: 0.0404
- MMLU_college_biology_v0.2:
  - Acc: 0.3944
  - Acc_stderr: 0.0412
- MMLU_college_chemistry_v0.2:
  - Acc: 0.2727
  - Acc_stderr: 0.0450
- MMLU_college_computer_science_v0.2:
  - Acc: 0.2727
  - Acc_stderr: 0.0450
- MMLU_college_mathematics_v0.2:
  - Acc: 0.3100
  - Acc_stderr: 0.0465
- MMLU_college_physics_v0.2:
  - Acc: 0.3267
  - Acc_stderr: 0.0469
- MMLU_computer_security_v0.2:
  - Acc: 0.4700
  - Acc_stderr: 0.0502
- MMLU_conceptual_physics_v0.2:
  - Acc: 0.2790
  - Acc_stderr: 0.0294
- MMLU_electrical_engineering_v0.2:
  - Acc: 0.4861
  - Acc_stderr: 0.0418
- MMLU_elementary_mathematics_v0.2:
  - Acc: 0.3432
  - Acc_stderr: 0.0246
- MMLU_high_school_biology_v0.2:
  - Acc: 0.4367
  - Acc_stderr: 0.0287
- MMLU_high_school_chemistry_v0.2:
  - Acc: 0.3503
  - Acc_stderr: 0.0341
- MMLU_high_school_computer_science_v0.2:
  - Acc: 0.4800
  - Acc_stderr: 0.0502
- MMLU_high_school_mathematics_v0.2:
  - Acc: 0.2667
  - Acc_stderr: 0.0270
- MMLU_high_school_physics_v0.2:
  - Acc: 0.2993
  - Acc_stderr: 0.0379
- MMLU_high_school_statistics_v0.2:
  - Acc: 0.2037
  - Acc_stderr: 0.0275
- MMLU_machine_learning_v0.2:
  - Acc: 0.3125
  - Acc_stderr: 0.0440
- Hellaswag_tr-v0.2:
  - Acc: 0.3799
  - Acc_stderr: 0.0052
  - Acc_norm: 0.4629
  - Acc_norm_stderr: 0.0053
- Gsm8k_tr-v0.2:
  - Exact_match_strict-match: 0.0714
  - Exact_match_stderr_strict-match: 0.0071
  - Exact_match_flexible-extract: 0.0213
  - Exact_match_stderr_flexible-extract: 0.0040
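Each score above is reported together with its standard error, which can be turned into an approximate confidence interval. The sketch below is illustrative only and is not part of the leaderboard tooling: it assumes the usual normal approximation (score ± 1.96 × stderr for a 95% interval), and the helper name `confidence_interval` and the `results` dictionary are hypothetical, with values copied from the list above.

```python
from typing import Dict, Tuple

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.96


def confidence_interval(acc: float, stderr: float, z: float = Z_95) -> Tuple[float, float]:
    """Normal-approximation interval acc +/- z * stderr, clipped to [0, 1]."""
    low = max(0.0, acc - z * stderr)
    high = min(1.0, acc + z * stderr)
    return low, high


# (Acc, Acc_stderr) pairs copied from the results listed above.
results: Dict[str, Tuple[float, float]] = {
    "Winogrande_tr": (0.5766, 0.0139),
    "MMLU_tr_v0.2": (0.3961, 0.0041),
    "MMLU_high_school_statistics_v0.2": (0.2037, 0.0275),
}

for task, (acc, stderr) in results.items():
    low, high = confidence_interval(acc, stderr)
    print(f"{task}: {acc:.4f}  95% CI [{low:.4f}, {high:.4f}]")
```

Subtasks with large standard errors (many of the individual MMLU subjects are around 0.04 to 0.05) give wide intervals, so small accuracy differences between such subtasks should be read with caution.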



