OpenLLMTurkishLeadboardv2/details_aerdincdal__CBDDO-LLM-8B-Instruct-v1
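Dataset overview
This dataset was created automatically during the evaluation run of the model aerdincdal/CBDDO-LLM-8B-Instruct-v1 on the Open LLM Turkish Leaderboard v0.2.
Evaluation results
Accuracy (Acc) and standard error (Acc_stderr)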
- Winogrande_tr_v0.2:
  - Accuracy: 0.5379
  - Standard error: 0.0140
- TruthfulQA_v0.2:
  - Accuracy: 0.4657
  - Standard error: 0.0151
- MMLU_tr_v0.2:
  - Accuracy: 0.4370
  - Standard error: 0.0042
- MMLU_humanities_v0.2:
  - Accuracy: 0.4086
  - Standard error: 0.0072
- MMLU_formal_logic_v0.2:
  - Accuracy: 0.3175
  - Standard error: 0.0416
- MMLU_high_school_european_history_v0.2:
  - Accuracy: 0.5800
  - Standard error: 0.0404
- MMLU_high_school_us_history_v0.2:
  - Accuracy: 0.5754
  - Standard error: 0.0370
- MMLU_high_school_world_history_v0.2:
  - Accuracy: 0.5634
  - Standard error: 0.0341
- MMLU_international_law_v0.2:
  - Accuracy: 0.5702
  - Standard error: 0.0452
- MMLU_jurisprudence_v0.2:
  - Accuracy: 0.5660
  - Standard error: 0.0484
- MMLU_logical_fallacies_v0.2:
  - Accuracy: 0.4224
  - Standard error: 0.0390
- MMLU_moral_disputes_v0.2:
  - Accuracy: 0.4805
  - Standard error: 0.0285
- MMLU_moral_scenarios_v0.2:
  - Accuracy: 0.2936
  - Standard error: 0.0154
- MMLU_philosophy_v0.2:
  - Accuracy: 0.4950
  - Standard error: 0.0290
- MMLU_prehistory_v0.2:
  - Accuracy: 0.4733
  - Standard error: 0.0289
- MMLU_professional_law_v0.2:
  - Accuracy: 0.3235
  - Standard error: 0.0126
- MMLU_world_religions_v0.2:
  - Accuracy: 0.6190
  - Standard error: 0.0376
- MMLU_other_v0.2:
  - Accuracy: 0.4917
  - Standard error: 0.0089
- MMLU_business_ethics_v0.2:
  - Accuracy: 0.5354
  - Standard error: 0.0504
- MMLU_clinical_knowledge_v0.2:
  - Accuracy: 0.4844
  - Standard error: 0.0313
- MMLU_college_medicine_v0.2:
  - Accuracy: 0.3988
  - Standard error: 0.0379
- MMLU_global_facts_v0.2:
  - Accuracy: 0.3469
  - Standard error: 0.0483
- MMLU_human_aging_v0.2:
  - Accuracy: 0.4953
  - Standard error: 0.0344
- MMLU_management_v0.2:
  - Accuracy: 0.5253
  - Standard error: 0.0504
- MMLU_marketing_v0.2:
  - Accuracy: 0.6359
  - Standard error: 0.0327
- MMLU_medical_genetics_v0.2:
  - Accuracy: 0.5789
  - Standard error: 0.0509
- MMLU_miscellaneous_v0.2:
  - Accuracy: 0.5849
  - Standard error: 0.0178
- MMLU_nutrition_v0.2:
  - Accuracy: 0.4984
  - Standard error: 0.0287
- MMLU_professional_accounting_v0.2:
  - Accuracy: 0.2760
  - Standard error: 0.0268
- MMLU_professional_medicine_v0.2:
  - Accuracy: 0.4291
  - Standard error: 0.0307
- MMLU_virology_v0.2:
  - Accuracy: 0.4088
  - Standard error: 0.0391
- MMLU_social_sciences_v0.2:
  - Accuracy: 0.4852
  - Standard error: 0.0090
- MMLU_econometrics_v0.2:
  - Accuracy: 0.3421
  - Standard error: 0.0446
- MMLU_high_school_geography_v0.2:
  - Accuracy: 0.5838
  - Standard error: 0.0352
- MMLU_high_school_government_and_politics_v0.2:
  - Accuracy: 0.5080
  - Standard error: 0.0367
- MMLU_high_school_macroeconomics_v0.2:
  - Accuracy: 0.4308
  - Standard error: 0.0251
- MMLU_high_school_microeconomics_v0.2:
  - Accuracy: 0.3755
  - Standard error: 0.0315
- MMLU_high_school_psychology_v0.2:
  - Accuracy: 0.5385
  - Standard error: 0.0216
- MMLU_human_sexuality_v0.2:
  - Accuracy: 0.5391
  - Standard error: 0.0467
- MMLU_professional_psychology_v0.2:
  - Accuracy: 0.4040
  - Standard error: 0.0202
- MMLU_public_relations_v0.2:
  - Accuracy: 0.5370
  - Standard error: 0.0482
- MMLU_security_studies_v0.2:
  - Accuracy: 0.4658
  - Standard error: 0.0327
- MMLU_sociology_v0.2:
  - Accuracy: 0.6615
  - Standard error: 0.0340
- MMLU_us_foreign_policy_v0.2:
  - Accuracy: 0.6667
  - Standard error: 0.0476
- MMLU_stem_v0.2:
  - Accuracy: 0.3778
  - Standard error: 0.0085
- MMLU_abstract_algebra_v0.2:
  - Accuracy: 0.2800
  - Standard error: 0.0451
- MMLU_anatomy_v0.2:
  - Accuracy: 0.4733
  - Standard error: 0.0438
- MMLU_astronomy:
  - Accuracy: 0.4371
  - Standard error: 0.0405
- MMLU_college_biology_v0.2:
  - Accuracy: 0.5070
  - Standard error: 0.0421
- MMLU_college_chemistry_v0.2:
  - Accuracy: 0.2828
  - Standard error: 0.0455
- MMLU_college_computer_science_v0.2:
  - Accuracy: 0.3636
  - Standard error: 0.0486
- MMLU_college_mathematics_v0.2:
  - Accuracy: 0.3800
  - Standard error: 0.0488
- MMLU_college_physics_v0.2:
  - Accuracy: 0.2574
  - Standard error: 0.0437
- MMLU_computer_security_v0.2:
  - Accuracy: 0.5400
  - Standard error: 0.0501
- MMLU_conceptual_physics_v0.2:
  - Accuracy: 0.3820
  - Standard error: 0.0319
- MMLU_electrical_engineering_v0.2:
  - Accuracy: 0.4861
  - Standard error: 0.0418
- MMLU_elementary_mathematics_v0.2:
  - Accuracy: 0.2761
  - Standard error: 0.0232
- MMLU_high_school_biology_v0.2:
  - Accuracy: 0.5100
  - Standard error: 0.0289
- MMLU_high_school_chemistry_v0.2:
  - Accuracy: 0.4061
  - Standard error: 0.0351
- MMLU_high_school_computer_science_v0.2:
  - Accuracy: 0.5100
  - Standard error: 0.0502
- MMLU_high_school_mathematics_v0.2:
  - Accuracy: 0.2741
  - Standard error: 0.0272
- MMLU_high_school_physics_v0.2:
  - Accuracy: 0.3129
  - Standard error: 0.0384
- MMLU_high_school_statistics_v0.2:
  - Accuracy: 0.2593
  - Standard error: 0.0299
- MMLU_machine_learning_v0.2:
  - Accuracy: 0.4018
  - Standard error: 0.0465
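- Hellaswag_tr-v0.2:
  - Accuracy: 0.3695
  - Standard error: 0.0051
- Gsm8k_tr-v0.2:
  - Accuracy (strict match): 0.0091
  - Standard error (strict match): 0.0026
  - Accuracy (flexible extract): 0.1807
  - Standard error (flexible extract): 0.0106
- Arc_tr-v0.2:
  - Accuracy: 0.3797
  - Standard error: 0.0142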
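To make the accuracy and standard-error pairs above easier to interpret, the following is a minimal sketch of how a reader might turn them into approximate 95% confidence intervals. It assumes a normal approximation (accuracy ± 1.96 × standard error), and the sample-size estimate further assumes the reported standard error is roughly sqrt(acc × (1 − acc) / n). Neither assumption is stated in the dataset card, and the function names `confidence_interval` and `implied_sample_size` are hypothetical helpers introduced only for this illustration.

```python
def confidence_interval(acc, stderr, z=1.96):
    """Normal-approximation interval acc +/- z * stderr, clipped to [0, 1]."""
    half_width = z * stderr
    return max(0.0, acc - half_width), min(1.0, acc + half_width)


def implied_sample_size(acc, stderr):
    """Rough n, ASSUMING stderr ~= sqrt(acc * (1 - acc) / n) (binomial)."""
    return round(acc * (1.0 - acc) / stderr ** 2)


# A few (accuracy, standard error) pairs copied from the results list.
results = {
    "Winogrande_tr_v0.2": (0.5379, 0.0140),
    "MMLU_tr_v0.2": (0.4370, 0.0042),
    "Arc_tr-v0.2": (0.3797, 0.0142),
}

for task, (acc, stderr) in results.items():
    low, high = confidence_interval(acc, stderr)
    n = implied_sample_size(acc, stderr)
    print(f"{task}: acc={acc:.4f}, 95% CI ~ [{low:.4f}, {high:.4f}], n ~ {n}")
```

Subtasks with large standard errors (for example, values near 0.05) yield wide intervals, so differences between individual MMLU subtasks should be read with caution.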
Dataset creation background
This dataset was created automatically during the evaluation run of the model aerdincdal/CBDDO-LLM-8B-Instruct-v1 on the Open LLM Turkish Leaderboard v0.2.



