
OpenLLMTurkishLeadboardv2/details_Trendyol__Trendyol-LLM-7b-chat-dpo-v1.0

Source: Hugging Face. Updated 2024-04-27. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/OpenLLMTurkishLeadboardv2/details_Trendyol__Trendyol-LLM-7b-chat-dpo-v1.0
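Resource summary:
This dataset was created automatically during the evaluation of the model Trendyol/Trendyol-LLM-7b-chat-dpo-v1.0 on the Open LLM Turkish Leaderboard v0.2. It contains the model's evaluation results across multiple tasks, including detailed figures such as accuracy and standard error.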
Provider:
OpenLLMTurkishLeadboardv2
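Summary of original information

Dataset overview

The dataset was created automatically during the evaluation of the model Trendyol/Trendyol-LLM-7b-chat-dpo-v1.0 on the Open LLM Turkish Leaderboard v0.2.

Evaluation results

Accuracy (Acc) and standard error (Acc_stderr)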

  • Winogrande_tr:

    • Acc: 0.5766
    • Acc_stderr: 0.0139
  • TruthfulQA_v0.2:

    • Acc: 0.4619
    • Acc_stderr: 0.0162
  • MMLU_tr_v0.2:

    • Acc: 0.3961
    • Acc_stderr: 0.0041
  • MMLU_humanities_v0.2:

    • Acc: 0.3701
    • Acc_stderr: 0.0072
  • MMLU_formal_logic_v0.2:

    • Acc: 0.2698
    • Acc_stderr: 0.0397
  • MMLU_high_school_european_history_v0.2:

    • Acc: 0.4267
    • Acc_stderr: 0.0405
  • MMLU_high_school_us_history_v0.2:

    • Acc: 0.4302
    • Acc_stderr: 0.0371
  • MMLU_high_school_world_history_v0.2:

    • Acc: 0.4554
    • Acc_stderr: 0.0342
  • MMLU_international_law_v0.2:

    • Acc: 0.4959
    • Acc_stderr: 0.0456
  • MMLU_jurisprudence_v0.2:

    • Acc: 0.3962
    • Acc_stderr: 0.0477
  • MMLU_logical_fallacies_v0.2:

    • Acc: 0.3851
    • Acc_stderr: 0.0385
  • MMLU_moral_disputes_v0.2:

    • Acc: 0.4286
    • Acc_stderr: 0.0282
  • MMLU_moral_scenarios_v0.2:

    • Acc: 0.2993
    • Acc_stderr: 0.0155
  • MMLU_philosophy_v0.2:

    • Acc: 0.4916
    • Acc_stderr: 0.0290
  • MMLU_prehistory_v0.2:

    • Acc: 0.4033
    • Acc_stderr: 0.0284
  • MMLU_professional_law_v0.2:

    • Acc: 0.3156
    • Acc_stderr: 0.0125
  • MMLU_world_religions_v0.2:

    • Acc: 0.5357
    • Acc_stderr: 0.0386
  • MMLU_other_v0.2:

    • Acc: 0.4502
    • Acc_stderr: 0.0089
  • MMLU_business_ethics_v0.2:

    • Acc: 0.4747
    • Acc_stderr: 0.0504
  • MMLU_clinical_knowledge_v0.2:

    • Acc: 0.4492
    • Acc_stderr: 0.0311
  • MMLU_college_medicine_v0.2:

    • Acc: 0.3988
    • Acc_stderr: 0.0379
  • MMLU_global_facts_v0.2:

    • Acc: 0.3061
    • Acc_stderr: 0.0468
  • MMLU_human_aging_v0.2:

    • Acc: 0.4575
    • Acc_stderr: 0.0343
  • MMLU_management_v0.2:

    • Acc: 0.5152
    • Acc_stderr: 0.0505
  • MMLU_marketing_v0.2:

    • Acc: 0.6083
    • Acc_stderr: 0.0332
  • MMLU_medical_genetics_v0.2:

    • Acc: 0.4211
    • Acc_stderr: 0.0509
  • MMLU_miscellaneous_v0.2:

    • Acc: 0.5431
    • Acc_stderr: 0.0180
  • MMLU_nutrition_v0.2:

    • Acc: 0.3934
    • Acc_stderr: 0.0280
  • MMLU_professional_accounting_v0.2:

    • Acc: 0.3333
    • Acc_stderr: 0.0283
  • MMLU_professional_medicine_v0.2:

    • Acc: 0.3372
    • Acc_stderr: 0.0293
  • MMLU_virology_v0.2:

    • Acc: 0.3836
    • Acc_stderr: 0.0387
  • MMLU_social_sciences_v0.2:

    • Acc: 0.4312
    • Acc_stderr: 0.0090
  • MMLU_econometrics_v0.2:

    • Acc: 0.3421
    • Acc_stderr: 0.0446
  • MMLU_high_school_geography_v0.2:

    • Acc: 0.5076
    • Acc_stderr: 0.0357
  • MMLU_high_school_government_and_politics_v0.2:

    • Acc: 0.3476
    • Acc_stderr: 0.0349
  • MMLU_high_school_macroeconomics_v0.2:

    • Acc: 0.3846
    • Acc_stderr: 0.0247
  • MMLU_high_school_microeconomics_v0.2:

    • Acc: 0.4262
    • Acc_stderr: 0.0322
  • MMLU_high_school_psychology_v0.2:

    • Acc: 0.4765
    • Acc_stderr: 0.0217
  • MMLU_human_sexuality_v0.2:

    • Acc: 0.5391
    • Acc_stderr: 0.0467
  • MMLU_professional_psychology_v0.2:

    • Acc: 0.3586
    • Acc_stderr: 0.0197
  • MMLU_public_relations_v0.2:

    • Acc: 0.4815
    • Acc_stderr: 0.0483
  • MMLU_security_studies_v0.2:

    • Acc: 0.4231
    • Acc_stderr: 0.0324
  • MMLU_sociology_v0.2:

    • Acc: 0.5231
    • Acc_stderr: 0.0359
  • MMLU_us_foreign_policy_v0.2:

    • Acc: 0.5859
    • Acc_stderr: 0.0498
  • MMLU_stem_v0.2:

    • Acc: 0.3467
    • Acc_stderr: 0.0084
  • MMLU_abstract_algebra_v0.2:

    • Acc: 0.3100
    • Acc_stderr: 0.0465
  • MMLU_anatomy_v0.2:

    • Acc: 0.4351
    • Acc_stderr: 0.0435
  • MMLU_astronomy:

    • Acc: 0.4305
    • Acc_stderr: 0.0404
  • MMLU_college_biology_v0.2:

    • Acc: 0.3944
    • Acc_stderr: 0.0412
  • MMLU_college_chemistry_v0.2:

    • Acc: 0.2727
    • Acc_stderr: 0.0450
  • MMLU_college_computer_science_v0.2:

    • Acc: 0.2727
    • Acc_stderr: 0.0450
  • MMLU_college_mathematics_v0.2:

    • Acc: 0.3100
    • Acc_stderr: 0.0465
  • MMLU_college_physics_v0.2:

    • Acc: 0.3267
    • Acc_stderr: 0.0469
  • MMLU_computer_security_v0.2:

    • Acc: 0.4700
    • Acc_stderr: 0.0502
  • MMLU_conceptual_physics_v0.2:

    • Acc: 0.2790
    • Acc_stderr: 0.0294
  • MMLU_electrical_engineering_v0.2:

    • Acc: 0.4861
    • Acc_stderr: 0.0418
  • MMLU_elementary_mathematics_v0.2:

    • Acc: 0.3432
    • Acc_stderr: 0.0246
  • MMLU_high_school_biology_v0.2:

    • Acc: 0.4367
    • Acc_stderr: 0.0287
  • MMLU_high_school_chemistry_v0.2:

    • Acc: 0.3503
    • Acc_stderr: 0.0341
  • MMLU_high_school_computer_science_v0.2:

    • Acc: 0.4800
    • Acc_stderr: 0.0502
  • MMLU_high_school_mathematics_v0.2:

    • Acc: 0.2667
    • Acc_stderr: 0.0270
  • MMLU_high_school_physics_v0.2:

    • Acc: 0.2993
    • Acc_stderr: 0.0379
  • MMLU_high_school_statistics_v0.2:

    • Acc: 0.2037
    • Acc_stderr: 0.0275
  • MMLU_machine_learning_v0.2:

    • Acc: 0.3125
    • Acc_stderr: 0.0440
  • Hellaswag_tr-v0.2:

    • Acc: 0.3799
    • Acc_stderr: 0.0052
    • Acc_norm: 0.4629
    • Acc_norm_stderr: 0.0053
  • Gsm8k_tr-v0.2:

    • Exact_match_strict-match: 0.0714
    • Exact_match_stderr_strict-match: 0.0071
    • Exact_match_flexible-extract: 0.0213
    • Exact_match_stderr_flexible-extract: 0.0040
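
Each accuracy above comes with a standard error, so a rough confidence interval can be read off directly. The sketch below is illustrative only and is not part of the dataset or the leaderboard tooling: the helper name `normal_ci`, the normal approximation (acc ± 1.96 × stderr), and the chance levels (0.5 for Winogrande, 0.25 for MMLU, assuming two and four answer options respectively) are all assumptions made for the example. The three (Acc, Acc_stderr) pairs are copied from the list above.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    low: float
    high: float


def normal_ci(acc: float, stderr: float, z: float = 1.96) -> Interval:
    """Approximate two-sided interval acc +/- z * stderr, clipped to [0, 1].

    Hypothetical helper: uses a normal approximation, which is an
    assumption and not something the dataset itself specifies.
    """
    return Interval(max(0.0, acc - z * stderr), min(1.0, acc + z * stderr))


# (Acc, Acc_stderr) pairs copied from the results list.
results = {
    "Winogrande_tr": (0.5766, 0.0139),
    "MMLU_tr_v0.2": (0.3961, 0.0041),
    "MMLU_formal_logic_v0.2": (0.2698, 0.0397),
}

# Assumed chance-level accuracy per task (not stated in the source).
chance = {
    "Winogrande_tr": 0.5,
    "MMLU_tr_v0.2": 0.25,
    "MMLU_formal_logic_v0.2": 0.25,
}

for task, (acc, se) in results.items():
    ci = normal_ci(acc, se)
    # The score is only clearly above chance if the whole interval is.
    above = ci.low > chance[task]
    print(f"{task}: {acc:.4f} [{ci.low:.4f}, {ci.high:.4f}] above chance: {above}")
```

Under these assumptions, the intervals for Winogrande_tr and the MMLU_tr_v0.2 aggregate lie entirely above their chance levels, while the interval for MMLU_formal_logic_v0.2 includes 0.25. Subtask scores with a large standard error are therefore better read as ranges than as point values.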