OpenLLMTurkishLeadboardv2/details_NovusResearch__Novus-7b-tr_v1

Hugging Face, updated 2024-04-27, indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/OpenLLMTurkishLeadboardv2/details_NovusResearch__Novus-7b-tr_v1
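Resource summary:
This dataset was created automatically during the evaluation of the model NovusResearch/Novus-7b-tr_v1 on the Open LLM Turkish Leaderboard v0.2. It contains results for multiple evaluation tasks, such as winogrande_tr-v0.2, truthfulqa_v0.2, and mmlu_tr_v0.2, each reported with an accuracy and a standard error. It also provides the configuration of each task, including the task name, dataset path, test split, few-shot split, and the document-to-text conversion.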

Provider:
OpenLLMTurkishLeadboardv2
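Summary of original information

Dataset overview

The dataset was created automatically during the evaluation run of the model NovusResearch/Novus-7b-tr_v1 on the Open LLM Turkish Leaderboard v0.2.

Evaluation results

Accuracy (Acc)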

  • Winogrande_tr-v0.2: 0.5355450236966824
  • TruthfulQA_v0.2: 0.4885051906710711
  • MMLU_tr_v0.2: 0.43082156326258964
  • MMLU_humanities_v0.2: 0.40218629013892054
  • MMLU_formal_logic_v0.2: 0.36507936507936506
  • MMLU_high_school_european_history_v0.2: 0.5666666666666667
  • MMLU_high_school_us_history_v0.2: 0.4748603351955307
  • MMLU_high_school_world_history_v0.2: 0.5492957746478874
  • MMLU_international_law_v0.2: 0.6033057851239669
  • MMLU_jurisprudence_v0.2: 0.5188679245283019
  • MMLU_logical_fallacies_v0.2: 0.40372670807453415
  • MMLU_moral_disputes_v0.2: 0.474025974025974
  • MMLU_moral_scenarios_v0.2: 0.30160550458715596
  • MMLU_philosophy_v0.2: 0.5250836120401338
  • MMLU_prehistory_v0.2: 0.48333333333333334
  • MMLU_professional_law_v0.2: 0.31051873198847263
  • MMLU_world_religions_v0.2: 0.5833333333333334
  • MMLU_other_v0.2: 0.47777040477770405
  • MMLU_business_ethics_v0.2: 0.43434343434343436
  • MMLU_clinical_knowledge_v0.2: 0.5078125
  • MMLU_college_medicine_v0.2: 0.4166666666666667
  • MMLU_global_facts_v0.2: 0.22448979591836735
  • MMLU_human_aging_v0.2: 0.5141509433962265
  • MMLU_management_v0.2: 0.5656565656565656
  • MMLU_marketing_v0.2: 0.6451612903225806
  • MMLU_medical_genetics_v0.2: 0.5052631578947369
  • MMLU_miscellaneous_v0.2: 0.5783289817232375
  • MMLU_nutrition_v0.2: 0.4918032786885246
  • MMLU_professional_accounting_v0.2: 0.3154121863799283
  • MMLU_professional_medicine_v0.2: 0.3333333333333333
  • MMLU_virology_v0.2: 0.33962264150943394
  • MMLU_social_sciences_v0.2: 0.46653346653346656
  • MMLU_econometrics_v0.2: 0.34210526315789475
  • MMLU_high_school_geography_v0.2: 0.5634517766497462
  • MMLU_high_school_government_and_politics_v0.2: 0.48128342245989303
  • MMLU_high_school_macroeconomics_v0.2: 0.40512820512820513
  • MMLU_high_school_microeconomics_v0.2: 0.4219409282700422
  • MMLU_high_school_psychology_v0.2: 0.5290806754221389
  • MMLU_human_sexuality_v0.2: 0.5217391304347826
  • MMLU_professional_psychology_v0.2: 0.3569023569023569
  • MMLU_public_relations_v0.2: 0.5
  • MMLU_security_studies_v0.2: 0.49145299145299143
  • MMLU_sociology_v0.2: 0.5948717948717949
  • MMLU_us_foreign_policy_v0.2: 0.6464646464646465
  • MMLU_stem_v0.2: 0.3913322632423756
  • MMLU_abstract_algebra_v0.2: 0.34
  • MMLU_anatomy_v0.2: 0.3969465648854962
  • MMLU_astronomy: 0.423841059602649
  • MMLU_college_biology_v0.2: 0.39436619718309857
  • MMLU_college_chemistry_v0.2: 0.3434343434343434
  • MMLU_college_computer_science_v0.2: 0.43434343434343436
  • MMLU_college_mathematics_v0.2: 0.35
  • MMLU_college_physics_v0.2: 0.36633663366336633
  • MMLU_computer_security_v0.2: 0.53
  • MMLU_conceptual_physics_v0.2: 0.37339055793991416
  • MMLU_electrical_engineering_v0.2: 0.4513888888888889
  • MMLU_elementary_mathematics_v0.2: 0.3646112600536193
  • MMLU_high_school_biology_v0.2: 0.49
  • MMLU_high_school_chemistry_v0.2: 0.40609137055837563
  • MMLU_high_school_computer_science_v0.2: 0.53
  • MMLU_high_school_mathematics_v0.2: 0.37037037037037035
  • MMLU_high_school_physics_v0.2: 0.25170068027210885
  • MMLU_high_school_statistics_v0.2: 0.33796296296296297
  • MMLU_machine_learning_v0.2: 0.29464285714285715
  • Hellaswag_tr-v0.2: 0.33623122953596024
  • Gsm8k_tr-v0.2: 0.2968868640850418
  • Arc_tr-v0.2: 0.3037542662116041
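
The six top-level benchmark scores in the list above can be summarized with a simple average. The sketch below is only an illustration: the source does not state how the leaderboard aggregates scores, so the unweighted mean used here is an assumption, and the names `top_level_acc` and `unweighted_mean` are hypothetical.

```python
# Top-level benchmark accuracies, copied from the list above.
top_level_acc = {
    "Winogrande_tr-v0.2": 0.5355450236966824,
    "TruthfulQA_v0.2": 0.4885051906710711,
    "MMLU_tr_v0.2": 0.43082156326258964,
    "Hellaswag_tr-v0.2": 0.33623122953596024,
    "Gsm8k_tr-v0.2": 0.2968868640850418,
    "Arc_tr-v0.2": 0.3037542662116041,
}


def unweighted_mean(scores):
    """Average the scores with equal weight per benchmark (an assumption,
    not necessarily the leaderboard's own aggregation)."""
    return sum(scores.values()) / len(scores)


mean_acc = unweighted_mean(top_level_acc)
best_task = max(top_level_acc, key=top_level_acc.get)
worst_task = min(top_level_acc, key=top_level_acc.get)

print(f"unweighted mean accuracy: {mean_acc:.4f}")
print(f"highest: {best_task}, lowest: {worst_task}")
```

An equal-weight mean treats each benchmark as one vote regardless of how many test items it contains; a size-weighted mean would need the per-task item counts, which this page does not list.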

Accuracy standard error (Acc_stderr)

  • Winogrande_tr-v0.2: 0.01402247071006201
  • TruthfulQA_v0.2: 0.015768652311999566
  • MMLU_tr_v0.2: 0.004174591802570225
  • MMLU_humanities_v0.2: 0.007232346019717871
  • MMLU_formal_logic_v0.2: 0.04306241259127154
  • MMLU_high_school_european_history_v0.2: 0.04059586016811274
  • MMLU_high_school_us_history_v0.2: 0.03742918386493421
  • MMLU_high_school_world_history_v0.2: 0.034172835303063566
  • MMLU_international_law_v0.2: 0.04465869780531009
  • MMLU_jurisprudence_v0.2: 0.048760249366915184
  • MMLU_logical_fallacies_v0.2: 0.03878880744346832
  • MMLU_moral_disputes_v0.2: 0.02849797695401096
  • MMLU_moral_scenarios_v0.2: 0.015551094415874425
  • MMLU_philosophy_v0.2: 0.028927751498085054
  • MMLU_prehistory_v0.2: 0.028899677829858885
  • MMLU_professional_law_v0.2: 0.01242415632832327
  • MMLU_world_religions_v0.2: 0.03814999984740004
  • MMLU_other_v0.2: 0.008883749561788206
  • MMLU_business_ethics_v0.2: 0.05007027870966083
  • MMLU_clinical_knowledge_v0.2: 0.03130739215119687
  • MMLU_college_medicine_v0.2: 0.038149999847400036
  • MMLU_global_facts_v0.2: 0.04236490079110509
  • MMLU_human_aging_v0.2: 0.03440763105600619
  • MMLU_management_v0.2: 0.05007027870966083
  • MMLU_marketing_v0.2: 0.032555380151585905
  • MMLU_medical_genetics_v0.2: 0.05156820511122477
  • MMLU_miscellaneous_v0.2: 0.017854333269534662
  • MMLU_nutrition_v0.2: 0.02867311307973662
  • MMLU_professional_accounting_v0.2: 0.027869643826017403
  • MMLU_professional_medicine_v0.2: 0.029235267310234354
  • MMLU_virology_v0.2: 0.03767609312195345
  • MMLU_social_sciences_v0.2: 0.008987563549766611
  • MMLU_econometrics_v0.2: 0.04462917535336937
  • MMLU_high_school_geography_v0.2: 0.03542553789144082
  • MMLU_high_school_government_and_politics_v0.2: 0.03663608375537842
  • MMLU_high_school_macroeconomics_v0.2: 0.024890471769938145
  • MMLU_high_school_microeconomics_v0.2: 0.032148146302403695
  • MMLU_high_school_psychology_v0.2: 0.0216410530540354
  • MMLU_human_sexuality_v0.2: 0.04678500755208441
  • MMLU_professional_psychology_v0.2: 0.01967368983532993
  • MMLU_public_relations_v0.2: 0.04833682445228318
  • MMLU_security_studies_v0.2: 0.032751303000970296
  • MMLU_sociology_v0.2: 0.03524577495610961
  • MMLU_us_foreign_policy_v0.2: 0.048292065023611885
  • MMLU_stem_v0.2: 0.008691765752085577
  • MMLU_abstract_algebra_v0.2: 0.04760952285695236
  • **MMLU_anat
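
Each accuracy can be read together with its standard error. The sketch below uses the MMLU_abstract_algebra_v0.2 values listed above to form an approximate 95% interval (normal approximation, z = 1.96) and to back out the number of test items. The second step assumes the standard error was computed as sqrt(acc * (1 - acc) / (n - 1)); the source does not state the formula, so treat the implied item count as an estimate. The function names are hypothetical.

```python
def confidence_interval(acc, stderr, z=1.96):
    """Approximate interval acc +/- z * stderr (normal approximation)."""
    half_width = z * stderr
    return acc - half_width, acc + half_width


def implied_sample_size(acc, stderr):
    """Estimate the number of test items n.

    Assumption (not stated in the source):
        stderr = sqrt(acc * (1 - acc) / (n - 1))
    which rearranges to n = acc * (1 - acc) / stderr**2 + 1.
    """
    return round(acc * (1 - acc) / stderr**2 + 1)


# Values for MMLU_abstract_algebra_v0.2, copied from the lists above.
acc, stderr = 0.34, 0.04760952285695236

low, high = confidence_interval(acc, stderr)
n = implied_sample_size(acc, stderr)

print(f"95% interval: [{low:.3f}, {high:.3f}]")
print(f"implied number of test items: {n}")
```

Under these assumptions the interval for this subtask spans roughly 0.25 to 0.43, which shows why differences of a few points between small MMLU subtasks should be interpreted with caution.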