OpenLLMTurkishLeadboardv2/details_NovusResearch__Novus-7b-tr_v1
Hugging Face. Updated 2024-04-27; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/OpenLLMTurkishLeadboardv2/details_NovusResearch__Novus-7b-tr_v1
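Resource summary:
This dataset was created automatically during the evaluation of the model NovusResearch/Novus-7b-tr_v1 on the Open LLM Turkish Leaderboard v0.2. It contains results for multiple evaluation tasks, such as winogrande_tr-v0.2, truthfulqa_v0.2, and mmlu_tr_v0.2, each with an accuracy and a standard error. It also provides the configuration of each task, including the task name, dataset path, test split, few-shot split, and the document-to-text conversion.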
Provider:
OpenLLMTurkishLeadboardv2
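Summary of original information
Dataset overview
This dataset was created automatically during an evaluation run of the model NovusResearch/Novus-7b-tr_v1 for the Open LLM Turkish Leaderboard v0.2.
Evaluation results
Accuracy (Acc)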
- Winogrande_tr-v0.2: 0.5355450236966824
- TruthfulQA_v0.2: 0.4885051906710711
- MMLU_tr_v0.2: 0.43082156326258964
- MMLU_humanities_v0.2: 0.40218629013892054
- MMLU_formal_logic_v0.2: 0.36507936507936506
- MMLU_high_school_european_history_v0.2: 0.5666666666666667
- MMLU_high_school_us_history_v0.2: 0.4748603351955307
- MMLU_high_school_world_history_v0.2: 0.5492957746478874
- MMLU_international_law_v0.2: 0.6033057851239669
- MMLU_jurisprudence_v0.2: 0.5188679245283019
- MMLU_logical_fallacies_v0.2: 0.40372670807453415
- MMLU_moral_disputes_v0.2: 0.474025974025974
- MMLU_moral_scenarios_v0.2: 0.30160550458715596
- MMLU_philosophy_v0.2: 0.5250836120401338
- MMLU_prehistory_v0.2: 0.48333333333333334
- MMLU_professional_law_v0.2: 0.31051873198847263
- MMLU_world_religions_v0.2: 0.5833333333333334
- MMLU_other_v0.2: 0.47777040477770405
- MMLU_business_ethics_v0.2: 0.43434343434343436
- MMLU_clinical_knowledge_v0.2: 0.5078125
- MMLU_college_medicine_v0.2: 0.4166666666666667
- MMLU_global_facts_v0.2: 0.22448979591836735
- MMLU_human_aging_v0.2: 0.5141509433962265
- MMLU_management_v0.2: 0.5656565656565656
- MMLU_marketing_v0.2: 0.6451612903225806
- MMLU_medical_genetics_v0.2: 0.5052631578947369
- MMLU_miscellaneous_v0.2: 0.5783289817232375
- MMLU_nutrition_v0.2: 0.4918032786885246
- MMLU_professional_accounting_v0.2: 0.3154121863799283
- MMLU_professional_medicine_v0.2: 0.3333333333333333
- MMLU_virology_v0.2: 0.33962264150943394
- MMLU_social_sciences_v0.2: 0.46653346653346656
- MMLU_econometrics_v0.2: 0.34210526315789475
- MMLU_high_school_geography_v0.2: 0.5634517766497462
- MMLU_high_school_government_and_politics_v0.2: 0.48128342245989303
- MMLU_high_school_macroeconomics_v0.2: 0.40512820512820513
- MMLU_high_school_microeconomics_v0.2: 0.4219409282700422
- MMLU_high_school_psychology_v0.2: 0.5290806754221389
- MMLU_human_sexuality_v0.2: 0.5217391304347826
- MMLU_professional_psychology_v0.2: 0.3569023569023569
- MMLU_public_relations_v0.2: 0.5
- MMLU_security_studies_v0.2: 0.49145299145299143
- MMLU_sociology_v0.2: 0.5948717948717949
- MMLU_us_foreign_policy_v0.2: 0.6464646464646465
- MMLU_stem_v0.2: 0.3913322632423756
- MMLU_abstract_algebra_v0.2: 0.34
- MMLU_anatomy_v0.2: 0.3969465648854962
- MMLU_astronomy: 0.423841059602649
- MMLU_college_biology_v0.2: 0.39436619718309857
- MMLU_college_chemistry_v0.2: 0.3434343434343434
- MMLU_college_computer_science_v0.2: 0.43434343434343436
- MMLU_college_mathematics_v0.2: 0.35
- MMLU_college_physics_v0.2: 0.36633663366336633
- MMLU_computer_security_v0.2: 0.53
- MMLU_conceptual_physics_v0.2: 0.37339055793991416
- MMLU_electrical_engineering_v0.2: 0.4513888888888889
- MMLU_elementary_mathematics_v0.2: 0.3646112600536193
- MMLU_high_school_biology_v0.2: 0.49
- MMLU_high_school_chemistry_v0.2: 0.40609137055837563
- MMLU_high_school_computer_science_v0.2: 0.53
- MMLU_high_school_mathematics_v0.2: 0.37037037037037035
- MMLU_high_school_physics_v0.2: 0.25170068027210885
- MMLU_high_school_statistics_v0.2: 0.33796296296296297
- MMLU_machine_learning_v0.2: 0.29464285714285715
- Hellaswag_tr-v0.2: 0.33623122953596024
- Gsm8k_tr-v0.2: 0.2968868640850418
- Arc_tr-v0.2: 0.3037542662116041
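
Each accuracy above is a point estimate, and the same tasks have a standard error listed under Acc_stderr. As a rough illustration, the sketch below turns one accuracy and its standard error into an approximate 95% interval with a normal approximation. The function name `confidence_interval` and the default z-value of 1.96 are choices made for this example and are not part of the dataset; the two input numbers are the Winogrande_tr-v0.2 values from this summary.

```python
def confidence_interval(acc: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Return an approximate (lower, upper) interval: acc +/- z * stderr."""
    margin = z * stderr
    return acc - margin, acc + margin


# Winogrande_tr-v0.2 values copied from this summary.
winogrande_acc = 0.5355450236966824
winogrande_stderr = 0.01402247071006201

lower, upper = confidence_interval(winogrande_acc, winogrande_stderr)
print(f"Winogrande_tr-v0.2: {winogrande_acc:.4f} (95% CI {lower:.4f} to {upper:.4f})")
```

The normal approximation is only a convenience here; for subtasks with few questions or accuracies near 0 or 1, a different interval method may be more appropriate.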
Accuracy standard error (Acc_stderr)
- Winogrande_tr-v0.2: 0.01402247071006201
- TruthfulQA_v0.2: 0.015768652311999566
- MMLU_tr_v0.2: 0.004174591802570225
- MMLU_humanities_v0.2: 0.007232346019717871
- MMLU_formal_logic_v0.2: 0.04306241259127154
- MMLU_high_school_european_history_v0.2: 0.04059586016811274
- MMLU_high_school_us_history_v0.2: 0.03742918386493421
- MMLU_high_school_world_history_v0.2: 0.034172835303063566
- MMLU_international_law_v0.2: 0.04465869780531009
- MMLU_jurisprudence_v0.2: 0.048760249366915184
- MMLU_logical_fallacies_v0.2: 0.03878880744346832
- MMLU_moral_disputes_v0.2: 0.02849797695401096
- MMLU_moral_scenarios_v0.2: 0.015551094415874425
- MMLU_philosophy_v0.2: 0.028927751498085054
- MMLU_prehistory_v0.2: 0.028899677829858885
- MMLU_professional_law_v0.2: 0.01242415632832327
- MMLU_world_religions_v0.2: 0.03814999984740004
- MMLU_other_v0.2: 0.008883749561788206
- MMLU_business_ethics_v0.2: 0.05007027870966083
- MMLU_clinical_knowledge_v0.2: 0.03130739215119687
- MMLU_college_medicine_v0.2: 0.038149999847400036
- MMLU_global_facts_v0.2: 0.04236490079110509
- MMLU_human_aging_v0.2: 0.03440763105600619
- MMLU_management_v0.2: 0.05007027870966083
- MMLU_marketing_v0.2: 0.032555380151585905
- MMLU_medical_genetics_v0.2: 0.05156820511122477
- MMLU_miscellaneous_v0.2: 0.017854333269534662
- MMLU_nutrition_v0.2: 0.02867311307973662
- MMLU_professional_accounting_v0.2: 0.027869643826017403
- MMLU_professional_medicine_v0.2: 0.029235267310234354
- MMLU_virology_v0.2: 0.03767609312195345
- MMLU_social_sciences_v0.2: 0.008987563549766611
- MMLU_econometrics_v0.2: 0.04462917535336937
- MMLU_high_school_geography_v0.2: 0.03542553789144082
- MMLU_high_school_government_and_politics_v0.2: 0.03663608375537842
- MMLU_high_school_macroeconomics_v0.2: 0.024890471769938145
- MMLU_high_school_microeconomics_v0.2: 0.032148146302403695
- MMLU_high_school_psychology_v0.2: 0.0216410530540354
- MMLU_human_sexuality_v0.2: 0.04678500755208441
- MMLU_professional_psychology_v0.2: 0.01967368983532993
- MMLU_public_relations_v0.2: 0.04833682445228318
- MMLU_security_studies_v0.2: 0.032751303000970296
- MMLU_sociology_v0.2: 0.03524577495610961
- MMLU_us_foreign_policy_v0.2: 0.048292065023611885
- MMLU_stem_v0.2: 0.008691765752085577
- MMLU_abstract_algebra_v0.2: 0.04760952285695236
- **MMLU_anat
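
The summary does not say how the standard errors were computed. A minimal sketch, assuming they are the sample standard error of a mean of 0/1 scores, sqrt(p(1-p)/(n-1)), and assuming the abstract algebra subtask has 100 scored questions (both are assumptions, not stated in this summary), reproduces the listed value for MMLU_abstract_algebra_v0.2. The names `sample_stderr` and `n_questions` are hypothetical.

```python
import math


def sample_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the n - 1 (sample) variance."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# Assumption: 100 scored questions for MMLU_abstract_algebra_v0.2 (not stated in this summary).
n_questions = 100
acc = 0.34                      # MMLU_abstract_algebra_v0.2 accuracy from this summary
reported = 0.04760952285695236  # MMLU_abstract_algebra_v0.2 Acc_stderr from this summary

estimated = sample_stderr(acc, n_questions)
print(f"estimated={estimated:.6f} reported={reported:.6f}")
```

Agreement for one subtask is consistent with the assumed formula but does not confirm how the evaluation tooling computed every entry.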



