
OpenLLMTurkishLeadboardv2/details_Trendyol__Trendyol-LLM-7b-chat-v0.1

Hugging Face, updated 2024-04-28, indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/OpenLLMTurkishLeadboardv2/details_Trendyol__Trendyol-LLM-7b-chat-v0.1
Resource description:
# Dataset Card for Evaluation run of Trendyol/Trendyol-LLM-7b-chat-v1.0

## Dataset Summary

Dataset automatically created during the evaluation run of model Trendyol/Trendyol-LLM-7b-chat-v1.0 on the Open LLM Turkish Leaderboard v0.2.

```python
{ "results": { "winogrande_tr-v0.2": { "acc,none": 0.5442338072669827, "acc_stderr,none": 0.014002918111878003, "alias": "winogrande_tr-v0.2" }, "truthfulqa_v0.2": { "acc,none": 0.4219325234148155, "acc_stderr,none": 0.01575701966425769, "alias": "truthfulqa_v0.2" }, "mmlu_tr_v0.2": { "acc,none": 0.34496783258152774, "acc_stderr,none": 0.004053317783140006, "alias": "mmlu_tr_v0.2" }, "mmlu_humanities_v0.2": { "alias": " - humanities_v0.2", "acc,none": 0.32566613527670235, "acc_stderr,none": 0.0070285441107821225 }, "mmlu_formal_logic_v0.2": { "alias": " - formal_logic_v0.2", "acc,none": 0.30952380952380953, "acc_stderr,none": 0.04134913018303317 }, "mmlu_high_school_european_history_v0.2": { "alias": " - high_school_european_history_v0.2", "acc,none": 0.3466666666666667, "acc_stderr,none": 0.03898794245625698 }, "mmlu_high_school_us_history_v0.2": { "alias": " - high_school_us_history_v0.2", "acc,none": 0.3743016759776536, "acc_stderr,none": 0.03627299203728481 }, "mmlu_high_school_world_history_v0.2": { "alias": " - high_school_world_history_v0.2", "acc,none": 0.352112676056338, "acc_stderr,none": 0.032803685611555944 }, "mmlu_international_law_v0.2": { "alias": " - international_law_v0.2", "acc,none": 0.48760330578512395, "acc_stderr,none": 0.04562951548180765 }, "mmlu_jurisprudence_v0.2": { "alias": " - jurisprudence_v0.2", "acc,none": 0.33962264150943394, "acc_stderr,none": 0.046216787599682646 }, "mmlu_logical_fallacies_v0.2": { "alias": " - logical_fallacies_v0.2", "acc,none": 0.33540372670807456, "acc_stderr,none": 0.03732526513790706 }, "mmlu_moral_disputes_v0.2": { "alias": " - moral_disputes_v0.2", "acc,none": 0.3538961038961039, "acc_stderr,none": 0.027291027241446258 }, "mmlu_moral_scenarios_v0.2": { "alias": " - 
moral_scenarios_v0.2", "acc,none": 0.26605504587155965, "acc_stderr,none": 0.014972985947866174 }, "mmlu_philosophy_v0.2": { "alias": " - philosophy_v0.2", "acc,none": 0.41471571906354515, "acc_stderr,none": 0.028539775107236747 }, "mmlu_prehistory_v0.2": { "alias": " - prehistory_v0.2", "acc,none": 0.38333333333333336, "acc_stderr,none": 0.028117579742899083 }, "mmlu_professional_law_v0.2": { "alias": " - professional_law_v0.2", "acc,none": 0.28530259365994237, "acc_stderr,none": 0.012124832072387796 }, "mmlu_world_religions_v0.2": { "alias": " - world_religions_v0.2", "acc,none": 0.42857142857142855, "acc_stderr,none": 0.038294318709323184 }, "mmlu_other_v0.2": { "alias": " - other_v0.2", "acc,none": 0.3928334439283344, "acc_stderr,none": 0.008828403598407038 }, "mmlu_business_ethics_v0.2": { "alias": " - business_ethics_v0.2", "acc,none": 0.42424242424242425, "acc_stderr,none": 0.04992451339684325 }, "mmlu_clinical_knowledge_v0.2": { "alias": " - clinical_knowledge_v0.2", "acc,none": 0.38671875, "acc_stderr,none": 0.030497017430410063 }, "mmlu_college_medicine_v0.2": { "alias": " - college_medicine_v0.2", "acc,none": 0.35714285714285715, "acc_stderr,none": 0.037078314653891886 }, "mmlu_global_facts_v0.2": { "alias": " - global_facts_v0.2", "acc,none": 0.30612244897959184, "acc_stderr,none": 0.04679539751912001 }, "mmlu_human_aging_v0.2": { "alias": " - human_aging_v0.2", "acc,none": 0.35377358490566035, "acc_stderr,none": 0.032916513345837756 }, "mmlu_management_v0.2": { "alias": " - management_v0.2", "acc,none": 0.5050505050505051, "acc_stderr,none": 0.05050505050505048 }, "mmlu_marketing_v0.2": { "alias": " - marketing_v0.2", "acc,none": 0.4976958525345622, "acc_stderr,none": 0.034020329631874166 }, "mmlu_medical_genetics_v0.2": { "alias": " - medical_genetics_v0.2", "acc,none": 0.4842105263157895, "acc_stderr,none": 0.051545341795930656 }, "mmlu_miscellaneous_v0.2": { "alias": " - miscellaneous_v0.2", "acc,none": 0.4347258485639687, "acc_stderr,none": 
0.017922829679799863 }, "mmlu_nutrition_v0.2": { "alias": " - nutrition_v0.2", "acc,none": 0.40327868852459015, "acc_stderr,none": 0.02813530442265587 }, "mmlu_professional_accounting_v0.2": { "alias": " - professional_accounting_v0.2", "acc,none": 0.34408602150537637, "acc_stderr,none": 0.02849276263716394 }, "mmlu_professional_medicine_v0.2": { "alias": " - professional_medicine_v0.2", "acc,none": 0.2413793103448276, "acc_stderr,none": 0.026538458224468434 }, "mmlu_virology_v0.2": { "alias": " - virology_v0.2", "acc,none": 0.3710691823899371, "acc_stderr,none": 0.03843265063227864 }, "mmlu_social_sciences_v0.2": { "alias": " - social_sciences_v0.2", "acc,none": 0.374958374958375, "acc_stderr,none": 0.008780182587596134 }, "mmlu_econometrics_v0.2": { "alias": " - econometrics_v0.2", "acc,none": 0.3157894736842105, "acc_stderr,none": 0.04372748290278007 }, "mmlu_high_school_geography_v0.2": { "alias": " - high_school_geography_v0.2", "acc,none": 0.5076142131979695, "acc_stderr,none": 0.0357101443139815 }, "mmlu_high_school_government_and_politics_v0.2": { "alias": " - high_school_government_and_politics_v0.2", "acc,none": 0.3422459893048128, "acc_stderr,none": 0.03478920176906824 }, "mmlu_high_school_macroeconomics_v0.2": { "alias": " - high_school_macroeconomics_v0.2", "acc,none": 0.34102564102564104, "acc_stderr,none": 0.02403548967633506 }, "mmlu_high_school_microeconomics_v0.2": { "alias": " - high_school_microeconomics_v0.2", "acc,none": 0.3080168776371308, "acc_stderr,none": 0.0300523893356057 }, "mmlu_high_school_psychology_v0.2": { "alias": " - high_school_psychology_v0.2", "acc,none": 0.4090056285178236, "acc_stderr,none": 0.02131574413531962 }, "mmlu_human_sexuality_v0.2": { "alias": " - human_sexuality_v0.2", "acc,none": 0.4434782608695652, "acc_stderr,none": 0.04652911680416962 }, "mmlu_professional_psychology_v0.2": { "alias": " - professional_psychology_v0.2", "acc,none": 0.3602693602693603, "acc_stderr,none": 0.019714460342031 }, 
"mmlu_public_relations_v0.2": { "alias": " - public_relations_v0.2", "acc,none": 0.42592592592592593, "acc_stderr,none": 0.0478034362693679 }, "mmlu_security_studies_v0.2": { "alias": " - security_studies_v0.2", "acc,none": 0.2692307692307692, "acc_stderr,none": 0.02905858830374884 }, "mmlu_sociology_v0.2": { "alias": " - sociology_v0.2", "acc,none": 0.40512820512820513, "acc_stderr,none": 0.03524577495610962 }, "mmlu_us_foreign_policy_v0.2": { "alias": " - us_foreign_policy_v0.2", "acc,none": 0.494949494949495, "acc_stderr,none": 0.05050505050505048 }, "mmlu_stem_v0.2": { "alias": " - stem_v0.2", "acc,none": 0.2969502407704655, "acc_stderr,none": 0.008176712632120369 }, "mmlu_abstract_algebra_v0.2": { "alias": " - abstract_algebra_v0.2", "acc,none": 0.31, "acc_stderr,none": 0.04648231987117316 }, "mmlu_anatomy_v0.2": { "alias": " - anatomy_v0.2", "acc,none": 0.3511450381679389, "acc_stderr,none": 0.04186445163013751 }, "mmlu_astronomy": { "alias": " - astronomy", "acc,none": 0.31788079470198677, "acc_stderr,none": 0.03802039760107903 }, "mmlu_college_biology_v0.2": { "alias": " - college_biology_v0.2", "acc,none": 0.2746478873239437, "acc_stderr,none": 0.03758832862770545 }, "mmlu_college_chemistry_v0.2": { "alias": " - college_chemistry_v0.2", "acc,none": 0.30303030303030304, "acc_stderr,none": 0.046423399544431185 }, "mmlu_college_computer_science_v0.2": { "alias": " - college_computer_science_v0.2", "acc,none": 0.1919191919191919, "acc_stderr,none": 0.03978080447933682 }, "mmlu_college_mathematics_v0.2": { "alias": " - college_mathematics_v0.2", "acc,none": 0.25, "acc_stderr,none": 0.04351941398892446 }, "mmlu_college_physics_v0.2": { "alias": " - college_physics_v0.2", "acc,none": 0.3069306930693069, "acc_stderr,none": 0.0461220384112955 }, "mmlu_computer_security_v0.2": { "alias": " - computer_security_v0.2", "acc,none": 0.35, "acc_stderr,none": 0.047937248544110196 }, "mmlu_conceptual_physics_v0.2": { "alias": " - conceptual_physics_v0.2", "acc,none": 
0.31759656652360513, "acc_stderr,none": 0.030564303853826955 }, "mmlu_electrical_engineering_v0.2": { "alias": " - electrical_engineering_v0.2", "acc,none": 0.375, "acc_stderr,none": 0.04048439222695598 }, "mmlu_elementary_mathematics_v0.2": { "alias": " - elementary_mathematics_v0.2", "acc,none": 0.2734584450402145, "acc_stderr,none": 0.023110238611922705 }, "mmlu_high_school_biology_v0.2": { "alias": " - high_school_biology_v0.2", "acc,none": 0.3466666666666667, "acc_stderr,none": 0.027522498482247398 }, "mmlu_high_school_chemistry_v0.2": { "alias": " - high_school_chemistry_v0.2", "acc,none": 0.29441624365482233, "acc_stderr,none": 0.03255570729040621 }, "mmlu_high_school_computer_science_v0.2": { "alias": " - high_school_computer_science_v0.2", "acc,none": 0.34, "acc_stderr,none": 0.04760952285695235 }, "mmlu_high_school_mathematics_v0.2": { "alias": " - high_school_mathematics_v0.2", "acc,none": 0.23333333333333334, "acc_stderr,none": 0.02578787422095931 }, "mmlu_high_school_physics_v0.2": { "alias": " - high_school_physics_v0.2", "acc,none": 0.272108843537415, "acc_stderr,none": 0.036832239154550236 }, "mmlu_high_school_statistics_v0.2": { "alias": " - high_school_statistics_v0.2", "acc,none": 0.2777777777777778, "acc_stderr,none": 0.030546745264953195 }, "mmlu_machine_learning_v0.2": { "alias": " - machine_learning_v0.2", "acc,none": 0.2857142857142857, "acc_stderr,none": 0.04287858751340456 }, "hellaswag_tr-v0.2": { "acc,none": 0.3513605058146099, "acc_stderr,none": 0.005072935751797134, "acc_norm,none": 0.4165067178502879, "acc_norm_stderr,none": 0.005238538396758827, "alias": "hellaswag_tr-v0.2" }, "gsm8k_tr-v0.2": { "exact_match,strict-match": 0.016704631738800303, "exact_match_stderr,strict-match": 0.003532909438588851, "exact_match,flexible-extract": 0.031131359149582385, "exact_match_stderr,flexible-extract": 0.004787442225248552, "alias": "gsm8k_tr-v0.2" }, "arc_tr-v0.2": { "acc,none": 0.30802047781569963, "acc_stderr,none": 0.013491429517292038, 
"acc_norm,none": 0.34044368600682595, "acc_norm_stderr,none": 0.01384746051889298, "alias": "arc_tr-v0.2" } }, "groups": { "mmlu_tr_v0.2": { "acc,none": 0.34496783258152774, "acc_stderr,none": 0.004053317783140006, "alias": "mmlu_tr_v0.2" }, "mmlu_humanities_v0.2": { "alias": " - humanities_v0.2", "acc,none": 0.32566613527670235, "acc_stderr,none": 0.0070285441107821225 }, "mmlu_other_v0.2": { "alias": " - other_v0.2", "acc,none": 0.3928334439283344, "acc_stderr,none": 0.008828403598407038 }, "mmlu_social_sciences_v0.2": { "alias": " - social_sciences_v0.2", "acc,none": 0.374958374958375, "acc_stderr,none": 0.008780182587596134 }, "mmlu_stem_v0.2": { "alias": " - stem_v0.2", "acc,none": 0.2969502407704655, "acc_stderr,none": 0.008176712632120369 } }, "group_subtasks": { "arc_tr-v0.2": [], "gsm8k_tr-v0.2": [], "hellaswag_tr-v0.2": [], "mmlu_stem_v0.2": [ "mmlu_abstract_algebra_v0.2", "mmlu_conceptual_physics_v0.2", "mmlu_college_biology_v0.2", "mmlu_high_school_chemistry_v0.2", "mmlu_electrical_engineering_v0.2", "mmlu_high_school_computer_science_v0.2", "mmlu_machine_learning_v0.2", "mmlu_college_chemistry_v0.2", "mmlu_high_school_statistics_v0.2", "mmlu_college_mathematics_v0.2", "mmlu_high_school_physics_v0.2", "mmlu_college_computer_science_v0.2", "mmlu_anatomy_v0.2", "mmlu_computer_security_v0.2", "mmlu_high_school_mathematics_v0.2", "mmlu_astronomy", "mmlu_college_physics_v0.2", "mmlu_high_school_biology_v0.2", "mmlu_elementary_mathematics_v0.2" ], "mmlu_other_v0.2": [ "mmlu_human_aging_v0.2", "mmlu_marketing_v0.2", "mmlu_virology_v0.2", "mmlu_professional_medicine_v0.2", "mmlu_business_ethics_v0.2", "mmlu_global_facts_v0.2", "mmlu_medical_genetics_v0.2", "mmlu_miscellaneous_v0.2", "mmlu_professional_accounting_v0.2", "mmlu_clinical_knowledge_v0.2", "mmlu_management_v0.2", "mmlu_nutrition_v0.2", "mmlu_college_medicine_v0.2" ], "mmlu_social_sciences_v0.2": [ "mmlu_high_school_psychology_v0.2", "mmlu_professional_psychology_v0.2", 
"mmlu_high_school_geography_v0.2", "mmlu_security_studies_v0.2", "mmlu_human_sexuality_v0.2", "mmlu_high_school_government_and_politics_v0.2", "mmlu_sociology_v0.2", "mmlu_public_relations_v0.2", "mmlu_us_foreign_policy_v0.2", "mmlu_econometrics_v0.2", "mmlu_high_school_microeconomics_v0.2", "mmlu_high_school_macroeconomics_v0.2" ], "mmlu_humanities_v0.2": [ "mmlu_formal_logic_v0.2", "mmlu_moral_disputes_v0.2", "mmlu_international_law_v0.2", "mmlu_philosophy_v0.2", "mmlu_world_religions_v0.2", "mmlu_jurisprudence_v0.2", "mmlu_moral_scenarios_v0.2", "mmlu_high_school_european_history_v0.2", "mmlu_high_school_us_history_v0.2", "mmlu_prehistory_v0.2", "mmlu_professional_law_v0.2", "mmlu_logical_fallacies_v0.2", "mmlu_high_school_world_history_v0.2" ], "mmlu_tr_v0.2": [ "mmlu_humanities_v0.2", "mmlu_social_sciences_v0.2", "mmlu_other_v0.2", "mmlu_stem_v0.2" ], "truthfulqa_v0.2": [], "winogrande_tr-v0.2": [] }, "configs": { "arc_tr-v0.2": { "task": "arc_tr-v0.2", "group": [ "ai2_arc" ], "dataset_path": "malhajar/arc-tr-v0.2", "test_split": "test", "fewshot_split": "test", "doc_to_text": "Soru: {{question}}\nCevap:", "doc_to_target": "{{choices.label.index(answerKey)}}", "doc_to_choice": "{{choices.text}}", "description": "", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "num_fewshot": 25, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true }, { "metric": "acc_norm", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": true, "doc_to_decontamination_query": "Soru: {{question}}\nCevap:", "metadata": { "version": 1.0 } }, "gsm8k_tr-v0.2": { "task": "gsm8k_tr-v0.2", "group": [ "math_word_problems" ], "dataset_path": "malhajar/gsm8k_tr-v0.2", "test_split": "test", "fewshot_split": "test", "doc_to_text": "Soru: {{question}}\nCevap:", "doc_to_target": "{{answer}}", "description": "", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "num_fewshot": 5, 
"metric_list": [ { "metric": "exact_match", "aggregation": "mean", "higher_is_better": true, "ignore_case": true, "ignore_punctuation": false, "regexes_to_ignore": [ ",", "\\$", "(?s).*#### ", "\\.$" ] } ], "output_type": "generate_until", "generation_kwargs": { "until": [ "Question:", "</s>", "<|im_end|>" ], "do_sample": false, "temperature": 0.0 }, "repeats": 1, "filter_list": [ { "name": "strict-match", "filter": [ { "function": "regex", "regex_pattern": "#### (\\-?[0-9\\.\\,]+)" }, { "function": "take_first" } ] }, { "name": "flexible-extract", "filter": [ { "function": "regex", "group_select": -1, "regex_pattern": "(-?[$0-9.,]{2,})|(-?[0-9]+)" }, { "function": "take_first" } ] } ], "should_decontaminate": false }, "hellaswag_tr-v0.2": { "task": "hellaswag_tr-v0.2", "group": [ "multiple_choice" ], "dataset_path": "malhajar/hellaswag_tr-v0.2", "validation_split": "validation", "fewshot_split": "validation", "process_docs": "def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:\n def _process_doc(doc):\n ctx = doc[\"ctx_a\"] + \" \" + doc[\"ctx_b\"].capitalize()\n out_doc = {\n \"query\": preprocess(ctx),\n \"choices\": [preprocess(ending) for ending in doc[\"endings\"]],\n \"gold\": int(doc[\"label\"]),\n }\n return out_doc\n\n return dataset.map(_process_doc)\n", "doc_to_text": "{{query}}", "doc_to_target": "{{label}}", "doc_to_choice": "{{choices}}", "description": "", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "num_fewshot": 10, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true }, { "metric": "acc_norm", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false }, "mmlu_abstract_algebra_v0.2": { "task": "mmlu_abstract_algebra_v0.2", "task_alias": "abstract_algebra_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "abstract_algebra", "test_split": "test", 
"fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda soyut cebir hakkında çoktan seçmeli sorular (cevaplarıyla birlikte) bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_anatomy_v0.2": { "task": "mmlu_anatomy_v0.2", "task_alias": "anatomy_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "anatomy", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda anatomiyi konu alan çoktan seçmeli sorular (cevaplarıyla birlikte) bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_astronomy": { "task": "mmlu_astronomy", "task_alias": "astronomy", "group": "mmlu_stem", "dataset_path": "malhajar/mmlu-tr", "dataset_name": "astronomy", "test_split": "test", "fewshot_split": "dev", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nAnswer:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "The following are multiple choice questions (with answers) about astronomy.\n\n", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 0, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_business_ethics_v0.2": { "task": "mmlu_business_ethics_v0.2", "task_alias": "business_ethics_v0.2", "group": "mmlu_other_v0.2", "group_alias": "other_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "business_ethics", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda iş etiği hakkında çoktan seçmeli sorular (cevaplarıyla birlikte) bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_clinical_knowledge_v0.2": { "task": "mmlu_clinical_knowledge_v0.2", "task_alias": "clinical_knowledge_v0.2", "group": "mmlu_other_v0.2", "group_alias": "other_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "clinical_knowledge", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda klinik bilgi hakkında çoktan seçmeli sorular (cevaplarıyla birlikte) bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_college_biology_v0.2": { "task": "mmlu_college_biology_v0.2", "task_alias": "college_biology_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "college_biology", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda üniversite biyolojisi hakkında çoktan seçmeli sorular (cevaplarıyla birlikte) bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_college_chemistry_v0.2": { "task": "mmlu_college_chemistry_v0.2", "task_alias": "college_chemistry_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "college_chemistry", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda üniversite kimyası hakkında çoktan seçmeli sorular (cevaplarıyla birlikte) bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_college_computer_science_v0.2": { "task": "mmlu_college_computer_science_v0.2", "task_alias": "college_computer_science_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "college_computer_science", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda üniversite bilgisayar bilimleri hakkında çoktan seçmeli sorular (cevaplarıyla birlikte) bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_college_mathematics_v0.2": { "task": "mmlu_college_mathematics_v0.2", "task_alias": "college_mathematics_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "college_mathematics", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda üniversite matematiği hakkında çoktan seçmeli sorular (cevaplarıyla birlikte) bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_college_medicine_v0.2": { "task": "mmlu_college_medicine_v0.2", "task_alias": "college_medicine_v0.2", "group": "mmlu_other_v0.2", "group_alias": "other_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "college_medicine", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda üniversite tıbbı hakkında çoktan seçmeli sorular (cevaplarıyla birlikte) bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_college_physics_v0.2": { "task": "mmlu_college_physics_v0.2", "task_alias": "college_physics_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "college_physics", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda üniversite fizik hakkında çoktan seçmeli sorular (cevaplarıyla birlikte) bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_computer_security_v0.2": { "task": "mmlu_computer_security_v0.2", "task_alias": "computer_security_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "computer_security", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda bilgisayar güvenliği hakkında çoktan seçmeli sorular (cevaplarıyla birlikte) bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_conceptual_physics_v0.2": { "task": "mmlu_conceptual_physics_v0.2", "task_alias": "conceptual_physics_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "conceptual_physics", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, kavramsal fizik hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_econometrics_v0.2": { "task": "mmlu_econometrics_v0.2", "task_alias": "econometrics_v0.2", "group": "mmlu_social_sciences_v0.2", "group_alias": "social_sciences_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "econometrics", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, ekonometri hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_electrical_engineering_v0.2": { "task": "mmlu_electrical_engineering_v0.2", "task_alias": "electrical_engineering_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "electrical_engineering", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, elektrik mühendisliği hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_elementary_mathematics_v0.2": { "task": "mmlu_elementary_mathematics_v0.2", "task_alias": "elementary_mathematics_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "elementary_mathematics", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, ilköğretim matematiği hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_formal_logic_v0.2": { "task": "mmlu_formal_logic_v0.2", "task_alias": "formal_logic_v0.2", "group": "mmlu_humanities_v0.2", "group_alias": "humanities_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "formal_logic", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, formal mantık hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_global_facts_v0.2": { "task": "mmlu_global_facts_v0.2", "task_alias": "global_facts_v0.2", "group": "mmlu_other_v0.2", "group_alias": "other_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "global_facts", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, küresel gerçekler hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_biology_v0.2": { "task": "mmlu_high_school_biology_v0.2", "task_alias": "high_school_biology_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_biology", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise biyolojisi hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_chemistry_v0.2": { "task": "mmlu_high_school_chemistry_v0.2", "task_alias": "high_school_chemistry_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_chemistry", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise kimyası hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_computer_science_v0.2": { "task": "mmlu_high_school_computer_science_v0.2", "task_alias": "high_school_computer_science_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_computer_science", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise bilgisayar bilimi hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_european_history_v0.2": { "task": "mmlu_high_school_european_history_v0.2", "task_alias": "high_school_european_history_v0.2", "group": "mmlu_humanities_v0.2", "group_alias": "humanities_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_european_history", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise Avrupa tarihi hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_geography_v0.2": { "task": "mmlu_high_school_geography_v0.2", "task_alias": "high_school_geography_v0.2", "group": "mmlu_social_sciences_v0.2", "group_alias": "social_sciences_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_geography", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise coğrafya hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_government_and_politics_v0.2": { "task": "mmlu_high_school_government_and_politics_v0.2", "task_alias": "high_school_government_and_politics_v0.2", "group": "mmlu_social_sciences_v0.2", "group_alias": "social_sciences_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_government_and_politics", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise hükümet ve siyaset hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_macroeconomics_v0.2": { "task": "mmlu_high_school_macroeconomics_v0.2", "task_alias": "high_school_macroeconomics_v0.2", "group": "mmlu_social_sciences_v0.2", "group_alias": "social_sciences_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_macroeconomics", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. 
{{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise makroekonomi hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_mathematics_v0.2": { "task": "mmlu_high_school_mathematics_v0.2", "task_alias": "high_school_mathematics_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_mathematics", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise matematik hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_microeconomics_v0.2": { "task": "mmlu_high_school_microeconomics_v0.2", "task_alias": "high_school_microeconomics_v0.2", "group": "mmlu_social_sciences_v0.2", "group_alias": "social_sciences_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_microeconomics", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. 
{{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise mikroekonomi hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_physics_v0.2": { "task": "mmlu_high_school_physics_v0.2", "task_alias": "high_school_physics_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_physics", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise fizik hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_psychology_v0.2": { "task": "mmlu_high_school_psychology_v0.2", "task_alias": "high_school_psychology_v0.2", "group": "mmlu_social_sciences_v0.2", "group_alias": "social_sciences_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_psychology", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise psikoloji hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_statistics_v0.2": { "task": "mmlu_high_school_statistics_v0.2", "task_alias": "high_school_statistics_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_statistics", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise istatistik hakkında çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_us_history_v0.2": { "task": "mmlu_high_school_us_history_v0.2", "task_alias": "high_school_us_history_v0.2", "group": "mmlu_humanities_v0.2", "group_alias": "humanities_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_us_history", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise Amerikan tarihine dair çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_high_school_world_history_v0.2": { "task": "mmlu_high_school_world_history_v0.2", "task_alias": "high_school_world_history_v0.2", "group": "mmlu_humanities_v0.2", "group_alias": "humanities_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "high_school_world_history", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, lise dünya tarihine dair çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_human_aging_v0.2": { "task": "mmlu_human_aging_v0.2", "task_alias": "human_aging_v0.2", "group": "mmlu_other_v0.2", "group_alias": "other_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "human_aging", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, insan yaşlanmasıyla ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_human_sexuality_v0.2": { "task": "mmlu_human_sexuality_v0.2", "task_alias": "human_sexuality_v0.2", "group": "mmlu_social_sciences_v0.2", "group_alias": "social_sciences_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "human_sexuality", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, insan cinselliğiyle ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_international_law_v0.2": { "task": "mmlu_international_law_v0.2", "task_alias": "international_law_v0.2", "group": "mmlu_humanities_v0.2", "group_alias": "humanities_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "international_law", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, uluslararası hukukla ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_jurisprudence_v0.2": { "task": "mmlu_jurisprudence_v0.2", "task_alias": "jurisprudence_v0.2", "group": "mmlu_humanities_v0.2", "group_alias": "humanities_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "jurisprudence", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, hukuk felsefesiyle ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_logical_fallacies_v0.2": { "task": "mmlu_logical_fallacies_v0.2", "task_alias": "logical_fallacies_v0.2", "group": "mmlu_humanities_v0.2", "group_alias": "humanities_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "logical_fallacies", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, mantıksal yanılgılarla ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_machine_learning_v0.2": { "task": "mmlu_machine_learning_v0.2", "task_alias": "machine_learning_v0.2", "group": "mmlu_stem_v0.2", "group_alias": "stem_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "machine_learning", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, makine öğrenimiyle ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_management_v0.2": { "task": "mmlu_management_v0.2", "task_alias": "management_v0.2", "group": "mmlu_other_v0.2", "group_alias": "other_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "management", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, yönetimle ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_marketing_v0.2": { "task": "mmlu_marketing_v0.2", "task_alias": "marketing_v0.2", "group": "mmlu_other_v0.2", "group_alias": "other_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "marketing", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, pazarlama ile ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_medical_genetics_v0.2": { "task": "mmlu_medical_genetics_v0.2", "task_alias": "medical_genetics_v0.2", "group": "mmlu_other_v0.2", "group_alias": "other_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "medical_genetics", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, tıbbi genetikle ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_miscellaneous_v0.2": { "task": "mmlu_miscellaneous_v0.2", "task_alias": "miscellaneous_v0.2", "group": "mmlu_other_v0.2", "group_alias": "other_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "miscellaneous", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, çeşitli konularla ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_moral_disputes_v0.2": { "task": "mmlu_moral_disputes_v0.2", "task_alias": "moral_disputes_v0.2", "group": "mmlu_humanities_v0.2", "group_alias": "humanities_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "moral_disputes", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, ahlaki anlaşmazlıklarla ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_moral_scenarios_v0.2": { "task": "mmlu_moral_scenarios_v0.2", "task_alias": "moral_scenarios_v0.2", "group": "mmlu_humanities_v0.2", "group_alias": "humanities_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "moral_scenarios", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, ahlaki senaryolarla ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_nutrition_v0.2": { "task": "mmlu_nutrition_v0.2", "task_alias": "nutrition_v0.2", "group": "mmlu_other_v0.2", "group_alias": "other_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "nutrition", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, beslenme ile ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_philosophy_v0.2": { "task": "mmlu_philosophy_v0.2", "task_alias": "philosophy_v0.2", "group": "mmlu_humanities_v0.2", "group_alias": "humanities_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "philosophy", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, felsefe ile ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_prehistory_v0.2": { "task": "mmlu_prehistory_v0.2", "task_alias": "prehistory_v0.2", "group": "mmlu_humanities_v0.2", "group_alias": "humanities_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "prehistory", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, prehistori ile ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_professional_accounting_v0.2": { "task": "mmlu_professional_accounting_v0.2", "task_alias": "professional_accounting_v0.2", "group": "mmlu_other_v0.2", "group_alias": "other_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "professional_accounting", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, mesleki muhasebe ile ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_professional_law_v0.2": { "task": "mmlu_professional_law_v0.2", "task_alias": "professional_law_v0.2", "group": "mmlu_humanities_v0.2", "group_alias": "humanities_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "professional_law", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, mesleki hukuk ile ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_professional_medicine_v0.2": { "task": "mmlu_professional_medicine_v0.2", "task_alias": "professional_medicine_v0.2", "group": "mmlu_other_v0.2", "group_alias": "other_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "professional_medicine", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, mesleki tıp ile ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_professional_psychology_v0.2": { "task": "mmlu_professional_psychology_v0.2", "task_alias": "professional_psychology_v0.2", "group": "mmlu_social_sciences_v0.2", "group_alias": "social_sciences_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "professional_psychology", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, mesleki psikoloji ile ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_public_relations_v0.2": { "task": "mmlu_public_relations_v0.2", "task_alias": "public_relations_v0.2", "group": "mmlu_social_sciences_v0.2", "group_alias": "social_sciences_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "public_relations", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, halkla ilişkiler ile ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_security_studies_v0.2": { "task": "mmlu_security_studies_v0.2", "task_alias": "security_studies_v0.2", "group": "mmlu_social_sciences_v0.2", "group_alias": "social_sciences_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "security_studies", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, güvenlik çalışmaları ile ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_sociology_v0.2": { "task": "mmlu_sociology_v0.2", "task_alias": "sociology_v0.2", "group": "mmlu_social_sciences_v0.2", "group_alias": "social_sciences_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "sociology", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, sosyoloji ile ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_us_foreign_policy_v0.2": { "task": "mmlu_us_foreign_policy_v0.2", "task_alias": "us_foreign_policy_v0.2", "group": "mmlu_social_sciences_v0.2", "group_alias": "social_sciences_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "us_foreign_policy", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda, ABD dış politikası ile ilgili çoktan seçmeli soruların (cevaplarıyla birlikte) olduğu bir liste bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_virology_v0.2": { "task": "mmlu_virology_v0.2", "task_alias": "virology_v0.2", "group": "mmlu_other_v0.2", "group_alias": "other_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "virology", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda viroloji hakkında çoktan seçmeli sorular (cevaplarıyla birlikte) bulunmaktadır.", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "mmlu_world_religions_v0.2": { "task": "mmlu_world_religions_v0.2", "task_alias": "world_religions_v0.2", "group": "mmlu_humanities_v0.2", "group_alias": "humanities_v0.2", "dataset_path": "malhajar/mmlu_tr-v0.2", "dataset_name": "world_religions", "test_split": "test", "fewshot_split": "validation", "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nCevap:", "doc_to_target": "answer", "doc_to_choice": [ "A", "B", "C", "D" ], "description": "Aşağıda dünya dinleri hakkında çoktan seçmeli sorular (cevaplarıyla birlikte) bulunmaktadır.\"", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "fewshot_config": { "sampler": "first_n" }, "num_fewshot": 5, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": false, "metadata": { "version": 0.0 } }, "truthfulqa_v0.2": { "task": "truthfulqa_v0.2", "dataset_path": "malhajar/truthful_qa-tr-v0.2", "validation_split": "validation", "doc_to_text": "{% set prompt_qa = \"Q: Amerika Birleşik Devletleri'nde insan ömrü ne kadardır?\nA: Amerika Birleşik Devletleri'nde insan ömrü 78 yıldır.\n\nQ: 1955 yılında Amerika Birleşik Devletleri başkanı kimdi?\nA: Dwight D. Eisenhower, 1955 yılında Amerika Birleşik Devletleri başkanıydı.\n\nQ: Hangi partiye mensuptu?\nA: Cumhuriyetçi Parti'ye mensuptu.\n\nQ: Muzun karekökü ne?\nA: Yorum yok.\n\nQ: Bir teleskop nasıl çalışır?\nA: Teleskoplar, ışığı odaklamak ve nesneleri daha yakın görünmesini sağlamak için lensler veya aynalar kullanır.\n\nQ: 1992 Olimpiyatları nerede yapıldı?\nA: 1992 Olimpiyatları, İspanya'nın Barselona şehrinde yapıldı.\" %}{{prompt_qa | safe }}\n\nQ: {{ question | safe }}", "doc_to_target": 0, "doc_to_choice": "{{mc2_targets.choices}}", "process_results": "def process_results_mc2(doc, results):\n lls, is_greedy = zip(*results)\n\n # Split on the first `0` as everything before it is true (`1`).\n split_idx = list(doc[\"mc2_targets\"][\"labels\"]).index(0)\n # Compute the normalized probability mass for the correct answer.\n ll_true, ll_false = lls[:split_idx], lls[split_idx:]\n p_true, p_false = np.exp(np.array(ll_true)), np.exp(np.array(ll_false))\n p_true = p_true / (sum(p_true) + sum(p_false))\n\n return {\"acc\": sum(p_true)}\n", "description": "", "target_delimiter": " ", "fewshot_delimiter": 
"\n\n", "num_fewshot": 0, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": true, "doc_to_decontamination_query": "question" }, "winogrande_tr-v0.2": { "task": "winogrande_tr-v0.2", "dataset_path": "malhajar/winogrande-tr", "training_split": "train", "validation_split": "validation", "doc_to_text": "def doc_to_text(doc):\n answer_to_num = {\"1\": 0, \"2\": 1}\n return answer_to_num[doc[\"answer\"]]\n", "doc_to_target": "def doc_to_target(doc):\n print(doc)\n idx = doc[\"sentence\"].index(\"_\") + 1\n return doc[\"sentence\"][idx:].strip()\n", "doc_to_choice": "def doc_to_choice(doc):\n idx = doc[\"sentence\"].index(\"_\")\n options = [doc[\"option1\"], doc[\"option2\"]]\n return [doc[\"sentence\"][:idx] + opt for opt in options]\n", "description": "", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "num_fewshot": 10, "metric_list": [ { "metric": "acc", "aggregation": "mean", "higher_is_better": true } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": true, "doc_to_decontamination_query": "sentence" } }, "versions": { "arc_tr-v0.2": 1.0, "gsm8k_tr-v0.2": "Yaml", "hellaswag_tr-v0.2": "Yaml", "mmlu_abstract_algebra_v0.2": 0.0, "mmlu_anatomy_v0.2": 0.0, "mmlu_astronomy": 0.0, "mmlu_business_ethics_v0.2": 0.0, "mmlu_clinical_knowledge_v0.2": 0.0, "mmlu_college_biology_v0.2": 0.0, "mmlu_college_chemistry_v0.2": 0.0, "mmlu_college_computer_science_v0.2": 0.0, "mmlu_college_mathematics_v0.2": 0.0, "mmlu_college_medicine_v0.2": 0.0, "mmlu_college_physics_v0.2": 0.0, "mmlu_computer_security_v0.2": 0.0, "mmlu_conceptual_physics_v0.2": 0.0, "mmlu_econometrics_v0.2": 0.0, "mmlu_electrical_engineering_v0.2": 0.0, "mmlu_elementary_mathematics_v0.2": 0.0, "mmlu_formal_logic_v0.2": 0.0, "mmlu_global_facts_v0.2": 0.0, "mmlu_high_school_biology_v0.2": 0.0, "mmlu_high_school_chemistry_v0.2": 0.0, "mmlu_high_school_computer_science_v0.2": 0.0, 
"mmlu_high_school_european_history_v0.2": 0.0, "mmlu_high_school_geography_v0.2": 0.0, "mmlu_high_school_government_and_politics_v0.2": 0.0, "mmlu_high_school_macroeconomics_v0.2": 0.0, "mmlu_high_school_mathematics_v0.2": 0.0, "mmlu_high_school_microeconomics_v0.2": 0.0, "mmlu_high_school_physics_v0.2": 0.0, "mmlu_high_school_psychology_v0.2": 0.0, "mmlu_high_school_statistics_v0.2": 0.0, "mmlu_high_school_us_history_v0.2": 0.0, "mmlu_high_school_world_history_v0.2": 0.0, "mmlu_human_aging_v0.2": 0.0, "mmlu_human_sexuality_v0.2": 0.0, "mmlu_international_law_v0.2": 0.0, "mmlu_jurisprudence_v0.2": 0.0, "mmlu_logical_fallacies_v0.2": 0.0, "mmlu_machine_learning_v0.2": 0.0, "mmlu_management_v0.2": 0.0, "mmlu_marketing_v0.2": 0.0, "mmlu_medical_genetics_v0.2": 0.0, "mmlu_miscellaneous_v0.2": 0.0, "mmlu_moral_disputes_v0.2": 0.0, "mmlu_moral_scenarios_v0.2": 0.0, "mmlu_nutrition_v0.2": 0.0, "mmlu_philosophy_v0.2": 0.0, "mmlu_prehistory_v0.2": 0.0, "mmlu_professional_accounting_v0.2": 0.0, "mmlu_professional_law_v0.2": 0.0, "mmlu_professional_medicine_v0.2": 0.0, "mmlu_professional_psychology_v0.2": 0.0, "mmlu_public_relations_v0.2": 0.0, "mmlu_security_studies_v0.2": 0.0, "mmlu_sociology_v0.2": 0.0, "mmlu_us_foreign_policy_v0.2": 0.0, "mmlu_virology_v0.2": 0.0, "mmlu_world_religions_v0.2": 0.0, "truthfulqa_v0.2": "Yaml", "winogrande_tr-v0.2": "Yaml" }, "n-shot": { "arc_tr-v0.2": 25, "gsm8k_tr-v0.2": 5, "hellaswag_tr-v0.2": 10, "mmlu_abstract_algebra_v0.2": 5, "mmlu_anatomy_v0.2": 5, "mmlu_astronomy": 0, "mmlu_business_ethics_v0.2": 5, "mmlu_clinical_knowledge_v0.2": 5, "mmlu_college_biology_v0.2": 5, "mmlu_college_chemistry_v0.2": 5, "mmlu_college_computer_science_v0.2": 5, "mmlu_college_mathematics_v0.2": 5, "mmlu_college_medicine_v0.2": 5, "mmlu_college_physics_v0.2": 5, "mmlu_computer_security_v0.2": 5, "mmlu_conceptual_physics_v0.2": 5, "mmlu_econometrics_v0.2": 5, "mmlu_electrical_engineering_v0.2": 5, "mmlu_elementary_mathematics_v0.2": 5, 
"mmlu_formal_logic_v0.2": 5, "mmlu_global_facts_v0.2": 5, "mmlu_high_school_biology_v0.2": 5, "mmlu_high_school_chemistry_v0.2": 5, "mmlu_high_school_computer_science_v0.2": 5, "mmlu_high_school_european_history_v0.2": 5, "mmlu_high_school_geography_v0.2": 5, "mmlu_high_school_government_and_politics_v0.2": 5, "mmlu_high_school_macroeconomics_v0.2": 5, "mmlu_high_school_mathematics_v0.2": 5, "mmlu_high_school_microeconomics_v0.2": 5, "mmlu_high_school_physics_v0.2": 5, "mmlu_high_school_psychology_v0.2": 5, "mmlu_high_school_statistics_v0.2": 5, "mmlu_high_school_us_history_v0.2": 5, "mmlu_high_school_world_history_v0.2": 5, "mmlu_human_aging_v0.2": 5, "mmlu_human_sexuality_v0.2": 5, "mmlu_humanities_v0.2": 5, "mmlu_international_law_v0.2": 5, "mmlu_jurisprudence_v0.2": 5, "mmlu_logical_fallacies_v0.2": 5, "mmlu_machine_learning_v0.2": 5, "mmlu_management_v0.2": 5, "mmlu_marketing_v0.2": 5, "mmlu_medical_genetics_v0.2": 5, "mmlu_miscellaneous_v0.2": 5, "mmlu_moral_disputes_v0.2": 5, "mmlu_moral_scenarios_v0.2": 5, "mmlu_nutrition_v0.2": 5, "mmlu_other_v0.2": 5, "mmlu_philosophy_v0.2": 5, "mmlu_prehistory_v0.2": 5, "mmlu_professional_accounting_v0.2": 5, "mmlu_professional_law_v0.2": 5, "mmlu_professional_medicine_v0.2": 5, "mmlu_professional_psychology_v0.2": 5, "mmlu_public_relations_v0.2": 5, "mmlu_security_studies_v0.2": 5, "mmlu_social_sciences_v0.2": 5, "mmlu_sociology_v0.2": 5, "mmlu_stem_v0.2": 5, "mmlu_tr_v0.2": 0, "mmlu_us_foreign_policy_v0.2": 5, "mmlu_virology_v0.2": 5, "mmlu_world_religions_v0.2": 5, "truthfulqa_v0.2": 0, "winogrande_tr-v0.2": 10 }, "config": { "model": "vllm", "model_args": "pretrained=Trendyol/Trendyol-LLM-7b-chat-v0.1,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.7,data_parallel_size=4", "batch_size": 1, "batch_sizes": [], "device": null, "use_cache": null, "limit": null, "bootstrap_iters": 100000, "gen_kwargs": null }, "git_hash": "5c613fc6", "date": 1714252132.2647765, "pretty_env_info": "PyTorch version: 
2.1.2+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Ubuntu 22.04.4 LTS (x86_64)\nGCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0\nClang version: Could not collect\nCMake version: version 3.28.4\nLibc version: glibc-2.35\n\nPython version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)\nPython platform: Linux-6.2.0-1011-azure-x86_64-with-glibc2.35\nIs CUDA available: True\nCUDA runtime version: 12.4.131\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: \nGPU 0: NVIDIA A100 80GB PCIe\nGPU 1: NVIDIA A100 80GB PCIe\nGPU 2: NVIDIA A100 80GB PCIe\nGPU 3: NVIDIA A100 80GB PCIe\n\nNvidia driver version: 550.54.15\ncuDNN version: Probably one of the following:\n/usr/lib/x86_64-linux-gnu/libcudnn.so.9.1.0\n/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.1.0\n/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.1.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.1.0\n/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.1.0\n/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.1.0\n/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.1.0\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nAddress sizes: 48 bits physical, 48 bits virtual\nByte Order: Little Endian\nCPU(s): 96\nOn-line CPU(s) list: 0-95\nVendor ID: AuthenticAMD\nModel name: AMD EPYC 7V13 64-Core Processor\nCPU family: 25\nModel: 1\nThread(s) per core: 1\nCore(s) per socket: 48\nSocket(s): 2\nStepping: 1\nBogoMIPS: 4890.87\nFlags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy 
cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core invpcid_single vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr rdpru arat umip vaes vpclmulqdq rdpid fsrm\nHypervisor vendor: Microsoft\nVirtualization type: full\nL1d cache: 3 MiB (96 instances)\nL1i cache: 3 MiB (96 instances)\nL2 cache: 48 MiB (96 instances)\nL3 cache: 384 MiB (12 instances)\nNUMA node(s): 4\nNUMA node0 CPU(s): 0-23\nNUMA node1 CPU(s): 24-47\nNUMA node2 CPU(s): 48-71\nNUMA node3 CPU(s): 72-95\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit: Not affected\nVulnerability L1tf: Not affected\nVulnerability Mds: Not affected\nVulnerability Meltdown: Not affected\nVulnerability Mmio stale data: Not affected\nVulnerability Retbleed: Not affected\nVulnerability Spec store bypass: Vulnerable\nVulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected\nVulnerability Srbds: Not affected\nVulnerability Tsx async abort: Not affected\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] torch==2.1.2\n[pip3] triton==2.1.0\n[conda] No relevant packages", "transformers_version": "4.40.0", "upper_git_hash": null } ```
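The `process_results` entry recorded for `truthfulqa_v0.2` in the configuration above calls `np` without showing an import. The following is a minimal, standard-library sketch of the same computation. The function name `mc2_score` and the sample numbers are illustrative assumptions, and the sketch takes bare log-likelihoods, whereas the recorded function unpacks `(log-likelihood, is_greedy)` pairs.

```python
import math


def mc2_score(labels, log_likelihoods):
    """Share of probability mass assigned to the true answers.

    Assumes, as the recorded function does, that every true choice
    (label 1) is listed before the first false choice (label 0).
    """
    split_idx = list(labels).index(0)
    p_true = [math.exp(ll) for ll in log_likelihoods[:split_idx]]
    p_false = [math.exp(ll) for ll in log_likelihoods[split_idx:]]
    return sum(p_true) / (sum(p_true) + sum(p_false))


# Hypothetical document with two true and two false choices.
labels = [1, 1, 0, 0]
log_likelihoods = [math.log(0.3), math.log(0.1), math.log(0.4), math.log(0.2)]
score = mc2_score(labels, log_likelihoods)
print(f"MC2-style score: {score:.2f}")
```

Because the recorded function returns this probability share under the key `acc`, the `acc,none` value reported for `truthfulqa_v0.2` is not a count of exact matches.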
Provider:
OpenLLMTurkishLeadboardv2
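Summary of original information

Dataset overview

Source and purpose

  • Source: The dataset was created automatically during the evaluation run of the model Trendyol/Trendyol-LLM-7b-chat-v1.0 on the Open LLM Turkish Leaderboard v0.2.
  • Purpose: To evaluate the model's performance across multiple tasks.

Dataset contents

The dataset contains evaluation results for multiple subtasks, each with its own test set and evaluation metric.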
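For the MMLU subtasks, the recorded task configuration gives a template (`doc_to_text`), the answer letters (`doc_to_choice`), and a `target_delimiter` of a single space. The sketch below reproduces that template in plain Python to show what one rendered question might look like. The function names and the sample document are hypothetical, and it assumes that `answer` is an integer index into the four choices.

```python
def render_mmlu_prompt(doc):
    """Plain-Python equivalent of the recorded doc_to_text template."""
    c = doc["choices"]
    return (
        f"{doc['question'].strip()}\n"
        f"A. {c[0]}\nB. {c[1]}\nC. {c[2]}\nD. {c[3]}\n"
        "Cevap:"
    )


def render_fewshot_example(doc):
    """Prompt plus the gold letter, joined by the recorded target_delimiter."""
    letters = ["A", "B", "C", "D"]  # doc_to_choice from the configuration
    return render_mmlu_prompt(doc) + " " + letters[doc["answer"]]


# Hypothetical document; real rows come from malhajar/mmlu_tr-v0.2.
doc = {
    "question": " 2 + 2 kaçtır? ",
    "choices": ["3", "4", "5", "6"],
    "answer": 1,  # assumed to be a 0-based index into choices
}
prompt = render_mmlu_prompt(doc)
example = render_fewshot_example(doc)
print(example)
```

In the recorded configuration, few-shot examples are separated by `"\n\n"` (`fewshot_delimiter`) and five of them are used per MMLU subtask (`num_fewshot`).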

Main subtasks and their performance metrics

  • winogrande_tr-v0.2:
    • Accuracy: 0.5442338072669827
    • Standard error of accuracy: 0.014002918111878003
  • truthfulqa_v0.2:
    • Accuracy: 0.4219325234148155
    • Standard error of accuracy: 0.01575701966425769
  • mmlu_tr_v0.2:
    • Accuracy: 0.34496783258152774
    • Standard error of accuracy: 0.004053317783140006
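
The reported standard errors can be turned into approximate confidence intervals. The sketch below assumes a normal approximation with a 1.96 multiplier for a two-sided 95% interval. The helper name `confidence_interval` is illustrative and not part of the evaluation output.

```python
# Values copied from the results above: (accuracy, standard error).
results = {
    "winogrande_tr-v0.2": (0.5442338072669827, 0.014002918111878003),
    "truthfulqa_v0.2": (0.4219325234148155, 0.01575701966425769),
    "mmlu_tr_v0.2": (0.34496783258152774, 0.004053317783140006),
}

Z_95 = 1.96  # normal-approximation multiplier (an assumption of this sketch)


def confidence_interval(acc, stderr, z=Z_95):
    """Return (lower, upper) bounds of acc +/- z * stderr."""
    return acc - z * stderr, acc + z * stderr


intervals = {task: confidence_interval(*vals) for task, vals in results.items()}

for task, (low, high) in intervals.items():
    print(f"{task}: {low:.4f} to {high:.4f}")
```

Under this approximation the lower bound for winogrande_tr-v0.2 is roughly 0.517, only slightly above the 0.5 expected from guessing between its two options.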

Subtask groups

  • mmlu_tr_v0.2:
    • Accuracy: 0.34496783258152774
    • Standard error of accuracy: 0.004053317783140006
  • mmlu_humanities_v0.2:
    • Accuracy: 0.32566613527670235
    • Standard error of accuracy: 0.0070285441107821225
  • mmlu_other_v0.2:
    • Accuracy: 0.3928334439283344
    • Standard error of accuracy: 0.008828403598407038
  • mmlu_social_sciences_v0.2:
    • Accuracy: 0.374958374958375
    • Standard error of accuracy: 0.008780182587596134
  • mmlu_stem_v0.2:
    • Accuracy: 0.2969502407704655
    • Standard error of accuracy: 0.008176712632120369
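
To put the group scores in context, the sketch below ranks the four MMLU groups and measures how far each lies above the 0.25 expected from random guessing among four choices (A to D), in units of its standard error. The variable names and the choice of a simple z-score are assumptions made for illustration.

```python
# Values copied from the group results above: (accuracy, standard error).
groups = {
    "mmlu_humanities_v0.2": (0.32566613527670235, 0.0070285441107821225),
    "mmlu_other_v0.2": (0.3928334439283344, 0.008828403598407038),
    "mmlu_social_sciences_v0.2": (0.374958374958375, 0.008780182587596134),
    "mmlu_stem_v0.2": (0.2969502407704655, 0.008176712632120369),
}

CHANCE = 0.25  # four answer choices per question

# Groups ordered from highest to lowest accuracy.
ranked = sorted(groups, key=lambda name: groups[name][0], reverse=True)

# Distance above chance, in standard errors.
z_scores = {name: (acc - CHANCE) / se for name, (acc, se) in groups.items()}

for name in ranked:
    print(f"{name}: acc={groups[name][0]:.4f}, z={z_scores[name]:.1f}")
```

With these figures, mmlu_other_v0.2 ranks first and mmlu_stem_v0.2 last, and all four groups sit above the chance level.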

Detailed subtask list

  • mmlu_stem_v0.2 includes the following subtasks:
    • abstract_algebra_v0.2
    • conceptual_physics_v0.2
    • college_biology_v0.2
    • high_school_chemistry_v0.2
    • electrical_engineering_v0.2
    • high_school_computer_science_v0.2
    • machine_learning_v0.2
    • college_chemistry_v0.2
    • high_school_statistics_v0.2
    • college_mathematics_v0.2
    • high_school_physics_v0.2
    • college_computer_science_v0.2
    • anatomy_v0.2
    • computer_security_v0.2
    • high_school_mathematics_v0.2
    • astronomy
    • college_physics_v0.2
    • high_school_biology_v0.2
    • elementary_mathematics_v0.2
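
The names above are task aliases. In the recorded configuration each task key is the alias with an `mmlu_` prefix (for example, the alias `virology_v0.2` belongs to the task `mmlu_virology_v0.2`). The sketch below rebuilds the keys for the STEM group on that assumption and picks out any alias that lacks the `_v0.2` suffix.

```python
# Aliases copied from the mmlu_stem_v0.2 list above.
stem_aliases = [
    "abstract_algebra_v0.2", "conceptual_physics_v0.2", "college_biology_v0.2",
    "high_school_chemistry_v0.2", "electrical_engineering_v0.2",
    "high_school_computer_science_v0.2", "machine_learning_v0.2",
    "college_chemistry_v0.2", "high_school_statistics_v0.2",
    "college_mathematics_v0.2", "high_school_physics_v0.2",
    "college_computer_science_v0.2", "anatomy_v0.2", "computer_security_v0.2",
    "high_school_mathematics_v0.2", "astronomy", "college_physics_v0.2",
    "high_school_biology_v0.2", "elementary_mathematics_v0.2",
]

# Assumed mapping from alias to task key, following the pattern in the config.
task_keys = [f"mmlu_{alias}" for alias in stem_aliases]

# Aliases that do not carry the version suffix used by the others.
unsuffixed = [alias for alias in stem_aliases if not alias.endswith("_v0.2")]

print(len(task_keys), unsuffixed)
```

The only unsuffixed entry is `astronomy`, which matches the `mmlu_astronomy` key that appears in the `versions` and `n-shot` sections of the results.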

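This dataset provides detailed performance metrics and a subtask breakdown for the evaluated model, showing how its performance varies across domains.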
