
ehasler/cmmlu

Hugging Face · Updated 2026-05-01 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/ehasler/cmmlu
Resource description:
Dataset card metadata (dataset_info and configs), 67 configs in total.
Features (identical for every config, all dtype string): Question, A, B, C, D, Answer
Splits (identical for every config): test and dev; every dev split has num_examples: 5
Data files (identical pattern for every config): <config_name>/test-* and <config_name>/dev-*
All sizes are in bytes.

config_name | test num_examples | test num_bytes | dev num_bytes | download_size | dataset_size
agronomy | 169 | 24545 | 476 | 25303 | 25021
anatomy | 148 | 17588 | 404 | 17840 | 17992
ancient_chinese | 164 | 29471 | 755 | 27661 | 30226
arts | 160 | 20761 | 443 | 21928 | 21204
astronomy | 165 | 32126 | 495 | 29929 | 32621
business_ethics | 209 | 35514 | 479 | 30778 | 35993
chinese_civil_service_exam | 160 | 78684 | 1166 | 68081 | 79850
chinese_driving_rule | 131 | 24411 | 743 | 22917 | 25154
chinese_food_culture | 136 | 21353 | 494 | 22456 | 21847
chinese_foreign_policy | 107 | 42579 | 1146 | 37229 | 43725
chinese_history | 323 | 126437 | 1191 | 104939 | 127628
chinese_literature | 204 | 36355 | 539 | 33819 | 36894
chinese_teacher_qualification | 179 | 47695 | 890 | 42531 | 48585
clinical_knowledge | 237 | 80528 | 781 | 51912 | 81309
college_actuarial_science | 106 | 25353 | 868 | 22107 | 26221
college_education | 107 | 25174 | 648 | 24226 | 25822
college_engineering_hydrology | 106 | 20769 | 568 | 20024 | 21337
college_law | 108 | 32457 | 887 | 30470 | 33344
college_mathematics | 105 | 32055 | 887 | 25650 | 32942
college_medical_statistics | 106 | 22811 | 720 | 22124 | 23531
college_medicine | 273 | 51595 | 487 | 41501 | 52082
computer_science | 204 | 33135 | 496 | 30582 | 33631
computer_security | 171 | 46298 | 709 | 37163 | 47007
conceptual_physics | 147 | 45650 | 1139 | 37559 | 46789
construction_project_management | 139 | 27882 | 611 | 26924 | 28493
economics | 159 | 31427 | 641 | 28133 | 32068
education | 163 | 24988 | 503 | 23852 | 25491
electrical_engineering | 172 | 32906 | 497 | 30189 | 33403
elementary_chinese | 252 | 42588 | 501 | 39615 | 43089
elementary_commonsense | 198 | 26737 | 413 | 27365 | 27150
elementary_information_and_technology | 238 | 41330 | 491 | 33666 | 41821
elementary_mathematics | 230 | 34542 | 410 | 30211 | 34952
ethnology | 135 | 23308 | 484 | 21775 | 23792
food_science | 143 | 19717 | 492 | 21370 | 20209
genetics | 176 | 33584 | 563 | 29458 | 34147
global_facts | 149 | 27574 | 617 | 26895 | 28191
high_school_biology | 169 | 73046 | 1198 | 58915 | 74244
high_school_chemistry | 132 | 49389 | 972 | 42879 | 50361
high_school_geography | 118 | 30643 | 668 | 29287 | 31311
high_school_mathematics | 164 | 24722 | 523 | 22820 | 25245
high_school_physics | 110 | 31806 | 1033 | 30611 | 32839
high_school_politics | 143 | 61435 | 1469 | 49585 | 62904
human_sexuality | 126 | 21507 | 563 | 22528 | 22070
international_law | 185 | 44422 | 656 | 36932 | 45078
journalism | 172 | 28038 | 470 | 26625 | 28508
jurisprudence | 411 | 133606 | 527 | 88060 | 134133
legal_and_moral_basis | 214 | 56449 | 732 | 43577 | 57181
logical | 123 | 23329 | 533 | 22681 | 23862
machine_learning | 122 | 30048 | 777 | 28353 | 30825
management | 210 | 39210 | 590 | 32474 | 39800
marketing | 180 | 36460 | 653 | 29759 | 37113
marxist_theory | 189 | 41398 | 690 | 34192 | 42088
modern_chinese | 116 | 30354 | 620 | 32456 | 30974
nutrition | 145 | 23368 | 495 | 23894 | 23863
philosophy | 105 | 19495 | 566 | 21593 | 20061
professional_accounting | 175 | 33310 | 602 | 27823 | 33912
professional_law | 211 | 70708 | 860 | 56758 | 71568
professional_medicine | 376 | 61575 | 484 | 51935 | 62059
professional_psychology | 232 | 38395 | 553 | 33626 | 38948
public_relations | 174 | 31002 | 521 | 27353 | 31523
security_study | 135 | 26948 | 624 | 26146 | 27572
sociology | 226 | 37875 | 523 | 30651 | 38398
sports_science | 165 | 25292 | 518 | 25273 | 25810
traditional_chinese_medicine | 185 | 26242 | 359 | 25693 | 26601
virology | 169 | 29110 | 485 | 26817 | 29595
world_history | 161 | 66687 | 1570 | 58413 | 68257
world_religions | 160 | 21635 | 439 | 22412 | 22074
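As a rough illustration of how the configuration above maps onto files and load calls, the sketch below builds the data-file pattern for one config and split and wraps a load call. It assumes the Hugging Face `datasets` package is installed and that the repository id is `ehasler/cmmlu`, as shown on this page. The helper names `split_glob` and `load_config` are hypothetical and are not part of any library.

```python
def split_glob(config_name: str, split: str) -> str:
    """Return the data-file pattern listed under `configs` for one split."""
    if split not in ("test", "dev"):
        raise ValueError(f"unknown split: {split!r}")
    return f"{config_name}/{split}-*"


def load_config(config_name: str, split: str = "test"):
    """Load one config and split from the Hub (needs network access)."""
    # Imported lazily so that split_glob works without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the repository id matches the page title, ehasler/cmmlu.
    return load_dataset("ehasler/cmmlu", config_name, split=split)


print(split_glob("agronomy", "dev"))
```

Calling `load_config("agronomy", "dev")` would be expected to return the 5-example dev split listed in the metadata, provided the repository is reachable.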
Provider:
ehasler
Collected Summary
Dataset Introduction
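Construction
CMMLU is a large-scale multitask language understanding benchmark for Chinese. It covers 67 subjects ranging from basic school subjects to professional fields, spanning the humanities, social sciences, natural sciences, and engineering. Every subject has a test split and a dev split, and every item is a four-option multiple-choice question with a clearly worded question (Question), four candidate options (A, B, C, D), and one reference answer (Answer). The dev split supplies 5 examples per subject for few-shot prompting or model tuning. The test split holds between roughly one hundred and four hundred items per subject and is used to measure what a model actually knows. The questions are drawn from Chinese examinations, textbooks, and professional literature, which is intended to keep them authoritative and varied. The dataset is partitioned by config so that researchers can load only the subjects they need, and the overall design aims to test a model's understanding and reasoning across Chinese-language subject knowledge.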
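To make the item format concrete, here is a minimal sketch that renders one row as a four-option prompt. The field names (Question, A, B, C, D, Answer) come from the dataset metadata. The function name `format_question`, the "Answer:" suffix, and the sample row are assumptions for illustration; the row is made up and not taken from the dataset.

```python
def format_question(row: dict) -> str:
    """Render one row as a four-option multiple-choice prompt."""
    lines = [row["Question"]]
    for letter in ("A", "B", "C", "D"):
        lines.append(f"{letter}. {row[letter]}")
    # The trailing cue is a prompt-design assumption, not part of the data.
    lines.append("Answer:")
    return "\n".join(lines)


# Made-up row for illustration only; not a real CMMLU item.
example = {
    "Question": "1 + 1 = ?",
    "A": "1",
    "B": "2",
    "C": "3",
    "D": "4",
    "Answer": "B",
}
print(format_question(example))
```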
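Characteristics
CMMLU is structured and systematic. Its central feature is coverage of 67 finely divided knowledge areas: elementary-school basics such as elementary mathematics and information technology, higher-education subjects such as genetics and actuarial science, and China-specific knowledge such as classical Chinese literature and traditional Chinese medicine. Each subset is moderate in size, with a test split of roughly 100 to 400 items (105 to 411 in the metadata above), and the test splits together contain more than ten thousand questions, which balances breadth of coverage against the compute cost of a single run. Every item uses the same four-option format with a single unambiguous answer, which makes automatic scoring straightforward. Keeping the dev and test splits separate supports both few-shot and zero-shot evaluation. Overall, CMMLU measures not only a model's Chinese language ability but, more importantly, how well it understands and integrates knowledge specific to China.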
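Usage
CMMLU can be loaded with the Hugging Face Datasets library. A config name must be given when loading; for example, 'load_dataset("ehasler/cmmlu", "agronomy")' fetches the agronomy subset. Each subset has two splits, 'test' and 'dev'; 'dev' supplies 5 answered examples that can serve as in-context demonstrations or as material for prompt engineering. A typical evaluation concatenates each test question with its options into one input, has the language model generate an answer, and scores the output by exact match against the reference answer to obtain accuracy. For a full picture of a model's ability, evaluate all 67 subsets separately and report the average score. The dataset suits benchmarking of Chinese large language models, evaluation of knowledge distillation, and cross-disciplinary capability analysis. Difficulty varies between subsets, so results should be interpreted with each domain's characteristics in mind.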
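The scoring step described above (exact match, then an average over subsets) could be sketched as follows. This is an illustrative sketch, not an official evaluation script: the function names are hypothetical, and the letter-extraction rule is a deliberately naive assumption that takes the first standalone uppercase A, B, C, or D in the model output.

```python
import re


def extract_choice(output: str):
    """Return the first standalone A-D letter in a model output, else None."""
    # Naive rule: a letter not surrounded by other Latin letters.
    match = re.search(r"(?<![A-Za-z])([ABCD])(?![A-Za-z])", output)
    return match.group(1) if match else None


def accuracy(outputs, answers) -> float:
    """Exact-match accuracy; outputs with no extractable letter count as wrong."""
    if len(outputs) != len(answers) or not answers:
        raise ValueError("outputs and answers must be non-empty and equal in length")
    correct = sum(extract_choice(o) == a for o, a in zip(outputs, answers))
    return correct / len(answers)


def macro_average(per_subset: dict) -> float:
    """Unweighted mean of per-subset accuracies."""
    if not per_subset:
        raise ValueError("no subset scores given")
    return sum(per_subset.values()) / len(per_subset)


# Made-up outputs and scores, for illustration only.
print(accuracy(["B", "The answer is C.", "?"], ["B", "C", "A"]))
print(macro_average({"agronomy": 0.5, "anatomy": 1.0}))
```

An unweighted mean treats every subject equally regardless of its test size. Weighting by the number of test items is an equally plausible reading of "average score", so it is worth stating which one is reported.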
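Background and Challenges
Background Overview
CMMLU (Chinese Massive Multitask Language Understanding) is a large-scale Chinese multitask understanding dataset built by a Chinese research team to evaluate systematically the breadth and depth of a language model's knowledge in a Chinese context. It appeared at a time when large language models were developing rapidly but Chinese evaluation resources were thin. Its motivating observation is that existing English benchmarks such as MMLU cannot fully reflect how well a model handles Chinese language, culture, and subject knowledge. CMMLU spans 67 subject subsets, from basic school subjects to professional fields, and provides a rigorous test of reasoning in areas such as agronomy, anatomy, ancient Chinese, law, and clinical medicine. Its contribution is to fill a gap in comprehensive Chinese knowledge evaluation, and it has become an important yardstick for Chinese large models.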
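Current Challenges
The first challenge lies in the task itself. Current models are weak at reasoning that combines knowledge from multiple Chinese-language disciplines. In particular, they are prone to semantic errors on culturally specific background knowledge, idioms and allusions, and specialist terminology, all of which demand substantial linguistic and cultural knowledge. The second challenge lies in construction: multiple-choice questions must be selected and written from a large body of Chinese textbooks, exam papers, and academic material so that they cover every subject, span distinct difficulty levels, and have a single correct answer. In addition, the questions should not rely on data a model may have seen during training, to avoid data leakage, and sample sizes must be balanced across subsets so that results fairly reflect a model's ability across a wide range of knowledge.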
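Common Scenarios
Classic Use Cases
As a multi-subject knowledge benchmark grounded in Chinese language and culture, CMMLU is widely used to evaluate the Chinese capabilities of large language models. Its 67 subjects run from basic school subjects to professional fields and from elementary to higher education, so it measures both the knowledge a model holds and the depth of its reasoning in Chinese. Researchers usually run the test split in a zero-shot or few-shot setting to assess a model's grasp of Chinese factual knowledge. Some also use it as target data during fine-tuning to improve performance in particular Chinese domains such as law, medicine, or engineering. Each subset contains roughly one hundred to four hundred four-option questions, which gives the benchmark both scale and specialization and has made it a standard reference for testing Chinese understanding and domain expertise.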
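The zero-shot and few-shot settings mentioned above differ only in how many answered dev examples precede the test question. A minimal sketch, assuming rows are plain dicts with the six fields from the metadata. The names `render` and `build_prompt`, the prompt layout, and the sample rows are hypothetical.

```python
def render(row: dict, with_answer: bool) -> str:
    """Render one row, optionally followed by its reference answer."""
    options = "\n".join(f"{letter}. {row[letter]}" for letter in "ABCD")
    answer = f" {row['Answer']}" if with_answer else ""
    return f"{row['Question']}\n{options}\nAnswer:{answer}"


def build_prompt(dev_rows: list, test_row: dict, k: int = 5) -> str:
    """Prefix the test question with k answered dev examples (k=0 is zero-shot)."""
    if not 0 <= k <= len(dev_rows):
        raise ValueError("k must be between 0 and the number of dev rows")
    shots = [render(row, with_answer=True) for row in dev_rows[:k]]
    return "\n\n".join(shots + [render(test_row, with_answer=False)])


# Made-up rows for illustration only; not real CMMLU items.
dev = [
    {"Question": "2 + 2 = ?", "A": "3", "B": "4", "C": "5", "D": "6", "Answer": "B"},
    {"Question": "3 - 1 = ?", "A": "2", "B": "1", "C": "0", "D": "4", "Answer": "A"},
]
test = {"Question": "1 + 1 = ?", "A": "1", "B": "2", "C": "3", "D": "4", "Answer": "B"}
print(build_prompt(dev, test, k=2))
```

With the real data, `dev_rows` would be the 5 rows of a subject's dev split converted to a list of dicts, so `k` can range from 0 to 5.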
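Derived Work
CMMLU has given rise to several lines of follow-up research. The most representative is targeted analysis of specific knowledge areas in Chinese large models, for example examining weaknesses in traditional Chinese medicine or classical Chinese literature, which has in turn motivated knowledge-enhanced fine-tuning methods. Some researchers have built fine-grained capability evaluation frameworks on the dataset that relate model performance to subject difficulty and question complexity, and use the results to guide the mixture of pretraining data. CMMLU is also reported as an evaluation benchmark in papers on Chinese multimodal large models such as CogVLM and Qwen-VL, where it is used to check Chinese knowledge reasoning; the dataset itself contains text fields only. Together these works have consolidated CMMLU's position as a core benchmark for Chinese large models.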
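Recent Research
Latest Research Directions
As a Chinese large-scale multi-subject knowledge benchmark, CMMLU is currently used mainly to evaluate how well large language models master and reason over Chinese knowledge across 67 subjects, including agronomy, law, and medicine. It features prominently in evaluations of the Chinese capabilities of models such as GPT-4 and of Chinese-developed models such as ChatGLM and 文心一言 (ERNIE Bot), and performance on China-specific subjects such as the civil service exam and teacher qualification attracts particular attention. CMMLU fills a gap in comprehensive Chinese knowledge evaluation and gives a key yardstick for improving cross-disciplinary, in-depth understanding. It has advanced Chinese natural language processing research and laid an evaluation foundation for reliable use of models in professional fields such as education and law.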
The content above was collected and summarized by 遇见数据集.