ehasler/cmmlu
Source: Hugging Face; updated 2026-05-01; indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/ehasler/cmmlu
Resource overview:
---
dataset_info:
- config_name: agronomy
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 24545
num_examples: 169
- name: dev
num_bytes: 476
num_examples: 5
download_size: 25303
dataset_size: 25021
- config_name: anatomy
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 17588
num_examples: 148
- name: dev
num_bytes: 404
num_examples: 5
download_size: 17840
dataset_size: 17992
- config_name: ancient_chinese
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 29471
num_examples: 164
- name: dev
num_bytes: 755
num_examples: 5
download_size: 27661
dataset_size: 30226
- config_name: arts
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 20761
num_examples: 160
- name: dev
num_bytes: 443
num_examples: 5
download_size: 21928
dataset_size: 21204
- config_name: astronomy
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 32126
num_examples: 165
- name: dev
num_bytes: 495
num_examples: 5
download_size: 29929
dataset_size: 32621
- config_name: business_ethics
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 35514
num_examples: 209
- name: dev
num_bytes: 479
num_examples: 5
download_size: 30778
dataset_size: 35993
- config_name: chinese_civil_service_exam
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 78684
num_examples: 160
- name: dev
num_bytes: 1166
num_examples: 5
download_size: 68081
dataset_size: 79850
- config_name: chinese_driving_rule
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 24411
num_examples: 131
- name: dev
num_bytes: 743
num_examples: 5
download_size: 22917
dataset_size: 25154
- config_name: chinese_food_culture
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 21353
num_examples: 136
- name: dev
num_bytes: 494
num_examples: 5
download_size: 22456
dataset_size: 21847
- config_name: chinese_foreign_policy
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 42579
num_examples: 107
- name: dev
num_bytes: 1146
num_examples: 5
download_size: 37229
dataset_size: 43725
- config_name: chinese_history
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 126437
num_examples: 323
- name: dev
num_bytes: 1191
num_examples: 5
download_size: 104939
dataset_size: 127628
- config_name: chinese_literature
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 36355
num_examples: 204
- name: dev
num_bytes: 539
num_examples: 5
download_size: 33819
dataset_size: 36894
- config_name: chinese_teacher_qualification
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 47695
num_examples: 179
- name: dev
num_bytes: 890
num_examples: 5
download_size: 42531
dataset_size: 48585
- config_name: clinical_knowledge
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 80528
num_examples: 237
- name: dev
num_bytes: 781
num_examples: 5
download_size: 51912
dataset_size: 81309
- config_name: college_actuarial_science
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 25353
num_examples: 106
- name: dev
num_bytes: 868
num_examples: 5
download_size: 22107
dataset_size: 26221
- config_name: college_education
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 25174
num_examples: 107
- name: dev
num_bytes: 648
num_examples: 5
download_size: 24226
dataset_size: 25822
- config_name: college_engineering_hydrology
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 20769
num_examples: 106
- name: dev
num_bytes: 568
num_examples: 5
download_size: 20024
dataset_size: 21337
- config_name: college_law
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 32457
num_examples: 108
- name: dev
num_bytes: 887
num_examples: 5
download_size: 30470
dataset_size: 33344
- config_name: college_mathematics
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 32055
num_examples: 105
- name: dev
num_bytes: 887
num_examples: 5
download_size: 25650
dataset_size: 32942
- config_name: college_medical_statistics
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 22811
num_examples: 106
- name: dev
num_bytes: 720
num_examples: 5
download_size: 22124
dataset_size: 23531
- config_name: college_medicine
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 51595
num_examples: 273
- name: dev
num_bytes: 487
num_examples: 5
download_size: 41501
dataset_size: 52082
- config_name: computer_science
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 33135
num_examples: 204
- name: dev
num_bytes: 496
num_examples: 5
download_size: 30582
dataset_size: 33631
- config_name: computer_security
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 46298
num_examples: 171
- name: dev
num_bytes: 709
num_examples: 5
download_size: 37163
dataset_size: 47007
- config_name: conceptual_physics
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 45650
num_examples: 147
- name: dev
num_bytes: 1139
num_examples: 5
download_size: 37559
dataset_size: 46789
- config_name: construction_project_management
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 27882
num_examples: 139
- name: dev
num_bytes: 611
num_examples: 5
download_size: 26924
dataset_size: 28493
- config_name: economics
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 31427
num_examples: 159
- name: dev
num_bytes: 641
num_examples: 5
download_size: 28133
dataset_size: 32068
- config_name: education
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 24988
num_examples: 163
- name: dev
num_bytes: 503
num_examples: 5
download_size: 23852
dataset_size: 25491
- config_name: electrical_engineering
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 32906
num_examples: 172
- name: dev
num_bytes: 497
num_examples: 5
download_size: 30189
dataset_size: 33403
- config_name: elementary_chinese
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 42588
num_examples: 252
- name: dev
num_bytes: 501
num_examples: 5
download_size: 39615
dataset_size: 43089
- config_name: elementary_commonsense
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 26737
num_examples: 198
- name: dev
num_bytes: 413
num_examples: 5
download_size: 27365
dataset_size: 27150
- config_name: elementary_information_and_technology
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 41330
num_examples: 238
- name: dev
num_bytes: 491
num_examples: 5
download_size: 33666
dataset_size: 41821
- config_name: elementary_mathematics
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 34542
num_examples: 230
- name: dev
num_bytes: 410
num_examples: 5
download_size: 30211
dataset_size: 34952
- config_name: ethnology
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 23308
num_examples: 135
- name: dev
num_bytes: 484
num_examples: 5
download_size: 21775
dataset_size: 23792
- config_name: food_science
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 19717
num_examples: 143
- name: dev
num_bytes: 492
num_examples: 5
download_size: 21370
dataset_size: 20209
- config_name: genetics
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 33584
num_examples: 176
- name: dev
num_bytes: 563
num_examples: 5
download_size: 29458
dataset_size: 34147
- config_name: global_facts
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 27574
num_examples: 149
- name: dev
num_bytes: 617
num_examples: 5
download_size: 26895
dataset_size: 28191
- config_name: high_school_biology
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 73046
num_examples: 169
- name: dev
num_bytes: 1198
num_examples: 5
download_size: 58915
dataset_size: 74244
- config_name: high_school_chemistry
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 49389
num_examples: 132
- name: dev
num_bytes: 972
num_examples: 5
download_size: 42879
dataset_size: 50361
- config_name: high_school_geography
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 30643
num_examples: 118
- name: dev
num_bytes: 668
num_examples: 5
download_size: 29287
dataset_size: 31311
- config_name: high_school_mathematics
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 24722
num_examples: 164
- name: dev
num_bytes: 523
num_examples: 5
download_size: 22820
dataset_size: 25245
- config_name: high_school_physics
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 31806
num_examples: 110
- name: dev
num_bytes: 1033
num_examples: 5
download_size: 30611
dataset_size: 32839
- config_name: high_school_politics
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 61435
num_examples: 143
- name: dev
num_bytes: 1469
num_examples: 5
download_size: 49585
dataset_size: 62904
- config_name: human_sexuality
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 21507
num_examples: 126
- name: dev
num_bytes: 563
num_examples: 5
download_size: 22528
dataset_size: 22070
- config_name: international_law
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 44422
num_examples: 185
- name: dev
num_bytes: 656
num_examples: 5
download_size: 36932
dataset_size: 45078
- config_name: journalism
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 28038
num_examples: 172
- name: dev
num_bytes: 470
num_examples: 5
download_size: 26625
dataset_size: 28508
- config_name: jurisprudence
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 133606
num_examples: 411
- name: dev
num_bytes: 527
num_examples: 5
download_size: 88060
dataset_size: 134133
- config_name: legal_and_moral_basis
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 56449
num_examples: 214
- name: dev
num_bytes: 732
num_examples: 5
download_size: 43577
dataset_size: 57181
- config_name: logical
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 23329
num_examples: 123
- name: dev
num_bytes: 533
num_examples: 5
download_size: 22681
dataset_size: 23862
- config_name: machine_learning
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 30048
num_examples: 122
- name: dev
num_bytes: 777
num_examples: 5
download_size: 28353
dataset_size: 30825
- config_name: management
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 39210
num_examples: 210
- name: dev
num_bytes: 590
num_examples: 5
download_size: 32474
dataset_size: 39800
- config_name: marketing
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 36460
num_examples: 180
- name: dev
num_bytes: 653
num_examples: 5
download_size: 29759
dataset_size: 37113
- config_name: marxist_theory
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 41398
num_examples: 189
- name: dev
num_bytes: 690
num_examples: 5
download_size: 34192
dataset_size: 42088
- config_name: modern_chinese
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 30354
num_examples: 116
- name: dev
num_bytes: 620
num_examples: 5
download_size: 32456
dataset_size: 30974
- config_name: nutrition
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 23368
num_examples: 145
- name: dev
num_bytes: 495
num_examples: 5
download_size: 23894
dataset_size: 23863
- config_name: philosophy
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 19495
num_examples: 105
- name: dev
num_bytes: 566
num_examples: 5
download_size: 21593
dataset_size: 20061
- config_name: professional_accounting
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 33310
num_examples: 175
- name: dev
num_bytes: 602
num_examples: 5
download_size: 27823
dataset_size: 33912
- config_name: professional_law
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 70708
num_examples: 211
- name: dev
num_bytes: 860
num_examples: 5
download_size: 56758
dataset_size: 71568
- config_name: professional_medicine
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 61575
num_examples: 376
- name: dev
num_bytes: 484
num_examples: 5
download_size: 51935
dataset_size: 62059
- config_name: professional_psychology
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 38395
num_examples: 232
- name: dev
num_bytes: 553
num_examples: 5
download_size: 33626
dataset_size: 38948
- config_name: public_relations
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 31002
num_examples: 174
- name: dev
num_bytes: 521
num_examples: 5
download_size: 27353
dataset_size: 31523
- config_name: security_study
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 26948
num_examples: 135
- name: dev
num_bytes: 624
num_examples: 5
download_size: 26146
dataset_size: 27572
- config_name: sociology
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 37875
num_examples: 226
- name: dev
num_bytes: 523
num_examples: 5
download_size: 30651
dataset_size: 38398
- config_name: sports_science
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 25292
num_examples: 165
- name: dev
num_bytes: 518
num_examples: 5
download_size: 25273
dataset_size: 25810
- config_name: traditional_chinese_medicine
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 26242
num_examples: 185
- name: dev
num_bytes: 359
num_examples: 5
download_size: 25693
dataset_size: 26601
- config_name: virology
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 29110
num_examples: 169
- name: dev
num_bytes: 485
num_examples: 5
download_size: 26817
dataset_size: 29595
- config_name: world_history
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 66687
num_examples: 161
- name: dev
num_bytes: 1570
num_examples: 5
download_size: 58413
dataset_size: 68257
- config_name: world_religions
features:
- name: Question
dtype: string
- name: A
dtype: string
- name: B
dtype: string
- name: C
dtype: string
- name: D
dtype: string
- name: Answer
dtype: string
splits:
- name: test
num_bytes: 21635
num_examples: 160
- name: dev
num_bytes: 439
num_examples: 5
download_size: 22412
dataset_size: 22074
configs:
- config_name: agronomy
data_files:
- split: test
path: agronomy/test-*
- split: dev
path: agronomy/dev-*
- config_name: anatomy
data_files:
- split: test
path: anatomy/test-*
- split: dev
path: anatomy/dev-*
- config_name: ancient_chinese
data_files:
- split: test
path: ancient_chinese/test-*
- split: dev
path: ancient_chinese/dev-*
- config_name: arts
data_files:
- split: test
path: arts/test-*
- split: dev
path: arts/dev-*
- config_name: astronomy
data_files:
- split: test
path: astronomy/test-*
- split: dev
path: astronomy/dev-*
- config_name: business_ethics
data_files:
- split: test
path: business_ethics/test-*
- split: dev
path: business_ethics/dev-*
- config_name: chinese_civil_service_exam
data_files:
- split: test
path: chinese_civil_service_exam/test-*
- split: dev
path: chinese_civil_service_exam/dev-*
- config_name: chinese_driving_rule
data_files:
- split: test
path: chinese_driving_rule/test-*
- split: dev
path: chinese_driving_rule/dev-*
- config_name: chinese_food_culture
data_files:
- split: test
path: chinese_food_culture/test-*
- split: dev
path: chinese_food_culture/dev-*
- config_name: chinese_foreign_policy
data_files:
- split: test
path: chinese_foreign_policy/test-*
- split: dev
path: chinese_foreign_policy/dev-*
- config_name: chinese_history
data_files:
- split: test
path: chinese_history/test-*
- split: dev
path: chinese_history/dev-*
- config_name: chinese_literature
data_files:
- split: test
path: chinese_literature/test-*
- split: dev
path: chinese_literature/dev-*
- config_name: chinese_teacher_qualification
data_files:
- split: test
path: chinese_teacher_qualification/test-*
- split: dev
path: chinese_teacher_qualification/dev-*
- config_name: clinical_knowledge
data_files:
- split: test
path: clinical_knowledge/test-*
- split: dev
path: clinical_knowledge/dev-*
- config_name: college_actuarial_science
data_files:
- split: test
path: college_actuarial_science/test-*
- split: dev
path: college_actuarial_science/dev-*
- config_name: college_education
data_files:
- split: test
path: college_education/test-*
- split: dev
path: college_education/dev-*
- config_name: college_engineering_hydrology
data_files:
- split: test
path: college_engineering_hydrology/test-*
- split: dev
path: college_engineering_hydrology/dev-*
- config_name: college_law
data_files:
- split: test
path: college_law/test-*
- split: dev
path: college_law/dev-*
- config_name: college_mathematics
data_files:
- split: test
path: college_mathematics/test-*
- split: dev
path: college_mathematics/dev-*
- config_name: college_medical_statistics
data_files:
- split: test
path: college_medical_statistics/test-*
- split: dev
path: college_medical_statistics/dev-*
- config_name: college_medicine
data_files:
- split: test
path: college_medicine/test-*
- split: dev
path: college_medicine/dev-*
- config_name: computer_science
data_files:
- split: test
path: computer_science/test-*
- split: dev
path: computer_science/dev-*
- config_name: computer_security
data_files:
- split: test
path: computer_security/test-*
- split: dev
path: computer_security/dev-*
- config_name: conceptual_physics
data_files:
- split: test
path: conceptual_physics/test-*
- split: dev
path: conceptual_physics/dev-*
- config_name: construction_project_management
data_files:
- split: test
path: construction_project_management/test-*
- split: dev
path: construction_project_management/dev-*
- config_name: economics
data_files:
- split: test
path: economics/test-*
- split: dev
path: economics/dev-*
- config_name: education
data_files:
- split: test
path: education/test-*
- split: dev
path: education/dev-*
- config_name: electrical_engineering
data_files:
- split: test
path: electrical_engineering/test-*
- split: dev
path: electrical_engineering/dev-*
- config_name: elementary_chinese
data_files:
- split: test
path: elementary_chinese/test-*
- split: dev
path: elementary_chinese/dev-*
- config_name: elementary_commonsense
data_files:
- split: test
path: elementary_commonsense/test-*
- split: dev
path: elementary_commonsense/dev-*
- config_name: elementary_information_and_technology
data_files:
- split: test
path: elementary_information_and_technology/test-*
- split: dev
path: elementary_information_and_technology/dev-*
- config_name: elementary_mathematics
data_files:
- split: test
path: elementary_mathematics/test-*
- split: dev
path: elementary_mathematics/dev-*
- config_name: ethnology
data_files:
- split: test
path: ethnology/test-*
- split: dev
path: ethnology/dev-*
- config_name: food_science
data_files:
- split: test
path: food_science/test-*
- split: dev
path: food_science/dev-*
- config_name: genetics
data_files:
- split: test
path: genetics/test-*
- split: dev
path: genetics/dev-*
- config_name: global_facts
data_files:
- split: test
path: global_facts/test-*
- split: dev
path: global_facts/dev-*
- config_name: high_school_biology
data_files:
- split: test
path: high_school_biology/test-*
- split: dev
path: high_school_biology/dev-*
- config_name: high_school_chemistry
data_files:
- split: test
path: high_school_chemistry/test-*
- split: dev
path: high_school_chemistry/dev-*
- config_name: high_school_geography
data_files:
- split: test
path: high_school_geography/test-*
- split: dev
path: high_school_geography/dev-*
- config_name: high_school_mathematics
data_files:
- split: test
path: high_school_mathematics/test-*
- split: dev
path: high_school_mathematics/dev-*
- config_name: high_school_physics
data_files:
- split: test
path: high_school_physics/test-*
- split: dev
path: high_school_physics/dev-*
- config_name: high_school_politics
data_files:
- split: test
path: high_school_politics/test-*
- split: dev
path: high_school_politics/dev-*
- config_name: human_sexuality
data_files:
- split: test
path: human_sexuality/test-*
- split: dev
path: human_sexuality/dev-*
- config_name: international_law
data_files:
- split: test
path: international_law/test-*
- split: dev
path: international_law/dev-*
- config_name: journalism
data_files:
- split: test
path: journalism/test-*
- split: dev
path: journalism/dev-*
- config_name: jurisprudence
data_files:
- split: test
path: jurisprudence/test-*
- split: dev
path: jurisprudence/dev-*
- config_name: legal_and_moral_basis
data_files:
- split: test
path: legal_and_moral_basis/test-*
- split: dev
path: legal_and_moral_basis/dev-*
- config_name: logical
data_files:
- split: test
path: logical/test-*
- split: dev
path: logical/dev-*
- config_name: machine_learning
data_files:
- split: test
path: machine_learning/test-*
- split: dev
path: machine_learning/dev-*
- config_name: management
data_files:
- split: test
path: management/test-*
- split: dev
path: management/dev-*
- config_name: marketing
data_files:
- split: test
path: marketing/test-*
- split: dev
path: marketing/dev-*
- config_name: marxist_theory
data_files:
- split: test
path: marxist_theory/test-*
- split: dev
path: marxist_theory/dev-*
- config_name: modern_chinese
data_files:
- split: test
path: modern_chinese/test-*
- split: dev
path: modern_chinese/dev-*
- config_name: nutrition
data_files:
- split: test
path: nutrition/test-*
- split: dev
path: nutrition/dev-*
- config_name: philosophy
data_files:
- split: test
path: philosophy/test-*
- split: dev
path: philosophy/dev-*
- config_name: professional_accounting
data_files:
- split: test
path: professional_accounting/test-*
- split: dev
path: professional_accounting/dev-*
- config_name: professional_law
data_files:
- split: test
path: professional_law/test-*
- split: dev
path: professional_law/dev-*
- config_name: professional_medicine
data_files:
- split: test
path: professional_medicine/test-*
- split: dev
path: professional_medicine/dev-*
- config_name: professional_psychology
data_files:
- split: test
path: professional_psychology/test-*
- split: dev
path: professional_psychology/dev-*
- config_name: public_relations
data_files:
- split: test
path: public_relations/test-*
- split: dev
path: public_relations/dev-*
- config_name: security_study
data_files:
- split: test
path: security_study/test-*
- split: dev
path: security_study/dev-*
- config_name: sociology
data_files:
- split: test
path: sociology/test-*
- split: dev
path: sociology/dev-*
- config_name: sports_science
data_files:
- split: test
path: sports_science/test-*
- split: dev
path: sports_science/dev-*
- config_name: traditional_chinese_medicine
data_files:
- split: test
path: traditional_chinese_medicine/test-*
- split: dev
path: traditional_chinese_medicine/dev-*
- config_name: virology
data_files:
- split: test
path: virology/test-*
- split: dev
path: virology/dev-*
- config_name: world_history
data_files:
- split: test
path: world_history/test-*
- split: dev
path: world_history/dev-*
- config_name: world_religions
data_files:
- split: test
path: world_religions/test-*
- split: dev
path: world_religions/dev-*
---
Provider:
ehasler
Collected summary
Dataset introduction

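Construction
CMMLU is a large-scale multitask language understanding benchmark for Chinese. It covers 67 subjects, from basic school subjects to professional fields, spanning the humanities and social sciences, the natural sciences, and engineering. Every subject has a test split and a dev split in four-option multiple-choice form: each record holds a question (Question), four candidate options (A, B, C, D), and a reference answer (Answer). The dev split provides 5 examples per subject for few-shot prompting or model tuning, while the test split holds between 105 and 411 examples per subject and is used to measure how much of the subject a model actually knows. The questions are drawn from Chinese exams, textbooks, and professional literature, which gives them both authority and variety. The dataset is divided into one config per subject so that researchers can load only the subsets they need. The overall design aims to test a model's understanding and reasoning across Chinese subject knowledge.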
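The record layout described above can be sketched as a typed dictionary with a small validity check. This is only an illustration: the names `CmmluRecord` and `is_valid_record` are hypothetical, the example record is made up rather than taken from the dataset, and the check assumes that `Answer` stores one of the letters A to D (the metadata only says it is a string).

```python
from typing import TypedDict


class CmmluRecord(TypedDict):
    """The six string features listed in the dataset_info metadata."""
    Question: str
    A: str
    B: str
    C: str
    D: str
    Answer: str


CHOICES = ("A", "B", "C", "D")


def is_valid_record(record: dict) -> bool:
    """Return True if all six fields are strings and Answer is A, B, C, or D.

    That Answer holds an option letter is an assumption, not something the
    metadata states.
    """
    fields = ("Question", *CHOICES, "Answer")
    if not all(isinstance(record.get(name), str) for name in fields):
        return False
    return record["Answer"].strip() in CHOICES


# Made-up record for illustration only; not copied from the dataset.
example: CmmluRecord = {
    "Question": "1 + 1 = ?",
    "A": "1",
    "B": "2",
    "C": "3",
    "D": "4",
    "Answer": "B",
}
```

A check like this is cheap to run over a whole split before evaluation, so that malformed rows are caught before they distort an accuracy figure.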
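Features
CMMLU is structured and systematic. Its defining feature is coverage of 67 finely divided knowledge areas: elementary-level basics such as elementary mathematics and information technology, higher-education subjects such as genetics and actuarial science, and China-specific knowledge such as ancient Chinese literature and traditional Chinese medicine. Each subset is moderate in size. Test sets range from 105 to 411 examples in the metadata above, for more than ten thousand questions in total, which balances breadth of evaluation against the compute cost of a single run. Every subset uses the same four-option format with exactly one correct answer, which makes automatic scoring straightforward. Separating the dev and test splits supports both few-shot and zero-shot evaluation. Overall, CMMLU reflects not only a model's Chinese language ability but, more importantly, how well it understands and integrates knowledge specific to China.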
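Usage
To evaluate with CMMLU, load it through the Hugging Face Datasets library. A config name is required when loading, for example 'load_dataset("ehasler/cmmlu", "agronomy")' for the agronomy subset. Each subset has two splits, 'test' and 'dev'; 'dev' provides 5 answered examples that can be used for in-context learning or for building prompts. A typical evaluation concatenates each test question with its options into one input, lets the language model generate an answer, and computes accuracy by exact match against the reference answer. For a full picture of model ability, test all 67 subsets separately and report the average score. The dataset suits benchmarking of Chinese large language models, evaluation of knowledge distillation, and cross-subject capability analysis. When running experiments, keep in mind that difficulty differs between subsets, and interpret results in light of each subject area.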
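The evaluation flow described above can be sketched as follows. Everything here except `load_dataset` is an assumption made for illustration: the helper names, the prompt template ending in "答案:", the regex-based answer extraction, and the expectation that `Answer` holds an option letter. The model call itself is left out, so the sketch only covers loading, prompt construction, and scoring.

```python
import re

REPO_ID = "ehasler/cmmlu"  # repository id shown at the top of this page


def load_subject(config_name: str):
    """Load one subject; the result has 'test' and 'dev' splits."""
    from datasets import load_dataset  # requires the `datasets` package
    return load_dataset(REPO_ID, config_name)


def format_item(row: dict, with_answer: bool) -> str:
    """Render one question with its four options (hypothetical template)."""
    lines = [row["Question"]]
    lines += [f"{key}. {row[key]}" for key in "ABCD"]
    lines.append("答案:" + (row["Answer"] if with_answer else ""))
    return "\n".join(lines)


def build_prompt(dev_rows, test_row: dict) -> str:
    """Few-shot prompt: answered dev examples, then the unanswered question.

    Passing an empty dev_rows list yields a zero-shot prompt.
    """
    shots = [format_item(row, with_answer=True) for row in dev_rows]
    return "\n\n".join(shots + [format_item(test_row, with_answer=False)])


def extract_choice(generation: str):
    """Return the first A-D letter in the model output, or None (crude heuristic)."""
    match = re.search(r"[ABCD]", generation)
    return match.group(0) if match else None


def accuracy(predictions, golds) -> float:
    """Exact-match accuracy; a prediction of None counts as wrong."""
    if not golds or len(predictions) != len(golds):
        raise ValueError("predictions and golds must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)


def macro_average(per_subject: dict) -> float:
    """Unweighted mean of per-subject accuracies."""
    return sum(per_subject.values()) / len(per_subject)
```

A run over one subject would call `load_subject("agronomy")`, build a prompt for each row of the `test` split from the rows of `dev`, pass it to the model under test, and feed the extracted letters to `accuracy`. The macro average weights every subject equally regardless of its size; a micro average over all questions is a different, equally defensible choice.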
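Background and challenges
Background overview
CMMLU (Chinese Massive Multitask Language Understanding) is a large-scale Chinese multitask understanding dataset built by a research team to evaluate systematically the breadth and depth of language models' knowledge in Chinese. It appeared at a time when large language models were developing rapidly but Chinese evaluation resources were thin. The core research problem is that existing English benchmarks such as MMLU cannot fully reflect how well a model handles Chinese language, culture, and subject knowledge. CMMLU covers a wide range of topics from basic subjects to professional fields in 67 subject subsets, and provides a rigorous test of reasoning in areas such as agronomy, anatomy, ancient Chinese, law, and clinical medicine. Its influence lies in filling the gap in comprehensive Chinese knowledge evaluation, making it an important yardstick for Chinese large models.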
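Current challenges
The first challenge CMMLU addresses is the complexity of the domain itself. Current models reason poorly across combined Chinese subject knowledge, and they are prone to semantic errors on China-specific cultural background, idioms and allusions, and technical terminology, all of which demand deep linguistic and cultural knowledge. The challenge in building the dataset is selecting and writing, from a large body of Chinese textbooks, exam papers, and academic material, multiple-choice questions that cover every subject, have clearly graded difficulty, and have exactly one correct answer. The questions should also not depend on data a model may have seen during training, so that data leakage is avoided, and the sample size and representativeness of each subset need to be balanced so that results fairly reflect a model's ability across a wide range of knowledge.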
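The leakage concern above can be illustrated with a character n-gram overlap check between a question and a piece of training text. This is a generic heuristic swapped in for illustration, not a method the source attributes to the CMMLU authors; the function names and the default `n=8` are arbitrary assumptions.

```python
def char_ngrams(text: str, n: int) -> set:
    """All contiguous character n-grams of text (empty if text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def overlap_ratio(question: str, corpus_text: str, n: int = 8) -> float:
    """Fraction of the question's n-grams that also occur in corpus_text.

    A value near 1.0 suggests the question appears almost verbatim in the
    corpus; a value near 0.0 suggests little literal overlap.
    """
    question_grams = char_ngrams(question, n)
    if not question_grams:
        return 0.0
    corpus_grams = char_ngrams(corpus_text, n)
    return len(question_grams & corpus_grams) / len(question_grams)
```

Character n-grams are used rather than word n-grams because Chinese text is not whitespace-delimited. A literal overlap check of this kind misses paraphrased or translated leaks, so a low ratio does not prove that a question is unseen.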
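Common scenarios
Classic use cases
As a multi-subject knowledge benchmark grounded in Chinese language and culture, CMMLU is widely used to evaluate the Chinese ability of large language models. Its 67 subjects, running from basic to professional fields and from elementary to higher education, are designed to measure a model's knowledge and depth of reasoning in Chinese from every angle. Researchers usually evaluate on the test split in a zero-shot or few-shot setting to measure how well a model has absorbed Chinese factual knowledge, or use the data as target material in fine-tuning to improve performance in specific Chinese domains such as law, medicine, and engineering. Because each subset holds between roughly one hundred and four hundred four-option questions, the dataset combines scale with specialization and has become a standard reference for testing Chinese understanding and domain expertise.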
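Derived and related work
CMMLU has given rise to several influential lines of work. The most representative is targeted analysis of specific knowledge areas in Chinese large models, for example examining weaknesses in distinctive subjects such as traditional Chinese medicine and ancient Chinese literature, which has in turn motivated knowledge-enhanced fine-tuning methods. Some researchers have built fine-grained capability evaluation frameworks on top of the dataset, relating model performance to subject difficulty and question complexity in order to guide the mix of pre-training data. CMMLU also appears as an evaluation benchmark in papers on Chinese multimodal large models (such as CogVLM and Qwen-VL). Because every field in the schema above is a plain string, it measures the Chinese knowledge reasoning of such models on text input rather than on combined image and text input. These derived works have further reinforced CMMLU's position as a core benchmark for Chinese large models.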
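Recent research on the dataset
Latest research directions
As a large-scale multi-subject benchmark for Chinese knowledge understanding, CMMLU is currently used at the research frontier to evaluate how well large language models command and reason over Chinese knowledge across 67 subjects, including agronomy, law, and medicine. It is closely tied to widely followed evaluations of the Chinese ability of large models such as GPT-4 and of domestic Chinese models such as ChatGLM and ERNIE Bot (文心一言), with particular attention to performance in China-specific settings such as the civil service exam and teacher qualification. CMMLU filled a gap in comprehensive Chinese knowledge evaluation and provides a key yardstick for improving cross-subject, in-depth understanding. It has advanced Chinese natural language processing research and laid an evaluation basis for the reliable use of models in professional fields such as education and law.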
The content above was collected and summarized by 遇见数据集.



