miulab/tmlu

Name: miulab/tmlu
Creator: miulab
Published: 2024-05-08 08:35:29
License: No description available

Hugging Face | Updated 2024-05-08 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/miulab/tmlu

Resource description:

---
task_categories:
- question-answering
- text-classification
language:
- zh
pretty_name: TMLU
size_categories:
- 1K<n<10K
configs:
- config_name: AST_chinese
  data_files:
  - split: test
    path: "AST_chinese_test.jsonl"
  - split: dev
    path: "AST_chinese_dev.jsonl"
- config_name: AST_mathematics
  data_files:
  - split: test
    path: "AST_mathematics_test.jsonl"
  - split: dev
    path: "AST_mathematics_dev.jsonl"
- config_name: AST_biology
  data_files:
  - split: test
    path: "AST_biology_test.jsonl"
  - split: dev
    path: "AST_biology_dev.jsonl"
- config_name: AST_chemistry
  data_files:
  - split: test
    path: "AST_chemistry_test.jsonl"
  - split: dev
    path: "AST_chemistry_dev.jsonl"
- config_name: AST_physics
  data_files:
  - split: test
    path: "AST_physics_test.jsonl"
  - split: dev
    path: "AST_physics_dev.jsonl"
- config_name: AST_civics
  data_files:
  - split: test
    path: "AST_civics_test.jsonl"
  - split: dev
    path: "AST_civics_dev.jsonl"
- config_name: AST_geography
  data_files:
  - split: test
    path: "AST_geography_test.jsonl"
  - split: dev
    path: "AST_geography_dev.jsonl"
- config_name: AST_history
  data_files:
  - split: test
    path: "AST_history_test.jsonl"
  - split: dev
    path: "AST_history_dev.jsonl"
- config_name: GSAT_chinese
  data_files:
  - split: test
    path: "GSAT_chinese_test.jsonl"
  - split: dev
    path: "GSAT_chinese_dev.jsonl"
- config_name: GSAT_chemistry
  data_files:
  - split: test
    path: "GSAT_chemistry_test.jsonl"
  - split: dev
    path: "GSAT_chemistry_dev.jsonl"
- config_name: GSAT_biology
  data_files:
  - split: test
    path: "GSAT_biology_test.jsonl"
  - split: dev
    path: "GSAT_biology_dev.jsonl"
- config_name: GSAT_physics
  data_files:
  - split: test
    path: "GSAT_physics_test.jsonl"
  - split: dev
    path: "GSAT_physics_dev.jsonl"
- config_name: GSAT_earth_science
  data_files:
  - split: test
    path: "GSAT_earth_science_test.jsonl"
  - split: dev
    path: "GSAT_earth_science_dev.jsonl"
- config_name: GSAT_mathematics
  data_files:
  - split: test
    path: "GSAT_mathematics_test.jsonl"
  - split: dev
    path: "GSAT_mathematics_dev.jsonl"
- config_name: GSAT_geography
  data_files:
  - split: test
    path: "GSAT_geography_test.jsonl"
  - split: dev
    path: "GSAT_geography_dev.jsonl"
- config_name: GSAT_history
  data_files:
  - split: test
    path: "GSAT_history_test.jsonl"
  - split: dev
    path: "GSAT_history_dev.jsonl"
- config_name: GSAT_civics
  data_files:
  - split: test
    path: "GSAT_civics_test.jsonl"
  - split: dev
    path: "GSAT_civics_dev.jsonl"
- config_name: CAP_mathematics
  data_files:
  - split: test
    path: "CAP_mathematics_test.jsonl"
  - split: dev
    path: "CAP_mathematics_dev.jsonl"
- config_name: CAP_biology
  data_files:
  - split: test
    path: "CAP_biology_test.jsonl"
  - split: dev
    path: "CAP_biology_dev.jsonl"
- config_name: CAP_physics
  data_files:
  - split: test
    path: "CAP_physics_test.jsonl"
  - split: dev
    path: "CAP_physics_dev.jsonl"
- config_name: CAP_chemistry
  data_files:
  - split: test
    path: "CAP_chemistry_test.jsonl"
  - split: dev
    path: "CAP_chemistry_dev.jsonl"
- config_name: CAP_earth_science
  data_files:
  - split: test
    path: "CAP_earth_science_test.jsonl"
  - split: dev
    path: "CAP_earth_science_dev.jsonl"
- config_name: CAP_civics
  data_files:
  - split: test
    path: "CAP_civics_test.jsonl"
  - split: dev
    path: "CAP_civics_dev.jsonl"
- config_name: CAP_history
  data_files:
  - split: test
    path: "CAP_history_test.jsonl"
  - split: dev
    path: "CAP_history_dev.jsonl"
- config_name: CAP_geography
  data_files:
  - split: test
    path: "CAP_geography_test.jsonl"
  - split: dev
    path: "CAP_geography_dev.jsonl"
- config_name: CAP_chinese
  data_files:
  - split: test
    path: "CAP_chinese_test.jsonl"
  - split: dev
    path: "CAP_chinese_dev.jsonl"
- config_name: driving_rule
  data_files:
  - split: test
    path: "driving_rule_test.jsonl"
  - split: dev
    path: "driving_rule_dev.jsonl"
- config_name: basic_traditional_chinese_medicine
  data_files:
  - split: test
    path: "basic_traditional_chinese_medicine_test.jsonl"
  - split: dev
    path: "basic_traditional_chinese_medicine_dev.jsonl"
- config_name: clinical_traditional_chinese_medicine
  data_files:
  - split: test
    path: "clinical_traditional_chinese_medicine_test.jsonl"
  - split: dev
    path: "clinical_traditional_chinese_medicine_dev.jsonl"
- config_name: lawyer_qualification
  data_files:
  - split: test
    path: "lawyer_qualification_test.jsonl"
  - split: dev
    path: "lawyer_qualification_dev.jsonl"
- config_name: nutritionist
  data_files:
  - split: test
    path: "nutritionist_test.jsonl"
  - split: dev
    path: "nutritionist_dev.jsonl"
- config_name: tour_leader
  data_files:
  - split: test
    path: "tour_leader_test.jsonl"
  - split: dev
    path: "tour_leader_dev.jsonl"
- config_name: tour_guide
  data_files:
  - split: test
    path: "tour_guide_test.jsonl"
  - split: dev
    path: "tour_guide_dev.jsonl"
- config_name: taiwan_tourist_resources
  data_files:
  - split: test
    path: "taiwan_tourist_resources_test.jsonl"
  - split: dev
    path: "taiwan_tourist_resources_dev.jsonl"
- config_name: clinical_psychologist
  data_files:
  - split: test
    path: "clinical_psychologist_test.jsonl"
  - split: dev
    path: "clinical_psychologist_dev.jsonl"
- config_name: teacher_qualification
  data_files:
  - split: test
    path: "teacher_qualification_test.jsonl"
  - split: dev
    path: "teacher_qualification_dev.jsonl"
- config_name: accountant
  data_files:
  - split: test
    path: "accountant_test.jsonl"
  - split: dev
    path: "accountant_dev.jsonl"
---

# Dataset Card for TMLU

This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).

## Dataset Details

- AST: Advanced Subjects Test (分科測驗; known as 指考 before year 110)
- GSAT: General Scholastic Ability Test (學科能力測驗)
- CAP: Comprehensive Assessment Program for Junior High School Students (國中教育會考)

### Dataset Description

- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]

### Dataset Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Dataset Structure

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Data Collection and Processing

[More Information Needed]

#### Who are the source data producers?

[More Information Needed]

### Annotations [optional]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

#### Personal and Sensitive Information

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

### Evaluation

#### CAP

##### ChatGPT

Total: 199 / 389 (0.5116)

| Subject       | Accuracy | Correct / Total |
|:------------- | -------- |:--------------- |
| chinese       | 0.5179   | 29 / 56         |
| mathematics   | 0.3273   | 36 / 110        |
| physics       | 0.5000   | 5 / 10          |
| chemistry     | 0.2727   | 6 / 22          |
| biology       | 0.4545   | 10 / 22         |
| earth science | 0.4000   | 4 / 10          |
| geography     | 0.5750   | 23 / 40         |
| history       | 0.8235   | 42 / 51         |
| civics        | 0.6471   | 44 / 68         |

##### GPT-4-turbo

Total: 289 / 389 (0.7429)

| Subject       | Accuracy | Correct / Total |
|:------------- | -------- |:--------------- |
| chinese       | 0.8571   | 48 / 56         |
| mathematics   | 0.4000   | 44 / 110        |
| physics       | 0.7000   | 7 / 10          |
| chemistry     | 0.8182   | 18 / 22         |
| biology       | 0.9091   | 20 / 22         |
| earth science | 0.8000   | 8 / 10          |
| geography     | 0.9000   | 36 / 40         |
| history       | 0.9608   | 49 / 51         |
| civics        | 0.8676   | 59 / 68         |

##### Claude-instant-1

Total: 214 / 389 (0.5501)

| Subject       | Accuracy | Correct / Total |
|:------------- | -------- |:--------------- |
| chinese       | 0.6071   | 34 / 56         |
| mathematics   | 0.2636   | 29 / 110        |
| physics       | 0.4000   | 4 / 10          |
| chemistry     | 0.4545   | 10 / 22         |
| biology       | 0.5909   | 13 / 22         |
| earth science | 0.4000   | 4 / 10          |
| geography     | 0.6500   | 26 / 40         |
| history       | 0.8431   | 43 / 51         |
| civics        | 0.7500   | 51 / 68         |

##### Claude-2

Total: 213 / 389 (0.5476)

| Subject       | Accuracy | Correct / Total |
|:------------- | -------- |:--------------- |
| chinese       | 0.6071   | 34 / 56         |
| mathematics   | 0.3727   | 41 / 110        |
| physics       | 0.6000   | 6 / 10          |
| chemistry     | 0.5000   | 11 / 22         |
| biology       | 0.6364   | 14 / 22         |
| earth science | 0.7000   | 7 / 10          |
| geography     | 0.7000   | 28 / 40         |
| history       | 0.7255   | 37 / 51         |
| civics        | 0.5147   | 35 / 68         |

#### GSAT

##### ChatGPT

Total: 180 / 387 (0.4651)

| Subject       | Accuracy | Correct / Total |
|:------------- | -------- |:--------------- |
| chinese       | 0.3587   | 33 / 92         |
| mathematics   | 0.2083   | 5 / 24          |
| physics       | 0.3684   | 7 / 19          |
| chemistry     | 0.2917   | 7 / 24          |
| biology       | 0.2500   | 4 / 16          |
| earth science | 0.4211   | 8 / 19          |
| geography     | 0.5455   | 24 / 44         |
| history       | 0.6049   | 49 / 81         |
| civics        | 0.6324   | 43 / 68         |

##### GPT-4-turbo

Total: 293 / 387 (0.7571)

| Subject       | Accuracy | Correct / Total |
|:------------- | -------- |:--------------- |
| chinese       | 0.7826   | 72 / 92         |
| mathematics   | 0.2500   | 6 / 24          |
| physics       | 0.7368   | 14 / 19         |
| chemistry     | 0.5417   | 13 / 24         |
| biology       | 0.6875   | 11 / 16         |
| earth science | 0.8421   | 16 / 19         |
| geography     | 0.8864   | 39 / 44         |
| history       | 0.8519   | 69 / 81         |
| civics        | 0.7794   | 53 / 68         |

##### Claude-instant-1

Total: 213 / 387 (0.5504)

| Subject       | Accuracy | Correct / Total |
|:------------- | -------- |:--------------- |
| chinese       | 0.4891   | 45 / 92         |
| mathematics   | 0.2500   | 6 / 24          |
| physics       | 0.3684   | 7 / 19          |
| chemistry     | 0.3333   | 8 / 24          |
| biology       | 0.5625   | 9 / 16          |
| earth science | 0.4211   | 8 / 19          |
| geography     | 0.6818   | 30 / 44         |
| history       | 0.7160   | 58 / 81         |
| civics        | 0.6176   | 42 / 68         |

##### Claude-2

Total: 180 / 387 (0.4651)

| Subject       | Accuracy | Correct / Total |
|:------------- | -------- |:--------------- |
| chinese       | 0.3152   | 29 / 92         |
| mathematics   | 0.2083   | 5 / 24          |
| physics       | 0.3684   | 7 / 19          |
| chemistry     | 0.2917   | 7 / 24          |
| biology       | 0.1875   | 3 / 16          |
| earth science | 0.2632   | 5 / 19          |
| geography     | 0.6818   | 30 / 44         |
| history       | 0.6914   | 56 / 81         |
| civics        | 0.5588   | 38 / 68         |

#### AST

##### ChatGPT

Total: 193 / 405 (0.4765)

| Subject     | Accuracy | Correct / Total |
|:----------- | -------- |:--------------- |
| chinese     | 0.4365   | 55 / 126        |
| mathematics | 0.1500   | 3 / 20          |
| physics     | 0.2368   | 9 / 38          |
| chemistry   | 0.2759   | 8 / 29          |
| biology     | 0.7500   | 27 / 36         |
| geography   | 0.5094   | 27 / 53         |
| history     | 0.7843   | 40 / 51         |
| civics      | 0.4615   | 24 / 52         |

##### GPT-4-turbo

Total: 280 / 405 (0.6914)

| Subject     | Accuracy | Correct / Total |
|:----------- | -------- |:--------------- |
| chinese     | 0.7302   | 92 / 126        |
| mathematics | 0.1500   | 3 / 20          |
| physics     | 0.5263   | 20 / 38         |
| chemistry   | 0.3103   | 9 / 29          |
| biology     | 0.8889   | 32 / 36         |
| geography   | 0.6981   | 37 / 53         |
| history     | 0.9804   | 50 / 51         |
| civics      | 0.7115   | 37 / 52         |

##### Claude-instant-1

Total: 219 / 405 (0.5407)

| Subject     | Accuracy | Correct / Total |
|:----------- | -------- |:--------------- |
| chinese     | 0.5635   | 71 / 126        |
| mathematics | 0.3500   | 7 / 20          |
| physics     | 0.3947   | 15 / 38         |
| chemistry   | 0.1724   | 5 / 29          |
| biology     | 0.6389   | 23 / 36         |
| geography   | 0.6038   | 32 / 53         |
| history     | 0.6863   | 35 / 51         |
| civics      | 0.5962   | 31 / 52         |

##### Claude-2

Total: 185 / 405 (0.4568)

| Subject     | Accuracy | Correct / Total |
|:----------- | -------- |:--------------- |
| chinese     | 0.4365   | 55 / 126        |
| mathematics | 0.0500   | 1 / 20          |
| physics     | 0.3421   | 13 / 38         |
| chemistry   | 0.1034   | 3 / 29          |
| biology     | 0.4444   | 16 / 36         |
| geography   | 0.6604   | 35 / 53         |
| history     | 0.7255   | 37 / 51         |
| civics      | 0.4808   | 25 / 52         |

## Dataset Card Authors [optional]

[More Information Needed]

## Dataset Card Contact

[More Information Needed]
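The `Total` figure reported for each model in the evaluation tables above is consistent with a pooled (micro) accuracy: total correct answers divided by total questions across subjects. The card does not state this explicitly, so the following is a minimal sketch under that assumption, using the CAP / ChatGPT rows. The function names `micro_accuracy` and `macro_accuracy` are illustrative and not part of the dataset.

```python
# (correct, total) per subject, copied from the CAP / ChatGPT table.
cap_chatgpt = {
    "chinese": (29, 56),
    "mathematics": (36, 110),
    "physics": (5, 10),
    "chemistry": (6, 22),
    "biology": (10, 22),
    "earth science": (4, 10),
    "geography": (23, 40),
    "history": (42, 51),
    "civics": (44, 68),
}


def micro_accuracy(rows):
    """Pooled accuracy: total correct answers over total questions."""
    correct = sum(c for c, _ in rows.values())
    total = sum(t for _, t in rows.values())
    return correct, total, correct / total


def macro_accuracy(rows):
    """Unweighted mean of the per-subject accuracies."""
    return sum(c / t for c, t in rows.values()) / len(rows)


correct, total, micro = micro_accuracy(cap_chatgpt)
print(f"Total: {correct} / {total} ({micro:.4f})")  # matches the table: 199 / 389 (0.5116)
print(f"Macro average: {macro_accuracy(cap_chatgpt):.4f}")
```

Because subject sizes differ widely (for example 110 mathematics questions versus 10 physics questions in CAP), the pooled figure is dominated by the larger subjects; the macro average weights every subject equally and can differ from the reported total.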

Provider:

miulab

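Summary of Original Information

Dataset Overview

Dataset Details

Task categories

Question answering
Text classification

Language

Chinese

Dataset name

TMLU

Dataset size

1K<n<10K

Configuration details

The dataset contains multiple configurations, each corresponding to the test and dev data files for one subject. The configurations and their data file paths are listed below: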

AST series

AST_chinese
- Test set: AST_chinese_test.jsonl
- Dev set: AST_chinese_dev.jsonl
AST_mathematics
- Test set: AST_mathematics_test.jsonl
- Dev set: AST_mathematics_dev.jsonl
AST_biology
- Test set: AST_biology_test.jsonl
- Dev set: AST_biology_dev.jsonl
AST_chemistry
- Test set: AST_chemistry_test.jsonl
- Dev set: AST_chemistry_dev.jsonl
AST_physics
- Test set: AST_physics_test.jsonl
- Dev set: AST_physics_dev.jsonl
AST_civics
- Test set: AST_civics_test.jsonl
- Dev set: AST_civics_dev.jsonl
AST_geography
- Test set: AST_geography_test.jsonl
- Dev set: AST_geography_dev.jsonl
AST_history
- Test set: AST_history_test.jsonl
- Dev set: AST_history_dev.jsonl

GSAT series

GSAT_chinese
- Test set: GSAT_chinese_test.jsonl
- Dev set: GSAT_chinese_dev.jsonl
GSAT_chemistry
- Test set: GSAT_chemistry_test.jsonl
- Dev set: GSAT_chemistry_dev.jsonl
GSAT_biology
- Test set: GSAT_biology_test.jsonl
- Dev set: GSAT_biology_dev.jsonl
GSAT_physics
- Test set: GSAT_physics_test.jsonl
- Dev set: GSAT_physics_dev.jsonl
GSAT_earth_science
- Test set: GSAT_earth_science_test.jsonl
- Dev set: GSAT_earth_science_dev.jsonl
GSAT_mathematics
- Test set: GSAT_mathematics_test.jsonl
- Dev set: GSAT_mathematics_dev.jsonl
GSAT_geography
- Test set: GSAT_geography_test.jsonl
- Dev set: GSAT_geography_dev.jsonl
GSAT_history
- Test set: GSAT_history_test.jsonl
- Dev set: GSAT_history_dev.jsonl
GSAT_civics
- Test set: GSAT_civics_test.jsonl
- Dev set: GSAT_civics_dev.jsonl

CAP series

CAP_mathematics
- Test set: CAP_mathematics_test.jsonl
- Dev set: CAP_mathematics_dev.jsonl
CAP_biology
- Test set: CAP_biology_test.jsonl
- Dev set: CAP_biology_dev.jsonl
CAP_physics
- Test set: CAP_physics_test.jsonl
- Dev set: CAP_physics_dev.jsonl
CAP_chemistry
- Test set: CAP_chemistry_test.jsonl
- Dev set: CAP_chemistry_dev.jsonl
CAP_earth_science
- Test set: CAP_earth_science_test.jsonl
- Dev set: CAP_earth_science_dev.jsonl
CAP_civics
- Test set: CAP_civics_test.jsonl
- Dev set: CAP_civics_dev.jsonl
CAP_history
- Test set: CAP_history_test.jsonl
- Dev set: CAP_history_dev.jsonl
CAP_geography
- Test set: CAP_geography_test.jsonl
- Dev set: CAP_geography_dev.jsonl
CAP_chinese
- Test set: CAP_chinese_test.jsonl
- Dev set: CAP_chinese_dev.jsonl

Other series

driving_rule
- Test set: driving_rule_test.jsonl
- Dev set: driving_rule_dev.jsonl
basic_traditional_chinese_medicine
- Test set: basic_traditional_chinese_medicine_test.jsonl
- Dev set: basic_traditional_chinese_medicine_dev.jsonl
clinical_traditional_chinese_medicine
- Test set: clinical_traditional_chinese_medicine_test.jsonl
- Dev set: clinical_traditional_chinese_medicine_dev.jsonl
lawyer_qualification
- Test set: lawyer_qualification_test.jsonl
- Dev set: lawyer_qualification_dev.jsonl
nutritionist
- Test set: nutritionist_test.jsonl
- Dev set: nutritionist_dev.jsonl
tour_leader
- Test set: tour_leader_test.jsonl
- Dev set: tour_leader_dev.jsonl
tour_guide
- Test set: tour_guide_test.jsonl
- Dev set: tour_guide_dev.jsonl
taiwan_tourist_resources
- Test set: taiwan_tourist_resources_test.jsonl
- Dev set: taiwan_tourist_resources_dev.jsonl
clinical_psychologist
- Test set: clinical_psychologist_test.jsonl
- Dev set: clinical_psychologist_dev.jsonl
teacher_qualification
- Test set: teacher_qualification_test.jsonl
- Dev set: teacher_qualification_dev.jsonl
accountant
- Test set: accountant_test.jsonl
- Dev set: accountant_dev.jsonl
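
Every configuration listed above follows the same naming pattern: a config named `X` has the files `X_test.jsonl` and `X_dev.jsonl`. The sketch below captures that pattern and shows one way to load a split with the Hugging Face `datasets` library. The helper names `data_files_for` and `load_tmlu` are hypothetical, and since the card does not document the record fields, the example inspects them rather than assuming any.

```python
def data_files_for(config_name: str) -> dict:
    """Map a config name to its split files, following the naming pattern in the list above."""
    return {
        "test": f"{config_name}_test.jsonl",
        "dev": f"{config_name}_dev.jsonl",
    }


def load_tmlu(config_name: str, split: str = "test"):
    """Load one split of one config from the Hub.

    Requires network access and `pip install datasets`; the import is lazy
    so that `data_files_for` stays usable offline.
    """
    from datasets import load_dataset

    return load_dataset("miulab/tmlu", config_name, split=split)


print(data_files_for("AST_chinese"))
```

With network access, `ds = load_tmlu("GSAT_physics", split="dev")` followed by `print(ds.column_names)` would show which fields each record actually carries.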
