aisc-team-c2/MMedBench

Name: aisc-team-c2/MMedBench
Creator: aisc-team-c2
Published: 2024-03-05 01:44:42
License: No description available

Hugging Face · Updated 2024-03-05 · Indexed 2024-06-22

Download link:

https://hf-mirror.com/datasets/aisc-team-c2/MMedBench

Resource description:

---
license: cc-by-4.0
language:
- en
- zh
- ja
- fr
- ru
- es
tags:
- medical
task_categories:
- question-answering
configs:
- config_name: english
  data_files: "English.jsonl"
- config_name: french
  data_files: "French.jsonl"
---

*This is a dataset repository made for the AISC class at Harvard Medical School. Please find the original dataset repository here: https://huggingface.co/datasets/Henrychur/MMedBench*

# MMedBench

[💻Github Repo](https://github.com/MAGIC-AI4Med/MMedLM) [🖨️arXiv Paper](https://arxiv.org/abs/2402.13963)

The official benchmark for "Towards Building Multilingual Language Model for Medicine".

## Introduction

This repo contains MMedBench, a comprehensive multilingual medical benchmark comprising 45,048 QA pairs for training and 8,518 QA pairs for testing. Each sample includes a question, options, the correct answer, and a reference explanation for the selection of the correct answer.

To access the data, please download MMedBench.zip. Upon extracting the file, you will find two folders named Train and Test. Each folder contains six .jsonl files, each named after its respective language. Each line in these files represents a sample, with the following attributes:

|Key |Value Type |Description |
|------------------|-------------------|-----------------------------------------|
|question |String | The question text |
|options |Dict | A dict whose keys are the option indices (A, B, C, D, E) and whose values are the option strings |
|answer_idx |String | The indices of the correct answers, separated by ',' |
|rationale |String | An explanation for the selection of the correct answer |
|human_checked |Bool | Whether the rationale has been manually checked |
|human_check_passed |Bool | Whether the rationale has passed the manual check |

Our [GitHub](https://github.com/MAGIC-AI4Med/MMedLM) provides the code for finetuning on the trainset of MMedBench. Check it out for more details.

## News

[2024.2.21] Our pre-print paper is released on arXiv. Dive into our findings [here](https://arxiv.org/abs/2402.13963).

[2024.2.20] We release [MMedLM](https://huggingface.co/Henrychur/MMedLM) and [MMedLM 2](https://huggingface.co/Henrychur/MMedLM2). With auto-regressive continued training on MMedC, these models achieve superior performance compared to all other open-source models, even rivaling GPT-4 on MMedBench.

[2023.2.20] We release [MMedC](https://huggingface.co/datasets/Henrychur/MMedC), a multilingual medical corpus containing 25.5B tokens.

[2023.2.20] We release [MMedBench](https://huggingface.co/datasets/Henrychur/MMedBench), a new multilingual medical multi-choice question-answering benchmark with rationale. Check out the leaderboard [here](https://henrychur.github.io/MultilingualMedQA/).

## Evaluation on MMedBench

The further pretrained MMedLM 2 showcases its strong performance in the medical domain across different languages.

| Method | Size | Year | MMedC | MMedBench | English | Chinese | Japanese | French | Russian | Spanish | Avg. |
|------------------|------|---------|-----------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| GPT-3.5 | - | 2022.12 | ✗ | ✗ | 56.88 | 52.29 | 34.63 | 32.48 | 66.36 | 66.06 | 51.47 |
| GPT-4 | - | 2023.3 | ✗ | ✗ | 78.00 | 75.07 | 72.91 | 56.59 | 83.62 | 85.67 | 74.27 |
| Gemini-1.0 pro | - | 2024.1 | ✗ | ✗ | 53.73 | 60.19 | 44.22 | 29.90 | 73.44 | 69.69 | 55.20 |
| BLOOMZ | 7B | 2023.5 | ✗ | trainset | 43.28 | 58.06 | 32.66 | 26.37 | 62.89 | 47.34 | 45.10 |
| InternLM | 7B | 2023.7 | ✗ | trainset | 44.07 | 64.62 | 37.19 | 24.92 | 58.20 | 44.97 | 45.67 |
| Llama 2 | 7B | 2023.7 | ✗ | trainset | 43.36 | 50.29 | 25.13 | 20.90 | 66.80 | 47.10 | 42.26 |
| MedAlpaca | 7B | 2023.3 | ✗ | trainset | 46.74 | 44.80 | 29.64 | 21.06 | 59.38 | 45.00 | 41.11 |
| ChatDoctor | 7B | 2023.4 | ✗ | trainset | 43.52 | 43.26 | 25.63 | 18.81 | 62.50 | 43.44 | 39.53 |
| PMC-LLaMA | 7B | 2023.4 | ✗ | trainset | 47.53 | 42.44 | 24.12 | 20.74 | 62.11 | 43.29 | 40.04 |
| Mistral | 7B | 2023.10 | ✗ | trainset | 61.74 | 71.10 | 44.72 | 48.71 | 74.22 | 63.86 | 60.73 |
| InternLM 2 | 7B | 2024.2 | ✗ | trainset | 57.27 | 77.55 | 47.74 | 41.00 | 68.36 | 59.59 | 58.59 |
| MMedLM (Ours) | 7B | - | ✗ | trainset | 49.88 | 70.49 | 46.23 | 36.66 | 72.27 | 54.52 | 55.01 |
| MMedLM 2 (Ours) | 7B | - | ✗ | trainset | 61.74 | 80.01 | 61.81 | 52.09 | 80.47 | 67.65 | 67.30 |

- GPT and Gemini are evaluated under the zero-shot setting through their APIs.
- Open-source models are first trained on the trainset of MMedBench before evaluation.

## Contact

If you have any questions, please feel free to contact qiupengcheng@pjlab.org.cn.

## Citation

```
@misc{qiu2024building,
      title={Towards Building Multilingual Language Model for Medicine},
      author={Pengcheng Qiu and Chaoyi Wu and Xiaoman Zhang and Weixiong Lin and Haicheng Wang and Ya Zhang and Yanfeng Wang and Weidi Xie},
      year={2024},
      eprint={2402.13963},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

Provided by:

aisc-team-c2

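Summary of original information

MMedBench dataset overview

Basic dataset information

License: cc-by-4.0
Languages:
- English (en)
- Chinese (zh)
- Japanese (ja)
- French (fr)
- Russian (ru)
- Spanish (es)
Tags: medical
Task category: question-answering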

Dataset configurations

Config name: english
- Data file: English.jsonl
Config name: french
- Data file: French.jsonl
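
The two configurations above can be selected by name when loading the repository. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the Hub is reachable; the helper names `data_file_for` and `load_config` are hypothetical and not part of the dataset repository.

```python
# Config names and data files exactly as listed in the dataset card.
CONFIG_FILES = {"english": "English.jsonl", "french": "French.jsonl"}
REPO_ID = "aisc-team-c2/MMedBench"


def data_file_for(config_name: str) -> str:
    """Return the data file declared for a config; raise for unknown names."""
    key = config_name.lower()
    if key not in CONFIG_FILES:
        known = ", ".join(sorted(CONFIG_FILES))
        raise ValueError(f"unknown config {config_name!r}; expected one of: {known}")
    return CONFIG_FILES[key]


def load_config(config_name: str):
    """Load one config from the Hub (hypothetical helper; needs network access)."""
    # Imported lazily so data_file_for() stays usable without `datasets` installed.
    from datasets import load_dataset

    data_file_for(config_name)  # validate the name before touching the network
    # Assumption: a config with one data file and no named split is exposed as "train".
    return load_dataset(REPO_ID, config_name.lower(), split="train")
```

Calling `load_config("english")` would then return the rows of English.jsonl. Only two of the six languages are exposed as configs in this repository, so the other languages would have to come from the original MMedBench.zip.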

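Dataset details

Dataset name: MMedBench
Dataset type: multilingual medical question-answering benchmark
Data size:
- Training set: 45,048 QA pairs
- Test set: 8,518 QA pairs
Data format:
- Two folders: Train and Test
- Each folder contains six .jsonl files, one per language
- Each line is one sample with the following attributes:
  - question: the question string
  - options: a dict of options, keyed by the indices A, B, C, D, E, with the option strings as values
  - answer_idx: a string of correct answer indices, separated by commas
  - rationale: a string explaining the selection of the correct answer
  - human_checked: whether the rationale has been manually checked
  - human_check_passed: whether the rationale passed the manual check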
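
To make the per-line format concrete, here is a small sketch that parses one JSONL line into a typed record and splits `answer_idx` on commas. The example line is made up for illustration and is not taken from the dataset; treating the two human-check flags as optional is also an assumption, since the source does not say whether every file carries them.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sample:
    question: str
    options: Dict[str, str]
    answer_idx: List[str]  # parsed from the comma-separated string
    rationale: str
    human_checked: Optional[bool]
    human_check_passed: Optional[bool]


def parse_sample(line: str) -> Sample:
    """Parse one JSONL line and check that every answer index names an option."""
    record = json.loads(line)
    answers = [idx.strip() for idx in record["answer_idx"].split(",") if idx.strip()]
    unknown = [idx for idx in answers if idx not in record["options"]]
    if unknown:
        raise ValueError(f"answer_idx refers to missing options: {unknown}")
    return Sample(
        question=record["question"],
        options=record["options"],
        answer_idx=answers,
        rationale=record["rationale"],
        # Assumption: the human-check flags may be absent, so default to None.
        human_checked=record.get("human_checked"),
        human_check_passed=record.get("human_check_passed"),
    )


# Made-up example line in the documented shape (not real dataset content).
EXAMPLE_LINE = json.dumps({
    "question": "Deficiency of which vitamin causes scurvy?",
    "options": {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D", "D": "Vitamin K"},
    "answer_idx": "B",
    "rationale": "Scurvy results from a lack of vitamin C.",
    "human_checked": False,
    "human_check_passed": False,
})

sample = parse_sample(EXAMPLE_LINE)
print(sample.answer_idx, sample.options[sample.answer_idx[0]])
```

Reading a whole file is then a matter of calling `parse_sample` on each non-empty line.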

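Dataset evaluation

Evaluation method:
- Open-source models are fine-tuned on the MMedBench training set before evaluation
- Results are reported per language: English, Chinese, Japanese, French, Russian, and Spanish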
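
Per-language scoring of multiple-choice predictions can be sketched as below. This is an illustration under stated assumptions, not the authors' evaluation code: a prediction is counted as correct only if its set of answer indices exactly matches the gold set, and the pooled score weights every sample equally. The source does not state how its "Avg." column is aggregated, so `pooled_accuracy` should not be read as reproducing it.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Tuple


def answer_set(answer_idx: str) -> FrozenSet[str]:
    """Turn a comma-separated index string such as 'A,C' into a set."""
    return frozenset(p.strip().upper() for p in answer_idx.split(",") if p.strip())


def accuracy_by_language(results: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """results yields (language, gold answer_idx, predicted answer_idx) triples."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for language, gold, predicted in results:
        total[language] += 1
        # Assumption: exact match on the full set of indices counts as correct.
        if answer_set(gold) == answer_set(predicted):
            correct[language] += 1
    return {lang: 100.0 * correct[lang] / total[lang] for lang in total}


def pooled_accuracy(results: Iterable[Tuple[str, str, str]]) -> float:
    """Accuracy over all samples regardless of language (sample-weighted)."""
    rows = list(results)
    hits = sum(answer_set(gold) == answer_set(pred) for _, gold, pred in rows)
    return 100.0 * hits / len(rows)


# Toy predictions for illustration only.
toy = [
    ("English", "A", "A"),
    ("English", "B", "C"),
    ("French", "A,B", "B,A"),
    ("French", "C", "C"),
]
print(accuracy_by_language(toy), pooled_accuracy(toy))
```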

Contact information

Contact: qiupengcheng@pjlab.org.cn

Citation

@misc{qiu2024building, title={Towards Building Multilingual Language Model for Medicine}, author={Pengcheng Qiu and Chaoyi Wu and Xiaoman Zhang and Weixiong Lin and Haicheng Wang and Ya Zhang and Yanfeng Wang and Weidi Xie}, year={2024}, eprint={2402.13963}, archivePrefix={arXiv}, primaryClass={cs.CL} }
