winder-hybrids/MedicalTextbook_QA
Source: Hugging Face. Updated 2024-02-07; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/winder-hybrids/MedicalTextbook_QA
Resource description:
---
dataset_info:
- config_name: Anatomy_Gray
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1898508
    num_examples: 500
  download_size: 152583
  dataset_size: 1898508
- config_name: Biochemistry_Lippincott
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1657461
    num_examples: 500
  download_size: 161466
  dataset_size: 1657461
- config_name: Cell_Biology_Alberts
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1911167
    num_examples: 500
  download_size: 178902
  dataset_size: 1911167
- config_name: Gynecology_Novak
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1716835
    num_examples: 500
  download_size: 166726
  dataset_size: 1716835
- config_name: Histology_Ross
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1805108
    num_examples: 500
  download_size: 161573
  dataset_size: 1805108
- config_name: Immunology_Janeway
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1662680
    num_examples: 500
  download_size: 163548
  dataset_size: 1662680
- config_name: Neurology_Adams
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1891656
    num_examples: 500
  download_size: 188245
  dataset_size: 1891656
- config_name: Obstentrics_Williams
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1597198
    num_examples: 500
  download_size: 169259
  dataset_size: 1597198
- config_name: Pathology_Robbins
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1749146
    num_examples: 500
  download_size: 175037
  dataset_size: 1749146
- config_name: Pediatrics_Nelson
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1885412
    num_examples: 500
  download_size: 180188
  dataset_size: 1885412
- config_name: Pharmacology_Katzung
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1748810
    num_examples: 500
  download_size: 172568
  dataset_size: 1748810
- config_name: Physiology_Levy
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1756829
    num_examples: 500
  download_size: 167776
  dataset_size: 1756829
- config_name: Psichiatry_DSM-5
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1976522
    num_examples: 500
  download_size: 171016
  dataset_size: 1976522
configs:
- config_name: Anatomy_Gray
  data_files:
  - split: test
    path: Anatomy_Gray/test-*
- config_name: Biochemistry_Lippincott
  data_files:
  - split: test
    path: Biochemistry_Lippincott/test-*
- config_name: Cell_Biology_Alberts
  data_files:
  - split: test
    path: Cell_Biology_Alberts/test-*
- config_name: Gynecology_Novak
  data_files:
  - split: test
    path: Gynecology_Novak/test-*
- config_name: Histology_Ross
  data_files:
  - split: test
    path: Histology_Ross/test-*
- config_name: Immunology_Janeway
  data_files:
  - split: test
    path: Immunology_Janeway/test-*
- config_name: Neurology_Adams
  data_files:
  - split: test
    path: Neurology_Adams/test-*
- config_name: Obstentrics_Williams
  data_files:
  - split: test
    path: Obstentrics_Williams/test-*
- config_name: Pathology_Robbins
  data_files:
  - split: test
    path: Pathology_Robbins/test-*
- config_name: Pediatrics_Nelson
  data_files:
  - split: test
    path: Pediatrics_Nelson/test-*
- config_name: Pharmacology_Katzung
  data_files:
  - split: test
    path: Pharmacology_Katzung/test-*
- config_name: Physiology_Levy
  data_files:
  - split: test
    path: Physiology_Levy/test-*
- config_name: Psichiatry_DSM-5
  data_files:
  - split: test
    path: Psichiatry_DSM-5/test-*
---
# Medical textbook question answering
This corpus contains multiple-choice quiz questions drawn from 13 commonly used medical textbooks. The questions are designed to test understanding of the main concepts in each textbook.
The QA data is used to evaluate how well language models learn knowledge in the following paper:
- **Paper:** [Conditional language learning with context](link pending)
### Data Splits
- subjects: anatomy, biochemistry, cell biology, gynecology, histology, immunology, neurology, obstetrics, pathology, pediatrics, pharmacology, physiology, psychiatry
- 500 questions for each subject
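
A minimal sketch of how one subject could be loaded, assuming the Hugging Face `datasets` library is installed and the repository is reachable under the ID shown at the top of this page. `load_subject`, `CONFIGS`, and `REPO_ID` are hypothetical names introduced here for illustration. The config names are copied verbatim from the metadata, including their original spellings.

```python
# Config names copied verbatim from the dataset metadata (spellings included).
CONFIGS = [
    "Anatomy_Gray",
    "Biochemistry_Lippincott",
    "Cell_Biology_Alberts",
    "Gynecology_Novak",
    "Histology_Ross",
    "Immunology_Janeway",
    "Neurology_Adams",
    "Obstentrics_Williams",
    "Pathology_Robbins",
    "Pediatrics_Nelson",
    "Pharmacology_Katzung",
    "Physiology_Levy",
    "Psichiatry_DSM-5",
]

REPO_ID = "winder-hybrids/MedicalTextbook_QA"


def load_subject(config_name: str):
    """Load the single `test` split of one subject (hypothetical helper)."""
    if config_name not in CONFIGS:
        raise ValueError(f"unknown config: {config_name!r}")
    # Imported lazily so the list above is usable without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="test")
```

Calling `load_subject("Anatomy_Gray")` would then download and return that subject's test split; each config has to be requested by name because the repository defines no default configuration in the metadata shown here.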
## Dataset Creation
Questions and answers were generated by GPT-4 from excerpts of the textbooks. Refer to the paper for the instructions used to generate the questions.
### Citation Information
```
pending
```
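Provider:
winder-hybrids
Summary of original information
Medical Textbook Question Answering Dataset
Dataset overview
The dataset contains multiple-choice questions from 13 commonly used medical textbooks, designed to test understanding of the main concepts in the textbooks. Each subject contains 500 questions.
Dataset configurations
The dataset contains the following configurations: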
- Anatomy_Gray
- Biochemistry_Lippincott
- Cell_Biology_Alberts
- Gynecology_Novak
- Histology_Ross
- Immunology_Janeway
- Neurology_Adams
- Obstentrics_Williams
- Pathology_Robbins
- Pediatrics_Nelson
- Pharmacology_Katzung
- Physiology_Levy
- Psichiatry_DSM-5
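Data features
Each configuration contains the following features:
- question: the question, of type string.
- choices: the answer options, a sequence of strings.
- answer: the answer, of type integer (int64).
- original_text: the original text, of type string.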
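
To make the record layout concrete, here is a minimal sketch that renders one record as a lettered prompt and scores predictions by accuracy. It assumes that `answer` is a zero-based index into `choices`; the metadata only states that it is an `int64`, so this should be checked against the data. `format_prompt` and `accuracy` are hypothetical helpers, and the sample record is invented for illustration rather than taken from the dataset.

```python
from string import ascii_uppercase


def format_prompt(record: dict) -> str:
    """Render one record as a lettered multiple-choice prompt."""
    lines = [record["question"]]
    for letter, choice in zip(ascii_uppercase, record["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(records: list, predictions: list) -> float:
    """Fraction of predicted indices that equal the `answer` field."""
    if len(records) != len(predictions):
        raise ValueError("records and predictions differ in length")
    if not records:
        return 0.0
    correct = sum(r["answer"] == p for r, p in zip(records, predictions))
    return correct / len(records)


# Invented record for illustration only; it is NOT taken from the dataset.
# ASSUMPTION: `answer` is a zero-based index into `choices`.
sample = {
    "question": "Which chamber of the heart pumps blood into the aorta?",
    "choices": ["Right atrium", "Right ventricle", "Left atrium", "Left ventricle"],
    "answer": 3,
    "original_text": "(textbook excerpt would go here)",
}

prompt = format_prompt(sample)
```

The `original_text` field is left out of the prompt in this sketch, so the model is asked the question without the source excerpt; including it would turn the task into reading comprehension instead.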
Data splits
The test split of each configuration contains 500 examples. The size of each split is as follows:
- Anatomy_Gray: 1898508 bytes
- Biochemistry_Lippincott: 1657461 bytes
- Cell_Biology_Alberts: 1911167 bytes
- Gynecology_Novak: 1716835 bytes
- Histology_Ross: 1805108 bytes
- Immunology_Janeway: 1662680 bytes
- Neurology_Adams: 1891656 bytes
- Obstentrics_Williams: 1597198 bytes
- Pathology_Robbins: 1749146 bytes
- Pediatrics_Nelson: 1885412 bytes
- Pharmacology_Katzung: 1748810 bytes
- Physiology_Levy: 1756829 bytes
- Psichiatry_DSM-5: 1976522 bytes
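
The per-config byte counts listed above can be combined with a few lines of arithmetic. The sketch below totals them and derives the mean size per example; the variable names are hypothetical, and the figures are copied from the list without being re-measured.

```python
# Size in bytes of each test split, copied from the list above.
NUM_BYTES = {
    "Anatomy_Gray": 1898508,
    "Biochemistry_Lippincott": 1657461,
    "Cell_Biology_Alberts": 1911167,
    "Gynecology_Novak": 1716835,
    "Histology_Ross": 1805108,
    "Immunology_Janeway": 1662680,
    "Neurology_Adams": 1891656,
    "Obstentrics_Williams": 1597198,
    "Pathology_Robbins": 1749146,
    "Pediatrics_Nelson": 1885412,
    "Pharmacology_Katzung": 1748810,
    "Physiology_Levy": 1756829,
    "Psichiatry_DSM-5": 1976522,
}
EXAMPLES_PER_CONFIG = 500  # stated for every config

total_bytes = sum(NUM_BYTES.values())
total_examples = EXAMPLES_PER_CONFIG * len(NUM_BYTES)
mean_bytes_per_example = total_bytes / total_examples

# Configs with the largest and smallest test splits by byte count.
largest = max(NUM_BYTES, key=NUM_BYTES.get)
smallest = min(NUM_BYTES, key=NUM_BYTES.get)

print(f"{total_examples} examples, {total_bytes} bytes in total")
print(f"mean size per example: {mean_bytes_per_example:.0f} bytes")
print(f"largest: {largest}, smallest: {smallest}")
```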
Data file paths
The data file path pattern for the test split of each configuration is as follows:
- Anatomy_Gray: Anatomy_Gray/test-*
- Biochemistry_Lippincott: Biochemistry_Lippincott/test-*
- Cell_Biology_Alberts: Cell_Biology_Alberts/test-*
- Gynecology_Novak: Gynecology_Novak/test-*
- Histology_Ross: Histology_Ross/test-*
- Immunology_Janeway: Immunology_Janeway/test-*
- Neurology_Adams: Neurology_Adams/test-*
- Obstentrics_Williams: Obstentrics_Williams/test-*
- Pathology_Robbins: Pathology_Robbins/test-*
- Pediatrics_Nelson: Pediatrics_Nelson/test-*
- Pharmacology_Katzung: Pharmacology_Katzung/test-*
- Physiology_Levy: Physiology_Levy/test-*
- Psichiatry_DSM-5: Psichiatry_DSM-5/test-*
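
Each path above is a glob pattern rather than a single file name. As a rough illustration of what such a pattern selects, the sketch below uses Python's standard `fnmatch` module, which only approximates the matching performed by the hosting library. The candidate file names are hypothetical; the actual shard names are not given in the source.

```python
from fnmatch import fnmatchcase

pattern = "Anatomy_Gray/test-*"

# Hypothetical file names, invented for illustration only.
candidates = [
    "Anatomy_Gray/test-00000-of-00001.parquet",
    "Anatomy_Gray/train-00000-of-00001.parquet",
    "Histology_Ross/test-00000-of-00001.parquet",
]

# Keep only the names that match the config's test-split pattern.
matches = [name for name in candidates if fnmatchcase(name, pattern)]
print(matches)
```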



