
winder-hybrids/MedicalTextbook_QA

Hugging Face | Updated: 2024-02-07 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/winder-hybrids/MedicalTextbook_QA
Resource description:
---
dataset_info:
- config_name: Anatomy_Gray
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1898508
    num_examples: 500
  download_size: 152583
  dataset_size: 1898508
- config_name: Biochemistry_Lippincott
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1657461
    num_examples: 500
  download_size: 161466
  dataset_size: 1657461
- config_name: Cell_Biology_Alberts
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1911167
    num_examples: 500
  download_size: 178902
  dataset_size: 1911167
- config_name: Gynecology_Novak
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1716835
    num_examples: 500
  download_size: 166726
  dataset_size: 1716835
- config_name: Histology_Ross
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1805108
    num_examples: 500
  download_size: 161573
  dataset_size: 1805108
- config_name: Immunology_Janeway
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1662680
    num_examples: 500
  download_size: 163548
  dataset_size: 1662680
- config_name: Neurology_Adams
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1891656
    num_examples: 500
  download_size: 188245
  dataset_size: 1891656
- config_name: Obstentrics_Williams
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1597198
    num_examples: 500
  download_size: 169259
  dataset_size: 1597198
- config_name: Pathology_Robbins
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1749146
    num_examples: 500
  download_size: 175037
  dataset_size: 1749146
- config_name: Pediatrics_Nelson
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1885412
    num_examples: 500
  download_size: 180188
  dataset_size: 1885412
- config_name: Pharmacology_Katzung
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1748810
    num_examples: 500
  download_size: 172568
  dataset_size: 1748810
- config_name: Physiology_Levy
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1756829
    num_examples: 500
  download_size: 167776
  dataset_size: 1756829
- config_name: Psichiatry_DSM-5
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  - name: original_text
    dtype: string
  splits:
  - name: test
    num_bytes: 1976522
    num_examples: 500
  download_size: 171016
  dataset_size: 1976522
configs:
- config_name: Anatomy_Gray
  data_files:
  - split: test
    path: Anatomy_Gray/test-*
- config_name: Biochemistry_Lippincott
  data_files:
  - split: test
    path: Biochemistry_Lippincott/test-*
- config_name: Cell_Biology_Alberts
  data_files:
  - split: test
    path: Cell_Biology_Alberts/test-*
- config_name: Gynecology_Novak
  data_files:
  - split: test
    path: Gynecology_Novak/test-*
- config_name: Histology_Ross
  data_files:
  - split: test
    path: Histology_Ross/test-*
- config_name: Immunology_Janeway
  data_files:
  - split: test
    path: Immunology_Janeway/test-*
- config_name: Neurology_Adams
  data_files:
  - split: test
    path: Neurology_Adams/test-*
- config_name: Obstentrics_Williams
  data_files:
  - split: test
    path: Obstentrics_Williams/test-*
- config_name: Pathology_Robbins
  data_files:
  - split: test
    path: Pathology_Robbins/test-*
- config_name: Pediatrics_Nelson
  data_files:
  - split: test
    path: Pediatrics_Nelson/test-*
- config_name: Pharmacology_Katzung
  data_files:
  - split: test
    path: Pharmacology_Katzung/test-*
- config_name: Physiology_Levy
  data_files:
  - split: test
    path: Physiology_Levy/test-*
- config_name: Psichiatry_DSM-5
  data_files:
  - split: test
    path: Psichiatry_DSM-5/test-*
---

# Medical textbook question answering

This corpus contains multiple-choice quiz questions for 13 commonly used medical textbooks. The questions are designed to examine understanding of the main concepts in the textbooks. The QA data is used to evaluate knowledge learning of language models in the following paper:

- **Paper:** [Conditional language learning with context](link pending)

### Data Splits

- subjects: anatomy, biochemistry, cell biology, gynecology, histology, immunology, neurology, obstetrics, pathology, pediatrics, pharmacology, physiology, psychiatry
- 500 questions for each subject

## Dataset Creation

Questions and answers are generated by GPT-4 given excerpts from the textbooks. Refer to the paper for the instructions used to generate the questions.

### Citation Information

```
pending
```
Provider:
winder-hybrids
Summary of original information

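Medical Textbook QA Dataset

Dataset overview

This dataset contains multiple-choice questions for 13 commonly used medical textbooks, designed to test understanding of the main concepts in each textbook. Each subject has 500 questions.

Dataset configurations

The dataset contains the following configurations: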

  • Anatomy_Gray
  • Biochemistry_Lippincott
  • Cell_Biology_Alberts
  • Gynecology_Novak
  • Histology_Ross
  • Immunology_Janeway
  • Neurology_Adams
  • Obstentrics_Williams
  • Pathology_Robbins
  • Pediatrics_Nelson
  • Pharmacology_Katzung
  • Physiology_Levy
  • Psichiatry_DSM-5
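
Each configuration name above can be passed to the Hugging Face `datasets` library to select one textbook. The following is a minimal sketch, not the authors' own loading code: it assumes the `datasets` package is installed and the Hub is reachable, and the helper name `load_subject` is hypothetical. The configuration names are copied verbatim, including the spellings `Obstentrics` and `Psichiatry`, because they are the dataset's identifiers.

```python
# Config names copied verbatim from the list above. The spellings
# "Obstentrics" and "Psichiatry" are the dataset's own identifiers.
CONFIG_NAMES = [
    "Anatomy_Gray",
    "Biochemistry_Lippincott",
    "Cell_Biology_Alberts",
    "Gynecology_Novak",
    "Histology_Ross",
    "Immunology_Janeway",
    "Neurology_Adams",
    "Obstentrics_Williams",
    "Pathology_Robbins",
    "Pediatrics_Nelson",
    "Pharmacology_Katzung",
    "Physiology_Levy",
    "Psichiatry_DSM-5",
]

REPO_ID = "winder-hybrids/MedicalTextbook_QA"


def load_subject(config_name: str):
    """Load the test split of one textbook configuration (hypothetical helper)."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"Unknown configuration: {config_name!r}")
    # Imported lazily so the name check works even without the library installed.
    from datasets import load_dataset

    # Every configuration exposes a single split named "test".
    return load_dataset(REPO_ID, config_name, split="test")
```

Calling `load_subject("Anatomy_Gray")` would download that configuration; a correctly spelled but unlisted name such as `"Obstetrics_Williams"` is rejected before any network access.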

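Data features

Each configuration contains the following features:

  • question: the question text, of type string.
  • choices: the answer options, a sequence of strings.
  • answer: the correct answer, of type integer (int64).
  • original_text: the original text, of type string.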
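
The four fields listed above are enough to check a record, render it as a prompt, and score predictions. The sketch below illustrates this with the standard library only. It rests on one assumption that the source does not confirm: that `answer` is a 0-based index into `choices`. The function names and the letter labels (A, B, C, ...) are hypothetical choices made for illustration.

```python
from typing import Any, Dict, List


def is_valid_record(record: Dict[str, Any]) -> bool:
    """Check that a record has the four documented fields with the right types."""
    choices = record.get("choices")
    answer = record.get("answer")
    return (
        isinstance(record.get("question"), str)
        and isinstance(record.get("original_text"), str)
        and isinstance(choices, list)
        and all(isinstance(c, str) for c in choices)
        and isinstance(answer, int)
        and not isinstance(answer, bool)
        # ASSUMPTION: `answer` is a 0-based index into `choices`.
        and 0 <= answer < len(choices)
    )


def format_prompt(record: Dict[str, Any]) -> str:
    """Render the question followed by lettered options, one per line."""
    lines = [record["question"]]
    for i, choice in enumerate(record["choices"]):
        lines.append(f"{chr(ord('A') + i)}. {choice}")
    return "\n".join(lines)


def accuracy(records: List[Dict[str, Any]], predictions: List[int]) -> float:
    """Fraction of predicted indices that equal the record's `answer`."""
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")
    if not records:
        return 0.0
    correct = sum(1 for r, p in zip(records, predictions) if r["answer"] == p)
    return correct / len(records)
```

If the index turns out to be 1-based, only the range check in `is_valid_record` and the comparison convention used for predictions would need to change.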

Data splits

The test split of each configuration contains 500 examples. The size (num_bytes) of each split is as follows:

  • Anatomy_Gray: 1898508 bytes
  • Biochemistry_Lippincott: 1657461 bytes
  • Cell_Biology_Alberts: 1911167 bytes
  • Gynecology_Novak: 1716835 bytes
  • Histology_Ross: 1805108 bytes
  • Immunology_Janeway: 1662680 bytes
  • Neurology_Adams: 1891656 bytes
  • Obstentrics_Williams: 1597198 bytes
  • Pathology_Robbins: 1749146 bytes
  • Pediatrics_Nelson: 1885412 bytes
  • Pharmacology_Katzung: 1748810 bytes
  • Physiology_Levy: 1756829 bytes
  • Psichiatry_DSM-5: 1976522 bytes
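
The per-configuration sizes above can be aggregated with simple arithmetic. This short sketch uses only the numbers listed here plus the stated 500 examples per configuration; the variable names are illustrative.

```python
# Split sizes in bytes, copied from the list above.
NUM_BYTES = {
    "Anatomy_Gray": 1898508,
    "Biochemistry_Lippincott": 1657461,
    "Cell_Biology_Alberts": 1911167,
    "Gynecology_Novak": 1716835,
    "Histology_Ross": 1805108,
    "Immunology_Janeway": 1662680,
    "Neurology_Adams": 1891656,
    "Obstentrics_Williams": 1597198,
    "Pathology_Robbins": 1749146,
    "Pediatrics_Nelson": 1885412,
    "Pharmacology_Katzung": 1748810,
    "Physiology_Levy": 1756829,
    "Psichiatry_DSM-5": 1976522,
}
EXAMPLES_PER_CONFIG = 500

total_bytes = sum(NUM_BYTES.values())
total_examples = EXAMPLES_PER_CONFIG * len(NUM_BYTES)
avg_bytes_per_example = total_bytes / total_examples
largest = max(NUM_BYTES, key=NUM_BYTES.get)
smallest = min(NUM_BYTES, key=NUM_BYTES.get)

print(f"total: {total_bytes} bytes over {total_examples} examples")
print(f"average: {avg_bytes_per_example:.1f} bytes per example")
print(f"largest: {largest}, smallest: {smallest}")
```

Across all 13 configurations this comes to 23257332 bytes for 6500 examples, or roughly 3.6 KB per example.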

Data file paths

The test-split data file path for each configuration is as follows:

  • Anatomy_Gray: Anatomy_Gray/test-*
  • Biochemistry_Lippincott: Biochemistry_Lippincott/test-*
  • Cell_Biology_Alberts: Cell_Biology_Alberts/test-*
  • Gynecology_Novak: Gynecology_Novak/test-*
  • Histology_Ross: Histology_Ross/test-*
  • Immunology_Janeway: Immunology_Janeway/test-*
  • Neurology_Adams: Neurology_Adams/test-*
  • Obstentrics_Williams: Obstentrics_Williams/test-*
  • Pathology_Robbins: Pathology_Robbins/test-*
  • Pediatrics_Nelson: Pediatrics_Nelson/test-*
  • Pharmacology_Katzung: Pharmacology_Katzung/test-*
  • Physiology_Levy: Physiology_Levy/test-*
  • Psichiatry_DSM-5: Psichiatry_DSM-5/test-*
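
Each path above is a glob pattern: `<config_name>/test-*` selects every file in the configuration's folder whose name starts with `test-`. The sketch below mimics that matching with Python's `fnmatch` module. It is an illustration only: the Hub's own pattern resolution may differ in detail, the helper name `files_for_config` is hypothetical, and the shard filenames in the usage note are made up for the example rather than taken from the repository.

```python
from fnmatch import fnmatchcase
from typing import List


def files_for_config(config_name: str, repo_files: List[str]) -> List[str]:
    """Return the files matching the '<config_name>/test-*' pattern."""
    pattern = f"{config_name}/test-*"
    # fnmatchcase compares case-sensitively on every platform.
    return [path for path in repo_files if fnmatchcase(path, pattern)]
```

For a hypothetical listing such as `["Anatomy_Gray/test-00000-of-00001.parquet", "README.md"]`, `files_for_config("Anatomy_Gray", ...)` keeps only the first entry.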