Medical-imaging-data-sets

Source: GitHub. Updated: 2024-08-23. Indexed: 2024-09-19.
Download link:
https://github.com/Dedapic/Medical-imaging-data-sets
Resource summary:
This resource compiles various types of medical imaging datasets, each with an introduction and a download link.
Created:
2024-08-22
Original information summary

Medical-imaging-data-sets

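Main categories

  • I. QA and language datasets
  • II. Medical image datasets
  • III. Captioning datasets
  • IV. Collections that combine multiple dataset types
  • V. Datasets focused on a single body part or organ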

I. QA and language datasets

(1) ChatDoctor (Yunxiang et al., 2023) dataset

  • Model description: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
  • Contents
    • Autonomous ChatDoctor with Disease Database Demo.
    • 100k real conversations between patients and doctors from HealthCareMagic.com (HealthCareMagic-100k).
    • Real conversations between patients and doctors from icliniq.com (icliniq-10k).
    • Checkpoints of ChatDoctor.
    • Stanford Alpaca data for basic conversational capabilities.
  • Download links
    • InstructorDoctor-5K: https://drive.google.com/file/d/1nDTKZ3wZbZWTkFMBkxlamrzbNz0frugg/view
    • InstructorDoctor-200k: https://drive.google.com/file/d/1lyfqIwlLSClhgrCutWuEe_IACNq6XNUt/view
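
The Google Drive links in this list come in more than one URL shape. As a rough sketch, the helper below pulls the file or folder ID out of the shapes that appear here. `drive_id` is a hypothetical name, and the set of handled shapes is an assumption based only on the links in this document, not on any Google specification.

```python
from urllib.parse import urlparse, parse_qs


def drive_id(url: str) -> str:
    """Extract the file or folder ID from a drive.google.com URL (sketch)."""
    parsed = urlparse(url)
    if parsed.netloc != "drive.google.com":
        raise ValueError(f"not a Google Drive URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]
    # Shapes seen in this list: /file/d/<id>/view and /drive/folders/<id>
    for marker in ("d", "folders"):
        if marker in parts:
            idx = parts.index(marker)
            if idx + 1 < len(parts):
                return parts[idx + 1]
    # Shape seen in this list: /uc?export=download&id=<id>
    query = parse_qs(parsed.query)
    if "id" in query:
        return query["id"][0]
    raise ValueError(f"no ID found in: {url}")


print(drive_id("https://drive.google.com/file/d/1nDTKZ3wZbZWTkFMBkxlamrzbNz0frugg/view"))
# prints: 1nDTKZ3wZbZWTkFMBkxlamrzbNz0frugg
```

Having the bare ID is convenient when a download tool asks for an ID rather than a full URL.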

(2) PubMedQA (Jin et al., 2019)

  • Description: A Dataset for Biomedical Research Question Answering
  • Summary: The task of PubMedQA is to answer research questions with yes/no/maybe using the corresponding abstracts.
  • Download link: https://github.com/pubmedqa/pubmedqa
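
Because PubMedQA answers are one of three labels, a prediction run can be scored with plain accuracy. The sketch below assumes gold and predicted labels are already available as lists of strings. The function name and the toy labels are illustrative and are not taken from the PubMedQA repository.

```python
LABELS = ("yes", "no", "maybe")


def accuracy(gold, predicted):
    """Fraction of questions whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    for label in list(gold) + list(predicted):
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy labels, invented for this example only.
gold = ["yes", "no", "maybe", "yes"]
pred = ["yes", "no", "yes", "yes"]
print(accuracy(gold, pred))  # prints: 0.75
```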

(3) MedMCQA (Pal, Umapathi et al., 2022)

  • Description: A large-scale (194k) Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions.
  • Summary: MedMCQA has more than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects.
  • Download link: https://drive.google.com/uc?export=download&id=15VkJdq5eyWIkfb_aoD3oS8i4tScbHYky

(4) HEAD-QA

  • Description: A Healthcare Dataset for Complex Reasoning.
  • Summary: HEAD-QA is a multi-choice HEAlthcare Dataset. The questions come from exams to access a specialized position in the Spanish healthcare system.
  • Download link: https://huggingface.co/datasets/dvilares/head_qa
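
Several download links in this list point to Hugging Face dataset pages. As a small sketch, the helper below turns such a page URL into the `owner/name` repository ID, which is the kind of string the `datasets` library's `load_dataset` function takes as its first argument. The helper name is hypothetical, and whether a given repository loads without an extra configuration name or access approval is not stated in this list.

```python
from urllib.parse import urlparse


def hf_dataset_repo_id(url: str) -> str:
    """Return the 'owner/name' repo ID from a Hugging Face dataset page URL."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc != "huggingface.co" or len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


print(hf_dataset_repo_id("https://huggingface.co/datasets/dvilares/head_qa"))
# prints: dvilares/head_qa
```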

(5) PMC-VQA

  • Description: PMC-VQA is a large-scale medical visual question-answering dataset, which contains 227k VQA pairs of 149k images.
  • Download link: https://huggingface.co/datasets/xmcmic/PMC-VQA

(6) VQA-RAD (Visual Question Answering in Radiology)

  • Description: VQA-RAD consists of 3,515 question–answer pairs on 315 radiology images.
  • Download link: https://huggingface.co/datasets/flaviagiammarino/vqa-rad

(7) ScanQA dataset

  • Model description: 3D Question Answering for Spatial Scene Understanding
  • Summary: Our new ScanQA dataset contains over 41k question-answer pairs from 800 indoor scenes obtained from the ScanNet dataset.
  • Download link: https://drive.google.com/drive/folders/1-21A3TBE0QuofEwDg5oDz2z0HEdbVgL2

(8) FrenchMedMCQA

  • Description: FrenchMedMCQA is the first publicly available Multiple-Choice Question Answering (MCQA) dataset in French for the medical domain.
  • Download link: https://huggingface.co/datasets/qanastek/frenchmedmcqa

(9) MMedC

  • Description: A multilingual medical corpus with 25.5 billion tokens.
  • Download link: https://huggingface.co/datasets/Henrychur/MMedC

(10) Medical Meadow

  • Description: Medical Meadow currently encompasses roughly 1.5 million data points across a diverse range of tasks.
  • Download link: https://huggingface.co/datasets/medalpaca/medical_meadow_medqa

(11) CulturaX

  • Description: We present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages.
  • Download link: https://huggingface.co/datasets/uonlp/CulturaX

(12) PMC-CaseReport

  • Description: PMC-CaseReport is a filtered subset of PMC-Inline with around 103K case reports.
  • Download link: https://huggingface.co/datasets/chaoyi-wu/PMC-CaseReport

(13) SLAKE

  • Description: SLAKE is an English-Chinese bilingual dataset consisting of 642 images and 14,028 question-answer pairs.
  • Download link: https://huggingface.co/datasets/BoKelvin/SLAKE
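
The visual QA datasets above differ a lot in how densely each image is annotated. A quick way to compare them is the ratio of QA pairs to images, using only the counts quoted in this list. The PMC-VQA counts are the rounded "227k" and "149k" figures, so its ratio is approximate.

```python
# Counts as quoted in this list: (QA pairs, images).
VQA_COUNTS = {
    "PMC-VQA": (227_000, 149_000),  # rounded figures
    "VQA-RAD": (3_515, 315),
    "SLAKE": (14_028, 642),
}


def pairs_per_image(qa_pairs: int, images: int) -> float:
    """Average number of question-answer pairs per image."""
    return qa_pairs / images


for name, (qa_pairs, images) in VQA_COUNTS.items():
    print(f"{name}: {pairs_per_image(qa_pairs, images):.2f} QA pairs per image")
# PMC-VQA: 1.52 QA pairs per image
# VQA-RAD: 11.16 QA pairs per image
# SLAKE: 21.85 QA pairs per image
```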

(14) NLM-TB

  • Description: NLM-TB is a database for tuberculosis research and management.
  • Download links
    • https://openi.nlm.nih.gov/imgs/collections/NLM-MontgomeryCXRSet.zip
    • https://openi.nlm.nih.gov/imgs/collections/ChinaSet_AllFiles.zip

II. Medical image datasets

(1) QUILT-1M

  • Description: The Quilt1m dataset gathers pathology image-text pairs from four public sources.
  • Summary: With 1M paired image-text samples, Quilt-1M is the largest vision-language histopathology dataset to date.
  • Download links
    • Rescaled: https://zenodo.org/records/8239942 (Zenodo)
    • Full: https://docs.google.com/forms/d/e/1FAIpQLSdSe06DIbPn71jA2rCxe_5tUPfyHhSH1Z7ZTJBxWM26cnpZFg/viewform

(2) MIMIC-CXR

  • Description: MIMIC-CXR has chest X-ray scans from 227,835 studies.
  • Summary: The MIMIC Chest X-ray (MIMIC-CXR) Database v2.0.0 is a large publicly available dataset of chest radiographs in DICOM format.
  • Download link: https://physionet.org/content/mimic-cxr/2.1.0/

(3) CT-RATE

  • Description: CT-RATE consists of 25,692 non-contrast chest CT volumes, expanded to 50,188 through various reconstructions.
  • Download link: https://huggingface.co/datasets/ibrahimhamamci/CT-RATE

(4) MMC4

  • Description: MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
  • Summary: mmc4 spans everyday topics like cooking, travel, technology, etc.
  • Download link: https://github.com/allenai/mmc4

(5) VinDr-Mammo

  • Description: A large-scale benchmark dataset for computer-aided detection and diagnosis in full-field digital mammography.
  • Summary: This project introduces a large-scale full-field digital mammography dataset of 5,000 four-view exams.
  • Download link: https://physionet.org/content/vindr-mammo/1.0.0/

(6) VinDr-SpineXR

  • Description: A large annotated medical image dataset for spinal lesions detection and classification from radiographs.
  • Summary: The dataset, called VinDr-SpineXR, contains 10,466 spine X-ray images from 5,000 studies.
  • Download link: https://physionet.org/content/vindr-spinexr/1.0.0/

(7) VinDr-PCXR

  • Description: An open, large-scale pediatric chest X-ray dataset for interpretation of common thoracic diseases.
  • Summary: The dataset is divided into a training set of 7,728 and a test set of 1,397.
  • Download link: https://physionet.org/content/vindr-pcxr/1.0.0/

(8) RAD-ChestCT Dataset

  • Description: The RAD-ChestCT dataset is a large medical imaging dataset developed by Duke MD/PhD student Rachel Draelos.
  • Download link: https://zenodo.org/records/6406114#.Ytl6OXbMLAQ

(9) ChestX-ray dataset

  • Description: ChestX-ray dataset comprises 112,120 frontal-view X-ray images of 30,805 unique patients.
  • Download links
    • https://huggingface.co/datasets/alkzar90/NIH-Chest-X-ray-dataset
    • https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345
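
With 112,120 images from 30,805 patients, the dataset averages more than three images per patient, so a random image-level split could place the same patient in both training and test data. A minimal sketch of a patient-level split follows. The record layout `(image_name, patient_id)`, the helper names, and the 20% held-out fraction are assumptions for illustration, not the dataset's official split.

```python
import hashlib


def assign_split(patient_id: str, held_out: float = 0.2) -> str:
    """Map a patient ID to 'train' or 'test' deterministically."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    # Read the first 8 bytes as an integer in [0, 2**64) and scale to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < held_out else "train"


def split_by_patient(records, held_out: float = 0.2):
    """Group (image_name, patient_id) records so each patient lands in one split."""
    splits = {"train": [], "test": []}
    for image_name, patient_id in records:
        splits[assign_split(patient_id, held_out)].append((image_name, patient_id))
    return splits


# Hypothetical records: 12 images spread over 4 made-up patient IDs.
records = [(f"image_{i:03d}.png", f"patient_{i % 4}") for i in range(12)]
splits = split_by_patient(records)
train_patients = {pid for _, pid in splits["train"]}
test_patients = {pid for _, pid in splits["test"]}
print(train_patients & test_patients)  # prints: set()
```

Hashing the patient ID keeps the assignment stable across runs and machines, at the cost of the realized held-out fraction only approximating the requested one.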

(10) LIDC-IDRI

  • Description: The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions.
  • Download link: https://www.cancerimagingarchive.net/collection/lidc-idri/

(11) COVIDx CXR-4

  • Description: An Expanded Multi-Institutional Open-Source Benchmark Dataset for Chest X-ray Image-Based Computer-Aided COVID-19 Diagnostics.
  • Summary: COVIDx CXR-4 expands significantly on the previous COVIDx CXR-3 dataset by increasing the total patient cohort size.
  • Download link: https://www.kaggle.com/datasets/andyczhao/covidx-cxr2

(12) PadChest

  • Description: A large chest X-ray image dataset with multi-label annotated reports.
  • Summary: The dataset includes more than 160,000 images from 67,000 patients that were interpreted and reported by radiologists at San Juan Hospital from 2009 to 2017.
  • Download link: https://bimcv.cipf.es/bimcv-projects/padchest/

(13) ChestX-Det

  • Description: ChestX-Det consists of 3,578 images from NIH ChestX-14.
  • Summary: The 13 categories are Atelectasis, Calcification, Cardiomegaly, Consolidation, Diffuse Nodule, Effusion, Emphysema, Fibrosis, Fracture, Mass, Nodule, Pleural Thickening, Pneumothorax.
  • Download link: https://github.com/Deepwise-AILab/ChestX-Det-Dataset?tab=readme-ov-file
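
An image can show more than one of these 13 findings, so a common way to feed such labels to a classifier is a multi-hot vector. The sketch below uses the category order given above. The function name and the example findings are illustrative, and the annotation file format of ChestX-Det itself is not assumed here.

```python
CATEGORIES = [
    "Atelectasis", "Calcification", "Cardiomegaly", "Consolidation",
    "Diffuse Nodule", "Effusion", "Emphysema", "Fibrosis", "Fracture",
    "Mass", "Nodule", "Pleural Thickening", "Pneumothorax",
]
INDEX = {name: i for i, name in enumerate(CATEGORIES)}


def multi_hot(findings):
    """Encode a list of finding names as a 13-element 0/1 vector."""
    vector = [0] * len(CATEGORIES)
    for name in findings:
        vector[INDEX[name]] = 1  # raises KeyError for an unknown category
    return vector


print(multi_hot(["Effusion", "Mass"]))
# prints: [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
```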

(14) VinDr-CXR

  • Description: An open dataset of chest X-rays with radiologist annotations.
  • Summary: The released dataset is divided into a training set of 15,000 and a test set of 3,000.
  • Download link: https://physionet.org/content/vindr-cxr/1.0.0/
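
The two VinDr chest X-ray entries in this list quote fixed train/test sizes. As a quick check of how much data each one holds out, the sketch below computes the test share from the quoted counts only. The helper name is hypothetical.

```python
# (train size, test size) as quoted in this list.
SPLITS = {
    "VinDr-PCXR": (7_728, 1_397),
    "VinDr-CXR": (15_000, 3_000),
}


def held_out_fraction(train: int, test: int) -> float:
    """Share of the released scans that belongs to the test set."""
    return test / (train + test)


for name, (train, test) in SPLITS.items():
    print(f"{name}: {train + test} total, {held_out_fraction(train, test):.1%} test")
# VinDr-PCXR: 9125 total, 15.3% test
# VinDr-CXR: 18000 total, 16.7% test
```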

(15) Medical Segmentation Decathlon

  • Description: The Medical Segmentation Decathlon is a collection of medical image segmentation datasets.
  • Summary: It contains a total of 2,633 three-dimensional images collected across multiple anatomies of interest.
  • Download link: http://medicaldecathlon.com/

(16) FUMPE

  • Description: FUMPE consists of computed-tomography angiography (CTA) images for pulmonary embolism (PE) of 35 different patients.
  • Download link: https://www.kaggle.com/datasets/andrewmvd/pulmonary-embolism-in-ct-images

(17) AbdomenCT-1K

  • Description: We present a large and diverse abdominal CT organ segmentation dataset, termed AbdomenCT-1K.
  • Download link: https://docs.google.com/forms/d/e/1FAIpQLSeuZ3yanPc0E-SxvYD2ZX8eu-BKxxdQT5GQUpyzfUeK39ytow/viewform

(18) ATLAS Data

  • Description: 955 T1-weighted MRI scans, divided into a training dataset and a test dataset.
  • Download link: No specific download link provided.
Collected summary
Dataset introduction
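Construction
Medical-imaging-data-sets is built by integrating data from multiple sources and classifying it carefully. It gathers medical imaging data from different sources and covers several image types, including chest X-ray, CT, and MRI. Each dataset is selected and annotated to ensure data quality and clinical relevance. The collection also includes metadata and annotation information to support in-depth analysis and model training.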
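Characteristics
The collection's main characteristics are diversity and breadth. It contains many types of medical image data and covers a wide range of cases, from common diseases to rare ones. The images come with annotations and labels that support image analysis and disease diagnosis. The diversity extends to language: annotations are available in English, Chinese, and other languages, which helps researchers worldwide.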
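Usage
Researchers can choose the data subset that fits their needs. The collection provides download and usage instructions for obtaining and processing the data. For training machine learning and deep learning models, its annotations and diversity offer a rich pool of training samples. It also supports multiple data formats, which eases integration with different research tools and platforms.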
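
Choosing a subset can be as simple as filtering a small hand-made index of the entries. The sketch below is illustrative only: the `Entry` type, the `select` helper, and the modality tags are assumptions added for this example, and the section numbers follow the categories used in this list ("I" for QA and language, "II" for medical image).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    section: str   # "I" = QA and language, "II" = medical image
    modality: str  # hypothetical tag, assigned by hand for this sketch


CATALOG = [
    Entry("PubMedQA", "I", "text"),
    Entry("VQA-RAD", "I", "radiology image"),
    Entry("MIMIC-CXR", "II", "chest X-ray"),
    Entry("VinDr-CXR", "II", "chest X-ray"),
    Entry("CT-RATE", "II", "chest CT"),
    Entry("ATLAS Data", "II", "MRI"),
]


def select(catalog, section=None, modality=None):
    """Return entries matching every filter that is not None."""
    return [
        entry for entry in catalog
        if (section is None or entry.section == section)
        and (modality is None or entry.modality == modality)
    ]


print([entry.name for entry in select(CATALOG, modality="chest X-ray")])
# prints: ['MIMIC-CXR', 'VinDr-CXR']
```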
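Background and challenges
Background overview
Medical-imaging-data-sets indexes medical imaging datasets built by many research institutions and expert teams, with the aim of supporting research on medical image analysis and diagnosis. According to the listing metadata, the collection was created on 2024-08-22 and is hosted in the Dedapic GitHub repository. Its central question is how large-scale medical image datasets can improve the performance of diagnostic models. It gives researchers a broad set of resources for developing and validating new algorithms and techniques.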
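Current challenges
Building a medical imaging collection faces several challenges. First, the diversity and complexity of the data demand strict standardization and quality control to keep the data reliable and consistent. Second, privacy and security are major concerns for medical images and call for rigorous de-identification and protection measures. Third, scale and diversity create technical challenges in efficient storage, processing, and analysis. Finally, keeping the collection broadly applicable and continuously updated remains an open direction.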
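Common scenarios
Typical use cases
Medical-imaging-data-sets is widely applicable in medical imaging, particularly in automated image analysis and diagnosis. For example, it is often used to train and validate deep learning models that interpret chest X-ray, CT, and MRI images. With these datasets, researchers can develop algorithms that identify and classify diseases such as tuberculosis, breast cancer, and cardiovascular disease. The datasets also support semantic segmentation and object detection on medical images, which improves diagnostic accuracy and efficiency.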
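Practical applications
In practice, Medical-imaging-data-sets is used to develop and deploy medical image analysis systems. In hospitals and clinics, such systems can help radiologists diagnose quickly and accurately and reduce the misdiagnosis rate. The datasets also support telemedicine and mobile health applications, so patients can receive preliminary image analysis at home. These applications improve the efficiency of medical services and extend the reach of medical resources, especially in under-resourced regions.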
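Derived and related work
Much related work builds on Medical-imaging-data-sets. For example, researchers have used it to develop medical image analysis algorithms such as image classification and segmentation models based on convolutional neural networks (CNNs). The collection has also encouraged cross-disciplinary collaboration between medicine and computer science and advanced the use of artificial intelligence in healthcare. Many high-impact papers and patented technologies are based on research that uses it, further driving technical progress in medical image processing.
The content above was collected and summarized by 遇见数据集.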