COVID-19 image data collection
Source: GitHub. Updated: 2020-11-16. Indexed: 2024-05-31.
Download link:
https://github.com/sharika-anjum/covid-chestxray-dataset
Resource description:
An open and publicly available dataset will be constructed, which includes chest X-ray and CT images from patients with confirmed or suspected COVID-19, as well as images of other viral and bacterial pneumonia cases. The data is sourced from public resources as well as indirectly collected from hospitals and physicians, and all images and data will be publicly released through this GitHub repository.
Created:
2020-11-16
Original information summary
COVID-19 Image Data Collection
Project Summary
- Objective: To build a public open dataset of chest X-ray and CT images of patients who are positive for or suspected of having COVID-19, or who have other viral and bacterial pneumonias (MERS, SARS, ARDS).
- Data Sources: Collected from public sources and indirectly from hospitals and physicians.
- Data Availability: All images and data are publicly released in this GitHub repository.
- Ethics Approval: Approved by the University of Montreal's Ethics Committee (#CERSES-20-058-D).
Dataset Structure
- Image Views: Includes PA, AP, and AP Supine views.
- Labels: Binary labels (0=No, 1=Yes) arranged in a hierarchy.
- Current Stats:
  - COVID19_Dataset (num_samples=481): Views=[PA, AP]
    - Labels Distribution: Detailed breakdown for various conditions including COVID-19, Bacterial, Viral, and others.
  - COVID19_Dataset (num_samples=173): Views=[AP Supine]
    - Labels Distribution: Similar detailed breakdown for conditions.
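The hierarchical binary labels described above can be illustrated with a minimal sketch. It assumes each finding is written as a slash-separated path from general to specific (for example `Pneumonia/Viral/COVID-19`); that encoding, the `hierarchical_labels` helper, and the `LABELS` subset are assumptions made for illustration, not something this summary specifies.

```python
# Hypothetical subset of label names; the real dataset lists more conditions.
LABELS = ["Pneumonia", "Viral", "Bacterial", "COVID-19"]


def hierarchical_labels(finding, label_names):
    """Return {label: 0 or 1}.

    A label is 1 when it appears anywhere on the finding path, so a
    positive leaf (e.g. COVID-19) also marks its ancestors as positive.
    Assumes findings are slash-separated paths (an assumption).
    """
    parts = set(finding.split("/"))
    return {name: int(name in parts) for name in label_names}


covid = hierarchical_labels("Pneumonia/Viral/COVID-19", LABELS)
bacterial = hierarchical_labels("Pneumonia/Bacterial", LABELS)
print(covid)
print(bacterial)
```

With this encoding, a classifier can be trained at any level of the hierarchy (pneumonia vs. not, viral vs. bacterial, COVID-19 vs. other viral) from the same label table.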
Annotations
- Lung Bounding Boxes and Chest X-ray Segmentation: Contributed by General Blockchain, Inc. under CC BY 4.0.
- Pneumonia Severity Scores: Available for 94 images under CC BY-SA.
- Generated Lung Segmentations: From the paper "Lung Segmentation from Chest X-rays using Variational Data Imputation" under CC BY-SA.
- Brixia Score: Available for 192 images under CC BY-NC-SA.
- Lung and Other Segmentations: For 517 images in COCO and raster formats by v7labs under CC BY.
Contribution
- Data Submission: Direct submission to the project following the research protocol.
- Image Extraction: Assistance in identifying publications not already included.
- Data Sources: Suggestions for additional data sources like Radiopaedia, SIRM, Eurorad, and Coronacases.
- Image Annotation: Contribution of bounding box/masks for detection of problematic regions.
Data Formats
- Chest X-ray: Preferred formats are dcm, jpg, or png.
- CT: Preferred format is NIfTI (gzip-compressed), but DICOM (dcm) files are also accepted.
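As a rough illustration of the format preferences above, the following sketch sorts candidate files by extension before submission. The `classify_format` helper and its category names are hypothetical, and the `.nii.gz` suffix for gzip-compressed NIfTI is an assumption based on common convention.

```python
from pathlib import Path

# Raster formats listed as preferred for chest X-rays.
RASTER_SUFFIXES = {".jpg", ".png"}


def classify_format(filename):
    """Map a file name to a coarse format category (hypothetical names)."""
    name = Path(filename).name.lower()
    if name.endswith(".nii.gz"):
        return "nifti-gz"   # preferred for CT volumes
    suffix = Path(name).suffix
    if suffix == ".dcm":
        return "dicom"      # accepted for both X-ray and CT
    if suffix in RASTER_SUFFIXES:
        return "raster"     # preferred for chest X-rays
    return "other"          # not on the preferred list


for candidate in ["scan.nii.gz", "a/b/IMG.PNG", "x.dcm", "notes.txt"]:
    print(candidate, "->", classify_format(candidate))
```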
Background
- Purpose: To improve prognostic predictions for triaging and managing patient care during the COVID-19 pandemic.
- Existing Datasets: Comparison with large public datasets from NIH, Spain, Stanford, MIT, and Indiana University.
- Unique Features of COVID-19: Discussed in relation to chest X-ray and CT imaging patterns.
Goal
- AI Development: Use images to develop AI-based approaches for predicting and understanding COVID-19 infection.
- Model Release: Plans to release models using the open-source Chester AI Radiology Assistant platform.
- Predictive Tasks: Focus on predicting healthy vs. pneumonia and prognostic/severity predictions.
Expected Outcomes
- Tool Impact: Provide physicians with a digital second opinion and quantitative scores for patient assessment.
- Data Impact: Enable parallel development of tools and rapid local validation of models, and support various research tasks.
Contact
- Principal Investigator (PI): Joseph Paul Cohen, Postdoctoral Fellow, Mila, University of Montreal.
Citations
- Second Paper: Available on arXiv with source code for baselines.
- Initial Paper: Details the COVID-19 image data collection.
License
- Image Licensing: Each image has a specified license in the metadata.csv file.
- Metadata and Documents: Released under a CC BY-NC-SA 4.0 license.
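Because licenses are set per image, it can help to check them before reusing any files. The sketch below tallies and filters licenses with the standard `csv` module. The column names `filename` and `license` and the sample rows are assumptions made up for illustration; in practice the text would come from the repository's metadata.csv.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for metadata.csv (column names are assumptions).
SAMPLE_CSV = """filename,license
img001.jpg,CC BY-NC-SA 4.0
img002.png,Apache 2.0
img003.jpg,
img004.jpg,CC BY-NC-SA 4.0
"""


def count_licenses(csv_text):
    """Count images per license; empty cells are reported as 'unspecified'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["license"] or "unspecified").strip() for row in reader)


def files_with_license(csv_text, wanted):
    """Return the file names whose license cell matches `wanted` exactly."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["filename"] for row in reader if row["license"].strip() == wanted]


counts = count_licenses(SAMPLE_CSV)
print(counts)
print(files_with_license(SAMPLE_CSV, "Apache 2.0"))
```

Exact string matching is deliberately strict here; real license cells may vary in spelling, so a normalization step might be needed.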
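Collected summary
Dataset introduction

Construction
The COVID-19 image dataset was built to collect publicly available chest X-ray and CT images of patients who are COVID-19 positive or suspected, as well as other viral and bacterial pneumonia cases. Data sources include public channels and indirect collection from hospitals and physicians. All images and metadata are released publicly through the GitHub repository, with approval from the University of Montreal's Ethics Committee (#CERSES-20-058-D). The construction process emphasizes diversity and representativeness, covering different views (such as PA, AP, and AP Supine) and multiple pathology types.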
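Features
The dataset contains 481 chest X-ray images in PA and AP views and 173 images in the AP Supine view, with pathology labels such as COVID-19, bacterial pneumonia, and viral pneumonia. Labels are organized hierarchically, which simplifies classification and analysis. The dataset also provides annotations such as lung bounding boxes, chest X-ray segmentations, and pneumonia severity scores, giving a solid basis for training and validating deep learning models.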
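Usage
The dataset can be used to develop AI-based models for COVID-19 diagnosis and prognosis prediction. Users can obtain the images and metadata from GitHub and process them with the provided Python data loader (for example, torchxrayvision). The dataset supports tasks such as healthy vs. pneumonia classification and pneumonia severity prediction. Users can also extend the dataset by submitting new images or annotations.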
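A minimal sketch of preparing samples for the healthy vs. pneumonia task follows. It is a standard-library stand-in and does not use the torchxrayvision loader. The column names (`filename`, `folder`, `modality`, `view`, `finding`), the slash-separated finding format, and the sample rows are all assumptions made for illustration.

```python
import csv
import io

# Made-up rows standing in for the repository's metadata file.
SAMPLE_METADATA = """filename,folder,modality,view,finding
a.jpg,images,X-ray,PA,Pneumonia/Viral/COVID-19
b.jpg,images,X-ray,AP Supine,Pneumonia/Viral/COVID-19
c.png,images,X-ray,AP,No Finding
d.nii.gz,volumes,CT,Axial,Pneumonia/Viral/COVID-19
e.jpg,images,X-ray,PA,Pneumonia/Bacterial
"""


def select_samples(csv_text, views=("PA", "AP")):
    """Return (relative_path, is_pneumonia) pairs for X-ray rows in `views`."""
    samples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip CT volumes and X-rays taken in views we did not ask for.
        if row["modality"] != "X-ray" or row["view"] not in views:
            continue
        path = f'{row["folder"]}/{row["filename"]}'
        # Binary target: 1 if the top level of the finding path is Pneumonia.
        is_pneumonia = int(row["finding"].split("/")[0] == "Pneumonia")
        samples.append((path, is_pneumonia))
    return samples


frontal = select_samples(SAMPLE_METADATA)
supine = select_samples(SAMPLE_METADATA, views=("AP Supine",))
print(frontal)
print(supine)
```

Keeping the PA/AP and AP Supine subsets separate mirrors how the dataset statistics are reported, and avoids mixing images acquired under different conditions.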
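Background and challenges
Background overview
The COVID-19 image data collection was created in 2020 by Joseph Paul Cohen and colleagues at the University of Montreal to provide a public open dataset of chest X-ray and CT images for COVID-19 and other viral and bacterial pneumonias. It was built from public sources and through indirect collection from hospitals and physicians, and covers images of conditions such as COVID-19, MERS, SARS, and ARDS. The work was motivated by the need for prognostic prediction during the COVID-19 pandemic, at a time when no COVID-19 chest imaging dataset designed for computational analysis was available. Its release provided a basis for developing AI-based diagnostic and prognostic tools and advanced research in related fields.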
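Current challenges
The COVID-19 image data collection faces several challenges in construction and use. First, it draws on heterogeneous sources, including published literature and medical institutions, which makes data quality and annotation consistency hard to guarantee. Second, the visual features of COVID-19 on imaging overlap with those of other pneumonia types, which makes it harder for models to tell them apart. Third, the class distribution is imbalanced: some categories have few samples, which may limit how well models generalize. Finally, although the dataset is intended to support AI model development, the clinical diagnostic performance of such models still has to be validated through rigorous clinical studies to avoid misuse of, or over-reliance on, automated tools.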
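Common scenarios
Typical use cases
The COVID-19 image data collection is widely used in medical image analysis, particularly in the context of the COVID-19 pandemic, where it gives researchers chest X-ray and CT images to work with. The data is mainly used to train and validate deep learning models that distinguish the imaging features of COVID-19 from those of other viral or bacterial pneumonias. With it, researchers can develop more accurate diagnostic tools that help physicians identify COVID-19 patients quickly in clinical practice.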
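Academic problems addressed
The dataset addresses a key problem in COVID-19 imaging diagnosis: the lack of large-scale annotated data. It gives researchers imaging resources for exploring the characteristic features of COVID-19 on imaging and for developing AI-based diagnostic models. Such models can improve diagnostic accuracy and support clinical decision-making, especially in resource-limited healthcare settings.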
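Derived and related work
A number of research efforts build on this dataset. For example, researchers have developed various deep learning models for automatic classification and segmentation of COVID-19 images. The dataset has also prompted a series of studies on COVID-19 imaging features, which have improved diagnostic accuracy and provide data for future epidemiological research.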
The above content was collected and summarized by 遇见数据集.