COVID-19 image data collection
Project Summary
The project aims to build a public, open dataset of chest X-ray and CT images of patients who are positive for, or suspected of having, COVID-19 or other viral and bacterial pneumonias (MERS, SARS, and ARDS). Data is collected from public sources and, indirectly, from hospitals and physicians. All images and data are released publicly on GitHub.
Data and Metadata
- Images: Available in the images directory.
- Metadata: Available in metadata.csv.
- Label Hierarchy: Labels are arranged hierarchically, as depicted in the image docs/hierarchy.jpg.
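The layout above (one metadata file plus an images directory) can be read with the Python standard library alone. The following is a minimal sketch, not the project's own loader: the column names `filename`, `finding`, and `view` are assumptions about metadata.csv, and the sample rows are made up so that the block runs without the dataset.

```python
import csv
import io
from pathlib import Path


def load_metadata(csv_file):
    """Read metadata rows from an open text file into a list of dicts."""
    return list(csv.DictReader(csv_file))


def image_path(row, images_dir="images", filename_column="filename"):
    """Build the path of the image that a metadata row refers to.

    `filename_column` is an assumed column name; adjust it to the real header.
    """
    return Path(images_dir) / row[filename_column]


# Tiny in-memory stand-in for metadata.csv (all values are made up).
sample = io.StringIO(
    "filename,finding,view\n"
    "example-001.jpg,COVID-19,PA\n"
    "example-002.jpg,SARS,AP\n"
)
rows = load_metadata(sample)
paths = [image_path(row) for row in rows]
print(paths[0].as_posix())
```

Against a local checkout, the same function could be called as `load_metadata(open("metadata.csv", newline="", encoding="utf-8"))`, assuming the file sits in the working directory.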
Dataset Statistics
- COVID19_Dataset, PA and AP views: 481 samples.
- COVID19_Dataset, AP Supine view: 173 samples.
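Per-view counts like the ones above could be reproduced from the metadata rows. This sketch assumes that each row is a dict with a `view` column holding values such as `PA`, `AP`, or `AP Supine`. The column name and the sample rows are illustrative assumptions, and the counts it prints are not the dataset's.

```python
from collections import Counter


def count_by_view(rows, view_column="view"):
    """Count how many metadata rows exist for each projection view."""
    return Counter(row[view_column].strip() for row in rows)


def select_views(rows, views, view_column="view"):
    """Keep only the rows whose view is one of `views`."""
    wanted = set(views)
    return [row for row in rows if row[view_column].strip() in wanted]


# Made-up rows standing in for parsed metadata.
rows = [
    {"filename": "a.jpg", "view": "PA"},
    {"filename": "b.jpg", "view": "AP"},
    {"filename": "c.jpg", "view": "AP Supine"},
    {"filename": "d.jpg", "view": "PA"},
]
counts = count_by_view(rows)
frontal = select_views(rows, ["PA", "AP"])
print(counts, len(frontal))
```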
Annotations
- Lung Bounding Boxes and Chest X-ray Segmentation: Contributed by General Blockchain, Inc. under CC BY 4.0.
- Pneumonia Severity Scores: Available for 94 images under CC BY-SA.
- Generated Lung Segmentations: Available under CC BY-SA.
- Brixia Score: Available for 192 images under CC BY-NC-SA.
- Lung and Other Segmentations: Available for 517 images under CC BY.
Contribution
- Data Submission: Directly to the project following the research protocol.
- Image Extraction from Publications: Help identify publications not already included.
- Data from Other Sites: Data can be scraped from sites like Radiopaedia, SIRM, Eurorad, and Coronacases.
- Bounding Box/Masks: For detection of problematic regions in collected images.
Background
The dataset aims to improve prognostic predictions for triaging and managing patient care during the COVID-19 pandemic. It complements existing public datasets by focusing specifically on COVID-19 chest X-rays and CT scans.
Goal
The goal is to develop AI-based approaches using these images to predict and understand the infection. The models will be released using the open-source Chester AI Radiology Assistant platform.
Expected Outcomes
- Tool Impact: Provide physicians with a digital second opinion and quantitative scores for patient assessments.
- Data Impact: Enable parallel development of tools and rapid local validation of models.
Contact
PI: Joseph Paul Cohen, Postdoctoral Fellow, Mila, University of Montreal.
License
- Images: License specified in metadata.csv (Apache 2.0, CC BY-NC-SA 4.0, CC BY 4.0).
- Metadata, Scripts, Documents: Released under CC BY-NC-SA 4.0.
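Because image licenses vary per row, it can help to group images by license before reusing them. The sketch below assumes a `license` column whose values match the names listed above (for example `CC BY-NC-SA 4.0`). The helper names and sample rows are hypothetical, and the NonCommercial check is a simple string heuristic, not a licensing determination.

```python
def group_by_license(rows, license_column="license"):
    """Group metadata rows by their license string."""
    groups = {}
    for row in rows:
        key = (row.get(license_column) or "").strip() or "unspecified"
        groups.setdefault(key, []).append(row)
    return groups


def is_noncommercial(license_name):
    """Heuristic: Creative Commons names mark NonCommercial terms with 'NC'."""
    return "NC" in license_name.upper().replace("-", " ").split()


# Made-up rows standing in for parsed metadata.
rows = [
    {"filename": "a.jpg", "license": "CC BY 4.0"},
    {"filename": "b.jpg", "license": "CC BY-NC-SA 4.0"},
    {"filename": "c.jpg", "license": "Apache 2.0"},
    {"filename": "d.jpg", "license": "CC BY 4.0"},
    {"filename": "e.jpg", "license": ""},
]
groups = group_by_license(rows)

# License names that carry no NonCommercial clause; unspecified rows are excluded.
permissive = sorted(
    name for name in groups
    if name != "unspecified" and not is_noncommercial(name)
)
print(permissive)
```

Rows with an empty license field are kept in their own `unspecified` group so that they can be reviewed by hand.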




