Multimodal Spirometry Dataset for Quality Assessment: Tabular Parameters and Flow-Volume Loop Images
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/ywpvkxnt2z
Description:
This dataset accompanies the study titled "Explainable Machine Learning for Quality Assessment of Spirometry Tests: A Multimodal Framework." It comprises 2,768 spirometry records collected retrospectively from a tertiary care hospital (Yozgat Bozok University, Turkey) between January 2020 and October 2024.
Dataset Structure
The dataset consists of two components linked by a unique patient identifier (Id):
spirometry_tabular_data.csv — A structured tabular file containing 2,768 records with 44 columns. The first column (Id) serves as the unique patient identifier that maps each row to its corresponding flow-volume loop image. The remaining columns include demographic parameters (Gender, Age, Height, Weight), environmental variables (Temperature, Barometric Pressure), spirometric measurements (FEV1, FVC, FEF25-75%, PEF, FET100%, FIVC, FIV1, FEF/FIF50, Vol Extrap) with their PRED, BEST, and %PRED values, and a binary target label (class: 1 = adequate, 0 = inadequate).
flow_volume_loop_images/ — A folder containing 2,768 PNG images of flow-volume loops. Each image is named as {Id}.png (e.g., 1.png, 100.png, 2336.png), directly corresponding to the Id column in the tabular file.
Linkage: The Id column in the CSV file and the image filenames establish a one-to-one mapping between tabular records and flow-volume loop images.
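The one-to-one mapping described above can be checked before any modeling. The following is a minimal sketch, assuming the CSV and the image folder have been downloaded locally. The function name `check_linkage` and the keys of the returned dictionary are illustrative and not part of the dataset.

```python
from pathlib import Path

import pandas as pd


def check_linkage(csv_source, image_dir):
    """Compare Id values in the CSV with {Id}.png filenames in the image folder."""
    # Read only the Id column, as text, so it compares directly with file stems.
    ids = pd.read_csv(csv_source, usecols=["Id"], dtype={"Id": str})["Id"]
    csv_ids = set(ids)
    image_ids = {p.stem for p in Path(image_dir).glob("*.png")}
    return {
        "duplicate_ids": int(ids.duplicated().sum()),      # Ids appearing more than once
        "missing_images": sorted(csv_ids - image_ids),     # rows without an image
        "orphan_images": sorted(image_ids - csv_ids),      # images without a row
    }
```

For example, `check_linkage("spirometry_tabular_data.csv", "flow_volume_loop_images")` should return zero duplicates and two empty lists if the mapping holds as described.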
Annotations: The acceptability of spirometric maneuvers (adequate/inadequate) was labeled by expert physicians according to ATS/ERS 2005 guidelines and updated technical recommendations by Stanojevic et al.
Class Distribution: Approximately balanced — 52.9% adequate (class 1, n=1,464) and 47.1% inadequate (class 0, n=1,304).
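The reported percentages follow directly from the counts (1,464 / 2,768 and 1,304 / 2,768). A small sketch for recomputing them from the `class` column is shown below. The helper name `class_balance` is hypothetical.

```python
import pandas as pd


def class_balance(labels):
    """Return per-class counts and percentages (rounded to one decimal)."""
    counts = labels.value_counts().sort_index()
    percent = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"n": counts, "percent": percent})
```

Applied to the `class` column of the tabular file, this should reproduce the figures quoted above if the file matches the description.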
Missing Data: The reported data quality score is 70.6%. Missingness is substantial in certain features (e.g., FEV1, FVC, and FEF values are more than 90% missing in some subsets). Missing values are encoded as "Na" in the CSV file.
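Because the string "Na" is not among the markers pandas treats as missing by default, it has to be declared explicitly when loading the file. Otherwise the affected columns are read as text. The sketch below shows one way to load the table and rank columns by missingness. The function names are illustrative, and the exact column headers in the CSV are not listed in this description.

```python
import pandas as pd


def load_tabular(csv_source):
    """Load the tabular file, treating the literal string "Na" as missing."""
    return pd.read_csv(csv_source, na_values=["Na"])


def missing_rates(table):
    """Percentage of missing values per column, highest first."""
    return table.isna().mean().mul(100).round(1).sort_values(ascending=False)
```

A typical call would be `missing_rates(load_tabular("spirometry_tabular_data.csv"))`, which makes it easy to decide which columns to drop or impute.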
Usage: This dataset was used to train and evaluate multiple classification models including tree-based ensemble methods for tabular data, an EfficientNet-B2-based deep learning model for image data, and a late fusion architecture integrating both modalities. Explainability analyses were conducted using SHAP, LIME, and Grad-CAM.
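The description does not state how the two modalities are combined in the late fusion step. As an illustration only, the sketch below assumes a weighted average of the per-record probabilities of class 1 (adequate) produced by a tabular model and an image model. The function name, the weight `w_tabular`, and the 0.5 threshold are assumptions, not the authors' settings.

```python
import numpy as np


def late_fusion(p_tabular, p_image, w_tabular=0.5, threshold=0.5):
    """Fuse two arrays of class-1 probabilities by weighted averaging.

    Assumption: both models output P(class = 1) for the same records,
    in the same order (for example, sorted by Id).
    """
    p_tabular = np.asarray(p_tabular, dtype=float)
    p_image = np.asarray(p_image, dtype=float)
    if p_tabular.shape != p_image.shape:
        raise ValueError("probability arrays must have the same shape")
    if not 0.0 <= w_tabular <= 1.0:
        raise ValueError("w_tabular must lie in [0, 1]")
    fused = w_tabular * p_tabular + (1.0 - w_tabular) * p_image
    labels = (fused >= threshold).astype(int)  # 1 = adequate, 0 = inadequate
    return fused, labels
```

In practice the weight would be tuned on a validation split. Other late fusion schemes, such as a meta-classifier trained on the two probability outputs, are equally compatible with the description.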
Ethical Approval: Protocol No: 2024-GOKAEK-248_18.09.2024_148, Ethics Committee of Yozgat Bozok University.
Associated Article: This dataset is associated with a manuscript currently under review. The full citation and DOI of the related article will be added upon publication.
Created:
2026-03-11



