Multimodal Spirometry Dataset for Quality Assessment: Tabular Parameters and Flow-Volume Loop Images

Mendeley Data, indexed 2026-04-18

Download link:

https://data.mendeley.com/datasets/ywpvkxnt2z

Description:

This dataset accompanies the study titled "Explainable Machine Learning for Quality Assessment of Spirometry Tests: A Multimodal Framework." It comprises 2,768 spirometry records collected retrospectively from a tertiary care hospital (Yozgat Bozok University, Turkey) between January 2020 and October 2024.

Dataset Structure: The dataset consists of two components linked by a unique patient identifier (Id):

spirometry_tabular_data.csv: A structured tabular file containing 2,768 records with 44 columns. The first column (Id) serves as the unique patient identifier that maps each row to its corresponding flow-volume loop image. The remaining columns include demographic parameters (Gender, Age, Height, Weight), environmental variables (Temperature, Barometric Pressure), spirometric measurements (FEV1, FVC, FEF25-75%, PEF, FET100%, FIVC, FIV1, FEF/FIF50, Vol Extrap) with their PRED, BEST, and %PRED values, and a binary target label (class: 1 = adequate, 0 = inadequate).

flow_volume_loop_images/: A folder containing 2,768 PNG images of flow-volume loops. Each image is named {Id}.png (e.g., 1.png, 100.png, 2336.png), directly corresponding to the Id column in the tabular file.

Linkage: The Id column in the CSV file and the image filenames establish a one-to-one mapping between tabular records and flow-volume loop images.

Annotations: The acceptability of spirometric maneuvers (adequate/inadequate) was labeled by expert physicians according to ATS/ERS 2005 guidelines and updated technical recommendations by Stanojevic et al.

Class Distribution: Approximately balanced: 52.9% adequate (class 1, n=1,464) and 47.1% inadequate (class 0, n=1,304).

Missing Data: The data quality score is 70.6%, with notable missing rates in certain features (e.g., FEV1, FVC, and FEF values exceeding 90% missingness in some subsets). Missing values are encoded as "Na" in the CSV file.
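The structure, linkage, and missing-value encoding described above can be checked after download with a few lines of Python. The following is a minimal sketch using only the standard library, not code from the dataset authors. The file name, the folder name, the Id and class column names, and the "Na" marker come from the description; the function names, the `utf-8-sig` encoding choice, and the location of the files on disk are assumptions made for illustration.

```python
import csv
from collections import Counter
from pathlib import Path

MISSING = "Na"  # missing-value marker stated in the dataset description


def load_records(csv_path):
    """Read the CSV into a list of dicts, mapping "Na" cells to None."""
    # utf-8-sig is an assumption: it reads plain UTF-8 and also strips a BOM,
    # which would otherwise corrupt the first header name ("Id").
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return [
            {k: (None if v == MISSING else v) for k, v in row.items()}
            for row in csv.DictReader(f)
        ]


def image_path(record, image_dir):
    """Expected flow-volume loop image for a record: {Id}.png."""
    return Path(image_dir) / f"{record['Id']}.png"


def unlinked_ids(records, image_dir):
    """Ids whose image file is absent; empty if every record has its image."""
    return [r["Id"] for r in records if not image_path(r, image_dir).is_file()]


def missing_rates(records):
    """Fraction of missing cells per column (assumes at least one record)."""
    n = len(records)
    return {col: sum(r[col] is None for r in records) / n for col in records[0]}


def class_counts(records):
    """Number of records per value of the "class" label column."""
    return Counter(r["class"] for r in records)
```

With the archive unpacked in the working directory, `records = load_records("spirometry_tabular_data.csv")` followed by `unlinked_ids(records, "flow_volume_loop_images")` would be expected to return an empty list if the stated mapping holds, and `class_counts(records)` can be compared against the reported 1,464 adequate and 1,304 inadequate records. Values are kept as strings here, so numeric columns still need conversion before modeling.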
Usage: This dataset was used to train and evaluate multiple classification models including tree-based ensemble methods for tabular data, an EfficientNet-B2-based deep learning model for image data, and a late fusion architecture integrating both modalities. Explainability analyses were conducted using SHAP, LIME, and Grad-CAM.

Ethical Approval: Protocol No: 2024-GOKAEK-248_18.09.2024_148, Ethics Committee of Yozgat Bozok University.

Associated Article: This dataset is associated with a manuscript currently under review. The full citation and DOI of the related article will be added upon publication.
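The description does not say how the late fusion step combines the two modalities. As a generic illustration only, one common form of late fusion averages the class-1 (adequate) probabilities produced independently by a tabular model and an image model. The sketch below assumes that form; the function names, the weight `w_tabular`, and the 0.5 threshold are hypothetical and are not taken from the associated study.

```python
def late_fusion(p_tabular, p_image, w_tabular=0.5):
    """Weighted average of two per-record probabilities of class 1 (adequate).

    Assumed fusion rule for illustration; the study's actual rule is not
    stated in the dataset description.
    """
    if not 0.0 <= w_tabular <= 1.0:
        raise ValueError("w_tabular must lie in [0, 1]")
    if len(p_tabular) != len(p_image):
        raise ValueError("both models must score the same records")
    return [
        w_tabular * pt + (1.0 - w_tabular) * pi
        for pt, pi in zip(p_tabular, p_image)
    ]


def predict(fused, threshold=0.5):
    """Map fused probabilities to labels: 1 = adequate, 0 = inadequate."""
    return [1 if p >= threshold else 0 for p in fused]
```

For example, `predict(late_fusion(tabular_probs, image_probs))` yields one label per record, provided both probability lists are ordered by the same Id sequence. In practice the weight would be tuned on a validation split rather than fixed at 0.5.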

Created:

2026-03-11
