MFITN-E2NetGA: a transformer-based multi-level fusion framework for multi-class respiratory disease classification from lung sounds

Name: MFITN-E2NetGA: a transformer-based multi-level fusion framework for multi-class respiratory disease classification from lung sounds
Creator: Taylor & Francis
Published: 2025-12-19 03:40:58
License: No description available

DataCite Commons | Updated 2025-12-19 | Indexed 2026-02-09

Download link:

https://tandf.figshare.com/articles/dataset/MFITN-E2NetGA_a_transformer-based_multi-level_fusion_framework_for_multi-class_respiratory_disease_classification_from_lung_sounds/30686537

Description:

Respiratory diseases are among the most prevalent health challenges worldwide, and timely detection is critical for improving patient outcomes. Auscultation of lung sounds is often the first diagnostic step, but it relies heavily on the clinician's expertise, which can lead to variability in assessments. Automating this process can improve diagnostic efficiency and reliability. This study introduces an Artificial Intelligence (AI) approach to lung sound classification that extracts relevant acoustic features and learns their relationships with various respiratory conditions. We propose a deep-learning pipeline that uses pulmonary sound data to classify multiple respiratory disorders. Three complementary audio representations (Mel-Frequency Cepstral Coefficients (MFCCs), Mel spectrograms, and cochleograms) are employed to capture the time-frequency and perceptual characteristics of lung sounds. A Multi-level Feature Integration Transformer Network (MFITN) integrates these heterogeneous features through transformer-based attention mechanisms across abstraction layers. The fused representation is processed by our customized classifier, E2Net-GA, an enhanced EfficientNetV2 model augmented with Global Attention Mechanism (GAM) and Lightweight Attention Network (LAN) modules. On benchmark datasets, the MFITN-E2Net-GA framework achieved the following results: on the ICBHI-2017 dataset, an accuracy of 98.75%, F1-score of 98.35%, precision of 98.10%, specificity of 97.45%, and recall of 98.75%; on another lung sound dataset, an accuracy of 98.95%, F1-score of 98.48%, precision of 98.16%, specificity of 99.36%, and recall of 98.90%. By capturing diverse acoustic features, the proposed multimodal approach improves diagnostic accuracy, supporting early identification of lung diseases and contributing to better clinical decision-making and patient care.
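The five metrics reported above (accuracy, F1-score, precision, specificity, and recall) can all be derived from a multi-class confusion matrix. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes macro-averaging over classes, which the description does not specify, and the function names and the `toy_cm` matrix are hypothetical, made up purely for illustration.

```python
def per_class_counts(cm):
    """Return (tp, fp, fn, tn) for each class of a square confusion matrix.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    counts = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp     # predicted k, true class differs
        tn = total - tp - fn - fp
        counts.append((tp, fp, fn, tn))
    return counts


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def macro_metrics(cm):
    """Compute accuracy plus macro-averaged precision, recall, specificity, F1.

    Assumption: macro-averaging (unweighted mean over classes); the source
    does not state which averaging scheme was used.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    counts = per_class_counts(cm)
    precision = [safe_div(tp, tp + fp) for tp, fp, fn, tn in counts]
    recall = [safe_div(tp, tp + fn) for tp, fp, fn, tn in counts]
    specificity = [safe_div(tn, tn + fp) for tp, fp, fn, tn in counts]
    f1 = [safe_div(2 * p * r, p + r) for p, r in zip(precision, recall)]
    return {
        "accuracy": safe_div(sum(cm[k][k] for k in range(n)), total),
        "precision": sum(precision) / n,
        "recall": sum(recall) / n,
        "specificity": sum(specificity) / n,
        "f1": sum(f1) / n,
    }


# Hypothetical 3-class confusion matrix (illustrative only, not study data).
toy_cm = [
    [5, 1, 0],
    [1, 4, 1],
    [0, 0, 8],
]

if __name__ == "__main__":
    for name, value in macro_metrics(toy_cm).items():
        print(f"{name}: {value:.4f}")
```

Under macro-averaging every class contributes equally regardless of its sample count, so the result can differ from a weighted or micro-averaged figure on imbalanced data.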

Provider:

Taylor & Francis

Created:

2025-11-22
