Prevalence and classification of VTE.
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Prevalence_and_classification_of_VTE_/30582893
Description:
Objective
This study evaluated whether active learning can enable efficient classification of venous thromboembolism (VTE) reports using minimal labeled data. In parallel, we assessed whether deep learning (DL) models can achieve substantially superior performance compared to traditional machine learning (ML) models and overcome the limitations associated with small sample sizes and class imbalance in real-world clinical datasets.
Methods
A total of 5,839 imaging reports were included, of which 1,088 (18.6%) were VTE-positive. Traditional ML models (random forest [RF], support vector machine [SVM], SVM with stochastic gradient descent [SGD], and gradient boosting machine [GBM]) were combined with active learning strategies (random sampling, uncertainty-based sampling, word similarity, and TF-IDF similarity). DL models (LSTM, multi-kernel 1D-CNN with GloVe, and BERT-based models) were also evaluated. The F1 score was used as the performance metric.
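As an illustration of the uncertainty-based strategy named above, the following is a minimal sketch of entropy-based query selection, one common form of uncertainty sampling. The abstract does not specify which uncertainty measure the authors used, so the entropy criterion, the function names, and the toy probabilities are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, batch_size):
    """Return indices of the `batch_size` unlabeled items with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical model outputs for four unlabeled reports:
# [P(VTE-negative), P(VTE-positive)]
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.50, 0.50],
]

# The two reports the model is least sure about are sent for labeling next.
query = select_most_uncertain(pool_probs, batch_size=2)
print(query)
```

In an active learning loop, the selected reports would be labeled by an annotator, added to the training set, and the model retrained before the next round of selection.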
Results
Among VTE-positive patients, only 65.0% had corresponding ICD-10 codes, indicating frequent under-documentation. ML models with active learning achieved F1 scores of 0.70–0.80, while DL models, particularly LSTM and multi-kernel 1D-CNN with GloVe, achieved F1 scores ≥0.94 in a 7-class classification, even under severe class imbalance. Excluding the “No DVT and PE” class to form a 6-class classification among VTE-positive cases reduced model performance, with the largest decline observed in BioBERT. The average inference time per report ranged from 0.0014 to 0.024 seconds depending on the model architecture, suggesting that the system is feasible for near real-time deployment in clinical settings.
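Because the results are reported as F1 scores under class imbalance, a short sketch of how per-class and macro-averaged F1 can be computed may help in reading them. The abstract does not state which averaging scheme was used, so macro averaging here is an assumption, and the label "DVT only" and the toy predictions are hypothetical.

```python
def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical imbalanced toy data: 8 negative reports, 2 positive reports.
y_true = ["No DVT and PE"] * 8 + ["DVT only", "DVT only"]
y_pred = ["No DVT and PE"] * 8 + ["DVT only", "No DVT and PE"]

scores = per_class_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(scores, macro_f1(y_true, y_pred), accuracy)
```

In this toy case accuracy is 0.9 while the minority-class F1 is only 2/3, which is why F1 is generally more informative than accuracy when positive cases are rare.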
Conclusion
DL models substantially outperformed traditional ML in classifying VTE reports, with high accuracy, acceptable inference time, and robustness to class imbalance. These models hold promise for augmenting clinical workflows, particularly in addressing under-coded but clinically significant VTE cases.
Created:
2025-11-10



