A Study on Lightweight Method of TCM Structured Large Model Based on Memory-Constrained Pruning
Source: China Science Data (中国科学数据) | Updated: 2026-04-16 | Indexed: 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.11999/JEIT250909
Resource summary:
Objective
The structuring of Traditional Chinese Medicine (TCM) Electronic Medical Records (EMRs) is essential for knowledge discovery, clinical decision support, and intelligent diagnosis. However, two major barriers remain. First, TCM EMRs consist mainly of unstructured free text, often paired with tongue images, which complicates automated processing. Second, grassroots (primary-level) hospitals usually have limited GPU resources, which restricts the deployment of large pretrained models. This study addresses these challenges with a lightweight multimodal model based on memory-constrained pruning. The method is designed to preserve near-state-of-the-art accuracy while sharply reducing memory consumption and computational cost, so that it can be used in resource-limited healthcare settings.

Methods
A three-stage architecture is used, comprising an encoder, a multimodal fusion module, and a decoder. For text, a distilled TinyBERT encoder is combined with a BiLSTM-CRF decoder to extract 23 categories of TCM clinical entities, including symptoms, syndromes, prescriptions, and herbs. For images, a ResNet-50 encoder processes tongue-diagnosis photographs. A memory-constrained pruning strategy is introduced in which an LSTM decision network observes convolutional feature maps and adaptively prunes redundant channels while retaining key diagnostic information. Gradient reparameterization and dynamic channel grouping make pruning more flexible, and a reinforcement-learning controller stabilizes training. INT8 mixed-precision quantization, gradient accumulation, and Dynamic Batch Pruning (DBP) further reduce memory usage. A TCM terminology-enhanced lexicon is integrated into the encoder embeddings to improve the recognition of rare entities. The system is trained end-to-end on paired EMR and tongue-image datasets (Fig. 1) to optimize the multimodal information flow.

Results and Discussions
Experiments are performed on 10,500 de-identified EMRs paired with tongue images from 21 tertiary hospitals.
On an RTX 3060 GPU, the model achieves an F1-score of 91.7%, reduces peak GPU memory to 3.8 GB, and processes 22 records per second at inference (Table 1). Compared with BERT-Large, memory consumption decreases by 75% and throughput increases 1.75×, while accuracy remains comparable. Ablation studies confirm the contribution of each component: the adaptive attention gating mechanism increases F1 by 2.8% (Table 3); DBP reduces memory usage by 38.7% with minimal accuracy loss and improves performance on EMRs longer than 5,000 characters; and the terminology-enhanced lexicon improves the recognition of rare entities such as “blood stasis” by 6.2%. The structured EMR fields also support association rule mining, and the confidence of syndrome-symptom relationships increases by 18%. These findings support three observations: (1) multimodal fusion with a lightweight design offers clinical advantages over unimodal models; (2) memory-constrained pruning achieves stable channel reduction under strict hardware limits and outperforms magnitude-based pruning; and (3) pruning, quantization, and dynamic batching are strongly synergistic when designed jointly. The results support the feasibility of deploying high-performing TCM EMR structuring systems in real-world environments with limited computational capacity.

Conclusions
This study proposes a lightweight multimodal framework for structuring TCM EMRs. Memory-constrained pruning, combined with quantization and DBP, substantially compresses the visual encoder while maintaining text-image fusion accuracy. The approach reaches near-state-of-the-art performance with sharply reduced hardware requirements, enabling deployment in regional hospitals and clinics. Beyond efficiency gains, the structured multimodal outputs enrich TCM knowledge graphs and improve downstream tasks such as syndrome classification and treatment recommendation.
The framework narrows the gap between powerful pretrained models and the limited hardware of grassroots institutions, and it offers a scalable direction for lightweight multimodal NLP in medical informatics. Future work includes integrating further modalities such as pulse-wave signals, extending the pruning strategy with graph neural networks, and exploring adaptive cross-modal attention to strengthen clinical applicability.
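The memory budget at the heart of memory-constrained channel pruning can be illustrated with a much simpler selector. The sketch below is a stand-in under stated assumptions, not the authors' method: it replaces the LSTM decision network and reinforcement-learning controller with a fixed greedy rule, scores each channel by its mean absolute activation (a magnitude-style heuristic of the kind the paper reports outperforming), and keeps channels in order of importance per byte until a hypothetical per-layer budget is used up. The names `Channel`, `channel_importance`, and `select_channels`, and the per-channel memory cost, are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    index: int         # position of the channel in the convolutional layer
    importance: float  # assumed score, e.g. mean |activation| of its feature map
    mem_bytes: int     # memory the channel is assumed to cost at inference time


def channel_importance(feature_map: list[list[float]]) -> list[float]:
    """Mean absolute activation per channel; feature_map[c] holds the
    flattened spatial activations of channel c (a hypothetical layout)."""
    return [sum(abs(v) for v in ch) / len(ch) for ch in feature_map]


def select_channels(channels: list[Channel], budget_bytes: int) -> list[int]:
    """Greedy stand-in for a learned pruning policy: keep the channels with
    the highest importance per byte while the total stays within budget.
    Returns the sorted indices of the kept channels; the rest are pruned."""
    ranked = sorted(channels, key=lambda c: c.importance / c.mem_bytes, reverse=True)
    kept, used = [], 0
    for ch in ranked:
        if used + ch.mem_bytes <= budget_bytes:
            kept.append(ch.index)
            used += ch.mem_bytes
    return sorted(kept)


# Toy feature map: four channels with two activations each.
feature_map = [[0.9, -0.9], [0.1, 0.1], [0.5, -0.5], [0.05, 0.05]]
scores = channel_importance(feature_map)
channels = [Channel(i, s, mem_bytes=100) for i, s in enumerate(scores)]
print(select_channels(channels, budget_bytes=250))  # [0, 2]
```

A learned policy such as the one described in Methods can weigh context (which channels carry diagnostic cues, how layers interact) that a fixed per-channel score ignores; the greedy rule only makes the budget constraint concrete.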
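INT8 quantization trades precision for memory: each FP32 value (4 bytes) is stored as an 8-bit integer plus a shared scale factor, cutting weight storage roughly fourfold. The sketch below shows symmetric per-tensor quantization, one common scheme. The abstract does not specify the paper's mixed-precision recipe (which layers stay in higher precision, per-channel versus per-tensor scales), so this scheme and the function names `quantize_int8` and `dequantize_int8` are assumptions for illustration.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor INT8 quantization.

    scale = max|x| / 127, q = clamp(round(x / scale), -127, 127).
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map the 8-bit integers back to approximate float values."""
    return [v * scale for v in q]


weights = [0.0, 1.27, -1.27, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q)  # [0, 127, -127, 50]
# Rounding error per value is at most half a quantization step.
print(max(abs(a - b) for a, b in zip(weights, restored)) <= scale / 2)  # True
```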
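In a BiLSTM-CRF decoder, the BiLSTM produces per-token emission scores and the CRF layer adds learned tag-to-tag transition scores; at inference the best tag sequence is recovered with Viterbi decoding. The sketch below shows that standard decoding step on a toy BIO tag set. The scores are invented, start and end transitions are omitted for brevity, and the paper's 23 entity categories would simply enlarge `tags`.

```python
def viterbi_decode(emissions: list[list[float]],
                   transitions: list[list[float]],
                   tags: list[str]) -> list[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][j]   score of tag j at token t (e.g. from the BiLSTM)
    transitions[i][j] score of moving from tag i to tag j (learned by the CRF)
    """
    n = len(tags)
    score = list(emissions[0])  # best path score ending in each tag so far
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # Trace back from the best final tag.
    path = [max(range(n), key=lambda j: score[j])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return [tags[i] for i in path]


tags = ["O", "B-SYM", "I-SYM"]
emissions = [[2.0, 1.0, 0.0], [0.0, 0.5, 1.0], [1.0, 0.0, 0.0]]
# A large penalty makes the invalid transition O -> I-SYM effectively impossible.
transitions = [[0.0, 0.0, -100.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(viterbi_decode(emissions, transitions, tags))  # ['O', 'B-SYM', 'O']
```

With all-zero transitions the same emissions decode to the ill-formed `O, I-SYM, O`; the transition scores are what let the CRF layer enforce well-formed entity spans.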
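The abstract does not say how the TCM terminology lexicon is integrated into the encoder embeddings. One common approach in Chinese NER, assumed here purely for illustration, is to match the lexicon against the raw text and give each character a match tag (B/I/O) that can be embedded and added to its character embedding. The sketch uses forward maximum matching; the function name `lexicon_tags`, the three-entry lexicon, and the sample phrase are hypothetical.

```python
def lexicon_tags(text: str, lexicon: set[str]) -> list[str]:
    """Forward maximum matching: at each position take the longest lexicon
    term that starts there; tag its characters B/I and everything else O."""
    max_len = max((len(term) for term in lexicon), default=0)
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in lexicon:
                tags[i] = "B"
                for k in range(i + 1, i + length):
                    tags[k] = "I"
                i += length
                break
        else:  # no lexicon term starts at position i
            i += 1
    return tags


# Hypothetical entries: 紫暗 (dark purple), 血瘀 (blood stasis), 血瘀证 (blood stasis syndrome).
lexicon = {"紫暗", "血瘀", "血瘀证"}
print(lexicon_tags("舌紫暗血瘀证", lexicon))  # ['O', 'B', 'I', 'B', 'I', 'I']
```

Preferring the longest match keeps 血瘀证 whole instead of splitting off 血瘀, which is one reason a domain lexicon can help with rare, multi-character TCM terms.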
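Association rule mining over structured EMR fields measures how reliably one field predicts another. For a rule X → Y, support(X) is the fraction of records containing X, and confidence is support(X ∪ Y) / support(X). The sketch below computes this for a syndrome-symptom rule on four invented records; the field names and values are hypothetical and do not come from the paper's dataset.

```python
def support(records: list[set[str]], itemset: set[str]) -> float:
    """Fraction of records that contain every item in itemset."""
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(records: list[set[str]], antecedent: set[str], consequent: set[str]) -> float:
    """conf(X -> Y) = support(X | Y) / support(X); 0.0 if X never occurs."""
    s_x = support(records, antecedent)
    return support(records, antecedent | consequent) / s_x if s_x else 0.0


# Invented structured records: each set holds the extracted fields of one EMR.
records = [
    {"syndrome:blood stasis", "symptom:stabbing pain", "tongue:purple"},
    {"syndrome:blood stasis", "symptom:stabbing pain"},
    {"syndrome:blood stasis", "tongue:purple"},
    {"syndrome:qi deficiency", "symptom:fatigue"},
]
rule = confidence(records, {"syndrome:blood stasis"}, {"symptom:stabbing pain"})
print(round(rule, 3))  # 0.667
```

Algorithms such as Apriori enumerate candidate itemsets and keep the rules that pass support and confidence thresholds; the per-rule arithmetic is the same as above.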
Created: 2026-04-16



