Data from: High-fidelity parameter-efficient fine-tuning for joint recognition and linking of diagnoses to ICD-10 in non-standard primary care notes
Source: DataCite Commons. Updated 2026-01-29; indexed 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.7m0cfxq8b
Description:
Joint recognition and ICD-10 linking of diagnoses in bilingual,
non-standard Spanish and Catalan primary-care notes is challenging. We
evaluate Parameter-Efficient Fine-Tuning (PEFT) techniques as a
resource-conscious alternative to full fine-tuning (FFT) for multi-label
clinical text classification. On a corpus of 21,812 Catalan and Spanish
clinical notes from Catalonia, we compared the PEFT techniques LoRA, DoRA,
LoHA, LoKR, and QLoRA applied to multilingual transformers (BERT, RoBERTa,
DistilBERT, mDeBERTa). FFT delivered the best strict Micro-F1 (63.0), but
BERT-QLoRA scored 62.2, only 0.8 points lower, while reducing trainable
parameters by 67.5% and memory by 33.7%. Training on combined bilingual
data consistently improved generalization across individual languages. The
small FFT margin was confined to rare labels, indicating limited benefit
from updating all parameters. Among PEFT techniques, QLoRA offered the
strongest accuracy–efficiency balance; LoRA and DoRA were competitive,
whereas LoHA and LoKR incurred larger losses. Adapter rank mattered: ranks
below 128 sharply degraded Micro-F1. The substantial memory savings enable
deployment on commodity GPUs while delivering performance very close to
FFT. PEFT, particularly QLoRA, supports accurate and memory-efficient
joint entity recognition and ICD-10 linking in multilingual, low-resource
clinical settings.
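
The headline numbers above are Micro-F1 scores for multi-label classification. As a rough illustration of how such a score is pooled across notes, here is a minimal sketch. It assumes each note carries a set of ICD-10 codes and that a predicted code counts as correct only if the identical code is in the gold set for the same note; the description does not define "strict", so this reading is an assumption. The function name `micro_f1` and the example codes are hypothetical and are not taken from the dataset.

```python
from typing import Iterable, Set


def micro_f1(gold: Iterable[Set[str]], pred: Iterable[Set[str]]) -> float:
    """Micro-averaged F1 over per-note label sets.

    True positives, false positives, and false negatives are summed
    across all notes before computing F1, so frequent codes weigh
    more than rare ones.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # codes predicted and present in gold
        fp += len(p - g)   # codes predicted but absent from gold
        fn += len(g - p)   # gold codes the model missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Illustrative label sets only (not drawn from the dataset).
gold = [{"J45", "I10"}, {"E11"}, {"M54"}]
pred = [{"J45"}, {"E11", "I10"}, {"K21"}]
score = micro_f1(gold, pred)
print(f"Micro-F1: {score:.3f}")
```

Because counts are pooled before averaging, a small gap confined to rare labels, as reported for FFT versus QLoRA, moves this metric only slightly.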
Provider:
Dryad
Created:
2025-10-09



