High-fidelity parameter-efficient fine-tuning for joint recognition and linking of diagnoses to ICD-10 in non-standard primary care notes
Source: DataONE | Updated 2025-10-09 | Indexed 2025-10-18
Download link:
https://search.dataone.org/view/sha256:a20268d01f63742a6a97c2215051b7731ed8a1b880da3cb7e9c9b2e880338b65
Resource description:
Joint recognition and ICD-10 linking of diagnoses in bilingual, non-standard Spanish and Catalan primary-care notes is challenging. We evaluate Parameter-Efficient Fine-Tuning (PEFT) techniques as a resource-conscious alternative to full fine-tuning (FFT) for multi-label clinical text classification. On a corpus of 21,812 Catalan and Spanish clinical notes from Catalonia, we compared the PEFT techniques LoRA, DoRA, LoHA, LoKR, and QLoRA applied to multilingual transformers (BERT, RoBERTa, DistilBERT, mDeBERTa). FFT delivered the best strict Micro-F1 (63.0), but BERT-QLoRA scored 62.2, only 0.8 points lower, while reducing trainable parameters by 67.5% and memory by 33.7%. Training on combined bilingual data consistently improved generalization across individual languages. The small FFT margin was confined to rare labels, indicating limited benefit from updating all parameters. Among PEFT techniques, QLoRA offered the strongest accuracy-efficiency balance; LoRA and DoRA were competitive, ...

# Data from: High-fidelity parameter-efficient fine-tuning for joint recognition and linking of diagnoses to ICD-10 in non-standard primary care notes
Dataset DOI: [10.5061/dryad.7m0cfxq8b](https://doi.org/10.5061/dryad.7m0cfxq8b)
## Description of the data and file structure
This dataset contains supplementary materials supporting the article *"High-Fidelity Parameter-Efficient Fine-Tuning for Joint Recognition and Linking of Diagnoses to ICD-10 in Non-Standard Primary Care Notes"* (JAMIA Open, 2025). The files include trained model checkpoints and the corresponding training and evaluation scripts used in the study. These resources were generated through extensive experiments on a corpus of Spanish and Catalan primary care clinical notes. Raw clinical data are not included due to privacy and legal restrictions. The deposited materials enable reproducibility of the reported results, facilitate inspection of model architectures and hyperparameters, and provide code templates that can ...
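
The record quotes strict Micro-F1 scores but does not define the metric. As a rough illustration only, the sketch below assumes "strict" means a predicted diagnosis counts as correct when both its character span and its ICD-10 code exactly match a gold annotation, with counts pooled over all notes. The `Mention` layout, the function name `strict_micro_f1`, and the toy annotations are hypothetical and are not taken from the deposited evaluation scripts.

```python
from typing import Iterable, Set, Tuple

# Hypothetical annotation layout: (start offset, end offset, ICD-10 code).
Mention = Tuple[int, int, str]


def strict_micro_f1(gold: Iterable[Set[Mention]], pred: Iterable[Set[Mention]]) -> float:
    """Micro-averaged F1 where a prediction must match span and code exactly."""
    tp = fp = fn = 0
    for gold_note, pred_note in zip(gold, pred):
        tp += len(gold_note & pred_note)   # exact span + code matches
        fp += len(pred_note - gold_note)   # predicted but not in gold
        fn += len(gold_note - pred_note)   # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data: two notes; one prediction has the right span but the wrong code.
gold = [{(0, 8, "J45.9"), (15, 27, "E11.9")}, {(3, 12, "I10")}]
pred = [{(0, 8, "J45.9"), (15, 27, "E11.8")}, {(3, 12, "I10")}]
score = strict_micro_f1(gold, pred)
print(f"strict micro-F1 = {score:.3f}")
```

Under this assumed definition, a prediction with the correct span but a wrong code is penalized twice, once as a false positive and once as a false negative, which is one reason strict scores tend to sit below relaxed ones.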
Created:
2025-10-10



