Fine-tuning foundational models to code diagnoses from veterinary health records
Source: DataCite Commons | Updated: 2026-01-25 | Indexed: 2026-05-04
Download link:
https://physionet.org/content/vet-diagnosis-coding/
Resource description:
Veterinary medical records are a large data resource for veterinary and One
Health clinical research, but their use is limited by interoperability
challenges, including inconsistent data formats and data siloing. Clinical
coding using standardized medical terminologies
enhances the quality of medical records and facilitates their interoperability
with veterinary and human health records from other sites. Previous studies,
such as DeepTag and VetTag, evaluated the application of Natural Language
Processing (NLP) to automate veterinary diagnosis coding, employing long
short-term memory (LSTM) and transformer models to infer a subset of
Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT) diagnosis
codes from free-text clinical notes. This study expands on these efforts by
incorporating all 7,739 distinct SNOMED-CT diagnosis codes recognized by the
Colorado State University (CSU) Veterinary Teaching Hospital (VTH) and by
leveraging the increasing availability of pre-trained language models (LMs).
Twelve freely available pre-trained LMs (GatorTron, MedicalAI ClinicalBERT,
medAlpaca, VetBERT, PetBERT, BERT, BERT Large, RoBERTa, GPT-2, GPT-2 XL,
DeBERTa V3, and ModernBERT) were fine-tuned on the free-text notes from
246,473 manually coded veterinary patient visits included in the CSU VTH's
electronic health records (EHRs), which resulted in superior performance
relative to previous efforts. The most accurate results were obtained when
expansive labeled data were used to fine-tune relatively large clinical LMs,
but the study also showed that comparable results can be obtained using more
limited resources and non-clinical LMs. By investigating accessible methods
for automated coding, this study helps improve the quality of veterinary EHRs
and supports both animal and human health research, paving the way for more
integrated and comprehensive health databases that span species and
institutions.
Provider:
PhysioNet
Created:
2026-01-21



