Intelligent documentation in medical education: Can AI replace manual case logging?
Source: DataCite Commons. Updated: 2026-05-04. Indexed: 2026-05-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.3tx95x6x0
Description:
This study investigates the feasibility of using large language models
(LLMs) to automate procedural case log documentation in radiology
training. We evaluate whether AI can replace manual logging, identify
procedure types most challenging for extraction, and assess integration
into clinical workflows. We retrospectively analyzed 36,659 radiology
reports authored by nine interventional radiology residents (2018–2024). A
subset of 414 reports was manually annotated for 39 procedures spanning
vascular diagnosis, vascular intervention, and non-vascular intervention.
Candidate models, Qwen-2.5 and Claude-3.5, were chosen based on privacy,
hardware constraints, and availability, and tested under instruction and
chain-of-thought prompting. A crosswalk baseline using structured exam
codes provided comparison. Performance was measured by sensitivity,
specificity, and F1-score, along with inference time and token efficiency
to estimate operational cost. Both local and commercial LLMs outperformed
the crosswalk benchmark. Qwen-2.5 achieved sensitivities up to 94.19% and
F1-scores of 86.66 with chain-of-thought prompting, while Claude-3.5-Haiku
reached an F1-score of 86.89 and specificity of 99.29%. Errors were
concentrated in ambiguous “other” procedures, whereas common procedures
were reliably classified. Chain-of-thought prompting reduced false
positives relative to instruction prompting. Commercial inference
delivered sub-2s latency and concise outputs, while local deployment
traded speed for lower recurring cost. Automation could save more than 35
hours of manual annotation per resident annually. LLMs thus offer a
scalable, accurate, and cost-efficient solution for radiology case log
documentation. Optimizing for procedure-specific challenges and ensuring
seamless integration with existing systems will be essential. Future work
should validate across larger, multi-institution datasets and explore
additional prompting strategies.
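
The evaluation metrics named above (sensitivity, specificity, and F1-score) can be illustrated with a minimal sketch. It assumes each report carries a set of procedure labels and that counts are pooled over every report and label (micro-averaging). The abstract does not state how the study aggregated its scores, so the pooling choice, the label names, and the toy data below are hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Pooled confusion counts over all (report, label) pairs."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0


def count_outcomes(gold, predicted, labels):
    """Compare gold and predicted label sets report by report."""
    c = Counts()
    for gold_set, pred_set in zip(gold, predicted):
        for label in labels:
            in_gold = label in gold_set
            in_pred = label in pred_set
            if in_gold and in_pred:
                c.tp += 1
            elif in_pred:
                c.fp += 1
            elif in_gold:
                c.fn += 1
            else:
                c.tn += 1
    return c


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class never occurs.
    return numerator / denominator if denominator else 0.0


def sensitivity(c):
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c):
    return _ratio(c.tn, c.tn + c.fp)


def f1_score(c):
    return _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)


# Hypothetical labels and annotations, for illustration only.
labels = ["angiography", "embolization", "drainage"]
gold = [{"angiography"}, {"embolization"}, {"drainage"},
        {"angiography", "embolization"}]
predicted = [{"angiography"}, {"embolization", "drainage"}, set(),
             {"angiography"}]

counts = count_outcomes(gold, predicted, labels)
print(counts)
print(f"sensitivity={sensitivity(counts):.4f}")
print(f"specificity={specificity(counts):.4f}")
print(f"F1={f1_score(counts):.4f}")
```

Per-procedure scores, which would show where errors cluster (for example in the "other" categories), could be obtained by calling `count_outcomes` with a single-element `labels` list.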
Provider:
Dryad
Created:
2026-05-04



