five

Data Sheet 1_Limitations of broadly trained LLMs in interpreting orthopedic Walch glenoid classifications.docx

收藏
NIAID Data Ecosystem2026-05-02 收录
下载链接:
https://figshare.com/articles/dataset/Data_Sheet_1_Limitations_of_broadly_trained_LLMs_in_interpreting_orthopedic_Walch_glenoid_classifications_docx/30002473
下载链接
链接失效反馈
官方服务:
资源简介:
Artificial intelligence (AI) integration in medical practice has grown substantially, with physician use nearly doubling from 38% in 2023 to 68% in 2024. Recent advances in large language models (LLMs) include multimodal inputs, showing potential for medical image interpretation and clinical software integrations. This study evaluated the accuracy of two popular LLMs, Claude 3.5 Sonnet and DeepSeek R1, in interpreting glenoid diagrams using Walch glenoid classification in preoperative shoulder reconstruction applications. Test images included seven black-white Walch glenoid diagrams from Radiopedia. LLMs were accessed via Perplexity.ai without specialized medical training. LLMs were tested across multiple conversation threads with prompt instructions of varying length, ranging from 22 to 864 words for DeepSeek and 127 to 840 words for Claude. Performance differed significantly between models. DeepSeek achieved 44% accuracy (7/16), while Claude had 0% accuracy (0/16). DeepSeek showed a mild positive correlation between instruction length and response accuracy. Common errors across both LLMs included misclassifying A2 as either A1 (32%) or B2 (20%). Results highlight limitations in broadly trained LLMs’ ability to interpret even simplified medical diagrams. DeepSeek’s continuous learning feature and open-source dataset integration exhibited superior accuracy, although it was still insufficient for clinical applications. These limitations stem from LLM training data containing primarily text instead of medical images, creating pattern recognition deficiencies when interpreting visual medical information. Despite AI’s growing adoption in healthcare, this study concludes that as of February 2025, publicly available broadly trained LLMs lack the consistency and accuracy necessary for reliable medical image interpretation, emphasizing the need for specialized training before clinical implementation.
创建时间:
2025-08-28
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作