Benchmark Dataset for Thyroid Eye Disease Patient Counseling Across Five Large Language Models
Source: DataCite Commons. Updated 2026-03-02; indexed 2026-05-03.
Download link:
https://ieee-dataport.org/competitions/benchmark-dataset-thyroid-eye-disease-ted-patient-counseling-across-five-large
Description:
Thyroid eye disease (TED) requires timely risk stratification and triage, yet patients increasingly use large language model (LLM) chatbots for guidance. Comparative evidence on the safety and counseling quality of newer web-deployed LLMs in ophthalmic care remains limited. We performed a cross-sectional benchmarking study using a prespecified 35-item Chinese TED patient-counseling question bank and a standardized single-turn protocol to evaluate five publicly accessible LLM chatbot services (Gemini 3 Pro, ChatGPT-5.2, DeepSeek-V3.1, Doubao, and Qwen3-Max). All systems were accessed through official web interfaces in Quzhou, China, during 27–29 December 2025, and user-visible model identifiers/settings were documented. Objective response features (response time, words, characters, paragraphs, sentences, and tables) were quantified, and two blinded experts rated outputs against a guideline-/consensus-informed reference standard using 5-point Likert scales for Accuracy, Logic, Coherence, Safety, and Content Accessibility. Between-model comparisons and correlation analyses were conducted. Response time differed significantly (P<0.001): Gemini 3 Pro was fastest (32.52±4.53 s) and Doubao slowest (63.33±11.69 s). Output structure also varied substantially, with Doubao generating the longest responses, ChatGPT-5.2 the shortest, and Qwen3-Max the most table-formatted outputs. Significant between-model differences were observed for accuracy, logic, coherence, and content accessibility (all P≤0.007), but not safety (P=0.828). Longer or slower responses did not consistently indicate higher clinical quality. These findings highlight substantial heterogeneity across contemporary LLMs for TED counseling and support risk-centered, structured response design and further validation in multi-turn, safety-focused ophthalmic triage workflows.
Provider:
IEEE DataPort
Created:
2026-03-02
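
The description mentions objective response features (words, characters, paragraphs, sentences, and tables) but does not say how they were counted. The following is a minimal Python sketch of one plausible way to compute them from a saved response, not the authors' method. The function name `response_features` and every counting rule are assumptions: each Chinese character is counted as one word, characters exclude whitespace, sentences end at Chinese or Western terminal punctuation, and tables are detected by Markdown separator rows.

```python
import re


def response_features(text: str) -> dict:
    """Count simple structural features of one chatbot response.

    All counting rules here are illustrative assumptions, not the
    definitions used by the dataset authors.
    """
    # Paragraphs: non-empty blocks separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]

    # Sentences: split on Chinese/Western terminal punctuation; a Western
    # period only counts when followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[。！？!?]+|\.(?=\s|$)", text) if s.strip()]

    # Characters: everything except whitespace.
    characters = len(re.sub(r"\s", "", text))

    # Words: each CJK ideograph is one word; each Latin/digit run is one word.
    words = len(re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9]+", text))

    # Tables: one Markdown table per separator row such as "| --- | --- |".
    tables = sum(
        1
        for line in text.splitlines()
        if "|" in line and "---" in line and re.fullmatch(r"[\s|:\-]+", line)
    )

    return {
        "words": words,
        "characters": characters,
        "paragraphs": len(paragraphs),
        "sentences": len(sentences),
        "tables": tables,
    }


if __name__ == "__main__":
    sample = "甲状腺眼病需要及时就诊。请尽快就医！"
    print(response_features(sample))
```

Response time is not covered here because it has to be measured while the chatbot is generating, not from the saved text. Counts produced by this sketch may differ from the values in the dataset if the authors used a word segmenter or different sentence rules.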



