
Benchmark Dataset for Thyroid Eye Disease Patient Counseling Across Five Large Language Models

DataCite Commons · Updated 2026-03-02 · Indexed 2026-05-03
Download link:
https://ieee-dataport.org/competitions/benchmark-dataset-thyroid-eye-disease-ted-patient-counseling-across-five-large
Description:
Thyroid eye disease (TED) requires timely risk stratification and triage, yet patients increasingly use large language model (LLM) chatbots for guidance. Comparative evidence on the safety and counseling quality of newer web-deployed LLMs in ophthalmic care remains limited. We performed a cross-sectional benchmarking study using a prespecified 35-item Chinese-language TED patient-counseling question bank and a standardized single-turn protocol to evaluate five publicly accessible LLM chatbot services (Gemini 3 Pro, ChatGPT-5.2, DeepSeek-V3.1, Doubao, and Qwen3-Max). All systems were accessed through their official web interfaces in Quzhou, China, from 27 to 29 December 2025, and user-visible model identifiers and settings were documented. Objective response features (response time and counts of words, characters, paragraphs, sentences, and tables) were quantified, and two blinded experts rated each output against a guideline- and consensus-informed reference standard on 5-point Likert scales for Accuracy, Logic, Coherence, Safety, and Content Accessibility. Between-model comparisons and correlation analyses were then performed. Response time differed significantly across models (P<0.001): Gemini 3 Pro was fastest (32.52±4.53 s) and Doubao slowest (63.33±11.69 s). Output structure also varied substantially: Doubao generated the longest responses, ChatGPT-5.2 the shortest, and Qwen3-Max the most table-formatted outputs. Accuracy, logic, coherence, and content accessibility differed significantly between models (all P≤0.007), whereas safety did not (P=0.828). Longer or slower responses did not consistently indicate higher clinical quality. These findings highlight substantial heterogeneity across contemporary LLMs for TED counseling and support risk-centered, structured response design, as well as further validation in multi-turn, safety-focused ophthalmic triage workflows.
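The abstract lists objective response features but this page does not say how they were counted. As a minimal illustrative sketch only, assuming each reply is Markdown text, that paragraphs are blocks separated by blank lines, that sentences end with Chinese or English terminal punctuation, and that tables are Markdown pipe tables, the character, paragraph, sentence, and table counts could be computed as below. The helper name response_features and the regular expressions are hypothetical, not the study's protocol. Word counts are left out because Chinese text needs a word segmenter, which the page does not specify; response time would be measured separately when the replies are collected.

```python
import re

# Illustrative rules only (assumptions): the dataset page does not say how
# the study segmented replies into characters, paragraphs, sentences or tables.
SENTENCE_END = re.compile(r"[。！？!?]|\.(?=\s|$)")  # CJK/Latin terminators; skips decimals like 32.52
TABLE_SEPARATOR = re.compile(r"^\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?\s*$")


def response_features(text: str) -> dict:
    """Hypothetical helper: count structural features of one chatbot reply."""
    # Paragraphs are assumed to be blocks separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Each Markdown table is detected by its header separator row, e.g. |---|---|.
    # Requiring a pipe keeps horizontal rules ("---") from counting as tables.
    tables = sum(
        1 for line in text.splitlines()
        if "|" in line and TABLE_SEPARATOR.match(line)
    )
    return {
        "characters": len(re.sub(r"\s", "", text)),  # non-whitespace characters
        "paragraphs": len(paragraphs),
        "sentences": len(SENTENCE_END.findall(text)),
        "tables": tables,
    }


reply = "甲状腺眼病需要评估。请尽快就诊！\n\n| 项目 | 建议 |\n|---|---|\n| 复视 | 急诊 |"
print(response_features(reply))
# → {'characters': 39, 'paragraphs': 2, 'sentences': 2, 'tables': 1}
```

Regular expressions keep the sketch dependency-free; a study reproducing the reported word counts would need to state its segmenter and counting rules explicitly.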
Provider:
IEEE DataPort
Created:
2026-03-02