
Dataset for Comparative evaluation and performance of large language models in clinical infection control scenarios: a benchmark study

Source: Figshare. Updated: 2025-09-17. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Dataset_for_Comparative_evaluation_and_performance_of_large_language_models_inclinical_infection_control_scenarios_a_benchmark_study/30149236
Description:
This cross-sectional benchmarking study evaluated three large language models (LLMs), GPT-4.1, DeepSeek V3, and Gemini 2.5 Pro Experimental, for supporting infection control nurses (ICNs) in clinical infection prevention and control (IPC) consultations. Using 30 real hospital scenarios from Queen Mary Hospital (Hong Kong), each LLM first generated clarifying questions, then produced recommendations under two prompting methods: an open-ended prompt and a structured template. Sixteen experts (ICNs and physicians) rated the outputs on coherence, conciseness, usefulness/relevance, evidence quality, and actionability, each on a 1–10 scale. GPT-4.1 and DeepSeek V3 outperformed Gemini on composite quality (36.77 ± 7.53; 36.25 ± 8.02 vs. 33.22 ± 7.92; p
Created:
2025-09-17