Dataset for Comparative evaluation and performance of large language models in clinical infection control scenarios: a benchmark study

Figshare, updated 2025-09-17, indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/Dataset_for_Comparative_evaluation_and_performance_of_large_language_models_inclinical_infection_control_scenarios_a_benchmark_study/30149236

Resource description:

This cross-sectional benchmarking study evaluated three large language models (LLMs)—GPT-4.1, DeepSeek V3, and Gemini 2.5 Pro Experimental—for supporting infection control nurses (ICNs) in clinical infection prevention and control (IPC) consultations. Using 30 real hospital scenarios from Queen Mary Hospital (Hong Kong), each LLM first generated clarifying questions, then produced recommendations via two prompting methods: open-ended and a structured template. Sixteen experts (ICNs and physicians) rated outputs on coherence, conciseness, usefulness/relevance, evidence quality, and actionability (1–10). GPT-4.1 and DeepSeek V3 outperformed Gemini on composite quality (36.77 ± 7.53; 36.25 ± 8.02 vs. 33.22 ± 7.92; p

Created:

2025-09-17