Supplementary Material for: Navigating Gynecological Oncology with Different Versions of ChatGPT: A Transformative Breakthrough or the Next Black Box Challenge?
Source: DataCite Commons. Updated 2024-12-21; indexed 2025-01-06.
Download link:
https://karger.figshare.com/articles/dataset/Supplementary_Material_for_Navigating_Gynecological_Oncology_with_Different_Versions_of_ChatGPT_A_Transformative_Breakthrough_or_the_Next_Black_Box_Challenge_/28031981
Description:
Introduction: This study evaluates the performance of three large language model (LLM) versions of ChatGPT (ChatGPT-3.5, ChatGPT-4, and ChatGPT-Omni) in answering questions on the diagnosis and treatment of gynecological cancers, including ovarian, endometrial, and cervical cancers.
Methods: A total of 804 questions were distributed equally across four categories (True/False, Multiple-Choice, Open-Ended, and Case-Scenario), and each category included questions at three levels of complexity (easy, medium, and complex). Performance was assessed using a six-point Likert scale focusing on accuracy, completeness, and alignment with established clinical guidelines.
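The design arithmetic can be checked with a small sketch: 804 questions split equally across four categories gives 201 questions per category. The helpers `split_equally` and `mean_likert` below are hypothetical illustrations, not the authors' code, and `mean_likert` assumes the six-point scale is scored from 1 to 6, which the abstract does not state.

```python
# Hypothetical helpers illustrating the study-design arithmetic; not the authors' code.
CATEGORIES = ("True/False", "Multiple-Choice", "Open-Ended", "Case-Scenario")


def split_equally(total: int, categories: tuple[str, ...]) -> dict[str, int]:
    """Allocate `total` questions equally across `categories`; refuse uneven splits."""
    if total % len(categories) != 0:
        raise ValueError(
            f"{total} questions cannot be split equally into {len(categories)} categories"
        )
    share = total // len(categories)
    return {category: share for category in categories}


def mean_likert(ratings: list[int], low: int = 1, high: int = 6) -> float:
    """Average Likert ratings, assuming (not confirmed by the source) a 1-6 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside the {low}-{high} scale")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    print(split_equally(804, CATEGORIES))  # 201 questions per category
    print(mean_likert([6, 5, 6, 5]))  # 5.5
```

Rejecting uneven splits and out-of-range ratings makes data-entry errors fail loudly instead of silently skewing per-category means.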
Results: For True/False questions, ChatGPT-Omni achieved accuracy rates of 100% for easy, 98% for medium, and 97% for complex questions, higher than ChatGPT-4 (94%, 90%, and 85%) and ChatGPT-3.5 (90%, 85%, and 80%) (p=0.041, 0.023, and 0.014, respectively). For Multiple-Choice questions, ChatGPT-Omni maintained superior accuracy, with 100% for easy, 98% for medium, and 93% for complex questions, compared with ChatGPT-4 (92%, 88%, and 80%) and ChatGPT-3.5 (85%, 80%, and 70%) (p=0.035, 0.028, and 0.011, respectively). For Open-Ended questions, ChatGPT-Omni had mean Likert scores of 5.8 for easy, 5.5 for medium, and 5.2 for complex questions, outperforming ChatGPT-4 (5.4, 5.0, and 4.5) and ChatGPT-3.5 (5.0, 4.5, and 4.0) (p=0.037, 0.026, and 0.015, respectively). Similar trends were observed for Case-Scenario questions, where ChatGPT-Omni achieved scores of 5.6, 5.3, and 4.9 for easy, medium, and complex questions, respectively (p=0.017, 0.008, and 0.012).
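The size of the reported advantages can be made explicit with a short calculation. The sketch below transcribes the abstract's True/False, Multiple-Choice, and Open-Ended values and computes how far ChatGPT-Omni leads each other version at each difficulty level: in percentage points for accuracy and in Likert points for the open-ended scores. Case-Scenario questions are left out because the abstract gives no comparator scores for them. The names `REPORTED`, `LEVELS`, and `gaps` are hypothetical, and the differences are purely descriptive. They do not reproduce the reported p-values, since the abstract does not state which statistical test was used.

```python
# Values transcribed from the abstract; levels ordered from easiest to hardest.
LEVELS = ("easy", "medium", "complex")

REPORTED = {
    # Accuracy in percent.
    "True/False": {
        "ChatGPT-Omni": (100, 98, 97),
        "ChatGPT-4": (94, 90, 85),
        "ChatGPT-3.5": (90, 85, 80),
    },
    "Multiple-Choice": {
        "ChatGPT-Omni": (100, 98, 93),
        "ChatGPT-4": (92, 88, 80),
        "ChatGPT-3.5": (85, 80, 70),
    },
    # Mean six-point Likert score.
    "Open-Ended": {
        "ChatGPT-Omni": (5.8, 5.5, 5.2),
        "ChatGPT-4": (5.4, 5.0, 4.5),
        "ChatGPT-3.5": (5.0, 4.5, 4.0),
    },
}


def gaps(category: str, reference: str = "ChatGPT-Omni") -> dict[str, dict[str, float]]:
    """Return how far `reference` leads every other model at each difficulty level."""
    table = REPORTED[category]
    ref_values = table[reference]
    return {
        model: {
            # Rounding to one decimal hides float noise such as 0.7999999999999998.
            level: round(ref - other, 1)
            for level, ref, other in zip(LEVELS, ref_values, values)
        }
        for model, values in table.items()
        if model != reference
    }


if __name__ == "__main__":
    for category in REPORTED:
        print(category, gaps(category))
```

For example, on complex True/False questions the gap over ChatGPT-3.5 is 97 - 80 = 17 percentage points, and on complex Multiple-Choice questions it is 93 - 70 = 23 points.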
Conclusions: ChatGPT-Omni outperformed ChatGPT-4 and ChatGPT-3.5 in answering clinical questions on gynecological cancers, underscoring its potential utility as a decision-support tool and an educational resource in clinical practice.
Provided by:
Karger Publishers
Created:
2024-12-16



