
Supplementary Material for: Navigating Gynecological Oncology with Different Versions of ChatGPT: A Transformative Breakthrough or the Next Black Box Challenge?

DataCite Commons | Updated: 2025-05-01 | Indexed: 2025-01-06
Download link:
https://karger.figshare.com/articles/dataset/Supplementary_Material_for_Navigating_Gynecological_Oncology_with_Different_Versions_of_ChatGPT_A_Transformative_Breakthrough_or_the_Next_Black_Box_Challenge_/28031981/2
Resource description:
Introduction: The study evaluates the performance of three large language model (LLM) versions of ChatGPT (ChatGPT-3.5, ChatGPT-4, and ChatGPT-Omni) in addressing inquiries related to the diagnosis and treatment of gynecological cancers, including ovarian, endometrial, and cervical cancers.

Methods: A total of 804 questions were equally distributed across four categories: True/False, Multiple-Choice, Open-Ended, and Case-Scenario, with each question type representing varying levels of complexity. Performance was assessed using a six-point Likert scale, focusing on accuracy, completeness, and alignment with established clinical guidelines.

Results: For True/False queries, ChatGPT-Omni achieved accuracy rates of 100% for easy, 98% for medium, and 97% for complicated questions, higher than ChatGPT-4 (94%, 90%, 85%) and ChatGPT-3.5 (90%, 85%, 80%) (p=0.041, 0.023, 0.014, respectively). In Multiple-Choice, ChatGPT-Omni maintained superior accuracy with 100% for easy, 98% for medium, and 93% for complicated queries, compared to ChatGPT-4 (92%, 88%, 80%) and ChatGPT-3.5 (85%, 80%, 70%) (p=0.035, 0.028, 0.011). For Open-Ended questions, ChatGPT-Omni had mean Likert scores of 5.8 for easy, 5.5 for medium, and 5.2 for complex levels, outperforming ChatGPT-4 (5.4, 5.0, 4.5) and ChatGPT-3.5 (5.0, 4.5, 4.0) (p=0.037, 0.026, 0.015). Similar trends were observed in Case-Scenario questions, where ChatGPT-Omni achieved scores of 5.6, 5.3, and 4.9 for easy, medium, and hard levels, respectively (p=0.017, 0.008, 0.012).

Conclusions: ChatGPT-Omni exhibited superior performance in responding to clinical queries related to gynecological cancers, underscoring its potential utility as a decision-support tool and an educational resource in clinical practice.
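
For readers who want to work with the reported figures, the following is a minimal sketch that tabulates the accuracy rates quoted in the description and derives the per-category question count and the percentage-point gaps between models. The numbers are transcribed from the abstract only; the dictionary layout and the names `accuracy_pct` and `gap_to_omni` are illustrative assumptions and do not reflect the structure of the actual supplementary files.

```python
# Figures transcribed from the abstract; names and layout are hypothetical.
TOTAL_QUESTIONS = 804
CATEGORIES = ["True/False", "Multiple-Choice", "Open-Ended", "Case-Scenario"]
LEVELS = ["easy", "medium", "complicated"]

# "Equally distributed across four categories" implies 804 / 4 per category.
questions_per_category = TOTAL_QUESTIONS // len(CATEGORIES)

# Reported accuracy (%) per difficulty level, in the order given by LEVELS.
accuracy_pct = {
    "True/False": {
        "ChatGPT-Omni": [100, 98, 97],
        "ChatGPT-4": [94, 90, 85],
        "ChatGPT-3.5": [90, 85, 80],
    },
    "Multiple-Choice": {
        "ChatGPT-Omni": [100, 98, 93],
        "ChatGPT-4": [92, 88, 80],
        "ChatGPT-3.5": [85, 80, 70],
    },
}


def gap_to_omni(category, model):
    """Percentage-point gap between ChatGPT-Omni and `model`, per level."""
    omni = accuracy_pct[category]["ChatGPT-Omni"]
    other = accuracy_pct[category][model]
    return [o - m for o, m in zip(omni, other)]


if __name__ == "__main__":
    print(f"{questions_per_category} questions per category")
    for category, models in accuracy_pct.items():
        for model in models:
            if model == "ChatGPT-Omni":
                continue
            gaps = dict(zip(LEVELS, gap_to_omni(category, model)))
            print(f"{category}: ChatGPT-Omni vs {model}: {gaps}")
```

In both categories the gap widens as difficulty increases, which matches the trend the abstract describes. The Open-Ended and Case-Scenario results are Likert means rather than accuracy rates, so they are left out of this table.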

Provider:
Karger Publishers
Created:
2024-12-20