Supplementary Material for: Navigating Gynecological Oncology with Different Versions of ChatGPT: A Transformative Breakthrough or the Next Black Box Challenge?

Name: Supplementary Material for: Navigating Gynecological Oncology with Different Versions of ChatGPT: A Transformative Breakthrough or the Next Black Box Challenge?
Creator: Karger Publishers
Published: 2025-05-01 06:41:06
License: No description available

DataCite Commons, updated 2025-05-01, indexed 2025-01-06

Download link:

https://karger.figshare.com/articles/dataset/Supplementary_Material_for_Navigating_Gynecological_Oncology_with_Different_Versions_of_ChatGPT_A_Transformative_Breakthrough_or_the_Next_Black_Box_Challenge_/28031981/2

Resource description:

Introduction: The study evaluates three large language model (LLM) versions of ChatGPT (ChatGPT-3.5, ChatGPT-4, and ChatGPT-Omni) in addressing inquiries related to the diagnosis and treatment of gynecological cancers, including ovarian, endometrial, and cervical cancers.

Methods: A total of 804 questions were equally distributed across four categories (True/False, Multiple-Choice, Open-Ended, and Case-Scenario), with each question type representing varying levels of complexity. Performance was assessed using a six-point Likert scale, focusing on accuracy, completeness, and alignment with established clinical guidelines.

Results: For True/False queries, ChatGPT-Omni achieved accuracy rates of 100% for easy, 98% for medium, and 97% for complicated questions, higher than ChatGPT-4 (94%, 90%, 85%) and ChatGPT-3.5 (90%, 85%, 80%) (p=0.041, 0.023, 0.014, respectively). In Multiple-Choice, ChatGPT-Omni maintained superior accuracy with 100% for easy, 98% for medium, and 93% for complicated queries, compared to ChatGPT-4 (92%, 88%, 80%) and ChatGPT-3.5 (85%, 80%, 70%) (p=0.035, 0.028, 0.011). For Open-Ended questions, ChatGPT-Omni had mean Likert scores of 5.8 for easy, 5.5 for medium, and 5.2 for complex levels, outperforming ChatGPT-4 (5.4, 5.0, 4.5) and ChatGPT-3.5 (5.0, 4.5, 4.0) (p=0.037, 0.026, 0.015). Similar trends were observed in Case-Scenario questions, where ChatGPT-Omni achieved scores of 5.6, 5.3, and 4.9 for easy, medium, and hard levels, respectively (p=0.017, 0.008, 0.012).

Conclusions: ChatGPT-Omni exhibited superior performance in responding to clinical queries related to gynecological cancers, underscoring its potential utility as a decision-support tool and an educational resource in clinical practice.
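
For readers who want to work with the reported figures, the following is a minimal Python sketch that tabulates the True/False and Multiple-Choice accuracy rates quoted in the abstract and computes the percentage-point gap between two models. The names REPORTED_ACCURACY, questions_per_category, and accuracy_gap are hypothetical helpers introduced here for illustration; they are not part of the supplementary material, and the sketch does not reproduce the study's statistical tests.

```python
# Accuracy rates (%) copied from the abstract, ordered by difficulty:
# (easy, medium, complicated). The structure itself is an illustrative
# assumption, not the layout of the supplementary files.
DIFFICULTIES = ("easy", "medium", "complicated")

REPORTED_ACCURACY = {
    "True/False": {
        "ChatGPT-Omni": (100, 98, 97),
        "ChatGPT-4": (94, 90, 85),
        "ChatGPT-3.5": (90, 85, 80),
    },
    "Multiple-Choice": {
        "ChatGPT-Omni": (100, 98, 93),
        "ChatGPT-4": (92, 88, 80),
        "ChatGPT-3.5": (85, 80, 70),
    },
}


def questions_per_category(total=804, categories=4):
    """Return the per-category count implied by an equal split."""
    if total % categories != 0:
        raise ValueError("total is not evenly divisible by categories")
    return total // categories


def accuracy_gap(question_type, model_a, model_b):
    """Percentage-point difference (model_a minus model_b) per difficulty."""
    rates_a = REPORTED_ACCURACY[question_type][model_a]
    rates_b = REPORTED_ACCURACY[question_type][model_b]
    return {d: a - b for d, a, b in zip(DIFFICULTIES, rates_a, rates_b)}


if __name__ == "__main__":
    print(questions_per_category())
    print(accuracy_gap("True/False", "ChatGPT-Omni", "ChatGPT-3.5"))
```

An equal split of 804 questions over four categories gives 201 questions per category. The abstract does not state how questions were divided among difficulty levels, so the sketch makes no assumption about that.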

Provider:

Karger Publishers

Created:

2024-12-20