Supplementary Material for: Navigating Gynecological Oncology with Different Versions of ChatGPT: A Transformative Breakthrough or the Next Black Box Challenge?

Creator: Karger Publishers
Published: 2024-12-21 08:55:16
License: No description available

DataCite Commons: updated 2024-12-21; indexed 2025-01-06

Download link:

https://karger.figshare.com/articles/dataset/Supplementary_Material_for_Navigating_Gynecological_Oncology_with_Different_Versions_of_ChatGPT_A_Transformative_Breakthrough_or_the_Next_Black_Box_Challenge_/28031981


Resource summary:

Introduction: This study evaluates the performance of three large language model (LLM) versions of ChatGPT (ChatGPT-3.5, ChatGPT-4, and ChatGPT-Omni) in answering queries about the diagnosis and treatment of gynecological cancers, including ovarian, endometrial, and cervical cancers.

Methods: A total of 804 questions were distributed equally across four categories (201 per category): True/False, Multiple-Choice, Open-Ended, and Case-Scenario. Questions within each category were graded by complexity (easy, medium, and complex). Performance was assessed on a six-point Likert scale, focusing on accuracy, completeness, and alignment with established clinical guidelines.

Results: For True/False questions, ChatGPT-Omni achieved accuracy rates of 100% (easy), 98% (medium), and 97% (complex), higher than ChatGPT-4 (94%, 90%, 85%) and ChatGPT-3.5 (90%, 85%, 80%) (p = 0.041, 0.023, and 0.014, respectively). For Multiple-Choice questions, ChatGPT-Omni maintained superior accuracy, with 100% (easy), 98% (medium), and 93% (complex), compared with ChatGPT-4 (92%, 88%, 80%) and ChatGPT-3.5 (85%, 80%, 70%) (p = 0.035, 0.028, and 0.011, respectively). For Open-Ended questions, ChatGPT-Omni had mean Likert scores of 5.8 (easy), 5.5 (medium), and 5.2 (complex), outperforming ChatGPT-4 (5.4, 5.0, 4.5) and ChatGPT-3.5 (5.0, 4.5, 4.0) (p = 0.037, 0.026, and 0.015, respectively). Similar trends were observed for Case-Scenario questions, in which ChatGPT-Omni achieved scores of 5.6 (easy), 5.3 (medium), and 4.9 (complex) (p = 0.017, 0.008, and 0.012, respectively).

Conclusions: ChatGPT-Omni showed superior performance in responding to clinical queries about gynecological cancers, underscoring its potential as a decision-support tool and an educational resource in clinical practice.


Provider:

Karger Publishers

Created:

2024-12-16