Table 1_ChatGPT and reference intervals: a comparative analysis of repeatability in GPT-3.5 Turbo, GPT-4, and GPT-4o.xlsx

Source: NIAID Data Ecosystem (indexed 2026-05-10)

Download link:

https://figshare.com/articles/dataset/Table_1_ChatGPT_and_reference_intervals_a_comparative_analysis_of_repeatability_in_GPT-3_5_Turbo_GPT-4_and_GPT-4o_xlsx/30867521

Description:

Background: Large language models such as ChatGPT hold promise as rapid “curbside consultation” tools in laboratory medicine. However, their ability to generate consistent and clinically reliable reference intervals, particularly in the absence of contextual clinical information, remains uncertain.

Method: This cross-sectional study evaluated whether three versions of ChatGPT (GPT-3.5-Turbo, GPT-4, GPT-4o) maintain repeatable reference-interval outputs when the prompt intentionally omits the interval, using reference-interval variability as a stress test for model consistency. Standardized prompts were submitted through 726,000 chatbot requests. A total of 246,842 reference intervals across 47 laboratory parameters were then analyzed for consistency using the coefficient of variation (CV) and regression models.

Results: On average, the chatbots exhibited a CV of 26.50% (IQR: 7.35–129.01%) for the lower limit and 15.82% (IQR: 4.50–45.30%) for the upper limit upon repetition. GPT-4 and GPT-4o demonstrated significantly lower CVs than GPT-3.5-Turbo. Reference intervals for poorly standardized parameters were particularly inconsistent for both the lower limit (β: 0.6; 95% CI: 0.35 to 0.86; p < 0.001) and the upper limit (β: 0.5; 95% CI: 0.28 to 0.71; p < 0.001), and unit expressions also varied.

Conclusion: While the newer ChatGPT versions tested demonstrate improved repeatability, diagnostically unacceptable variability persists, particularly for poorly standardized analytes. Mitigating this requires thoughtful prompt design (e.g., mandatory inclusion of reference intervals), global harmonization of laboratory standards, further model refinement, and robust regulatory oversight. Until then, AI chatbots should be restricted to professional use and trained to refuse laboratory interpretation when reference intervals are not provided by the user.
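For readers unfamiliar with the repeatability metric named in the description, the following is a minimal sketch of how a coefficient of variation can be computed for repeated answers to the same prompt. It assumes the common definition (sample standard deviation divided by the mean, expressed as a percentage); the description does not state the exact estimator the authors used. The function name `coefficient_of_variation` and the sample values are hypothetical and are not taken from the dataset.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / |mean| * 100.

    Assumption: this is the textbook definition, not necessarily the
    exact estimator used in the study.
    """
    if len(values) < 2:
        raise ValueError("need at least two repeated values")
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / abs(m) * 100.0


# Hypothetical lower limits returned for one laboratory parameter
# across five repetitions of the same prompt (illustrative values only).
lower_limits = [3.5, 3.5, 3.6, 3.4, 3.5]
cv_lower = coefficient_of_variation(lower_limits)
print(f"CV of lower limit: {cv_lower:.2f}%")
```

A CV near zero means the model returned almost the same limit on every repetition; larger values indicate less repeatable output.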

Created:

2025-12-12