Performance of GPT-4o and o1-Pro on United Kingdom Medical Licensing Assessment-style items: a comparative study

NIAID Data Ecosystem2026-05-10 收录

下载链接：

https://doi.org/10.7910/DVN/KVAMLJ

下载链接

链接失效反馈

官方服务：

资源简介：

Large language models (LLMs) such as ChatGPT, and their potential to support autonomous learning for licensing exams like the UK Medical Licensing Assessment (UKMLA), are of growing interest. However, empirical evaluations of artificial intelligence (AI) performance against the UKMLA standard remain limited. : We evaluated the performance of 2 recent ChatGPT versions, GPT-4o and o1-Pro, on a curated set of 374 UKMLA-style single-best-answer items spanning diverse medical specialties. Statistical comparisons using McNemar’s test assessed the significance of differences between the 2 models. Specialties were analyzed to identify domain-specific variation. In addition, 20 image-based items were evaluated.

创建时间：

2025-11-06