
Performance of GPT-4o and o1-Pro on United Kingdom Medical Licensing Assessment-style items: a comparative study

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://doi.org/10.7910/DVN/KVAMLJ
Description:
Large language models (LLMs) such as ChatGPT, and their potential to support autonomous learning for licensing exams like the UK Medical Licensing Assessment (UKMLA), are of growing interest. However, empirical evaluations of artificial intelligence (AI) performance against the UKMLA standard remain limited. We evaluated the performance of 2 recent ChatGPT versions, GPT-4o and o1-Pro, on a curated set of 374 UKMLA-style single-best-answer items spanning diverse medical specialties. McNemar’s test was used to assess whether differences between the 2 models were statistically significant. Results were also analyzed by specialty to identify domain-specific variation. In addition, 20 image-based items were evaluated.
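
As an illustration of the paired comparison described above, here is a minimal sketch of an exact two-sided McNemar test in Python. It assumes each model's answers have been scored as correct or incorrect on the same items. The function names (`discordant_counts`, `mcnemar_exact`) and the toy score lists are hypothetical, not taken from the dataset, and the description does not say which variant of the test (exact or chi-square) was actually used.

```python
from math import comb


def discordant_counts(correct_a, correct_b):
    """Count items where exactly one of the two models answered correctly.

    correct_a and correct_b are equal-length sequences of booleans,
    one entry per item, in the same item order (hypothetical input layout).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same items")
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    return only_a, only_b


def mcnemar_exact(only_a, only_b):
    """Exact two-sided McNemar p-value from the two discordant counts.

    Under the null hypothesis, each discordant item is equally likely
    to favour either model, so the smaller count follows Binomial(n, 0.5).
    """
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data for illustration only (not results from the study).
model_a = [True, True, False, True, False, True]
model_b = [True, False, False, True, True, True]
only_a, only_b = discordant_counts(model_a, model_b)
p_value = mcnemar_exact(only_a, only_b)
```

Only the discordant items (answered correctly by one model but not the other) enter the test; items both models got right or both got wrong carry no information about which model is better.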
Created:
2025-11-06