Evaluating AI generated question banks for UK medical licensing examination preparation: An observational study

Source: Figshare. Updated: 2025-08-10. Indexed: 2026-04-28.

Download link:

https://figshare.com/articles/dataset/_b_Evaluating_AI_generated_question_banks_for_UK_medical_licensing_examination_preparation_An_observational_study_b_/29877167

Resource description:

Introduction: Artificial intelligence has become increasingly accessible; however, its application to undergraduate medical examination preparation is still being investigated. This study determines whether ChatGPT-4o can generate multiple-choice practice questions with a good level of content accuracy for medical students preparing for the UK Medical Licensing Assessment Applied Knowledge Test (UKMLA AKT).

Methods: This is a descriptive observational study. ChatGPT-4o was asked to generate UKMLA AKT questions using 3 different commands. Each command generated 40 practice multiple-choice questions, which were evaluated by 3 physicians registered with the General Medical Council. Questions were evaluated for content accuracy, for alignment of topics with the UKMLA content map, and for question focus with respect to diagnosis, investigation, management and prescribing.

Results: ChatGPT-4o generated multiple-choice questions with an overall content accuracy of 81.67%. No statistically significant difference in content accuracy was seen between commands, and Cohen’s kappa statistic demonstrated good inter-rater reliability (0.834; 95% CI 0.716 to 0.951). Without prompting for topic, 15.0% and 13.75% of questions were generated on the cardiovascular and respiratory systems respectively. Specifying topics from the UKMLA content map increased the diversity of question topics generated. Additionally, the questions produced predominantly concerned the diagnosis and management of conditions.

Discussion: ChatGPT-4o was able to generate multiple-choice practice questions for the UKMLA AKT with a good level of content accuracy. It showed a tendency to produce questions on the cardiovascular and respiratory systems. Questions generally focused on the diagnosis and management of conditions unless prompted otherwise. Careful adjustment of prompts successfully increased the diversity of practice questions generated. This study demonstrated that ChatGPT-4o can be used, with close human oversight, to generate question banks for medical students preparing for national licensing examinations.
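
The inter-rater reliability figure above is a Cohen's kappa statistic, which compares observed agreement p_o with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The abstract does not say how the statistic was computed across the three raters, so the following is only a minimal two-rater sketch in Python. The function name `cohen_kappa` and the ratings are made up for illustration and are not taken from the dataset.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for 10 questions (NOT the study's data).
rater_a = ["accurate"] * 8 + ["inaccurate"] * 2
rater_b = ["accurate"] * 7 + ["inaccurate"] * 3
kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 3))  # 0.737
```

In this toy example the raters agree on 9 of 10 questions (p_o = 0.90) and chance agreement is p_e = 0.62, giving a kappa of about 0.737.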

Created:

2025-08-10