Comparison between GPT-4 and human raters in grading pharmacy students’ exam responses in Malaysia: a cross-sectional study

DataONE, updated 2025-08-20, indexed 2025-11-01

Download link:

https://search.dataone.org/view/sha256:38a5104ed05712068d8d6e66b49b7ec13ce9152440d10cf6facf419866270e9b

Description:

Manual grading is time-consuming and prone to inconsistencies, prompting exploration of generative artificial intelligence tools like GPT-4 to enhance efficiency and reliability. This study investigates GPT-4’s potential in grading pharmacy students’ exam responses, focusing on the impact of optimized prompts. Specifically, it evaluated the alignment between GPT-4 and human raters, assessed GPT-4’s consistency over time, and determined its error rates in grading pharmacy students’ exam responses. We conducted a comparative study of past exam responses graded by university-trained raters versus GPT-4. Responses were randomized before evaluation by GPT-4, which was accessed via a Plus account between April and September 2024. Prompt optimization was performed on 16 responses, followed by evaluation of 3 prompt delivery methods. We then applied the optimized approach across 4 item types. Intraclass correlation coefficients and error analyses were used to assess consistency and agreement between GPT-4 and human ratings.
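
For readers unfamiliar with the agreement statistic named above, the following is a minimal sketch of one common intraclass correlation form, ICC(2,1) (two-way random effects, absolute agreement, single rater). The description does not state which ICC form the study used, so this choice is an assumption, and the function name `icc_2_1` and the sample scores are hypothetical illustrations, not data from the dataset.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one per exam response; each row holds the
    score given by each rater (e.g. [human_score, gpt4_score]).
    Assumption: the study's actual ICC form is not stated in the source.
    """
    n = len(scores)        # number of responses (subjects)
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: column 0 = human rater, column 1 = GPT-4
example = [[4, 4], [7, 6], [5, 5], [9, 8], [6, 7]]
print(round(icc_2_1(example), 3))
```

Because this form measures absolute agreement, a rater who is consistently one mark higher than another is penalized even if the ranking of responses is identical.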

Created:

2025-10-29