Comparison between GPT-4 and human raters in grading pharmacy students’ exam responses in Malaysia: a cross-sectional study
Source: DataONE | Updated: 2025-08-20 | Indexed: 2025-11-01
Download link:
https://search.dataone.org/view/sha256:38a5104ed05712068d8d6e66b49b7ec13ce9152440d10cf6facf419866270e9b
Description:
Manual grading is time-consuming and prone to inconsistencies, prompting exploration of generative artificial intelligence tools like GPT-4 to enhance efficiency and reliability. This study investigates GPT-4’s potential in grading pharmacy students’ exam responses, focusing on the impact of optimized prompts. Specifically, it evaluated the alignment between GPT-4 and human raters, assessed GPT-4’s consistency over time, and determined its error rates in grading pharmacy students’ exam responses. We conducted a comparative study of past exam responses, comparing the grades assigned by university-trained raters with those assigned by GPT-4. Responses were randomized before evaluation by GPT-4, which was accessed via a Plus account between April and September 2024. Prompt optimization was performed on 16 responses, followed by evaluation of 3 prompt delivery methods. We then applied the optimized approach across 4 item types. Intraclass correlation coefficients and error analyses assessed consistency and agreement between GPT-4 and human ratings.
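For readers who want to reproduce an agreement measure of this kind, the following is a minimal sketch of one intraclass correlation coefficient. The description does not say which ICC form the study used, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single rater) is an assumption. The function name `icc2_1` and the example scores are hypothetical and are not taken from the dataset.

```python
from typing import Sequence


def icc2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` has one row per exam response and one column per rater
    (for example, column 0 = human rater, column 1 = GPT-4).
    Assumption: the study's actual ICC form is not stated in the description.
    """
    n = len(scores)        # number of responses (subjects)
    k = len(scores[0])     # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 responses and 2 raters")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for a two-way layout without replication.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    return (ms_rows - ms_error) / denom


if __name__ == "__main__":
    # Hypothetical marks for five responses: (human rater, GPT-4).
    example = [(8.0, 7.5), (5.0, 5.5), (9.0, 9.0), (3.0, 4.0), (6.5, 6.0)]
    print(f"ICC(2,1) = {icc2_1(example):.3f}")
```

Because ICC(2,1) measures absolute agreement, a rater that is systematically more lenient or stricter than the other lowers the coefficient even when the two rank the responses identically.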
Created:
2025-10-29



