
Performance of GPT-3.5 and GPT-4 on standardized urology knowledge assessment items in the United States: a descriptive study

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/4EJOCL
Description:
This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) with respect to standardized urology multiple-choice items in the United States. In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.
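The comparison described above amounts to grouping graded responses by item type and model, then computing the fraction correct in each group. A minimal sketch of that tabulation follows, assuming each response has already been graded as correct or incorrect. The record layout, the function name `accuracy_by_group`, and the toy rows are hypothetical illustrations, not the dataset's actual schema or results.

```python
from collections import defaultdict

# Hypothetical toy records, NOT data from the study.
# Each row: (question complexity, model, whether the answer was graded correct)
records = [
    ("recall", "GPT-3.5", True),
    ("recall", "GPT-4", True),
    ("interpretation", "GPT-3.5", False),
    ("interpretation", "GPT-4", True),
    ("problem-solving", "GPT-3.5", False),
    ("problem-solving", "GPT-4", False),
]


def accuracy_by_group(rows):
    """Return {(complexity, model): fraction of items answered correctly}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for complexity, model, is_correct in rows:
        key = (complexity, model)
        totals[key] += 1
        correct[key] += int(is_correct)
    return {key: correct[key] / totals[key] for key in totals}


# Print one accuracy value per (complexity, model) pair.
for (complexity, model), value in sorted(accuracy_by_group(records).items()):
    print(f"{complexity:16s} {model:8s} {value:.2f}")
```

The same grouping could be keyed on topic instead of complexity by swapping the first field of each record.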
Created:
2024-07-02