Performance of Large Language Model Artificial Intelligence on Dermatology Board Exam Style Questions
Source: Mendeley Data. Updated 2024-01-31; indexed 2024-06-26.
Download link:
https://data.mendeley.com/datasets/6j48wcyvxf
Description:
Google BARD outperformed ChatGPT in all question genres (General Dermatology, Dermatopathology, Surgery, Pediatric Dermatology). For both ChatGPT and Google BARD, differences in scores were statistically significant across ‘Question Genre’ (p<0.05) but not across ‘Type’ (p>0.05). Both models performed worse in Dermatopathology than in General Dermatology.
Created:
2024-01-31



