Performance of Large Language Model Artificial Intelligence on Dermatology Board Exam Style Questions

Mendeley Data. Updated 2024-01-31; indexed 2024-06-26.

Download link:

https://data.mendeley.com/datasets/6j48wcyvxf


Description:

Google BARD performed better than ChatGPT in all question genres (General Dermatology, Dermatopathology, Surgery, Pediatric Dermatology). For both ChatGPT and Google BARD, differences in scores were statistically significant across ‘Question Genre’ (p<0.05) but not across ‘Type’ (p>0.05). Compared with General Dermatology, performance in Dermatopathology was worse for both ChatGPT and Google BARD.

Created:

2024-01-31
