Battle of the Bots: Solving Clinical Cases in Osteoarticular Infections with Large Language Models
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/79tbbm7v24
Description:
This repository contains the three core datasets underpinning our LLM evaluation study in infectious disease decision‐making:
Multiple‐Choice Answers: Model responses mapped to predefined, guideline‐based answer keys for each clinical question.
Raw LLM Outputs: Unedited textual answers generated by all 15 tested language models.
Likert‐Scale Ratings: Explanation‐quality scores assigned by two blinded, board‐certified reviewers, including consensus‐resolved discrepancies and interrater reliability statistics.
Together, these files enable full replication of our accuracy and explanation‐quality analyses.
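As a rough illustration of the two kinds of analysis these files support, the sketch below scores multiple-choice responses against an answer key and computes an agreement statistic for two reviewers' Likert ratings. It is a minimal sketch under stated assumptions, not the authors' analysis code: the description does not say which interrater statistic was used, so unweighted Cohen's kappa is an assumption here, and the function names and the toy inputs are hypothetical rather than taken from the dataset files.

```python
from collections import Counter


def accuracy(responses, answer_key):
    """Fraction of questions where the model's choice matches the key."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have equal length")
    if not answer_key:
        raise ValueError("no questions to score")
    correct = sum(r == k for r, k in zip(responses, answer_key))
    return correct / len(answer_key)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: the dataset's actual reliability statistic is not named
    in the description; kappa is used here only as a common choice.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given the identical score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal score frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical toy data, not values from the dataset.
    print(accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"]))
    print(cohen_kappa([1, 1, 2, 2], [1, 1, 2, 1]))
```

For ordinal Likert scores, a weighted kappa or an intraclass correlation may be a better fit than the unweighted form shown; the dataset's own reliability statistics should be treated as authoritative.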
Created:
2025-05-02



